MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7284
JIRA: https://issues.redhat.com/browse/RHEL-23894
commit f2eae58c4428bd792c8e91e3666ab0718d87b44a
Author: Todd Brandt <todd.e.brandt@intel.com>
Date: Tue May 20 03:45:55 2025 -0700
platform/x86/intel/pmc: Fix Arrow Lake U/H NPU PCI ID
The ARL requires that the GMA and NPU devices both be in D3Hot in order
for PC10 and S0iX to be achieved in S2idle. The original ARL-H/U addition
to the intel_pmc_core driver attempted to do this by switching them to D3
in the init and resume calls of the intel_pmc_core driver.
The problem is the ARL-H/U have a different NPU device and thus are not
being properly set and thus S0iX does not work properly in ARL-H/U. This
patch creates a new ARL-H specific device id that is correct and also
adds the D3 fixup to the suspend callback. This way if the PCI devices
drop from D3 to D0 after resume they can be corrected for the next
suspend. Thus there is no dropout in S0iX.
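The effect of matching on the wrong device ID, and of repeating the fixup in the suspend path, can be sketched with a small user-space model. This is not the driver code: the structure, the function name, and the device IDs used with it are hypothetical stand-ins chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified power states; only the two the text mentions. */
enum pm_state_model { STATE_D0, STATE_D3HOT };

struct pci_dev_model {
	unsigned short device_id;	/* hypothetical ID, not a real one */
	enum pm_state_model state;
};

/*
 * Put every device whose ID appears in ids[] into D3hot.
 * Returns how many devices had to be changed.
 */
size_t d3_fixup_model(struct pci_dev_model *devs, size_t ndevs,
		      const unsigned short *ids, size_t nids)
{
	size_t changed = 0;

	for (size_t i = 0; i < ndevs; i++) {
		for (size_t j = 0; j < nids; j++) {
			if (devs[i].device_id == ids[j] &&
			    devs[i].state != STATE_D3HOT) {
				devs[i].state = STATE_D3HOT;
				changed++;
			}
		}
	}
	return changed;
}
```

In this model, a device whose ID is missing from the list stays in D0 no matter how often the fixup runs, which mirrors the symptom described above. Calling the same helper from the suspend path catches a device that dropped back to D0 after resume.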
Fixes: bd820906ea9d ("platform/x86/intel/pmc: Add Arrow Lake U/H support to intel_pmc_core driver")
Signed-off-by: Todd Brandt <todd.e.brandt@intel.com>
Link: https://lore.kernel.org/r/a61f78be45c13f39e122dcc684b636f4b21e79a0.1747737446.git.todd.e.brandt@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7277
JIRA: https://issues.redhat.com/browse/RHEL-109504
Calling drm_dev_unplug() is the drm way to say the device
is gone and can not be accessed any more.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Tested-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20250507082821.2710706-1-kraxel@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 2507789a724d607fa9e162dcadeb9f51b071fc49)
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Gerd Hoffmann <kraxel@redhat.com>
Approved-by: José Expósito <jexposit@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7314
JIRA: https://issues.redhat.com/browse/RHEL-112493
intel_pstate is the CPU frequency driver for Intel processors and
needs to be updated regularly.
Signed-off-by: David Arcari <darcari@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7285
JIRA: https://issues.redhat.com/browse/RHEL-110614
Turbostat is heavily utilized by x86 customers and as such it is routinely updated to match the latest upstream.
Signed-off-by: David Arcari <darcari@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit e0423541477dfb684fbc6e6b5386054bc650f264
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Fri Sep 5 15:44:45 2025 +0200
PM: EM: Add function for registering a PD without capacity update
The intel_pstate driver manages CPU capacity changes itself and it does
not need an update of the capacity of all CPUs in the system to be
carried out after registering a PD.
Moreover, in some configurations (for instance, an SMT-capable
hybrid x86 system booted with nosmt in the kernel command line) the
em_check_capacity_update() call at the end of em_dev_register_perf_domain()
always fails and reschedules itself to run once again in 1 s, so
effectively it runs in vain every 1 s forever.
To address this, introduce a new variant of em_dev_register_perf_domain(),
called em_dev_register_pd_no_update(), that does not invoke
em_check_capacity_update(), and make intel_pstate use it instead of the
original.
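The shape of this change, two entry points sharing one registration helper and differing only in whether the capacity check is triggered, can be sketched as a user-space model. All names below are hypothetical stand-ins, and the counters merely record what the real functions would do.

```c
#include <assert.h>
#include <stdbool.h>

struct em_model {
	unsigned int registered_pds;	/* perf domains registered so far */
	unsigned int capacity_checks;	/* times the capacity check ran */
};

/* Shared helper: register, then optionally trigger the capacity check. */
static int em_register_common(struct em_model *em, bool check_capacity)
{
	em->registered_pds++;
	if (check_capacity)
		em->capacity_checks++;
	return 0;
}

/* Models the original entry point, which always runs the check. */
int em_register_pd_model(struct em_model *em)
{
	return em_register_common(em, true);
}

/* Models the new variant for callers that manage capacity themselves. */
int em_register_pd_no_update_model(struct em_model *em)
{
	return em_register_common(em, false);
}
```

Keeping the original entry point unchanged means existing callers are unaffected, and only a driver that handles capacity updates itself opts out.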
Fixes: 7b010f9b9061 ("cpufreq: intel_pstate: EAS support for hybrid platforms")
Closes: https://lore.kernel.org/linux-pm/40212796-734c-4140-8a85-854f72b8144d@panix.com/
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Cc: 6.16+ <stable@vger.kernel.org> # 6.16+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-109212
Conflicts: Missing 5995d90d2d19f, so some surrounding APIs are different
commit bec324f33d1ed346394b2eee25bf6dbf3511f727
Author: Alex Markuze <amarkuze@redhat.com>
Date: Tue Aug 12 09:57:39 2025 +0000
ceph: fix race condition where r_parent becomes stale before sending message
When the parent directory's i_rwsem is not locked, req->r_parent may become
stale due to concurrent operations (e.g. rename) between dentry lookup and
message creation. Validate that r_parent matches the encoded parent inode
and update to the correct inode if a mismatch is detected.
[ idryomov: folded a follow-up fix from Alex to drop extra reference
from ceph_get_reply_dir() in ceph_fill_trace():
ceph_get_reply_dir() may return a different, referenced inode when
r_parent is stale and the parent directory lock is not held.
ceph_fill_trace() used that inode but failed to drop the reference
when it differed from req->r_parent, leaking an inode reference.
Keep the directory inode in a local variable and iput() it at
function end if it does not match req->r_parent. ]
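The reference handling described in the note above can be sketched with a toy reference count. This is a simplified model, not the ceph code: the structure and both function names are hypothetical, and the decrement stands in for iput().

```c
#include <assert.h>

struct inode_model {
	unsigned long ino;
	int refcount;
};

/*
 * Return the directory the reply refers to. When it differs from
 * r_parent, the returned inode carries an extra reference that the
 * caller must drop.
 */
struct inode_model *get_reply_dir_model(struct inode_model *r_parent,
					struct inode_model *reply_dir)
{
	if (reply_dir->ino == r_parent->ino)
		return r_parent;
	reply_dir->refcount++;
	return reply_dir;
}

/* Use the directory, then drop the extra reference if one was taken. */
unsigned long fill_trace_model(struct inode_model *r_parent,
			       struct inode_model *reply_dir)
{
	struct inode_model *dir = get_reply_dir_model(r_parent, reply_dir);
	unsigned long used_ino = dir->ino;	/* stands in for the real work */

	if (dir != r_parent)
		dir->refcount--;	/* models iput() */
	return used_ino;
}
```

Without the final conditional drop, every call with a stale parent would leave the count of the correct directory one higher than before, which is the leak the folded fix addresses.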
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-109212
Conflicts: CentOS is missing 197b7d792d6ae, so the adjacent API is different
CentOS is missing 2827badaf8162, which obviates some function changes
commit 15f519e9f883b316d86e2bb6b767a023aafd9d83
Author: Alex Markuze <amarkuze@redhat.com>
Date: Tue Aug 12 09:57:38 2025 +0000
ceph: fix race condition validating r_parent before applying state
Add validation to ensure the cached parent directory inode matches the
directory info in MDS replies. This prevents client-side race conditions
where concurrent operations (e.g. rename) cause r_parent to become stale
between request initiation and reply processing, which could lead to
applying state changes to incorrect directory inodes.
[ idryomov: folded a kerneldoc fixup and a follow-up fix from Alex to
move CEPH_CAP_PIN reference when r_parent is updated:
When the parent directory lock is not held, req->r_parent can become
stale and is updated to point to the correct inode. However, the
associated CEPH_CAP_PIN reference was not being adjusted. The
CEPH_CAP_PIN is a reference on an inode that is tracked for
accounting purposes. Moving this pin is important to keep the
accounting balanced. When the pin was not moved from the old parent
to the new one, it created two problems: The reference on the old,
stale parent was never released, causing a reference leak.
A reference for the new parent was never acquired, creating the risk
of a reference underflow later in ceph_mdsc_release_request(). This
patch corrects the logic by releasing the pin from the old parent and
acquiring it for the new parent when r_parent is switched. This
ensures reference accounting stays balanced. ]
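The pin accounting in the note above can be modeled in a few lines. This is a sketch under simplifying assumptions: a plain integer stands in for the CEPH_CAP_PIN reference, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct dir_model {
	int cap_pins;	/* stands in for the CEPH_CAP_PIN reference count */
};

struct req_model {
	struct dir_model *r_parent;	/* the request holds one pin on it */
};

/* Point the request at the correct parent and move the pin with it. */
void req_switch_parent_model(struct req_model *req, struct dir_model *correct)
{
	struct dir_model *old = req->r_parent;

	if (old == correct)
		return;
	correct->cap_pins++;		/* acquire for the new parent */
	if (old)
		old->cap_pins--;	/* release from the stale parent */
	req->r_parent = correct;
}

/* Drop the pin held through r_parent when the request is released. */
void req_release_model(struct req_model *req)
{
	if (req->r_parent) {
		req->r_parent->cap_pins--;
		req->r_parent = NULL;
	}
}
```

If the switch only reassigned r_parent, the release step would decrement a count that was never incremented on the new parent, and the old parent would keep its pin forever. Those are the underflow and the leak described above.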
Cc: stable@vger.kernel.org
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
Reviewed-by: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Alex Markuze <amarkuze@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7283
JIRA: https://issues.redhat.com/browse/RHEL-103591
commit e9576e078220c50ace9e9087355423de23e25fa5
Author: Yazen Ghannam <yazen.ghannam@amd.com>
Date: Mon Jul 21 18:11:54 2025 +0000
x86/CPU/AMD: Ignore invalid reset reason value
The reset reason value may be "all bits set", e.g. 0xFFFFFFFF. This is a
commonly used error response from hardware. This may occur due to a real
hardware issue or when running in a VM.
The user will see all reset reasons reported in this case.
Check for an error response value and return early to avoid decoding
invalid data.
Also, adjust the data variable type to match the hardware register size.
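A minimal user-space sketch of the early return follows. The reason table is a made-up placeholder rather than the real bit definitions, and the function name is hypothetical. The point is only that an all-bits-set value is rejected before any bit is decoded, and that the value is held in a 32-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder table; the real driver defines its own bit meanings. */
static const char *const reason_names[32] = {
	[0] = "example reason A",
	[1] = "example reason B",
};

/*
 * Count the reasons that would be reported for a raw register value.
 * Returns 0 for the all-bits-set error response instead of decoding it.
 */
unsigned int count_reset_reasons(uint32_t value)
{
	unsigned int n = 0;

	if (value == UINT32_MAX)	/* error response, not real data */
		return 0;

	for (unsigned int bit = 0; bit < 32; bit++) {
		if ((value & (UINT32_C(1) << bit)) && reason_names[bit])
			n++;
	}
	return n;
}
```

Without the early return, a value of 0xFFFFFFFF would match every defined entry, which corresponds to the "all reset reasons reported" symptom.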
Fixes: ab8131028710 ("x86/CPU/AMD: Print the reason for the last reset")
Reported-by: Libing He <libhe@redhat.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250721181155.3536023-1-yazen.ghannam@amd.com
Signed-off-by: David Arcari <darcari@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7185
JIRA: https://issues.redhat.com/browse/RHEL-105422
amd-pstate is the CPU frequency driver for AMD processors and should
be routinely updated.
The following commits have been excluded from the backport:
c895ecdab2e4 x86/msr: Rename 'wrmsrl_on_cpu()' to 'wrmsrq_on_cpu()'
d7484babd2c4 x86/msr: Rename 'rdmsrl_on_cpu()' to 'rdmsrq_on_cpu()'
27a23a544a55 x86/msr: Rename 'wrmsrl_safe_on_cpu()' to 'wrmsrq_safe_on_cpu()'
5e404cb7ac4c x86/msr: Rename 'rdmsrl_safe_on_cpu()' to 'rdmsrq_safe_on_cpu()'
78255eb23973 x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'
c435e608cf59 x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'
eaff6b62d343 cpufreq: Pass policy pointer to ->update_limits()
8157fbc90745 cpufreq/amd-pstate: Update asym_prefer_cpu when core rankings change
Signed-off-by: David Arcari <darcari@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: CKI GitLab Kmaint Pipeline Bot <26919896-cki-kmaint-pipeline-bot@users.noreply.gitlab.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7268
JIRA: https://issues.redhat.com/browse/RHEL-110206
commit 897e8601b9cff1d054cdd53047f568b0e1995726
Author: Halil Pasic <pasic@linux.ibm.com>
Date: Tue Jul 22 18:18:17 2025 +0200
s390/ism: fix concurrency management in ism_cmd()
The s390x ISM device data sheet clearly states that only one
request-response sequence is allowable per ISM function at any point in
time. Unfortunately as of today the s390/ism driver in Linux does not
honor that requirement. This patch aims to rectify that.
This problem was discovered based on Aliaksei's bug report which states
that for certain workloads the ISM functions end up entering error state
(with PEC 2 as seen from the logs) after a while and as a consequence
connections handled by the respective function break, and for future
connection requests the ISM device is not considered -- given it is in a
dysfunctional state. During further debugging PEC 3A was observed as
well.
A kernel message like
[ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a
is a reliable indicator of the stated function entering error state
with PEC 2. Let me also point out that a kernel message like
[ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery
is a reliable indicator that the ISM function won't be auto-recovered
because the ISM driver currently lacks support for it.
On a technical level, without this synchronization, commands (inputs to
the FW) may be partially or fully overwritten (corrupted) by another CPU
trying to issue commands on the same function. There is hard evidence that
this can lead to DMB token values being used as DMB IOVAs, leading to
PEC 2 PCI events indicating invalid DMA. But this is only one of the
failure modes imaginable. In theory even completely losing one command
and executing another one twice and then trying to interpret the outputs
as if the command we intended to execute was actually executed and not
the other one is also possible. Frankly, I don't feel confident about
providing an exhaustive list of possible consequences.
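The "one request-response sequence at a time" rule can be illustrated with a small user-space model. It is only a sketch: the actual patch may use a different locking primitive, the command area size is arbitrary, and the fake firmware step (inverting each byte) exists only to make the round trip observable. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define CMD_AREA_SIZE 64	/* arbitrary size for the model */

struct ism_model {
	atomic_flag cmd_busy;			/* set while a command is in flight */
	unsigned char cmd_area[CMD_AREA_SIZE];	/* shared command/response area */
};

void ism_model_init(struct ism_model *ism)
{
	atomic_flag_clear(&ism->cmd_busy);
	memset(ism->cmd_area, 0, sizeof(ism->cmd_area));
}

/* Pretend firmware: turns the request into a response in place. */
static void model_firmware_execute(unsigned char *area, size_t len)
{
	for (size_t i = 0; i < len; i++)
		area[i] ^= 0xff;
}

/* Issue one command; the whole request-response sequence is serialized. */
int ism_cmd_model(struct ism_model *ism, void *buf, size_t len)
{
	if (len > CMD_AREA_SIZE)
		return -1;

	while (atomic_flag_test_and_set_explicit(&ism->cmd_busy,
						 memory_order_acquire))
		;	/* wait until the previous sequence has finished */

	memcpy(ism->cmd_area, buf, len);	/* write the request */
	model_firmware_execute(ism->cmd_area, len);
	memcpy(buf, ism->cmd_area, len);	/* read the response */

	atomic_flag_clear_explicit(&ism->cmd_busy, memory_order_release);
	return 0;
}
```

Without the flag, a second caller could overwrite cmd_area between the first caller's write and read, which corresponds to the partially or fully overwritten commands described above.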
Fixes: 684b89bc39 ("s390/ism: add device driver for internal shared memory")
Reported-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Tested-by: Mahanta Jambigi <mjambigi@linux.ibm.com>
Tested-by: Aliaksei Makarau <Aliaksei.Makarau@ibm.com>
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250722161817.1298473-1-wintera@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Patrick Talbert <ptalbert@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7267
JIRA: https://issues.redhat.com/browse/RHEL-110235
commit 62355f1f87b8c7f8785a8dd3cd5ca6e5b513566a
Author: Niklas Schnelle <schnelle@linux.ibm.com>
Date: Wed Jun 25 11:28:30 2025 +0200
s390/pci: Allow automatic recovery with minimal driver support
According to Documentation/PCI/pci-error-recovery.rst only the
error_detected() callback in the err_handler struct is mandatory for
a driver to support error recovery. So far s390's error recovery chose
a stricter approach also requiring slot_reset() and resume().
Relax this requirement and only require error_detected(). If a callback
is not implemented EEH and AER treat this as PCI_ERS_RESULT_NONE. This
return value is otherwise used by drivers abstaining from their vote
on how to proceed with recovery and currently also not supported by
s390's recovery code.
So to support missing callbacks in-line with other implementors of the
recovery flow, also handle PCI_ERS_RESULT_NONE. Since s390 only does per
PCI function recovery and does not do voting, treat PCI_ERS_RESULT_NONE
optimistically and proceed through recovery unless other failures
prevent this.
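The relaxed requirement and the optimistic handling of a missing callback can be sketched as follows. This is a user-space model, not the s390 recovery code: the enum is a local stand-in for the kernel's result codes, and the structure, helper names, and sample callback are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for the PCI_ERS_RESULT_* codes named in the text. */
enum ers_result_model {
	ERS_NONE = 1,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

typedef enum ers_result_model (*ers_cb)(void *dev);

struct err_handlers_model {
	ers_cb error_detected;	/* the only mandatory callback */
	ers_cb slot_reset;	/* optional */
	void (*resume)(void *dev);	/* optional */
};

/* Recovery is supported as soon as error_detected() is provided. */
bool driver_supports_recovery(const struct err_handlers_model *h)
{
	return h && h->error_detected;
}

/* A missing callback is treated as if it had returned "none". */
enum ers_result_model call_or_none(ers_cb cb, void *dev)
{
	return cb ? cb(dev) : ERS_NONE;
}

/* With no voting, "none" is handled optimistically. */
bool recovery_may_continue(enum ers_result_model r)
{
	return r != ERS_DISCONNECT;
}

/* Sample callback for a driver with minimal support. */
enum ers_result_model sample_error_detected(void *dev)
{
	(void)dev;
	return ERS_CAN_RECOVER;
}
```

A driver that provides only error_detected() passes the support check here, and its absent slot_reset() yields the "none" result, which lets recovery proceed.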
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Reviewed-by: Julian Ruess <julianr@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Mete Durlu <mdurlu@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Patrick Talbert <ptalbert@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/7244
JIRA: https://issues.redhat.com/browse/RHEL-107194
This Merge Request fixes an error in the buildcheck test that QE runs, which builds the kernel using a separate output directory for the build artifacts. The root cause is that some backports in RHEL 9 already reference the source directory the way upstream now does, but they were made without the kbuild changes needed to support it. The main change required is commit "kbuild: use $(src) instead of $(srctree)/$(src) for source directory"; all other changes are dependencies, prerequisites, or fixes for it. With the changes below, building with a separate output directory works again.
- Makefile: add $(srctree) to dependency of compile_commands.json target (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- kbuild: scripts/gdb: bring the "abspath" back (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- kbuild: Use $(obj)/%.cc to fix host C++ module builds (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src) (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- kbuild: use $(src) instead of $(srctree)/$(src) for source directory (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- kbuild: use $(obj)/ instead of $(src)/ for common pattern rules (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- kbuild: do not add $(srctree) or $(objtree) to header search paths (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- arch: use $(obj)/ instead of $(src)/ for preprocessed linker scripts (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- arm64: vdso32: Remove unused vdso32-offsets.h (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- staging: vc04_services: interface: Drop include Makefile directive (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- staging: vc04_services: vchiq-mmal: Drop include Makefile directive (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- staging: vc04_services: bcm2835-camera: Drop include Makefile directive (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- staging: vc04_services: bcm2835-audio: Drop include Makefile directive (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- certs: check-in the default x509 config file (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- sparc: move the install rule to arch/sparc/Makefile (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- riscv: move the (z)install rules to arch/riscv/Makefile (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- powerpc: move the install rule to arch/powerpc/Makefile (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- powerpc: make the install target not depend on any build artifact (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- powerpc: remove unused zInstall target from arch/powerpc/boot/Makefile (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- nios2: move the install rule to arch/nios2/Makefile (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
- ARM: 9102/1: move the install rules to arch/arm/Makefile (Herton R. Krzesinski) [[RHEL-107194](https://issues.redhat.com/browse/RHEL-107194)]
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: Waiman Long <longman@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Vladis Dronov <vdronov@redhat.com>
Approved-by: Michal Schmidt <mschmidt@redhat.com>
Approved-by: José Expósito <jexposit@redhat.com>
Approved-by: Eric Auger <eric.auger@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Patrick Talbert <ptalbert@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit f694481b1d3177144fcac4242eb750cfcb9f7bd5
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Jun 5 17:07:31 2025 +0200
ACPI: processor: Rescan "dead" SMT siblings during initialization
Make acpi_processor_driver_init() call arch_cpu_rescan_dead_smt_siblings(),
via a new wrapper function called acpi_idle_rescan_dead_smt_siblings(),
after successfully initializing the driver, to allow the "dead" SMT
siblings to go into deep idle states, which is necessary for the
processor to be able to reach deep package C-states (like PC10) going
forward, so that power can be reduced sufficiently in suspend-to-idle,
among other things.
However, do it only if the ACPI idle driver is the current cpuidle
driver (otherwise it is assumed that another cpuidle driver will take
care of this) and avoid doing it on architectures other than x86.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/2005721.PYKUYFuaPT@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit e91a158b694d7f4bd937763dde79ed0afa472d8a
Author: Len Brown <len.brown@intel.com>
Date: Fri Aug 8 15:37:14 2025 -0400
intel_idle: Allow loading ACPI tables for any family
There is no reason to limit intel_idle's loading of ACPI tables to
family 6. Upcoming Intel processors are not in family 6.
Below "Fixes" really means "applies cleanly until".
That syntax commit didn't change the previous logic,
but shows this patch applies back 5-years.
Fixes: 4a9f45a053 ("intel_idle: Convert to new X86 CPU match macros")
Signed-off-by: Len Brown <len.brown@intel.com>
Link: https://patch.msgid.link/06101aa4fe784e5b0be1cb2c0bdd9afcf16bd9d4.1754681697.git.len.brown@intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit 72840238e2bcb8fb24cb35d8d1d5a822c04e62a4
Author: Uros Bizjak <ubizjak@gmail.com>
Date: Mon Jun 9 08:35:01 2025 +0200
intel_idle: Update arguments of mwait_idle_with_hints()
Commit a17b37a3f416 ("x86/idle: Change arguments of mwait_idle_with_hints()
to u32") changed the type of arguments of mwait_idle_with_hints() from
unsigned long to u32.
Change the type of variables in the call to mwait_idle_with_hints() to
unsigned int to follow the change.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20250609063528.48715-1-ubizjak@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit a430c11f401589a0f4f57fd398271a5d85142c7a
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Jun 5 17:06:08 2025 +0200
intel_idle: Rescan "dead" SMT siblings during initialization
Make intel_idle_init() call arch_cpu_rescan_dead_smt_siblings() after
successfully registering intel_idle as the cpuidle driver so as to
allow the "dead" SMT siblings (if any) to go into deep idle states.
This is necessary for the processor to be able to reach deep package
C-states (like PC10) going forward which is requisite for reducing
power sufficiently in suspend-to-idle, among other things.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/10669885.nUPlyArG6x@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit 4c529a4a7260776bb4abe264498857b4537aa70d
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Sat Jun 7 14:22:56 2025 +0200
x86/smp: PM/hibernate: Split arch_resume_nosmt()
Move the inner part of the arch_resume_nosmt() code into a separate
function called arch_cpu_rescan_dead_smt_siblings(), so it can be
used in other places where "dead" SMT siblings may need to be taken
online and offline again in order to get into deep idle states.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/3361688.44csPzL39Z@rjwysocki.net
[ rjw: Prevent build issues with CONFIG_SMP unset ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-111532
commit 3c14917953a51a22f4fa7e13dfc13a4ec09bf348
Author: Mingming Cao <mmc@linux.ibm.com>
Date: Thu Aug 21 06:02:15 2025 -0700
ibmvnic: Increase max subcrq indirect entries with fallback
POWER8 supports a maximum of 16 subcrq indirect descriptor entries per
H_SEND_SUB_CRQ_INDIRECT call, while POWER9 and newer hypervisors
support up to 128 entries. Increasing the max number of indirect
descriptor entries improves batching efficiency and reduces
hcall overhead, which enhances throughput under large workloads on POWER9+.
Currently, ibmvnic driver always uses a fixed number of max indirect
descriptor entries (16). send_subcrq_indirect() treats all hypervisor
errors the same:
- Clean up and drop the entire batch of descriptors.
- Return an error to the caller.
- Rely on TCP/IP retransmissions to recover.
- If the hypervisor returns H_PARAMETER (e.g., because 128
entries are not supported on POWER8), the driver will continue
to drop batches, resulting in unnecessary packet loss.
In this patch:
Raise the default maximum indirect entries to 128 to improve ibmvnic
batching on modern platforms, but also gracefully fall back to
16 entries on POWER8 systems.
Since there is no VIO interface to query the hypervisor’s supported
limit, vnic handles send_subcrq_indirect() H_PARAMETER errors:
- On first H_PARAMETER failure, log the failure context
- Reduce max_indirect_entries to 16 and allow the single batch to drop.
- Subsequent calls automatically use the correct lower limit,
avoiding repeated drops.
The goal is to optimize performance on modern systems while handling
the fallback for older POWER8 hypervisors.
Performance improves by 40% with MTU 1500 under large workloads.
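The fallback logic can be sketched as a small user-space model. It is not the driver code: the hypervisor call is simulated by a limit check, the return codes are local stand-ins, and the structure and function names are hypothetical.

```c
#include <assert.h>

#define IND_ENTRIES_DEFAULT	128U	/* new default per the text */
#define IND_ENTRIES_LEGACY	16U	/* POWER8 limit per the text */

/* Local stand-ins for the hypervisor return codes. */
#define H_SUCCESS_MODEL		0L
#define H_PARAMETER_MODEL	(-4L)

struct vnic_model {
	unsigned int max_indirect_entries;	/* current batching limit */
	unsigned int hv_limit;			/* what the hypervisor accepts */
	unsigned long dropped_batches;
};

/* Simulated hcall: rejects batches larger than the hypervisor limit. */
static long model_hcall(const struct vnic_model *v, unsigned int nentries)
{
	return nentries > v->hv_limit ? H_PARAMETER_MODEL : H_SUCCESS_MODEL;
}

/* Send one batch; on H_PARAMETER, drop it and lower the limit once. */
long model_send_batch(struct vnic_model *v, unsigned int nentries)
{
	long rc = model_hcall(v, nentries);

	if (rc == H_SUCCESS_MODEL)
		return rc;

	v->dropped_batches++;	/* this batch is lost either way */
	if (rc == H_PARAMETER_MODEL &&
	    v->max_indirect_entries > IND_ENTRIES_LEGACY)
		v->max_indirect_entries = IND_ENTRIES_LEGACY;
	return rc;
}
```

In this model a POWER8-like hypervisor costs exactly one dropped batch. Every later batch is sized by the lowered limit and succeeds, which is the behavior the list above describes.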
Signed-off-by: Mingming Cao <mmc@linux.ibm.com>
Reviewed-by: Brian King <bjking1@linux.ibm.com>
Reviewed-by: Haren Myneni <haren@linux.ibm.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250821130215.97960-1-mmc@linux.ibm.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mamatha Inamdar <minamdar@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit c0f691388992c708436ab5f6e810865be6ddf5c6
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Thu Jun 5 17:04:11 2025 +0200
intel_idle: Use subsys_initcall_sync() for initialization
It is not necessary to wait until the device_initcall() stage with
intel_idle initialization. All of its dependencies are met after
all subsys_initcall()s have run, so subsys_initcall_sync() can be
used for initializing it.
It is also better to ensure that intel_idle will always initialize
before the ACPI processor driver that uses module_init() for its
initialization.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/2994397.e9J7NaK4W3@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-113139
commit 6138f34515162340520b0415184367e12775f68a
Author: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Date: Mon Mar 17 15:55:39 2025 +0200
intel_idle: Add C1 demotion on/off sysfs knob
Add a sysfs knob to enable/disable C1 demotion for the following Intel
platforms: Sapphire Rapids Xeon, Emerald Rapids Xeon, Granite Rapids Xeon,
Sierra Forest Xeon, and Grand Ridge SoC.
This sysfs file shows up as
"/sys/devices/system/cpu/cpuidle/intel_c1_demotion".
The C1 demotion feature involves the platform firmware demoting deep
C-state requests from the OS (e.g., C6 requests) to C1. The idea is
that firmware monitors CPU wake-up rate, and if it is higher than a
platform-specific threshold, the firmware demotes deep C-state
requests to C1. For example, Linux requests C6, but firmware noticed
too many wake-ups per second, and it keeps the CPU in C1. When the
CPU stays in C1 long enough, the platform promotes it back to C6.
The default value for C1 demotion is whatever is configured by BIOS.
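From user space, the knob could be read and written with ordinary file I/O, as in the sketch below. It assumes the file holds a single integer (0 or 1), which this text does not state. The helper names are hypothetical, and writing normally requires root privileges.

```c
#include <assert.h>
#include <stdio.h>

/* Path taken from the text above. */
#define C1_DEMOTION_PATH "/sys/devices/system/cpu/cpuidle/intel_c1_demotion"

/* Read an integer knob; returns 0 on success, -1 on any failure. */
int knob_read(const char *path, int *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Write 1 (enable) or 0 (disable); returns 0 on success, -1 on failure. */
int knob_write(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = (fprintf(f, "%d\n", enable ? 1 : 0) > 0);
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would pass C1_DEMOTION_PATH to either helper. On platforms without the feature the file is absent, so both helpers return -1.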
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Link: https://patch.msgid.link/20250317135541.1471754-2-dedekind1@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit fc64e0421598aaa87d61184f6777b52614a095be
Author: Li RongQing <lirongqing@baidu.com>
Date: Mon Jun 23 18:56:01 2025 +0800
cpufreq: intel_pstate: Add Granite Rapids support in no-HWP mode
Users may disable HWP in firmware, in which case intel_pstate
wouldn't load unless the CPU model is explicitly supported.
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://patch.msgid.link/20250623105601.3924-1-lirongqing@baidu.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit 1cefe495cacba5fb0417da3a75a1a76e3546d176
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Mon Jun 16 20:19:19 2025 +0200
cpufreq: intel_pstate: Always use HWP_DESIRED_PERF in passive mode
In the passive mode, intel_cpufreq_update_pstate() sets HWP_MIN_PERF in
accordance with the target frequency to ensure delivering adequate
performance, but it sets HWP_DESIRED_PERF to 0, so the processor has no
indication that the desired performance level is actually equal to the
floor one. This may cause it to choose a performance point way above
the desired level.
Moreover, this is inconsistent with intel_cpufreq_adjust_perf() which
actually sets HWP_DESIRED_PERF in accordance with the target performance
value.
Address this by adjusting intel_cpufreq_update_pstate() to pass
target_pstate as both the minimum and the desired performance levels
to intel_cpufreq_hwp_update().
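The difference between the old and new request values can be shown with a user-space model of the HWP request fields. The field positions (minimum in bits 7:0, maximum in bits 15:8, desired in bits 23:16) follow the architectural register layout. The function names are hypothetical, and nothing here writes an MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings for the modeled HWP request value. */
#define HWP_MIN_PERF(x)		((uint64_t)((x) & 0xff))
#define HWP_MAX_PERF(x)		((uint64_t)((x) & 0xff) << 8)
#define HWP_DESIRED_PERF(x)	((uint64_t)((x) & 0xff) << 16)

/* Replace the min, max and desired fields of a previous request value. */
uint64_t hwp_request_model(uint64_t prev, unsigned int min,
			   unsigned int max, unsigned int desired)
{
	uint64_t value = prev;

	value &= ~(HWP_MIN_PERF(0xff) | HWP_MAX_PERF(0xff) |
		   HWP_DESIRED_PERF(0xff));
	value |= HWP_MIN_PERF(min) | HWP_MAX_PERF(max) |
		 HWP_DESIRED_PERF(desired);
	return value;
}

/* Old behavior: the floor follows the target, desired is left at 0. */
uint64_t passive_request_before(uint64_t prev, unsigned int target,
				unsigned int max)
{
	return hwp_request_model(prev, target, max, 0);
}

/* New behavior: the target is both the floor and the desired level. */
uint64_t passive_request_after(uint64_t prev, unsigned int target,
			       unsigned int max)
{
	return hwp_request_model(prev, target, max, target);
}
```

With a desired value of 0 the processor is free to pick any level between the floor and the maximum. Setting the desired field to the target tells it that the floor is also the level being asked for.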
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Shashank Balaji <shashank.mahadasyam@sony.com>
Link: https://patch.msgid.link/6173276.lOV4Wx5bFT@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
Conflicts: needed to add include of linux/cacheinfo.h as RHEL does not
have upstream c51a4f11e6d8246590b5e64908c1ed84b33e8ba2
commit 05cf8b8c5118479637efe281e5eb98972d3a3386
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Tue May 6 22:47:53 2025 +0200
cpufreq: intel_pstate: EAS: Increase cost for CPUs using L3 cache
On some hybrid platforms some efficient CPUs (E-cores) are not connected
to the L3 cache, but there are no other differences between them and the
other E-cores that use L3. In that case, it is generally more efficient
to run "light" workloads on the E-cores that do not use L3 and allow all
of the cores using L3, including P-cores, to go into idle states.
For this reason, slightly increase the cost for all CPUs sharing the L3
cache to make EAS prefer CPUs that do not use it to the other CPUs of
the same type (if any).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/2032776.usQuhbGJ8B@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit 7b010f9b906107ae4e5ac626329ab818b3f0a6b6
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Tue May 6 22:44:30 2025 +0200
cpufreq: intel_pstate: EAS support for hybrid platforms
Modify intel_pstate to register EM perf domains for CPUs on hybrid
platforms without SMT which causes EAS to be enabled on them when
schedutil is used as the cpufreq governor (which requires intel_pstate
to operate in the passive mode).
This change is targeting platforms (for example, Lunar Lake) where the
"little" CPUs (E-cores) are always more energy-efficient than the "big"
or "performance" CPUs (P-cores) when run at the same HWP performance
level, so it is sufficient to tell EAS that E-cores are always preferred
(so long as there is enough spare capacity on one of them to run the
given task). However, migrating tasks between CPUs of the same type
too often is not desirable because it may hurt both performance and
energy efficiency due to leaving warm caches behind.
For this reason, register a separate perf domain for each CPU and choose
the cost values for them so that the cost mostly depends on the CPU type,
but there is also a small component of it depending on the performance
level (utilization) which helps to balance the load between CPUs of the
same type.
The cost component related to the CPU type is computed with the help of
the observation that the IPC metric value for a given CPU is inversely
proportional to its performance-to-frequency scaling factor and the cost
of running code on it can be assumed to be roughly proportional to that
IPC ratio (in principle, the higher the IPC ratio, the more resources
are utilized when running at a given frequency, so the cost should be
higher).
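
The cost structure described above can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the driver's implementation: the function name cpu_cost_sketch(), the reference constant REF_SCALING and the factor of 100 are hypothetical, chosen only so that the type-dependent term dominates the small term that depends on the performance level.

```c
#include <assert.h>

/* Assumed reference scaling factor (hypothetical value for illustration). */
#define REF_SCALING 100000UL

/*
 * Hypothetical cost model. The first term grows as the
 * performance-to-frequency scaling factor shrinks (that is, as the IPC
 * ratio grows). The second term adds a small component that depends on
 * the performance level, which spreads load among CPUs of the same type.
 */
static unsigned long cpu_cost_sketch(unsigned long scaling,
				     unsigned long perf_level)
{
	assert(scaling > 0 && scaling <= REF_SCALING);

	return 100UL * REF_SCALING / scaling + perf_level;
}
```

With illustrative scaling factors of 100000 for an E-core and 78741 for a P-core, the P-core cost at level 0 is still above the E-core cost at level 10, so in this model the CPU type decides first and the performance level only breaks ties within a type.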
For all CPUs that are online at the system initialization time, EM perf
domains are registered when the driver starts up, after asymmetric
capacity support has been enabled. For the CPUs that become online
later, EM perf domains are registered after setting the asymmetric
capacity for them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/6057101.MhkbZ0Pkbq@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit 6bceea7a1e076ef9d71b20d8dda2f7dc52bd34d2
Author: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Date: Fri Apr 18 19:55:03 2025 -0700
arch_topology: Relocate cpu_scale to topology.[h|c]
arch_topology.c provides functionality to parse and scale CPU capacity.
It also provides a corresponding sysfs interface. Some architectures
parse and scale CPU capacity differently as per their own needs. On
Intel processors, for instance, it is the responsibility of the Intel
P-state driver.
Relocate the implementation of that interface to a common location in
topology.c. Architectures can use the interface and populate it using
their own mechanisms.
An alternative approach would be to compile arch_topology.c even if
not needed only to get this interface. This approach would create
duplicated and conflicting functionality and data structures.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20250419025504.9760-2-ricardo.neri-calderon@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit c72bbf200162a3b4b9e1baedec9008d8d710b427
Author: James Morse <james.morse@arm.com>
Date: Tue Nov 21 13:43:54 2023 +0000
arch_topology: Make register_cpu_capacity_sysctl() tolerant to late CPUs
register_cpu_capacity_sysctl() adds a property to sysfs that describes
the CPU's capacity. This is done from a subsys_initcall() that assumes
all possible CPUs are registered.
With CPU hotplug, possible CPUs aren't registered until they become
present (or, for arm64, enabled). This leads to messages during boot:
| register_cpu_capacity_sysctl: too early to get CPU1 device!
and once these CPUs are added to the system, the file is missing.
Move this to a cpuhp callback, so that the file is created once
CPUs are brought online. This covers CPUs that are added late by
mechanisms like hotplug.
One observable difference is the file is now missing for offline CPUs.
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/E1r5R2g-00CsyV-Ss@rmk-PC.armlinux.org.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
Conflicts: RHEL doesn't have upstream cf61d53b0268
commit 4a6b1cf0d4c02d6da2976c6314c264d20672937e
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Tue May 6 22:41:21 2025 +0200
PM: EM: Introduce em_adjust_cpu_capacity()
Add a function for updating the Energy Model for a CPU after its
capacity has changed, which subsequently will be used by the
intel_pstate driver.
An EM_PERF_DOMAIN_ARTIFICIAL check is added to em_recalc_and_update()
to prevent it from calling em_compute_costs() for an "artificial" perf
domain with a NULL cb parameter which would cause it to crash.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/3637203.iIbC2pHGDl@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit 3e3ba654d3097e0031f2add215b12ff81c23814e
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Tue May 6 22:39:35 2025 +0200
PM: EM: Move CPU capacity check to em_adjust_new_capacity()
Move the check of the CPU capacity currently stored in the energy model
against the arch_scale_cpu_capacity() value to em_adjust_new_capacity()
so it will be done regardless of where the latter is called from.
This will be useful when a new em_adjust_new_capacity() caller is added
subsequently.
While at it, move the pd local variable declaration in
em_check_capacity_update() into the loop in which it is used.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://patch.msgid.link/7810787.EvYhyI6sBW@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit 179c0c7044a378198adb36f2a12410ab68cc730a
Author: Yaxiong Tian <tianyaxiong@kylinos.cn>
Date: Fri Apr 18 09:06:13 2025 +0800
PM: EM: Fix potential division-by-zero error in em_compute_costs()
When the device is of a non-CPU type, table[i].performance won't be
initialized in the previous em_init_performance(), resulting in division
by zero when calculating costs in em_compute_costs().
Since the 'cost' algorithm is only used for EAS energy efficiency
calculations and is currently not utilized by other device drivers, we
should add the _is_cpu_device(dev) check to prevent this division-by-zero
issue.
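
The guard described above can be sketched in user-space C as follows. The struct, the function name and the boolean parameter are hypothetical simplifications, and the cost formula (power * 10 / performance) is an assumption used for illustration rather than a quotation of the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one performance-state entry. */
struct perf_state_sketch {
	unsigned long power;		/* power at this state */
	unsigned long performance;	/* left at 0 for non-CPU devices */
	unsigned long cost;		/* only meaningful for CPU devices */
};

/*
 * Compute per-state costs only for CPU devices. For other devices the
 * performance field is not initialized, so dividing by it would be a
 * division by zero; their cost is simply set to 0.
 */
static void compute_costs_sketch(struct perf_state_sketch *table, size_t n,
				 bool is_cpu_device)
{
	for (size_t i = 0; i < n; i++) {
		if (!is_cpu_device) {
			table[i].cost = 0;
			continue;
		}
		assert(table[i].performance != 0);
		table[i].cost = table[i].power * 10 / table[i].performance;
	}
}
```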
Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division")
Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/tencent_7F99ED4767C1AF7889D0D8AD50F34859CE06@qq.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: David Arcari <darcari@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-112493
commit 860a731f52f83309c213b943bac8f4ea70a88805
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date: Wed Mar 5 22:08:21 2025 +0100
PM: EM: Consify two parameters of em_dev_register_perf_domain()
Notice that em_dev_register_perf_domain() and the functions called by it
do not update objects pointed to by its cb and cpus parameters, so the
const modifier can be added to them.
This allows the return value of cpumask_of() or a pointer to a
struct em_data_callback declared as const to be passed to
em_dev_register_perf_domain() directly without explicit type
casting which is rather handy.
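
The effect of the const qualifier can be shown with a small user-space C sketch. All names here (callbacks_sketch, register_sketch(), fixed_power(), my_cb) are hypothetical stand-ins and not the Energy Model API. The point is only that a callback table declared const can be passed directly once the parameter is const-qualified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback table, loosely modelled on a driver ops struct. */
struct callbacks_sketch {
	int (*active_power)(unsigned long freq, unsigned long *power);
};

/* Example callback: reports a made-up power value of twice the frequency. */
static int fixed_power(unsigned long freq, unsigned long *power)
{
	*power = 2 * freq;
	return 0;
}

/*
 * The cb parameter is const-qualified because the function only reads
 * through it, so callers may pass a pointer to a const object without
 * a cast.
 */
static int register_sketch(const struct callbacks_sketch *cb,
			   unsigned long freq, unsigned long *power)
{
	assert(power != NULL);

	if (!cb || !cb->active_power)
		return -1;

	return cb->active_power(freq, power);
}

/* A const table can be handed to register_sketch() as-is. */
static const struct callbacks_sketch my_cb = { .active_power = fixed_power };
```

Without the const qualifier on the parameter, passing &my_cb would discard a qualifier and require an explicit cast at every call site.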
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://patch.msgid.link/4648962.LvFx2qVVIh@rjwysocki.net
Signed-off-by: David Arcari <darcari@redhat.com>