JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: ba58eee1c57b2ad45c36f782861c18faef170a55
commit ba58eee1c57b2ad45c36f782861c18faef170a55
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Mon Nov 11 14:21:33 2024 -0600
PCI: Drop duplicate pcie_get_speed_cap(), pcie_get_width_cap() declarations
6cf57be0f7 ("PCI: Add pcie_get_speed_cap() to find max supported link
speed") and c70b65fb7f ("PCI: Add pcie_get_width_cap() to find max
supported link width") added declarations to drivers/pci/pci.h.
576c7218a1 ("PCI: Export pcie_get_speed_cap and pcie_get_width_cap")
subsequently added duplicates to include/linux/pci.h.
Remove the originals from drivers/pci/pci.h. Both interfaces are used by
amdgpu, so they must be in include/linux/pci.h.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: de9a6c8d5dbfedb5eb3722c822da0490f6a59a45
commit de9a6c8d5dbfedb5eb3722c822da0490f6a59a45
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Fri Oct 18 17:47:53 2024 +0300
PCI/bwctrl: Add pcie_set_target_speed() to set PCIe Link Speed
Currently, PCIe Link Speeds are adjusted by custom code rather than in a
common function provided in PCI core. The PCIe bandwidth controller
(bwctrl) introduces an in-kernel API, pcie_set_target_speed(), to set PCIe
Link Speed.
Convert the Target Speed quirk to use the new API. The quirk can run very
early, before bwctrl has been probed for a Port, and can also run later,
when bwctrl is already set up for the Port. The per-Port mutex
(set_speed_mutex) must therefore be taken only if the bwctrl setup is
already complete.
The new API is also intended to be used in an upcoming commit that adds a
thermal cooling device to throttle PCIe bandwidth when thermal thresholds
are reached.
The PCIe bandwidth control procedure is as follows. The highest speed
supported by both the Port and the PCIe device which is not higher than
the requested speed is selected and written into the Target Link Speed
field in the Link Control 2 Register. Then the bandwidth controller
retrains the PCIe Link.
Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to
keep track of PCIe Link Speed changes. While Bandwidth Notifications should
also be generated when the bandwidth controller alters the PCIe Link Speed,
a few platforms do not deliver the LBMS interrupt after Link Training as
expected. Thus, after changing the Link Speed, the bandwidth controller
makes an additional read of the Link Status Register to ensure
cur_bus_speed is consistent with the new PCIe Link Speed.
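The speed selection step described above can be sketched in userspace C as follows. This is a minimal illustration under stated assumptions, not the bwctrl code: pick_target_speed(), the SLS_* names, and passing both ends' speed vectors as plain bytes are all hypothetical. Bit positions follow the Supported Link Speeds Vector layout, where bit 1 is 2.5 GT/s and bit 0 is reserved.

```c
#include <assert.h>
#include <stdint.h>

/* Supported Link Speeds Vector bits in Link Capabilities 2 layout.
 * Bit 0 is reserved and stays zero. Names are local to this sketch. */
#define SLS_2_5GT  (1u << 1)
#define SLS_5_0GT  (1u << 2)
#define SLS_8_0GT  (1u << 3)
#define SLS_16_0GT (1u << 4)
#define SLS_32_0GT (1u << 5)
#define SLS_64_0GT (1u << 6)
#define SLS_MASK   0x7eu

/*
 * Return the Target Link Speed encoding (1 = 2.5 GT/s, 2 = 5 GT/s, ...)
 * of the fastest speed that both ends support and that does not exceed
 * req_speed (same encoding). Returns 0 if no such speed exists.
 */
static unsigned int pick_target_speed(uint8_t port_speeds, uint8_t dev_speeds,
				      unsigned int req_speed)
{
	uint8_t common = port_speeds & dev_speeds & SLS_MASK;
	unsigned int speed;

	assert(req_speed >= 1 && req_speed <= 6);

	/* Walk down from the requested speed to the first common one. */
	for (speed = req_speed; speed >= 1; speed--) {
		if (common & (1u << speed))
			return speed;
	}
	return 0;
}
```

In this sketch the returned encoding is what would be written into the Target Link Speed field before retraining. How the real code behaves when no common speed exists is not covered here.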
Link: https://lore.kernel.org/r/20241018144755.7875-8-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash devm_mutex_init() error checking from
https://lore.kernel.org/r/20241030163139.2111689-1-andriy.shevchenko@linux.intel.com,
drop export of pcie_set_target_speed()]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: 665745f274870c921020f610e2c99a3b1613519b
commit 665745f274870c921020f610e2c99a3b1613519b
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Fri Oct 18 17:47:52 2024 +0300
PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller
This mostly reverts commit b4c7d2076b ("PCI/LINK: Remove bandwidth
notification"). An upcoming commit extends this driver, building the PCIe
bandwidth controller on top of it.
PCIe bandwidth notifications were first added in the commit e8303bb7a7
("PCI/LINK: Report degraded links via link bandwidth notification") but
later had to be removed. The significant changes compared with the old
bandwidth notification driver include:
1) Don't print the notifications into kernel log, just keep the Link
Speed cached in struct pci_bus updated. While somewhat unfortunate,
the log spam was the source of complaints that eventually led to
the removal of the bandwidth notifications driver (see the links
below for further information).
2) Besides the Link Bandwidth Management Interrupt, also enable Link
Autonomous Bandwidth Interrupt to cover the other source of bandwidth
changes.
3) Handle Link Speed updates robustly. Refresh the cached Link Speed
when enabling Bandwidth Notification Interrupts, and solve the race
between Link Speed read and LBMS/LABS update in
pcie_bwnotif_irq_thread().
4) Use concurrency safe LNKCTL RMW operations.
5) The driver is now called PCIe bwctrl (bandwidth controller) instead
of just bandwidth notifications because of increased scope and
functionality within the driver.
6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain().
Provide LBMS counting API for it.
7) Tweaks to variable/function names for consistency and length reasons.
Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to
keep track of PCIe Link Speed changes.
[bhelgaas: This is based on previous work by Alexandru Gagniuc
<mr.nuke.me@gmail.com>; see e8303bb7a7 ("PCI/LINK: Report degraded links
via link bandwidth notification")]
Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com
Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler:
https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Stefan Wahren <wahrenst@gmx.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: e93d9fcfd7dc643eb5fce43053774d27bea2b263
commit e93d9fcfd7dc643eb5fce43053774d27bea2b263
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Fri Oct 18 17:47:50 2024 +0300
PCI: Refactor pcie_update_link_speed()
pcie_update_link_speed() is passed the Link Status register value, but not
all callers have that value at hand or need it.
Refactor pcie_update_link_speed() to include reading the Link Status
register and create __pcie_update_link_speed() which can be used by the
hotplug code that has the register value at hand beforehand (and needs the
value for other purposes).
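The split described above follows a common pattern: one variant takes the register value from the caller, and a wrapper reads the register itself and delegates. A minimal userspace sketch, assuming a simulated register; struct fake_bus, read_lnksta(), and the function names are hypothetical stand-ins, and only the Current Link Speed field (bits 3:0 of Link Status) is decoded:

```c
#include <assert.h>
#include <stdint.h>

#define LNKSTA_CLS 0x000fu	/* Current Link Speed field, bits 3:0 */

struct fake_bus {
	uint16_t lnksta;	/* stands in for the Port's Link Status register */
	unsigned int cur_speed;	/* cached Current Link Speed encoding */
	unsigned int reg_reads;	/* how often the "register" was read */
};

/* Simulated config read of the Link Status register. */
static uint16_t read_lnksta(struct fake_bus *bus)
{
	bus->reg_reads++;
	return bus->lnksta;
}

/* Variant for callers that already hold the register value. */
static void update_link_speed_from(struct fake_bus *bus, uint16_t lnksta)
{
	assert(bus);
	bus->cur_speed = lnksta & LNKSTA_CLS;
}

/* Variant for callers that do not: read the register, then delegate. */
static void update_link_speed(struct fake_bus *bus)
{
	assert(bus);
	update_link_speed_from(bus, read_lnksta(bus));
}
```

A caller that has already read Link Status for other reasons uses the first variant and avoids a second register read.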
Link: https://lore.kernel.org/r/20241018144755.7875-5-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: d2bd39c0456b75be9dfc7d774b8d021355c26ae3
commit d2bd39c0456b75be9dfc7d774b8d021355c26ae3
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Fri Oct 18 17:47:49 2024 +0300
PCI: Store all PCIe Supported Link Speeds
The PCIe bandwidth controller added by a subsequent commit will require
selecting PCIe Link Speeds that are lower than the Maximum Link Speed.
The struct pci_bus only stores max_bus_speed. Even if PCIe r6.1 sec 8.2.1
currently disallows gaps in supported Link Speeds, the Implementation Note
in PCIe r6.1 sec 7.5.3.18 recommends determining supported Link Speeds
using the Supported Link Speeds Vector in the Link Capabilities 2 Register
(when available) to "avoid software being confused if a future
specification defines Links that do not require support for all slower
speeds."
Reuse code in pcie_get_speed_cap() to add pcie_get_supported_speeds() to
query the Supported Link Speeds Vector of a PCIe device. The value is taken
directly from the Supported Link Speeds Vector or synthesized from the Max
Link Speed in the Link Capabilities Register when the Link Capabilities 2
Register is not available.
The Supported Link Speeds Vector in the Link Capabilities 2 Register
corresponds to the bus below on Root Ports and Downstream Ports, whereas it
corresponds to the bus above on Upstream Ports and Endpoints (PCIe r6.1 sec
7.5.3.18):
Supported Link Speeds Vector - This field indicates the supported Link
speed(s) of the associated Port.
Add supported_speeds to struct pci_dev to cache the Supported Link Speeds
Vector.
supported_speeds contains a set of Link Speeds only in the case where PCIe
Link Speed can be determined. Root Complex Integrated Endpoints do not have
a well-defined Link Speed because they do not implement either of the Link
Capabilities Registers, which is allowed by PCIe r6.1 sec 7.5.3 (the same
limitation applies to determining cur_bus_speed and max_bus_speed that are
PCI_SPEED_UNKNOWN in such case). This is of no concern from the PCIe
bandwidth controller's point of view because such devices are not attached
to a PCIe Root Port that could be controlled.
The supported_speeds field keeps the extra reserved zero at the least
significant bit to match the Link Capabilities 2 Register layout.
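A minimal sketch of how such a vector might be obtained, assuming raw Link Capabilities and Link Capabilities 2 values are already in hand. The function name and macros are hypothetical, and synthesizing bits for every speed up to Max Link Speed is a generalization made for this sketch rather than a statement of what pcie_get_supported_speeds() does:

```c
#include <assert.h>
#include <stdint.h>

#define LNKCAP_MAX_SPEED 0x0000000fu	/* Max Link Speed, Link Capabilities bits 3:0 */
#define LNKCAP2_SLS      0x000000feu	/* Supported Link Speeds Vector, bits 7:1 */

/*
 * Return a Supported Link Speeds Vector in Link Capabilities 2 layout.
 * If lnkcap2 carries a non-empty vector, use it directly; otherwise build
 * one from Max Link Speed, assuming every slower speed is also supported.
 * A result of 0 means the speeds could not be determined.
 */
static uint8_t supported_speeds(uint32_t lnkcap, uint32_t lnkcap2)
{
	uint32_t vec = lnkcap2 & LNKCAP2_SLS;
	uint32_t max = lnkcap & LNKCAP_MAX_SPEED;

	if (vec)
		return (uint8_t)vec;

	if (max < 1 || max > 7)
		return 0;

	/* Set bits 1..max of the vector. */
	vec = ((1u << (max + 1)) - 1) & LNKCAP2_SLS;
	assert((vec & 1u) == 0);	/* reserved bit 0 must stay clear */
	return (uint8_t)vec;
}
```

Keeping the reserved zero at bit 0 means a speed encoding n can be tested directly with `vec & (1u << n)`.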
An attempt was made to store the supported_speeds field in the struct
pci_bus as an intersection of both ends of the Link. However, the
subordinate struct pci_bus is not available early enough. The Target Speed
quirk (in
pcie_failed_link_retrain()) can run either during initial scan or later,
requiring it to use the API provided by the PCIe bandwidth controller to
set the Target Link Speed in order to co-exist with the bandwidth
controller. When the Target Speed quirk is calling the bandwidth controller
during initial scan, the struct pci_bus is not yet initialized. As such,
storing supported_speeds into the struct pci_bus is not viable.
Suggested-by: Lukas Wunner <lukas@wunner.de>
Link: https://lore.kernel.org/r/20241018144755.7875-4-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: move pcie_get_supported_speeds() decl to drivers/pci/pci.h]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-67693
Upstream Status: 6aa9c09f1bcd339749b249830c604871740df268
commit 6aa9c09f1bcd339749b249830c604871740df268
Author: Thomas Richard <thomas.richard@bootlin.com>
Date: Wed Jun 19 12:15:13 2024 +0200
PCI: Add T_PERST_CLK_US macro
The "Power Sequencing and Reset Signal Timings" table of the PCI
Express Card Electromechanical Specification, Revision 5.1, Section
2.9.2, indicates PERST# should be deasserted after a minimum of 100us
once REFCLK is stable (symbol T_PERST-CLK).
Add a macro so that PCIe controller drivers can use it.
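As an illustration of how a controller driver might consume such a constant, the sketch below computes how much longer to wait before deasserting PERST#. The macro name follows the subject line above. The helper perst_wait_remaining_us() and its timestamp parameters are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum time from stable REFCLK to PERST# deassertion, in microseconds. */
#define T_PERST_CLK_US 100

/*
 * Return how many more microseconds to wait before deasserting PERST#,
 * given timestamps (in us) of "REFCLK became stable" and "now".
 */
static uint64_t perst_wait_remaining_us(uint64_t refclk_stable_us,
					uint64_t now_us)
{
	uint64_t elapsed;

	assert(now_us >= refclk_stable_us);
	elapsed = now_us - refclk_stable_us;

	return elapsed >= T_PERST_CLK_US ? 0 : T_PERST_CLK_US - elapsed;
}
```

A driver that enables the clock and immediately prepares to release reset would simply sleep for the full interval, which is the case where the helper returns T_PERST_CLK_US.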
Link: https://lore.kernel.org/linux-pci/20240102-j7200-pcie-s2r-v7-5-a2f9156da6c3@bootlin.com
Signed-off-by: Thomas Richard <thomas.richard@bootlin.com>
[kwilczynski: commit log, update sleep interval macros code comments]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-67693
Upstream Status: 7ff7509fa52397455e04bd44982e9dfbbd19457f
commit 7ff7509fa52397455e04bd44982e9dfbbd19457f
Author: Philipp Stanner <pstanner@redhat.com>
Date: Mon Jul 29 11:36:26 2024 +0200
PCI: Make pcim_request_region() a public function
pcim_request_region() is the managed counterpart of pci_request_region().
It is currently only used internally for PCI.
It can be useful for a number of drivers and exporting it is a step towards
deprecating more complicated functions.
Make pcim_request_region() a public function.
Link: https://lore.kernel.org/r/20240729093625.17561-4-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-67693
Upstream Status: 87f10faf166a9114aa0d4132298cad379de16fdd
commit 87f10faf166a9114aa0d4132298cad379de16fdd
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Tue Aug 27 18:48:48 2024 -0500
PCI: Rename CRS Completion Status to RRS
PCIe r6.0 changed the abbreviation for "Configuration Request Retry Status"
Completion Status from "CRS" to "RRS" and uses the terminology of
"Configuration RRS Software Visibility" instead of "CRS Software
Visibility".
Align the Linux usage with the r6.0 spec language. No functional change
intended.
It's confusing to make this change, but I think "RRS" *is* a better
abbreviation because it was easy to interpret "CRS" as "Completion Retry
Status", which really didn't make any sense.
Link: https://lore.kernel.org/r/20240827234848.4429-4-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-76025
Upstream Status: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=enumeration&id=4453f360862e5d9f0807941d613162c3f7a36559
In maintainer's PCI 'enumeration' branch, slated for v6.14.
https://lore.kernel.org/all/20250120224418.GA906057@bhelgaas/
commit 4453f360862e5d9f0807941d613162c3f7a36559 (pci/enumeration)
Author: Alex Williamson <alex.williamson@redhat.com>
Date: Mon Jan 20 11:21:59 2025 -0700
PCI: Batch BAR sizing operations
Toggling memory enable is free on bare metal, but potentially expensive
in virtualized environments as the device MMIO spaces are added and
removed from the VM address space, including DMA mapping of those spaces
through the IOMMU where peer-to-peer is supported. Currently memory
decode is disabled around sizing each individual BAR, even for SR-IOV
BARs while VF Enable is cleared.
This can be better optimized for virtual environments by sizing a set
of BARs at once, stashing the resulting mask into an array, while only
toggling memory enable once. This also naturally improves the SR-IOV
path as the caller becomes responsible for any necessary decode disables
while sizing BARs, therefore SR-IOV BARs are sized relying only on the
VF Enable rather than toggling the PF memory enable in the command
register.
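The batching idea can be sketched against a toy device model. Everything here is an assumption made for illustration: struct fake_dev, write_command(), write_bar(), and size_bars() are hypothetical, only 32-bit memory BARs are modeled, and the BAR type bits in the low nibble are ignored. The point is that the Command register is written twice for the whole batch rather than twice per BAR.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BARS     6
#define CMD_MEMORY   0x2u		/* memory decode enable bit */
#define BAR_MEM_MASK 0xfffffff0u	/* address bits of a 32-bit memory BAR */

struct fake_dev {
	uint16_t command;		/* Command register */
	uint32_t bar[NUM_BARS];		/* current BAR values */
	uint32_t bar_size[NUM_BARS];	/* power-of-two size, 0 = unimplemented */
	unsigned int cmd_writes;	/* counts Command register writes */
};

static void write_command(struct fake_dev *dev, uint16_t val)
{
	dev->command = val;
	dev->cmd_writes++;
}

/* Address bits below the BAR size are hard-wired to zero. */
static void write_bar(struct fake_dev *dev, int i, uint32_t val)
{
	uint32_t size = dev->bar_size[i];

	dev->bar[i] = size ? (val & ~(size - 1) & BAR_MEM_MASK) : 0;
}

/* Probe 'count' BARs with a single decode disable/restore pair. */
static void size_bars(struct fake_dev *dev, uint32_t *masks, int count)
{
	uint16_t orig_cmd = dev->command;
	int i;

	assert(count >= 0 && count <= NUM_BARS);

	if (orig_cmd & CMD_MEMORY)
		write_command(dev, (uint16_t)(orig_cmd & ~CMD_MEMORY));

	for (i = 0; i < count; i++) {
		uint32_t orig = dev->bar[i];

		write_bar(dev, i, 0xffffffffu);
		masks[i] = dev->bar[i];		/* stash the read-back mask */
		write_bar(dev, i, orig);	/* restore the original value */
	}

	if (orig_cmd & CMD_MEMORY)
		write_command(dev, orig_cmd);
}

/* Convert a stashed mask to a size in bytes; 0 means no BAR. */
static uint32_t mask_to_size(uint32_t mask)
{
	mask &= BAR_MEM_MASK;
	return mask ? ~mask + 1 : 0;
}
```

In a virtualized setup each Command register write may trigger remapping work in the hypervisor, so reducing the number of writes is where the saving would come from.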
Link: https://lore.kernel.org/r/20250120182202.1878581-1-alex.williamson@redhat.com
Reported-by: Mitchell Augustin <mitchell.augustin@canonical.com>
Link: https://lore.kernel.org/r/CAHTA-uYp07FgM6T1OZQKqAdSA5JrZo0ReNEyZgQZub4mDRrV5w@mail.gmail.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Mitchell Augustin <mitchell.augustin@canonical.com>
Reviewed-by: Mitchell Augustin <mitchell.augustin@canonical.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-71363
Upstream Status: 59100eb248c0b15585affa546c7f6834b30eb5a4
commit 59100eb248c0b15585affa546c7f6834b30eb5a4
Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date: Fri Aug 9 14:25:02 2024 +0100
PCI: Use an error code with PCIe failed link retraining
Given how the call site in pcie_wait_for_link_delay() is now structured,
and that pcie_retrain_link() returns a potentially useful error code,
convert pcie_failed_link_retrain() to return an error code rather than a
boolean status, fixing the handling at that call site. Update the other
call site accordingly.
Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091156530.61955@angie.orcam.me.uk
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.5+
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-71363
Upstream Status: d591f6804e7e1310881c9224d72247a2b65039af
commit d591f6804e7e1310881c9224d72247a2b65039af
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Tue Aug 27 18:48:46 2024 -0500
PCI: Wait for device readiness with Configuration RRS
After a device reset, delays are required before the device can
successfully complete config accesses. PCIe r6.0, sec 6.6, specifies some
delays required before software can perform config accesses. Devices that
require more time after those delays may respond to config accesses with
Configuration Request Retry Status (RRS) completions.
Callers of pci_dev_wait() are responsible for delays until the device can
respond to config accesses. pci_dev_wait() waits any additional time until
the device can successfully complete config accesses.
Reading config space of devices that are not present or not ready typically
returns ~0 (PCI_ERROR_RESPONSE). Previously we polled the Command register
until we got a value other than ~0. This is sometimes a problem because
Root Complex handling of RRS completions may include several retries and
implementation-specific behavior that is invisible to software (see sec
2.3.2), so the exponential backoff in pci_dev_wait() may not work as
intended.
Linux enables Configuration RRS Software Visibility on all Root Ports that
support it. If it is enabled, read the Vendor ID instead of the Command
register. RRS completions cause immediate return of the 0x0001 reserved
Vendor ID value, so the pci_dev_wait() backoff works correctly.
When a read of Vendor ID eventually completes successfully by returning a
non-0x0001 value (the Vendor ID or 0xffff for VFs), the device should be
initialized and ready to respond to config requests.
For conventional PCI devices or devices below Root Ports that don't support
Configuration RRS Software Visibility, poll the Command register as before.
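The Vendor ID polling loop can be modeled in userspace as below. This is a sketch under stated assumptions: struct rrs_dev, read_vendor_id(), and wait_for_ready() are hypothetical, the sleep is replaced by advancing a simulated clock, and the backoff parameters are chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define VENDOR_ID_RRS 0x0001u	/* reserved Vendor ID seen on an RRS completion */

/* Toy device: answers with RRS for the first busy_ms milliseconds. */
struct rrs_dev {
	unsigned int busy_ms;
	uint16_t vendor_id;
	unsigned int elapsed_ms;	/* simulated clock */
	unsigned int reads;		/* number of Vendor ID reads issued */
};

static uint16_t read_vendor_id(struct rrs_dev *dev)
{
	dev->reads++;
	return dev->elapsed_ms < dev->busy_ms ? VENDOR_ID_RRS : dev->vendor_id;
}

/*
 * Poll Vendor ID with exponential backoff until it stops reading as
 * 0x0001 or the next delay would exceed timeout_ms. Returns 0 when the
 * device is ready, -1 on timeout.
 */
static int wait_for_ready(struct rrs_dev *dev, unsigned int timeout_ms)
{
	unsigned int delay = 1;

	assert(timeout_ms > 0);

	for (;;) {
		if (read_vendor_id(dev) != VENDOR_ID_RRS)
			return 0;
		if (delay > timeout_ms)
			return -1;
		dev->elapsed_ms += delay;	/* stands in for sleeping */
		delay *= 2;
	}
}
```

Because each read in this model returns immediately with 0x0001 while the device is busy, the loop controls the wait time itself, which is the property the backoff depends on.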
This was developed independently, but is very similar to Stanislav
Spassov's previous work at
https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com
Link: https://lore.kernel.org/r/20240827234848.4429-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Duc Dang <ducdang@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-65598
Upstream Status: 100ae5d77f07f9f046106e228778c7aa1c6d3af3
commit 100ae5d77f07f9f046106e228778c7aa1c6d3af3
Author: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Date: Wed Jun 19 20:41:12 2024 +0530
PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps()
Move the switch statement in pcie_link_speed_mbps() into a new function,
pcie_dev_speed_mbps(), in the header file so that it can be used elsewhere,
for example in controller drivers.
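A header-style helper of this kind might look like the sketch below. The enum and function names are local to the example, and returning the raw per-lane signaling rate in megabits per second (ignoring encoding overhead) is an assumption of this sketch. link_raw_mbps() is a hypothetical consumer added to show how a controller driver could combine the value with a lane count.

```c
#include <assert.h>
#include <errno.h>

/* Link speed identifiers local to this sketch. */
enum link_speed {
	SPEED_2_5GT = 1,
	SPEED_5_0GT,
	SPEED_8_0GT,
	SPEED_16_0GT,
	SPEED_32_0GT,
	SPEED_64_0GT,
	SPEED_UNKNOWN = 0xff,
};

/* Map a link speed to its per-lane rate, or -EINVAL if unknown. */
static inline int link_speed_mbps(enum link_speed speed)
{
	switch (speed) {
	case SPEED_2_5GT:
		return 2500;
	case SPEED_5_0GT:
		return 5000;
	case SPEED_8_0GT:
		return 8000;
	case SPEED_16_0GT:
		return 16000;
	case SPEED_32_0GT:
		return 32000;
	case SPEED_64_0GT:
		return 64000;
	default:
		break;
	}
	return -EINVAL;
}

/* Raw link rate: per-lane rate times lane count, or -EINVAL. */
static inline int link_raw_mbps(enum link_speed speed, int width)
{
	int per_lane = link_speed_mbps(speed);

	assert(width >= 1 && width <= 32);
	return per_lane < 0 ? per_lane : per_lane * width;
}
```

Placing the mapping in a static inline function lets any file that includes the header reuse it without an exported symbol.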
Link: https://lore.kernel.org/linux-pci/20240619-opp_support-v15-3-aa769a2173a3@quicinc.com
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 70a7bfb1e515b03e54491254a4375cdfb9515227
commit 70a7bfb1e515b03e54491254a4375cdfb9515227
Author: Damien Le Moal <dlemoal@kernel.org>
Date: Sat Apr 13 09:41:20 2024 +0900
PCI: rockchip-host: Wait 100ms after reset before starting configuration
PCIe r6.0, sec 6.6.1, states that the host should wait for at least 100
msec from the end of a conventional reset (PERST# is de-asserted) before
sending a configuration request to ensure that the device is able to
respond with a "Request Retry Status" completion.
Add the PCIE_T_RRS_READY_MS macro to define this wait time and modify
rockchip_pcie_host_init_port() to add this 100ms sleep after deasserting
PERST# using the ep_gpio GPIO.
Link: https://lore.kernel.org/linux-pci/20240413004120.1099089-3-dlemoal@kernel.org
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 407abde9caee0d8f757fc8bed43fa5efc6fe509a
commit 407abde9caee0d8f757fc8bed43fa5efc6fe509a
Author: Vidya Sagar <vidyas@nvidia.com>
Date: Wed May 8 23:11:36 2024 +0530
PCI: of: Add of_pci_preserve_config() for per-host bridge support
Add of_pci_preserve_config() to look for the "linux,pci-probe-only"
property under a specified node. If it's not found there, look under
"of_chosen" in addition.
If the caller didn't specify a node, look under "of_chosen".
With a future patch, this will support "linux,pci-probe-only" on a per host
bridge basis based on the presence of the property in the respective PCI
host bridge DT node.
Implement of_pci_check_probe_only() using of_pci_preserve_config().
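The lookup order described above can be modeled with a toy node type. struct dt_node and preserve_config() are hypothetical, and treating a non-zero property value as "preserve" is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device tree node holding at most one property. */
struct dt_node {
	bool has_probe_only;	/* is "linux,pci-probe-only" present? */
	unsigned int value;	/* property value when present */
};

/*
 * Look for the property under 'node' first, then under 'chosen'.
 * A NULL 'node' means the caller did not specify one. 'chosen' models
 * the global chosen node and must not be NULL in this sketch.
 */
static bool preserve_config(const struct dt_node *node,
			    const struct dt_node *chosen)
{
	assert(chosen != NULL);

	if (node && node->has_probe_only)
		return node->value != 0;
	if (chosen->has_probe_only)
		return chosen->value != 0;
	return false;
}
```

With this ordering, a property on a specific host bridge node takes precedence over the global setting, which is what a per-host-bridge policy needs.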
Link: https://lore.kernel.org/r/20240508174138.3630283-3-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 9d7d5db8e78ef1b67690bbffa5af60016d8e279d
commit 9d7d5db8e78ef1b67690bbffa5af60016d8e279d
Author: Vidya Sagar <vidyas@nvidia.com>
Date: Wed May 8 23:11:35 2024 +0530
PCI: Move PRESERVE_BOOT_CONFIG _DSM evaluation to pci_register_host_bridge()
Move the PRESERVE_BOOT_CONFIG _DSM evaluation from acpi_pci_root_create()
to pci_register_host_bridge().
This will help unify the ACPI _DSM path and the DT-based
"linux,pci-probe-only" paths.
This should be safe because it happens earlier than it used to:
  acpi_pci_root_create
    pci_create_root_bus
      pci_register_host_bridge
  +     bridge->preserve_config = pci_preserve_config(bridge)
          pci_acpi_preserve_config
  +         acpi_evaluate_dsm_typed(DSM_PCI_PRESERVE_BOOT_CONFIG)
  -   acpi_evaluate_dsm_typed(DSM_PCI_PRESERVE_BOOT_CONFIG)
No functional change intended.
Link: https://lore.kernel.org/r/20240508174138.3630283-2-vidyas@nvidia.com
Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: f748a07a0b6430b3ed638e5df7ae5007a28eaf11
commit f748a07a0b6430b3ed638e5df7ae5007a28eaf11
Author: Philipp Stanner <pstanner@redhat.com>
Date: Thu Jun 13 13:50:24 2024 +0200
PCI: Remove legacy pcim_release()
Thanks to preceding cleanup steps, pcim_release() is no longer needed
and can be replaced by pcim_disable_device(), which is the exact
counterpart to pcim_enable_device().
This permits removing further parts of the old PCI devres implementation.
Replace pcim_release() with pcim_disable_device(). Remove the now unused
function get_pci_dr(). Remove the struct pci_devres from pci.h.
Link: https://lore.kernel.org/r/20240613115032.29098-12-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 2c3e842f125fc1c57cd2824840d04e401c0542c2
commit 2c3e842f125fc1c57cd2824840d04e401c0542c2
Author: Philipp Stanner <pstanner@redhat.com>
Date: Thu Jun 13 13:50:22 2024 +0200
PCI: Give pcim_set_mwi() its own devres cleanup callback
Managing pci_set_mwi() with devres can easily be done with its own
callback, without the necessity to store any state about it in a
device-related struct.
Remove the MWI state from struct pci_devres. Give pcim_set_mwi() a
separate devres cleanup callback.
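The idea of a per-action cleanup callback, with no bookkeeping bit in a shared struct, can be sketched with a toy devres-like list. All names here (struct toy_device, add_action(), release_all(), managed_set_mwi()) are hypothetical, and the order of enabling versus registering the action is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_device;
typedef void (*release_fn)(struct toy_device *dev);

/* One registered cleanup action. */
struct action {
	release_fn release;
	struct action *next;
};

struct toy_device {
	bool mwi_enabled;	/* hardware state, not devres bookkeeping */
	struct action *actions;	/* LIFO list of cleanup callbacks */
};

/* Register a cleanup callback; returns 0, or -1 on allocation failure. */
static int add_action(struct toy_device *dev, release_fn release)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->release = release;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

/* Run and free all callbacks, most recently added first. */
static void release_all(struct toy_device *dev)
{
	while (dev->actions) {
		struct action *a = dev->actions;

		dev->actions = a->next;
		a->release(dev);
		free(a);
	}
}

/* Cleanup for the MWI action only; models clearing MWI. */
static void mwi_release(struct toy_device *dev)
{
	dev->mwi_enabled = false;
}

/* Managed "set MWI": enable, then register the matching cleanup. */
static int managed_set_mwi(struct toy_device *dev)
{
	assert(dev != NULL);

	dev->mwi_enabled = true;
	if (add_action(dev, mwi_release)) {
		dev->mwi_enabled = false;
		return -1;
	}
	return 0;
}
```

Because the callback itself encodes what to undo, the device-related struct needs no flag recording that MWI was set through the managed path.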
Link: https://lore.kernel.org/r/20240613115032.29098-10-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 1b9469cf15976a7cb7378caaa8a1772e7901514d
commit 1b9469cf15976a7cb7378caaa8a1772e7901514d
Author: Philipp Stanner <pstanner@redhat.com>
Date: Thu Jun 13 13:50:21 2024 +0200
PCI: Move struct pci_devres.pinned bit to struct pci_dev
The bit describing whether the PCI device is currently pinned is stored
in struct pci_devres. To clean up and simplify the PCI devres API, it's
better if this information is stored in struct pci_dev.
This will later permit simplifying pcim_enable_device().
Move the 'pinned' boolean bit to struct pci_dev.
Restructure bits in struct pci_dev so the pm / pme fields are next to
each other.
Link: https://lore.kernel.org/r/20240613115032.29098-9-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 77f79ac8de0f490fca4f0a5f2e1e38eeee191f05
commit 77f79ac8de0f490fca4f0a5f2e1e38eeee191f05
Author: Philipp Stanner <pstanner@redhat.com>
Date: Thu Jun 13 13:50:20 2024 +0200
PCI: Remove struct pci_devres.enabled status bit
The struct pci_devres has a separate boolean to track whether a device is
enabled. That, however, can easily be tracked in an agnostic manner through
the function pci_is_enabled().
Using it allows for simplifying the PCI devres implementation.
Replace the separate 'enabled' status bit from struct pci_devres with
calls to pci_is_enabled() at the appropriate places.
Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: d47bde708086c77b1ceeb7643e600089f63dd03b
commit d47bde708086c77b1ceeb7643e600089f63dd03b
Author: Philipp Stanner <pstanner@redhat.com>
Date: Thu Jun 13 13:50:18 2024 +0200
PCI: Add managed pcim_request_region()
These existing functions:
  pci_request_region()
  pci_request_selected_regions()
  pci_request_selected_regions_exclusive()
are "hybrid" functions built on __pci_request_region() and are managed if
pcim_enable_device() has been called, but unmanaged otherwise.
Add these new functions:
  pcim_request_region()
  pcim_request_region_exclusive()
These are *always* managed and use the new pcim_addr_devres tracking
infrastructure instead of find_pci_dr() and struct pci_devres.region_mask.
Implement the hybrid functions using the new "pure" functions and remove
struct pci_devres.region_mask, which is no longer needed.
Link: https://lore.kernel.org/r/20240613115032.29098-6-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: bbaff68bf4a404bee5f5e20e7b1e30301b26304a
commit bbaff68bf4a404bee5f5e20e7b1e30301b26304a
Author: Philipp Stanner <pstanner@redhat.com>
Date: Thu Jun 13 13:50:16 2024 +0200
PCI: Add managed partial-BAR request and map infrastructure
The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it
to build a managed version of pci_iomap_range(), which maps partial BARs.
Add struct pcim_addr_devres, which can track request and mapping of both
entire BARs and partial BARs.
Add the following internal devres functions based on struct
pcim_addr_devres:
  pcim_iomap_region()         # request & map entire BAR
  pcim_iounmap_region()       # unmap & release entire BAR
  pcim_request_region()       # request entire BAR
  pcim_release_region()       # release entire BAR
  pcim_request_all_regions()  # request all entire BARs
  pcim_release_all_regions()  # release all entire BARs
Rework the following public interfaces using the new infrastructure
listed above:
  pcim_iomap()                      # map partial BAR
  pcim_iounmap()                    # unmap partial BAR
  pcim_iomap_regions()              # request & map specified BARs
  pcim_iomap_regions_request_all()  # request all BARs, map specified BARs
  pcim_iounmap_regions()            # unmap & release specified BARs
Link: https://lore.kernel.org/r/20240613115032.29098-4-pstanner@redhat.com
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: fe4a83ec07818f2243eac584488e65397699550c
commit fe4a83ec07818f2243eac584488e65397699550c
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Tue May 7 15:17:58 2024 +0300
PCI: Make pcie_bandwidth_capable() static
pcie_bandwidth_capable() is only used within pci.c, make it static.
Link: https://lore.kernel.org/r/20240507121758.13849-1-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: 407d1a51921e9f28c1bcec647c2205925bd1fdab
Conflict(s):
Files: drivers/pci/Kconfig, drivers/pci/Makefile
The hunks for the above files were skipped because there is no intent to
enable new functionality. The remaining code changes are kept to ease
future rebase dependency efforts. If PCI_DYNAMIC_OF_NODES is needed in the
future, it can be handled then.
commit 407d1a51921e9f28c1bcec647c2205925bd1fdab
Author: Lizhi Hou <lizhi.hou@amd.com>
Date: Tue Aug 15 10:19:57 2023 -0700
PCI: Create device tree node for bridge
A PCI endpoint device such as the Xilinx Alveo PCI card maps the register
spaces from multiple hardware peripherals to its PCI BAR. Normally,
the PCI core discovers devices and BARs using the PCI enumeration process.
There is no infrastructure to discover the hardware peripherals that are
present in a PCI device, and which can be accessed through the PCI BARs.
Apparently, the device tree framework requires a device tree node for the
PCI device so that it can generate the device tree nodes for the hardware
peripherals underneath. Because PCI is a self-discoverable bus, there might
not be a device tree node created for PCI devices. Furthermore, if the PCI
device is hot-pluggable, when it is plugged in, the device tree nodes for
its parent bridges are required. Add support to generate device tree nodes
for PCI bridges.
Add an of_pci_make_dev_node() interface that can be used to create a device
tree node for a PCI device.
Add a PCI_DYNAMIC_OF_NODES config option. When the option is turned on,
the kernel will generate device tree nodes for PCI bridges unconditionally.
Initially, add the basic properties for the dynamically generated device
tree nodes which include #address-cells, #size-cells, device_type,
compatible, ranges, reg.
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://lore.kernel.org/r/1692120000-46900-3-git-send-email-lizhi.hou@amd.com
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-23582
Currently a DOE instance cannot be shared by multiple drivers because
each driver creates its own pci_doe_mb struct for a given DOE instance.
For the same reason a DOE instance cannot be shared between the PCI core
and a driver.
Moreover, finding out which protocols a DOE instance supports requires
creating a pci_doe_mb for it. If a device has multiple DOE instances,
a driver looking for a specific protocol may need to create a pci_doe_mb
for each of the device's DOE instances and then destroy those which
do not support the desired protocol. That's obviously an inefficient
way to do things.
Overcome these issues by creating mailboxes in the PCI core on device
enumeration.
Provide a pci_find_doe_mailbox() API call to allow drivers to get a
pci_doe_mb for a given (pci_dev, vendor, protocol) triple. This API is
modeled after pci_find_capability() and can later be amended with a
pci_find_next_doe_mailbox() call to iterate over all mailboxes of a
given pci_dev which support a specific protocol.
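As an illustration only, the lookup described above can be sketched in
user space. The struct layouts and the find_doe_mailbox() name below are
assumptions made for this example, not the kernel's definitions; the real
pci_find_doe_mailbox() operates on a struct pci_dev.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one protocol supported by a DOE mailbox. */
struct doe_protocol {
	uint16_t vendor;
	uint8_t type;
};

/* Hypothetical model of a mailbox created once, at enumeration. */
struct doe_mailbox {
	const struct doe_protocol *prots;
	size_t num_prots;
};

/* Hypothetical model of a device that owns its mailboxes. */
struct model_dev {
	struct doe_mailbox *mailboxes;
	size_t num_mailboxes;
};

/*
 * Return the first mailbox of @dev that supports (@vendor, @type),
 * or NULL if none does. Callers never create or destroy mailboxes.
 */
static struct doe_mailbox *find_doe_mailbox(struct model_dev *dev,
					    uint16_t vendor, uint8_t type)
{
	for (size_t i = 0; i < dev->num_mailboxes; i++) {
		struct doe_mailbox *mb = &dev->mailboxes[i];

		for (size_t j = 0; j < mb->num_prots; j++)
			if (mb->prots[j].vendor == vendor &&
			    mb->prots[j].type == type)
				return mb;
	}
	return NULL;
}
```

Because the mailboxes already exist, a driver and the core can both
receive the same pointer for the same triple.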
On removal, destroy the mailboxes in pci_destroy_dev(), after the driver
is unbound. This allows drivers to use DOE in their ->remove() hook.
On surprise removal, cancel ongoing DOE exchanges and prevent new ones
from being scheduled. Thereby ensure that a hot-removed device doesn't
needlessly wait for a running exchange to time out.
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Ming Li <ming4.li@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/40a6f973f72ef283d79dd55e7e6fddc7481199af.1678543498.git.lukas@wunner.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit ac04840350e2c21a17d867b262a1586603b87a92)
Signed-off-by: John W. Linville <linville@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/4151
# Merge Request Required Information
JIRA: https://issues.redhat.com/browse/RHEL-28780
JIRA: https://issues.redhat.com/browse/RHEL-12083
JIRA: https://issues.redhat.com/browse/RHEL-12322
JIRA: https://issues.redhat.com/browse/RHEL-29105
JIRA: https://issues.redhat.com/browse/RHEL-29357
JIRA: https://issues.redhat.com/browse/RHEL-29359
Omitted-fix: ed8b94f6e0ac ("powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add")
- Reverted by 1fba2bf8e9d5 ("Revert "powerpc/pseries/iommu: Fix iommu initialisation during DLPAR add"")
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
branch: next
Tested: In progress
- general cki coverage
- Nvidia testing arm-smmu-v3 and iommufd related changes they have requested.
- Multiple rounds testing of amd_iommu, intel_iommu, and arm-smmu-v3 with
various iommu configurations with disk i/o using fio,
covering lazy iotlb invalidation, strict iotlb invalidation,
and passthrough. Also tested with forcedac set. Intel
Scalable Mode capable systems tested with the iotlb invalidation
policies, and passthrough with scalable mode enabled, and disabled.
AMD systems tested with v1 and v2 page tables.
- Tested booting with various iommu configurations, and verifying system
in correct state on AMD, Intel, and ARM.
- Limited test on ppc64le. The system I had access to was
setting up a 64-bit bypass window, and using dma_direct
calls. It ran, but since I don't normally touch ppc64le
iommu code, I need to investigate more or get IBM assistance
to more thoroughly test it.
- Working on getting testing assistance from IBM for the s390x changes.
## Summary of Changes
This brings iommu, iommufd, and dma mapping api up to 6.9 with some additions from Joerg's
next branch, minus some commits from a 6.9 SEV-SNP pull for AMD. Some highlights:
- The removal of the amd_iommu_v2 code, and the addition of its replacement based on the
iommu core SVA api, along with a re-org of the amd_iommu code.
- The migration of s390 to the iommu core dma-iommu dma ops implementation, joining Intel,
AMD, and ARM as users of the same code base.
- The beginnings of a re-work of the arm-smmu-v3 driver by Jason, and others.
- A number of changes to iommufd as it continues to get fleshed out.
- IOPT memory usage observability (code that was basis for talk at LPC last year)
Example output in vmstat files:
```
# grep iommu /sys/devices/system/node/node*/vmstat
/sys/devices/system/node/node0/vmstat:nr_iommu_pages 342
/sys/devices/system/node/node1/vmstat:nr_iommu_pages 0
```
- Continued work on shared virtual addressing and io page faulting (PRI).
- Dynamic swiotlb memory pools. This is not enabled yet, as they still seem to be
shaking out issues upstream, but the code is in place now.
- Re-working of iommu core domain allocation.
Note: iommufd selftest is being enabled in separate work that has been delegated to
another engineer starting to help with iommu. So that will be enabled in the
next few weeks to add more coverage for iommufd.
Conflict-wise, they should be noted in the individual commits, but
not too bad overall. 13/30 were dropping unsupported bits, and another
8 were context diffs. A couple were caused by out-of-order backports due
to fixes, and a couple were upstream conflicts from colliding patchsets that
had to be resolved in the merge commits.
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Approved-by: Jan Stancek <jstancek@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Phil Auld <pauld@redhat.com>
Approved-by: David Airlie <airlied@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Mark Langsdorf <mlangsdo@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>
Merged-by: Lucas Zampieri <lzampier@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: be9c3a4c8be13326e434d8817d6dda6c5d2835f5
Conflict(s):
Patching file drivers/pci/Makefile: Hunk #1 FAILED at 4. Post context
conflict due to originating patch not taking into account upstream
commit 1e8cc8e6bd85 "PCI: Place interrupt related code into irq.c"
(see merge commit b8de187056f "Merge branch 'pci/sysfs'").
commit be9c3a4c8be13326e434d8817d6dda6c5d2835f5
Author: Lukas Wunner <lukas@wunner.de>
Date: Mon Oct 30 13:32:12 2023 +0100
PCI/sysfs: Compile pci-sysfs.c only if CONFIG_SYSFS=y
It is possible to enable CONFIG_PCI but disable CONFIG_SYSFS and for
space-constrained devices such as routers, such a configuration may
actually make sense.
However pci-sysfs.c is compiled even if CONFIG_SYSFS is disabled,
unnecessarily increasing the kernel's size.
To rectify that:
* Move pci_mmap_fits() to mmap.c. It is not only needed by
pci-sysfs.c, but also proc.c.
* Move pci_dev_type to probe.c and make it private. It references
pci_dev_attr_groups in pci-sysfs.c. Make that public instead for
consistency with pci_dev_groups, pcibus_groups and pci_bus_groups,
which are likewise public and referenced by struct definitions in
pci-driver.c and probe.c.
* Define pci_dev_groups, pci_dev_attr_groups, pcibus_groups and
pci_bus_groups to NULL if CONFIG_SYSFS is disabled. Provide empty
static inlines for pci_{create,remove}_legacy_files() and
pci_{create,remove}_sysfs_dev_files().
Result:
vmlinux size is reduced by 122996 bytes in my arm 32-bit test build.
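The stub pattern in the last bullet can be sketched as follows. This is
a self-contained illustration: the function and symbol names are
hypothetical stand-ins modeled on pci_{create,remove}_sysfs_dev_files()
and pci_dev_groups, not the actual declarations from drivers/pci/pci.h.

```c
#include <stddef.h>

/* Hypothetical device model; sysfs_files counts created attribute files. */
struct model_dev {
	int sysfs_files;
};

#ifdef CONFIG_SYSFS
/* Real implementations live in the file built only when sysfs is on. */
int create_sysfs_dev_files(struct model_dev *dev);
void remove_sysfs_dev_files(struct model_dev *dev);
extern const void *const dev_groups[];
#else
/* Empty stubs: callers compile unchanged and the calls fold away. */
static inline int create_sysfs_dev_files(struct model_dev *dev)
{
	(void)dev;
	return 0;
}

static inline void remove_sysfs_dev_files(struct model_dev *dev)
{
	(void)dev;
}

#define dev_groups NULL
#endif

/* A caller that needs no #ifdef of its own. */
static int add_device(struct model_dev *dev)
{
	return create_sysfs_dev_files(dev);
}
```

Keeping the conditional in the header means the call sites in probe and
removal paths stay free of preprocessor checks.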
Link: https://lore.kernel.org/r/85ca95ae8e4d57ccf082c5c069b8b21eb141846e.1698668982.git.lukas@wunner.de
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 815a3909ead7440e2827042e5ec618f4396f022c
commit 815a3909ead7440e2827042e5ec618f4396f022c
Author: Philipp Stanner <pstanner@redhat.com>
Date: Wed Jan 31 10:00:23 2024 +0100
PCI: Move devres code from pci.c to devres.c
The file pci.c is very large and contains a number of devres functions.
These functions should now reside in devres.c.
Move as much devres-specific code from pci.c to devres.c as possible.
There are a few callers left in pci.c that do devres operations. These
should be ported in the future. Add corresponding TODOs.
The reason they are not moved right now in this commit is that PCI's devres
currently implements a sort of "hybrid-mode": pci_request_region(), for
instance, does not have a corresponding pcim_ equivalent, yet. Instead, the
function can be made managed by previously calling pcim_enable_device()
(instead of pci_enable_device()). This makes it unreasonable to move
pci_request_region() to devres.c. Moving the functions would require
changes to PCI's API and is, therefore, left for future work.
In summary, this commit serves as a preparation step for a following
patch series that will cleanly separate the PCI's managed and unmanaged
API.
Link: https://lore.kernel.org/r/20240131090023.12331-5-pstanner@redhat.com
Suggested-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 17423360a27ae58c1850f588bdd8013bbfcd250b
commit 17423360a27ae58c1850f588bdd8013bbfcd250b
Author: David E. Box <david.e.box@linux.intel.com>
Date: Fri Feb 23 14:58:50 2024 -0600
PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") restored the L1 PM Substates Capability after resume,
which reduced power consumption by making the ASPM L1.x states work after
resume.
a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume"") reverted 4ff116d0d5fd because resume failed on some
systems, so power consumption after resume increased again.
a7152be79b62 mentioned that we restore L1 PM substate configuration even
though ASPM L1 may already be enabled. This is because
pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state().
Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec
5.5.4 more closely by:
1) Do not restore ASPM configuration in pci_restore_pcie_state() but
do that after PCIe capability is restored in pci_restore_aspm_state()
following PCIe r6.1, sec 5.5.4.
2) If BIOS reenables L1SS, particularly L1.2, we need to clear the
enables in the right order, downstream before upstream. Defer
restoring the L1SS config until we are at the downstream component.
Then update the config for both ends of the link in the prescribed
order.
3) Program ASPM L1 PM substate configuration before L1 enables.
4) Program ASPM L1 PM substate enables last, after rest of the fields
in the capability are programmed.
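A rough user-space sketch of the ordering in steps 2 to 4, restricted to
the L1 PM Substates Capability itself (the Link Control L1 enable from
step 3 is not modeled). The struct, the function name and the enable
mask are assumptions for illustration; in particular, the order in which
the two ends are re-enabled in the final step is a guess, not something
the text above specifies.

```c
#include <stdint.h>

/* Assumed for this sketch: low four bits of Control 1 are the enables. */
#define L1SS_ENABLE_MASK 0x0000000fu

/* Hypothetical model of one end's L1 PM Substates control registers. */
struct l1ss_regs {
	uint32_t ctl1;
	uint32_t ctl2;
};

static void restore_l1ss(struct l1ss_regs *up, struct l1ss_regs *down,
			 const struct l1ss_regs *saved_up,
			 const struct l1ss_regs *saved_down)
{
	/* Step 2: clear enables the BIOS may have set, downstream first. */
	down->ctl1 &= ~L1SS_ENABLE_MASK;
	up->ctl1 &= ~L1SS_ENABLE_MASK;

	/* Step 3: program configuration fields while enables are clear. */
	up->ctl2 = saved_up->ctl2;
	down->ctl2 = saved_down->ctl2;
	up->ctl1 = saved_up->ctl1 & ~L1SS_ENABLE_MASK;
	down->ctl1 = saved_down->ctl1 & ~L1SS_ENABLE_MASK;

	/* Step 4: substate enables last (upstream-first is an assumption). */
	up->ctl1 |= saved_up->ctl1 & L1SS_ENABLE_MASK;
	down->ctl1 |= saved_down->ctl1 & L1SS_ENABLE_MASK;
}
```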
[bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores
in pci_restore_pcie_state()]
Link: https://lore.kernel.org/r/20240128233212.1139663-3-david.e.box@linux.intel.com
Link: https://lore.kernel.org/r/20240128233212.1139663-4-david.e.box@linux.intel.com
Link: https://lore.kernel.org/r/20240223205851.114931-5-helgaas@kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
Co-developed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Co-developed-by: David E. Box <david.e.box@linux.intel.com>
Reported-by: Koba Ko <koba.ko@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Tasev Nikola <tasev.stefanoska@skynet.be> # Asus UX305FA
Cc: Mark Enriquez <enriquezmark36@gmail.com>
Cc: Thomas Witt <kernel@witt.link>
Cc: Werner Sembach <wse@tuxedocomputers.com>
Cc: Vidya Sagar <vidyas@nvidia.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 1e11b5494c3dbb1e5fce7e95021c1698799c7288
commit 1e11b5494c3dbb1e5fce7e95021c1698799c7288
Author: David E. Box <david.e.box@linux.intel.com>
Date: Fri Feb 23 14:58:49 2024 -0600
PCI/ASPM: Move pci_save_ltr_state() to aspm.c
Even when CONFIG_PCIEASPM is not set, we save and restore the LTR
Capability so that if ASPM L1.2 and LTR were configured by the platform,
ASPM L1.2 will still work after suspend/resume, when that platform
configuration may be lost. See dbbfadf231 ("PCI/ASPM: Save LTR Capability
for suspend/resume").
Since ASPM L1.2 depends on the LTR Capability, move the save/restore code
to the part of aspm.c that is always compiled regardless of
CONFIG_PCIEASPM. No functional change intended.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20240128233212.1139663-5-david.e.box@linux.intel.com
[bhelgaas: commit log, reorder to make this a pure move]
Link: https://lore.kernel.org/r/20240223205851.114931-4-helgaas@kernel.org
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: fa84f4435a6202dd90248517f41e54bf3fb85bc5
Conflict(s):
Patching file drivers/pci/pci.h; Hunk #2 FAILED at 572.
False conflict as this upstream patch's basis was prior to upstream
commit 1e560864159d "PCI/ASPM: Fix deadlock when enabling ASPM".
commit fa84f4435a6202dd90248517f41e54bf3fb85bc5
Author: David E. Box <david.e.box@linux.intel.com>
Date: Fri Feb 23 14:58:47 2024 -0600
PCI/ASPM: Move pci_configure_ltr() to aspm.c
The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2
state and is only configured when CONFIG_PCIEASPM is set.
Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since
they only build when CONFIG_PCIEASPM is set. No functional change
intended.
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com
[bhelgaas: commit log, split build change from function moves]
Link: https://lore.kernel.org/r/20240223205851.114931-2-helgaas@kernel.org
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 0a5a46a6a61be7b63c12c18495d427f91f3662a9
commit 0a5a46a6a61be7b63c12c18495d427f91f3662a9
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date: Tue Feb 6 15:57:15 2024 +0200
PCI/AER: Generalize TLP Header Log reading
Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs
7.8.4 & 7.9.14) to convey error diagnostics but the struct is named after
AER as the struct aer_header_log_regs. Also, not all places that handle TLP
Header Log use the struct and the struct members are named individually.
Generalize the struct name and members, and use it consistently where TLP
Header Log is being handled so that a pcie_read_tlp_log() helper can be
easily added.
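A minimal sketch of what such a generalized struct and read helper could
look like, written for user space. The struct tlp_log, cfg_read_dword()
and read_tlp_log() names are hypothetical, and config space is modeled
as a plain byte array; the offset passed in is wherever the owning
capability places its Header Log.

```c
#include <stdint.h>

#define TLP_LOG_DWORDS 4

/* One struct for any TLP Header Log, with generically named members. */
struct tlp_log {
	uint32_t dw[TLP_LOG_DWORDS];
};

/* Hypothetical stand-in for a little-endian config space dword read. */
static uint32_t cfg_read_dword(const uint8_t *cfg, unsigned int where)
{
	return (uint32_t)cfg[where] |
	       (uint32_t)cfg[where + 1] << 8 |
	       (uint32_t)cfg[where + 2] << 16 |
	       (uint32_t)cfg[where + 3] << 24;
}

/* Read the four Header Log dwords starting at offset @where. */
static void read_tlp_log(const uint8_t *cfg, unsigned int where,
			 struct tlp_log *log)
{
	for (unsigned int i = 0; i < TLP_LOG_DWORDS; i++)
		log->dw[i] = cfg_read_dword(cfg, where + i * 4);
}
```

With an array member, the same loop serves every capability that
provides a Header Log, which individually named members would not allow.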
Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
[bhelgaas: drop ixgbe changes for now, tidy whitespace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-28780
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
commit 39714fd73c6b60a8d27bcc5b431afb0828bf4434
Author: Ethan Zhao <haifeng.zhao@linux.intel.com>
Date: Tue Mar 5 20:21:14 2024 +0800
PCI: Make pci_dev_is_disconnected() helper public for other drivers
Make pci_dev_is_disconnected() public so that it can be called from the
Intel VT-d driver to quickly work around the surprise-removal hang
for ATS-capable devices behind hotplug-capable PCIe switch downstream
ports.
Unlike pci_device_is_present(), this helper performs no config
space access, so it is light enough to optimize both the surprise
removal and safe removal flows.
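The cost difference can be illustrated with a small user-space model. It
assumes the helper only inspects a cached error state, while a presence
check has to read from the device; the enum, struct and function names
are hypothetical stand-ins, not the kernel's.

```c
#include <stdbool.h>

/* Hypothetical, simplified channel states. */
enum model_channel_state {
	MODEL_IO_NORMAL,
	MODEL_IO_FROZEN,
	MODEL_IO_PERM_FAILURE,
};

struct model_dev {
	enum model_channel_state error_state;
	unsigned int cfg_reads;	/* counts simulated config space reads */
};

/* Cheap check: looks at cached state only, never touches the device. */
static bool dev_is_disconnected(const struct model_dev *dev)
{
	return dev->error_state == MODEL_IO_PERM_FAILURE;
}

/* Presence check that has to go out to the device's config space. */
static bool dev_is_present(struct model_dev *dev)
{
	dev->cfg_reads++;
	return dev->error_state != MODEL_IO_PERM_FAILURE;
}
```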
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Tested-by: Haorong Ye <yehaorong@bytedance.com>
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
Link: https://lore.kernel.org/r/20240301080727.3529832-2-haifeng.zhao@linux.intel.com
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
(cherry picked from commit 39714fd73c6b60a8d27bcc5b431afb0828bf4434)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-28627
Upstream Status: 65f8e0beac5a495b8f3b387add1f9f4470678cb5
commit 65f8e0beac5a495b8f3b387add1f9f4470678cb5
Author: Puranjay Mohan <puranjay12@gmail.com>
Date: Sat Nov 6 16:56:05 2021 +0530
PCI: Update BAR # and window messages
The PCI log messages print the register offsets at some places and BAR
numbers at other places. There is no uniformity in this logging mechanism.
It would be better to print names than register offsets.
Add a helper function that aids in printing more meaningful information
about the BAR numbers like "VF BAR", "ROM", "bridge window", etc. This
function can be called while printing PCI log messages.
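A sketch of such a name helper, using a static array of strings. The
resource_name() function and the index layout (six BARs, ROM, six VF
BARs, one bridge window) are simplified assumptions for illustration;
the kernel's real resource indices depend on the configuration.

```c
#include <stddef.h>

/* Map a (hypothetical, simplified) resource index to a readable name. */
static const char *resource_name(unsigned int i)
{
	static const char * const names[] = {
		"BAR 0", "BAR 1", "BAR 2", "BAR 3", "BAR 4", "BAR 5",
		"ROM",
		"VF BAR 0", "VF BAR 1", "VF BAR 2",
		"VF BAR 3", "VF BAR 4", "VF BAR 5",
		"bridge window",
	};

	if (i < sizeof(names) / sizeof(names[0]))
		return names[i];
	return "unknown";
}
```

A log call can then print the returned string in place of a raw
register offset.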
[bhelgaas: fold in Lukas' static array suggestion from
https://lore.kernel.org/all/20211106115831.GA7452@wunner.de/]
Link: https://lore.kernel.org/r/20211106112606.192563-2-puranjay12@gmail.com
Signed-off-by: Puranjay Mohan <puranjay12@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26162
Upstream Status: 1e560864159d002b453da42bd2c13a1805515a20
commit 1e560864159d002b453da42bd2c13a1805515a20
Author: Johan Hovold <johan+linaro@kernel.org>
Date: Tue Jan 30 11:02:43 2024 +0100
PCI/ASPM: Fix deadlock when enabling ASPM
A last minute revert in 6.7-final introduced a potential deadlock when
enabling ASPM during probe of Qualcomm PCIe controllers as reported by
lockdep:
============================================
WARNING: possible recursive locking detected
6.7.0 #40 Not tainted
--------------------------------------------
kworker/u16:5/90 is trying to acquire lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pcie_aspm_pm_state_change+0x58/0xdc
but task is already holding lock:
ffffacfa78ced000 (pci_bus_sem){++++}-{3:3}, at: pci_walk_bus+0x34/0xbc
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(pci_bus_sem);
lock(pci_bus_sem);
*** DEADLOCK ***
Call trace:
print_deadlock_bug+0x25c/0x348
__lock_acquire+0x10a4/0x2064
lock_acquire+0x1e8/0x318
down_read+0x60/0x184
pcie_aspm_pm_state_change+0x58/0xdc
pci_set_full_power_state+0xa8/0x114
pci_set_power_state+0xc4/0x120
qcom_pcie_enable_aspm+0x1c/0x3c [pcie_qcom]
pci_walk_bus+0x64/0xbc
qcom_pcie_host_post_init_2_7_0+0x28/0x34 [pcie_qcom]
The deadlock can easily be reproduced on machines like the Lenovo ThinkPad
X13s by adding a delay to increase the race window during asynchronous
probe where another thread can take a write lock.
Add a new pci_set_power_state_locked() and associated helper functions that
can be called with the PCI bus semaphore held to avoid taking the read lock
twice.
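The locked/unlocked split can be sketched with a single-threaded model
that only tracks how deeply the semaphore is nested. Everything here is
a simplification for illustration: the names are hypothetical, and in
the kernel the second acquisition happens further down the call chain
(in pcie_aspm_pm_state_change(), per the trace) rather than directly in
the power state setter.

```c
/* Hypothetical model of the bus semaphore: tracks read-lock nesting. */
static int bus_sem_depth;
static int bus_sem_max_depth;

static void bus_sem_down_read(void)
{
	if (++bus_sem_depth > bus_sem_max_depth)
		bus_sem_max_depth = bus_sem_depth;
}

static void bus_sem_up_read(void)
{
	bus_sem_depth--;
}

struct model_dev {
	int power_state;
};

/* Caller must already hold the bus semaphore. */
static void set_power_state_locked(struct model_dev *dev, int state)
{
	dev->power_state = state;
}

/* Takes the semaphore itself; must not be called with it held. */
static void set_power_state(struct model_dev *dev, int state)
{
	bus_sem_down_read();
	set_power_state_locked(dev, state);
	bus_sem_up_read();
}

/* Walks all devices with the semaphore held for the whole walk. */
static void walk_bus(struct model_dev *devs, int n,
		     void (*cb)(struct model_dev *dev))
{
	bus_sem_down_read();
	for (int i = 0; i < n; i++)
		cb(&devs[i]);
	bus_sem_up_read();
}

/* Callback that nests the lock (the pattern lockdep complained about). */
static void enable_cb_nested(struct model_dev *dev)
{
	set_power_state(dev, 0);
}

/* Callback using the _locked variant: no second acquisition. */
static void enable_cb_locked(struct model_dev *dev)
{
	set_power_state_locked(dev, 0);
}
```

With enable_cb_locked() the nesting depth never exceeds one; with
enable_cb_nested() it reaches two, which is where a writer queued
between the two read acquisitions can deadlock a real rwsem.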
Link: https://lore.kernel.org/r/ZZu0qx2cmn7IwTyQ@hovoldconsulting.com
Link: https://lore.kernel.org/r/20240130100243.11011-1-johan+linaro@kernel.org
Fixes: f93e71aea6c6 ("Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"")
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org> # 6.7
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26162
Upstream Status: f93e71aea6c60ebff8adbd8941e678302d377869
commit f93e71aea6c60ebff8adbd8941e678302d377869
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Mon Jan 1 12:08:18 2024 -0600
Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
This reverts commit 08d0cc5f34265d1a1e3031f319f594bd1970976c.
Michael reported that when attempting to resume from suspend to RAM on ASUS
mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1), 08d0cc5f3426
("PCI/ASPM: Remove pcie_aspm_pm_state_change()") caused a 12-second delay
with no output, followed by a reboot.
Workarounds include:
- Reverting 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
- Booting with "pcie_aspm=off"
- Booting with "pcie_aspm.policy=performance"
- "echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm"
before suspending
- Connecting a USB flash drive
Link: https://lore.kernel.org/r/20240102232550.1751655-1-helgaas@kernel.org
Fixes: 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
Reported-by: Michael Schaller <michael@5challer.de>
Link: https://lore.kernel.org/r/76c61361-b8b4-435f-a9f1-32b716763d62@5challer.de
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-26162
Upstream Status: 164f66be0c2523e65df41b755c41b7c9ff58035a
commit 164f66be0c2523e65df41b755c41b7c9ff58035a
Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Date: Wed Oct 18 17:56:17 2023 +0900
PCI: Add T_PVPERL macro
According to the PCIe CEM r5.0, sec 2.9.2, the minimum Power stable to
PERST# inactive interval is 100 ms. Add a macro so that the PCIe
controller drivers can make use of it.
Link: https://lore.kernel.org/linux-pci/20231018085631.1121289-2-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Serge Semin <fancer.lancer@gmail.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15044
Upstream Status: e78bd50b4078b3b2d9f85d97796b7c271e7860ca
commit e78bd50b4078b3b2d9f85d97796b7c271e7860ca
Author: Frank Li <Frank.Li@nxp.com>
Date: Mon Aug 21 14:48:13 2023 -0400
PCI: Add PCIE_PME_TO_L2_TIMEOUT_US L2 ready timeout value
Add the PCIE_PME_TO_L2_TIMEOUT_US macro to define the L2 ready timeout
as described in the PCI specifications.
Link: https://lore.kernel.org/r/20230821184815.2167131-2-Frank.Li@nxp.com
Signed-off-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Manivannan Sadhasivam <mani@kernel.org>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-2570
Upstream Status: 7b3ba09febf409117a6f5b3e8ae10d503a972fee
commit 7b3ba09febf409117a6f5b3e8ae10d503a972fee
Author: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Tue Apr 25 09:47:51 2023 +0300
PCI/PM: Shorten pci_bridge_wait_for_secondary_bus() wait time for slow links
With slow links (<= 5GT/s) active link reporting is not mandatory, so if a
device is disconnected during system sleep we might end up waiting for it
to respond for ~60s, which slows down resume time.
PCIe r6.0, sec 6.6.1, mandates that software must wait for at least 1s
before it can assume a device is broken, so use that minimum requirement
for slow links and bail out if the device doesn't respond within 1s.
However, if the port supports active link reporting we can wait longer as
we do with the fast links.
This should make system resume time faster for slow links as well while
still following the PCIe spec.
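The timeout choice described above can be sketched as a small decision
function. The function name, the constants and the use of MT/s as the
speed unit are assumptions made for this example; the 1 s and ~60 s
values are taken from the text above.

```c
#include <stdbool.h>

#define SLOW_LINK_MAX_MTS	5000	/* 5 GT/s: upper bound of a slow link */
#define MIN_WAIT_MS		1000	/* minimum wait before giving up */
#define LONG_WAIT_MS		60000	/* roughly 60 s */

/*
 * Pick how long to wait for a device below a bridge to respond.
 * Slow links without active link reporting get only the minimum wait,
 * since there is no way to tell that the link came up at all.
 */
static unsigned int secondary_bus_wait_ms(unsigned int max_speed_mts,
					  bool link_active_reporting)
{
	if (max_speed_mts <= SLOW_LINK_MAX_MTS && !link_active_reporting)
		return MIN_WAIT_MS;
	return LONG_WAIT_MS;
}
```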
While there, move the PCI_RESET_WAIT constant into pci.c because it is
not used outside of that file anymore.
Link: https://lore.kernel.org/r/20230425064751.24951-1-mika.westerberg@linux.intel.com
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-2570
Upstream Status: 1abb47390350a1bd5430390e296492e6248865b2
Conflict(s):
Upstream had two patches that effectively did the same thing, which
obviously led to conflicts:
3c0ec896a4b4 PCI/ASPM: Factor out waiting for link training to complete
9c7f136433d2 PCI/ASPM: Factor out pcie_wait_for_retrain()
These were resolved upstream via merge commit 1abb47390350 "Merge branch
'pci/enumeration'".
This patch does effectively the same thing as upstream. One way to
detect the needed content is to run 'git blame' on the files and
look for this patch's commit ID.
The end result was also verified by running 'diff -u' on the files
against upstream to assure that the content in question matched.
commit 1abb47390350a1bd5430390e296492e6248865b2
Merge: 0f32114ea074 08e3ed12ca86
Author: Bjorn Helgaas <bhelgaas@google.com>
Date: Mon Jun 26 12:59:56 2023 -0500
Merge branch 'pci/enumeration'
- Add PCI_EXT_CAP_ID_PL_32GT define (Ben Dooks)
- Propagate firmware node by calling device_set_node() for better
modularity (Andy Shevchenko)
- Discover Data Link Layer Link Active Reporting earlier so quirks can take
advantage of it (Maciej W. Rozycki)
- Use cached Data Link Layer Link Active Reporting capability in pciehp,
powerpc/eeh, and mlx5 (Maciej W. Rozycki)
- Run quirk for devices that require OS to clear Retrain Link earlier, so
later quirks can rely on it (Maciej W. Rozycki)
- Export pcie_retrain_link() for use outside ASPM (Maciej W. Rozycki)
- Add Data Link Layer Link Active Reporting as another way for
pcie_retrain_link() to determine the link is up (Maciej W. Rozycki)
- Work around link training failures (especially on the ASMedia ASM2824
switch) by training first at 2.5GT/s and then attempting higher rates
(Maciej W. Rozycki)
* pci/enumeration:
PCI: Add failed link recovery for device reset events
PCI: Work around PCIe link training failures
PCI: Use pcie_wait_for_link_status() in pcie_wait_for_link_delay()
PCI: Add support for polling DLLLA to pcie_retrain_link()
PCI: Export pcie_retrain_link() for use outside ASPM
PCI: Export PCIe link retrain timeout
PCI: Execute quirk_enable_clear_retrain_link() earlier
PCI/ASPM: Factor out waiting for link training to complete
PCI/ASPM: Avoid unnecessary pcie_link_state use
PCI/ASPM: Use distinct local vars in pcie_retrain_link()
net/mlx5: Rely on dev->link_active_reporting
powerpc/eeh: Rely on dev->link_active_reporting
PCI: pciehp: Rely on dev->link_active_reporting
PCI: Initialize dev->link_active_reporting earlier
PCI: of: Propagate firmware node by calling device_set_node()
PCI: Add PCI_EXT_CAP_ID_PL_32GT define
# Conflicts:
# drivers/pci/pcie/aspm.c
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-2570
Upstream Status: a89c82249c3763780522f763dd2e615e2ea114de
commit a89c82249c3763780522f763dd2e615e2ea114de
Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date: Sun Jun 11 18:20:10 2023 +0100
PCI: Work around PCIe link training failures
Attempt to handle cases such as with a downstream port of the ASMedia
ASM2824 PCIe switch where link training never completes and the link
continues switching between speeds indefinitely with the data link layer
never reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch,
using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433,
wired to a SiFive HiFive Unmatched board. In this setup the switches
should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if
necessary.
Instead the link continues oscillating between the two speeds, at the rate
of 34-35 times per second, with link training reported repeatedly active
~84% of the time. Limiting the target link speed to 2.5GT/s with the
upstream ASM2824 device makes the two switches communicate correctly.
Removing the speed restriction afterwards makes the two devices switch to
5.0GT/s then.
Make use of these observations and detect the inability to train the link
by checking for the Data Link Layer Link Active status bit being off while
the Link Bandwidth Management Status indicates that hardware has changed
the link speed or width in an attempt to correct unreliable link operation.
Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this is
successful, lift the restriction, letting the devices negotiate a higher
speed.
Also check for a 2.5GT/s speed restriction the firmware may have already
arranged and lift it too with ports of devices known to continue working
afterwards (currently only ASM2824), that already report their data link
being up.
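The detection condition and the speed restriction can be sketched as two
small helpers. The macro names are shortened stand-ins and the helper
names are hypothetical; the bit values are what I understand the Link
Status and Link Control 2 definitions to be, so treat them as
assumptions to check against pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits (values assumed for this sketch). */
#define LNKSTA_DLLLA		0x2000	/* Data Link Layer Link Active */
#define LNKSTA_LBMS		0x4000	/* Link Bandwidth Management Status */

/* Link Control 2: Target Link Speed field (values assumed). */
#define LNKCTL2_TLS		0x000f
#define LNKCTL2_TLS_2_5GT	0x0001

/* Training is considered failed when LBMS is set but DLLLA is clear. */
static bool link_training_failed(uint16_t lnksta)
{
	return (lnksta & (LNKSTA_LBMS | LNKSTA_DLLLA)) == LNKSTA_LBMS;
}

/* Return @lnkctl2 with its Target Link Speed field replaced by @tls. */
static uint16_t set_target_speed(uint16_t lnkctl2, uint16_t tls)
{
	return (uint16_t)((lnkctl2 & ~LNKCTL2_TLS) | (tls & LNKCTL2_TLS));
}
```

A caller would restrict the speed with set_target_speed(), request a
retrain, and write the saved value back once the data link is up.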
[bhelgaas: reorder and squash stubs from
https://lore.kernel.org/r/alpine.DEB.2.21.2306111619570.64925@angie.orcam.me.uk
to avoid adding stubs that do nothing]
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2203022037020.56670@angie.orcam.me.uk/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2305310038540.59226@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-2570
Upstream Status: 680e9c47a2293bcc6a67a6f13f3b23d4c456885b
commit 680e9c47a2293bcc6a67a6f13f3b23d4c456885b
Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date: Sun Jun 11 18:19:53 2023 +0100
PCI: Add support for polling DLLLA to pcie_retrain_link()
Let the caller of pcie_retrain_link() specify whether they want to use the
LT bit or the DLLLA bit of the Link Status Register to determine if link
training has completed. It is up to the caller to verify whether the use
of the DLLLA bit, the implementation of which is optional, is valid for the
device requested.
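The completion test selected by the caller can be sketched as below. The
retrain_complete() helper is hypothetical; the use_lt parameter name is
modeled on the description above, and the bit values are assumptions to
check against pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits (values assumed for this sketch). */
#define LNKSTA_LT	0x0800	/* Link Training in progress */
#define LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */

/*
 * Decide from one Link Status reading whether training has finished:
 * with @use_lt, wait for LT to clear; otherwise wait for DLLLA to set.
 */
static bool retrain_complete(uint16_t lnksta, bool use_lt)
{
	if (use_lt)
		return !(lnksta & LNKSTA_LT);
	return (lnksta & LNKSTA_DLLLA) != 0;
}
```

A polling loop would call this on each Link Status read until it
returns true or a timeout expires.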
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110310540.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-2570
Upstream Status: 37edd87eb621a96d33ee4eefe4b54cfc5a7e03df
commit 37edd87eb621a96d33ee4eefe4b54cfc5a7e03df
Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date: Sun Jun 11 18:19:41 2023 +0100
PCI: Export pcie_retrain_link() for use outside ASPM
Export pcie_retrain_link() for link retrain needs outside ASPM. Struct
pcie_link_state is local to ASPM and only used by pcie_retrain_link() to
get at the associated PCI device, so change the operand and adjust the lone
call site accordingly. Document the interface. No functional change at
this point.
Link: https://lore.kernel.org/r/alpine.DEB.2.21.2306110229010.64925@angie.orcam.me.uk
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Myron Stowe <mstowe@redhat.com>