Commit Graph

997 Commits

Author SHA1 Message Date
Myron Stowe 66c02b62b2 PCI: Honor Max Link Speed when determining supported speeds
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: 3202ca221578850f34e0fea39dc6cfa745ed7aac

commit 3202ca221578850f34e0fea39dc6cfa745ed7aac
Author: Lukas Wunner <lukas@wunner.de>
Date:   Tue Dec 17 10:51:01 2024 +0100

    PCI: Honor Max Link Speed when determining supported speeds

    The Supported Link Speeds Vector in the Link Capabilities 2 Register
    indicates the *supported* link speeds.  The Max Link Speed field in the
    Link Capabilities Register indicates the *maximum* of those speeds.

    pcie_get_supported_speeds() neglects to honor the Max Link Speed field and
    will thus incorrectly deem higher speeds as supported.  Fix it.

    One user-visible issue addressed here is an incorrect value in the sysfs
    attribute "max_link_speed".

    But the main motivation is a boot hang reported by Niklas:  Intel JHL7540
    "Titan Ridge 2018" Thunderbolt controllers support 2.5-8 GT/s speeds,
    but indicate 2.5 GT/s as maximum.  Ilpo recalls seeing this on more
    devices.  It can be explained by the controller's Downstream Ports
    supporting 8 GT/s if an Endpoint is attached, but limiting to 2.5 GT/s
    if the port interfaces to a PCIe Adapter, in accordance with USB4 v2
    sec 11.2.1:

       "This section defines the functionality of an Internal PCIe Port that
        interfaces to a PCIe Adapter. [...]
        The Logical sub-block shall update the PCIe configuration registers
        with the following characteristics: [...]
        Max Link Speed field in the Link Capabilities Register set to 0001b
        (data rate of 2.5 GT/s only).
        Note: These settings do not represent actual throughput. Throughput
        is implementation specific and based on the USB4 Fabric performance."

    The present commit is not sufficient on its own to fix Niklas' boot hang,
    but it is a prerequisite:  A subsequent commit will fix the boot hang by
    enabling bandwidth control only if more than one speed is supported.

    The GENMASK() macro used herein specifies 0 as lowest bit, even though
    the Supported Link Speeds Vector ends at bit 1.  This is done on purpose
    to avoid a GENMASK(0, 1) macro if Max Link Speed is zero.  That macro
    would be invalid as the lowest bit is greater than the highest bit.
    Ilpo has witnessed a zero Max Link Speed on Root Complex Integrated
    Endpoints in particular, so it does occur in practice.  (The Link
    Capabilities Register is optional on RCiEPs per PCIe r6.2 sec 7.5.3.)
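A minimal user-space sketch of the masking described above follows. It assumes only the register layouts named in the text: a Max Link Speed value of N corresponds to bit N of the Supported Link Speeds Vector. genmask32(), supported_speeds() and the LNKCAP_*/LNKCAP2_* macros are local stand-ins written for illustration, not the kernel's code. The fallback that synthesizes speeds for devices without Link Capabilities 2 is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Bits h..l set, inclusive; a user-space stand-in for the kernel's GENMASK().
 * Requires h >= l, which is why the low bit passed below is 0, not 1. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	assert(h >= l && h < 32);
	return (UINT32_MAX >> (31 - h)) & (UINT32_MAX << l);
}

#define LNKCAP_MLS_MASK  0x0000000fu /* Max Link Speed, Link Capabilities */
#define LNKCAP2_SLS_MASK 0x000000feu /* Supported Link Speeds Vector, bits 7:1 */

/* Keep only vector bits at or below Max Link Speed. */
static uint8_t supported_speeds(uint32_t lnkcap, uint32_t lnkcap2)
{
	uint32_t speeds = lnkcap2 & LNKCAP2_SLS_MASK;

	/* GENMASK(mls, 0): valid even when Max Link Speed reads as 0. */
	speeds &= genmask32(lnkcap & LNKCAP_MLS_MASK, 0);
	return (uint8_t)speeds;
}
```

In the Titan Ridge case (vector 0x0e, Max Link Speed 1), only bit 1 (2.5 GT/s) remains. With a zero Max Link Speed, the mask covers only the reserved bit 0, so the result is 0 instead of coming from an invalid GENMASK(0, 1).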

    Fixes: d2bd39c0456b ("PCI: Store all PCIe Supported Link Speeds")
    Closes: https://lore.kernel.org/r/70829798889c6d779ca0f6cd3260a765780d1369.camel@kernel.org
    Link: https://lore.kernel.org/r/fe03941e3e1cc42fb9bf4395e302bff53ee2198b.1734428762.git.lukas@wunner.de
    Reported-by: Niklas Schnelle <niks@kernel.org>
    Tested-by: Niklas Schnelle <niks@kernel.org>
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2025-03-20 10:33:58 -06:00
Myron Stowe 9512e6dbf7 PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: 665745f274870c921020f610e2c99a3b1613519b

commit 665745f274870c921020f610e2c99a3b1613519b
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Fri Oct 18 17:47:52 2024 +0300

    PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller

    This mostly reverts the commit b4c7d2076b ("PCI/LINK: Remove bandwidth
    notification"). An upcoming commit extends this driver, building the PCIe
    bandwidth controller on top of it.

    PCIe bandwidth notifications were first added in the commit e8303bb7a7
    ("PCI/LINK: Report degraded links via link bandwidth notification") but
    later had to be removed. The significant changes compared with the old
    bandwidth notification driver include:

    1) Don't print the notifications into kernel log, just keep the Link
       Speed cached in struct pci_bus updated. While somewhat unfortunate,
       the log spam was the source of complaints that eventually led to
       the removal of the bandwidth notifications driver (see the links
       below for further information).

    2) Besides the Link Bandwidth Management Interrupt, also enable Link
       Autonomous Bandwidth Interrupt to cover the other source of bandwidth
       changes.

    3) Handle Link Speed updates robustly. Refresh the cached Link Speed
       when enabling Bandwidth Notification Interrupts, and solve the race
       between Link Speed read and LBMS/LABS update in
       pcie_bwnotif_irq_thread().

    4) Use concurrency safe LNKCTL RMW operations.

    5) The driver is now called PCIe bwctrl (bandwidth controller) instead
       of just bandwidth notifications because of increased scope and
       functionality within the driver.

    6) Coexist with the Target Link Speed quirk in pcie_failed_link_retrain().
       Provide LBMS counting API for it.

    7) Tweaks to variable/function names for consistency and length reasons.

    Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus to
    keep track of PCIe Link Speed changes.
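The next sketch illustrates items 2 to 4 from the list above. It is a toy model, not the driver. Both interrupt enables are set with a clear-and-set read-modify-write. The status bits are then acknowledged (they are write-1-to-clear) before the Current Link Speed is sampled, so a speed change that lands after the acknowledgement raises a fresh interrupt instead of leaving the cache stale. The bit values follow the PCIe Link Control/Status layout. struct port, bw_enable(), bw_irq() and the other helpers are hypothetical, and the locking that the real RMW needs is only described in a comment.

```c
#include <assert.h>
#include <stdint.h>

#define LNKCTL_LBMIE 0x0400u /* Link Bandwidth Management Interrupt Enable */
#define LNKCTL_LABIE 0x0800u /* Link Autonomous Bandwidth Interrupt Enable */
#define LNKSTA_CLS   0x000fu /* Current Link Speed */
#define LNKSTA_LBMS  0x4000u /* Link Bandwidth Management Status (RW1C) */
#define LNKSTA_LABS  0x8000u /* Link Autonomous Bandwidth Status (RW1C) */

/* Toy model of one port; cached_speed stands in for cur_bus_speed. */
struct port {
	uint16_t lnkctl;
	uint16_t lnksta;
	uint8_t cached_speed;
	unsigned int lbms_count;
};

/* Read-modify-write. The kernel must serialize this because other code
 * writes LNKCTL too; this single-threaded toy needs no lock. */
static void lnkctl_clear_and_set(struct port *p, uint16_t clear, uint16_t set)
{
	p->lnkctl = (uint16_t)((p->lnkctl & ~clear) | set);
}

/* Write-1-to-clear: each 1 written clears that status bit. */
static void lnksta_write(struct port *p, uint16_t val)
{
	p->lnksta &= (uint16_t)~(val & (LNKSTA_LBMS | LNKSTA_LABS));
}

static void bw_irq(struct port *p)
{
	uint16_t events = p->lnksta & (LNKSTA_LBMS | LNKSTA_LABS);

	if (!events)
		return;
	if (events & LNKSTA_LBMS)
		p->lbms_count++; /* LBMS counting, as item 6 describes */
	/* Ack first, then sample the speed. */
	lnksta_write(p, events);
	p->cached_speed = p->lnksta & LNKSTA_CLS;
}

static void bw_enable(struct port *p)
{
	lnkctl_clear_and_set(p, 0, LNKCTL_LBMIE | LNKCTL_LABIE);
	lnksta_write(p, LNKSTA_LBMS | LNKSTA_LABS);
	p->cached_speed = p->lnksta & LNKSTA_CLS; /* refresh after clearing */
}
```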

    [bhelgaas: This is based on previous work by Alexandru Gagniuc
    <mr.nuke.me@gmail.com>; see e8303bb7a7 ("PCI/LINK: Report degraded links
    via link bandwidth notification")]

    Link: https://lore.kernel.org/r/20241018144755.7875-7-ilpo.jarvinen@linux.intel.com
    Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
    Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
    Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
    Suggested-by: Lukas Wunner <lukas@wunner.de> # Building bwctrl on top of bwnotif
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    [bhelgaas: squash fix to drop IRQF_ONESHOT and convert to hardirq handler:
    https://lore.kernel.org/r/20241115165717.15233-1-ilpo.jarvinen@linux.intel.com]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Stefan Wahren <wahrenst@gmx.net>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2025-03-20 10:33:57 -06:00
Myron Stowe 13bc7bd987 PCI: Store all PCIe Supported Link Speeds
JIRA: https://issues.redhat.com/browse/RHEL-81906
Upstream Status: d2bd39c0456b75be9dfc7d774b8d021355c26ae3

commit d2bd39c0456b75be9dfc7d774b8d021355c26ae3
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Fri Oct 18 17:47:49 2024 +0300

    PCI: Store all PCIe Supported Link Speeds

    The PCIe bandwidth controller added by a subsequent commit will require
    selecting PCIe Link Speeds that are lower than the Maximum Link Speed.

    The struct pci_bus only stores max_bus_speed. Although PCIe r6.1 sec 8.2.1
    currently disallows gaps in supported Link Speeds, the Implementation Note
    in PCIe r6.1 sec 7.5.3.18 recommends determining supported Link Speeds
    using the Supported Link Speeds Vector in the Link Capabilities 2 Register
    (when available) to "avoid software being confused if a future
    specification defines Links that do not require support for all slower
    speeds."

    Reuse code in pcie_get_speed_cap() to add pcie_get_supported_speeds() to
    query the Supported Link Speeds Vector of a PCIe device. The value is taken
    directly from the Supported Link Speeds Vector or synthesized from the Max
    Link Speed in the Link Capabilities Register when the Link Capabilities 2
    Register is not available.
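The sketch below shows the two sources described above: the vector is taken directly when Link Capabilities 2 provides one, and is otherwise synthesized from Max Link Speed. Bit 0 is kept as the reserved zero. It reflects only the behavior this commit describes, not the later Max Link Speed masking. The macro and function names are local stand-ins. The sketch assumes that a device without the vector reports at most 5 GT/s.

```c
#include <assert.h>
#include <stdint.h>

#define LNKCAP_MLS_MASK  0x0fu /* Max Link Speed, Link Capabilities */
#define LNKCAP_MLS_2_5GB 0x01u
#define LNKCAP_MLS_5_0GB 0x02u
#define LNKCAP2_SLS_MASK 0xfeu /* bits 7:1; bit 0 stays reserved (zero) */
#define SLS_2_5GB        0x02u
#define SLS_5_0GB        0x04u

static uint8_t get_supported_speeds(uint32_t lnkcap, uint32_t lnkcap2)
{
	uint8_t speeds = lnkcap2 & LNKCAP2_SLS_MASK;

	/* Link Capabilities 2 populated: use the vector as-is. */
	if (speeds)
		return speeds;

	/* Otherwise synthesize the vector from Max Link Speed. */
	switch (lnkcap & LNKCAP_MLS_MASK) {
	case LNKCAP_MLS_5_0GB:
		return SLS_5_0GB | SLS_2_5GB;
	case LNKCAP_MLS_2_5GB:
		return SLS_2_5GB;
	default:
		return 0; /* no usable Max Link Speed, e.g. an RCiEP */
	}
}
```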

    The Supported Link Speeds Vector in the Link Capabilities 2 Register
    corresponds to the bus below on Root Ports and Downstream Ports, whereas it
    corresponds to the bus above on Upstream Ports and Endpoints (PCIe r6.1 sec
    7.5.3.18):

      Supported Link Speeds Vector - This field indicates the supported Link
      speed(s) of the associated Port.

    Add supported_speeds into the struct pci_dev that caches the
    Supported Link Speeds Vector.

    supported_speeds contains a set of Link Speeds only in the case where PCIe
    Link Speed can be determined. Root Complex Integrated Endpoints do not have
    a well-defined Link Speed because they do not implement either of the Link
    Capabilities Registers, which is allowed by PCIe r6.1 sec 7.5.3 (the same
    limitation applies to determining cur_bus_speed and max_bus_speed that are
    PCI_SPEED_UNKNOWN in such cases). This is of no concern from the PCIe bandwidth
    controller's point of view because such devices are not attached to a PCIe
    Root Port that could be controlled.

    The supported_speeds field keeps the extra reserved zero at the least
    significant bit to match the Link Capabilities 2 Register layout.

    An attempt was made to store the supported_speeds field in the struct pci_bus
    as the intersection of both ends of the Link; however, the subordinate
    struct pci_bus is not available early enough. The Target Speed quirk (in
    pcie_failed_link_retrain()) can run either during initial scan or later,
    requiring it to use the API provided by the PCIe bandwidth controller to
    set the Target Link Speed in order to co-exist with the bandwidth
    controller. When the Target Speed quirk is calling the bandwidth controller
    during initial scan, the struct pci_bus is not yet initialized. As such,
    storing supported_speeds into the struct pci_bus is not viable.

    Suggested-by: Lukas Wunner <lukas@wunner.de>
    Link: https://lore.kernel.org/r/20241018144755.7875-4-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    [bhelgaas: move pcie_get_supported_speeds() decl to drivers/pci/pci.h]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2025-03-20 10:33:56 -06:00
Myron Stowe e5d50346c5 PCI: Fix pci_enable_acs() support for the ACS quirks
JIRA: https://issues.redhat.com/browse/RHEL-67693
Upstream Status: f3c3ccc4fe49dbc560b01d16bebd1b116c46c2b4

commit f3c3ccc4fe49dbc560b01d16bebd1b116c46c2b4
Author: Jason Gunthorpe <jgg@ziepe.ca>
Date:   Wed Oct 16 20:52:33 2024 -0300

    PCI: Fix pci_enable_acs() support for the ACS quirks

    There are ACS quirks that hijack the normal ACS processing and deliver it
    to special quirk code. The enable path needs to call
    pci_dev_specific_enable_acs() and then pci_dev_specific_acs_enabled() will
    report the hidden ACS state controlled by the quirk.

    The recent rework got this out of order and we should try to call
    pci_dev_specific_enable_acs() regardless of any actual ACS support in the
    device.

    As before, command line parameters that affect standard PCI ACS don't
    interact with the quirk versions, including the new config_acs= option.
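The hypothetical model below illustrates the ordering the fix restores. The device-specific hook runs before the ACS capability is checked, so quirked devices that lack a standard capability are still handled. The standard path (where command-line options would apply) runs only for devices with the capability. None of these names are kernel APIs. The rule that a handled quirk skips the standard enable is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; none of these names are kernel APIs. */
struct dev {
	bool has_quirk;     /* a device-specific ACS quirk exists */
	bool acs_cap;       /* a standard ACS capability is present */
	bool quirk_enabled; /* isolation state hidden behind the quirk */
	bool std_enabled;   /* standard ACS Control bits programmed */
};

/* Stand-in for the quirk hook: returns true if a quirk handled the device. */
static bool dev_specific_enable_acs(struct dev *d)
{
	if (!d->has_quirk)
		return false;
	d->quirk_enabled = true;
	return true;
}

static void enable_acs(struct dev *d)
{
	/* Call the quirk hook first, before looking for the ACS capability:
	 * a quirked device may have no standard ACS capability at all. */
	bool handled = dev_specific_enable_acs(d);

	if (!d->acs_cap)
		return;

	/* Standard path; command-line ACS settings would act only here. */
	if (!handled)
		d->std_enabled = true;
}
```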

    Link: https://lore.kernel.org/r/0-v1-f96b686c625b+124-pci_acs_quirk_fix_jgg@nvidia.com
    Fixes: 47c8846a49ba ("PCI: Extend ACS configurability")
    Reported-by: Jiri Slaby <jirislaby@kernel.org>
    Closes: https://lore.kernel.org/all/e89107da-ac99-4d3a-9527-a4df9986e120@kernel.org
    Closes: https://bugzilla.suse.com/show_bug.cgi?id=1229019
    Tested-by: Steffen Dirkwinkel <me@steffen.cc>
    Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2025-02-18 09:48:10 -07:00
Myron Stowe d23dc59b04 PCI: Pass domain number to pci_bus_release_domain_nr() explicitly
JIRA: https://issues.redhat.com/browse/RHEL-67693
Upstream Status: 0cca961a026177af69044f10d6ae76d8ce043764

commit 0cca961a026177af69044f10d6ae76d8ce043764
Author: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Date:   Thu Sep 12 11:00:25 2024 +0530

    PCI: Pass domain number to pci_bus_release_domain_nr() explicitly

    The pci_bus_release_domain_nr() API is supposed to free the domain
    number allocated by pci_bus_find_domain_nr(). Most of the callers of
    pci_bus_find_domain_nr() store the domain number in pci_bus::domain_nr.

    As such, pci_bus_release_domain_nr() implicitly frees the domain
    number by dereferencing 'struct pci_bus'. However, one of the callers
    of this API, the PCI endpoint subsystem, doesn't have 'struct pci_bus',
    so it only passes NULL. Due to this, the API will end up dereferencing
    the NULL pointer.

    To fix this issue, pass the domain number to this API explicitly. Since
    'struct pci_bus' is not used for anything else other than extracting the
    domain number, it makes sense to pass the domain number directly.
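The toy allocator below illustrates the interface change. The release function takes the number itself, so a caller with no bus object (such as the endpoint subsystem) has nothing to dereference. The 64-entry bitmap and the names domain_nr_alloc() and domain_nr_release() are hypothetical stand-ins, not the kernel's ID allocator.

```c
#include <assert.h>
#include <stdint.h>

/* Toy domain-number allocator: bit N set means domain N is in use. */
static uint64_t domain_map;

static int domain_nr_alloc(void)
{
	for (int nr = 0; nr < 64; nr++) {
		if (!(domain_map & (UINT64_C(1) << nr))) {
			domain_map |= UINT64_C(1) << nr;
			return nr;
		}
	}
	return -1; /* exhausted */
}

/* Before the fix, the equivalent took a bus pointer and read its
 * domain_nr, which crashes when the caller can only pass NULL. */
static void domain_nr_release(int nr)
{
	if (nr >= 0 && nr < 64)
		domain_map &= ~(UINT64_C(1) << nr);
}
```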

    Fixes: 0328947c5032 ("PCI: endpoint: Assign PCI domain number for endpoint controllers")
    Closes: https://lore.kernel.org/linux-pci/c0c40ddb-bf64-4b22-9dd1-8dbb18aa2813@stanley.mountain
    Link: https://lore.kernel.org/linux-pci/20240912053025.25314-1-manivannan.sadhasivam@linaro.org
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
    [kwilczynski: commit log]
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2025-02-18 09:48:10 -07:00
Myron Stowe 731f98d0ae PCI: Rename CRS Completion Status to RRS
JIRA: https://issues.redhat.com/browse/RHEL-67693
Upstream Status: 87f10faf166a9114aa0d4132298cad379de16fdd

commit 87f10faf166a9114aa0d4132298cad379de16fdd
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Aug 27 18:48:48 2024 -0500

    PCI: Rename CRS Completion Status to RRS

    PCIe r6.0 changed the abbreviation for "Configuration Request Retry Status"
    Completion Status from "CRS" to "RRS" and uses the terminology of
    "Configuration RRS Software Visibility" instead of "CRS Software
    Visibility".

    Align the Linux usage with the r6.0 spec language.  No functional change
    intended.

    It's confusing to make this change, but I think "RRS" *is* a better
    abbreviation because it was easy to interpret "CRS" as "Completion Retry
    Status", which really didn't make any sense.

    Link: https://lore.kernel.org/r/20240827234848.4429-4-helgaas@kernel.org
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2025-02-17 12:01:29 -07:00
Rado Vrbovsky 2ba815bf62 Merge: PCI/ASPM: PCIe link training fixes
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6055

```
JIRA: https://issues.redhat.com/browse/RHEL-71363

This series includes a set of key fixes related to PCIe's link training from
upstream v6.12.

Signed-off-by: Myron Stowe <mstowe@redhat.com>
```

Approved-by: Charles Mirabile <cmirabil@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2025-01-23 13:14:39 +00:00
Myron Stowe 5f7319dc18 PCI: Wait for Link before restoring Downstream Buses
JIRA: https://issues.redhat.com/browse/RHEL-71363
Upstream Status: 3e40aa29d47e231a54640addf6a09c1f64c5b63f

commit 3e40aa29d47e231a54640addf6a09c1f64c5b63f
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Thu Aug 8 15:17:07 2024 +0300

    PCI: Wait for Link before restoring Downstream Buses

    __pci_reset_bus() calls pci_bridge_secondary_bus_reset() to perform the
    reset and also waits for the Secondary Bus to become accessible again.
    __pci_reset_bus() then calls pci_bus_restore_locked() that restores the PCI
    devices connected to the bus, and if necessary, also recursively restores
    the subordinate buses and their devices.

    The logic in pci_bus_restore_locked() does not take into account that after
    restoring a device on one level, there might be another Link Downstream
    that can only start to come up after restore has been performed for its
    Downstream Port device. That is, the Link may require additional wait until
    it becomes accessible.

    Similarly, pci_slot_restore_locked() lacks a wait.

    Amend pci_bus_restore_locked() and pci_slot_restore_locked() to wait for
    the Secondary Bus before recursively performing the restore of that bus.
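The recursion described above is sketched below. Each device on a bus is restored, and for a bridge, the code waits for its secondary bus before descending into it. Restoring the Downstream Port is what lets that Link start training. The topology structs and the restore_dev(), wait_for_link() and bus_restore() helpers are hypothetical. They record a trace string so the ordering is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical topology: a device may be a bridge with a secondary bus. */
struct bus;
struct pdev {
	const char *name;
	struct bus *subordinate; /* NULL unless this is a bridge */
};
struct bus {
	struct pdev *devs[4];
	size_t ndevs;
};

static char trace[256]; /* order of operations, for illustration */

static void log_op(const char *op, const char *name)
{
	strcat(trace, op);
	strcat(trace, name);
	strcat(trace, " ");
}

static void restore_dev(struct pdev *d) { log_op("restore:", d->name); }
static void wait_for_link(struct pdev *bridge) { log_op("wait:", bridge->name); }

static void bus_restore(struct bus *bus)
{
	for (size_t i = 0; i < bus->ndevs; i++) {
		struct pdev *d = bus->devs[i];

		restore_dev(d);
		if (d->subordinate) {
			/* The Link below comes up only after this restore. */
			wait_for_link(d);
			bus_restore(d->subordinate);
		}
	}
}
```

With a Downstream Port "dsp" whose secondary bus holds an Endpoint "ep", the trace reads `restore:dsp wait:dsp restore:ep`.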

    Fixes: 090a3c5322 ("PCI: Add pci_reset_slot() and pci_reset_bus()")
    Link: https://lore.kernel.org/r/20240808121708.2523-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-12-18 08:00:17 -07:00
Myron Stowe 2844a0051c PCI: Use an error code with PCIe failed link retraining
JIRA: https://issues.redhat.com/browse/RHEL-71363
Upstream Status: 59100eb248c0b15585affa546c7f6834b30eb5a4

commit 59100eb248c0b15585affa546c7f6834b30eb5a4
Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date:   Fri Aug 9 14:25:02 2024 +0100

    PCI: Use an error code with PCIe failed link retraining

    Given how the call site in pcie_wait_for_link_delay() got structured now,
    and that pcie_retrain_link() returns a potentially useful error code,
    convert pcie_failed_link_retrain() to return an error code rather than a
    boolean status, fixing handling at the call site mentioned.  Update the
    other call site accordingly.
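The sketch below shows the shape of the conversion. The function returns 0 only when the link was retrained, and otherwise forwards a negative errno from the retrain helper instead of collapsing it to false. struct link, retrain_link() and failed_link_retrain() are hypothetical. Returning -ENOTTY when the workaround does not apply is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical link state, only to drive the example. */
struct link {
	bool quirk_applies; /* does the failed-training workaround apply? */
	bool retrain_ok;    /* would a retrain succeed? */
};

/* Stand-in for a retrain helper that already reports an error code. */
static int retrain_link(const struct link *l)
{
	return l->retrain_ok ? 0 : -ETIMEDOUT;
}

/* 0 only when the link was retrained; otherwise a negative errno. */
static int failed_link_retrain(const struct link *l)
{
	if (!l->quirk_applies)
		return -ENOTTY; /* assumption: "not applicable" is an error too */
	return retrain_link(l); /* forward the reason, not just a bool */
}
```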

    Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
    Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091156530.61955@angie.orcam.me.uk
    Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Link: https://lore.kernel.org/r/aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com/
    Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Cc: <stable@vger.kernel.org> # v6.5+

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-12-18 07:59:58 -07:00
Myron Stowe 916f943e31 PCI: Clear the LBMS bit after a link retrain
JIRA: https://issues.redhat.com/browse/RHEL-71363
Upstream Status: 8037ac08c2bbb3186f83a5a924f52d1048dbaec5

commit 8037ac08c2bbb3186f83a5a924f52d1048dbaec5
Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date:   Fri Aug 9 14:24:46 2024 +0100

    PCI: Clear the LBMS bit after a link retrain

    The LBMS bit, where implemented, is set by hardware either in response
    to the completion of retraining caused by writing 1 to the Retrain Link
    bit or whenever hardware has changed the link speed or width in an attempt
    to correct unreliable link operation.  It is never cleared by hardware
    other than by software writing 1 to the bit position in the Link Status
    register and we never do such a write.

    We currently have two places, namely apply_bad_link_workaround() and
    pcie_failed_link_retrain() in drivers/pci/controller/dwc/pcie-tegra194.c
    and drivers/pci/quirks.c respectively, where we check the state of the LBMS
    bit. Neither is interested in the state of the bit resulting from the
    completion of retraining; both check for a link fault.

    In particular, pcie_failed_link_retrain() consequently causes issues by
    trying to retrain a link where there is no longer a downstream device,
    because the LBMS bit is still set from when a device, since removed, was
    present downstream.

    Clear the LBMS bit then at the conclusion of pcie_retrain_link(), so that
    we have a single place that controls it and that our code can track link
    speed or width changes resulting from unreliable link operation.
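The toy model below shows the write-1-to-clear behavior that makes this work. Hardware sets LBMS when a requested retrain completes and never clears it, so the retrain routine writes 1 to LBMS at the end. A later check then sees only changes made by hardware on its own. The bit values follow the PCIe Link Control/Status layout. struct port and the helper functions are hypothetical, and the toy retrain completes instantly.

```c
#include <assert.h>
#include <stdint.h>

#define LNKCTL_RL   0x0020u /* Retrain Link */
#define LNKSTA_LT   0x0800u /* Link Training */
#define LNKSTA_LBMS 0x4000u /* Link Bandwidth Management Status (RW1C) */

/* Toy port registers. */
struct port { uint16_t lnkctl, lnksta; };

/* Hardware side: training finishes and LBMS is set as a side effect. */
static void hw_complete_retrain(struct port *p)
{
	p->lnkctl &= (uint16_t)~LNKCTL_RL;
	p->lnksta &= (uint16_t)~LNKSTA_LT;
	p->lnksta |= LNKSTA_LBMS;
}

/* Writing 1 clears LBMS; writing 0 leaves it alone. */
static void write_lnksta(struct port *p, uint16_t val)
{
	p->lnksta &= (uint16_t)~(val & LNKSTA_LBMS);
}

static int retrain_link(struct port *p)
{
	p->lnkctl |= LNKCTL_RL;
	p->lnksta |= LNKSTA_LT;
	hw_complete_retrain(p); /* a real port takes time; the toy is instant */

	/* Drop the LBMS caused by this software-requested retrain. */
	write_lnksta(p, LNKSTA_LBMS);
	return 0;
}
```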

    Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
    Link: https://lore.kernel.org/r/alpine.DEB.2.21.2408091133140.61955@angie.orcam.me.uk
    Reported-by: Matthew W Carlis <mattc@purestorage.com>
    Link: https://lore.kernel.org/r/20240806000659.30859-1-mattc@purestorage.com/
    Link: https://lore.kernel.org/r/20240722193407.23255-1-mattc@purestorage.com/
    Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Cc: <stable@vger.kernel.org> # v6.5+

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-12-18 07:59:22 -07:00
Myron Stowe 7307178813 PCI: Wait for device readiness with Configuration RRS
JIRA: https://issues.redhat.com/browse/RHEL-71363
Upstream Status: d591f6804e7e1310881c9224d72247a2b65039af

commit d591f6804e7e1310881c9224d72247a2b65039af
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Aug 27 18:48:46 2024 -0500

    PCI: Wait for device readiness with Configuration RRS

    After a device reset, delays are required before the device can
    successfully complete config accesses.  PCIe r6.0, sec 6.6, specifies some
    delays required before software can perform config accesses.  Devices that
    require more time after those delays may respond to config accesses with
    Configuration Request Retry Status (RRS) completions.

    Callers of pci_dev_wait() are responsible for delays until the device can
    respond to config accesses.  pci_dev_wait() waits any additional time until
    the device can successfully complete config accesses.

    Reading config space of devices that are not present or not ready typically
    returns ~0 (PCI_ERROR_RESPONSE).  Previously we polled the Command register
    until we got a value other than ~0.  This is sometimes a problem because
    Root Complex handling of RRS completions may include several retries and
    implementation-specific behavior that is invisible to software (see sec
    2.3.2), so the exponential backoff in pci_dev_wait() may not work as
    intended.

    Linux enables Configuration RRS Software Visibility on all Root Ports that
    support it.  If it is enabled, read the Vendor ID instead of the Command
    register.  RRS completions cause immediate return of the 0x0001 reserved
    Vendor ID value, so the pci_dev_wait() backoff works correctly.

    When a read of Vendor ID eventually completes successfully by returning a
    non-0x0001 value (the Vendor ID or 0xffff for VFs), the device should be
    initialized and ready to respond to config requests.

    For conventional PCI devices or devices below Root Ports that don't support
    Configuration RRS Software Visibility, poll the Command register as before.
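The polling strategy above is sketched below. The Vendor ID is read repeatedly. The reserved value 0x0001 means an RRS completion, so the code backs off exponentially, and any other value means the device is ready. struct sim_dev, read_vendor_id() and wait_ready() are hypothetical. The delay is accumulated rather than slept so the sketch runs instantly, and the exact timeout rule is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define RRS_VENDOR_ID 0x0001u /* reserved value returned on an RRS completion */

/* Hypothetical device that answers with RRS for a number of reads. */
struct sim_dev { int rrs_left; uint16_t vendor; };

static uint16_t read_vendor_id(struct sim_dev *d)
{
	if (d->rrs_left > 0) {
		d->rrs_left--;
		return RRS_VENDOR_ID;
	}
	return d->vendor;
}

/* Returns the total simulated delay in ms, or -1 on timeout. */
static int wait_ready(struct sim_dev *d, int timeout_ms)
{
	int waited = 0, delay = 1;

	for (;;) {
		if (read_vendor_id(d) != RRS_VENDOR_ID)
			return waited; /* real Vendor ID, or 0xffff for a VF */
		if (waited > timeout_ms)
			return -1;
		waited += delay; /* real code would sleep for 'delay' ms here */
		delay *= 2;
	}
}
```

A device that returns RRS three times is ready after 1 + 2 + 4 = 7 ms of simulated backoff.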

    This was developed independently, but is very similar to Stanislav
    Spassov's previous work at
    https://lore.kernel.org/linux-pci/20200223122057.6504-1-stanspas@amazon.com

    Link: https://lore.kernel.org/r/20240827234848.4429-2-helgaas@kernel.org
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Duc Dang <ducdang@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-12-18 07:59:11 -07:00
Robert Foss c7bc023366 PM: runtime: Simplify pm_runtime_get_if_active() usage
JIRA: https://issues.redhat.com/browse/RHEL-53569
Upstream Status: v6.9-rc1

Conflicts:
	Conflicts due to whitespace changes from the DRM v6.9 backport
        drivers/gpu/drm/i915/intel_runtime_pm.c

	0d08026ac609 ("net: ipa: kill ipa_clock_get_additional()")
        drivers/net/ipa/ipa_smp2p.c

	d3fcd7360338 ("PCI: Fix runtime PM race with PME polling")
        drivers/pci/pci.c

commit c0ef3df8dbaef51ee4cfd58a471adf2eaee6f6b3
Author:     Sakari Ailus <sakari.ailus@linux.intel.com>
AuthorDate: Tue Jan 30 13:28:05 2024 +0200
Commit:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CommitDate: Mon Feb 12 16:57:47 2024 +0100

    There are two ways to opportunistically increment a device's runtime PM
    usage count, calling either pm_runtime_get_if_active() or
    pm_runtime_get_if_in_use(). The former has an argument to tell whether to
    ignore the usage count or not, and the latter simply calls the former with
    ign_usage_count set to false. The other users that want to ignore the
    usage_count will have to explicitly set that argument to true, which is a
    bit cumbersome.

    To make this function more practical to use, remove the ign_usage_count
    argument from the function. The main implementation is in a static
    function called pm_runtime_get_conditional() and implementations of
    pm_runtime_get_if_active() and pm_runtime_get_if_in_use() are moved to
    runtime.c.
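The refactoring pattern described above is sketched below as a toy model. One static helper keeps the flag, and two thin wrappers expose the common cases without it. struct toy_dev and its fields are illustrative and are not struct dev_pm_info. The real code also holds the device's power lock, which this sketch omits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy device; field names are illustrative. */
struct toy_dev {
	int disable_depth; /* > 0 means runtime PM is disabled */
	bool active;       /* runtime status is "active" */
	int usage_count;
};

static int get_conditional(struct toy_dev *d, bool ign_usage_count)
{
	if (d->disable_depth > 0)
		return -EINVAL;
	if (!d->active)
		return 0;
	/* "In use" additionally requires an existing reference. */
	if (!ign_usage_count && d->usage_count == 0)
		return 0;
	d->usage_count++;
	return 1;
}

/* Public wrappers no longer take the flag. */
static int get_if_active(struct toy_dev *d) { return get_conditional(d, true); }
static int get_if_in_use(struct toy_dev *d) { return get_conditional(d, false); }
```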

    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Reviewed-by: Alex Elder <elder@linaro.org>
    Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Acked-by: Takashi Iwai <tiwai@suse.de> # sound/
    Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> # drivers/accel/ivpu/
    Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> # drivers/gpu/drm/i915/
    Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Signed-off-by: Robert Foss <rfoss@redhat.com>
2024-12-17 22:59:19 +01:00
Rado Vrbovsky 191f608532 Merge: PCI: ACS updates
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5246

```
JIRA: https://issues.redhat.com/browse/RHEL-48601

Signed-off-by: Myron Stowe <mstowe@redhat.com>

```

Approved-by: John W. Linville <linville@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-12-09 08:21:20 +00:00
Myron Stowe 854d83025c PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps()
JIRA: https://issues.redhat.com/browse/RHEL-65598
Upstream Status: 100ae5d77f07f9f046106e228778c7aa1c6d3af3

commit 100ae5d77f07f9f046106e228778c7aa1c6d3af3
Author: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Date:   Wed Jun 19 20:41:12 2024 +0530

    PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps()

    Move the switch statement in pcie_link_speed_mbps() into a new function
    in the header file so that it can be used in other places, such as
    controller drivers.
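A header-style helper of the kind described above might look like the sketch below. It maps each PCIe generation's transfer rate to a per-lane figure in Mb/s and returns -EINVAL for anything else. The local enum stands in for the kernel's enum pci_bus_speed. The assumption that the table maps GT/s to Mb/s one-for-one, without subtracting encoding overhead, is this sketch's own.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-in for the kernel's PCIe speed enum. */
enum link_speed {
	SPEED_2_5GT, SPEED_5_0GT, SPEED_8_0GT,
	SPEED_16_0GT, SPEED_32_0GT, SPEED_64_0GT, SPEED_UNKNOWN
};

/* Per-lane rate in Mb/s; a static inline function so it can live in a header. */
static inline int speed_mbps(enum link_speed s)
{
	switch (s) {
	case SPEED_2_5GT:  return 2500;
	case SPEED_5_0GT:  return 5000;
	case SPEED_8_0GT:  return 8000;
	case SPEED_16_0GT: return 16000;
	case SPEED_32_0GT: return 32000;
	case SPEED_64_0GT: return 64000;
	default:           return -EINVAL;
	}
}
```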

    Link: https://lore.kernel.org/linux-pci/20240619-opp_support-v15-3-aa769a2173a3@quicinc.com
    Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-11-04 15:44:21 -07:00
Rado Vrbovsky 14b4cc02eb Merge: BPF 6.9 rebase
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5142

Rebase BPF subsystem to upstream version 6.9

JIRA: https://issues.redhat.com/browse/RHEL-23649

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

Approved-by: Viktor Malik <vmalik@redhat.com>
Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Mark Salter <msalter@redhat.com>
Approved-by: Toke Høiland-Jørgensen <toke@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:25:08 +00:00
Rado Vrbovsky aae21e3edb Merge: Update CXL subsystem with content from v6.10
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5049

Back-port kernel's CXL subsystem core content from upstream v6.10

Notably excluded is "cxl/dax: Create dax devices for CXL RAM regions" (09d09e04d2fc).

Also, the memory tiering code is updated to match v6.10.

## Approved Development Ticket
JIRA: https://issues.redhat.com/browse/RHEL-54609    
Depends: !4961 

Signed-off-by: John W. Linville <linville@redhat.com>

Approved-by: Chris von Recklinghausen <crecklin@redhat.com>
Approved-by: Jeff Moyer <jmoyer@redhat.com>
Approved-by: Myron Stowe <mstowe@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-30 07:20:45 +00:00
Rado Vrbovsky 67448d15b8 Merge: Update kernel's PCI subsystem to v6.11
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5357

```
This series updates RHEL9's PCI subsystem with content from upstream v6.11 -

  Merge tag 'pci-v6.11-fixes-4' of git://git.kernel.org/pub/scm/../pci/pci
  https://lkml.org/lkml/2024/9/13/
  commit b7718454f937f50f44f98c1222f5135eaef29132
  Merge: e936e7d4a83b fc8c818e7569

  Merge tag 'pci-v6.11-fixes-3' of git://git.kernel.org/pub/scm/../pci/pci
  https://lkml.org/lkml/2024/9/6/1405
  commit 487ee43bac846446fb3e832436bdedd7acb4fe46
  Merge: a86b83f77797 8f62819aaace
  4 files changed, 44 insertions(+), 5 deletions(-)

  Merge tag 'pci-v6.11-fixes-2' of git://git.kernel.org/pub/scm/../pci/pci
  https://lkml.org/lkml/2024/8/30/1561
  commit 8101b2766d5bfee43a4de737107b9592db251470
  Merge: 216d163165a9 150b572a7c1d
  3 files changed, 21 insertions(+), 2 deletions(-)


  Merge tag 'pci-v6.11-fixes-1' of git://git.kernel.org/pub/scm/../pci/pci
  https://lkml.org/lkml/2024/8/1/1278
  commit c0ecd6388360d930440cc5554026818895199923
  Merge: 183d46ff422e 5560a612c20d
  2 files changed, 11 insertions(+), 8 deletions(-)

  Merge tag 'pci-v6.11-changes' of git://git.kernel.org/pub/scm/../pci/pci
  https://lkml.org/lkml/2024/7/19/844
  commit 3f386cb8ee9f04ff4be164ca7a1d0ef3f81f7374
  Merge: 8e5c0abfa02d 45659274e608
  105 files changed, 5208 insertions(+), 1932 deletions(-)


All but three of the patches within the series back-ported cleanly.  However,
there were a few back-ports where changes were made to the originating
upstream patch, either because it was not quite up to date with more recent
changes or because subsequent changes were made during its merge commit.  All
such occurrences are noted in the back-port's commit message, and the same
changes that occurred upstream were made in the back-port to keep things in sync.

v2: Removing back-ports of merge commit df5dd337283a "Merge branch
    'pci/controller/qcom'" due to prerequisite content that conflicts
    with other MRs.  Will create a separate MR for df5dd337283a once
    the dependent MRs have merged.

JIRA: https://issues.redhat.com/browse/RHEL-59033

Signed-off-by: Myron Stowe <mstowe@redhat.com>
```

Approved-by: Andrew Halaney <ahalaney@redhat.com>
Approved-by: Jarod Wilson <jarod@redhat.com>
Approved-by: Mika Penttilä <mpenttil@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Ivan Vecera <ivecera@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-10-19 08:16:08 +00:00
Jerome Marchand 33482c3f06 mm: Introduce vmap_page_range() to map pages in PCI address space
JIRA: https://issues.redhat.com/browse/RHEL-23649

Conflicts: There is no loongarch arch on RHEL-9 kernel.

commit d7bca9199a27b8690ae1c71dc11f825154af7234
Author: Alexei Starovoitov <ast@kernel.org>
Date:   Fri Mar 8 09:12:54 2024 -0800

    mm: Introduce vmap_page_range() to map pages in PCI address space

    ioremap_page_range() should be used for ranges within vmalloc range only.
    The vmalloc ranges are allocated by get_vm_area(). PCI has "resource"
    allocator that manages PCI_IOBASE, IO_SPACE_LIMIT address range, hence
    introduce vmap_page_range() to be used exclusively to map pages
    in PCI address space.

    Fixes: 3e49a866c9dc ("mm: Enforce VM_IOREMAP flag and range in ioremap_page_range.")
    Reported-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Tested-by: Miguel Ojeda <ojeda@kernel.org>
    Link: https://lore.kernel.org/bpf/CANiq72ka4rir+RTN2FQoT=Vvprp_Ao-CvoYEkSNqtSY+RZj+AA@mail.gmail.com

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
2024-10-15 10:49:14 +02:00
John W. Linville b0fc7cbc66 PCI/CXL: Add 'cxl_bus' reset method for devices below CXL Ports
JIRA: https://issues.redhat.com/browse/RHEL-54609

By default Secondary Bus Reset (SBR) is masked for CXL Ports (see CXL r3.1,
sec 8.1.5.2).

Add cxl_reset_bus_function() (method "cxl_bus") to set the "Unmask SBR" bit
in the upstream CXL Port before performing the bus reset and restore the
original value afterwards.

This method allows the user to perform a bus reset on a CXL device without
needing to set the "Unmask SBR" bit via a user tool.
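The save/unmask/reset/restore sequence described above can be illustrated with a minimal userspace C sketch. The register is modeled as a plain struct field. The names `hyp_cxl_port`, `hyp_secondary_bus_reset()` and `hyp_cxl_reset_bus()`, and the `HYP_UNMASK_SBR` bit value, are hypothetical and are not the kernel's definitions. See CXL r3.1 sec 8.1.5.2 for the real register layout.

```c
#include <stdint.h>

/* Hypothetical "Unmask SBR" bit; the real layout is defined by CXL r3.1. */
#define HYP_UNMASK_SBR 0x1u

/* Hypothetical model of the upstream CXL Port's relevant state. */
struct hyp_cxl_port {
    uint16_t port_ctl;     /* register copy holding the Unmask SBR bit */
    int      resets_seen;  /* counts bus resets that actually took effect */
};

/* A Secondary Bus Reset only has an effect while Unmask SBR is set. */
static void hyp_secondary_bus_reset(struct hyp_cxl_port *port)
{
    if (port->port_ctl & HYP_UNMASK_SBR)
        port->resets_seen++;
}

/* Sketch of a "cxl_bus"-style reset: unmask SBR, reset, then restore the
 * original register value so the user's configuration is preserved. */
static int hyp_cxl_reset_bus(struct hyp_cxl_port *port)
{
    uint16_t saved = port->port_ctl;

    port->port_ctl = (uint16_t)(saved | HYP_UNMASK_SBR);
    hyp_secondary_bus_reset(port);
    port->port_ctl = saved;
    return 0;
}
```

Restoring the saved value, rather than clearing the bit, keeps the reset from undoing a setting the user may have made deliberately.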

Link: https://lore.kernel.org/r/20240502165851.1948523-5-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[bhelgaas: simplify commit log, invert condition to avoid negation]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit 53c49b6e6dd2ebc1d3257ae838e067699229bc8d)
Signed-off-by: John W. Linville <linville@redhat.com>
2024-10-07 14:03:30 -04:00
John W. Linville 9c6c2e14df PCI/CXL: Fail bus reset if upstream CXL Port has SBR masked
JIRA: https://issues.redhat.com/browse/RHEL-54609

Per CXL spec r3.1, sec 8.1.5.2, the Secondary Bus Reset (SBR) bit in the
Bridge Control register of a CXL port has no effect unless the "Unmask SBR"
bit is set.

Return -ENOTTY if we attempt a bus reset on a device below a CXL Port where
"Unmask SBR" is 0.  Otherwise, the bus reset would appear to have succeeded
even though setting the bridge SBR bit had no effect.
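The check described above reduces to a single early return. The following is a minimal sketch under stated assumptions: the `hyp_port` struct, the `HYP_UNMASK_SBR` bit value and `hyp_bus_reset()` are hypothetical stand-ins for the upstream Port's control register and the kernel's reset path.

```c
#include <errno.h>
#include <stdint.h>

#define HYP_UNMASK_SBR 0x1u  /* hypothetical "Unmask SBR" bit */

/* Hypothetical model of the upstream CXL Port. */
struct hyp_port {
    uint16_t port_ctl;  /* copy of the Port control register */
    int      resets;    /* bus resets that actually took effect */
};

/* Refuse the reset with -ENOTTY instead of reporting success for an
 * SBR that the hardware would silently ignore. */
static int hyp_bus_reset(struct hyp_port *upstream)
{
    if (!(upstream->port_ctl & HYP_UNMASK_SBR))
        return -ENOTTY;
    upstream->resets++;
    return 0;
}
```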

Link: https://lore.kernel.org/linux-cxl/20240220203956.GA1502351@bhelgaas/
Link: https://lore.kernel.org/r/20240502165851.1948523-4-dave.jiang@intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
[bhelgaas: simplify commit log and comments]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit b1956e2d0713e210a56ae65ad3488ae36f833e76)
Signed-off-by: John W. Linville <linville@redhat.com>
2024-10-07 14:03:30 -04:00
John W. Linville 9fcbacec86 cxl: Calculate and store PCI link latency for the downstream ports
JIRA: https://issues.redhat.com/browse/RHEL-54609

The latency is calculated by dividing the flit size by the bandwidth. Add
support to retrieve the flit size for the CXL switch device and calculate
the latency of the PCIe link. Cache the latency number with cxl_dport.
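The division above can be illustrated as unit arithmetic. With the flit size in bytes and the bandwidth in megabytes per second, bytes / (MB/s) gives microseconds, and scaling by 10^6 gives picoseconds. This is a hedged sketch. The helper name `hyp_link_latency_ps()`, the choice of megabits per second as input, and picoseconds as output are assumptions, not the kernel's interface.

```c
#include <stdint.h>

/* Hypothetical helper: time to move one flit across the link.
 * flit_bytes: flit size in bytes (e.g. 68 or 256, depending on mode)
 * link_mbps:  link bandwidth in megabits per second
 * Returns the latency in picoseconds, or -1 if the bandwidth is zero. */
static int64_t hyp_link_latency_ps(int64_t flit_bytes, int64_t link_mbps)
{
    int64_t mbytes_per_s = link_mbps / 8;   /* bits -> bytes */

    if (mbytes_per_s <= 0)
        return -1;
    /* bytes / (MB/s) = microseconds; multiply by 10^6 for picoseconds */
    return flit_bytes * 1000000 / mbytes_per_s;
}
```

For example, a 256-byte flit over a hypothetical 32000 Mb/s (4000 MB/s) link gives 256 * 10^6 / 4000 = 64000 ps.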

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/170319621931.2212653.6800240203604822886.stgit@djiang5-mobl3
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from commit 4d07a05397c8c15c37c8c3abb7afaea1dcd2f0e7)
Signed-off-by: John W. Linville <linville@redhat.com>
2024-10-07 13:43:50 -04:00
Myron Stowe 1f804955d7 PCI: Warn on missing cfg_access_lock during secondary bus reset
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 920f6468924f8dc7e0e6e1510d000888592ef861

Conflict(s):
  There isn't a conflict per se; the upstream patch was based on code
  that predates commit c9d52fb313d3 "PCI: Revert the cfg_access_lock
  lockdep mechanism".  Because commit c9d52fb313d3 was already in place
  before this patch, the patch doesn't apply cleanly.

commit 920f6468924f8dc7e0e6e1510d000888592ef861
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Thu May 30 18:04:29 2024 -0700

    PCI: Warn on missing cfg_access_lock during secondary bus reset

    The recent adventure with adding lockdep tracking for cfg_access_lock,
    while it yielded many false positives [1], did catch a true positive in the
    pci_reset_bus() path [2].

    So, while lockdep is difficult to deploy, open coding a check that
    cfg_access_lock is held during the reset is feasible.

    While this does not offer a full backtrace, it should be sufficient to
    implicate the caller of pci_bridge_secondary_bus_reset() as a path that
    needs investigation.
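The open-coded check described above amounts to testing the lock state at the point of the reset and warning once per offending call. The following is a minimal sketch under stated assumptions. A per-device boolean (`cfg_access_blocked`) stands in for "cfg_access_lock is held". The caller is passed in as a string, whereas the kernel could report a return address instead. All `hyp_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: a flag standing in for "cfg_access_lock is held". */
struct hyp_bridge {
    bool cfg_access_blocked;
    int  unlocked_resets;   /* how many times the check fired */
};

/* No lockdep: just check the lock state and name the caller. */
static void hyp_secondary_bus_reset(struct hyp_bridge *br, const char *caller)
{
    if (!br->cfg_access_blocked) {
        br->unlocked_resets++;
        fprintf(stderr, "unlocked secondary bus reset via: %s\n", caller);
    }
    /* ... assert and deassert the bridge's SBR bit here ... */
}
```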

    Link: https://lore.kernel.org/r/171711746953.1628941.4692125082286867825.stgit@dwillia2-xfh.jf.intel.com
    Link: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html [1]
    Link: http://lore.kernel.org/r/cfb50601-5d2a-4676-a958-1bd3f1b06654@intel.com [2]
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Hans de Goede <hdegoede@redhat.com>
    Tested-by: Kalle Valo <kvalo@kernel.org>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 13:25:26 -06:00
Myron Stowe 2df1e4bea1 PCI: Fix devres regression in pci_intx()
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 00f89ae4e759a7eef07e4188e1534af7dd2c7e9c

commit 00f89ae4e759a7eef07e4188e1534af7dd2c7e9c
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Thu Jul 25 14:07:30 2024 +0200

    PCI: Fix devres regression in pci_intx()

    pci_intx() becomes managed if pcim_enable_device() has been called in
    advance. Commit 25216afc9db5 ("PCI: Add managed pcim_intx()") changed this
    behavior so that pci_intx() always leads to creation of a separate device
    resource for itself, whereas earlier, a shared resource was used for all
    PCI devres operations.

    Unfortunately, pci_intx() seems to be used in some drivers' remove() paths;
    in the managed case this causes a device resource to be created on driver
    detach, which causes .probe() to fail if the driver is reloaded:

      pci 0000:00:1f.2: Resources present before probing

    Fix the regression by only redirecting pci_intx() to its managed twin
    pcim_intx() if the pci_command changes.
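The fix can be sketched as "compute the new Command value first, and take the managed path only if it differs". In this minimal model, 0x0400 is the INTx Disable bit (bit 10) of the PCI Command register. The `hyp_pdev` struct, its devres counter and `hyp_intx()` are hypothetical stand-ins for struct pci_dev and the pcim_intx() path.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_INTX_DISABLE 0x0400u  /* INTx Disable, PCI Command bit 10 */

/* Hypothetical device model: Command register plus a count of device
 * resources registered on its behalf. */
struct hyp_pdev {
    uint16_t command;
    int      devres_count;
};

/* Only take the managed path (which registers a device resource) if the
 * Command register actually has to change. */
static void hyp_intx(struct hyp_pdev *pdev, bool enable)
{
    uint16_t new_cmd = enable ? (uint16_t)(pdev->command & ~CMD_INTX_DISABLE)
                              : (uint16_t)(pdev->command | CMD_INTX_DISABLE);

    if (new_cmd == pdev->command)
        return;                 /* no change, so no devres is created */

    pdev->devres_count++;       /* stand-in for the managed pcim_intx() path */
    pdev->command = new_cmd;
}
```

A driver that calls the function with an already-applied setting during remove() no longer leaves a stray device resource behind.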

    Link: https://lore.kernel.org/r/20240725120729.59788-2-pstanner@redhat.com
    Fixes: 25216afc9db5 ("PCI: Add managed pcim_intx()")
    Reported-by: Damien Le Moal <dlemoal@kernel.org>
    Closes: https://lore.kernel.org/all/b8f4ba97-84fc-4b7e-ba1a-99de2d9f0118@kernel.org/
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    [bhelgaas: add error message to commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Damien Le Moal <dlemoal@kernel.org>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 11:50:43 -06:00
Myron Stowe 2d1cf513dd PCI: Add managed pcim_intx()
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 25216afc9db53d85dc648aba8fb7f6d31f2c8731

commit 25216afc9db53d85dc648aba8fb7f6d31f2c8731
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Thu Jun 13 13:50:23 2024 +0200

    PCI: Add managed pcim_intx()

    pci_intx() is a "hybrid" function, i.e., it is managed if
    pcim_enable_device() has been called, but unmanaged otherwise.

    Add pcim_intx(), which is always managed, and implement pci_intx() using
    it.

    Remove the now-unused struct pci_devres.orig_intx and .restore_intx and
    find_pci_dr().

    Link: https://lore.kernel.org/r/20240613115032.29098-11-pstanner@redhat.com
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    [kwilczynski: squashed in
    https://lore.kernel.org/r/426645d40776198e0fcc942f4a6cac4433c7a9aa.camel@redhat.com
    to fix problem reported and tested by Ashish Kalra <Ashish.Kalra@amd.com>:
    https://lore.kernel.org/r/20240708214656.4721-1-Ashish.Kalra@amd.com
    https://lore.kernel.org/r/8c4634e9-4f02-4c54-9c89-d75e2f4bf026@amd.com/]
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    [bhelgaas: commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 11:50:42 -06:00
Myron Stowe 43236dc826 PCI: Remove struct pci_devres.enabled status bit
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 77f79ac8de0f490fca4f0a5f2e1e38eeee191f05

commit 77f79ac8de0f490fca4f0a5f2e1e38eeee191f05
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Thu Jun 13 13:50:20 2024 +0200

    PCI: Remove struct pci_devres.enabled status bit

    The struct pci_devres has a separate boolean to track whether a device is
    enabled. That, however, can easily be tracked in an agnostic manner through
    the function pci_is_enabled().

    Using it allows for simplifying the PCI devres implementation.

    Replace the separate 'enabled' status bit from struct pci_devres with
    calls to pci_is_enabled() at the appropriate places.

    Link: https://lore.kernel.org/r/20240613115032.29098-8-pstanner@redhat.com
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 11:50:42 -06:00
Myron Stowe ec1f828164 PCI: Document hybrid devres hazards
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: 81fcf28e74a3ffda67a6896cd38843d80bc9ec68

commit 81fcf28e74a3ffda67a6896cd38843d80bc9ec68
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Thu Jun 13 13:50:19 2024 +0200

    PCI: Document hybrid devres hazards

    These functions:

      pci_request_region()
      pci_request_regions()
      pci_request_regions_exclusive()
      pci_request_selected_regions()
      pci_request_selected_regions_exclusive()
      pci_intx()

    are "hybrid" functions that are managed if pcim_enable_device() has been
    called, but unmanaged otherwise.
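The "hybrid" hazard described above can be modeled in a few lines. This is a hedged sketch, not the PCI core's implementation. A per-device `managed` flag stands in for "pcim_enable_device() has been called". Managed requests record a release action that runs on detach, while unmanaged requests leave cleanup to the driver. All `hyp_*` names are hypothetical.

```c
#include <stdbool.h>

#define HYP_MAX_DEVRES 8

/* Hypothetical device model for the "hybrid" hazard. */
struct hyp_dev {
    bool managed;           /* set by the pcim_enable_device() analogue */
    int  regions_held;      /* regions currently requested */
    int  ndevres;           /* pending release actions */
    void (*devres[HYP_MAX_DEVRES])(struct hyp_dev *);
};

static void hyp_release_region(struct hyp_dev *dev)
{
    dev->regions_held--;
}

/* Hybrid request: managed only if the device was enabled the managed way. */
static int hyp_request_region(struct hyp_dev *dev)
{
    if (dev->managed && dev->ndevres == HYP_MAX_DEVRES)
        return -1;          /* no room to record the release action */
    dev->regions_held++;
    if (dev->managed)
        dev->devres[dev->ndevres++] = hyp_release_region;
    return 0;
}

/* Driver detach: run recorded release actions in reverse order. */
static void hyp_detach(struct hyp_dev *dev)
{
    while (dev->ndevres > 0)
        dev->devres[--dev->ndevres](dev);
}
```

The same call therefore leaves a region held after detach on one device and not on another, depending only on how the device was enabled.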

    This is confusing and has already caused a bug (in 8558de401b
    ("drm/vboxvideo: use managed pci functions")) because users believe all PCI
    functions, such as pci_iomap_range(), can become managed that way, which is
    not the case.

    Add comments to the relevant functions' docstrings that warn users about
    this behavior.

    Link: https://lore.kernel.org/r/20240613115032.29098-7-pstanner@redhat.com
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    [bhelgaas: commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 11:50:42 -06:00
Myron Stowe 662face48b PCI: Add managed pcim_request_region()
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: d47bde708086c77b1ceeb7643e600089f63dd03b

commit d47bde708086c77b1ceeb7643e600089f63dd03b
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Thu Jun 13 13:50:18 2024 +0200

    PCI: Add managed pcim_request_region()

    These existing functions:

      pci_request_region()
      pci_request_selected_regions()
      pci_request_selected_regions_exclusive()

    are "hybrid" functions built on __pci_request_region() and are managed if
    pcim_enable_device() has been called, but unmanaged otherwise.

    Add these new functions:

      pcim_request_region()
      pcim_request_region_exclusive()

    These are *always* managed and use the new pcim_addr_devres tracking
    infrastructure instead of find_pci_dr() and struct pci_devres.region_mask.

    Implement the hybrid functions using the new "pure" functions and remove
    struct pci_devres.region_mask, which is no longer needed.

    Link: https://lore.kernel.org/r/20240613115032.29098-6-pstanner@redhat.com
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    [bhelgaas: commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 11:50:42 -06:00
Myron Stowe 37d2c8944f PCI: Add managed partial-BAR request and map infrastructure
JIRA: https://issues.redhat.com/browse/RHEL-59033
Upstream Status: bbaff68bf4a404bee5f5e20e7b1e30301b26304a

commit bbaff68bf4a404bee5f5e20e7b1e30301b26304a
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Thu Jun 13 13:50:16 2024 +0200

    PCI: Add managed partial-BAR request and map infrastructure

    The pcim_iomap_devres table tracks entire-BAR mappings, so we can't use it
    to build a managed version of pci_iomap_range(), which maps partial BARs.

    Add struct pcim_addr_devres, which can track request and mapping of both
    entire BARs and partial BARs.

    Add the following internal devres functions based on struct
    pcim_addr_devres:

      pcim_iomap_region()               # request & map entire BAR
      pcim_iounmap_region()             # unmap & release entire BAR
      pcim_request_region()             # request entire BAR
      pcim_release_region()             # release entire BAR
      pcim_request_all_regions()        # request all entire BARs
      pcim_release_all_regions()        # release all entire BARs

    Rework the following public interfaces using the new infrastructure
    listed above:

      pcim_iomap()                      # map partial BAR
      pcim_iounmap()                    # unmap partial BAR
      pcim_iomap_regions()              # request & map specified BARs
      pcim_iomap_regions_request_all()  # request all BARs, map specified BARs
      pcim_iounmap_regions()            # unmap & release specified BARs

    Link: https://lore.kernel.org/r/20240613115032.29098-4-pstanner@redhat.com
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    [bhelgaas: commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-10-01 11:50:41 -06:00
Myron Stowe 4eeffc7615 PCI: Extend ACS configurability
JIRA: https://issues.redhat.com/browse/RHEL-48601
Upstream Status: 47c8846a49baa8c0b7a6a3e7e7eacd6e8d119d25

commit 47c8846a49baa8c0b7a6a3e7e7eacd6e8d119d25
Author: Vidya Sagar <vidyas@nvidia.com>
Date:   Tue Jun 25 21:01:50 2024 +0530

    PCI: Extend ACS configurability

    PCIe ACS settings control the level of isolation and the possible P2P paths
    between devices. With greater isolation the kernel will create smaller
    iommu_groups and with less isolation there is more HW that can achieve P2P
    transfers. From a virtualization perspective all devices in the same
    iommu_group must be assigned to the same VM as they lack security
    isolation.

    There is no way for the kernel to automatically know the correct ACS
    settings for any given system and workload. Existing command line options
    (e.g., disable_acs_redir) allow only for large-scale changes, disabling all
    isolation, but this is not sufficient for more complex cases.

    Add a kernel command-line option 'config_acs' to directly control all the
    ACS bits for specific devices, which allows the operator to set up the right
    level of isolation to achieve the desired P2P configuration.  The
    definition is future-proof; when new ACS bits are added to the spec the
    open syntax can be extended.

    ACS needs to be set up early in the kernel boot as the ACS settings affect
    how iommu_groups are formed. iommu_group formation is a one-time event
    during initial device discovery, so changing ACS bits after kernel boot can
    result in an inaccurate view of the iommu_groups compared to the current
    isolation configuration.

    ACS applies to PCIe Downstream Ports and multi-function devices.  The
    default ACS settings are strict and deny any direct traffic between two
    functions. This results in the smallest iommu_group the HW can support.
    Frequently these values result in slow or non-working P2PDMA.

    ACS offers a range of security choices controlling how traffic is
    allowed to go directly between two devices. Some popular choices:

      - Full prevention

      - Translated requests can be direct, with various options

      - Asymmetric direct traffic, A can reach B but not the reverse

      - All traffic can be direct

    There are also some other, less common choices for special topologies.

    The intention is that this option would be used with expert knowledge of
    the HW capability and workload to achieve the desired configuration.
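The message does not spell out the option's syntax. This sketch assumes the 0/1/x string form documented for `pci=config_acs=` in the kernel-parameters guide: one character per ACS Control bit, with the leftmost character for the highest bit, where '0' clears, '1' sets and 'x' leaves the bit unchanged. Under that assumption, a hypothetical parser turns the string into a mask of bits to touch plus their new values. `hyp_parse_acs()` and `hyp_apply_acs()` are illustrative names, and device selection is omitted.

```c
#include <stdint.h>
#include <string.h>

/* Parse a string such as "10x" (leftmost char = highest bit) into the bits
 * to change (mask) and their new values (flags).
 * Returns 0 on success, -1 on a bad character or an over-long string. */
static int hyp_parse_acs(const char *s, uint16_t *mask, uint16_t *flags)
{
    size_t len = strlen(s);

    if (len > 16)
        return -1;
    *mask = 0;
    *flags = 0;
    for (size_t i = 0; i < len; i++) {
        uint16_t bit = (uint16_t)(1u << (len - 1 - i));  /* MSB first */

        switch (s[i]) {
        case '0': *mask |= bit; break;                   /* force clear */
        case '1': *mask |= bit; *flags |= bit; break;    /* force set */
        case 'x': break;                                 /* leave as-is */
        default:  return -1;
        }
    }
    return 0;
}

/* Apply the parsed request to the current ACS Control value. */
static uint16_t hyp_apply_acs(uint16_t ctrl, uint16_t mask, uint16_t flags)
{
    return (uint16_t)((ctrl & ~mask) | flags);
}
```

For example, "10x" yields mask 0x6 and flags 0x4: bit 2 is forced on, bit 1 is forced off and bit 0 keeps its current value.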

    Link: https://lore.kernel.org/r/20240625153150.159310-1-vidyas@nvidia.com
    Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
    [bhelgaas: add example, tidy printk formats]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-09-19 14:13:25 -06:00
CKI Backport Bot 699ed49382 PCI: Add missing bridge lock to pci_bus_lock()
JIRA: https://issues.redhat.com/browse/RHEL-59331
CVE: CVE-2024-46750

commit a4e772898f8bf2e7e1cf661a12c60a5612c4afab
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Thu May 30 18:04:35 2024 -0700

    PCI: Add missing bridge lock to pci_bus_lock()

    One of the true positives that the cfg_access_lock lockdep effort
    identified is this sequence:

      WARNING: CPU: 14 PID: 1 at drivers/pci/pci.c:4886 pci_bridge_secondary_bus_reset+0x5d/0x70
      RIP: 0010:pci_bridge_secondary_bus_reset+0x5d/0x70
      Call Trace:
       <TASK>
       ? __warn+0x8c/0x190
       ? pci_bridge_secondary_bus_reset+0x5d/0x70
       ? report_bug+0x1f8/0x200
       ? handle_bug+0x3c/0x70
       ? exc_invalid_op+0x18/0x70
       ? asm_exc_invalid_op+0x1a/0x20
       ? pci_bridge_secondary_bus_reset+0x5d/0x70
       pci_reset_bus+0x1d8/0x270
       vmd_probe+0x778/0xa10
       pci_device_probe+0x95/0x120

    Here, pci_reset_bus() users are triggering unlocked secondary bus resets.
    Ironically pci_bus_reset(), several calls down from pci_reset_bus(), uses
    pci_bus_lock() before issuing the reset which locks everything *but* the
    bridge itself.

    For the same motivation as adding:

      bridge = pci_upstream_bridge(dev);
      if (bridge)
        pci_dev_lock(bridge);

    to pci_reset_function() for the "bus" and "cxl_bus" reset cases, add
    pci_dev_lock() for @bus->self to pci_bus_lock().
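The ordering the fix establishes can be sketched as: lock the bridge first, then the devices below it, and release in reverse. This minimal model uses plain booleans in place of pci_dev_lock() mutexes and ignores subordinate buses. The `hyp_*` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: a flag standing in for pci_dev_lock(). */
struct hyp_dev { bool locked; };

/* Hypothetical bus: the bridge leading to it ("self") plus its devices. */
struct hyp_bus {
    struct hyp_dev *self;      /* upstream bridge, NULL for a root bus */
    struct hyp_dev *devs;
    size_t          ndevs;
};

static void hyp_dev_lock(struct hyp_dev *d)   { d->locked = true; }
static void hyp_dev_unlock(struct hyp_dev *d) { d->locked = false; }

/* Lock the bridge first (the previously missing step), then the devices. */
static void hyp_bus_lock(struct hyp_bus *bus)
{
    if (bus->self)
        hyp_dev_lock(bus->self);
    for (size_t i = 0; i < bus->ndevs; i++)
        hyp_dev_lock(&bus->devs[i]);
}

/* Release in the reverse order of acquisition. */
static void hyp_bus_unlock(struct hyp_bus *bus)
{
    for (size_t i = bus->ndevs; i > 0; i--)
        hyp_dev_unlock(&bus->devs[i - 1]);
    if (bus->self)
        hyp_dev_unlock(bus->self);
}
```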

    Link: https://lore.kernel.org/r/171711747501.1628941.15217746952476635316.stgit@dwillia2-xfh.jf.intel.com
    Reported-by: Imre Deak <imre.deak@intel.com>
    Closes: http://lore.kernel.org/r/6657833b3b5ae_14984b29437@dwillia2-xfh.jf.intel.com.notmuch
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    [bhelgaas: squash in recursive locking deadlock fix from Keith Busch:
    https://lore.kernel.org/r/20240711193650.701834-1-kbusch@meta.com]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Hans de Goede <hdegoede@redhat.com>
    Tested-by: Kalle Valo <kvalo@kernel.org>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>

Signed-off-by: CKI Backport Bot <cki-ci-bot+cki-gitlab-backport-bot@redhat.com>
2024-09-18 10:00:50 +00:00
Rado Vrbovsky 2131e1ec0c Merge: PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/5087

```
JIRA: https://issues.redhat.com/browse/RHEL-54981
CVE: CVE-2024-42302

Signed-off-by: Myron Stowe <mstowe@redhat.com>
```

Approved-by: Desnes Nunes <desnesn@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Rado Vrbovsky <rvrbovsk@redhat.com>
2024-09-11 07:16:05 +00:00
Myron Stowe 3482703d91 PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
JIRA: https://issues.redhat.com/browse/RHEL-54981
CVE: CVE-2024-42302
Upstream Status: 11a1f4bc47362700fcbde717292158873fb847ed

commit 11a1f4bc47362700fcbde717292158873fb847ed
Author: Lukas Wunner <lukas@wunner.de>
Date:   Tue Jun 18 12:54:55 2024 +0200

    PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal

    Keith reports a use-after-free when a DPC event occurs concurrently with
    hot-removal of the same portion of the hierarchy:

    The dpc_handler() awaits readiness of the secondary bus below the
    Downstream Port where the DPC event occurred.  To do so, it polls the
    config space of the first child device on the secondary bus.  If that
    child device is concurrently removed, accesses to its struct pci_dev
    cause the kernel to oops.

    That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
    reference on the child device.  Before v6.3, the function was only
    called on resume from system sleep or on runtime resume.  Holding a
    reference wasn't necessary back then because the pciehp IRQ thread
    could never run concurrently.  (On resume from system sleep, IRQs are
    not enabled until after the resume_noirq phase.  And runtime resume is
    always awaited before a PCI device is removed.)

    However, starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
    called on a DPC event.  Commit 53b54ad074de ("PCI/DPC: Await readiness
    of secondary bus after reset"), which introduced that, failed to
    appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
    reference on the child device because dpc_handler() and pciehp may
    indeed run concurrently.  The commit was backported to v5.10+ stable
    kernels, so that's the oldest one affected.

    Add the missing reference acquisition.
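The missing reference acquisition follows the usual get/put pattern: pin the child for the duration of the wait so that a concurrent removal cannot free it mid-poll. This is a hedged userspace sketch. `hyp_dev_get()`/`hyp_dev_put()` stand in for the kernel's pci_dev reference helpers, and the `removal` callback simulates pciehp dropping its reference during the wait. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted device standing in for struct pci_dev. */
struct hyp_dev {
    int  refcount;
    bool ready;
};

static struct hyp_dev *hyp_dev_get(struct hyp_dev *d)
{
    if (d)
        d->refcount++;
    return d;
}

/* Drop a reference; free the object when the last one goes away. */
static void hyp_dev_put(struct hyp_dev *d)
{
    if (d && --d->refcount == 0)
        free(d);
}

/* Simulated hot-removal: drops the reference the bus was holding. */
static void hyp_remove(struct hyp_dev *d)
{
    hyp_dev_put(d);
}

/* Fixed pattern: hold our own reference while polling the child. */
static bool hyp_wait_for_secondary_bus(struct hyp_dev *first_child,
                                       void (*removal)(struct hyp_dev *))
{
    struct hyp_dev *child = hyp_dev_get(first_child);
    bool ready;

    if (!child)
        return false;
    removal(child);          /* concurrent removal drops the bus's reference */
    ready = child->ready;    /* still valid: our reference keeps it alive */
    hyp_dev_put(child);      /* last reference; the child is freed here */
    return ready;
}
```

Without the `hyp_dev_get()` call, `hyp_remove()` would free the child and the read of `child->ready` would be a use-after-free.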

    Abridged stack trace:

      BUG: unable to handle page fault for address: 00000000091400c0
      CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
      RIP: pci_bus_read_config_dword+0x17/0x50
      pci_dev_wait()
      pci_bridge_wait_for_secondary_bus()
      dpc_reset_link()
      pcie_do_recovery()
      dpc_handler()

    Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset")
    Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
    Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.1718707743.git.lukas@wunner.de
    Reported-by: Keith Busch <kbusch@kernel.org>
    Tested-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Cc: stable@vger.kernel.org # v5.10+

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-23 09:16:10 -06:00
Myron Stowe 57bfaa7c9d PCI: Revert the cfg_access_lock lockdep mechanism
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: c9d52fb313d3719d69a040f4ca78a3e2e95fba21

commit c9d52fb313d3719d69a040f4ca78a3e2e95fba21
Author: Dan Williams <dan.j.williams@intel.com>
Date:   Thu May 30 18:04:24 2024 -0700

    PCI: Revert the cfg_access_lock lockdep mechanism

    While the experiment did reveal that there are additional places that are
    missing the lock during secondary bus reset, one of the places that needs
    to take cfg_access_lock (pci_bus_lock()) is not prepared for lockdep
    annotation.

    Specifically, pci_bus_lock() takes pci_dev_lock() recursively and is
    currently dependent on the fact that the device_lock() is marked
    lockdep_set_novalidate_class(&dev->mutex). Otherwise, without that
    annotation, pci_bus_lock() would need to use something like a new
    pci_dev_lock_nested() helper, a scheme to track a PCI device's depth in the
    topology, and a hope that the depth of a PCI tree never exceeds the max
    value for a lockdep subclass.

    The alternative to ripping out the lockdep coverage would be to deploy a
    dynamic lock key for every PCI device. Unfortunately, there is evidence
    that increasing the number of keys that lockdep needs to track to be
    per-PCI-device is prohibitively expensive for something like the
    cfg_access_lock.

    The main motivation for adding the annotation in the first place was to
    catch unlocked secondary bus resets, not necessarily catch lock ordering
    problems between cfg_access_lock and other locks. Solve that narrower
    problem with follow-on patches, and just do a targeted revert for now.

    Link: https://lore.kernel.org/r/171711746402.1628941.14575335981264103013.stgit@dwillia2-xfh.jf.intel.com
    Fixes: 7e89efc6e9e4 ("PCI: Lock upstream bridge for pci_reset_function()")
    Reported-by: Imre Deak <imre.deak@intel.com>
    Closes: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_134186v1/shard-dg2-1/igt@device_reset@unbind-reset-rebind.html
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Hans de Goede <hdegoede@redhat.com>
    Tested-by: Kalle Valo <kvalo@kernel.org>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Cc: Jani Saarinen <jani.saarinen@intel.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe 815e68c019 PCI: Make pcie_bandwidth_capable() static
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: fe4a83ec07818f2243eac584488e65397699550c

commit fe4a83ec07818f2243eac584488e65397699550c
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Tue May 7 15:17:58 2024 +0300

    PCI: Make pcie_bandwidth_capable() static

    pcie_bandwidth_capable() is only used within pci.c; make it static.

    Link: https://lore.kernel.org/r/20240507121758.13849-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe e9671490b6 PCI: Annotate pci_cache_line_size variables as __ro_after_init
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: c7ae396ec597b2f3644f90f5c7278674b0527aa9

commit c7ae396ec597b2f3644f90f5c7278674b0527aa9
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Thu Apr 18 20:29:21 2024 +0200

    PCI: Annotate pci_cache_line_size variables as __ro_after_init

    Annotate both variables as __ro_after_init, enforcing that they can't be
    changed after the init phase.

    Link: https://lore.kernel.org/r/52fd058d-6d72-48db-8e61-5fcddcd0aa51@gmail.com
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe 837b24c1e2 PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: 256df20c590bf0e4d63ac69330cf23faddac3e08

commit 256df20c590bf0e4d63ac69330cf23faddac3e08
Author: Mario Limonciello <mario.limonciello@amd.com>
Date:   Thu Mar 7 10:37:09 2024 -0600

    PCI/PM: Avoid D3cold for HP Pavilion 17 PC/1972 PCIe Ports

    Hewlett-Packard HP Pavilion 17 Notebook PC/1972 is an Intel Ivy Bridge
    system with a muxless AMD Radeon dGPU.  Attempting to use the dGPU fails
    with the following sequence:

      ACPI Error: Aborting method \AMD3._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
      radeon 0000:01:00.0: not ready 1023ms after resume; waiting
      radeon 0000:01:00.0: not ready 2047ms after resume; waiting
      radeon 0000:01:00.0: not ready 4095ms after resume; waiting
      radeon 0000:01:00.0: not ready 8191ms after resume; waiting
      radeon 0000:01:00.0: not ready 16383ms after resume; waiting
      radeon 0000:01:00.0: not ready 32767ms after resume; waiting
      radeon 0000:01:00.0: not ready 65535ms after resume; giving up
      radeon 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible

    The issue is that the Root Port the dGPU is connected to can't handle the
    transition from D3cold to D0 so the dGPU can't properly exit runtime PM.

    The existing logic in pci_bridge_d3_possible() checks for systems that are
    newer than 2015 to decide that D3 is safe.  This would nominally work for
    an Ivy Bridge system (which was discontinued in 2015), but this system
    appears to have continued to receive BIOS updates until 2017 and so this
    existing logic doesn't appropriately capture it.

    Add the system to bridge_d3_blacklist to prevent D3cold from being used.

    Link: https://lore.kernel.org/r/20240307163709.323-1-mario.limonciello@amd.com
    Reported-by: Eric Heintzmann <heintzmann.eric@free.fr>
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3229
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Eric Heintzmann <heintzmann.eric@free.fr>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe b4d4ad3baa PCI: Do not wait for disconnected devices when resuming
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: 6613443ffc49d03e27f0404978f685c4eac43fba

commit 6613443ffc49d03e27f0404978f685c4eac43fba
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Thu Feb 8 15:23:21 2024 +0200

    PCI: Do not wait for disconnected devices when resuming

    On runtime resume, pci_dev_wait() is called:

      pci_pm_runtime_resume()
        pci_pm_bridge_power_up_actions()
          pci_bridge_wait_for_secondary_bus()
            pci_dev_wait()

    While a device is runtime suspended along with its PCI hierarchy, the
    device could get disconnected. In such a case, the link will not come up no
    matter how long pci_dev_wait() waits for it.

    Besides the above mentioned case, there could be other ways to get the
    device disconnected while pci_dev_wait() is waiting for the link to come
    up.

    Make pci_dev_wait() exit if the device is already disconnected to avoid
    unnecessary delay.

    The use cases of pci_dev_wait() boil down to two:

      1. Waiting for the device after reset
      2. pci_bridge_wait_for_secondary_bus()

    The callers in both cases seem to benefit from propagating the
    disconnection as an error, even though device disconnection would be more
    analogous to the case where there is no device in the first place, for
    which pci_dev_wait() returns 0. In case 2, it results in unnecessarily
    marking the devices disconnected again, but that is just harmless extra
    work.

    Also make sure the compiler does not become too clever with
    dev->error_state: use READ_ONCE() to force a fetch of the up-to-date value.
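The early exit can be illustrated with a polling loop whose delay doubles on each retry, where a disconnect check runs before every poll. This is a minimal sketch with simulated time (nothing sleeps). The `hyp_dev` model, the `hyp_dev_wait()` name and the use of -ENOTTY for both failure paths are assumptions of the sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model for the polling loop. */
struct hyp_dev {
    bool disconnected;     /* e.g. set when the device was hot-removed */
    int  ready_after_ms;   /* responds after this long; negative = never */
};

static bool hyp_responds(const struct hyp_dev *d, int elapsed_ms)
{
    return d->ready_after_ms >= 0 && elapsed_ms >= d->ready_after_ms;
}

/* Poll with a doubling delay. Returns 0 once the device responds, or
 * -ENOTTY if it is disconnected or the next delay would exceed
 * timeout_ms. *elapsed_ms reports the simulated time spent waiting. */
static int hyp_dev_wait(const struct hyp_dev *d, int timeout_ms, int *elapsed_ms)
{
    int delay = 1;

    *elapsed_ms = 0;
    for (;;) {
        if (d->disconnected)           /* the new early exit */
            return -ENOTTY;
        if (hyp_responds(d, *elapsed_ms))
            return 0;
        if (delay > timeout_ms)
            return -ENOTTY;            /* give up */
        *elapsed_ms += delay;          /* stand-in for sleeping */
        delay *= 2;
    }
}
```

With a 60000 ms timeout, a device that never responds is abandoned only after 65535 ms of simulated waiting. A disconnected device returns immediately.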

    Link: https://lore.kernel.org/r/20240208132322.4811-1-ilpo.jarvinen@linux.intel.com
    Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe 73528b22d7 PCI: Remove unused pci_enable_device_io()
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: 844177a80753fc173131f3e591124c8dcbc89812

commit 844177a80753fc173131f3e591124c8dcbc89812
Author: Heiner Kallweit <hkallweit1@gmail.com>
Date:   Sat Mar 23 18:16:36 2024 +0100

    PCI: Remove unused pci_enable_device_io()

    After the last user was removed, remove this PCI core function.  It's very
    unlikely that we'll see a new device requiring I/O space access, even though
    memory space access is supported.

    Link: https://lore.kernel.org/r/213ebf62-53a3-42b7-8518-ecd5cd6d6b08@gmail.com
    Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe e1ce2d3e77 PCI: Clarify intent of LT wait
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: cdc6c4abcb313be1b7118b6e86eb99a85a626578

commit cdc6c4abcb313be1b7118b6e86eb99a85a626578
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Tue Apr 23 16:08:20 2024 +0300

    PCI: Clarify intent of LT wait

    Clarify the comment relating to the LT wait and the purpose of the check
    that implements the implementation note in PCIe r6.1 sec 7.5.3.7.

    Suggested-by: Maciej W. Rozycki <macro@orcam.me.uk>
    Link: https://lore.kernel.org/r/20240423130820.43824-2-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe af5d87ce2b PCI: Wait for Link Training==0 before starting Link retrain
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: 73cb3a35f94db723c0211ad099bce55b2155e3f0

commit 73cb3a35f94db723c0211ad099bce55b2155e3f0
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Tue Apr 23 16:08:19 2024 +0300

    PCI: Wait for Link Training==0 before starting Link retrain

    Two changes were made to the link retraining logic independently of each
    other.

    The commit e7e39756363a ("PCI/ASPM: Avoid link retraining race") added a
    check to pcie_retrain_link() to ensure no Link Training is currently active
    to address the Implementation Note in PCIe r6.1 sec 7.5.3.7. At that time
    pcie_wait_for_retrain() only checked for the Link Training (LT) bit being
    cleared.

    The commit 680e9c47a229 ("PCI: Add support for polling DLLLA to
    pcie_retrain_link()") generalized pcie_wait_for_retrain() into
    pcie_wait_for_link_status(), which can wait for either the LT bit or the
    Data Link Layer Link Active (DLLLA) bit (selected by the 'use_lt'
    argument) to become either cleared or set (selected by the 'active'
    argument).

    In the merge commit 1abb47390350 ("Merge branch 'pci/enumeration'"), those
    two divergent branches converged. The merge changed the LT bit check added
    by commit e7e39756363a ("PCI/ASPM: Avoid link retraining race") so that,
    when 'use_lt' is false, it waits for any ongoing Link Training to complete
    by waiting for the DLLLA bit to be set.

    When 'use_lt' is false, pcie_retrain_link() performs these steps
    (pseudo-code):

            1. Wait for DLLLA==1
            2. Trigger link to retrain
            3. Wait for DLLLA==1

    Step 3 waits for the link to come up from the retraining triggered by Step
    2. Because Step 1 is meant to wait for any ongoing retraining to end, using
    DLLLA there as well does not make sense: active link training is still
    indicated by the LT bit, not by DLLLA.

    Correct the pcie_wait_for_link_status() parameters in Step 1 to only wait
    for LT==0 to ensure there is no ongoing Link Training.
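    The corrected sequence can be modelled in userspace against a simulated
    Link Status register. The LT and DLLLA bit values match
    include/uapi/linux/pci_regs.h; everything else (struct sim_port, the
    sim_* functions, the poll counts, and the choice to drop DLLLA while
    training) is a hypothetical toy model, not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Link Status bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_LT    0x0800 /* Link Training */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */

/* Hypothetical simulated Port; real code would access config space. */
struct sim_port {
	uint16_t lnksta;
	int polls_left; /* status reads until an in-progress training ends */
};

/* Read Link Status; a simulated training finishes after polls_left reads. */
static uint16_t sim_read_lnksta(struct sim_port *p)
{
	if ((p->lnksta & PCI_EXP_LNKSTA_LT) && --p->polls_left <= 0) {
		p->lnksta &= (uint16_t)~PCI_EXP_LNKSTA_LT;
		p->lnksta |= PCI_EXP_LNKSTA_DLLLA;
	}
	return p->lnksta;
}

/* Stand-in for setting the Retrain Link (RL) bit. This toy model drops
 * DLLLA while training, which real hardware need not do. */
static void sim_set_rl(struct sim_port *p)
{
	p->lnksta |= PCI_EXP_LNKSTA_LT;
	p->lnksta &= (uint16_t)~PCI_EXP_LNKSTA_DLLLA;
	p->polls_left = 2;
}

/* Wait until the LT (use_lt) or DLLLA (!use_lt) bit equals 'active'. */
static bool sim_wait_for_link_status(struct sim_port *p, bool use_lt,
				     bool active, int max_polls)
{
	uint16_t mask = use_lt ? PCI_EXP_LNKSTA_LT : PCI_EXP_LNKSTA_DLLLA;

	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++)
		if (((sim_read_lnksta(p) & mask) != 0) == active)
			return true;
	return false;
}

/* Corrected sequence: Step 1 always waits for LT==0. */
static bool sim_retrain_link(struct sim_port *p, bool use_lt)
{
	if (!sim_wait_for_link_status(p, true, false, 100))
		return false;           /* 1. ongoing training never ended */
	sim_set_rl(p);                  /* 2. trigger retraining */
	/* 3. wait for LT==0 (use_lt) or DLLLA==1 (!use_lt) */
	return sim_wait_for_link_status(p, use_lt, !use_lt, 100);
}
```

    Changing Step 1's arguments to (p, false, true, 100) reproduces the
    pre-fix behaviour described above, for comparison.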

    This only impacts the Target Speed quirk, which is the only case where
    waiting for the DLLLA bit is used. In the problematic case it currently
    works only because hardware repeatedly initiates link training that
    honors the new link parameters set by the caller; training then succeeds
    and brings the link up, which sets DLLLA and makes
    pcie_wait_for_link_status() return success. We should not rely on luck,
    though: we need to make sure that LT has transitioned through the
    inactive state before we initiate link training by hand via the RL
    (Retrain Link) bit.

    Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
    Link: https://lore.kernel.org/r/20240423130820.43824-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe 2d69d79e9a PCI: Lock upstream bridge for pci_reset_function()
JIRA: https://issues.redhat.com/browse/RHEL-50255
Upstream Status: 7e89efc6e9e402839643cb297bab14055c547f07

commit 7e89efc6e9e402839643cb297bab14055c547f07
Author: Dave Jiang <dave.jiang@intel.com>
Date:   Thu May 2 09:57:31 2024 -0700

    PCI: Lock upstream bridge for pci_reset_function()

    Fix a long-standing locking gap for missing pci_cfg_access_lock() while
    manipulating bridge reset registers and configuration during
    pci_reset_bus_function().

    If there is an upstream bridge, lock it before locking the device itself.
    pci_dev_lock() calls pci_cfg_access_lock(), which blocks the writing of PCI
    config space by user space.

    Add a lockdep assertion via pci_dev->cfg_access_lock to verify that
    pci_dev->block_cfg_access is set.
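    The lock ordering described above can be sketched as a single-threaded
    model: the upstream bridge is locked before the device, released in
    reverse order, and a lockdep-style check confirms that config access to
    the bridge is blocked before its reset registers are touched. All names
    here (struct model_dev, model_*) are hypothetical; the boolean flags only
    stand in for the kernel's mutex and pci_dev->block_cfg_access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PCI device; 'cfg_blocked' stands in for
 * pci_dev->block_cfg_access. */
struct model_dev {
	struct model_dev *upstream_bridge; /* NULL if there is none */
	bool locked;
	bool cfg_blocked;
};

static void model_dev_lock(struct model_dev *d)
{
	assert(!d->locked);    /* a real mutex would block here instead */
	d->locked = true;
	d->cfg_blocked = true; /* like pci_cfg_access_lock(): no user writes */
}

static void model_dev_unlock(struct model_dev *d)
{
	d->cfg_blocked = false;
	d->locked = false;
}

/* Lock order from the commit: upstream bridge first, then the device. */
static void model_reset_lock(struct model_dev *d)
{
	if (d->upstream_bridge)
		model_dev_lock(d->upstream_bridge);
	model_dev_lock(d);
}

static void model_reset_unlock(struct model_dev *d)
{
	model_dev_unlock(d);   /* release in reverse order */
	if (d->upstream_bridge)
		model_dev_unlock(d->upstream_bridge);
}

/* Lockdep-style check before touching the bridge's reset registers. */
static bool model_bridge_cfg_blocked(const struct model_dev *d)
{
	return d->upstream_bridge && d->upstream_bridge->cfg_blocked;
}
```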

    Co-developed-by: Dan Williams <dan.j.williams@intel.com>
    Link: https://lore.kernel.org/r/20240502165851.1948523-3-dave.jiang@intel.com
    Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    Signed-off-by: Dave Jiang <dave.jiang@intel.com>
    [bhelgaas: commit log]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-08-15 15:31:13 -06:00
Myron Stowe 934bc80496 PCI: Place interrupt related code into irq.c
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 1e8cc8e6bd85d7b25e0ed3759aedde804c91ba97

Conflict(s):
  drivers/pci/Makefile: Same conflict(s) as encountered upstream;
  see 420b8c360695 ("Merge branch 'pci/enumeration'").


commit 1e8cc8e6bd85d7b25e0ed3759aedde804c91ba97
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Mon Jan 29 13:36:54 2024 +0200

    PCI: Place interrupt related code into irq.c

    Interrupt-related code is spread across irq.c, pci.c, and setup-irq.c.
    Group it into the pre-existing irq.c.

    Link: https://lore.kernel.org/r/20240129113655.3368-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:55:19 -06:00
Myron Stowe 39776a1a0a PCI: Move devres code from pci.c to devres.c
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 815a3909ead7440e2827042e5ec618f4396f022c

commit 815a3909ead7440e2827042e5ec618f4396f022c
Author: Philipp Stanner <pstanner@redhat.com>
Date:   Wed Jan 31 10:00:23 2024 +0100

    PCI: Move devres code from pci.c to devres.c

    The file pci.c is very large and contains a number of devres functions.
    These functions should now reside in devres.c.

    Move as much devres-specific code from pci.c to devres.c as possible.

    There are a few callers left in pci.c that do devres operations. These
    should be ported in the future. Add corresponding TODOs.

    The reason they are not moved in this commit is that PCI's devres
    currently implements a sort of "hybrid mode": pci_request_region(), for
    instance, does not yet have a pcim_ equivalent. Instead, the function can
    be made managed by first calling pcim_enable_device() (instead of
    pci_enable_device()). This makes it unreasonable to move
    pci_request_region() to devres.c. Moving the functions would require
    changes to PCI's API and is therefore left for future work.

    In summary, this commit serves as a preparation step for a following
    patch series that will cleanly separate PCI's managed and unmanaged
    APIs.

    Link: https://lore.kernel.org/r/20240131090023.12331-5-pstanner@redhat.com
    Suggested-by: Danilo Krummrich <dakr@redhat.com>
    Signed-off-by: Philipp Stanner <pstanner@redhat.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:52:32 -06:00
Myron Stowe 6104885fc7 PCI/ASPM: Disable L1 before configuring L1 Substates
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 64dbb2d707444f691539fb12aacf81797786c10b

commit 64dbb2d707444f691539fb12aacf81797786c10b
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Tue Mar 5 15:15:25 2024 -0600

    PCI/ASPM: Disable L1 before configuring L1 Substates

    Per PCIe r6.1, sec 5.5.4, L1 must be disabled while setting ASPM L1 PM
    Substates enable bits.  Previously this was enforced by clearing
    PCI_EXP_LNKCTL_ASPMC before calling pci_restore_aspm_l1ss_state().

    Move the L1 (and L0s, although that doesn't seem required) disable into
    pci_restore_aspm_l1ss_state() itself so it's closer to the code that
    depends on it.
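    A minimal register-level model of the rule above: ASPM (L0s and L1) is
    disabled inside the restore helper before the L1 PM Substates control
    register is written, and a counter records any write made while L1 was
    still enabled. The LNKCTL bit values match include/uapi/linux/pci_regs.h;
    struct l1ss_model and the model_* functions are hypothetical, and writing
    the saved Link Control value back immediately is a simplification (the
    kernel re-enables ASPM in a later restore step).

```c
#include <assert.h>
#include <stdint.h>

/* ASPM Control field bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCTL_ASPM_L1 0x0002 /* ASPM L1 Enable */
#define PCI_EXP_LNKCTL_ASPMC   0x0003 /* ASPM Control: L0s and L1 */

/* Hypothetical register model of one Link end. */
struct l1ss_model {
	uint16_t lnkctl;
	uint32_t l1ss_ctl1;
	int violations; /* L1SS writes made while ASPM L1 was still enabled */
};

static void model_write_l1ss_ctl1(struct l1ss_model *r, uint32_t val)
{
	if (r->lnkctl & PCI_EXP_LNKCTL_ASPM_L1)
		r->violations++;  /* would break the sec 5.5.4 rule */
	r->l1ss_ctl1 = val;
}

/* Disable L0s/L1 inside the restore helper itself, program the substates,
 * then put back the saved Link Control value. */
static void model_restore_l1ss(struct l1ss_model *r, uint32_t saved_ctl1,
			       uint16_t saved_lnkctl)
{
	assert(r);
	r->lnkctl &= (uint16_t)~PCI_EXP_LNKCTL_ASPMC;
	model_write_l1ss_ctl1(r, saved_ctl1);
	r->lnkctl = saved_lnkctl;
}
```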

    Link: https://lore.kernel.org/r/20240223213733.GA115410@bhelgaas
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:52:32 -06:00
Myron Stowe c5e76d4ac3 PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: c198fafa0125e97728d16411aa653602900ab0bc

commit c198fafa0125e97728d16411aa653602900ab0bc
Author: David E. Box <david.e.box@linux.intel.com>
Date:   Fri Feb 23 14:58:51 2024 -0600

    PCI/ASPM: Call pci_save_ltr_state() from pci_save_pcie_state()

    ASPM state is saved and restored from pci_save/restore_pcie_state().  Since
    the LTR Capability is linked with ASPM, move the LTR save and restore calls
    there as well.  No functional change intended.

    Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
    Link: https://lore.kernel.org/r/20240128233212.1139663-6-david.e.box@linux.intel.com
    Link: https://lore.kernel.org/r/20240223205851.114931-6-helgaas@kernel.org
    Signed-off-by: David E. Box <david.e.box@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:52:32 -06:00
Myron Stowe 39e98b2dcc PCI/ASPM: Save L1 PM Substates Capability for suspend/resume
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 17423360a27ae58c1850f588bdd8013bbfcd250b

commit 17423360a27ae58c1850f588bdd8013bbfcd250b
Author: David E. Box <david.e.box@linux.intel.com>
Date:   Fri Feb 23 14:58:50 2024 -0600

    PCI/ASPM: Save L1 PM Substates Capability for suspend/resume

    4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
    suspend/resume") restored the L1 PM Substates Capability after resume,
    which reduced power consumption by making the ASPM L1.x states work after
    resume.

    a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
    suspend/resume"") reverted 4ff116d0d5fd because resume failed on some
    systems, so power consumption after resume increased again.

    a7152be79b62 mentioned that we restored the L1 PM Substates configuration
    even though ASPM L1 might already be enabled. This is due to the fact that
    pci_restore_aspm_l1ss_state() was called before pci_restore_pcie_state().

    Save and restore the L1 PM Substates Capability, following PCIe r6.1, sec
    5.5.4 more closely by:

      1) Do not restore the ASPM configuration in pci_restore_pcie_state();
         instead, restore it in pci_restore_aspm_state() after the PCIe
         capability has been restored, following PCIe r6.1, sec 5.5.4.

      2) If the BIOS re-enables L1SS, particularly L1.2, we need to clear
         the enables in the right order, downstream before upstream. Defer
         restoring the L1SS config until we are at the downstream
         component, then update the config for both ends of the link in
         the prescribed order.

      3) Program the ASPM L1 PM Substates configuration before the L1
         enables.

      4) Program the ASPM L1 PM Substates enables last, after the rest of
         the fields in the capability are programmed.
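    Items 2) and 4) can be illustrated by logging the order of register
    writes across both ends of a Link. In this sketch the enables are
    cleared downstream first, the non-enable fields are programmed on both
    ends, and the enables are written last. Setting the enables upstream
    first is our reading of PCIe r6.1 sec 5.5.4 and should be checked
    against the spec. The mask value matches PCI_L1SS_CTL1_L1SS_MASK in
    include/uapi/linux/pci_regs.h; struct l1ss_end, struct write_log, wr()
    and restore_link_l1ss() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PCI_L1SS_CTL1_L1SS_MASK 0x0000000fU /* L1.1/L1.2 enable bits */

/* Hypothetical model of one end's L1 PM Substates Capability. */
struct l1ss_end {
	const char *name;                /* "up" or "dn" */
	uint32_t ctl1, ctl2;             /* current register contents */
	uint32_t saved_ctl1, saved_ctl2; /* values saved at suspend */
};

/* Ordered record of register writes, so the sequence can be checked. */
struct write_log {
	char entry[8][16];
	int n;
};

static void wr(struct l1ss_end *e, uint32_t *reg, uint32_t val,
	       const char *what, struct write_log *wl)
{
	assert(wl->n < 8);
	*reg = val;
	snprintf(wl->entry[wl->n++], sizeof(wl->entry[0]), "%s:%s",
		 e->name, what);
}

static void restore_link_l1ss(struct l1ss_end *up, struct l1ss_end *dn,
			      struct write_log *wl)
{
	const uint32_t en = PCI_L1SS_CTL1_L1SS_MASK;

	/* Item 2: clear enables the BIOS may have set, downstream first. */
	wr(dn, &dn->ctl1, dn->ctl1 & ~en, "clear", wl);
	wr(up, &up->ctl1, up->ctl1 & ~en, "clear", wl);
	/* Item 4: program the other fields while all enables are off. */
	wr(up, &up->ctl2, up->saved_ctl2, "ctl2", wl);
	wr(dn, &dn->ctl2, dn->saved_ctl2, "ctl2", wl);
	wr(up, &up->ctl1, up->saved_ctl1 & ~en, "ctl1", wl);
	wr(dn, &dn->ctl1, dn->saved_ctl1 & ~en, "ctl1", wl);
	/* Item 4: enables last; upstream-first is an assumption. */
	wr(up, &up->ctl1, up->saved_ctl1, "enable", wl);
	wr(dn, &dn->ctl1, dn->saved_ctl1, "enable", wl);
}
```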

    [bhelgaas: commit log, squash L1SS-related patches, do both LNKCTL restores
    in pci_restore_pcie_state()]

    Link: https://lore.kernel.org/r/20240128233212.1139663-3-david.e.box@linux.intel.com
    Link: https://lore.kernel.org/r/20240128233212.1139663-4-david.e.box@linux.intel.com
    Link: https://lore.kernel.org/r/20240223205851.114931-5-helgaas@kernel.org
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217321
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216782
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216877
    Co-developed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Co-developed-by: David E. Box <david.e.box@linux.intel.com>
    Reported-by: Koba Ko <koba.ko@canonical.com>
    Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    Signed-off-by: David E. Box <david.e.box@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Tested-by: Tasev Nikola <tasev.stefanoska@skynet.be> # Asus UX305FA
    Cc: Mark Enriquez <enriquezmark36@gmail.com>
    Cc: Thomas Witt <kernel@witt.link>
    Cc: Werner Sembach <wse@tuxedocomputers.com>
    Cc: Vidya Sagar <vidyas@nvidia.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:52:32 -06:00
Myron Stowe 11beb3fff4 PCI/ASPM: Move pci_save_ltr_state() to aspm.c
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 1e11b5494c3dbb1e5fce7e95021c1698799c7288

commit 1e11b5494c3dbb1e5fce7e95021c1698799c7288
Author: David E. Box <david.e.box@linux.intel.com>
Date:   Fri Feb 23 14:58:49 2024 -0600

    PCI/ASPM: Move pci_save_ltr_state() to aspm.c

    Even when CONFIG_PCIEASPM is not set, we save and restore the LTR
    Capability so that if ASPM L1.2 and LTR were configured by the platform,
    ASPM L1.2 will still work after suspend/resume, when that platform
    configuration may be lost. See dbbfadf231 ("PCI/ASPM: Save LTR Capability
    for suspend/resume").

    Since ASPM L1.2 depends on the LTR Capability, move the save/restore code
    to the part of aspm.c that is always compiled regardless of
    CONFIG_PCIEASPM.  No functional change intended.

    Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
    Link: https://lore.kernel.org/r/20240128233212.1139663-5-david.e.box@linux.intel.com
    [bhelgaas: commit log, reorder to make this a pure move]
    Link: https://lore.kernel.org/r/20240223205851.114931-4-helgaas@kernel.org
    Signed-off-by: David E. Box <david.e.box@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:52:32 -06:00
Myron Stowe 47a9ded2ef PCI/ASPM: Move pci_configure_ltr() to aspm.c
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: fa84f4435a6202dd90248517f41e54bf3fb85bc5

Conflict(s):
  Patching file drivers/pci/pci.h; Hunk #2 FAILED at 572.
  False conflict as this upstream patch's basis was prior to upstream
  commit 1e560864159d "PCI/ASPM: Fix deadlock when enabling ASPM".


commit fa84f4435a6202dd90248517f41e54bf3fb85bc5
Author: David E. Box <david.e.box@linux.intel.com>
Date:   Fri Feb 23 14:58:47 2024 -0600

    PCI/ASPM: Move pci_configure_ltr() to aspm.c

    The Latency Tolerance Reporting (LTR) mechanism supports the ASPM L1.2
    state and is only configured when CONFIG_PCIEASPM is set.

    Move pci_configure_ltr() and pci_bridge_reconfigure_ltr() into aspm.c since
    they only build when CONFIG_PCIEASPM is set.  No functional change
    intended.

    Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
    Link: https://lore.kernel.org/r/20240128233212.1139663-2-david.e.box@linux.intel.com
    [bhelgaas: commit log, split build change from function moves]
    Link: https://lore.kernel.org/r/20240223205851.114931-2-helgaas@kernel.org
    Signed-off-by: David E. Box <david.e.box@linux.intel.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:52:11 -06:00
Myron Stowe e4b1b96519 PCI/AER: Generalize TLP Header Log reading
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 0a5a46a6a61be7b63c12c18495d427f91f3662a9

commit 0a5a46a6a61be7b63c12c18495d427f91f3662a9
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Tue Feb 6 15:57:15 2024 +0200

    PCI/AER: Generalize TLP Header Log reading

    Both AER and DPC RP PIO provide TLP Header Log registers (PCIe r6.1 secs
    7.8.4 & 7.9.14) to convey error diagnostics, but the struct is named after
    AER (struct aer_header_log_regs). Also, not all places that handle the TLP
    Header Log use the struct, and the struct members are named individually.

    Generalize the struct name and members, and use it consistently where TLP
    Header Log is being handled so that a pcie_read_tlp_log() helper can be
    easily added.
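    A sketch of what a generalized four-DWORD log plus a single read helper
    could look like, using a byte array as simulated config space. The
    names struct tlp_log_sketch, sketch_cfg_read_dword() and
    sketch_read_tlp_log() are hypothetical and are not the kernel's struct
    or the pcie_read_tlp_log() helper mentioned above; the point is only
    that one helper serves AER and DPC alike, differing in the starting
    offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generalized TLP Header Log: four DWORDs, usable for both
 * the AER copy and the DPC RP PIO copy of the log. */
struct tlp_log_sketch {
	uint32_t dw[4];
};

/* Little-endian DWORD read from a simulated config space byte array. */
static uint32_t sketch_cfg_read_dword(const uint8_t *cfg, int where)
{
	return (uint32_t)cfg[where] | (uint32_t)cfg[where + 1] << 8 |
	       (uint32_t)cfg[where + 2] << 16 | (uint32_t)cfg[where + 3] << 24;
}

/* One helper serves every caller; only the starting offset differs. */
static void sketch_read_tlp_log(const uint8_t *cfg, int where,
				struct tlp_log_sketch *tlp)
{
	assert(where >= 0 && where % 4 == 0);
	for (int i = 0; i < 4; i++)
		tlp->dw[i] = sketch_cfg_read_dword(cfg, where + 4 * i);
}
```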

    Link: https://lore.kernel.org/r/20240206135717.8565-3-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    [bhelgaas: drop ixgbe changes for now, tidy whitespace]
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:49:50 -06:00
Myron Stowe 33e16d3ebf PCI: Add debug print for device ready delay
JIRA: https://issues.redhat.com/browse/RHEL-33544
Upstream Status: 0a5ef95923e01aa93210d22e0d62d66b601238d7

commit 0a5ef95923e01aa93210d22e0d62d66b601238d7
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Nov 15 13:17:17 2023 +0100

    PCI: Add debug print for device ready delay

    Currently, the time it took a PCI device to become ready after reset is
    only printed if it was longer than 1000ms ('PCI_RESET_WAIT'). However,
    for debugging purposes it is useful to know this time even if it was
    shorter. For example, with the device I am working on, hardware
    engineers asked to verify that it becomes ready on the first try (no
    delay).

    To that end, add a debug level print that can be enabled using dynamic
    debug. Example:

     # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
     # dmesg -c | grep ready
     # echo "file drivers/pci/pci.c +p" > /sys/kernel/debug/dynamic_debug/control
     # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
     # dmesg -c | grep ready
     [  396.060335] mlxsw_spectrum4 0000:01:00.0: ready 0ms after bus reset
     # echo "file drivers/pci/pci.c -p" > /sys/kernel/debug/dynamic_debug/control
     # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset
     # dmesg -c | grep ready
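    The logging policy described above (always report waits longer than
    PCI_RESET_WAIT, report shorter ones only when debug output is enabled)
    can be modelled on a simulated clock. The doubling poll delay, the
    message text, struct slow_dev and sketch_wait_ready() are assumptions
    for illustration, not the kernel's pci_dev_wait().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define SKETCH_RESET_WAIT_MS 1000 /* mirrors the 1000ms PCI_RESET_WAIT */

/* Hypothetical device that reports ready once 'ready_at_ms' has passed. */
struct slow_dev {
	int ready_at_ms;
};

/* Poll on a simulated clock with doubling delays. Returns the elapsed time
 * in ms, or -1 on timeout. Writes an "info" message if the wait exceeded
 * SKETCH_RESET_WAIT_MS, a "debug" message only if 'debug' is enabled,
 * otherwise leaves an empty string. */
static int sketch_wait_ready(const struct slow_dev *d, int timeout_ms,
			     bool debug, char *msg, size_t msglen)
{
	int elapsed = 0, delay = 1;

	assert(msglen > 0);
	msg[0] = '\0';
	while (elapsed < d->ready_at_ms) {
		if (elapsed > timeout_ms)
			return -1;
		elapsed += delay;  /* stands in for msleep(delay) */
		delay *= 2;
	}
	if (elapsed > SKETCH_RESET_WAIT_MS)
		snprintf(msg, msglen, "info: ready %dms after reset", elapsed);
	else if (debug)
		snprintf(msg, msglen, "debug: ready %dms after reset", elapsed);
	return elapsed;
}
```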

    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>
    Signed-off-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Signed-off-by: Myron Stowe <mstowe@redhat.com>
2024-05-13 15:49:50 -06:00