JIRA: https://issues.redhat.com/browse/RHEL-54308
commit fea93a3e5d5e6a09eb153866d2ce60ea3287a70d
Author: Wei Liu <wei.liu@kernel.org>
Date: Mon Jul 1 20:26:05 2024 +0000
PCI: hv: Return zero, not garbage, when reading PCI_INTERRUPT_PIN
The intent of the code snippet is to always return 0 for both
PCI_INTERRUPT_LINE and PCI_INTERRUPT_PIN.
The check misses PCI_INTERRUPT_PIN. This patch fixes that.
This is discovered by this call in VFIO:
pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
The old code does not set *val to 0 because it misses the check for
PCI_INTERRUPT_PIN. Garbage is returned in that case.
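The effect of the fix can be illustrated with a small user-space model. This is only a sketch, not the driver code: model_read_config() and the cfg array are hypothetical stand-ins for the driver's config read path and the device's config space. The two offsets are the standard PCI config-space values. The point is that *val is written on every path, including a one-byte read of PCI_INTERRUPT_PIN.

```c
#include <stdint.h>

/* Standard PCI config-space offsets. */
#define PCI_INTERRUPT_LINE 0x3c
#define PCI_INTERRUPT_PIN  0x3d

/*
 * Hypothetical model of a config-space read (not the pci-hyperv code).
 * cfg stands in for the device's config space; size is 1, 2 or 4.
 */
static void model_read_config(const uint8_t *cfg, int where, int size,
                              uint32_t *val)
{
    uint32_t v = 0;
    int i;

    /* Reads that fall entirely within Interrupt Line/Pin return zero. */
    if (where >= PCI_INTERRUPT_LINE &&
        where + size <= PCI_INTERRUPT_PIN + 1) {
        *val = 0;
        return;
    }

    /* Any other read: assemble the bytes in little-endian order. */
    for (i = 0; i < size; i++)
        v |= (uint32_t)cfg[where + i] << (8 * i);
    *val = v;
}
```

With a range check that stops one byte short, a one-byte read at offset 0x3d would fall through without the zeroing branch ever running, which is the case the VFIO call exposed.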
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Link: https://lore.kernel.org/linux-pci/20240701202606.129606-1-wei.liu@kernel.org
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Cc: stable@kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-54308
commit b5ff74c1ef50fe08e384026875fec660fadfaedd
Author: Michael Kelley <mhklinux@outlook.com>
Date: Fri Feb 16 12:22:40 2024 -0800
PCI: hv: Fix ring buffer size calculation
For a physical PCI device that is passed through to a Hyper-V guest VM,
current code specifies the VMBus ring buffer size as 4 pages. But this
is an inappropriate dependency, since the amount of ring buffer space
needed is unrelated to PAGE_SIZE. For example, on x86 the ring buffer
size ends up as 16 Kbytes, while on ARM64 with 64 Kbyte pages, the ring
size bloats to 256 Kbytes. The ring buffer for PCI pass-thru devices
is used for only a few messages during device setup and removal, so any
space above a few Kbytes is wasted.
Fix this by declaring the ring buffer size to be a fixed 16 Kbytes.
Furthermore, use the VMBUS_RING_SIZE() macro so that the ring buffer
header is properly accounted for, and so the size is rounded up to a
page boundary, using the page size for which the kernel is built. While
with 64 Kbyte pages this results in a 64 Kbyte ring buffer header plus a
64 Kbyte ring buffer, that's the smallest possible with that page size.
It's still 128 Kbytes better than the current code.
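The arithmetic above can be checked with a short sketch. The helper names are hypothetical, and model_vmbus_ring_size() is a simplified model of VMBUS_RING_SIZE() that assumes the ring buffer header occupies exactly one page; it is not the kernel macro itself.

```c
#include <stddef.h>

/* Round x up to a multiple of page_size (page_size is a power of two). */
static size_t page_align(size_t x, size_t page_size)
{
    return (x + page_size - 1) & ~(page_size - 1);
}

/* Old sizing: four pages, whatever the page size happens to be. */
static size_t old_ring_size(size_t page_size)
{
    return 4 * page_size;
}

/*
 * Simplified model of VMBUS_RING_SIZE(): one page of header (an assumption
 * made for this sketch) plus the payload, rounded up to a page boundary.
 */
static size_t model_vmbus_ring_size(size_t payload, size_t page_size)
{
    return page_align(page_size + payload, page_size);
}
```

Under these assumptions a 16 Kbyte payload gives 20 Kbytes with 4 Kbyte pages and 128 Kbytes with 64 Kbyte pages, against 256 Kbytes for the old four-page sizing on the latter.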
Link: https://lore.kernel.org/linux-pci/20240216202240.251818-1-mhklinux@outlook.com
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Long Li <longli@microsoft.com>
Cc: <stable@vger.kernel.org> # 5.15.x
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25415
commit 07e8f88568f558fb0f9529f49b3ab120cbe750fe
Author: Andrew Cooper <andrew.cooper3@citrix.com>
Date: Thu Nov 2 12:26:19 2023 +0000
x86/apic: Drop apic::delivery_mode
This field is set to APIC_DELIVERY_MODE_FIXED in all cases, and is read
exactly once. Fold the constant in uv_program_mmr() and drop the field.
Searching for the origin of the stale HyperV comment reveals commit
a31e58e129 ("x86/apic: Switch all APICs to Fixed delivery mode") which
notes:
As a consequence of this change, the apic::irq_delivery_mode field is
now pointless, but this needs to be cleaned up in a separate patch.
6 years is long enough for this technical debt to have survived.
[ bp: Fold in
https://lore.kernel.org/r/20231121123034.1442059-1-andrew.cooper3@citrix.com
]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lore.kernel.org/r/20231102-x86-apic-v1-1-bf049a2a0ed6@citrix.com
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-20318
commit f741bcadfe52e424985926d4d1c1e3941bf8403e
Author: Kees Cook <keescook@chromium.org>
Date: Fri Sep 22 10:52:57 2023 -0700
PCI: hv: Annotate struct hv_dr_state with __counted_by
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct hv_dr_state.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
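A minimal user-space sketch of the annotation pattern follows. The struct is a cut-down, hypothetical stand-in for struct hv_dr_state, and MODEL_COUNTED_BY is a local macro introduced here so the example still compiles where the compiler does not yet implement the attribute.

```c
#include <stdlib.h>

/* Expand to the attribute where available, to nothing otherwise. */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(counted_by)
#define MODEL_COUNTED_BY(member) __attribute__((counted_by(member)))
#else
#define MODEL_COUNTED_BY(member)
#endif

/* Hypothetical stand-in: a count followed by a flexible array member. */
struct model_dr_state {
    unsigned int device_count;
    int func[] MODEL_COUNTED_BY(device_count);
};

/* Allocate room for n entries and record the count before func[] is used. */
static struct model_dr_state *model_dr_alloc(unsigned int n)
{
    struct model_dr_state *dr;

    dr = calloc(1, sizeof(*dr) + n * sizeof(dr->func[0]));
    if (dr)
        dr->device_count = n;
    return dr;
}
```

The annotation changes no behavior by itself; it only tells the compiler which member bounds the array, so the counter has to be assigned before the array is indexed.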
Link: https://lore.kernel.org/linux-pci/20230922175257.work.900-kees@kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Reviewed-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Krzysztof Wilczyński <kw@linux.com>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Rob Herring <robh@kernel.org>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: linux-hyperv@vger.kernel.org
Cc: linux-pci@vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-20318
commit 503112f4225fac761d2a0170e6a5f09b69ae1d36
Author: Olaf Hering <olaf@aepfle.de>
Date: Mon Nov 7 17:18:31 2022 +0000
PCI: hv: update comment in x86 specific hv_arch_irq_unmask
The function hv_set_affinity was removed in commit 831c1ae7 ("PCI: hv:
Make the code arch neutral by adding arch specific interfaces").
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Link: https://lore.kernel.org/r/20221107171831.25283-1-olaf@aepfle.de
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2211797
commit 04bbe863241a9be7d57fb4cf217ee4a72f480e70
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Aug 16 10:59:39 2023 -0700
PCI: hv: Fix a crash in hv_pci_restore_msi_msg() during hibernation
When a Linux VM with an assigned PCI device runs on Hyper-V, if the PCI
device driver is not loaded yet (i.e. MSI-X/MSI is not enabled on the
device yet), doing a VM hibernation triggers a panic in
hv_pci_restore_msi_msg() -> msi_lock_descs(&pdev->dev), because
pdev->dev.msi.data is still NULL.
Avoid the panic by checking if MSI-X/MSI is enabled.
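The shape of the guard can be sketched in user space as follows. All types and names are hypothetical models rather than the kernel's struct pci_dev or the real restore routine; msi_data stands in for the per-device MSI state that is still NULL before a driver enables MSI or MSI-X.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-device MSI bookkeeping. */
struct model_msi_data {
    int restored;
};

struct model_pdev {
    bool msi_enabled;
    bool msix_enabled;
    struct model_msi_data *msi_data; /* NULL until MSI/MSI-X is set up */
};

/*
 * Return the number of devices whose MSI state was restored (0 or 1).
 * msi_data is only dereferenced once MSI or MSI-X is known to be enabled.
 */
static int model_restore_msi(struct model_pdev *pdev)
{
    if (!pdev->msi_enabled && !pdev->msix_enabled)
        return 0; /* nothing to restore, and msi_data may be NULL */

    pdev->msi_data->restored++;
    return 1;
}
```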
Link: https://lore.kernel.org/r/20230816175939.21566-1-decui@microsoft.com
Fixes: dc2b453290c4 ("PCI: hv: Rework MSI handling")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Reviewed-by: sathyanarayanan.kuppuswamy@linux.intel.com
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2222608
commit a494aef23dfc732945cb42e22246a5c31174e4a5
Author: Dexuan Cui <decui@microsoft.com>
Date: Thu Apr 20 18:30:25 2023 -0700
PCI: hv: Replace retarget_msi_interrupt_params with hyperv_pcpu_input_arg
4 commits are involved here:
A (2016): commit 0de8ce3ee8 ("PCI: hv: Allocate physically contiguous hypercall params buffer")
B (2017): commit be66b67365 ("PCI: hv: Use page allocation for hbus structure")
C (2019): commit 877b911a5b ("PCI: hv: Avoid a kmemleak false positive caused by the hbus buffer")
D (2018): commit 68bb7bfb79 ("X86/Hyper-V: Enable IPI enlightenments")
Patch D introduced the per-CPU hypercall input page "hyperv_pcpu_input_arg"
in 2018. With patch D, we no longer need the per-Hyper-V-PCI-bus hypercall
input page "hbus->retarget_msi_interrupt_params" that was added in patch A,
and the issue addressed by patch B is no longer an issue, and we can also
get rid of patch C.
The change here is required for PCI device assignment to work for
Confidential VMs (CVMs) running without a paravisor, because otherwise we
would have to call set_memory_decrypted() for
"hbus->retarget_msi_interrupt_params" before calling the hypercall
HVCALL_RETARGET_INTERRUPT.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20230421013025.17152-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2222608
commit 2c6ba4216844ca7918289b49ed5f3f7138ee2402
Author: Michael Kelley <mikelley@microsoft.com>
Date: Sun Mar 26 06:52:07 2023 -0700
PCI: hv: Enable PCI pass-thru devices in Confidential VMs
For PCI pass-thru devices in a Confidential VM, Hyper-V requires
that PCI config space be accessed via hypercalls. In normal VMs,
config space accesses are trapped to the Hyper-V host and emulated.
But in a confidential VM, the host can't access guest memory to
decode the instruction for emulation, so an explicit hypercall must
be used.
Add functions to make the new MMIO read and MMIO write hypercalls.
Update the PCI config space access functions to use the hypercalls
when such use is indicated by Hyper-V flags. Also, set the flag to
allow the Hyper-V PCI driver to be loaded and used in a Confidential
VM (a.k.a., "Isolation VM"). The driver has previously been hardened
against a malicious Hyper-V host[1].
[1] https://lore.kernel.org/all/20220511223207.3386-2-parri.andrea@gmail.com/
Co-developed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1679838727-87310-13-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2182619
commit 067d6ec7ed5b49380688e06c1e5f883a71bef4fe
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Jun 14 21:44:51 2023 -0700
PCI: hv: Add a per-bus mutex state_lock
In the case of fast device addition/removal, it's possible that
hv_eject_device_work() can start to run before create_root_hv_pci_bus()
starts to run; as a result, the pci_get_domain_bus_and_slot() in
hv_eject_device_work() can return a 'pdev' of NULL, and
hv_eject_device_work() can remove the 'hpdev', and immediately send a
message PCI_EJECTION_COMPLETE to the host, and the host immediately
unassigns the PCI device from the guest; meanwhile,
create_root_hv_pci_bus() and the PCI device driver can be probing the
dead PCI device and reporting timeout errors.
Fix the issue by adding a per-bus mutex 'state_lock' and grabbing the
mutex before powering on the PCI bus in hv_pci_enter_d0(): when
hv_eject_device_work() starts to run, it's able to find the 'pdev' and call
pci_stop_and_remove_bus_device(pdev): if the PCI device driver has
loaded, the PCI device driver's probe() function is already called in
create_root_hv_pci_bus() -> pci_bus_add_devices(), and now
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able
to call the PCI device driver's remove() function and remove the device
reliably; if the PCI device driver hasn't loaded yet, the function call
hv_eject_device_work() -> pci_stop_and_remove_bus_device() is able to
remove the PCI device reliably and the PCI device driver's probe()
function won't be called; if the PCI device driver's probe() is already
running (e.g., systemd-udev is loading the PCI device driver), it must
be holding the per-device lock, and after the probe() finishes and releases
the lock, hv_eject_device_work() -> pci_stop_and_remove_bus_device() is
able to proceed to remove the device reliably.
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-6-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2182619
commit a847234e24d03d01a9566d1d9dcce018cc018d67
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Jun 14 21:44:50 2023 -0700
Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally"
This reverts commit d6af2ed29c.
The statement "the hv_pci_bus_exit() call releases structures of all its
child devices" in commit d6af2ed29c is not true: in the path
hv_pci_probe() -> hv_pci_enter_d0() -> hv_pci_bus_exit(hdev, true): the
parameter "keep_devs" is true, so hv_pci_bus_exit() does *not* release the
child "struct hv_pci_dev *hpdev" that is created earlier in
pci_devices_present_work() -> new_pcichild_device().
The commit d6af2ed29c was originally made in July 2020 for RHEL 7.7,
where the old version of hv_pci_bus_exit() was used; when the commit was
rebased and merged into the upstream, people didn't notice that it's
not really necessary. The commit itself doesn't cause any issue, but it
makes hv_pci_probe() more complicated. Revert it to facilitate some
upcoming changes to hv_pci_probe().
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Wei Hu <weh@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-5-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2182619
commit add9195e69c94b32e96f78c2f9cea68f0e850b3f
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Jun 14 21:44:49 2023 -0700
PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev
The hpdev->state is never really useful. The only use in
hv_pci_eject_device() and hv_eject_device_work() is not really necessary.
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-4-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2182619
commit 2738d5ab7929a845b654cd171a1e275c37eb428e
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Jun 14 21:44:48 2023 -0700
PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic
When the host tries to remove a PCI device, the host first sends a
PCI_EJECT message to the guest, and the guest is supposed to gracefully
remove the PCI device and send a PCI_EJECTION_COMPLETE message to the host;
the host then sends a VMBus message CHANNELMSG_RESCIND_CHANNELOFFER to
the guest (when the guest receives this message, the device is already
unassigned from the guest) and the guest can do some final cleanup work;
if the guest fails to respond to the PCI_EJECT message within one minute,
the host sends the VMBus message CHANNELMSG_RESCIND_CHANNELOFFER and
removes the PCI device forcibly.
In the case of fast device addition/removal, it's possible that the PCI
device driver is still configuring MSI-X interrupts when the guest receives
the PCI_EJECT message; the channel callback calls hv_pci_eject_device(),
which sets hpdev->state to hv_pcichild_ejecting, and schedules a work
hv_eject_device_work(); if the PCI device driver is calling
pci_alloc_irq_vectors() -> ... -> hv_compose_msi_msg(), we can break the
while loop in hv_compose_msi_msg() due to the updated hpdev->state, and
leave data->chip_data with its default value of NULL; later, when the PCI
device driver calls request_irq() -> ... -> hv_irq_unmask(), the guest
crashes in hv_arch_irq_unmask() due to data->chip_data being NULL.
Fix the issue by not testing hpdev->state in the while loop: when the
guest receives PCI_EJECT, the device is still assigned to the guest, and
the guest has one minute to finish the device removal gracefully. We don't
really need to (and we should not) test hpdev->state in the loop.
Fixes: de0aa7b2f9 ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-3-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2182619
commit 440b5e3663271b0ffbd4908115044a6a51fb938b
Author: Dexuan Cui <decui@microsoft.com>
Date: Wed Jun 14 21:44:47 2023 -0700
PCI: hv: Fix a race condition bug in hv_pci_query_relations()
Since day 1 of the driver, there has been a race between
hv_pci_query_relations() and survey_child_resources(): during fast
device hotplug, hv_pci_query_relations() may error out due to
device-remove and the stack variable 'comp' is no longer valid;
however, pci_devices_present_work() -> survey_child_resources() ->
complete() may be running on another CPU and accessing the no-longer-valid
'comp'. Fix the race by flushing the workqueue before we exit from
hv_pci_query_relations().
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615044451.5580-2-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2114045
Conflicts: out of order commits in drivers/pci/controller/pci-hyperv.c which
removed the usage of struct cpumask *
Omitted-fix: 5182fecc4be8 "PCI: hv: Take a const cpumask in hv_compose_msi_req_get_cpu()"
This is the same as upstream 9167fd5d5549 (RHEL e9d95ad75d)
commit 4d0b8298818b623f5fa51d5c49e1a142d3618ac9
Author: Samuel Holland <samuel@sholland.org>
Date: Fri Jul 1 15:00:55 2022 -0500
genirq: Return a const cpumask from irq_data_get_affinity_mask
Now that the irq_data_update_affinity helper exists, enforce its use
by returning a const cpumask from irq_data_get_affinity_mask.
Since the previous commit already updated places that needed to call
irq_data_update_affinity, this commit updates the remaining code that
either did not modify the cpumask or immediately passed the modified
mask to irq_set_affinity.
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220701200056.46555-8-samuel@sholland.org
Signed-off-by: David Arcari <darcari@redhat.com>
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/2097
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172436
This patch series updates the Hyper-V VMBus driver to upstream kernel 6.3
Signed-off-by: Mohamed Gamal Morsy <mgamal@redhat.com>
Approved-by: Lyude Paul <lyude@redhat.com>
Approved-by: Donald Dutile <ddutile@redhat.com>
Approved-by: Tony Camuso <tcamuso@redhat.com>
Approved-by: David Arcari <darcari@redhat.com>
Approved-by: Lenny Szubowicz <lszubowi@redhat.com>
Approved-by: John W. Linville <linville@redhat.com>
Approved-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Jan Stancek <jstancek@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2175165
commit d474d92d70250d43e7ce0c7cb8623f31ee7c40f6
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Fri Nov 11 14:55:17 2022 +0100
x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORS
Now that the PCI/MSI core code does early checking for multi-MSI support
X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore.
Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de
Signed-off-by: David Arcari <darcari@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2175165
commit dc2b453290c471266a2d56d7ead981e3c5cea05e
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Mon Dec 6 23:51:33 2021 +0100
PCI: hv: Rework MSI handling
Replace the about-to-vanish iterators and make use of the filtering. Take
the descriptor lock around the iterators.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211206210748.629363944@linutronix.de
Signed-off-by: David Arcari <darcari@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2172436
Since commit fc7a6209d571 ("bus: Make remove callback return
void") forces bus_type::remove be void-returned, it doesn't
make much sense for any bus based driver implementing remove
callback to return non-void to its caller.
As such, change the remove function for Hyper-V VMBus based
drivers to return void.
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Link: https://lore.kernel.org/r/TYCP286MB2323A93C55526E4DF239D3ACCAFA9@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit 96ec2939620c48a503d9c89865c0c230d6f955e4)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2151246
Upstream Status: e58f2259b91c02974c20db7b28d39d810a21249b
Conflict(s):
Patching file arch/powerpc/platforms/powernv/pci-ioda.c: Hunk #1 FAILED
at 2154
-and-
Patching file arch/powerpc/platforms/pseries/msi.c: Hunk #2 FAILED at
449, Hunk #3 FAILED at 580.
RHEL's version of these drivers is out of date and does not have
'MSI domains' implemented (see upstream commit 0fcfe2247e75
"powerpc/powernv/pci: Add MSI domains" and a5f3d2c17b07
"powerpc/pseries/pci: Add MSI domains"), thus the changes do not apply.
These can be addressed if the drivers are ever updated.
Patching file drivers/net/wireless/ath/ath11k/pci.c: Hunk #1 FAILED at
911. The conflict is with surrounding context, not with the line of
code being changed. This is due to out-of-order (with respect to
what occurred upstream) backports with the driver and this series
changes.
For RHEL, the change to drivers/pci/controller/pci-hyperv.c was required
since prior hyperv related backports skipped over this dependency:
namely RHEL commit 234333956e "PCI: hv: Only reuse existing IRTE
allocation for Multi-MSI". This will clean that up.
commit e58f2259b91c02974c20db7b28d39d810a21249b
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Mon Dec 6 23:27:39 2021 +0100
genirq/msi, treewide: Use a named struct for PCI/MSI attributes
The unnamed struct sucks and is in the way of further cleanups. Stick the
PCI related MSI data into a real data structure and cleanup all users.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20211206210224.374863119@linutronix.de
Signed-off-by: Myron Stowe <mstowe@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
Jeffrey added Multi-MSI support to the pci-hyperv driver by the 4 patches:
08e61e861a0e ("PCI: hv: Fix multi-MSI to allow more than one MSI vector")
455880dfe292 ("PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI")
b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
a2bad844a67b ("PCI: hv: Fix interrupt mapping for multi-MSI")
It turns out that the third patch (b4b77778ecc5) causes a performance
regression because all the interrupts now happen on 1 physical CPU (or two
pCPUs, if one pCPU doesn't have enough vectors). When a guest has many PCI
devices, it may suffer from soft lockups if the workload is heavy, e.g.,
see https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/
Commit b4b77778ecc5 itself is good. The real issue is that the hypercall in
hv_irq_unmask() -> hv_arch_irq_unmask() ->
hv_do_hypercall(HVCALL_RETARGET_INTERRUPT...) only changes the target
virtual CPU rather than physical CPU; with b4b77778ecc5, the pCPU is
determined only once in hv_compose_msi_msg() where only vCPU0 is specified;
consequently the hypervisor only uses 1 target pCPU for all the interrupts.
Note: before b4b77778ecc5, the pCPU is determined twice, and when the pCPU
is determined the second time, the vCPU in the effective affinity mask is
used (i.e., it isn't always vCPU0), so the hypervisor chooses different
pCPU for each interrupt.
The hypercall will be fixed in future to update the pCPU as well, but
that will take quite a while, so let's restore the old behavior in
hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
single-MSI and MSI-X; for multi-MSI, we choose the vCPU in a round-robin
manner for each PCI device, so the interrupts of different devices can
happen on different pCPUs, though the interrupts of each device happen on
some single pCPU.
The hypercall fix may not be backported to all old versions of Hyper-V, so
we want to have this guest side change forever (or at least till we're sure
the old affected versions of Hyper-V are no longer supported).
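The round-robin choice for multi-MSI can be sketched as below. This is an illustrative model only: the function name and the cursor variable are hypothetical, and the real driver selects from the CPUs in the interrupt's affinity mask rather than from a plain 0..nr_cpus-1 range.

```c
/*
 * Hypothetical round-robin picker. *cursor remembers where the previous
 * device left off, so successive devices land on different vCPUs.
 * nr_cpus must be non-zero.
 */
static unsigned int model_pick_cpu_round_robin(unsigned int *cursor,
                                               unsigned int nr_cpus)
{
    unsigned int cpu = *cursor % nr_cpus;

    /* Advance the cursor for the next device, wrapping at nr_cpus. */
    *cursor = (cpu + 1) % nr_cpus;
    return cpu;
}
```

Calling it once per PCI device spreads devices over the available vCPUs, while every vector of one multi-MSI block keeps the vCPU chosen for that device.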
Fixes: b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Co-developed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20221104222953.11356-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit c234ba8042920fa83635808dc5673f36869ca280)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
RHEL-only:
Don't use msi_attrib->pci as the struct isn't introduced in RHEL-9
Conflicts:
drivers/pci/controller/pci-hyperv.c (missing commit 4d0b8298818b and RHEL-only code)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
The local variable 'vector' must be u32 rather than u8: see the
struct hv_msi_desc3.
'vector_count' should be u16 rather than u8: see struct hv_msi_desc,
hv_msi_desc2 and hv_msi_desc3.
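Why the width matters can be shown with a tiny sketch. The helper names and the sample value are hypothetical; the only point is that storing a 32-bit quantity in an 8-bit local silently drops the upper bits.

```c
#include <stdint.h>

/* Storing through a u8-sized local keeps only the low 8 bits. */
static uint8_t model_store_narrow(uint32_t vector)
{
    return (uint8_t)vector;
}

/* A u32-sized local preserves the full value. */
static uint32_t model_store_wide(uint32_t vector)
{
    return vector;
}
```

For example, a value of 0x1ef comes back as 0xef from the narrow helper and unchanged from the wide one.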
Fixes: a2bad844a67b ("PCI: hv: Fix interrupt mapping for multi-MSI")
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: Jeffrey Hugo <quic_jhugo@quicinc.com>
Cc: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20221027205256.17678-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit e70af8d040d2b7904dca93d942ba23fb722e21b1)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Conflicts:
drivers/pci/controller/pci-hyperv.c (missing commit 4d0b8298818b)
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
The cpumask that is passed to this function ultimately comes from
irq_data_get_effective_affinity_mask(), which was recently changed to
return a const cpumask pointer. The first level of functions handling
the affinity mask were updated, but not this helper function.
Fixes: 4d0b8298818b ("genirq: Return a const cpumask from irq_data_get_affinity_mask")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220708004931.1672-1-samuel@sholland.org
(cherry picked from commit 9167fd5d5549bcea6d4735a270908da2a3475f3a)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
[ Similarly to commit a765ed47e4516 ("PCI: hv: Fix synchronization
between channel callback and hv_compose_msi_msg()"): ]
The (on-stack) teardown packet becomes invalid once the completion
timeout in hv_pci_bus_exit() has expired and hv_pci_bus_exit() has
returned. Prevent the channel callback from accessing the invalid
packet by removing the ID associated to such packet from the VMbus
requestor in hv_pci_bus_exit().
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20220511223207.3386-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit b4927bd272623694314f37823302f9d67aa5964c)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest in the host-to-guest ring buffer. Ensure that
invalid values cannot cause data being copied out of the bounds of the
source buffer in hv_pci_onchannelcallback().
While at it, remove a redundant validation in hv_pci_generic_compl():
hv_pci_onchannelcallback() already ensures that all processed incoming
packets are "at least as large as [in fact larger than] a response".
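The kind of validation described here can be sketched in user space as follows. This is a generic model, not the checks in hv_pci_onchannelcallback(): the function name and parameters are hypothetical, and claimed_off and claimed_len stand for values read from an untrusted packet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy claimed_len bytes starting at claimed_off out of a packet of
 * pkt_len bytes. Returns 0 on success, -1 if the claimed range does not
 * fit in the packet or in the destination buffer.
 */
static int model_copy_payload(const uint8_t *pkt, size_t pkt_len,
                              size_t claimed_off, size_t claimed_len,
                              uint8_t *dst, size_t dst_len)
{
    if (claimed_off > pkt_len)
        return -1;
    /* Written as a subtraction so the check itself cannot overflow. */
    if (claimed_len > pkt_len - claimed_off)
        return -1;
    if (claimed_len > dst_len)
        return -1;

    memcpy(dst, pkt + claimed_off, claimed_len);
    return 0;
}
```

Comparing claimed_len against pkt_len - claimed_off, rather than adding the two untrusted values together, keeps a huge offset or length from wrapping around and passing the check.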
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20220511223207.3386-2-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit 9937fa6d1eb6fac95586970e17617a718919c858)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
According to Dexuan, the hypervisor folks believe that multi-msi
allocations are not correct. compose_msi_msg() will allocate multi-msi
one by one. However, multi-msi is a block of related MSIs, with alignment
requirements. In order for the hypervisor to allocate properly aligned
and consecutive entries in the IOMMU Interrupt Remapping Table, there
should be a single mapping request that requests all of the multi-msi
vectors in one shot.
Dexuan suggests detecting the multi-msi case and composing a single
request related to the first MSI. Then for the other MSIs in the same
block, use the cached information. This appears to be viable, so do it.
Suggested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1652282599-21643-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit a2bad844a67b1c7740bda63e87453baf63c3a7f7)
RHEL-only:
Use RHEL-9 structs
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
Currently if compose_msi_msg() is called multiple times, it will free any
previous IRTE allocation, and generate a new allocation. While nothing
prevents this from occurring, it is extraneous when Linux could just reuse
the existing allocation and avoid a bunch of overhead.
However, when future IRTE allocations operate on blocks of MSIs instead of
a single line, freeing the allocation will impact all of the lines. This
could cause an issue where an allocation of N MSIs occurs, then some of
the lines are retargeted, and finally the allocation is freed/reallocated.
The freeing of the allocation removes all of the configuration for the
entire block, which requires all the lines to be retargeted, which might
not happen since some lines might already be unmasked/active.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1652282582-21595-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit b4b77778ecc5bfbd4e77de1b2fd5c1dd3c655f1f)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
In the multi-MSI case, hv_arch_irq_unmask() will only operate on the first
MSI of the N allocated. This is because only the first msi_desc is cached
and it is shared by all the MSIs of the multi-MSI block. This means that
hv_arch_irq_unmask() gets the correct address, but the wrong data (always
0).
This can break MSIs.
Let's assume MSI0 is vector 34 on CPU0, and MSI1 is vector 33 on CPU0.
hv_arch_irq_unmask() is called on MSI0. It uses a hypercall to configure
the MSI address and data (0) to vector 34 of CPU0. This is correct. Then
hv_arch_irq_unmask() is called on MSI1. It uses another hypercall to
configure the MSI address and data (0) to vector 33 of CPU0. This is
wrong, and results in both MSI0 and MSI1 being routed to vector 33. Linux
will observe extra instances of MSI1 and no instances of MSI0 despite the
endpoint device behaving correctly.
For the multi-MSI case, we need unique address and data info for each MSI,
but the cached msi_desc does not provide that. However, that information
can be gotten from the int_desc cached in the chip_data by
compose_msi_msg(). Fix the multi-MSI case to use that cached information
instead. Since hv_set_msi_entry_from_desc() is no longer applicable,
remove it.
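The MSI0/MSI1 scenario above can be replayed with a small model. This is an
illustrative sketch only: remap_table and retarget() are hypothetical
stand-ins for the remapping state and the retarget hypercall, and the table
is indexed by MSI data alone.

```c
#include <stdint.h>

#define NR_ENTRIES 4
#define NO_VECTOR  (-1)

/* Hypothetical remapping table: one target vector per MSI data value. */
struct remap_table {
	int vector[NR_ENTRIES];
};

static void remap_init(struct remap_table *t)
{
	for (int i = 0; i < NR_ENTRIES; i++)
		t->vector[i] = NO_VECTOR;
}

/* Stand-in for the retarget hypercall: bind MSI data to a CPU vector. */
static void retarget(struct remap_table *t, uint32_t data, int vector)
{
	if (data < NR_ENTRIES)
		t->vector[data] = vector;
}

/* Old behavior: every MSI in the block reports data 0. */
static void unmask_with_shared_desc(struct remap_table *t)
{
	retarget(t, 0, 34);	/* MSI0 */
	retarget(t, 0, 33);	/* MSI1 overwrites the entry used by MSI0 */
}

/* Fixed behavior: each MSI supplies its own data. */
static void unmask_with_per_irq_data(struct remap_table *t)
{
	retarget(t, 0, 34);	/* MSI0 */
	retarget(t, 1, 33);	/* MSI1 */
}
```

In the shared-descriptor case entry 0 ends up pointing at vector 33 and entry
1 is never programmed, which matches the misrouting described above.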
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1651068453-29588-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit 455880dfe292a2bdd3b4ad6a107299fce610e64b)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
Dexuan wrote:
"[...] when we disable AccelNet, the host PCI VSP driver sends a
PCI_EJECT message first, and the channel callback may set
hpdev->state to hv_pcichild_ejecting on a different CPU. This can
cause hv_compose_msi_msg() to exit from the loop and 'return', and
the on-stack variable 'ctxt' is invalid. Now, if the response
message from the host arrives, the channel callback will try to
access the invalid 'ctxt' variable, and this may cause a crash."
Schematically:
  Hyper-V sends PCI_EJECT msg
    hv_pci_onchannelcallback()
      state = hv_pcichild_ejecting
                                       hv_compose_msi_msg()
                                         alloc and init comp_pkt
                                         state == hv_pcichild_ejecting
  Hyper-V sends VM_PKT_COMP msg
    hv_pci_onchannelcallback()
      retrieve address of comp_pkt
                                         'free' comp_pkt and return
      comp_pkt->completion_func()
Dexuan also showed how the crash can be triggered after introducing
suitable delays in the driver code, thus validating the 'assumption'
that the host can still normally respond to the guest's compose_msi
request after the host has started to eject the PCI device.
Fix the synchronization by leveraging the requestor lock as follows:
- Before 'return'-ing in hv_compose_msi_msg(), remove the ID (while
holding the requestor lock) associated to the completion packet.
- Retrieve the address *and* call ->completion_func() within the same
(requestor) critical section in hv_pci_onchannelcallback().
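The two rules above can be sketched in userspace as follows. This is a
simplified model under stated assumptions: a pthread mutex stands in for the
requestor lock, and req_slots, request_add(), request_remove() and
channel_callback() are hypothetical names, not the VMBus API.

```c
#include <pthread.h>
#include <stddef.h>

#define NR_SLOTS 8

struct comp_pkt {
	void (*completion_func)(struct comp_pkt *pkt);
	int completed;
};

/* Hypothetical requestor: maps small request IDs to packets under a lock. */
static pthread_mutex_t req_lock = PTHREAD_MUTEX_INITIALIZER;
static struct comp_pkt *req_slots[NR_SLOTS];

static void mark_completed(struct comp_pkt *pkt)
{
	pkt->completed = 1;
}

/* Sender side: publish the packet and return its ID, or -1 if full. */
static int request_add(struct comp_pkt *pkt)
{
	int id = -1;

	pthread_mutex_lock(&req_lock);
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!req_slots[i]) {
			req_slots[i] = pkt;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&req_lock);
	return id;
}

/* Sender side: withdraw the ID before the on-stack packet goes away. */
static void request_remove(int id)
{
	pthread_mutex_lock(&req_lock);
	if (id >= 0 && id < NR_SLOTS)
		req_slots[id] = NULL;
	pthread_mutex_unlock(&req_lock);
}

/* Callback side: look up and complete inside one critical section. */
static int channel_callback(int id)
{
	int handled = 0;

	pthread_mutex_lock(&req_lock);
	if (id >= 0 && id < NR_SLOTS && req_slots[id]) {
		req_slots[id]->completion_func(req_slots[id]);
		req_slots[id] = NULL;
		handled = 1;
	}
	pthread_mutex_unlock(&req_lock);
	return handled;
}
```

Because lookup and completion share one critical section, a late response
either finds the packet still registered or finds nothing at all; it can
never run the completion on a packet whose ID has been removed.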
Reported-by: Wei Hu <weh@microsoft.com>
Reported-by: Dexuan Cui <decui@microsoft.com>
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220419122325.10078-7-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit a765ed47e45166451680ee9af2b9e435c82ec3ba)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
Currently, pointers to guest memory are passed to Hyper-V as transaction
IDs in hv_pci. In the face of errors or malicious behavior in Hyper-V,
hv_pci should not expose or trust the transaction IDs returned by
Hyper-V to be valid guest memory addresses. Instead, use small integers
generated by vmbus_requestor as request (transaction) IDs.
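A rough sketch of why small integer IDs are safer than raw pointers. The
request_table type and its helpers are hypothetical and far simpler than
vmbus_requestor; the point is only that an ID coming back from the host is
validated against a table before anything is dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_REQUESTS 16

/* Hypothetical request table; the index is the ID sent to the host. */
struct request_table {
	void *ctx[NR_REQUESTS];
};

/* Store ctx and return its small integer ID, or -1 when the table is full. */
static int64_t request_id_alloc(struct request_table *t, void *ctx)
{
	for (int64_t i = 0; i < NR_REQUESTS; i++) {
		if (!t->ctx[i]) {
			t->ctx[i] = ctx;
			return i;
		}
	}
	return -1;
}

/* Translate an ID from the host; anything unknown yields NULL. */
static void *request_id_lookup(struct request_table *t, uint64_t id)
{
	void *ctx;

	if (id >= NR_REQUESTS)
		return NULL;
	ctx = t->ctx[id];
	t->ctx[id] = NULL;	/* each ID completes at most once */
	return ctx;
}
```

A bogus value that looks like a guest address falls outside the table and is
rejected, and a replayed ID returns NULL the second time.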
Suggested-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/20220419122325.10078-3-parri.andrea@gmail.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit de5ddb7d44347ad8b00533c1850a4e2e636a1ce9)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2139460
If the allocation of multiple MSI vectors for multi-MSI fails in the core
PCI framework, the framework will retry the allocation as a single MSI
vector, assuming that meets the min_vecs specified by the requesting
driver.
Hyper-V advertises that multi-MSI is supported, but reuses the VECTOR
domain to implement that for x86. The VECTOR domain does not support
multi-MSI, so the alloc will always fail and fall back to a single MSI
allocation.
In short, Hyper-V advertises a capability it does not implement.
Hyper-V can support multi-MSI because it coordinates with the hypervisor
to map the MSIs in the IOMMU's interrupt remapper, which is something the
VECTOR domain does not have. Therefore the fix is simple - copy what the
x86 IOMMU drivers (AMD/Intel-IR) do by removing
X86_IRQ_ALLOC_CONTIGUOUS_VECTORS after calling the VECTOR domain's
pci_msi_prepare().
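The shape of that fix, reduced to a userspace sketch. The flag value, the
alloc_info struct and both functions below are stand-ins invented for
illustration; the model also assumes the VECTOR domain's prepare step always
sets the flag, which simplifies the real behavior.

```c
#include <stdint.h>

/* Stand-in value; the real flag is defined in the x86 headers. */
#define X86_IRQ_ALLOC_CONTIGUOUS_VECTORS 0x1u

/* Reduced, hypothetical version of the allocation info. */
struct alloc_info {
	uint32_t flags;
};

/* Models the VECTOR domain's prepare step requesting contiguous vectors. */
static int vector_domain_prepare(int nvec, struct alloc_info *info)
{
	(void)nvec;
	info->flags |= X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
	return 0;
}

/* Hyper-V wrapper: the remapper makes contiguous CPU vectors unnecessary. */
static int hv_prepare_sketch(int nvec, struct alloc_info *info)
{
	int ret = vector_domain_prepare(nvec, info);

	info->flags &= ~X86_IRQ_ALLOC_CONTIGUOUS_VECTORS;
	return ret;
}
```

After the wrapper runs, the flag is clear, so a later allocation of several
vectors is not rejected for lack of contiguous CPU vectors.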
Fixes: 4daace0d8c ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1649856981-14649-1-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit 08e61e861a0e47e5e1a3fb78406afd6b0cea6b6d)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2092794
commit 23e118a48acf7be223e57d98e98da8ac5a4071ac
Author: Dexuan Cui <decui@microsoft.com>
Date: Mon May 2 00:42:55 2022 -0700
PCI: hv: Do not set PCI_COMMAND_MEMORY to reduce VM boot time
Currently when the pci-hyperv driver finishes probing and initializing the
PCI device, it sets the PCI_COMMAND_MEMORY bit; later when the PCI device
is registered to the core PCI subsystem, the core PCI driver's BAR detection
and initialization code toggles the bit multiple times, and each toggling of
the bit causes the hypervisor to unmap/map the virtual BARs from/to the
physical BARs, which can be slow if the BAR sizes are huge, e.g., a Linux VM
with 14 GPU devices has to spend more than 3 minutes on BAR detection and
initialization, causing a long boot time.
Reduce the boot time by not setting the PCI_COMMAND_MEMORY bit when we
register the PCI device (there is no need to have it set in the first place).
The bit stays off till the PCI device driver calls pci_enable_device().
With this change, the boot time of such a 14-GPU VM is reduced by almost
3 minutes.
Link: https://lore.kernel.org/lkml/20220419220007.26550-1-decui@microsoft.com/
Tested-by: Boqun Feng (Microsoft) <boqun.feng@gmail.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Jake Oshins <jakeo@microsoft.com>
Link: https://lore.kernel.org/r/20220502074255.16901-1-decui@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2086678
Fix the following build error:
drivers/pci/controller/pci-hyperv.c:769:13: error: ‘hv_set_msi_entry_from_desc’ defined but not used [-Werror=unused-function]
769 | static void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
The arm64 implementation of hv_set_msi_entry_from_desc() is not used after
d06957d7a692 ("PCI: hv: Avoid the retarget interrupt hypercall in
irq_unmask() on ARM64"), so remove it.
Fixes: d06957d7a692 ("PCI: hv: Avoid the retarget interrupt hypercall in irq_unmask() on ARM64")
Link: https://lore.kernel.org/r/20220317085130.36388-1-yuehaibing@huawei.com
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
(cherry picked from commit 22ef7ee3eeb2a41e07f611754ab9a2663232fedf)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2086678
On ARM64 Hyper-V guests, SPIs are used for the interrupts of virtual PCI
devices, and SPIs can be managed directly via GICD registers. Therefore
the retarget interrupt hypercall is not needed on ARM64.
An arch-specific interface hv_arch_irq_unmask() is introduced to handle
the architecture level differences on this. For x86, the behavior
remains unchanged, while for ARM64 no hypercall is invoked when
unmasking an irq for virtual PCI devices.
Link: https://lore.kernel.org/r/20220217034525.1687678-1-boqun.feng@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
(cherry picked from commit d06957d7a6929e6a4aa959cb59d66f0c095fc974)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2086678
When the kernel boots with a NUMA topology with some NUMA nodes offline, the
PCI driver should only set an online NUMA node on the device. This can happen
during KDUMP where some NUMA nodes are not made online by the KDUMP kernel.
This patch also fixes the case where the kernel is booting with "numa=off".
Fixes: 999dd956d8 ("PCI: hv: Add support for protocol 1.3 and support PCI_BUS_RELATIONS2")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@microsoft.com>
Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/1643247814-15184-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit 3149efcdf2c6314420c418dfc94de53bfd076b1f)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2068432
commit 8d21732475c637c7efcdb91dc927a4c594e97898
Author: Michael Kelley <mikelley@microsoft.com>
Date: Thu Mar 24 09:14:52 2022 -0700
PCI: hv: Propagate coherence from VMbus device to PCI device
PCI pass-thru devices in a Hyper-V VM are represented as a VMBus
device and as a PCI device. The coherence of the VMbus device is
set based on the VMbus node in ACPI, but the PCI device has no
ACPI node and defaults to not hardware coherent. This results
in extra software coherence management overhead on ARM64 when
devices are hardware coherent.
Fix this by setting up the PCI host bus so that normal
PCI mechanisms will propagate the coherence of the VMbus
device to the PCI device. There's no effect on x86/x64 where
devices are always hardware coherent.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1648138492-2191-3-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2024852
commit d9932b46915664c88709d59927fa67e797adec56
Author: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Wed Jan 5 11:32:36 2022 -0800
PCI: hv: Add arm64 Hyper-V vPCI support
Add arm64 Hyper-V vPCI support by implementing the arch specific
interfaces. Introduce an IRQ domain and chip specific to Hyper-V vPCI that
is based on SPIs. The IRQ domain parents itself to the arch GIC IRQ domain
for basic vector management.
[bhelgaas: squash in fix from Yang Li <yang.lee@linux.alibaba.com>:
https://lore.kernel.org/r/20220112003324.62755-1-yang.lee@linux.alibaba.com]
Link: https://lore.kernel.org/r/1641411156-31705-3-git-send-email-sunilmut@linux.microsoft.com
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2024852
commit 831c1ae725f7d2f8f858b0840692b48e75b49331
Author: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Wed Jan 5 11:32:35 2022 -0800
PCI: hv: Make the code arch neutral by adding arch specific interfaces
Encapsulate arch dependencies in Hyper-V vPCI through a set of
arch-dependent interfaces. Adding these arch specific interfaces will
allow for an implementation for other architectures, such as arm64.
There are no functional changes expected from this patch.
Link: https://lore.kernel.org/r/1641411156-31705-2-git-send-email-sunilmut@linux.microsoft.com
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Boqun Feng <boqun.feng@gmail.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2024852
commit f18312084300598544529510bcfab5f3e795e36a
Author: Krzysztof Wilczyński <kw@linux.com>
Date: Fri Oct 8 22:27:30 2021 +0000
PCI: hv: Remove unnecessary use of %hx
"dom_req" is a u16 but varargs automatically promotes it to int, so there's
no point in using the %h modifier. Drop it.
See cbacb5ab0a ("docs: printk-formats: Stop encouraging use of
unnecessary %h[xudi] and %hh[xudi]") and 70eb2275ff ("checkpatch: add
warning for unnecessary use of %h[xudi] and %hh[xudi]").
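A small userspace check of the promotion argument, using snprintf() in place
of the kernel's printk(). The helper name same_output() is made up for this
example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-bit value with and without the h modifier and compare. */
static int same_output(uint16_t dom_req)
{
	char with_h[16], without_h[16];

	snprintf(with_h, sizeof(with_h), "%hx", dom_req);
	snprintf(without_h, sizeof(without_h), "%x", dom_req);
	return strcmp(with_h, without_h) == 0;
}
```

Since a u16 argument is promoted to int before the call, both format strings
produce the same text for every 16-bit value.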
Link: https://lore.kernel.org/r/20211008222732.2868493-1-kw@linux.com
Signed-off-by: Krzysztof Wilczyński <kw@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2008571
In hv_pci_bus_exit, the code is holding a spinlock while calling
pci_destroy_slot(), which takes a mutex.
This is not safe, because a mutex may sleep and sleeping is not allowed
while holding a spinlock. Fix this by moving the children to be deleted
to a list on the stack, and removing them after the spinlock is released.
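The pattern can be sketched in userspace like this. It is only a model: a
pthread mutex stands in for the spinlock, destroy_child() stands in for the
teardown that may sleep, and the struct and function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

struct child {
	struct child *next;
	int destroyed;
};

struct bus {
	pthread_mutex_t lock;	/* stands in for the device-list spinlock */
	struct child *children;
};

/* Stand-in for teardown that may sleep, e.g. because it takes a mutex. */
static void destroy_child(struct child *c)
{
	c->destroyed = 1;
}

static int bus_exit_sketch(struct bus *bus)
{
	struct child *removed, *c;
	int count = 0;

	/* Under the lock: only detach the whole list onto a local variable. */
	pthread_mutex_lock(&bus->lock);
	removed = bus->children;
	bus->children = NULL;
	pthread_mutex_unlock(&bus->lock);

	/* Lock released: the teardown that may sleep is now safe. */
	while (removed) {
		c = removed;
		removed = c->next;
		c->next = NULL;
		destroy_child(c);
		count++;
	}
	return count;
}
```

The critical section shrinks to a pointer swap, and nothing that can block
runs while the lock is held.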
Fixes: 94d2276320 ("PCI: hv: Fix a race condition when removing the device")
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Wei Liu <wei.liu@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Rob Herring <robh@kernel.org>
Cc: "Krzysztof Wilczyński" <kw@linux.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/linux-hyperv/20210823152130.GA21501@kili/
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1630365207-20616-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
(cherry picked from commit 41608b64b10b80fe00dd253cd8326ec8ad85930f)
Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1949613
commit 88f94c7f8f40d7e26f991f6f6ed914ff44361d75
Author: Boqun Feng <boqun.feng@gmail.com>
Date: Tue Jul 27 02:06:57 2021 +0800
PCI: hv: Turn on the host bridge probing on ARM64
Now we have everything we need, just provide a proper sysdata type for
the bus to use on ARM64 and everything else works.
Link: https://lore.kernel.org/r/20210726180657.142727-9-boqun.feng@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1949613
commit 9e7f9178ab4943b3a7294a12bc38925c515ca3f0
Author: Boqun Feng <boqun.feng@gmail.com>
Date: Tue Jul 27 02:06:56 2021 +0800
PCI: hv: Set up MSI domain at bridge probing time
Since PCI_HYPERV depends on PCI_MSI_IRQ_DOMAIN which selects
GENERIC_MSI_IRQ_DOMAIN, we can use dev_set_msi_domain() to set up the
MSI domain at probing time, and this works for both x86 and ARM64.
Therefore use it as the preparation for ARM64 Hyper-V PCI support.
As a result, there is no longer a need to maintain ->fwnode in the
x86-specific pci_sysdata; hv_pcibus_device owns it instead.
Link: https://lore.kernel.org/r/20210726180657.142727-8-boqun.feng@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1949613
commit 38c0d266dc80b81f7f72314620f01ff6a1e119fe
Author: Boqun Feng <boqun.feng@gmail.com>
Date: Tue Jul 27 02:06:55 2021 +0800
PCI: hv: Set ->domain_nr of pci_host_bridge at probing time
No functional change, just store and maintain the PCI domain number in
the ->domain_nr of pci_host_bridge. Note that we still need to keep
the copy of domain number in x86-specific pci_sysdata, because x86 is
not a PCI_DOMAINS_GENERIC=y architecture, so the ->domain_nr of
pci_host_bridge doesn't work for it yet.
Link: https://lore.kernel.org/r/20210726180657.142727-7-boqun.feng@gmail.com
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1949613
commit 418cb6c8e051119125b886c879efdacb04df7165
Author: Arnd Bergmann <arnd@arndb.de>
Date: Tue Jul 27 02:06:54 2021 +0800
PCI: hv: Generify PCI probing
In order to support ARM64 Hyper-V PCI, we need to set up the bridge at
probing time because ARM64 is a PCI_DOMAINS_GENERIC=y arch and we don't
have pci_config_window (ARM64 sysdata) for a PCI root bus on Hyper-V, so
it's impossible to retrieve the information (e.g. PCI domains, MSI
domains) from bus sysdata on ARM64 after creation.
Originally in create_root_hv_pci_bus(), pci_create_root_bus() is used to
create the root bus and the corresponding bridge based on x86 sysdata.
Now we create a bridge first and then call pci_scan_root_bus_bridge(),
which allows us to do the necessary set-ups for the bridge.
Link: https://lore.kernel.org/r/20210726180657.142727-6-boqun.feng@gmail.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1949613
commit 8f6a6b3c50ce1caa81c47bb5855be02050c0eff7
Author: Sunil Muthuswamy <sunilmut@microsoft.com>
Date: Mon Jul 12 21:58:18 2021 +0000
PCI: hv: Support for create interrupt v3
Hyper-V vPCI protocol version 1_4 adds support for create interrupt
v3. Create interrupt v3 essentially makes the size of the vector
field bigger in the message, thereby allowing bigger vector values.
For example, that will come into play for supporting LPI vectors
on ARM, which start at 8192.
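To illustrate why a wider vector field matters, here is a tiny sketch. The
8-bit and 32-bit widths below are assumptions picked for the example, and the
struct names are hypothetical, not the protocol's message layouts.

```c
#include <stdint.h>

/* Hypothetical field widths, chosen only to illustrate the truncation. */
struct msi_desc_narrow {
	uint8_t vector;
};

struct msi_desc_wide {
	uint32_t vector;
};

/* Returns 1 if the vector survives a round trip through the narrow field. */
static int fits_narrow(uint32_t vector)
{
	struct msi_desc_narrow d = { .vector = (uint8_t)vector };

	return d.vector == vector;
}

/* Returns 1 if the vector survives a round trip through the wide field. */
static int fits_wide(uint32_t vector)
{
	struct msi_desc_wide d = { .vector = vector };

	return d.vector == vector;
}
```

A typical x86 vector such as 34 fits either way, while an LPI number such as
8192 is truncated by the narrow field and preserved by the wide one.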
Link: https://lore.kernel.org/r/MW4PR21MB20026A6EA554A0B9EC696AA8C0159@MW4PR21MB2002.namprd21.prod.outlook.com
Signed-off-by: Sunil Muthuswamy <sunilmut@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Add check for hv_is_hyperv_initialized() at the top of
init_hv_pci_drv(), so if the pci-hyperv driver is force-loaded on non
Hyper-V platforms, the init_hv_pci_drv() will exit immediately, without
any side effects, like assignments to hvpci_block_ops, etc.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reported-and-tested-by: Mohammad Alqayeem <mohammad.alqyeem@nutanix.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1621984653-1210-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
With the new method of flushing/stopping the workqueue before doing bus
removal, the old mechanism of using refcount and wait for completion
is no longer needed. Remove that dead code.
Link: https://lore.kernel.org/r/1620806809-31055-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
[lorenzo.pieralisi@arm.com: Reworded subject]
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
On removing the device, any work item (hv_pci_devices_present() or
hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
and race with hv_pci_remove().
This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.
Fix this by flushing/destroying the workqueue of hbus before doing hbus remove.
Link: https://lore.kernel.org/r/1620806800-30983-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>