linux-kernelorg-stable/include/linux/io.h

208 lines
5.8 KiB
C
Raw Normal View History

/* SPDX-License-Identifier: GPL-2.0-only */
/*
* Copyright 2006 PathScale, Inc. All Rights Reserved.
*/
#ifndef _LINUX_IO_H
#define _LINUX_IO_H
fix missing vmalloc.h includes Patch series "Memory allocation profiling", v6. Overview: Low overhead [1] per-callsite memory allocation profiling. Not just for debug kernels, overhead low enough to be deployed in production. Example output: root@moria-kvm:~# sort -rn /proc/allocinfo 127664128 31168 mm/page_ext.c:270 func:alloc_page_ext 56373248 4737 mm/slub.c:2259 func:alloc_slab_page 14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded 14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash 13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs 11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio 9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node 4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable 4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start 3940352 962 mm/memory.c:4214 func:alloc_anon_folio 2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node ... Usage: kconfig options: - CONFIG_MEM_ALLOC_PROFILING - CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT - CONFIG_MEM_ALLOC_PROFILING_DEBUG adds warnings for allocations that weren't accounted because of a missing annotation sysctl: /proc/sys/vm/mem_profiling Runtime info: /proc/allocinfo Notes: [1]: Overhead To measure the overhead we are comparing the following configurations: (1) Baseline with CONFIG_MEMCG_KMEM=n (2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) (3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) (4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1) (5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT (6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y (7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y && CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y Performance overhead: To evaluate performance we implemented an in-kernel test executing multiple get_free_page/free_page and kmalloc/kfree calls with allocation sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU affinity set to a specific CPU to minimize the noise. Below are results from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on 56 core Intel Xeon: kmalloc pgalloc (1 baseline) 6.764s 16.902s (2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%) (3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%) (4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%) (5 memcg) 13.388s (+97.94%) 48.460s (+186.71%) (6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%) (7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%) Memory overhead: Kernel size: text data bss dec diff (1) 26515311 18890222 17018880 62424413 (2) 26524728 19423818 16740352 62688898 264485 (3) 26524724 19423818 16740352 62688894 264481 (4) 26524728 19423818 16740352 62688898 264485 (5) 26541782 18964374 16957440 62463596 39183 Memory consumption on a 56 core Intel CPU with 125GB of memory: Code tags: 192 kB PageExts: 262144 kB (256MB) SlabExts: 9876 kB (9.6MB) PcpuExts: 512 kB (0.5MB) Total overhead is 0.2% of total memory. Benchmarks: Hackbench tests run 100 times: hackbench -s 512 -l 200 -g 15 -f 25 -P baseline disabled profiling enabled profiling avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023) stdev 0.0137 0.0188 0.0077 hackbench -l 10000 baseline disabled profiling enabled profiling avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859) stdev 0.0933 0.0286 0.0489 stress-ng tests: stress-ng --class memory --seq 4 -t 60 stress-ng --class cpu --seq 4 -t 60 Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/ [2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/ This patch (of 37): The next patch drops vmalloc.h from a system header in order to fix a circular dependency; this adds it to all the files that were pulling it in implicitly. [kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c] Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev [surenb@google.com: fix arch/x86/mm/numa_32.c] Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com [kent.overstreet@linux.dev: a few places were depending on sizes.h] Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev [arnd@arndb.de: fix mm/kasan/hw_tags.c] Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org [surenb@google.com: fix arc build] Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Tested-by: Kees Cook <keescook@chromium.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alex Gaynor <alex.gaynor@gmail.com> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Andreas Hindborg <a.hindborg@samsung.com> Cc: Benno Lossin <benno.lossin@proton.me> Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Gary Guo <gary@garyguo.net> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-03-21 16:36:23 +00:00
#include <linux/sizes.h>
#include <linux/types.h>
x86/mm: Decouple <linux/vmalloc.h> from <asm/io.h> Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so remove it from there and fix up the resulting build problems triggered on x86 {64|32}-bit {def|allmod|allno}configs. The breakages were triggering in places where x86 builds relied on vmalloc() facilities but did not include <linux/vmalloc.h> explicitly and relied on the implicit inclusion via <asm/io.h>. Also add: - <linux/init.h> to <linux/io.h> - <asm/pgtable_types> to <asm/io.h> ... which were two other implicit header file dependencies. Suggested-by: David Miller <davem@davemloft.net> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> [ Tidied up the changelog. ] Acked-by: David Miller <davem@davemloft.net> Acked-by: Takashi Iwai <tiwai@suse.de> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Vinod Koul <vinod.koul@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Colin Cross <ccross@android.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: James E.J. Bottomley <JBottomley@odin.com> Cc: Jaroslav Kysela <perex@perex.cz> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Kees Cook <keescook@chromium.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Kristen Carlson Accardi <kristen@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Suma Ramars <sramars@cisco.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-06-02 09:01:38 +00:00
#include <linux/init.h>
#include <asm/io.h>
#include <asm/page.h>
struct device;
#ifndef __iowrite32_copy
void __iowrite32_copy(void __iomem *to, const void *from, size_t count);
#endif
void __ioread32_copy(void *to, const void __iomem *from, size_t count);
#ifndef __iowrite64_copy
void __iowrite64_copy(void __iomem *to, const void *from, size_t count);
#endif
#ifdef CONFIG_MMU
int ioremap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot);
int vmap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot);
#else
static inline int ioremap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot)
{
return 0;
}
static inline int vmap_page_range(unsigned long addr, unsigned long end,
phys_addr_t phys_addr, pgprot_t prot)
{
return 0;
}
#endif
/*
* Managed iomap interface
*/
#ifdef CONFIG_HAS_IOPORT_MAP
void __iomem * devm_ioport_map(struct device *dev, unsigned long port,
unsigned int nr);
void devm_ioport_unmap(struct device *dev, void __iomem *addr);
#else
static inline void __iomem *devm_ioport_map(struct device *dev,
unsigned long port,
unsigned int nr)
{
return NULL;
}
static inline void devm_ioport_unmap(struct device *dev, void __iomem *addr)
{
}
#endif
void __iomem *devm_ioremap(struct device *dev, resource_size_t offset,
resource_size_t size);
void __iomem *devm_ioremap_uc(struct device *dev, resource_size_t offset,
resource_size_t size);
void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset,
resource_size_t size);
void devm_iounmap(struct device *dev, void __iomem *addr);
int check_signature(const volatile void __iomem *io_addr,
const unsigned char *signature, int length);
void devm_ioremap_release(struct device *dev, void *res);
void *devm_memremap(struct device *dev, resource_size_t offset,
size_t size, unsigned long flags);
void devm_memunmap(struct device *dev, void *addr);
/* architectures can override this */
pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
unsigned long size, pgprot_t prot);
linux/io.h: Add pci_remap_cfgspace() interface The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and Posting") mandate non-posted configuration transactions. As further highlighted in the PCIe specifications (4.0 - Rev0.3, "Ordering Considerations for the Enhanced Configuration Access Mechanism"), through ECAM and ECAM-derivative configuration mechanism, the memory mapped transactions from the host CPU into Configuration Requests on the PCI express fabric may create ordering problems for software because writes to memory address are typically posted transactions (unless the architecture can enforce through virtual address mapping non-posted write transactions behaviour) but writes to Configuration Space are not posted on the PCI express fabric. Current DT and ACPI host bridge controllers map PCI configuration space (ECAM and ECAM-derivative) into the virtual address space through ioremap() calls, that are non-cacheable device accesses on most architectures, but may provide "bufferable" or "posted" write semantics in architecture like eg ARM/ARM64 that allow ioremap'ed regions writes to be buffered in the bus connecting the host CPU to the PCI fabric; this behaviour, as underlined in the PCIe specifications, may trigger transactions ordering rules and must be prevented. Introduce a new generic and explicit API to create a memory mapping for ECAM and ECAM-derivative config space area that defaults to ioremap_nocache() (which should provide a sane default behaviour) but still allowing architectures on which ioremap_nocache() results in posted write transactions to override the function call with an arch specific implementation that complies with the PCI specifications for configuration transactions. [bhelgaas: fold in #ifdef CONFIG_PCI wrapper] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com>
2017-04-19 16:48:51 +00:00
#ifdef CONFIG_PCI
/*
* The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and
* Posting") mandate non-posted configuration transactions. This default
* implementation attempts to use the ioremap_np() API to provide this
* on arches that support it, and falls back to ioremap() on those that
* don't. Overriding this function is deprecated; arches that properly
* support non-posted accesses should implement ioremap_np() instead, which
* this default implementation can then use to return mappings compliant with
* the PCI specification.
linux/io.h: Add pci_remap_cfgspace() interface The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and Posting") mandate non-posted configuration transactions. As further highlighted in the PCIe specifications (4.0 - Rev0.3, "Ordering Considerations for the Enhanced Configuration Access Mechanism"), through ECAM and ECAM-derivative configuration mechanism, the memory mapped transactions from the host CPU into Configuration Requests on the PCI express fabric may create ordering problems for software because writes to memory address are typically posted transactions (unless the architecture can enforce through virtual address mapping non-posted write transactions behaviour) but writes to Configuration Space are not posted on the PCI express fabric. Current DT and ACPI host bridge controllers map PCI configuration space (ECAM and ECAM-derivative) into the virtual address space through ioremap() calls, that are non-cacheable device accesses on most architectures, but may provide "bufferable" or "posted" write semantics in architecture like eg ARM/ARM64 that allow ioremap'ed regions writes to be buffered in the bus connecting the host CPU to the PCI fabric; this behaviour, as underlined in the PCIe specifications, may trigger transactions ordering rules and must be prevented. Introduce a new generic and explicit API to create a memory mapping for ECAM and ECAM-derivative config space area that defaults to ioremap_nocache() (which should provide a sane default behaviour) but still allowing architectures on which ioremap_nocache() results in posted write transactions to override the function call with an arch specific implementation that complies with the PCI specifications for configuration transactions. [bhelgaas: fold in #ifdef CONFIG_PCI wrapper] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com>
2017-04-19 16:48:51 +00:00
*/
#ifndef pci_remap_cfgspace
#define pci_remap_cfgspace pci_remap_cfgspace
static inline void __iomem *pci_remap_cfgspace(phys_addr_t offset,
size_t size)
{
return ioremap_np(offset, size) ?: ioremap(offset, size);
linux/io.h: Add pci_remap_cfgspace() interface The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and Posting") mandate non-posted configuration transactions. As further highlighted in the PCIe specifications (4.0 - Rev0.3, "Ordering Considerations for the Enhanced Configuration Access Mechanism"), through ECAM and ECAM-derivative configuration mechanism, the memory mapped transactions from the host CPU into Configuration Requests on the PCI express fabric may create ordering problems for software because writes to memory address are typically posted transactions (unless the architecture can enforce through virtual address mapping non-posted write transactions behaviour) but writes to Configuration Space are not posted on the PCI express fabric. Current DT and ACPI host bridge controllers map PCI configuration space (ECAM and ECAM-derivative) into the virtual address space through ioremap() calls, that are non-cacheable device accesses on most architectures, but may provide "bufferable" or "posted" write semantics in architecture like eg ARM/ARM64 that allow ioremap'ed regions writes to be buffered in the bus connecting the host CPU to the PCI fabric; this behaviour, as underlined in the PCIe specifications, may trigger transactions ordering rules and must be prevented. Introduce a new generic and explicit API to create a memory mapping for ECAM and ECAM-derivative config space area that defaults to ioremap_nocache() (which should provide a sane default behaviour) but still allowing architectures on which ioremap_nocache() results in posted write transactions to override the function call with an arch specific implementation that complies with the PCI specifications for configuration transactions. [bhelgaas: fold in #ifdef CONFIG_PCI wrapper] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com>
2017-04-19 16:48:51 +00:00
}
#endif
#endif
/*
* Some systems do not have legacy ISA devices.
* /dev/port is not a valid interface on these systems.
* So for those archs, <asm/io.h> should define the following symbol.
*/
#ifndef arch_has_dev_port
#define arch_has_dev_port() (1)
#endif
/*
* Some systems (x86 without PAT) have a somewhat reliable way to mark a
* physical address range such that uncached mappings will actually
* end up write-combining. This facility should be used in conjunction
* with pgprot_writecombine, ioremap-wc, or set_memory_wc, since it has
* no effect if the per-page mechanisms are functional.
* (On x86 without PAT, these functions manipulate MTRRs.)
*
* arch_phys_del_wc(0) or arch_phys_del_wc(any error code) is guaranteed
* to have no effect.
*/
#ifndef arch_phys_wc_add
static inline int __must_check arch_phys_wc_add(unsigned long base,
unsigned long size)
{
return 0; /* It worked (i.e. did nothing). */
}
static inline void arch_phys_wc_del(int handle)
{
}
#define arch_phys_wc_add arch_phys_wc_add
x86/mm/mtrr: Avoid #ifdeffery with phys_wc_to_mtrr_index() There is only one user but since we're going to bury MTRR next out of access to drivers, expose this last piece of API to drivers in a general fashion only needing io.h for access to helpers. Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Abhilash Kesavan <a.kesavan@samsung.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Antonino Daplas <adaplas@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Cristian Stoica <cristian.stoica@freescale.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Dave Airlie <airlied@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suresh Siddha <sbsiddha@gmail.com> Cc: Thierry Reding <treding@nvidia.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Ville Syrjälä <syrjala@sci.fi> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will.deacon@arm.com> Cc: dri-devel@lists.freedesktop.org Link: http://lkml.kernel.org/r/1429722736-4473-1-git-send-email-mcgrof@do-not-panic.com Link: http://lkml.kernel.org/r/1432628901-18044-11-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-26 08:28:13 +00:00
#ifndef arch_phys_wc_index
static inline int arch_phys_wc_index(int handle)
{
return -1;
}
#define arch_phys_wc_index arch_phys_wc_index
#endif
#endif
int devm_arch_phys_wc_add(struct device *dev, unsigned long base, unsigned long size);
enum {
/* See memremap() kernel-doc for usage description... */
MEMREMAP_WB = 1 << 0,
MEMREMAP_WT = 1 << 1,
MEMREMAP_WC = 1 << 2,
x86/mm: Add support to access boot related data in the clear Boot data (such as EFI related data) is not encrypted when the system is booted because UEFI/BIOS does not run with SME active. In order to access this data properly it needs to be mapped decrypted. Update early_memremap() to provide an arch specific routine to modify the pagetable protection attributes before they are applied to the new mapping. This is used to remove the encryption mask for boot related data. Update memremap() to provide an arch specific routine to determine if RAM remapping is allowed. RAM remapping will cause an encrypted mapping to be generated. By preventing RAM remapping, ioremap_cache() will be used instead, which will provide a decrypted mapping of the boot related data. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Dave Young <dyoung@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Toshimitsu Kani <toshi.kani@hpe.com> Cc: kasan-dev@googlegroups.com Cc: kvm@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-efi@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/81fb6b4117a5df6b9f2eda342f81bbef4b23d2e5.1500319216.git.thomas.lendacky@amd.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-07-17 21:10:16 +00:00
MEMREMAP_ENC = 1 << 3,
MEMREMAP_DEC = 1 << 4,
};
void *memremap(resource_size_t offset, size_t size, unsigned long flags);
void memunmap(void *addr);
x86/io: add interface to reserve io memtype for a resource range. (v1.1) A recent change to the mm code in: 87744ab3832b mm: fix cache mode tracking in vm_insert_mixed() started enforcing checking the memory type against the registered list for amixed pfn insertion mappings. It happens that the drm drivers for a number of gpus relied on this being broken. Currently the driver only inserted VRAM mappings into the tracking table when they came from the kernel, and userspace mappings never landed in the table. This led to a regression where all the mapping end up as UC instead of WC now. I've considered a number of solutions but since this needs to be fixed in fixes and not next, and some of the solutions were going to introduce overhead that hadn't been there before I didn't consider them viable at this stage. These mainly concerned hooking into the TTM io reserve APIs, but these API have a bunch of fast paths I didn't want to unwind to add this to. The solution I've decided on is to add a new API like the arch_phys_wc APIs (these would have worked but wc_del didn't take a range), and use them from the drivers to add a WC compatible mapping to the table for all VRAM on those GPUs. This means we can then create userspace mapping that won't get degraded to UC. v1.1: use CONFIG_X86_PAT + add some comments in io.h Cc: Toshi Kani <toshi.kani@hp.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: x86@kernel.org Cc: mcgrof@suse.com Cc: Dan Williams <dan.j.williams@intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
2016-10-24 05:27:59 +00:00
/*
* On x86 PAT systems we have memory tracking that keeps track of
* the allowed mappings on memory ranges. This tracking works for
* all the in-kernel mapping APIs (ioremap*), but where the user
* wishes to map a range from a physical device into user memory
* the tracking won't be updated. This API is to be used by
* drivers which remap physical device pages into userspace,
* and wants to make sure they are mapped WC and not UC.
*/
#ifndef arch_io_reserve_memtype_wc
static inline int arch_io_reserve_memtype_wc(resource_size_t base,
resource_size_t size)
{
return 0;
}
static inline void arch_io_free_memtype_wc(resource_size_t base,
resource_size_t size)
{
}
#endif
int devm_arch_io_reserve_memtype_wc(struct device *dev, resource_size_t start,
resource_size_t size);
#ifdef CONFIG_STRICT_DEVMEM
static inline int range_is_allowed(unsigned long pfn, unsigned long size)
{
u64 from = ((u64)pfn) << PAGE_SHIFT;
u64 to = from + size;
u64 cursor = from;
while (cursor < to) {
if (!devmem_is_allowed(pfn))
return 0;
cursor += PAGE_SIZE;
pfn++;
}
return 1;
}
#else
static inline int range_is_allowed(unsigned long pfn, unsigned long size)
{
return 1;
}
#endif
#endif /* _LINUX_IO_H */