linux-kernelorg-stable/include/linux
Mel Gorman b795854b1f sched/numa: Set preferred NUMA node based on number of private faults
Ideally it would be possible to distinguish between NUMA hinting faults that
are private to a task and those that are shared. If treated identically
there is a risk that shared pages bounce between nodes depending on
the order they are referenced by tasks. Ultimately what is desirable is
that task private pages remain local to the task while shared pages are
interleaved between sharing tasks running on different nodes to give good
average performance. This is further complicated by THP as even
applications that partition their data may not be partitioning on a huge
page boundary.

To start with, this patch assumes that multi-threaded or multi-process
applications partition their data and that private accesses are, in
general, more important for cpu->memory locality. Also, no new
infrastructure is required to treat private pages properly, whereas
interleaving for shared pages requires additional infrastructure.

To detect private accesses the pid of the last accessing task is required
but the storage requirements are high. This patch borrows heavily from
Ingo Molnar's patch "numa, mm, sched: Implement last-CPU+PID hash tracking"
to encode some bits from the last accessing task in the page flags as
well as the node information. Collisions will occur but it is better than
just depending on the node information. Node information is then used to
determine if a page needs to migrate. The PID information is used to detect
private/shared accesses. The preferred NUMA node is selected based on where
the maximum number of approximately private faults were measured. Shared
faults are not taken into consideration for a few reasons.
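
The encoding described above can be sketched roughly as follows. This is
an illustration only, not the code from the patch: the field widths and
every `sketch_` name are hypothetical, and the real patch stores the value
in the page flags rather than in a standalone integer. It shows how
keeping only the low PID bits next to the node id lets a fault be
classified as approximately private, and why collisions are possible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field widths, chosen for illustration only. */
#define SKETCH_PID_BITS 8
#define SKETCH_PID_MASK ((1u << SKETCH_PID_BITS) - 1)
#define SKETCH_NID_BITS 6
#define SKETCH_NID_MASK ((1u << SKETCH_NID_BITS) - 1)

/* Pack the node id and the low bits of the PID into one small value. */
static inline uint32_t sketch_pack(unsigned int nid, unsigned int pid)
{
	return ((nid & SKETCH_NID_MASK) << SKETCH_PID_BITS) |
	       (pid & SKETCH_PID_MASK);
}

/* Recover the node id; this part drives the migration decision. */
static inline unsigned int sketch_nid(uint32_t packed)
{
	return (packed >> SKETCH_PID_BITS) & SKETCH_NID_MASK;
}

/* Recover the truncated PID of the last accessing task. */
static inline unsigned int sketch_pid(uint32_t packed)
{
	return packed & SKETCH_PID_MASK;
}

/*
 * A fault counts as approximately private when the truncated PID stored
 * by the previous access matches the truncated PID of the faulting task.
 * Two different PIDs with the same low bits collide and look private.
 */
static inline bool sketch_fault_is_private(uint32_t last, unsigned int cur_pid)
{
	return sketch_pid(last) == (cur_pid & SKETCH_PID_MASK);
}
```

With these assumed widths, PIDs 7 and 263 share the same low eight bits,
so a page last touched by one would be counted as private to the other.
That is the kind of collision the message accepts as a trade-off.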

First, if there are many tasks sharing the page then they'll all move
towards the same node. The node will be compute overloaded and then
scheduled away later only to bounce back again. Alternatively the shared
tasks would just bounce around nodes because the fault information is
effectively noise. Either way accounting for shared faults the same as
private faults can result in lower performance overall.

The second reason is based on a hypothetical workload that has a small
number of very important, heavily accessed private pages but a large shared
array. Faults on the shared array would dominate the fault count, and the
node holding it would be selected as the preferred node even though that
is the wrong decision.

The third reason is that multiple threads in a process will race each
other to fault the shared page making the fault information unreliable.
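
The selection rule can be sketched under the same caveat: the structure,
the fixed node count and the `sketch_` names below are hypothetical and
are not taken from the patch. Shared faults are recorded but ignored when
choosing the preferred node, so a large shared array cannot outvote a
small set of heavily used private pages.

```c
/* Hypothetical node count for the illustration. */
#define SKETCH_NR_NODES 4

/* Per-task fault counters, split by access type (hypothetical layout). */
struct sketch_numa_faults {
	unsigned long priv[SKETCH_NR_NODES];
	unsigned long shared[SKETCH_NR_NODES];
};

/*
 * Return the node with the most private faults, or -1 if no private
 * faults were recorded. Shared faults are deliberately not consulted.
 */
static int sketch_preferred_node(const struct sketch_numa_faults *f)
{
	unsigned long max_faults = 0;
	int best = -1;

	for (int nid = 0; nid < SKETCH_NR_NODES; nid++) {
		if (f->priv[nid] > max_faults) {
			max_faults = f->priv[nid];
			best = nid;
		}
	}
	return best;
}
```

In the hypothetical workload from the second reason, 100 private faults
on node 1 win over 10000 shared faults on node 2.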

Signed-off-by: Mel Gorman <mgorman@suse.de>
[ Fix compilation error when !NUMA_BALANCING. ]
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-30-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-10-09 12:40:35 +02:00
amba
bcma
byteorder
can
ceph
clk
crush
decompress
dma
extcon
fsl/bestcomm
hsi
i2c
iio
input
irqchip
isdn
lockd
mfd
mlx4
mlx5
mmc
mtd
netfilter
netfilter_arp
netfilter_bridge
netfilter_ipv4
netfilter_ipv6
nfsd
pinctrl
platform_data
power
raid
regulator
rtc
sched
spi
ssb
sunrpc
tc_act
unaligned
usb
uwb
wimax
8250_pci.h
a.out.h
acct.h
acpi.h
acpi_dma.h
acpi_gpio.h
acpi_io.h
acpi_pmtmr.h
adb.h
adfs_fs.h
aer.h
agp_backend.h
agpgart.h
ahci_platform.h
aio.h
alarmtimer.h
altera_jtaguart.h
altera_uart.h
amd-iommu.h
amifd.h
amifdreg.h
amigaffs.h
anon_inodes.h
apm-emulation.h
apm_bios.h
apple_bl.h
arcdevice.h
arm-cci.h
asn1.h
asn1_ber_bytecode.h
asn1_decoder.h
async.h
async_tx.h
ata.h
ata_platform.h
atalk.h
ath9k_platform.h
atm.h
atm_suni.h
atm_tcp.h
atmdev.h
atmel-mci.h
atmel-pwm-bl.h
atmel-ssc.h
atmel_pdc.h
atmel_pwm.h
atmel_serial.h
atmel_tc.h
atomic.h
attribute_container.h
audit.h
auto_dev-ioctl.h
auto_fs.h
auxvec.h
average.h
b1pcmcia.h
backing-dev.h
backlight.h
balloon_compaction.h
basic_mmio_gpio.h
bcd.h
bch.h
bcm47xx_wdt.h
bfin_mac.h
binfmts.h
bio.h
bit_spinlock.h
bitmap.h
bitops.h
bitrev.h
blk-iopoll.h
blk_types.h
blkdev.h
blktrace_api.h
blockgroup_lock.h
bma150.h
bootmem.h
bottom_half.h
brcmphy.h
bsearch.h
bsg-lib.h
bsg.h
btree-128.h
btree-type.h
btree.h
btrfs.h
buffer_head.h
bug.h
c2port.h
cache.h
capability.h
cb710.h
cciss_ioctl.h
cdev.h
cdrom.h
cfag12864b.h
cgroup.h
cgroup_subsys.h
circ_buf.h
cleancache.h
clk-private.h
clk-provider.h
clk.h
clkdev.h
clksrc-dbx500-prcmu.h
clockchips.h
clocksource.h
cm4000_cs.h
cmdline-parser.h
cn_proc.h
cnt32_to_63.h
coda.h
coda_psdev.h
com20020.h
compaction.h
compat.h
compiler-gcc.h
compiler-gcc3.h
compiler-gcc4.h
compiler-intel.h
compiler.h
completion.h
concap.h
configfs.h
connector.h
console.h
console_struct.h
consolemap.h
context_tracking.h
context_tracking_state.h
cordic.h
coredump.h
cper.h
cpu.h
cpu_cooling.h
cpu_pm.h
cpu_rmap.h
cpufreq.h
cpuidle.h
cpumask.h
cpuset.h
cramfs_fs.h
cramfs_fs_sb.h
crash_dump.h
crc-ccitt.h
crc-itu-t.h
crc-t10dif.h
crc7.h
crc8.h
crc16.h
crc32.h
crc32c.h
cred.h
crypto.h
cryptohash.h
cryptouser.h
cs5535.h
ctype.h
cuda.h
cyclades.h
cycx_x25.h
davinci_emac.h
dca.h
dcache.h
dccp.h
dcookies.h
debug_locks.h
debugfs.h
debugobjects.h
delay.h
delayacct.h
devfreq.h
device-mapper.h
device.h
device_cgroup.h
devpts_fs.h
digsig.h
dio.h
dirent.h
dlm.h
dlm_plock.h
dm-dirty-log.h
dm-io.h
dm-kcopyd.h
dm-region-hash.h
dm9000.h
dma-attrs.h
dma-buf.h
dma-contiguous.h
dma-debug.h
dma-direction.h
dma-mapping.h
dma_remapping.h
dmaengine.h
dmapool.h
dmar.h
dmi.h
dnotify.h
dns_resolver.h
dqblk_qtree.h
dqblk_v1.h
dqblk_v2.h
drbd.h
drbd_genl.h
drbd_genl_api.h
drbd_limits.h
ds1286.h
ds2782_battery.h
ds17287rtc.h
dtlk.h
dw_apb_timer.h
dw_dmac.h
dynamic_debug.h
dynamic_queue_limits.h
earlycpio.h
ecryptfs.h
edac.h
edd.h
edma.h
eeprom_93cx6.h
eeprom_93xx46.h
efi-bgrt.h
efi.h
efs_vh.h
eisa.h
elevator.h
elf-fdpic.h
elf.h
elfcore-compat.h
elfcore.h
elfnote.h
enclosure.h
err.h
errno.h
errqueue.h
etherdevice.h
ethtool.h
eventfd.h
eventpoll.h
evm.h
export.h
exportfs.h
ext2_fs.h
extcon.h
f2fs_fs.h
f75375s.h
falloc.h
fanotify.h
fault-inject.h
fb.h
fcdevice.h
fcntl.h
fd.h
fddidevice.h
fdtable.h
fec.h
file.h
filter.h
fips.h
firewire.h
firmware-map.h
firmware.h
fixp-arith.h
flat.h
flex_array.h
flex_proportions.h
fmc-sdb.h
fmc.h
font.h
freezer.h
frontswap.h
fs.h
fs_enet_pd.h
fs_stack.h
fs_struct.h
fs_uart_pd.h
fscache-cache.h
fscache.h
fsl-diu-fb.h
fsl_devices.h
fsl_hypervisor.h
fsnotify.h
fsnotify_backend.h
ftrace.h
ftrace_event.h
ftrace_irq.h
futex.h
gameport.h
gcd.h
genalloc.h
generic_acl.h
genetlink.h
genhd.h
genl_magic_func.h
genl_magic_struct.h
getcpu.h
gfp.h
gpio-fan.h
gpio-pxa.h
gpio.h
gpio_keys.h
gpio_mouse.h
gsmmux.h
hardirq.h
hash.h
hashtable.h
hdlc.h
hdlcdrv.h
hdmi.h
hid-debug.h
hid-roccat.h
hid-sensor-hub.h
hid-sensor-ids.h
hid.h
hiddev.h
hidraw.h
highmem.h
highuid.h
hil.h
hil_mlc.h
hippidevice.h
hp_sdc.h
hpet.h
hrtimer.h
htcpld.h
htirq.h
huge_mm.h
hugetlb.h
hugetlb_cgroup.h
hugetlb_inline.h
hw_breakpoint.h
hw_random.h
hwmon-sysfs.h
hwmon-vid.h
hwmon.h
hwspinlock.h
hyperv.h
i2c-algo-bit.h
i2c-algo-pca.h
i2c-algo-pcf.h
i2c-dev.h
i2c-gpio.h
i2c-mux-gpio.h
i2c-mux-pinctrl.h
i2c-mux.h
i2c-ocores.h
i2c-omap.h
i2c-pca-platform.h
i2c-pnx.h
i2c-pxa.h
i2c-smbus.h
i2c-xiic.h
i2c.h
i2o.h
i7300_idle.h
i8042.h
i8253.h
i82593.h
icmp.h
icmpv6.h
ide.h
idr.h
ieee80211.h
if_arp.h
if_bridge.h
if_eql.h
if_ether.h
if_fddi.h
if_frad.h
if_link.h
if_ltalk.h
if_macvlan.h
if_phonet.h
if_pppol2tp.h
if_pppox.h
if_team.h
if_tun.h
if_tunnel.h
if_vlan.h
igmp.h
ihex.h
ima.h
in.h
in6.h
inet.h
inet_diag.h
inet_lro.h
inetdevice.h
init.h
init_ohci1394_dma.h
init_task.h
initrd.h
inotify.h
input-polldev.h
input.h
integrity.h
intel-iommu.h
intel_mid_dma.h
intel_pmic_gpio.h
interrupt.h
interval_tree.h
interval_tree_generic.h
io-mapping.h
io.h
ioc3.h
ioc4.h
iocontext.h
iommu-helper.h
iommu.h
ioport.h
ioprio.h
iova.h
ip.h
ipack.h
ipc.h
ipc_namespace.h
ipmi-fru.h
ipmi.h
ipmi_smi.h
ipv6.h
ipv6_route.h
irq.h
irq_cpustat.h
irq_work.h
irqchip.h
irqdesc.h
irqdomain.h
irqflags.h
irqnr.h
irqreturn.h
isa.h
isapnp.h
iscsi_boot_sysfs.h
iscsi_ibft.h
isdn.h
isdn_divertif.h
isdn_ppp.h
isdnif.h
isicom.h
jbd.h
jbd2.h
jbd_common.h
jhash.h
jiffies.h
journal-head.h
joystick.h
jump_label.h
jump_label_ratelimit.h
jz4740-adc.h
kallsyms.h
kbd_diacr.h
kbd_kern.h
kbuild.h
kcmp.h
kconfig.h
kcore.h
kd.h
kdb.h
kdebug.h
kdev_t.h
kern_levels.h
kernel-page-flags.h
kernel.h
kernel_stat.h
kernelcapi.h
kexec.h
key-type.h
key.h
keyboard.h
kfifo.h
kgdb.h
khugepaged.h
klist.h
kmemcheck.h
kmemleak.h
kmod.h
kmsg_dump.h
kobj_map.h
kobject.h
kobject_ns.h
kprobes.h
kref.h
ks0108.h
ks8842.h
ks8851_mll.h
ksm.h
kthread.h
ktime.h
kvm_host.h
kvm_para.h
kvm_types.h
l2tp.h
lapb.h
latencytop.h
lcd.h
lcm.h
led-lm3530.h
leds-bd2802.h
leds-lp3944.h
leds-pca9532.h
leds-regulator.h
leds-tca6507.h
leds.h
leds_pwm.h
lglock.h
lguest.h
lguest_launcher.h
libata.h
libfdt.h
libfdt_env.h
libps2.h
license.h
linkage.h
linux_logo.h
lis3lv02d.h
list.h
list_bl.h
list_lru.h
list_nulls.h
list_sort.h
llc.h
llist.h
lockdep.h
lockref.h
log2.h
lp.h
lru_cache.h
lsm_audit.h
lz4.h
lzo.h
m48t86.h
mISDNdsp.h
mISDNhw.h
mISDNif.h
mailbox.h
maple.h
marvell_phy.h
math64.h
max17040_battery.h
mbcache.h
mbus.h
mc6821.h
mc146818rtc.h
mdio-bitbang.h
mdio-gpio.h
mdio-mux.h
mdio.h
mei_cl_bus.h
memblock.h
memcontrol.h
memory.h
memory_hotplug.h
mempolicy.h
mempool.h
memstick.h
mg_disk.h
micrel_phy.h
migrate.h mm: numa: Scan pages with elevated page_mapcount 2013-10-09 12:40:32 +02:00
migrate_mode.h
mii.h
miscdevice.h
mm.h sched/numa: Set preferred NUMA node based on number of private faults 2013-10-09 12:40:35 +02:00
mm_inline.h
mm_types.h sched/numa: Set preferred NUMA node based on number of private faults 2013-10-09 12:40:35 +02:00
mman.h
mmdebug.h
mmiotrace.h
mmu_context.h
mmu_notifier.h
mmzone.h
mnt_namespace.h
mod_devicetable.h
module.h
moduleloader.h
moduleparam.h
mount.h
mpage.h
mpi.h
mroute.h
mroute6.h
msdos_fs.h
msg.h
msi.h
msm_mdp.h
mutex-debug.h
mutex.h
mv643xx.h
mv643xx_eth.h
mv643xx_i2c.h
mxm-wmi.h
n_r3964.h
namei.h
nbd.h
net.h
netdev_features.h
netdevice.h
netfilter.h
netfilter_bridge.h
netfilter_ipv4.h
netfilter_ipv6.h
netlink.h
netpoll.h
nfs.h
nfs3.h
nfs4.h
nfs_fs.h
nfs_fs_i.h
nfs_fs_sb.h
nfs_idmap.h
nfs_iostat.h
nfs_page.h
nfs_xdr.h
nfsacl.h
nilfs2_fs.h
nl802154.h
nls.h
nmi.h
node.h
nodemask.h
notifier.h
nsc_gpio.h
nsproxy.h
ntb.h
nubus.h
numa.h
nvme.h
nvram.h
nwpserial.h
nx842.h
of.h
of_address.h
of_device.h
of_dma.h
of_fdt.h
of_gpio.h
of_iommu.h
of_irq.h
of_mdio.h
of_mtd.h
of_net.h
of_pci.h
of_pdt.h
of_platform.h
of_reserved_mem.h
oid_registry.h
olpc-ec.h
omap-dma.h
omap-iommu.h
omap-mailbox.h
omapfb.h
oom.h
openvswitch.h
opp.h
oprofile.h
oxu210hp.h
padata.h
page-debug-flags.h
page-flags-layout.h sched/numa: Set preferred NUMA node based on number of private faults 2013-10-09 12:40:35 +02:00
page-flags.h
page-isolation.h
page_cgroup.h
pageblock-flags.h
pagemap.h
pagevec.h
parport.h
parport_pc.h
parser.h
pata_arasan_cf_data.h
patchkey.h
path.h
pch_dma.h
pci-acpi.h
pci-aspm.h
pci-ats.h
pci-dma.h
pci.h
pci_hotplug.h
pci_ids.h
pcieport_if.h
pda_power.h
percpu-defs.h
percpu-refcount.h
percpu-rwsem.h
percpu.h
percpu_counter.h
percpu_ida.h
perf_event.h
perf_regs.h
personality.h
pfn.h
phonedev.h
phonet.h
phy.h
phy_fixed.h
pid.h
pid_namespace.h
pim.h
pipe_fs_i.h
pktcdvd.h
platform_device.h
plist.h
pm.h
pm2301_charger.h
pm_clock.h
pm_domain.h
pm_qos.h
pm_runtime.h
pm_wakeup.h
pmu.h
pnfs_osd_xdr.h
pnp.h
poison.h
poll.h
posix-clock.h
posix-timers.h
posix_acl.h
posix_acl_xattr.h
power_supply.h
ppp-comp.h
ppp_channel.h
ppp_defs.h
pps-gpio.h
pps_kernel.h
preempt.h
preempt_mask.h
prefetch.h
printk.h
prio_heap.h
proc_fs.h
proc_ns.h
profile.h
projid.h
proportions.h
pstore.h
pstore_ram.h
pti.h
ptp_classify.h
ptp_clock_kernel.h
ptrace.h
pvclock_gtod.h
pwm.h
pwm_backlight.h
pxa2xx_ssp.h
pxa168_eth.h
qnx6_fs.h
quicklist.h
quota.h
quotaops.h
radix-tree.h
raid_class.h
ramfs.h
random.h
range.h
ratelimit.h
rational.h
rbtree.h
rbtree_augmented.h
rculist.h
rculist_bl.h
rculist_nulls.h
rcupdate.h
rcutiny.h
rcutree.h
reboot.h
reciprocal_div.h
regmap.h
regset.h
relay.h
remoteproc.h
res_counter.h
reservation.h
reset-controller.h
reset.h
resource.h
resume-trace.h
rfkill-gpio.h
rfkill-regulator.h
rfkill.h
ring_buffer.h
rio.h
rio_drv.h
rio_ids.h
rio_regs.h
rmap.h
rndis.h
root_dev.h
rotary_encoder.h
rpmsg.h
rslib.h
rtc-ds2404.h
rtc-v3020.h
rtc.h
rtmutex.h
rtnetlink.h
rwlock.h
rwlock_api_smp.h
rwlock_types.h
rwsem-spinlock.h
rwsem.h
rxrpc.h
s3c_adc_battery.h
sa11x0-dma.h
scatterlist.h
scc.h
sched.h sched/numa: Add infrastructure for split shared/private accounting of NUMA hinting faults 2013-10-09 12:40:30 +02:00
sched_clock.h
screen_info.h
sctp.h
scx200.h
scx200_gpio.h
sdb.h
sdla.h
seccomp.h
securebits.h
security.h
selection.h
selinux.h
sem.h
semaphore.h
seq_file.h
seq_file_net.h
seqlock.h
serial.h
serial_8250.h
serial_core.h
serial_max3100.h
serial_mfd.h
serial_pnx8xxx.h
serial_s3c.h
serial_sci.h
serio.h
sfi.h
sfi_acpi.h
sh_clk.h
sh_dma.h
sh_eth.h
sh_intc.h
sh_timer.h
shdma-base.h
shm.h
shmem_fs.h
shrinker.h
signal.h
signalfd.h
sirfsoc_dma.h
sizes.h
skbuff.h
slab.h
slab_def.h
slub_def.h
sm501-regs.h
sm501.h
smc91x.h
smc911x.h
smp.h
smpboot.h
smsc911x.h
smscphy.h
sock_diag.h
socket.h
sonet.h
sony-laptop.h
sonypi.h
sort.h
sound.h
soundcard.h
spinlock.h
spinlock_api_smp.h
spinlock_api_up.h
spinlock_types.h
spinlock_types_up.h
spinlock_up.h
splice.h
srcu.h
ssbi.h
stackprotector.h
stacktrace.h
start_kernel.h
stat.h
statfs.h
static_key.h
stddef.h
ste_modem_shm.h
stmmac.h
stmp3xxx_rtc_wdt.h
stmp_device.h
stop_machine.h
string.h
string_helpers.h
stringify.h
sudmac.h
sungem_phy.h
sunserialcore.h
superhyway.h
suspend.h
svga.h
swab.h
swap.h
swapfile.h
swapops.h
swiotlb.h
synclink.h
sys.h
sys_soc.h
syscalls.h
syscore_ops.h
sysctl.h
sysfs.h
syslog.h
sysrq.h
sysv_fs.h
task_io_accounting.h
task_io_accounting_ops.h
task_work.h
taskstats_kern.h
tboot.h
tc.h
tca6416_keypad.h
tcp.h
tegra-ahb.h
tegra-cpuidle.h
tegra-powergate.h
tegra-soc.h
textsearch.h
textsearch_fsm.h
tfrc.h
thermal.h
thread_info.h
threads.h
ti_wilink_st.h
tick.h
tifm.h
timb_dma.h
timb_gpio.h
time.h
timekeeper_internal.h
timer.h
timerfd.h
timeriomem-rng.h
timerqueue.h
timex.h
topology.h
toshiba.h
tpm.h
tpm_command.h
trace_clock.h
trace_seq.h
tracehook.h
tracepoint.h
transport_class.h
tsacct_kern.h
tty.h sched/wait: Make the __wait_event*() interface more friendly 2013-10-04 10:16:25 +02:00
tty_driver.h
tty_flip.h
tty_ldisc.h
typecheck.h
types.h
u64_stats_sync.h
uaccess.h
ucb1400.h
ucs2_string.h
udp.h
uidgid.h
uinput.h
uio.h
uio_driver.h
uprobes.h
usb.h
usb_usual.h
usbdevice_fs.h
user-return-notifier.h
user.h
user_namespace.h
uts.h
utsname.h
uuid.h
uwb.h
vermagic.h
vexpress.h
vfio.h
vfs.h
vga_switcheroo.h
vgaarb.h
via-core.h
via-gpio.h
via.h
via_i2c.h
video_output.h
videodev2.h
virtio.h
virtio_caif.h
virtio_config.h
virtio_console.h
virtio_mmio.h
virtio_ring.h
virtio_scsi.h
vlynq.h
vm_event_item.h
vm_sockets.h
vmalloc.h
vme.h
vmpressure.h
vmstat.h
vmw_vmci_api.h
vmw_vmci_defs.h
vringh.h
vt.h
vt_buffer.h
vt_kern.h
vtime.h
w1-gpio.h
wait.h sched/wait: Clean up wait.h details a bit 2013-10-04 13:57:19 +02:00
wanrouter.h
watchdog.h
wireless.h
wl12xx.h
wm97xx.h
workqueue.h
writeback.h
ww_mutex.h
xattr.h
xilinxfb.h
xz.h
yam.h
z2_battery.h
zbud.h
zconf.h
zlib.h
zorro.h
zorro_ids.h
zutil.h