2017-11-01 14:08:43 +00:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
|
2012-10-13 09:46:48 +00:00
|
|
|
/*
|
|
|
|
* NUMA memory policies for Linux.
|
|
|
|
* Copyright 2003,2004 Andi Kleen SuSE Labs
|
|
|
|
*/
|
|
|
|
#ifndef _UAPI_LINUX_MEMPOLICY_H
|
|
|
|
#define _UAPI_LINUX_MEMPOLICY_H
|
|
|
|
|
|
|
|
#include <linux/errno.h>
|
|
|
|
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Both the MPOL_* mempolicy mode and the MPOL_F_* optional mode flags are
|
|
|
|
* passed by the user to either set_mempolicy() or mbind() in an 'int' actual.
|
|
|
|
* The MPOL_MODE_FLAGS macro determines the legal set of optional mode flags.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* Policies */
|
|
|
|
enum {
|
|
|
|
MPOL_DEFAULT,
|
|
|
|
MPOL_PREFERRED,
|
|
|
|
MPOL_BIND,
|
|
|
|
MPOL_INTERLEAVE,
|
2012-10-25 12:16:28 +00:00
|
|
|
MPOL_LOCAL,
|
mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes
Patch series "Introduce multi-preference mempolicy", v7.
This patch series introduces the concept of the MPOL_PREFERRED_MANY
mempolicy. This mempolicy mode can be used with either the
set_mempolicy(2) or mbind(2) interfaces. Like the MPOL_PREFERRED
interface, it allows an application to set a preference for nodes which
will fulfil memory allocation requests. Unlike the MPOL_PREFERRED mode,
it takes a set of nodes. Like the MPOL_BIND interface, it works over a
set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or invoke the
OOM killer if those preferred nodes are not available.
Along with these patches are patches for libnuma, numactl, numademo, and
memhog. They still need some polish, but can be found here:
https://gitlab.com/bwidawsk/numactl/-/tree/prefer-many It allows new
usage: `numactl -P 0,3,4`
The goal of the new mode is to enable some use-cases when using tiered memory
usage models which I've lovingly named.
1a. The Hare - The interconnect is fast enough to meet bandwidth and
latency requirements allowing preference to be given to all nodes with
"fast" memory.
1b. The Indiscriminate Hare - An application knows it wants fast
memory (or perhaps slow memory), but doesn't care which node it runs
on. The application can prefer a set of nodes and then xpu bind to
the local node (cpu, accelerator, etc). This reverses the nodes are
chosen today where the kernel attempts to use local memory to the CPU
whenever possible. This will attempt to use the local accelerator to
the memory.
2. The Tortoise - The administrator (or the application itself) is
aware it only needs slow memory, and so can prefer that.
Much of this is almost achievable with the bind interface, but the bind
interface suffers from an inability to fallback to another set of nodes if
binding fails to all nodes in the nodemask.
Like MPOL_BIND a nodemask is given. Inherently this removes ordering from the
preference.
> /* Set first two nodes as preferred in an 8 node system. */
> const unsigned long nodes = 0x3
> set_mempolicy(MPOL_PREFER_MANY, &nodes, 8);
> /* Mimic interleave policy, but have fallback *.
> const unsigned long nodes = 0xaa
> set_mempolicy(MPOL_PREFER_MANY, &nodes, 8);
Some internal discussion took place around the interface. There are two
alternatives which we have discussed, plus one I stuck in:
1. Ordered list of nodes. Currently it's believed that the added
complexity is nod needed for expected usecases.
2. A flag for bind to allow falling back to other nodes. This
confuses the notion of binding and is less flexible than the current
solution.
3. Create flags or new modes that helps with some ordering. This
offers both a friendlier API as well as a solution for more customized
usage. It's unknown if it's worth the complexity to support this.
Here is sample code for how this might work:
> // Prefer specific nodes for some something wacky
> set_mempolicy(MPOL_PREFER_MANY, 0x17c, 1024);
>
> // Default
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_SOCKET, NULL, 0);
> // which is the same as
> set_mempolicy(MPOL_DEFAULT, NULL, 0);
>
> // The Hare
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_TYPE, NULL, 0);
>
> // The Tortoise
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_TYPE_REV, NULL, 0);
>
> // Prefer the fast memory of the first two sockets
> set_mempolicy(MPOL_PREFER_MANY | MPOL_F_PREFER_ORDER_TYPE, -1, 2);
>
This patch (of 5):
The NUMA APIs currently allow passing in a "preferred node" as a single
bit set in a nodemask. If more than one bit it set, bits after the first
are ignored.
This single node is generally OK for location-based NUMA where memory
being allocated will eventually be operated on by a single CPU. However,
in systems with multiple memory types, folks want to target a *type* of
memory instead of a location. For instance, someone might want some
high-bandwidth memory but do not care about the CPU next to which it is
allocated. Or, they want a cheap, high capacity allocation and want to
target all NUMA nodes which have persistent memory in volatile mode. In
both of these cases, the application wants to target a *set* of nodes, but
does not want strict MPOL_BIND behavior as that could lead to OOM killer
or SIGSEGV.
So add MPOL_PREFERRED_MANY policy to support the multiple preferred nodes
requirement. This is not a pie-in-the-sky dream for an API. This was a
response to a specific ask of more than one group at Intel. Specifically:
1. There are existing libraries that target memory types such as
https://github.com/memkind/memkind. These are known to suffer from
SIGSEGV's when memory is low on targeted memory "kinds" that span more
than one node. The MCDRAM on a Xeon Phi in "Cluster on Die" mode is an
example of this.
2. Volatile-use persistent memory users want to have a memory policy
which is targeted at either "cheap and slow" (PMEM) or "expensive and
fast" (DRAM). However, they do not want to experience allocation
failures when the targeted type is unavailable.
3. Allocate-then-run. Generally, we let the process scheduler decide
on which physical CPU to run a task. That location provides a default
allocation policy, and memory availability is not generally considered
when placing tasks. For situations where memory is valuable and
constrained, some users want to allocate memory first, *then* allocate
close compute resources to the allocation. This is the reverse of the
normal (CPU) model. Accelerators such as GPUs that operate on
core-mm-managed memory are interested in this model.
A check is added in sanitize_mpol_flags() to not permit 'prefer_many'
policy to be used for now, and will be removed in later patch after all
implementations for 'prefer_many' are ready, as suggested by Michal Hocko.
[mhocko@kernel.org: suggest to refine policy_node/policy_nodemask handling]
Link: https://lkml.kernel.org/r/1627970362-61305-1-git-send-email-feng.tang@intel.com
Link: https://lore.kernel.org/r/20200630212517.308045-4-ben.widawsky@intel.com
Link: https://lkml.kernel.org/r/1627970362-61305-2-git-send-email-feng.tang@intel.com
Co-developed-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>b
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-09-02 22:00:06 +00:00
|
|
|
MPOL_PREFERRED_MANY,
|
mm/mempolicy: introduce MPOL_WEIGHTED_INTERLEAVE for weighted interleaving
When a system has multiple NUMA nodes and it becomes bandwidth hungry,
using the current MPOL_INTERLEAVE could be an wise option.
However, if those NUMA nodes consist of different types of memory such as
socket-attached DRAM and CXL/PCIe attached DRAM, the round-robin based
interleave policy does not optimally distribute data to make use of their
different bandwidth characteristics.
Instead, interleave is more effective when the allocation policy follows
each NUMA nodes' bandwidth weight rather than a simple 1:1 distribution.
This patch introduces a new memory policy, MPOL_WEIGHTED_INTERLEAVE,
enabling weighted interleave between NUMA nodes. Weighted interleave
allows for proportional distribution of memory across multiple numa nodes,
preferably apportioned to match the bandwidth of each node.
For example, if a system has 1 CPU node (0), and 2 memory nodes (0,1),
with bandwidth of (100GB/s, 50GB/s) respectively, the appropriate weight
distribution is (2:1).
Weights for each node can be assigned via the new sysfs extension:
/sys/kernel/mm/mempolicy/weighted_interleave/
For now, the default value of all nodes will be `1`, which matches the
behavior of standard 1:1 round-robin interleave. An extension will be
added in the future to allow default values to be registered at kernel and
device bringup time.
The policy allocates a number of pages equal to the set weights. For
example, if the weights are (2,1), then 2 pages will be allocated on node0
for every 1 page allocated on node1.
The new flag MPOL_WEIGHTED_INTERLEAVE can be used in set_mempolicy(2)
and mbind(2).
Some high level notes about the pieces of weighted interleave:
current->il_prev:
Tracks the node previously allocated from.
current->il_weight:
The active weight of the current node (current->il_prev)
When this reaches 0, current->il_prev is set to the next node
and current->il_weight is set to the next weight.
weighted_interleave_nodes:
Counts the number of allocations as they occur, and applies the
weight for the current node. When the weight reaches 0, switch
to the next node. Operates only on task->mempolicy.
weighted_interleave_nid:
Gets the total weight of the nodemask as well as each individual
node weight, then calculates the node based on the given index.
Operates on VMA policies.
bulk_array_weighted_interleave:
Gets the total weight of the nodemask as well as each individual
node weight, then calculates the number of "interleave rounds" as
well as any delta ("partial round"). Calculates the number of
pages for each node and allocates them.
If a node was scheduled for interleave via interleave_nodes, the
current weight will be allocated first.
Operates only on the task->mempolicy.
One piece of complexity is the interaction between a recent refactor which
split the logic to acquire the "ilx" (interleave index) of an allocation
and the actually application of the interleave. If a call to
alloc_pages_mpol() were made with a weighted-interleave policy and ilx set
to NO_INTERLEAVE_INDEX, weighted_interleave_nodes() would operate on a VMA
policy - violating the description above.
An inspection of all callers of alloc_pages_mpol() shows that all external
callers set ilx to `0`, an index value, or will call get_vma_policy() to
acquire the ilx.
For example, mm/shmem.c may call into alloc_pages_mpol. The call stacks
all set (pgoff_t ilx) or end up in `get_vma_policy()`. This enforces the
`weighted_interleave_nodes()` and `weighted_interleave_nid()` policy
requirements (task/vma respectively).
Link: https://lkml.kernel.org/r/20240202170238.90004-4-gregory.price@memverge.com
Suggested-by: Hasan Al Maruf <Hasan.Maruf@amd.com>
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Co-developed-by: Rakie Kim <rakie.kim@sk.com>
Signed-off-by: Rakie Kim <rakie.kim@sk.com>
Co-developed-by: Honggyu Kim <honggyu.kim@sk.com>
Signed-off-by: Honggyu Kim <honggyu.kim@sk.com>
Co-developed-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Signed-off-by: Hyeongtak Ji <hyeongtak.ji@sk.com>
Co-developed-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
Co-developed-by: Ravi Jonnalagadda <ravis.opensrc@micron.com>
Signed-off-by: Ravi Jonnalagadda <ravis.opensrc@micron.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-02 17:02:37 +00:00
|
|
|
MPOL_WEIGHTED_INTERLEAVE,
|
2012-10-13 09:46:48 +00:00
|
|
|
MPOL_MAX, /* always last member of enum */
|
|
|
|
};
|
|
|
|
|
|
|
|
/* Flags for set_mempolicy */
|
|
|
|
#define MPOL_F_STATIC_NODES (1 << 15)
|
|
|
|
#define MPOL_F_RELATIVE_NODES (1 << 14)
|
numa balancing: migrate on fault among multiple bound nodes
Now, NUMA balancing can only optimize the page placement among the NUMA
nodes if the default memory policy is used. Because the memory policy
specified explicitly should take precedence. But this seems too strict in
some situations. For example, on a system with 4 NUMA nodes, if the
memory of an application is bound to the node 0 and 1, NUMA balancing can
potentially migrate the pages between the node 0 and 1 to reduce
cross-node accessing without breaking the explicit memory binding policy.
So in this patch, we add MPOL_F_NUMA_BALANCING mode flag to
set_mempolicy() when mode is MPOL_BIND. With the flag specified, NUMA
balancing will be enabled within the thread to optimize the page placement
within the constrains of the specified memory binding policy. With the
newly added flag, the NUMA balancing control mechanism becomes,
- sysctl knob numa_balancing can enable/disable the NUMA balancing
globally.
- even if sysctl numa_balancing is enabled, the NUMA balancing will be
disabled for the memory areas or applications with the explicit
memory policy by default.
- MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for
the applications when specifying the explicit memory policy
(MPOL_BIND).
Various page placement optimization based on the NUMA balancing can be
done with these flags. As the first step, in this patch, if the memory of
the application is bound to multiple nodes (MPOL_BIND), and in the hint
page fault handler the accessing node are in the policy nodemask, the page
will be tried to be migrated to the accessing node to reduce the
cross-node accessing.
If the newly added MPOL_F_NUMA_BALANCING flag is specified by an
application on an old kernel version without its support, set_mempolicy()
will return -1 and errno will be set to EINVAL. The application can use
this behavior to run on both old and new kernel versions.
And if the MPOL_F_NUMA_BALANCING flag is specified for the mode other than
MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL
as before. Because we don't support optimization based on the NUMA
balancing for these modes.
In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for
mbind(). But that flag is tied to MPOL_MF_MOVE.*, so it seems not a good
API/ABI for the purpose of the patch.
And because it's not clear whether it's necessary to enable NUMA balancing
for a specific memory area inside an application, so we only add the flag
at the thread level (set_mempolicy()) instead of the memory area level
(mbind()). We can do that when it become necessary.
To test the patch, we run a test case as follows on a 4-node machine with
192 GB memory (48 GB per node).
1. Change pmbench memory accessing benchmark to call set_mempolicy()
to bind its memory to node 1 and 3 and enable NUMA balancing. Some
related code snippets are as follows,
#include <numaif.h>
#include <numa.h>
struct bitmask *bmp;
int ret;
bmp = numa_parse_nodestring("1,3");
ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
bmp->maskp, bmp->size + 1);
/* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */
if (ret < 0 && errno == EINVAL)
ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1);
if (ret < 0) {
perror("Failed to call set_mempolicy");
exit(-1);
}
2. Run a memory eater on node 3 to use 40 GB memory before running pmbench.
3. Run pmbench with 64 processes, the working-set size of each process
is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB. The
CPU and the memory (as in step 1.) of all pmbench processes is bound
to node 1 and 3. So, after CPU usage is balanced, some pmbench
processes run on the CPUs of the node 3 will access the memory of
the node 1.
4. After the pmbench processes run for 100 seconds, kill the memory
eater. Now it's possible for some pmbench processes to migrate
their pages from node 1 to node 3 to reduce cross-node accessing.
Test results show that, with the patch, the pages can be migrated from
node 1 to node 3 after killing the memory eater, and the pmbench score
can increase about 17.5%.
Link: https://lkml.kernel.org/r/20210120061235.148637-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 20:09:43 +00:00
|
|
|
#define MPOL_F_NUMA_BALANCING (1 << 13) /* Optimize with NUMA balancing if possible */
|
2012-10-13 09:46:48 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to
|
|
|
|
* either set_mempolicy() or mbind().
|
|
|
|
*/
|
numa balancing: migrate on fault among multiple bound nodes
Now, NUMA balancing can only optimize the page placement among the NUMA
nodes if the default memory policy is used. Because the memory policy
specified explicitly should take precedence. But this seems too strict in
some situations. For example, on a system with 4 NUMA nodes, if the
memory of an application is bound to the node 0 and 1, NUMA balancing can
potentially migrate the pages between the node 0 and 1 to reduce
cross-node accessing without breaking the explicit memory binding policy.
So in this patch, we add MPOL_F_NUMA_BALANCING mode flag to
set_mempolicy() when mode is MPOL_BIND. With the flag specified, NUMA
balancing will be enabled within the thread to optimize the page placement
within the constrains of the specified memory binding policy. With the
newly added flag, the NUMA balancing control mechanism becomes,
- sysctl knob numa_balancing can enable/disable the NUMA balancing
globally.
- even if sysctl numa_balancing is enabled, the NUMA balancing will be
disabled for the memory areas or applications with the explicit
memory policy by default.
- MPOL_F_NUMA_BALANCING can be used to enable the NUMA balancing for
the applications when specifying the explicit memory policy
(MPOL_BIND).
Various page placement optimization based on the NUMA balancing can be
done with these flags. As the first step, in this patch, if the memory of
the application is bound to multiple nodes (MPOL_BIND), and in the hint
page fault handler the accessing node are in the policy nodemask, the page
will be tried to be migrated to the accessing node to reduce the
cross-node accessing.
If the newly added MPOL_F_NUMA_BALANCING flag is specified by an
application on an old kernel version without its support, set_mempolicy()
will return -1 and errno will be set to EINVAL. The application can use
this behavior to run on both old and new kernel versions.
And if the MPOL_F_NUMA_BALANCING flag is specified for the mode other than
MPOL_BIND, set_mempolicy() will return -1 and errno will be set to EINVAL
as before. Because we don't support optimization based on the NUMA
balancing for these modes.
In the previous version of the patch, we tried to reuse MPOL_MF_LAZY for
mbind(). But that flag is tied to MPOL_MF_MOVE.*, so it seems not a good
API/ABI for the purpose of the patch.
And because it's not clear whether it's necessary to enable NUMA balancing
for a specific memory area inside an application, so we only add the flag
at the thread level (set_mempolicy()) instead of the memory area level
(mbind()). We can do that when it become necessary.
To test the patch, we run a test case as follows on a 4-node machine with
192 GB memory (48 GB per node).
1. Change pmbench memory accessing benchmark to call set_mempolicy()
to bind its memory to node 1 and 3 and enable NUMA balancing. Some
related code snippets are as follows,
#include <numaif.h>
#include <numa.h>
struct bitmask *bmp;
int ret;
bmp = numa_parse_nodestring("1,3");
ret = set_mempolicy(MPOL_BIND | MPOL_F_NUMA_BALANCING,
bmp->maskp, bmp->size + 1);
/* If MPOL_F_NUMA_BALANCING isn't supported, fall back to MPOL_BIND */
if (ret < 0 && errno == EINVAL)
ret = set_mempolicy(MPOL_BIND, bmp->maskp, bmp->size + 1);
if (ret < 0) {
perror("Failed to call set_mempolicy");
exit(-1);
}
2. Run a memory eater on node 3 to use 40 GB memory before running pmbench.
3. Run pmbench with 64 processes, the working-set size of each process
is 640 MB, so the total working-set size is 64 * 640 MB = 40 GB. The
CPU and the memory (as in step 1.) of all pmbench processes is bound
to node 1 and 3. So, after CPU usage is balanced, some pmbench
processes run on the CPUs of the node 3 will access the memory of
the node 1.
4. After the pmbench processes run for 100 seconds, kill the memory
eater. Now it's possible for some pmbench processes to migrate
their pages from node 1 to node 3 to reduce cross-node accessing.
Test results show that, with the patch, the pages can be migrated from
node 1 to node 3 after killing the memory eater, and the pmbench score
can increase about 17.5%.
Link: https://lkml.kernel.org/r/20210120061235.148637-2-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24 20:09:43 +00:00
|
|
|
#define MPOL_MODE_FLAGS \
|
|
|
|
(MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES | MPOL_F_NUMA_BALANCING)
|
2012-10-13 09:46:48 +00:00
|
|
|
|
|
|
|
/* Flags for get_mempolicy */
|
|
|
|
#define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */
|
|
|
|
#define MPOL_F_ADDR (1<<1) /* look up vma using address */
|
|
|
|
#define MPOL_F_MEMS_ALLOWED (1<<2) /* return allowed memories */
|
|
|
|
|
|
|
|
/* Flags for mbind */
|
|
|
|
#define MPOL_MF_STRICT (1<<0) /* Verify existing pages in the mapping */
|
2012-10-25 12:16:32 +00:00
|
|
|
#define MPOL_MF_MOVE (1<<1) /* Move pages owned by this process to conform
|
|
|
|
to policy */
|
|
|
|
#define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to policy */
|
2023-10-03 09:24:18 +00:00
|
|
|
#define MPOL_MF_LAZY (1<<3) /* UNSUPPORTED FLAG: Lazy migrate on fault */
|
2012-10-25 12:16:32 +00:00
|
|
|
#define MPOL_MF_INTERNAL (1<<4) /* Internal flags start here */
|
|
|
|
|
|
|
|
#define MPOL_MF_VALID (MPOL_MF_STRICT | \
|
|
|
|
MPOL_MF_MOVE | \
|
2012-11-16 09:37:58 +00:00
|
|
|
MPOL_MF_MOVE_ALL)
|
2012-10-13 09:46:48 +00:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Internal flags that share the struct mempolicy flags word with
|
|
|
|
* "mode flags". These flags are allocated from bit 0 up, as they
|
|
|
|
* are never OR'ed into the mode in mempolicy API arguments.
|
|
|
|
*/
|
|
|
|
#define MPOL_F_SHARED (1 << 0) /* identify shared policies */
|
2012-10-25 12:16:30 +00:00
|
|
|
#define MPOL_F_MOF (1 << 3) /* this policy wants migrate on fault */
|
2015-02-12 22:58:22 +00:00
|
|
|
#define MPOL_F_MORON (1 << 4) /* Migrate On protnone Reference On Node */
|
2012-10-13 09:46:48 +00:00
|
|
|
|
2021-05-05 01:36:01 +00:00
|
|
|
/*
|
|
|
|
* These bit locations are exposed in the vm.zone_reclaim_mode sysctl
|
|
|
|
* ABI. New bits are OK, but existing bits can never change.
|
|
|
|
*/
|
|
|
|
#define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
|
|
|
|
#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
|
|
|
|
#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
|
2012-10-13 09:46:48 +00:00
|
|
|
|
|
|
|
#endif /* _UAPI_LINUX_MEMPOLICY_H */
|