Revert "mm/vmscan: never demote for memcg reclaim"

Bugzilla: https://bugzilla.redhat.com/2160210

commit 3f1509c57b1ba5646de0fb8d81bd7107aec22257
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Wed May 18 15:09:11 2022 -0400

    Revert "mm/vmscan: never demote for memcg reclaim"

    This reverts commit 3a235693d3930e1276c8d9cc0ca5807ef292cf0a.

    Its premise was that cgroup reclaim cares about freeing memory inside the
    cgroup, and demotion just moves pages around within the cgroup limit.
    Hence, pages from toptier nodes should be reclaimed directly.

    However, with NUMA balancing now doing tier promotions, demotion is part
    of the page aging process.  Global reclaim demotes the coldest toptier
    pages to secondary memory, where their life continues and from which they
    have a chance to get promoted back.  Essentially, tiered memory systems
    have an LRU order that spans multiple nodes.

    When cgroup reclaims pages coming off the toptier directly, there can be
    colder pages on lower tier nodes that were demoted by global reclaim.
    This is an aging inversion, not unlike if cgroups were to reclaim directly
    from the active lists while there are inactive pages.

    Proactive reclaim is another factor.  Its goal is to offload
    colder pages from expensive RAM to cheaper storage.  When lower tier
    memory is available as an intermediate layer, we want offloading to take
    advantage of it instead of bypassing to storage.

    Revert the patch so that cgroups respect the LRU order spanning the memory
    hierarchy.

    Of note is a specific undercommit scenario, where all cgroup limits in the
    system add up to <= available toptier memory.  In that case, shuffling
    pages out to lower tiers first to reclaim them from there is inefficient.
    This is something that could be optimized/short-circuited later on (although
    care must be taken not to accidentally recreate the aging inversion).
    Let's ensure correctness first.

    Link: https://lkml.kernel.org/r/20220518190911.82400-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
This commit is contained in:
Chris von Recklinghausen 2023-03-24 07:44:07 -04:00
parent 2e8fd1adb6
commit 2244d30d80
1 changed file with 2 additions and 7 deletions

@@ -528,13 +528,8 @@ static bool can_demote(int nid, struct scan_control *sc)
 {
 	if (!numa_demotion_enabled)
 		return false;
-	if (sc) {
-		if (sc->no_demotion)
-			return false;
-		/* It is pointless to do demotion in memcg reclaim */
-		if (cgroup_reclaim(sc))
-			return false;
-	}
+	if (sc && sc->no_demotion)
+		return false;
 	if (next_demotion_node(nid) == NUMA_NO_NODE)
 		return false;
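
To make the behavioral change concrete, the following is a minimal user-space sketch of the check before and after this revert. It is not kernel code: the trimmed-down struct scan_control, the memcg_reclaim flag standing in for cgroup_reclaim(), the two-node topology behind next_demotion_node(), and the final "return true" (which lies outside the hunk shown above) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Stand-in for the kernel structure; only the fields this sketch needs. */
struct scan_control {
	bool no_demotion;	/* caller explicitly forbids demotion */
	bool memcg_reclaim;	/* hypothetical flag modelling cgroup_reclaim(sc) */
};

/* Assumed global state: demotion is switched on. */
static bool numa_demotion_enabled = true;

/* Toy topology (assumption): node 0 is toptier and demotes to node 1;
 * node 1 is the lowest tier and has no demotion target. */
static int next_demotion_node(int nid)
{
	return nid == 0 ? 1 : NUMA_NO_NODE;
}

/* Simplified model of the kernel helper of the same name. */
static bool cgroup_reclaim(const struct scan_control *sc)
{
	return sc->memcg_reclaim;
}

/* Logic before the revert: memcg reclaim never demotes. */
static bool can_demote_before(int nid, const struct scan_control *sc)
{
	if (!numa_demotion_enabled)
		return false;
	if (sc) {
		if (sc->no_demotion)
			return false;
		if (cgroup_reclaim(sc))
			return false;
	}
	if (next_demotion_node(nid) == NUMA_NO_NODE)
		return false;
	return true;	/* assumed tail of the function, not part of the hunk */
}

/* Logic after the revert: memcg reclaim follows the same rules as
 * global reclaim, so only no_demotion and the topology matter. */
static bool can_demote_after(int nid, const struct scan_control *sc)
{
	if (!numa_demotion_enabled)
		return false;
	if (sc && sc->no_demotion)
		return false;
	if (next_demotion_node(nid) == NUMA_NO_NODE)
		return false;
	return true;	/* assumed tail of the function, not part of the hunk */
}
```

Under these assumptions, a scan_control with memcg_reclaim set yields false from can_demote_before(0, &sc) but true from can_demote_after(0, &sc), while requests with no_demotion set, or on a node without a demotion target, are rejected by both versions.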