Commit Graph

1135 Commits

Author SHA1 Message Date
Nico Pache 6fc7da1ce3 mm: vmscan: make rotations a secondary factor in balancing anon vs file
commit 0538a82c39e94d49fa6985c6a0101ca819be11ee
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Thu Oct 13 15:31:13 2022 -0400

    mm: vmscan: make rotations a secondary factor in balancing anon vs file

    We noticed a 2% webserver throughput regression after upgrading from 5.6.
    This could be tracked down to a shift in the anon/file reclaim balance
    (confirmed with swappiness) that resulted in worse reclaim efficiency and
    thus more kswapd activity for the same outcome.

    The change that exposed the problem is aae466b005 ("mm/swap: implement
    workingset detection for anonymous LRU").  By qualifying swapins based on
    their refault distance, it lowered the cost of anon reclaim in this
    workload, in turn causing (much) more anon scanning than before.  Scanning
    the anon list is more expensive due to the higher ratio of mmapped pages
    that may rotate during reclaim, and so the result was an increase in %sys
    time.

    Right now, rotations aren't considered a cost when balancing scan pressure
    between LRUs.  We can end up with very few file refaults putting all the
    scan pressure on hot anon pages that are rotated en masse, don't get
    reclaimed, and never push back on the file LRU again.  We still only
    reclaim file cache in that case, but we burn a lot of CPU rotating anon
    pages.  It's "fair" from an LRU age POV, but doesn't reflect the real cost
    it imposes on the system.

    Consider rotations as a secondary factor in balancing the LRUs.  This
    doesn't attempt to make a precise comparison between IO cost and CPU cost,
    it just says: if reloads are about comparable between the lists, or
    rotations are overwhelmingly different, adjust for CPU work.
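
    A minimal sketch of the cost accounting this implies (assuming
    lru_note_cost() grows separate IO and rotation arguments; simplified,
    not the verbatim upstream diff):

        void lru_note_cost(struct lruvec *lruvec, bool file,
                           unsigned int nr_io, unsigned int nr_rotated)
        {
                /*
                 * IO is weighted far above CPU work, so rotations only
                 * tip the balance when refaults are comparable between
                 * the lists or rotation counts differ wildly.
                 */
                unsigned long cost = nr_io * SWAP_CLUSTER_MAX + nr_rotated;

                if (file)
                        lruvec->file_cost += cost;
                else
                        lruvec->anon_cost += cost;
        }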

    This fixed the regression on our webservers.  It has since been deployed
    to the entire Meta fleet and hasn't caused any problems.

    Link: https://lkml.kernel.org/r/20221013193113.726425-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
2023-06-14 15:11:02 -06:00
Nico Pache 011900bb25 mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
commit 81a70c21d9170de67a45843bdd627f4cce9c4215
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Fri Nov 18 12:36:03 2022 +0530

    mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1

    balance_dirty_pages doesn't do the required dirty throttling on cgroupv1.
    See commit 9badce000e ("cgroup, writeback: don't enable cgroup writeback
    on traditional hierarchies").  Instead, the kernel depends on writeback
    throttling in shrink_folio_list to achieve the same goal.  With large
    memory systems, the flusher may not be able to write back quickly enough,
    so we start finding pages in shrink_folio_list that are already under
    writeback.  Hence, for cgroupv1, let's do a reclaim throttle after waking up
    the flusher.

    With this change, the below test, which used to fail on a 256GB system,
    runs until the file system is full.

    root@lp2:/sys/fs/cgroup/memory# mkdir test
    root@lp2:/sys/fs/cgroup/memory# cd test/
    root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
    root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
    root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
    Killed
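
    A rough sketch of where the throttle lands in shrink_node() (field and
    helper names assumed from the upstream vmscan code, not verified against
    this tree):

        if (sc->nr.unqueued_dirty == sc->nr.file_taken) {
                wakeup_flusher_threads(WB_REASON_VMSCAN);
                /*
                 * cgroup v1 doesn't throttle in balance_dirty_pages(),
                 * so throttle reclaim here after kicking the flusher.
                 */
                if (!writeback_throttling_sane(sc))
                        reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
        }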

    Link: https://lkml.kernel.org/r/20221118070603.84081-1-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: zefan li <lizefan.x@bytedance.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
2023-06-14 15:11:02 -06:00
Nico Pache 63ce9c4eb8 mm: vmscan: fix extreme overreclaim and swap floods
commit f53af4285d775cd9a9a146fc438bd0a1bee1838a
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Tue Aug 2 12:28:11 2022 -0400

    mm: vmscan: fix extreme overreclaim and swap floods

    During proactive reclaim, we sometimes observe severe overreclaim, with
    several thousand times more pages reclaimed than requested.

    This trace was obtained from shrink_lruvec() during such an instance:

        prio:0 anon_cost:1141521 file_cost:7767
        nr_reclaimed:4387406 nr_to_reclaim:1047 (or_factor:4190)
        nr=[7161123 345 578 1111]

    While the reclaimer requested 4M, vmscan reclaimed close to 16G, most of it
    by swapping.  These requests take over a minute, during which the write()
    to memory.reclaim is unkillably stuck inside the kernel.

    Digging into the source, this is caused by the proportional reclaim
    bailout logic.  This code tries to resolve a fundamental conflict: to
    reclaim roughly what was requested, while also aging all LRUs fairly and
    in accordance with their size, swappiness, refault rates, etc.  The way it
    attempts fairness is that once the reclaim goal has been reached, it stops
    scanning the LRUs with the smaller remaining scan targets, and adjusts the
    remainder of the bigger LRUs according to how much of the smaller LRUs was
    scanned.  It then finishes scanning that remainder regardless of the
    reclaim goal.

    This works fine if priority levels are low and the LRU lists are
    comparable in size.  However, in this instance, the cgroup that is
    targeted by proactive reclaim has almost no files left - they've already
    been squeezed out by proactive reclaim earlier - and the remaining anon
    pages are hot.  Anon rotations cause the priority level to drop to 0,
    which results in reclaim targeting all of anon (a lot) and all of file
    (almost nothing).  By the time reclaim decides to bail, it has scanned
    most or all of the file target, and therefore must also scan most or all of
    the enormous anon target.  This target is thousands of times larger than
    the reclaim goal, thus causing the overreclaim.

    The bailout code hasn't changed in years, why is this failing now?  The
    most likely explanations are two other recent changes in anon reclaim:

    1. Before the series starting with commit 5df741963d ("mm: fix LRU
       balancing effect of new transparent huge pages"), the VM was
       overall relatively reluctant to swap at all, even if swap was
       configured. This means the LRU balancing code didn't come into play
       as often as it does now, and mostly in high pressure situations
       where pronounced swap activity wouldn't be as surprising.

    2. For historic reasons, shrink_lruvec() loops on the scan targets of
       all LRU lists except the active anon one, meaning it would bail if
       the only remaining pages to scan were active anon - even if there
       were a lot of them.

       Before the series starting with commit ccc5dc6734 ("mm/vmscan:
       make active/inactive ratio as 1:1 for anon lru"), most anon pages
       would live on the active LRU; the inactive one would contain only a
       handful of preselected reclaim candidates. After the series, anon
       gets aged similarly to file, and the inactive list is the default
       for new anon pages as well, making it often the much bigger list.

       As a result, the VM is now more likely to actually finish large
       anon targets than before.

    Change the code such that only one SWAP_CLUSTER_MAX-sized nudge toward the
    larger LRU lists is made before bailing out on a met reclaim goal.
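
    A rough illustration of the intended effect in shrink_lruvec() (this is
    a conceptual sketch, not the literal upstream diff; nr[] are the
    remaining per-LRU scan targets):

        /*
         * Once nr_to_reclaim has been met, don't finish the full
         * proportionally-adjusted remainder of the larger LRUs.  Cap
         * what's left to one SWAP_CLUSTER_MAX batch per list, i.e. a
         * single fairness nudge, then let the scan loop terminate.
         */
        if (nr_reclaimed >= nr_to_reclaim) {
                enum lru_list lru;

                for_each_evictable_lru(lru)
                        nr[lru] = min(nr[lru], (unsigned long)SWAP_CLUSTER_MAX);
        }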

    This fixes the extreme overreclaim problem.

    Fairness is more subtle and harder to evaluate.  No obvious misbehavior
    was observed on the test workload, in any case.  Conceptually, fairness
    should primarily be a cumulative effect from regular, lower priority
    scans.  Once the VM is in trouble and needs to escalate scan targets to
    make forward progress, fairness needs to take a backseat.  This is also
    acknowledged by the myriad exceptions in get_scan_count().  This patch
    makes fairness decrease gradually, as it keeps fairness work static over
    increasing priority levels with growing scan targets.  This should make
    more sense - although we may have to re-visit the exact values.

    Link: https://lkml.kernel.org/r/20220802162811.39216-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Rik van Riel <riel@surriel.com>
    Acked-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2168372
Signed-off-by: Nico Pache <npache@redhat.com>
2023-06-14 15:11:02 -06:00
Rafael Aquini 8cfc52e479 mm/demotion: demote pages according to allocation fallback order
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186559

This patch is a backport of the following upstream commit:
commit 32008027289239100d8d2876f50b15d92bde1855
Author: Jagdish Gediya <jvgediya.oss@gmail.com>
Date:   Thu Aug 18 18:40:40 2022 +0530

    mm/demotion: demote pages according to allocation fallback order

    Currently, a higher tier node can only be demoted to selected nodes on the
    next lower tier as defined by the demotion path.  This strict demotion
    order does not work in all use cases (e.g.  some use cases may want to
    allow cross-socket demotion to another node in the same demotion tier as a
    fallback when the preferred demotion node is out of space).  This demotion
    order is also inconsistent with the page allocation fallback order when
    all the nodes in a higher tier are out of space: The page allocation can
    fall back to any node from any lower tier, whereas the demotion order
    doesn't allow that currently.

    This patch adds support to get all the allowed demotion targets for a
    memory tier.  demote_page_list() function is now modified to utilize this
    allowed node mask as the fallback allocation mask.
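
    A sketch of how the fallback mask might be wired into the demotion
    allocation (helper and struct names assumed from upstream, simplified):

        nodemask_t allowed_mask;
        struct migration_target_control mtc = {
                .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) |
                            __GFP_NOWARN | __GFP_NOMEMALLOC | GFP_NOWAIT,
                .nid = next_demotion_node(pgdat->node_id),
                /* any node in the lower tiers is an acceptable fallback */
                .nmask = &allowed_mask,
        };

        node_get_allowed_targets(pgdat, &allowed_mask);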

    Link: https://lkml.kernel.org/r/20220818131042.113280-9-aneesh.kumar@linux.ibm.com
    Signed-off-by: Jagdish Gediya <jvgediya.oss@gmail.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Acked-by: Wei Xu <weixugc@google.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Bharata B Rao <bharata@amd.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Hesham Almatary <hesham.almatary@huawei.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2023-04-26 08:55:46 -04:00
Rafael Aquini 892be419d9 mm/demotion: move memory demotion related code
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2186559

This patch is a backport of the following upstream commit:
commit 9195244022788935eac0df16132394ffa5613542
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date:   Thu Aug 18 18:40:34 2022 +0530

    mm/demotion: move memory demotion related code

    This moves memory demotion related code to mm/memory-tiers.c.  No
    functional change in this patch.

    Link: https://lkml.kernel.org/r/20220818131042.113280-3-aneesh.kumar@linux.ibm.com
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Acked-by: Wei Xu <weixugc@google.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Bharata B Rao <bharata@amd.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Davidlohr Bueso <dave@stgolabs.net>
    Cc: Hesham Almatary <hesham.almatary@huawei.com>
    Cc: Jagdish Gediya <jvgediya.oss@gmail.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Tim Chen <tim.c.chen@intel.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2023-04-26 08:55:41 -04:00
Chris von Recklinghausen a7fb36ec82 mm: shrinkers: fix deadlock in shrinker debugfs
Conflicts: mm/vmscan.c - We don't have
	d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface")
	so add an #include of linux/debugfs.h

Bugzilla: https://bugzilla.redhat.com/2160210

commit badc28d4924bfed73efc93f716a0c3aa3afbdf6f
Author: Qi Zheng <zhengqi.arch@bytedance.com>
Date:   Thu Feb 2 18:56:12 2023 +0800

    mm: shrinkers: fix deadlock in shrinker debugfs

    debugfs_remove_recursive() is invoked by unregister_shrinker(), which
    holds the write lock of shrinker_rwsem, and it waits for the handler of
    the debugfs file to complete.  The handler also needs to take the read
    lock of shrinker_rwsem to do its work, so the following deadlock can
    occur:

            CPU0                            CPU1

    debugfs_file_get()
    shrinker_debugfs_count_show()/shrinker_debugfs_scan_write()

                                    unregister_shrinker()
                                    --> down_write(&shrinker_rwsem);
                                        debugfs_remove_recursive()
                                            // wait for (A)
                                        --> wait_for_completion();

        // wait for (B)
    --> down_read_killable(&shrinker_rwsem)
    debugfs_file_put() -- (A)

                                        up_write() -- (B)

    The down_read_killable() can be killed, so the above deadlock can be
    recovered from.  But that requires an extra kill action, and until then
    all subsequent shrinker-related operations are blocked, so it's better
    to fix it.

    [akpm@linux-foundation.org: fix CONFIG_SHRINKER_DEBUG=n stub]
    Link: https://lkml.kernel.org/r/20230202105612.64641-1-zhengqi.arch@bytedance.com
    Fixes: 5035ebc644ae ("mm: shrinkers: introduce debugfs interface for memory shrinkers")
    Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
    Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:35 -04:00
Chris von Recklinghausen e60aca2995 vmscan: check folio_test_private(), not folio_get_private()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 36a3b14b5febdaf0e7f70c4ca6f62c8ea75fabfe
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Sep 2 20:26:39 2022 +0100

    vmscan: check folio_test_private(), not folio_get_private()

    These two predicates are the same for file pages, but are not the same for
    anonymous pages.

    Link: https://lkml.kernel.org/r/20220902192639.1737108-3-willy@infradead.org
    Fixes: 07f67a8dedc0 ("mm/vmscan: convert shrink_active_list() to use a folio")
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reported-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:32 -04:00
Chris von Recklinghausen 0b0d5d11c5 mm: vmpressure: don't count proactive reclaim in vmpressure
Bugzilla: https://bugzilla.redhat.com/2160210

commit 73b73bac90d97400e29e585c678c4d0ebfd2680d
Author: Yosry Ahmed <yosryahmed@google.com>
Date:   Thu Jul 14 06:49:18 2022 +0000

    mm: vmpressure: don't count proactive reclaim in vmpressure

    memory.reclaim is a cgroup v2 interface that allows users to proactively
    reclaim memory from a memcg, without real memory pressure.  Reclaim
    operations invoke vmpressure, which is used: (a) To notify userspace of
    reclaim efficiency in cgroup v1, and (b) As a signal for a memcg being
    under memory pressure for networking (see
    mem_cgroup_under_socket_pressure()).

    For (a), vmpressure notifications in v1 are not affected by this change
    since memory.reclaim is a v2 feature.

    For (b), the effects of the vmpressure signal (according to Shakeel [1])
    are as follows:
    1. Reducing send and receive buffers of the current socket.
    2. May drop packets on the rx path.
    3. May throttle current thread on the tx path.

    Since proactive reclaim is invoked directly by userspace, not by memory
    pressure, it makes sense not to throttle networking.  Hence, this change
    makes sure that proactive reclaim caused by memory.reclaim does not
    trigger vmpressure.

    [1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/
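
    A minimal sketch of the guard (assuming a scan_control flag named
    sc->proactive is set for memory.reclaim requests; simplified):

        /* user-requested reclaim is not a memory-pressure signal */
        if (!sc->proactive)
                vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
                           sc->nr_scanned - nr_scanned,
                           sc->nr_reclaimed - nr_reclaimed);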

    [yosryahmed@google.com: update documentation]
      Link: https://lkml.kernel.org/r/20220721173015.2643248-1-yosryahmed@google.com
    Link: https://lkml.kernel.org/r/20220714064918.2576464-1-yosryahmed@google.com
    Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: David Rientjes <rientjes@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:28 -04:00
Chris von Recklinghausen f78820ee59 mm: shrinkers: fix double kfree on shrinker name
Bugzilla: https://bugzilla.redhat.com/2160210

commit 14773bfa70e67f4d4ebd60e60cb6e25e8c84d4c0
Author: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date:   Wed Jul 20 23:47:55 2022 +0900

    mm: shrinkers: fix double kfree on shrinker name

    syzbot is reporting double kfree() at free_prealloced_shrinker() [1], for
    destroy_unused_super() calls free_prealloced_shrinker() even if
    prealloc_shrinker() returned an error.  Explicitly clear shrinker name
    when prealloc_shrinker() called kfree().
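
    The shape of the fix, sketched (the internal error-path helper name is
    assumed):

        err = __prealloc_shrinker(shrinker);
        if (err) {
                /*
                 * Zero the name so free_prealloced_shrinker() won't
                 * kfree() it a second time.
                 */
                kfree_const(shrinker->name);
                shrinker->name = NULL;
        }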

    [roman.gushchin@linux.dev: zero shrinker->name in all cases where shrinker->name is freed]
      Link: https://lkml.kernel.org/r/YtgteTnQTgyuKUSY@castle
    Link: https://syzkaller.appspot.com/bug?extid=8b481578352d4637f510 [1]
    Link: https://lkml.kernel.org/r/ffa62ece-6a42-2644-16cf-0d33ef32c676@I-love.SAKURA.ne.jp
    Fixes: e33c267ab70de424 ("mm: shrinkers: provide shrinkers with names")
    Reported-by: syzbot <syzbot+8b481578352d4637f510@syzkaller.appspotmail.com>
    Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:28 -04:00
Chris von Recklinghausen 735b1144c9 mm, docs: fix comments that mention mem_hotplug_end()
Bugzilla: https://bugzilla.redhat.com/2160210

commit e8da368a1e42a8056d1a6b419e1b91b6cf11d77e
Author: Yun-Ze Li <p76091292@gs.ncku.edu.tw>
Date:   Mon Jun 20 07:15:16 2022 +0000

    mm, docs: fix comments that mention mem_hotplug_end()

    Comments that mention mem_hotplug_end() are confusing as there is no
    function called mem_hotplug_end().  Fix them by replacing all the
    occurrences of mem_hotplug_end() in the comments with mem_hotplug_done().

    [akpm@linux-foundation.org: grammatical fixes]
    Link: https://lkml.kernel.org/r/20220620071516.1286101-1-p76091292@gs.ncku.edu.tw
    Signed-off-by: Yun-Ze Li <p76091292@gs.ncku.edu.tw>
    Cc: Souptick Joarder <jrdr.linux@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:20 -04:00
Chris von Recklinghausen 626f944477 mm/swap: convert __delete_from_swap_cache() to a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit ceff9d3354e95ca17e12ad869acea5407cc467f9
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 18:50:20 2022 +0100

    mm/swap: convert __delete_from_swap_cache() to a folio

    All callers now have a folio, so convert the entire function to operate
    on folios.

    Link: https://lkml.kernel.org/r/20220617175020.717127-23-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:20 -04:00
Chris von Recklinghausen 13005f9b18 mm: convert page_swap_flags to folio_swap_flags
Bugzilla: https://bugzilla.redhat.com/2160210

commit b98c359f1d921deae04bb5dbbbbbb9d8705b7c4c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 18:50:18 2022 +0100

    mm: convert page_swap_flags to folio_swap_flags

    The only caller already has a folio, so push the folio->page conversion
    down a level.

    Link: https://lkml.kernel.org/r/20220617175020.717127-21-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:20 -04:00
Chris von Recklinghausen eed5a2e492 mm: convert destroy_compound_page() to destroy_large_folio()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 5375336c8c42a343c3b440b6f1e21c65e7b174b9
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 18:50:17 2022 +0100

    mm: convert destroy_compound_page() to destroy_large_folio()

    All callers now have a folio, so push the folio->page conversion
    down to this function.

    [akpm@linux-foundation.org: uninline destroy_large_folio() to fix build issue]
    Link: https://lkml.kernel.org/r/20220617175020.717127-20-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:20 -04:00
Chris von Recklinghausen 5f78168909 mm/vmscan: convert reclaim_pages() to use a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit a83f0551f49682c81444d682053d49f9dfcbe5fa
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 16:42:48 2022 +0100

    mm/vmscan: convert reclaim_pages() to use a folio

    Remove a few hidden calls to compound_head, saving 76 bytes of text.

    Link: https://lkml.kernel.org/r/20220617154248.700416-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:18 -04:00
Chris von Recklinghausen b80fd4fb22 mm/vmscan: convert shrink_active_list() to use a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 07f67a8dedc0788f3f91d945bc6e987cf9cccd4a
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 16:42:47 2022 +0100

    mm/vmscan: convert shrink_active_list() to use a folio

    Remove a few hidden calls to compound_head, saving 411 bytes of text.

    Link: https://lkml.kernel.org/r/20220617154248.700416-5-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:18 -04:00
Chris von Recklinghausen a1783a48b2 mm/vmscan: convert move_pages_to_lru() to use a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit ff00a170d950309f9daef836caa3d54671b883b8
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 16:42:46 2022 +0100

    mm/vmscan: convert move_pages_to_lru() to use a folio

    Remove a few hidden calls to compound_head, saving 387 bytes of text on
    my test configuration.

    Link: https://lkml.kernel.org/r/20220617154248.700416-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:18 -04:00
Chris von Recklinghausen 7c42c35c60 mm/vmscan: convert isolate_lru_pages() to use a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 166e3d32276f4c9ffd290f92b9df55b255f5fed7
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 16:42:45 2022 +0100

    mm/vmscan: convert isolate_lru_pages() to use a folio

    Remove a few hidden calls to compound_head, saving 279 bytes of text.

    Link: https://lkml.kernel.org/r/20220617154248.700416-3-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:18 -04:00
Chris von Recklinghausen d816e11222 mm/vmscan: convert reclaim_clean_pages_from_list() to folios
Bugzilla: https://bugzilla.redhat.com/2160210

commit b8cecb9376b9d3031cf62b476a0db087b6b01072
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Jun 17 16:42:44 2022 +0100

    mm/vmscan: convert reclaim_clean_pages_from_list() to folios

    Patch series "nvert much of vmscan to folios"

    vmscan always operates on folios since it puts the pages on the LRU list.
    Switching all of these functions from pages to folios saves 1483 bytes of
    text from removing all the baggage around calling compound_page() and
    similar functions.

    This patch (of 5):

    This is a straightforward conversion which removes several hidden calls
    to compound_head, saving 330 bytes of kernel text.

    Link: https://lkml.kernel.org/r/20220617154248.700416-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20220617154248.700416-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:18 -04:00
Chris von Recklinghausen 8dced2b153 mm: shrinkers: provide shrinkers with names
Bugzilla: https://bugzilla.redhat.com/2160210

commit e33c267ab70de4249d22d7eab1cc7d68a889bac2
Author: Roman Gushchin <roman.gushchin@linux.dev>
Date:   Tue May 31 20:22:24 2022 -0700

    mm: shrinkers: provide shrinkers with names

    Currently shrinkers are anonymous objects.  For debugging purposes they
    can be identified by count/scan function names, but it's not always
    useful: e.g.  for superblock's shrinkers it's nice to have at least an
    idea of to which superblock the shrinker belongs.

    This commit adds names to shrinkers.  register_shrinker() and
    prealloc_shrinker() functions are extended to take a format and arguments
    to master a name.

    In some cases it's not possible to determine a good name at the time when
    a shrinker is allocated.  For such cases shrinker_debugfs_rename() is
    provided.

    The expected format is:
        <subsystem>-<shrinker_type>[:<instance>]-<id>
    For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair.

    After this change the shrinker debugfs directory looks like:
      $ cd /sys/kernel/debug/shrinker/
      $ ls
        dquota-cache-16     sb-devpts-28     sb-proc-47       sb-tmpfs-42
        mm-shadow-18        sb-devtmpfs-5    sb-proc-48       sb-tmpfs-43
        mm-zspool:zram0-34  sb-hugetlbfs-17  sb-pstore-31     sb-tmpfs-44
        rcu-kfree-0         sb-hugetlbfs-33  sb-rootfs-2      sb-tmpfs-49
        sb-aio-20           sb-iomem-12      sb-securityfs-6  sb-tracefs-13
        sb-anon_inodefs-15  sb-mqueue-21     sb-selinuxfs-22  sb-xfs:vda1-36
        sb-bdev-3           sb-nsfs-4        sb-sockfs-8      sb-zsmalloc-19
        sb-bpf-32           sb-pipefs-14     sb-sysfs-26      thp-deferred_split-10
        sb-btrfs:vda2-24    sb-proc-25       sb-tmpfs-1       thp-zero-9
        sb-cgroup2-30       sb-proc-39       sb-tmpfs-27      xfs-buf:vda1-37
        sb-configfs-23      sb-proc-41       sb-tmpfs-29      xfs-inodegc:vda1-38
        sb-dax-11           sb-proc-45       sb-tmpfs-35
        sb-debugfs-7        sb-proc-46       sb-tmpfs-40
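
    A minimal usage sketch of the extended API (e.g. a superblock shrinker;
    the format string follows the naming convention above):

        err = prealloc_shrinker(&s->s_shrink, "sb-%s", type->name);
        if (err)
                goto fail;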

    [roman.gushchin@linux.dev: fix build warnings]
      Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle
      Reported-by: kernel test robot <lkp@intel.com>
    Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev
    Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Cc: Muchun Song <songmuchun@bytedance.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:17 -04:00
Chris von Recklinghausen 22b1bd509b mm: shrinkers: introduce debugfs interface for memory shrinkers
Bugzilla: https://bugzilla.redhat.com/2160210

commit 5035ebc644aec92d55d1bbfe042f35341e4bffb5
Author: Roman Gushchin <roman.gushchin@linux.dev>
Date:   Tue May 31 20:22:23 2022 -0700

    mm: shrinkers: introduce debugfs interface for memory shrinkers

    This commit introduces the /sys/kernel/debug/shrinker debugfs interface
    which provides an ability to observe the state of individual kernel memory
    shrinkers.

    Because the feature adds some memory overhead (which shouldn't be large
    unless there is a huge amount of registered shrinkers), it's guarded by a
    config option (enabled by default).

    This commit introduces the "count" interface for each shrinker registered
    in the system.

    The output is in the following format:
    <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>...
    <cgroup inode id> <nr of objects on node 0> <nr of objects on node 1>...
    ...

    To reduce the size of output on machines with many thousands of cgroups, if
    the total number of objects on all nodes is 0, the line is omitted.

    If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed as
    cgroup inode id.  If the shrinker is not numa-aware, 0's are printed for
    all nodes except the first one.

    This commit gives debugfs entries simple numeric names, which are not very
    convenient.  The following commit in the series will provide shrinkers
    with more meaningful names.

    [akpm@linux-foundation.org: remove WARN_ON_ONCE(), per Roman]
      Reported-by: syzbot+300d27c79fe6d4cbcc39@syzkaller.appspotmail.com
    Link: https://lkml.kernel.org/r/20220601032227.4076670-3-roman.gushchin@linux.dev
    Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
    Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
    Acked-by: Muchun Song <songmuchun@bytedance.com>
    Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Cc: Dave Chinner <dchinner@redhat.com>
    Cc: Hillf Danton <hdanton@sina.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:17 -04:00
Chris von Recklinghausen 73c174d303 vmscan: Add check_move_unevictable_folios()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 77414d195f905dd43f58bce82118775ffa59575c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Jun 4 17:39:09 2022 -0400

    vmscan: Add check_move_unevictable_folios()

    Change the guts of check_move_unevictable_pages() over to use folios
    and add check_move_unevictable_pages() as a wrapper.
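
    The wrapper roughly looks like this (sketch; tail pages of a THP are
    skipped so each folio is added only once):

        void check_move_unevictable_pages(struct pagevec *pvec)
        {
                struct folio_batch fbatch;
                unsigned i;

                folio_batch_init(&fbatch);
                for (i = 0; i < pvec->nr; i++) {
                        struct page *page = pvec->pages[i];

                        if (PageTransTail(page))
                                continue;
                        folio_batch_add(&fbatch, page_folio(page));
                }
                check_move_unevictable_folios(&fbatch);
        }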

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:16 -04:00
Chris von Recklinghausen 2244d30d80 Revert "mm/vmscan: never demote for memcg reclaim"
Bugzilla: https://bugzilla.redhat.com/2160210

commit 3f1509c57b1ba5646de0fb8d81bd7107aec22257
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Wed May 18 15:09:11 2022 -0400

    Revert "mm/vmscan: never demote for memcg reclaim"

    This reverts commit 3a235693d3930e1276c8d9cc0ca5807ef292cf0a.

    Its premise was that cgroup reclaim cares about freeing memory inside the
    cgroup, and demotion just moves pages around within the cgroup limit.
    Hence, pages from toptier nodes should be reclaimed directly.

    However, with NUMA balancing now doing tier promotions, demotion is part
    of the page aging process.  Global reclaim demotes the coldest toptier
    pages to secondary memory, where their life continues and from which they
    have a chance to get promoted back.  Essentially, tiered memory systems
    have an LRU order that spans multiple nodes.

    When cgroup reclaims pages coming off the toptier directly, there can be
    colder pages on lower tier nodes that were demoted by global reclaim.
    This is an aging inversion, not unlike if cgroups were to reclaim directly
    from the active lists while there are inactive pages.

    Proactive reclaim is another factor.  The goal of that is to offload
    colder pages from expensive RAM to cheaper storage.  When lower tier
    memory is available as an intermediate layer, we want offloading to take
    advantage of it instead of bypassing to storage.

    Revert the patch so that cgroups respect the LRU order spanning the memory
    hierarchy.

    Of note is a specific undercommit scenario, where all cgroup limits in the
    system add up to <= available toptier memory.  In that case, shuffling
    pages out to lower tiers first to reclaim them from there is inefficient.
    This is something that could be optimized/short-circuited later on (although
    care must be taken not to accidentally recreate the aging inversion).
    Let's ensure correctness first.

    Link: https://lkml.kernel.org/r/20220518190911.82400-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
    Reviewed-by: Yang Shi <shy828301@gmail.com>
    Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
    Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
    Reviewed-by: Muchun Song <songmuchun@bytedance.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
    Cc: Zi Yan <ziy@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:12 -04:00
Chris von Recklinghausen 7b1db0833d mm: don't be stuck to rmap lock on reclaim path
Bugzilla: https://bugzilla.redhat.com/2160210

commit 6d4675e601357834dadd2ba1d803f6484596015c
Author: Minchan Kim <minchan@kernel.org>
Date:   Thu May 19 14:08:54 2022 -0700

    mm: don't be stuck to rmap lock on reclaim path

    The rmap locks (i_mmap_rwsem and anon_vma->root->rwsem) can be contended
    under memory pressure if processes keep working on their vmas (e.g., fork,
    mmap, munmap).  That leaves the reclaim path stuck.  In our real workload
    traces, we see kswapd waiting on the lock for 300ms+ (worst case, a
    second), which pushes other processes into direct reclaim, where they also
    get stuck on the lock.

    This patch makes the lru aging path use try_lock mode, like
    shrink_page_list, so the reclaim context keeps working on the next lru
    pages without being stuck.  If it finds the rmap lock contended, it
    rotates the page back to the head of the lru in both the active and
    inactive lrus for consistent behavior, which is a basic starting point
    rather than adding more heuristics.

    Since this patch introduces a new "contended" field as an out-param,
    along with the try_lock in-param, in rmap_walk_control, the structure is
    no longer immutable when try_lock is set, so remove the const keywords on
    rmap-related functions.  Since rmap walking is already an expensive
    operation, I doubt the const provided a sizable benefit (and we didn't
    have it until 5.17).
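
    A simplified sketch of how an aging-path caller might use the new
    fields (call-site details assumed, not the full upstream diff):

        struct rmap_walk_control rwc = {
                .rmap_one = folio_referenced_one,
                .arg      = (void *)&pra,
                .try_lock = true,       /* don't sleep on rmap locks */
        };

        rmap_walk(folio, &rwc);

        /* lock was contended: the caller rotates the folio and moves on */
        if (rwc.contended)
                return 1;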

    In a heavy app workload in Android, trace shows following statistics.  It
    almost removes rmap lock contention from reclaim path.

    Martin Liu reported:

    Before:

       max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
             1632            0            1631   151.542173        31672    209  page_lock_anon_vma_read
              601            0             601   145.544681        28817    198  rmap_walk_file

    After:

       max_dur(ms)  min_dur(ms)  max-min(dur)ms  avg_dur(ms)  sum_dur(ms)  count blocked_function
              NaN          NaN              NaN          NaN          NaN    0.0             NaN
                0            0                0     0.127645            1     12  rmap_walk_file

    [minchan@kernel.org: add comment, per Matthew]
      Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com
    Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: John Dias <joaodias@google.com>
    Cc: Tim Murray <timmurray@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
    Cc: Martin Liu <liumartin@google.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:11 -04:00
Chris von Recklinghausen 0fa6b9a627 vmscan: remove remaining uses of page in shrink_page_list
Conflicts: mm/vmscan.c - The backport of
	d4b4084ac315 ("mm: Turn can_split_huge_page() into can_split_folio()")
	didn't change page_maybe_dma_pinned to folio_maybe_dma_pinned.
	Do that here.

Bugzilla: https://bugzilla.redhat.com/2160210

commit c28a0e9695b724fbaa58b1f5bbf0a03c5a79d721
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:03 2022 -0700

    vmscan: remove remaining uses of page in shrink_page_list

    These are all straightforward conversions to the folio API.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-16-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:07 -04:00
Chris von Recklinghausen 894061152f vmscan: convert the activate_locked portion of shrink_page_list to folios
Bugzilla: https://bugzilla.redhat.com/2160210

commit 246b648038096c6024a812aac354d27e8da987a2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:03 2022 -0700

    vmscan: convert the activate_locked portion of shrink_page_list to folios

    This accounts the number of pages activated correctly for large folios.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-14-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 10ccc36f3e vmscan: move initialisation of mapping down
Bugzilla: https://bugzilla.redhat.com/2160210

commit 5441d4902f969293b5bfe057e26038db1a8b342e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:03 2022 -0700

    vmscan: move initialisation of mapping down

    Now that we don't interrogate the BDI for congestion, we can delay looking
    up the folio's mapping until we've got further through the function,
    reducing register pressure and saving a call to folio_mapping for folios
    we're adding to the swap cache.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-13-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 7787974bef vmscan: convert lazy freeing to folios
Bugzilla: https://bugzilla.redhat.com/2160210

commit 64daa5d818ae3430f0785206b0af13ef528cb9ef
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:03 2022 -0700

    vmscan: convert lazy freeing to folios

    Remove a hidden call to compound_head(), and account nr_pages instead of a
    single page.  This matches the code in lru_lazyfree_fn() that accounts
    nr_pages to PGLAZYFREE.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-12-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 0d40b7c3ba vmscan: convert page buffer handling to use folios
Bugzilla: https://bugzilla.redhat.com/2160210

commit 0a36111c8c20b2edc7c83f084bdba2be9d42c1e9
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:02 2022 -0700

    vmscan: convert page buffer handling to use folios

    This mostly just removes calls to compound_head() although nr_reclaimed
    should be incremented by the number of pages, not just 1.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-11-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen fe256ab87e vmscan: convert dirty page handling to folios
Bugzilla: https://bugzilla.redhat.com/2160210

commit 49bd2bf9679f4a5b30236546fca61e66b989ce96
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:02 2022 -0700

    vmscan: convert dirty page handling to folios

    Mostly this just eliminates calls to compound_head(), but
    NR_VMSCAN_IMMEDIATE was being incremented by 1 instead of by nr_pages.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-10-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 1d9f6a1729 swap: convert add_to_swap() to take a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 09c02e56327bdaf9cdbb2742a35fb8c6a6f9a6c7
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:02 2022 -0700

    swap: convert add_to_swap() to take a folio

    The only caller already has a folio available, so this saves a conversion.
    Also convert the return type to boolean.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-9-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 560e042b65 vmscan: convert the writeback handling in shrink_page_list() to folios
Bugzilla: https://bugzilla.redhat.com/2160210

commit d33e4e1412c8b618f5f2f251ab9ddcfdf9f4adf3
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:02 2022 -0700

    vmscan: convert the writeback handling in shrink_page_list() to folios

    Slightly more efficient due to fewer calls to compound_head().

    Link: https://lkml.kernel.org/r/20220504182857.4013401-7-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 1d00716968 vmscan: use folio_mapped() in shrink_page_list()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 1bee2c1677bcb57dadd374cafb59be86ad1a1a82
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 12 20:23:01 2022 -0700

    vmscan: use folio_mapped() in shrink_page_list()

    Remove some legacy function calls.

    Link: https://lkml.kernel.org/r/20220504182857.4013401-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 6182f722a9 mm/vmscan: don't use NUMA_NO_NODE as indicator of page on different node
Bugzilla: https://bugzilla.redhat.com/2160210

commit ed657e5568c5fc877c517b654d65fbdaeb628539
Author: Wei Yang <richard.weiyang@gmail.com>
Date:   Thu May 12 20:23:00 2022 -0700

    mm/vmscan: don't use NUMA_NO_NODE as indicator of page on different node

    Now we are sure there is at least one page on page_list, so it is safe to
    get the nid of it.  This means it is not necessary to use NUMA_NO_NODE as
    an indicator for the beginning of the iteration or for a page on a
    different node.

    Link: https://lkml.kernel.org/r/20220429014426.29223-2-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:06 -04:00
Chris von Recklinghausen 70fc18c0c6 mm/vmscan: filter empty page_list at the beginning
Bugzilla: https://bugzilla.redhat.com/2160210

commit 1ae65e2749b0a3d236fc17d0eca5ea5f6f2c0032
Author: Wei Yang <richard.weiyang@gmail.com>
Date:   Thu May 12 20:23:00 2022 -0700

    mm/vmscan: filter empty page_list at the beginning

    node_page_list is always non-empty when the loop finishes, unless
    page_list itself was empty.

    Let's handle an empty page_list before doing any real work, including
    touching the PF_MEMALLOC flag.

    Link: https://lkml.kernel.org/r/20220429014426.29223-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen cd64a7df78 mm/vmscan: use helper folio_is_file_lru()
Bugzilla: https://bugzilla.redhat.com/2160210

commit f19a27e399c4354b91d608dd77f33877f613224a
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 12 20:23:00 2022 -0700

    mm/vmscan: use helper folio_is_file_lru()

    Use helper folio_is_file_lru() to check whether folio is file lru.  Minor
    readability improvement.

    [linmiaohe@huawei.com: use folio_is_file_lru()]
      Link: https://lkml.kernel.org/r/20220428105802.21389-1-linmiaohe@huawei.com
    Link: https://lkml.kernel.org/r/20220425111232.23182-7-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen 11753046de mm/vmscan: remove obsolete comment in kswapd_run
Bugzilla: https://bugzilla.redhat.com/2160210

commit 4355e4b265ccb55dd6625b82f0e2016f42f2956c
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 12 20:23:00 2022 -0700

    mm/vmscan: remove obsolete comment in kswapd_run

    Since commit 6b700b5b3c ("mm/vmscan.c: remove cpu online notification
    for now"), the cpu online notification is gone, so kswapd won't move to
    the proper cpus if cpus are hot-added.  Remove this obsolete comment.

    Link: https://lkml.kernel.org/r/20220425111232.23182-6-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen ba9cdf6c5e mm/vmscan: take all base pages of THP into account when race with speculative reference
Bugzilla: https://bugzilla.redhat.com/2160210

commit 9aafcffc18785fcdd9295640eb2ed927960b31a1
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 12 20:23:00 2022 -0700

    mm/vmscan: take all base pages of THP into account when race with speculative reference

    If the page has buffers, shrink_page_list will try to free the buffer
    mappings associated with the page and try to free the page as well.  In
    the rare race with speculative reference, the page will be freed shortly
    by speculative reference.  But nr_reclaimed is not incremented correctly
    when we come across a THP.  We need to account for all the base pages in
    this case.

    Link: https://lkml.kernel.org/r/20220425111232.23182-5-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen 547a19b472 mm/vmscan: introduce helper function reclaim_page_list()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 1fe47c0beb2df2739b07b8e051425b6400abce5b
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 12 20:22:59 2022 -0700

    mm/vmscan: introduce helper function reclaim_page_list()

    Introduce helper function reclaim_page_list() to eliminate the duplicated
    code of doing shrink_page_list() and putback_lru_page.  Also we can
    separate node reclaim from node page list operation this way.  No
    functional change intended.

    Link: https://lkml.kernel.org/r/20220425111232.23182-3-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Huang, Ying <ying.huang@intel.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Oscar Salvador <osalvador@suse.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen 8b9cc530f6 mm/vmscan: add a comment about MADV_FREE pages check in folio_check_dirty_writeback
Bugzilla: https://bugzilla.redhat.com/2160210

commit 32a331a72f3eec30f65fd929aeb4dfc514eca28f
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 12 20:22:59 2022 -0700

    mm/vmscan: add a comment about MADV_FREE pages check in folio_check_dirty_writeback

    Patch series "A few cleanup and fixup patches for vmscan

    This series contains a few patches to remove obsolete comment, introduce
    helper to remove duplicated code and so no.  Also we take all base pages
    of THP into account in rare race condition.  More details can be found in
    the respective changelogs.

    This patch (of 6):

    The MADV_FREE pages check in folio_check_dirty_writeback is a bit hard to
    follow.  Add a comment to make the code clear.

    Link: https://lkml.kernel.org/r/20220425111232.23182-2-linmiaohe@huawei.com
    Suggested-by: Huang, Ying <ying.huang@intel.com>
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Oscar Salvador <osalvador@suse.de>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen 2d7f88672d mm/vmscan: not necessary to re-init the list for each iteration
Bugzilla: https://bugzilla.redhat.com/2160210

commit 048f6e1a427ee9cddf62f9b3766372c69846fa4f
Author: Wei Yang <richard.weiyang@gmail.com>
Date:   Thu May 12 20:22:59 2022 -0700

    mm/vmscan: not necessary to re-init the list for each iteration

    node_page_list is defined with LIST_HEAD and is drained until
    list_empty() is true.

    So it is not necessary to re-init it on each iteration.

    [akpm@linux-foundation.org: remove unneeded braces]
    Link: https://lkml.kernel.org/r/20220426021743.21007-1-richard.weiyang@gmail.com
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen 886ecf919c mm/vmscan: take min_slab_pages into account when try to call shrink_node
Bugzilla: https://bugzilla.redhat.com/2160210

commit d8ff6fde8e88d801b62328883680c202510ed518
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu May 12 20:22:58 2022 -0700

    mm/vmscan: take min_slab_pages into account when try to call shrink_node

    Since commit 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from
    shrink_zone()"), slab reclaim and lru page reclaim are done together in
    shrink_node().  So we should take min_slab_pages into account when trying
    to call shrink_node().

    Link: https://lkml.kernel.org/r/20220425112118.20924-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Huang Ying <ying.huang@intel.com>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:05 -04:00
Chris von Recklinghausen b4b2f59b17 fs: Remove aops->freepage
Bugzilla: https://bugzilla.redhat.com/2160210

commit 8560cb1a7d75048af275dd23fb0cf05382b3c2b9
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 5 00:43:09 2022 -0400

    fs: Remove aops->freepage

    All implementations now use free_folio so we can delete the callers
    and the method.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:02 -04:00
Chris von Recklinghausen 3ef6297538 fs: Add free_folio address space operation
Bugzilla: https://bugzilla.redhat.com/2160210

commit d2329aa0c78f4a8dd368bb706f196ab99f692eaa
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun May 1 07:35:31 2022 -0400

    fs: Add free_folio address space operation

    Include documentation and convert the callers to use ->free_folio as
    well as ->freepage.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:02 -04:00
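
A sketch of a filesystem-side implementation.  "myfs" is a hypothetical
filesystem; the operation's shape (returns void, takes a folio, runs after
the folio has been removed from the mapping) follows the documentation the
patch adds.

    static void myfs_free_folio(struct folio *folio)
    {
            /*
             * Release per-folio private data.  By the time ->free_folio runs
             * the folio is no longer attached to the address_space, so no
             * locking against the mapping is needed here.
             */
            kfree(folio_detach_private(folio));
    }

    static const struct address_space_operations myfs_aops = {
            .free_folio     = myfs_free_folio,
            /* ... read/write ops elided ... */
    };
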
Chris von Recklinghausen 6379ebb826 fs: Change try_to_free_buffers() to take a folio
Conflicts: drop changes to fs/hfs/inode.c fs/hfsplus/inode.c fs/ocfs2/aops.c
	fs/reiserfs/inode.c fs/reiserfs/journal.c - unsupported configs

Bugzilla: https://bugzilla.redhat.com/2160210

commit 68189fef88c7d02eb92e038be3d6428ebd0d2945
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun May 1 01:08:08 2022 -0400

    fs: Change try_to_free_buffers() to take a folio

    All but two of the callers already have a folio; pass a folio into
    try_to_free_buffers().  This removes the last user of cancel_dirty_page()
    so remove that wrapper function too.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:02 -04:00
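
The interface change, sketched.  The prototype follows the patch title; the
caller shown is a hypothetical example of how a remaining page-based caller
converts at the boundary with page_folio().

    bool try_to_free_buffers(struct folio *folio);          /* new prototype */

    /* a page-based caller converts at the call site */
    static bool myfs_drop_buffers(struct page *page)        /* hypothetical */
    {
            return try_to_free_buffers(page_folio(page));
    }
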
Chris von Recklinghausen ae948f1a01 mm: submit multipage write for SWP_FS_OPS swap-space
Bugzilla: https://bugzilla.redhat.com/2160210

commit 2282679fb20bf036a714ed49fadd0230c278a203
Author: NeilBrown <neilb@suse.de>
Date:   Mon May 9 18:20:49 2022 -0700

    mm: submit multipage write for SWP_FS_OPS swap-space

    swap_writepage() is given one page at a time, but may be called repeatedly
    in succession.

    For block-device swapspace, the blk_plug functionality allows the multiple
    pages to be combined together at lower layers.  That cannot be used for
    SWP_FS_OPS as blk_plug may not exist - it is only active when
    CONFIG_BLOCK=y.  Consequently all swap writes over NFS are single page
    writes.

    With this patch we pass a pointer-to-pointer via the wbc.  swap_writepage
    can store state between calls - much like the pointer passed explicitly to
    swap_readpage.  After calling swap_writepage() some number of times, the
    state will be passed to swap_write_unplug() which can submit the combined
    request.

    Link: https://lkml.kernel.org/r/164859778128.29473.5191868522654408537.stgit@noble.brown
    Signed-off-by: NeilBrown <neilb@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Tested-by: David Howells <dhowells@redhat.com>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:00 -04:00
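
A sketch of the plugging pattern described above.  The names wbc->swap_plug,
struct swap_iocb and swap_write_unplug() follow the upstream series, but the
driving loop is illustrative rather than the literal pageout() path.

    struct swap_iocb *plug = NULL;
    struct writeback_control wbc = {
            .sync_mode = WB_SYNC_NONE,
            .swap_plug = &plug,     /* pointer-to-pointer carried in the wbc */
    };
    int i;

    /* successive swap_writepage() calls accumulate pages in *plug ... */
    for (i = 0; i < nr_pages; i++)
            swap_writepage(pages[i], &wbc);

    /* ... and one combined request is submitted at the end */
    if (plug)
            swap_write_unplug(plug);
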
Chris von Recklinghausen 5ee86401f1 mm: reclaim mustn't enter FS for SWP_FS_OPS swap-space
Bugzilla: https://bugzilla.redhat.com/2160210

commit d791ea676b66489ef6dabd04cd655f5c77426e40
Author: NeilBrown <neilb@suse.de>
Date:   Mon May 9 18:20:48 2022 -0700

    mm: reclaim mustn't enter FS for SWP_FS_OPS swap-space

    If swap-out is using filesystem operations (SWP_FS_OPS), then it is not
    safe to enter the FS for reclaim.  So only down-grade the requirement for
    swap pages to __GFP_IO after checking that SWP_FS_OPS are not being used.

    This makes the calculation of "may_enter_fs" slightly more complex, so
    move it into a separate function.  With that done, there is little value
    in maintaining the bool variable any more, so replace the may_enter_fs
    variable with a may_enter_fs() function.  This removes any risk of the
    variable becoming out-of-date.

    Link: https://lkml.kernel.org/r/164859778124.29473.16176717935781721855.stgit@noble.brown
    Signed-off-by: NeilBrown <neilb@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Tested-by: David Howells <dhowells@redhat.com>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:00 -04:00
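
A reconstruction of the helper the changelog describes.  It was page-based at
the time of this commit; page_swap_flags() is how the swap flags are read in
that series, but treat the exact naming as an assumption of this sketch.

    static bool may_enter_fs(struct page *page, gfp_t gfp_mask)
    {
            /* Callers that pass __GFP_FS may always enter the filesystem. */
            if (gfp_mask & __GFP_FS)
                    return true;

            /* Otherwise only swapcache pages qualify, and only with __GFP_IO. */
            if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO))
                    return false;

            /*
             * Swap-over-filesystem (SWP_FS_OPS) does its I/O through the
             * filesystem, so it must not be downgraded to a plain __GFP_IO
             * requirement.
             */
            return !(page_swap_flags(page) & SWP_FS_OPS);
    }
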
Chris von Recklinghausen f942ace7a2 mm: create new mm/swap.h header file
Bugzilla: https://bugzilla.redhat.com/2160210

commit 014bb1de4fc17d54907d54418126a9a9736f4aff
Author: NeilBrown <neilb@suse.de>
Date:   Mon May 9 18:20:47 2022 -0700

    mm: create new mm/swap.h header file

    Patch series "MM changes to improve swap-over-NFS support".

    Assorted improvements for swap-via-filesystem.

    This is a resend of these patches, rebased on current HEAD.  The only
    substantial change is that swap_dirty_folio has replaced
    swap_set_page_dirty.

    Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem.  It
    previously worked for NFS but that broke a few releases back.  This
    series switches swap-via-fs to use a new ->swap_rw rather than ->readpage
    and ->direct_IO.  It also makes other improvements.

    There is a companion series already in linux-next which fixes various
    issues with NFS.  Once both series land, a final patch is needed which
    changes NFS over to use ->swap_rw.

    This patch (of 10):

    Many functions declared in include/linux/swap.h are only used within mm/.

    Create a new "mm/swap.h" and move some of these declarations there.
    Remove the redundant 'extern' from the function declarations.

    [akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h]
    Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown
    Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brown
    Signed-off-by: NeilBrown <neilb@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Tested-by: David Howells <dhowells@redhat.com>
    Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:00 -04:00
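
The mechanical shape of the move, sketched with one of the affected
declarations; the exact parameter list of swap_readpage() at that point in
the series is approximate.

    /* include/linux/swap.h (before): exported to the whole kernel tree */
    extern int swap_readpage(struct page *page, bool synchronous);

    /* mm/swap.h (after): visible only to files under mm/, 'extern' dropped */
    int swap_readpage(struct page *page, bool synchronous);
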
Chris von Recklinghausen 0483dc9361 fs: Convert is_dirty_writeback() to take a folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 520f301c54faa3484e820b80d4505d48ee587163
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jan 17 14:35:22 2022 -0500

    fs: Convert is_dirty_writeback() to take a folio

    Pass a folio instead of a page to aops->is_dirty_writeback().
    Convert both implementations and the caller.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
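
The converted interface, sketched.  The aop prototype follows the patch; the
implementation shown forwards to the generic buffer_check_dirty_writeback()
helper, whose folio-based signature here is an assumption of this sketch, and
"myfs" is hypothetical.

    /* in struct address_space_operations */
    void (*is_dirty_writeback)(struct folio *folio, bool *dirty, bool *writeback);

    /* a buffer-head based implementation simply forwards the folio */
    static void myfs_is_dirty_writeback(struct folio *folio,
                                        bool *dirty, bool *writeback)
    {
            buffer_check_dirty_writeback(folio, dirty, writeback);
    }
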
Chris von Recklinghausen c2b12b7b5b mm/vmscan: fix comment for isolate_lru_pages
Bugzilla: https://bugzilla.redhat.com/2160210

commit b2cb6826b6df2bdf91ae4406fd2ef97da7a9cd35
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu Apr 28 23:16:04 2022 -0700

    mm/vmscan: fix comment for isolate_lru_pages

    Since commit 791b48b642 ("mm: vmscan: scan until it finds eligible
    pages"), splicing any skipped pages to the tail of the LRU list won't put
    the system at risk of premature OOM but will waste lots of CPU cycles.
    Correct the comment accordingly.

    Link: https://lkml.kernel.org/r/20220416025231.8082-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:54 -04:00
Chris von Recklinghausen f085301398 mm/vmscan: fix comment for current_may_throttle
Bugzilla: https://bugzilla.redhat.com/2160210

commit 5829f7dbae4158f9c946fac42760de848a8f1695
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu Apr 28 23:16:04 2022 -0700

    mm/vmscan: fix comment for current_may_throttle

    Since commit 6d6435811c19 ("remove bdi_congested() and wb_congested() and
    related functions"), there is no congested backing device check anymore.
    Correct the comment accordingly.

    [akpm@linux-foundation.org: tweak grammar]
    Link: https://lkml.kernel.org/r/20220414120202.30082-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:54 -04:00