Commit Graph

487 Commits

Author SHA1 Message Date
Rafael Aquini 8bc682717f mm/writeback: update filemap_dirty_folio() comment
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit ab428b4c459e62df7dab3b1b783ea03ea06ca895
Author: Jianguo Bao <roidinev@gmail.com>
Date:   Sun Sep 17 23:04:01 2023 +0800

    mm/writeback: update filemap_dirty_folio() comment

    Change to use new address space operation dirty_folio().

    Link: https://lkml.kernel.org/r/20230917-trycontrib1-v1-1-db22630b8839@gmail.com
    Fixes: 6f31a5a261db ("fs: Add aops->dirty_folio")
    Signed-off-by: Jianguo Bau <roidinev@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:22:29 -05:00
Bill O'Donnell 770d4ac268 filemap: add a per-mapping stable writes flag
JIRA: https://issues.redhat.com/browse/RHEL-62760

Conflicts: context errors in pagemap.h

commit 762321dab9a72760bf9aec48362f932717c9424d
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Oct 25 16:10:17 2023 +0200

    filemap: add a per-mapping stable writes flag

    folio_wait_stable waits for writeback to finish before modifying the
    contents of a folio again, e.g. to support checksumming of the data
    in the block integrity code.

    Currently this behavior is controlled by the SB_I_STABLE_WRITES flag
    on the super_block, which means it is uniform for the entire file system.
    This is wrong for the block device pseudofs which is shared by all
    block devices, or for file systems that can use multiple devices like XFS
    with the RT subvolume or btrfs (although btrfs currently reimplements
    folio_wait_stable anyway).

    Add a per-address_space AS_STABLE_WRITES flag to control the behavior
    in a more fine grained way.  The existing SB_I_STABLE_WRITES is kept
    to initialize AS_STABLE_WRITES to the existing default which covers
    most cases.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20231025141020.192413-2-hch@lst.de
    Tested-by: Ilya Dryomov <idryomov@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
2024-11-09 10:06:46 -06:00
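The per-mapping flag described in the commit above can be sketched in plain C. The flag names AS_STABLE_WRITES and SB_I_STABLE_WRITES come from the commit text, but the struct layouts and bit values below are invented stand-ins, not the kernel's definitions:

```c
#include <stdbool.h>

#define SB_I_STABLE_WRITES (1u << 0)   /* illustrative bit values */
#define AS_STABLE_WRITES   (1u << 1)

struct super_block   { unsigned int s_iflags; };
struct address_space { unsigned long flags; };

/* Seed the per-mapping flag from the file-system-wide default; the commit
 * keeps SB_I_STABLE_WRITES precisely to initialize AS_STABLE_WRITES. */
static void mapping_init_stable_writes(struct address_space *mapping,
                                       const struct super_block *sb)
{
    if (sb->s_iflags & SB_I_STABLE_WRITES)
        mapping->flags |= AS_STABLE_WRITES;
}

/* With a per-address_space flag, one block device or one XFS RT volume can
 * opt in or out independently of the shared super_block. */
static bool mapping_stable_writes(const struct address_space *mapping)
{
    return (mapping->flags & AS_STABLE_WRITES) != 0;
}
```

folio_wait_stable() can then consult only the mapping, giving the finer-grained control the commit is after.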
Rafael Aquini 20240fe828 mm: remove folio_account_redirty
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit ed2da9246f324ae88a2dcae629fc2008632ff151
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Jun 28 17:31:44 2023 +0200

    mm: remove folio_account_redirty

    Fold folio_account_redirty into folio_redirty_for_writepage now
    that all other users except for the also unused account_page_redirty
    wrapper are gone.

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:17:24 -04:00
Rafael Aquini 4ba565e805 mm: kill lock|unlock_page_memcg()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 6c77b607ee26472fb945aa41734281c39d06d68f
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Wed Jun 14 22:36:12 2023 +0800

    mm: kill lock|unlock_page_memcg()

    Since commit c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()")
    removed the last user, kill lock_page_memcg() and unlock_page_memcg().

    Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:37:02 -04:00
Chris von Recklinghausen 6dd9ab12cf mm: avoid overflows in dirty throttling logic
JIRA: https://issues.redhat.com/browse/RHEL-50004

commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2
Author: Jan Kara <jack@suse.cz>
Date:   Fri Jun 21 16:42:38 2024 +0200

    mm: avoid overflows in dirty throttling logic

    The dirty throttling logic is interspersed with assumptions that dirty
    limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
    fit into 64-bits).  If limits end up being larger, we will hit overflows,
    possible divisions by 0 etc.  Fix these problems by never allowing so
    large dirty limits as they have dubious practical value anyway.  For
    dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
    so large limits.  For dirty_ratio / dirty_background_ratio it isn't so
    simple as the dirty limit is computed from the amount of available memory
    which can change due to memory hotplug etc.  So when converting dirty
    limits from ratios to numbers of pages, we just don't allow the result to
    exceed UINT_MAX.

    This is root-only triggerable problem which occurs when the operator
    sets dirty limits to >16 TB.

    Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reported-by: Zach O'Keefe <zokeefe@google.com>
    Reviewed-By: Zach O'Keefe <zokeefe@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-07-22 11:07:50 -04:00
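The clamping strategy the commit describes can be sketched in userspace C. The function name and the parts-per-million unit below are illustrative, not the kernel's actual interface:

```c
#include <stdint.h>
#include <limits.h>

/* When converting a dirty ratio into a page count, clamp the result to
 * UINT_MAX so the later dirty-throttling arithmetic, which assumes limits
 * fit into 32 bits, cannot overflow. */
static unsigned long dirty_ratio_to_pages(uint64_t available_pages,
                                          unsigned int ratio_ppm)
{
    uint64_t pages = available_pages * ratio_ppm / 1000000;

    return pages > UINT_MAX ? UINT_MAX : (unsigned long)pages;
}
```

Since available memory changes with hotplug, the clamp is applied at conversion time rather than when the sysctl is written.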
Chris von Recklinghausen 64f014484f Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
JIRA: https://issues.redhat.com/browse/RHEL-50004

commit 30139c702048f1097342a31302cbd3d478f50c63
Author: Jan Kara <jack@suse.cz>
Date:   Fri Jun 21 16:42:37 2024 +0200

    Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"

    Patch series "mm: Avoid possible overflows in dirty throttling".

    Dirty throttling logic assumes dirty limits in page units fit into
    32-bits.  This patch series makes sure this is true (see patch 2/2 for
    more details).

    This patch (of 2):

    This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.

    The commit is broken in several ways.  Firstly, the removed (u64) cast
    from the multiplication will introduce a multiplication overflow on 32-bit
    archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
    default settings with 4GB of RAM will trigger this).  Secondly, the
    div64_u64() is unnecessarily expensive on 32-bit archs.  We have
    div64_ul() in case we want to be safe & cheap.  Thirdly, if dirty
    thresholds are larger than 1<<32 pages, then dirty balancing is going to
    blow up in many other spectacular ways anyway so trying to fix one
    possible overflow is just moot.

    Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
    Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
    Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Reviewed-By: Zach O'Keefe <zokeefe@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-07-22 11:07:50 -04:00
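The first breakage the revert describes, a 32-bit multiplication overflow, is easy to reproduce in userspace; uint32_t here plays the role of a 32-bit architecture's unsigned long:

```c
#include <stdint.h>

/* Without a cast to a 64-bit type, the product wraps modulo 2^32, exactly
 * as wb_thresh * bg_thresh would in unsigned long on a 32-bit arch. */
static uint32_t mul32(uint32_t a, uint32_t b)
{
    return a * b;                   /* wraps silently */
}

/* The (u64) cast the reverted patch removed restores the full product. */
static uint64_t mul64(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}
```

With the default dirty ratios on a 4GB machine the thresholds are large enough that the 32-bit product wraps, which is why the commit calls the overflow "actually common".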
Nico Pache 734c9b831e writeback: account the number of pages written back
commit 8344a3d44be3d18671e18c4ba23bb03dd21e14ad
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 28 19:55:48 2023 +0100

    writeback: account the number of pages written back

    nr_to_write is a count of pages, so we need to decrease it by the number
    of pages in the folio we just wrote, not by 1.  Most callers specify
    either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
    end up writing 512x as many pages as it asked for.

    Dave added:

    : XFS is the only filesystem this would affect, right?  AFAIA, nothing
    : else enables large folios and uses writeback through
    : write_cache_pages() at this point...
    :
    : In which case, I'd be surprised if much difference, if any, gets
    : noticed by anyone.

    Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
    Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:31 -06:00
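The accounting fix above can be sketched with a minimal stand-in folio type (not the kernel's struct folio):

```c
/* Minimal stand-in for struct folio; only the page count matters here. */
struct folio { unsigned long nr_pages; };

/* nr_to_write counts pages, so charge the whole folio, not 1.  With the
 * old "- 1" accounting a 512-page folio was billed as a single page,
 * letting writeback_sb_inodes() write up to 512x what it asked for. */
static long account_written(long nr_to_write, const struct folio *folio)
{
    return nr_to_write - (long)folio->nr_pages;   /* was: nr_to_write - 1 */
}
```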
Nico Pache ab0829bbbe mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78
Author: Zach O'Keefe <zokeefe@google.com>
Date:   Thu Jan 18 10:19:53 2024 -0800

    mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again

    (struct dirty_throttle_control *)->thresh is an unsigned long, but is
    passed as the u32 divisor argument to div_u64().  On architectures where
    unsigned long is 64 bits, the argument will be implicitly truncated.

    Use div64_u64() instead of div_u64() so that the value used in the "is
    this a safe division" check is the same as the divisor.

    Also, remove redundant cast of the numerator to u64, as that should happen
    implicitly.

    This would be difficult to exploit in the memcg domain, given the
    ratio-based arithmetic domain_dirty_limits() uses, but is much easier in
    the global writeback domain with a BDI_CAP_STRICTLIMIT-backed device,
    using e.g. vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)

    Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
    Fixes: f6789593d5 ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
    Signed-off-by: Zach O'Keefe <zokeefe@google.com>
    Cc: Maxim Patlasov <MPatlasov@parallels.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:29 -06:00
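The truncation described above can be demonstrated in userspace; div_u64_sim() and div64_u64_sim() mimic only the divisor widths of the kernel helpers, nothing more:

```c
#include <stdint.h>

/* div_u64() takes a u32 divisor: a 64-bit thresh is truncated before the
 * divide, so thresh == 1ULL << 32 becomes a divisor of 0. */
static uint64_t div_u64_sim(uint64_t n, uint32_t d)
{
    return n / d;
}

/* div64_u64() keeps the full 64-bit divisor, so the "is this a safe
 * division" check and the actual division see the same value. */
static uint64_t div64_u64_sim(uint64_t n, uint64_t d)
{
    return n / d;
}
```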
Nico Pache e7cf2d0fc3 mm: fix arithmetic for max_prop_frac when setting max_ratio
commit fa151a39a6879144b587f35c0dfcc15e1be9450f
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date:   Tue Dec 19 22:25:08 2023 +0800

    mm: fix arithmetic for max_prop_frac when setting max_ratio

    Since bdi->max_ratio is now parts per million, fix the wrong arithmetic for
    max_prop_frac when setting max_ratio.  Otherwise the miscalculated
    max_prop_frac will affect the incrementing of writeout completion count
    when max_ratio is not 100%.

    Link: https://lkml.kernel.org/r/20231219142508.86265-3-jefflexu@linux.alibaba.com
    Fixes: efc3e6ad53ea ("mm: split off __bdi_set_max_ratio() function")
    Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
    Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Stefan Roesch <shr@devkernel.io>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:23 -06:00
Nico Pache 6f7322d7cd mm: fix arithmetic for bdi min_ratio
commit e0646b7590084a5bf3b056d3ad871d9379d2c25a
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date:   Tue Dec 19 22:25:07 2023 +0800

    mm: fix arithmetic for bdi min_ratio

    Since bdi->min_ratio is now parts per million, fix the wrong arithmetic.
    Otherwise it will fail with -EINVAL when setting a reasonable min_ratio,
    as it tries to set min_ratio to (min_ratio * BDI_RATIO_SCALE) in
    percentage unit, which exceeds 100% anyway.

        # cat /sys/class/bdi/253\:0/min_ratio
        0
        # cat /sys/class/bdi/253\:0/max_ratio
        100
        # echo 1 > /sys/class/bdi/253\:0/min_ratio
        -bash: echo: write error: Invalid argument

    Link: https://lkml.kernel.org/r/20231219142508.86265-2-jefflexu@linux.alibaba.com
    Fixes: 8021fb3232f2 ("mm: split off __bdi_set_min_ratio() function")
    Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
    Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Stefan Roesch <shr@devkernel.io>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:23 -06:00
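The -EINVAL in the transcript above follows from comparing a value already scaled to parts per million against an unscaled percentage limit. The sketch below is simplified from the commit description; BDI_RATIO_SCALE's value and the exact check are illustrative, not the kernel code:

```c
/* 100% == 100 * BDI_RATIO_SCALE parts per million. */
#define BDI_RATIO_SCALE 10000u

enum { SIM_EINVAL = 22 };

/* Buggy: min_ratio is scaled to ppm but still checked against a plain
 * percentage, so even `echo 1` (10000 ppm) exceeds "100" and fails. */
static int set_min_ratio_buggy(unsigned int min_ratio)
{
    min_ratio *= BDI_RATIO_SCALE;         /* percent -> ppm */
    if (min_ratio > 100)                  /* limit still in percent: bug */
        return -SIM_EINVAL;
    return 0;
}

/* Fixed: compare in the same unit the value was scaled to. */
static int set_min_ratio_fixed(unsigned int min_ratio)
{
    min_ratio *= BDI_RATIO_SCALE;         /* percent -> ppm */
    if (min_ratio > 100 * BDI_RATIO_SCALE)
        return -SIM_EINVAL;
    return 0;
}
```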
Chris von Recklinghausen bee415387b mm,jfs: move write_one_page/folio_write_one to jfs
JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 452a8f40728065800b5a5b81f1152e9a16d39656
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Mar 7 15:31:25 2023 +0100

    mm,jfs: move write_one_page/folio_write_one to jfs

    The last remaining user of folio_write_one through the write_one_page
    wrapper is jfs, so move the functionality there and hard code the call to
    metapage_writepage.

    Note that the use of the pagecache by the JFS 'metapage' buffer cache is a
    bit odd, and we could probably do without VM-level dirty tracking at all,
    but that's a change for another time.

    Link: https://lkml.kernel.org/r/20230307143125.27778-4-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Evgeniy Dushistov <dushistov@mail.ru>
    Cc: Gang He <ghe@suse.com>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jan Kara via Ocfs2-devel <ocfs2-devel@oss.oracle.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Joseph Qi <jiangqi903@gmail.com>
    Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:11 -04:00
Aristeu Rozanski 5a6ac04952 fs: convert writepage_t callback to pass a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: mpage_writepage() still exists, so update it to use __mpage_writepage() with folio, 0c493b5cf16e2 was backported already, so update fs/nfs/write changes accordingly; dropped ntfs3 changes as we don't support it; adding fix present on merge 3822a7c40997 for gfs2

commit d585bdbeb79aa13b8a9bbe952d90f5252f7fe909
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Jan 26 20:12:54 2023 +0000

    fs: convert writepage_t callback to pass a folio

    Patch series "Convert writepage_t to use a folio".

    More folioisation.  I split out the mpage work from everything else
    because it completely dominated the patch, but some implementations I just
    converted outright.

    This patch (of 2):

    We always write back an entire folio, but that's currently passed as the
    head page.  Convert all filesystems that use write_cache_pages() to expect
    a folio instead of a page.

    Link: https://lkml.kernel.org/r/20230126201255.1681189-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20230126201255.1681189-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Christoph Hellwig <hch@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:13 -04:00
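The signature change the commit describes can be sketched as follows; struct folio and struct writeback_control here are minimal stand-ins, not the kernel types:

```c
#include <stddef.h>

struct folio { unsigned long index; unsigned long nr_pages; };
struct writeback_control { long nr_to_write; };

/* old: typedef int (*writepage_t)(struct page *, struct writeback_control *,
 *                                 void *);
 * new: the callback receives the folio itself, never just the head page. */
typedef int (*writepage_t)(struct folio *folio,
                           struct writeback_control *wbc, void *data);

/* A toy callback: "writes" the folio and charges all of its pages. */
static int demo_writepage(struct folio *folio,
                          struct writeback_control *wbc, void *data)
{
    (void)data;
    wbc->nr_to_write -= (long)folio->nr_pages;
    return 0;
}
```

Since a whole folio is always written back anyway, passing it directly spares every implementation a page-to-folio conversion.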
Aristeu Rozanski 98ad16b46e mm/fs: convert inode_attach_wb() to take a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 9cfb816b1c6c99f4b3c1d4a0fb096162cd17ec71
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jan 16 19:25:06 2023 +0000

    mm/fs: convert inode_attach_wb() to take a folio

    Patch series "Writeback folio conversions".

    Remove more calls to compound_head() by passing folios around instead of
    pages.

    This patch (of 2):

    The only caller of inode_attach_wb() which doesn't pass NULL already has a
    folio, so convert the whole call-chain to take folios.

    Link: https://lkml.kernel.org/r/20230116192507.2146150-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20230116192507.2146150-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:10 -04:00
Aristeu Rozanski 291939094d page-writeback: convert write_cache_pages() to use filemap_get_folios_tag()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 0fff435f060c8b29cb068d4068cb2df513046865
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Wed Jan 4 13:14:29 2023 -0800

    page-writeback: convert write_cache_pages() to use filemap_get_folios_tag()

    Convert function to use folios throughout.  This is in preparation for the
    removal of find_get_pages_range_tag().  This change removes 8 calls to
    compound_head(), and the function now supports large folios.

    Link: https://lkml.kernel.org/r/20230104211448.4804-5-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:09 -04:00
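The batched, tag-based lookup the conversion introduces can be sketched in userspace; the types and the fake lookup helper below are simplified stand-ins for filemap_get_folios_tag() and friends, not the kernel API:

```c
struct folio { unsigned long index; unsigned long nr_pages; };

#define BATCH_MAX 15

/* Stand-in for filemap_get_folios_tag(): collect folios at or past *start
 * into fbatch, advancing the cursor past each folio's full page span. */
static unsigned fake_get_folios_tag(struct folio *src, unsigned nr_src,
                                    unsigned long *start,
                                    struct folio **fbatch)
{
    unsigned n = 0;
    for (unsigned i = 0; i < nr_src && n < BATCH_MAX; i++) {
        if (src[i].index >= *start) {
            fbatch[n++] = &src[i];
            *start = src[i].index + src[i].nr_pages;
        }
    }
    return n;
}

/* A write_cache_pages()-style loop: each batch entry is a whole folio, so
 * large folios are handled without any compound_head() calls. */
static unsigned long writeback_range(struct folio *src, unsigned nr_src)
{
    struct folio *fbatch[BATCH_MAX];
    unsigned long start = 0, written = 0;
    unsigned n;

    while ((n = fake_get_folios_tag(src, nr_src, &start, fbatch)) != 0)
        for (unsigned i = 0; i < n; i++)
            written += fbatch[i]->nr_pages;
    return written;
}
```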
Aristeu Rozanski 20dd56698e mm: remove zap_page_range and create zap_vma_pages
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: dropped RISCV changes, and due to missing b59c9dc4d9d47b

commit e9adcfecf572fcfaa9f8525904cf49c709974f73
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date:   Tue Jan 3 16:27:32 2023 -0800

    mm: remove zap_page_range and create zap_vma_pages

    zap_page_range was originally designed to unmap pages within an address
    range that could span multiple vmas.  While working on [1], it was
    discovered that all callers of zap_page_range pass a range entirely within
    a single vma.  In addition, the mmu notification call within zap_page
    range does not correctly handle ranges that span multiple vmas.  When
    crossing a vma boundary, a new mmu_notifier_range_init/end call pair with
    the new vma should be made.

    Instead of fixing zap_page_range, do the following:
    - Create a new routine zap_vma_pages() that will remove all pages within
      the passed vma.  Most users of zap_page_range pass the entire vma and
      can use this new routine.
    - For callers of zap_page_range not passing the entire vma, instead call
      zap_page_range_single().
    - Remove zap_page_range.

    [1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
    Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Suggested-by: Peter Xu <peterx@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Peter Xu <peterx@redhat.com>
    Acked-by: Heiko Carstens <hca@linux.ibm.com>    [s390]
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Eric Dumazet <edumazet@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Will Deacon <will@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:03 -04:00
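The shape of the new helper can be sketched as below; the struct and the return value (the real kernel functions return void) are simplified stand-ins for illustration:

```c
struct vm_area_struct { unsigned long vm_start, vm_end; };

/* Stand-in for zap_page_range_single(): reports the size it would unmap
 * within this single VMA. */
static unsigned long zap_page_range_single_sim(struct vm_area_struct *vma,
                                               unsigned long addr,
                                               unsigned long size)
{
    (void)vma; (void)addr;
    return size;
}

/* zap_vma_pages(): remove every page of one VMA.  Because the range never
 * crosses a VMA boundary, a single mmu_notifier_range_init/end pair per
 * VMA is always correct. */
static unsigned long zap_vma_pages_sim(struct vm_area_struct *vma)
{
    return zap_page_range_single_sim(vma, vma->vm_start,
                                     vma->vm_end - vma->vm_start);
}
```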
Aristeu Rozanski 836d6a510f mm: remove generic_writepages
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: keeping generic_writepages() until mpage_writepage() gets removed in the future

commit c2ca7a59a4199059556b57cfdf98fcf46039ca6b
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Dec 29 06:10:31 2022 -1000

    mm: remove generic_writepages

    Now that all external callers are gone, just fold it into do_writepages.

    Link: https://lkml.kernel.org/r/20221229161031.391878-7-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:02 -04:00
Aristeu Rozanski c9156ec75f mm/swap: convert deactivate_page() to folio_deactivate()
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: f3cd4ab0aabf was backported before this change

commit 5a9e34747c9f731bbb6b7fd7521c4fec0d840593
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Wed Dec 21 10:08:48 2022 -0800

    mm/swap: convert deactivate_page() to folio_deactivate()

    Deactivate_page() has already been converted to use folios, this change
    converts it to take in a folio argument instead of calling page_folio().
    It also renames the function folio_deactivate() to be more consistent with
    other folio functions.

    [akpm@linux-foundation.org: fix left-over comments, per Yu Zhao]
    Link: https://lkml.kernel.org/r/20221221180848.20774-5-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: SeongJae Park <sj@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:02 -04:00
Audra Mitchell 41b0dfb7e0 mm: add bdi_set_min_ratio_no_scale() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 2c44af4f2aaa260199f218f11920c406e688693c
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:13 2022 -0800

    mm: add bdi_set_min_ratio_no_scale() function

    This introduces bdi_set_min_ratio_no_scale(). It uses the maximum
    granularity for the ratio. This function is used by the new sysfs knob
    min_ratio_fine.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-19-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:43:00 -04:00
Audra Mitchell 89db0dc68e mm: add bdi_set_max_ratio_no_scale() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 4e230b406eda9bdf7f8a71e2cc3df18a824abcb0
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:10 2022 -0800

    mm: add bdi_set_max_ratio_no_scale() function

    This introduces bdi_set_max_ratio_no_scale(). It uses the maximum
    granularity for the ratio. This function is used by the new sysfs knob
    max_ratio_fine.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-16-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
Audra Mitchell 108692baab mm: add bdi_set_min_bytes() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 803c98050569850be5fd51a2025c67622de887d9
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:07 2022 -0800

    mm: add bdi_set_min_bytes() function

    This introduces the bdi_set_min_bytes() function. The min_bytes function
    does not store the min_bytes value. Instead it converts the min_bytes
    value into the corresponding ratio value.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-13-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
Audra Mitchell 4646f8bbad mm: split off __bdi_set_min_ratio() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 8021fb3232f265b81c7e4e7aba15bc3a04ff1fd3
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:06 2022 -0800

    mm: split off __bdi_set_min_ratio() function

    This splits off the __bdi_set_min_ratio() function from the
    bdi_set_min_ratio() function. The __bdi_set_min_ratio() function will
    also be called from the bdi_set_min_bytes() function, which will be
    introduced in the next patch.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-12-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
Audra Mitchell 06524ee44f mm: add bdi_get_min_bytes() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 712c00d66a342a3ed375df41c3df7d3d2abad2c0
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:05 2022 -0800

    mm: add bdi_get_min_bytes() function

    This adds a function to return the specified value for min_bytes. It
    converts the stored min_ratio of the bdi to the corresponding bytes
    value. This is an approximation as it is based on the value that is
    returned by global_dirty_limits(), which can change. The returned
    value can be different than the value when the min_bytes value was set.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-11-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
Audra Mitchell 34875d9b81 mm: add bdi_set_max_bytes() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 1bf27e98d26d1e62166a456ef17460be085cbe0b
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:02 2022 -0800

    mm: add bdi_set_max_bytes() function

    This introduces the bdi_set_max_bytes() function. The max_bytes function
    does not store the max_bytes value. Instead it converts the max_bytes
    value into the corresponding ratio value.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-8-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
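The bytes-to-ratio translation described above is plain integer arithmetic; bytes_to_ratio() and its global-threshold parameter are illustrative names, not the kernel's API:

```c
#include <stdint.h>

/* Ratios in this series are parts per million. */
#define RATIO_PARTS 1000000ULL

/* The byte value is never stored: it is converted to the ppm share of the
 * current global dirty threshold that it represents.  Because that
 * threshold changes over time, the conversion is an approximation. */
static unsigned int bytes_to_ratio(uint64_t bytes, uint64_t global_dirty_bytes)
{
    return (unsigned int)(bytes * RATIO_PARTS / global_dirty_bytes);
}
```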
Audra Mitchell 19ea20bdf1 mm: split off __bdi_set_max_ratio() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit efc3e6ad53ea14225b434fddca261c9a1c56c707
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:01 2022 -0800

    mm: split off __bdi_set_max_ratio() function

    This splits off __bdi_set_max_ratio() from bdi_set_max_ratio().
    __bdi_set_max_ratio() will also be called from bdi_set_max_bytes(),
    which will be introduced in the next patch.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-7-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
Audra Mitchell 1b284fc818 mm: add bdi_get_max_bytes() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 00df7d51263b46ed93f7572e2d09579746f7b1eb
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:52:00 2022 -0800

    mm: add bdi_get_max_bytes() function

    This adds a function to return the specified value for max_bytes. It
    converts the stored max_ratio of the bdi to the corresponding bytes
    value. It introduces the bdi_get_bytes helper function to do the
    conversion. This is an approximation as it is based on the value that is
    returned by global_dirty_limits(), which can change. The helper function
    will also be used by the min_bytes bdi knob.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-6-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
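The reverse conversion done by the bdi_get_bytes() helper the commit mentions can be sketched the same way; the name and rounding below are illustrative:

```c
#include <stdint.h>

/* Convert a stored ppm ratio back to bytes against the current global
 * dirty threshold.  The result can differ from the byte value originally
 * written, since the threshold may have changed in between. */
static uint64_t ratio_to_bytes(unsigned int ratio_ppm,
                               uint64_t global_dirty_bytes)
{
    return global_dirty_bytes * ratio_ppm / 1000000ULL;
}
```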
Audra Mitchell 54d5ecadea mm: use part per 1000000 for bdi ratios
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit ae82291e9ca47c3d6da6b77a00f427754aca413e
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:51:59 2022 -0800

    mm: use part per 1000000 for bdi ratios

    To get finer granularity for ratio calculations use parts per million
    instead of percentages. This is especially important if we want to
    automatically convert byte values to ratios. Otherwise the values that
    are actually used can be quite different. This is also important for
    machines with more main memory (1% of 256GB is already 2.5GB).

    Link: https://lkml.kernel.org/r/20221119005215.3052436-5-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Chris Mason <clm@meta.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:59 -04:00
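The granularity argument above is simple arithmetic: the smallest nonzero limit a knob can express is total memory divided by the number of steps. A quick userspace check of the 256GB example:

```c
#include <stdint.h>

/* Smallest nonzero limit a ratio knob with `parts` steps can express. */
static uint64_t min_step_bytes(uint64_t total_bytes, uint64_t parts)
{
    return total_bytes / parts;
}
```

Per-percent steps on 256GB are about 2.5GB each; per-ppm steps are about 269KB.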
Audra Mitchell 0639c7ce8b mm: add bdi_set_strict_limit() function
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 8e9d5ead865a1a7af74a444d2f00f1ef4539bfba
Author: Stefan Roesch <shr@devkernel.io>
Date:   Fri Nov 18 16:51:56 2022 -0800

    mm: add bdi_set_strict_limit() function

    Patch series "mm/block: add bdi sysfs knobs", v4.

    At Meta, network block devices (nbd) are used to implement remote block
    storage.  In testing and during production it has been observed that
    these network block devices can consume a huge portion of the dirty
    writeback cache and that writeback can take a considerable time.

    To be able to give stricter limits, I'm proposing the following changes:

    1) introduce strictlimit knob

      Currently the max_ratio knob exists to limit the dirty_memory. However
      this knob only applies once (dirty_ratio + dirty_background_ratio) / 2
      has been reached.
      With the BDI_CAP_STRICTLIMIT flag, the max_ratio can be applied without
      reaching that limit. This change exposes that knob.

      This knob can also be useful for NFS, fuse filesystems and USB devices.

    2) Use part of 1000000 internal calculation

      The max_ratio is based on percentage. With the current machine sizes
      percentage values can be very high (1% of a 256GB main memory is already
      2.5GB). This change uses part of 1000000 instead of percentages for the
      internal calculations.

    3) Introduce two new sysfs knobs: min_bytes and max_bytes.

      Currently all calculations are based on ratio, but for a user it is often
      more convenient to specify a limit in bytes. The new knobs will not
      store bytes values, instead they will translate the byte value to a
      corresponding ratio. As the internal values are now part of 1000000,
      the ratio is closer to the specified value. However the value should
      rather be seen as an approximation, as it can fluctuate over time.

    4) Introduce two new sysfs knobs: min_ratio_fine and max_ratio_fine.

      The granularity for the existing sysfs bdi knobs min_ratio and max_ratio
      is based on percentage values. The new sysfs bdi knobs min_ratio_fine
      and max_ratio_fine allow specifying the ratio as part of 1 million.

    This patch (of 20):

    This adds the bdi_set_strict_limit function to be able to set/unset the
    BDI_CAP_STRICTLIMIT flag.

    Link: https://lkml.kernel.org/r/20221119005215.3052436-1-shr@devkernel.io
    Link: https://lkml.kernel.org/r/20221119005215.3052436-2-shr@devkernel.io
    Signed-off-by: Stefan Roesch <shr@devkernel.io>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Chris Mason <clm@meta.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:58 -04:00
Ming Lei 953c1697f5 blk-wbt: Fix detection of dirty-throttled tasks
JIRA: https://issues.redhat.com/browse/RHEL-25988

commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5
Author: Jan Kara <jack@suse.cz>
Date:   Tue Jan 23 18:58:26 2024 +0100

    blk-wbt: Fix detection of dirty-throttled tasks

    The detection of dirty-throttled tasks in blk-wbt has been subtly broken
    since its beginning in 2016. Namely if we are doing cgroup writeback and
    the throttled task is not in the root cgroup, balance_dirty_pages() will
    set dirty_sleep for the non-root bdi_writeback structure. However
    blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback
    structure. Thus detection of recently throttled tasks is not working in
    this case (we noticed this when we switched to cgroup v2 and suddenly
    writeback was slow).

    Since blk-wbt has no easy way to get to proper bdi_writeback and
    furthermore its intention has always been to work on the whole device
    rather than on individual cgroups, just move the dirty_sleep timestamp
    from bdi_writeback to backing_dev_info. That fixes the checking for
    recently throttled task and saves memory for everybody as a bonus.

    CC: stable@vger.kernel.org
    Fixes: b57d74aff9 ("writeback: track if we're sleeping on progress in balance_dirty_pages()")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz
    [axboe: fixup indentation errors]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-07 13:20:01 +08:00
Chris von Recklinghausen 6d19e00e58 mm: export balance_dirty_pages_ratelimited_flags()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 611df5d6616d80a22906c352ccd80c395982fbd9
Author: Stefan Roesch <shr@fb.com>
Date:   Mon Sep 12 12:27:41 2022 -0700

    mm: export balance_dirty_pages_ratelimited_flags()

    Export the function balance_dirty_pages_ratelimited_flags(). It is now
    also called from btrfs.

    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Stefan Roesch <shr@fb.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:11 -04:00
Chris von Recklinghausen 83b2b27a26 mm: Add balance_dirty_pages_ratelimited_flags() function
Bugzilla: https://bugzilla.redhat.com/2160210

commit fe6c9c6e3e3e332b998393d214fba9d09ab0acb0
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jun 23 10:51:46 2022 -0700

    mm: Add balance_dirty_pages_ratelimited_flags() function

    This adds the helper function balance_dirty_pages_ratelimited_flags().
    It adds the parameter flags to balance_dirty_pages_ratelimited().
    The flags parameter is passed to balance_dirty_pages(). For async
    buffered writes the flag value will be BDP_ASYNC.

    If balance_dirty_pages() gets called for async buffered write, we don't
    want to wait. Instead we need to indicate to the caller that throttling
    is needed so that it can stop writing and offload the rest of the write
    to a context that can block.

    The new helper function is also used by balance_dirty_pages_ratelimited().

    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Stefan Roesch <shr@fb.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220623175157.1715274-4-shr@fb.com
    [axboe: fix kerneltest bot 'ret' issue]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:27 -04:00
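The no-wait behaviour described in the commit message above can be sketched in miniature (hypothetical Python, not the kernel implementation; the flag value and the -EAGAIN return convention are illustrative assumptions):

```python
# Hypothetical model of the flags-based throttling described above.
# Not kernel code: the flag value and error convention are assumptions.

BDP_ASYNC = 0x1   # assumed flag value, for illustration
EAGAIN = 11

def balance_dirty_pages(over_dirty_limit: bool, flags: int) -> int:
    """Return 0, or -EAGAIN when an async caller must offload the write."""
    if not over_dirty_limit:
        return 0
    if flags & BDP_ASYNC:
        # Async buffered write: do not sleep; tell the caller to stop
        # writing and retry from a context that can block.
        return -EAGAIN
    # A synchronous caller would sleep here until back under the limit.
    return 0

assert balance_dirty_pages(True, BDP_ASYNC) == -EAGAIN
assert balance_dirty_pages(False, 0) == 0
```

The key design point is that the async path reports "throttling needed" instead of blocking, so an io_uring-style submitter can punt the remainder of the write to a worker context.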
Chris von Recklinghausen 74df9e0609 mm: Move updates of dirty_exceeded into one place
Bugzilla: https://bugzilla.redhat.com/2160210

commit e92eebbb09218e128e559cf12b65317721309324
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jun 23 10:51:45 2022 -0700

    mm: Move updates of dirty_exceeded into one place

    Transition of wb->dirty_exceeded from 0 to 1 happens before we go to
    sleep in balance_dirty_pages() while transition from 1 to 0 happens when
    exiting from balance_dirty_pages(), possibly based on old values. This
    does not make a lot of sense since wb->dirty_exceeded should simply
    reflect whether wb is over dirty limit and so we should ratelimit
    entering to balance_dirty_pages() less. Move the two updates together.

    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Stefan Roesch <shr@fb.com>
    Link: https://lore.kernel.org/r/20220623175157.1715274-3-shr@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:27 -04:00
Chris von Recklinghausen 7c7763c786 mm: Move starting of background writeback into the main balancing loop
Bugzilla: https://bugzilla.redhat.com/2160210

commit ea6813be07dcdc072aa9ad18099115a74cecb5e1
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jun 23 10:51:44 2022 -0700

    mm: Move starting of background writeback into the main balancing loop

    We start background writeback if we are over background threshold after
    exiting the main loop in balance_dirty_pages(). This may result in
    basing the decision on already stale values (we may have slept for
    a significant amount of time) and it is also inconvenient for refactoring
    needed for async dirty throttling. Move the check into the main waiting
    loop.

    Signed-off-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Stefan Roesch <shr@fb.com>
    Link: https://lore.kernel.org/r/20220623175157.1715274-2-shr@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:27 -04:00
Chris von Recklinghausen 0b6f15f9ec filemap: Update the folio_mark_dirty documentation
Bugzilla: https://bugzilla.redhat.com/2160210

commit 2ca456c24801e439256c0ec7dbe21eba7b01544e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Apr 28 14:21:02 2022 -0400

    filemap: Update the folio_mark_dirty documentation

    The previous comment was not terribly helpful.  Be a bit more explicit
    about the necessary locking environment.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
Chris von Recklinghausen 74be884ce9 mm: rework calculation of bdi_min_ratio in bdi_set_min_ratio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 21f0dd88f23dc9dc46b781f8ec9acf975dca4e6e
Author: Chen Wandun <chenwandun@huawei.com>
Date:   Thu Apr 28 23:15:57 2022 -0700

    mm: rework calculation of bdi_min_ratio in bdi_set_min_ratio

    In function bdi_set_min_ratio, min_ratio is unsigned int, so it will
    result in underflow when setting min_ratio below bdi->min_ratio, which
    is confusing. Rework it, no functional change.

    Link: https://lkml.kernel.org/r/20220422095159.2858305-1-chenwandun@huawei.com
    Signed-off-by: Chen Wandun <chenwandun@huawei.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:52 -04:00
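The underflow the commit above fixes can be demonstrated outside the kernel (hypothetical Python sketch, not the kernel source; it only models C `unsigned int` wraparound):

```python
# Hypothetical sketch, not the kernel source: models the unsigned
# subtraction that made the old bdi_set_min_ratio() confusing. In C,
# 'unsigned int' arithmetic wraps modulo 2**32, so lowering min_ratio
# below bdi->min_ratio produced a huge bogus delta instead of a
# negative one.

U32 = 1 << 32

def unsigned_sub(a: int, b: int) -> int:
    """Mimic C 'unsigned int' subtraction (wraps instead of going negative)."""
    return (a - b) % U32

old_min_ratio, new_min_ratio = 10, 5

# The naive unsigned delta wraps around to an enormous value:
assert unsigned_sub(new_min_ratio, old_min_ratio) == U32 - 5

# The rework compares first and subtracts in the safe direction:
delta = (new_min_ratio - old_min_ratio if new_min_ratio >= old_min_ratio
         else old_min_ratio - new_min_ratio)
assert delta == 5
```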
Chris von Recklinghausen 50a9ae71a2 mm: fix unused variable kernel warning when SYSCTL=n
Bugzilla: https://bugzilla.redhat.com/2160210

commit 3c6a4cba3138d1aeeb8fd917178c6578b9b8ae29
Author: Luis Chamberlain <mcgrof@kernel.org>
Date:   Fri Apr 15 15:08:02 2022 -0700

    mm: fix unused variable kernel warning when SYSCTL=n

    When CONFIG_SYSCTL=n the variable dirty_bytes_min which is just used
    as a minimum to a proc handler is not used. So just move this under
    the ifdef for CONFIG_SYSCTL.

    Fixes: aa779e510219 ("mm: move page-writeback sysctls to their own file")
    Reported-by: kernel test robot <lkp@intel.com>
    Acked-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:51 -04:00
Chris von Recklinghausen 2805839ca0 mm: move page-writeback sysctls to their own file
Bugzilla: https://bugzilla.redhat.com/2160210

commit aa779e5102195e1d9ade95dcbc0bfbd8f916eb59
Author: zhanglianjie <zhanglianjie@uniontech.com>
Date:   Thu Feb 17 18:51:51 2022 -0800

    mm: move page-writeback sysctls to their own file

    kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
    dishes, this makes it very difficult to maintain.

    To help with this maintenance let's start by moving sysctls to places
    where they actually belong.  The proc sysctl maintainers do not want to
    know what sysctl knobs you wish to add for your own piece of code, we just
    care about the core logic.

    So move the page-writeback sysctls to its own file.

    [akpm@linux-foundation.org: coding-style cleanups]

    [akpm@linux-foundation.org: fix CONFIG_SYSCTL=n warnings]
    Link: https://lkml.kernel.org/r/20220129012955.26594-1-zhanglianjie@uniontech.com
    Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Iurii Zaikin <yzaikin@google.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:50 -04:00
Nico Pache 59af49451c writeback: avoid use-after-free after removing device
commit f87904c075515f3e1d8f4a7115869d3b914674fd
Author: Khazhismel Kumykov <khazhy@chromium.org>
Date:   Mon Aug 1 08:50:34 2022 -0700

    writeback: avoid use-after-free after removing device

    When a disk is removed, bdi_unregister gets called to stop further
    writeback and wait for associated delayed work to complete.  However,
    wb_inode_writeback_end() may schedule bandwidth estimation dwork after
    this has completed, which can result in the timer attempting to access the
    just freed bdi_writeback.

    Fix this by checking if the bdi_writeback is alive, similar to when
    scheduling writeback work.

    Since this requires wb->work_lock, and wb_inode_writeback_end() may get
    called from interrupt, switch wb->work_lock to an irqsafe lock.

    Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
    Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
    Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Michael Stapelberg <stapelberg+linux@google.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
2022-11-08 10:11:41 -07:00
Chris von Recklinghausen d024eb830c mm/writeback: minor clean up for highmem_dirtyable_memory
Bugzilla: https://bugzilla.redhat.com/2120352

commit 854d8e36168d79ad09a831d60bd4d835ad33e188
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Tue Mar 22 14:39:31 2022 -0700

    mm/writeback: minor clean up for highmem_dirtyable_memory

    Since commit a804552b9a ("mm/page-writeback.c: fix
    dirty_balance_reserve subtraction from dirtyable memory"), local
    variable x can not be negative.  And it can not overflow when it is the
    total number of dirtyable highmem pages.  Thus remove the unneeded
    comment and overflow check.

    Link: https://lkml.kernel.org/r/20220224115416.46089-1-linmiaohe@huawei.com
    Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:49 -04:00
Chris von Recklinghausen d717f9da28 fs: Remove aops ->set_page_dirty
Bugzilla: https://bugzilla.redhat.com/2120352

commit 3a3bae50af5d73fab5da20484029de77ca67bb2e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:22:15 2022 +0000

    fs: Remove aops ->set_page_dirty

    With all implementations converted to ->dirty_folio, we can stop calling
    this fallback method and remove it entirely.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen 77da4a630d fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
Bugzilla: https://bugzilla.redhat.com/2120352

commit 46de8b979492e1377947700ecb1e3169088668b2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:22:13 2022 +0000

    fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio

    This is a mechanical change.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen 63c79e60d5 fs: Convert __set_page_dirty_buffers to block_dirty_folio
Conflicts: Drop change to fs/ntfs3/inode.c - file not in CS9, even if it
	was it wouldn't be a supported config.

Bugzilla: https://bugzilla.redhat.com/2120352

commit e621900ad28b748e058b81d6078a5d5eb37b3973
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:22:12 2022 +0000

    fs: Convert __set_page_dirty_buffers to block_dirty_folio

    Convert all callers; mostly this is just changing the aops to point
    at it, but a few implementations need a little more work.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen 892fa2d7d9 mm/vmscan: centralise timeout values for reclaim_throttle
Conflicts: The presence of
	d1d8a3b4d06d ("mm: Turn isolate_lru_page() into folio_isolate_lru()")
	causes a merge conflict due to differing context. Just remove the
	timeout argument to reclaim_throttle.

Bugzilla: https://bugzilla.redhat.com/2120352

commit c3f4a9a2b082c5392fbff17c6d8551154add5fdb
Author: Mel Gorman <mgorman@techsingularity.net>
Date:   Fri Nov 5 13:42:42 2021 -0700

    mm/vmscan: centralise timeout values for reclaim_throttle

    Neil Brown raised concerns about callers of reclaim_throttle specifying
    a timeout value.  The original timeout values to congestion_wait() were
    probably pulled out of thin air or copy&pasted from somewhere else.
    This patch centralises the timeout values and selects a timeout based on
    the reason for reclaim throttling.  These figures are also pulled out of
    the same thin air but better values may be derived.

    Running a workload that is throttling for inappropriate periods and
    tracing mm_vmscan_throttled can be used to pick a more appropriate
    value.  Excessive throttling would pick a lower timeout whereas
    excessive CPU usage in reclaim context would select a larger timeout.
    Ideally a large value would always be used and the wakeups would occur
    before a timeout but that requires careful testing.

    Link: https://lkml.kernel.org/r/20211022144651.19914-7-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: "Darrick J . Wong" <djwong@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:29 -04:00
Chris von Recklinghausen b8e20abb98 mm/writeback: throttle based on page writeback instead of congestion
Bugzilla: https://bugzilla.redhat.com/2120352

commit 8d58802fc9de1b416601d90da794a3feaad1898d
Author: Mel Gorman <mgorman@techsingularity.net>
Date:   Fri Nov 5 13:42:35 2021 -0700

    mm/writeback: throttle based on page writeback instead of congestion

    do_writepages throttles on congestion if the writepages() fails due to a
    lack of memory but congestion_wait() is partially broken as the
    congestion state is not updated for all BDIs.

    This patch stalls waiting for a number of pages to complete writeback
    that are located on the local node.  The main weakness is that there is no
    correlation between the location of the inode's pages and locality but
    that is still better than congestion_wait.

    Link: https://lkml.kernel.org/r/20211022144651.19914-5-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: "Darrick J . Wong" <djwong@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: NeilBrown <neilb@suse.de>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:29 -04:00
Aristeu Rozanski e1a409a714 mm: warn on deleting redirtied only if accounted
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 566d3362885aab04d6b0f885f12db3176ca3a032
Author: Hugh Dickins <hughd@google.com>
Date:   Thu Mar 24 18:13:59 2022 -0700

    mm: warn on deleting redirtied only if accounted

    filemap_unaccount_folio() has a WARN_ON_ONCE(folio_test_dirty(folio)).  It
    is good to warn of late dirtying on a persistent filesystem, but late
    dirtying on tmpfs can only lose data which is expected to be thrown away;
    and it's a pity if that warning comes ONCE on tmpfs, then hides others
    which really matter.  Make it conditional on mapping_can_writeback().

    Cleanup: then folio_account_cleaned() no longer needs to check that for
    itself, and so no longer needs to know the mapping.

    Link: https://lkml.kernel.org/r/b5a1106c-7226-a5c6-ad41-ad4832cae1f@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Jan Kara <jack@suse.de>
    Cc: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:21 -04:00
Aristeu Rozanski ffba7441fd fs: Add aops->dirty_folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 6f31a5a261dbbe7bf7f585dfe81f8acd4b25ec3b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:22:00 2022 +0000

    fs: Add aops->dirty_folio

    This replaces ->set_page_dirty().  It returns a bool instead of an int
    and takes the address_space as a parameter instead of expecting the
    implementations to retrieve the address_space from the page.  This is
    particularly important for filesystems which use FS_OPS for swap.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:14 -04:00
Aristeu Rozanski b08f1848d6 mm/writeback: Improve __folio_mark_dirty() comment
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit a229a4f00d1eab3f665b92dc9f8dbceca9b8f49c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Dec 20 06:44:51 2020 -0500

    mm/writeback: Improve __folio_mark_dirty() comment

    Add some notes about how this function needs to be called.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:07 -04:00
Aristeu Rozanski 2d7ee21c2e folio: Add a function to get the host inode for a folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok

commit 452c472e26348df1e7052544130aa98eebbd2331
Author: David Howells <dhowells@redhat.com>
Date:   Thu Aug 12 22:09:57 2021 +0100

    folio: Add a function to get the host inode for a folio

    Add a convenience function, folio_inode() that will get the host inode from
    a folio's mapping.

    Changes:
     ver #3:
      - Fix mistake in function description[2].
     ver #2:
      - Fix contradiction between doc and implementation by disallowing use
        with swap caches[1].

    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Jeff Layton <jlayton@kernel.org>
    Tested-by: Dominique Martinet <asmadeus@codewreck.org>
    Tested-by: kafs-testing@auristor.com
    Link: https://lore.kernel.org/r/YST8OcVNy02Rivbm@casper.infradead.org/ [1]
    Link: https://lore.kernel.org/r/YYKLkBwQdtn4ja+i@casper.infradead.org/ [2]
    Link: https://lore.kernel.org/r/162880453171.3369675.3704943108660112470.stgit@warthog.procyon.org.uk/ # rfc
    Link: https://lore.kernel.org/r/162981151155.1901565.7010079316994382707.stgit@warthog.procyon.org.uk/
    Link: https://lore.kernel.org/r/163005744370.2472992.18324470937328925723.stgit@warthog.procyon.org.uk/ # v2
    Link: https://lore.kernel.org/r/163584184628.4023316.9386282630968981869.stgit@warthog.procyon.org.uk/ # v3
    Link: https://lore.kernel.org/r/163649325519.309189.15072332908703129455.stgit@warthog.procyon.org.uk/ # v4
    Link: https://lore.kernel.org/r/163657850401.834781.1031963517399283294.stgit@warthog.procyon.org.uk/ # v5

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-04-07 09:58:32 -04:00
Aristeu Rozanski 89555aded3 mm/writeback: Add folio_write_one
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok

commit 121703c1c817b3c77f61002466d0bfca7e39f25d
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Mar 9 13:48:03 2021 -0500

    mm/writeback: Add folio_write_one

    Transform write_one_page() into folio_write_one() and add a compatibility
    wrapper.  Also move the declaration to pagemap.h as this is page cache
    functionality that doesn't need to be used by the rest of the kernel.

    Saves 58 bytes of kernel text.  While folio_write_one() is 101 bytes
    smaller than write_one_page(), the inlined call to page_folio() expands
    each caller.  There are fewer than ten callers so it doesn't seem worth
    putting a wrapper in the core.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: David Howells <dhowells@redhat.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-04-07 09:58:31 -04:00
Aristeu Rozanski dba43082bb mm/writeback: Add folio_redirty_for_writepage()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok

commit cd78ab11a8810dd297f4751d17cc53e3dce36024
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun May 2 23:22:52 2021 -0400

    mm/writeback: Add folio_redirty_for_writepage()

    Reimplement redirty_page_for_writepage() as a wrapper around
    folio_redirty_for_writepage().  Account the number of pages in the
    folio, add kernel-doc and move the prototype to writeback.h.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: David Howells <dhowells@redhat.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-04-07 09:58:30 -04:00
Aristeu Rozanski 5e31a2815b mm/writeback: Add folio_account_redirty()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok

commit 25ff8b15537dfa0e1a62d55cfcc48f3c8bd8a76c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon May 3 10:06:55 2021 -0400

    mm/writeback: Add folio_account_redirty()

    Account the number of pages in the folio that we're redirtying.
    Turn account_page_dirty() into a wrapper around it.  Also turn
    the comment on folio_account_redirty() into kernel-doc and
    edit it slightly so it makes sense to its potential callers.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: David Howells <dhowells@redhat.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-04-07 09:58:30 -04:00