JIRA: https://issues.redhat.com/browse/RHEL-27745
This patch is a backport of the following upstream commit:
commit ab428b4c459e62df7dab3b1b783ea03ea06ca895
Author: Jianguo Bao <roidinev@gmail.com>
Date: Sun Sep 17 23:04:01 2023 +0800
mm/writeback: update filemap_dirty_folio() comment
Change to use new address space operation dirty_folio().
Link: https://lkml.kernel.org/r/20230917-trycontrib1-v1-1-db22630b8839@gmail.com
Fixes: 6f31a5a261db ("fs: Add aops->dirty_folio")
Signed-off-by: Jianguo Bau <roidinev@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62760
Conflicts: context errors in pagemap.h
commit 762321dab9a72760bf9aec48362f932717c9424d
Author: Christoph Hellwig <hch@lst.de>
Date: Wed Oct 25 16:10:17 2023 +0200
filemap: add a per-mapping stable writes flag
folio_wait_stable waits for writeback to finish before modifying the
contents of a folio again, e.g. to support check summing of the data
in the block integrity code.
Currently this behavior is controlled by the SB_I_STABLE_WRITES flag
on the super_block, which means it is uniform for the entire file system.
This is wrong for the block device pseudofs which is shared by all
block devices, or file systems that can use multiple devices like XFS
witht the RT subvolume or btrfs (although btrfs currently reimplements
folio_wait_stable anyway).
Add a per-address_space AS_STABLE_WRITES flag to control the behavior
in a more fine grained way. The existing SB_I_STABLE_WRITES is kept
to initialize AS_STABLE_WRITES to the existing default which covers
most cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20231025141020.192413-2-hch@lst.de
Tested-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Bill O'Donnell <bodonnel@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27743
This patch is a backport of the following upstream commit:
commit ed2da9246f324ae88a2dcae629fc2008632ff151
Author: Christoph Hellwig <hch@lst.de>
Date: Wed Jun 28 17:31:44 2023 +0200
mm: remove folio_account_redirty
Fold folio_account_redirty into folio_redirty_for_writepage now
that all other users except for the also unused account_page_redirty
wrapper are gone.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27742
This patch is a backport of the following upstream commit:
commit 6c77b607ee26472fb945aa41734281c39d06d68f
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date: Wed Jun 14 22:36:12 2023 +0800
mm: kill lock|unlock_page_memcg()
Since commit c7c3dec1c9db ("mm: rmap: remove lock_page_memcg()"),
no more user, kill lock_page_memcg() and unlock_page_memcg().
Link: https://lkml.kernel.org/r/20230614143612.62575-1-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rafael Aquini <raquini@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-50004
commit 385d838df280eba6c8680f9777bfa0d0bfe7e8b2
Author: Jan Kara <jack@suse.cz>
Date: Fri Jun 21 16:42:38 2024 +0200
mm: avoid overflows in dirty throttling logic
The dirty throttling logic is interspersed with assumptions that dirty
limits in PAGE_SIZE units fit into 32-bit (so that various multiplications
fit into 64-bits). If limits end up being larger, we will hit overflows,
possible divisions by 0 etc. Fix these problems by never allowing so
large dirty limits as they have dubious practical value anyway. For
dirty_bytes / dirty_background_bytes interfaces we can just refuse to set
so large limits. For dirty_ratio / dirty_background_ratio it isn't so
simple as the dirty limit is computed from the amount of available memory
which can change due to memory hotplug etc. So when converting dirty
limits from ratios to numbers of pages, we just don't allow the result to
exceed UINT_MAX.
This is root-only triggerable problem which occurs when the operator
sets dirty limits to >16 TB.
Link: https://lkml.kernel.org/r/20240621144246.11148-2-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-50004
commit 30139c702048f1097342a31302cbd3d478f50c63
Author: Jan Kara <jack@suse.cz>
Date: Fri Jun 21 16:42:37 2024 +0200
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
commit 8344a3d44be3d18671e18c4ba23bb03dd21e14ad
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Wed Jun 28 19:55:48 2023 +0100
writeback: account the number of pages written back
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78
Author: Zach O'Keefe <zokeefe@google.com>
Date: Thu Jan 18 10:19:53 2024 -0800
mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again
(struct dirty_throttle_control *)->thresh is an unsigned long, but is
passed as the u32 divisor argument to div_u64(). On architectures where
unsigned long is 64 bytes, the argument will be implicitly truncated.
Use div64_u64() instead of div_u64() so that the value used in the "is
this a safe division" check is the same as the divisor.
Also, remove redundant cast of the numerator to u64, as that should happen
implicitly.
This would be difficult to exploit in memcg domain, given the ratio-based
arithmetic domain_drity_limits() uses, but is much easier in global
writeback domain with a BDI_CAP_STRICTLIMIT-backing device, using e.g.
vm.dirty_bytes=(1<<32)*PAGE_SIZE so that dtc->thresh == (1<<32)
Link: https://lkml.kernel.org/r/20240118181954.1415197-1-zokeefe@google.com
Fixes: f6789593d5 ("mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()")
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Maxim Patlasov <MPatlasov@parallels.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
commit fa151a39a6879144b587f35c0dfcc15e1be9450f
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date: Tue Dec 19 22:25:08 2023 +0800
mm: fix arithmetic for max_prop_frac when setting max_ratio
Since now bdi->max_ratio is part per million, fix the wrong arithmetic for
max_prop_frac when setting max_ratio. Otherwise the miscalculated
max_prop_frac will affect the incrementing of writeout completion count
when max_ratio is not 100%.
Link: https://lkml.kernel.org/r/20231219142508.86265-3-jefflexu@linux.alibaba.com
Fixes: efc3e6ad53ea ("mm: split off __bdi_set_max_ratio() function")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
commit e0646b7590084a5bf3b056d3ad871d9379d2c25a
Author: Jingbo Xu <jefflexu@linux.alibaba.com>
Date: Tue Dec 19 22:25:07 2023 +0800
mm: fix arithmetic for bdi min_ratio
Since now bdi->min_ratio is part per million, fix the wrong arithmetic.
Otherwise it will fail with -EINVAL when setting a reasonable min_ratio,
as it tries to set min_ratio to (min_ratio * BDI_RATIO_SCALE) in
percentage unit, which exceeds 100% anyway.
# cat /sys/class/bdi/253\:0/min_ratio
0
# cat /sys/class/bdi/253\:0/max_ratio
100
# echo 1 > /sys/class/bdi/253\:0/min_ratio
-bash: echo: write error: Invalid argument
Link: https://lkml.kernel.org/r/20231219142508.86265-2-jefflexu@linux.alibaba.com
Fixes: 8021fb3232f2 ("mm: split off __bdi_set_min_ratio() function")
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Reported-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Stefan Roesch <shr@devkernel.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27741
commit 452a8f40728065800b5a5b81f1152e9a16d39656
Author: Christoph Hellwig <hch@lst.de>
Date: Tue Mar 7 15:31:25 2023 +0100
mm,jfs: move write_one_page/folio_write_one to jfs
The last remaining user of folio_write_one through the write_one_page
wrapper is jfs, so move the functionality there and hard code the call to
metapage_writepage.
Note that the use of the pagecache by the JFS 'metapage' buffer cache is a
bit odd, and we could probably do without VM-level dirty tracking at all,
but that's a change for another time.
Link: https://lkml.kernel.org/r/20230307143125.27778-4-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Gang He <ghe@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jan Kara via Ocfs2-devel <ocfs2-devel@oss.oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: mpage_writepage() still exists, so update it to use __mpage_writepage() with folio, 0c493b5cf16e2 was backported already, so update fs/nfs/write changes accordingly; dropped ntfs3 changes as we don't support it; adding fix present on merge 3822a7c40997 for gfs2
commit d585bdbeb79aa13b8a9bbe952d90f5252f7fe909
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Thu Jan 26 20:12:54 2023 +0000
fs: convert writepage_t callback to pass a folio
Patch series "Convert writepage_t to use a folio".
More folioisation. I split out the mpage work from everything else
because it completely dominated the patch, but some implementations I just
converted outright.
This patch (of 2):
We always write back an entire folio, but that's currently passed as the
head page. Convert all filesystems that use write_cache_pages() to expect
a folio instead of a page.
Link: https://lkml.kernel.org/r/20230126201255.1681189-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230126201255.1681189-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
commit 9cfb816b1c6c99f4b3c1d4a0fb096162cd17ec71
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Mon Jan 16 19:25:06 2023 +0000
mm/fs: convert inode_attach_wb() to take a folio
Patch series "Writeback folio conversions".
Remove more calls to compound_head() by passing folios around instead of
pages.
This patch (of 2):
The only caller of inode_attach_wb() which doesn't pass NULL already has a
folio, so convert the whole call-chain to take folios.
Link: https://lkml.kernel.org/r/20230116192507.2146150-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20230116192507.2146150-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
commit 0fff435f060c8b29cb068d4068cb2df513046865
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date: Wed Jan 4 13:14:29 2023 -0800
page-writeback: convert write_cache_pages() to use filemap_get_folios_tag()
Convert function to use folios throughout. This is in preparation for the
removal of find_get_pages_range_tag(). This change removes 8 calls to
compound_head(), and the function now supports large folios.
Link: https://lkml.kernel.org/r/20230104211448.4804-5-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcow (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: dropped RISCV changes, and due missing b59c9dc4d9d47b
commit e9adcfecf572fcfaa9f8525904cf49c709974f73
Author: Mike Kravetz <mike.kravetz@oracle.com>
Date: Tue Jan 3 16:27:32 2023 -0800
mm: remove zap_page_range and create zap_vma_pages
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas. While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma. In addition, the mmu notification call within zap_page
range does not correctly handle ranges that span multiple vmas. When
crossing a vma boundary, a new mmu_notifier_range_init/end call pair with
the new vma should be made.
Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within
the passed vma. Most users of zap_page_range pass the entire vma and
can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call
zap_page_range_single().
- Remove zap_page_range.
[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com> [s390]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: keeping generic_writepages() until mpage_writepage() gets removed in the future
commit c2ca7a59a4199059556b57cfdf98fcf46039ca6b
Author: Christoph Hellwig <hch@lst.de>
Date: Thu Dec 29 06:10:31 2022 -1000
mm: remove generic_writepages
Now that all external callers are gone, just fold it into do_writepages.
Link: https://lkml.kernel.org/r/20221229161031.391878-7-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me
Conflicts: f3cd4ab0aabf was backported before this change
commit 5a9e34747c9f731bbb6b7fd7521c4fec0d840593
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date: Wed Dec 21 10:08:48 2022 -0800
mm/swap: convert deactivate_page() to folio_deactivate()
Deactivate_page() has already been converted to use folios, this change
converts it to take in a folio argument instead of calling page_folio().
It also renames the function folio_deactivate() to be more consistent with
other folio functions.
[akpm@linux-foundation.org: fix left-over comments, per Yu Zhao]
Link: https://lkml.kernel.org/r/20221221180848.20774-5-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 2c44af4f2aaa260199f218f11920c406e688693c
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:13 2022 -0800
mm: add bdi_set_min_ratio_no_scale() function
This introduces bdi_set_min_ratio_no_scale(). It uses the max
granularity for the ratio. This function by the new sysfs knob
min_ratio_fine.
Link: https://lkml.kernel.org/r/20221119005215.3052436-19-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 4e230b406eda9bdf7f8a71e2cc3df18a824abcb0
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:10 2022 -0800
mm: add bdi_set_max_ratio_no_scale() function
This introduces bdi_set_max_ratio_no_scale(). It uses the max
granularity for the ratio. This function by the new sysfs knob
max_ratio_fine.
Link: https://lkml.kernel.org/r/20221119005215.3052436-16-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 803c98050569850be5fd51a2025c67622de887d9
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:07 2022 -0800
mm: add bdi_set_min_bytes() function
This introduces the bdi_set_min_bytes() function. The min_bytes function
does not store the min_bytes value. Instead it converts the min_bytes
value into the corresponding ratio value.
Link: https://lkml.kernel.org/r/20221119005215.3052436-13-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 8021fb3232f265b81c7e4e7aba15bc3a04ff1fd3
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:06 2022 -0800
mm: split off __bdi_set_min_ratio() function
This splits off the __bdi_set_min_ratio() function from the
bdi_set_min_ratio() function. The __bdi_set_min_ratio() function will
also be called from the bdi_set_min_bytes() function, which will be
introduced in the next patch.
Link: https://lkml.kernel.org/r/20221119005215.3052436-12-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 712c00d66a342a3ed375df41c3df7d3d2abad2c0
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:05 2022 -0800
mm: add bdi_get_min_bytes() function
This adds a function to return the specified value for min_bytes. It
converts the stored min_ratio of the bdi to the corresponding bytes
value. This is an approximation as it is based on the value that is
returned by global_dirty_limits(), which can change. The returned
value can be different than the value when the min_bytes value was set.
Link: https://lkml.kernel.org/r/20221119005215.3052436-11-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 1bf27e98d26d1e62166a456ef17460be085cbe0b
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:02 2022 -0800
mm: add bdi_set_max_bytes() function
This introduces the bdi_set_max_bytes() function. The max_bytes function
does not store the max_bytes value. Instead it converts the max_bytes
value into the corresponding ratio value.
Link: https://lkml.kernel.org/r/20221119005215.3052436-8-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit efc3e6ad53ea14225b434fddca261c9a1c56c707
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:01 2022 -0800
mm: split off __bdi_set_max_ratio() function
This splits off __bdi_set_max_ratio() from bdi_set_max_ratio().
__bdi_set_max_ratio() will also be called from bdi_set_max_bytes(),
which will be introduced in the next patch.
Link: https://lkml.kernel.org/r/20221119005215.3052436-7-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 00df7d51263b46ed93f7572e2d09579746f7b1eb
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:52:00 2022 -0800
mm: add bdi_get_max_bytes() function
This adds a function to return the specified value for max_bytes. It
converts the stored max_ratio of the bdi to the corresponding bytes
value. It introduces the bdi_get_bytes helper function to do the
conversion. This is an approximation as it is based on the value that is
returned by global_dirty_limits(), which can change. The helper function
will also be used by the min_bytes bdi knob.
Link: https://lkml.kernel.org/r/20221119005215.3052436-6-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit ae82291e9ca47c3d6da6b77a00f427754aca413e
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:51:59 2022 -0800
mm: use part per 1000000 for bdi ratios
To get finer granularity for ratio calculations use part per million
instead of percentiles. This is especially important if we want to
automatically convert byte values to ratios. Otherwise the values that
are actually used can be quite different. This is also important for
machines with more main memory (1% of 256GB is already 2.5GB).
Link: https://lkml.kernel.org/r/20221119005215.3052436-5-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Chris Mason <clm@meta.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-27739
This patch is a backport of the following upstream commit:
commit 8e9d5ead865a1a7af74a444d2f00f1ef4539bfba
Author: Stefan Roesch <shr@devkernel.io>
Date: Fri Nov 18 16:51:56 2022 -0800
mm: add bdi_set_strict_limit() function
Patch series "mm/block: add bdi sysfs knobs", v4.
At meta network block devices (nbd) are used to implement remote block
storage. In testing and during production it has been observed that these
network block devices can consume a huge portion of the dirty writeback
cache and writeback can take a considerable time.
To be able to give stricter limits, I'm proposing the following changes:
1) introduce strictlimit knob
Currently the max_ratio knob exists to limit the dirty_memory. However
this knob only applies once (dirty_ratio + dirty_background_ratio) / 2
has been reached.
With the BDI_CAP_STRICTLIMIT flag, the max_ratio can be applied without
reaching that limit. This change exposes that knob.
This knob can also be useful for NFS, fuse filesystems and USB devices.
2) Use part of 1000000 internal calculation
The max_ratio is based on percentage. With the current machine sizes
percentage values can be very high (1% of a 256GB main memory is already
2.5GB). This change uses part of 1000000 instead of percentages for the
internal calculations.
3) Introduce two new sysfs knobs: min_bytes and max_bytes.
Currently all calculations are based on ratio, but for a user it often
more convenient to specify a limit in bytes. The new knobs will not
store bytes values, instead they will translate the byte value to a
corresponding ratio. As the internal values are now part of 1000, the
ratio is closer to the specified value. However the value should be more
seen as an approximation as it can fluctuate over time.
3) Introduce two new sysfs knobs: min_ratio_fine and max_ratio_fine.
The granularity for the existing sysfs bdi knobs min_ratio and max_ratio
is based on percentage values. The new sysfs bdi knobs min_ratio_fine
and max_ratio_fine allow to specify the ratio as part of 1 million.
This patch (of 20):
This adds the bdi_set_strict_limit function to be able to set/unset the
BDI_CAP_STRICTLIMIT flag.
Link: https://lkml.kernel.org/r/20221119005215.3052436-1-shr@devkernel.io
Link: https://lkml.kernel.org/r/20221119005215.3052436-2-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chris Mason <clm@meta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Audra Mitchell <audra@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25988
commit f814bdda774c183b0cc15ec8f3b6e7c6f4527ba5
Author: Jan Kara <jack@suse.cz>
Date: Tue Jan 23 18:58:26 2024 +0100
blk-wbt: Fix detection of dirty-throttled tasks
The detection of dirty-throttled tasks in blk-wbt has been subtly broken
since its beginning in 2016. Namely if we are doing cgroup writeback and
the throttled task is not in the root cgroup, balance_dirty_pages() will
set dirty_sleep for the non-root bdi_writeback structure. However
blk-wbt checks dirty_sleep only in the root cgroup bdi_writeback
structure. Thus detection of recently throttled tasks is not working in
this case (we noticed this when we switched to cgroup v2 and suddently
writeback was slow).
Since blk-wbt has no easy way to get to proper bdi_writeback and
furthermore its intention has always been to work on the whole device
rather than on individual cgroups, just move the dirty_sleep timestamp
from bdi_writeback to backing_dev_info. That fixes the checking for
recently throttled task and saves memory for everybody as a bonus.
CC: stable@vger.kernel.org
Fixes: b57d74aff9 ("writeback: track if we're sleeping on progress in balance_dirty_pages()")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20240123175826.21452-1-jack@suse.cz
[axboe: fixup indentation errors]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-1848
commit 611df5d6616d80a22906c352ccd80c395982fbd9
Author: Stefan Roesch <shr@fb.com>
Date: Mon Sep 12 12:27:41 2022 -0700
mm: export balance_dirty_pages_ratelimited_flags()
Export the function balance_dirty_pages_ratelimited_flags(). It is now
also called from btrfs.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit fe6c9c6e3e3e332b998393d214fba9d09ab0acb0
Author: Jan Kara <jack@suse.cz>
Date: Thu Jun 23 10:51:46 2022 -0700
mm: Add balance_dirty_pages_ratelimited_flags() function
This adds the helper function balance_dirty_pages_ratelimited_flags().
It adds the parameter flags to balance_dirty_pages_ratelimited().
The flags parameter is passed to balance_dirty_pages(). For async
buffered writes the flag value will be BDP_ASYNC.
If balance_dirty_pages() gets called for async buffered write, we don't
want to wait. Instead we need to indicate to the caller that throttling
is needed so that it can stop writing and offload the rest of the write
to a context that can block.
The new helper function is also used by balance_dirty_pages_ratelimited().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Stefan Roesch <shr@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220623175157.1715274-4-shr@fb.com
[axboe: fix kerneltest bot 'ret' issue]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit e92eebbb09218e128e559cf12b65317721309324
Author: Jan Kara <jack@suse.cz>
Date: Thu Jun 23 10:51:45 2022 -0700
mm: Move updates of dirty_exceeded into one place
Transition of wb->dirty_exceeded from 0 to 1 happens before we go to
sleep in balance_dirty_pages() while transition from 1 to 0 happens when
exiting from balance_dirty_pages(), possibly based on old values. This
does not make a lot of sense since wb->dirty_exceeded should simply
reflect whether wb is over dirty limit and so we should ratelimit
entering to balance_dirty_pages() less. Move the two updates together.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220623175157.1715274-3-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit ea6813be07dcdc072aa9ad18099115a74cecb5e1
Author: Jan Kara <jack@suse.cz>
Date: Thu Jun 23 10:51:44 2022 -0700
mm: Move starting of background writeback into the main balancing loop
We start background writeback if we are over background threshold after
exiting the main loop in balance_dirty_pages(). This may result in
basing the decision on already stale values (we may have slept for
significant amount of time) and it is also inconvenient for refactoring
needed for async dirty throttling. Move the check into the main waiting
loop.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Stefan Roesch <shr@fb.com>
Link: https://lore.kernel.org/r/20220623175157.1715274-2-shr@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 2ca456c24801e439256c0ec7dbe21eba7b01544e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Thu Apr 28 14:21:02 2022 -0400
filemap: Update the folio_mark_dirty documentation
The previous comment was not terribly helpful. Be a bit more explicit
about the necessary locking environment.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 21f0dd88f23dc9dc46b781f8ec9acf975dca4e6e
Author: Chen Wandun <chenwandun@huawei.com>
Date: Thu Apr 28 23:15:57 2022 -0700
mm: rework calculation of bdi_min_ratio in bdi_set_min_ratio
In function bdi_set_min_ratio, min_ratio is unsigned int, it will
result underflow when setting min_ratio below bdi->min_ratio, it
is confusing. Rework it, no functional change.
Link: https://lkml.kernel.org/r/20220422095159.2858305-1-chenwandun@huawei.com
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit 3c6a4cba3138d1aeeb8fd917178c6578b9b8ae29
Author: Luis Chamberlain <mcgrof@kernel.org>
Date: Fri Apr 15 15:08:02 2022 -0700
mm: fix unused variable kernel warning when SYSCTL=n
When CONFIG_SYSCTL=n the variable dirty_bytes_min which is just used
as a minimum to a proc handler is not used. So just move this under
the ifdef for CONFIG_SYSCTL.
Fixes: aa779e510219 ("mm: move page-writeback sysctls to their own file")
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2160210
commit aa779e5102195e1d9ade95dcbc0bfbd8f916eb59
Author: zhanglianjie <zhanglianjie@uniontech.com>
Date: Thu Feb 17 18:51:51 2022 -0800
mm: move page-writeback sysctls to their own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we just
care about the core logic.
So move the page-writeback sysctls to its own file.
[akpm@linux-foundation.org: coding-style cleanups]
akpm@linux-foundation.org: fix CONFIG_SYSCTL=n warnings]
Link: https://lkml.kernel.org/r/20220129012955.26594-1-zhanglianjie@uniontech.com
Signed-off-by: zhanglianjie <zhanglianjie@uniontech.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
commit f87904c075515f3e1d8f4a7115869d3b914674fd
Author: Khazhismel Kumykov <khazhy@chromium.org>
Date: Mon Aug 1 08:50:34 2022 -0700
writeback: avoid use-after-free after removing device
When a disk is removed, bdi_unregister gets called to stop further
writeback and wait for associated delayed work to complete. However,
wb_inode_writeback_end() may schedule bandwidth estimation dwork after
this has completed, which can result in the timer attempting to access the
just freed bdi_writeback.
Fix this by checking if the bdi_writeback is alive, similar to when
scheduling writeback work.
Since this requires wb->work_lock, and wb_inode_writeback_end() may get
called from interrupt, switch wb->work_lock to an irqsafe lock.
Link: https://lkml.kernel.org/r/20220801155034.3772543-1-khazhy@google.com
Fixes: 45a2966fd641 ("writeback: fix bandwidth estimate for spiky workload")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Michael Stapelberg <stapelberg+linux@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2089498
Signed-off-by: Nico Pache <npache@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 854d8e36168d79ad09a831d60bd4d835ad33e188
Author: Miaohe Lin <linmiaohe@huawei.com>
Date: Tue Mar 22 14:39:31 2022 -0700
mm/writeback: minor clean up for highmem_dirtyable_memory
Since commit a804552b9a ("mm/page-writeback.c: fix
dirty_balance_reserve subtraction from dirtyable memory"), local
variable x can not be negative. And it can not overflow when it is the
total number of dirtyable highmem pages. Thus remove the unneeded
comment and overflow check.
Link: https://lkml.kernel.org/r/20220224115416.46089-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 3a3bae50af5d73fab5da20484029de77ca67bb2e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Wed Feb 9 20:22:15 2022 +0000
fs: Remove aops ->set_page_dirty
With all implementations converted to ->dirty_folio, we can stop calling
this fallback method and remove it entirely.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 46de8b979492e1377947700ecb1e3169088668b2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Wed Feb 9 20:22:13 2022 +0000
fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio
This is a mechanical change.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: Drop change to fs/ntfs3/inode.c - file not in CS9, even if it
was it wouldn't be a supported config.
Bugzilla: https://bugzilla.redhat.com/2120352
commit e621900ad28b748e058b81d6078a5d5eb37b3973
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Wed Feb 9 20:22:12 2022 +0000
fs: Convert __set_page_dirty_buffers to block_dirty_folio
Convert all callers; mostly this is just changing the aops to point
at it, but a few implementations need a little more work.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Conflicts: The presence of
d1d8a3b4d06d ("mm: Turn isolate_lru_page() into folio_isolate_lru()")
causes a merge coflict due to differing context. Jus remove the
timeout argument to reclaim_throttle
Bugzilla: https://bugzilla.redhat.com/2120352
commit c3f4a9a2b082c5392fbff17c6d8551154add5fdb
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Fri Nov 5 13:42:42 2021 -0700
mm/vmscan: centralise timeout values for reclaim_throttle
Neil Brown raised concerns about callers of reclaim_throttle specifying
a timeout value. The original timeout values to congestion_wait() were
probably pulled out of thin air or copy&pasted from somewhere else.
This patch centralises the timeout values and selects a timeout based on
the reason for reclaim throttling. These figures are also pulled out of
the same thin air but better values may be derived
Running a workload that is throttling for inappropriate periods and
tracing mm_vmscan_throttled can be used to pick a more appropriate
value. Excessive throttling would pick a lower timeout where as
excessive CPU usage in reclaim context would select a larger timeout.
Ideally a large value would always be used and the wakeups would occur
before a timeout but that requires careful testing.
Link: https://lkml.kernel.org/r/20211022144651.19914-7-mgorman@techsingulari
ty.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Darrick J . Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/2120352
commit 8d58802fc9de1b416601d90da794a3feaad1898d
Author: Mel Gorman <mgorman@techsingularity.net>
Date: Fri Nov 5 13:42:35 2021 -0700
mm/writeback: throttle based on page writeback instead of congestion
do_writepages throttles on congestion if the writepages() fails due to a
lack of memory but congestion_wait() is partially broken as the
congestion state is not updated for all BDIs.
This patch stalls waiting for a number of pages to complete writeback
that located on the local node. The main weakness is that there is no
correlation between the location of the inode's pages and locality but
that is still better than congestion_wait.
Link: https://lkml.kernel.org/r/20211022144651.19914-5-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Darrick J . Wong" <djwong@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Rik van Riel <riel@surriel.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
commit 566d3362885aab04d6b0f885f12db3176ca3a032
Author: Hugh Dickins <hughd@google.com>
Date: Thu Mar 24 18:13:59 2022 -0700
mm: warn on deleting redirtied only if accounted
filemap_unaccount_folio() has a WARN_ON_ONCE(folio_test_dirty(folio)). It
is good to warn of late dirtying on a persistent filesystem, but late
dirtying on tmpfs can only lose data which is expected to be thrown away;
and it's a pity if that warning comes ONCE on tmpfs, then hides others
which really matter. Make it conditional on mapping_cap_writeback().
Cleanup: then folio_account_cleaned() no longer needs to check that for
itself, and so no longer needs to know the mapping.
Link: https://lkml.kernel.org/r/b5a1106c-7226-a5c6-ad41-ad4832cae1f@google.com
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jan Kara <jack@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
commit 6f31a5a261dbbe7bf7f585dfe81f8acd4b25ec3b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Wed Feb 9 20:22:00 2022 +0000
fs: Add aops->dirty_folio
This replaces ->set_page_dirty(). It returns a bool instead of an int
and takes the address_space as a parameter instead of expecting the
implementations to retrieve the address_space from the page. This is
particularly important for filesystems which use FS_OPS for swap.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
commit a229a4f00d1eab3f665b92dc9f8dbceca9b8f49c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Sun Dec 20 06:44:51 2020 -0500
mm/writeback: Improve __folio_mark_dirty() comment
Add some notes about how this function needs to be called.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok
commit 121703c1c817b3c77f61002466d0bfca7e39f25d
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Tue Mar 9 13:48:03 2021 -0500
mm/writeback: Add folio_write_one
Transform write_one_page() into folio_write_one() and add a compatibility
wrapper. Also move the declaration to pagemap.h as this is page cache
functionality that doesn't need to be used by the rest of the kernel.
Saves 58 bytes of kernel text. While folio_write_one() is 101 bytes
smaller than write_one_page(), the inlined call to page_folio() expands
each caller. There are fewer than ten callers so it doesn't seem worth
putting a wrapper in the core.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok
commit cd78ab11a8810dd297f4751d17cc53e3dce36024
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Sun May 2 23:22:52 2021 -0400
mm/writeback: Add folio_redirty_for_writepage()
Reimplement redirty_page_for_writepage() as a wrapper around
folio_redirty_for_writepage(). Account the number of pages in the
folio, add kernel-doc and move the prototype to writeback.h.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2019485
Tested: ran Rafael's set of sanity tests, other than known issues, seems ok
commit 25ff8b15537dfa0e1a62d55cfcc48f3c8bd8a76c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date: Mon May 3 10:06:55 2021 -0400
mm/writeback: Add folio_account_redirty()
Account the number of pages in the folio that we're redirtying.
Turn account_page_dirty() into a wrapper around it. Also turn
the comment on folio_account_redirty() into kernel-doc and
edit it slightly so it makes sense to its potential callers.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>