Commit Graph

160 Commits

Rafael Aquini 450b6d66af mm: support order-1 folios in the page cache
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 8897277acfef7f70fdecc054073bea2542fc7a1b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Feb 26 15:55:28 2024 -0500

    mm: support order-1 folios in the page cache

    Folios of order 1 have no space to store the deferred list.  This is not a
    problem for the page cache as file-backed folios are never placed on the
    deferred list.  All we need to do is prevent the core MM from touching the
    deferred list for order 1 folios and remove the code which prevented us
    from allocating order 1 folios.
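
    A minimal sketch of the core-MM side of the change, not the verbatim
    diff (helper names assumed from the upstream tree around this commit):

        /* Order-1 folios have no third struct page to carry _deferred_list,
         * so the deferred-split machinery must skip them entirely. */
        void folio_prep_large_rmappable(struct folio *folio)
        {
                if (!folio || !folio_test_large(folio))
                        return;
                if (folio_order(folio) > 1)     /* no list head on order-1 */
                        INIT_LIST_HEAD(&folio->_deferred_list);
                folio_set_large_rmappable(folio);
        }

        void deferred_split_folio(struct folio *folio)
        {
                /* An order-1 folio can never be on the deferred list. */
                if (folio_order(folio) <= 1)
                        return;
                /* ... normal deferred-split handling ... */
        }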

    Link: https://lore.kernel.org/linux-mm/90344ea7-4eec-47ee-5996-0c22f42d6a6a@google.com/
    Link: https://lkml.kernel.org/r/20240226205534.1603748-3-zi.yan@sent.com
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Zi Yan <ziy@nvidia.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Luis Chamberlain <mcgrof@kernel.org>
    Cc: Michal Koutny <mkoutny@suse.com>
    Cc: Roman Gushchin <roman.gushchin@linux.dev>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: Zach O'Keefe <zokeefe@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:24:28 -05:00
Rafael Aquini fa23ed8367 mm/readahead: do not allow order-1 folio
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit ec056cef76a525706601b32048f174f9bea72c7c
Author: Ryan Roberts <ryan.roberts@arm.com>
Date:   Fri Dec 1 16:10:45 2023 +0000

    mm/readahead: do not allow order-1 folio

    The THP machinery does not support order-1 folios because it requires
    metadata spanning the first 3 `struct page`s.  So order-2 is the smallest
    large folio that we can safely create.

    There was a theoretical bug whereby if ra->size was 2 or 3 pages (due to
    the device-specific bdi->ra_pages being set that way), we could end up
    with order = 1.  Fix this by unconditionally checking if the preferred
    order is 1 and if so, set it to 0.  Previously this was done in a few
    specific places, but with this refactoring it is done just once,
    unconditionally, at the end of the calculation.

    This is a theoretical bug found during review of the code; I have no
    evidence to suggest this manifests in the real world (I expect all
    device-specific ra_pages values are much bigger than 3).
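
    As a sketch, the fix reduces to one unconditional clamp at the end of
    the order calculation in page_cache_ra_order() (variable names assumed
    from the upstream function, not the verbatim diff):

        /* ra->size can be as small as 2 or 3 pages via bdi->ra_pages. */
        while ((1 << new_order) > ra->size)
                new_order--;
        /* THP metadata needs the first 3 struct pages, which an order-1
         * folio cannot provide, so fall back to order 0. */
        if (new_order == 1)
                new_order = 0;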

    Link: https://lkml.kernel.org/r/20231201161045.3962614-1-ryan.roberts@arm.com
    Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:23:50 -05:00
Rafael Aquini a726366716 mm: remove unnecessary pagevec includes
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 994ec4e29b3de188d11fe60d17403285fcc8917a
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 21 17:45:57 2023 +0100

    mm: remove unnecessary pagevec includes

    These files no longer need pagevec.h, mostly due to function declarations
    being moved out of it.

    Link: https://lkml.kernel.org/r/20230621164557.3510324-14-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:37:33 -04:00
Donald Dutile 700d99ffeb mm/readahead: limit page cache size in page_cache_ra_order()
JIRA: https://issues.redhat.com/browse/RHEL-14441

commit 1f789a45c3f1aa77531db21768fca70b66c0eeb1
Author: Gavin Shan <gshan@redhat.com>
Date:   Thu Jun 27 10:39:50 2024 +1000

    mm/readahead: limit page cache size in page_cache_ra_order()

    In page_cache_ra_order(), the maximal order of the page cache to be
    allocated shouldn't be larger than MAX_PAGECACHE_ORDER.  Otherwise, it's
    possible the large page cache can't be supported by xarray when the
    corresponding xarray entry is split.

    For example, HPAGE_PMD_ORDER is 13 on ARM64 when the base page size is
    64KB.  The PMD-sized page cache can't be supported by xarray.
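
    A sketch of the resulting clamp in page_cache_ra_order() (variable
    names assumed from upstream, not the verbatim diff):

        new_order += 2;
        new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
        new_order = min_t(unsigned int, new_order, ilog2(ra->size));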

    Link: https://lkml.kernel.org/r/20240627003953.1262512-3-gshan@redhat.com
    Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
    Signed-off-by: Gavin Shan <gshan@redhat.com>
    Acked-by: David Hildenbrand <david@redhat.com>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Don Dutile <ddutile@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Ryan Roberts <ryan.roberts@arm.com>
    Cc: William Kucharski <william.kucharski@oracle.com>
    Cc: Zhenyu Zhang <zhenyzha@redhat.com>
    Cc: <stable@vger.kernel.org>    [5.18+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-07-12 12:34:59 -04:00
Donald Dutile f586d210cd readahead: use ilog2 instead of a while loop in page_cache_ra_order()
JIRA: https://issues.redhat.com/browse/RHEL-14441

commit e03c16fb4af1dfc615a4e1f51be0d5fe5840b904
Author: Pankaj Raghav <p.raghav@samsung.com>
Date:   Mon Jan 15 11:25:22 2024 +0100

    readahead: use ilog2 instead of a while loop in page_cache_ra_order()

    A while loop is used to adjust the new_order to be lower than the
    ra->size.  ilog2 could be used to do the same instead of using a loop.

    ilog2 typically resolves to a bit scan reverse instruction.  This is
    particularly useful when ra->size is smaller than the 2^new_order as it
    resolves in one instruction instead of looping to find the new_order.

    No functional changes.
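
    The change in sketch form, as applied in page_cache_ra_order():

        /* Before: loop down until 2^new_order fits within ra->size. */
        while ((1 << new_order) > ra->size)
                new_order--;

        /* After: a single ilog2(), typically one bit-scan-reverse. */
        new_order = min_t(unsigned int, new_order, ilog2(ra->size));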

    Link: https://lkml.kernel.org/r/20240115102523.2336742-1-kernel@pankajraghav.com
    Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-07-12 12:34:59 -04:00
Donald Dutile 860e3d9a6e filemap: Allow __filemap_get_folio to allocate large folios
JIRA: https://issues.redhat.com/browse/RHEL-14441

Conflicts: Same RHEL9 backport diff from upstream e999a5c5a19cf.

commit 4f66170119107f1452d2438ba4606e105e9e3afe
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri May 19 16:10:37 2023 -0400

    filemap: Allow __filemap_get_folio to allocate large folios

    Allow callers of __filemap_get_folio() to specify a preferred folio
    order in the FGP flags.  This is only honoured in the FGP_CREATE path;
    if there is already a folio in the page cache that covers the index,
    we will return it, no matter what its order is.  No create-around is
    attempted; we will only create folios which start at the specified index.
    Unmodified callers will continue to allocate order 0 folios.
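
    A hedged usage sketch, assuming the fgf_set_order() helper this change
    introduces for encoding the preferred order into the FGP flags:

        /* Ask for an order-4 folio at 'index'.  FGP_CREAT is required for
         * the order to matter; an existing folio covering the index is
         * returned whatever its order. */
        struct folio *folio;

        folio = __filemap_get_folio(mapping, index,
                        FGP_LOCK | FGP_CREAT | fgf_set_order(4),
                        mapping_gfp_mask(mapping));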

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Donald Dutile <ddutile@redhat.com>
2024-07-12 12:34:59 -04:00
Nico Pache 1542c42254 mm: use memalloc_nofs_save() in page_cache_ra_order()
commit 30153e4466647a17eebfced13eede5cbe4290e69
Author: Kefeng Wang <wangkefeng.wang@huawei.com>
Date:   Fri Apr 26 19:29:38 2024 +0800

    mm: use memalloc_nofs_save() in page_cache_ra_order()

    Following commit f2c817bed5 ("mm: use memalloc_nofs_save in readahead
    path"), ensure that page_cache_ra_order() does not attempt to reclaim
    file-backed pages either, or it leads to a deadlock; the issue was
    found while testing ext4 large folios.
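
    In sketch form, the fix mirrors what page_cache_ra_unbounded() already
    does (scoped-NOFS helper names as in upstream):

        unsigned int nofs = memalloc_nofs_save();

        filemap_invalidate_lock_shared(mapping);
        /* ... allocate large folios and add them to the page cache ... */
        filemap_invalidate_unlock_shared(mapping);
        memalloc_nofs_restore(nofs);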

     INFO: task DataXceiver for:7494 blocked for more than 120 seconds.
     "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
     task:DataXceiver for state:D stack:0     pid:7494  ppid:1      flags:0x00000200
     Call trace:
      __switch_to+0x14c/0x240
      __schedule+0x82c/0xdd0
      schedule+0x58/0xf0
      io_schedule+0x24/0xa0
      __folio_lock+0x130/0x300
      migrate_pages_batch+0x378/0x918
      migrate_pages+0x350/0x700
      compact_zone+0x63c/0xb38
      compact_zone_order+0xc0/0x118
      try_to_compact_pages+0xb0/0x280
      __alloc_pages_direct_compact+0x98/0x248
      __alloc_pages+0x510/0x1110
      alloc_pages+0x9c/0x130
      folio_alloc+0x20/0x78
      filemap_alloc_folio+0x8c/0x1b0
      page_cache_ra_order+0x174/0x308
      ondemand_readahead+0x1c8/0x2b8
      page_cache_async_ra+0x68/0xb8
      filemap_readahead.isra.0+0x64/0xa8
      filemap_get_pages+0x3fc/0x5b0
      filemap_splice_read+0xf4/0x280
      ext4_file_splice_read+0x2c/0x48 [ext4]
      vfs_splice_read.part.0+0xa8/0x118
      splice_direct_to_actor+0xbc/0x288
      do_splice_direct+0x9c/0x108
      do_sendfile+0x328/0x468
      __arm64_sys_sendfile64+0x8c/0x148
      invoke_syscall+0x4c/0x118
      el0_svc_common.constprop.0+0xc8/0xf0
      do_el0_svc+0x24/0x38
      el0_svc+0x4c/0x1f8
      el0t_64_sync_handler+0xc0/0xc8
      el0t_64_sync+0x188/0x190

    Link: https://lkml.kernel.org/r/20240426112938.124740-1-wangkefeng.wang@huawei.com
    Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
    Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Zhang Yi <yi.zhang@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

CVE: CVE-2024-36882
JIRA: https://issues.redhat.com/browse/RHEL-39635
Signed-off-by: Nico Pache <npache@redhat.com>
2024-06-03 15:27:34 -06:00
Nico Pache 9cea784254 vfs: fix readahead(2) on block devices
commit 7116c0af4b8414b2f19fdb366eea213cbd9d91c2
Author: Reuben Hawkins <reubenhwk@gmail.com>
Date:   Mon Oct 2 20:57:04 2023 -0500

    vfs: fix readahead(2) on block devices

    Readahead was factored to call generic_fadvise.  That refactor added an
    S_ISREG restriction which broke readahead on block devices.

    In addition to S_ISREG, this change checks S_ISBLK to fix block device
    readahead.  There is no change in behavior for any file type besides
    block devices.
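
    A sketch of the check in ksys_readahead() after the fix (not the
    verbatim diff):

        /* readahead(2) makes sense for regular files and block devices. */
        if (!S_ISREG(file_inode(f.file)->i_mode) &&
            !S_ISBLK(file_inode(f.file)->i_mode))
                goto out;       /* returns -EINVAL */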

    Fixes: 3d8f761531 ("vfs: implement readahead(2) using POSIX_FADV_WILLNEED")
    Signed-off-by: Reuben Hawkins <reubenhwk@gmail.com>
    Link: https://lore.kernel.org/r/20231003015704.2415-1-reubenhwk@gmail.com
    Reviewed-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:32 -06:00
Nico Pache f834ed50c4 readahead: avoid multiple marked readahead pages
commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jan 4 09:58:39 2024 +0100

    readahead: avoid multiple marked readahead pages

    ra_alloc_folio() marks a page that should trigger next round of async
    readahead.  However it rounds up computed index to the order of page being
    allocated.  This can however lead to multiple consecutive pages being
    marked with readahead flag.  Consider situation with index == 1, mark ==
    1, order == 0.  We insert order 0 page at index 1 and mark it.  Then we
    bump order to 1, index to 2, mark (still == 1) is rounded up to 2 so page
    at index 2 is marked as well.  Then we bump order to 2, index is
    incremented to 4, mark gets rounded to 4 so page at index 4 is marked as
    well.  The fact that multiple pages get marked within a single readahead
    window confuses the readahead logic and results in readahead window being
    trimmed back to 1.  This situation is triggered in particular when maximum
    readahead window size is not a power of two (in the observed case it was
    768 KB) and as a result sequential read throughput suffers.

    Fix the problem by rounding 'mark' down instead of up.  Because the index
    is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
    iff 'mark' is within the page we are allocating at 'index' and thus
    exactly one page is marked with readahead flag as required by the
    readahead code and sequential read performance is restored.
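
    The fix in sketch form, as applied in ra_alloc_folio() (not the
    verbatim diff):

        mark = round_down(mark, 1UL << order);  /* was rounded up */
        if (index == mark)      /* true iff the mark lies in this folio */
                folio_set_readahead(folio);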

    This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
    readahead with large folios").  The commit changed the rounding with the
    rationale:

    "...  we were setting the readahead flag on the folio which contains the
    last byte read from the block.  This is wrong because we will trigger
    readahead at the end of the read without waiting to see if a subsequent
    read is going to use the pages we just read."

    Although this is true, the fact is this was always the case with read
    sizes not aligned to folio boundaries and large folios in the page cache
    just make the situation more obvious (and frequent).  Also for sequential
    read workloads it is better to trigger the readahead earlier rather than
    later.  It is true that the difference in the rounding and thus earlier
    triggering of the readahead can result in reading more for semi-random
    workloads.  However workloads really suffering from this seem to be rare.
    In particular I have verified that the workload described in commit
    b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
    random 100k blocks from a file like:

    [reader]
    bs=100k
    rw=randread
    numjobs=1
    size=64g
    runtime=60s

    is not impacted by the rounding change and achieves ~70MB/s in both cases.

    [jack@suse.cz: fix one more place where mark rounding was done as well]
      Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz
    Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
    Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
    Signed-off-by: Jan Kara <jack@suse.cz>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Guo Xuenan <guoxuenan@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

JIRA: https://issues.redhat.com/browse/RHEL-5619
Signed-off-by: Nico Pache <npache@redhat.com>
2024-04-30 17:51:29 -06:00
Aristeu Rozanski 03032ae71a readahead: convert readahead_expand() to use a folio
JIRA: https://issues.redhat.com/browse/RHEL-27740
Tested: by me

commit 11a980420719712f419dbb325940907f5d1afbdd
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Mon Jan 16 19:39:41 2023 +0000

    readahead: convert readahead_expand() to use a folio

    Replace the uses of page with a folio.  Also add a missing test for
    workingset in the leading edge expansion.

    Link: https://lkml.kernel.org/r/20230116193941.2148487-4-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2024-04-29 14:33:10 -04:00
Chris von Recklinghausen a1981f7607 filemap: Don't release a locked folio
Bugzilla: https://bugzilla.redhat.com/2160210

commit 6bf74cddcffac0bc5ee0fad724aac778d2e53f75
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Jun 7 15:45:53 2022 -0400

    filemap: Don't release a locked folio

    We must hold a reference over the call to filemap_release_folio(),
    otherwise the page cache will put the last reference to the folio
    before we unlock it, leading to splats like this:

     BUG: Bad page state in process u8:5  pfn:1ab1f4
     page:ffffea0006ac7d00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x28b1de pfn:0x1ab1f4
     flags: 0x17ff80000040001(locked|reclaim|node=0|zone=2|lastcpupid=0xfff)
     raw: 017ff80000040001 dead000000000100 dead000000000122 0000000000000000
     raw: 000000000028b1de 0000000000000000 00000000ffffffff 0000000000000000
     page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set

    It's an error path, so it doesn't see much testing.
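
    A sketch of the fixed error path, with helper names taken from the
    message above rather than the verbatim diff:

        folio_get(folio);       /* pin across the release */
        filemap_release_folio(folio, GFP_KERNEL);
        folio_unlock(folio);    /* the folio is still pinned here */
        folio_put(folio);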

    Reported-by: Darrick J. Wong <djwong@kernel.org>
    Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()")
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:13 -04:00
Chris von Recklinghausen 764d797fb9 mm,fs: Remove aops->readpage
Conflicts: mm/filemap.c - We already have
	176042404ee6 ("mm: add PSI accounting around ->read_folio and ->readahead calls")
	so just replace the logic to call either readpage or read_folio
	with an unconditional call to read_folio

Bugzilla: https://bugzilla.redhat.com/2160210

commit 7e0a126519b82648b254afcd95a168c15f65ea40
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Apr 29 11:53:28 2022 -0400

    mm,fs: Remove aops->readpage

    With all implementations of aops->readpage converted to aops->read_folio,
    we can stop checking whether it's set and remove the member from aops.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:59 -04:00
Chris von Recklinghausen 03fadfa7f6 fs: Introduce aops->read_folio
Conflicts: mm/filemap.c - We already have
	176042404ee6 ("mm: add PSI accounting around ->read_folio and ->readahead calls")
	so put the logic to call readpage/read_folio between the
	accounting calls

Bugzilla: https://bugzilla.redhat.com/2160210

commit 5efe7448a1426250b5747c10ad438517f44f1e51
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Apr 29 08:43:23 2022 -0400

    fs: Introduce aops->read_folio

    Change all the callers of ->readpage to call ->read_folio in preference,
    if it exists.  This is a transitional duplication, and will be removed
    by the end of the series.
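
    The transitional call sites reduce to a sketch like this:

        /* Prefer the new aop; fall back to the legacy one if absent. */
        if (mapping->a_ops->read_folio)
                err = mapping->a_ops->read_folio(file, folio);
        else
                err = mapping->a_ops->readpage(file, &folio->page);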

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
Chris von Recklinghausen b985a281a1 readahead: Use a folio in read_pages()
Bugzilla: https://bugzilla.redhat.com/2160210

commit a42634a6c07d5a66e8ad446ad0f184c0c78012ff
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Mar 31 14:15:59 2022 -0400

    readahead: Use a folio in read_pages()

    Handle multi-page folios correctly and remove a few calls to
    compound_head().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:58 -04:00
Chris von Recklinghausen cfacd80e4b riscv: compat: syscall: Add compat_sys_call_table implementation
Bugzilla: https://bugzilla.redhat.com/2160210

commit 59c10c52f573faca862cda5ebcdd43831608eb5a
Author: Guo Ren <guoren@linux.alibaba.com>
Date:   Tue Apr 5 15:13:05 2022 +0800

    riscv: compat: syscall: Add compat_sys_call_table implementation

    Implement compat sys_call_table and some system call functions:
    truncate64, ftruncate64, fallocate, pread64, pwrite64,
    sync_file_range, readahead, fadvise64_64 which need argument
    translation.

    Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
    Signed-off-by: Guo Ren <guoren@kernel.org>
    Reviewed-by: Arnd Bergmann <arnd@arndb.de>
    Tested-by: Heiko Stuebner <heiko@sntech.de>
    Link: https://lore.kernel.org/r/20220405071314.3225832-12-guoren@kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:51 -04:00
Frantisek Hrbata d9819eb3e5 Merge: block: update with v6.1-rc2
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1517

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

Signed-off-by: Ming Lei <ming.lei@redhat.com>

Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Nigel Croxon <ncroxon@redhat.com>
Approved-by: Jeff Moyer <jmoyer@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-03 13:30:02 -04:00
Ming Lei ff9f752d02 mm: add PSI accounting around ->read_folio and ->readahead calls
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144
Conflicts: context difference because rhel9 doesn't backport
290e1a320437 ("filemap: Use filemap_read_folio() in do_read_cache_folio()")

commit 176042404ee6a96ba7e9054e1bda6220360a26ad
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Sep 15 10:41:56 2022 +0100

    mm: add PSI accounting around ->read_folio and ->readahead calls

    PSI tries to account for the cost of bringing back in pages discarded by
    the MM LRU management.  Currently the prime place for that is hooked into
    the bio submission path, which is a rather bad place:

     - it does not actually account I/O for non-block file systems, of which
       we have many
     - it adds overhead and a layering violation to the block layer

    Add the accounting into the two places in the core MM code that read
    pages into an address space by calling into ->read_folio and ->readahead
    so that the entire file system operations are covered, to broaden
    the coverage and allow removing the accounting in the block layer going
    forward.

    As psi_memstall_enter can deal with nested calls this will not lead to
    double accounting even while the bio annotations are still present.
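
    In sketch form, the accounting around a ->read_folio call (PSI helper
    names as in upstream; not the verbatim diff):

        unsigned long pflags;
        bool workingset = folio_test_workingset(folio);

        if (workingset)
                psi_memstall_enter(&pflags);
        error = mapping->a_ops->read_folio(file, folio);
        if (workingset)
                psi_memstall_leave(&pflags);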

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: https://lore.kernel.org/r/20220915094200.139713-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:11 +08:00
Chris von Recklinghausen b317585314 readahead: Update comments
Bugzilla: https://bugzilla.redhat.com/2120352

commit 1e4702806faca1551733f58be17ea11a9d214e91
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Mar 31 15:02:34 2022 -0400

    readahead: Update comments

     - Refer to folios where appropriate, not pages (Matthew Wilcox)
     - Eliminate references to the internal PG_readhead
     - Use "readahead" consistently - not "read-ahead" or "read ahead"
       (mostly Neil Brown)
     - Clarify some sections that, on reflection, weren't very clear (Neil
       Brown)
     - Minor punctuation/spelling fixes (Neil Brown)

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:04 -04:00
Chris von Recklinghausen ea2b8aa178 mm: remove the skip_page argument to read_pages
Conflicts: mm/readahead.c - We already have
	730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock")
	so keep the call to filemap_invalidate_lock_shared

Bugzilla: https://bugzilla.redhat.com/2120352

commit b4e089d705eef82364945abae325cd241c80e107
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Mar 31 05:35:55 2022 -0700

    mm: remove the skip_page argument to read_pages

    The skip_page argument to read_pages controls if rac->_index is
    incremented before returning from the function.  Just open code that in
    the callers.
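
    In sketch form, callers that passed skip_page == true become:

        read_pages(ractl);
        ractl->_index++;        /* skip the page that could not be added */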

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:04 -04:00
Chris von Recklinghausen 9989661542 mm: remove the pages argument to read_pages
Conflicts: mm/readahead.c - We already have
	730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock")
	so keep the call to filemap_invalidate_lock_shared

Bugzilla: https://bugzilla.redhat.com/2120352

commit dfd8b4fc76d5f7ae5663328b791c4acf222c4d39
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Mar 31 05:35:23 2022 -0700

    mm: remove the pages argument to read_pages

    This is always an empty list or NULL with the removal of the ->readpages
    support, so remove it.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:04 -04:00
Chris von Recklinghausen 0ccb9258f5 fs: Remove ->readpages address space operation
Bugzilla: https://bugzilla.redhat.com/2120352

commit 704528d895dd3e7b173e672116b4eb2b0a0fceb0
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Mar 23 21:29:04 2022 -0400

    fs: Remove ->readpages address space operation

    All filesystems have now been converted to use ->readahead, so
    remove the ->readpages operation and fix all the comments that
    used to refer to it.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:04 -04:00
Chris von Recklinghausen 2509362897 readahead: Remove read_cache_pages()
Bugzilla: https://bugzilla.redhat.com/2120352

commit ebf921a9fac38560e0fc3a4381e163a6969efd5a
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Jan 22 15:46:22 2022 -0500

    readahead: Remove read_cache_pages()

    With no remaining users, remove this function and the related
    infrastructure.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
    Acked-by: Al Viro <viro@zeniv.linux.org.uk>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:28:03 -04:00
Chris von Recklinghausen ad6766c2ff remove inode_congested()
Conflicts:
	include/linux/backing-dev.h - We already have
		dec223c92a46 ("blk-cgroup: move struct blkcg to block/blk-cgroup.h")
		so keep current declaration of wb_blkcg_offline
	mm/vmscan.c - We already have
		c79b7b96db8b ("mm/vmscan: Account large folios correctly")
		which increments stat->nr_congested by pages

Bugzilla: https://bugzilla.redhat.com/2120352

commit fe55d563d4174f13839a9b7ef7309da5031b5d93
Author: NeilBrown <neilb@suse.de>
Date:   Tue Mar 22 14:39:07 2022 -0700

    remove inode_congested()

    inode_congested() reports if the backing-device for the inode is
    congested.  No bdi reports congestion any more, so this always returns
    'false'.

    So remove inode_congested() and related functions, and remove the call
    sites, assuming that inode_congested() always returns 'false'.

    Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown
    Signed-off-by: NeilBrown <neilb@suse.de>
    Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Ilya Dryomov <idryomov@gmail.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Layton <jlayton@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
    Cc: Miklos Szeredi <miklos@szeredi.hu>
    Cc: Paolo Valente <paolo.valente@linaro.org>
    Cc: Philipp Reisner <philipp.reisner@linbit.com>
    Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:49 -04:00
Chris von Recklinghausen 3fb239248b mm: improve cleanup when ->readpages doesn't process all pages
Bugzilla: https://bugzilla.redhat.com/2120352

commit 9fd472af84abd6da15376353c2283b3df9497646
Author: NeilBrown <neilb@suse.de>
Date:   Tue Mar 22 14:38:54 2022 -0700

    mm: improve cleanup when ->readpages doesn't process all pages

    If ->readpages doesn't process all the pages, then it is best to act as
    though they weren't requested so that a subsequent readahead can try
    again.

    So:

      - remove any 'ahead' pages from the page cache so they can be loaded
        with ->readahead() rather then multiple ->read()s

      - update the file_ra_state to reflect the reads that were actually
        submitted.

    This allows ->readpages() to abort early due e.g.  to congestion, which
    will then allow us to remove the inode_read_congested() test from
    page_cache_async_ra().

    Link: https://lkml.kernel.org/r/164549983736.9187.16755913785880819183.stgit@noble.brown
    Signed-off-by: NeilBrown <neilb@suse.de>
    Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Ilya Dryomov <idryomov@gmail.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Layton <jlayton@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
    Cc: Miklos Szeredi <miklos@szeredi.hu>
    Cc: Paolo Valente <paolo.valente@linaro.org>
    Cc: Philipp Reisner <philipp.reisner@linbit.com>
    Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:49 -04:00
Chris von Recklinghausen d773af909c mm: document and polish read-ahead code
Conflicts:
	mm/readahead.c - Commit
		518d55051a8c ("mm: remove spurious blkdev.h includes")
		removed the include of linux/blkdev.h. Commit
		c97ab271576d ("blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>")
		added it back in. This patch landed upstream between those two
		commits. CentOS Stream 9 has both of the above patches, which
		causes a conflict with the backport of this patch.
	Drop changes to Documentation/core-api/mm-api.rst - processing this
		file causes sphinx to hang and causes the brew build of the
		noarch package to time out and eventually fail

Bugzilla: https://bugzilla.redhat.com/2120352

commit 84dacdbd5352bfef82423760fa2e8bffaeef9e05
Author: NeilBrown <neilb@suse.de>
Date:   Tue Mar 22 14:38:51 2022 -0700

    mm: document and polish read-ahead code

    Add some "big-picture" documentation for read-ahead and polish the code
    to make it fit this documentation.

    The meaning of ->async_size is clarified to match its name.  i.e.  Any
    request to ->readahead() has a sync part and an async part.  The caller
    will wait for the sync pages to complete, but will not wait for the
    async pages.  The first async page is still marked PG_readahead.

    Note that the current function names page_cache_sync_ra() and
    page_cache_async_ra() are misleading.  All ra requests are partly sync
    and partly async, so either part can be empty.  A page_cache_sync_ra()
    request will usually set ->async_size non-zero, implying it is not all
    synchronous.

    When a non-zero req_count is passed to page_cache_async_ra(), the
    implication is that some prefix of the request is synchronous, though
    the calculation made there is incorrect - I haven't tried to fix it.

    Link: https://lkml.kernel.org/r/164549983734.9187.11586890887006601405.stgit@noble.brown
    Signed-off-by: NeilBrown <neilb@suse.de>
    Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
    Cc: Chao Yu <chao@kernel.org>
    Cc: Darrick J. Wong <djwong@kernel.org>
    Cc: Ilya Dryomov <idryomov@gmail.com>
    Cc: Jaegeuk Kim <jaegeuk@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Jeff Layton <jlayton@kernel.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
    Cc: Miklos Szeredi <miklos@szeredi.hu>
    Cc: Paolo Valente <paolo.valente@linaro.org>
    Cc: Philipp Reisner <philipp.reisner@linbit.com>
    Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:49 -04:00
Chris von Recklinghausen c2e21ac2eb mm/readahead.c: fix incorrect comments for get_init_ra_size
Bugzilla: https://bugzilla.redhat.com/2120352

commit fb25a77dde78dbcaa70c828ea8c2f7cf182510ae
Author: Lin Feng <linf@wangsu.com>
Date:   Fri Nov 5 13:43:47 2021 -0700

    mm/readahead.c: fix incorrect comments for get_init_ra_size

    In fact, the formatted values returned by get_init_ra_size are not that
    intuitive.  This patch makes the comments reflect the actual behaviour.

    Link: https://lkml.kernel.org/r/20211019104812.135602-1-linf@wangsu.com
    Signed-off-by: Lin Feng <linf@wangsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:30 -04:00
Aristeu Rozanski 54abe3bc39 filemap: Fix serialization adding transparent huge pages to page cache
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: context due missing b4e089d705eef8

commit 00fa15e0d56482e32d8ca1f51d76b0ee00afb16b
Author: Alistair Popple <apopple@nvidia.com>
Date:   Mon Jun 20 19:05:36 2022 +1000

    filemap: Fix serialization adding transparent huge pages to page cache

    Commit 793917d997df ("mm/readahead: Add large folio readahead")
    introduced support for using large folios for filebacked pages if the
    filesystem supports it.

    page_cache_ra_order() was introduced to allocate and add these large
    folios to the page cache. However adding pages to the page cache should
    be serialized against truncation and hole punching by taking
    invalidate_lock. Not doing so can lead to data races resulting in stale
    data getting added to the page cache and marked up-to-date. See commit
    730633f0b7f9 ("mm: Protect operations adding pages to page cache with
    invalidate_lock") for more details.

    This issue was found by inspection but a testcase revealed it was
    possible to observe in practice on XFS. Fix this by taking
    invalidate_lock in page_cache_ra_order(), to mirror what is done for the
    non-thp case in page_cache_ra_unbounded().

    Signed-off-by: Alistair Popple <apopple@nvidia.com>
    Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:22 -04:00
Aristeu Rozanski 746eeb4ed1 mm/readahead: Fix readahead with large folios
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit b9ff43dd27434dbd850b908e2e0e1f6e794efd9b
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Apr 27 17:01:28 2022 -0400

    mm/readahead: Fix readahead with large folios

    Reading 100KB chunks from a big file (eg dd bs=100K) leads to poor
    readahead behaviour.  Studying the traces in detail, I noticed two
    problems.

    The first is that we were setting the readahead flag on the folio which
    contains the last byte read from the block.  This is wrong because we
    will trigger readahead at the end of the read without waiting to see
    if a subsequent read is going to use the pages we just read.  Instead,
    we need to set the readahead flag on the first folio _after_ the one
    which contains the last byte that we're reading.

    The second is that we were looking for the index of the folio with the
    readahead flag set to exactly match the start + size - async_size.
    If we've rounded this, either down (as previously) or up (as now),
    we'll think we hit a folio marked as readahead by a different read,
    and try to read the wrong pages.  So round the expected index to the
    order of the folio we hit.
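
    A sketch of the second fix, in ondemand_readahead() (variable names
    assumed from upstream):

        expected = round_up(ra->start + ra->size - ra->async_size,
                            1UL << order);
        if (index == expected || index == (ra->start + ra->size)) {
                /* we hit our own marker: extend the readahead window */
        }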

    Reported-by: Guo Xuenan <guoxuenan@huawei.com>
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:22 -04:00
Aristeu Rozanski 031883f992 mm/readahead: Switch to page_cache_ra_order
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 56a4d67c264e37014b8392cba9869c7fe904ed1e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Jul 24 23:26:14 2021 -0400

    mm/readahead: Switch to page_cache_ra_order

    do_page_cache_ra() was being exposed for the benefit of
    do_sync_mmap_readahead().  Switch it over to page_cache_ra_order()
    partly because it's a better interface but mostly for the benefit of
    the next patch.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:20 -04:00
Aristeu Rozanski d66f5fc03c mm/readahead: Add large folio readahead
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 793917d997df2e432f3e9ac126e4482d68256d01
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 5 11:27:01 2020 -0500

    mm/readahead: Add large folio readahead

    Allocate large folios in the readahead code when the filesystem supports
    them and it seems worth doing.  The heuristic for choosing which folio
    sizes to use will surely need some tuning, but this aggressive ramp-up
    has been good for testing.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:20 -04:00
Aristeu Rozanski 2858b612a7 fs: Turn do_invalidatepage() into folio_invalidate()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: context due missing 0a4ee518185e9027

commit 5ad6b2bdaaea712486145fa5a78ec24d25289071
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:28 2022 +0000

    fs: Turn do_invalidatepage() into folio_invalidate()

    Take a folio instead of a page, fix the types of the offset & length,
    and export it to filesystems.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:13 -04:00
Aristeu Rozanski 50e92db624 readahead: Convert page_cache_ra_unbounded to folios
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 0387df1d1fa7d6371a7f0603c30c1d8b3bd54eba
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Mar 10 16:06:51 2021 -0500

    readahead: Convert page_cache_ra_unbounded to folios

    This saves 99 bytes of kernel text.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:08 -04:00
Aristeu Rozanski 4f0f6b20a1 readahead: Convert page_cache_async_ra() to take a folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 7836d9990079ed611199819ccf487061b748193a
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 27 12:30:54 2021 -0400

    readahead: Convert page_cache_async_ra() to take a folio

    Using the folio here avoids checking whether it's a tail page.
    This patch mostly just enables some of the following patches.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:08 -04:00
Ming Lei 1e80e9b8ac blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit c97ab271576dec2170e7b804cb05f7617b30fed9
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Apr 20 06:27:19 2022 +0200

    blk-cgroup: remove unneeded includes from <linux/blk-cgroup.h>

    Remove all the includes that aren't actually needed from
    <linux/blk-cgroup.h> and push them to the actual source files where
    needed.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220420042723.1010598-12-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:58:03 +08:00
Ming Lei 394e7fb164 mm: remove spurious blkdev.h includes
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403

commit 518d55051a8c368f1ba8ba1bed837a582f27a584
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Sep 20 14:33:15 2021 +0200

    mm: remove spurious blkdev.h includes

    Various files have acquired spurious includes of <linux/blkdev.h> over
    time.  Remove them.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20210920123328.1399408-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2021-12-06 16:42:47 +08:00
Rafael Aquini 4d9de0c3d3 mm: Protect operations adding pages to page cache with invalidate_lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396

This patch is a backport of the following upstream commit:
commit 730633f0b7f951726e87f912a6323641f674ae34
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jan 28 19:19:45 2021 +0100

    mm: Protect operations adding pages to page cache with invalidate_lock

    Currently, serializing operations such as page fault, read, or readahead
    against hole punching is rather difficult. The basic race scheme is
    like:

    fallocate(FALLOC_FL_PUNCH_HOLE)                 read / fault / ..
      truncate_inode_pages_range()
                                                      <create pages in page
                                                       cache here>
      <update fs block mapping and free blocks>

    Now the problem is in this way read / page fault / readahead can
    instantiate pages in page cache with potentially stale data (if blocks
    get quickly reused). Avoiding this race is not simple - page locks do
    not work because we want to make sure there are *no* pages in given
    range. inode->i_rwsem does not work because page fault happens under
    mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
    the performance for mixed read-write workloads suffer.

    So create a new rw_semaphore in the address_space - invalidate_lock -
    that protects adding of pages to page cache for page faults / reads /
    readahead.
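
    In sketch form, filling paths take the new lock shared while hole
    punching takes it exclusively (helper names as introduced here):

        /* read / page fault / readahead side */
        filemap_invalidate_lock_shared(mapping);
        /* ... add pages to the page cache ... */
        filemap_invalidate_unlock_shared(mapping);

        /* hole-punch / truncate side */
        filemap_invalidate_lock(mapping);
        truncate_inode_pages_range(mapping, start, end);
        /* ... update the fs block mapping and free blocks ... */
        filemap_invalidate_unlock(mapping);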

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jan Kara <jack@suse.cz>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2021-11-29 11:40:22 -05:00
David Howells 3ca2364401 mm: Implement readahead_control pageset expansion
Provide a function, readahead_expand(), that expands the set of pages
specified by a readahead_control object to encompass a revised area with a
proposed size and length.

The proposed area must include all of the old area and may be expanded yet
more by this function so that the edges align on (transparent huge) page
boundaries as allocated.

The expansion will be cut short if a page already exists in either of the
areas being expanded into.  Note that any expansion made in such a case is
not rolled back.

This will be used by fscache so that reads can be expanded to cache granule
boundaries, thereby allowing whole granules to be stored in the cache, but
there are other potential users also.
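
A hedged usage sketch; 'granule' is a hypothetical cache-granule size, not
part of this patch:

    loff_t start = round_down(readahead_pos(ractl), granule);
    size_t len = round_up(readahead_pos(ractl) + readahead_length(ractl),
                          granule) - start;

    readahead_expand(ractl, start, len);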

Changes:
v6:
- Fold in a patch from Matthew Wilcox to tell the ondemand readahead
  algorithm about the expansion so that the next readahead starts at the
  right place[2].

v4:
- Moved the declaration of readahead_expand() to a better place[1].

Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
cc: Alexander Viro <viro@zeniv.linux.org.uk>
cc: Christoph Hellwig <hch@lst.de>
cc: Mike Marshall <hubcap@omnibond.com>
cc: linux-mm@kvack.org
cc: linux-cachefs@redhat.com
cc: linux-afs@lists.infradead.org
cc: linux-nfs@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-fsdevel@vger.kernel.org
Link: https://lore.kernel.org/r/20210217161358.GM2858050@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/20210407201857.3582797-4-willy@infradead.org/ [2]
Link: https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/160588479816.3465195.553952688795241765.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/161539530488.286939.18085961677838089157.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/161653789422.2770958.2108046612147345000.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/161789069829.6155.4295672417565512161.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 10:14:29 +01:00
Matthew Wilcox (Oracle) f615bd5c47 mm/readahead: Handle ractl nr_pages being modified
Filesystems are not currently permitted to modify the number of pages
in the ractl.  An upcoming patch to add readahead_expand() changes that
rule, so remove the check and resync the loop counter after every call
to the filesystem.

Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20210420200116.3715790-1-willy@infradead.org/
Link: https://lore.kernel.org/r/20210421170923.4005574-1-willy@infradead.org/ # v2
2021-04-23 10:14:28 +01:00
Matthew Wilcox (Oracle) fcd9ae4f7f mm/filemap: Pass the file_ra_state in the ractl
For readahead_expand(), we need to modify the file ra_state, so pass it
down by adding it to the ractl.  We have to do this because it's not always
the same as f_ra in the struct file that is already being passed.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Tested-By: Marc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/20210407201857.3582797-2-willy@infradead.org/
Link: https://lore.kernel.org/r/161789067431.6155.8063840447229665720.stgit@warthog.procyon.org.uk/ # v6
2021-04-23 09:25:00 +01:00
Jens Axboe 324bcf54c4 mm: use limited read-ahead to satisfy read
For the case where read-ahead is disabled on the file, or if the cgroup
is congested, ensure that we can at least do 1 page of read-ahead to
make progress on the read in an async fashion. This could potentially be
larger, but it's not needed in terms of functionality, so let's err on
the side of caution as larger counts of pages may run into reclaim
issues (particularly if we're congested).

This makes sure we're not hitting the potentially sync ->readpage() path
for IO that is marked IOCB_WAITQ, which could cause us to block. It also
means we'll use the same path for IO, regardless of whether or not
read-ahead happens to be disabled on the lower level device.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Hao_Xu <haoxu@linux.alibaba.com>
[axboe: updated for new ractl API]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-17 13:49:08 -06:00
David Howells b1647dc0de mm/readahead: pass a file_ra_state into force_page_cache_ra
The file_ra_state being passed into page_cache_sync_readahead() was being
ignored in favour of using the one embedded in the struct file.  The only
caller for which this makes a difference is the fsverity code if the file
has been marked as POSIX_FADV_RANDOM, but it's confusing and worth fixing.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-10-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
Matthew Wilcox (Oracle) fefa7c478f mm/readahead: add page_cache_sync_ra and page_cache_async_ra
Reimplement page_cache_sync_readahead() and page_cache_async_readahead()
as wrappers around versions of the function which take a readahead_control
in preparation for making do_sync_mmap_readahead() pass down an RAC
struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-8-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
David Howells 7b3df3b9ac mm/readahead: pass readahead_control to force_page_cache_ra
Reimplement force_page_cache_readahead() as a wrapper around
force_page_cache_ra().  Pass the existing readahead_control from
page_cache_sync_readahead().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-7-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
David Howells 6e4af69ae9 mm/readahead: make ondemand_readahead take a readahead_control
Make ondemand_readahead() take a readahead_control struct in preparation
for making do_sync_mmap_readahead() pass down an RAC struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
Matthew Wilcox (Oracle) 8238287ead mm/readahead: make do_page_cache_ra take a readahead_control
Rename __do_page_cache_readahead() to do_page_cache_ra() and call it
directly from ondemand_readahead() instead of indirecting via ra_submit().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-5-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
Matthew Wilcox (Oracle) 73bb49da50 mm/readahead: make page_cache_ra_unbounded take a readahead_control
Define it in the callers instead of in page_cache_ra_unbounded().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-4-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:16 -07:00
Matthew Wilcox (Oracle) 1aa83cfa5a mm/readahead: add DEFINE_READAHEAD
Patch series "Readahead patches for 5.9/5.10".

These are infrastructure for both the THP patchset and for the fscache
rewrite.

For both pieces of infrastructure being build on top of this patchset, we
want the ractl to be available higher in the call-stack.

For David's work, he wants to add the 'critical page' to the ractl so that
he knows which page NEEDS to be brought in from storage, and which ones
are nice-to-have.  We might want something similar in block storage too.
It used to be simple -- the first page was the critical one, but then mmap
added fault-around and so for that usecase, the middle page is the
critical one.  Anyway, I don't have any code to show that yet, we just
know that the lowest point in the callchain where we have that information
is do_sync_mmap_readahead() and so the ractl needs to start its life
there.

For THP, we have the code that needs it.  It's actually the apex patch to
the series; the one which finally starts to allocate THPs and present them
to consenting filesystems:
798bcf30ab

This patch (of 8):

Allow for a more concise definition of a struct readahead_control.
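
In sketch form, the macro as introduced (the file_ra_state member is added
to the ractl by a later patch in this history):

    #define DEFINE_READAHEAD(ractl, f, m, i)                        \
            struct readahead_control ractl = {                      \
                    .file = f,                                      \
                    .mapping = m,                                   \
                    ._index = i,                                    \
            }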

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Biggers <ebiggers@google.com>
Cc: David Howells <dhowells@redhat.com>
Link: https://lkml.kernel.org/r/20200903140844.14194-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20200903140844.14194-3-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:15 -07:00
Matthew Wilcox (Oracle) f2c817bed5 mm: use memalloc_nofs_save in readahead path
Ensure that memory allocations in the readahead path do not attempt to
reclaim file-backed pages, which could lead to a deadlock.  It is
possible, though unlikely this is the root cause of a problem observed
by Cong Wang.

Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-16-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:07 -07:00
Matthew Wilcox (Oracle) 2d8163e489 mm: document why we don't set PageReadahead
If the page is already in cache, we don't set PageReadahead on it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-15-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:07 -07:00
Matthew Wilcox (Oracle) 2c684234d3 mm: add page_cache_readahead_unbounded
ext4 and f2fs have duplicated the guts of the readahead code so they can
read past i_size.  Instead, separate out the guts of the readahead code
so they can call it directly.
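
A hedged usage sketch, with the signature assumed from this commit (the
helper was later reworked into page_cache_ra_unbounded(), per the ractl
conversions above):

    /* Read nr_to_read pages starting at 'index', ignoring i_size; a
     * lookahead_size of 0 requests no readahead marker. */
    page_cache_readahead_unbounded(mapping, file, index, nr_to_read, 0);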

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-14-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:06 -07:00