Commit Graph

191 Commits

Rafael Aquini e010e31c9b mm: Fix missing folio invalidation calls during truncation
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 0aa2e1b2fb7a75aa4b5b4347055ccfea6f091769
Author: David Howells <dhowells@redhat.com>
Date:   Fri Aug 23 21:08:09 2024 +0100

    mm: Fix missing folio invalidation calls during truncation

    When AS_RELEASE_ALWAYS is set on a mapping, the ->release_folio() and
    ->invalidate_folio() calls should be invoked even if PG_private and
    PG_private_2 aren't set.  This is used by netfslib to keep track of the
    point above which reads can be skipped in favour of just zeroing pagecache
    locally.

    There are a couple of places in truncation in which invalidation is only
    called when folio_has_private() is true.  Fix these to check
    folio_needs_release() instead.

    Without this, the generic/075 and generic/112 xfstests (both fsx-based
    tests) fail with minimum folio size patches applied[1].

    Fixes: b4fa966f03b7 ("mm, netfs, fscache: stop read optimisation when folio removed from pagecache")
    Signed-off-by: David Howells <dhowells@redhat.com>
    Link: https://lore.kernel.org/r/20240815090849.972355-1-kernel@pankajraghav.com/ [1]
    Link: https://lore.kernel.org/r/20240823200819.532106-2-dhowells@redhat.com
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    cc: Pankaj Raghav <p.raghav@samsung.com>
    cc: Jeff Layton <jlayton@kernel.org>
    cc: Marc Dionne <marc.dionne@auristor.com>
    cc: linux-afs@lists.infradead.org
    cc: netfs@lists.linux.dev
    cc: linux-mm@kvack.org
    cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:25:34 -05:00
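
A minimal sketch of the folio_needs_release() helper this fix switches to
(the upstream definition lives in mm/internal.h; shown here for reference):

    static inline bool folio_needs_release(struct folio *folio)
    {
            struct address_space *mapping = folio_mapping(folio);

            /* release even without PG_private when AS_RELEASE_ALWAYS is set */
            return folio_has_private(folio) ||
                    (mapping && mapping_release_always(mapping));
    }
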
Rafael Aquini 288fab6492 mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
  * virt/kvm/guest_memfd.c: difference in the hunk due to RHEL missing upstream
    commit 1d23040caa8b ("KVM: guest_memfd: Use AS_INACCESSIBLE when creating
    guest_memfd inode") which would end up being reverted with this follow-up fix.

This patch is a backport of the following upstream commit:
commit 27e6a24a4cf3d25421c0f6ebb7c39f45fc14d20f
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Jul 11 13:56:54 2024 -0400

    mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE

    The flags AS_UNMOVABLE and AS_INACCESSIBLE were both added just for guest_memfd;
    AS_UNMOVABLE is already in existing versions of Linux, while AS_INACCESSIBLE was
    acked for inclusion in 6.11.

    But really, they are the same thing: only guest_memfd uses them, at least for
    now, and guest_memfd pages are unmovable because they should not be
    accessed by the CPU.

    So merge them into one; use the AS_INACCESSIBLE name which is more comprehensive.
    At the same time, this fixes an embarrassing bug where AS_INACCESSIBLE was used
    as a bit mask, despite it being just a bit index.

    The bug was mostly benign, because AS_INACCESSIBLE's bit representation (1010)
    corresponded to setting AS_UNEVICTABLE (which is already set) and AS_ENOSPC
    (except no async writes can happen on the guest_memfd).  So the AS_INACCESSIBLE
    flag simply had no effect.

    Fixes: 1d23040caa8b ("KVM: guest_memfd: Use AS_INACCESSIBLE when creating guest_memfd inode")
    Fixes: c72ceafbd12c ("mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory")
    Cc: linux-mm@kvack.org
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: David Hildenbrand <david@redhat.com>
    Tested-by: Michael Roth <michael.roth@amd.com>
    Reviewed-by: Michael Roth <michael.roth@amd.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:25:25 -05:00
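
A short illustration of the bit-index vs. bit-mask mix-up described above
(flag values are assumptions taken from the "1010" note in the message; the
real indices live in enum mapping_flags in include/linux/pagemap.h):

    /* address_space flags are bit indices, not masks:
     * AS_ENOSPC == 1, AS_UNEVICTABLE == 3, AS_INACCESSIBLE == 10 (0b1010) */

    /* buggy: index used as a mask, which really sets bits 1 and 3 */
    mapping->flags |= AS_INACCESSIBLE;

    /* correct: set_bit() converts the bit index into a mask */
    set_bit(AS_INACCESSIBLE, &mapping->flags);
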
Rafael Aquini 9af1ce9e46 mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory
JIRA: https://issues.redhat.com/browse/RHEL-27745
Conflicts:
  * include/linux/pagemap.h: context differences due to RHEL out-of-order
    backports of commits 762321dab9a7 ("filemap: add a per-mapping stable
    writes flag"), and 0003e2a41468 ("mm: Add AS_UNMOVABLE to mark mapping
    as completely unmovable"), causing the latter to end up missing the
    conflict resolution with the former and its fix through merge commit
    136292522e43 (which we apply here).

This patch is a backport of the following upstream commit:
commit c72ceafbd12cf95e088681ae5e535ef1a78bf0ed
Author: Michael Roth <michael.roth@amd.com>
Date:   Fri Mar 29 16:24:42 2024 -0500

    mm: Introduce AS_INACCESSIBLE for encrypted/confidential memory

    filemap users like guest_memfd may use page cache pages to
    allocate/manage memory that is only intended to be accessed by guests
    via hardware protections like encryption. Writes to memory of this sort
    in common paths like truncation may cause unexpected behavior such as
    writing garbage instead of zeros when attempting to zero pages, or
    worse, triggering hardware protections that are considered fatal as far
    as the kernel is concerned.

    Introduce a new address_space flag, AS_INACCESSIBLE, and use this
    initially to prevent zeroing of pages during truncation, with the
    understanding that it is up to the owner of the mapping to handle this
    specially if needed.

    This is admittedly a rather blunt solution, but it seems there are
    no other places that need to take the flag into account in order to
    keep its promise.

    Link: https://lore.kernel.org/lkml/ZR9LYhpxTaTk6PJX@google.com/
    Cc: Matthew Wilcox <willy@infradead.org>
    Suggested-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Michael Roth <michael.roth@amd.com>
    Message-ID: <20240329212444.395559-5-michael.roth@amd.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:25:10 -05:00
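
A sketch of how the flag is consumed (helper names as they look after the
follow-up merge above; the truncation hunk is paraphrased):

    static inline void mapping_set_inaccessible(struct address_space *mapping)
    {
            /* the owner of the mapping drops the cache itself; common paths
             * such as truncation must not write to the folio contents */
            set_bit(AS_INACCESSIBLE, &mapping->flags);
    }

    static inline bool mapping_inaccessible(struct address_space *mapping)
    {
            return test_bit(AS_INACCESSIBLE, &mapping->flags);
    }

    /* in truncate_inode_partial_folio(), roughly: */
    if (!mapping_inaccessible(folio->mapping))
            folio_zero_range(folio, offset, length);
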
Rafael Aquini d2f2fe5ae8 mm: remove invalidate_inode_page()
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 2033c98cce666b0d125ae956613ab5111bb8d202
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Nov 8 18:28:09 2023 +0000

    mm: remove invalidate_inode_page()

    All callers are now converted to call mapping_evict_folio().

    Link: https://lkml.kernel.org/r/20231108182809.602073-7-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:23:38 -05:00
Rafael Aquini 3c11b7e193 mm: make mapping_evict_folio() the preferred way to evict clean folios
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 1e12cbb9f69541181afab6b1ff358b4f1dd3e253
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Nov 8 18:28:04 2023 +0000

    mm: make mapping_evict_folio() the preferred way to evict clean folios

    Patch series "Fix fault handler's handling of poisoned tail pages".

    Since introducing the ability to have large folios in the page cache, it's
    been possible to have a hwpoisoned tail page returned from the fault
    handler.  We handle this situation poorly, failing to remove the affected
    page from use.

    This isn't a minimal patch to fix it, it's a full conversion of all the
    code surrounding it.

    This patch (of 6):

    invalidate_inode_page() does very little beyond calling
    mapping_evict_folio().  Move the check for mapping being NULL into
    mapping_evict_folio() and make it available to the rest of the MM for use
    in the next few patches.

    Link: https://lkml.kernel.org/r/20231108182809.602073-1-willy@infradead.org
    Link: https://lkml.kernel.org/r/20231108182809.602073-2-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:23:34 -05:00
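
A condensed sketch of mapping_evict_folio() as described, with the NULL
mapping check folded in (modeled on the upstream mm/truncate.c shape):

    long mapping_evict_folio(struct address_space *mapping, struct folio *folio)
    {
            /* the folio may have been truncated before it was locked */
            if (!mapping)
                    return 0;
            if (folio_test_dirty(folio) || folio_test_writeback(folio))
                    return 0;
            /* the refcount is elevated if any page in the folio is mapped */
            if (folio_ref_count(folio) >
                            folio_nr_pages(folio) + folio_has_private(folio) + 1)
                    return 0;
            if (!filemap_release_folio(folio, 0))
                    return 0;

            return remove_mapping(mapping, folio);
    }
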
Rafael Aquini 5d6754d7f7 mm: invalidation check mapping before folio_contains
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit aa5b9178c01905d7691512b366cf2886dfe2680c
Author: Hugh Dickins <hughd@google.com>
Date:   Tue Aug 8 21:36:12 2023 -0700

    mm: invalidation check mapping before folio_contains

    Enabling tmpfs "direct IO" exposes it to invalidate_inode_pages2_range(),
    which when swapping can hit the VM_BUG_ON_FOLIO(!folio_contains()): the
    folio has been moved from page cache to swap cache (with folio->mapping
    reset to NULL), but the folio_index() embedded in folio_contains() sees
    swapcache, and so returns the swapcache_index() - whereas folio->index
    would be the right one to check against the index from mapping's xarray.

    There are different ways to fix this, but my preference is just to order
    the checks in invalidate_inode_pages2_range() the same way that they are
    in __filemap_get_folio() and find_lock_entries() and filemap_fault():
    check folio->mapping before folio_contains().

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Message-Id: <f0b31772-78d7-f198-6482-9f25aab8c13f@google.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:21:35 -04:00
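
The reordering boils down to checking folio->mapping first; a paraphrased
fragment of the invalidate_inode_pages2_range() loop after the fix:

    folio_lock(folio);
    if (unlikely(folio->mapping != mapping)) {
            /* truncated, or moved to the swap cache: nothing to do here */
            folio_unlock(folio);
            continue;
    }
    /* safe now: folio->index is the pagecache index, not a swapcache index */
    VM_BUG_ON_FOLIO(!folio_contains(folio, indices[i]), folio);
    folio_wait_writeback(folio);
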
Rafael Aquini 89b7c01962 mm: increase usage of folio_next_index() helper
JIRA: https://issues.redhat.com/browse/RHEL-27743

This patch is a backport of the following upstream commit:
commit 87b11f862254396a93636f0998377ac3f6648f5f
Author: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Date:   Tue Jun 27 10:43:49 2023 -0700

    mm: increase usage of folio_next_index() helper

    Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
    the existing helper folio_next_index().

    Link: https://lkml.kernel.org/r/20230627174349.491803-1-sidhartha.kumar@oracle.com
    Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Suggested-by: Christoph Hellwig <hch@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: Andreas Dilger <adilger.kernel@dilger.ca>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-10-01 11:17:23 -04:00
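
The helper and the pattern it replaces (include/linux/pagemap.h):

    /* return the index of the folio that follows this one in the file */
    static inline pgoff_t folio_next_index(struct folio *folio)
    {
            return folio->index + folio_nr_pages(folio);
    }

    /* before */
    index = folio->index + folio_nr_pages(folio);
    /* after */
    index = folio_next_index(folio);
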
Rafael Aquini 25e4aa840e mm: remove references to pagevec
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 1fec6890bf2247ecc93f5491c2d3f33c333d5c6e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 21 17:45:56 2023 +0100

    mm: remove references to pagevec

    Most of these should just refer to the LRU cache rather than the data
    structure used to implement the LRU cache.

    Link: https://lkml.kernel.org/r/20230621164557.3510324-13-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:37:32 -04:00
Rafael Aquini d0df95d609 mm: rename invalidate_mapping_pagevec to mapping_try_invalidate
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit 1a0fc811f5f5addf54499826bd1b6e34e917491c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 21 17:45:55 2023 +0100

    mm: rename invalidate_mapping_pagevec to mapping_try_invalidate

    We don't use pagevecs for the LRU cache any more, and we don't know that
    the failed invalidations were due to the folio being in an LRU cache.  So
    rename it to be more accurate.

    Link: https://lkml.kernel.org/r/20230621164557.3510324-12-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:37:31 -04:00
Chris von Recklinghausen c0446613df mm: return an ERR_PTR from __filemap_get_folio
Conflicts:
	fs/nilfs2/page.c - We already have
		f6e0e1734424 ("nilfs2: Convert nilfs_copy_back_pages() to use filemap_get_folios()")
		so use folios instead of pages
	fs/smb/client/cifsfs.c - The backport of
		7b2404a886f8 ("cifs: Fix flushing, invalidation and file size with copy_file_range()")
		cited the lack of this patch as a conflict. Fix it.

JIRA: https://issues.redhat.com/browse/RHEL-27741

commit 66dabbb65d673aef40dd17bf62c042be8f6d4a4b
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Mar 7 15:34:10 2023 +0100

    mm: return an ERR_PTR from __filemap_get_folio

    Instead of returning NULL for all errors, distinguish between:

     - no entry found and not asked to allocate (-ENOENT)
     - failed to allocate memory (-ENOMEM)
     - would block (-EAGAIN)

    so that callers don't have to guess the error based on the passed in
    flags.

    Also pass the error through to the direct callers: filemap_get_folio,
    filemap_lock_folio, filemap_grab_folio and filemap_get_incore_folio.

    [hch@lst.de: fix null-pointer deref]
      Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de
      Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004
    Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2]
    Cc: Andreas Gruenbacher <agruenba@redhat.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2024-04-30 07:00:24 -04:00
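
Callers switch from NULL checks to IS_ERR() checks; a sketch of the adjusted
pattern (the surrounding code is illustrative, not taken from any one caller):

    struct folio *folio;

    folio = filemap_lock_folio(mapping, index);
    if (IS_ERR(folio))
            return PTR_ERR(folio);  /* -ENOENT when nothing is cached */
    /* ... work on the locked folio ... */
    folio_unlock(folio);
    folio_put(folio);
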
Audra Mitchell fb208bc6ad filemap: find_get_entries() now updates start offset
JIRA: https://issues.redhat.com/browse/RHEL-27739

This patch is a backport of the following upstream commit:
commit 9fb6beea79c6e7c959adf4fb7b94cf9a6028b941
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Mon Oct 17 09:18:00 2022 -0700

    filemap: find_get_entries() now updates start offset

    Initially, find_get_entries() was being passed in the start offset as a
    value.  That left the calculation of the offset to the callers.  This led
    to complexity in the callers trying to keep track of the index.

    Now find_get_entries() takes in a pointer to the start offset and updates
    the value to be directly after the last entry found.  If no entry is
    found, the offset is not changed.  This gets rid of multiple hacky
    calculations that kept track of the start offset.

    Link: https://lkml.kernel.org/r/20221017161800.2003-3-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:50 -04:00
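
With the new convention the caller passes a pointer and no longer recomputes
the index itself; roughly:

    pgoff_t indices[PAGEVEC_SIZE];
    struct folio_batch fbatch;
    pgoff_t index = start;

    folio_batch_init(&fbatch);
    while (find_get_entries(mapping, &index, end, &fbatch, indices)) {
            /* process the batch; 'index' now points past the last entry found */
            folio_batch_release(&fbatch);
            cond_resched();
    }
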
Audra Mitchell 765c2fd97b filemap: find_lock_entries() now updates start offset
JIRA: https://issues.redhat.com/browse/RHEL-27739
Conflicts:
    Context conflict due to out of order backport:
    9efa394ef3 ("tmpfs: fix data loss from failed fallocate")

This patch is a backport of the following upstream commit:
commit 3392ca121872dd8c33015c7703d4981c78819be3
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Mon Oct 17 09:17:59 2022 -0700

    filemap: find_lock_entries() now updates start offset

    Patch series "Rework find_get_entries() and find_lock_entries()", v3.

    Originally the callers of find_get_entries() and find_lock_entries() were
    keeping track of the start index themselves as they traverse the search
    range.

    This resulted in hacky code such as in shmem_undo_range():

                            index = folio->index + folio_nr_pages(folio) - 1;

    where the - 1 is only present to stay in the right spot after incrementing
    index later.  This sort of calculation was also being done on every folio
    despite not even using index later within that function.

    These patches change find_get_entries() and find_lock_entries() to
    calculate the new index instead of leaving it to the callers so we can
    avoid all these complications.

    This patch (of 2):

    Initially, find_lock_entries() was being passed in the start offset as a
    value.  That left the calculation of the offset to the callers.  This led
    to complexity in the callers trying to keep track of the index.

    Now find_lock_entries() takes in a pointer to the start offset and updates
    the value to be directly after the last entry found.  If no entry is
    found, the offset is not changed.  This gets rid of multiple hacky
    calculations that kept track of the start offset.

    Link: https://lkml.kernel.org/r/20221017161800.2003-1-vishal.moola@gmail.com
    Link: https://lkml.kernel.org/r/20221017161800.2003-2-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Audra Mitchell <audra@redhat.com>
2024-04-09 09:42:50 -04:00
Chris von Recklinghausen d680c6a64b folio-compat: remove lru_cache_add()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 6e1ca48d0669b0f5efcbaa051b23cd8e651a1614
Author: Vishal Moola (Oracle) <vishal.moola@gmail.com>
Date:   Tue Nov 1 10:53:26 2022 -0700

    folio-compat: remove lru_cache_add()

    There are no longer any callers of lru_cache_add(), so remove it.  This
    saves 79 bytes of kernel text.  Also clean up some comments such that
    they reference the new folio_add_lru() instead.

    Link: https://lkml.kernel.org/r/20221101175326.13265-6-vishal.moola@gmail.com
    Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:30 -04:00
Chris von Recklinghausen a2c59e7f9d mm: add split_folio()
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit d788f5b374c2ba204fed57e39acf2452acc24812
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Sep 2 20:46:00 2022 +0100

    mm: add split_folio()

    This wrapper removes a need to use split_huge_page(&folio->page).  Convert
    two callers.

    Link: https://lkml.kernel.org/r/20220902194653.1739778-5-willy@infradead.org
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:13:51 -04:00
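
The wrapper itself amounts to little more than (exact form and header
location assumed; it simply forwards to the existing huge page split):

    static inline int split_folio(struct folio *folio)
    {
            return split_huge_page(&folio->page);
    }
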
Dave Wysochanski 924daddc03 mm: merge folio_has_private()/filemap_release_folio() call pairs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2209756

Patch series "mm, netfs, fscache: Stop read optimisation when folio
removed from pagecache", v7.

This fixes an optimisation in fscache whereby we don't read from the cache
for a particular file until we know that there's data there that we don't
have in the pagecache.  The problem is that I'm no longer using PG_fscache
(aka PG_private_2) to indicate that the page is cached and so I don't get
a notification when a cached page is dropped from the pagecache.

The first patch merges some folio_has_private() and
filemap_release_folio() pairs and introduces a helper,
folio_needs_release(), to indicate if a release is required.

The second patch is the actual fix.  Following Willy's suggestions[1], it
adds an AS_RELEASE_ALWAYS flag to an address_space that will make
filemap_release_folio() always call ->release_folio(), even if
PG_private/PG_private_2 aren't set.  folio_needs_release() is altered to
add a check for this.

This patch (of 2):

Make filemap_release_folio() check folio_has_private().  Then, in most
cases, where a call to folio_has_private() is immediately followed by a
call to filemap_release_folio(), we can get rid of the test in the pair.

There are a couple of sites in mm/vmscan.c where this can't so easily be
done.  In shrink_folio_list(), there are actually three cases (something
different is done for incompletely invalidated buffers), but
filemap_release_folio() elides two of them.

In shrink_active_list(), we don't have the folio lock yet, so the
check allows us to avoid locking the page unnecessarily.

A wrapper function to check if a folio needs release is provided for those
places that still need to do it in the mm/ directory.  This will acquire
additional parts to the condition in a future patch.

After this, the only remaining caller of folio_has_private() outside of
mm/ is a check in fuse.

Link: https://lkml.kernel.org/r/20230628104852.3391651-1-dhowells@redhat.com
Link: https://lkml.kernel.org/r/20230628104852.3391651-2-dhowells@redhat.com
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steve French <sfrench@samba.org>
Cc: Shyam Prasad N <nspmangalore@gmail.com>
Cc: Rohith Surabattula <rohiths.msft@gmail.com>
Cc: Dave Wysochanski <dwysocha@redhat.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Xiubo Li <xiubli@redhat.com>
Cc: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0201ebf274a306a6ebb95e5dc2d6a0a27c737cac)
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
2023-09-13 18:19:41 -04:00
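
The typical before/after shape of the merged call pairs (paraphrased):

    /* before: callers tested for private data themselves */
    if (folio_has_private(folio) &&
        !filemap_release_folio(folio, GFP_KERNEL))
            return 0;

    /* after: filemap_release_folio() returns true when there is nothing
     * to release, so the explicit folio_has_private() test goes away */
    if (!filemap_release_folio(folio, GFP_KERNEL))
            return 0;
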
Chris von Recklinghausen c991a31fce mm: Remove __delete_from_page_cache()
Bugzilla: https://bugzilla.redhat.com/2160210

commit 6ffcd825e7d0416d78fd41cd5b7856a78122cc8c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Jun 28 20:41:40 2022 -0400

    mm: Remove __delete_from_page_cache()

    This wrapper is no longer used.  Remove it and all references to it.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:19:15 -04:00
Chris von Recklinghausen 50de8bd114 fs: Remove aops->invalidatepage
Bugzilla: https://bugzilla.redhat.com/2160210

commit f50015a596fa106bf642bd85fbf6e6b52cc913d0
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:51 2022 +0000

    fs: Remove aops->invalidatepage

    With all users migrated to ->invalidate_folio, remove the old operation.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-03-24 11:18:47 -04:00
Chris von Recklinghausen ff89a3a83f fs: Turn block_invalidatepage into block_invalidate_folio
Bugzilla: https://bugzilla.redhat.com/2120352

commit 7ba13abbd31ee9265e88d7dc029c0f786e665192
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:34 2022 +0000

    fs: Turn block_invalidatepage into block_invalidate_folio

    Remove special-casing of a NULL invalidatepage, since there is no
    more block_invalidatepage.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:48 -04:00
Chris von Recklinghausen f053c9cc69 mm: remove cleancache
Conflicts: Drop changes to fs/btrfs/extent_io.c, fs/ntfs3/ntfs_fs.h - We don't
	build them

Bugzilla: https://bugzilla.redhat.com/2120352

commit 0a4ee518185e902758191d968600399f3bc2be31
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jan 21 22:14:34 2022 -0800

    mm: remove cleancache

    Patch series "remove Xen tmem leftovers".

    Since the removal of the Xen tmem driver in 2019, the cleancache hooks
    are entirely unused, as are large parts of frontswap.  This series
    against linux-next (with the folio changes included) removes
    cleancaches, and cuts down frontswap to the bits actually used by zswap.

    This patch (of 13):

    The cleancache subsystem is unused since the removal of Xen tmem driver
    in commit 814bbf49dc ("xen: remove tmem driver").

    [akpm@linux-foundation.org: remove now-unreachable code]

    Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de
    Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Juergen Gross <jgross@suse.com>
    Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Seth Jennings <sjenning@redhat.com>
    Cc: Dan Streetman <ddstreet@ieee.org>
    Cc: Vitaly Wool <vitaly.wool@konsulko.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:42 -04:00
Chris von Recklinghausen 3a3ee2bede vfs: keep inodes with page cache off the inode shrinker LRU
Conflicts:
	mm/filemap.c - We already have
		452e9e6992fe ("filemap: Add filemap_remove_folio and __filemap_remove_folio")
		so just add the spin_lock call.
	mm/truncate.c - The backport of
		51dcbdac28d4 ("mm: Convert find_lock_entries() to use a folio_batch")
		listed the lack of this patch as a conflict. Keep the
		'fbatch->nr = j;' line.
	mm/vmscan.c - We already have
		be7c07d60e13 ("mm/vmscan: Convert __remove_mapping() to take a folio")
		so change a couple of lines from 'if (!PageSwapCache(page))'
		to 'if (!folio_test_swapcache(folio))'

Bugzilla: https://bugzilla.redhat.com/2120352

commit 51b8c1fe250d1bd70c1722dc3c414f5cff2d7cca
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Mon Nov 8 18:31:24 2021 -0800

    vfs: keep inodes with page cache off the inode shrinker LRU

    Historically (pre-2.5), the inode shrinker used to reclaim only empty
    inodes and skip over those that still contained page cache.  This caused
    problems on highmem hosts: struct inode could put fill lowmem zones
    before the cache was getting reclaimed in the highmem zones.

    To address this, the inode shrinker started to strip page cache to
    facilitate reclaiming lowmem.  However, this comes with its own set of
    problems: the shrinkers may drop actively used page cache just because
    the inodes are not currently open or dirty - think working with a large
    git tree.  It further doesn't respect cgroup memory protection settings
    and can cause priority inversions between containers.

    Nowadays, the page cache also holds non-resident info for evicted cache
    pages in order to detect refaults.  We've come to rely heavily on this
    data inside reclaim for protecting the cache workingset and driving swap
    behavior.  We also use it to quantify and report workload health through
    psi.  The latter in turn is used for fleet health monitoring, as well as
    driving automated memory sizing of workloads and containers, proactive
    reclaim and memory offloading schemes.

    The consequence of dropping page cache prematurely is that we're seeing
    subtle and not-so-subtle failures in all of the above-mentioned
    scenarios, with the workload generally entering unexpected thrashing
    states while losing the ability to reliably detect it.

    To fix this on non-highmem systems at least, going back to rotating
    inodes on the LRU isn't feasible.  We've tried (commit a76cf1a474
    ("mm: don't reclaim inodes with many attached pages")) and failed
    (commit 69056ee6a8 ("Revert "mm: don't reclaim inodes with many
    attached pages"")).

    The issue is mostly that shrinker pools attract pressure based on their
    size, and when objects get skipped the shrinkers remember this as
    deferred reclaim work.  This accumulates excessive pressure on the
    remaining inodes, and we can quickly eat into heavily used ones, or
    dirty ones that require IO to reclaim, when there potentially is plenty
    of cold, clean cache around still.

    Instead, this patch keeps populated inodes off the inode LRU in the
    first place - just like an open file or dirty state would.  An otherwise
    clean and unused inode then gets queued when the last cache entry
    disappears.  This solves the problem without reintroducing the reclaim
    issues, and generally is a bit more scalable than having to wade through
    potentially hundreds of thousands of busy inodes.

    Locking is a bit tricky because the locks protecting the inode state
    (i_lock) and the inode LRU (lru_list.lock) don't nest inside the
    irq-safe page cache lock (i_pages.xa_lock).  Page cache deletions are
    serialized through i_lock, taken before the i_pages lock, to make sure
    depopulated inodes are queued reliably.  Additions may race with
    deletions, but we'll check again in the shrinker.  If additions race
    with the shrinker itself, we're protected by the i_lock: if find_inode()
    or iput() win, the shrinker will bail on the elevated i_count or
    I_REFERENCED; if the shrinker wins and goes ahead with the inode, it
    will set I_FREEING and inhibit further igets(), which will cause the
    other side to create a new instance of the inode instead.

    Link: https://lkml.kernel.org/r/20210614211904.14420-4-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Dave Chinner <david@fromorbit.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2022-10-12 07:27:31 -04:00
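
The resulting lock ordering on the deletion side, sketched from the upstream
shape of filemap_remove_folio() after this change:

    void filemap_remove_folio(struct folio *folio)
    {
            struct address_space *mapping = folio->mapping;

            BUG_ON(!folio_test_locked(folio));
            /* i_lock nests outside the irq-safe i_pages lock, so an inode
             * whose cache just became empty is queued on the LRU reliably */
            spin_lock(&mapping->host->i_lock);
            xa_lock_irq(&mapping->i_pages);
            __filemap_remove_folio(folio, NULL);
            xa_unlock_irq(&mapping->i_pages);
            if (mapping_shrinkable(mapping))
                    inode_add_lru(mapping->host);
            spin_unlock(&mapping->host->i_lock);

            filemap_free_folio(mapping, folio);
    }
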
Aristeu Rozanski 528faa0405 mm/truncate: Combine invalidate_mapping_pagevec() and __invalidate_mapping_pages()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit c56109dd35c9204cd6c49d2116ef36e5044ef867
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Feb 13 17:22:10 2022 -0500

    mm/truncate: Combine invalidate_mapping_pagevec() and __invalidate_mapping_pages()

    We can save a function call by combining these two functions, which
    are identical except for the return value.  Also move the prototype
    to mm/internal.h.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:17 -04:00
Aristeu Rozanski b86ef54bdd mm: Turn deactivate_file_page() into deactivate_file_folio()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 261b6840ed10419ac2f554e515592d59dd5c82cf
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Feb 13 16:40:24 2022 -0500

    mm: Turn deactivate_file_page() into deactivate_file_folio()

    This function has one caller which already has a reference to the
    page, so we don't need to use get_page_unless_zero().  Also move the
    prototype to mm/internal.h.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:17 -04:00
Aristeu Rozanski fc7fcd9b05 mm/truncate: Convert __invalidate_mapping_pages() to use a folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit b4545f46533b7e69cb20e05c9fe987be76e1a3da
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Feb 13 16:38:07 2022 -0500

    mm/truncate: Convert __invalidate_mapping_pages() to use a folio

    Now we can call mapping_evict_folio() instead of invalidate_inode_page()
    and save a few calls to compound_head().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:17 -04:00
Aristeu Rozanski 5a8509634f mm/truncate: Split invalidate_inode_page() into mapping_evict_folio()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit d6c75dc22c755c567838f12f12a16f2a323ebd4e
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Feb 13 15:22:28 2022 -0500

    mm/truncate: Split invalidate_inode_page() into mapping_evict_folio()

    Some of the callers already have the address_space and can avoid calling
    folio_mapping() and checking if the folio was already truncated.  Also
    add kernel-doc and fix the return type (in case we ever support folios
    larger than 4TB).

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:17 -04:00
Aristeu Rozanski 275becb2f3 mm: Convert remove_mapping() to take a folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 5100da38ef3c33d9ad8b60b29c2b671249bf7e1d
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Feb 12 22:48:55 2022 -0500

    mm: Convert remove_mapping() to take a folio

    Add kernel-doc and return the number of pages removed in order to
    get the statistics right in __invalidate_mapping_pages().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:17 -04:00
Aristeu Rozanski 88d103c7f7 mm/truncate: Replace page_mapped() call in invalidate_inode_page()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit e41c81d0d30e1a6ebf408feaf561f80cac4457dc
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Feb 12 17:43:16 2022 -0500

    mm/truncate: Replace page_mapped() call in invalidate_inode_page()

    folio_mapped() is expensive because it has to check each page's mapcount
    field.  A cheaper check is whether there are any extra references to
    the page, other than the one we own, one from the page private data and
    the ones held by the page cache.

    The call to remove_mapping() will fail in any case if it cannot freeze
    the refcount, but failing here avoids cycling the i_pages spinlock.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:17 -04:00
Aristeu Rozanski 8ce712a2b5 mm/truncate: Convert invalidate_inode_page() to use a folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 4418481396b2caeded6d0eed11ac9052ab4c0763
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Feb 12 17:39:10 2022 -0500

    mm/truncate: Convert invalidate_inode_page() to use a folio

    This saves a number of calls to compound_head().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:16 -04:00
Aristeu Rozanski e7d8da2114 mm/truncate: Inline invalidate_complete_page() into its one caller
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: differences due to missing simplification from 43b93121056c52

commit 1b8ddbeeb9b819e62b7190115023ce858a159f5c
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Feb 12 15:27:42 2022 -0500

    mm/truncate: Inline invalidate_complete_page() into its one caller

    invalidate_inode_page() is the only caller of invalidate_complete_page()
    and inlining it reveals that the first check is unnecessary (because we
    hold the page locked, and we just retrieved the mapping from the page).
    Actually, it does make a difference, in that tail pages no longer fail
    at this check, so it's now possible to remove a tail page from a mapping.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:16 -04:00
Aristeu Rozanski 36d1439564 fs: Add aops->launder_folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: we didn't backport is_partially_uptodate() (8ab22b9abb)

commit affa80e8c6a1df473694c2087259901872309cc4
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:52 2022 +0000

    fs: Add aops->launder_folio

    Since the only difference between ->launder_page and ->launder_folio
    is the type of the pointer, these can safely use a union without
    affecting bisectability.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:14 -04:00
Aristeu Rozanski 0f8e8a5ec4 fs: Add invalidate_folio() aops method
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 128d1f8241d62ab014eef6dd4ef9bb977dbeadb2
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:32 2022 +0000

    fs: Add invalidate_folio() aops method

    This is used in preference to invalidatepage, if defined.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:13 -04:00
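
The preference is implemented in the common invalidation helper, roughly
(both callbacks coexist during the conversion):

    void folio_invalidate(struct folio *folio, size_t offset, size_t length)
    {
            const struct address_space_operations *aops = folio->mapping->a_ops;

            if (aops->invalidate_folio)
                    aops->invalidate_folio(folio, offset, length);
            else if (aops->invalidatepage)
                    aops->invalidatepage(&folio->page, offset, length);
    }
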
Aristeu Rozanski 2858b612a7 fs: Turn do_invalidatepage() into folio_invalidate()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: context due to missing 0a4ee518185e9027

commit 5ad6b2bdaaea712486145fa5a78ec24d25289071
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Feb 9 20:21:28 2022 +0000

    fs: Turn do_invalidatepage() into folio_invalidate()

    Take a folio instead of a page, fix the types of the offset & length,
    and export it to filesystems.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
    Tested-by: David Howells <dhowells@redhat.com> # afs

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:13 -04:00
Aristeu Rozanski 4a1432e31d truncate,shmem: Handle truncates that split large folios
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit b9a8a4195c7d3a51235a4fc974a46ad4e9689ffd
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed May 27 17:59:22 2020 -0400

    truncate,shmem: Handle truncates that split large folios

    Handle folio splitting in the parts of the truncation functions which
    already handle partial pages.  Factor all that code out into a new
    function called truncate_inode_partial_folio().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:11 -04:00
Aristeu Rozanski 07d2b88c7e truncate: Convert invalidate_inode_pages2_range to folios
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit f6357c3a9d3ea5a00c5bf52845b633d649da6722
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu May 20 08:17:44 2021 -0400

    truncate: Convert invalidate_inode_pages2_range to folios

    If we're going to unmap a folio, we have to be sure to unmap the entire
    folio, not just the part of it which lies after the search index.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:11 -04:00
Aristeu Rozanski cc07877b40 mm: Remove pagevec_remove_exceptionals()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 1613fac9aaf840af76faa747ea428a714af98dbd
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Dec 7 14:28:49 2021 -0500

    mm: Remove pagevec_remove_exceptionals()

    All of its callers now call folio_batch_remove_exceptionals().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:11 -04:00
Aristeu Rozanski 59e51bcf24 mm: Convert find_lock_entries() to use a folio_batch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: context due to missing 51b8c1fe250d1bd70c17

commit 51dcbdac28d4dde915f78adf08bb3fac87f516e9
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Dec 7 14:15:07 2021 -0500

    mm: Convert find_lock_entries() to use a folio_batch

    find_lock_entries() already only returned the head page of folios, so
    convert it to return a folio_batch instead of a pagevec.  That cascades
    through converting truncate_inode_pages_range() to
    delete_from_page_cache_batch() and page_cache_delete_batch().

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:11 -04:00
Aristeu Rozanski 433ab58b7a filemap: Return only folios from find_get_entries()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 0e499ed3d7a216706e02eeded562627d3e69dcfd
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Tue Sep 1 23:17:50 2020 -0400

    filemap: Return only folios from find_get_entries()

    The callers have all been converted to work on folios, so convert
    find_get_entries() to return a batch of folios instead of pages.
    We also now return multiple large folios in a single call.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:10 -04:00
Aristeu Rozanski 2e4f1700b5 truncate: Add invalidate_complete_folio2()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites
Conflicts: context due to missing 51b8c1fe250d1bd70c17

commit 78f426608f21c997975adb96641b7ac82d4d15b1
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jul 28 15:52:34 2021 -0400

    truncate: Add invalidate_complete_folio2()

    Convert invalidate_complete_page2() to invalidate_complete_folio2().
    Use filemap_free_folio() to free the page instead of calling ->freepage
    manually.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:10 -04:00
Aristeu Rozanski 59c269217b truncate: Convert invalidate_inode_pages2_range() to use a folio
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit fae9bc4a90176868cbbbecc693acb0ff2607818d
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Dec 2 23:25:01 2021 -0500

    truncate: Convert invalidate_inode_pages2_range() to use a folio

    If we're going to unmap a folio, we have to be sure to unmap the entire
    folio, not just the part of it which lies after the search index.

    We cannot yet remove the struct page from invalidate_inode_pages2_range()
    because the page pointer in the pvec might be a shadow/dax/swap entry
    instead of actually a page.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:10 -04:00
Aristeu Rozanski 7b60f6f55b truncate: Skip known-truncated indices
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit ccbbf761d440b0d5afcbf232db37435dc38d6161
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Nov 26 13:25:38 2021 -0500

    truncate: Skip known-truncated indices

    If we've truncated an entire folio, we can skip over all the indices
    covered by this folio.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:10 -04:00
Aristeu Rozanski 0a91f1bd33 truncate,shmem: Add truncate_inode_folio()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 1e84a3d997b74c33491899e31d48774f252213ab
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Thu Dec 2 16:01:55 2021 -0500

    truncate,shmem: Add truncate_inode_folio()

    Convert all callers of truncate_inode_page() to call
    truncate_inode_folio() instead, and move the declaration to mm/internal.h.
    Move the assertion that the caller is not passing in a tail page to
    generic_error_remove_page().  We can't entirely remove the struct page
    from the callers yet because the page pointer in the pvec might be a
    shadow/dax/swap entry instead of actually a page.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:10 -04:00
Aristeu Rozanski 4f77d61363 mm: Add unmap_mapping_folio()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit 3506659e18a61ae525f3b9b4f5af23b4b149d4db
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sun Nov 28 14:53:35 2021 -0500

    mm: Add unmap_mapping_folio()

    Convert both callers of unmap_mapping_page() to call unmap_mapping_folio()
    instead.  Also move zap_details from linux/mm.h to mm/memory.c

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:10 -04:00
Aristeu Rozanski 2f0cf8ca83 truncate: Add truncate_cleanup_folio()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083861
Tested: by me with multiple test suites

commit efe99bba2862aef24f1b05b786f6bf5acb076209
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri Nov 26 13:58:10 2021 -0500

    truncate: Add truncate_cleanup_folio()

    Convert both callers of truncate_cleanup_page() to use
    truncate_cleanup_folio() instead.

    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>

Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
2022-07-10 10:44:09 -04:00
Rafael Aquini 5b5c28b990 fs: inode: count invalidated shadow pages in pginodesteal
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396

This patch is a backport of the following upstream commit:
commit 7ae12c809f6a31d3da7b96339dbefa141884c711
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Thu Sep 2 14:53:24 2021 -0700

    fs: inode: count invalidated shadow pages in pginodesteal

    pginodesteal is supposed to capture the impact that inode reclaim has on
    the page cache state.  Currently, it doesn't consider shadow pages that
    get dropped this way, even though this can have a significant impact on
    paging behavior, memory pressure calculations etc.

    To improve visibility into these effects, make sure shadow pages get
    counted when they get dropped through inode reclaim.

    This changes the return value semantics of invalidate_mapping_pages()
    slightly, but the only two users are the inode shrinker itself and a USB
    driver that logs it for debugging purposes.

    Link: https://lkml.kernel.org/r/20210614211904.14420-3-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2021-11-29 11:40:53 -05:00
Rafael Aquini 7ac6d2bbef mm: remove irqsave/restore locking from contexts with irqs enabled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396

This patch is a backport of the following upstream commit:
commit 3047250972ff935b1d7a0629fa3acb04c12dcc07
Author: Johannes Weiner <hannes@cmpxchg.org>
Date:   Thu Sep 2 14:53:18 2021 -0700

    mm: remove irqsave/restore locking from contexts with irqs enabled

    The page cache deletion paths all have interrupts enabled, so no need to
    use irqsafe/irqrestore locking variants.

    They used to have irqs disabled by the memcg lock added in commit
    c4843a7593 ("memcg: add per cgroup dirty page accounting"), but that has
    since been replaced by memcg taking the page lock instead, commit
    0a31bc97c8 ("mm: memcontrol: rewrite uncharge API").

    Link: https://lkml.kernel.org/r/20210614211904.14420-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2021-11-29 11:40:44 -05:00
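
The change is mechanical; the deletion paths drop the irq-saving variants in
favour of the plain irq-disabling ones, for example:

    /* before */
    xa_lock_irqsave(&mapping->i_pages, flags);
    /* ... delete the page from the xarray ... */
    xa_unlock_irqrestore(&mapping->i_pages, flags);

    /* after: these paths always run with irqs enabled */
    xa_lock_irq(&mapping->i_pages);
    /* ... delete the page from the xarray ... */
    xa_unlock_irq(&mapping->i_pages);
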
Rafael Aquini 4d9de0c3d3 mm: Protect operations adding pages to page cache with invalidate_lock
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396

This patch is a backport of the following upstream commit:
commit 730633f0b7f951726e87f912a6323641f674ae34
Author: Jan Kara <jack@suse.cz>
Date:   Thu Jan 28 19:19:45 2021 +0100

    mm: Protect operations adding pages to page cache with invalidate_lock

    Currently, serializing operations such as page fault, read, or readahead
    against hole punching is rather difficult. The basic race scheme is
    like:

    fallocate(FALLOC_FL_PUNCH_HOLE)                 read / fault / ..
      truncate_inode_pages_range()
                                                      <create pages in page
                                                       cache here>
      <update fs block mapping and free blocks>

    Now the problem is in this way read / page fault / readahead can
    instantiate pages in page cache with potentially stale data (if blocks
    get quickly reused). Avoiding this race is not simple - page locks do
    not work because we want to make sure there are *no* pages in given
    range. inode->i_rwsem does not work because page fault happens under
    mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
    the performance for mixed read-write workloads suffer.

    So create a new rw_semaphore in the address_space - invalidate_lock -
    that protects adding of pages to page cache for page faults / reads /
    readahead.

    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jan Kara <jack@suse.cz>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2021-11-29 11:40:22 -05:00
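
The intended usage pattern, as a sketch (helpers from the same series;
filesystem-specific details omitted):

    /* hole punching: exclude page cache fills while blocks are freed */
    filemap_invalidate_lock(inode->i_mapping);
    truncate_inode_pages_range(inode->i_mapping, start, end);
    /* ... update the fs block mapping and free the blocks ... */
    filemap_invalidate_unlock(inode->i_mapping);

    /* read / fault / readahead: taken shared around page cache instantiation */
    filemap_invalidate_lock_shared(inode->i_mapping);
    /* ... look up or create pages in the page cache ... */
    filemap_invalidate_unlock_shared(inode->i_mapping);
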
Rafael Aquini 5a88d17b6c mm: Fix comments mentioning i_mutex
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2023396

This patch is a backport of the following upstream commit:
commit 9608703e488cf7a711c42c7ccd981c32377f7b78
Author: Jan Kara <jack@suse.cz>
Date:   Mon Apr 12 15:50:21 2021 +0200

    mm: Fix comments mentioning i_mutex

    inode->i_mutex has been replaced with inode->i_rwsem long ago. Fix
    comments still mentioning i_mutex.

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    Acked-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Jan Kara <jack@suse.cz>

Signed-off-by: Rafael Aquini <aquini@redhat.com>
2021-11-29 11:40:21 -05:00
Hugh Dickins 22061a1ffa mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
There is a race between THP unmapping and truncation, when truncate sees
pmd_none() and skips the entry, after munmap's zap_huge_pmd() cleared
it, but before its page_remove_rmap() gets to decrement
compound_mapcount: generating false "BUG: Bad page cache" reports that
the page is still mapped when deleted.  This commit fixes that, but not
in the way I hoped.

The first attempt used try_to_unmap(page, TTU_SYNC|TTU_IGNORE_MLOCK)
instead of unmap_mapping_range() in truncate_cleanup_page(): it has
often been an annoyance that we usually call unmap_mapping_range() with
no pages locked, but there apply it to a single locked page.
try_to_unmap() looks more suitable for a single locked page.

However, try_to_unmap_one() contains a VM_BUG_ON_PAGE(!pvmw.pte,page):
it is used to insert THP migration entries, but not used to unmap THPs.
Copy zap_huge_pmd() and add THP handling now? Perhaps, but their TLB
needs are different, I'm too ignorant of the DAX cases, and couldn't
decide how far to go for anon+swap.  Set that aside.

The second attempt took a different tack: make no change in truncate.c,
but modify zap_huge_pmd() to insert an invalidated huge pmd instead of
clearing it initially, then pmd_clear() between page_remove_rmap() and
unlocking at the end.  Nice.  But powerpc blows that approach out of the
water, with its serialize_against_pte_lookup(), and interesting pgtable
usage.  It would need serious help to get working on powerpc (with a
minor optimization issue on s390 too).  Set that aside.

Just add an "if (page_mapped(page)) synchronize_rcu();" or other such
delay, after unmapping in truncate_cleanup_page()? Perhaps, but though
that's likely to reduce or eliminate the number of incidents, it would
give less assurance of whether we had identified the problem correctly.

This successful iteration introduces "unmap_mapping_page(page)" instead
of try_to_unmap(), and goes the usual unmap_mapping_range_tree() route,
with an addition to details.  Then zap_pmd_range() watches for this
case, and does spin_unlock(pmd_lock) if so - just like
page_vma_mapped_walk() now does in the PVMW_SYNC case.  Not pretty, but
safe.

Note that unmap_mapping_page() is doing a VM_BUG_ON(!PageLocked) to
assert its interface; but currently that's only used to make sure that
page->mapping is stable, and zap_pmd_range() doesn't care if the page is
locked or not.  Along these lines, in invalidate_inode_pages2_range()
move the initial unmap_mapping_range() out from under page lock, before
then calling unmap_mapping_page() under page lock if still mapped.

Link: https://lkml.kernel.org/r/a2a4a148-cdd8-942c-4ef8-51b77f643dbe@google.com
Fixes: fc127da085 ("truncate: handle file thp")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jue Wang <juew@google.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wang Yugui <wangyugui@e16-tech.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-06-16 09:24:42 -07:00
Matthew Wilcox (Oracle) 46be67b424 mm: stop accounting shadow entries
We no longer need to keep track of how many shadow entries are present in
a mapping.  This saves a few writes to the inode and memory barriers.

Link: https://lkml.kernel.org/r/20201026151849.24232-3-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 11:27:19 -07:00
Matthew Wilcox (Oracle) 7716506ada mm: introduce and use mapping_empty()
Patch series "Remove nrexceptional tracking", v2.

We actually use nrexceptional for very little these days.  It's a minor
pain to keep in sync with nrpages, but the pain becomes much bigger with
the THP patches because we don't know how many indices a shadow entry
occupies.  It's easier to just remove it than keep it accurate.

Also, we save 8 bytes per inode which is nothing to sneeze at; on my
laptop, it would improve shmem_inode_cache from 22 to 23 objects per
16kB, and inode_cache from 26 to 27 objects.  Combined, that saves
a megabyte of memory from a combined usage of 25MB for both caches.
Unfortunately, ext4 doesn't cross a magic boundary, so it doesn't save
any memory for ext4.

This patch (of 4):

Instead of checking the two counters (nrpages and nrexceptional), we can
just check whether i_pages is empty.

Link: https://lkml.kernel.org/r/20201026151849.24232-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20201026151849.24232-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-05-05 11:27:19 -07:00
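
The helper introduced here is a one-liner over the xarray
(include/linux/pagemap.h):

    static inline bool mapping_empty(struct address_space *mapping)
    {
            return xa_empty(&mapping->i_pages);
    }
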
Matthew Wilcox (Oracle) a656a20241 mm: remove pagevec_lookup_entries
pagevec_lookup_entries() is now just a wrapper around find_get_entries()
so remove it and convert all its callers.

Link: https://lkml.kernel.org/r/20201112212641.27837-15-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-26 09:40:59 -08:00