Commit Graph

92 Commits

Author SHA1 Message Date
Rafael Aquini 3a57aa85e2 kernfs: drop shared NUMA mempolicy hooks
JIRA: https://issues.redhat.com/browse/RHEL-27745

This patch is a backport of the following upstream commit:
commit 4b981bc1aa73c204c2aa7f99b5f4f74d03b0e381
Author: Hugh Dickins <hughd@google.com>
Date:   Tue Oct 3 02:16:29 2023 -0700

    kernfs: drop shared NUMA mempolicy hooks

    It seems strange that kernfs should be an outlier with a set_policy and
    get_policy in its kernfs_vm_ops.  Ah, it dates back to v2.6.30's commit
    095160aee9 ("sysfs: fix some bin_vm_ops errors"), when I had crashed on
    powerpc's pci_mmap_legacy_page_range() fallback to shmem_zero_setup().

    Well, that was commendably thorough, to give sysfs-bin a set_policy and
    get_policy, just to avoid the way it was coded resulting in EINVAL from
    mmap when CONFIG_NUMA; but somehow feels a bit over-the-top to me now.

    It's easier to say that nobody should expect to manage a shmem object's
    shared NUMA mempolicy via some kernfs backdoor to that object: delete that
    code (and there's no longer an EINVAL from mmap in the NUMA case).

    This then leaves set_policy/get_policy as implemented only by shmem -
    though importantly also by SysV SHM, which has to interface with shmem
    which implements them, and with SHM_HUGETLB which does not.

    Link: https://lkml.kernel.org/r/302164-a760-4a9e-879b-6870c9b4013@google.com
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: Andi Kleen <ak@linux.intel.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: "Huang, Ying" <ying.huang@intel.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Nhat Pham <nphamcs@gmail.com>
    Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
    Cc: Yang Shi <shy828301@gmail.com>
    Cc: Yosry Ahmed <yosryahmed@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-12-09 12:23:09 -05:00
Waiman Long 982a1c3490 kernfs: fix all kernel-doc warnings and multiple typos
JIRA: https://issues.redhat.com/browse/RHEL-56023

commit 24b3e3dd9c9c742a4dd18e71b6963f9e7ab72911
Author: Randy Dunlap <rdunlap@infradead.org>
Date:   Fri, 11 Nov 2022 19:14:56 -0800

    kernfs: fix all kernel-doc warnings and multiple typos

    Fix kernel-doc warnings. Many of these are about a function's
    return value, so use the kernel-doc Return: format to fix those

    Use % prefix on numeric constant values.

    dir.c: fix typos/spellos
    file.c fix typo: s/taret/target/

    Fix all of these kernel-doc warnings:

    dir.c:305: warning: missing initial short description on line:
     *      kernfs_name_hash

    dir.c:137: warning: No description found for return value of 'kernfs_path_from_node_locked'
    dir.c:196: warning: No description found for return value of 'kernfs_name'
    dir.c:224: warning: No description found for return value of 'kernfs_path_from_node'
    dir.c:292: warning: No description found for return value of 'kernfs_get_parent'
    dir.c:312: warning: No description found for return value of 'kernfs_name_hash'
    dir.c:404: warning: No description found for return value of 'kernfs_unlink_sibling'
    dir.c:588: warning: No description found for return value of 'kernfs_node_from_dentry'
    dir.c:806: warning: No description found for return value of 'kernfs_find_ns'
    dir.c:879: warning: No description found for return value of 'kernfs_find_and_get_ns'
    dir.c:904: warning: No description found for return value of 'kernfs_walk_and_get_ns'
    dir.c:927: warning: No description found for return value of 'kernfs_create_root'
    dir.c:996: warning: No description found for return value of 'kernfs_root_to_node'
    dir.c:1016: warning: No description found for return value of 'kernfs_create_dir_ns'
    dir.c:1048: warning: No description found for return value of 'kernfs_create_empty_dir'
    dir.c:1306: warning: No description found for return value of 'kernfs_next_descendant_post'
    dir.c:1568: warning: No description found for return value of 'kernfs_remove_self'
    dir.c:1630: warning: No description found for return value of 'kernfs_remove_by_name_ns'
    dir.c:1667: warning: No description found for return value of 'kernfs_rename_ns'

    file.c:66: warning: No description found for return value of 'of_on'
    file.c:88: warning: No description found for return value of 'kernfs_deref_open_node_locked'
    file.c:1036: warning: No description found for return value of '__kernfs_create_file'

    inode.c:100: warning: No description found for return value of 'kernfs_setattr'

    mount.c:160: warning: No description found for return value of 'kernfs_root_from_sb'
    mount.c:198: warning: No description found for return value of 'kernfs_node_dentry'
    mount.c:302: warning: No description found for return value of 'kernfs_super_ns'
    mount.c:318: warning: No description found for return value of 'kernfs_get_tree'

    symlink.c:28: warning: No description found for return value of 'kernfs_create_link'

    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Tejun Heo <tj@kernel.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20221112031456.22980-1-rdunlap@infradead.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-09-30 09:46:58 -04:00
Waiman Long 4781e8fdc9 kernfs: Fix typo 'the the' in comment
JIRA: https://issues.redhat.com/browse/RHEL-56023

commit 3fe4076482789c2c4a772f6676b246a0d96c99c4
Author: Slark Xiao <slark_xiao@163.com>
Date:   Fri, 22 Jul 2022 18:05:18 +0800

    kernfs: Fix typo 'the the' in comment

    Replace 'the the' with 'the' in the comment.

    Signed-off-by: Slark Xiao <slark_xiao@163.com>
    Link: https://lore.kernel.org/r/20220722100518.79741-1-slark_xiao@163.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-09-30 09:46:58 -04:00
Waiman Long 3cf12fb5ae kernfs: fix typos in comments
JIRA: https://issues.redhat.com/browse/RHEL-56023

commit 1970a0623002a13845b7db4c45a67402e11b3011
Author: Julia Lawall <Julia.Lawall@inria.fr>
Date:   Mon, 14 Mar 2022 12:53:28 +0100

    kernfs: fix typos in comments

    Various spelling mistakes in comments.
    Detected with the help of Coccinelle.

    Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
    Link: https://lore.kernel.org/r/20220314115354.144023-5-Julia.Lawall@inria.fr
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2024-09-30 09:46:57 -04:00
Rafael Aquini 800fb8128b tty, proc, kernfs, random: Use copy_splice_read()
JIRA: https://issues.redhat.com/browse/RHEL-27742

This patch is a backport of the following upstream commit:
commit b0072734ffaa3f5fec64058d0d3333765d789bc0
Author: David Howells <dhowells@redhat.com>
Date:   Mon May 22 14:49:59 2023 +0100

    tty, proc, kernfs, random: Use copy_splice_read()

    Use copy_splice_read() for tty, procfs, kernfs and random files rather
    than going through generic_file_splice_read() as they just copy the file
    into the output buffer and don't splice pages.  This avoids the need for
    them to have a ->read_folio() to satisfy filemap_splice_read().

    Signed-off-by: David Howells <dhowells@redhat.com>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    cc: Christoph Hellwig <hch@lst.de>
    cc: Jens Axboe <axboe@kernel.dk>
    cc: Al Viro <viro@zeniv.linux.org.uk>
    cc: John Hubbard <jhubbard@nvidia.com>
    cc: David Hildenbrand <david@redhat.com>
    cc: Matthew Wilcox <willy@infradead.org>
    cc: Miklos Szeredi <miklos@szeredi.hu>
    cc: Arnd Bergmann <arnd@arndb.de>
    cc: linux-block@vger.kernel.org
    cc: linux-fsdevel@vger.kernel.org
    cc: linux-mm@kvack.org
    Link: https://lore.kernel.org/r/20230522135018.2742245-13-dhowells@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Rafael Aquini <raquini@redhat.com>
2024-09-05 20:35:54 -04:00
Ian Kent cafaee9ad8 kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info
JIRA: https://issues.redhat.com/browse/RHEL-52956
Upstream status: Linus

Conflicts: There was a reject when applying the single hunk to
	fs/kernfs/kernfs-internal.h due to the needed RH_KABI_EXTEND()
	of the previous patch in this series.

commit c9f2dfb7b59e5a6db054f821a6e1a6db8fa57d64
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Thu Mar 9 22:09:31 2023 +1100

    kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.

    Right now per-fs kernfs_rwsem protects list of kernfs_super_info instances
    for a kernfs_root. Since kernfs_rwsem is used to synchronize several other
    operations across kernfs and since most of these operations don't impact
    kernfs_super_info, we can use a separate per-fs rwsem to synchronize access
    to list of kernfs_super_info.
    This helps in reducing contention around kernfs_rwsem and also allows
    operations that change/access list of kernfs_super_info to proceed without
    contending for kernfs_rwsem.

    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20230309110932.2889010-3-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2024-08-09 18:44:29 +08:00
Waiman Long e1c53c6d0c kernfs: Skip kernfs_drain_open_files() more aggressively
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit bdb2fd7fc56e197a63c0b0e7e07d25d5e20e7c72
Author: Tejun Heo <tj@kernel.org>
Date:   Sat, 27 Aug 2022 19:04:35 -1000

    kernfs: Skip kernfs_drain_open_files() more aggressively

    Track the number of mmapped files and files that need to be released and
    skip kernfs_drain_open_files() if both are zero; nonzero counts are the
    precise conditions which require draining open files. The early exit test
    is factored into kernfs_should_drain_open_files() which is now tested by
    kernfs_drain_open_files()'s caller - kernfs_drain().

    This isn't a meaningful optimization on its own but will enable future
    stand-alone kernfs_deactivate() implementation.

    v2: Chengming noticed that on->nr_to_release was leaking after ->open()
        failure. Fix it by telling kernfs_unlink_open_file() that it's called
        from the ->open() fail path and should dec the counter. Use kzalloc() to
        allocate kernfs_open_node so that the tracking fields are correctly
        initialized.

    Cc: Chengming Zhou <zhouchengming@bytedance.com>
    Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
    Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220828050440.734579-5-tj@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:22 -05:00
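The early-exit test described in the commit above (e1c53c6d0c) can be sketched in user-space C as follows. This is only an illustration under stated assumptions: struct demo_open_node and demo_should_drain() are hypothetical names, the counters are plain ints rather than whatever atomic type the kernel uses, and a missing open node is assumed to mean "nothing to drain".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-node open state (not the kernel struct). */
struct demo_open_node {
	int nr_mmapped;     /* open files that currently have a mapping */
	int nr_to_release;  /* open files that still need a release callback */
};

/* Return true only when at least one counter says draining is required. */
static bool demo_should_drain(const struct demo_open_node *on)
{
	/* Assumption: no open node at all means there is nothing to drain. */
	if (on == NULL)
		return false;

	/* The counters are only ever incremented and decremented in pairs. */
	assert(on->nr_mmapped >= 0 && on->nr_to_release >= 0);

	return on->nr_mmapped > 0 || on->nr_to_release > 0;
}
```

A caller would test demo_should_drain() first and walk the list of open files only when it returns true, which is the cheap check the commit message describes.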
Waiman Long a5939d41a4 kernfs: Refactor kernfs_get_open_node()
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit cf2dc9db93704c24f3d6d87d3bd09ae970446d1f
Author: Tejun Heo <tj@kernel.org>
Date:   Sat, 27 Aug 2022 19:04:34 -1000

    kernfs: Refactor kernfs_get_open_node()

    Factor out common part. This is cleaner and should help with future
    changes. No functional changes.

    Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
    Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220828050440.734579-4-tj@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:21 -05:00
Waiman Long 00d958223e kernfs: Drop unnecessary "mutex" local variable initialization
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit b52c2379c38ffa49cbf10e30abc9dc4f9c051d41
Author: Tejun Heo <tj@kernel.org>
Date:   Sat, 27 Aug 2022 19:04:33 -1000

    kernfs: Drop unnecessary "mutex" local variable initialization

    These are unnecessary and unconventional. Remove them. Also move variable
    declaration into the block that it's used. No functional changes.

    Cc: Imran Khan <imran.f.khan@oracle.com>
    Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
    Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220828050440.734579-3-tj@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:21 -05:00
Waiman Long c56e1fbddd kernfs: Simply by replacing kernfs_deref_open_node() with of_on()
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit 3db48aca879db475844182a24d1760ee3d230627
Author: Tejun Heo <tj@kernel.org>
Date:   Sat, 27 Aug 2022 19:04:32 -1000

    kernfs: Simply by replacing kernfs_deref_open_node() with of_on()

    kernfs_node->attr.open is an RCU pointer to kernfs_open_node. However, RCU
    dereference is currently only used in kernfs_notify(). Everywhere else,
    either we're holding the lock which protects it or know that the
    kernfs_open_node is pinned because we have a pointer to a kernfs_open_file
    which is hanging off of it.

    kernfs_deref_open_node() is used for the latter case - accessing
    kernfs_open_node from kernfs_open_file. The lifetime and visibility rules
    are simple and clear here. To someone who can access a kernfs_open_file, its
    kernfs_open_node is pinned and visible through of->kn->attr.open.

    Replace kernfs_deref_open_node() with the simpler of_on(). The former takes
    both @kn and @of and RCU derefs @kn->attr.open while sanity checking with
    @of. The latter takes @of and uses protected deref on of->kn->attr.open.

    As the return value can't be NULL, remove the error handling in the callers
    too.

    This shouldn't cause any functional changes.

    Cc: Imran Khan <imran.f.khan@oracle.com>
    Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
    Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
    Signed-off-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220828050440.734579-2-tj@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:20 -05:00
Waiman Long e4702ab3bc Revert "kernfs: Change kernfs_notify_list to llist."
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit 2fd26970cf66bd52dc42843c46968040caa8c9a1
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Wed, 6 Jul 2022 06:10:26 +1000

    Revert "kernfs: Change kernfs_notify_list to llist."

    This reverts commit b8f35fa1188b84035c59d4842826c4e93a1b1c9f.

    This is causing a regression due to the same kernfs_node getting
    added multiple times to kernfs_notify_list, so revert it until a
    safe way of using llist in this context is found.

    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Reported-by: Michael Walle <michael@walle.cc>
    Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220705201026.2487665-1-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:20 -05:00
Waiman Long eb2bcac9fa kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit 1d25b84e444ad66313c473407979ea9cd33deb3f
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Wed, 15 Jun 2022 12:10:59 +1000

    kernfs: Replace global kernfs_open_file_mutex with hashed mutexes.

    In current kernfs design a single mutex, kernfs_open_file_mutex, protects
    the list of kernfs_open_file instances corresponding to a sysfs attribute.
    So even if different tasks are opening or closing different sysfs files
    they can contend on osq_lock of this mutex. The contention is more apparent
    in large scale systems with few hundred CPUs where most of the CPUs have
    running tasks that are opening, accessing or closing sysfs files at any
    point of time.

    Using hashed mutexes in place of a single global mutex can significantly
    reduce contention around the global mutex and hence can provide better
    scalability. Moreover, as these hashed mutexes are not part of kernfs_node
    objects, we will not see any significant change in memory utilization of
    kernfs based file systems like sysfs, cgroupfs etc.

    Modify the interface introduced in the previous patch to make use of
    hashed mutexes. Use the kernfs_node address as the hashing key.

    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:19 -05:00
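The hashed-mutex idea from the commit above (eb2bcac9fa) can be illustrated with a small user-space sketch using POSIX mutexes. Everything here is an assumption made for illustration: the table size DEMO_NR_LOCKS, the demo_* names, and the shift-and-mask hash are hypothetical stand-ins, not the kernel's own table size or pointer-hashing helper.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical table size; a power of two so the mask below works. */
#define DEMO_NR_LOCKS 64

static pthread_mutex_t demo_locks[DEMO_NR_LOCKS];

/* Initialize every mutex in the table; call once before first use. */
static void demo_locks_init(void)
{
	for (int i = 0; i < DEMO_NR_LOCKS; i++)
		pthread_mutex_init(&demo_locks[i], NULL);
}

/*
 * Map an object's address to a table slot. The shift discards low bits
 * that tend to be zero for aligned objects; this is a simple stand-in
 * for a real pointer hash.
 */
static unsigned int demo_hash_ptr(const void *key)
{
	return (unsigned int)(((uintptr_t)key >> 4) & (DEMO_NR_LOCKS - 1));
}

/* Return the mutex that guards the object at @key. */
static pthread_mutex_t *demo_lock_for(const void *key)
{
	assert(key != NULL);
	return &demo_locks[demo_hash_ptr(key)];
}
```

Because the locks live in a shared table rather than inside each object, the per-object memory footprint is unchanged. Two unrelated objects can still hash to the same slot and contend occasionally, which is the usual trade-off of this design.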
Waiman Long b4a2815964 kernfs: Introduce interface to access global kernfs_open_file_mutex.
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit 41448c614815965d1cdfa720df34257b84afbb9d
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Wed, 15 Jun 2022 12:10:58 +1000

    kernfs: Introduce interface to access global kernfs_open_file_mutex.

    This allows changing the underlying mutex locking without needing to
    change the users of the lock. For example, the next patch modifies this
    interface to use hashed mutexes in place of a single global
    kernfs_open_file_mutex.

    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20220615021059.862643-4-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:19 -05:00
Waiman Long eedf1934c1 kernfs: Change kernfs_notify_list to llist.
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit b8f35fa1188b84035c59d4842826c4e93a1b1c9f
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Wed, 15 Jun 2022 12:10:57 +1000

    kernfs: Change kernfs_notify_list to llist.

    At present kernfs_notify_list is implemented as a singly linked
    list of kernfs_node(s), where last element points to itself and
    value of ->attr.next tells if node is present on the list or not.
    Both addition and deletion to list happen under kernfs_notify_lock.

    Change kernfs_notify_list to llist so that addition to list can happen
    locklessly.

    Suggested by: Al Viro <viro@zeniv.linux.org.uk>

    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:18 -05:00
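As a rough illustration of what "addition to list can happen locklessly" means in the commit above (eedf1934c1), here is a user-space sketch of a lock-free singly linked list built on C11 atomics. It is not the kernel's llist code; struct demo_node, struct demo_head, demo_push() and demo_take_all() are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_node {
	struct demo_node *next;
	int value;
};

struct demo_head {
	_Atomic(struct demo_node *) first;
};

/*
 * Lock-free push: link the node in front of the current head and retry
 * the compare-and-swap until no other thread changed the head in between.
 * On failure, the CAS reloads the current head into 'old'.
 */
static void demo_push(struct demo_head *head, struct demo_node *node)
{
	struct demo_node *old;

	assert(node != NULL);
	old = atomic_load(&head->first);
	do {
		node->next = old;
	} while (!atomic_compare_exchange_weak(&head->first, &old, node));
}

/*
 * Detach the whole chain with a single atomic exchange; the caller then
 * walks the detached nodes privately, without further synchronization.
 */
static struct demo_node *demo_take_all(struct demo_head *head)
{
	return atomic_exchange(&head->first, (struct demo_node *)NULL);
}
```

Nothing in this sketch stops a caller from pushing a node that is already on the list, which would overwrite its next pointer and corrupt the chain. The revert listed earlier on this page (e4702ab3bc) cites the same node being added multiple times as the reason the change was backed out.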
Waiman Long 1cad48d133 kernfs: make ->attr.open RCU protected.
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit 086c00c71fc8d47db6983f419a45f9ee167de03f
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Wed, 15 Jun 2022 12:10:56 +1000

    kernfs: make ->attr.open RCU protected.

    After removal of kernfs_open_node->refcnt in the previous patch,
    kernfs_open_node_lock can be removed as well by making ->attr.open
    RCU protected. kernfs_put_open_node can delegate freeing of ->attr.open
    to RCU and other readers of ->attr.open can do so under rcu_read_(un)lock.

    Suggested by: Al Viro <viro@zeniv.linux.org.uk>

    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:18 -05:00
Waiman Long 6e107fe972 kernfs/file.c: remove redundant error return counter assignment
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit dcab8da13ff4886aab26348b925d20dca4f12bac
Author: Lin Feng <linf@wangsu.com>
Date:   Fri, 17 Jun 2022 17:17:46 +0800

    kernfs/file.c: remove redundant error return counter assignment

    Since previous 'rc = -EINVAL;', rc value doesn't change, so not
    necessary to re-assign it again.

    Signed-off-by: Lin Feng <linf@wangsu.com>
    Link: https://lore.kernel.org/r/20220617091746.206515-1-linf@wangsu.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:17 -05:00
Waiman Long 375694d33a kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit c1b1352f21bcf8c0678c4d4fbfafc4f6729e1daa
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Wed, 4 May 2022 19:51:19 +1000

    kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.

    Since we are no longer using refcnt for kernfs_open_node instances, rename
    kernfs_put_open_node to kernfs_unlink_open_file to reflect this change.
    Also update function description and inline comments accordingly.

    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20220504095123.295859-2-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:14 -05:00
Waiman Long 8e1af693cf kernfs: Remove reference counting for kernfs_open_node.
JIRA: https://issues.redhat.com/browse/RHEL-16027

commit bd900901b8d1838bf1b6e63063e0025fca42d283
Author: Imran Khan <imran.f.khan@oracle.com>
Date:   Thu, 24 Mar 2022 21:30:39 +1100

    kernfs: Remove reference counting for kernfs_open_node.

    The decision to free kernfs_open_node object in kernfs_put_open_node can
    be taken based on whether kernfs_open_node->files list is empty or not. As
    far as kernfs_drain_open_files is concerned it can't overlap with
    kernfs_fops_open and hence can check for ->attr.open optimistically
    (if ->attr.open is NULL) or under kernfs_open_file_mutex (if it needs to
    traverse the ->files list.) Thus kernfs_drain_open_files can work w/o ref
    counting involved kernfs_open_node as well.
    So remove ->refcnt and modify the above mentioned users accordingly.

    Suggested by: Al Viro <viro@zeniv.linux.org.uk>

    Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
    Link: https://lore.kernel.org/r/20220324103040.584491-2-imran.f.khan@oracle.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Waiman Long <longman@redhat.com>
2023-11-08 14:47:14 -05:00
Luis Claudio R. Goncalves acf160f57a kernfs: switch global kernfs_rwsem lock to per-fs lock
Bugzilla: http://bugzilla.redhat.com/2152737
Upstream status: master tree.

commit 393c3714081a53795bbff0e985d24146def6f57f
Author: Minchan Kim <minchan@kernel.org>
Date:   Thu Nov 18 15:00:08 2021 -0800

    kernfs: switch global kernfs_rwsem lock to per-fs lock

    The kernfs implementation has coarse lock granularity (kernfs_rwsem), so
    every kernfs-based filesystem (e.g., sysfs, cgroup) competes for the same
    lock. In some cases this forces contexts that are totally independent of
    each other to wait a long time for the global lock.

    A typical example: process A enters direct reclaim while holding the
    lock after accessing a file in sysfs, process B waits for the lock in
    exclusive mode, and process C then waits until process B has obtained
    the lock from process A and finished its work.

    This patch switches the global kernfs_rwsem to a per-fs lock by
    putting the rwsem into kernfs_root.

    Suggested-by: Tejun Heo <tj@kernel.org>
    Acked-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Minchan Kim <minchan@kernel.org>
    Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
2022-12-12 19:32:17 -03:00
Ian Kent 81193d508b kernfs: switch kernfs to use an rwsem
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2004858
Upstream status: Linus
Testing: The series has been included in RHEL-8 and customer
	testing has been done there. The upstreaming process
	includes fairly broad general testing as well.

commit 7ba0273b2f34a55efe967d3c7381fb1da2ca195f
From: Ian Kent <raven@themaw.net>
Date: 2021-07-16 17:28:29 +0800

    kernfs: switch kernfs to use an rwsem

    The kernfs global lock restricts the ability to perform kernfs node
    lookup operations in parallel during path walks.

    Change the kernfs mutex to an rwsem so that, when opportunity arises,
    node searches can be done in parallel with path walk lookups.

    Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
    Signed-off-by: Ian Kent <raven@themaw.net>
    Link: https://lore.kernel.org/r/162642770946.63632.2218304587223241374.stgit@web.messagingengine.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Signed-off-by: Ian Kent <ikent@redhat.com>
2021-11-29 13:54:20 +08:00
Christoph Hellwig f2d6c2708b kernfs: wire up ->splice_read and ->splice_write
Wire up the splice_read and splice_write methods to the default
helpers using ->read_iter and ->write_iter now that those are
implemented for kernfs.  This restores support to use splice and
sendfile on kernfs files.

Fixes: 36e2c7421f ("fs: don't allow splice read/write without explicit ops")
Reported-by: Siddharth Gupta <sidgup@codeaurora.org>
Tested-by: Siddharth Gupta <sidgup@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210120204631.274206-4-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21 18:30:28 +01:00
Christoph Hellwig cc099e0b39 kernfs: implement ->write_iter
Switch kernfs to implement the write_iter method instead of plain old
write to prepare to supporting splice and sendfile again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210120204631.274206-3-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21 18:30:28 +01:00
Christoph Hellwig 4eaad21a6a kernfs: implement ->read_iter
Switch kernfs to implement the read_iter method instead of plain old
read to prepare to supporting splice and sendfile again.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20210120204631.274206-2-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-21 18:30:28 +01:00
Amir Goldstein 40a100d3ad fsnotify: pass dir and inode arguments to fsnotify()
The arguments of fsnotify() are overloaded and mean different things
for different event types.

Replace the to_tell argument with separate arguments @dir and @inode,
because we may be sending to both dir and child.  Using the @data
argument to pass the child is not enough, because dirent events pass
this argument (for audit), but we do not report to child.

Document the new fsnotify() function arguments.

Link: https://lore.kernel.org/r/20200722125849.17418-7-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27 23:15:48 +02:00
Amir Goldstein 82ace1efb3 fsnotify: create helper fsnotify_inode()
Simple helper to consolidate boilerplate code.

Link: https://lore.kernel.org/r/20200722125849.17418-5-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27 23:13:51 +02:00
Amir Goldstein 497b0c5a7c fsnotify: send event to parent and child with single callback
Instead of calling fsnotify() twice, once with parent inode and once
with child inode, if event should be sent to parent inode, send it
with both parent and child inodes marks in object type iterator and call
the backend handle_event() callback only once.

The parent inode is assigned to the standard "inode" iterator type and
the child inode is assigned to the special "child" iterator type.

In that case, the bit FS_EVENT_ON_CHILD will be set in the event mask,
the dir argument to handle_event will be the parent inode, the file_name
argument to handle_event is non NULL and refers to the name of the child
and the child inode can be accessed with fsnotify_data_inode().

This will allow fanotify to make decisions based on child or parent's
ignored mask.  For example, when a parent is interested in a specific
event on its children, but a specific child wishes to ignore this event,
the event will not be reported.  This is not what happens with current
code, but according to man page, it is the expected behavior.

Link: https://lore.kernel.org/r/20200716084230.30611-15-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-27 21:24:52 +02:00
Amir Goldstein 9991bb84b2 kernfs: do not call fsnotify() with name without a parent
When creating an FS_MODIFY event on inode itself (not on parent)
the file_name argument should be NULL.

The change to send a non NULL name to inode itself was done on purpose
as part of another commit, as Tejun writes: "...While at it, supply the
target file name to fsnotify() from kernfs_node->name.".

But this is wrong practice and inconsistent with inotify behavior when
watching a single file.  When a child is being watched (as opposed to the
parent directory) the inotify event should contain the watch descriptor,
but not the file name.

Fixes: df6a58c5c5 ("kernfs: don't depend on d_find_any_alias()...")
Link: https://lore.kernel.org/r/20200708111156.24659-5-amir73il@gmail.com
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2020-07-15 17:36:52 +02:00
Michel Lespinasse c1e8d7c6a7 mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead.

[akpm@linux-foundation.org: fix up linux-next leftovers]
[akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil]
[akpm@linux-foundation.org: more linux-next fixups, per Michel]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-09 09:39:14 -07:00
Waiman Long 0f605db5bd kernfs: Change kernfs_node lockdep name to "kn->active"
The kernfs_node lockdep tracking is being done on kn->active, the
active reference count. The other reference count (kn->count) is not
tracked by lockdep. So change the lockdep name to reflect what it is
tracking.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20200402171056.27871-1-longman@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-23 16:59:15 +02:00
Tejun Heo 67c0496e87 kernfs: convert kernfs_node->id from union kernfs_node_id to u64
kernfs_node->id is currently a union kernfs_node_id which represents
either a 32bit (ino, gen) pair or u64 value.  I can't see much value
in the usage of the union - all that's needed is a 64bit ID which the
current code is already limited to.  Using a union makes the code
unnecessarily complicated and prevents using 64bit ino without adding
practical benefits.

This patch drops union kernfs_node_id and makes kernfs_node->id a u64.
ino is stored in the lower 32bits and gen upper.  Accessors -
kernfs[_id]_ino() and kernfs[_id]_gen() - are added to retrieve the
ino and gen.  This makes ID handling less cumbersome and will
allow using 64bit inos on supported archs.
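
The layout described above (ino in the lower 32 bits, gen in the upper
32 bits of one u64) can be sketched in userspace C roughly as follows.
The model_id_*() names are hypothetical and are not the kernel's
accessors; this is only an illustration of the bit packing, assuming
32bit ino and gen values.

```c
#include <stdint.h>

/* Hypothetical userspace model: pack a 32bit ino and a 32bit gen into
 * one 64bit ID, with ino in the low half and gen in the high half. */
static inline uint64_t model_id_pack(uint32_t ino, uint32_t gen)
{
        return ((uint64_t)gen << 32) | ino;
}

/* Recover the ino: truncating to 32 bits keeps the low half. */
static inline uint32_t model_id_ino(uint64_t id)
{
        return (uint32_t)id;
}

/* Recover the gen: shift the high half down. */
static inline uint32_t model_id_gen(uint64_t id)
{
        return (uint32_t)(id >> 32);
}
```

With this packing, model_id_pack(7, 3) round-trips through
model_id_ino() and model_id_gen() back to 7 and 3, and no union is
needed to view the same storage two ways.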

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexei Starovoitov <ast@kernel.org>
2019-11-12 08:18:03 -08:00
Thomas Gleixner 55716d2643 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428
Based on 1 normalized pattern(s):

  this file is released under the gplv2

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 68 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:16 +02:00
Al Viro 25b229dff4 fsnotify(): switch to passing const struct qstr * for file_name
Note that in fsnotify_move() and fsnotify_link() we are guaranteed
that dentry->d_name won't change during the fsnotify() evaluation
(by having the parent directory locked exclusive), so we don't
need to fetch dentry->d_name.name in the callers.  In fsnotify_dirent()
the same stability of dentry->d_name is also true, but it's a bit
more convoluted - there is one callchain (devpts_pty_new() ->
fsnotify_create() -> fsnotify_dirent()) where the parent is _not_
locked, but on devpts ->d_name of everything is unchanging; it
has neither explicit nor implicit renames.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-04-26 13:37:25 -04:00
Johannes Weiner 147e1a97c4 fs: kernfs: add poll file operation
Patch series "psi: pressure stall monitors", v3.

Android is adopting psi to detect and remedy memory pressure that
results in stuttering and decreased responsiveness on mobile devices.

Psi gives us the stall information, but because we're dealing with
latencies in the millisecond range, periodically reading the pressure
files to detect stalls in a timely fashion is not feasible.  Psi also
doesn't aggregate its averages at a high enough frequency right now.

This patch series extends the psi interface such that users can
configure sensitive latency thresholds and use poll() and friends to be
notified when these are breached.

As high-frequency aggregation is costly, it implements an aggregation
method that is optimized for fast, short-interval averaging, and makes
the aggregation frequency adaptive, such that high-frequency updates
only happen while monitored stall events are actively occurring.

With these patches applied, Android can monitor for, and ward off,
mounting memory shortages before they cause problems for the user.  For
example, using memory stall monitors in the userspace low memory killer
daemon (lmkd), we can detect mounting pressure and kill less important
processes before the device becomes visibly sluggish.

In our memory stress testing, psi memory monitors produce roughly 10x
fewer false positives compared to vmpressure signals.  Having the ability
to specify multiple triggers for the same psi metric allows other parts
of the Android framework to monitor the memory state of the device and
act accordingly.

The new interface is straightforward.  The user opens one of the
pressure files for writing and writes a trigger description into the
file descriptor that defines the stall state - some or full, and the
maximum stall time over a given window of time.  E.g.:

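        /* Signal when stall time exceeds 100ms of a 1s window */
        char trigger[] = "full 100000 1000000";
        struct pollfd fds;

        /* The file must be opened for writing to register a trigger */
        fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
        fds.events = POLLPRI;
        write(fds.fd, trigger, strlen(trigger) + 1);
        while (poll(&fds, 1, -1) >= 0) {
                ...
        }
        close(fds.fd);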

When the monitored stall state is entered, psi adapts its aggregation
frequency according to what the configured time window requires in order
to emit event signals in a timely fashion.  Once the stalling subsides,
aggregation reverts back to normal.

The trigger is associated with the open file descriptor.  To stop
monitoring, the user only needs to close the file descriptor and the
trigger is discarded.

Patches 1-4 prepare the psi code for polling support.  Patch 5
implements the adaptive polling logic, the pressure growth detection
optimized for short intervals, and hooks up write() and poll() on the
pressure files.

The patches were developed in collaboration with Johannes Weiner.

This patch (of 5):

Kernfs has a standardized poll/notification mechanism for waking all
pollers on all fds when a filesystem node changes.  To allow polling for
custom events, add a .poll callback that can override the default.
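
As a rough illustration of the override described above, the dispatch
can be modelled in userspace C like this.  All model_* names and the
MODEL_* mask values are hypothetical stand-ins, not kernfs symbols; the
sketch only shows "use the per-file .poll callback if one is set,
otherwise fall back to the generic notification".

```c
#include <stddef.h>

#define MODEL_DEFAULT 0x1u      /* hypothetical "readable/writable" mask */
#define MODEL_POLLPRI 0x2u      /* hypothetical "event" bit */
#define MODEL_POLLERR 0x8u      /* hypothetical "error" bit */

struct model_file;

struct model_ops {
        /* Optional custom poll; NULL selects the generic behaviour. */
        unsigned int (*poll)(struct model_file *f);
};

struct model_file {
        const struct model_ops *ops;
        int node_changed;       /* set by the generic notification */
        int trigger_hit;        /* per-fd state used by a custom poll */
};

/* Generic path: report an event whenever the node itself changed. */
static unsigned int model_generic_poll(struct model_file *f)
{
        if (f->node_changed)
                return MODEL_DEFAULT | MODEL_POLLERR | MODEL_POLLPRI;
        return MODEL_DEFAULT;
}

/* Example custom callback: only this fd's own trigger matters. */
static unsigned int model_trigger_poll(struct model_file *f)
{
        return f->trigger_hit ? MODEL_POLLPRI : 0;
}

/* Dispatch: a custom .poll overrides the generic notification. */
static unsigned int model_fop_poll(struct model_file *f)
{
        if (f->ops && f->ops->poll)
                return f->ops->poll(f);
        return model_generic_poll(f);
}
```

In this model a file with model_trigger_poll installed ignores
node_changed entirely, which is the property that per-fd trigger
configurations rely on.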

This is in preparation for pollable cgroup pressure files which have
per-fd trigger configurations.

Link: http://lkml.kernel.org/r/20190124211518.244221-2-surenb@google.com
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-05 21:07:17 -08:00
Radu Rendec 03c0a9208b kernfs: Improve kernfs_notify() poll notification latency
kernfs_notify() does two notifications: poll and fsnotify. Originally,
both notifications were done from scheduled work context and all that
kernfs_notify() did was schedule the work.

This patch simply moves the poll notification from the scheduled work
handler to kernfs_notify(). The fsnotify notification still needs to be
done from scheduled work context because it can sleep (it needs to lock
a mutex).

If the poll notification is time critical (the notified thread needs to
wake as quickly as possible), it's better to do it from kernfs_notify()
directly. One example is calling sysfs_notify_dirent() from a hardware
interrupt handler to wake up a thread and handle the interrupt in user
space.

Signed-off-by: Radu Rendec <radu.rendec@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-11-27 11:59:33 +01:00
Dmitry Torokhov 488dee96bb kernfs: allow creating kernfs objects with arbitrary uid/gid
This change allows creating kernfs files and directories with arbitrary
uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
and always create objects belonging to the global root.

When creating symlinks ownership (uid/gid) is taken from the target kernfs
object.

Co-Developed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-07-20 23:44:35 -07:00
Souptick Joarder 9ee84466b7 fs: kernfs: Adding new return type vm_fault_t
Use new return type vm_fault_t for page_mkwrite and
fault handler. For now, this is just documenting that
the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t
will become a distinct type.

Reference id -> 1c8f422059 ("mm: change return type to
vm_fault_t")

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-23 13:52:34 +02:00
Linus Torvalds a9a08845e9 vfs: do bulk POLL* -> EPOLL* replacement
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
        L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
        for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

with de-mangling cleanups yet to come.

NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do.  But the keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.

The next patch from Al will sort out the final differences, and we
should be all done.

Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11 14:34:03 -08:00
Linus Torvalds 878e66d06f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  seq_file: fix incomplete reset on read from zero offset
  kernfs: fix regression in kernfs_fop_write caused by wrong type
2018-02-09 19:22:17 -08:00
Ivan Vecera ba87977a49 kernfs: fix regression in kernfs_fop_write caused by wrong type
Commit b7ce40cff0 ("kernfs: cache atomic_write_len in
kernfs_open_file") changed the type of the local variable 'len' from
ssize_t to size_t. As a result, the *ppos value is also updated when
the preceding write callback failed.

Mentioned snippet:
...
len = ops->write(...); <- return value can be negative
...
if (len > 0)           <- true here in this case
        *ppos += len;
...
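
The effect of the type change can be reproduced in isolation.  The
following is a minimal userspace sketch, not the kernfs code: the
helper names are hypothetical and -22 merely stands in for a negative
errno returned by a write callback.

```c
#include <stddef.h>
#include <sys/types.h>

/* Buggy variant: storing the callback result in a size_t converts a
 * negative error code into a very large unsigned value, so the
 * "len > 0" test passes and the position would be advanced. */
static int pos_updated_with_size_t(ssize_t callback_ret)
{
        size_t len = callback_ret;

        return len > 0;
}

/* Fixed variant: with a signed type the error stays negative and the
 * position is left alone. */
static int pos_updated_with_ssize_t(ssize_t callback_ret)
{
        ssize_t len = callback_ret;

        return len > 0;
}
```

Passing -22 to the first helper yields 1 (position updated), while the
second yields 0; for a successful positive return both agree.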

Fixes: b7ce40cff0 ("kernfs: cache atomic_write_len in kernfs_open_file")
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-19 12:19:13 -05:00
Al Viro 076ccb76e1 fs: annotate ->poll() instances
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-11-27 16:20:05 -05:00
Linus Torvalds a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device removal. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Waiman Long 39bf04db6b kernfs: Clarify lockdep name for kn->count
The reference count in kernfs_node structure is treated like a rwsem by
using lockdep instrumentation code. The lockdep name, however, is still
"s_active" which is carried over from the old sysfs code. As s_active
is no longer the variable name, its use may confuse users on where the
lock is when it is reported by lockdep. So it is changed to "kn->count"
which is how this variable is normally referenced in kernfs code.

Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-28 16:50:15 +02:00
Shaohua Li c53cd490b1 kernfs: introduce kernfs_node_id
inode number and generation can identify a kernfs node. We are going to
export the identification by exportfs operations, so put ino and
generation into a separate structure. It's convenient when later patches
use the identification.

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Shaohua Li 319ba91d35 kernfs: don't set dentry->d_fsdata
When working on adding exportfs operations in kernfs, I found it's hard
to initialize dentry->d_fsdata in the exportfs operations. It looks like
there is no way to do it without a race condition. Looking at the kernfs
code closely, there is no point in setting dentry->d_fsdata.
inode->i_private already points to kernfs_node, and we can get the inode
from a dentry. So this patch just deletes the d_fsdata usage.

Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-29 09:00:03 -06:00
Vaibhav Jain 966fa72a71 kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file()
Recently started seeing a kernel oops when a module tries removing a
memory mapped sysfs bin_attribute. On closer investigation the root
cause seems to be kernfs_release_file() trying to call
kernfs_op.release() callback that's NULL for such sysfs
bin_attributes. The oops occurs when kernfs_release_file() is called from
kernfs_drain_open_files() to cleanup any open handles with active
memory mappings.

The patch fixes this by checking for flag KERNFS_HAS_RELEASE before
calling kernfs_release_file() in function kernfs_drain_open_files().
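
The shape of the fix can be sketched with a small userspace model.
MODEL_HAS_RELEASE and the model_* functions below are hypothetical
stand-ins for KERNFS_HAS_RELEASE and the kernfs functions; the point is
only that the drain path tests the flag before reaching code that
calls the callback unconditionally.

```c
#include <stddef.h>

#define MODEL_HAS_RELEASE 0x1u  /* hypothetical stand-in for the flag */

struct model_node {
        unsigned int flags;
        void (*release)(struct model_node *kn); /* NULL if not provided */
        int released;
};

/* Example callback that records that it ran. */
static void model_mark_released(struct model_node *kn)
{
        kn->released = 1;
}

/* Calls the callback unconditionally; with release == NULL this would
 * be a jump to address 0, as in the oops quoted in this message. */
static void model_release_file(struct model_node *kn)
{
        kn->release(kn);
}

/* Drain path with the check: nodes without a release callback are
 * skipped instead of being passed to model_release_file(). */
static void model_drain_open_files(struct model_node *kn)
{
        if (kn->flags & MODEL_HAS_RELEASE)
                model_release_file(kn);
}
```

A node with flags == 0 and release == NULL passes through
model_drain_open_files() untouched, while a node that sets both the
flag and the callback has its callback invoked once.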

On ppc64-le arch with cxl module the oops back-trace is of the
form below:
[  861.381126] Unable to handle kernel paging request for instruction fetch
[  861.381360] Faulting instruction address: 0x00000000
[  861.381428] Oops: Kernel access of bad area, sig: 11 [#1]
....
[  861.382481] NIP: 0000000000000000 LR: c000000000362c60 CTR:
0000000000000000
....
Call Trace:
[c000000f1680b750] [c000000000362c34] kernfs_drain_open_files+0x104/0x1d0 (unreliable)
[c000000f1680b790] [c00000000035fa00] __kernfs_remove+0x260/0x2c0
[c000000f1680b820] [c000000000360da0] kernfs_remove_by_name_ns+0x60/0xe0
[c000000f1680b8b0] [c0000000003638f4] sysfs_remove_bin_file+0x24/0x40
[c000000f1680b8d0] [c00000000062a164] device_remove_bin_file+0x24/0x40
[c000000f1680b8f0] [d000000009b7b22c] cxl_sysfs_afu_remove+0x144/0x170 [cxl]
[c000000f1680b940] [d000000009b7c7e4] cxl_remove+0x6c/0x1a0 [cxl]
[c000000f1680b990] [c00000000052f694] pci_device_remove+0x64/0x110
[c000000f1680b9d0] [c0000000006321d4] device_release_driver_internal+0x1f4/0x2b0
[c000000f1680ba20] [c000000000525cb0] pci_stop_bus_device+0xa0/0xd0
[c000000f1680ba60] [c000000000525e80] pci_stop_and_remove_bus_device+0x20/0x40
[c000000f1680ba90] [c00000000004a6c4] pci_hp_remove_devices+0x84/0xc0
[c000000f1680bad0] [c00000000004a688] pci_hp_remove_devices+0x48/0xc0
[c000000f1680bb10] [c0000000009dfda4] eeh_reset_device+0xb0/0x290
[c000000f1680bbb0] [c000000000032b4c] eeh_handle_normal_event+0x47c/0x530
[c000000f1680bc60] [c000000000032e64] eeh_handle_event+0x174/0x350
[c000000f1680bd10] [c000000000033228] eeh_event_handler+0x1e8/0x1f0
[c000000f1680bdc0] [c0000000000d384c] kthread+0x14c/0x190
[c000000f1680be30] [c00000000000b5a0] ret_from_kernel_thread+0x5c/0xbc

Fixes: f83f3c5156 ("kernfs: fix locking around kernfs_ops->release() callback")
Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-17 10:25:59 +09:00
Ingo Molnar 589ee62844 sched/headers: Prepare to remove the <linux/mm_types.h> dependency from <linux/sched.h>
Update code that relied on sched.h including various MM types for them.

This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:37 +01:00
Linus Torvalds f7878dc3a9 Merge branch 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
 "Several noteworthy changes.

   - Parav's rdma controller is finally merged. It is very
     straightforward and can limit the absolute numbers of common rdma
     constructs used by different cgroups.

   - kernel/cgroup.c got too chubby and disorganized. Created
     kernel/cgroup/ subdirectory and moved all cgroup related files
     under kernel/ there and reorganized the core code. This hurts for
     backporting patches but was long overdue.

   - cgroup v2 process listing reimplemented so that it no longer
     depends on allocating a buffer large enough to cache the entire
     result to sort and uniq the output. v2 has always mangled the sort
     order to ensure that users don't depend on the sorted output, so
     this shouldn't surprise anybody. This makes the pid listing
     functions use the same iterators that are used internally, which
     have to have the same iterating capabilities anyway.

   - perf cgroup filtering now works automatically on cgroup v2. This
     patch was posted a long time ago but somehow fell through the
     cracks.

   - misc fixes and documentation updates"

* 'for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (27 commits)
  kernfs: fix locking around kernfs_ops->release() callback
  cgroup: drop the matching uid requirement on migration for cgroup v2
  cgroup, perf_event: make perf_event controller work on cgroup2 hierarchy
  cgroup: misc cleanups
  cgroup: call subsys->*attach() only for subsystems which are actually affected by migration
  cgroup: track migration context in cgroup_mgctx
  cgroup: cosmetic update to cgroup_taskset_add()
  rdmacg: Fixed uninitialized current resource usage
  cgroup: Add missing cgroup-v2 PID controller documentation.
  rdmacg: Added documentation for rdmacg
  IB/core: added support to use rdma cgroup controller
  rdmacg: Added rdma cgroup controller
  cgroup: fix a comment typo
  cgroup: fix RCU related sparse warnings
  cgroup: move namespace code to kernel/cgroup/namespace.c
  cgroup: rename functions for consistency
  cgroup: move v1 mount functions to kernel/cgroup/cgroup-v1.c
  cgroup: separate out cgroup1_kf_syscall_ops
  cgroup: refactor mount path and clearly distinguish v1 and v2 paths
  cgroup: move cgroup v1 specific code to kernel/cgroup/cgroup-v1.c
  ...
2017-02-27 21:41:08 -08:00
Dave Jiang 11bac80004 mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf
->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to
take a vma and vmf parameter when the vma already resides in vmf.

Remove the vma parameter to simplify things.

[arnd@arndb.de: fix ARM build]
  Link: http://lkml.kernel.org/r/20170125223558.1451224-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Tejun Heo f83f3c5156 kernfs: fix locking around kernfs_ops->release() callback
The release callback may be called from two places - file release
operation and kernfs open file draining.  kernfs_open_file->mutex is
used to synchronize the two callsites.  This unfortunately leads to
possible circular locking because of->mutex is used to protect the
usual kernfs operations which may use locking constructs which are
held while removing and thus draining kernfs files.

@of->mutex is for synchronizing concurrent kernfs access operations
and all we need here is synchronization between the release and drain
paths.  As the drain path has to grab kernfs_open_file_mutex anyway,
let's use the mutex to synchronize the release operation instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Fixes: 0e67db2f9f ("kernfs: add kernfs_ops->open/release() callbacks")
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-21 15:49:25 -05:00
Tejun Heo 0e67db2f9f kernfs: add kernfs_ops->open/release() callbacks
Add ->open/release() methods to kernfs_ops.  ->open() is called when
the file is opened and ->release() when the file is either released or
severed.  These callbacks can be used, for example, to manage
persistent caching objects over multiple seq_file iterations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2016-12-27 14:49:03 -05:00