212 Commits

Author SHA1 Message Date
Ming Lei 0ca4538f8e blktrace: remove redundant return at end of function
JIRA: https://issues.redhat.com/browse/RHEL-79409

commit ccb9868ab7f4b253440b8723a3487b8b9a16d371
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Wed Dec 4 15:04:50 2024 +0000

    blktrace: remove redundant return at end of function

    A recent change added return 0 before an existing return statement
    at the end of function blk_trace_setup. The final return is now
    redundant, so remove it.

    Fixes: 64d124798244 ("blktrace: move copy_[to|from]_user() out of ->debugfs_lock")
    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
    Link: https://lore.kernel.org/r/20241204150450.399005-1-colin.i.king@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2025-03-14 16:48:33 +08:00
Ming Lei 63350ca314 blktrace: move copy_[to|from]_user() out of ->debugfs_lock
JIRA: https://issues.redhat.com/browse/RHEL-79409

commit b769a2f409e7a356db852a1bb62a32f7809b3a3c
Author: Ming Lei <ming.lei@redhat.com>
Date:   Thu Nov 28 20:50:27 2024 +0800

    blktrace: move copy_[to|from]_user() out of ->debugfs_lock

    Move copy_[to|from]_user() out of ->debugfs_lock and cut the dependency
    between mm->mmap_lock and q->debugfs_lock, which avoids many lockdep
    false-positive warnings. Obviously ->debugfs_lock isn't needed
    for copy_[to|from]_user().

    The only behavior change is that blk_trace_remove() is now called on
    setup failure after re-grabbing ->debugfs_lock. This is fine because
    concurrent setup() and remove() are already covered.
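
    The locking pattern described above can be sketched in user space. This is
    only an illustration under stated assumptions: struct queue, struct
    setup_args, copy_args() and queue_setup() are hypothetical names, a pthread
    mutex stands in for q->debugfs_lock, and a plain memcpy() stands in for
    copy_[to|from]_user(). It is not the kernel code from this commit.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these names exist in the kernel. */
struct setup_args {
    char name[32];
    unsigned int buf_size;
};

struct queue {
    pthread_mutex_t lock;      /* plays the role of q->debugfs_lock */
    struct setup_args active;  /* state protected by lock */
    int configured;
};

/* Stand-in for copy_[to|from]_user(): it can fail, so it runs with no lock held. */
static int copy_args(struct setup_args *dst, const struct setup_args *src)
{
    if (!dst || !src)
        return -1;
    memcpy(dst, src, sizeof(*dst));
    return 0;
}

int queue_setup(struct queue *q, const struct setup_args *in,
                struct setup_args *out)
{
    struct setup_args tmp;
    int ret = 0;

    assert(q != NULL);
    if (copy_args(&tmp, in))          /* copy in, lock not held */
        return -1;

    pthread_mutex_lock(&q->lock);     /* lock covers only the state update */
    if (q->configured) {
        ret = -2;
    } else {
        q->active = tmp;
        q->configured = 1;
    }
    pthread_mutex_unlock(&q->lock);
    if (ret)
        return ret;

    if (copy_args(out, &tmp)) {       /* copy out, lock not held */
        pthread_mutex_lock(&q->lock); /* re-grab the lock only to undo the setup */
        q->configured = 0;
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    return 0;
}
```

    The point of the sketch is the ordering: both copies happen outside the
    locked region, and the failure path takes the lock a second time to undo
    the setup, which mirrors the behavior change the message describes.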

    Reported-by: syzbot+91585b36b538053343e4@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/linux-block/67450fd4.050a0220.1286eb.0007.GAE@google.com/
    Closes: https://lore.kernel.org/linux-block/6742e584.050a0220.1cc393.0038.GAE@google.com/
    Closes: https://lore.kernel.org/linux-block/6742a600.050a0220.1cc393.002e.GAE@google.com/
    Closes: https://lore.kernel.org/linux-block/67420102.050a0220.1cc393.0019.GAE@google.com/
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20241128125029.4152292-3-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2025-03-14 16:48:32 +08:00
Ming Lei 38b144a09a blktrace: don't centralize grabbing q->debugfs_mutex in blk_trace_ioctl
JIRA: https://issues.redhat.com/browse/RHEL-79409

commit fd9b0244f5c5f63461ca9752eebd2423ae02bb59
Author: Ming Lei <ming.lei@redhat.com>
Date:   Thu Nov 28 20:50:26 2024 +0800

    blktrace: don't centralize grabbing q->debugfs_mutex in blk_trace_ioctl

    Call each handler directly and let each handler grab q->debugfs_mutex
    itself, in preparation for killing the dependency between
    ->debugfs_mutex and ->mmap_lock.

    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20241128125029.4152292-2-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2025-03-14 16:48:32 +08:00
Ming Lei d94433cc97 block: remove more NULL checks after bdev_get_queue()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 9e0c7efa5ea231d85c0d41693a5115b3b971717c
Author: Juhyung Park <qkrwngud825@gmail.com>
Date:   Fri Feb 3 11:40:29 2023 +0900

    block: remove more NULL checks after bdev_get_queue()

    bdev_get_queue() never returns NULL. Several commits [1][2] have been made
    before to remove such superfluous checks, but some still remained.

    For places where bdev_get_queue() is called solely for NULL checks, it is
    removed entirely.

    [1] commit ec9fd2a13d74 ("blk-lib: don't check bdev_get_queue() NULL check")
    [2] commit fea127b36c93 ("block: remove superfluous check for request queue in bdev_is_zoned()")

    Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
    Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
    Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:44 +08:00
Ming Lei 87bb3f8f17 trace/blktrace: fix memory leak with using debugfs_lookup()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 83e8864fee26f63a7435e941b7c36a20fd6fe93e
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Thu Feb 2 15:19:56 2023 +0100

    trace/blktrace: fix memory leak with using debugfs_lookup()

    When calling debugfs_lookup() the result must have dput() called on it,
    otherwise the memory will leak over time.  To make things simpler, just
    call debugfs_lookup_and_remove() instead which handles all of the logic
    at once.
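
    The leak described above comes down to an unbalanced reference count. The
    following toy model is a sketch only: struct entry, lookup(), put(),
    leaky_remove() and lookup_and_remove() are hypothetical user-space names
    that mimic the roles of debugfs_lookup(), dput() and
    debugfs_lookup_and_remove(); they are not the debugfs API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object with a reference count (hypothetical, not a real dentry). */
struct entry {
    int refs;
    int removed;
};

/* Like debugfs_lookup() in spirit: hands back a counted reference. */
static struct entry *lookup(struct entry *e)
{
    assert(e != NULL);
    e->refs++;
    return e;
}

/* Like dput() in spirit: drops one reference. */
static void put(struct entry *e)
{
    e->refs--;
}

static void remove_entry(struct entry *e)
{
    e->removed = 1;
}

/* Buggy shape: the reference taken by lookup() is never dropped. */
void leaky_remove(struct entry *e)
{
    remove_entry(lookup(e));
}

/* Fixed shape: lookup, remove, and drop the reference in one helper. */
void lookup_and_remove(struct entry *e)
{
    struct entry *found = lookup(e);

    remove_entry(found);
    put(found);
}
```

    In this model leaky_remove() leaves the count one higher than it started,
    while lookup_and_remove() leaves it unchanged.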

    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: linux-block@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Cc: linux-trace-kernel@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20230202141956.2299521-1-gregkh@linuxfoundation.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:43 +08:00
Ming Lei 1e5ec82724 blktrace: Fix output non-blktrace event when blk_classic option enabled
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit f596da3efaf4130ff61cd029558845808df9bf99
Author: Yang Jihong <yangjihong1@huawei.com>
Date:   Tue Nov 22 12:04:10 2022 +0800

    blktrace: Fix output non-blktrace event when blk_classic option enabled

    When the blk_classic option is enabled, non-blktrace events must be
    filtered out. Otherwise, events of other types are output in the blktrace
    classic format, which is unexpected.

    The problem can be triggered as follows:

      # echo 1 > /sys/kernel/debug/tracing/options/blk_classic
      # echo 1 > /sys/kernel/debug/tracing/events/enable
      # echo blk > /sys/kernel/debug/tracing/current_tracer
      # cat /sys/kernel/debug/tracing/trace_pipe

    Fixes: c71a896154 ("blktrace: add ftrace plugin")
    Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
    Link: https://lore.kernel.org/r/20221122040410.85113-1-yangjihong1@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:34 +08:00
Ming Lei 2844d2612f block: bdev & blktrace: use consistent function doc. notation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 2e833c8c8c42a3b6e22d6b3a9d2d18e425551261
Author: Randy Dunlap <rdunlap@infradead.org>
Date:   Wed Nov 30 23:03:31 2022 -0800

    block: bdev & blktrace: use consistent function doc. notation

    Use only one hyphen in kernel-doc notation between the function name
    and its short description.

    This is the documented kernel-doc format. It also fixes the HTML
    presentation to be consistent with other functions.

    Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
    Cc: Jens Axboe <axboe@kernel.dk>
    Cc: linux-block@vger.kernel.org
    Link: https://lore.kernel.org/r/20221201070331.25685-1-rdunlap@infradead.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:33 +08:00
Ming Lei ee37b69754 blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 2db96217e7e515071726ca4ec791742c4202a1b2
Author: Ye Bin <yebin10@huawei.com>
Date:   Wed Oct 19 11:36:02 2022 +0800

    blktrace: remove unnessary stop block trace in 'blk_trace_shutdown'

    As of the previous commit, 'blk_trace_cleanup' stops the block trace if
    its state is 'Blktrace_running'.
    So remove the unnecessary stop of the block trace in 'blk_trace_shutdown'.

    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20221019033602.752383-4-yebin@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:51:12 +08:00
Ming Lei 6fcb4150b2 blktrace: fix possible memleak in '__blk_trace_remove'
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit dcd1a59c62dc49da75539213611156d6db50ab5d
Author: Ye Bin <yebin10@huawei.com>
Date:   Wed Oct 19 11:36:01 2022 +0800

    blktrace: fix possible memleak in '__blk_trace_remove'

    When testing as follows:
    step1: ioctl(sda, BLKTRACESETUP, &arg)
    step2: ioctl(sda, BLKTRACESTART, NULL)
    step3: ioctl(sda, BLKTRACETEARDOWN, NULL)
    step4: ioctl(sda, BLKTRACESETUP, &arg)
    the following messages are reported:
    debugfs: File 'dropped' in directory 'sda' already present!
    debugfs: File 'msg' in directory 'sda' already present!
    debugfs: File 'trace0' in directory 'sda' already present!
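
    The four steps above could be driven from user space roughly as follows.
    This is a sketch, not the reporter's test program: blktrace_repro() and
    fill_setup() are hypothetical names, the buffer size, buffer count and
    action mask are arbitrary choices, and running it against a real device
    needs root and a kernel with blktrace enabled.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>            /* BLKTRACESETUP, BLKTRACESTART, BLKTRACETEARDOWN */
#include <linux/blktrace_api.h>  /* struct blk_user_trace_setup, BLK_TC_* */

/* Fill the setup argument; the values are illustrative assumptions. */
static void fill_setup(struct blk_user_trace_setup *arg)
{
    memset(arg, 0, sizeof(*arg));
    arg->act_mask = BLK_TC_READ | BLK_TC_WRITE;
    arg->buf_size = 512 * 1024;
    arg->buf_nr = 4;
}

/* Returns -1 if the device cannot be opened, else the last ioctl result. */
int blktrace_repro(const char *dev)
{
    struct blk_user_trace_setup arg;
    int fd, ret;

    assert(dev != NULL);
    fd = open(dev, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;

    fill_setup(&arg);
    ret = ioctl(fd, BLKTRACESETUP, &arg);            /* step1 */
    if (ret == 0)
        ret = ioctl(fd, BLKTRACESTART, NULL);        /* step2 */
    if (ret == 0)
        ret = ioctl(fd, BLKTRACETEARDOWN, NULL);     /* step3, no BLKTRACESTOP */
    if (ret == 0) {
        fill_setup(&arg);
        ret = ioctl(fd, BLKTRACESETUP, &arg);        /* step4 */
        if (ret == 0)
            ioctl(fd, BLKTRACETEARDOWN, NULL);       /* leave the device clean */
    }

    close(fd);
    return ret;
}
```

    A caller would pass a block device path such as /dev/sda. Any "already
    present" messages from step4 show up in the kernel log rather than in the
    return value.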

    syzkaller also reported an issue like "KASAN: use-after-free Read in relay_switch_subbuf":
    https://syzkaller.appspot.com/bug?id=13849f0d9b1b818b087341691be6cc3ac6a6bfb7

    If the block trace is removed without first being stopped (BLKTRACESTOP),
    '__blk_trace_remove' just sets 'q->blk_trace' to NULL. The debugfs files are not
    removed, however, so BLKTRACESETUP reports that the files are already present.
    static int __blk_trace_remove(struct request_queue *q)
    {
            struct blk_trace *bt;

            bt = rcu_replace_pointer(q->blk_trace, NULL,
                                     lockdep_is_held(&q->debugfs_mutex));
            if (!bt)
                    return -EINVAL;

            if (bt->trace_state != Blktrace_running)
                    blk_trace_cleanup(q, bt);

            return 0;
    }

    With the following test:
    step1: ioctl(sda, BLKTRACESETUP, &arg)
    step2: ioctl(sda, BLKTRACESTART, NULL)
    step3: ioctl(sda, BLKTRACETEARDOWN, NULL)
    step4: remove sda

    the debugfs directory is removed, which recursively removes all files
    under the directory.
    >> blk_release_queue
    >>      debugfs_remove_recursive(q->debugfs_dir)
    So all files created in 'do_blk_trace_setup' are removed, and
    'dentry->d_inode' is NULL. But 'q->blk_trace' is still on 'running_trace_list',
    and 'trace_note_tsk' traverses all nodes on 'running_trace_list'.
    >>trace_note_tsk
    >>  trace_note
    >>    relay_reserve
    >>       relay_switch_subbuf
    >>        d_inode(buf->dentry)->i_size

    To solve the above issues, follow commit '5afedf670caf': call 'blk_trace_cleanup'
    unconditionally in '__blk_trace_remove', and stop the block trace first in
    'blk_trace_cleanup'.

    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20221019033602.752383-3-yebin@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:51:11 +08:00
Ming Lei d2c3f725ed blktrace: introduce 'blk_trace_{start,stop}' helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 60a9bb9048f9e95029df10a9bc346f6b066c593c
Author: Ye Bin <yebin10@huawei.com>
Date:   Wed Oct 19 11:36:00 2022 +0800

    blktrace: introduce 'blk_trace_{start,stop}' helper

    Introduce 'blk_trace_{start,stop}' helper. No functional change.

    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20221019033602.752383-2-yebin@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:51:11 +08:00
Ming Lei 1e5904dfd4 blktrace: Fix the blk_fill_rwbs() kernel-doc header
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit 020e3618cc81abf11fe6bffaac27861ff94707ce
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Jul 15 11:47:35 2022 -0700

    blktrace: Fix the blk_fill_rwbs() kernel-doc header

    Reflect recent changes in the blk_fill_rwbs() kernel-doc header.

    Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Chaitanya Kulkarni <kch@nvidia.com>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Fixes: 919dbca8670d ("blktrace: Use the new blk_opf_t type")
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20220715184735.2326034-3-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:22 +08:00
Ming Lei 460fc4fe2d blktrace: Use the new blk_opf_t type
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit 919dbca8670d0f7828dfbb2f9b434ac22dca8d2e
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Thu Jul 14 11:06:37 2022 -0700

    blktrace: Use the new blk_opf_t type

    Improve static type checking by using the new blk_opf_t type for a function
    argument that represents a combination of a request operation and request
    flags. Rename that argument from 'op' into 'opf' to make its role more
    clear.

    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20220714180729.1065367-12-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:19 +08:00
Ming Lei 8125e18349 blktrace: Trace remapped requests correctly
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit 22c80aac882f712897b88b7ea8f5a74ea19019df
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Thu Jul 14 11:06:36 2022 -0700

    blktrace: Trace remapped requests correctly

    Trace the remapped operation and its flags instead of only the data
    direction of remapped operations. This issue was detected by analyzing
    the warnings reported by sparse related to the new blk_opf_t type.

    Reviewed-by: Jun'ichi Nomura <junichi.nomura@nec.com>
    Cc: Mike Snitzer <snitzer@kernel.org>
    Cc: Mike Christie <michael.christie@oracle.com>
    Cc: Li Zefan <lizf@cn.fujitsu.com>
    Cc: Chaitanya Kulkarni <kch@nvidia.com>
    Fixes: 1b9a9ab78b ("blktrace: use op accessors")
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20220714180729.1065367-11-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:19 +08:00
Ming Lei 03a68278a3 block: remove bdevname
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit 900d156bac2bc474cf7c7bee4efbc6c83ec5ae58
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Jul 13 07:53:17 2022 +0200

    block: remove bdevname

    Replace the remaining calls of bdevname with snprintf using the %pg
    format specifier.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:18 +08:00
Ming Lei 7ddabe0840 block: simplify blktrace sysfs attribute creation
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit cc5c516df028b221d94c65c47c5ae8d20f61b6f9
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Jun 28 19:18:45 2022 +0200

    block: simplify blktrace sysfs attribute creation

    Add the trace attributes to the default gendisk attributes, just like
    we already do for partitions.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20220628171850.1313069-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:15 +08:00
Ming Lei 86c6de1ac0 block: serialize all debugfs operations using q->debugfs_mutex
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094256

commit 5cf9c91ba927119fc6606b938b1895bb2459d3bc
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Jun 14 09:48:25 2022 +0200

    block: serialize all debugfs operations using q->debugfs_mutex

    Various places like I/O schedulers or the QOS infrastructure try to
    register debugfs files on demand, which can race with creating and
    removing the main queue debugfs directory.  Use the existing
    debugfs_mutex to serialize all debugfs operations that rely on
    q->debugfs_dir or the directories hanging off it.

    To make the teardown code a little simpler, declare all debugfs dentry
    pointers, and not just the main one, unconditionally in blkdev.h.

    Move debugfs_mutex next to the dentries that it protects and document
    what it is used for.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-29 09:46:50 +08:00
Ming Lei 38db1c7456 blk-cgroup: replace bio_blkcg with bio_blkcg_css
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit bbb1ebe7a909db4de49777fb7676d5bf293f34c9
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Apr 20 06:27:17 2022 +0200

    blk-cgroup: replace bio_blkcg with bio_blkcg_css

    All callers of bio_blkcg actually want the CSS, so replace it with an
    interface that does return the CSS.  This now allows moving
    struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a
    public header.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:58:03 +08:00
Ming Lei 525946c019 blktrace: cleanup the __trace_note_message interface
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit f4a6a61cb6d40d9ae63e47743d33200f3efe3fe7
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Apr 20 06:27:16 2022 +0200

    blktrace: cleanup the __trace_note_message interface

    Pass the cgroup_subsys_state instead of the blkg so that blktrace
    doesn't need to poke into blk-cgroup internals, and give the name a
    blk prefix as the current name is way too generic for a public
    interface.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20220420042723.1010598-9-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:58:03 +08:00
Ming Lei 8a0914e31d scsi: block: Remove REQ_OP_WRITE_SAME support
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917
Conflicts: small context difference in code removing [__]blkdev_issue_write_same

commit 73bd66d9c834220579c881a3eb020fd8917075d8
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Feb 9 09:28:28 2022 +0100

    scsi: block: Remove REQ_OP_WRITE_SAME support

    No more users of REQ_OP_WRITE_SAME or drivers implementing it are left,
    so remove the infrastructure.

    [mkp: fold in and tweak sysfs reporting fix]

    Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:57:59 +08:00
Ming Lei 9e603a920c blktrace: fix use after free for struct blk_trace
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066297

commit 30939293262eb433c960c4532a0d59c4073b2b84
Author: Yu Kuai <yukuai3@huawei.com>
Date:   Mon Feb 28 11:43:54 2022 +0800

    blktrace: fix use after free for struct blk_trace

    When tracing the whole disk, 'dropped' and 'msg' will be created
    under 'q->debugfs_dir' and 'bt->dir' is NULL, thus blk_trace_free()
    won't remove those files. What's worse, the following UAF can be
    triggered because of accessing stale 'dropped' and 'msg':

    ==================================================================
    BUG: KASAN: use-after-free in blk_dropped_read+0x89/0x100
    Read of size 4 at addr ffff88816912f3d8 by task blktrace/1188

    CPU: 27 PID: 1188 Comm: blktrace Not tainted 5.17.0-rc4-next-20220217+ #469
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-4
    Call Trace:
     <TASK>
     dump_stack_lvl+0x34/0x44
     print_address_description.constprop.0.cold+0xab/0x381
     ? blk_dropped_read+0x89/0x100
     ? blk_dropped_read+0x89/0x100
     kasan_report.cold+0x83/0xdf
     ? blk_dropped_read+0x89/0x100
     kasan_check_range+0x140/0x1b0
     blk_dropped_read+0x89/0x100
     ? blk_create_buf_file_callback+0x20/0x20
     ? kmem_cache_free+0xa1/0x500
     ? do_sys_openat2+0x258/0x460
     full_proxy_read+0x8f/0xc0
     vfs_read+0xc6/0x260
     ksys_read+0xb9/0x150
     ? vfs_write+0x3d0/0x3d0
     ? fpregs_assert_state_consistent+0x55/0x60
     ? exit_to_user_mode_prepare+0x39/0x1e0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae
    RIP: 0033:0x7fbc080d92fd
    Code: ce 20 00 00 75 10 b8 00 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 1
    RSP: 002b:00007fbb95ff9cb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
    RAX: ffffffffffffffda RBX: 00007fbb95ff9dc0 RCX: 00007fbc080d92fd
    RDX: 0000000000000100 RSI: 00007fbb95ff9cc0 RDI: 0000000000000045
    RBP: 0000000000000045 R08: 0000000000406299 R09: 00000000fffffffd
    R10: 000000000153afa0 R11: 0000000000000293 R12: 00007fbb780008c0
    R13: 00007fbb78000938 R14: 0000000000608b30 R15: 00007fbb780029c8
     </TASK>

    Allocated by task 1050:
     kasan_save_stack+0x1e/0x40
     __kasan_kmalloc+0x81/0xa0
     do_blk_trace_setup+0xcb/0x410
     __blk_trace_setup+0xac/0x130
     blk_trace_ioctl+0xe9/0x1c0
     blkdev_ioctl+0xf1/0x390
     __x64_sys_ioctl+0xa5/0xe0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    Freed by task 1050:
     kasan_save_stack+0x1e/0x40
     kasan_set_track+0x21/0x30
     kasan_set_free_info+0x20/0x30
     __kasan_slab_free+0x103/0x180
     kfree+0x9a/0x4c0
     __blk_trace_remove+0x53/0x70
     blk_trace_ioctl+0x199/0x1c0
     blkdev_common_ioctl+0x5e9/0xb30
     blkdev_ioctl+0x1a5/0x390
     __x64_sys_ioctl+0xa5/0xe0
     do_syscall_64+0x35/0x80
     entry_SYSCALL_64_after_hwframe+0x44/0xae

    The buggy address belongs to the object at ffff88816912f380
     which belongs to the cache kmalloc-96 of size 96
    The buggy address is located 88 bytes inside of
     96-byte region [ffff88816912f380, ffff88816912f3e0)
    The buggy address belongs to the page:
    page:000000009a1b4e7c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0f
    flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
    raw: 0017ffffc0000200 ffffea00044f1100 dead000000000002 ffff88810004c780
    raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected

    Memory state around the buggy address:
     ffff88816912f280: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
     ffff88816912f300: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    >ffff88816912f380: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
                                                        ^
     ffff88816912f400: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
     ffff88816912f480: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
    ==================================================================

    Fixes: c0ea57608b ("blktrace: remove debugfs file dentries from struct blk_trace")
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Link: https://lore.kernel.org/r/20220228034354.4047385-1-yukuai3@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-04-11 11:44:36 +08:00
Ming Lei bd612d5288 block: remove the ->rq_disk field in struct request
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066297
Conflicts: scsi doesn't backport big cleanup of 51f3a47889287
	("scsi: core: Introduce the scsi_cmd_to_rq() function"),
	so we have to convert rq->rq_disk into rq->q->disk directly

commit f3fa33acca9f0058157214800f68b10d8e71ab7a
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Nov 26 13:18:00 2021 +0100

    block: remove the ->rq_disk field in struct request

    Just use the disk attached to the request_queue instead.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-04-11 11:44:24 +08:00
Wander Lairson Costa ec78db9ca2 blktrace: switch trace spinlock to a raw spinlock
Bugzilla: http://bugzilla.redhat.com/2047971

commit 361c81dbc58c8aa230e1f2d556045fa7bc3eb4a3
Author: Wander Lairson Costa <wander@redhat.com>
Date:   Mon Dec 20 16:28:27 2021 -0300

    blktrace: switch trace spinlock to a raw spinlock

    The running_trace_lock protects running_trace_list and is acquired
    within the tracepoint which implies disabled preemption. The spinlock_t
    typed lock can not be acquired with disabled preemption on PREEMPT_RT
    because it becomes a sleeping lock.
    The runtime of the tracepoint depends on the number of entries in
    running_trace_list and has no limit. The blk-tracer is considered debug
    code and higher latencies here are okay.

    Make running_trace_lock a raw_spinlock_t.

    Signed-off-by: Wander Lairson Costa <wander@redhat.com>
    Link: https://lore.kernel.org/r/20211220192827.38297-1-wander@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Wander Lairson Costa <wander@redhat.com>
2022-01-29 09:25:34 -03:00
Ming Lei 9bcf43d65f block: don't call blk_status_to_errno in blk_update_request
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403

commit 8a7d267b4a2c71a5ff5dd9046abea7117c7d0ac2
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Oct 18 10:45:18 2021 +0200

    block: don't call blk_status_to_errno in blk_update_request

    We only need to call it to resolve the blk_status_t -> errno mapping for
    tracing, so move the conversion into the tracepoints that are not called
    at all when tracing isn't enabled.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2021-12-06 16:44:48 +08:00
Ming Lei e27a36aff3 blktrace: Fix uaf in blk_trace access after removing by sysfs
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403

commit 5afedf670caf30a2b5a52da96eb7eac7dee6a9c9
Author: Zhihao Cheng <chengzhihao1@huawei.com>
Date:   Thu Sep 23 21:49:21 2021 +0800

    blktrace: Fix uaf in blk_trace access after removing by sysfs

    There is a use-after-free problem triggered by the following process:

          P1(sda)                           P2(sdb)
                            echo 0 > /sys/block/sdb/trace/enable
                              blk_trace_remove_queue
                                synchronize_rcu
                                blk_trace_free
                                  relay_close
    rcu_read_lock
    __blk_add_trace
      trace_note_tsk
      (Iterate running_trace_list)
                                    relay_close_buf
                                      relay_destroy_buf
                                        kfree(buf)
        trace_note(sdb's bt)
          relay_reserve
            buf->offset <- NULL pointer dereference (use-after-free) !!!
    rcu_read_unlock

    [  502.714379] BUG: kernel NULL pointer dereference, address: 0000000000000010
    [  502.715260] #PF: supervisor read access in kernel mode
    [  502.715903] #PF: error_code(0x0000) - not-present page
    [  502.716546] PGD 103984067 P4D 103984067 PUD 17592b067 PMD 0
    [  502.717252] Oops: 0000 [#1] SMP
    [  502.720308] RIP: 0010:trace_note.isra.0+0x86/0x360
    [  502.732872] Call Trace:
    [  502.733193]  __blk_add_trace.cold+0x137/0x1a3
    [  502.733734]  blk_add_trace_rq+0x7b/0xd0
    [  502.734207]  blk_add_trace_rq_issue+0x54/0xa0
    [  502.734755]  blk_mq_start_request+0xde/0x1b0
    [  502.735287]  scsi_queue_rq+0x528/0x1140
    ...
    [  502.742704]  sg_new_write.isra.0+0x16e/0x3e0
    [  502.747501]  sg_ioctl+0x466/0x1100

    Reproduce method:
      ioctl(/dev/sda, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
      ioctl(/dev/sda, BLKTRACESTART)
      ioctl(/dev/sdb, BLKTRACESETUP, blk_user_trace_setup[buf_size=127])
      ioctl(/dev/sdb, BLKTRACESTART)

      echo 0 > /sys/block/sdb/trace/enable &
      // Add delay(mdelay/msleep) before kernel enters blk_trace_free()

      ioctl$SG_IO(/dev/sda, SG_IO, ...)
      // Enters trace_note_tsk() after blk_trace_free() returned
      // Use mdelay in rcu region rather than msleep(which may schedule out)

    When removing via sysfs, take blk_trace off the running list before
    calling blk_trace_free() if blk_trace is in the Blktrace_running state.
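
    The ordering in the fix (unlink from the running list first, free second)
    can be shown with a toy singly linked list. This is a sketch under
    assumptions: struct trace, trace_start(), trace_remove() and
    running_count() are hypothetical names, and the locking and RCU that the
    real code relies on are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a running trace; not the kernel's struct blk_trace. */
struct trace {
    int running;
    struct trace *next;
};

static struct trace *running_list;  /* plays the role of running_trace_list */

void trace_start(struct trace *t)
{
    assert(t != NULL);
    t->running = 1;
    t->next = running_list;
    running_list = t;
}

/* Unlink t so that list walkers can no longer reach it. */
static void trace_stop(struct trace *t)
{
    struct trace **pp;

    for (pp = &running_list; *pp; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            break;
        }
    }
    t->running = 0;
    t->next = NULL;
}

/* Removal path: stop (unlink) a running trace before freeing it. */
void trace_remove(struct trace *t)
{
    if (t->running)
        trace_stop(t);
    free(t);
}

/* Walks the list the way a note-to-all-running-traces helper would. */
int running_count(void)
{
    int n = 0;
    const struct trace *t;

    for (t = running_list; t; t = t->next)
        n++;
    return n;
}
```

    If trace_remove() freed the node without unlinking it, the next walk in
    running_count() would touch freed memory, which is the shape of the bug
    in the diagram above.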

    Fixes: c71a896154 ("blktrace: add ftrace plugin")
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Link: https://lore.kernel.org/r/20210923134921.109194-1-chengzhihao1@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2021-12-06 16:39:56 +08:00
Linus Torvalds 3ab6608e66 block-5.12-2021-02-27

Merge tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
 "A few stragglers (and one due to me missing it originally), and fixes
  for changes in this merge window mostly. In particular:

   - blktrace cleanups (Chaitanya, Greg)

   - Kill dead blk_pm_* functions (Bart)

   - Fixes for the bio alloc changes (Christoph)

   - Fix for the partition changes (Christoph, Ming)

   - Fix for turning off iopoll with polled IO inflight (Jeffle)

   - nbd disconnect fix (Josef)

   - loop fsync error fix (Mauricio)

   - kyber update depth fix (Yang)

   - max_sectors alignment fix (Mikulas)

   - Add bio_max_segs helper (Matthew)"

* tag 'block-5.12-2021-02-27' of git://git.kernel.dk/linux-block: (21 commits)
  block: Add bio_max_segs
  blktrace: fix documentation for blk_fill_rw()
  block: memory allocations in bounce_clone_bio must not fail
  block: remove the gfp_mask argument to bounce_clone_bio
  block: fix bounce_clone_bio for passthrough bios
  block-crypto-fallback: use a bio_set for splitting bios
  block: fix logging on capacity change
  blk-settings: align max_sectors on "logical_block_size" boundary
  block: reopen the device in blkdev_reread_part
  block: don't skip empty device in in disk_uevent
  blktrace: remove debugfs file dentries from struct blk_trace
  nbd: handle device refs for DESTROY_ON_DISCONNECT properly
  kyber: introduce kyber_depth_updated()
  loop: fix I/O error on fsync() in detached loop devices
  block: fix potential IO hang when turning off io_poll
  block: get rid of the trace rq insert wrapper
  blktrace: fix blk_rq_merge documentation
  blktrace: fix blk_rq_issue documentation
  blktrace: add blk_fill_rwbs documentation comment
  block: remove superfluous param in blk_fill_rwbs()
  ...
2021-02-28 11:23:38 -08:00
Chaitanya Kulkarni 94d4bffdda blktrace: fix documentation for blk_fill_rw()
Add the missing ":" after the rwbs function parameter documentation, which
fixes the following warning:

./kernel/trace/blktrace.c:1877: warning: Function parameter or member 'rwbs' not described in 'blk_fill_rwbs'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 1f83bb4b49 ("blktrace: add blk_fill_rwbs documentation comment")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-24 08:55:30 -07:00
Greg Kroah-Hartman c0ea57608b blktrace: remove debugfs file dentries from struct blk_trace
These debugfs dentries do not need to be saved for anything as the whole
directory and everything in it is properly cleaned up when the parent
directory is removed.  So remove them from struct blk_trace and don't
save them when created as it's not necessary.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-23 09:54:51 -07:00
Linus Torvalds c958423470 Tracing updates for 5.12
Merge tag 'trace-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:

 - Update to the way irqs and preemption are tracked via the trace event
   PC field

 - Fix handling of event unregistration failing due to a memory
   allocation failure. This is only triggered by failure injection, as
   an allocation of less than a page is pretty much guaranteed to succeed.

 - Do not show the useless "filter" or "enable" files for the "ftrace"
   trace system, as they have no effect on doing anything.

 - Add a warning if kprobes are registered more than once.

 - Synthetic events now have their fields parsed by semicolons. Old
   formats without semicolons will still work, but new features will
   require them.

 - New option to allow trace events to show %p without hashing in trace
   file. The trace file can only be read by root, and reading the raw
   event buffer did not have any pointers hashed, so this does not
   expose anything new.

 - New directory in tools called tools/tracing, which holds a new tool
   that reads sequential latency reports from the ftrace latency tracers.

 - Other minor fixes and cleanups.

* tag 'trace-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  kprobes: Fix to delay the kprobes jump optimization
  tracing/tools: Add the latency-collector to tools directory
  tracing: Make hash-ptr option default
  tracing: Add ptr-hash option to show the hashed pointer value
  tracing: Update the stage 3 of trace event macro comment
  tracing: Show real address for trace event arguments
  selftests/ftrace: Add '!event' synthetic event syntax check
  selftests/ftrace: Update synthetic event syntax errors
  tracing: Add a backward-compatibility check for synthetic event creation
  tracing: Update synth command errors
  tracing: Rework synthetic event command parsing
  tracing/dynevent: Delegate parsing to create function
  kprobes: Warn if the kprobe is reregistered
  ftrace: Remove unused ftrace_force_update()
  tracepoints: Code clean up
  tracepoints: Do not punish non static call users
  tracepoints: Remove unnecessary "data_args" macro parameter
  tracing: Do not create "enable" or "filter" files for ftrace event subsystem
  kernel: trace: preemptirq_delay_test: add cpu affinity
  tracepoint: Do not fail unregistering a probe due to memory failure
  ...
2021-02-22 14:07:15 -08:00
Chaitanya Kulkarni 1f83bb4b49 blktrace: add blk_fill_rwbs documentation comment
blk_fill_rwbs() is an exported function, so add a kernel-style documentation
comment.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22 06:37:41 -07:00
Chaitanya Kulkarni 179d160072 block: remove superfluous param in blk_fill_rwbs()
The last parameter of blk_fill_rwbs() was added in
5782138e47 ("tracing/events: convert block trace points to
TRACE_EVENT()") to signal a read request. Its use was replaced by a
switch case on REQ_OP_READ in 1b9a9ab78b ("blktrace: use op accessors"),
but the parameter itself was never removed.

Remove the unused parameter and adjust the respective call sites.
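
As a rough user-space illustration of the idea (not the kernel's code), the
sketch below fills a short string from an op code alone. The enum values,
letters and the name sketch_fill_rwbs() are hypothetical stand-ins; the point
is only that a switch case for the read op makes a separate "is this a read"
parameter unnecessary.

```c
#include <stddef.h>

/* Hypothetical stand-ins for request op codes; values are illustrative only. */
enum sketch_op {
	SKETCH_OP_READ,
	SKETCH_OP_WRITE,
	SKETCH_OP_DISCARD,
	SKETCH_OP_FLUSH,
	SKETCH_OP_OTHER,
};

/*
 * Write a NUL-terminated one-letter code for op into rwbs.
 * The caller must provide at least two bytes of space.
 * Reads are identified by the switch itself, so no extra
 * parameter is needed to signal them.
 */
static void sketch_fill_rwbs(char *rwbs, enum sketch_op op)
{
	size_t i = 0;

	switch (op) {
	case SKETCH_OP_WRITE:
		rwbs[i++] = 'W';
		break;
	case SKETCH_OP_DISCARD:
		rwbs[i++] = 'D';
		break;
	case SKETCH_OP_FLUSH:
		rwbs[i++] = 'F';
		break;
	case SKETCH_OP_READ:
		rwbs[i++] = 'R';
		break;
	default:
		rwbs[i++] = 'N';
		break;
	}
	rwbs[i] = '\0';
}
```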

Fixes: 1b9a9ab78b ("blktrace: use op accessors")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-22 06:37:41 -07:00
Sebastian Andrzej Siewior 36590c50b2 tracing: Merge irqflags + preempt counter.
The state of the interrupts (irqflags) and the preemption counter are
both passed down to tracing_generic_entry_update(). Only one bit of
irqflags is actually required: The on/off state. The complete 32 bits
of the preemption counter aren't needed. Only whether any of the upper bits
(softirq, hardirq and NMI) are set and the preemption depth are needed.

The irqflags and the preemption counter could be evaluated early and the
information stored in an integer `trace_ctx'.
tracing_generic_entry_update() would use the upper bits as the
TRACE_FLAG_* and the lower 8 bits as the disabled-preemption depth
(considering that one must be subtracted from the counter in one
special case).

The actual preemption value is not used except for the tracing record.
The `irqflags' variable is mostly used only for the tracing record. An
exception here is for instance wakeup_tracer_call() or
probe_wakeup_sched_switch() which explicitly disable interrupts and use
that `irqflags' to save (and restore) the IRQ state and to record the
state.

Struct trace_event_buffer also has the `pc' and `flags' members which can
be replaced with `trace_ctx' since their actual value is not used
outside of trace recording.

This will reduce tracing_generic_entry_update() to simply assign values
to struct trace_entry. The evaluation of the TRACE_FLAG_* bits is moved
to _tracing_gen_ctx_flags() which replaces preempt_count() and
local_save_flags() invocations.
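
A minimal user-space sketch of the packing described above, assuming the flag
bits sit above bit 16 and the preemption depth occupies the low 8 bits. The
SKETCH_* names, the flag values and the shift are illustrative assumptions,
not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical flag bits, standing in for the TRACE_FLAG_* values. */
#define SKETCH_FLAG_IRQS_OFF	0x01u
#define SKETCH_FLAG_HARDIRQ	0x08u
#define SKETCH_FLAG_SOFTIRQ	0x10u

/* Assumed position of the flag bits inside the packed context word. */
#define SKETCH_FLAG_SHIFT	16

/* Evaluate once: pack the flags and the preemption depth into one integer. */
static inline uint32_t sketch_pack_ctx(uint32_t flags, uint32_t preempt_depth)
{
	return (flags << SKETCH_FLAG_SHIFT) | (preempt_depth & 0xffu);
}

/* Assign many times: each consumer just unpacks the stored value. */
static inline uint32_t sketch_ctx_flags(uint32_t ctx)
{
	return ctx >> SKETCH_FLAG_SHIFT;
}

static inline uint32_t sketch_ctx_depth(uint32_t ctx)
{
	return ctx & 0xffu;
}
```

With this layout a caller computes the context word once and hands the same
integer to every record it fills in, which is the saving the message
describes.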

As an example, ftrace_syscall_enter() may invoke:
- trace_buffer_lock_reserve() -> … -> tracing_generic_entry_update()
- event_trigger_unlock_commit()
  -> ftrace_trace_stack() -> … -> tracing_generic_entry_update()
  -> ftrace_trace_userstack() -> … -> tracing_generic_entry_update()

In this case the TRACE_FLAG_* bits were evaluated three times. By using
the `trace_ctx' they are evaluated once and assigned three times.

A build with all tracers enabled on x86-64 with and without the patch:

    text     data      bss      dec      hex    filename
21970669 17084168  7639260 46694097  2c87ed1 vmlinux.old
21970293 17084168  7639260 46693721  2c87d59 vmlinux.new

text shrank by 376 bytes, data remained constant.

Link: https://lkml.kernel.org/r/20210125194511.3924915-2-bigeasy@linutronix.de

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-02 17:02:06 -05:00
Christoph Hellwig 309dca309f block: store a block_device pointer in struct bio
Replace the gendisk pointer in struct bio with a pointer to the newly
improved struct block_device.  From that the gendisk can be trivially
accessed with an extra indirection, but it also allows directly
looking up all information related to partition remapping.
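
The extra indirection can be pictured with the simplified user-space model
below. The sketch_* structures are assumptions made for illustration; the
field names are modelled on the kernel's, but the layouts are not the real
ones.

```c
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; layouts are assumptions. */
struct sketch_gendisk {
	const char *disk_name;
};

struct sketch_block_device {
	struct sketch_gendisk *bd_disk;	/* owning disk */
	uint64_t bd_start_sect;		/* partition offset on the disk */
	uint8_t bd_partno;		/* 0 for the whole device */
};

struct sketch_bio {
	struct sketch_block_device *bi_bdev;	/* replaces a direct gendisk pointer */
	uint64_t bi_sector;			/* sector relative to bi_bdev */
};

/* The gendisk is one extra pointer hop away. */
static inline struct sketch_gendisk *sketch_bio_disk(const struct sketch_bio *bio)
{
	return bio->bi_bdev->bd_disk;
}

/* Partition remapping information is reachable from the same pointer. */
static inline uint64_t sketch_bio_disk_sector(const struct sketch_bio *bio)
{
	return bio->bi_sector + bio->bi_bdev->bd_start_sect;
}
```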

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24 18:17:20 -07:00
Linus Torvalds 09c0796adf Tracing updates for 5.11
The major update to this release is that there's a new arch config option called:
 CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS. Currently, only x86_64 enables it.
 All the ftrace callbacks now take a struct ftrace_regs instead of a struct
 pt_regs. If the architecture has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then
 the ftrace_regs will have enough information to read the arguments of the
 function being traced, as well as access to the stack pointer. This way, if
 a user (like live kernel patching) only cares about the arguments, then it
 can avoid using the heavier weight "regs" callback, that puts in enough
 information in the struct ftrace_regs to simulate a breakpoint exception
 (needed for kprobes).
 
 New config option that audits the timestamps of the ftrace ring buffer at
 most every event recorded.  The "check_buffer()" calls will conflict with
 mainline, because I purposely added the check without including the fix that
 it caught, which is in mainline. Running a kernel built from the commit of
 the added check will trigger it.
 
 Ftrace recursion protection has been cleaned up to move the protection to
 the callback itself (this saves on an extra function call for those
 callbacks).
 
 Perf now handles its own RCU protection and does not depend on ftrace to do
 it for it (saving on that extra function call).
 
 New debug option to add "recursed_functions" file to tracefs that lists all
 the places that triggered the recursion protection of the function tracer.
 This will show where things need to be fixed as recursion slows down the
 function tracer.
 
 The eval enum mapping updates done at boot up are now offloaded to a work
 queue, as it caused a noticeable pause on slow embedded boards.
 
 Various clean ups and last minute fixes.

Merge tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing updates from Steven Rostedt:
 "The major update to this release is that there's a new arch config
  option called CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS.

  Currently, only x86_64 enables it. All the ftrace callbacks now take a
  struct ftrace_regs instead of a struct pt_regs. If the architecture
  has HAVE_DYNAMIC_FTRACE_WITH_ARGS enabled, then the ftrace_regs will
  have enough information to read the arguments of the function being
  traced, as well as access to the stack pointer.

  This way, if a user (like live kernel patching) only cares about the
  arguments, then it can avoid using the heavier weight "regs" callback,
  that puts in enough information in the struct ftrace_regs to simulate
  a breakpoint exception (needed for kprobes).

  A new config option that audits the timestamps of the ftrace ring
  buffer at most every event recorded.

  Ftrace recursion protection has been cleaned up to move the protection
  to the callback itself (this saves on an extra function call for those
  callbacks).

  Perf now handles its own RCU protection and does not depend on ftrace
  to do it for it (saving on that extra function call).

  New debug option to add "recursed_functions" file to tracefs that
  lists all the places that triggered the recursion protection of the
  function tracer. This will show where things need to be fixed as
  recursion slows down the function tracer.

  The eval enum mapping updates done at boot up are now offloaded to a
  work queue, as it caused a noticeable pause on slow embedded boards.

  Various clean ups and last minute fixes"

* tag 'trace-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
  tracing: Offload eval map updates to a work queue
  Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"
  ring-buffer: Add rb_check_bpage in __rb_allocate_pages
  ring-buffer: Fix two typos in comments
  tracing: Drop unneeded assignment in ring_buffer_resize()
  tracing: Disable ftrace selftests when any tracer is running
  seq_buf: Avoid type mismatch for seq_buf_init
  ring-buffer: Fix a typo in function description
  ring-buffer: Remove obsolete rb_event_is_commit()
  ring-buffer: Add test to validate the time stamp deltas
  ftrace/documentation: Fix RST C code blocks
  tracing: Clean up after filter logic rewriting
  tracing: Remove the useless value assignment in test_create_synth_event()
  livepatch: Use the default ftrace_ops instead of REGS when ARGS is available
  ftrace/x86: Allow for arguments to be passed in to ftrace_regs by default
  ftrace: Have the callbacks receive a struct ftrace_regs instead of pt_regs
  MAINTAINERS: assign ./fs/tracefs to TRACING
  tracing: Fix some typos in comments
  ftrace: Remove unused varible 'ret'
  ring-buffer: Add recording of ring buffer recursion into recursed_functions
  ...
2020-12-17 13:22:17 -08:00
Linus Torvalds ac7ac4618c for-5.11/block-2020-12-14
Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "Another series of killing more code than what is being added, again
  thanks to Christoph's relentless cleanups and tech debt tackling.

  This contains:

   - blk-iocost improvements (Baolin Wang)

   - part0 iostat fix (Jeffle Xu)

   - Disable iopoll for split bios (Jeffle Xu)

   - block tracepoint cleanups (Christoph Hellwig)

   - Merging of struct block_device and hd_struct (Christoph Hellwig)

   - Rework/cleanup of how block device sizes are updated (Christoph
     Hellwig)

   - Simplification of gendisk lookup and removal of block device
     aliasing (Christoph Hellwig)

   - Block device ioctl cleanups (Christoph Hellwig)

   - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig)

   - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig)

   - sbitmap improvements (Pavel Begunkov)

   - Hybrid polling fix (Pavel Begunkov)

   - bvec iteration improvements (Pavel Begunkov)

   - Zone revalidation fixes (Damien Le Moal)

   - blk-throttle limit fix (Yu Kuai)

   - Various little fixes"

* tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits)
  blk-mq: fix msec comment from micro to milli seconds
  blk-mq: update arg in comment of blk_mq_map_queue
  blk-mq: add helper allocating tagset->tags
  Revert "block: Fix a lockdep complaint triggered by request queue flushing"
  nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class
  blk-mq: add new API of blk_mq_hctx_set_fq_lock_class
  block: disable iopoll for split bio
  block: Improve blk_revalidate_disk_zones() checks
  sbitmap: simplify wrap check
  sbitmap: replace CAS with atomic and
  sbitmap: remove swap_lock
  sbitmap: optimise sbitmap_deferred_clear()
  blk-mq: skip hybrid polling if iopoll doesn't spin
  blk-iocost: Factor out the base vrate change into a separate function
  blk-iocost: Factor out the active iocgs' state check into a separate function
  blk-iocost: Move the usage ratio calculation to the correct place
  blk-iocost: Remove unnecessary advance declaration
  blk-iocost: Fix some typos in comments
  blktrace: fix up a kerneldoc comment
  block: remove the request_queue to argument request based tracepoints
  ...
2020-12-16 12:57:51 -08:00
Jani Nikula abf4e00c7b blktrace: make relay callbacks const
Now that relay_open() accepts const callbacks, make relay callbacks
const.
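
The pattern looks roughly like the following user-space sketch. The
sketch_callbacks structure and sketch_open() are hypothetical and much
smaller than the relay API; they only show a callback table declared const
and passed through a pointer-to-const parameter.

```c
/* Hypothetical callback table, far smaller than the real relay one. */
struct sketch_callbacks {
	int (*on_full)(int buf_id);
};

static int sketch_on_full(int buf_id)
{
	/* Report whether the buffer id is valid. */
	return buf_id >= 0;
}

/* Declared const: the table can live in read-only data. */
static const struct sketch_callbacks sketch_cbs = {
	.on_full = sketch_on_full,
};

/* The consumer takes a pointer to const, so it cannot modify the table. */
static int sketch_open(const struct sketch_callbacks *cb, int buf_id)
{
	return cb->on_full ? cb->on_full(buf_id) : 0;
}
```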

Link: https://lkml.kernel.org/r/7ff5ce0b735901eb4f10e13da2704f1d8c4a2507.1606153547.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-12-15 22:46:18 -08:00
Christoph Hellwig 45dc656aeb blktrace: fix up a kerneldoc comment
Fixes: a54895fa05 ("block: remove the request_queue to argument request based tracepoints")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 13:20:31 -07:00
Christoph Hellwig a54895fa05 block: remove the request_queue to argument request based tracepoints
The request_queue can trivially be derived from the request.
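
A small user-space sketch of why the extra argument is redundant, assuming a
request that carries a back-pointer to its queue. The sketch_* types and the
hook name are illustrative, not the kernel's tracepoint definitions.

```c
/* Simplified stand-ins; only the back-pointer matters for this sketch. */
struct sketch_request_queue {
	int id;
};

struct sketch_request {
	struct sketch_request_queue *q;	/* queue the request belongs to */
	unsigned long long sector;
};

/*
 * A hook that once took (q, rq) needs only rq, because the queue
 * is reachable through the request itself.
 */
static int sketch_trace_rq_issue(const struct sketch_request *rq)
{
	return rq->q->id;
}
```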

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig 1c02fca620 block: remove the request_queue argument to the block_bio_remap tracepoint
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig eb6f7f7cd3 block: remove the request_queue argument to the block_split tracepoint
The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig e8a676d61c block: simplify and extend the block_bio_merge tracepoint class
The block_bio_merge tracepoint class can be reused for most bio-based
tracepoints.  For that it just needs to lose the superfluous q and rq
parameters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig b81b8f40c5 block: remove the unused block_sleeprq tracepoint
The block_sleeprq tracepoint was only used by the legacy request code.
Remove it now that the legacy request code is gone.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-04 09:42:00 -07:00
Christoph Hellwig 0d02129e76 block: merge struct block_device and struct hd_struct
Instead of having two structures that represent each block device with
different life time rules, merge them into a single one.  This also
greatly simplifies the reference counting rules, as we can use the inode
reference count as the main reference count for the new struct
block_device, with the device model reference front ending it for device
model interaction.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Christoph Hellwig 29ff57c610 block: move the start_sect field to struct block_device
Move the start_sect field to struct block_device in preparation
of killing struct hd_struct.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Christoph Hellwig a782483cc1 block: remove the nr_sects field in struct hd_struct
Now that the hd_struct always has a block device attached to it, there is
no need for having two size fields that can get out of sync.

Additionally the field in hd_struct did not use proper serialization,
possibly allowing for torn writes.  By only using the block_device field
this problem also gets fixed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de>			[bcache]
Acked-by: Chao Yu <yuchao0@huawei.com>			[f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-01 14:53:40 -07:00
Qiujun Huang 2b5894cc33 tracing: Fix some typos in comments
s/detetector/detector/
s/enfoced/enforced/
s/writen/written/
s/actualy/actually/
s/bascially/basically/
s/Regarldess/Regardless/
s/zeroes/zeros/
s/followd/followed/
s/incrememented/incremented/
s/separatelly/separately/
s/accesible/accessible/
s/sythetic/synthetic/
s/enabed/enabled/
s/heurisitc/heuristic/
s/assocated/associated/
s/otherwides/otherwise/
s/specfied/specified/
s/seaching/searching/
s/hierachry/hierarchy/
s/internel/internal/
s/Thise/This/

Link: https://lkml.kernel.org/r/20201029150554.3354-1-hqjagain@gmail.com

Signed-off-by: Qiujun Huang <hqjagain@gmail.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-11-10 20:39:40 -05:00
Christoph Hellwig 10ed16662d block: add a bdget_part helper
All remaining callers of bdget() outside of fs/block_dev.c want to get a
reference to the struct block_device for a given struct hd_struct.  Add
a helper just for that and then mark bdget static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-10-05 10:38:33 -06:00
Christoph Hellwig fa01b1e973 block: add a bdev_is_partition helper
Add a little helper to make the somewhat arcane bd_contains checks a
little more obvious.
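
The kind of check being wrapped can be sketched as below, under the
assumption that bd_contains points at the whole-disk device and that a
whole-disk device points at itself. The sketch_bdev type is a stand-in, and
the helper body is an illustration rather than the one this commit adds.

```c
#include <stdbool.h>

/* Stand-in with only the field the check needs. */
struct sketch_bdev {
	struct sketch_bdev *bd_contains;	/* whole-disk device; self for a whole disk */
};

/* Named helper in place of an open-coded pointer comparison. */
static inline bool sketch_bdev_is_partition(const struct sketch_bdev *bdev)
{
	return bdev->bd_contains != bdev;
}
```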

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-25 08:18:57 -06:00
Wang Hai e75ad2cc41 blktrace: make function blk_trace_bio_get_cgid() static
The sparse tool complains as follows:

kernel/trace/blktrace.c:796:5: warning:
 symbol 'blk_trace_bio_get_cgid' was not declared. Should it be static?

This function is not used outside of blktrace.c, so this commit
marks it static.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-09-07 20:11:15 -06:00
Gustavo A. R. Silva df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where they occur.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
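
For readers outside the kernel tree, a user-space approximation of the
pseudo-keyword might look like this. The macro definition below is a sketch
that relies on the compiler's fallthrough attribute where available and
degrades to an empty statement otherwise; sketch_count_from() is a
hypothetical function used only to show the usage.

```c
/* Use the statement attribute when the compiler provides it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* fallthrough */
#endif

/* Count how many cases run when entering the switch at 'start'. */
static int sketch_count_from(int start)
{
	int n = 0;

	switch (start) {
	case 0:
		n++;
		fallthrough;
	case 1:
		n++;
		fallthrough;
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```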

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Jan Kara f3bdc62fd8 blktrace: Provide event for request merging
Currently blk-mq does not report any event when two requests get merged
in the elevator. This results in a difficult-to-understand sequence
of events like:

...
  8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
  8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
  8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
  8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
  8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
  8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]

and you can only guess (after quite some headscratching since the above
excerpt is intermixed with a lot of other IO) that request 215023544+56
got merged into request 215023504+40. Provide a proper event for request
merging like we used to do in the legacy block layer.
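
The guesswork in the excerpt amounts to the arithmetic below: the second
request starts exactly where the first one ends, and their lengths add up to
the dispatched size. The sketch_extent type and helpers are hypothetical and
exist only to reproduce that calculation.

```c
#include <stdbool.h>
#include <stdint.h>

/* A request as shown in the trace: start sector + number of sectors. */
struct sketch_extent {
	uint64_t sector;
	uint32_t nr_sectors;
};

/* b can be appended to a when it starts right where a ends. */
static bool sketch_back_mergeable(const struct sketch_extent *a,
				  const struct sketch_extent *b)
{
	return a->sector + a->nr_sectors == b->sector;
}

/* Result of appending b to a; the caller checks mergeability first. */
static struct sketch_extent sketch_back_merge(const struct sketch_extent *a,
					      const struct sketch_extent *b)
{
	struct sketch_extent merged = {
		.sector = a->sector,
		.nr_sectors = a->nr_sectors + b->nr_sectors,
	};
	return merged;
}
```

Applied to the trace, 215023504 + 40 ends at 215023544, where the 56-sector
request begins, and the merged request is 215023504 + 96, matching the D and
C lines.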

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-06-25 21:06:11 -06:00