Commit Graph

76 Commits

Author SHA1 Message Date
Ming Lei 03ea72cb4b lib/sbitmap: define swap_lock as raw_spinlock_t
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 65f666c6203600053478ce8e34a1db269a8701c9
Author: Ming Lei <ming.lei@redhat.com>
Date:   Thu Sep 19 10:17:09 2024 +0800

    lib/sbitmap: define swap_lock as raw_spinlock_t

    When called from sbitmap_queue_get(), sbitmap_deferred_clear() may run
    with preemption disabled. On an RT kernel, spin_lock() can sleep, so the
    warning "BUG: sleeping function called from invalid context" can be
    triggered.

    Fix it by replacing the lock with a raw_spin_lock.

    Cc: Yang Yang <yang.yang@vivo.com>
    Fixes: 72d04bdcf3f7 ("sbitmap: fix io hung due to race on sbitmap_word::cleared")
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Yang Yang <yang.yang@vivo.com>
    Link: https://lore.kernel.org/r/20240919021709.511329-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:18 +08:00
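
A minimal sketch of the change, under the assumption that only the lock type and the lock calls differ (surrounding fields, arguments and the real clearing logic are trimmed, and the names below are illustrative): the per-word swap_lock becomes a raw_spinlock_t, which never sleeps, so taking it with preemption disabled is legal on PREEMPT_RT:

    #include <linux/spinlock.h>

    /* Other sbitmap_word fields are elided in this sketch. */
    struct sbitmap_word_sketch {
            unsigned long cleared;
            raw_spinlock_t swap_lock;       /* was: spinlock_t swap_lock; */
    };

    static void swap_lock_usage_sketch(struct sbitmap_word_sketch *map)
    {
            unsigned long flags;

            /* was: spin_lock_irqsave(&map->swap_lock, flags), which is a
             * sleeping lock on PREEMPT_RT and so must not be taken with
             * preemption disabled; the raw variant never sleeps. */
            raw_spin_lock_irqsave(&map->swap_lock, flags);
            map->cleared = 0;               /* stand-in for the real drain */
            raw_spin_unlock_irqrestore(&map->swap_lock, flags);
    }
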
Ming Lei fbaeddbae7 sbitmap: fix io hung due to race on sbitmap_word::cleared
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 72d04bdcf3f7d7e07d82f9757946f68802a7270a
Author: Yang Yang <yang.yang@vivo.com>
Date:   Tue Jul 16 16:26:27 2024 +0800

    sbitmap: fix io hung due to race on sbitmap_word::cleared

    Configuration for sbq:
      depth=64, wake_batch=6, shift=6, map_nr=1

    1. There are 64 requests in progress:
      map->word = 0xFFFFFFFFFFFFFFFF
    2. After all the 64 requests complete, and no more requests come:
      map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF
    3. Now two tasks try to allocate requests:
      T1:                                       T2:
      __blk_mq_get_tag                          .
      __sbitmap_queue_get                       .
      sbitmap_get                               .
      sbitmap_find_bit                          .
      sbitmap_find_bit_in_word                  .
      __sbitmap_get_word  -> nr=-1              __blk_mq_get_tag
      sbitmap_deferred_clear                    __sbitmap_queue_get
      /* map->cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
        if (!READ_ONCE(map->cleared))           sbitmap_find_bit_in_word
          return false;                         __sbitmap_get_word -> nr=-1
        mask = xchg(&map->cleared, 0)           sbitmap_deferred_clear
        atomic_long_andnot()                    /* map->cleared=0 */
                                                  if (!(map->cleared))
                                                    return false;
                                         /*
                                          * map->cleared is cleared by T1
                                          * T2 fail to acquire the tag
                                          */

    4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
    up because the wake_batch is set to 6. If no more requests come, T2
    will wait here indefinitely.

    This patch achieves two purposes:
    1. The check on ->cleared and the updates of both ->cleared and ->word
    need to be done atomically; using a spinlock is the simplest solution.
    2. Add an extra check in sbitmap_deferred_clear() to identify whether
    ->word still has free bits.

    Fixes: ea86ea2cdc ("sbitmap: ammortize cost of clearing bits")
    Signed-off-by: Yang Yang <yang.yang@vivo.com>
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:16 +08:00
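
A hedged sketch of the fix described above, simplified from the upstream change (the alloc_hint/wrap handling and exact field layout are omitted; the helper name and its depth argument are illustrative): the check on ->cleared and the update of ->word happen under the per-word swap_lock, and when ->cleared is already empty the function reports whether ->word still has free bits instead of failing unconditionally, so T2 in the trace above retries instead of sleeping forever:

    #include <linux/sbitmap.h>
    #include <linux/spinlock.h>
    #include <linux/atomic.h>
    #include <linux/bitops.h>

    static bool sbitmap_deferred_clear_sketch(struct sbitmap_word *map,
                                              unsigned int depth)
    {
            unsigned long mask, flags;
            bool ret = false;

            /* swap_lock is the per-word lock (re)introduced by this commit. */
            spin_lock_irqsave(&map->swap_lock, flags);

            if (!map->cleared) {
                    /* Another task may have just drained ->cleared into
                     * ->word: report whether any bit within the used depth
                     * is still free, so the caller retries this word
                     * instead of giving up. */
                    if (depth) {
                            mask = (~0UL) >> (BITS_PER_LONG - depth);
                            ret = (READ_ONCE(map->word) & mask) != mask;
                    }
                    goto out;
            }

            /* Snapshot the deferred-cleared bits and clear them in the
             * allocation word, all under the same lock. */
            mask = xchg(&map->cleared, 0);
            atomic_long_andnot(mask, (atomic_long_t *)&map->word);
            ret = true;
    out:
            spin_unlock_irqrestore(&map->swap_lock, flags);
            return ret;
    }
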
Ming Lei f850e7d528 sbitmap: use READ_ONCE to access map->word
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 6ad0d7e0f4b68f87a98ea2b239123b7d865df86b
Author: linke li <lilinke99@qq.com>
Date:   Fri Apr 26 18:34:44 2024 +0800

    sbitmap: use READ_ONCE to access map->word

    In __sbitmap_queue_get_batch(), map->word is read several times and
    updated atomically using atomic_long_try_cmpxchg(), but the first two
    reads of map->word are not protected.

    This patch moves the statement val = READ_ONCE(map->word) forward,
    eliminating unprotected accesses to map->word within the function.
    It is aimed at reducing the number of benign races reported by KCSAN in
    order to focus future debugging effort on harmful races.

    Signed-off-by: linke li <lilinke99@qq.com>
    Link: https://lore.kernel.org/r/tencent_0B517C25E519D3D002194E8445E86C04AD0A@qq.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:18:58 +08:00
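
A hedged sketch adapted from the __sbitmap_queue_get_batch() fast path (the helper name, the trimmed arguments and the omitted deferred clearing and per-word loop are editorial): map->word is loaded once with READ_ONCE() into a local, every later test uses that local, and atomic_long_try_cmpxchg() refreshes it on failure, so no further unprotected reads remain:

    #include <linux/sbitmap.h>
    #include <linux/bitops.h>

    static unsigned long claim_batch_sketch(struct sbitmap_word *map,
                                            unsigned int map_depth,
                                            unsigned int nr_tags,
                                            unsigned int *offset)
    {
            atomic_long_t *ptr = (atomic_long_t *)&map->word;
            unsigned long get_mask, val;
            unsigned int nr;

            /* The single marked read this patch moves to the front. */
            val = READ_ONCE(map->word);

            /* Word fully allocated: all map_depth low bits are set. */
            if (val == (1UL << (map_depth - 1)) * 2 - 1)
                    return 0;

            nr = find_first_zero_bit(&val, map_depth);
            if (nr + nr_tags > map_depth)
                    return 0;

            get_mask = ((1UL << nr_tags) - 1) << nr;
            /* On failure, try_cmpxchg reloads map->word into val, so no
             * second unprotected read of the word is needed. */
            while (!atomic_long_try_cmpxchg(ptr, &val, get_mask | val))
                    ;

            *offset = nr;
            /* Hand out only the bits that were actually free in val. */
            return (get_mask & ~val) >> nr;
    }
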
Ming Lei 5d67769294 sbitmap: remove stale comment in sbq_calc_wake_batch
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 5c7fa5c8ad79a1d7cc9f59636e2f99e8b5471248
Author: Kemeng Shi <shikemeng@huaweicloud.com>
Date:   Mon Jan 15 22:56:26 2024 +0800

    sbitmap: remove stale comment in sbq_calc_wake_batch

    After commit 106397376c036 ("sbitmap: fix batching wakeup"), we may wake
    up more than one queue for each batch. Just remove the stale comment
    stating that only one queue is woken up for each batch.

    Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
    Link: https://lore.kernel.org/r/20240115145626.665562-1-shikemeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:18:32 +08:00
Chris von Recklinghausen 36dfc4a2b3 treewide: use prandom_u32_max() when possible, part 2
JIRA: https://issues.redhat.com/browse/RHEL-1848

commit 8b3ccbc1f1f91847160951aa15dd27c22dddcb49
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Wed Oct 5 16:43:38 2022 +0200

    treewide: use prandom_u32_max() when possible, part 2

    Rather than incurring a division or requesting too many random bytes for
    the given range, use the prandom_u32_max() function, which only takes
    the minimum required bytes from the RNG and avoids divisions. This was
    done by hand, covering things that coccinelle could not do on its own.

    Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Reviewed-by: Yury Norov <yury.norov@gmail.com>
    Reviewed-by: Jan Kara <jack@suse.cz> # for ext2, ext4, and sbitmap
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Signed-off-by: Chris von Recklinghausen <crecklin@redhat.com>
2023-10-20 06:15:03 -04:00
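
For context, a sketch of the helper being switched to (reconstructed, not quoted from the header of that era; the call site below is hypothetical): prandom_u32_max() maps a full 32-bit random value into [0, ep_ro) with a multiply and a shift, avoiding the division that "prandom_u32() % range" would incur:

    #include <linux/prandom.h>
    #include <linux/types.h>

    /* Scale a 32-bit random value into [0, ep_ro) without dividing. */
    static inline u32 prandom_u32_max_sketch(u32 ep_ro)
    {
            return (u32)(((u64)prandom_u32() * ep_ro) >> 32);
    }

    /* Typical shape of the treewide conversion (hypothetical call site). */
    static unsigned int random_alloc_hint(unsigned int depth)
    {
            return prandom_u32_max(depth);  /* was: prandom_u32() % depth */
    }
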
Ming Lei 7c980a47e0 sbitmap: fix batching wakeup
JIRA: https://issues.redhat.com/browse/RHEL-1516

commit 106397376c0369fcc01c58dd189ff925a2724a57
Author: David Jeffery <djeffery@redhat.com>
Date:   Fri Jul 21 17:57:15 2023 +0800

    sbitmap: fix batching wakeup

    The current code assumes that it is enough to provide forward progress
    by waking up just one wait queue after one completion batch is done.

    Unfortunately this isn't enough, because a waiter can be added to a
    wait queue right after one is woken up.

    Here is one example (depth 64, wake_batch is 8):

    1) all 64 tags are active

    2) in each wait queue, there is only one single waiter

    3) each completion batch (8 completions) wakes up just one waiter in
       one wait queue, then immediately one new sleeper is added to this
       wait queue

    4) after 64 completions, 8 waiters have been woken up, and there are
       still 8 waiters, one in each wait queue

    5) after another 8 active tags are completed, only one waiter can be
       woken up, and the other 7 can't be woken up anymore.

    It turns out this problem isn't easy to fix, so simply wake up enough
    waiters for a single batch.

    Cc: Kemeng Shi <shikemeng@huaweicloud.com>
    Cc: Chengming Zhou <zhouchengming@bytedance.com>
    Cc: Jan Kara <jack@suse.cz>
    Signed-off-by: David Jeffery <djeffery@redhat.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/20230721095715.232728-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-09-18 17:59:21 +08:00
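
A hedged sketch of the approach (not the verbatim upstream function; field and macro names are taken from <linux/sbitmap.h>, and it assumes wake_up_nr() reports how many exclusive waiters it actually woke, as recent kernels do): rather than a single wake_up_nr() on one wait queue, walk the wait queues and keep waking until a whole batch has been woken:

    #include <linux/sbitmap.h>
    #include <linux/wait.h>

    static void wake_up_one_batch_sketch(struct sbitmap_queue *sbq, int nr)
    {
            int i, wake_index = atomic_read(&sbq->wake_index);

            for (i = 0; i < SBQ_WAIT_QUEUES && nr > 0; i++) {
                    struct sbq_wait_state *ws = &sbq->ws[wake_index];

                    /* Stand-in for the internal sbq_index_inc() helper. */
                    wake_index = (wake_index + 1) % SBQ_WAIT_QUEUES;

                    if (waitqueue_active(&ws->wait)) {
                            int woken = wake_up_nr(&ws->wait, nr);

                            /* This queue held fewer sleepers than the
                             * remaining batch; spread the rest over the
                             * other queues instead of losing them. */
                            nr -= woken;
                    }
            }

            if (wake_index != atomic_read(&sbq->wake_index))
                    atomic_set(&sbq->wake_index, wake_index);
    }
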
Ming Lei ff270588cf sbitmap: correct wake_batch recalculation to avoid potential IO hung
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit b5fcf7871acb7f9a3a8ed341a68bd86aba3e254a
Author: Kemeng Shi <shikemeng@huaweicloud.com>
Date:   Tue Jan 17 04:50:59 2023 +0800

    sbitmap: correct wake_batch recalculation to avoid potential IO hung

    Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened")
    mentioned that in the case of shared tags there could be just one real
    active hctx (queue) because of lazy detection of tag idle. Driver tag
    allocation may then wait forever on this real active hctx (queue) if
    wake_batch > hctx_max_depth, where hctx_max_depth is the available tag
    depth of the active hctx (queue). However, the condition
    wake_batch > hctx_max_depth is not strong enough to avoid an IO hang,
    as sbitmap_queue_wake_up will only wake up one wait queue for each
    wake_batch even though there is only one waiter in the woken wait
    queue. After this, there is only one tag left to free and wake_batch
    may never be reached again. Commit 180dccb0dba4f ("blk-mq: fix tag_get
    wait task can't be awakened") mentioned that driver tag allocation may
    wait forever. Actually, the inactive hctx (queue) will become truly
    idle after at most 30 seconds and will then call blk_mq_tag_wakeup_all
    to wake one waiter per wait queue and break the hang, but an IO hang
    of 30 seconds is not acceptable either. Fix this potential IO hang by
    setting the batch size small enough that the depth of the shared hctx
    (queue) is enough to wake up all of the queues, as sbq_calc_wake_batch
    does.

    Although hctx_max_depth is clamped to at least 4 while the wake_batch
    recalculation does not apply that clamp, wake_batch will always be
    recalculated to 1 when hctx_max_depth <= 4.

    Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened")
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
    Link: https://lore.kernel.org/r/20230116205059.3821738-6-shikemeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:39 +08:00
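
A hedged sketch of the recalculation (constant names follow <linux/sbitmap.h>; the function name and the "was:" line are illustrative): divide the per-queue share of the depth by the number of wait queues, as sbq_calc_wake_batch() does, instead of clamping the share itself to a minimum of 4:

    #include <linux/sbitmap.h>
    #include <linux/minmax.h>

    static unsigned int recalc_wake_batch_sketch(struct sbitmap_queue *sbq,
                                                 unsigned int users)
    {
            /* Tag depth available to one of 'users' active queues. */
            unsigned int depth = (sbq->sb.depth + users - 1) / users;

            /* was (roughly): clamp_val(depth, 4, SBQ_WAKE_BATCH);
             * Spreading depth over all wait queues keeps the batch small
             * enough that one queue's tags can always complete a batch. */
            return clamp_val(depth / SBQ_WAIT_QUEUES, 1, SBQ_WAKE_BATCH);
    }
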
Ming Lei 2f652881d3 sbitmap: add sbitmap_find_bit to remove repeat code in __sbitmap_get/__sbitmap_get_shallow
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 678418c6128f112fc5584beb5cdd21fbc225badf
Author: Kemeng Shi <shikemeng@huaweicloud.com>
Date:   Tue Jan 17 04:50:58 2023 +0800

    sbitmap: add sbitmap_find_bit to remove repeat code in __sbitmap_get/__sbitmap_get_shallow

    There are three differences between __sbitmap_get and
    __sbitmap_get_shallow when searching for a free bit:
    1. __sbitmap_get_shallow limits the number of bits to search per word.
    __sbitmap_get has no such limit.
    2. __sbitmap_get_shallow always searches with wrap set. __sbitmap_get
    sets wrap according to round_robin.
    3. __sbitmap_get_shallow always searches from the first bit in the first
    word. __sbitmap_get searches from the first bit when round_robin is not
    set, otherwise it searches from SB_NR_TO_BIT(sb, alloc_hint).

    Add a helper function sbitmap_find_bit to do the common search while
    accepting "limit depth per word", "wrap flag" and "first bit to
    search" from the caller, to support the needs of both __sbitmap_get and
    __sbitmap_get_shallow.

    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
    Link: https://lore.kernel.org/r/20230116205059.3821738-5-shikemeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:39 +08:00
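
A hedged sketch of the shared helper (close to, but not verbatim, the upstream function; the per-word helper it calls is sketched under the next entry below): it walks the words, limits the per-word depth, and only honors a non-zero starting bit for the first word searched:

    #include <linux/sbitmap.h>
    #include <linux/minmax.h>

    static int sbitmap_find_bit_sketch(struct sbitmap *sb, unsigned int depth,
                                       unsigned int index,
                                       unsigned int alloc_hint, bool wrap)
    {
            unsigned int i;
            int nr = -1;

            for (i = 0; i < sb->map_nr; i++) {
                    nr = sbitmap_find_bit_in_word(&sb->map[index],
                                                  min_t(unsigned int,
                                                        __map_depth(sb, index),
                                                        depth),
                                                  alloc_hint, wrap);
                    if (nr != -1) {
                            nr += index << sb->shift;
                            break;
                    }

                    /* Jump to the next word; later words start at bit 0. */
                    alloc_hint = 0;
                    if (++index >= sb->map_nr)
                            index = 0;
            }

            return nr;
    }
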
Ming Lei e03abe8372 sbitmap: rewrite sbitmap_find_bit_in_index to reduce repeat code
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 08470a98a7d7e32c787b23b87353f13b03c23195
Author: Kemeng Shi <shikemeng@huaweicloud.com>
Date:   Tue Jan 17 04:50:57 2023 +0800

    sbitmap: rewrite sbitmap_find_bit_in_index to reduce repeat code

    Rewrite sbitmap_find_bit_in_index as follows:
    1. Rename sbitmap_find_bit_in_index to sbitmap_find_bit_in_word
    2. Accept "struct sbitmap_word *" directly instead of accepting
    "struct sbitmap *" and "int index" to get "struct sbitmap_word *".
    3. Accept depth/shallow_depth and wrap for __sbitmap_get_word from the
    caller to support the needs of both __sbitmap_get_shallow and
    __sbitmap_get.

    With the helper function sbitmap_find_bit_in_word, we can remove the
    repeated code in __sbitmap_get_shallow that finds a bit while
    considering deferred clearing.

    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
    Link: https://lore.kernel.org/r/20230116205059.3821738-4-shikemeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:39 +08:00
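
A hedged sketch of the per-word helper after the rewrite (simplified; __sbitmap_get_word() and sbitmap_deferred_clear() are the internal lib/sbitmap.c helpers): it keeps retrying the word as long as deferred clearing manages to free more bits in it:

    #include <linux/sbitmap.h>

    static int sbitmap_find_bit_in_word_sketch(struct sbitmap_word *map,
                                               unsigned int depth,
                                               unsigned int alloc_hint,
                                               bool wrap)
    {
            int nr;

            do {
                    /* depth, alloc_hint and wrap now come from the caller,
                     * so __sbitmap_get() and __sbitmap_get_shallow() can
                     * both use this helper. */
                    nr = __sbitmap_get_word(&map->word, depth,
                                            alloc_hint, wrap);
                    if (nr != -1)
                            break;
                    if (!sbitmap_deferred_clear(map))
                            break;
            } while (1);

            return nr;
    }
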
Ming Lei 7996ea0f35 sbitmap: remove redundant check in __sbitmap_queue_get_batch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 903e86f3a64d9573352bbab2f211fdbbaa5772b7
Author: Kemeng Shi <shikemeng@huaweicloud.com>
Date:   Tue Jan 17 04:50:56 2023 +0800

    sbitmap: remove redundant check in __sbitmap_queue_get_batch

    Commit fbb564a557809 ("lib/sbitmap: Fix invalid loop in
    __sbitmap_queue_get_batch()") mentioned that "Checking free bits when
    setting the target bits. Otherwise, it may reuse the busying bits."
    This commit add check to make sure all masked bits in word before
    cmpxchg is zero. Then the existing check after cmpxchg to check any
    zero bit is existing in masked bits in word is redundant.

    Actually, old value of word before cmpxchg is stored in val and we
    will filter out busy bits in val by "(get_mask & ~val)" after cmpxchg.
    So we will not reuse busy bits methioned in commit fbb564a557809
    ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()"). Revert
    new-added check to remove redundant check.

    Fixes: fbb564a55780 ("lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()")
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
    Link: https://lore.kernel.org/r/20230116205059.3821738-3-shikemeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:39 +08:00
Ming Lei 78925a07b2 sbitmap: remove unnecessary calculation of alloc_hint in __sbitmap_get_shallow
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit f1591a8bb3e02713f4ee2efe20df0d84ed80da48
Author: Kemeng Shi <shikemeng@huaweicloud.com>
Date:   Tue Jan 17 04:50:55 2023 +0800

    sbitmap: remove unnecessary calculation of alloc_hint in __sbitmap_get_shallow

    Updates to alloc_hint in the loop in __sbitmap_get_shallow() are mostly
    pointless and equivalent to setting alloc_hint to zero (because
    SB_NR_TO_BIT() considers only low sb->shift bits from alloc_hint). So
    simplify the logic.

    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
    Link: https://lore.kernel.org/r/20230116205059.3821738-2-shikemeng@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:39 +08:00
Ming Lei 2e854413e7 sbitmap: Try each queue to wake up at least one waiter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 26edb30dd1c0c9be11fa676b4f330ada7b794ba6
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date:   Tue Nov 15 17:45:53 2022 -0500

    sbitmap: Try each queue to wake up at least one waiter

    Jan reported that the new algorithm as merged might be problematic if
    the queue being woken becomes empty between the waitqueue_active check
    inside sbq_wake_ptr and the wake up.  If that happens, wake_up_nr will
    not wake up any waiter and we lose too many wake ups.  In order to
    guarantee progress, we need to wake up at least one waiter here, if
    there are any.  This now requires trying to wake up from every queue.

    Instead of walking through all the queues with sbq_wake_ptr, this
    change moves the wake up inside that function.  In a previous version
    of the patch, I found that updating wake_index several times when
    walking through queues had a measurable overhead.  This ensures we only
    update it once, at the end.

    Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags")
    Reported-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20221115224553.23594-4-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:30 +08:00
Ming Lei 7e56872c53 sbitmap: Advance the queue index before waking up a queue
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 976570b4ecd30d3ec6e1b0910da8e5edc591f2b6
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date:   Tue Nov 15 17:45:51 2022 -0500

    sbitmap: Advance the queue index before waking up a queue

    When a queue is woken, the wake_index written by sbq_wake_ptr currently
    keeps pointing to the same queue.  On the next wake up, it will thus
    retry the same queue, which is unfair to other queues and can lead to
    starvation.  This patch moves the index update to happen before the
    queue is returned, such that it will now try a different queue first on
    the next wake up, improving fairness.

    Fixes: 4f8126bb2308 ("sbitmap: Use single per-bitmap counting to wake up queued tags")
    Reported-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20221115224553.23594-2-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:30 +08:00
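
A hedged sketch of sbq_wake_ptr() after this change (simplified; the modulo stands in for the internal index-increment helper): the round-robin index is advanced before the active-waitqueue check, so the next wake-up starts from a different queue:

    #include <linux/sbitmap.h>
    #include <linux/wait.h>

    static struct sbq_wait_state *sbq_wake_ptr_sketch(struct sbitmap_queue *sbq)
    {
            int i, wake_index;

            if (!atomic_read(&sbq->ws_active))
                    return NULL;

            wake_index = atomic_read(&sbq->wake_index);
            for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
                    struct sbq_wait_state *ws = &sbq->ws[wake_index];

                    /* Advance before checking this queue, so it does not
                     * have to be fully drained before the next wake-up
                     * tries a different one. */
                    wake_index = (wake_index + 1) % SBQ_WAIT_QUEUES;

                    if (waitqueue_active(&ws->wait)) {
                            if (wake_index != atomic_read(&sbq->wake_index))
                                    atomic_set(&sbq->wake_index, wake_index);
                            return ws;
                    }
            }

            return NULL;
    }
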
Ming Lei 604476d60a sbitmap: Use single per-bitmap counting to wake up queued tags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2175212

commit 4f8126bb2308066b877859e4b5923ffb54143630
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date:   Sat Nov 5 19:10:55 2022 -0400

    sbitmap: Use single per-bitmap counting to wake up queued tags

    sbitmap suffers from code complexity, as demonstrated by recent fixes,
    and eventual lost wake ups on nested I/O completion.  The latter happens,
    from what I understand, due to the non-atomic nature of the updates to
    wait_cnt, which needs to be subtracted and eventually reset when equal
    to zero.  This two-step process can eventually miss an update when a
    nested completion happens to interrupt the CPU in between the wait_cnt
    updates.  This is very hard to fix, as shown by the recent changes to
    this code.

    The code complexity arises mostly from the corner cases to avoid missed
    wakes in this scenario.  In addition, the handling of wake_batch
    recalculation plus the synchronization with sbq_queue_wake_up is
    non-trivial.

    This patchset implements the idea originally proposed by Jan [1], which
    removes the need for the two-step updates of wait_cnt.  This is done by
    tracking the number of completions and wakeups in always increasing,
    per-bitmap counters.  Instead of having to reset the wait_cnt when it
    reaches zero, we simply keep counting, and attempt to wake up N threads
    in a single wait queue whenever there is enough space for a batch.
    Waking up fewer than wake_batch waiters shouldn't be a problem, because
    we haven't changed the conditions for wake up, and the existing batch
    calculation guarantees at least enough remaining completions to wake up
    a batch for each queue at any time.

    Performance-wise, one should expect very similar performance to the
    original algorithm for the case where there is no queueing.  In both the
    old algorithm and this implementation, the first thing is to check
    ws_active, which bails out if there is no queueing to be managed. In the
    new code, we took care to avoid accounting completions and wakeups when
    there is no queueing, to not pay the cost of atomic operations
    unnecessarily, since it doesn't skew the numbers.

    For more interesting cases, where there is queueing, we need to take
    into account the cross-communication of the atomic operations.  I've
    been benchmarking by running parallel fio jobs against a single hctx
    nullb in different hardware queue depth scenarios, and verifying both
    IOPS and queueing.

    Each experiment was repeated 5 times on a 20-CPU box, with 20 parallel
    jobs. fio was issuing fixed-size randwrites with qd=64 against nullb,
    varying only the hardware queue length per test.

    queue size 2                 4                 8                 16                 32                 64
    6.1-rc2    1681.1K (1.6K)    2633.0K (12.7K)   6940.8K (16.3K)   8172.3K (617.5K)   8391.7K (367.1K)   8606.1K (351.2K)
    patched    1721.8K (15.1K)   3016.7K (3.8K)    7543.0K (89.4K)   8132.5K (303.4K)   8324.2K (230.6K)   8401.8K (284.7K)

    The following is a similar experiment, run against a nullb with a single
    bitmap shared by 20 hctx spread across 2 NUMA nodes. This has 40
    parallel fio jobs operating on the same device.

    queue size 2                 4                 8                16                  32                 64
    6.1-rc2    1081.0K (2.3K)    957.2K (1.5K)     1699.1K (5.7K)   6178.2K (124.6K)    12227.9K (37.7K)   13286.6K (92.9K)
    patched    1081.8K (2.8K)    1316.5K (5.4K)    2364.4K (1.8K)   6151.4K  (20.0K)    11893.6K (17.5K)   12385.6K (18.4K)

    It has also survived blktests and a 12h-stress run against nullb. I also
    ran the code against nvme and a scsi SSD, and I didn't observe
    performance regression in those. If there are other tests you think I
    should run, please let me know and I will follow up with results.

    [1] https://lore.kernel.org/all/aef9de29-e9f5-259a-f8be-12d1b734e72@google.com/

    Cc: Hugh Dickins <hughd@google.com>
    Cc: Keith Busch <kbusch@kernel.org>
    Cc: Liu Song <liusong@linux.alibaba.com>
    Suggested-by: Jan Kara <jack@suse.cz>
    Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
    Link: https://lore.kernel.org/r/20221105231055.25953-1-krisman@suse.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-03-11 23:27:29 +08:00
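
A hedged sketch of the counting scheme (field names follow the patch description; __sbitmap_queue_wake_up() is the internal helper that wakes nr waiters, and details such as overflow handling are simplified): completions and wakeups are tracked in two always-increasing per-bitmap counters, and whoever advances the wakeup counter by a full batch wakes that many waiters:

    #include <linux/sbitmap.h>

    static void queue_wake_up_sketch(struct sbitmap_queue *sbq, int nr)
    {
            unsigned int wake_batch = READ_ONCE(sbq->wake_batch);
            unsigned int wakeups;

            /* No queueing to manage: skip the atomics entirely. */
            if (!atomic_read(&sbq->ws_active))
                    return;

            atomic_add(nr, &sbq->completion_cnt);
            wakeups = atomic_read(&sbq->wakeup_cnt);

            do {
                    /* Not a full batch ahead of the last wakeup yet. */
                    if (atomic_read(&sbq->completion_cnt) - wakeups < wake_batch)
                            return;
            } while (!atomic_try_cmpxchg(&sbq->wakeup_cnt,
                                         &wakeups, wakeups + wake_batch));

            __sbitmap_queue_wake_up(sbq, wake_batch);
    }
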
Ming Lei bb1bb8b81d sbitmap: fix lockup while swapping
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 30514bd2dd4e86a3ecfd6a93a3eadf7b9ea164a0
Author: Hugh Dickins <hughd@google.com>
Date:   Thu Sep 29 12:50:12 2022 -0700

    sbitmap: fix lockup while swapping

    Commit 4acb83417cad ("sbitmap: fix batched wait_cnt accounting")
    is a big improvement: without it, I had to revert to before commit
    040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
    to avoid the high system time and freezes which that had introduced.

    Now okay on the NVME laptop, but 4acb83417cad is a disaster for heavy
    swapping (kernel builds in low memory) on another: soon locking up in
    sbitmap_queue_wake_up() (into which __sbq_wake_up() is inlined), cycling
    around with waitqueue_active() but wait_cnt 0.  Here is a backtrace,
    showing the common pattern of outer sbitmap_queue_wake_up() interrupted
    before setting wait_cnt 0 back to wake_batch (in some cases other CPUs
    are idle, in other cases they're spinning for a lock in dd_bio_merge()):

    sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
    __blk_mq_free_request < blk_mq_free_request < __blk_mq_end_request <
    scsi_end_request < scsi_io_completion < scsi_finish_command <
    scsi_complete < blk_complete_reqs < blk_done_softirq < __do_softirq <
    __irq_exit_rcu < irq_exit_rcu < common_interrupt < asm_common_interrupt <
    _raw_spin_unlock_irqrestore < __wake_up_common_lock < __wake_up <
    sbitmap_queue_wake_up < sbitmap_queue_clear < blk_mq_put_tag <
    __blk_mq_free_request < blk_mq_free_request < dd_bio_merge <
    blk_mq_sched_bio_merge < blk_mq_attempt_bio_merge < blk_mq_submit_bio <
    __submit_bio < submit_bio_noacct_nocheck < submit_bio_noacct <
    submit_bio < __swap_writepage < swap_writepage < pageout <
    shrink_folio_list < evict_folios < lru_gen_shrink_lruvec <
    shrink_lruvec < shrink_node < do_try_to_free_pages < try_to_free_pages <
    __alloc_pages_slowpath < __alloc_pages < folio_alloc < vma_alloc_folio <
    do_anonymous_page < __handle_mm_fault < handle_mm_fault <
    do_user_addr_fault < exc_page_fault < asm_exc_page_fault

    See how the process-context sbitmap_queue_wake_up() has been interrupted,
    after bringing wait_cnt down to 0 (and in this example, after doing its
    wakeups), before advancing wake_index and refilling wake_cnt: an
    interrupt-context sbitmap_queue_wake_up() of the same sbq gets stuck.

    I have almost no grasp of all the possible sbitmap races, and their
    consequences: but __sbq_wake_up() can do nothing useful while wait_cnt 0,
    so it is better if sbq_wake_ptr() skips on to the next ws in that case:
    which fixes the lockup and shows no adverse consequence for me.

    The check for wait_cnt being 0 is obviously racy, and ultimately can lead
    to lost wakeups: for example, when there is only a single waitqueue with
    waiters.  However, lost wakeups are unlikely to matter in these cases,
    and a proper fix requires redesign (and benchmarking) of the batched
    wakeup code: so let's plug the hole with this bandaid for now.

    Signed-off-by: Hugh Dickins <hughd@google.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/9c2038a7-cdc5-5ee-854c-fbc6168bf16@google.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:14 +08:00
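
A hedged sketch of the bandaid in sbq_wake_ptr(), as it looked with the old wait_cnt batching (simplified; the ws_active check is omitted and the modulo stands in for the internal index-increment helper): a wait queue whose wait_cnt has already reached 0 is skipped, so an interrupted refill on one CPU cannot trap another CPU in the wake-up loop:

    #include <linux/sbitmap.h>
    #include <linux/wait.h>

    static struct sbq_wait_state *sbq_wake_ptr_sketch(struct sbitmap_queue *sbq)
    {
            int i, wake_index = atomic_read(&sbq->wake_index);

            for (i = 0; i < SBQ_WAIT_QUEUES; i++) {
                    struct sbq_wait_state *ws = &sbq->ws[wake_index];

                    /* Skip a queue whose wait_cnt already hit 0: whoever
                     * brought it to 0 is refilling it, and __sbq_wake_up()
                     * can do nothing useful with it in the meantime. */
                    if (waitqueue_active(&ws->wait) &&
                        atomic_read(&ws->wait_cnt) > 0) {
                            if (wake_index != atomic_read(&sbq->wake_index))
                                    atomic_set(&sbq->wake_index, wake_index);
                            return ws;
                    }

                    wake_index = (wake_index + 1) % SBQ_WAIT_QUEUES;
            }

            return NULL;
    }
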
Ming Lei 7fddf132c9 sbitmap: fix batched wait_cnt accounting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 4acb83417cadfdcbe64215f9d0ddcf3132af808e
Author: Keith Busch <kbusch@kernel.org>
Date:   Fri Sep 9 11:40:22 2022 -0700

    sbitmap: fix batched wait_cnt accounting

    Batched completions can clear multiple bits, but we're only decrementing
    the wait_cnt by one each time. This can cause waiters to never be woken,
    stalling IO. Use the batched count instead.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:10 +08:00
Ming Lei ab2602adce sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit c35227d4e8cbc70a6622cc7cc5f8c3bff513f1fa
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Sep 8 17:12:00 2022 +0200

    sbitmap: Use atomic_long_try_cmpxchg in __sbitmap_queue_get_batch

    Use atomic_long_try_cmpxchg instead of
    atomic_long_cmpxchg (*ptr, old, new) == old in __sbitmap_queue_get_batch.
    x86 CMPXCHG instruction returns success in ZF flag, so this change
    saves a compare after cmpxchg (and related move instruction in front
    of cmpxchg).

    Also, atomic_long_try_cmpxchg implicitly assigns the old *ptr value to
    "old" when the cmpxchg fails, enabling further code simplifications,
    e.g. an extra memory read can be avoided in the loop.

    No functional change intended.

    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
    Link: https://lore.kernel.org/r/20220908151200.9993-1-ubizjak@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:10 +08:00
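
A hedged, generic illustration of the two idioms (not the sbitmap code itself), showing why try_cmpxchg also removes a compare and an extra read from the loop:

    #include <linux/atomic.h>

    /* Old pattern: compare the returned value yourself and re-read. */
    static void add_one_cmpxchg(atomic_long_t *v)
    {
            long old = atomic_long_read(v);

            while (atomic_long_cmpxchg(v, old, old + 1) != old)
                    old = atomic_long_read(v);      /* extra memory read */
    }

    /* New pattern: on failure, try_cmpxchg writes the current value back
     * into 'old', so both the compare and the re-read disappear. */
    static void add_one_try_cmpxchg(atomic_long_t *v)
    {
            long old = atomic_long_read(v);

            do {
                    /* compute the new value from 'old' here */
            } while (!atomic_long_try_cmpxchg(v, &old, old + 1));
    }
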
Ming Lei a395275fe4 sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 48c033314f372478548203c583529f53080fd078
Author: Jan Kara <jack@suse.cz>
Date:   Thu Sep 8 15:09:37 2022 +0200

    sbitmap: Avoid leaving waitqueue in invalid state in __sbq_wake_up()

    When __sbq_wake_up() decrements wait_cnt to 0 but races with someone
    else waking the waiter on the waitqueue (so the waitqueue becomes
    empty), it exits without resetting wait_cnt to the wake_batch number.
    Once wait_cnt is 0, nobody will ever reset it or wake the new waiters,
    resulting in possible deadlocks or busyloops. Fix the problem by making
    sure we reset wait_cnt even if we didn't wake up anybody in the end.

    Fixes: 040b83fcecfb ("sbitmap: fix possible io hung due to lost wakeup")
    Reported-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20220908130937.2795-1-jack@suse.cz
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:10 +08:00
Ming Lei df0e746fb4 Revert "sbitmap: fix batched wait_cnt accounting"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit bce1b56c73826fec8caf6187f0c922ede397a5a8
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sun Sep 4 06:39:25 2022 -0600

    Revert "sbitmap: fix batched wait_cnt accounting"

    This reverts commit 16ede66973c84f890c03584f79158dd5b2d725f5.

    This is causing issues with CPU stalls on my test box, revert it for
    now until we understand what is going on. It looks like infinite
    looping off sbitmap_queue_wake_up(), but hard to tell with a lot of
    CPUs hitting this issue and the console scrolling infinitely.

    Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:09 +08:00
Ming Lei 24ea5ae6c2 sbitmap: fix batched wait_cnt accounting
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 16ede66973c84f890c03584f79158dd5b2d725f5
Author: Keith Busch <kbusch@kernel.org>
Date:   Thu Aug 25 07:53:12 2022 -0700

    sbitmap: fix batched wait_cnt accounting

    Batched completions can clear multiple bits, but we're only decrementing
    the wait_cnt by one each time. This can cause waiters to never be woken,
    stalling IO. Use the batched count instead.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/20220825145312.1217900-1-kbusch@fb.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:09 +08:00
Ming Lei b874038569 sbitmap: remove unnecessary code in __sbitmap_queue_get_batch
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit ddbfc34fcf5d0bc33b006b90c580c56edeb31068
Author: Liu Song <liusong@linux.alibaba.com>
Date:   Fri Aug 26 11:14:13 2022 +0800

    sbitmap: remove unnecessary code in __sbitmap_queue_get_batch

    If "nr + nr_tags <= map_depth", then the value of nr_tags will not be
    greater than map_depth, so no additional comparison is required.

    Signed-off-by: Liu Song <liusong@linux.alibaba.com>
    Link: https://lore.kernel.org/r/1661483653-27326-1-git-send-email-liusong@linux.alibaba.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:09 +08:00
Ming Lei e43cab6f5f sbitmap: fix possible io hung due to lost wakeup
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

commit 040b83fcecfb86f3225d3a5de7fd9b3fbccf83b4
Author: Yu Kuai <yukuai3@huawei.com>
Date:   Wed Aug 3 20:15:04 2022 +0800

    sbitmap: fix possible io hung due to lost wakeup

    There are two problems that can lead to lost wakeups:

    1) invalid wakeup on the wrong waitqueue:

    For example, 2 * wake_batch tags are put, while only wake_batch threads
    are woken:

    __sbq_wake_up
     atomic_cmpxchg -> reset wait_cnt
                            __sbq_wake_up -> decrease wait_cnt
                            ...
                            __sbq_wake_up -> wait_cnt is decreased to 0 again
                             atomic_cmpxchg
                             sbq_index_atomic_inc -> increase wake_index
                             wake_up_nr -> wake up and waitqueue might be empty
     sbq_index_atomic_inc -> increase again, one waitqueue is skipped
     wake_up_nr -> invalid wake up because old waitqueue might be empty

    To fix the problem, increase 'wake_index' before resetting 'wait_cnt'.

    2) 'wait_cnt' can be decreased while waitqueue is empty

    As pointed out by Jan Kara, following race is possible:

    CPU1                            CPU2
    __sbq_wake_up                    __sbq_wake_up
     sbq_wake_ptr()                  sbq_wake_ptr() -> the same
     wait_cnt = atomic_dec_return()
     /* decreased to 0 */
     sbq_index_atomic_inc()
     /* move to next waitqueue */
     atomic_set()
     /* reset wait_cnt */
     wake_up_nr()
     /* wake up on the old waitqueue */
                                     wait_cnt = atomic_dec_return()
                                     /*
                                      * decrease wait_cnt in the old
                                      * waitqueue, while it can be
                                      * empty.
                                      */

    Fix the problem by waking up before updating 'wake_index' and
    'wait_cnt'.

    With this patch, note that 'wait_cnt' is still decreased in the old
    empty waitqueue; however, the wakeup is redirected to an active
    waitqueue, and the extra decrement on the old empty waitqueue is not
    handled.

    Fixes: 88459642cb ("blk-mq: abstract tag allocation out into sbitmap library")
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Link: https://lore.kernel.org/r/20220803121504.212071-1-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:09 +08:00
Ming Lei 40b9461562 lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit fbb564a557809466c171b95f8d593a0972450ff2
Author: wuchi <wuchi.zero@gmail.com>
Date:   Sun Jun 5 22:58:35 2022 +0800

    lib/sbitmap: Fix invalid loop in __sbitmap_queue_get_batch()

    1. Getting next index before continue branch.
    2. Checking free bits when setting the target bits. Otherwise,
    it may reuse the busying bits.

    Signed-off-by: wuchi <wuchi.zero@gmail.com>
    Reviewed-by: Martin Wilck <mwilck@suse.com>
    Link: https://lore.kernel.org/r/20220605145835.26916-1-wuchi.zero@gmail.com
    Fixes: 9672b0d43782 ("sbitmap: add __sbitmap_queue_get_batch()")
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:12 +08:00
Ming Lei e3ac74e8a2 sbitmap: Delete old sbitmap_queue_get_shallow()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917

commit 3f607293b74d6acb06571a774a500143c1f0ed2c
Author: John Garry <john.garry@huawei.com>
Date:   Tue Feb 8 20:07:04 2022 +0800

    sbitmap: Delete old sbitmap_queue_get_shallow()

    Since sbitmap_queue_get_shallow() was introduced in commit c05e667337
    ("sbitmap: add sbitmap_get_shallow() operation"), it has not been used.

    Delete the old sbitmap_queue_get_shallow() and rename the public
    __sbitmap_queue_get_shallow() -> sbitmap_queue_get_shallow(), as it is
    odd to have a public __foo but no foo at all.

    Signed-off-by: John Garry <john.garry@huawei.com>
    Link: https://lore.kernel.org/r/1644322024-105340-1-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:56:21 +08:00
Ewan D. Milne 95b38e96d3 lib/sbitmap: allocate sb->map via kvzalloc_node
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071832
Upstream Status: From upstream linux mainline

sbitmap has been used in scsi for replacing atomic operations on
sdev->device_busy, so IOPS on some fast scsi storage can be improved.

However, sdev->device_busy can be changed in the fast path, so we have to
allocate sb->map statically. sdev->device_busy has been capped to 1024,
but some drivers may configure the default depth as < 8, which causes
each sbitmap word to hold only one bit. Then 1024 * 128
(sizeof(sbitmap_word)) bytes are needed for sb->map; given it is an
order-5 allocation, it may sometimes fail.

Avoid the issue by using kvzalloc_node() for allocating sb->map.

Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220316012708.354668-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 863a66cdb4df25fd146d9851c3289072298566d5)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-04-27 18:55:08 -04:00
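
A hedged sketch of the allocation change in sbitmap_init_node() (helper names and error handling are illustrative): the sb->map array can reach an order-5 kmalloc for a 1024-word map, so allocate it with kvzalloc_node(), which transparently falls back to vmalloc, and free it with kvfree():

    #include <linux/sbitmap.h>
    #include <linux/slab.h>

    static int alloc_map_sketch(struct sbitmap *sb, gfp_t flags, int node)
    {
            /* was (roughly): kzalloc_node(sb->map_nr * sizeof(*sb->map),
             *                             flags, node);
             * A 128 KiB contiguous kmalloc can fail under fragmentation;
             * kvzalloc_node() falls back to vmalloc when needed. */
            sb->map = kvzalloc_node(sb->map_nr * sizeof(*sb->map), flags, node);
            if (!sb->map)
                    return -ENOMEM;
            return 0;
    }

    static void free_map_sketch(struct sbitmap *sb)
    {
            kvfree(sb->map);        /* was: kfree(sb->map) */
            sb->map = NULL;
    }
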
Ming Lei 342db1faa5 lib/sbitmap: kill 'depth' from sbitmap_word
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2064695
Upstream Status: merged to for-5.18/block

commit 3301bc53358a0eb0a0db65fd7a513cd4cb50c83a
Author: Ming Lei <ming.lei@redhat.com>
Date:   Mon Jan 10 15:29:45 2022 +0800

    lib/sbitmap: kill 'depth' from sbitmap_word

    Only the last sbitmap_word can have a different depth, and all the
    others must have the same depth of 1U << sb->shift, so it is not
    necessary to store it in sbitmap_word; it can be retrieved easily and
    efficiently by adding one internal helper, __map_depth(sb, index).

    Remove the 'depth' field from sbitmap_word; then the annotation of
    ____cacheline_aligned_in_smp for 'word' isn't needed any more.

    No performance effect is seen when running a highly parallel IOPS test
    on null_blk.

    This way saves us one cacheline (usually 64 bytes) for each
    sbitmap_word.

    Cc: Martin Wilck <martin.wilck@suse.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Reviewed-by: Martin Wilck <mwilck@suse.com>
    Reviewed-by: John Garry <john.garry@huawei.com>
    Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-03-18 09:41:14 +08:00
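
A hedged sketch of the replacement helper (very close to the upstream inline): every word except the last provides 1U << shift bits, and the last word gets whatever depth remains, so the per-word 'depth' field is redundant:

    #include <linux/sbitmap.h>

    static inline unsigned int __map_depth_sketch(const struct sbitmap *sb,
                                                  int index)
    {
            /* Only the final word can be partially used. */
            if (index == sb->map_nr - 1)
                    return sb->depth - (index << sb->shift);
            return 1U << sb->shift;
    }
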
Ming Lei 27879455e9 blk-mq: Fix wrong wakeup batch configuration which will cause hang
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044184

commit 10825410b956dc1ed8c5fbc8bbedaffdadde7f20
Author: Laibin Qiu <qiulaibin@huawei.com>
Date:   Thu Jan 27 18:00:47 2022 +0800

    blk-mq: Fix wrong wakeup batch configuration which will cause hang

    Commit 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be
    awakened") will recalculate wake_batch when incrementing or decrementing
    active_queues to avoid wake_batch > hctx_max_depth. At the same time, in
    order to affect performance as little as possible, the minimum wakeup
    batch is set to 4. But when the QD is small (such as QD=1), if inc or dec
    active_queues increases the wakeup batch, that can lead to a hang.

    Fix this problem with the following strategies:
    QD          :  >= 32 | < 32
    ---------------------------------
    wakeup batch:  8~4   | 3~1

    Fixes: 180dccb0dba4f ("blk-mq: fix tag_get wait task can't be awakened")
    Link: https://lore.kernel.org/linux-block/78cafe94-a787-e006-8851-69906f0c2128@huawei.com/T/#t
    Reported-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
    Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
    Link: https://lore.kernel.org/r/20220127100047.1763746-1-qiulaibin@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-02-07 15:52:12 +08:00
Ming Lei 11df2a036f blk-mq: fix tag_get wait task can't be awakened
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044184

commit 180dccb0dba4f5e84a4a70c1be1d34cbb6528b32
Author: Laibin Qiu <qiulaibin@huawei.com>
Date:   Thu Jan 13 10:55:36 2022 +0800

    blk-mq: fix tag_get wait task can't be awakened

    In case of shared tags, there might be more than one hctx which
    allocates from the same tags, and each hctx is limited to allocate at
    most:
            hctx_max_depth = max((bt->sb.depth + users - 1) / users, 4U);

    Tag idle detection is lazy and may be delayed for 30 seconds, so there
    could be just one real active hctx (queue) while all the others are
    actually idle but still accounted as active because of the lazy idle
    detection. Then, if wake_batch > hctx_max_depth, driver tag allocation
    may wait forever on this real active hctx.

    Fix this by recalculating wake_batch when incrementing or decrementing
    active_queues.

    Fixes: 0d2602ca30 ("blk-mq: improve support for shared tags maps")
    Suggested-by: Ming Lei <ming.lei@redhat.com>
    Suggested-by: John Garry <john.garry@huawei.com>
    Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
    Link: https://lore.kernel.org/r/20220113025536.1479653-1-qiulaibin@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-02-07 15:50:11 +08:00
Ming Lei 64d7526405 sbitmap: silence data race warning
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403

commit 9f8b93a7df4d8e1e8715fb2a45a893cffad9da0b
Author: Jens Axboe <axboe@kernel.dk>
Date:   Mon Oct 25 10:45:01 2021 -0600

    sbitmap: silence data race warning

    KCSAN complains about the sbitmap hint update:

    ==================================================================
    BUG: KCSAN: data-race in sbitmap_queue_clear / sbitmap_queue_clear

    write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 1:
     sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
     blk_mq_put_tag+0x82/0x90
     __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
     blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
     __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
     blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
     lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
     blk_complete_reqs block/blk-mq.c:584 [inline]
     blk_done_softirq+0x69/0x90 block/blk-mq.c:589
     __do_softirq+0x12c/0x26e kernel/softirq.c:558
     run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
     smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
     kthread+0x262/0x280 kernel/kthread.c:319
     ret_from_fork+0x1f/0x30

    write to 0xffffe8ffffd145b8 of 4 bytes by interrupt on cpu 0:
     sbitmap_queue_clear+0xca/0xf0 lib/sbitmap.c:606
     blk_mq_put_tag+0x82/0x90
     __blk_mq_free_request+0x114/0x180 block/blk-mq.c:507
     blk_mq_free_request+0x2c8/0x340 block/blk-mq.c:541
     __blk_mq_end_request+0x214/0x230 block/blk-mq.c:565
     blk_mq_end_request+0x37/0x50 block/blk-mq.c:574
     lo_complete_rq+0xca/0x170 drivers/block/loop.c:541
     blk_complete_reqs block/blk-mq.c:584 [inline]
     blk_done_softirq+0x69/0x90 block/blk-mq.c:589
     __do_softirq+0x12c/0x26e kernel/softirq.c:558
     run_ksoftirqd+0x13/0x20 kernel/softirq.c:920
     smpboot_thread_fn+0x22f/0x330 kernel/smpboot.c:164
     kthread+0x262/0x280 kernel/kthread.c:319
     ret_from_fork+0x1f/0x30

    value changed: 0x00000035 -> 0x00000044

    Reported by Kernel Concurrency Sanitizer on:
    CPU: 0 PID: 10 Comm: ksoftirqd/0 Not tainted 5.15.0-rc6-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    ==================================================================

    which is a data race, but not an important one. This is just updating the
    percpu alloc hint, and the reader of that hint doesn't ever require it to
    be valid.

    Just annotate it with data_race() to silence this one.

    Reported-by: syzbot+4f8bfd804b4a1f95b8f6@syzkaller.appspotmail.com
    Acked-by: Marco Elver <elver@google.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2021-12-06 16:45:23 +08:00
Ming Lei 3c7899835e sbitmap: add helper to clear a batch of tags
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403

commit 1aec5e4a2962f7e0b3fb3e7308dd726be2472c26
Author: Jens Axboe <axboe@kernel.dk>
Date:   Fri Oct 8 05:44:23 2021 -0600

    sbitmap: add helper to clear a batch of tags

    sbitmap currently only supports clearing tags one-by-one, add a helper
    that allows the caller to pass in an array of tags to clear.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2021-12-06 16:44:47 +08:00
Ming Lei 519090d8d2 sbitmap: add __sbitmap_queue_get_batch()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018403

commit 9672b0d43782047b1825a96bafee1b6aefa35bc2
Author: Jens Axboe <axboe@kernel.dk>
Date:   Sat Oct 9 13:02:23 2021 -0600

    sbitmap: add __sbitmap_queue_get_batch()

    The block layer tag allocation batching still calls into sbitmap to get
    each tag, but we can improve on that. Add __sbitmap_queue_get_batch(),
    which returns a mask of tags all at once, along with an offset for
    those tags.

    An example return would be 0xff, where bits 0..7 are set, with
    tag_offset == 128. The valid tags in this case would be 128..135.

    A batch is specific to an individual sbitmap word, hence it cannot be
    larger than that. The requested number of tags is automatically reduced
    to the max that can be satisfied with a single map.

    On failure, 0 is returned. The caller should fall back to single tag
    allocation at that point.

    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2021-12-06 16:42:52 +08:00
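
A hedged usage sketch based on the description above (the exact prototype is assumed from how blk-mq consumes the helper, not quoted from the header): the returned mask plus the offset identify the claimed tags, and 0 means the caller falls back to one-at-a-time allocation:

    #include <linux/sbitmap.h>
    #include <linux/bitops.h>

    static int grab_tags_sketch(struct sbitmap_queue *sbq, unsigned int want,
                                unsigned int *tags, unsigned int max_tags)
    {
            unsigned int tag_offset, bit, n = 0;
            unsigned long mask;

            mask = __sbitmap_queue_get_batch(sbq, want, &tag_offset);
            if (!mask)
                    return 0;       /* fall back to single-tag allocation */

            /* e.g. mask == 0xff with tag_offset == 128 -> tags 128..135 */
            for_each_set_bit(bit, &mask, BITS_PER_LONG) {
                    if (n == max_tags)
                            break;
                    tags[n++] = tag_offset + bit;
            }
            return n;
    }
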
Zhen Lei 9dbbc3b9d0 lib: fix spelling mistakes
Fix some spelling mistakes in comments:
permanentely ==> permanently
wont ==> won't
remaning ==> remaining
succed ==> succeed
shouldnt ==> shouldn't
alpha-numeric ==> alphanumeric
storeing ==> storing
funtion ==> function
documenation ==> documentation
Determin ==> Determine
intepreted ==> interpreted
ammount ==> amount
obious ==> obvious
interupts ==> interrupts
occured ==> occurred
asssociated ==> associated
taking into acount ==> taking into account
squence ==> sequence
stil ==> still
contiguos ==> contiguous
matchs ==> matches

Link: https://lkml.kernel.org/r/20210607072555.12416-1-thunder.leizhen@huawei.com
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-07-08 11:48:20 -07:00
Ming Lei 2d13b1ea9f scsi: sbitmap: Add sbitmap_calculate_shift() helper
Move code for calculating default shift into a public helper which can be
used by SCSI.

Link: https://lore.kernel.org/r/20210122023317.687987-7-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei cbb9950b41 scsi: sbitmap: Export sbitmap_weight
SCSI's .device_busy will be converted to sbitmap and sbitmap_weight is
needed. Export the helper.

The only existing user of sbitmap_weight() uses it to find out how many
bits are set and not cleared. Align sbitmap_weight() meaning with this
usage model.

Link: https://lore.kernel.org/r/20210122023317.687987-6-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei c548e62bcf scsi: sbitmap: Move allocation hint into sbitmap
The allocation hint should have belonged to sbitmap. Also, when sbitmap's
depth is high and there is no need to use multiple wakeup queues, users can
benefit from the percpu allocation hint too.

Move allocation hint into sbitmap, then SCSI device queue can benefit from
allocation hint when converting to plain sbitmap.

Convert vhost/scsi.c to use sbitmap allocation with percpu alloc hint. This
is more efficient than the previous approach.

Link: https://lore.kernel.org/r/20210122023317.687987-5-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: virtualization@lists.linux-foundation.org
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei bf2c4282a1 scsi: sbitmap: Add helpers for updating allocation hint
Add helpers for updating allocation hint so that we can avoid duplicate
code.

Prepare for moving allocation hint into sbitmap.

Link: https://lore.kernel.org/r/20210122023317.687987-4-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Ming Lei efe1f3a1d5 scsi: sbitmap: Maintain allocation round_robin in sbitmap
Currently the allocation round_robin info is maintained by sbitmap_queue.

However, bit allocation really belongs to sbitmap. Move it there.

Link: https://lore.kernel.org/r/20210122023317.687987-3-ming.lei@redhat.com
Cc: Omar Sandoval <osandov@fb.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: virtualization@lists.linux-foundation.org
Tested-by: Sumanesh Samanta <sumanesh.samanta@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:36:59 -05:00
Pavel Begunkov 0eff1f1a38 sbitmap: simplify wrap check
__sbitmap_get_word() doesn't wrap if it's starting from the beginning
(i.e. the initial hint is 0). Instead of stashing the original hint, just
set @wrap accordingly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 17:12:49 -07:00
Pavel Begunkov c3250c8d24 sbitmap: replace CAS with atomic and
sbitmap_deferred_clear() does a CAS loop to propagate cleared bits;
replace it with an equivalent atomic bitwise AND. That's slightly faster
and makes it wait-free instead of lock-free as before.

The atomic can be relaxed (i.e. barrier-less) because following
sbitmap_get*() deal with synchronisation, see comments in
sbitmap_queue_clear().

It's ok to cast to atomic_long_t, that's what bitops/lock.h does.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 17:12:49 -07:00
Pavel Begunkov 661d4f55a7 sbitmap: remove swap_lock
map->swap_lock protects map->cleared from concurrent modification;
however, sbitmap_deferred_clear() already drains it atomically, so
it's guaranteed not to lose bits on concurrent
sbitmap_deferred_clear().

A single-threaded, tag-heavy test on top of null_blk showed a ~1.5%
throughput increase, and a 3% -> 1% cycle reduction of sbitmap_get()
according to perf.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 17:12:49 -07:00
Pavel Begunkov b78beea038 sbitmap: optimise sbitmap_deferred_clear()
Because of spinlocks and atomics, sbitmap_deferred_clear() has to reload
&sb->map[index] on each access even though the map address won't change.
Pass in sbitmap_word instead of {sb, index}, so it's cached in a
variable. It also improves code generation of
sbitmap_find_bit_in_index().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-12-07 17:12:49 -07:00
John Garry 6bf0eb5504 sbitmap: Consider cleared bits in sbitmap_bitmap_show()
sbitmap works by maintaining separate bitmaps of set and cleared bits.
The set bits are cleared in a batch, to save the burden of continuously
locking the "word" map to unset.

sbitmap_bitmap_show() only shows the set bits (in "word"), which is not
of much use, so mask out the cleared bits.

Fixes: ea86ea2cdc ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-07-01 10:53:00 -06:00
David Jeffery df034c93f1 sbitmap: only queue kyber's wait callback if not already active
Under heavy loads where the kyber I/O scheduler hits the token limits for
its scheduling domains, kyber can become stuck.  When active requests
complete, kyber may not be woken up leaving the I/O requests in kyber
stuck.

This stuck state is due to a race condition with kyber and the sbitmap
functions it uses to run a callback when enough requests have completed.
The running of a sbt_wait callback can race with the attempt to insert the
sbt_wait.  Since sbitmap_del_wait_queue removes the sbt_wait from the list
first then sets the sbq field to NULL, kyber can see the item as not on a
list but the call to sbitmap_add_wait_queue will see sbq as non-NULL. This
results in the sbt_wait being inserted onto the wait list but ws_active
doesn't get incremented.  So the sbitmap queue does not know there is a
waiter on a wait list.

Since sbitmap doesn't think there is a waiter, kyber may never be
informed that there are domain tokens available and the I/O never advances.
With the sbt_wait on a wait list, kyber believes it has an active waiter
so cannot insert a new waiter when reaching the domain's full state.

This race can be fixed by only adding the sbt_wait to the queue if the
sbq field is NULL.  If sbq is not NULL, there is already an action active
which will trigger the re-running of kyber.  Let it run and add the
sbt_wait to the wait list if still needing to wait.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Jeffery <djeffery@redhat.com>
Reported-by: John Pittman <jpittman@redhat.com>
Tested-by: John Pittman <jpittman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-12-20 16:51:54 -07:00
John Garry 708edafa88 sbitmap: Delete sbitmap_any_bit_clear()
Since the only caller of this function has been deleted, delete this one
also.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-11-13 12:50:40 -07:00
Pavel Begunkov 417232880c sbitmap: Replace cmpxchg with xchg
cmpxchg() with an immediate value could be replaced with the less expensive
xchg(). The same is true if the new value doesn't _depend_ on the old one.

In the second block, atomic_cmpxchg() return value isn't checked, so
after atomic_cmpxchg() ->  atomic_xchg() conversion it could be replaced
with atomic_set(). Comparison with atomic_read() in the second chunk was
left as an optimisation (if that was the initial intention).

Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-07-01 11:57:12 -06:00
Thomas Gleixner 0fc479b1ad treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 328
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license v2 as published
  by the free software foundation this program is distributed in the
  hope that it will be useful but without any warranty without even
  the implied warranty of merchantability or fitness for a particular
  purpose see the gnu general public license for more details you
  should have received a copy of the gnu general public license along
  with this program if not see https www gnu org licenses

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 2 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190530000435.923873561@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-05 17:37:06 +02:00
Andrea Parri a0934fd2b1 sbitmap: fix improper use of smp_mb__before_atomic()
This barrier only applies to the read-modify-write operations; in
particular, it does not apply to the atomic_set() primitive.

Replace the barrier with an smp_mb().
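
A hedged before/after sketch of the pattern being fixed (function names
are illustrative; in sbitmap the misplaced barrier sits in front of an
atomic_set()):

  #include <linux/atomic.h>

  /*
   * Broken: smp_mb__before_atomic() only guarantees ordering against
   * atomic read-modify-write operations (atomic_inc(), atomic_cmpxchg(),
   * ...).  atomic_set() is a plain store, so no barrier is implied here.
   */
  static void publish_then_set_broken(atomic_t *v, int val)
  {
          smp_mb__before_atomic();
          atomic_set(v, val);
  }

  /* Fixed: a full smp_mb() orders preceding accesses against the store. */
  static void publish_then_set_fixed(atomic_t *v, int val)
  {
          smp_mb();
          atomic_set(v, val);
  }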

Fixes: 6c0ca7ae29 ("sbitmap: fix wakeup hang after sbq resize")
Cc: stable@vger.kernel.org
Reported-by: "Paul E. McKenney" <paulmck@linux.ibm.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org
Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-05-23 10:25:26 -06:00
Ming Lei e6d1fa584e sbitmap: order READ/WRITE freed instance and setting clear bit
Inside sbitmap_queue_clear(), once the clear bit is set, it becomes
visible to the allocation path immediately. Meanwhile, READs/WRITEs on
the old associated instance (such as the request in the blk-mq case) may
be reordered with setting the clear bit, so a race with re-allocation
can be triggered.

Add a memory barrier to order the READs/WRITEs on the freed associated
instance before setting the clear bit, avoiding the race with
re-allocation (see the sketch below).
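
A condensed sketch of where the barrier goes in sbitmap_queue_clear();
the waiter wake-up and per-CPU hint handling are omitted, and the
_sketch suffix marks this as illustrative rather than the exact upstream
function:

  #include <linux/sbitmap.h>

  static void sbitmap_queue_clear_sketch(struct sbitmap_queue *sbq,
                                         unsigned int nr, unsigned int cpu)
  {
          /*
           * Once the clear bit is visible, the bit can be handed out
           * again.  Order all prior READs/WRITEs on the freed instance
           * (e.g. the blk-mq request) before the atomic set_bit() on
           * ->cleared, so a new owner cannot race with our late accesses.
           */
          smp_mb__before_atomic();
          sbitmap_deferred_clear_bit(&sbq->sb, nr);

          /* ... waiter wake-up and per-CPU allocation hint update ... */
  }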

The following kernel oops, triggered by block/006 on aarch64, may be
fixed:

[  142.330954] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000330
[  142.338794] Mem abort info:
[  142.341554]   ESR = 0x96000005
[  142.344632]   Exception class = DABT (current EL), IL = 32 bits
[  142.350500]   SET = 0, FnV = 0
[  142.353544]   EA = 0, S1PTW = 0
[  142.356678] Data abort info:
[  142.359528]   ISV = 0, ISS = 0x00000005
[  142.363343]   CM = 0, WnR = 0
[  142.366305] user pgtable: 64k pages, 48-bit VAs, pgdp = 000000002a3c51c0
[  142.372983] [0000000000000330] pgd=0000000000000000, pud=0000000000000000
[  142.379777] Internal error: Oops: 96000005 [#1] SMP
[  142.384613] Modules linked in: null_blk ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm libiscsi ib_umad scsi_transport_iscsi ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core sbsa_gwdt crct10dif_ce ghash_ce ipmi_ssif sha2_ce ipmi_devintf sha256_arm64 sg sha1_ce ipmi_msghandler ip_tables xfs libcrc32c mlx5_core sdhci_acpi mlxfw ahci_platform at803x sdhci libahci_platform qcom_emac mmc_core hdma hdma_mgmt i2c_dev [last unloaded: null_blk]
[  142.429753] CPU: 7 PID: 1983 Comm: fio Not tainted 5.0.0.cki #2
[  142.449458] pstate: 00400005 (nzcv daif +PAN -UAO)
[  142.454239] pc : __blk_mq_free_request+0x4c/0xa8
[  142.458830] lr : blk_mq_free_request+0xec/0x118
[  142.463344] sp : ffff00003360f6a0
[  142.466646] x29: ffff00003360f6a0 x28: ffff000010e70000
[  142.471941] x27: ffff801729a50048 x26: 0000000000010000
[  142.477232] x25: ffff00003360f954 x24: ffff7bdfff021440
[  142.482529] x23: 0000000000000000 x22: 00000000ffffffff
[  142.487830] x21: ffff801729810000 x20: 0000000000000000
[  142.493123] x19: ffff801729a50000 x18: 0000000000000000
[  142.498413] x17: 0000000000000000 x16: 0000000000000001
[  142.503709] x15: 00000000000000ff x14: ffff7fe000000000
[  142.509003] x13: ffff8017dcde09a0 x12: 0000000000000000
[  142.514308] x11: 0000000000000001 x10: 0000000000000008
[  142.519597] x9 : ffff8017dcde09a0 x8 : 0000000000002000
[  142.524889] x7 : ffff8017dcde0a00 x6 : 000000015388f9be
[  142.530187] x5 : 0000000000000001 x4 : 0000000000000000
[  142.535478] x3 : 0000000000000000 x2 : 0000000000000000
[  142.540777] x1 : 0000000000000001 x0 : ffff00001041b194
[  142.546071] Process fio (pid: 1983, stack limit = 0x000000006460a0ea)
[  142.552500] Call trace:
[  142.554926]  __blk_mq_free_request+0x4c/0xa8
[  142.559181]  blk_mq_free_request+0xec/0x118
[  142.563352]  blk_mq_end_request+0xfc/0x120
[  142.567444]  end_cmd+0x3c/0xa8 [null_blk]
[  142.571434]  null_complete_rq+0x20/0x30 [null_blk]
[  142.576194]  blk_mq_complete_request+0x108/0x148
[  142.580797]  null_handle_cmd+0x1d4/0x718 [null_blk]
[  142.585662]  null_queue_rq+0x60/0xa8 [null_blk]
[  142.590171]  blk_mq_try_issue_directly+0x148/0x280
[  142.594949]  blk_mq_try_issue_list_directly+0x9c/0x108
[  142.600064]  blk_mq_sched_insert_requests+0xb0/0xd0
[  142.604926]  blk_mq_flush_plug_list+0x16c/0x2a0
[  142.609441]  blk_flush_plug_list+0xec/0x118
[  142.613608]  blk_finish_plug+0x3c/0x4c
[  142.617348]  blkdev_direct_IO+0x3b4/0x428
[  142.621336]  generic_file_read_iter+0x84/0x180
[  142.625761]  blkdev_read_iter+0x50/0x78
[  142.629579]  aio_read.isra.6+0xf8/0x190
[  142.633409]  __io_submit_one.isra.8+0x148/0x738
[  142.637912]  io_submit_one.isra.9+0x88/0xb8
[  142.642078]  __arm64_sys_io_submit+0xe0/0x238
[  142.646428]  el0_svc_handler+0xa0/0x128
[  142.650238]  el0_svc+0x8/0xc
[  142.653104] Code: b9402a63 f9000a7f 3100047f 540000a0 (f9419a81)
[  142.659202] ---[ end trace 467586bc175eb09d ]---

Fixes: ea86ea2cdc ("sbitmap: ammortize cost of clearing bits")
Reported-and-bisected-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-03-25 13:05:47 -06:00
Ming Lei fe76fc6aaf sbitmap: Protect swap_lock from hardirq
Because we may call blk_mq_get_driver_tag() directly from
blk_mq_dispatch_rq_list() without holding any lock, a HARDIRQ may arrive
while swap_lock is held and trigger the deadlock described above.

Commit ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq") tries to
fix this issue by using 'spin_lock_bh', which isn't enough because in the
multiqueue case we complete requests directly from hardirq context.
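
A hedged sketch of the resulting locking (the helper name is illustrative
and the body is condensed; the point is the irqsave/irqrestore pair
around swap_lock):

  #include <linux/sbitmap.h>
  #include <linux/spinlock.h>

  static bool deferred_clear_hardirq_safe(struct sbitmap *sb, int index)
  {
          unsigned long mask, flags;
          bool ret = false;

          /*
           * The tag may be freed from hardirq context on multiqueue
           * devices, so spin_lock_bh() is not enough: disable interrupts
           * while swap_lock is held.
           */
          spin_lock_irqsave(&sb->map[index].swap_lock, flags);

          mask = xchg(&sb->map[index].cleared, 0);
          if (mask) {
                  /* Return the batch-cleared bits to ->word as free bits. */
                  atomic_long_andnot(mask,
                                     (atomic_long_t *)&sb->map[index].word);
                  ret = true;
          }

          spin_unlock_irqrestore(&sb->map[index].swap_lock, flags);
          return ret;
  }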

Cc: Clark Williams <williams@redhat.com>
Fixes: ab53dcfb3e7b ("sbitmap: Protect swap_lock from hardirq")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-15 16:29:57 +12:00
Steven Rostedt (VMware) 3719876809 sbitmap: Protect swap_lock from softirqs
The swap_lock used by sbitmap is part of a lock chain that includes locks
taken from softirq context, but the swap_lock critical section itself is
not protected from being interrupted by softirqs.

A chain exists of:

 sbq->ws[i].wait -> dispatch_wait_lock -> swap_lock

Since the sbq->ws[i].wait lock can be taken from softirq context, all
locks below it in the chain must also be protected from softirqs (see
the sketch below).
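
A minimal sketch of the _bh protection (the helper name is illustrative
and only the lock handling is shown):

  #include <linux/sbitmap.h>
  #include <linux/spinlock.h>

  /*
   * Illustrative helper: take swap_lock with softirqs disabled, so a
   * softirq that needs sbq->ws[i].wait (and, further down the chain,
   * swap_lock itself) cannot interrupt us while we hold it.
   */
  static unsigned long grab_cleared_softirq_safe(struct sbitmap *sb,
                                                 int index)
  {
          unsigned long mask;

          spin_lock_bh(&sb->map[index].swap_lock);
          mask = xchg(&sb->map[index].cleared, 0);
          spin_unlock_bh(&sb->map[index].swap_lock);

          return mask;
  }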

Reported-by: Clark Williams <williams@redhat.com>
Fixes: 58ab5e32e6 ("sbitmap: silence bogus lockdep IRQ warning")
Fixes: ea86ea2cdc ("sbitmap: amortize cost of clearing bits")
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-15 07:31:18 +12:00