742 Commits

Author SHA1 Message Date
Ewan D. Milne 26912770d9 scsi: core: Clear driver private data when retrying request
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Conflicts: Merge differences due to various commits not present in RHEL 9
	   which only affected the portion of this patch that removed the
	   memset() of the per-cmd driver-private data from the function
	   scsi_init_command() -- RHEL 9 still avoids zeroing the scsi_req
	   but this does not affect the memset() added to scsi_queue_rq().

After commit 1bad6c4a57 ("scsi: zero per-cmd private driver data for each
MQ I/O"), the xen-scsifront/virtio_scsi/snic drivers all removed code that
explicitly zeroed driver-private command data.

In combination with commit 464a00c9e0 ("scsi: core: Kill DRIVER_SENSE"),
after virtio_scsi performs a capacity expansion, the first request will
return a unit attention to indicate that the capacity has changed. And then
the original command is retried. As driver-private command data was not
cleared, the request would return UA again and eventually time out and fail.

Zero driver-private command data when a request is retried.

Fixes: f7de50da14 ("scsi: xen-scsifront: Remove code that zeroes driver-private command data")
Fixes: c2bb87318b ("scsi: virtio_scsi: Remove code that zeroes driver-private command data")
Fixes: c3006a9264 ("scsi: snic: Remove code that zeroes driver-private command data")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250217021628.2929248-1-yebin@huaweicloud.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit dce5c4afd035e8090a26e5d776b1682c0e649683)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2025-04-13 14:10:19 -04:00
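
The retry fix described in 26912770d9 can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct model_cmd, model_prepare() and model_retry() are hypothetical names, and the fixed-size priv[] array stands in for the per-command driver-private area. It is not the kernel code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a SCSI command plus its driver-private bytes. */
struct model_cmd {
	int retries;            /* how many times the command was requeued */
	size_t priv_size;       /* size of the driver-private area in use */
	unsigned char priv[32]; /* driver-private data (fixed size for the sketch) */
};

/* Prepare a command for dispatch: always start from zeroed private data. */
static void model_prepare(struct model_cmd *cmd)
{
	memset(cmd->priv, 0, cmd->priv_size);
}

/*
 * Requeue a command, for example after a unit attention. Preparing it again
 * clears whatever state the driver left behind on the first attempt.
 */
static void model_retry(struct model_cmd *cmd)
{
	cmd->retries++;
	model_prepare(cmd);
}
```

Without the memset() on the retry path, stale bytes from the first attempt would still be visible to the driver when the command is dispatched the second time.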
Ewan D. Milne 8c8174d8c0 scsi: core: Do not retry I/Os during depopulation
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline

Fail I/Os instead of retrying them to prevent user space processes from
being blocked on the I/O completion for several minutes.

Retrying I/Os during "depopulation in progress" or "depopulation restore in
progress" results in a continuous retry loop until the depopulation
completes or until the I/O retry loop is aborted due to a timeout by the
scsi_cmd_runtime_exceeded().

Depopulation is slow and can take 24+ hours to complete on 20+ TB HDDs.
Most I/Os in the depopulation retry loop end up taking several minutes
before returning the failure to user space.

Cc: stable@vger.kernel.org # 4.18.x: 2bbeb8d scsi: core: Handle depopulation and restoration in progress
Cc: stable@vger.kernel.org # 4.18.x
Fixes: e37c7d9a03 ("scsi: core: sanitize++ in progress")
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20250131184408.859579-1-ipylypiv@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9ff7c383b8ac0c482a1da7989f703406d78445c6)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2025-04-13 14:10:19 -04:00
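
The behaviour change in 8c8174d8c0 can be sketched as a tiny classifier. The ASC/ASCQ pairs are the ones quoted in the log entry for 8c283dd606 (0x4,0x24 and 0x4,0x25). Everything else is an assumption made for illustration: the enum, the macro names and the function are hypothetical, and every other NOT READY code is lumped into "retry" only to keep the model small.

```c
#include <stdint.h>

enum model_action { MODEL_RETRY, MODEL_FAIL };

/* ASC/ASCQ values quoted in the log for the depopulation states. */
#define MODEL_ASC_NOT_READY       0x04
#define MODEL_ASCQ_DEPOP          0x24 /* depopulation in progress */
#define MODEL_ASCQ_DEPOP_RESTORE  0x25 /* depopulation restoration in progress */

/*
 * Decide what to do with a command that came back NOT READY.
 * Depopulation can take many hours, so retrying would only keep the
 * submitter blocked until the retry loop times out; fail it instead.
 */
static enum model_action model_not_ready_action(uint8_t asc, uint8_t ascq)
{
	if (asc == MODEL_ASC_NOT_READY &&
	    (ascq == MODEL_ASCQ_DEPOP || ascq == MODEL_ASCQ_DEPOP_RESTORE))
		return MODEL_FAIL;

	return MODEL_RETRY; /* simplification: the real code is more selective */
}
```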
Ewan D. Milne 18b31dff93 scsi: scsi_lib: Add kernel-doc for exported functions
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline

Add kernel-doc for scsi_failures_reset_retries() and scsi_alloc_request()
since these are exported.  This allows them to be part of the SCSI
driver-api docbook.

Fix kernel-doc comments for scsi_vpd_tpg_id() [add kernel-doc for one
parameter and fix a typo].

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20241212205217.597844-4-rdunlap@infradead.org
CC: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
CC: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 39d2112ab7c8050642fdc2830a0a3edc5337827f)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2025-04-13 14:10:18 -04:00
Augusto Caringi 74782eb600 Merge: block: update with v6.14
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6580

JIRA: https://issues.redhat.com/browse/RHEL-79409

We don't backport "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock".

Omitted-Fix: 224749be6c23 ("block: Revert "block: Fix potential deadlock while freezing queue and acquiring sysfs_lock"")

Omitted-Fix: 2fa07d7a0f00 ("btrfs: pass write-hint for buffered IO")

Omitted-Fix: e559ee022658 ("btrfs: validate queue limits")

Omitted-Fix: 7467bc5959bf ("btrfs: zoned: calculate max_extent_size properly on non-zoned setup")

Omitted-Fix: c7c97ceff98c ("btrfs: handle bio_split() errors")

Signed-off-by: Ming Lei <ming.lei@redhat.com>

Approved-by: Ewan D. Milne <emilne@redhat.com>
Approved-by: Maurizio Lombardi <mlombard@redhat.com>
Approved-by: Rafael Aquini <raquini@redhat.com>
Approved-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Approved-by: Steve Best <sbest@redhat.com>
Approved-by: CKI KWF Bot <cki-ci-bot+kwf-gitlab-com@redhat.com>

Merged-by: Augusto Caringi <acaringi@redhat.com>
2025-04-04 12:34:54 -03:00
Ming Lei 16cc2d94bb block: force noio scope in blk_mq_freeze_queue
JIRA: https://issues.redhat.com/browse/RHEL-79409

commit 1e1a9cecfab3f22ebef0a976f849c87be8d03c1c
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Jan 31 13:03:47 2025 +0100

    block: force noio scope in blk_mq_freeze_queue

    When block drivers or the core block code perform allocations with a
    frozen queue, this could try to recurse into the block device to
    reclaim memory and deadlock.  Thus all allocations done by a process
    that froze a queue need to be done without __GFP_IO and __GFP_FS.
    Instead of trying to track all of them down, force a noio scope as
    part of freezing the queue.

    Note that nvme is a bit of a mess here due to the non-owner freezes,
    and they will be addressed separately.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20250131120352.1315351-2-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2025-03-14 16:48:44 +08:00
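
The idea behind 16cc2d94bb, entering a noio scope as part of freezing the queue, can be modelled in user space as below. This is a sketch, not the block layer implementation: struct model_task, the MODEL_GFP_* bits and all function names are hypothetical, and the per-task flag stands in for the kernel's per-task allocation context.

```c
#include <stdbool.h>

#define MODEL_GFP_IO 0x1u /* allocation may start I/O to reclaim memory */
#define MODEL_GFP_FS 0x2u /* allocation may call into the filesystem */

/* Hypothetical per-task allocation context. */
struct model_task {
	bool noio;
};

/* Enter a noio scope and return the previous state so scopes can nest. */
static unsigned int model_noio_save(struct model_task *t)
{
	unsigned int old = t->noio ? 1u : 0u;

	t->noio = true;
	return old;
}

/* Leave the scope by restoring the state returned by model_noio_save(). */
static void model_noio_restore(struct model_task *t, unsigned int old)
{
	t->noio = (old != 0);
}

/* Freezing enters the scope; the caller keeps the saved state. */
static unsigned int model_freeze_queue(struct model_task *t)
{
	return model_noio_save(t);
}

/* Unfreezing leaves the scope using the state saved at freeze time. */
static void model_unfreeze_queue(struct model_task *t, unsigned int saved)
{
	model_noio_restore(t, saved);
}

/* Every allocation in the task is filtered through the current scope. */
static unsigned int model_effective_gfp(const struct model_task *t,
					unsigned int gfp)
{
	return t->noio ? (gfp & ~(MODEL_GFP_IO | MODEL_GFP_FS)) : gfp;
}
```

Tying the scope to the freeze means individual allocation sites do not have to be audited one by one, which is the point the commit message makes.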
Ming Lei bad9e90252 block: simplify tag allocation policy selection
JIRA: https://issues.redhat.com/browse/RHEL-79409

commit ce32496ec1abe866225f2e2005ceda68cf4c7bf4
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jan 6 09:35:11 2025 +0100

    block: simplify tag allocation policy selection

    Use a plain BLK_MQ_F_* flag to select the round robin tag selection
    instead of overlaying an enum with just two possible values into the
    flags space.

    Doing so allows adding a BLK_MQ_F_MAX sentinel for simplified overflow
    checking in the messy debugfs helpers.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Link: https://lore.kernel.org/r/20250106083531.799976-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2025-03-14 16:48:39 +08:00
Ming Lei 6fa7a62b2f block: remove BLK_MQ_F_SHOULD_MERGE
JIRA: https://issues.redhat.com/browse/RHEL-79409
Conflicts: drop change on ublk which isn't supported by rhel9,
	and drop mtd change which isn't needed.

commit cc76ace465d6977b47daa427379b7be1e0976f12
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Dec 19 07:01:59 2024 +0100

    block: remove BLK_MQ_F_SHOULD_MERGE

    BLK_MQ_F_SHOULD_MERGE is set for all tag_sets except those that purely
    process passthrough commands (bsg-lib, ufs tmf, various nvme admin
    queues) and thus don't even check the flag.  Remove it to simplify the
    driver interface.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20241219060214.1928848-1-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2025-03-14 16:48:35 +08:00
Ewan D. Milne f8d9901d43 scsi: core: Fix command pass through retry regression
JIRA: https://issues.redhat.com/browse/RHEL-77122
Upstream Status: From upstream linux mainline

scsi_check_passthrough() is always called, but it doesn't check whether a
command completed successfully. As a result, if a command was successful and
the caller used SCMD_FAILURE_RESULT_ANY to indicate what failures it wanted
to retry, we will end up retrying the command. This will cause delays during
device discovery because of the command being sent multiple times. For some
USB devices it can also cause the wrong device size to be used.

This patch adds a check for whether the command was successful. If it was,
we return immediately instead of trying to match a failure.

Fixes: 994724e6b3f0 ("scsi: core: Allow passthrough to request midlayer retries")
Reported-by: Kris Karas <bugs-a21@moonlit-rail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219652
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20250107010220.7215-1-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8604f633f59375687fa115d6f691de95a42520e3)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2025-01-30 17:44:35 -05:00
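
The regression fixed by f8d9901d43 can be reproduced in a small model. The sketch below assumes a simplified failure table: struct model_failure, MODEL_FAILURE_RESULT_ANY and model_check_passthrough() are hypothetical stand-ins, and a result of 0 is taken to mean success.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_FAILURE_RESULT_ANY (-1) /* hypothetical wildcard: match any result */

/* One entry of the caller-supplied retry table. */
struct model_failure {
	int result;  /* result to match, or MODEL_FAILURE_RESULT_ANY */
	int allowed; /* how many retries the caller permits */
	int retries; /* how many retries were used so far */
};

/* Return true when the midlayer should retry the command. */
static bool model_check_passthrough(int result, struct model_failure *f,
				    size_t n)
{
	size_t i;

	/* The added check: a successful command never matches a failure. */
	if (result == 0)
		return false;

	for (i = 0; i < n; i++) {
		if (f[i].result != MODEL_FAILURE_RESULT_ANY &&
		    f[i].result != result)
			continue;
		if (f[i].retries < f[i].allowed) {
			f[i].retries++;
			return true;
		}
		return false; /* matched, but the retry budget is used up */
	}
	return false;
}
```

Without the early return, a wildcard entry would also match a successful command, which corresponds to the repeated commands during device discovery described above.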
Ming Lei 8c226a7118 blk-integrity: improved sg segment mapping
JIRA: https://issues.redhat.com/browse/RHEL-68422

commit 76c313f658d2752e8527610677164aa7094ef7a5
Author: Keith Busch <kbusch@kernel.org>
Date:   Fri Sep 13 12:17:46 2024 -0700

    blk-integrity: improved sg segment mapping

    Make the integrity mapping more like data mapping, blk_rq_map_sg. Use
    the request to validate the segment count, and update the callers so
    they don't have to.

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/20240913191746.2628196-1-kbusch@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-11-28 17:34:14 +08:00
Ming Lei 6395f3c0a6 scsi: use request to get integrity segments
JIRA: https://issues.redhat.com/browse/RHEL-68422

commit 27c3785e94f003c664d9d867fbd62d1494546876
Author: Keith Busch <kbusch@kernel.org>
Date:   Fri Sep 13 11:28:51 2024 -0700

    scsi: use request to get integrity segments

    The request tracks the integrity segments already, so no need to recount
    the segments again.

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Link: https://lore.kernel.org/r/20240913182854.2445457-7-kbusch@meta.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-11-28 17:34:13 +08:00
Ewan D. Milne e36fee2fbf scsi: core: Remove scsi_execute_req()/scsi_execute() functions
JIRA: https://issues.redhat.com/browse/RHEL-62151
Upstream Status: From upstream linux mainline
Conflicts: Merge differences due to prior commits, also removed __scsi_execute()

scsi_execute() and scsi_execute_req() are no longer used so remove them.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 946a10511f6588c20bbd312be15d64cc3c3fc796)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-11-15 17:57:59 -05:00
Jerry Snitselaar 4afee8d65a scsi: check that busses support the DMA API before setting dma parameters
JIRA: https://issues.redhat.com/browse/RHEL-61942
Upstream Status: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

commit b5a73bf4d1de95e620bf5f592557b81f71c76f0e
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Aug 22 06:56:31 2024 +0200

    scsi: check that busses support the DMA API before setting dma parameters

    We'll start throwing warnings soon when dma_set_seg_boundary and
    dma_set_max_seg_size are called on devices for buses that don't fully
    support the DMA API.  Prepare for that by making the calls in the SCSI
    midlayer conditional.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Robin Murphy <robin.murphy@arm.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

(cherry picked from commit b5a73bf4d1de95e620bf5f592557b81f71c76f0e)
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
2024-11-04 08:57:36 -07:00
Ming Lei 6758bae338 block: move dma_pad_mask into queue_limits
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit e94b45d08b5d1c230c0f59c3eed758d28658851e
Author: Christoph Hellwig <hch@lst.de>
Date:   Wed Jun 26 16:26:29 2024 +0200

    block: move dma_pad_mask into queue_limits

    dma_pad_mask is a queue limit by all ways of looking at it, so move it
    there and set it through the atomic queue limits APIs.

    Add a little helper that takes the alignment and pad into account to
    simplify the code that is touched a bit.

    Note that there never was any need for the > check in
    blk_queue_update_dma_pad, this probably was just copy and paste from
    dma_update_dma_alignment.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Link: https://lore.kernel.org/r/20240626142637.300624-9-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:11 +08:00
Ming Lei 367246a3de block: move the bounce flag into the features field
JIRA: https://issues.redhat.com/browse/RHEL-56837
Conflicts: cover mmc change which is done in upstream merge

commit 339d3948c07b4aa2940aeb874294a7d6782cec16
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jun 17 08:04:53 2024 +0200

    block: move the bounce flag into the features field

    Move the bounce flag into the features field to reclaim a little bit of
    space.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/20240617060532.127975-27-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:09 +08:00
Ming Lei 94ab84acc0 block: move the add_random flag to queue_limits
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 39a9f1c334f9f27b3b3e6d0005c10ed667268346
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jun 17 08:04:42 2024 +0200

    block: move the add_random flag to queue_limits

    Move the add_random flag into the queue_limits feature field so that it
    can be set atomically with the queue frozen.

    Note that this also removes code from dm to clear the flag based on
    the underlying devices, which can't be reached as dm devices will
    always start out without the flag set.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/20240617060532.127975-16-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:08 +08:00
Ming Lei 8c1c7e0046 scsi: core: Add a dma_alignment field to the host and host template
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 5b7dfbeff92a4a00b55b2be580f057d533b65cd5
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Apr 9 16:37:32 2024 +0200

    scsi: core: Add a dma_alignment field to the host and host template

    Get drivers out of the business of having to call the block layer DMA
    alignment limits helpers themselves.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20240409143748.980206-8-hch@lst.de
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:02 +08:00
Ming Lei 7d11957b1e scsi: core: Add a no_highmem flag to struct Scsi_Host
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 6248d7f7714f018f2c02f356582784e74596f8e8
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Apr 9 16:37:31 2024 +0200

    scsi: core: Add a no_highmem flag to struct Scsi_Host

    While we really should be killing the block layer bounce buffering ASAP, I
    even more urgently need to stop the drivers from fiddling with the limits in
    ->slave_configure.  Add a no_highmem flag to the Scsi_Host to centralize
    this setting and switch the remaining four drivers that use block layer
    bounce buffering to it.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20240409143748.980206-7-hch@lst.de
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:01 +08:00
Ming Lei 2c00fcfc02 scsi: core: Initialize scsi midlayer limits before allocating the queue
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit afd53a3d852808bfeb5bc3ae3cd1caa9389bcc94
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue Apr 9 16:37:29 2024 +0200

    scsi: core: Initialize scsi midlayer limits before allocating the queue

    Turn __scsi_init_queue() into scsi_init_limits() which initializes
    queue_limits structure that can be passed to blk_mq_alloc_queue().

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/20240409143748.980206-5-hch@lst.de
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: John Garry <john.g.garry@oracle.com>
    Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:19:01 +08:00
Ming Lei ac0b5e1fff block: Remove BLK_STS_ZONE_RESOURCE
JIRA: https://issues.redhat.com/browse/RHEL-56837

commit 63b5385e781417e73bda3fd652c2199826afda6e
Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon Apr 8 10:41:19 2024 +0900

    block: Remove BLK_STS_ZONE_RESOURCE

    The zone append emulation of the scsi disk driver was the only driver
    using BLK_STS_ZONE_RESOURCE. With this code removed,
    BLK_STS_ZONE_RESOURCE is now unused. Remove this macro definition and
    simplify blk_mq_dispatch_rq_list() where this status code was handled.

    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Tested-by: Hans Holmberg <hans.holmberg@wdc.com>
    Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20240408014128.205141-20-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-09-27 11:18:56 +08:00
Ewan D. Milne b1f054d489 scsi: core: Fix handling of SCMD_FAIL_IF_RECOVERING
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

There is code in the SCSI core that sets the SCMD_FAIL_IF_RECOVERING
flag but there is no code that clears this flag. Instead of only clearing
SCMD_INITIALIZED in scsi_end_request(), clear all flags. It is never
necessary to preserve any command flags inside scsi_end_request().

Cc: stable@vger.kernel.org
Fixes: 310bcaef6d7e ("scsi: core: Support failing requests while recovering")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240325224417.1477135-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ca91259b775f6fd98ae5d23bb4eec101d468ba8d)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:22 -04:00
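
The flag handling changed by b1f054d489 can be shown with two one-line functions. This is a sketch only: the MODEL_* bit values and the function names are hypothetical and were chosen for illustration, not copied from the kernel headers.

```c
/* Hypothetical flag bits for the sketch. */
#define MODEL_INITIALIZED        (1u << 0)
#define MODEL_FAIL_IF_RECOVERING (1u << 1)

struct model_cmd {
	unsigned int flags;
};

/* Old behaviour: only the "initialized" bit is cleared on completion. */
static void model_end_request_old(struct model_cmd *cmd)
{
	cmd->flags &= ~MODEL_INITIALIZED;
}

/* New behaviour: no flag survives the end of a request. */
static void model_end_request_new(struct model_cmd *cmd)
{
	cmd->flags = 0;
}
```

With the old variant, a command structure that is reused for a later request would still carry MODEL_FAIL_IF_RECOVERING from its previous life.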
Ewan D. Milne 96d8b99128 scsi: sd: Handle read/write CDL timeout failures
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

Commands using a duration limit descriptor that has limit policies set to a
value other than 0x0 may be failed by the device if one of the limits is
exceeded. For such commands, since the failure is the result of the user
duration limit configuration and workload, the commands should not be
retried and should be terminated immediately. Furthermore, to allow the user
to differentiate these "soft" failures from hard errors due to a hardware
problem, a different error code than EIO should be returned.

There are 2 cases to consider:

(1) The failure is due to a limit policy failing the command with a check
condition sense key, that is, any limit policy other than 0xD.  For this
case, scsi_check_sense() is modified to detect failures with the ABORTED
COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND
TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR
RECOVERY additional sense code. For these failures, a SUCCESS disposition
is returned so that scsi_finish_command() is called to terminate the
command.

(2) The failure is due to a limit policy set to 0xD, which results in the
command being terminated with a GOOD status, COMPLETED sense key, and DATA
CURRENTLY UNAVAILABLE additional sense code. To handle this case,
scsi_check_sense() is modified to return a SUCCESS disposition so that
scsi_finish_command() is called to terminate the command.  In addition,
scsi_decide_disposition() has to be modified to see if a command being
terminated with GOOD status has sense data.  This is as defined in SCSI
Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status
commands were not checked before.

If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(),
so that a command that failed because of a CDL timeout cannot be
retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to
complete the command request with the BLK_STS_DURATION_LIMIT status, which
results in the user seeing ETIME errors for the failed commands.

Co-developed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 390e2d1a587405a522dc6b433d45648f895a352c)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:20 -04:00
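
The two cases described in 96d8b99128 can be condensed into a small classifier. The commit message names the sense conditions but not their numeric codes, so the values below are assumptions taken from the SPC sense tables as best recalled (ABORTED COMMAND 0x0B with ASC 0x2E and ASCQ 0x01 to 0x03, COMPLETED 0x0F with ASC/ASCQ 0x55/0x0A) and should be checked against the standard. The function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed numeric codes; verify against SPC before relying on them. */
#define MODEL_SK_ABORTED_COMMAND 0x0b
#define MODEL_SK_COMPLETED       0x0f

/* Return true when the sense data describes a duration limit timeout. */
static bool model_is_cdl_timeout(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
	/* Case (1): COMMAND TIMEOUT BEFORE/DURING PROCESSING variants. */
	if (sense_key == MODEL_SK_ABORTED_COMMAND &&
	    asc == 0x2e && ascq >= 0x01 && ascq <= 0x03)
		return true;

	/* Case (2): GOOD status with DATA CURRENTLY UNAVAILABLE. */
	if (sense_key == MODEL_SK_COMPLETED && asc == 0x55 && ascq == 0x0a)
		return true;

	return false;
}

/* Map the classification to the error a user would see. */
static int model_errno(bool dl_timeout)
{
	return dl_timeout ? -ETIME : -EIO;
}
```

Returning a distinct error lets an application tell a missed deadline apart from a hardware failure, which is the goal stated in the commit message.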
Ewan D. Milne 41addfe554 scsi: core: Support retrieving sub-pages of mode pages
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

Allow scsi_mode_sense() to retrieve sub-pages of mode pages by adding the
subpage argument. Change all the current caller sites to specify the
subpage 0.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-7-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a6cdc35fab0d813d54744abe2af07d6c49c07d6e)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:17 -04:00
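
What the subpage argument added in 41addfe554 ends up controlling can be illustrated by building a MODE SENSE(10) command descriptor block by hand. This is a sketch based on the SPC layout of that command (page code in byte 2, subpage code in byte 3, allocation length in bytes 7 and 8). The helper name and its parameters are hypothetical and this is not the body of scsi_mode_sense().

```c
#include <stdint.h>
#include <string.h>

#define MODEL_MODE_SENSE_10 0x5a /* MODE SENSE(10) operation code */

/* Fill a 10-byte CDB requesting the given mode page and subpage. */
static void model_build_mode_sense10(uint8_t cdb[10], int dbd, uint8_t page,
				     uint8_t subpage, uint16_t alloc_len)
{
	memset(cdb, 0, 10);
	cdb[0] = MODEL_MODE_SENSE_10;
	cdb[1] = dbd ? 0x08 : 0x00;      /* DBD: disable block descriptors */
	cdb[2] = page & 0x3f;            /* page code; PC bits left at 0 */
	cdb[3] = subpage;                /* 0 selects the page itself */
	cdb[7] = (uint8_t)(alloc_len >> 8);
	cdb[8] = (uint8_t)(alloc_len & 0xff);
}
```

Callers that pass subpage 0, as the converted call sites do, produce the same CDB as before the change.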
Ewan D. Milne 011b7a6125 scsi: core: Rename and move get_scsi_ml_byte()
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

SCSI has two different getters:

 - get_XXX_byte() (in scsi_cmnd.h) which takes a struct scsi_cmnd *, and

 - XXX_byte() (in scsi.h) which takes a scmd->result.

The proper name for get_scsi_ml_byte() should thus be without the get_
prefix, as it takes a scmd->result. Rename the function to rectify this.
(This change was suggested by Mike Christie.)

Additionally, move get_scsi_ml_byte() to scsi_priv.h since both scsi_lib.c
and scsi_error.c will need to use this helper in a follow-up patch.

Cc: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-6-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 734326937b65cec7ffd00bfbbce0f791ac4aac84)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:17 -04:00
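
The naming convention discussed in 011b7a6125 can be shown side by side. The sketch assumes the result word is split into byte-wide fields with the ML byte in bits 8 to 15. That layout and all model_* names are assumptions made for illustration.

```c
#include <stdint.h>

/* XXX_byte() style: helpers that take a raw result value. */
static inline uint8_t model_status_byte(int result)
{
	return (uint8_t)((unsigned int)result & 0xff);
}

static inline uint8_t model_ml_byte(int result)
{
	return (uint8_t)(((unsigned int)result >> 8) & 0xff);
}

static inline uint8_t model_host_byte(int result)
{
	return (uint8_t)(((unsigned int)result >> 16) & 0xff);
}

/* get_XXX_byte() style: helpers that take the command structure. */
struct model_cmd {
	int result;
};

static inline uint8_t model_get_host_byte(const struct model_cmd *cmd)
{
	return model_host_byte(cmd->result);
}
```

Under this convention a helper that takes a plain result value carries no get_ prefix, which is why the rename drops it.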
Ewan D. Milne c88b9a5589 scsi: core: Have midlayer retry scsi_mode_sense() UAs
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

Make scsi_mode_sense() rely on the SCSI midlayer to retry UAs instead of
driving the retries itself.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20240123002220.129141-13-michael.christie@oracle.com
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 21bdff48e12bf674208e0575a03ca89d663f1a3c)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:14 -04:00
Ewan D. Milne 8c1c82c4a3 scsi: core: Allow passthrough to request midlayer retries
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Conflicts: Merge differences due to lack of commit 946a10511f65
	   ("scsi: core: Remove scsi_execute_req()/scsi_execute() functions")
	   as well as lack of commit dbb4c84d87af
	   ("scsi: core: Move the result field from struct scsi_request to struct scsi_cmnd")

For passthrough we don't retry any error which we get a check condition
for. This results in a lot of callers driving their own retries for all
UAs, specific UAs, NOT_READY, specific sense values or any type of failure.

This adds the core code to allow passthrough users to specify what errors
they want the SCSI midlayer to retry for them. We can then convert users to
drop a lot of their sense parsing and retry handling.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20240123002220.129141-2-michael.christie@oracle.com
Reviewed-by: John Garry <john.g.garry@oracle.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 994724e6b3f05fb3b6e4b1e87d7e074b65d47bf9)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:11 -04:00
Ewan D. Milne 5923fc89b1 scsi: Fix sshdr use in scsi_test_unit_ready
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. Access the sshdr only
when the return value is > 0.

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-10-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f43158eefd655d34e38b0cc35b959149ddf02485)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:10 -04:00
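
The three return-value cases listed in 5923fc89b1 translate into a simple guard. In this sketch struct model_sense_hdr and model_saw_unit_attention() are hypothetical. The UNIT ATTENTION sense key value 0x06 is the standard one.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SK_UNIT_ATTENTION 0x06

/* Hypothetical decoded sense header. */
struct model_sense_hdr {
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/*
 * ret < 0: the command was never executed, the header is uninitialized.
 * ret == 0: the command succeeded, there is nothing to inspect.
 * ret > 0: the command failed at the SCSI level, the header may be read.
 */
static bool model_saw_unit_attention(int ret, const struct model_sense_hdr *hdr)
{
	if (ret <= 0)
		return false; /* never touch hdr in these cases */

	return hdr->sense_key == MODEL_SK_UNIT_ATTENTION;
}
```

Because the header is not dereferenced unless ret is positive, a caller can safely leave it uninitialized on the error and success paths.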
Ewan D. Milne 2bd0680e46 scsi: core: Clean up scsi_dev_queue_ready()
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

This is just a cleanup for scsi_dev_queue_ready() to avoid a redundant goto
and if statement. No functional change.

Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231018113746.1940197-2-haowenchao2@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3dc985bfbd00e1fb3dac4b1359efd6b71855b81f)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:06 -04:00
Ewan D. Milne 8c283dd606 scsi: core: Handle depopulation and restoration in progress
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

The default handling of the NOT READY sense key is to wait for the device
to become ready. The "wait" is assumed to be relatively short. However,
there is a sub-class of NOT READY conditions that have the "... in progress" phrase in
their additional sense code and these can take much longer.  Following on
from commit 505aa4b6a8 ("scsi: sd: Defer spinning up drive while SANITIZE
is in progress") we now have element depopulation and restoration that can
take a long time.  For example, over 24 hours for a 20 TB, 7200 rpm hard
disk to depopulate 1 of its 20 elements.

Add handling of ASC/ASCQ: 0x4,0x24 (depopulation in progress)
and ASC/ASCQ: 0x4,0x25 (depopulation restoration in progress)
to sd.c. scsi_lib.c has incomplete handling of these
two messages, so complete it.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20231015050650.131145-1-dgilbert@interlog.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2bbeb8d12404cf0603f513fc33269ef9abfbb396)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:06 -04:00
Lucas Zampieri 1d4b4af7c6 Merge: block: sync with v6.8
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3822

JIRA: https://issues.redhat.com/browse/RHEL-25988

Signed-off-by: Ming Lei <ming.lei@redhat.com>

Approved-by: Nico Pache <npache@redhat.com>
Approved-by: Maurizio Lombardi <mlombard@redhat.com>
Approved-by: Jeff Moyer <jmoyer@redhat.com>

Merged-by: Lucas Zampieri <lzampier@redhat.com>
2024-03-18 16:54:21 -03:00
Ming Lei f989fe2192 block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT
JIRA: https://issues.redhat.com/browse/RHEL-25988

commit 7ba150834b840f6f5cdd07ca69a4ccf39df59a66
Author: Mike Christie <michael.christie@oracle.com>
Date:   Fri Apr 7 15:05:35 2023 -0500

    block: Rename BLK_STS_NEXUS to BLK_STS_RESV_CONFLICT

    BLK_STS_NEXUS is used for NVMe/SCSI reservation conflicts and DASD's
    locking feature which works similar to NVMe/SCSI reservations where a
    host can get a lock on a device and when the lock is taken it will get
    failures.

    This patch renames BLK_STS_NEXUS so it better reflects this type of
    use.

    Signed-off-by: Mike Christie <michael.christie@oracle.com>
    Link: https://lore.kernel.org/r/20230407200551.12660-3-michael.christie@oracle.com
    Acked-by: Stefan Haberland <sth@linux.ibm.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-14 16:21:27 +08:00
Ming Lei a206d32059 block: Improve performance for BLK_MQ_F_BLOCKING drivers
JIRA: https://issues.redhat.com/browse/RHEL-25988

commit 65a558f66c308251e256317957b75d1e643c33c3
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Jul 21 10:27:30 2023 -0700

    block: Improve performance for BLK_MQ_F_BLOCKING drivers

    blk_mq_run_queue() runs the queue asynchronously if BLK_MQ_F_BLOCKING
    has been set. This is suboptimal since running the queue asynchronously
    is slower than running the queue synchronously. This patch modifies
    blk_mq_run_queue() as follows if BLK_MQ_F_BLOCKING has been set:
    - Run the queue synchronously if it is allowed to sleep.
    - Run the queue asynchronously if it is not allowed to sleep.
    Additionally, blk_mq_run_hw_queue(hctx, false) calls are modified into
    blk_mq_run_hw_queue(hctx, hctx->flags & BLK_MQ_F_BLOCKING) if the caller
    may be invoked from atomic context.

    The following caller chains have been reviewed:

    blk_mq_run_hw_queue(hctx, false)
      blk_mq_get_tag()      /* may sleep, hence the functions it calls may also sleep */
      blk_execute_rq()             /* may sleep */
      blk_mq_run_hw_queues(q, async=false)
        blk_freeze_queue_start()   /* may sleep */
        blk_mq_requeue_work()      /* may sleep */
        scsi_kick_queue()
          scsi_requeue_run_queue() /* may sleep */
          scsi_run_host_queues()
            scsi_ioctl_reset()     /* may sleep */
      blk_mq_insert_requests(hctx, ctx, list, run_queue_async=false)
        blk_mq_dispatch_plug_list(plug, from_sched=false)
          blk_mq_flush_plug_list(plug, from_schedule=false)
            __blk_flush_plug(plug, from_schedule=false)
            blk_add_rq_to_plug()
              blk_mq_submit_bio()  /* may sleep if REQ_NOWAIT has not been set */
      blk_mq_plug_issue_direct()
        blk_mq_flush_plug_list()   /* see above */
      blk_mq_dispatch_plug_list(plug, from_sched=false)
        blk_mq_flush_plug_list()   /* see above */
      blk_mq_try_issue_directly()
        blk_mq_submit_bio()        /* may sleep if REQ_NOWAIT has not been set */
      blk_mq_try_issue_list_directly(hctx, list)
        blk_mq_insert_requests() /* see above */

    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20230721172731.955724-4-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-07 13:19:54 +08:00
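
The sync/async rule from "block: Improve performance for BLK_MQ_F_BLOCKING drivers" can be sketched as a small user-space model. This is not the kernel code: MODEL_F_BLOCKING, async_arg_for_caller() and run_hw_queue_model() are hypothetical names invented for illustration, and the flag value is arbitrary. It only shows how a caller that may be in atomic context would derive the async argument from the blocking flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for BLK_MQ_F_BLOCKING; the bit value is illustrative. */
#define MODEL_F_BLOCKING (1u << 0)

enum run_mode { RUN_SYNC, RUN_ASYNC };

/*
 * Caller side: a caller that may run in atomic context requests an
 * asynchronous run only when the queue is a blocking one.
 */
bool async_arg_for_caller(unsigned int hctx_flags, bool may_be_atomic)
{
	return may_be_atomic && (hctx_flags & MODEL_F_BLOCKING) != 0;
}

/*
 * Queue side: run synchronously unless async was requested. The assert
 * models the rule that a blocking queue must not be run synchronously
 * by a caller that is not allowed to sleep.
 */
enum run_mode run_hw_queue_model(unsigned int hctx_flags, bool async,
				 bool caller_may_sleep)
{
	assert(async || !(hctx_flags & MODEL_F_BLOCKING) || caller_may_sleep);
	return async ? RUN_ASYNC : RUN_SYNC;
}
```

In this model a blocking queue kicked from atomic context ends up in RUN_ASYNC, while the same queue kicked from a sleepable context stays in RUN_SYNC, which is the faster path the commit message is after.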
Ming Lei 423710392f scsi: Remove a blk_mq_run_hw_queues() call
JIRA: https://issues.redhat.com/browse/RHEL-25988

commit d42e2e3448a99c41c8489766eeb732d8d741d5be
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Jul 21 10:27:29 2023 -0700

    scsi: Remove a blk_mq_run_hw_queues() call

    blk_mq_kick_requeue_list() calls blk_mq_run_hw_queues() asynchronously.
    Leave out the direct blk_mq_run_hw_queues() call. This patch causes
    scsi_run_queue() to call blk_mq_run_hw_queues() asynchronously instead
    of synchronously. Since scsi_run_queue() is not called from the hot I/O
    submission path, this patch does not affect the hot path.

    This patch prepares for allowing blk_mq_run_hw_queue() to sleep if
    BLK_MQ_F_BLOCKING has been set. scsi_run_queue() may be called from
    atomic context and must not sleep. Hence the removal of the
    blk_mq_run_hw_queues(q, false) call. See also scsi_unblock_requests().

    Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20230721172731.955724-3-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-07 13:19:54 +08:00
Ming Lei 92c7bd49c1 scsi: Inline scsi_kick_queue()
JIRA: https://issues.redhat.com/browse/RHEL-25988
Conflicts: rhel9 didn't backport commit d460f6240592 ("scsi: core: Rework
scsi_single_lun_run()"), so we have to replace scsi_kick_queue() in
scsi_single_lun_run() with blk_mq_run_hw_queues() too.

commit b5ca9acff553874aaf1faf176e076cbd7cc4aa0e
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Jul 21 10:27:28 2023 -0700

    scsi: Inline scsi_kick_queue()

    Inline scsi_kick_queue() to prepare for modifying the second argument
    passed to blk_mq_run_hw_queues().

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20230721172731.955724-2-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-03-07 13:19:54 +08:00
Ming Lei 9f83840c83 scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
JIRA: https://issues.redhat.com/browse/RHEL-23941

commit 4e6c9011990726f4d175e2cdfebe5b0b8cce4839
Author: Ming Lei <ming.lei@redhat.com>
Date:   Sat Feb 3 10:45:21 2024 +0800

    scsi: core: Move scsi_host_busy() out of host lock if it is for per-command

    Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock
    for waking up EH handler") intended to fix a hard lockup issue triggered by
    EH. The core idea was to move scsi_host_busy() out of the host lock when
    processing individual commands for EH. However, a suggested style change
    inadvertently caused scsi_host_busy() to remain under the host lock. Fix
    this by calling scsi_host_busy() outside the lock.

    Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler")
    Cc: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
    Cc: Bart Van Assche <bvanassche@acm.org>
    Cc: Ewan D. Milne <emilne@redhat.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-02-18 15:47:51 +08:00
Ming Lei 66e05829c1 scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
JIRA: https://issues.redhat.com/browse/RHEL-23941

commit 4373534a9850627a2695317944898eb1283a2db0
Author: Ming Lei <ming.lei@redhat.com>
Date:   Fri Jan 12 15:00:00 2024 +0800

    scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler

    Inside scsi_eh_wakeup(), scsi_host_busy() is called and checked under the
    host lock every time to decide whether the error handler kthread needs to
    be woken up.

    This can be too heavy in case of recovery, such as:

     - N hardware queues

     - queue depth is M for each hardware queue

     - each scsi_host_busy() iterates over (N * M) tag/requests

    If recovery is triggered while all requests are in flight, the
    scsi_eh_wakeup() calls are strictly serialized. By the time
    scsi_eh_wakeup() is called for the last in-flight request,
    scsi_host_busy() has run (N * M - 1) times and requests have been
    iterated (N * M - 1) * (N * M) times.

    If both N and M are big enough, hard lockup can be triggered on acquiring
    host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).

    Fix the issue by calling scsi_host_busy() outside the host lock. We don't
    need the host lock for getting the busy count because the host lock never
    covers that.

    [mkp: Drop unnecessary 'busy' variables pointed out by Bart]

    Cc: Ewan Milne <emilne@redhat.com>
    Fixes: 6eb045e092 ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
    Reviewed-by: Ewan D. Milne <emilne@redhat.com>
    Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
    Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-02-18 15:47:24 +08:00
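
To get a feel for the numbers in "Move scsi_host_busy() out of host lock for waking up EH handler", the iteration count from the commit message can be worked out directly. The helpers below are hypothetical (they are not kernel functions) and simply evaluate the commit's own formula, (N * M - 1) * (N * M), for a given queue count N and queue depth M.

```c
#include <assert.h>
#include <stdint.h>

/* Tags visited by one busy scan in the commit's model: N * M. */
uint64_t tags_per_busy_scan(uint64_t hw_queues, uint64_t queue_depth)
{
	return hw_queues * queue_depth;
}

/*
 * Total tag visits by the time the last in-flight request is handled:
 * (N * M - 1) scans, each walking N * M tags.
 */
uint64_t total_tag_visits(uint64_t hw_queues, uint64_t queue_depth)
{
	uint64_t inflight = tags_per_busy_scan(hw_queues, queue_depth);

	assert(inflight > 0);
	return (inflight - 1) * inflight;
}
```

With the mpi3mr figures quoted in the message (N = 128, M = 8169), one scan covers 1,045,632 tags and the total is on the order of 10^12 tag visits, all serialized on the host lock before the fix.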
Ewan D. Milne 8db7b08b76 scsi: core: Improve warning message in scsi_device_block()
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline

If __scsi_internal_device_block() returns an error, it is always -EINVAL
because of an invalid state transition. For debugging purposes, it makes
more sense to print the device state.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-8-mwilck@suse.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6d7160c7da6fa3010252910a1680c62ababa6c2f)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:06 -04:00
Ewan D. Milne 0d509d6ee9 scsi: core: Replace scsi_target_block() with scsi_block_targets()
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline
Conflicts: Retain scsi_target_block() for compatibility with out-of-box drivers
           as it is a longstanding exported function

All callers (fc_remote_port_delete(), __iscsi_block_session(),
__srp_start_tl_fail_timers(), srp_reconnect_rport(), snic_tgt_del()) pass
parent devices of scsi_target devices to scsi_target_block().

Rename the function to scsi_block_targets(), and simplify it by assuming
that it is always passed a parent device. Also, have callers pass the
Scsi_Host pointer to scsi_block_targets(), as every caller has this pointer
readily available.

Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-7-mwilck@suse.com
Cc: Karan Tilak Kumar <kartilak@cisco.com>
Cc: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 31950192d939a969415d0e1da4c62598023b0850)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:06 -04:00
Ewan D. Milne c69ab6d6e3 scsi: core: Don't wait for quiesce in scsi_device_block()
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline

scsi_device_block() is only called from scsi_target_block(), which calls it
repeatedly for every child device. For targets with many devices, waiting
for every queue to quiesce may cause a substantial delay (we measured more
than 100s delay for blocking a FC rport with 2048 LUNs).

Just call blk_mq_wait_quiesce_done() once from scsi_target_block() after
stopping all queues.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-6-mwilck@suse.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e20fff8a1f4940f46be888bd175412c2e3e64e96)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:06 -04:00
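
The change in "Don't wait for quiesce in scsi_device_block()" amounts to moving an expensive wait out of a per-device loop. The following is a user-space model under stated assumptions, not the kernel implementation: struct model_queue, model_quiesce_nowait() and model_wait_quiesce_done() are hypothetical stand-ins, and the wait is reduced to a counter so the two shapes can be compared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-device request queue. */
struct model_queue {
	int quiesced;
};

/* Counts how often the (expensive) wait-for-quiesce step runs. */
unsigned long wait_calls;

void model_quiesce_nowait(struct model_queue *q)
{
	q->quiesced = 1;
}

void model_wait_quiesce_done(void)
{
	wait_calls++;
}

/* Old shape: every device is quiesced and waited for individually. */
void block_each_and_wait(struct model_queue *qs, size_t n)
{
	assert(qs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		model_quiesce_nowait(&qs[i]);
		model_wait_quiesce_done();
	}
}

/* New shape: mark every queue first, then wait a single time. */
void block_all_then_wait_once(struct model_queue *qs, size_t n)
{
	assert(qs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		model_quiesce_nowait(&qs[i]);
	model_wait_quiesce_done();
}
```

For a target with 2048 LUNs, as in the measurement quoted above, the first shape waits 2048 times and the second waits once, while every queue still ends up quiesced.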
Ewan D. Milne 325e08407e scsi: core: Don't wait for quiesce in scsi_stop_queue()
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline

scsi_stop_queue() has just two callers, one with and one without
"nowait". As blk_mq_quiesce_queue() comes down to
blk_mq_quiesce_queue_nowait() followed by blk_mq_wait_quiesce_done(), we
might as well open-code this in scsi_device_block().

Also, add a comment explaining why blk_mq_quiesce_queue_nowait() must be
called with the state_mutex held, see
https://lore.kernel.org/linux-scsi/3b8b13bf-a458-827a-b916-07d7eee8ae00@acm.org/.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-5-mwilck@suse.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d7035b73a73a79a1dc991fad0ee5f784559e81ed)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:06 -04:00
Ewan D. Milne 07f47c07fb scsi: core: Merge scsi_internal_device_block() and device_block()
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline

scsi_internal_device_block() is only called from device_block().  Merge the
two functions, and call the result scsi_device_block(), as the name
device_block() is confusingly generic.

Signed-off-by: Martin Wilck <mwilck@suse.com>
Link: https://lore.kernel.org/r/20230614103616.31857-4-mwilck@suse.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c5e46f7ad43b0519980020378a2b00b339359968)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:05 -04:00
Ewan D. Milne f24f646c30 scsi: core: Support setting BLK_MQ_F_BLOCKING
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline
Conflicts: KABI changes

Prepare for adding code in ufshcd_queuecommand() that may sleep. This patch
is similar to a patch posted last year by Mike Christie. See also
https://lore.kernel.org/all/20220308003957.123312-2-michael.christie@oracle.com/

Cc: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230529202640.11883-3-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b125bb99559e3639764b8d169e3e9b80858fa2af)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:04 -04:00
Ewan D. Milne b167fc11c4 scsi: core: Rework scsi_host_block()
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline

Make scsi_host_block() easier to read by converting it to the widely used
early-return style. See also commit f983622ae6 ("scsi: core: Avoid
calling synchronize_rcu() for each device in scsi_host_block()").

Reviewed-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Cc: Ye Bin <yebin10@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230529202640.11883-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c854bcdf5e18a3b672e363138f2f6657a1803170)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:04 -04:00
Ewan D. Milne dbb9eef3ab scsi: core: Only kick the requeue list if necessary
JIRA: https://issues.redhat.com/browse/RHEL-14312
Upstream Status: From upstream linux mainline

Instead of running the request queue of each device associated with a host
every 3 ms (BLK_MQ_RESOURCE_DELAY) while host error handling is in
progress, run the request queue after error handling has finished.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230518193159.1166304-4-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8b566edbdbfb5cde31a322c57932694ff48125ed)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-11-03 12:17:04 -04:00
Ming Lei ddafe61712 scsi: core: Decrease scsi_device's iorequest_cnt if dispatch failed
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2203125

commit 09e797c8641f6ad435c33ae24c223351197ea29a
Author: Wenchao Hao <haowenchao2@huawei.com>
Date:   Mon May 15 15:01:56 2023 +0800

    scsi: core: Decrease scsi_device's iorequest_cnt if dispatch failed

    If scsi_dispatch_cmd() failed, the SCSI command was not sent to the target,
    scsi_queue_rq() would return BLK_STS_RESOURCE and the related request would
    be requeued. The timeout of this request would not fire, no one would
    increase iodone_cnt.

    The above flow would result in iodone_cnt being smaller than
    iorequest_cnt. So decrease iorequest_cnt if dispatch failed to work
    around the issue.

    Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
    Reported-by: Ming Lei <ming.lei@redhat.com>
    Closes: https://lore.kernel.org/r/ZF+zB+bB7iqe0wGd@ovpn-8-17.pek2.redhat.com
    Link: https://lore.kernel.org/r/20230515070156.1790181-3-haowenchao2@huawei.com
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-05-26 09:17:18 +08:00
Ming Lei b69db1c573 scsi: Revert "scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed"
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2203125

commit 6ca9818d1624e136a76ae8faedb6b6c95ca66903
Author: Wenchao Hao <haowenchao2@huawei.com>
Date:   Mon May 15 15:01:55 2023 +0800

    scsi: Revert "scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed"

    The "atomic_inc(&cmd->device->iorequest_cnt)" in scsi_queue_rq() would
    cause kernel panic because cmd->device may be freed after returning from
    scsi_dispatch_cmd().

    This reverts commit cfee29ffb45b1c9798011b19d454637d1b0fe87d.

    Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
    Reported-by: Ming Lei <ming.lei@redhat.com>
    Closes: https://lore.kernel.org/r/ZF+zB+bB7iqe0wGd@ovpn-8-17.pek2.redhat.com
    Link: https://lore.kernel.org/r/20230515070156.1790181-2-haowenchao2@huawei.com
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2023-05-26 09:17:16 +08:00
Ewan D. Milne b9b30e5973 scsi: core: Extend struct scsi_exec_args
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline
Conflicts: Merge differences due to lack of commit 6aded12b10e0
	   ("scsi: core: Remove struct scsi_request") and related commits

Allow SCSI LLDs to specify SCMD_* flags.

Link: https://lore.kernel.org/r/20230210193258.4004923-2-bvanassche@acm.org
Cc: Mike Christie <michael.christie@oracle.com>
Cc: John Garry <john.g.garry@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 35cd2f5542df569122d48caf606b972642012c50)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:49 -04:00
Ewan D. Milne 8cfeec905c scsi: core: Convert to scsi_execute_cmd()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

scsi_execute_req() is going to be removed. Convert SCSI midlayer to
scsi_execute_cmd().

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7dfe0b5e7ca67c659475883712c1d0449f900f9c)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:46 -04:00
Ewan D. Milne b7ca7e0406 scsi: core: Add struct for args to execution functions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline
Conflicts: RHEL9 cannot remove exported function __scsi_execute(), so retain
           that functionality via macro invocation of scsi_execute()
           In addition we have to #include <linux/dma-direction.h> because
           it was missing from this upstream commit -- even though the
           scsi_execute_req() macro is removed upstream in a later commit,
           we will still need it and in any case it was not bisectable
           Merge differences due to lack of commit 6aded12b10e0
           ("scsi: core: Remove struct scsi_request") and related commits

Move the SCSI execution functions to use a struct for passing in optional
args. This commit adds the new struct, temporarily converts scsi_execute()
and scsi_execute_req(), and adds a new helper, scsi_execute_cmd(), which
takes the scsi_exec_args struct.

There should be no change in behavior. We no longer allow users to pass in
any request->rq_flags value, but they were only passing in RQF_PM which we
do support by allowing users to pass in the BLK_MQ_REQ flags used by
blk_mq_alloc_request().

Subsequent commits will convert scsi_execute() and scsi_execute_req() users
to the new helpers then remove scsi_execute() and scsi_execute_req().

Signed-off-by: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d0949565811f0896c1c7e781ab2ad99d34273fdf)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:44 -04:00
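
The pattern introduced by "Add struct for args to execution functions" is the usual C idiom of bundling optional parameters into one struct. The sketch below is a stand-alone illustration, not the kernel API: struct model_exec_args and model_execute_cmd() are hypothetical, the field set is only an assumption about what such a struct might carry, and the function body does no real I/O.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle of optional arguments; every field may be left zero. */
struct model_exec_args {
	unsigned char *sense;   /* optional sense buffer */
	size_t sense_len;       /* size of the sense buffer, if any */
	unsigned int req_flags; /* optional request-allocation flags */
	int *resid;             /* optional residual-count out-parameter */
};

/*
 * Required arguments stay positional; everything optional travels in
 * args. A NULL args pointer means "all defaults".
 */
int model_execute_cmd(const unsigned char *cdb, size_t cdb_len,
		      const struct model_exec_args *args)
{
	static const struct model_exec_args no_args = {0};

	if (!args)
		args = &no_args;

	assert(cdb != NULL && cdb_len > 0);
	assert(args->sense == NULL || args->sense_len > 0);

	/* Pretend the whole transfer completed: nothing left over. */
	if (args->resid)
		*args->resid = 0;
	return 0;
}
```

A caller that only wants the residual count passes a struct initialized as { .resid = &resid } and leaves the rest zeroed. New optional fields can then be added later without touching callers that do not use them, which is the motivation the commit message gives for the conversion.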
Ewan D. Milne 96844d9a6e scsi: core: Do not increase scsi_device's iorequest_cnt if dispatch failed
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

If scsi_dispatch_cmd() failed, the SCSI command was not sent to the target.
scsi_queue_rq() would return BLK_STS_RESOURCE if scsi_dispatch_cmd()
failed, and the related request would be requeued. The timeout of this
request would not fire, so no one would increase iodone_cnt.

Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Link: https://lore.kernel.org/r/20221123122137.150776-3-haowenchao@huawei.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cfee29ffb45b1c9798011b19d454637d1b0fe87d)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:41 -04:00
Ewan D. Milne 3ece38fcf2 scsi: core: Support failing requests while recovering
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

The current behavior for SCSI commands submitted while error recovery is
ongoing is to retry command submission after error recovery has finished.
See also the scsi_host_in_recovery() check in scsi_host_queue_ready(). Add
support for failing SCSI commands while host recovery is in progress. This
functionality will be used to fix a deadlock in the UFS driver.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-4-bvanassche@acm.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 310bcaef6d7ed1626bba95dd9b5c5acd189c0e35)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:40 -04:00
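
The behavior added by "Support failing requests while recovering" can be pictured as a three-way decision. This is a minimal model under stated assumptions: MODEL_FAIL_IF_RECOVERING, enum model_disposition and model_queue_ready() are hypothetical names, and the actual flag name and return codes used by the SCSI core are not shown in the message above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-command flag: fail instead of waiting for recovery. */
#define MODEL_FAIL_IF_RECOVERING (1u << 0)

enum model_disposition {
	MODEL_DISPATCH,    /* host is healthy, send the command */
	MODEL_RETRY_LATER, /* default: resubmit after recovery finishes */
	MODEL_FAIL         /* opt-in: fail the command immediately */
};

/* Decide what happens to a command submitted to a host. */
enum model_disposition model_queue_ready(bool host_in_recovery,
					 unsigned int cmd_flags)
{
	/* Only the one modelled flag is understood here. */
	assert((cmd_flags & ~MODEL_FAIL_IF_RECOVERING) == 0);

	if (!host_in_recovery)
		return MODEL_DISPATCH;
	if (cmd_flags & MODEL_FAIL_IF_RECOVERING)
		return MODEL_FAIL;
	return MODEL_RETRY_LATER;
}
```

The existing behavior is the MODEL_RETRY_LATER branch. The commit adds the opt-in MODEL_FAIL branch, so a submitter that must not wait on error recovery gets an error back instead of blocking on it.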