314 Commits

Author SHA1 Message Date
Ewan D. Milne e125b74c02 scsi: scsi_error: Add kernel-doc for exported functions
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline

Convert scsi_report_bus_reset() and scsi_report_device_reset() to
kernel-doc since they are exported. This allows them to be part of the
driver-api/scsi.rst docbook.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20241212205217.597844-2-rdunlap@infradead.org
CC: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
CC: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 20b98768c2e743df7648476521818600ac4a86ef)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2025-04-13 14:10:18 -04:00
Ewan D. Milne 96d8b99128 scsi: sd: Handle read/write CDL timeout failures
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

Commands using a duration limit descriptor that has limit policies set to a
value other than 0x0 may be failed by the device if one of the limits is
exceeded. For such commands, since the failure is the result of the user
duration limit configuration and workload, the commands should not be
retried and should be terminated immediately. Furthermore, to allow the
user to differentiate these "soft" failures from hard errors due to
hardware problems, a different error code than EIO should be returned.

There are 2 cases to consider:

(1) The failure is due to a limit policy failing the command with a check
condition sense key, that is, any limit policy other than 0xD.  For this
case, scsi_check_sense() is modified to detect failures with the ABORTED
COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND
TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR
RECOVERY additional sense code. For these failures, a SUCCESS disposition
is returned so that scsi_finish_command() is called to terminate the
command.

(2) The failure is due to a limit policy set to 0xD, which results in the
command being terminated with a GOOD status, COMPLETED sense key, and DATA
CURRENTLY UNAVAILABLE additional sense code. To handle this case,
scsi_check_sense() is modified to return a SUCCESS disposition so that
scsi_finish_command() is called to terminate the command.  In addition,
scsi_decide_disposition() has to be modified to see if a command being
terminated with GOOD status has sense data.  This is as defined in SCSI
Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status
commands were not checked before.

If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(),
so that a command that failed because of a CDL timeout cannot be
retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to
complete the command request with the BLK_STS_DURATION_LIMIT status, which
results in the user seeing ETIME errors for the failed commands.
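
The flow described above can be sketched in plain userspace C. This is only a model, not the kernel code: the helper names, the two-value status enum, and the ASC/ASCQ pairs (2Eh/01h..03h for the COMMAND TIMEOUT codes, 55h/0Ah for DATA CURRENTLY UNAVAILABLE) are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Sense keys as named in the commit text; numeric values are assumed. */
enum { SK_ABORTED_COMMAND = 0x0b, SK_COMPLETED = 0x0f };

/* Hypothetical stand-in for the midlayer status byte. */
enum ml_stat { ML_STAT_OK, ML_STAT_DL_TIMEOUT };

/* Models the scsi_check_sense() change: flag duration-limit failures. */
static enum ml_stat check_sense(int key, int asc, int ascq)
{
    /* Case (1): one of the COMMAND TIMEOUT codes (assumed 2Eh/01h..03h). */
    if (key == SK_ABORTED_COMMAND && asc == 0x2e && ascq >= 0x01 && ascq <= 0x03)
        return ML_STAT_DL_TIMEOUT;
    /* Case (2): policy 0xD, DATA CURRENTLY UNAVAILABLE (assumed 55h/0Ah). */
    if (key == SK_COMPLETED && asc == 0x55 && ascq == 0x0a)
        return ML_STAT_DL_TIMEOUT;
    return ML_STAT_OK;
}

/* Models scsi_noretry_cmd(): a CDL timeout must never be retried. */
static bool noretry(enum ml_stat ml)
{
    return ml == ML_STAT_DL_TIMEOUT;
}

/* Models scsi_result_to_blk_status(): ETIME for CDL, EIO for hard errors. */
static int to_errno(enum ml_stat ml, bool failed)
{
    if (ml == ML_STAT_DL_TIMEOUT)
        return -ETIME;
    return failed ? -EIO : 0;
}
```

With this split, a caller can tell a "soft" duration-limit failure (-ETIME) from a hardware error (-EIO) by the returned code alone.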

Co-developed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 390e2d1a587405a522dc6b433d45648f895a352c)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:20 -04:00
Ewan D. Milne 1d939e03cb scsi: core: Kick the requeue list after inserting when flushing
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

When libata calls ata_link_abort() to abort all ata queued commands, it
calls blk_abort_request() on the SCSI command representing each QC.

This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for
each SCSI command.

scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the
command to shost->eh_cmd_q.

This will wake up the SCSI EH, and eventually the libata EH strategy
handler will be called, which calls scsi_eh_flush_done_q() to either flush
retry or flush finish each failed command.

The commands that are flush retried by scsi_eh_flush_done_q() are done so
using scsi_queue_insert().

Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the
second argument set to true, indicating that it should always kick/run the
requeue list after inserting.

After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() does not kick/run the requeue list after
inserting, if the current SCSI host state is recovery (which is the case in
the libata example above).

This optimization is probably fine in most cases, as I can only assume that
most often someone will eventually kick/run the queues.

However, that is not the case for scsi_eh_flush_done_q(), where we can see
that the request gets inserted to the requeue list, but the queue is never
started after the request has been inserted, leading to the block layer
waiting for the completion of a command that never gets to run.

Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
state is most likely always in recovery when this function is called.

Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the
same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the
requeue list if necessary").
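
A toy model of the problem and the fix, assuming a queue that only counts parked and dispatched requests. The struct and function names below are hypothetical and do not correspond to the block layer API.

```c
#include <assert.h>
#include <stdbool.h>

struct queue_model {
    int requeued;   /* requests parked on the requeue list */
    int dispatched; /* requests handed back to the driver */
};

/* Toy "kick the requeue list": drain everything that is parked. */
static void kick_requeue_list(struct queue_model *q)
{
    q->dispatched += q->requeued;
    q->requeued = 0;
}

/* Toy queue insert after 8b566edbdbfb: no kick while in recovery. */
static void queue_insert(struct queue_model *q, bool host_in_recovery)
{
    q->requeued++;
    if (!host_in_recovery)
        kick_requeue_list(q);
}

/* Toy flush-retry loop; `fixed` selects the behavior added by this commit.
 * Returns the number of commands left stranded on the requeue list. */
static int flush_retry(int ncmds, bool fixed)
{
    struct queue_model q = { 0, 0 };

    for (int i = 0; i < ncmds; i++) {
        queue_insert(&q, true);     /* EH context: host is in recovery */
        if (fixed)
            kick_requeue_list(&q);  /* explicit kick after inserting */
    }
    return q.requeued;
}
```

In this model, `flush_retry(3, false)` leaves all three commands parked, while `flush_retry(3, true)` leaves none.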

Simple reproducer for the libata example above:
$ hdparm -Y /dev/sda
$ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete

Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
Reported-by: Kevin Locke <kevin@kevinlocke.name>
Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6df0e077d76bd144c533b61d6182676aae6b0a85)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:08 -04:00
Ewan D. Milne 1a9369c1a6 scsi: core: Add a precondition check in scsi_eh_scmd_add()
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

Calling scsi_eh_scmd_add() after SCMD_STATE_INFLIGHT has been cleared may
cause the error handler never to be woken up, because this may result in
shost->host_failed becoming larger than scsi_host_busy(shost). Hence
complain if scsi_eh_scmd_add() is called after SCMD_STATE_INFLIGHT has
been cleared.
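
Why the counts matter can be shown with a small model. It assumes the error handler is woken only when the failed count equals the busy count; the struct, the warning counter (standing in for a kernel warning), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct host_model {
    int busy;         /* commands currently in flight */
    int host_failed;  /* commands handed to the error handler */
    int warnings;     /* stands in for the new precondition complaint */
};

/* Toy scsi_eh_scmd_add(): complain if the command is no longer in flight. */
static void eh_scmd_add(struct host_model *h, bool inflight)
{
    if (!inflight)
        h->warnings++;
    h->host_failed++;
}

/* Assumed wake-up rule: every busy command has failed. */
static bool eh_would_wake(const struct host_model *h)
{
    return h->busy == h->host_failed;
}

/* Add `inflight` valid commands plus `stale` already-completed ones. */
static struct host_model run(int inflight, int stale)
{
    struct host_model h = { .busy = inflight };

    for (int i = 0; i < inflight; i++)
        eh_scmd_add(&h, true);
    for (int i = 0; i < stale; i++)
        eh_scmd_add(&h, false);
    return h;
}
```

A single stale add pushes host_failed past the busy count, so the equality never holds and the handler sleeps forever; the warning makes that misuse visible.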

Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20231115193343.2262013-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 10b53db2db8dfda84b25833043f2b63123572af6)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:07 -04:00
Ewan D. Milne bbb62e4e58 scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline

Earlier commits in this series allow battery-powered systems to build
their kernels with the default-disabled CONFIG_RCU_LAZY=y Kconfig option.
This Kconfig option causes call_rcu() to delay its callbacks in order
to batch them.  This means that a given RCU grace period covers more
callbacks, thus reducing the number of grace periods, in turn reducing
the amount of energy consumed, which increases battery lifetime, which
can be a very good thing.  This is not a subtle effect: In some important
use cases, the battery lifetime is increased by more than 10%.

This CONFIG_RCU_LAZY=y option is available only for CPUs that offload
callbacks, for example, CPUs mentioned in the rcu_nocbs kernel boot
parameter passed to kernels built with CONFIG_RCU_NOCB_CPU=y.

Delaying callbacks is normally not a problem because most callbacks do
nothing but free memory.  If the system is short on memory, a shrinker
will kick all currently queued lazy callbacks out of their laziness,
thus freeing their memory in short order.  Similarly, the rcu_barrier()
function, which blocks until all currently queued callbacks are invoked,
will also kick lazy callbacks, thus enabling rcu_barrier() to complete
in a timely manner.

However, there are some cases where laziness is not a good option.
For example, synchronize_rcu() invokes call_rcu(), and blocks until
the newly queued callback is invoked.  It would not be good for
synchronize_rcu() to block for ten seconds, even on an idle system.
Therefore, synchronize_rcu() invokes call_rcu_hurry() instead of
call_rcu().  The arrival of a non-lazy call_rcu_hurry() callback on a
given CPU kicks any lazy callbacks that might be already queued on that
CPU.  After all, if there is going to be a grace period, all callbacks
might as well get full benefit from it.

Yes, this could be done the other way around by creating a
call_rcu_lazy(), but earlier experience with this approach and
feedback at the 2022 Linux Plumbers Conference shifted the approach
to call_rcu() being lazy with call_rcu_hurry() for the few places
where laziness is inappropriate.

And another call_rcu() instance that cannot be lazy is the one in the
scsi_eh_scmd_add() function.  Leaving this instance lazy results in
unacceptably slow boot times.

Therefore, make scsi_eh_scmd_add() use call_rcu_hurry() in order to
revert to the old behavior.
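
The difference between a lazy and a hurried callback can be modeled in userspace. This is a deliberately tiny sketch: the list type, the fixed capacity, and all function names are hypothetical, and real RCU grace-period handling is far more involved.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CBS 16

typedef void (*cb_fn)(int *);

/* Toy per-CPU callback list. */
struct cb_list {
    cb_fn fn[MAX_CBS];
    int *arg[MAX_CBS];
    int n;
};

/* Invoke and drop everything queued, as a grace period would. */
static void grace_period(struct cb_list *l)
{
    for (int i = 0; i < l->n; i++)
        l->fn[i](l->arg[i]);
    l->n = 0;
}

/* Lazy enqueue: the callback just waits for a later flush. */
static bool enqueue_lazy(struct cb_list *l, cb_fn fn, int *arg)
{
    if (l->n == MAX_CBS)
        return false;
    l->fn[l->n] = fn;
    l->arg[l->n++] = arg;
    return true;
}

/* Hurried enqueue: queue, then kick every pending callback at once. */
static bool enqueue_hurry(struct cb_list *l, cb_fn fn, int *arg)
{
    if (!enqueue_lazy(l, fn, arg))
        return false;
    grace_period(l);
    return true;
}

static void bump(int *counter)
{
    (*counter)++;
}

/* Queue `lazy` lazy callbacks, then optionally one hurried callback;
 * return how many callbacks have run so far. */
static int invoked(int lazy, bool then_hurry)
{
    struct cb_list l = { .n = 0 };
    int ran = 0;

    for (int i = 0; i < lazy; i++)
        enqueue_lazy(&l, bump, &ran);
    if (then_hurry)
        enqueue_hurry(&l, bump, &ran);
    return ran;
}
```

Three lazy callbacks alone run nothing in this model; adding one hurried callback runs all four, which mirrors the "kicks any lazy callbacks" behavior described above.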

[ paulmck: Apply s/call_rcu_flush/call_rcu_hurry/ feedback from Tejun Heo. ]

Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Uladzislau Rezki <urezki@gmail.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: <linux-scsi@vger.kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
(cherry picked from commit 54d87b0a0c19bc3f740e4cd4b87ba14ce2e4ea73)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-06-03 13:47:02 -04:00
Ming Lei 9f83840c83 scsi: core: Move scsi_host_busy() out of host lock if it is for per-command
JIRA: https://issues.redhat.com/browse/RHEL-23941

commit 4e6c9011990726f4d175e2cdfebe5b0b8cce4839
Author: Ming Lei <ming.lei@redhat.com>
Date:   Sat Feb 3 10:45:21 2024 +0800

    scsi: core: Move scsi_host_busy() out of host lock if it is for per-command

    Commit 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock
    for waking up EH handler") intended to fix a hard lockup issue triggered by
    EH. The core idea was to move scsi_host_busy() out of the host lock when
    processing individual commands for EH. However, a suggested style change
    inadvertently caused scsi_host_busy() to remain under the host lock. Fix
    this by calling scsi_host_busy() outside the lock.

    Fixes: 4373534a9850 ("scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler")
    Cc: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
    Cc: Bart Van Assche <bvanassche@acm.org>
    Cc: Ewan D. Milne <emilne@redhat.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20240203024521.2006455-1-ming.lei@redhat.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-02-18 15:47:51 +08:00
Ming Lei 66e05829c1 scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler
JIRA: https://issues.redhat.com/browse/RHEL-23941

commit 4373534a9850627a2695317944898eb1283a2db0
Author: Ming Lei <ming.lei@redhat.com>
Date:   Fri Jan 12 15:00:00 2024 +0800

    scsi: core: Move scsi_host_busy() out of host lock for waking up EH handler

    Inside scsi_eh_wakeup(), scsi_host_busy() is called and checked under the
    host lock every time to decide whether the error handler kthread needs to
    be woken up.

    This can be too heavy in case of recovery, such as:

     - N hardware queues

     - queue depth is M for each hardware queue

     - each scsi_host_busy() iterates over (N * M) tag/requests

    If recovery is triggered while all requests are in flight, each
    scsi_eh_wakeup() call is strictly serialized. By the time scsi_eh_wakeup()
    is called for the last in-flight request, scsi_host_busy() has run
    (N * M - 1) times, and requests have been iterated (N * M - 1) * (N * M)
    times.

    If both N and M are big enough, a hard lockup can be triggered on
    acquiring the host lock, and it is observed on mpi3mr (128 hw queues,
    queue depth 8169).

    Fix the issue by calling scsi_host_busy() outside the host lock. We don't
    need the host lock for getting busy count because the host lock never
    covers that.
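
The cost argument can be made concrete with a small userspace model that counts how many tag visits happen while the lock is held. The struct, the boolean standing in for the host lock, and the function names are hypothetical; only the N * M arithmetic comes from the description above.

```c
#include <assert.h>
#include <stdbool.h>

struct host_model {
    unsigned long tags;               /* N * M requests, all in flight */
    unsigned long host_failed;        /* commands handed to EH so far */
    bool locked;                      /* stands in for the host lock */
    bool eh_woken;
    unsigned long long locked_visits; /* tag visits made under the lock */
};

/* Toy busy count: walks every tag to count busy requests. */
static unsigned long host_busy(struct host_model *h)
{
    if (h->locked)
        h->locked_visits += h->tags;
    return h->tags;
}

/* One failing command; `fixed` samples the busy count before locking. */
static void eh_add_one(struct host_model *h, bool fixed)
{
    unsigned long busy = fixed ? host_busy(h) : 0;

    h->locked = true;
    h->host_failed++;
    if (!fixed)
        busy = host_busy(h);          /* old code: iterate under the lock */
    if (busy == h->host_failed)
        h->eh_woken = true;
    h->locked = false;
}

/* Fail all N * M in-flight commands and report the work done under lock. */
static struct host_model fail_all(unsigned long n, unsigned long m, bool fixed)
{
    struct host_model h = { .tags = n * m };

    for (unsigned long i = 0; i < n * m; i++)
        eh_add_one(&h, fixed);
    return h;
}
```

With N = 4 and M = 8, the old ordering performs 32 * 32 = 1024 tag visits under the lock and the new ordering performs none, while the handler is woken in both cases.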

    [mkp: Drop unnecessary 'busy' variables pointed out by Bart]

    Cc: Ewan Milne <emilne@redhat.com>
    Fixes: 6eb045e092 ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
    Reviewed-by: Ewan D. Milne <emilne@redhat.com>
    Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
    Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2024-02-18 15:47:24 +08:00
Scott Weaver fab724ae78 Merge: scsi: core: Always send batch on reset or error handling command
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/3551

JIRA: https://issues.redhat.com/browse/RHEL-19730

    scsi: core: Always send batch on reset or error handling command

    In commit 8930a6c207 ("scsi: core: add support for request batching") the
    block layer bd->last flag was mapped to SCMD_LAST and used as an indicator
    to send the batch for the drivers that implement this feature. However, the
    error handling code was not updated accordingly.

    scsi_send_eh_cmnd() is used to send error handling commands and request
    sense. The problem is that request sense comes as a single command that
    gets into the batch queue and times out. As a result the device goes
    offline after several failed resets. This was observed on virtio_scsi
    during a device resize operation.

    [  496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense
    [  506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0
    [  506.787981] sd 0:0:4:0: [sdd] tag#117 abort

    To fix this, always set the SCMD_LAST flag in scsi_send_eh_cmnd() and
    scsi_reset_ioctl().

    Fixes: 8930a6c207 ("scsi: core: add support for request batching")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
    Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com
    Reviewed-by: Ming Lei <ming.lei@redhat.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    (cherry picked from commit 066c5b46b6eaf2f13f80c19500dbb3b84baabb33)
    Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Approved-by: Ming Lei <ming.lei@redhat.com>
Approved-by: Chris Leech <cleech@redhat.com>

Signed-off-by: Scott Weaver <scweaver@redhat.com>
2024-01-26 13:26:57 -05:00
Ewan D. Milne 5ef059b25e scsi: core: Always send batch on reset or error handling command
JIRA: https://issues.redhat.com/browse/RHEL-19730
Upstream Status: From upstream linux mainline

In commit 8930a6c207 ("scsi: core: add support for request batching") the
block layer bd->last flag was mapped to SCMD_LAST and used as an indicator
to send the batch for the drivers that implement this feature. However, the
error handling code was not updated accordingly.

scsi_send_eh_cmnd() is used to send error handling commands and request
sense. The problem is that request sense comes as a single command that
gets into the batch queue and times out. As a result the device goes
offline after several failed resets. This was observed on virtio_scsi
during a device resize operation.

[  496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense
[  506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0
[  506.787981] sd 0:0:4:0: [sdd] tag#117 abort

To fix this, always set the SCMD_LAST flag in scsi_send_eh_cmnd() and
scsi_reset_ioctl().
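
A minimal sketch of why a lone command without the "last" flag never reaches the device, assuming a batching driver that notifies the hardware only when the flag is set. The flag value, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define CMD_LAST 0x1u   /* stand-in for SCMD_LAST; value is illustrative */

struct batch_driver {
    int pending;    /* commands queued but not yet visible to the device */
    int submitted;  /* commands the device has been told about */
};

/* Toy batching queuecommand: notify the device only on the last command. */
static void queuecommand(struct batch_driver *d, unsigned int flags)
{
    d->pending++;
    if (flags & CMD_LAST) {
        d->submitted += d->pending;
        d->pending = 0;
    }
}

/* Send a single error-handling command; true if the device ever sees it. */
static bool eh_cmd_reaches_device(bool set_last)
{
    struct batch_driver d = { 0, 0 };

    queuecommand(&d, set_last ? CMD_LAST : 0u);
    return d.submitted == 1;
}
```

Without the flag the request sense command sits in the batch until it times out, which matches the log excerpt above.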

Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 066c5b46b6eaf2f13f80c19500dbb3b84baabb33)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2024-01-04 17:08:04 -05:00
Tomas Henzl 2bac835539 scsi: core: Allow libata to complete successful commands via EH
JIRA: https://issues.redhat.com/browse/RHEL-10941

In SCSI, we get the sense data as part of the completion, for ATA however,
we need to fetch the sense data as an extra step. For an aborted ATA
command the sense data is fetched via libata's ->eh_strategy_handler().

For Command Duration Limits policy 0xD:

  The device shall complete the command without error with the additional
  sense code set to DATA CURRENTLY UNAVAILABLE.

In order to handle this policy in libata, we intend to send a successful
command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the
sense data for the good command. This is similar to how we handle an
aborted ATA command, just that we need to read the Successful NCQ Commands
log instead of the NCQ Command Error log.

When we get a SATA completion with successful commands, ATA_SENSE will be
set, indicating that some commands in the completion have sense data.

The sense_valid bitmask in the Sense Data for Successful NCQ Commands log
will indicate exactly which commands had sense data, which might be a
subset of all the commands that were completed in the same completion. (Yet
all will have ATA_SENSE set, since the status is per completion.)

The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" sense
data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will not set
the scmd->result to DID_TIME_OUT for these commands. However, the
successful commands that did not have sense data, must not get their result
marked as DID_TIME_OUT by SCSI EH.

Add a new flag SCMD_FORCE_EH_SUCCESS, which tells SCSI EH to not mark a
command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD.

This will be used by libata in a subsequent commit.
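
The decision the flag influences can be sketched as a pure function. The bit position of the host byte and the DID_TIME_OUT value are assumed to follow the usual SCSI result layout, and the flag value is a placeholder.

```c
#include <assert.h>

#define DID_TIME_OUT_MODEL   0x03u  /* assumed host byte value */
#define CMD_FORCE_EH_SUCCESS 0x1u   /* stand-in for SCMD_FORCE_EH_SUCCESS */

/* Toy version of the "finish" branch when flushing the EH done queue:
 * a command with an all-zero result is normally marked as timed out,
 * unless the flag says the success is genuine. */
static unsigned int finish_result(unsigned int result, unsigned int flags)
{
    if (result == 0 && !(flags & CMD_FORCE_EH_SUCCESS))
        result |= DID_TIME_OUT_MODEL << 16;
    return result;
}
```

A result that already carries a non-zero byte (for example a midlayer status set for DATA CURRENTLY UNAVAILABLE) passes through unchanged, so only plain successful commands need the flag.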

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-5-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3d848ca1ebc8d8864f25bd461914c93eff82a2d2)
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
2023-11-07 01:01:19 +01:00
Ewan D. Milne f579e72550 scsi: core: scsi_error: Do not queue pointless abort workqueue functions
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

If a host template doesn't implement .eh_abort_handler(), there is no
point in queueing the abort workqueue function; all it does is invoke
SCSI EH anyway.  So return 'FAILED' from scsi_abort_command() if
.eh_abort_handler() is not implemented, which saves us from having to wait
for the abort workqueue function to complete.
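
A sketch of the early return, using a hypothetical template struct with a single optional callback and a counter in place of the real workqueue.

```c
#include <assert.h>
#include <stddef.h>

enum { EH_SUCCESS, EH_FAILED };   /* stand-ins for SUCCESS / FAILED */

struct host_template_model {
    int (*eh_abort_handler)(void *cmd);
};

static int abort_work_queued;     /* counts scheduled abort work items */

/* Toy scsi_abort_command(): check for the handler before queueing work. */
static int abort_command(const struct host_template_model *t)
{
    if (!t->eh_abort_handler)
        return EH_FAILED;         /* nothing to run: escalate right away */
    abort_work_queued++;          /* stands in for queueing the abort work */
    return EH_SUCCESS;
}

static int dummy_abort(void *cmd)
{
    (void)cmd;
    return EH_SUCCESS;
}

static const struct host_template_model with_handler = { dummy_abort };
static const struct host_template_model without_handler = { NULL };
```

Placing the check first means a driver without the callback never pays for a work item whose only effect would be to fall through to the error handler.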

Cc: Niklas Cassel <niklas.cassel@wdc.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
[niklas: moved the check to the top of scsi_abort_command()]
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20221206131346.2045375-1-niklas.cassel@wdc.com
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d0b9025540ef57cc4464ab2fc64ed8ddc49b5658)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:44 -04:00
Ewan D. Milne d9090f85f4 scsi: core: Increase scsi_device's iodone_cnt in scsi_timeout()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

If a SCSI command times out and is going to be aborted, we should increase
the iodone_cnt of the related scsi_device. Otherwise the iodone_cnt would
be smaller than iorequest_cnt.

Increasing iodone_cnt in scsi_timeout() would not cause a double accounting
issue. Brief analysis follows:

 - We add the iodone_cnt when BLK_EH_DONE is returned in
   scsi_timeout(). The related command's timeout event would not happen.

 - If the abort succeeds and the command is not retried, the command would
   be completed with scsi_finish_command() which would not increase
   iodone_cnt.

 - If the abort succeeds and the command is retried, it would be requeued.
   scsi_dispatch_cmd() would then be called and iorequest_cnt would be
   increased again.

 - If the abort fails, the error handler successfully recovers the device,
   and the command is not retried, the command would be completed with
   scsi_finish_command() which would not increase iodone_cnt.

 - If the abort fails, the error handler successfully recovers the device,
   and the command is retried, the iorequest_cnt would be increased again.
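
The accounting argument above can be checked with a toy pair of counters. The event functions are hypothetical names for the steps in the analysis; only the counter names come from the commit text.

```c
#include <assert.h>
#include <stdbool.h>

struct sdev_model {
    int iorequest_cnt;
    int iodone_cnt;
};

static void dispatch(struct sdev_model *s)    { s->iorequest_cnt++; }
static void normal_done(struct sdev_model *s) { s->iodone_cnt++; }
/* Added by this commit: a timed-out command counts as done. */
static void timed_out(struct sdev_model *s)   { s->iodone_cnt++; }
/* Finishing a command after abort or recovery does no accounting. */
static void finish(struct sdev_model *s)      { (void)s; }

/* One command times out; it is then either retried or finished.
 * Returns iorequest_cnt - iodone_cnt once everything has settled. */
static int imbalance_after_timeout(bool retried)
{
    struct sdev_model s = { 0, 0 };

    dispatch(&s);
    timed_out(&s);
    if (retried) {
        dispatch(&s);      /* requeued and dispatched again */
        normal_done(&s);   /* the retry completes normally */
    } else {
        finish(&s);
    }
    return s.iorequest_cnt - s.iodone_cnt;
}
```

Both paths end with the two counters equal, which is the "no double accounting" property the analysis claims.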

Signed-off-by: Wenchao Hao <haowenchao@huawei.com>
Link: https://lore.kernel.org/r/20221123122137.150776-2-haowenchao@huawei.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ec9780e48c77f469c339b53940ef0c5eacc8b9d2)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:41 -04:00
Ewan D. Milne 427448c2f4 scsi: core: Change the return type of .eh_timed_out()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline
Conflicts: This upstream change is incompatible with previous return values
           from the scsi_host_template.eh_timed_out() function, which is used
	   by out-of-box drivers.  For RHEL, change the ordinal values of
	   SCSI_EH_xxx to be compatible.  The new SCSI_EH_DONE value is
	   currently only used by UFS.

Commit 6600593cbd ("block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE")
made it impossible for .eh_timed_out() implementations to call
scsi_done() without causing a crash.

Restore support for SCSI timeout handlers to call scsi_done() as follows:

 * Change all .eh_timed_out() handlers as follows:

   - Change the return type into enum scsi_timeout_action.
   - Change BLK_EH_RESET_TIMER into SCSI_EH_RESET_TIMER.
   - Change BLK_EH_DONE into SCSI_EH_NOT_HANDLED.

 * In scsi_timeout(), convert the SCSI_EH_* values into BLK_EH_* values.
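
A sketch of the conversion step. The enumerator names are taken from the text above; their ordinal values are irrelevant here (and differ in RHEL, per the Conflicts note), and the outcome struct is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum scsi_timeout_action {
    SCSI_EH_DONE,
    SCSI_EH_RESET_TIMER,
    SCSI_EH_NOT_HANDLED,
};

enum blk_eh_timer_return { BLK_EH_DONE, BLK_EH_RESET_TIMER };

struct timeout_outcome {
    enum blk_eh_timer_return blk;  /* value reported to the block layer */
    bool midlayer_recovery;        /* midlayer must abort/recover itself */
};

/* Toy scsi_timeout(): translate the driver's answer for the block layer. */
static struct timeout_outcome convert(enum scsi_timeout_action action)
{
    struct timeout_outcome out = { BLK_EH_DONE, false };

    switch (action) {
    case SCSI_EH_DONE:            /* driver completed the command itself */
        break;
    case SCSI_EH_RESET_TIMER:     /* driver wants more time */
        out.blk = BLK_EH_RESET_TIMER;
        break;
    case SCSI_EH_NOT_HANDLED:     /* fall back to midlayer handling */
        out.midlayer_recovery = true;
        break;
    }
    return out;
}
```

Keeping a SCSI-specific enum lets a handler say "I already called scsi_done()" separately from "not handled", a distinction the block layer values alone cannot express.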

Reviewed-by: Lee Duncan <lduncan@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-3-bvanassche@acm.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit dee7121e8c0a3ce41af2b02d516f54eaec32abcd)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:39 -04:00
Ewan D. Milne 4e4f37780a scsi: core: Fix a race between scsi_done() and scsi_timeout()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

If there is a race between scsi_done() and scsi_timeout() and if
scsi_timeout() loses the race, scsi_timeout() should not reset the request
timer. Hence change the return value for this case from BLK_EH_RESET_TIMER
into BLK_EH_DONE.

Although the block layer holds a reference on a request (req->ref) while
calling a timeout handler, restarting the timer (blk_add_timer()) while a
request is being completed is racy.
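
The race resolution can be sketched with a C11 atomic flag as the "already completed" marker. The struct and function names are hypothetical; the point is only that whichever side sets the flag second backs off without touching the timer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum blk_eh_timer_return { BLK_EH_DONE, BLK_EH_RESET_TIMER };

struct cmd_model {
    atomic_flag complete;  /* hypothetical "already completed" marker */
    bool escalated;        /* timeout path took ownership of the command */
};

/* Completion side: returns true if it won the race. */
static bool model_done(struct cmd_model *c)
{
    return !atomic_flag_test_and_set(&c->complete);
}

/* Timeout side, after the fix described above. */
static enum blk_eh_timer_return model_timeout(struct cmd_model *c)
{
    if (atomic_flag_test_and_set(&c->complete))
        return BLK_EH_DONE;    /* lost the race: do not re-arm the timer */
    c->escalated = true;       /* won the race: abort / error handling */
    return BLK_EH_DONE;
}

/* Run both sides in the given order; report whether the timeout escalated. */
static bool timeout_escalates(bool done_first)
{
    struct cmd_model c = { ATOMIC_FLAG_INIT, false };

    if (done_first)
        model_done(&c);
    model_timeout(&c);
    return c.escalated;
}
```

Neither branch returns BLK_EH_RESET_TIMER in this model, so a request that is concurrently completing never has its timer restarted.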

Reviewed-by: Mike Christie <michael.christie@oracle.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 15f73f5b3e ("blk-mq: move failure injection out of blk_mq_complete_request")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20221018202958.1902564-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 978b7922d3dca672b41bb4b8ce6c06ab77112741)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-12 13:20:39 -04:00
Ewan D. Milne f2cd346628 scsi: core: Add I/O timeout count for SCSI device
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

Currently struct scsi_device maintains counters for requests, completions,
and errors but is missing a counter for timeouts.

For better tracking of timeouts, add a suitable counter.

Link: https://lore.kernel.org/r/1663666339-17560-1-git-send-email-wubo40@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 48517eefb20ec2d6595ebd77ae11f34b3540cd78)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-11 12:49:52 -04:00
Ewan D. Milne c5eed3c9b0 scsi: core: Convert scsi_decide_disposition() to use SCSIML_STAT
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

Don't use:

 - DID_TARGET_FAILURE

 - DID_NEXUS_FAILURE

 - DID_ALLOC_FAILURE

 - DID_MEDIUM_ERROR

Instead use the SCSI midlayer internal values.

Link: https://lore.kernel.org/r/20220812010027.8251-10-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7dfaae6ac1b0267d5a064970ba794a916c33b823)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-11 12:49:51 -04:00
Ewan D. Milne ee643e7d3e scsi: core: Add error codes for internal SCSI midlayer use
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2171093
Upstream Status: From upstream linux mainline

If a driver returns:

 - DID_TARGET_FAILURE

 - DID_NEXUS_FAILURE

 - DID_ALLOC_FAILURE

 - DID_MEDIUM_ERROR

we hit a couple bugs:

1. The SCSI error handler runs because scsi_decide_disposition() has no
case statements for them and we return FAILED.

2. For SG IO the userspace app gets a success status instead of failed,
because scsi_result_to_blk_status() clears those errors.

This patch adds a new internal error code byte for use by the SCSI
midlayer.  This will be used instead of the above error codes, so we no
longer have to clear the host code in scsi_result_to_blk_status() and
drivers cannot accidentally use them.

A subsequent commit will then remove the internal users of the above codes
and convert us to use the new ones.
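
One way to picture a separate midlayer byte is as its own field in the 32-bit result word. The layout below (status in bits 0-7, midlayer byte in bits 8-15, host byte in bits 16-23) is an assumption for this sketch, as are the helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the result word, lowest byte first:
 * status byte, midlayer-internal byte, host byte. */
static uint8_t status_byte(uint32_t result) { return result & 0xff; }
static uint8_t ml_byte(uint32_t result)     { return (result >> 8) & 0xff; }
static uint8_t host_byte(uint32_t result)   { return (result >> 16) & 0xff; }

/* Replace only the midlayer byte, leaving driver-visible bytes alone. */
static uint32_t set_ml_byte(uint32_t result, uint8_t ml)
{
    return (result & ~UINT32_C(0xff00)) | ((uint32_t)ml << 8);
}

/* Replace only the host byte, as a driver would. */
static uint32_t set_host_byte(uint32_t result, uint8_t host)
{
    return (result & ~UINT32_C(0xff0000)) | ((uint32_t)host << 16);
}
```

Because drivers only write the status and host bytes in this model, nothing a driver returns can be mistaken for a midlayer-internal code.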

Link: https://lore.kernel.org/r/20220812010027.8251-9-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 36ebf1e2aa148bdcf03c413bddfc605c54b57669)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2023-04-11 12:49:51 -04:00
Frantisek Hrbata d9819eb3e5 Merge: block: update with v6.1-rc2
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/1517

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144

Signed-off-by: Ming Lei <ming.lei@redhat.com>

Approved-by: Rafael Aquini <aquini@redhat.com>
Approved-by: Nigel Croxon <ncroxon@redhat.com>
Approved-by: Jeff Moyer <jmoyer@redhat.com>

Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
2022-11-03 13:30:02 -04:00
Ewan D. Milne 7cbabe2912 scsi: core: Shorten long warning messages
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2132461
Upstream Status: From upstream linux mainline

sdev_printk() will only accept messages up to 128 bytes.

Shorten strings exceeding 128 bytes to avoid printing an incomplete sentence
like:

  [  475.156955] sd 9:0:0:0: Warning! Received an indication that the LUN assignments on this target have changed. The Linux SCSI layer does not automatical
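
The truncation can be reproduced in userspace, assuming the limit behaves like formatting into a fixed 128-byte buffer. The buffer size comes from the text above; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LOG_BUF_SIZE 128

/* Format a message into a fixed buffer and report how much survived. */
static size_t logged_len(const char *msg)
{
    char buf[LOG_BUF_SIZE];

    /* snprintf() writes at most LOG_BUF_SIZE - 1 characters plus a NUL. */
    snprintf(buf, sizeof(buf), "%s", msg);
    return strlen(buf);
}
```

Any message longer than 127 characters is silently cut off, which is why the long warning ends mid-word in the example output.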

Link: https://lore.kernel.org/r/20220630024516.1571209-1-lizhijian@fujitsu.com
Suggested-by: Finn Thain <fthain@linux-m68k.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a2417db3679cffa67fbdc6c175cf68ffc86b8ac3)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-10-25 13:58:25 -04:00
Ewan D. Milne 7f2683766f scsi: core: Remove unreachable code warning
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2132461
Upstream Status: From upstream linux mainline

The smatch tool reported the following warning:
drivers/scsi/scsi_error.c:1988 scsi_decide_disposition() warn: ignoring
unreachable code.

Remove the "default: return FAILED;" instead of the "return FAILED;" reported by
smatch, because compilers can provide more useful diagnostics about
switch/case statements that do not have a default statement, especially if
the "switch" applies to a value with enumeration type.
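
The pattern can be sketched with hypothetical enums: a switch over an enumeration with no default label, followed by a single fallback return. With GCC or Clang and -Wall, omitting a case for one enumerator then triggers a -Wswitch warning, which a default label would hide.

```c
#include <assert.h>

enum host_code_model { HC_OK, HC_RESET, HC_ERROR };
enum disposition_model { DISP_SUCCESS, DISP_NEEDS_RETRY, DISP_FAILED };

/* No "default:" label: every enumerator is listed explicitly, so the
 * compiler can flag any enumerator added later but not handled here. */
static enum disposition_model decide(enum host_code_model code)
{
    switch (code) {
    case HC_OK:
        return DISP_SUCCESS;
    case HC_RESET:
        return DISP_NEEDS_RETRY;
    case HC_ERROR:
        break;
    }
    return DISP_FAILED;   /* reachable fallback, replaces "default:" */
}
```

The trailing return is now the only fallback path, so it is reachable and the unreachable-code warning goes away.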

Link: https://lore.kernel.org/r/20220301080448.112813-1-yang.lee@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e1b353e7a31dcaf47c234812c46a2db9cd5be584)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-10-24 16:12:57 -04:00
Ming Lei 26dca21eb1 block: change request end_io handler to pass back a return value
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2131144
Conflicts: drop change in drivers/ufs/core/ufshpb.c; drop change
on nvme since rhel9 doesn't backport nvme uring pt command yet

commit de671d6116b5210097cd6fbb877bac92536f265b
Author: Jens Axboe <axboe@kernel.dk>
Date:   Wed Sep 21 15:19:54 2022 -0600

    block: change request end_io handler to pass back a return value

    Everything is just converted to returning RQ_END_IO_NONE, and there
    should be no functional changes with this patch.

    In preparation for allowing the end_io handler to pass ownership back
    to the block layer, rather than retain ownership of the request.

    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-23 20:50:08 +08:00
Ming Lei 7ee4df9ee8 scsi/core: Change the return type of scsi_noretry_cmd() into bool
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit 88b32c3cdf5fff7ed5bdaec7493428185cc65b6e
Author: Bart Van Assche <bvanassche@acm.org>
Date:   Thu Jul 14 11:07:06 2022 -0700

    scsi/core: Change the return type of scsi_noretry_cmd() into bool

    This patch prepares for introducing the new blk_opf_t type in the SCSI core.
    Since the value returned by scsi_noretry_cmd() is only used in boolean
    expressions, this patch does not change any functionality.

    Cc: Martin K. Petersen <martin.petersen@oracle.com>
    Cc: Christoph Hellwig <hch@lst.de>
    Cc: Ming Lei <ming.lei@redhat.com>
    Cc: Hannes Reinecke <hare@suse.de>
    Cc: John Garry <john.garry@huawei.com>
    Cc: Mike Christie <michael.christie@oracle.com>
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20220714180729.1065367-41-bvanassche@acm.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:21 +08:00
Ming Lei 681c4122e9 blk-mq: Drop blk_mq_ops.timeout 'reserved' arg
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit 9bdb4833dd399cbff82cc20893f52bdec66a9eca
Author: John Garry <john.garry@huawei.com>
Date:   Wed Jul 6 20:03:51 2022 +0800

    blk-mq: Drop blk_mq_ops.timeout 'reserved' arg

    With new API blk_mq_is_reserved_rq() we can tell if a request is from
    the reserved pool, so stop passing 'reserved' arg. There is actually
    only a single user of that arg for all the callback implementations, which
    can use blk_mq_is_reserved_rq() instead.

    This will also allow us to stop passing the same 'reserved' around the
    blk-mq iter functions next.

    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:16 +08:00
Ming Lei 7c09a1cb5b scsi: core: Remove reserved request time-out handling
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2118511

commit deef1be18e3fc62ddf04fb3e5e8ff6a301693dcc
Author: John Garry <john.garry@huawei.com>
Date:   Wed Jul 6 20:03:49 2022 +0800

    scsi: core: Remove reserved request time-out handling

    The SCSI core code does not currently support reserved commands. As such,
    requests which time-out would never be reserved, and scsi_timeout()
    'reserved' arg should never be set.

    Remove handling for reserved requests, drop the wrapper scsi_timeout()
    as it now just calls scsi_times_out() always, and finally rename
    scsi_times_out() -> scsi_timeout() to match the blk_mq_ops method name.

    Signed-off-by: John Garry <john.garry@huawei.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/1657109034-206040-2-git-send-email-john.garry@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-10-12 09:20:16 +08:00
Ewan D. Milne 08529f2309 scsi: restore setting of scmd->scsi_done() in EH and reset ioctl paths
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2120469
Upstream Status: RHEL-only

Since not all SCSI drivers have been updated to call scsi_done() directly,
restore the setting of scmd->scsi_done() in both the EH and reset ioctl
command submission paths to avoid a null pointer dereference panic.

For some reason, upstream commit bf23e619039d ("scsi: core: Use a structure
member to track the SCSI command submitter") removed the setting of
scmd->scsi_done() prior to all the drivers being updated.  This should
have instead been done in the later patch in the series 11b68e36b167
("scsi: core: Call scsi_done directly") after all the driver changes.

Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-09-14 14:22:54 -04:00
Patrick Talbert 61e00f3f38 Merge: Additional SCSI updates for 9.1
MR: https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/973

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094105
Upstream Status: From upstream linux mainline
Tested: Tested by QE regression testing for error handling paths
Signed-off-by: Ewan D. Milne <emilne@redhat.com>

Additional SCSI updates beyond upstream 5.17 have been requested
by kernel-rt:

scsi: usb: storage: Complete the SCSI request directly
scsi: usb: Call scsi_done() directly

These changes depend on the scsi_done_direct() functionality,
which in turn depends on some refactoring of the scsi_done() mechanism
and requires some earlier patches in order to apply cleanly:

scsi: core: Add scsi_done_direct() for immediate completion
scsi: core: Rename scsi_mq_done() into scsi_done() and export it
scsi: core: Use a structure member to track the SCSI command submitter

Note that the upstream series to refactor scsi_done() touched *all* drivers
to have them invoke the now exported scsi_done() symbol for command
completion, instead of calling directly through scsi_cmnd->scsi_done().
As it is undesirable to make these changes, particularly at this stage
in the development cycle, due to the amount of testing required, the
changes for this BZ will not include the removal of scsi_cmnd->scsi_done()
and will allow drivers to use either the old indirect or the new direct
mechanism, to provide more time to allow all the drivers to be updated.

The usb changes will be incorporated into a subsequent usb subsystem update.

Approved-by: Maurizio Lombardi <mlombard@redhat.com>
Approved-by: Tomas Henzl <thenzl@redhat.com>
Approved-by: Chris Leech <cleech@redhat.com>
Approved-by: Torez Smith <torez@redhat.com>

Signed-off-by: Patrick Talbert <ptalbert@redhat.com>
2022-07-21 10:40:40 +02:00
Ming Lei d64fe46972 blk-mq: remove the done argument to blk_execute_rq_nowait
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2083917
Conflicts: drop changes on uhs and nvme io_uring passthrough

commit e2e530867245d051dc7800b0d07193b3e581f5b9
Author: Christoph Hellwig <hch@lst.de>
Date:   Tue May 24 14:15:30 2022 +0200

    blk-mq: remove the done argument to blk_execute_rq_nowait

    Let the caller set it together with the end_io_data instead of passing
    a pointless argument.  Note that the target code did in fact already
    set it and then just overrode it again by calling blk_execute_rq_nowait.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Keith Busch <kbusch@kernel.org>
    Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-06-22 08:58:09 +08:00
Ewan D. Milne 44b40ff5dc scsi: core: Use a structure member to track the SCSI command submitter
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2094105
Upstream Status: From upstream linux mainline
Conflicts: KABI changes for addition to struct scsi_cmnd

Conditional statements are faster than indirect calls. Use a structure
member to track the SCSI command submitter such that later patches can call
scsi_done(scmd) instead of scmd->scsi_done(scmd).

The asymmetric behavior that scsi_send_eh_cmnd() sets the submission
context to the SCSI error handler and that it does not restore the
submission context to the SCSI core is retained.
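
The idea of replacing a per-command function pointer with a small tag can be sketched like this. All names are hypothetical stand-ins, not the fields or functions added by the patch; the sketch only shows a single completion entry point branching on who submitted the command.

```c
/* Hypothetical submitter tag stored in each command. */
enum demo_submitter {
	DEMO_BY_BLOCK_LAYER,
	DEMO_BY_ERROR_HANDLER,
	DEMO_BY_RESET_IOCTL,
};

struct demo_cmnd {
	enum demo_submitter submitter;
	int completed_by;	/* records which path ran, for the demo only */
};

static void demo_complete_normally(struct demo_cmnd *c)
{
	c->completed_by = 1;
}

static void demo_eh_done(struct demo_cmnd *c)
{
	c->completed_by = 2;
}

/* One exported entry point; a conditional replaces the indirect call. */
void demo_done(struct demo_cmnd *c)
{
	switch (c->submitter) {
	case DEMO_BY_BLOCK_LAYER:
		demo_complete_normally(c);
		return;
	case DEMO_BY_ERROR_HANDLER:
		demo_eh_done(c);
		return;
	case DEMO_BY_RESET_IOCTL:
		/* Nothing to do in this sketch. */
		return;
	}
}
```

A driver would then call demo_done(cmd) directly instead of cmd->done(cmd), so the compiler sees a direct call and a predictable branch.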

Link: https://lore.kernel.org/r/20211007202923.2174984-2-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bf23e619039d360d503b7282d030daf2277a5d47)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-06-06 17:22:49 -04:00
Ewan D. Milne cce2fcde29 scsi: core: sd: Add silence_suspend flag to suppress some PM messages
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071832
Upstream Status: From upstream linux mainline

Kernel messages produced during runtime PM can cause a never-ending cycle
because user space utilities (e.g. journald or rsyslog) write the messages
back to storage, causing runtime resume, more messages, and so on.

Messages that tell of things that are expected to happen are arguably
unnecessary, so add a flag to suppress them. This flag is used by the UFS
driver.

Link: https://lore.kernel.org/r/20220228113652.970857-2-adrian.hunter@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit af4edb1d50c6d1044cb34bc43621411b7ba2cffe)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-04-27 18:55:08 -04:00
Ewan D. Milne 6cc9603e63 scsi: core: Use eh_timeout for START STOP UNIT
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071832
Upstream Status: From upstream linux mainline

In some scenarios START STOP UNIT may time out. The default recovery
time of 30 seconds is relatively large. Modifying rq_timeout to adjust
the START STOP UNIT timeout value will affect the regular I/O.

Commit 9728c0814e ("[SCSI] make scsi_eh_try_stu use block timeout")
switched to rq_timeout for the START STOP UNIT command. However commit
0816c9251a ("[SCSI] Allow error handling timeout to be specified")
introduced an explicit eh_timeout parameter. It makes more sense to
use this value as the timeout for START STOP UNIT.

Link: https://lore.kernel.org/r/1636507412-21678-1-git-send-email-brookxu.cn@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Wu Bo <wubo40@huawei.com>
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit adcc796b4f55c18ee5fca8190a592c84cf8682e0)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-04-27 18:55:03 -04:00
Ewan D. Milne 60d1a2a24f scsi: core: Simplify control flow in scmd_eh_abort_handler()
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071832
Upstream Status: From upstream linux mainline

Simplify the nested conditionals in the function by using a label for the
error path.  Introduce local "shost" to avoid repeated "sdev->shost" usage.
Also remove scsi_eh_complete_abort() since there is now only one place it
would be called.

Link: https://lore.kernel.org/r/20211029194311.17504-3-emilne@redhat.com
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 54d816d3d36293728ffc8488fae14b002d4b4a64)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-04-27 18:55:02 -04:00
Ewan D. Milne f4fbde3853 scsi: core: Avoid leaving shost->last_reset with stale value if EH does not run
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071832
Upstream Status: From upstream linux mainline

The changes to issue the abort from the scmd->abort_work instead of the EH
thread introduced a problem if eh_deadline is used.  If aborting the
command(s) is successful, and there are never any scmds added to the
shost->eh_cmd_q, there is no code path which will reset the ->last_reset
value back to zero.

The effect of this is that after a successful abort with no EH thread
activity, a subsequent timeout, perhaps a long time later, might
immediately be considered past a user-set eh_deadline time, and the host
will be reset with no attempt at recovery.

Fix this by resetting ->last_reset back to zero in scmd_eh_abort_handler()
if it is determined that the EH thread will not run to do this.
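
The failure mode and the fix can be modelled in a few lines. This is a simplified sketch under stated assumptions: plain tick counters stand in for jiffies (no wrap-around handling), and the struct and function names are hypothetical rather than the kernel's.

```c
#include <stdbool.h>

struct demo_host {
	unsigned long last_reset;	/* 0 means no recovery in progress */
	long eh_deadline;		/* -1 means no deadline configured */
};

/* True when recovery has run longer than the user-set deadline. */
bool demo_past_deadline(const struct demo_host *h, unsigned long now)
{
	if (h->last_reset == 0 || h->eh_deadline == -1)
		return false;
	return now >= h->last_reset + (unsigned long)h->eh_deadline;
}

/* Timeout path: stamp the start of recovery only if not already set. */
void demo_on_timeout(struct demo_host *h, unsigned long now)
{
	if (h->last_reset == 0 && h->eh_deadline != -1)
		h->last_reset = now;
}

/* The fix: clear the stamp when the abort worked and EH will not run. */
void demo_abort_succeeded_no_eh(struct demo_host *h)
{
	h->last_reset = 0;
}
```

Without the clearing step, a second timeout long after the first keeps the old stamp and is immediately judged past the deadline; with it, the second timeout starts a fresh window.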

Thanks to Gopinath Marappan for investigating this problem.

Link: https://lore.kernel.org/r/20211029194311.17504-2-emilne@redhat.com
Fixes: e494f6a728 ("[SCSI] improved eh timeout handler")
Cc: stable@vger.kernel.org
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5ae17501bc62a49b0b193dcce003f16375f16654)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-04-27 18:55:02 -04:00
Ewan D. Milne a652e613d0 scsi: core: Use scsi_cmd_to_rq() instead of scsi_cmnd.request
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2071832
Upstream Status: From upstream linux mainline
Conflicts: Merge differences due to out-of-order commits 21fd056f65f0
           ("block: remove the ->rq_disk field in struct request")
	   and e1c83c8f82 ("scsi: core: scsi_logging: Fix a BUG")

Prepare for removal of the request pointer by using scsi_cmd_to_rq()
instead. Cast away constness where necessary when passing a SCSI command
pointer to scsi_cmd_to_rq(). This patch does not change any functionality.

Link: https://lore.kernel.org/r/20210809230355.8186-3-bvanassche@acm.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit aa8e25e5006aac52c943c84e9056ab488630ee19)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
2022-04-27 18:54:51 -04:00
Ming Lei df922b5e32 block: remove the gendisk argument to blk_execute_rq
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066297
Conflicts: drop change on ufshpb since rhel9 doesn't support it; apply
 change on fs/nfsd/blocklayout.c

commit b84ba30b6c7a75babdf73b83bc3c7b59b944501a
Author: Christoph Hellwig <hch@lst.de>
Date:   Fri Nov 26 13:18:01 2021 +0100

    block: remove the gendisk argument to blk_execute_rq

    Remove the gendisk argument to blk_execute_rq and blk_execute_rq_nowait
    given that it is unused now.  Also convert the boolean at_head parameter
    to actually use the bool type while touching the prototype.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-04-11 11:44:25 +08:00
Ming Lei 91f6e9fde8 block: remove blk_{get,put}_request
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066297
Conflicts: drop change on scsi/ufs/ufshpb.c which isn't included in
rhel9

commit 0bf6d96cb8294094ce1e44cbe8cf65b0899d0a3a
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Oct 25 09:05:07 2021 +0200

    block: remove blk_{get,put}_request

    These are now pointless wrappers around blk_mq_{alloc,free}_request,
    so remove them.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
    Link: https://lore.kernel.org/r/20211025070517.1548584-3-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-04-11 11:44:15 +08:00
Ming Lei 5dd7fe91bc scsi: add a scsi_alloc_request helper
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2066297

commit 68ec3b819a5d600a4ede8b596761dccac9f39ebc
Author: Christoph Hellwig <hch@lst.de>
Date:   Thu Oct 21 08:06:05 2021 +0200

    scsi: add a scsi_alloc_request helper

    Add a new helper that calls blk_get_request and initializes the
    scsi_request to avoid the indirect call through ->initialize_rq_fn.

    Note that this makes the pktcdvd driver depend on the SCSI core, but
    given that only SCSI devices support SCSI passthrough requests that
    is not a functional change.

    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Link: https://lore.kernel.org/r/20211021060607.264371-6-hch@lst.de
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

Signed-off-by: Ming Lei <ming.lei@redhat.com>
2022-04-11 11:44:14 +08:00
Linus Torvalds a022f7d575 block-5.14-2021-07-08

Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block

Pull more block updates from Jens Axboe:
 "A combination of changes that ended up depending on both the driver
  and core branch (and/or the IDE removal), and a few late arriving
  fixes. In detail:

   - Fix io ticks wrap-around issue (Chunguang)

   - nvme-tcp sock locking fix (Maurizio)

   - s390-dasd fixes (Kees, Christoph)

   - blk_execute_rq polling support (Keith)

   - blk-cgroup RCU iteration fix (Yu)

   - nbd backend ID addition (Prasanna)

   - Partition deletion fix (Yufen)

   - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)

   - Removal of now dead block request types due to IDE removal
     (Christoph)

   - Loop probing and control device cleanups (Christoph)

   - Device uevent fix (Christoph)

   - Misc cleanups/fixes (Tetsuo, Christoph)"

* tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
  blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
  block: fix the problem of io_ticks becoming smaller
  nvme-tcp: can't set sk_user_data without write_lock
  loop: remove unused variable in loop_set_status()
  block: remove the bdgrab in blk_drop_partitions
  block: grab a device refcount in disk_uevent
  s390/dasd: Avoid field over-reading memcpy()
  dasd: unexport dasd_set_target_state
  block: check disk exist before trying to add partition
  ubd: remove dead code in ubd_setup_common
  nvme: use return value from blk_execute_rq()
  block: return errors from blk_execute_rq()
  nvme: use blk_execute_rq() for passthrough commands
  block: support polling through blk_execute_rq
  block: remove REQ_OP_SCSI_{IN,OUT}
  block: mark blk_mq_init_queue_data static
  loop: rewrite loop_exit using idr_for_each_entry
  loop: split loop_lookup
  loop: don't allow deleting an unspecified loop device
  loop: move loop_ctl_mutex locking into loop_add
  ...
2021-07-09 12:05:33 -07:00
Christoph Hellwig da6269da4c block: remove REQ_OP_SCSI_{IN,OUT}
With the legacy IDE driver gone drivers now use either REQ_OP_DRV_*
or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests
into a single one.

Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-06-30 15:34:19 -06:00
Hannes Reinecke 3d45cefc8e scsi: core: Drop obsolete Linux-specific SCSI status codes
Originally the SCSI subsystem used 'special' SCSI status codes, which were
the SAM-specified ones but shifted by 1.  As most drivers have now been
modified to use the SAM-specified ones, having two nearly identical sets of
definitions only causes confusion.

The Linux-specified SCSI status codes have been marked obsolete for several
years so drop them and use the SAM-specified status codes throughout.
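
The "shifted by 1" relationship can be illustrated with a small sketch. The DEMO_ names are hypothetical; the numeric values are the SAM status values as recalled, and the old-style extraction helper is an assumption about how the obsolete codes were derived from a result word.

```c
/* SAM-defined status values (recalled; treat as assumptions). */
#define DEMO_SAM_STAT_GOOD		0x00
#define DEMO_SAM_STAT_CHECK_CONDITION	0x02
#define DEMO_SAM_STAT_BUSY		0x08

/* The obsolete Linux-specific codes: the same values shifted right by 1. */
#define DEMO_OLD_CHECK_CONDITION	(DEMO_SAM_STAT_CHECK_CONDITION >> 1)
#define DEMO_OLD_BUSY			(DEMO_SAM_STAT_BUSY >> 1)

/* Old-style extraction: shift the status byte before comparing. */
int demo_old_status_byte(int result)
{
	return (result >> 1) & 0x7f;
}

/* SAM-style extraction: use the low byte of the result as-is. */
int demo_sam_status(int result)
{
	return result & 0xff;
}
```

Keeping only the SAM-style form means a status read off the wire can be compared without remembering which of the two encodings a given driver uses.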

Link: https://lore.kernel.org/r/20210427083046.31620-41-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-31 23:59:18 -04:00
Hannes Reinecke 54cf31d07a scsi: core: Drop message byte helper
The message byte is now unused, so we can drop the helper to set the
message byte and the check for message bytes during error recovery.

Link: https://lore.kernel.org/r/20210427083046.31620-38-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-31 22:48:24 -04:00
Hannes Reinecke 4bd51e54e1 scsi: core: Use DID_TIME_OUT instead of DRIVER_TIMEOUT
Set DID_TIME_OUT instead of DRIVER_TIMEOUT when a command
is finally marked as failed after error recovery.

Link: https://lore.kernel.org/r/20210427083046.31620-12-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-31 22:48:22 -04:00
Hannes Reinecke d0672a03e0 scsi: core: Introduce scsi_status_is_check_condition()
Add a helper function scsi_status_is_check_condition() to encapsulate the
frequent checks for SAM_STAT_CHECK_CONDITION.
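
A helper of this kind might look roughly like the sketch below. The name and the masking detail are assumptions for illustration (negative inputs are treated as errno-style errors, and bit 0 of the status byte is ignored); the kernel header remains the authority on the real definition.

```c
#include <stdbool.h>

#define DEMO_SAM_STAT_CHECK_CONDITION	0x02

/* Returns true when the status byte of a result is CHECK CONDITION. */
bool demo_status_is_check_condition(int status)
{
	if (status < 0)		/* negative values are errors, not SCSI status */
		return false;
	status &= 0xfe;		/* keep the status byte, ignoring bit 0 */
	return status == DEMO_SAM_STAT_CHECK_CONDITION;
}
```

Callers can then write a single readable predicate instead of repeating the mask-and-compare at each site.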

Link: https://lore.kernel.org/r/20210427083046.31620-9-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-31 22:48:21 -04:00
Bart Van Assche b8e162f9e7 scsi: core: Introduce enum scsi_disposition
Improve readability of the code in the SCSI core by introducing an
enumeration type for the values used internally that decide how to continue
processing a SCSI command. The eh_*_handler return values have not been
changed because that would involve modifying all SCSI drivers.

The output of the following command has been inspected to verify that no
out-of-range values are assigned to a variable of type enum
scsi_disposition:

KCFLAGS=-Wassign-enum make CC=clang W=1 drivers/scsi/

Link: https://lore.kernel.org/r/20210415220826.29438-6-bvanassche@acm.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-15 22:44:40 -04:00
Bart Van Assche 280e91b026 scsi: core: Modify the scsi_send_eh_cmnd() return value for the SDEV_BLOCK case
The comment above scsi_send_eh_cmnd() says: "Returns SUCCESS or FAILED or
NEEDS_RETRY". This patch makes all values returned by scsi_send_eh_cmnd()
match the documentation of this function. This change does not affect the
behavior of scsi_eh_tur() nor of scsi_eh_try_stu() nor of the
scsi_request_sense() callers.

See also commit bbe9fb0d04 ("scsi: Avoid that .queuecommand() gets called
for a blocked SCSI device"; v5.3).

Link: https://lore.kernel.org/r/20210415220826.29438-5-bvanassche@acm.org
Cc: Christoph Hellwig <hch@lst.de>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-15 22:44:40 -04:00
Linus Torvalds bdb39c9509 SCSI misc on 20210219
This series consists of the usual driver updates (ufs, ibmvfc,
 qla2xxx, hisi_sas, pm80xx) plus the removal of the gdth driver (which
 is bound to cause conflicts with a trivial change somewhere).  The
 only big major rework of note is the one from Hannes trying to clean
 up our result handling code in the drivers to make it consistent.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (ufs, ibmvfc,
  qla2xxx, hisi_sas, pm80xx) plus the removal of the gdth driver (which
  is bound to cause conflicts with a trivial change somewhere).

  The only big major rework of note is the one from Hannes trying to
  clean up our result handling code in the drivers to make it
  consistent"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (194 commits)
  scsi: MAINTAINERS: Adjust to reflect gdth scsi driver removal
  scsi: ufs: Give clk scaling min gear a value
  scsi: lpfc: Fix 'physical' typos
  scsi: megaraid_mbox: Fix spelling of 'allocated'
  scsi: qla2xxx: Simplify the calculation of variables
  scsi: message: fusion: Fix 'physical' typos
  scsi: target: core: Change ASCQ for residual write
  scsi: target: core: Signal WRITE residuals
  scsi: target: core: Set residuals for 4Kn devices
  scsi: hisi_sas: Add trace FIFO debugfs support
  scsi: hisi_sas: Flush workqueue in hisi_sas_v3_remove()
  scsi: hisi_sas: Enable debugfs support by default
  scsi: hisi_sas: Don't check .nr_hw_queues in hisi_sas_task_prep()
  scsi: hisi_sas: Remove deferred probe check in hisi_sas_v2_probe()
  scsi: lpfc: Add auto select on IRQ_POLL
  scsi: ncr53c8xx: Fix typos
  scsi: lpfc: Fix ancient double free
  scsi: qla2xxx: Fix some memory corruption
  scsi: qla2xxx: Remove redundant NULL check
  scsi: megaraid: Fix ifnullfree.cocci warnings
  ...
2021-02-22 10:24:58 -08:00
Guoqing Jiang 8eeed0b554 block: remove unnecessary argument from blk_execute_rq_nowait
The 'q' is not used since commit a1ce35fa49 ("block: remove dead
elevator code"), also update the comment of the function.

And more importantly it never really was needed to start with given
that we can trivially derive it from struct request.

Cc: target-devel@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ide@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24 21:52:39 -07:00
Muneendra Kumar 60bee27ba2 scsi: core: No retries on abort success
Add a new optional routine, eh_should_retry_cmd(), in scsi_host_template
that allows the transport to decide if a cmd is retryable. Return true if
the transport is in a state the cmd should be retried on.

Update scmd_eh_abort_handler() and scsi_eh_flush_done_q() to both call
scsi_eh_should_retry_cmd() to check whether the command needs to be
retried.
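
The optional-callback pattern described above can be sketched as follows. The struct layouts and the transport callback are hypothetical; the point is only that a missing callback defaults to "retry", while a transport that provides one gets the final say.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_cmnd;

struct demo_host_template {
	/* Optional: NULL means the transport expresses no opinion. */
	bool (*eh_should_retry_cmd)(struct demo_cmnd *cmd);
};

struct demo_cmnd {
	const struct demo_host_template *hostt;
	bool transport_marginal;	/* hypothetical transport state */
};

/* Core-side wrapper: ask the transport if it cares, else allow a retry. */
bool demo_eh_should_retry_cmd(struct demo_cmnd *cmd)
{
	if (cmd->hostt->eh_should_retry_cmd)
		return cmd->hostt->eh_should_retry_cmd(cmd);
	return true;
}

/* Example transport callback: do not retry over a marginal path. */
bool demo_transport_should_retry(struct demo_cmnd *cmd)
{
	return !cmd->transport_marginal;
}
```

Both the abort handler and the done-queue flush would consult the wrapper, so drivers that do not set the callback keep their existing retry behaviour.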

The above changes were based on a patch by Mike Christie.

Link: https://lore.kernel.org/r/1609969748-17684-3-git-send-email-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:55:17 -05:00
Muneendra Kumar 962c8dcdd5 scsi: core: Add a new error code DID_TRANSPORT_MARGINAL in scsi.h
Add code in scsi_result_to_blk_status to translate a new error
DID_TRANSPORT_MARGINAL to the corresponding blk_status_t i.e
BLK_STS_TRANSPORT.

Add DID_TRANSPORT_MARGINAL case to scsi_decide_disposition().
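
As a sketch of the translation step, assuming the host byte occupies bits 16 to 23 of the result word: the DEMO_ names are hypothetical, and the numeric value used for the new code is recalled rather than taken from this message, so treat it as an assumption.

```c
/* Stand-ins for blk_status_t values. */
enum demo_blk_status {
	DEMO_BLK_STS_OK,
	DEMO_BLK_STS_IOERR,
	DEMO_BLK_STS_TRANSPORT,
};

#define DEMO_DID_OK			0x00
#define DEMO_DID_TRANSPORT_MARGINAL	0x14	/* assumed value */

/* Extract the host byte from a SCSI result word. */
int demo_host_byte(int result)
{
	return (result >> 16) & 0xff;
}

/* Map the host byte to a block-layer status. */
enum demo_blk_status demo_result_to_blk_status(int result)
{
	switch (demo_host_byte(result)) {
	case DEMO_DID_OK:
		return DEMO_BLK_STS_OK;
	case DEMO_DID_TRANSPORT_MARGINAL:
		return DEMO_BLK_STS_TRANSPORT;
	default:
		return DEMO_BLK_STS_IOERR;
	}
}
```

Reporting a transport-class status rather than a generic I/O error lets upper layers such as multipath treat the failure as path-related.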

Link: https://lore.kernel.org/r/1609969748-17684-2-git-send-email-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-14 22:55:17 -05:00
Linus Torvalds 55e0500eb5 SCSI misc on 20201013
This series consists of the usual driver updates (ufs, qla2xxx, tcmu,
 ibmvfc, lpfc, smartpqi, hisi_sas, qedi, qedf, mpt3sas) and minor bug
 fixes.  There are only three core changes: adding sense codes,
 cleaning up noretry and adding an option for limitless retries.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi,
  hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes.

  There are only three core changes: adding sense codes, cleaning up
  noretry and adding an option for limitless retries"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits)
  scsi: hisi_sas: Recover PHY state according to the status before reset
  scsi: hisi_sas: Filter out new PHY up events during suspend
  scsi: hisi_sas: Add device link between SCSI devices and hisi_hba
  scsi: hisi_sas: Add check for methods _PS0 and _PR0
  scsi: hisi_sas: Add controller runtime PM support for v3 hw
  scsi: hisi_sas: Switch to new framework to support suspend and resume
  scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq()
  scsi: qedf: Remove redundant assignment to variable 'rc'
  scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store()
  scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro
  scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed
  scsi: sun_esp: Use module_platform_driver to simplify the code
  scsi: sun3x_esp: Use module_platform_driver to simplify the code
  scsi: sni_53c710: Use module_platform_driver to simplify the code
  scsi: qlogicpti: Use module_platform_driver to simplify the code
  scsi: mac_esp: Use module_platform_driver to simplify the code
  scsi: jazz_esp: Use module_platform_driver to simplify the code
  scsi: mvumi: Fix error return in mvumi_io_attach()
  scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req()
  scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs()
  ...
2020-10-14 15:15:35 -07:00
Mike Christie 2a242d59d6 scsi: core: Add limitless cmd retry support
Add infinite retry support to SCSI midlayer by combining common checks for
retries into some helper functions, and then checking for the
-1/SCSI_CMD_RETRIES_NO_LIMIT.
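
A retry-check helper with a "no limit" sentinel could be sketched as below. The struct and names are hypothetical; only the -1 sentinel comes from the message above.

```c
#include <stdbool.h>

#define DEMO_RETRIES_NO_LIMIT	(-1)

struct demo_cmnd {
	int retries;	/* retries consumed so far */
	int allowed;	/* retry budget, or DEMO_RETRIES_NO_LIMIT */
};

/* Consume one retry and report whether the command may be retried. */
bool demo_retry_allowed(struct demo_cmnd *cmd)
{
	if (cmd->allowed == DEMO_RETRIES_NO_LIMIT)
		return true;	/* sentinel: never count, never refuse */
	return ++cmd->retries <= cmd->allowed;
}
```

Centralizing the check in one helper means the sentinel is tested in a single place rather than at every site that previously compared retries against allowed.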

Link: https://lore.kernel.org/r/1601566554-26752-2-git-send-email-michael.christie@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-02 18:53:06 -04:00