JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Skip the reset settle delay during error handling since the scsi_debug
driver doesn't need this delay.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20241216184852.2626339-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 29081c21a7064cdbf29295d07f0e44776280918e)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Since commit 771f712ba5 ("scsi: scsi_debug: Fix cmd duration
calculation"), ns_from_boot value is only evaluated in schedule_resp()
for polled requests.
However, ns_from_boot is also required for hrtimer support for when
ndelay is less than INCLUSIVE_TIMING_MAX_NS, so fix up the logic to
decide when to evaluate ns_from_boot.
Fixes: 771f712ba5 ("scsi: scsi_debug: Fix cmd duration calculation")
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241202130045.2335194-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6918141d815acef056a0d10e966a027d869a922d)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
If the sg_copy_buffer() call returns less than sdebug_sector_size, then
we drop out of the copy loop. However, we still report that we copied
the full expected amount, which is not proper.
Fix by keeping a running total and return that value.
Fixes: 84f3a3c01d70 ("scsi: scsi_debug: Atomic write support")
Reported-by: Colin Ian King <colin.i.king@gmail.com>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20241018101655.4207-1-john.g.garry@oracle.com
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d28d17a845600dd9f7de241de9b1528a1b138716)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Conflicts: Merge differences due to out-of-order commit 84f3a3c01d70
("scsi: scsi_debug: Atomic write support")
Track per GROUP NUMBER how many write commands have been processed. Make
this information available in sysfs. Reset these statistics if any data
is written into the sysfs attribute.
Note: SCSI devices should only interpret the information in the GROUP
NUMBER field as a stream identifier if the ST_ENBLE bit has been set to
one. This patch follows a simpler approach: count the number of writes
per GROUP NUMBER whether or not the group number represents a stream
identifier.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-20-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit af180c0880f9df14be31807f0bb0fa6f0d34a943)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Implement the GET STREAM STATUS SCSI command. Report that the first
five stream indexes correspond to permanent streams.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-19-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ad620becda436fc02e50e5f6fe01de1d1f3794c9)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Implement an IO Advice Hints Grouping mode page with three permanent
streams. A permanent stream is a stream for which the device server does
not allow closing or otherwise modifying the configuration of that
stream. The stream identifier enable (ST_ENBLE) bit specifies whether
the stream identifier may be used in the GROUP NUMBER field of SCSI
WRITE commands.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-18-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f8ab2710177a762ce0f9b8426f9fc292394949df)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Make the MODE SENSE response buffer larger and allocate it from the heap.
This patch prepares for adding support for the IO Advice Hints Grouping
mode page.
Suggested-by: Douglas Gilbert <dgilbert@interlog.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-17-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b952eb270df38bc0d930a1ef965666ecf54a2097)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Move the subpage code checks into the switch statement to make it easier
to add support for new page code / subpage code combinations.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-16-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f19c3e4fe2542d7b145d294386666958c9fabe17)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
Instead of tracking whether or not the page code is valid in a boolean
variable, jump to error handling code if an unsupported page code is
encountered.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-15-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b2f860903fe9774f755a917edc674ba6e879fa55)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
>From SBC-5 r05:
"Reduced stream control:
a) reduces the maximum number of streams that the device server supports;
and
b) increases the number of write commands that are able to specify a stream
to be written in any write command that contains the GROUP NUMBER field
in its CDB.
If the RSCS bit (see 6.6.5) is set to one, then the device server shall:
a) support per group stream identifier usage as described in 4.32.2;
b) support the IO Advice Hints Grouping mode page (see 6.5.7); and
c) set the MAXIMUM NUMBER OF STREAMS field (see 6.6.5) to a value that is
less than 64.
Device servers that set the RSCS bit to one may support other features
(e.g., permanent streams (see 4.32.4)).
4.32.4 Permanent streams
A permanent stream is a stream for which the device server does not allow
closing or otherwise modifying the configuration of that stream. The PERM
bit (see 5.9.2.3) indicates whether a stream is a permanent stream. If a
STREAM CONTROL command (see 5.32) specifies the closing of a permanent
stream, the device server terminates that command with CHECK CONDITION
status instead of closing the specified stream. A permanent stream is always
an open stream. Device severs should assign the lowest numbered stream
identifiers to permanent streams."
Report that reduced stream control is supported.
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-14-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b1e5c0b34db8e7dac04af618e53c64e70c86aac8)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-86156
Upstream Status: From upstream linux mainline
All VPD pages have the page code in byte one. Reduce code duplication by
storing the VPD page code once.
Reviewed-by: Avri Altman <avri.altman@wdc.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20240130214911.1863909-13-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a5fe98eb8f630f3ad3d1d5c16374621e8c0cd702)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62151
Upstream Status: From upstream linux mainline
'arr' is kzalloc()'ed, so there is no need to call memset(.., 0, ...) on
it. It is already cleared.
This is a follow up of commit b952eb270df3 ("scsi: scsi_debug: Allocate the
MODE SENSE response from the heap").
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/6296722174e39a51cac74b7fc68b0d75bd0db2a3.1725690433.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit bba20b894e3c2e20f1ac914561b9ac241e0e359e)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-62151
Upstream Status: From upstream linux mainline
Target debugfs entry is removed via async_schedule() which isn't drained
when adding same name target, so failure of "Directory 'target11:0:0' with
parent 'scsi_debug' already present!" can be triggered easily.
Fix it by switching to domain async schedule, and draining it before
adding new target debugfs entry.
Cc: Wenchao Hao <haowenchao2@huawei.com>
Fixes: f084fe52c640 ("scsi: scsi_debug: Add debugfs interface to fail target reset")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Wenchao Hao <haowenchao22@gmail.com>
Link: https://lore.kernel.org/r/20240619013803.3008857-1-ming.lei@redhat.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b402a0dce64aa3e14a9bd15ab1dd87a93967f90c)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-56837
Conflicts: context difference because we don't backport patchset of
"Pass data lifetime information to SCSI disk devices"
commit 84f3a3c01d70efba736bc42155cf32722067b327
Author: John Garry <john.g.garry@oracle.com>
Date: Thu Jun 20 12:53:58 2024 +0000
scsi: scsi_debug: Atomic write support
Add initial support for atomic writes.
As is standard method, feed device properties via modules param, those
being:
- atomic_max_size_blks
- atomic_alignment_blks
- atomic_granularity_blks
- atomic_max_size_with_boundary_blks
- atomic_max_boundary_blks
These just match sbc4r22 section 6.6.4 - Block limits VPD page.
We just support ATOMIC WRITE (16).
The major change in the driver is how we lock the device for RW accesses.
Currently the driver uses a per-device lock for accessing device metadata
and "media" data (calls to do_device_access()) atomically for the duration
of the whole read/write command.
This should not suit verifying atomic writes. Reason being that currently
all reads/writes are atomic, so using atomic writes does not prove
anything.
Change device access model to basis that regular writes only atomic on a
per-sector basis, while reads and atomic writes are fully atomic.
As mentioned, since accessing metadata and device media is atomic,
continue to have regular writes involving metadata - like discard or PI -
as atomic. We can improve this later.
Currently we only support model where overlapping going reads or writes
wait for current access to complete before commencing an atomic write.
This is described in 4.29.3.2 section of the SBC. However, we simplify,
things and wait for all accesses to complete (when issuing an atomic
write).
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20240620125359.2684798-10-john.g.garry@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Now that the driver core can properly handle constant struct bus_type, move
the pseudo_lld_bus variable to be a constant structure as well, placing it
into read-only memory which can not be modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Link: https://lore.kernel.org/r/20240203-bus_cleanup-scsi-v1-3-6f552fb24f71@marliere.net
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ac0dd0f33adb804b8301ae415a91f56f97f40bae)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Smatch complains that "dentry" is never initialized. These days everyone
initializes all their stack variables to zero so this means that it will
trigger a warning every time this function is run.
Really, debugfs functions are not supposed to be checked for errors in
normal code. For example, if we updated this code to check the correct
variable then it would print a warning if CONFIG_DEBUGFS was disabled. We
don't want that. Just delete the check.
Fixes: f084fe52c640 ("scsi: scsi_debug: Add debugfs interface to fail target reset")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/c602c9ad-5e35-4e18-a47f-87ed956a9ec2@moroto.mountain
Reviewed-by: Wenchao Hao <haowenchao2@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 037fbd3fcfbd99145f9310d93f6637012807cfd0)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
There are two bug in this code:
1) If count is zero, then it will lead to a NULL dereference. The
kmalloc() will successfully allocate zero bytes and the test for "if
(buf[0] == '-')" will read beyond the end of the zero size buffer and
Oops.
2) The code does not ensure that the user's string is properly NUL
terminated which could lead to a read overflow.
Fixes: a9996d722b11 ("scsi: scsi_debug: Add interface to manage error injection for a single device")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/7733643d-e102-4581-8d29-769472011c97@moroto.mountain
Reviewed-by: Wenchao Hao <haowenchao2@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 860c3d03bbc3f17aef8600662c488f27fd093142)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Conflicts: Merge differences
Add new module param "allow_restart" to control scsi_device's allow_restart
flag. This flag determines if EH is triggered after a command completes
with sense_key 0x6, ASC 0x4 and ASCQ 0x2. EH would be triggered if
allow_restart=1 in this condition.
The new param can be used with the error injection capability to test how
commands completing with sense_key 0x6, ASC 0x4 and ASCQ 0x2 are handled.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-11-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 573c2d066eb950dd9bd6e8735d3a859bbc21b3cc)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
The interface is found at
/sys/kernel/debug/scsi_debug/target<h:c:t>/fail_reset where <h:c:t>
identifies the target to inject errors on. It's a simple bool type
interface which would make this target's reset fail if set to 'Y'.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-10-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f084fe52c640775de51056670f568ec7104b922a)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Add error injection type 4 to make scsi_debug_device_reset() return FAILED.
Fail abort command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x4 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "4 -10 0x12" > ${error}
will make the device return FAILED when trying to reset LUN with inquiry
command 10 times.
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "4 -10 0xff" > ${error}
will make the device return FAILED when trying to reset LUN 10 times.
Usually we do not care about what command it is when trying to perform
reset LUN, so 0xff could be applied.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-9-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0267811625e13a7743eeb6072b57509bf909f484)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Add error injection type 3 to make scsi_debug_abort() return FAILED. Fail
abort command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x3 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "3 -10 0x12" > ${error}
will make the device return FAILED when aborting inquiry command 10 times.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-8-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5551ce928805ba790db0fa0a895e207d5d05717d)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
If a fail command error is injected, set the command's status and sense
data then finish this SCSI command.
Set SCSI command's status and sense data format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x2 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error Count |
| | | 0: the rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
| 4 | x8 | Host byte in scsi_cmd::status |
| | | [scsi_cmd::status has 32 bits holding these 3 bytes] |
+--------+------+-------------------------------------------------------+
| 5 | x8 | Driver byte in scsi_cmd::status |
+--------+------+-------------------------------------------------------+
| 6 | x8 | SCSI Status byte in scsi_cmd::status |
+--------+------+-------------------------------------------------------+
| 7 | x8 | SCSI Sense Key in scsi_cmnd |
+--------+------+-------------------------------------------------------+
| 8 | x8 | SCSI ASC in scsi_cmnd |
+--------+------+-------------------------------------------------------+
| 9 | x8 | SCSI ASCQ in scsi_cmnd |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "2 -10 0x88 0 0 0x2 0x3 0x11 0x0" >${error}
will make device's read command return with media error with additional
sense of "Unrecovered read error" (UNC):
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-7-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 33592274321eab2e35e2fdf01857c722431fcf8f)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
If a fail queuecommand error is injected, return the failed value defined
in the rule from queuecommand.
Make queuecommand return format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x1 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
| 4 | x32 | The queuecommand() return value we want |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "1 1 0x12 0x1055" > ${error}
will make each INQUIRY command sent to that device return 0x1055
(SCSI_MLQUEUE_HOST_BUSY).
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-6-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 33bccf55c20b66dfca6644c8dc9b396cec82e85c)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
If a timeout error is injected, return 0 from scsi_debug_queuecommand to
make the command time out.
Time out SCSI command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x0 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "0 -10 0x12" > ${error}
will make the device's inquiry command time out 10 times.
echo "0 1 0x12" > ${error}
will make the device's inquiry time out each time it is invoked on this
device.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-5-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 32be8b6e22eb76a08c66414593ef02d5eb151be7)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
The grammar to remove error injection is a line with fixed 3 columns
separated by spaces.
First column is fixed to "-". It tells this is a removal operation. Second
column is the error code to match. Third column is the scsi command to
match.
For example the following command would remove timeout injection of inquiry
command:
echo "- 0 0x12" > /sys/kernel/debug/scsi_debug/0:0:0:1/error
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-4-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 962d77cd4c852e6a34ffd44fc5f32ba678e02633)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
This new facility uses the debugfs pseudo file system which is typically
mounted under the /sys/kernel/debug directory and requires root permissions
to access.
The interface file is found at /sys/kernel/debug/scsi_debug/<h:c:t:l>/error
where <h:c:t:l> identifies the device (logical unit (LU)) to inject errors
on.
For the following description the ${error} environment variable is assumed
to be set to/sys/kernel/debug/scsi_debug/1:0:0:0/error where 1:0:0:0 is a
pseudo device (LU) owned by the scsi_debug driver. Rules are written to
${error} in the normal sysfs fashion (e.g. 'echo "0 -2 0x12" > ${error}').
More than one rule can be active on a device at a time and inactive rules
(i.e. those whose error count is 0) remain in the rule listing. The
existing rules can be read with 'cat ${error}' with oneline output for each
rule.
The interface format is line-by-line, each line is an error injection rule.
Each rule contains integers separated by spaces, the first three columns
correspond to "Error code", "Error count" and "SCSI command", other
columns depend on Error code.
General rule format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error code |
| | | 0: timeout SCSI command |
| | | 1: fail queuecommand, make queuecommand return |
| | | given value |
| | | 2: fail command, finish command with SCSI status, |
| | | sense key and ASC/ASCQ values |
| | | 3: make abort commands for specific command fail |
| | | 4: make reset lun for specific command fail |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
| ... | xxx | Error type specific fields |
+--------+------+-------------------------------------------------------+
Notes:
- When multiple error inject rules are added for the same SCSI command,
the one with smaller error code will take effect (and the others will be
ignored).
- If the same error (i.e. same Error code and SCSI command) is added, the
older one will be overwritten..
- Currently, the basic types are (u8/u16/u32/u64/s8/s16/s32/s64) and the
hexadecimal types (x8/x16/x32/x64).
- Where a hexadecimal value is expected (e.g. Column 3: SCSI command
opcode) the "0x" prefix is optional on the value (e.g. the INQUIRY
opcode can be given as '0x12' or '12').
- When the Error count is negative, reading ${error} will show that value
incrementing, stopping when it gets to 0.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-3-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a9996d722b1197b625acfa350dd2849e35ad6092)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-33543
Upstream Status: From upstream linux mainline
Create directory scsi_debug in the root of the debugfs filesystem. Prepare
to add interface for manage error injection.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-2-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e2d15f59b1cc6ed613b94e0969335a7868f04ca)
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-25988
Conflicts: drop change on ublk, btrfs and f2fs, all are not supported
on rhel9.5
commit 7437bb73f087e5f216f9c6603f5149d354e315af
Author: Christoph Hellwig <hch@lst.de>
Date: Sun Dec 17 17:53:57 2023 +0100
block: remove support for the host aware zone model
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):
- host managed where non-conventional zones there is strict requirement
to write at the write pointer, or else an error is returned
- host aware where a write point is maintained if writes always happen
at it, otherwise it is left in an under-defined state and the
sequential write preferred zones behave like conventional zones
(probably very badly performing ones, though)
Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.
Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 0c028b6a115e7f18480ec3f98ba7bccf011646ea
Author: John Garry <john.g.garry@oracle.com>
Date: Sun Apr 16 17:56:54 2023 +0000
scsi: scsi_debug: Abort commands from scsi_debug_device_reset()
Currently scsi_debug_device_reset() does not do much apart from setting the
SDEBUG_UA_POR ("Power on, reset, or bus device reset") flag, which is
eventually passed back to the SCSI midlayer later for a "unit attention"
command.
There is a report that blktest scsi/007 test fails due to commit
1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd").
The problem there is that there are dangling scsi_debug queued commands
when we attempt to remove the driver.
scsi/007 test triggers SCSI EH and attempts to abort a timed-out command.
Function scsi_debug_device_reset() is called as part of the EH, but does
not deal with outstanding erroneous command. Prior to the named commit,
removing the driver caused all dangling queued commands to be stopped -
this should have not been necessary.
Fix by aborting outstanding commands on a scsi_device basis from
scsi_debug_device_reset().
Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202304071111.e762fcbd-yujie.liu@intel.com
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230416175654.159163-1-john.g.garry@oracle.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit b32283d75335d8263fc9f5ae16c8a196f1d8b5d5
Author: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Date: Thu Apr 6 00:46:07 2023 -0700
scsi: scsi_debug: Fix missing error code in scsi_debug_init()
Smatch reports: drivers/scsi/scsi_debug.c:6996
scsi_debug_init() warn: missing error code 'ret'
Although it is unlikely that KMEM_CACHE might fail, but if it does then ret
might be zero. So to fix this explicitly mark ret as "-ENOMEM" and then
goto driver_unreg.
Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20230406074607.3637097-1-harshit.m.mogalapalli@oracle.com
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit f1437cd1e535c5d5cc9f6e5bfdfc9b1cd3141bc4
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:10 2023 +0000
scsi: scsi_debug: Drop sdebug_queue
It's easy to get scsi_debug to error on throughput testing when we have
multiple shosts:
$ lsscsi
[7:0:0:0] disk Linux scsi_debug 0191
[0:0:0:0] disk Linux scsi_debug 0191
$ fio --filename=/dev/sda --filename=/dev/sdb --direct=1 --rw=read --bs=4k
--iodepth=256 --runtime=60 --numjobs=40 --time_based --name=jpg
--eta-newline=1 --readonly --ioengine=io_uring --hipri --exitall_on_error
jpg: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.28
Starting 40 processes
[ 27.521809] hrtimer: interrupt took 33067 ns
[ 27.904660] sd 7:0:0:0: [sdb] tag#171 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[ 27.904660] sd 0:0:0:0: [sda] tag#58 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
fio: io_u error [ 27.904667] sd 0:0:0:0: [sda] tag#58 CDB: Read(10) 28 00 00 00 27 00 00 01 18 00
on file /dev/sda[ 27.904670] sd 0:0:0:0: [sda] tag#62 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
The issue is related to how the driver manages submit queues and tags. A
single array of submit queues - sdebug_q_arr - with its own set of tags is
shared among all shosts. As such, for occasions when we have more than one
shost it is possible to overload the submit queues and run out of tags.
The struct sdebug_queue is to manage tags and hold the associated
queued command entry pointer (for that tag).
Since the tagset iters are now used for functions like
sdebug_blk_mq_poll(), there is no need to manage these queues. Indeed,
blk-mq already provides what we need for managing tags and queues.
Drop sdebug_queue and all its usage in the driver.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-12-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 57f7225a4fe25425c29402adad990c7409958c40
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:09 2023 +0000
scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts
The shost->can_queue value is initially used to set per-HW queue context
tag depth in the block layer. This ensures that the shost is not sent too
many commands which it can deal with. However lowering sdebug_max_queue
separately means that we can easily overload the shost, as in the following
example:
$ cat /sys/bus/pseudo/drivers/scsi_debug/max_queue
192
$ cat /sys/class/scsi_host/host0/can_queue
192
$ echo 100 > /sys/bus/pseudo/drivers/scsi_debug/max_queue
$ cat /sys/class/scsi_host/host0/can_queue
192
$ fio --filename=/dev/sda --direct=1 --rw=read --bs=4k --iodepth=256
--runtime=1200 --numjobs=10 --time_based --group_reporting
--name=iops-test-job --eta-newline=1 --readonly --ioengine=io_uring
--hipri --exitall_on_error
iops-test-job: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=256
...
fio-3.28
Starting 10 processes
[ 111.269885] scsi_io_completion_action: 400 callbacks suppressed
[ 111.269885] blk_print_req_error: 400 callbacks suppressed
[ 111.269889] I/O error, dev sda, sector 440 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2
[ 111.269892] sd 0:0:0:0: [sda] tag#132 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=0s
[ 111.269897] sd 0:0:0:0: [sda] tag#132 CDB: Read(10) 28 00 00 00 01 68 00 00 08 00
[ 111.277058] I/O error, dev sda, sector 360 op 0x0:(READ) flags 0x1200000 phys_seg 1 prio class 2
[...]
Ensure that this cannot happen by allowing sdebug_max_queue be modified
only when we have no shosts. As such, any shost->can_queue value will match
sdebug_max_queue, and sdebug_max_queue cannot be modified separately.
Since retired_max_queue is no longer set, remove support.
Continue to apply the restriction that sdebug_host_max_queue cannot be
modified when sdebug_host_max_queue is set. Adding support for that would
mean extra code, and no one has complained about this restriction
previously.
A command like the following may be used to remove a shost:
echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-11-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 12f3eef016ea7a72c6e0d0fe6f66882086d9c4a9
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:08 2023 +0000
scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store()
The functions to update ndelay and delay value first check whether we have
any in-flight IO for any host. It does this by checking if any tag is used
in the global submit queues.
We can achieve the same by setting the host as blocked and then ensuring
that we have no in-flight commands with scsi_host_busy().
Note that scsi_host_busy() checks SCMD_STATE_INFLIGHT flag, which is only
set per command after we ensure that the host is not blocked, i.e. we see
more commands active after the check for scsi_host_busy() returns 0.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-10-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 9c559c9b4748fed11687694e65e5d6d1eb2919cd
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:07 2023 +0000
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued()
Instead of iterating all deferred commands in the submission queue
structures, use blk_mq_tagset_busy_iter(), which is a standard API for
this.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-9-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 600d9ead3936b2f22e664c59345a2e006ff324c5
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:06 2023 +0000
scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll()
Instead of iterating all deferred commands in the submission queue
structures, use blk_mq_tagset_busy_iter(), which is a standard API for
this.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-8-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 1107c7b24ee3280abfc59f1b9186e285cabdd3ec
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:05 2023 +0000
scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd
Eventually we will drop the sdebug_queue struct as it is not really
required, so start with making the sdebug_queued_cmd dynamically allocated
for the lifetime of the scsi_cmnd in the driver.
As an interim measure, make sdebug_queued_cmd.sd_dp a pointer to struct
sdebug_defer. Also keep a value of the index allocated in
sdebug_queued_cmd.qc_arr in struct sdebug_queued_cmd.
To deal with an races in accessing the scsi cmnd allocated struct
sdebug_queued_cmd, add a spinlock for the scsi command in its priv area.
Races may be between scheduling a command for completion, aborting a
command, and the command actually completing and freeing the struct
sdebug_queued_cmd.
[mkp: typo fix]
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-7-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit a0473bf31df5bf2da7ecb50023c129659ce0a835
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:04 2023 +0000
scsi: scsi_debug: Use scsi_block_requests() to block queues
The feature to block queues is quite dubious, since it races with in-flight
IO. Indeed, it seems unnecessary for block queues for any times we do so.
Anyway, to keep the same behaviour, use standard SCSI API to stop IO being
sent - scsi_{un}block_requests().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-6-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 25b80b2c7582ea15ba90b8007f1e1f1b8fc762b9
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:03 2023 +0000
scsi: scsi_debug: Protect block_unblock_all_queues() with mutex
There is no reason that calls to block_unblock_all_queues() from different
context can't race with one another, so protect with the
sdebug_host_list_mutex. There's no need for a more fine-grained per shost
locking here (and we don't have a per-host lock anyway).
Also simplify some touched code in sdebug_change_qdepth().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-5-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 0aaa3fad4fd931ba3accac3a1fcc7334ca780591
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:02 2023 +0000
scsi: scsi_debug: Change shost list lock to a mutex
The shost list lock, sdebug_host_list_lock, is a spinlock. We would only
lock in non-atomic context in this driver, so use a mutex instead, which is
friendlier if we need to schedule when iterating.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-4-john.g.garry@oracle.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 00f9d622e8b237d8403569ee51f7bfb9bf89a2d5
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:01 2023 +0000
scsi: scsi_debug: Don't iter all shosts in clear_luns_changed_on_target()
In clear_luns_changed_on_target(), we iter all devices for all shosts to
conditionally clear the SDEBUG_UA_LUNS_CHANGED flag in the per-device
uas_bm.
One condition to see whether we clear the flag is to test whether the host
for the device under consideration is the same as the matching device's
(devip) host. This check will only ever pass for devices for the same
shost, so only iter the devices for the matching device shost.
We can now drop the spinlock'ing of the sdebug_host_list_lock in the same
function. This will allow us to use a mutex instead of the spinlock for the
global shost lock, as clear_luns_changed_on_target() could be called in
non-blocking context, in scsi_debug_queuecommand() -> make_ua() ->
clear_luns_changed_on_target() (which is why required a spinlock).
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-3-john.g.garry@oracle.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 6500d2045d5247cfb2ac31cc1691d7191096389b
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 27 07:43:00 2023 +0000
scsi: scsi_debug: Fix check for sdev queue full
There is a report that the blktests scsi/004 test for "TASK SET FULL" (TSF)
now fails.
The condition upon we should issue this TSF is when the sdev queue is
full. The check for a full queue has an off-by-1 error. Previously we would
increment the number of requests in the queue after testing if the queue
would be full, i.e. test if one less than full. Since we now use
scsi_device_busy() to count the number of requests in the queue, this would
already account for the current request, so fix the test for queue full
accordingly.
Fixes: 151f0ec9ddb5 ("scsi: scsi_debug: Drop sdebug_dev_info.num_in_q")
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303201334.18b30edc-oliver.sang@intel.com
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230327074310.1862889-2-john.g.garry@oracle.com
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit c45b3804292ba5f95b86a8866e2e2cac03fa0155
Author: Lizhe <sensor1010@163.com>
Date: Sun Mar 19 12:27:32 2023 +0800
scsi: scsi_debug: Remove redundant driver match function
If there is no driver match function, the driver core assumes that each
candidate pair (driver, device) matches, see driver_match_device().
Drop the pseudo_lld bus match function that always returned 1. This results
in the same behaviour as when there is no match function.
[mkp+jgg: patch description]
Signed-off-by: Lizhe <sensor1010@163.com>
Link: https://lore.kernel.org/r/20230319042732.278691-1-sensor1010@163.com
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 548ebb335f743fa2647fe61bb1ad29d2c706afda
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:14 2023 +0000
scsi: scsi_debug: Add poll mode deferred completions to statistics
Currently commands completed via poll mode are not included in the
statistics gathering for deferred completions and missed CPUs.
Poll mode completions should be treated the same as other deferred
completion types, so add poll mode completions to the statistics.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-12-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit f037b5cb07138cd519f35fd08ebef2faf075959f
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:13 2023 +0000
scsi: scsi_debug: Get command abort feature working again
The command abort feature allows us to test aborting a command which has
timed-out.
The idea is that for specific commands we just don't call scsi_done() and
allow the request to timeout, which ensures SCSI EH kicks-in we try to
abort the command.
Since commit 4a0c6f432d ("scsi: scsi_debug: Add new defer type for
mq_poll") this does not seem to work. The issue is that we clear the
sd_dp->aborted flag in schedule_resp() before the completion callback has
run. When the completion callback actually runs, it calls scsi_done() as
normal as sd_dp->aborted unset. This is all very racy.
Fix by not clearing sd_dp->aborted in schedule_resp(). Also move the call
to blk_abort_request() from schedule_resp() to sdebug_q_cmd_complete(),
which makes the code have a more logical sequence.
I also note that this feature only works for commands which are classed as
"SDEG_RES_IMMED_MASK", but only practically triggered with prior RW
commands. So for my experiment I need to run fio to trigger the error on
the "nth" command (see inject_on_this_cmd()), and then run something like
sg_sync to queue a command to actually trigger the abort.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-11-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 151f0ec9ddb539c403a17c86da092e751736c121
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:12 2023 +0000
scsi: scsi_debug: Drop sdebug_dev_info.num_in_q
In schedule_resp(), under certain conditions we check whether the
per-device queue is full (num_in_q == queue depth - 1) and we may inject a
"task set full" (TSF) error if it is.
However how we read num_in_q is racy - many threads may see the same "queue
is full" value (and also issue a TSF).
There is per-queue locking in reading per-device num_in_q, but that would
not help.
Replace how we read num_in_q at this location with a call to
scsi_device_busy(). Calling scsi_device_busy() is likewise racy (as reading
num_in_q), so nothing lost or gained. Calling scsi_device_busy() is also
slow as it needs to read all bits in the per-device budget bitmap, but we
can live with that since we're just a simulator and it's only under a
certain configs which we would see this.
Also move the "task set full" print earlier as it would only be called now
under this condition. However, previously it may not have been called -
like returning early - but keep it simple and always call it.
At this point we can drop sdebug_dev_info.num_in_q - it is difficult to
maintain properly and adds extra normal case command processing.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-10-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 0befb8790969087946f5726d8d80b4f83053ea21
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:11 2023 +0000
scsi: scsi_debug: Drop check for num_in_q exceeding queue depth
The per-device num_in_q value cannot exceed the device queue depth, so drop
the check.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-9-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 9c2303820bf033f798fe14a856d7df431640001b
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:10 2023 +0000
scsi: scsi_debug: Drop scsi_debug_host_reset() device NULL pointer check
The check for device pointer for the SCSI command is unnecessary, so drop
it.
The only caller is scsi_try_host_reset() -> eh_host_reset_handler(), and
there that pointer cannot be NULL.
Indeed, there is already code later in the same function which does not
check the device pointer for the SCSI command.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-8-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit 519bfc14c156f31cc113709c71e7f66e0f6f228e
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:09 2023 +0000
scsi: scsi_debug: Drop scsi_debug_bus_reset() NULL pointer checks
The checks for SCSI cmnd, SCSI device, and SCSI host are unnecessary, so
drop them. Likewise, drop the NULL check for sdbg_host.
The only caller is scsi_try_bus_reset() -> eh_bus_reset_handler(), and
there those pointers cannot be NULL.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-7-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
JIRA: https://issues.redhat.com/browse/RHEL-15276
commit a15df530a189fcc62003df7a7272b2918a9ef73a
Author: John Garry <john.g.garry@oracle.com>
Date: Mon Mar 13 09:31:08 2023 +0000
scsi: scsi_debug: Drop scsi_debug_target_reset() NULL pointer checks
The checks for SCSI cmnd, SCSI device, and SCSI host are unnecessary, so
drop them. Likewise, drop the NULL check for sdbg_host.
The only caller is scsi_try_target_reset() -> eh_target_reset_handler(),
and there those pointers cannot be NULL.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20230313093114.1498305-6-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>