From: Bart Van Assche <[email protected]>
To: "Martin K. Petersen" <[email protected]>
Cc: Nitesh Shetty <[email protected]>,
Javier Gonzalez <[email protected]>,
Matthew Wilcox <[email protected]>,
Keith Busch <[email protected]>, Christoph Hellwig <[email protected]>,
Keith Busch <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [PATCHv10 0/9] write hints with nvme fdp, scsi streams
Date: Wed, 27 Nov 2024 10:42:34 -0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 11/26/24 6:54 PM, Martin K. Petersen wrote:
> Bart wrote:
>> There are some strong arguments in this thread from May 2024 in favor of
>> representing the entire copy operation as a single REQ_OP_ operation:
>> https://lore.kernel.org/linux-block/[email protected]/
>
> As has been discussed many times, a copy operation is semantically a
> read operation followed by a write operation. And, based on my
> experience implementing support for both types of copy offload in Linux,
> what made things elegant was treating the operation as a read followed
> by a write throughout the stack. Exactly like the token-based offload
> specification describes.
Submitting a copy operation as two bios or two requests means that there
is a risk that one of the two operations never reaches the block driver
at the bottom of the storage stack and hence that a deadlock occurs. I
prefer not to introduce any mechanisms that can cause a deadlock.
As one can see here, Damien Le Moal and Keith Busch both prefer to
submit copy operations as a single operation: Keith Busch, Re: [PATCH
v20 02/12] Add infrastructure for copy offload in block and request
layer, linux-block mailing list, 2024-06-24
(https://lore.kernel.org/all/[email protected]/).
>> Token-based copy offloading (called ODX by Microsoft) could be
>> implemented by maintaining a state machine in the SCSI sd driver
>
> I suspect the SCSI maintainer would object strongly to the idea of
> maintaining cross-device copy offload state and associated object
> lifetime issues in the sd driver.
Such information wouldn't have to be maintained inside the sd driver. A
new kernel module could be introduced that tracks the state of copy
operations and that interacts with the sd driver.
>> I'm assuming that the IMMED bit will be set to zero in the WRITE USING
>> TOKEN command. Otherwise one or more additional RECEIVE ROD TOKEN
>> INFORMATION commands would be required to poll for the WRITE USING TOKEN
>> completion status.
>
> What would the benefit of making WRITE USING TOKEN be a background
> operation? That seems like a completely unnecessary complication.
If a single copy operation takes significantly more time than the time
required to switch between power states, power can be saved by using
IMMED=1. Mechanisms like run-time power management (RPM) or the UFS host
controller auto-hibernation mechanism can only be activated if no
commands are in progress. With IMMED=0, the link between the host and
the storage device will remain powered as long as the copy operation is
in progress. With IMMED=1, the link between the host and the storage
device can be powered down after the copy operation has been submitted
until the host decides to check whether or not the copy operation has
completed.
>> I guess that the block layer maintainer wouldn't be happy if all block
>> drivers would have to deal with three or four phases for copy
>> offloading just because ODX is this complicated.
>
> Last I looked, EXTENDED COPY consumed something like 70 pages in the
> spec. Token-based copy is trivially simple and elegant by comparison.
I don't know of any storage device vendor who has implemented all
EXTENDED COPY features that have been standardized. Assuming that 50
lines of code fit on a single page, here is an example of an EXTENDED
COPY implementation that can be printed on 21 pages of paper:
$ wc -l drivers/target/target_core_xcopy.c
1041
$ echo $(((1041 + 49) / 50))
21
The advantages of EXTENDED COPY over ODX are as follows:
- EXTENDED COPY is a single SCSI command and hence better suited for
devices with a limited queue depth. While the UFS 3.0 standard
restricts the queue depth to 32, most UFS 4.0 devices support a
queue depth of 64.
- The latency of setting up a copy command with EXTENDED COPY is
lower since only a single command has to be sent to the device.
ODX requires three round-trips to the device (assuming IMMED=0).
- EXTENDED COPY requires less memory in storage devices. Each ODX
token occupies some memory and the rules around token lifetimes
are nontrivial.
Thanks,
Bart.
next prev parent reply other threads:[~2024-11-27 18:42 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-29 15:19 [PATCHv10 0/9] write hints with nvme fdp, scsi streams Keith Busch
2024-10-29 15:19 ` [PATCHv10 1/9] block: use generic u16 for write hints Keith Busch
2024-10-29 17:21 ` Bart Van Assche
2024-10-29 15:19 ` [PATCHv10 2/9] block: introduce max_write_hints queue limit Keith Busch
2024-10-29 15:19 ` [PATCHv10 3/9] statx: add write hint information Keith Busch
2024-10-29 15:19 ` [PATCHv10 4/9] block: allow ability to limit partition write hints Keith Busch
2024-10-29 15:23 ` Christoph Hellwig
2024-10-29 17:25 ` Bart Van Assche
2024-10-30 4:46 ` Christoph Hellwig
2024-10-30 20:11 ` Keith Busch
2024-10-30 20:26 ` Bart Van Assche
2024-10-30 20:37 ` Keith Busch
2024-10-30 21:15 ` Bart Van Assche
2024-10-29 15:19 ` [PATCHv10 5/9] block, fs: add write hint to kiocb Keith Busch
2024-10-29 15:19 ` [PATCHv10 6/9] io_uring: enable per-io hinting capability Keith Busch
2024-11-07 2:09 ` Jens Axboe
2024-10-29 15:19 ` [PATCHv10 7/9] block: export placement hint feature Keith Busch
2024-10-29 15:19 ` [PATCHv10 8/9] nvme: enable FDP support Keith Busch
2024-10-30 0:24 ` Chaitanya Kulkarni
2024-10-29 15:19 ` [PATCHv10 9/9] scsi: set permanent stream count in block limits Keith Busch
2024-10-29 15:26 ` Christoph Hellwig
2024-10-29 15:34 ` Keith Busch
2024-10-29 15:37 ` Christoph Hellwig
2024-10-29 15:38 ` Keith Busch
2024-10-29 15:53 ` Christoph Hellwig
2024-10-29 16:22 ` Keith Busch
2024-10-30 4:55 ` Christoph Hellwig
2024-10-30 15:41 ` Keith Busch
2024-10-30 15:45 ` Christoph Hellwig
2024-10-30 15:48 ` Keith Busch
2024-10-30 15:50 ` Christoph Hellwig
2024-10-30 16:42 ` Keith Busch
2024-10-30 16:57 ` Christoph Hellwig
2024-10-30 17:05 ` Keith Busch
2024-10-30 17:15 ` Christoph Hellwig
2024-10-30 17:23 ` Keith Busch
2024-10-30 22:32 ` Keith Busch
2024-10-31 8:19 ` Hans Holmberg
2024-10-31 13:02 ` Christoph Hellwig
2024-10-31 14:06 ` Keith Busch
2024-11-01 7:16 ` Hans Holmberg
2024-11-01 8:19 ` Javier González
2024-11-01 14:49 ` Keith Busch
2024-11-06 14:26 ` Hans Holmberg
2024-10-30 16:59 ` Bart Van Assche
2024-10-30 17:14 ` Christoph Hellwig
2024-10-30 17:44 ` Bart Van Assche
2024-11-01 1:03 ` Jaegeuk Kim
2024-10-29 17:18 ` Bart Van Assche
2024-10-30 5:42 ` Christoph Hellwig
2024-10-29 15:24 ` [PATCHv10 0/9] write hints with nvme fdp, scsi streams Christoph Hellwig
2024-11-05 15:50 ` Christoph Hellwig
2024-11-06 18:36 ` Keith Busch
2024-11-07 20:36 ` Keith Busch
2024-11-08 14:18 ` Christoph Hellwig
2024-11-08 15:51 ` Keith Busch
2024-11-08 16:54 ` Matthew Wilcox
2024-11-08 17:43 ` Javier Gonzalez
2024-11-08 18:51 ` Bart Van Assche
2024-11-11 9:31 ` Javier Gonzalez
2024-11-11 17:45 ` Bart Van Assche
2024-11-12 13:52 ` Nitesh Shetty
2024-11-19 2:03 ` Martin K. Petersen
2024-11-25 23:21 ` Bart Van Assche
2024-11-27 2:54 ` Martin K. Petersen
2024-11-27 18:42 ` Bart Van Assche [this message]
2024-11-27 20:14 ` Martin K. Petersen
2024-11-27 21:06 ` Bart Van Assche
2024-11-28 2:09 ` Martin K. Petersen
2024-11-28 8:51 ` Damien Le Moal
2024-11-29 6:19 ` Christoph Hellwig
2024-11-29 6:23 ` Damien Le Moal
2024-11-28 3:24 ` Christoph Hellwig
2024-11-28 15:21 ` Keith Busch
2024-11-28 16:40 ` Christoph Hellwig
2024-11-11 6:51 ` Christoph Hellwig
2024-11-11 9:30 ` Javier Gonzalez
2024-11-11 9:37 ` Johannes Thumshirn
2024-11-11 9:41 ` Javier Gonzalez
2024-11-11 9:42 ` hch
2024-11-11 9:43 ` Johannes Thumshirn
2024-11-11 10:37 ` Javier Gonzalez
2024-11-11 6:49 ` Christoph Hellwig
2024-11-11 6:48 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox