From: Damien Le Moal <[email protected]>
To: "Martin K. Petersen" <[email protected]>,
Bart Van Assche <[email protected]>
Cc: Nitesh Shetty <[email protected]>,
Javier Gonzalez <[email protected]>,
Matthew Wilcox <[email protected]>,
Keith Busch <[email protected]>, Christoph Hellwig <[email protected]>,
Keith Busch <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [PATCHv10 0/9] write hints with nvme fdp, scsi streams
Date: Thu, 28 Nov 2024 17:51:52 +0900
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 11/28/24 11:09, Martin K. Petersen wrote:
>
> Bart,
>
>> What if the source LBA range does not require splitting but the
>> destination LBA range requires splitting, e.g. because it crosses a
>> chunk_sectors boundary? Will the REQ_OP_COPY_IN operation succeed in
>> this case and the REQ_OP_COPY_OUT operation fail?
>
> Yes.
>
> I experimented with approaching splitting in an iterative fashion. And
> thus, if there was a split halfway through the COPY_IN I/O, we'd issue a
> corresponding COPY_OUT up to the split point and hope that the write
> subsequently didn't need a split. And then deal with the next segment.
>
> However, given that copy offload offers diminishing returns for small
> I/Os, it was not worth the hassle for the devices I used for
> development. It was cleaner and faster to just fall back to regular
> read/write when a split was required.
>
>> Does this mean that a third operation is needed to cancel
>> REQ_OP_COPY_IN operations if the REQ_OP_COPY_OUT operation fails?
>
> No. The device times out the token.
>
>> Additionally, how to handle bugs in REQ_OP_COPY_* submitters where a
>> large number of REQ_OP_COPY_IN operations is submitted without
>> corresponding REQ_OP_COPY_OUT operation? Is perhaps a mechanism
>> required to discard unmatched REQ_OP_COPY_IN operations after a
>> certain time?
>
> See above.
>
> For your EXTENDED COPY use case there is no token and thus the COPY_IN
> completes immediately.
>
> And for the token case, if you populate a million tokens and don't use
> them before they time out, it sounds like your submitting code is badly
> broken. But it doesn't matter because there are no I/Os in flight and
> thus nothing to discard.
>
>> Hmm ... we may each have a different opinion about whether or not the
>> COPY_IN/COPY_OUT semantics are a requirement for token-based copy
>> offloading.
>
> Maybe. But you'll have a hard time convincing me to add any kind of
> state machine or bio matching magic to the SCSI stack when the simplest
> solution is to treat copying like a read followed by a write. There is
> no concurrency, no kernel state, no dependency between two commands, nor
> two scsi_disk/scsi_device object lifetimes to manage.
And I think that would also make it very easy to support a fake copy offload
using regular read/write BIOs, so all block devices could be presented as
supporting "copy offload". That is nice for file systems.
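As a rough illustration of that fallback, here is a minimal user-space
sketch of a copy emulated as a read followed by a write. It is not kernel
code and not taken from any patch: emulated_copy() is a hypothetical helper,
the 64 KiB bounce buffer size is arbitrary, and plain file descriptors stand
in for block devices.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel API): emulate a copy of len bytes
 * with regular reads and writes through a bounce buffer.
 * Returns 0 on success, -1 on error with errno set.
 */
static int emulated_copy(int src_fd, off_t src_off,
                         int dst_fd, off_t dst_off, size_t len)
{
    const size_t chunk = 64 * 1024; /* arbitrary bounce buffer size */
    char *buf;
    int ret = 0;

    assert(src_fd >= 0 && dst_fd >= 0);

    buf = malloc(chunk);
    if (!buf)
        return -1;

    while (len > 0) {
        size_t want = len < chunk ? len : chunk;
        ssize_t nr = pread(src_fd, buf, want, src_off);
        ssize_t done = 0;

        if (nr < 0 && errno == EINTR)
            continue;
        if (nr <= 0) {
            /* Read error, or the source ended before len bytes. */
            if (nr == 0)
                errno = EIO;
            ret = -1;
            break;
        }

        /* Write out everything that was read, handling short writes. */
        while (done < nr) {
            ssize_t nw = pwrite(dst_fd, buf + done, (size_t)(nr - done),
                                dst_off + done);

            if (nw < 0 && errno == EINTR)
                continue;
            if (nw <= 0) {
                if (nw == 0)
                    errno = EIO;
                ret = -1;
                goto out;
            }
            done += nw;
        }

        src_off += nr;
        dst_off += nr;
        len -= (size_t)nr;
    }
out:
    free(buf);
    return ret;
}
```

The point of the sketch is only that the emulation needs no state beyond the
buffer: the read completes, then the write is issued, which mirrors the
"read followed by a write" model described above.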
>
>> Additionally, I'm not convinced that implementing COPY_IN/COPY_OUT for
>> ODX devices is that simple. The COPY_IN and COPY_OUT operations have
>> to be translated into three SCSI commands, isn't it? I'm referring to
>> the POPULATE TOKEN, RECEIVE ROD TOKEN INFORMATION and WRITE USING
>> TOKEN commands. What is your opinion about how to translate the two
>> block layer operations into these three SCSI commands?
>
> COPY_IN is translated to a NOP for devices implementing EXTENDED COPY
> and a POPULATE TOKEN for devices using tokens.
>
> COPY_OUT is translated to an EXTENDED COPY (or NVMe Copy) for devices
> using a single command approach and WRITE USING TOKEN for devices using
> tokens.
The ATA WRITE GATHERED command is also a single copy command, so it matches this
model. I have not checked SAT, but translation would likely work.
While I was initially worried that the two-BIO approach would be overly
complicated, it seems that I was wrong :)
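The translation described above can be summarized as a small lookup. This is
only a sketch of the mapping as stated in this thread, not code from any
patch: every enum and function name below is hypothetical, and
CMD_SINGLE_COPY stands for whichever single-command copy the device
implements (EXTENDED COPY, NVMe Copy, or possibly ATA WRITE GATHERED).

```c
#include <assert.h>

/* All names here are hypothetical and exist only for this illustration. */
enum copy_op { COPY_IN, COPY_OUT };

enum copy_model {
    COPY_MODEL_SINGLE_CMD, /* one command does the whole copy */
    COPY_MODEL_TOKEN       /* token-based (ODX-style) copy */
};

enum dev_cmd {
    CMD_NOP,               /* nothing is sent to the device */
    CMD_POPULATE_TOKEN,
    CMD_SINGLE_COPY,       /* e.g. EXTENDED COPY or NVMe Copy */
    CMD_WRITE_USING_TOKEN
};

/* Map a block layer copy operation to the device command it becomes. */
static enum dev_cmd translate_copy_op(enum copy_op op, enum copy_model model)
{
    assert(op == COPY_IN || op == COPY_OUT);

    if (op == COPY_IN)
        return model == COPY_MODEL_TOKEN ? CMD_POPULATE_TOKEN : CMD_NOP;

    return model == COPY_MODEL_TOKEN ? CMD_WRITE_USING_TOKEN
                                     : CMD_SINGLE_COPY;
}
```

Note that the mapping is stateless: each operation translates on its own,
with no matching of a COPY_IN against a later COPY_OUT.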
>
> There is no need for RECEIVE ROD TOKEN INFORMATION.
>
> I am not aware of UFS devices using the token-based approach. And for
> EXTENDED COPY there is only a single command sent to the device. If you
> want to do power management while that command is being processed,
> please deal with that in UFS. The block layer doesn't deal with the
> async variants of any of the other SCSI commands either...
>
--
Damien Le Moal
Western Digital Research