From: Damien Le Moal <[email protected]>
To: Kanchan Joshi <[email protected]>,
"[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [PATCH 0/3] zone-append support in aio and io-uring
Date: Fri, 19 Jun 2020 03:08:33 +0000
Message-ID: <CY4PR04MB37515E4FCD1EAA5880E2E1A2E7980@CY4PR04MB3751.namprd04.prod.outlook.com>
In-Reply-To: 20200618175258.GA4141152@test-zns

On 2020/06/19 2:55, Kanchan Joshi wrote:
> On Wed, Jun 17, 2020 at 11:56:34PM -0700, Christoph Hellwig wrote:
>> On Wed, Jun 17, 2020 at 10:53:36PM +0530, Kanchan Joshi wrote:
>>> This patchset enables issuing zone-append using aio and io-uring direct-io interface.
>>>
>>> For aio, this introduces opcode IOCB_CMD_ZONE_APPEND. Application uses start LBA
>>> of the zone to issue append. On completion 'res2' field is used to return
>>> zone-relative offset.
>>>
>>> For io-uring, this introduces three opcodes: IORING_OP_ZONE_APPEND/APPENDV/APPENDV_FIXED.
>>> Since io_uring does not have aio-like res2, cqe->flags are repurposed to return zone-relative offset
>>
>> And what exactly are the semantics supposed to be? Remember the
>> unix file abstractions does not know about zones at all.
>>
>> I really don't think squeezing low-level not quite block storage
>> protocol details into the Linux read/write path is a good idea.
>
> I was thinking of raw block-access to zone device rather than pristine file
> abstraction. And in that context, semantics, at this point, are unchanged
> (i.e. same as direct writes) while flexibility of async-interface gets
> added.

The use of aio->aio_offset by the user and by the kernel differs between regular
writes and zone append writes. That is a significant enough change to say that
the semantics changed. Yes, both cases are direct IOs, but the write location
specified by the user and the location where the data actually lands on disk
are different.

There are a lot of subtle things that can happen that make mapping zone append
operations to POSIX semantics difficult. E.g. for a regular file, using zone
append for any write issued to a file opened with O_APPEND maps well to POSIX
only for blocking writes. For asynchronous writes, that is no longer true, since
the data order defined by automatically appending after the previous async
write breaks: data can land anywhere in the zone regardless of the offset
specified on submission.
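
To make the difference concrete, here is a toy model in C. It is only a sketch
and not kernel or device code: struct toy_zone, toy_zone_write() and
toy_zone_append() are made-up names for illustration, and sectors are plain
integers. A regular write succeeds only at the location the caller names, while
an append takes the zone start and reports back where the data landed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one sequential-write-required zone. */
struct toy_zone {
	uint64_t start;		/* first sector of the zone */
	uint64_t wp;		/* write pointer, relative to start */
	uint64_t capacity;	/* writable sectors in the zone */
};

/*
 * Regular write: the caller chooses the location. A sequential zone
 * accepts it only if it matches the current write pointer.
 * Returns 0 on success, -1 otherwise.
 */
int toy_zone_write(struct toy_zone *z, uint64_t sector, uint64_t nr)
{
	assert(z && z->wp <= z->capacity);
	if (sector != z->start + z->wp || nr > z->capacity - z->wp)
		return -1;
	z->wp += nr;
	return 0;
}

/*
 * Zone append: the caller passes only the zone start. The model picks
 * the location (the current write pointer) and reports it in *written.
 * Returns 0 on success, -1 otherwise.
 */
int toy_zone_append(struct toy_zone *z, uint64_t zone_start, uint64_t nr,
		    uint64_t *written)
{
	assert(z && written && z->wp <= z->capacity);
	if (zone_start != z->start || nr > z->capacity - z->wp)
		return -1;
	*written = z->start + z->wp;
	z->wp += nr;
	return 0;
}
```

With a zone starting at sector 1024, suppose append A (4 sectors) is submitted
before append B (8 sectors), both with sector 1024, and B happens to execute
first. In this model B lands at 1024 and A at 1032: the order on disk follows
execution order, not submission order, and the submitted offset says nothing
about the final location.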

> Synchronous-writes on single-zone sound fine, but synchronous-appends on
> single-zone do not sound that fine.

Why not? This is a perfectly valid use case that does not have any semantic
problem. It may indeed not be the most effective way to get high performance,
but saying that it is "not fine" is not correct in my opinion.
>
>> What could be a useful addition is a way for O_APPEND/RWF_APPEND writes
>> to report where they actually wrote, as that comes close to Zone Append
>> while still making sense at our usual abstraction level for file I/O.
>
> Thanks for suggesting this. O and RWF_APPEND may not go well with block
> access as end-of-file will be picked from dev inode. But perhaps a new
> flag like RWF_ZONE_APPEND can help to transform writes (aio or uring)
> into append without introducing new opcodes.

Yes, RWF_ZONE_APPEND may be better if the semantics of RWF_APPEND cannot be
cleanly reused. But as Christoph said, the RWF_ZONE_APPEND semantics need to be
clarified so that all reviewers can check the code against the intended
behavior, and comment on that intended behavior too.

> And, I think, this can fit fine on file-abstraction of ZoneFS as well.

Maybe. It depends on what semantics you are after for the user zone append
interface. Ideally, we should have at least the same semantics for raw block
devices and zonefs. But zonefs may be able to do a better job thanks to its
real regular file abstraction of zones. As Christoph said, we started looking
into it but lacked time to complete this work. This is still ongoing.

--
Damien Le Moal
Western Digital Research