From: JeffleXu <[email protected]>
To: Dave Chinner <[email protected]>, Jens Axboe <[email protected]>
Cc: Hao Xu <[email protected]>,
Alexander Viro <[email protected]>,
Christoph Hellwig <[email protected]>,
"Darrick J. Wong" <[email protected]>,
[email protected],
Konstantin Khlebnikov <[email protected]>,
[email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH v3 RESEND] iomap: set REQ_NOWAIT according to IOCB_NOWAIT in Direct IO
Date: Thu, 10 Dec 2020 10:33:59 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 12/10/20 5:15 AM, Dave Chinner wrote:
> On Mon, Dec 07, 2020 at 04:40:38PM -0700, Jens Axboe wrote:
>> On 12/6/20 7:21 PM, Dave Chinner wrote:
>>> On Fri, Dec 04, 2020 at 05:44:56PM +0800, Hao Xu wrote:
>>>> Currently, IOCB_NOWAIT is ignored in Direct IO, REQ_NOWAIT is only set
>>>> when IOCB_HIPRI is set. But REQ_NOWAIT should be set as well when
>>>> IOCB_NOWAIT is set.
>>>>
>>>> Suggested-by: Jeffle Xu <[email protected]>
>>>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>>>> Signed-off-by: Hao Xu <[email protected]>
>>>> ---
>>>>
>>>> Hi all,
>>>> I tested fio io_uring direct read for a file on ext4 filesystem on a
>>>> nvme ssd. I found that IOCB_NOWAIT is ignored in iomap layer, which
>>>> means REQ_NOWAIT is not set in bio->bi_opf.
>>>
>>> What iomap is doing is correct behaviour. IOCB_NOWAIT applies to the
>>> filesystem behaviour, not the block device.
>>>
>>> REQ_NOWAIT can result in partial IO failures because the error is
>>> only reported to the iomap layer via IO completions. Hence we can
>>> split a DIO into multiple bios and have random bios in that IO fail
>>> with EAGAIN because REQ_NOWAIT is set. This error will
>>> get reported to the submitter via completion, and it will override
>>> any of the partial IOs that actually completed.
>>>
>>> Hence, like the recently reported multi-mapping IOCB_NOWAIT bug
>>> reported by Jens and fixed in commit 883a790a8440 ("xfs: don't allow
>>> NOWAIT DIO across extent boundaries") we'll get silent partial
>>> writes occurring because the second submitted bio in an IO can
>>> trigger EAGAIN errors with partial IO completion having already
>>> occurred.
>>>
>>> Further, we don't allow partial IO completion for DIO on XFS at all.
>>> DIO must be completely submitted and completed or return an error
>>> without having issued any IO at all. Hence using REQ_NOWAIT for
>>> DIO bios is incorrect and not desirable.
>>
>> What you say makes total sense for a user using RWF_NOWAIT, but it
>> doesn't make a lot of sense for io_uring where we really want
>> IOCB_NOWAIT to be what it suggests it is - don't wait for other IO to
>> complete, if avoidable. One of the things that really suck with
>> aio/libai is the "yeah it's probably async, but lol, might not be"
>> aspect of it.
>
> Sure, but we have no way of telling what semantics the IO issuer
> actually requires from above. And because IOCB_NOWAIT behaviour is
> directly exposed to userspace by RWF_NOWAIT, that's the behaviour we
> have to implement.
>
>> For io_uring, if we do get -EAGAIN, we'll retry without NOWAIT set. So
>> the concern about fractured/short writes doesn't bubble up to the
>> application. Hence we really want an IOCB_NOWAIT_REALLY on that side,
>> instead of the poor mans IOCB_MAYBE_NOWAIT semantics.
>
> Yup, perhaps what we really want is a true IOCB_NONBLOCK flag as an
> internal kernel implementation. i.e. don't block anywhere in the
> stack, and the caller must handle retrying/completing the entire IO
> regardless of where the -EAGAIN comes from during the IO, including
> from a partial completion that the caller is waiting for...
>
> i.e. rather than hacking around this specific instance of "it blocks
> and we don't want it to", define the semantics and behaviour of a
> fully non-blocking IO through all layers from the VFS down to the
> hardware and let's implement that. Then we can stop playing
> whack-a-mole with all the "but it blocks when I do this, doctor!"
> issues that we seem to keep having.... :)
>
From my opinion, the semantics of NOWAIT is clear, just like Jens said,
"don't wait, if**avoidable**". Also it should be the responsibility of
the callers who set this flag to resubmit the failed IO once a -EAGAIN
returned. e.g. if the user app sets RWF_NOWAIT, then it's the
responsibility of the user app to resubmit the failed IO once -EAGAIN
returned. If io_uring layer newly set IOCB_NONBLOCK flag, then it's the
responsibility of io_uring to resubmit.
The limitation here is that once a big DIO/bio got split, then the
atomicity of whole DIO/bio can not be guaranteed. This seems a
limitation of direct IO routine and block layer, and the work of above
layers e.g. filesystem, is needed to guarantee the atomicity.
--
Thanks,
Jeffle
next prev parent reply other threads:[~2020-12-10 2:35 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-04 9:44 [PATCH v3 RESEND] iomap: set REQ_NOWAIT according to IOCB_NOWAIT in Direct IO Hao Xu
2020-12-04 11:44 ` Pavel Begunkov
2020-12-07 2:21 ` Dave Chinner
2020-12-07 23:40 ` Jens Axboe
2020-12-09 21:15 ` Dave Chinner
2020-12-10 2:33 ` JeffleXu [this message]
2020-12-08 5:46 ` JeffleXu
2020-12-09 21:23 ` Dave Chinner
2020-12-10 1:55 ` JeffleXu
2020-12-10 5:18 ` Dave Chinner
2020-12-11 2:50 ` JeffleXu
2020-12-14 2:56 ` Dave Chinner
2020-12-15 9:43 ` JeffleXu
2021-04-02 14:32 ` Pavel Begunkov
2021-04-02 16:26 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=59e5a477-d0b1-adda-34d1-2004a06fdef5@linux.alibaba.com \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox