public inbox for [email protected]
 help / color / mirror / Atom feed
From: JeffleXu <[email protected]>
To: Dave Chinner <[email protected]>, Jens Axboe <[email protected]>
Cc: Hao Xu <[email protected]>,
	Alexander Viro <[email protected]>,
	Christoph Hellwig <[email protected]>,
	"Darrick J. Wong" <[email protected]>,
	[email protected],
	Konstantin Khlebnikov <[email protected]>,
	[email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH v3 RESEND] iomap: set REQ_NOWAIT according to IOCB_NOWAIT in Direct IO
Date: Thu, 10 Dec 2020 10:33:59 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>



On 12/10/20 5:15 AM, Dave Chinner wrote:
> On Mon, Dec 07, 2020 at 04:40:38PM -0700, Jens Axboe wrote:
>> On 12/6/20 7:21 PM, Dave Chinner wrote:
>>> On Fri, Dec 04, 2020 at 05:44:56PM +0800, Hao Xu wrote:
>>>> Currently, IOCB_NOWAIT is ignored in Direct IO, REQ_NOWAIT is only set
>>>> when IOCB_HIPRI is set. But REQ_NOWAIT should be set as well when
>>>> IOCB_NOWAIT is set.
>>>>
>>>> Suggested-by: Jeffle Xu <[email protected]>
>>>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>>>> Signed-off-by: Hao Xu <[email protected]>
>>>> ---
>>>>
>>>> Hi all,
>>>> I tested fio io_uring direct read for a file on ext4 filesystem on a
>>>> nvme ssd. I found that IOCB_NOWAIT is ignored in iomap layer, which
>>>> means REQ_NOWAIT is not set in bio->bi_opf.
>>>
>>> What iomap is doing is correct behaviour. IOCB_NOWAIT applies to the
>>> filesystem behaviour, not the block device.
>>>
>>> REQ_NOWAIT can result in partial IO failures because the error is
>>> only reported to the iomap layer via IO completions. Hence we can
>>> split a DIO into multiple bios and have random bios in that IO fail
>>> with EAGAIN because REQ_NOWAIT is set. This error will
>>> get reported to the submitter via completion, and it will override
>>> any of the partial IOs that actually completed.
>>>
>>> Hence, like the recently reported multi-mapping IOCB_NOWAIT bug
>>> reported by Jens and fixed in commit 883a790a8440 ("xfs: don't allow
>>> NOWAIT DIO across extent boundaries") we'll get silent partial
>>> writes occurring because the second submitted bio in an IO can
>>> trigger EAGAIN errors with partial IO completion having already
>>> occurred.
>>>
>>> Further, we don't allow partial IO completion for DIO on XFS at all.
>>> DIO must be completely submitted and completed or return an error
>>> without having issued any IO at all.  Hence using REQ_NOWAIT for
>>> DIO bios is incorrect and not desirable.
>>
>> What you say makes total sense for a user using RWF_NOWAIT, but it
>> doesn't make a lot of sense for io_uring where we really want
>> IOCB_NOWAIT to be what it suggests it is - don't wait for other IO to
>> complete, if avoidable. One of the things that really suck with
>> aio/libai is the "yeah it's probably async, but lol, might not be"
>> aspect of it.
> 
> Sure, but we have no way of telling what semantics the IO issuer
> actually requires from above. And because IOCB_NOWAIT behaviour is
> directly exposed to userspace by RWF_NOWAIT, that's the behaviour we
> have to implement.
> 
>> For io_uring, if we do get -EAGAIN, we'll retry without NOWAIT set. So
>> the concern about fractured/short writes doesn't bubble up to the
>> application. Hence we really want an IOCB_NOWAIT_REALLY on that side,
>> instead of the poor mans IOCB_MAYBE_NOWAIT semantics.
> 
> Yup, perhaps what we really want is a true IOCB_NONBLOCK flag as an
> internal kernel implementation. i.e. don't block anywhere in the
> stack, and the caller must handle retrying/completing the entire IO
> regardless of where the -EAGAIN comes from during the IO, including
> from a partial completion that the caller is waiting for...
> 
> i.e. rather than hacking around this specific instance of "it blocks
> and we don't want it to", define the semantics and behaviour of a
> fully non-blocking IO through all layers from the VFS down to the
> hardware and let's implement that. Then we can stop playing
> whack-a-mole with all the "but it blocks when I do this, doctor!"
> issues that we seem to keep having.... :)
> 

From my opinion, the semantics of NOWAIT is clear, just like Jens said,
"don't wait, if**avoidable**". Also it should be the responsibility of
the callers who set this flag to resubmit the failed IO once a -EAGAIN
returned. e.g. if the user app sets RWF_NOWAIT, then it's the
responsibility of the user app to resubmit the failed IO once -EAGAIN
returned. If io_uring layer newly set IOCB_NONBLOCK flag, then it's the
responsibility of io_uring to resubmit.

The limitation here is that once a big DIO/bio got split, then the
atomicity of whole DIO/bio can not be guaranteed. This seems a
limitation of direct IO routine and block layer, and the work of above
layers e.g. filesystem, is needed to guarantee the atomicity.

-- 
Thanks,
Jeffle

  reply	other threads:[~2020-12-10  2:35 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-04  9:44 [PATCH v3 RESEND] iomap: set REQ_NOWAIT according to IOCB_NOWAIT in Direct IO Hao Xu
2020-12-04 11:44 ` Pavel Begunkov
2020-12-07  2:21 ` Dave Chinner
2020-12-07 23:40   ` Jens Axboe
2020-12-09 21:15     ` Dave Chinner
2020-12-10  2:33       ` JeffleXu [this message]
2020-12-08  5:46   ` JeffleXu
2020-12-09 21:23     ` Dave Chinner
2020-12-10  1:55       ` JeffleXu
2020-12-10  5:18         ` Dave Chinner
2020-12-11  2:50           ` JeffleXu
2020-12-14  2:56             ` Dave Chinner
2020-12-15  9:43               ` JeffleXu
2021-04-02 14:32                 ` Pavel Begunkov
2021-04-02 16:26                   ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59e5a477-d0b1-adda-34d1-2004a06fdef5@linux.alibaba.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox