From: Pavel Begunkov <[email protected]>
To: Dave Chinner <[email protected]>
Cc: Christian Brauner <[email protected]>,
[email protected], [email protected],
"Darrick J . Wong" <[email protected]>,
[email protected], wu lei <[email protected]>
Subject: Re: [PATCH v2 1/1] iomap: propagate nowait to block layer
Date: Tue, 4 Mar 2025 22:47:48 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 3/4/25 21:11, Dave Chinner wrote:
...
>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> index b521eb15759e..07c336fdf4f0 100644
>> --- a/fs/iomap/direct-io.c
>> +++ b/fs/iomap/direct-io.c
>> @@ -363,9 +363,14 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>> */
>> if (need_zeroout ||
>> ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
>> - ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
>> + ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode))) {
>> dio->flags &= ~IOMAP_DIO_CALLER_COMP;
>>
>> + if (!is_sync_kiocb(dio->iocb) &&
>> + (dio->iocb->ki_flags & IOCB_NOWAIT))
>> + return -EAGAIN;
>> + }
>
> How are we getting here with IOCB_NOWAIT IO? This is either
> sub-block unaligned write IO, it is a write IO that requires
> allocation (i.e. write beyond EOF), or we are doing a O_DSYNC write
> on hardware that doesn't support REQ_FUA.
>
> The first 2 cases should have already been filtered out by the
> filesystem before we ever get here. The latter doesn't require
> multiple IOs in IO submission - the O_DSYNC IO submission (if any is
> required) occurs from data IO completion context, and so it will not
> block IO submission at all.
>
> So what type of IO in what mapping condition is triggering the need
> to return EAGAIN here?
I didn't hit it but neither could prove that it's impossible.
I'll drop the hunk since you're saying it shouldn't be needed.
>> /*
>> * The rules for polled IO completions follow the guidelines as the
>> * ones we set for inline and deferred completions. If none of those
>> @@ -374,6 +379,23 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>> if (!(dio->flags & (IOMAP_DIO_INLINE_COMP|IOMAP_DIO_CALLER_COMP)))
>> dio->iocb->ki_flags &= ~IOCB_HIPRI;
>>
>> + bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic);
>> +
>> + if (!is_sync_kiocb(dio->iocb) && (dio->iocb->ki_flags & IOCB_NOWAIT)) {
>> + /*
>> + * This is nonblocking IO, and we might need to allocate
>> + * multiple bios. In this case, as we cannot guarantee that
>> + * one of the sub bios will not fail getting issued FOR NOWAIT
>> + * and as error results are coalesced across all of them, ask
>> + * for a retry of this from blocking context.
>> + */
>> + if (bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS + 1) >
>> + BIO_MAX_VECS)
>> + return -EAGAIN;
>> +
>> + bio_opf |= REQ_NOWAIT;
>> + }
>
> Ok, so this allows a max sized bio to be used. So, what, 1MB on 4kB
> page size is the maximum IO size for IOCB_NOWAIT now? I bet that's
> not documented anywhere....
>
> Ah. This doesn't fix the problem at all.
>
> Say, for exmaple, I have a hardware storage device with a max
> hardware IO size of 128kB. This is from the workstation I'm typing
> this email on:
>
> $ cat /sys/block/nvme0n1/queue/max_hw_sectors_kb
> 128
> $ cat /sys/block/nvme0n1/queue/max_segments
> 33
> $
>
> We build a 1MB bio above, set REQ_NOWAIT, then:
>
> submit_bio
> ....
> blk_mq_submit_bio
> __bio_split_to_limits(bio, &q->limits, &nr_segs);
> bio_split_rw()
> .....
> split:
> .....
> /*
> * We can't sanely support splitting for a REQ_NOWAIT bio. End it
> * with EAGAIN if splitting is required and return an error pointer.
> */
> if (bio->bi_opf & REQ_NOWAIT)
> return -EAGAIN;
>
>
> So, REQ_NOWAIT effectively limits bio submission to the maximum
> single IO size of the underlying storage. So, we can't use
> REQ_NOWAIT without actually looking at request queue limits before
> we start building the IO - yuk.
>
> REQ_NOWAIT still feels completely unusable to me....
Not great but better than not using the async path at all (i.e. executing
by a kernel worker) as far as io_uring concerned. It should cover a good
share of users, and io_uring has a fallback path, so userspace won't even
know about the error. The overhead per byte is less biting for 128K as well.
The patch already limits the change to async users only, but if you're
concerned about non-io_uring users, I can try to narrow it to io_uring
requests only, even though I don't think it's a good way.
Are you only concerned about the size being too restrictive or do you
see any other problems?
--
Pavel Begunkov
next prev parent reply other threads:[~2025-03-04 22:46 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-04 12:18 [PATCH v2 1/1] iomap: propagate nowait to block layer Pavel Begunkov
2025-03-04 16:07 ` Christoph Hellwig
2025-03-04 16:41 ` Pavel Begunkov
2025-03-04 16:59 ` Christoph Hellwig
2025-03-04 17:36 ` Jens Axboe
2025-03-04 23:26 ` Christoph Hellwig
2025-03-04 23:43 ` Jens Axboe
2025-03-04 23:49 ` Christoph Hellwig
2025-03-05 0:14 ` Pavel Begunkov
2025-03-05 0:18 ` Pavel Begunkov
2025-03-04 17:54 ` Pavel Begunkov
2025-03-04 23:28 ` Christoph Hellwig
2025-03-04 19:22 ` Darrick J. Wong
2025-03-04 20:35 ` Pavel Begunkov
2025-03-05 0:01 ` Christoph Hellwig
2025-03-05 0:45 ` Pavel Begunkov
2025-03-05 1:34 ` Christoph Hellwig
2025-03-04 21:11 ` Dave Chinner
2025-03-04 22:47 ` Pavel Begunkov [this message]
2025-03-04 23:40 ` Christoph Hellwig
2025-03-05 1:19 ` Dave Chinner
2025-03-05 14:10 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox