From: Jens Axboe <[email protected]>
To: Dave Chinner <[email protected]>
Cc: [email protected], [email protected], [email protected],
[email protected]
Subject: Re: [PATCH 1/5] iomap: simplify logic for when a dio can get completed inline
Date: Wed, 19 Jul 2023 09:23:58 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 7/18/23 4:56?PM, Dave Chinner wrote:
> On Tue, Jul 18, 2023 at 01:49:16PM -0600, Jens Axboe wrote:
>> Currently iomap gates this on !IOMAP_DIO_WRITE, but this isn't entirely
>> accurate. Some writes can complete just fine inline. One such example is
>> polled IO, where the completion always happens in task context.
>>
>> Add IOMAP_DIO_INLINE_COMP which tells the completion side if we can
>> complete this dio inline, or if it needs punting to a workqueue. We set
>> this flag by default for any dio, and turn it off for unwritten extents
>> or blocks that require a sync at completion time.
>
> Ignoring the O_DSYNC case (I'll get to that at the end), this is
> still wrong - it misses extending writes that need to change file
> size at IO completion. For some filesystems, file extension at IO
> completion has the same constraints as unwritten extent conversion
> (i.e. requires locking and transactions), but the iomap
> infrastructure has no idea whether the filesystem performing the IO
> requires this or not.
>
> i.e. if iomap always punts unwritten extent IO to a workqueue, we
> also have to punt extending writes to a workqueue. Fundamentally,
> the iomap code itself cannot make a correct determination of whether
> IO completion of any specific write IO requires completion in task
> context.
>
> Only the filesystem knows that,
>
> However, the filesystem knows if the IO is going to need IO
> completion processing at submission time. It tells iomap that it
> needs completion processing via the IOMAP_F_DIRTY flag. This allows
> filesystems to determine what IOs iomap can consider as "writes that
> don't need filesystem completion processing".
>
> With this flag, iomap can optimise the IO appropriately. We can use
> REQ_FUA for O_DSYNC writes if IOMAP_F_DIRTY is not set. We can do
> inline completion if IOMAP_F_DIRTY is not set. But if IOMAP_F_DIRTY
> is set, the filesystem needs to run it's own completion processing,
> and so iomap cannot run that write with an inline completion.
Gotcha, so we need to gate INLINE_COMP on !IOMAP_F_DIRTY as well. I'll
make that change.
>> Gate the inline completion on whether we're in a task or not as well.
>> This will always be true for polled IO, but for IRQ driven IO, the
>> completion context may not allow for inline completions.
>
> Again, context does not matter for pure overwrites - we can complete
> them inline regardless of completion context. The task context only
> matters when the filesystem needs to do completion work, and we've
> already established that we are not doing inline completion
> for polled IO for unwritten, O_DSYNC or extending file writes.
Right, looks like I was just missing that bit as well, I assumed that
the previous case would co er it.
> IOWs, we already avoid polled completions for all the situations
> where IOMAP_F_DIRTY is set by the filesystem to indicate the
> operation is not a pure overwrite....
Yep
>> fs/iomap/direct-io.c | 14 +++++++++-----
>> 1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
>> index ea3b868c8355..6fa77094cf0a 100644
>> --- a/fs/iomap/direct-io.c
>> +++ b/fs/iomap/direct-io.c
>> @@ -20,6 +20,7 @@
>> * Private flags for iomap_dio, must not overlap with the public ones in
>> * iomap.h:
>> */
>> +#define IOMAP_DIO_INLINE_COMP (1 << 27)
>> #define IOMAP_DIO_WRITE_FUA (1 << 28)
>> #define IOMAP_DIO_NEED_SYNC (1 << 29)
>> #define IOMAP_DIO_WRITE (1 << 30)
>> @@ -161,15 +162,15 @@ void iomap_dio_bio_end_io(struct bio *bio)
>> struct task_struct *waiter = dio->submit.waiter;
>> WRITE_ONCE(dio->submit.waiter, NULL);
>> blk_wake_io_task(waiter);
>> - } else if (dio->flags & IOMAP_DIO_WRITE) {
>> + } else if ((dio->flags & IOMAP_DIO_INLINE_COMP) && in_task()) {
>
> Regardless of whether the code is correct or not, this needs a
> comment explaining what problem the in_task() check is working
> around...
It's meant to catch cases where we're doing polled IO, but it got
cleared/disabled in the block layer. We cannot catch this at submission
time, it has to be checked at completion time. There are a few ways we
could check for that, one would be in_task(), another would be to check
the bio REQ_POLLED flag like v1 did. I don't have a strong preference
here, though it did seem like a saner check to use in_task() as generic
catch-all for if we're doing this from soft/hard irq processing or not,
unexpectedly.
>> + WRITE_ONCE(dio->iocb->private, NULL);
>> + iomap_dio_complete_work(&dio->aio.work);
>> + } else {
>> struct inode *inode = file_inode(dio->iocb->ki_filp);
>>
>> WRITE_ONCE(dio->iocb->private, NULL);
>> INIT_WORK(&dio->aio.work, iomap_dio_complete_work);
>> queue_work(inode->i_sb->s_dio_done_wq, &dio->aio.work);
>> - } else {
>> - WRITE_ONCE(dio->iocb->private, NULL);
>> - iomap_dio_complete_work(&dio->aio.work);
>> }
>> }
>>
>> @@ -244,6 +245,7 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
>>
>> if (iomap->type == IOMAP_UNWRITTEN) {
>> dio->flags |= IOMAP_DIO_UNWRITTEN;
>> + dio->flags &= ~IOMAP_DIO_INLINE_COMP;
>> need_zeroout = true;
>> }
>>
>> @@ -500,7 +502,8 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> dio->i_size = i_size_read(inode);
>> dio->dops = dops;
>> dio->error = 0;
>> - dio->flags = 0;
>> + /* default to inline completion, turned off when not supported */
>> + dio->flags = IOMAP_DIO_INLINE_COMP;
>> dio->done_before = done_before;
>
> I think this is poorly coded. If we get the clearing logic
> wrong (as is the case in this patch) then bad things will
> happen when we run inline completion in an irq context when
> the filesystem needs to run a transaction. e.g. file extension.
Agree, it seems a bit fragile. The alternative is doing it the other way
around, enabling it for cases that we know it'll work for instead. I'll
take a stab at that approach along with the other changes.
> It looks to me like you hacked around this "default is wrong" case
> with the "in_task()" check in completion, but given that check is
> completely undocumented....
It's not a hacky work-around, it's a known case that could go wrong.
>
>> dio->submit.iter = iter;
>> @@ -535,6 +538,7 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
>> /* for data sync or sync, we need sync completion processing */
>> if (iocb_is_dsync(iocb)) {
>> dio->flags |= IOMAP_DIO_NEED_SYNC;
>> + dio->flags &= ~IOMAP_DIO_INLINE_COMP;
>
> This is looks wrong, too. We set IOMAP_DIO_WRITE_FUA ca couple of
> lines later, and during bio submission we check if REQ_FUA can be
> used if IOMAP_F_DIRTY is not set. If all the bios we submit use
> REQ_FUA, then we clear IOMAP_DIO_NEED_SYNC before we drop the dio
> submission reference.
>
> For such a REQ_FUA bio chains, we can now safely do inline
> completion because we don't run generic_write_sync() in IO
> completion now. The filesystem does not need to perform blocking or
> IO operations in completion, either, so these IOs can be completed
> in line like any other pure overwrite DIO....
True, non-extending FUA writes would be fine as well.
Thanks for the review!
--
Jens Axboe
next prev parent reply other threads:[~2023-07-19 15:24 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-18 19:49 [PATCHSET v2 0/5] Improve async iomap DIO performance Jens Axboe
2023-07-18 19:49 ` [PATCH] io_uring: Use io_schedule* in cqring wait Jens Axboe
2023-07-18 19:50 ` Jens Axboe
2023-07-18 19:49 ` [PATCH 1/5] iomap: simplify logic for when a dio can get completed inline Jens Axboe
2023-07-18 22:56 ` Dave Chinner
2023-07-19 15:23 ` Jens Axboe [this message]
2023-07-18 19:49 ` [PATCH 2/5] fs: add IOCB flags related to passing back dio completions Jens Axboe
2023-07-18 19:49 ` [PATCH 3/5] io_uring/rw: add write support for IOCB_DIO_DEFER Jens Axboe
2023-07-18 19:49 ` [PATCH 4/5] iomap: add local 'iocb' variable in iomap_dio_bio_end_io() Jens Axboe
2023-07-18 19:49 ` [PATCH 5/5] iomap: support IOCB_DIO_DEFER Jens Axboe
2023-07-18 23:50 ` Dave Chinner
2023-07-19 19:55 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox