public inbox for [email protected]
 help / color / mirror / Atom feed
From: Dave Chinner <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected]
Subject: Re: [PATCH 4/8] iomap: completed polled IO inline
Date: Sun, 23 Jul 2023 09:05:18 +1000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Fri, Jul 21, 2023 at 09:10:57PM -0600, Jens Axboe wrote:
> On 7/21/23 3:43?PM, Dave Chinner wrote:
> > On Thu, Jul 20, 2023 at 12:13:06PM -0600, Jens Axboe wrote:
> >> Polled IO is only allowed for conditions where task completion is safe
> >> anyway, so we can always complete it inline. This cannot easily be
> >> checked with a submission side flag, as the block layer may clear the
> >> polled flag and turn it into a regular IO instead. Hence we need to
> >> check this at completion time. If REQ_POLLED is still set, then we know
> >> that this IO was successfully polled, and is completing in task context.
> >>
> >> Signed-off-by: Jens Axboe <[email protected]>
> >> ---
> >>  fs/iomap/direct-io.c | 14 ++++++++++++--
> >>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> >> index 9f97d0d03724..c3ea1839628f 100644
> >> --- a/fs/iomap/direct-io.c
> >> +++ b/fs/iomap/direct-io.c
> >> @@ -173,9 +173,19 @@ void iomap_dio_bio_end_io(struct bio *bio)
> >>  	}
> >>  
> >>  	/*
> >> -	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline
> >> +	 * Flagged with IOMAP_DIO_INLINE_COMP, we can complete it inline.
> >> +	 * Ditto for polled requests - if the flag is still at completion
> >> +	 * time, then we know the request was actually polled and completion
> >> +	 * is called from the task itself. This is why we need to check it
> >> +	 * here rather than flag it at issue time.
> >>  	 */
> >> -	if (dio->flags & IOMAP_DIO_INLINE_COMP) {
> >> +	if ((dio->flags & IOMAP_DIO_INLINE_COMP) || (bio->bi_opf & REQ_POLLED)) {
> > 
> > This still smells wrong to me. Let me see if I can work out why...
> > 
> > <spelunk!>
> > 
> > When we set up the IO in iomap_dio_bio_iter(), we do this:
> > 
> >         /*
> >          * We can only poll for single bio I/Os.
> >          */
> >         if (need_zeroout ||
> >             ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
> >                 dio->iocb->ki_flags &= ~IOCB_HIPRI;
> > 
> > The "need_zeroout" covers writes into unwritten regions that require
> > conversion at IO completion, and the latter check is for writes
> > extending EOF. i.e. this covers the cases where we have dirty
> > metadata for this specific write and so may need transactions or
> > journal/metadata IO during IO completion.
> > 
> > The only case it doesn't cover is clearing IOCB_HIPRI for O_DSYNC IO
> > that may require a call to generic_write_sync() in completion. That
> > is, if we aren't using FUA, will not have IOMAP_DIO_INLINE_COMP set,
> > but still do polled IO.
> > 
> > I think this is a bug. We don't want to be issuing more IO in
> > REQ_POLLED task context during IO completion, and O_DSYNC IO
> > completion for non-FUA IO requires a journal flush and that can
> > issue lots of journal IO and wait on it in completion process.
> > 
> > Hence I think we should only be setting REQ_POLLED in the cases
> > where IOCB_HIPRI and IOMAP_DIO_INLINE_COMP are both set.  If
> > IOMAP_DIO_INLINE_COMP is set on the dio, then it doesn't matter what
> > context we are in at completion time or whether REQ_POLLED was set
> > or cleared during the IO....
> > 
> > That means the above check should be:
> > 
> >         /*
> >          * We can only poll for single bio I/Os that can run inline
> > 	 * completion.
> >          */
> >         if (need_zeroout ||
> > 	    (iocb_is_dsync(dio->iocb) && !use_fua) ||
> >             ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
> >                 dio->iocb->ki_flags &= ~IOCB_HIPRI;
> 
> Looks like you are right, it would not be a great idea to handle that
> off polled IO completion. It'd work just fine, but anything generating
> more IO should go to a helper. I'll make that change.
> 
> > or if we change the logic such that calculate IOMAP_DIO_INLINE_COMP
> > first:
> > 
> > 	if (!(dio->flags & IOMAP_DIO_INLINE_COMP))
> > 		dio->iocb->ki_flags &= ~IOCB_HIPRI;
> > 
> > Then we don't need to care about polled IO on the completion side at
> > all at the iomap layer because it doesn't change the completion
> > requirements at all...
> 
> That still isn't true, because you can still happily issue as polled IO
> and get it cleared and now have an IRQ based completion. This would work
> for most cases, but eg xfs dio end_io handler will grab:
> 
> spin_lock(&ip->i_flags_lock);
> 
> if the inode got truncated. Maybe that can't happen because we did
> inode_dio_begin() higher up?

Yes, truncate, hole punch, etc block on inode_dio_wait() with the
i_rwsem held which means it blocks new DIO submissions and waits
until all in-flight DIO before the truncate operation starts.
inode_dio_complete() does not get called until after the filesystem
->endio completion has run, so there's no possibility of
truncate-like operations actually racing with DIO completion at
all...

> Still seems saner to check for the polled
> flag at completion to me...

I disagree. If truncate (or anything that removes extents or reduces
inode size) is running whilst DIO to that range is still in
progress, we have a use-after-free situation that will cause data
and/or filesystem corruption. It's just not a safe thing to allow,
so we prevent it from occurring at a high level in the filesystem
and the result is that low level IO code just doesn't need to
care about races with layout/size changing operations...

-Dave.
-- 
Dave Chinner
[email protected]

  reply	other threads:[~2023-07-22 23:05 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-20 18:13 [PATCHSET v4 0/8] Improve async iomap DIO performance Jens Axboe
2023-07-20 18:13 ` [PATCH 1/8] iomap: cleanup up iomap_dio_bio_end_io() Jens Axboe
2023-07-21  6:14   ` Christoph Hellwig
2023-07-21 15:13   ` Darrick J. Wong
2023-07-20 18:13 ` [PATCH 2/8] iomap: add IOMAP_DIO_INLINE_COMP Jens Axboe
2023-07-21  6:14   ` Christoph Hellwig
2023-07-21 15:16   ` Darrick J. Wong
2023-07-20 18:13 ` [PATCH 3/8] iomap: treat a write through cache the same as FUA Jens Axboe
2023-07-21  6:15   ` Christoph Hellwig
2023-07-21 14:04     ` Jens Axboe
2023-07-21 15:55       ` Darrick J. Wong
2023-07-21 16:03         ` Jens Axboe
2023-07-20 18:13 ` [PATCH 4/8] iomap: completed polled IO inline Jens Axboe
2023-07-21  6:16   ` Christoph Hellwig
2023-07-21 15:19   ` Darrick J. Wong
2023-07-21 21:43   ` Dave Chinner
2023-07-22  3:10     ` Jens Axboe
2023-07-22 23:05       ` Dave Chinner [this message]
2023-07-24 22:35         ` Jens Axboe
2023-07-22 16:54     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 5/8] iomap: only set iocb->private for polled bio Jens Axboe
2023-07-21  6:18   ` Christoph Hellwig
2023-07-21 15:35   ` Darrick J. Wong
2023-07-21 15:37     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 6/8] fs: add IOCB flags related to passing back dio completions Jens Axboe
2023-07-21  6:18   ` Christoph Hellwig
2023-07-21 15:48   ` Darrick J. Wong
2023-07-21 15:53     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 7/8] io_uring/rw: add write support for IOCB_DIO_DEFER Jens Axboe
2023-07-21  6:19   ` Christoph Hellwig
2023-07-21 15:50   ` Darrick J. Wong
2023-07-21 15:53     ` Jens Axboe
2023-07-20 18:13 ` [PATCH 8/8] iomap: support IOCB_DIO_DEFER Jens Axboe
2023-07-21  6:19   ` Christoph Hellwig
2023-07-21 16:01   ` Darrick J. Wong
2023-07-21 16:30     ` Jens Axboe
2023-07-21 22:05   ` Dave Chinner
2023-07-22  3:12     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox