From: "Darrick J. Wong" <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected]
Subject: Re: [PATCH 8/8] iomap: support IOCB_DIO_DEFER
Date: Fri, 21 Jul 2023 09:01:05 -0700
Message-ID: <20230721160105.GR11352@frogsfrogsfrogs>
In-Reply-To: <[email protected]>
On Thu, Jul 20, 2023 at 12:13:10PM -0600, Jens Axboe wrote:
> If IOCB_DIO_DEFER is set, utilize that to set kiocb->dio_complete handler
> and data for that callback. Rather than punt the completion to a
> workqueue, we pass back the handler and data to the issuer and will get a
> callback from a safe task context.
>
> Using the following fio job to randomly dio write 4k blocks at
> queue depths of 1..16:
>
> fio --name=dio-write --filename=/data1/file --time_based=1 \
> --runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
> --cpus_allowed=4 --ioengine=io_uring --iodepth=$depth
>
> shows the following results before and after this patch:
>
>         Stock   Patched   Diff
> =======================================
> QD1     155K    162K      + 4.5%
> QD2     290K    313K      + 7.9%
> QD4     533K    597K      +12.0%
> QD8     604K    827K      +36.9%
> QD16    615K    845K      +37.4%
Nice!
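For anyone who wants to double-check the Diff column, here is a minimal
sketch of the arithmetic. It assumes the Stock and Patched columns are
IOPS in thousands, and gain_tenths() is a made-up helper name, not
anything from the patch.

```c
#include <assert.h>

/*
 * Relative gain in tenths of a percent, rounded to nearest.
 * gain_tenths() is a hypothetical helper, not part of the patch.
 */
long gain_tenths(long stock_kiops, long patched_kiops)
{
	assert(stock_kiops > 0);
	assert(patched_kiops >= stock_kiops);	/* keeps the rounding simple */
	return ((patched_kiops - stock_kiops) * 1000 + stock_kiops / 2) /
		stock_kiops;
}
```

gain_tenths(604, 827) returns 369, which matches the +36.9% shown for QD8.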
> which shows nice wins all around. If we factored in per-IOP efficiency,
> the wins look even nicer. This becomes apparent as queue depth rises,
> as the offloaded workqueue completions run out of steam.
>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
> fs/iomap/direct-io.c | 54 +++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index cce9af019705..de86680968a4 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -20,6 +20,7 @@
> * Private flags for iomap_dio, must not overlap with the public ones in
> * iomap.h:
> */
> +#define IOMAP_DIO_DEFER_COMP (1 << 26)
IOMAP_DIO_CALLER_COMP, to go with IOCB_CALLER_COMP?
> #define IOMAP_DIO_INLINE_COMP (1 << 27)
> #define IOMAP_DIO_STABLE_WRITE (1 << 28)
> #define IOMAP_DIO_NEED_SYNC (1 << 29)
> @@ -132,6 +133,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
> }
> EXPORT_SYMBOL_GPL(iomap_dio_complete);
>
> +static ssize_t iomap_dio_deferred_complete(void *data)
> +{
> + return iomap_dio_complete(data);
> +}
> +
> static void iomap_dio_complete_work(struct work_struct *work)
> {
> struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
> @@ -192,6 +198,31 @@ void iomap_dio_bio_end_io(struct bio *bio)
> goto release_bio;
> }
>
> + /*
> + * If this dio is flagged with IOMAP_DIO_DEFER_COMP, then schedule
> + * our completion that way to avoid an async punt to a workqueue.
> + */
> + if (dio->flags & IOMAP_DIO_DEFER_COMP) {
> + /* only polled IO cares about private cleared */
> + iocb->private = dio;
> + iocb->dio_complete = iomap_dio_deferred_complete;
> +
> + /*
> + * Invoke ->ki_complete() directly. We've assigned out
"We've assigned our..."
> + * dio_complete callback handler, and since the issuer set
> + * IOCB_DIO_DEFER, we know their ki_complete handler will
> + * notice ->dio_complete being set and will defer calling that
> + * handler until it can be done from a safe task context.
> + *
> + * Note that the 'res' being passed in here is not important
> + * for this case. The actual completion value of the request
> + * will be gotten from dio_complete when that is run by the
> + * issuer.
> + */
> + iocb->ki_complete(iocb, 0);
> + goto release_bio;
> + }
> +
> /*
> * Async DIO completion that requires filesystem level completion work
> * gets punted to a work queue to complete as the operation may require
> @@ -288,12 +319,17 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> * after IO completion such as unwritten extent conversion) and
> * the underlying device either supports FUA or doesn't have
> * a volatile write cache. This allows us to avoid cache flushes
> - * on IO completion.
> + * on IO completion. If we can't use stable writes and need to
"If we can't use writethrough and need to sync..."
> + * sync, disable in-task completions as dio completion will
> + * need to call generic_write_sync() which will do a blocking
> + * fsync / cache flush call.
> */
> if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
> (dio->flags & IOMAP_DIO_STABLE_WRITE) &&
> (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
> use_fua = true;
> + else if (dio->flags & IOMAP_DIO_NEED_SYNC)
> + dio->flags &= ~IOMAP_DIO_DEFER_COMP;
> }
>
> /*
> @@ -319,6 +355,13 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
> pad = pos & (fs_block_size - 1);
> if (pad)
> iomap_dio_zero(iter, dio, pos - pad, pad);
> +
> + /*
> + * If need_zeroout is set, then this is a new or unwritten
> + * extent. These need extra handling at completion time, so
"...then this is a new or unwritten extent, or dirty file metadata have
not been persisted to disk."
> + * disable in-task deferred completion for those.
> + */
> + dio->flags &= ~IOMAP_DIO_DEFER_COMP;
> }
>
> /*
> @@ -557,6 +600,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
> iomi.flags |= IOMAP_WRITE;
> dio->flags |= IOMAP_DIO_WRITE;
>
> + /*
> + * Flag as supporting deferred completions, if the issuer
> + * groks it. This can avoid a workqueue punt for writes.
> + * We may later clear this flag if we need to do other IO
> + * as part of this IO completion.
> + */
> + if (iocb->ki_flags & IOCB_DIO_DEFER)
> + dio->flags |= IOMAP_DIO_DEFER_COMP;
> +
With those comment clarifications added,
Reviewed-by: Darrick J. Wong <[email protected]>
--D
> if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
> ret = -EAGAIN;
> if (iomi.pos >= dio->i_size ||
> --
> 2.40.1
>
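To make the handoff described in the end_io comment concrete, here is a
rough user-space model of it. This is only a sketch under stated
assumptions: mock_kiocb, mock_dio and every mock_* function are
hypothetical stand-ins, not the kernel's struct kiocb, struct iomap_dio
or the real io_uring handlers. It shows the end_io side handing back a
handler plus data, the issuer's ->ki_complete() noticing ->dio_complete
and deferring, and the real result only appearing once the issuer runs
the handler later.

```c
#include <assert.h>

/*
 * User-space model only. mock_kiocb and mock_dio are hypothetical
 * stand-ins, not the kernel's struct kiocb or struct iomap_dio.
 */
struct mock_dio {
	long result;	/* value the real completion would return */
	int completed;	/* nonzero once the completion has run */
};

struct mock_kiocb {
	void *private;
	long (*dio_complete)(void *data);
	void (*ki_complete)(struct mock_kiocb *iocb, long res);
	int deferred;	/* issuer noticed ->dio_complete and queued it */
	long final_res;	/* what the issuer finally reports */
};

/* Plays the role of iomap_dio_deferred_complete(). */
static long mock_dio_complete(void *data)
{
	struct mock_dio *dio = data;

	dio->completed = 1;
	return dio->result;
}

/* Issuer's ->ki_complete(): defer if a handler was handed back. */
static void mock_issuer_ki_complete(struct mock_kiocb *iocb, long res)
{
	if (iocb->dio_complete) {
		iocb->deferred = 1;	/* 'res' is ignored on this path */
		return;
	}
	iocb->final_res = res;
}

/* Plays the role of the bio end_io path (think: interrupt context). */
static void mock_end_io(struct mock_kiocb *iocb, struct mock_dio *dio)
{
	iocb->private = dio;
	iocb->dio_complete = mock_dio_complete;
	iocb->ki_complete(iocb, 0);
}

/* Issuer running later from a context where blocking is fine. */
static void mock_issuer_task_work(struct mock_kiocb *iocb)
{
	if (iocb->deferred)
		iocb->final_res = iocb->dio_complete(iocb->private);
}

/* Drive one request through both stages and return the reported result. */
long mock_run_deferred(long result)
{
	struct mock_dio dio = { .result = result, .completed = 0 };
	struct mock_kiocb iocb = { .ki_complete = mock_issuer_ki_complete };

	mock_end_io(&iocb, &dio);
	/* Nothing has completed yet; the work was only handed back. */
	assert(iocb.deferred && !dio.completed);

	mock_issuer_task_work(&iocb);
	assert(dio.completed);
	return iocb.final_res;
}
```

The 0 passed to ->ki_complete() never reaches the caller in this model,
which mirrors the note in the patch comment that 'res' is not important
on the deferred path.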