From: Jens Axboe <[email protected]>
To: [email protected], [email protected]
Cc: [email protected], [email protected], [email protected],
Jens Axboe <[email protected]>
Subject: [PATCH 6/6] iomap: support IOCB_DIO_DEFER
Date: Wed, 19 Jul 2023 13:54:17 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
If IOCB_DIO_DEFER is set, utilize that to set kiocb->dio_complete handler
and data for that callback. Rather than punt the completion to a
workqueue, we pass back the handler and data to the issuer and will get a
callback from a safe task context.
Using the following fio job to randomly dio write 4k blocks at
queue depths of 1..16:
fio --name=dio-write --filename=/data1/file --time_based=1 \
--runtime=10 --bs=4096 --rw=randwrite --norandommap --buffered=0 \
--cpus_allowed=4 --ioengine=io_uring --iodepth=$depth
shows the following results before and after this patch:
Stock Patched Diff
=======================================
QD1 155K 162K + 4.5%
QD2 290K 313K + 7.9%
QD4 533K 597K +12.0%
QD8 604K 827K +36.9%
QD16 615K 845K +37.4%
which shows nice wins all around. If we factored in per-IOP efficiency,
the wins look even nicer. This becomes apparent as queue depth rises,
as the offloaded workqueue completions runs out of steam.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/iomap/direct-io.c | 47 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index b30c3edf2ef3..b7055d50dd99 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -20,6 +20,7 @@
* Private flags for iomap_dio, must not overlap with the public ones in
* iomap.h:
*/
+#define IOMAP_DIO_DEFER_COMP (1 << 26)
#define IOMAP_DIO_INLINE_COMP (1 << 27)
#define IOMAP_DIO_WRITE_FUA (1 << 28)
#define IOMAP_DIO_NEED_SYNC (1 << 29)
@@ -131,6 +132,11 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
}
EXPORT_SYMBOL_GPL(iomap_dio_complete);
+static ssize_t iomap_dio_deferred_complete(void *data)
+{
+ return iomap_dio_complete(data);
+}
+
static void iomap_dio_complete_work(struct work_struct *work)
{
struct iomap_dio *dio = container_of(work, struct iomap_dio, aio.work);
@@ -180,6 +186,31 @@ void iomap_dio_bio_end_io(struct bio *bio)
goto release_bio;
}
+ /*
+ * If this dio is flagged with IOMAP_DIO_DEFER_COMP, then schedule
+ * our completion that way to avoid an async punt to a workqueue.
+ */
+ if (dio->flags & IOMAP_DIO_DEFER_COMP) {
+ /* only polled IO cares about private cleared */
+ iocb->private = dio;
+ iocb->dio_complete = iomap_dio_deferred_complete;
+
+ /*
+ * Invoke ->ki_complete() directly. We've assigned out
+ * dio_complete callback handler, and since the issuer set
+ * IOCB_DIO_DEFER, we know their ki_complete handler will
+ * notice ->dio_complete being set and will defer calling that
+ * handler until it can be done from a safe task context.
+ *
+ * Note that the 'res' being passed in here is not important
+ * for this case. The actual completion value of the request
+ * will be gotten from dio_complete when that is run by the
+ * issuer.
+ */
+ iocb->ki_complete(iocb, 0);
+ goto release_bio;
+ }
+
/*
* Async DIO completion that requires filesystem level completion work
* gets punted to a work queue to complete as the operation may require
@@ -277,12 +308,15 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
* data IO that doesn't require any metadata updates (including
* after IO completion such as unwritten extent conversion) and
* the underlying device supports FUA. This allows us to avoid
- * cache flushes on IO completion.
+ * cache flushes on IO completion. If we can't use FUA and
+ * need to sync, disable in-task completions.
*/
if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
(dio->flags & IOMAP_DIO_WRITE_FUA) &&
(bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
use_fua = true;
+ else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+ dio->flags &= ~IOMAP_DIO_DEFER_COMP;
}
/*
@@ -308,6 +342,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter,
pad = pos & (fs_block_size - 1);
if (pad)
iomap_dio_zero(iter, dio, pos - pad, pad);
+
+ dio->flags &= ~IOMAP_DIO_DEFER_COMP;
}
/*
@@ -547,6 +583,15 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
iomi.flags |= IOMAP_WRITE;
dio->flags |= IOMAP_DIO_WRITE;
+ /*
+ * Flag as supporting deferred completions, if the issuer
+ * groks it. This can avoid a workqueue punt for writes.
+ * We may later clear this flag if we need to do other IO
+ * as part of this IO completion.
+ */
+ if (iocb->ki_flags & IOCB_DIO_DEFER)
+ dio->flags |= IOMAP_DIO_DEFER_COMP;
+
if (dio_flags & IOMAP_DIO_OVERWRITE_ONLY) {
ret = -EAGAIN;
if (iomi.pos >= dio->i_size ||
--
2.40.1
next prev parent reply other threads:[~2023-07-19 19:54 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-19 19:54 [PATCHSET v3 0/6] Improve async iomap DIO performance Jens Axboe
2023-07-19 19:54 ` [PATCH 1/6] iomap: cleanup up iomap_dio_bio_end_io() Jens Axboe
2023-07-20 4:50 ` Christoph Hellwig
2023-07-20 16:13 ` Jens Axboe
2023-07-19 19:54 ` [PATCH 2/6] iomap: add IOMAP_DIO_INLINE_COMP Jens Axboe
2023-07-20 4:51 ` Christoph Hellwig
2023-07-20 16:19 ` Jens Axboe
2023-07-19 19:54 ` [PATCH 3/6] iomap: treat a write through cache the same as FUA Jens Axboe
2023-07-20 4:54 ` Christoph Hellwig
2023-07-20 16:23 ` Jens Axboe
2023-07-19 19:54 ` [PATCH 4/6] fs: add IOCB flags related to passing back dio completions Jens Axboe
2023-07-20 5:01 ` Christoph Hellwig
2023-07-20 16:25 ` Jens Axboe
2023-07-19 19:54 ` [PATCH 5/6] io_uring/rw: add write support for IOCB_DIO_DEFER Jens Axboe
2023-07-19 19:54 ` Jens Axboe [this message]
2023-07-20 4:59 ` [PATCH 6/6] iomap: support IOCB_DIO_DEFER Christoph Hellwig
2023-07-20 16:27 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox