From: Christoph Hellwig <hch@lst.de>
To: Christian Brauner <brauner@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
"Darrick J. Wong" <djwong@kernel.org>, Jan Kara <jack@suse.cz>,
Jens Axboe <axboe@kernel.dk>, Avi Kivity <avi@scylladb.com>,
Damien Le Moal <dlemoal@kernel.org>,
Naohiro Aota <naohiro.aota@wdc.com>,
Johannes Thumshirn <jth@kernel.org>,
linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
io-uring@vger.kernel.org
Subject: [PATCH 4/5] iomap: support write completions from interrupt context
Date: Wed, 12 Nov 2025 08:21:28 +0100
Message-ID: <20251112072214.844816-5-hch@lst.de>
In-Reply-To: <20251112072214.844816-1-hch@lst.de>

Completions for pure overwrites don't need to be deferred to a workqueue
as there is no work to be done, or at least no work that needs a user
context.  Set the IOMAP_DIO_INLINE_COMP flag by default for writes like we
already do for reads, and then clear it for all the cases that actually
do need a user context for completions to update the inode size or
record updates to the logical to physical mapping.

I've audited all users of the ->end_io callback, and they only require
user context for I/O that involves unwritten extents, COW, size
extensions, or error handling, and all of those still run from workqueue
context.

This restores the behavior of the old pre-iomap direct I/O code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
fs/iomap/direct-io.c | 55 +++++++++++++++++++++++++++++++++++---------
1 file changed, 44 insertions(+), 11 deletions(-)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index c4a883fa8ea5..df313232f422 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -184,6 +184,21 @@ static void iomap_dio_done(struct iomap_dio *dio)
if (dio->error)
dio->flags &= ~IOMAP_DIO_INLINE_COMP;
+ /*
+ * Never invalidate pages from this context to avoid deadlocks with
+ * buffered I/O completions when called from the ioend workqueue,
+ * or avoid sleeping when called directly from ->bi_end_io.
+ * Tough luck if you hit the tiny race with someone dirtying the range
+ * right between this check and the actual completion.
+ */
+ if ((dio->flags & IOMAP_DIO_WRITE) &&
+ (dio->flags & IOMAP_DIO_INLINE_COMP)) {
+ if (dio->iocb->ki_filp->f_mapping->nrpages)
+ dio->flags &= ~IOMAP_DIO_INLINE_COMP;
+ else
+ dio->flags |= IOMAP_DIO_NO_INVALIDATE;
+ }
+
if (dio->flags & IOMAP_DIO_INLINE_COMP) {
WRITE_ONCE(iocb->private, NULL);
iomap_dio_complete_work(&dio->aio.work);
@@ -234,15 +249,9 @@ u32 iomap_finish_ioend_direct(struct iomap_ioend *ioend)
/*
* Try to avoid another context switch for the completion given
* that we are already called from the ioend completion
- * workqueue, but never invalidate pages from this thread to
- * avoid deadlocks with buffered I/O completions. Tough luck if
- * you hit the tiny race with someone dirtying the range now
- * between this check and the actual completion.
+ * workqueue.
*/
- if (!dio->iocb->ki_filp->f_mapping->nrpages) {
- dio->flags |= IOMAP_DIO_INLINE_COMP;
- dio->flags |= IOMAP_DIO_NO_INVALIDATE;
- }
+ dio->flags |= IOMAP_DIO_INLINE_COMP;
iomap_dio_done(dio);
}
@@ -365,6 +374,16 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
else
dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
}
+
+ /*
+ * We can only do inline completion for pure overwrites that
+ * don't require additional I/O at completion time.
+ *
+ * This rules out writes that need zeroing or extent conversion,
+ * or extend the file size.
+ */
+ if (!iomap_dio_is_overwrite(iomap))
+ dio->flags &= ~IOMAP_DIO_INLINE_COMP;
} else {
bio_opf |= REQ_OP_READ;
}
@@ -624,10 +643,13 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
if (dio_flags & IOMAP_DIO_FSBLOCK_ALIGNED)
dio->flags |= IOMAP_DIO_FSBLOCK_ALIGNED;
- if (iov_iter_rw(iter) == READ) {
- /* reads can always complete inline */
- dio->flags |= IOMAP_DIO_INLINE_COMP;
+ /*
+ * Try to complete inline if we can. For reads this is always possible,
+ * but for writes we'll end up clearing this more often than not.
+ */
+ dio->flags |= IOMAP_DIO_INLINE_COMP;
+ if (iov_iter_rw(iter) == READ) {
if (iomi.pos >= dio->i_size)
goto out_free_dio;
@@ -669,6 +691,12 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
dio->flags |= IOMAP_DIO_WRITE_THROUGH;
}
+ /*
+ * Inode size updates must happen from process context.
+ */
+ if (iomi.pos + iomi.len > dio->i_size)
+ dio->flags &= ~IOMAP_DIO_INLINE_COMP;
+
/*
* Try to invalidate cache pages for the range we are writing.
* If this invalidation fails, let the caller fall back to
@@ -741,9 +769,14 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
* If all the writes we issued were already written through to the
* media, we don't need to flush the cache on IO completion. Clear the
* sync flag for this case.
+ *
+ * Otherwise clear the inline completion flag if any sync work is
+ * needed, as that needs to be performed from process context.
*/
if (dio->flags & IOMAP_DIO_WRITE_THROUGH)
dio->flags &= ~IOMAP_DIO_NEED_SYNC;
+ else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+ dio->flags &= ~IOMAP_DIO_INLINE_COMP;
/*
* We are about to drop our additional submission reference, which
--
2.47.3