* [PATCHSET for-next 0/3] Fix read/write -EAGAIN failure cases
From: Jens Axboe @ 2025-01-09 18:15 UTC
To: io-uring
Hi,
After I had originally ensured that all read/write requests were
fully retryable (with the 6.10 "always allocate async data" changes),
there was an attempt or two at getting rid of these -EAGAINs bubbling
back to userspace. See:
commit 039a2e800bcd5beb89909d1a488abf3d647642cf
Author: Jens Axboe <[email protected]>
Date: Thu Apr 25 09:04:32 2024 -0600
io_uring/rw: reinstate thread check for retries
and the other two commits it references.
Here's another attempt, which simply moves this kind of checking to
completion time. This solves the issue of REQ_F_REISSUE being missed
in case it gets set outside of submission context, and generally just
cleans up the scattered REQ_F_REISSUE checking that otherwise needs to
happen in the read/write issue path.
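In sketch form (condensed from the hunks in patches 2 and 3, not the
literal code), the check moves from the issue path into the completion
batch:

        /*
         * Before: the issue path had to catch REQ_F_REISSUE itself
         * and turn it back into an -EAGAIN resubmit:
         */
        if (req->flags & REQ_F_REISSUE) {
                req->flags &= ~REQ_F_REISSUE;
                io_resubmit_prep(req);
                return -EAGAIN;
        }

        /*
         * After: io_free_batch_list() spots the flag once the request
         * is otherwise done, skips posting a CQE for it, and punts it
         * to io-wq for the retry instead:
         */
        if (req->flags & REQ_F_REISSUE) {
                req->flags &= ~REQ_F_REISSUE;
                io_queue_iowq(req);
                continue;
        }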
Patch 1 is just a cleanup, patch 2 does the actual change, and patch 3
finally removes the thread group checking in the "Can I resubmit"
checks.
io_uring/io_uring.c | 15 +++++++-
io_uring/rw.c | 89 ++++++++++++++-------------------------------
2 files changed, 40 insertions(+), 64 deletions(-)
--
Jens Axboe
* [PATCH 1/3] io_uring/rw: use io_rw_recycle() from cleanup path
From: Jens Axboe @ 2025-01-09 18:15 UTC
To: io-uring; +Cc: Jens Axboe
Cleanup should always be invoked with the uring lock held, so it's
safe to recycle from here.
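In sketch form, the hook becomes (matching the diff below;
lockdep_assert_held() documents the locking invariant and will
complain at runtime, with lockdep enabled, if it's ever violated):

        void io_readv_writev_cleanup(struct io_kiocb *req)
        {
                /*
                 * Always entered with the ring lock held, so recycling
                 * the async data (which requires the lock) is safe.
                 */
                lockdep_assert_held(&req->ctx->uring_lock);
                io_rw_recycle(req, 0);
        }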
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/rw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index ca1b19d3d142..afc669048c5d 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -434,7 +434,8 @@ int io_read_mshot_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
void io_readv_writev_cleanup(struct io_kiocb *req)
{
- io_rw_iovec_free(req->async_data);
+ lockdep_assert_held(&req->ctx->uring_lock);
+ io_rw_recycle(req, 0);
}
static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
--
2.47.1
* [PATCH 2/3] io_uring/rw: handle -EAGAIN retry at IO completion time
From: Jens Axboe @ 2025-01-09 18:15 UTC
To: io-uring; +Cc: Jens Axboe
Rather than try to have io_read/io_write turn REQ_F_REISSUE into
-EAGAIN, catch REQ_F_REISSUE when the request is otherwise considered
done. This is saner as we know it isn't happening during an actual
submission, and it removes the need to sprinkle REQ_F_REISSUE checks
throughout the read/write issue path.
If REQ_F_REISSUE is set, __io_submit_flush_completions() will skip
posting a CQE for this request, and the regular request cleaning will
ensure that it gets reissued via io-wq.
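The resulting flow, in brief (a condensed sketch of the hunks below,
not the literal code):

        /*
         * ->ki_complete(kiocb, -EAGAIN)
         *   __io_complete_rw_common()
         *     -> sets REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE rather than
         *        pushing -EAGAIN back toward userspace
         *
         * __io_submit_flush_completions()
         *   -> skips posting a CQE for any REQ_F_REISSUE request
         *
         * io_free_batch_list()
         *   -> clears REQ_F_REISSUE and calls io_queue_iowq(req), so
         *      the retry runs via io-wq and posts its CQE when done
         */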
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/io_uring.c | 15 +++++++--
io_uring/rw.c | 80 ++++++++++++++-------------------------------
2 files changed, 38 insertions(+), 57 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db198bd435b5..92ba2fdcd087 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -115,7 +115,7 @@
REQ_F_ASYNC_DATA)
#define IO_REQ_CLEAN_SLOW_FLAGS (REQ_F_REFCOUNT | REQ_F_LINK | REQ_F_HARDLINK |\
- IO_REQ_CLEAN_FLAGS)
+ REQ_F_REISSUE | IO_REQ_CLEAN_FLAGS)
#define IO_TCTX_REFS_CACHE_NR (1U << 10)
@@ -1403,6 +1403,12 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
comp_list);
if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
+ if (req->flags & REQ_F_REISSUE) {
+ node = req->comp_list.next;
+ req->flags &= ~REQ_F_REISSUE;
+ io_queue_iowq(req);
+ continue;
+ }
if (req->flags & REQ_F_REFCOUNT) {
node = req->comp_list.next;
if (!req_ref_put_and_test(req))
@@ -1442,7 +1448,12 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
- if (!(req->flags & REQ_F_CQE_SKIP) &&
+ /*
+ * Requests marked with REQUEUE should not post a CQE, they
+ * will go through the io-wq retry machinery and post one
+ * later.
+ */
+ if (!(req->flags & (REQ_F_CQE_SKIP | REQ_F_REISSUE)) &&
unlikely(!io_fill_cqe_req(ctx, req))) {
if (ctx->lockless_cq) {
spin_lock(&ctx->completion_lock);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index afc669048c5d..c52c0515f0a2 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -202,7 +202,7 @@ static void io_req_rw_cleanup(struct io_kiocb *req, unsigned int issue_flags)
* mean that the underlying data can be gone at any time. But that
* should be fixed seperately, and then this check could be killed.
*/
- if (!(req->flags & REQ_F_REFCOUNT)) {
+ if (!(req->flags & (REQ_F_REISSUE | REQ_F_REFCOUNT))) {
req->flags &= ~REQ_F_NEED_CLEANUP;
io_rw_recycle(req, issue_flags);
}
@@ -455,19 +455,12 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
return NULL;
}
-#ifdef CONFIG_BLOCK
-static void io_resubmit_prep(struct io_kiocb *req)
-{
- struct io_async_rw *io = req->async_data;
- struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-
- io_meta_restore(io, &rw->kiocb);
- iov_iter_restore(&io->iter, &io->iter_state);
-}
-
static bool io_rw_should_reissue(struct io_kiocb *req)
{
+#ifdef CONFIG_BLOCK
+ struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
umode_t mode = file_inode(req->file)->i_mode;
+ struct io_async_rw *io = req->async_data;
struct io_ring_ctx *ctx = req->ctx;
if (!S_ISBLK(mode) && !S_ISREG(mode))
@@ -488,17 +481,14 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
*/
if (!same_thread_group(req->tctx->task, current) || !in_task())
return false;
+
+ io_meta_restore(io, &rw->kiocb);
+ iov_iter_restore(&io->iter, &io->iter_state);
return true;
-}
#else
-static void io_resubmit_prep(struct io_kiocb *req)
-{
-}
-static bool io_rw_should_reissue(struct io_kiocb *req)
-{
return false;
-}
#endif
+}
static void io_req_end_write(struct io_kiocb *req)
{
@@ -525,22 +515,16 @@ static void io_req_io_end(struct io_kiocb *req)
}
}
-static bool __io_complete_rw_common(struct io_kiocb *req, long res)
+static void __io_complete_rw_common(struct io_kiocb *req, long res)
{
- if (unlikely(res != req->cqe.res)) {
- if (res == -EAGAIN && io_rw_should_reissue(req)) {
- /*
- * Reissue will start accounting again, finish the
- * current cycle.
- */
- io_req_io_end(req);
- req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
- return true;
- }
+ if (res == req->cqe.res)
+ return;
+ if (res == -EAGAIN && io_rw_should_reissue(req)) {
+ req->flags |= REQ_F_REISSUE | REQ_F_BL_NO_RECYCLE;
+ } else {
req_set_fail(req);
req->cqe.res = res;
}
- return false;
}
static inline int io_fixup_rw_res(struct io_kiocb *req, long res)
@@ -583,8 +567,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
struct io_kiocb *req = cmd_to_io_kiocb(rw);
if (!kiocb->dio_complete || !(kiocb->ki_flags & IOCB_DIO_CALLER_COMP)) {
- if (__io_complete_rw_common(req, res))
- return;
+ __io_complete_rw_common(req, res);
io_req_set_res(req, io_fixup_rw_res(req, res), 0);
}
req->io_task_work.func = io_req_rw_complete;
@@ -646,26 +629,19 @@ static int kiocb_done(struct io_kiocb *req, ssize_t ret,
if (ret >= 0 && req->flags & REQ_F_CUR_POS)
req->file->f_pos = rw->kiocb.ki_pos;
if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
- if (!__io_complete_rw_common(req, ret)) {
- /*
- * Safe to call io_end from here as we're inline
- * from the submission path.
- */
- io_req_io_end(req);
- io_req_set_res(req, final_ret,
- io_put_kbuf(req, ret, issue_flags));
- io_req_rw_cleanup(req, issue_flags);
- return IOU_OK;
- }
+ __io_complete_rw_common(req, ret);
+ /*
+ * Safe to call io_end from here as we're inline
+ * from the submission path.
+ */
+ io_req_io_end(req);
+ io_req_set_res(req, final_ret, io_put_kbuf(req, ret, issue_flags));
+ io_req_rw_cleanup(req, issue_flags);
+ return IOU_OK;
} else {
io_rw_done(&rw->kiocb, ret);
}
- if (req->flags & REQ_F_REISSUE) {
- req->flags &= ~REQ_F_REISSUE;
- io_resubmit_prep(req);
- return -EAGAIN;
- }
return IOU_ISSUE_SKIP_COMPLETE;
}
@@ -944,8 +920,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
if (ret == -EOPNOTSUPP && force_nonblock)
ret = -EAGAIN;
- if (ret == -EAGAIN || (req->flags & REQ_F_REISSUE)) {
- req->flags &= ~REQ_F_REISSUE;
+ if (ret == -EAGAIN) {
/* If we can poll, just do that. */
if (io_file_can_poll(req))
return -EAGAIN;
@@ -1154,11 +1129,6 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
else
ret2 = -EINVAL;
- if (req->flags & REQ_F_REISSUE) {
- req->flags &= ~REQ_F_REISSUE;
- ret2 = -EAGAIN;
- }
-
/*
* Raw bdev writes will return -EOPNOTSUPP for IOCB_NOWAIT. Just
* retry them without IOCB_NOWAIT.
--
2.47.1
* [PATCH 3/3] io_uring/rw: don't gate retry on completion context
From: Jens Axboe @ 2025-01-09 18:15 UTC
To: io-uring; +Cc: Jens Axboe, Haeuptle, Michael
nvme multipath reports seeing spurious -EAGAIN bubbling back to
userspace, caused by its internal retry handling going through a
kworker: completions then run outside the submitting thread group,
and io_rw_should_reissue() refuses the retry. However, any data that
needs preserving or importing for a read/write request has always
been handled at prep time, so this check can sanely be skipped.
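With the gate removed, io_rw_should_reissue() reduces to roughly the
following (a sketch of the post-patch state, condensed from this diff
and the one in patch 2; other pre-existing early-exit checks are
elided for brevity):

        static bool io_rw_should_reissue(struct io_kiocb *req)
        {
        #ifdef CONFIG_BLOCK
                struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
                umode_t mode = file_inode(req->file)->i_mode;
                struct io_async_rw *io = req->async_data;
                struct io_ring_ctx *ctx = req->ctx;

                if (!S_ISBLK(mode) && !S_ISREG(mode))
                        return false;
                /* ... other existing checks ... */
                if (percpu_ref_is_dying(&ctx->refs))
                        return false;
                /*
                 * No thread group / in_task() gate anymore: the iter
                 * and metadata were saved at prep time, so restoring
                 * them and reissuing is safe from any completion
                 * context.
                 */
                io_meta_restore(io, &rw->kiocb);
                iov_iter_restore(&io->iter, &io->iter_state);
                return true;
        #else
                return false;
        #endif
        }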
Reported-by: "Haeuptle, Michael" <[email protected]>
Link: https://lore.kernel.org/io-uring/DS7PR84MB31105C2C63CFA47BE8CBD6EE95102@DS7PR84MB3110.NAMPRD84.PROD.OUTLOOK.COM/
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/rw.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index c52c0515f0a2..ee5d38db9b48 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -475,12 +475,6 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
*/
if (percpu_ref_is_dying(&ctx->refs))
return false;
- /*
- * Play it safe and assume not safe to re-import and reissue if we're
- * not in the original thread group (or in task context).
- */
- if (!same_thread_group(req->tctx->task, current) || !in_task())
- return false;
io_meta_restore(io, &rw->kiocb);
iov_iter_restore(&io->iter, &io->iter_state);
--
2.47.1