public inbox for [email protected]
* [PATCH V3] io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
@ 2023-09-08  9:30 Ming Lei
  2023-09-08 13:49 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Ming Lei @ 2023-09-08  9:30 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-block
  Cc: Ming Lei, David Howells, Pavel Begunkov, Chengming Zhou

io_wq_put_and_exit() is called from do_exit(), but FIXED_FILE requests in
io_wq are not canceled by io_uring_cancel_generic(), which is also called
from do_exit(). Meanwhile, the io_wq IO code path may share resources with
the normal iopoll code path.

So if a HIPRI request is submitted via io_wq, it may never get the
resources it needs to make progress, since iopoll isn't possible in
io_wq_put_and_exit().

The issue can be triggered when terminating 't/io_uring -n4 /dev/nullb0'
with default null_blk parameters.
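
For reference, a rough liburing-based sketch of this kind of workload (not
the t/io_uring tool itself; the queue depth, buffer size, device path and
the use of IOSQE_ASYNC to push polled requests to io_wq are illustrative
assumptions) could look like:

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdlib.h>
	#include <liburing.h>

	#define QD 64
	#define BS 4096

	int main(void)
	{
		struct io_uring ring;
		void *buf;
		int fd, i;

		/* IOPOLL ring: completions are reaped by polling, not IRQs */
		if (io_uring_queue_init(QD, &ring, IORING_SETUP_IOPOLL))
			return 1;

		fd = open("/dev/nullb0", O_RDONLY | O_DIRECT);
		if (fd < 0 || posix_memalign(&buf, BS, BS))
			return 1;

		/* register the file so the reads become FIXED_FILE requests */
		if (io_uring_register_files(&ring, &fd, 1))
			return 1;

		for (i = 0; i < QD; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			if (!sqe)
				break;
			/* fd index 0 is the registered file */
			io_uring_prep_read(sqe, 0, buf, BS, 0);
			sqe->flags |= IOSQE_FIXED_FILE | IOSQE_ASYNC;
		}
		io_uring_submit(&ring);

		/*
		 * Exit without reaping: do_exit() reaches io_wq_put_and_exit()
		 * while polled requests are still queued in io_wq.
		 */
		return 0;
	}

The actual reproducer is simply interrupting the t/io_uring command above,
which hits the same exit path.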

Fix it with the following two approaches:

- switch to IO_URING_F_NONBLOCK when submitting POLLED IO from io_wq, so
  that requests can be canceled if they are submitted from an exiting io_wq

- reap completed events before exiting io_wq, so that these completed
  requests don't hold resources and prevent other contexts from making
  progress

Closes: https://lore.kernel.org/linux-block/[email protected]/
Reported-by: David Howells <[email protected]>
Cc: Pavel Begunkov <[email protected]>
Cc: Chengming Zhou <[email protected]>
Signed-off-by: Ming Lei <[email protected]>
---
V3:
	- take a new approach and fix the regression on thread_exit in the
	  liburing tests
	- pass the liburing tests (make runtests)
V2:
	- avoid messing up io_uring_cancel_generic() by adding a new
	  helper for canceling io_wq requests

 io_uring/io_uring.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ad636954abae..95a3d31a1ef1 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1930,6 +1930,10 @@ void io_wq_submit_work(struct io_wq_work *work)
 		}
 	}
 
+	/* It is fragile to block POLLED IO, so switch to NON_BLOCK */
+	if ((req->ctx->flags & IORING_SETUP_IOPOLL) && def->iopoll_queue)
+		issue_flags |= IO_URING_F_NONBLOCK;
+
 	do {
 		ret = io_issue_sqe(req, issue_flags);
 		if (ret != -EAGAIN)
@@ -3363,6 +3367,12 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
 
+	/*
+	 * Reap events from each ctx, otherwise these requests may hold
+	 * resources and prevent other contexts from making progress.
+	 */
+	xa_for_each(&tctx->xa, index, node)
+		io_iopoll_try_reap_events(node->ctx);
 	io_uring_clean_tctx(tctx);
 	if (cancel_all) {
 		/*
-- 
2.40.1



Thread overview: 12+ messages
2023-09-08  9:30 [PATCH V3] io_uring: fix IO hang in io_wq_put_and_exit from do_exit() Ming Lei
2023-09-08 13:49 ` Jens Axboe
2023-09-08 14:34   ` Ming Lei
2023-09-08 14:44     ` Jens Axboe
2023-09-08 15:25       ` Ming Lei
2023-09-15  7:04         ` Jason Wang
2023-09-25 21:17           ` Stefan Hajnoczi
2023-09-26  1:28             ` Ming Lei
2023-09-26 14:55               ` Stefan Hajnoczi
2023-09-08 15:46   ` Pavel Begunkov
2023-09-09  1:43     ` Ming Lei
2023-09-13 12:53       ` Pavel Begunkov
