public inbox for io-uring@vger.kernel.org
* [PATCHSET 0/2] Avoid spurious syzbot induced hung task panics
@ 2026-01-21 23:22 Jens Axboe
  2026-01-21 23:22 ` [PATCH 1/2] io_uring: add IO_URING_EXIT_WAIT_MAX definition Jens Axboe
  2026-01-21 23:22 ` [PATCH 2/2] io_uring/io-wq: don't trigger hung task for syzbot craziness Jens Axboe
  0 siblings, 2 replies; 3+ messages in thread
From: Jens Axboe @ 2026-01-21 23:22 UTC
  To: io-uring

Hi,

For details, see this saga:

https://lore.kernel.org/io-uring/68a2decc.050a0220.e29e5.0099.GAE@google.com/

where the tl;dr is that there's no real bug here. It's just syzbot doing
hundreds of 2GB /dev/msr* reads in a tiny VM with a bunch of debugging
enabled. That leads to triggering the hung task detector when
we wait on io-wq workers to exit. I did queue a patch for 6.19 that
makes this less likely to occur, as it'll only run the very first of
the items before noticing the cancelation:

https://lore.kernel.org/io-uring/937c3e38-368e-43eb-9d7e-2dcc0697799f@kernel.dk/

but even that isn't quite enough due to how much syzbot overloads the
system.

This will still throw a WARN_ON_ONCE(), which perhaps should just be a
printk() of some sort as the trace isn't THAT interesting. But it will
avoid hitting the hung task timeout detector, which for syzbot leads
to a panic + reboot.

 io_uring/io-wq.c    | 22 +++++++++++++++++++++-
 io_uring/io_uring.c |  2 +-
 io_uring/io_uring.h |  6 ++++++
 3 files changed, 28 insertions(+), 2 deletions(-)

-- 
Jens Axboe



* [PATCH 1/2] io_uring: add IO_URING_EXIT_WAIT_MAX definition
  2026-01-21 23:22 [PATCHSET 0/2] Avoid spurious syzbot induced hung task panics Jens Axboe
@ 2026-01-21 23:22 ` Jens Axboe
  2026-01-21 23:22 ` [PATCH 2/2] io_uring/io-wq: don't trigger hung task for syzbot craziness Jens Axboe
  1 sibling, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2026-01-21 23:22 UTC
  To: io-uring; +Cc: Jens Axboe

Add a define for the timeout we normally wait before complaining about
things being stuck waiting for cancelations to complete, and use it in
io_ring_exit_work().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 2 +-
 io_uring/io_uring.h | 6 ++++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b7a077c11c21..8f01e8503a64 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2984,7 +2984,7 @@ static __cold void io_tctx_exit_cb(struct callback_head *cb)
 static __cold void io_ring_exit_work(struct work_struct *work)
 {
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
-	unsigned long timeout = jiffies + HZ * 60 * 5;
+	unsigned long timeout = jiffies + IO_URING_EXIT_WAIT_MAX;
 	unsigned long interval = HZ / 20;
 	struct io_tctx_exit exit;
 	struct io_tctx_node *node;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index a790c16854d3..db5350d3ca3f 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -89,6 +89,12 @@ struct io_ctx_config {
 			IOSQE_BUFFER_SELECT |\
 			IOSQE_CQE_SKIP_SUCCESS)
 
+/*
+ * Complaint timeout for io_uring cancelation exits, and for io-wq exit
+ * worker waiting.
+ */
+#define IO_URING_EXIT_WAIT_MAX	(HZ * 60 * 5)
+
 enum {
 	IOU_COMPLETE		= 0,
 
-- 
2.51.0
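
As a rough illustration of what the new define encodes, here is a minimal
userspace sketch (not kernel code). The helper names exit_wait_max() and
tick_after() are hypothetical stand-ins for IO_URING_EXIT_WAIT_MAX and the
kernel's time_after(), and the tick rate is passed in as a parameter because
HZ is a kernel build-time constant. It shows that HZ * 60 * 5 is five minutes
expressed in ticks, and why the deadline comparison is done on the signed
difference so it survives counter wraparound.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for IO_URING_EXIT_WAIT_MAX:
 * the number of ticks in five minutes for a given tick rate.
 */
static unsigned long exit_wait_max(unsigned long hz)
{
	return hz * 60 * 5;
}

/*
 * Same idea as the kernel's time_after(a, b): true if tick count a is
 * later than b. Comparing the signed difference keeps the result correct
 * when the unsigned counter wraps between the two samples.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}
```

With a tick rate of 1000 the limit works out to 300000 ticks, and a deadline
computed just before the counter wraps still compares correctly.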



* [PATCH 2/2] io_uring/io-wq: don't trigger hung task for syzbot craziness
  2026-01-21 23:22 [PATCHSET 0/2] Avoid spurious syzbot induced hung task panics Jens Axboe
  2026-01-21 23:22 ` [PATCH 1/2] io_uring: add IO_URING_EXIT_WAIT_MAX definition Jens Axboe
@ 2026-01-21 23:22 ` Jens Axboe
  1 sibling, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2026-01-21 23:22 UTC
  To: io-uring; +Cc: Jens Axboe

Use the same trick that blk_io_schedule() does to avoid triggering the
hung task warning (and potential reboot/panic, depending on system
settings), and only wait for half the hung task timeout at a time.
If we exceed the default IO_URING_EXIT_WAIT_MAX period where we expect
things to certainly have finished unless there's a bug, then throw a
WARN_ON_ONCE() for that case.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io-wq.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 2fa7d3601edb..aa670909fece 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -17,6 +17,7 @@
 #include <linux/task_work.h>
 #include <linux/audit.h>
 #include <linux/mmu_context.h>
+#include <linux/sched/sysctl.h>
 #include <uapi/linux/io_uring.h>
 
 #include "io-wq.h"
@@ -1313,6 +1314,8 @@ static void io_wq_cancel_tw_create(struct io_wq *wq)
 
 static void io_wq_exit_workers(struct io_wq *wq)
 {
+	unsigned long timeout, warn_timeout;
+
 	if (!wq->task)
 		return;
 
@@ -1322,7 +1325,24 @@ static void io_wq_exit_workers(struct io_wq *wq)
 	io_wq_for_each_worker(wq, io_wq_worker_wake, NULL);
 	rcu_read_unlock();
 	io_worker_ref_put(wq);
-	wait_for_completion(&wq->worker_done);
+
+	/*
+	 * Shut up hung task complaint, see for example
+	 *
+	 * https://lore.kernel.org/all/696fc9e7.a70a0220.111c58.0006.GAE@google.com/
+	 *
+	 * where completely overloading the system with tons of long running
+	 * io-wq items can easily trigger the hung task timeout. Only sleep
+	 * uninterruptibly for half that time, and warn if we end up waiting
+	 * more than IO_URING_EXIT_WAIT_MAX.
+	 */
+	timeout = sysctl_hung_task_timeout_secs * HZ / 2;
+	warn_timeout = jiffies + IO_URING_EXIT_WAIT_MAX;
+	do {
+		if (wait_for_completion_timeout(&wq->worker_done, timeout))
+			break;
+		WARN_ON_ONCE(time_after(jiffies, warn_timeout));
+	} while (1);
 
 	spin_lock_irq(&wq->hash->wait.lock);
 	list_del_init(&wq->wait.entry);
-- 
2.51.0
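
To make the chunked wait easier to reason about, here is a small
deterministic userspace model of the loop (a sketch, not the kernel
implementation). Everything in it is hypothetical: struct wait_result,
chunked_wait(), and the simulated tick counts are illustrative names only.
The zero-timeout branch is an assumption modeled on blk_io_schedule(), which
as far as I know falls back to an untimed sleep when the hung task timeout is
0; the patch as posted has no such branch.

```c
#include <stdbool.h>

struct wait_result {
	unsigned long chunks;	/* number of sleeps taken */
	unsigned long waited;	/* total simulated ticks spent waiting */
	bool warned;		/* would the one-shot warning have fired? */
};

/*
 * Simulate waiting for a completion that fires done_at ticks from now,
 * sleeping in chunks of half the hung task timeout, and flagging a
 * warning once the total wait exceeds warn_after ticks.
 */
static struct wait_result chunked_wait(unsigned long done_at,
				       unsigned long hung_timeout_secs,
				       unsigned long hz,
				       unsigned long warn_after)
{
	struct wait_result r = { 0, 0, false };
	unsigned long chunk = hung_timeout_secs * hz / 2;

	if (!chunk) {
		/* Assumption: detector disabled, so one untimed sleep. */
		r.chunks = 1;
		r.waited = done_at;
		return r;
	}

	for (;;) {
		unsigned long left = done_at - r.waited;

		r.chunks++;
		if (left <= chunk) {
			/* Completion arrived within this chunk. */
			r.waited = done_at;
			break;
		}
		/* Chunk timed out: account for it, then check the limit. */
		r.waited += chunk;
		if (r.waited > warn_after)
			r.warned = true;
	}
	return r;
}
```

With a tick rate of 100 and a 120 second hung task timeout, each sleep is
6000 ticks. A completion arriving after 15000 ticks takes three sleeps and
stays silent, while one arriving after 40000 ticks crosses the 30000 tick
(five minute) limit and would warn. No single sleep ever lasts long enough
for the hung task detector to notice.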


