public inbox for [email protected]
 help / color / mirror / Atom feed
* [RFC 5.12] io-wq: cancel unbounded works on io-wq destroy
@ 2021-04-02 16:52 Pavel Begunkov
  2021-04-03  2:19 ` Jens Axboe
  0 siblings, 1 reply; 2+ messages in thread
From: Pavel Begunkov @ 2021-04-02 16:52 UTC (permalink / raw)
  To: Jens Axboe, io-uring

[  491.222908] INFO: task thread-exit:2490 blocked for more than 122 seconds.
[  491.222957] Call Trace:
[  491.222967]  __schedule+0x36b/0x950
[  491.222985]  schedule+0x68/0xe0
[  491.222994]  schedule_timeout+0x209/0x2a0
[  491.223003]  ? tlb_flush_mmu+0x28/0x140
[  491.223013]  wait_for_completion+0x8b/0xf0
[  491.223023]  io_wq_destroy_manager+0x24/0x60
[  491.223037]  io_wq_put_and_exit+0x18/0x30
[  491.223045]  io_uring_clean_tctx+0x76/0xa0
[  491.223061]  __io_uring_files_cancel+0x1b9/0x2e0
[  491.223068]  ? blk_finish_plug+0x26/0x40
[  491.223085]  do_exit+0xc0/0xb40
[  491.223099]  ? syscall_trace_enter.isra.0+0x1a1/0x1e0
[  491.223109]  __x64_sys_exit+0x1b/0x20
[  491.223117]  do_syscall_64+0x38/0x50
[  491.223131]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  491.223177] INFO: task iou-mgr-2490:2491 blocked for more than 122 seconds.
[  491.223194] Call Trace:
[  491.223198]  __schedule+0x36b/0x950
[  491.223206]  ? pick_next_task_fair+0xcf/0x3e0
[  491.223218]  schedule+0x68/0xe0
[  491.223225]  schedule_timeout+0x209/0x2a0
[  491.223236]  wait_for_completion+0x8b/0xf0
[  491.223246]  io_wq_manager+0xf1/0x1d0
[  491.223255]  ? recalc_sigpending+0x1c/0x60
[  491.223265]  ? io_wq_cpu_online+0x40/0x40
[  491.223272]  ret_from_fork+0x22/0x30

Cancel all unbound works on exit, otherwise do_exit() ->
io_uring_files_cancel() may wait a long time for io-wq destruction,
e.g. until someone sends a SIGKILL.

Suggested-by: Jens Axboe <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---

Not quite happy with it, as it cancels pipes and sockets, but it's
probably better than waiting indefinitely.

 fs/io-wq.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 433c4d3c3c1c..e2ab569e47b9 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -702,6 +702,11 @@ static void io_wq_cancel_pending(struct io_wq *wq)
 		io_wqe_cancel_pending_work(wq->wqes[node], &match);
 }
 
+static bool io_wq_cancel_unbounded(struct io_wq_work *work, void *data)
+{
+	return work->flags & IO_WQ_WORK_UNBOUND;
+}
+
 /*
  * Manager thread. Tasked with creating new workers, if we need them.
  */
@@ -736,6 +741,8 @@ static int io_wq_manager(void *data)
 
 	if (atomic_dec_and_test(&wq->worker_refs))
 		complete(&wq->worker_done);
+
+	io_wq_cancel_cb(wq, io_wq_cancel_unbounded, NULL, true);
 	wait_for_completion(&wq->worker_done);
 
 	spin_lock_irq(&wq->hash->wait.lock);
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 2+ messages in thread

* Re: [RFC 5.12] io-wq: cancel unbounded works on io-wq destroy
  2021-04-02 16:52 [RFC 5.12] io-wq: cancel unbounded works on io-wq destroy Pavel Begunkov
@ 2021-04-03  2:19 ` Jens Axboe
  0 siblings, 0 replies; 2+ messages in thread
From: Jens Axboe @ 2021-04-03  2:19 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 4/2/21 10:52 AM, Pavel Begunkov wrote:
> [  491.222908] INFO: task thread-exit:2490 blocked for more than 122 seconds.
> [  491.222957] Call Trace:
> [  491.222967]  __schedule+0x36b/0x950
> [  491.222985]  schedule+0x68/0xe0
> [  491.222994]  schedule_timeout+0x209/0x2a0
> [  491.223003]  ? tlb_flush_mmu+0x28/0x140
> [  491.223013]  wait_for_completion+0x8b/0xf0
> [  491.223023]  io_wq_destroy_manager+0x24/0x60
> [  491.223037]  io_wq_put_and_exit+0x18/0x30
> [  491.223045]  io_uring_clean_tctx+0x76/0xa0
> [  491.223061]  __io_uring_files_cancel+0x1b9/0x2e0
> [  491.223068]  ? blk_finish_plug+0x26/0x40
> [  491.223085]  do_exit+0xc0/0xb40
> [  491.223099]  ? syscall_trace_enter.isra.0+0x1a1/0x1e0
> [  491.223109]  __x64_sys_exit+0x1b/0x20
> [  491.223117]  do_syscall_64+0x38/0x50
> [  491.223131]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [  491.223177] INFO: task iou-mgr-2490:2491 blocked for more than 122 seconds.
> [  491.223194] Call Trace:
> [  491.223198]  __schedule+0x36b/0x950
> [  491.223206]  ? pick_next_task_fair+0xcf/0x3e0
> [  491.223218]  schedule+0x68/0xe0
> [  491.223225]  schedule_timeout+0x209/0x2a0
> [  491.223236]  wait_for_completion+0x8b/0xf0
> [  491.223246]  io_wq_manager+0xf1/0x1d0
> [  491.223255]  ? recalc_sigpending+0x1c/0x60
> [  491.223265]  ? io_wq_cpu_online+0x40/0x40
> [  491.223272]  ret_from_fork+0x22/0x30
> 
> Cancel all unbound works on exit, otherwise do_exit() ->
> io_uring_files_cancel() may wait a long time for io-wq destruction,
> e.g. until someone sends a SIGKILL.
> 
> Suggested-by: Jens Axboe <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> 
> Not quite happy with it, as it cancels pipes and sockets, but it's
> probably better than waiting indefinitely.

I don't think there's any other way; if it's not bounded execution,
we have to cancel it. The same thing would happen to these requests
if they were not punted async. It's either this, or "re-parenting"
the requests, if the exiting task is part of a ring that belongs
to a parent.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-04-03  2:19 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-02 16:52 [RFC 5.12] io-wq: cancel unbounded works on io-wq destroy Pavel Begunkov
2021-04-03  2:19 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox