* [PATCH 1/2] io_uring: break out of iowq iopoll on teardown
From: Pavel Begunkov @ 2023-09-07 12:50 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
io-wq will retry iopoll even when it fails with -EAGAIN. If that
races with task exit, which sets TIF_NOTIFY_SIGNAL for all of its
workers, such workers may end up spinning indefinitely, retrying iopoll
and failing each time on some allocation / waiting / etc. Don't
keep spinning if io-wq is dying.
Fixes: 561fb04a6a225 ("io_uring: replace workqueue usage with io-wq")
Cc: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io-wq.c | 10 ++++++++++
io_uring/io-wq.h | 1 +
io_uring/io_uring.c | 2 ++
3 files changed, 13 insertions(+)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 62f345587df5..1ecc8c748768 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -174,6 +174,16 @@ static void io_worker_ref_put(struct io_wq *wq)
complete(&wq->worker_done);
}
+bool io_wq_worker_stopped(void)
+{
+ struct io_worker *worker = current->worker_private;
+
+ if (WARN_ON_ONCE(!io_wq_current_is_worker()))
+ return true;
+
+ return test_bit(IO_WQ_BIT_EXIT, &worker->wq->state);
+}
+
static void io_worker_cancel_cb(struct io_worker *worker)
{
struct io_wq_acct *acct = io_wq_get_acct(worker);
diff --git a/io_uring/io-wq.h b/io_uring/io-wq.h
index 06d9ca90c577..2b2a6406dd8e 100644
--- a/io_uring/io-wq.h
+++ b/io_uring/io-wq.h
@@ -52,6 +52,7 @@ void io_wq_hash_work(struct io_wq_work *work, void *val);
int io_wq_cpu_affinity(struct io_uring_task *tctx, cpumask_var_t mask);
int io_wq_max_workers(struct io_wq *wq, int *new_count);
+bool io_wq_worker_stopped(void);
static inline bool io_wq_is_hashed(struct io_wq_work *work)
{
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 88599852af82..4674203c1cac 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1942,6 +1942,8 @@ void io_wq_submit_work(struct io_wq_work *work)
if (!needs_poll) {
if (!(req->ctx->flags & IORING_SETUP_IOPOLL))
break;
+ if (io_wq_worker_stopped())
+ break;
cond_resched();
continue;
}
--
2.41.0
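[Editor's note: for illustration, below is a minimal userspace sketch of the
retry loop this patch breaks out of. The names issue_iopoll_request() and
wq_exiting are hypothetical stand-ins, not kernel API; only the shape of the
loop mirrors io_wq_submit_work(), and the helper models the new
io_wq_worker_stopped().]

#include <stdbool.h>
#include <errno.h>

static bool wq_exiting; /* models IO_WQ_BIT_EXIT in wq->state */

/* Hypothetical request that keeps failing, e.g. on allocation / waiting. */
static int issue_iopoll_request(void)
{
	return -EAGAIN;
}

/* Models the new helper: true once the io-wq is being torn down. */
static bool io_wq_worker_stopped(void)
{
	return wq_exiting;
}

/* Shape of the iopoll retry loop in io_wq_submit_work(). */
static void submit_work_model(void)
{
	do {
		if (issue_iopoll_request() != -EAGAIN)
			break;
		/*
		 * Without this check, a worker whose task is exiting
		 * (TIF_NOTIFY_SIGNAL set) can fail every attempt and
		 * spin here forever; bail out once io-wq is dying.
		 */
		if (io_wq_worker_stopped())
			break;
		/* cond_resched() in the kernel, then retry */
	} while (1);
}

int main(void)
{
	wq_exiting = true;   /* teardown sets IO_WQ_BIT_EXIT */
	submit_work_model(); /* returns instead of spinning */
	return 0;
}

The loop terminates as soon as teardown flips the exit bit, which is exactly
what the IO_WQ_BIT_EXIT test in the real helper provides.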
* [PATCH 2/2] io_uring: fix unprotected iopoll overflow
From: Pavel Begunkov @ 2023-09-07 12:50 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
[ 71.490669] WARNING: CPU: 3 PID: 17070 at io_uring/io_uring.c:769
io_cqring_event_overflow+0x47b/0x6b0
[ 71.498381] Call Trace:
[ 71.498590] <TASK>
[ 71.501858] io_req_cqe_overflow+0x105/0x1e0
[ 71.502194] __io_submit_flush_completions+0x9f9/0x1090
[ 71.503537] io_submit_sqes+0xebd/0x1f00
[ 71.503879] __do_sys_io_uring_enter+0x8c5/0x2380
[ 71.507360] do_syscall_64+0x39/0x80
We decoupled CQ locking from ->task_complete but haven't fixed up the
places that force locking for CQ overflows; they should key off
->lockless_cq instead.
Fixes: ec26c225f06f5 ("io_uring: merge iopoll and normal completion paths")
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4674203c1cac..6cce8948bddf 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -883,7 +883,7 @@ static void __io_flush_post_cqes(struct io_ring_ctx *ctx)
struct io_uring_cqe *cqe = &ctx->completion_cqes[i];
if (!io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags)) {
- if (ctx->task_complete) {
+ if (ctx->lockless_cq) {
spin_lock(&ctx->completion_lock);
io_cqring_event_overflow(ctx, cqe->user_data,
cqe->res, cqe->flags, 0, 0);
@@ -1541,7 +1541,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
if (!(req->flags & REQ_F_CQE_SKIP) &&
unlikely(!io_fill_cqe_req(ctx, req))) {
- if (ctx->task_complete) {
+ if (ctx->lockless_cq) {
spin_lock(&ctx->completion_lock);
io_req_cqe_overflow(req);
spin_unlock(&ctx->completion_lock);
--
2.41.0
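[Editor's note: as an aside, below is a reduced userspace model of the
corrected overflow branch. struct ring_ctx and post_overflow_cqe() are
illustrative stand-ins, not the kernel API; the point it shows is that
->lockless_cq, not ->task_complete, is what says the completion lock is not
already held and must be taken around the overflow.]

#include <stdbool.h>
#include <stdio.h>
#include <pthread.h>

/* Hypothetical, heavily reduced stand-in for struct io_ring_ctx. */
struct ring_ctx {
	bool		lockless_cq;	 /* CQEs posted without the lock */
	pthread_mutex_t	completion_lock; /* spinlock in the real kernel */
};

static void post_overflow_cqe(struct ring_ctx *ctx)
{
	printf("overflow CQE queued (lockless=%d)\n", ctx->lockless_cq);
}

/* Shape of the fixed branch in __io_flush_post_cqes(). */
static void flush_overflow(struct ring_ctx *ctx)
{
	if (ctx->lockless_cq) {
		/* Posting path held no lock, so take it here. */
		pthread_mutex_lock(&ctx->completion_lock);
		post_overflow_cqe(ctx);
		pthread_mutex_unlock(&ctx->completion_lock);
	} else {
		/* Lock is already held by the completion path. */
		post_overflow_cqe(ctx);
	}
}

int main(void)
{
	struct ring_ctx ctx = {
		.lockless_cq = true,
		.completion_lock = PTHREAD_MUTEX_INITIALIZER,
	};

	flush_overflow(&ctx);
	return 0;
}

Keying the branch on ->task_complete, as before the patch, skipped the lock
for IOPOLL rings that are lockless but not task-completed, producing the
unprotected overflow warning in the trace above.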
* Re: [PATCH 0/2] for-next fixes
From: Jens Axboe @ 2023-09-07 15:02 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 9/7/23 6:50 AM, Pavel Begunkov wrote:
> Patch 1 fixes a potential iopoll/iowq livelock
> Patch 2 fixes a recent problem in overflow locking
>
> Pavel Begunkov (2):
> io_uring: break out of iowq iopoll on teardown
> io_uring: fix unprotected iopoll overflow
>
> io_uring/io-wq.c | 10 ++++++++++
> io_uring/io-wq.h | 1 +
> io_uring/io_uring.c | 6 ++++--
> 3 files changed, 15 insertions(+), 2 deletions(-)
Thanks - applied manually, as lore is lagging for hours again...
--
Jens Axboe