public inbox for [email protected]
* [PATCH 0/4] sqpoll fixes
@ 2021-03-11 23:29 Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 1/4] io_uring: cancel deferred requests in try_cancel Pavel Begunkov
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-11 23:29 UTC (permalink / raw)
  To: Jens Axboe, io-uring

1-3 are relatively easy. 2/3 is more of a cleanup, but it's better to
take it out of the equation; it makes life easier.

4/4 removes the park/unpark dance for cancellations, and fixes a
couple of issues (will send one test later). Signals in io_sq_thread()
add trouble, creating more cases to think about and races to care
about between a dying SQPOLL task, dying ctxs and userspace tasks
going away.

Pavel Begunkov (4):
  io_uring: cancel deferred requests in try_cancel
  io_uring: remove useless ->startup completion
  io_uring: prevent racy sqd->thread checks
  io_uring: cancel sqpoll via task_work

 fs/io_uring.c | 195 +++++++++++++++++++++++++-------------------------
 1 file changed, 99 insertions(+), 96 deletions(-)

-- 
2.24.0



* [PATCH 1/4] io_uring: cancel deferred requests in try_cancel
  2021-03-11 23:29 [PATCH 0/4] sqpoll fixes Pavel Begunkov
@ 2021-03-11 23:29 ` Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 2/4] io_uring: remove useless ->startup completion Pavel Begunkov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-11 23:29 UTC (permalink / raw)
  To: Jens Axboe, io-uring

As io_uring_cancel_files() and others let the SQPOLL task run in
between calls to io_uring_try_cancel_requests(), it may generate new
deferred requests, so it's safer to try to cancel them from within
io_uring_try_cancel_requests() itself.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 49f85f49e1c3..56f3d8f408c9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8577,11 +8577,11 @@ static bool io_cancel_task_cb(struct io_wq_work *work, void *data)
 	return ret;
 }
 
-static void io_cancel_defer_files(struct io_ring_ctx *ctx,
+static bool io_cancel_defer_files(struct io_ring_ctx *ctx,
 				  struct task_struct *task,
 				  struct files_struct *files)
 {
-	struct io_defer_entry *de = NULL;
+	struct io_defer_entry *de;
 	LIST_HEAD(list);
 
 	spin_lock_irq(&ctx->completion_lock);
@@ -8592,6 +8592,8 @@ static void io_cancel_defer_files(struct io_ring_ctx *ctx,
 		}
 	}
 	spin_unlock_irq(&ctx->completion_lock);
+	if (list_empty(&list))
+		return false;
 
 	while (!list_empty(&list)) {
 		de = list_first_entry(&list, struct io_defer_entry, list);
@@ -8601,6 +8603,7 @@ static void io_cancel_defer_files(struct io_ring_ctx *ctx,
 		io_req_complete(de->req, -ECANCELED);
 		kfree(de);
 	}
+	return true;
 }
 
 static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
@@ -8666,6 +8669,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 			}
 		}
 
+		ret |= io_cancel_defer_files(ctx, task, files);
 		ret |= io_poll_remove_all(ctx, task, files);
 		ret |= io_kill_timeouts(ctx, task, files);
 		ret |= io_run_task_work();
@@ -8734,8 +8738,6 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 			atomic_inc(&task->io_uring->in_idle);
 	}
 
-	io_cancel_defer_files(ctx, task, files);
-
 	io_uring_cancel_files(ctx, task, files);
 	if (!files)
 		io_uring_try_cancel_requests(ctx, task, NULL);
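
For illustration, a minimal userspace model of the retry loop that
io_uring_try_cancel_requests() builds on: every helper reports whether
it cancelled anything, the results are OR'ed together, and the loop
repeats until a full pass finds nothing, because a concurrent
submitter (e.g. the SQPOLL task) may queue more work in between
passes. Names and counters below are hypothetical; this is a sketch of
the pattern, not kernel code.

#include <stdbool.h>
#include <stdio.h>

/* "cancel" one pending entry, report whether any progress was made */
static bool cancel_some(int *nr)
{
	if (!*nr)
		return false;
	(*nr)--;
	return true;
}

int main(void)
{
	int deferred = 3, polls = 2, timeouts = 1;
	bool ret;

	do {
		ret = false;
		/* |=, not short-circuit: run every helper each pass */
		ret |= cancel_some(&deferred);
		ret |= cancel_some(&polls);
		ret |= cancel_some(&timeouts);
	} while (ret);	/* any progress forces another pass */

	printf("quiet: %d %d %d\n", deferred, polls, timeouts);
	return 0;
}
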
-- 
2.24.0



* [PATCH 2/4] io_uring: remove useless ->startup completion
  2021-03-11 23:29 [PATCH 0/4] sqpoll fixes Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 1/4] io_uring: cancel deferred requests in try_cancel Pavel Begunkov
@ 2021-03-11 23:29 ` Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 3/4] io_uring: prevent racy sqd->thread checks Pavel Begunkov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-11 23:29 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We always do complete(&sqd->startup) almost right after creating
sqd->thread, either in the success path or in io_sq_thread_finish().
The thread is specifically created stopped so that we can set up
things like sqd->thread and io_uring_alloc_task_context() before it
starts running, which happens right after wake_up_new_task(). That
makes the completion useless, remove it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 56f3d8f408c9..6349374d715d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -272,7 +272,6 @@ struct io_sq_data {
 	pid_t			task_tgid;
 
 	unsigned long		state;
-	struct completion	startup;
 	struct completion	exited;
 };
 
@@ -6656,8 +6655,6 @@ static int io_sq_thread(void *data)
 		set_cpus_allowed_ptr(current, cpu_online_mask);
 	current->flags |= PF_NO_SETAFFINITY;
 
-	wait_for_completion(&sqd->startup);
-
 	down_read(&sqd->rw_lock);
 
 	while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
@@ -7080,7 +7077,6 @@ static void io_sq_thread_finish(struct io_ring_ctx *ctx)
 	struct io_sq_data *sqd = ctx->sq_data;
 
 	if (sqd) {
-		complete(&sqd->startup);
 		io_sq_thread_park(sqd);
 		list_del(&ctx->sqd_list);
 		io_sqd_update_thread_idle(sqd);
@@ -7144,7 +7140,6 @@ static struct io_sq_data *io_get_sq_data(struct io_uring_params *p)
 	INIT_LIST_HEAD(&sqd->ctx_list);
 	init_rwsem(&sqd->rw_lock);
 	init_waitqueue_head(&sqd->wait);
-	init_completion(&sqd->startup);
 	init_completion(&sqd->exited);
 	return sqd;
 }
@@ -7856,7 +7851,6 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		wake_up_new_task(tsk);
 		if (ret)
 			goto err;
-		complete(&sqd->startup);
 	} else if (p->flags & IORING_SETUP_SQ_AFF) {
 		/* Can't have SQ_AFF without SQPOLL */
 		ret = -EINVAL;
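
For illustration, a userspace analogue of why the handshake is not
needed: with pthreads the same guarantee comes from fully initialising
shared state before pthread_create(), just as create_io_thread()
leaves the new task stopped until wake_up_new_task(). Hypothetical
names; a sketch under that assumption, not kernel code.

#include <pthread.h>
#include <stdio.h>

struct sq_data {
	int sq_thread_idle;	/* written before the worker exists */
};

static void *sq_thread(void *arg)
{
	struct sq_data *sqd = arg;

	/* no startup completion needed: all of *sqd was written before
	 * pthread_create(), which orders those writes ahead of us */
	printf("idle = %d\n", sqd->sq_thread_idle);
	return NULL;
}

int main(void)
{
	struct sq_data sqd = { .sq_thread_idle = 100 };
	pthread_t t;

	pthread_create(&t, NULL, sq_thread, &sqd);
	pthread_join(t, NULL);
	return 0;
}
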
-- 
2.24.0



* [PATCH 3/4] io_uring: prevent racy sqd->thread checks
  2021-03-11 23:29 [PATCH 0/4] sqpoll fixes Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 1/4] io_uring: cancel deferred requests in try_cancel Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 2/4] io_uring: remove useless ->startup completion Pavel Begunkov
@ 2021-03-11 23:29 ` Pavel Begunkov
  2021-03-11 23:29 ` [PATCH 4/4] io_uring: cancel sqpoll via task_work Pavel Begunkov
  2021-03-12 18:19 ` [PATCH 0/4] sqpoll fixes Jens Axboe
  4 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-11 23:29 UTC (permalink / raw)
  To: Jens Axboe, io-uring

The SQPOLL thread we're trying to attach to may be going away. That's
not nice by itself, but a more serious problem is io_sq_offload_create()
seeing sqd->thread == NULL and trying to initialise it with a new
thread. There are tons of ways that can be exploited or fail. Check
for a dying thread under the sqd lock and fail attaching with -ENXIO.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6349374d715d..cdec59510433 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7119,14 +7119,18 @@ static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
 	return sqd;
 }
 
-static struct io_sq_data *io_get_sq_data(struct io_uring_params *p)
+static struct io_sq_data *io_get_sq_data(struct io_uring_params *p,
+					 bool *attached)
 {
 	struct io_sq_data *sqd;
 
+	*attached = false;
 	if (p->flags & IORING_SETUP_ATTACH_WQ) {
 		sqd = io_attach_sq_data(p);
-		if (!IS_ERR(sqd))
+		if (!IS_ERR(sqd)) {
+			*attached = true;
 			return sqd;
+		}
 		/* fall through for EPERM case, setup new sqd/task */
 		if (PTR_ERR(sqd) != -EPERM)
 			return sqd;
@@ -7799,12 +7803,13 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		struct task_struct *tsk;
 		struct io_sq_data *sqd;
+		bool attached;
 
 		ret = -EPERM;
 		if (!capable(CAP_SYS_ADMIN) && !capable(CAP_SYS_NICE))
 			goto err;
 
-		sqd = io_get_sq_data(p);
+		sqd = io_get_sq_data(p, &attached);
 		if (IS_ERR(sqd)) {
 			ret = PTR_ERR(sqd);
 			goto err;
@@ -7816,13 +7821,24 @@ static int io_sq_offload_create(struct io_ring_ctx *ctx,
 		if (!ctx->sq_thread_idle)
 			ctx->sq_thread_idle = HZ;
 
+		ret = 0;
 		io_sq_thread_park(sqd);
-		list_add(&ctx->sqd_list, &sqd->ctx_list);
-		io_sqd_update_thread_idle(sqd);
+		/* don't attach to a dying SQPOLL thread, would be racy */
+		if (attached && !sqd->thread) {
+			ret = -ENXIO;
+		} else {
+			list_add(&ctx->sqd_list, &sqd->ctx_list);
+			io_sqd_update_thread_idle(sqd);
+		}
 		io_sq_thread_unpark(sqd);
 
-		if (sqd->thread)
+		if (ret < 0) {
+			io_put_sq_data(sqd);
+			ctx->sq_data = NULL;
+			return ret;
+		} else if (attached) {
 			return 0;
+		}
 
 		if (p->flags & IORING_SETUP_SQ_AFF) {
 			int cpu = p->sq_thread_cpu;
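
For illustration, a minimal model of the check added above: the attach
path re-checks worker liveness under the same lock the exit path
takes, and fails with -ENXIO rather than racing. Hypothetical names; a
pthread-based sketch, not kernel code.

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct sq_data {
	pthread_mutex_t lock;	/* stands in for park/unpark */
	bool thread_alive;	/* cleared by the worker's exit path */
	int nr_ctx;
};

static int attach_ctx(struct sq_data *sqd, bool attached)
{
	int ret = 0;

	pthread_mutex_lock(&sqd->lock);
	if (attached && !sqd->thread_alive)
		ret = -ENXIO;	/* don't attach to a dying worker */
	else
		sqd->nr_ctx++;	/* safe to join the ctx list */
	pthread_mutex_unlock(&sqd->lock);
	return ret;
}

int main(void)
{
	struct sq_data sqd = { PTHREAD_MUTEX_INITIALIZER, false, 0 };

	printf("dead worker: %d\n", attach_ctx(&sqd, true));	/* -ENXIO */
	sqd.thread_alive = true;
	printf("live worker: %d\n", attach_ctx(&sqd, true));	/* 0 */
	return 0;
}
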
-- 
2.24.0



* [PATCH 4/4] io_uring: cancel sqpoll via task_work
  2021-03-11 23:29 [PATCH 0/4] sqpoll fixes Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-03-11 23:29 ` [PATCH 3/4] io_uring: prevent racy sqd->thread checks Pavel Begunkov
@ 2021-03-11 23:29 ` Pavel Begunkov
  2021-03-12 19:35   ` Pavel Begunkov
  2021-03-12 18:19 ` [PATCH 0/4] sqpoll fixes Jens Axboe
  4 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-11 23:29 UTC (permalink / raw)
  To: Jens Axboe, io-uring

1) The first problem is io_uring_cancel_sqpoll() ->
io_uring_cancel_task_requests() basically doing park(); park(); and so
hanging.

2) Another one is more subtle: while the master task is doing
cancellations, the SQPOLL task may submit in the window after the
cancellation ends but before finish(), with the new requests taking
a ref to the ctx and so locking it up eternally.

3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll()
racing with the same io_uring_cancel_sqpoll() from the owner task over
tctx->wait events. And there are probably more of them.

Instead, do SQPOLL cancellations from within the SQPOLL task context
via task_work, see io_sqpoll_cancel_sync(). With that we don't need
the temporary park()/unpark() during cancellation, which is ugly,
subtle and anyway doesn't allow io_run_task_work() to be run properly.

io_uring_cancel_sqpoll() is now called only from SQPOLL task context
and under sqd locking, so all parking is removed from there. With
that, io_sq_thread_[un]park() and io_sq_thread_stop() are no longer
used by the SQPOLL task, which spares us some headache.

Also remove the ctx from sqd->ctx_list early to avoid 2). And kill
tctx->sqpoll, which is not used anymore.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 151 ++++++++++++++++++++++++--------------------------
 1 file changed, 71 insertions(+), 80 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index cdec59510433..70286b393c0e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6665,6 +6665,7 @@ static int io_sq_thread(void *data)
 			up_read(&sqd->rw_lock);
 			cond_resched();
 			down_read(&sqd->rw_lock);
+			io_run_task_work();
 			timeout = jiffies + sqd->sq_thread_idle;
 			continue;
 		}
@@ -6720,18 +6721,22 @@ static int io_sq_thread(void *data)
 		finish_wait(&sqd->wait, &wait);
 		timeout = jiffies + sqd->sq_thread_idle;
 	}
-
-	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
-		io_uring_cancel_sqpoll(ctx);
 	up_read(&sqd->rw_lock);
-
+	down_write(&sqd->rw_lock);
+	/*
+	 * someone may have parked and added a cancellation task_work, run
+	 * it first because we don't want it in io_uring_cancel_sqpoll()
+	 */
 	io_run_task_work();
 
-	down_write(&sqd->rw_lock);
+	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
+		io_uring_cancel_sqpoll(ctx);
 	sqd->thread = NULL;
 	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 		io_ring_set_wakeup_flag(ctx);
 	up_write(&sqd->rw_lock);
+
+	io_run_task_work();
 	complete(&sqd->exited);
 	do_exit(0);
 }
@@ -7033,8 +7038,8 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 static void io_sq_thread_unpark(struct io_sq_data *sqd)
 	__releases(&sqd->rw_lock)
 {
-	if (sqd->thread == current)
-		return;
+	WARN_ON_ONCE(sqd->thread == current);
+
 	clear_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
 	up_write(&sqd->rw_lock);
 }
@@ -7042,8 +7047,8 @@ static void io_sq_thread_unpark(struct io_sq_data *sqd)
 static void io_sq_thread_park(struct io_sq_data *sqd)
 	__acquires(&sqd->rw_lock)
 {
-	if (sqd->thread == current)
-		return;
+	WARN_ON_ONCE(sqd->thread == current);
+
 	set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
 	down_write(&sqd->rw_lock);
 	/* set again for consistency, in case concurrent parks are happening */
@@ -7054,8 +7059,8 @@ static void io_sq_thread_park(struct io_sq_data *sqd)
 
 static void io_sq_thread_stop(struct io_sq_data *sqd)
 {
-	if (test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state))
-		return;
+	WARN_ON_ONCE(sqd->thread == current);
+
 	down_write(&sqd->rw_lock);
 	set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
 	if (sqd->thread)
@@ -7078,7 +7083,7 @@ static void io_sq_thread_finish(struct io_ring_ctx *ctx)
 
 	if (sqd) {
 		io_sq_thread_park(sqd);
-		list_del(&ctx->sqd_list);
+		list_del_init(&ctx->sqd_list);
 		io_sqd_update_thread_idle(sqd);
 		io_sq_thread_unpark(sqd);
 
@@ -7760,7 +7765,6 @@ static int io_uring_alloc_task_context(struct task_struct *task,
 	init_waitqueue_head(&tctx->wait);
 	tctx->last = NULL;
 	atomic_set(&tctx->in_idle, 0);
-	tctx->sqpoll = false;
 	task->io_uring = tctx;
 	spin_lock_init(&tctx->task_lock);
 	INIT_WQ_LIST(&tctx->task_list);
@@ -8719,43 +8723,12 @@ static void io_uring_cancel_files(struct io_ring_ctx *ctx,
 
 		io_uring_try_cancel_requests(ctx, task, files);
 
-		if (ctx->sq_data)
-			io_sq_thread_unpark(ctx->sq_data);
 		prepare_to_wait(&task->io_uring->wait, &wait,
 				TASK_UNINTERRUPTIBLE);
 		if (inflight == io_uring_count_inflight(ctx, task, files))
 			schedule();
 		finish_wait(&task->io_uring->wait, &wait);
-		if (ctx->sq_data)
-			io_sq_thread_park(ctx->sq_data);
-	}
-}
-
-/*
- * We need to iteratively cancel requests, in case a request has dependent
- * hard links. These persist even for failure of cancelations, hence keep
- * looping until none are found.
- */
-static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
-					  struct files_struct *files)
-{
-	struct task_struct *task = current;
-
-	if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) {
-		io_sq_thread_park(ctx->sq_data);
-		task = ctx->sq_data->thread;
-		if (task)
-			atomic_inc(&task->io_uring->in_idle);
 	}
-
-	io_uring_cancel_files(ctx, task, files);
-	if (!files)
-		io_uring_try_cancel_requests(ctx, task, NULL);
-
-	if (task)
-		atomic_dec(&task->io_uring->in_idle);
-	if (ctx->sq_data)
-		io_sq_thread_unpark(ctx->sq_data);
 }
 
 /*
@@ -8796,15 +8769,6 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx)
 		}
 		tctx->last = ctx;
 	}
-
-	/*
-	 * This is race safe in that the task itself is doing this, hence it
-	 * cannot be going through the exit/cancel paths at the same time.
-	 * This cannot be modified while exit/cancel is running.
-	 */
-	if (!tctx->sqpoll && (ctx->flags & IORING_SETUP_SQPOLL))
-		tctx->sqpoll = true;
-
 	return 0;
 }
 
@@ -8847,6 +8811,44 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 	}
 }
 
+static s64 tctx_inflight(struct io_uring_task *tctx)
+{
+	return percpu_counter_sum(&tctx->inflight);
+}
+
+static void io_sqpoll_cancel_cb(struct callback_head *cb)
+{
+	struct io_tctx_exit *work = container_of(cb, struct io_tctx_exit, task_work);
+	struct io_ring_ctx *ctx = work->ctx;
+	struct io_sq_data *sqd = ctx->sq_data;
+
+	if (sqd->thread)
+		io_uring_cancel_sqpoll(ctx);
+	complete(&work->completion);
+}
+
+static void io_sqpoll_cancel_sync(struct io_ring_ctx *ctx)
+{
+	struct io_sq_data *sqd = ctx->sq_data;
+	struct io_tctx_exit work = { .ctx = ctx, };
+	struct task_struct *task;
+
+	io_sq_thread_park(sqd);
+	list_del_init(&ctx->sqd_list);
+	io_sqd_update_thread_idle(sqd);
+	task = sqd->thread;
+	if (task) {
+		init_completion(&work.completion);
+		init_task_work(&work.task_work, io_sqpoll_cancel_cb);
+		WARN_ON_ONCE(task_work_add(task, &work.task_work, TWA_SIGNAL));
+		wake_up_process(task);
+	}
+	io_sq_thread_unpark(sqd);
+
+	if (task)
+		wait_for_completion(&work.completion);
+}
+
 void __io_uring_files_cancel(struct files_struct *files)
 {
 	struct io_uring_task *tctx = current->io_uring;
@@ -8855,41 +8857,40 @@ void __io_uring_files_cancel(struct files_struct *files)
 
 	/* make sure overflow events are dropped */
 	atomic_inc(&tctx->in_idle);
-	xa_for_each(&tctx->xa, index, node)
-		io_uring_cancel_task_requests(node->ctx, files);
+	xa_for_each(&tctx->xa, index, node) {
+		struct io_ring_ctx *ctx = node->ctx;
+
+		if (ctx->sq_data) {
+			io_sqpoll_cancel_sync(ctx);
+			continue;
+		}
+		io_uring_cancel_files(ctx, current, files);
+		if (!files)
+			io_uring_try_cancel_requests(ctx, current, NULL);
+	}
 	atomic_dec(&tctx->in_idle);
 
 	if (files)
 		io_uring_clean_tctx(tctx);
 }
 
-static s64 tctx_inflight(struct io_uring_task *tctx)
-{
-	return percpu_counter_sum(&tctx->inflight);
-}
-
+/* should only be called by SQPOLL task */
 static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 {
 	struct io_sq_data *sqd = ctx->sq_data;
-	struct io_uring_task *tctx;
+	struct io_uring_task *tctx = current->io_uring;
 	s64 inflight;
 	DEFINE_WAIT(wait);
 
-	if (!sqd)
-		return;
-	io_sq_thread_park(sqd);
-	if (!sqd->thread || !sqd->thread->io_uring) {
-		io_sq_thread_unpark(sqd);
-		return;
-	}
-	tctx = ctx->sq_data->thread->io_uring;
+	WARN_ON_ONCE(!sqd || ctx->sq_data->thread != current);
+
 	atomic_inc(&tctx->in_idle);
 	do {
 		/* read completions before cancelations */
 		inflight = tctx_inflight(tctx);
 		if (!inflight)
 			break;
-		io_uring_cancel_task_requests(ctx, NULL);
+		io_uring_try_cancel_requests(ctx, current, NULL);
 
 		prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
 		/*
@@ -8902,7 +8903,6 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
 	atomic_dec(&tctx->in_idle);
-	io_sq_thread_unpark(sqd);
 }
 
 /*
@@ -8917,15 +8917,6 @@ void __io_uring_task_cancel(void)
 
 	/* make sure overflow events are dropped */
 	atomic_inc(&tctx->in_idle);
-
-	if (tctx->sqpoll) {
-		struct io_tctx_node *node;
-		unsigned long index;
-
-		xa_for_each(&tctx->xa, index, node)
-			io_uring_cancel_sqpoll(node->ctx);
-	}
-
 	do {
 		/* read completions before cancelations */
 		inflight = tctx_inflight(tctx);
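
For illustration, a userspace model of the io_sqpoll_cancel_sync()
handoff: queue the cancellation into the worker's own context and wait
for it to signal completion, instead of parking the worker and poking
at its state from outside. A mutex/condvar pair stands in for
task_work_add(), wake_up_process() and complete(); hypothetical names,
a sketch of the pattern rather than kernel code.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int queued, done;

static void *sq_thread(void *arg)
{
	pthread_mutex_lock(&lock);
	while (!queued)			/* worker picks the work up... */
		pthread_cond_wait(&cond, &lock);
	printf("cancelling in worker context\n");
	done = 1;			/* ...and signals completion */
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, sq_thread, NULL);
	pthread_mutex_lock(&lock);
	queued = 1;			/* analogue of task_work_add() */
	pthread_cond_broadcast(&cond);	/* analogue of wake_up_process() */
	while (!done)			/* wait_for_completion() */
		pthread_cond_wait(&cond, &lock);
	pthread_mutex_unlock(&lock);
	pthread_join(t, NULL);
	return 0;
}
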
-- 
2.24.0



* Re: [PATCH 0/4] sqpoll fixes
  2021-03-11 23:29 [PATCH 0/4] sqpoll fixes Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-03-11 23:29 ` [PATCH 4/4] io_uring: cancel sqpoll via task_work Pavel Begunkov
@ 2021-03-12 18:19 ` Jens Axboe
  4 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2021-03-12 18:19 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/11/21 4:29 PM, Pavel Begunkov wrote:
> 1-3 are relatively easy. 2/3 is more of a cleanup, but it's better to
> take it out of the equation; it makes life easier.
> 
> 4/4 removes the park/unpark dance for cancellations, and fixes a
> couple of issues (will send one test later). Signals in io_sq_thread()
> add trouble, creating more cases to think about and races to care
> about between a dying SQPOLL task, dying ctxs and userspace tasks
> going away.
> 
> Pavel Begunkov (4):
>   io_uring: cancel deferred requests in try_cancel
>   io_uring: remove useless ->startup completion
>   io_uring: prevent racy sqd->thread checks
>   io_uring: cancel sqpoll via task_work
> 
>  fs/io_uring.c | 195 +++++++++++++++++++++++++-------------------------
>  1 file changed, 99 insertions(+), 96 deletions(-)

Added for 5.12, thanks.

-- 
Jens Axboe



* Re: [PATCH 4/4] io_uring: cancel sqpoll via task_work
  2021-03-11 23:29 ` [PATCH 4/4] io_uring: cancel sqpoll via task_work Pavel Begunkov
@ 2021-03-12 19:35   ` Pavel Begunkov
  2021-03-12 19:40     ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-12 19:35 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 11/03/2021 23:29, Pavel Begunkov wrote:
> 1) The first problem is io_uring_cancel_sqpoll() ->
> io_uring_cancel_task_requests() basically doing park(); park(); and so
> hanging.
>
> 2) Another one is more subtle: while the master task is doing
> cancellations, the SQPOLL task may submit in the window after the
> cancellation ends but before finish(), with the new requests taking
> a ref to the ctx and so locking it up eternally.
>
> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll()
> racing with the same io_uring_cancel_sqpoll() from the owner task over
> tctx->wait events. And there are probably more of them.
>
> Instead, do SQPOLL cancellations from within the SQPOLL task context
> via task_work, see io_sqpoll_cancel_sync(). With that we don't need
> the temporary park()/unpark() during cancellation, which is ugly,
> subtle and anyway doesn't allow io_run_task_work() to be run properly.
>
> io_uring_cancel_sqpoll() is now called only from SQPOLL task context
> and under sqd locking, so all parking is removed from there. With
> that, io_sq_thread_[un]park() and io_sq_thread_stop() are no longer
> used by the SQPOLL task, which spares us some headache.
>
> Also remove the ctx from sqd->ctx_list early to avoid 2). And kill
> tctx->sqpoll, which is not used anymore.


Looks like the chunk below somehow slipped out of the patch. Not
important for 5.12, but it can be folded in anyway

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 9761a0ec9f95..c24c62b47745 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -22,7 +22,6 @@ struct io_uring_task {
 	void			*io_wq;
 	struct percpu_counter	inflight;
 	atomic_t		in_idle;
-	bool			sqpoll;
 
 	spinlock_t		task_lock;
 	struct io_wq_work_list	task_list;


-- 
Pavel Begunkov


* Re: [PATCH 4/4] io_uring: cancel sqpoll via task_work
  2021-03-12 19:35   ` Pavel Begunkov
@ 2021-03-12 19:40     ` Jens Axboe
  2021-03-12 21:41       ` Pavel Begunkov
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-03-12 19:40 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/12/21 12:35 PM, Pavel Begunkov wrote:
> On 11/03/2021 23:29, Pavel Begunkov wrote:
>> 1) The first problem is io_uring_cancel_sqpoll() ->
>> io_uring_cancel_task_requests() basically doing park(); park(); and so
>> hanging.
>>
>> 2) Another one is more subtle: while the master task is doing
>> cancellations, the SQPOLL task may submit in the window after the
>> cancellation ends but before finish(), with the new requests taking
>> a ref to the ctx and so locking it up eternally.
>>
>> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll()
>> racing with the same io_uring_cancel_sqpoll() from the owner task over
>> tctx->wait events. And there are probably more of them.
>>
>> Instead, do SQPOLL cancellations from within the SQPOLL task context
>> via task_work, see io_sqpoll_cancel_sync(). With that we don't need
>> the temporary park()/unpark() during cancellation, which is ugly,
>> subtle and anyway doesn't allow io_run_task_work() to be run properly.
>>
>> io_uring_cancel_sqpoll() is now called only from SQPOLL task context
>> and under sqd locking, so all parking is removed from there. With
>> that, io_sq_thread_[un]park() and io_sq_thread_stop() are no longer
>> used by the SQPOLL task, which spares us some headache.
>>
>> Also remove the ctx from sqd->ctx_list early to avoid 2). And kill
>> tctx->sqpoll, which is not used anymore.
> 
> 
> Looks like the chunk below somehow slipped out of the patch. Not
> important for 5.12, but it can be folded in anyway
> 
> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
> index 9761a0ec9f95..c24c62b47745 100644
> --- a/include/linux/io_uring.h
> +++ b/include/linux/io_uring.h
> @@ -22,7 +22,6 @@ struct io_uring_task {
>  	void			*io_wq;
>  	struct percpu_counter	inflight;
>  	atomic_t		in_idle;
> -	bool			sqpoll;
>  
>  	spinlock_t		task_lock;
>  	struct io_wq_work_list	task_list;

Let's do it as a separate patch instead.

-- 
Jens Axboe



* Re: [PATCH 4/4] io_uring: cancel sqpoll via task_work
  2021-03-12 19:40     ` Jens Axboe
@ 2021-03-12 21:41       ` Pavel Begunkov
  2021-03-13  2:44         ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-12 21:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 12/03/2021 19:40, Jens Axboe wrote:
> On 3/12/21 12:35 PM, Pavel Begunkov wrote:
>> On 11/03/2021 23:29, Pavel Begunkov wrote:
>>> 1) The first problem is io_uring_cancel_sqpoll() ->
>>> io_uring_cancel_task_requests() basically doing park(); park(); and so
>>> hanging.
>>>
>>> 2) Another one is more subtle: while the master task is doing
>>> cancellations, the SQPOLL task may submit in the window after the
>>> cancellation ends but before finish(), with the new requests taking
>>> a ref to the ctx and so locking it up eternally.
>>>
>>> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll()
>>> racing with the same io_uring_cancel_sqpoll() from the owner task over
>>> tctx->wait events. And there are probably more of them.
>>>
>>> Instead, do SQPOLL cancellations from within the SQPOLL task context
>>> via task_work, see io_sqpoll_cancel_sync(). With that we don't need
>>> the temporary park()/unpark() during cancellation, which is ugly,
>>> subtle and anyway doesn't allow io_run_task_work() to be run properly.
>>>
>>> io_uring_cancel_sqpoll() is now called only from SQPOLL task context
>>> and under sqd locking, so all parking is removed from there. With
>>> that, io_sq_thread_[un]park() and io_sq_thread_stop() are no longer
>>> used by the SQPOLL task, which spares us some headache.
>>>
>>> Also remove the ctx from sqd->ctx_list early to avoid 2). And kill
>>> tctx->sqpoll, which is not used anymore.
>>
>>
>> Looks like the chunk below somehow slipped out of the patch. Not
>> important for 5.12, but it can be folded in anyway
>>
>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>> index 9761a0ec9f95..c24c62b47745 100644
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>> @@ -22,7 +22,6 @@ struct io_uring_task {
>>  	void			*io_wq;
>>  	struct percpu_counter	inflight;
>>  	atomic_t		in_idle;
>> -	bool			sqpoll;
>>  
>>  	spinlock_t		task_lock;
>>  	struct io_wq_work_list	task_list;
> 
> Let's do it as a separate patch instead.

Ok, I'll send it for-5.13 when it's appropriate.

-- 
Pavel Begunkov


* Re: [PATCH 4/4] io_uring: cancel sqpoll via task_work
  2021-03-12 21:41       ` Pavel Begunkov
@ 2021-03-13  2:44         ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2021-03-13  2:44 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/12/21 2:41 PM, Pavel Begunkov wrote:
> On 12/03/2021 19:40, Jens Axboe wrote:
>> On 3/12/21 12:35 PM, Pavel Begunkov wrote:
>>> On 11/03/2021 23:29, Pavel Begunkov wrote:
>>>> 1) The first problem is io_uring_cancel_sqpoll() ->
>>>> io_uring_cancel_task_requests() basically doing park(); park(); and so
>>>> hanging.
>>>>
>>>> 2) Another one is more subtle: while the master task is doing
>>>> cancellations, the SQPOLL task may submit in the window after the
>>>> cancellation ends but before finish(), with the new requests taking
>>>> a ref to the ctx and so locking it up eternally.
>>>>
>>>> 3) Yet another is a dying SQPOLL task doing io_uring_cancel_sqpoll()
>>>> racing with the same io_uring_cancel_sqpoll() from the owner task over
>>>> tctx->wait events. And there are probably more of them.
>>>>
>>>> Instead, do SQPOLL cancellations from within the SQPOLL task context
>>>> via task_work, see io_sqpoll_cancel_sync(). With that we don't need
>>>> the temporary park()/unpark() during cancellation, which is ugly,
>>>> subtle and anyway doesn't allow io_run_task_work() to be run properly.
>>>>
>>>> io_uring_cancel_sqpoll() is now called only from SQPOLL task context
>>>> and under sqd locking, so all parking is removed from there. With
>>>> that, io_sq_thread_[un]park() and io_sq_thread_stop() are no longer
>>>> used by the SQPOLL task, which spares us some headache.
>>>>
>>>> Also remove the ctx from sqd->ctx_list early to avoid 2). And kill
>>>> tctx->sqpoll, which is not used anymore.
>>>
>>>
>>> Looks like the chunk below somehow slipped out of the patch. Not
>>> important for 5.12, but it can be folded in anyway
>>>
>>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>>> index 9761a0ec9f95..c24c62b47745 100644
>>> --- a/include/linux/io_uring.h
>>> +++ b/include/linux/io_uring.h
>>> @@ -22,7 +22,6 @@ struct io_uring_task {
>>>  	void			*io_wq;
>>>  	struct percpu_counter	inflight;
>>>  	atomic_t		in_idle;
>>> -	bool			sqpoll;
>>>  
>>>  	spinlock_t		task_lock;
>>>  	struct io_wq_work_list	task_list;
>>
>> Let's do it as a separate patch instead.
> 
> Ok, I'll send it for-5.13 when it's appropriate.

Yeah that's fine, obviously no rush. I'll rebase for-5.13/io_uring
when -rc3 is out.


-- 
Jens Axboe


