public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH 5.12 0/8] remove task file notes
@ 2021-03-04 13:59 Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
                   ` (7 more replies)
  0 siblings, 8 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

1-2 fixes a race for io-wq, can be considered separately

3-7 introduce a mapping from ctx to all tctx, and using that removes
file notes, i.e. taking a io_uring file note previously stored in
task->io_uring->xa. It's needed because we don't free io_uring ctx
until all submitters die/exec, and it became worse after killing
->flush().
There are rough corner in a form of not behaving nicely, I'll address
in follow-up patches.

8/8 just a very useful warning.

Pavel Begunkov (8):
  io_uring: cancel-match based on flags
  io_uring: reliably cancel linked timeouts
  io_uring: make del_task_file more forgiving
  io_uring: introduce ctx to tctx back map
  io_uring: do ctx initiated file note removal
  io_uring: don't take task ring-file notes
  io_uring: index io_uring->xa by ctx not file
  io_uring: warn when ring exit takes too long

 fs/io_uring.c            | 146 ++++++++++++++++++++++++++++++---------
 include/linux/io_uring.h |   2 +-
 2 files changed, 113 insertions(+), 35 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/8] io_uring: cancel-match based on flags
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 2/8] io_uring: reliably cancel linked timeouts Pavel Begunkov
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Instead of going into request internals, like checking req->file->f_op,
do match them based on REQ_F_INFLIGHT, it's set only when we want it to
be reliably cancelled.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cf8d4a99d91..c340d7ba40a2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -703,7 +703,7 @@ enum {
 
 	/* fail rest of links */
 	REQ_F_FAIL_LINK		= BIT(REQ_F_FAIL_LINK_BIT),
-	/* on inflight list */
+	/* on inflight list, should be cancelled and waited on exit reliably */
 	REQ_F_INFLIGHT		= BIT(REQ_F_INFLIGHT_BIT),
 	/* read/write uses file position */
 	REQ_F_CUR_POS		= BIT(REQ_F_CUR_POS_BIT),
@@ -1069,7 +1069,7 @@ static bool io_match_task(struct io_kiocb *head,
 		return true;
 
 	io_for_each_link(req, head) {
-		if (req->file && req->file->f_op == &io_uring_fops)
+		if (req->flags & REQ_F_INFLIGHT)
 			return true;
 		if (req->task->files == files)
 			return true;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/8] io_uring: reliably cancel linked timeouts
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 3/8] io_uring: make del_task_file more forgiving Pavel Begunkov
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Linked timeouts are fired asynchronously (i.e. soft-irq), and use
generic cancellation paths to do its stuff, including poking into io-wq.
The problem is that it's racy to access tctx->io_wq, as
io_uring_task_cancel() and others may be happening at this exact moment.
Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
no timeouts before io-wq destraction.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c340d7ba40a2..f82f46c604d7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5500,6 +5500,7 @@ static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 
 	data->mode = io_translate_timeout_mode(flags);
 	hrtimer_init(&data->timer, CLOCK_MONOTONIC, data->mode);
+	io_req_track_inflight(req);
 	return 0;
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/8] io_uring: make del_task_file more forgiving
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 2/8] io_uring: reliably cancel linked timeouts Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 4/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Rework io_uring_del_task_file(), so it accepts an index to delete, and
it's not necessarily have to be in the ->xa. Infer file from xa_erase()
to maintain a single origin of truth.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f82f46c604d7..55e1ec4c0099 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8772,15 +8772,18 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 /*
  * Remove this io_uring_file -> task mapping.
  */
-static void io_uring_del_task_file(struct file *file)
+static void io_uring_del_task_file(unsigned long index)
 {
 	struct io_uring_task *tctx = current->io_uring;
+	struct file *file;
+
+	file = xa_erase(&tctx->xa, index);
+	if (!file)
+		return;
 
 	if (tctx->last == file)
 		tctx->last = NULL;
-	file = xa_erase(&tctx->xa, (unsigned long)file);
-	if (file)
-		fput(file);
+	fput(file);
 }
 
 static void io_uring_clean_tctx(struct io_uring_task *tctx)
@@ -8789,7 +8792,7 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 	unsigned long index;
 
 	xa_for_each(&tctx->xa, index, file)
-		io_uring_del_task_file(file);
+		io_uring_del_task_file(index);
 	if (tctx->io_wq) {
 		io_wq_put_and_exit(tctx->io_wq);
 		tctx->io_wq = NULL;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/8] io_uring: introduce ctx to tctx back map
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-03-04 13:59 ` [PATCH 3/8] io_uring: make del_task_file more forgiving Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 5/8] io_uring: do ctx initiated file note removal Pavel Begunkov
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

For each pair tcxt-ctx create an object and chain it into ctx, so we
have a way to traverse all tctx that are using current ctx. Preparation
patch, will be used later.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 68 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 55e1ec4c0099..8d632fa61799 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -454,6 +454,7 @@ struct io_ring_ctx {
 
 	/* Keep this last, we don't need it for the fast path */
 	struct work_struct		exit_work;
+	struct list_head		tctx_list;
 };
 
 /*
@@ -805,6 +806,13 @@ struct io_kiocb {
 	struct io_wq_work		work;
 };
 
+struct io_tctx_node {
+	struct list_head	ctx_node;
+	struct task_struct	*task;
+	struct file		*file;
+	struct io_ring_ctx	*ctx;
+};
+
 struct io_defer_entry {
 	struct list_head	list;
 	struct io_kiocb		*req;
@@ -1144,6 +1152,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_LIST_HEAD(&ctx->rsrc_ref_list);
 	INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
 	init_llist_head(&ctx->rsrc_put_llist);
+	INIT_LIST_HEAD(&ctx->tctx_list);
 	INIT_LIST_HEAD(&ctx->submit_state.comp.free_list);
 	INIT_LIST_HEAD(&ctx->submit_state.comp.locked_free_list);
 	return ctx;
@@ -8735,6 +8744,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 {
 	struct io_uring_task *tctx = current->io_uring;
+	struct io_tctx_node *node;
 	int ret;
 
 	if (unlikely(!tctx)) {
@@ -8747,13 +8757,25 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 		void *old = xa_load(&tctx->xa, (unsigned long)file);
 
 		if (!old) {
+			node = kmalloc(sizeof(*node), GFP_KERNEL);
+			if (!node)
+				return -ENOMEM;
+			node->ctx = ctx;
+			node->file = file;
+			node->task = current;
+
 			get_file(file);
 			ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
-						file, GFP_KERNEL));
+						node, GFP_KERNEL));
 			if (ret) {
 				fput(file);
+				kfree(node);
 				return ret;
 			}
+
+			mutex_lock(&ctx->uring_lock);
+			list_add(&node->ctx_node, &ctx->tctx_list);
+			mutex_unlock(&ctx->uring_lock);
 		}
 		tctx->last = file;
 	}
@@ -8775,23 +8797,31 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 static void io_uring_del_task_file(unsigned long index)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	struct file *file;
+	struct io_tctx_node *node;
 
-	file = xa_erase(&tctx->xa, index);
-	if (!file)
+	node = xa_erase(&tctx->xa, index);
+	if (!node)
 		return;
 
-	if (tctx->last == file)
+	WARN_ON_ONCE(current != node->task);
+	WARN_ON_ONCE(list_empty(&node->ctx_node));
+
+	mutex_lock(&node->ctx->uring_lock);
+	list_del(&node->ctx_node);
+	mutex_unlock(&node->ctx->uring_lock);
+
+	if (tctx->last == node->file)
 		tctx->last = NULL;
-	fput(file);
+	fput(node->file);
+	kfree(node);
 }
 
 static void io_uring_clean_tctx(struct io_uring_task *tctx)
 {
-	struct file *file;
+	struct io_tctx_node *node;
 	unsigned long index;
 
-	xa_for_each(&tctx->xa, index, file)
+	xa_for_each(&tctx->xa, index, node)
 		io_uring_del_task_file(index);
 	if (tctx->io_wq) {
 		io_wq_put_and_exit(tctx->io_wq);
@@ -8802,13 +8832,13 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
 void __io_uring_files_cancel(struct files_struct *files)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	struct file *file;
+	struct io_tctx_node *node;
 	unsigned long index;
 
 	/* make sure overflow events are dropped */
 	atomic_inc(&tctx->in_idle);
-	xa_for_each(&tctx->xa, index, file)
-		io_uring_cancel_task_requests(file->private_data, files);
+	xa_for_each(&tctx->xa, index, node)
+		io_uring_cancel_task_requests(node->ctx, files);
 	atomic_dec(&tctx->in_idle);
 
 	if (files)
@@ -8874,11 +8904,11 @@ void __io_uring_task_cancel(void)
 	atomic_inc(&tctx->in_idle);
 
 	if (tctx->sqpoll) {
-		struct file *file;
+		struct io_tctx_node *node;
 		unsigned long index;
 
-		xa_for_each(&tctx->xa, index, file)
-			io_uring_cancel_sqpoll(file->private_data);
+		xa_for_each(&tctx->xa, index, node)
+			io_uring_cancel_sqpoll(node->ctx);
 	}
 
 	do {
@@ -8910,18 +8940,16 @@ void __io_uring_task_cancel(void)
 void __io_uring_unshare(void)
 {
 	struct io_uring_task *tctx = current->io_uring;
-	struct file *file;
+	struct io_tctx_node *node;
 	unsigned long index;
 
 	io_wq_unshare(tctx->io_wq);
 	if (!tctx->sqpoll)
 		return;
 
-	xa_for_each(&tctx->xa, index, file) {
-		struct io_ring_ctx *ctx = file->private_data;
-
-		if (ctx->sq_data)
-			io_sq_thread_stop(ctx->sq_data);
+	xa_for_each(&tctx->xa, index, node) {
+		if (node->ctx->sq_data)
+			io_sq_thread_stop(node->ctx->sq_data);
 	}
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/8] io_uring: do ctx initiated file note removal
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-03-04 13:59 ` [PATCH 4/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 6/8] io_uring: don't take task ring-file notes Pavel Begunkov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Another preparation patch. When full quiesce is done on ctx exit, use
task_work infra to remove corresponding to the ctx io_uring->xa entries.
For that we use the back tctx map. Also use ->in_idle to prevent
removing it while we traversing ->xa on cancellation, just ignore it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 46 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8d632fa61799..d88f77f4bb7e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -987,6 +987,7 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_UNLINKAT] = {},
 };
 
+static void io_uring_del_task_file(unsigned long index);
 static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 					 struct task_struct *task,
 					 struct files_struct *files);
@@ -8521,10 +8522,33 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
 	return executed;
 }
 
+struct io_tctx_exit {
+	struct callback_head		task_work;
+	struct completion		completion;
+	unsigned long			index;
+};
+
+static void io_tctx_exit_cb(struct callback_head *cb)
+{
+	struct io_uring_task *tctx = current->io_uring;
+	struct io_tctx_exit *work;
+
+	work = container_of(cb, struct io_tctx_exit, task_work);
+	/*
+	 * When @in_idle, we're in cancellation and it's racy to remove the
+	 * node. It'll be removed by the end of cancellation, just ignore it.
+	 */
+	if (!atomic_read(&tctx->in_idle))
+		io_uring_del_task_file(work->index);
+	complete(&work->completion);
+}
+
 static void io_ring_exit_work(struct work_struct *work)
 {
-	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
-					       exit_work);
+	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+	struct io_tctx_exit exit;
+	struct io_tctx_node *node;
+	int ret;
 
 	/*
 	 * If we're doing polled IO and end up having requests being
@@ -8535,6 +8559,26 @@ static void io_ring_exit_work(struct work_struct *work)
 	do {
 		io_uring_try_cancel_requests(ctx, NULL, NULL);
 	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
+
+	mutex_lock(&ctx->uring_lock);
+	while (!list_empty(&ctx->tctx_list)) {
+		node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
+					ctx_node);
+		exit.index = (unsigned long)node->file;
+		init_completion(&exit.completion);
+		init_task_work(&exit.task_work, io_tctx_exit_cb);
+		ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
+		if (WARN_ON_ONCE(ret))
+			continue;
+		wake_up_process(node->task);
+
+		mutex_unlock(&ctx->uring_lock);
+		wait_for_completion(&exit.completion);
+		cond_resched();
+		mutex_lock(&ctx->uring_lock);
+	}
+	mutex_unlock(&ctx->uring_lock);
+
 	io_ring_ctx_free(ctx);
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/8] io_uring: don't take task ring-file notes
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (4 preceding siblings ...)
  2021-03-04 13:59 ` [PATCH 5/8] io_uring: do ctx initiated file note removal Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 7/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

With ->flush() gone we're now leaving all uring file notes until the
task dies/execs, so the ctx will not be freed until all tasks that have
ever submit a request die. It was nicer with flush but not much, we
could have locked as described ctx in many cases.

Now we guarantee that ctx outlives all tctx in a sense that
io_ring_exit_work() waits for all tctxs to drop their corresponding
enties in ->xa, and ctx won't go away until then. Hence, additional
io_uring file reference (a.k.a. task file notes) are not needed anymore.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index d88f77f4bb7e..da93ae7b3aef 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8808,11 +8808,9 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 			node->file = file;
 			node->task = current;
 
-			get_file(file);
 			ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
 						node, GFP_KERNEL));
 			if (ret) {
-				fput(file);
 				kfree(node);
 				return ret;
 			}
@@ -8843,6 +8841,8 @@ static void io_uring_del_task_file(unsigned long index)
 	struct io_uring_task *tctx = current->io_uring;
 	struct io_tctx_node *node;
 
+	if (!tctx)
+		return;
 	node = xa_erase(&tctx->xa, index);
 	if (!node)
 		return;
@@ -8856,7 +8856,6 @@ static void io_uring_del_task_file(unsigned long index)
 
 	if (tctx->last == node->file)
 		tctx->last = NULL;
-	fput(node->file);
 	kfree(node);
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 7/8] io_uring: index io_uring->xa by ctx not file
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (5 preceding siblings ...)
  2021-03-04 13:59 ` [PATCH 6/8] io_uring: don't take task ring-file notes Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
  7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We don't use task file notes anymore, and no need left in indexing
task->io_uring->xa by file, and replace it with ctx. It's better
design-wise, especially since we keep a dangling file, and so have to
keep an eye on not dereferencing it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c            | 24 +++++++++++-------------
 include/linux/io_uring.h |  2 +-
 2 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index da93ae7b3aef..46a2417187ff 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -809,7 +809,6 @@ struct io_kiocb {
 struct io_tctx_node {
 	struct list_head	ctx_node;
 	struct task_struct	*task;
-	struct file		*file;
 	struct io_ring_ctx	*ctx;
 };
 
@@ -8525,7 +8524,7 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
 struct io_tctx_exit {
 	struct callback_head		task_work;
 	struct completion		completion;
-	unsigned long			index;
+	struct io_ring_ctx		*ctx;
 };
 
 static void io_tctx_exit_cb(struct callback_head *cb)
@@ -8539,7 +8538,7 @@ static void io_tctx_exit_cb(struct callback_head *cb)
 	 * node. It'll be removed by the end of cancellation, just ignore it.
 	 */
 	if (!atomic_read(&tctx->in_idle))
-		io_uring_del_task_file(work->index);
+		io_uring_del_task_file((unsigned long)work->ctx);
 	complete(&work->completion);
 }
 
@@ -8564,7 +8563,7 @@ static void io_ring_exit_work(struct work_struct *work)
 	while (!list_empty(&ctx->tctx_list)) {
 		node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
 					ctx_node);
-		exit.index = (unsigned long)node->file;
+		exit.ctx = ctx;
 		init_completion(&exit.completion);
 		init_task_work(&exit.task_work, io_tctx_exit_cb);
 		ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
@@ -8785,7 +8784,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 /*
  * Note that this task has used io_uring. We use it for cancelation purposes.
  */
-static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
+static int io_uring_add_task_file(struct io_ring_ctx *ctx)
 {
 	struct io_uring_task *tctx = current->io_uring;
 	struct io_tctx_node *node;
@@ -8797,18 +8796,17 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 			return ret;
 		tctx = current->io_uring;
 	}
-	if (tctx->last != file) {
-		void *old = xa_load(&tctx->xa, (unsigned long)file);
+	if (tctx->last != ctx) {
+		void *old = xa_load(&tctx->xa, (unsigned long)ctx);
 
 		if (!old) {
 			node = kmalloc(sizeof(*node), GFP_KERNEL);
 			if (!node)
 				return -ENOMEM;
 			node->ctx = ctx;
-			node->file = file;
 			node->task = current;
 
-			ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
+			ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx,
 						node, GFP_KERNEL));
 			if (ret) {
 				kfree(node);
@@ -8819,7 +8817,7 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
 			list_add(&node->ctx_node, &ctx->tctx_list);
 			mutex_unlock(&ctx->uring_lock);
 		}
-		tctx->last = file;
+		tctx->last = ctx;
 	}
 
 	/*
@@ -8854,7 +8852,7 @@ static void io_uring_del_task_file(unsigned long index)
 	list_del(&node->ctx_node);
 	mutex_unlock(&node->ctx->uring_lock);
 
-	if (tctx->last == node->file)
+	if (tctx->last == node->ctx)
 		tctx->last = NULL;
 	kfree(node);
 }
@@ -9176,7 +9174,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		}
 		submitted = to_submit;
 	} else if (to_submit) {
-		ret = io_uring_add_task_file(ctx, f.file);
+		ret = io_uring_add_task_file(ctx);
 		if (unlikely(ret))
 			goto out;
 		mutex_lock(&ctx->uring_lock);
@@ -9385,7 +9383,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
 	if (fd < 0)
 		return fd;
 
-	ret = io_uring_add_task_file(ctx, file);
+	ret = io_uring_add_task_file(ctx);
 	if (ret) {
 		put_unused_fd(fd);
 		return ret;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 796e0d7d186d..0e7dfd9e82c0 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -18,7 +18,7 @@ struct io_uring_task {
 	/* submission side */
 	struct xarray		xa;
 	struct wait_queue_head	wait;
-	struct file		*last;
+	void			*last;
 	void			*io_wq;
 	struct percpu_counter	inflight;
 	atomic_t		in_idle;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 8/8] io_uring: warn when ring exit takes too long
  2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
                   ` (6 preceding siblings ...)
  2021-03-04 13:59 ` [PATCH 7/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
  2021-03-05  4:25   ` Jens Axboe
  7 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
monitor whether removal hang or not. Add WARN_ONCE to catch hangs.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 46a2417187ff..b4820f6261fe 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8547,6 +8547,8 @@ static void io_ring_exit_work(struct work_struct *work)
 	struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
 	struct io_tctx_exit exit;
 	struct io_tctx_node *node;
+	const u64 bias_ms = MSEC_PER_SEC * 60 * 5;
+	ktime_t start = ktime_get();
 	int ret;
 
 	/*
@@ -8557,10 +8559,13 @@ static void io_ring_exit_work(struct work_struct *work)
 	 */
 	do {
 		io_uring_try_cancel_requests(ctx, NULL, NULL);
+		WARN_ON_ONCE(ktime_ms_delta(ktime_get(), start) > bias_ms);
 	} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
 
 	mutex_lock(&ctx->uring_lock);
 	while (!list_empty(&ctx->tctx_list)) {
+		WARN_ON_ONCE(ktime_ms_delta(ktime_get(), start) > bias_ms);
+
 		node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
 					ctx_node);
 		exit.ctx = ctx;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 8/8] io_uring: warn when ring exit takes too long
  2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
@ 2021-03-05  4:25   ` Jens Axboe
  2021-03-05 11:15     ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2021-03-05  4:25 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/4/21 6:59 AM, Pavel Begunkov wrote:
> We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
> monitor whether removal hang or not. Add WARN_ONCE to catch hangs.

Minor nit, but I'd just use jiffies for this. Ala:

unsigned long timeout = jiffies + 60 * HZ;

if (time_after(jiffies, timeout))
    complain();

That's a well known idiom, and we don't need better precision than that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 8/8] io_uring: warn when ring exit takes too long
  2021-03-05  4:25   ` Jens Axboe
@ 2021-03-05 11:15     ` Pavel Begunkov
  0 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-05 11:15 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 05/03/2021 04:25, Jens Axboe wrote:
> On 3/4/21 6:59 AM, Pavel Begunkov wrote:
>> We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
>> monitor whether removal hang or not. Add WARN_ONCE to catch hangs.
> 
> Minor nit, but I'd just use jiffies for this. Ala:
> 
> unsigned long timeout = jiffies + 60 * HZ;
> 
> if (time_after(jiffies, timeout))
>     complain();
> 
> That's a well known idiom, and we don't need better precision than that.

And also looks much nicer, thanks

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-03-05 11:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
2021-03-04 13:59 ` [PATCH 2/8] io_uring: reliably cancel linked timeouts Pavel Begunkov
2021-03-04 13:59 ` [PATCH 3/8] io_uring: make del_task_file more forgiving Pavel Begunkov
2021-03-04 13:59 ` [PATCH 4/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
2021-03-04 13:59 ` [PATCH 5/8] io_uring: do ctx initiated file note removal Pavel Begunkov
2021-03-04 13:59 ` [PATCH 6/8] io_uring: don't take task ring-file notes Pavel Begunkov
2021-03-04 13:59 ` [PATCH 7/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
2021-03-05  4:25   ` Jens Axboe
2021-03-05 11:15     ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox