* [PATCH 5.12 0/8] remove task file notes
@ 2021-03-04 13:59 Pavel Begunkov
2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
` (7 more replies)
0 siblings, 8 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
1-2 fixes a race for io-wq, can be considered separately
3-7 introduce a mapping from ctx to all tctx, and using that removes
file notes, i.e. taking a io_uring file note previously stored in
task->io_uring->xa. It's needed because we don't free io_uring ctx
until all submitters die/exec, and it became worse after killing
->flush().
There are rough corner in a form of not behaving nicely, I'll address
in follow-up patches.
8/8 just a very useful warning.
Pavel Begunkov (8):
io_uring: cancel-match based on flags
io_uring: reliably cancel linked timeouts
io_uring: make del_task_file more forgiving
io_uring: introduce ctx to tctx back map
io_uring: do ctx initiated file note removal
io_uring: don't take task ring-file notes
io_uring: index io_uring->xa by ctx not file
io_uring: warn when ring exit takes too long
fs/io_uring.c | 146 ++++++++++++++++++++++++++++++---------
include/linux/io_uring.h | 2 +-
2 files changed, 113 insertions(+), 35 deletions(-)
--
2.24.0
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH 1/8] io_uring: cancel-match based on flags
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 2/8] io_uring: reliably cancel linked timeouts Pavel Begunkov
` (6 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Instead of going into request internals, like checking req->file->f_op,
do match them based on REQ_F_INFLIGHT, it's set only when we want it to
be reliably cancelled.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7cf8d4a99d91..c340d7ba40a2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -703,7 +703,7 @@ enum {
/* fail rest of links */
REQ_F_FAIL_LINK = BIT(REQ_F_FAIL_LINK_BIT),
- /* on inflight list */
+ /* on inflight list, should be cancelled and waited on exit reliably */
REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT),
/* read/write uses file position */
REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT),
@@ -1069,7 +1069,7 @@ static bool io_match_task(struct io_kiocb *head,
return true;
io_for_each_link(req, head) {
- if (req->file && req->file->f_op == &io_uring_fops)
+ if (req->flags & REQ_F_INFLIGHT)
return true;
if (req->task->files == files)
return true;
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/8] io_uring: reliably cancel linked timeouts
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 3/8] io_uring: make del_task_file more forgiving Pavel Begunkov
` (5 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Linked timeouts are fired asynchronously (i.e. soft-irq), and use
generic cancellation paths to do its stuff, including poking into io-wq.
The problem is that it's racy to access tctx->io_wq, as
io_uring_task_cancel() and others may be happening at this exact moment.
Mark linked timeouts with REQ_F_INLIFGHT for now, making sure there are
no timeouts before io-wq destraction.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c340d7ba40a2..f82f46c604d7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5500,6 +5500,7 @@ static int io_timeout_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe,
data->mode = io_translate_timeout_mode(flags);
hrtimer_init(&data->timer, CLOCK_MONOTONIC, data->mode);
+ io_req_track_inflight(req);
return 0;
}
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/8] io_uring: make del_task_file more forgiving
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
2021-03-04 13:59 ` [PATCH 2/8] io_uring: reliably cancel linked timeouts Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 4/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
` (4 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Rework io_uring_del_task_file(), so it accepts an index to delete, and
it's not necessarily have to be in the ->xa. Infer file from xa_erase()
to maintain a single origin of truth.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f82f46c604d7..55e1ec4c0099 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8772,15 +8772,18 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
/*
* Remove this io_uring_file -> task mapping.
*/
-static void io_uring_del_task_file(struct file *file)
+static void io_uring_del_task_file(unsigned long index)
{
struct io_uring_task *tctx = current->io_uring;
+ struct file *file;
+
+ file = xa_erase(&tctx->xa, index);
+ if (!file)
+ return;
if (tctx->last == file)
tctx->last = NULL;
- file = xa_erase(&tctx->xa, (unsigned long)file);
- if (file)
- fput(file);
+ fput(file);
}
static void io_uring_clean_tctx(struct io_uring_task *tctx)
@@ -8789,7 +8792,7 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
unsigned long index;
xa_for_each(&tctx->xa, index, file)
- io_uring_del_task_file(file);
+ io_uring_del_task_file(index);
if (tctx->io_wq) {
io_wq_put_and_exit(tctx->io_wq);
tctx->io_wq = NULL;
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 4/8] io_uring: introduce ctx to tctx back map
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
` (2 preceding siblings ...)
2021-03-04 13:59 ` [PATCH 3/8] io_uring: make del_task_file more forgiving Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 5/8] io_uring: do ctx initiated file note removal Pavel Begunkov
` (3 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
For each pair tcxt-ctx create an object and chain it into ctx, so we
have a way to traverse all tctx that are using current ctx. Preparation
patch, will be used later.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 68 ++++++++++++++++++++++++++++++++++++---------------
1 file changed, 48 insertions(+), 20 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 55e1ec4c0099..8d632fa61799 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -454,6 +454,7 @@ struct io_ring_ctx {
/* Keep this last, we don't need it for the fast path */
struct work_struct exit_work;
+ struct list_head tctx_list;
};
/*
@@ -805,6 +806,13 @@ struct io_kiocb {
struct io_wq_work work;
};
+struct io_tctx_node {
+ struct list_head ctx_node;
+ struct task_struct *task;
+ struct file *file;
+ struct io_ring_ctx *ctx;
+};
+
struct io_defer_entry {
struct list_head list;
struct io_kiocb *req;
@@ -1144,6 +1152,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_LIST_HEAD(&ctx->rsrc_ref_list);
INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
init_llist_head(&ctx->rsrc_put_llist);
+ INIT_LIST_HEAD(&ctx->tctx_list);
INIT_LIST_HEAD(&ctx->submit_state.comp.free_list);
INIT_LIST_HEAD(&ctx->submit_state.comp.locked_free_list);
return ctx;
@@ -8735,6 +8744,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
{
struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_node *node;
int ret;
if (unlikely(!tctx)) {
@@ -8747,13 +8757,25 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
void *old = xa_load(&tctx->xa, (unsigned long)file);
if (!old) {
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return -ENOMEM;
+ node->ctx = ctx;
+ node->file = file;
+ node->task = current;
+
get_file(file);
ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
- file, GFP_KERNEL));
+ node, GFP_KERNEL));
if (ret) {
fput(file);
+ kfree(node);
return ret;
}
+
+ mutex_lock(&ctx->uring_lock);
+ list_add(&node->ctx_node, &ctx->tctx_list);
+ mutex_unlock(&ctx->uring_lock);
}
tctx->last = file;
}
@@ -8775,23 +8797,31 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
static void io_uring_del_task_file(unsigned long index)
{
struct io_uring_task *tctx = current->io_uring;
- struct file *file;
+ struct io_tctx_node *node;
- file = xa_erase(&tctx->xa, index);
- if (!file)
+ node = xa_erase(&tctx->xa, index);
+ if (!node)
return;
- if (tctx->last == file)
+ WARN_ON_ONCE(current != node->task);
+ WARN_ON_ONCE(list_empty(&node->ctx_node));
+
+ mutex_lock(&node->ctx->uring_lock);
+ list_del(&node->ctx_node);
+ mutex_unlock(&node->ctx->uring_lock);
+
+ if (tctx->last == node->file)
tctx->last = NULL;
- fput(file);
+ fput(node->file);
+ kfree(node);
}
static void io_uring_clean_tctx(struct io_uring_task *tctx)
{
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
- xa_for_each(&tctx->xa, index, file)
+ xa_for_each(&tctx->xa, index, node)
io_uring_del_task_file(index);
if (tctx->io_wq) {
io_wq_put_and_exit(tctx->io_wq);
@@ -8802,13 +8832,13 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
void __io_uring_files_cancel(struct files_struct *files)
{
struct io_uring_task *tctx = current->io_uring;
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
/* make sure overflow events are dropped */
atomic_inc(&tctx->in_idle);
- xa_for_each(&tctx->xa, index, file)
- io_uring_cancel_task_requests(file->private_data, files);
+ xa_for_each(&tctx->xa, index, node)
+ io_uring_cancel_task_requests(node->ctx, files);
atomic_dec(&tctx->in_idle);
if (files)
@@ -8874,11 +8904,11 @@ void __io_uring_task_cancel(void)
atomic_inc(&tctx->in_idle);
if (tctx->sqpoll) {
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
- xa_for_each(&tctx->xa, index, file)
- io_uring_cancel_sqpoll(file->private_data);
+ xa_for_each(&tctx->xa, index, node)
+ io_uring_cancel_sqpoll(node->ctx);
}
do {
@@ -8910,18 +8940,16 @@ void __io_uring_task_cancel(void)
void __io_uring_unshare(void)
{
struct io_uring_task *tctx = current->io_uring;
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
io_wq_unshare(tctx->io_wq);
if (!tctx->sqpoll)
return;
- xa_for_each(&tctx->xa, index, file) {
- struct io_ring_ctx *ctx = file->private_data;
-
- if (ctx->sq_data)
- io_sq_thread_stop(ctx->sq_data);
+ xa_for_each(&tctx->xa, index, node) {
+ if (node->ctx->sq_data)
+ io_sq_thread_stop(node->ctx->sq_data);
}
}
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 5/8] io_uring: do ctx initiated file note removal
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
` (3 preceding siblings ...)
2021-03-04 13:59 ` [PATCH 4/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 6/8] io_uring: don't take task ring-file notes Pavel Begunkov
` (2 subsequent siblings)
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Another preparation patch. When full quiesce is done on ctx exit, use
task_work infra to remove corresponding to the ctx io_uring->xa entries.
For that we use the back tctx map. Also use ->in_idle to prevent
removing it while we traversing ->xa on cancellation, just ignore it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8d632fa61799..d88f77f4bb7e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -987,6 +987,7 @@ static const struct io_op_def io_op_defs[] = {
[IORING_OP_UNLINKAT] = {},
};
+static void io_uring_del_task_file(unsigned long index);
static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
struct task_struct *task,
struct files_struct *files);
@@ -8521,10 +8522,33 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
return executed;
}
+struct io_tctx_exit {
+ struct callback_head task_work;
+ struct completion completion;
+ unsigned long index;
+};
+
+static void io_tctx_exit_cb(struct callback_head *cb)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_exit *work;
+
+ work = container_of(cb, struct io_tctx_exit, task_work);
+ /*
+ * When @in_idle, we're in cancellation and it's racy to remove the
+ * node. It'll be removed by the end of cancellation, just ignore it.
+ */
+ if (!atomic_read(&tctx->in_idle))
+ io_uring_del_task_file(work->index);
+ complete(&work->completion);
+}
+
static void io_ring_exit_work(struct work_struct *work)
{
- struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
- exit_work);
+ struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+ struct io_tctx_exit exit;
+ struct io_tctx_node *node;
+ int ret;
/*
* If we're doing polled IO and end up having requests being
@@ -8535,6 +8559,26 @@ static void io_ring_exit_work(struct work_struct *work)
do {
io_uring_try_cancel_requests(ctx, NULL, NULL);
} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
+
+ mutex_lock(&ctx->uring_lock);
+ while (!list_empty(&ctx->tctx_list)) {
+ node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
+ ctx_node);
+ exit.index = (unsigned long)node->file;
+ init_completion(&exit.completion);
+ init_task_work(&exit.task_work, io_tctx_exit_cb);
+ ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
+ if (WARN_ON_ONCE(ret))
+ continue;
+ wake_up_process(node->task);
+
+ mutex_unlock(&ctx->uring_lock);
+ wait_for_completion(&exit.completion);
+ cond_resched();
+ mutex_lock(&ctx->uring_lock);
+ }
+ mutex_unlock(&ctx->uring_lock);
+
io_ring_ctx_free(ctx);
}
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 6/8] io_uring: don't take task ring-file notes
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
` (4 preceding siblings ...)
2021-03-04 13:59 ` [PATCH 5/8] io_uring: do ctx initiated file note removal Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 7/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
With ->flush() gone we're now leaving all uring file notes until the
task dies/execs, so the ctx will not be freed until all tasks that have
ever submit a request die. It was nicer with flush but not much, we
could have locked as described ctx in many cases.
Now we guarantee that ctx outlives all tctx in a sense that
io_ring_exit_work() waits for all tctxs to drop their corresponding
enties in ->xa, and ctx won't go away until then. Hence, additional
io_uring file reference (a.k.a. task file notes) are not needed anymore.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d88f77f4bb7e..da93ae7b3aef 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8808,11 +8808,9 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
node->file = file;
node->task = current;
- get_file(file);
ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
node, GFP_KERNEL));
if (ret) {
- fput(file);
kfree(node);
return ret;
}
@@ -8843,6 +8841,8 @@ static void io_uring_del_task_file(unsigned long index)
struct io_uring_task *tctx = current->io_uring;
struct io_tctx_node *node;
+ if (!tctx)
+ return;
node = xa_erase(&tctx->xa, index);
if (!node)
return;
@@ -8856,7 +8856,6 @@ static void io_uring_del_task_file(unsigned long index)
if (tctx->last == node->file)
tctx->last = NULL;
- fput(node->file);
kfree(node);
}
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 7/8] io_uring: index io_uring->xa by ctx not file
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
` (5 preceding siblings ...)
2021-03-04 13:59 ` [PATCH 6/8] io_uring: don't take task ring-file notes Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
7 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't use task file notes anymore, and no need left in indexing
task->io_uring->xa by file, and replace it with ctx. It's better
design-wise, especially since we keep a dangling file, and so have to
keep an eye on not dereferencing it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 24 +++++++++++-------------
include/linux/io_uring.h | 2 +-
2 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index da93ae7b3aef..46a2417187ff 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -809,7 +809,6 @@ struct io_kiocb {
struct io_tctx_node {
struct list_head ctx_node;
struct task_struct *task;
- struct file *file;
struct io_ring_ctx *ctx;
};
@@ -8525,7 +8524,7 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
struct io_tctx_exit {
struct callback_head task_work;
struct completion completion;
- unsigned long index;
+ struct io_ring_ctx *ctx;
};
static void io_tctx_exit_cb(struct callback_head *cb)
@@ -8539,7 +8538,7 @@ static void io_tctx_exit_cb(struct callback_head *cb)
* node. It'll be removed by the end of cancellation, just ignore it.
*/
if (!atomic_read(&tctx->in_idle))
- io_uring_del_task_file(work->index);
+ io_uring_del_task_file((unsigned long)work->ctx);
complete(&work->completion);
}
@@ -8564,7 +8563,7 @@ static void io_ring_exit_work(struct work_struct *work)
while (!list_empty(&ctx->tctx_list)) {
node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
ctx_node);
- exit.index = (unsigned long)node->file;
+ exit.ctx = ctx;
init_completion(&exit.completion);
init_task_work(&exit.task_work, io_tctx_exit_cb);
ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
@@ -8785,7 +8784,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
/*
* Note that this task has used io_uring. We use it for cancelation purposes.
*/
-static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
+static int io_uring_add_task_file(struct io_ring_ctx *ctx)
{
struct io_uring_task *tctx = current->io_uring;
struct io_tctx_node *node;
@@ -8797,18 +8796,17 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
return ret;
tctx = current->io_uring;
}
- if (tctx->last != file) {
- void *old = xa_load(&tctx->xa, (unsigned long)file);
+ if (tctx->last != ctx) {
+ void *old = xa_load(&tctx->xa, (unsigned long)ctx);
if (!old) {
node = kmalloc(sizeof(*node), GFP_KERNEL);
if (!node)
return -ENOMEM;
node->ctx = ctx;
- node->file = file;
node->task = current;
- ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
+ ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx,
node, GFP_KERNEL));
if (ret) {
kfree(node);
@@ -8819,7 +8817,7 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
list_add(&node->ctx_node, &ctx->tctx_list);
mutex_unlock(&ctx->uring_lock);
}
- tctx->last = file;
+ tctx->last = ctx;
}
/*
@@ -8854,7 +8852,7 @@ static void io_uring_del_task_file(unsigned long index)
list_del(&node->ctx_node);
mutex_unlock(&node->ctx->uring_lock);
- if (tctx->last == node->file)
+ if (tctx->last == node->ctx)
tctx->last = NULL;
kfree(node);
}
@@ -9176,7 +9174,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
}
submitted = to_submit;
} else if (to_submit) {
- ret = io_uring_add_task_file(ctx, f.file);
+ ret = io_uring_add_task_file(ctx);
if (unlikely(ret))
goto out;
mutex_lock(&ctx->uring_lock);
@@ -9385,7 +9383,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
if (fd < 0)
return fd;
- ret = io_uring_add_task_file(ctx, file);
+ ret = io_uring_add_task_file(ctx);
if (ret) {
put_unused_fd(fd);
return ret;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 796e0d7d186d..0e7dfd9e82c0 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -18,7 +18,7 @@ struct io_uring_task {
/* submission side */
struct xarray xa;
struct wait_queue_head wait;
- struct file *last;
+ void *last;
void *io_wq;
struct percpu_counter inflight;
atomic_t in_idle;
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 8/8] io_uring: warn when ring exit takes too long
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
` (6 preceding siblings ...)
2021-03-04 13:59 ` [PATCH 7/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
@ 2021-03-04 13:59 ` Pavel Begunkov
2021-03-05 4:25 ` Jens Axboe
7 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-04 13:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
monitor whether removal hang or not. Add WARN_ONCE to catch hangs.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 46a2417187ff..b4820f6261fe 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8547,6 +8547,8 @@ static void io_ring_exit_work(struct work_struct *work)
struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
struct io_tctx_exit exit;
struct io_tctx_node *node;
+ const u64 bias_ms = MSEC_PER_SEC * 60 * 5;
+ ktime_t start = ktime_get();
int ret;
/*
@@ -8557,10 +8559,13 @@ static void io_ring_exit_work(struct work_struct *work)
*/
do {
io_uring_try_cancel_requests(ctx, NULL, NULL);
+ WARN_ON_ONCE(ktime_ms_delta(ktime_get(), start) > bias_ms);
} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
mutex_lock(&ctx->uring_lock);
while (!list_empty(&ctx->tctx_list)) {
+ WARN_ON_ONCE(ktime_ms_delta(ktime_get(), start) > bias_ms);
+
node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
ctx_node);
exit.ctx = ctx;
--
2.24.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 8/8] io_uring: warn when ring exit takes too long
2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
@ 2021-03-05 4:25 ` Jens Axboe
2021-03-05 11:15 ` Pavel Begunkov
0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2021-03-05 4:25 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 3/4/21 6:59 AM, Pavel Begunkov wrote:
> We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
> monitor whether removal hang or not. Add WARN_ONCE to catch hangs.
Minor nit, but I'd just use jiffies for this. Ala:
unsigned long timeout = jiffies + 60 * HZ;
if (time_after(jiffies, timeout))
complain();
That's a well known idiom, and we don't need better precision than that.
--
Jens Axboe
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 8/8] io_uring: warn when ring exit takes too long
2021-03-05 4:25 ` Jens Axboe
@ 2021-03-05 11:15 ` Pavel Begunkov
0 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2021-03-05 11:15 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 05/03/2021 04:25, Jens Axboe wrote:
> On 3/4/21 6:59 AM, Pavel Begunkov wrote:
>> We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
>> monitor whether removal hang or not. Add WARN_ONCE to catch hangs.
>
> Minor nit, but I'd just use jiffies for this. Ala:
>
> unsigned long timeout = jiffies + 60 * HZ;
>
> if (time_after(jiffies, timeout))
> complain();
>
> That's a well known idiom, and we don't need better precision than that.
And also looks much nicer, thanks
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2021-03-05 11:20 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-04 13:59 [PATCH 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-04 13:59 ` [PATCH 1/8] io_uring: cancel-match based on flags Pavel Begunkov
2021-03-04 13:59 ` [PATCH 2/8] io_uring: reliably cancel linked timeouts Pavel Begunkov
2021-03-04 13:59 ` [PATCH 3/8] io_uring: make del_task_file more forgiving Pavel Begunkov
2021-03-04 13:59 ` [PATCH 4/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
2021-03-04 13:59 ` [PATCH 5/8] io_uring: do ctx initiated file note removal Pavel Begunkov
2021-03-04 13:59 ` [PATCH 6/8] io_uring: don't take task ring-file notes Pavel Begunkov
2021-03-04 13:59 ` [PATCH 7/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
2021-03-04 13:59 ` [PATCH 8/8] io_uring: warn when ring exit takes too long Pavel Begunkov
2021-03-05 4:25 ` Jens Axboe
2021-03-05 11:15 ` Pavel Begunkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox