* [PATCH v4 5.12 0/8] remove task file notes
@ 2021-03-06 11:02 Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
` (8 more replies)
0 siblings, 9 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
Introduce a mapping from ctx to all tctx, and using that removes
file notes, i.e. taking a io_uring file note previously stored in
task->io_uring->xa. It's needed because we don't free io_uring ctx
until all submitters die/exec, and it became worse after killing
->flush(). There are rough corner in a form of not behaving nicely,
I'll address in follow-up patches. Also use it to do cancellations
right.
The torture is as simple as below. It will get OOM in no time. Also,
I plan to use it to fix recently broken cancellations.
while (1) {
assert(!io_uring_queue_init(8, &ring, 0));
io_uring_queue_exit(&ring);
}
WARNING: hangs without reverting sq park refactoring
v2: rebase (resolve conflicts)
drop taken 2 patches
v3: use jiffies in 6/6 (Jens)
v4: 2 patches, 7/8 fix cancellation regressions
Pavel Begunkov (8):
io_uring: make del_task_file more forgiving
io_uring: introduce ctx to tctx back map
io_uring: do ctx initiated file note removal
io_uring: don't take task ring-file notes
io_uring: index io_uring->xa by ctx not file
io_uring: warn when ring exit takes too long
io_uring: cancel reqs of all iowq's on ring exit
io-wq: warn on creating manager awhile exiting
fs/io-wq.c | 2 +
fs/io_uring.c | 173 ++++++++++++++++++++++++++++++++-------
include/linux/io_uring.h | 2 +-
3 files changed, 147 insertions(+), 30 deletions(-)
--
2.24.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v4 1/8] io_uring: make del_task_file more forgiving
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 2/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
` (7 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
Rework io_uring_del_task_file(), so it accepts an index to delete, and
it's not necessarily have to be in the ->xa. Infer file from xa_erase()
to maintain a single origin of truth.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 92c25b5f1349..4c6a92e5d5a3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8780,15 +8780,18 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
/*
* Remove this io_uring_file -> task mapping.
*/
-static void io_uring_del_task_file(struct file *file)
+static void io_uring_del_task_file(unsigned long index)
{
struct io_uring_task *tctx = current->io_uring;
+ struct file *file;
+
+ file = xa_erase(&tctx->xa, index);
+ if (!file)
+ return;
if (tctx->last == file)
tctx->last = NULL;
- file = xa_erase(&tctx->xa, (unsigned long)file);
- if (file)
- fput(file);
+ fput(file);
}
static void io_uring_clean_tctx(struct io_uring_task *tctx)
@@ -8797,7 +8800,7 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
unsigned long index;
xa_for_each(&tctx->xa, index, file)
- io_uring_del_task_file(file);
+ io_uring_del_task_file(index);
if (tctx->io_wq) {
io_wq_put_and_exit(tctx->io_wq);
tctx->io_wq = NULL;
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 2/8] io_uring: introduce ctx to tctx back map
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 3/8] io_uring: do ctx initiated file note removal Pavel Begunkov
` (6 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
For each pair tcxt-ctx create an object and chain it into ctx, so we
have a way to traverse all tctx that are using current ctx. Preparation
patch, will be used later.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 58 ++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 44 insertions(+), 14 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4c6a92e5d5a3..f26f8199e4ab 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -454,6 +454,7 @@ struct io_ring_ctx {
/* Keep this last, we don't need it for the fast path */
struct work_struct exit_work;
+ struct list_head tctx_list;
};
/*
@@ -805,6 +806,13 @@ struct io_kiocb {
struct io_wq_work work;
};
+struct io_tctx_node {
+ struct list_head ctx_node;
+ struct task_struct *task;
+ struct file *file;
+ struct io_ring_ctx *ctx;
+};
+
struct io_defer_entry {
struct list_head list;
struct io_kiocb *req;
@@ -1144,6 +1152,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_LIST_HEAD(&ctx->rsrc_ref_list);
INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
init_llist_head(&ctx->rsrc_put_llist);
+ INIT_LIST_HEAD(&ctx->tctx_list);
INIT_LIST_HEAD(&ctx->submit_state.comp.free_list);
INIT_LIST_HEAD(&ctx->submit_state.comp.locked_free_list);
return ctx;
@@ -8743,6 +8752,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
{
struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_node *node;
int ret;
if (unlikely(!tctx)) {
@@ -8755,13 +8765,25 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
void *old = xa_load(&tctx->xa, (unsigned long)file);
if (!old) {
+ node = kmalloc(sizeof(*node), GFP_KERNEL);
+ if (!node)
+ return -ENOMEM;
+ node->ctx = ctx;
+ node->file = file;
+ node->task = current;
+
get_file(file);
ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
- file, GFP_KERNEL));
+ node, GFP_KERNEL));
if (ret) {
fput(file);
+ kfree(node);
return ret;
}
+
+ mutex_lock(&ctx->uring_lock);
+ list_add(&node->ctx_node, &ctx->tctx_list);
+ mutex_unlock(&ctx->uring_lock);
}
tctx->last = file;
}
@@ -8783,23 +8805,31 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
static void io_uring_del_task_file(unsigned long index)
{
struct io_uring_task *tctx = current->io_uring;
- struct file *file;
+ struct io_tctx_node *node;
- file = xa_erase(&tctx->xa, index);
- if (!file)
+ node = xa_erase(&tctx->xa, index);
+ if (!node)
return;
- if (tctx->last == file)
+ WARN_ON_ONCE(current != node->task);
+ WARN_ON_ONCE(list_empty(&node->ctx_node));
+
+ mutex_lock(&node->ctx->uring_lock);
+ list_del(&node->ctx_node);
+ mutex_unlock(&node->ctx->uring_lock);
+
+ if (tctx->last == node->file)
tctx->last = NULL;
- fput(file);
+ fput(node->file);
+ kfree(node);
}
static void io_uring_clean_tctx(struct io_uring_task *tctx)
{
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
- xa_for_each(&tctx->xa, index, file)
+ xa_for_each(&tctx->xa, index, node)
io_uring_del_task_file(index);
if (tctx->io_wq) {
io_wq_put_and_exit(tctx->io_wq);
@@ -8810,13 +8840,13 @@ static void io_uring_clean_tctx(struct io_uring_task *tctx)
void __io_uring_files_cancel(struct files_struct *files)
{
struct io_uring_task *tctx = current->io_uring;
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
/* make sure overflow events are dropped */
atomic_inc(&tctx->in_idle);
- xa_for_each(&tctx->xa, index, file)
- io_uring_cancel_task_requests(file->private_data, files);
+ xa_for_each(&tctx->xa, index, node)
+ io_uring_cancel_task_requests(node->ctx, files);
atomic_dec(&tctx->in_idle);
if (files)
@@ -8879,11 +8909,11 @@ void __io_uring_task_cancel(void)
atomic_inc(&tctx->in_idle);
if (tctx->sqpoll) {
- struct file *file;
+ struct io_tctx_node *node;
unsigned long index;
- xa_for_each(&tctx->xa, index, file)
- io_uring_cancel_sqpoll(file->private_data);
+ xa_for_each(&tctx->xa, index, node)
+ io_uring_cancel_sqpoll(node->ctx);
}
do {
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 3/8] io_uring: do ctx initiated file note removal
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 2/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 4/8] io_uring: don't take task ring-file notes Pavel Begunkov
` (5 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
Another preparation patch. When full quiesce is done on ctx exit, use
task_work infra to remove corresponding to the ctx io_uring->xa entries.
For that we use the back tctx map. Also use ->in_idle to prevent
removing it while we traversing ->xa on cancellation, just ignore it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 46 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f26f8199e4ab..692096e85749 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -987,6 +987,7 @@ static const struct io_op_def io_op_defs[] = {
[IORING_OP_UNLINKAT] = {},
};
+static void io_uring_del_task_file(unsigned long index);
static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
struct task_struct *task,
struct files_struct *files);
@@ -8531,10 +8532,33 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
return executed;
}
+struct io_tctx_exit {
+ struct callback_head task_work;
+ struct completion completion;
+ unsigned long index;
+};
+
+static void io_tctx_exit_cb(struct callback_head *cb)
+{
+ struct io_uring_task *tctx = current->io_uring;
+ struct io_tctx_exit *work;
+
+ work = container_of(cb, struct io_tctx_exit, task_work);
+ /*
+ * When @in_idle, we're in cancellation and it's racy to remove the
+ * node. It'll be removed by the end of cancellation, just ignore it.
+ */
+ if (!atomic_read(&tctx->in_idle))
+ io_uring_del_task_file(work->index);
+ complete(&work->completion);
+}
+
static void io_ring_exit_work(struct work_struct *work)
{
- struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
- exit_work);
+ struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+ struct io_tctx_exit exit;
+ struct io_tctx_node *node;
+ int ret;
/*
* If we're doing polled IO and end up having requests being
@@ -8545,6 +8569,26 @@ static void io_ring_exit_work(struct work_struct *work)
do {
io_uring_try_cancel_requests(ctx, NULL, NULL);
} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
+
+ mutex_lock(&ctx->uring_lock);
+ while (!list_empty(&ctx->tctx_list)) {
+ node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
+ ctx_node);
+ exit.index = (unsigned long)node->file;
+ init_completion(&exit.completion);
+ init_task_work(&exit.task_work, io_tctx_exit_cb);
+ ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
+ if (WARN_ON_ONCE(ret))
+ continue;
+ wake_up_process(node->task);
+
+ mutex_unlock(&ctx->uring_lock);
+ wait_for_completion(&exit.completion);
+ cond_resched();
+ mutex_lock(&ctx->uring_lock);
+ }
+ mutex_unlock(&ctx->uring_lock);
+
io_ring_ctx_free(ctx);
}
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 4/8] io_uring: don't take task ring-file notes
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
` (2 preceding siblings ...)
2021-03-06 11:02 ` [PATCH v4 3/8] io_uring: do ctx initiated file note removal Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
` (4 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
With ->flush() gone we're now leaving all uring file notes until the
task dies/execs, so the ctx will not be freed until all tasks that have
ever submit a request die. It was nicer with flush but not much, we
could have locked as described ctx in many cases.
Now we guarantee that ctx outlives all tctx in a sense that
io_ring_exit_work() waits for all tctxs to drop their corresponding
enties in ->xa, and ctx won't go away until then. Hence, additional
io_uring file reference (a.k.a. task file notes) are not needed anymore.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 692096e85749..a4e5acb058d2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8816,11 +8816,9 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
node->file = file;
node->task = current;
- get_file(file);
ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
node, GFP_KERNEL));
if (ret) {
- fput(file);
kfree(node);
return ret;
}
@@ -8851,6 +8849,8 @@ static void io_uring_del_task_file(unsigned long index)
struct io_uring_task *tctx = current->io_uring;
struct io_tctx_node *node;
+ if (!tctx)
+ return;
node = xa_erase(&tctx->xa, index);
if (!node)
return;
@@ -8864,7 +8864,6 @@ static void io_uring_del_task_file(unsigned long index)
if (tctx->last == node->file)
tctx->last = NULL;
- fput(node->file);
kfree(node);
}
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
` (3 preceding siblings ...)
2021-03-06 11:02 ` [PATCH v4 4/8] io_uring: don't take task ring-file notes Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 6/8] io_uring: warn when ring exit takes too long Pavel Begunkov
` (3 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't use task file notes anymore, and no need left in indexing
task->io_uring->xa by file, and replace it with ctx. It's better
design-wise, especially since we keep a dangling file, and so have to
keep an eye on not dereferencing it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 24 +++++++++++-------------
include/linux/io_uring.h | 2 +-
2 files changed, 12 insertions(+), 14 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index a4e5acb058d2..c5436b24d221 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -809,7 +809,6 @@ struct io_kiocb {
struct io_tctx_node {
struct list_head ctx_node;
struct task_struct *task;
- struct file *file;
struct io_ring_ctx *ctx;
};
@@ -8535,7 +8534,7 @@ static bool io_run_ctx_fallback(struct io_ring_ctx *ctx)
struct io_tctx_exit {
struct callback_head task_work;
struct completion completion;
- unsigned long index;
+ struct io_ring_ctx *ctx;
};
static void io_tctx_exit_cb(struct callback_head *cb)
@@ -8549,7 +8548,7 @@ static void io_tctx_exit_cb(struct callback_head *cb)
* node. It'll be removed by the end of cancellation, just ignore it.
*/
if (!atomic_read(&tctx->in_idle))
- io_uring_del_task_file(work->index);
+ io_uring_del_task_file((unsigned long)work->ctx);
complete(&work->completion);
}
@@ -8574,7 +8573,7 @@ static void io_ring_exit_work(struct work_struct *work)
while (!list_empty(&ctx->tctx_list)) {
node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
ctx_node);
- exit.index = (unsigned long)node->file;
+ exit.ctx = ctx;
init_completion(&exit.completion);
init_task_work(&exit.task_work, io_tctx_exit_cb);
ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
@@ -8793,7 +8792,7 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
/*
* Note that this task has used io_uring. We use it for cancelation purposes.
*/
-static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
+static int io_uring_add_task_file(struct io_ring_ctx *ctx)
{
struct io_uring_task *tctx = current->io_uring;
struct io_tctx_node *node;
@@ -8805,18 +8804,17 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
return ret;
tctx = current->io_uring;
}
- if (tctx->last != file) {
- void *old = xa_load(&tctx->xa, (unsigned long)file);
+ if (tctx->last != ctx) {
+ void *old = xa_load(&tctx->xa, (unsigned long)ctx);
if (!old) {
node = kmalloc(sizeof(*node), GFP_KERNEL);
if (!node)
return -ENOMEM;
node->ctx = ctx;
- node->file = file;
node->task = current;
- ret = xa_err(xa_store(&tctx->xa, (unsigned long)file,
+ ret = xa_err(xa_store(&tctx->xa, (unsigned long)ctx,
node, GFP_KERNEL));
if (ret) {
kfree(node);
@@ -8827,7 +8825,7 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
list_add(&node->ctx_node, &ctx->tctx_list);
mutex_unlock(&ctx->uring_lock);
}
- tctx->last = file;
+ tctx->last = ctx;
}
/*
@@ -8862,7 +8860,7 @@ static void io_uring_del_task_file(unsigned long index)
list_del(&node->ctx_node);
mutex_unlock(&node->ctx->uring_lock);
- if (tctx->last == node->file)
+ if (tctx->last == node->ctx)
tctx->last = NULL;
kfree(node);
}
@@ -9161,7 +9159,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
}
submitted = to_submit;
} else if (to_submit) {
- ret = io_uring_add_task_file(ctx, f.file);
+ ret = io_uring_add_task_file(ctx);
if (unlikely(ret))
goto out;
mutex_lock(&ctx->uring_lock);
@@ -9370,7 +9368,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
if (fd < 0)
return fd;
- ret = io_uring_add_task_file(ctx, file);
+ ret = io_uring_add_task_file(ctx);
if (ret) {
put_unused_fd(fd);
return ret;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 7cb7bd0e334c..9761a0ec9f95 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -18,7 +18,7 @@ struct io_uring_task {
/* submission side */
struct xarray xa;
struct wait_queue_head wait;
- struct file *last;
+ void *last;
void *io_wq;
struct percpu_counter inflight;
atomic_t in_idle;
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 6/8] io_uring: warn when ring exit takes too long
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
` (4 preceding siblings ...)
2021-03-06 11:02 ` [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit Pavel Begunkov
` (2 subsequent siblings)
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
We use system_unbound_wq to run io_ring_exit_work(), so it's hard to
monitor whether removal hang or not. Add WARN_ONCE to catch hangs.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c5436b24d221..e4c771df6364 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8555,6 +8555,7 @@ static void io_tctx_exit_cb(struct callback_head *cb)
static void io_ring_exit_work(struct work_struct *work)
{
struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+ unsigned long timeout = jiffies + HZ * 60 * 5;
struct io_tctx_exit exit;
struct io_tctx_node *node;
int ret;
@@ -8567,10 +8568,14 @@ static void io_ring_exit_work(struct work_struct *work)
*/
do {
io_uring_try_cancel_requests(ctx, NULL, NULL);
+
+ WARN_ON_ONCE(time_after(jiffies, timeout));
} while (!wait_for_completion_timeout(&ctx->ref_comp, HZ/20));
mutex_lock(&ctx->uring_lock);
while (!list_empty(&ctx->tctx_list)) {
+ WARN_ON_ONCE(time_after(jiffies, timeout));
+
node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
ctx_node);
exit.ctx = ctx;
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
` (5 preceding siblings ...)
2021-03-06 11:02 ` [PATCH v4 6/8] io_uring: warn when ring exit takes too long Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting Pavel Begunkov
2021-03-08 21:05 ` [PATCH v4 5.12 0/8] remove task file notes Jens Axboe
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_ring_exit_work() have to cancel all requests, including those staying
in io-wq, however it tries only cancellation of current tctx, which is
NULL. If we've got task==NULL, use the ctx-to-tctx map to go over all
tctx/io-wq and try cancellations on them.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 42 +++++++++++++++++++++++++++++++++++++++---
1 file changed, 39 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e4c771df6364..a367e65a2191 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8683,19 +8683,55 @@ static void io_cancel_defer_files(struct io_ring_ctx *ctx,
}
}
+static bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
+{
+ struct io_kiocb *req = container_of(work, struct io_kiocb, work);
+
+ return req->ctx == data;
+}
+
+static bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
+{
+ struct io_tctx_node *node;
+ enum io_wq_cancel cret;
+ bool ret = false;
+
+ mutex_lock(&ctx->uring_lock);
+ list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
+ struct io_uring_task *tctx = node->task->io_uring;
+
+ /*
+ * io_wq will stay alive while we hold uring_lock, because it's
+ * killed after ctx nodes, which requires to take the lock.
+ */
+ if (!tctx || !tctx->io_wq)
+ continue;
+ cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
+ ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
+ }
+ mutex_unlock(&ctx->uring_lock);
+
+ return ret;
+}
+
static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
struct task_struct *task,
struct files_struct *files)
{
struct io_task_cancel cancel = { .task = task, .files = files, };
- struct task_struct *tctx_task = task ?: current;
- struct io_uring_task *tctx = tctx_task->io_uring;
+ struct io_uring_task *tctx = task ? task->io_uring : NULL;
while (1) {
enum io_wq_cancel cret;
bool ret = false;
- if (tctx && tctx->io_wq) {
+ if (!task) {
+ ret |= io_uring_try_cancel_iowq(ctx);
+ } else if (tctx && tctx->io_wq) {
+ /*
+ * Cancels requests of all rings, not only @ctx, but
+ * it's fine as the task is in exit/exec.
+ */
cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_task_cb,
&cancel, true);
ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
` (6 preceding siblings ...)
2021-03-06 11:02 ` [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit Pavel Begunkov
@ 2021-03-06 11:02 ` Pavel Begunkov
2021-03-08 21:05 ` [PATCH v4 5.12 0/8] remove task file notes Jens Axboe
8 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-06 11:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
Add a simple warning making sure that nobody tries to create a new
manager while we're under IO_WQ_BIT_EXIT. That can potentially happen
due to racy work submission after final put.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 1bfdb86336e4..1ab9324e602f 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -774,6 +774,8 @@ static int io_wq_fork_manager(struct io_wq *wq)
if (wq->manager)
return 0;
+ WARN_ON_ONCE(test_bit(IO_WQ_BIT_EXIT, &wq->state));
+
init_completion(&wq->worker_done);
atomic_set(&wq->worker_refs, 1);
tsk = create_io_thread(io_wq_manager, wq, NUMA_NO_NODE);
--
2.24.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v4 5.12 0/8] remove task file notes
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
` (7 preceding siblings ...)
2021-03-06 11:02 ` [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting Pavel Begunkov
@ 2021-03-08 21:05 ` Jens Axboe
8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2021-03-08 21:05 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 3/6/21 4:02 AM, Pavel Begunkov wrote:
> Introduce a mapping from ctx to all tctx, and using that removes
> file notes, i.e. taking a io_uring file note previously stored in
> task->io_uring->xa. It's needed because we don't free io_uring ctx
> until all submitters die/exec, and it became worse after killing
> ->flush(). There are rough corner in a form of not behaving nicely,
> I'll address in follow-up patches. Also use it to do cancellations
> right.
>
> The torture is as simple as below. It will get OOM in no time. Also,
> I plan to use it to fix recently broken cancellations.
>
> while (1) {
> assert(!io_uring_queue_init(8, &ring, 0));
> io_uring_queue_exit(&ring);
> }
>
> WARNING: hangs without reverting sq park refactoring
Got fixed separately - forgot to write that this series is applied,
thanks!
--
Jens Axboe
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-03-08 21:06 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-06 11:02 [PATCH v4 5.12 0/8] remove task file notes Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 1/8] io_uring: make del_task_file more forgiving Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 2/8] io_uring: introduce ctx to tctx back map Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 3/8] io_uring: do ctx initiated file note removal Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 4/8] io_uring: don't take task ring-file notes Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 5/8] io_uring: index io_uring->xa by ctx not file Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 6/8] io_uring: warn when ring exit takes too long Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 7/8] io_uring: cancel reqs of all iowq's on ring exit Pavel Begunkov
2021-03-06 11:02 ` [PATCH v4 8/8] io-wq: warn on creating manager awhile exiting Pavel Begunkov
2021-03-08 21:05 ` [PATCH v4 5.12 0/8] remove task file notes Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox