* [PATCH v2 1/4] io_uring: zero ref_node after killing it
2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
@ 2021-02-20 18:03 ` Pavel Begunkov
2021-02-20 18:03 ` [PATCH v2 2/4] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
` (4 subsequent siblings)
5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
To: Jens Axboe, io-uring
After a rsrc/files reference node's refs are killed, it must never be
used again. That is already how it works: the code either assigns a new
node or kills the whole data table.
Let's explicitly NULL it anyway. That shouldn't be necessary, but if
something goes wrong I'd rather catch a NULL dereference than use a
dangling pointer.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b7bae301744b..50d4dba08f82 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7335,6 +7335,7 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
io_rsrc_ref_lock(ctx);
ref_node = data->node;
+ data->node = NULL;
io_rsrc_ref_unlock(ctx);
if (ref_node)
percpu_ref_kill(&ref_node->refs);
--
2.24.0
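For illustration only (not part of the patch): a minimal userspace sketch of the "detach, clear, then kill" pattern described in the commit message. The struct names and the `killed` flag are hypothetical stand-ins, and a C11 atomic exchange is used here in place of the kernel's io_rsrc_ref_lock()/io_rsrc_ref_unlock() pair and percpu_ref_kill().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct ref_node {
	bool killed;
};

struct rsrc_data {
	_Atomic(struct ref_node *) node;
};

/*
 * Take the current node out of the table and leave NULL behind, so any
 * later use of data->node faults on NULL instead of touching a dead node.
 * Assumption: an atomic exchange models "read and clear under the lock".
 */
static struct ref_node *rsrc_kill_node(struct rsrc_data *data)
{
	struct ref_node *node;

	node = atomic_exchange(&data->node, (struct ref_node *)NULL);
	if (node) {
		assert(!node->killed);	/* a node is killed at most once */
		node->killed = true;	/* models percpu_ref_kill() */
	}
	return node;
}
```

Calling rsrc_kill_node() a second time returns NULL rather than killing the same node twice, which is the kind of failure mode the explicit NULL is meant to make visible.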
* [PATCH v2 2/4] io_uring: fix io_rsrc_ref_quiesce races
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
To: Jens Axboe, io-uring
There are different types of races in io_rsrc_ref_quiesce() between
->release() of percpu_refs and reinit_completion(). Fix them by always
resurrecting between iterations. While at it, clean the function up to
remove the duplicated code (DRY).
Fixes: a4f2225d1cb2 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 57 +++++++++++++++++++++------------------------------
1 file changed, 23 insertions(+), 34 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 50d4dba08f82..292fba2b8e36 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -236,6 +236,7 @@ struct fixed_rsrc_data {
struct fixed_rsrc_ref_node *node;
struct percpu_ref refs;
struct completion done;
+ bool quiesce;
};
struct io_buffer {
@@ -7316,19 +7317,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
percpu_ref_get(&rsrc_data->refs);
}
-static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
-{
- struct fixed_rsrc_ref_node *backup_node;
-
- backup_node = alloc_fixed_rsrc_ref_node(ctx);
- if (!backup_node)
- return -ENOMEM;
- init_fixed_file_ref_node(ctx, backup_node);
- io_sqe_rsrc_set_node(ctx, data, backup_node);
-
- return 0;
-}
-
static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
{
struct fixed_rsrc_ref_node *ref_node = NULL;
@@ -7347,39 +7335,40 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
{
int ret;
- io_sqe_rsrc_kill_node(ctx, data);
- percpu_ref_kill(&data->refs);
+ if (data->quiesce)
+ return -ENXIO;
- /* wait for all refs nodes to complete */
- flush_delayed_work(&ctx->rsrc_put_work);
+ data->quiesce = true;
do {
+ io_sqe_rsrc_kill_node(ctx, data);
+ percpu_ref_kill(&data->refs);
+ flush_delayed_work(&ctx->rsrc_put_work);
+
ret = wait_for_completion_interruptible(&data->done);
if (!ret)
break;
- ret = io_sqe_rsrc_add_node(ctx, data);
- if (ret < 0)
- break;
- /*
- * There is small possibility that data->done is already completed
- * So reinit it here
- */
+ percpu_ref_resurrect(&data->refs);
+ io_sqe_rsrc_set_node(ctx, data, backup_node);
+ backup_node = NULL;
reinit_completion(&data->done);
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
- io_sqe_rsrc_kill_node(ctx, data);
- } while (ret >= 0);
- if (ret < 0) {
- percpu_ref_resurrect(&data->refs);
- reinit_completion(&data->done);
- io_sqe_rsrc_set_node(ctx, data, backup_node);
- return ret;
- }
+ if (ret < 0)
+ break;
+ backup_node = alloc_fixed_rsrc_ref_node(ctx);
+ ret = -ENOMEM;
+ if (!backup_node)
+ break;
+ init_fixed_file_ref_node(ctx, backup_node);
+ } while (1);
+ data->quiesce = false;
- destroy_fixed_rsrc_ref_node(backup_node);
- return 0;
+ if (backup_node)
+ destroy_fixed_rsrc_ref_node(backup_node);
+ return ret;
}
static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
--
2.24.0
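For illustration only (not part of the patch): a single-threaded sketch of the control flow this patch gives io_rsrc_ref_quiesce(), namely a `quiesce` guard against concurrent callers and a loop that kills, waits, and always resurrects before retrying. Everything here is a hypothetical model: the `kills`/`resurrects` counters, the callback types, and the `demo_*` helpers do not exist in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of fixed_rsrc_data; counters exist only for the demo. */
struct rsrc_data {
	bool quiesce;		/* a quiesce is already in flight */
	bool dying;		/* models a killed percpu_ref */
	int kills;
	int resurrects;
};

typedef int (*wait_fn)(void *arg);	/* 0: all refs dropped, <0: interrupted */
typedef int (*task_work_fn)(void *arg);	/* <0: give up (e.g. signal pending) */

static int rsrc_quiesce(struct rsrc_data *data, wait_fn wait,
			task_work_fn run_task_work, void *arg)
{
	int ret;

	if (data->quiesce)
		return -ENXIO;	/* refuse a concurrent quiesce */

	data->quiesce = true;
	do {
		data->dying = true;	/* kill at the top of every iteration */
		data->kills++;

		ret = wait(arg);
		if (!ret)
			break;		/* quiesced; stay killed */

		data->dying = false;	/* always resurrect before retrying */
		data->resurrects++;
		ret = run_task_work(arg);
	} while (ret >= 0);
	data->quiesce = false;

	return ret;
}

/* Demo callbacks: interrupt the wait a fixed number of times, then succeed. */
struct demo {
	int interrupts_left;
};

static int demo_wait(void *arg)
{
	struct demo *d = arg;

	if (d->interrupts_left > 0) {
		d->interrupts_left--;
		return -EINTR;
	}
	return 0;
}

static int demo_task_work(void *arg)
{
	(void)arg;
	return 0;
}
```

Because the kill sits at the top of the loop and the resurrect follows every interrupted wait, each iteration starts from the same state, which is the property the commit message relies on to close the races with reinit_completion().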
* [PATCH v2 3/4] io_uring: keep generic rsrc infra generic
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
To: Jens Axboe, io-uring
io_rsrc_ref_quiesce() is a generic resource function, but it is currently
wired to allocate and initialise ref nodes with file-specific
callbacks/etc. Keep it sane by passing in as parameters everything we
need for initialisation, otherwise it will hurt us badly one day.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 292fba2b8e36..b00ab7138410 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1037,8 +1037,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
static void destroy_fixed_rsrc_ref_node(struct fixed_rsrc_ref_node *ref_node);
static struct fixed_rsrc_ref_node *alloc_fixed_rsrc_ref_node(
struct io_ring_ctx *ctx);
-static void init_fixed_file_ref_node(struct io_ring_ctx *ctx,
- struct fixed_rsrc_ref_node *ref_node);
+static void io_ring_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc);
static bool io_rw_reissue(struct io_kiocb *req);
static void io_cqring_fill_event(struct io_kiocb *req, long res);
@@ -7331,8 +7330,10 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
struct io_ring_ctx *ctx,
- struct fixed_rsrc_ref_node *backup_node)
+ void (*rsrc_put)(struct io_ring_ctx *ctx,
+ struct io_rsrc_put *prsrc))
{
+ struct fixed_rsrc_ref_node *backup_node;
int ret;
if (data->quiesce)
@@ -7340,6 +7341,13 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
data->quiesce = true;
do {
+ ret = -ENOMEM;
+ backup_node = alloc_fixed_rsrc_ref_node(ctx);
+ if (!backup_node)
+ break;
+ backup_node->rsrc_data = data;
+ backup_node->rsrc_put = rsrc_put;
+
io_sqe_rsrc_kill_node(ctx, data);
percpu_ref_kill(&data->refs);
flush_delayed_work(&ctx->rsrc_put_work);
@@ -7355,15 +7363,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
-
- if (ret < 0)
- break;
- backup_node = alloc_fixed_rsrc_ref_node(ctx);
- ret = -ENOMEM;
- if (!backup_node)
- break;
- init_fixed_file_ref_node(ctx, backup_node);
- } while (1);
+ } while (ret >= 0);
data->quiesce = false;
if (backup_node)
@@ -7399,7 +7399,6 @@ static void free_fixed_rsrc_data(struct fixed_rsrc_data *data)
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
struct fixed_rsrc_data *data = ctx->file_data;
- struct fixed_rsrc_ref_node *backup_node;
unsigned nr_tables, i;
int ret;
@@ -7410,12 +7409,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
*/
if (!data || percpu_ref_is_dying(&data->refs))
return -ENXIO;
- backup_node = alloc_fixed_rsrc_ref_node(ctx);
- if (!backup_node)
- return -ENOMEM;
- init_fixed_file_ref_node(ctx, backup_node);
-
- ret = io_rsrc_ref_quiesce(data, ctx, backup_node);
+ ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
if (ret)
return ret;
--
2.24.0
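For illustration only (not part of the patch): a small sketch of the design choice made here, where the generic helper receives the resource-specific "put" callback as a parameter instead of calling a file-specific initialiser. The types and the `demo_*` callbacks are hypothetical; only the idea of storing `rsrc_data` and `rsrc_put` on the node comes from the diff.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct io_rsrc_put. */
struct rsrc_put_item {
	int id;
};

typedef void (*rsrc_put_fn)(void *ctx, struct rsrc_put_item *item);

/* Hypothetical stand-in for struct fixed_rsrc_ref_node. */
struct ref_node {
	void *rsrc_data;
	rsrc_put_fn rsrc_put;
};

/*
 * Generic allocation: the caller says how resources are released, so this
 * helper never needs to know whether it is handling files or anything else.
 */
static struct ref_node *alloc_ref_node(void *data, rsrc_put_fn put)
{
	struct ref_node *node;

	assert(put);	/* a node without a put callback cannot release anything */
	node = calloc(1, sizeof(*node));
	if (!node)
		return NULL;
	node->rsrc_data = data;
	node->rsrc_put = put;
	return node;
}

/* Two demo resource types sharing the same generic node code. */
static void demo_file_put(void *ctx, struct rsrc_put_item *item)
{
	*(int *)ctx += item->id;
}

static void demo_buf_put(void *ctx, struct rsrc_put_item *item)
{
	*(int *)ctx -= item->id;
}
```

A second resource type then only needs its own put callback; the allocation and quiesce paths stay unchanged.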
* [PATCH v2 4/4] io_uring: wait potential ->release() on resurrect
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
To: Jens Axboe, io-uring; +Cc: stable
There is a short window where percpu_refs have already dropped to zero, but
we still try to resurrect() them. Play nicer: wait for ->release() to happen
in this case and proceed as if everything is ok. One downside for ctx refs
is that we can ignore signal_pending() on a rare occasion, but someone
else should check for it later if needed.
Cc: <[email protected]> # 5.5+
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 26 ++++++++++++++++++--------
1 file changed, 18 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b00ab7138410..ce197af2d3c6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1104,6 +1104,21 @@ static inline void io_set_resource_node(struct io_kiocb *req)
}
}
+static bool io_refs_resurrect(struct percpu_ref *ref, struct completion *compl)
+{
+ if (!percpu_ref_tryget(ref)) {
+ /* already at zero, wait for ->release() */
+ if (!try_wait_for_completion(compl))
+ synchronize_rcu();
+ return false;
+ }
+
+ percpu_ref_resurrect(ref);
+ reinit_completion(compl);
+ percpu_ref_put(ref);
+ return true;
+}
+
static bool io_match_task(struct io_kiocb *head,
struct task_struct *task,
struct files_struct *files)
@@ -7353,13 +7368,11 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
flush_delayed_work(&ctx->rsrc_put_work);
ret = wait_for_completion_interruptible(&data->done);
- if (!ret)
+ if (!ret || !io_refs_resurrect(&data->refs, &data->done))
break;
- percpu_ref_resurrect(&data->refs);
io_sqe_rsrc_set_node(ctx, data, backup_node);
backup_node = NULL;
- reinit_completion(&data->done);
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
@@ -10094,10 +10107,8 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
mutex_lock(&ctx->uring_lock);
- if (ret) {
- percpu_ref_resurrect(&ctx->refs);
- goto out_quiesce;
- }
+ if (ret && io_refs_resurrect(&ctx->refs, &ctx->ref_comp))
+ return ret;
}
if (ctx->restricted) {
@@ -10189,7 +10200,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
if (io_register_op_must_quiesce(opcode)) {
/* bring the ctx back to life */
percpu_ref_reinit(&ctx->refs);
-out_quiesce:
reinit_completion(&ctx->ref_comp);
}
return ret;
--
2.24.0
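For illustration only (not part of the patch): a single-threaded model of the io_refs_resurrect() logic. It assumes a plain counter in place of a percpu_ref, with one "initial" reference that kill drops and resurrect re-takes, and a boolean in place of a struct completion. Because this model has no concurrency, ->release() has always finished by the time tryget fails, so the try_wait_for_completion()/synchronize_rcu() step is reduced to an assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct completion. */
struct completion {
	bool done;
};

/* Hypothetical stand-in for struct percpu_ref. */
struct ref {
	long count;			/* includes one initial ref while alive */
	bool dying;
	struct completion *compl;	/* signalled by the release step */
};

static void ref_init(struct ref *r, struct completion *c)
{
	r->count = 1;
	r->dying = false;
	r->compl = c;
	c->done = false;
}

static bool ref_tryget(struct ref *r)
{
	if (r->count == 0)
		return false;
	r->count++;
	return true;
}

static void ref_put(struct ref *r)
{
	assert(r->count > 0);
	if (--r->count == 0)
		r->compl->done = true;	/* models ->release() */
}

static void ref_kill(struct ref *r)
{
	r->dying = true;
	ref_put(r);			/* drop the initial reference */
}

static void ref_resurrect(struct ref *r)
{
	assert(r->dying && r->count > 0);
	r->dying = false;
	r->count++;			/* re-take the initial reference */
}

/* Returns false if the ref already hit zero, i.e. the quiesce succeeded. */
static bool refs_resurrect(struct ref *r)
{
	if (!ref_tryget(r)) {
		assert(r->compl->done);	/* release already ran in this model */
		return false;
	}

	ref_resurrect(r);
	r->compl->done = false;		/* models reinit_completion() */
	ref_put(r);			/* drop the temporary tryget reference */
	return true;
}
```

The temporary reference taken by ref_tryget() is what keeps the count from reaching zero between the check and the resurrect; without it, the resurrect could run on a ref whose release is already in progress.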
* Re: [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
From: Jens Axboe @ 2021-02-20 18:33 UTC
To: Pavel Begunkov, io-uring
On 2/20/21 11:03 AM, Pavel Begunkov wrote:
> v2: concurrent quiesce avoidance (Hao)
> resurrect-release patch
>
> Pavel Begunkov (4):
> io_uring: zero ref_node after killing it
> io_uring: fix io_rsrc_ref_quiesce races
> io_uring: keep generic rsrc infra generic
> io_uring: wait potential ->release() on resurrect
>
> fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
> 1 file changed, 45 insertions(+), 51 deletions(-)
Thanks, replaced existing series.
--
Jens Axboe
* Re: [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
From: Hao Xu @ 2021-02-21 13:22 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/2/21 2:03 AM, Pavel Begunkov wrote:
> v2: concurrent quiesce avoidance (Hao)
> resurrect-release patch
>
> Pavel Begunkov (4):
> io_uring: zero ref_node after killing it
> io_uring: fix io_rsrc_ref_quiesce races
> io_uring: keep generic rsrc infra generic
> io_uring: wait potential ->release() on resurrect
>
> fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
> 1 file changed, 45 insertions(+), 51 deletions(-)
>
I tested this patchset with the same tests
for "io_uring: don't hold uring_lock ..."
Tested-by: Hao Xu <[email protected]>
* Re: [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
From: Pavel Begunkov @ 2021-02-22 14:05 UTC
To: Hao Xu, Jens Axboe, io-uring
On 21/02/2021 13:22, Hao Xu wrote:
> On 2021/2/21 2:03 AM, Pavel Begunkov wrote:
>> v2: concurrent quiesce avoidance (Hao)
>> resurrect-release patch
>>
>> Pavel Begunkov (4):
>> io_uring: zero ref_node after killing it
>> io_uring: fix io_rsrc_ref_quiesce races
>> io_uring: keep generic rsrc infra generic
>> io_uring: wait potential ->release() on resurrect
>>
>> fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
>> 1 file changed, 45 insertions(+), 51 deletions(-)
>>
> I tested this patchset with the same tests
> for "io_uring: don't hold uring_lock ..."
>
> Tested-by: Hao Xu <[email protected]>
Great, thanks
FYI, looks like your emails have a strange encoding. It's
readable, but at least for me shows "undefined encoding".
--
Pavel Begunkov