* [PATCH 0/3] rsrc quiesce fixes/hardening
@ 2021-02-19 20:45 Pavel Begunkov
2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
To: Jens Axboe, io-uring
1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
Pavel Begunkov (3):
io_uring: zero ref_node after killing it
io_uring: fix io_rsrc_ref_quiesce races
io_uring: keep generic rsrc infra generic
fs/io_uring.c | 65 +++++++++++++++++----------------------------------
1 file changed, 21 insertions(+), 44 deletions(-)
--
2.24.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] io_uring: zero ref_node after killing it
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
@ 2021-02-19 20:45 ` Pavel Begunkov
2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
To: Jens Axboe, io-uring
After a rsrc/files reference node's refs are killed, it must never be
used. And that's how it works, it either assigns a new node or kills the
whole data table.
Let's explicitly NULL it, that shouldn't be necessary, but if something
would go wrong I'd rather catch a NULL dereference to using a dangling
pointer.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b7bae301744b..50d4dba08f82 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7335,6 +7335,7 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
io_rsrc_ref_lock(ctx);
ref_node = data->node;
+ data->node = NULL;
io_rsrc_ref_unlock(ctx);
if (ref_node)
percpu_ref_kill(&ref_node->refs);
--
2.24.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
@ 2021-02-19 20:45 ` Pavel Begunkov
2021-02-20 6:31 ` Hao Xu
2021-02-19 20:45 ` [PATCH 3/3] io_uring: keep generic rsrc infra generic Pavel Begunkov
` (2 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
To: Jens Axboe, io-uring
There are different types of races in io_rsrc_ref_quiesce() between
->release() of percpu_refs and reinit_completion(), fix them by always
resurrecting between iterations. BTW, clean the function up, because
DRY.
Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 46 +++++++++++++---------------------------------
1 file changed, 13 insertions(+), 33 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 50d4dba08f82..38ed52065a29 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
percpu_ref_get(&rsrc_data->refs);
}
-static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
-{
- struct fixed_rsrc_ref_node *backup_node;
-
- backup_node = alloc_fixed_rsrc_ref_node(ctx);
- if (!backup_node)
- return -ENOMEM;
- init_fixed_file_ref_node(ctx, backup_node);
- io_sqe_rsrc_set_node(ctx, data, backup_node);
-
- return 0;
-}
-
static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
{
struct fixed_rsrc_ref_node *ref_node = NULL;
@@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
{
int ret;
- io_sqe_rsrc_kill_node(ctx, data);
- percpu_ref_kill(&data->refs);
-
- /* wait for all refs nodes to complete */
- flush_delayed_work(&ctx->rsrc_put_work);
do {
+ io_sqe_rsrc_kill_node(ctx, data);
+ percpu_ref_kill(&data->refs);
+ flush_delayed_work(&ctx->rsrc_put_work);
+
ret = wait_for_completion_interruptible(&data->done);
if (!ret)
break;
- ret = io_sqe_rsrc_add_node(ctx, data);
- if (ret < 0)
- break;
- /*
- * There is small possibility that data->done is already completed
- * So reinit it here
- */
+ percpu_ref_resurrect(&data->refs);
+ io_sqe_rsrc_set_node(ctx, data, backup_node);
reinit_completion(&data->done);
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
- io_sqe_rsrc_kill_node(ctx, data);
- } while (ret >= 0);
- if (ret < 0) {
- percpu_ref_resurrect(&data->refs);
- reinit_completion(&data->done);
- io_sqe_rsrc_set_node(ctx, data, backup_node);
- return ret;
- }
+ if (ret < 0)
+ return ret;
+ backup_node = alloc_fixed_rsrc_ref_node(ctx);
+ if (!backup_node)
+ return -ENOMEM;
+ init_fixed_file_ref_node(ctx, backup_node);
+ } while (1);
destroy_fixed_rsrc_ref_node(backup_node);
return 0;
--
2.24.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/3] io_uring: keep generic rsrc infra generic
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
@ 2021-02-19 20:45 ` Pavel Begunkov
2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-20 0:28 ` Jens Axboe
4 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_rsrc_ref_quiesce() is a generic resource function, though now it
was wired to allocate and initialise ref nodes with file-specific
callbacks/etc. Keep it sane by passing in as a parameters everything we
need for initialisations, otherwise it will hurt us badly one day.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 26 +++++++++++---------------
1 file changed, 11 insertions(+), 15 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 38ed52065a29..f3af499b12a9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1036,8 +1036,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
static void destroy_fixed_rsrc_ref_node(struct fixed_rsrc_ref_node *ref_node);
static struct fixed_rsrc_ref_node *alloc_fixed_rsrc_ref_node(
struct io_ring_ctx *ctx);
-static void init_fixed_file_ref_node(struct io_ring_ctx *ctx,
- struct fixed_rsrc_ref_node *ref_node);
+static void io_ring_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc);
static bool io_rw_reissue(struct io_kiocb *req);
static void io_cqring_fill_event(struct io_kiocb *req, long res);
@@ -7330,11 +7329,19 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
struct io_ring_ctx *ctx,
- struct fixed_rsrc_ref_node *backup_node)
+ void (*rsrc_put)(struct io_ring_ctx *ctx,
+ struct io_rsrc_put *prsrc))
{
+ struct fixed_rsrc_ref_node *backup_node;
int ret;
do {
+ backup_node = alloc_fixed_rsrc_ref_node(ctx);
+ if (!backup_node)
+ return -ENOMEM;
+ backup_node->rsrc_data = data;
+ backup_node->rsrc_put = rsrc_put;
+
io_sqe_rsrc_kill_node(ctx, data);
percpu_ref_kill(&data->refs);
flush_delayed_work(&ctx->rsrc_put_work);
@@ -7349,13 +7356,8 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
-
if (ret < 0)
return ret;
- backup_node = alloc_fixed_rsrc_ref_node(ctx);
- if (!backup_node)
- return -ENOMEM;
- init_fixed_file_ref_node(ctx, backup_node);
} while (1);
destroy_fixed_rsrc_ref_node(backup_node);
@@ -7390,7 +7392,6 @@ static void free_fixed_rsrc_data(struct fixed_rsrc_data *data)
static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
struct fixed_rsrc_data *data = ctx->file_data;
- struct fixed_rsrc_ref_node *backup_node;
unsigned nr_tables, i;
int ret;
@@ -7401,12 +7402,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
*/
if (!data || percpu_ref_is_dying(&data->refs))
return -ENXIO;
- backup_node = alloc_fixed_rsrc_ref_node(ctx);
- if (!backup_node)
- return -ENOMEM;
- init_fixed_file_ref_node(ctx, backup_node);
-
- ret = io_rsrc_ref_quiesce(data, ctx, backup_node);
+ ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
if (ret)
return ret;
--
2.24.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
` (2 preceding siblings ...)
2021-02-19 20:45 ` [PATCH 3/3] io_uring: keep generic rsrc infra generic Pavel Begunkov
@ 2021-02-19 20:49 ` Pavel Begunkov
2021-02-20 6:32 ` Hao Xu
2021-02-20 0:28 ` Jens Axboe
4 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:49 UTC (permalink / raw)
To: Jens Axboe, io-uring, Hao Xu
On 19/02/2021 20:45, Pavel Begunkov wrote:
> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
s/1/2/
Hao, any chance you guys can drag these patches through the same
tests you've done for "io_uring: don't hold uring_lock ..."?
>
> Pavel Begunkov (3):
> io_uring: zero ref_node after killing it
> io_uring: fix io_rsrc_ref_quiesce races
> io_uring: keep generic rsrc infra generic
>
> fs/io_uring.c | 65 +++++++++++++++++----------------------------------
> 1 file changed, 21 insertions(+), 44 deletions(-)
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
` (3 preceding siblings ...)
2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
@ 2021-02-20 0:28 ` Jens Axboe
4 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2021-02-20 0:28 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 2/19/21 1:45 PM, Pavel Begunkov wrote:
> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
>
> Pavel Begunkov (3):
> io_uring: zero ref_node after killing it
> io_uring: fix io_rsrc_ref_quiesce races
> io_uring: keep generic rsrc infra generic
>
> fs/io_uring.c | 65 +++++++++++++++++----------------------------------
> 1 file changed, 21 insertions(+), 44 deletions(-)
Looks (and tests) good to me, applied.
--
Jens Axboe
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
@ 2021-02-20 6:31 ` Hao Xu
2021-02-20 14:07 ` Pavel Begunkov
0 siblings, 1 reply; 12+ messages in thread
From: Hao Xu @ 2021-02-20 6:31 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
在 2021/2/20 上午4:45, Pavel Begunkov 写道:
> There are different types of races in io_rsrc_ref_quiesce() between
> ->release() of percpu_refs and reinit_completion(), fix them by always
> resurrecting between iterations. BTW, clean the function up, because
> DRY.
>
> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> fs/io_uring.c | 46 +++++++++++++---------------------------------
> 1 file changed, 13 insertions(+), 33 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 50d4dba08f82..38ed52065a29 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
> percpu_ref_get(&rsrc_data->refs);
> }
>
> -static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
> -{
> - struct fixed_rsrc_ref_node *backup_node;
> -
> - backup_node = alloc_fixed_rsrc_ref_node(ctx);
> - if (!backup_node)
> - return -ENOMEM;
> - init_fixed_file_ref_node(ctx, backup_node);
> - io_sqe_rsrc_set_node(ctx, data, backup_node);
> -
> - return 0;
> -}
> -
> static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
> {
> struct fixed_rsrc_ref_node *ref_node = NULL;
> @@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
> {
> int ret;
>
> - io_sqe_rsrc_kill_node(ctx, data);
> - percpu_ref_kill(&data->refs);
> -
> - /* wait for all refs nodes to complete */
> - flush_delayed_work(&ctx->rsrc_put_work);
> do {
> + io_sqe_rsrc_kill_node(ctx, data);
> + percpu_ref_kill(&data->refs);
> + flush_delayed_work(&ctx->rsrc_put_work);
> +
> ret = wait_for_completion_interruptible(&data->done);
> if (!ret)
> break;
>
> - ret = io_sqe_rsrc_add_node(ctx, data);
> - if (ret < 0)
> - break;
> - /*
> - * There is small possibility that data->done is already completed
> - * So reinit it here
> - */
> + percpu_ref_resurrect(&data->refs);
I've thought about this, if we resurrect data->refs, then we can't
avoid parallel files unregister by percpu_refs_is_dying.
> + io_sqe_rsrc_set_node(ctx, data, backup_node);
> reinit_completion(&data->done);
> mutex_unlock(&ctx->uring_lock);
> ret = io_run_task_work_sig();
> mutex_lock(&ctx->uring_lock);
> - io_sqe_rsrc_kill_node(ctx, data);
> - } while (ret >= 0);
>
> - if (ret < 0) {
> - percpu_ref_resurrect(&data->refs);
> - reinit_completion(&data->done);
> - io_sqe_rsrc_set_node(ctx, data, backup_node);
> - return ret;
> - }
> + if (ret < 0)
> + return ret;
> + backup_node = alloc_fixed_rsrc_ref_node(ctx);
> + if (!backup_node)
> + return -ENOMEM;
Should we resurrect data->refs and reinit completion before
signal or error return?
> + init_fixed_file_ref_node(ctx, backup_node);
> + } while (1);
>
> destroy_fixed_rsrc_ref_node(backup_node);
> return 0;
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
@ 2021-02-20 6:32 ` Hao Xu
2021-02-20 14:14 ` Pavel Begunkov
0 siblings, 1 reply; 12+ messages in thread
From: Hao Xu @ 2021-02-20 6:32 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
在 2021/2/20 上午4:49, Pavel Begunkov 写道:
> On 19/02/2021 20:45, Pavel Begunkov wrote:
>> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
>
> s/1/2/
>
> Hao, any chance you guys can drag these patches through the same
> tests you've done for "io_uring: don't hold uring_lock ..."?
>
gotcha, will test it soon.
>>
>> Pavel Begunkov (3):
>> io_uring: zero ref_node after killing it
>> io_uring: fix io_rsrc_ref_quiesce races
>> io_uring: keep generic rsrc infra generic
>>
>> fs/io_uring.c | 65 +++++++++++++++++----------------------------------
>> 1 file changed, 21 insertions(+), 44 deletions(-)
>>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
2021-02-20 6:31 ` Hao Xu
@ 2021-02-20 14:07 ` Pavel Begunkov
2021-02-20 14:33 ` Pavel Begunkov
2021-02-21 9:46 ` Hao Xu
0 siblings, 2 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-20 14:07 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 20/02/2021 06:31, Hao Xu wrote:
> 在 2021/2/20 上午4:45, Pavel Begunkov 写道:
>> There are different types of races in io_rsrc_ref_quiesce() between
>> ->release() of percpu_refs and reinit_completion(), fix them by always
>> resurrecting between iterations. BTW, clean the function up, because
>> DRY.
>>
>> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> fs/io_uring.c | 46 +++++++++++++---------------------------------
>> 1 file changed, 13 insertions(+), 33 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 50d4dba08f82..38ed52065a29 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
>> percpu_ref_get(&rsrc_data->refs);
>> }
>> -static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>> -{
>> - struct fixed_rsrc_ref_node *backup_node;
>> -
>> - backup_node = alloc_fixed_rsrc_ref_node(ctx);
>> - if (!backup_node)
>> - return -ENOMEM;
>> - init_fixed_file_ref_node(ctx, backup_node);
>> - io_sqe_rsrc_set_node(ctx, data, backup_node);
>> -
>> - return 0;
>> -}
>> -
>> static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>> {
>> struct fixed_rsrc_ref_node *ref_node = NULL;
>> @@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>> {
>> int ret;
>> - io_sqe_rsrc_kill_node(ctx, data);
>> - percpu_ref_kill(&data->refs);
>> -
>> - /* wait for all refs nodes to complete */
>> - flush_delayed_work(&ctx->rsrc_put_work);
>> do {
>> + io_sqe_rsrc_kill_node(ctx, data);
>> + percpu_ref_kill(&data->refs);
>> + flush_delayed_work(&ctx->rsrc_put_work);
>> +
>> ret = wait_for_completion_interruptible(&data->done);
>> if (!ret)
>> break;
>> - ret = io_sqe_rsrc_add_node(ctx, data);
>> - if (ret < 0)
>> - break;
>> - /*
>> - * There is small possibility that data->done is already completed
>> - * So reinit it here
>> - */
>> + percpu_ref_resurrect(&data->refs);
> I've thought about this, if we resurrect data->refs, then we can't
> avoid parallel files unregister by percpu_refs_is_dying.
Right, totally forgot about it, but otherwise we race with data->done.
Didn't test yet, but a diff below should do the trick.
>> + if (ret < 0)
>> + return ret;
>> + backup_node = alloc_fixed_rsrc_ref_node(ctx);
>> + if (!backup_node)
>> + return -ENOMEM;
> Should we resurrect data->refs and reinit completion before
> signal or error return?
Not sure what exactly you mean, we should not release uring_lock with
inconsistent state, so it's done right before unlock. And we should not
do it before wait_for_completion_interruptible() before it would take a
reference.
>> + init_fixed_file_ref_node(ctx, backup_node);
>> + } while (1);
>> destroy_fixed_rsrc_ref_node(backup_node);
>> return 0;
>>
>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ce5fccf00367..0af1572634de 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -236,6 +236,7 @@ struct fixed_rsrc_data {
struct fixed_rsrc_ref_node *node;
struct percpu_ref refs;
struct completion done;
+ bool quiesce;
};
struct io_buffer {
@@ -7335,6 +7336,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
struct fixed_rsrc_ref_node *backup_node;
int ret;
+ data->quiesce = true;
do {
backup_node = alloc_fixed_rsrc_ref_node(ctx);
if (!backup_node)
@@ -7353,16 +7355,19 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
percpu_ref_resurrect(&data->refs);
synchronize_rcu();
io_sqe_rsrc_set_node(ctx, data, backup_node);
+ backup_node = NULL;
reinit_completion(&data->done);
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
if (ret < 0)
return ret;
- } while (1);
+ } while (ret >= 0);
- destroy_fixed_rsrc_ref_node(backup_node);
- return 0;
+ data->quiesce = false;
+ if (backup_node)
+ destroy_fixed_rsrc_ref_node(backup_node);
+ return ret;
}
static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
@@ -7401,7 +7406,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
* Since we possibly drop uring lock later in this function to
* run task work.
*/
- if (!data || percpu_ref_is_dying(&data->refs))
+ if (!data || data->quiesce)
return -ENXIO;
ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
if (ret)
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
2021-02-20 6:32 ` Hao Xu
@ 2021-02-20 14:14 ` Pavel Begunkov
0 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-20 14:14 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 20/02/2021 06:32, Hao Xu wrote:
> 在 2021/2/20 上午4:49, Pavel Begunkov 写道:
>> On 19/02/2021 20:45, Pavel Begunkov wrote:
>>> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
>>
>> s/1/2/
>>
>> Hao, any chance you guys can drag these patches through the same
>> tests you've done for "io_uring: don't hold uring_lock ..."?
>>
> gotcha, will test it soon.
awesome, thanks
>>>
>>> Pavel Begunkov (3):
>>> io_uring: zero ref_node after killing it
>>> io_uring: fix io_rsrc_ref_quiesce races
>>> io_uring: keep generic rsrc infra generic
>>>
>>> fs/io_uring.c | 65 +++++++++++++++++----------------------------------
>>> 1 file changed, 21 insertions(+), 44 deletions(-)
>>>
>>
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
2021-02-20 14:07 ` Pavel Begunkov
@ 2021-02-20 14:33 ` Pavel Begunkov
2021-02-21 9:46 ` Hao Xu
1 sibling, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-20 14:33 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 20/02/2021 14:07, Pavel Begunkov wrote:
> On 20/02/2021 06:31, Hao Xu wrote:
>> 在 2021/2/20 上午4:45, Pavel Begunkov 写道:
>>> There are different types of races in io_rsrc_ref_quiesce() between
>>> ->release() of percpu_refs and reinit_completion(), fix them by always
>>> resurrecting between iterations. BTW, clean the function up, because
>>> DRY.
>>>
>>> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> - /*
>>> - * There is small possibility that data->done is already completed
>>> - * So reinit it here
>>> - */
>>> + percpu_ref_resurrect(&data->refs);
>> I've thought about this, if we resurrect data->refs, then we can't
>> avoid parallel files unregister by percpu_refs_is_dying.
>
> Right, totally forgot about it, but otherwise we race with data->done.
> Didn't test yet, but a diff below should do the trick.
>
>>> + if (ret < 0)
>>> + return ret;
>>> + backup_node = alloc_fixed_rsrc_ref_node(ctx);
>>> + if (!backup_node)
>>> + return -ENOMEM;
>> Should we resurrect data->refs and reinit completion before
>> signal or error return?
>
> Not sure what exactly you mean, we should not release uring_lock with
> inconsistent state, so it's done right before unlock. And we should not
> do it before wait_for_completion_interruptible() before it would take a
> reference.
>
>>> + init_fixed_file_ref_node(ctx, backup_node);
>>> + } while (1);
>>> destroy_fixed_rsrc_ref_node(backup_node);
>>> return 0;
upd
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ce5fccf00367..12796de8ad10 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -236,6 +236,7 @@ struct fixed_rsrc_data {
struct fixed_rsrc_ref_node *node;
struct percpu_ref refs;
struct completion done;
+ bool quiesce;
};
struct io_buffer {
@@ -7335,10 +7336,12 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
struct fixed_rsrc_ref_node *backup_node;
int ret;
+ data->quiesce = true;
do {
backup_node = alloc_fixed_rsrc_ref_node(ctx);
+ ret = -ENOMEM;
if (!backup_node)
- return -ENOMEM;
+ break;
backup_node->rsrc_data = data;
backup_node->rsrc_put = rsrc_put;
@@ -7353,16 +7356,17 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
percpu_ref_resurrect(&data->refs);
synchronize_rcu();
io_sqe_rsrc_set_node(ctx, data, backup_node);
+ backup_node = NULL;
reinit_completion(&data->done);
mutex_unlock(&ctx->uring_lock);
ret = io_run_task_work_sig();
mutex_lock(&ctx->uring_lock);
- if (ret < 0)
- return ret;
- } while (1);
+ } while (ret >= 0);
- destroy_fixed_rsrc_ref_node(backup_node);
- return 0;
+ data->quiesce = false;
+ if (backup_node)
+ destroy_fixed_rsrc_ref_node(backup_node);
+ return ret;
}
static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
@@ -7401,7 +7405,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
* Since we possibly drop uring lock later in this function to
* run task work.
*/
- if (!data || percpu_ref_is_dying(&data->refs))
+ if (!data || data->quiesce)
return -ENXIO;
ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
if (ret)
--
Pavel Begunkov
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
2021-02-20 14:07 ` Pavel Begunkov
2021-02-20 14:33 ` Pavel Begunkov
@ 2021-02-21 9:46 ` Hao Xu
1 sibling, 0 replies; 12+ messages in thread
From: Hao Xu @ 2021-02-21 9:46 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
在 2021/2/20 下午10:07, Pavel Begunkov 写道:
> On 20/02/2021 06:31, Hao Xu wrote:
>> 在 2021/2/20 上午4:45, Pavel Begunkov 写道:
>>> There are different types of races in io_rsrc_ref_quiesce() between
>>> ->release() of percpu_refs and reinit_completion(), fix them by always
>>> resurrecting between iterations. BTW, clean the function up, because
>>> DRY.
>>>
>>> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> fs/io_uring.c | 46 +++++++++++++---------------------------------
>>> 1 file changed, 13 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 50d4dba08f82..38ed52065a29 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
>>> percpu_ref_get(&rsrc_data->refs);
>>> }
>>> -static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>>> -{
>>> - struct fixed_rsrc_ref_node *backup_node;
>>> -
>>> - backup_node = alloc_fixed_rsrc_ref_node(ctx);
>>> - if (!backup_node)
>>> - return -ENOMEM;
>>> - init_fixed_file_ref_node(ctx, backup_node);
>>> - io_sqe_rsrc_set_node(ctx, data, backup_node);
>>> -
>>> - return 0;
>>> -}
>>> -
>>> static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>>> {
>>> struct fixed_rsrc_ref_node *ref_node = NULL;
>>> @@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>>> {
>>> int ret;
>>> - io_sqe_rsrc_kill_node(ctx, data);
>>> - percpu_ref_kill(&data->refs);
>>> -
>>> - /* wait for all refs nodes to complete */
>>> - flush_delayed_work(&ctx->rsrc_put_work);
>>> do {
>>> + io_sqe_rsrc_kill_node(ctx, data);
>>> + percpu_ref_kill(&data->refs);
>>> + flush_delayed_work(&ctx->rsrc_put_work);
>>> +
>>> ret = wait_for_completion_interruptible(&data->done);
>>> if (!ret)
>>> break;
>>> - ret = io_sqe_rsrc_add_node(ctx, data);
>>> - if (ret < 0)
>>> - break;
>>> - /*
>>> - * There is small possibility that data->done is already completed
>>> - * So reinit it here
>>> - */
>>> + percpu_ref_resurrect(&data->refs);
>> I've thought about this, if we resurrect data->refs, then we can't
>> avoid parallel files unregister by percpu_refs_is_dying.
>
> Right, totally forgot about it, but otherwise we race with data->done.
> Didn't test yet, but a diff below should do the trick.
I'll test the latest version soon.
>
>>> + if (ret < 0)
>>> + return ret;
>>> + backup_node = alloc_fixed_rsrc_ref_node(ctx);
>>> + if (!backup_node)
>>> + return -ENOMEM;
>> Should we resurrect data->refs and reinit completion before
>> signal or error return?
>
> Not sure what exactly you mean, we should not release uring_lock with
> inconsistent state, so it's done right before unlock. And we should not
> do it before wait_for_completion_interruptible() before it would take a
> reference.
Nevermind, I misread the code.
>
>>> + init_fixed_file_ref_node(ctx, backup_node);
>>> + } while (1);
>>> destroy_fixed_rsrc_ref_node(backup_node);
>>> return 0;
>>>
>>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index ce5fccf00367..0af1572634de 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -236,6 +236,7 @@ struct fixed_rsrc_data {
> struct fixed_rsrc_ref_node *node;
> struct percpu_ref refs;
> struct completion done;
> + bool quiesce;
> };
>
> struct io_buffer {
> @@ -7335,6 +7336,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
> struct fixed_rsrc_ref_node *backup_node;
> int ret;
>
> + data->quiesce = true;
> do {
> backup_node = alloc_fixed_rsrc_ref_node(ctx);
> if (!backup_node)
> @@ -7353,16 +7355,19 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
> percpu_ref_resurrect(&data->refs);
> synchronize_rcu();
> io_sqe_rsrc_set_node(ctx, data, backup_node);
> + backup_node = NULL;
> reinit_completion(&data->done);
> mutex_unlock(&ctx->uring_lock);
> ret = io_run_task_work_sig();
> mutex_lock(&ctx->uring_lock);
> if (ret < 0)
> return ret;
> - } while (1);
> + } while (ret >= 0);
>
> - destroy_fixed_rsrc_ref_node(backup_node);
> - return 0;
> + data->quiesce = false;
> + if (backup_node)
> + destroy_fixed_rsrc_ref_node(backup_node);
> + return ret;
> }
>
> static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
> @@ -7401,7 +7406,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
> * Since we possibly drop uring lock later in this function to
> * run task work.
> */
> - if (!data || percpu_ref_is_dying(&data->refs))
> + if (!data || data->quiesce)
> return -ENXIO;
> ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
> if (ret)
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-02-21 9:47 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
2021-02-20 6:31 ` Hao Xu
2021-02-20 14:07 ` Pavel Begunkov
2021-02-20 14:33 ` Pavel Begunkov
2021-02-21 9:46 ` Hao Xu
2021-02-19 20:45 ` [PATCH 3/3] io_uring: keep generic rsrc infra generic Pavel Begunkov
2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-20 6:32 ` Hao Xu
2021-02-20 14:14 ` Pavel Begunkov
2021-02-20 0:28 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox