public inbox for [email protected]
* [PATCH 0/3] rsrc quiesce fixes/hardening
@ 2021-02-19 20:45 Pavel Begunkov
  2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
  To: Jens Axboe, io-uring

1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.

Pavel Begunkov (3):
  io_uring: zero ref_node after killing it
  io_uring: fix io_rsrc_ref_quiesce races
  io_uring: keep generic rsrc infra generic

 fs/io_uring.c | 65 +++++++++++++++++----------------------------------
 1 file changed, 21 insertions(+), 44 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] io_uring: zero ref_node after killing it
  2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
@ 2021-02-19 20:45 ` Pavel Begunkov
  2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
  To: Jens Axboe, io-uring

After a rsrc/files reference node's refs are killed, it must never be
used again. And that's how it works: the code either assigns a new node
or kills the whole data table.

Let's explicitly NULL it anyway. That shouldn't be necessary, but if
something goes wrong I'd rather catch a NULL dereference than a use of
a dangling pointer.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b7bae301744b..50d4dba08f82 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7335,6 +7335,7 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
 
 	io_rsrc_ref_lock(ctx);
 	ref_node = data->node;
+	data->node = NULL;
 	io_rsrc_ref_unlock(ctx);
 	if (ref_node)
 		percpu_ref_kill(&ref_node->refs);
-- 
2.24.0



* [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
  2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
  2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
@ 2021-02-19 20:45 ` Pavel Begunkov
  2021-02-20  6:31   ` Hao Xu
  2021-02-19 20:45 ` [PATCH 3/3] io_uring: keep generic rsrc infra generic Pavel Begunkov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
  To: Jens Axboe, io-uring

There are different types of races in io_rsrc_ref_quiesce() between
->release() of percpu_refs and reinit_completion(); fix them by always
resurrecting the refs between iterations. While at it, clean the
function up, as it repeats itself (DRY).

Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 46 +++++++++++++---------------------------------
 1 file changed, 13 insertions(+), 33 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 50d4dba08f82..38ed52065a29 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
 	percpu_ref_get(&rsrc_data->refs);
 }
 
-static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
-{
-	struct fixed_rsrc_ref_node *backup_node;
-
-	backup_node = alloc_fixed_rsrc_ref_node(ctx);
-	if (!backup_node)
-		return -ENOMEM;
-	init_fixed_file_ref_node(ctx, backup_node);
-	io_sqe_rsrc_set_node(ctx, data, backup_node);
-
-	return 0;
-}
-
 static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
 {
 	struct fixed_rsrc_ref_node *ref_node = NULL;
@@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 {
 	int ret;
 
-	io_sqe_rsrc_kill_node(ctx, data);
-	percpu_ref_kill(&data->refs);
-
-	/* wait for all refs nodes to complete */
-	flush_delayed_work(&ctx->rsrc_put_work);
 	do {
+		io_sqe_rsrc_kill_node(ctx, data);
+		percpu_ref_kill(&data->refs);
+		flush_delayed_work(&ctx->rsrc_put_work);
+
 		ret = wait_for_completion_interruptible(&data->done);
 		if (!ret)
 			break;
 
-		ret = io_sqe_rsrc_add_node(ctx, data);
-		if (ret < 0)
-			break;
-		/*
-		 * There is small possibility that data->done is already completed
-		 * So reinit it here
-		 */
+		percpu_ref_resurrect(&data->refs);
+		io_sqe_rsrc_set_node(ctx, data, backup_node);
 		reinit_completion(&data->done);
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
-		io_sqe_rsrc_kill_node(ctx, data);
-	} while (ret >= 0);
 
-	if (ret < 0) {
-		percpu_ref_resurrect(&data->refs);
-		reinit_completion(&data->done);
-		io_sqe_rsrc_set_node(ctx, data, backup_node);
-		return ret;
-	}
+		if (ret < 0)
+			return ret;
+		backup_node = alloc_fixed_rsrc_ref_node(ctx);
+		if (!backup_node)
+			return -ENOMEM;
+		init_fixed_file_ref_node(ctx, backup_node);
+	} while (1);
 
 	destroy_fixed_rsrc_ref_node(backup_node);
 	return 0;
-- 
2.24.0



* [PATCH 3/3] io_uring: keep generic rsrc infra generic
  2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
  2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
  2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
@ 2021-02-19 20:45 ` Pavel Begunkov
  2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
  2021-02-20  0:28 ` Jens Axboe
  4 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:45 UTC (permalink / raw)
  To: Jens Axboe, io-uring

io_rsrc_ref_quiesce() is a generic resource function, but it is
currently wired to allocate and initialise ref nodes with file-specific
callbacks/etc. Keep it sane by passing in as parameters everything
needed for initialisation, otherwise it will hurt us badly one day.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 26 +++++++++++---------------
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 38ed52065a29..f3af499b12a9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1036,8 +1036,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 static void destroy_fixed_rsrc_ref_node(struct fixed_rsrc_ref_node *ref_node);
 static struct fixed_rsrc_ref_node *alloc_fixed_rsrc_ref_node(
 			struct io_ring_ctx *ctx);
-static void init_fixed_file_ref_node(struct io_ring_ctx *ctx,
-				     struct fixed_rsrc_ref_node *ref_node);
+static void io_ring_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc);
 
 static bool io_rw_reissue(struct io_kiocb *req);
 static void io_cqring_fill_event(struct io_kiocb *req, long res);
@@ -7330,11 +7329,19 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
 
 static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 			       struct io_ring_ctx *ctx,
-			       struct fixed_rsrc_ref_node *backup_node)
+			       void (*rsrc_put)(struct io_ring_ctx *ctx,
+						struct io_rsrc_put *prsrc))
 {
+	struct fixed_rsrc_ref_node *backup_node;
 	int ret;
 
 	do {
+		backup_node = alloc_fixed_rsrc_ref_node(ctx);
+		if (!backup_node)
+			return -ENOMEM;
+		backup_node->rsrc_data = data;
+		backup_node->rsrc_put = rsrc_put;
+
 		io_sqe_rsrc_kill_node(ctx, data);
 		percpu_ref_kill(&data->refs);
 		flush_delayed_work(&ctx->rsrc_put_work);
@@ -7349,13 +7356,8 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
-
 		if (ret < 0)
 			return ret;
-		backup_node = alloc_fixed_rsrc_ref_node(ctx);
-		if (!backup_node)
-			return -ENOMEM;
-		init_fixed_file_ref_node(ctx, backup_node);
 	} while (1);
 
 	destroy_fixed_rsrc_ref_node(backup_node);
@@ -7390,7 +7392,6 @@ static void free_fixed_rsrc_data(struct fixed_rsrc_data *data)
 static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 {
 	struct fixed_rsrc_data *data = ctx->file_data;
-	struct fixed_rsrc_ref_node *backup_node;
 	unsigned nr_tables, i;
 	int ret;
 
@@ -7401,12 +7402,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 	 */
 	if (!data || percpu_ref_is_dying(&data->refs))
 		return -ENXIO;
-	backup_node = alloc_fixed_rsrc_ref_node(ctx);
-	if (!backup_node)
-		return -ENOMEM;
-	init_fixed_file_ref_node(ctx, backup_node);
-
-	ret = io_rsrc_ref_quiesce(data, ctx, backup_node);
+	ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
 	if (ret)
 		return ret;
 
-- 
2.24.0



* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
  2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-02-19 20:45 ` [PATCH 3/3] io_uring: keep generic rsrc infra generic Pavel Begunkov
@ 2021-02-19 20:49 ` Pavel Begunkov
  2021-02-20  6:32   ` Hao Xu
  2021-02-20  0:28 ` Jens Axboe
  4 siblings, 1 reply; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-19 20:49 UTC (permalink / raw)
  To: Jens Axboe, io-uring, Hao Xu

On 19/02/2021 20:45, Pavel Begunkov wrote:
> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.

s/1/2/

Hao, any chance you guys can drag these patches through the same
tests you've done for "io_uring: don't hold uring_lock ..."?

> 
> Pavel Begunkov (3):
>   io_uring: zero ref_node after killing it
>   io_uring: fix io_rsrc_ref_quiesce races
>   io_uring: keep generic rsrc infra generic
> 
>  fs/io_uring.c | 65 +++++++++++++++++----------------------------------
>  1 file changed, 21 insertions(+), 44 deletions(-)
> 

-- 
Pavel Begunkov


* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
  2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
@ 2021-02-20  0:28 ` Jens Axboe
  4 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2021-02-20  0:28 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/19/21 1:45 PM, Pavel Begunkov wrote:
> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
> 
> Pavel Begunkov (3):
>   io_uring: zero ref_node after killing it
>   io_uring: fix io_rsrc_ref_quiesce races
>   io_uring: keep generic rsrc infra generic
> 
>  fs/io_uring.c | 65 +++++++++++++++++----------------------------------
>  1 file changed, 21 insertions(+), 44 deletions(-)

Looks (and tests) good to me, applied.

-- 
Jens Axboe



* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
  2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
@ 2021-02-20  6:31   ` Hao Xu
  2021-02-20 14:07     ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: Hao Xu @ 2021-02-20  6:31 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/2/20 4:45 AM, Pavel Begunkov wrote:
> There are different types of races in io_rsrc_ref_quiesce()  between
> ->release() of percpu_refs and reinit_completion(), fix them by always
> resurrecting between iterations. BTW, clean the function up, because
> DRY.
> 
> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
>   fs/io_uring.c | 46 +++++++++++++---------------------------------
>   1 file changed, 13 insertions(+), 33 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 50d4dba08f82..38ed52065a29 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
>   	percpu_ref_get(&rsrc_data->refs);
>   }
>   
> -static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
> -{
> -	struct fixed_rsrc_ref_node *backup_node;
> -
> -	backup_node = alloc_fixed_rsrc_ref_node(ctx);
> -	if (!backup_node)
> -		return -ENOMEM;
> -	init_fixed_file_ref_node(ctx, backup_node);
> -	io_sqe_rsrc_set_node(ctx, data, backup_node);
> -
> -	return 0;
> -}
> -
>   static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>   {
>   	struct fixed_rsrc_ref_node *ref_node = NULL;
> @@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>   {
>   	int ret;
>   
> -	io_sqe_rsrc_kill_node(ctx, data);
> -	percpu_ref_kill(&data->refs);
> -
> -	/* wait for all refs nodes to complete */
> -	flush_delayed_work(&ctx->rsrc_put_work);
>   	do {
> +		io_sqe_rsrc_kill_node(ctx, data);
> +		percpu_ref_kill(&data->refs);
> +		flush_delayed_work(&ctx->rsrc_put_work);
> +
>   		ret = wait_for_completion_interruptible(&data->done);
>   		if (!ret)
>   			break;
>   
> -		ret = io_sqe_rsrc_add_node(ctx, data);
> -		if (ret < 0)
> -			break;
> -		/*
> -		 * There is small possibility that data->done is already completed
> -		 * So reinit it here
> -		 */
> +		percpu_ref_resurrect(&data->refs);
I've thought about this: if we resurrect data->refs, then we can't
prevent a parallel files unregister via percpu_ref_is_dying().
> +		io_sqe_rsrc_set_node(ctx, data, backup_node);
>   		reinit_completion(&data->done);
>   		mutex_unlock(&ctx->uring_lock);
>   		ret = io_run_task_work_sig();
>   		mutex_lock(&ctx->uring_lock);
> -		io_sqe_rsrc_kill_node(ctx, data);
> -	} while (ret >= 0);
>   
> -	if (ret < 0) {
> -		percpu_ref_resurrect(&data->refs);
> -		reinit_completion(&data->done);
> -		io_sqe_rsrc_set_node(ctx, data, backup_node);
> -		return ret;
> -	}
> +		if (ret < 0)
> +			return ret;
> +		backup_node = alloc_fixed_rsrc_ref_node(ctx);
> +		if (!backup_node)
> +			return -ENOMEM;
Should we resurrect data->refs and reinit the completion before
returning on a signal or error?
> +		init_fixed_file_ref_node(ctx, backup_node);
> +	} while (1);
>   
>   	destroy_fixed_rsrc_ref_node(backup_node);
>   	return 0;
> 



* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
  2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
@ 2021-02-20  6:32   ` Hao Xu
  2021-02-20 14:14     ` Pavel Begunkov
  0 siblings, 1 reply; 12+ messages in thread
From: Hao Xu @ 2021-02-20  6:32 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/2/20 4:49 AM, Pavel Begunkov wrote:
> On 19/02/2021 20:45, Pavel Begunkov wrote:
>> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
> 
> s/1/2/
> 
> Hao, any chance you guys can drag these patches through the same
> tests you've done for "io_uring: don't hold uring_lock ..."?
> 
gotcha, will test it soon.
>>
>> Pavel Begunkov (3):
>>    io_uring: zero ref_node after killing it
>>    io_uring: fix io_rsrc_ref_quiesce races
>>    io_uring: keep generic rsrc infra generic
>>
>>   fs/io_uring.c | 65 +++++++++++++++++----------------------------------
>>   1 file changed, 21 insertions(+), 44 deletions(-)
>>
> 



* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
  2021-02-20  6:31   ` Hao Xu
@ 2021-02-20 14:07     ` Pavel Begunkov
  2021-02-20 14:33       ` Pavel Begunkov
  2021-02-21  9:46       ` Hao Xu
  0 siblings, 2 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-20 14:07 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, io-uring

On 20/02/2021 06:31, Hao Xu wrote:
> On 2021/2/20 4:45 AM, Pavel Begunkov wrote:
>> There are different types of races in io_rsrc_ref_quiesce()  between
>> ->release() of percpu_refs and reinit_completion(), fix them by always
>> resurrecting between iterations. BTW, clean the function up, because
>> DRY.
>>
>> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>   fs/io_uring.c | 46 +++++++++++++---------------------------------
>>   1 file changed, 13 insertions(+), 33 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 50d4dba08f82..38ed52065a29 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
>>       percpu_ref_get(&rsrc_data->refs);
>>   }
>>   -static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>> -{
>> -    struct fixed_rsrc_ref_node *backup_node;
>> -
>> -    backup_node = alloc_fixed_rsrc_ref_node(ctx);
>> -    if (!backup_node)
>> -        return -ENOMEM;
>> -    init_fixed_file_ref_node(ctx, backup_node);
>> -    io_sqe_rsrc_set_node(ctx, data, backup_node);
>> -
>> -    return 0;
>> -}
>> -
>>   static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>>   {
>>       struct fixed_rsrc_ref_node *ref_node = NULL;
>> @@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>>   {
>>       int ret;
>>   -    io_sqe_rsrc_kill_node(ctx, data);
>> -    percpu_ref_kill(&data->refs);
>> -
>> -    /* wait for all refs nodes to complete */
>> -    flush_delayed_work(&ctx->rsrc_put_work);
>>       do {
>> +        io_sqe_rsrc_kill_node(ctx, data);
>> +        percpu_ref_kill(&data->refs);
>> +        flush_delayed_work(&ctx->rsrc_put_work);
>> +
>>           ret = wait_for_completion_interruptible(&data->done);
>>           if (!ret)
>>               break;
>>   -        ret = io_sqe_rsrc_add_node(ctx, data);
>> -        if (ret < 0)
>> -            break;
>> -        /*
>> -         * There is small possibility that data->done is already completed
>> -         * So reinit it here
>> -         */
>> +        percpu_ref_resurrect(&data->refs);
> I've thought about this, if we resurrect data->refs, then we can't
> avoid parallel files unregister by percpu_refs_is_dying.

Right, totally forgot about it, but otherwise we race with data->done.
Didn't test yet, but a diff below should do the trick.

>> +        if (ret < 0)
>> +            return ret;
>> +        backup_node = alloc_fixed_rsrc_ref_node(ctx);
>> +        if (!backup_node)
>> +            return -ENOMEM;
> Should we resurrect data->refs and reinit completion before
> signal or error return?

Not sure what exactly you mean. We should not release uring_lock with
inconsistent state, so it's done right before unlock. And we should not
do it before wait_for_completion_interruptible(), because that would
take a reference.

>> +        init_fixed_file_ref_node(ctx, backup_node);
>> +    } while (1);
>>         destroy_fixed_rsrc_ref_node(backup_node);
>>       return 0;
>>
> 

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ce5fccf00367..0af1572634de 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -236,6 +236,7 @@ struct fixed_rsrc_data {
 	struct fixed_rsrc_ref_node	*node;
 	struct percpu_ref		refs;
 	struct completion		done;
+	bool				quiesce;
 };
 
 struct io_buffer {
@@ -7335,6 +7336,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 	struct fixed_rsrc_ref_node *backup_node;
 	int ret;
 
+	data->quiesce = true;
 	do {
 		backup_node = alloc_fixed_rsrc_ref_node(ctx);
 		if (!backup_node)
@@ -7353,16 +7355,19 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 		percpu_ref_resurrect(&data->refs);
 		synchronize_rcu();
 		io_sqe_rsrc_set_node(ctx, data, backup_node);
+		backup_node = NULL;
 		reinit_completion(&data->done);
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
 		if (ret < 0)
 			return ret;
-	} while (1);
+	} while (ret >= 0);
 
-	destroy_fixed_rsrc_ref_node(backup_node);
-	return 0;
+	data->quiesce = false;
+	if (backup_node)
+		destroy_fixed_rsrc_ref_node(backup_node);
+	return ret;
 }
 
 static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
@@ -7401,7 +7406,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 	 * Since we possibly drop uring lock later in this function to
 	 * run task work.
 	 */
-	if (!data || percpu_ref_is_dying(&data->refs))
+	if (!data || data->quiesce)
 		return -ENXIO;
 	ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
 	if (ret)


* Re: [PATCH 0/3] rsrc quiesce fixes/hardening
  2021-02-20  6:32   ` Hao Xu
@ 2021-02-20 14:14     ` Pavel Begunkov
  0 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-20 14:14 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, io-uring

On 20/02/2021 06:32, Hao Xu wrote:
> On 2021/2/20 4:49 AM, Pavel Begunkov wrote:
>> On 19/02/2021 20:45, Pavel Begunkov wrote:
>>> 1/3 addresses new races in io_rsrc_ref_quiesce(), others are hardenings.
>>
>> s/1/2/
>>
>> Hao, any chance you guys can drag these patches through the same
>> tests you've done for "io_uring: don't hold uring_lock ..."?
>>
> gotcha, will test it soon.

awesome, thanks

>>>
>>> Pavel Begunkov (3):
>>>    io_uring: zero ref_node after killing it
>>>    io_uring: fix io_rsrc_ref_quiesce races
>>>    io_uring: keep generic rsrc infra generic
>>>
>>>   fs/io_uring.c | 65 +++++++++++++++++----------------------------------
>>>   1 file changed, 21 insertions(+), 44 deletions(-)
>>>
>>
> 

-- 
Pavel Begunkov


* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
  2021-02-20 14:07     ` Pavel Begunkov
@ 2021-02-20 14:33       ` Pavel Begunkov
  2021-02-21  9:46       ` Hao Xu
  1 sibling, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2021-02-20 14:33 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe, io-uring

On 20/02/2021 14:07, Pavel Begunkov wrote:
> On 20/02/2021 06:31, Hao Xu wrote:
>> On 2021/2/20 4:45 AM, Pavel Begunkov wrote:
>>> There are different types of races in io_rsrc_ref_quiesce()  between
>>> ->release() of percpu_refs and reinit_completion(), fix them by always
>>> resurrecting between iterations. BTW, clean the function up, because
>>> DRY.
>>>
>>> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> -        /*
>>> -         * There is small possibility that data->done is already completed
>>> -         * So reinit it here
>>> -         */
>>> +        percpu_ref_resurrect(&data->refs);
>> I've thought about this, if we resurrect data->refs, then we can't
>> avoid parallel files unregister by percpu_refs_is_dying.
> 
> Right, totally forgot about it, but otherwise we race with data->done.
> Didn't test yet, but a diff below should do the trick.
> 
>>> +        if (ret < 0)
>>> +            return ret;
>>> +        backup_node = alloc_fixed_rsrc_ref_node(ctx);
>>> +        if (!backup_node)
>>> +            return -ENOMEM;
>> Should we resurrect data->refs and reinit completion before
>> signal or error return?
> 
> Not sure what exactly you mean, we should not release uring_lock with
> inconsistent state, so it's done right before unlock. And we should not
> do it before wait_for_completion_interruptible() before it would take a
> reference.
> 
>>> +        init_fixed_file_ref_node(ctx, backup_node);
>>> +    } while (1);
>>>         destroy_fixed_rsrc_ref_node(backup_node);
>>>       return 0;

upd

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ce5fccf00367..12796de8ad10 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -236,6 +236,7 @@ struct fixed_rsrc_data {
 	struct fixed_rsrc_ref_node	*node;
 	struct percpu_ref		refs;
 	struct completion		done;
+	bool				quiesce;
 };
 
 struct io_buffer {
@@ -7335,10 +7336,12 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 	struct fixed_rsrc_ref_node *backup_node;
 	int ret;
 
+	data->quiesce = true;
 	do {
 		backup_node = alloc_fixed_rsrc_ref_node(ctx);
+		ret = -ENOMEM;
 		if (!backup_node)
-			return -ENOMEM;
+			break;
 		backup_node->rsrc_data = data;
 		backup_node->rsrc_put = rsrc_put;
 
@@ -7353,16 +7356,17 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 		percpu_ref_resurrect(&data->refs);
 		synchronize_rcu();
 		io_sqe_rsrc_set_node(ctx, data, backup_node);
+		backup_node = NULL;
 		reinit_completion(&data->done);
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
-		if (ret < 0)
-			return ret;
-	} while (1);
+	} while (ret >= 0);
 
-	destroy_fixed_rsrc_ref_node(backup_node);
-	return 0;
+	data->quiesce = false;
+	if (backup_node)
+		destroy_fixed_rsrc_ref_node(backup_node);
+	return ret;
 }
 
 static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
@@ -7401,7 +7405,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 	 * Since we possibly drop uring lock later in this function to
 	 * run task work.
 	 */
-	if (!data || percpu_ref_is_dying(&data->refs))
+	if (!data || data->quiesce)
 		return -ENXIO;
 	ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
 	if (ret)


-- 
Pavel Begunkov


* Re: [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races
  2021-02-20 14:07     ` Pavel Begunkov
  2021-02-20 14:33       ` Pavel Begunkov
@ 2021-02-21  9:46       ` Hao Xu
  1 sibling, 0 replies; 12+ messages in thread
From: Hao Xu @ 2021-02-21  9:46 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/2/20 10:07 PM, Pavel Begunkov wrote:
> On 20/02/2021 06:31, Hao Xu wrote:
>> On 2021/2/20 4:45 AM, Pavel Begunkov wrote:
>>> There are different types of races in io_rsrc_ref_quiesce()  between
>>> ->release() of percpu_refs and reinit_completion(), fix them by always
>>> resurrecting between iterations. BTW, clean the function up, because
>>> DRY.
>>>
>>> Fixes: 0ce4a72632317 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>>    fs/io_uring.c | 46 +++++++++++++---------------------------------
>>>    1 file changed, 13 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 50d4dba08f82..38ed52065a29 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -7316,19 +7316,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
>>>        percpu_ref_get(&rsrc_data->refs);
>>>    }
>>>    -static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>>> -{
>>> -    struct fixed_rsrc_ref_node *backup_node;
>>> -
>>> -    backup_node = alloc_fixed_rsrc_ref_node(ctx);
>>> -    if (!backup_node)
>>> -        return -ENOMEM;
>>> -    init_fixed_file_ref_node(ctx, backup_node);
>>> -    io_sqe_rsrc_set_node(ctx, data, backup_node);
>>> -
>>> -    return 0;
>>> -}
>>> -
>>>    static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
>>>    {
>>>        struct fixed_rsrc_ref_node *ref_node = NULL;
>>> @@ -7347,36 +7334,29 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>>>    {
>>>        int ret;
>>>    -    io_sqe_rsrc_kill_node(ctx, data);
>>> -    percpu_ref_kill(&data->refs);
>>> -
>>> -    /* wait for all refs nodes to complete */
>>> -    flush_delayed_work(&ctx->rsrc_put_work);
>>>        do {
>>> +        io_sqe_rsrc_kill_node(ctx, data);
>>> +        percpu_ref_kill(&data->refs);
>>> +        flush_delayed_work(&ctx->rsrc_put_work);
>>> +
>>>            ret = wait_for_completion_interruptible(&data->done);
>>>            if (!ret)
>>>                break;
>>>    -        ret = io_sqe_rsrc_add_node(ctx, data);
>>> -        if (ret < 0)
>>> -            break;
>>> -        /*
>>> -         * There is small possibility that data->done is already completed
>>> -         * So reinit it here
>>> -         */
>>> +        percpu_ref_resurrect(&data->refs);
>> I've thought about this, if we resurrect data->refs, then we can't
>> avoid parallel files unregister by percpu_refs_is_dying.
> 
> Right, totally forgot about it, but otherwise we race with data->done.
> Didn't test yet, but a diff below should do the trick.
I'll test the latest version soon.
> 
>>> +        if (ret < 0)
>>> +            return ret;
>>> +        backup_node = alloc_fixed_rsrc_ref_node(ctx);
>>> +        if (!backup_node)
>>> +            return -ENOMEM;
>> Should we resurrect data->refs and reinit completion before
>> signal or error return?
> 
> Not sure what exactly you mean, we should not release uring_lock with
> inconsistent state, so it's done right before unlock. And we should not
> do it before wait_for_completion_interruptible() before it would take a
> reference.
Nevermind, I misread the code.
> 
>>> +        init_fixed_file_ref_node(ctx, backup_node);
>>> +    } while (1);
>>>          destroy_fixed_rsrc_ref_node(backup_node);
>>>        return 0;
>>>
>>
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index ce5fccf00367..0af1572634de 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -236,6 +236,7 @@ struct fixed_rsrc_data {
>   	struct fixed_rsrc_ref_node	*node;
>   	struct percpu_ref		refs;
>   	struct completion		done;
> +	bool				quiesce;
>   };
>   
>   struct io_buffer {
> @@ -7335,6 +7336,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>   	struct fixed_rsrc_ref_node *backup_node;
>   	int ret;
>   
> +	data->quiesce = true;
>   	do {
>   		backup_node = alloc_fixed_rsrc_ref_node(ctx);
>   		if (!backup_node)
> @@ -7353,16 +7355,19 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
>   		percpu_ref_resurrect(&data->refs);
>   		synchronize_rcu();
>   		io_sqe_rsrc_set_node(ctx, data, backup_node);
> +		backup_node = NULL;
>   		reinit_completion(&data->done);
>   		mutex_unlock(&ctx->uring_lock);
>   		ret = io_run_task_work_sig();
>   		mutex_lock(&ctx->uring_lock);
>   		if (ret < 0)
>   			return ret;
> -	} while (1);
> +	} while (ret >= 0);
>   
> -	destroy_fixed_rsrc_ref_node(backup_node);
> -	return 0;
> +	data->quiesce = false;
> +	if (backup_node)
> +		destroy_fixed_rsrc_ref_node(backup_node);
> +	return ret;
>   }
>   
>   static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
> @@ -7401,7 +7406,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
>   	 * Since we possibly drop uring lock later in this function to
>   	 * run task work.
>   	 */
> -	if (!data || percpu_ref_is_dying(&data->refs))
> +	if (!data || data->quiesce)
>   		return -ENXIO;
>   	ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
>   	if (ret)
> 



end of thread, other threads:[~2021-02-21  9:47 UTC | newest]

Thread overview: 12+ messages
2021-02-19 20:45 [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-19 20:45 ` [PATCH 1/3] io_uring: zero ref_node after killing it Pavel Begunkov
2021-02-19 20:45 ` [PATCH 2/3] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
2021-02-20  6:31   ` Hao Xu
2021-02-20 14:07     ` Pavel Begunkov
2021-02-20 14:33       ` Pavel Begunkov
2021-02-21  9:46       ` Hao Xu
2021-02-19 20:45 ` [PATCH 3/3] io_uring: keep generic rsrc infra generic Pavel Begunkov
2021-02-19 20:49 ` [PATCH 0/3] rsrc quiesce fixes/hardening Pavel Begunkov
2021-02-20  6:32   ` Hao Xu
2021-02-20 14:14     ` Pavel Begunkov
2021-02-20  0:28 ` Jens Axboe
