public inbox for [email protected]
* [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
@ 2021-02-20 18:03 Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 1/4] io_uring: zero ref_node after killing it Pavel Begunkov
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
  To: Jens Axboe, io-uring

v2: concurrent quiesce avoidance (Hao)
    resurrect-release patch

Pavel Begunkov (4):
  io_uring: zero ref_node after killing it
  io_uring: fix io_rsrc_ref_quiesce races
  io_uring: keep generic rsrc infra generic
  io_uring: wait potential ->release() on resurrect

 fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
 1 file changed, 45 insertions(+), 51 deletions(-)

-- 
2.24.0



* [PATCH v2 1/4] io_uring: zero ref_node after killing it
  2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
@ 2021-02-20 18:03 ` Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 2/4] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
  To: Jens Axboe, io-uring

After a rsrc/files reference node's refs are killed, the node must never
be used again. That is already how it works: the code either assigns a
new node or kills the whole data table.

Explicitly NULL it anyway. That shouldn't be necessary, but if something
goes wrong I'd rather catch a NULL dereference than use a dangling
pointer.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b7bae301744b..50d4dba08f82 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7335,6 +7335,7 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
 
 	io_rsrc_ref_lock(ctx);
 	ref_node = data->node;
+	data->node = NULL;
 	io_rsrc_ref_unlock(ctx);
 	if (ref_node)
 		percpu_ref_kill(&ref_node->refs);
-- 
2.24.0
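
The hardening in 1/4 can be illustrated outside the kernel. Below is a
minimal userspace sketch, assuming hypothetical stand-in types
(model_ref_node, model_rsrc_data) and a hypothetical helper
(model_kill_node); none of these names exist in fs/io_uring.c, and the
io_rsrc_ref_lock() section is left out because the sketch is
single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct model_ref_node {
	bool killed;
};

struct model_rsrc_data {
	struct model_ref_node *node;
};

/*
 * Detach the current node from the table, then kill it. The kernel
 * version does the detach under io_rsrc_ref_lock(); locking is omitted
 * in this single-threaded sketch.
 */
static struct model_ref_node *model_kill_node(struct model_rsrc_data *data)
{
	struct model_ref_node *node;

	assert(data != NULL);
	node = data->node;
	data->node = NULL;	/* a stale user now sees NULL, not a dead node */
	if (node)
		node->killed = true;
	return node;
}
```

Calling model_kill_node() a second time returns NULL and touches
nothing, which is the failure mode the patch prefers over a dangling
pointer.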



* [PATCH v2 2/4] io_uring: fix io_rsrc_ref_quiesce races
  2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 1/4] io_uring: zero ref_node after killing it Pavel Begunkov
@ 2021-02-20 18:03 ` Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 3/4] io_uring: keep generic rsrc infra generic Pavel Begunkov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
  To: Jens Axboe, io-uring

There are several kinds of races in io_rsrc_ref_quiesce() between
->release() of percpu_refs and reinit_completion(). Fix them by always
resurrecting between iterations. While at it, clean the function up to
remove the duplicated code (DRY).

Fixes: a4f2225d1cb2 ("io_uring: don't hold uring_lock when calling io_run_task_work*")
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 57 +++++++++++++++++++++------------------------------
 1 file changed, 23 insertions(+), 34 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 50d4dba08f82..292fba2b8e36 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -236,6 +236,7 @@ struct fixed_rsrc_data {
 	struct fixed_rsrc_ref_node	*node;
 	struct percpu_ref		refs;
 	struct completion		done;
+	bool				quiesce;
 };
 
 struct io_buffer {
@@ -7316,19 +7317,6 @@ static void io_sqe_rsrc_set_node(struct io_ring_ctx *ctx,
 	percpu_ref_get(&rsrc_data->refs);
 }
 
-static int io_sqe_rsrc_add_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
-{
-	struct fixed_rsrc_ref_node *backup_node;
-
-	backup_node = alloc_fixed_rsrc_ref_node(ctx);
-	if (!backup_node)
-		return -ENOMEM;
-	init_fixed_file_ref_node(ctx, backup_node);
-	io_sqe_rsrc_set_node(ctx, data, backup_node);
-
-	return 0;
-}
-
 static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_data *data)
 {
 	struct fixed_rsrc_ref_node *ref_node = NULL;
@@ -7347,39 +7335,40 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 {
 	int ret;
 
-	io_sqe_rsrc_kill_node(ctx, data);
-	percpu_ref_kill(&data->refs);
+	if (data->quiesce)
+		return -ENXIO;
 
-	/* wait for all refs nodes to complete */
-	flush_delayed_work(&ctx->rsrc_put_work);
+	data->quiesce = true;
 	do {
+		io_sqe_rsrc_kill_node(ctx, data);
+		percpu_ref_kill(&data->refs);
+		flush_delayed_work(&ctx->rsrc_put_work);
+
 		ret = wait_for_completion_interruptible(&data->done);
 		if (!ret)
 			break;
 
-		ret = io_sqe_rsrc_add_node(ctx, data);
-		if (ret < 0)
-			break;
-		/*
-		 * There is small possibility that data->done is already completed
-		 * So reinit it here
-		 */
+		percpu_ref_resurrect(&data->refs);
+		io_sqe_rsrc_set_node(ctx, data, backup_node);
+		backup_node = NULL;
 		reinit_completion(&data->done);
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
-		io_sqe_rsrc_kill_node(ctx, data);
-	} while (ret >= 0);
 
-	if (ret < 0) {
-		percpu_ref_resurrect(&data->refs);
-		reinit_completion(&data->done);
-		io_sqe_rsrc_set_node(ctx, data, backup_node);
-		return ret;
-	}
+		if (ret < 0)
+			break;
+		backup_node = alloc_fixed_rsrc_ref_node(ctx);
+		ret = -ENOMEM;
+		if (!backup_node)
+			break;
+		init_fixed_file_ref_node(ctx, backup_node);
+	} while (1);
+	data->quiesce = false;
 
-	destroy_fixed_rsrc_ref_node(backup_node);
-	return 0;
+	if (backup_node)
+		destroy_fixed_rsrc_ref_node(backup_node);
+	return ret;
 }
 
 static struct fixed_rsrc_data *alloc_fixed_rsrc_data(struct io_ring_ctx *ctx)
-- 
2.24.0
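
The control flow that 2/4 ends up with (guard flag, kill at the top of
every pass, resurrect after an interrupted wait) can be modelled in
plain C. This is only a sketch under stated assumptions: model_quiesce,
model_wait() and model_quiesce_run() are hypothetical names, the wait is
faked with a counter, and the model retries unconditionally, whereas the
kernel loop also breaks out when io_run_task_work_sig() reports an error
or allocating a backup node fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state; not the kernel's struct fixed_rsrc_data. */
struct model_quiesce {
	bool quiesce;		/* true while a quiesce is in flight */
	bool dying;		/* stand-in for a killed percpu_ref */
	int pending_interrupts;	/* waits that fail before one succeeds */
	int kill_passes;	/* how many times the refs were killed */
};

/* Pretend wait: fails with -EINTR while interrupts remain. */
static int model_wait(struct model_quiesce *m)
{
	if (m->pending_interrupts > 0) {
		m->pending_interrupts--;
		return -EINTR;
	}
	return 0;
}

static int model_quiesce_run(struct model_quiesce *m)
{
	int ret;

	assert(m->pending_interrupts >= 0);
	if (m->quiesce)
		return -ENXIO;	/* someone else is already quiescing */

	m->quiesce = true;
	do {
		m->dying = true;	/* kill at the top of every pass */
		m->kill_passes++;

		ret = model_wait(m);
		if (!ret)
			break;

		/* interrupted: bring the refs back before retrying */
		m->dying = false;
	} while (1);
	m->quiesce = false;
	return ret;
}
```

With two interrupted waits the model performs three kill passes and
leaves the guard flag cleared; a caller that finds the flag already set
gets -ENXIO without touching the refs.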



* [PATCH v2 3/4] io_uring: keep generic rsrc infra generic
  2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 1/4] io_uring: zero ref_node after killing it Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 2/4] io_uring: fix io_rsrc_ref_quiesce races Pavel Begunkov
@ 2021-02-20 18:03 ` Pavel Begunkov
  2021-02-20 18:03 ` [PATCH v2 4/4] io_uring: wait potential ->release() on resurrect Pavel Begunkov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
  To: Jens Axboe, io-uring

io_rsrc_ref_quiesce() is a generic resource function, yet it is now
wired to allocate and initialise ref nodes with file-specific callbacks
etc. Keep it sane by passing in as parameters everything needed for
initialisation, otherwise it will hurt us badly one day.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 32 +++++++++++++-------------------
 1 file changed, 13 insertions(+), 19 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 292fba2b8e36..b00ab7138410 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1037,8 +1037,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 static void destroy_fixed_rsrc_ref_node(struct fixed_rsrc_ref_node *ref_node);
 static struct fixed_rsrc_ref_node *alloc_fixed_rsrc_ref_node(
 			struct io_ring_ctx *ctx);
-static void init_fixed_file_ref_node(struct io_ring_ctx *ctx,
-				     struct fixed_rsrc_ref_node *ref_node);
+static void io_ring_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc);
 
 static bool io_rw_reissue(struct io_kiocb *req);
 static void io_cqring_fill_event(struct io_kiocb *req, long res);
@@ -7331,8 +7330,10 @@ static void io_sqe_rsrc_kill_node(struct io_ring_ctx *ctx, struct fixed_rsrc_dat
 
 static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 			       struct io_ring_ctx *ctx,
-			       struct fixed_rsrc_ref_node *backup_node)
+			       void (*rsrc_put)(struct io_ring_ctx *ctx,
+			                        struct io_rsrc_put *prsrc))
 {
+	struct fixed_rsrc_ref_node *backup_node;
 	int ret;
 
 	if (data->quiesce)
@@ -7340,6 +7341,13 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 
 	data->quiesce = true;
 	do {
+		ret = -ENOMEM;
+		backup_node = alloc_fixed_rsrc_ref_node(ctx);
+		if (!backup_node)
+			break;
+		backup_node->rsrc_data = data;
+		backup_node->rsrc_put = rsrc_put;
+
 		io_sqe_rsrc_kill_node(ctx, data);
 		percpu_ref_kill(&data->refs);
 		flush_delayed_work(&ctx->rsrc_put_work);
@@ -7355,15 +7363,7 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
-
-		if (ret < 0)
-			break;
-		backup_node = alloc_fixed_rsrc_ref_node(ctx);
-		ret = -ENOMEM;
-		if (!backup_node)
-			break;
-		init_fixed_file_ref_node(ctx, backup_node);
-	} while (1);
+	} while (ret >= 0);
 	data->quiesce = false;
 
 	if (backup_node)
@@ -7399,7 +7399,6 @@ static void free_fixed_rsrc_data(struct fixed_rsrc_data *data)
 static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 {
 	struct fixed_rsrc_data *data = ctx->file_data;
-	struct fixed_rsrc_ref_node *backup_node;
 	unsigned nr_tables, i;
 	int ret;
 
@@ -7410,12 +7409,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 	 */
 	if (!data || percpu_ref_is_dying(&data->refs))
 		return -ENXIO;
-	backup_node = alloc_fixed_rsrc_ref_node(ctx);
-	if (!backup_node)
-		return -ENOMEM;
-	init_fixed_file_ref_node(ctx, backup_node);
-
-	ret = io_rsrc_ref_quiesce(data, ctx, backup_node);
+	ret = io_rsrc_ref_quiesce(data, ctx, io_ring_file_put);
 	if (ret)
 		return ret;
 
-- 
2.24.0
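
The design choice in 3/4 is ordinary callback injection: the generic
helper receives the resource-specific put function instead of hardcoding
the file one. A minimal sketch, assuming hypothetical stand-in types and
names (model_ctx, model_rsrc_put, model_init_node and friends) that only
mirror the shape of the kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel structures. */
struct model_ctx {
	int files_put;		/* counts file-specific puts */
};

struct model_rsrc_put {
	int id;
};

typedef void (*model_put_fn)(struct model_ctx *ctx,
			     struct model_rsrc_put *prsrc);

struct model_ref_node {
	void *rsrc_data;
	model_put_fn rsrc_put;
};

/* Resource-specific callback: the only part that knows about files. */
static void model_file_put(struct model_ctx *ctx, struct model_rsrc_put *prsrc)
{
	(void)prsrc;
	ctx->files_put++;
}

/* Generic initialisation: the callback arrives as a parameter. */
static void model_init_node(struct model_ref_node *node, void *data,
			    model_put_fn put)
{
	assert(put != NULL);
	node->rsrc_data = data;
	node->rsrc_put = put;
}

/* Generic release path: dispatches through whatever was passed in. */
static void model_node_put(struct model_ctx *ctx, struct model_ref_node *node,
			   struct model_rsrc_put *prsrc)
{
	node->rsrc_put(ctx, prsrc);
}
```

A second resource type would only need its own put function; the
generic helpers stay untouched.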



* [PATCH v2 4/4] io_uring: wait potential ->release() on resurrect
  2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-02-20 18:03 ` [PATCH v2 3/4] io_uring: keep generic rsrc infra generic Pavel Begunkov
@ 2021-02-20 18:03 ` Pavel Begunkov
  2021-02-20 18:33 ` [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Jens Axboe
  2021-02-21 13:22 ` Hao Xu
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-20 18:03 UTC
  To: Jens Axboe, io-uring; +Cc: stable

There is a short window where a percpu_ref has already dropped to zero
but we still try to resurrect() it. Play nicer: wait for ->release() to
happen in this case and proceed as if everything is ok. One downside for
ctx refs is that we can ignore signal_pending() on rare occasions, but
someone else should check for it later if needed.

Cc: <[email protected]> # 5.5+
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b00ab7138410..ce197af2d3c6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1104,6 +1104,21 @@ static inline void io_set_resource_node(struct io_kiocb *req)
 	}
 }
 
+static bool io_refs_resurrect(struct percpu_ref *ref, struct completion *compl)
+{
+	if (!percpu_ref_tryget(ref)) {
+		/* already at zero, wait for ->release() */
+		if (!try_wait_for_completion(compl))
+			synchronize_rcu();
+		return false;
+	}
+
+	percpu_ref_resurrect(ref);
+	reinit_completion(compl);
+	percpu_ref_put(ref);
+	return true;
+}
+
 static bool io_match_task(struct io_kiocb *head,
 			  struct task_struct *task,
 			  struct files_struct *files)
@@ -7353,13 +7368,11 @@ static int io_rsrc_ref_quiesce(struct fixed_rsrc_data *data,
 		flush_delayed_work(&ctx->rsrc_put_work);
 
 		ret = wait_for_completion_interruptible(&data->done);
-		if (!ret)
+		if (!ret || !io_refs_resurrect(&data->refs, &data->done))
 			break;
 
-		percpu_ref_resurrect(&data->refs);
 		io_sqe_rsrc_set_node(ctx, data, backup_node);
 		backup_node = NULL;
-		reinit_completion(&data->done);
 		mutex_unlock(&ctx->uring_lock);
 		ret = io_run_task_work_sig();
 		mutex_lock(&ctx->uring_lock);
@@ -10094,10 +10107,8 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 
 		mutex_lock(&ctx->uring_lock);
 
-		if (ret) {
-			percpu_ref_resurrect(&ctx->refs);
-			goto out_quiesce;
-		}
+		if (ret && io_refs_resurrect(&ctx->refs, &ctx->ref_comp))
+			return ret;
 	}
 
 	if (ctx->restricted) {
@@ -10189,7 +10200,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 	if (io_register_op_must_quiesce(opcode)) {
 		/* bring the ctx back to life */
 		percpu_ref_reinit(&ctx->refs);
-out_quiesce:
 		reinit_completion(&ctx->ref_comp);
 	}
 	return ret;
-- 
2.24.0
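
The shape of io_refs_resurrect() in 4/4 (try to take a reference, and
only resurrect if that succeeds) can be modelled with C11 atomics. This
is a sketch under assumptions: model_ref, model_tryget() and
model_refs_resurrect() are hypothetical names, a plain atomic counter
stands in for a percpu_ref that has already been killed, and the model
simply returns false where the kernel helper waits for ->release() to
finish. Holding the temporary reference is read here as a way to keep
the count from reaching zero while the resurrect is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a killed ref plus its completion. */
struct model_ref {
	atomic_long count;	/* stand-in for the ref's atomic counter */
	bool dying;		/* set by a kill, cleared by resurrect */
	bool done;		/* stand-in for the completion */
};

/* Take a reference unless the count has already reached zero. */
static bool model_tryget(struct model_ref *ref)
{
	long old = atomic_load(&ref->count);

	while (old > 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&ref->count, &old, old + 1))
			return true;
	}
	return false;
}

static bool model_refs_resurrect(struct model_ref *ref)
{
	assert(ref->dying);
	if (!model_tryget(ref))
		return false;	/* already at zero, release owns it now */

	ref->dying = false;			/* resurrect */
	ref->done = false;			/* reinit the completion */
	atomic_fetch_sub(&ref->count, 1);	/* drop the temporary ref */
	return true;
}
```

A ref with a remaining count comes back to life with its count
unchanged; a ref that already hit zero is left alone and the caller
treats the quiesce as finished.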



* Re: [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
  2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-02-20 18:03 ` [PATCH v2 4/4] io_uring: wait potential ->release() on resurrect Pavel Begunkov
@ 2021-02-20 18:33 ` Jens Axboe
  2021-02-21 13:22 ` Hao Xu
  5 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2021-02-20 18:33 UTC
  To: Pavel Begunkov, io-uring

On 2/20/21 11:03 AM, Pavel Begunkov wrote:
> v2: concurrent quiesce avoidance (Hao)
>     resurrect-release patch
> 
> Pavel Begunkov (4):
>   io_uring: zero ref_node after killing it
>   io_uring: fix io_rsrc_ref_quiesce races
>   io_uring: keep generic rsrc infra generic
>   io_uring: wait potential ->release() on resurrect
> 
>  fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
>  1 file changed, 45 insertions(+), 51 deletions(-)

Thanks, replaced existing series.

-- 
Jens Axboe



* Re: [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
  2021-02-20 18:03 [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Pavel Begunkov
                   ` (4 preceding siblings ...)
  2021-02-20 18:33 ` [PATCH v2 0/4] rsrc quiesce fixes/hardening v2 Jens Axboe
@ 2021-02-21 13:22 ` Hao Xu
  2021-02-22 14:05   ` Pavel Begunkov
  5 siblings, 1 reply; 8+ messages in thread
From: Hao Xu @ 2021-02-21 13:22 UTC
  To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/2/21 at 2:03 AM, Pavel Begunkov wrote:
> v2: concurrent quiesce avoidance (Hao)
>      resurrect-release patch
> 
> Pavel Begunkov (4):
>    io_uring: zero ref_node after killing it
>    io_uring: fix io_rsrc_ref_quiesce races
>    io_uring: keep generic rsrc infra generic
>    io_uring: wait potential ->release() on resurrect
> 
>   fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
>   1 file changed, 45 insertions(+), 51 deletions(-)
> 
I tested this patchset with the same tests
for "io_uring: don't hold uring_lock ..."

Tested-by: Hao Xu <[email protected]>


* Re: [PATCH v2 0/4] rsrc quiesce fixes/hardening v2
  2021-02-21 13:22 ` Hao Xu
@ 2021-02-22 14:05   ` Pavel Begunkov
  0 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-22 14:05 UTC
  To: Hao Xu, Jens Axboe, io-uring

On 21/02/2021 13:22, Hao Xu wrote:
> On 2021/2/21 at 2:03 AM, Pavel Begunkov wrote:
>> v2: concurrent quiesce avoidance (Hao)
>>      resurrect-release patch
>>
>> Pavel Begunkov (4):
>>    io_uring: zero ref_node after killing it
>>    io_uring: fix io_rsrc_ref_quiesce races
>>    io_uring: keep generic rsrc infra generic
>>    io_uring: wait potential ->release() on resurrect
>>
>>   fs/io_uring.c | 96 ++++++++++++++++++++++++---------------------------
>>   1 file changed, 45 insertions(+), 51 deletions(-)
>>
> I tested this patchset with the same tests
> for "io_uring: don't hold uring_lock ..."
> 
> Tested-by: Hao Xu <[email protected]>

Great, thanks

FYI, looks like your emails have a strange encoding. It's
readable, but at least for me shows "undefined encoding".

-- 
Pavel Begunkov

