public inbox for [email protected]
* [PATCH 5.12 0/5] fixes
From: Pavel Begunkov @ 2021-03-14 20:57 UTC
  To: Jens Axboe, io-uring

Fresh fixes; patches 3-5 are SQPOLL related.

Pavel Begunkov (5):
  io_uring: fix ->flags races by linked timeouts
  io_uring: fix complete_post use ctx after free
  io_uring: replace sqd rw_semaphore with mutex
  io_uring: halt SQO submission on ctx exit
  io_uring: fix concurrent parking

 fs/io_uring.c | 72 +++++++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 31 deletions(-)

-- 
2.24.0


* [PATCH 1/5] io_uring: fix ->flags races by linked timeouts
From: Pavel Begunkov @ 2021-03-14 20:57 UTC
  To: Jens Axboe, io-uring

It's racy to modify req->flags from a non-owning context: e.g. a linked
timeout calling req_set_fail_links() for the master request might race
with that request setting/clearing flags while being executed
concurrently. Just remove req_set_fail_links(prev) from
io_link_timeout_fn(); io_async_find_and_cancel() and the functions down
the line take care of setting the fail bit.
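
To illustrate (a sketch of the race; REQ_F_SOME_FLAG is a made-up
stand-in for whatever flag the owning context happens to touch, and
req->flags updates are plain non-atomic read-modify-write):

  owning context                   linked timeout (io_link_timeout_fn)
  --------------                   -----------------------------------
  v = prev->flags;
                                   v2 = prev->flags;
                                   v2 |= REQ_F_FAIL_LINK;
                                   prev->flags = v2;
  v |= REQ_F_SOME_FLAG;
  prev->flags = v;                 /* REQ_F_FAIL_LINK update is lost */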

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 642ad08d8964..4fd984fa6739 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,7 +6197,6 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
 	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
 	if (prev) {
-		req_set_fail_links(prev);
 		io_async_find_and_cancel(ctx, req, prev->user_data, -ETIME);
 		io_put_req_deferred(prev, 1);
 	} else {
-- 
2.24.0


* [PATCH 2/5] io_uring: fix complete_post use ctx after free
From: Pavel Begunkov @ 2021-03-14 20:57 UTC
  To: Jens Axboe, io-uring

If io_req_complete_post() doesn't put the final ref, we can't rely on
the request's ctx ref, and so the ctx may potentially be freed while
complete_post() is in io_cqring_ev_posted()/etc.

In that case, get an additional ctx reference and put it at the end,
protecting the following io_cqring_ev_posted(). Also prolong the ctx
lifetime until the spin_unlock happens, as we do with mutexes, so the
added percpu_ref_get() doesn't race with ctx free.
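
The latter reuses the idiom already applied to uring_lock in
io_ring_ctx_free(): an empty lock/unlock pair works as a barrier
(a sketch mirroring the hunk below):

  spin_lock_irq(&ctx->completion_lock);
  spin_unlock_irq(&ctx->completion_lock);
  /* some tasks may still use the ctx under completion_lock even after
   * all refs are put; the empty lock/unlock pair waits for them to
   * finish before the ctx is torn down */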

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4fd984fa6739..6548445f0d0b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1550,14 +1550,14 @@ static void io_req_complete_post(struct io_kiocb *req, long res,
 		io_put_task(req->task, 1);
 		list_add(&req->compl.list, &cs->locked_free_list);
 		cs->locked_free_nr++;
-	} else
-		req = NULL;
+	} else {
+		percpu_ref_get(&ctx->refs);
+	}
 	io_commit_cqring(ctx);
 	spin_unlock_irqrestore(&ctx->completion_lock, flags);
 	io_cqring_ev_posted(ctx);
 
-	if (req)
-		percpu_ref_put(&ctx->refs);
+	percpu_ref_put(&ctx->refs);
 }
 
 static void io_req_complete_state(struct io_kiocb *req, long res,
@@ -8373,11 +8373,13 @@ static void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
 	/*
 	 * Some may use context even when all refs and requests have been put,
-	 * and they are free to do so while still holding uring_lock, see
-	 * __io_req_task_submit(). Wait for them to finish.
+	 * and they are free to do so while still holding uring_lock or
+	 * completion_lock, see __io_req_task_submit(). Wait for them to finish.
 	 */
 	mutex_lock(&ctx->uring_lock);
 	mutex_unlock(&ctx->uring_lock);
+	spin_lock_irq(&ctx->completion_lock);
+	spin_unlock_irq(&ctx->completion_lock);
 
 	io_sq_thread_finish(ctx);
 	io_sqe_buffers_unregister(ctx);
-- 
2.24.0


* [PATCH 3/5] io_uring: replace sqd rw_semaphore with mutex
From: Pavel Begunkov @ 2021-03-14 20:57 UTC
  To: Jens Axboe, io-uring

The only user of read-locking of sqd->rw_lock is the sq_thread itself,
which by definition is alone, so we don't really need an rw_semaphore;
a mutex will do. Replace it with a mutex, and kill the read-to-write
upgrading and the extra task_work handling in io_sq_thread().
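
With a single reader the rwsem never has concurrent lock holders
anyway, so every acquisition is already exclusive; in effect both the
old down_read() and down_write() pairs collapse to (sketch):

  mutex_lock(&sqd->lock);
  /* ... SQPOLL loop iteration, or park/stop from another task ... */
  mutex_unlock(&sqd->lock);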

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6548445f0d0b..76a60699c959 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -258,7 +258,7 @@ enum {
 
 struct io_sq_data {
 	refcount_t		refs;
-	struct rw_semaphore	rw_lock;
+	struct mutex		lock;
 
 	/* ctx's that are using this sqd */
 	struct list_head	ctx_list;
@@ -6686,16 +6686,15 @@ static int io_sq_thread(void *data)
 		set_cpus_allowed_ptr(current, cpu_online_mask);
 	current->flags |= PF_NO_SETAFFINITY;
 
-	down_read(&sqd->rw_lock);
-
+	mutex_lock(&sqd->lock);
 	while (!test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state)) {
 		int ret;
 		bool cap_entries, sqt_spin, needs_sched;
 
 		if (test_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state)) {
-			up_read(&sqd->rw_lock);
+			mutex_unlock(&sqd->lock);
 			cond_resched();
-			down_read(&sqd->rw_lock);
+			mutex_lock(&sqd->lock);
 			io_run_task_work();
 			timeout = jiffies + sqd->sq_thread_idle;
 			continue;
@@ -6742,10 +6741,10 @@ static int io_sq_thread(void *data)
 			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 				io_ring_set_wakeup_flag(ctx);
 
-			up_read(&sqd->rw_lock);
+			mutex_unlock(&sqd->lock);
 			schedule();
 			try_to_freeze();
-			down_read(&sqd->rw_lock);
+			mutex_lock(&sqd->lock);
 			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 				io_ring_clear_wakeup_flag(ctx);
 		}
@@ -6753,20 +6752,13 @@ static int io_sq_thread(void *data)
 		finish_wait(&sqd->wait, &wait);
 		timeout = jiffies + sqd->sq_thread_idle;
 	}
-	up_read(&sqd->rw_lock);
-	down_write(&sqd->rw_lock);
-	/*
-	 * someone may have parked and added a cancellation task_work, run
-	 * it first because we don't want it in io_uring_cancel_sqpoll()
-	 */
-	io_run_task_work();
 
 	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 		io_uring_cancel_sqpoll(ctx);
 	sqd->thread = NULL;
 	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 		io_ring_set_wakeup_flag(ctx);
-	up_write(&sqd->rw_lock);
+	mutex_unlock(&sqd->lock);
 
 	io_run_task_work();
 	complete(&sqd->exited);
@@ -7068,21 +7060,21 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
 }
 
 static void io_sq_thread_unpark(struct io_sq_data *sqd)
-	__releases(&sqd->rw_lock)
+	__releases(&sqd->lock)
 {
 	WARN_ON_ONCE(sqd->thread == current);
 
 	clear_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
-	up_write(&sqd->rw_lock);
+	mutex_unlock(&sqd->lock);
 }
 
 static void io_sq_thread_park(struct io_sq_data *sqd)
-	__acquires(&sqd->rw_lock)
+	__acquires(&sqd->lock)
 {
 	WARN_ON_ONCE(sqd->thread == current);
 
 	set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
-	down_write(&sqd->rw_lock);
+	mutex_lock(&sqd->lock);
 	/* set again for consistency, in case concurrent parks are happening */
 	set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
 	if (sqd->thread)
@@ -7093,11 +7085,11 @@ static void io_sq_thread_stop(struct io_sq_data *sqd)
 {
 	WARN_ON_ONCE(sqd->thread == current);
 
-	down_write(&sqd->rw_lock);
+	mutex_lock(&sqd->lock);
 	set_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
 	if (sqd->thread)
 		wake_up_process(sqd->thread);
-	up_write(&sqd->rw_lock);
+	mutex_unlock(&sqd->lock);
 	wait_for_completion(&sqd->exited);
 }
 
@@ -7179,7 +7171,7 @@ static struct io_sq_data *io_get_sq_data(struct io_uring_params *p,
 
 	refcount_set(&sqd->refs, 1);
 	INIT_LIST_HEAD(&sqd->ctx_list);
-	init_rwsem(&sqd->rw_lock);
+	mutex_init(&sqd->lock);
 	init_waitqueue_head(&sqd->wait);
 	init_completion(&sqd->exited);
 	return sqd;
-- 
2.24.0


* [PATCH 4/5] io_uring: halt SQO submission on ctx exit
From: Pavel Begunkov @ 2021-03-14 20:57 UTC
  To: Jens Axboe, io-uring

io_sq_thread_finish() is called in io_ring_ctx_free(), so until then
the SQPOLL task is potentially running and submitting new requests.
It's not a disaster because we use a "try" variant of percpu_ref_get,
but it is far from nice.

Remove the ctx from the sqd ctx list earlier, before the cancellation
loop, so SQPOLL can't find it and so won't submit new requests.
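
The unlink is safe because it's done under park: io_sq_thread() only
walks sqd->ctx_list while holding sqd->lock, which parking takes
(a sketch mirroring the hunk below):

  io_sq_thread_park(ctx->sq_data);   /* sqd->lock held, SQPOLL stopped */
  list_del_init(&ctx->sqd_list);     /* io_sq_thread() can't see ctx */
  io_sqd_update_thread_idle(ctx->sq_data);
  io_sq_thread_unpark(ctx->sq_data);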

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 76a60699c959..b4b8988785fb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8561,6 +8561,14 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 		io_unregister_personality(ctx, index);
 	mutex_unlock(&ctx->uring_lock);
 
+	/* prevent SQPOLL from submitting new requests */
+	if (ctx->sq_data) {
+		io_sq_thread_park(ctx->sq_data);
+		list_del_init(&ctx->sqd_list);
+		io_sqd_update_thread_idle(ctx->sq_data);
+		io_sq_thread_unpark(ctx->sq_data);
+	}
+
 	io_kill_timeouts(ctx, NULL, NULL);
 	io_poll_remove_all(ctx, NULL, NULL);
 
-- 
2.24.0


* [PATCH 5/5] io_uring: fix concurrent parking
From: Pavel Begunkov @ 2021-03-14 20:57 UTC
  To: Jens Axboe, io-uring

If io_sq_thread_park() of one task gets rescheduled right after
set_bit(), then before it gets back to mutex_lock() another task can do
a full park()/unpark() cycle, after which SQPOLL takes the lock again
and keeps running, never having seen that first set_bit(SHOULD_PARK),
and so it won't even try to put the mutex down for parking.

SQPOLL will get parked eventually, when it drops the lock to
reschedule, but that may be problematic and will get in the way of
further fixes.

Account the number of tasks waiting for parking with a new atomic
variable, park_pending, and adjust SHOULD_PARK accordingly. It doesn't
entirely replace the SHOULD_PARK bit with this atomic var, because it's
convenient to have it as a bit in the state and it will help to do
optimisations later.
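
A timeline of the race being closed (sketch):

  task A                     task B                   SQPOLL thread
  ------                     ------                   -------------
  park():
    set_bit(SHOULD_PARK)
    <rescheduled>
                             park():
                               set_bit(SHOULD_PARK)
                               mutex_lock()
                             unpark():
                               clear_bit(SHOULD_PARK)
                               mutex_unlock()
                                                      retakes sqd->lock,
                                                      sees bit clear,
                                                      keeps running
    mutex_lock()   /* blocks until SQPOLL drops the lock on its own */

With park_pending, B's unpark() notices a remaining waiter
(atomic_dec_return() != 0) and re-sets SHOULD_PARK, so SQPOLL drops the
lock for A promptly.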

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b4b8988785fb..ccd7f09fd449 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -258,6 +258,7 @@ enum {
 
 struct io_sq_data {
 	refcount_t		refs;
+	atomic_t		park_pending;
 	struct mutex		lock;
 
 	/* ctx's that are using this sqd */
@@ -7064,7 +7065,13 @@ static void io_sq_thread_unpark(struct io_sq_data *sqd)
 {
 	WARN_ON_ONCE(sqd->thread == current);
 
+	/*
+	 * Do the dance but not conditional clear_bit() because it'd race with
+	 * other threads incrementing park_pending and setting the bit.
+	 */
 	clear_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
+	if (atomic_dec_return(&sqd->park_pending))
+		set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
 	mutex_unlock(&sqd->lock);
 }
 
@@ -7073,10 +7080,9 @@ static void io_sq_thread_park(struct io_sq_data *sqd)
 {
 	WARN_ON_ONCE(sqd->thread == current);
 
+	atomic_inc(&sqd->park_pending);
 	set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
 	mutex_lock(&sqd->lock);
-	/* set again for consistency, in case concurrent parks are happening */
-	set_bit(IO_SQ_THREAD_SHOULD_PARK, &sqd->state);
 	if (sqd->thread)
 		wake_up_process(sqd->thread);
 }
@@ -7096,6 +7102,8 @@ static void io_sq_thread_stop(struct io_sq_data *sqd)
 static void io_put_sq_data(struct io_sq_data *sqd)
 {
 	if (refcount_dec_and_test(&sqd->refs)) {
+		WARN_ON_ONCE(atomic_read(&sqd->park_pending));
+
 		io_sq_thread_stop(sqd);
 		kfree(sqd);
 	}
@@ -7169,6 +7177,7 @@ static struct io_sq_data *io_get_sq_data(struct io_uring_params *p,
 	if (!sqd)
 		return ERR_PTR(-ENOMEM);
 
+	atomic_set(&sqd->park_pending, 0);
 	refcount_set(&sqd->refs, 1);
 	INIT_LIST_HEAD(&sqd->ctx_list);
 	mutex_init(&sqd->lock);
-- 
2.24.0


* Re: [PATCH 5.12 0/5] fixes
From: Jens Axboe @ 2021-03-17 22:43 UTC
  To: Pavel Begunkov, io-uring

On 3/14/21 2:57 PM, Pavel Begunkov wrote:
> Fresh fixes; patches 3-5 are SQPOLL related.
> 
> Pavel Begunkov (5):
>   io_uring: fix ->flags races by linked timeouts
>   io_uring: fix complete_post use ctx after free
>   io_uring: replace sqd rw_semaphore with mutex
>   io_uring: halt SQO submission on ctx exit
>   io_uring: fix concurrent parking
> 
>  fs/io_uring.c | 72 +++++++++++++++++++++++++++++----------------------
>  1 file changed, 41 insertions(+), 31 deletions(-)

These were applied a while back, forgot to reply. Thanks!

-- 
Jens Axboe

