public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH 0/4] fix SQPOLL and exit races
@ 2021-01-08 20:57 Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 1/4] io_uring: io_rw_reissue lockdep annotations Pavel Begunkov
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-01-08 20:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

The series prevents submissions from SQPOLL thread when sqo_task is
getting killed (or exec()ed). That should solve races introduced by a
patch that allowed task_works during cancellation.

Jens, IIRC you said that io_rw_reissue() is called only during
submission, right? 4/4 depends on that, so 1/4 should help to catch any
misbehaviour. I reduced nr_requests but wasn't able to trigger
io_resubmit_prep() for iopoll or not. What was the trick again?

4/4 is actually fairly simple, but safety measures and comments make it
to bloat. The overhead is pretty negligible, and it allows to kill more,
but that's for-next.

Pavel Begunkov (4):
  io_uring: io_rw_reissue lockdep annotations
  io_uring: inline io_uring_attempt_task_drop()
  io_uring: add warn_once for io_uring_flush()
  io_uring: stop SQPOLL submit on creator's death

 fs/io_uring.c | 97 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 26 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] io_uring: io_rw_reissue lockdep annotations
  2021-01-08 20:57 [PATCH 0/4] fix SQPOLL and exit races Pavel Begunkov
@ 2021-01-08 20:57 ` Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 2/4] io_uring: inline io_uring_attempt_task_drop() Pavel Begunkov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-01-08 20:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We expect io_rw_reissue() to take place only during submission with
uring_lock held. Add a lockdep annotation to check that invariant.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index cb57e0360fcb..55ba1922a349 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2692,6 +2692,8 @@ static bool io_rw_reissue(struct io_kiocb *req, long res)
 	if ((res != -EAGAIN && res != -EOPNOTSUPP) || io_wq_current_is_worker())
 		return false;
 
+	lockdep_assert_held(&req->ctx->uring_lock);
+
 	ret = io_sq_thread_acquire_mm_files(req->ctx, req);
 
 	if (io_resubmit_prep(req, ret)) {
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] io_uring: inline io_uring_attempt_task_drop()
  2021-01-08 20:57 [PATCH 0/4] fix SQPOLL and exit races Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 1/4] io_uring: io_rw_reissue lockdep annotations Pavel Begunkov
@ 2021-01-08 20:57 ` Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 3/4] io_uring: add warn_once for io_uring_flush() Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 4/4] io_uring: stop SQPOLL submit on creator's death Pavel Begunkov
  3 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-01-08 20:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

A simple preparation change inlining io_uring_attempt_task_drop() into
io_uring_flush().

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 55ba1922a349..1c931e7a3948 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8964,23 +8964,6 @@ static void io_uring_del_task_file(struct file *file)
 		fput(file);
 }
 
-/*
- * Drop task note for this file if we're the only ones that hold it after
- * pending fput()
- */
-static void io_uring_attempt_task_drop(struct file *file)
-{
-	if (!current->io_uring)
-		return;
-	/*
-	 * fput() is pending, will be 2 if the only other ref is our potential
-	 * task file note. If the task is exiting, drop regardless of count.
-	 */
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING) ||
-	    atomic_long_read(&file->f_count) == 2)
-		io_uring_del_task_file(file);
-}
-
 static void io_uring_remove_task_files(struct io_uring_task *tctx)
 {
 	struct file *file;
@@ -9072,7 +9055,17 @@ void __io_uring_task_cancel(void)
 
 static int io_uring_flush(struct file *file, void *data)
 {
-	io_uring_attempt_task_drop(file);
+	if (!current->io_uring)
+		return 0;
+
+	/*
+	 * fput() is pending, will be 2 if the only other ref is our potential
+	 * task file note. If the task is exiting, drop regardless of count.
+	 */
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING) ||
+	    atomic_long_read(&file->f_count) == 2)
+		io_uring_del_task_file(file);
+
 	return 0;
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] io_uring: add warn_once for io_uring_flush()
  2021-01-08 20:57 [PATCH 0/4] fix SQPOLL and exit races Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 1/4] io_uring: io_rw_reissue lockdep annotations Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 2/4] io_uring: inline io_uring_attempt_task_drop() Pavel Begunkov
@ 2021-01-08 20:57 ` Pavel Begunkov
  2021-01-08 20:57 ` [PATCH 4/4] io_uring: stop SQPOLL submit on creator's death Pavel Begunkov
  3 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-01-08 20:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

files_cancel() should cancel all relevant requests and drop file notes,
so we should never have file notes after that, including on-exit fput
and flush. Add a WARN_ONCE to be sure.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1c931e7a3948..f39671a0d84f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9055,17 +9055,23 @@ void __io_uring_task_cancel(void)
 
 static int io_uring_flush(struct file *file, void *data)
 {
-	if (!current->io_uring)
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
 		return 0;
 
+	/* we should have cancelled and erased it before PF_EXITING */
+	WARN_ON_ONCE((current->flags & PF_EXITING) &&
+		     xa_load(&tctx->xa, (unsigned long)file));
+
 	/*
 	 * fput() is pending, will be 2 if the only other ref is our potential
 	 * task file note. If the task is exiting, drop regardless of count.
 	 */
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING) ||
-	    atomic_long_read(&file->f_count) == 2)
-		io_uring_del_task_file(file);
+	if (atomic_long_read(&file->f_count) != 2)
+		return 0;
 
+	io_uring_del_task_file(file);
 	return 0;
 }
 
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] io_uring: stop SQPOLL submit on creator's death
  2021-01-08 20:57 [PATCH 0/4] fix SQPOLL and exit races Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-01-08 20:57 ` [PATCH 3/4] io_uring: add warn_once for io_uring_flush() Pavel Begunkov
@ 2021-01-08 20:57 ` Pavel Begunkov
  3 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-01-08 20:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

When the creator of SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task, it
have never been nice and recently got racy. That can happen when the
owner undergoes destruction and SQPOLL tasks tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().

That patch halts SQPOLL submissions when sqo_task dies by introducing
sqo_dead flag. Once set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock as well as the new flag.

The tricky part is to make sure that disabling always happens, that
means either the ring is discovered by creator's do_exit() -> cancel,
or if the final close() happens before it's done by the creator. The
last is guaranteed by the fact that for SQPOLL the creator task and only
it holds exactly one file note, so either it pins up to do_exit() or
removed by the creator on the final put in flush. (see comments in
uring_flush() around file->f_count == 2).

One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there, even
though actually we don't need to. That's because cancellation of
sqo_task should wait for the request before going any further.

note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so the
caller would enter the ring to get an error, but it still doesn't
guarantee that the flag won't be cleared.

note 2: if final __userspace__ close happens not from the creator
task, the file note will pin the ring until the task dies.

Fixed: b1b6b5a30dce8 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 62 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f39671a0d84f..2f305c097bd5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -262,6 +262,7 @@ struct io_ring_ctx {
 		unsigned int		drain_next: 1;
 		unsigned int		eventfd_async: 1;
 		unsigned int		restricted: 1;
+		unsigned int		sqo_dead: 1;
 
 		/*
 		 * Ring buffer of indices into array of io_uring_sqe, which is
@@ -2160,12 +2161,11 @@ static void io_req_task_cancel(struct callback_head *cb)
 static void __io_req_task_submit(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	bool fail;
 
-	fail = __io_sq_thread_acquire_mm(ctx) ||
-		__io_sq_thread_acquire_files(ctx);
 	mutex_lock(&ctx->uring_lock);
-	if (!fail)
+	if (!ctx->sqo_dead &&
+	    !__io_sq_thread_acquire_mm(ctx) &&
+	    !__io_sq_thread_acquire_files(ctx))
 		__io_queue_sqe(req, NULL);
 	else
 		__io_req_task_cancel(req, -EFAULT);
@@ -6954,7 +6954,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 		if (!list_empty(&ctx->iopoll_list))
 			io_do_iopoll(ctx, &nr_events, 0);
 
-		if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)))
+		if (to_submit && !ctx->sqo_dead &&
+		    likely(!percpu_ref_is_dying(&ctx->refs)))
 			ret = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
 	}
@@ -8712,6 +8713,10 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 {
 	mutex_lock(&ctx->uring_lock);
 	percpu_ref_kill(&ctx->refs);
+
+	if (WARN_ON_ONCE((ctx->flags & IORING_SETUP_SQPOLL) && !ctx->sqo_dead))
+		ctx->sqo_dead = 1;
+
 	/* if force is set, the ring is going away. always drop after that */
 	ctx->cq_overflow_flushed = 1;
 	if (ctx->rings)
@@ -8874,6 +8879,18 @@ static void __io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 	}
 }
 
+static void io_disable_sqo_submit(struct io_ring_ctx *ctx)
+{
+	WARN_ON_ONCE(ctx->sqo_task != current);
+
+	mutex_lock(&ctx->uring_lock);
+	ctx->sqo_dead = 1;
+	mutex_unlock(&ctx->uring_lock);
+
+	/* make sure callers enter the ring to get error */
+	io_ring_set_wakeup_flag(ctx);
+}
+
 /*
  * We need to iteratively cancel requests, in case a request has dependent
  * hard links. These persist even for failure of cancelations, hence keep
@@ -8885,6 +8902,8 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 	struct task_struct *task = current;
 
 	if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) {
+		/* for SQPOLL only sqo_task has task notes */
+		io_disable_sqo_submit(ctx);
 		task = ctx->sq_data->thread;
 		atomic_inc(&task->io_uring->in_idle);
 		io_sq_thread_park(ctx->sq_data);
@@ -9056,6 +9075,7 @@ void __io_uring_task_cancel(void)
 static int io_uring_flush(struct file *file, void *data)
 {
 	struct io_uring_task *tctx = current->io_uring;
+	struct io_ring_ctx *ctx = file->private_data;
 
 	if (!tctx)
 		return 0;
@@ -9071,7 +9091,16 @@ static int io_uring_flush(struct file *file, void *data)
 	if (atomic_long_read(&file->f_count) != 2)
 		return 0;
 
-	io_uring_del_task_file(file);
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
+		/* there is only one file note, which is owned by sqo_task */
+		WARN_ON_ONCE((ctx->sqo_task == current) ==
+			     !xa_load(&tctx->xa, (unsigned long)file));
+
+		io_disable_sqo_submit(ctx);
+	}
+
+	if (!(ctx->flags & IORING_SETUP_SQPOLL) || ctx->sqo_task == current)
+		io_uring_del_task_file(file);
 	return 0;
 }
 
@@ -9145,8 +9174,9 @@ static unsigned long io_uring_nommu_get_unmapped_area(struct file *file,
 
 #endif /* !CONFIG_MMU */
 
-static void io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
+static int io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
 {
+	int ret = 0;
 	DEFINE_WAIT(wait);
 
 	do {
@@ -9155,6 +9185,11 @@ static void io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
 
 		prepare_to_wait(&ctx->sqo_sq_wait, &wait, TASK_INTERRUPTIBLE);
 
+		if (unlikely(ctx->sqo_dead)) {
+			ret = -EOWNERDEAD;
+			goto out;
+		}
+
 		if (!io_sqring_full(ctx))
 			break;
 
@@ -9162,6 +9197,8 @@ static void io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
 	} while (!signal_pending(current));
 
 	finish_wait(&ctx->sqo_sq_wait, &wait);
+out:
+	return ret;
 }
 
 static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
@@ -9235,10 +9272,16 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		io_cqring_overflow_flush(ctx, false, NULL, NULL);
 
+		ret = -EOWNERDEAD;
+		if (unlikely(ctx->sqo_dead))
+			goto out;
 		if (flags & IORING_ENTER_SQ_WAKEUP)
 			wake_up(&ctx->sq_data->wait);
-		if (flags & IORING_ENTER_SQ_WAIT)
-			io_sqpoll_wait_sq(ctx);
+		if (flags & IORING_ENTER_SQ_WAIT) {
+			ret = io_sqpoll_wait_sq(ctx);
+			if (ret)
+				goto out;
+		}
 		submitted = to_submit;
 	} else if (to_submit) {
 		ret = io_uring_add_task_file(ctx, f.file);
@@ -9665,6 +9708,7 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
 	trace_io_uring_create(ret, ctx, p->sq_entries, p->cq_entries, p->flags);
 	return ret;
 err:
+	io_disable_sqo_submit(ctx);
 	io_ring_ctx_wait_and_kill(ctx);
 	return ret;
 }
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-01-08 21:02 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-01-08 20:57 [PATCH 0/4] fix SQPOLL and exit races Pavel Begunkov
2021-01-08 20:57 ` [PATCH 1/4] io_uring: io_rw_reissue lockdep annotations Pavel Begunkov
2021-01-08 20:57 ` [PATCH 2/4] io_uring: inline io_uring_attempt_task_drop() Pavel Begunkov
2021-01-08 20:57 ` [PATCH 3/4] io_uring: add warn_once for io_uring_flush() Pavel Begunkov
2021-01-08 20:57 ` [PATCH 4/4] io_uring: stop SQPOLL submit on creator's death Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox