From: Pavel Begunkov
To: Jens Axboe, io-uring
Subject: [PATCH 0/4] fix SQPOLL and exit races
Date: 2021-01-08 20:57 UTC

The series prevents submissions from the SQPOLL thread while sqo_task is
being killed (or exec()ed). That should solve the races introduced by the
patch that allowed task_works during cancellation.

Jens, IIRC you said that io_rw_reissue() is called only during submission,
right? 4/4 depends on that, so 1/4 should help to catch any misbehaviour.
I reduced nr_requests but wasn't able to trigger io_resubmit_prep(), with
or without iopoll. What was the trick again?

4/4 is actually fairly simple, but the safety measures and comments bloat
it. The overhead is pretty negligible, and it allows killing even more
code, but that's for-next.

Pavel Begunkov (4):
  io_uring: io_rw_reissue lockdep annotations
  io_uring: inline io_uring_attempt_task_drop()
  io_uring: add warn_once for io_uring_flush()
  io_uring: stop SQPOLL submit on creator's death

 fs/io_uring.c | 97 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 71 insertions(+), 26 deletions(-)

-- 
2.24.0
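For context, a minimal sketch of the SQPOLL setup under discussion, using
liburing (illustrative only, not part of the series; SQPOLL may require
elevated privileges on kernels of this era):

	#include <liburing.h>
	#include <stdio.h>

	int main(void)
	{
		struct io_uring_params p = {
			/* a kernel thread polls the SQ; its submissions
			 * are what race with the creator's exit */
			.flags = IORING_SETUP_SQPOLL,
			.sq_thread_idle = 2000,	/* idle after 2s of no work */
		};
		struct io_uring ring;
		int ret = io_uring_queue_init_params(8, &ring, &p);

		if (ret < 0) {
			fprintf(stderr, "queue_init: %d\n", ret);
			return 1;
		}
		io_uring_queue_exit(&ring);
		return 0;
	}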
From: Pavel Begunkov
To: Jens Axboe, io-uring
Subject: [PATCH 1/4] io_uring: io_rw_reissue lockdep annotations
Date: 2021-01-08 20:57 UTC

We expect io_rw_reissue() to take place only during submission with
uring_lock held. Add a lockdep annotation to check that invariant.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index cb57e0360fcb..55ba1922a349 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2692,6 +2692,8 @@ static bool io_rw_reissue(struct io_kiocb *req, long res)
 	if ((res != -EAGAIN && res != -EOPNOTSUPP) || io_wq_current_is_worker())
 		return false;
 
+	lockdep_assert_held(&req->ctx->uring_lock);
+
 	ret = io_sq_thread_acquire_mm_files(req->ctx, req);
 
 	if (io_resubmit_prep(req, ret)) {
-- 
2.24.0
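As a general note on the annotation (standard kernel lockdep API; the names
in the sketch below are hypothetical): lockdep_assert_held() compiles to
nothing without CONFIG_LOCKDEP, and otherwise splats if the calling task
does not hold the lock, turning a "caller must hold X" comment into a
checked contract:

	#include <linux/lockdep.h>
	#include <linux/mutex.h>

	struct demo_ctx {
		struct mutex lock;
		int counter;
	};

	/* Contract: caller holds ctx->lock; lockdep verifies at runtime. */
	static void demo_ctx_bump(struct demo_ctx *ctx)
	{
		lockdep_assert_held(&ctx->lock);
		ctx->counter++;
	}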
From: Pavel Begunkov
To: Jens Axboe, io-uring
Subject: [PATCH 2/4] io_uring: inline io_uring_attempt_task_drop()
Date: 2021-01-08 20:57 UTC

A simple preparation change inlining io_uring_attempt_task_drop() into
io_uring_flush().

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 55ba1922a349..1c931e7a3948 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8964,23 +8964,6 @@ static void io_uring_del_task_file(struct file *file)
 		fput(file);
 }
 
-/*
- * Drop task note for this file if we're the only ones that hold it after
- * pending fput()
- */
-static void io_uring_attempt_task_drop(struct file *file)
-{
-	if (!current->io_uring)
-		return;
-	/*
-	 * fput() is pending, will be 2 if the only other ref is our potential
-	 * task file note. If the task is exiting, drop regardless of count.
-	 */
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING) ||
-	    atomic_long_read(&file->f_count) == 2)
-		io_uring_del_task_file(file);
-}
-
 static void io_uring_remove_task_files(struct io_uring_task *tctx)
 {
 	struct file *file;
@@ -9072,7 +9055,17 @@ void __io_uring_task_cancel(void)
 
 static int io_uring_flush(struct file *file, void *data)
 {
-	io_uring_attempt_task_drop(file);
+	if (!current->io_uring)
+		return 0;
+
+	/*
+	 * fput() is pending, will be 2 if the only other ref is our potential
+	 * task file note. If the task is exiting, drop regardless of count.
+	 */
+	if (fatal_signal_pending(current) || (current->flags & PF_EXITING) ||
+	    atomic_long_read(&file->f_count) == 2)
+		io_uring_del_task_file(file);
+
 	return 0;
 }
 
-- 
2.24.0
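For background (general VFS behaviour, not something this patch changes):
->flush() is called from filp_close() on every close() of a descriptor,
unlike ->release(), which runs only on the final fput(). That is what makes
io_uring_flush() the right place to drop a task's file note early. Roughly
how it is wired up (abridged; the real table has more hooks):

	static const struct file_operations io_uring_fops = {
		.release	= io_uring_release,
		.flush		= io_uring_flush,
		/* ... mmap, poll, fasync, etc. ... */
	};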
From: Pavel Begunkov
To: Jens Axboe, io-uring
Subject: [PATCH 3/4] io_uring: add warn_once for io_uring_flush()
Date: 2021-01-08 20:57 UTC

files_cancel() should cancel all relevant requests and drop the task's
file notes, so we should never see a file note after that point, including
during the on-exit fput() and flush. Add a WARN_ON_ONCE to be sure.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1c931e7a3948..f39671a0d84f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9055,17 +9055,23 @@ void __io_uring_task_cancel(void)
 
 static int io_uring_flush(struct file *file, void *data)
 {
-	if (!current->io_uring)
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
 		return 0;
 
+	/* we should have cancelled and erased it before PF_EXITING */
+	WARN_ON_ONCE((current->flags & PF_EXITING) &&
+		     xa_load(&tctx->xa, (unsigned long)file));
+
 	/*
 	 * fput() is pending, will be 2 if the only other ref is our potential
 	 * task file note. If the task is exiting, drop regardless of count.
 	 */
-	if (fatal_signal_pending(current) || (current->flags & PF_EXITING) ||
-	    atomic_long_read(&file->f_count) == 2)
-		io_uring_del_task_file(file);
+	if (atomic_long_read(&file->f_count) != 2)
+		return 0;
 
+	io_uring_del_task_file(file);
 	return 0;
 }
 
-- 
2.24.0
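A note on the check itself (general xarray API; names in the sketch are
hypothetical): each io_uring_task tracks its file notes in tctx->xa keyed
by the struct file pointer, and xa_load() returns the stored entry or NULL,
so the WARN_ON_ONCE above is a cheap membership test:

	#include <linux/fs.h>
	#include <linux/xarray.h>

	/* Hypothetical mirror of the tctx->xa lookup: file pointer as key */
	static bool demo_has_file_note(struct xarray *xa, struct file *file)
	{
		/* xa_load() returns NULL when no entry is stored at the key */
		return xa_load(xa, (unsigned long)file) != NULL;
	}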
From: Pavel Begunkov
To: Jens Axboe, io-uring
Subject: [PATCH 4/4] io_uring: stop SQPOLL submit on creator's death
Date: 2021-01-08 20:57 UTC

When the creator of an SQPOLL io_uring dies (i.e. sqo_task), we don't want
its internals like ->files and ->mm to be poked by the SQPOLL task; that
has never been nice and recently became racy. It can happen when the owner
is going through destruction while the SQPOLL task tries to submit new
requests in parallel, and so calls io_sq_thread_acquire*().

This patch halts SQPOLL submissions when sqo_task dies by introducing an
sqo_dead flag. Once it is set, the SQPOLL task must not do any submission,
which is synchronised by uring_lock together with the new flag.

The tricky part is to make sure that disabling always happens: either the
ring is discovered by the creator's do_exit() -> cancel path, or, if the
final close() happens before that, it is performed by the creator. The
latter is guaranteed by the fact that for SQPOLL the creator task, and only
it, holds exactly one file note, so the note either pins the ring up to
do_exit(), or is removed by the creator on the final put in flush (see the
comments in io_uring_flush() around file->f_count == 2).

One more place that can trigger io_sq_thread_acquire_*() is
__io_req_task_submit(). Shoot off requests on sqo_dead there too, even
though we don't strictly need to: cancellation of sqo_task should wait for
the request before going any further.

note 1: io_disable_sqo_submit() does io_ring_set_wakeup_flag() so that
callers enter the ring to get an error, but it still doesn't guarantee
that the flag won't be cleared.

note 2: if the final __userspace__ close happens not from the creator
task, the file note will pin the ring until that task dies.
Fixes: b1b6b5a30dce8 ("kernel/io_uring: cancel io_uring before task works")
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 62 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 53 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f39671a0d84f..2f305c097bd5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -262,6 +262,7 @@ struct io_ring_ctx {
 	unsigned int		drain_next: 1;
 	unsigned int		eventfd_async: 1;
 	unsigned int		restricted: 1;
+	unsigned int		sqo_dead: 1;
 
 	/*
 	 * Ring buffer of indices into array of io_uring_sqe, which is
@@ -2160,12 +2161,11 @@ static void io_req_task_cancel(struct callback_head *cb)
 static void __io_req_task_submit(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	bool fail;
 
-	fail = __io_sq_thread_acquire_mm(ctx) ||
-		__io_sq_thread_acquire_files(ctx);
 	mutex_lock(&ctx->uring_lock);
-	if (!fail)
+	if (!ctx->sqo_dead &&
+	    !__io_sq_thread_acquire_mm(ctx) &&
+	    !__io_sq_thread_acquire_files(ctx))
 		__io_queue_sqe(req, NULL);
 	else
 		__io_req_task_cancel(req, -EFAULT);
@@ -6954,7 +6954,8 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
 		if (!list_empty(&ctx->iopoll_list))
 			io_do_iopoll(ctx, &nr_events, 0);
 
-		if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)))
+		if (to_submit && !ctx->sqo_dead &&
+		    likely(!percpu_ref_is_dying(&ctx->refs)))
 			ret = io_submit_sqes(ctx, to_submit);
 		mutex_unlock(&ctx->uring_lock);
 	}
@@ -8712,6 +8713,10 @@ static void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 {
 	mutex_lock(&ctx->uring_lock);
 	percpu_ref_kill(&ctx->refs);
+
+	if (WARN_ON_ONCE((ctx->flags & IORING_SETUP_SQPOLL) && !ctx->sqo_dead))
+		ctx->sqo_dead = 1;
+
 	/* if force is set, the ring is going away. always drop after that */
 	ctx->cq_overflow_flushed = 1;
 	if (ctx->rings)
@@ -8874,6 +8879,18 @@ static void __io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 	}
 }
 
+static void io_disable_sqo_submit(struct io_ring_ctx *ctx)
+{
+	WARN_ON_ONCE(ctx->sqo_task != current);
+
+	mutex_lock(&ctx->uring_lock);
+	ctx->sqo_dead = 1;
+	mutex_unlock(&ctx->uring_lock);
+
+	/* make sure callers enter the ring to get error */
+	io_ring_set_wakeup_flag(ctx);
+}
+
 /*
  * We need to iteratively cancel requests, in case a request has dependent
  * hard links. These persist even for failure of cancelations, hence keep
@@ -8885,6 +8902,8 @@ static void io_uring_cancel_task_requests(struct io_ring_ctx *ctx,
 	struct task_struct *task = current;
 
 	if ((ctx->flags & IORING_SETUP_SQPOLL) && ctx->sq_data) {
+		/* for SQPOLL only sqo_task has task notes */
+		io_disable_sqo_submit(ctx);
 		task = ctx->sq_data->thread;
 		atomic_inc(&task->io_uring->in_idle);
 		io_sq_thread_park(ctx->sq_data);
@@ -9056,6 +9075,7 @@ void __io_uring_task_cancel(void)
 static int io_uring_flush(struct file *file, void *data)
 {
 	struct io_uring_task *tctx = current->io_uring;
+	struct io_ring_ctx *ctx = file->private_data;
 
 	if (!tctx)
 		return 0;
@@ -9071,7 +9091,16 @@ static int io_uring_flush(struct file *file, void *data)
 	if (atomic_long_read(&file->f_count) != 2)
 		return 0;
 
-	io_uring_del_task_file(file);
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
+		/* there is only one file note, which is owned by sqo_task */
+		WARN_ON_ONCE((ctx->sqo_task == current) ==
+			     !xa_load(&tctx->xa, (unsigned long)file));
+
+		io_disable_sqo_submit(ctx);
+	}
+
+	if (!(ctx->flags & IORING_SETUP_SQPOLL) || ctx->sqo_task == current)
+		io_uring_del_task_file(file);
 	return 0;
 }
 
@@ -9145,8 +9174,9 @@ static unsigned long io_uring_nommu_get_unmapped_area(struct file *file,
 
 #endif /* !CONFIG_MMU */
 
-static void io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
+static int io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
 {
+	int ret = 0;
 	DEFINE_WAIT(wait);
 
 	do {
@@ -9155,6 +9185,11 @@ static void io_sqpoll_wait_sq(struct io_ring_ctx *ctx)
 
 		prepare_to_wait(&ctx->sqo_sq_wait, &wait, TASK_INTERRUPTIBLE);
 
+		if (unlikely(ctx->sqo_dead)) {
+			ret = -EOWNERDEAD;
+			goto out;
+		}
+
 		if (!io_sqring_full(ctx))
 			break;
 
@@ -9162,6 +9197,8 @@
 	} while (!signal_pending(current));
 
 	finish_wait(&ctx->sqo_sq_wait, &wait);
+out:
+	return ret;
 }
 
 static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz,
@@ -9235,10 +9272,16 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 
 	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		io_cqring_overflow_flush(ctx, false, NULL, NULL);
+		ret = -EOWNERDEAD;
+		if (unlikely(ctx->sqo_dead))
+			goto out;
 		if (flags & IORING_ENTER_SQ_WAKEUP)
 			wake_up(&ctx->sq_data->wait);
-		if (flags & IORING_ENTER_SQ_WAIT)
-			io_sqpoll_wait_sq(ctx);
+		if (flags & IORING_ENTER_SQ_WAIT) {
+			ret = io_sqpoll_wait_sq(ctx);
+			if (ret)
+				goto out;
+		}
 		submitted = to_submit;
 	} else if (to_submit) {
 		ret = io_uring_add_task_file(ctx, f.file);
@@ -9665,6 +9708,7 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
 	trace_io_uring_create(ret, ctx, p->sq_entries, p->cq_entries, p->flags);
 	return ret;
 err:
+	io_disable_sqo_submit(ctx);
 	io_ring_ctx_wait_and_kill(ctx);
 	return ret;
 }
-- 
2.24.0
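A closing illustration (mine, not part of the patch): after this change,
entering a dead SQPOLL ring fails with -EOWNERDEAD, and the wakeup flag set
by io_disable_sqo_submit() makes submitters actually enter the ring to see
it. A hedged sketch of how a liburing application might observe that
(liburing returns negated errno from its submit paths):

	#include <errno.h>
	#include <liburing.h>
	#include <stdio.h>

	/* Assumes 'ring' was set up with IORING_SETUP_SQPOLL elsewhere. */
	static int submit_or_bail(struct io_uring *ring)
	{
		int ret = io_uring_submit(ring); /* enters the kernel since
						  * the wakeup flag is set */
		if (ret == -EOWNERDEAD) {
			/* creator died; the ring cannot submit any more */
			fprintf(stderr, "SQPOLL creator dead, tearing down\n");
			return -1;
		}
		return ret;
	}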