On 8/7/20 3:50 PM, Jens Axboe wrote:
> On 8/7/20 12:00 PM, Jann Horn wrote:
>> On Fri, Aug 7, 2020 at 6:56 PM Jens Axboe wrote:
>>>
>>> An earlier commit:
>>>
>>> b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
>>>
>>> ensured that we didn't get stuck waiting for eventfd reads when it's
>>> registered with the io_uring ring for event notification, but we still
>>> have a gap where the task can be waiting on other events in the kernel
>>> and need a bigger nudge to make forward progress.
>>>
>>> Ensure that we use signaled notifications for a task that isn't currently
>>> running, to be certain the work is seen and processed immediately.
>>>
>>> Cc: stable@vger.kernel.org # v5.7+
>>> Reported-by: Josef
>>> Signed-off-by: Jens Axboe
>>>
>>> ---
>>>
>>> This isn't perfect, as it'll use TWA_SIGNAL even for cases where we
>>> don't absolutely need it (like task waiting for completions in
>>> io_cqring_wait()), but we don't have a good way to tell right now. We
>>> can probably improve on this in the future, for now I think this is the
>>> best solution.
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index e9b27cdaa735..b4300a61f231 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1720,7 +1720,7 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
>>>           */
>>>          if (ctx->flags & IORING_SETUP_SQPOLL)
>>>                  notify = 0;
>>> -        else if (ctx->cq_ev_fd)
>>> +        else if (ctx->cq_ev_fd || (tsk->state != TASK_RUNNING))
>>>                  notify = TWA_SIGNAL;
>>>
>>>          ret = task_work_add(tsk, cb, notify);
>>
>> I don't get it. Apart from still not understanding the big picture:
>>
>> What guarantees that the lockless read of tsk->state is in any way
>> related to the state of the remote process by the time we reach
>> task_work_add()? And why do we not need to signal in TASK_RUNNING
>> state (e.g. directly before the remote process switches to
>> TASK_INTERRUPTIBLE or something like that)?
>
> Yeah it doesn't, the patch doesn't cover the racy case. As far as I can
> tell, we've got two ways to do it:
>
> 1) We split the task_work_add() into two parts, one adding the work and
>    one doing the signaling. Then we could do:
>
>         int notify = TWA_RESUME;
>
>         __task_work_add(tsk, cb);
>
>         if (ctx->flags & IORING_SETUP_SQPOLL)
>                 notify = 0;
>         else if (ctx->cq_ev_fd || (tsk->state != TASK_RUNNING))
>                 notify = TWA_SIGNAL;
>
>         __task_work_signal(tsk, notify);

Something like the attached - totally untested so far, but it implements
that idea.

-- 
Jens Axboe
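
For readers following the ordering argument in option 1, here is a minimal user-space sketch of the "add first, then sample the state" idea, using C11 atomics. It is an analogy only: it is not the attached patch and not kernel code. All names (struct task, work_add(), pick_notify(), may_sleep(), the NOTIFY_* and STATE_* values) are hypothetical stand-ins for __task_work_add(), the notify selection, and the target task's sleep path. The assumption is that the target publishes its sleeping state before re-checking the queue; with sequentially consistent atomics, either the queuing side then sees the non-running state, or the target sees the pending work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel names used in the thread. */
enum notify_mode { NOTIFY_NONE = 0, NOTIFY_RESUME = 1, NOTIFY_SIGNAL = 2 };
enum task_state { STATE_RUNNING = 0, STATE_SLEEPING = 1 };

struct work {
	struct work *next;
};

struct task {
	_Atomic(struct work *) pending;	/* lock-free LIFO of queued work */
	_Atomic int state;
};

/* Step 1: publish the work item without notifying anyone. */
static void work_add(struct task *t, struct work *w)
{
	struct work *head;

	assert(w != NULL);
	head = atomic_load(&t->pending);
	do {
		w->next = head;	/* head is refreshed by a failed CAS */
	} while (!atomic_compare_exchange_weak(&t->pending, &head, w));
}

/* Step 2: choose the notification only after the work is visible. */
static enum notify_mode pick_notify(struct task *t, bool sqpoll,
				    bool has_eventfd)
{
	if (sqpoll)
		return NOTIFY_NONE;
	if (has_eventfd || atomic_load(&t->state) != STATE_RUNNING)
		return NOTIFY_SIGNAL;
	return NOTIFY_RESUME;
}

/*
 * Target side: announce the sleep first, then re-check for work.
 * Returns false (and restores STATE_RUNNING) if work is already queued.
 */
static bool may_sleep(struct task *t)
{
	atomic_store(&t->state, STATE_SLEEPING);
	if (atomic_load(&t->pending) != NULL) {
		atomic_store(&t->state, STATE_RUNNING);
		return false;
	}
	return true;
}
```

A caller would run work_add() and then pick_notify(), in that order. Reversing the two reintroduces the window Jann describes, where the state is sampled before the work is visible to the target.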