public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: [email protected]
Cc: [email protected], [email protected],
	Josef <[email protected]>
Subject: Re: [PATCH 2/2] io_uring: use TWA_SIGNAL for task_work if the task isn't running
Date: Mon, 10 Aug 2020 13:21:48 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 8/10/20 9:02 AM, Jens Axboe wrote:
> On 8/10/20 5:42 AM, [email protected] wrote:
>> On Sat, Aug 08, 2020 at 12:34:39PM -0600, Jens Axboe wrote:
>>> An earlier commit:
>>>
>>> b7db41c9e03b ("io_uring: fix regression with always ignoring signals in io_cqring_wait()")
>>>
>>> ensured that we didn't get stuck waiting for eventfd reads when it's
>>> registered with the io_uring ring for event notification, but we still
>>> have a gap where the task can be waiting on other events in the kernel
>>> and need a bigger nudge to make forward progress.
>>>
>>> Ensure that we use signaled notifications for a task that isn't currently
>>> running, to be certain the work is seen and processed immediately.
>>>
>>> Cc: [email protected] # v5.7+
>>> Reported-by: Josef <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> ---
>>>  fs/io_uring.c | 22 ++++++++++++++--------
>>>  1 file changed, 14 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index e9b27cdaa735..443eecdfeda9 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1712,21 +1712,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
>>>  	struct io_ring_ctx *ctx = req->ctx;
>>>  	int ret, notify = TWA_RESUME;
>>>  
>>> +	ret = __task_work_add(tsk, cb);
>>> +	if (unlikely(ret))
>>> +		return ret;
>>> +
>>>  	/*
>>>  	 * SQPOLL kernel thread doesn't need notification, just a wakeup.
>>> -	 * If we're not using an eventfd, then TWA_RESUME is always fine,
>>> -	 * as we won't have dependencies between request completions for
>>> -	 * other kernel wait conditions.
>>> +	 * For any other work, use signaled wakeups if the task isn't
>>> +	 * running to avoid dependencies between tasks or threads. If
>>> +	 * the issuing task is currently waiting in the kernel on a thread,
>>> +	 * and same thread is waiting for a completion event, then we need
>>> +	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
>>> +	 * is needed for that.
>>>  	 */
>>>  	if (ctx->flags & IORING_SETUP_SQPOLL)
>>>  		notify = 0;
>>> -	else if (ctx->cq_ev_fd)
>>> +	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
>>>  		notify = TWA_SIGNAL;
>>>  
>>> -	ret = task_work_add(tsk, cb, notify);
>>> -	if (!ret)
>>> -		wake_up_process(tsk);
>>> -	return ret;
>>> +	__task_work_notify(tsk, notify);
>>> +	wake_up_process(tsk);
>>> +	return 0;
>>>  }
>>
>> Wait.. so the only change here is that you look at tsk->state, _after_
>> doing __task_work_add(), but nothing, not the Changelog nor the comment
>> explains this.
>>
>> So you're relying on __task_work_add() being an smp_mb() vs the add, and
>> you order this against the smp_mb() in set_current_state() ?
>>
>> This really needs spelling out.
> 
> I'll update the changelog, it suffers a bit from having been reused from
> the earlier versions. Thanks for checking!

I failed to convince myself that the existing construct was safe, so
here's an incremental patch on top of that. Basically we re-check the
task state _after_ the initial notification, to protect ourselves from
the case where we initially find the task running, but it goes to sleep
between that check and the notification. The window should be pretty
slim, but I think it's there.

Hence loop around the check and the notification if we're using
TWA_RESUME.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 44ac103483b6..a4ecb6c7e2b0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1780,12 +1780,27 @@ static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb)
 	 * to ensure that the issuing task processes task_work. TWA_SIGNAL
 	 * is needed for that.
 	 */
-	if (ctx->flags & IORING_SETUP_SQPOLL)
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
 		notify = 0;
-	else if (READ_ONCE(tsk->state) != TASK_RUNNING)
-		notify = TWA_SIGNAL;
+	} else {
+		bool notified = false;
 
-	__task_work_notify(tsk, notify);
+		/*
+		 * If the task is running, TWA_RESUME notify is enough. Make
+		 * sure to re-check after we've sent the notification, so as not
+		 * to have a race between the check and the notification. This
+		 * only applies for TWA_RESUME, as TWA_SIGNAL is safe with a
+		 * sleeping task.
+		 */
+		do {
+			if (READ_ONCE(tsk->state) != TASK_RUNNING)
+				notify = TWA_SIGNAL;
+			else if (notified)
+				break;
+			__task_work_notify(tsk, notify);
+			notified = true;
+		} while (notify != TWA_SIGNAL);
+	}
 	wake_up_process(tsk);
 	return 0;
 }

and I've folded it in here:

https://git.kernel.dk/cgit/linux-block/commit/?h=io_uring-5.9&id=8d685b56f80b16516be9ce2eb1aee5adcfba13ff
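
For anyone who wants to poke at the control flow of the re-check loop
outside the kernel, here is a rough single-threaded userspace sketch. It
is not kernel code and says nothing about memory ordering. Everything in
it (struct fake_task, read_state(), send_notify(), the NOTIFY_* and
STATE_* values) is made up for illustration; only the shape of the
do/while loop follows the diff above. The sleep_after_reads field is an
assumed knob that makes the fake task go to sleep right after a given
number of state reads, which models the window between the check and the
notification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for TWA_RESUME / TWA_SIGNAL; values are arbitrary. */
enum notify_mode { NOTIFY_RESUME = 1, NOTIFY_SIGNAL = 2 };

/* Hypothetical stand-ins for TASK_RUNNING and "anything else". */
enum task_state { STATE_RUNNING = 0, STATE_SLEEPING = 1 };

struct fake_task {
	enum task_state state;
	int sleep_after_reads;	/* go to sleep after this many reads; 0 = never */
	int resume_notifies;	/* count of NOTIFY_RESUME notifications received */
	int signal_notifies;	/* count of NOTIFY_SIGNAL notifications received */
};

/* Return the current state, then possibly flip the task to sleeping. */
static enum task_state read_state(struct fake_task *t)
{
	enum task_state seen = t->state;

	if (t->sleep_after_reads > 0 && --t->sleep_after_reads == 0)
		t->state = STATE_SLEEPING;
	return seen;
}

/* Record which kind of notification was delivered. */
static void send_notify(struct fake_task *t, enum notify_mode mode)
{
	if (mode == NOTIFY_SIGNAL)
		t->signal_notifies++;
	else
		t->resume_notifies++;
}

/*
 * Same shape as the loop in the diff: notify, then re-check the state.
 * If the task is seen not running, escalate to the signaled mode and stop.
 * If it is still running after we have already notified, stop as well.
 */
static enum notify_mode notify_with_recheck(struct fake_task *t)
{
	enum notify_mode notify = NOTIFY_RESUME;
	bool notified = false;

	assert(t != NULL);
	do {
		if (read_state(t) != STATE_RUNNING)
			notify = NOTIFY_SIGNAL;
		else if (notified)
			break;
		send_notify(t, notify);
		notified = true;
	} while (notify != NOTIFY_SIGNAL);

	return notify;
}
```

With sleep_after_reads set to 1, the first check sees a running task, the
task then sleeps, and the second pass of the loop escalates to the
signaled notification. With a task that stays running, the loop sends one
resume-style notification and exits on the re-check.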

-- 
Jens Axboe

