public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Andres Freund <[email protected]>, [email protected]
Cc: Oleg Nesterov <[email protected]>
Subject: Re: signals not reliably interrupting io_uring_enter anymore
Date: Fri, 3 Jul 2020 20:56:15 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 7/3/20 8:08 PM, Jens Axboe wrote:
> On 7/3/20 7:52 PM, Jens Axboe wrote:
>> On 7/3/20 7:13 PM, Andres Freund wrote:
>>> Hi, 
>>>
>>> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <[email protected]> wrote:
>>>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>>>> Hi,
>>>>>
>>>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>>>> I haven't yet fully analyzed the problem, but after updating to
>>>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>>>> does
>>>>>> not work reliably anymore.
>>>>>>
>>>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>>>> interrupted by signals anymore. The signal handler executes, but
>>>>>> afterwards the syscall is restarted. Previously io_uring_enter
>>>> reliably
>>>>>> returned EINTR in that case.
>>>>>>
>>>>>> Currently postgres relies on signals interrupting io_uring_enter().
>>>> We
>>>>>> probably can find a way to not do so, but it'd not be entirely
>>>> trivial.
>>>>>>
>>>>>> I suspect the issue is
>>>>>>
>>>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>>>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>>>> Author: Jens Axboe <[email protected]>
>>>>>> Date:   2020-06-30 12:39:05 -0600
>>>>>>
>>>>>>     io_uring: use signal based task_work running
>>>>>>
>>>>>> as that appears to have changed the error returned by
>>>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>>>> from
>>>>>> EINTR to ERESTARTSYS.
>>>>>>
>>>>>>
>>>>>> I'll check to make sure that the issue doesn't exist before the
>>>> above
>>>>>> commit.
>>>>>
>>>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>>>> doesn't
>>>>> exist, which pretty much confirms that the above commit is the issue.
>>>>>
>>>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>>>> commit? I assume trying to avoid returning spurious EINTRs to
>>>> userland?
>>>>
>>>> Yeah, for when it's running task_work. I wonder if something like the
>>>> below will do the trick?
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 700644a016a7..0efa73d78451 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>>>> *ctx, int min_events,
>>>> 	do {
>>>> 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>>>> 						TASK_INTERRUPTIBLE);
>>>> -		/* make sure we run task_work before checking for signals */
>>>> -		if (current->task_works)
>>>> -			task_work_run();
>>>> 		if (signal_pending(current)) {
>>>> -			ret = -ERESTARTSYS;
>>>> +			if (current->task_works)
>>>> +				ret = -ERESTARTSYS;
>>>> +			else
>>>> +				ret = -EINTR;
>>>> 			break;
>>>> 		}
>>>> 		if (io_should_wake(&iowq, false))
>>>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>>>> *ctx, int min_events,
>>>> 	} while (1);
>>>> 	finish_wait(&ctx->wait, &iowq.wq);
>>>>
>>>> -	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>>>> +	restore_saved_sigmask_unless(ret == -EINTR);
>>>>
>>>> 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>>>> 0;
>>>> }
>>>
>>> I'll try in a bit. Suspect however that there'd be trouble if there
>>> were both an actual signal and task work pending?
>>
>> Yes, I have that worry too. We'd really need to check if we have an
>> actual signal pending - if we do, we still do -EINTR. If not, then we
>> just do -ERESTARTSYS and restart the system call after task_work has
>> been completed. Half-assed approach below, I suspect this won't _really_
>> work without appropriate locking. Which would be unfortunate.
>>
>> Either that, or we'd need to know if an actual signal was delivered when
>> we get re-entered due to returning -ERESTARTSYS. If it was just
>> task_work being run, then we're fine. But if an actual signal was
>> pending, then we'd need to return -EINTR.
>>
>> CC'ing Oleg to see if he has any good ideas here.
> 
> This might be simpler:

Or... That's it for today, I'll check in after the weekend.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..25a1877d3d84 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,18 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	do {
 		prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
 						TASK_INTERRUPTIBLE);
-		/* make sure we run task_work before checking for signals */
-		if (current->task_works)
-			task_work_run();
 		if (signal_pending(current)) {
-			ret = -ERESTARTSYS;
+			if (current->jobctl & JOBCTL_TASK_WORK) {
+				spin_lock_irq(&current->sighand->siglock);
+				current->jobctl &= ~JOBCTL_TASK_WORK;
+				recalc_sigpending();
+				spin_unlock_irq(&current->sighand->siglock);
+				if (current->task_works) {
+					task_work_run();
+					continue;
+				}
+			}
+			ret = -EINTR;
 			break;
 		}
 		if (io_should_wake(&iowq, false))
@@ -6210,7 +6217,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	} while (1);
 	finish_wait(&ctx->wait, &iowq.wq);
 
-	restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+	restore_saved_sigmask_unless(ret == -EINTR);
 
 	return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
 }

-- 
Jens Axboe


  reply	other threads:[~2020-07-04  2:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-04  0:00 signals not reliably interrupting io_uring_enter anymore Andres Freund
2020-07-04  0:15 ` Andres Freund
2020-07-04  0:48   ` Jens Axboe
2020-07-04  1:13     ` Andres Freund
2020-07-04  1:52       ` Jens Axboe
2020-07-04  2:08         ` Jens Axboe
2020-07-04  2:56           ` Jens Axboe [this message]
2020-07-04 14:55             ` Jens Axboe
2020-07-04 19:11               ` Andres Freund
2020-07-04 19:45                 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox