From: Jens Axboe <[email protected]>
To: Andres Freund <[email protected]>, [email protected]
Subject: Re: signals not reliably interrupting io_uring_enter anymore
Date: Fri, 3 Jul 2020 20:08:17 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 7/3/20 7:52 PM, Jens Axboe wrote:
> On 7/3/20 7:13 PM, Andres Freund wrote:
>> Hi,
>>
>> On July 3, 2020 5:48:21 PM PDT, Jens Axboe <[email protected]> wrote:
>>> On 7/3/20 6:15 PM, Andres Freund wrote:
>>>> Hi,
>>>>
>>>> On 2020-07-03 17:00:49 -0700, Andres Freund wrote:
>>>>> I haven't yet fully analyzed the problem, but after updating to
>>>>> cdd3bb54332f82295ed90cd0c09c78cd0c0ee822 io_uring using postgres
>>> does
>>>>> not work reliably anymore.
>>>>>
>>>>> The symptom is that io_uring_enter(IORING_ENTER_GETEVENTS) isn't
>>>>> interrupted by signals anymore. The signal handler executes, but
>>>>> afterwards the syscall is restarted. Previously io_uring_enter
>>> reliably
>>>>> returned EINTR in that case.
>>>>>
>>>>> Currently postgres relies on signals interrupting io_uring_enter().
>>> We
>>>>> probably can find a way to not do so, but it'd not be entirely
>>> trivial.
>>>>>
>>>>> I suspect the issue is
>>>>>
>>>>> commit ce593a6c480a22acba08795be313c0c6d49dd35d (tag:
>>> io_uring-5.8-2020-07-01, linux-block/io_uring-5.8)
>>>>> Author: Jens Axboe <[email protected]>
>>>>> Date: 2020-06-30 12:39:05 -0600
>>>>>
>>>>> io_uring: use signal based task_work running
>>>>>
>>>>> as that appears to have changed the error returned by
>>>>> io_uring_enter(GETEVENTS) after having been interrupted by a signal
>>> from
>>>>> EINTR to ERESTARTSYS.
>>>>>
>>>>>
>>>>> I'll check to make sure that the issue doesn't exist before the
>>> above
>>>>> commit.
>>>>
>>>> Indeed, on cd77006e01b3198c75fb7819b3d0ff89709539bb the PG issue
>>> doesn't
>>>> exist, which pretty much confirms that the above commit is the issue.
>>>>
>>>> What was the reason for changing EINTR to ERESTARTSYS in the above
>>>> commit? I assume trying to avoid returning spurious EINTRs to
>>> userland?
>>>
>>> Yeah, for when it's running task_work. I wonder if something like the
>>> below will do the trick?
>>>
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 700644a016a7..0efa73d78451 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx
>>> *ctx, int min_events,
>>> do {
>>> prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
>>> TASK_INTERRUPTIBLE);
>>> - /* make sure we run task_work before checking for signals */
>>> - if (current->task_works)
>>> - task_work_run();
>>> if (signal_pending(current)) {
>>> - ret = -ERESTARTSYS;
>>> + if (current->task_works)
>>> + ret = -ERESTARTSYS;
>>> + else
>>> + ret = -EINTR;
>>> break;
>>> }
>>> if (io_should_wake(&iowq, false))
>>> @@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx
>>> *ctx, int min_events,
>>> } while (1);
>>> finish_wait(&ctx->wait, &iowq.wq);
>>>
>>> - restore_saved_sigmask_unless(ret == -ERESTARTSYS);
>>> + restore_saved_sigmask_unless(ret == -EINTR);
>>>
>>> return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret :
>>> 0;
>>> }
>>
>> I'll try in a bit. Suspect however that there'd be trouble if there
>> were both an actual signal and task work pending?
>
> Yes, I have that worry too. We'd really need to check if we have an
> actual signal pending - if we do, we still do -EINTR. If not, then we
> just do -ERESTARTSYS and restart the system call after task_work has
> been completed. Half-assed approach below, I suspect this won't _really_
> work without appropriate locking. Which would be unfortunate.
>
> Either that, or we'd need to know if an actual signal was delivered when
> we get re-entered due to returning -ERESTARTSYS. If it was just
> task_work being run, then we're fine. But if an actual signal was
> pending, then we'd need to return -EINTR.
>
> CC'ing Oleg to see if he has any good ideas here.
This might be simpler:
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 700644a016a7..8f54fd5085bf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6197,11 +6197,11 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
do {
prepare_to_wait_exclusive(&ctx->wait, &iowq.wq,
TASK_INTERRUPTIBLE);
- /* make sure we run task_work before checking for signals */
- if (current->task_works)
- task_work_run();
if (signal_pending(current)) {
- ret = -ERESTARTSYS;
+ if (!sigisemptyset(¤t->pending.signal))
+ ret = -EINTR;
+ else
+ ret = -ERESTARTSYS;
break;
}
if (io_should_wake(&iowq, false))
@@ -6210,7 +6210,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
} while (1);
finish_wait(&ctx->wait, &iowq.wq);
- restore_saved_sigmask_unless(ret == -ERESTARTSYS);
+ restore_saved_sigmask_unless(ret == -EINTR);
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
}
--
Jens Axboe
next prev parent reply other threads:[~2020-07-04 2:08 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-04 0:00 signals not reliably interrupting io_uring_enter anymore Andres Freund
2020-07-04 0:15 ` Andres Freund
2020-07-04 0:48 ` Jens Axboe
2020-07-04 1:13 ` Andres Freund
2020-07-04 1:52 ` Jens Axboe
2020-07-04 2:08 ` Jens Axboe [this message]
2020-07-04 2:56 ` Jens Axboe
2020-07-04 14:55 ` Jens Axboe
2020-07-04 19:11 ` Andres Freund
2020-07-04 19:45 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox