From: Pavel Begunkov <[email protected]>
To: Avi Kivity <[email protected]>, Jens Axboe <[email protected]>,
[email protected]
Subject: Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
Date: Wed, 15 Jun 2022 12:30:55 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 6/15/22 12:04, Avi Kivity wrote:
>
> On 15/06/2022 13.48, Pavel Begunkov wrote:
>> On 6/15/22 11:12, Avi Kivity wrote:
>>>
>>> On 19/04/2022 20.14, Jens Axboe wrote:
>>>> On 4/19/22 9:21 AM, Jens Axboe wrote:
>>>>> On 4/19/22 6:31 AM, Jens Axboe wrote:
>>>>>> On 4/19/22 6:21 AM, Avi Kivity wrote:
>>>>>>> On 19/04/2022 15.04, Jens Axboe wrote:
>>>>>>>> On 4/19/22 5:57 AM, Avi Kivity wrote:
>>>>>>>>> On 19/04/2022 14.38, Jens Axboe wrote:
>>>>>>>>>> On 4/19/22 5:07 AM, Avi Kivity wrote:
>>>>>>>>>>> A simple webserver shows about 5% loss compared to linux-aio.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I expect the loss is due to an optimization that io_uring lacks -
>>>>>>>>>>> inline completion vs workqueue completion:
>>>>>>>>>> I don't think that's it, io_uring never punts to a workqueue for
>>>>>>>>>> completions.
>>>>>>>>> I measured this:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Performance counter stats for 'system wide':
>>>>>>>>>
>>>>>>>>> 1,273,756 io_uring:io_uring_task_add
>>>>>>>>>
>>>>>>>>> 12.288597765 seconds time elapsed
>>>>>>>>>
>>>>>>>>> Which exactly matches with the number of requests sent. If that's the
>>>>>>>>> wrong counter to measure, I'm happy to try again with the correct
>>>>>>>>> counter.
>>>>>>>> io_uring_task_add() isn't a workqueue, it's task_work. So that is
>>>>>>>> expected.
>>>>> Might actually be implicated. Not because it's an async worker, but
>>>>> because I think we might be losing some affinity in this case. Looking
>>>>> at traces, we're definitely bouncing between the poll completion side
>>>>> and then executing the completion.
>>>>>
>>>>> Can you try this hack? It's against -git + for-5.19/io_uring. If you let
>>>>> me know what base you prefer, I can do a version against that. I see
>>>>> about a 3% win with io_uring with this, and was slower before against
>>>>> linux-aio as you saw as well.
>>>> Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
>>>> believe may be the underlying cause of it.
>>>>
>>>
>>> Resurrecting an old thread. I have a question about timeliness of completions. Let's assume a request has completed. From the patch, it appears that io_uring will only guarantee that a completion appears on the completion ring if the thread has entered kernel mode since the completion happened. So user-space polling of the completion ring can cause unbounded delays.
>>
>> Right, but polling the CQ is a bad pattern; io_uring_{wait,peek}_cqe/etc.
>> will do the polling vs syscalling dance for you.
>
>
> Can you be more explicit?
>
>
> I don't think peek is enough. If there is a cqe pending, it will return it, but it will not cause completed-but-unqueued events to generate completions.
>
>
> And wait won't enter the kernel if a cqe is pending, IIUC.
Right, usually it won't, but it works out if you eventually end up
waiting, e.g. by waiting for all expected cqes.
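Roughly like this, as a sketch with liburing (the queue depth, the nop
requests and the lack of error handling are just for illustration):

#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	unsigned i, nr = 8;

	io_uring_queue_init(64, &ring, 0);

	for (i = 0; i < nr; i++)
		io_uring_prep_nop(io_uring_get_sqe(&ring));
	io_uring_submit(&ring);

	/* wait for every expected CQE; once the CQ is drained, the wait
	 * enters the kernel, which also runs any deferred completion
	 * work */
	for (i = 0; i < nr; i++) {
		if (io_uring_wait_cqe(&ring, &cqe) < 0)
			break;
		/* handle the completion here */
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}
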
>> For larger audience, I'll remind that it's an opt-in feature
>>
>
> I don't understand - what is an opt-in feature?
The behaviour you're worried about, where CQEs are not posted until
you do a syscall, only applies if you set IORING_SETUP_COOP_TASKRUN.
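In other words, with the default setup flags you keep the old behaviour;
the deferral only kicks in when you opt in at ring setup, e.g. (sketch):

#include <liburing.h>

/*
 * Opt in to cooperative task running: completion work is deferred
 * until this task enters the kernel anyway (e.g. on the next
 * submit/wait) instead of being forced with an IPI. Without the flag
 * the ring behaves as before and CQE posting doesn't depend on the
 * task doing a syscall.
 */
static int setup_ring(struct io_uring *ring)
{
	return io_uring_queue_init(256, ring, IORING_SETUP_COOP_TASKRUN);
}
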
>>> If this is correct (it's not unreasonable, but should be documented), then there should also be a simple way to force a kernel entry. But how to do this using liburing? IIUC, if the following apply:
>>>
>>>
>>> 1. I have no pending sqes
>>>
>>> 2. There are pending completions
>>>
>>> 3. There is a completed event for which a completion has not been appended to the completion queue ring
>>>
>>>
>>> Then io_uring_wait_cqe() will elide io_uring_enter() and the completed-but-not-reported event will be delayed.
>>
>> One way is to process all pending CQEs; once the ring is drained, it'll
>> try to enter the kernel and do the job.
>>
>> Another way is to also set IORING_SETUP_TASKRUN_FLAG; then, when
>> there is work that requires entering the kernel, io_uring will
>> set IORING_SQ_TASKRUN in sq_flags.
>> Actually, if I'm not mistaken, liburing has some automagic handling
>> of it internally:
>>
>> https://github.com/axboe/liburing/blob/master/src/queue.c#L36
>>
>>
>>
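To make that concrete, a rough userspace poll loop (a sketch; assumes the
ring was set up with IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG):

#include <liburing.h>

/*
 * With IORING_SETUP_TASKRUN_FLAG, the kernel sets IORING_SQ_TASKRUN in
 * the SQ ring flags when there is deferred completion work pending.
 * liburing's peek/wait helpers check that flag (see the queue.c link
 * above) and call io_uring_enter() to flush the work rather than
 * reporting an empty CQ.
 */
static void poll_completions(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (!io_uring_peek_cqe(ring, &cqe)) {
		/* handle the completion here */
		io_uring_cqe_seen(ring, cqe);
	}
}
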
--
Pavel Begunkov
Thread overview: 23+ messages
2022-04-19 11:07 IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL Avi Kivity
2022-04-19 11:38 ` Jens Axboe
2022-04-19 11:57 ` Avi Kivity
2022-04-19 12:04 ` Jens Axboe
2022-04-19 12:21 ` Avi Kivity
2022-04-19 12:31 ` Jens Axboe
2022-04-19 15:21 ` Jens Axboe
2022-04-19 15:51 ` Avi Kivity
2022-04-19 17:14 ` Jens Axboe
2022-04-19 19:41 ` Avi Kivity
2022-04-19 19:58 ` Jens Axboe
2022-04-20 11:55 ` Avi Kivity
2022-04-20 12:09 ` Jens Axboe
2022-04-21 9:05 ` Avi Kivity
2022-06-15 10:12 ` Avi Kivity
2022-06-15 10:48 ` Pavel Begunkov
2022-06-15 11:04 ` Avi Kivity
2022-06-15 11:07 ` Avi Kivity
2022-06-15 11:38 ` Pavel Begunkov
2022-06-15 12:21 ` Jens Axboe
2022-06-15 13:43 ` Avi Kivity
2022-06-15 11:30 ` Pavel Begunkov [this message]
2022-06-15 11:36 ` Avi Kivity