public inbox for [email protected]
* Suggestion: chain SQEs - single CQE for N chained SQEs
@ 2020-04-18 12:49 H. de Vries
  2020-04-18 13:50 ` Pavel Begunkov
  0 siblings, 1 reply; 4+ messages in thread
From: H. de Vries @ 2020-04-18 12:49 UTC (permalink / raw)
  To: io-uring; +Cc: axboe

Hi,

Following up on the discussion from here: https://twitter.com/i/status/1234135064323280897 and https://twitter.com/hielkedv/status/1250445647565729793

Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a performance boost compared to epoll (https://twitter.com/hielkedv/status/1234135064323280897). However, we need some way to manage 'in-flight' buffers, and IOSQE_BUFFER_SELECT is a solution for this.

After a buffer has been used, it can be re-registered (with an IORING_OP_PROVIDE_BUFFERS SQE) under its buffer ID (BID), so that IOSQE_BUFFER_SELECT can hand it out again. We can also initially register a range of buffers, e.g. BIDs 0-1000. When buffer registration for this range completes, it results in a single CQE.

However, because (network) events complete in a fairly random order, we cannot re-register a whole range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready to be reused, but the rest of the buffers are still in flight. So in each iteration of the event loop we need to re-register a single buffer, which results in one additional CQE for each event. The number of CQEs to be handled in the event loop thus doubles. If you're dealing with 200k requests per second, this can mean a considerable performance loss.
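
To make that extra completion concrete, the per-event pattern looks roughly like this in liburing terms (just a sketch; ring, sock_fd, bufs, BUF_SIZE, BGID and the completed cqe are assumed from the surrounding setup and loop):

struct io_uring_sqe *sqe;

/* receive into a kernel-selected buffer from group BGID */
sqe = io_uring_get_sqe(&ring);
io_uring_prep_recv(sqe, sock_fd, NULL, BUF_SIZE, 0);
sqe->flags |= IOSQE_BUFFER_SELECT;
sqe->buf_group = BGID;

/* later, when the recv CQE arrives, the consumed BID is encoded in cqe->flags */
int bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
/* ... process bufs[bid] ... */

/* hand this one buffer back: one extra SQE now, one extra CQE later */
sqe = io_uring_get_sqe(&ring);
io_uring_prep_provide_buffers(sqe, bufs[bid], BUF_SIZE, 1, BGID, bid);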

If it were possible to re-register multiple buffers by e.g. chaining multiple SQEs that result in a single CQE, we could save many event loop iterations and increase the performance of the event loop.
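
As a sketch of what the submission side of this suggestion could look like (liburing-style again, reusing the names from the sketch above): re-provide the scattered BIDs as one linked chain. With links as they work today this would still post one CQE per SQE; with the suggestion it would post a single CQE for the whole chain.

int bids[] = { 3, 7, 39, 420 };
int n = 4;

for (int i = 0; i < n; i++) {
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    io_uring_prep_provide_buffers(sqe, bufs[bids[i]], BUF_SIZE, 1, BGID, bids[i]);
    if (i < n - 1)
        sqe->flags |= IOSQE_IO_LINK; /* chain to the next SQE */
}
io_uring_submit(&ring);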


Regards,
Hielke de Vries


* Re: Suggestion: chain SQEs - single CQE for N chained SQEs
  2020-04-18 12:49 Suggestion: chain SQEs - single CQE for N chained SQEs H. de Vries
@ 2020-04-18 13:50 ` Pavel Begunkov
  2020-04-18 18:18   ` H. de Vries
  0 siblings, 1 reply; 4+ messages in thread
From: Pavel Begunkov @ 2020-04-18 13:50 UTC (permalink / raw)
  To: H. de Vries, io-uring; +Cc: axboe

On 4/18/2020 3:49 PM, H. de Vries wrote:
> Hi,
> 
> Following up on the discussion from here: https://twitter.com/i/status/1234135064323280897 and https://twitter.com/hielkedv/status/1250445647565729793
> 
> Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a performance boost compared to epoll (https://twitter.com/hielkedv/status/1234135064323280897). However, we need some way to manage 'in-flight' buffers, and IOSQE_BUFFER_SELECT is a solution for this.
> 
> After a buffer has been used, it can be re-registered (with an IORING_OP_PROVIDE_BUFFERS SQE) under its buffer ID (BID), so that IOSQE_BUFFER_SELECT can hand it out again. We can also initially register a range of buffers, e.g. BIDs 0-1000. When buffer registration for this range completes, it results in a single CQE.
> 
> However, because (network) events complete in a fairly random order, we cannot re-register a whole range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready to be reused, but the rest of the buffers are still in flight. So in each iteration of the event loop we need to re-register a single buffer, which results in one additional CQE for each event. The number of CQEs to be handled in the event loop thus doubles. If you're dealing with 200k requests per second, this can mean a considerable performance loss.
> 
> If it were possible to re-register multiple buffers by e.g. chaining multiple SQEs that result in a single CQE, we could save many event loop iterations and increase the performance of the event loop.

I've played with the idea before [1]: it always returns only one CQE per
link (for the last request on success, or for the failed one otherwise).
Looks like what you're suggesting. Is that so? As for me, it's just
simpler to deal with links on the user side.

It's actually on my TODO for 5.8, but it depends on some changes for
sequences/drains/timeouts that we'll hopefully push soon. We just need
to be careful not to e.g. lose CQEs with BIDs for IOSQE_BUFFER_SELECT
requests.

[1]
https://lore.kernel.org/io-uring/[email protected]/

-- 
Pavel Begunkov


* Re: Suggestion: chain SQEs - single CQE for N chained SQEs
  2020-04-18 13:50 ` Pavel Begunkov
@ 2020-04-18 18:18   ` H. de Vries
  2020-04-19 10:43     ` Pavel Begunkov
  0 siblings, 1 reply; 4+ messages in thread
From: H. de Vries @ 2020-04-18 18:18 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe

Hi Pavel,

Yes, [1] is what I mean. In an event loop, every CQE is handled by a new iteration of the loop; this is the "expensive" part. Fewer CQEs, fewer iterations. It is nice to see possible kernel-side performance gains [2] as well, but I suggested this specifically for the event loop case.
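
For context, the loop shape I have in mind is roughly the following (liburing-style sketch; handle_event() just stands in for the application's dispatch):

struct io_uring_cqe *cqe;
unsigned head, count;

for (;;) {
    io_uring_submit_and_wait(&ring, 1);

    count = 0;
    io_uring_for_each_cqe(&ring, head, cqe) {
        handle_event(cqe); /* one iteration of work per CQE */
        count++;
    }
    io_uring_cq_advance(&ring, count);
}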

Can you elaborate on “handling links from the user side”? 

[2] 
https://lore.kernel.org/io-uring/[email protected]/#t

--
Hielke de Vries


On Sat, Apr 18, 2020, at 15:50, Pavel Begunkov wrote:
> On 4/18/2020 3:49 PM, H. de Vries wrote:
> > Hi,
> > 
> > Following up on the discussion from here: https://twitter.com/i/status/1234135064323280897 and https://twitter.com/hielkedv/status/1250445647565729793
> > 
> > Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a performance boost compared to epoll (https://twitter.com/hielkedv/status/1234135064323280897). However, we need some way to manage 'in-flight' buffers, and IOSQE_BUFFER_SELECT is a solution for this.
> > 
> > After a buffer has been used, it can be re-registered (with an IORING_OP_PROVIDE_BUFFERS SQE) under its buffer ID (BID), so that IOSQE_BUFFER_SELECT can hand it out again. We can also initially register a range of buffers, e.g. BIDs 0-1000. When buffer registration for this range completes, it results in a single CQE.
> > 
> > However, because (network) events complete in a fairly random order, we cannot re-register a whole range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready to be reused, but the rest of the buffers are still in flight. So in each iteration of the event loop we need to re-register a single buffer, which results in one additional CQE for each event. The number of CQEs to be handled in the event loop thus doubles. If you're dealing with 200k requests per second, this can mean a considerable performance loss.
> > 
> > If it were possible to re-register multiple buffers by e.g. chaining multiple SQEs that result in a single CQE, we could save many event loop iterations and increase the performance of the event loop.
> 
> I've played with the idea before [1]: it always returns only one CQE per
> link (for the last request on success, or for the failed one otherwise).
> Looks like what you're suggesting. Is that so? As for me, it's just
> simpler to deal with links on the user side.
> 
> It's actually on my TODO for 5.8, but it depends on some changes for
> sequences/drains/timeouts that we'll hopefully push soon. We just need
> to be careful not to e.g. lose CQEs with BIDs for IOSQE_BUFFER_SELECT
> requests.
> 
> [1]
> https://lore.kernel.org/io-uring/[email protected]/
> 
> -- 
> Pavel Begunkov
>


* Re: Suggestion: chain SQEs - single CQE for N chained SQEs
  2020-04-18 18:18   ` H. de Vries
@ 2020-04-19 10:43     ` Pavel Begunkov
  0 siblings, 0 replies; 4+ messages in thread
From: Pavel Begunkov @ 2020-04-19 10:43 UTC (permalink / raw)
  To: H. de Vries, io-uring; +Cc: axboe

On 4/18/2020 9:18 PM, H. de Vries wrote:
> Hi Pavel,
> 
> Yes, [1] is what I mean. In an event loop, every CQE is handled by a new iteration of the loop; this is the "expensive" part. Fewer CQEs, fewer iterations. It is nice to see possible kernel-side performance gains [2] as well, but I suggested this specifically for the event loop case.
> 
> Can you elaborate on “handling links from the user side”? 

Long story short, failure recovery and tracking of links in userspace
would be easier with 1 CQE per link.

TL;DR:
Applications usually want to perform some action, which is represented
by an ordered (linked) set of requests, and it should be common to have
a code structure similar to the following.

E.g. cqe->user_data points to a struct request, and the requests are
kept in a list, possibly with request->action pointing to a common
"action" struct instance tracking the current stage (i.e. a state
machine), etc. With that you can do failure recovery (e.g. re-submitting
failed requests), rollback, etc. That's especially useful for high-level
libraries.
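
Roughly (just a sketch; the field names are chosen only to line up with
the snippet further down, and a kernel-style list_head is used for brevity):

struct action {
    int              stage;              /* how far the state machine got */
    struct list_head request_list_head;  /* requests belonging to this action */
};

struct request {
    struct list_head list;    /* entry in action->request_list_head */
    struct action    *action; /* the common action this request is part of */
    int              num;     /* position of the request within the link */
};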


And now let's see what an application should consider in case of a
failure. I'll use the following example:
SQ: req_n, (linked) req0 -> req1 -> req2

1. It should reap the failure event + all -ECANCELED ones, and they can
lie in the CQ not sequentially, but with other events in between.
e.g. CQ: req0(failed), req_n, req1(-ECANCELED), req2(-ECANCELED)

2. CQEs can get there out of order (only when a request failed during
submission).
e.g. CQ: req2(failed), req0(-ECANCELED), req1(-ECANCELED)

3. io_uring may not have consumed all SQEs of the link, so the
application needs to do some cleanup there as well.
e.g. CQ: req0(failed); SQ after submit: req1 -> req2


It's just hell to handle all of this right. I was lifting these issues
with recent patches (and one yet stashed), but still, with the feature
it could be as simple as:

struct request *req = (struct request *)(uintptr_t)cqe->user_data;
struct action *act = req->action;

/* complete every earlier request of the chain, up to the one this CQE is for */
while (act->stage != req->num) {
    complete_and_remove_req(&act->request_list_head);
    act->stage++;
}

> [2] 
> https://lore.kernel.org/io-uring/[email protected]/#t
> 
> --
> Hielke de Vries
> 
> 
> On Sat, Apr 18, 2020, at 15:50, Pavel Begunkov wrote:
>> On 4/18/2020 3:49 PM, H. de Vries wrote:
>>> Hi,
>>>
>>> Following up on the discussion from here: https://twitter.com/i/status/1234135064323280897 and https://twitter.com/hielkedv/status/1250445647565729793
>>>
>>> Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a performance boost compared to epoll (https://twitter.com/hielkedv/status/1234135064323280897). However, we need some way to manage 'in-flight' buffers, and IOSQE_BUFFER_SELECT is a solution for this.
>>>
>>> After a buffer has been used, it can be re-registered (with an IORING_OP_PROVIDE_BUFFERS SQE) under its buffer ID (BID), so that IOSQE_BUFFER_SELECT can hand it out again. We can also initially register a range of buffers, e.g. BIDs 0-1000. When buffer registration for this range completes, it results in a single CQE.
>>>
>>> However, because (network) events complete in a fairly random order, we cannot re-register a whole range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready to be reused, but the rest of the buffers are still in flight. So in each iteration of the event loop we need to re-register a single buffer, which results in one additional CQE for each event. The number of CQEs to be handled in the event loop thus doubles. If you're dealing with 200k requests per second, this can mean a considerable performance loss.
>>>
>>> If it were possible to re-register multiple buffers by e.g. chaining multiple SQEs that result in a single CQE, we could save many event loop iterations and increase the performance of the event loop.
>>
>> I've played with the idea before [1]: it always returns only one CQE per
>> link (for the last request on success, or for the failed one otherwise).
>> Looks like what you're suggesting. Is that so? As for me, it's just
>> simpler to deal with links on the user side.
>>
>> It's actually on my TODO for 5.8, but it depends on some changes for
>> sequences/drains/timeouts that we'll hopefully push soon. We just need
>> to be careful not to e.g. lose CQEs with BIDs for IOSQE_BUFFER_SELECT
>> requests.
>>
>> [1]
>> https://lore.kernel.org/io-uring/[email protected]/
>>
>> -- 
>> Pavel Begunkov
>>

-- 
Pavel Begunkov

