From: Pavel Begunkov <[email protected]>
To: "H. de Vries" <[email protected]>,
io-uring <[email protected]>
Cc: [email protected]
Subject: Re: Suggestion: chain SQEs - single CQE for N chained SQEs
Date: Sun, 19 Apr 2020 13:43:33 +0300
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 4/18/2020 9:18 PM, H. de Vries wrote:
> Hi Pavel,
>
> Yes, [1] is what I mean. In an event loop every CQE is handled by a new iteration of the loop, and that is the "expensive" part. Fewer CQEs, fewer iterations. It is nice to see possible kernel performance gains [2] as well, but I suggested this specifically for the event loop case.
>
> Can you elaborate on “handling links from the user side”?
Long story short, fail recovery and tracking of links in userspace
would be easier with 1 CQE per link.
The longer version:
Applications usually want to perform some action, which is represented by an
ordered (linked) set of requests, and it should be common to have a
similar code structure:
e.g. cqe->user_data points to a struct request, which is kept in a list,
possibly with request->action pointing to a common "action" struct
instance that tracks the current stage (i.e. a state machine), etc. With that
you can do fail recovery (e.g. re-submitting failed requests), rollback,
etc. That's especially useful for high-level libraries.
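For illustration, such a layout could look roughly like this (only a sketch
of the idea; the struct and field names are made up):

struct request {
        struct request *next;            /* link within the action's list */
        int num;                         /* position of this request in the link */
        struct action *action;           /* the state machine it belongs to */
};

struct action {
        struct request *request_list_head;   /* not yet completed, in submission order */
        int stage;                            /* number of stages already completed */
};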
And now let's see what an application should consider in case of a
failure. I'll use the following example:
SQ: req_n, (linked) req0 -> req1 -> req2
1. It should reap the failure event plus all the -ECANCELED ones, and they
may not lie sequentially in the CQ but be interleaved with other events.
e.g. CQ: req0(failed), req_n, req1(-ECANCELED), req2(-ECANCELED)
2. CQEs can land there out of order (only when the link failed during submission).
e.g. CQ: req2(failed), req0(-ECANCELED), req1(-ECANCELED)
3. io_uring may not have consumed all SQEs of the link, so the application
needs to do some cleanup on the SQ side as well.
e.g. CQ: req0(failed), SQ after submit: req1 -> req2
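With a CQE per request, the reaping side ends up with bookkeeping roughly
like the pseudo-C below (again just a sketch; it assumes a couple of extra
fields on the "action" struct, failed and unsubmitted, plus hypothetical
helpers):

/* every request of the link gets its own CQE, possibly interleaved with
 * unrelated completions and, on submission failure, possibly out of order */
void handle_cqe(struct io_uring_cqe *cqe)
{
        struct request *req = (struct request *)(uintptr_t)cqe->user_data;
        struct action *act = req->action;

        remove_req(&act->request_list_head, req);    /* no longer pending */
        if (cqe->res < 0)                            /* failed or -ECANCELED */
                act->failed = true;

        /* case 3: SQEs io_uring never consumed produce no CQE at all, so
         * the submit side has to account for them separately */
        if (!act->request_list_head && !act->unsubmitted) {
                if (act->failed)
                        rollback_action(act);
                else
                        complete_action(act);
        }
}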
It's just hell to handle it right. I've been lifting some of these issues
with recent patches and one that's still stashed, but with the proposed
feature it could be as simple as:
/* single CQE for the whole link: the request it points at tells how far the link got */
struct request *req = (struct request *)(uintptr_t)cqe->user_data;
struct action *act = req->action;

/* complete every stage up to the one reported by this CQE */
while (act->stage != req->num) {
        complete_and_remove_req(&act->request_list_head);
        act->stage++;
}
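That is, the single CQE (posted for the last request on success, or for the
failed one otherwise) is enough to walk the action's list once; there is no
matching of scattered -ECANCELED completions and no worrying about their
ordering.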
> [2]
> https://lore.kernel.org/io-uring/[email protected]/#t
>
> --
> Hielke de Vries
>
>
> On Sat, Apr 18, 2020, at 15:50, Pavel Begunkov wrote:
>> On 4/18/2020 3:49 PM, H. de Vries wrote:
>>> Hi,
>>>
>>> Following up on the discussion from here: https://twitter.com/i/status/1234135064323280897 and https://twitter.com/hielkedv/status/1250445647565729793
>>>
>>> Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a performance boost compared to epoll (https://twitter.com/hielkedv/status/1234135064323280897). However, we need some way to manage 'in-flight' buffers, and IOSQE_BUFFER_SELECT is a solution for this.
>>>
>>> After a buffer has been used, it can be re-registered for IOSQE_BUFFER_SELECT by giving it a buffer ID (BID). We can also initially register a range of buffers, with e.g. BIDs 0-1000. When buffer registration for this range is completed, it results in a single CQE.
>>>
>>> However, because (network) events complete quite randomly, we cannot re-register a range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready to be reused, but the rest of the buffers are still in flight. So in each iteration of the event loop we need to re-register the buffer, which results in one additional CQE for each event. The number of CQEs to be handled in the event loop then doubles. If you're dealing with 200k requests per second, this can result in quite some performance loss.
>>>
>>> If it were possible to register multiple buffers by e.g. chaining multiple SQEs that result in a single CQE, we could save many event loop iterations and increase the performance of the event loop.
>>
>> I've played with the idea before [1]; it always returns only one CQE per
>> link (for the last request on success, or for a failed one otherwise).
>> Looks like what you're suggesting. Is that so? As for me, it's just
>> simpler to deal with links on the user side.
>>
>> It's actually on my TODO list for 5.8, but it depends on some changes for
>> sequences/drains/timeouts that we'll hopefully push soon. We just need
>> to be careful, e.g. not to lose CQEs carrying BIDs for IOSQE_BUFFER_SELECT
>> requests.
>>
>> [1]
>> https://lore.kernel.org/io-uring/[email protected]/
>>
>> --
>> Pavel Begunkov
>>
--
Pavel Begunkov