* Suggestion: chain SQEs - single CQE for N chained SQEs
From: H. de Vries @ 2020-04-18 12:49 UTC (permalink / raw)
To: io-uring; +Cc: axboe

Hi,

Following up on the discussion from here:
https://twitter.com/i/status/1234135064323280897 and
https://twitter.com/hielkedv/status/1250445647565729793

Using io_uring in event loops with IORING_FEAT_FAST POLL can give a
performance boost compared to epoll
(https://twitter.com/hielkedv/status/1234135064323280897). However, we
need some way to manage 'in-flight' buffers, and IOSQE_BUFFER_SELECT is
a solution for this.

After a buffer has been used, it can be re-registered for
IOSQE_BUFFER_SELECT by giving it a buffer ID (BID). We can also
initially register a range of buffers, with e.g. BIDs 0-1000. When
buffer registration for this range completes, it results in a single
CQE.

However, because (network) events complete in random order, we cannot
re-register a range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready
to be reused, while the rest of the buffers are still in flight. So in
each iteration of the event loop we need to re-register a single
buffer, which results in one additional CQE per event. The number of
CQEs to be handled in the event loop thus doubles. If you're dealing
with 200k requests per second, this can add up to quite some
performance loss.

If it were possible to register multiple buffers by e.g. chaining
multiple SQEs that result in a single CQE, we could save many event
loop iterations and increase the performance of the event loop.

Regards,
Hielke de Vries
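[Editor's note: to put a number on the doubling described above, here is
a tiny self-contained C sketch — plain arithmetic, no io_uring calls;
cqes_today/cqes_chained are made-up names — comparing per-buffer
re-registration against a hypothetical chain-to-one-CQE feature.]

```c
/* CQE accounting for a second in which `n` network events complete.
 * Today: every completed event needs its own buffer re-registration
 * SQE, which completes as one extra CQE.
 * Proposed: a whole chain of re-registration SQEs completes as a
 * single CQE per batch. */
static long cqes_today(long n)   { return n + n; } /* event + re-register */
static long cqes_chained(long n) { return n + 1; } /* events + one chain CQE */
```

At the 200k requests per second mentioned above, that is 400000 versus
200001 CQEs to reap per second.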
* Re: Suggestion: chain SQEs - single CQE for N chained SQEs
From: Pavel Begunkov @ 2020-04-18 13:50 UTC (permalink / raw)
To: H. de Vries, io-uring; +Cc: axboe

On 4/18/2020 3:49 PM, H. de Vries wrote:
> Hi,
>
> Following up on the discussion from here:
> https://twitter.com/i/status/1234135064323280897 and
> https://twitter.com/hielkedv/status/1250445647565729793
>
> Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a
> performance boost compared to epoll
> (https://twitter.com/hielkedv/status/1234135064323280897). However,
> we need some way to manage 'in-flight' buffers, and
> IOSQE_BUFFER_SELECT is a solution for this.
>
> After a buffer has been used, it can be re-registered for
> IOSQE_BUFFER_SELECT by giving it a buffer ID (BID). We can also
> initially register a range of buffers, with e.g. BIDs 0-1000. When
> buffer registration for this range completes, it results in a single
> CQE.
>
> However, because (network) events complete in random order, we cannot
> re-register a range of buffers. Maybe BIDs 3, 7, 39 and 420 are ready
> to be reused, while the rest of the buffers are still in flight. So
> in each iteration of the event loop we need to re-register a single
> buffer, which results in one additional CQE per event. The number of
> CQEs to be handled in the event loop thus doubles. If you're dealing
> with 200k requests per second, this can add up to quite some
> performance loss.
>
> If it were possible to register multiple buffers by e.g. chaining
> multiple SQEs that result in a single CQE, we could save many event
> loop iterations and increase the performance of the event loop.

I've played with the idea before [1]; it always returns only one CQE
per link (for the last request on success, or for the failed one
otherwise). Looks like what you're suggesting. Is that so? As for me,
it's just simpler to deal with links on the user side.

It's actually in my TODO for 5.8, but depends on some changes for
sequences/drains/timeouts, which hopefully we'll push soon. We just
need to be careful to e.g. not lose CQEs with BIDs for
IOSQE_BUFFER_SELECT requests.

[1] https://lore.kernel.org/io-uring/[email protected]/

-- 
Pavel Begunkov
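[Editor's note: the one-CQE-per-link semantics Pavel describes can be
modeled in userspace-only C — hypothetical req/cqe structs, no kernel
involved: walk the chain in order, stop at the first failure, and post
a single CQE carrying either the last result or the failing one.]

```c
#include <stddef.h>

struct req { int res; };             /* result each request would produce */
struct cqe { int res; size_t idx; }; /* the single CQE posted for the chain */

/* Execute a chain of n linked requests and return the one CQE:
 * on success, the last request's result; on failure, the failing
 * request's result (the remainder of the chain is cancelled). */
static struct cqe run_chain(const struct req *chain, size_t n)
{
    struct cqe c = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        c.res = chain[i].res;
        c.idx = i;
        if (chain[i].res < 0)  /* failure cancels the rest of the chain */
            break;
    }
    return c;
}
```

For a fully successful chain the event loop sees one completion instead
of n, which is exactly the saving the proposal is after.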
* Re: Suggestion: chain SQEs - single CQE for N chained SQEs
From: H. de Vries @ 2020-04-18 18:18 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: axboe

Hi Pavel,

Yes, [1] is what I mean. In an event loop every CQE is handled by a new
iteration of the loop; this is the "expensive" part. Fewer CQEs, fewer
iterations. It is nice to see possible kernel performance gains [2] as
well, but I suggested this specifically for the case of event loops.

Can you elaborate on "handling links from the user side"?

[2] https://lore.kernel.org/io-uring/[email protected]/#t

-- 
Hielke de Vries


On Sat, Apr 18, 2020, at 15:50, Pavel Begunkov wrote:
> On 4/18/2020 3:49 PM, H. de Vries wrote:
> > Hi,
> >
> > Following up on the discussion from here:
> > https://twitter.com/i/status/1234135064323280897 and
> > https://twitter.com/hielkedv/status/1250445647565729793
> >
> > Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a
> > performance boost compared to epoll
> > (https://twitter.com/hielkedv/status/1234135064323280897). However,
> > we need some way to manage 'in-flight' buffers, and
> > IOSQE_BUFFER_SELECT is a solution for this.
> >
> > After a buffer has been used, it can be re-registered for
> > IOSQE_BUFFER_SELECT by giving it a buffer ID (BID). We can also
> > initially register a range of buffers, with e.g. BIDs 0-1000. When
> > buffer registration for this range completes, it results in a
> > single CQE.
> >
> > However, because (network) events complete in random order, we
> > cannot re-register a range of buffers. Maybe BIDs 3, 7, 39 and 420
> > are ready to be reused, while the rest of the buffers are still in
> > flight. So in each iteration of the event loop we need to
> > re-register a single buffer, which results in one additional CQE
> > per event. The number of CQEs to be handled in the event loop thus
> > doubles. If you're dealing with 200k requests per second, this can
> > add up to quite some performance loss.
> >
> > If it were possible to register multiple buffers by e.g. chaining
> > multiple SQEs that result in a single CQE, we could save many event
> > loop iterations and increase the performance of the event loop.
>
> I've played with the idea before [1]; it always returns only one CQE
> per link (for the last request on success, or for the failed one
> otherwise). Looks like what you're suggesting. Is that so? As for me,
> it's just simpler to deal with links on the user side.
>
> It's actually in my TODO for 5.8, but depends on some changes for
> sequences/drains/timeouts, which hopefully we'll push soon. We just
> need to be careful to e.g. not lose CQEs with BIDs for
> IOSQE_BUFFER_SELECT requests.
>
> [1]
> https://lore.kernel.org/io-uring/[email protected]/
>
> -- 
> Pavel Begunkov
* Re: Suggestion: chain SQEs - single CQE for N chained SQEs
From: Pavel Begunkov @ 2020-04-19 10:43 UTC (permalink / raw)
To: H. de Vries, io-uring; +Cc: axboe

On 4/18/2020 9:18 PM, H. de Vries wrote:
> Hi Pavel,
>
> Yes, [1] is what I mean. In an event loop every CQE is handled by a
> new iteration of the loop; this is the "expensive" part. Fewer CQEs,
> fewer iterations. It is nice to see possible kernel performance gains
> [2] as well, but I suggested this specifically for the case of event
> loops.
>
> Can you elaborate on "handling links from the user side"?

Long story short, fail recovery and tracking of links in userspace
would be easier with one CQE per link.

TL;DR:

Applications usually want to perform some action, which is represented
by an ordered (linked) set of requests. It should be common to have a
similar code structure: e.g. cqe->user_data points to a struct request,
which is kept in a list, possibly with request->action pointing to a
common "action" struct instance tracking the current stage (i.e. a
state machine), etc. With that you can do fail recovery (e.g.
re-submitting failed requests), rollback, etc. That's especially useful
for high-level libraries.

Now let's see what an application has to consider in case of a failure.
I'll use the following example:

SQ: req_n, (linked) req0 -> req1 -> req2

1. It should reap the failure event plus all the -ECANCELED ones, and
   they can lie in the CQ not sequentially, but with other events in
   between, e.g.
   CQ: req0(failed), req_n, req1(-ECANCELED), req2(-ECANCELED)

2. CQEs can get there out of order (only when failed during
   submission), e.g.
   CQ: req2(failed), req0(-ECANCELED), req1(-ECANCELED)

3. io_uring may not have consumed all SQEs of the link, so the
   application needs to do some cleanup there as well, e.g.
   CQ: req0(failed), SQ after submit: req1 -> req2

It's just hell to handle it right.
I was lifting these issues with recent patches (and one yet stashed),
but still, with the feature it could be as simple as:

    req = cqe->user_data;
    act = req->action;
    while (act->stage != req->num) {
        complete_and_remove_req(&act->request_list_head);
        act->stage++;
    }

> [2]
> https://lore.kernel.org/io-uring/[email protected]/#t
>
> -- 
> Hielke de Vries
>
>
> On Sat, Apr 18, 2020, at 15:50, Pavel Begunkov wrote:
>> On 4/18/2020 3:49 PM, H. de Vries wrote:
>>> Hi,
>>>
>>> Following up on the discussion from here:
>>> https://twitter.com/i/status/1234135064323280897 and
>>> https://twitter.com/hielkedv/status/1250445647565729793
>>>
>>> Using io_uring in event loops with IORING_FEAT_FAST_POLL can give a
>>> performance boost compared to epoll
>>> (https://twitter.com/hielkedv/status/1234135064323280897). However,
>>> we need some way to manage 'in-flight' buffers, and
>>> IOSQE_BUFFER_SELECT is a solution for this.
>>>
>>> After a buffer has been used, it can be re-registered for
>>> IOSQE_BUFFER_SELECT by giving it a buffer ID (BID). We can also
>>> initially register a range of buffers, with e.g. BIDs 0-1000. When
>>> buffer registration for this range completes, it results in a
>>> single CQE.
>>>
>>> However, because (network) events complete in random order, we
>>> cannot re-register a range of buffers. Maybe BIDs 3, 7, 39 and 420
>>> are ready to be reused, while the rest of the buffers are still in
>>> flight. So in each iteration of the event loop we need to
>>> re-register a single buffer, which results in one additional CQE
>>> per event. The number of CQEs to be handled in the event loop thus
>>> doubles. If you're dealing with 200k requests per second, this can
>>> add up to quite some performance loss.
>>>
>>> If it were possible to register multiple buffers by e.g. chaining
>>> multiple SQEs that result in a single CQE, we could save many event
>>> loop iterations and increase the performance of the event loop.
>>
>> I've played with the idea before [1]; it always returns only one CQE
>> per link (for the last request on success, or for the failed one
>> otherwise). Looks like what you're suggesting. Is that so? As for
>> me, it's just simpler to deal with links on the user side.
>>
>> It's actually in my TODO for 5.8, but depends on some changes for
>> sequences/drains/timeouts, which hopefully we'll push soon. We just
>> need to be careful to e.g. not lose CQEs with BIDs for
>> IOSQE_BUFFER_SELECT requests.
>>
>> [1]
>> https://lore.kernel.org/io-uring/[email protected]/
>>
>> -- 
>> Pavel Begunkov

-- 
Pavel Begunkov
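[Editor's note: the fragment above is pseudocode; a self-contained C
sketch of the same state-machine bookkeeping — all names hypothetical,
the list removal reduced to a counter — could look like this.]

```c
struct action {
    int stage;   /* number of requests in the link retired so far */
    int n_reqs;  /* total length of the link */
};

struct request {
    struct action *act;
    int num;     /* this request's position in the link */
};

/* With one CQE per link, a single completion for `req` retires every
 * not-yet-completed request up to and including position req->num; a
 * real implementation would also unlink each request from the action's
 * request list at every step. */
static void complete_link(struct request *req)
{
    struct action *act = req->act;
    while (act->stage <= req->num)
        act->stage++;
}
```

One completion handler invocation then replaces the per-request CQE
reaping in the event loop.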