From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>
Cc: Kevin Wolf <[email protected]>, Jens Axboe <[email protected]>,
[email protected], [email protected]
Subject: Re: [PATCH 5/9] io_uring: support SQE group
Date: Thu, 2 May 2024 15:28:01 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <ZjEHhRoGP8z4syuP@fedora>
On 4/30/24 16:00, Ming Lei wrote:
> On Tue, Apr 30, 2024 at 01:27:10PM +0100, Pavel Begunkov wrote:
...
>>>> And what does it achieve? The infra has matured since early days,
>>>> it saves user-kernel transitions at best but not context switching
>>>> overhead, and not even that if you do wait(1) and happen to catch
>>>> middle CQEs. And it disables LAZY_WAKE, so CQ side batching with
>>>> timers and what not is effectively useless with links.
>>>
>>> Not only the context switch, it supports 1:N or N:M dependency which
>>
>> I completely missed, how N:M is supported? That starting to sound
>> terrifying.
>
> N:M is actually from Kevin's idea.
>
> sqe group can be made to be more flexible by:
>
> Inside the group, all SQEs are submitted in parallel, so there isn't any
> dependency among SQEs in one group.
>
> The 1st SQE is group leader, and the other SQEs are group member. The whole
> group share single IOSQE_IO_LINK and IOSQE_IO_DRAIN from group leader, and
> the two flags can't be set for group members.
>
> When the group is in one link chain, this group isn't submitted until
> the previous SQE or group is completed. And the following SQE or group
> can't be started if this group isn't completed.
>
> When IOSQE_IO_DRAIN is set for group leader, all requests in this group
> and previous requests submitted are drained. Given IOSQE_IO_DRAIN can
> be set for group leader only, we respect IO_DRAIN for SQE group by
> always completing group leader as the last on in the group.
>
> SQE group provides flexible way to support N:M dependency, such as:
>
> - group A is chained with group B together by IOSQE_IO_LINK
> - group A has N SQEs
> - group B has M SQEs
>
> then M SQEs in group B depend on N SQEs in group A.
>
>
>>
>>> is missing in io_uring, but also makes async application easier to write by
>>> saving extra context switches, which just adds extra intermediate states for
>>> application.
>>
>> You're still executing requests (i.e. ->issue) primarily from the
>> submitter task context, they would still fly back to the task and
>> wake it up. You may save something by completing all of them
>> together via that refcounting, but you might just as well try to
>> batch CQ, which is a more generic issue. It's not clear what
>> context switches you save then.
>
> Wrt. the above N:M example, one io_uring_enter() is enough, and
> it can't be done in single context switch without sqe group, please
> see the liburing test code:
Do you mean doing all that in a single system call? The main
performance problem for io_uring is waiting, i.e. schedule()ing
the task out and in, that's what I meant by context switching.
--
Pavel Begunkov
next prev parent reply other threads:[~2024-05-02 14:27 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-08 1:03 [RFC PATCH 0/9] io_uring: support sqe group and provide group kbuf Ming Lei
2024-04-08 1:03 ` [PATCH 1/9] io_uring: net: don't check sqe->__pad2[0] for send zc Ming Lei
2024-04-08 1:03 ` [PATCH 2/9] io_uring: support user sqe ext flags Ming Lei
2024-04-22 18:16 ` Jens Axboe
2024-04-23 13:57 ` Ming Lei
2024-04-29 15:24 ` Pavel Begunkov
2024-04-30 3:43 ` Ming Lei
2024-04-30 12:00 ` Pavel Begunkov
2024-04-30 12:56 ` Ming Lei
2024-04-30 14:10 ` Pavel Begunkov
2024-04-30 15:46 ` Ming Lei
2024-05-02 14:22 ` Pavel Begunkov
2024-05-04 1:19 ` Ming Lei
2024-04-08 1:03 ` [PATCH 3/9] io_uring: add helper for filling cqes in __io_submit_flush_completions() Ming Lei
2024-04-08 1:03 ` [PATCH 4/9] io_uring: add one output argument to io_submit_sqe Ming Lei
2024-04-08 1:03 ` [PATCH 5/9] io_uring: support SQE group Ming Lei
2024-04-22 18:27 ` Jens Axboe
2024-04-23 13:08 ` Kevin Wolf
2024-04-24 1:39 ` Ming Lei
2024-04-25 9:27 ` Kevin Wolf
2024-04-26 7:53 ` Ming Lei
2024-04-26 17:05 ` Kevin Wolf
2024-04-29 3:34 ` Ming Lei
2024-04-29 15:48 ` Pavel Begunkov
2024-04-30 3:07 ` Ming Lei
2024-04-29 15:32 ` Pavel Begunkov
2024-04-30 3:03 ` Ming Lei
2024-04-30 12:27 ` Pavel Begunkov
2024-04-30 15:00 ` Ming Lei
2024-05-02 14:09 ` Pavel Begunkov
2024-05-04 1:56 ` Ming Lei
2024-05-02 14:28 ` Pavel Begunkov [this message]
2024-04-24 0:46 ` Ming Lei
2024-04-08 1:03 ` [PATCH 6/9] io_uring: support providing sqe group buffer Ming Lei
2024-04-08 1:03 ` [PATCH 7/9] io_uring/uring_cmd: support provide group kernel buffer Ming Lei
2024-04-08 1:03 ` [PATCH 8/9] ublk: support provide io buffer Ming Lei
2024-04-08 1:03 ` [RFC PATCH 9/9] liburing: support sqe ext_flags & sqe group Ming Lei
2024-04-19 0:55 ` [RFC PATCH 0/9] io_uring: support sqe group and provide group kbuf Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox