From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>
Cc: Jens Axboe <[email protected]>,
	[email protected], [email protected],
	Kevin Wolf <[email protected]>
Subject: Re: [PATCH V3 5/9] io_uring: support SQE group
Date: Sun, 16 Jun 2024 19:14:37 +0100
Message-ID: <[email protected]>
In-Reply-To: <ZmhR3/TipsQI5OxN@fedora>

On 6/11/24 14:32, Ming Lei wrote:
> On Mon, Jun 10, 2024 at 02:55:22AM +0100, Pavel Begunkov wrote:
>> On 5/21/24 03:58, Ming Lei wrote:
>>> On Sat, May 11, 2024 at 08:12:08AM +0800, Ming Lei wrote:
>>>> SQE group is defined as one chain of SQEs starting with the first SQE that
>>>> has IOSQE_SQE_GROUP set, and ending with the first subsequent SQE that
>>>> doesn't have it set, and it is similar to a chain of linked SQEs.
>>>>
>>>> Unlike linked SQEs, where each SQE is issued only after the previous one
>>>> completes, all SQEs in one group are submitted in parallel, so there isn't
>>>> any dependency among SQEs in one group.
>>>>
>>>> The 1st SQE is the group leader, and the other SQEs are group members. The
>>>> whole group shares a single IOSQE_IO_LINK and IOSQE_IO_DRAIN from the group
>>>> leader, and the two flags are ignored for group members.
>>>>
>>>> When the group is in one link chain, this group isn't submitted until the
>>>> previous SQE or group is completed. And the following SQE or group can't
>>>> be started if this group isn't completed. Failure from any group member will
>>>> fail the group leader, then the link chain can be terminated.
>>>>
>>>> When IOSQE_IO_DRAIN is set for group leader, all requests in this group and
>>>> previous requests submitted are drained. Given IOSQE_IO_DRAIN can be set for
>>>> group leader only, we respect IO_DRAIN by always completing group leader as
>>>> the last one in the group.
>>>>
>>>> Working together with IOSQE_IO_LINK, SQE group provides a flexible way to
>>>> support N:M dependency, such as:
>>>>
>>>> - group A is chained with group B together
>>>> - group A has N SQEs
>>>> - group B has M SQEs
>>>>
>>>> then M SQEs in group B depend on N SQEs in group A.
>>>>
>>>> N:M dependency can support some interesting use cases in an efficient way:
>>>>
>>>> 1) read from multiple files, then write the read data into single file
>>>>
>>>> 2) read from single file, and write the read data into multiple files
>>>>
>>>> 3) write same data into multiple files, and read data from multiple files and
>>>> compare if correct data is written
>>>>
>>>> Also IOSQE_SQE_GROUP takes the last bit in sqe->flags, but we can still
>>>> extend sqe->flags with one uring context flag, such as using __pad3 for
>>>> non-uring_cmd OPs and part of uring_cmd_flags for uring_cmd OP.
>>>>
>>>> Suggested-by: Kevin Wolf <[email protected]>
>>>> Signed-off-by: Ming Lei <[email protected]>
>>>
>>> BTW, I wrote a liburing example, link-grp-cp.c, which is based on sqe group.
>>> It keeps QD unchanged and just reorganizes IOs in the following way:
>>>
>>> - each group has 4 READ IOs, linked to one single WRITE IO for writing
>>>     the read data in the sqe group to the destination file
>>
>> IIUC it's comparing 1 large write request with 4 small, and
> 
> It is actually reasonable from the storage device viewpoint: concurrent
> small READs are often faster than a single big READ, but concurrent small
> writes are usually slower.

It is, but that doesn't make the comparison apples to apples.
Even with what I described, even though it's better (same number
of syscalls but better parallelism, as you don't block the next
batch of reads by writes), you can argue it's not a
completely fair comparison either, since it needs a different
number of buffers, etc.

>> it's not exactly anything close to fair. And you can do same
>> in userspace (without links). And having control in userspace
> 
> No, you can't do it with single syscall.

That's called you _can_ do it. And syscalls are not everything,
context switching turned out to be a bigger problem, and executing
links does exactly that.

-- 
Pavel Begunkov
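
For readers following the thread, the group definition quoted above
(a group starts at the first SQE with IOSQE_SQE_GROUP set and ends at the
first subsequent SQE without it) can be sketched as a small classifier over
sqe->flags values. This is only an illustration of the rule as worded in the
commit message, not code from the patch: the bit value of IOSQE_SQE_GROUP
(bit 7, "the last bit in sqe->flags") is an assumption, and the names
sqe_role and classify_sqes() are hypothetical.

```c
#include <stddef.h>

/* Assumption: bit 7, i.e. the last remaining bit of the 8-bit sqe->flags. */
#define IOSQE_SQE_GROUP (1U << 7)
/* Same value as IOSQE_IO_LINK in the io_uring uapi header (bit 2). */
#define IOSQE_IO_LINK   (1U << 2)

/* Hypothetical roles, used only for this illustration. */
enum sqe_role { ROLE_SINGLE, ROLE_LEADER, ROLE_MEMBER };

/*
 * Walk the submitted SQE flags in order and assign a role to each SQE.
 * The first SQE with IOSQE_SQE_GROUP set opens a group and is its leader.
 * Every following SQE is a member; the first member without the flag is
 * still part of the group and closes it (same shape as a link chain).
 * Returns the number of groups found.
 */
static size_t classify_sqes(const unsigned char *flags, size_t n,
                            enum sqe_role *roles)
{
    size_t groups = 0;
    int in_group = 0;

    for (size_t i = 0; i < n; i++) {
        int has_flag = (flags[i] & IOSQE_SQE_GROUP) != 0;

        if (in_group) {
            roles[i] = ROLE_MEMBER;
            in_group = has_flag;    /* an SQE without the flag ends the group */
        } else if (has_flag) {
            roles[i] = ROLE_LEADER;
            in_group = 1;
            groups++;
        } else {
            roles[i] = ROLE_SINGLE;
        }
    }
    return groups;
}
```

With the layout described for link-grp-cp.c (4 READs in one group, linked to
one WRITE), the flags would be { GROUP|LINK, GROUP, GROUP, 0, 0 }: the first
READ is the leader and carries IOSQE_IO_LINK for the whole group, the next
three READs are members, and the WRITE is an ordinary SQE outside the group.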
