From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>, Jens Axboe <[email protected]>,
[email protected]
Cc: [email protected], Kevin Wolf <[email protected]>
Subject: Re: [PATCH V3 5/9] io_uring: support SQE group
Date: Mon, 10 Jun 2024 02:55:22 +0100
Message-ID: <[email protected]>
In-Reply-To: <ZkwNxxUM7jqzpqgg@fedora>
On 5/21/24 03:58, Ming Lei wrote:
> On Sat, May 11, 2024 at 08:12:08AM +0800, Ming Lei wrote:
>> SQE group is defined as one chain of SQEs starting with the first SQE that
>> has IOSQE_SQE_GROUP set, and ending with the first subsequent SQE that
>> doesn't have it set, similar to a chain of linked SQEs.
>>
>> Unlike linked SQEs, where each SQE is issued only after the previous one
>> completes, all SQEs in one group are submitted in parallel, so there is no
>> dependency among SQEs in one group.
>>
>> The first SQE is the group leader, and the other SQEs are group members. The
>> whole group shares a single IOSQE_IO_LINK and IOSQE_IO_DRAIN from the group
>> leader, and the two flags are ignored for group members.
>>
>> When the group is in one link chain, this group isn't submitted until the
>> previous SQE or group is completed. And the following SQE or group can't
>> be started if this group isn't completed. Failure from any group member will
>> fail the group leader, then the link chain can be terminated.
>>
>> When IOSQE_IO_DRAIN is set for group leader, all requests in this group and
>> previous requests submitted are drained. Given IOSQE_IO_DRAIN can be set for
>> group leader only, we respect IO_DRAIN by always completing group leader as
>> the last one in the group.
>>
>> Working together with IOSQE_IO_LINK, SQE group provides a flexible way to
>> support N:M dependency, such as:
>>
>> - group A is chained with group B
>> - group A has N SQEs
>> - group B has M SQEs
>>
>> then M SQEs in group B depend on N SQEs in group A.
>>
>> N:M dependency can support some interesting use cases in an efficient way:
>>
>> 1) read from multiple files, then write the read data into a single file
>>
>> 2) read from a single file, and write the read data into multiple files
>>
>> 3) write the same data into multiple files, then read the data back from
>> those files and check that the correct data was written
>>
>> Also, IOSQE_SQE_GROUP takes the last bit in sqe->flags, but we can still
>> extend sqe->flags with one uring context flag, such as using __pad3 for
>> non-uring_cmd OPs and part of uring_cmd_flags for uring_cmd OP.
>>
>> Suggested-by: Kevin Wolf <[email protected]>
>> Signed-off-by: Ming Lei <[email protected]>
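A minimal sketch of the grouping rule described in the commit message above, written as plain C with no io_uring dependency. `SKETCH_SQE_GROUP`, `struct group_span` and `split_groups()` are hypothetical names used only for illustration and are not part of the patch. The bit value assumes that "the last bit in sqe->flags" means bit 7 of the 8-bit field, and the sketch assumes the SQE that closes a group belongs to it, mirroring how the last SQE of a link chain does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IOSQE_SQE_GROUP; assumed to be bit 7. */
#define SKETCH_SQE_GROUP (1u << 7)

struct group_span {
    size_t leader; /* index of the first SQE in the span (the group leader) */
    size_t count;  /* number of SQEs in the span, leader included */
};

/*
 * Split an array of per-SQE flag bytes into spans.
 * A group starts at the first SQE with the group flag set and ends,
 * inclusively, at the first subsequent SQE without it. An SQE without
 * the flag that is outside any group forms a span of one.
 * Returns the number of spans written, or -1 if 'out' is too small.
 */
static int split_groups(const unsigned char *flags, size_t n,
                        struct group_span *out, size_t out_cap)
{
    size_t i = 0;
    int spans = 0;

    assert(flags != NULL && out != NULL);
    while (i < n) {
        size_t start = i;

        /* Consume every SQE that still carries the group flag. */
        while (i < n && (flags[i] & SKETCH_SQE_GROUP))
            i++;
        /* The first SQE without the flag closes the group (or stands alone). */
        if (i < n)
            i++;
        if ((size_t)spans == out_cap)
            return -1;
        out[spans].leader = start;
        out[spans].count = i - start;
        spans++;
    }
    return spans;
}
```

With flags {G, G, 0, 0, G, 0}, where G is the group bit, this yields three spans: a group of 3 led by index 0, a standalone SQE at index 3, and a group of 2 led by index 4. How the kernel treats a group left open at the end of a submission is not modelled here; the sketch simply closes it at the end of the array.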
>
> BTW, I wrote a liburing example, link-grp-cp.c, which is based on sqe group.
> It keeps the QD unchanged and just re-organizes IOs in the following way:
>
> - each group has 4 READ IOs, linked to one single WRITE IO that writes
> the data read by the sqe group to the destination file
IIUC, it's comparing 1 large write request with 4 small ones, which is
not anything close to a fair comparison. You can do the same in
userspace (without links), and with control in userspace you can do
more fun tricks, like interleaving writes for one batch with reads
from the next batch.
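One way to picture the interleaving is as a submission plan, sketched here in plain C without any real io_uring calls. `struct step` and `plan_interleaved()` are hypothetical names for illustration only; the assumption is that userspace waits for a batch's reads to complete before submitting the round that carries that batch's write.

```c
#include <assert.h>
#include <stddef.h>

enum step_kind { STEP_READ, STEP_WRITE };

struct step {
    enum step_kind kind;
    size_t batch; /* batch this operation belongs to */
    size_t round; /* operations sharing a round are submitted together */
};

/*
 * Build a submission plan for 'nr_batches' batches, each made of
 * 'reads_per_batch' reads followed by one write of the data read.
 * Round r carries the write for batch r-1 (if any) together with the
 * reads for batch r (if any), so writes overlap with the next reads.
 * Returns the number of steps written, or -1 if 'plan' is too small.
 */
static int plan_interleaved(size_t nr_batches, size_t reads_per_batch,
                            struct step *plan, size_t cap)
{
    size_t used = 0;

    assert(plan != NULL);
    for (size_t round = 0; round <= nr_batches; round++) {
        if (round > 0) {
            /* Write out the previous batch, whose reads have completed. */
            if (used == cap)
                return -1;
            plan[used++] = (struct step){ STEP_WRITE, round - 1, round };
        }
        if (round < nr_batches) {
            /* Queue this batch's reads in the same round. */
            for (size_t i = 0; i < reads_per_batch; i++) {
                if (used == cap)
                    return -1;
                plan[used++] = (struct step){ STEP_READ, round, round };
            }
        }
    }
    return (int)used;
}
```

With 2 batches of 4 reads each, the plan has 10 steps over 3 rounds: round 0 carries the reads for batch 0, round 1 the write for batch 0 plus the reads for batch 1, and round 2 the final write.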
> - the first 12 groups have (4 + 1) IOs each, and the last group has
> (3 + 1) IOs
>
>
> Running the example to copy between two block devices (from virtio-blk to
> virtio-scsi in my test VM):
>
> 1) buffered copy:
> - perf is improved by 5%
>
> 2) direct IO mode
> - perf is improved by 27%
>
>
> [1] link-grp-cp.c example
>
> https://github.com/ming1/liburing/commits/sqe_group_v2/
>
>
> [2] one bug fix (top commit) against V3
>
> https://github.com/ming1/linux/commits/io_uring_sqe_group_v3/
>
>
>
> Thanks,
> Ming
>
--
Pavel Begunkov