public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Ming Lei <[email protected]>, [email protected]
Cc: [email protected],
	Pavel Begunkov <[email protected]>,
	Kevin Wolf <[email protected]>
Subject: Re: [PATCH 5/9] io_uring: support SQE group
Date: Mon, 22 Apr 2024 12:27:28 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 4/7/24 7:03 PM, Ming Lei wrote:
> SQE group is defined as one chain of SQEs starting with the first sqe that
> has IOSQE_EXT_SQE_GROUP set, and ending with the first subsequent sqe that
> doesn't have it set, and it is similar with chain of linked sqes.
> 
> The 1st SQE is group leader, and the other SQEs are group member. The group
> leader is always freed after all members are completed. Group members
> aren't submitted until the group leader is completed, and there isn't any
> dependency among group members, and IOSQE_IO_LINK can't be set for group
> members, same with IOSQE_IO_DRAIN.
> 
> Typically the group leader provides or makes resource, and the other members
> consume the resource, such as scenario of multiple backup, the 1st SQE is to
> read data from source file into fixed buffer, the other SQEs write data from
> the same buffer into other destination files. SQE group provides very
> efficient way to complete this task: 1) fs write SQEs and fs read SQE can be
> submitted in single syscall, no need to submit fs read SQE first, and wait
> until read SQE is completed, 2) no need to link all write SQEs together, then
> write SQEs can be submitted to files concurrently. Meantime application is
> simplified a lot in this way.
> 
> Another use case is to for supporting generic device zero copy:
> 
> - the lead SQE is for providing device buffer, which is owned by device or
>   kernel, can't be cross userspace, otherwise easy to cause leak for devil
>   application or panic
> 
> - member SQEs reads or writes concurrently against the buffer provided by lead
>   SQE

In concept, this looks very similar to "sqe bundles" that I played with
in the past:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-bundle

Didn't look too closely yet at the implementation, but in spirit it's
about the same in that the first entry is processed first, and there's
no ordering implied between the test of the members of the bundle /
group.

I do think that's a flexible thing to support, particularly if:

1) We can do it more efficiently than links, which are pretty horrible.
2) It enables new worthwhile use cases
3) It's done cleanly 
4) It's easily understandable and easy to document, so that users will
   actually understand what this is and what use cases it enable. Part
   of that is actually naming, it should be readily apparent what a
   group is, what the lead is, and what the members are. Using your
   terminology here, definitely worth spending some time on that to get
   it just right and self evident.

-- 
Jens Axboe


  reply	other threads:[~2024-04-22 18:27 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-08  1:03 [RFC PATCH 0/9] io_uring: support sqe group and provide group kbuf Ming Lei
2024-04-08  1:03 ` [PATCH 1/9] io_uring: net: don't check sqe->__pad2[0] for send zc Ming Lei
2024-04-08  1:03 ` [PATCH 2/9] io_uring: support user sqe ext flags Ming Lei
2024-04-22 18:16   ` Jens Axboe
2024-04-23 13:57     ` Ming Lei
2024-04-29 15:24       ` Pavel Begunkov
2024-04-30  3:43         ` Ming Lei
2024-04-30 12:00           ` Pavel Begunkov
2024-04-30 12:56             ` Ming Lei
2024-04-30 14:10               ` Pavel Begunkov
2024-04-30 15:46                 ` Ming Lei
2024-05-02 14:22                   ` Pavel Begunkov
2024-05-04  1:19                     ` Ming Lei
2024-04-08  1:03 ` [PATCH 3/9] io_uring: add helper for filling cqes in __io_submit_flush_completions() Ming Lei
2024-04-08  1:03 ` [PATCH 4/9] io_uring: add one output argument to io_submit_sqe Ming Lei
2024-04-08  1:03 ` [PATCH 5/9] io_uring: support SQE group Ming Lei
2024-04-22 18:27   ` Jens Axboe [this message]
2024-04-23 13:08     ` Kevin Wolf
2024-04-24  1:39       ` Ming Lei
2024-04-25  9:27         ` Kevin Wolf
2024-04-26  7:53           ` Ming Lei
2024-04-26 17:05             ` Kevin Wolf
2024-04-29  3:34               ` Ming Lei
2024-04-29 15:48         ` Pavel Begunkov
2024-04-30  3:07           ` Ming Lei
2024-04-29 15:32       ` Pavel Begunkov
2024-04-30  3:03         ` Ming Lei
2024-04-30 12:27           ` Pavel Begunkov
2024-04-30 15:00             ` Ming Lei
2024-05-02 14:09               ` Pavel Begunkov
2024-05-04  1:56                 ` Ming Lei
2024-05-02 14:28               ` Pavel Begunkov
2024-04-24  0:46     ` Ming Lei
2024-04-08  1:03 ` [PATCH 6/9] io_uring: support providing sqe group buffer Ming Lei
2024-04-08  1:03 ` [PATCH 7/9] io_uring/uring_cmd: support provide group kernel buffer Ming Lei
2024-04-08  1:03 ` [PATCH 8/9] ublk: support provide io buffer Ming Lei
2024-04-08  1:03 ` [RFC PATCH 9/9] liburing: support sqe ext_flags & sqe group Ming Lei
2024-04-19  0:55 ` [RFC PATCH 0/9] io_uring: support sqe group and provide group kbuf Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox