From: Ming Lei <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], Pavel Begunkov <[email protected]>,
[email protected],
Uday Shankar <[email protected]>,
Akilesh Kailash <[email protected]>,
Kevin Wolf <[email protected]>
Subject: Re: [PATCH V8 4/7] io_uring: support SQE group
Date: Tue, 29 Oct 2024 09:50:45 +0800
Message-ID: <ZyA_dbiU0ho5IJYA@fedora>
In-Reply-To: <[email protected]>
On Mon, Oct 28, 2024 at 06:12:34PM -0600, Jens Axboe wrote:
> On 10/25/24 6:22 AM, Ming Lei wrote:
> > An SQE group is defined as a chain of SQEs starting with the first SQE
> > that has IOSQE_SQE_GROUP set and ending with the first subsequent SQE
> > that doesn't have it set, similar to a chain of linked SQEs.
> >
> > Unlike linked SQEs, where each SQE is issued only after the previous
> > one completes, all SQEs in one group can be submitted in parallel. To
> > simplify the initial implementation, all members are queued after the
> > leader completes; this may change in the future so that the leader and
> > members are issued concurrently.
> >
> > The 1st SQE is the group leader, and the other SQEs are group members.
> > The whole group shares a single IOSQE_IO_LINK and IOSQE_IO_DRAIN taken
> > from the group leader, and the two flags can't be set for group
> > members. For the sake of simplicity, IORING_OP_LINK_TIMEOUT is
> > disallowed for SQE groups for now.
> >
> > When the group is part of a link chain, the group isn't submitted
> > until the previous SQE or group has completed, and the following SQE
> > or group can't start until this group has completed. A failure in any
> > group member fails the group leader, so the link chain can be
> > terminated.
> >
> > When IOSQE_IO_DRAIN is set on the group leader, all requests in this
> > group and all previously submitted requests are drained. Given that
> > IOSQE_IO_DRAIN can be set for the group leader only, we respect
> > IO_DRAIN by always completing the group leader as the last request in
> > the group. It is also natural, from the application's viewpoint, to
> > post the leader's CQE last.
> >
> > Working together with IOSQE_IO_LINK, SQE groups provide a flexible way
> > to support N:M dependencies, such as:
> >
> > - group A is chained to group B
> > - group A has N SQEs
> > - group B has M SQEs
> >
> > then the M SQEs in group B depend on the N SQEs in group A.
> >
> > An N:M dependency supports some interesting use cases efficiently:
> >
> > 1) read from multiple files, then write the read data into a single file
> >
> > 2) read from a single file, then write the read data into multiple files
> >
> > 3) write the same data into multiple files, then read from those files
> > and verify that the correct data was written
> >
> > Also, IOSQE_SQE_GROUP takes the last bit in sqe->flags, but we can
> > still extend sqe->flags with an io_uring context flag, for example by
> > using __pad3 for non-uring_cmd OPs and part of uring_cmd_flags for the
> > uring_cmd OP.
>
> Since it's taking the last flag, maybe it's a better idea to have the
> last flag mean "more flags in (for example) __pad3" and put the new
> flag there? Not sure what you mean in terms of "io_uring context flag"
> - would it be an enter flag? A ring required to be set up with a
> certain flag? Neither of those seems super encouraging, imho.
I meant:

If "more flags in __pad3" is enabled in the future, we may claim it as a
feature to userspace, such as IORING_FEAT_EXT_FLAG.

Will improve the above commit log.
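
To make the N:M example in the commit log concrete, here is a rough
liburing-style sketch of use case 1) (two reads grouped in parallel, then
one write once the whole group completes). IOSQE_SQE_GROUP is the new
flag from this patch; everything else is standard liburing, and the fds,
buffers and error handling are placeholders/omitted:

static void read_two_write_one(struct io_uring *ring,
			       int fd_a, int fd_b, int fd_out,
			       void *buf_a, void *buf_b,
			       void *buf_out, unsigned len)
{
	struct io_uring_sqe *sqe;

	/*
	 * Group leader: IOSQE_SQE_GROUP keeps the group open, and
	 * IOSQE_IO_LINK on the leader links the whole group to whatever
	 * is queued after it.
	 */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_read(sqe, fd_a, buf_a, len, 0);
	sqe->flags |= IOSQE_SQE_GROUP | IOSQE_IO_LINK;

	/* Last member: no IOSQE_SQE_GROUP, so it closes the group */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_read(sqe, fd_b, buf_b, len, 0);

	/* Issued only after both reads in the group have completed */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_write(sqe, fd_out, buf_out, len, 0);

	io_uring_submit(ring);
}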
>
> Apart from that, just a few minor nits below.
>
> > +void io_fail_group_members(struct io_kiocb *req)
> > +{
> > + struct io_kiocb *member = req->grp_link;
> > +
> > + while (member) {
> > + struct io_kiocb *next = member->grp_link;
> > +
> > + if (!(member->flags & REQ_F_FAIL)) {
> > + req_set_fail(member);
> > + io_req_set_res(member, -ECANCELED, 0);
> > + }
> > + member = next;
> > + }
> > +}
> > +
> > +static void io_queue_group_members(struct io_kiocb *req)
> > +{
> > + struct io_kiocb *member = req->grp_link;
> > +
> > + if (!member)
> > + return;
> > +
> > + req->grp_link = NULL;
> > + while (member) {
> > + struct io_kiocb *next = member->grp_link;
> > +
> > + member->grp_leader = req;
> > + if (unlikely(member->flags & REQ_F_FAIL)) {
> > + io_req_task_queue_fail(member, member->cqe.res);
> > + } else if (unlikely(req->flags & REQ_F_FAIL)) {
> > + io_req_task_queue_fail(member, -ECANCELED);
> > + } else {
> > + io_req_task_queue(member);
> > + }
> > + member = next;
> > + }
> > +}
>
> Was going to say don't check for !member, you have the while loop. Which
> is what you do in the helper above. You can also drop the parens in this
> one.
OK, will remove the `!member` check and the parens.
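
i.e. something like (just applying your two comments, behavior unchanged):

static void io_queue_group_members(struct io_kiocb *req)
{
	struct io_kiocb *member = req->grp_link;

	/* a NULL member list makes both statements below no-ops */
	req->grp_link = NULL;
	while (member) {
		struct io_kiocb *next = member->grp_link;

		member->grp_leader = req;
		if (unlikely(member->flags & REQ_F_FAIL))
			io_req_task_queue_fail(member, member->cqe.res);
		else if (unlikely(req->flags & REQ_F_FAIL))
			io_req_task_queue_fail(member, -ECANCELED);
		else
			io_req_task_queue(member);
		member = next;
	}
}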
>
> > +static enum group_mem io_prep_free_group_req(struct io_kiocb *req,
> > + struct io_kiocb **leader)
> > +{
> > + /*
> > + * Group completion is done, so clear the flag for avoiding double
> > + * handling in case of io-wq
> > + */
> > + req->flags &= ~REQ_F_SQE_GROUP;
> > +
> > + if (req_is_group_leader(req)) {
> > + /* Queue members now */
> > + if (req->grp_link)
> > + io_queue_group_members(req);
> > + return GROUP_LEADER;
> > + } else {
> > + if (!req_is_last_group_member(req))
> > + return GROUP_OTHER_MEMBER;
> > +
> > + /*
> > + * Prepare for freeing leader which can only be found from
> > + * the last member
> > + */
> > + *leader = req->grp_leader;
> > + (*leader)->flags &= ~REQ_F_SQE_GROUP_LEADER;
> > + req->grp_leader = NULL;
> > + return GROUP_LAST_MEMBER;
> > + }
> > +}
>
> Just drop the second indentation here.
OK.
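
With the second indentation dropped, it becomes:

static enum group_mem io_prep_free_group_req(struct io_kiocb *req,
					     struct io_kiocb **leader)
{
	/*
	 * Group completion is done, so clear the flag for avoiding double
	 * handling in case of io-wq
	 */
	req->flags &= ~REQ_F_SQE_GROUP;

	if (req_is_group_leader(req)) {
		/* Queue members now */
		if (req->grp_link)
			io_queue_group_members(req);
		return GROUP_LEADER;
	}

	if (!req_is_last_group_member(req))
		return GROUP_OTHER_MEMBER;

	/*
	 * Prepare for freeing the leader, which can only be found from
	 * the last member
	 */
	*leader = req->grp_leader;
	(*leader)->flags &= ~REQ_F_SQE_GROUP_LEADER;
	req->grp_leader = NULL;
	return GROUP_LAST_MEMBER;
}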
>
> > @@ -927,7 +1051,8 @@ static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
> > * Handle special CQ sync cases via task_work. DEFER_TASKRUN requires
> > * the submitter task context, IOPOLL protects with uring_lock.
> > */
> > - if (ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL)) {
> > + if (ctx->task_complete || (ctx->flags & IORING_SETUP_IOPOLL) ||
> > + (req->flags & REQ_F_SQE_GROUP)) {
> > req->io_task_work.func = io_req_task_complete;
> > io_req_task_work_add(req);
> > return;
>
> Minor detail, but might be nice with a REQ_F_* flag for this in the
> future.
>
> > @@ -1450,8 +1596,16 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
> > struct io_kiocb *req = container_of(node, struct io_kiocb,
> > comp_list);
> >
> > - if (!(req->flags & REQ_F_CQE_SKIP))
> > - io_req_commit_cqe(ctx, req);
> > + if (unlikely(req->flags & (REQ_F_CQE_SKIP | REQ_F_SQE_GROUP))) {
> > + if (req->flags & REQ_F_SQE_GROUP) {
> > + io_complete_group_req(req);
> > + continue;
> > + }
> > +
> > + if (req->flags & REQ_F_CQE_SKIP)
> > + continue;
> > + }
> > + io_req_commit_cqe(ctx, req);
> > }
> > __io_cq_unlock_post(ctx);
> >
> > @@ -1661,8 +1815,12 @@ static u32 io_get_sequence(struct io_kiocb *req)
> > struct io_kiocb *cur;
> >
> > /* need original cached_sq_head, but it was increased for each req */
> > - io_for_each_link(cur, req)
> > - seq--;
> > + io_for_each_link(cur, req) {
> > + if (req_is_group_leader(cur))
> > + seq -= cur->grp_refs;
> > + else
> > + seq--;
> > + }
> > return seq;
> > }
> >
> > @@ -2124,6 +2282,67 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
> > return def->prep(req, sqe);
> > }
> >
> > +static struct io_kiocb *io_group_sqe(struct io_submit_link *group,
> > + struct io_kiocb *req)
> > +{
> > + /*
> > + * Group chain is similar with link chain: starts with 1st sqe with
> > + * REQ_F_SQE_GROUP, and ends with the 1st sqe without REQ_F_SQE_GROUP
> > + */
> > + if (group->head) {
> > + struct io_kiocb *lead = group->head;
> > +
> > + /*
> > + * Members can't be in link chain, can't be drained, but
> > + * the whole group can be linked or drained by setting
> > + * flags on group leader.
> > + *
> > + * IOSQE_CQE_SKIP_SUCCESS can't be set for member
> > + * for the sake of simplicity
> > + */
> > + if (req->flags & (IO_REQ_LINK_FLAGS | REQ_F_IO_DRAIN |
> > + REQ_F_CQE_SKIP))
> > + req_fail_link_node(lead, -EINVAL);
> > +
> > + lead->grp_refs += 1;
> > + group->last->grp_link = req;
> > + group->last = req;
> > +
> > + if (req->flags & REQ_F_SQE_GROUP)
> > + return NULL;
> > +
> > + req->grp_link = NULL;
> > + req->flags |= REQ_F_SQE_GROUP;
> > + group->head = NULL;
> > +
> > + return lead;
> > + } else {
> > + if (WARN_ON_ONCE(!(req->flags & REQ_F_SQE_GROUP)))
> > + return req;
> > + group->head = req;
> > + group->last = req;
> > + req->grp_refs = 1;
> > + req->flags |= REQ_F_SQE_GROUP_LEADER;
> > + return NULL;
> > + }
> > +}
>
> Same here, drop the 2nd indentation.
OK.
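
Something like this, with the new-group case returning early (same logic
otherwise):

static struct io_kiocb *io_group_sqe(struct io_submit_link *group,
				     struct io_kiocb *req)
{
	struct io_kiocb *lead = group->head;

	/*
	 * A group chain is similar to a link chain: it starts with the 1st
	 * sqe with REQ_F_SQE_GROUP, and ends with the 1st sqe without it
	 */
	if (!lead) {
		if (WARN_ON_ONCE(!(req->flags & REQ_F_SQE_GROUP)))
			return req;
		group->head = req;
		group->last = req;
		req->grp_refs = 1;
		req->flags |= REQ_F_SQE_GROUP_LEADER;
		return NULL;
	}

	/*
	 * Members can't be in a link chain and can't be drained, but the
	 * whole group can be linked or drained by setting flags on the
	 * group leader.
	 *
	 * IOSQE_CQE_SKIP_SUCCESS can't be set for a member for the sake
	 * of simplicity
	 */
	if (req->flags & (IO_REQ_LINK_FLAGS | REQ_F_IO_DRAIN | REQ_F_CQE_SKIP))
		req_fail_link_node(lead, -EINVAL);

	lead->grp_refs += 1;
	group->last->grp_link = req;
	group->last = req;

	if (req->flags & REQ_F_SQE_GROUP)
		return NULL;

	req->grp_link = NULL;
	req->flags |= REQ_F_SQE_GROUP;
	group->head = NULL;
	return lead;
}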
>
> > diff --git a/io_uring/timeout.c b/io_uring/timeout.c
> > index 9973876d91b0..ed6c74f1a475 100644
> > --- a/io_uring/timeout.c
> > +++ b/io_uring/timeout.c
> > @@ -149,6 +149,8 @@ static void io_req_tw_fail_links(struct io_kiocb *link, struct io_tw_state *ts)
> > res = link->cqe.res;
> > link->link = NULL;
> > io_req_set_res(link, res, 0);
> > + if (req_is_group_leader(link))
> > + io_fail_group_members(link);
> > io_req_task_complete(link, ts);
> > link = nxt;
> > }
> > @@ -543,6 +545,10 @@ static int __io_timeout_prep(struct io_kiocb *req,
> > if (is_timeout_link) {
> > struct io_submit_link *link = &req->ctx->submit_state.link;
> >
> > + /* so far disallow IO group link timeout */
> > + if (req->ctx->submit_state.group.head)
> > + return -EINVAL;
> > +
>
> For now, disallow IO group linked timeout
OK.
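
i.e.:

	/* For now, disallow IO group linked timeout */
	if (req->ctx->submit_state.group.head)
		return -EINVAL;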
thanks,
Ming