From: Ming Lei <[email protected]>
To: Pavel Begunkov <[email protected]>
Cc: Jens Axboe <[email protected]>,
[email protected], [email protected],
Kevin Wolf <[email protected]>,
[email protected]
Subject: Re: [PATCH V5 4/8] io_uring: support SQE group
Date: Sat, 7 Sep 2024 17:36:08 +0800 [thread overview]
Message-ID: <ZtweiCfLOJmdeY0Z@fedora> (raw)
In-Reply-To: <[email protected]>
On Fri, Sep 06, 2024 at 06:15:32PM +0100, Pavel Begunkov wrote:
> On 8/29/24 05:29, Ming Lei wrote:
> ...
> > > > + if (WARN_ON_ONCE(lead->grp_refs <= 0))
> > > > + return false;
> > > > +
> > > > + req->flags &= ~REQ_F_SQE_GROUP;
> > >
> > > I'm getting completely lost when and why it clears and sets
> > > back REQ_F_SQE_GROUP and REQ_F_SQE_GROUP_LEADER. Is there any
> > > rule?
> >
> > My fault, it should have been documented somewhere.
> >
> > REQ_F_SQE_GROUP is cleared when the request is completed, but it is
> > reused as flag for marking the last request in this group, so we can
> > free the group leader when observing the 'last' member request.
>
> Maybe it'd be cleaner to use a second flag?
I will add one new flag with same value, since the two's lifetime
is non-overlapping.
>
> > The only other difference about the two flags is that both are cleared
> > when the group leader becomes the last one in the group, then
> > this leader degenerates as normal request, which way can simplify
> > group leader freeing.
> >
> > >
> > > > + /*
> > > > + * Set linked leader as failed if any member is failed, so
> > > > + * the remained link chain can be terminated
> > > > + */
> > > > + if (unlikely((req->flags & REQ_F_FAIL) &&
> > > > + ((lead->flags & IO_REQ_LINK_FLAGS) && lead->link)))
> > > > + req_set_fail(lead);
> > >
> > > if (req->flags & REQ_F_FAIL)
> > > req_set_fail(lead);
> > >
> > > REQ_F_FAIL is not specific to links, if a request fails we need
> > > to mark it as such.
> >
> > It is for handling group failure.
> >
> > The following condition
> >
> > ((lead->flags & IO_REQ_LINK_FLAGS) && lead->link))
> >
> > means that this group is in one link-chain.
> >
> > If any member in this group is failed, we need to fail this group(lead),
> > then the remained requests in this chain can be failed.
> >
> > Otherwise, it isn't necessary to fail group leader in case of any member
> > io failure.
>
> What bad would happen if you do it like this?
>
> if (req->flags & REQ_F_FAIL)
> req_set_fail(lead);
>
> I'm asking because if you rely on some particular combination
> of F_FAIL and F_LINK somewhere, it's likely wrong, but otherwise
> we F_FAIL a larger set of requests, which should never be an
> issue.
From dependency relation viewpoint it is not necessary to fail all, but it
makes us easier to start with this more determinate behavior, will
change to this way in V6.
>
> > > > + return !--lead->grp_refs;
> > > > +}
> > > > +
> > > > +static inline bool leader_is_the_last(struct io_kiocb *lead)
> > > > +{
> > > > + return lead->grp_refs == 1 && (lead->flags & REQ_F_SQE_GROUP);
> > > > +}
> > > > +
> > > > +static void io_complete_group_member(struct io_kiocb *req)
> > > > +{
> > > > + struct io_kiocb *lead = get_group_leader(req);
> > > > +
> > > > + if (WARN_ON_ONCE(!(req->flags & REQ_F_SQE_GROUP)))
> > > > + return;
> > > > +
> > > > + /* member CQE needs to be posted first */
> > > > + if (!(req->flags & REQ_F_CQE_SKIP))
> > > > + io_req_commit_cqe(req->ctx, req);
> > > > +
> > > > + if (__io_complete_group_member(req, lead)) {
> > > > + /*
> > > > + * SQE_GROUP flag is kept for the last member, so the leader
> > > > + * can be retrieved & freed from this last member
> > > > + */
> > > > + req->flags |= REQ_F_SQE_GROUP;
> >
> > 'req' is the last completed request, so mark it as the last one
> > by reusing REQ_F_SQE_GROUP, so we can free group leader in
> > io_free_batch_list() when observing the last flag.
> >
> > But it should have been documented.
> >
> > > > + if (!(lead->flags & REQ_F_CQE_SKIP))
> > > > + io_req_commit_cqe(lead->ctx, lead);
> > > > + } else if (leader_is_the_last(lead)) {
> > > > + /* leader will degenerate to plain req if it is the last */
> > > > + lead->flags &= ~(REQ_F_SQE_GROUP | REQ_F_SQE_GROUP_LEADER);
> > >
> > > What's this chunk is about?
> >
> > The leader becomes the only request not completed in group, so it is
> > degenerated as normal one by clearing the two flags. This way simplifies
> > logic for completing group leader.
> >
> ...
> > > > @@ -1388,11 +1501,33 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
> > > > comp_list);
> > > > if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
> > > > + if (req->flags & (REQ_F_SQE_GROUP |
> > > > + REQ_F_SQE_GROUP_LEADER)) {
> > > > + struct io_kiocb *leader;
> > > > +
> > > > + /* Leader is freed via the last member */
> > > > + if (req_is_group_leader(req)) {
> > > > + node = req->comp_list.next;
> > > > + continue;
> > > > + }
> > > > +
> > > > + /*
> > > > + * Only the last member keeps GROUP flag,
> > > > + * free leader and this member together
> > > > + */
> > > > + leader = get_group_leader(req);
> > > > + leader->flags &= ~REQ_F_SQE_GROUP_LEADER;
> > > > + req->flags &= ~REQ_F_SQE_GROUP;
> > > > + wq_stack_add_head(&leader->comp_list,
> > > > + &req->comp_list);
> > >
> > > That's quite hacky, but at least we can replace it with
> > > task work if it gets in the way later on.
> >
> > io_free_batch_list() is already called in task context, and it isn't
> > necessary to schedule one extra tw, which hurts perf more or less.
> >
> > Another way is to store these leaders into one temp list, and
> > call io_free_batch_list() for this temp list one more time.
>
> What I'm saying, it's fine to leave it as is for now. In the
> future if it becomes a problem for ome reason or another, we can
> do it the task_work like way.
OK, got it, I will leave it as is, and document the potential risk
with future changes.
>
> ...
> > > > @@ -2101,6 +2251,62 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
> > > > return def->prep(req, sqe);
> > > > }
> > > > +static struct io_kiocb *io_group_sqe(struct io_submit_link *group,
> > > > + struct io_kiocb *req)
> > > > +{
> > > > + /*
> > > > + * Group chain is similar with link chain: starts with 1st sqe with
> > > > + * REQ_F_SQE_GROUP, and ends with the 1st sqe without REQ_F_SQE_GROUP
> > > > + */
> > > > + if (group->head) {
> > > > + struct io_kiocb *lead = group->head;
> > > > +
> > > > + /* members can't be in link chain, can't be drained */
> > > > + if (req->flags & (IO_REQ_LINK_FLAGS | REQ_F_IO_DRAIN))
> > > > + req_fail_link_node(lead, -EINVAL);
> > >
> > > That should fail the entire link (if any) as well.
> >
> > Good catch, here we should fail link head by following the logic
> > in io_submit_fail_init().
> >
> > >
> > > I have even more doubts we even want to mix links and groups. Apart
> >
> > Wrt. ublk, group provides zero copy, and the ublk io(group) is generic
> > IO, sometime IO_LINK is really needed & helpful, such as in ublk-nbd,
> > send(tcp) requests need to be linked & zc. And we shouldn't limit IO_LINK
> > for generic io_uring IO.
> >
> > > from nuances as such, which would be quite hard to track, the semantics
> > > of IOSQE_CQE_SKIP_SUCCESS is unclear.
> >
> > IO group just follows every normal request.
>
> It tries to mimic but groups don't and essentially can't do it the
> same way, at least in some aspects. E.g. IOSQE_CQE_SKIP_SUCCESS
> usually means that all following will be silenced. What if a
> member is CQE_SKIP, should it stop the leader from posting a CQE?
> And whatever the answer is, it'll be different from the link's
> behaviour.
Here it looks easier than link's:
- only leader's IOSQE_CQE_SKIP_SUCCESS follows linked request's rule
- all members just respects the flag for its own, and not related with
leader's
>
> Regardless, let's forbid IOSQE_CQE_SKIP_SUCCESS and linked timeouts
> for groups, that can be discussed afterwards.
It should easy to forbid IOSQE_CQE_SKIP_SUCCESS which is per-sqe, will do
it in V6.
I am not sure if it is easy to disallow IORING_OP_LINK_TIMEOUT, which
covers all linked sqes, and group leader could be just one of them.
Can you share any idea about the implementation to forbid LINK_TIMEOUT
for sqe group?
>
> > 1) fail in linked chain
> > - follows IO_LINK's behavior since io_fail_links() covers io group
> >
> > 2) otherwise
> > - just respect IOSQE_CQE_SKIP_SUCCESS
> >
> > > And also it doen't work with IORING_OP_LINK_TIMEOUT.
> >
> > REQ_F_LINK_TIMEOUT can work on whole group(or group leader) only, and I
> > will document it in V6.
>
> It would still be troublesome. When a linked timeout fires it searches
> for the request it's attached to and cancels it, however, group leaders
> that queued up their members are discoverable. But let's say you can find
> them in some way, then the only sensbile thing to do is cancel members,
> which should be doable by checking req->grp_leader, but might be easier
> to leave it to follow up patches.
We have changed sqe group to start queuing members after leader is
completed. link timeout will cancel leader with all its members via
leader->grp_link, this behavior should respect IORING_OP_LINK_TIMEOUT
completely.
Please see io_fail_links() and io_cancel_group_members().
>
>
> > > > +
> > > > + lead->grp_refs += 1;
> > > > + group->last->grp_link = req;
> > > > + group->last = req;
> > > > +
> > > > + if (req->flags & REQ_F_SQE_GROUP)
> > > > + return NULL;
> > > > +
> > > > + req->grp_link = NULL;
> > > > + req->flags |= REQ_F_SQE_GROUP;
> > > > + group->head = NULL;
> > > > + if (lead->flags & REQ_F_FAIL) {
> > > > + io_queue_sqe_fallback(lead);
> > >
> > > Let's say the group was in the middle of a link, it'll
> > > complete that group and continue with assembling / executing
> > > the link when it should've failed it and honoured the
> > > request order.
> >
> > OK, here we can simply remove the above two lines, and link submit
> > state can handle this failure in link chain.
>
> If you just delete then nobody would check for REQ_F_FAIL and
> fail the request.
io_link_assembling() & io_link_sqe() checks for REQ_F_FAIL and call
io_queue_sqe_fallback() either if it is in link chain or
not.
> Assuming you'd also set the fail flag to the
> link head when appropriate, how about deleting these two line
> and do like below? (can be further prettified)
>
>
> bool io_group_assembling()
> {
> return state->group.head || (req->flags & REQ_F_SQE_GROUP);
> }
> bool io_link_assembling()
> {
> return state->link.head || (req->flags & IO_REQ_LINK_FLAGS);
> }
>
> static inline int io_submit_sqe()
> {
> ...
> if (unlikely(io_link_assembling(state, req) ||
> io_group_assembling(state, req) ||
> req->flags & REQ_F_FAIL)) {
> if (io_group_assembling(state, req)) {
> req = io_group_sqe(&state->group, req);
> if (!req)
> return 0;
> }
> if (io_link_assembling(state, req)) {
> req = io_link_sqe(&state->link, req);
> if (!req)
> return 0;
> }
> if (req->flags & REQ_F_FAIL) {
> io_queue_sqe_fallback(req);
> return 0;
As I mentioned above, io_link_assembling() & io_link_sqe() covers
the failure handling.
thanks,
Ming
next prev parent reply other threads:[~2024-09-07 9:36 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-08 16:24 [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf Ming Lei
2024-08-08 16:24 ` [PATCH V5 1/8] io_uring: add io_link_req() helper Ming Lei
2024-08-08 16:24 ` [PATCH V5 2/8] io_uring: add io_submit_fail_link() helper Ming Lei
2024-08-08 16:24 ` [PATCH V5 3/8] io_uring: add helper of io_req_commit_cqe() Ming Lei
2024-08-08 16:24 ` [PATCH V5 4/8] io_uring: support SQE group Ming Lei
2024-08-27 15:18 ` Pavel Begunkov
2024-08-29 4:29 ` Ming Lei
2024-09-06 17:15 ` Pavel Begunkov
2024-09-07 9:36 ` Ming Lei [this message]
2024-09-10 13:12 ` Pavel Begunkov
2024-09-10 15:04 ` Ming Lei
2024-09-10 20:31 ` Pavel Begunkov
2024-09-11 1:28 ` Ming Lei
2024-08-08 16:24 ` [PATCH V5 5/8] io_uring: support sqe group with members depending on leader Ming Lei
2024-08-08 16:24 ` [PATCH V5 6/8] io_uring: support providing sqe group buffer Ming Lei
2024-08-08 16:24 ` [PATCH V5 7/8] io_uring/uring_cmd: support provide group kernel buffer Ming Lei
2024-08-08 16:24 ` [PATCH V5 8/8] ublk: support provide io buffer Ming Lei
2024-08-17 4:16 ` [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf Ming Lei
2024-08-17 19:48 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZtweiCfLOJmdeY0Z@fedora \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox