public inbox for [email protected]
 help / color / mirror / Atom feed
From: Oliver Crumrine <[email protected]>
To: Pavel Begunkov <[email protected]>,
	Oliver Crumrine <[email protected]>,
	[email protected],  [email protected], [email protected],
	[email protected], [email protected],  [email protected],
	[email protected]
Cc: [email protected], [email protected],
	 [email protected], [email protected]
Subject: Re: [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
Date: Fri, 5 Apr 2024 13:04:32 -0700	[thread overview]
Message-ID: <CAK1VsR17Ea6cmks7BcdvS4ZHQMRz_kWd1NhPh8J1fUpsgC7WFg@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>

Pavel Begunkov wrote:
> On 4/4/24 23:17, Oliver Crumrine wrote:
> > In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
> > specifically disabled REQ_F_CQE_SKIP, as (at least from my
> > understanding) the userspace program wouldn't receive the
> > IORING_CQE_F_MORE flag in the result value.
>
> No. IORING_CQE_F_MORE means there will be another CQE from this
> request, so a single CQE without IORING_CQE_F_MORE is trivially
> fine.
>
> The problem is the semantics, because by suppressing the first
> CQE you're loosing the result value. You might rely on WAITALL
That's already happening with io_send.
> as other sends and "fail" (in terms of io_uring) the request
> in case of a partial send posting 2 CQEs, but that's not a great
> way and it's getting userspace complicated pretty easily.
>
> In short, it was left out for later because there is a
> better way to implement it, but it should be done carefully
Maybe we could put the return values in the notifs? That would be a
discrepancy between io_send and io_send_zc, though.
>
>
> > To fix this, instead of keeping track of how many CQEs have been
> > received, and subtracting notifs from that, programs can keep track of
>
> That's a benchmark way of doing it, more realistically
> it'd be more like
>
> event_loop() {
> 	cqe = wait_cqe();
> 	struct req *r = (struct req *)cqe->user_data;
> 	r->callback(r, cqe);
>
>
> send_zc_callback(req, cqe) {
> 	if (cqe->flags & F_MORE) {
> 		// don't free the req
> 		// we should wait for another CQE
> 		...
> 	}
> }
>
> > how many SQEs they have issued, and if a CQE is returned with an error,
> > they can simply subtract from how many notifs they expect to receive.
>
> The design specifically untangles those two notions, i.e. there can
> be a notification even when the main CQE fails (ret<0). It's safer
> this way, even though AFAIK relying on errors would be fine with
> current users (TCP/UDP).
>
>
> > Signed-off-by: Oliver Crumrine <[email protected]>
> > ---
> >   io_uring/net.c | 6 ++----
> >   1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/io_uring/net.c b/io_uring/net.c
> > index 1e7665ff6ef7..822f49809b68 100644
> > --- a/io_uring/net.c
> > +++ b/io_uring/net.c
> > @@ -1044,9 +1044,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> >
> >   	if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
> >   		return -EINVAL;
> > -	/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
> > -	if (req->flags & REQ_F_CQE_SKIP)
> > -		return -EINVAL;
> >
> >   	notif = zc->notif = io_alloc_notif(ctx);
> >   	if (!notif)
> > @@ -1342,7 +1339,8 @@ void io_sendrecv_fail(struct io_kiocb *req)
> >   		req->cqe.res = sr->done_io;
> >
> >   	if ((req->flags & REQ_F_NEED_CLEANUP) &&
> > -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
> > +	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC) &&
> > +	    !(req->flags & REQ_F_CQE_SKIP))
> >   		req->cqe.flags |= IORING_CQE_F_MORE;
> >   }
> >
>
> --
> Pavel Begunkov

  reply	other threads:[~2024-04-05 20:04 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-04 22:16 [PATCH 0/3] Add REQ_F_CQE_SKIP support to io_uring zerocopy Oliver Crumrine
2024-04-04 22:17 ` [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for " Oliver Crumrine
2024-04-05 13:01   ` Pavel Begunkov
2024-04-05 20:04     ` Oliver Crumrine [this message]
2024-04-06 21:23       ` Pavel Begunkov
2024-04-07 13:13         ` Oliver Crumrine
2024-04-07 19:14           ` Oliver Crumrine
2024-04-07 23:46             ` Pavel Begunkov
2024-04-09  1:33               ` Oliver Crumrine
2024-04-10 12:05                 ` Pavel Begunkov
2024-04-11  0:52                   ` Oliver Crumrine
2024-04-12 13:20                     ` Pavel Begunkov
2024-04-15 23:51                       ` Oliver Crumrine
2024-04-04 22:19 ` [PATCH 2/3] io_uring: Add io_uring_peek_cqe to mini_liburing Oliver Crumrine
2024-04-04 22:19 ` [PATCH 3/3] io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test Oliver Crumrine
2024-04-06 20:33   ` Muhammad Usama Anjum
2024-04-05 12:06 ` [PATCH 0/3] Add REQ_F_CQE_SKIP support to io_uring zerocopy Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAK1VsR17Ea6cmks7BcdvS4ZHQMRz_kWd1NhPh8J1fUpsgC7WFg@mail.gmail.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox