public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: Oliver Crumrine <[email protected]>,
	[email protected], [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected]
Cc: [email protected], [email protected],
	[email protected], [email protected]
Subject: Re: [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
Date: Fri, 5 Apr 2024 14:01:07 +0100
Message-ID: <[email protected]>
In-Reply-To: <b1a047a1b2d55c1c245a78ca9772c31a9b3ceb12.1712268605.git.ozlinuxc@gmail.com>

On 4/4/24 23:17, Oliver Crumrine wrote:
> In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
> specifically disabled REQ_F_CQE_SKIP, as (at least from my
> understanding) the userspace program wouldn't receive the
> IORING_CQE_F_MORE flag in the result value.

No. IORING_CQE_F_MORE means there will be another CQE from this
request, so a single CQE without IORING_CQE_F_MORE is trivially
fine.
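To illustrate the F_MORE rule, here is a minimal, self-contained sketch of a userspace dispatch loop. It keeps a request alive while CQEs flagged with F_MORE are pending and frees it on the first CQE without the flag, whether that is a notification or a lone main CQE. It does not use liburing. `struct cqe_view`, `struct req`, `dispatch()`, `new_send_zc_req()` and the `freed` counter are hypothetical stand-ins, and the two flag macros are assumed to mirror `IORING_CQE_F_MORE` (1U << 1) and `IORING_CQE_F_NOTIF` (1U << 3) from <linux/io_uring.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed to mirror IORING_CQE_F_MORE / IORING_CQE_F_NOTIF in <linux/io_uring.h>. */
#define CQE_F_MORE  (1U << 1)
#define CQE_F_NOTIF (1U << 3)

/* Hypothetical stand-in for the struct io_uring_cqe fields used here. */
struct cqe_view {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct req;
typedef void (*req_cb)(struct req *r, const struct cqe_view *cqe);

/* Hypothetical per-request state, pointed to by user_data. */
struct req {
	req_cb callback;
	int32_t result; /* res of the main (non-notification) CQE */
	int *freed;     /* bumped when the req is released, for illustration */
};

static void send_zc_callback(struct req *r, const struct cqe_view *cqe)
{
	if (!(cqe->flags & CQE_F_NOTIF))
		r->result = cqe->res;  /* main CQE carries the send result */
	if (cqe->flags & CQE_F_MORE)
		return;                /* another CQE will follow: keep the req */
	(*r->freed)++;             /* no F_MORE: this was the last CQE */
	free(r);
}

/* Routes each CQE to the callback stored in the req behind user_data. */
static void dispatch(const struct cqe_view *cqes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct req *r = (struct req *)(uintptr_t)cqes[i].user_data;
		r->callback(r, &cqes[i]);
	}
}

/* Allocates a hypothetical SEND_ZC request wired to send_zc_callback. */
static struct req *new_send_zc_req(int *freed)
{
	struct req *r = malloc(sizeof(*r));
	if (!r)
		abort();
	r->callback = send_zc_callback;
	r->result = 0;
	r->freed = freed;
	return r;
}
```

The release decision depends only on F_MORE; whether the main CQE succeeded does not enter into it, and a request that posts a single CQE without F_MORE is simply freed right away.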

The problem is the semantics: by suppressing the first
CQE you're losing the result value. You might rely on MSG_WAITALL,
as other send requests do, and "fail" (in io_uring terms) the request
on a partial send, posting 2 CQEs, but that's not a great
approach and it gets complicated for userspace pretty quickly.
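A hedged sketch of the bookkeeping this pushes onto userspace. It assumes a SEND_ZC issued with IOSQE_CQE_SKIP_SUCCESS and MSG_WAITALL, where a partial send is failed and therefore posts its main CQE with the partial byte count (as `io_sendrecv_fail()` in the quoted diff does with `sr->done_io`). When no main CQE shows up, the application cannot read the byte count and has to infer it from the requested length. `struct zc_send` and `zc_bytes_sent()` are hypothetical, not liburing or kernel API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace record for one zerocopy send submitted with
 * IOSQE_CQE_SKIP_SUCCESS and MSG_WAITALL (assumed semantics, see above). */
struct zc_send {
	size_t len;        /* bytes requested in the SQE */
	bool got_main_cqe; /* a non-notification CQE was posted */
	int32_t main_res;  /* its res; meaningful only if got_main_cqe */
};

/*
 * Meant to be called once the notification CQE has been seen. With the
 * success CQE suppressed, a missing main CQE is the only signal of a full
 * send, so the length is inferred instead of read from cqe->res.
 */
static long long zc_bytes_sent(const struct zc_send *s)
{
	if (!s->got_main_cqe)
		return (long long)s->len; /* inferred, not reported by the kernel */
	return s->main_res;           /* negative errno or partial byte count */
}
```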

In short, it was left out for later because there is a
better way to implement it, but it should be done carefully.


> To fix this, instead of keeping track of how many CQEs have been
> received, and subtracting notifs from that, programs can keep track of

That's a benchmark way of doing it; more realistically,
it'd be something like:

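static void event_loop(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		struct req *r = io_uring_cqe_get_data(cqe);
		r->callback(r, cqe);
		io_uring_cqe_seen(ring, cqe);
	}
}

static void send_zc_callback(struct req *req, struct io_uring_cqe *cqe)
{
	if (cqe->flags & IORING_CQE_F_MORE) {
		// don't free the req,
		// we should wait for another CQE
		...
	}
}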

> how many SQEs they have issued, and if a CQE is returned with an error,
> they can simply subtract from how many notifs they expect to receive.

The design specifically untangles those two notions, i.e. there can
be a notification even when the main CQE fails (ret<0). It's safer
this way, even though AFAIK relying on errors would be fine with
current users (TCP/UDP).
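To make the point about untangled notions concrete, here is a small sketch comparing two ways of tracking outstanding zerocopy sends: one driven only by F_MORE, and the error-based scheme from the patch description (expect one notification per SQE, minus one per failed CQE). `struct counters`, `on_submit()` and `on_cqe()` are hypothetical, and the flag values are assumed to mirror `IORING_CQE_F_MORE` and `IORING_CQE_F_NOTIF`. The trace below uses the case described above: a failed main CQE that still carries F_MORE, followed by a notification.

```c
#include <stdint.h>

/* Assumed to mirror IORING_CQE_F_MORE / IORING_CQE_F_NOTIF. */
#define CQE_F_MORE  (1U << 1)
#define CQE_F_NOTIF (1U << 3)

struct counters {
	long inflight;        /* F_MORE-driven: requests with CQEs still pending */
	long notifs_expected; /* error-driven scheme from the patch description */
};

static void on_submit(struct counters *c)
{
	c->inflight++;
	c->notifs_expected++;
}

static void on_cqe(struct counters *c, int32_t res, uint32_t flags)
{
	/* Robust: a request is finished exactly when a CQE lacks F_MORE. */
	if (!(flags & CQE_F_MORE))
		c->inflight--;

	/* Fragile: assumes a failed main CQE means no notification follows. */
	if (flags & CQE_F_NOTIF)
		c->notifs_expected--;
	else if (res < 0)
		c->notifs_expected--;
}
```

For one submitted send that fails with F_MORE set and then delivers its notification, `inflight` ends at 0, while `notifs_expected` ends at -1: the error-based counter has already written off a notification that still arrives.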


> Signed-off-by: Oliver Crumrine <[email protected]>
> ---
>   io_uring/net.c | 6 ++----
>   1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/io_uring/net.c b/io_uring/net.c
> index 1e7665ff6ef7..822f49809b68 100644
> --- a/io_uring/net.c
> +++ b/io_uring/net.c
> @@ -1044,9 +1044,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   
>   	if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
>   		return -EINVAL;
> -	/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
> -	if (req->flags & REQ_F_CQE_SKIP)
> -		return -EINVAL;
>   
>   	notif = zc->notif = io_alloc_notif(ctx);
>   	if (!notif)
> @@ -1342,7 +1339,8 @@ void io_sendrecv_fail(struct io_kiocb *req)
>   		req->cqe.res = sr->done_io;
>   
>   	if ((req->flags & REQ_F_NEED_CLEANUP) &&
> -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
> +	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC) &&
> +	    !(req->flags & REQ_F_CQE_SKIP))
>   		req->cqe.flags |= IORING_CQE_F_MORE;
>   }
>   

-- 
Pavel Begunkov


Thread overview: 17+ messages
2024-04-04 22:16 [PATCH 0/3] Add REQ_F_CQE_SKIP support to io_uring zerocopy Oliver Crumrine
2024-04-04 22:17 ` [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for " Oliver Crumrine
2024-04-05 13:01   ` Pavel Begunkov [this message]
2024-04-05 20:04     ` Oliver Crumrine
2024-04-06 21:23       ` Pavel Begunkov
2024-04-07 13:13         ` Oliver Crumrine
2024-04-07 19:14           ` Oliver Crumrine
2024-04-07 23:46             ` Pavel Begunkov
2024-04-09  1:33               ` Oliver Crumrine
2024-04-10 12:05                 ` Pavel Begunkov
2024-04-11  0:52                   ` Oliver Crumrine
2024-04-12 13:20                     ` Pavel Begunkov
2024-04-15 23:51                       ` Oliver Crumrine
2024-04-04 22:19 ` [PATCH 2/3] io_uring: Add io_uring_peek_cqe to mini_liburing Oliver Crumrine
2024-04-04 22:19 ` [PATCH 3/3] io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test Oliver Crumrine
2024-04-06 20:33   ` Muhammad Usama Anjum
2024-04-05 12:06 ` [PATCH 0/3] Add REQ_F_CQE_SKIP support to io_uring zerocopy Pavel Begunkov
