From: Oliver Crumrine <[email protected]>
To: Pavel Begunkov <[email protected]>,
	Oliver Crumrine <[email protected]>,
	[email protected]
Cc: [email protected], [email protected]
Subject: Re: [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
Date: Sun, 7 Apr 2024 06:13:55 -0700
Message-ID: <CAK1VsR210nrqtxWaVbQh00t_=7rhq9bwucFygGZaT=7N-t7E5Q@mail.gmail.com>
In-Reply-To: <[email protected]>

Pavel Begunkov wrote:
> On 4/5/24 21:04, Oliver Crumrine wrote:
> > Pavel Begunkov wrote:
> >> On 4/4/24 23:17, Oliver Crumrine wrote:
> >>> In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
> >>> specifically disabled REQ_F_CQE_SKIP, as (at least from my
> >>> understanding) the userspace program wouldn't receive the
> >>> IORING_CQE_F_MORE flag in the result value.
> >>
> >> No. IORING_CQE_F_MORE means there will be another CQE from this
> >> request, so a single CQE without IORING_CQE_F_MORE is trivially
> >> fine.
> >>
> >> The problem is the semantics, because by suppressing the first
> >> CQE you're losing the result value. You might rely on WAITALL
> > That's already happening with io_send.
>
> Right, and it's still annoying and hard to use
Another solution might be a counter that tracks how many requests with
REQ_F_CQE_SKIP have completed. Before exiting, userspace could call a
function like io_wait_completions(int completions), which would wait until
that many have finished, and then userspace could peek the completion
ring.
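
Roughly what I have in mind, as a toy model in plain C. Nothing here is an
existing io_uring or liburing interface: struct skip_counter and its
helpers are hypothetical names, and a real io_wait_completions() would
sleep in the kernel rather than check a predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for requests whose CQEs are skipped. */
struct skip_counter {
	uint64_t submitted;	/* requests issued with CQE skipping */
	uint64_t completed;	/* of those, how many have finished */
};

/* Called when a skipped-CQE request is submitted. */
static void skip_counter_submit(struct skip_counter *c)
{
	c->submitted++;
}

/* Called at the point where the CQE would otherwise have been posted. */
static void skip_counter_complete(struct skip_counter *c)
{
	assert(c->completed < c->submitted);
	c->completed++;
}

/*
 * The condition a hypothetical io_wait_completions(completions) would
 * wait on: true once at least `completions` skipped requests are done.
 */
static bool skip_counter_reached(const struct skip_counter *c,
				 uint64_t completions)
{
	return c->completed >= completions;
}
```

Userspace would pass the number of skipped requests it submitted, and only
peek the completion ring once the condition holds.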
>
> >> as other sends and "fail" (in terms of io_uring) the request
> >> in case of a partial send posting 2 CQEs, but that's not a great
> >> way and it's getting userspace complicated pretty easily.
> >>
> >> In short, it was left out for later because there is a
> >> better way to implement it, but it should be done carefully
> > Maybe we could put the return values in the notifs? That would be a
> > discrepancy between io_send and io_send_zc, though.
>
> Yes. And yes, having a custom flavour is not good. It'd only
> be well usable if apart from returning the actual result
> it also guarantees there will be one and only one CQE, then
> the userspace doesn't have to do the dancing with counting
> and checking F_MORE. In fact, I outlined before what a generic
> solution may look like:
>
> https://github.com/axboe/liburing/issues/824
>
> The only interesting part, IMHO, is to be able to merge the
> main completion with its notification. Below is an old stash
> rebased onto for-6.10. The only thing missing is relinking,
> but maybe we don't even care about it. I need to cover it
> well with tests.
The patch looks pretty good. The only potential issue is that you store
the res of the normal CQE in the notif CQE. This overwrites
IORING_CQE_F_NOTIF with IORING_CQE_F_MORE, so the notif would tell
userspace to expect another CQE when there won't be one.
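
To make the concern concrete, here is a rough sketch of how I'd expect
userspace to track a single zerocopy send today. It is only an
illustration: struct zc_req_state and zc_handle_cqe() are made-up names,
and the two flag values are redefined locally (they should match
include/uapi/linux/io_uring.h, but check your own header). The request is
treated as finished only when a CQE arrives without IORING_CQE_F_MORE.

```c
#include <stdbool.h>
#include <stdint.h>

/* Redefined locally for the sketch; normally from <linux/io_uring.h>. */
#define IORING_CQE_F_MORE	(1U << 1)
#define IORING_CQE_F_NOTIF	(1U << 3)

/* Hypothetical per-request state kept by the application. */
struct zc_req_state {
	int32_t res;		/* result of the send, once known */
	bool have_res;		/* main completion seen */
	bool buf_released;	/* notification seen, buffer reusable */
	bool done;		/* no further CQEs expected */
};

/* Feed one CQE (res and flags) belonging to this request. */
static void zc_handle_cqe(struct zc_req_state *st, int32_t res,
			  uint32_t flags)
{
	if (flags & IORING_CQE_F_NOTIF) {
		/* Notification: the kernel no longer needs the buffer. */
		st->buf_released = true;
	} else {
		/* Main completion: carries the send result. */
		st->res = res;
		st->have_res = true;
	}

	/* Only a CQE without F_MORE ends the request. */
	if (!(flags & IORING_CQE_F_MORE))
		st->done = true;
}
```

With a loop like this, a CQE carrying both IORING_CQE_F_NOTIF and
IORING_CQE_F_MORE would leave done unset, so the program would keep
waiting for a CQE that never arrives.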
>
>
>
>
> commit ca5e4fb6d105b5dfdf3768d46ce01529b7bb88c5
> Author: Pavel Begunkov <[email protected]>
> Date:   Sat Apr 6 15:46:38 2024 +0100
>
>      io_uring/net: introduce single CQE send zc mode
>
>      IORING_OP_SEND[MSG]_ZC requests post two completions, one to
>      notify that the data was queued, and later a second, usually referred
>      to as the "notification", to let the user know that the buffer can be
>      reused/freed. In some cases the user might not care about the main
>      completion and would be content getting only the notification, which
>      would simplify the userspace.
>
>      One example is when after a send the user would be waiting for the other
>      end to get the message and reply back not pushing any more data in the
>      meantime. Another case is unreliable protocols like UDP, which do not
>      require a confirmation from the other end before dropping buffers, and
>      so the notifications are usually posted shortly after the send request
>      is queued.
>
>      Add a flag merging completions into a single CQE. cqe->res will store
>      the send's result as usual, and it will have IORING_CQE_F_NOTIF set if
>      the buffer was potentially used. Timewise, it would be posted at the
>      moment when the notification would have been originally completed.
>
>      Signed-off-by: Pavel Begunkov <[email protected]>
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 7bd10201a02b..e2b528c341c9 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -356,6 +356,7 @@ enum io_uring_op {
>   #define IORING_RECV_MULTISHOT		(1U << 1)
>   #define IORING_RECVSEND_FIXED_BUF	(1U << 2)
>   #define IORING_SEND_ZC_REPORT_USAGE	(1U << 3)
> +#define IORING_SEND_ZC_COMBINE_CQE	(1U << 4)
>
>   /*
>    * cqe.res for IORING_CQE_F_NOTIF if
> diff --git a/io_uring/net.c b/io_uring/net.c
> index a74287692071..052f030ab8f8 100644
> --- a/io_uring/net.c
> +++ b/io_uring/net.c
> @@ -992,7 +992,19 @@ void io_send_zc_cleanup(struct io_kiocb *req)
>   	}
>   }
>
> -#define IO_ZC_FLAGS_COMMON (IORING_RECVSEND_POLL_FIRST | IORING_RECVSEND_FIXED_BUF)
> +static inline void io_sendzc_adjust_res(struct io_kiocb *req)
> +{
> +	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
> +
> +	if (sr->flags & IORING_SEND_ZC_COMBINE_CQE) {
> +		sr->notif->cqe.res = req->cqe.res;
> +		req->flags |= REQ_F_CQE_SKIP;
> +	}
> +}
> +
> +#define IO_ZC_FLAGS_COMMON (IORING_RECVSEND_POLL_FIRST | \
> +			    IORING_RECVSEND_FIXED_BUF | \
> +			    IORING_SEND_ZC_COMBINE_CQE)
>   #define IO_ZC_FLAGS_VALID  (IO_ZC_FLAGS_COMMON | IORING_SEND_ZC_REPORT_USAGE)
>
>   int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> @@ -1022,6 +1034,8 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   		if (zc->flags & ~IO_ZC_FLAGS_VALID)
>   			return -EINVAL;
>   		if (zc->flags & IORING_SEND_ZC_REPORT_USAGE) {
> +			if (zc->flags & IORING_SEND_ZC_COMBINE_CQE)
> +				return -EINVAL;
>   			io_notif_set_extended(notif);
>   			io_notif_to_data(notif)->zc_report = true;
>   		}
> @@ -1197,6 +1211,9 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
>   	else if (zc->done_io)
>   		ret = zc->done_io;
>
> +	io_req_set_res(req, ret, IORING_CQE_F_MORE);
> +	io_sendzc_adjust_res(req);
> +
>   	/*
>   	 * If we're in io-wq we can't rely on tw ordering guarantees, defer
>   	 * flushing notif to io_send_zc_cleanup()
> @@ -1205,7 +1222,6 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
>   		io_notif_flush(zc->notif);
>   		io_req_msg_cleanup(req, 0);
>   	}
> -	io_req_set_res(req, ret, IORING_CQE_F_MORE);
>   	return IOU_OK;
>   }
>

>   	else if (sr->done_io)
>   		ret = sr->done_io;
>
> +	io_req_set_res(req, ret, IORING_CQE_F_MORE);
> +	io_sendzc_adjust_res(req);
> +
>   	/*
>   	 * If we're in io-wq we can't rely on tw ordering guarantees, defer
>   	 * flushing notif to io_send_zc_cleanup()
> @@ -1266,7 +1285,6 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
>   		io_notif_flush(sr->notif);
>   		io_req_msg_cleanup(req, 0);
>   	}
> -	io_req_set_res(req, ret, IORING_CQE_F_MORE);
>   	return IOU_OK;
>   }
>
> @@ -1278,8 +1296,10 @@ void io_sendrecv_fail(struct io_kiocb *req)
>   		req->cqe.res = sr->done_io;
>
>   	if ((req->flags & REQ_F_NEED_CLEANUP) &&
> -	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
> +	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC)) {
>   		req->cqe.flags |= IORING_CQE_F_MORE;
> +		io_sendzc_adjust_res(req);
> +	}
>   }
>
>   int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>
>
> --
> Pavel Begunkov
