From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>, [email protected]
Subject: Re: [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC
Date: Mon, 5 Sep 2022 16:51:08 -0600
Message-ID: <[email protected]>
In-Reply-To: <b56a06f431ea01d125627d4fd95d712e5d72a51c.1662415676.git.asml.silence@gmail.com>
On 9/5/22 4:09 PM, Pavel Begunkov wrote:
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
>
> Doc writing is not my strongest side, comments are welcome.
>
> man/io_uring_enter.2 | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
> index 1a9311e..7fd275c 100644
> --- a/man/io_uring_enter.2
> +++ b/man/io_uring_enter.2
> @@ -1059,6 +1059,50 @@ value being passed in. This request type can be used to either just wake or
> interrupt anyone waiting for completions on the target ring, or it can be used
> to pass messages via the two fields. Available since 5.18.
>
> +.TP
> +.B IORING_OP_SEND_ZC
> +Issue the zerocopy equivalent of a
> +.BR send(2)
> +system call. It's similar to IORING_OP_SEND, but when the
> +.I flags
> +field of the
> +.I "struct io_uring_cqe"
> +contains IORING_CQE_F_MORE, the userspace should expect a second cqe, a.k.a.
> +notification, and until then it should not modify data in the buffer. The
> +notification will have the same
> +.I user_data
> +as the first one and its
> +.I flags
> +field will contain the
> +.I IORING_CQE_F_NOTIF
> +flag. It's guaranteed that IORING_CQE_F_MORE is set IFF the result is
> +non-negative.
> +.I fd
> +must be set to the socket file descriptor,
> +.I addr
> +must contain a pointer to the buffer,
> +.I len
> +denotes the length of the buffer to send, and
> +.I msg_flags
> +holds the flags associated with the system call. When
> +.I addr2
> +is non-zero it points to the address of the target with
> +.I addr_len
> +specifying its size, turning the request into a
> +.BR sendto(2)
> +system call equivalent.
> +
> +.B IORING_OP_SEND_ZC
> +tries to avoid making intermediate data copies but still may fall back to
> +copying. Furthermore, zerocopy is not always faster, especially when the
> +per-request payload size is small. The two completion model is needed because
> +the kernel might hold on to buffers for a long time, e.g. waiting for a TCP ACK,
> +and having a separate cqe for request completions allows the userspace to push
> +more data without extra delays. Note, notifications don't guarantee that the
> +data has been or will ever be received by the other endpoint.
I'd probably reorder this a bit to introduce it with the fact that
it's like SEND, but zero-copy. Then explain the mechanics of how MORE is
set for the two-stage completion notification if zc is done. I can shuffle
it around a bit if you want me to - just let me know!
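
To make the two-stage mechanics concrete, here is a rough userspace-side
sketch of how an application might track when a zerocopy send buffer is
safe to reuse. It is only an illustration under stated assumptions: the
struct and function names (sketch_cqe, zc_send_state, zc_send_handle_cqe)
are made up for this example, and the SKETCH_* macros are local mirrors of
IORING_CQE_F_MORE and IORING_CQE_F_NOTIF so the snippet builds without
liburing. Real code would use struct io_uring_cqe and the flags from
<linux/io_uring.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the uapi flag bits (assumption: values match the
 * kernel header; real code should include <linux/io_uring.h>). */
#define SKETCH_CQE_F_MORE  (1U << 1)  /* IORING_CQE_F_MORE */
#define SKETCH_CQE_F_NOTIF (1U << 3)  /* IORING_CQE_F_NOTIF */

/* Hypothetical stand-in for the struct io_uring_cqe fields used here. */
struct sketch_cqe {
    uint64_t user_data;
    int32_t  res;
    uint32_t flags;
};

/* Hypothetical per-request bookkeeping kept by the application.
 * buffer_busy should be set to true when the request is submitted. */
struct zc_send_state {
    bool    send_done;   /* the request completion (first cqe) was seen */
    int32_t send_res;    /* result carried by the first cqe */
    bool    buffer_busy; /* the kernel may still reference the buffer */
};

/* Update the state for one cqe whose user_data matches this request. */
void zc_send_handle_cqe(struct zc_send_state *st,
                        const struct sketch_cqe *cqe)
{
    if (cqe->flags & SKETCH_CQE_F_NOTIF) {
        /* Second cqe (the notification): the buffer can be reused. */
        st->buffer_busy = false;
        return;
    }

    /* First cqe: records the send result. If MORE is set, a
     * notification will follow and the buffer must stay untouched
     * until then; otherwise no notification is coming. */
    st->send_done = true;
    st->send_res = cqe->res;
    st->buffer_busy = (cqe->flags & SKETCH_CQE_F_MORE) != 0;
}
```

The point of splitting the state this way is that send_done lets the
application queue more data right after the first cqe, while buffer_busy
only gates reuse of this particular buffer.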
> +Available since 5.20.
Should be 6.0 here.
--
Jens Axboe