From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Subject: Re: [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC
Date: Tue, 6 Sep 2022 11:19:29 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 9/5/22 23:51, Jens Axboe wrote:
> On 9/5/22 4:09 PM, Pavel Begunkov wrote:
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>
>> Doc writing is not my strongest side, comments are welcome.
>>
>> man/io_uring_enter.2 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 44 insertions(+)
>>
>> diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
>> index 1a9311e..7fd275c 100644
>> --- a/man/io_uring_enter.2
>> +++ b/man/io_uring_enter.2
>> @@ -1059,6 +1059,50 @@ value being passed in. This request type can be used to either just wake or
>> interrupt anyone waiting for completions on the target ring, or it can be used
>> to pass messages via the two fields. Available since 5.18.
>>
>> +.TP
>> +.B IORING_OP_SEND_ZC
>> +Issue the zerocopy equivalent of a
>> +.BR send (2)
>> +system call. It's similar to IORING_OP_SEND, but when the
>> +.I flags
>> +field of the
>> +.I "struct io_uring_cqe"
>> +contains IORING_CQE_F_MORE, the userspace should expect a second cqe, a.k.a.
>> +notification, and until then it should not modify data in the buffer. The
>> +notification will have the same
>> +.I user_data
>> +as the first one and its
>> +.I flags
>> +field will contain the
>> +.I IORING_CQE_F_NOTIF
>> +flag. It's guaranteed that IORING_CQE_F_MORE is set IFF the result is
>> +non-negative.
>> +.I fd
>> +must be set to the socket file descriptor,
>> +.I addr
>> +must contain a pointer to the buffer,
>> +.I len
>> +denotes the length of the buffer to send, and
>> +.I msg_flags
>> +holds the flags associated with the system call. When
>> +.I addr2
>> +is non-zero it points to the address of the target with
>> +.I addr_len
>> +specifying its size, turning the request into a
>> +.BR sendto (2)
>> +system call equivalent.
>> +
>> +.B IORING_OP_SEND_ZC
>> +tries to avoid making intermediate data copies but still may fall back to
>> +copying. Furthermore, zerocopy is not always faster, especially when the
>> +per-request payload size is small. The two-completion model is needed because
>> +the kernel might hold on to buffers for a long time, e.g. waiting for a TCP ACK,
>> +and having a separate cqe for request completions allows the userspace to push
>> +more data without extra delays. Note that notifications don't guarantee that the
>> +data has been or will ever be received by the other endpoint.
>
> I'd probably reorder this a bit to introduce it with the fact that
> it's like SEND, but zero-copy. Then explain the mechanics of how MORE is
> set for the 2 stage completion notification if zc is done. I can shuffle
> it around a bit if you want me to - just let me know!
I don't mind you editing it at all, makes my life easier. Anyway,
sent out a v2, let's see if it reads better.
--
Pavel Begunkov
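The two-completion model described in the patch can be illustrated with a small, self-contained C sketch. It does not talk to the kernel: struct cqe_view and struct zc_buf are hypothetical stand-ins for the user_data, res and flags fields of struct io_uring_cqe and for per-request bookkeeping, and the flag values are copied from <linux/io_uring.h> (Linux 6.0) only so the sketch builds without newer kernel headers. It encodes the rule from the text: after a first CQE with IORING_CQE_F_MORE set the buffer stays off-limits, and the CQE carrying IORING_CQE_F_NOTIF releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as in <linux/io_uring.h>; real code should include that header. */
#ifndef IORING_CQE_F_MORE
#define IORING_CQE_F_MORE  (1U << 1)
#endif
#ifndef IORING_CQE_F_NOTIF
#define IORING_CQE_F_NOTIF (1U << 3)
#endif

/* Hypothetical stand-in for the struct io_uring_cqe fields used here. */
struct cqe_view {
	uint64_t user_data;
	int32_t  res;
	uint32_t flags;
};

/* Hypothetical bookkeeping for one in-flight zerocopy send buffer. */
struct zc_buf {
	uint64_t user_data;
	bool     in_use;  /* buffer must not be modified while true */
	int32_t  result;  /* bytes sent or -errno from the first CQE */
};

/* Call when the SEND_ZC request for this buffer is submitted. */
void zc_buf_submitted(struct zc_buf *b, uint64_t user_data)
{
	b->user_data = user_data;
	b->in_use = true;
	b->result = 0;
}

/* Feed one CQE belonging to this buffer's request. */
void zc_handle_cqe(struct zc_buf *b, const struct cqe_view *cqe)
{
	/* Both CQEs carry the user_data of the original request. */
	assert(cqe->user_data == b->user_data);

	if (cqe->flags & IORING_CQE_F_NOTIF) {
		/* Notification: the kernel no longer holds the buffer. */
		b->in_use = false;
		return;
	}

	/* Request completion: record the send result. IORING_CQE_F_MORE
	 * (set iff res >= 0) means a notification will follow, so the
	 * buffer stays reserved until then. */
	b->result = cqe->res;
	b->in_use = (cqe->flags & IORING_CQE_F_MORE) != 0;
}
```

In a real program the CQEs would come from liburing's io_uring_wait_cqe() or io_uring_peek_cqe(); since both completions share the same user_data, keying the bookkeeping on it is one straightforward choice.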
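The field description maps directly onto send(2)/sendto(2) arguments. The sketch below shows that mapping, assuming a hypothetical struct send_zc_fields that mirrors only the field names used in the text (the real struct io_uring_sqe packs several of them into unions). In recent liburing releases the io_uring_prep_send_zc() and io_uring_prep_send_set_addr() helpers are the usual way to fill a real SQE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical mirror of the SQE fields named in the man page text. */
struct send_zc_fields {
	int32_t  fd;        /* socket file descriptor */
	uint64_t addr;      /* pointer to the payload buffer */
	uint32_t len;       /* payload length in bytes */
	uint32_t msg_flags; /* MSG_* flags, as passed to send(2) */
	uint64_t addr2;     /* destination address, 0 for a plain send */
	uint16_t addr_len;  /* size of the address addr2 points to */
};

/* Map send(2)/sendto(2) arguments onto the fields: a NULL dest gives a
 * send(2) equivalent, a non-NULL dest a sendto(2) equivalent. */
void fill_send_zc(struct send_zc_fields *f, int sockfd, const void *buf,
		  size_t len, int flags, const struct sockaddr *dest,
		  socklen_t dest_len)
{
	/* len is a 32-bit and addr_len a 16-bit field in the SQE. */
	assert(len <= UINT32_MAX);
	assert(dest == NULL || dest_len <= UINT16_MAX);

	f->fd = sockfd;
	f->addr = (uint64_t)(uintptr_t)buf;
	f->len = (uint32_t)len;
	f->msg_flags = (uint32_t)flags;
	f->addr2 = dest ? (uint64_t)(uintptr_t)dest : 0;
	f->addr_len = dest ? (uint16_t)dest_len : 0;
}
```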
Thread overview: 3+ messages
2022-09-05 22:09 [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC Pavel Begunkov
2022-09-05 22:51 ` Jens Axboe
2022-09-06 10:19 ` Pavel Begunkov [this message]