public inbox for [email protected]
* [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC
@ 2022-09-05 22:09 Pavel Begunkov
  2022-09-05 22:51 ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Pavel Begunkov @ 2022-09-05 22:09 UTC
  To: io-uring; +Cc: Jens Axboe, asml.silence

Signed-off-by: Pavel Begunkov <[email protected]>
---

Doc writing is not my strongest side, comments are welcome.

 man/io_uring_enter.2 | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
index 1a9311e..7fd275c 100644
--- a/man/io_uring_enter.2
+++ b/man/io_uring_enter.2
@@ -1059,6 +1059,50 @@ value being passed in. This request type can be used to either just wake or
 interrupt anyone waiting for completions on the target ring, or it can be used
 to pass messages via the two fields. Available since 5.18.
 
+.TP
+.B IORING_OP_SEND_ZC
+Issue the zerocopy equivalent of a
+.BR send(2)
+system call. It's similar to IORING_OP_SEND, but when the
+.I flags
+field of the
+.I "struct io_uring_cqe"
+contains IORING_CQE_F_MORE, the userspace should expect a second cqe, a.k.a.
+notification, and until then it should not modify data in the buffer. The
+notification will have the same
+.I user_data
+as the first one and its
+.I flags
+field will contain the
+.I IORING_CQE_F_NOTIF
+flag. It's guaranteed that IORING_CQE_F_MORE is set IFF the result is
+non-negative.
+.I fd
+must be set to the socket file descriptor,
+.I addr
+must contain a pointer to the buffer,
+.I len
+denotes the length of the buffer to send, and
+.I msg_flags
+holds the flags associated with the system call. When
+.I addr2
+is non-zero it points to the address of the target with
+.I addr_len
+specifying its size, turning the request into a
+.BR sendto(2)
+system call equivalent.
+
+.B IORING_OP_SEND_ZC
+tries to avoid making intermediate data copies but still may fall back to
+copying. Furthermore, zerocopy is not always faster, especially when the
+per-request payload size is small. The two completion model is needed because
+the kernel might hold on to buffers for a long time, e.g. waiting for a TCP ACK,
+and having a separate cqe for request completions allows the userspace to push
+more data without extra delays. Note, notifications don't guarantee that the
+data has been or will ever be received by the other endpoint.
+
+Available since 5.20.
+
 .PP
 The
 .I flags
-- 
2.37.2



* Re: [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC
  2022-09-05 22:09 [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC Pavel Begunkov
@ 2022-09-05 22:51 ` Jens Axboe
  2022-09-06 10:19   ` Pavel Begunkov
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2022-09-05 22:51 UTC
  To: Pavel Begunkov, io-uring

On 9/5/22 4:09 PM, Pavel Begunkov wrote:
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> 
> Doc writing is not my strongest side, comments are welcome.
> 
>  man/io_uring_enter.2 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
> index 1a9311e..7fd275c 100644
> --- a/man/io_uring_enter.2
> +++ b/man/io_uring_enter.2
> @@ -1059,6 +1059,50 @@ value being passed in. This request type can be used to either just wake or
>  interrupt anyone waiting for completions on the target ring, or it can be used
>  to pass messages via the two fields. Available since 5.18.
>  
> +.TP
> +.B IORING_OP_SEND_ZC
> +Issue the zerocopy equivalent of a
> +.BR send(2)
> +system call. It's similar to IORING_OP_SEND, but when the
> +.I flags
> +field of the
> +.I "struct io_uring_cqe"
> +contains IORING_CQE_F_MORE, the userspace should expect a second cqe, a.k.a.
> +notification, and until then it should not modify data in the buffer. The
> +notification will have the same
> +.I user_data
> +as the first one and its
> +.I flags
> +field will contain the
> +.I IORING_CQE_F_NOTIF
> +flag. It's guaranteed that IORING_CQE_F_MORE is set IFF the result is
> +non-negative.
> +.I fd
> +must be set to the socket file descriptor,
> +.I addr
> +must contain a pointer to the buffer,
> +.I len
> +denotes the length of the buffer to send, and
> +.I msg_flags
> +holds the flags associated with the system call. When
> +.I addr2
> +is non-zero it points to the address of the target with
> +.I addr_len
> +specifying its size, turning the request into a
> +.BR sendto(2)
> +system call equivalent.
> +
> +.B IORING_OP_SEND_ZC
> +tries to avoid making intermediate data copies but still may fall back to
> +copying. Furthermore, zerocopy is not always faster, especially when the
> +per-request payload size is small. The two completion model is needed because
> +the kernel might hold on to buffers for a long time, e.g. waiting for a TCP ACK,
> +and having a separate cqe for request completions allows the userspace to push
> +more data without extra delays. Note, notifications don't guarantee that the
> +data has been or will ever be received by the other endpoint.

I'd probably reorder this a bit to introduce it with the fact that
it's like SEND, but zero-copy. Then explain the mechanics of how MORE is
set for the 2 stage completion notification if zc is done. I can shuffle
it around a bit if you want me to - just let me know!

> +Available since 5.20.

Should be 6.0 here.

-- 
Jens Axboe


* Re: [PATCH liburing] man/io_uring_enter.2: document IORING_OP_SEND_ZC
  2022-09-05 22:51 ` Jens Axboe
@ 2022-09-06 10:19   ` Pavel Begunkov
  0 siblings, 0 replies; 3+ messages in thread
From: Pavel Begunkov @ 2022-09-06 10:19 UTC
  To: Jens Axboe, io-uring

On 9/5/22 23:51, Jens Axboe wrote:
> On 9/5/22 4:09 PM, Pavel Begunkov wrote:
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>
>> Doc writing is not my strongest side, comments are welcome.
>>
>>   man/io_uring_enter.2 | 44 ++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 44 insertions(+)
>>
>> diff --git a/man/io_uring_enter.2 b/man/io_uring_enter.2
>> index 1a9311e..7fd275c 100644
>> --- a/man/io_uring_enter.2
>> +++ b/man/io_uring_enter.2
>> @@ -1059,6 +1059,50 @@ value being passed in. This request type can be used to either just wake or
>>   interrupt anyone waiting for completions on the target ring, or it can be used
>>   to pass messages via the two fields. Available since 5.18.
>>   
>> +.TP
>> +.B IORING_OP_SEND_ZC
>> +Issue the zerocopy equivalent of a
>> +.BR send(2)
>> +system call. It's similar to IORING_OP_SEND, but when the
>> +.I flags
>> +field of the
>> +.I "struct io_uring_cqe"
>> +contains IORING_CQE_F_MORE, the userspace should expect a second cqe, a.k.a.
>> +notification, and until then it should not modify data in the buffer. The
>> +notification will have the same
>> +.I user_data
>> +as the first one and its
>> +.I flags
>> +field will contain the
>> +.I IORING_CQE_F_NOTIF
>> +flag. It's guaranteed that IORING_CQE_F_MORE is set IFF the result is
>> +non-negative.
>> +.I fd
>> +must be set to the socket file descriptor,
>> +.I addr
>> +must contain a pointer to the buffer,
>> +.I len
>> +denotes the length of the buffer to send, and
>> +.I msg_flags
>> +holds the flags associated with the system call. When
>> +.I addr2
>> +is non-zero it points to the address of the target with
>> +.I addr_len
>> +specifying its size, turning the request into a
>> +.BR sendto(2)
>> +system call equivalent.
>> +
>> +.B IORING_OP_SEND_ZC
>> +tries to avoid making intermediate data copies but still may fall back to
>> +copying. Furthermore, zerocopy is not always faster, especially when the
>> +per-request payload size is small. The two completion model is needed because
>> +the kernel might hold on to buffers for a long time, e.g. waiting for a TCP ACK,
>> +and having a separate cqe for request completions allows the userspace to push
>> +more data without extra delays. Note, notifications don't guarantee that the
>> +data has been or will ever be received by the other endpoint.
> 
> I'd probably reorder this a bit to introduce it with the fact that
> it's like SEND, but zero-copy. Then explain the mechanics of how MORE is
> set for the 2 stage completion notification if zc is done. I can shuffle
> it around a bit if you want me to - just let me know!

I don't mind you editing it at all, makes my life easier. Anyway,
sent out a v2, let's see if it reads better.

-- 
Pavel Begunkov

