From: Pavel Begunkov <[email protected]>
To: Stefan Metzmacher <[email protected]>, [email protected]
Cc: Jens Axboe <[email protected]>, Dylan Yudaken <[email protected]>
Subject: Re: [RFC 2/2] io_uring/net: allow to override notification tag
Date: Wed, 17 Aug 2022 13:42:01 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 8/16/22 09:23, Stefan Metzmacher wrote:
> Am 16.08.22 um 09:42 schrieb Pavel Begunkov:
>> Considering limited amount of slots some users struggle with
>> registration time notification tag assignment as it's hard to manage
>> notifications using sequence numbers. Add a simple feature that copies
>> sqe->user_data of a send(+flush) request into the notification CQE it
>> flushes (and only when it's flushes).
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> include/uapi/linux/io_uring.h | 4 ++++
>> io_uring/net.c | 6 +++++-
>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index 20368394870e..91e7944c9c78 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -280,11 +280,15 @@ enum io_uring_op {
>> *
>> * IORING_RECVSEND_NOTIF_FLUSH Flush a notification after a successful
>> * successful. Only for zerocopy sends.
>> + *
>> + * IORING_RECVSEND_NOTIF_COPY_TAG Copy request's user_data into the notification
>> + * completion even if it's flushed.
>> */
>> #define IORING_RECVSEND_POLL_FIRST (1U << 0)
>> #define IORING_RECV_MULTISHOT (1U << 1)
>> #define IORING_RECVSEND_FIXED_BUF (1U << 2)
>> #define IORING_RECVSEND_NOTIF_FLUSH (1U << 3)
>> +#define IORING_RECVSEND_NOTIF_COPY_TAG (1U << 4)
>> /* cqe->res mask for extracting the notification sequence number */
>> #define IORING_NOTIF_SEQ_MASK 0xFFFFU
>> diff --git a/io_uring/net.c b/io_uring/net.c
>> index bd3fad9536ef..4d271a269979 100644
>> --- a/io_uring/net.c
>> +++ b/io_uring/net.c
>> @@ -858,7 +858,9 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>> zc->flags = READ_ONCE(sqe->ioprio);
>> if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST |
>> - IORING_RECVSEND_FIXED_BUF | IORING_RECVSEND_NOTIF_FLUSH))
>> + IORING_RECVSEND_FIXED_BUF |
>> + IORING_RECVSEND_NOTIF_FLUSH |
>> + IORING_RECVSEND_NOTIF_COPY_TAG))
>> return -EINVAL;
>> if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
>> unsigned idx = READ_ONCE(sqe->buf_index);
>> @@ -1024,6 +1026,8 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
>> if (ret == -ERESTARTSYS)
>> ret = -EINTR;
>> } else if (zc->flags & IORING_RECVSEND_NOTIF_FLUSH) {
>> + if (zc->flags & IORING_RECVSEND_NOTIF_COPY_TAG)
>> + notif->cqe.user_data = req->cqe.user_data;
>> io_notif_slot_flush_submit(notif_slot, 0);
>> }
>
> This would work but it seems to be confusing.
>
> Can't we have a slot-less mode, with slot_idx==U16_MAX,
> where we always allocate a new notif for each request,
> this would then get the same user_data and would be referenced on the
> request in order to reuse the same notif on an async retry after a short send.
Ok, retries may make slots managing much harder, let me think
> And this notif will always be flushed at the end of the request.
>
> This:
>
> struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
> struct io_notif_slot *slot)
>
> would change to:
>
> struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
> __u64 cqe_user_data,
> __s32 cqe_res)
>
>
> and:
>
> void io_notif_slot_flush(struct io_notif_slot *slot) __must_hold(&ctx->uring_lock)
>
> (__must_hold looks wrong there...)
Nope, it should be there
> could just be:
>
> void io_notif_flush(struct io_notif_*notif)
>
> What do you think? It would remove the whole notif slot complexity
> from caller using IORING_RECVSEND_NOTIF_FLUSH for every request anyway.
The downside is that requests then should be pretty large or it'll
lose in performance. Surely not a problem for 8MB per request but
even 4KB won't suffice. And users may want to put in smaller chunks
on the wire instead of waiting for mode data to let tcp handle
pacing and potentially improve latencies by sending earlier.
On the other hand that one notification per request idea mentioned
before can extended to 1-2 CQEs per request, which is interestingly
the approach zc send discussions started with.
--
Pavel Begunkov
next prev parent reply other threads:[~2022-08-17 12:44 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-16 7:41 [RFC 0/2] io_uring zc notification tag override Pavel Begunkov
2022-08-16 7:42 ` [RFC 1/2] io_uring/notif: change notif CQE uapi format Pavel Begunkov
2022-08-16 8:14 ` Stefan Metzmacher
2022-08-16 7:42 ` [RFC 2/2] io_uring/net: allow to override notification tag Pavel Begunkov
2022-08-16 8:23 ` Stefan Metzmacher
2022-08-17 12:42 ` Pavel Begunkov [this message]
2022-08-18 18:13 ` Stefan Metzmacher
2022-08-19 11:42 ` Pavel Begunkov
2022-08-19 12:36 ` Stefan Metzmacher
2022-08-22 11:49 ` Pavel Begunkov
2022-08-16 8:37 ` Dylan Yudaken
2022-08-17 10:48 ` Pavel Begunkov
2022-08-17 12:04 ` Dylan Yudaken
2022-08-17 12:44 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox