From: Pavel Begunkov <[email protected]>
To: Stefan Metzmacher <[email protected]>, [email protected]
Cc: Jens Axboe <[email protected]>, Dylan Yudaken <[email protected]>
Subject: Re: [RFC 2/2] io_uring/net: allow to override notification tag
Date: Fri, 19 Aug 2022 12:42:28 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 8/18/22 19:13, Stefan Metzmacher wrote:
> Am 17.08.22 um 14:42 schrieb Pavel Begunkov:
>> On 8/16/22 09:23, Stefan Metzmacher wrote:
>>> Am 16.08.22 um 09:42 schrieb Pavel Begunkov:
>>>> Considering the limited number of slots, some users struggle with
>>>> registration-time notification tag assignment, as it's hard to manage
>>>> notifications using sequence numbers. Add a simple feature that copies
>>>> sqe->user_data of a send(+flush) request into the notification CQE it
>>>> flushes (and only when it flushes one).
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>>> include/uapi/linux/io_uring.h | 4 ++++
>>>> io_uring/net.c | 6 +++++-
>>>> 2 files changed, 9 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>> index 20368394870e..91e7944c9c78 100644
>>>> --- a/include/uapi/linux/io_uring.h
>>>> +++ b/include/uapi/linux/io_uring.h
>>>> @@ -280,11 +280,15 @@ enum io_uring_op {
>>>> *
>>>> * IORING_RECVSEND_NOTIF_FLUSH Flush a notification after a successful
>>>> * send. Only for zerocopy sends.
>>>> + *
>>>> + * IORING_RECVSEND_NOTIF_COPY_TAG Copy request's user_data into the notification
>>>> + * completion even if it's flushed.
>>>> */
>>>> #define IORING_RECVSEND_POLL_FIRST (1U << 0)
>>>> #define IORING_RECV_MULTISHOT (1U << 1)
>>>> #define IORING_RECVSEND_FIXED_BUF (1U << 2)
>>>> #define IORING_RECVSEND_NOTIF_FLUSH (1U << 3)
>>>> +#define IORING_RECVSEND_NOTIF_COPY_TAG (1U << 4)
>>>> /* cqe->res mask for extracting the notification sequence number */
>>>> #define IORING_NOTIF_SEQ_MASK 0xFFFFU
>>>> diff --git a/io_uring/net.c b/io_uring/net.c
>>>> index bd3fad9536ef..4d271a269979 100644
>>>> --- a/io_uring/net.c
>>>> +++ b/io_uring/net.c
>>>> @@ -858,7 +858,9 @@ int io_sendzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>> zc->flags = READ_ONCE(sqe->ioprio);
>>>> if (zc->flags & ~(IORING_RECVSEND_POLL_FIRST |
>>>> - IORING_RECVSEND_FIXED_BUF | IORING_RECVSEND_NOTIF_FLUSH))
>>>> + IORING_RECVSEND_FIXED_BUF |
>>>> + IORING_RECVSEND_NOTIF_FLUSH |
>>>> + IORING_RECVSEND_NOTIF_COPY_TAG))
>>>> return -EINVAL;
>>>> if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
>>>> unsigned idx = READ_ONCE(sqe->buf_index);
>>>> @@ -1024,6 +1026,8 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
>>>> if (ret == -ERESTARTSYS)
>>>> ret = -EINTR;
>>>> } else if (zc->flags & IORING_RECVSEND_NOTIF_FLUSH) {
>>>> + if (zc->flags & IORING_RECVSEND_NOTIF_COPY_TAG)
>>>> + notif->cqe.user_data = req->cqe.user_data;
>>>> io_notif_slot_flush_submit(notif_slot, 0);
>>>> }
>>>
>>> This would work but it seems to be confusing.
>>>
>>> Can't we have a slot-less mode, with slot_idx==U16_MAX,
>>> where we always allocate a new notif for each request?
>>> This would then get the same user_data and would be referenced on the
>>> request in order to reuse the same notif on an async retry after a short send.
>>
>> Ok, retries may make slot management much harder, let me think
>
> With retries it would be much saner to use the same
> notif for the whole request. So keeping it referenced
> as zc->notif might be a way, maybe doing that in the _prep
> function in order to do it just once; then we don't need
> zc->slot_idx anymore.
Even though it's possible atm with some userspace consideration,
it definitely should be patched up.
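Roughly something like this, as an untested sketch (zc->notif and the
two-argument io_alloc_notif() below are made up, following the signature
change you suggest further down):

static int io_sendzc_prep_notif(struct io_kiocb *req, struct io_sendzc *zc)
{
	/* one notification per request, inheriting the request's user_data */
	zc->notif = io_alloc_notif(req->ctx, req->cqe.user_data, 0);
	if (!zc->notif)
		return -ENOMEM;
	/* an async retry after a short send keeps reusing the same notif */
	return 0;
}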
>>> And this notif will always be flushed at the end of the request.
>>>
>>> This:
>>>
>>> struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
>>> struct io_notif_slot *slot)
>>>
>>> would change to:
>>>
>>> struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
>>> __u64 cqe_user_data,
>>> __s32 cqe_res)
>>>
>>>
>>> and:
>>>
>>> void io_notif_slot_flush(struct io_notif_slot *slot) __must_hold(&ctx->uring_lock)
>>>
>>> (__must_hold looks wrong there...)
>>
>> Nope, it should be there
>
> Shouldn't it be something like
> __must_hold(&slot->notif->ctx->uring_lock)
>
> There is no 'ctx' argument.
Ah, in this sense, agree
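i.e. something like this (just the prototype, not compile-tested):

void io_notif_slot_flush(struct io_notif_slot *slot)
	__must_hold(&slot->notif->ctx->uring_lock);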
>>> What do you think? It would remove the whole notif slot complexity
>>> from callers using IORING_RECVSEND_NOTIF_FLUSH for every request anyway.
>>
>> The downside is that requests then should be pretty large or it'll
>> lose out on performance. Surely not a problem for 8MB per request, but
>> even 4KB won't suffice. And users may want to put smaller chunks
>> on the wire instead of waiting for more data, to let TCP handle
>> pacing and potentially improve latencies by sending earlier.
>
> If this is optional applications can decide what fits better.
>
>> On the other hand, that one-notification-per-request idea mentioned
>> before can be extended to 1-2 CQEs per request, which is interestingly
>> the approach zc send discussions started with.
>
> In order to make use of any of this I need some way
> to get 2 CQEs with user_data being the same or related.
The idea described above will (mostly) post 2 CQEs per request,
as you want, with an optional way to have only 1 CQE. My current
sentiment is to kill all the slot business, leave this 1-2 CQE
per request and see if there are users for whom it won't be
enough. It's anyway just a slight deviation from what I wanted
to push as a complementary interface.
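From the userspace side it would look roughly like this (sketch only;
is_notif_cqe() is a stand-in for however notification CQEs end up being
told apart from the send CQE, which is still to be decided):

struct io_uring_cqe *cqe;

while (io_uring_peek_cqe(&ring, &cqe) == 0) {
	struct my_send *s = (struct my_send *)(uintptr_t)cqe->user_data;

	if (is_notif_cqe(cqe))		/* hypothetical check */
		s->buf_reusable = true;	/* data is out of the buffer */
	else
		s->sent_res = cqe->res;	/* result of the send itself */
	io_uring_cqe_seen(&ring, cqe);
}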
> The only benefit with slots is being able to avoid or
> batch additional CQEs, correct? Or is there more to it?
CQE batching is a lesser problem; I'm more concerned about how
it interacts with the network. In short, it'll hugely underperform
with TCP if requests are not large enough.
A simple bench with some hacks, over localhost TCP, run by:
./msg_zerocopy -6 -r tcp -s <size> &
./io_uring_zerocopy_tx -6 -D "::1" -s <size> -m <0,2> tcp
non-zerocopy:
4000B: tx=8711880 (MB=33233), tx/s=1742376 (MB/s=6646)
16000B: tx=3196528 (MB=48775), tx/s=639305 (MB/s=9755)
60000B: tx=1036536 (MB=59311), tx/s=207307 (MB/s=11862)
zerocopy:
4000B: tx=3003488 (MB=11457), tx/s=600697 (MB/s=2291)
16000B: tx=2940296 (MB=44865), tx/s=588059 (MB/s=8973)
60000B: tx=2621792 (MB=150020), tx/s=524358 (MB/s=30004)
Reusing notifications with slots will change the picture.
And this has nothing to do with io_uring overhead like
CQE posting and so on.
--
Pavel Begunkov
Thread overview: 14+ messages
2022-08-16 7:41 [RFC 0/2] io_uring zc notification tag override Pavel Begunkov
2022-08-16 7:42 ` [RFC 1/2] io_uring/notif: change notif CQE uapi format Pavel Begunkov
2022-08-16 8:14 ` Stefan Metzmacher
2022-08-16 7:42 ` [RFC 2/2] io_uring/net: allow to override notification tag Pavel Begunkov
2022-08-16 8:23 ` Stefan Metzmacher
2022-08-17 12:42 ` Pavel Begunkov
2022-08-18 18:13 ` Stefan Metzmacher
2022-08-19 11:42 ` Pavel Begunkov [this message]
2022-08-19 12:36 ` Stefan Metzmacher
2022-08-22 11:49 ` Pavel Begunkov
2022-08-16 8:37 ` Dylan Yudaken
2022-08-17 10:48 ` Pavel Begunkov
2022-08-17 12:04 ` Dylan Yudaken
2022-08-17 12:44 ` Pavel Begunkov