public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: Stefan Metzmacher <[email protected]>, [email protected]
Cc: Jens Axboe <[email protected]>, Dylan Yudaken <[email protected]>
Subject: Re: [RFC 2/2] io_uring/net: allow to override notification tag
Date: Mon, 22 Aug 2022 12:49:39 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 8/19/22 13:36, Stefan Metzmacher wrote:
[...]
>>>>> What do you think? It would remove the whole notif slot complexity
>>>>> from callers using IORING_RECVSEND_NOTIF_FLUSH for every request anyway.
>>>>
>>>> The downside is that requests then need to be pretty large or it'll
>>>> lose in performance. Surely not a problem for 8MB per request, but
>>>> even 4KB won't suffice. And users may want to put smaller chunks
>>>> on the wire instead of waiting for more data, to let TCP handle
>>>> pacing and potentially improve latencies by sending earlier.
>>>
>>> If this is optional applications can decide what fits better.
>>>
>>>> On the other hand, the one-notification-per-request idea mentioned
>>>> before can be extended to 1-2 CQEs per request, which is, interestingly,
>>>> the approach the zc send discussions started with.
>>>
>>> In order to make use of any of this I need some way
>>> to get 2 CQEs with user_data being the same or related.
>>
>> The idea described above will post 2 CQEs (mostly) per request,
>> as you want, with an optional way to have only 1 CQE. My current
>> sentiment is to kill all the slot business, leave the 1-2 CQEs
>> per request and see if there are users for whom it won't be
>> enough. It's anyway just a slight deviation from what I wanted
>> to push as a complementary interface.
> 
> Ah, ok, removing the slot stuff again would be fine for me...
> 
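To make the 1-2 CQE idea a bit more concrete, here is a rough sketch of
how reaping could look from userspace, assuming the send completion is
flagged IORING_CQE_F_MORE and the trailing notification CQE is flagged
IORING_CQE_F_NOTIF (names as in the current proposal, details may still
change; the bookkeeping helpers are made up):

#include <liburing.h>
#include <stdint.h>

/* Hypothetical per-request bookkeeping, illustrative only */
static void send_done(uint64_t tag, int res) { (void)tag; (void)res; }
static void buffer_reusable(uint64_t tag)    { (void)tag; }

static void reap_cqes(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		if (cqe->flags & IORING_CQE_F_NOTIF) {
			/* second CQE: the stack no longer references the
			 * data, the buffer can be reused */
			buffer_reusable(cqe->user_data);
		} else {
			/* first CQE: result of the send itself; F_MORE
			 * means a notification CQE will follow later */
			send_done(cqe->user_data, cqe->res);
			if (!(cqe->flags & IORING_CQE_F_MORE))
				buffer_reusable(cqe->user_data);
		}
		io_uring_cqe_seen(ring, cqe);
	}
}
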
>>> The only benefit with slots is being able to avoid or
>>> batch additional CQEs, correct? Or is there more to it?
>>
>> CQE batching is a lesser problem, I'm more concerned about how
>> it interacts with the network stack. In short, it'll hugely underperform
>> with TCP if requests are not large enough.
>>
>> A simple bench with some hacks, localhost, TCP, run by
>>
>> ./msg_zerocopy -6 -r tcp -s <size> &
>> ./io_uring_zerocopy_tx -6 -D "::1" -s <size> -m <0,2> tcp
>>
>>
>> non-zerocopy:
>> 4000B:  tx=8711880 (MB=33233), tx/s=1742376 (MB/s=6646)
>> 16000B: tx=3196528 (MB=48775), tx/s=639305 (MB/s=9755)
>> 60000B: tx=1036536 (MB=59311), tx/s=207307 (MB/s=11862)
>>
>> zerocopy:
>> 4000B:  tx=3003488 (MB=11457), tx/s=600697 (MB/s=2291)
>> 16000B: tx=2940296 (MB=44865), tx/s=588059 (MB/s=8973)
>> 60000B: tx=2621792 (MB=150020), tx/s=524358 (MB/s=30004)
> 
> So with something between 16k and 60k we reach the point where
> ZC starts to be faster, correct?

For this setup -- yes, it should be somewhere around 16-20K. I
don't remember the numbers for real hw, but I saw similar
tendencies.

> Did you remove the loopback restriction as described in
> Documentation/networking/msg_zerocopy.rst ?

Right, it wouldn't outperform the copy path even with large payloads
otherwise.

> Are the results similar when using ./msg_zerocopy -6 tcp -s <size>
> as client?

They shouldn't be: msg_zerocopy batches multiple requests into a single
(internal) notification and also exposes completions to userspace
differently.
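
Roughly, following Documentation/networking/msg_zerocopy.rst, its
completions come back on the socket error queue, and one notification
can cover a whole range of sends, something like this sketch (error
handling trimmed, mark_range_done() is made up):

#include <string.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

/* Drain MSG_ZEROCOPY completions from the error queue: a single
 * sock_extended_err covers the send range ee_info..ee_data, so many
 * sends may collapse into one notification. */
static void drain_zc_completions(int fd)
{
	char control[128];
	struct msghdr msg;

	for (;;) {
		struct cmsghdr *cm;
		struct sock_extended_err *serr;

		memset(&msg, 0, sizeof(msg));
		msg.msg_control = control;
		msg.msg_controllen = sizeof(control);

		if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
			break;	/* EAGAIN: nothing pending */

		for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
			serr = (struct sock_extended_err *)CMSG_DATA(cm);
			if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY)
				continue;
			/* sends numbered ee_info..ee_data completed */
			/* mark_range_done(serr->ee_info, serr->ee_data); */
		}
	}
}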

> And the reason is some page pinning overhead from iov_iter_get_pages2()
> in __zerocopy_sg_from_iter()?

No, I was using registered buffers here, so instead of the
iov_iter_get_pages2() business zerocopy was going through
io_uring/net.c:io_sg_from_iter(). And in any case the pinning overhead
wouldn't drastically change the picture.
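
For reference, the registered-buffer path looks roughly like this with
liburing (just a sketch; it assumes the send-zc fixed-buffer helper from
recent liburing, and the single-buffer setup and user_data are made up):

#include <errno.h>
#include <liburing.h>

/* Register the payload buffer once up front, then issue zerocopy sends
 * that refer to it by index, avoiding per-request page pinning. */
static int setup_and_send(struct io_uring *ring, int sockfd,
			  void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct io_uring_sqe *sqe;
	int ret;

	/* setup, normally done once at start */
	ret = io_uring_register_buffers(ring, &iov, 1);
	if (ret)
		return ret;

	/* per-request part: zerocopy send from registered buffer 0 */
	sqe = io_uring_get_sqe(ring);
	if (!sqe)
		return -EBUSY;
	io_uring_prep_send_zc_fixed(sqe, sockfd, buf, len, 0, 0, 0);
	sqe->user_data = 42;

	return io_uring_submit(ring);
}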

>> Reusing notifications with slots will change the picture,
>> and this has nothing to do with io_uring overhead like
>> CQE posting and so on.
> 
> Hmm, I don't understand how the number of notif structures
> would have any impact. Is it related to io_sg_from_iter()?

It comes from the TCP stack being forced to start a new skbuff every
time it meets a new ubuf_info (i.e. a notification handle, for
simplicity). There is a slight bump in skb allocation overhead,
but the main problem seemingly comes from tcp_push and the like,
i.e. feeding the data down the stack. I don't think there is any
fundamental reason why it should be so much slower, but it might
be problematic from an engineering perspective. I'll ask around a
bit, or maybe look into it myself if I find time.

-- 
Pavel Begunkov


Thread overview: 14+ messages
2022-08-16  7:41 [RFC 0/2] io_uring zc notification tag override Pavel Begunkov
2022-08-16  7:42 ` [RFC 1/2] io_uring/notif: change notif CQE uapi format Pavel Begunkov
2022-08-16  8:14   ` Stefan Metzmacher
2022-08-16  7:42 ` [RFC 2/2] io_uring/net: allow to override notification tag Pavel Begunkov
2022-08-16  8:23   ` Stefan Metzmacher
2022-08-17 12:42     ` Pavel Begunkov
2022-08-18 18:13       ` Stefan Metzmacher
2022-08-19 11:42         ` Pavel Begunkov
2022-08-19 12:36           ` Stefan Metzmacher
2022-08-22 11:49             ` Pavel Begunkov [this message]
2022-08-16  8:37   ` Dylan Yudaken
2022-08-17 10:48     ` Pavel Begunkov
2022-08-17 12:04       ` Dylan Yudaken
2022-08-17 12:44         ` Pavel Begunkov
