From: Pavel Begunkov <[email protected]>
To: Stefan Metzmacher <[email protected]>, [email protected]
Cc: Jens Axboe <[email protected]>, Dylan Yudaken <[email protected]>
Subject: Re: [RFC 2/2] io_uring/net: allow to override notification tag
Date: Mon, 22 Aug 2022 12:49:39 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 8/19/22 13:36, Stefan Metzmacher wrote:
[...]
>>>>> What do you think? It would remove the whole notif slot complexity
>>>>> from callers using IORING_RECVSEND_NOTIF_FLUSH for every request anyway.
>>>>
>>>> The downside is that requests then need to be pretty large or it'll
>>>> lose in performance. Surely not a problem for 8MB per request, but
>>>> even 4KB won't suffice. And users may want to put smaller chunks
>>>> on the wire instead of waiting for more data, to let TCP handle
>>>> pacing and potentially improve latency by sending earlier.
>>>
>>> If this is optional applications can decide what fits better.
>>>
>>>> On the other hand, the one-notification-per-request idea mentioned
>>>> before can be extended to 1-2 CQEs per request, which is, interestingly,
>>>> the approach the zc send discussions started with.
>>>
>>> In order to make use of any of this I need some way
>>> to get 2 CQEs with user_data being the same or related.
>>
>> The idea described above will post 2 CQEs per request (mostly),
>> as you want, with an optional way to have only 1 CQE. My current
>> sentiment is to kill all the slot business, leave this 1-2 CQEs
>> per request and see whether there are users for whom it won't be
>> enough. It's just a slight deviation from what I wanted to push
>> as a complementary interface anyway.
>
> Ah, ok, removing the slot stuff again would be fine for me...
>
>>> The only benefit of slots is being able to avoid or
>>> batch additional CQEs, correct? Or is there more to it?
>>
>> CQE batching is the lesser problem; I'm more concerned with how
>> it interacts with the network. In short, it'll hugely underperform
>> with TCP if requests are not large enough.
>>
>> A simple bench with some hacks, localhost, TCP, run with:
>>
>> ./msg_zerocopy -6 -r tcp -s <size> &
>> ./io_uring_zerocopy_tx -6 -D "::1" -s <size> -m <0,2> tcp
>>
>>
>> non-zerocopy:
>> 4000B: tx=8711880 (MB=33233), tx/s=1742376 (MB/s=6646)
>> 16000B: tx=3196528 (MB=48775), tx/s=639305 (MB/s=9755)
>> 60000B: tx=1036536 (MB=59311), tx/s=207307 (MB/s=11862)
>>
>> zerocopy:
>> 4000B: tx=3003488 (MB=11457), tx/s=600697 (MB/s=2291)
>> 16000B: tx=2940296 (MB=44865), tx/s=588059 (MB/s=8973)
>> 60000B: tx=2621792 (MB=150020), tx/s=524358 (MB/s=30004)
>
> So with something between 16k and 60k we reach the point where
> ZC starts to be faster, correct?
For this setup -- yes, it should be somewhere around 16-20K. I
don't remember the numbers for real hardware, but I saw similar
tendencies there.
> Did you remove the loopback restriction as described in
> Documentation/networking/msg_zerocopy.rst ?
Right, otherwise it wouldn't outperform copies even with large payloads.
> Are the results similar when using ./msg_zerocopy -6 tcp -s <size>
> as client?
They shouldn't be: msg_zerocopy batches multiple requests into a single
(internal) notification and also exposes completions to userspace
differently.
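
For reference, msg_zerocopy reads completions off the socket's error
queue, and a single notification there can cover a whole range of sends.
Roughly, as a sketch following msg_zerocopy.rst (error handling trimmed,
assumes an IPv6 TCP socket as in the bench above):

/* Sketch: reading MSG_ZEROCOPY completions from the error queue. */
#include <stdio.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/errqueue.h>

static void read_zc_notifications(int fd)
{
	char control[128];
	struct msghdr msg = {
		.msg_control = control,
		.msg_controllen = sizeof(control),
	};
	struct cmsghdr *cm;

	if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
		return;

	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
		struct sock_extended_err *serr = (void *)CMSG_DATA(cm);

		if (cm->cmsg_level != SOL_IPV6 || cm->cmsg_type != IPV6_RECVERR)
			continue;
		if (serr->ee_errno || serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY)
			continue;
		/* one notification covers sends ee_info..ee_data, inclusive */
		printf("sends %u..%u completed\n", serr->ee_info, serr->ee_data);
	}
}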
> And the reason is some page pinning overhead from iov_iter_get_pages2()
> in __zerocopy_sg_from_iter()?
No, I was using registered buffers here, so instead of the
iov_iter_get_pages2() business the zerocopy path was going through
io_uring/net.c:io_sg_from_iter(). And in any case, pinning overhead
wouldn't drastically change the picture.
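
To make the registered-buffer part concrete, the userspace side looks
roughly like below. io_uring_prep_send_zc_fixed() is the helper name I'd
expect liburing to grow; treat it and the zc flag layout as assumptions,
since the exact uapi is what we're discussing here.

/* Sketch: zero-copy send from a registered (pre-pinned) buffer. */
#include <liburing.h>
#include <sys/uio.h>

static int send_zc_fixed(struct io_uring *ring, int sockfd,
			 void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct io_uring_sqe *sqe;

	/* pin the buffer once up front, so the send path goes through
	 * io_sg_from_iter() instead of iov_iter_get_pages2() per request */
	if (io_uring_register_buffers(ring, &iov, 1) < 0)
		return -1;

	sqe = io_uring_get_sqe(ring);
	io_uring_prep_send_zc_fixed(sqe, sockfd, buf, len, /*msg flags*/ 0,
				    /*zc flags*/ 0, /*buf index*/ 0);
	sqe->user_data = 0x1234;

	return io_uring_submit(ring);
}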
>> Reusing notifications with slots will change the picture.
>> And this has nothing to do with io_uring overhead like
>> CQE posting and so on.
>
> Hmm, I don't understand how the number of notif structures
> would have any impact. Is it related to io_sg_from_iter()?
It comes from the TCP stack being forced to start a new skbuff every
time it meets a new ubuf_info (i.e. a notification handle, for
simplicity). There is a slight bump in skb allocation overhead, but
the main problem seemingly comes from tcp_push() and the like,
feeding it down the stack. I don't think there is any fundamental
reason why it should work so much slower, but it might be
problematic from an engineering perspective. I'll ask around a bit,
or maybe look into it myself if I find time.
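
FWIW, tying it back to the 1-2 CQEs per request idea above, I'd expect
the completion side to look roughly like this from userspace. The
IORING_CQE_F_NOTIF flag name is my assumption; again, the final uapi is
exactly what this thread is about.

/* Sketch: consuming 1-2 CQEs per zero-copy send.  The first CQE carries
 * the send result; if IORING_CQE_F_MORE is set, a second CQE flagged
 * IORING_CQE_F_NOTIF with the same user_data arrives once the kernel no
 * longer needs the buffer. */
#include <liburing.h>

static void reap_send_zc(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (!io_uring_wait_cqe(ring, &cqe)) {
		if (cqe->flags & IORING_CQE_F_NOTIF) {
			/* notification: the buffer tied to cqe->user_data
			 * can be reused from this point on */
		} else {
			/* main completion: cqe->res is the send result; if
			 * IORING_CQE_F_MORE is set, keep the buffer alive
			 * until the matching notification CQE shows up */
		}
		io_uring_cqe_seen(ring, cqe);
	}
}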
--
Pavel Begunkov