From: Pavel Begunkov <[email protected]>
To: Willem de Bruijn <[email protected]>
Cc: [email protected], [email protected],
[email protected], Jakub Kicinski <[email protected]>,
Jonathan Lemon <[email protected]>,
"David S . Miller" <[email protected]>,
Eric Dumazet <[email protected]>,
Hideaki YOSHIFUJI <[email protected]>,
David Ahern <[email protected]>, Jens Axboe <[email protected]>
Subject: Re: [RFC 00/12] io_uring zerocopy send
Date: Thu, 2 Dec 2021 16:25:19 +0000
Message-ID: <[email protected]>
In-Reply-To: <CA+FuTSeP-W-ePV1EkWMmD4Ycsfq9viYdtyfDbUW3LXTc2q+BHQ@mail.gmail.com>
On 12/2/21 00:36, Willem de Bruijn wrote:
>>>>> 1) we pass a bvec, so no page table walks.
>>>>> 2) zerocopy_sg_from_iter() is just slow, adding a bvec optimised version
>>>>> still doing page get/put (see 4/12) slashed 4-5%.
>>>>> 3) avoiding get_page/put_page in 5/12
>>>>> 4) completion events are posted into io_uring's CQ, so no
>>>>> extra recvmsg for getting events
>>>>> 5) no poll(2) in the code because of io_uring
>>>>> 6) lot of time is spent in sock_omalloc()/free allocating ubuf_info.
>>>>> io_uring caches the structures reducing it to nearly zero-overhead.
>>>>
>>>> Nice set of complementary optimizations.
>>>>
>>>> We have looked at adding some of those as independent additions to
>>>> msg_zerocopy before, such as long-term pinned regions. One issue with
>>>> that is that the pages must remain until the request completes,
>>>> regardless of whether the calling process is alive. So it cannot rely
>>>> on a pinned range held by a process only.
>>>>
>>>> If feasible, it would be preferable if the optimizations can be added
>>>> to msg_zerocopy directly, rather than adding a dependency on io_uring
>>>> to make use of them. But not sure how feasible that is. For some, like
>>>> 4 and 5, the answer is clearly it isn't. 6, it probably is?
>>
>> Forgot about 6), io_uring uses the fact that submissions are
>> done under an per ring mutex, and completions are under a per
>> ring spinlock, so there are two lists for them and no extra
>> locking. Lists are spliced in a batched manner, so it's
>> 1 spinlock per N (e.g. 32) cached ubuf_info's allocations.
>>
>> Any similar guarantees for sockets?
>
> For datagrams it might matter, not sure if it would show up in a
> profile. The current notification mechanism is quite a bit more
> heavyweight than any form of fixed ubuf pool.

Just to give an idea of what I'm seeing in profiles: while testing

  3 | io_uring (@flush=false, nr_reqs=1) | 96534 | 2.03

I found that removing one extra smp_mb() per request in io_uring
gave around +0.65% of throughput (quick testing). In profiles, the
function it was removed from dropped from 0.93% to 0.09%.

From what I see, alloc+free takes 6-10% for 64KB UDP. It would be
great to have something for MSG_ZEROCOPY, but if that adds
additional locking/atomics, honestly I'd prefer to keep it separate
from io_uring's caching.

I also hope we can optimise generic paths at some point, and the
faster they get, the more such additional locking will hurt, pretty
much as it was with the block layer.

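
To make the two-list caching from 6) a bit more concrete, here is a
rough userspace sketch. It is not the io_uring code: struct cached_obj
stands in for ubuf_info, the C11 atomic_flag stands in for the per-ring
spinlock, and all names (obj_cache, cache_alloc, cache_free,
demo_batched_splice) are made up for illustration. The submission side
is assumed to be serialised by the caller (the per-ring mutex in the
description above), so it touches its own list without extra locking
and only takes the lock when that list runs dry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH 32	/* illustrative batch size, as in "e.g. 32" above */

/* Hypothetical stand-in for struct ubuf_info; only the link matters. */
struct cached_obj {
	struct cached_obj *next;
};

struct obj_cache {
	/* Used only by the submitter, which is already serialised. */
	struct cached_obj *submit_list;
	/* Filled by completions, protected by compl_lock. */
	struct cached_obj *compl_list;
	atomic_flag compl_lock;
	unsigned long splices;	/* lock round trips taken by cache_alloc() */
};

static void cache_init(struct obj_cache *c)
{
	c->submit_list = NULL;
	c->compl_list = NULL;
	atomic_flag_clear(&c->compl_lock);
	c->splices = 0;
}

static void lock_compl(struct obj_cache *c)
{
	while (atomic_flag_test_and_set_explicit(&c->compl_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void unlock_compl(struct obj_cache *c)
{
	atomic_flag_clear_explicit(&c->compl_lock, memory_order_release);
}

/* Submission side: the caller is assumed to hold the submission mutex. */
static struct cached_obj *cache_alloc(struct obj_cache *c)
{
	struct cached_obj *obj;

	if (!c->submit_list) {
		/* One lock round trip moves the whole completed batch over. */
		lock_compl(c);
		c->submit_list = c->compl_list;
		c->compl_list = NULL;
		unlock_compl(c);
		c->splices++;
	}
	obj = c->submit_list;
	if (!obj)
		return malloc(sizeof(*obj));	/* cache empty: use the allocator */
	c->submit_list = obj->next;
	return obj;
}

/*
 * Completion side. In the scheme described above, completions already
 * run under the per-ring spinlock, so this would not be an extra lock.
 */
static void cache_free(struct obj_cache *c, struct cached_obj *obj)
{
	lock_compl(c);
	obj->next = c->compl_list;
	c->compl_list = obj;
	unlock_compl(c);
}

/* Recycle BATCH objects, then report how many splices the allocs cost. */
static unsigned long demo_batched_splice(void)
{
	struct obj_cache c;
	struct cached_obj objs[BATCH];
	int i;

	cache_init(&c);
	for (i = 0; i < BATCH; i++)
		cache_free(&c, &objs[i]);
	for (i = 0; i < BATCH; i++) {
		struct cached_obj *obj = cache_alloc(&c);

		assert(obj != NULL);
	}
	return c.splices;
}
```

In this sketch, BATCH frees followed by BATCH allocations take the lock
on the allocation side once, which is the "1 spinlock per N" pattern.
Doing the same for MSG_ZEROCOPY needs an equivalent serialisation
guarantee on the socket side, otherwise both lists need their own
locking or atomics.
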
> For TCP this matters less, as multiple sends are not needed and
> completions are coalesced, because in order.
>
--
Pavel Begunkov
Thread overview: 40+ messages
2021-11-30 15:18 [RFC 00/12] io_uring zerocopy send Pavel Begunkov
2021-11-30 15:18 ` [RFC 01/12] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2021-11-30 15:18 ` [RFC 02/12] skbuff: pass a struct ubuf_info in msghdr Pavel Begunkov
2021-11-30 15:18 ` [RFC 03/12] net/udp: add support msgdr::msg_ubuf Pavel Begunkov
2021-11-30 15:18 ` [RFC 04/12] net: add zerocopy_sg_from_iter for bvec Pavel Begunkov
2021-11-30 15:18 ` [RFC 05/12] net: optimise page get/free for bvec zc Pavel Begunkov
2021-12-01 19:20 ` Jonathan Lemon
2021-12-01 20:17 ` Pavel Begunkov
2021-11-30 15:18 ` [RFC 06/12] io_uring: add send notifiers registration Pavel Begunkov
2021-11-30 15:18 ` [RFC 07/12] io_uring: infrastructure for send zc notifications Pavel Begunkov
2021-11-30 15:18 ` [RFC 08/12] io_uring: wire send zc request type Pavel Begunkov
2021-11-30 15:18 ` [RFC 09/12] io_uring: add an option to flush zc notifications Pavel Begunkov
2021-11-30 15:18 ` [RFC 10/12] io_uring: opcode independent fixed buf import Pavel Begunkov
2021-11-30 15:18 ` [RFC 11/12] io_uring: sendzc with fixed buffers Pavel Begunkov
2021-11-30 15:19 ` [RFC 12/12] io_uring: cache struct ubuf_info Pavel Begunkov
2021-12-01 3:10 ` [RFC 00/12] io_uring zerocopy send David Ahern
2021-12-01 15:32 ` Pavel Begunkov
2021-12-01 17:57 ` David Ahern
[not found] ` <[email protected]>
2021-12-01 19:20 ` David Ahern
2021-12-01 20:15 ` Pavel Begunkov
2021-12-01 21:51 ` Martin KaFai Lau
2021-12-01 22:35 ` David Ahern
2021-12-01 23:07 ` Martin KaFai Lau
2021-12-01 23:18 ` Pavel Begunkov
2021-12-02 15:48 ` Pavel Begunkov
2021-12-02 17:40 ` Martin KaFai Lau
2021-12-01 20:42 ` Pavel Begunkov
2021-12-01 14:31 ` Pavel Begunkov
2021-12-01 17:49 ` David Ahern
2021-12-01 19:59 ` Pavel Begunkov
2021-12-01 18:10 ` Willem de Bruijn
2021-12-01 19:59 ` Pavel Begunkov
2021-12-01 20:29 ` Pavel Begunkov
2021-12-02 0:36 ` Willem de Bruijn
2021-12-02 16:25 ` Pavel Begunkov [this message]
2021-12-02 0:32 ` Willem de Bruijn
2021-12-02 16:45 ` Pavel Begunkov
2021-12-02 21:25 ` Willem de Bruijn
2021-12-03 16:19 ` Pavel Begunkov
2021-12-03 16:30 ` Willem de Bruijn