public inbox for [email protected]
From: David Ahern <[email protected]>
To: Pavel Begunkov <[email protected]>,
	[email protected], [email protected],
	[email protected]
Cc: "David S . Miller" <[email protected]>,
	Jakub Kicinski <[email protected]>,
	Jonathan Lemon <[email protected]>,
	Willem de Bruijn <[email protected]>,
	Jens Axboe <[email protected]>,
	[email protected]
Subject: Re: [PATCH net-next v4 00/27] io_uring zerocopy send
Date: Wed, 13 Jul 2022 16:45:43 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 7/11/22 5:56 AM, Pavel Begunkov wrote:
> On 7/8/22 15:26, Pavel Begunkov wrote:
>> On 7/8/22 05:10, David Ahern wrote:
>>> On 7/7/22 5:49 AM, Pavel Begunkov wrote:
>>>> NOTE: Not to be picked directly. After getting the necessary acks, I'll
>>>> be working out merging with Jakub and Jens.
>>>>
>>>> The patchset implements io_uring zerocopy send. It works with both
>>>> registered and normal buffers; mixing is allowed but not recommended.
>>>> Apart from the usual request completions, just as with MSG_ZEROCOPY,
>>>> io_uring separately notifies the userspace when buffers are freed and
>>>> can be reused (see API design below), which is delivered into
>>>> io_uring's Completion Queue. Those "buffer-free" notifications are not
>>>> necessarily per request: the userspace has control over them and should
>>>> explicitly attach a number of requests to a single notification. The
>>>> series also adds some internal optimisations when used with registered
>>>> buffers, like removing page referencing.
>>>>
>>>> From the kernel networking perspective there are two main changes. The
>>>> first one is passing ubuf_info into the network layer from io_uring
>>>> (inside of an in-kernel struct msghdr). This allows extra optimisations,
>>>> e.g. ubuf_info caching on the io_uring side, but also helps to avoid
>>>> cross-referencing and synchronisation problems. The second part is an
>>>> optional optimisation removing page referencing for requests with
>>>> registered buffers.
>>>>
>>>> Benchmarking was done with an optimised version of the selftest (see
>>>> [1]), which sends a bunch of requests, waits for completions and
>>>> repeats. The "+ flush" column posts one additional "buffer-free"
>>>> notification per request, and plain "zc" doesn't post buffer
>>>> notifications at all.
>>>>
>>>> NIC (requests / second):
>>>> IO size | non-zc    | zc             | zc + flush
>>>> 4000    | 495134    | 606420 (+22%)  | 558971 (+12%)
>>>> 1500    | 551808    | 577116 (+4.5%) | 565803 (+2.5%)
>>>> 1000    | 584677    | 592088 (+1.2%) | 560885 (-4%)
>>>> 600     | 596292    | 598550 (+0.4%) | 555366 (-6.7%)
>>>>
>>>> dummy (requests / second):
>>>> IO size | non-zc    | zc             | zc + flush
>>>> 8000    | 1299916   | 2396600 (+84%) | 2224219 (+71%)
>>>> 4000    | 1869230   | 2344146 (+25%) | 2170069 (+16%)
>>>> 1200    | 2071617   | 2361960 (+14%) | 2203052 (+6%)
>>>> 600     | 2106794   | 2381527 (+13%) | 2195295 (+4%)
>>>>
>>>> Previously it also brought a massive performance speedup compared to
>>>> the msg_zerocopy tool (see [3]), which is probably not super
>>>> interesting.
>>>>
>>>
>>> can you add a comment that the above results are for UDP.
>>
>> Oh, right, forgot to add it
>>
>>
>>> You dropped comments about TCP testing; any progress there? If not, can
>>> you relay any issues you are hitting?
>>
>> Not really a problem, but for me it's bottlenecked at NIC bandwidth
>> (~3GB/s) for both zc and non-zc and doesn't even nearly saturate a CPU.
>> It was actually benchmarked by my colleague quite a while ago, but I
>> can't find the numbers. Probably need to at least add localhost numbers
>> or grab a better server.
> 
> Testing localhost TCP with a hack (see below); it doesn't include the
> refcounting optimisations I was testing UDP with, which will be sent
> afterwards. Numbers are in MB/s:
> 
> IO size | non-zc    | zc
> 1200    | 4174      | 4148
> 4096    | 7597      | 11228

I am surprised by the low numbers; you should be able to saturate a 100G
link with TCP and ZC TX API.

> 
> Because it's localhost, we also spend cycles here on the recv side.
> Using a real NIC with 1200-byte payloads, zc is ~5-10% worse than
> non-zc; maybe the omitted optimisations will help somewhat. I don't
> consider it to be a blocker, but it would be interesting to poke into
> later. One thing helping non-zc is that it squeezes a number of requests
> into a single page, whereas zerocopy adds a new frag for every request.
> 
> I can't say anything new for larger payloads; I'm still NIC-bound, but
> looking at CPU utilisation, zc doesn't drain as many cycles as non-zc.
> Also, I don't remember if I mentioned it before, but another catch is
> that with TCP, users are expected not to flush notifications too often,
> because each flush forces a new skb to be allocated and loses a good
> chunk of the benefits of using TCP.

I had issues with TCP sockets and io_uring at the end of 2020:
https://www.spinics.net/lists/io-uring/msg05125.html

I have not tried anything recent (from 2022).



Thread overview: 45+ messages
2022-07-07 11:49 [PATCH net-next v4 00/27] io_uring zerocopy send Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 01/27] ipv4: avoid partial copy for zc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 02/27] ipv6: " Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 03/27] skbuff: don't mix ubuf_info from different sources Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 04/27] skbuff: add SKBFL_DONT_ORPHAN flag Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 05/27] skbuff: carry external ubuf_info in msghdr Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 06/27] net: Allow custom iter handler " Pavel Begunkov
2022-07-11 12:20   ` Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 07/27] net: introduce managed frags infrastructure Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 08/27] net: introduce __skb_fill_page_desc_noacc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 09/27] ipv4/udp: support externally provided ubufs Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 10/27] ipv6/udp: " Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 11/27] tcp: " Pavel Begunkov
2022-07-08  4:06   ` David Ahern
2022-07-08 14:03     ` Pavel Begunkov
2022-07-13 23:38       ` David Ahern
2022-07-07 11:49 ` [PATCH net-next v4 12/27] io_uring: initialise msghdr::msg_ubuf Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 13/27] io_uring: export io_put_task() Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 14/27] io_uring: add zc notification infrastructure Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 15/27] io_uring: cache struct io_notif Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 16/27] io_uring: complete notifiers in tw Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 17/27] io_uring: add rsrc referencing for notifiers Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 18/27] io_uring: add notification slot registration Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 19/27] io_uring: wire send zc request type Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 20/27] io_uring: account locked pages for non-fixed zc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 21/27] io_uring: allow to pass addr into sendzc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 22/27] io_uring: sendzc with fixed buffers Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 23/27] io_uring: flush notifiers after sendzc Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 24/27] io_uring: rename IORING_OP_FILES_UPDATE Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 25/27] io_uring: add zc notification flush requests Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 26/27] io_uring: enable managed frags with register buffers Pavel Begunkov
2022-07-07 11:49 ` [PATCH net-next v4 27/27] selftests/io_uring: test zerocopy send Pavel Begunkov
2022-07-08  4:10 ` [PATCH net-next v4 00/27] io_uring " David Ahern
2022-07-08 14:26   ` Pavel Begunkov
2022-07-11 12:56     ` Pavel Begunkov
2022-07-13 23:45       ` David Ahern [this message]
2022-07-14 18:55         ` Pavel Begunkov
2022-07-18  2:19           ` David Ahern
2022-07-20 13:32             ` Pavel Begunkov
2022-07-24 18:28             ` David Ahern
2022-07-27 10:51               ` Pavel Begunkov
2022-07-29 22:30                 ` David Ahern
2022-09-26 20:08               ` Pavel Begunkov
2022-09-28 19:31                 ` David Ahern
2022-09-28 20:11                   ` Pavel Begunkov
