From: Jinjie Ruan <[email protected]>
To: Pavel Begunkov <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>
Cc: "David S . Miller" <[email protected]>,
	Jakub Kicinski <[email protected]>,
	Jonathan Lemon <[email protected]>,
	Willem de Bruijn <[email protected]>,
	Jens Axboe <[email protected]>, David Ahern <[email protected]>,
	<[email protected]>
Subject: Re: [PATCH net-next v5 00/27] io_uring zerocopy send
Date: Tue, 18 Feb 2025 09:47:15 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>



On 2022/7/13 4:52, Pavel Begunkov wrote:
> NOTE: Not to be picked directly. After getting necessary acks, I'll be
>       working out merging with Jakub and Jens.
> 
> The patchset implements io_uring zerocopy send. It works with both registered
> and normal buffers, mixing is allowed but not recommended. Apart from usual
> request completions, just as with MSG_ZEROCOPY, io_uring separately notifies
> the userspace when buffers are freed and can be reused (see API design below),
> which is delivered into io_uring's Completion Queue. Those "buffer-free"
> notifications are not necessarily per request, but the userspace has control
> over it and should explicitly attach a number of requests to a single
> notification. The series also adds some internal optimisations when used with
> registered buffers like removing page referencing.
> 
> From the kernel networking perspective there are two main changes. The first
> one is passing ubuf_info into the network layer from io_uring (inside of an
> in-kernel struct msghdr). This allows extra optimisations, e.g. ubuf_info
> caching on the io_uring side, but also helps to avoid cross-referencing
> and synchronisation problems. The second part is an optional optimisation
> removing page referencing for requests with registered buffers.
> 
> Benchmarking UDP with an optimised version of the selftest (see [1]), which

Hi Pavel, I'm interested in io_uring zerocopy send, but I can't reproduce
the reported performance with the zerocopy send selftest, e.g.
"bash io_uring_zerocopy_tx.sh 6 udp -m 0/1/2/3 -n 64". In my runs the
non-zerocopy baseline is actually the fastest:

               MB/s
NONZC         8379
ZC            5910
ZC_FIXED      6294
MIXED         6350
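
For reference, the relative change against the NONZC baseline can be worked
out from the figures above. This is only a small arithmetic sketch in Python;
the `results` dict and the `relative_change` helper are names made up for
this illustration and are not part of the selftest.

```python
# Throughput figures (MB/s) copied from the table above.
results = {"NONZC": 8379, "ZC": 5910, "ZC_FIXED": 6294, "MIXED": 6350}


def relative_change(value, baseline):
    """Return the percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


baseline = results["NONZC"]
deltas = {name: relative_change(mbps, baseline) for name, mbps in results.items()}

for name, delta in deltas.items():
    print(f"{name:<9} {results[name]:>5} MB/s  {delta:+6.1f}%")
# ZC comes out at about -29.5%, ZC_FIXED at about -24.9%,
# and MIXED at about -24.2% relative to NONZC.
```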

Also, the zero-copy example in [1] no longer seems to work, because the
kernel has since been changed by the following commit:

https://lore.kernel.org/all/[email protected]/

Could you help me reproduce the benchmark results? Do I need to configure
different parameters to reproduce them?


> sends a bunch of requests, waits for completions and repeats. "+ flush" column
> posts one additional "buffer-free" notification per request, and just "zc"
> doesn't post buffer notifications at all.
> 
> NIC (requests / second):
> IO size | non-zc    | zc             | zc + flush
> 4000    | 495134    | 606420 (+22%)  | 558971 (+12%)
> 1500    | 551808    | 577116 (+4.5%) | 565803 (+2.5%)
> 1000    | 584677    | 592088 (+1.2%) | 560885 (-4%)
> 600     | 596292    | 598550 (+0.4%) | 555366 (-6.7%)
> 
> dummy (requests / second):
> IO size | non-zc    | zc             | zc + flush
> 8000    | 1299916   | 2396600 (+84%) | 2224219 (+71%)
> 4000    | 1869230   | 2344146 (+25%) | 2170069 (+16%)
> 1200    | 2071617   | 2361960 (+14%) | 2203052 (+6%)
> 600     | 2106794   | 2381527 (+13%) | 2195295 (+4%)
> 
> Previously it also brought a massive speedup compared to the msg_zerocopy
> tool (see [3]), which is probably not super interesting. A further set of
> refcounting optimisations was omitted from the series for simplicity and
> because they don't change the picture drastically; they will be sent as a
> follow-up, as will flushing optimisations that close the performance gap
> between the last two columns.
> 
> For TCP on localhost (with hacks enabling localhost zerocopy) and including
> additional overhead for receive:
> 
> IO size | non-zc    | zc
> 1200    | 4174      | 4148
> 4096    | 7597      | 11228
> 
> With a real NIC and 1200 bytes, zc is ~5-10% worse than non-zc; the omitted
> optimisations may help somewhat. It should look better for 4000 bytes, but I
> couldn't test that properly because of setup problems.
> 
> Links:
> 
>   liburing (benchmark + tests):
>   [1] https://github.com/isilence/liburing/tree/zc_v4
> 
>   kernel repo:
>   [2] https://github.com/isilence/linux/tree/zc_v4
> 
>   RFC v1:
>   [3] https://lore.kernel.org/io-uring/[email protected]/
> 
>   RFC v2:
>   https://lore.kernel.org/io-uring/[email protected]/
> 
>   Net patches based:
>   [email protected]:isilence/linux.git zc_v4-net-base
>   or
>   https://github.com/isilence/linux/tree/zc_v4-net-base
> 
> API design overview:
> 
>   The series introduces an io_uring concept of notifiers. From the userspace
>   perspective it's an entity to which it can bind one or more requests and then
>   request a flush. Flushing a notifier makes it impossible to attach new
>   requests to it, and instructs the notifier to post a completion once all
>   requests attached to it are completed and the kernel doesn't need the buffers
>   anymore.
> 
>   Notifications are stored in notification slots, which should be registered as
>   an array in io_uring. Each slot stores only one notifier at any particular
>   moment. Flushing removes it from the slot and the slot automatically replaces
>   it with a new notifier. All operations with notifiers are done by specifying
>   an index of a slot it's currently in.
> 
>   When registering a notification the userspace specifies a u64 tag for each
>   slot, which will be copied in notification completion entries as
>   cqe::user_data. cqe::res is 0 and cqe::flags is equal to a wrap-around u32
>   sequence number counting the notifiers of a slot.
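
To check my reading of the slot semantics described above, here is a toy
userspace model in Python. It is not kernel code and does not use the real
io_uring API: the `Ring` and `Notifier` classes and their methods are
hypothetical names for this sketch, and starting the sequence number at 0 is
an assumption, since the description does not say where it starts.

```python
from dataclasses import dataclass

U32_MASK = 0xFFFFFFFF  # the sequence number is described as a wrap-around u32


@dataclass
class Notifier:
    tag: int               # u64 tag given at slot registration
    seq: int               # per-slot sequence number (assumed to start at 0)
    pending: int = 0       # attached requests still holding the buffers
    flushed: bool = False  # once flushed, nothing new can be attached


class Ring:
    def __init__(self, slot_tags):
        self.cq = []  # posted notification CQEs as (user_data, res, flags)
        self.slots = [Notifier(tag, 0) for tag in slot_tags]

    def attach(self, slot):
        """Bind one request to the notifier currently sitting in the slot."""
        notifier = self.slots[slot]
        notifier.pending += 1
        return notifier

    def complete(self, notifier):
        """The request is done and its buffers are no longer needed."""
        notifier.pending -= 1
        self._maybe_post(notifier)

    def flush(self, slot):
        """Remove the notifier from the slot and install a fresh one."""
        old = self.slots[slot]
        old.flushed = True
        self.slots[slot] = Notifier(old.tag, (old.seq + 1) & U32_MASK)
        self._maybe_post(old)

    def _maybe_post(self, notifier):
        # A completion is posted only once the notifier has been flushed
        # and every request attached to it has completed.
        if notifier.flushed and notifier.pending == 0:
            self.cq.append((notifier.tag, 0, notifier.seq))


ring = Ring([0xABCD])
a = ring.attach(0)
b = ring.attach(0)
ring.flush(0)      # slot 0 now holds a new notifier with seq 1
ring.complete(a)   # one request still pending, nothing posted yet
ring.complete(b)   # posts (0xABCD, 0, 0)
c = ring.attach(0)
ring.complete(c)   # not flushed yet, nothing posted
ring.flush(0)      # posts (0xABCD, 0, 1)
print(ring.cq)     # [(43981, 0, 0), (43981, 0, 1)]
```

In this model a request can only ever be attached to the notifier currently in
the slot, which is how it captures "flushing makes it impossible to attach new
requests".
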
> 
> Changelog:
> 
>   v4 -> v5
>     remove ubuf_info checks from custom iov_iter callbacks to
>     avoid disabling the page refs optimisations for TCP
> 
>   v3 -> v4
>     custom iov_iter handling
> 
>   RFC v2 -> v3:
>     mem accounting for non-registered buffers
>     allow mixing registered and normal requests per notifier
>     notification flushing via IORING_OP_RSRC_UPDATE
>     TCP support
>     fix buffer indexing
>     fix io-wq ->uring_lock locking
>     fix bugs when mixing with MSG_ZEROCOPY
>     fix managed refs bugs in skbuff.c
> 
>   RFC -> RFC v2:
>     remove additional overhead for non-zc from skb_release_data()
>     avoid msg propagation, hide extra bits of non-zc overhead
>     task_work based "buffer free" notifications
>     improve io_uring's notification refcounting
>     added 5/19, (no pfmemalloc tracking)
>     added 8/19 and 9/19 preventing small copies with zc
>     misc small changes
> 
> David Ahern (1):
>   net: Allow custom iter handler in msghdr
> 
> Pavel Begunkov (26):
>   ipv4: avoid partial copy for zc
>   ipv6: avoid partial copy for zc
>   skbuff: don't mix ubuf_info from different sources
>   skbuff: add SKBFL_DONT_ORPHAN flag
>   skbuff: carry external ubuf_info in msghdr
>   net: introduce managed frags infrastructure
>   net: introduce __skb_fill_page_desc_noacc
>   ipv4/udp: support externally provided ubufs
>   ipv6/udp: support externally provided ubufs
>   tcp: support externally provided ubufs
>   io_uring: initialise msghdr::msg_ubuf
>   io_uring: export io_put_task()
>   io_uring: add zc notification infrastructure
>   io_uring: cache struct io_notif
>   io_uring: complete notifiers in tw
>   io_uring: add rsrc referencing for notifiers
>   io_uring: add notification slot registration
>   io_uring: wire send zc request type
>   io_uring: account locked pages for non-fixed zc
>   io_uring: allow to pass addr into sendzc
>   io_uring: sendzc with fixed buffers
>   io_uring: flush notifiers after sendzc
>   io_uring: rename IORING_OP_FILES_UPDATE
>   io_uring: add zc notification flush requests
>   io_uring: enable managed frags with register buffers
>   selftests/io_uring: test zerocopy send
> 
>  include/linux/io_uring_types.h                |  37 ++
>  include/linux/skbuff.h                        |  66 +-
>  include/linux/socket.h                        |   5 +
>  include/uapi/linux/io_uring.h                 |  45 +-
>  io_uring/Makefile                             |   2 +-
>  io_uring/io_uring.c                           |  42 +-
>  io_uring/io_uring.h                           |  22 +
>  io_uring/net.c                                | 187 ++++++
>  io_uring/net.h                                |   4 +
>  io_uring/notif.c                              | 215 +++++++
>  io_uring/notif.h                              |  87 +++
>  io_uring/opdef.c                              |  24 +-
>  io_uring/rsrc.c                               |  55 +-
>  io_uring/rsrc.h                               |  16 +-
>  io_uring/tctx.h                               |  26 -
>  net/compat.c                                  |   1 +
>  net/core/datagram.c                           |  14 +-
>  net/core/skbuff.c                             |  37 +-
>  net/ipv4/ip_output.c                          |  50 +-
>  net/ipv4/tcp.c                                |  32 +-
>  net/ipv6/ip6_output.c                         |  49 +-
>  net/socket.c                                  |   3 +
>  tools/testing/selftests/net/Makefile          |   1 +
>  .../selftests/net/io_uring_zerocopy_tx.c      | 605 ++++++++++++++++++
>  .../selftests/net/io_uring_zerocopy_tx.sh     | 131 ++++
>  25 files changed, 1628 insertions(+), 128 deletions(-)
>  create mode 100644 io_uring/notif.c
>  create mode 100644 io_uring/notif.h
>  create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c
>  create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh
> 
