From: Pavel Begunkov <[email protected]>
To: [email protected], [email protected],
[email protected]
Cc: "David S . Miller" <[email protected]>,
Jakub Kicinski <[email protected]>,
Jonathan Lemon <[email protected]>,
Willem de Bruijn <[email protected]>,
Jens Axboe <[email protected]>,
[email protected], Pavel Begunkov <[email protected]>
Subject: [RFC net-next v3 00/29] io_uring zerocopy send
Date: Tue, 28 Jun 2022 19:56:22 +0100
Message-ID: <[email protected]>
The third iteration of patches for zerocopy io_uring sends. I fixed
all known issues since the previous version and reshuffled io_uring
patches, but the net/ code didn't change much. I think it's ready
and will send it as a non-RFC soon.
All tests below were done using io_uring with all relevant performance
options turned on. The numbers look good: send + flush per request, which
is the worst case, is on par with non-zerocopy at payload sizes below
600 bytes with the dummy netdev and between 1200 and 1500 bytes in the
NIC tests. Without "buffer-free" notification flushing at all, it's on
par with non-zerocopy in the NIC tests at around 600 bytes.
dummy:
IO size (bytes) | non-zc (tx/s) | zc (tx/s)       | zc + flush (tx/s)
8000            | 1299916       | 2396600 (+84%)  | 2224219 (+71%)
4000            | 1869230       | 2344146 (+25%)  | 2170069 (+16%)
1200            | 2071617       | 2361960 (+14%)  | 2203052 (+6%)
600             | 2106794       | 2381527 (+13%)  | 2195295 (+4%)
NIC:
IO size (bytes) | non-zc (tx/s) | zc (tx/s)       | zc + flush (tx/s)
4000            | 495134        | 606420 (+22%)   | 558971 (+12%)
1500            | 551808        | 577116 (+4.5%)  | 565803 (+2.5%)
1000            | 584677        | 592088 (+1.2%)  | 560885 (-4%)
600             | 596292        | 598550 (+0.4%)  | 555366 (-6.7%)
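For reference, the percentages in the tables read as the relative change
of each zerocopy rate over the non-zc rate in the same row. A minimal
sketch of that arithmetic follows; the helper names rel_change_pct() and
print_row() are made up for illustration and are not part of the benchmark.
The figures in the tables appear to be rounded or truncated, so small
differences in the last digit are to be expected.

```c
#include <stdio.h>

/*
 * Relative change of a zerocopy rate over the non-zc baseline, in percent.
 * Hypothetical helper, not taken from the benchmark sources.
 */
static double rel_change_pct(double non_zc, double zc)
{
	return (zc / non_zc - 1.0) * 100.0;
}

/* Print the two derived columns for one table row (rates in tx/s). */
void print_row(unsigned int io_size, double non_zc, double zc, double zc_flush)
{
	printf("%5u | zc %+.1f%% | zc + flush %+.1f%%\n", io_size,
	       rel_change_pct(non_zc, zc),
	       rel_change_pct(non_zc, zc_flush));
}
```

For example, print_row(8000, 1299916, 2396600, 2224219) recomputes the
first row of the dummy table.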
Apart from zerocopy, it also removes page referencing for registered
buffers (used in all zc tests). I'm experimenting with notification
optimisation, which should improve the 3rd column, but that will go
separately from this series. I've also seen a good CPU usage reduction
for TCP compared to non-zc, but I'm not posting numbers as I had problems
saturating the CPU.
Links:
RFC v1:
https://lore.kernel.org/io-uring/[email protected]/
RFC v2:
https://lore.kernel.org/io-uring/[email protected]/
liburing (copy of the benchmark + some tests):
https://github.com/isilence/liburing/tree/zc_v3
kernel repo:
https://github.com/isilence/linux/tree/zc_v3
API design overview:
First we take an internal zerocopy handler, aka struct ubuf_info, and let
io_uring pass it into the network layer in struct msghdr. io_uring
stores each of them wrapped in a struct io_notif.
It also has an array of so-called notification slots; each keeps one and
only one active notifier at a time, to which the userspace can bind requests
by specifying the slot index. The userspace can then request to flush a
notifier, so that once all buffers and requests used with this notifier
are completed/freed, it posts one CQE.
The userspace can't bind new requests to a flushed notifier; however,
it can keep using the slot, as flushing automatically replaces the notifier
with a new one.
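The slot/notifier lifecycle described above can be sketched as a small
userspace toy model. This is only an illustration of the semantics under
my own assumptions: every toy_* name is hypothetical, the refcounting is
simplified (one reference held by the slot plus one per bound request),
and none of it is the kernel's struct io_notif or the actual uapi.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy userspace model; these are NOT the kernel's structures. */
struct toy_ring {
	int cqes_posted;              /* notification CQEs "posted" so far */
	unsigned long long last_tag;  /* user_data of the most recent one */
};

struct toy_notif {
	struct toy_ring *ring;
	unsigned long long tag;
	int refs;                     /* slot's ref + one per bound request */
};

struct toy_slot {
	struct toy_ring *ring;
	struct toy_notif *active;     /* exactly one active notifier */
	unsigned long long tag;
};

static struct toy_notif *toy_notif_new(struct toy_ring *ring,
				       unsigned long long tag)
{
	struct toy_notif *n = malloc(sizeof(*n));

	if (!n)
		abort();
	n->ring = ring;
	n->tag = tag;
	n->refs = 1;                  /* held by the slot until flushed */
	return n;
}

/* Drop one reference; the last put "posts" the notifier's single CQE. */
static void toy_notif_put(struct toy_notif *n)
{
	if (--n->refs > 0)
		return;
	n->ring->cqes_posted++;
	n->ring->last_tag = n->tag;
	free(n);
}

/* Bind a request to the slot's active notifier; put it on completion. */
static struct toy_notif *toy_slot_bind(struct toy_slot *s)
{
	s->active->refs++;
	return s->active;
}

/* Flush: install a fresh notifier, then drop the slot's ref on the old one. */
static void toy_slot_flush(struct toy_slot *s)
{
	struct toy_notif *old = s->active;

	s->active = toy_notif_new(s->ring, s->tag);
	toy_notif_put(old);
}

/* Walk through the lifecycle; returns the number of CQEs posted. */
int toy_demo(void)
{
	struct toy_ring ring = { 0, 0 };
	struct toy_slot slot = { &ring, toy_notif_new(&ring, 42), 42 };
	struct toy_notif *r1 = toy_slot_bind(&slot);
	struct toy_notif *r2 = toy_slot_bind(&slot);
	struct toy_notif *r3;
	int rebound;

	toy_slot_flush(&slot);        /* r1/r2 still in flight: no CQE yet */
	printf("after flush: %d CQE(s)\n", ring.cqes_posted);

	r3 = toy_slot_bind(&slot);    /* lands on the replacement notifier */
	rebound = (r3 != r1);
	toy_notif_put(r1);
	toy_notif_put(r2);            /* last ref on the old notifier: one CQE */
	printf("after completion: %d CQE(s), tag %llu, new notifier: %d\n",
	       ring.cqes_posted, ring.last_tag, rebound);

	toy_notif_put(r3);            /* new notifier not flushed: no CQE */
	free(slot.active);            /* tear down without posting */
	return ring.cqes_posted;
}
```

Calling toy_demo() from a main() walks through bind, flush and completion;
in this model the flushed notifier posts exactly one CQE, and only after
both bound requests have dropped their references.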
Changelog:
RFC v2 -> RFC v3:
TCP support
accounting for normal (non-registered) buffers
allow combining registered and normal requests within a notifier
notification flushing via IORING_OP_RSRC_UPDATE
overriding io_uring notification tag/user_data
add ubuf_info submission side reference caching/batching
fix buffer indexing
fix io-wq ->uring_lock locking
fix bugs when mixing with MSG_ZEROCOPY
fix managed refs bugs in skbuff.c
numerous cleanups
RFC -> RFC v2:
remove additional overhead for non-zc from skb_release_data()
avoid msg propagation, hide extra bits of non-zc overhead
task_work based "buffer free" notifications
improve io_uring's notification refcounting
added 5/19 (no pfmemalloc tracking)
added 8/19 and 9/19 preventing small copies with zc
misc small changes
Pavel Begunkov (29):
ipv4: avoid partial copy for zc
ipv6: avoid partial copy for zc
skbuff: add SKBFL_DONT_ORPHAN flag
skbuff: carry external ubuf_info in msghdr
net: bvec specific path in zerocopy_sg_from_iter
net: optimise bvec-based zc page referencing
net: don't track pfmemalloc for managed frags
skbuff: don't mix ubuf_info of different types
ipv4/udp: support zc with managed data
ipv6/udp: support zc with managed data
tcp: support zc with managed data
tcp: kill extra io_uring's uarg refcounting
net: let callers provide extra ubuf_info refs
io_uring: opcode independent fixed buf import
io_uring: add zc notification infrastructure
io_uring: cache struct io_notif
io_uring: complete notifiers in tw
io_uring: add notification slot registration
io_uring: rename IORING_OP_FILES_UPDATE
io_uring: add zc notification flush requests
io_uring: wire send zc request type
io_uring: account locked pages for non-fixed zc
io_uring: allow to pass addr into sendzc
io_uring: add rsrc referencing for notifiers
io_uring: sendzc with fixed buffers
io_uring: flush notifiers after sendzc
io_uring: allow to override zc tag on flush
io_uring: batch submission notif referencing
selftests/io_uring: test zerocopy send
fs/io_uring.c | 566 +++++++++++++++-
include/linux/skbuff.h | 59 +-
include/linux/socket.h | 8 +
include/uapi/linux/io_uring.h | 43 +-
net/compat.c | 2 +
net/core/datagram.c | 53 +-
net/core/skbuff.c | 35 +-
net/ipv4/ip_output.c | 66 +-
net/ipv4/tcp.c | 56 +-
net/ipv6/ip6_output.c | 65 +-
net/socket.c | 6 +
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 605 ++++++++++++++++++
.../selftests/net/io_uring_zerocopy_tx.sh | 131 ++++
14 files changed, 1613 insertions(+), 83 deletions(-)
create mode 100644 tools/testing/selftests/net/io_uring_zerocopy_tx.c
create mode 100755 tools/testing/selftests/net/io_uring_zerocopy_tx.sh
--
2.36.1