From: Pavel Begunkov <[email protected]>
To: David Ahern <[email protected]>,
[email protected], [email protected],
[email protected]
Cc: Jakub Kicinski <[email protected]>,
Jonathan Lemon <[email protected]>,
"David S . Miller" <[email protected]>,
Willem de Bruijn <[email protected]>,
Eric Dumazet <[email protected]>,
Hideaki YOSHIFUJI <[email protected]>,
David Ahern <[email protected]>, Jens Axboe <[email protected]>
Subject: Re: [RFC 00/12] io_uring zerocopy send
Date: Wed, 1 Dec 2021 15:32:36 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 12/1/21 03:10, David Ahern wrote:
> On 11/30/21 8:18 AM, Pavel Begunkov wrote:
>> Early proof of concept for zerocopy send via io_uring. This is just
>> an RFC, there are details yet to be figured out, but hope to gather
>> some feedback.
>>
>> Benchmarking udp (65435 bytes) with a dummy net device (mtu=0xffff):
>> The best case io_uring=116079 MB/s vs msg_zerocopy=47421 MB/s,
>> or 2.44 times faster.
>>
>> № | test: | BW (MB/s) | speedup
>> 1 | msg_zerocopy (non-zc) | 18281 | 0.38
>> 2 | msg_zerocopy -z (baseline) | 47421 | 1
>> 3 | io_uring (@flush=false, nr_reqs=1) | 96534 | 2.03
>> 4 | io_uring (@flush=true, nr_reqs=1) | 89310 | 1.88
>> 5 | io_uring (@flush=false, nr_reqs=8) | 116079 | 2.44
>> 6 | io_uring (@flush=true, nr_reqs=8) | 109722 | 2.31
>>
>> Based on selftests/.../msg_zerocopy but more limited. You can use
>> msg_zerocopy -r as usual for receive side.
>>
> ...
>
> Can you state the exact command lines you are running for all of the
> commands? I tried this set (and commands referenced below) and my
Sure. First, for dummy I set the MTU by hand; I'm not sure it can be done
from userspace, can it? Without it __ip_append_data() falls into the
non-zerocopy path.
diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c
index f82ad7419508..5c5aeacdabd5 100644
--- a/drivers/net/dummy.c
+++ b/drivers/net/dummy.c
@@ -132,7 +132,8 @@ static void dummy_setup(struct net_device *dev)
 	eth_hw_addr_random(dev);
 	dev->min_mtu = 0;
-	dev->max_mtu = 0;
+	dev->mtu = 0xffff;
+	dev->max_mtu = 0xffff;
 }
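To illustrate why the MTU matters here: the zerocopy path is only taken when the whole datagram, headers included, fits into one MTU-sized packet. A minimal C sketch of that arithmetic follows, assuming plain IPv4/UDP without IP options. datagram_fits_mtu() and the header-size constants are hypothetical names for illustration, not kernel code, and the real check in __ip_append_data() has further conditions (e.g. device scatter-gather and checksum offload support).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Header sizes assumed for plain IPv4/UDP without IP options. */
enum { IPV4_HDR_LEN = 20, UDP_HDR_LEN = 8 };

/*
 * Hypothetical helper: returns true when a UDP payload of payload_len
 * bytes fits into a single IPv4 packet on a device with the given MTU,
 * i.e. when no fragmentation is needed.
 */
static inline bool datagram_fits_mtu(size_t payload_len, size_t mtu)
{
	/* An MTU that cannot even hold the headers is a caller bug. */
	assert(mtu >= IPV4_HDR_LEN + UDP_HDR_LEN);
	return payload_len + UDP_HDR_LEN + IPV4_HDR_LEN <= mtu;
}
```

With the 65435-byte payload from the benchmark this holds for mtu=0xffff (65435 + 28 = 65463 <= 65535) but not for a typical 1500-byte Ethernet MTU.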
# dummy configuration
modprobe dummy numdummies=1
ip link set dummy0 up
# force requests to <dummy_ip_addr> go through the dummy device
ip route add <dummy_ip_addr> dev dummy0
With dummy I was just sinking the traffic into the dummy device, which
was good enough for me. Omitting "taskset" and "nice":
send-zc -4 -D <dummy_ip_addr> -t 10 udp
Similarly with msg_zerocopy:
<kernel>/tools/testing/selftests/net/msg_zerocopy -4 -p 6666 -D <dummy_ip_addr> -t 10 -z udp
For loopback testing, since zerocopy is not allowed there, as Willem explained
in the original MSG_ZEROCOPY cover letter, I used a hack to bypass the restriction:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ebb12a7d386d..42df33b175ce 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2854,9 +2854,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask)
 /* Frags must be orphaned, even if refcounted, if skb might loop to rx path */
 static inline int skb_orphan_frags_rx(struct sk_buff *skb, gfp_t gfp_mask)
 {
-	if (likely(!skb_zcopy(skb)))
-		return 0;
-	return skb_copy_ubufs(skb, gfp_mask);
+	return skb_orphan_frags(skb, gfp_mask);
 }
 /**
Then I ran the two lines below in parallel and looked at the numbers
the send side shows. It was in favor of io_uring for me, but I don't
remember exactly. perf shows that "send-zc" spends a lot of time receiving,
so I wasn't testing its performance after some point.
msg_zerocopy -r -v -4 -t 20 udp
send-zc -4 -D 127.0.0.1 -t 10 udp
> mileage varies quite a bit.
Interesting, any brief notes on the setup and the results? Dummy
or something real? io_uring doesn't show if it was really zerocopied
or not, but I assume you checked it (e.g. with perf/bpftrace).
I expected that @flush=true might be worse with real devices,
there is one spot to be patched, but apart from that and
cycles spent in a real LLD offsetting the overhead, I didn't
anticipate any problems. I'll see once I try a real device.
> Also, have you run this proposed change (and with TCP) across nodes
> (ie., not just local process to local process via dummy interface)?
Not yet. I tried dummy and localhost UDP as per above, and similarly
TCP. I just need to grab a server with a proper NIC and will try it out
soon.
>> Benchmark:
>> https://github.com/isilence/liburing.git zc_v1
>>
>> or this file in particular:
>> https://github.com/isilence/liburing/blob/zc_v1/test/send-zc.c
>>
>> To run the benchmark:
>> ```
>> cd <liburing_dir> && make && cd test
>> # ./send-zc -4 [-p <port>] [-s <payload_size>] -D <destination> udp
>> ./send-zc -4 -D 127.0.0.1 udp
>> ```
>>
>> msg_zerocopy can be used for the server side, e.g.
>> ```
>> cd <linux-kernel>/tools/testing/selftests/net && make
>> ./msg_zerocopy -4 -r [-p <port>] [-t <sec>] udp
>> ```
--
Pavel Begunkov