From: Pavel Begunkov <[email protected]>
To: Paolo Abeni <[email protected]>, David Wei <[email protected]>,
[email protected], [email protected]
Cc: Jens Axboe <[email protected]>, Jakub Kicinski <[email protected]>,
"David S. Miller" <[email protected]>,
Eric Dumazet <[email protected]>,
Jesper Dangaard Brouer <[email protected]>,
David Ahern <[email protected]>,
Mina Almasry <[email protected]>,
Stanislav Fomichev <[email protected]>,
Joe Damato <[email protected]>,
Pedro Tammela <[email protected]>
Subject: Re: [PATCH net-next v11 00/21] io_uring zero copy rx
Date: Fri, 17 Jan 2025 14:42:30 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 1/17/25 14:28, Paolo Abeni wrote:
> On 1/17/25 12:16 AM, David Wei wrote:
>> This patchset adds support for zero copy rx into userspace pages using
>> io_uring, eliminating a kernel to user copy.
>>
>> We configure a page pool that a driver uses to fill a hw rx queue to
>> hand out user pages instead of kernel pages. Any data that ends up
>> hitting this hw rx queue will thus be dma'd into userspace memory
>> directly, without needing to be bounced through kernel memory. 'Reading'
>> data out of a socket instead becomes a _notification_ mechanism, where
>> the kernel tells userspace where the data is. The overall approach is
>> similar to the devmem TCP proposal.
>>
>> This relies on hw header/data split, flow steering and RSS to ensure
>> packet headers remain in kernel memory and only desired flows hit a hw
>> rx queue configured for zero copy. Configuring this is outside of the
>> scope of this patchset.
>>
>> We share netdev core infra with devmem TCP. The main difference is that
>> io_uring is used for the uAPI and the lifetime of all objects is bound
>> to an io_uring instance. Data is 'read' using a new io_uring request
>> type. When done, data is returned via a new shared refill queue. A zero
>> copy page pool refills a hw rx queue from this refill queue directly. Of
>> course, the lifetime of these data buffers is managed by io_uring
>> rather than the networking stack, with different refcounting rules.
>>
>> This patchset is the first step adding basic zero copy support. We will
>> extend this iteratively with new features e.g. dynamically allocated
>> zero copy areas, THP support, dmabuf support, improved copy fallback,
>> general optimisations and more.
>>
>> In terms of netdev support, we're first targeting Broadcom bnxt. Patches
>> aren't included since Taehee Yoo has already sent a more comprehensive
>> patchset adding support in [1]. Google gve should already support this,
>> and Mellanox mlx5 support is WIP pending driver changes.
>>
>> ===========
>> Performance
>> ===========
>>
>> Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
>>
>> Test setup:
>> * AMD EPYC 9454
>> * Broadcom BCM957508 200G
>> * Kernel v6.11 base [2]
>> * liburing fork [3]
>> * kperf fork [4]
>> * 4K MTU
>> * Single TCP flow
>>
>> With application thread + net rx softirq pinned to _different_ cores:
>>
>> +-------------------------------+
>> | epoll     | io_uring          |
>> |-----------|-------------------|
>> | 82.2 Gbps | 116.2 Gbps (+41%) |
>> +-------------------------------+
>>
>> Pinned to _same_ core:
>>
>> +-------------------------------+
>> | epoll     | io_uring          |
>> |-----------|-------------------|
>> | 62.6 Gbps | 80.9 Gbps (+29%)  |
>> +-------------------------------+
>>
>> =====
>> Links
>> =====
>>
>> Broadcom bnxt support:
>> [1]: https://lore.kernel.org/netdev/[email protected]/
>>
>> Linux kernel branch:
>> [2]: https://github.com/spikeh/linux.git zcrx/v9
>>
>> liburing for testing:
>> [3]: https://github.com/isilence/liburing.git zcrx/next
>>
>> kperf for testing:
>> [4]: https://git.kernel.dk/kperf.git
>
> We are getting very close to the merge window. In order to get this
> series merged before that deadline, the point raised by Jakub on this
> version must be resolved, the next iteration should land on the ML
> before the end of the current working day, and the series must apply
> cleanly to net-next so that it can be processed by our CI.

Sounds good, thanks Paolo.

Since the merging is not trivial, I'll send a PR for the net/
patches instead of reposting the entire thing, if that sounds right
to you. The rest will be handled on the io_uring side.

--
Pavel Begunkov

Thread overview: 43+ messages
2025-01-16 23:16 [PATCH net-next v11 00/21] io_uring zero copy rx David Wei
2025-01-16 23:16 ` [PATCH net-next v11 01/21] net: page_pool: don't cast mp param to devmem David Wei
2025-01-16 23:16 ` [PATCH net-next v11 02/21] net: prefix devmem specific helpers David Wei
2025-01-16 23:16 ` [PATCH net-next v11 03/21] net: generalise net_iov chunk owners David Wei
2025-01-16 23:16 ` [PATCH net-next v11 04/21] net: page_pool: create hooks for custom memory providers David Wei
2025-01-17 1:46 ` Jakub Kicinski
2025-01-17 1:48 ` Jakub Kicinski
2025-01-17 2:15 ` David Wei
2025-01-16 23:16 ` [PATCH net-next v11 05/21] netdev: add io_uring memory provider info David Wei
2025-01-16 23:16 ` [PATCH net-next v11 06/21] net: page_pool: add callback for mp info printing David Wei
2025-01-16 23:16 ` [PATCH net-next v11 07/21] net: page_pool: add a mp hook to unregister_netdevice* David Wei
2025-01-16 23:16 ` [PATCH net-next v11 08/21] net: prepare for non devmem TCP memory providers David Wei
2025-01-16 23:16 ` [PATCH net-next v11 09/21] net: page_pool: add memory provider helpers David Wei
2025-01-17 1:47 ` Jakub Kicinski
2025-01-16 23:16 ` [PATCH net-next v11 10/21] net: add helpers for setting a memory provider on an rx queue David Wei
2025-01-17 1:50 ` Jakub Kicinski
2025-01-17 1:52 ` Jakub Kicinski
2025-01-17 2:17 ` David Wei
2025-01-17 2:25 ` Jakub Kicinski
2025-01-17 2:47 ` Pavel Begunkov
2025-01-17 22:11 ` Jakub Kicinski
2025-01-17 23:20 ` Pavel Begunkov
2025-01-18 2:08 ` Jakub Kicinski
2025-01-18 3:09 ` Pavel Begunkov
2025-01-16 23:16 ` [PATCH net-next v11 11/21] io_uring/zcrx: add interface queue and refill queue David Wei
2025-01-16 23:16 ` [PATCH net-next v11 12/21] io_uring/zcrx: add io_zcrx_area David Wei
2025-01-16 23:16 ` [PATCH net-next v11 13/21] io_uring/zcrx: grab a net device David Wei
2025-01-16 23:16 ` [PATCH net-next v11 14/21] io_uring/zcrx: implement zerocopy receive pp memory provider David Wei
2025-01-17 2:07 ` Jakub Kicinski
2025-01-17 2:17 ` David Wei
2025-01-16 23:16 ` [PATCH net-next v11 15/21] io_uring/zcrx: dma-map area for the device David Wei
2025-01-16 23:16 ` [PATCH net-next v11 16/21] io_uring/zcrx: add io_recvzc request David Wei
2025-01-16 23:16 ` [PATCH net-next v11 17/21] io_uring/zcrx: set pp memory provider for an rx queue David Wei
2025-01-17 2:13 ` Jakub Kicinski
2025-01-17 2:38 ` Pavel Begunkov
2025-01-16 23:17 ` [PATCH net-next v11 18/21] io_uring/zcrx: throttle receive requests David Wei
2025-01-16 23:17 ` [PATCH net-next v11 19/21] io_uring/zcrx: add copy fallback David Wei
2025-01-16 23:17 ` [PATCH net-next v11 20/21] net: add documentation for io_uring zcrx David Wei
2025-01-16 23:17 ` [PATCH net-next v11 21/21] io_uring/zcrx: add selftest David Wei
2025-01-17 14:28 ` [PATCH net-next v11 00/21] io_uring zero copy rx Paolo Abeni
2025-01-17 14:42 ` Pavel Begunkov [this message]
2025-01-17 16:05 ` Paolo Abeni
2025-01-17 16:19 ` Pavel Begunkov