From: Paolo Abeni <[email protected]>
To: Pavel Begunkov <[email protected]>,
Jakub Kicinski <[email protected]>
Cc: Jens Axboe <[email protected]>,
"David S. Miller" <[email protected]>,
Eric Dumazet <[email protected]>,
Jesper Dangaard Brouer <[email protected]>,
David Ahern <[email protected]>,
Mina Almasry <[email protected]>,
Stanislav Fomichev <[email protected]>,
Joe Damato <[email protected]>,
Pedro Tammela <[email protected]>,
David Wei <[email protected]>,
[email protected], [email protected]
Subject: Re: [PATCH net-next v11 00/21] io_uring zero copy rx
Date: Fri, 17 Jan 2025 17:05:15 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 1/17/25 3:42 PM, Pavel Begunkov wrote:
> On 1/17/25 14:28, Paolo Abeni wrote:
>> On 1/17/25 12:16 AM, David Wei wrote:
>>> This patchset adds support for zero copy rx into userspace pages using
>>> io_uring, eliminating a kernel to user copy.
>>>
>>> We configure a page pool that a driver uses to fill a hw rx queue to
>>> hand out user pages instead of kernel pages. Any data that ends up
>>> hitting this hw rx queue will thus be dma'd into userspace memory
>>> directly, without needing to be bounced through kernel memory. 'Reading'
>>> data out of a socket instead becomes a _notification_ mechanism, where
>>> the kernel tells userspace where the data is. The overall approach is
>>> similar to the devmem TCP proposal.
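The notification flow described above can be sketched as a toy userspace model: a "DMA" write lands payload directly in user-owned memory, and the kernel only posts a descriptor saying where. The struct and function names below are invented for illustration and are not the actual io_uring uAPI.

```c
/* Toy model of zero copy rx notification. Names and layout are
 * illustrative only, not the real io_uring zcrx interface. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AREA_SIZE 4096

/* A completion descriptor: tells userspace where in the shared
 * area the payload landed, instead of copying the payload out. */
struct zc_cqe {
	uint32_t off;
	uint32_t len;
};

/* User memory backing the hw rx queue's buffers. */
static unsigned char area[AREA_SIZE];

/* Stand-in for the device DMA-ing a payload into user memory and
 * the kernel posting a notification with its location. */
static struct zc_cqe fake_rx(const void *payload, uint32_t len,
			     uint32_t off)
{
	memcpy(area + off, payload, len);  /* models the DMA write */
	return (struct zc_cqe){ .off = off, .len = len };
}
```

Userspace then reads the data in place at `area + cqe.off`; no kernel-to-user copy ever happens on the data path.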
>>>
>>> This relies on hw header/data split, flow steering and RSS to ensure
>>> packet headers remain in kernel memory and only desired flows hit a hw
>>> rx queue configured for zero copy. Configuring this is outside of the
>>> scope of this patchset.
>>>
>>> We share netdev core infra with devmem TCP. The main difference is that
>>> io_uring is used for the uAPI and the lifetime of all objects is bound
>>> to an io_uring instance. Data is 'read' using a new io_uring request
>>> type. When done, data is returned via a new shared refill queue. A zero
>>> copy page pool refills a hw rx queue from this refill queue directly. Of
>>> course, the lifetime of these data buffers is managed by io_uring
>>> rather than the networking stack, with different refcounting rules.
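The shared refill queue above behaves like a single-producer/single-consumer ring: userspace returns finished buffers at the tail, and the page pool consumes from the head to refill the hw rx queue. A minimal sketch, with invented names and sizes that do not reflect the real uAPI layout:

```c
/* Toy model of the shared refill queue. Entry format, names and
 * sizes are illustrative; the real zcrx refill ring differs. */
#include <assert.h>
#include <stdint.h>

#define RQ_ENTRIES 8                  /* must be a power of two */
#define RQ_MASK    (RQ_ENTRIES - 1)

struct refill_ring {
	uint32_t head;                /* consumed by the page pool */
	uint32_t tail;                /* produced by userspace */
	uint32_t entries[RQ_ENTRIES]; /* offsets of returned buffers */
};

/* Userspace: hand a buffer back once its data has been processed. */
static int rq_return_buf(struct refill_ring *r, uint32_t off)
{
	if (r->tail - r->head == RQ_ENTRIES)
		return -1;            /* ring full */
	r->entries[r->tail & RQ_MASK] = off;
	r->tail++;
	return 0;
}

/* "Page pool" side: take a returned buffer to refill the hw rx
 * queue with, if any are available. */
static int rq_refill_one(struct refill_ring *r, uint32_t *off)
{
	if (r->head == r->tail)
		return -1;            /* nothing to refill from */
	*off = r->entries[r->head & RQ_MASK];
	r->head++;
	return 0;
}
```

The power-of-two sizing lets head/tail be free-running counters masked on access, a common pattern shared with io_uring's own SQ/CQ rings.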
>>>
>>> This patchset is the first step adding basic zero copy support. We will
>>> extend this iteratively with new features e.g. dynamically allocated
>>> zero copy areas, THP support, dmabuf support, improved copy fallback,
>>> general optimisations and more.
>>>
>>> In terms of netdev support, we're first targeting Broadcom bnxt. Patches
>>> aren't included since Taehee Yoo has already sent a more comprehensive
>>> patchset adding support in [1]. Google gve should already support this,
>>> and Mellanox mlx5 support is WIP pending driver changes.
>>>
>>> ===========
>>> Performance
>>> ===========
>>>
>>> Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
>>>
>>> Test setup:
>>> * AMD EPYC 9454
>>> * Broadcom BCM957508 200G
>>> * Kernel v6.11 base [2]
>>> * liburing fork [3]
>>> * kperf fork [4]
>>> * 4K MTU
>>> * Single TCP flow
>>>
>>> With application thread + net rx softirq pinned to _different_ cores:
>>>
>>> +-------------------------------+
>>> | epoll | io_uring |
>>> |-----------|-------------------|
>>> | 82.2 Gbps | 116.2 Gbps (+41%) |
>>> +-------------------------------+
>>>
>>> Pinned to _same_ core:
>>>
>>> +-------------------------------+
>>> | epoll | io_uring |
>>> |-----------|-------------------|
>>> | 62.6 Gbps | 80.9 Gbps (+29%) |
>>> +-------------------------------+
>>>
>>> =====
>>> Links
>>> =====
>>>
>>> Broadcom bnxt support:
>>> [1]: https://lore.kernel.org/netdev/[email protected]/
>>>
>>> Linux kernel branch:
>>> [2]: https://github.com/spikeh/linux.git zcrx/v9
>>>
>>> liburing for testing:
>>> [3]: https://github.com/isilence/liburing.git zcrx/next
>>>
>>> kperf for testing:
>>> [4]: https://git.kernel.dk/kperf.git
>>
>> We are getting very close to the merge window. In order to get this
>> series merged before that deadline, the points raised by Jakub on this
>> version must be resolved, the next iteration should land on the ML
>> before the end of the current working day, and the series must apply
>> cleanly to net-next, so that it can be processed by our CI.
>
> Sounds good, thanks Paolo.
>
> Since the merging is not trivial, I'll send a PR for the net/
> patches instead of reposting the entire thing, if that sounds right
> to you. The rest will be handled on the io_uring side.
I agree that is the more straightforward path. @Jakub: do you see any
problem with the above?
/P
Thread overview: 43+ messages
2025-01-16 23:16 [PATCH net-next v11 00/21] io_uring zero copy rx David Wei
2025-01-16 23:16 ` [PATCH net-next v11 01/21] net: page_pool: don't cast mp param to devmem David Wei
2025-01-16 23:16 ` [PATCH net-next v11 02/21] net: prefix devmem specific helpers David Wei
2025-01-16 23:16 ` [PATCH net-next v11 03/21] net: generalise net_iov chunk owners David Wei
2025-01-16 23:16 ` [PATCH net-next v11 04/21] net: page_pool: create hooks for custom memory providers David Wei
2025-01-17 1:46 ` Jakub Kicinski
2025-01-17 1:48 ` Jakub Kicinski
2025-01-17 2:15 ` David Wei
2025-01-16 23:16 ` [PATCH net-next v11 05/21] netdev: add io_uring memory provider info David Wei
2025-01-16 23:16 ` [PATCH net-next v11 06/21] net: page_pool: add callback for mp info printing David Wei
2025-01-16 23:16 ` [PATCH net-next v11 07/21] net: page_pool: add a mp hook to unregister_netdevice* David Wei
2025-01-16 23:16 ` [PATCH net-next v11 08/21] net: prepare for non devmem TCP memory providers David Wei
2025-01-16 23:16 ` [PATCH net-next v11 09/21] net: page_pool: add memory provider helpers David Wei
2025-01-17 1:47 ` Jakub Kicinski
2025-01-16 23:16 ` [PATCH net-next v11 10/21] net: add helpers for setting a memory provider on an rx queue David Wei
2025-01-17 1:50 ` Jakub Kicinski
2025-01-17 1:52 ` Jakub Kicinski
2025-01-17 2:17 ` David Wei
2025-01-17 2:25 ` Jakub Kicinski
2025-01-17 2:47 ` Pavel Begunkov
2025-01-17 22:11 ` Jakub Kicinski
2025-01-17 23:20 ` Pavel Begunkov
2025-01-18 2:08 ` Jakub Kicinski
2025-01-18 3:09 ` Pavel Begunkov
2025-01-16 23:16 ` [PATCH net-next v11 11/21] io_uring/zcrx: add interface queue and refill queue David Wei
2025-01-16 23:16 ` [PATCH net-next v11 12/21] io_uring/zcrx: add io_zcrx_area David Wei
2025-01-16 23:16 ` [PATCH net-next v11 13/21] io_uring/zcrx: grab a net device David Wei
2025-01-16 23:16 ` [PATCH net-next v11 14/21] io_uring/zcrx: implement zerocopy receive pp memory provider David Wei
2025-01-17 2:07 ` Jakub Kicinski
2025-01-17 2:17 ` David Wei
2025-01-16 23:16 ` [PATCH net-next v11 15/21] io_uring/zcrx: dma-map area for the device David Wei
2025-01-16 23:16 ` [PATCH net-next v11 16/21] io_uring/zcrx: add io_recvzc request David Wei
2025-01-16 23:16 ` [PATCH net-next v11 17/21] io_uring/zcrx: set pp memory provider for an rx queue David Wei
2025-01-17 2:13 ` Jakub Kicinski
2025-01-17 2:38 ` Pavel Begunkov
2025-01-16 23:17 ` [PATCH net-next v11 18/21] io_uring/zcrx: throttle receive requests David Wei
2025-01-16 23:17 ` [PATCH net-next v11 19/21] io_uring/zcrx: add copy fallback David Wei
2025-01-16 23:17 ` [PATCH net-next v11 20/21] net: add documentation for io_uring zcrx David Wei
2025-01-16 23:17 ` [PATCH net-next v11 21/21] io_uring/zcrx: add selftest David Wei
2025-01-17 14:28 ` [PATCH net-next v11 00/21] io_uring zero copy rx Paolo Abeni
2025-01-17 14:42 ` Pavel Begunkov
2025-01-17 16:05 ` Paolo Abeni [this message]
2025-01-17 16:19 ` Pavel Begunkov