From: Paolo Abeni <[email protected]>
To: David Wei <[email protected]>,
[email protected], [email protected]
Cc: Jens Axboe <[email protected]>,
Pavel Begunkov <[email protected]>,
Jakub Kicinski <[email protected]>,
"David S. Miller" <[email protected]>,
Eric Dumazet <[email protected]>,
Jesper Dangaard Brouer <[email protected]>,
David Ahern <[email protected]>,
Mina Almasry <[email protected]>,
Stanislav Fomichev <[email protected]>,
Joe Damato <[email protected]>,
Pedro Tammela <[email protected]>
Subject: Re: [PATCH net-next v11 00/21] io_uring zero copy rx
Date: Fri, 17 Jan 2025 15:28:15 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 1/17/25 12:16 AM, David Wei wrote:
> This patchset adds support for zero copy rx into userspace pages using
> io_uring, eliminating a kernel to user copy.
>
> We configure the page pool that a driver uses to fill a hw rx queue
> so that it hands out user pages instead of kernel pages. Any data
> hitting this hw rx queue will thus be dma'd into userspace memory
> directly, without needing to be bounced through kernel memory. 'Reading'
> data out of a socket instead becomes a _notification_ mechanism, where
> the kernel tells userspace where the data is. The overall approach is
> similar to the devmem TCP proposal.
>
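For readers new to the model, the receive path then looks roughly like
the sketch below. io_uring_get_sqe(), io_uring_submit(),
io_uring_wait_cqe() and io_uring_cqe_seen() are standard liburing
calls; io_uring_prep_recvzc(), cqe_area_offset(), refill_push() and
handle_payload() are hypothetical stand-ins for the new request type
and refill queue described further down:

    #include <liburing.h>

    /* Minimal sketch only; not the actual uAPI of this series. */
    static void recv_loop(struct io_uring *ring, int sockfd, char *area)
    {
            for (;;) {
                    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
                    struct io_uring_cqe *cqe;

                    /* Queue a zero copy receive for the socket. */
                    io_uring_prep_recvzc(sqe, sockfd);   /* hypothetical */
                    io_uring_submit(ring);
                    io_uring_wait_cqe(ring, &cqe);

                    /* The CQE carries an offset into the user area,
                     * not the data itself. */
                    handle_payload(area + cqe_area_offset(cqe), cqe->res);

                    /* Hand the buffer back via the shared refill queue
                     * once the payload has been consumed. */
                    refill_push(cqe_area_offset(cqe), cqe->res);
                    io_uring_cqe_seen(ring, cqe);
            }
    }
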
> This relies on hw header/data split, flow steering and RSS to ensure
> packet headers remain in kernel memory and only desired flows hit a hw
> rx queue configured for zero copy. Configuring this is outside of the
> scope of this patchset.
>
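As an illustration of the kind of out-of-band setup this implies, the
following ethtool invocations would enable header/data split, restrict
RSS to the first 15 queues and steer one flow to the reserved queue 15
(device name, queue index and port are placeholders; patch 20 adds
proper documentation):

    ethtool -G eth0 tcp-data-split on
    ethtool -X eth0 equal 15
    ethtool -N eth0 flow-type tcp6 dst-port 9999 action 15
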
> We share netdev core infra with devmem TCP. The main difference is that
> io_uring is used for the uAPI and the lifetimes of all objects are bound
> to an io_uring instance. Data is 'read' using a new io_uring request
> type. When done, data is returned via a new shared refill queue. A zero
> copy page pool refills a hw rx queue from this refill queue directly. Of
> course, the lifetimes of these data buffers are managed by io_uring
> rather than the networking stack, with different refcounting rules.
>
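For the curious, registering the area and the interface queue might
look roughly like the snippet below; the struct and opcode names
follow this series' uAPI patches but are illustrative and may differ
from the final version:

    /* Illustrative only; see patches 11-17 for the actual uAPI. */
    struct io_uring_zcrx_area_reg area = {
            .addr = (__u64)(uintptr_t)user_mem, /* userspace rx memory */
            .len  = mem_len,
    };
    struct io_uring_zcrx_ifq_reg reg = {
            .if_idx     = ifindex,  /* target netdev */
            .if_rxq     = 15,       /* hw rx queue reserved for zero copy */
            .rq_entries = 4096,     /* refill queue size */
            .area_ptr   = (__u64)(uintptr_t)&area,
    };

    ret = io_uring_register(ring_fd, IORING_REGISTER_ZCRX_IFQ, &reg, 1);
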
> This patchset is the first step adding basic zero copy support. We will
> extend this iteratively with new features, e.g. dynamically allocated
> zero copy areas, THP support, dmabuf support, improved copy fallback,
> general optimisations and more.
>
> In terms of netdev support, we're first targeting Broadcom bnxt. Patches
> aren't included since Taehee Yoo has already sent a more comprehensive
> patchset adding support in [1]. Google gve should already support this,
> and Mellanox mlx5 support is WIP pending driver changes.
>
> ===========
> Performance
> ===========
>
> Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
>
> Test setup:
> * AMD EPYC 9454
> * Broadcom BCM957508 200G
> * Kernel v6.11 base [2]
> * liburing fork [3]
> * kperf fork [4]
> * 4K MTU
> * Single TCP flow
>
> With application thread + net rx softirq pinned to _different_ cores:
>
> +-----------+-------------------+
> | epoll     | io_uring          |
> +-----------+-------------------+
> | 82.2 Gbps | 116.2 Gbps (+41%) |
> +-----------+-------------------+
>
> Pinned to _same_ core:
>
> +-----------+------------------+
> | epoll     | io_uring         |
> +-----------+------------------+
> | 62.6 Gbps | 80.9 Gbps (+29%) |
> +-----------+------------------+
>
> =====
> Links
> =====
>
> Broadcom bnxt support:
> [1]: https://lore.kernel.org/netdev/[email protected]/
>
> Linux kernel branch:
> [2]: https://github.com/spikeh/linux.git zcrx/v9
>
> liburing for testing:
> [3]: https://github.com/isilence/liburing.git zcrx/next
>
> kperf for testing:
> [4]: https://git.kernel.dk/kperf.git
We are getting very close to the merge window. In order to get this
series merged before that deadline, the points raised by Jakub on this
version must be resolved, the next iteration should land on the ML
before the end of the current working day, and the series must apply
cleanly to net-next so that it can be processed by our CI.
Thanks,
Paolo
Thread overview: 43+ messages
2025-01-16 23:16 [PATCH net-next v11 00/21] io_uring zero copy rx David Wei
2025-01-16 23:16 ` [PATCH net-next v11 01/21] net: page_pool: don't cast mp param to devmem David Wei
2025-01-16 23:16 ` [PATCH net-next v11 02/21] net: prefix devmem specific helpers David Wei
2025-01-16 23:16 ` [PATCH net-next v11 03/21] net: generalise net_iov chunk owners David Wei
2025-01-16 23:16 ` [PATCH net-next v11 04/21] net: page_pool: create hooks for custom memory providers David Wei
2025-01-17 1:46 ` Jakub Kicinski
2025-01-17 1:48 ` Jakub Kicinski
2025-01-17 2:15 ` David Wei
2025-01-16 23:16 ` [PATCH net-next v11 05/21] netdev: add io_uring memory provider info David Wei
2025-01-16 23:16 ` [PATCH net-next v11 06/21] net: page_pool: add callback for mp info printing David Wei
2025-01-16 23:16 ` [PATCH net-next v11 07/21] net: page_pool: add a mp hook to unregister_netdevice* David Wei
2025-01-16 23:16 ` [PATCH net-next v11 08/21] net: prepare for non devmem TCP memory providers David Wei
2025-01-16 23:16 ` [PATCH net-next v11 09/21] net: page_pool: add memory provider helpers David Wei
2025-01-17 1:47 ` Jakub Kicinski
2025-01-16 23:16 ` [PATCH net-next v11 10/21] net: add helpers for setting a memory provider on an rx queue David Wei
2025-01-17 1:50 ` Jakub Kicinski
2025-01-17 1:52 ` Jakub Kicinski
2025-01-17 2:17 ` David Wei
2025-01-17 2:25 ` Jakub Kicinski
2025-01-17 2:47 ` Pavel Begunkov
2025-01-17 22:11 ` Jakub Kicinski
2025-01-17 23:20 ` Pavel Begunkov
2025-01-18 2:08 ` Jakub Kicinski
2025-01-18 3:09 ` Pavel Begunkov
2025-01-16 23:16 ` [PATCH net-next v11 11/21] io_uring/zcrx: add interface queue and refill queue David Wei
2025-01-16 23:16 ` [PATCH net-next v11 12/21] io_uring/zcrx: add io_zcrx_area David Wei
2025-01-16 23:16 ` [PATCH net-next v11 13/21] io_uring/zcrx: grab a net device David Wei
2025-01-16 23:16 ` [PATCH net-next v11 14/21] io_uring/zcrx: implement zerocopy receive pp memory provider David Wei
2025-01-17 2:07 ` Jakub Kicinski
2025-01-17 2:17 ` David Wei
2025-01-16 23:16 ` [PATCH net-next v11 15/21] io_uring/zcrx: dma-map area for the device David Wei
2025-01-16 23:16 ` [PATCH net-next v11 16/21] io_uring/zcrx: add io_recvzc request David Wei
2025-01-16 23:16 ` [PATCH net-next v11 17/21] io_uring/zcrx: set pp memory provider for an rx queue David Wei
2025-01-17 2:13 ` Jakub Kicinski
2025-01-17 2:38 ` Pavel Begunkov
2025-01-16 23:17 ` [PATCH net-next v11 18/21] io_uring/zcrx: throttle receive requests David Wei
2025-01-16 23:17 ` [PATCH net-next v11 19/21] io_uring/zcrx: add copy fallback David Wei
2025-01-16 23:17 ` [PATCH net-next v11 20/21] net: add documentation for io_uring zcrx David Wei
2025-01-16 23:17 ` [PATCH net-next v11 21/21] io_uring/zcrx: add selftest David Wei
2025-01-17 14:28 ` Paolo Abeni [this message]
2025-01-17 14:42 ` [PATCH net-next v11 00/21] io_uring zero copy rx Pavel Begunkov
2025-01-17 16:05 ` Paolo Abeni
2025-01-17 16:19 ` Pavel Begunkov