* [PATCH net-next v12 00/10] io_uring zero copy rx
@ 2025-01-17 16:11 Pavel Begunkov
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
This patchset contains net/ patches needed by a new io_uring request
implementing zero copy rx into userspace pages, eliminating a kernel
to user copy.
We configure a page pool that a driver uses to fill a hw rx queue to
hand out user pages instead of kernel pages. Any data that ends up
hitting this hw rx queue will thus be dma'd into userspace memory
directly, without needing to be bounced through kernel memory. 'Reading'
data out of a socket instead becomes a _notification_ mechanism, where
the kernel tells userspace where the data is. The overall approach is
similar to the devmem TCP proposal.
This relies on hw header/data split, flow steering and RSS to ensure
packet headers remain in kernel memory and only desired flows hit a hw
rx queue configured for zero copy. Configuring this is outside of the
scope of this patchset.
We share netdev core infra with devmem TCP. The main difference is that
io_uring is used for the uAPI and the lifetime of all objects is bound
to an io_uring instance. Data is 'read' using a new io_uring request
type. When done, data is returned via a new shared refill queue. A zero
copy page pool refills a hw rx queue from this refill queue directly. Of
course, the lifetime of these data buffers is managed by io_uring
rather than the networking stack, with different refcounting rules.
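The refill queue described above can be pictured as a single-producer,
single-consumer ring: userspace publishes entries by advancing a tail,
and the page pool side consumes them by advancing a head. The following
is only a minimal userspace sketch of that idea, not the uAPI from this
series. All names (demo_rqe, demo_refill_ring, demo_rq_push,
demo_rq_pop), the entry layout and the ring size are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_RQ_ENTRIES 8u /* hypothetical size; must be a power of two */

/* Hypothetical refill entry: names a buffer userspace has finished with. */
struct demo_rqe {
	uint64_t off;
	uint32_t len;
};

/* Single producer (userspace), single consumer (page pool side). */
struct demo_refill_ring {
	_Atomic uint32_t head; /* advanced by the consumer */
	_Atomic uint32_t tail; /* advanced by the producer */
	struct demo_rqe rqes[DEMO_RQ_ENTRIES];
};

/* Producer: return a buffer. Fails if the ring is full. */
static bool demo_rq_push(struct demo_refill_ring *rq, struct demo_rqe rqe)
{
	uint32_t tail = atomic_load_explicit(&rq->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&rq->head, memory_order_acquire);

	if (tail - head == DEMO_RQ_ENTRIES)
		return false;
	rq->rqes[tail & (DEMO_RQ_ENTRIES - 1)] = rqe;
	/* Release: the entry must be visible before the new tail is. */
	atomic_store_explicit(&rq->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer: take one returned buffer. Fails if the ring is empty. */
static bool demo_rq_pop(struct demo_refill_ring *rq, struct demo_rqe *out)
{
	uint32_t head = atomic_load_explicit(&rq->head, memory_order_relaxed);
	/* Acquire: pairs with the producer's release store of tail. */
	uint32_t tail = atomic_load_explicit(&rq->tail, memory_order_acquire);

	if (head == tail)
		return false;
	*out = rq->rqes[head & (DEMO_RQ_ENTRIES - 1)];
	/* Release: the slot may be reused only after it has been read. */
	atomic_store_explicit(&rq->head, head + 1, memory_order_release);
	return true;
}
```

The sketch assumes exactly one consumer. With several consumers the pop
side would need extra serialisation, for example a lock.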
This patchset is the first step adding basic zero copy support. We will
extend this iteratively with new features e.g. dynamically allocated
zero copy areas, THP support, dmabuf support, improved copy fallback,
general optimisations and more.
In terms of netdev support, we're first targeting Broadcom bnxt. Patches
aren't included since Taehee Yoo has already sent a more comprehensive
patchset adding support in [1]. Google gve should already support this,
and Mellanox mlx5 support is WIP pending driver changes.
===========
Performance
===========
Note: Comparison with epoll + TCP_ZEROCOPY_RECEIVE isn't done yet.
Test setup:
* AMD EPYC 9454
* Broadcom BCM957508 200G
* Kernel v6.11 base [2]
* liburing fork [3]
* kperf fork [4]
* 4K MTU
* Single TCP flow
With application thread + net rx softirq pinned to _different_ cores:
+-------------------------------+
| epoll     | io_uring          |
|-----------|-------------------|
| 82.2 Gbps | 116.2 Gbps (+41%) |
+-------------------------------+
Pinned to _same_ core:
+-------------------------------+
| epoll     | io_uring          |
|-----------|-------------------|
| 62.6 Gbps | 80.9 Gbps (+29%)  |
+-------------------------------+
=====
Links
=====
Broadcom bnxt support:
[1]: https://lore.kernel.org/netdev/[email protected]/
Linux kernel branch including io_uring bits:
[2]: https://github.com/isilence/linux.git zcrx/v12
liburing for testing:
[3]: https://github.com/isilence/liburing.git zcrx/next
kperf for testing:
[4]: https://git.kernel.dk/kperf.git
Changes in v12:
---------------
* Check nla_nest_start() errors
* Don't leak a netdev, add missing netdev_put()
* Warn on failed queue restart during close
Changes in v11:
---------------
* Add a shim provider helper for page_pool_set_dma_addr_netmem()
* Drop netdev in ->uninstall, pin struct device instead
* Add net_mp_open_rxq() and net_mp_close_rxq()
* Remove unneeded CFLAGS += -I/usr/include/ in selftest Makefile
Changes in v10:
---------------
* Fix !CONFIG_PAGE_POOL build
* Use acquire/release for RQ in examples
* Fix page_pool_ref_netmem for net_iov
* Move provider helpers / definitions into a new file
* Don’t export page_pool_{set,clear}_pp_info, introduce
net_mp_niov_{set,clear}_page_pool() instead
* Remove devmem.h from net/core/page_pool_user.c
* Add Netdev yaml for io-uring attribute
* Add memory provider ops for filling in Netlink info
Changes in v9:
--------------
* Fail proof against multiple page pools running the same memory
provider
* Lock the consumer side of the refill queue.
* Move scrub into io_uring exit.
* Kill napi_execute.
* Kill area init api and export finer grained net helpers as partial
init now needs to happen in ->alloc_netmems()
* Separate user refcounting.
* Fix copy fallback path math.
* Add rodata check to page_pool_init()
* Fix incorrect path in documentation
Changes in v8:
--------------
* add documentation and selftest
* use io_uring regions for the refill ring
Changes in v7:
--------------
net:
* Use NAPI_F_PREFER_BUSY_POLL for napi_execute + stylistic changes.
Changes in v6:
--------------
Please note: Comparison with TCP_ZEROCOPY_RECEIVE isn't done yet.
net:
* Drop a devmem.h clean up patch.
* Migrate to netdev_get_by_index from deprecated API.
* Fix !CONFIG_NET_DEVMEM build.
* Don’t return into the page pool cache directly, use a new helper
* Refactor napi_execute
io_uring:
* Require IORING_RECV_MULTISHOT flag set.
* Add unselectable CONFIG_IO_URING_ZCRX.
* Pulled latest io_uring changes.
* Unexport io_uring_pp_zc_ops.
Changes in v5:
--------------
* Rebase on top of merged net_iov + netmem infra.
* Decouple net_iov from devmem TCP.
* Use netdev queue API to allocate an rx queue.
* Minor uAPI enhancements for future extensibility.
* QoS improvements with request throttling.
Changes in RFC v4:
------------------
* Rebased on top of Mina Almasry's TCP devmem patchset and latest
net-next, now sharing common infra e.g.:
* netmem_t and net_iovs
* Page pool memory provider
* The registered buffer (rbuf) completion queue where completions from
io_recvzc requests are posted is removed. Now these post into the main
completion queue, using big (32-byte) CQEs. The first 16 bytes is an
ordinary CQE, while the latter 16 bytes contain the io_uring_rbuf_cqe
as before. This vastly simplifies the uAPI and removes a level of
indirection in userspace when looking for payloads.
* The rbuf refill queue is still needed for userspace to return
buffers to kernel.
* Simplified code and uAPI on the io_uring side, particularly
io_recvzc() and io_zc_rx_recv(). Many unnecessary lines were removed
e.g. extra msg flags, readlen, etc.
Changes in RFC v3:
------------------
* Rebased on top of Jakub Kicinski's memory provider API RFC. The ZC
pool added is now a backend for memory provider.
* We're also reusing ppiov infrastructure. The refcounting rules stay
the same but it's shifted into ppiov->refcount. That lets us
flexibly manage buffer lifetimes without adding any extra code to the
common networking paths. It'd also make it easier to support dmabufs
and device memory in the future.
* io_uring also knows about pages, and so ppiovs might unnecessarily
break tools inspecting data; that can easily be solved later.
Many patches are not for upstream as they depend on work in progress,
namely from Mina:
* struct netmem_t
* Driver ndo commands for Rx queue configs
* struct page_pool_iov and shared pp infra
Changes in RFC v2:
------------------
* Added copy fallback support if userspace memory allocated for ZC Rx
runs out, or if header splitting or flow steering fails.
* Added veth support for ZC Rx, for testing and demonstration. We will
need to figure out what driver would be best for such testing
functionality in the future. Perhaps netdevsim?
* Added socket registration API to io_uring to associate specific
sockets with ifqs/Rx queues for ZC.
* Added multi-socket support, such that multiple connections can be
steered into the same hardware Rx queue.
* Added Netbench server/client support.
David Wei (2):
netdev: add io_uring memory provider info
net: add helpers for setting a memory provider on an rx queue
Pavel Begunkov (8):
net: page_pool: don't cast mp param to devmem
net: prefix devmem specific helpers
net: generalise net_iov chunk owners
net: page_pool: create hooks for custom memory providers
net: page_pool: add callback for mp info printing
net: page_pool: add a mp hook to unregister_netdevice*
net: prepare for non devmem TCP memory providers
net: page_pool: add memory provider helpers
Documentation/netlink/specs/netdev.yaml | 15 ++++
include/net/netmem.h | 21 +++++-
include/net/page_pool/memory_provider.h | 45 ++++++++++++
include/net/page_pool/types.h | 4 ++
include/uapi/linux/netdev.h | 8 +++
net/core/dev.c | 16 ++++-
net/core/devmem.c | 93 ++++++++++++++++---------
net/core/devmem.h | 49 ++++++-------
net/core/netdev-genl.c | 11 +--
net/core/netdev_rx_queue.c | 62 +++++++++++++++++
net/core/page_pool.c | 51 +++++++++++---
net/core/page_pool_user.c | 7 +-
net/ipv4/tcp.c | 7 +-
tools/include/uapi/linux/netdev.h | 8 +++
14 files changed, 316 insertions(+), 81 deletions(-)
create mode 100644 include/net/page_pool/memory_provider.h
--
2.47.1
* [PATCH net-next v12 01/10] net: page_pool: don't cast mp param to devmem
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
page_pool_check_memory_provider() is a generic path and shouldn't assume
anything about the actual type of the memory provider argument. It's
fine while devmem is the only provider, but cast away the devmem
specific binding types to avoid confusion.
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
net/core/page_pool_user.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 48335766c1bf..8d31c71bea1a 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -353,7 +353,7 @@ void page_pool_unlist(struct page_pool *pool)
int page_pool_check_memory_provider(struct net_device *dev,
struct netdev_rx_queue *rxq)
{
- struct net_devmem_dmabuf_binding *binding = rxq->mp_params.mp_priv;
+ void *binding = rxq->mp_params.mp_priv;
struct page_pool *pool;
struct hlist_node *n;
--
2.47.1
* [PATCH net-next v12 02/10] net: prefix devmem specific helpers
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
Add prefixes to all helpers that are specific to devmem TCP, i.e.
net_iov_binding[_id].
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
net/core/devmem.c | 2 +-
net/core/devmem.h | 14 +++++++-------
net/ipv4/tcp.c | 2 +-
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index c971b8aceac8..acd3e390a3da 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -94,7 +94,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
void net_devmem_free_dmabuf(struct net_iov *niov)
{
- struct net_devmem_dmabuf_binding *binding = net_iov_binding(niov);
+ struct net_devmem_dmabuf_binding *binding = net_devmem_iov_binding(niov);
unsigned long dma_addr = net_devmem_get_dma_addr(niov);
if (WARN_ON(!gen_pool_has_addr(binding->chunk_pool, dma_addr,
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 76099ef9c482..99782ddeca40 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -86,11 +86,16 @@ static inline unsigned int net_iov_idx(const struct net_iov *niov)
}
static inline struct net_devmem_dmabuf_binding *
-net_iov_binding(const struct net_iov *niov)
+net_devmem_iov_binding(const struct net_iov *niov)
{
return net_iov_owner(niov)->binding;
}
+static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
+{
+ return net_devmem_iov_binding(niov)->id;
+}
+
static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
{
struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
@@ -99,11 +104,6 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
}
-static inline u32 net_iov_binding_id(const struct net_iov *niov)
-{
- return net_iov_owner(niov)->binding->id;
-}
-
static inline void
net_devmem_dmabuf_binding_get(struct net_devmem_dmabuf_binding *binding)
{
@@ -171,7 +171,7 @@ static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
return 0;
}
-static inline u32 net_iov_binding_id(const struct net_iov *niov)
+static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
{
return 0;
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..b872de9a8271 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2494,7 +2494,7 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
/* Will perform the exchange later */
dmabuf_cmsg.frag_token = tcp_xa_pool.tokens[tcp_xa_pool.idx];
- dmabuf_cmsg.dmabuf_id = net_iov_binding_id(niov);
+ dmabuf_cmsg.dmabuf_id = net_devmem_iov_binding_id(niov);
offset += copy;
remaining_len -= copy;
--
2.47.1
* [PATCH net-next v12 03/10] net: generalise net_iov chunk owners
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
Currently net_iov stores a pointer to struct dmabuf_genpool_chunk_owner,
which serves as a useful abstraction to share data and provide a
context. However, it's too devmem specific, and we want to reuse it for
other memory providers, and for that we need to decouple net_iov from
devmem. Make net_iov point to a new base structure called
net_iov_area, which dmabuf_genpool_chunk_owner extends.
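The base/extension relationship described above can be illustrated in
plain userspace C. This is a hedged sketch only: the demo_* names, the
demo_container_of() macro and DEMO_PAGE_SHIFT are stand-ins invented
for the example and merely mirror the shape of net_iov,
net_iov_area and dmabuf_genpool_chunk_owner. Generic code sees only
the embedded base struct, while provider code recovers its own
structure from the base pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12 /* assumption: 4K pages */

/* Userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_iov;

/* Generic part shared by every provider (mirrors net_iov_area). */
struct demo_iov_area {
	struct demo_iov *niovs;
	size_t num_niovs;
	unsigned long base_virtual;
};

/* Mirrors net_iov: it only knows about the generic area. */
struct demo_iov {
	struct demo_iov_area *owner;
};

/* Provider-specific part (mirrors dmabuf_genpool_chunk_owner). */
struct demo_chunk_owner {
	struct demo_iov_area area; /* embedded base */
	uint64_t base_dma_addr;
};

/* Generic helper: index of an iov within its area. */
static size_t demo_iov_idx(const struct demo_iov *niov)
{
	return (size_t)(niov - niov->owner->niovs);
}

/* Provider helper: recover the extended struct from the base pointer. */
static struct demo_chunk_owner *
demo_iov_to_chunk_owner(const struct demo_iov *niov)
{
	return demo_container_of(niov->owner, struct demo_chunk_owner, area);
}

/* Provider helper that needs a field outside the generic area. */
static uint64_t demo_iov_dma_addr(const struct demo_iov *niov)
{
	return demo_iov_to_chunk_owner(niov)->base_dma_addr +
	       ((uint64_t)demo_iov_idx(niov) << DEMO_PAGE_SHIFT);
}
```

A second provider could embed the same demo_iov_area in a different
outer struct without the generic helpers changing.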
Reviewed-by: Mina Almasry <[email protected]>
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/net/netmem.h | 21 ++++++++++++++++++++-
net/core/devmem.c | 25 +++++++++++++------------
net/core/devmem.h | 25 +++++++++----------------
3 files changed, 42 insertions(+), 29 deletions(-)
diff --git a/include/net/netmem.h b/include/net/netmem.h
index 1b58faa4f20f..c61d5b21e7b4 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -24,11 +24,20 @@ struct net_iov {
unsigned long __unused_padding;
unsigned long pp_magic;
struct page_pool *pp;
- struct dmabuf_genpool_chunk_owner *owner;
+ struct net_iov_area *owner;
unsigned long dma_addr;
atomic_long_t pp_ref_count;
};
+struct net_iov_area {
+ /* Array of net_iovs for this area. */
+ struct net_iov *niovs;
+ size_t num_niovs;
+
+ /* Offset into the dma-buf where this chunk starts. */
+ unsigned long base_virtual;
+};
+
/* These fields in struct page are used by the page_pool and net stack:
*
* struct {
@@ -54,6 +63,16 @@ NET_IOV_ASSERT_OFFSET(dma_addr, dma_addr);
NET_IOV_ASSERT_OFFSET(pp_ref_count, pp_ref_count);
#undef NET_IOV_ASSERT_OFFSET
+static inline struct net_iov_area *net_iov_owner(const struct net_iov *niov)
+{
+ return niov->owner;
+}
+
+static inline unsigned int net_iov_idx(const struct net_iov *niov)
+{
+ return niov - net_iov_owner(niov)->niovs;
+}
+
/* netmem */
/**
diff --git a/net/core/devmem.c b/net/core/devmem.c
index acd3e390a3da..3d91fba2bd26 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -33,14 +33,15 @@ static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
{
struct dmabuf_genpool_chunk_owner *owner = chunk->owner;
- kvfree(owner->niovs);
+ kvfree(owner->area.niovs);
kfree(owner);
}
static dma_addr_t net_devmem_get_dma_addr(const struct net_iov *niov)
{
- struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
+ struct dmabuf_genpool_chunk_owner *owner;
+ owner = net_devmem_iov_to_chunk_owner(niov);
return owner->base_dma_addr +
((dma_addr_t)net_iov_idx(niov) << PAGE_SHIFT);
}
@@ -83,7 +84,7 @@ net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
offset = dma_addr - owner->base_dma_addr;
index = offset / PAGE_SIZE;
- niov = &owner->niovs[index];
+ niov = &owner->area.niovs[index];
niov->pp_magic = 0;
niov->pp = NULL;
@@ -261,9 +262,9 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
goto err_free_chunks;
}
- owner->base_virtual = virtual;
+ owner->area.base_virtual = virtual;
owner->base_dma_addr = dma_addr;
- owner->num_niovs = len / PAGE_SIZE;
+ owner->area.num_niovs = len / PAGE_SIZE;
owner->binding = binding;
err = gen_pool_add_owner(binding->chunk_pool, dma_addr,
@@ -275,17 +276,17 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
goto err_free_chunks;
}
- owner->niovs = kvmalloc_array(owner->num_niovs,
- sizeof(*owner->niovs),
- GFP_KERNEL);
- if (!owner->niovs) {
+ owner->area.niovs = kvmalloc_array(owner->area.num_niovs,
+ sizeof(*owner->area.niovs),
+ GFP_KERNEL);
+ if (!owner->area.niovs) {
err = -ENOMEM;
goto err_free_chunks;
}
- for (i = 0; i < owner->num_niovs; i++) {
- niov = &owner->niovs[i];
- niov->owner = owner;
+ for (i = 0; i < owner->area.num_niovs; i++) {
+ niov = &owner->area.niovs[i];
+ niov->owner = &owner->area;
page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov),
net_devmem_get_dma_addr(niov));
}
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 99782ddeca40..a2b9913e9a17 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -10,6 +10,8 @@
#ifndef _NET_DEVMEM_H
#define _NET_DEVMEM_H
+#include <net/netmem.h>
+
struct netlink_ext_ack;
struct net_devmem_dmabuf_binding {
@@ -51,17 +53,11 @@ struct net_devmem_dmabuf_binding {
* allocations from this chunk.
*/
struct dmabuf_genpool_chunk_owner {
- /* Offset into the dma-buf where this chunk starts. */
- unsigned long base_virtual;
+ struct net_iov_area area;
+ struct net_devmem_dmabuf_binding *binding;
/* dma_addr of the start of the chunk. */
dma_addr_t base_dma_addr;
-
- /* Array of net_iovs for this chunk. */
- struct net_iov *niovs;
- size_t num_niovs;
-
- struct net_devmem_dmabuf_binding *binding;
};
void __net_devmem_dmabuf_binding_free(struct net_devmem_dmabuf_binding *binding);
@@ -75,20 +71,17 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
void dev_dmabuf_uninstall(struct net_device *dev);
static inline struct dmabuf_genpool_chunk_owner *
-net_iov_owner(const struct net_iov *niov)
+net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
{
- return niov->owner;
-}
+ struct net_iov_area *owner = net_iov_owner(niov);
-static inline unsigned int net_iov_idx(const struct net_iov *niov)
-{
- return niov - net_iov_owner(niov)->niovs;
+ return container_of(owner, struct dmabuf_genpool_chunk_owner, area);
}
static inline struct net_devmem_dmabuf_binding *
net_devmem_iov_binding(const struct net_iov *niov)
{
- return net_iov_owner(niov)->binding;
+ return net_devmem_iov_to_chunk_owner(niov)->binding;
}
static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
@@ -98,7 +91,7 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
static inline unsigned long net_iov_virtual_addr(const struct net_iov *niov)
{
- struct dmabuf_genpool_chunk_owner *owner = net_iov_owner(niov);
+ struct net_iov_area *owner = net_iov_owner(niov);
return owner->base_virtual +
((unsigned long)net_iov_idx(niov) << PAGE_SHIFT);
--
2.47.1
* [PATCH net-next v12 04/10] net: page_pool: create hooks for custom memory providers
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
A spin off from the original page pool memory providers patch by Jakub,
which allows extending page pools with custom allocators. One such
provider is devmem TCP, and the other is io_uring zerocopy added in
following patches.
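The hook mechanism can be sketched in userspace as an optional table of
function pointers that a pool consults before falling back to its
default path. This is an illustration under stated assumptions, not the
kernel code: every demo_* name is hypothetical, buffers are modelled as
plain integer ids, and the toy provider simply hands out ids from a
counter stored in mp_priv.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_pool;

/* Hypothetical stand-in for struct memory_provider_ops. */
struct demo_provider_ops {
	int (*alloc)(struct demo_pool *pool);            /* returns a buffer id */
	bool (*release)(struct demo_pool *pool, int id); /* true: pool frees it */
	int (*init)(struct demo_pool *pool);
	void (*destroy)(struct demo_pool *pool);
};

struct demo_pool {
	void *mp_priv;                          /* provider private state */
	const struct demo_provider_ops *mp_ops; /* NULL: use the default path */
	int next_id;                            /* default allocator state */
};

/* Toy provider: hands out ids from a counter kept in mp_priv. */
static int demo_mp_init(struct demo_pool *pool)
{
	return pool->mp_priv ? 0 : -1;
}

static void demo_mp_destroy(struct demo_pool *pool)
{
	pool->mp_priv = NULL;
}

static int demo_mp_alloc(struct demo_pool *pool)
{
	int *next = pool->mp_priv;

	return (*next)++;
}

static bool demo_mp_release(struct demo_pool *pool, int id)
{
	(void)pool;
	(void)id;
	return false; /* the provider owns its buffers; the pool must not free */
}

static const struct demo_provider_ops demo_mp_ops = {
	.init = demo_mp_init,
	.destroy = demo_mp_destroy,
	.alloc = demo_mp_alloc,
	.release = demo_mp_release,
};

/* Pool entry points: dispatch through mp_ops when a provider is set. */
static int demo_pool_init(struct demo_pool *pool)
{
	pool->next_id = 0;
	return pool->mp_ops ? pool->mp_ops->init(pool) : 0;
}

static int demo_pool_alloc(struct demo_pool *pool)
{
	if (pool->mp_ops)
		return pool->mp_ops->alloc(pool);
	return pool->next_id++;
}

/* Returns true when the pool itself should free the buffer. */
static bool demo_pool_return(struct demo_pool *pool, int id)
{
	return pool->mp_ops ? pool->mp_ops->release(pool, id) : true;
}

static void demo_pool_destroy(struct demo_pool *pool)
{
	if (pool->mp_ops)
		pool->mp_ops->destroy(pool);
}
```

Keying the dispatch on the ops pointer rather than on the private
pointer lets a provider legitimately run with no private state.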
Link: https://lore.kernel.org/netdev/[email protected]/
Co-developed-by: Jakub Kicinski <[email protected]> # initial mp proposal
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/net/page_pool/memory_provider.h | 15 +++++++++++++++
include/net/page_pool/types.h | 4 ++++
net/core/devmem.c | 15 ++++++++++++++-
net/core/page_pool.c | 23 +++++++++++++++--------
4 files changed, 48 insertions(+), 9 deletions(-)
create mode 100644 include/net/page_pool/memory_provider.h
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
new file mode 100644
index 000000000000..e49d0a52629d
--- /dev/null
+++ b/include/net/page_pool/memory_provider.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _NET_PAGE_POOL_MEMORY_PROVIDER_H
+#define _NET_PAGE_POOL_MEMORY_PROVIDER_H
+
+#include <net/netmem.h>
+#include <net/page_pool/types.h>
+
+struct memory_provider_ops {
+ netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
+ bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
+ int (*init)(struct page_pool *pool);
+ void (*destroy)(struct page_pool *pool);
+};
+
+#endif
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index ed4cd114180a..88f65c3e2ad9 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -152,8 +152,11 @@ struct page_pool_stats {
*/
#define PAGE_POOL_FRAG_GROUP_ALIGN (4 * sizeof(long))
+struct memory_provider_ops;
+
struct pp_memory_provider_params {
void *mp_priv;
+ const struct memory_provider_ops *mp_ops;
};
struct page_pool {
@@ -216,6 +219,7 @@ struct page_pool {
struct ptr_ring ring;
void *mp_priv;
+ const struct memory_provider_ops *mp_ops;
#ifdef CONFIG_PAGE_POOL_STATS
/* recycle stats are per-cpu to avoid locking */
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 3d91fba2bd26..1a88ab6faf06 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -16,6 +16,7 @@
#include <net/netdev_queues.h>
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
#include <trace/events/page_pool.h>
#include "devmem.h"
@@ -27,6 +28,8 @@
/* Protected by rtnl_lock() */
static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
+static const struct memory_provider_ops dmabuf_devmem_ops;
+
static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
struct gen_pool_chunk *chunk,
void *not_used)
@@ -118,6 +121,7 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding)
WARN_ON(rxq->mp_params.mp_priv != binding);
rxq->mp_params.mp_priv = NULL;
+ rxq->mp_params.mp_ops = NULL;
rxq_idx = get_netdev_rx_queue_index(rxq);
@@ -153,7 +157,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
}
rxq = __netif_get_rx_queue(dev, rxq_idx);
- if (rxq->mp_params.mp_priv) {
+ if (rxq->mp_params.mp_ops) {
NL_SET_ERR_MSG(extack, "designated queue already memory provider bound");
return -EEXIST;
}
@@ -171,6 +175,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
return err;
rxq->mp_params.mp_priv = binding;
+ rxq->mp_params.mp_ops = &dmabuf_devmem_ops;
err = netdev_rx_queue_restart(dev, rxq_idx);
if (err)
@@ -180,6 +185,7 @@ int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
err_xa_erase:
rxq->mp_params.mp_priv = NULL;
+ rxq->mp_params.mp_ops = NULL;
xa_erase(&binding->bound_rxqs, xa_idx);
return err;
@@ -399,3 +405,10 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
/* We don't want the page pool put_page()ing our net_iovs. */
return false;
}
+
+static const struct memory_provider_ops dmabuf_devmem_ops = {
+ .init = mp_dmabuf_devmem_init,
+ .destroy = mp_dmabuf_devmem_destroy,
+ .alloc_netmems = mp_dmabuf_devmem_alloc_netmems,
+ .release_netmem = mp_dmabuf_devmem_release_page,
+};
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a3de752c5178..199564b03533 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -13,6 +13,7 @@
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
#include <net/xdp.h>
#include <linux/dma-direction.h>
@@ -285,13 +286,19 @@ static int page_pool_init(struct page_pool *pool,
rxq = __netif_get_rx_queue(pool->slow.netdev,
pool->slow.queue_idx);
pool->mp_priv = rxq->mp_params.mp_priv;
+ pool->mp_ops = rxq->mp_params.mp_ops;
}
- if (pool->mp_priv) {
+ if (pool->mp_ops) {
if (!pool->dma_map || !pool->dma_sync)
return -EOPNOTSUPP;
- err = mp_dmabuf_devmem_init(pool);
+ if (WARN_ON(!is_kernel_rodata((unsigned long)pool->mp_ops))) {
+ err = -EFAULT;
+ goto free_ptr_ring;
+ }
+
+ err = pool->mp_ops->init(pool);
if (err) {
pr_warn("%s() mem-provider init failed %d\n", __func__,
err);
@@ -588,8 +595,8 @@ netmem_ref page_pool_alloc_netmems(struct page_pool *pool, gfp_t gfp)
return netmem;
/* Slow-path: cache empty, do real allocation */
- if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
- netmem = mp_dmabuf_devmem_alloc_netmems(pool, gfp);
+ if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+ netmem = pool->mp_ops->alloc_netmems(pool, gfp);
else
netmem = __page_pool_alloc_pages_slow(pool, gfp);
return netmem;
@@ -680,8 +687,8 @@ void page_pool_return_page(struct page_pool *pool, netmem_ref netmem)
bool put;
put = true;
- if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_priv)
- put = mp_dmabuf_devmem_release_page(pool, netmem);
+ if (static_branch_unlikely(&page_pool_mem_providers) && pool->mp_ops)
+ put = pool->mp_ops->release_netmem(pool, netmem);
else
__page_pool_release_page_dma(pool, netmem);
@@ -1049,8 +1056,8 @@ static void __page_pool_destroy(struct page_pool *pool)
page_pool_unlist(pool);
page_pool_uninit(pool);
- if (pool->mp_priv) {
- mp_dmabuf_devmem_destroy(pool);
+ if (pool->mp_ops) {
+ pool->mp_ops->destroy(pool);
static_branch_dec(&page_pool_mem_providers);
}
--
2.47.1
* [PATCH net-next v12 05/10] netdev: add io_uring memory provider info
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
From: David Wei <[email protected]>
Add a nested attribute for io_uring memory provider info. For now it is
empty and its presence indicates that a particular page pool or queue
has an io_uring memory provider attached.
$ ./cli.py --spec netlink/specs/netdev.yaml --dump page-pool-get
[{'id': 80,
'ifindex': 2,
'inflight': 64,
'inflight-mem': 262144,
'napi-id': 525},
{'id': 79,
'ifindex': 2,
'inflight': 320,
'inflight-mem': 1310720,
'io_uring': {},
'napi-id': 525},
...
$ ./cli.py --spec netlink/specs/netdev.yaml --dump queue-get
[{'id': 0, 'ifindex': 1, 'type': 'rx'},
{'id': 0, 'ifindex': 1, 'type': 'tx'},
{'id': 0, 'ifindex': 2, 'napi-id': 513, 'type': 'rx'},
{'id': 1, 'ifindex': 2, 'napi-id': 514, 'type': 'rx'},
...
{'id': 12, 'ifindex': 2, 'io_uring': {}, 'napi-id': 525, 'type': 'rx'},
...
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David Wei <[email protected]>
---
Documentation/netlink/specs/netdev.yaml | 15 +++++++++++++++
include/uapi/linux/netdev.h | 8 ++++++++
tools/include/uapi/linux/netdev.h | 8 ++++++++
3 files changed, 31 insertions(+)
diff --git a/Documentation/netlink/specs/netdev.yaml b/Documentation/netlink/specs/netdev.yaml
index cbb544bd6c84..288923e965ae 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -114,6 +114,9 @@ attribute-sets:
doc: Bitmask of enabled AF_XDP features.
type: u64
enum: xsk-flags
+ -
+ name: io-uring-provider-info
+ attributes: []
-
name: page-pool
attributes:
@@ -171,6 +174,11 @@ attribute-sets:
name: dmabuf
doc: ID of the dmabuf this page-pool is attached to.
type: u32
+ -
+ name: io-uring
+ doc: io-uring memory provider information.
+ type: nest
+ nested-attributes: io-uring-provider-info
-
name: page-pool-info
subset-of: page-pool
@@ -296,6 +304,11 @@ attribute-sets:
name: dmabuf
doc: ID of the dmabuf attached to this queue, if any.
type: u32
+ -
+ name: io-uring
+ doc: io_uring memory provider information.
+ type: nest
+ nested-attributes: io-uring-provider-info
-
name: qstats
@@ -572,6 +585,7 @@ operations:
- inflight-mem
- detach-time
- dmabuf
+ - io-uring
dump:
reply: *pp-reply
config-cond: page-pool
@@ -637,6 +651,7 @@ operations:
- napi-id
- ifindex
- dmabuf
+ - io-uring
dump:
request:
attributes:
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index e4be227d3ad6..684090732068 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -86,6 +86,12 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
+enum {
+
+ __NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
+ NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
+};
+
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
@@ -94,6 +100,7 @@ enum {
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
NETDEV_A_PAGE_POOL_DMABUF,
+ NETDEV_A_PAGE_POOL_IO_URING,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@@ -136,6 +143,7 @@ enum {
NETDEV_A_QUEUE_TYPE,
NETDEV_A_QUEUE_NAPI_ID,
NETDEV_A_QUEUE_DMABUF,
+ NETDEV_A_QUEUE_IO_URING,
__NETDEV_A_QUEUE_MAX,
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index e4be227d3ad6..684090732068 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -86,6 +86,12 @@ enum {
NETDEV_A_DEV_MAX = (__NETDEV_A_DEV_MAX - 1)
};
+enum {
+
+ __NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
+ NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
+};
+
enum {
NETDEV_A_PAGE_POOL_ID = 1,
NETDEV_A_PAGE_POOL_IFINDEX,
@@ -94,6 +100,7 @@ enum {
NETDEV_A_PAGE_POOL_INFLIGHT_MEM,
NETDEV_A_PAGE_POOL_DETACH_TIME,
NETDEV_A_PAGE_POOL_DMABUF,
+ NETDEV_A_PAGE_POOL_IO_URING,
__NETDEV_A_PAGE_POOL_MAX,
NETDEV_A_PAGE_POOL_MAX = (__NETDEV_A_PAGE_POOL_MAX - 1)
@@ -136,6 +143,7 @@ enum {
NETDEV_A_QUEUE_TYPE,
NETDEV_A_QUEUE_NAPI_ID,
NETDEV_A_QUEUE_DMABUF,
+ NETDEV_A_QUEUE_IO_URING,
__NETDEV_A_QUEUE_MAX,
NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
--
2.47.1
* [PATCH net-next v12 06/10] net: page_pool: add callback for mp info printing
2025-01-17 16:11 [PATCH net-next v12 00/10] io_uring zero copy rx Pavel Begunkov
` (4 preceding siblings ...)
2025-01-17 16:11 ` [PATCH net-next v12 05/10] netdev: add io_uring memory provider info Pavel Begunkov
@ 2025-01-17 16:11 ` Pavel Begunkov
2025-01-17 16:11 ` [PATCH net-next v12 07/10] net: page_pool: add a mp hook to unregister_netdevice* Pavel Begunkov
` (4 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
Add a mandatory callback that prints information about the memory
provider to netlink.
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/net/page_pool/memory_provider.h | 5 +++++
net/core/devmem.c | 10 ++++++++++
net/core/netdev-genl.c | 11 ++++++-----
net/core/page_pool_user.c | 5 ++---
4 files changed, 23 insertions(+), 8 deletions(-)
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index e49d0a52629d..6d10a0959d00 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -5,11 +5,16 @@
#include <net/netmem.h>
#include <net/page_pool/types.h>
+struct netdev_rx_queue;
+struct sk_buff;
+
struct memory_provider_ops {
netmem_ref (*alloc_netmems)(struct page_pool *pool, gfp_t gfp);
bool (*release_netmem)(struct page_pool *pool, netmem_ref netmem);
int (*init)(struct page_pool *pool);
void (*destroy)(struct page_pool *pool);
+ int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
+ struct netdev_rx_queue *rxq);
};
#endif
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 1a88ab6faf06..b33b978fa28f 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -406,9 +406,19 @@ bool mp_dmabuf_devmem_release_page(struct page_pool *pool, netmem_ref netmem)
return false;
}
+static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
+ struct netdev_rx_queue *rxq)
+{
+ const struct net_devmem_dmabuf_binding *binding = mp_priv;
+ int type = rxq ? NETDEV_A_QUEUE_DMABUF : NETDEV_A_PAGE_POOL_DMABUF;
+
+ return nla_put_u32(rsp, type, binding->id);
+}
+
static const struct memory_provider_ops dmabuf_devmem_ops = {
.init = mp_dmabuf_devmem_init,
.destroy = mp_dmabuf_devmem_destroy,
.alloc_netmems = mp_dmabuf_devmem_alloc_netmems,
.release_netmem = mp_dmabuf_devmem_release_page,
+ .nl_fill = mp_dmabuf_devmem_nl_fill,
};
diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
index 715f85c6b62e..5b459b4fef46 100644
--- a/net/core/netdev-genl.c
+++ b/net/core/netdev-genl.c
@@ -10,6 +10,7 @@
#include <net/sock.h>
#include <net/xdp.h>
#include <net/xdp_sock.h>
+#include <net/page_pool/memory_provider.h>
#include "dev.h"
#include "devmem.h"
@@ -368,7 +369,7 @@ static int
netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
u32 q_idx, u32 q_type, const struct genl_info *info)
{
- struct net_devmem_dmabuf_binding *binding;
+ struct pp_memory_provider_params *params;
struct netdev_rx_queue *rxq;
struct netdev_queue *txq;
void *hdr;
@@ -385,15 +386,15 @@ netdev_nl_queue_fill_one(struct sk_buff *rsp, struct net_device *netdev,
switch (q_type) {
case NETDEV_QUEUE_TYPE_RX:
rxq = __netif_get_rx_queue(netdev, q_idx);
+
if (rxq->napi && nla_put_u32(rsp, NETDEV_A_QUEUE_NAPI_ID,
rxq->napi->napi_id))
goto nla_put_failure;
- binding = rxq->mp_params.mp_priv;
- if (binding &&
- nla_put_u32(rsp, NETDEV_A_QUEUE_DMABUF, binding->id))
+ params = &rxq->mp_params;
+ if (params->mp_ops &&
+ params->mp_ops->nl_fill(params->mp_priv, rsp, rxq))
goto nla_put_failure;
-
break;
case NETDEV_QUEUE_TYPE_TX:
txq = netdev_get_tx_queue(netdev, q_idx);
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 8d31c71bea1a..bd017537fa80 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -7,9 +7,9 @@
#include <net/netdev_rx_queue.h>
#include <net/page_pool/helpers.h>
#include <net/page_pool/types.h>
+#include <net/page_pool/memory_provider.h>
#include <net/sock.h>
-#include "devmem.h"
#include "page_pool_priv.h"
#include "netdev-genl-gen.h"
@@ -214,7 +214,6 @@ static int
page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
const struct genl_info *info)
{
- struct net_devmem_dmabuf_binding *binding = pool->mp_priv;
size_t inflight, refsz;
void *hdr;
@@ -244,7 +243,7 @@ page_pool_nl_fill(struct sk_buff *rsp, const struct page_pool *pool,
pool->user.detach_time))
goto err_cancel;
- if (binding && nla_put_u32(rsp, NETDEV_A_PAGE_POOL_DMABUF, binding->id))
+ if (pool->mp_ops && pool->mp_ops->nl_fill(pool->mp_priv, rsp, NULL))
goto err_cancel;
genlmsg_end(rsp, hdr);
--
2.47.1
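A minimal userspace sketch of the dispatch pattern patch 06 introduces: the generic reporting path no longer knows about dmabuf bindings and simply calls the provider's nl_fill callback when a provider is installed. All types and names here (struct msg, struct rx_queue, report_provider, the ATTR_* constants) are hypothetical stand-ins for illustration, not the kernel definitions, and the netlink message is modelled as a plain struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute ids standing in for NETDEV_A_{QUEUE,PAGE_POOL}_DMABUF. */
enum { ATTR_QUEUE_DMABUF = 1, ATTR_PAGE_POOL_DMABUF = 2 };

/* Stand-in for the netlink response skb: holds a single attribute. */
struct msg {
	int attr;
	uint32_t value;
	int filled;
};

/* Stand-in for struct netdev_rx_queue. */
struct rx_queue {
	int idx;
};

struct provider_ops {
	int (*nl_fill)(void *mp_priv, struct msg *rsp, struct rx_queue *rxq);
};

struct provider_params {
	const struct provider_ops *mp_ops;
	void *mp_priv;
};

struct dmabuf_binding {
	uint32_t id;
};

/* Provider side: pick the attribute type based on who is asking. */
static int dmabuf_nl_fill(void *mp_priv, struct msg *rsp, struct rx_queue *rxq)
{
	const struct dmabuf_binding *binding = mp_priv;

	assert(binding != NULL);
	rsp->attr = rxq ? ATTR_QUEUE_DMABUF : ATTR_PAGE_POOL_DMABUF;
	rsp->value = binding->id;
	rsp->filled = 1;
	return 0;
}

static const struct provider_ops dmabuf_ops = {
	.nl_fill = dmabuf_nl_fill,
};

/* Generic side: no provider-specific knowledge, just call the hook. */
static int report_provider(const struct provider_params *p, struct msg *rsp,
			   struct rx_queue *rxq)
{
	if (p->mp_ops && p->mp_ops->nl_fill(p->mp_priv, rsp, rxq))
		return -1;
	return 0;
}
```

Passing a NULL queue selects the page pool attribute, which mirrors how the patch reuses one callback for both the queue dump and the page pool dump.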
* [PATCH net-next v12 07/10] net: page_pool: add a mp hook to unregister_netdevice*
2025-01-17 16:11 [PATCH net-next v12 00/10] io_uring zero copy rx Pavel Begunkov
` (5 preceding siblings ...)
2025-01-17 16:11 ` [PATCH net-next v12 06/10] net: page_pool: add callback for mp info printing Pavel Begunkov
@ 2025-01-17 16:11 ` Pavel Begunkov
2025-01-17 16:11 ` [PATCH net-next v12 08/10] net: prepare for non devmem TCP memory providers Pavel Begunkov
` (3 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
Devmem TCP needs a hook in unregister_netdevice_many_notify() to maintain
the set of queues it's bound to, i.e. ->bound_rxqs. Instead of devmem
sticking directly out of the generic path, add an mp function.
Reviewed-by: Jakub Kicinski <[email protected]>
Reviewed-by: Mina Almasry <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/net/page_pool/memory_provider.h | 1 +
net/core/dev.c | 16 ++++++++++-
net/core/devmem.c | 36 +++++++++++--------------
net/core/devmem.h | 5 ----
4 files changed, 32 insertions(+), 26 deletions(-)
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 6d10a0959d00..36469a7e649f 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -15,6 +15,7 @@ struct memory_provider_ops {
void (*destroy)(struct page_pool *pool);
int (*nl_fill)(void *mp_priv, struct sk_buff *rsp,
struct netdev_rx_queue *rxq);
+ void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
};
#endif
diff --git a/net/core/dev.c b/net/core/dev.c
index fe5f5855593d..e5a4ba3fc24f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -158,6 +158,7 @@
#include <net/netdev_rx_queue.h>
#include <net/page_pool/types.h>
#include <net/page_pool/helpers.h>
+#include <net/page_pool/memory_provider.h>
#include <net/rps.h>
#include <linux/phy_link_topology.h>
@@ -11721,6 +11722,19 @@ void unregister_netdevice_queue(struct net_device *dev, struct list_head *head)
}
EXPORT_SYMBOL(unregister_netdevice_queue);
+static void dev_memory_provider_uninstall(struct net_device *dev)
+{
+ unsigned int i;
+
+ for (i = 0; i < dev->real_num_rx_queues; i++) {
+ struct netdev_rx_queue *rxq = &dev->_rx[i];
+ struct pp_memory_provider_params *p = &rxq->mp_params;
+
+ if (p->mp_ops && p->mp_ops->uninstall)
+ p->mp_ops->uninstall(rxq->mp_params.mp_priv, rxq);
+ }
+}
+
void unregister_netdevice_many_notify(struct list_head *head,
u32 portid, const struct nlmsghdr *nlh)
{
@@ -11777,7 +11791,7 @@ void unregister_netdevice_many_notify(struct list_head *head,
dev_tcx_uninstall(dev);
dev_xdp_uninstall(dev);
bpf_dev_bound_netdev_unregister(dev);
- dev_dmabuf_uninstall(dev);
+ dev_memory_provider_uninstall(dev);
netdev_offload_xstats_disable_all(dev);
diff --git a/net/core/devmem.c b/net/core/devmem.c
index b33b978fa28f..ebb77d2f30f4 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -320,26 +320,6 @@ net_devmem_bind_dmabuf(struct net_device *dev, unsigned int dmabuf_fd,
return ERR_PTR(err);
}
-void dev_dmabuf_uninstall(struct net_device *dev)
-{
- struct net_devmem_dmabuf_binding *binding;
- struct netdev_rx_queue *rxq;
- unsigned long xa_idx;
- unsigned int i;
-
- for (i = 0; i < dev->real_num_rx_queues; i++) {
- binding = dev->_rx[i].mp_params.mp_priv;
- if (!binding)
- continue;
-
- xa_for_each(&binding->bound_rxqs, xa_idx, rxq)
- if (rxq == &dev->_rx[i]) {
- xa_erase(&binding->bound_rxqs, xa_idx);
- break;
- }
- }
-}
-
/*** "Dmabuf devmem memory provider" ***/
int mp_dmabuf_devmem_init(struct page_pool *pool)
@@ -415,10 +395,26 @@ static int mp_dmabuf_devmem_nl_fill(void *mp_priv, struct sk_buff *rsp,
return nla_put_u32(rsp, type, binding->id);
}
+static void mp_dmabuf_devmem_uninstall(void *mp_priv,
+ struct netdev_rx_queue *rxq)
+{
+ struct net_devmem_dmabuf_binding *binding = mp_priv;
+ struct netdev_rx_queue *bound_rxq;
+ unsigned long xa_idx;
+
+ xa_for_each(&binding->bound_rxqs, xa_idx, bound_rxq) {
+ if (bound_rxq == rxq) {
+ xa_erase(&binding->bound_rxqs, xa_idx);
+ break;
+ }
+ }
+}
+
static const struct memory_provider_ops dmabuf_devmem_ops = {
.init = mp_dmabuf_devmem_init,
.destroy = mp_dmabuf_devmem_destroy,
.alloc_netmems = mp_dmabuf_devmem_alloc_netmems,
.release_netmem = mp_dmabuf_devmem_release_page,
.nl_fill = mp_dmabuf_devmem_nl_fill,
+ .uninstall = mp_dmabuf_devmem_uninstall,
};
diff --git a/net/core/devmem.h b/net/core/devmem.h
index a2b9913e9a17..8e999fe2ae67 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -68,7 +68,6 @@ void net_devmem_unbind_dmabuf(struct net_devmem_dmabuf_binding *binding);
int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
struct net_devmem_dmabuf_binding *binding,
struct netlink_ext_ack *extack);
-void dev_dmabuf_uninstall(struct net_device *dev);
static inline struct dmabuf_genpool_chunk_owner *
net_devmem_iov_to_chunk_owner(const struct net_iov *niov)
@@ -145,10 +144,6 @@ net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
return -EOPNOTSUPP;
}
-static inline void dev_dmabuf_uninstall(struct net_device *dev)
-{
-}
-
static inline struct net_iov *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding)
{
--
2.47.1
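A rough userspace sketch of the shape of patch 07: the device teardown path walks its rx queues and calls an optional per-provider uninstall hook, and the provider removes the queue from its own bookkeeping. The names are hypothetical, and a fixed-size pointer array stands in for the xarray the real binding uses for ->bound_rxqs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BOUND 8

struct rx_queue;

struct provider_ops {
	/* Optional: the caller checks for NULL before invoking it. */
	void (*uninstall)(void *mp_priv, struct rx_queue *rxq);
};

struct rx_queue {
	const struct provider_ops *mp_ops;
	void *mp_priv;
};

/* Stand-in for the devmem binding; the array replaces the xarray. */
struct binding {
	struct rx_queue *bound_rxqs[MAX_BOUND];
};

/* Provider side: forget the queue that is going away. */
static void binding_uninstall(void *mp_priv, struct rx_queue *rxq)
{
	struct binding *b = mp_priv;
	size_t i;

	assert(b != NULL);
	for (i = 0; i < MAX_BOUND; i++) {
		if (b->bound_rxqs[i] == rxq) {
			b->bound_rxqs[i] = NULL;
			break;
		}
	}
}

static const struct provider_ops binding_ops = {
	.uninstall = binding_uninstall,
};

/* Generic side: iterate over the queues and let each provider clean up. */
static void device_uninstall_providers(struct rx_queue *rxqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct rx_queue *rxq = &rxqs[i];

		if (rxq->mp_ops && rxq->mp_ops->uninstall)
			rxq->mp_ops->uninstall(rxq->mp_priv, rxq);
	}
}
```

The loop over queues now lives in the generic code, so each provider only has to handle a single queue at a time.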
* [PATCH net-next v12 08/10] net: prepare for non devmem TCP memory providers
2025-01-17 16:11 [PATCH net-next v12 00/10] io_uring zero copy rx Pavel Begunkov
` (6 preceding siblings ...)
2025-01-17 16:11 ` [PATCH net-next v12 07/10] net: page_pool: add a mp hook to unregister_netdevice* Pavel Begunkov
@ 2025-01-17 16:11 ` Pavel Begunkov
2025-01-17 16:11 ` [PATCH net-next v12 09/10] net: page_pool: add memory provider helpers Pavel Begunkov
` (2 subsequent siblings)
10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
There are quite a few places in generic paths that assume the only
page pool memory provider is devmem TCP. As we want to reuse the net_iov
and provider infrastructure, we need to patch them up and explicitly check
the provider type when we branch into devmem TCP code.
Reviewed-by: Mina Almasry <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
net/core/devmem.c | 5 +++++
net/core/devmem.h | 7 +++++++
net/ipv4/tcp.c | 5 +++++
3 files changed, 17 insertions(+)
diff --git a/net/core/devmem.c b/net/core/devmem.c
index ebb77d2f30f4..2f46f271b80e 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -30,6 +30,11 @@ static DEFINE_XARRAY_FLAGS(net_devmem_dmabuf_bindings, XA_FLAGS_ALLOC1);
static const struct memory_provider_ops dmabuf_devmem_ops;
+bool net_is_devmem_iov(struct net_iov *niov)
+{
+ return niov->pp->mp_ops == &dmabuf_devmem_ops;
+}
+
static void net_devmem_dmabuf_free_chunk_owner(struct gen_pool *genpool,
struct gen_pool_chunk *chunk,
void *not_used)
diff --git a/net/core/devmem.h b/net/core/devmem.h
index 8e999fe2ae67..7fc158d52729 100644
--- a/net/core/devmem.h
+++ b/net/core/devmem.h
@@ -115,6 +115,8 @@ struct net_iov *
net_devmem_alloc_dmabuf(struct net_devmem_dmabuf_binding *binding);
void net_devmem_free_dmabuf(struct net_iov *ppiov);
+bool net_is_devmem_iov(struct net_iov *niov);
+
#else
struct net_devmem_dmabuf_binding;
@@ -163,6 +165,11 @@ static inline u32 net_devmem_iov_binding_id(const struct net_iov *niov)
{
return 0;
}
+
+static inline bool net_is_devmem_iov(struct net_iov *niov)
+{
+ return false;
+}
#endif
#endif /* _NET_DEVMEM_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b872de9a8271..7f43d31c9400 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2476,6 +2476,11 @@ static int tcp_recvmsg_dmabuf(struct sock *sk, const struct sk_buff *skb,
}
niov = skb_frag_net_iov(frag);
+ if (!net_is_devmem_iov(niov)) {
+ err = -ENODEV;
+ goto out;
+ }
+
end = start + skb_frag_size(frag);
copy = end - offset;
--
2.47.1
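The provider type check in patch 08 relies on comparing the ops table pointer against a known table. A small userspace sketch of that idiom follows; the struct names, other_ops, and recv_devmem_frag() are hypothetical and only illustrate the comparison and the -ENODEV bail-out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct provider_ops {
	const char *name;
};

struct pool {
	const struct provider_ops *mp_ops;
};

/* Stand-in for struct net_iov: it only needs to reach its pool here. */
struct iov {
	struct pool *pp;
};

static const struct provider_ops devmem_ops = { .name = "devmem" };
static const struct provider_ops other_ops = { .name = "other" };

/* Identify the provider by the address of its ops table, not by a flag. */
static bool is_devmem_iov(const struct iov *niov)
{
	assert(niov != NULL && niov->pp != NULL);
	return niov->pp->mp_ops == &devmem_ops;
}

/* Provider-specific receive path: refuse buffers owned by someone else. */
static int recv_devmem_frag(const struct iov *niov)
{
	if (!is_devmem_iov(niov))
		return -ENODEV;
	return 0;
}
```

Comparing against the address of a static ops table needs no extra type field in the buffer, at the cost of the table having to be visible to the checking code.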
* [PATCH net-next v12 09/10] net: page_pool: add memory provider helpers
2025-01-17 16:11 [PATCH net-next v12 00/10] io_uring zero copy rx Pavel Begunkov
` (7 preceding siblings ...)
2025-01-17 16:11 ` [PATCH net-next v12 08/10] net: prepare for non devmem TCP memory providers Pavel Begunkov
@ 2025-01-17 16:11 ` Pavel Begunkov
2025-01-17 16:11 ` [PATCH net-next v12 10/10] net: add helpers for setting a memory provider on an rx queue Pavel Begunkov
2025-01-17 22:16 ` [PATCH net-next v12 00/10] io_uring zero copy rx Jakub Kicinski
10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
Add helpers for memory providers to interact with page pools.
net_mp_niov_{set,clear}_page_pool() serve to [dis]associate a net_iov
with a page pool. If used, the memory provider is responsible for matching
"set" calls with "clear" calls once a net_iov is no longer going to be used
by a page pool, when it moves to a different page pool, etc.
Acked-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/net/page_pool/memory_provider.h | 19 +++++++++++++++++
net/core/page_pool.c | 28 +++++++++++++++++++++++++
2 files changed, 47 insertions(+)
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 36469a7e649f..4f0ffb8f6a0a 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -18,4 +18,23 @@ struct memory_provider_ops {
void (*uninstall)(void *mp_priv, struct netdev_rx_queue *rxq);
};
+bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
+void net_mp_niov_clear_page_pool(struct net_iov *niov);
+
+/**
+ * net_mp_netmem_place_in_cache() - give a netmem to a page pool
+ * @pool: the page pool to place the netmem into
+ * @netmem: netmem to give
+ *
+ * Push an accounted netmem into the page pool's allocation cache. The caller
+ * must ensure that there is space in the cache. It should only be called off
+ * the mp_ops->alloc_netmems() path.
+ */
+static inline void net_mp_netmem_place_in_cache(struct page_pool *pool,
+ netmem_ref netmem)
+{
+ pool->alloc.cache[pool->alloc.count++] = netmem;
+}
+
#endif
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 199564b03533..c003b9263bd3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1196,3 +1196,31 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
}
}
EXPORT_SYMBOL(page_pool_update_nid);
+
+bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr)
+{
+ return page_pool_set_dma_addr_netmem(net_iov_to_netmem(niov), addr);
+}
+
+/* Associate a niov with a page pool. Should follow with a matching
+ * net_mp_niov_clear_page_pool()
+ */
+void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov)
+{
+ netmem_ref netmem = net_iov_to_netmem(niov);
+
+ page_pool_set_pp_info(pool, netmem);
+
+ pool->pages_state_hold_cnt++;
+ trace_page_pool_state_hold(pool, netmem, pool->pages_state_hold_cnt);
+}
+
+/* Disassociate a niov from a page pool. Should only be used in the
+ * ->release_netmem() path.
+ */
+void net_mp_niov_clear_page_pool(struct net_iov *niov)
+{
+ netmem_ref netmem = net_iov_to_netmem(niov);
+
+ page_pool_clear_pp_info(netmem);
+}
--
2.47.1
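To make the pairing rule of patch 09 concrete, here is a simplified userspace model, with hypothetical names throughout. It assumes only what the diff shows: "set" records the pool in the buffer and bumps a hold counter, "clear" drops the association, and placing a buffer in the cache is a plain array push that requires free space. The release-side accounting of the real page pool is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SIZE 4

struct iov;

struct pool {
	unsigned int hold_cnt;          /* models pages_state_hold_cnt */
	unsigned int cache_count;       /* models alloc.count */
	struct iov *cache[CACHE_SIZE];  /* models alloc.cache */
};

struct iov {
	struct pool *pp;
};

/* Associate a buffer with a pool and account for it. */
static void iov_set_pool(struct pool *pool, struct iov *niov)
{
	niov->pp = pool;
	pool->hold_cnt++;
}

/* Drop the association; must be paired with an earlier iov_set_pool(). */
static void iov_clear_pool(struct iov *niov)
{
	niov->pp = NULL;
}

/* Push an already accounted buffer into the pool's allocation cache.
 * The caller guarantees there is room; the assert only documents that.
 */
static void place_in_cache(struct pool *pool, struct iov *niov)
{
	assert(pool->cache_count < CACHE_SIZE);
	pool->cache[pool->cache_count++] = niov;
}
```

In this model a provider's allocation path would call iov_set_pool() and then place_in_cache() for each buffer it hands over, and its release path would call iov_clear_pool().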
* [PATCH net-next v12 10/10] net: add helpers for setting a memory provider on an rx queue
2025-01-17 16:11 [PATCH net-next v12 00/10] io_uring zero copy rx Pavel Begunkov
` (8 preceding siblings ...)
2025-01-17 16:11 ` [PATCH net-next v12 09/10] net: page_pool: add memory provider helpers Pavel Begunkov
@ 2025-01-17 16:11 ` Pavel Begunkov
2025-01-17 22:16 ` [PATCH net-next v12 00/10] io_uring zero copy rx Jakub Kicinski
10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2025-01-17 16:11 UTC (permalink / raw)
To: io-uring, netdev
Cc: asml.silence, Jens Axboe, Jakub Kicinski, Paolo Abeni,
David S . Miller, Eric Dumazet, Jesper Dangaard Brouer,
David Ahern, Mina Almasry, Stanislav Fomichev, Joe Damato,
Pedro Tammela, David Wei
From: David Wei <[email protected]>
Add helpers that properly prep or remove a memory provider for an rx
queue then restart the queue.
Reviewed-by: Jakub Kicinski <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: David Wei <[email protected]>
---
include/net/page_pool/memory_provider.h | 5 ++
net/core/netdev_rx_queue.c | 62 +++++++++++++++++++++++++
2 files changed, 67 insertions(+)
diff --git a/include/net/page_pool/memory_provider.h b/include/net/page_pool/memory_provider.h
index 4f0ffb8f6a0a..b3e665897767 100644
--- a/include/net/page_pool/memory_provider.h
+++ b/include/net/page_pool/memory_provider.h
@@ -22,6 +22,11 @@ bool net_mp_niov_set_dma_addr(struct net_iov *niov, dma_addr_t addr);
void net_mp_niov_set_page_pool(struct page_pool *pool, struct net_iov *niov);
void net_mp_niov_clear_page_pool(struct net_iov *niov);
+int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
+ struct pp_memory_provider_params *p);
+void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
+ struct pp_memory_provider_params *old_p);
+
/**
* net_mp_netmem_place_in_cache() - give a netmem to a page pool
* @pool: the page pool to place the netmem into
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
index db82786fa0c4..a13deedf6fc1 100644
--- a/net/core/netdev_rx_queue.c
+++ b/net/core/netdev_rx_queue.c
@@ -3,6 +3,7 @@
#include <linux/netdevice.h>
#include <net/netdev_queues.h>
#include <net/netdev_rx_queue.h>
+#include <net/page_pool/memory_provider.h>
#include "page_pool_priv.h"
@@ -80,3 +81,64 @@ int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
return err;
}
EXPORT_SYMBOL_NS_GPL(netdev_rx_queue_restart, "NETDEV_INTERNAL");
+
+static int __net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
+ struct pp_memory_provider_params *p)
+{
+ struct netdev_rx_queue *rxq;
+ int ret;
+
+ if (ifq_idx >= dev->real_num_rx_queues)
+ return -EINVAL;
+ ifq_idx = array_index_nospec(ifq_idx, dev->real_num_rx_queues);
+
+ rxq = __netif_get_rx_queue(dev, ifq_idx);
+ if (rxq->mp_params.mp_ops)
+ return -EEXIST;
+
+ rxq->mp_params = *p;
+ ret = netdev_rx_queue_restart(dev, ifq_idx);
+ if (ret) {
+ rxq->mp_params.mp_ops = NULL;
+ rxq->mp_params.mp_priv = NULL;
+ }
+ return ret;
+}
+
+int net_mp_open_rxq(struct net_device *dev, unsigned ifq_idx,
+ struct pp_memory_provider_params *p)
+{
+ int ret;
+
+ rtnl_lock();
+ ret = __net_mp_open_rxq(dev, ifq_idx, p);
+ rtnl_unlock();
+ return ret;
+}
+
+static void __net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
+ struct pp_memory_provider_params *old_p)
+{
+ struct netdev_rx_queue *rxq;
+
+ if (WARN_ON_ONCE(ifq_idx >= dev->real_num_rx_queues))
+ return;
+
+ rxq = __netif_get_rx_queue(dev, ifq_idx);
+
+ if (WARN_ON_ONCE(rxq->mp_params.mp_ops != old_p->mp_ops ||
+ rxq->mp_params.mp_priv != old_p->mp_priv))
+ return;
+
+ rxq->mp_params.mp_ops = NULL;
+ rxq->mp_params.mp_priv = NULL;
+ WARN_ON(netdev_rx_queue_restart(dev, ifq_idx));
+}
+
+void net_mp_close_rxq(struct net_device *dev, unsigned ifq_idx,
+ struct pp_memory_provider_params *old_p)
+{
+ rtnl_lock();
+ __net_mp_close_rxq(dev, ifq_idx, old_p);
+ rtnl_unlock();
+}
--
2.47.1
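The open/close helpers in patch 10 follow an install, restart, roll back on failure sequence. The userspace sketch below models just that control flow under stated assumptions: all names are hypothetical, the queue restart is simulated by a configurable return value, locking (rtnl_lock in the patch) is omitted, and the close helper returns an error code where the patch warns and returns void.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct provider_params {
	const void *mp_ops;
	void *mp_priv;
};

struct rx_queue {
	struct provider_params mp_params;
};

struct device {
	unsigned int num_rxqs;
	struct rx_queue *rxqs;
	int restart_result; /* hypothetical knob: result of the simulated restart */
};

/* Simulated queue restart; a real one would reconfigure the hardware queue. */
static int queue_restart(struct device *dev, unsigned int idx)
{
	(void)idx;
	return dev->restart_result;
}

/* Install a provider on a queue, undoing the install if the restart fails. */
static int mp_open_rxq(struct device *dev, unsigned int idx,
		       const struct provider_params *p)
{
	struct rx_queue *rxq;
	int ret;

	assert(p != NULL);
	if (idx >= dev->num_rxqs)
		return -EINVAL;

	rxq = &dev->rxqs[idx];
	if (rxq->mp_params.mp_ops)
		return -EEXIST; /* the queue already has a provider */

	rxq->mp_params = *p;
	ret = queue_restart(dev, idx);
	if (ret) {
		rxq->mp_params.mp_ops = NULL;
		rxq->mp_params.mp_priv = NULL;
	}
	return ret;
}

/* Remove a provider, but only if the caller is the one who installed it. */
static int mp_close_rxq(struct device *dev, unsigned int idx,
			const struct provider_params *old_p)
{
	struct rx_queue *rxq;

	if (idx >= dev->num_rxqs)
		return -EINVAL;

	rxq = &dev->rxqs[idx];
	if (rxq->mp_params.mp_ops != old_p->mp_ops ||
	    rxq->mp_params.mp_priv != old_p->mp_priv)
		return -EINVAL;

	rxq->mp_params.mp_ops = NULL;
	rxq->mp_params.mp_priv = NULL;
	return queue_restart(dev, idx);
}
```

Checking the old parameters on close keeps one provider from tearing down a queue that another provider has since claimed.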
* Re: [PATCH net-next v12 00/10] io_uring zero copy rx
2025-01-17 16:11 [PATCH net-next v12 00/10] io_uring zero copy rx Pavel Begunkov
` (9 preceding siblings ...)
2025-01-17 16:11 ` [PATCH net-next v12 10/10] net: add helpers for setting a memory provider on an rx queue Pavel Begunkov
@ 2025-01-17 22:16 ` Jakub Kicinski
10 siblings, 0 replies; 12+ messages in thread
From: Jakub Kicinski @ 2025-01-17 22:16 UTC (permalink / raw)
To: Pavel Begunkov
Cc: io-uring, netdev, Jens Axboe, Paolo Abeni, David S . Miller,
Eric Dumazet, Jesper Dangaard Brouer, David Ahern, Mina Almasry,
Stanislav Fomichev, Joe Damato, Pedro Tammela, David Wei
On Fri, 17 Jan 2025 16:11:38 +0000 Pavel Begunkov wrote:
> This patchset contains net/ patches needed by a new io_uring request
> implementing zero copy rx into userspace pages, eliminating a kernel
> to user copy.
>
> We configure a page pool that a driver uses to fill a hw rx queue to
> hand out user pages instead of kernel pages. Any data that ends up
> hitting this hw rx queue will thus be dma'd into userspace memory
> directly, without needing to be bounced through kernel memory. 'Reading'
> data out of a socket instead becomes a _notification_ mechanism, where
> the kernel tells userspace where the data is. The overall approach is
> similar to the devmem TCP proposal.
The YNL codegen is not clean on this series, so CI didn't run
the selftests. Plus we need to resolve the issue of calling
the ops on a dead netdev.
diff --git a/include/uapi/linux/netdev.h b/include/uapi/linux/netdev.h
index 684090732068..6c6ee183802d 100644
--- a/include/uapi/linux/netdev.h
+++ b/include/uapi/linux/netdev.h
@@ -87,7 +87,6 @@ enum {
};
enum {
-
__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
};
diff --git a/tools/include/uapi/linux/netdev.h b/tools/include/uapi/linux/netdev.h
index 684090732068..6c6ee183802d 100644
--- a/tools/include/uapi/linux/netdev.h
+++ b/tools/include/uapi/linux/netdev.h
@@ -87,7 +87,6 @@ enum {
};
enum {
-
__NETDEV_A_IO_URING_PROVIDER_INFO_MAX,
NETDEV_A_IO_URING_PROVIDER_INFO_MAX = (__NETDEV_A_IO_URING_PROVIDER_INFO_MAX - 1)
};
--
pw-bot: cr