* [PATCH io_uring for-6.18 01/20] io_uring/zcrx: improve rqe cache alignment
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
@ 2025-09-16 14:27 ` Pavel Begunkov
2025-09-16 14:27 ` [PATCH io_uring for-6.18 02/20] io_uring/zcrx: replace memchr_inv with is_zero Pavel Begunkov
` (19 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Refill queue entries are 16B structures, but because of the ring header
placement they are only 8B aligned, not naturally (16B) aligned, which
means some of them span two cache lines. Push the rqes out to a new
cache line.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 51fd2350dbe9..c02045e4c1b6 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -352,7 +352,7 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
void *ptr;
int ret;
- off = sizeof(struct io_uring);
+ off = ALIGN(sizeof(struct io_uring), L1_CACHE_BYTES);
size = off + sizeof(struct io_uring_zcrx_rqe) * reg->rq_entries;
if (size > rd->size)
return -EINVAL;
@@ -367,6 +367,10 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
ptr = io_region_get_ptr(&ifq->region);
ifq->rq_ring = (struct io_uring *)ptr;
ifq->rqes = (struct io_uring_zcrx_rqe *)(ptr + off);
+
+ reg->offsets.head = offsetof(struct io_uring, head);
+ reg->offsets.tail = offsetof(struct io_uring, tail);
+ reg->offsets.rqes = off;
return 0;
}
@@ -618,9 +622,6 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
goto err;
ifq->if_rxq = reg.if_rxq;
- reg.offsets.rqes = sizeof(struct io_uring);
- reg.offsets.head = offsetof(struct io_uring, head);
- reg.offsets.tail = offsetof(struct io_uring, tail);
reg.zcrx_id = id;
scoped_guard(mutex, &ctx->mmap_lock) {
--
2.49.0
* [PATCH io_uring for-6.18 02/20] io_uring/zcrx: replace memchr_inv with is_zero
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
memchr_inv() is more ambiguous than mem_is_zero(), so use the latter
for zero checks.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index c02045e4c1b6..a4a0560e8269 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -566,7 +566,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
return -EFAULT;
if (copy_from_user(&rd, u64_to_user_ptr(reg.region_ptr), sizeof(rd)))
return -EFAULT;
- if (memchr_inv(&reg.__resv, 0, sizeof(reg.__resv)) ||
+ if (!mem_is_zero(&reg.__resv, sizeof(reg.__resv)) ||
reg.__resv2 || reg.zcrx_id)
return -EINVAL;
if (reg.if_rxq == -1 || !reg.rq_entries || reg.flags)
--
2.49.0
* [PATCH io_uring for-6.18 03/20] io_uring/zcrx: use page_pool_unref_and_test()
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
page_pool_unref_and_test() better follows the usual refcount
semantics; use it instead of page_pool_unref_netmem().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index a4a0560e8269..bd2fb3688432 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -787,7 +787,7 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
continue;
netmem = net_iov_to_netmem(niov);
- if (page_pool_unref_netmem(netmem, 1) != 0)
+ if (!page_pool_unref_and_test(netmem))
continue;
if (unlikely(niov->pp != pp)) {
--
2.49.0
* [PATCH io_uring for-6.18 04/20] io_uring/zcrx: remove extra io_zcrx_drop_netdev
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
io_close_queue() already detaches the netdev, so don't unnecessarily
call io_zcrx_drop_netdev() right after it.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index bd2fb3688432..7a46e6fc2ee7 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -517,7 +517,6 @@ static void io_close_queue(struct io_zcrx_ifq *ifq)
static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
{
io_close_queue(ifq);
- io_zcrx_drop_netdev(ifq);
if (ifq->area)
io_zcrx_free_area(ifq->area);
--
2.49.0
* [PATCH io_uring for-6.18 05/20] io_uring/zcrx: don't pass slot to io_zcrx_create_area
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Instead of passing io_zcrx_create_area() a pointer to the slot where the
new area should be stored, let it find the right place for the area
itself. It's more straightforward and will be needed to support
multiple areas.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 7a46e6fc2ee7..c64b8c7ddedf 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -397,8 +397,16 @@ static void io_zcrx_free_area(struct io_zcrx_area *area)
#define IO_ZCRX_AREA_SUPPORTED_FLAGS (IORING_ZCRX_AREA_DMABUF)
+static int io_zcrx_append_area(struct io_zcrx_ifq *ifq,
+ struct io_zcrx_area *area)
+{
+ if (ifq->area)
+ return -EINVAL;
+ ifq->area = area;
+ return 0;
+}
+
static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
- struct io_zcrx_area **res,
struct io_uring_zcrx_area_reg *area_reg)
{
struct io_zcrx_area *area;
@@ -455,8 +463,10 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
area->area_id = 0;
area_reg->rq_area_token = (u64)area->area_id << IORING_ZCRX_AREA_SHIFT;
spin_lock_init(&area->freelist_lock);
- *res = area;
- return 0;
+
+ ret = io_zcrx_append_area(ifq, area);
+ if (!ret)
+ return 0;
err:
if (area)
io_zcrx_free_area(area);
@@ -610,7 +620,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
}
get_device(ifq->dev);
- ret = io_zcrx_create_area(ifq, &ifq->area, &area);
+ ret = io_zcrx_create_area(ifq, &area);
if (ret)
goto err;
--
2.49.0
* [PATCH io_uring for-6.18 06/20] io_uring/zcrx: move area reg checks into io_import_area
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
io_import_area() is responsible for importing memory and parsing
io_uring_zcrx_area_reg, so move all area reg structure checks into the
function.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index c64b8c7ddedf..ef8d60b92646 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -26,6 +26,8 @@
#include "zcrx.h"
#include "rsrc.h"
+#define IO_ZCRX_AREA_SUPPORTED_FLAGS (IORING_ZCRX_AREA_DMABUF)
+
#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
static inline struct io_zcrx_ifq *io_pp_to_ifq(struct page_pool *pp)
@@ -231,6 +233,13 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
{
int ret;
+ if (area_reg->flags & ~IO_ZCRX_AREA_SUPPORTED_FLAGS)
+ return -EINVAL;
+ if (area_reg->rq_area_token)
+ return -EINVAL;
+ if (area_reg->__resv2[0] || area_reg->__resv2[1])
+ return -EINVAL;
+
ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
if (ret)
return ret;
@@ -395,8 +404,6 @@ static void io_zcrx_free_area(struct io_zcrx_area *area)
kfree(area);
}
-#define IO_ZCRX_AREA_SUPPORTED_FLAGS (IORING_ZCRX_AREA_DMABUF)
-
static int io_zcrx_append_area(struct io_zcrx_ifq *ifq,
struct io_zcrx_area *area)
{
@@ -413,13 +420,6 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
unsigned nr_iovs;
int i, ret;
- if (area_reg->flags & ~IO_ZCRX_AREA_SUPPORTED_FLAGS)
- return -EINVAL;
- if (area_reg->rq_area_token)
- return -EINVAL;
- if (area_reg->__resv2[0] || area_reg->__resv2[1])
- return -EINVAL;
-
ret = -ENOMEM;
area = kzalloc(sizeof(*area), GFP_KERNEL);
if (!area)
--
2.49.0
* [PATCH io_uring for-6.18 07/20] io_uring/zcrx: check all niovs filled with dma addresses
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Add a warning if io_populate_area_dma() can't fill in all net_iovs, as
that should never happen.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index ef8d60b92646..0f15e0fa5467 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -77,6 +77,9 @@ static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
niov_idx++;
}
}
+
+ if (WARN_ON_ONCE(niov_idx != area->nia.num_niovs))
+ return -EFAULT;
return 0;
}
--
2.49.0
* [PATCH io_uring for-6.18 08/20] io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback()
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
io_zcrx_copy_chunk() doesn't and shouldn't care which area the buffer
is allocated from, so don't resolve the area there; pass the ifq to
io_zcrx_alloc_fallback() and let it handle it. Also rename the helper
for clarity.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 0f15e0fa5467..16bf036c7b24 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -957,10 +957,14 @@ static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
return true;
}
-static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
+static struct net_iov *io_alloc_fallback_niov(struct io_zcrx_ifq *ifq)
{
+ struct io_zcrx_area *area = ifq->area;
struct net_iov *niov = NULL;
+ if (area->mem.is_dmabuf)
+ return NULL;
+
spin_lock_bh(&area->freelist_lock);
if (area->free_count)
niov = __io_zcrx_get_free_niov(area);
@@ -1020,19 +1024,15 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
struct page *src_page, unsigned int src_offset,
size_t len)
{
- struct io_zcrx_area *area = ifq->area;
size_t copied = 0;
int ret = 0;
- if (area->mem.is_dmabuf)
- return -EFAULT;
-
while (len) {
struct io_copy_cache cc;
struct net_iov *niov;
size_t n;
- niov = io_zcrx_alloc_fallback(area);
+ niov = io_alloc_fallback_niov(ifq);
if (!niov) {
ret = -ENOMEM;
break;
--
2.49.0
* [PATCH io_uring for-6.18 09/20] io_uring/zcrx: deduplicate area mapping
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
With a common type for storing dma addresses and io_populate_area_dma(),
type-specific area mapping helpers are trivial, so open code them and
deduplicate the call to io_populate_area_dma().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 37 ++++++++++++++-----------------------
1 file changed, 14 insertions(+), 23 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 16bf036c7b24..bba92774c801 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -157,14 +157,6 @@ static int io_import_dmabuf(struct io_zcrx_ifq *ifq,
return ret;
}
-static int io_zcrx_map_area_dmabuf(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
-{
- if (!IS_ENABLED(CONFIG_DMA_SHARED_BUFFER))
- return -EINVAL;
- return io_populate_area_dma(ifq, area, area->mem.sgt,
- area->mem.dmabuf_offset);
-}
-
static unsigned long io_count_account_pages(struct page **pages, unsigned nr_pages)
{
struct folio *last_folio = NULL;
@@ -275,30 +267,29 @@ static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
}
}
-static unsigned io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
-{
- int ret;
-
- ret = dma_map_sgtable(ifq->dev, &area->mem.page_sg_table,
- DMA_FROM_DEVICE, IO_DMA_ATTR);
- if (ret < 0)
- return ret;
- return io_populate_area_dma(ifq, area, &area->mem.page_sg_table, 0);
-}
-
static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
+ unsigned long offset;
+ struct sg_table *sgt;
int ret;
guard(mutex)(&ifq->dma_lock);
if (area->is_mapped)
return 0;
- if (area->mem.is_dmabuf)
- ret = io_zcrx_map_area_dmabuf(ifq, area);
- else
- ret = io_zcrx_map_area_umem(ifq, area);
+ if (!area->mem.is_dmabuf) {
+ ret = dma_map_sgtable(ifq->dev, &area->mem.page_sg_table,
+ DMA_FROM_DEVICE, IO_DMA_ATTR);
+ if (ret < 0)
+ return ret;
+ sgt = &area->mem.page_sg_table;
+ offset = 0;
+ } else {
+ sgt = area->mem.sgt;
+ offset = area->mem.dmabuf_offset;
+ }
+ ret = io_populate_area_dma(ifq, area, sgt, offset);
if (ret == 0)
area->is_mapped = true;
return ret;
--
2.49.0
* [PATCH io_uring for-6.18 10/20] io_uring/zcrx: remove dmabuf_offset
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
The dmabuf offset was removed from the uapi, so it's now always 0 and
can be dropped together with the offset handling in
io_populate_area_dma().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 13 ++-----------
io_uring/zcrx.h | 1 -
2 files changed, 2 insertions(+), 12 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index bba92774c801..bcefb302aadf 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -53,7 +53,7 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
struct io_zcrx_area *area,
- struct sg_table *sgt, unsigned long off)
+ struct sg_table *sgt)
{
struct scatterlist *sg;
unsigned i, niov_idx = 0;
@@ -61,11 +61,6 @@ static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
for_each_sgtable_dma_sg(sgt, sg, i) {
dma_addr_t dma = sg_dma_address(sg);
unsigned long sg_len = sg_dma_len(sg);
- unsigned long sg_off = min(sg_len, off);
-
- off -= sg_off;
- sg_len -= sg_off;
- dma += sg_off;
while (sg_len && niov_idx < area->nia.num_niovs) {
struct net_iov *niov = &area->nia.niovs[niov_idx];
@@ -149,7 +144,6 @@ static int io_import_dmabuf(struct io_zcrx_ifq *ifq,
goto err;
}
- mem->dmabuf_offset = off;
mem->size = len;
return 0;
err:
@@ -269,7 +263,6 @@ static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
- unsigned long offset;
struct sg_table *sgt;
int ret;
@@ -283,13 +276,11 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
if (ret < 0)
return ret;
sgt = &area->mem.page_sg_table;
- offset = 0;
} else {
sgt = area->mem.sgt;
- offset = area->mem.dmabuf_offset;
}
- ret = io_populate_area_dma(ifq, area, sgt, offset);
+ ret = io_populate_area_dma(ifq, area, sgt);
if (ret == 0)
area->is_mapped = true;
return ret;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 109c4ca36434..24ed473632c6 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -20,7 +20,6 @@ struct io_zcrx_mem {
struct dma_buf_attachment *attach;
struct dma_buf *dmabuf;
struct sg_table *sgt;
- unsigned long dmabuf_offset;
};
struct io_zcrx_area {
--
2.49.0
* [PATCH io_uring for-6.18 11/20] io_uring/zcrx: set sgt for umem area
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Set struct io_zcrx_mem::sgt for umem areas as well to simplify looking
up the current sg table.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 14 ++++++--------
io_uring/zcrx.h | 2 +-
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index bcefb302aadf..764723bf04d6 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -52,9 +52,9 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
}
static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
- struct io_zcrx_area *area,
- struct sg_table *sgt)
+ struct io_zcrx_area *area)
{
+ struct sg_table *sgt = area->mem.sgt;
struct scatterlist *sg;
unsigned i, niov_idx = 0;
@@ -197,6 +197,7 @@ static int io_import_umem(struct io_zcrx_ifq *ifq,
if (ret < 0)
mem->account_pages = 0;
+ mem->sgt = &mem->page_sg_table;
mem->pages = pages;
mem->nr_folios = nr_pages;
mem->size = area_reg->len;
@@ -211,7 +212,8 @@ static void io_release_area_mem(struct io_zcrx_mem *mem)
}
if (mem->pages) {
unpin_user_pages(mem->pages, mem->nr_folios);
- sg_free_table(&mem->page_sg_table);
+ sg_free_table(mem->sgt);
+ mem->sgt = NULL;
kvfree(mem->pages);
}
}
@@ -263,7 +265,6 @@ static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
- struct sg_table *sgt;
int ret;
guard(mutex)(&ifq->dma_lock);
@@ -275,12 +276,9 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
DMA_FROM_DEVICE, IO_DMA_ATTR);
if (ret < 0)
return ret;
- sgt = &area->mem.page_sg_table;
- } else {
- sgt = area->mem.sgt;
}
- ret = io_populate_area_dma(ifq, area, sgt);
+ ret = io_populate_area_dma(ifq, area);
if (ret == 0)
area->is_mapped = true;
return ret;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 24ed473632c6..27d7cf28a04e 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -16,10 +16,10 @@ struct io_zcrx_mem {
unsigned long nr_folios;
struct sg_table page_sg_table;
unsigned long account_pages;
+ struct sg_table *sgt;
struct dma_buf_attachment *attach;
struct dma_buf *dmabuf;
- struct sg_table *sgt;
};
struct io_zcrx_area {
--
2.49.0
* [PATCH io_uring for-6.18 12/20] io_uring/zcrx: make niov size variable
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Instead of using PAGE_SIZE for the niov size, add a niov_shift field to
the ifq and patch up all the important places. The copy fallback path
still assumes PAGE_SIZE, so it will waste some memory for now.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 30 ++++++++++++++++++++----------
io_uring/zcrx.h | 1 +
2 files changed, 21 insertions(+), 10 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 764723bf04d6..85832f60d68a 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -45,15 +45,18 @@ static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *nio
static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+ unsigned niov_pages_shift;
lockdep_assert(!area->mem.is_dmabuf);
- return area->mem.pages[net_iov_idx(niov)];
+ niov_pages_shift = area->ifq->niov_shift - PAGE_SHIFT;
+ return area->mem.pages[net_iov_idx(niov) << niov_pages_shift];
}
static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
struct io_zcrx_area *area)
{
+ unsigned niov_size = 1U << ifq->niov_shift;
struct sg_table *sgt = area->mem.sgt;
struct scatterlist *sg;
unsigned i, niov_idx = 0;
@@ -62,13 +65,16 @@ static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
dma_addr_t dma = sg_dma_address(sg);
unsigned long sg_len = sg_dma_len(sg);
+ if (WARN_ON_ONCE(sg_len % niov_size))
+ return -EINVAL;
+
while (sg_len && niov_idx < area->nia.num_niovs) {
struct net_iov *niov = &area->nia.niovs[niov_idx];
if (net_mp_niov_set_dma_addr(niov, dma))
return -EFAULT;
- sg_len -= PAGE_SIZE;
- dma += PAGE_SIZE;
+ sg_len -= niov_size;
+ dma += niov_size;
niov_idx++;
}
}
@@ -284,18 +290,21 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
return ret;
}
-static void io_zcrx_sync_for_device(const struct page_pool *pool,
+static void io_zcrx_sync_for_device(struct page_pool *pool,
struct net_iov *niov)
{
#if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC)
dma_addr_t dma_addr;
+ unsigned niov_size;
+
if (!dma_dev_need_sync(pool->p.dev))
return;
+ niov_size = 1U << io_pp_to_ifq(pool)->niov_shift;
dma_addr = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov));
__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
- PAGE_SIZE, pool->p.dma_dir);
+ niov_size, pool->p.dma_dir);
#endif
}
@@ -413,7 +422,8 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
if (ret)
goto err;
- nr_iovs = area->mem.size >> PAGE_SHIFT;
+ ifq->niov_shift = PAGE_SHIFT;
+ nr_iovs = area->mem.size >> ifq->niov_shift;
area->nia.num_niovs = nr_iovs;
ret = -ENOMEM;
@@ -764,7 +774,7 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
unsigned niov_idx, area_idx;
area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
- niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> PAGE_SHIFT;
+ niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
if (unlikely(rqe->__pad || area_idx))
continue;
@@ -854,8 +864,8 @@ static int io_pp_zc_init(struct page_pool *pp)
return -EINVAL;
if (WARN_ON_ONCE(!pp->dma_map))
return -EOPNOTSUPP;
- if (pp->p.order != 0)
- return -EOPNOTSUPP;
+ if (pp->p.order + PAGE_SHIFT != ifq->niov_shift)
+ return -EINVAL;
if (pp->p.dma_dir != DMA_FROM_DEVICE)
return -EOPNOTSUPP;
@@ -930,7 +940,7 @@ static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
cqe->flags |= IORING_CQE_F_32;
area = io_zcrx_iov_to_area(niov);
- offset = off + (net_iov_idx(niov) << PAGE_SHIFT);
+ offset = off + (net_iov_idx(niov) << ifq->niov_shift);
rcqe = (struct io_uring_zcrx_cqe *)(cqe + 1);
rcqe->off = offset + ((u64)area->area_id << IORING_ZCRX_AREA_SHIFT);
rcqe->__pad = 0;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 27d7cf28a04e..7604f1f85ccb 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -41,6 +41,7 @@ struct io_zcrx_area {
struct io_zcrx_ifq {
struct io_ring_ctx *ctx;
struct io_zcrx_area *area;
+ unsigned niov_shift;
spinlock_t rq_lock ____cacheline_aligned_in_smp;
struct io_uring *rq_ring;
--
2.49.0
* [PATCH io_uring for-6.18 13/20] io_uring/zcrx: rename dma lock
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
In preparation for reusing the lock for other purposes, rename it to
"pp_lock". As before, it can be taken deeper inside the networking stack
by the page pool, so the io_uring syscall side must avoid holding it
while reconfiguring queues or doing anything else that can result in
immediate page pool init/destruction.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 8 ++++----
io_uring/zcrx.h | 7 ++++++-
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 85832f60d68a..0deb41b74b7c 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -253,7 +253,7 @@ static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
{
int i;
- guard(mutex)(&ifq->dma_lock);
+ guard(mutex)(&ifq->pp_lock);
if (!area->is_mapped)
return;
area->is_mapped = false;
@@ -273,7 +273,7 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
int ret;
- guard(mutex)(&ifq->dma_lock);
+ guard(mutex)(&ifq->pp_lock);
if (area->is_mapped)
return 0;
@@ -478,7 +478,7 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
ifq->ctx = ctx;
spin_lock_init(&ifq->lock);
spin_lock_init(&ifq->rq_lock);
- mutex_init(&ifq->dma_lock);
+ mutex_init(&ifq->pp_lock);
return ifq;
}
@@ -527,7 +527,7 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
put_device(ifq->dev);
io_free_rbuf_ring(ifq);
- mutex_destroy(&ifq->dma_lock);
+ mutex_destroy(&ifq->pp_lock);
kfree(ifq);
}
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 7604f1f85ccb..3f89a34e5282 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -54,7 +54,12 @@ struct io_zcrx_ifq {
struct net_device *netdev;
netdevice_tracker netdev_tracker;
spinlock_t lock;
- struct mutex dma_lock;
+
+ /*
+ * Page pool and net configuration lock, can be taken deeper in the
+ * net stack.
+ */
+ struct mutex pp_lock;
struct io_mapped_region region;
};
--
2.49.0
* [PATCH io_uring for-6.18 14/20] io_uring/zcrx: protect netdev with pp_lock
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Remove ifq->lock and reuse pp_lock to protect the netdev pointer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 23 +++++++++++------------
io_uring/zcrx.h | 1 -
2 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 0deb41b74b7c..6a5b6f32edc3 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -476,7 +476,6 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
ifq->if_rxq = -1;
ifq->ctx = ctx;
- spin_lock_init(&ifq->lock);
spin_lock_init(&ifq->rq_lock);
mutex_init(&ifq->pp_lock);
return ifq;
@@ -484,12 +483,12 @@ static struct io_zcrx_ifq *io_zcrx_ifq_alloc(struct io_ring_ctx *ctx)
static void io_zcrx_drop_netdev(struct io_zcrx_ifq *ifq)
{
- spin_lock(&ifq->lock);
- if (ifq->netdev) {
- netdev_put(ifq->netdev, &ifq->netdev_tracker);
- ifq->netdev = NULL;
- }
- spin_unlock(&ifq->lock);
+ guard(mutex)(&ifq->pp_lock);
+
+ if (!ifq->netdev)
+ return;
+ netdev_put(ifq->netdev, &ifq->netdev_tracker);
+ ifq->netdev = NULL;
}
static void io_close_queue(struct io_zcrx_ifq *ifq)
@@ -504,11 +503,11 @@ static void io_close_queue(struct io_zcrx_ifq *ifq)
if (ifq->if_rxq == -1)
return;
- spin_lock(&ifq->lock);
- netdev = ifq->netdev;
- netdev_tracker = ifq->netdev_tracker;
- ifq->netdev = NULL;
- spin_unlock(&ifq->lock);
+ scoped_guard(mutex, &ifq->pp_lock) {
+ netdev = ifq->netdev;
+ netdev_tracker = ifq->netdev_tracker;
+ ifq->netdev = NULL;
+ }
if (netdev) {
net_mp_close_rxq(netdev, ifq->if_rxq, &p);
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 3f89a34e5282..a48871b5adad 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -53,7 +53,6 @@ struct io_zcrx_ifq {
struct device *dev;
struct net_device *netdev;
netdevice_tracker netdev_tracker;
- spinlock_t lock;
/*
* Page pool and net configuration lock, can be taken deeper in the
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH io_uring for-6.18 15/20] io_uring/zcrx: reduce netmem scope in refill
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (13 preceding siblings ...)
2025-09-16 14:27 ` [PATCH io_uring for-6.18 14/20] io_uring/zcrx: protect netdev with pp_lock Pavel Begunkov
@ 2025-09-16 14:27 ` Pavel Begunkov
2025-09-16 14:27 ` [PATCH io_uring for-6.18 16/20] io_uring/zcrx: use guards for the refill lock Pavel Begunkov
` (5 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Reduce the scope of the local variable netmem in io_zcrx_ring_refill().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 6a5b6f32edc3..5f99fc7b43ee 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -755,7 +755,6 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
{
unsigned int mask = ifq->rq_entries - 1;
unsigned int entries;
- netmem_ref netmem;
spin_lock_bh(&ifq->rq_lock);
@@ -771,6 +770,7 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
struct io_zcrx_area *area;
struct net_iov *niov;
unsigned niov_idx, area_idx;
+ netmem_ref netmem;
area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH io_uring for-6.18 16/20] io_uring/zcrx: use guards for the refill lock
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (14 preceding siblings ...)
2025-09-16 14:27 ` [PATCH io_uring for-6.18 15/20] io_uring/zcrx: reduce netmem scope in refill Pavel Begunkov
@ 2025-09-16 14:27 ` Pavel Begunkov
2025-09-16 14:28 ` [PATCH io_uring for-6.18 17/20] io_uring/zcrx: don't adjust free cache space Pavel Begunkov
` (4 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:27 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Use guards for rq_lock in io_zcrx_ring_refill(), which makes it a tad simpler.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 5f99fc7b43ee..630b19ebb47e 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -756,14 +756,12 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
unsigned int mask = ifq->rq_entries - 1;
unsigned int entries;
- spin_lock_bh(&ifq->rq_lock);
+ guard(spinlock_bh)(&ifq->rq_lock);
entries = io_zcrx_rqring_entries(ifq);
entries = min_t(unsigned, entries, PP_ALLOC_CACHE_REFILL - pp->alloc.count);
- if (unlikely(!entries)) {
- spin_unlock_bh(&ifq->rq_lock);
+ if (unlikely(!entries))
return;
- }
do {
struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask);
@@ -801,7 +799,6 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
} while (--entries);
smp_store_release(&ifq->rq_ring->head, ifq->cached_rq_head);
- spin_unlock_bh(&ifq->rq_lock);
}
static void io_zcrx_refill_slow(struct page_pool *pp, struct io_zcrx_ifq *ifq)
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH io_uring for-6.18 17/20] io_uring/zcrx: don't adjust free cache space
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (15 preceding siblings ...)
2025-09-16 14:27 ` [PATCH io_uring for-6.18 16/20] io_uring/zcrx: use guards for the refill lock Pavel Begunkov
@ 2025-09-16 14:28 ` Pavel Begunkov
2025-09-16 14:28 ` [PATCH io_uring for-6.18 18/20] io_uring/zcrx: introduce io_parse_rqe() Pavel Begunkov
` (3 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:28 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
The cache should be empty when io_pp_zc_alloc_netmems() is called; that is
guaranteed by the page pool and additionally checked, so there is no need to
recalculate the available space in io_zcrx_ring_refill().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 630b19ebb47e..a805f744c774 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -759,7 +759,7 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
guard(spinlock_bh)(&ifq->rq_lock);
entries = io_zcrx_rqring_entries(ifq);
- entries = min_t(unsigned, entries, PP_ALLOC_CACHE_REFILL - pp->alloc.count);
+ entries = min_t(unsigned, entries, PP_ALLOC_CACHE_REFILL);
if (unlikely(!entries))
return;
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH io_uring for-6.18 18/20] io_uring/zcrx: introduce io_parse_rqe()
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (16 preceding siblings ...)
2025-09-16 14:28 ` [PATCH io_uring for-6.18 17/20] io_uring/zcrx: don't adjust free cache space Pavel Begunkov
@ 2025-09-16 14:28 ` Pavel Begunkov
2025-09-16 14:28 ` [PATCH io_uring for-6.18 19/20] io_uring/zcrx: allow synchronous buffer return Pavel Begunkov
` (2 subsequent siblings)
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:28 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Add a helper for validating an rqe and extracting a niov out of it. It'll
be reused in the following patches.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 36 +++++++++++++++++++++++-------------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index a805f744c774..81d4aa75a69f 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -750,6 +750,28 @@ static struct io_uring_zcrx_rqe *io_zcrx_get_rqe(struct io_zcrx_ifq *ifq,
return &ifq->rqes[idx];
}
+static inline bool io_parse_rqe(struct io_uring_zcrx_rqe *rqe,
+ struct io_zcrx_ifq *ifq,
+ struct net_iov **ret_niov)
+{
+ unsigned niov_idx, area_idx;
+ struct io_zcrx_area *area;
+
+ area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
+ niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
+
+ if (unlikely(rqe->__pad || area_idx))
+ return false;
+ area = ifq->area;
+
+ if (unlikely(niov_idx >= area->nia.num_niovs))
+ return false;
+ niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
+
+ *ret_niov = &area->nia.niovs[niov_idx];
+ return true;
+}
+
static void io_zcrx_ring_refill(struct page_pool *pp,
struct io_zcrx_ifq *ifq)
{
@@ -765,23 +787,11 @@ static void io_zcrx_ring_refill(struct page_pool *pp,
do {
struct io_uring_zcrx_rqe *rqe = io_zcrx_get_rqe(ifq, mask);
- struct io_zcrx_area *area;
struct net_iov *niov;
- unsigned niov_idx, area_idx;
netmem_ref netmem;
- area_idx = rqe->off >> IORING_ZCRX_AREA_SHIFT;
- niov_idx = (rqe->off & ~IORING_ZCRX_AREA_MASK) >> ifq->niov_shift;
-
- if (unlikely(rqe->__pad || area_idx))
+ if (!io_parse_rqe(rqe, ifq, &niov))
continue;
- area = ifq->area;
-
- if (unlikely(niov_idx >= area->nia.num_niovs))
- continue;
- niov_idx = array_index_nospec(niov_idx, area->nia.num_niovs);
-
- niov = &area->nia.niovs[niov_idx];
if (!io_zcrx_put_niov_uref(niov))
continue;
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH io_uring for-6.18 19/20] io_uring/zcrx: allow synchronous buffer return
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (17 preceding siblings ...)
2025-09-16 14:28 ` [PATCH io_uring for-6.18 18/20] io_uring/zcrx: introduce io_parse_rqe() Pavel Begunkov
@ 2025-09-16 14:28 ` Pavel Begunkov
2025-09-16 14:28 ` [PATCH io_uring for-6.18 20/20] io_uring/zcrx: account niov arrays to cgroup Pavel Begunkov
2025-09-16 18:37 ` [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Jens Axboe
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:28 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
Returning buffers via the refill ring is performant and convenient, but it
becomes a problem if the user misconfigures the ring size and the ring
fills up. Add a synchronous way to return buffers back to the page
pool via a new register opcode. It's meant to be a reliable slow
path for refilling.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 12 +++++++
io_uring/register.c | 3 ++
io_uring/zcrx.c | 68 +++++++++++++++++++++++++++++++++++
io_uring/zcrx.h | 7 ++++
4 files changed, 90 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1ce17c535944..a0cc1cc0dd01 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -689,6 +689,9 @@ enum io_uring_register_op {
/* query various aspects of io_uring, see linux/io_uring/query.h */
IORING_REGISTER_QUERY = 35,
+ /* return zcrx buffers back into circulation */
+ IORING_REGISTER_ZCRX_REFILL = 36,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -1070,6 +1073,15 @@ struct io_uring_zcrx_ifq_reg {
__u64 __resv[3];
};
+struct io_uring_zcrx_sync_refill {
+ __u32 zcrx_id;
+ /* the number of entries to return */
+ __u32 nr_entries;
+ /* pointer to an array of struct io_uring_zcrx_rqe */
+ __u64 rqes;
+ __u64 __resv[2];
+};
+
#ifdef __cplusplus
}
#endif
diff --git a/io_uring/register.c b/io_uring/register.c
index 96e9cac12823..43f04c47522c 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -833,6 +833,9 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
case IORING_REGISTER_QUERY:
ret = io_query(ctx, arg, nr_args);
break;
+ case IORING_REGISTER_ZCRX_REFILL:
+ ret = io_zcrx_return_bufs(ctx, arg, nr_args);
+ break;
default:
ret = -EINVAL;
break;
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 81d4aa75a69f..07a114f9a542 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -927,6 +927,74 @@ static const struct memory_provider_ops io_uring_pp_zc_ops = {
.uninstall = io_pp_uninstall,
};
+#define IO_ZCRX_MAX_SYS_REFILL_BUFS (1 << 16)
+#define IO_ZCRX_SYS_REFILL_BATCH 32
+
+static void io_return_buffers(struct io_zcrx_ifq *ifq,
+ struct io_uring_zcrx_rqe *rqes, unsigned nr)
+{
+ int i;
+
+ for (i = 0; i < nr; i++) {
+ struct net_iov *niov;
+ netmem_ref netmem;
+
+ if (!io_parse_rqe(&rqes[i], ifq, &niov))
+ continue;
+
+ scoped_guard(spinlock_bh, &ifq->rq_lock) {
+ if (!io_zcrx_put_niov_uref(niov))
+ continue;
+ }
+
+ netmem = net_iov_to_netmem(niov);
+ if (!page_pool_unref_and_test(netmem))
+ continue;
+ io_zcrx_return_niov(niov);
+ }
+}
+
+int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned nr_arg)
+{
+ struct io_uring_zcrx_rqe rqes[IO_ZCRX_SYS_REFILL_BATCH];
+ struct io_uring_zcrx_rqe __user *user_rqes;
+ struct io_uring_zcrx_sync_refill zr;
+ struct io_zcrx_ifq *ifq;
+ unsigned nr, i;
+
+ if (nr_arg)
+ return -EINVAL;
+ if (copy_from_user(&zr, arg, sizeof(zr)))
+ return -EFAULT;
+ if (!zr.nr_entries || zr.nr_entries > IO_ZCRX_MAX_SYS_REFILL_BUFS)
+ return -EINVAL;
+ if (!mem_is_zero(&zr.__resv, sizeof(zr.__resv)))
+ return -EINVAL;
+
+ ifq = xa_load(&ctx->zcrx_ctxs, zr.zcrx_id);
+ if (!ifq)
+ return -EINVAL;
+ nr = zr.nr_entries;
+ user_rqes = u64_to_user_ptr(zr.rqes);
+
+ for (i = 0; i < nr;) {
+ unsigned batch = min(nr - i, IO_ZCRX_SYS_REFILL_BATCH);
+ size_t size = batch * sizeof(rqes[0]);
+
+ if (copy_from_user(rqes, user_rqes + i, size))
+ return i ? i : -EFAULT;
+ io_return_buffers(ifq, rqes, batch);
+
+ i += batch;
+
+ if (fatal_signal_pending(current))
+ return i;
+ cond_resched();
+ }
+ return nr;
+}
+
static bool io_zcrx_queue_cqe(struct io_kiocb *req, struct net_iov *niov,
struct io_zcrx_ifq *ifq, int off, int len)
{
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index a48871b5adad..33ef61503092 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -63,6 +63,8 @@ struct io_zcrx_ifq {
};
#if defined(CONFIG_IO_URING_ZCRX)
+int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned nr_arg);
int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
struct io_uring_zcrx_ifq_reg __user *arg);
void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx);
@@ -95,6 +97,11 @@ static inline struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ct
{
return NULL;
}
+static inline int io_zcrx_return_bufs(struct io_ring_ctx *ctx,
+ void __user *arg, unsigned nr_arg)
+{
+ return -EOPNOTSUPP;
+}
#endif
int io_recvzc(struct io_kiocb *req, unsigned int issue_flags);
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH io_uring for-6.18 20/20] io_uring/zcrx: account niov arrays to cgroup
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (18 preceding siblings ...)
2025-09-16 14:28 ` [PATCH io_uring for-6.18 19/20] io_uring/zcrx: allow synchronous buffer return Pavel Begunkov
@ 2025-09-16 14:28 ` Pavel Begunkov
2025-09-16 18:37 ` [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Jens Axboe
20 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2025-09-16 14:28 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe, netdev
The net_iov, freelist and other per-area arrays can be quite long; make
sure they're accounted to the memory cgroup.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 07a114f9a542..6799b5f33c96 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -428,17 +428,17 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
ret = -ENOMEM;
area->nia.niovs = kvmalloc_array(nr_iovs, sizeof(area->nia.niovs[0]),
- GFP_KERNEL | __GFP_ZERO);
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!area->nia.niovs)
goto err;
area->freelist = kvmalloc_array(nr_iovs, sizeof(area->freelist[0]),
- GFP_KERNEL | __GFP_ZERO);
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!area->freelist)
goto err;
area->user_refs = kvmalloc_array(nr_iovs, sizeof(area->user_refs[0]),
- GFP_KERNEL | __GFP_ZERO);
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
if (!area->user_refs)
goto err;
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates
2025-09-16 14:27 [PATCH io_uring for-6.18 00/20] zcrx for-6.18 updates Pavel Begunkov
` (19 preceding siblings ...)
2025-09-16 14:28 ` [PATCH io_uring for-6.18 20/20] io_uring/zcrx: account niov arrays to cgroup Pavel Begunkov
@ 2025-09-16 18:37 ` Jens Axboe
20 siblings, 0 replies; 22+ messages in thread
From: Jens Axboe @ 2025-09-16 18:37 UTC (permalink / raw)
To: io-uring, Pavel Begunkov; +Cc: netdev
On Tue, 16 Sep 2025 15:27:43 +0100, Pavel Begunkov wrote:
> A bunch of assorted zcrx patches for 6.18, which includes
> - Improve refill entry alignment for better caching (Patch 1)
> - Various cleanups, especially around deduplicating normal memory vs
> dmabuf setup.
> - Generalisation of the niov size (Patch 12). It's still hard-coded to
> PAGE_SIZE on init, but will let the user specify the rx buffer
> length on setup.
> - Syscall / synchronous buffer return (Patch 19). It'll be used as a
> slow fallback path for returning buffers when the refill queue is
> full. Useful for tolerating slight queue size misconfiguration or
> with inconsistent load.
> - Accounting more memory to cgroups (Patch 20)
> - Additional independent cleanups that will also be useful for
> multi-area support.
>
> [...]
Applied, thanks!
[01/20] io_uring/zcrx: improve rqe cache alignment
commit: 9eb3c571787d1ef7e2c3393c153b1a6b103a26e3
[02/20] io_uring/zcrx: replace memchar_inv with is_zero
commit: bdc0d478a1632a72afa6d359d7fdd49dd08c0b25
[03/20] io_uring/zcrx: use page_pool_unref_and_test()
commit: d5e31db9a950f1edfa20a59e7105e9cc78135493
[04/20] io_uring/zcrx: remove extra io_zcrx_drop_netdev
commit: c49606fc4be78da6c7a7c623566f6cf7663ba740
[05/20] io_uring/zcrx: don't pass slot to io_zcrx_create_area
commit: d425f13146af0ef10b8f1dc7cc9fd700ce7c759e
[06/20] io_uring/zcrx: move area reg checks into io_import_area
commit: 01464ea405e13789bf4f14c7d4e9fa97f0885d46
[07/20] io_uring/zcrx: check all niovs filled with dma addresses
commit: d7ae46b454eb05e3df0d46c2ac9c61416a4d9057
[08/20] io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback()
commit: 02bb047b5f42ed30ca97010069cb36cd3afb74e1
[09/20] io_uring/zcrx: deduplicate area mapping
commit: 439a98b972fbb1991819b5367f482cd4161ba39c
[10/20] io_uring/zcrx: remove dmabuf_offset
commit: 6c185117291a85937fa67d402efc4f11b2891c6a
[11/20] io_uring/zcrx: set sgt for umem area
commit: 5d93f7bade0b1eb60d0f395ad72b35581d28a896
[12/20] io_uring/zcrx: make niov size variable
commit: d8d135dfe3e8e306d9edfcccf28dbe75c6a85567
[13/20] io_uring/zcrx: rename dma lock
commit: 4f602f3112c8271e32bea358dd2a8005d32a5bd5
[14/20] io_uring/zcrx: protect netdev with pp_lock
commit: 20dda449c0b6297ed7c13a23a1207ed072655bff
[15/20] io_uring/zcrx: reduce netmem scope in refill
commit: 73fa880effc5644aaf746596acb1b1efa44606df
[16/20] io_uring/zcrx: use guards for the refill lock
commit: c95257f336556de05f26dc88a890fb2a59364939
[17/20] io_uring/zcrx: don't adjust free cache space
commit: 5a8b6e7c1d7b5863faaf392eafa089bd599a8973
[18/20] io_uring/zcrx: introduce io_parse_rqe()
commit: 8fd08d8dda3c6c4e9f0b73acdcf8a1cf391b0c8f
[19/20] io_uring/zcrx: allow synchronous buffer return
commit: 705d2ac7b2044f1ca05ba6033183151a04dbff4d
[20/20] io_uring/zcrx: account niov arrays to cgroup
commit: 31bf77dcc3810e08bcc7d15470e92cdfffb7f7f1
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 22+ messages in thread