public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] add support for multiple ifqs per io_uring
@ 2025-04-16 15:21 Pavel Begunkov
  2025-04-16 15:21 ` [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init Pavel Begunkov
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-16 15:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, David Wei

Note: depends on patches queued for 6.15-rcN.

Patches 3-5 allow to register multiple ifqs within a single io_uring
instance. That should be useful for setups with multiple interfaces.

Patch 1 and 2 and not related but I just bundled them together.

Pavel Begunkov (5):
  io_uring/zcrx: remove duplicated freelist init
  io_uring/zcrx: move io_zcrx_iov_page
  io_uring/zcrx: let zcrx choose region for mmaping
  io_uring/zcrx: move zcrx region to struct io_zcrx_ifq
  io_uring/zcrx: add support for multiple ifqs

 include/linux/io_uring_types.h |  7 +--
 io_uring/io_uring.c            |  3 +-
 io_uring/memmap.c              | 10 ++--
 io_uring/net.c                 |  8 ++-
 io_uring/zcrx.c                | 90 +++++++++++++++++++++-------------
 io_uring/zcrx.h                |  8 +++
 6 files changed, 78 insertions(+), 48 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init
  2025-04-16 15:21 [PATCH 0/5] add support for multiple ifqs per io_uring Pavel Begunkov
@ 2025-04-16 15:21 ` Pavel Begunkov
  2025-04-18 15:06   ` David Wei
  2025-04-16 15:21 ` [PATCH 2/5] io_uring/zcrx: move io_zcrx_iov_page Pavel Begunkov
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-16 15:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, David Wei

Several lines below we already initialise the freelist, don't do it
twice.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 5defbe8f95f9..659438f4cfcf 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -238,9 +238,6 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
 	if (!area->freelist)
 		goto err;
 
-	for (i = 0; i < nr_iovs; i++)
-		area->freelist[i] = i;
-
 	area->user_refs = kvmalloc_array(nr_iovs, sizeof(area->user_refs[0]),
 					GFP_KERNEL | __GFP_ZERO);
 	if (!area->user_refs)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/5] io_uring/zcrx: move io_zcrx_iov_page
  2025-04-16 15:21 [PATCH 0/5] add support for multiple ifqs per io_uring Pavel Begunkov
  2025-04-16 15:21 ` [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init Pavel Begunkov
@ 2025-04-16 15:21 ` Pavel Begunkov
  2025-04-18 15:07   ` David Wei
  2025-04-16 15:21 ` [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping Pavel Begunkov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-16 15:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, David Wei

We'll need io_zcrx_iov_page at the top to keep offset calculations
closer together, move it there.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 659438f4cfcf..0b56d5f84959 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -31,6 +31,20 @@ static inline struct io_zcrx_ifq *io_pp_to_ifq(struct page_pool *pp)
 	return pp->mp_priv;
 }
 
+static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov)
+{
+	struct net_iov_area *owner = net_iov_owner(niov);
+
+	return container_of(owner, struct io_zcrx_area, nia);
+}
+
+static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
+{
+	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
+
+	return area->pages[net_iov_idx(niov)];
+}
+
 #define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
 
 static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
@@ -111,13 +125,6 @@ struct io_zcrx_args {
 
 static const struct memory_provider_ops io_uring_pp_zc_ops;
 
-static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov)
-{
-	struct net_iov_area *owner = net_iov_owner(niov);
-
-	return container_of(owner, struct io_zcrx_area, nia);
-}
-
 static inline atomic_t *io_get_user_counter(struct net_iov *niov)
 {
 	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
@@ -140,13 +147,6 @@ static void io_zcrx_get_niov_uref(struct net_iov *niov)
 	atomic_inc(io_get_user_counter(niov));
 }
 
-static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
-{
-	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
-
-	return area->pages[net_iov_idx(niov)];
-}
-
 static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
 				 struct io_uring_zcrx_ifq_reg *reg,
 				 struct io_uring_region_desc *rd)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping
  2025-04-16 15:21 [PATCH 0/5] add support for multiple ifqs per io_uring Pavel Begunkov
  2025-04-16 15:21 ` [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init Pavel Begunkov
  2025-04-16 15:21 ` [PATCH 2/5] io_uring/zcrx: move io_zcrx_iov_page Pavel Begunkov
@ 2025-04-16 15:21 ` Pavel Begunkov
  2025-04-18 15:35   ` David Wei
  2025-04-16 15:21 ` [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq Pavel Begunkov
  2025-04-16 15:21 ` [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs Pavel Begunkov
  4 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-16 15:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, David Wei

In preparation for adding multiple ifqs, add a helper returning a region
for mmaping zcrx refill queue. For now it's trivial and returns the same
ctx global ->zcrx_region.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/memmap.c | 10 ++++++----
 io_uring/zcrx.c   | 10 ++++++++++
 io_uring/zcrx.h   |  7 +++++++
 3 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 76fcc79656b0..e53289b8d69d 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -13,6 +13,7 @@
 #include "memmap.h"
 #include "kbuf.h"
 #include "rsrc.h"
+#include "zcrx.h"
 
 static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
 				   size_t size, gfp_t gfp)
@@ -258,7 +259,9 @@ static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
 						   loff_t pgoff)
 {
 	loff_t offset = pgoff << PAGE_SHIFT;
-	unsigned int bgid;
+	unsigned int id;
+
+	id = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
 
 	switch (offset & IORING_OFF_MMAP_MASK) {
 	case IORING_OFF_SQ_RING:
@@ -267,12 +270,11 @@ static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
 	case IORING_OFF_SQES:
 		return &ctx->sq_region;
 	case IORING_OFF_PBUF_RING:
-		bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
-		return io_pbuf_get_region(ctx, bgid);
+		return io_pbuf_get_region(ctx, id);
 	case IORING_MAP_OFF_PARAM_REGION:
 		return &ctx->param_region;
 	case IORING_MAP_OFF_ZCRX_REGION:
-		return &ctx->zcrx_region;
+		return io_zcrx_get_region(ctx, id);
 	}
 	return NULL;
 }
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 0b56d5f84959..652daff0eb8d 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -329,6 +329,16 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
 	kfree(ifq);
 }
 
+struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
+					    unsigned int id)
+{
+	lockdep_assert_held(&ctx->mmap_lock);
+
+	if (id != 0)
+		return NULL;
+	return &ctx->zcrx_region;
+}
+
 int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 			  struct io_uring_zcrx_ifq_reg __user *arg)
 {
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 47f1c0e8c197..a183125e69f0 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -48,6 +48,8 @@ void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx);
 int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 		 struct socket *sock, unsigned int flags,
 		 unsigned issue_flags, unsigned int *len);
+struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
+					    unsigned int id);
 #else
 static inline int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 					struct io_uring_zcrx_ifq_reg __user *arg)
@@ -66,6 +68,11 @@ static inline int io_zcrx_recv(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 {
 	return -EOPNOTSUPP;
 }
+static inline struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
+							  unsigned int id)
+{
+	return NULL;
+}
 #endif
 
 int io_recvzc(struct io_kiocb *req, unsigned int issue_flags);
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq
  2025-04-16 15:21 [PATCH 0/5] add support for multiple ifqs per io_uring Pavel Begunkov
                   ` (2 preceding siblings ...)
  2025-04-16 15:21 ` [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping Pavel Begunkov
@ 2025-04-16 15:21 ` Pavel Begunkov
  2025-04-18 16:05   ` David Wei
  2025-04-16 15:21 ` [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs Pavel Begunkov
  4 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-16 15:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, David Wei

Refill queue region is a part of zcrx and should stay in struct
io_zcrx_ifq. We can't have multiple queues without it.

Note: ctx->ifq assignments are now protected by mmap_lock as it's in
the mmap region look up path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring_types.h |  2 --
 io_uring/zcrx.c                | 20 ++++++++++++--------
 io_uring/zcrx.h                |  1 +
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3b467879bca8..06d722289fc5 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -448,8 +448,6 @@ struct io_ring_ctx {
 	struct io_mapped_region		ring_region;
 	/* used for optimised request parameter and wait argument passing  */
 	struct io_mapped_region		param_region;
-	/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
-	struct io_mapped_region		zcrx_region;
 };
 
 /*
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 652daff0eb8d..d56665fd103d 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -160,12 +160,11 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
 	if (size > rd->size)
 		return -EINVAL;
 
-	ret = io_create_region_mmap_safe(ifq->ctx, &ifq->ctx->zcrx_region, rd,
-					 IORING_MAP_OFF_ZCRX_REGION);
+	ret = io_create_region(ifq->ctx, &ifq->region, rd, IORING_MAP_OFF_ZCRX_REGION);
 	if (ret < 0)
 		return ret;
 
-	ptr = io_region_get_ptr(&ifq->ctx->zcrx_region);
+	ptr = io_region_get_ptr(&ifq->region);
 	ifq->rq_ring = (struct io_uring *)ptr;
 	ifq->rqes = (struct io_uring_zcrx_rqe *)(ptr + off);
 	return 0;
@@ -173,7 +172,10 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
 
 static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
 {
-	io_free_region(ifq->ctx, &ifq->ctx->zcrx_region);
+	if (WARN_ON_ONCE(ifq->ctx->ifq))
+		return;
+
+	io_free_region(ifq->ctx, &ifq->region);
 	ifq->rq_ring = NULL;
 	ifq->rqes = NULL;
 }
@@ -334,9 +336,9 @@ struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
 {
 	lockdep_assert_held(&ctx->mmap_lock);
 
-	if (id != 0)
+	if (id != 0 || !ctx->ifq)
 		return NULL;
-	return &ctx->zcrx_region;
+	return &ctx->ifq->region;
 }
 
 int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
@@ -428,7 +430,8 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 		ret = -EFAULT;
 		goto err;
 	}
-	ctx->ifq = ifq;
+	scoped_guard(mutex, &ctx->mmap_lock)
+		ctx->ifq = ifq;
 	return 0;
 err:
 	io_zcrx_ifq_free(ifq);
@@ -444,7 +447,8 @@ void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
 	if (!ifq)
 		return;
 
-	ctx->ifq = NULL;
+	scoped_guard(mutex, &ctx->mmap_lock)
+		ctx->ifq = NULL;
 	io_zcrx_ifq_free(ifq);
 }
 
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index a183125e69f0..e58191d56249 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -38,6 +38,7 @@ struct io_zcrx_ifq {
 	struct net_device		*netdev;
 	netdevice_tracker		netdev_tracker;
 	spinlock_t			lock;
+	struct io_mapped_region		region;
 };
 
 #if defined(CONFIG_IO_URING_ZCRX)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs
  2025-04-16 15:21 [PATCH 0/5] add support for multiple ifqs per io_uring Pavel Begunkov
                   ` (3 preceding siblings ...)
  2025-04-16 15:21 ` [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq Pavel Begunkov
@ 2025-04-16 15:21 ` Pavel Begunkov
  2025-04-18 17:01   ` David Wei
  4 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-16 15:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, David Wei

Allow the user to register multiple ifqs / zcrx contexts. With that we
can use multiple interfaces / interface queues in a single io_uring
instance.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring_types.h |  5 ++--
 io_uring/io_uring.c            |  3 +-
 io_uring/net.c                 |  8 ++---
 io_uring/zcrx.c                | 53 +++++++++++++++++++++-------------
 4 files changed, 40 insertions(+), 29 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 06d722289fc5..7e23e993280e 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -40,8 +40,6 @@ enum io_uring_cmd_flags {
 	IO_URING_F_TASK_DEAD		= (1 << 13),
 };
 
-struct io_zcrx_ifq;
-
 struct io_wq_work_node {
 	struct io_wq_work_node *next;
 };
@@ -394,7 +392,8 @@ struct io_ring_ctx {
 	struct wait_queue_head		poll_wq;
 	struct io_restriction		restrictions;
 
-	struct io_zcrx_ifq		*ifq;
+	/* Stores zcrx object pointers of type struct io_zcrx_ifq */
+	struct xarray			zcrx_ctxs;
 
 	u32			pers_next;
 	struct xarray		personalities;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 61514b14ee3f..ed85c9374f6e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -359,6 +359,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_LIST_HEAD(&ctx->tctx_list);
 	ctx->submit_state.free_list.next = NULL;
 	INIT_HLIST_HEAD(&ctx->waitid_list);
+	xa_init_flags(&ctx->zcrx_ctxs, XA_FLAGS_ALLOC);
 #ifdef CONFIG_FUTEX
 	INIT_HLIST_HEAD(&ctx->futex_list);
 #endif
@@ -2888,7 +2889,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
 			io_cqring_overflow_kill(ctx);
 			mutex_unlock(&ctx->uring_lock);
 		}
-		if (ctx->ifq) {
+		if (!xa_empty(&ctx->zcrx_ctxs)) {
 			mutex_lock(&ctx->uring_lock);
 			io_shutdown_zcrx_ifqs(ctx);
 			mutex_unlock(&ctx->uring_lock);
diff --git a/io_uring/net.c b/io_uring/net.c
index 5f1a519d1fc6..b3a643675ce8 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1185,16 +1185,14 @@ int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
 	unsigned ifq_idx;
 
-	if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr ||
-		     sqe->addr3))
+	if (unlikely(sqe->addr2 || sqe->addr || sqe->addr3))
 		return -EINVAL;
 
 	ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx);
-	if (ifq_idx != 0)
-		return -EINVAL;
-	zc->ifq = req->ctx->ifq;
+	zc->ifq = xa_load(&req->ctx->zcrx_ctxs, ifq_idx);
 	if (!zc->ifq)
 		return -EINVAL;
+
 	zc->len = READ_ONCE(sqe->len);
 	zc->flags = READ_ONCE(sqe->ioprio);
 	zc->msg_flags = READ_ONCE(sqe->msg_flags);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index d56665fd103d..e4ce971b1257 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -172,9 +172,6 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
 
 static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
 {
-	if (WARN_ON_ONCE(ifq->ctx->ifq))
-		return;
-
 	io_free_region(ifq->ctx, &ifq->region);
 	ifq->rq_ring = NULL;
 	ifq->rqes = NULL;
@@ -334,11 +331,11 @@ static void io_zcrx_ifq_free(struct io_zcrx_ifq *ifq)
 struct io_mapped_region *io_zcrx_get_region(struct io_ring_ctx *ctx,
 					    unsigned int id)
 {
+	struct io_zcrx_ifq *ifq = xa_load(&ctx->zcrx_ctxs, id);
+
 	lockdep_assert_held(&ctx->mmap_lock);
 
-	if (id != 0 || !ctx->ifq)
-		return NULL;
-	return &ctx->ifq->region;
+	return ifq ? &ifq->region : NULL;
 }
 
 int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
@@ -350,6 +347,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	struct io_uring_region_desc rd;
 	struct io_zcrx_ifq *ifq;
 	int ret;
+	u32 id;
 
 	/*
 	 * 1. Interface queue allocation.
@@ -362,8 +360,6 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN &&
 	      ctx->flags & IORING_SETUP_CQE32))
 		return -EINVAL;
-	if (ctx->ifq)
-		return -EBUSY;
 	if (copy_from_user(&reg, arg, sizeof(reg)))
 		return -EFAULT;
 	if (copy_from_user(&rd, u64_to_user_ptr(reg.region_ptr), sizeof(rd)))
@@ -424,14 +420,21 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	reg.offsets.head = offsetof(struct io_uring, head);
 	reg.offsets.tail = offsetof(struct io_uring, tail);
 
+	scoped_guard(mutex, &ctx->mmap_lock) {
+		ret = xa_alloc(&ctx->zcrx_ctxs, &id, ifq, xa_limit_31b, GFP_KERNEL);
+		if (ret)
+			goto err;
+	}
+
+	reg.zcrx_id = id;
 	if (copy_to_user(arg, &reg, sizeof(reg)) ||
 	    copy_to_user(u64_to_user_ptr(reg.region_ptr), &rd, sizeof(rd)) ||
 	    copy_to_user(u64_to_user_ptr(reg.area_ptr), &area, sizeof(area))) {
+		scoped_guard(mutex, &ctx->mmap_lock)
+			xa_erase(&ctx->zcrx_ctxs, id);
 		ret = -EFAULT;
 		goto err;
 	}
-	scoped_guard(mutex, &ctx->mmap_lock)
-		ctx->ifq = ifq;
 	return 0;
 err:
 	io_zcrx_ifq_free(ifq);
@@ -440,16 +443,23 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 
 void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
 {
-	struct io_zcrx_ifq *ifq = ctx->ifq;
+	struct io_zcrx_ifq *ifq;
+	unsigned long id;
 
 	lockdep_assert_held(&ctx->uring_lock);
 
-	if (!ifq)
-		return;
+	while (1) {
+		scoped_guard(mutex, &ctx->mmap_lock) {
+			ifq = xa_find(&ctx->zcrx_ctxs, &id, ULONG_MAX, XA_PRESENT);
+			if (ifq)
+				xa_erase(&ctx->zcrx_ctxs, id);
+		}
+		if (!ifq)
+			break;
+		io_zcrx_ifq_free(ifq);
+	}
 
-	scoped_guard(mutex, &ctx->mmap_lock)
-		ctx->ifq = NULL;
-	io_zcrx_ifq_free(ifq);
+	xa_destroy(&ctx->zcrx_ctxs);
 }
 
 static struct net_iov *__io_zcrx_get_free_niov(struct io_zcrx_area *area)
@@ -506,12 +516,15 @@ static void io_zcrx_scrub(struct io_zcrx_ifq *ifq)
 
 void io_shutdown_zcrx_ifqs(struct io_ring_ctx *ctx)
 {
+	struct io_zcrx_ifq *ifq;
+	unsigned long index;
+
 	lockdep_assert_held(&ctx->uring_lock);
 
-	if (!ctx->ifq)
-		return;
-	io_zcrx_scrub(ctx->ifq);
-	io_close_queue(ctx->ifq);
+	xa_for_each(&ctx->zcrx_ctxs, index, ifq) {
+		io_zcrx_scrub(ifq);
+		io_close_queue(ifq);
+	}
 }
 
 static inline u32 io_zcrx_rqring_entries(struct io_zcrx_ifq *ifq)
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init
  2025-04-16 15:21 ` [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init Pavel Begunkov
@ 2025-04-18 15:06   ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-04-18 15:06 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-16 08:21, Pavel Begunkov wrote:
> Several lines below we already initialise the freelist, don't do it
> twice.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  io_uring/zcrx.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 5defbe8f95f9..659438f4cfcf 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -238,9 +238,6 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
>  	if (!area->freelist)
>  		goto err;
>  
> -	for (i = 0; i < nr_iovs; i++)
> -		area->freelist[i] = i;
> -
>  	area->user_refs = kvmalloc_array(nr_iovs, sizeof(area->user_refs[0]),
>  					GFP_KERNEL | __GFP_ZERO);
>  	if (!area->user_refs)

Reviewed-by: David Wei <dw@davidwei.uk>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/5] io_uring/zcrx: move io_zcrx_iov_page
  2025-04-16 15:21 ` [PATCH 2/5] io_uring/zcrx: move io_zcrx_iov_page Pavel Begunkov
@ 2025-04-18 15:07   ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-04-18 15:07 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-16 08:21, Pavel Begunkov wrote:
> We'll need io_zcrx_iov_page at the top to keep offset calculations
> closer together, move it there.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  io_uring/zcrx.c | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 659438f4cfcf..0b56d5f84959 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -31,6 +31,20 @@ static inline struct io_zcrx_ifq *io_pp_to_ifq(struct page_pool *pp)
>  	return pp->mp_priv;
>  }
>  
> +static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov)
> +{
> +	struct net_iov_area *owner = net_iov_owner(niov);
> +
> +	return container_of(owner, struct io_zcrx_area, nia);
> +}
> +
> +static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
> +{
> +	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
> +
> +	return area->pages[net_iov_idx(niov)];
> +}
> +
>  #define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
>  
>  static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
> @@ -111,13 +125,6 @@ struct io_zcrx_args {
>  
>  static const struct memory_provider_ops io_uring_pp_zc_ops;
>  
> -static inline struct io_zcrx_area *io_zcrx_iov_to_area(const struct net_iov *niov)
> -{
> -	struct net_iov_area *owner = net_iov_owner(niov);
> -
> -	return container_of(owner, struct io_zcrx_area, nia);
> -}
> -
>  static inline atomic_t *io_get_user_counter(struct net_iov *niov)
>  {
>  	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
> @@ -140,13 +147,6 @@ static void io_zcrx_get_niov_uref(struct net_iov *niov)
>  	atomic_inc(io_get_user_counter(niov));
>  }
>  
> -static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
> -{
> -	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
> -
> -	return area->pages[net_iov_idx(niov)];
> -}
> -
>  static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>  				 struct io_uring_zcrx_ifq_reg *reg,
>  				 struct io_uring_region_desc *rd)

Reviewed-by: David Wei <dw@davidwei.uk>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping
  2025-04-16 15:21 ` [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping Pavel Begunkov
@ 2025-04-18 15:35   ` David Wei
  2025-04-18 15:52     ` Pavel Begunkov
  0 siblings, 1 reply; 16+ messages in thread
From: David Wei @ 2025-04-18 15:35 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-16 08:21, Pavel Begunkov wrote:
> In preparation for adding multiple ifqs, add a helper returning a region
> for mmaping zcrx refill queue. For now it's trivial and returns the same
> ctx global ->zcrx_region.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  io_uring/memmap.c | 10 ++++++----
>  io_uring/zcrx.c   | 10 ++++++++++
>  io_uring/zcrx.h   |  7 +++++++
>  3 files changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/io_uring/memmap.c b/io_uring/memmap.c
> index 76fcc79656b0..e53289b8d69d 100644
> --- a/io_uring/memmap.c
> +++ b/io_uring/memmap.c
> @@ -13,6 +13,7 @@
>  #include "memmap.h"
>  #include "kbuf.h"
>  #include "rsrc.h"
> +#include "zcrx.h"
>  
>  static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
>  				   size_t size, gfp_t gfp)
> @@ -258,7 +259,9 @@ static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
>  						   loff_t pgoff)
>  {
>  	loff_t offset = pgoff << PAGE_SHIFT;
> -	unsigned int bgid;
> +	unsigned int id;
> +
> +	id = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;

We using the same IORING_OFF_PBUF_SHIFT?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping
  2025-04-18 15:35   ` David Wei
@ 2025-04-18 15:52     ` Pavel Begunkov
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-18 15:52 UTC (permalink / raw)
  To: David Wei, io-uring

On 4/18/25 16:35, David Wei wrote:
> On 2025-04-16 08:21, Pavel Begunkov wrote:
>> In preparation for adding multiple ifqs, add a helper returning a region
>> for mmaping zcrx refill queue. For now it's trivial and returns the same
>> ctx global ->zcrx_region.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   io_uring/memmap.c | 10 ++++++----
>>   io_uring/zcrx.c   | 10 ++++++++++
>>   io_uring/zcrx.h   |  7 +++++++
>>   3 files changed, 23 insertions(+), 4 deletions(-)
>>
>> diff --git a/io_uring/memmap.c b/io_uring/memmap.c
>> index 76fcc79656b0..e53289b8d69d 100644
>> --- a/io_uring/memmap.c
>> +++ b/io_uring/memmap.c
>> @@ -13,6 +13,7 @@
>>   #include "memmap.h"
>>   #include "kbuf.h"
>>   #include "rsrc.h"
>> +#include "zcrx.h"
>>   
>>   static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
>>   				   size_t size, gfp_t gfp)
>> @@ -258,7 +259,9 @@ static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
>>   						   loff_t pgoff)
>>   {
>>   	loff_t offset = pgoff << PAGE_SHIFT;
>> -	unsigned int bgid;
>> +	unsigned int id;
>> +
>> +	id = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
> 
> We using the same IORING_OFF_PBUF_SHIFT?

Let me check that, and also it likely doesn't return the right
value to the user as well.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq
  2025-04-16 15:21 ` [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq Pavel Begunkov
@ 2025-04-18 16:05   ` David Wei
  2025-04-18 16:22     ` Pavel Begunkov
  0 siblings, 1 reply; 16+ messages in thread
From: David Wei @ 2025-04-18 16:05 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-16 08:21, Pavel Begunkov wrote:
> Refill queue region is a part of zcrx and should stay in struct
> io_zcrx_ifq. We can't have multiple queues without it.
> 
> Note: ctx->ifq assignments are now protected by mmap_lock as it's in
> the mmap region look up path.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/io_uring_types.h |  2 --
>  io_uring/zcrx.c                | 20 ++++++++++++--------
>  io_uring/zcrx.h                |  1 +
>  3 files changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 3b467879bca8..06d722289fc5 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -448,8 +448,6 @@ struct io_ring_ctx {
>  	struct io_mapped_region		ring_region;
>  	/* used for optimised request parameter and wait argument passing  */
>  	struct io_mapped_region		param_region;
> -	/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
> -	struct io_mapped_region		zcrx_region;
>  };
>  
>  /*
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 652daff0eb8d..d56665fd103d 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -160,12 +160,11 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>  	if (size > rd->size)
>  		return -EINVAL;
>  
> -	ret = io_create_region_mmap_safe(ifq->ctx, &ifq->ctx->zcrx_region, rd,
> -					 IORING_MAP_OFF_ZCRX_REGION);
> +	ret = io_create_region(ifq->ctx, &ifq->region, rd, IORING_MAP_OFF_ZCRX_REGION);

Why is this changed to io_create_region()? I don't see the caller of
io_allocate_rbuf_ring() changing or taking mmap_lock.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq
  2025-04-18 16:05   ` David Wei
@ 2025-04-18 16:22     ` Pavel Begunkov
  2025-04-18 18:52       ` David Wei
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-18 16:22 UTC (permalink / raw)
  To: David Wei, io-uring

On 4/18/25 17:05, David Wei wrote:
> On 2025-04-16 08:21, Pavel Begunkov wrote:
>> Refill queue region is a part of zcrx and should stay in struct
>> io_zcrx_ifq. We can't have multiple queues without it.
>>
>> Note: ctx->ifq assignments are now protected by mmap_lock as it's in
>> the mmap region look up path.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/linux/io_uring_types.h |  2 --
>>   io_uring/zcrx.c                | 20 ++++++++++++--------
>>   io_uring/zcrx.h                |  1 +
>>   3 files changed, 13 insertions(+), 10 deletions(-)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 3b467879bca8..06d722289fc5 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -448,8 +448,6 @@ struct io_ring_ctx {
>>   	struct io_mapped_region		ring_region;
>>   	/* used for optimised request parameter and wait argument passing  */
>>   	struct io_mapped_region		param_region;
>> -	/* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
>> -	struct io_mapped_region		zcrx_region;
>>   };
>>   
>>   /*
>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>> index 652daff0eb8d..d56665fd103d 100644
>> --- a/io_uring/zcrx.c
>> +++ b/io_uring/zcrx.c
>> @@ -160,12 +160,11 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>>   	if (size > rd->size)
>>   		return -EINVAL;
>>   
>> -	ret = io_create_region_mmap_safe(ifq->ctx, &ifq->ctx->zcrx_region, rd,
>> -					 IORING_MAP_OFF_ZCRX_REGION);
>> +	ret = io_create_region(ifq->ctx, &ifq->region, rd, IORING_MAP_OFF_ZCRX_REGION);
> 
> Why is this changed to io_create_region()? I don't see the caller of
> io_allocate_rbuf_ring() changing or taking mmap_lock.

We only care about mmap seeing a consistent region. The mmap holds
the ctx->mmap_lock, and *mmap_safe was making sure the region is
updated atomically from its perspective.

Now, instead of protecting the region itself, this patch is protecting
how ifq and subsequently the region is published:

-	ctx->ifq = ifq;
+	scoped_guard(mutex, &ctx->mmap_lock)
+		ctx->ifq = ifq;

And io_zcrx_get_region() is either sees NULL or ifq with a
correct region.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs
  2025-04-16 15:21 ` [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs Pavel Begunkov
@ 2025-04-18 17:01   ` David Wei
  2025-04-18 17:22     ` Pavel Begunkov
  0 siblings, 1 reply; 16+ messages in thread
From: David Wei @ 2025-04-18 17:01 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-16 08:21, Pavel Begunkov wrote:
> Allow the user to register multiple ifqs / zcrx contexts. With that we
> can use multiple interfaces / interface queues in a single io_uring
> instance.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  include/linux/io_uring_types.h |  5 ++--
>  io_uring/io_uring.c            |  3 +-
>  io_uring/net.c                 |  8 ++---
>  io_uring/zcrx.c                | 53 +++++++++++++++++++++-------------
>  4 files changed, 40 insertions(+), 29 deletions(-)
> 
[...]
> diff --git a/io_uring/net.c b/io_uring/net.c
> index 5f1a519d1fc6..b3a643675ce8 100644
> --- a/io_uring/net.c
> +++ b/io_uring/net.c
> @@ -1185,16 +1185,14 @@ int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
>  	unsigned ifq_idx;
>  
> -	if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr ||
> -		     sqe->addr3))
> +	if (unlikely(sqe->addr2 || sqe->addr || sqe->addr3))
>  		return -EINVAL;

Why remove sqe->file_index?

>  
>  	ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx);
> -	if (ifq_idx != 0)
> -		return -EINVAL;
> -	zc->ifq = req->ctx->ifq;
> +	zc->ifq = xa_load(&req->ctx->zcrx_ctxs, ifq_idx);
>  	if (!zc->ifq)
>  		return -EINVAL;
> +
>  	zc->len = READ_ONCE(sqe->len);
>  	zc->flags = READ_ONCE(sqe->ioprio);
>  	zc->msg_flags = READ_ONCE(sqe->msg_flags);
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index d56665fd103d..e4ce971b1257 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -172,9 +172,6 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>  
>  static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
>  {
> -	if (WARN_ON_ONCE(ifq->ctx->ifq))
> -		return;
> -

I think this should stay.

>  	io_free_region(ifq->ctx, &ifq->region);
>  	ifq->rq_ring = NULL;
>  	ifq->rqes = NULL;
[...]
> @@ -440,16 +443,23 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
>  
>  void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
>  {
> -	struct io_zcrx_ifq *ifq = ctx->ifq;
> +	struct io_zcrx_ifq *ifq;
> +	unsigned long id;
>  
>  	lockdep_assert_held(&ctx->uring_lock);
>  
> -	if (!ifq)
> -		return;
> +	while (1) {
> +		scoped_guard(mutex, &ctx->mmap_lock) {
> +			ifq = xa_find(&ctx->zcrx_ctxs, &id, ULONG_MAX, XA_PRESENT);
> +			if (ifq)
> +				xa_erase(&ctx->zcrx_ctxs, id);
> +		}
> +		if (!ifq)
> +			break;
> +		io_zcrx_ifq_free(ifq);
> +	}

Why not xa_for_each()? Is it weirdness with scoped_guard macro?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs
  2025-04-18 17:01   ` David Wei
@ 2025-04-18 17:22     ` Pavel Begunkov
  2025-04-18 18:54       ` David Wei
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-04-18 17:22 UTC (permalink / raw)
  To: David Wei, io-uring

On 4/18/25 18:01, David Wei wrote:
> On 2025-04-16 08:21, Pavel Begunkov wrote:
>> Allow the user to register multiple ifqs / zcrx contexts. With that we
>> can use multiple interfaces / interface queues in a single io_uring
>> instance.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   include/linux/io_uring_types.h |  5 ++--
>>   io_uring/io_uring.c            |  3 +-
>>   io_uring/net.c                 |  8 ++---
>>   io_uring/zcrx.c                | 53 +++++++++++++++++++++-------------
>>   4 files changed, 40 insertions(+), 29 deletions(-)
>>
> [...]
>> diff --git a/io_uring/net.c b/io_uring/net.c
>> index 5f1a519d1fc6..b3a643675ce8 100644
>> --- a/io_uring/net.c
>> +++ b/io_uring/net.c
>> @@ -1185,16 +1185,14 @@ int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>   	struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
>>   	unsigned ifq_idx;
>>   
>> -	if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr ||
>> -		     sqe->addr3))
>> +	if (unlikely(sqe->addr2 || sqe->addr || sqe->addr3))
>>   		return -EINVAL;
> 
> Why remove sqe->file_index?

it's aliased with ->zcrx_ifq_idx. The ifq_idx check below
essentially does nothing. And fwiw, userspace was getting
correct errors, so it's not a fix.

>>   	ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx);
>> -	if (ifq_idx != 0)
>> -		return -EINVAL;
>> -	zc->ifq = req->ctx->ifq;
>> +	zc->ifq = xa_load(&req->ctx->zcrx_ctxs, ifq_idx);
>>   	if (!zc->ifq)
>>   		return -EINVAL;
>> +
>>   	zc->len = READ_ONCE(sqe->len);
>>   	zc->flags = READ_ONCE(sqe->ioprio);
>>   	zc->msg_flags = READ_ONCE(sqe->msg_flags);
>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>> index d56665fd103d..e4ce971b1257 100644
>> --- a/io_uring/zcrx.c
>> +++ b/io_uring/zcrx.c
>> @@ -172,9 +172,6 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>>   
>>   static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
>>   {
>> -	if (WARN_ON_ONCE(ifq->ctx->ifq))
>> -		return;
>> -
> 
> I think this should stay.

There is not ctx->ifq anymore. You may look up in the xarray,
but for that you need to know the index, and it's easier to
just remove it.

  
>>   	io_free_region(ifq->ctx, &ifq->region);
>>   	ifq->rq_ring = NULL;
>>   	ifq->rqes = NULL;
> [...]
>> @@ -440,16 +443,23 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
>>   
>>   void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
>>   {
>> -	struct io_zcrx_ifq *ifq = ctx->ifq;
>> +	struct io_zcrx_ifq *ifq;
>> +	unsigned long id;
>>   
>>   	lockdep_assert_held(&ctx->uring_lock);
>>   
>> -	if (!ifq)
>> -		return;
>> +	while (1) {
>> +		scoped_guard(mutex, &ctx->mmap_lock) {
>> +			ifq = xa_find(&ctx->zcrx_ctxs, &id, ULONG_MAX, XA_PRESENT);
>> +			if (ifq)
>> +				xa_erase(&ctx->zcrx_ctxs, id);
>> +		}
>> +		if (!ifq)
>> +			break;
>> +		io_zcrx_ifq_free(ifq);
>> +	}
> 
> Why not xa_for_each()? Is it weirdness with scoped_guard macro?

I don't want io_zcrx_ifq_free() to be under mmap_lock, that might
complicate sync with mmap. It's good enough for now, but I'd like
to have sth like this in the future:

struct xarray tmp_onstack_xarray;

scoped_guard(mutex, &ctx->mmap_lock)
	xarray_swap(&tmp_onstack_xarray, &ctx->zcrx);
for_each_xarray(tmp_onstack_xarray)
	io_zcrx_ifq_free();

but there is no xarray_swap AFAIK.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq
  2025-04-18 16:22     ` Pavel Begunkov
@ 2025-04-18 18:52       ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-04-18 18:52 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-18 09:22, Pavel Begunkov wrote:
> On 4/18/25 17:05, David Wei wrote:
>> On 2025-04-16 08:21, Pavel Begunkov wrote:
>>> Refill queue region is a part of zcrx and should stay in struct
>>> io_zcrx_ifq. We can't have multiple queues without it.
>>>
>>> Note: ctx->ifq assignments are now protected by mmap_lock as it's in
>>> the mmap region look up path.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>   include/linux/io_uring_types.h |  2 --
>>>   io_uring/zcrx.c                | 20 ++++++++++++--------
>>>   io_uring/zcrx.h                |  1 +
>>>   3 files changed, 13 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>>> index 3b467879bca8..06d722289fc5 100644
>>> --- a/include/linux/io_uring_types.h
>>> +++ b/include/linux/io_uring_types.h
>>> @@ -448,8 +448,6 @@ struct io_ring_ctx {
>>>       struct io_mapped_region        ring_region;
>>>       /* used for optimised request parameter and wait argument passing  */
>>>       struct io_mapped_region        param_region;
>>> -    /* just one zcrx per ring for now, will move to io_zcrx_ifq eventually */
>>> -    struct io_mapped_region        zcrx_region;
>>>   };
>>>     /*
>>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>>> index 652daff0eb8d..d56665fd103d 100644
>>> --- a/io_uring/zcrx.c
>>> +++ b/io_uring/zcrx.c
>>> @@ -160,12 +160,11 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>>>       if (size > rd->size)
>>>           return -EINVAL;
>>>   -    ret = io_create_region_mmap_safe(ifq->ctx, &ifq->ctx->zcrx_region, rd,
>>> -                     IORING_MAP_OFF_ZCRX_REGION);
>>> +    ret = io_create_region(ifq->ctx, &ifq->region, rd, IORING_MAP_OFF_ZCRX_REGION);
>>
>> Why is this changed to io_create_region()? I don't see the caller of
>> io_allocate_rbuf_ring() changing or taking mmap_lock.
> 
> We only care about mmap seeing a consistent region. The mmap holds
> the ctx->mmap_lock, and *mmap_safe was making sure the region is
> updated atomically from its perspective.
> 
> Now, instead of protecting the region itself, this patch is protecting
> how ifq and subsequently the region is published:
> 
> -    ctx->ifq = ifq;
> +    scoped_guard(mutex, &ctx->mmap_lock)
> +        ctx->ifq = ifq;
> 
> And io_zcrx_get_region() is either sees NULL or ifq with a
> correct region.
> 

Ah I see, thanks, make sense.

Reviewed-by: David Wei <dw@davidwei.uk>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs
  2025-04-18 17:22     ` Pavel Begunkov
@ 2025-04-18 18:54       ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-04-18 18:54 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-04-18 10:22, Pavel Begunkov wrote:
> On 4/18/25 18:01, David Wei wrote:
>> On 2025-04-16 08:21, Pavel Begunkov wrote:
>>> Allow the user to register multiple ifqs / zcrx contexts. With that we
>>> can use multiple interfaces / interface queues in a single io_uring
>>> instance.
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>   include/linux/io_uring_types.h |  5 ++--
>>>   io_uring/io_uring.c            |  3 +-
>>>   io_uring/net.c                 |  8 ++---
>>>   io_uring/zcrx.c                | 53 +++++++++++++++++++++-------------
>>>   4 files changed, 40 insertions(+), 29 deletions(-)
>>>
>> [...]
>>> diff --git a/io_uring/net.c b/io_uring/net.c
>>> index 5f1a519d1fc6..b3a643675ce8 100644
>>> --- a/io_uring/net.c
>>> +++ b/io_uring/net.c
>>> @@ -1185,16 +1185,14 @@ int io_recvzc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>>>       struct io_recvzc *zc = io_kiocb_to_cmd(req, struct io_recvzc);
>>>       unsigned ifq_idx;
>>>   -    if (unlikely(sqe->file_index || sqe->addr2 || sqe->addr ||
>>> -             sqe->addr3))
>>> +    if (unlikely(sqe->addr2 || sqe->addr || sqe->addr3))
>>>           return -EINVAL;
>>
>> Why remove sqe->file_index?
> 
> it's aliased with ->zcrx_ifq_idx. The ifq_idx check below
> essentially does nothing. And fwiw, userspace was getting
> correct errors, so it's not a fix.
> 

Got it.

>>>       ifq_idx = READ_ONCE(sqe->zcrx_ifq_idx);
>>> -    if (ifq_idx != 0)
>>> -        return -EINVAL;
>>> -    zc->ifq = req->ctx->ifq;
>>> +    zc->ifq = xa_load(&req->ctx->zcrx_ctxs, ifq_idx);
>>>       if (!zc->ifq)
>>>           return -EINVAL;
>>> +
>>>       zc->len = READ_ONCE(sqe->len);
>>>       zc->flags = READ_ONCE(sqe->ioprio);
>>>       zc->msg_flags = READ_ONCE(sqe->msg_flags);
>>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>>> index d56665fd103d..e4ce971b1257 100644
>>> --- a/io_uring/zcrx.c
>>> +++ b/io_uring/zcrx.c
>>> @@ -172,9 +172,6 @@ static int io_allocate_rbuf_ring(struct io_zcrx_ifq *ifq,
>>>     static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
>>>   {
>>> -    if (WARN_ON_ONCE(ifq->ctx->ifq))
>>> -        return;
>>> -
>>
>> I think this should stay.
> 
> There is not ctx->ifq anymore. You may look up in the xarray,
> but for that you need to know the index, and it's easier to
> just remove it.

Yeah, and it's rather defensive to check.

> 
>  
>>>       io_free_region(ifq->ctx, &ifq->region);
>>>       ifq->rq_ring = NULL;
>>>       ifq->rqes = NULL;
>> [...]
>>> @@ -440,16 +443,23 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
>>>     void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
>>>   {
>>> -    struct io_zcrx_ifq *ifq = ctx->ifq;
>>> +    struct io_zcrx_ifq *ifq;
>>> +    unsigned long id;
>>>         lockdep_assert_held(&ctx->uring_lock);
>>>   -    if (!ifq)
>>> -        return;
>>> +    while (1) {
>>> +        scoped_guard(mutex, &ctx->mmap_lock) {
>>> +            ifq = xa_find(&ctx->zcrx_ctxs, &id, ULONG_MAX, XA_PRESENT);
>>> +            if (ifq)
>>> +                xa_erase(&ctx->zcrx_ctxs, id);
>>> +        }
>>> +        if (!ifq)
>>> +            break;
>>> +        io_zcrx_ifq_free(ifq);
>>> +    }
>>
>> Why not xa_for_each()? Is it weirdness with scoped_guard macro?
> 
> I don't want io_zcrx_ifq_free() to be under mmap_lock, that might
> complicate sync with mmap. It's good enough for now, but I'd like
> to have sth like this in the future:
> 
> struct xarray tmp_onstack_xarray;
> 
> scoped_guard(mutex, &ctx->mmap_lock)
>     xarray_swap(&tmp_onstack_xarray, &ctx->zcrx);
> for_each_xarray(tmp_onstack_xarray)
>     io_zcrx_ifq_free();
> 
> but there is no xarray_swap AFAIK.
> 

Reviewed-by: David Wei <dw@davidwei.uk>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-04-18 18:54 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-16 15:21 [PATCH 0/5] add support for multiple ifqs per io_uring Pavel Begunkov
2025-04-16 15:21 ` [PATCH 1/5] io_uring/zcrx: remove duplicated freelist init Pavel Begunkov
2025-04-18 15:06   ` David Wei
2025-04-16 15:21 ` [PATCH 2/5] io_uring/zcrx: move io_zcrx_iov_page Pavel Begunkov
2025-04-18 15:07   ` David Wei
2025-04-16 15:21 ` [PATCH 3/5] io_uring/zcrx: let zcrx choose region for mmaping Pavel Begunkov
2025-04-18 15:35   ` David Wei
2025-04-18 15:52     ` Pavel Begunkov
2025-04-16 15:21 ` [PATCH 4/5] io_uring/zcrx: move zcrx region to struct io_zcrx_ifq Pavel Begunkov
2025-04-18 16:05   ` David Wei
2025-04-18 16:22     ` Pavel Begunkov
2025-04-18 18:52       ` David Wei
2025-04-16 15:21 ` [PATCH 5/5] io_uring/zcrx: add support for multiple ifqs Pavel Begunkov
2025-04-18 17:01   ` David Wei
2025-04-18 17:22     ` Pavel Begunkov
2025-04-18 18:54       ` David Wei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox