public inbox for io-uring@vger.kernel.org
* [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
@ 2025-05-01 12:17 Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 1/5] io_uring/zcrx: improve area validation Pavel Begunkov
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
  To: io-uring
  Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

Currently, io_uring zcrx uses regular user pages to populate the
area for page pools. This series allows the user to pass a dmabuf
instead.

Patches 1-4 are preparatory and do code shuffling. All dmabuf
touching changes are in the last patch. A basic example can be
found at:

https://github.com/isilence/liburing/tree/zcrx-dmabuf
https://github.com/isilence/liburing.git zcrx-dmabuf

Pavel Begunkov (5):
  io_uring/zcrx: improve area validation
  io_uring/zcrx: resolve netdev before area creation
  io_uring/zcrx: split out memory holders from area
  io_uring/zcrx: split common area map/unmap parts
  io_uring/zcrx: dmabuf backed zerocopy receive

 include/uapi/linux/io_uring.h |   6 +-
 io_uring/rsrc.c               |  27 ++--
 io_uring/rsrc.h               |   2 +-
 io_uring/zcrx.c               | 260 +++++++++++++++++++++++++++-------
 io_uring/zcrx.h               |  18 ++-
 5 files changed, 248 insertions(+), 65 deletions(-)

-- 
2.48.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH io_uring 1/5] io_uring/zcrx: improve area validation
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 2/5] io_uring/zcrx: resolve netdev before area creation Pavel Begunkov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
  To: io-uring
  Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

A dmabuf-backed area will take an offset instead of an address, and
io_buffer_validate() is not flexible enough to accommodate that. It
also takes an iovec, which may truncate the u64 length zcrx takes. Add
a new helper function for validation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/rsrc.c | 27 +++++++++++++++------------
 io_uring/rsrc.h |  2 +-
 io_uring/zcrx.c |  7 +++----
 3 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index b4c5f3ee8855..1657d775c8ba 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -80,10 +80,21 @@ static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
 	return 0;
 }
 
-int io_buffer_validate(struct iovec *iov)
+int io_validate_user_buf_range(u64 uaddr, u64 ulen)
 {
-	unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1);
+	unsigned long tmp, base = (unsigned long)uaddr;
+	unsigned long acct_len = (unsigned long)PAGE_ALIGN(ulen);
 
+	/* arbitrary limit, but we need something */
+	if (ulen > SZ_1G || !ulen)
+		return -EFAULT;
+	if (check_add_overflow(base, acct_len, &tmp))
+		return -EOVERFLOW;
+	return 0;
+}
+
+static int io_buffer_validate(struct iovec *iov)
+{
 	/*
 	 * Don't impose further limits on the size and buffer
 	 * constraints here, we'll -EINVAL later when IO is
@@ -91,17 +102,9 @@ int io_buffer_validate(struct iovec *iov)
 	 */
 	if (!iov->iov_base)
 		return iov->iov_len ? -EFAULT : 0;
-	if (!iov->iov_len)
-		return -EFAULT;
-
-	/* arbitrary limit, but we need something */
-	if (iov->iov_len > SZ_1G)
-		return -EFAULT;
 
-	if (check_add_overflow((unsigned long)iov->iov_base, acct_len, &tmp))
-		return -EOVERFLOW;
-
-	return 0;
+	return io_validate_user_buf_range((unsigned long)iov->iov_base,
+					  iov->iov_len);
 }
 
 static void io_release_ubuf(void *priv)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 6008ad2e6d9e..2818aa0d0472 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -83,7 +83,7 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned size, unsigned type);
 int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
 			unsigned int size, unsigned int type);
-int io_buffer_validate(struct iovec *iov);
+int io_validate_user_buf_range(u64 uaddr, u64 ulen);
 
 bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
 			      struct io_imu_folio_data *data);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 22f420d6fbb9..5e918587fdc5 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -209,7 +209,6 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
 {
 	struct io_zcrx_area *area;
 	int i, ret, nr_pages, nr_iovs;
-	struct iovec iov;
 
 	if (area_reg->flags || area_reg->rq_area_token)
 		return -EINVAL;
@@ -218,11 +217,11 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
 	if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
 		return -EINVAL;
 
-	iov.iov_base = u64_to_user_ptr(area_reg->addr);
-	iov.iov_len = area_reg->len;
-	ret = io_buffer_validate(&iov);
+	ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
 	if (ret)
 		return ret;
+	if (!area_reg->addr)
+		return -EFAULT;
 
 	ret = -ENOMEM;
 	area = kzalloc(sizeof(*area), GFP_KERNEL);
-- 
2.48.1



* [PATCH io_uring 2/5] io_uring/zcrx: resolve netdev before area creation
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 1/5] io_uring/zcrx: improve area validation Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 3/5] io_uring/zcrx: split out memory holders from area Pavel Begunkov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
  To: io-uring
  Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

Some area types will require a valid struct device in order to be
created, so resolve the netdev and its struct device before creating
an area.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 5e918587fdc5..b5335dd4f5b1 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -395,6 +395,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	ifq = io_zcrx_ifq_alloc(ctx);
 	if (!ifq)
 		return -ENOMEM;
+	ifq->rq_entries = reg.rq_entries;
 
 	scoped_guard(mutex, &ctx->mmap_lock) {
 		/* preallocate id */
@@ -407,24 +408,24 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
 	if (ret)
 		goto err;
 
-	ret = io_zcrx_create_area(ifq, &ifq->area, &area);
-	if (ret)
-		goto err;
-
-	ifq->rq_entries = reg.rq_entries;
-
-	ret = -ENODEV;
 	ifq->netdev = netdev_get_by_index(current->nsproxy->net_ns, reg.if_idx,
 					  &ifq->netdev_tracker, GFP_KERNEL);
-	if (!ifq->netdev)
+	if (!ifq->netdev) {
+		ret = -ENODEV;
 		goto err;
+	}
 
 	ifq->dev = ifq->netdev->dev.parent;
-	ret = -EOPNOTSUPP;
-	if (!ifq->dev)
+	if (!ifq->dev) {
+		ret = -EOPNOTSUPP;
 		goto err;
+	}
 	get_device(ifq->dev);
 
+	ret = io_zcrx_create_area(ifq, &ifq->area, &area);
+	if (ret)
+		goto err;
+
 	mp_param.mp_ops = &io_uring_pp_zc_ops;
 	mp_param.mp_priv = ifq;
 	ret = net_mp_open_rxq(ifq->netdev, reg.if_rxq, &mp_param);
-- 
2.48.1



* [PATCH io_uring 3/5] io_uring/zcrx: split out memory holders from area
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 1/5] io_uring/zcrx: improve area validation Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 2/5] io_uring/zcrx: resolve netdev before area creation Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 4/5] io_uring/zcrx: split common area map/unmap parts Pavel Begunkov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
  To: io-uring
  Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

In the data path, users of struct io_zcrx_area don't need to know what
kind of memory it's backed by. Keep only the generic bits in the area
structure and split the memory-type dependent fields out into a new
structure. This also logically separates the step that actually
imports the memory, e.g. pinning user pages, from the generic area
initialisation.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 71 ++++++++++++++++++++++++++++++++-----------------
 io_uring/zcrx.h | 11 ++++++--
 2 files changed, 56 insertions(+), 26 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index b5335dd4f5b1..8d4cfd957e38 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -26,6 +26,8 @@
 #include "zcrx.h"
 #include "rsrc.h"
 
+#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
 static inline struct io_zcrx_ifq *io_pp_to_ifq(struct page_pool *pp)
 {
 	return pp->mp_priv;
@@ -42,10 +44,43 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
 {
 	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
 
-	return area->pages[net_iov_idx(niov)];
+	return area->mem.pages[net_iov_idx(niov)];
 }
 
-#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+static void io_release_area_mem(struct io_zcrx_mem *mem)
+{
+	if (mem->pages) {
+		unpin_user_pages(mem->pages, mem->nr_folios);
+		kvfree(mem->pages);
+	}
+}
+
+static int io_import_area(struct io_zcrx_ifq *ifq,
+			  struct io_zcrx_mem *mem,
+			  struct io_uring_zcrx_area_reg *area_reg)
+{
+	struct page **pages;
+	int nr_pages;
+	int ret;
+
+	ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
+	if (ret)
+		return ret;
+	if (!area_reg->addr)
+		return -EFAULT;
+	if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
+		return -EINVAL;
+
+	pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
+				   &nr_pages);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
+
+	mem->pages = pages;
+	mem->nr_folios = nr_pages;
+	mem->size = area_reg->len;
+	return 0;
+}
 
 static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
 				 struct io_zcrx_area *area, int nr_mapped)
@@ -84,8 +119,8 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 		struct net_iov *niov = &area->nia.niovs[i];
 		dma_addr_t dma;
 
-		dma = dma_map_page_attrs(ifq->dev, area->pages[i], 0, PAGE_SIZE,
-					 DMA_FROM_DEVICE, IO_DMA_ATTR);
+		dma = dma_map_page_attrs(ifq->dev, area->mem.pages[i], 0,
+					 PAGE_SIZE, DMA_FROM_DEVICE, IO_DMA_ATTR);
 		if (dma_mapping_error(ifq->dev, dma))
 			break;
 		if (net_mp_niov_set_dma_addr(niov, dma)) {
@@ -192,14 +227,11 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
 static void io_zcrx_free_area(struct io_zcrx_area *area)
 {
 	io_zcrx_unmap_area(area->ifq, area);
+	io_release_area_mem(&area->mem);
 
 	kvfree(area->freelist);
 	kvfree(area->nia.niovs);
 	kvfree(area->user_refs);
-	if (area->pages) {
-		unpin_user_pages(area->pages, area->nr_folios);
-		kvfree(area->pages);
-	}
 	kfree(area);
 }
 
@@ -208,36 +240,27 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
 			       struct io_uring_zcrx_area_reg *area_reg)
 {
 	struct io_zcrx_area *area;
-	int i, ret, nr_pages, nr_iovs;
+	unsigned nr_iovs;
+	int i, ret;
 
 	if (area_reg->flags || area_reg->rq_area_token)
 		return -EINVAL;
 	if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1])
 		return -EINVAL;
-	if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
-		return -EINVAL;
-
-	ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
-	if (ret)
-		return ret;
-	if (!area_reg->addr)
-		return -EFAULT;
 
 	ret = -ENOMEM;
 	area = kzalloc(sizeof(*area), GFP_KERNEL);
 	if (!area)
 		goto err;
 
-	area->pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
-				   &nr_pages);
-	if (IS_ERR(area->pages)) {
-		ret = PTR_ERR(area->pages);
-		area->pages = NULL;
+	ret = io_import_area(ifq, &area->mem, area_reg);
+	if (ret)
 		goto err;
-	}
-	area->nr_folios = nr_iovs = nr_pages;
+
+	nr_iovs = area->mem.size >> PAGE_SHIFT;
 	area->nia.num_niovs = nr_iovs;
 
+	ret = -ENOMEM;
 	area->nia.niovs = kvmalloc_array(nr_iovs, sizeof(area->nia.niovs[0]),
 					 GFP_KERNEL | __GFP_ZERO);
 	if (!area->nia.niovs)
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index e3c7c4e647f1..9c22807af807 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -7,6 +7,13 @@
 #include <net/page_pool/types.h>
 #include <net/net_trackers.h>
 
+struct io_zcrx_mem {
+	unsigned long			size;
+
+	struct page			**pages;
+	unsigned long			nr_folios;
+};
+
 struct io_zcrx_area {
 	struct net_iov_area	nia;
 	struct io_zcrx_ifq	*ifq;
@@ -14,13 +21,13 @@ struct io_zcrx_area {
 
 	bool			is_mapped;
 	u16			area_id;
-	struct page		**pages;
-	unsigned long		nr_folios;
 
 	/* freelist */
 	spinlock_t		freelist_lock ____cacheline_aligned_in_smp;
 	u32			free_count;
 	u32			*freelist;
+
+	struct io_zcrx_mem	mem;
 };
 
 struct io_zcrx_ifq {
-- 
2.48.1



* [PATCH io_uring 4/5] io_uring/zcrx: split common area map/unmap parts
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
                   ` (2 preceding siblings ...)
  2025-05-01 12:17 ` [PATCH io_uring 3/5] io_uring/zcrx: split out memory holders from area Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
  2025-05-01 12:17 ` [PATCH io_uring 5/5] io_uring/zcrx: dmabuf backed zerocopy receive Pavel Begunkov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
  To: io-uring
  Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

Extract the area-type dependent parts of io_zcrx_[un]map_area from the
generic path. This will be helpful once there are more area memory
types than just user pages.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 42 +++++++++++++++++++++++++++++-------------
 1 file changed, 29 insertions(+), 13 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 8d4cfd957e38..34b09beba992 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -82,22 +82,31 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
 	return 0;
 }
 
-static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
-				 struct io_zcrx_area *area, int nr_mapped)
+static void io_zcrx_unmap_umem(struct io_zcrx_ifq *ifq,
+				struct io_zcrx_area *area, int nr_mapped)
 {
 	int i;
 
 	for (i = 0; i < nr_mapped; i++) {
-		struct net_iov *niov = &area->nia.niovs[i];
-		dma_addr_t dma;
+		netmem_ref netmem = net_iov_to_netmem(&area->nia.niovs[i]);
+		dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem);
 
-		dma = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov));
 		dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
 				     DMA_FROM_DEVICE, IO_DMA_ATTR);
-		net_mp_niov_set_dma_addr(niov, 0);
 	}
 }
 
+static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
+				 struct io_zcrx_area *area, int nr_mapped)
+{
+	int i;
+
+	io_zcrx_unmap_umem(ifq, area, nr_mapped);
+
+	for (i = 0; i < area->nia.num_niovs; i++)
+		net_mp_niov_set_dma_addr(&area->nia.niovs[i], 0);
+}
+
 static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 {
 	guard(mutex)(&ifq->dma_lock);
@@ -107,14 +116,10 @@ static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *are
 	area->is_mapped = false;
 }
 
-static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 {
 	int i;
 
-	guard(mutex)(&ifq->dma_lock);
-	if (area->is_mapped)
-		return 0;
-
 	for (i = 0; i < area->nia.num_niovs; i++) {
 		struct net_iov *niov = &area->nia.niovs[i];
 		dma_addr_t dma;
@@ -129,9 +134,20 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 			break;
 		}
 	}
+	return i;
+}
+
+static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+{
+	unsigned nr;
+
+	guard(mutex)(&ifq->dma_lock);
+	if (area->is_mapped)
+		return 0;
 
-	if (i != area->nia.num_niovs) {
-		__io_zcrx_unmap_area(ifq, area, i);
+	nr = io_zcrx_map_area_umem(ifq, area);
+	if (nr != area->nia.num_niovs) {
+		__io_zcrx_unmap_area(ifq, area, nr);
 		return -EINVAL;
 	}
 
-- 
2.48.1



* [PATCH io_uring 5/5] io_uring/zcrx: dmabuf backed zerocopy receive
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
                   ` (3 preceding siblings ...)
  2025-05-01 12:17 ` [PATCH io_uring 4/5] io_uring/zcrx: split common area map/unmap parts Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
  2025-05-02 15:25 ` [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Jens Axboe
  2025-05-06 14:34 ` Alexey Charkov
  6 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
  To: io-uring
  Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

Add support for dmabuf backed zcrx areas. To use it, the user should
set IORING_ZCRX_AREA_DMABUF in the struct io_uring_zcrx_area_reg flags
field and pass a dmabuf fd in the dmabuf_fd field.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/uapi/linux/io_uring.h |   6 +-
 io_uring/zcrx.c               | 155 ++++++++++++++++++++++++++++++----
 io_uring/zcrx.h               |   7 ++
 3 files changed, 151 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 130f3bc71a69..5ce096090b0c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -990,12 +990,16 @@ struct io_uring_zcrx_offsets {
 	__u64	__resv[2];
 };
 
+enum io_uring_zcrx_area_flags {
+	IORING_ZCRX_AREA_DMABUF		= 1,
+};
+
 struct io_uring_zcrx_area_reg {
 	__u64	addr;
 	__u64	len;
 	__u64	rq_area_token;
 	__u32	flags;
-	__u32	__resv1;
+	__u32	dmabuf_fd;
 	__u64	__resv2[2];
 };
 
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 34b09beba992..fac293bcba72 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -47,30 +47,110 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
 	return area->mem.pages[net_iov_idx(niov)];
 }
 
-static void io_release_area_mem(struct io_zcrx_mem *mem)
+static void io_release_dmabuf(struct io_zcrx_mem *mem)
 {
-	if (mem->pages) {
-		unpin_user_pages(mem->pages, mem->nr_folios);
-		kvfree(mem->pages);
+	if (mem->sgt)
+		dma_buf_unmap_attachment_unlocked(mem->attach, mem->sgt,
+						  DMA_FROM_DEVICE);
+	if (mem->attach)
+		dma_buf_detach(mem->dmabuf, mem->attach);
+	if (mem->dmabuf)
+		dma_buf_put(mem->dmabuf);
+
+	mem->sgt = NULL;
+	mem->attach = NULL;
+	mem->dmabuf = NULL;
+}
+
+static int io_import_dmabuf(struct io_zcrx_ifq *ifq,
+			    struct io_zcrx_mem *mem,
+			    struct io_uring_zcrx_area_reg *area_reg)
+{
+	unsigned long off = (unsigned long)area_reg->addr;
+	unsigned long len = (unsigned long)area_reg->len;
+	unsigned long total_size = 0;
+	struct scatterlist *sg;
+	int dmabuf_fd = area_reg->dmabuf_fd;
+	int i, ret;
+
+	if (WARN_ON_ONCE(!ifq->dev))
+		return -EFAULT;
+
+	mem->is_dmabuf = true;
+	mem->dmabuf = dma_buf_get(dmabuf_fd);
+	if (IS_ERR(mem->dmabuf)) {
+		ret = PTR_ERR(mem->dmabuf);
+		mem->dmabuf = NULL;
+		goto err;
+	}
+
+	mem->attach = dma_buf_attach(mem->dmabuf, ifq->dev);
+	if (IS_ERR(mem->attach)) {
+		ret = PTR_ERR(mem->attach);
+		mem->attach = NULL;
+		goto err;
+	}
+
+	mem->sgt = dma_buf_map_attachment_unlocked(mem->attach, DMA_FROM_DEVICE);
+	if (IS_ERR(mem->sgt)) {
+		ret = PTR_ERR(mem->sgt);
+		mem->sgt = NULL;
+		goto err;
 	}
+
+	for_each_sgtable_dma_sg(mem->sgt, sg, i)
+		total_size += sg_dma_len(sg);
+
+	if (total_size < off + len)
+		return -EINVAL;
+
+	mem->dmabuf_offset = off;
+	mem->size = len;
+	return 0;
+err:
+	io_release_dmabuf(mem);
+	return ret;
 }
 
-static int io_import_area(struct io_zcrx_ifq *ifq,
+static int io_zcrx_map_area_dmabuf(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+{
+	unsigned long off = area->mem.dmabuf_offset;
+	struct scatterlist *sg;
+	unsigned i, niov_idx = 0;
+
+	for_each_sgtable_dma_sg(area->mem.sgt, sg, i) {
+		dma_addr_t dma = sg_dma_address(sg);
+		unsigned long sg_len = sg_dma_len(sg);
+		unsigned long sg_off = min(sg_len, off);
+
+		off -= sg_off;
+		sg_len -= sg_off;
+		dma += sg_off;
+
+		while (sg_len && niov_idx < area->nia.num_niovs) {
+			struct net_iov *niov = &area->nia.niovs[niov_idx];
+
+			if (net_mp_niov_set_dma_addr(niov, dma))
+				return 0;
+			sg_len -= PAGE_SIZE;
+			dma += PAGE_SIZE;
+			niov_idx++;
+		}
+	}
+	return niov_idx;
+}
+
+static int io_import_umem(struct io_zcrx_ifq *ifq,
 			  struct io_zcrx_mem *mem,
 			  struct io_uring_zcrx_area_reg *area_reg)
 {
 	struct page **pages;
 	int nr_pages;
-	int ret;
 
-	ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
-	if (ret)
-		return ret;
+	if (area_reg->dmabuf_fd)
+		return -EINVAL;
 	if (!area_reg->addr)
 		return -EFAULT;
-	if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
-		return -EINVAL;
-
 	pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
 				   &nr_pages);
 	if (IS_ERR(pages))
@@ -82,6 +162,35 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
 	return 0;
 }
 
+static void io_release_area_mem(struct io_zcrx_mem *mem)
+{
+	if (mem->is_dmabuf) {
+		io_release_dmabuf(mem);
+		return;
+	}
+	if (mem->pages) {
+		unpin_user_pages(mem->pages, mem->nr_folios);
+		kvfree(mem->pages);
+	}
+}
+
+static int io_import_area(struct io_zcrx_ifq *ifq,
+			  struct io_zcrx_mem *mem,
+			  struct io_uring_zcrx_area_reg *area_reg)
+{
+	int ret;
+
+	ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
+	if (ret)
+		return ret;
+	if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
+		return -EINVAL;
+
+	if (area_reg->flags & IORING_ZCRX_AREA_DMABUF)
+		return io_import_dmabuf(ifq, mem, area_reg);
+	return io_import_umem(ifq, mem, area_reg);
+}
+
 static void io_zcrx_unmap_umem(struct io_zcrx_ifq *ifq,
 				struct io_zcrx_area *area, int nr_mapped)
 {
@@ -101,7 +210,10 @@ static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
 {
 	int i;
 
-	io_zcrx_unmap_umem(ifq, area, nr_mapped);
+	if (area->mem.is_dmabuf)
+		io_release_dmabuf(&area->mem);
+	else
+		io_zcrx_unmap_umem(ifq, area, nr_mapped);
 
 	for (i = 0; i < area->nia.num_niovs; i++)
 		net_mp_niov_set_dma_addr(&area->nia.niovs[i], 0);
@@ -145,7 +257,11 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 	if (area->is_mapped)
 		return 0;
 
-	nr = io_zcrx_map_area_umem(ifq, area);
+	if (area->mem.is_dmabuf)
+		nr = io_zcrx_map_area_dmabuf(ifq, area);
+	else
+		nr = io_zcrx_map_area_umem(ifq, area);
+
 	if (nr != area->nia.num_niovs) {
 		__io_zcrx_unmap_area(ifq, area, nr);
 		return -EINVAL;
@@ -251,6 +367,8 @@ static void io_zcrx_free_area(struct io_zcrx_area *area)
 	kfree(area);
 }
 
+#define IO_ZCRX_AREA_SUPPORTED_FLAGS	(IORING_ZCRX_AREA_DMABUF)
+
 static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
 			       struct io_zcrx_area **res,
 			       struct io_uring_zcrx_area_reg *area_reg)
@@ -259,9 +377,11 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
 	unsigned nr_iovs;
 	int i, ret;
 
-	if (area_reg->flags || area_reg->rq_area_token)
+	if (area_reg->flags & ~IO_ZCRX_AREA_SUPPORTED_FLAGS)
+		return -EINVAL;
+	if (area_reg->rq_area_token)
 		return -EINVAL;
-	if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1])
+	if (area_reg->__resv2[0] || area_reg->__resv2[1])
 		return -EINVAL;
 
 	ret = -ENOMEM;
@@ -819,6 +939,9 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 	size_t copied = 0;
 	int ret = 0;
 
+	if (area->mem.is_dmabuf)
+		return -EFAULT;
+
 	while (len) {
 		size_t copy_size = min_t(size_t, PAGE_SIZE, len);
 		const int dst_off = 0;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 9c22807af807..2f5e26389f22 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -3,15 +3,22 @@
 #define IOU_ZC_RX_H
 
 #include <linux/io_uring_types.h>
+#include <linux/dma-buf.h>
 #include <linux/socket.h>
 #include <net/page_pool/types.h>
 #include <net/net_trackers.h>
 
 struct io_zcrx_mem {
 	unsigned long			size;
+	bool				is_dmabuf;
 
 	struct page			**pages;
 	unsigned long			nr_folios;
+
+	struct dma_buf_attachment	*attach;
+	struct dma_buf			*dmabuf;
+	struct sg_table			*sgt;
+	unsigned long			dmabuf_offset;
 };
 
 struct io_zcrx_area {
-- 
2.48.1



* Re: [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
                   ` (4 preceding siblings ...)
  2025-05-01 12:17 ` [PATCH io_uring 5/5] io_uring/zcrx: dmabuf backed zerocopy receive Pavel Begunkov
@ 2025-05-02 15:25 ` Jens Axboe
  2025-05-06 14:34 ` Alexey Charkov
  6 siblings, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2025-05-02 15:25 UTC (permalink / raw)
  To: io-uring, Pavel Begunkov
  Cc: David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira


On Thu, 01 May 2025 13:17:13 +0100, Pavel Begunkov wrote:
> Currently, io_uring zcrx uses regular user pages to populate the
> area for page pools, this series allows the user to pass a dmabuf
> instead.
> 
> Patches 1-4 are preparatory and do code shuffling. All dmabuf
> touching changes are in the last patch. A basic example can be
> found at:
> 
> [...]

Applied, thanks!

[1/5] io_uring/zcrx: improve area validation
      commit: d760d3f59f0d8d0df2895db30d36cf23106d6b05
[2/5] io_uring/zcrx: resolve netdev before area creation
      commit: 6c9589aa08471f8984cdb5e743d2a2c048dc2403
[3/5] io_uring/zcrx: split out memory holders from area
      commit: 782dfa329ac9d1b5ca7b6df56a7696bac58cb829
[4/5] io_uring/zcrx: split common area map/unmap parts
      commit: 8a62804248fff77749048a0f5511649b2569bba9
[5/5] io_uring/zcrx: dmabuf backed zerocopy receive
      commit: a42c735833315bbe7a54243ef5453b9a7fa0c248

Best regards,
-- 
Jens Axboe





* Re: [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
  2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
                   ` (5 preceding siblings ...)
  2025-05-02 15:25 ` [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Jens Axboe
@ 2025-05-06 14:34 ` Alexey Charkov
  2025-05-06 15:32   ` Pavel Begunkov
  6 siblings, 1 reply; 9+ messages in thread
From: Alexey Charkov @ 2025-05-06 14:34 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: io-uring, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

On Tue, May 6, 2025 at 6:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Currently, io_uring zcrx uses regular user pages to populate the
> area for page pools, this series allows the user to pass a dmabuf
> instead.
>
> Patches 1-4 are preparatory and do code shuffling. All dmabuf
> touching changes are in the last patch. A basic example can be
> found at:
>
> https://github.com/isilence/liburing/tree/zcrx-dmabuf
> https://github.com/isilence/liburing.git zcrx-dmabuf
>
> Pavel Begunkov (5):
>   io_uring/zcrx: improve area validation
>   io_uring/zcrx: resolve netdev before area creation
>   io_uring/zcrx: split out memory holders from area
>   io_uring/zcrx: split common area map/unmap parts
>   io_uring/zcrx: dmabuf backed zerocopy receive
>
>  include/uapi/linux/io_uring.h |   6 +-
>  io_uring/rsrc.c               |  27 ++--
>  io_uring/rsrc.h               |   2 +-
>  io_uring/zcrx.c               | 260 +++++++++++++++++++++++++++-------
>  io_uring/zcrx.h               |  18 ++-
>  5 files changed, 248 insertions(+), 65 deletions(-)

Hi Pavel,

Looks like another "depends" line might be needed in io_uring/Kconfig:

diff --git a/io_uring/Kconfig b/io_uring/Kconfig
index 4b949c42c0bf..9fa2cf502940 100644
--- a/io_uring/Kconfig
+++ b/io_uring/Kconfig
@@ -9,3 +9,4 @@ config IO_URING_ZCRX
        depends on PAGE_POOL
        depends on INET
        depends on NET_RX_BUSY_POLL
+       depends on DMA_SHARED_BUFFER

Otherwise I'm having trouble compiling the next-20250506 kernel for
VT8500, which doesn't select DMA_BUF by default. The following linking
error appears at the very end:

armv7a-unknown-linux-gnueabihf-ld: io_uring/zcrx.o: in function
`io_release_dmabuf':
zcrx.c:(.text+0x1c): undefined reference to `dma_buf_unmap_attachment_unlocked'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x30): undefined
reference to `dma_buf_detach'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x40): undefined
reference to `dma_buf_put'
armv7a-unknown-linux-gnueabihf-ld: io_uring/zcrx.o: in function
`io_register_zcrx_ifq':
zcrx.c:(.text+0x15cc): undefined reference to `dma_buf_get'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x15e8): undefined
reference to `dma_buf_attach'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x1604): undefined
reference to `dma_buf_map_attachment_unlocked'
make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux] Error 1
make[1]: *** [/home/alchark/linux/Makefile:1242: vmlinux] Error 2
make: *** [Makefile:248: __sub-make] Error 2

Best regards,
Alexey


* Re: [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
  2025-05-06 14:34 ` Alexey Charkov
@ 2025-05-06 15:32   ` Pavel Begunkov
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-06 15:32 UTC (permalink / raw)
  To: Alexey Charkov
  Cc: io-uring, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
	Victor Nogueira

On 5/6/25 15:34, Alexey Charkov wrote:
> On Tue, May 6, 2025 at 6:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> Currently, io_uring zcrx uses regular user pages to populate the
>> area for page pools, this series allows the user to pass a dmabuf
>> instead.
>>
>> Patches 1-4 are preparatory and do code shuffling. All dmabuf
>> touching changes are in the last patch. A basic example can be
>> found at:
>>
>> https://github.com/isilence/liburing/tree/zcrx-dmabuf
>> https://github.com/isilence/liburing.git zcrx-dmabuf
>>
>> Pavel Begunkov (5):
>>    io_uring/zcrx: improve area validation
>>    io_uring/zcrx: resolve netdev before area creation
>>    io_uring/zcrx: split out memory holders from area
>>    io_uring/zcrx: split common area map/unmap parts
>>    io_uring/zcrx: dmabuf backed zerocopy receive
>>
>>   include/uapi/linux/io_uring.h |   6 +-
>>   io_uring/rsrc.c               |  27 ++--
>>   io_uring/rsrc.h               |   2 +-
>>   io_uring/zcrx.c               | 260 +++++++++++++++++++++++++++-------
>>   io_uring/zcrx.h               |  18 ++-
>>   5 files changed, 248 insertions(+), 65 deletions(-)
> 
> Hi Pavel,
> 
> Looks like another "depends" line might be needed in io_uring/Kconfig:

Ah yes, thanks for letting me know, I'll patch it up. dmabuf is
optional here, so fwiw I'm not going to gate the entire API on
that.

-- 
Pavel Begunkov


