public inbox for io-uring@vger.kernel.org
* [PATCH v4 0/6] zcrx huge pages support Vol 1
@ 2025-07-02 14:29 Pavel Begunkov
  2025-07-02 14:29 ` [PATCH v4 1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk Pavel Begunkov
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

Use sgtable as the common format between dmabuf and user pages, deduplicate
dma address propagation handling, and use it to optimise dma mappings.
It also prepares the code for larger NIC page sizes.
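
As a rough sketch of where the series lands (illustrative only, with made-up
function names; the real code is in io_uring/zcrx.c and the diffs below):
pinned user pages are packed into an sg_table, DMA-mapped with a single
dma_map_sgtable() call, and the DMA-mapped entries are then walked in
PAGE_SIZE steps to hand out per-niov addresses.

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/* Illustrative sketch only, not the zcrx code itself. */
static int zcrx_sketch_map_umem(struct device *dev, struct page **pages,
				unsigned int nr_pages, struct sg_table *sgt)
{
	struct scatterlist *sg;
	unsigned int i;
	int ret;

	/* one sg_table describing all pinned user pages */
	ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
					(unsigned long)nr_pages << PAGE_SHIFT,
					GFP_KERNEL);
	if (ret)
		return ret;

	/* one mapping call instead of a dma_map_page() per niov */
	ret = dma_map_sgtable(dev, sgt, DMA_FROM_DEVICE, 0);
	if (ret) {
		sg_free_table(sgt);
		return ret;
	}

	/* hand out dma addresses in PAGE_SIZE chunks, one per niov */
	for_each_sgtable_dma_sg(sgt, sg, i) {
		dma_addr_t dma = sg_dma_address(sg);
		unsigned long len = sg_dma_len(sg);

		while (len >= PAGE_SIZE) {
			/* net_mp_niov_set_dma_addr(niov, dma) goes here */
			dma += PAGE_SIZE;
			len -= PAGE_SIZE;
		}
	}
	return 0;
}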

v4: further restrict the kmap fallback length; v3 isn't generic
    enough to cover future uses.

v3: truncate kmap'ed length by both pages

v2: Don't coalesce into folios, just use sg_alloc_table_from_pages()
    for now. Coalescing will be brought back later.

    Improve some fallback copy code in Patch 1, and add a helper in
    Patch 6 for working with larger pages, which also allows getting
    rid of skb_frag_foreach_page.

    Switch the copy fallback helpers back to pages instead of folios,
    as the latter wouldn't be correct in all cases.

Pavel Begunkov (6):
  io_uring/zcrx: always pass page to io_zcrx_copy_chunk
  io_uring/zcrx: return error from io_zcrx_map_area_*
  io_uring/zcrx: introduce io_populate_area_dma
  io_uring/zcrx: allocate sgtable for umem areas
  io_uring/zcrx: assert area type in io_zcrx_iov_page
  io_uring/zcrx: prepare fallback for larger pages

 io_uring/zcrx.c | 241 +++++++++++++++++++++++++-----------------------
 io_uring/zcrx.h |   1 +
 2 files changed, 128 insertions(+), 114 deletions(-)

-- 
2.49.0



* [PATCH v4 1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
@ 2025-07-02 14:29 ` Pavel Begunkov
  2025-07-02 22:32   ` David Wei
  2025-07-02 14:29 ` [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_* Pavel Begunkov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

io_zcrx_copy_chunk() currently takes either a page or a virtual address.
Unify the parameters: make it take pages and resolve the linear part
into a page the same way the general networking code does.
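
For reference, the conversion this relies on (a sketch of what the hunk
below does for the skb head; the helper name here is made up): the linear
area lives in the kernel's direct mapping, so it can be resolved into a
page plus an in-page offset and fed through the same page-based copy path
as the frags.

#include <linux/mm.h>
#include <linux/skbuff.h>

/* Illustrative only: resolve a linear skb head address to page + offset. */
static void sketch_head_to_page(const struct sk_buff *skb, unsigned int offset,
				struct page **page, unsigned int *page_off)
{
	*page = virt_to_page(skb->data);
	*page_off = offset_in_page(skb->data) + offset;
}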

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 797247a34cb7..99a253c1c6c5 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -943,8 +943,8 @@ static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
 }
 
 static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
-				  void *src_base, struct page *src_page,
-				  unsigned int src_offset, size_t len)
+				  struct page *src_page, unsigned int src_offset,
+				  size_t len)
 {
 	struct io_zcrx_area *area = ifq->area;
 	size_t copied = 0;
@@ -958,7 +958,7 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 		const int dst_off = 0;
 		struct net_iov *niov;
 		struct page *dst_page;
-		void *dst_addr;
+		void *dst_addr, *src_addr;
 
 		niov = io_zcrx_alloc_fallback(area);
 		if (!niov) {
@@ -968,13 +968,11 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 
 		dst_page = io_zcrx_iov_page(niov);
 		dst_addr = kmap_local_page(dst_page);
-		if (src_page)
-			src_base = kmap_local_page(src_page);
+		src_addr = kmap_local_page(src_page);
 
-		memcpy(dst_addr, src_base + src_offset, copy_size);
+		memcpy(dst_addr, src_addr + src_offset, copy_size);
 
-		if (src_page)
-			kunmap_local(src_base);
+		kunmap_local(src_addr);
 		kunmap_local(dst_addr);
 
 		if (!io_zcrx_queue_cqe(req, niov, ifq, dst_off, copy_size)) {
@@ -1003,7 +1001,7 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 
 	skb_frag_foreach_page(frag, off, len,
 			      page, p_off, p_len, t) {
-		ret = io_zcrx_copy_chunk(req, ifq, NULL, page, p_off, p_len);
+		ret = io_zcrx_copy_chunk(req, ifq, page, p_off, p_len);
 		if (ret < 0)
 			return copied ? copied : ret;
 		copied += ret;
@@ -1065,8 +1063,9 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
 		size_t to_copy;
 
 		to_copy = min_t(size_t, skb_headlen(skb) - offset, len);
-		copied = io_zcrx_copy_chunk(req, ifq, skb->data, NULL,
-					    offset, to_copy);
+		copied = io_zcrx_copy_chunk(req, ifq, virt_to_page(skb->data),
+					    offset_in_page(skb->data) + offset,
+					    to_copy);
 		if (copied < 0) {
 			ret = copied;
 			goto out;
-- 
2.49.0



* [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_*
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
  2025-07-02 14:29 ` [PATCH v4 1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk Pavel Begunkov
@ 2025-07-02 14:29 ` Pavel Begunkov
  2025-07-02 22:27   ` David Wei
  2025-07-02 14:29 ` [PATCH v4 3/6] io_uring/zcrx: introduce io_populate_area_dma Pavel Begunkov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

The io_zcrx_map_area_*() helpers return the number of processed niovs,
which we use to unwind some of the mappings for user memory areas. That's
unwieldy, and dmabuf doesn't care about it. Return an error code instead
and move the partial unmapping on failure into io_zcrx_map_area_umem().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 99a253c1c6c5..2cde88988260 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -139,13 +139,13 @@ static int io_zcrx_map_area_dmabuf(struct io_zcrx_ifq *ifq, struct io_zcrx_area
 			struct net_iov *niov = &area->nia.niovs[niov_idx];
 
 			if (net_mp_niov_set_dma_addr(niov, dma))
-				return 0;
+				return -EFAULT;
 			sg_len -= PAGE_SIZE;
 			dma += PAGE_SIZE;
 			niov_idx++;
 		}
 	}
-	return niov_idx;
+	return 0;
 }
 
 static int io_import_umem(struct io_zcrx_ifq *ifq,
@@ -254,29 +254,30 @@ static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *a
 			break;
 		}
 	}
-	return i;
+
+	if (i != area->nia.num_niovs) {
+		__io_zcrx_unmap_area(ifq, area, i);
+		return -EINVAL;
+	}
+	return 0;
 }
 
 static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 {
-	unsigned nr;
+	int ret;
 
 	guard(mutex)(&ifq->dma_lock);
 	if (area->is_mapped)
 		return 0;
 
 	if (area->mem.is_dmabuf)
-		nr = io_zcrx_map_area_dmabuf(ifq, area);
+		ret = io_zcrx_map_area_dmabuf(ifq, area);
 	else
-		nr = io_zcrx_map_area_umem(ifq, area);
+		ret = io_zcrx_map_area_umem(ifq, area);
 
-	if (nr != area->nia.num_niovs) {
-		__io_zcrx_unmap_area(ifq, area, nr);
-		return -EINVAL;
-	}
-
-	area->is_mapped = true;
-	return 0;
+	if (ret == 0)
+		area->is_mapped = true;
+	return ret;
 }
 
 static void io_zcrx_sync_for_device(const struct page_pool *pool,
-- 
2.49.0



* [PATCH v4 3/6] io_uring/zcrx: introduce io_populate_area_dma
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
  2025-07-02 14:29 ` [PATCH v4 1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk Pavel Begunkov
  2025-07-02 14:29 ` [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_* Pavel Begunkov
@ 2025-07-02 14:29 ` Pavel Begunkov
  2025-07-02 23:13   ` David Wei
  2025-07-02 14:29 ` [PATCH v4 4/6] io_uring/zcrx: allocate sgtable for umem areas Pavel Begunkov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

Add a helper that initialises page-pool dma addresses from an sg table.
It'll be reused in the following patches.
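
For context, by the end of the series the helper ends up with two callers;
condensed here from the mapping functions touched by this patch and patch 4:

	/* dmabuf-backed area (this patch): honour the registration offset */
	return io_populate_area_dma(ifq, area, area->mem.sgt,
				    area->mem.dmabuf_offset);

	/* umem-backed area (patch 4): the table starts at the area, so offset 0 */
	return io_populate_area_dma(ifq, area, &area->mem.page_sg_table, 0);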

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 56 +++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 2cde88988260..cef0763010a0 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -47,6 +47,35 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
 	return area->mem.pages[net_iov_idx(niov)];
 }
 
+static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
+				struct io_zcrx_area *area,
+				struct sg_table *sgt, unsigned long off)
+{
+	struct scatterlist *sg;
+	unsigned i, niov_idx = 0;
+
+	for_each_sgtable_dma_sg(sgt, sg, i) {
+		dma_addr_t dma = sg_dma_address(sg);
+		unsigned long sg_len = sg_dma_len(sg);
+		unsigned long sg_off = min(sg_len, off);
+
+		off -= sg_off;
+		sg_len -= sg_off;
+		dma += sg_off;
+
+		while (sg_len && niov_idx < area->nia.num_niovs) {
+			struct net_iov *niov = &area->nia.niovs[niov_idx];
+
+			if (net_mp_niov_set_dma_addr(niov, dma))
+				return -EFAULT;
+			sg_len -= PAGE_SIZE;
+			dma += PAGE_SIZE;
+			niov_idx++;
+		}
+	}
+	return 0;
+}
+
 static void io_release_dmabuf(struct io_zcrx_mem *mem)
 {
 	if (!IS_ENABLED(CONFIG_DMA_SHARED_BUFFER))
@@ -119,33 +148,10 @@ static int io_import_dmabuf(struct io_zcrx_ifq *ifq,
 
 static int io_zcrx_map_area_dmabuf(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 {
-	unsigned long off = area->mem.dmabuf_offset;
-	struct scatterlist *sg;
-	unsigned i, niov_idx = 0;
-
 	if (!IS_ENABLED(CONFIG_DMA_SHARED_BUFFER))
 		return -EINVAL;
-
-	for_each_sgtable_dma_sg(area->mem.sgt, sg, i) {
-		dma_addr_t dma = sg_dma_address(sg);
-		unsigned long sg_len = sg_dma_len(sg);
-		unsigned long sg_off = min(sg_len, off);
-
-		off -= sg_off;
-		sg_len -= sg_off;
-		dma += sg_off;
-
-		while (sg_len && niov_idx < area->nia.num_niovs) {
-			struct net_iov *niov = &area->nia.niovs[niov_idx];
-
-			if (net_mp_niov_set_dma_addr(niov, dma))
-				return -EFAULT;
-			sg_len -= PAGE_SIZE;
-			dma += PAGE_SIZE;
-			niov_idx++;
-		}
-	}
-	return 0;
+	return io_populate_area_dma(ifq, area, area->mem.sgt,
+				    area->mem.dmabuf_offset);
 }
 
 static int io_import_umem(struct io_zcrx_ifq *ifq,
-- 
2.49.0



* [PATCH v4 4/6] io_uring/zcrx: allocate sgtable for umem areas
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
                   ` (2 preceding siblings ...)
  2025-07-02 14:29 ` [PATCH v4 3/6] io_uring/zcrx: introduce io_populate_area_dma Pavel Begunkov
@ 2025-07-02 14:29 ` Pavel Begunkov
  2025-07-02 23:16   ` David Wei
  2025-07-02 14:29 ` [PATCH v4 5/6] io_uring/zcrx: assert area type in io_zcrx_iov_page Pavel Begunkov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

Currently, dma addresses for umem areas are stored directly in niovs.
It's memory efficient but inconvenient. I need a better format: 1) to
share code with dmabuf areas, and 2) to disentangle page, folio and
niov sizes. dmabuf already provides an sg_table, so create one for user
memory as well.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 78 +++++++++++++++++--------------------------------
 io_uring/zcrx.h |  1 +
 2 files changed, 28 insertions(+), 51 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index cef0763010a0..fbcec06a1fb0 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -159,7 +159,7 @@ static int io_import_umem(struct io_zcrx_ifq *ifq,
 			  struct io_uring_zcrx_area_reg *area_reg)
 {
 	struct page **pages;
-	int nr_pages;
+	int nr_pages, ret;
 
 	if (area_reg->dmabuf_fd)
 		return -EINVAL;
@@ -170,6 +170,12 @@ static int io_import_umem(struct io_zcrx_ifq *ifq,
 	if (IS_ERR(pages))
 		return PTR_ERR(pages);
 
+	ret = sg_alloc_table_from_pages(&mem->page_sg_table, pages, nr_pages,
+					0, nr_pages << PAGE_SHIFT,
+					GFP_KERNEL_ACCOUNT);
+	if (ret)
+		return ret;
+
 	mem->pages = pages;
 	mem->nr_folios = nr_pages;
 	mem->size = area_reg->len;
@@ -184,6 +190,7 @@ static void io_release_area_mem(struct io_zcrx_mem *mem)
 	}
 	if (mem->pages) {
 		unpin_user_pages(mem->pages, mem->nr_folios);
+		sg_free_table(&mem->page_sg_table);
 		kvfree(mem->pages);
 	}
 }
@@ -205,67 +212,36 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
 	return io_import_umem(ifq, mem, area_reg);
 }
 
-static void io_zcrx_unmap_umem(struct io_zcrx_ifq *ifq,
-				struct io_zcrx_area *area, int nr_mapped)
-{
-	int i;
-
-	for (i = 0; i < nr_mapped; i++) {
-		netmem_ref netmem = net_iov_to_netmem(&area->nia.niovs[i]);
-		dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem);
-
-		dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
-				     DMA_FROM_DEVICE, IO_DMA_ATTR);
-	}
-}
-
-static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
-				 struct io_zcrx_area *area, int nr_mapped)
+static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
+				struct io_zcrx_area *area)
 {
 	int i;
 
-	if (area->mem.is_dmabuf)
-		io_release_dmabuf(&area->mem);
-	else
-		io_zcrx_unmap_umem(ifq, area, nr_mapped);
+	guard(mutex)(&ifq->dma_lock);
+	if (!area->is_mapped)
+		return;
+	area->is_mapped = false;
 
 	for (i = 0; i < area->nia.num_niovs; i++)
 		net_mp_niov_set_dma_addr(&area->nia.niovs[i], 0);
-}
-
-static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
-{
-	guard(mutex)(&ifq->dma_lock);
 
-	if (area->is_mapped)
-		__io_zcrx_unmap_area(ifq, area, area->nia.num_niovs);
-	area->is_mapped = false;
+	if (area->mem.is_dmabuf) {
+		io_release_dmabuf(&area->mem);
+	} else {
+		dma_unmap_sgtable(ifq->dev, &area->mem.page_sg_table,
+				  DMA_FROM_DEVICE, IO_DMA_ATTR);
+	}
 }
 
-static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+static unsigned io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
 {
-	int i;
-
-	for (i = 0; i < area->nia.num_niovs; i++) {
-		struct net_iov *niov = &area->nia.niovs[i];
-		dma_addr_t dma;
-
-		dma = dma_map_page_attrs(ifq->dev, area->mem.pages[i], 0,
-					 PAGE_SIZE, DMA_FROM_DEVICE, IO_DMA_ATTR);
-		if (dma_mapping_error(ifq->dev, dma))
-			break;
-		if (net_mp_niov_set_dma_addr(niov, dma)) {
-			dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
-					     DMA_FROM_DEVICE, IO_DMA_ATTR);
-			break;
-		}
-	}
+	int ret;
 
-	if (i != area->nia.num_niovs) {
-		__io_zcrx_unmap_area(ifq, area, i);
-		return -EINVAL;
-	}
-	return 0;
+	ret = dma_map_sgtable(ifq->dev, &area->mem.page_sg_table,
+				DMA_FROM_DEVICE, IO_DMA_ATTR);
+	if (ret < 0)
+		return ret;
+	return io_populate_area_dma(ifq, area, &area->mem.page_sg_table, 0);
 }
 
 static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 2f5e26389f22..89015b923911 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -14,6 +14,7 @@ struct io_zcrx_mem {
 
 	struct page			**pages;
 	unsigned long			nr_folios;
+	struct sg_table			page_sg_table;
 
 	struct dma_buf_attachment	*attach;
 	struct dma_buf			*dmabuf;
-- 
2.49.0



* [PATCH v4 5/6] io_uring/zcrx: assert area type in io_zcrx_iov_page
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
                   ` (3 preceding siblings ...)
  2025-07-02 14:29 ` [PATCH v4 4/6] io_uring/zcrx: allocate sgtable for umem areas Pavel Begunkov
@ 2025-07-02 14:29 ` Pavel Begunkov
  2025-07-02 14:29 ` [PATCH v4 6/6] io_uring/zcrx: prepare fallback for larger pages Pavel Begunkov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

Add a simple debug assertion to io_zcrx_iov_page() making sure it's not
trying to return pages for a dmabuf area.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index fbcec06a1fb0..fcc7550aa0fa 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -44,6 +44,8 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
 {
 	struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
 
+	lockdep_assert(!area->mem.is_dmabuf);
+
 	return area->mem.pages[net_iov_idx(niov)];
 }
 
-- 
2.49.0



* [PATCH v4 6/6] io_uring/zcrx: prepare fallback for larger pages
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
                   ` (4 preceding siblings ...)
  2025-07-02 14:29 ` [PATCH v4 5/6] io_uring/zcrx: assert area type in io_zcrx_iov_page Pavel Begunkov
@ 2025-07-02 14:29 ` Pavel Begunkov
  2025-07-02 23:12 ` [PATCH v4 0/6] zcrx huge pages support Vol 1 David Wei
  2025-07-08 18:00 ` Jens Axboe
  7 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 14:29 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence

io_zcrx_copy_chunk() processes one page at a time, which won't be
sufficient when the net_iov size grows. Introduce a structure keeping
the target niov page and other parameters; it's more convenient and can
be reused later. Also add a helper function that can efficiently copy
buffers of arbitrary length. For 64-bit archs the loop inside should be
compiled out.
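
As an illustration with made-up numbers (4K pages, partial-kmap case):
copying len = 1000 bytes starting at src_offset = 4000 into a fresh niov
page is split as n = min(4096 - 4000, 4096 - 0, 1000) = 96 bytes first,
then the remaining 904 bytes from the start of the next source page; when
partial kmap isn't needed, all 1000 bytes go through a single memcpy().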

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/zcrx.c | 83 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 56 insertions(+), 27 deletions(-)

diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index fcc7550aa0fa..8777b90a46f3 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -927,6 +927,51 @@ static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
 	return niov;
 }
 
+struct io_copy_cache {
+	struct page		*page;
+	unsigned long		offset;
+	size_t			size;
+};
+
+static ssize_t io_copy_page(struct io_copy_cache *cc, struct page *src_page,
+			    unsigned int src_offset, size_t len)
+{
+	size_t copied = 0;
+
+	len = min(len, cc->size);
+
+	while (len) {
+		void *src_addr, *dst_addr;
+		struct page *dst_page = cc->page;
+		unsigned dst_offset = cc->offset;
+		size_t n = len;
+
+		if (folio_test_partial_kmap(page_folio(dst_page)) ||
+		    folio_test_partial_kmap(page_folio(src_page))) {
+			dst_page = nth_page(dst_page, dst_offset / PAGE_SIZE);
+			dst_offset = offset_in_page(dst_offset);
+			src_page = nth_page(src_page, src_offset / PAGE_SIZE);
+			src_offset = offset_in_page(src_offset);
+			n = min(PAGE_SIZE - src_offset, PAGE_SIZE - dst_offset);
+			n = min(n, len);
+		}
+
+		dst_addr = kmap_local_page(dst_page) + dst_offset;
+		src_addr = kmap_local_page(src_page) + src_offset;
+
+		memcpy(dst_addr, src_addr, n);
+
+		kunmap_local(src_addr);
+		kunmap_local(dst_addr);
+
+		cc->size -= n;
+		cc->offset += n;
+		len -= n;
+		copied += n;
+	}
+	return copied;
+}
+
 static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 				  struct page *src_page, unsigned int src_offset,
 				  size_t len)
@@ -939,11 +984,9 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 		return -EFAULT;
 
 	while (len) {
-		size_t copy_size = min_t(size_t, PAGE_SIZE, len);
-		const int dst_off = 0;
+		struct io_copy_cache cc;
 		struct net_iov *niov;
-		struct page *dst_page;
-		void *dst_addr, *src_addr;
+		size_t n;
 
 		niov = io_zcrx_alloc_fallback(area);
 		if (!niov) {
@@ -951,25 +994,22 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 			break;
 		}
 
-		dst_page = io_zcrx_iov_page(niov);
-		dst_addr = kmap_local_page(dst_page);
-		src_addr = kmap_local_page(src_page);
-
-		memcpy(dst_addr, src_addr + src_offset, copy_size);
+		cc.page = io_zcrx_iov_page(niov);
+		cc.offset = 0;
+		cc.size = PAGE_SIZE;
 
-		kunmap_local(src_addr);
-		kunmap_local(dst_addr);
+		n = io_copy_page(&cc, src_page, src_offset, len);
 
-		if (!io_zcrx_queue_cqe(req, niov, ifq, dst_off, copy_size)) {
+		if (!io_zcrx_queue_cqe(req, niov, ifq, 0, n)) {
 			io_zcrx_return_niov(niov);
 			ret = -ENOSPC;
 			break;
 		}
 
 		io_zcrx_get_niov_uref(niov);
-		src_offset += copy_size;
-		len -= copy_size;
-		copied += copy_size;
+		src_offset += n;
+		len -= n;
+		copied += n;
 	}
 
 	return copied ? copied : ret;
@@ -979,19 +1019,8 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
 			     const skb_frag_t *frag, int off, int len)
 {
 	struct page *page = skb_frag_page(frag);
-	u32 p_off, p_len, t, copied = 0;
-	int ret = 0;
 
-	off += skb_frag_off(frag);
-
-	skb_frag_foreach_page(frag, off, len,
-			      page, p_off, p_len, t) {
-		ret = io_zcrx_copy_chunk(req, ifq, page, p_off, p_len);
-		if (ret < 0)
-			return copied ? copied : ret;
-		copied += ret;
-	}
-	return copied;
+	return io_zcrx_copy_chunk(req, ifq, page, off + skb_frag_off(frag), len);
 }
 
 static int io_zcrx_recv_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
-- 
2.49.0



* Re: [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_*
  2025-07-02 14:29 ` [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_* Pavel Begunkov
@ 2025-07-02 22:27   ` David Wei
  2025-07-02 22:50     ` Pavel Begunkov
  0 siblings, 1 reply; 16+ messages in thread
From: David Wei @ 2025-07-02 22:27 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-07-02 07:29, Pavel Begunkov wrote:
> The io_zcrx_map_area_*() helpers return the number of processed niovs,
> which we use to unwind some of the mappings for user memory areas. That's
> unwieldy, and dmabuf doesn't care about it. Return an error code instead
> and move the partial unmapping on failure into io_zcrx_map_area_umem().
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/zcrx.c | 27 ++++++++++++++-------------
>   1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 99a253c1c6c5..2cde88988260 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c

...

> @@ -254,29 +254,30 @@ static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *a
>   			break;
>   		}
>   	}
> -	return i;
> +
> +	if (i != area->nia.num_niovs) {
> +		__io_zcrx_unmap_area(ifq, area, i);
> +		return -EINVAL;
> +	}
> +	return 0;
>   }

Does io_release_dmabuf() still need to be called in
io_zcrx_map_area_dmabuf() in case of failure, if say
net_mp_niov_set_dma_addr() fails in the loop?


* Re: [PATCH v4 1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk
  2025-07-02 14:29 ` [PATCH v4 1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk Pavel Begunkov
@ 2025-07-02 22:32   ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-07-02 22:32 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-07-02 07:29, Pavel Begunkov wrote:
> io_zcrx_copy_chunk() currently takes either a page or a virtual address.
> Unify the parameters: make it take pages and resolve the linear part
> into a page the same way the general networking code does.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/zcrx.c | 21 ++++++++++-----------
>   1 file changed, 10 insertions(+), 11 deletions(-)
> 
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 797247a34cb7..99a253c1c6c5 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -943,8 +943,8 @@ static struct net_iov *io_zcrx_alloc_fallback(struct io_zcrx_area *area)
>   }
>   
>   static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
> -				  void *src_base, struct page *src_page,
> -				  unsigned int src_offset, size_t len)
> +				  struct page *src_page, unsigned int src_offset,
> +				  size_t len)
>   {
>   	struct io_zcrx_area *area = ifq->area;
>   	size_t copied = 0;
> @@ -958,7 +958,7 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
>   		const int dst_off = 0;
>   		struct net_iov *niov;
>   		struct page *dst_page;
> -		void *dst_addr;
> +		void *dst_addr, *src_addr;
>   
>   		niov = io_zcrx_alloc_fallback(area);
>   		if (!niov) {
> @@ -968,13 +968,11 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
>   
>   		dst_page = io_zcrx_iov_page(niov);
>   		dst_addr = kmap_local_page(dst_page);
> -		if (src_page)
> -			src_base = kmap_local_page(src_page);
> +		src_addr = kmap_local_page(src_page);
>   
> -		memcpy(dst_addr, src_base + src_offset, copy_size);
> +		memcpy(dst_addr, src_addr + src_offset, copy_size);
>   
> -		if (src_page)
> -			kunmap_local(src_base);
> +		kunmap_local(src_addr);
>   		kunmap_local(dst_addr);
>   
>   		if (!io_zcrx_queue_cqe(req, niov, ifq, dst_off, copy_size)) {
> @@ -1003,7 +1001,7 @@ static int io_zcrx_copy_frag(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
>   
>   	skb_frag_foreach_page(frag, off, len,
>   			      page, p_off, p_len, t) {
> -		ret = io_zcrx_copy_chunk(req, ifq, NULL, page, p_off, p_len);
> +		ret = io_zcrx_copy_chunk(req, ifq, page, p_off, p_len);
>   		if (ret < 0)
>   			return copied ? copied : ret;
>   		copied += ret;
> @@ -1065,8 +1063,9 @@ io_zcrx_recv_skb(read_descriptor_t *desc, struct sk_buff *skb,
>   		size_t to_copy;
>   
>   		to_copy = min_t(size_t, skb_headlen(skb) - offset, len);
> -		copied = io_zcrx_copy_chunk(req, ifq, skb->data, NULL,
> -					    offset, to_copy);
> +		copied = io_zcrx_copy_chunk(req, ifq, virt_to_page(skb->data),
> +					    offset_in_page(skb->data) + offset,
> +					    to_copy);
>   		if (copied < 0) {
>   			ret = copied;
>   			goto out;

Looks fine, mechanical changes.

Reviewed-by: David Wei <dw@davidwei.uk>


* Re: [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_*
  2025-07-02 22:27   ` David Wei
@ 2025-07-02 22:50     ` Pavel Begunkov
  2025-07-02 23:20       ` David Wei
  0 siblings, 1 reply; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 22:50 UTC (permalink / raw)
  To: David Wei, io-uring

On 7/2/25 23:27, David Wei wrote:
> On 2025-07-02 07:29, Pavel Begunkov wrote:
>> The io_zcrx_map_area_*() helpers return the number of processed niovs,
>> which we use to unwind some of the mappings for user memory areas. That's
>> unwieldy, and dmabuf doesn't care about it. Return an error code instead
>> and move the partial unmapping on failure into io_zcrx_map_area_umem().
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>   io_uring/zcrx.c | 27 ++++++++++++++-------------
>>   1 file changed, 14 insertions(+), 13 deletions(-)
>>
>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>> index 99a253c1c6c5..2cde88988260 100644
>> --- a/io_uring/zcrx.c
>> +++ b/io_uring/zcrx.c
> 
> ...
> 
>> @@ -254,29 +254,30 @@ static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *a
>>               break;
>>           }
>>       }
>> -    return i;
>> +
>> +    if (i != area->nia.num_niovs) {
>> +        __io_zcrx_unmap_area(ifq, area, i);
>> +        return -EINVAL;
>> +    }
>> +    return 0;
>>   }
> 
> Does io_release_dmabuf() still need to be called in
> io_zcrx_map_area_dmabuf() in case of failure, if say
> net_mp_niov_set_dma_addr() fails in the loop?

It doesn't. The function only sets niov addresses, and it's fine
to leave garbage on failure. In contrast, the umem version sets
up dma mappings and needs to unmap if there is an error.

-- 
Pavel Begunkov



* Re: [PATCH v4 0/6] zcrx huge pages support Vol 1
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
                   ` (5 preceding siblings ...)
  2025-07-02 14:29 ` [PATCH v4 6/6] io_uring/zcrx: prepare fallback for larger pages Pavel Begunkov
@ 2025-07-02 23:12 ` David Wei
  2025-07-02 23:51   ` Pavel Begunkov
  2025-07-08 18:00 ` Jens Axboe
  7 siblings, 1 reply; 16+ messages in thread
From: David Wei @ 2025-07-02 23:12 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-07-02 07:29, Pavel Begunkov wrote:
> Use sgtable as the common format between dmabuf and user pages, deduplicate
> dma address propagation handling, and use it to optimise dma mappings.
> It also prepares the code for larger NIC page sizes.
> 
> v4: further restrict the kmap fallback length; v3 isn't generic
>      enough to cover future uses.
> 
> v3: truncate kmap'ed length by both pages
> 
> v2: Don't coalesce into folios, just use sg_alloc_table_from_pages()
>      for now. Coalescing will be brought back later.
> 
>      Improve some fallback copy code in Patch 1, and add a helper in
>      Patch 6 for working with larger pages, which also allows getting
>      rid of skb_frag_foreach_page.
> 
>      Switch the copy fallback helpers back to pages instead of folios,
>      as the latter wouldn't be correct in all cases.
> 
> Pavel Begunkov (6):
>    io_uring/zcrx: always pass page to io_zcrx_copy_chunk
>    io_uring/zcrx: return error from io_zcrx_map_area_*
>    io_uring/zcrx: introduce io_populate_area_dma
>    io_uring/zcrx: allocate sgtable for umem areas
>    io_uring/zcrx: assert area type in io_zcrx_iov_page
>    io_uring/zcrx: prepare fallback for larger pages
> 
>   io_uring/zcrx.c | 241 +++++++++++++++++++++++++-----------------------
>   io_uring/zcrx.h |   1 +
>   2 files changed, 128 insertions(+), 114 deletions(-)
> 

What did you use to test this patch series or will that come with vol 2?


* Re: [PATCH v4 3/6] io_uring/zcrx: introduce io_populate_area_dma
  2025-07-02 14:29 ` [PATCH v4 3/6] io_uring/zcrx: introduce io_populate_area_dma Pavel Begunkov
@ 2025-07-02 23:13   ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-07-02 23:13 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-07-02 07:29, Pavel Begunkov wrote:
> Add a helper that initialises page-pool dma addresses from an sg table.
> It'll be reused in the following patches.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/zcrx.c | 56 +++++++++++++++++++++++++++----------------------
>   1 file changed, 31 insertions(+), 25 deletions(-)
> 
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index 2cde88988260..cef0763010a0 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -47,6 +47,35 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
>   	return area->mem.pages[net_iov_idx(niov)];
>   }
>   
> +static int io_populate_area_dma(struct io_zcrx_ifq *ifq,
> +				struct io_zcrx_area *area,
> +				struct sg_table *sgt, unsigned long off)
> +{
> +	struct scatterlist *sg;
> +	unsigned i, niov_idx = 0;
> +
> +	for_each_sgtable_dma_sg(sgt, sg, i) {
> +		dma_addr_t dma = sg_dma_address(sg);
> +		unsigned long sg_len = sg_dma_len(sg);
> +		unsigned long sg_off = min(sg_len, off);
> +
> +		off -= sg_off;
> +		sg_len -= sg_off;
> +		dma += sg_off;
> +
> +		while (sg_len && niov_idx < area->nia.num_niovs) {
> +			struct net_iov *niov = &area->nia.niovs[niov_idx];
> +
> +			if (net_mp_niov_set_dma_addr(niov, dma))
> +				return -EFAULT;
> +			sg_len -= PAGE_SIZE;
> +			dma += PAGE_SIZE;
> +			niov_idx++;
> +		}
> +	}
> +	return 0;
> +}
> +
>   static void io_release_dmabuf(struct io_zcrx_mem *mem)
>   {
>   	if (!IS_ENABLED(CONFIG_DMA_SHARED_BUFFER))
> @@ -119,33 +148,10 @@ static int io_import_dmabuf(struct io_zcrx_ifq *ifq,
>   
>   static int io_zcrx_map_area_dmabuf(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
>   {
> -	unsigned long off = area->mem.dmabuf_offset;
> -	struct scatterlist *sg;
> -	unsigned i, niov_idx = 0;
> -
>   	if (!IS_ENABLED(CONFIG_DMA_SHARED_BUFFER))
>   		return -EINVAL;
> -
> -	for_each_sgtable_dma_sg(area->mem.sgt, sg, i) {
> -		dma_addr_t dma = sg_dma_address(sg);
> -		unsigned long sg_len = sg_dma_len(sg);
> -		unsigned long sg_off = min(sg_len, off);
> -
> -		off -= sg_off;
> -		sg_len -= sg_off;
> -		dma += sg_off;
> -
> -		while (sg_len && niov_idx < area->nia.num_niovs) {
> -			struct net_iov *niov = &area->nia.niovs[niov_idx];
> -
> -			if (net_mp_niov_set_dma_addr(niov, dma))
> -				return -EFAULT;
> -			sg_len -= PAGE_SIZE;
> -			dma += PAGE_SIZE;
> -			niov_idx++;
> -		}
> -	}
> -	return 0;
> +	return io_populate_area_dma(ifq, area, area->mem.sgt,
> +				    area->mem.dmabuf_offset);
>   }
>   
>   static int io_import_umem(struct io_zcrx_ifq *ifq,

Reviewed-by: David Wei <dw@davidwei.uk>


* Re: [PATCH v4 4/6] io_uring/zcrx: allocate sgtable for umem areas
  2025-07-02 14:29 ` [PATCH v4 4/6] io_uring/zcrx: allocate sgtable for umem areas Pavel Begunkov
@ 2025-07-02 23:16   ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-07-02 23:16 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-07-02 07:29, Pavel Begunkov wrote:
> Currently, dma addresses for umem areas are stored directly in niovs.
> It's memory efficient but inconvenient. I need a better format: 1) to
> share code with dmabuf areas, and 2) to disentangle page, folio and
> niov sizes. dmabuf already provides an sg_table, so create one for user
> memory as well.
> 
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>   io_uring/zcrx.c | 78 +++++++++++++++++--------------------------------
>   io_uring/zcrx.h |  1 +
>   2 files changed, 28 insertions(+), 51 deletions(-)
> 
> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
> index cef0763010a0..fbcec06a1fb0 100644
> --- a/io_uring/zcrx.c
> +++ b/io_uring/zcrx.c
> @@ -159,7 +159,7 @@ static int io_import_umem(struct io_zcrx_ifq *ifq,
>   			  struct io_uring_zcrx_area_reg *area_reg)
>   {
>   	struct page **pages;
> -	int nr_pages;
> +	int nr_pages, ret;
>   
>   	if (area_reg->dmabuf_fd)
>   		return -EINVAL;
> @@ -170,6 +170,12 @@ static int io_import_umem(struct io_zcrx_ifq *ifq,
>   	if (IS_ERR(pages))
>   		return PTR_ERR(pages);
>   
> +	ret = sg_alloc_table_from_pages(&mem->page_sg_table, pages, nr_pages,
> +					0, nr_pages << PAGE_SHIFT,
> +					GFP_KERNEL_ACCOUNT);
> +	if (ret)
> +		return ret;
> +
>   	mem->pages = pages;
>   	mem->nr_folios = nr_pages;
>   	mem->size = area_reg->len;
> @@ -184,6 +190,7 @@ static void io_release_area_mem(struct io_zcrx_mem *mem)
>   	}
>   	if (mem->pages) {
>   		unpin_user_pages(mem->pages, mem->nr_folios);
> +		sg_free_table(&mem->page_sg_table);
>   		kvfree(mem->pages);
>   	}
>   }
> @@ -205,67 +212,36 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
>   	return io_import_umem(ifq, mem, area_reg);
>   }
>   
> -static void io_zcrx_unmap_umem(struct io_zcrx_ifq *ifq,
> -				struct io_zcrx_area *area, int nr_mapped)
> -{
> -	int i;
> -
> -	for (i = 0; i < nr_mapped; i++) {
> -		netmem_ref netmem = net_iov_to_netmem(&area->nia.niovs[i]);
> -		dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem);
> -
> -		dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
> -				     DMA_FROM_DEVICE, IO_DMA_ATTR);
> -	}
> -}
> -
> -static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
> -				 struct io_zcrx_area *area, int nr_mapped)
> +static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
> +				struct io_zcrx_area *area)
>   {
>   	int i;
>   
> -	if (area->mem.is_dmabuf)
> -		io_release_dmabuf(&area->mem);
> -	else
> -		io_zcrx_unmap_umem(ifq, area, nr_mapped);
> +	guard(mutex)(&ifq->dma_lock);
> +	if (!area->is_mapped)
> +		return;
> +	area->is_mapped = false;
>   
>   	for (i = 0; i < area->nia.num_niovs; i++)
>   		net_mp_niov_set_dma_addr(&area->nia.niovs[i], 0);
> -}
> -
> -static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
> -{
> -	guard(mutex)(&ifq->dma_lock);
>   
> -	if (area->is_mapped)
> -		__io_zcrx_unmap_area(ifq, area, area->nia.num_niovs);
> -	area->is_mapped = false;
> +	if (area->mem.is_dmabuf) {
> +		io_release_dmabuf(&area->mem);
> +	} else {
> +		dma_unmap_sgtable(ifq->dev, &area->mem.page_sg_table,
> +				  DMA_FROM_DEVICE, IO_DMA_ATTR);
> +	}
>   }
>   
> -static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
> +static unsigned io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
>   {
> -	int i;
> -
> -	for (i = 0; i < area->nia.num_niovs; i++) {
> -		struct net_iov *niov = &area->nia.niovs[i];
> -		dma_addr_t dma;
> -
> -		dma = dma_map_page_attrs(ifq->dev, area->mem.pages[i], 0,
> -					 PAGE_SIZE, DMA_FROM_DEVICE, IO_DMA_ATTR);
> -		if (dma_mapping_error(ifq->dev, dma))
> -			break;
> -		if (net_mp_niov_set_dma_addr(niov, dma)) {
> -			dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
> -					     DMA_FROM_DEVICE, IO_DMA_ATTR);
> -			break;
> -		}
> -	}
> +	int ret;
>   
> -	if (i != area->nia.num_niovs) {
> -		__io_zcrx_unmap_area(ifq, area, i);
> -		return -EINVAL;
> -	}
> -	return 0;
> +	ret = dma_map_sgtable(ifq->dev, &area->mem.page_sg_table,
> +				DMA_FROM_DEVICE, IO_DMA_ATTR);
> +	if (ret < 0)
> +		return ret;
> +	return io_populate_area_dma(ifq, area, &area->mem.page_sg_table, 0);
>   }
>   
>   static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
> diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
> index 2f5e26389f22..89015b923911 100644
> --- a/io_uring/zcrx.h
> +++ b/io_uring/zcrx.h
> @@ -14,6 +14,7 @@ struct io_zcrx_mem {
>   
>   	struct page			**pages;
>   	unsigned long			nr_folios;
> +	struct sg_table			page_sg_table;
>   
>   	struct dma_buf_attachment	*attach;
>   	struct dma_buf			*dmabuf;

I looked at dma_map_sgtable() (but not too closely) and it seems to be
equivalent to the hand-rolled code being deleted.

Reviewed-by: David Wei <dw@davidwei.uk>


* Re: [PATCH v4 2/6] io_uring/zcrx: return error from io_zcrx_map_area_*
  2025-07-02 22:50     ` Pavel Begunkov
@ 2025-07-02 23:20       ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-07-02 23:20 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2025-07-02 15:50, Pavel Begunkov wrote:
> On 7/2/25 23:27, David Wei wrote:
>> On 2025-07-02 07:29, Pavel Begunkov wrote:
>>> The io_zcrx_map_area_*() helpers return the number of processed niovs,
>>> which we use to unwind some of the mappings for user memory areas. That's
>>> unwieldy, and dmabuf doesn't care about it. Return an error code instead
>>> and move the partial unmapping on failure into io_zcrx_map_area_umem().
>>>
>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>> ---
>>>   io_uring/zcrx.c | 27 ++++++++++++++-------------
>>>   1 file changed, 14 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
>>> index 99a253c1c6c5..2cde88988260 100644
>>> --- a/io_uring/zcrx.c
>>> +++ b/io_uring/zcrx.c
>>
>> ...
>>
>>> @@ -254,29 +254,30 @@ static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *a
>>>               break;
>>>           }
>>>       }
>>> -    return i;
>>> +
>>> +    if (i != area->nia.num_niovs) {
>>> +        __io_zcrx_unmap_area(ifq, area, i);
>>> +        return -EINVAL;
>>> +    }
>>> +    return 0;
>>>   }
>>
>> Does io_release_dmabuf() still need to be called in
>> io_zcrx_map_area_dmabuf() in case of failure, if say
>> net_mp_niov_set_dma_addr() fails in the loop?
> 
> It doesn't. The function only sets niov addresses, and it's fine
> to leave garbage on failure. In contrast, the umem version sets
> up dma mappings and needs to unmap if there is an error.
> 

Yeah, I see. And it's replaced with sg_table helpers anyway in patch 4.

Reviewed-by: David Wei <dw@davidwei.uk>


* Re: [PATCH v4 0/6] zcrx huge pages support Vol 1
  2025-07-02 23:12 ` [PATCH v4 0/6] zcrx huge pages support Vol 1 David Wei
@ 2025-07-02 23:51   ` Pavel Begunkov
  0 siblings, 0 replies; 16+ messages in thread
From: Pavel Begunkov @ 2025-07-02 23:51 UTC (permalink / raw)
  To: David Wei, io-uring

On 7/3/25 00:12, David Wei wrote:
> On 2025-07-02 07:29, Pavel Begunkov wrote:
>> Use sgtable as the common format between dmabuf and user pages, deduplicate
>> dma address propagation handling, and use it to optimise dma mappings.
>> It also prepares the code for larger NIC page sizes.
>>
>> v4: further restrict the kmap fallback length; v3 isn't generic
>>      enough to cover future uses.
>>
>> v3: truncate kmap'ed length by both pages
>>
>> v2: Don't coalesce into folios, just use sg_alloc_table_from_pages()
>>      for now. Coalescing will be brought back later.
>>
>>      Improve some fallback copy code in Patch 1, and add a helper in
>>      Patch 6 for working with larger pages, which also allows getting
>>      rid of skb_frag_foreach_page.
>>
>>      Switch the copy fallback helpers back to pages instead of folios,
>>      as the latter wouldn't be correct in all cases.
>>
>> Pavel Begunkov (6):
>>    io_uring/zcrx: always pass page to io_zcrx_copy_chunk
>>    io_uring/zcrx: return error from io_zcrx_map_area_*
>>    io_uring/zcrx: introduce io_populate_area_dma
>>    io_uring/zcrx: allocate sgtable for umem areas
>>    io_uring/zcrx: assert area type in io_zcrx_iov_page
>>    io_uring/zcrx: prepare fallback for larger pages
>>
>>   io_uring/zcrx.c | 241 +++++++++++++++++++++++++-----------------------
>>   io_uring/zcrx.h |   1 +
>>   2 files changed, 128 insertions(+), 114 deletions(-)
>>
> 
> What did you use to test this patch series or will that come with vol 2?

I used my liburing example with different area types but didn't
do anything in terms of benchmarking. This series only improves the
iommu case, and I don't care too much about that.

-- 
Pavel Begunkov



* Re: [PATCH v4 0/6] zcrx huge pages support Vol 1
  2025-07-02 14:29 [PATCH v4 0/6] zcrx huge pages support Vol 1 Pavel Begunkov
                   ` (6 preceding siblings ...)
  2025-07-02 23:12 ` [PATCH v4 0/6] zcrx huge pages support Vol 1 David Wei
@ 2025-07-08 18:00 ` Jens Axboe
  7 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2025-07-08 18:00 UTC (permalink / raw)
  To: io-uring, Pavel Begunkov


On Wed, 02 Jul 2025 15:29:03 +0100, Pavel Begunkov wrote:
> Use sgtable as the common format between dmabuf and user pages, deduplicate
> dma address propagation handling, and use it to optimise dma mappings.
> It also prepares the code for larger NIC page sizes.
> 
> v4: further restrict the kmap fallback length; v3 isn't generic
>     enough to cover future uses.
> 
> [...]

Applied, thanks!

[1/6] io_uring/zcrx: always pass page to io_zcrx_copy_chunk
      commit: e9a9ddb15b092eb4dc0d34a3e043e73f2510a6b0
[2/6] io_uring/zcrx: return error from io_zcrx_map_area_*
      commit: 06897ddfc523cea415bd139148c5276b8b61b016
[3/6] io_uring/zcrx: introduce io_populate_area_dma
      commit: 54e89a93ef05d1a7c9996ff12e42eeecb4f66697
[4/6] io_uring/zcrx: allocate sgtable for umem areas
      commit: b84621d96ee0221e0bfbf9f477bbec7a5077c464
[5/6] io_uring/zcrx: assert area type in io_zcrx_iov_page
      commit: 1b4dc1ff0a8887c2fbb83a48e87284375ab4b02a
[6/6] io_uring/zcrx: prepare fallback for larger pages
      commit: e67645bb7f3f48e0dd794ca813ede75f61e1b31b

Best regards,
-- 
Jens Axboe




