* [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
@ 2025-05-01 12:17 Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 1/5] io_uring/zcrx: improve area validation Pavel Begunkov
` (6 more replies)
0 siblings, 7 replies; 9+ messages in thread
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
To: io-uring
Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
Currently, io_uring zcrx uses regular user pages to populate the
area for page pools; this series allows the user to pass a dmabuf
instead.
Patches 1-4 are preparatory and do code shuffling. All dmabuf
touching changes are in the last patch. A basic example can be
found at:
https://github.com/isilence/liburing/tree/zcrx-dmabuf
https://github.com/isilence/liburing.git zcrx-dmabuf
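For reference, a rough userspace sketch of what an area registration looks like with this series applied. The struct layout mirrors the uAPI change in patch 5; the fd value and the helper function are illustrative only, not liburing API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the uAPI struct after patch 5; the real definition
 * lives in <linux/io_uring.h>. */
struct io_uring_zcrx_area_reg {
	uint64_t addr;		/* for dmabuf areas: offset into the buffer */
	uint64_t len;
	uint64_t rq_area_token;
	uint32_t flags;
	uint32_t dmabuf_fd;
	uint64_t __resv2[2];
};

#define IORING_ZCRX_AREA_DMABUF 1

/* Hypothetical helper: fill a registration request for a dmabuf-backed
 * area. The fd is assumed to come from a dmabuf exporter (e.g. a GPU
 * or NIC driver). */
static void prep_dmabuf_area(struct io_uring_zcrx_area_reg *reg,
			     int dmabuf_fd, uint64_t off, uint64_t len)
{
	memset(reg, 0, sizeof(*reg));
	reg->addr = off;	/* an offset, not a user address */
	reg->len = len;
	reg->flags = IORING_ZCRX_AREA_DMABUF;
	reg->dmabuf_fd = (uint32_t)dmabuf_fd;
}
```

The filled struct would then be passed through the usual zcrx ifq registration path, as the liburing branch above demonstrates.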
Pavel Begunkov (5):
io_uring/zcrx: improve area validation
io_uring/zcrx: resolve netdev before area creation
io_uring/zcrx: split out memory holders from area
io_uring/zcrx: split common area map/unmap parts
io_uring/zcrx: dmabuf backed zerocopy receive
include/uapi/linux/io_uring.h | 6 +-
io_uring/rsrc.c | 27 ++--
io_uring/rsrc.h | 2 +-
io_uring/zcrx.c | 260 +++++++++++++++++++++++++++-------
io_uring/zcrx.h | 18 ++-
5 files changed, 248 insertions(+), 65 deletions(-)
--
2.48.1
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH io_uring 1/5] io_uring/zcrx: improve area validation
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 2/5] io_uring/zcrx: resolve netdev before area creation Pavel Begunkov
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
To: io-uring
Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
A dmabuf-backed area will take an offset instead of addresses, and
io_buffer_validate() is not flexible enough to accommodate that. It also
takes an iovec, which may truncate the u64 length zcrx takes. Add a new
helper function for validation.
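The checks the new helper performs can be mirrored in plain C. This is a userland sketch for illustration, not the kernel code; the numeric returns stand in for -EFAULT and -EOVERFLOW:

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define SZ_1G (1024ULL * 1024 * 1024)

/* Sketch of the range validation: the length must be non-zero and at
 * most 1G (arbitrary kernel limit), and base plus the page-aligned
 * length must not wrap around the address space. */
static int validate_buf_range(uint64_t uaddr, uint64_t ulen)
{
	/* round the accounted length up to a page boundary */
	uint64_t acct_len = (ulen + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

	if (ulen > SZ_1G || !ulen)
		return -14;	/* -EFAULT */
	if (uaddr + acct_len < uaddr)	/* unsigned wrap == overflow */
		return -75;	/* -EOVERFLOW */
	return 0;
}
```

Unlike the old io_buffer_validate(), the length parameter stays u64 throughout, so nothing is truncated on 32-bit where iov_len is a 32-bit size_t.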
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/rsrc.c | 27 +++++++++++++++------------
io_uring/rsrc.h | 2 +-
io_uring/zcrx.c | 7 +++----
3 files changed, 19 insertions(+), 17 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index b4c5f3ee8855..1657d775c8ba 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -80,10 +80,21 @@ static int io_account_mem(struct io_ring_ctx *ctx, unsigned long nr_pages)
return 0;
}
-int io_buffer_validate(struct iovec *iov)
+int io_validate_user_buf_range(u64 uaddr, u64 ulen)
{
- unsigned long tmp, acct_len = iov->iov_len + (PAGE_SIZE - 1);
+ unsigned long tmp, base = (unsigned long)uaddr;
+ unsigned long acct_len = (unsigned long)PAGE_ALIGN(ulen);
+ /* arbitrary limit, but we need something */
+ if (ulen > SZ_1G || !ulen)
+ return -EFAULT;
+ if (check_add_overflow(base, acct_len, &tmp))
+ return -EOVERFLOW;
+ return 0;
+}
+
+static int io_buffer_validate(struct iovec *iov)
+{
/*
* Don't impose further limits on the size and buffer
* constraints here, we'll -EINVAL later when IO is
@@ -91,17 +102,9 @@ int io_buffer_validate(struct iovec *iov)
*/
if (!iov->iov_base)
return iov->iov_len ? -EFAULT : 0;
- if (!iov->iov_len)
- return -EFAULT;
-
- /* arbitrary limit, but we need something */
- if (iov->iov_len > SZ_1G)
- return -EFAULT;
- if (check_add_overflow((unsigned long)iov->iov_base, acct_len, &tmp))
- return -EOVERFLOW;
-
- return 0;
+ return io_validate_user_buf_range((unsigned long)iov->iov_base,
+ iov->iov_len);
}
static void io_release_ubuf(void *priv)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 6008ad2e6d9e..2818aa0d0472 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -83,7 +83,7 @@ int io_register_rsrc_update(struct io_ring_ctx *ctx, void __user *arg,
unsigned size, unsigned type);
int io_register_rsrc(struct io_ring_ctx *ctx, void __user *arg,
unsigned int size, unsigned int type);
-int io_buffer_validate(struct iovec *iov);
+int io_validate_user_buf_range(u64 uaddr, u64 ulen);
bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
struct io_imu_folio_data *data);
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 22f420d6fbb9..5e918587fdc5 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -209,7 +209,6 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
{
struct io_zcrx_area *area;
int i, ret, nr_pages, nr_iovs;
- struct iovec iov;
if (area_reg->flags || area_reg->rq_area_token)
return -EINVAL;
@@ -218,11 +217,11 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
return -EINVAL;
- iov.iov_base = u64_to_user_ptr(area_reg->addr);
- iov.iov_len = area_reg->len;
- ret = io_buffer_validate(&iov);
+ ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
if (ret)
return ret;
+ if (!area_reg->addr)
+ return -EFAULT;
ret = -ENOMEM;
area = kzalloc(sizeof(*area), GFP_KERNEL);
--
2.48.1
* [PATCH io_uring 2/5] io_uring/zcrx: resolve netdev before area creation
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 1/5] io_uring/zcrx: improve area validation Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 3/5] io_uring/zcrx: split out memory holders from area Pavel Begunkov
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
To: io-uring
Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
Some area types will require a valid struct device in order to be
created, so resolve the netdev and its struct device before creating an
area.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 5e918587fdc5..b5335dd4f5b1 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -395,6 +395,7 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
ifq = io_zcrx_ifq_alloc(ctx);
if (!ifq)
return -ENOMEM;
+ ifq->rq_entries = reg.rq_entries;
scoped_guard(mutex, &ctx->mmap_lock) {
/* preallocate id */
@@ -407,24 +408,24 @@ int io_register_zcrx_ifq(struct io_ring_ctx *ctx,
if (ret)
goto err;
- ret = io_zcrx_create_area(ifq, &ifq->area, &area);
- if (ret)
- goto err;
-
- ifq->rq_entries = reg.rq_entries;
-
- ret = -ENODEV;
ifq->netdev = netdev_get_by_index(current->nsproxy->net_ns, reg.if_idx,
&ifq->netdev_tracker, GFP_KERNEL);
- if (!ifq->netdev)
+ if (!ifq->netdev) {
+ ret = -ENODEV;
goto err;
+ }
ifq->dev = ifq->netdev->dev.parent;
- ret = -EOPNOTSUPP;
- if (!ifq->dev)
+ if (!ifq->dev) {
+ ret = -EOPNOTSUPP;
goto err;
+ }
get_device(ifq->dev);
+ ret = io_zcrx_create_area(ifq, &ifq->area, &area);
+ if (ret)
+ goto err;
+
mp_param.mp_ops = &io_uring_pp_zc_ops;
mp_param.mp_priv = ifq;
ret = net_mp_open_rxq(ifq->netdev, reg.if_rxq, &mp_param);
--
2.48.1
* [PATCH io_uring 3/5] io_uring/zcrx: split out memory holders from area
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 1/5] io_uring/zcrx: improve area validation Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 2/5] io_uring/zcrx: resolve netdev before area creation Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 4/5] io_uring/zcrx: split common area map/unmap parts Pavel Begunkov
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
To: io-uring
Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
In the data path, users of struct io_zcrx_area don't need to know what
kind of memory it's backed by. Keep only the generic bits in there and
split out the memory-type-dependent fields into a new structure. This
also logically separates the step that actually imports the memory,
e.g. pinning user pages, from the generic area initialisation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 71 ++++++++++++++++++++++++++++++++-----------------
io_uring/zcrx.h | 11 ++++++--
2 files changed, 56 insertions(+), 26 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index b5335dd4f5b1..8d4cfd957e38 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -26,6 +26,8 @@
#include "zcrx.h"
#include "rsrc.h"
+#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+
static inline struct io_zcrx_ifq *io_pp_to_ifq(struct page_pool *pp)
{
return pp->mp_priv;
@@ -42,10 +44,43 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
{
struct io_zcrx_area *area = io_zcrx_iov_to_area(niov);
- return area->pages[net_iov_idx(niov)];
+ return area->mem.pages[net_iov_idx(niov)];
}
-#define IO_DMA_ATTR (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
+static void io_release_area_mem(struct io_zcrx_mem *mem)
+{
+ if (mem->pages) {
+ unpin_user_pages(mem->pages, mem->nr_folios);
+ kvfree(mem->pages);
+ }
+}
+
+static int io_import_area(struct io_zcrx_ifq *ifq,
+ struct io_zcrx_mem *mem,
+ struct io_uring_zcrx_area_reg *area_reg)
+{
+ struct page **pages;
+ int nr_pages;
+ int ret;
+
+ ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
+ if (ret)
+ return ret;
+ if (!area_reg->addr)
+ return -EFAULT;
+ if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
+ return -EINVAL;
+
+ pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
+ &nr_pages);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
+
+ mem->pages = pages;
+ mem->nr_folios = nr_pages;
+ mem->size = area_reg->len;
+ return 0;
+}
static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
struct io_zcrx_area *area, int nr_mapped)
@@ -84,8 +119,8 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
struct net_iov *niov = &area->nia.niovs[i];
dma_addr_t dma;
- dma = dma_map_page_attrs(ifq->dev, area->pages[i], 0, PAGE_SIZE,
- DMA_FROM_DEVICE, IO_DMA_ATTR);
+ dma = dma_map_page_attrs(ifq->dev, area->mem.pages[i], 0,
+ PAGE_SIZE, DMA_FROM_DEVICE, IO_DMA_ATTR);
if (dma_mapping_error(ifq->dev, dma))
break;
if (net_mp_niov_set_dma_addr(niov, dma)) {
@@ -192,14 +227,11 @@ static void io_free_rbuf_ring(struct io_zcrx_ifq *ifq)
static void io_zcrx_free_area(struct io_zcrx_area *area)
{
io_zcrx_unmap_area(area->ifq, area);
+ io_release_area_mem(&area->mem);
kvfree(area->freelist);
kvfree(area->nia.niovs);
kvfree(area->user_refs);
- if (area->pages) {
- unpin_user_pages(area->pages, area->nr_folios);
- kvfree(area->pages);
- }
kfree(area);
}
@@ -208,36 +240,27 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
struct io_uring_zcrx_area_reg *area_reg)
{
struct io_zcrx_area *area;
- int i, ret, nr_pages, nr_iovs;
+ unsigned nr_iovs;
+ int i, ret;
if (area_reg->flags || area_reg->rq_area_token)
return -EINVAL;
if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1])
return -EINVAL;
- if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
- return -EINVAL;
-
- ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
- if (ret)
- return ret;
- if (!area_reg->addr)
- return -EFAULT;
ret = -ENOMEM;
area = kzalloc(sizeof(*area), GFP_KERNEL);
if (!area)
goto err;
- area->pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
- &nr_pages);
- if (IS_ERR(area->pages)) {
- ret = PTR_ERR(area->pages);
- area->pages = NULL;
+ ret = io_import_area(ifq, &area->mem, area_reg);
+ if (ret)
goto err;
- }
- area->nr_folios = nr_iovs = nr_pages;
+
+ nr_iovs = area->mem.size >> PAGE_SHIFT;
area->nia.num_niovs = nr_iovs;
+ ret = -ENOMEM;
area->nia.niovs = kvmalloc_array(nr_iovs, sizeof(area->nia.niovs[0]),
GFP_KERNEL | __GFP_ZERO);
if (!area->nia.niovs)
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index e3c7c4e647f1..9c22807af807 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -7,6 +7,13 @@
#include <net/page_pool/types.h>
#include <net/net_trackers.h>
+struct io_zcrx_mem {
+ unsigned long size;
+
+ struct page **pages;
+ unsigned long nr_folios;
+};
+
struct io_zcrx_area {
struct net_iov_area nia;
struct io_zcrx_ifq *ifq;
@@ -14,13 +21,13 @@ struct io_zcrx_area {
bool is_mapped;
u16 area_id;
- struct page **pages;
- unsigned long nr_folios;
/* freelist */
spinlock_t freelist_lock ____cacheline_aligned_in_smp;
u32 free_count;
u32 *freelist;
+
+ struct io_zcrx_mem mem;
};
struct io_zcrx_ifq {
--
2.48.1
* [PATCH io_uring 4/5] io_uring/zcrx: split common area map/unmap parts
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 3/5] io_uring/zcrx: split out memory holders from area Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 5/5] io_uring/zcrx: dmabuf backed zerocopy receive Pavel Begunkov
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
To: io-uring
Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
Extract the area-type-dependent parts of io_zcrx_[un]map_area from the
generic path. It'll be helpful once there are more area memory types
than just user pages.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/zcrx.c | 42 +++++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 8d4cfd957e38..34b09beba992 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -82,22 +82,31 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
return 0;
}
-static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
- struct io_zcrx_area *area, int nr_mapped)
+static void io_zcrx_unmap_umem(struct io_zcrx_ifq *ifq,
+ struct io_zcrx_area *area, int nr_mapped)
{
int i;
for (i = 0; i < nr_mapped; i++) {
- struct net_iov *niov = &area->nia.niovs[i];
- dma_addr_t dma;
+ netmem_ref netmem = net_iov_to_netmem(&area->nia.niovs[i]);
+ dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem);
- dma = page_pool_get_dma_addr_netmem(net_iov_to_netmem(niov));
dma_unmap_page_attrs(ifq->dev, dma, PAGE_SIZE,
DMA_FROM_DEVICE, IO_DMA_ATTR);
- net_mp_niov_set_dma_addr(niov, 0);
}
}
+static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
+ struct io_zcrx_area *area, int nr_mapped)
+{
+ int i;
+
+ io_zcrx_unmap_umem(ifq, area, nr_mapped);
+
+ for (i = 0; i < area->nia.num_niovs; i++)
+ net_mp_niov_set_dma_addr(&area->nia.niovs[i], 0);
+}
+
static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
guard(mutex)(&ifq->dma_lock);
@@ -107,14 +116,10 @@ static void io_zcrx_unmap_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *are
area->is_mapped = false;
}
-static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+static int io_zcrx_map_area_umem(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
{
int i;
- guard(mutex)(&ifq->dma_lock);
- if (area->is_mapped)
- return 0;
-
for (i = 0; i < area->nia.num_niovs; i++) {
struct net_iov *niov = &area->nia.niovs[i];
dma_addr_t dma;
@@ -129,9 +134,20 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
break;
}
}
+ return i;
+}
+
+static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+{
+ unsigned nr;
+
+ guard(mutex)(&ifq->dma_lock);
+ if (area->is_mapped)
+ return 0;
- if (i != area->nia.num_niovs) {
- __io_zcrx_unmap_area(ifq, area, i);
+ nr = io_zcrx_map_area_umem(ifq, area);
+ if (nr != area->nia.num_niovs) {
+ __io_zcrx_unmap_area(ifq, area, nr);
return -EINVAL;
}
--
2.48.1
* [PATCH io_uring 5/5] io_uring/zcrx: dmabuf backed zerocopy receive
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 4/5] io_uring/zcrx: split common area map/unmap parts Pavel Begunkov
@ 2025-05-01 12:17 ` Pavel Begunkov
2025-05-02 15:25 ` [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Jens Axboe
2025-05-06 14:34 ` Alexey Charkov
From: Pavel Begunkov @ 2025-05-01 12:17 UTC (permalink / raw)
To: io-uring
Cc: asml.silence, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
Add support for dmabuf backed zcrx areas. To use it, the user should
pass IORING_ZCRX_AREA_DMABUF in the struct io_uring_zcrx_area_reg flags
field and pass a dmabuf fd in the dmabuf_fd field.
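As a rough model of what the mapping step in this patch does, the scatterlist walk can be sketched with arrays standing in for the sg_table and the niov array. Offsets and segment lengths are assumed page-aligned, as the area validation enforces:

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SIZE 4096ULL

/* Illustrative model of io_zcrx_map_area_dmabuf(): walk the DMA
 * segments, skip the user-supplied offset, then hand out one
 * PAGE_SIZE-sized dma address per niov. Returns the number of niovs
 * that received an address. */
static size_t map_dmabuf(const uint64_t *sg_addr, const uint64_t *sg_len,
			 size_t nr_sg, uint64_t off,
			 uint64_t *niov_dma, size_t nr_niovs)
{
	size_t i, niov_idx = 0;

	for (i = 0; i < nr_sg; i++) {
		uint64_t dma = sg_addr[i];
		uint64_t len = sg_len[i];
		/* consume as much of the remaining offset as this
		 * segment covers */
		uint64_t skip = off < len ? off : len;

		off -= skip;
		len -= skip;
		dma += skip;

		while (len && niov_idx < nr_niovs) {
			niov_dma[niov_idx++] = dma;
			dma += PAGE_SIZE;
			len -= PAGE_SIZE;
		}
	}
	return niov_idx;
}
```

If the returned count is short of the niov array, the caller would unwind, matching the `nr != area->nia.num_niovs` failure path in io_zcrx_map_area().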
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 6 +-
io_uring/zcrx.c | 155 ++++++++++++++++++++++++++++++----
io_uring/zcrx.h | 7 ++
3 files changed, 151 insertions(+), 17 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 130f3bc71a69..5ce096090b0c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -990,12 +990,16 @@ struct io_uring_zcrx_offsets {
__u64 __resv[2];
};
+enum io_uring_zcrx_area_flags {
+ IORING_ZCRX_AREA_DMABUF = 1,
+};
+
struct io_uring_zcrx_area_reg {
__u64 addr;
__u64 len;
__u64 rq_area_token;
__u32 flags;
- __u32 __resv1;
+ __u32 dmabuf_fd;
__u64 __resv2[2];
};
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index 34b09beba992..fac293bcba72 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -47,30 +47,110 @@ static inline struct page *io_zcrx_iov_page(const struct net_iov *niov)
return area->mem.pages[net_iov_idx(niov)];
}
-static void io_release_area_mem(struct io_zcrx_mem *mem)
+static void io_release_dmabuf(struct io_zcrx_mem *mem)
{
- if (mem->pages) {
- unpin_user_pages(mem->pages, mem->nr_folios);
- kvfree(mem->pages);
+ if (mem->sgt)
+ dma_buf_unmap_attachment_unlocked(mem->attach, mem->sgt,
+ DMA_FROM_DEVICE);
+ if (mem->attach)
+ dma_buf_detach(mem->dmabuf, mem->attach);
+ if (mem->dmabuf)
+ dma_buf_put(mem->dmabuf);
+
+ mem->sgt = NULL;
+ mem->attach = NULL;
+ mem->dmabuf = NULL;
+}
+
+static int io_import_dmabuf(struct io_zcrx_ifq *ifq,
+ struct io_zcrx_mem *mem,
+ struct io_uring_zcrx_area_reg *area_reg)
+{
+ unsigned long off = (unsigned long)area_reg->addr;
+ unsigned long len = (unsigned long)area_reg->len;
+ unsigned long total_size = 0;
+ struct scatterlist *sg;
+ int dmabuf_fd = area_reg->dmabuf_fd;
+ int i, ret;
+
+ if (WARN_ON_ONCE(!ifq->dev))
+ return -EFAULT;
+
+ mem->is_dmabuf = true;
+ mem->dmabuf = dma_buf_get(dmabuf_fd);
+ if (IS_ERR(mem->dmabuf)) {
+ ret = PTR_ERR(mem->dmabuf);
+ mem->dmabuf = NULL;
+ goto err;
+ }
+
+ mem->attach = dma_buf_attach(mem->dmabuf, ifq->dev);
+ if (IS_ERR(mem->attach)) {
+ ret = PTR_ERR(mem->attach);
+ mem->attach = NULL;
+ goto err;
+ }
+
+ mem->sgt = dma_buf_map_attachment_unlocked(mem->attach, DMA_FROM_DEVICE);
+ if (IS_ERR(mem->sgt)) {
+ ret = PTR_ERR(mem->sgt);
+ mem->sgt = NULL;
+ goto err;
}
+
+ for_each_sgtable_dma_sg(mem->sgt, sg, i)
+ total_size += sg_dma_len(sg);
+
+ if (total_size < off + len)
+ return -EINVAL;
+
+ mem->dmabuf_offset = off;
+ mem->size = len;
+ return 0;
+err:
+ io_release_dmabuf(mem);
+ return ret;
}
-static int io_import_area(struct io_zcrx_ifq *ifq,
+static int io_zcrx_map_area_dmabuf(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
+{
+ unsigned long off = area->mem.dmabuf_offset;
+ struct scatterlist *sg;
+ unsigned i, niov_idx = 0;
+
+ for_each_sgtable_dma_sg(area->mem.sgt, sg, i) {
+ dma_addr_t dma = sg_dma_address(sg);
+ unsigned long sg_len = sg_dma_len(sg);
+ unsigned long sg_off = min(sg_len, off);
+
+ off -= sg_off;
+ sg_len -= sg_off;
+ dma += sg_off;
+
+ while (sg_len && niov_idx < area->nia.num_niovs) {
+ struct net_iov *niov = &area->nia.niovs[niov_idx];
+
+ if (net_mp_niov_set_dma_addr(niov, dma))
+ return 0;
+ sg_len -= PAGE_SIZE;
+ dma += PAGE_SIZE;
+ niov_idx++;
+ }
+ }
+ return niov_idx;
+}
+
+static int io_import_umem(struct io_zcrx_ifq *ifq,
struct io_zcrx_mem *mem,
struct io_uring_zcrx_area_reg *area_reg)
{
struct page **pages;
int nr_pages;
- int ret;
- ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
- if (ret)
- return ret;
+ if (area_reg->dmabuf_fd)
+ return -EINVAL;
if (!area_reg->addr)
return -EFAULT;
- if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
- return -EINVAL;
-
pages = io_pin_pages((unsigned long)area_reg->addr, area_reg->len,
&nr_pages);
if (IS_ERR(pages))
@@ -82,6 +162,35 @@ static int io_import_area(struct io_zcrx_ifq *ifq,
return 0;
}
+static void io_release_area_mem(struct io_zcrx_mem *mem)
+{
+ if (mem->is_dmabuf) {
+ io_release_dmabuf(mem);
+ return;
+ }
+ if (mem->pages) {
+ unpin_user_pages(mem->pages, mem->nr_folios);
+ kvfree(mem->pages);
+ }
+}
+
+static int io_import_area(struct io_zcrx_ifq *ifq,
+ struct io_zcrx_mem *mem,
+ struct io_uring_zcrx_area_reg *area_reg)
+{
+ int ret;
+
+ ret = io_validate_user_buf_range(area_reg->addr, area_reg->len);
+ if (ret)
+ return ret;
+ if (area_reg->addr & ~PAGE_MASK || area_reg->len & ~PAGE_MASK)
+ return -EINVAL;
+
+ if (area_reg->flags & IORING_ZCRX_AREA_DMABUF)
+ return io_import_dmabuf(ifq, mem, area_reg);
+ return io_import_umem(ifq, mem, area_reg);
+}
+
static void io_zcrx_unmap_umem(struct io_zcrx_ifq *ifq,
struct io_zcrx_area *area, int nr_mapped)
{
@@ -101,7 +210,10 @@ static void __io_zcrx_unmap_area(struct io_zcrx_ifq *ifq,
{
int i;
- io_zcrx_unmap_umem(ifq, area, nr_mapped);
+ if (area->mem.is_dmabuf)
+ io_release_dmabuf(&area->mem);
+ else
+ io_zcrx_unmap_umem(ifq, area, nr_mapped);
for (i = 0; i < area->nia.num_niovs; i++)
net_mp_niov_set_dma_addr(&area->nia.niovs[i], 0);
@@ -145,7 +257,11 @@ static int io_zcrx_map_area(struct io_zcrx_ifq *ifq, struct io_zcrx_area *area)
if (area->is_mapped)
return 0;
- nr = io_zcrx_map_area_umem(ifq, area);
+ if (area->mem.is_dmabuf)
+ nr = io_zcrx_map_area_dmabuf(ifq, area);
+ else
+ nr = io_zcrx_map_area_umem(ifq, area);
+
if (nr != area->nia.num_niovs) {
__io_zcrx_unmap_area(ifq, area, nr);
return -EINVAL;
@@ -251,6 +367,8 @@ static void io_zcrx_free_area(struct io_zcrx_area *area)
kfree(area);
}
+#define IO_ZCRX_AREA_SUPPORTED_FLAGS (IORING_ZCRX_AREA_DMABUF)
+
static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
struct io_zcrx_area **res,
struct io_uring_zcrx_area_reg *area_reg)
@@ -259,9 +377,11 @@ static int io_zcrx_create_area(struct io_zcrx_ifq *ifq,
unsigned nr_iovs;
int i, ret;
- if (area_reg->flags || area_reg->rq_area_token)
+ if (area_reg->flags & ~IO_ZCRX_AREA_SUPPORTED_FLAGS)
+ return -EINVAL;
+ if (area_reg->rq_area_token)
return -EINVAL;
- if (area_reg->__resv1 || area_reg->__resv2[0] || area_reg->__resv2[1])
+ if (area_reg->__resv2[0] || area_reg->__resv2[1])
return -EINVAL;
ret = -ENOMEM;
@@ -819,6 +939,9 @@ static ssize_t io_zcrx_copy_chunk(struct io_kiocb *req, struct io_zcrx_ifq *ifq,
size_t copied = 0;
int ret = 0;
+ if (area->mem.is_dmabuf)
+ return -EFAULT;
+
while (len) {
size_t copy_size = min_t(size_t, PAGE_SIZE, len);
const int dst_off = 0;
diff --git a/io_uring/zcrx.h b/io_uring/zcrx.h
index 9c22807af807..2f5e26389f22 100644
--- a/io_uring/zcrx.h
+++ b/io_uring/zcrx.h
@@ -3,15 +3,22 @@
#define IOU_ZC_RX_H
#include <linux/io_uring_types.h>
+#include <linux/dma-buf.h>
#include <linux/socket.h>
#include <net/page_pool/types.h>
#include <net/net_trackers.h>
struct io_zcrx_mem {
unsigned long size;
+ bool is_dmabuf;
struct page **pages;
unsigned long nr_folios;
+
+ struct dma_buf_attachment *attach;
+ struct dma_buf *dmabuf;
+ struct sg_table *sgt;
+ unsigned long dmabuf_offset;
};
struct io_zcrx_area {
--
2.48.1
* Re: [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
2025-05-01 12:17 ` [PATCH io_uring 5/5] io_uring/zcrx: dmabuf backed zerocopy receive Pavel Begunkov
@ 2025-05-02 15:25 ` Jens Axboe
2025-05-06 14:34 ` Alexey Charkov
From: Jens Axboe @ 2025-05-02 15:25 UTC (permalink / raw)
To: io-uring, Pavel Begunkov
Cc: David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
On Thu, 01 May 2025 13:17:13 +0100, Pavel Begunkov wrote:
> Currently, io_uring zcrx uses regular user pages to populate the
> area for page pools, this series allows the user to pass a dmabuf
> instead.
>
> Patches 1-4 are preparatory and do code shuffling. All dmabuf
> touching changes are in the last patch. A basic example can be
> found at:
>
> [...]
Applied, thanks!
[1/5] io_uring/zcrx: improve area validation
commit: d760d3f59f0d8d0df2895db30d36cf23106d6b05
[2/5] io_uring/zcrx: resolve netdev before area creation
commit: 6c9589aa08471f8984cdb5e743d2a2c048dc2403
[3/5] io_uring/zcrx: split out memory holders from area
commit: 782dfa329ac9d1b5ca7b6df56a7696bac58cb829
[4/5] io_uring/zcrx: split common area map/unmap parts
commit: 8a62804248fff77749048a0f5511649b2569bba9
[5/5] io_uring/zcrx: dmabuf backed zerocopy receive
commit: a42c735833315bbe7a54243ef5453b9a7fa0c248
Best regards,
--
Jens Axboe
* Re: [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
2025-05-01 12:17 [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Pavel Begunkov
2025-05-02 15:25 ` [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx Jens Axboe
@ 2025-05-06 14:34 ` Alexey Charkov
2025-05-06 15:32 ` Pavel Begunkov
From: Alexey Charkov @ 2025-05-06 14:34 UTC (permalink / raw)
To: Pavel Begunkov
Cc: io-uring, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
On Tue, May 6, 2025 at 6:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Currently, io_uring zcrx uses regular user pages to populate the
> area for page pools, this series allows the user to pass a dmabuf
> instead.
>
> Patches 1-4 are preparatory and do code shuffling. All dmabuf
> touching changes are in the last patch. A basic example can be
> found at:
>
> https://github.com/isilence/liburing/tree/zcrx-dmabuf
> https://github.com/isilence/liburing.git zcrx-dmabuf
>
> Pavel Begunkov (5):
> io_uring/zcrx: improve area validation
> io_uring/zcrx: resolve netdev before area creation
> io_uring/zcrx: split out memory holders from area
> io_uring/zcrx: split common area map/unmap parts
> io_uring/zcrx: dmabuf backed zerocopy receive
>
> include/uapi/linux/io_uring.h | 6 +-
> io_uring/rsrc.c | 27 ++--
> io_uring/rsrc.h | 2 +-
> io_uring/zcrx.c | 260 +++++++++++++++++++++++++++-------
> io_uring/zcrx.h | 18 ++-
> 5 files changed, 248 insertions(+), 65 deletions(-)
Hi Pavel,
Looks like another "depends" line might be needed in io_uring/Kconfig:
diff --git a/io_uring/Kconfig b/io_uring/Kconfig
index 4b949c42c0bf..9fa2cf502940 100644
--- a/io_uring/Kconfig
+++ b/io_uring/Kconfig
@@ -9,3 +9,4 @@ config IO_URING_ZCRX
depends on PAGE_POOL
depends on INET
depends on NET_RX_BUSY_POLL
+ depends on DMA_SHARED_BUFFER
Otherwise I'm having trouble compiling the next-20250506 kernel for
VT8500, which doesn't select DMA_BUF by default. The following linking
error appears at the very end:
armv7a-unknown-linux-gnueabihf-ld: io_uring/zcrx.o: in function
`io_release_dmabuf':
zcrx.c:(.text+0x1c): undefined reference to `dma_buf_unmap_attachment_unlocked'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x30): undefined
reference to `dma_buf_detach'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x40): undefined
reference to `dma_buf_put'
armv7a-unknown-linux-gnueabihf-ld: io_uring/zcrx.o: in function
`io_register_zcrx_ifq':
zcrx.c:(.text+0x15cc): undefined reference to `dma_buf_get'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x15e8): undefined
reference to `dma_buf_attach'
armv7a-unknown-linux-gnueabihf-ld: zcrx.c:(.text+0x1604): undefined
reference to `dma_buf_map_attachment_unlocked'
make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux] Error 1
make[1]: *** [/home/alchark/linux/Makefile:1242: vmlinux] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Best regards,
Alexey
* Re: [PATCH io_uring 0/5] Add dmabuf support for io_uring zcrx
2025-05-06 14:34 ` Alexey Charkov
@ 2025-05-06 15:32 ` Pavel Begunkov
From: Pavel Begunkov @ 2025-05-06 15:32 UTC (permalink / raw)
To: Alexey Charkov
Cc: io-uring, David Wei, netdev, Jamal Hadi Salim, Pedro Tammela,
Victor Nogueira
On 5/6/25 15:34, Alexey Charkov wrote:
> On Tue, May 6, 2025 at 6:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> Currently, io_uring zcrx uses regular user pages to populate the
>> area for page pools, this series allows the user to pass a dmabuf
>> instead.
>>
>> Patches 1-4 are preparatory and do code shuffling. All dmabuf
>> touching changes are in the last patch. A basic example can be
>> found at:
>>
>> https://github.com/isilence/liburing/tree/zcrx-dmabuf
>> https://github.com/isilence/liburing.git zcrx-dmabuf
>>
>> Pavel Begunkov (5):
>> io_uring/zcrx: improve area validation
>> io_uring/zcrx: resolve netdev before area creation
>> io_uring/zcrx: split out memory holders from area
>> io_uring/zcrx: split common area map/unmap parts
>> io_uring/zcrx: dmabuf backed zerocopy receive
>>
>> include/uapi/linux/io_uring.h | 6 +-
>> io_uring/rsrc.c | 27 ++--
>> io_uring/rsrc.h | 2 +-
>> io_uring/zcrx.c | 260 +++++++++++++++++++++++++++-------
>> io_uring/zcrx.h | 18 ++-
>> 5 files changed, 248 insertions(+), 65 deletions(-)
>
> Hi Pavel,
>
> Looks like another "depends" line might be needed in io_uring/Kconfig:
Ah yes, thanks for letting me know, I'll patch it up. dmabuf is
optional here, so fwiw I'm not going to gate the entire API on
that.
--
Pavel Begunkov