[PATCH v1 00/11] io_uring: add kernel-managed buffer rings

public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v1 00/11] io_uring: add kernel-managed buffer rings
@ 2026-02-10  0:28 Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 01/11] io_uring/kbuf: refactor io_register_pbuf_ring() logic into generic helpers Joanne Koong
                   ` (11 more replies)
  0 siblings, 12 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Currently, io_uring buffer rings require the application to allocate and
manage the backing buffers. This series introduces kernel-managed buffer
rings, where the kernel allocates and manages the buffers on behalf of
the application.

This is split out from the fuse over io_uring series in [1], which needs the
kernel to own and manage buffers shared between the fuse server and the
kernel.

This series is on top of the for-next branch in Jens' io-uring tree. The
corresponding liburing changes are in [2] and will be submitted after the
changes in this patchset are accepted.

Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/20260116233044.1532965-1-joannelkoong@gmail.com/
[2] https://github.com/joannekoong/liburing/tree/kmbuf

Changelog
---------
Changes since [1]:
* add "if (bl)" check for recycling API (Bernd)
* check mul overflow, use GFP_USER, use PTR as return type (Christoph)
* fix bl->ring leak (me)

Joanne Koong (11):
  io_uring/kbuf: refactor io_register_pbuf_ring() logic into generic
    helpers
  io_uring/kbuf: rename io_unregister_pbuf_ring() to
    io_unregister_buf_ring()
  io_uring/kbuf: add support for kernel-managed buffer rings
  io_uring/kbuf: add mmap support for kernel-managed buffer rings
  io_uring/kbuf: support kernel-managed buffer rings in buffer selection
  io_uring/kbuf: add buffer ring pinning/unpinning
  io_uring/kbuf: add recycling for kernel managed buffer rings
  io_uring/kbuf: add io_uring_is_kmbuf_ring()
  io_uring/kbuf: export io_ring_buffer_select()
  io_uring/kbuf: return buffer id in buffer selection
  io_uring/cmd: set selected buffer index in __io_uring_cmd_done()

 include/linux/io_uring/cmd.h   |  53 ++++-
 include/linux/io_uring_types.h |  10 +-
 include/uapi/linux/io_uring.h  |  17 +-
 io_uring/kbuf.c                | 365 +++++++++++++++++++++++++++------
 io_uring/kbuf.h                |  19 +-
 io_uring/memmap.c              | 116 ++++++++++-
 io_uring/memmap.h              |   4 +
 io_uring/register.c            |   9 +-
 io_uring/uring_cmd.c           |   6 +-
 9 files changed, 526 insertions(+), 73 deletions(-)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v1 01/11] io_uring/kbuf: refactor io_register_pbuf_ring() logic into generic helpers
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 02/11] io_uring/kbuf: rename io_unregister_pbuf_ring() to io_unregister_buf_ring() Joanne Koong
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Refactor the logic in io_register_pbuf_ring() into generic helpers:
- io_copy_and_validate_buf_reg(): Copy out user arg and validate user
  arg and buffer registration parameters
- io_alloc_new_buffer_list(): Allocate and initialize a new buffer
  list for the given buffer group ID
- io_setup_pbuf_ring(): Sets up the physical buffer ring region and
  handles memory mapping for provided buffer rings

This is a preparatory change for upcoming kernel-managed buffer ring
support which will need to reuse some of these helpers.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 io_uring/kbuf.c | 129 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 85 insertions(+), 44 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 67d4fe576473..850b836f32ee 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -596,55 +596,73 @@ int io_manage_buffers_legacy(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_COMPLETE;
 }
 
-int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
+static int io_copy_and_validate_buf_reg(const void __user *arg,
+					struct io_uring_buf_reg *reg,
+					unsigned int permitted_flags)
 {
-	struct io_uring_buf_reg reg;
-	struct io_buffer_list *bl;
-	struct io_uring_region_desc rd;
-	struct io_uring_buf_ring *br;
-	unsigned long mmap_offset;
-	unsigned long ring_size;
-	int ret;
-
-	lockdep_assert_held(&ctx->uring_lock);
-
-	if (copy_from_user(&reg, arg, sizeof(reg)))
+	if (copy_from_user(reg, arg, sizeof(*reg)))
 		return -EFAULT;
-	if (!mem_is_zero(reg.resv, sizeof(reg.resv)))
+
+	if (!mem_is_zero(reg->resv, sizeof(reg->resv)))
 		return -EINVAL;
-	if (reg.flags & ~(IOU_PBUF_RING_MMAP | IOU_PBUF_RING_INC))
+	if (reg->flags & ~permitted_flags)
 		return -EINVAL;
-	if (!is_power_of_2(reg.ring_entries))
+	if (!is_power_of_2(reg->ring_entries))
 		return -EINVAL;
 	/* cannot disambiguate full vs empty due to head/tail size */
-	if (reg.ring_entries >= 65536)
+	if (reg->ring_entries >= 65536)
 		return -EINVAL;
+	return 0;
+}
 
-	bl = io_buffer_get_list(ctx, reg.bgid);
-	if (bl) {
+static struct io_buffer_list *
+io_alloc_new_buffer_list(struct io_ring_ctx *ctx,
+			 const struct io_uring_buf_reg *reg)
+{
+	struct io_buffer_list *list;
+
+	list = io_buffer_get_list(ctx, reg->bgid);
+	if (list) {
 		/* if mapped buffer ring OR classic exists, don't allow */
-		if (bl->flags & IOBL_BUF_RING || !list_empty(&bl->buf_list))
-			return -EEXIST;
-		io_destroy_bl(ctx, bl);
+		if (list->flags & IOBL_BUF_RING || !list_empty(&list->buf_list))
+			return ERR_PTR(-EEXIST);
+		io_destroy_bl(ctx, list);
 	}
 
-	bl = kzalloc(sizeof(*bl), GFP_KERNEL_ACCOUNT);
-	if (!bl)
-		return -ENOMEM;
+	list = kzalloc(sizeof(*list), GFP_KERNEL_ACCOUNT);
+	if (!list)
+		return ERR_PTR(-ENOMEM);
+
+	list->nr_entries = reg->ring_entries;
+	list->mask = reg->ring_entries - 1;
+	list->flags = IOBL_BUF_RING;
+
+	return list;
+}
+
+static int io_setup_pbuf_ring(struct io_ring_ctx *ctx,
+			      const struct io_uring_buf_reg *reg,
+			      struct io_buffer_list *bl)
+{
+	struct io_uring_region_desc rd;
+	unsigned long mmap_offset;
+	unsigned long ring_size;
+	int ret;
 
-	mmap_offset = (unsigned long)reg.bgid << IORING_OFF_PBUF_SHIFT;
-	ring_size = flex_array_size(br, bufs, reg.ring_entries);
+	mmap_offset = (unsigned long)reg->bgid << IORING_OFF_PBUF_SHIFT;
+	ring_size = flex_array_size(bl->buf_ring, bufs, reg->ring_entries);
 
 	memset(&rd, 0, sizeof(rd));
 	rd.size = PAGE_ALIGN(ring_size);
-	if (!(reg.flags & IOU_PBUF_RING_MMAP)) {
-		rd.user_addr = reg.ring_addr;
+	if (!(reg->flags & IOU_PBUF_RING_MMAP)) {
+		rd.user_addr = reg->ring_addr;
 		rd.flags |= IORING_MEM_REGION_TYPE_USER;
 	}
+
 	ret = io_create_region(ctx, &bl->region, &rd, mmap_offset);
 	if (ret)
-		goto fail;
-	br = io_region_get_ptr(&bl->region);
+		return ret;
+	bl->buf_ring = io_region_get_ptr(&bl->region);
 
 #ifdef SHM_COLOUR
 	/*
@@ -656,25 +674,48 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	 * should use IOU_PBUF_RING_MMAP instead, and liburing will handle
 	 * this transparently.
 	 */
-	if (!(reg.flags & IOU_PBUF_RING_MMAP) &&
-	    ((reg.ring_addr | (unsigned long)br) & (SHM_COLOUR - 1))) {
-		ret = -EINVAL;
-		goto fail;
+	if (!(reg->flags & IOU_PBUF_RING_MMAP) &&
+	    ((reg->ring_addr | (unsigned long)bl->buf_ring) &
+	     (SHM_COLOUR - 1))) {
+		io_free_region(ctx->user, &bl->region);
+		return -EINVAL;
 	}
 #endif
 
-	bl->nr_entries = reg.ring_entries;
-	bl->mask = reg.ring_entries - 1;
-	bl->flags |= IOBL_BUF_RING;
-	bl->buf_ring = br;
-	if (reg.flags & IOU_PBUF_RING_INC)
+	if (reg->flags & IOU_PBUF_RING_INC)
 		bl->flags |= IOBL_INC;
+
+	return 0;
+}
+
+int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
+{
+	unsigned int permitted_flags;
+	struct io_uring_buf_reg reg;
+	struct io_buffer_list *bl;
+	int ret;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	permitted_flags = IOU_PBUF_RING_MMAP | IOU_PBUF_RING_INC;
+	ret = io_copy_and_validate_buf_reg(arg, &reg, permitted_flags);
+	if (ret)
+		return ret;
+
+	bl = io_alloc_new_buffer_list(ctx, &reg);
+	if (IS_ERR(bl))
+		return PTR_ERR(bl);
+
+	ret = io_setup_pbuf_ring(ctx, &reg, bl);
+	if (ret) {
+		kfree(bl);
+		return ret;
+	}
+
 	ret = io_buffer_add_list(ctx, bl, reg.bgid);
-	if (!ret)
-		return 0;
-fail:
-	io_free_region(ctx->user, &bl->region);
-	kfree(bl);
+	if (ret)
+		io_put_bl(ctx, bl);
+
 	return ret;
 }
 
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 02/11] io_uring/kbuf: rename io_unregister_pbuf_ring() to io_unregister_buf_ring()
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 01/11] io_uring/kbuf: refactor io_register_pbuf_ring() logic into generic helpers Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings Joanne Koong
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Use the more generic name io_unregister_buf_ring() as this function will
be used for unregistering both provided buffer rings and kernel-managed
buffer rings.

This is a preparatory change for upcoming kernel-managed buffer ring
support.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 io_uring/kbuf.c     | 2 +-
 io_uring/kbuf.h     | 2 +-
 io_uring/register.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 850b836f32ee..aa9b70b72db4 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -719,7 +719,7 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	return ret;
 }
 
-int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
+int io_unregister_buf_ring(struct io_ring_ctx *ctx, void __user *arg)
 {
 	struct io_uring_buf_reg reg;
 	struct io_buffer_list *bl;
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index bf15e26520d3..40b44f4fdb15 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -74,7 +74,7 @@ int io_provide_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 int io_manage_buffers_legacy(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
-int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
+int io_unregister_buf_ring(struct io_ring_ctx *ctx, void __user *arg);
 int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg);
 
 bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags);
diff --git a/io_uring/register.c b/io_uring/register.c
index 594b1f2ce875..0882cb34f851 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -841,7 +841,7 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 		ret = -EINVAL;
 		if (!arg || nr_args != 1)
 			break;
-		ret = io_unregister_pbuf_ring(ctx, arg);
+		ret = io_unregister_buf_ring(ctx, arg);
 		break;
 	case IORING_REGISTER_SYNC_CANCEL:
 		ret = -EINVAL;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 01/11] io_uring/kbuf: refactor io_register_pbuf_ring() logic into generic helpers Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 02/11] io_uring/kbuf: rename io_unregister_pbuf_ring() to io_unregister_buf_ring() Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10 16:34   ` Pavel Begunkov
  2026-02-10  0:28 ` [PATCH v1 04/11] io_uring/kbuf: add mmap " Joanne Koong
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Add support for kernel-managed buffer rings (kmbuf rings), which allow
the kernel to allocate and manage the backing buffers for a buffer
ring, rather than requiring the application to provide and manage them.

This introduces two new registration opcodes:
- IORING_REGISTER_KMBUF_RING: Register a kernel-managed buffer ring
- IORING_UNREGISTER_KMBUF_RING: Unregister a kernel-managed buffer ring

The existing io_uring_buf_reg structure is extended with a union to
support both application-provided buffer rings (pbuf) and kernel-managed
buffer rings (kmbuf):
- For pbuf rings: ring_addr specifies the user-provided ring address
- For kmbuf rings: buf_size specifies the size of each buffer. buf_size
  must be non-zero and page-aligned.

The implementation follows the same pattern as pbuf ring registration,
reusing the validation and buffer list allocation helpers introduced in
earlier refactoring. The IOBL_KERNEL_MANAGED flag marks buffer lists as
kernel-managed for appropriate handling in the I/O path.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/uapi/linux/io_uring.h |  15 ++++-
 io_uring/kbuf.c               |  81 ++++++++++++++++++++++++-
 io_uring/kbuf.h               |   7 ++-
 io_uring/memmap.c             | 111 ++++++++++++++++++++++++++++++++++
 io_uring/memmap.h             |   4 ++
 io_uring/register.c           |   7 +++
 6 files changed, 219 insertions(+), 6 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index fc473af6feb4..a0889c1744bd 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -715,6 +715,10 @@ enum io_uring_register_op {
 	/* register bpf filtering programs */
 	IORING_REGISTER_BPF_FILTER		= 37,
 
+	/* register/unregister kernel-managed ring buffer group */
+	IORING_REGISTER_KMBUF_RING		= 38,
+	IORING_UNREGISTER_KMBUF_RING		= 39,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -891,9 +895,16 @@ enum io_uring_register_pbuf_ring_flags {
 	IOU_PBUF_RING_INC	= 2,
 };
 
-/* argument for IORING_(UN)REGISTER_PBUF_RING */
+/* argument for IORING_(UN)REGISTER_PBUF_RING and
+ * IORING_(UN)REGISTER_KMBUF_RING
+ */
 struct io_uring_buf_reg {
-	__u64	ring_addr;
+	union {
+		/* used for pbuf rings */
+		__u64	ring_addr;
+		/* used for kmbuf rings */
+		__u32   buf_size;
+	};
 	__u32	ring_entries;
 	__u16	bgid;
 	__u16	flags;
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index aa9b70b72db4..9bc36451d083 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -427,10 +427,13 @@ static int io_remove_buffers_legacy(struct io_ring_ctx *ctx,
 
 static void io_put_bl(struct io_ring_ctx *ctx, struct io_buffer_list *bl)
 {
-	if (bl->flags & IOBL_BUF_RING)
+	if (bl->flags & IOBL_BUF_RING) {
 		io_free_region(ctx->user, &bl->region);
-	else
+		if (bl->flags & IOBL_KERNEL_MANAGED)
+			kfree(bl->buf_ring);
+	} else {
 		io_remove_buffers_legacy(ctx, bl, -1U);
+	}
 
 	kfree(bl);
 }
@@ -779,3 +782,77 @@ struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
 		return NULL;
 	return &bl->region;
 }
+
+static int io_setup_kmbuf_ring(struct io_ring_ctx *ctx,
+			       struct io_buffer_list *bl,
+			       struct io_uring_buf_reg *reg)
+{
+	struct io_uring_buf_ring *ring;
+	unsigned long ring_size;
+	void *buf_region;
+	unsigned int i;
+	int ret;
+
+	/* allocate pages for the ring structure */
+	ring_size = flex_array_size(ring, bufs, bl->nr_entries);
+	ring = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
+	if (!ring)
+		return -ENOMEM;
+
+	ret = io_create_region_multi_buf(ctx, &bl->region, bl->nr_entries,
+					 reg->buf_size);
+	if (ret) {
+		kfree(ring);
+		return ret;
+	}
+
+	/* initialize ring buf entries to point to the buffers */
+	buf_region = bl->region.ptr;
+	for (i = 0; i < bl->nr_entries; i++) {
+		struct io_uring_buf *buf = &ring->bufs[i];
+
+		buf->addr = (u64)(uintptr_t)buf_region;
+		buf->len = reg->buf_size;
+		buf->bid = i;
+
+		buf_region += reg->buf_size;
+	}
+	ring->tail = bl->nr_entries;
+
+	bl->buf_ring = ring;
+	bl->flags |= IOBL_KERNEL_MANAGED;
+
+	return 0;
+}
+
+int io_register_kmbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
+{
+	struct io_uring_buf_reg reg;
+	struct io_buffer_list *bl;
+	int ret;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	ret = io_copy_and_validate_buf_reg(arg, &reg, 0);
+	if (ret)
+		return ret;
+
+	if (!reg.buf_size || !PAGE_ALIGNED(reg.buf_size))
+		return -EINVAL;
+
+	bl = io_alloc_new_buffer_list(ctx, &reg);
+	if (IS_ERR(bl))
+		return PTR_ERR(bl);
+
+	ret = io_setup_kmbuf_ring(ctx, bl, &reg);
+	if (ret) {
+		kfree(bl);
+		return ret;
+	}
+
+	ret = io_buffer_add_list(ctx, bl, reg.bgid);
+	if (ret)
+		io_put_bl(ctx, bl);
+
+	return ret;
+}
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 40b44f4fdb15..62c80a1ebf03 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -7,9 +7,11 @@
 
 enum {
 	/* ring mapped provided buffers */
-	IOBL_BUF_RING	= 1,
+	IOBL_BUF_RING		= 1,
 	/* buffers are consumed incrementally rather than always fully */
-	IOBL_INC	= 2,
+	IOBL_INC		= 2,
+	/* buffers are kernel managed */
+	IOBL_KERNEL_MANAGED	= 4,
 };
 
 struct io_buffer_list {
@@ -74,6 +76,7 @@ int io_provide_buffers_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe
 int io_manage_buffers_legacy(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
+int io_register_kmbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
 int io_unregister_buf_ring(struct io_ring_ctx *ctx, void __user *arg);
 int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg);
 
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 89f56609e50a..8d37e93c0433 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -15,6 +15,28 @@
 #include "rsrc.h"
 #include "zcrx.h"
 
+static void release_multi_buf_pages(struct page **pages, unsigned long nr_pages)
+{
+	struct page *page;
+	unsigned int nr, i = 0;
+
+	while (nr_pages) {
+		page = pages[i];
+
+		if (!page || WARN_ON_ONCE(page != compound_head(page)))
+			return;
+
+		nr = compound_nr(page);
+		put_page(page);
+
+		if (WARN_ON_ONCE(nr > nr_pages))
+			return;
+
+		i += nr;
+		nr_pages -= nr;
+	}
+}
+
 static bool io_mem_alloc_compound(struct page **pages, int nr_pages,
 				  size_t size, gfp_t gfp)
 {
@@ -86,6 +108,8 @@ enum {
 	IO_REGION_F_USER_PROVIDED		= 2,
 	/* only the first page in the array is ref'ed */
 	IO_REGION_F_SINGLE_REF			= 4,
+	/* pages in the array belong to multiple discrete allocations */
+	IO_REGION_F_MULTI_BUF			= 8,
 };
 
 void io_free_region(struct user_struct *user, struct io_mapped_region *mr)
@@ -98,6 +122,8 @@ void io_free_region(struct user_struct *user, struct io_mapped_region *mr)
 
 		if (mr->flags & IO_REGION_F_USER_PROVIDED)
 			unpin_user_pages(mr->pages, nr_refs);
+		else if (mr->flags & IO_REGION_F_MULTI_BUF)
+			release_multi_buf_pages(mr->pages, nr_refs);
 		else
 			release_pages(mr->pages, nr_refs);
 
@@ -149,6 +175,54 @@ static int io_region_pin_pages(struct io_mapped_region *mr,
 	return 0;
 }
 
+static int io_region_allocate_pages_multi_buf(struct io_mapped_region *mr,
+					      unsigned int nr_bufs,
+					      unsigned int buf_size)
+{
+	gfp_t gfp = GFP_USER | __GFP_ACCOUNT | __GFP_ZERO | __GFP_NOWARN;
+	struct page **pages, **cur_pages;
+	unsigned int nr_allocated;
+	unsigned int buf_pages;
+	unsigned int i;
+
+	if (!PAGE_ALIGNED(buf_size))
+		return -EINVAL;
+
+	buf_pages = buf_size >> PAGE_SHIFT;
+
+	pages = kvmalloc_array(mr->nr_pages, sizeof(*pages), gfp);
+	if (!pages)
+		return -ENOMEM;
+
+	cur_pages = pages;
+
+	for (i = 0; i < nr_bufs; i++) {
+		if (io_mem_alloc_compound(cur_pages, buf_pages, buf_size,
+					  gfp)) {
+			cur_pages += buf_pages;
+			continue;
+		}
+
+		nr_allocated = alloc_pages_bulk_node(gfp, NUMA_NO_NODE,
+						     buf_pages, cur_pages);
+		if (nr_allocated != buf_pages) {
+			unsigned int total =
+				(cur_pages - pages) + nr_allocated;
+
+			release_multi_buf_pages(pages, total);
+			kvfree(pages);
+			return -ENOMEM;
+		}
+
+		cur_pages += buf_pages;
+	}
+
+	mr->flags |= IO_REGION_F_MULTI_BUF;
+	mr->pages = pages;
+
+	return 0;
+}
+
 static int io_region_allocate_pages(struct io_mapped_region *mr,
 				    struct io_uring_region_desc *reg,
 				    unsigned long mmap_offset)
@@ -181,6 +255,43 @@ static int io_region_allocate_pages(struct io_mapped_region *mr,
 	return 0;
 }
 
+int io_create_region_multi_buf(struct io_ring_ctx *ctx,
+			       struct io_mapped_region *mr,
+			       unsigned int nr_bufs, unsigned int buf_size)
+{
+	unsigned int nr_pages;
+	int ret;
+
+	if (WARN_ON_ONCE(mr->pages || mr->ptr || mr->nr_pages))
+		return -EFAULT;
+
+	if (WARN_ON_ONCE(!nr_bufs || !buf_size || !PAGE_ALIGNED(buf_size)))
+		return -EINVAL;
+
+	if (check_mul_overflow(buf_size >> PAGE_SHIFT, nr_bufs, &nr_pages))
+		return -EINVAL;
+
+	if (ctx->user) {
+		ret = __io_account_mem(ctx->user, nr_pages);
+		if (ret)
+			return ret;
+	}
+	mr->nr_pages = nr_pages;
+
+	ret = io_region_allocate_pages_multi_buf(mr, nr_bufs, buf_size);
+	if (ret)
+		goto out_free;
+
+	ret = io_region_init_ptr(mr);
+	if (ret)
+		goto out_free;
+
+	return 0;
+out_free:
+	io_free_region(ctx->user, mr);
+	return ret;
+}
+
 int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
 		     struct io_uring_region_desc *reg,
 		     unsigned long mmap_offset)
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index f4cfbb6b9a1f..3aa1167462ae 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -22,6 +22,10 @@ int io_create_region(struct io_ring_ctx *ctx, struct io_mapped_region *mr,
 		     struct io_uring_region_desc *reg,
 		     unsigned long mmap_offset);
 
+int io_create_region_multi_buf(struct io_ring_ctx *ctx,
+			       struct io_mapped_region *mr,
+			       unsigned int nr_bufs, unsigned int buf_size);
+
 static inline void *io_region_get_ptr(struct io_mapped_region *mr)
 {
 	return mr->ptr;
diff --git a/io_uring/register.c b/io_uring/register.c
index 0882cb34f851..2db8daaf8fde 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -837,7 +837,14 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_pbuf_ring(ctx, arg);
 		break;
+	case IORING_REGISTER_KMBUF_RING:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_kmbuf_ring(ctx, arg);
+		break;
 	case IORING_UNREGISTER_PBUF_RING:
+	case IORING_UNREGISTER_KMBUF_RING:
 		ret = -EINVAL;
 		if (!arg || nr_args != 1)
 			break;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 04/11] io_uring/kbuf: add mmap support for kernel-managed buffer rings
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (2 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  1:02   ` Jens Axboe
  2026-02-10  0:28 ` [PATCH v1 05/11] io_uring/kbuf: support kernel-managed buffer rings in buffer selection Joanne Koong
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Add support for mmapping kernel-managed buffer rings (kmbuf) to
userspace, allowing applications to access the kernel-allocated buffers.

Similar to application-provided buffer rings (pbuf), kmbuf rings use the
buffer group ID encoded in the mmap offset to identify which buffer ring
to map. The implementation follows the same pattern as pbuf rings.

New mmap offset constants are introduced:
  - IORING_OFF_KMBUF_RING (0x88000000): Base offset for kmbuf mappings
  - IORING_OFF_KMBUF_SHIFT (16): Shift value to encode buffer group ID

The mmap offset encodes the bgid shifted by IORING_OFF_KMBUF_SHIFT.
The io_buf_get_region() helper retrieves the appropriate region.

This allows userspace to mmap the kernel-allocated buffer region and
access the buffers directly.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/uapi/linux/io_uring.h |  2 ++
 io_uring/kbuf.c               | 11 +++++++++--
 io_uring/kbuf.h               |  5 +++--
 io_uring/memmap.c             |  5 ++++-
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a0889c1744bd..42a2812c9922 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -545,6 +545,8 @@ struct io_uring_cqe {
 #define IORING_OFF_SQES			0x10000000ULL
 #define IORING_OFF_PBUF_RING		0x80000000ULL
 #define IORING_OFF_PBUF_SHIFT		16
+#define IORING_OFF_KMBUF_RING		0x88000000ULL
+#define IORING_OFF_KMBUF_SHIFT		16
 #define IORING_OFF_MMAP_MASK		0xf8000000ULL
 
 /*
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 9bc36451d083..ccf5b213087b 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -770,16 +770,23 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg)
 	return 0;
 }
 
-struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
-					    unsigned int bgid)
+struct io_mapped_region *io_buf_get_region(struct io_ring_ctx *ctx,
+					   unsigned int bgid,
+					   bool kernel_managed)
 {
 	struct io_buffer_list *bl;
+	bool is_kernel_managed;
 
 	lockdep_assert_held(&ctx->mmap_lock);
 
 	bl = xa_load(&ctx->io_bl_xa, bgid);
 	if (!bl || !(bl->flags & IOBL_BUF_RING))
 		return NULL;
+
+	is_kernel_managed = !!(bl->flags & IOBL_KERNEL_MANAGED);
+	if (is_kernel_managed != kernel_managed)
+		return NULL;
+
 	return &bl->region;
 }
 
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 62c80a1ebf03..11d165888b8e 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -88,8 +88,9 @@ unsigned int __io_put_kbufs(struct io_kiocb *req, struct io_buffer_list *bl,
 bool io_kbuf_commit(struct io_kiocb *req,
 		    struct io_buffer_list *bl, int len, int nr);
 
-struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
-					    unsigned int bgid);
+struct io_mapped_region *io_buf_get_region(struct io_ring_ctx *ctx,
+					   unsigned int bgid,
+					   bool kernel_managed);
 
 static inline bool io_kbuf_recycle_ring(struct io_kiocb *req,
 					struct io_buffer_list *bl)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index 8d37e93c0433..916315122323 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -356,7 +356,10 @@ static struct io_mapped_region *io_mmap_get_region(struct io_ring_ctx *ctx,
 		return &ctx->sq_region;
 	case IORING_OFF_PBUF_RING:
 		id = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT;
-		return io_pbuf_get_region(ctx, id);
+		return io_buf_get_region(ctx, id, false);
+	case IORING_OFF_KMBUF_RING:
+		id = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_KMBUF_SHIFT;
+		return io_buf_get_region(ctx, id, true);
 	case IORING_MAP_OFF_PARAM_REGION:
 		return &ctx->param_region;
 	case IORING_MAP_OFF_ZCRX_REGION:
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 05/11] io_uring/kbuf: support kernel-managed buffer rings in buffer selection
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (3 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 04/11] io_uring/kbuf: add mmap " Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Allow kernel-managed buffers to be selected. This requires modifying the
io_br_sel struct to separate the fields for address and val, since a
kernel address cannot be distinguished from a negative val when error
checking.

Auto-commit any selected kernel-managed buffer.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/io_uring_types.h |  8 ++++----
 io_uring/kbuf.c                | 16 ++++++++++++----
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e4a82a6f817..36cc2e0346d9 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -93,13 +93,13 @@ struct io_mapped_region {
  */
 struct io_br_sel {
 	struct io_buffer_list *buf_list;
-	/*
-	 * Some selection parts return the user address, others return an error.
-	 */
 	union {
+		/* for classic/ring provided buffers */
 		void __user *addr;
-		ssize_t val;
+		/* for kernel-managed buffers */
+		void *kaddr;
 	};
+	ssize_t val;
 };
 
 
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index ccf5b213087b..1e8395270227 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -155,7 +155,8 @@ static int io_provided_buffers_select(struct io_kiocb *req, size_t *len,
 	return 1;
 }
 
-static bool io_should_commit(struct io_kiocb *req, unsigned int issue_flags)
+static bool io_should_commit(struct io_kiocb *req, struct io_buffer_list *bl,
+			     unsigned int issue_flags)
 {
 	/*
 	* If we came in unlocked, we have no choice but to consume the
@@ -170,7 +171,11 @@ static bool io_should_commit(struct io_kiocb *req, unsigned int issue_flags)
 	if (issue_flags & IO_URING_F_UNLOCKED)
 		return true;
 
-	/* uring_cmd commits kbuf upfront, no need to auto-commit */
+	/* kernel-managed buffers are auto-committed */
+	if (bl->flags & IOBL_KERNEL_MANAGED)
+		return true;
+
+	/* multishot uring_cmd commits kbuf upfront, no need to auto-commit */
 	if (!io_file_can_poll(req) && req->opcode != IORING_OP_URING_CMD)
 		return true;
 	return false;
@@ -200,9 +205,12 @@ static struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 	req->flags |= REQ_F_BUFFER_RING | REQ_F_BUFFERS_COMMIT;
 	req->buf_index = READ_ONCE(buf->bid);
 	sel.buf_list = bl;
-	sel.addr = u64_to_user_ptr(READ_ONCE(buf->addr));
+	if (bl->flags & IOBL_KERNEL_MANAGED)
+		sel.kaddr = (void *)(uintptr_t)READ_ONCE(buf->addr);
+	else
+		sel.addr = u64_to_user_ptr(READ_ONCE(buf->addr));
 
-	if (io_should_commit(req, issue_flags)) {
+	if (io_should_commit(req, bl, issue_flags)) {
 		io_kbuf_commit(req, sel.buf_list, *len, 1);
 		sel.buf_list = NULL;
 	}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (4 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 05/11] io_uring/kbuf: support kernel-managed buffer rings in buffer selection Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  1:07   ` Jens Axboe
  2026-02-10  0:28 ` [PATCH v1 07/11] io_uring/kbuf: add recycling for kernel managed buffer rings Joanne Koong
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Add kernel APIs to pin and unpin buffer rings, preventing userspace from
unregistering a buffer ring while it is pinned by the kernel.

This provides a mechanism for kernel subsystems to safely access buffer
ring contents while ensuring the buffer ring remains valid. A pinned
buffer ring cannot be unregistered until explicitly unpinned. On the
userspace side, trying to unregister a pinned buffer will return -EBUSY.

This is a preparatory change for upcoming fuse usage of kernel-managed
buffer rings. It is necessary for fuse to pin the buffer ring because
fuse may need to select a buffer in atomic contexts, which it can only
do so by using the underlying buffer list pointer.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/io_uring/cmd.h | 17 +++++++++++++
 io_uring/kbuf.c              | 48 ++++++++++++++++++++++++++++++++++++
 io_uring/kbuf.h              |  5 ++++
 3 files changed, 70 insertions(+)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 375fd048c4cb..702b1903e6ee 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -84,6 +84,10 @@ struct io_br_sel io_uring_cmd_buffer_select(struct io_uring_cmd *ioucmd,
 bool io_uring_mshot_cmd_post_cqe(struct io_uring_cmd *ioucmd,
 				 struct io_br_sel *sel, unsigned int issue_flags);
 
+int io_uring_buf_ring_pin(struct io_uring_cmd *cmd, unsigned buf_group,
+			  unsigned issue_flags, struct io_buffer_list **bl);
+int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
+			    unsigned issue_flags);
 #else
 static inline int
 io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
@@ -126,6 +130,19 @@ static inline bool io_uring_mshot_cmd_post_cqe(struct io_uring_cmd *ioucmd,
 {
 	return true;
 }
+static inline int io_uring_buf_ring_pin(struct io_uring_cmd *cmd,
+					unsigned buf_group,
+					unsigned issue_flags,
+					struct io_buffer_list **bl)
+{
+	return -EOPNOTSUPP;
+}
+static inline int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd,
+					  unsigned buf_group,
+					  unsigned issue_flags)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 static inline struct io_uring_cmd *io_uring_cmd_from_tw(struct io_tw_req tw_req)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 1e8395270227..dee1764ed19f 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -9,6 +9,7 @@
 #include <linux/poll.h>
 #include <linux/vmalloc.h>
 #include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -237,6 +238,51 @@ struct io_br_sel io_buffer_select(struct io_kiocb *req, size_t *len,
 	return sel;
 }
 
+int io_uring_buf_ring_pin(struct io_uring_cmd *cmd, unsigned buf_group,
+			  unsigned issue_flags, struct io_buffer_list **bl)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_buffer_list *buffer_list;
+	int ret = -EINVAL;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	buffer_list = io_buffer_get_list(ctx, buf_group);
+	if (buffer_list && (buffer_list->flags & IOBL_BUF_RING)) {
+		if (unlikely(buffer_list->flags & IOBL_PINNED)) {
+			ret = -EALREADY;
+		} else {
+			buffer_list->flags |= IOBL_PINNED;
+			ret = 0;
+			*bl = buffer_list;
+		}
+	}
+
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(io_uring_buf_ring_pin);
+
+int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
+		       unsigned issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_buffer_list *bl;
+	int ret = -EINVAL;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	bl = io_buffer_get_list(ctx, buf_group);
+	if (bl && (bl->flags & IOBL_BUF_RING) && (bl->flags & IOBL_PINNED)) {
+		bl->flags &= ~IOBL_PINNED;
+		ret = 0;
+	}
+
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(io_uring_buf_ring_unpin);
+
 /* cap it at a reasonable 256, will be one page even for 4K */
 #define PEEK_MAX_IMPORT		256
 
@@ -747,6 +793,8 @@ int io_unregister_buf_ring(struct io_ring_ctx *ctx, void __user *arg)
 		return -ENOENT;
 	if (!(bl->flags & IOBL_BUF_RING))
 		return -EINVAL;
+	if (bl->flags & IOBL_PINNED)
+		return -EBUSY;
 
 	scoped_guard(mutex, &ctx->mmap_lock)
 		xa_erase(&ctx->io_bl_xa, bl->bgid);
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 11d165888b8e..781630c2cc10 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -12,6 +12,11 @@ enum {
 	IOBL_INC		= 2,
 	/* buffers are kernel managed */
 	IOBL_KERNEL_MANAGED	= 4,
+	/*
+	 * buffer ring is pinned and cannot be unregistered by userspace until
+	 * it has been unpinned
+	 */
+	IOBL_PINNED		= 8,
 };
 
 struct io_buffer_list {
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 07/11] io_uring/kbuf: add recycling for kernel managed buffer rings
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (5 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:52   ` Jens Axboe
  2026-02-10  0:28 ` [PATCH v1 08/11] io_uring/kbuf: add io_uring_is_kmbuf_ring() Joanne Koong
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Add an interface for buffers to be recycled back into a kernel-managed
buffer ring.

This is a preparatory patch for fuse over io-uring.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/io_uring/cmd.h | 11 +++++++++
 io_uring/kbuf.c              | 44 ++++++++++++++++++++++++++++++++++++
 2 files changed, 55 insertions(+)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 702b1903e6ee..a488e945f883 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -88,6 +88,10 @@ int io_uring_buf_ring_pin(struct io_uring_cmd *cmd, unsigned buf_group,
 			  unsigned issue_flags, struct io_buffer_list **bl);
 int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
 			    unsigned issue_flags);
+
+int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd, unsigned int buf_group,
+			   u64 addr, unsigned int len, unsigned int bid,
+			   unsigned int issue_flags);
 #else
 static inline int
 io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
@@ -143,6 +147,13 @@ static inline int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd,
 {
 	return -EOPNOTSUPP;
 }
+static inline int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd,
+					 unsigned int buf_group, u64 addr,
+					 unsigned int len, unsigned int bid,
+					 unsigned int issue_flags)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 static inline struct io_uring_cmd *io_uring_cmd_from_tw(struct io_tw_req tw_req)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index dee1764ed19f..17b6178be4ce 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -102,6 +102,50 @@ void io_kbuf_drop_legacy(struct io_kiocb *req)
 	req->kbuf = NULL;
 }
 
+int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd, unsigned int buf_group,
+			   u64 addr, unsigned int len, unsigned int bid,
+			   unsigned int issue_flags)
+{
+	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_uring_buf_ring *br;
+	struct io_uring_buf *buf;
+	struct io_buffer_list *bl;
+	int ret = -EINVAL;
+
+	if (WARN_ON_ONCE(req->flags & REQ_F_BUFFERS_COMMIT))
+		return ret;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	bl = io_buffer_get_list(ctx, buf_group);
+
+	if (!bl || WARN_ON_ONCE(!(bl->flags & IOBL_BUF_RING)) ||
+	    WARN_ON_ONCE(!(bl->flags & IOBL_KERNEL_MANAGED)))
+		goto done;
+
+	br = bl->buf_ring;
+
+	if (WARN_ON_ONCE((br->tail - bl->head) >= bl->nr_entries))
+		goto done;
+
+	buf = &br->bufs[(br->tail) & bl->mask];
+
+	buf->addr = addr;
+	buf->len = len;
+	buf->bid = bid;
+
+	req->flags &= ~REQ_F_BUFFER_RING;
+
+	br->tail++;
+	ret = 0;
+
+done:
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(io_uring_kmbuf_recycle);
+
 bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 08/11] io_uring/kbuf: add io_uring_is_kmbuf_ring()
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (6 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 07/11] io_uring/kbuf: add recycling for kernel managed buffer rings Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 09/11] io_uring/kbuf: export io_ring_buffer_select() Joanne Koong
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

io_uring_is_kmbuf_ring() returns true if there is a kernel-managed
buffer ring at the specified buffer group.

This is a preparatory patch for upcoming fuse kernel-managed buffer
support, which needs to ensure the buffer ring registered by the server
is a kernel-managed buffer ring.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/io_uring/cmd.h |  9 +++++++++
 io_uring/kbuf.c              | 20 ++++++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index a488e945f883..04a937f6f4d3 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -92,6 +92,9 @@ int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
 int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd, unsigned int buf_group,
 			   u64 addr, unsigned int len, unsigned int bid,
 			   unsigned int issue_flags);
+
+bool io_uring_is_kmbuf_ring(struct io_uring_cmd *cmd, unsigned int buf_group,
+			    unsigned int issue_flags);
 #else
 static inline int
 io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
@@ -154,6 +157,12 @@ static inline int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd,
 {
 	return -EOPNOTSUPP;
 }
+static inline bool io_uring_is_kmbuf_ring(struct io_uring_cmd *cmd,
+					  unsigned int buf_group,
+					  unsigned int issue_flags)
+{
+	return false;
+}
 #endif
 
 static inline struct io_uring_cmd *io_uring_cmd_from_tw(struct io_tw_req tw_req)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 17b6178be4ce..797cc2f0a5e9 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -963,3 +963,23 @@ int io_register_kmbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 
 	return ret;
 }
+
+bool io_uring_is_kmbuf_ring(struct io_uring_cmd *cmd, unsigned int buf_group,
+			    unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_buffer_list *bl;
+	bool is_kmbuf_ring = false;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	bl = io_buffer_get_list(ctx, buf_group);
+	if (likely(bl) && (bl->flags & IOBL_KERNEL_MANAGED)) {
+		WARN_ON_ONCE(!(bl->flags & IOBL_BUF_RING));
+		is_kmbuf_ring = true;
+	}
+
+	io_ring_submit_unlock(ctx, issue_flags);
+	return is_kmbuf_ring;
+}
+EXPORT_SYMBOL_GPL(io_uring_is_kmbuf_ring);
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 09/11] io_uring/kbuf: export io_ring_buffer_select()
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (7 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 08/11] io_uring/kbuf: add io_uring_is_kmbuf_ring() Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:28 ` [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Export io_ring_buffer_select() so that it may be used by callers who
pass in a pinned bufring without needing to grab the io_uring mutex.

This is a preparatory patch that will be needed by fuse io-uring, which
will need to select a buffer from a kernel-managed bufring while the
uring mutex may already be held by in-progress commits, and may need to
select a buffer in atomic contexts.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/io_uring/cmd.h | 14 ++++++++++++++
 io_uring/kbuf.c              |  7 ++++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 04a937f6f4d3..d4b5943bdeb1 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -95,6 +95,10 @@ int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd, unsigned int buf_group,
 
 bool io_uring_is_kmbuf_ring(struct io_uring_cmd *cmd, unsigned int buf_group,
 			    unsigned int issue_flags);
+
+struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
+				       struct io_buffer_list *bl,
+				       unsigned int issue_flags);
 #else
 static inline int
 io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
@@ -163,6 +167,16 @@ static inline bool io_uring_is_kmbuf_ring(struct io_uring_cmd *cmd,
 {
 	return false;
 }
+static inline struct io_br_sel io_ring_buffer_select(struct io_kiocb *req,
+						     size_t *len,
+						     struct io_buffer_list *bl,
+						     unsigned int issue_flags)
+{
+	struct io_br_sel sel = {
+		.val = -EOPNOTSUPP,
+	};
+	return sel;
+}
 #endif
 
 static inline struct io_uring_cmd *io_uring_cmd_from_tw(struct io_tw_req tw_req)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 797cc2f0a5e9..9a93f10d3214 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -226,9 +226,9 @@ static bool io_should_commit(struct io_kiocb *req, struct io_buffer_list *bl,
 	return false;
 }
 
-static struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
-					      struct io_buffer_list *bl,
-					      unsigned int issue_flags)
+struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
+				       struct io_buffer_list *bl,
+				       unsigned int issue_flags)
 {
 	struct io_uring_buf_ring *br = bl->buf_ring;
 	__u16 tail, head = bl->head;
@@ -261,6 +261,7 @@ static struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 	}
 	return sel;
 }
+EXPORT_SYMBOL_GPL(io_ring_buffer_select);
 
 struct io_br_sel io_buffer_select(struct io_kiocb *req, size_t *len,
 				  unsigned buf_group, unsigned int issue_flags)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (8 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 09/11] io_uring/kbuf: export io_ring_buffer_select() Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:53   ` Jens Axboe
  2026-02-10  0:28 ` [PATCH v1 11/11] io_uring/cmd: set selected buffer index in __io_uring_cmd_done() Joanne Koong
  2026-02-10  0:55 ` [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Jens Axboe
  11 siblings, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

Return the id of the selected buffer in io_buffer_select(). This is
needed for kernel-managed buffer rings to later recycle the selected
buffer.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 include/linux/io_uring/cmd.h   | 2 +-
 include/linux/io_uring_types.h | 2 ++
 io_uring/kbuf.c                | 7 +++++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index d4b5943bdeb1..94df2bdebe77 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -71,7 +71,7 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
 
 /*
  * Select a buffer from the provided buffer group for multishot uring_cmd.
- * Returns the selected buffer address and size.
+ * Returns the selected buffer address, size, and id.
  */
 struct io_br_sel io_uring_cmd_buffer_select(struct io_uring_cmd *ioucmd,
 					    unsigned buf_group, size_t *len,
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 36cc2e0346d9..5a56bb341337 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -100,6 +100,8 @@ struct io_br_sel {
 		void *kaddr;
 	};
 	ssize_t val;
+	/* id of the selected buffer */
+	unsigned buf_id;
 };
 
 
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 9a93f10d3214..24c1e34ea23e 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -250,6 +250,7 @@ struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 	req->flags |= REQ_F_BUFFER_RING | REQ_F_BUFFERS_COMMIT;
 	req->buf_index = READ_ONCE(buf->bid);
 	sel.buf_list = bl;
+	sel.buf_id = req->buf_index;
 	if (bl->flags & IOBL_KERNEL_MANAGED)
 		sel.kaddr = (void *)(uintptr_t)READ_ONCE(buf->addr);
 	else
@@ -274,10 +275,12 @@ struct io_br_sel io_buffer_select(struct io_kiocb *req, size_t *len,
 
 	bl = io_buffer_get_list(ctx, buf_group);
 	if (likely(bl)) {
-		if (bl->flags & IOBL_BUF_RING)
+		if (bl->flags & IOBL_BUF_RING) {
 			sel = io_ring_buffer_select(req, len, bl, issue_flags);
-		else
+		} else {
 			sel.addr = io_provided_buffer_select(req, len, bl);
+			sel.buf_id = req->buf_index;
+		}
 	}
 	io_ring_submit_unlock(req->ctx, issue_flags);
 	return sel;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v1 11/11] io_uring/cmd: set selected buffer index in __io_uring_cmd_done()
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (9 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
@ 2026-02-10  0:28 ` Joanne Koong
  2026-02-10  0:55 ` [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Jens Axboe
  11 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10  0:28 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

When uring_cmd operations select a buffer, the completion queue entry
should indicate which buffer was selected.

Set IORING_CQE_F_BUFFER on the completed entry and encode the buffer
index if a buffer was selected.

This will be needed for fuse, which needs to relay to userspace which
selected buffer contains the data.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 io_uring/uring_cmd.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index ee7b49f47cb5..6d38df1a812d 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -151,6 +151,7 @@ void __io_uring_cmd_done(struct io_uring_cmd *ioucmd, s32 ret, u64 res2,
 		       unsigned issue_flags, bool is_cqe32)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
+	u32 cflags = 0;
 
 	if (WARN_ON_ONCE(req->flags & REQ_F_APOLL_MULTISHOT))
 		return;
@@ -160,7 +161,10 @@ void __io_uring_cmd_done(struct io_uring_cmd *ioucmd, s32 ret, u64 res2,
 	if (ret < 0)
 		req_set_fail(req);
 
-	io_req_set_res(req, ret, 0);
+	if (req->flags & (REQ_F_BUFFER_SELECTED | REQ_F_BUFFER_RING))
+		cflags |= IORING_CQE_F_BUFFER |
+			(req->buf_index << IORING_CQE_BUFFER_SHIFT);
+	io_req_set_res(req, ret, cflags);
 	if (is_cqe32) {
 		if (req->ctx->flags & IORING_SETUP_CQE_MIXED)
 			req->cqe.flags |= IORING_CQE_F_32;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 07/11] io_uring/kbuf: add recycling for kernel managed buffer rings
  2026-02-10  0:28 ` [PATCH v1 07/11] io_uring/kbuf: add recycling for kernel managed buffer rings Joanne Koong
@ 2026-02-10  0:52   ` Jens Axboe
  0 siblings, 0 replies; 33+ messages in thread
From: Jens Axboe @ 2026-02-10  0:52 UTC (permalink / raw)
  To: Joanne Koong, io-uring
  Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

On 2/9/26 5:28 PM, Joanne Koong wrote:
> +int io_uring_kmbuf_recycle(struct io_uring_cmd *cmd, unsigned int buf_group,
> +			   u64 addr, unsigned int len, unsigned int bid,
> +			   unsigned int issue_flags)
> +{
> +	struct io_kiocb *req = cmd_to_io_kiocb(cmd);
> +	struct io_ring_ctx *ctx = req->ctx;
> +	struct io_uring_buf_ring *br;
> +	struct io_uring_buf *buf;
> +	struct io_buffer_list *bl;
> +	int ret = -EINVAL;
> +
> +	if (WARN_ON_ONCE(req->flags & REQ_F_BUFFERS_COMMIT))
> +		return ret;
> +
> +	io_ring_submit_lock(ctx, issue_flags);
> +
> +	bl = io_buffer_get_list(ctx, buf_group);
> +
> +	if (!bl || WARN_ON_ONCE(!(bl->flags & IOBL_BUF_RING)) ||
> +	    WARN_ON_ONCE(!(bl->flags & IOBL_KERNEL_MANAGED)))
> +		goto done;
> +
> +	br = bl->buf_ring;
> +
> +	if (WARN_ON_ONCE((br->tail - bl->head) >= bl->nr_entries))
> +		goto done;

I think you want:

	if (WARN_ON_ONCE((__u16)(br->tail - bl->head) >= bl->nr_entries))

here to avoid int promotion from messing this up if tail has wrapped.

In general, across the patches for the WARN_ON_ONCE(), it's not a huge
issue to have a litter of them for now. Hopefully we can prune some of
these down the line, however.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection
  2026-02-10  0:28 ` [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
@ 2026-02-10  0:53   ` Jens Axboe
  2026-02-10 22:36     ` Joanne Koong
  0 siblings, 1 reply; 33+ messages in thread
From: Jens Axboe @ 2026-02-10  0:53 UTC (permalink / raw)
  To: Joanne Koong, io-uring
  Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

On 2/9/26 5:28 PM, Joanne Koong wrote:
> Return the id of the selected buffer in io_buffer_select(). This is
> needed for kernel-managed buffer rings to later recycle the selected
> buffer.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/linux/io_uring/cmd.h   | 2 +-
>  include/linux/io_uring_types.h | 2 ++
>  io_uring/kbuf.c                | 7 +++++--
>  3 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> index d4b5943bdeb1..94df2bdebe77 100644
> --- a/include/linux/io_uring/cmd.h
> +++ b/include/linux/io_uring/cmd.h
> @@ -71,7 +71,7 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
>  
>  /*
>   * Select a buffer from the provided buffer group for multishot uring_cmd.
> - * Returns the selected buffer address and size.
> + * Returns the selected buffer address, size, and id.
>   */
>  struct io_br_sel io_uring_cmd_buffer_select(struct io_uring_cmd *ioucmd,
>  					    unsigned buf_group, size_t *len,
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 36cc2e0346d9..5a56bb341337 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -100,6 +100,8 @@ struct io_br_sel {
>  		void *kaddr;
>  	};
>  	ssize_t val;
> +	/* id of the selected buffer */
> +	unsigned buf_id;
>  };

I'm probably missing something here, but why can't the caller just use
req->buf_index for this?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 00/11] io_uring: add kernel-managed buffer rings
  2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
                   ` (10 preceding siblings ...)
  2026-02-10  0:28 ` [PATCH v1 11/11] io_uring/cmd: set selected buffer index in __io_uring_cmd_done() Joanne Koong
@ 2026-02-10  0:55 ` Jens Axboe
  2026-02-10 22:45   ` Joanne Koong
  11 siblings, 1 reply; 33+ messages in thread
From: Jens Axboe @ 2026-02-10  0:55 UTC (permalink / raw)
  To: Joanne Koong, io-uring
  Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

On 2/9/26 5:28 PM, Joanne Koong wrote:
> Currently, io_uring buffer rings require the application to allocate and
> manage the backing buffers. This series introduces kernel-managed buffer
> rings, where the kernel allocates and manages the buffers on behalf of
> the application.
> 
> This is split out from the fuse over io_uring series in [1], which needs the
> kernel to own and manage buffers shared between the fuse server and the
> kernel.
> 
> This series is on top of the for-next branch in Jens' io-uring tree. The
> corresponding liburing changes are in [2] and will be submitted after the
> changes in this patchset are accepted.

Generally looks pretty good - for context, do you have a branch with
these patches and the users on top too? Makes it a bit easier for cross
referencing, as some of these really do need an exposed user to make a
good judgement on the helpers.

I know there's the older series, but I'm assuming the latter patches
changed somewhat too, and it'd be nicer to look at a current set rather
than go back to the older ones.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 04/11] io_uring/kbuf: add mmap support for kernel-managed buffer rings
  2026-02-10  0:28 ` [PATCH v1 04/11] io_uring/kbuf: add mmap " Joanne Koong
@ 2026-02-10  1:02   ` Jens Axboe
  0 siblings, 0 replies; 33+ messages in thread
From: Jens Axboe @ 2026-02-10  1:02 UTC (permalink / raw)
  To: Joanne Koong, io-uring
  Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

On 2/9/26 5:28 PM, Joanne Koong wrote:
> Add support for mmapping kernel-managed buffer rings (kmbuf) to
> userspace, allowing applications to access the kernel-allocated buffers.
> 
> Similar to application-provided buffer rings (pbuf), kmbuf rings use the
> buffer group ID encoded in the mmap offset to identify which buffer ring
> to map. The implementation follows the same pattern as pbuf rings.
> 
> New mmap offset constants are introduced:
>   - IORING_OFF_KMBUF_RING (0x88000000): Base offset for kmbuf mappings
>   - IORING_OFF_KMBUF_SHIFT (16): Shift value to encode buffer group ID
> 
> The mmap offset encodes the bgid shifted by IORING_OFF_KMBUF_SHIFT.
> The io_buf_get_region() helper retrieves the appropriate region.
> 
> This allows userspace to mmap the kernel-allocated buffer region and
> access the buffers directly.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  include/uapi/linux/io_uring.h |  2 ++
>  io_uring/kbuf.c               | 11 +++++++++--
>  io_uring/kbuf.h               |  5 +++--
>  io_uring/memmap.c             |  5 ++++-
>  4 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index a0889c1744bd..42a2812c9922 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -545,6 +545,8 @@ struct io_uring_cqe {
>  #define IORING_OFF_SQES			0x10000000ULL
>  #define IORING_OFF_PBUF_RING		0x80000000ULL
>  #define IORING_OFF_PBUF_SHIFT		16
> +#define IORING_OFF_KMBUF_RING		0x88000000ULL
> +#define IORING_OFF_KMBUF_SHIFT		16
>  #define IORING_OFF_MMAP_MASK		0xf8000000ULL
>  
>  /*
> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> index 9bc36451d083..ccf5b213087b 100644
> --- a/io_uring/kbuf.c
> +++ b/io_uring/kbuf.c
> @@ -770,16 +770,23 @@ int io_register_pbuf_status(struct io_ring_ctx *ctx, void __user *arg)
>  	return 0;
>  }
>  
> -struct io_mapped_region *io_pbuf_get_region(struct io_ring_ctx *ctx,
> -					    unsigned int bgid)
> +struct io_mapped_region *io_buf_get_region(struct io_ring_ctx *ctx,
> +					   unsigned int bgid,
> +					   bool kernel_managed)
>  {
>  	struct io_buffer_list *bl;
> +	bool is_kernel_managed;
>  
>  	lockdep_assert_held(&ctx->mmap_lock);
>  
>  	bl = xa_load(&ctx->io_bl_xa, bgid);
>  	if (!bl || !(bl->flags & IOBL_BUF_RING))
>  		return NULL;
> +
> +	is_kernel_managed = !!(bl->flags & IOBL_KERNEL_MANAGED);
> +	if (is_kernel_managed != kernel_managed)
> +		return NULL;
> +
>  	return &bl->region;
>  }

For this, I think just add another helper - leave io_pbuf_get_region()
and add a bl->flags & IOBL_KERNEL_MANAGED error check in there, and
add a io_kbuf_get_region() or similar and have a !(bl->flags &
IOBL_KERNEL_MANAGED) error check in that one.

That's easier to read, and there's little reason to avoid duplicating
the xa_load() part.

Minor nit, but imho it's more readable that way.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning
  2026-02-10  0:28 ` [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
@ 2026-02-10  1:07   ` Jens Axboe
  2026-02-10 17:57     ` Caleb Sander Mateos
  0 siblings, 1 reply; 33+ messages in thread
From: Jens Axboe @ 2026-02-10  1:07 UTC (permalink / raw)
  To: Joanne Koong, io-uring
  Cc: csander, krisman, bernd, hch, asml.silence, linux-fsdevel

On 2/9/26 5:28 PM, Joanne Koong wrote:
> +int io_uring_buf_ring_pin(struct io_uring_cmd *cmd, unsigned buf_group,
> +			  unsigned issue_flags, struct io_buffer_list **bl)
> +{
> +	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +	struct io_buffer_list *buffer_list;
> +	int ret = -EINVAL;

Probably use the usual struct io_buffer_list *bl here and either use an
ERR_PTR return, or rename the passed on **bl to **blret or something.

> +int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
> +		       unsigned issue_flags)
> +{
> +	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +	struct io_buffer_list *bl;
> +	int ret = -EINVAL;
> +
> +	io_ring_submit_lock(ctx, issue_flags);
> +
> +	bl = io_buffer_get_list(ctx, buf_group);
> +	if (bl && (bl->flags & IOBL_BUF_RING) && (bl->flags & IOBL_PINNED)) {

Usually done as:

	if ((bl->flags & (IOBL_BUF_RING|IOBL_PINNED)) == (IOBL_BUF_RING|IOBL_PINNED))

and maybe then just have an earlier

	if (!bl)
		goto err;

> +		bl->flags &= ~IOBL_PINNED;
> +		ret = 0;
> +	}
err:
> +	io_ring_submit_unlock(ctx, issue_flags);
> +	return ret;
> +}

to avoid making it way too long. For io_uring, it's fine to exceed 80
chars where it makes sense.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-10  0:28 ` [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings Joanne Koong
@ 2026-02-10 16:34   ` Pavel Begunkov
  2026-02-10 19:39     ` Joanne Koong
  2026-02-11 15:45     ` Christoph Hellwig
  0 siblings, 2 replies; 33+ messages in thread
From: Pavel Begunkov @ 2026-02-10 16:34 UTC (permalink / raw)
  To: Joanne Koong, axboe, io-uring; +Cc: csander, krisman, bernd, hch, linux-fsdevel

On 2/10/26 00:28, Joanne Koong wrote:
> Add support for kernel-managed buffer rings (kmbuf rings), which allow
> the kernel to allocate and manage the backing buffers for a buffer
> ring, rather than requiring the application to provide and manage them.
> 
> This introduces two new registration opcodes:
> - IORING_REGISTER_KMBUF_RING: Register a kernel-managed buffer ring
> - IORING_UNREGISTER_KMBUF_RING: Unregister a kernel-managed buffer ring
> 
> The existing io_uring_buf_reg structure is extended with a union to
> support both application-provided buffer rings (pbuf) and kernel-managed
> buffer rings (kmbuf):
> - For pbuf rings: ring_addr specifies the user-provided ring address
> - For kmbuf rings: buf_size specifies the size of each buffer. buf_size
>    must be non-zero and page-aligned.
> 
> The implementation follows the same pattern as pbuf ring registration,
> reusing the validation and buffer list allocation helpers introduced in
> earlier refactoring. The IOBL_KERNEL_MANAGED flag marks buffer lists as
> kernel-managed for appropriate handling in the I/O path.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>   include/uapi/linux/io_uring.h |  15 ++++-
>   io_uring/kbuf.c               |  81 ++++++++++++++++++++++++-
>   io_uring/kbuf.h               |   7 ++-
>   io_uring/memmap.c             | 111 ++++++++++++++++++++++++++++++++++
>   io_uring/memmap.h             |   4 ++
>   io_uring/register.c           |   7 +++
>   6 files changed, 219 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index fc473af6feb4..a0889c1744bd 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -715,6 +715,10 @@ enum io_uring_register_op {
>   	/* register bpf filtering programs */
>   	IORING_REGISTER_BPF_FILTER		= 37,
>   
> +	/* register/unregister kernel-managed ring buffer group */
> +	IORING_REGISTER_KMBUF_RING		= 38,
> +	IORING_UNREGISTER_KMBUF_RING		= 39,
> +
>   	/* this goes last */
>   	IORING_REGISTER_LAST,
>   
> @@ -891,9 +895,16 @@ enum io_uring_register_pbuf_ring_flags {
>   	IOU_PBUF_RING_INC	= 2,
>   };
>   
> -/* argument for IORING_(UN)REGISTER_PBUF_RING */
> +/* argument for IORING_(UN)REGISTER_PBUF_RING and
> + * IORING_(UN)REGISTER_KMBUF_RING
> + */
>   struct io_uring_buf_reg {
> -	__u64	ring_addr;
> +	union {
> +		/* used for pbuf rings */
> +		__u64	ring_addr;
> +		/* used for kmbuf rings */
> +		__u32   buf_size;

If you're creating a region, there should be no reason why it
can't work with user passed memory. You're fencing yourself off
optimisations that are already there like huge pages.

> +	};
>   	__u32	ring_entries;
>   	__u16	bgid;
>   	__u16	flags;
> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> index aa9b70b72db4..9bc36451d083 100644
> --- a/io_uring/kbuf.c
> +++ b/io_uring/kbuf.c
...
> +static int io_setup_kmbuf_ring(struct io_ring_ctx *ctx,
> +			       struct io_buffer_list *bl,
> +			       struct io_uring_buf_reg *reg)
> +{
> +	struct io_uring_buf_ring *ring;
> +	unsigned long ring_size;
> +	void *buf_region;
> +	unsigned int i;
> +	int ret;
> +
> +	/* allocate pages for the ring structure */
> +	ring_size = flex_array_size(ring, bufs, bl->nr_entries);
> +	ring = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
> +	if (!ring)
> +		return -ENOMEM;
> +
> +	ret = io_create_region_multi_buf(ctx, &bl->region, bl->nr_entries,
> +					 reg->buf_size);

Please use io_create_region(), the new function does nothing new
and only violates abstractions.

Provided buffer rings with kernel addresses could be an interesting
abstraction, but why is it also responsible for allocating buffers?
What I'd do:

1. Strip buffer allocation from IORING_REGISTER_KMBUF_RING.
2. Replace *_REGISTER_KMBUF_RING with *_REGISTER_PBUF_RING + a new flag.
    Or maybe don't expose it to the user at all and create it from
    fuse via internal API.
3. Require the user to register a memory region of appropriate size,
    see IORING_REGISTER_MEM_REGION, ctx->param_region. Make fuse
    populating the buffer ring using the memory region.

I wanted to make regions shareable anyway (need it for other purposes),
I can toss patches for that tomorrow.

A separate question is whether extending buffer rings is the right
approach as it seems like you're only using it for fuse requests and
not for passing buffers to normal requests, but I don't see the
big picture here.

> +	if (ret) {
> +		kfree(ring);
> +		return ret;
> +	}
> +
> +	/* initialize ring buf entries to point to the buffers */
> +	buf_region = bl->region.ptr;

io_region_get_ptr()

> +	for (i = 0; i < bl->nr_entries; i++) {
> +		struct io_uring_buf *buf = &ring->bufs[i];
> +
> +		buf->addr = (u64)(uintptr_t)buf_region;
> +		buf->len = reg->buf_size;
> +		buf->bid = i;
> +
> +		buf_region += reg->buf_size;
> +	}
> +	ring->tail = bl->nr_entries;
> +
> +	bl->buf_ring = ring;
> +	bl->flags |= IOBL_KERNEL_MANAGED;
> +
> +	return 0;
> +}
> +
> +int io_register_kmbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
> +{
> +	struct io_uring_buf_reg reg;
> +	struct io_buffer_list *bl;
> +	int ret;
> +
> +	lockdep_assert_held(&ctx->uring_lock);
> +
> +	ret = io_copy_and_validate_buf_reg(arg, &reg, 0);
> +	if (ret)
> +		return ret;
> +
> +	if (!reg.buf_size || !PAGE_ALIGNED(reg.buf_size))

With io_create_region_multi_buf() gone, you shouldn't need
to align every buffer, that could be a lot of wasted memory
(thinking about 64KB pages).

> +		return -EINVAL;
> +
> +	bl = io_alloc_new_buffer_list(ctx, &reg);
> +	if (IS_ERR(bl))
> +		return PTR_ERR(bl);
> +
> +	ret = io_setup_kmbuf_ring(ctx, bl, &reg);
> +	if (ret) {
> +		kfree(bl);
> +		return ret;
> +	}
> +
> +	ret = io_buffer_add_list(ctx, bl, reg.bgid);
> +	if (ret)
> +		io_put_bl(ctx, bl);
> +
> +	return ret;

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning
  2026-02-10  1:07   ` Jens Axboe
@ 2026-02-10 17:57     ` Caleb Sander Mateos
  2026-02-10 18:00       ` Jens Axboe
  0 siblings, 1 reply; 33+ messages in thread
From: Caleb Sander Mateos @ 2026-02-10 17:57 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Joanne Koong, io-uring, krisman, bernd, hch, asml.silence,
	linux-fsdevel

On Mon, Feb 9, 2026 at 5:07 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/9/26 5:28 PM, Joanne Koong wrote:
> > +int io_uring_buf_ring_pin(struct io_uring_cmd *cmd, unsigned buf_group,
> > +                       unsigned issue_flags, struct io_buffer_list **bl)
> > +{
> > +     struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> > +     struct io_buffer_list *buffer_list;
> > +     int ret = -EINVAL;
>
> Probably use the usual struct io_buffer_list *bl here and either use an
> ERR_PTR return, or rename the passed on **bl to **blret or something.
>
> > +int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
> > +                    unsigned issue_flags)
> > +{
> > +     struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> > +     struct io_buffer_list *bl;
> > +     int ret = -EINVAL;
> > +
> > +     io_ring_submit_lock(ctx, issue_flags);
> > +
> > +     bl = io_buffer_get_list(ctx, buf_group);
> > +     if (bl && (bl->flags & IOBL_BUF_RING) && (bl->flags & IOBL_PINNED)) {
>
> Usually done as:
>
>         if ((bl->flags & (IOBL_BUF_RING|IOBL_PINNED)) == (IOBL_BUF_RING|IOBL_PINNED))

FWIW, modern compilers will perform this optimization automatically.
They'll even optimize it further to !(~bl->flags &
(IOBL_BUF_RING|IOBL_PINNED)): https://godbolt.org/z/xGoP4TfhP

Best,
Caleb

>
> and maybe then just have an earlier
>
>         if (!bl)
>                 goto err;
>
> > +             bl->flags &= ~IOBL_PINNED;
> > +             ret = 0;
> > +     }
> err:
> > +     io_ring_submit_unlock(ctx, issue_flags);
> > +     return ret;
> > +}
>
> to avoid making it way too long. For io_uring, it's fine to exceed 80
> chars where it makes sense.
>
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning
  2026-02-10 17:57     ` Caleb Sander Mateos
@ 2026-02-10 18:00       ` Jens Axboe
  0 siblings, 0 replies; 33+ messages in thread
From: Jens Axboe @ 2026-02-10 18:00 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Joanne Koong, io-uring, krisman, bernd, hch, asml.silence,
	linux-fsdevel

On 2/10/26 10:57 AM, Caleb Sander Mateos wrote:
> On Mon, Feb 9, 2026 at 5:07?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 2/9/26 5:28 PM, Joanne Koong wrote:
>>> +int io_uring_buf_ring_pin(struct io_uring_cmd *cmd, unsigned buf_group,
>>> +                       unsigned issue_flags, struct io_buffer_list **bl)
>>> +{
>>> +     struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
>>> +     struct io_buffer_list *buffer_list;
>>> +     int ret = -EINVAL;
>>
>> Probably use the usual struct io_buffer_list *bl here and either use an
>> ERR_PTR return, or rename the passed on **bl to **blret or something.
>>
>>> +int io_uring_buf_ring_unpin(struct io_uring_cmd *cmd, unsigned buf_group,
>>> +                    unsigned issue_flags)
>>> +{
>>> +     struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
>>> +     struct io_buffer_list *bl;
>>> +     int ret = -EINVAL;
>>> +
>>> +     io_ring_submit_lock(ctx, issue_flags);
>>> +
>>> +     bl = io_buffer_get_list(ctx, buf_group);
>>> +     if (bl && (bl->flags & IOBL_BUF_RING) && (bl->flags & IOBL_PINNED)) {
>>
>> Usually done as:
>>
>>         if ((bl->flags & (IOBL_BUF_RING|IOBL_PINNED)) == (IOBL_BUF_RING|IOBL_PINNED))
> 
> FWIW, modern compilers will perform this optimization automatically.
> They'll even optimize it further to !(~bl->flags &
> (IOBL_BUF_RING|IOBL_PINNED)): https://godbolt.org/z/xGoP4TfhP

Sure, it's not about that, it's more about the common way of doing it,
which makes it easier to read for people. FWIW, your example is easier
to read too than the original.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-10 16:34   ` Pavel Begunkov
@ 2026-02-10 19:39     ` Joanne Koong
  2026-02-11 12:01       ` Pavel Begunkov
  2026-02-11 15:45     ` Christoph Hellwig
  1 sibling, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-10 19:39 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, io-uring, csander, krisman, bernd, hch, linux-fsdevel

On Tue, Feb 10, 2026 at 8:34 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/10/26 00:28, Joanne Koong wrote:
> > Add support for kernel-managed buffer rings (kmbuf rings), which allow
> > the kernel to allocate and manage the backing buffers for a buffer
> > ring, rather than requiring the application to provide and manage them.
> >
> > This introduces two new registration opcodes:
> > - IORING_REGISTER_KMBUF_RING: Register a kernel-managed buffer ring
> > - IORING_UNREGISTER_KMBUF_RING: Unregister a kernel-managed buffer ring
> >
> > The existing io_uring_buf_reg structure is extended with a union to
> > support both application-provided buffer rings (pbuf) and kernel-managed
> > buffer rings (kmbuf):
> > - For pbuf rings: ring_addr specifies the user-provided ring address
> > - For kmbuf rings: buf_size specifies the size of each buffer. buf_size
> >    must be non-zero and page-aligned.
> >
> > The implementation follows the same pattern as pbuf ring registration,
> > reusing the validation and buffer list allocation helpers introduced in
> > earlier refactoring. The IOBL_KERNEL_MANAGED flag marks buffer lists as
> > kernel-managed for appropriate handling in the I/O path.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >   include/uapi/linux/io_uring.h |  15 ++++-
> >   io_uring/kbuf.c               |  81 ++++++++++++++++++++++++-
> >   io_uring/kbuf.h               |   7 ++-
> >   io_uring/memmap.c             | 111 ++++++++++++++++++++++++++++++++++
> >   io_uring/memmap.h             |   4 ++
> >   io_uring/register.c           |   7 +++
> >   6 files changed, 219 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > index fc473af6feb4..a0889c1744bd 100644
> > --- a/include/uapi/linux/io_uring.h
> > +++ b/include/uapi/linux/io_uring.h
> > @@ -715,6 +715,10 @@ enum io_uring_register_op {
> >       /* register bpf filtering programs */
> >       IORING_REGISTER_BPF_FILTER              = 37,
> >
> > +     /* register/unregister kernel-managed ring buffer group */
> > +     IORING_REGISTER_KMBUF_RING              = 38,
> > +     IORING_UNREGISTER_KMBUF_RING            = 39,
> > +
> >       /* this goes last */
> >       IORING_REGISTER_LAST,
> >
> > @@ -891,9 +895,16 @@ enum io_uring_register_pbuf_ring_flags {
> >       IOU_PBUF_RING_INC       = 2,
> >   };
> >
> > -/* argument for IORING_(UN)REGISTER_PBUF_RING */
> > +/* argument for IORING_(UN)REGISTER_PBUF_RING and
> > + * IORING_(UN)REGISTER_KMBUF_RING
> > + */
> >   struct io_uring_buf_reg {
> > -     __u64   ring_addr;
> > +     union {
> > +             /* used for pbuf rings */
> > +             __u64   ring_addr;
> > +             /* used for kmbuf rings */
> > +             __u32   buf_size;
>
> If you're creating a region, there should be no reason why it
> can't work with user passed memory. You're fencing yourself off
> optimisations that are already there like huge pages.

Are there any optimizations with user-allocated buffers that wouldn't
be possible with kernel-allocated buffers? For huge pages, can't the
kernel do this as well (eg I see in io_mem_alloc_compound(), it calls
into alloc_pages() with order > 0)?

>
> > +     };
> >       __u32   ring_entries;
> >       __u16   bgid;
> >       __u16   flags;
> > diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> > index aa9b70b72db4..9bc36451d083 100644
> > --- a/io_uring/kbuf.c
> > +++ b/io_uring/kbuf.c
> ...
> > +static int io_setup_kmbuf_ring(struct io_ring_ctx *ctx,
> > +                            struct io_buffer_list *bl,
> > +                            struct io_uring_buf_reg *reg)
> > +{
> > +     struct io_uring_buf_ring *ring;
> > +     unsigned long ring_size;
> > +     void *buf_region;
> > +     unsigned int i;
> > +     int ret;
> > +
> > +     /* allocate pages for the ring structure */
> > +     ring_size = flex_array_size(ring, bufs, bl->nr_entries);
> > +     ring = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
> > +     if (!ring)
> > +             return -ENOMEM;
> > +
> > +     ret = io_create_region_multi_buf(ctx, &bl->region, bl->nr_entries,
> > +                                      reg->buf_size);
>
> Please use io_create_region(), the new function does nothing new
> and only violates abstractions.

There's separate checks needed between io_create_region() and
io_create_region_multi_buf() (eg IORING_MEM_REGION_TYPE_USER flag
checking) and different allocation calls (eg
io_region_allocate_pages() vs io_region_allocate_pages_multi_buf()).
Maybe I'm misinterpreting your comment (or the code), but I'm not
seeing how this can just use io_create_region().

>
> Provided buffer rings with kernel addresses could be an interesting
> abstraction, but why is it also responsible for allocating buffers?

Conceptually, I think it makes the interface and lifecycle management
simpler/cleaner. With registering it from userspace, imo there's
additional complications with no tangible benefits, eg it's not
guaranteed that the memory regions registered for the buffers are the
same size, with allocating it from the kernel-side we can guarantee
that the pages are allocated physically contiguously, userspace setup
with user-allocated buffers is less straightforward, etc. In general,
I'm just not really seeing what advantages there are in allocating the
buffers from userspace. Could you elaborate on that part more?

> What I'd do:
>
> 1. Strip buffer allocation from IORING_REGISTER_KMBUF_RING.
> 2. Replace *_REGISTER_KMBUF_RING with *_REGISTER_PBUF_RING + a new flag.
>     Or maybe don't expose it to the user at all and create it from
>     fuse via internal API.

If kmbuf rings are squashed into pbuf rings, then pbuf rings will need
to support pinning. In fuse, there are some contexts where you can't
grab the uring mutex because you're running in atomic context and this
can be encountered while recycling the buffer. I originally had a
patch adding pinning to pbuf rings (to mitigate the overhead of
registered buffers lookups) but dropped it when Jens and Caleb didn't
like the idea. But for kmbuf rings, pinning will be necessary for
fuse.

> 3. Require the user to register a memory region of appropriate size,
>     see IORING_REGISTER_MEM_REGION, ctx->param_region. Make fuse
>     populating the buffer ring using the memory region.
>
> I wanted to make regions shareable anyway (need it for other purposes),
> I can toss patches for that tomorrow.
>
> A separate question is whether extending buffer rings is the right
> approach as it seems like you're only using it for fuse requests and
> not for passing buffers to normal requests, but I don't see the

What are 'normal requests'? For fuse's use case, there are only fuse requests.

Thanks,
Joanne

> big picture here.
>
> > +     if (ret) {
> > +             kfree(ring);
> > +             return ret;
> > +     }
> > +
> > +     /* initialize ring buf entries to point to the buffers */
> > +     buf_region = bl->region.ptr;
>
> io_region_get_ptr()
>
> > +     for (i = 0; i < bl->nr_entries; i++) {
> > +             struct io_uring_buf *buf = &ring->bufs[i];
> > +
> > +             buf->addr = (u64)(uintptr_t)buf_region;
> > +             buf->len = reg->buf_size;
> > +             buf->bid = i;
> > +
> > +             buf_region += reg->buf_size;
> > +     }
> > +     ring->tail = bl->nr_entries;
> > +
> > +     bl->buf_ring = ring;
> > +     bl->flags |= IOBL_KERNEL_MANAGED;
> > +
> > +     return 0;
> > +}
> > +
> > +int io_register_kmbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
> > +{
> > +     struct io_uring_buf_reg reg;
> > +     struct io_buffer_list *bl;
> > +     int ret;
> > +
> > +     lockdep_assert_held(&ctx->uring_lock);
> > +
> > +     ret = io_copy_and_validate_buf_reg(arg, &reg, 0);
> > +     if (ret)
> > +             return ret;
> > +
> > +     if (!reg.buf_size || !PAGE_ALIGNED(reg.buf_size))
>
> With io_create_region_multi_buf() gone, you shouldn't need
> to align every buffer, that could be a lot of wasted memory
> (thinking about 64KB pages).
>
> > +             return -EINVAL;
> > +
> > +     bl = io_alloc_new_buffer_list(ctx, &reg);
> > +     if (IS_ERR(bl))
> > +             return PTR_ERR(bl);
> > +
> > +     ret = io_setup_kmbuf_ring(ctx, bl, &reg);
> > +     if (ret) {
> > +             kfree(bl);
> > +             return ret;
> > +     }
> > +
> > +     ret = io_buffer_add_list(ctx, bl, reg.bgid);
> > +     if (ret)
> > +             io_put_bl(ctx, bl);
> > +
> > +     return ret;
>
> --
> Pavel Begunkov
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection
  2026-02-10  0:53   ` Jens Axboe
@ 2026-02-10 22:36     ` Joanne Koong
  0 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10 22:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: io-uring, csander, krisman, bernd, hch, asml.silence,
	linux-fsdevel

On Mon, Feb 9, 2026 at 4:53 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/9/26 5:28 PM, Joanne Koong wrote:
> > Return the id of the selected buffer in io_buffer_select(). This is
> > needed for kernel-managed buffer rings to later recycle the selected
> > buffer.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  include/linux/io_uring/cmd.h   | 2 +-
> >  include/linux/io_uring_types.h | 2 ++
> >  io_uring/kbuf.c                | 7 +++++--
> >  3 files changed, 8 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> > index d4b5943bdeb1..94df2bdebe77 100644
> > --- a/include/linux/io_uring/cmd.h
> > +++ b/include/linux/io_uring/cmd.h
> > @@ -71,7 +71,7 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
> >
> >  /*
> >   * Select a buffer from the provided buffer group for multishot uring_cmd.
> > - * Returns the selected buffer address and size.
> > + * Returns the selected buffer address, size, and id.
> >   */
> >  struct io_br_sel io_uring_cmd_buffer_select(struct io_uring_cmd *ioucmd,
> >                                           unsigned buf_group, size_t *len,
> > diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> > index 36cc2e0346d9..5a56bb341337 100644
> > --- a/include/linux/io_uring_types.h
> > +++ b/include/linux/io_uring_types.h
> > @@ -100,6 +100,8 @@ struct io_br_sel {
> >               void *kaddr;
> >       };
> >       ssize_t val;
> > +     /* id of the selected buffer */
> > +     unsigned buf_id;
> >  };
>
> I'm probably missing something here, but why can't the caller just use
> req->buf_index for this?

The caller can, but from the caller side they only have access to the
cmd so they would need to do something like

struct io_kiocb *req = cmd_to_iocb_kiocb(ent->cmd);
buf_id = req->buf_index;

which may be kind of ugly with looking inside io-uring internals.
Maybe a helper here would be nicer, something like
io_uring_cmd_buf_id() or io_uring_req_buf_id(). It seemed cleaner to
me to just return the buf id as part of the io_br_sel struct, but I'm
happy to do it another way if you have a preference.

Thanks,
Joanne

>
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 00/11] io_uring: add kernel-managed buffer rings
  2026-02-10  0:55 ` [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Jens Axboe
@ 2026-02-10 22:45   ` Joanne Koong
  0 siblings, 0 replies; 33+ messages in thread
From: Joanne Koong @ 2026-02-10 22:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: io-uring, csander, krisman, bernd, hch, asml.silence,
	linux-fsdevel

On Mon, Feb 9, 2026 at 4:55 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 2/9/26 5:28 PM, Joanne Koong wrote:
> > Currently, io_uring buffer rings require the application to allocate and
> > manage the backing buffers. This series introduces kernel-managed buffer
> > rings, where the kernel allocates and manages the buffers on behalf of
> > the application.
> >
> > This is split out from the fuse over io_uring series in [1], which needs the
> > kernel to own and manage buffers shared between the fuse server and the
> > kernel.
> >
> > This series is on top of the for-next branch in Jens' io-uring tree. The
> > corresponding liburing changes are in [2] and will be submitted after the
> > changes in this patchset are accepted.
>
> Generally looks pretty good - for context, do you have a branch with
> these patches and the users on top too? Makes it a bit easier for cross
> referencing, as some of these really do need an exposed user to make a
> good judgement on the helpers.

Thanks for reviewing the patches. The branch containing the userside
changes on top of these patches is in [1]. I'll make the changes you
pointed out in your other comments as part of v2. Once the discussion
with Pavel is resolved / figured out with the changes he wants for v2,
I'll submit v2.

Thanks,
Joanne

[1] https://github.com/joannekoong/linux/commits/fuse_zero_copy/

>
> I know there's the older series, but I'm assuming the latter patches
> changed somewhat too, and it'd be nicer to look at a current set rather
> than go back to the older ones.
>
> --
> Jens Axboe

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-10 19:39     ` Joanne Koong
@ 2026-02-11 12:01       ` Pavel Begunkov
  2026-02-11 22:06         ` Joanne Koong
  0 siblings, 1 reply; 33+ messages in thread
From: Pavel Begunkov @ 2026-02-11 12:01 UTC (permalink / raw)
  To: Joanne Koong; +Cc: axboe, io-uring, csander, krisman, bernd, hch, linux-fsdevel

On 2/10/26 19:39, Joanne Koong wrote:
> On Tue, Feb 10, 2026 at 8:34 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
...
>>> -/* argument for IORING_(UN)REGISTER_PBUF_RING */
>>> +/* argument for IORING_(UN)REGISTER_PBUF_RING and
>>> + * IORING_(UN)REGISTER_KMBUF_RING
>>> + */
>>>    struct io_uring_buf_reg {
>>> -     __u64   ring_addr;
>>> +     union {
>>> +             /* used for pbuf rings */
>>> +             __u64   ring_addr;
>>> +             /* used for kmbuf rings */
>>> +             __u32   buf_size;
>>
>> If you're creating a region, there should be no reason why it
>> can't work with user passed memory. You're fencing yourself off
>> optimisations that are already there like huge pages.
> 
> Are there any optimizations with user-allocated buffers that wouldn't
> be possible with kernel-allocated buffers? For huge pages, can't the
> kernel do this as well (eg I see in io_mem_alloc_compound(), it calls
> into alloc_pages() with order > 0)?

Yes, there is handful of differences. To name one, 1MB allocation won't
get you a PMD mappable huge page, while user space can allocate 2MB,
register the first 1MB and reuse the rest for other purposes.

>>> +     };
>>>        __u32   ring_entries;
>>>        __u16   bgid;
>>>        __u16   flags;
>>> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
>>> index aa9b70b72db4..9bc36451d083 100644
>>> --- a/io_uring/kbuf.c
>>> +++ b/io_uring/kbuf.c
>> ...
>>> +static int io_setup_kmbuf_ring(struct io_ring_ctx *ctx,
>>> +                            struct io_buffer_list *bl,
>>> +                            struct io_uring_buf_reg *reg)
>>> +{
>>> +     struct io_uring_buf_ring *ring;
>>> +     unsigned long ring_size;
>>> +     void *buf_region;
>>> +     unsigned int i;
>>> +     int ret;
>>> +
>>> +     /* allocate pages for the ring structure */
>>> +     ring_size = flex_array_size(ring, bufs, bl->nr_entries);
>>> +     ring = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
>>> +     if (!ring)
>>> +             return -ENOMEM;
>>> +
>>> +     ret = io_create_region_multi_buf(ctx, &bl->region, bl->nr_entries,
>>> +                                      reg->buf_size);
>>
>> Please use io_create_region(), the new function does nothing new
>> and only violates abstractions.
> 
> There's separate checks needed between io_create_region() and
> io_create_region_multi_buf() (eg IORING_MEM_REGION_TYPE_USER flag

If io_create_region() is too strict, let's discuss that in
examples if there are any, but it's likely not a good idea changing
that. If it's too lax, filter arguments in the caller. IOW, don't
pass IORING_MEM_REGION_TYPE_USER if it's not used.

> checking) and different allocation calls (eg
> io_region_allocate_pages() vs io_region_allocate_pages_multi_buf()).

I saw that and saying that all memmap.c changes can get dropped.
You're using it as one big virtually contig kernel memory range then
chunked into buffers, and that's pretty much what you're getting with
normal io_create_region(). I get that you only need it to be
contiguous within a single buffer, but that's not what you're doing,
and it'll be only worse than default io_create_region() e.g.
effectively disabling any usefulness of io_mem_alloc_compound(),
and ultimately you don't need to care.

Regions shouldn't know anything about your buffers, how it's
subdivided after, etc.

> Maybe I'm misinterpreting your comment (or the code), but I'm not
> seeing how this can just use io_create_region().

struct io_uring_region_desc rd = {};
total_size = nr_bufs * buf_size;
rd.size = PAGE_ALIGN(total_size);
io_create_region(&region, &rd);

Add something like this for user provided memory:

if (use_user_memory) {
	rd.user_addr = uaddr;
	rd.flags |= IORING_MEM_REGION_TYPE_USER;
}

>> Provided buffer rings with kernel addresses could be an interesting
>> abstraction, but why is it also responsible for allocating buffers?
> 
> Conceptually, I think it makes the interface and lifecycle management
> simpler/cleaner. With registering it from userspace, imo there's
> additional complications with no tangible benefits, eg it's not
> guaranteed that the memory regions registered for the buffers are the
> same size, with allocating it from the kernel-side we can guarantee
> that the pages are allocated physically contiguously, userspace setup
> with user-allocated buffers is less straightforward, etc. In general,
> I'm just not really seeing what advantages there are in allocating the
> buffers from userspace. Could you elaborate on that part more?

I don't think I follow. I'm saying that it might be interesting
to separate rings from how and with what they're populated on the
kernel API level, but the fuse kernel module can do the population
and get exactly same layout as you currently have:

int fuse_create_ring(size_t region_offset /* user space argument */) {
	struct io_mapped_region *mr = get_mem_region(ctx);
	// that can take full control of the ring
	ring = grab_empty_ring(io_uring_ctx);

	size = nr_bufs * buf_size;
	if (region_offset + size > get_size(mr)) // + other validation
		return error;

	buf = mr_get_ptr(mr) + offset;
	for (i = 0; i < nr_bufs; i++) {
		ring_push_buffer(ring, buf, buf_size);
		buf += buf_size;
	}
}

fuse might not care, but with empty rings other users will get a
channel they can use to do IO (e.g. read requests) using their
kernel addresses in the future. 	

>> What I'd do:
>>
>> 1. Strip buffer allocation from IORING_REGISTER_KMBUF_RING.
>> 2. Replace *_REGISTER_KMBUF_RING with *_REGISTER_PBUF_RING + a new flag.
>>      Or maybe don't expose it to the user at all and create it from
>>      fuse via internal API.
> 
> If kmbuf rings are squashed into pbuf rings, then pbuf rings will need
> to support pinning. In fuse, there are some contexts where you can't

It'd change uapi but not internals, you already piggy back it
on pbuf implementation and differentiate with a flag.

It could basically be:

if (flags & IOU_PBUF_RING_KM)
	bl->flags |= IOBL_KERNEL_MANAGED;

Pinning can be gated on that flag as well. Pretty likely uapi
and internals will be a bit cleaner, but that's not a huge deal,
just don't see why would you roll out a separate set of uapi
([un]register, offsets, etc.) when essentially it can be treated
as the same thing.

> grab the uring mutex because you're running in atomic context and this
> can be encountered while recycling the buffer. I originally had a
> patch adding pinning to pbuf rings (to mitigate the overhead of
> registered buffers lookups) 

IIRC, you was pinning the registered buffer table and not provided
buffer rings? Which would indeed be a bad idea. Thinking about it,
fwiw, instead of creating multiple registered buffers and trying to
lock the entire table, you could've kept all memory in one larger
registered buffer and pinned only it. It's already refcounted, so
shouldn't have been much of a problem.

> but dropped it when Jens and Caleb didn't
> like the idea. But for kmbuf rings, pinning will be necessary for
> fuse.
> 
>> 3. Require the user to register a memory region of appropriate size,
>>      see IORING_REGISTER_MEM_REGION, ctx->param_region. Make fuse
>>      populating the buffer ring using the memory region.

To explain why, I don't think that creating many small regions
is a good direction going forward. In case of kernel allocation,
it's extra mmap()s, extra user space management, and wasted space.
For user provided memory it's over-accounting and extra memory
footprint. It'll also give you better lifecycle guarantees, i.e.
you won't be able to free buffers while there are requests for the
context. I'm not so sure about ring bound memory, let's say I have
my suspicions, and you'd need to be extra careful about buffer
lifetimes even after a fuse instance dies.

>> I wanted to make regions shareable anyway (need it for other purposes),
>> I can toss patches for that tomorrow.
>>
>> A separate question is whether extending buffer rings is the right
>> approach as it seems like you're only using it for fuse requests and
>> not for passing buffers to normal requests, but I don't see the
> 
> What are 'normal requests'? For fuse's use case, there are only fuse requests.

Any kind of read/recv/etc. that can use provided buffers. It's
where kernel memory filled rings would shine, as you'd be able
to use them together without changing any opcode specific code.
I.e. not changes in read request implementation, only kbuf.c

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-10 16:34   ` Pavel Begunkov
  2026-02-10 19:39     ` Joanne Koong
@ 2026-02-11 15:45     ` Christoph Hellwig
  2026-02-12 10:44       ` Pavel Begunkov
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2026-02-11 15:45 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Joanne Koong, axboe, io-uring, csander, krisman, bernd, hch,
	linux-fsdevel

On Tue, Feb 10, 2026 at 04:34:47PM +0000, Pavel Begunkov wrote:
> > +	union {
> > +		/* used for pbuf rings */
> > +		__u64	ring_addr;
> > +		/* used for kmbuf rings */
> > +		__u32   buf_size;
> 
> If you're creating a region, there should be no reason why it
> can't work with user passed memory. You're fencing yourself off
> optimisations that are already there like huge pages.

Any pages mapped to userspace can be allocated in the kernel as well.

And I really do like this design, because it means we can have a
buffer ring that is only mapped read-only into userspace.  That way
we can still do zero-copy raids if the device requires stable pages
for checksumming or raid.  I was going to implement this as soon
as this series lands upstream.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-11 12:01       ` Pavel Begunkov
@ 2026-02-11 22:06         ` Joanne Koong
  2026-02-12 10:07           ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-11 22:06 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, io-uring, csander, krisman, bernd, hch, linux-fsdevel

On Wed, Feb 11, 2026 at 4:01 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/10/26 19:39, Joanne Koong wrote:
> > On Tue, Feb 10, 2026 at 8:34 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> ...
> >>> -/* argument for IORING_(UN)REGISTER_PBUF_RING */
> >>> +/* argument for IORING_(UN)REGISTER_PBUF_RING and
> >>> + * IORING_(UN)REGISTER_KMBUF_RING
> >>> + */
> >>>    struct io_uring_buf_reg {
> >>> -     __u64   ring_addr;
> >>> +     union {
> >>> +             /* used for pbuf rings */
> >>> +             __u64   ring_addr;
> >>> +             /* used for kmbuf rings */
> >>> +             __u32   buf_size;
> >>
> >> If you're creating a region, there should be no reason why it
> >> can't work with user passed memory. You're fencing yourself off
> >> optimisations that are already there like huge pages.
> >
> > Are there any optimizations with user-allocated buffers that wouldn't
> > be possible with kernel-allocated buffers? For huge pages, can't the
> > kernel do this as well (eg I see in io_mem_alloc_compound(), it calls
> > into alloc_pages() with order > 0)?
>
> Yes, there is handful of differences. To name one, 1MB allocation won't
> get you a PMD mappable huge page, while user space can allocate 2MB,
> register the first 1MB and reuse the rest for other purposes.
>
> >>> +     };
> >>>        __u32   ring_entries;
> >>>        __u16   bgid;
> >>>        __u16   flags;
> >>> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> >>> index aa9b70b72db4..9bc36451d083 100644
> >>> --- a/io_uring/kbuf.c
> >>> +++ b/io_uring/kbuf.c
> >> ...
> >>> +static int io_setup_kmbuf_ring(struct io_ring_ctx *ctx,
> >>> +                            struct io_buffer_list *bl,
> >>> +                            struct io_uring_buf_reg *reg)
> >>> +{
> >>> +     struct io_uring_buf_ring *ring;
> >>> +     unsigned long ring_size;
> >>> +     void *buf_region;
> >>> +     unsigned int i;
> >>> +     int ret;
> >>> +
> >>> +     /* allocate pages for the ring structure */
> >>> +     ring_size = flex_array_size(ring, bufs, bl->nr_entries);
> >>> +     ring = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
> >>> +     if (!ring)
> >>> +             return -ENOMEM;
> >>> +
> >>> +     ret = io_create_region_multi_buf(ctx, &bl->region, bl->nr_entries,
> >>> +                                      reg->buf_size);
> >>
> >> Please use io_create_region(), the new function does nothing new
> >> and only violates abstractions.
> >
> > There's separate checks needed between io_create_region() and
> > io_create_region_multi_buf() (eg IORING_MEM_REGION_TYPE_USER flag
>
> If io_create_region() is too strict, let's discuss that in
> examples if there are any, but it's likely not a good idea changing
> that. If it's too lax, filter arguments in the caller. IOW, don't
> pass IORING_MEM_REGION_TYPE_USER if it's not used.
>
> > checking) and different allocation calls (eg
> > io_region_allocate_pages() vs io_region_allocate_pages_multi_buf()).
>
> I saw that and saying that all memmap.c changes can get dropped.
> You're using it as one big virtually contig kernel memory range then
> chunked into buffers, and that's pretty much what you're getting with
> normal io_create_region(). I get that you only need it to be
> contiguous within a single buffer, but that's not what you're doing,
> and it'll be only worse than default io_create_region() e.g.
> effectively disabling any usefulness of io_mem_alloc_compound(),
> and ultimately you don't need to care.

When I originally implemented it, I had it use
io_region_allocate_pages() but this fails because it's allocating way
too much memory at once. For fuse's use case, each buffer is usually
at least 1 MB if not more. Allocating the memory one buffer a time in
io_region_allocate_pages_multi_buf() bypasses the allocation errors I
was seeing. That's the main reason I don't think this can just use
io_create_region().

>
> Regions shouldn't know anything about your buffers, how it's
> subdivided after, etc.
>
> > Maybe I'm misinterpreting your comment (or the code), but I'm not
> > seeing how this can just use io_create_region().
>
> struct io_uring_region_desc rd = {};
> total_size = nr_bufs * buf_size;
> rd.size = PAGE_ALIGN(total_size);
> io_create_region(&region, &rd);
>
> Add something like this for user provided memory:
>
> if (use_user_memory) {
>         rd.user_addr = uaddr;
>         rd.flags |= IORING_MEM_REGION_TYPE_USER;
> }
>
>
> >> Provided buffer rings with kernel addresses could be an interesting
> >> abstraction, but why is it also responsible for allocating buffers?
> >
> > Conceptually, I think it makes the interface and lifecycle management
> > simpler/cleaner. With registering it from userspace, imo there's
> > additional complications with no tangible benefits, eg it's not
> > guaranteed that the memory regions registered for the buffers are the
> > same size, with allocating it from the kernel-side we can guarantee
> > that the pages are allocated physically contiguously, userspace setup
> > with user-allocated buffers is less straightforward, etc. In general,
> > I'm just not really seeing what advantages there are in allocating the
> > buffers from userspace. Could you elaborate on that part more?
>
> I don't think I follow. I'm saying that it might be interesting
> to separate rings from how and with what they're populated on the
> kernel API level, but the fuse kernel module can do the population

Oh okay, from your first message I (and I think christoph too) thought
what you were saying is that the user should be responsible for
allocating the buffers with complete ownership over them, and then
just pass those allocated to the kernel to use. But what you're saying
is that just use a different way for getting the kernel to allocate
the buffers (eg through the IORING_REGISTER_MEM_REGION interface). Am
I reading this correctly?

> and get exactly same layout as you currently have:
>
> int fuse_create_ring(size_t region_offset /* user space argument */) {
>         struct io_mapped_region *mr = get_mem_region(ctx);
>         // that can take full control of the ring
>         ring = grab_empty_ring(io_uring_ctx);
>
>         size = nr_bufs * buf_size;
>         if (region_offset + size > get_size(mr)) // + other validation
>                 return error;
>
>         buf = mr_get_ptr(mr) + offset;
>         for (i = 0; i < nr_bufs; i++) {
>                 ring_push_buffer(ring, buf, buf_size);
>                 buf += buf_size;
>         }
> }
>
> fuse might not care, but with empty rings other users will get a
> channel they can use to do IO (e.g. read requests) using their
> kernel addresses in the future.
>
> >> What I'd do:
> >>
> >> 1. Strip buffer allocation from IORING_REGISTER_KMBUF_RING.
> >> 2. Replace *_REGISTER_KMBUF_RING with *_REGISTER_PBUF_RING + a new flag.
> >>      Or maybe don't expose it to the user at all and create it from
> >>      fuse via internal API.
> >
> > If kmbuf rings are squashed into pbuf rings, then pbuf rings will need
> > to support pinning. In fuse, there are some contexts where you can't
>
> It'd change uapi but not internals, you already piggy back it
> on pbuf implementation and differentiate with a flag.
>
> It could basically be:
>
> if (flags & IOU_PBUF_RING_KM)
>         bl->flags |= IOBL_KERNEL_MANAGED;
>
> Pinning can be gated on that flag as well. Pretty likely uapi
> and internals will be a bit cleaner, but that's not a huge deal,
> just don't see why would you roll out a separate set of uapi
> ([un]register, offsets, etc.) when essentially it can be treated
> as the same thing.

imo, it looked cleaner as a separate api because it has different
expectations and behaviors and squashing kmbuf into the pbuf api makes
the pbuf api needlessly more complex. Though I guess from the
userspace pov, liburing could have a wrapper that takes care of
setting up the pbuf details for kernel-managed pbufs. But in my head,
having pbufs vs. kmbufs makes it clearer what each one does vs regular
pbufs vs. pbufs that are kernel-managed.

Especially with now having kmbufs go through the ioring mem region
interface, it makes things more confusing imo if they're combined, eg
pbufs that are kernel-managed are created empty and then populated
from the kernel side by whatever subsystem is using them. Right now
there's only one mem region supported per ring, but in the future if
there's the possibility that multiple mem regions can be registered
(eg if userspace doesn't know upfront what mem region length they'll
need), then we should also probably add in a region id param for the
registration arg, which if kmbuf rings go through the pbuf ring
registration api, is not possible to do.

But I'm happy to combine the interfaces and go with your suggestion.
I'll make this change for v2 unless someone else objects.

>
> > grab the uring mutex because you're running in atomic context and this
> > can be encountered while recycling the buffer. I originally had a
> > patch adding pinning to pbuf rings (to mitigate the overhead of
> > registered buffers lookups)
>
> IIRC, you was pinning the registered buffer table and not provided

Yeah, you're right I misremembered and the objections / patch I
dropped was pinning the registered buffer table, not the pbuf ring

> buffer rings? Which would indeed be a bad idea. Thinking about it,
> fwiw, instead of creating multiple registered buffers and trying to
> lock the entire table, you could've kept all memory in one larger
> registered buffer and pinned only it. It's already refcounted, so
> shouldn't have been much of a problem.

Hmm, I'm not sure this idea would work for sparse buffers populated by
the kernel, unless those are automatically pinned too but then from
the user POV for unregistration they'd need to unregister buffers
individually instead of just calling IORING_UNREGISTER_BUFFERS but it
might be annoying for them to now need to know which buffers are
pinned vs not. When i benchmarked the fuse code with vs without pinned
registered buffers, it didn't seem to make much of a difference
performance-wise thankfully, so I just dropped it.

>
> > but dropped it when Jens and Caleb didn't
> > like the idea. But for kmbuf rings, pinning will be necessary for
> > fuse.
> >
> >> 3. Require the user to register a memory region of appropriate size,
> >>      see IORING_REGISTER_MEM_REGION, ctx->param_region. Make fuse
> >>      populating the buffer ring using the memory region.
>
> To explain why, I don't think that creating many small regions
> is a good direction going forward. In case of kernel allocation,
> it's extra mmap()s, extra user space management, and wasted space.

To clarify, is this in reply to why the individual buffers shouldn't
be allocated separately by the kernel?
I added a comment about this above in the discussion about
io_region_allocate_pages_multi_buf(), and if the memory allocation
issue I was seeing is bypassable and the region can be allocated all
at once, I'm happy to make that change. With having the allocation be
separate buffers though, I'm not sure I agree that there are extra
mmaps / userspace management. All the pages across the buffers are
vmapped together and the userspace just needs to do 1 mmap call for
them. On the userspace side, I don't think there's more management
since the mmapped address represents the range across all the buffers.
I'm not seeing how there's wasted space either since the only
requirement is that the buffer size is page aligned. I think also
there's a higher chance of the entire buffer region being physically
contiguous if each buffer is allocated separately vs. all the buffers
are allocated as 1 region. I don't feel strongly about this either way
and I'm happy to allocate the entire region at once if that's
possible.

> For user provided memory it's over-accounting and extra memory
> footprint. It'll also give you better lifecycle guarantees, i.e.

Just out of curiosity, could you elaborate on the over-accounting and
extra memory footprint? I was under the impression it would be the
same since the accounting gets adjusted by the total bytes allocated?
For the extra memory footprint, is the extra footprint from the
metadata to describe each buffer region, or are you referring to
something else?

> you won't be able to free buffers while there are requests for the
> context. I'm not so sure about ring bound memory, let's say I have
> my suspicions, and you'd need to be extra careful about buffer
> lifetimes even after a fuse instance dies.
>
> >> I wanted to make regions shareable anyway (need it for other purposes),
> >> I can toss patches for that tomorrow.
> >>
> >> A separate question is whether extending buffer rings is the right
> >> approach as it seems like you're only using it for fuse requests and
> >> not for passing buffers to normal requests, but I don't see the
> >
> > What are 'normal requests'? For fuse's use case, there are only fuse requests.
>
> Any kind of read/recv/etc. that can use provided buffers. It's
> where kernel memory filled rings would shine, as you'd be able
> to use them together without changing any opcode specific code.
> I.e. not changes in read request implementation, only kbuf.c
>

Thanks for your input on the series. To iterate / sum up, these are
changes for v2 I'll be making:
- api-wise from userspace/liburing: get rid of KMBUF_RING api
interface and have users go through PBUF_RING api instead with a flag
indicating the ring is kernel-managed
- have kernel buffer allocation go through IORING_REGISTER_MEM_REGION
instead, which means when the pbuf ring is created and the
kernel-managed flag is set, the ring will be empty. The memory region
will need to be registered before the mmap call to the ring fd.
- add apis for subsystems to populate a kernel-managed buffer ring
with addresses from the registered mem region

Does this align with your understanding of the conversation as well or
is there anything I'm missing?

And Christoph, do these changes for v2 work for your use case as well?

Thanks,
Joanne
> --
> Pavel Begunkov
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-11 22:06         ` Joanne Koong
@ 2026-02-12 10:07           ` Christoph Hellwig
  2026-02-12 10:52             ` Pavel Begunkov
  0 siblings, 1 reply; 33+ messages in thread
From: Christoph Hellwig @ 2026-02-12 10:07 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Pavel Begunkov, axboe, io-uring, csander, krisman, bernd, hch,
	linux-fsdevel

On Wed, Feb 11, 2026 at 02:06:18PM -0800, Joanne Koong wrote:
> > I don't think I follow. I'm saying that it might be interesting
> > to separate rings from how and with what they're populated on the
> > kernel API level, but the fuse kernel module can do the population
> 
> Oh okay, from your first message I (and I think christoph too) thought
> what you were saying is that the user should be responsible for
> allocating the buffers with complete ownership over them, and then
> just pass those allocated to the kernel to use. But what you're saying
> is that just use a different way for getting the kernel to allocate
> the buffers (eg through the IORING_REGISTER_MEM_REGION interface). Am
> I reading this correctly?

I'm arguing exactly against this.  For my use case I need a setup
where the kernel controls the allocation fully and guarantees user
processes can only read the memory but never write to it.  I'd love
to be able to piggy back than onto your work.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-11 15:45     ` Christoph Hellwig
@ 2026-02-12 10:44       ` Pavel Begunkov
  2026-02-13  7:18         ` Christoph Hellwig
  0 siblings, 1 reply; 33+ messages in thread
From: Pavel Begunkov @ 2026-02-12 10:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joanne Koong, axboe, io-uring, csander, krisman, bernd,
	linux-fsdevel

On 2/11/26 15:45, Christoph Hellwig wrote:
> On Tue, Feb 10, 2026 at 04:34:47PM +0000, Pavel Begunkov wrote:
>>> +	union {
>>> +		/* used for pbuf rings */
>>> +		__u64	ring_addr;
>>> +		/* used for kmbuf rings */
>>> +		__u32   buf_size;
>>
>> If you're creating a region, there should be no reason why it
>> can't work with user passed memory. You're fencing yourself off
>> optimisations that are already there like huge pages.
> 
> Any pages mapped to userspace can be allocated in the kernel as well.

pow2 round ups will waste memory. 1MB allocations will never
become 2MB huge pages. And there is a separate question of
1GB huge pages. The user can be smarter about all placement
decisions.

> And I really do like this design, because it means we can have a
> buffer ring that is only mapped read-only into userspace.  That way
> we can still do zero-copy raids if the device requires stable pages
> for checksumming or raid.  I was going to implement this as soon
> as this series lands upstream.

That's an interesting case. To be clear, user provided memory is
an optional feature for pbuf rings / regions / etc., and I think
the io_uring uapi should leave fields for the feature. However, I
have nothing against fuse refusing to bind to buffer rings it
doesn't like.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-12 10:07           ` Christoph Hellwig
@ 2026-02-12 10:52             ` Pavel Begunkov
  2026-02-12 17:29               ` Joanne Koong
  2026-02-13  7:21               ` Christoph Hellwig
  0 siblings, 2 replies; 33+ messages in thread
From: Pavel Begunkov @ 2026-02-12 10:52 UTC (permalink / raw)
  To: Christoph Hellwig, Joanne Koong
  Cc: axboe, io-uring, csander, krisman, bernd, linux-fsdevel

On 2/12/26 10:07, Christoph Hellwig wrote:
> On Wed, Feb 11, 2026 at 02:06:18PM -0800, Joanne Koong wrote:
>>> I don't think I follow. I'm saying that it might be interesting
>>> to separate rings from how and with what they're populated on the
>>> kernel API level, but the fuse kernel module can do the population
>>
>> Oh okay, from your first message I (and I think christoph too) thought
>> what you were saying is that the user should be responsible for
>> allocating the buffers with complete ownership over them, and then
>> just pass those allocated to the kernel to use. But what you're saying
>> is that just use a different way for getting the kernel to allocate
>> the buffers (eg through the IORING_REGISTER_MEM_REGION interface). Am
>> I reading this correctly?
> 
> I'm arguing exactly against this.  For my use case I need a setup
> where the kernel controls the allocation fully and guarantees user
> processes can only read the memory but never write to it.  I'd love
> to be able to piggy back than onto your work.

IORING_REGISTER_MEM_REGION supports both types of allocations. It can
have a new registration flag for read-only, and then you either make
the bounce avoidance optional or reject binding fuse to unsupported
setups during init. Any arguments against that? I need to go over
Joanne's reply, but I don't see any contradiction in principal with
your use case.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-12 10:52             ` Pavel Begunkov
@ 2026-02-12 17:29               ` Joanne Koong
  2026-02-13  7:27                 ` Christoph Hellwig
  2026-02-13  7:21               ` Christoph Hellwig
  1 sibling, 1 reply; 33+ messages in thread
From: Joanne Koong @ 2026-02-12 17:29 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Christoph Hellwig, axboe, io-uring, csander, krisman, bernd,
	linux-fsdevel

On Thu, Feb 12, 2026 at 2:52 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/12/26 10:07, Christoph Hellwig wrote:
> > On Wed, Feb 11, 2026 at 02:06:18PM -0800, Joanne Koong wrote:
> >>> I don't think I follow. I'm saying that it might be interesting
> >>> to separate rings from how and with what they're populated on the
> >>> kernel API level, but the fuse kernel module can do the population
> >>
> >> Oh okay, from your first message I (and I think christoph too) thought
> >> what you were saying is that the user should be responsible for
> >> allocating the buffers with complete ownership over them, and then
> >> just pass those allocated to the kernel to use. But what you're saying
> >> is that just use a different way for getting the kernel to allocate
> >> the buffers (eg through the IORING_REGISTER_MEM_REGION interface). Am
> >> I reading this correctly?
> >
> > I'm arguing exactly against this.  For my use case I need a setup
> > where the kernel controls the allocation fully and guarantees user
> > processes can only read the memory but never write to it.  I'd love

By "control the allocation fully" do you mean for your use case, the
allocation/setup isn't triggered by userspace but is initiated by the
kernel (eg user never explicitly registers any kbuf ring, the kernel
just uses the kbuf ring data structure internally and users can read
the buffer contents)? If userspace initiates the setup of the kbuf
ring, going through IORING_REGISTER_MEM_REGION would be semantically
the same, except the buffer allocation by the kernel now happens
before the ring is created and then later populated into the ring.
userspace would still need to make an mmap call to the region and the
kernel could enforce that as read-only. But if userspace doesn't
initiate the setup, then going through IORING_REGISTER_MEM_REGION gets
uglier.

> > to be able to piggy back than onto your work.
>
> IORING_REGISTER_MEM_REGION supports both types of allocations. It can
> have a new registration flag for read-only, and then you either make
> the bounce avoidance optional or reject binding fuse to unsupported
> setups during init. Any arguments against that? I need to go over
> Joanne's reply, but I don't see any contradiction in principal with
> your use case.

So i guess the flow would have to be:
a) user calls io_uring_register_region(&ring, &mem_region_reg) with
mem_region_reg.region_uptr's size field set to the total buffer size
(and mem_region_reg.flags read-only bit set if needed)
     kernel allocates region
b) user calls mmap() to get the address of the region. If read-only
bit was set, it gets a read-only address
c) user calls io_uring_register_buf_ring(&ring, &buf_reg, flags) with
buf_reg.flags |= IOU_PBUF_RING_KERNEL_MANAGED
     kernel creates an empty kernel-managed ring. None of the buffers
are populated
d) user tells X subsystem to populate the ring starting from offset Z
in the registered mem region
e) on the kernel side, the subsystem populates the ring starting from
offset Z, filling it up using the buf_size and ring_entries values
that the user registered the ring with in c)

To be completely honest, the more I look at this the more this feels
like overkill / over-engineered to me. I get that now the user can do
the PMD optimization, but does that actually lead to noticeable
performance benefits? It seems especially confusing with them going
through the same pbuf ring interface but having totally different
expectations.

What about adding a straightforward kmbuf ring that goes through the
pbuf interface (eg the design in this patchset) and then in the future
adding an interface for pbuf rings (both kernel-managed and
non-kernel-managed) to go through IORING_REGISTERED_MEM_REGIONS if
users end up needing/wanting to have their rings populated that way?

Thanks,
Joanne

>
> --
> Pavel Begunkov
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-12 10:44       ` Pavel Begunkov
@ 2026-02-13  7:18         ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2026-02-13  7:18 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Christoph Hellwig, Joanne Koong, axboe, io-uring, csander,
	krisman, bernd, linux-fsdevel

On Thu, Feb 12, 2026 at 10:44:44AM +0000, Pavel Begunkov wrote:
> > 
> > Any pages mapped to userspace can be allocated in the kernel as well.
> 
> pow2 round ups will waste memory. 1MB allocations will never
> become 2MB huge pages. And there is a separate question of
> 1GB huge pages. The user can be smarter about all placement
> decisions.

Sure.  But if the application cares that much about TLB pressure
I'd just round up to nice multtiple of PTE levels.

> 
> > And I really do like this design, because it means we can have a
> > buffer ring that is only mapped read-only into userspace.  That way
> > we can still do zero-copy raids if the device requires stable pages
> > for checksumming or raid.  I was going to implement this as soon
> > as this series lands upstream.
> 
> That's an interesting case. To be clear, user provided memory is
> an optional feature for pbuf rings / regions / etc., and I think
> the io_uring uapi should leave fields for the feature. However, I
> have nothing against fuse refusing to bind to buffer rings it
> doesn't like.

Can you clarify what you mean with 'pbuf'?  The only fixed buffer API I
know is io_uring_register_buffers* which always takes user provided
buffers, so I have a hard time parsing what you're saying there.  But
that might just be sign that I'm no expert in io_uring APIs, and that
web searches have degraded to the point of not being very useful
anymore.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-12 10:52             ` Pavel Begunkov
  2026-02-12 17:29               ` Joanne Koong
@ 2026-02-13  7:21               ` Christoph Hellwig
  1 sibling, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2026-02-13  7:21 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Christoph Hellwig, Joanne Koong, axboe, io-uring, csander,
	krisman, bernd, linux-fsdevel

On Thu, Feb 12, 2026 at 10:52:29AM +0000, Pavel Begunkov wrote:
> > I'm arguing exactly against this.  For my use case I need a setup
> > where the kernel controls the allocation fully and guarantees user
> > processes can only read the memory but never write to it.  I'd love
> > to be able to piggy back than onto your work.
> 
> IORING_REGISTER_MEM_REGION supports both types of allocations. It can
> have a new registration flag for read-only, and then you either make

IORING_REGISTER_MEM_REGION seems to be all about cqs from both your
commit message and the public documentation.  I'm confused.

> the bounce avoidance optional or reject binding fuse to unsupported
> setups during init. Any arguments against that? I need to go over
> Joanne's reply, but I don't see any contradiction in principal with
> your use case.

My use case is not about fuse, but good old block and file system
I/O.


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
  2026-02-12 17:29               ` Joanne Koong
@ 2026-02-13  7:27                 ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2026-02-13  7:27 UTC (permalink / raw)
  To: Joanne Koong
  Cc: Pavel Begunkov, Christoph Hellwig, axboe, io-uring, csander,
	krisman, bernd, linux-fsdevel

On Thu, Feb 12, 2026 at 09:29:31AM -0800, Joanne Koong wrote:
> > > I'm arguing exactly against this.  For my use case I need a setup
> > > where the kernel controls the allocation fully and guarantees user
> > > processes can only read the memory but never write to it.  I'd love
> 
> By "control the allocation fully" do you mean for your use case, the
> allocation/setup isn't triggered by userspace but is initiated by the
> kernel (eg user never explicitly registers any kbuf ring, the kernel
> just uses the kbuf ring data structure internally and users can read
> the buffer contents)? If userspace initiates the setup of the kbuf
> ring, going through IORING_REGISTER_MEM_REGION would be semantically
> the same, except the buffer allocation by the kernel now happens
> before the ring is created and then later populated into the ring.
> userspace would still need to make an mmap call to the region and the
> kernel could enforce that as read-only. But if userspace doesn't
> initiate the setup, then going through IORING_REGISTER_MEM_REGION gets
> uglier.

The idea is that the application tells the kernel that it wants to use
a fixed buffer pool for reads.  Right now the application does this
using io_uring_register_buffers().  The problem with that is that
io_uring_register_buffers ends up just doing a pin of the memory,
but the application or, in case of shared memory, someone else could
still modify the memory.  If the underlying file system or storage
device needs verify checksums, or worse rebuild data from parity
(or uncompress), it needs to ensure that the memory it is operating
on can't be modified by someone else.

So I've been thinking of a version of io_uring_register_buffers where
the buffers are not provided by the application, but instead by the
kernel and mapped into the application address space read-only for
a while, and I thought I could implement this on top of your series,
but I have to admit I haven't really looked into the details all
that much.

> 
> To be completely honest, the more I look at this the more this feels
> like overkill / over-engineered to me. I get that now the user can do
> the PMD optimization, but does that actually lead to noticeable
> performance benefits? It seems especially confusing with them going
> through the same pbuf ring interface but having totally different
> expectations.

Yes.  The PMD mapping also is not that relevant.  Both AMD (implicit)
and ARM (explicit) have optimizations for contiguous PTEs that are
almost as valuable.

> What about adding a straightforward kmbuf ring that goes through the
> pbuf interface (eg the design in this patchset) and then in the future
> adding an interface for pbuf rings (both kernel-managed and
> non-kernel-managed) to go through IORING_REGISTERED_MEM_REGIONS if
> users end up needing/wanting to have their rings populated that way?

That feels much simpler to me as well.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2026-02-13  7:27 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-10  0:28 [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Joanne Koong
2026-02-10  0:28 ` [PATCH v1 01/11] io_uring/kbuf: refactor io_register_pbuf_ring() logic into generic helpers Joanne Koong
2026-02-10  0:28 ` [PATCH v1 02/11] io_uring/kbuf: rename io_unregister_pbuf_ring() to io_unregister_buf_ring() Joanne Koong
2026-02-10  0:28 ` [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings Joanne Koong
2026-02-10 16:34   ` Pavel Begunkov
2026-02-10 19:39     ` Joanne Koong
2026-02-11 12:01       ` Pavel Begunkov
2026-02-11 22:06         ` Joanne Koong
2026-02-12 10:07           ` Christoph Hellwig
2026-02-12 10:52             ` Pavel Begunkov
2026-02-12 17:29               ` Joanne Koong
2026-02-13  7:27                 ` Christoph Hellwig
2026-02-13  7:21               ` Christoph Hellwig
2026-02-11 15:45     ` Christoph Hellwig
2026-02-12 10:44       ` Pavel Begunkov
2026-02-13  7:18         ` Christoph Hellwig
2026-02-10  0:28 ` [PATCH v1 04/11] io_uring/kbuf: add mmap " Joanne Koong
2026-02-10  1:02   ` Jens Axboe
2026-02-10  0:28 ` [PATCH v1 05/11] io_uring/kbuf: support kernel-managed buffer rings in buffer selection Joanne Koong
2026-02-10  0:28 ` [PATCH v1 06/11] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
2026-02-10  1:07   ` Jens Axboe
2026-02-10 17:57     ` Caleb Sander Mateos
2026-02-10 18:00       ` Jens Axboe
2026-02-10  0:28 ` [PATCH v1 07/11] io_uring/kbuf: add recycling for kernel managed buffer rings Joanne Koong
2026-02-10  0:52   ` Jens Axboe
2026-02-10  0:28 ` [PATCH v1 08/11] io_uring/kbuf: add io_uring_is_kmbuf_ring() Joanne Koong
2026-02-10  0:28 ` [PATCH v1 09/11] io_uring/kbuf: export io_ring_buffer_select() Joanne Koong
2026-02-10  0:28 ` [PATCH v1 10/11] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
2026-02-10  0:53   ` Jens Axboe
2026-02-10 22:36     ` Joanne Koong
2026-02-10  0:28 ` [PATCH v1 11/11] io_uring/cmd: set selected buffer index in __io_uring_cmd_done() Joanne Koong
2026-02-10  0:55 ` [PATCH v1 00/11] io_uring: add kernel-managed buffer rings Jens Axboe
2026-02-10 22:45   ` Joanne Koong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox