* [PATCHv5 00/11] ublk zero copy support
@ 2025-02-24 21:31 Keith Busch
  2025-02-24 21:31 ` [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu Keith Busch
                   ` (12 more replies)
  0 siblings, 13 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

Changes from v4:

  A few cleanup prep patches from me and Pavel are at the beginning of
  this series.

  Uses Pavel's combined buffer lookup and import. This simplifies
  utilizing fixed buffers a bit later in the series, and obviates any
  need to generically handle fixed buffers. This also fixes up the net
  zero-copy notif assignment that Ming pointed out.

  Included the nvme uring_cmd fix for using kernel registered bvecs from
  Xinyu.

  Used speculation-safe array indexing when registering a new bvec
  (Pavel).

  Encoded the allowed directions as bit flags (Caleb, Pavel).

  Incorporated various cleanups suggested by Caleb.

Keith Busch (7):
  io_uring/rsrc: remove redundant check for valid imu
  io_uring/nop: reuse req->buf_index
  io_uring/rw: move fixed buffer import to issue path
  io_uring: add support for kernel registered bvecs
  ublk: zc register/unregister bvec
  io_uring: add abstraction for buf_table rsrc data
  io_uring: cache nodes and mapped buffers

Pavel Begunkov (3):
  io_uring/net: reuse req->buf_index for sendzc
  io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()
  io_uring: combine buffer lookup and import

Xinyu Zhang (1):
  nvme: map uring_cmd data even if address is 0

 drivers/block/ublk_drv.c       | 117 +++++++++----
 drivers/nvme/host/ioctl.c      |  12 +-
 include/linux/io_uring/cmd.h   |  13 +-
 include/linux/io_uring_types.h |  24 ++-
 include/uapi/linux/ublk_cmd.h  |   4 +
 io_uring/fdinfo.c              |   8 +-
 io_uring/filetable.c           |   2 +-
 io_uring/net.c                 |  25 +--
 io_uring/nop.c                 |   7 +-
 io_uring/opdef.c               |   8 +-
 io_uring/register.c            |   2 +-
 io_uring/rsrc.c                | 304 +++++++++++++++++++++++++++------
 io_uring/rsrc.h                |  16 +-
 io_uring/rw.c                  |  52 +++---
 io_uring/rw.h                  |   4 +-
 io_uring/uring_cmd.c           |  28 +--
 16 files changed, 433 insertions(+), 193 deletions(-)

-- 
2.43.5


* [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  8:37   ` Ming Lei
  2025-02-25 13:13   ` Pavel Begunkov
  2025-02-24 21:31 ` [PATCHv5 02/11] io_uring/nop: reuse req->buf_index Keith Busch
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

The only caller of io_buffer_unmap() already checks that the node's buf
is non-NULL, so there is no need to check again.

Signed-off-by: Keith Busch <[email protected]>
---
 io_uring/rsrc.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 20b884c84e55f..efef29352dcfb 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -103,19 +103,16 @@ int io_buffer_validate(struct iovec *iov)
 
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
+	struct io_mapped_ubuf *imu = node->buf;
 	unsigned int i;
 
-	if (node->buf) {
-		struct io_mapped_ubuf *imu = node->buf;
-
-		if (!refcount_dec_and_test(&imu->refs))
-			return;
-		for (i = 0; i < imu->nr_bvecs; i++)
-			unpin_user_page(imu->bvec[i].bv_page);
-		if (imu->acct_pages)
-			io_unaccount_mem(ctx, imu->acct_pages);
-		kvfree(imu);
-	}
+	if (!refcount_dec_and_test(&imu->refs))
+		return;
+	for (i = 0; i < imu->nr_bvecs; i++)
+		unpin_user_page(imu->bvec[i].bv_page);
+	if (imu->acct_pages)
+		io_unaccount_mem(ctx, imu->acct_pages);
+	kvfree(imu);
 }
 
 struct io_rsrc_node *io_rsrc_node_alloc(int type)
-- 
2.43.5


* [PATCHv5 02/11] io_uring/nop: reuse req->buf_index
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
  2025-02-24 21:31 ` [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-24 23:30   ` Jens Axboe
                     ` (2 more replies)
  2025-02-24 21:31 ` [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc Keith Busch
                   ` (10 subsequent siblings)
  12 siblings, 3 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

There is already a field in io_kiocb that can store a registered buffer
index; use that instead of stashing the value in struct io_nop.

Signed-off-by: Keith Busch <[email protected]>
---
 io_uring/nop.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/io_uring/nop.c b/io_uring/nop.c
index 5e5196df650a1..ea539531cb5f6 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -16,7 +16,6 @@ struct io_nop {
 	struct file     *file;
 	int             result;
 	int		fd;
-	int		buffer;
 	unsigned int	flags;
 };
 
@@ -40,9 +39,7 @@ int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	else
 		nop->fd = -1;
 	if (nop->flags & IORING_NOP_FIXED_BUFFER)
-		nop->buffer = READ_ONCE(sqe->buf_index);
-	else
-		nop->buffer = -1;
+		req->buf_index = READ_ONCE(sqe->buf_index);
 	return 0;
 }
 
@@ -69,7 +66,7 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, nop->buffer);
+		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
 		if (node) {
 			io_req_assign_buf_node(req, node);
 			ret = 0;
-- 
2.43.5


* [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
  2025-02-24 21:31 ` [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu Keith Busch
  2025-02-24 21:31 ` [PATCHv5 02/11] io_uring/nop: reuse req->buf_index Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  8:44   ` Ming Lei
  2025-02-25 13:14   ` Pavel Begunkov
  2025-02-24 21:31 ` [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed() Keith Busch
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Pavel Begunkov <[email protected]>

There is already a field in io_kiocb that can store a registered buffer
index; use that instead of stashing the value in struct io_sr_msg.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/net.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 173546415ed17..fa35a6b58d472 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -76,7 +76,6 @@ struct io_sr_msg {
 	u16				flags;
 	/* initialised and used only by !msg send variants */
 	u16				buf_group;
-	u16				buf_index;
 	bool				retry;
 	void __user			*msg_control;
 	/* used only for send zerocopy */
@@ -1371,7 +1370,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	zc->len = READ_ONCE(sqe->len);
 	zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL | MSG_ZEROCOPY;
-	zc->buf_index = READ_ONCE(sqe->buf_index);
+	req->buf_index = READ_ONCE(sqe->buf_index);
 	if (zc->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 
@@ -1447,7 +1446,7 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
 
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, sr->buf_index);
+		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
 		if (node) {
 			io_req_assign_buf_node(sr->notif, node);
 			ret = 0;
-- 
2.43.5


* [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (2 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  8:52   ` Ming Lei
  2025-02-24 21:31 ` [PATCHv5 05/11] io_uring: combine buffer lookup and import Keith Busch
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Pavel Begunkov <[email protected]>

io_uring_cmd_import_fixed() will need to know the io_uring execution
state in the following commits; for now, just pass issue_flags into it
without actually using it.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 drivers/nvme/host/ioctl.c    | 10 ++++++----
 include/linux/io_uring/cmd.h |  6 ++++--
 io_uring/uring_cmd.c         |  3 ++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index e8930146847af..e0876bc9aacde 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -114,7 +114,8 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 
 static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		struct io_uring_cmd *ioucmd, unsigned int flags)
+		struct io_uring_cmd *ioucmd, unsigned int flags,
+		unsigned int iou_issue_flags)
 {
 	struct request_queue *q = req->q;
 	struct nvme_ns *ns = q->queuedata;
@@ -142,7 +143,8 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		if (WARN_ON_ONCE(flags & NVME_IOCTL_VEC))
 			return -EINVAL;
 		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
-				rq_data_dir(req), &iter, ioucmd);
+				rq_data_dir(req), &iter, ioucmd,
+				iou_issue_flags);
 		if (ret < 0)
 			goto out;
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
@@ -194,7 +196,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	req->timeout = timeout;
 	if (ubuffer && bufflen) {
 		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
-				meta_len, NULL, flags);
+				meta_len, NULL, flags, 0);
 		if (ret)
 			return ret;
 	}
@@ -514,7 +516,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (d.addr && d.data_len) {
 		ret = nvme_map_user_request(req, d.addr,
 			d.data_len, nvme_to_user_ptr(d.metadata),
-			d.metadata_len, ioucmd, vec);
+			d.metadata_len, ioucmd, vec, issue_flags);
 		if (ret)
 			return ret;
 	}
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index abd0c8bd950ba..87150dc0a07cf 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -39,7 +39,8 @@ static inline void io_uring_cmd_private_sz_check(size_t cmd_sz)
 
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd);
+			      struct iov_iter *iter, void *ioucmd,
+			      unsigned int issue_flags);
 
 /*
  * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
@@ -67,7 +68,8 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
 
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd)
+			      struct iov_iter *iter, void *ioucmd,
+			      unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 14086a2664611..28ed69c40756e 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -257,7 +257,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 }
 
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd)
+			      struct iov_iter *iter, void *ioucmd,
+			      unsigned int issue_flags)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 	struct io_rsrc_node *node = req->buf_node;
-- 
2.43.5


* [PATCHv5 05/11] io_uring: combine buffer lookup and import
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (3 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed() Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  8:55   ` Ming Lei
  2025-02-24 21:31 ` [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path Keith Busch
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Pavel Begunkov <[email protected]>

Registered buffers are currently imported in two steps: first we look
up an rsrc node, then use it to set up the iterator. The first part is
usually done at the prep stage, and the import happens whenever it's
needed. As we want to defer binding to a node so that it works with
linked requests, combine both steps into a single helper.

Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/net.c       | 22 ++++------------------
 io_uring/rsrc.c      | 31 ++++++++++++++++++++++++++++++-
 io_uring/rsrc.h      |  6 +++---
 io_uring/rw.c        |  9 +--------
 io_uring/uring_cmd.c | 25 ++++---------------------
 5 files changed, 42 insertions(+), 51 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index fa35a6b58d472..f223721418fac 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1441,24 +1441,10 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
 	int ret;
 
 	if (sr->flags & IORING_RECVSEND_FIXED_BUF) {
-		struct io_ring_ctx *ctx = req->ctx;
-		struct io_rsrc_node *node;
-
-		ret = -EFAULT;
-		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
-		if (node) {
-			io_req_assign_buf_node(sr->notif, node);
-			ret = 0;
-		}
-		io_ring_submit_unlock(ctx, issue_flags);
-
-		if (unlikely(ret))
-			return ret;
-
-		ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter,
-					node->buf, (u64)(uintptr_t)sr->buf,
-					sr->len);
+		sr->notif->buf_index = req->buf_index;
+		ret = io_import_reg_buf(sr->notif, &kmsg->msg.msg_iter,
+					(u64)(uintptr_t)sr->buf, sr->len,
+					ITER_SOURCE, issue_flags);
 		if (unlikely(ret))
 			return ret;
 		kmsg->msg.sg_from_iter = io_sg_from_iter;
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index efef29352dcfb..f814526982c36 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -857,7 +857,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
-int io_import_fixed(int ddir, struct iov_iter *iter,
+static int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len)
 {
@@ -916,6 +916,35 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 	return 0;
 }
 
+static inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
+						    unsigned issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_rsrc_node *node;
+
+	if (req->flags & REQ_F_BUF_NODE)
+		return req->buf_node;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	if (node)
+		io_req_assign_buf_node(req, node);
+	io_ring_submit_unlock(ctx, issue_flags);
+	return node;
+}
+
+int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
+			u64 buf_addr, size_t len, int ddir,
+			unsigned issue_flags)
+{
+	struct io_rsrc_node *node;
+
+	node = io_find_buf_node(req, issue_flags);
+	if (!node)
+		return -EFAULT;
+	return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
+}
+
 /* Lock two rings at once. The rings must be different! */
 static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 2b1e258954092..f0e9080599646 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -44,9 +44,9 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node);
 void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data);
 int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
 
-int io_import_fixed(int ddir, struct iov_iter *iter,
-			   struct io_mapped_ubuf *imu,
-			   u64 buf_addr, size_t len);
+int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
+			u64 buf_addr, size_t len, int ddir,
+			unsigned issue_flags);
 
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 3443f418d9120..db24bcd4c6335 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -352,8 +352,6 @@ static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe
 			    int ddir)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-	struct io_ring_ctx *ctx = req->ctx;
-	struct io_rsrc_node *node;
 	struct io_async_rw *io;
 	int ret;
 
@@ -361,13 +359,8 @@ static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe
 	if (unlikely(ret))
 		return ret;
 
-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
-	if (!node)
-		return -EFAULT;
-	io_req_assign_buf_node(req, node);
-
 	io = req->async_data;
-	ret = io_import_fixed(ddir, &io->iter, node->buf, rw->addr, rw->len);
+	ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir, 0);
 	iov_iter_save_state(&io->iter, &io->iter_state);
 	return ret;
 }
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 28ed69c40756e..31d5e0948af14 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -199,21 +199,9 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (ioucmd->flags & ~IORING_URING_CMD_MASK)
 		return -EINVAL;
 
-	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
-		struct io_ring_ctx *ctx = req->ctx;
-		struct io_rsrc_node *node;
-		u16 index = READ_ONCE(sqe->buf_index);
-
-		node = io_rsrc_node_lookup(&ctx->buf_table, index);
-		if (unlikely(!node))
-			return -EFAULT;
-		/*
-		 * Pi node upfront, prior to io_uring_cmd_import_fixed()
-		 * being called. This prevents destruction of the mapped buffer
-		 * we'll need at actual import time.
-		 */
-		io_req_assign_buf_node(req, node);
-	}
+	if (ioucmd->flags & IORING_URING_CMD_FIXED)
+		req->buf_index = READ_ONCE(sqe->buf_index);
+
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 
 	return io_uring_cmd_prep_setup(req, sqe);
@@ -261,13 +249,8 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      unsigned int issue_flags)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
-	struct io_rsrc_node *node = req->buf_node;
-
-	/* Must have had rsrc_node assigned at prep time */
-	if (node)
-		return io_import_fixed(rw, iter, node->buf, ubuf, len);
 
-	return -EFAULT;
+	return io_import_reg_buf(req, iter, ubuf, len, rw, issue_flags);
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
-- 
2.43.5


* [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (4 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 05/11] io_uring: combine buffer lookup and import Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  9:26   ` Ming Lei
                     ` (2 more replies)
  2025-02-24 21:31 ` [PATCHv5 07/11] io_uring: add support for kernel registered bvecs Keith Busch
                   ` (6 subsequent siblings)
  12 siblings, 3 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

Registered buffers may depend on a linked command, which makes the prep
path too early to import them. Move the import to the issue path, when
the node is actually needed, like all the other users of fixed buffers.
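
As a hedged illustration (not part of this patch), this is the kind of
linked pair the change enables. The register helper below is a
placeholder for whatever command installs a kernel bvec at issue time
(e.g. the ublk command added later in this series); the rest is plain
liburing:

  #include <liburing.h>

  static void queue_linked_fixed_read(struct io_uring *ring, int fd,
                                      unsigned buf_index, unsigned len)
  {
          struct io_uring_sqe *reg = io_uring_get_sqe(ring);
          struct io_uring_sqe *rd = io_uring_get_sqe(ring);

          /* placeholder: a uring_cmd that installs a kernel bvec at
           * 'buf_index' only once it executes */
          prep_register_buf_cmd(reg, buf_index);
          reg->flags |= IOSQE_IO_LINK;

          /* fixed read against the same slot: the buffer is now looked
           * up and imported at issue time, after the linked command has
           * run, rather than at prep time when the slot is still empty */
          io_uring_prep_read_fixed(rd, fd, NULL, len, 0, buf_index);
  }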

Signed-off-by: Keith Busch <[email protected]>
---
 io_uring/opdef.c |  8 ++++----
 io_uring/rw.c    | 43 ++++++++++++++++++++++++++-----------------
 io_uring/rw.h    |  4 ++--
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9344534780a02..5369ae33b5ad9 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -104,8 +104,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
-		.prep			= io_prep_read_fixed,
-		.issue			= io_read,
+		.prep			= io_prep_read,
+		.issue			= io_read_fixed,
 	},
 	[IORING_OP_WRITE_FIXED] = {
 		.needs_file		= 1,
@@ -118,8 +118,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.iopoll			= 1,
 		.iopoll_queue		= 1,
 		.async_size		= sizeof(struct io_async_rw),
-		.prep			= io_prep_write_fixed,
-		.issue			= io_write,
+		.prep			= io_prep_write,
+		.issue			= io_write_fixed,
 	},
 	[IORING_OP_POLL_ADD] = {
 		.needs_file		= 1,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index db24bcd4c6335..5f37fa48fdd9b 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -348,33 +348,20 @@ int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_prep_rwv(req, sqe, ITER_SOURCE);
 }
 
-static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-			    int ddir)
+static int io_init_rw_fixed(struct io_kiocb *req, unsigned int issue_flags, int ddir)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-	struct io_async_rw *io;
+	struct io_async_rw *io = req->async_data;
 	int ret;
 
-	ret = io_prep_rw(req, sqe, ddir, false);
-	if (unlikely(ret))
-		return ret;
+	if (io->bytes_done)
+		return 0;
 
-	io = req->async_data;
 	ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir, 0);
 	iov_iter_save_state(&io->iter, &io->iter_state);
 	return ret;
 }
 
-int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
-	return io_prep_rw_fixed(req, sqe, ITER_DEST);
-}
-
-int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
-	return io_prep_rw_fixed(req, sqe, ITER_SOURCE);
-}
-
 /*
  * Multishot read is prepared just like a normal read/write request, only
  * difference is that we set the MULTISHOT flag.
@@ -1138,6 +1125,28 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	}
 }
 
+int io_read_fixed(struct io_kiocb *req, unsigned int issue_flags)
+{
+	int ret;
+
+	ret = io_init_rw_fixed(req, issue_flags, ITER_DEST);
+	if (ret)
+		return ret;
+
+	return io_read(req, issue_flags);
+}
+
+int io_write_fixed(struct io_kiocb *req, unsigned int issue_flags)
+{
+	int ret;
+
+	ret = io_init_rw_fixed(req, issue_flags, ITER_SOURCE);
+	if (ret)
+		return ret;
+
+	return io_write(req, issue_flags);
+}
+
 void io_rw_fail(struct io_kiocb *req)
 {
 	int res;
diff --git a/io_uring/rw.h b/io_uring/rw.h
index a45e0c71b59d6..42a491d277273 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -30,14 +30,14 @@ struct io_async_rw {
 	);
 };
 
-int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_readv(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_write(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_read(struct io_kiocb *req, unsigned int issue_flags);
 int io_write(struct io_kiocb *req, unsigned int issue_flags);
+int io_read_fixed(struct io_kiocb *req, unsigned int issue_flags);
+int io_write_fixed(struct io_kiocb *req, unsigned int issue_flags);
 void io_readv_writev_cleanup(struct io_kiocb *req);
 void io_rw_fail(struct io_kiocb *req);
 void io_req_rw_complete(struct io_kiocb *req, io_tw_token_t tw);
-- 
2.43.5


* [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (5 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  9:40   ` Ming Lei
                     ` (2 more replies)
  2025-02-24 21:31 ` [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0 Keith Busch
                   ` (5 subsequent siblings)
  12 siblings, 3 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

Provide an interface for the kernel to leverage the existing
pre-registered buffers that io_uring provides. User space can reference
these later to achieve zero-copy IO.

User space must register an empty fixed buffer table with io_uring in
order for the kernel to make use of it.
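
For reference, a minimal userspace sketch (assuming liburing and an
otherwise default ring setup) of registering such an empty table so the
kernel can later install bvecs into its slots:

  #include <liburing.h>

  /* Reserve 64 sparse fixed-buffer slots; the kernel side (e.g. a ublk
   * register command) can then populate individual indices with bvecs. */
  static int setup_zc_ring(struct io_uring *ring)
  {
          int ret = io_uring_queue_init(64, ring, 0);

          if (ret)
                  return ret;
          return io_uring_register_buffers_sparse(ring, 64);
  }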

Signed-off-by: Keith Busch <[email protected]>
---
 include/linux/io_uring/cmd.h |   7 ++
 io_uring/rsrc.c              | 123 +++++++++++++++++++++++++++++++++--
 io_uring/rsrc.h              |   8 +++
 3 files changed, 131 insertions(+), 7 deletions(-)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 87150dc0a07cf..cf8d80d847344 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -4,6 +4,7 @@
 
 #include <uapi/linux/io_uring.h>
 #include <linux/io_uring_types.h>
+#include <linux/blk-mq.h>
 
 /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
 #define IORING_URING_CMD_CANCELABLE	(1U << 30)
@@ -125,4 +126,10 @@ static inline struct io_uring_cmd_data *io_uring_cmd_get_async_data(struct io_ur
 	return cmd_to_io_kiocb(cmd)->async_data;
 }
 
+int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
+			    void (*release)(void *), unsigned int index,
+			    unsigned int issue_flags);
+void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
+			       unsigned int issue_flags);
+
 #endif /* _LINUX_IO_URING_CMD_H */
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index f814526982c36..e0c6ed3aef5b5 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -9,6 +9,7 @@
 #include <linux/hugetlb.h>
 #include <linux/compat.h>
 #include <linux/io_uring.h>
+#include <linux/io_uring/cmd.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -104,14 +105,21 @@ int io_buffer_validate(struct iovec *iov)
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
 	struct io_mapped_ubuf *imu = node->buf;
-	unsigned int i;
 
 	if (!refcount_dec_and_test(&imu->refs))
 		return;
-	for (i = 0; i < imu->nr_bvecs; i++)
-		unpin_user_page(imu->bvec[i].bv_page);
-	if (imu->acct_pages)
-		io_unaccount_mem(ctx, imu->acct_pages);
+
+	if (imu->release) {
+		imu->release(imu->priv);
+	} else {
+		unsigned int i;
+
+		for (i = 0; i < imu->nr_bvecs; i++)
+			unpin_user_page(imu->bvec[i].bv_page);
+		if (imu->acct_pages)
+			io_unaccount_mem(ctx, imu->acct_pages);
+	}
+
 	kvfree(imu);
 }
 
@@ -761,6 +769,9 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	imu->len = iov->iov_len;
 	imu->nr_bvecs = nr_pages;
 	imu->folio_shift = PAGE_SHIFT;
+	imu->release = NULL;
+	imu->priv = NULL;
+	imu->perm = IO_IMU_READABLE | IO_IMU_WRITEABLE;
 	if (coalesced)
 		imu->folio_shift = data.folio_shift;
 	refcount_set(&imu->refs, 1);
@@ -857,6 +868,95 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
+int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
+			    void (*release)(void *), unsigned int index,
+			    unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct req_iterator rq_iter;
+	struct io_mapped_ubuf *imu;
+	struct io_rsrc_node *node;
+	struct bio_vec bv, *bvec;
+	u16 nr_bvecs;
+	int ret = 0;
+
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (index >= data->nr) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+	index = array_index_nospec(index, data->nr);
+
+	if (data->nodes[index] ) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	if (!node) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	nr_bvecs = blk_rq_nr_phys_segments(rq);
+	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+	if (!imu) {
+		kfree(node);
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	imu->ubuf = 0;
+	imu->len = blk_rq_bytes(rq);
+	imu->acct_pages = 0;
+	imu->folio_shift = PAGE_SHIFT;
+	imu->nr_bvecs = nr_bvecs;
+	refcount_set(&imu->refs, 1);
+	imu->release = release;
+	imu->priv = rq;
+
+	if (op_is_write(req_op(rq)))
+		imu->perm = IO_IMU_WRITEABLE;
+	else
+		imu->perm = IO_IMU_READABLE;
+
+	bvec = imu->bvec;
+	rq_for_each_bvec(bv, rq, rq_iter)
+		*bvec++ = bv;
+
+	node->buf = imu;
+	data->nodes[index] = node;
+unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
+
+void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
+			       unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_node *node;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (index >= data->nr)
+		goto unlock;
+	index = array_index_nospec(index, data->nr);
+
+	node = data->nodes[index];
+	if (!node || !node->buf->release)
+		goto unlock;
+
+	io_put_rsrc_node(ctx, node);
+	data->nodes[index] = NULL;
+unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+}
+EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
+
 static int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len)
@@ -871,6 +971,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 	/* not inside the mapped region */
 	if (unlikely(buf_addr < imu->ubuf || buf_end > (imu->ubuf + imu->len)))
 		return -EFAULT;
+	if (!(imu->perm & (1 << ddir)))
+		return -EFAULT;
 
 	/*
 	 * Might not be a start of buffer, set size appropriately
@@ -883,8 +985,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 		/*
 		 * Don't use iov_iter_advance() here, as it's really slow for
 		 * using the latter parts of a big fixed buffer - it iterates
-		 * over each segment manually. We can cheat a bit here, because
-		 * we know that:
+		 * over each segment manually. We can cheat a bit here for user
+		 * registered nodes, because we know that:
 		 *
 		 * 1) it's a BVEC iter, we set it up
 		 * 2) all bvecs are the same in size, except potentially the
@@ -898,8 +1000,15 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 		 */
 		const struct bio_vec *bvec = imu->bvec;
 
+		/*
+		 * Kernel buffer bvecs, on the other hand, don't necessarily
+		 * have the size property of user registered ones, so we have
+		 * to use the slow iter advance.
+		 */
 		if (offset < bvec->bv_len) {
 			iter->iov_offset = offset;
+		} else if (imu->release) {
+			iov_iter_advance(iter, offset);
 		} else {
 			unsigned long seg_skip;
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index f0e9080599646..64bf35667cf9c 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -20,6 +20,11 @@ struct io_rsrc_node {
 	};
 };
 
+enum {
+	IO_IMU_READABLE		= 1 << 0,
+	IO_IMU_WRITEABLE	= 1 << 1,
+};
+
 struct io_mapped_ubuf {
 	u64		ubuf;
 	unsigned int	len;
@@ -27,6 +32,9 @@ struct io_mapped_ubuf {
 	unsigned int    folio_shift;
 	refcount_t	refs;
 	unsigned long	acct_pages;
+	void		(*release)(void *);
+	void		*priv;
+	u8		perm;
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };
 
-- 
2.43.5


* [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (6 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 07/11] io_uring: add support for kernel registered bvecs Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25  9:41   ` Ming Lei
  2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Xinyu Zhang, Keith Busch

From: Xinyu Zhang <[email protected]>

When using kernel registered bvec fixed buffers, the "address" is
actually the offset into the bvec rather than a userspace address, so it
can legitimately be 0.

Skip checking whether the address is NULL before mapping uring_cmd data;
a bad userspace address will still be handled properly later when the
user buffer is imported.

With this patch, the kernel registered bvec fixed buffers can be used in
io_uring NVMe passthru with ublk zero-copy support:
https://lore.kernel.org/io-uring/[email protected]/T/#u
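
For illustration only, a sketch of the passthru setup this enables. The
ring is assumed to use IORING_SETUP_SQE128, 'buf_index' names a slot a
ublk server populated, and the fd, namespace and LBA variables are
assumptions:

  struct nvme_uring_cmd *cmd = (struct nvme_uring_cmd *)sqe->cmd;

  io_uring_prep_rw(IORING_OP_URING_CMD, sqe, nvme_ns_fd, NULL, 0, 0);
  sqe->cmd_op = NVME_URING_CMD_IO;
  sqe->uring_cmd_flags = IORING_URING_CMD_FIXED;
  sqe->buf_index = buf_index;      /* kernel registered bvec slot */

  cmd->opcode = 0x02;              /* NVMe read */
  cmd->nsid = nsid;
  cmd->addr = 0;                   /* offset into the bvec, not a pointer */
  cmd->data_len = data_len;
  cmd->cdw10 = slba & 0xffffffff;  /* starting LBA, low 32 bits */
  cmd->cdw11 = slba >> 32;         /* starting LBA, high 32 bits */
  cmd->cdw12 = nlb - 1;            /* zero-based block count */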

Reviewed-by: Caleb Sander Mateos <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
Signed-off-by: Xinyu Zhang <[email protected]>
---
 drivers/nvme/host/ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index e0876bc9aacde..fe9fb80c6a144 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -513,7 +513,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		return PTR_ERR(req);
 	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
 
-	if (d.addr && d.data_len) {
+	if (d.data_len) {
 		ret = nvme_map_user_request(req, d.addr,
 			d.data_len, nvme_to_user_ptr(d.metadata),
 			d.metadata_len, ioucmd, vec, issue_flags);
-- 
2.43.5


* [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (7 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0 Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25 11:00   ` Ming Lei
                     ` (3 more replies)
  2025-02-24 21:31 ` [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data Keith Busch
                   ` (3 subsequent siblings)
  12 siblings, 4 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

Provide new operations for the user to request mapping an active request
to an io_uring instance's buf_table. The user has to provide the index
at which it wants the buffer installed.

A reference count is taken on the request to ensure it can't be
completed while it is active in a ring's buf_table.
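
A hedged sketch of the userspace side (not part of this patch): how a
ublk server might issue the new register command for one outstanding
request, assuming a liburing ring, the ublk char device fd, and the
queue id/tag of the request being served:

  #include <liburing.h>
  #include <linux/ublk_cmd.h>

  static void ublk_register_req_buf(struct io_uring *ring, int ublk_ch_fd,
                                    __u16 q_id, __u16 tag, __u64 buf_index)
  {
          struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
          struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)sqe->cmd;

          io_uring_prep_rw(IORING_OP_URING_CMD, sqe, ublk_ch_fd, NULL, 0, 0);
          sqe->cmd_op = UBLK_U_IO_REGISTER_IO_BUF;
          cmd->q_id = q_id;
          cmd->tag = tag;
          cmd->addr = buf_index;  /* slot in this ring's buf_table */
  }

The backend I/O against the same index can then be issued (or linked),
followed by UBLK_U_IO_UNREGISTER_IO_BUF once the data is no longer
needed.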

Signed-off-by: Keith Busch <[email protected]>
---
 drivers/block/ublk_drv.c      | 117 +++++++++++++++++++++++-----------
 include/uapi/linux/ublk_cmd.h |   4 ++
 2 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 529085181f355..a719d873e3882 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -51,6 +51,9 @@
 /* private ioctl command mirror */
 #define UBLK_CMD_DEL_DEV_ASYNC	_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
 
+#define UBLK_IO_REGISTER_IO_BUF		_IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
+#define UBLK_IO_UNREGISTER_IO_BUF	_IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
+
 /* All UBLK_F_* have to be included into UBLK_F_ALL */
 #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
 		| UBLK_F_URING_CMD_COMP_IN_TASK \
@@ -201,7 +204,7 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
 						   int tag);
 static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
 {
-	return ub->dev_info.flags & UBLK_F_USER_COPY;
+	return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }
 
 static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
@@ -581,7 +584,7 @@ static void ublk_apply_params(struct ublk_device *ub)
 
 static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
 {
-	return ubq->flags & UBLK_F_USER_COPY;
+	return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }
 
 static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
@@ -1747,6 +1750,77 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
 	io_uring_cmd_mark_cancelable(cmd, issue_flags);
 }
 
+static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
+		struct ublk_queue *ubq, int tag, size_t offset)
+{
+	struct request *req;
+
+	if (!ublk_need_req_ref(ubq))
+		return NULL;
+
+	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+	if (!req)
+		return NULL;
+
+	if (!ublk_get_req_ref(ubq, req))
+		return NULL;
+
+	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
+		goto fail_put;
+
+	if (!ublk_rq_has_data(req))
+		goto fail_put;
+
+	if (offset > blk_rq_bytes(req))
+		goto fail_put;
+
+	return req;
+fail_put:
+	ublk_put_req_ref(ubq, req);
+	return NULL;
+}
+
+static void ublk_io_release(void *priv)
+{
+	struct request *rq = priv;
+	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+
+	ublk_put_req_ref(ubq, rq);
+}
+
+static int ublk_register_io_buf(struct io_uring_cmd *cmd,
+				struct ublk_queue *ubq, unsigned int tag,
+				const struct ublksrv_io_cmd *ub_cmd,
+				unsigned int issue_flags)
+{
+	struct ublk_device *ub = cmd->file->private_data;
+	int index = (int)ub_cmd->addr, ret;
+	struct request *req;
+
+	req = __ublk_check_and_get_req(ub, ubq, tag, 0);
+	if (!req)
+		return -EINVAL;
+
+	ret = io_buffer_register_bvec(cmd, req, ublk_io_release, index,
+				      issue_flags);
+	if (ret) {
+		ublk_put_req_ref(ubq, req);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
+				  const struct ublksrv_io_cmd *ub_cmd,
+				  unsigned int issue_flags)
+{
+	int index = (int)ub_cmd->addr;
+
+	io_buffer_unregister_bvec(cmd, index, issue_flags);
+	return 0;
+}
+
 static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 			       unsigned int issue_flags,
 			       const struct ublksrv_io_cmd *ub_cmd)
@@ -1798,6 +1872,10 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 
 	ret = -EINVAL;
 	switch (_IOC_NR(cmd_op)) {
+	case UBLK_IO_REGISTER_IO_BUF:
+		return ublk_register_io_buf(cmd, ubq, tag, ub_cmd, issue_flags);
+	case UBLK_IO_UNREGISTER_IO_BUF:
+		return ublk_unregister_io_buf(cmd, ub_cmd, issue_flags);
 	case UBLK_IO_FETCH_REQ:
 		/* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
 		if (ublk_queue_ready(ubq)) {
@@ -1872,36 +1950,6 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	return -EIOCBQUEUED;
 }
 
-static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
-		struct ublk_queue *ubq, int tag, size_t offset)
-{
-	struct request *req;
-
-	if (!ublk_need_req_ref(ubq))
-		return NULL;
-
-	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
-	if (!req)
-		return NULL;
-
-	if (!ublk_get_req_ref(ubq, req))
-		return NULL;
-
-	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
-		goto fail_put;
-
-	if (!ublk_rq_has_data(req))
-		goto fail_put;
-
-	if (offset > blk_rq_bytes(req))
-		goto fail_put;
-
-	return req;
-fail_put:
-	ublk_put_req_ref(ubq, req);
-	return NULL;
-}
-
 static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
 		unsigned int issue_flags)
 {
@@ -2527,9 +2575,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 		goto out_free_dev_number;
 	}
 
-	/* We are not ready to support zero copy */
-	ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
-
 	ub->dev_info.nr_hw_queues = min_t(unsigned int,
 			ub->dev_info.nr_hw_queues, nr_cpu_ids);
 	ublk_align_max_io_size(ub);
@@ -2860,7 +2905,7 @@ static int ublk_ctrl_get_features(struct io_uring_cmd *cmd)
 {
 	const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
 	void __user *argp = (void __user *)(unsigned long)header->addr;
-	u64 features = UBLK_F_ALL & ~UBLK_F_SUPPORT_ZERO_COPY;
+	u64 features = UBLK_F_ALL;
 
 	if (header->len != UBLK_FEATURES_LEN || !header->addr)
 		return -EINVAL;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index a8bc98bb69fce..74246c926b55f 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -94,6 +94,10 @@
 	_IOWR('u', UBLK_IO_COMMIT_AND_FETCH_REQ, struct ublksrv_io_cmd)
 #define	UBLK_U_IO_NEED_GET_DATA		\
 	_IOWR('u', UBLK_IO_NEED_GET_DATA, struct ublksrv_io_cmd)
+#define	UBLK_U_IO_REGISTER_IO_BUF	\
+	_IOWR('u', 0x23, struct ublksrv_io_cmd)
+#define	UBLK_U_IO_UNREGISTER_IO_BUF	\
+	_IOWR('u', 0x24, struct ublksrv_io_cmd)
 
 /* only ABORT means that no re-fetch */
 #define UBLK_IO_RES_OK			0
-- 
2.43.5


* [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (8 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25 16:04   ` Pavel Begunkov
  2025-02-24 21:31 ` [PATCHv5 11/11] io_uring: cache nodes and mapped buffers Keith Busch
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

We'll need to add more fields specific to the registered buffers, so
introduce a wrapper struct for them now. No functional change in this
patch.

Reviewed-by: Caleb Sander Mateos <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 include/linux/io_uring_types.h |  6 +++-
 io_uring/fdinfo.c              |  8 +++---
 io_uring/nop.c                 |  2 +-
 io_uring/register.c            |  2 +-
 io_uring/rsrc.c                | 51 +++++++++++++++++-----------------
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c0fe8a00fe53a..a05ae4cb98a4c 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -69,6 +69,10 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };
 
+struct io_buf_table {
+	struct io_rsrc_data	data;
+};
+
 struct io_hash_bucket {
 	struct hlist_head	list;
 } ____cacheline_aligned_in_smp;
@@ -293,7 +297,7 @@ struct io_ring_ctx {
 		struct io_wq_work_list	iopoll_list;
 
 		struct io_file_table	file_table;
-		struct io_rsrc_data	buf_table;
+		struct io_buf_table	buf_table;
 
 		struct io_submit_state	submit_state;
 
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f60d0a9d505e2..d389c06cbce10 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -217,12 +217,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 			seq_puts(m, "\n");
 		}
 	}
-	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.nr);
-	for (i = 0; has_lock && i < ctx->buf_table.nr; i++) {
+	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.data.nr);
+	for (i = 0; has_lock && i < ctx->buf_table.data.nr; i++) {
 		struct io_mapped_ubuf *buf = NULL;
 
-		if (ctx->buf_table.nodes[i])
-			buf = ctx->buf_table.nodes[i]->buf;
+		if (ctx->buf_table.data.nodes[i])
+			buf = ctx->buf_table.data.nodes[i]->buf;
 		if (buf)
 			seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, buf->len);
 		else
diff --git a/io_uring/nop.c b/io_uring/nop.c
index ea539531cb5f6..da8870e00eee7 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -66,7 +66,7 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+		node = io_rsrc_node_lookup(&ctx->buf_table.data, req->buf_index);
 		if (node) {
 			io_req_assign_buf_node(req, node);
 			ret = 0;
diff --git a/io_uring/register.c b/io_uring/register.c
index cc23a4c205cd4..f15a8d52ad30f 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -926,7 +926,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 	ret = __io_uring_register(ctx, opcode, arg, nr_args);
 
 	trace_io_uring_register(ctx, opcode, ctx->file_table.data.nr,
-				ctx->buf_table.nr, ret);
+				ctx->buf_table.data.nr, ret);
 	mutex_unlock(&ctx->uring_lock);
 
 	fput(file);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index e0c6ed3aef5b5..70558317fbb2b 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -236,9 +236,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 	__u32 done;
 	int i, err;
 
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	if (up->offset + nr_args > ctx->buf_table.nr)
+	if (up->offset + nr_args > ctx->buf_table.data.nr)
 		return -EINVAL;
 
 	for (done = 0; done < nr_args; done++) {
@@ -270,9 +270,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 			}
 			node->tag = tag;
 		}
-		i = array_index_nospec(up->offset + done, ctx->buf_table.nr);
-		io_reset_rsrc_node(ctx, &ctx->buf_table, i);
-		ctx->buf_table.nodes[i] = node;
+		i = array_index_nospec(up->offset + done, ctx->buf_table.data.nr);
+		io_reset_rsrc_node(ctx, &ctx->buf_table.data, i);
+		ctx->buf_table.data.nodes[i] = node;
 		if (ctx->compat)
 			user_data += sizeof(struct compat_iovec);
 		else
@@ -550,9 +550,9 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table);
+	io_rsrc_data_free(ctx, &ctx->buf_table.data);
 	return 0;
 }
 
@@ -579,8 +579,8 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 	}
 
 	/* check previously registered pages */
-	for (i = 0; i < ctx->buf_table.nr; i++) {
-		struct io_rsrc_node *node = ctx->buf_table.nodes[i];
+	for (i = 0; i < ctx->buf_table.data.nr; i++) {
+		struct io_rsrc_node *node = ctx->buf_table.data.nodes[i];
 		struct io_mapped_ubuf *imu;
 
 		if (!node)
@@ -809,7 +809,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 
 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
 
-	if (ctx->buf_table.nr)
+	if (ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
@@ -862,7 +862,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		data.nodes[i] = node;
 	}
 
-	ctx->buf_table = data;
+	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -873,7 +873,7 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
 			    unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct req_iterator rq_iter;
 	struct io_mapped_ubuf *imu;
 	struct io_rsrc_node *node;
@@ -938,7 +938,7 @@ void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
 			       unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct io_rsrc_node *node;
 
 	io_ring_submit_lock(ctx, issue_flags);
@@ -1035,7 +1035,7 @@ static inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
 		return req->buf_node;
 
 	io_ring_submit_lock(ctx, issue_flags);
-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	node = io_rsrc_node_lookup(&ctx->buf_table.data, req->buf_index);
 	if (node)
 		io_req_assign_buf_node(req, node);
 	io_ring_submit_unlock(ctx, issue_flags);
@@ -1085,10 +1085,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (!arg->nr && (arg->dst_off || arg->src_off))
 		return -EINVAL;
 	/* not allowed unless REPLACE is set */
-	if (ctx->buf_table.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
+	if (ctx->buf_table.data.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
 		return -EBUSY;
 
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!arg->nr)
 		arg->nr = nbufs;
 	else if (arg->nr > nbufs)
@@ -1098,13 +1098,13 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;
 
-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.nr));
+	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;
 
 	/* Fill entries in data from dst that won't overlap with src */
-	for (i = 0; i < min(arg->dst_off, ctx->buf_table.nr); i++) {
-		struct io_rsrc_node *src_node = ctx->buf_table.nodes[i];
+	for (i = 0; i < min(arg->dst_off, ctx->buf_table.data.nr); i++) {
+		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];
 
 		if (src_node) {
 			data.nodes[i] = src_node;
@@ -1113,7 +1113,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	}
 
 	ret = -ENXIO;
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!nbufs)
 		goto out_free;
 	ret = -EINVAL;
@@ -1133,7 +1133,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	while (nr--) {
 		struct io_rsrc_node *dst_node, *src_node;
 
-		src_node = io_rsrc_node_lookup(&src_ctx->buf_table, i);
+		src_node = io_rsrc_node_lookup(&src_ctx->buf_table.data, i);
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
@@ -1155,7 +1155,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * old and new nodes at this point.
 	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
-		io_rsrc_data_free(ctx, &ctx->buf_table);
+		io_sqe_buffers_unregister(ctx);
 
 	/*
 	 * ctx->buf_table must be empty now - either the contents are being
@@ -1163,10 +1163,9 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * copied to a ring that does not have buffers yet (checked at function
 	 * entry).
 	 */
-	WARN_ON_ONCE(ctx->buf_table.nr);
-	ctx->buf_table = data;
+	WARN_ON_ONCE(ctx->buf_table.data.nr);
+	ctx->buf_table.data = data;
 	return 0;
-
 out_free:
 	io_rsrc_data_free(ctx, &data);
 	return ret;
@@ -1191,7 +1190,7 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 		return -EFAULT;
 	if (buf.flags & ~(IORING_REGISTER_SRC_REGISTERED|IORING_REGISTER_DST_REPLACE))
 		return -EINVAL;
-	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.nr)
+	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (memchr_inv(buf.pad, 0, sizeof(buf.pad)))
 		return -EINVAL;
-- 
2.43.5


* [PATCHv5 11/11] io_uring: cache nodes and mapped buffers
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (9 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data Keith Busch
@ 2025-02-24 21:31 ` Keith Busch
  2025-02-25 13:11   ` Pavel Begunkov
  2025-02-25 14:10 ` [PATCHv5 00/11] ublk zero copy support Pavel Begunkov
  2025-02-25 15:07 ` (subset) " Jens Axboe
  12 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-24 21:31 UTC (permalink / raw)
  To: ming.lei, asml.silence, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

From: Keith Busch <[email protected]>

Frequent alloc/free cycles on rsrc nodes and mapped buffers are pretty
costly. Use an io alloc cache to reuse them more efficiently.

Signed-off-by: Keith Busch <[email protected]>
---
 include/linux/io_uring_types.h |  18 ++---
 io_uring/filetable.c           |   2 +-
 io_uring/rsrc.c                | 120 +++++++++++++++++++++++++--------
 io_uring/rsrc.h                |   2 +-
 4 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index a05ae4cb98a4c..fda3221de2174 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -69,8 +69,18 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };
 
+struct io_alloc_cache {
+	void			**entries;
+	unsigned int		nr_cached;
+	unsigned int		max_cached;
+	unsigned int		elem_size;
+	unsigned int		init_clear;
+};
+
 struct io_buf_table {
 	struct io_rsrc_data	data;
+	struct io_alloc_cache	node_cache;
+	struct io_alloc_cache	imu_cache;
 };
 
 struct io_hash_bucket {
@@ -224,14 +234,6 @@ struct io_submit_state {
 	struct blk_plug		plug;
 };
 
-struct io_alloc_cache {
-	void			**entries;
-	unsigned int		nr_cached;
-	unsigned int		max_cached;
-	unsigned int		elem_size;
-	unsigned int		init_clear;
-};
-
 struct io_ring_ctx {
 	/* const or read-mostly hot data */
 	struct {
diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index dd8eeec97acf6..a21660e3145ab 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -68,7 +68,7 @@ static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
 	if (slot_index >= ctx->file_table.data.nr)
 		return -EINVAL;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 	if (!node)
 		return -ENOMEM;
 
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 70558317fbb2b..43ee821e3f5d0 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -33,6 +33,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 #define IORING_MAX_FIXED_FILES	(1U << 20)
 #define IORING_MAX_REG_BUFFERS	(1U << 14)
 
+#define IO_CACHED_BVECS_SEGS	32
+
 int __io_account_mem(struct user_struct *user, unsigned long nr_pages)
 {
 	unsigned long page_limit, cur_pages, new_pages;
@@ -102,6 +104,22 @@ int io_buffer_validate(struct iovec *iov)
 	return 0;
 }
 
+static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx,
+					   int nr_bvecs)
+{
+	if (nr_bvecs <= IO_CACHED_BVECS_SEGS)
+		return io_cache_alloc(&ctx->buf_table.imu_cache, GFP_KERNEL);
+	return kvmalloc(struct_size_t(struct io_mapped_ubuf, bvec, nr_bvecs),
+			GFP_KERNEL);
+}
+
+static void io_free_imu(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
+{
+	if (imu->nr_bvecs > IO_CACHED_BVECS_SEGS ||
+	    !io_alloc_cache_put(&ctx->buf_table.imu_cache, imu))
+		kvfree(imu);
+}
+
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
 	struct io_mapped_ubuf *imu = node->buf;
@@ -120,22 +138,35 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 			io_unaccount_mem(ctx, imu->acct_pages);
 	}
 
-	kvfree(imu);
+	io_free_imu(ctx, imu);
 }
 
-struct io_rsrc_node *io_rsrc_node_alloc(int type)
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type)
 {
 	struct io_rsrc_node *node;
 
-	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (type == IORING_RSRC_FILE)
+		node = kmalloc(sizeof(*node), GFP_KERNEL);
+	else
+		node = io_cache_alloc(&ctx->buf_table.node_cache, GFP_KERNEL);
 	if (node) {
 		node->type = type;
 		node->refs = 1;
+		node->tag = 0;
+		node->file_ptr = 0;
 	}
 	return node;
 }
 
-__cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data)
+static __cold void __io_rsrc_data_free(struct io_rsrc_data *data)
+{
+	kvfree(data->nodes);
+	data->nodes = NULL;
+	data->nr = 0;
+}
+
+__cold void io_rsrc_data_free(struct io_ring_ctx *ctx,
+			      struct io_rsrc_data *data)
 {
 	if (!data->nr)
 		return;
@@ -143,9 +174,7 @@ __cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data
 		if (data->nodes[data->nr])
 			io_put_rsrc_node(ctx, data->nodes[data->nr]);
 	}
-	kvfree(data->nodes);
-	data->nodes = NULL;
-	data->nr = 0;
+	__io_rsrc_data_free(data);
 }
 
 __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
@@ -159,6 +188,31 @@ __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
 	return -ENOMEM;
 }
 
+static __cold int io_rsrc_buffer_alloc(struct io_buf_table *table, unsigned nr)
+{
+	const int imu_cache_size = struct_size_t(struct io_mapped_ubuf, bvec,
+						 IO_CACHED_BVECS_SEGS);
+	const int node_size = sizeof(struct io_rsrc_node);
+	int ret;
+
+	ret = io_rsrc_data_alloc(&table->data, nr);
+	if (ret)
+		return ret;
+
+	if (io_alloc_cache_init(&table->node_cache, nr, node_size, 0))
+		goto free_data;
+
+	if (io_alloc_cache_init(&table->imu_cache, nr, imu_cache_size, 0))
+		goto free_cache;
+
+	return 0;
+free_cache:
+	io_alloc_cache_free(&table->node_cache, kfree);
+free_data:
+	__io_rsrc_data_free(&table->data);
+	return -ENOMEM;
+}
+
 static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				 struct io_uring_rsrc_update2 *up,
 				 unsigned nr_args)
@@ -208,7 +262,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				err = -EBADF;
 				break;
 			}
-			node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+			node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 			if (!node) {
 				err = -ENOMEM;
 				fput(file);
@@ -460,6 +514,8 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 	case IORING_RSRC_BUFFER:
 		if (node->buf)
 			io_buffer_unmap(ctx, node);
+		if (io_alloc_cache_put(&ctx->buf_table.node_cache, node))
+			return;
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -528,7 +584,7 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 			goto fail;
 		}
 		ret = -ENOMEM;
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			fput(file);
 			goto fail;
@@ -548,11 +604,19 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
+static void io_rsrc_buffer_free(struct io_ring_ctx *ctx,
+				struct io_buf_table *table)
+{
+	io_rsrc_data_free(ctx, &table->data);
+	io_alloc_cache_free(&table->node_cache, kfree);
+	io_alloc_cache_free(&table->imu_cache, kfree);
+}
+
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
 	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table.data);
+	io_rsrc_buffer_free(ctx, &ctx->buf_table);
 	return 0;
 }
 
@@ -733,7 +797,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	if (!iov->iov_base)
 		return NULL;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node)
 		return ERR_PTR(-ENOMEM);
 	node->buf = NULL;
@@ -753,7 +817,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 			coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
 	}
 
-	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+	imu = io_alloc_imu(ctx, nr_pages);
 	if (!imu)
 		goto done;
 
@@ -789,7 +853,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	}
 done:
 	if (ret) {
-		kvfree(imu);
+		io_free_imu(ctx, imu);
 		if (node)
 			io_put_rsrc_node(ctx, node);
 		node = ERR_PTR(ret);
@@ -802,9 +866,9 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned int nr_args, u64 __user *tags)
 {
 	struct page *last_hpage = NULL;
-	struct io_rsrc_data data;
 	struct iovec fast_iov, *iov = &fast_iov;
 	const struct iovec __user *uvec;
+	struct io_buf_table table;
 	int i, ret;
 
 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
@@ -813,13 +877,14 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
-	ret = io_rsrc_data_alloc(&data, nr_args);
+	ret = io_rsrc_buffer_alloc(&table, nr_args);
 	if (ret)
 		return ret;
 
 	if (!arg)
 		memset(iov, 0, sizeof(*iov));
 
+	ctx->buf_table = table;
 	for (i = 0; i < nr_args; i++) {
 		struct io_rsrc_node *node;
 		u64 tag = 0;
@@ -859,10 +924,8 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			}
 			node->tag = tag;
 		}
-		data.nodes[i] = node;
+		table.data.nodes[i] = node;
 	}
-
-	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -894,14 +957,15 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
 		goto unlock;
 	}
 
-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node) {
 		ret = -ENOMEM;
 		goto unlock;
 	}
 
 	nr_bvecs = blk_rq_nr_phys_segments(rq);
-	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+
+	imu = io_alloc_imu(ctx, nr_bvecs);
 	if (!imu) {
 		kfree(node);
 		ret = -ENOMEM;
@@ -1067,7 +1131,7 @@ static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
 			    struct io_uring_clone_buffers *arg)
 {
-	struct io_rsrc_data data;
+	struct io_buf_table table;
 	int i, ret, off, nr;
 	unsigned int nbufs;
 
@@ -1098,7 +1162,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;
 
-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
+	ret = io_rsrc_buffer_alloc(&table, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;
 
@@ -1107,7 +1171,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];
 
 		if (src_node) {
-			data.nodes[i] = src_node;
+			table.data.nodes[i] = src_node;
 			src_node->refs++;
 		}
 	}
@@ -1137,7 +1201,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
-			dst_node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+			dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 			if (!dst_node) {
 				ret = -ENOMEM;
 				goto out_free;
@@ -1146,12 +1210,12 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 			refcount_inc(&src_node->buf->refs);
 			dst_node->buf = src_node->buf;
 		}
-		data.nodes[off++] = dst_node;
+		table.data.nodes[off++] = dst_node;
 		i++;
 	}
 
 	/*
-	 * If asked for replace, put the old table. data->nodes[] holds both
+	 * If asked for replace, put the old table. table.data->nodes[] holds both
 	 * old and new nodes at this point.
 	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
@@ -1164,10 +1228,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * entry).
 	 */
 	WARN_ON_ONCE(ctx->buf_table.data.nr);
-	ctx->buf_table.data = data;
+	ctx->buf_table = table;
 	return 0;
 out_free:
-	io_rsrc_data_free(ctx, &data);
+	io_rsrc_buffer_free(ctx, &table);
 	return ret;
 }
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 64bf35667cf9c..92dd78be9546d 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -47,7 +47,7 @@ struct io_imu_folio_data {
 	unsigned int	nr_folios;
 };
 
-struct io_rsrc_node *io_rsrc_node_alloc(int type);
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
 void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node);
 void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data);
 int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 02/11] io_uring/nop: reuse req->buf_index
  2025-02-24 21:31 ` [PATCHv5 02/11] io_uring/nop: reuse req->buf_index Keith Busch
@ 2025-02-24 23:30   ` Jens Axboe
  2025-02-25  0:02     ` Keith Busch
  2025-02-25  8:43   ` Ming Lei
  2025-02-25 13:13   ` Pavel Begunkov
  2 siblings, 1 reply; 51+ messages in thread
From: Jens Axboe @ 2025-02-24 23:30 UTC (permalink / raw)
  To: Keith Busch, ming.lei, asml.silence, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 2:31 PM, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> There is already a field in io_kiocb that can store a registered buffer
> index, use that instead of stashing the value into struct io_nop.

The only reason it was done this way is that ->buf_index is initially the
buffer group ID, and then the buffer ID once a buffer is selected. But I
_think_ we always restore that and hence don't need to do this anymore;
that should be checked. Maybe you already did?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 02/11] io_uring/nop: reuse req->buf_index
  2025-02-24 23:30   ` Jens Axboe
@ 2025-02-25  0:02     ` Keith Busch
  0 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-25  0:02 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Keith Busch, ming.lei, asml.silence, linux-block, io-uring, bernd,
	csander

On Mon, Feb 24, 2025 at 04:30:48PM -0700, Jens Axboe wrote:
> On 2/24/25 2:31 PM, Keith Busch wrote:
> > From: Keith Busch <[email protected]>
> > 
> > There is already a field in io_kiocb that can store a registered buffer
> > index, use that instead of stashing the value into struct io_nop.
> 
> Only reason it was done this way is that ->buf_index is initially the
> buffer group ID, and then the buffer ID when a buffer is selected. But I
> _think_ we always restore that and hence we don't need to do this
> anymore, should be checked. Maybe you already did?

The IORING_OP_NOP opdef doesn't set the buffer_select flag, so the req
buf_index couldn't be used for a buffer select id.
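
For reference, the relevant opdef entry (abbreviated sketch, other flags
omitted) has no .buffer_select, so req->buf_index is never repurposed as a
selected buffer ID for NOP:

	[IORING_OP_NOP] = {
		/* ..., but no .buffer_select */
		.prep	= io_nop_prep,
		.issue	= io_nop,
	},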

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu
  2025-02-24 21:31 ` [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu Keith Busch
@ 2025-02-25  8:37   ` Ming Lei
  2025-02-25 13:13   ` Pavel Begunkov
  1 sibling, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  8:37 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:06PM -0800, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> The only caller to io_buffer_unmap already checks if the node's buf is
> not null, so no need to check again.
> 
> Signed-off-by: Keith Busch <[email protected]>

Reviewed-by: Ming Lei <[email protected]>


Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 02/11] io_uring/nop: reuse req->buf_index
  2025-02-24 21:31 ` [PATCHv5 02/11] io_uring/nop: reuse req->buf_index Keith Busch
  2025-02-24 23:30   ` Jens Axboe
@ 2025-02-25  8:43   ` Ming Lei
  2025-02-25 13:13   ` Pavel Begunkov
  2 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  8:43 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:07PM -0800, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> There is already a field in io_kiocb that can store a registered buffer
> index, use that instead of stashing the value into struct io_nop.
> 
> Signed-off-by: Keith Busch <[email protected]>

Reviewed-by: Ming Lei <[email protected]>

Thanks, 
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc
  2025-02-24 21:31 ` [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc Keith Busch
@ 2025-02-25  8:44   ` Ming Lei
  2025-02-25 13:14   ` Pavel Begunkov
  1 sibling, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  8:44 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:08PM -0800, Keith Busch wrote:
> From: Pavel Begunkov <[email protected]>
> 
> There is already a field in io_kiocb that can store a registered buffer
> index, use that instead of stashing the value into struct io_sr_msg.
> 
> Reviewed-by: Keith Busch <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Ming Lei <[email protected]>

Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()
  2025-02-24 21:31 ` [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed() Keith Busch
@ 2025-02-25  8:52   ` Ming Lei
  0 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  8:52 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:09PM -0800, Keith Busch wrote:
> From: Pavel Begunkov <[email protected]>
> 
> io_uring_cmd_import_fixed() will need to know the io_uring execution
> state in following commits, for now just pass issue_flags into it
> without actually using.
> 
> Reviewed-by: Keith Busch <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Ming Lei <[email protected]>

Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 05/11] io_uring: combine buffer lookup and import
  2025-02-24 21:31 ` [PATCHv5 05/11] io_uring: combine buffer lookup and import Keith Busch
@ 2025-02-25  8:55   ` Ming Lei
  0 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  8:55 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:10PM -0800, Keith Busch wrote:
> From: Pavel Begunkov <[email protected]>
> 
> Registered buffers are currently imported in two steps, first we lookup
> a rsrc node and then use it to set up the iterator. The first part is
> usually done at the prep stage, and import happens whenever it's needed.
> As we want to defer binding to a node so that it works with linked
> requests, combine both steps into a single helper.
> 
> Reviewed-by: Keith Busch <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>

Reviewed-by: Ming Lei <[email protected]>

Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path
  2025-02-24 21:31 ` [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path Keith Busch
@ 2025-02-25  9:26   ` Ming Lei
  2025-02-25 13:57   ` Pavel Begunkov
  2025-02-25 20:57   ` Caleb Sander Mateos
  2 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  9:26 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:11PM -0800, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Registered buffers may depend on a linked command, which makes the prep
> path too early to import. Move to the issue path when the node is
> actually needed like all the other users of fixed buffers.
> 
> Signed-off-by: Keith Busch <[email protected]>

Reviewed-by: Ming Lei <[email protected]>



Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-24 21:31 ` [PATCHv5 07/11] io_uring: add support for kernel registered bvecs Keith Busch
@ 2025-02-25  9:40   ` Ming Lei
  2025-02-25 17:32     ` Keith Busch
  2025-02-25 14:00   ` Pavel Begunkov
  2025-02-25 20:58   ` Caleb Sander Mateos
  2 siblings, 1 reply; 51+ messages in thread
From: Ming Lei @ 2025-02-25  9:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:12PM -0800, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Provide an interface for the kernel to leverage the existing
> pre-registered buffers that io_uring provides. User space can reference
> these later to achieve zero-copy IO.
> 
> User space must register an empty fixed buffer table with io_uring in
> order for the kernel to make use of it.
> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>  include/linux/io_uring/cmd.h |   7 ++
>  io_uring/rsrc.c              | 123 +++++++++++++++++++++++++++++++++--
>  io_uring/rsrc.h              |   8 +++
>  3 files changed, 131 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> index 87150dc0a07cf..cf8d80d847344 100644
> --- a/include/linux/io_uring/cmd.h
> +++ b/include/linux/io_uring/cmd.h
> @@ -4,6 +4,7 @@
>  
>  #include <uapi/linux/io_uring.h>
>  #include <linux/io_uring_types.h>
> +#include <linux/blk-mq.h>
>  
>  /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
>  #define IORING_URING_CMD_CANCELABLE	(1U << 30)
> @@ -125,4 +126,10 @@ static inline struct io_uring_cmd_data *io_uring_cmd_get_async_data(struct io_ur
>  	return cmd_to_io_kiocb(cmd)->async_data;
>  }
>  
> +int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
> +			    void (*release)(void *), unsigned int index,
> +			    unsigned int issue_flags);
> +void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
> +			       unsigned int issue_flags);
> +
>  #endif /* _LINUX_IO_URING_CMD_H */
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index f814526982c36..e0c6ed3aef5b5 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -9,6 +9,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/compat.h>
>  #include <linux/io_uring.h>
> +#include <linux/io_uring/cmd.h>
>  
>  #include <uapi/linux/io_uring.h>
>  
> @@ -104,14 +105,21 @@ int io_buffer_validate(struct iovec *iov)
>  static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
>  {
>  	struct io_mapped_ubuf *imu = node->buf;
> -	unsigned int i;
>  
>  	if (!refcount_dec_and_test(&imu->refs))
>  		return;
> -	for (i = 0; i < imu->nr_bvecs; i++)
> -		unpin_user_page(imu->bvec[i].bv_page);
> -	if (imu->acct_pages)
> -		io_unaccount_mem(ctx, imu->acct_pages);
> +
> +	if (imu->release) {
> +		imu->release(imu->priv);
> +	} else {
> +		unsigned int i;
> +
> +		for (i = 0; i < imu->nr_bvecs; i++)
> +			unpin_user_page(imu->bvec[i].bv_page);
> +		if (imu->acct_pages)
> +			io_unaccount_mem(ctx, imu->acct_pages);
> +	}
> +
>  	kvfree(imu);
>  }
>  
> @@ -761,6 +769,9 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
>  	imu->len = iov->iov_len;
>  	imu->nr_bvecs = nr_pages;
>  	imu->folio_shift = PAGE_SHIFT;
> +	imu->release = NULL;
> +	imu->priv = NULL;
> +	imu->perm = IO_IMU_READABLE | IO_IMU_WRITEABLE;
>  	if (coalesced)
>  		imu->folio_shift = data.folio_shift;
>  	refcount_set(&imu->refs, 1);
> @@ -857,6 +868,95 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
>  	return ret;
>  }
>  
> +int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
> +			    void (*release)(void *), unsigned int index,
> +			    unsigned int issue_flags)
> +{
> +	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +	struct io_rsrc_data *data = &ctx->buf_table;
> +	struct req_iterator rq_iter;
> +	struct io_mapped_ubuf *imu;
> +	struct io_rsrc_node *node;
> +	struct bio_vec bv, *bvec;
> +	u16 nr_bvecs;
> +	int ret = 0;
> +
> +
> +	io_ring_submit_lock(ctx, issue_flags);
> +	if (index >= data->nr) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +	index = array_index_nospec(index, data->nr);
> +
> +	if (data->nodes[index] ) {
> +		ret = -EBUSY;
> +		goto unlock;
> +	}
> +
> +	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
> +	if (!node) {
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	nr_bvecs = blk_rq_nr_phys_segments(rq);
> +	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
> +	if (!imu) {
> +		kfree(node);
> +		ret = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	imu->ubuf = 0;
> +	imu->len = blk_rq_bytes(rq);
> +	imu->acct_pages = 0;
> +	imu->folio_shift = PAGE_SHIFT;
> +	imu->nr_bvecs = nr_bvecs;
> +	refcount_set(&imu->refs, 1);
> +	imu->release = release;
> +	imu->priv = rq;
> +
> +	if (op_is_write(req_op(rq)))
> +		imu->perm = IO_IMU_WRITEABLE;
> +	else
> +		imu->perm = IO_IMU_READABLE;

The above looks wrong: if the request is a write op, the buffer
should be readable & !writeable.

IO_IMU_WRITEABLE is supposed to mean the buffer is writeable, isn't it?

> +
> +	bvec = imu->bvec;
> +	rq_for_each_bvec(bv, rq, rq_iter)
> +		*bvec++ = bv;
> +
> +	node->buf = imu;
> +	data->nodes[index] = node;
> +unlock:
> +	io_ring_submit_unlock(ctx, issue_flags);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
> +
> +void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
> +			       unsigned int issue_flags)
> +{
> +	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +	struct io_rsrc_data *data = &ctx->buf_table;
> +	struct io_rsrc_node *node;
> +
> +	io_ring_submit_lock(ctx, issue_flags);
> +	if (index >= data->nr)
> +		goto unlock;
> +	index = array_index_nospec(index, data->nr);
> +
> +	node = data->nodes[index];
> +	if (!node || !node->buf->release)
> +		goto unlock;
> +
> +	io_put_rsrc_node(ctx, node);
> +	data->nodes[index] = NULL;
> +unlock:
> +	io_ring_submit_unlock(ctx, issue_flags);
> +}
> +EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
> +
>  static int io_import_fixed(int ddir, struct iov_iter *iter,
>  			   struct io_mapped_ubuf *imu,
>  			   u64 buf_addr, size_t len)
> @@ -871,6 +971,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>  	/* not inside the mapped region */
>  	if (unlikely(buf_addr < imu->ubuf || buf_end > (imu->ubuf + imu->len)))
>  		return -EFAULT;
> +	if (!(imu->perm & (1 << ddir)))
> +		return -EFAULT;
>  
>  	/*
>  	 * Might not be a start of buffer, set size appropriately
> @@ -883,8 +985,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>  		/*
>  		 * Don't use iov_iter_advance() here, as it's really slow for
>  		 * using the latter parts of a big fixed buffer - it iterates
> -		 * over each segment manually. We can cheat a bit here, because
> -		 * we know that:
> +		 * over each segment manually. We can cheat a bit here for user
> +		 * registered nodes, because we know that:
>  		 *
>  		 * 1) it's a BVEC iter, we set it up
>  		 * 2) all bvecs are the same in size, except potentially the
> @@ -898,8 +1000,15 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>  		 */
>  		const struct bio_vec *bvec = imu->bvec;
>  
> +		/*
> +		 * Kernel buffer bvecs, on the other hand, don't necessarily
> +		 * have the size property of user registered ones, so we have
> +		 * to use the slow iter advance.
> +		 */
>  		if (offset < bvec->bv_len) {
>  			iter->iov_offset = offset;
> +		} else if (imu->release) {
> +			iov_iter_advance(iter, offset);
>  		} else {
>  			unsigned long seg_skip;
>  
> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
> index f0e9080599646..64bf35667cf9c 100644
> --- a/io_uring/rsrc.h
> +++ b/io_uring/rsrc.h
> @@ -20,6 +20,11 @@ struct io_rsrc_node {
>  	};
>  };
>  
> +enum {
> +	IO_IMU_READABLE		= 1 << 0,
> +	IO_IMU_WRITEABLE	= 1 << 1,
> +};
> +

The above definition could be wrong too: IO_IMU_READABLE is supposed to
mean that the buffer is readable, but it is aligned with 1 << ITER_DEST.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0
  2025-02-24 21:31 ` [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0 Keith Busch
@ 2025-02-25  9:41   ` Ming Lei
  0 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25  9:41 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Xinyu Zhang, Keith Busch

On Mon, Feb 24, 2025 at 01:31:13PM -0800, Keith Busch wrote:
> From: Xinyu Zhang <[email protected]>
> 
> When using kernel registered bvec fixed buffers, the "address" is
> actually the offset into the bvec rather than userspace address.
> Therefore it can be 0.
> We can skip checking whether the address is NULL before mapping
> uring_cmd data. Bad userspace address will be handled properly later when
> the user buffer is imported.
> With this patch, we will be able to use the kernel registered bvec fixed
> buffers in io_uring NVMe passthru with ublk zero-copy support in
> https://lore.kernel.org/io-uring/[email protected]/T/#u.
> 
> Reviewed-by: Caleb Sander Mateos <[email protected]>
> Reviewed-by: Jens Axboe <[email protected]>
> Reviewed-by: Keith Busch <[email protected]>
> Signed-off-by: Xinyu Zhang <[email protected]>

Reviewed-by: Ming Lei <[email protected]>


Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
@ 2025-02-25 11:00   ` Ming Lei
  2025-02-25 16:35     ` Keith Busch
  2025-02-25 16:19   ` Pavel Begunkov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 51+ messages in thread
From: Ming Lei @ 2025-02-25 11:00 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:14PM -0800, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Provide new operations for the user to request mapping an active request
> to an io uring instance's buf_table. The user has to provide the index
> it wants to install the buffer.
> 
> A reference count is taken on the request to ensure it can't be
> completed while it is active in a ring's buf_table.
> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>  drivers/block/ublk_drv.c      | 117 +++++++++++++++++++++++-----------
>  include/uapi/linux/ublk_cmd.h |   4 ++
>  2 files changed, 85 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 529085181f355..a719d873e3882 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -51,6 +51,9 @@
>  /* private ioctl command mirror */
>  #define UBLK_CMD_DEL_DEV_ASYNC	_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
>  
> +#define UBLK_IO_REGISTER_IO_BUF		_IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> +#define UBLK_IO_UNREGISTER_IO_BUF	_IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> +
>  /* All UBLK_F_* have to be included into UBLK_F_ALL */
>  #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
>  		| UBLK_F_URING_CMD_COMP_IN_TASK \
> @@ -201,7 +204,7 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
>  						   int tag);
>  static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
>  {
> -	return ub->dev_info.flags & UBLK_F_USER_COPY;
> +	return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
>  }

I'd suggest setting UBLK_F_USER_COPY explicitly, either from userspace or
the kernel side.

One reason is that UBLK_F_UNPRIVILEGED_DEV mode can't work for both.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 11/11] io_uring: cache nodes and mapped buffers
  2025-02-24 21:31 ` [PATCHv5 11/11] io_uring: cache nodes and mapped buffers Keith Busch
@ 2025-02-25 13:11   ` Pavel Begunkov
  0 siblings, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 13:11 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Frequent alloc/free cycles on these are pretty costly. Use an io cache to
> more efficiently reuse these buffers.
> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>   include/linux/io_uring_types.h |  18 ++---
>   io_uring/filetable.c           |   2 +-
>   io_uring/rsrc.c                | 120 +++++++++++++++++++++++++--------
>   io_uring/rsrc.h                |   2 +-
>   4 files changed, 104 insertions(+), 38 deletions(-)
> 
...
> +static __cold int io_rsrc_buffer_alloc(struct io_buf_table *table, unsigned nr)
> +{
> +	const int imu_cache_size = struct_size_t(struct io_mapped_ubuf, bvec,
> +						 IO_CACHED_BVECS_SEGS);
> +	const int node_size = sizeof(struct io_rsrc_node);
> +	int ret;
> +
> +	ret = io_rsrc_data_alloc(&table->data, nr);
> +	if (ret)
> +		return ret;
> +
> +	if (io_alloc_cache_init(&table->node_cache, nr, node_size, 0))

We shouldn't use nr for the cache size; that could be unreasonably
huge for a cache. Let's use a constant instead, which should be
good enough, at least for now. 64 would be a good default. Same for
the imu cache.
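
Roughly (just a sketch, the constant name here is made up):

	/* cap the caches at a fixed size rather than sizing them by nr */
	#define IO_RSRC_CACHE_MAX_ENTRIES	64

	if (io_alloc_cache_init(&table->node_cache, IO_RSRC_CACHE_MAX_ENTRIES,
				node_size, 0))
		goto free_data;

	if (io_alloc_cache_init(&table->imu_cache, IO_RSRC_CACHE_MAX_ENTRIES,
				imu_cache_size, 0))
		goto free_cache;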

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu
  2025-02-24 21:31 ` [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu Keith Busch
  2025-02-25  8:37   ` Ming Lei
@ 2025-02-25 13:13   ` Pavel Begunkov
  1 sibling, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 13:13 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> The only caller to io_buffer_unmap already checks if the node's buf is
> not null, so no need to check again.

Reviewed-by: Pavel Begunkov <[email protected]>

> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>   io_uring/rsrc.c | 19 ++++++++-----------
>   1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 20b884c84e55f..efef29352dcfb 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -103,19 +103,16 @@ int io_buffer_validate(struct iovec *iov)
>   
>   static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
>   {
> +	struct io_mapped_ubuf *imu = node->buf;
>   	unsigned int i;
>   
> -	if (node->buf) {
> -		struct io_mapped_ubuf *imu = node->buf;
> -
> -		if (!refcount_dec_and_test(&imu->refs))
> -			return;
> -		for (i = 0; i < imu->nr_bvecs; i++)
> -			unpin_user_page(imu->bvec[i].bv_page);
> -		if (imu->acct_pages)
> -			io_unaccount_mem(ctx, imu->acct_pages);
> -		kvfree(imu);
> -	}
> +	if (!refcount_dec_and_test(&imu->refs))
> +		return;
> +	for (i = 0; i < imu->nr_bvecs; i++)
> +		unpin_user_page(imu->bvec[i].bv_page);
> +	if (imu->acct_pages)
> +		io_unaccount_mem(ctx, imu->acct_pages);
> +	kvfree(imu);
>   }
>   
>   struct io_rsrc_node *io_rsrc_node_alloc(int type)

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 02/11] io_uring/nop: reuse req->buf_index
  2025-02-24 21:31 ` [PATCHv5 02/11] io_uring/nop: reuse req->buf_index Keith Busch
  2025-02-24 23:30   ` Jens Axboe
  2025-02-25  8:43   ` Ming Lei
@ 2025-02-25 13:13   ` Pavel Begunkov
  2 siblings, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 13:13 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> There is already a field in io_kiocb that can store a registered buffer
> index, use that instead of stashing the value into struct io_nop.

Reviewed-by: Pavel Begunkov <[email protected]>

> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>   io_uring/nop.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/io_uring/nop.c b/io_uring/nop.c
> index 5e5196df650a1..ea539531cb5f6 100644
> --- a/io_uring/nop.c
> +++ b/io_uring/nop.c
> @@ -16,7 +16,6 @@ struct io_nop {
>   	struct file     *file;
>   	int             result;
>   	int		fd;
> -	int		buffer;
>   	unsigned int	flags;
>   };
>   
> @@ -40,9 +39,7 @@ int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   	else
>   		nop->fd = -1;
>   	if (nop->flags & IORING_NOP_FIXED_BUFFER)
> -		nop->buffer = READ_ONCE(sqe->buf_index);
> -	else
> -		nop->buffer = -1;
> +		req->buf_index = READ_ONCE(sqe->buf_index);
>   	return 0;
>   }
>   
> @@ -69,7 +66,7 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
>   
>   		ret = -EFAULT;
>   		io_ring_submit_lock(ctx, issue_flags);
> -		node = io_rsrc_node_lookup(&ctx->buf_table, nop->buffer);
> +		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
>   		if (node) {
>   			io_req_assign_buf_node(req, node);
>   			ret = 0;

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc
  2025-02-24 21:31 ` [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc Keith Busch
  2025-02-25  8:44   ` Ming Lei
@ 2025-02-25 13:14   ` Pavel Begunkov
  1 sibling, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 13:14 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Pavel Begunkov <[email protected]>
> 
> There is already a field in io_kiocb that can store a registered buffer
> index, use that instead of stashing the value into struct io_sr_msg.

Reviewed-by: Pavel Begunkov <[email protected]>

> 
> Reviewed-by: Keith Busch <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
>   io_uring/net.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/io_uring/net.c b/io_uring/net.c
> index 173546415ed17..fa35a6b58d472 100644
> --- a/io_uring/net.c
> +++ b/io_uring/net.c
> @@ -76,7 +76,6 @@ struct io_sr_msg {
>   	u16				flags;
>   	/* initialised and used only by !msg send variants */
>   	u16				buf_group;
> -	u16				buf_index;
>   	bool				retry;
>   	void __user			*msg_control;
>   	/* used only for send zerocopy */
> @@ -1371,7 +1370,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>   
>   	zc->len = READ_ONCE(sqe->len);
>   	zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL | MSG_ZEROCOPY;
> -	zc->buf_index = READ_ONCE(sqe->buf_index);
> +	req->buf_index = READ_ONCE(sqe->buf_index);
>   	if (zc->msg_flags & MSG_DONTWAIT)
>   		req->flags |= REQ_F_NOWAIT;
>   
> @@ -1447,7 +1446,7 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
>   
>   		ret = -EFAULT;
>   		io_ring_submit_lock(ctx, issue_flags);
> -		node = io_rsrc_node_lookup(&ctx->buf_table, sr->buf_index);
> +		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
>   		if (node) {
>   			io_req_assign_buf_node(sr->notif, node);
>   			ret = 0;

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path
  2025-02-24 21:31 ` [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path Keith Busch
  2025-02-25  9:26   ` Ming Lei
@ 2025-02-25 13:57   ` Pavel Begunkov
  2025-02-25 20:57   ` Caleb Sander Mateos
  2 siblings, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 13:57 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Registered buffers may depend on a linked command, which makes the prep
> path too early to import. Move to the issue path when the node is
> actually needed like all the other users of fixed buffers.
> 
> Signed-off-by: Keith Busch <[email protected]>
> ---
>   io_uring/opdef.c |  8 ++++----
>   io_uring/rw.c    | 43 ++++++++++++++++++++++++++-----------------
>   io_uring/rw.h    |  4 ++--
>   3 files changed, 32 insertions(+), 23 deletions(-)
> 
> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
> index 9344534780a02..5369ae33b5ad9 100644
> --- a/io_uring/opdef.c
> +++ b/io_uring/opdef.c
> @@ -104,8 +104,8 @@ const struct io_issue_def io_issue_defs[] = {
>   		.iopoll			= 1,
>   		.iopoll_queue		= 1,
>   		.async_size		= sizeof(struct io_async_rw),
> -		.prep			= io_prep_read_fixed,
> -		.issue			= io_read,
> +		.prep			= io_prep_read,

io_prep_read_fixed() -> io_init_rw_fixed() -> io_prep_rw(do_import=false)
after:
io_prep_read() -> io_prep_rw(do_import=true)

This change flips do_import. I'd say let's just remove the importing
bits from io_prep_rw_fixed(), and that should be good for now.
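
I.e. keep the fixed-buffer prep path but drop only the early import,
along the lines of (sketch):

	static int io_prep_rw_fixed(struct io_kiocb *req,
				    const struct io_uring_sqe *sqe, int ddir)
	{
		/* do_import == false: the buffer is bound at issue time instead */
		return io_prep_rw(req, sqe, ddir, false);
	}

	int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
	{
		return io_prep_rw_fixed(req, sqe, ITER_DEST);
	}

	int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
	{
		return io_prep_rw_fixed(req, sqe, ITER_SOURCE);
	}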

Apart from that:

Reviewed-by: Pavel Begunkov <[email protected]>

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-24 21:31 ` [PATCHv5 07/11] io_uring: add support for kernel registered bvecs Keith Busch
  2025-02-25  9:40   ` Ming Lei
@ 2025-02-25 14:00   ` Pavel Begunkov
  2025-02-25 14:05     ` Pavel Begunkov
  2025-02-25 20:58   ` Caleb Sander Mateos
  2 siblings, 1 reply; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 14:00 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
...
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index f814526982c36..e0c6ed3aef5b5 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -9,6 +9,7 @@
...
> +int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
> +			    void (*release)(void *), unsigned int index,
> +			    unsigned int issue_flags)
> +{
> +	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +	struct io_rsrc_data *data = &ctx->buf_table;
> +	struct req_iterator rq_iter;
> +	struct io_mapped_ubuf *imu;
> +	struct io_rsrc_node *node;
> +	struct bio_vec bv, *bvec;
> +	u16 nr_bvecs;
> +	int ret = 0;
> +
> +

nit: extra new line

> +	io_ring_submit_lock(ctx, issue_flags);
> +	if (index >= data->nr) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +	index = array_index_nospec(index, data->nr);
> +
> +	if (data->nodes[index] ) {

nit: extra space

> +		ret = -EBUSY;
> +		goto unlock;
> +	}
> +
...
> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
> index f0e9080599646..64bf35667cf9c 100644
> --- a/io_uring/rsrc.h
> +++ b/io_uring/rsrc.h
> @@ -20,6 +20,11 @@ struct io_rsrc_node {
>   	};
>   };
>   
> +enum {
> +	IO_IMU_READABLE		= 1 << 0,
> +	IO_IMU_WRITEABLE	= 1 << 1,

1 << READ, 1 << WRITE

And let's add BUILD_BUG_ON that they fit into u8.
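
Something like (sketch; READ/WRITE are the kernel's data direction
constants, which line up with ITER_DEST/ITER_SOURCE so the (1 << ddir)
check at import time keeps working):

	enum {
		IO_IMU_READABLE		= 1 << READ,	/* usable as an ITER_DEST import */
		IO_IMU_WRITEABLE	= 1 << WRITE,	/* usable as an ITER_SOURCE import */
	};

	/* e.g. at registration time, assuming ->perm stays a u8 */
	BUILD_BUG_ON((IO_IMU_READABLE | IO_IMU_WRITEABLE) > U8_MAX);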

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-25 14:00   ` Pavel Begunkov
@ 2025-02-25 14:05     ` Pavel Begunkov
  0 siblings, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 14:05 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/25/25 14:00, Pavel Begunkov wrote:
> On 2/24/25 21:31, Keith Busch wrote:
>> From: Keith Busch <[email protected]>
...
>> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
>> index f0e9080599646..64bf35667cf9c 100644
>> --- a/io_uring/rsrc.h
>> +++ b/io_uring/rsrc.h
>> @@ -20,6 +20,11 @@ struct io_rsrc_node {
>>       };
>>   };
>> +enum {
>> +    IO_IMU_READABLE        = 1 << 0,
>> +    IO_IMU_WRITEABLE    = 1 << 1,
> 
> 1 << READ, 1 << WRITE
> 
> And let's add BUILD_BUG_ON that they fit into u8.

Apart from that and Ming's comments that patch looks good to me.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 00/11] ublk zero copy support
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (10 preceding siblings ...)
  2025-02-24 21:31 ` [PATCHv5 11/11] io_uring: cache nodes and mapped buffers Keith Busch
@ 2025-02-25 14:10 ` Pavel Begunkov
  2025-02-25 14:47   ` Jens Axboe
  2025-02-25 15:07 ` (subset) " Jens Axboe
  12 siblings, 1 reply; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 14:10 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Changes from v4:

Should we pick the first 3 patches so that you don't have to carry
them around? Probably even [1-5] if we have the blessing from nvme
and Keith.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 00/11] ublk zero copy support
  2025-02-25 14:10 ` [PATCHv5 00/11] ublk zero copy support Pavel Begunkov
@ 2025-02-25 14:47   ` Jens Axboe
  0 siblings, 0 replies; 51+ messages in thread
From: Jens Axboe @ 2025-02-25 14:47 UTC (permalink / raw)
  To: Pavel Begunkov, Keith Busch, ming.lei, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/25/25 7:10 AM, Pavel Begunkov wrote:
> On 2/24/25 21:31, Keith Busch wrote:
>> From: Keith Busch <[email protected]>
>>
>> Changes from v4:
> 
> Should we pick the first 3 patches so that you don't have to carry
> them around? Probably even [1-5] if we have the blessing from nvme
> and Keith.

Yep I think 1-5 are good to go, I'll run a bit of testing and queue
it up. Last bits look pretty close too, but indeed easier to manage
a shrinking series for a v6 posting that can hopefully wrap it up.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: (subset) [PATCHv5 00/11] ublk zero copy support
  2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
                   ` (11 preceding siblings ...)
  2025-02-25 14:10 ` [PATCHv5 00/11] ublk zero copy support Pavel Begunkov
@ 2025-02-25 15:07 ` Jens Axboe
  12 siblings, 0 replies; 51+ messages in thread
From: Jens Axboe @ 2025-02-25 15:07 UTC (permalink / raw)
  To: ming.lei, asml.silence, linux-block, io-uring, Keith Busch
  Cc: bernd, csander, Keith Busch


On Mon, 24 Feb 2025 13:31:05 -0800, Keith Busch wrote:
> Changes from v4:
> 
>   A few cleanup prep patches from me and Pavel are at the beginning of
>   this series.
> 
>   Uses Pavel's combined buffer lookup and import. This simplifies
>   utilizing fixed buffers a bit later in the series, and obviates any
>   need to generically handle fixed buffers. This also fixes up the net
>   zero-copy notif assignemnet that Ming pointed out.
> 
> [...]

Applied, thanks!

[01/11] io_uring/rsrc: remove redundant check for valid imu
        commit: 559d80da74a0d61a92fffa085db165eea6431ee8
[02/11] io_uring/nop: reuse req->buf_index
        commit: ee993fe7a5f6641d0e02fbc5d6378d77b2f39d08
[03/11] io_uring/net: reuse req->buf_index for sendzc
        commit: 1a917a2d5c7ea5ea1640b260c280c2f805c94854
[04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()
        commit: 7323341f44c17b27e5622d66460fa9726e44321a
[05/11] io_uring: combine buffer lookup and import
        commit: 82cbf420496cffbb8e228ebd065851155978bab6

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data
  2025-02-24 21:31 ` [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data Keith Busch
@ 2025-02-25 16:04   ` Pavel Begunkov
  0 siblings, 0 replies; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 16:04 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> We'll need to add more fields specific to the registered buffers, so
> make a layer for it now. No functional change in this patch.

Reviewed-by: Pavel Begunkov <[email protected]>

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
  2025-02-25 11:00   ` Ming Lei
@ 2025-02-25 16:19   ` Pavel Begunkov
  2025-02-25 16:27     ` Keith Busch
  2025-02-25 21:14   ` Caleb Sander Mateos
  2025-02-26  8:15   ` Ming Lei
  3 siblings, 1 reply; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 16:19 UTC (permalink / raw)
  To: Keith Busch, ming.lei, axboe, linux-block, io-uring
  Cc: bernd, csander, Keith Busch

On 2/24/25 21:31, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Provide new operations for the user to request mapping an active request
> to an io uring instance's buf_table. The user has to provide the index
> it wants to install the buffer.

Do we ever fail requests here? I don't see any result propagation.
E.g. what if the ublk server fails, either by being killed or because an
io_uring request using the buffer failed? Looking at
__ublk_complete_rq(), shouldn't someone set struct ublk_io::res?

io_uring plumbing lgtm

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-25 16:19   ` Pavel Begunkov
@ 2025-02-25 16:27     ` Keith Busch
  2025-02-25 16:42       ` Pavel Begunkov
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-25 16:27 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Keith Busch, ming.lei, axboe, linux-block, io-uring, bernd,
	csander

On Tue, Feb 25, 2025 at 04:19:37PM +0000, Pavel Begunkov wrote:
> On 2/24/25 21:31, Keith Busch wrote:
> > From: Keith Busch <[email protected]>
> > 
> > Provide new operations for the user to request mapping an active request
> > to an io uring instance's buf_table. The user has to provide the index
> > it wants to install the buffer.
> 
> Do we ever fail requests here? I don't see any result propagation.
> E.g. what if the ublk server fail, either being killed or just an
> io_uring request using the buffer failed? Looking at
> __ublk_complete_rq(), shouldn't someone set struct ublk_io::res?

If the ublk server is killed, the ublk driver timeout handler will abort
all incomplete requests.

If a backend request using this buffer fails, for example -EFAULT, then
the ublk server notifies the ublk driver frontend with that status in a
COMMIT_AND_FETCH command, and the ublk driver completes that frontend
request with an appropriate error status.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-25 11:00   ` Ming Lei
@ 2025-02-25 16:35     ` Keith Busch
  2025-02-25 22:56       ` Ming Lei
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-25 16:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Tue, Feb 25, 2025 at 07:00:05PM +0800, Ming Lei wrote:
> On Mon, Feb 24, 2025 at 01:31:14PM -0800, Keith Busch wrote:
> >  static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
> >  {
> > -	return ub->dev_info.flags & UBLK_F_USER_COPY;
> > +	return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> >  }
> 
> I'd suggest to set UBLK_F_USER_COPY explicitly either from userspace or
> kernel side.
> 
> One reason is that UBLK_F_UNPRIVILEGED_DEV mode can't work for both.

In my reference implementation using ublksrv, I had the userspace
explicitly setting F_USER_COPY automatically if zero copy was requested.
Is that what you mean? Or do you need the kernel side to set both flags
if zero copy is requested too?

I actually have a newer diff for ublksrv making use of the SQE links.
I'll send that out with the next update since it looks like there will
need to be at least one more version.

Relevant part from the cover letter,
https://lore.kernel.org/io-uring/[email protected]/

diff --git a/ublksrv_tgt.cpp b/ublksrv_tgt.cpp
index 8f9cf28..f3ebe14 100644
--- a/ublksrv_tgt.cpp
+++ b/ublksrv_tgt.cpp
@@ -723,7 +723,7 @@ static int cmd_dev_add(int argc, char *argv[])
 			data.tgt_type = optarg;
 			break;
 		case 'z':
-			data.flags |= UBLK_F_SUPPORT_ZERO_COPY;
+			data.flags |= UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_USER_COPY;
 			break;
 		case 'q':
 			data.nr_hw_queues = strtol(optarg, NULL, 10);


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-25 16:27     ` Keith Busch
@ 2025-02-25 16:42       ` Pavel Begunkov
  2025-02-25 16:52         ` Keith Busch
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Begunkov @ 2025-02-25 16:42 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, ming.lei, axboe, linux-block, io-uring, bernd,
	csander

On 2/25/25 16:27, Keith Busch wrote:
> On Tue, Feb 25, 2025 at 04:19:37PM +0000, Pavel Begunkov wrote:
>> On 2/24/25 21:31, Keith Busch wrote:
>>> From: Keith Busch <[email protected]>
>>>
>>> Provide new operations for the user to request mapping an active request
>>> to an io uring instance's buf_table. The user has to provide the index
>>> it wants to install the buffer.
>>
>> Do we ever fail requests here? I don't see any result propagation.
>> E.g. what if the ublk server fail, either being killed or just an
>> io_uring request using the buffer failed? Looking at
>> __ublk_complete_rq(), shouldn't someone set struct ublk_io::res?
> 
> If the ublk server is killed, the ublk driver timeout handler will abort
> all incomplete requests.
> 
> If a backend request using this buffer fails, for example -EFAULT, then
> the ublk server notifies the ublk driver frontend with that status in a
> COMMIT_AND_FETCH command, and the ublk driver completes that frontend
> request with an appropriate error status.

I see. IIUC, the API assumes that in normal circumstances you
first unregister the buffer, and then issue another command like
COMMIT_AND_FETCH to finally complete the ublk request. Is that it?

Regardless

Reviewed-by: Pavel Begunkov <[email protected]> # io_uring

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-25 16:42       ` Pavel Begunkov
@ 2025-02-25 16:52         ` Keith Busch
  2025-02-27  4:16           ` Ming Lei
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-25 16:52 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Keith Busch, ming.lei, axboe, linux-block, io-uring, bernd,
	csander

On Tue, Feb 25, 2025 at 04:42:59PM +0000, Pavel Begunkov wrote:
> On 2/25/25 16:27, Keith Busch wrote:
> > On Tue, Feb 25, 2025 at 04:19:37PM +0000, Pavel Begunkov wrote:
> > > On 2/24/25 21:31, Keith Busch wrote:
> > > > From: Keith Busch <[email protected]>
> > > > 
> > > > Provide new operations for the user to request mapping an active request
> > > > to an io uring instance's buf_table. The user has to provide the index
> > > > it wants to install the buffer.
> > > 
> > > Do we ever fail requests here? I don't see any result propagation.
> > > E.g. what if the ublk server fail, either being killed or just an
> > > io_uring request using the buffer failed? Looking at
> > > __ublk_complete_rq(), shouldn't someone set struct ublk_io::res?
> > 
> > If the ublk server is killed, the ublk driver timeout handler will abort
> > all incomplete requests.
> > 
> > If a backend request using this buffer fails, for example -EFAULT, then
> > the ublk server notifies the ublk driver frontend with that status in a
> > COMMIT_AND_FETCH command, and the ublk driver completes that frontend
> > request with an appropriate error status.
> 
> I see. IIUC, the API assumes that in normal circumstances you
> first unregister the buffer, and then issue another command like
> COMMIT_AND_FETCH to finally complete the ublk request. Is that it?

Yes, that's the expected good sequence. It's okay if user space does it
the other way around, too: commit first, then unregister. The registration
holds a reference on the ublk request, preventing it from completing.

The backend uring that registered the bvec can also be a different uring
instance from the frontend one that sends the commit-and-fetch
notification. In such a setup, the commit and unregister could happen
concurrently, and that's also okay.
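
Spelled out as a rough outline (the opcodes are the uring_cmds added by
this series plus the existing commit/fetch command):

	/*
	 * Per ublk request tag:
	 *
	 *   UBLK_U_IO_REGISTER_IO_BUF      install the request's bvecs at a
	 *                                  chosen fixed buffer index; takes a
	 *                                  reference on the ublk request
	 *   <backend I/O using that fixed buffer index>
	 *   UBLK_U_IO_UNREGISTER_IO_BUF    drop the buf_table entry and its
	 *                                  reference
	 *   UBLK_U_IO_COMMIT_AND_FETCH_REQ report the result for the frontend
	 *                                  request
	 *
	 * Commit before unregister also works, and the two may race from
	 * different rings; the frontend request only completes once the
	 * commit has arrived and the last registration reference is gone.
	 */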

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-25  9:40   ` Ming Lei
@ 2025-02-25 17:32     ` Keith Busch
  2025-02-25 22:47       ` Ming Lei
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-25 17:32 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Tue, Feb 25, 2025 at 05:40:14PM +0800, Ming Lei wrote:
> On Mon, Feb 24, 2025 at 01:31:12PM -0800, Keith Busch wrote:
> > +
> > +	if (op_is_write(req_op(rq)))
> > +		imu->perm = IO_IMU_WRITEABLE;
> > +	else
> > +		imu->perm = IO_IMU_READABLE;
> 
> Looks the above is wrong, if request is for write op, the buffer
> should be readable & !writeable.
> 
> IO_IMU_WRITEABLE is supposed to mean the buffer is writeable, isn't it?

In the setup I used here, IMU_WRITEABLE means this can be used in a
write command. You can write from this buffer, not to it.

I think this is the kind of ambiguity that led iov_iter to call these
buffers SOURCE and DEST instead of WRITE and READ.
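
In iov_iter terms the intended pairing is (sketch, matching the patch as
posted):

	/* registration: a write op carries data out of the buffer, so it may
	 * only be imported as a data source */
	if (op_is_write(req_op(rq)))
		imu->perm = IO_IMU_WRITEABLE;	/* == 1 << ITER_SOURCE */
	else
		imu->perm = IO_IMU_READABLE;	/* == 1 << ITER_DEST */

	/* import: ddir is ITER_SOURCE (1) or ITER_DEST (0) */
	if (!(imu->perm & (1 << ddir)))
		return -EFAULT;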

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path
  2025-02-24 21:31 ` [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path Keith Busch
  2025-02-25  9:26   ` Ming Lei
  2025-02-25 13:57   ` Pavel Begunkov
@ 2025-02-25 20:57   ` Caleb Sander Mateos
  2025-02-25 21:16     ` Keith Busch
  2 siblings, 1 reply; 51+ messages in thread
From: Caleb Sander Mateos @ 2025-02-25 20:57 UTC (permalink / raw)
  To: Keith Busch
  Cc: ming.lei, asml.silence, axboe, linux-block, io-uring, bernd,
	Keith Busch

On Mon, Feb 24, 2025 at 1:31 PM Keith Busch <[email protected]> wrote:
>
> From: Keith Busch <[email protected]>
>
> Registered buffers may depend on a linked command, which makes the prep
> path too early to import. Move to the issue path when the node is
> actually needed like all the other users of fixed buffers.
>
> Signed-off-by: Keith Busch <[email protected]>
> ---
>  io_uring/opdef.c |  8 ++++----
>  io_uring/rw.c    | 43 ++++++++++++++++++++++++++-----------------
>  io_uring/rw.h    |  4 ++--
>  3 files changed, 32 insertions(+), 23 deletions(-)
>
> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
> index 9344534780a02..5369ae33b5ad9 100644
> --- a/io_uring/opdef.c
> +++ b/io_uring/opdef.c
> @@ -104,8 +104,8 @@ const struct io_issue_def io_issue_defs[] = {
>                 .iopoll                 = 1,
>                 .iopoll_queue           = 1,
>                 .async_size             = sizeof(struct io_async_rw),
> -               .prep                   = io_prep_read_fixed,
> -               .issue                  = io_read,
> +               .prep                   = io_prep_read,
> +               .issue                  = io_read_fixed,
>         },
>         [IORING_OP_WRITE_FIXED] = {
>                 .needs_file             = 1,
> @@ -118,8 +118,8 @@ const struct io_issue_def io_issue_defs[] = {
>                 .iopoll                 = 1,
>                 .iopoll_queue           = 1,
>                 .async_size             = sizeof(struct io_async_rw),
> -               .prep                   = io_prep_write_fixed,
> -               .issue                  = io_write,
> +               .prep                   = io_prep_write,
> +               .issue                  = io_write_fixed,
>         },
>         [IORING_OP_POLL_ADD] = {
>                 .needs_file             = 1,
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index db24bcd4c6335..5f37fa48fdd9b 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -348,33 +348,20 @@ int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>         return io_prep_rwv(req, sqe, ITER_SOURCE);
>  }
>
> -static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> -                           int ddir)
> +static int io_init_rw_fixed(struct io_kiocb *req, unsigned int issue_flags, int ddir)
>  {
>         struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> -       struct io_async_rw *io;
> +       struct io_async_rw *io = req->async_data;
>         int ret;
>
> -       ret = io_prep_rw(req, sqe, ddir, false);
> -       if (unlikely(ret))
> -               return ret;
> +       if (io->bytes_done)
> +               return 0;
>
> -       io = req->async_data;
>         ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir, 0);

Shouldn't this be passing issue_flags here?

Best,
Caleb



>         iov_iter_save_state(&io->iter, &io->iter_state);
>         return ret;
>  }
>
> -int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> -{
> -       return io_prep_rw_fixed(req, sqe, ITER_DEST);
> -}
> -
> -int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> -{
> -       return io_prep_rw_fixed(req, sqe, ITER_SOURCE);
> -}
> -
>  /*
>   * Multishot read is prepared just like a normal read/write request, only
>   * difference is that we set the MULTISHOT flag.
> @@ -1138,6 +1125,28 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>         }
>  }
>
> +int io_read_fixed(struct io_kiocb *req, unsigned int issue_flags)
> +{
> +       int ret;
> +
> +       ret = io_init_rw_fixed(req, issue_flags, ITER_DEST);
> +       if (ret)
> +               return ret;
> +
> +       return io_read(req, issue_flags);
> +}
> +
> +int io_write_fixed(struct io_kiocb *req, unsigned int issue_flags)
> +{
> +       int ret;
> +
> +       ret = io_init_rw_fixed(req, issue_flags, ITER_SOURCE);
> +       if (ret)
> +               return ret;
> +
> +       return io_write(req, issue_flags);
> +}
> +
>  void io_rw_fail(struct io_kiocb *req)
>  {
>         int res;
> diff --git a/io_uring/rw.h b/io_uring/rw.h
> index a45e0c71b59d6..42a491d277273 100644
> --- a/io_uring/rw.h
> +++ b/io_uring/rw.h
> @@ -30,14 +30,14 @@ struct io_async_rw {
>         );
>  };
>
> -int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
> -int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
>  int io_prep_readv(struct io_kiocb *req, const struct io_uring_sqe *sqe);
>  int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe);
>  int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe);
>  int io_prep_write(struct io_kiocb *req, const struct io_uring_sqe *sqe);
>  int io_read(struct io_kiocb *req, unsigned int issue_flags);
>  int io_write(struct io_kiocb *req, unsigned int issue_flags);
> +int io_read_fixed(struct io_kiocb *req, unsigned int issue_flags);
> +int io_write_fixed(struct io_kiocb *req, unsigned int issue_flags);
>  void io_readv_writev_cleanup(struct io_kiocb *req);
>  void io_rw_fail(struct io_kiocb *req);
>  void io_req_rw_complete(struct io_kiocb *req, io_tw_token_t tw);
> --
> 2.43.5
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-24 21:31 ` [PATCHv5 07/11] io_uring: add support for kernel registered bvecs Keith Busch
  2025-02-25  9:40   ` Ming Lei
  2025-02-25 14:00   ` Pavel Begunkov
@ 2025-02-25 20:58   ` Caleb Sander Mateos
  2 siblings, 0 replies; 51+ messages in thread
From: Caleb Sander Mateos @ 2025-02-25 20:58 UTC (permalink / raw)
  To: Keith Busch
  Cc: ming.lei, asml.silence, axboe, linux-block, io-uring, bernd,
	Keith Busch

On Mon, Feb 24, 2025 at 1:31 PM Keith Busch <[email protected]> wrote:
>
> From: Keith Busch <[email protected]>
>
> Provide an interface for the kernel to leverage the existing
> pre-registered buffers that io_uring provides. User space can reference
> these later to achieve zero-copy IO.
>
> User space must register an empty fixed buffer table with io_uring in
> order for the kernel to make use of it.
>
> Signed-off-by: Keith Busch <[email protected]>
> ---
>  include/linux/io_uring/cmd.h |   7 ++
>  io_uring/rsrc.c              | 123 +++++++++++++++++++++++++++++++++--
>  io_uring/rsrc.h              |   8 +++
>  3 files changed, 131 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
> index 87150dc0a07cf..cf8d80d847344 100644
> --- a/include/linux/io_uring/cmd.h
> +++ b/include/linux/io_uring/cmd.h
> @@ -4,6 +4,7 @@
>
>  #include <uapi/linux/io_uring.h>
>  #include <linux/io_uring_types.h>
> +#include <linux/blk-mq.h>
>
>  /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
>  #define IORING_URING_CMD_CANCELABLE    (1U << 30)
> @@ -125,4 +126,10 @@ static inline struct io_uring_cmd_data *io_uring_cmd_get_async_data(struct io_ur
>         return cmd_to_io_kiocb(cmd)->async_data;
>  }
>
> +int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
> +                           void (*release)(void *), unsigned int index,
> +                           unsigned int issue_flags);
> +void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
> +                              unsigned int issue_flags);
> +
>  #endif /* _LINUX_IO_URING_CMD_H */
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index f814526982c36..e0c6ed3aef5b5 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -9,6 +9,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/compat.h>
>  #include <linux/io_uring.h>
> +#include <linux/io_uring/cmd.h>
>
>  #include <uapi/linux/io_uring.h>
>
> @@ -104,14 +105,21 @@ int io_buffer_validate(struct iovec *iov)
>  static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
>  {
>         struct io_mapped_ubuf *imu = node->buf;
> -       unsigned int i;
>
>         if (!refcount_dec_and_test(&imu->refs))
>                 return;
> -       for (i = 0; i < imu->nr_bvecs; i++)
> -               unpin_user_page(imu->bvec[i].bv_page);
> -       if (imu->acct_pages)
> -               io_unaccount_mem(ctx, imu->acct_pages);
> +
> +       if (imu->release) {
> +               imu->release(imu->priv);
> +       } else {
> +               unsigned int i;
> +
> +               for (i = 0; i < imu->nr_bvecs; i++)
> +                       unpin_user_page(imu->bvec[i].bv_page);
> +               if (imu->acct_pages)
> +                       io_unaccount_mem(ctx, imu->acct_pages);
> +       }
> +
>         kvfree(imu);
>  }
>
> @@ -761,6 +769,9 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
>         imu->len = iov->iov_len;
>         imu->nr_bvecs = nr_pages;
>         imu->folio_shift = PAGE_SHIFT;
> +       imu->release = NULL;
> +       imu->priv = NULL;
> +       imu->perm = IO_IMU_READABLE | IO_IMU_WRITEABLE;
>         if (coalesced)
>                 imu->folio_shift = data.folio_shift;
>         refcount_set(&imu->refs, 1);
> @@ -857,6 +868,95 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
>         return ret;
>  }
>
> +int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
> +                           void (*release)(void *), unsigned int index,
> +                           unsigned int issue_flags)
> +{
> +       struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +       struct io_rsrc_data *data = &ctx->buf_table;
> +       struct req_iterator rq_iter;
> +       struct io_mapped_ubuf *imu;
> +       struct io_rsrc_node *node;
> +       struct bio_vec bv, *bvec;
> +       u16 nr_bvecs;
> +       int ret = 0;
> +
> +
> +       io_ring_submit_lock(ctx, issue_flags);
> +       if (index >= data->nr) {
> +               ret = -EINVAL;
> +               goto unlock;
> +       }
> +       index = array_index_nospec(index, data->nr);
> +
> +       if (data->nodes[index] ) {

nit: extra space before )

> +               ret = -EBUSY;
> +               goto unlock;
> +       }
> +
> +       node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
> +       if (!node) {
> +               ret = -ENOMEM;
> +               goto unlock;
> +       }
> +
> +       nr_bvecs = blk_rq_nr_phys_segments(rq);
> +       imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
> +       if (!imu) {
> +               kfree(node);
> +               ret = -ENOMEM;
> +               goto unlock;
> +       }
> +
> +       imu->ubuf = 0;
> +       imu->len = blk_rq_bytes(rq);
> +       imu->acct_pages = 0;
> +       imu->folio_shift = PAGE_SHIFT;
> +       imu->nr_bvecs = nr_bvecs;
> +       refcount_set(&imu->refs, 1);
> +       imu->release = release;
> +       imu->priv = rq;
> +
> +       if (op_is_write(req_op(rq)))
> +               imu->perm = IO_IMU_WRITEABLE;
> +       else
> +               imu->perm = IO_IMU_READABLE;

imu->perm = 1 << rq_data_dir(rq); ?

> +
> +       bvec = imu->bvec;
> +       rq_for_each_bvec(bv, rq, rq_iter)
> +               *bvec++ = bv;
> +
> +       node->buf = imu;
> +       data->nodes[index] = node;
> +unlock:
> +       io_ring_submit_unlock(ctx, issue_flags);
> +       return ret;
> +}
> +EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
> +
> +void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
> +                              unsigned int issue_flags)
> +{
> +       struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
> +       struct io_rsrc_data *data = &ctx->buf_table;
> +       struct io_rsrc_node *node;
> +
> +       io_ring_submit_lock(ctx, issue_flags);
> +       if (index >= data->nr)
> +               goto unlock;
> +       index = array_index_nospec(index, data->nr);
> +
> +       node = data->nodes[index];
> +       if (!node || !node->buf->release)
> +               goto unlock;

Would it be useful to return some error code in these cases so
userspace can tell that the unregistration parameters were invalid?
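
For example, a minimal sketch of what that could look like if the helper
were changed to return an int (signature change assumed, not the actual
patch):

	int io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
				      unsigned int issue_flags)
	{
		struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
		struct io_rsrc_data *data = &ctx->buf_table;
		struct io_rsrc_node *node;
		int ret = -EINVAL;

		io_ring_submit_lock(ctx, issue_flags);
		if (index >= data->nr)
			goto unlock;
		index = array_index_nospec(index, data->nr);

		node = data->nodes[index];
		if (!node || !node->buf->release)
			goto unlock;

		io_put_rsrc_node(ctx, node);
		data->nodes[index] = NULL;
		ret = 0;
	unlock:
		io_ring_submit_unlock(ctx, issue_flags);
		return ret;
	}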

Best,
Caleb


> +
> +       io_put_rsrc_node(ctx, node);
> +       data->nodes[index] = NULL;
> +unlock:
> +       io_ring_submit_unlock(ctx, issue_flags);
> +}
> +EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
> +
>  static int io_import_fixed(int ddir, struct iov_iter *iter,
>                            struct io_mapped_ubuf *imu,
>                            u64 buf_addr, size_t len)
> @@ -871,6 +971,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>         /* not inside the mapped region */
>         if (unlikely(buf_addr < imu->ubuf || buf_end > (imu->ubuf + imu->len)))
>                 return -EFAULT;
> +       if (!(imu->perm & (1 << ddir)))
> +               return -EFAULT;
>
>         /*
>          * Might not be a start of buffer, set size appropriately
> @@ -883,8 +985,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>                 /*
>                  * Don't use iov_iter_advance() here, as it's really slow for
>                  * using the latter parts of a big fixed buffer - it iterates
> -                * over each segment manually. We can cheat a bit here, because
> -                * we know that:
> +                * over each segment manually. We can cheat a bit here for user
> +                * registered nodes, because we know that:
>                  *
>                  * 1) it's a BVEC iter, we set it up
>                  * 2) all bvecs are the same in size, except potentially the
> @@ -898,8 +1000,15 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>                  */
>                 const struct bio_vec *bvec = imu->bvec;
>
> +               /*
> +                * Kernel buffer bvecs, on the other hand, don't necessarily
> +                * have the size property of user registered ones, so we have
> +                * to use the slow iter advance.
> +                */
>                 if (offset < bvec->bv_len) {
>                         iter->iov_offset = offset;
> +               } else if (imu->release) {
> +                       iov_iter_advance(iter, offset);
>                 } else {
>                         unsigned long seg_skip;
>
> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
> index f0e9080599646..64bf35667cf9c 100644
> --- a/io_uring/rsrc.h
> +++ b/io_uring/rsrc.h
> @@ -20,6 +20,11 @@ struct io_rsrc_node {
>         };
>  };
>
> +enum {
> +       IO_IMU_READABLE         = 1 << 0,
> +       IO_IMU_WRITEABLE        = 1 << 1,
> +};
> +
>  struct io_mapped_ubuf {
>         u64             ubuf;
>         unsigned int    len;
> @@ -27,6 +32,9 @@ struct io_mapped_ubuf {
>         unsigned int    folio_shift;
>         refcount_t      refs;
>         unsigned long   acct_pages;
> +       void            (*release)(void *);
> +       void            *priv;
> +       u8              perm;
>         struct bio_vec  bvec[] __counted_by(nr_bvecs);
>  };
>
> --
> 2.43.5
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
  2025-02-25 11:00   ` Ming Lei
  2025-02-25 16:19   ` Pavel Begunkov
@ 2025-02-25 21:14   ` Caleb Sander Mateos
  2025-02-26  8:15   ` Ming Lei
  3 siblings, 0 replies; 51+ messages in thread
From: Caleb Sander Mateos @ 2025-02-25 21:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: ming.lei, asml.silence, axboe, linux-block, io-uring, bernd,
	Keith Busch

On Mon, Feb 24, 2025 at 1:31 PM Keith Busch <[email protected]> wrote:
>
> From: Keith Busch <[email protected]>
>
> Provide new operations for the user to request mapping an active request
> to an io uring instance's buf_table. The user has to provide the index
> it wants to install the buffer.
>
> A reference count is taken on the request to ensure it can't be
> completed while it is active in a ring's buf_table.
>
> Signed-off-by: Keith Busch <[email protected]>
> ---
>  drivers/block/ublk_drv.c      | 117 +++++++++++++++++++++++-----------
>  include/uapi/linux/ublk_cmd.h |   4 ++
>  2 files changed, 85 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 529085181f355..a719d873e3882 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -51,6 +51,9 @@
>  /* private ioctl command mirror */
>  #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
>
> +#define UBLK_IO_REGISTER_IO_BUF                _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
> +#define UBLK_IO_UNREGISTER_IO_BUF      _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
> +
>  /* All UBLK_F_* have to be included into UBLK_F_ALL */
>  #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
>                 | UBLK_F_URING_CMD_COMP_IN_TASK \
> @@ -201,7 +204,7 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
>                                                    int tag);
>  static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
>  {
> -       return ub->dev_info.flags & UBLK_F_USER_COPY;
> +       return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
>  }
>
>  static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
> @@ -581,7 +584,7 @@ static void ublk_apply_params(struct ublk_device *ub)
>
>  static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
>  {
> -       return ubq->flags & UBLK_F_USER_COPY;
> +       return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
>  }
>
>  static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
> @@ -1747,6 +1750,77 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
>         io_uring_cmd_mark_cancelable(cmd, issue_flags);
>  }
>
> +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
> +               struct ublk_queue *ubq, int tag, size_t offset)
> +{
> +       struct request *req;
> +
> +       if (!ublk_need_req_ref(ubq))
> +               return NULL;
> +
> +       req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
> +       if (!req)
> +               return NULL;
> +
> +       if (!ublk_get_req_ref(ubq, req))
> +               return NULL;
> +
> +       if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
> +               goto fail_put;
> +
> +       if (!ublk_rq_has_data(req))
> +               goto fail_put;
> +
> +       if (offset > blk_rq_bytes(req))
> +               goto fail_put;
> +
> +       return req;
> +fail_put:
> +       ublk_put_req_ref(ubq, req);
> +       return NULL;
> +}
> +
> +static void ublk_io_release(void *priv)
> +{
> +       struct request *rq = priv;
> +       struct ublk_queue *ubq = rq->mq_hctx->driver_data;
> +
> +       ublk_put_req_ref(ubq, rq);
> +}
> +
> +static int ublk_register_io_buf(struct io_uring_cmd *cmd,
> +                               struct ublk_queue *ubq, unsigned int tag,
> +                               const struct ublksrv_io_cmd *ub_cmd,
> +                               unsigned int issue_flags)
> +{
> +       struct ublk_device *ub = cmd->file->private_data;
> +       int index = (int)ub_cmd->addr, ret;
> +       struct request *req;
> +
> +       req = __ublk_check_and_get_req(ub, ubq, tag, 0);
> +       if (!req)
> +               return -EINVAL;
> +
> +       ret = io_buffer_register_bvec(cmd, req, ublk_io_release, index,
> +                                     issue_flags);
> +       if (ret) {
> +               ublk_put_req_ref(ubq, req);
> +               return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
> +                                 const struct ublksrv_io_cmd *ub_cmd,
> +                                 unsigned int issue_flags)
> +{
> +       int index = (int)ub_cmd->addr;
> +
> +       io_buffer_unregister_bvec(cmd, index, issue_flags);
> +       return 0;
> +}
> +
>  static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
>                                unsigned int issue_flags,
>                                const struct ublksrv_io_cmd *ub_cmd)
> @@ -1798,6 +1872,10 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
>
>         ret = -EINVAL;
>         switch (_IOC_NR(cmd_op)) {
> +       case UBLK_IO_REGISTER_IO_BUF:
> +               return ublk_register_io_buf(cmd, ubq, tag, ub_cmd, issue_flags);
> +       case UBLK_IO_UNREGISTER_IO_BUF:
> +               return ublk_unregister_io_buf(cmd, ub_cmd, issue_flags);

In the other cases, completion happens asynchronously by returning
-EIOCBQUEUED and calling io_uring_cmd_done() when the command
finishes. It looks like that's necessary because
ublk_ch_uring_cmd_cb() ignores the return value from
__ublk_ch_uring_cmd()/ublk_ch_uring_cmd_local(). (In the non-task-work
case, ublk_ch_uring_cmd() does propagate the return value.) Maybe
ublk_ch_uring_cmd_cb() should check the return value and call
io_uring_cmd_done() if it's not -EIOCBQUEUED.
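
A minimal sketch of that idea, assuming ublk_ch_uring_cmd_local() keeps
returning the status (not the actual patch):

	static void ublk_ch_uring_cmd_cb(struct io_uring_cmd *cmd,
					 unsigned int issue_flags)
	{
		int ret = ublk_ch_uring_cmd_local(cmd, issue_flags);

		/* complete inline unless the handler queued the command */
		if (ret != -EIOCBQUEUED)
			io_uring_cmd_done(cmd, ret, 0, issue_flags);
	}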

Best,
Caleb


>         case UBLK_IO_FETCH_REQ:
>                 /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
>                 if (ublk_queue_ready(ubq)) {
> @@ -1872,36 +1950,6 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
>         return -EIOCBQUEUED;
>  }
>
> -static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
> -               struct ublk_queue *ubq, int tag, size_t offset)
> -{
> -       struct request *req;
> -
> -       if (!ublk_need_req_ref(ubq))
> -               return NULL;
> -
> -       req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
> -       if (!req)
> -               return NULL;
> -
> -       if (!ublk_get_req_ref(ubq, req))
> -               return NULL;
> -
> -       if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
> -               goto fail_put;
> -
> -       if (!ublk_rq_has_data(req))
> -               goto fail_put;
> -
> -       if (offset > blk_rq_bytes(req))
> -               goto fail_put;
> -
> -       return req;
> -fail_put:
> -       ublk_put_req_ref(ubq, req);
> -       return NULL;
> -}
> -
>  static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
>                 unsigned int issue_flags)
>  {
> @@ -2527,9 +2575,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
>                 goto out_free_dev_number;
>         }
>
> -       /* We are not ready to support zero copy */
> -       ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
> -
>         ub->dev_info.nr_hw_queues = min_t(unsigned int,
>                         ub->dev_info.nr_hw_queues, nr_cpu_ids);
>         ublk_align_max_io_size(ub);
> @@ -2860,7 +2905,7 @@ static int ublk_ctrl_get_features(struct io_uring_cmd *cmd)
>  {
>         const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
>         void __user *argp = (void __user *)(unsigned long)header->addr;
> -       u64 features = UBLK_F_ALL & ~UBLK_F_SUPPORT_ZERO_COPY;
> +       u64 features = UBLK_F_ALL;
>
>         if (header->len != UBLK_FEATURES_LEN || !header->addr)
>                 return -EINVAL;
> diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
> index a8bc98bb69fce..74246c926b55f 100644
> --- a/include/uapi/linux/ublk_cmd.h
> +++ b/include/uapi/linux/ublk_cmd.h
> @@ -94,6 +94,10 @@
>         _IOWR('u', UBLK_IO_COMMIT_AND_FETCH_REQ, struct ublksrv_io_cmd)
>  #define        UBLK_U_IO_NEED_GET_DATA         \
>         _IOWR('u', UBLK_IO_NEED_GET_DATA, struct ublksrv_io_cmd)
> +#define        UBLK_U_IO_REGISTER_IO_BUF       \
> +       _IOWR('u', 0x23, struct ublksrv_io_cmd)
> +#define        UBLK_U_IO_UNREGISTER_IO_BUF     \
> +       _IOWR('u', 0x24, struct ublksrv_io_cmd)
>
>  /* only ABORT means that no re-fetch */
>  #define UBLK_IO_RES_OK                 0
> --
> 2.43.5
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path
  2025-02-25 20:57   ` Caleb Sander Mateos
@ 2025-02-25 21:16     ` Keith Busch
  0 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-25 21:16 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Keith Busch, ming.lei, asml.silence, axboe, linux-block, io-uring,
	bernd

On Tue, Feb 25, 2025 at 12:57:43PM -0800, Caleb Sander Mateos wrote:
> On Mon, Feb 24, 2025 at 1:31 PM Keith Busch <[email protected]> wrote:
> > +static int io_init_rw_fixed(struct io_kiocb *req, unsigned int issue_flags, int ddir)
> >  {
> >         struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> > -       struct io_async_rw *io;
> > +       struct io_async_rw *io = req->async_data;
> >         int ret;
> >
> > -       ret = io_prep_rw(req, sqe, ddir, false);
> > -       if (unlikely(ret))
> > -               return ret;
> > +       if (io->bytes_done)
> > +               return 0;
> >
> > -       io = req->async_data;
> >         ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir, 0);
> 
> Shouldn't this be passing issue_flags here?

Definitely should be doing that, and I have that in my next version
already. Was hoping to get that fixed up version out before anyone
noticed, but you got me. 
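
For reference, the fix amounts to forwarding the flags instead of the
literal 0, i.e. something like:

	ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir,
				issue_flags);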

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-25 17:32     ` Keith Busch
@ 2025-02-25 22:47       ` Ming Lei
  2025-02-25 22:55         ` Keith Busch
  0 siblings, 1 reply; 51+ messages in thread
From: Ming Lei @ 2025-02-25 22:47 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Wed, Feb 26, 2025 at 1:32 AM Keith Busch <[email protected]> wrote:
>
> On Tue, Feb 25, 2025 at 05:40:14PM +0800, Ming Lei wrote:
> > On Mon, Feb 24, 2025 at 01:31:12PM -0800, Keith Busch wrote:
> > > +
> > > +   if (op_is_write(req_op(rq)))
> > > +           imu->perm = IO_IMU_WRITEABLE;
> > > +   else
> > > +           imu->perm = IO_IMU_READABLE;
> >
> > Looks the above is wrong, if request is for write op, the buffer
> > should be readable & !writeable.
> >
> > IO_IMU_WRITEABLE is supposed to mean the buffer is writeable, isn't it?
>
> In the setup I used here, IMU_WRITEABLE means this can be used in a
> write command. You can write from this buffer, not to it.

But the imu represents a buffer, and the buffer could be used for other
ops in the future, not just for write commands. It is more readable to
mark the buffer itself as readable or writable.

I'd suggest not introducing the confusion from the beginning.

Thanks,


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
  2025-02-25 22:47       ` Ming Lei
@ 2025-02-25 22:55         ` Keith Busch
  0 siblings, 0 replies; 51+ messages in thread
From: Keith Busch @ 2025-02-25 22:55 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Wed, Feb 26, 2025 at 06:47:54AM +0800, Ming Lei wrote:
> On Wed, Feb 26, 2025 at 1:32 AM Keith Busch <[email protected]> wrote:
> >
> > On Tue, Feb 25, 2025 at 05:40:14PM +0800, Ming Lei wrote:
> > > On Mon, Feb 24, 2025 at 01:31:12PM -0800, Keith Busch wrote:
> > > > +
> > > > +   if (op_is_write(req_op(rq)))
> > > > +           imu->perm = IO_IMU_WRITEABLE;
> > > > +   else
> > > > +           imu->perm = IO_IMU_READABLE;
> > >
> > > Looks the above is wrong, if request is for write op, the buffer
> > > should be readable & !writeable.
> > >
> > > IO_IMU_WRITEABLE is supposed to mean the buffer is writeable, isn't it?
> >
> > In the setup I used here, IMU_WRITEABLE means this can be used in a
> > write command. You can write from this buffer, not to it.
> 
> But IMU represents a buffer, and the buffer could be used for other
> OPs in future,
> instead of write command only. Here it is more readable to mark the buffer
> readable or writable.
> 
> I'd suggest not introducing the confusion from the beginning.

Absolutely, no disagreement here. My next version calls the flags
"IO_IMU_SOURCE" and "IO_IMU_DEST", defined from the same ITER_
values.
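
Roughly like this (a sketch of the planned rename, not the final patch):

	enum {
		IO_IMU_DEST	= 1U << ITER_DEST,
		IO_IMU_SOURCE	= 1U << ITER_SOURCE,
	};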

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-25 16:35     ` Keith Busch
@ 2025-02-25 22:56       ` Ming Lei
  0 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-25 22:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Tue, Feb 25, 2025 at 09:35:30AM -0700, Keith Busch wrote:
> On Tue, Feb 25, 2025 at 07:00:05PM +0800, Ming Lei wrote:
> > On Mon, Feb 24, 2025 at 01:31:14PM -0800, Keith Busch wrote:
> > >  static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
> > >  {
> > > -	return ub->dev_info.flags & UBLK_F_USER_COPY;
> > > +	return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
> > >  }
> > 
> > I'd suggest to set UBLK_F_USER_COPY explicitly either from userspace or
> > kernel side.
> > 
> > One reason is that UBLK_F_UNPRIVILEGED_DEV mode can't work for both.
> 
> In my reference implementation using ublksrv, I had the userspace
> explicitly setting F_USER_COPY automatically if zero copy was requested.
> Is that what you mean? Or do you need the kernel side to set both flags
> if zero copy is requested too?

Then the driver side has to validate the setting, and fail ZERO_COPY if
F_USER_COPY isn't set.
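
Something along these lines at device setup time, for example (a sketch
only, placement and variable naming hypothetical):

	if ((flags & UBLK_F_SUPPORT_ZERO_COPY) && !(flags & UBLK_F_USER_COPY))
		return -EINVAL;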

> 
> I actually have a newer diff for ublksrv making use of the SQE links.
> I'll send that out with the next update since it looks like there will
> need to be at least one more version.
> 
> Relevant part from the cover letter,
> https://lore.kernel.org/io-uring/[email protected]/

OK, I will try to cook up a ublk selftest in the kernel tree so that the
cross-subsystem change can be covered a bit more easily.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
                     ` (2 preceding siblings ...)
  2025-02-25 21:14   ` Caleb Sander Mateos
@ 2025-02-26  8:15   ` Ming Lei
  2025-02-26 17:10     ` Keith Busch
  3 siblings, 1 reply; 51+ messages in thread
From: Ming Lei @ 2025-02-26  8:15 UTC (permalink / raw)
  To: Keith Busch
  Cc: asml.silence, axboe, linux-block, io-uring, bernd, csander,
	Keith Busch

On Mon, Feb 24, 2025 at 01:31:14PM -0800, Keith Busch wrote:
> From: Keith Busch <[email protected]>
> 
> Provide new operations for the user to request mapping an active request
> to an io uring instance's buf_table. The user has to provide the index
> it wants to install the buffer.
> 
> A reference count is taken on the request to ensure it can't be
> completed while it is active in a ring's buf_table.
> 
> Signed-off-by: Keith Busch <[email protected]>
> ---

Looks like IO_LINK doesn't work, and the UNREG_BUF cqe can be received from userspace.

It is triggered reliably in the ublk selftests(test_loop_03.sh) I just post out:

https://lore.kernel.org/linux-block/[email protected]/T/#m3adfecbfa33de9f9f728ccb4ab1185091be34797


Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-26  8:15   ` Ming Lei
@ 2025-02-26 17:10     ` Keith Busch
  2025-02-27  4:19       ` Ming Lei
  0 siblings, 1 reply; 51+ messages in thread
From: Keith Busch @ 2025-02-26 17:10 UTC (permalink / raw)
  To: Ming Lei
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Wed, Feb 26, 2025 at 04:15:39PM +0800, Ming Lei wrote:
> On Mon, Feb 24, 2025 at 01:31:14PM -0800, Keith Busch wrote:
> > From: Keith Busch <[email protected]>
> > 
> > Provide new operations for the user to request mapping an active request
> > to an io uring instance's buf_table. The user has to provide the index
> > it wants to install the buffer.
> > 
> > A reference count is taken on the request to ensure it can't be
> > completed while it is active in a ring's buf_table.
> > 
> > Signed-off-by: Keith Busch <[email protected]>
> > ---
> 
> Looks like IO_LINK doesn't work, and the UNREG_BUF cqe can be received from userspace.

You can link the register, but should do the unregister with COMMIT
command on the frontend when the backend is complete. This doesn't need
the triple SQE requirement.

I was going to share with the next version, but since you bring it up
now, here's the reference patch for ublksrv using links:

---
diff --git a/include/ublk_cmd.h b/include/ublk_cmd.h
index 0150003..07439be 100644
--- a/include/ublk_cmd.h
+++ b/include/ublk_cmd.h
@@ -94,6 +94,10 @@
 	_IOWR('u', UBLK_IO_COMMIT_AND_FETCH_REQ, struct ublksrv_io_cmd)
 #define	UBLK_U_IO_NEED_GET_DATA		\
 	_IOWR('u', UBLK_IO_NEED_GET_DATA, struct ublksrv_io_cmd)
+#define UBLK_U_IO_REGISTER_IO_BUF	\
+	_IOWR('u', 0x23, struct ublksrv_io_cmd)
+#define UBLK_U_IO_UNREGISTER_IO_BUF	\
+	_IOWR('u', 0x24, struct ublksrv_io_cmd)
 
 /* only ABORT means that no re-fetch */
 #define UBLK_IO_RES_OK			0
diff --git a/include/ublksrv_tgt.h b/include/ublksrv_tgt.h
index 1deee2b..c331963 100644
--- a/include/ublksrv_tgt.h
+++ b/include/ublksrv_tgt.h
@@ -99,6 +99,7 @@ struct ublk_io_tgt {
 	co_handle_type co;
 	const struct io_uring_cqe *tgt_io_cqe;
 	int queued_tgt_io;	/* obsolete */
+	bool needs_unregister;
 };
 
 static inline struct ublk_io_tgt *__ublk_get_io_tgt_data(const struct ublk_io_data *io)
diff --git a/lib/ublksrv.c b/lib/ublksrv.c
index 16a9e13..7205247 100644
--- a/lib/ublksrv.c
+++ b/lib/ublksrv.c
@@ -619,6 +619,15 @@ skip_alloc_buf:
 		goto fail;
 	}
 
+	if (ctrl_dev->dev_info.flags & UBLK_F_SUPPORT_ZERO_COPY) {
+		ret = io_uring_register_buffers_sparse(&q->ring, q->q_depth);
+		if (ret) {
+			ublk_err("ublk dev %d queue %d register spare buffers failed %d",
+					q->dev->ctrl_dev->dev_info.dev_id, q->q_id, ret);
+			goto fail;
+		}
+	}
+
 	io_uring_register_ring_fd(&q->ring);
 
 	/*
diff --git a/tgt_loop.cpp b/tgt_loop.cpp
index 0f16676..91f8c81 100644
--- a/tgt_loop.cpp
+++ b/tgt_loop.cpp
@@ -246,12 +246,70 @@ static inline int loop_fallocate_mode(const struct ublksrv_io_desc *iod)
        return mode;
 }
 
+static inline void io_uring_prep_buf_register(struct io_uring_sqe *sqe,
+		int dev_fd, int tag, int q_id, __u64 index)
+{
+	struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)sqe->cmd;
+
+	io_uring_prep_read(sqe, dev_fd, 0, 0, 0);
+	sqe->opcode		= IORING_OP_URING_CMD;
+	sqe->flags		|= IOSQE_IO_LINK | IOSQE_CQE_SKIP_SUCCESS | IOSQE_FIXED_FILE;
+	sqe->cmd_op		= UBLK_U_IO_REGISTER_IO_BUF;
+
+	cmd->tag		= tag;
+	cmd->addr		= index;
+	cmd->q_id		= q_id;
+}
+
+static inline void io_uring_prep_buf_unregister(struct io_uring_sqe *sqe,
+		int dev_fd, int tag, int q_id, __u64 index)
+{
+	struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)sqe->cmd;
+
+	io_uring_prep_read(sqe, dev_fd, 0, 0, 0);
+	sqe->opcode             = IORING_OP_URING_CMD;
+	sqe->flags              |= IOSQE_CQE_SKIP_SUCCESS | IOSQE_FIXED_FILE;
+	sqe->cmd_op             = UBLK_U_IO_UNREGISTER_IO_BUF;
+
+	cmd->tag                = tag;
+	cmd->addr               = index;
+	cmd->q_id               = q_id;
+}
+
+static void loop_unregister(const struct ublksrv_queue *q, int tag)
+{
+	struct io_uring_sqe *sqe;
+
+	ublk_get_sqe_pair(q->ring_ptr, &sqe, NULL);
+	io_uring_prep_buf_unregister(sqe, 0, tag, q->q_id, tag);
+}
+
 static void loop_queue_tgt_read(const struct ublksrv_queue *q,
-		const struct ublksrv_io_desc *iod, int tag)
+		const struct ublk_io_data *data, int tag)
 {
+	struct ublk_io_tgt *io = __ublk_get_io_tgt_data(data);
+	const struct ublksrv_io_desc *iod = data->iod;
+	const struct ublksrv_ctrl_dev_info *info =
+		ublksrv_ctrl_get_dev_info(ublksrv_get_ctrl_dev(q->dev));
 	unsigned ublk_op = ublksrv_get_op(iod);
 
-	if (user_copy) {
+	if (info->flags & UBLK_F_SUPPORT_ZERO_COPY) {
+		struct io_uring_sqe *reg;
+		struct io_uring_sqe *read;
+
+		ublk_get_sqe_pair(q->ring_ptr, &reg, &read);
+
+		io_uring_prep_buf_register(reg, 0, tag, q->q_id, tag);
+
+		io_uring_prep_read_fixed(read, 1 /*fds[1]*/,
+			0,
+			iod->nr_sectors << 9,
+			iod->start_sector << 9,
+			tag);
+		io_uring_sqe_set_flags(read, IOSQE_FIXED_FILE);
+		read->user_data = build_user_data(tag, ublk_op, 0, 1);
+		io->needs_unregister = true;
+	} else if (user_copy) {
 		struct io_uring_sqe *sqe, *sqe2;
 		__u64 pos = ublk_pos(q->q_id, tag, 0);
 		void *buf = ublksrv_queue_get_io_buf(q, tag);
@@ -284,11 +342,31 @@ static void loop_queue_tgt_read(const struct ublksrv_queue *q,
 }
 
 static void loop_queue_tgt_write(const struct ublksrv_queue *q,
-		const struct ublksrv_io_desc *iod, int tag)
+		const struct ublk_io_data *data, int tag)
 {
+	const struct ublksrv_io_desc *iod = data->iod;
+	const struct ublksrv_ctrl_dev_info *info =
+		ublksrv_ctrl_get_dev_info(ublksrv_get_ctrl_dev(q->dev));
 	unsigned ublk_op = ublksrv_get_op(iod);
 
-	if (user_copy) {
+	if (info->flags & UBLK_F_SUPPORT_ZERO_COPY) {
+		struct ublk_io_tgt *io = __ublk_get_io_tgt_data(data);
+		struct io_uring_sqe *reg;
+		struct io_uring_sqe *write;
+
+		ublk_get_sqe_pair(q->ring_ptr, &reg, &write);
+		io_uring_prep_buf_register(reg, 0, tag, q->q_id, tag);
+
+		io_uring_prep_write_fixed(write, 1 /*fds[1]*/,
+			0,
+			iod->nr_sectors << 9,
+			iod->start_sector << 9,
+			tag);
+		io_uring_sqe_set_flags(write, IOSQE_FIXED_FILE);
+		write->user_data = build_user_data(tag, ublk_op, 0, 1);
+
+		io->needs_unregister = true;
+	} else if (user_copy) {
 		struct io_uring_sqe *sqe, *sqe2;
 		__u64 pos = ublk_pos(q->q_id, tag, 0);
 		void *buf = ublksrv_queue_get_io_buf(q, tag);
@@ -352,10 +430,10 @@ static int loop_queue_tgt_io(const struct ublksrv_queue *q,
 		sqe->user_data = build_user_data(tag, ublk_op, 0, 1);
 		break;
 	case UBLK_IO_OP_READ:
-		loop_queue_tgt_read(q, iod, tag);
+		loop_queue_tgt_read(q, data, tag);
 		break;
 	case UBLK_IO_OP_WRITE:
-		loop_queue_tgt_write(q, iod, tag);
+		loop_queue_tgt_write(q, data, tag);
 		break;
 	default:
 		return -EINVAL;
@@ -387,6 +465,10 @@ static co_io_job __loop_handle_io_async(const struct ublksrv_queue *q,
 		if (io->tgt_io_cqe->res == -EAGAIN)
 			goto again;
 
+		if (io->needs_unregister) {
+			io->needs_unregister = false;
+			loop_unregister(q, tag);
+		}
 		ublksrv_complete_io(q, tag, io->tgt_io_cqe->res);
 	} else if (ret < 0) {
 		ublk_err( "fail to queue io %d, ret %d\n", tag, tag);
diff --git a/ublksrv_tgt.cpp b/ublksrv_tgt.cpp
index 8f9cf28..f3ebe14 100644
--- a/ublksrv_tgt.cpp
+++ b/ublksrv_tgt.cpp
@@ -723,7 +723,7 @@ static int cmd_dev_add(int argc, char *argv[])
 			data.tgt_type = optarg;
 			break;
 		case 'z':
-			data.flags |= UBLK_F_SUPPORT_ZERO_COPY;
+			data.flags |= UBLK_F_SUPPORT_ZERO_COPY | UBLK_F_USER_COPY;
 			break;
 		case 'q':
 			data.nr_hw_queues = strtol(optarg, NULL, 10);
--

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-25 16:52         ` Keith Busch
@ 2025-02-27  4:16           ` Ming Lei
  0 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-27  4:16 UTC (permalink / raw)
  To: Keith Busch
  Cc: Pavel Begunkov, Keith Busch, axboe, linux-block, io-uring, bernd,
	csander

On Tue, Feb 25, 2025 at 09:52:09AM -0700, Keith Busch wrote:
> On Tue, Feb 25, 2025 at 04:42:59PM +0000, Pavel Begunkov wrote:
> > On 2/25/25 16:27, Keith Busch wrote:
> > > On Tue, Feb 25, 2025 at 04:19:37PM +0000, Pavel Begunkov wrote:
> > > > On 2/24/25 21:31, Keith Busch wrote:
> > > > > From: Keith Busch <[email protected]>
> > > > > 
> > > > > Provide new operations for the user to request mapping an active request
> > > > > to an io uring instance's buf_table. The user has to provide the index
> > > > > it wants to install the buffer.
> > > > 
> > > > Do we ever fail requests here? I don't see any result propagation.
> > > > E.g. what if the ublk server fail, either being killed or just an
> > > > io_uring request using the buffer failed? Looking at
> > > > __ublk_complete_rq(), shouldn't someone set struct ublk_io::res?
> > > 
> > > If the ublk server is killed, the ublk driver timeout handler will abort
> > > all incomplete requests.
> > > 
> > > If a backend request using this buffer fails, for example -EFAULT, then
> > > the ublk server notifies the ublk driver frontend with that status in a
> > > COMMIT_AND_FETCH command, and the ublk driver completes that frontend
> > > request with an appropriate error status.
> > 
> > I see. IIUC, the API assumes that in normal circumstances you
> > first unregister the buffer, and then issue another command like
> > COMMIT_AND_FETCH to finally complete the ublk request. Is that it?
> 
> Yes, that's the expected good sequence. It's okay if user space does it

That is exactly what the ublk ktests patch loop/zc is doing.

> the other way around, too: commit first, then unregister. The registration
> holds a reference on the ublk request, preventing it from completing.

It depends on UBLK_IO_COMMIT_AND_FETCH_REQ being done only once, and luckily
it has worked this way from the beginning; otherwise the request could still
be freed before the unregister command is done.

> The backend uring that registered the bvec can also be a different uring
> instance than the frontend that notifies the driver of the commit-and-fetch. In
> such a setup, the commit and unregister sequence could happen
> concurrently, and that's also okay.

Yes, ublk io commands are always run in the queue context, but
unregistering from io-wq still adds some extra latency for io.




Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCHv5 09/11] ublk: zc register/unregister bvec
  2025-02-26 17:10     ` Keith Busch
@ 2025-02-27  4:19       ` Ming Lei
  0 siblings, 0 replies; 51+ messages in thread
From: Ming Lei @ 2025-02-27  4:19 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, asml.silence, axboe, linux-block, io-uring, bernd,
	csander

On Wed, Feb 26, 2025 at 10:10:31AM -0700, Keith Busch wrote:
> On Wed, Feb 26, 2025 at 04:15:39PM +0800, Ming Lei wrote:
> > On Mon, Feb 24, 2025 at 01:31:14PM -0800, Keith Busch wrote:
> > > From: Keith Busch <[email protected]>
> > > 
> > > Provide new operations for the user to request mapping an active request
> > > to an io uring instance's buf_table. The user has to provide the index
> > > it wants to install the buffer.
> > > 
> > > A reference count is taken on the request to ensure it can't be
> > > completed while it is active in a ring's buf_table.
> > > 
> > > Signed-off-by: Keith Busch <[email protected]>
> > > ---
> > 
> > Looks like IO_LINK doesn't work, and the UNREG_BUF cqe can be received from userspace.
> 
> You can link the register, but should do the unregister with COMMIT
> command on the frontend when the backend is complete. This doesn't need
> the triple SQE requirement.
> 
> I was going to share with the next version, but since you bring it up
> now, here's the reference patch for ublksrv using links:

Forgot to reply in this thread: IO_LINK works well in the ktests V2 after
fixing one out-of-sqe issue, which is mentioned in the V2 cover letter.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 51+ messages in thread

Thread overview: 51+ messages
2025-02-24 21:31 [PATCHv5 00/11] ublk zero copy support Keith Busch
2025-02-24 21:31 ` [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu Keith Busch
2025-02-25  8:37   ` Ming Lei
2025-02-25 13:13   ` Pavel Begunkov
2025-02-24 21:31 ` [PATCHv5 02/11] io_uring/nop: reuse req->buf_index Keith Busch
2025-02-24 23:30   ` Jens Axboe
2025-02-25  0:02     ` Keith Busch
2025-02-25  8:43   ` Ming Lei
2025-02-25 13:13   ` Pavel Begunkov
2025-02-24 21:31 ` [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc Keith Busch
2025-02-25  8:44   ` Ming Lei
2025-02-25 13:14   ` Pavel Begunkov
2025-02-24 21:31 ` [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed() Keith Busch
2025-02-25  8:52   ` Ming Lei
2025-02-24 21:31 ` [PATCHv5 05/11] io_uring: combine buffer lookup and import Keith Busch
2025-02-25  8:55   ` Ming Lei
2025-02-24 21:31 ` [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path Keith Busch
2025-02-25  9:26   ` Ming Lei
2025-02-25 13:57   ` Pavel Begunkov
2025-02-25 20:57   ` Caleb Sander Mateos
2025-02-25 21:16     ` Keith Busch
2025-02-24 21:31 ` [PATCHv5 07/11] io_uring: add support for kernel registered bvecs Keith Busch
2025-02-25  9:40   ` Ming Lei
2025-02-25 17:32     ` Keith Busch
2025-02-25 22:47       ` Ming Lei
2025-02-25 22:55         ` Keith Busch
2025-02-25 14:00   ` Pavel Begunkov
2025-02-25 14:05     ` Pavel Begunkov
2025-02-25 20:58   ` Caleb Sander Mateos
2025-02-24 21:31 ` [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0 Keith Busch
2025-02-25  9:41   ` Ming Lei
2025-02-24 21:31 ` [PATCHv5 09/11] ublk: zc register/unregister bvec Keith Busch
2025-02-25 11:00   ` Ming Lei
2025-02-25 16:35     ` Keith Busch
2025-02-25 22:56       ` Ming Lei
2025-02-25 16:19   ` Pavel Begunkov
2025-02-25 16:27     ` Keith Busch
2025-02-25 16:42       ` Pavel Begunkov
2025-02-25 16:52         ` Keith Busch
2025-02-27  4:16           ` Ming Lei
2025-02-25 21:14   ` Caleb Sander Mateos
2025-02-26  8:15   ` Ming Lei
2025-02-26 17:10     ` Keith Busch
2025-02-27  4:19       ` Ming Lei
2025-02-24 21:31 ` [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data Keith Busch
2025-02-25 16:04   ` Pavel Begunkov
2025-02-24 21:31 ` [PATCHv5 11/11] io_uring: cache nodes and mapped buffers Keith Busch
2025-02-25 13:11   ` Pavel Begunkov
2025-02-25 14:10 ` [PATCHv5 00/11] ublk zero copy support Pavel Begunkov
2025-02-25 14:47   ` Jens Axboe
2025-02-25 15:07 ` (subset) " Jens Axboe
