* [PATCH 0/3] Consistently look up fixed buffers before going async
@ 2025-03-21 18:48 Caleb Sander Mateos
  2025-03-21 18:48 ` [PATCH 1/3] io_uring/net: only import send_zc buffer once Caleb Sander Mateos
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 18:48 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme,
	Caleb Sander Mateos

To use ublk zero copy, an application submits a sequence of io_uring
operations:
(1) Register a ublk request's buffer into the fixed buffer table
(2) Use the fixed buffer in some I/O operation
(3) Unregister the buffer from the fixed buffer table

The ordering of these operations is critical; if the fixed buffer lookup
occurs before the register or after the unregister operation, the I/O
will fail with EFAULT or even corrupt a different ublk request's buffer.
It is possible to guarantee the correct order by linking the operations,
but that adds overhead and doesn't allow multiple I/O operations to
execute in parallel using the same ublk request's buffer. Ideally, the
application could just submit the register, I/O, and unregister SQEs in
the desired order without links and io_uring would ensure the ordering.
This mostly works, leveraging the fact that each io_uring SQE is prepped
and issued non-blocking in order (barring link, drain, and force-async
flags). But it requires the fixed buffer lookup to occur during the
initial non-blocking issue.
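
For illustration, a minimal sketch of such an unlinked submission, assuming
hypothetical prep_ublk_register_io_buf()/prep_ublk_unregister_io_buf()
helpers that fill in the UBLK_U_IO_{,UN}REGISTER_IO_BUF uring_cmd SQEs, and
a write to a backing file as the I/O step (error handling omitted):

#include <liburing.h>

/*
 * Hypothetical helpers (not liburing API): fill in the
 * UBLK_U_IO_REGISTER_IO_BUF / UBLK_U_IO_UNREGISTER_IO_BUF uring_cmd
 * fields for the given ublk queue/tag and fixed buffer table index.
 */
void prep_ublk_register_io_buf(struct io_uring_sqe *sqe, int ublk_fd,
			       int q_id, int tag, unsigned buf_idx);
void prep_ublk_unregister_io_buf(struct io_uring_sqe *sqe, int ublk_fd,
				 int q_id, int tag, unsigned buf_idx);

static void submit_zc_write(struct io_uring *ring, int ublk_fd,
			    int backing_fd, int q_id, int tag,
			    unsigned buf_idx, unsigned len, __u64 file_off)
{
	struct io_uring_sqe *sqe;

	/* (1) register the ublk request's buffer at fixed buffer index buf_idx */
	sqe = io_uring_get_sqe(ring);
	prep_ublk_register_io_buf(sqe, ublk_fd, q_id, tag, buf_idx);

	/*
	 * (2) I/O using the fixed buffer. No IOSQE_IO_LINK on any SQE; the
	 * NULL addr for a kernel-registered ublk buffer is an assumption.
	 */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_write_fixed(sqe, backing_fd, NULL, len, file_off,
				  buf_idx);

	/* (3) unregister the buffer; relies on in-order non-blocking issue */
	sqe = io_uring_get_sqe(ring);
	prep_ublk_unregister_io_buf(sqe, ublk_fd, q_id, tag, buf_idx);

	io_uring_submit(ring);
}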

This patch series fixes the 2 gaps where the initial issue can return
EAGAIN before looking up the fixed buffer:
- IORING_OP_SEND_ZC using IORING_RECVSEND_POLL_FIRST
- IORING_OP_URING_CMD, of which NVMe passthru is currently the only
  fixed buffer user. blk_mq_alloc_request() can return EAGAIN before
  io_uring_cmd_import_fixed() is called to look up the fixed buffer.

Caleb Sander Mateos (3):
  io_uring/net: only import send_zc buffer once
  io_uring/net: import send_zc fixed buffer before going async
  io_uring/uring_cmd: import fixed buffer before going async

 drivers/nvme/host/ioctl.c    | 10 ++++------
 include/linux/io_uring/cmd.h |  6 ++----
 io_uring/net.c               | 13 ++++++++-----
 io_uring/rsrc.c              |  6 ++++++
 io_uring/rsrc.h              |  2 ++
 io_uring/uring_cmd.c         | 10 +++++++---
 6 files changed, 29 insertions(+), 18 deletions(-)

-- 
2.45.2



* [PATCH 1/3] io_uring/net: only import send_zc buffer once
  2025-03-21 18:48 [PATCH 0/3] Consistently look up fixed buffers before going async Caleb Sander Mateos
@ 2025-03-21 18:48 ` Caleb Sander Mateos
  2025-03-21 20:38   ` Pavel Begunkov
  2025-03-21 18:48 ` [PATCH 2/3] io_uring/net: import send_zc fixed buffer before going async Caleb Sander Mateos
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 18:48 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme,
	Caleb Sander Mateos

io_send_zc() guards its call to io_send_zc_import() with if (!done_io)
in an attempt to avoid calling it redundantly on the same req. However,
if the initial non-blocking issue returns -EAGAIN, done_io will stay 0.
This causes the subsequent issue to unnecessarily re-import the buffer.

Add an explicit flag "imported" to io_sr_msg to track if its buffer has
already been imported. Clear the flag in io_send_zc_prep(). Call
io_send_zc_import() and set the flag in io_send_zc() if it is unset.

Signed-off-by: Caleb Sander Mateos <[email protected]>
Fixes: 54cdcca05abd ("io_uring/net: switch io_send() and io_send_zc() to using io_async_msghdr")
---
 io_uring/net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 6d13d378358b..a29893d567b8 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -74,10 +74,11 @@ struct io_sr_msg {
 	unsigned			nr_multishot_loops;
 	u16				flags;
 	/* initialised and used only by !msg send variants */
 	u16				buf_group;
 	bool				retry;
+	bool				imported; /* only for io_send_zc */
 	void __user			*msg_control;
 	/* used only for send zerocopy */
 	struct io_kiocb 		*notif;
 };
 
@@ -1222,10 +1223,11 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_kiocb *notif;
 
 	zc->done_io = 0;
 	zc->retry = false;
+	zc->imported = false;
 	req->flags |= REQ_F_POLL_NO_LAZY;
 
 	if (unlikely(READ_ONCE(sqe->__pad2[0]) || READ_ONCE(sqe->addr3)))
 		return -EINVAL;
 	/* we don't support IOSQE_CQE_SKIP_SUCCESS just yet */
@@ -1369,11 +1371,12 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 
 	if (!(req->flags & REQ_F_POLLED) &&
 	    (zc->flags & IORING_RECVSEND_POLL_FIRST))
 		return -EAGAIN;
 
-	if (!zc->done_io) {
+	if (!zc->imported) {
+		zc->imported = true;
 		ret = io_send_zc_import(req, issue_flags);
 		if (unlikely(ret))
 			return ret;
 	}
 
-- 
2.45.2



* [PATCH 2/3] io_uring/net: import send_zc fixed buffer before going async
  2025-03-21 18:48 [PATCH 0/3] Consistently look up fixed buffers before going async Caleb Sander Mateos
  2025-03-21 18:48 ` [PATCH 1/3] io_uring/net: only import send_zc buffer once Caleb Sander Mateos
@ 2025-03-21 18:48 ` Caleb Sander Mateos
  2025-03-21 18:48 ` [PATCH 3/3] io_uring/uring_cmd: import " Caleb Sander Mateos
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 18:48 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme,
	Caleb Sander Mateos

When IORING_OP_SEND_ZC is used with the IORING_RECVSEND_POLL_FIRST flag,
the initial issue will return -EAGAIN to force arming the poll handler.
If the operation is also using fixed buffers, the fixed buffer lookup
does not happen until the subsequent issue. This ordering difference is
observable when using UBLK_U_IO_{,UN}REGISTER_IO_BUF SQEs to modify the
fixed buffer table. If the IORING_OP_SEND_ZC operation is followed
immediately by a UBLK_U_IO_UNREGISTER_IO_BUF that unregisters the fixed
buffer, IORING_RECVSEND_POLL_FIRST will cause the fixed buffer lookup to
fail because it happens after the buffer is unregistered.

Swap the order of the buffer import and IORING_RECVSEND_POLL_FIRST check
to ensure the fixed buffer lookup happens on the initial issue even if
the operation goes async.

Signed-off-by: Caleb Sander Mateos <[email protected]>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
---
 io_uring/net.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index a29893d567b8..5adc7b80138e 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1367,21 +1367,21 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 	if (!test_bit(SOCK_SUPPORT_ZC, &sock->flags))
 		return -EOPNOTSUPP;
 
-	if (!(req->flags & REQ_F_POLLED) &&
-	    (zc->flags & IORING_RECVSEND_POLL_FIRST))
-		return -EAGAIN;
-
 	if (!zc->imported) {
 		zc->imported = true;
 		ret = io_send_zc_import(req, issue_flags);
 		if (unlikely(ret))
 			return ret;
 	}
 
+	if (!(req->flags & REQ_F_POLLED) &&
+	    (zc->flags & IORING_RECVSEND_POLL_FIRST))
+		return -EAGAIN;
+
 	msg_flags = zc->msg_flags;
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		msg_flags |= MSG_DONTWAIT;
 	if (msg_flags & MSG_WAITALL)
 		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
-- 
2.45.2



* [PATCH 3/3] io_uring/uring_cmd: import fixed buffer before going async
  2025-03-21 18:48 [PATCH 0/3] Consistently look up fixed buffers before going async Caleb Sander Mateos
  2025-03-21 18:48 ` [PATCH 1/3] io_uring/net: only import send_zc buffer once Caleb Sander Mateos
  2025-03-21 18:48 ` [PATCH 2/3] io_uring/net: import send_zc fixed buffer before going async Caleb Sander Mateos
@ 2025-03-21 18:48 ` Caleb Sander Mateos
  2025-03-21 20:35   ` Pavel Begunkov
  2025-03-21 19:53 ` [PATCH 0/3] Consistently look up fixed buffers " Jens Axboe
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 18:48 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme,
	Caleb Sander Mateos

For uring_cmd operations with fixed buffers, the fixed buffer lookup
happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
implementation. A ->uring_cmd() implementation could return -EAGAIN on
the initial issue for any reason before io_uring_cmd_import_fixed().
For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
which can return -EAGAIN if all tags in the tag set are in use.
This ordering difference is observable when using
UBLK_U_IO_{,UN}REGISTER_IO_BUF SQEs to modify the fixed buffer table.
If the uring_cmd is followed by a UBLK_U_IO_UNREGISTER_IO_BUF operation
that unregisters the fixed buffer, the uring_cmd going async will cause
the fixed buffer lookup to fail because it happens after the unregister.

Move the fixed buffer lookup out of io_uring_cmd_import_fixed() and
instead perform it in io_uring_cmd() before calling ->uring_cmd().
io_uring_cmd_import_fixed() now only initializes an iov_iter from the
existing fixed buffer node. This division of responsibilities makes
sense as the fixed buffer lookup is an io_uring implementation detail
and independent of the ->uring_cmd() implementation. It also cuts down
on the need to pass around the io_uring issue_flags.

Signed-off-by: Caleb Sander Mateos <[email protected]>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
---
 drivers/nvme/host/ioctl.c    | 10 ++++------
 include/linux/io_uring/cmd.h |  6 ++----
 io_uring/rsrc.c              |  6 ++++++
 io_uring/rsrc.h              |  2 ++
 io_uring/uring_cmd.c         | 10 +++++++---
 5 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index fe9fb80c6a14..3fad74563b9e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -112,12 +112,11 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 	return req;
 }
 
 static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		struct io_uring_cmd *ioucmd, unsigned int flags,
-		unsigned int iou_issue_flags)
+		struct io_uring_cmd *ioucmd, unsigned int flags)
 {
 	struct request_queue *q = req->q;
 	struct nvme_ns *ns = q->queuedata;
 	struct block_device *bdev = ns ? ns->disk->part0 : NULL;
 	bool supports_metadata = bdev && blk_get_integrity(bdev->bd_disk);
@@ -141,12 +140,11 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 
 		/* fixedbufs is only for non-vectored io */
 		if (WARN_ON_ONCE(flags & NVME_IOCTL_VEC))
 			return -EINVAL;
 		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
-				rq_data_dir(req), &iter, ioucmd,
-				iou_issue_flags);
+				rq_data_dir(req), &iter, ioucmd);
 		if (ret < 0)
 			goto out;
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
 	} else {
 		ret = blk_rq_map_user_io(req, NULL, nvme_to_user_ptr(ubuffer),
@@ -194,11 +192,11 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 		return PTR_ERR(req);
 
 	req->timeout = timeout;
 	if (ubuffer && bufflen) {
 		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
-				meta_len, NULL, flags, 0);
+				meta_len, NULL, flags);
 		if (ret)
 			return ret;
 	}
 
 	bio = req->bio;
@@ -514,11 +512,11 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
 
 	if (d.data_len) {
 		ret = nvme_map_user_request(req, d.addr,
 			d.data_len, nvme_to_user_ptr(d.metadata),
-			d.metadata_len, ioucmd, vec, issue_flags);
+			d.metadata_len, ioucmd, vec);
 		if (ret)
 			return ret;
 	}
 
 	/* to free bio on completion, as req->bio will be null at that time */
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 598cacda4aa3..ea243bfab2a8 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -39,12 +39,11 @@ static inline void io_uring_cmd_private_sz_check(size_t cmd_sz)
 )
 
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter,
-			      struct io_uring_cmd *ioucmd,
-			      unsigned int issue_flags);
+			      struct io_uring_cmd *ioucmd);
 
 /*
  * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
  * and the corresponding io_uring request.
  *
@@ -69,12 +68,11 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
 void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
 
 #else
 static inline int
 io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			  struct iov_iter *iter, struct io_uring_cmd *ioucmd,
-			  unsigned int issue_flags)
+			  struct iov_iter *iter, struct io_uring_cmd *ioucmd)
 {
 	return -EOPNOTSUPP;
 }
 static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
 		u64 ret2, unsigned issue_flags)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 5fff6ba2b7c0..ad0dfe51acb1 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1099,10 +1099,16 @@ int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 	if (!node)
 		return -EFAULT;
 	return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
 }
 
+int io_import_buf_node(struct io_kiocb *req, struct iov_iter *iter,
+			u64 buf_addr, size_t len, int ddir)
+{
+	return io_import_fixed(ddir, iter, req->buf_node->buf, buf_addr, len);
+}
+
 /* Lock two rings at once. The rings must be different! */
 static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 {
 	if (ctx1 > ctx2)
 		swap(ctx1, ctx2);
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index f10a1252b3e9..bc0f8f0a2054 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -59,10 +59,12 @@ int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
 struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
 				      unsigned issue_flags);
 int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
 			u64 buf_addr, size_t len, int ddir,
 			unsigned issue_flags);
+int io_import_buf_node(struct io_kiocb *req, struct iov_iter *iter,
+			u64 buf_addr, size_t len, int ddir);
 
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
 int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned int nr_args, u64 __user *tags);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index de39b602aa82..15a76fe48fe5 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -232,10 +232,15 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 			return -EOPNOTSUPP;
 		issue_flags |= IO_URING_F_IOPOLL;
 		req->iopoll_completed = 0;
 	}
 
+	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
+		if (!io_find_buf_node(req, issue_flags))
+			return -EFAULT;
+	}
+
 	ret = file->f_op->uring_cmd(ioucmd, issue_flags);
 	if (ret == -EAGAIN || ret == -EIOCBQUEUED)
 		return ret;
 	if (ret < 0)
 		req_set_fail(req);
@@ -244,16 +249,15 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter,
-			      struct io_uring_cmd *ioucmd,
-			      unsigned int issue_flags)
+			      struct io_uring_cmd *ioucmd)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
-	return io_import_reg_buf(req, iter, ubuf, len, rw, issue_flags);
+	return io_import_buf_node(req, iter, ubuf, len, rw);
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
 void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd)
 {
-- 
2.45.2



* Re: [PATCH 0/3] Consistently look up fixed buffers before going async
  2025-03-21 18:48 [PATCH 0/3] Consistently look up fixed buffers before going async Caleb Sander Mateos
                   ` (2 preceding siblings ...)
  2025-03-21 18:48 ` [PATCH 3/3] io_uring/uring_cmd: import " Caleb Sander Mateos
@ 2025-03-21 19:53 ` Jens Axboe
  2025-03-21 20:24 ` Pavel Begunkov
  2025-03-22  7:33 ` Ming Lei
  5 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2025-03-21 19:53 UTC (permalink / raw)
  To: Pavel Begunkov, Ming Lei, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Caleb Sander Mateos
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme


On Fri, 21 Mar 2025 12:48:16 -0600, Caleb Sander Mateos wrote:
> To use ublk zero copy, an application submits a sequence of io_uring
> operations:
> (1) Register a ublk request's buffer into the fixed buffer table
> (2) Use the fixed buffer in some I/O operation
> (3) Unregister the buffer from the fixed buffer table
> 
> The ordering of these operations is critical; if the fixed buffer lookup
> occurs before the register or after the unregister operation, the I/O
> will fail with EFAULT or even corrupt a different ublk request's buffer.
> It is possible to guarantee the correct order by linking the operations,
> but that adds overhead and doesn't allow multiple I/O operations to
> execute in parallel using the same ublk request's buffer. Ideally, the
> application could just submit the register, I/O, and unregister SQEs in
> the desired order without links and io_uring would ensure the ordering.
> This mostly works, leveraging the fact that each io_uring SQE is prepped
> and issued non-blocking in order (barring link, drain, and force-async
> flags). But it requires the fixed buffer lookup to occur during the
> initial non-blocking issue.
> 
> [...]

Applied, thanks!

[1/3] io_uring/net: only import send_zc buffer once
      commit: 8e3100fcc5cbba03518b8b5c059624aba5c29d50
[2/3] io_uring/net: import send_zc fixed buffer before going async
      commit: 15f4c96bec5d0791904ee68c0f83ba18cab7466d
[3/3] io_uring/uring_cmd: import fixed buffer before going async
      commit: 70085217bec1eb8bbd19e661da9f1734ed8d35ca

Best regards,
-- 
Jens Axboe





* Re: [PATCH 0/3] Consistently look up fixed buffers before going async
  2025-03-21 18:48 [PATCH 0/3] Consistently look up fixed buffers before going async Caleb Sander Mateos
                   ` (3 preceding siblings ...)
  2025-03-21 19:53 ` [PATCH 0/3] Consistently look up fixed buffers " Jens Axboe
@ 2025-03-21 20:24 ` Pavel Begunkov
  2025-03-21 21:24   ` Caleb Sander Mateos
  2025-03-22  7:42   ` Ming Lei
  2025-03-22  7:33 ` Ming Lei
  5 siblings, 2 replies; 15+ messages in thread
From: Pavel Begunkov @ 2025-03-21 20:24 UTC (permalink / raw)
  To: Caleb Sander Mateos, Jens Axboe, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On 3/21/25 18:48, Caleb Sander Mateos wrote:
> To use ublk zero copy, an application submits a sequence of io_uring
> operations:
> (1) Register a ublk request's buffer into the fixed buffer table
> (2) Use the fixed buffer in some I/O operation
> (3) Unregister the buffer from the fixed buffer table
> 
> The ordering of these operations is critical; if the fixed buffer lookup
> occurs before the register or after the unregister operation, the I/O
> will fail with EFAULT or even corrupt a different ublk request's buffer.
> It is possible to guarantee the correct order by linking the operations,
> but that adds overhead and doesn't allow multiple I/O operations to
> execute in parallel using the same ublk request's buffer. Ideally, the
> application could just submit the register, I/O, and unregister SQEs in
> the desired order without links and io_uring would ensure the ordering.
> This mostly works, leveraging the fact that each io_uring SQE is prepped
> and issued non-blocking in order (barring link, drain, and force-async
> flags). But it requires the fixed buffer lookup to occur during the
> initial non-blocking issue.

In other words, it leverages internal details that are not part of
the uapi, should never be relied upon by the user, and is fragile.
Any drain request or IOSQE_ASYNC will break it, as will any change
to the behaviour that might be desirable in the future.

Sorry, but no, we absolutely can't have that, it'll be an absolute
nightmare to maintain as basically every request scheduling decision
now becomes a part of the uapi.

There is an api to order requests, if you want to order them you
either have to use that or do it in user space. In your particular
case you can try to opportunistically issue them without ordering
by making sure the reg buffer slot is not reused in the meantime
and handling request failures.

-- 
Pavel Begunkov



* Re: [PATCH 3/3] io_uring/uring_cmd: import fixed buffer before going async
  2025-03-21 18:48 ` [PATCH 3/3] io_uring/uring_cmd: import " Caleb Sander Mateos
@ 2025-03-21 20:35   ` Pavel Begunkov
  2025-03-21 21:38     ` Caleb Sander Mateos
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2025-03-21 20:35 UTC (permalink / raw)
  To: Caleb Sander Mateos, Jens Axboe, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On 3/21/25 18:48, Caleb Sander Mateos wrote:
> For uring_cmd operations with fixed buffers, the fixed buffer lookup
> happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
> implementation. A ->uring_cmd() implementation could return -EAGAIN on
> the initial issue for any reason before io_uring_cmd_import_fixed().
> For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
> which can return -EAGAIN if all tags in the tag set are in use.

That's up to the command when it resolves the buffer; you can just
move the call to io_import_reg_buf() earlier in the nvme cmd code
instead of working around it on the io_uring side.

In general, it's a step back; it just got cleaned up from the
mess where node resolution and buffer imports were separate
steps and duplicated by every single request type that used them.

> This ordering difference is observable when using
> UBLK_U_IO_{,UN}REGISTER_IO_BUF SQEs to modify the fixed buffer table.
> If the uring_cmd is followed by a UBLK_U_IO_UNREGISTER_IO_BUF operation
> that unregisters the fixed buffer, the uring_cmd going async will cause
> the fixed buffer lookup to fail because it happens after the unregister.
> 
> Move the fixed buffer lookup out of io_uring_cmd_import_fixed() and
> instead perform it in io_uring_cmd() before calling ->uring_cmd().
> io_uring_cmd_import_fixed() now only initializes an iov_iter from the
> existing fixed buffer node. This division of responsibilities makes
> sense as the fixed buffer lookup is an io_uring implementation detail
> and independent of the ->uring_cmd() implementation. It also cuts down
> on the need to pass around the io_uring issue_flags.
> 
> Signed-off-by: Caleb Sander Mateos <[email protected]>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
> ---
>   drivers/nvme/host/ioctl.c    | 10 ++++------
>   include/linux/io_uring/cmd.h |  6 ++----
>   io_uring/rsrc.c              |  6 ++++++
>   io_uring/rsrc.h              |  2 ++
>   io_uring/uring_cmd.c         | 10 +++++++---
>   5 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
> index fe9fb80c6a14..3fad74563b9e 100644
> --- a/drivers/nvme/host/ioctl.c
> +++ b/drivers/nvme/host/ioctl.c
> @@ -112,12 +112,11 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
-- 
Pavel Begunkov



* Re: [PATCH 1/3] io_uring/net: only import send_zc buffer once
  2025-03-21 18:48 ` [PATCH 1/3] io_uring/net: only import send_zc buffer once Caleb Sander Mateos
@ 2025-03-21 20:38   ` Pavel Begunkov
  2025-03-21 20:44     ` Caleb Sander Mateos
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2025-03-21 20:38 UTC (permalink / raw)
  To: Caleb Sander Mateos, Jens Axboe, Ming Lei, Keith Busch,
	Christoph Hellwig, Sagi Grimberg
  Cc: Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On 3/21/25 18:48, Caleb Sander Mateos wrote:
> io_send_zc() guards its call to io_send_zc_import() with if (!done_io)
> in an attempt to avoid calling it redundantly on the same req. However,
> if the initial non-blocking issue returns -EAGAIN, done_io will stay 0.
> This causes the subsequent issue to unnecessarily re-import the buffer.
> 
> Add an explicit flag "imported" to io_sr_msg to track if its buffer has
> already been imported. Clear the flag in io_send_zc_prep(). Call
> io_send_zc_import() and set the flag in io_send_zc() if it is unset.

lgtm. Maybe there is a way to put it into req->flags and combine
with REQ_F_IMPORT_BUFFER, but likely just an idea for the future.

-- 
Pavel Begunkov



* Re: [PATCH 1/3] io_uring/net: only import send_zc buffer once
  2025-03-21 20:38   ` Pavel Begunkov
@ 2025-03-21 20:44     ` Caleb Sander Mateos
  0 siblings, 0 replies; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 20:44 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Ming Lei, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On Fri, Mar 21, 2025 at 1:37 PM Pavel Begunkov <[email protected]> wrote:
>
> On 3/21/25 18:48, Caleb Sander Mateos wrote:
> > io_send_zc() guards its call to io_send_zc_import() with if (!done_io)
> > in an attempt to avoid calling it redundantly on the same req. However,
> > if the initial non-blocking issue returns -EAGAIN, done_io will stay 0.
> > This causes the subsequent issue to unnecessarily re-import the buffer.
> >
> > Add an explicit flag "imported" to io_sr_msg to track if its buffer has
> > already been imported. Clear the flag in io_send_zc_prep(). Call
> > io_send_zc_import() and set the flag in io_send_zc() if it is unset.
>
> lgtm. Maybe there is a way to put it into req->flags and combine
> with REQ_F_IMPORT_BUFFER, but likely just an idea for the future.

Yes, I considered making it a bitflag. But since there was an existing
hole in io_sr_msg, I figured it wasn't worth optimizing. Certainly if
we want to shrink io_sr_msg in the future, this is low-hanging fruit.
I'm a bit hesitant to reserve a bit in the generic req->flags that's
only used for one opcode.

Best,
Caleb


* Re: [PATCH 0/3] Consistently look up fixed buffers before going async
  2025-03-21 20:24 ` Pavel Begunkov
@ 2025-03-21 21:24   ` Caleb Sander Mateos
  2025-03-22 12:33     ` Pavel Begunkov
  2025-03-22  7:42   ` Ming Lei
  1 sibling, 1 reply; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 21:24 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Ming Lei, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On Fri, Mar 21, 2025 at 1:23 PM Pavel Begunkov <[email protected]> wrote:
>
> On 3/21/25 18:48, Caleb Sander Mateos wrote:
> > To use ublk zero copy, an application submits a sequence of io_uring
> > operations:
> > (1) Register a ublk request's buffer into the fixed buffer table
> > (2) Use the fixed buffer in some I/O operation
> > (3) Unregister the buffer from the fixed buffer table
> >
> > The ordering of these operations is critical; if the fixed buffer lookup
> > occurs before the register or after the unregister operation, the I/O
> > will fail with EFAULT or even corrupt a different ublk request's buffer.
> > It is possible to guarantee the correct order by linking the operations,
> > but that adds overhead and doesn't allow multiple I/O operations to
> > execute in parallel using the same ublk request's buffer. Ideally, the
> > application could just submit the register, I/O, and unregister SQEs in
> > the desired order without links and io_uring would ensure the ordering.
> > This mostly works, leveraging the fact that each io_uring SQE is prepped
> > and issued non-blocking in order (barring link, drain, and force-async
> > flags). But it requires the fixed buffer lookup to occur during the
> > initial non-blocking issue.
>
> In other words, it leverages internal details that are not part of
> the uapi, should never be relied upon by the user, and is fragile.
> Any drain request or IOSQE_ASYNC will break it, as will any change
> to the behaviour that might be desirable in the future.
>
> Sorry, but no, we absolutely can't have that, it'll be an absolute
> nightmare to maintain as basically every request scheduling decision
> now becomes a part of the uapi.

I thought we discussed this on the ublk zero copy patchset, but I
can't seem to find the email. My recollection is that Jens thought it
was reasonable for userspace to rely on the sequential prep + issue of
each SQE as long as it's not setting any of these flags that affect
their order. (Please correct me if that's not what you remember.)
I don't have a strong opinion about whether or not io_uring should
provide this guarantee, but I was under the impression this had
already been decided. I was just trying to fix the few gaps in this
guarantee, but I'm fine dropping the patches if Jens also feels
userspace shouldn't rely on this io_uring behavior.

>
> There is an api to order requests, if you want to order them you
> either have to use that or do it in user space. In your particular
> case you can try to opportunistically issue them without ordering
> by making sure the reg buffer slot is not reused in the meantime
> and handling request failures.

Yes, I am aware of the other options. Unfortunately, io_uring's linked
operation interface isn't rich enough to express an arbitrary
dependency graph. We have multiple I/O operations operating on the
same ublk request's buffer, so we would either need to link the I/O
operations (which would prevent them from executing in parallel), or
use a separate register/unregister operation for every I/O operation
(which has considerable overhead). We can also wait for the completion
of the I/O operations before submitting the unregister operation, but
that adds latency to the ublk request and requires another
io_uring_enter syscall.

We are using separate registered buffer indices for each ublk request
so at least this scenario doesn't lead to data corruption. And we can
certainly handle the EFAULT when the operation goes asynchronous, but
it would be preferable not to need to do that.

Best,
Caleb


* Re: [PATCH 3/3] io_uring/uring_cmd: import fixed buffer before going async
  2025-03-21 20:35   ` Pavel Begunkov
@ 2025-03-21 21:38     ` Caleb Sander Mateos
  2025-03-22 12:18       ` Pavel Begunkov
  0 siblings, 1 reply; 15+ messages in thread
From: Caleb Sander Mateos @ 2025-03-21 21:38 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Jens Axboe, Ming Lei, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On Fri, Mar 21, 2025 at 1:34 PM Pavel Begunkov <[email protected]> wrote:
>
> On 3/21/25 18:48, Caleb Sander Mateos wrote:
> > For uring_cmd operations with fixed buffers, the fixed buffer lookup
> > happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
> > implementation. A ->uring_cmd() implementation could return -EAGAIN on
> > the initial issue for any reason before io_uring_cmd_import_fixed().
> > For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
> > which can return -EAGAIN if all tags in the tag set are in use.
>
> That's up to the command when it resolves the buffer; you can just
> move the call to io_import_reg_buf() earlier in the nvme cmd code
> instead of working around it on the io_uring side.
>
> In general, it's a step back; it just got cleaned up from the
> mess where node resolution and buffer imports were separate
> steps and duplicated by every single request type that used them.

Yes, I considered just reordering the steps in nvme_uring_cmd_io().
But it seems easy for a future change to accidentally introduce
another point where the issue can go async before it has looked up the
fixed buffer. And I am imagining there will be more uring_cmd fixed
buffer users added (e.g. btrfs). This seems like a generic problem
rather than something specific to NVMe passthru.
My other feeling is that the fixed buffer lookup is an io_uring-layer
detail, whereas the use of the buffer is more a concern of the
->uring_cmd() implementation. If only the opcodes were consistent
about how a fixed buffer is requested, we could do the lookup in the
generic io_uring code like fixed files already do.
But I'm open to implementing a different fix here if Jens would prefer.

Best,
Caleb


* Re: [PATCH 0/3] Consistently look up fixed buffers before going async
  2025-03-21 18:48 [PATCH 0/3] Consistently look up fixed buffers before going async Caleb Sander Mateos
                   ` (4 preceding siblings ...)
  2025-03-21 20:24 ` Pavel Begunkov
@ 2025-03-22  7:33 ` Ming Lei
  5 siblings, 0 replies; 15+ messages in thread
From: Ming Lei @ 2025-03-22  7:33 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Jens Axboe, Pavel Begunkov, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On Fri, Mar 21, 2025 at 12:48:16PM -0600, Caleb Sander Mateos wrote:
> To use ublk zero copy, an application submits a sequence of io_uring
> operations:
> (1) Register a ublk request's buffer into the fixed buffer table
> (2) Use the fixed buffer in some I/O operation
> (3) Unregister the buffer from the fixed buffer table
> 
> The ordering of these operations is critical; if the fixed buffer lookup
> occurs before the register or after the unregister operation, the I/O
> will fail with EFAULT or even corrupt a different ublk request's buffer.
> It is possible to guarantee the correct order by linking the operations,
> but that adds overhead and doesn't allow multiple I/O operations to
> execute in parallel using the same ublk request's buffer. Ideally, the
> application could just submit the register, I/O, and unregister SQEs in
> the desired order without links and io_uring would ensure the ordering.

So far there are only two ways to provide the ordering guarantee from the
io_uring syscall viewpoint:

1) IOSQE_IO_LINK

2) submit the register_buffer operation and wait for its completion, then
submit the IO operations

Otherwise, you are just depending on the implementation; there is no such
ordering guarantee, and it is hard to write a generic io_uring application.

I posted the sqe group patchset to address this particular requirement at
the API level.

https://lore.kernel.org/linux-block/[email protected]/

Now I'd suggest reconsidering this approach so that the ordering is
respected at the API level and neither the application nor io_uring
needs to play tricks to address this real problem.

With sqe group, just two OPs are needed:

- provide_buffer OP (group leader)

- other generic OPs (group members)

The group leader won't be completed until all group member OPs are done.

The whole group shares the same IO_LINK/IO_HARDLINK flag.

That is the whole concept; this approach takes fewer SQEs, and the
application becomes simpler too.


Thanks,
Ming



* Re: [PATCH 0/3] Consistently look up fixed buffers before going async
  2025-03-21 20:24 ` Pavel Begunkov
  2025-03-21 21:24   ` Caleb Sander Mateos
@ 2025-03-22  7:42   ` Ming Lei
  1 sibling, 0 replies; 15+ messages in thread
From: Ming Lei @ 2025-03-22  7:42 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Caleb Sander Mateos, Jens Axboe, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On Fri, Mar 21, 2025 at 08:24:43PM +0000, Pavel Begunkov wrote:
> On 3/21/25 18:48, Caleb Sander Mateos wrote:
> > To use ublk zero copy, an application submits a sequence of io_uring
> > operations:
> > (1) Register a ublk request's buffer into the fixed buffer table
> > (2) Use the fixed buffer in some I/O operation
> > (3) Unregister the buffer from the fixed buffer table
> > 
> > The ordering of these operations is critical; if the fixed buffer lookup
> > occurs before the register or after the unregister operation, the I/O
> > will fail with EFAULT or even corrupt a different ublk request's buffer.
> > It is possible to guarantee the correct order by linking the operations,
> > but that adds overhead and doesn't allow multiple I/O operations to
> > execute in parallel using the same ublk request's buffer. Ideally, the
> > application could just submit the register, I/O, and unregister SQEs in
> > the desired order without links and io_uring would ensure the ordering.
> > This mostly works, leveraging the fact that each io_uring SQE is prepped
> > and issued non-blocking in order (barring link, drain, and force-async
> > flags). But it requires the fixed buffer lookup to occur during the
> > initial non-blocking issue.
> 
> In other words, it leverages internal details that are not part of
> the uapi, should never be relied upon by the user, and is fragile.
> Any drain request or IOSQE_ASYNC will break it, as will any change
> to the behaviour that might be desirable in the future.
> 
> Sorry, but no, we absolutely can't have that, it'll be an absolute
> nightmare to maintain as basically every request scheduling decision
> now becomes a part of the uapi.
> 
> There is an api to order requests, if you want to order them you
> either have to use that or do it in user space. In your particular
> case you can try to opportunistically issue them without ordering
> by making sure the reg buffer slot is not reused in the meantime
> and handling request failures.

I agree, the order should be provided at the UAPI/syscall level.

SQE group does address this ordering issue, and it can now work together
with the fixed buffer registering OP.

If no one objects, I will post the patch for review.


Thanks, 
Ming



* Re: [PATCH 3/3] io_uring/uring_cmd: import fixed buffer before going async
  2025-03-21 21:38     ` Caleb Sander Mateos
@ 2025-03-22 12:18       ` Pavel Begunkov
  0 siblings, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2025-03-22 12:18 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Jens Axboe, Ming Lei, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On 3/21/25 21:38, Caleb Sander Mateos wrote:
> On Fri, Mar 21, 2025 at 1:34 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 3/21/25 18:48, Caleb Sander Mateos wrote:
>>> For uring_cmd operations with fixed buffers, the fixed buffer lookup
>>> happens in io_uring_cmd_import_fixed(), called from the ->uring_cmd()
>>> implementation. A ->uring_cmd() implementation could return -EAGAIN on
>>> the initial issue for any reason before io_uring_cmd_import_fixed().
>>> For example, nvme_uring_cmd_io() calls nvme_alloc_user_request() first,
>>> which can return -EAGAIN if all tags in the tag set are in use.
>>
>> That's up to the command when it resolves the buffer; you can just
>> move the call to io_import_reg_buf() earlier in the nvme cmd code
>> instead of working around it on the io_uring side.
>>
>> In general, it's a step back; it just got cleaned up from the
>> mess where node resolution and buffer imports were separate
>> steps and duplicated by every single request type that used them.
> 
> Yes, I considered just reordering the steps in nvme_uring_cmd_io().
> But it seems easy for a future change to accidentally introduce
> another point where the issue can go async before it has looked up the
> fixed buffer. And I am imagining there will be more uring_cmd fixed
> buffer users added (e.g. btrfs). This seems like a generic problem
> rather than something specific to NVMe passthru.

That's working around the api for ordering requests, that's the
reason you have an ordering problem.

> My other feeling is that the fixed buffer lookup is an io_uring-layer
> detail, whereas the use of the buffer is more a concern of the
> ->uring_cmd() implementation. If only the opcodes were consistent
> about how a fixed buffer is requested, we could do the lookup in the
> generic io_uring code like fixed files already do.

That's one of the things I'd hoped was done better, and not only
specifically for registered buffers, but it's too late for that.

> But I'm open to implementing a different fix here if Jens would prefer.

It's not a fix; the behaviour is well within the constraints
of io_uring.

-- 
Pavel Begunkov



* Re: [PATCH 0/3] Consistently look up fixed buffers before going async
  2025-03-21 21:24   ` Caleb Sander Mateos
@ 2025-03-22 12:33     ` Pavel Begunkov
  0 siblings, 0 replies; 15+ messages in thread
From: Pavel Begunkov @ 2025-03-22 12:33 UTC (permalink / raw)
  To: Caleb Sander Mateos
  Cc: Jens Axboe, Ming Lei, Keith Busch, Christoph Hellwig,
	Sagi Grimberg, Xinyu Zhang, io-uring, linux-kernel, linux-nvme

On 3/21/25 21:24, Caleb Sander Mateos wrote:
> On Fri, Mar 21, 2025 at 1:23 PM Pavel Begunkov <[email protected]> wrote:
>>
>> On 3/21/25 18:48, Caleb Sander Mateos wrote:
>>> To use ublk zero copy, an application submits a sequence of io_uring
>>> operations:
>>> (1) Register a ublk request's buffer into the fixed buffer table
>>> (2) Use the fixed buffer in some I/O operation
>>> (3) Unregister the buffer from the fixed buffer table
>>>
>>> The ordering of these operations is critical; if the fixed buffer lookup
>>> occurs before the register or after the unregister operation, the I/O
>>> will fail with EFAULT or even corrupt a different ublk request's buffer.
>>> It is possible to guarantee the correct order by linking the operations,
>>> but that adds overhead and doesn't allow multiple I/O operations to
>>> execute in parallel using the same ublk request's buffer. Ideally, the
>>> application could just submit the register, I/O, and unregister SQEs in
>>> the desired order without links and io_uring would ensure the ordering.
>>> This mostly works, leveraging the fact that each io_uring SQE is prepped
>>> and issued non-blocking in order (barring link, drain, and force-async
>>> flags). But it requires the fixed buffer lookup to occur during the
>>> initial non-blocking issue.
>>
>> In other words, it leverages internal details that are not part of
>> the uapi, should never be relied upon by the user, and is fragile.
>> Any drain request or IOSQE_ASYNC will break it, as will any change
>> to the behaviour that might be desirable in the future.
>>
>> Sorry, but no, we absolutely can't have that, it'll be an absolute
>> nightmare to maintain as basically every request scheduling decision
>> now becomes a part of the uapi.
> 
> I thought we discussed this on the ublk zero copy patchset, but I
> can't seem to find the email. My recollection is that Jens thought it
> was reasonable for userspace to rely on the sequential prep + issue of
> each SQE as long as it's not setting any of these flags that affect
> their order. (Please correct me if that's not what you remember.)

Well, my opinions are my own. I think it's reasonable to rely on that
for optimisation purposes IFF the user space can sanely handle
errors when the assumption fails.

In your case, the user space should expect that an unregistration
op can happen before a read/write has resolved the buffer (node), in
which case the rw request will find that the buffer slot is empty
and fail. That should be handled in the user space, e.g. by reissuing
the rw request or failing it.
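
Roughly, as a sketch (is_buffered_rw(), resubmit_rw() and
complete_ublk_req() below are placeholders for your own application
helpers, not io_uring API):

#include <liburing.h>
#include <stdbool.h>
#include <errno.h>

/* placeholders: decode user_data, re-register + reissue, complete ublk req */
bool is_buffered_rw(__u64 user_data);
void resubmit_rw(struct io_uring *ring, __u64 user_data);
void complete_ublk_req(__u64 user_data, int res);

static void reap_completions(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;

	while (io_uring_peek_cqe(ring, &cqe) == 0) {
		if (is_buffered_rw(cqe->user_data) && cqe->res == -EFAULT) {
			/*
			 * The rw went async and resolved the buffer after
			 * the unregister emptied the slot. As long as the
			 * slot is not reused in the meantime, it is safe to
			 * re-register the buffer, reissue the rw and
			 * unregister again.
			 */
			resubmit_rw(ring, cqe->user_data);
		} else {
			complete_ublk_req(cqe->user_data, cqe->res);
		}
		io_uring_cqe_seen(ring, cqe);
	}
}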

> I don't have a strong opinion about whether or not io_uring should
> provide this guarantee, but I was under the impression this had
> already been decided. I was just trying to fix the few gaps in this

I don't think so, it's a major uapi change, and a huge burden
for many future core io_uring changes.

> guarantee, but I'm fine dropping the patches if Jens also feels
> userspace shouldn't rely on this io_uring behavior.
> 
>>
>> There is an api to order requests, if you want to order them you
>> either have to use that or do it in user space. In your particular
>> case you can try to opportunistically issue them without ordering
>> by making sure the reg buffer slot is not reused in the meantime
>> and handling request failures.
> 
> Yes, I am aware of the other options. Unfortunately, io_uring's linked
> operation interface isn't rich enough to express an arbitrary
> dependency graph. We have multiple I/O operations operating on the
> same ublk request's buffer, so we would either need to link the I/O
> operations (which would prevent them from executing in parallel), or
> use a separate register/unregister operation for every I/O operation
> (which has considerable overhead). We can also wait for the completion
> of the I/O operations before submitting the unregister operation, but
> that adds latency to the ublk request and requires another
> io_uring_enter syscall.
> 
> We are using separate registered buffer indices for each ublk request
> so at least this scenario doesn't lead to data corruption. And we can
> certainly handle the EFAULT when the operation goes asynchronous, but
> it would be preferable not to need to do that.

-- 
Pavel Begunkov


