public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH for-next 0/4] fixed-buffer for uring-cmd/passthrough
       [not found] <CGME20220819104031epcas5p3d485526e1b2b42078ccce7e40a74b7f5@epcas5p3.samsung.com>
@ 2022-08-19 10:30 ` Kanchan Joshi
       [not found]   ` <CGME20220819104036epcas5p2bb4d9b2cccbdfcdb460e085abe7fd1a8@epcas5p2.samsung.com>
                     ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-19 10:30 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
	Kanchan Joshi

Hi,

Currently uring-cmd lacks the ability to leverage the pre-registered
buffers. This series adds new fixed-buffer variant of uring command
IORING_OP_URING_CMD_FIXED, and plumbs nvme passthrough to work with
that.

Patch 1, 3 = prep/infrastructure
Patch 2 = expand io_uring command to use registered-buffers
Patch 4 = expand nvme passthrough to use registered-buffers

Using registered-buffers showed 5-12% IOPS gain in my setup.
QD   Without     With
8     853        928
32    1370       1528
128   1505       1631

This series is prepared on top of:
for-next + iopoll-passthru series [1] + passthru optimization series [2].
A unified branch with all that is present here:
https://github.com/OpenMPDK/linux/commits/feat/pt_fixedbufs_v1

Fio that can use IORING_OP_URING_CMD_FIXED (on specifying fixedbufs=1)
is here -
https://github.com/joshkan/fio/commit/300f1187f75aaf2c502c180041943c340670d0ac

[1] https://lore.kernel.org/linux-block/[email protected]/
[2] https://lore.kernel.org/linux-block/[email protected]/

Anuj Gupta (2):
  io_uring: introduce io_uring_cmd_import_fixed
  io_uring: introduce fixed buffer support for io_uring_cmd

Kanchan Joshi (2):
  block: add helper to map bvec iterator for passthrough
  nvme: wire up fixed buffer support for nvme passthrough

 block/blk-map.c               | 71 +++++++++++++++++++++++++++++++++++
 drivers/nvme/host/ioctl.c     | 38 +++++++++++++------
 include/linux/blk-mq.h        |  1 +
 include/linux/io_uring.h      | 10 +++++
 include/uapi/linux/io_uring.h |  1 +
 io_uring/opdef.c              | 10 +++++
 io_uring/rw.c                 |  3 +-
 io_uring/uring_cmd.c          | 26 +++++++++++++
 8 files changed, 147 insertions(+), 13 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH for-next 1/4] io_uring: introduce io_uring_cmd_import_fixed
       [not found]   ` <CGME20220819104036epcas5p2bb4d9b2cccbdfcdb460e085abe7fd1a8@epcas5p2.samsung.com>
@ 2022-08-19 10:30     ` Kanchan Joshi
  0 siblings, 0 replies; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-19 10:30 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta,
	Kanchan Joshi

From: Anuj Gupta <[email protected]>

This is a new helper that callers can use to obtain a bvec iterator for
the previously mapped buffer. This is preparatory work to enable
fixed-buffer support for io_uring_cmd.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/io_uring.h |  7 +++++++
 io_uring/uring_cmd.c     | 10 ++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 58676c0a398f..60aba10468fc 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -32,6 +32,8 @@ struct io_uring_cmd {
 };
 
 #if defined(CONFIG_IO_URING)
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+		struct iov_iter *iter, void *ioucmd)
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *));
@@ -59,6 +61,11 @@ static inline void io_uring_free(struct task_struct *tsk)
 		__io_uring_free(tsk);
 }
 #else
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+		struct iov_iter *iter, void *ioucmd)
+{
+	return -1;
+}
 static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
 		ssize_t ret2)
 {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8900856fa588..ff65cc8ab6cc 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -121,3 +121,13 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 
 	return IOU_ISSUE_SKIP_COMPLETE;
 }
+
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len,
+		int rw, struct iov_iter *iter, void *ioucmd)
+{
+	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
+	struct io_mapped_ubuf *imu = req->imu;
+
+	return io_import_fixed(rw, iter, imu, ubuf, len);
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
       [not found]   ` <CGME20220819104038epcas5p265c9385cfd9189d20ebfffeaa4d5efae@epcas5p2.samsung.com>
@ 2022-08-19 10:30     ` Kanchan Joshi
  2022-08-22 10:58       ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-19 10:30 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta,
	Kanchan Joshi

From: Anuj Gupta <[email protected]>

Add IORING_OP_URING_CMD_FIXED opcode that enables sending io_uring
command with previously registered buffers. User-space passes the buffer
index in sqe->buf_index, same as done in read/write variants that uses
fixed buffers.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/io_uring.h      |  5 ++++-
 include/uapi/linux/io_uring.h |  1 +
 io_uring/opdef.c              | 10 ++++++++++
 io_uring/rw.c                 |  3 ++-
 io_uring/uring_cmd.c          | 18 +++++++++++++++++-
 5 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 60aba10468fc..40961d7c3827 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -5,6 +5,8 @@
 #include <linux/sched.h>
 #include <linux/xarray.h>
 
+#include<uapi/linux/io_uring.h>
+
 enum io_uring_cmd_flags {
 	IO_URING_F_COMPLETE_DEFER	= 1,
 	IO_URING_F_UNLOCKED		= 2,
@@ -15,6 +17,7 @@ enum io_uring_cmd_flags {
 	IO_URING_F_SQE128		= 4,
 	IO_URING_F_CQE32		= 8,
 	IO_URING_F_IOPOLL		= 16,
+	IO_URING_F_FIXEDBUFS		= 32,
 };
 
 struct io_uring_cmd {
@@ -33,7 +36,7 @@ struct io_uring_cmd {
 
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-		struct iov_iter *iter, void *ioucmd)
+		struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *));
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1463cfecb56b..80ea35d1ed5c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -203,6 +203,7 @@ enum io_uring_op {
 	IORING_OP_SOCKET,
 	IORING_OP_URING_CMD,
 	IORING_OP_SENDZC_NOTIF,
+	IORING_OP_URING_CMD_FIXED,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9a0df19306fe..7d5731b84c92 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
 		.issue			= io_uring_cmd,
 		.prep_async		= io_uring_cmd_prep_async,
 	},
+	[IORING_OP_URING_CMD_FIXED] = {
+		.needs_file		= 1,
+		.plug			= 1,
+		.name			= "URING_CMD_FIXED",
+		.iopoll			= 1,
+		.async_size		= uring_cmd_pdu_size(1),
+		.prep			= io_uring_cmd_prep,
+		.issue			= io_uring_cmd,
+		.prep_async		= io_uring_cmd_prep_async,
+	},
 	[IORING_OP_SENDZC_NOTIF] = {
 		.name			= "SENDZC_NOTIF",
 		.needs_file		= 1,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1a4fb8a44b9a..3c7b94bffa62 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
-		if (req->opcode == IORING_OP_URING_CMD) {
+		if (req->opcode == IORING_OP_URING_CMD ||
+				req->opcode == IORING_OP_URING_CMD_FIXED) {
 			struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
 
 			ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index ff65cc8ab6cc..9383150b2949 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -3,11 +3,13 @@
 #include <linux/errno.h>
 #include <linux/file.h>
 #include <linux/io_uring.h>
+#include <linux/nospec.h>
 
 #include <uapi/linux/io_uring.h>
 
 #include "io_uring.h"
 #include "uring_cmd.h"
+#include "rsrc.h"
 
 static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
 {
@@ -74,6 +76,18 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 
 	if (sqe->rw_flags || sqe->__pad1)
 		return -EINVAL;
+
+	req->buf_index = READ_ONCE(sqe->buf_index);
+	if (req->opcode == IORING_OP_URING_CMD_FIXED) {
+		struct io_ring_ctx *ctx = req->ctx;
+		u16 index;
+
+		if (unlikely(req->buf_index >= ctx->nr_user_bufs))
+			return -EFAULT;
+		index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
+		req->imu = ctx->user_bufs[index];
+		io_req_set_rsrc_node(req, ctx, 0);
+	}
 	ioucmd->cmd = sqe->cmd;
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	return 0;
@@ -98,6 +112,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 		req->iopoll_completed = 0;
 		WRITE_ONCE(ioucmd->cookie, NULL);
 	}
+	if (req->opcode == IORING_OP_URING_CMD_FIXED)
+		issue_flags |= IO_URING_F_FIXEDBUFS;
 
 	if (req_has_async_data(req))
 		ioucmd->cmd = req->async_data;
@@ -125,7 +141,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len,
 		int rw, struct iov_iter *iter, void *ioucmd)
 {
-	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
+	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 	struct io_mapped_ubuf *imu = req->imu;
 
 	return io_import_fixed(rw, iter, imu, ubuf, len);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH for-next 3/4] block: add helper to map bvec iterator for passthrough
       [not found]   ` <CGME20220819104042epcas5p177f384cd4c15918f666c7eacc4dfab4c@epcas5p1.samsung.com>
@ 2022-08-19 10:30     ` Kanchan Joshi
  0 siblings, 0 replies; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-19 10:30 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
	Kanchan Joshi, Anuj Gupta

Add blk_rq_map_user_fixedb which maps the bvec iterator into a bio and
places that into the request.
This helper is to be used in nvme for uring-passthrough with
fixed-buffer.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 block/blk-map.c        | 71 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 72 insertions(+)

diff --git a/block/blk-map.c b/block/blk-map.c
index d0ff80a9902e..ee17cc78bf00 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -611,6 +611,77 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
 }
 EXPORT_SYMBOL(blk_rq_map_user);
 
+/* Prepare bio for passthrough IO given an existing bvec iter */
+int blk_rq_map_user_bvec(struct request *rq, struct iov_iter *iter)
+{
+	struct request_queue *q = rq->q;
+	size_t iter_count, nr_segs;
+	struct bio *bio;
+	struct bio_vec *bv, *bvec_arr, *bvprvp = NULL;
+	struct queue_limits *lim = &q->limits;
+	unsigned int nsegs = 0, bytes = 0;
+	int ret, i;
+
+	iter_count = iov_iter_count(iter);
+	nr_segs = iter->nr_segs;
+
+	if (!iter_count || (iter_count >> 9) > queue_max_hw_sectors(q))
+		return -EINVAL;
+	if (nr_segs > queue_max_segments(q))
+		return -EINVAL;
+	if (rq->cmd_flags & REQ_POLLED) {
+		blk_opf_t opf = rq->cmd_flags | REQ_ALLOC_CACHE;
+
+		/* no iovecs to alloc, as we already have a BVEC iterator */
+		bio = bio_alloc_bioset(NULL, 0, opf, GFP_KERNEL,
+					&fs_bio_set);
+		if (!bio)
+			return -ENOMEM;
+	} else {
+		bio = bio_kmalloc(0, GFP_KERNEL);
+		if (!bio)
+			return -ENOMEM;
+		bio_init(bio, NULL, bio->bi_inline_vecs, 0, req_op(rq));
+	}
+	bio_iov_bvec_set(bio, iter);
+	blk_rq_bio_prep(rq, bio, nr_segs);
+
+	/* loop to perform a bunch of sanity checks */
+	bvec_arr = (struct bio_vec *)iter->bvec;
+	for (i = 0; i < nr_segs; i++) {
+		bv = &bvec_arr[i];
+		/*
+		 * If the queue doesn't support SG gaps and adding this
+		 * offset would create a gap, disallow it.
+		 */
+		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		/* check full condition */
+		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len) {
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		if (bytes + bv->bv_len <= iter_count &&
+				bv->bv_offset + bv->bv_len <= PAGE_SIZE) {
+			nsegs++;
+			bytes += bv->bv_len;
+		} else {
+			ret = -EINVAL;
+			goto out_free;
+		}
+		bvprvp = bv;
+	}
+	return 0;
+out_free:
+	bio_map_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blk_rq_map_user_bvec);
+
 /**
  * blk_rq_unmap_user - unmap a request with user data
  * @bio:	       start of bio list
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 9d1af7a0a401..a7c9c836d2f3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -971,6 +971,7 @@ struct rq_map_data {
 	bool from_user;
 };
 
+int blk_rq_map_user_fixedb(struct request *rq, struct iov_iter *iter);
 int blk_rq_map_user(struct request_queue *, struct request *,
 		struct rq_map_data *, void __user *, unsigned long, gfp_t);
 int blk_rq_map_user_iov(struct request_queue *, struct request *,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH for-next 4/4] nvme: wire up fixed buffer support for nvme passthrough
       [not found]   ` <CGME20220819104045epcas5p117a9fcb0c3143e877e75e24ceba4f381@epcas5p1.samsung.com>
@ 2022-08-19 10:30     ` Kanchan Joshi
  0 siblings, 0 replies; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-19 10:30 UTC (permalink / raw)
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
	Kanchan Joshi, Anuj Gupta

if io_uring sends passthrough command with IO_URING_F_FIXEDBUFS flag,
use the pre-registered buffer to form the bio.
While at it modify nvme_submit_user_cmd to take ubuffer as plain integer
argument, and do away with nvme_to_user_ptr conversion in callers.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 drivers/nvme/host/ioctl.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 7756b439a688..5f2e2d31f5c7 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -65,10 +65,11 @@ static int nvme_finish_user_metadata(struct request *req, void __user *ubuf,
 }
 
 static struct request *nvme_alloc_user_request(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
+		struct nvme_command *cmd, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
 		u32 meta_seed, void **metap, unsigned timeout, bool vec,
-		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags)
+		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags,
+		struct io_uring_cmd *ioucmd, bool fixedbufs)
 {
 	bool write = nvme_is_write(cmd);
 	struct nvme_ns *ns = q->queuedata;
@@ -89,14 +90,27 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 
 	if (ubuffer && bufflen) {
 		if (!vec)
-			ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
-				GFP_KERNEL);
+			if (fixedbufs) {
+				struct iov_iter iter;
+
+				ret = io_uring_cmd_import_fixed(ubuffer,
+						bufflen, rq_data_dir(req),
+						&iter, ioucmd);
+				if (ret < 0)
+					goto out;
+				ret = blk_rq_map_user_fixedb(req, &iter);
+			} else {
+				ret = blk_rq_map_user(q, req, NULL,
+						nvme_to_user_ptr(ubuffer),
+						bufflen, GFP_KERNEL);
+			}
 		else {
 			struct iovec fast_iov[UIO_FASTIOV];
 			struct iovec *iov = fast_iov;
 			struct iov_iter iter;
 
-			ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
+			ret = import_iovec(rq_data_dir(req),
+					nvme_to_user_ptr(ubuffer), bufflen,
 					UIO_FASTIOV, &iov, &iter);
 			if (ret < 0)
 				goto out;
@@ -132,7 +146,7 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 }
 
 static int nvme_submit_user_cmd(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
+		struct nvme_command *cmd, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
 		u32 meta_seed, u64 *result, unsigned timeout, bool vec)
 {
@@ -142,7 +156,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	int ret;
 
 	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
-			meta_len, meta_seed, &meta, timeout, vec, 0, 0);
+			meta_len, meta_seed, &meta, timeout, vec, 0, 0, NULL, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
@@ -220,7 +234,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	c.rw.appmask = cpu_to_le16(io.appmask);
 
 	return nvme_submit_user_cmd(ns->queue, &c,
-			nvme_to_user_ptr(io.addr), length,
+			io.addr, length,
 			metadata, meta_len, lower_32_bits(io.slba), NULL, 0,
 			false);
 }
@@ -274,7 +288,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			nvme_to_user_ptr(cmd.addr), cmd.data_len,
+			cmd.addr, cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
 			0, &result, timeout, false);
 
@@ -320,7 +334,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		timeout = msecs_to_jiffies(cmd.timeout_ms);
 
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			nvme_to_user_ptr(cmd.addr), cmd.data_len,
+			cmd.addr, cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
 			0, &cmd.result, timeout, vec);
 
@@ -457,11 +471,11 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		rq_flags |= REQ_POLLED;
 
 retry:
-	req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr),
+	req = nvme_alloc_user_request(q, &c, d.addr,
 			d.data_len, nvme_to_user_ptr(d.metadata),
 			d.metadata_len, 0, &meta, d.timeout_ms ?
 			msecs_to_jiffies(d.timeout_ms) : 0, vec, rq_flags,
-			blk_flags);
+			blk_flags, ioucmd, issue_flags & IO_URING_F_FIXEDBUFS);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	req->end_io = nvme_uring_cmd_end_io;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  2022-08-19 10:30     ` [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd Kanchan Joshi
@ 2022-08-22 10:58       ` Pavel Begunkov
  2022-08-22 11:33         ` Kanchan Joshi
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2022-08-22 10:58 UTC (permalink / raw)
  To: Kanchan Joshi, axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta

On 8/19/22 11:30, Kanchan Joshi wrote:
> From: Anuj Gupta <[email protected]>
> 
> Add IORING_OP_URING_CMD_FIXED opcode that enables sending io_uring
> command with previously registered buffers. User-space passes the buffer
> index in sqe->buf_index, same as done in read/write variants that uses
> fixed buffers.
> 
> Signed-off-by: Anuj Gupta <[email protected]>
> Signed-off-by: Kanchan Joshi <[email protected]>
> ---
>   include/linux/io_uring.h      |  5 ++++-
>   include/uapi/linux/io_uring.h |  1 +
>   io_uring/opdef.c              | 10 ++++++++++
>   io_uring/rw.c                 |  3 ++-
>   io_uring/uring_cmd.c          | 18 +++++++++++++++++-
>   5 files changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
> index 60aba10468fc..40961d7c3827 100644
> --- a/include/linux/io_uring.h
> +++ b/include/linux/io_uring.h
> @@ -5,6 +5,8 @@
>   #include <linux/sched.h>
>   #include <linux/xarray.h>
>   
> +#include<uapi/linux/io_uring.h>
> +
>   enum io_uring_cmd_flags {
>   	IO_URING_F_COMPLETE_DEFER	= 1,
>   	IO_URING_F_UNLOCKED		= 2,
> @@ -15,6 +17,7 @@ enum io_uring_cmd_flags {
>   	IO_URING_F_SQE128		= 4,
>   	IO_URING_F_CQE32		= 8,
>   	IO_URING_F_IOPOLL		= 16,
> +	IO_URING_F_FIXEDBUFS		= 32,
>   };
>   
>   struct io_uring_cmd {
> @@ -33,7 +36,7 @@ struct io_uring_cmd {
>   
>   #if defined(CONFIG_IO_URING)
>   int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> -		struct iov_iter *iter, void *ioucmd)
> +		struct iov_iter *iter, void *ioucmd);

Please try to compile the first patch separately

>   void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
>   void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>   			void (*task_work_cb)(struct io_uring_cmd *));
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 1463cfecb56b..80ea35d1ed5c 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -203,6 +203,7 @@ enum io_uring_op {
>   	IORING_OP_SOCKET,
>   	IORING_OP_URING_CMD,
>   	IORING_OP_SENDZC_NOTIF,
> +	IORING_OP_URING_CMD_FIXED,

I don't think it should be another opcode, is there any
control flags we can fit it in?


>   	/* this goes last, obviously */
>   	IORING_OP_LAST,
> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
> index 9a0df19306fe..7d5731b84c92 100644
> --- a/io_uring/opdef.c
> +++ b/io_uring/opdef.c
> @@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
>   		.issue			= io_uring_cmd,
>   		.prep_async		= io_uring_cmd_prep_async,
>   	},
> +	[IORING_OP_URING_CMD_FIXED] = {
> +		.needs_file		= 1,
> +		.plug			= 1,
> +		.name			= "URING_CMD_FIXED",
> +		.iopoll			= 1,
> +		.async_size		= uring_cmd_pdu_size(1),
> +		.prep			= io_uring_cmd_prep,
> +		.issue			= io_uring_cmd,
> +		.prep_async		= io_uring_cmd_prep_async,
> +	},
>   	[IORING_OP_SENDZC_NOTIF] = {
>   		.name			= "SENDZC_NOTIF",
>   		.needs_file		= 1,
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 1a4fb8a44b9a..3c7b94bffa62 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>   		if (READ_ONCE(req->iopoll_completed))
>   			break;
>   
> -		if (req->opcode == IORING_OP_URING_CMD) {
> +		if (req->opcode == IORING_OP_URING_CMD ||
> +				req->opcode == IORING_OP_URING_CMD_FIXED) {

I don't see the changed chunk upstream

>   			struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
>   
>   			ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
[...]

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  2022-08-22 10:58       ` Pavel Begunkov
@ 2022-08-22 11:33         ` Kanchan Joshi
  2022-08-25  9:34           ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-22 11:33 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, ming.lei,
	gost.dev, Anuj Gupta

[-- Attachment #1: Type: text/plain, Size: 4098 bytes --]

On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
>On 8/19/22 11:30, Kanchan Joshi wrote:
>>From: Anuj Gupta <[email protected]>
>>
>>Add IORING_OP_URING_CMD_FIXED opcode that enables sending io_uring
>>command with previously registered buffers. User-space passes the buffer
>>index in sqe->buf_index, same as done in read/write variants that uses
>>fixed buffers.
>>
>>Signed-off-by: Anuj Gupta <[email protected]>
>>Signed-off-by: Kanchan Joshi <[email protected]>
>>---
>>  include/linux/io_uring.h      |  5 ++++-
>>  include/uapi/linux/io_uring.h |  1 +
>>  io_uring/opdef.c              | 10 ++++++++++
>>  io_uring/rw.c                 |  3 ++-
>>  io_uring/uring_cmd.c          | 18 +++++++++++++++++-
>>  5 files changed, 34 insertions(+), 3 deletions(-)
>>
>>diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>>index 60aba10468fc..40961d7c3827 100644
>>--- a/include/linux/io_uring.h
>>+++ b/include/linux/io_uring.h
>>@@ -5,6 +5,8 @@
>>  #include <linux/sched.h>
>>  #include <linux/xarray.h>
>>+#include<uapi/linux/io_uring.h>
>>+
>>  enum io_uring_cmd_flags {
>>  	IO_URING_F_COMPLETE_DEFER	= 1,
>>  	IO_URING_F_UNLOCKED		= 2,
>>@@ -15,6 +17,7 @@ enum io_uring_cmd_flags {
>>  	IO_URING_F_SQE128		= 4,
>>  	IO_URING_F_CQE32		= 8,
>>  	IO_URING_F_IOPOLL		= 16,
>>+	IO_URING_F_FIXEDBUFS		= 32,
>>  };
>>  struct io_uring_cmd {
>>@@ -33,7 +36,7 @@ struct io_uring_cmd {
>>  #if defined(CONFIG_IO_URING)
>>  int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>>-		struct iov_iter *iter, void *ioucmd)
>>+		struct iov_iter *iter, void *ioucmd);
>
>Please try to compile the first patch separately

Indeed, this should have been part of that patch. Thanks.

>>  void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
>>  void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>  			void (*task_work_cb)(struct io_uring_cmd *));
>>diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>index 1463cfecb56b..80ea35d1ed5c 100644
>>--- a/include/uapi/linux/io_uring.h
>>+++ b/include/uapi/linux/io_uring.h
>>@@ -203,6 +203,7 @@ enum io_uring_op {
>>  	IORING_OP_SOCKET,
>>  	IORING_OP_URING_CMD,
>>  	IORING_OP_SENDZC_NOTIF,
>>+	IORING_OP_URING_CMD_FIXED,
>
>I don't think it should be another opcode, is there any
>control flags we can fit it in?

using sqe->rw_flags could be another way.
But I think that may create bit of disharmony in user-space.
Current choice (IORING_OP_URING_CMD_FIXED) is along the same lines as
IORING_OP_READ/WRITE_FIXED. User-space uses new opcode, and sends the
buffer by filling sqe->buf_index. 
So must we take a different way?

>>  	/* this goes last, obviously */
>>  	IORING_OP_LAST,
>>diff --git a/io_uring/opdef.c b/io_uring/opdef.c
>>index 9a0df19306fe..7d5731b84c92 100644
>>--- a/io_uring/opdef.c
>>+++ b/io_uring/opdef.c
>>@@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
>>  		.issue			= io_uring_cmd,
>>  		.prep_async		= io_uring_cmd_prep_async,
>>  	},
>>+	[IORING_OP_URING_CMD_FIXED] = {
>>+		.needs_file		= 1,
>>+		.plug			= 1,
>>+		.name			= "URING_CMD_FIXED",
>>+		.iopoll			= 1,
>>+		.async_size		= uring_cmd_pdu_size(1),
>>+		.prep			= io_uring_cmd_prep,
>>+		.issue			= io_uring_cmd,
>>+		.prep_async		= io_uring_cmd_prep_async,
>>+	},
>>  	[IORING_OP_SENDZC_NOTIF] = {
>>  		.name			= "SENDZC_NOTIF",
>>  		.needs_file		= 1,
>>diff --git a/io_uring/rw.c b/io_uring/rw.c
>>index 1a4fb8a44b9a..3c7b94bffa62 100644
>>--- a/io_uring/rw.c
>>+++ b/io_uring/rw.c
>>@@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>>  		if (READ_ONCE(req->iopoll_completed))
>>  			break;
>>-		if (req->opcode == IORING_OP_URING_CMD) {
>>+		if (req->opcode == IORING_OP_URING_CMD ||
>>+				req->opcode == IORING_OP_URING_CMD_FIXED) {
>
>I don't see the changed chunk upstream

Right, it is on top of iopoll support (plus one more series mentioned in
covered letter). Here is the link - 
https://lore.kernel.org/linux-block/[email protected]/
It would be great if you could review that.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  2022-08-22 11:33         ` Kanchan Joshi
@ 2022-08-25  9:34           ` Pavel Begunkov
  2022-08-25 16:02             ` Kanchan Joshi
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2022-08-25  9:34 UTC (permalink / raw)
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, ming.lei,
	gost.dev, Anuj Gupta

On 8/22/22 12:33, Kanchan Joshi wrote:
> On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
[...]
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index 1463cfecb56b..80ea35d1ed5c 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -203,6 +203,7 @@ enum io_uring_op {
>>>      IORING_OP_SOCKET,
>>>      IORING_OP_URING_CMD,
>>>      IORING_OP_SENDZC_NOTIF,
>>> +    IORING_OP_URING_CMD_FIXED,
>>
>> I don't think it should be another opcode, is there any
>> control flags we can fit it in?
> 
> using sqe->rw_flags could be another way.

We also use ->ioprio for io_uring opcode specific flags,
e.g. like in io_sendmsg_prep() for IORING_RECVSEND_POLL_FIRST,
might be even better better.

> But I think that may create bit of disharmony in user-space.
> Current choice (IORING_OP_URING_CMD_FIXED) is along the same lines as
> IORING_OP_READ/WRITE_FIXED.

And I still believe it was a bad choice, I don't like this encoding
of independent options/features by linearising toggles into opcodes.
A consistent way to add vectored fixed bufs would be to have a 4th
opcode, e.g. READV_FIXED, which is not great.

> User-space uses new opcode, and sends the
> buffer by filling sqe->buf_index. So must we take a different way?

I do think so


>>>      /* this goes last, obviously */
>>>      IORING_OP_LAST,
>>> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
>>> index 9a0df19306fe..7d5731b84c92 100644
>>> --- a/io_uring/opdef.c
>>> +++ b/io_uring/opdef.c
>>> @@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
>>>          .issue            = io_uring_cmd,
>>>          .prep_async        = io_uring_cmd_prep_async,
>>>      },
>>> +    [IORING_OP_URING_CMD_FIXED] = {
>>> +        .needs_file        = 1,
>>> +        .plug            = 1,
>>> +        .name            = "URING_CMD_FIXED",
>>> +        .iopoll            = 1,
>>> +        .async_size        = uring_cmd_pdu_size(1),
>>> +        .prep            = io_uring_cmd_prep,
>>> +        .issue            = io_uring_cmd,
>>> +        .prep_async        = io_uring_cmd_prep_async,
>>> +    },
>>>      [IORING_OP_SENDZC_NOTIF] = {
>>>          .name            = "SENDZC_NOTIF",
>>>          .needs_file        = 1,
>>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>>> index 1a4fb8a44b9a..3c7b94bffa62 100644
>>> --- a/io_uring/rw.c
>>> +++ b/io_uring/rw.c
>>> @@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>>>          if (READ_ONCE(req->iopoll_completed))
>>>              break;
>>> -        if (req->opcode == IORING_OP_URING_CMD) {
>>> +        if (req->opcode == IORING_OP_URING_CMD ||
>>> +                req->opcode == IORING_OP_URING_CMD_FIXED) {
>>
>> I don't see the changed chunk upstream
> 
> Right, it is on top of iopoll support (plus one more series mentioned in
> covered letter). Here is the link - https://lore.kernel.org/linux-block/[email protected]/
> It would be great if you could review that.
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  2022-08-25  9:34           ` Pavel Begunkov
@ 2022-08-25 16:02             ` Kanchan Joshi
  0 siblings, 0 replies; 9+ messages in thread
From: Kanchan Joshi @ 2022-08-25 16:02 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, ming.lei,
	gost.dev, Anuj Gupta

[-- Attachment #1: Type: text/plain, Size: 1462 bytes --]

On Thu, Aug 25, 2022 at 10:34:11AM +0100, Pavel Begunkov wrote:
>On 8/22/22 12:33, Kanchan Joshi wrote:
>>On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
>[...]
>>>>diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>>index 1463cfecb56b..80ea35d1ed5c 100644
>>>>--- a/include/uapi/linux/io_uring.h
>>>>+++ b/include/uapi/linux/io_uring.h
>>>>@@ -203,6 +203,7 @@ enum io_uring_op {
>>>>     IORING_OP_SOCKET,
>>>>     IORING_OP_URING_CMD,
>>>>     IORING_OP_SENDZC_NOTIF,
>>>>+    IORING_OP_URING_CMD_FIXED,
>>>
>>>I don't think it should be another opcode, is there any
>>>control flags we can fit it in?
>>
>>using sqe->rw_flags could be another way.
>
>We also use ->ioprio for io_uring opcode specific flags,
>e.g. like in io_sendmsg_prep() for IORING_RECVSEND_POLL_FIRST,
>might be even better better.
>
>>But I think that may create bit of disharmony in user-space.
>>Current choice (IORING_OP_URING_CMD_FIXED) is along the same lines as
>>IORING_OP_READ/WRITE_FIXED.
>
>And I still believe it was a bad choice, I don't like this encoding
>of independent options/features by linearising toggles into opcodes.
>A consistent way to add vectored fixed bufs would be to have a 4th
>opcode, e.g. READV_FIXED, which is not great.
>
>>User-space uses new opcode, and sends the
>>buffer by filling sqe->buf_index. So must we take a different way?
>
>I do think so

I see. Will change this in next iteration.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-08-25 16:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CGME20220819104031epcas5p3d485526e1b2b42078ccce7e40a74b7f5@epcas5p3.samsung.com>
2022-08-19 10:30 ` [PATCH for-next 0/4] fixed-buffer for uring-cmd/passthrough Kanchan Joshi
     [not found]   ` <CGME20220819104036epcas5p2bb4d9b2cccbdfcdb460e085abe7fd1a8@epcas5p2.samsung.com>
2022-08-19 10:30     ` [PATCH for-next 1/4] io_uring: introduce io_uring_cmd_import_fixed Kanchan Joshi
     [not found]   ` <CGME20220819104038epcas5p265c9385cfd9189d20ebfffeaa4d5efae@epcas5p2.samsung.com>
2022-08-19 10:30     ` [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd Kanchan Joshi
2022-08-22 10:58       ` Pavel Begunkov
2022-08-22 11:33         ` Kanchan Joshi
2022-08-25  9:34           ` Pavel Begunkov
2022-08-25 16:02             ` Kanchan Joshi
     [not found]   ` <CGME20220819104042epcas5p177f384cd4c15918f666c7eacc4dfab4c@epcas5p1.samsung.com>
2022-08-19 10:30     ` [PATCH for-next 3/4] block: add helper to map bvec iterator for passthrough Kanchan Joshi
     [not found]   ` <CGME20220819104045epcas5p117a9fcb0c3143e877e75e24ceba4f381@epcas5p1.samsung.com>
2022-08-19 10:30     ` [PATCH for-next 4/4] nvme: wire up fixed buffer support for nvme passthrough Kanchan Joshi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox