public inbox for [email protected]
* [RFC PATCH 0/4] Read/Write with meta buffer
       [not found] <CGME20240322185729epcas5p350c5054b5b519a6aa9d1b35ba3709563@epcas5p3.samsung.com>
@ 2024-03-22 18:50 ` Kanchan Joshi
       [not found]   ` <CGME20240322185731epcas5p20fc525f793a537310f7b3ae5ba5bc75b@epcas5p2.samsung.com>
                     ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-03-22 18:50 UTC (permalink / raw)
  To: martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Kanchan Joshi

This patchset is aimed at getting feedback on a new io_uring
interface that userspace can use to exchange a meta buffer along with
read/write.

Two new opcodes are added for that: IORING_OP_READ_META and
IORING_OP_WRITE_META. The leftover space in the SQE is used to send the
meta buffer pointer and its length. Patch #2 adds this.
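
For a rough idea of the intended usage, a minimal userspace sketch
(liburing is used here only for ring setup; it assumes a uapi header
carrying the new opcode and fields, and the buffer sizes are only an
example):

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

	/* fd must be opened with O_DIRECT; 4096b of data plus 64b of
	 * meta, e.g. 8b of PI per 512b sector on this format */
	io_uring_prep_read(sqe, fd, data_buf, 4096, 0);
	sqe->opcode = IORING_OP_READ_META;
	sqe->meta_addr = (__u64)(uintptr_t)meta_buf;
	sqe->meta_len = 64;
	io_uring_submit(&ring);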

The interface is supported for block direct IO; patch #4 adds this.
The other two are prep patches.

As much as possible, we have tried not to touch the hot read/write
path. Performance for non-meta IO is the same after the patches [2].
There is some added code in the cold path (worker-based async),
though.

Moderately tested by modifying fio [1] to use this interface
(only for NVMe block devices).

[1] https://github.com/OpenMPDK/fio/tree/feat/test-meta

[2]
Without this:

taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31

With this:
taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31

Anuj Gupta (3):
  io_uring/rw: Get rid of flags field in struct io_rw
  io_uring/rw: support read/write with metadata
  block: modify bio_integrity_map_user to accept iov_iter as argument

Kanchan Joshi (1):
  block: add support to pass the meta buffer

 block/bio-integrity.c         |  27 ++++++---
 block/fops.c                  |   9 +++
 block/t10-pi.c                |   6 ++
 drivers/nvme/host/ioctl.c     |  11 +++-
 include/linux/bio.h           |  13 +++-
 include/linux/fs.h            |   1 +
 include/uapi/linux/io_uring.h |   6 ++
 io_uring/io_uring.c           |   2 +
 io_uring/opdef.c              |  29 +++++++++
 io_uring/rw.c                 | 108 +++++++++++++++++++++++++++++-----
 io_uring/rw.h                 |   8 +++
 11 files changed, 193 insertions(+), 27 deletions(-)


base-commit: 6f0974eccbf78baead1735722c4f1ee3eb9422cd
-- 
2.25.1



* [RFC PATCH 1/4] io_uring/rw: Get rid of flags field in struct io_rw
       [not found]   ` <CGME20240322185731epcas5p20fc525f793a537310f7b3ae5ba5bc75b@epcas5p2.samsung.com>
@ 2024-03-22 18:50     ` Kanchan Joshi
  2024-03-27 23:30       ` David Wei
  0 siblings, 1 reply; 11+ messages in thread
From: Kanchan Joshi @ 2024-03-22 18:50 UTC (permalink / raw)
  To: martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Anuj Gupta

From: Anuj Gupta <[email protected]>

Get rid of the flags field in io_rw. Flags can be set in kiocb->flags
during prep rather than doing it while issuing the I/O in io_read/io_write.

Signed-off-by: Anuj Gupta <[email protected]>
---
 io_uring/rw.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 47e097ab5d7e..40f6c2a59928 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -27,7 +27,6 @@ struct io_rw {
 	struct kiocb			kiocb;
 	u64				addr;
 	u32				len;
-	rwf_t				flags;
 };
 
 static inline bool io_file_supports_nowait(struct io_kiocb *req)
@@ -78,10 +77,16 @@ static int io_iov_buffer_select_prep(struct io_kiocb *req)
 int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+	struct kiocb *kiocb = &rw->kiocb;
 	unsigned ioprio;
 	int ret;
 
-	rw->kiocb.ki_pos = READ_ONCE(sqe->off);
+	kiocb->ki_flags = 0;
+	ret = kiocb_set_rw_flags(kiocb, READ_ONCE(sqe->rw_flags));
+	if (unlikely(ret))
+		return ret;
+
+	kiocb->ki_pos = READ_ONCE(sqe->off);
 	/* used for fixed read/write too - just read unconditionally */
 	req->buf_index = READ_ONCE(sqe->buf_index);
 
@@ -91,15 +96,14 @@ int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		if (ret)
 			return ret;
 
-		rw->kiocb.ki_ioprio = ioprio;
+		kiocb->ki_ioprio = ioprio;
 	} else {
-		rw->kiocb.ki_ioprio = get_current_ioprio();
+		kiocb->ki_ioprio = get_current_ioprio();
 	}
-	rw->kiocb.dio_complete = NULL;
+	kiocb->dio_complete = NULL;
 
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
-	rw->flags = READ_ONCE(sqe->rw_flags);
 	return 0;
 }
 
@@ -720,7 +724,6 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	struct kiocb *kiocb = &rw->kiocb;
 	struct io_ring_ctx *ctx = req->ctx;
 	struct file *file = req->file;
-	int ret;
 
 	if (unlikely(!(file->f_mode & mode)))
 		return -EBADF;
@@ -728,10 +731,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 	if (!(req->flags & REQ_F_FIXED_FILE))
 		req->flags |= io_file_get_flags(file);
 
-	kiocb->ki_flags = file->f_iocb_flags;
-	ret = kiocb_set_rw_flags(kiocb, rw->flags);
-	if (unlikely(ret))
-		return ret;
+	kiocb->ki_flags |= file->f_iocb_flags;
 	kiocb->ki_flags |= IOCB_ALLOC_CACHE;
 
 	/*
-- 
2.25.1



* [RFC PATCH 2/4] io_uring/rw: support read/write with metadata
       [not found]   ` <CGME20240322185734epcas5p2cd407dac97cd157c1833c4022ea84805@epcas5p2.samsung.com>
@ 2024-03-22 18:50     ` Kanchan Joshi
  0 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-03-22 18:50 UTC (permalink / raw)
  To: martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Anuj Gupta, Kanchan Joshi,
	Nitesh Shetty

From: Anuj Gupta <[email protected]>

This patch introduces IORING_OP_READ_META and IORING_OP_WRITE_META
opcodes which allow sending a meta buffer along with read/write.

Applications can do that by using the newly added meta_addr and
meta_len fields of the SQE.

These opcodes are supported only for direct IO.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Nitesh Shetty <[email protected]>
---
 include/linux/fs.h            |  1 +
 include/uapi/linux/io_uring.h |  6 +++
 io_uring/io_uring.c           |  2 +
 io_uring/opdef.c              | 29 ++++++++++++
 io_uring/rw.c                 | 86 +++++++++++++++++++++++++++++++++--
 io_uring/rw.h                 |  8 ++++
 6 files changed, 129 insertions(+), 3 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0a22b7245982..c3a483a4fdac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -327,6 +327,7 @@ struct readahead_control;
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+#define IOCB_USE_META		(1 << 22)
 /*
  * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the
  * iocb completion can be passed back to the owner for execution from a safe
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7bd10201a02b..87bd44098037 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -97,6 +97,10 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		struct {
+			__u64	meta_addr;
+			__u32	meta_len;
+		};
 		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
@@ -256,6 +260,8 @@ enum io_uring_op {
 	IORING_OP_FUTEX_WAITV,
 	IORING_OP_FIXED_FD_INSTALL,
 	IORING_OP_FTRUNCATE,
+	IORING_OP_READ_META,
+	IORING_OP_WRITE_META,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 49a124daa359..7c380cac4465 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -4134,7 +4134,9 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(44, __u16,  addr_len);
 	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
 	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
+	BUILD_BUG_SQE_ELEM(48, __u64,  meta_addr);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
+	BUILD_BUG_SQE_ELEM(56, __u32,  meta_len);
 	BUILD_BUG_SQE_ELEM(56, __u64,  __pad2);
 
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9c080aadc5a6..cb31573ac4ad 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -146,6 +146,26 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_eopnotsupp_prep,
 #endif
 	},
+	[IORING_OP_READ_META] = {
+		.needs_file		= 1,
+		.plug			= 1,
+		.audit_skip		= 1,
+		.ioprio			= 1,
+		.iopoll			= 1,
+		.iopoll_queue		= 1,
+		.prep			= io_prep_rw_meta,
+		.issue			= io_rw_meta,
+	},
+	[IORING_OP_WRITE_META] = {
+		.needs_file		= 1,
+		.plug			= 1,
+		.audit_skip		= 1,
+		.ioprio			= 1,
+		.iopoll			= 1,
+		.iopoll_queue		= 1,
+		.prep			= io_prep_rw_meta,
+		.issue			= io_rw_meta,
+	},
 	[IORING_OP_RECVMSG] = {
 		.needs_file		= 1,
 		.unbound_nonreg_file	= 1,
@@ -501,6 +521,15 @@ const struct io_cold_def io_cold_defs[] = {
 		.cleanup		= io_readv_writev_cleanup,
 		.fail			= io_rw_fail,
 	},
+	[IORING_OP_READ_META] = {
+		.async_size		= sizeof(struct io_async_rw),
+		.name			= "READ_META",
+		.fail			= io_rw_fail,
+	},
+	[IORING_OP_WRITE_META] = {
+		.async_size		= sizeof(struct io_async_rw),
+		.name			= "WRITE_META",
+	},
 	[IORING_OP_FSYNC] = {
 		.name			= "FSYNC",
 	},
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 40f6c2a59928..87a6304052f0 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -27,6 +27,7 @@ struct io_rw {
 	struct kiocb			kiocb;
 	u64				addr;
 	u32				len;
+	u32				meta_len;
 };
 
 static inline bool io_file_supports_nowait(struct io_kiocb *req)
@@ -107,6 +108,22 @@ int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+	struct kiocb *kiocb = &rw->kiocb;
+	int ret;
+
+	ret = io_prep_rw(req, sqe);
+	if (unlikely(ret))
+		return ret;
+	kiocb->private = u64_to_user_ptr(READ_ONCE(sqe->meta_addr));
+	rw->meta_len = READ_ONCE(sqe->meta_len);
+
+	kiocb->ki_flags |= IOCB_USE_META;
+	return 0;
+}
+
 int io_prep_rwv(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	int ret;
@@ -571,9 +588,18 @@ static void io_req_map_rw(struct io_kiocb *req, const struct iovec *iovec,
 	}
 }
 
+static inline void io_req_map_meta(struct io_async_rw *iorw, struct io_rw_state_meta *sm)
+{
+	memcpy(&iorw->s_meta.iter_meta, &sm->iter_meta, sizeof(struct iov_iter));
+	iov_iter_save_state(&iorw->s_meta.iter_meta, &iorw->s_meta.iter_state_meta);
+}
+
 static int io_setup_async_rw(struct io_kiocb *req, const struct iovec *iovec,
 			     struct io_rw_state *s, bool force)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+	struct kiocb *kiocb = &rw->kiocb;
+
 	if (!force && !io_cold_defs[req->opcode].prep_async)
 		return 0;
 	/* opcode type doesn't need async data */
@@ -591,6 +617,11 @@ static int io_setup_async_rw(struct io_kiocb *req, const struct iovec *iovec,
 		iorw = req->async_data;
 		/* we've copied and mapped the iter, ensure state is saved */
 		iov_iter_save_state(&iorw->s.iter, &iorw->s.iter_state);
+		if (unlikely(kiocb->ki_flags & IOCB_USE_META)) {
+			struct io_rw_state_meta *sm = kiocb->private;
+
+			io_req_map_meta(iorw, sm);
+		}
 	}
 	return 0;
 }
@@ -747,7 +778,8 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 		if (!(kiocb->ki_flags & IOCB_DIRECT) || !file->f_op->iopoll)
 			return -EOPNOTSUPP;
 
-		kiocb->private = NULL;
+		if (likely(!(kiocb->ki_flags & IOCB_USE_META)))
+			kiocb->private = NULL;
 		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
 		req->iopoll_completed = 0;
@@ -766,6 +798,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_rw_state __s, *s = &__s;
 	struct iovec *iovec;
 	struct kiocb *kiocb = &rw->kiocb;
+	struct io_rw_state_meta *sm = kiocb->private;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
 	struct io_async_rw *io;
 	ssize_t ret, ret2;
@@ -840,13 +873,16 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 		/* no retry on NONBLOCK nor RWF_NOWAIT */
 		if (req->flags & REQ_F_NOWAIT)
 			goto done;
+		if (kiocb->ki_flags & IOCB_USE_META)
+			kiocb->private = sm;
 		ret = 0;
 	} else if (ret == -EIOCBQUEUED) {
 		if (iovec)
 			kfree(iovec);
 		return IOU_ISSUE_SKIP_COMPLETE;
 	} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
-		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req)) {
+		   (req->flags & REQ_F_NOWAIT) || !need_complete_io(req) ||
+		   (kiocb->ki_flags & IOCB_USE_META)) {
 		/* read all, failed, already did sync or don't want to retry */
 		goto done;
 	}
@@ -857,6 +893,12 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	 * manually if we need to.
 	 */
 	iov_iter_restore(&s->iter, &s->iter_state);
+	if (unlikely(kiocb->ki_flags & IOCB_USE_META)) {
+		/* don't handle partial completion for read + meta */
+		if (ret > 0)
+			goto done;
+		iov_iter_restore(&sm->iter_meta, &sm->iter_state_meta);
+	}
 
 	ret2 = io_setup_async_rw(req, iovec, s, true);
 	iovec = NULL;
@@ -1070,7 +1112,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
 			goto copy_iov;
 
-		if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)) {
+		if (ret2 != req->cqe.res && ret2 >= 0 && need_complete_io(req)
+				&& !(kiocb->ki_flags & IOCB_USE_META)) {
 			struct io_async_rw *io;
 
 			trace_io_uring_short_write(req->ctx, kiocb->ki_pos - ret2,
@@ -1111,6 +1154,43 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }
 
+int io_rw_meta(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+	void __user *meta_addr = u64_to_user_ptr((u64)rw->kiocb.private);
+	struct io_rw_state_meta __sm, *sm = &__sm;
+	struct kiocb *kiocb = &rw->kiocb;
+	int ret;
+
+	if (!(req->file->f_flags & O_DIRECT))
+		return -EOPNOTSUPP;
+	/* prepare iter for meta-buffer */
+	if (!req_has_async_data(req)) {
+		ret = import_ubuf(ITER_SOURCE, meta_addr, rw->meta_len, &sm->iter_meta);
+		if (unlikely(ret < 0))
+			return ret;
+		iov_iter_save_state(&sm->iter_meta, &sm->iter_state_meta);
+	} else {
+		struct io_async_rw *io = req->async_data;
+
+		sm = &io->s_meta;
+		iov_iter_restore(&sm->iter_meta, &sm->iter_state_meta);
+	}
+	/* Store iter for meta-buf in private, will be used later */
+	kiocb->private = sm;
+	if (req->opcode == IORING_OP_READ_META) {
+		ret = __io_read(req, issue_flags);
+		if (ret >= 0)
+			return kiocb_done(req, ret, issue_flags);
+	} else {
+		ret = io_write(req, issue_flags);
+	}
+	if (ret == -EAGAIN)
+		kiocb->private = meta_addr;
+	return ret;
+
+}
+
 void io_rw_fail(struct io_kiocb *req)
 {
 	int res;
diff --git a/io_uring/rw.h b/io_uring/rw.h
index f9e89b4fe4da..7c12216776bc 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -8,19 +8,27 @@ struct io_rw_state {
 	struct iovec			fast_iov[UIO_FASTIOV];
 };
 
+struct io_rw_state_meta {
+	struct iov_iter			iter_meta;
+	struct iov_iter_state		iter_state_meta;
+};
+
 struct io_async_rw {
 	struct io_rw_state		s;
+	struct io_rw_state_meta		s_meta;
 	const struct iovec		*free_iovec;
 	size_t				bytes_done;
 	struct wait_page_queue		wpq;
 };
 
 int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_rwv(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_read(struct io_kiocb *req, unsigned int issue_flags);
 int io_readv_prep_async(struct io_kiocb *req);
 int io_write(struct io_kiocb *req, unsigned int issue_flags);
+int io_rw_meta(struct io_kiocb *req, unsigned int issue_flags);
 int io_writev_prep_async(struct io_kiocb *req);
 void io_readv_writev_cleanup(struct io_kiocb *req);
 void io_rw_fail(struct io_kiocb *req);
-- 
2.25.1



* [RFC PATCH 3/4] block: modify bio_integrity_map_user to accept iov_iter as argument
       [not found]   ` <CGME20240322185736epcas5p3d0093948e9904e775994bcbe735ea0c5@epcas5p3.samsung.com>
@ 2024-03-22 18:50     ` Kanchan Joshi
  0 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-03-22 18:50 UTC (permalink / raw)
  To: martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Anuj Gupta, Kanchan Joshi

From: Anuj Gupta <[email protected]>

Refactor bio_integrity_map_user to accept an iov_iter as its
argument. This is a prep patch.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c     | 14 ++++++--------
 drivers/nvme/host/ioctl.c | 11 +++++++++--
 include/linux/bio.h       |  7 ++++---
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 2e3e8e04961e..e340b5a03cdf 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -308,17 +308,16 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages,
 	return nr_bvecs;
 }
 
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
-			   u32 seed)
+int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
+			  u32 seed)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	unsigned int align = q->dma_pad_mask | queue_dma_alignment(q);
 	struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages;
 	struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec;
+	size_t offset, bytes = iter->count;
 	unsigned int direction, nr_bvecs;
-	struct iov_iter iter;
 	int ret, nr_vecs;
-	size_t offset;
 	bool copy;
 
 	if (bio_integrity(bio))
@@ -331,8 +330,7 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
 	else
 		direction = ITER_SOURCE;
 
-	iov_iter_ubuf(&iter, direction, ubuf, bytes);
-	nr_vecs = iov_iter_npages(&iter, BIO_MAX_VECS + 1);
+	nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS + 1);
 	if (nr_vecs > BIO_MAX_VECS)
 		return -E2BIG;
 	if (nr_vecs > UIO_FASTIOV) {
@@ -342,8 +340,8 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
 		pages = NULL;
 	}
 
-	copy = !iov_iter_is_aligned(&iter, align, align);
-	ret = iov_iter_extract_pages(&iter, &pages, bytes, nr_vecs, 0, &offset);
+	copy = !iov_iter_is_aligned(iter, align, align);
+	ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset);
 	if (unlikely(ret < 0))
 		goto free_bvec;
 
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 3dfd5ae99ae0..deee13000f08 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -145,8 +145,15 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 	if (bdev) {
 		bio_set_dev(bio, bdev);
 		if (meta_buffer && meta_len) {
-			ret = bio_integrity_map_user(bio, meta_buffer, meta_len,
-						     meta_seed);
+			struct iov_iter iter;
+			unsigned int direction;
+
+			if (bio_data_dir(bio) == READ)
+				direction = ITER_DEST;
+			else
+				direction = ITER_SOURCE;
+			iov_iter_ubuf(&iter, direction, meta_buffer, meta_len);
+			ret = bio_integrity_map_user(bio, &iter, meta_seed);
 			if (ret)
 				goto out_unmap;
 			req->cmd_flags |= REQ_INTEGRITY;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 875d792bffff..20cf47fc851f 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -723,7 +723,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 	for_each_bio(_bio)						\
 		bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len, u32 seed);
+int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern bool bio_integrity_prep(struct bio *);
@@ -795,8 +795,9 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page,
 	return 0;
 }
 
-static inline int bio_integrity_map_user(struct bio *bio, void __user *ubuf,
-					 ssize_t len, u32 seed)
+static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
+					u32 seed)
+
 {
 	return -EINVAL;
 }
-- 
2.25.1



* [RFC PATCH 4/4] block: add support to pass the meta buffer
       [not found]   ` <CGME20240322185738epcas5p20e5bd448ce83350eb9e79c929c4a9b2b@epcas5p2.samsung.com>
@ 2024-03-22 18:50     ` Kanchan Joshi
  0 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-03-22 18:50 UTC (permalink / raw)
  To: martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Kanchan Joshi, Anuj Gupta

If the IOCB_USE_META flag is set, iocb->private points to the iov_iter
corresponding to the meta buffer. Use it to prepare the bip and attach
it to the bio that we send down.

Make sure that block-layer integrity processing (the t10-pi remapping)
is skipped for this user-owned meta buffer.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c | 13 +++++++++++++
 block/fops.c          |  9 +++++++++
 block/t10-pi.c        |  6 ++++++
 include/linux/bio.h   |  6 ++++++
 4 files changed, 34 insertions(+)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index e340b5a03cdf..c46b70aff840 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -308,6 +308,19 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages,
 	return nr_bvecs;
 }
 
+int bio_integrity_map_iter(struct bio *bio, struct iov_iter *iter)
+{
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+
+	if (!bi)
+		return -EINVAL;
+
+	if (iter->count < bio_integrity_bytes(bi, bio_sectors(bio)))
+		return -EINVAL;
+
+	return bio_integrity_map_user(bio, iter, 0);
+}
+
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
 			  u32 seed)
 {
diff --git a/block/fops.c b/block/fops.c
index 679d9b752fe8..e488fa66dd60 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -353,6 +353,15 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		task_io_account_write(bio->bi_iter.bi_size);
 	}
 
+	if (unlikely(iocb->ki_flags & IOCB_USE_META)) {
+		ret = bio_integrity_map_iter(bio, iocb->private);
+		WRITE_ONCE(iocb->private, NULL);
+		if (unlikely(ret)) {
+			bio_put(bio);
+			return ret;
+		}
+	}
+
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		bio->bi_opf |= REQ_NOWAIT;
 
diff --git a/block/t10-pi.c b/block/t10-pi.c
index d90892fd6f2a..72d1522417a1 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -156,6 +156,8 @@ static void t10_pi_type1_prepare(struct request *rq)
 		/* Already remapped? */
 		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
 			break;
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
@@ -205,6 +207,8 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 		struct bio_vec iv;
 		struct bvec_iter iter;
 
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
 			void *p;
@@ -408,6 +412,8 @@ static void ext_pi_type1_prepare(struct request *rq)
 		/* Already remapped? */
 		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
 			break;
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 20cf47fc851f..34ea387dfc59 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -723,6 +723,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 	for_each_bio(_bio)						\
 		bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
+int bio_integrity_map_iter(struct bio *bio, struct iov_iter *iter);
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
@@ -802,6 +803,11 @@ static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
 	return -EINVAL;
 }
 
+static inline int bio_integrity_map_iter(struct bio *bio, struct iov_iter *iter)
+{
+	return -EINVAL;
+}
+
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
 /*
-- 
2.25.1



* Re: [RFC PATCH 1/4] io_uring/rw: Get rid of flags field in struct io_rw
  2024-03-22 18:50     ` [RFC PATCH 1/4] io_uring/rw: Get rid of flags field in struct io_rw Kanchan Joshi
@ 2024-03-27 23:30       ` David Wei
  2024-03-27 23:32         ` David Wei
  0 siblings, 1 reply; 11+ messages in thread
From: David Wei @ 2024-03-27 23:30 UTC (permalink / raw)
  To: Kanchan Joshi, martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Anuj Gupta

On 2024-03-22 11:50, Kanchan Joshi wrote:
> From: Anuj Gupta <[email protected]>
> 
> Get rid of the flags field in io_rw. Flags can be set in kiocb->flags
> during prep rather than doing it while issuing the I/O in io_read/io_write.
> 
> Signed-off-by: Anuj Gupta <[email protected]>
> ---
>  io_uring/rw.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)

This patch looks fine and is a no-op on its own, but I think there is a
subtle semantic change. If rw_flags is invalid (i.e.
kiocb_set_rw_flags() returns an error) and prep() fails, then the
remaining submissions won't be submitted unless IORING_SETUP_SUBMIT_ALL
is set.

Currently if kiocb_set_rw_flags() fails in prep(), only the request will
fail.
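
For reference, a rough liburing sketch of opting into that flag (ring
depth arbitrary):

	struct io_uring ring;

	/* with IORING_SETUP_SUBMIT_ALL, one SQE failing at submit time
	 * no longer stops the rest of the submission batch */
	io_uring_queue_init(128, &ring, IORING_SETUP_SUBMIT_ALL);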


* Re: [RFC PATCH 1/4] io_uring/rw: Get rid of flags field in struct io_rw
  2024-03-27 23:30       ` David Wei
@ 2024-03-27 23:32         ` David Wei
  0 siblings, 0 replies; 11+ messages in thread
From: David Wei @ 2024-03-27 23:32 UTC (permalink / raw)
  To: Kanchan Joshi, martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538, Anuj Gupta

On 2024-03-27 16:30, David Wei wrote:
> On 2024-03-22 11:50, Kanchan Joshi wrote:
>> From: Anuj Gupta <[email protected]>
>>
>> Get rid of the flags field in io_rw. Flags can be set in kiocb->flags
>> during prep rather than doing it while issuing the I/O in io_read/io_write.
>>
>> Signed-off-by: Anuj Gupta <[email protected]>
>> ---
>>  io_uring/rw.c | 22 +++++++++++-----------
>>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> This patch looks fine and is a no-op on its own, but I think there is a
> subtle semantic change. If rw_flags is invalid (i.e.
> kiocb_set_rw_flags() returns an error) and prep() fails, then the
> remaining submissions won't be submitted unless IORING_SETUP_SUBMIT_ALL
> is set.
> 
> Currently if kiocb_set_rw_flags() fails in prep(), only the request will
> fail.

Sorry, that should say fails in _issue()_.


* Re: [RFC PATCH 0/4] Read/Write with meta buffer
  2024-03-22 18:50 ` [RFC PATCH 0/4] Read/Write with meta buffer Kanchan Joshi
                     ` (3 preceding siblings ...)
       [not found]   ` <CGME20240322185738epcas5p20e5bd448ce83350eb9e79c929c4a9b2b@epcas5p2.samsung.com>
@ 2024-03-27 23:38   ` Jens Axboe
  2024-03-28 12:03     ` Kanchan Joshi
  2024-04-06 21:30   ` Pavel Begunkov
  5 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2024-03-27 23:38 UTC (permalink / raw)
  To: Kanchan Joshi, martin.petersen, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538

On 3/22/24 12:50 PM, Kanchan Joshi wrote:
> This patchset is aimed at getting feedback on a new io_uring
> interface that userspace can use to exchange a meta buffer along with
> read/write.
> 
> Two new opcodes are added for that: IORING_OP_READ_META and
> IORING_OP_WRITE_META. The leftover space in the SQE is used to send the
> meta buffer pointer and its length. Patch #2 adds this.
> 
> The interface is supported for block direct IO; patch #4 adds this.
> The other two are prep patches.
> 
> As much as possible, we have tried not to touch the hot read/write
> path. Performance for non-meta IO is the same after the patches [2].
> There is some added code in the cold path (worker-based async),
> though.

This patchset should look cleaner if you rebase it on top of the current
for-6.10/io_uring branch, as it gets rid of the async nastiness. Since
that'll need doing anyway, could you repost a v2 where it's rebased on
top of that?

Also, in terms of the cover letter, it would be good to have a bit more
description of what this enables. It's a bit scant on detail about what
exactly this gives you.

> taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
> submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
> submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
> IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
> 
> With this:
> taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
> submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
> submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
> IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31

Not that I don't believe you, but that looks like you pasted the same
stuff in there twice? It's the exact same perf and pids.

-- 
Jens Axboe



* Re: [RFC PATCH 0/4] Read/Write with meta buffer
  2024-03-27 23:38   ` [RFC PATCH 0/4] Read/Write with " Jens Axboe
@ 2024-03-28 12:03     ` Kanchan Joshi
  0 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-03-28 12:03 UTC (permalink / raw)
  To: Jens Axboe, martin.petersen, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538

On 3/28/2024 5:08 AM, Jens Axboe wrote:

> This patchset should look cleaner if you rebase it on top of the current
> for-6.10/io_uring branch, as it gets rid of the async nastiness. Since
> that'll need doing anyway, could you repost a v2 where it's rebased on
> top of that?

Yes, next iteration will use that as the base.

> Also, in terms of the cover letter, it would be good to have a bit more
> description of what this enables. It's a bit scant on detail about what
> exactly this gives you.

Will fix that.
But currently the only thing it gives is the ability to pass a meta
buffer to/from the block device.

It keeps things simple, and is fine for PI type 0 (normal unprotected
IO). For other PI types, exposing a few knobs may help, using
"sqe->rw_flags" if there is no other way.
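
For example, something along these lines (purely hypothetical bits,
not part of this series, loosely mirroring the NVMe PRCHK bits) could
be carried there:

	/* hypothetical per-IO PI control flags */
	#define META_CHK_GUARD	(1U << 0)	/* verify guard tag */
	#define META_CHK_APPTAG	(1U << 1)	/* verify app tag */
	#define META_CHK_REFTAG	(1U << 2)	/* verify ref tag */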

>> taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
>> submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
>> submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
>> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
>> Engine=io_uring, sq_ring=128, cq_ring=128
>> IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
>> IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
>>
>> With this:
>> taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
>> submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
>> submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
>> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
>> Engine=io_uring, sq_ring=128, cq_ring=128
>> IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
>> IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
> 
> Not that I don't believe you, but that looks like you pasted the same
> stuff in there twice? It's the exact same perf and pids.

Indeed :-(
I made a goof-up while pasting stuff [1] into the cover letter.

[1]
Before the patch:
# taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 
/dev/nvme0n1 /dev/nvme1n1
submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
Exiting on timeout
Maximum IOPS=10.04M

After the patch:
# taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 
/dev/nvme0n1 /dev/nvme1n1
submitter=1, tid=2412, file=/dev/nvme1n1, node=-1
submitter=0, tid=2411, file=/dev/nvme0n1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
IOPS=10.03M, BW=4.90GiB/s, IOS/call=31/31
IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
Exiting on timeout
Maximum IOPS=10.04M


* Re: [RFC PATCH 0/4] Read/Write with meta buffer
  2024-03-22 18:50 ` [RFC PATCH 0/4] Read/Write with meta buffer Kanchan Joshi
                     ` (4 preceding siblings ...)
  2024-03-27 23:38   ` [RFC PATCH 0/4] Read/Write with " Jens Axboe
@ 2024-04-06 21:30   ` Pavel Begunkov
  2024-04-25 19:05     ` Kanchan Joshi
  5 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2024-04-06 21:30 UTC (permalink / raw)
  To: Kanchan Joshi, martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538

On 3/22/24 18:50, Kanchan Joshi wrote:
> This patchset is aimed at getting feedback on a new io_uring
> interface that userspace can use to exchange a meta buffer along with
> read/write.
> 
> Two new opcodes are added for that: IORING_OP_READ_META and
> IORING_OP_WRITE_META. The leftover space in the SQE is used to send the
> meta buffer pointer and its length. Patch #2 adds this.

I do remember there were back-and-forth design discussions about that
back when some other guy attempted to implement it, but have you tried
to do it not as a separate opcode?

It reads like all read/write opcodes might benefit from it, and it'd
be unfortunate to then be adding IORING_OP_READ_META_FIXED and
multiplicatively all other variants.

> The interface is supported for block direct IO; patch #4 adds this.
> The other two are prep patches.
> 
> As much as possible, we have tried not to touch the hot read/write
> path. Performance for non-meta IO is the same after the patches [2].
> There is some added code in the cold path (worker-based async),
> though.
> 
> Moderately tested by modifying fio [1] to use this interface
> (only for NVMe block devices).
> 
> [1] https://github.com/OpenMPDK/fio/tree/feat/test-meta
> 
> [2]
> Without this:
> 
> taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
> submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
> submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
> IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
> 
> With this:
> taskset -c 2,5 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1
> submitter=1, tid=2453, file=/dev/nvme1n1, node=-1
> submitter=0, tid=2452, file=/dev/nvme0n1, node=-1
> polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
> Engine=io_uring, sq_ring=128, cq_ring=128
> IOPS=10.02M, BW=4.89GiB/s, IOS/call=31/31
> IOPS=10.04M, BW=4.90GiB/s, IOS/call=31/31
> 
> Anuj Gupta (3):
>    io_uring/rw: Get rid of flags field in struct io_rw
>    io_uring/rw: support read/write with metadata
>    block: modify bio_integrity_map_user to accept iov_iter as argument
> 
> Kanchan Joshi (1):
>    block: add support to pass the meta buffer
> 
>   block/bio-integrity.c         |  27 ++++++---
>   block/fops.c                  |   9 +++
>   block/t10-pi.c                |   6 ++
>   drivers/nvme/host/ioctl.c     |  11 +++-
>   include/linux/bio.h           |  13 +++-
>   include/linux/fs.h            |   1 +
>   include/uapi/linux/io_uring.h |   6 ++
>   io_uring/io_uring.c           |   2 +
>   io_uring/opdef.c              |  29 +++++++++
>   io_uring/rw.c                 | 108 +++++++++++++++++++++++++++++-----
>   io_uring/rw.h                 |   8 +++
>   11 files changed, 193 insertions(+), 27 deletions(-)
> 
> 
> base-commit: 6f0974eccbf78baead1735722c4f1ee3eb9422cd

-- 
Pavel Begunkov


* Re: [RFC PATCH 0/4] Read/Write with meta buffer
  2024-04-06 21:30   ` Pavel Begunkov
@ 2024-04-25 19:05     ` Kanchan Joshi
  0 siblings, 0 replies; 11+ messages in thread
From: Kanchan Joshi @ 2024-04-25 19:05 UTC (permalink / raw)
  To: Pavel Begunkov, martin.petersen, axboe, kbusch, hch
  Cc: io-uring, linux-block, anuj1072538

On 4/7/2024 3:00 AM, Pavel Begunkov wrote:
> On 3/22/24 18:50, Kanchan Joshi wrote:
>> This patchset is aimed at getting feedback on a new io_uring
>> interface that userspace can use to exchange a meta buffer along with
>> read/write.
>>
>> Two new opcodes are added for that: IORING_OP_READ_META and
>> IORING_OP_WRITE_META. The leftover space in the SQE is used to send the
>> meta buffer pointer and its length. Patch #2 adds this.
> 
> I do remember there were back-and-forth design discussions about that
> back when some other guy attempted to implement it, but have you tried
> to do it not as a separate opcode?

I did not try that in the first cut, thinking it would help in not
touching the hot (non-meta) IO path. But I am open to this.

> It reads like all read/write opcodes might benefit from it, and it'd
> be unfortunate to then be adding IORING_OP_READ_META_FIXED and
> multiplicatively all other variants.

Right, that's a good point.

