* [PATCH for-next 0/4] fixed-buffer for uring-cmd/passthrough
  From: Kanchan Joshi @ 2022-08-19 10:30 UTC
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi

Hi,

Currently uring-cmd lacks the ability to leverage pre-registered
buffers. This series adds a new fixed-buffer variant of the uring
command, IORING_OP_URING_CMD_FIXED, and plumbs nvme passthrough to
work with it.

Patch 1, 3 = prep/infrastructure
Patch 2 = expand io_uring command to use registered buffers
Patch 4 = expand nvme passthrough to use registered buffers

Using registered buffers showed a 5-12% IOPS gain in my setup:

  QD    Without    With
  8      853        928
  32    1370       1528
  128   1505       1631

This series is prepared on top of: for-next + iopoll-passthru series [1]
+ passthru optimization series [2].

A unified branch with all of that is present here:
https://github.com/OpenMPDK/linux/commits/feat/pt_fixedbufs_v1

A fio build that can use IORING_OP_URING_CMD_FIXED (on specifying
fixedbufs=1) is here:
https://github.com/joshkan/fio/commit/300f1187f75aaf2c502c180041943c340670d0ac

[1] https://lore.kernel.org/linux-block/[email protected]/
[2] https://lore.kernel.org/linux-block/[email protected]/

Anuj Gupta (2):
  io_uring: introduce io_uring_cmd_import_fixed
  io_uring: introduce fixed buffer support for io_uring_cmd

Kanchan Joshi (2):
  block: add helper to map bvec iterator for passthrough
  nvme: wire up fixed buffer support for nvme passthrough

 block/blk-map.c               | 71 +++++++++++++++++++++++++++++++++++
 drivers/nvme/host/ioctl.c     | 38 +++++++++++++------
 include/linux/blk-mq.h        |  1 +
 include/linux/io_uring.h      | 10 +++++
 include/uapi/linux/io_uring.h |  1 +
 io_uring/opdef.c              | 10 +++++
 io_uring/rw.c                 |  3 +-
 io_uring/uring_cmd.c          | 26 +++++++++++++
 8 files changed, 147 insertions(+), 13 deletions(-)

--
2.25.1
* [PATCH for-next 1/4] io_uring: introduce io_uring_cmd_import_fixed
  From: Kanchan Joshi @ 2022-08-19 10:30 UTC
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta, Kanchan Joshi

From: Anuj Gupta <[email protected]>

This is a new helper that callers can use to obtain a bvec iterator for
the previously mapped buffer. This is preparatory work to enable
fixed-buffer support for io_uring_cmd.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/io_uring.h |  7 +++++++
 io_uring/uring_cmd.c     | 10 ++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 58676c0a398f..60aba10468fc 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -32,6 +32,8 @@ struct io_uring_cmd {
 };

 #if defined(CONFIG_IO_URING)
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+			      struct iov_iter *iter, void *ioucmd)
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *));
@@ -59,6 +61,11 @@ static inline void io_uring_free(struct task_struct *tsk)
 		__io_uring_free(tsk);
 }
 #else
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
+			      struct iov_iter *iter, void *ioucmd)
+{
+	return -1;
+}
 static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret,
 		ssize_t ret2)
 {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8900856fa588..ff65cc8ab6cc 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -121,3 +121,13 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)

 	return IOU_ISSUE_SKIP_COMPLETE;
 }
+
+int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len,
+			      int rw, struct iov_iter *iter, void *ioucmd)
+{
+	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
+	struct io_mapped_ubuf *imu = req->imu;
+
+	return io_import_fixed(rw, iter, imu, ubuf, len);
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
--
2.25.1
* [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  From: Kanchan Joshi @ 2022-08-19 10:30 UTC
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta, Kanchan Joshi

From: Anuj Gupta <[email protected]>

Add an IORING_OP_URING_CMD_FIXED opcode that enables sending an io_uring
command with previously registered buffers. User-space passes the buffer
index in sqe->buf_index, same as done in the read/write variants that
use fixed buffers.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/io_uring.h      |  5 ++++-
 include/uapi/linux/io_uring.h |  1 +
 io_uring/opdef.c              | 10 ++++++++++
 io_uring/rw.c                 |  3 ++-
 io_uring/uring_cmd.c          | 18 +++++++++++++++++-
 5 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 60aba10468fc..40961d7c3827 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -5,6 +5,8 @@
 #include <linux/sched.h>
 #include <linux/xarray.h>

+#include <uapi/linux/io_uring.h>
+
 enum io_uring_cmd_flags {
 	IO_URING_F_COMPLETE_DEFER	= 1,
 	IO_URING_F_UNLOCKED		= 2,
@@ -15,6 +17,7 @@ enum io_uring_cmd_flags {
 	IO_URING_F_SQE128		= 4,
 	IO_URING_F_CQE32		= 8,
 	IO_URING_F_IOPOLL		= 16,
+	IO_URING_F_FIXEDBUFS		= 32,
 };

 struct io_uring_cmd {
@@ -33,7 +36,7 @@ struct io_uring_cmd {

 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd)
+			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2);
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *));
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1463cfecb56b..80ea35d1ed5c 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -203,6 +203,7 @@ enum io_uring_op {
 	IORING_OP_SOCKET,
 	IORING_OP_URING_CMD,
 	IORING_OP_SENDZC_NOTIF,
+	IORING_OP_URING_CMD_FIXED,

 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9a0df19306fe..7d5731b84c92 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -472,6 +472,16 @@ const struct io_op_def io_op_defs[] = {
 		.issue			= io_uring_cmd,
 		.prep_async		= io_uring_cmd_prep_async,
 	},
+	[IORING_OP_URING_CMD_FIXED] = {
+		.needs_file		= 1,
+		.plug			= 1,
+		.name			= "URING_CMD_FIXED",
+		.iopoll			= 1,
+		.async_size		= uring_cmd_pdu_size(1),
+		.prep			= io_uring_cmd_prep,
+		.issue			= io_uring_cmd,
+		.prep_async		= io_uring_cmd_prep_async,
+	},
 	[IORING_OP_SENDZC_NOTIF] = {
 		.name			= "SENDZC_NOTIF",
 		.needs_file		= 1,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1a4fb8a44b9a..3c7b94bffa62 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1005,7 +1005,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;

-		if (req->opcode == IORING_OP_URING_CMD) {
+		if (req->opcode == IORING_OP_URING_CMD ||
+		    req->opcode == IORING_OP_URING_CMD_FIXED) {
 			struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;

 			ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index ff65cc8ab6cc..9383150b2949 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -3,11 +3,13 @@
 #include <linux/errno.h>
 #include <linux/file.h>
 #include <linux/io_uring.h>
+#include <linux/nospec.h>
 #include <uapi/linux/io_uring.h>

 #include "io_uring.h"
 #include "uring_cmd.h"
+#include "rsrc.h"

 static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
 {
@@ -74,6 +76,18 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (sqe->rw_flags || sqe->__pad1)
 		return -EINVAL;
+
+	req->buf_index = READ_ONCE(sqe->buf_index);
+	if (req->opcode == IORING_OP_URING_CMD_FIXED) {
+		struct io_ring_ctx *ctx = req->ctx;
+		u16 index;
+
+		if (unlikely(req->buf_index >= ctx->nr_user_bufs))
+			return -EFAULT;
+		index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
+		req->imu = ctx->user_bufs[index];
+		io_req_set_rsrc_node(req, ctx, 0);
+	}
 	ioucmd->cmd = sqe->cmd;
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	return 0;
@@ -98,6 +112,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 		req->iopoll_completed = 0;
 		WRITE_ONCE(ioucmd->cookie, NULL);
 	}
+	if (req->opcode == IORING_OP_URING_CMD_FIXED)
+		issue_flags |= IO_URING_F_FIXEDBUFS;

 	if (req_has_async_data(req))
 		ioucmd->cmd = req->async_data;
@@ -125,7 +141,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd)
 {
-	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
+	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 	struct io_mapped_ubuf *imu = req->imu;

 	return io_import_fixed(rw, iter, imu, ubuf, len);
--
2.25.1
* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  From: Pavel Begunkov @ 2022-08-22 10:58 UTC
  To: Kanchan Joshi, axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta

On 8/19/22 11:30, Kanchan Joshi wrote:
> From: Anuj Gupta <[email protected]>
>
> Add IORING_OP_URING_CMD_FIXED opcode that enables sending io_uring
> command with previously registered buffers. User-space passes the buffer
> index in sqe->buf_index, same as done in read/write variants that uses
> fixed buffers.
[...]
>  #if defined(CONFIG_IO_URING)
>  int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> -			      struct iov_iter *iter, void *ioucmd)
> +			      struct iov_iter *iter, void *ioucmd);

Please try to compile the first patch separately

[...]
> @@ -203,6 +203,7 @@ enum io_uring_op {
>  	IORING_OP_SOCKET,
>  	IORING_OP_URING_CMD,
>  	IORING_OP_SENDZC_NOTIF,
> +	IORING_OP_URING_CMD_FIXED,

I don't think it should be another opcode, is there any
control flags we can fit it in?

[...]
> -		if (req->opcode == IORING_OP_URING_CMD) {
> +		if (req->opcode == IORING_OP_URING_CMD ||
> +		    req->opcode == IORING_OP_URING_CMD_FIXED) {

I don't see the changed chunk upstream

[...]

--
Pavel Begunkov
* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  From: Kanchan Joshi @ 2022-08-22 11:33 UTC
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta

On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
>On 8/19/22 11:30, Kanchan Joshi wrote:
[...]
>>  #if defined(CONFIG_IO_URING)
>>  int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>> -			      struct iov_iter *iter, void *ioucmd)
>> +			      struct iov_iter *iter, void *ioucmd);
>
>Please try to compile the first patch separately

Indeed, this should have been part of that patch. Thanks.

[...]
>> +	IORING_OP_URING_CMD_FIXED,
>
>I don't think it should be another opcode, is there any
>control flags we can fit it in?

Using sqe->rw_flags could be another way. But I think that may create a
bit of disharmony in user-space. The current choice
(IORING_OP_URING_CMD_FIXED) is along the same lines as
IORING_OP_READ/WRITE_FIXED. User-space uses the new opcode, and sends
the buffer by filling sqe->buf_index. So must we take a different way?

[...]
>> -		if (req->opcode == IORING_OP_URING_CMD) {
>> +		if (req->opcode == IORING_OP_URING_CMD ||
>> +		    req->opcode == IORING_OP_URING_CMD_FIXED) {
>
>I don't see the changed chunk upstream

Right, it is on top of iopoll support (plus one more series mentioned in
the cover letter). Here is the link:
https://lore.kernel.org/linux-block/[email protected]/
It would be great if you could review that.
* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  From: Pavel Begunkov @ 2022-08-25 9:34 UTC
  To: Kanchan Joshi
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta

On 8/22/22 12:33, Kanchan Joshi wrote:
> On Mon, Aug 22, 2022 at 11:58:24AM +0100, Pavel Begunkov wrote:
[...]
>>> +	IORING_OP_URING_CMD_FIXED,
>>
>> I don't think it should be another opcode, is there any
>> control flags we can fit it in?
>
> Using sqe->rw_flags could be another way.

We also use ->ioprio for io_uring opcode specific flags,
e.g. like in io_sendmsg_prep() for IORING_RECVSEND_POLL_FIRST,
might be even better.

> But I think that may create a bit of disharmony in user-space.
> The current choice (IORING_OP_URING_CMD_FIXED) is along the same
> lines as IORING_OP_READ/WRITE_FIXED.

And I still believe it was a bad choice, I don't like this encoding
of independent options/features by linearising toggles into opcodes.
A consistent way to add vectored fixed bufs would be to have a 4th
opcode, e.g. READV_FIXED, which is not great.

> User-space uses the new opcode, and sends the
> buffer by filling sqe->buf_index. So must we take a different way?

I do think so

[...]

--
Pavel Begunkov
* Re: [PATCH for-next 2/4] io_uring: introduce fixed buffer support for io_uring_cmd
  From: Kanchan Joshi @ 2022-08-25 16:02 UTC
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Anuj Gupta

On Thu, Aug 25, 2022 at 10:34:11AM +0100, Pavel Begunkov wrote:
>On 8/22/22 12:33, Kanchan Joshi wrote:
[...]
>>>I don't think it should be another opcode, is there any
>>>control flags we can fit it in?
>>
>>Using sqe->rw_flags could be another way.
>
>We also use ->ioprio for io_uring opcode specific flags,
>e.g. like in io_sendmsg_prep() for IORING_RECVSEND_POLL_FIRST,
>might be even better.
[...]
>>User-space uses the new opcode, and sends the
>>buffer by filling sqe->buf_index. So must we take a different way?
>
>I do think so

I see. Will change this in the next iteration.
* [PATCH for-next 3/4] block: add helper to map bvec iterator for passthrough
  From: Kanchan Joshi @ 2022-08-19 10:30 UTC
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi, Anuj Gupta

Add blk_rq_map_user_fixedb, which maps the bvec iterator into a bio and
places that into the request. This helper is to be used in nvme for
uring-passthrough with fixed buffers.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 block/blk-map.c        | 71 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/blk-mq.h |  1 +
 2 files changed, 72 insertions(+)

diff --git a/block/blk-map.c b/block/blk-map.c
index d0ff80a9902e..ee17cc78bf00 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -611,6 +611,77 @@ int blk_rq_map_user(struct request_queue *q, struct request *rq,
 }
 EXPORT_SYMBOL(blk_rq_map_user);

+/* Prepare bio for passthrough IO given an existing bvec iter */
+int blk_rq_map_user_fixedb(struct request *rq, struct iov_iter *iter)
+{
+	struct request_queue *q = rq->q;
+	size_t iter_count, nr_segs;
+	struct bio *bio;
+	struct bio_vec *bv, *bvec_arr, *bvprvp = NULL;
+	struct queue_limits *lim = &q->limits;
+	unsigned int nsegs = 0, bytes = 0;
+	int ret, i;
+
+	iter_count = iov_iter_count(iter);
+	nr_segs = iter->nr_segs;
+
+	if (!iter_count || (iter_count >> 9) > queue_max_hw_sectors(q))
+		return -EINVAL;
+	if (nr_segs > queue_max_segments(q))
+		return -EINVAL;
+	if (rq->cmd_flags & REQ_POLLED) {
+		blk_opf_t opf = rq->cmd_flags | REQ_ALLOC_CACHE;
+
+		/* no iovecs to alloc, as we already have a BVEC iterator */
+		bio = bio_alloc_bioset(NULL, 0, opf, GFP_KERNEL,
+					&fs_bio_set);
+		if (!bio)
+			return -ENOMEM;
+	} else {
+		bio = bio_kmalloc(0, GFP_KERNEL);
+		if (!bio)
+			return -ENOMEM;
+		bio_init(bio, NULL, bio->bi_inline_vecs, 0, req_op(rq));
+	}
+	bio_iov_bvec_set(bio, iter);
+	blk_rq_bio_prep(rq, bio, nr_segs);
+
+	/* loop to perform a bunch of sanity checks */
+	bvec_arr = (struct bio_vec *)iter->bvec;
+	for (i = 0; i < nr_segs; i++) {
+		bv = &bvec_arr[i];
+		/*
+		 * If the queue doesn't support SG gaps and adding this
+		 * offset would create a gap, disallow it.
+		 */
+		if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) {
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		/* check full condition */
+		if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len) {
+			ret = -EINVAL;
+			goto out_free;
+		}
+
+		if (bytes + bv->bv_len <= iter_count &&
+		    bv->bv_offset + bv->bv_len <= PAGE_SIZE) {
+			nsegs++;
+			bytes += bv->bv_len;
+		} else {
+			ret = -EINVAL;
+			goto out_free;
+		}
+		bvprvp = bv;
+	}
+	return 0;
+out_free:
+	bio_map_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blk_rq_map_user_fixedb);
+
 /**
  * blk_rq_unmap_user - unmap a request with user data
  * @bio:	       start of bio list
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 9d1af7a0a401..a7c9c836d2f3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -971,6 +971,7 @@ struct rq_map_data {
 	bool from_user;
 };

+int blk_rq_map_user_fixedb(struct request *rq, struct iov_iter *iter);
 int blk_rq_map_user(struct request_queue *, struct request *,
 		struct rq_map_data *, void __user *, unsigned long, gfp_t);
 int blk_rq_map_user_iov(struct request_queue *, struct request *,
--
2.25.1
* [PATCH for-next 4/4] nvme: wire up fixed buffer support for nvme passthrough
  From: Kanchan Joshi @ 2022-08-19 10:30 UTC
  To: axboe, hch, kbusch
  Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi, Anuj Gupta

If io_uring sends a passthrough command with the IO_URING_F_FIXEDBUFS
flag, use the pre-registered buffer to form the bio. While at it,
modify nvme_submit_user_cmd to take ubuffer as a plain integer
argument, and do away with the nvme_to_user_ptr conversion in callers.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 drivers/nvme/host/ioctl.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 7756b439a688..5f2e2d31f5c7 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -65,10 +65,11 @@ static int nvme_finish_user_metadata(struct request *req, void __user *ubuf,
 }

 static struct request *nvme_alloc_user_request(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
+		struct nvme_command *cmd, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
 		u32 meta_seed, void **metap, unsigned timeout, bool vec,
-		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags)
+		blk_opf_t rq_flags, blk_mq_req_flags_t blk_flags,
+		struct io_uring_cmd *ioucmd, bool fixedbufs)
 {
 	bool write = nvme_is_write(cmd);
 	struct nvme_ns *ns = q->queuedata;
@@ -89,14 +90,27 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,

 	if (ubuffer && bufflen) {
-		if (!vec)
-			ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
-				GFP_KERNEL);
-		else {
+		if (!vec) {
+			if (fixedbufs) {
+				struct iov_iter iter;
+
+				ret = io_uring_cmd_import_fixed(ubuffer,
+						bufflen, rq_data_dir(req),
+						&iter, ioucmd);
+				if (ret < 0)
+					goto out;
+				ret = blk_rq_map_user_fixedb(req, &iter);
+			} else {
+				ret = blk_rq_map_user(q, req, NULL,
+						nvme_to_user_ptr(ubuffer),
+						bufflen, GFP_KERNEL);
+			}
+		} else {
 			struct iovec fast_iov[UIO_FASTIOV];
 			struct iovec *iov = fast_iov;
 			struct iov_iter iter;

-			ret = import_iovec(rq_data_dir(req), ubuffer, bufflen,
+			ret = import_iovec(rq_data_dir(req),
+					nvme_to_user_ptr(ubuffer), bufflen,
 					UIO_FASTIOV, &iov, &iter);
 			if (ret < 0)
 				goto out;
@@ -132,7 +146,7 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 }

 static int nvme_submit_user_cmd(struct request_queue *q,
-		struct nvme_command *cmd, void __user *ubuffer,
+		struct nvme_command *cmd, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
 		u32 meta_seed, u64 *result, unsigned timeout, bool vec)
 {
@@ -142,7 +156,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	int ret;

 	req = nvme_alloc_user_request(q, cmd, ubuffer, bufflen, meta_buffer,
-			meta_len, meta_seed, &meta, timeout, vec, 0, 0);
+			meta_len, meta_seed, &meta, timeout, vec, 0, 0, NULL, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
@@ -220,7 +234,7 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 	c.rw.appmask = cpu_to_le16(io.appmask);

 	return nvme_submit_user_cmd(ns->queue, &c,
-			nvme_to_user_ptr(io.addr), length,
+			io.addr, length,
 			metadata, meta_len, lower_32_bits(io.slba), NULL, 0,
 			false);
 }
@@ -274,7 +288,7 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		timeout = msecs_to_jiffies(cmd.timeout_ms);

 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			nvme_to_user_ptr(cmd.addr), cmd.data_len,
+			cmd.addr, cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
 			0, &result, timeout, false);
@@ -320,7 +334,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		timeout = msecs_to_jiffies(cmd.timeout_ms);

 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
-			nvme_to_user_ptr(cmd.addr), cmd.data_len,
+			cmd.addr, cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
 			0, &cmd.result, timeout, vec);
@@ -457,11 +471,11 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		rq_flags |= REQ_POLLED;

 retry:
-	req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr),
+	req = nvme_alloc_user_request(q, &c, d.addr,
 			d.data_len, nvme_to_user_ptr(d.metadata),
 			d.metadata_len, 0, &meta, d.timeout_ms ?
 			msecs_to_jiffies(d.timeout_ms) : 0, vec, rq_flags,
-			blk_flags);
+			blk_flags, ioucmd, issue_flags & IO_URING_F_FIXEDBUFS);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	req->end_io = nvme_uring_cmd_end_io;
--
2.25.1