* [RFC PATCH 0/6] Fixed-buffers io_uring passthrough over nvme-char
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare, Kanchan Joshi

Hi,

This series expands io_uring passthrough to support fixed buffers. For
the NVMe-side plumbing, it uses the per-namespace char device as the
backend.

The work is on top of Jens' branch (which is 5.14-rc1 based):
https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v5

Testing is done by integrating the user-space side with fio:
https://github.com/joshkan/fio/commit/a8f40d4f7013ccd38fc51275d8b00fb4ce2404f5

The patches can be grouped into two parts. The first part (patches 1
and 2) extends the 'io_uring cmd' infrastructure (for ioctl updates in
task context) and wires up nvme passthrough on the char device
(/dev/ngXnY).

The second part (patches 4, 5, and 6) introduces a fixed-buffer variant
of uring-cmd (IORING_OP_URING_CMD_FIXED) and new NVMe passthrough
ioctls that operate on pre-registered buffers. The buffers are
registered with io_uring in the usual fashion, and the buffer index is
passed in the command-SQE. This is to be used along with the two new
fixed-buffer ioctl opcodes: NVME_IOCTL_IO_CMD_FIXED or
NVME_IOCTL_IO64_CMD_FIXED.

For the existing passthrough ioctls, the nvme driver maps the buffers
in the I/O path for each command. For the new ioctls, the driver
supplies the buffer index to io_uring and gets to reuse pre-mapped
buffers.

If introducing new ioctls in nvme (for fixed buffers) does not sound
great, another option would be to use 'user flags' in the passthrough
command. I look forward to the thoughts of the community on this.

Jens,
The command-SQE didn't have a field for buf_index. I'm torn among a few
options, but repurposed len for now. Please take a look.

I'd appreciate feedback/comments.

Thanks,

Changes from v4:
https://lore.kernel.org/linux-nvme/[email protected]/

1. Moved to the v5 branch of Jens, adapted to task-work changes in io_uring
2. Removed support for block passthrough (over nvme0n1) for now
3. Added support for char passthrough (over ng0n1)
4. Added fixed-buffer passthrough in io_uring and nvme plumbing

Anuj Gupta (2):
  io_uring: mark iopoll not supported for uring-cmd
  io_uring: add support for uring_cmd with fixed-buffer

Kanchan Joshi (4):
  io_uring: add infra for uring_cmd completion in submitter-task
  nvme: wire-up support for async-passthru on char-device
  io_uring: add helper for fixed-buffer uring-cmd
  nvme: enable passthrough with fixed-buffer

 drivers/nvme/host/core.c        |   1 +
 drivers/nvme/host/ioctl.c       | 245 +++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c   |   1 +
 drivers/nvme/host/nvme.h        |   5 +
 fs/io_uring.c                   |  69 ++++++++-
 include/linux/io_uring.h        |  15 ++
 include/uapi/linux/io_uring.h   |   6 +-
 include/uapi/linux/nvme_ioctl.h |   2 +
 8 files changed, 319 insertions(+), 25 deletions(-)

-- 
2.25.1
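As a rough sketch of the intended user-space flow (not part of the
series; the registration call below is the standard liburing one, while
the SQE encoding is only described in the comment and follows the cover
letter and patch 2):

/*
 * Sketch, assuming liburing: register a fixed buffer once so the
 * kernel pins and maps it up front. Subsequent passthrough SQEs
 * (IORING_OP_URING_CMD_FIXED, added in patch 5) then carry the buffer
 * index in the repurposed len field, plus the fixed-buffer ioctl
 * opcode (e.g. NVME_IOCTL_IO64_CMD_FIXED) and a pointer to the
 * nvme_passthru_cmd64 in the pdu area.
 */
#include <liburing.h>
#include <stdlib.h>
#include <sys/uio.h>

static int setup_fixed_buffer(struct io_uring *ring, void **buf, size_t len)
{
	struct iovec iov;

	if (posix_memalign(buf, 4096, len))
		return -1;
	iov.iov_base = *buf;
	iov.iov_len = len;
	return io_uring_register_buffers(ring, &iov, 1);	/* slot 0 */
}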
* [RFC PATCH 1/6] io_uring: add infra for uring_cmd completion in submitter-task.
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare, Kanchan Joshi

Completion of a uring_cmd ioctl may involve referencing certain
ioctl-specific fields, requiring the original submitter's context.
Export an API that drivers can use for this purpose. The API
facilitates reusing the task-work infrastructure of io_uring, while the
driver gets to implement cmd-specific handling in a callback.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 fs/io_uring.c            | 16 ++++++++++++++++
 include/linux/io_uring.h |  8 ++++++++
 2 files changed, 24 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 481ebd20352c..b73bc16c3e70 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2041,6 +2041,22 @@ static void io_req_task_submit(struct io_kiocb *req)
 	mutex_unlock(&ctx->uring_lock);
 }
 
+static void io_uring_cmd_work(struct io_kiocb *req)
+{
+	req->uring_cmd.driver_cb(&req->uring_cmd);
+}
+
+void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+			void (*driver_cb)(struct io_uring_cmd *))
+{
+	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
+
+	req->uring_cmd.driver_cb = driver_cb;
+	req->io_task_work.func = io_uring_cmd_work;
+	io_req_task_work_add(req);
+}
+EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
+
 static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
 {
 	req->result = ret;
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index ea20c16f9a64..235d1603f97e 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -14,11 +14,15 @@ struct io_uring_cmd {
 	__u16 op;
 	__u16 unused;
 	__u32 len;
+	/* used if driver requires update in task context */
+	void (*driver_cb)(struct io_uring_cmd *cmd);
 	__u64 pdu[5]; /* 40 bytes available inline for free use */
 };
 
 #if defined(CONFIG_IO_URING)
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret);
+void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+			void (*driver_cb)(struct io_uring_cmd *));
 struct sock *io_uring_get_socket(struct file *file);
 void __io_uring_cancel(struct files_struct *files);
 void __io_uring_free(struct task_struct *tsk);
@@ -41,6 +45,10 @@ static inline void io_uring_free(struct task_struct *tsk)
 static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret)
 {
 }
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+			void (*driver_cb)(struct io_uring_cmd *))
+{
+}
 static inline struct sock *io_uring_get_socket(struct file *file)
 {
 	return NULL;
-- 
2.25.1
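To make the intended call pattern concrete, here is a hypothetical
driver-side sketch (the driver and its struct are made up; patch 2
below is the real user of this API):

/* Hypothetical driver, not from this series */
#include <linux/io_uring.h>

struct mydrv_cmd {
	struct io_uring_cmd *ioucmd;
	int result;
};

static void mydrv_task_cb(struct io_uring_cmd *ioucmd)
{
	/*
	 * Runs in the submitter's task context; it is safe to touch
	 * user memory (copy_to_user()/put_user()) for ioctl-specific
	 * fields here before signalling completion.
	 */
	io_uring_cmd_done(ioucmd, 0);
}

static void mydrv_hw_completion(struct mydrv_cmd *cmd)
{
	/* typically interrupt context: punt the user-visible work */
	io_uring_cmd_complete_in_task(cmd->ioucmd, mydrv_task_cb);
}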
* [RFC PATCH 2/6] nvme: wire-up support for async-passthru on char-device.
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare, Kanchan Joshi

Introduce handlers for fops->uring_cmd(), implementing async passthru
on the char device (including the multipath one). The handlers support
NVME_IOCTL_IO_CMD and NVME_IOCTL_IO64_CMD.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 drivers/nvme/host/core.c      |   1 +
 drivers/nvme/host/ioctl.c     | 177 +++++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c |   1 +
 drivers/nvme/host/nvme.h      |   5 +
 4 files changed, 169 insertions(+), 15 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 11779be42186..58fe7f6c94ba 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3559,6 +3559,7 @@ static const struct file_operations nvme_ns_chr_fops = {
 	.release	= nvme_ns_chr_release,
 	.unlocked_ioctl	= nvme_ns_chr_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
+	.uring_cmd	= nvme_ns_chr_async_ioctl,
 };
 
 static int nvme_add_ns_cdev(struct nvme_ns *ns)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 305ddd415e45..2730c5dfdf78 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -18,6 +18,93 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
 		ptrval = (compat_uptr_t)ptrval;
 	return (void __user *)ptrval;
 }
+/*
+ * This is carved within the io_uring_cmd, to avoid dynamic allocation.
+ * Care should be taken not to grow this beyond what is available.
+ * Expect a build warning otherwise.
+ */
+struct uring_cmd_data {
+	union {
+		struct bio *bio;
+		u64 result; /* nvme cmd result */
+	};
+	void *meta; /* kernel-resident buffer */
+	int status; /* nvme cmd status */
+};
+
+inline u64 *nvme_ioucmd_data_addr(struct io_uring_cmd *ioucmd)
+{
+	return &(((struct block_uring_cmd *)&ioucmd->pdu)->unused2[1]);
+}
+
+static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd)
+{
+	struct uring_cmd_data *ucd;
+	struct nvme_passthru_cmd64 __user *ptcmd64 = NULL;
+	struct block_uring_cmd *bcmd;
+
+	bcmd = (struct block_uring_cmd *) &ioucmd->pdu;
+	ptcmd64 = (void __user *) bcmd->unused2[0];
+	ucd = (struct uring_cmd_data *) nvme_ioucmd_data_addr(ioucmd);
+
+	if (ucd->meta) {
+		void __user *umeta = nvme_to_user_ptr(ptcmd64->metadata);
+
+		if (!ucd->status)
+			if (copy_to_user(umeta, ucd->meta, ptcmd64->metadata_len))
+				ucd->status = -EFAULT;
+		kfree(ucd->meta);
+	}
+	if (likely(bcmd->ioctl_cmd == NVME_IOCTL_IO64_CMD)) {
+		if (put_user(ucd->result, &ptcmd64->result))
+			ucd->status = -EFAULT;
+	} else {
+		struct nvme_passthru_cmd __user *ptcmd = (void *)bcmd->unused2[0];
+
+		if (put_user(ucd->result, &ptcmd->result))
+			ucd->status = -EFAULT;
+	}
+	io_uring_cmd_done(ioucmd, ucd->status);
+}
+
+static void nvme_end_async_pt(struct request *req, blk_status_t err)
+{
+	struct io_uring_cmd *ioucmd;
+	struct uring_cmd_data *ucd;
+	struct bio *bio;
+
+	ioucmd = req->end_io_data;
+	ucd = (struct uring_cmd_data *) nvme_ioucmd_data_addr(ioucmd);
+	/* extract bio before reusing the same field for status */
+	bio = ucd->bio;
+
+	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
+		ucd->status = -EINTR;
+	else
+		ucd->status = nvme_req(req)->status;
+	ucd->result = le64_to_cpu(nvme_req(req)->result.u64);
+
+	/* this takes care of setting up task-work */
+	io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb);
+
+	/* we can unmap pages, free bio and request */
+	blk_rq_unmap_user(bio);
+	blk_mq_free_request(req);
+}
+
+static void nvme_setup_uring_cmd_data(struct request *rq,
+		struct io_uring_cmd *ioucmd, void *meta, bool write)
+{
+	struct uring_cmd_data *ucd;
+
+	ucd = (struct uring_cmd_data *) nvme_ioucmd_data_addr(ioucmd);
+	/* to free bio on completion, as req->bio will be null at that time */
+	ucd->bio = rq->bio;
+	/* meta update is required only for read requests */
+	if (meta && !write)
+		ucd->meta = meta;
+	rq->end_io_data = ioucmd;
+}
 
 static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 		unsigned len, u32 seed, bool write)
@@ -56,7 +143,8 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
 static int nvme_submit_user_cmd(struct request_queue *q,
 		struct nvme_command *cmd, void __user *ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		u32 meta_seed, u64 *result, unsigned timeout)
+		u32 meta_seed, u64 *result, unsigned timeout,
+		struct io_uring_cmd *ioucmd)
 {
 	bool write = nvme_is_write(cmd);
 	struct nvme_ns *ns = q->queuedata;
@@ -92,6 +180,12 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 			req->cmd_flags |= REQ_INTEGRITY;
 		}
 	}
+	if (ioucmd) { /* async dispatch */
+		nvme_setup_uring_cmd_data(req, ioucmd, meta, write);
+		blk_execute_rq_nowait(ns ? ns->disk : NULL, req, 0,
+					nvme_end_async_pt);
+		return 0;
+	}
 
 	ret = nvme_execute_passthru_rq(req);
 	if (result)
@@ -170,7 +264,8 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
 
 	return nvme_submit_user_cmd(ns->queue, &c,
 			nvme_to_user_ptr(io.addr), length,
-			metadata, meta_len, lower_32_bits(io.slba), NULL, 0);
+			metadata, meta_len, lower_32_bits(io.slba), NULL, 0,
+			NULL);
 }
 
 static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl,
@@ -188,7 +283,8 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl,
 }
 
 static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd __user *ucmd)
+			struct nvme_passthru_cmd __user *ucmd,
+			struct io_uring_cmd *ioucmd)
 {
 	struct nvme_passthru_cmd cmd;
 	struct nvme_command c;
@@ -224,9 +320,9 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
 			nvme_to_user_ptr(cmd.addr), cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
-			0, &result, timeout);
+			0, &result, timeout, ioucmd);
 
-	if (status >= 0) {
+	if (!ioucmd && status >= 0) {
 		if (put_user(result, &ucmd->result))
 			return -EFAULT;
 	}
@@ -235,7 +331,8 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 }
 
 static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd64 __user *ucmd)
+			struct nvme_passthru_cmd64 __user *ucmd,
+			struct io_uring_cmd *ioucmd)
 {
 	struct nvme_passthru_cmd64 cmd;
 	struct nvme_command c;
@@ -270,9 +367,9 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
 			nvme_to_user_ptr(cmd.addr), cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
-			0, &cmd.result, timeout);
+			0, &cmd.result, timeout, ioucmd);
 
-	if (status >= 0) {
+	if (!ioucmd && status >= 0) {
 		if (put_user(cmd.result, &ucmd->result))
 			return -EFAULT;
 	}
@@ -294,9 +391,9 @@ static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd,
 {
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ctrl, NULL, argp);
+		return nvme_user_cmd(ctrl, NULL, argp, NULL);
 	case NVME_IOCTL_ADMIN64_CMD:
-		return nvme_user_cmd64(ctrl, NULL, argp);
+		return nvme_user_cmd64(ctrl, NULL, argp, NULL);
 	default:
 		return sed_ioctl(ctrl->opal_dev, cmd, argp);
 	}
@@ -328,7 +425,7 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
 		force_successful_syscall_return();
 		return ns->head->ns_id;
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->ctrl, ns, argp);
+		return nvme_user_cmd(ns->ctrl, ns, argp, NULL);
 	/*
 	 * struct nvme_user_io can have different padding on some 32-bit ABIs.
 	 * Just accept the compat version as all fields that are used are the
@@ -340,7 +437,7 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
 	case NVME_IOCTL_SUBMIT_IO:
 		return nvme_submit_io(ns, argp);
 	case NVME_IOCTL_IO64_CMD:
-		return nvme_user_cmd64(ns->ctrl, ns, argp);
+		return nvme_user_cmd64(ns->ctrl, ns, argp, NULL);
 	default:
 		if (!ns->ndev)
 			return -ENOTTY;
@@ -371,6 +468,41 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	return __nvme_ioctl(ns, cmd, (void __user *)arg);
 }
 
+static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
+{
+	struct block_uring_cmd *bcmd = (struct block_uring_cmd *)&ioucmd->pdu;
+	void __user *argp = (void __user *) bcmd->unused2[0];
+	int ret;
+
+	BUILD_BUG_ON(sizeof(struct uring_cmd_data) >
+			sizeof(struct block_uring_cmd) -
+			offsetof(struct block_uring_cmd, unused2[1]));
+
+	switch (bcmd->ioctl_cmd) {
+	case NVME_IOCTL_IO_CMD:
+		ret = nvme_user_cmd(ns->ctrl, ns, argp, ioucmd);
+		break;
+	case NVME_IOCTL_IO64_CMD:
+		ret = nvme_user_cmd64(ns->ctrl, ns, argp, ioucmd);
+		break;
+	default:
+		ret = -ENOTTY;
+	}
+
+	if (ret >= 0)
+		ret = -EIOCBQUEUED;
+	return ret;
+}
+
+int nvme_ns_chr_async_ioctl(struct io_uring_cmd *ioucmd,
+		enum io_uring_cmd_flags flags)
+{
+	struct nvme_ns *ns = container_of(file_inode(ioucmd->file)->i_cdev,
+			struct nvme_ns, cdev);
+
+	return nvme_ns_async_ioctl(ns, ioucmd);
+}
+
 #ifdef CONFIG_NVME_MULTIPATH
 static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
 		void __user *argp, struct nvme_ns_head *head, int srcu_idx)
@@ -387,6 +519,21 @@ static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
 	return ret;
 }
 
+int nvme_ns_head_chr_async_ioctl(struct io_uring_cmd *ioucmd,
+		enum io_uring_cmd_flags flags)
+{
+	struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
+	struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
+	int srcu_idx = srcu_read_lock(&head->srcu);
+	struct nvme_ns *ns = nvme_find_path(head);
+	int ret = -EWOULDBLOCK;
+
+	if (ns)
+		ret = nvme_ns_async_ioctl(ns, ioucmd);
+	srcu_read_unlock(&head->srcu, srcu_idx);
+	return ret;
+}
+
 int nvme_ns_head_ioctl(struct block_device *bdev, fmode_t mode,
 		unsigned int cmd, unsigned long arg)
 {
@@ -463,7 +610,7 @@ static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
 	kref_get(&ns->kref);
 	up_read(&ctrl->namespaces_rwsem);
 
-	ret = nvme_user_cmd(ctrl, ns, argp);
+	ret = nvme_user_cmd(ctrl, ns, argp, NULL);
 	nvme_put_ns(ns);
 	return ret;
 
@@ -480,9 +627,9 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ctrl, NULL, argp);
+		return nvme_user_cmd(ctrl, NULL, argp, NULL);
 	case NVME_IOCTL_ADMIN64_CMD:
-		return nvme_user_cmd64(ctrl, NULL, argp);
+		return nvme_user_cmd64(ctrl, NULL, argp, NULL);
 	case NVME_IOCTL_IO_CMD:
 		return nvme_dev_user_cmd(ctrl, argp);
 	case NVME_IOCTL_RESET:
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 0ea5298469c3..e89a6334249a 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -403,6 +403,7 @@ static const struct file_operations nvme_ns_head_chr_fops = {
 	.release	= nvme_ns_head_chr_release,
 	.unlocked_ioctl	= nvme_ns_head_chr_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
+	.uring_cmd	= nvme_ns_head_chr_async_ioctl,
 };
 
 static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 18ef8dd03a90..e43e73eda4c6 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -17,6 +17,7 @@
 #include <linux/rcupdate.h>
 #include <linux/wait.h>
 #include <linux/t10-pi.h>
+#include <linux/io_uring.h>
 
 #include <trace/events/block.h>
 
@@ -688,6 +689,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
 long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
+int nvme_ns_chr_async_ioctl(struct io_uring_cmd *ucmd,
+		enum io_uring_cmd_flags flags);
+int nvme_ns_head_chr_async_ioctl(struct io_uring_cmd *ucmd,
+		enum io_uring_cmd_flags flags);
 int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo);
 
 extern const struct attribute_group *nvme_ns_id_attr_groups[];
-- 
2.25.1
* Re: [RFC PATCH 2/6] nvme: wire-up support for async-passthru on char-device.
From: Christoph Hellwig @ 2021-09-07  7:46 UTC
To: Kanchan Joshi
Cc: axboe, hch, kbusch, io-uring, linux-nvme, anuj20.g, javier.gonz, hare

Looking at this in isolation:

 - no need to also implement the legacy non-64 passthrough interface
 - no need to overlay the block_uring_cmd structure, as that makes a
   complete mess

Below is an untested patch to fix that up a bit. A few other notes:

 - I suspect the ioctl_cmd really should move into the core uring_cmd
   infrastructure
 - please stick to the naming of the file operation instead of using
   something different. That being said, async_ioctl seems better
   fitting than uring_cmd
 - that whole mix of user-space interface and internal data in the
   ->pdu field is a mess. What is the problem with deferring the
   request freeing into the user context, which would clean up quite a
   bit of that, especially if io_uring_cmd grows a private field?

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index d336e34aac410..8ceff441b6425 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -18,12 +18,12 @@ static void __user *nvme_to_user_ptr(uintptr_t ptrval)
 		ptrval = (compat_uptr_t)ptrval;
 	return (void __user *)ptrval;
 }
-/*
- * This is carved within the io_uring_cmd, to avoid dynamic allocation.
- * Care should be taken not to grow this beyond what is available.
- * Expect a build warning otherwise.
- */
-struct uring_cmd_data {
+
+/* This overlays struct io_uring_cmd pdu (40 bytes) */
+struct nvme_uring_cmd {
+	__u32	ioctl_cmd;
+	__u32	unused1;
+	void __user *argp;
 	union {
 		struct bio *bio;
 		u64 result; /* nvme cmd result */
@@ -32,57 +32,42 @@ struct uring_cmd_data {
 	int status; /* nvme cmd status */
 };
 
-inline u64 *nvme_ioucmd_data_addr(struct io_uring_cmd *ioucmd)
+static struct nvme_uring_cmd *nvme_uring_cmd(struct io_uring_cmd *ioucmd)
 {
-	return &(((struct block_uring_cmd *)&ioucmd->pdu)->unused2[1]);
+	return (struct nvme_uring_cmd *)&ioucmd->pdu;
 }
 
 static void nvme_pt_task_cb(struct io_uring_cmd *ioucmd)
 {
-	struct uring_cmd_data *ucd;
-	struct nvme_passthru_cmd64 __user *ptcmd64 = NULL;
-	struct block_uring_cmd *bcmd;
+	struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd);
+	struct nvme_passthru_cmd64 __user *ptcmd64 = cmd->argp;
 
-	bcmd = (struct block_uring_cmd *) &ioucmd->pdu;
-	ptcmd64 = (void __user *) bcmd->unused2[0];
-	ucd = (struct uring_cmd_data *) nvme_ioucmd_data_addr(ioucmd);
-
-	if (ucd->meta) {
+	if (cmd->meta) {
 		void __user *umeta = nvme_to_user_ptr(ptcmd64->metadata);
 
-		if (!ucd->status)
-			if (copy_to_user(umeta, ucd->meta, ptcmd64->metadata_len))
-				ucd->status = -EFAULT;
-		kfree(ucd->meta);
+		if (!cmd->status)
+			if (copy_to_user(umeta, cmd->meta, ptcmd64->metadata_len))
+				cmd->status = -EFAULT;
+		kfree(cmd->meta);
 	}
-	if (likely(bcmd->ioctl_cmd == NVME_IOCTL_IO64_CMD)) {
-		if (put_user(ucd->result, &ptcmd64->result))
-			ucd->status = -EFAULT;
-	} else {
-		struct nvme_passthru_cmd __user *ptcmd = (void *)bcmd->unused2[0];
-
-		if (put_user(ucd->result, &ptcmd->result))
-			ucd->status = -EFAULT;
-	}
-	io_uring_cmd_done(ioucmd, ucd->status);
+	if (put_user(cmd->result, &ptcmd64->result))
+		cmd->status = -EFAULT;
+	io_uring_cmd_done(ioucmd, cmd->status);
 }
 
 static void nvme_end_async_pt(struct request *req, blk_status_t err)
 {
-	struct io_uring_cmd *ioucmd;
-	struct uring_cmd_data *ucd;
-	struct bio *bio;
-
-	ioucmd = req->end_io_data;
-	ucd = (struct uring_cmd_data *) nvme_ioucmd_data_addr(ioucmd);
+	struct io_uring_cmd *ioucmd = req->end_io_data;
+	struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd);
 	/* extract bio before reusing the same field for status */
-	bio = ucd->bio;
+	struct bio *bio = cmd->bio;
 
 	if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
-		ucd->status = -EINTR;
+		cmd->status = -EINTR;
 	else
-		ucd->status = nvme_req(req)->status;
-	ucd->result = le64_to_cpu(nvme_req(req)->result.u64);
+		cmd->status = nvme_req(req)->status;
+	cmd->result = le64_to_cpu(nvme_req(req)->result.u64);
 
 	/* this takes care of setting up task-work */
 	io_uring_cmd_complete_in_task(ioucmd, nvme_pt_task_cb);
@@ -95,14 +80,15 @@ static void nvme_end_async_pt(struct request *req, blk_status_t err)
 static void nvme_setup_uring_cmd_data(struct request *rq,
 		struct io_uring_cmd *ioucmd, void *meta, bool write)
 {
-	struct uring_cmd_data *ucd;
+	struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd);
 
-	ucd = (struct uring_cmd_data *) nvme_ioucmd_data_addr(ioucmd);
 	/* to free bio on completion, as req->bio will be null at that time */
-	ucd->bio = rq->bio;
+	cmd->bio = rq->bio;
 	/* meta update is required only for read requests */
 	if (meta && !write)
-		ucd->meta = meta;
+		cmd->meta = meta;
+	else
+		cmd->meta = NULL;
 	rq->end_io_data = ioucmd;
 }
 
@@ -139,23 +125,19 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
  out:
 	return ERR_PTR(ret);
 }
+
 static inline bool nvme_is_fixedb_passthru(struct io_uring_cmd *ioucmd)
 {
-	struct block_uring_cmd *bcmd;
-
 	if (!ioucmd)
 		return false;
-	bcmd = (struct block_uring_cmd *)&ioucmd->pdu;
-	if (bcmd && ((bcmd->ioctl_cmd == NVME_IOCTL_IO_CMD_FIXED) ||
-			(bcmd->ioctl_cmd == NVME_IOCTL_IO64_CMD_FIXED)))
-		return true;
-	return false;
+	return nvme_uring_cmd(ioucmd)->ioctl_cmd == NVME_IOCTL_IO64_CMD_FIXED;
 }
+
 /*
  * Unlike blk_rq_map_user() this is only for fixed-buffer async passthrough.
  * And hopefully faster as well.
  */
-int nvme_rq_map_user_fixedb(struct request_queue *q, struct request *rq,
+static int nvme_rq_map_user_fixedb(struct request_queue *q, struct request *rq,
 		void __user *ubuf, unsigned long len, gfp_t gfp_mask,
 		struct io_uring_cmd *ioucmd)
 {
@@ -345,8 +327,7 @@ static bool nvme_validate_passthru_nsid(struct nvme_ctrl *ctrl,
 }
 
 static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
-			struct nvme_passthru_cmd __user *ucmd,
-			struct io_uring_cmd *ioucmd)
+			struct nvme_passthru_cmd __user *ucmd)
 {
 	struct nvme_passthru_cmd cmd;
 	struct nvme_command c;
@@ -382,9 +363,9 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
 			nvme_to_user_ptr(cmd.addr), cmd.data_len,
 			nvme_to_user_ptr(cmd.metadata), cmd.metadata_len,
-			0, &result, timeout, ioucmd);
+			0, &result, timeout, NULL);
 
-	if (!ioucmd && status >= 0) {
+	if (status >= 0) {
 		if (put_user(result, &ucmd->result))
 			return -EFAULT;
 	}
@@ -453,7 +434,7 @@ static int nvme_ctrl_ioctl(struct nvme_ctrl *ctrl, unsigned int cmd,
 {
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ctrl, NULL, argp, NULL);
+		return nvme_user_cmd(ctrl, NULL, argp);
 	case NVME_IOCTL_ADMIN64_CMD:
 		return nvme_user_cmd64(ctrl, NULL, argp, NULL);
 	default:
@@ -487,7 +468,7 @@ static int nvme_ns_ioctl(struct nvme_ns *ns, unsigned int cmd,
 		force_successful_syscall_return();
 		return ns->head->ns_id;
 	case NVME_IOCTL_IO_CMD:
-		return nvme_user_cmd(ns->ctrl, ns, argp, NULL);
+		return nvme_user_cmd(ns->ctrl, ns, argp);
 	/*
 	 * struct nvme_user_io can have different padding on some 32-bit ABIs.
 	 * Just accept the compat version as all fields that are used are the
@@ -532,22 +513,13 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
 {
-	struct block_uring_cmd *bcmd = (struct block_uring_cmd *)&ioucmd->pdu;
-	void __user *argp = (void __user *) bcmd->unused2[0];
+	struct nvme_uring_cmd *cmd = nvme_uring_cmd(ioucmd);
 	int ret;
 
-	BUILD_BUG_ON(sizeof(struct uring_cmd_data) >
-			sizeof(struct block_uring_cmd) -
-			offsetof(struct block_uring_cmd, unused2[1]));
-
-	switch (bcmd->ioctl_cmd) {
-	case NVME_IOCTL_IO_CMD:
-	case NVME_IOCTL_IO_CMD_FIXED:
-		ret = nvme_user_cmd(ns->ctrl, ns, argp, ioucmd);
-		break;
+	switch (cmd->ioctl_cmd) {
 	case NVME_IOCTL_IO64_CMD:
 	case NVME_IOCTL_IO64_CMD_FIXED:
-		ret = nvme_user_cmd64(ns->ctrl, ns, argp, ioucmd);
+		ret = nvme_user_cmd64(ns->ctrl, ns, cmd->argp, ioucmd);
 		break;
 	default:
 		ret = -ENOTTY;
@@ -674,7 +646,7 @@ static int nvme_dev_user_cmd(struct nvme_ctrl *ctrl, void __user *argp)
 	kref_get(&ns->kref);
 	up_read(&ctrl->namespaces_rwsem);
 
-	ret = nvme_user_cmd(ctrl, ns, argp, NULL);
+	ret = nvme_user_cmd(ctrl, ns, argp);
 	nvme_put_ns(ns);
 
 	return ret;
@@ -691,7 +663,7 @@ long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 
 	switch (cmd) {
 	case NVME_IOCTL_ADMIN_CMD:
-		return nvme_user_cmd(ctrl, NULL, argp, NULL);
+		return nvme_user_cmd(ctrl, NULL, argp);
 	case NVME_IOCTL_ADMIN64_CMD:
 		return nvme_user_cmd64(ctrl, NULL, argp, NULL);
 	case NVME_IOCTL_IO_CMD:
diff --git a/include/uapi/linux/nvme_ioctl.h b/include/uapi/linux/nvme_ioctl.h
index fc05c6024edd6..a65e648a57928 100644
--- a/include/uapi/linux/nvme_ioctl.h
+++ b/include/uapi/linux/nvme_ioctl.h
@@ -78,7 +78,6 @@ struct nvme_passthru_cmd64 {
 #define NVME_IOCTL_RESCAN	_IO('N', 0x46)
 #define NVME_IOCTL_ADMIN64_CMD	_IOWR('N', 0x47, struct nvme_passthru_cmd64)
 #define NVME_IOCTL_IO64_CMD	_IOWR('N', 0x48, struct nvme_passthru_cmd64)
-#define NVME_IOCTL_IO_CMD_FIXED	_IOWR('N', 0x49, struct nvme_passthru_cmd)
 #define NVME_IOCTL_IO64_CMD_FIXED _IOWR('N', 0x50, struct nvme_passthru_cmd64)
 
 #endif /* _UAPI_LINUX_NVME_IOCTL_H */
* Re: [RFC PATCH 2/6] nvme: wire-up support for async-passthru on char-device.
From: Kanchan Joshi @ 2021-09-07 16:20 UTC
To: Christoph Hellwig
Cc: Kanchan Joshi, Jens Axboe, Keith Busch, io-uring, linux-nvme,
    anuj20.g, Javier Gonzalez, hare

On Tue, Sep 7, 2021 at 1:17 PM Christoph Hellwig <[email protected]> wrote:
>
> Looking at this in isolation:
>
>  - no need to also implement the legacy non-64 passthrough interface
>  - no need to overlay the block_uring_cmd structure, as that makes a
>    complete mess
>
> Below is an untested patch to fix that up a bit.

Thanks for taking a look and cleaning that up. Looks a lot better.

> A few other notes:
>
>  - I suspect the ioctl_cmd really should move into the core uring_cmd
>    infrastructure

Yes, that seems possible by creating that field outside, combining
"op" and "unused" below:

+struct io_uring_cmd {
+	struct file *file;
+	__u16 op;
+	__u16 unused;
+	__u32 len;
+	__u64 pdu[5]; /* 40 bytes available inline for free use */
+};

>  - please stick to the naming of the file operation instead of using
>    something different. That being said, async_ioctl seems better
>    fitting than uring_cmd

Got it.

>  - that whole mix of user-space interface and internal data in the
>    ->pdu field is a mess. What is the problem with deferring the
>    request freeing into the user context, which would clean up quite
>    a bit of that, especially if io_uring_cmd grows a private field?

That mix isn't great, but the attempt was to save the allocation. And I
was not very sure if it'd be fine to defer freeing the request until
task-work fires up.
Even if we take that route, we would still need a place to store the
bio pointer (hopefully the meta pointer can be extracted out of the
bio). Do you see it differently?

Thanks,
-- 
Kanchan
* Re: [RFC PATCH 2/6] nvme: wire-up support for async-passthru on char-device.
From: Christoph Hellwig @ 2021-09-08  6:15 UTC
To: Kanchan Joshi
Cc: Christoph Hellwig, Kanchan Joshi, Jens Axboe, Keith Busch,
    io-uring, linux-nvme, anuj20.g, Javier Gonzalez, hare

On Tue, Sep 07, 2021 at 09:50:27PM +0530, Kanchan Joshi wrote:
> >  - I suspect the ioctl_cmd really should move into the core
> >    uring_cmd infrastructure
>
> Yes, that seems possible by creating that field outside, combining
> "op" and "unused" below:
> +struct io_uring_cmd {
> +	struct file *file;
> +	__u16 op;
> +	__u16 unused;
> +	__u32 len;
> +	__u64 pdu[5]; /* 40 bytes available inline for free use */
> +};

Two different issues here:

 - the idea of having a two-layer indirection with an op and a cmd
   doesn't really make much sense
 - if we want to avoid conflicts, using 32 bits probably makes sense

So I'd turn op and unused into a single cmd field and use the ioctl
encoding macros for it (but preferably pick different numbers than the
existing ioctls).

> >  - that whole mix of user-space interface and internal data in the
> >    ->pdu field is a mess. What is the problem with deferring the
> >    request freeing into the user context, which would clean up
> >    quite a bit of that, especially if io_uring_cmd grows a private
> >    field?
>
> That mix isn't great, but the attempt was to save the allocation.
> And I was not very sure if it'd be fine to defer freeing the request
> until task-work fires up.

What would be the problem with delaying it?

> Even if we take that route, we would still need a place to store the
> bio pointer (hopefully the meta pointer can be extracted out of the
> bio). Do you see it differently?

We don't need the bio pointer at all. The old passthrough code needed
it when we still used block-layer bounce buffering for it. But that
bounce buffering for passthrough commands got removed a while ago, and
even before that, nvme never used it.
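A sketch of the layout that suggestion implies (an interpretation of
the above, not code posted in this thread):

struct io_uring_cmd {
	struct file	*file;
	__u32		cmd;	/* ioctl-style, _IOWR()-encoded, fresh numbers */
	__u32		len;
	__u64		pdu[5];	/* 40 bytes available inline for free use */
};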
* Re: [RFC PATCH 2/6] nvme: wire-up support for async-passthru on char-device.
From: Kanchan Joshi @ 2021-09-22  7:19 UTC
To: Christoph Hellwig
Cc: Kanchan Joshi, Jens Axboe, Keith Busch, io-uring, linux-nvme,
    anuj20.g, Javier Gonzalez, hare

I am sorry for taking longer than I should have.

On Wed, Sep 8, 2021 at 11:45 AM Christoph Hellwig <[email protected]> wrote:
>
> On Tue, Sep 07, 2021 at 09:50:27PM +0530, Kanchan Joshi wrote:
> > Yes, that seems possible by creating that field outside, combining
> > "op" and "unused" below:
> > +struct io_uring_cmd {
> > +	struct file *file;
> > +	__u16 op;
> > +	__u16 unused;
> > +	__u32 len;
> > +	__u64 pdu[5]; /* 40 bytes available inline for free use */
> > +};
>
> Two different issues here:
>
>  - the idea of having a two-layer indirection with an op and a cmd
>    doesn't really make much sense
>  - if we want to avoid conflicts, using 32 bits probably makes sense
>
> So I'd turn op and unused into a single cmd field and use the ioctl
> encoding macros for it (but preferably pick different numbers than
> the existing ioctls).

I was thinking along the same lines, except for the "picking different
numbers than existing ioctls" part. Does that mean adding a new ioctl
for each operation that requires the async transport?

> > That mix isn't great, but the attempt was to save the allocation.
> > And I was not very sure if it'd be fine to defer freeing the
> > request until task-work fires up.
>
> What would be the problem with delaying it?

When we free the request, the tag is also freed, and that may enable
someone else to pump more I/O. Pushing the freeing of requests to user
context seemed like delaying that part. If you think that is a
misplaced concern, I can change it.

The changed structure will look like this:

struct nvme_uring_cmd {
	__u32 ioctl_cmd;
	__u32 unused1;
	void __user *argp;
	union {
		struct bio *bio;
		struct request *req;
	};
	void *meta;
};

cmd->bio will be freed in nvme completion, while cmd->req will be freed
in user context. Since we have the request intact, we will not store
"u64 result; int status;" anymore, and overall there will be a
reduction of 4 bytes in the size of nvme_uring_cmd.
Would you prefer it this way?

> > Even if we take that route, we would still need a place to store
> > the bio pointer (hopefully the meta pointer can be extracted out of
> > the bio). Do you see it differently?
>
> We don't need the bio pointer at all. The old passthrough code needed
> it when we still used block-layer bounce buffering for it. But that
> bounce buffering for passthrough commands got removed a while ago,
> and even before that, nvme never used it.

nvme_submit_user_cmd() calls blk_rq_map_user(), which sets up req->bio
(and that is regardless of bounce buffering, I suppose). For the sync
ioctl, this bio pointer is stored locally, and that is what is used to
free the bio post completion. For the async ioctl too, we need some
place to store this. So I could not connect this to bounce buffering
(alone). Am I missing your point?

One way could be to change blk_update_request() to avoid setting
req->bio to NULL. But perhaps that may invite more trouble, and we are
not saving anything: the bio pointer is inside the union anyway (in the
above struct nvme_uring_cmd).

-- 
Kanchan
* [RFC PATCH 3/6] io_uring: mark iopoll not supported for uring-cmd
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare

From: Anuj Gupta <[email protected]>

Currently, uring passthrough doesn't support iopoll. Bail out to avoid
a panic.

Signed-off-by: Anuj Gupta <[email protected]>
---
 fs/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index b73bc16c3e70..3361056313a7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3587,6 +3587,11 @@ static int io_uring_cmd_prep(struct io_kiocb *req,
 	if (!req->file->f_op->uring_cmd)
 		return -EOPNOTSUPP;
 
+	if (req->ctx->flags & IORING_SETUP_IOPOLL) {
+		printk_once(KERN_WARNING "io_uring: iopoll not supported!\n");
+		return -EOPNOTSUPP;
+	}
+
 	cmd->op = READ_ONCE(csqe->op);
 	cmd->len = READ_ONCE(csqe->len);
 
-- 
2.25.1
* [RFC PATCH 4/6] io_uring: add helper for fixed-buffer uring-cmd
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare, Kanchan Joshi

Refactor the existing code and factor out a helper that can be used for
passthrough with fixed buffers. This is a prep patch.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 fs/io_uring.c            | 21 +++++++++++++++------
 include/linux/io_uring.h |  7 +++++++
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3361056313a7..1f2263a78c8e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2792,12 +2792,10 @@ static void kiocb_done(struct kiocb *kiocb, ssize_t ret,
 			}
 		}
 	}
-
-static int __io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter,
-		struct io_mapped_ubuf *imu)
+static int __io_import_fixed(u64 buf_addr, size_t len, int rw,
+		struct iov_iter *iter, struct io_mapped_ubuf *imu)
 {
-	size_t len = req->rw.len;
-	u64 buf_end, buf_addr = req->rw.addr;
+	u64 buf_end;
 	size_t offset;
 
 	if (unlikely(check_add_overflow(buf_addr, (u64)len, &buf_end)))
@@ -2864,8 +2862,19 @@ static int io_import_fixed(struct io_kiocb *req, int rw, struct iov_iter *iter)
 		imu = READ_ONCE(ctx->user_bufs[index]);
 		req->imu = imu;
 	}
-	return __io_import_fixed(req, rw, iter, imu);
+	return __io_import_fixed(req->rw.addr, req->rw.len, rw, iter, imu);
+}
+
+int io_uring_cmd_import_fixed(void *ubuf, unsigned long len,
+		int rw, struct iov_iter *iter, void *ioucmd)
+{
+	u64 buf_addr = (u64)ubuf;
+	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
+	struct io_mapped_ubuf *imu = req->imu;
+
+	return __io_import_fixed(buf_addr, len, rw, iter, imu);
 }
+EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
 static void io_ring_submit_unlock(struct io_ring_ctx *ctx, bool needs_lock)
 {
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 235d1603f97e..0bd8f611edb8 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,6 +20,8 @@ struct io_uring_cmd {
 };
 
 #if defined(CONFIG_IO_URING)
+int io_uring_cmd_import_fixed(void *ubuf, unsigned long len,
+		int rw, struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret);
 void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 		void (*driver_cb)(struct io_uring_cmd *));
@@ -49,6 +51,11 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 		void (*driver_cb)(struct io_uring_cmd *))
 {
 }
+static inline int io_uring_cmd_import_fixed(void *ubuf, unsigned long len,
+		int rw, struct iov_iter *iter, void *ioucmd)
+{
+	return -1;
+}
 static inline struct sock *io_uring_get_socket(struct file *file)
 {
 	return NULL;
-- 
2.25.1
* Re: [RFC PATCH 4/6] io_uring: add helper for fixed-buffer uring-cmd
From: Christoph Hellwig @ 2021-09-07  7:47 UTC
To: Kanchan Joshi
Cc: axboe, hch, kbusch, io-uring, linux-nvme, anuj20.g, javier.gonz, hare

On Thu, Aug 05, 2021 at 06:25:37PM +0530, Kanchan Joshi wrote:
> +int io_uring_cmd_import_fixed(void *ubuf, unsigned long len,
> +		int rw, struct iov_iter *iter, void *ioucmd)
> +{
> +	u64 buf_addr = (u64)ubuf;

This will cause a warning on 32-bit platforms.
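The usual idiom to avoid that warning is to cast through unsigned long;
a one-line sketch of the fix (not a posted patch):

	u64 buf_addr = (unsigned long)ubuf;	/* no size-mismatch warning on 32-bit */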
* [RFC PATCH 5/6] io_uring: add support for uring_cmd with fixed-buffer
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare, Kanchan Joshi

From: Anuj Gupta <[email protected]>

Add an IORING_OP_URING_CMD_FIXED opcode that enables performing the
operation with previously registered buffers.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 fs/io_uring.c                 | 27 ++++++++++++++++++++++++++-
 include/uapi/linux/io_uring.h |  6 +++++-
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1f2263a78c8e..a80f4c98ea86 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1060,6 +1060,10 @@ static const struct io_op_def io_op_defs[] = {
 		.needs_file		= 1,
 		.offsets		= 1,
 	},
+	[IORING_OP_URING_CMD_FIXED] = {
+		.needs_file		= 1,
+		.offsets		= 1,
+	},
 };
 
 static bool io_disarm_next(struct io_kiocb *req);
@@ -3602,7 +3606,12 @@ static int io_uring_cmd_prep(struct io_kiocb *req,
 	}
 
 	cmd->op = READ_ONCE(csqe->op);
-	cmd->len = READ_ONCE(csqe->len);
+	if (req->opcode == IORING_OP_URING_CMD_FIXED) {
+		req->imu = NULL;
+		io_req_set_rsrc_node(req);
+		req->buf_index = READ_ONCE(csqe->buf_index);
+	} else
+		cmd->len = READ_ONCE(csqe->len);
 
 	/*
 	 * The payload is the last 40 bytes of an io_uring_cmd_sqe, with the
@@ -3617,6 +3626,20 @@ static int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	struct file *file = req->file;
 	int ret;
 
+	if (req->opcode == IORING_OP_URING_CMD_FIXED) {
+		u32 index, buf_index = req->buf_index;
+		struct io_ring_ctx *ctx = req->ctx;
+		struct io_mapped_ubuf *imu = req->imu;
+
+		if (likely(!imu)) {
+			if (unlikely(buf_index >= ctx->nr_user_bufs))
+				return -EFAULT;
+			index = array_index_nospec(buf_index, ctx->nr_user_bufs);
+			imu = READ_ONCE(ctx->user_bufs[index]);
+			req->imu = imu;
+		}
+	}
+
 	ret = file->f_op->uring_cmd(&req->uring_cmd, issue_flags);
 	/* queued async, consumer will call io_uring_cmd_done() when complete */
 	if (ret == -EIOCBQUEUED)
@@ -6031,6 +6054,7 @@ static int io_req_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	case IORING_OP_UNLINKAT:
 		return io_unlinkat_prep(req, sqe);
 	case IORING_OP_URING_CMD:
+	case IORING_OP_URING_CMD_FIXED:
 		return io_uring_cmd_prep(req, sqe);
 	}
 
@@ -6322,6 +6346,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 		ret = io_unlinkat(req, issue_flags);
 		break;
 	case IORING_OP_URING_CMD:
+	case IORING_OP_URING_CMD_FIXED:
 		ret = io_uring_cmd(req, issue_flags);
 		break;
 	default:
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 92565e17bfd9..1a10ebd4ca0a 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -75,7 +75,10 @@ struct io_uring_cmd_sqe {
 	__u64	user_data;
 	__u16	op;
 	__u16	personality;
-	__u32	len;
+	union {
+		__u32	len;
+		__u16	buf_index;
+	};
 	__u64	pdu[5];
 };
 
@@ -154,6 +157,7 @@ enum {
 	IORING_OP_RENAMEAT,
 	IORING_OP_UNLINKAT,
 	IORING_OP_URING_CMD,
+	IORING_OP_URING_CMD_FIXED,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
-- 
2.25.1
* Re: [RFC PATCH 5/6] io_uring: add support for uring_cmd with fixed-buffer
From: Christoph Hellwig @ 2021-09-07  7:48 UTC
To: Kanchan Joshi
Cc: axboe, hch, kbusch, io-uring, linux-nvme, anuj20.g, javier.gonz, hare

On Thu, Aug 05, 2021 at 06:25:38PM +0530, Kanchan Joshi wrote:
> From: Anuj Gupta <[email protected]>
>
> Add an IORING_OP_URING_CMD_FIXED opcode that enables performing the
> operation with previously registered buffers.

We should also pass this information on into ->uring_cmd instead of
needing two ioctl_cmd opcodes.
* Re: [RFC PATCH 5/6] io_uring: add support for uring_cmd with fixed-buffer
From: Kanchan Joshi @ 2021-09-07 16:29 UTC
To: Christoph Hellwig
Cc: Kanchan Joshi, Jens Axboe, Keith Busch, io-uring, linux-nvme,
    anuj20.g, Javier Gonzalez, hare

On Tue, Sep 7, 2021 at 1:19 PM Christoph Hellwig <[email protected]> wrote:
>
> On Thu, Aug 05, 2021 at 06:25:38PM +0530, Kanchan Joshi wrote:
> > From: Anuj Gupta <[email protected]>
> >
> > Add an IORING_OP_URING_CMD_FIXED opcode that enables performing the
> > operation with previously registered buffers.
>
> We should also pass this information on into ->uring_cmd instead of
> needing two ioctl_cmd opcodes.

Indeed. We now (in the internal version) switch between the fixed and
regular ioctl by looking at the io_uring opcode, which will be
different anyway.

-- 
Kanchan
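One way the core could convey it, sketched here as an assumption rather
than posted code, is to derive a flag from the opcode before calling
into the driver:

	/* hypothetical plumbing in io_uring_cmd(); the flag name is made up */
	if (req->opcode == IORING_OP_URING_CMD_FIXED)
		issue_flags |= IO_URING_F_FIXEDBUFS;
	ret = file->f_op->uring_cmd(&req->uring_cmd, issue_flags);

The driver would then need only a single ioctl opcode and could test
the flag to pick between blk_rq_map_user() and the fixed-buffer path.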
* [RFC PATCH 6/6] nvme: enable passthrough with fixed-buffer
From: Kanchan Joshi @ 2021-08-05 12:55 UTC
To: axboe, hch, kbusch
Cc: io-uring, linux-nvme, anuj20.g, javier.gonz, hare, Kanchan Joshi

Add two new variants of the passthrough ioctls (NVME_IOCTL_IO_CMD_FIXED
and NVME_IOCTL_IO64_CMD_FIXED) to carry out passthrough commands with
pre-mapped buffers.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 drivers/nvme/host/ioctl.c       | 68 ++++++++++++++++++++++++++++++++-
 include/uapi/linux/nvme_ioctl.h |  2 +
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 2730c5dfdf78..d336e34aac41 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -139,6 +139,64 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
  out:
 	return ERR_PTR(ret);
 }
+static inline bool nvme_is_fixedb_passthru(struct io_uring_cmd *ioucmd)
+{
+	struct block_uring_cmd *bcmd;
+
+	if (!ioucmd)
+		return false;
+	bcmd = (struct block_uring_cmd *)&ioucmd->pdu;
+	if (bcmd && ((bcmd->ioctl_cmd == NVME_IOCTL_IO_CMD_FIXED) ||
+			(bcmd->ioctl_cmd == NVME_IOCTL_IO64_CMD_FIXED)))
+		return true;
+	return false;
+}
+/*
+ * Unlike blk_rq_map_user() this is only for fixed-buffer async passthrough.
+ * And hopefully faster as well.
+ */
+int nvme_rq_map_user_fixedb(struct request_queue *q, struct request *rq,
+		void __user *ubuf, unsigned long len, gfp_t gfp_mask,
+		struct io_uring_cmd *ioucmd)
+{
+	struct iov_iter iter;
+	size_t iter_count, nr_segs;
+	struct bio *bio;
+	int ret;
+
+	/*
+	 * Talk to io_uring to obtain a BVEC iterator for the buffer,
+	 * and use that iterator to form the bio/request.
+	 */
+	ret = io_uring_cmd_import_fixed(ubuf, len, rq_data_dir(rq), &iter,
+			ioucmd);
+	if (unlikely(ret < 0))
+		return ret;
+	iter_count = iov_iter_count(&iter);
+	nr_segs = iter.nr_segs;
+
+	if (!iter_count || (iter_count >> 9) > queue_max_hw_sectors(q))
+		return -EINVAL;
+	if (nr_segs > queue_max_segments(q))
+		return -EINVAL;
+	/* no iovecs to alloc, as we already have a BVEC iterator */
+	bio = bio_kmalloc(gfp_mask, 0);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_opf |= req_op(rq);
+	ret = bio_iov_iter_get_pages(bio, &iter);
+	if (ret)
+		goto out_free;
+
+	blk_rq_bio_prep(rq, bio, nr_segs);
+	return 0;
+
+out_free:
+	bio_release_pages(bio, false);
+	bio_put(bio);
+	return ret;
+}
 
 static int nvme_submit_user_cmd(struct request_queue *q,
 		struct nvme_command *cmd, void __user *ubuffer,
@@ -163,8 +221,12 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	nvme_req(req)->flags |= NVME_REQ_USERCMD;
 
 	if (ubuffer && bufflen) {
-		ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
-				GFP_KERNEL);
+		if (likely(!nvme_is_fixedb_passthru(ioucmd)))
+			ret = blk_rq_map_user(q, req, NULL, ubuffer, bufflen,
+					GFP_KERNEL);
+		else
+			ret = nvme_rq_map_user_fixedb(q, req, ubuffer, bufflen,
+					GFP_KERNEL, ioucmd);
 		if (ret)
 			goto out;
 		bio = req->bio;
@@ -480,9 +542,11 @@ static int nvme_ns_async_ioctl(struct nvme_ns *ns, struct io_uring_cmd *ioucmd)
 
 	switch (bcmd->ioctl_cmd) {
 	case NVME_IOCTL_IO_CMD:
+	case NVME_IOCTL_IO_CMD_FIXED:
 		ret = nvme_user_cmd(ns->ctrl, ns, argp, ioucmd);
 		break;
 	case NVME_IOCTL_IO64_CMD:
+	case NVME_IOCTL_IO64_CMD_FIXED:
 		ret = nvme_user_cmd64(ns->ctrl, ns, argp, ioucmd);
 		break;
 	default:
diff --git a/include/uapi/linux/nvme_ioctl.h b/include/uapi/linux/nvme_ioctl.h
index d99b5a772698..fc05c6024edd 100644
--- a/include/uapi/linux/nvme_ioctl.h
+++ b/include/uapi/linux/nvme_ioctl.h
@@ -78,5 +78,7 @@ struct nvme_passthru_cmd64 {
 #define NVME_IOCTL_RESCAN	_IO('N', 0x46)
 #define NVME_IOCTL_ADMIN64_CMD	_IOWR('N', 0x47, struct nvme_passthru_cmd64)
 #define NVME_IOCTL_IO64_CMD	_IOWR('N', 0x48, struct nvme_passthru_cmd64)
+#define NVME_IOCTL_IO_CMD_FIXED	_IOWR('N', 0x49, struct nvme_passthru_cmd)
+#define NVME_IOCTL_IO64_CMD_FIXED _IOWR('N', 0x50, struct nvme_passthru_cmd64)
 
 #endif /* _UAPI_LINUX_NVME_IOCTL_H */
-- 
2.25.1
* Re: [RFC PATCH 6/6] nvme: enable passthrough with fixed-buffer
From: Christoph Hellwig @ 2021-09-07  7:50 UTC
To: Kanchan Joshi
Cc: axboe, hch, kbusch, io-uring, linux-nvme, anuj20.g, javier.gonz, hare

> +/*
> + * Unlike blk_rq_map_user() this is only for fixed-buffer async passthrough.
> + * And hopefully faster as well.
> + */

This belongs in io_uring.c. And that hopeful comment needs to be
validated and removed.

> +int nvme_rq_map_user_fixedb(struct request_queue *q, struct request *rq,
> +		void __user *ubuf, unsigned long len, gfp_t gfp_mask,
> +		struct io_uring_cmd *ioucmd)
> +{
> +	struct iov_iter iter;
> +	size_t iter_count, nr_segs;
> +	struct bio *bio;
> +	int ret;
> +
> +	/*
> +	 * Talk to io_uring to obtain a BVEC iterator for the buffer,
> +	 * and use that iterator to form the bio/request.
> +	 */
> +	ret = io_uring_cmd_import_fixed(ubuf, len, rq_data_dir(rq), &iter,
> +			ioucmd);

io_uring_cmd_import_fixed() takes a non-__user pointer, so this will
cause a sparse warning.
* Re: [RFC PATCH 6/6] nvme: enable passthrough with fixed-buffer
From: Kanchan Joshi @ 2021-09-07 16:47 UTC
To: Christoph Hellwig
Cc: Kanchan Joshi, Jens Axboe, Keith Busch, io-uring, linux-nvme,
    anuj20.g, Javier Gonzalez, hare

On Tue, Sep 7, 2021 at 1:21 PM Christoph Hellwig <[email protected]> wrote:
>
> > +/*
> > + * Unlike blk_rq_map_user() this is only for fixed-buffer async
> > + * passthrough. And hopefully faster as well.
> > + */
>
> This belongs in io_uring.c. And that hopeful comment needs to be
> validated and removed.

I'll do away with that comment. But sorry, which part do you think
should move to io_uring? Is it the whole helper,
nvme_rq_map_user_fixedb()?

-- 
Kanchan
* Re: [RFC PATCH 6/6] nvme: enable passthrough with fixed-buffer
From: Christoph Hellwig @ 2021-09-08  6:16 UTC
To: Kanchan Joshi
Cc: Christoph Hellwig, Kanchan Joshi, Jens Axboe, Keith Busch,
    io-uring, linux-nvme, anuj20.g, Javier Gonzalez, hare

On Tue, Sep 07, 2021 at 10:17:53PM +0530, Kanchan Joshi wrote:
> I'll do away with that comment. But sorry, which part do you think
> should move to io_uring? Is it the whole helper,
> nvme_rq_map_user_fixedb()?

Yes. There is nothing nvme-specific in it. Given that it only makes
sense for io_uring support, moving it to io_uring.c might make sense.
Or conditionally to block/blk-map.c.
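For reference, a sketch of the relocated helper's declaration, assuming
the block-layer placement Christoph mentions (name, config guard, and
header location are assumptions):

	/* declared in include/linux/blkdev.h, implemented in block/blk-map.c */
	#ifdef CONFIG_IO_URING
	int blk_rq_map_user_fixedb(struct request_queue *q, struct request *rq,
			void __user *ubuf, unsigned long len, gfp_t gfp_mask,
			struct io_uring_cmd *ioucmd);
	#endif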
* Re: [RFC PATCH 0/6] Fixed-buffers io_uring passthrough over nvme-char
From: Christoph Hellwig @ 2021-09-07  7:10 UTC
To: Kanchan Joshi, axboe
Cc: kbusch, io-uring, linux-nvme, anuj20.g, javier.gonz, hare

Sorry for taking so long to get back to this; the previous merge window
has been a little busy.

On Thu, Aug 05, 2021 at 06:25:33PM +0530, Kanchan Joshi wrote:
> The work is on top of Jens' branch (which is 5.14-rc1 based):
> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v5

I think we need to have this reviewed on the list as well.
* Re: [RFC PATCH 0/6] Fixed-buffers io_uring passthrough over nvme-char
From: Jens Axboe @ 2021-09-07 12:38 UTC
To: Christoph Hellwig, Kanchan Joshi
Cc: kbusch, io-uring, linux-nvme, anuj20.g, javier.gonz, hare

On 9/7/21 1:10 AM, Christoph Hellwig wrote:
> Sorry for taking so long to get back to this; the previous merge
> window has been a little busy.
>
> On Thu, Aug 05, 2021 at 06:25:33PM +0530, Kanchan Joshi wrote:
>> The work is on top of Jens' branch (which is 5.14-rc1 based):
>> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-fops.v5
>
> I think we need to have this reviewed on the list as well.

Indeed - it has been posted in the past, but needs a refresh and a
re-post. The only reason it hasn't been is that I'm not that happy with
the sqe layout changes.

-- 
Jens Axboe