* [PATCH for-next v2 0/4] iopoll support for io_uring/nvme
From: Kanchan Joshi @ 2022-08-07 18:36 UTC
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi

Hi,

Series enables async polling on io_uring command, and nvme passthrough
(for io-commands) is wired up to leverage that.

Changes since v1:
- corrected variable name (Jens)
- fix for a warning (test-robot)

Performance impact:
Pre TLDR: polling gives clear win.

512b randread performance (KIOPS):

QD_batch    block    passthru    passthru-poll    block-poll
1_1            80          81              158           157
8_2           406         470              680           700
16_4          620         656              931           920
128_32        879        1056             1120          1132

Upstream fio is used for testing. Polled queues set to 1 in nvme.

passthru command line:
fio -iodepth=64 -rw=randread -ioengine=io_uring_cmd -bs=512 -numjobs=1 -runtime=60 -group_reporting -iodepth_batch_submit=16 -iodepth_batch_complete_min=1 -iodepth_batch_complete_max=16 -cmd_type=nvme -hipri=0 -filename=/dev/ng1n1 -name=io_uring_cmd_64

block command line:
fio -direct=1 -iodepth=64 -rw=randread -ioengine=io_uring -bs=512 -numjobs=1 -runtime=60 -group_reporting -iodepth_batch_submit=16 -iodepth_batch_complete_min=1 -iodepth_batch_complete_max=16 -hipri=0 -filename=/dev/nvme1n1 name=io_uring_64

Bit of code went into non-passthrough path for io_uring (patch 2) but I
do not see that causing any performance regression.
peak-perf test showed 2.3M IOPS with or without this series for block-io.

io_uring: Running taskset -c 0,12 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 /dev/nvme0n1
submitter=0, tid=3089, file=/dev/nvme0n1, node=-1
submitter=1, tid=3090, file=/dev/nvme0n1, node=-1
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=2.31M, BW=1126MiB/s, IOS/call=31/31
IOPS=2.30M, BW=1124MiB/s, IOS/call=32/31
IOPS=2.30M, BW=1123MiB/s, IOS/call=32/32

Kanchan Joshi (4):
  fs: add file_operations->uring_cmd_iopoll
  io_uring: add iopoll infrastructure for io_uring_cmd
  block: export blk_rq_is_poll
  nvme: wire up async polling for io passthrough commands

 block/blk-mq.c                |  3 +-
 drivers/nvme/host/core.c      |  1 +
 drivers/nvme/host/ioctl.c     | 73 ++++++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c |  1 +
 drivers/nvme/host/nvme.h      |  2 +
 include/linux/blk-mq.h        |  1 +
 include/linux/fs.h            |  1 +
 include/linux/io_uring.h      |  8 +++-
 io_uring/io_uring.c           |  6 +++
 io_uring/opdef.c              |  1 +
 io_uring/rw.c                 |  8 +++-
 io_uring/uring_cmd.c          | 11 +++++-
 12 files changed, 105 insertions(+), 11 deletions(-)

base-commit: ece775e9aa8232963cc1bddf5cc91285db6233af
-- 
2.25.1
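To put the "polling gives clear win" claim in numbers, the KIOPS table from the cover letter can be turned into percentage gains. The sketch below only re-uses the figures posted above; the struct, array, and function names are made up for illustration and are not part of the series.

```c
#include <assert.h>
#include <stddef.h>

/* One row of the 512b randread table from the cover letter (KIOPS). */
struct perf_row {
	const char *qd_batch;
	int block;
	int passthru;
	int passthru_poll;
	int block_poll;
};

static const struct perf_row perf_rows[] = {
	{ "1_1",     80,   81,  158,  157 },
	{ "8_2",    406,  470,  680,  700 },
	{ "16_4",   620,  656,  931,  920 },
	{ "128_32", 879, 1056, 1120, 1132 },
};

#define PERF_ROW_COUNT (sizeof(perf_rows) / sizeof(perf_rows[0]))

/* Gain of the polled figure over the non-polled one, in whole percent
 * (integer division, so the fractional part is dropped). */
static int gain_percent(int base_kiops, int polled_kiops)
{
	assert(base_kiops > 0);
	return (polled_kiops - base_kiops) * 100 / base_kiops;
}
```

Calling `gain_percent(perf_rows[i].passthru, perf_rows[i].passthru_poll)` for each row shows the trend in the table: the gain is largest at QD_batch 1_1 and shrinks as queue depth and batching grow.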
* [PATCH for-next v2 1/4] fs: add file_operations->uring_cmd_iopoll
From: Kanchan Joshi @ 2022-08-07 18:36 UTC
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi

io_uring will trigger this to do completion polling on uring-cmd
operations.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/fs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9f131e559d05..449941f99f50 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2134,6 +2134,7 @@ struct file_operations {
 				   loff_t len, unsigned int remap_flags);
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
 	int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
+	int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd);
 } __randomize_layout;
 
 struct inode_operations {
-- 
2.25.1
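The new hook is an optional member of an ops table, so a file that does not set it leaves the pointer NULL. The following is a userspace sketch of that pattern, not kernel code: `demo_cmd`, `demo_file_ops`, `demo_iopoll` and `demo_poll_cmd` are hypothetical stand-ins, and the NULL guard in `demo_poll_cmd` is an assumption added for illustration (the posted patches call the hook without such a check).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct io_uring_cmd. */
struct demo_cmd {
	int completed;	/* set once the simulated device has finished */
};

/* Hypothetical stand-in for the two uring_cmd members of file_operations. */
struct demo_file_ops {
	int (*uring_cmd)(struct demo_cmd *cmd, unsigned int issue_flags);
	int (*uring_cmd_iopoll)(struct demo_cmd *cmd);	/* optional hook */
};

/* Example handler: returns the number of completions found (0 or 1). */
static int demo_iopoll(struct demo_cmd *cmd)
{
	assert(cmd != NULL);
	return cmd->completed ? 1 : 0;
}

/*
 * Assumed caller-side guard (not in the patch): refuse to poll a file
 * whose ops table leaves the optional hook unset.
 */
static int demo_poll_cmd(const struct demo_file_ops *ops, struct demo_cmd *cmd)
{
	if (!ops->uring_cmd_iopoll)
		return -EOPNOTSUPP;
	return ops->uring_cmd_iopoll(cmd);
}
```

In the series itself the equivalent protection comes from the issue path: a driver that cannot poll returns -EOPNOTSUPP when it sees IO_URING_F_IOPOLL, as nvme_dev_uring_cmd() does in patch 4.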
* [PATCH for-next v2 2/4] io_uring: add iopoll infrastructure for io_uring_cmd
From: Kanchan Joshi @ 2022-08-07 18:36 UTC
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi, Pankaj Raghav

Put this up in the same way as iopoll is done for regular read/write IO.
Make place for storing a cookie into struct io_uring_cmd on its
submission. Perform the completion using the ->uring_cmd_iopoll handler.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
 include/linux/io_uring.h |  8 ++++++--
 io_uring/io_uring.c      |  6 ++++++
 io_uring/opdef.c         |  1 +
 io_uring/rw.c            |  8 +++++++-
 io_uring/uring_cmd.c     | 11 +++++++++--
 5 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 4a2f6cc5a492..58676c0a398f 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,8 +20,12 @@ enum io_uring_cmd_flags {
 struct io_uring_cmd {
 	struct file *file;
 	const void *cmd;
-	/* callback to defer completions to task context */
-	void (*task_work_cb)(struct io_uring_cmd *cmd);
+	union {
+		/* callback to defer completions to task context */
+		void (*task_work_cb)(struct io_uring_cmd *cmd);
+		/* used for polled completion */
+		void *cookie;
+	};
 	u32 cmd_op;
 	u32 pad;
 	u8 pdu[32]; /* available inline for free use */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b54218da075c..48a430a86b50 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1296,6 +1296,12 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 			    wq_list_empty(&ctx->iopoll_list))
 				break;
 		}
+
+		if (task_work_pending(current)) {
+			mutex_unlock(&ctx->uring_lock);
+			io_run_task_work();
+			mutex_lock(&ctx->uring_lock);
+		}
 		ret = io_do_iopoll(ctx, !min);
 		if (ret < 0)
 			break;
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 72dd2b2d8a9d..9a0df19306fe 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -466,6 +466,7 @@ const struct io_op_def io_op_defs[] = {
 		.needs_file = 1,
 		.plug = 1,
 		.name = "URING_CMD",
+		.iopoll = 1,
 		.async_size = uring_cmd_pdu_size(1),
 		.prep = io_uring_cmd_prep,
 		.issue = io_uring_cmd,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 2b784795103c..1a4fb8a44b9a 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1005,7 +1005,13 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
-		ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
+		if (req->opcode == IORING_OP_URING_CMD) {
+			struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
+
+			ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
+		} else
+			ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob,
+							poll_flags);
 		if (unlikely(ret < 0))
 			return ret;
 		else if (ret)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 0a421ed51e7e..5cc339fba8b8 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -49,7 +49,11 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2)
 	io_req_set_res(req, 0, ret);
 	if (req->ctx->flags & IORING_SETUP_CQE32)
 		io_req_set_cqe32_extra(req, res2, 0);
-	__io_req_complete(req, 0);
+	if (req->ctx->flags & IORING_SETUP_IOPOLL)
+		/* order with io_iopoll_req_issued() checking ->iopoll_completed */
+		smp_store_release(&req->iopoll_completed, 1);
+	else
+		__io_req_complete(req, 0);
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_done);
 
@@ -89,8 +93,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 		issue_flags |= IO_URING_F_SQE128;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		issue_flags |= IO_URING_F_CQE32;
-	if (ctx->flags & IORING_SETUP_IOPOLL)
+	if (ctx->flags & IORING_SETUP_IOPOLL) {
 		issue_flags |= IO_URING_F_IOPOLL;
+		req->iopoll_completed = 0;
+		WRITE_ONCE(ioucmd->cookie, NULL);
+	}
 
 	if (req_has_async_data(req))
 		ioucmd->cmd = req->async_data;
-- 
2.25.1
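The completion side of this patch publishes the result with smp_store_release() on ->iopoll_completed, and the poll loop picks it up later. A rough userspace analogue of that handshake, written with C11 atomics, is sketched below. The `demo_*` names are hypothetical, and C11 release/acquire is used here only as an approximation of the kernel primitives, not as a claim about how the kernel implements them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bits of struct io_kiocb used here. */
struct demo_req {
	int result;			/* plain data, written before publishing */
	atomic_int iopoll_completed;	/* 0 = in flight, 1 = done */
};

/* Issue side: reset the flag before the command is handed to the driver. */
static void demo_issue(struct demo_req *req)
{
	assert(req != NULL);
	req->result = 0;
	atomic_store_explicit(&req->iopoll_completed, 0, memory_order_relaxed);
}

/* Completion side: store the result, then publish it with release order. */
static void demo_complete(struct demo_req *req, int result)
{
	assert(req != NULL);
	req->result = result;
	atomic_store_explicit(&req->iopoll_completed, 1, memory_order_release);
}

/* Poll side: only read the result once the flag is seen with acquire order. */
static bool demo_reap(struct demo_req *req, int *result)
{
	assert(req != NULL && result != NULL);
	if (!atomic_load_explicit(&req->iopoll_completed, memory_order_acquire))
		return false;
	*result = req->result;
	return true;
}
```

The point of the pairing is that a poller which observes the flag as set is also guaranteed to observe the result written before it, which is why the polled path sets the flag instead of calling __io_req_complete() directly.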
* [PATCH for-next v2 3/4] block: export blk_rq_is_poll
From: Kanchan Joshi @ 2022-08-07 18:36 UTC
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi

This is being done as preparation to support iopoll for nvme passthrough

Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/blk-mq.c         | 3 ++-
 include/linux/blk-mq.h | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ee62b95f3e5..de42f7237bad 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1233,7 +1233,7 @@ static void blk_end_sync_rq(struct request *rq, blk_status_t ret)
 	complete(&wait->done);
 }
 
-static bool blk_rq_is_poll(struct request *rq)
+bool blk_rq_is_poll(struct request *rq)
 {
 	if (!rq->mq_hctx)
 		return false;
@@ -1243,6 +1243,7 @@ static bool blk_rq_is_poll(struct request *rq)
 		return false;
 	return true;
 }
+EXPORT_SYMBOL_GPL(blk_rq_is_poll);
 
 static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
 {
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index effee1dc715a..8f841caaa4cb 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -981,6 +981,7 @@ int blk_rq_map_kern(struct request_queue *, struct request *, void *,
 int blk_rq_append_bio(struct request *rq, struct bio *bio);
 void blk_execute_rq_nowait(struct request *rq, bool at_head);
 blk_status_t blk_execute_rq(struct request *rq, bool at_head);
+bool blk_rq_is_poll(struct request *rq);
 
 struct req_iterator {
 	struct bvec_iter iter;
-- 
2.25.1
* [PATCH for-next v2 4/4] nvme: wire up async polling for io passthrough commands
From: Kanchan Joshi @ 2022-08-07 18:36 UTC
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev, Kanchan Joshi, Anuj Gupta

Store a cookie during submission, and use that to implement
completion-polling inside the ->uring_cmd_iopoll handler.
This handler makes use of existing bio poll facility.

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
 drivers/nvme/host/core.c      |  1 +
 drivers/nvme/host/ioctl.c     | 73 ++++++++++++++++++++++++++++++++---
 drivers/nvme/host/multipath.c |  1 +
 drivers/nvme/host/nvme.h      |  2 +
 4 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2429b11eb9a8..77b6c2882afd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3976,6 +3976,7 @@ static const struct file_operations nvme_ns_chr_fops = {
 	.unlocked_ioctl	= nvme_ns_chr_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
 	.uring_cmd	= nvme_ns_chr_uring_cmd,
+	.uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
 };
 
 static int nvme_add_ns_cdev(struct nvme_ns *ns)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 27614bee7380..7756b439a688 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -391,11 +391,19 @@ static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
 	/* extract bio before reusing the same field for request */
 	struct bio *bio = pdu->bio;
+	void *cookie = READ_ONCE(ioucmd->cookie);
 
 	pdu->req = req;
 	req->bio = bio;
-	/* this takes care of moving rest of completion-work to task context */
-	io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+
+	/*
+	 * For iopoll, complete it directly.
+	 * Otherwise, move the completion to task work.
+	 */
+	if (cookie != NULL && blk_rq_is_poll(req))
+		nvme_uring_task_cb(ioucmd);
+	else
+		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
 }
 
 static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
@@ -445,7 +453,10 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		rq_flags = REQ_NOWAIT;
 		blk_flags = BLK_MQ_REQ_NOWAIT;
 	}
+	if (issue_flags & IO_URING_F_IOPOLL)
+		rq_flags |= REQ_POLLED;
 
+retry:
 	req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr),
 			d.data_len, nvme_to_user_ptr(d.metadata),
 			d.metadata_len, 0, &meta, d.timeout_ms ?
@@ -456,6 +467,17 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	req->end_io = nvme_uring_cmd_end_io;
 	req->end_io_data = ioucmd;
 
+	if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
+		if (unlikely(!req->bio)) {
+			/* we can't poll this, so alloc regular req instead */
+			blk_mq_free_request(req);
+			rq_flags &= ~REQ_POLLED;
+			goto retry;
+		} else {
+			WRITE_ONCE(ioucmd->cookie, req->bio);
+			req->bio->bi_opf |= REQ_POLLED;
+		}
+	}
 	/* to free bio on completion, as req->bio will be null at that time */
 	pdu->bio = req->bio;
 	pdu->meta = meta;
@@ -559,9 +581,6 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 
 static int nvme_uring_cmd_checks(unsigned int issue_flags)
 {
-	/* IOPOLL not supported yet */
-	if (issue_flags & IO_URING_F_IOPOLL)
-		return -EOPNOTSUPP;
 
 	/* NVMe passthrough requires big SQE/CQE support */
 	if ((issue_flags & (IO_URING_F_SQE128|IO_URING_F_CQE32)) !=
@@ -604,6 +623,23 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 }
 
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+{
+	struct bio *bio;
+	int ret = 0;
+	struct nvme_ns *ns;
+	struct request_queue *q;
+
+	rcu_read_lock();
+	bio = READ_ONCE(ioucmd->cookie);
+	ns = container_of(file_inode(ioucmd->file)->i_cdev,
+			struct nvme_ns, cdev);
+	q = ns->queue;
+	if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
+		ret = bio_poll(bio, NULL, 0);
+	rcu_read_unlock();
+	return ret;
+}
 #ifdef CONFIG_NVME_MULTIPATH
 static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
 		void __user *argp, struct nvme_ns_head *head, int srcu_idx)
@@ -685,6 +721,29 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 	srcu_read_unlock(&head->srcu, srcu_idx);
 	return ret;
 }
+
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+{
+	struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
+	struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
+	int srcu_idx = srcu_read_lock(&head->srcu);
+	struct nvme_ns *ns = nvme_find_path(head);
+	struct bio *bio;
+	int ret = 0;
+	struct request_queue *q;
+
+	if (ns) {
+		rcu_read_lock();
+		bio = READ_ONCE(ioucmd->cookie);
+		q = ns->queue;
+		if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
+				&& bio->bi_bdev)
+			ret = bio_poll(bio, NULL, 0);
+		rcu_read_unlock();
+	}
+	srcu_read_unlock(&head->srcu, srcu_idx);
+	return ret;
+}
 #endif /* CONFIG_NVME_MULTIPATH */
 
 int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
@@ -692,6 +751,10 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 	struct nvme_ctrl *ctrl = ioucmd->file->private_data;
 	int ret;
 
+	/* IOPOLL not supported yet */
+	if (issue_flags & IO_URING_F_IOPOLL)
+		return -EOPNOTSUPP;
+
 	ret = nvme_uring_cmd_checks(issue_flags);
 	if (ret)
 		return ret;
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6ef497c75a16..00f2f81e20fa 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -439,6 +439,7 @@ static const struct file_operations nvme_ns_head_chr_fops = {
 	.unlocked_ioctl	= nvme_ns_head_chr_ioctl,
 	.compat_ioctl	= compat_ptr_ioctl,
 	.uring_cmd	= nvme_ns_head_chr_uring_cmd,
+	.uring_cmd_iopoll = nvme_ns_head_chr_uring_cmd_iopoll,
 };
 
 static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bdc0ff7ed9ab..3f2d3dda6e6c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -821,6 +821,8 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
 long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
 int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
 int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
-- 
2.25.1
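The driver-side logic in this patch comes down to two decisions: at submission, keep the request polled only if it has a bio to poll on (otherwise fall back to a regular request), and at completion, finish inline only when a cookie was stored and the request is still a polled one. The sketch below models just those two decisions in plain C. All `demo_*` names are hypothetical, and the fallback is modelled as a flag flip rather than the free-and-retry the patch actually performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the request state the driver looks at. */
struct demo_request {
	bool polled;		/* models REQ_POLLED being kept on the request */
	const void *bio;	/* NULL when the command carries no data */
	const void *cookie;	/* models ioucmd->cookie */
};

enum demo_completion {
	DEMO_COMPLETE_INLINE,	/* finish directly from the poll path */
	DEMO_COMPLETE_TASK_WORK	/* defer the rest to task context */
};

/* Submission side: keep polling only if there is a bio to poll on. */
static void demo_submit(struct demo_request *rq, bool ring_is_iopoll,
			const void *bio)
{
	assert(rq != NULL);
	rq->bio = bio;
	rq->polled = ring_is_iopoll && bio != NULL;
	rq->cookie = rq->polled ? bio : NULL;
}

/* Completion side: mirrors the cookie/polled check in the end_io handler. */
static enum demo_completion demo_end_io(const struct demo_request *rq)
{
	assert(rq != NULL);
	if (rq->cookie != NULL && rq->polled)
		return DEMO_COMPLETE_INLINE;
	return DEMO_COMPLETE_TASK_WORK;
}
```

A command without a data buffer on an IOPOLL ring therefore ends up on the task-work path in this model, matching the "we can't poll this" branch in the diff.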