* [PATCH for-next v2 0/4] iopoll support for io_uring/nvme
From: Kanchan Joshi @ 2022-08-07 18:36 UTC (permalink / raw)
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
Kanchan Joshi
Hi,
This series enables async polling for io_uring commands, and wires up
nvme passthrough (for io-commands) to leverage it.
Changes since v1:
- corrected variable name (Jens)
- fix for a warning (test-robot)
Performance impact:
TL;DR: polling gives a clear win.
512b randread performance (KIOPS):
QD_batch    block    passthru    passthru-poll    block-poll
1_1            80          81              158           157
8_2           406         470              680           700
16_4          620         656              931           920
128_32        879        1056             1120          1132
Upstream fio is used for testing. The number of polled queues is set to 1 in nvme.
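As a reading aid for the table above, the per-row gain from polling is simply the ratio of the polled column to the matching non-polled column. The following is a minimal sketch of that arithmetic. The names `perf_row`, `poll_gain` and `print_gains` are made up for illustration, and the numbers are copied from the table.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the 512b randread table; all values are KIOPS from the table. */
struct perf_row {
	const char *qd_batch;
	double block, passthru, passthru_poll, block_poll;
};

static const struct perf_row rows[] = {
	{ "1_1",     80,   81,  158,  157 },
	{ "8_2",    406,  470,  680,  700 },
	{ "16_4",   620,  656,  931,  920 },
	{ "128_32", 879, 1056, 1120, 1132 },
};

/* Ratio of polled to non-polled throughput (1.0 means no change). */
static double poll_gain(double base_kiops, double polled_kiops)
{
	return polled_kiops / base_kiops;
}

/* Print the polling gain for the passthrough and block paths, row by row. */
static void print_gains(void)
{
	for (size_t i = 0; i < sizeof(rows) / sizeof(rows[0]); i++)
		printf("%-7s passthru x%.2f  block x%.2f\n", rows[i].qd_batch,
		       poll_gain(rows[i].passthru, rows[i].passthru_poll),
		       poll_gain(rows[i].block, rows[i].block_poll));
}
```

For the 1_1 row this works out to roughly 1.95x for passthrough (158/81), while at 128_32 the gain shrinks to about 1.06x (1120/1056).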
passthru command line:
fio -iodepth=64 -rw=randread -ioengine=io_uring_cmd -bs=512 -numjobs=1
-runtime=60 -group_reporting -iodepth_batch_submit=16
-iodepth_batch_complete_min=1 -iodepth_batch_complete_max=16
-cmd_type=nvme -hipri=0 -filename=/dev/ng1n1 -name=io_uring_cmd_64
block command line:
fio -direct=1 -iodepth=64 -rw=randread -ioengine=io_uring -bs=512
-numjobs=1 -runtime=60 -group_reporting -iodepth_batch_submit=16
-iodepth_batch_complete_min=1 -iodepth_batch_complete_max=16
-hipri=0 -filename=/dev/nvme1n1 -name=io_uring_64
A bit of code went into the non-passthrough path of io_uring (patch 2),
but I do not see it causing any performance regression.
A peak-performance test showed 2.3M IOPS for block-io with or without
this series.
io_uring: Running taskset -c 0,12 t/io_uring -b512 -d128 -c32 -s32 -p1
-F1 -B1 -n2 /dev/nvme0n1
submitter=0, tid=3089, file=/dev/nvme0n1, node=-1
submitter=1, tid=3090, file=/dev/nvme0n1, node=-1
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
polled=1, fixedbufs=1/0, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=2.31M, BW=1126MiB/s, IOS/call=31/31
IOPS=2.30M, BW=1124MiB/s, IOS/call=32/31
IOPS=2.30M, BW=1123MiB/s, IOS/call=32/32
Kanchan Joshi (4):
fs: add file_operations->uring_cmd_iopoll
io_uring: add iopoll infrastructure for io_uring_cmd
block: export blk_rq_is_poll
nvme: wire up async polling for io passthrough commands
block/blk-mq.c | 3 +-
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/ioctl.c | 73 ++++++++++++++++++++++++++++++++---
drivers/nvme/host/multipath.c | 1 +
drivers/nvme/host/nvme.h | 2 +
include/linux/blk-mq.h | 1 +
include/linux/fs.h | 1 +
include/linux/io_uring.h | 8 +++-
io_uring/io_uring.c | 6 +++
io_uring/opdef.c | 1 +
io_uring/rw.c | 8 +++-
io_uring/uring_cmd.c | 11 +++++-
12 files changed, 105 insertions(+), 11 deletions(-)
base-commit: ece775e9aa8232963cc1bddf5cc91285db6233af
--
2.25.1
* [PATCH for-next v2 1/4] fs: add file_operations->uring_cmd_iopoll
From: Kanchan Joshi @ 2022-08-07 18:36 UTC (permalink / raw)
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
Kanchan Joshi
io_uring will trigger this to do completion polling on uring-cmd
operations.
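To illustrate the shape of the new hook, here is a small userspace model of an ops table with an optional polling callback next to the existing command callback. It is only a sketch: `model_fops`, `model_cmd`, `model_poll` and the demo handlers are hypothetical stand-ins, not kernel code, and the NULL check before calling the hook is an assumption of this sketch rather than something this patch adds.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct io_uring_cmd; only counts how often it was polled. */
struct model_cmd {
	int polls;
};

/* Stand-in for the two uring-cmd hooks in struct file_operations. */
struct model_fops {
	int (*uring_cmd)(struct model_cmd *cmd, unsigned int issue_flags);
	int (*uring_cmd_iopoll)(struct model_cmd *cmd);
};

/* Demo submission handler: accepts every command. */
static int demo_uring_cmd(struct model_cmd *cmd, unsigned int issue_flags)
{
	(void)cmd;
	(void)issue_flags;
	return 0;
}

/* Demo poll handler: pretends one completion was found per call. */
static int demo_uring_cmd_iopoll(struct model_cmd *cmd)
{
	cmd->polls++;
	return 1;
}

/* A provider that supports polling and one that leaves the hook unset. */
static const struct model_fops with_poll = {
	.uring_cmd = demo_uring_cmd,
	.uring_cmd_iopoll = demo_uring_cmd_iopoll,
};
static const struct model_fops without_poll = {
	.uring_cmd = demo_uring_cmd,
};

/* Hypothetical caller: the NULL check is an assumption of this sketch. */
static int model_poll(const struct model_fops *fops, struct model_cmd *cmd)
{
	if (!fops->uring_cmd_iopoll)
		return -EOPNOTSUPP;
	return fops->uring_cmd_iopoll(cmd);
}
```

A provider opts in simply by filling in the extra member, which is what the nvme char-device fops do later in this series.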
Signed-off-by: Kanchan Joshi <[email protected]>
---
include/linux/fs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9f131e559d05..449941f99f50 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2134,6 +2134,7 @@ struct file_operations {
loff_t len, unsigned int remap_flags);
int (*fadvise)(struct file *, loff_t, loff_t, int);
int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
+ int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd);
} __randomize_layout;
struct inode_operations {
--
2.25.1
* [PATCH for-next v2 2/4] io_uring: add iopoll infrastructure for io_uring_cmd
From: Kanchan Joshi @ 2022-08-07 18:36 UTC (permalink / raw)
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
Kanchan Joshi, Pankaj Raghav
Set this up in the same way as iopoll is done for regular read/write IO.
Make room in struct io_uring_cmd for storing a cookie at submission
time. Perform the completion using the ->uring_cmd_iopoll handler.
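The completion handshake described above can be modelled in userspace with C11 atomics. This is a rough sketch under stated assumptions: `model_req`, `model_issue`, `model_cmd_done` and `model_reap` are hypothetical names, the boolean `cqe_posted` stands in for actually posting a CQE, and the release/acquire pair is this sketch's rendering of the kernel's `smp_store_release()` and the matching load on the reaping side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_req {
	int res;                      /* result written by the completion side */
	atomic_int iopoll_completed;  /* becomes 1 once res is valid */
	bool cqe_posted;              /* stands in for posting the CQE */
};

/* Issue side: clear the flag before the command is handed to the driver. */
static void model_issue(struct model_req *req)
{
	req->res = 0;
	req->cqe_posted = false;
	atomic_store_explicit(&req->iopoll_completed, 0, memory_order_relaxed);
}

/*
 * Completion side: on an IOPOLL ring only publish the result and let the
 * poller finish the request; otherwise complete it right away.
 */
static void model_cmd_done(struct model_req *req, int res, bool ring_is_iopoll)
{
	req->res = res;
	if (ring_is_iopoll)
		atomic_store_explicit(&req->iopoll_completed, 1,
				      memory_order_release);
	else
		req->cqe_posted = true;
}

/* Reaping side: post the CQE only after the flag has been observed. */
static bool model_reap(struct model_req *req)
{
	if (!atomic_load_explicit(&req->iopoll_completed, memory_order_acquire))
		return false;
	req->cqe_posted = true;
	return true;
}
```

The point of the release store is that a poller which sees the flag set is also guaranteed to see the result that was written before it.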
Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Pankaj Raghav <[email protected]>
---
include/linux/io_uring.h | 8 ++++++--
io_uring/io_uring.c | 6 ++++++
io_uring/opdef.c | 1 +
io_uring/rw.c | 8 +++++++-
io_uring/uring_cmd.c | 11 +++++++++--
5 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 4a2f6cc5a492..58676c0a398f 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,8 +20,12 @@ enum io_uring_cmd_flags {
struct io_uring_cmd {
struct file *file;
const void *cmd;
- /* callback to defer completions to task context */
- void (*task_work_cb)(struct io_uring_cmd *cmd);
+ union {
+ /* callback to defer completions to task context */
+ void (*task_work_cb)(struct io_uring_cmd *cmd);
+ /* used for polled completion */
+ void *cookie;
+ };
u32 cmd_op;
u32 pad;
u8 pdu[32]; /* available inline for free use */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b54218da075c..48a430a86b50 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1296,6 +1296,12 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
wq_list_empty(&ctx->iopoll_list))
break;
}
+
+ if (task_work_pending(current)) {
+ mutex_unlock(&ctx->uring_lock);
+ io_run_task_work();
+ mutex_lock(&ctx->uring_lock);
+ }
ret = io_do_iopoll(ctx, !min);
if (ret < 0)
break;
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 72dd2b2d8a9d..9a0df19306fe 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -466,6 +466,7 @@ const struct io_op_def io_op_defs[] = {
.needs_file = 1,
.plug = 1,
.name = "URING_CMD",
+ .iopoll = 1,
.async_size = uring_cmd_pdu_size(1),
.prep = io_uring_cmd_prep,
.issue = io_uring_cmd,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 2b784795103c..1a4fb8a44b9a 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1005,7 +1005,13 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
if (READ_ONCE(req->iopoll_completed))
break;
- ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
+ if (req->opcode == IORING_OP_URING_CMD) {
+ struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
+
+ ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
+ } else
+ ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob,
+ poll_flags);
if (unlikely(ret < 0))
return ret;
else if (ret)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 0a421ed51e7e..5cc339fba8b8 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -49,7 +49,11 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2)
io_req_set_res(req, 0, ret);
if (req->ctx->flags & IORING_SETUP_CQE32)
io_req_set_cqe32_extra(req, res2, 0);
- __io_req_complete(req, 0);
+ if (req->ctx->flags & IORING_SETUP_IOPOLL)
+ /* order with io_iopoll_req_issued() checking ->iopoll_completed */
+ smp_store_release(&req->iopoll_completed, 1);
+ else
+ __io_req_complete(req, 0);
}
EXPORT_SYMBOL_GPL(io_uring_cmd_done);
@@ -89,8 +93,11 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
issue_flags |= IO_URING_F_SQE128;
if (ctx->flags & IORING_SETUP_CQE32)
issue_flags |= IO_URING_F_CQE32;
- if (ctx->flags & IORING_SETUP_IOPOLL)
+ if (ctx->flags & IORING_SETUP_IOPOLL) {
issue_flags |= IO_URING_F_IOPOLL;
+ req->iopoll_completed = 0;
+ WRITE_ONCE(ioucmd->cookie, NULL);
+ }
if (req_has_async_data(req))
ioucmd->cmd = req->async_data;
--
2.25.1
* [PATCH for-next v2 3/4] block: export blk_rq_is_poll
From: Kanchan Joshi @ 2022-08-07 18:36 UTC (permalink / raw)
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
Kanchan Joshi
This is being done in preparation for supporting iopoll for nvme passthrough.
Signed-off-by: Kanchan Joshi <[email protected]>
---
block/blk-mq.c | 3 ++-
include/linux/blk-mq.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5ee62b95f3e5..de42f7237bad 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1233,7 +1233,7 @@ static void blk_end_sync_rq(struct request *rq, blk_status_t ret)
complete(&wait->done);
}
-static bool blk_rq_is_poll(struct request *rq)
+bool blk_rq_is_poll(struct request *rq)
{
if (!rq->mq_hctx)
return false;
@@ -1243,6 +1243,7 @@ static bool blk_rq_is_poll(struct request *rq)
return false;
return true;
}
+EXPORT_SYMBOL_GPL(blk_rq_is_poll);
static void blk_rq_poll_completion(struct request *rq, struct completion *wait)
{
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index effee1dc715a..8f841caaa4cb 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -981,6 +981,7 @@ int blk_rq_map_kern(struct request_queue *, struct request *, void *,
int blk_rq_append_bio(struct request *rq, struct bio *bio);
void blk_execute_rq_nowait(struct request *rq, bool at_head);
blk_status_t blk_execute_rq(struct request *rq, bool at_head);
+bool blk_rq_is_poll(struct request *rq);
struct req_iterator {
struct bvec_iter iter;
--
2.25.1
* [PATCH for-next v2 4/4] nvme: wire up async polling for io passthrough commands
From: Kanchan Joshi @ 2022-08-07 18:36 UTC (permalink / raw)
To: axboe, hch
Cc: io-uring, linux-nvme, linux-block, ming.lei, gost.dev,
Kanchan Joshi, Anuj Gupta
Store a cookie during submission, and use that to implement
completion-polling inside the ->uring_cmd_iopoll handler.
This handler makes use of the existing bio poll facility.
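The submission-side logic (store the bio as a cookie, or fall back to a regular request when there is no bio to poll) can be sketched as a userspace model. Everything here is hypothetical scaffolding: `model_cmd`, `model_bio`, `model_submit`, `model_complete_inline` and the `MODEL_REQ_POLLED` bit are illustration-only names and values, and the allocation counter merely stands in for the request being freed and allocated again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_REQ_POLLED 0x1u /* hypothetical bit, not the kernel's value */

struct model_bio {
	int id;
};

struct model_cmd {
	struct model_bio *cookie; /* NULL means there is nothing to poll */
	unsigned int rq_flags;    /* flags the request finally ended up with */
	int allocations;          /* how many times a request was allocated */
};

/* Submission: try a polled request first, retry as regular if no bio. */
static void model_submit(struct model_cmd *cmd, bool ring_is_iopoll,
			 struct model_bio *bio)
{
	unsigned int rq_flags = ring_is_iopoll ? MODEL_REQ_POLLED : 0;

	cmd->cookie = NULL;
	cmd->allocations = 0;
retry:
	cmd->allocations++; /* stands in for allocating the request */
	if (ring_is_iopoll && (rq_flags & MODEL_REQ_POLLED)) {
		if (!bio) {
			/* cannot poll without a bio: drop the flag, retry */
			rq_flags &= ~MODEL_REQ_POLLED;
			goto retry;
		}
		cmd->cookie = bio;
	}
	cmd->rq_flags = rq_flags;
}

/* end_io choice: true = complete directly, false = defer to task work. */
static bool model_complete_inline(const struct model_cmd *cmd)
{
	return cmd->cookie != NULL && (cmd->rq_flags & MODEL_REQ_POLLED);
}
```

In this model a command without a bio on an IOPOLL ring costs two allocations and ends up on the task-work completion path, which mirrors the fallback described by the "we can't poll this" comment in the patch.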
Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
---
drivers/nvme/host/core.c | 1 +
drivers/nvme/host/ioctl.c | 73 ++++++++++++++++++++++++++++++++---
drivers/nvme/host/multipath.c | 1 +
drivers/nvme/host/nvme.h | 2 +
4 files changed, 72 insertions(+), 5 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 2429b11eb9a8..77b6c2882afd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3976,6 +3976,7 @@ static const struct file_operations nvme_ns_chr_fops = {
.unlocked_ioctl = nvme_ns_chr_ioctl,
.compat_ioctl = compat_ptr_ioctl,
.uring_cmd = nvme_ns_chr_uring_cmd,
+ .uring_cmd_iopoll = nvme_ns_chr_uring_cmd_iopoll,
};
static int nvme_add_ns_cdev(struct nvme_ns *ns)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 27614bee7380..7756b439a688 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -391,11 +391,19 @@ static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
/* extract bio before reusing the same field for request */
struct bio *bio = pdu->bio;
+ void *cookie = READ_ONCE(ioucmd->cookie);
pdu->req = req;
req->bio = bio;
- /* this takes care of moving rest of completion-work to task context */
- io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+
+ /*
+ * For iopoll, complete it directly.
+ * Otherwise, move the completion to task work.
+ */
+ if (cookie != NULL && blk_rq_is_poll(req))
+ nvme_uring_task_cb(ioucmd);
+ else
+ io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
}
static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
@@ -445,7 +453,10 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
rq_flags = REQ_NOWAIT;
blk_flags = BLK_MQ_REQ_NOWAIT;
}
+ if (issue_flags & IO_URING_F_IOPOLL)
+ rq_flags |= REQ_POLLED;
+retry:
req = nvme_alloc_user_request(q, &c, nvme_to_user_ptr(d.addr),
d.data_len, nvme_to_user_ptr(d.metadata),
d.metadata_len, 0, &meta, d.timeout_ms ?
@@ -456,6 +467,17 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
req->end_io = nvme_uring_cmd_end_io;
req->end_io_data = ioucmd;
+ if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
+ if (unlikely(!req->bio)) {
+ /* we can't poll this, so alloc regular req instead */
+ blk_mq_free_request(req);
+ rq_flags &= ~REQ_POLLED;
+ goto retry;
+ } else {
+ WRITE_ONCE(ioucmd->cookie, req->bio);
+ req->bio->bi_opf |= REQ_POLLED;
+ }
+ }
/* to free bio on completion, as req->bio will be null at that time */
pdu->bio = req->bio;
pdu->meta = meta;
@@ -559,9 +581,6 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
static int nvme_uring_cmd_checks(unsigned int issue_flags)
{
- /* IOPOLL not supported yet */
- if (issue_flags & IO_URING_F_IOPOLL)
- return -EOPNOTSUPP;
/* NVMe passthrough requires big SQE/CQE support */
if ((issue_flags & (IO_URING_F_SQE128|IO_URING_F_CQE32)) !=
@@ -604,6 +623,23 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
}
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+{
+ struct bio *bio;
+ int ret = 0;
+ struct nvme_ns *ns;
+ struct request_queue *q;
+
+ rcu_read_lock();
+ bio = READ_ONCE(ioucmd->cookie);
+ ns = container_of(file_inode(ioucmd->file)->i_cdev,
+ struct nvme_ns, cdev);
+ q = ns->queue;
+ if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
+ ret = bio_poll(bio, NULL, 0);
+ rcu_read_unlock();
+ return ret;
+}
#ifdef CONFIG_NVME_MULTIPATH
static int nvme_ns_head_ctrl_ioctl(struct nvme_ns *ns, unsigned int cmd,
void __user *argp, struct nvme_ns_head *head, int srcu_idx)
@@ -685,6 +721,29 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
srcu_read_unlock(&head->srcu, srcu_idx);
return ret;
}
+
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+{
+ struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
+ struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
+ int srcu_idx = srcu_read_lock(&head->srcu);
+ struct nvme_ns *ns = nvme_find_path(head);
+ struct bio *bio;
+ int ret = 0;
+ struct request_queue *q;
+
+ if (ns) {
+ rcu_read_lock();
+ bio = READ_ONCE(ioucmd->cookie);
+ q = ns->queue;
+ if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
+ && bio->bi_bdev)
+ ret = bio_poll(bio, NULL, 0);
+ rcu_read_unlock();
+ }
+ srcu_read_unlock(&head->srcu, srcu_idx);
+ return ret;
+}
#endif /* CONFIG_NVME_MULTIPATH */
int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
@@ -692,6 +751,10 @@ int nvme_dev_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
struct nvme_ctrl *ctrl = ioucmd->file->private_data;
int ret;
+ /* IOPOLL not supported yet */
+ if (issue_flags & IO_URING_F_IOPOLL)
+ return -EOPNOTSUPP;
+
ret = nvme_uring_cmd_checks(issue_flags);
if (ret)
return ret;
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 6ef497c75a16..00f2f81e20fa 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -439,6 +439,7 @@ static const struct file_operations nvme_ns_head_chr_fops = {
.unlocked_ioctl = nvme_ns_head_chr_ioctl,
.compat_ioctl = compat_ptr_ioctl,
.uring_cmd = nvme_ns_head_chr_uring_cmd,
+ .uring_cmd_iopoll = nvme_ns_head_chr_uring_cmd_iopoll,
};
static int nvme_add_ns_head_cdev(struct nvme_ns_head *head)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bdc0ff7ed9ab..3f2d3dda6e6c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -821,6 +821,8 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
long nvme_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
unsigned int issue_flags);
int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
--
2.25.1