From: Jens Axboe <[email protected]>
To: Kanchan Joshi <[email protected]>
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected]
Subject: Re: [PATCH for-next v3 0/4] fixed-buffer for uring-cmd/passthrough
Date: Fri, 2 Sep 2022 15:25:33 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
[-- Attachment #1: Type: text/plain, Size: 3491 bytes --]
On 9/2/22 1:32 PM, Jens Axboe wrote:
> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>> Hi,
>>>>>
>>>>> Currently uring-cmd lacks the ability to leverage pre-registered
>>>>> buffers. This series adds that support in uring-cmd, and plumbs
>>>>> nvme passthrough to work with it.
>>>>>
>>>>> Using registered buffers showed a peak-perf jump from 1.85M to
>>>>> 2.17M IOPS in my setup.
>>>>>
>>>>> Without fixedbufs
>>>>> *****************
>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>> ^CExiting on signal
>>>>> Maximum IOPS=1.85M
>>>>
>>>> With the poll support queued up, I ran this one as well. The tl;dr is:
>>>>
>>>> bdev (non pt)   122M IOPS
>>>> irq driven      51-52M IOPS
>>>> polled          71M IOPS
>>>> polled+fixed    78M IOPS
>>
>> Except for the first one, are the remaining three entries for passthru?
>> Somehow I didn't see that big a gap. I will try to align my setup in the
>> coming days.
>
> Right, sorry, it was badly labeled. The first one is bdev with polling,
> registered buffers, etc. The others are all passthrough mode. Polled
> goes to 74M with the caching fix, so it's about a 74M -> 82M bump using
> registered buffers with passthrough and polling.
>
>>> polled+fixed    82M
>>>
>>> I suspect the remainder, or at least some of it, is due to the lack of
>>> batching on the request freeing side. Haven't really looked deeper yet.
>>>
>>> One issue I saw: try to use passthrough polling without any poll
>>> queues defined, and it'll stall, just spinning on completions. You
>>> need to ensure those are processed as well - look at how the
>>> non-passthrough io_uring poll path handles it.
>>
>> I had tested this earlier, and it used to run fine, but it does not now.
>> I see that I/Os are getting completed: the irq completion arrives in nvme
>> and triggers the task-work based completion (by calling
>> io_uring_cmd_complete_in_task). But the task work never gets called, and
>> therefore no completion happens.
>>
>> io_uring_cmd_complete_in_task -> io_req_task_work_add -> __io_req_task_work_add
>>
>> It seems the task work did not get added. Something about the newly
>> added IORING_SETUP_DEFER_TASKRUN changes the scenario.
>>
>> static inline void __io_req_task_work_add(struct io_kiocb *req, bool allow_local)
>> {
>>         struct io_uring_task *tctx = req->task->io_uring;
>>         struct io_ring_ctx *ctx = req->ctx;
>>         struct llist_node *node;
>>
>>         if (allow_local && ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
>>                 io_req_local_work_add(req);
>>                 return;
>>         }
>>         ....
>>
>> To confirm, I commented that out in t/io_uring and it runs fine.
>> Please see if that changes anything for you. I will try to find the
>> actual fix tomorrow.
>
> Ah, gotcha - yes, that actually makes a lot of sense. I wonder if regular
> polling is then also broken without poll queues if
> IORING_SETUP_DEFER_TASKRUN is set. It probably is; I'll check into
> io_iopoll_check().
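A rough, untested sketch of the kind of check io_iopoll_check() would need,
assuming io_run_local_work() is the helper that flushes deferred task work
for IORING_SETUP_DEFER_TASKRUN rings:

	/* flush deferred local task work before spinning on completions */
	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
		io_run_local_work(ctx);

Without something like this, completions queued as local task work never
run while the task is busy polling.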
A mix of fixes and cleanups; here's what I got.
--
Jens Axboe
[-- Attachment #2: 0001-io_uring-cleanly-separate-request-types-for-iopoll.patch --]
[-- Type: text/x-patch, Size: 1663 bytes --]
From 50155186644a352b290b72c61e738f62640d566a Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Fri, 2 Sep 2022 15:16:29 -0600
Subject: [PATCH 1/3] io_uring: cleanly separate request types for iopoll
After the addition of iopoll support for passthrough, there's a bit of
a mixup here. Clean it up and get rid of the casting for the passthrough
command type.
Signed-off-by: Jens Axboe <[email protected]>
---
io_uring/rw.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 9698a789b3d5..3f03b6d2a5a3 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -994,7 +994,7 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
wq_list_for_each(pos, start, &ctx->iopoll_list) {
struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
- struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+ struct file *file = req->file;
int ret;
/*
@@ -1006,12 +1006,15 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
break;
if (req->opcode == IORING_OP_URING_CMD) {
- struct io_uring_cmd *ioucmd = (struct io_uring_cmd *)rw;
+ struct io_uring_cmd *ioucmd;
- ret = req->file->f_op->uring_cmd_iopoll(ioucmd);
- } else
- ret = rw->kiocb.ki_filp->f_op->iopoll(&rw->kiocb, &iob,
- poll_flags);
+ ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+ ret = file->f_op->uring_cmd_iopoll(ioucmd, poll_flags);
+ } else {
+ struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+ ret = file->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
+ }
if (unlikely(ret < 0))
return ret;
else if (ret)
--
2.35.1
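For context, the io_kiocb_to_cmd() helper used above is the typed accessor
for an io_kiocb's per-opcode command storage. A simplified sketch of the
pattern - the real kernel macro also size-checks the target type against
the available storage:

	/* simplified: view the request's command area as the given type */
	#define io_kiocb_to_cmd(req, cmd_type)	((cmd_type *)&(req)->cmd)

This is what lets the iopoll loop defer picking a concrete command type
until it has checked the opcode.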
[-- Attachment #3: 0002-nvme-use-separate-end-IO-handler-for-IOPOLL.patch --]
[-- Type: text/x-patch, Size: 2457 bytes --]
From 0bc78c843b8636dcdfe45dd07328ca826fa67f9b Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Fri, 2 Sep 2022 15:17:30 -0600
Subject: [PATCH 2/3] nvme: use separate end IO handler for IOPOLL
We don't need to rely on the cookie or request type; set the right handler
based on how we're handling the IO.
Signed-off-by: Jens Axboe <[email protected]>
---
drivers/nvme/host/ioctl.c | 30 ++++++++++++++++++++++--------
1 file changed, 22 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 7756b439a688..f34abe95821e 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -385,25 +385,36 @@ static void nvme_uring_task_cb(struct io_uring_cmd *ioucmd)
io_uring_cmd_done(ioucmd, status, result);
}
-static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
+static void nvme_uring_iopoll_cmd_end_io(struct request *req, blk_status_t err)
{
struct io_uring_cmd *ioucmd = req->end_io_data;
struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
/* extract bio before reusing the same field for request */
struct bio *bio = pdu->bio;
- void *cookie = READ_ONCE(ioucmd->cookie);
pdu->req = req;
req->bio = bio;
/*
* For iopoll, complete it directly.
- * Otherwise, move the completion to task work.
*/
- if (cookie != NULL && blk_rq_is_poll(req))
- nvme_uring_task_cb(ioucmd);
- else
- io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+ nvme_uring_task_cb(ioucmd);
+}
+
+static void nvme_uring_cmd_end_io(struct request *req, blk_status_t err)
+{
+ struct io_uring_cmd *ioucmd = req->end_io_data;
+ struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
+ /* extract bio before reusing the same field for request */
+ struct bio *bio = pdu->bio;
+
+ pdu->req = req;
+ req->bio = bio;
+
+ /*
+ * Move the completion to task work.
+ */
+ io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
}
static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
@@ -464,7 +475,10 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
blk_flags);
if (IS_ERR(req))
return PTR_ERR(req);
- req->end_io = nvme_uring_cmd_end_io;
+ if (issue_flags & IO_URING_F_IOPOLL)
+ req->end_io = nvme_uring_iopoll_cmd_end_io;
+ else
+ req->end_io = nvme_uring_cmd_end_io;
req->end_io_data = ioucmd;
if (issue_flags & IO_URING_F_IOPOLL && rq_flags & REQ_POLLED) {
--
2.35.1
[-- Attachment #4: 0003-fs-add-batch-and-poll-flags-to-the-uring_cmd_iopoll-.patch --]
[-- Type: text/x-patch, Size: 4261 bytes --]
From 9daa39b146f3a8f412196df5eb9f9686f308e5cc Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Fri, 2 Sep 2022 15:18:05 -0600
Subject: [PATCH 3/3] fs: add batch and poll flags to the uring_cmd_iopoll()
handler
We need the poll_flags to know how to poll for the IO, and we should
have the batch structure in preparation for supporting batched
completions with iopoll.
Signed-off-by: Jens Axboe <[email protected]>
---
drivers/nvme/host/ioctl.c | 12 ++++++++----
drivers/nvme/host/nvme.h | 6 ++++--
include/linux/fs.h | 3 ++-
io_uring/rw.c | 3 ++-
4 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index f34abe95821e..7a0b12ef49ae 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -637,7 +637,9 @@ int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
}
-int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob,
+ unsigned int poll_flags)
{
struct bio *bio;
int ret = 0;
@@ -650,7 +652,7 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
struct nvme_ns, cdev);
q = ns->queue;
if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio && bio->bi_bdev)
- ret = bio_poll(bio, NULL, 0);
+ ret = bio_poll(bio, iob, poll_flags);
rcu_read_unlock();
return ret;
}
@@ -736,7 +738,9 @@ int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
return ret;
}
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob,
+ unsigned int poll_flags)
{
struct cdev *cdev = file_inode(ioucmd->file)->i_cdev;
struct nvme_ns_head *head = container_of(cdev, struct nvme_ns_head, cdev);
@@ -752,7 +756,7 @@ int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd)
q = ns->queue;
if (test_bit(QUEUE_FLAG_POLL, &q->queue_flags) && bio
&& bio->bi_bdev)
- ret = bio_poll(bio, NULL, 0);
+ ret = bio_poll(bio, iob, poll_flags);
rcu_read_unlock();
}
srcu_read_unlock(&head->srcu, srcu_idx);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index fdcbc93dea21..216acbe953b3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -821,8 +821,10 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
long nvme_dev_ioctl(struct file *file, unsigned int cmd,
unsigned long arg);
-int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
-int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd);
+int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob, unsigned int poll_flags);
+int nvme_ns_head_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
+ struct io_comp_batch *iob, unsigned int poll_flags);
int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
unsigned int issue_flags);
int nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d6badd19784f..01681d061a6a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2132,7 +2132,8 @@ struct file_operations {
loff_t len, unsigned int remap_flags);
int (*fadvise)(struct file *, loff_t, loff_t, int);
int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
- int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd);
+ int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *,
+ unsigned int poll_flags);
} __randomize_layout;
struct inode_operations {
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 3f03b6d2a5a3..4a061326c664 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1009,7 +1009,8 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
struct io_uring_cmd *ioucmd;
ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
- ret = file->f_op->uring_cmd_iopoll(ioucmd, poll_flags);
+ ret = file->f_op->uring_cmd_iopoll(ioucmd, &iob,
+ poll_flags);
} else {
struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
--
2.35.1
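For reference, the polled + fixed-buffer passthrough runs discussed above
should correspond to a t/io_uring invocation along these lines - assuming,
per the fixedbufs= and polled= fields in the run output quoted earlier,
that -B1 enables fixed buffers and -p1 enables iopoll:

	taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -O0 -n1 -u1 /dev/ng0n1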