* [PATCH 4/4] dm: support I/O polling
@ 2021-03-02 19:05 Mikulas Patocka
2021-03-03 2:53 ` [dm-devel] " JeffleXu
0 siblings, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2021-03-02 19:05 UTC (permalink / raw)
To: JeffleXu, Mike Snitzer, Heinz Mauelshagen, axboe, caspar,
io-uring, linux-block, joseph.qi, dm-devel, hch
Cc: Mikulas Patocka
[-- Attachment #1: dm-poll.patch --]
[-- Type: text/plain, Size: 760 bytes --]
Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
cookie.
Signed-off-by: Mikulas Patocka <[email protected]>
---
drivers/md/dm.c | 5 +++++
1 file changed, 5 insertions(+)
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
+++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
@@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
}
}
+ if (ci.poll_cookie != BLK_QC_T_NONE) {
+ while (atomic_read(&ci.io->io_count) > 1 &&
+ blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
+ }
+
/* drop the extra reference count */
dec_pending(ci.io, errno_to_blk_status(error));
}
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-02 19:05 [PATCH 4/4] dm: support I/O polling Mikulas Patocka
@ 2021-03-03 2:53 ` JeffleXu
2021-03-03 10:09 ` Mikulas Patocka
0 siblings, 1 reply; 17+ messages in thread
From: JeffleXu @ 2021-03-03 2:53 UTC (permalink / raw)
To: Mikulas Patocka, Mike Snitzer, Heinz Mauelshagen, axboe, caspar,
io-uring, linux-block, joseph.qi, dm-devel, hch
On 3/3/21 3:05 AM, Mikulas Patocka wrote:
> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
> cookie.
>
> Signed-off-by: Mikulas Patocka <[email protected]>
>
> ---
> drivers/md/dm.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
> }
> }
>
> + if (ci.poll_cookie != BLK_QC_T_NONE) {
> + while (atomic_read(&ci.io->io_count) > 1 &&
> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> + }
> +
> /* drop the extra reference count */
> dec_pending(ci.io, errno_to_blk_status(error));
> }
It seems that the general idea of your design is to
1) submit *one* split bio
2) blk_poll(), waiting until the previously submitted split bio completes
and then submit the next split bio, repeating the above process. I'm afraid
performance may be an issue here, since the batch of completions that
blk_poll() reaps each time may decrease.
Besides, the submitting routine and the polling routine are bound together
here, i.e., polling is always synchronous.
--
Thanks,
Jeffle
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-03 2:53 ` [dm-devel] " JeffleXu
@ 2021-03-03 10:09 ` Mikulas Patocka
2021-03-04 2:57 ` JeffleXu
` (2 more replies)
0 siblings, 3 replies; 17+ messages in thread
From: Mikulas Patocka @ 2021-03-03 10:09 UTC (permalink / raw)
To: JeffleXu
Cc: Mike Snitzer, Heinz Mauelshagen, axboe, caspar, io-uring,
linux-block, joseph.qi, dm-devel, hch
On Wed, 3 Mar 2021, JeffleXu wrote:
>
>
> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>
> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
> > cookie.
> >
> > Signed-off-by: Mikulas Patocka <[email protected]>
> >
> > ---
> > drivers/md/dm.c | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > Index: linux-2.6/drivers/md/dm.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
> > }
> > }
> >
> > + if (ci.poll_cookie != BLK_QC_T_NONE) {
> > + while (atomic_read(&ci.io->io_count) > 1 &&
> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> > + }
> > +
> > /* drop the extra reference count */
> > dec_pending(ci.io, errno_to_blk_status(error));
> > }
>
> It seems that the general idea of your design is to
> 1) submit *one* split bio
> 2) blk_poll(), waiting the previously submitted split bio complets
No, I submit all the bios and poll for the last one.
> and then submit next split bio, repeating the above process. I'm afraid
> the performance may be an issue here, since the batch every time
> blk_poll() reaps may decrease.
Could you benchmark it?
> Besides, the submitting routine and polling routine is bound together
> here, i.e., polling is always synchronous.
__split_and_process_bio calls __split_and_process_non_flush in a loop, and
__split_and_process_non_flush records the poll cookie in ci.poll_cookie.
When we have processed all the bios, we poll for the last cookie here:
if (ci.poll_cookie != BLK_QC_T_NONE) {
while (atomic_read(&ci.io->io_count) > 1 &&
blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
}
Mikulas
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-03 10:09 ` Mikulas Patocka
@ 2021-03-04 2:57 ` JeffleXu
2021-03-04 10:09 ` Mikulas Patocka
2021-03-04 15:01 ` Jeff Moyer
2021-03-05 9:52 ` JeffleXu
2 siblings, 1 reply; 17+ messages in thread
From: JeffleXu @ 2021-03-04 2:57 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Mike Snitzer, Heinz Mauelshagen, axboe, caspar, io-uring,
linux-block, joseph.qi, dm-devel, hch
On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>
>
> On Wed, 3 Mar 2021, JeffleXu wrote:
>
>>
>>
>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>
>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>> cookie.
>>>
>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>
>>> ---
>>> drivers/md/dm.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> Index: linux-2.6/drivers/md/dm.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>> }
>>> }
>>>
>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>> + }
>>> +
>>> /* drop the extra reference count */
>>> dec_pending(ci.io, errno_to_blk_status(error));
>>> }
>>
>> It seems that the general idea of your design is to
>> 1) submit *one* split bio
>> 2) blk_poll(), waiting the previously submitted split bio complets
>
> No, I submit all the bios and poll for the last one.
>
>> and then submit next split bio, repeating the above process. I'm afraid
>> the performance may be an issue here, since the batch every time
>> blk_poll() reaps may decrease.
>
> Could you benchmark it?
>
I will once I've finished some other issues.
>> Besides, the submitting routine and polling routine is bound together
>> here, i.e., polling is always synchronous.
>
> __split_and_process_bio calls __split_and_process_non_flush in a loop
I also noticed that you sent this patch.
https://patchwork.kernel.org/project/dm-devel/patch/alpine.LRH.2.02.2103010457510.631@file01.intranet.prod.int.rdu2.redhat.com/
I agree with you that this while() loop is unnecessary, and thus there's
no loop calling __split_and_process_non_flush() in
__split_and_process_bio().
> __split_and_process_non_flush records the poll cookie in ci.poll_cookie.
> When we processed all the bios, we poll for the last cookie here:
>
> if (ci.poll_cookie != BLK_QC_T_NONE) {
> while (atomic_read(&ci.io->io_count) > 1 &&
> blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> }
So what will happen if one bio submitted to the dm device crosses the device
boundary between several target devices (e.g., dm-stripe)? Please refer to
the following call graph.
```
submit_bio
__submit_bio_noacct
disk->fops->submit_bio(), calling into __split_and_process_bio(),
call __split_and_process_non_flush() once, submitting the *first* split bio
disk->fops->submit_bio(), calling into __split_and_process_bio(),
call __split_and_process_non_flush() once, submitting the *second* split bio
...
```
So the loop is in __submit_bio_noacct(), rather than in
__split_and_process_bio(). Your design will send the first split bio,
then poll on it, then send the next split bio and poll on that one, and
so on...
--
Thanks,
Jeffle
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-04 2:57 ` JeffleXu
@ 2021-03-04 10:09 ` Mikulas Patocka
2021-03-05 18:21 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Mikulas Patocka @ 2021-03-04 10:09 UTC (permalink / raw)
To: JeffleXu
Cc: Mike Snitzer, Heinz Mauelshagen, axboe, caspar, io-uring,
linux-block, joseph.qi, dm-devel, hch
On Thu, 4 Mar 2021, JeffleXu wrote:
> > __split_and_process_non_flush records the poll cookie in ci.poll_cookie.
> > When we processed all the bios, we poll for the last cookie here:
> >
> > if (ci.poll_cookie != BLK_QC_T_NONE) {
> > while (atomic_read(&ci.io->io_count) > 1 &&
> > blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> > }
>
> So what will happen if one bio submitted to dm device crosses the device
> boundary among several target devices (e.g., dm-stripe)? Please refer
> the following call graph.
>
> ```
> submit_bio
> __submit_bio_noacct
> disk->fops->submit_bio(), calling into __split_and_process_bio(),
> call __split_and_process_non_flush() once, submitting the *first* split bio
> disk->fops->submit_bio(), calling into __split_and_process_bio(),
> call __split_and_process_non_flush() once, submitting the *second* split bio
> ...
> ```
>
>
> So the loop is in __submit_bio_noacct(), rather than
> __split_and_process_bio(). Your design will send the first split bio,
> and then poll on this split bio, then send the next split bio, polling
> on this, go on and on...
No. It will send all the bios and poll for the last one.
Mikulas
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-03 10:09 ` Mikulas Patocka
2021-03-04 2:57 ` JeffleXu
@ 2021-03-04 15:01 ` Jeff Moyer
2021-03-04 15:11 ` Mike Snitzer
2021-03-04 15:12 ` [dm-devel] " Mikulas Patocka
2021-03-05 9:52 ` JeffleXu
2 siblings, 2 replies; 17+ messages in thread
From: Jeff Moyer @ 2021-03-04 15:01 UTC (permalink / raw)
To: Mikulas Patocka
Cc: JeffleXu, Mike Snitzer, Heinz Mauelshagen, axboe, caspar,
io-uring, linux-block, joseph.qi, dm-devel, hch
Hi, Mikulas,
Mikulas Patocka <[email protected]> writes:
> On Wed, 3 Mar 2021, JeffleXu wrote:
>
>>
>>
>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>
>> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>> > cookie.
>> >
>> > Signed-off-by: Mikulas Patocka <[email protected]>
>> >
>> > ---
>> > drivers/md/dm.c | 5 +++++
>> > 1 file changed, 5 insertions(+)
>> >
>> > Index: linux-2.6/drivers/md/dm.c
>> > ===================================================================
>> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>> > }
>> > }
>> >
>> > + if (ci.poll_cookie != BLK_QC_T_NONE) {
>> > + while (atomic_read(&ci.io->io_count) > 1 &&
>> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>> > + }
>> > +
>> > /* drop the extra reference count */
>> > dec_pending(ci.io, errno_to_blk_status(error));
>> > }
>>
>> It seems that the general idea of your design is to
>> 1) submit *one* split bio
>> 2) blk_poll(), waiting the previously submitted split bio complets
>
> No, I submit all the bios and poll for the last one.
What happens if the last bio completes first? It looks like you will
call blk_poll with a cookie that already completed, and I'm pretty sure
that's invalid.
Thanks,
Jeff
* Re: [PATCH 4/4] dm: support I/O polling
2021-03-04 15:01 ` Jeff Moyer
@ 2021-03-04 15:11 ` Mike Snitzer
2021-03-04 15:12 ` [dm-devel] " Mikulas Patocka
1 sibling, 0 replies; 17+ messages in thread
From: Mike Snitzer @ 2021-03-04 15:11 UTC (permalink / raw)
To: Jeff Moyer
Cc: Mikulas Patocka, JeffleXu, Heinz Mauelshagen, axboe, caspar,
io-uring, linux-block, joseph.qi, dm-devel, hch, Jonathan Brassow
On Thu, Mar 04 2021 at 10:01am -0500,
Jeff Moyer <[email protected]> wrote:
> Hi, Mikulas,
>
> Mikulas Patocka <[email protected]> writes:
>
> > On Wed, 3 Mar 2021, JeffleXu wrote:
> >
> >>
> >>
> >> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
> >>
> >> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
> >> > cookie.
> >> >
> >> > Signed-off-by: Mikulas Patocka <[email protected]>
> >> >
> >> > ---
> >> > drivers/md/dm.c | 5 +++++
> >> > 1 file changed, 5 insertions(+)
> >> >
> >> > Index: linux-2.6/drivers/md/dm.c
> >> > ===================================================================
> >> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> >> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> >> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
> >> > }
> >> > }
> >> >
> >> > + if (ci.poll_cookie != BLK_QC_T_NONE) {
> >> > + while (atomic_read(&ci.io->io_count) > 1 &&
> >> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> >> > + }
> >> > +
> >> > /* drop the extra reference count */
> >> > dec_pending(ci.io, errno_to_blk_status(error));
> >> > }
> >>
> >> It seems that the general idea of your design is to
> >> 1) submit *one* split bio
> >> 2) blk_poll(), waiting the previously submitted split bio complets
> >
> > No, I submit all the bios and poll for the last one.
>
> What happens if the last bio completes first? It looks like you will
> call blk_poll with a cookie that already completed, and I'm pretty sure
> that's invalid.
In addition, I'm concerned that this approach of having DM internalize IO
polling is a non-starter.
I just don't think this approach adheres to the io_uring + IO polling
interface... it never returns a cookie to the upper layers, so there is
really no opportunity for the standard io_uring + IO polling interface to
work, is there?
But Heinz and Mikulas are about to kick off some fio io_uring + hipri=1
(io polling) testing of Jeffle's latest v5 patchset:
https://patchwork.kernel.org/project/dm-devel/list/?series=442075
compared to Mikulas' patchset:
https://patchwork.kernel.org/project/dm-devel/list/?series=440719
We should have definitive answers soon enough, just using Jeffle's fio
config (with hipri=1 for IO polling) that was documented here:
https://listman.redhat.com/archives/dm-devel/2020-October/msg00129.html
Mike
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-04 15:01 ` Jeff Moyer
2021-03-04 15:11 ` Mike Snitzer
@ 2021-03-04 15:12 ` Mikulas Patocka
1 sibling, 0 replies; 17+ messages in thread
From: Mikulas Patocka @ 2021-03-04 15:12 UTC (permalink / raw)
To: Jeff Moyer
Cc: JeffleXu, Mike Snitzer, Heinz Mauelshagen, axboe, caspar,
io-uring, linux-block, joseph.qi, dm-devel, hch
On Thu, 4 Mar 2021, Jeff Moyer wrote:
> Hi, Mikulas,
>
> Mikulas Patocka <[email protected]> writes:
>
> > On Wed, 3 Mar 2021, JeffleXu wrote:
> >
> >>
> >>
> >> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
> >>
> >> > Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
> >> > cookie.
> >> >
> >> > Signed-off-by: Mikulas Patocka <[email protected]>
> >> >
> >> > ---
> >> > drivers/md/dm.c | 5 +++++
> >> > 1 file changed, 5 insertions(+)
> >> >
> >> > Index: linux-2.6/drivers/md/dm.c
> >> > ===================================================================
> >> > --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> >> > +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> >> > @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
> >> > }
> >> > }
> >> >
> >> > + if (ci.poll_cookie != BLK_QC_T_NONE) {
> >> > + while (atomic_read(&ci.io->io_count) > 1 &&
> >> > + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> >> > + }
> >> > +
> >> > /* drop the extra reference count */
> >> > dec_pending(ci.io, errno_to_blk_status(error));
> >> > }
> >>
> >> It seems that the general idea of your design is to
> >> 1) submit *one* split bio
> >> 2) blk_poll(), waiting the previously submitted split bio complets
> >
> > No, I submit all the bios and poll for the last one.
>
> What happens if the last bio completes first? It looks like you will
> call blk_poll with a cookie that already completed, and I'm pretty sure
> that's invalid.
>
> Thanks,
> Jeff
If the last bio completes first, the other bios will use interrupt-driven
endio.
Calling blk_poll with an already completed cookie is IMHO legal - it happens
with the current direct-io code too. The direct-io code will check for bio
completion and, if the bio is not completed, call blk_poll. The bio may be
completed from an interrupt after the check and before blk_poll - in this
case, we will call blk_poll with an already completed cookie.
Mikulas
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-03 10:09 ` Mikulas Patocka
2021-03-04 2:57 ` JeffleXu
2021-03-04 15:01 ` Jeff Moyer
@ 2021-03-05 9:52 ` JeffleXu
2021-03-05 17:46 ` Heinz Mauelshagen
2 siblings, 1 reply; 17+ messages in thread
From: JeffleXu @ 2021-03-05 9:52 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Mike Snitzer, Heinz Mauelshagen, axboe, caspar, io-uring,
linux-block, joseph.qi, dm-devel, hch
On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>
>
> On Wed, 3 Mar 2021, JeffleXu wrote:
>
>>
>>
>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>
>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>> cookie.
>>>
>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>
>>> ---
>>> drivers/md/dm.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> Index: linux-2.6/drivers/md/dm.c
>>> ===================================================================
>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>> }
>>> }
>>>
>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>> + }
>>> +
>>> /* drop the extra reference count */
>>> dec_pending(ci.io, errno_to_blk_status(error));
>>> }
>>
>> It seems that the general idea of your design is to
>> 1) submit *one* split bio
>> 2) blk_poll(), waiting the previously submitted split bio complets
>
> No, I submit all the bios and poll for the last one.
>
>> and then submit next split bio, repeating the above process. I'm afraid
>> the performance may be an issue here, since the batch every time
>> blk_poll() reaps may decrease.
>
> Could you benchmark it?
I only tested dm-linear.
The configuration (dm table) of dm-linear is:
0 1048576 linear /dev/nvme0n1 0
1048576 1048576 linear /dev/nvme2n1 0
2097152 1048576 linear /dev/nvme5n1 0
fio script used is:
```
$cat fio.conf
[global]
name=iouring-sqpoll-iopoll-1
ioengine=io_uring
iodepth=128
numjobs=1
thread
rw=randread
direct=1
registerfiles=1
hipri=1
runtime=10
time_based
group_reporting
randrepeat=0
filename=/dev/mapper/testdev
bs=4k
[job-1]
cpus_allowed=14
```
IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
--------------- | --------------------
213k | 19k
At least, it doesn't work well with the io_uring interface.
--
Thanks,
Jeffle
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-05 9:52 ` JeffleXu
@ 2021-03-05 17:46 ` Heinz Mauelshagen
2021-03-05 17:56 ` Heinz Mauelshagen
0 siblings, 1 reply; 17+ messages in thread
From: Heinz Mauelshagen @ 2021-03-05 17:46 UTC (permalink / raw)
To: JeffleXu, Mikulas Patocka
Cc: axboe, Mike Snitzer, caspar, hch, linux-block, joseph.qi,
dm-devel, io-uring
On 3/5/21 10:52 AM, JeffleXu wrote:
>
> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>
>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>
>>>
>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>
>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>> cookie.
>>>>
>>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>>
>>>> ---
>>>> drivers/md/dm.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>> Index: linux-2.6/drivers/md/dm.c
>>>> ===================================================================
>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>> }
>>>> }
>>>>
>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>> + }
>>>> +
>>>> /* drop the extra reference count */
>>>> dec_pending(ci.io, errno_to_blk_status(error));
>>>> }
>>> It seems that the general idea of your design is to
>>> 1) submit *one* split bio
>>> 2) blk_poll(), waiting the previously submitted split bio complets
>> No, I submit all the bios and poll for the last one.
>>
>>> and then submit next split bio, repeating the above process. I'm afraid
>>> the performance may be an issue here, since the batch every time
>>> blk_poll() reaps may decrease.
>> Could you benchmark it?
> I only tested dm-linear.
>
> The configuration (dm table) of dm-linear is:
> 0 1048576 linear /dev/nvme0n1 0
> 1048576 1048576 linear /dev/nvme2n1 0
> 2097152 1048576 linear /dev/nvme5n1 0
>
>
> fio script used is:
> ```
> $cat fio.conf
> [global]
> name=iouring-sqpoll-iopoll-1
> ioengine=io_uring
> iodepth=128
> numjobs=1
> thread
> rw=randread
> direct=1
> registerfiles=1
> hipri=1
> runtime=10
> time_based
> group_reporting
> randrepeat=0
> filename=/dev/mapper/testdev
> bs=4k
>
> [job-1]
> cpus_allowed=14
> ```
>
> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
> --------------- | --------------------
> 213k | 19k
>
> At least, it doesn't work well with io_uring interface.
>
>
Jeffle,
I ran your above fio test on a linear LV split across 3 NVMes to mirror your split mapping
(system: 32 core Intel, 256GiB RAM), comparing io engines sync, libaio and io_uring,
the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles and hipri), which came out ok:
sync  | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
------|--------|--------------------|-----------------
56.3K | 290K   | 329K               | 351K
I can't second your drastic hipri=1 drop here...
Heinz
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-05 17:46 ` Heinz Mauelshagen
@ 2021-03-05 17:56 ` Heinz Mauelshagen
2021-03-05 18:09 ` Mike Snitzer
2021-03-08 3:54 ` JeffleXu
0 siblings, 2 replies; 17+ messages in thread
From: Heinz Mauelshagen @ 2021-03-05 17:56 UTC (permalink / raw)
To: JeffleXu, Mikulas Patocka
Cc: axboe, Mike Snitzer, caspar, hch, linux-block, joseph.qi,
dm-devel, io-uring
On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
> On 3/5/21 10:52 AM, JeffleXu wrote:
>>
>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>
>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>
>>>>
>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>
>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>> cookie.
>>>>>
>>>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>>>
>>>>> ---
>>>>> drivers/md/dm.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>> ===================================================================
>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02
>>>>> 19:26:34.000000000 +0100
>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>> }
>>>>> }
>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>> + }
>>>>> +
>>>>> /* drop the extra reference count */
>>>>> dec_pending(ci.io, errno_to_blk_status(error));
>>>>> }
>>>> It seems that the general idea of your design is to
>>>> 1) submit *one* split bio
>>>> 2) blk_poll(), waiting the previously submitted split bio complets
>>> No, I submit all the bios and poll for the last one.
>>>
>>>> and then submit next split bio, repeating the above process. I'm
>>>> afraid
>>>> the performance may be an issue here, since the batch every time
>>>> blk_poll() reaps may decrease.
>>> Could you benchmark it?
>> I only tested dm-linear.
>>
>> The configuration (dm table) of dm-linear is:
>> 0 1048576 linear /dev/nvme0n1 0
>> 1048576 1048576 linear /dev/nvme2n1 0
>> 2097152 1048576 linear /dev/nvme5n1 0
>>
>>
>> fio script used is:
>> ```
>> $cat fio.conf
>> [global]
>> name=iouring-sqpoll-iopoll-1
>> ioengine=io_uring
>> iodepth=128
>> numjobs=1
>> thread
>> rw=randread
>> direct=1
>> registerfiles=1
>> hipri=1
>> runtime=10
>> time_based
>> group_reporting
>> randrepeat=0
>> filename=/dev/mapper/testdev
>> bs=4k
>>
>> [job-1]
>> cpus_allowed=14
>> ```
>>
>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>> --------------- | --------------------
>> 213k | 19k
>>
>> At least, it doesn't work well with io_uring interface.
>>
>>
>
>
> Jeffle,
>
> I ran your above fio test on a linear LV split across 3 NVMes to
> second your split mapping
> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio
> and io_uring,
> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles
> and hipri) which resulted ok:
>
>
>
> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
> ------|----------|---------------------|----------------- 56.3K |
> 290K | 329K | 351K I can't second your
> drastic hipri=1 drop here...
Sorry, email mess.
sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
-------|----------|---------------------|-----------------
56.3K | 290K | 329K | 351K
I can't second your drastic hipri=1 drop here...
> Heinz
* Re: [PATCH 4/4] dm: support I/O polling
2021-03-05 17:56 ` Heinz Mauelshagen
@ 2021-03-05 18:09 ` Mike Snitzer
2021-03-05 18:19 ` [dm-devel] " Heinz Mauelshagen
2021-03-08 3:54 ` JeffleXu
1 sibling, 1 reply; 17+ messages in thread
From: Mike Snitzer @ 2021-03-05 18:09 UTC (permalink / raw)
To: Heinz Mauelshagen
Cc: JeffleXu, Mikulas Patocka, axboe, caspar, hch, linux-block,
joseph.qi, dm-devel, io-uring
On Fri, Mar 05 2021 at 12:56pm -0500,
Heinz Mauelshagen <[email protected]> wrote:
>
> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
> >On 3/5/21 10:52 AM, JeffleXu wrote:
> >>
> >>On 3/3/21 6:09 PM, Mikulas Patocka wrote:
> >>>
> >>>On Wed, 3 Mar 2021, JeffleXu wrote:
> >>>
> >>>>
> >>>>On 3/3/21 3:05 AM, Mikulas Patocka wrote:
> >>>>
> >>>>>Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
> >>>>>cookie.
> >>>>>
> >>>>>Signed-off-by: Mikulas Patocka <[email protected]>
> >>>>>
> >>>>>---
> >>>>> drivers/md/dm.c | 5 +++++
> >>>>> 1 file changed, 5 insertions(+)
> >>>>>
> >>>>>Index: linux-2.6/drivers/md/dm.c
> >>>>>===================================================================
> >>>>>--- linux-2.6.orig/drivers/md/dm.c 2021-03-02
> >>>>>19:26:34.000000000 +0100
> >>>>>+++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
> >>>>>@@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
> >>>>> }
> >>>>> }
> >>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
> >>>>>+ while (atomic_read(&ci.io->io_count) > 1 &&
> >>>>>+ blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
> >>>>>+ }
> >>>>>+
> >>>>> /* drop the extra reference count */
> >>>>> dec_pending(ci.io, errno_to_blk_status(error));
> >>>>> }
> >>>>It seems that the general idea of your design is to
> >>>>1) submit *one* split bio
> >>>>2) blk_poll(), waiting the previously submitted split bio complets
> >>>No, I submit all the bios and poll for the last one.
> >>>
> >>>>and then submit next split bio, repeating the above process.
> >>>>I'm afraid
> >>>>the performance may be an issue here, since the batch every time
> >>>>blk_poll() reaps may decrease.
> >>>Could you benchmark it?
> >>I only tested dm-linear.
> >>
> >>The configuration (dm table) of dm-linear is:
> >>0 1048576 linear /dev/nvme0n1 0
> >>1048576 1048576 linear /dev/nvme2n1 0
> >>2097152 1048576 linear /dev/nvme5n1 0
> >>
> >>
> >>fio script used is:
> >>```
> >>$cat fio.conf
> >>[global]
> >>name=iouring-sqpoll-iopoll-1
> >>ioengine=io_uring
> >>iodepth=128
> >>numjobs=1
> >>thread
> >>rw=randread
> >>direct=1
> >>registerfiles=1
> >>hipri=1
> >>runtime=10
> >>time_based
> >>group_reporting
> >>randrepeat=0
> >>filename=/dev/mapper/testdev
> >>bs=4k
> >>
> >>[job-1]
> >>cpus_allowed=14
> >>```
> >>
> >>IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
> >>--------------- | --------------------
> >> 213k | 19k
> >>
> >>At least, it doesn't work well with io_uring interface.
> >>
> >>
> >
> >
> >Jeffle,
> >
> >I ran your above fio test on a linear LV split across 3 NVMes to
> >second your split mapping
> >(system: 32 core Intel, 256GiB RAM) comparing io engines sync,
> >libaio and io_uring,
> >the latter w/ and w/o hipri (sync+libaio obviously w/o
> >registerfiles and hipri) which resulted ok:
> >
> >
> >
> >sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
> >------|----------|---------------------|----------------- 56.3K
> >| 290K | 329K | 351K I can't second
> >your drastic hipri=1 drop here...
>
>
> Sorry, email mess.
>
>
> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
> -------|----------|---------------------|-----------------
> 56.3K | 290K | 329K | 351K
>
>
>
> I can't second your drastic hipri=1 drop here...
I think your result is just showcasing your powerful system's ability to
poll every related HW queue, whereas Jeffle's system is likely somehow
more constrained (on a cpu level, memory, whatever).
My basis for this is that Mikulas' changes simply always return an
invalid cookie (BLK_QC_T_NONE) for purposes of intelligent IO polling.
Such an implementation is completely invalid.
I discussed with Jens and he said:
"it needs to return something that f_op->iopoll() can make sense of.
otherwise you have no option but to try everything."
Mike
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-05 18:09 ` Mike Snitzer
@ 2021-03-05 18:19 ` Heinz Mauelshagen
0 siblings, 0 replies; 17+ messages in thread
From: Heinz Mauelshagen @ 2021-03-05 18:19 UTC (permalink / raw)
To: Mike Snitzer
Cc: axboe, caspar, io-uring, linux-block, joseph.qi, dm-devel,
Mikulas Patocka, JeffleXu, hch
On 3/5/21 7:09 PM, Mike Snitzer wrote:
> On Fri, Mar 05 2021 at 12:56pm -0500,
> Heinz Mauelshagen <[email protected]> wrote:
>
>> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>>
>>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>>
>>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>>> cookie.
>>>>>>>
>>>>>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>>>>>
>>>>>>> ---
>>>>>>> drivers/md/dm.c | 5 +++++
>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>>> ===================================================================
>>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02
>>>>>>> 19:26:34.000000000 +0100
>>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>> }
>>>>>>> }
>>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>>> + }
>>>>>>> +
>>>>>>> /* drop the extra reference count */
>>>>>>> dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>> }
>>>>>> It seems that the general idea of your design is to
>>>>>> 1) submit *one* split bio
>>>>>> 2) blk_poll(), waiting until the previously submitted split bio completes
>>>>> No, I submit all the bios and poll for the last one.
>>>>>
>>>>>> and then submit the next split bio, repeating the above process.
>>>>>> I'm afraid performance may be an issue here, since the batch
>>>>>> blk_poll() reaps each time may shrink.
>>>>> Could you benchmark it?
>>>> I only tested dm-linear.
>>>>
>>>> The configuration (dm table) of dm-linear is:
>>>> 0 1048576 linear /dev/nvme0n1 0
>>>> 1048576 1048576 linear /dev/nvme2n1 0
>>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>>
>>>>
>>>> fio script used is:
>>>> ```
>>>> $cat fio.conf
>>>> [global]
>>>> name=iouring-sqpoll-iopoll-1
>>>> ioengine=io_uring
>>>> iodepth=128
>>>> numjobs=1
>>>> thread
>>>> rw=randread
>>>> direct=1
>>>> registerfiles=1
>>>> hipri=1
>>>> runtime=10
>>>> time_based
>>>> group_reporting
>>>> randrepeat=0
>>>> filename=/dev/mapper/testdev
>>>> bs=4k
>>>>
>>>> [job-1]
>>>> cpus_allowed=14
>>>> ```
>>>>
>>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>>> --------------- | --------------------
>>>> 213k | 19k
>>>>
>>>> At least, it doesn't work well with the io_uring interface.
>>>>
>>>>
>>>
>>> Jeffle,
>>>
>>> I ran your above fio test on a linear LV split across 3 NVMes to
>>> second your split mapping
>>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync,
>>> libaio and io_uring,
>>> the latter w/ and w/o hipri (sync+libaio obviously w/o
>>> registerfiles and hipri) which resulted ok:
>>>
>>>
>>>
>>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>>> ------|----------|---------------------|----------------- 56.3K
>>> | 290K | 329K | 351K I can't second
>>> your drastic hipri=1 drop here...
>>
>> Sorry, email mess.
>>
>>
>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> -------|----------|---------------------|-----------------
>> 56.3K | 290K | 329K | 351K
>>
>>
>>
>> I can't second your drastic hipri=1 drop here...
> I think your result is just showcasing your powerful system's ability to
> poll every related HW queue, whereas Jeffle's system is likely more
> constrained somehow (at the CPU level, memory, whatever).
Jeffle,
mind explaining your test system configuration a bit more (cores, ram,
i/o config to your NVMes)
so that we have a better base for such assumption?
Thanks,
Heinz
>
> My basis for this is that Mikulas' changes simply always return an
> invalid cookie (BLK_QC_T_NONE) for purposes of intelligent IO polling.
>
> Such an implementation is completely invalid.
>
> I discussed with Jens and he said:
> "it needs to return something that f_op->iopoll() can make sense of.
> otherwise you have no option but to try everything."
>
> Mike
>
> --
> dm-devel mailing list
> [email protected]
> https://listman.redhat.com/mailman/listinfo/dm-devel
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-04 10:09 ` Mikulas Patocka
@ 2021-03-05 18:21 ` Jens Axboe
0 siblings, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2021-03-05 18:21 UTC (permalink / raw)
To: Mikulas Patocka, JeffleXu
Cc: Mike Snitzer, Heinz Mauelshagen, caspar, io-uring, linux-block,
joseph.qi, dm-devel, hch
On 3/4/21 3:09 AM, Mikulas Patocka wrote:
>
>
> On Thu, 4 Mar 2021, JeffleXu wrote:
>
>>> __split_and_process_non_flush records the poll cookie in ci.poll_cookie.
>>> When we processed all the bios, we poll for the last cookie here:
>>>
>>> if (ci.poll_cookie != BLK_QC_T_NONE) {
>>> while (atomic_read(&ci.io->io_count) > 1 &&
>>> blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>> }
>>
>> So what will happen if one bio submitted to dm device crosses the device
>> boundary among several target devices (e.g., dm-stripe)? Please refer to
>> the following call graph.
>>
>> ```
>> submit_bio
>> __submit_bio_noacct
>> disk->fops->submit_bio(), calling into __split_and_process_bio(),
>> call __split_and_process_non_flush() once, submitting the *first* split bio
>> disk->fops->submit_bio(), calling into __split_and_process_bio(),
>> call __split_and_process_non_flush() once, submitting the *second* split bio
>> ...
>> ```
>>
>>
>> So the loop is in __submit_bio_noacct(), rather than
>> __split_and_process_bio(). Your design will send the first split bio,
>> and then poll on that split bio, then send the next split bio, poll on
>> that one, and so on...
>
> No. It will send all the bios and poll for the last one.
I took a quick look, and this seems very broken. You must not poll off
the submission path, polling should be invoked by the higher layer when
someone wants to reap events. IOW, dm should not be calling blk_poll()
by itself, only off mq_ops->poll(). Your patch seems to do it off
submission once you submit the last bio in that batch, effectively
implementing sync polling for that series. That's not right.
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-05 17:56 ` Heinz Mauelshagen
2021-03-05 18:09 ` Mike Snitzer
@ 2021-03-08 3:54 ` JeffleXu
2021-03-08 3:55 ` Jens Axboe
2021-03-09 11:42 ` Heinz Mauelshagen
1 sibling, 2 replies; 17+ messages in thread
From: JeffleXu @ 2021-03-08 3:54 UTC (permalink / raw)
To: Heinz Mauelshagen, Mikulas Patocka
Cc: axboe, Mike Snitzer, caspar, io-uring, linux-block, joseph.qi,
dm-devel, hch
On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>
> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>
>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>
>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>
>>>>>
>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>
>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>> cookie.
>>>>>>
>>>>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>>>>
>>>>>> ---
>>>>>> drivers/md/dm.c | 5 +++++
>>>>>> 1 file changed, 5 insertions(+)
>>>>>>
>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>> ===================================================================
>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02
>>>>>> 19:26:34.000000000 +0100
>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>> }
>>>>>> }
>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>> + }
>>>>>> +
>>>>>> /* drop the extra reference count */
>>>>>> dec_pending(ci.io, errno_to_blk_status(error));
>>>>>> }
>>>>> It seems that the general idea of your design is to
>>>>> 1) submit *one* split bio
>>>>> 2) blk_poll(), waiting until the previously submitted split bio completes
>>>> No, I submit all the bios and poll for the last one.
>>>>
>>>>> and then submit the next split bio, repeating the above process.
>>>>> I'm afraid performance may be an issue here, since the batch
>>>>> blk_poll() reaps each time may shrink.
>>>> Could you benchmark it?
>>> I only tested dm-linear.
>>>
>>> The configuration (dm table) of dm-linear is:
>>> 0 1048576 linear /dev/nvme0n1 0
>>> 1048576 1048576 linear /dev/nvme2n1 0
>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>
>>>
>>> fio script used is:
>>> ```
>>> $cat fio.conf
>>> [global]
>>> name=iouring-sqpoll-iopoll-1
>>> ioengine=io_uring
>>> iodepth=128
>>> numjobs=1
>>> thread
>>> rw=randread
>>> direct=1
>>> registerfiles=1
>>> hipri=1
>>> runtime=10
>>> time_based
>>> group_reporting
>>> randrepeat=0
>>> filename=/dev/mapper/testdev
>>> bs=4k
>>>
>>> [job-1]
>>> cpus_allowed=14
>>> ```
>>>
>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>> --------------- | --------------------
>>> 213k | 19k
>>>
>>> At least, it doesn't work well with the io_uring interface.
>>>
>>>
>>
>>
>> Jeffle,
>>
>> I ran your above fio test on a linear LV split across 3 NVMes to
>> second your split mapping
>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio
>> and io_uring,
>> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles
>> and hipri) which resulted ok:
>>
>>
>>
>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> ------|----------|---------------------|----------------- 56.3K |
>> 290K | 329K | 351K I can't second your
>> drastic hipri=1 drop here...
>
>
> Sorry, email mess.
>
>
> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
> -------|----------|---------------------|-----------------
> 56.3K | 290K | 329K | 351K
>
>
>
> I can't second your drastic hipri=1 drop here...
>
Hummm, that's indeed somewhat strange...
My test environment:
- CPU: 128 cores, though only one CPU core is used since
'cpus_allowed=14' in fio configuration
- memory: 983G memory free
- NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'
Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case, IO still
completes in IRQ mode, even if you have specified 'hipri=1'.
--
Thanks,
Jeffle
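For readers trying to reproduce this: the NVMe driver only allocates
dedicated poll queues when asked, and without them blk_poll() falls back to
IRQ completions regardless of hipri=1. A configuration sketch (the queue
count of 3 is just an example, not a recommendation):

```
# Kernel command line (nvme built in):
nvme.poll_queues=3

# or /etc/modprobe.d/nvme.conf (nvme built as a module):
options nvme poll_queues=3
```

Whether polling is actually available can then be checked via the block
queue's io_poll sysfs attribute (e.g. /sys/block/nvme0n1/queue/io_poll) and
the driver's "default/read/poll queues" line in dmesg.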
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-08 3:54 ` JeffleXu
@ 2021-03-08 3:55 ` Jens Axboe
2021-03-09 11:42 ` Heinz Mauelshagen
1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2021-03-08 3:55 UTC (permalink / raw)
To: JeffleXu, Heinz Mauelshagen, Mikulas Patocka
Cc: Mike Snitzer, caspar, io-uring, linux-block, joseph.qi, dm-devel,
hch
On 3/7/21 8:54 PM, JeffleXu wrote:
>
>
> On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>>
>> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>>
>>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>>
>>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>>
>>>>>>
>>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>>
>>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>>> cookie.
>>>>>>>
>>>>>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>>>>>
>>>>>>> ---
>>>>>>> drivers/md/dm.c | 5 +++++
>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>>> ===================================================================
>>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02
>>>>>>> 19:26:34.000000000 +0100
>>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>> }
>>>>>>> }
>>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>>> + }
>>>>>>> +
>>>>>>> /* drop the extra reference count */
>>>>>>> dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>> }
>>>>>> It seems that the general idea of your design is to
>>>>>> 1) submit *one* split bio
>>>>>> 2) blk_poll(), waiting the previously submitted split bio complets
>>>>> No, I submit all the bios and poll for the last one.
>>>>>
>>>>>> and then submit next split bio, repeating the above process. I'm
>>>>>> afraid
>>>>>> the performance may be an issue here, since the batch every time
>>>>>> blk_poll() reaps may decrease.
>>>>> Could you benchmark it?
>>>> I only tested dm-linear.
>>>>
>>>> The configuration (dm table) of dm-linear is:
>>>> 0 1048576 linear /dev/nvme0n1 0
>>>> 1048576 1048576 linear /dev/nvme2n1 0
>>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>>
>>>>
>>>> fio script used is:
>>>> ```
>>>> $cat fio.conf
>>>> [global]
>>>> name=iouring-sqpoll-iopoll-1
>>>> ioengine=io_uring
>>>> iodepth=128
>>>> numjobs=1
>>>> thread
>>>> rw=randread
>>>> direct=1
>>>> registerfiles=1
>>>> hipri=1
>>>> runtime=10
>>>> time_based
>>>> group_reporting
>>>> randrepeat=0
>>>> filename=/dev/mapper/testdev
>>>> bs=4k
>>>>
>>>> [job-1]
>>>> cpus_allowed=14
>>>> ```
>>>>
>>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>>> --------------- | --------------------
>>>> 213k | 19k
>>>>
>>>> At least, it doesn't work well with the io_uring interface.
>>>>
>>>>
>>>
>>>
>>> Jeffle,
>>>
>>> I ran your above fio test on a linear LV split across 3 NVMes to
>>> second your split mapping
>>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio
>>> and io_uring,
>>> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles
>>> and hipri) which resulted ok:
>>>
>>>
>>>
>>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>>> ------|----------|---------------------|----------------- 56.3K |
>>> 290K | 329K | 351K I can't second your
>>> drastic hipri=1 drop here...
>>
>>
>> Sorry, email mess.
>>
>>
>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> -------|----------|---------------------|-----------------
>> 56.3K | 290K | 329K | 351K
>>
>>
>>
>> I can't second your drastic hipri=1 drop here...
>>
>
> Hummm, that's indeed somewhat strange...
>
> My test environment:
> - CPU: 128 cores, though only one CPU core is used since
> 'cpus_allowed=14' in fio configuration
> - memory: 983G memory free
> - NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'
>
> Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case, IO still
> completes in IRQ mode, even if you have specified 'hipri=1'.
That would be my guess too, and the patches also have a very suspicious
clear of HIPRI which shouldn't be there (which would let that fly through).
--
Jens Axboe
* Re: [dm-devel] [PATCH 4/4] dm: support I/O polling
2021-03-08 3:54 ` JeffleXu
2021-03-08 3:55 ` Jens Axboe
@ 2021-03-09 11:42 ` Heinz Mauelshagen
1 sibling, 0 replies; 17+ messages in thread
From: Heinz Mauelshagen @ 2021-03-09 11:42 UTC (permalink / raw)
To: JeffleXu, Mikulas Patocka
Cc: axboe, Mike Snitzer, caspar, hch, linux-block, joseph.qi,
dm-devel, io-uring
On 3/8/21 4:54 AM, JeffleXu wrote:
>
> On 3/6/21 1:56 AM, Heinz Mauelshagen wrote:
>> On 3/5/21 6:46 PM, Heinz Mauelshagen wrote:
>>> On 3/5/21 10:52 AM, JeffleXu wrote:
>>>> On 3/3/21 6:09 PM, Mikulas Patocka wrote:
>>>>> On Wed, 3 Mar 2021, JeffleXu wrote:
>>>>>
>>>>>> On 3/3/21 3:05 AM, Mikulas Patocka wrote:
>>>>>>
>>>>>>> Support I/O polling if submit_bio_noacct_mq_direct returned non-empty
>>>>>>> cookie.
>>>>>>>
>>>>>>> Signed-off-by: Mikulas Patocka <[email protected]>
>>>>>>>
>>>>>>> ---
>>>>>>> drivers/md/dm.c | 5 +++++
>>>>>>> 1 file changed, 5 insertions(+)
>>>>>>>
>>>>>>> Index: linux-2.6/drivers/md/dm.c
>>>>>>> ===================================================================
>>>>>>> --- linux-2.6.orig/drivers/md/dm.c 2021-03-02
>>>>>>> 19:26:34.000000000 +0100
>>>>>>> +++ linux-2.6/drivers/md/dm.c 2021-03-02 19:26:34.000000000 +0100
>>>>>>> @@ -1682,6 +1682,11 @@ static void __split_and_process_bio(stru
>>>>>>> }
>>>>>>> }
>>>>>>> + if (ci.poll_cookie != BLK_QC_T_NONE) {
>>>>>>> + while (atomic_read(&ci.io->io_count) > 1 &&
>>>>>>> + blk_poll(ci.poll_queue, ci.poll_cookie, true)) ;
>>>>>>> + }
>>>>>>> +
>>>>>>> /* drop the extra reference count */
>>>>>>> dec_pending(ci.io, errno_to_blk_status(error));
>>>>>>> }
>>>>>> It seems that the general idea of your design is to
>>>>>> 1) submit *one* split bio
>>>>>> 2) blk_poll(), waiting until the previously submitted split bio completes
>>>>> No, I submit all the bios and poll for the last one.
>>>>>
>>>>>> and then submit the next split bio, repeating the above process.
>>>>>> I'm afraid performance may be an issue here, since the batch
>>>>>> blk_poll() reaps each time may shrink.
>>>>> Could you benchmark it?
>>>> I only tested dm-linear.
>>>>
>>>> The configuration (dm table) of dm-linear is:
>>>> 0 1048576 linear /dev/nvme0n1 0
>>>> 1048576 1048576 linear /dev/nvme2n1 0
>>>> 2097152 1048576 linear /dev/nvme5n1 0
>>>>
>>>>
>>>> fio script used is:
>>>> ```
>>>> $cat fio.conf
>>>> [global]
>>>> name=iouring-sqpoll-iopoll-1
>>>> ioengine=io_uring
>>>> iodepth=128
>>>> numjobs=1
>>>> thread
>>>> rw=randread
>>>> direct=1
>>>> registerfiles=1
>>>> hipri=1
>>>> runtime=10
>>>> time_based
>>>> group_reporting
>>>> randrepeat=0
>>>> filename=/dev/mapper/testdev
>>>> bs=4k
>>>>
>>>> [job-1]
>>>> cpus_allowed=14
>>>> ```
>>>>
>>>> IOPS (IRQ mode) | IOPS (iopoll mode (hipri=1))
>>>> --------------- | --------------------
>>>> 213k | 19k
>>>>
>>>> At least, it doesn't work well with the io_uring interface.
>>>>
>>>>
>>>
>>> Jeffle,
>>>
>>> I ran your above fio test on a linear LV split across 3 NVMes to
>>> second your split mapping
>>> (system: 32 core Intel, 256GiB RAM) comparing io engines sync, libaio
>>> and io_uring,
>>> the latter w/ and w/o hipri (sync+libaio obviously w/o registerfiles
>>> and hipri) which resulted ok:
>>>
>>>
>>>
>>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>>> ------|----------|---------------------|----------------- 56.3K |
>>> 290K | 329K | 351K I can't second your
>>> drastic hipri=1 drop here...
>>
>> Sorry, email mess.
>>
>>
>> sync | libaio | IRQ mode (hipri=0) | iopoll (hipri=1)
>> -------|----------|---------------------|-----------------
>> 56.3K | 290K | 329K | 351K
>>
>>
>>
>> I can't second your drastic hipri=1 drop here...
>>
> Hummm, that's indeed somewhat strange...
>
> My test environment:
> - CPU: 128 cores, though only one CPU core is used since
> 'cpus_allowed=14' in fio configuration
> - memory: 983G memory free
> - NVMe: Huawei ES3510P (HWE52P434T0L005N), with 'nvme.poll_queues=3'
>
> Maybe you didn't specify 'nvme.poll_queues=XXX'? In that case, IO still
> completes in IRQ mode, even if you have specified 'hipri=1'.
>
Jeffle,
nvme.poll_queues was zero indeed.
With nvme.poll_queues=3, results are hipri=0 / hipri=1: 1699K / 1702K IOPS (all cores)
Single core results : 315K / 329K
Still no extreme drop...
FWIW:
Thanks,
Heinz
end of thread, other threads:[~2021-03-09 11:44 UTC | newest]
Thread overview: 17+ messages
2021-03-02 19:05 [PATCH 4/4] dm: support I/O polling Mikulas Patocka
2021-03-03 2:53 ` [dm-devel] " JeffleXu
2021-03-03 10:09 ` Mikulas Patocka
2021-03-04 2:57 ` JeffleXu
2021-03-04 10:09 ` Mikulas Patocka
2021-03-05 18:21 ` Jens Axboe
2021-03-04 15:01 ` Jeff Moyer
2021-03-04 15:11 ` Mike Snitzer
2021-03-04 15:12 ` [dm-devel] " Mikulas Patocka
2021-03-05 9:52 ` JeffleXu
2021-03-05 17:46 ` Heinz Mauelshagen
2021-03-05 17:56 ` Heinz Mauelshagen
2021-03-05 18:09 ` Mike Snitzer
2021-03-05 18:19 ` [dm-devel] " Heinz Mauelshagen
2021-03-08 3:54 ` JeffleXu
2021-03-08 3:55 ` Jens Axboe
2021-03-09 11:42 ` Heinz Mauelshagen