From: Bijan Mottahedeh <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: io_uring performance with block sizes > 128k
Date: Tue, 3 Mar 2020 12:23:12 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 3/2/2020 9:01 PM, Jens Axboe wrote:
> On 3/2/20 4:57 PM, Jens Axboe wrote:
>> On 3/2/20 4:55 PM, Bijan Mottahedeh wrote:
>>> I'm seeing a sizeable drop in perf with polled fio tests for block sizes > 128k:
>>>
>>> filename=/dev/nvme0n1
>>> rw=randread
>>> direct=1
>>> time_based=1
>>> randrepeat=1
>>> gtod_reduce=1
>>>
>>> fio --readonly --ioengine=io_uring --iodepth 1024 --fixedbufs --hipri
>>> --numjobs=16
>>> fio --readonly --ioengine=pvsync2 --iodepth 1024 --hipri --numjobs=16
>>>
>>>
>>> Compared with the pvsync2 engine, the only major difference I could see
>>> was the dio path, __blkdev_direct_IO() for io_uring vs.
>>> __blkdev_direct_IO_simple() for pvsync2 because of the is_sync_kiocb()
>>> check.
>>>
>>>
>>> static ssize_t
>>> blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
>>> {
>>>         ...
>>>         if (is_sync_kiocb(iocb) && nr_pages <= BIO_MAX_PAGES)
>>>                 return __blkdev_direct_IO_simple(iocb, iter, nr_pages);
>>>
>>>         return __blkdev_direct_IO(iocb, iter,
>>>                                   min(nr_pages, BIO_MAX_PAGES));
>>> }
>>>
>>> Just as an experiment, I hacked the io_uring code to force it through
>>> the _simple() path and I get better numbers, though the variance is
>>> fairly high; the drop at bs > 128k without the hack seems consistent:
>>>
>>>
>>> # baseline
>>> READ: bw=3167MiB/s (3321MB/s), 186MiB/s-208MiB/s (196MB/s-219MB/s) #128k
>>> READ: bw=898MiB/s (941MB/s), 51.2MiB/s-66.1MiB/s (53.7MB/s-69.3MB/s) #144k
>>> READ: bw=1576MiB/s (1652MB/s), 81.8MiB/s-109MiB/s (85.8MB/s-114MB/s) #256k
>>>
>>> # hack
>>> READ: bw=2705MiB/s (2836MB/s), 157MiB/s-174MiB/s (165MB/s-183MB/s) #128k
>>> READ: bw=2901MiB/s (3042MB/s), 174MiB/s-194MiB/s (183MB/s-204MB/s) #144k
>>> READ: bw=4194MiB/s (4398MB/s), 252MiB/s-271MiB/s (265MB/s-284MB/s) #256k
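
(To make concrete what forcing the _simple() path means, a minimal
block-side sketch is to drop the is_sync_kiocb() check in
blkdev_direct_IO() -- illustration only, not necessarily the exact hack:)

static ssize_t
blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
{
	int nr_pages;

	nr_pages = iov_iter_npages(iter, BIO_MAX_PAGES + 1);
	if (!nr_pages)
		return 0;
	/*
	 * HACK, sketch only: drop the is_sync_kiocb() check so that any
	 * request fitting in a single bio -- including polled io_uring
	 * reads -- takes the _simple() path.
	 */
	if (nr_pages <= BIO_MAX_PAGES)
		return __blkdev_direct_IO_simple(iocb, iter, nr_pages);

	return __blkdev_direct_IO(iocb, iter, min(nr_pages, BIO_MAX_PAGES));
}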
>> A quick guess would be that the IO is being split above 128K, and hence
>> the polling only catches one of the parts?
> Can you try and see if this makes a difference?
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 571b510ef0e7..cf7599a2c503 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1725,8 +1725,10 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
>  		if (ret < 0)
>  			break;
> 
> +#if 0
>  		if (ret && spin)
>  			spin = false;
> +#endif
>  		ret = 0;
>  	}
>
>
I didn't see a difference.
If the request is split into two bios, is REQ_F_IOPOLL_COMPLETED set
only when the 2nd bio completes?
I think you mentioned before that the request is split with
__blk_queue_split(), but I haven't yet been able to see how that happens
exactly. I see that the request size in nvme_queue_rq() is the same as
the original (e.g. 256k); is that expected?
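
(I was checking that with something along these lines near the top of
nvme_queue_rq() -- a rough debug sketch, not the exact instrumentation;
'req' here is bd->rq:)

	/*
	 * Debug sketch only: print the size the driver sees for each
	 * polled request, to check whether the 256k read was already
	 * split before reaching nvme_queue_rq().
	 */
	if (req->cmd_flags & REQ_HIPRI)
		pr_info("nvme_queue_rq: %u bytes, %u phys segments\n",
			blk_rq_bytes(req), blk_rq_nr_phys_segments(req));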