* io_uring_queue_exit is REALLY slow
From: Clay Harris @ 2020-06-07 3:55 UTC (permalink / raw)
To: io-uring
So, I realize that this probably isn't something that you've looked
at yet. But I was interested in a different criterion when looking at
io_uring: how efficient it is for small numbers of requests that
don't transfer much data. In other words, what is the minimum amount
of io_uring work for which a program speed-up can be obtained?
I realize that this is highly dependent on how much overlap can be
gained with async processing.
In order to get a baseline, I wrote a test program which performs
4 opens, followed by 4 read + closes. For the baseline I
intentionally used files in /proc so that there would be minimum
async and I could set IOSQE_ASYNC later. I was quite surprised
by the result: Almost the entire program wall time was used in
the io_uring_queue_exit() call.
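A stripped-down sketch of that kind of baseline follows; the /proc
paths, buffer size, queue depth, and the way the opens, reads, and
closes are batched here are illustrative guesses, not the original
program:

/* build: cc baseline.c -luring */
#include <fcntl.h>
#include <stdio.h>
#include <liburing.h>

static const char *paths[4] = {
        "/proc/version", "/proc/uptime", "/proc/loadavg", "/proc/stat"
};

/* Reap n completions; if fds is non-NULL, record cqe->res per user_data. */
static void reap(struct io_uring *ring, int n, int *fds)
{
        struct io_uring_cqe *cqe;

        while (n--) {
                io_uring_wait_cqe(ring, &cqe);
                if (fds)
                        fds[cqe->user_data] = cqe->res; /* negative == error */
                io_uring_cqe_seen(ring, cqe);
        }
}

int main(void)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        static char buf[4][4096];
        int fds[4], i;

        if (io_uring_queue_init(8, &ring, 0) < 0)
                return 1;

        /* Batch the 4 opens. */
        for (i = 0; i < 4; i++) {
                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_openat(sqe, AT_FDCWD, paths[i], O_RDONLY, 0);
                /* sqe->flags |= IOSQE_ASYNC;  flipped on for later runs */
                sqe->user_data = i;
        }
        io_uring_submit(&ring);
        reap(&ring, 4, fds);

        /* Batch the 4 reads, then the 4 closes. */
        for (i = 0; i < 4; i++) {
                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, fds[i], buf[i], sizeof(buf[i]), 0);
        }
        io_uring_submit(&ring);
        reap(&ring, 4, NULL);

        for (i = 0; i < 4; i++) {
                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_close(sqe, fds[i]);
        }
        io_uring_submit(&ring);
        reap(&ring, 4, NULL);

        io_uring_queue_exit(&ring);     /* this is where the wall time went */
        return 0;
}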
I wrote another test program which does just inits followed by exits.
There are clock_gettime()s around the io_uring_queue_init(8, &ring, 0)
and io_uring_queue_exit() calls and I printed the ratio of the
io_uring_queue_exit() elapsed time and the sum of elapsed time of
both calls.
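A minimal sketch of that init/exit timing harness; the iteration count
and output format are assumptions, not the original program:

/* build: cc init_exit.c -luring */
#include <stdio.h>
#include <time.h>
#include <liburing.h>

static double elapsed(struct timespec a, struct timespec b)
{
        return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
        struct io_uring ring;
        struct timespec t0, t1, t2;
        int i;

        for (i = 0; i < 100; i++) {
                clock_gettime(CLOCK_MONOTONIC, &t0);
                io_uring_queue_init(8, &ring, 0);
                clock_gettime(CLOCK_MONOTONIC, &t1);
                io_uring_queue_exit(&ring);
                clock_gettime(CLOCK_MONOTONIC, &t2);

                double init = elapsed(t0, t1), fini = elapsed(t1, t2);
                printf("exit / (init + exit) = %.2f\n", fini / (init + fini));
        }
        return 0;
}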
The result varied between 0.94 and 0.99. In other words, exit is
between 16 and 100 times slower than init. Average ratio was
around 0.97. Looking at the liburing code, exit does just what
I'd expect (unmap pages and close io_uring fd).
I would have bet the ratio would be less than 0.50. No
operations were ever performed by the ring, so there should be
minimal cleanup. Even if the kernel needed to do a bunch of
cleanup, it shouldn't need the pages mapped into user space to work;
same thing for the fd being open in the user process.
Seems like there is some room for optimization here.
* Re: io_uring_queue_exit is REALLY slow
From: Jens Axboe @ 2020-06-07 14:37 UTC (permalink / raw)
To: Clay Harris, io-uring
On 6/6/20 9:55 PM, Clay Harris wrote:
> So, I realize that this probably isn't something that you've looked
> at yet. But I was interested in a different criterion when looking at
> io_uring: how efficient it is for small numbers of requests that
> don't transfer much data. In other words, what is the minimum amount
> of io_uring work for which a program speed-up can be obtained?
> I realize that this is highly dependent on how much overlap can be
> gained with async processing.
>
> In order to get a baseline, I wrote a test program which performs
> 4 opens, followed by 4 read + closes. For the baseline I
> intentionally used files in /proc so that there would be minimum
> async and I could set IOSQE_ASYNC later. I was quite surprised
> by the result: Almost the entire program wall time was used in
> the io_uring_queue_exit() call.
>
> I wrote another test program which does just inits followed by exits.
> There are clock_gettime()s around the io_uring_queue_init(8, &ring, 0)
> and io_uring_queue_exit() calls and I printed the ratio of the
> io_uring_queue_exit() elapsed time and the sum of elapsed time of
> both calls.
>
> The result varied between 0.94 and 0.99. In other words, exit is
> between 16 and 100 times slower than init. Average ratio was
> around 0.97. Looking at the liburing code, exit does just what
> I'd expect (unmap pages and close io_uring fd).
>
> I would have bet the ratio would be less than 0.50. No
> operations were ever performed by the ring, so there should be
> minimal cleanup. Even if the kernel needed to do a bunch of
> cleanup, it shouldn't need the pages mapped into user space to work;
> same thing for the fd being open in the user process.
>
> Seems like there is some room for optimization here.
Can you share your test case? And what kernel are you using? That's
kind of important.
There's no reason for teardown to be slow, except if you have
pending IO that we need to either cancel or wait for. Due to
other reasons, newer kernels will have most/some parts of
the teardown done out-of-line.
--
Jens Axboe
* Re: io_uring_queue_exit is REALLY slow
From: Clay Harris @ 2020-06-10 3:10 UTC (permalink / raw)
To: Jens Axboe; +Cc: io-uring
On Sun, Jun 07 2020 at 08:37:30 -0600, Jens Axboe quoth thus:
> On 6/6/20 9:55 PM, Clay Harris wrote:
> > So, I realize that this probably isn't something that you've looked
> > at yet. But I was interested in a different criterion when looking at
> > io_uring: how efficient it is for small numbers of requests that
> > don't transfer much data. In other words, what is the minimum amount
> > of io_uring work for which a program speed-up can be obtained?
> > I realize that this is highly dependent on how much overlap can be
> > gained with async processing.
> >
> > In order to get a baseline, I wrote a test program which performs
> > 4 opens, followed by 4 read + closes. For the baseline I
> > intentionally used files in /proc so that there would be minimum
> > async and I could set IOSQE_ASYNC later. I was quite surprised
> > by the result: Almost the entire program wall time was used in
> > the io_uring_queue_exit() call.
> >
> > I wrote another test program which does just inits followed by exits.
> > There are clock_gettime()s around the io_uring_queue_init(8, &ring, 0)
> > and io_uring_queue_exit() calls and I printed the ratio of the
> > io_uring_queue_exit() elapsed time and the sum of elapsed time of
> > both calls.
> >
> > The result varied between 0.94 and 0.99. In other words, exit is
> > between 16 and 100 times slower than init. Average ratio was
> > around 0.97. Looking at the liburing code, exit does just what
> > I'd expect (unmap pages and close io_uring fd).
> >
> > I would have bet the ratio would be less than 0.50. No
> > operations were ever performed by the ring, so there should be
> > minimal cleanup. Even if the kernel needed to do a bunch of
> > cleanup, it shouldn't need the pages mapped into user space to work;
> > same thing for the fd being open in the user process.
> >
> > Seems like there is some room for optimization here.
>
> Can you share your test case? And what kernel are you using, that's
> kind of important.
>
> There's no reason for teardown to be slow, except if you have
> pending IO that we need to either cancel or wait for. Due to
> other reasons, newer kernels will have most/some parts of
> the teardown done out-of-line.
I'm working up a test program for you.
Just FYI:
My initial analysis indicates that closing the io_uring fd is what's
taking all the extra time.
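One way to isolate the fd close from the unmaps is to create the ring
with the raw io_uring_setup(2) syscall, never mmap anything, and time
only the close(). A sketch along those lines; the entry count and the
syscall-number fallback are assumptions:

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/io_uring.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425         /* common number on recent kernels */
#endif

int main(void)
{
        struct io_uring_params p;
        struct timespec t0, t1;
        long ns;
        int fd;

        memset(&p, 0, sizeof(p));
        fd = syscall(__NR_io_uring_setup, 8, &p);
        if (fd < 0) {
                perror("io_uring_setup");
                return 1;
        }

        /* Nothing was ever mmap'ed, so only the fd close is measured. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        close(fd);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ns = (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
        printf("close() of the io_uring fd took %ld ns\n", ns);
        return 0;
}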
> --
> Jens Axboe