From: Jens Axboe <[email protected]>
To: "Pavel Begunkov" <[email protected]>,
	"Carter Li 李通洲" <[email protected]>
Cc: io-uring <[email protected]>
Subject: Re: [RFC] single cqe per link
Date: Tue, 25 Feb 2020 13:20:24 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 2/25/20 3:12 AM, Pavel Begunkov wrote:
> On 2/25/2020 6:13 AM, Jens Axboe wrote:
>>>> I still think flags tagged on sqes could be a better choice, which
>>>> gives users the ability to decide if they want to ignore the cqes, not
>>>> only for links, but also for normal sqes.
>>>>
>>>> In addition, boxed cqes couldn’t resolve the issue of
>>>> IORING_IO_TIMEOUT.
>>>
>>> I would tend to agree, and it'd be trivial to just set the flag on
>>> whatever SQEs in the chain you don't care about. Or even an individual
>>> SQE, though that's probably a bit more of a reach in terms of use case.
>>> Maybe nop with drain + ignore?
> 
> Flexible, but not performant. The existence of drain already makes
> io_uring do a lot of extra work, and it's even worse when drain is
> actually used.

Yeah, I agree, and that's assuming we can make the drain more efficient.
Just hand-waving on possible use cases :-)

>>> In any case it's definitely more flexible.
> 
> That's a different thing. Knowing how requests behave (e.g. if
> nbytes != res, then fail the link), one would want to get a cqe for the
> last executed sqe, whether that last one was an error or a success.
> 
> It lets a link be handled as a single entity. I don't see a way to
> emulate similar behaviour with unconditional masking. Probably, we
> will need them both.

But you can easily do that with IOSQE_NO_CQE; in fact, that's what I did
to test this. The chain has IOSQE_NO_CQE | IOSQE_IO_LINK set on all but
the last request.
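
For reference, a rough sketch of what that looks like with liburing.
IOSQE_NO_CQE here is the flag proposed in this thread, not something
upstream; everything else is stock liburing. Only the tail of the chain
posts a completion:

#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int i;

	if (io_uring_queue_init(8, &ring, 0) < 0)
		return 1;

	/* 4-deep nop chain; all but the last suppress their CQE */
	for (i = 0; i < 4; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_nop(sqe);
		if (i < 3)
			sqe->flags |= IOSQE_NO_CQE | IOSQE_IO_LINK;
	}
	io_uring_submit(&ring);

	/* Only one CQE is expected for the whole chain */
	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}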

>> In the interest of taking this to the extreme, I tried a nop benchmark
>> on my laptop (qemu/kvm). Granted, this setup is particularly sensitive
>> to spinlocks; they are a lot more expensive there than on a real host.
>>
>> Anyway, regular nops run at about 9.5M/sec with a single thread.
>> Flagging all SQEs with IOSQE_NO_CQE nets me about 14M/sec. So a handy
>> improvement. Looking at the top of profiles:
>>
>> cqe-per-sqe:
>>
>> +   28.45%  io_uring  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>> +   14.38%  io_uring  [kernel.kallsyms]  [k] io_submit_sqes
>> +    9.38%  io_uring  [kernel.kallsyms]  [k] io_put_req
>> +    7.25%  io_uring  libc-2.31.so       [.] syscall
>> +    6.12%  io_uring  [kernel.kallsyms]  [k] kmem_cache_free
>>
>> no-cqes:
>>
>> +   19.72%  io_uring  [kernel.kallsyms]  [k] io_put_req
>> +   11.93%  io_uring  [kernel.kallsyms]  [k] io_submit_sqes
>> +   10.14%  io_uring  [kernel.kallsyms]  [k] kmem_cache_free
>> +    9.55%  io_uring  libc-2.31.so       [.] syscall
>> +    7.48%  io_uring  [kernel.kallsyms]  [k] __io_queue_sqe
>>
>> I'll try the real disk IO tomorrow, using polled IO.
> 
> Great, would love to see

My box with the optane2 is apparently out of commission, and I cannot
get it going today. So I had to make do with my laptop, which does about
600K random read IOPS. I don't see any difference there with polled IO,
using 4-deep link chains (so 1/4th the CQEs). Both runs come in at
around 611-613K IOPS.
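
The chains in that test look roughly like the sketch below. Again, this
assumes the proposed IOSQE_NO_CQE flag; the device path, chain depth,
and offsets are made up for illustration (IOPOLL needs O_DIRECT):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov;
	void *buf;
	int fd, i;

	/* Polled ring; completions are reaped by polling, not by irq */
	if (io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL) < 0)
		return 1;
	/* Hypothetical device path; polled IO requires O_DIRECT */
	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;
	if (posix_memalign(&buf, 4096, 4096))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = 4096;

	/* 4-deep chain of linked reads; only the tail posts a CQE */
	for (i = 0; i < 4; i++) {
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_readv(sqe, fd, &iov, 1, i * 4096ULL);
		if (i < 3)
			sqe->flags |= IOSQE_NO_CQE | IOSQE_IO_LINK;
	}
	io_uring_submit(&ring);

	if (!io_uring_wait_cqe(&ring, &cqe))
		io_uring_cqe_seen(&ring, cqe);

	io_uring_queue_exit(&ring);
	return 0;
}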

-- 
Jens Axboe


