public inbox for [email protected]
From: xuanzhuo <[email protected]>
To: Jens Axboe <[email protected]>, io-uring <[email protected]>
Cc: [email protected]
Subject: Re: [RFC] io_commit_cqring __io_cqring_fill_event take up too much cpu
Date: Tue, 23 Jun 2020 16:42:55 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>



On 2020/6/23 1:11 AM, Jens Axboe wrote:
> On 6/22/20 8:50 AM, Jens Axboe wrote:
>> On 6/22/20 7:29 AM, Xuan Zhuo wrote:
>>> Hi Jens,
>>> I found a problem that I think needs to be solved. The change required
>>> may be relatively large, so I would like to ask for your opinion and
>>> everyone else's, or to hear other ideas about this issue:
>>>
>>> Problem description:
>>> ===================
>>> I found that in sq thread mode, io_commit_cqring and
>>> __io_cqring_fill_event account for a relatively large share of the CPU
>>> time. The reason is the large number of calls to smp_store_release and
>>> WRITE_ONCE. These operations are relatively expensive, and
>>> smp_store_release has to be called every time a cqe is posted, so the
>>> sheer number of calls makes the problem very prominent.
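For reference, the per-cqe pattern being discussed is roughly the following;
this is a simplified paraphrase of the two kernel functions of that era, not
the verbatim source:

	/* paraphrased sketch: post one cqe into the CQ ring */
	static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
	{
		struct io_ring_ctx *ctx = req->ctx;
		struct io_uring_cqe *cqe;

		cqe = io_get_cqring(ctx);	/* advances ctx->cached_cq_tail */
		if (likely(cqe)) {
			WRITE_ONCE(cqe->user_data, req->user_data);
			WRITE_ONCE(cqe->res, res);
			WRITE_ONCE(cqe->flags, cflags);
		}
	}

	/* paraphrased sketch: publish the filled cqes to userspace */
	static void io_commit_cqring(struct io_ring_ctx *ctx)
	{
		/* order the cqe stores above with the tail update */
		smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
	}
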
>>>
>>> My test environment runs in qemu, using io_uring in sq thread mode to
>>> receive a large number of UDP packets at about 800000 pps. The
>>> application submits 100 recv sqes at startup, and every time a cqe is
>>> received it submits another sqe. The perf top result for the sq thread
>>> is as follows:
>>>
>>>
>>>
>>> 17.97% [kernel] [k] copy_user_generic_unrolled
>>> 13.92% [kernel] [k] io_commit_cqring
>>> 11.04% [kernel] [k] __io_cqring_fill_event
>>> 10.33% [kernel] [k] udp_recvmsg
>>>  5.94% [kernel] [k] skb_release_data
>>>  4.31% [kernel] [k] udp_rmem_release
>>>  2.68% [kernel] [k] __check_object_size
>>>  2.24% [kernel] [k] __slab_free
>>>  2.22% [kernel] [k] _raw_spin_lock_bh
>>>  2.21% [kernel] [k] kmem_cache_free
>>>  2.13% [kernel] [k] free_pcppages_bulk
>>>  1.83% [kernel] [k] io_submit_sqes
>>>  1.38% [kernel] [k] page_frag_free
>>>  1.31% [kernel] [k] inet_recvmsg
>>>
>>>
>>>
>>> As can be seen, io_commit_cqring and __io_cqring_fill_event together
>>> account for 24.96%, which is too much. In general even the syscall path
>>> does not take such a large share, so this problem needs to be solved.
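The benchmark loop described above corresponds roughly to the userspace
sketch below. It is illustrative only: it assumes liburing, an already bound
UDP socket, and static per-request buffers, and it omits error handling. On
kernels of this era SQPOLL also requires registered files, hence the
io_uring_register_files() call and the IOSQE_FIXED_FILE flag.

	#include <liburing.h>

	#define NR_REQS	100
	#define BUF_LEN	2048

	static void recv_loop(int sockfd)
	{
		static char bufs[NR_REQS][BUF_LEN];
		struct io_uring_params p = {
			.flags = IORING_SETUP_SQPOLL,
			.sq_thread_idle = 2000,
		};
		struct io_uring ring;

		io_uring_queue_init_params(256, &ring, &p);
		io_uring_register_files(&ring, &sockfd, 1);

		/* queue 100 recv requests up front */
		for (int i = 0; i < NR_REQS; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			io_uring_prep_recv(sqe, 0, bufs[i], BUF_LEN, 0);
			sqe->flags |= IOSQE_FIXED_FILE;	/* index 0 = registered sockfd */
			io_uring_sqe_set_data(sqe, bufs[i]);
		}
		io_uring_submit(&ring);

		for (;;) {
			struct io_uring_cqe *cqe;
			struct io_uring_sqe *sqe;
			char *buf;

			io_uring_wait_cqe(&ring, &cqe);
			buf = io_uring_cqe_get_data(cqe);
			/* consume the packet here, then requeue the same buffer */
			sqe = io_uring_get_sqe(&ring);
			io_uring_prep_recv(sqe, 0, buf, BUF_LEN, 0);
			sqe->flags |= IOSQE_FIXED_FILE;
			io_uring_sqe_set_data(sqe, buf);
			io_uring_cqe_seen(&ring, cqe);
			io_uring_submit(&ring);
		}
	}
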
>>>
>>>
>>> Solution:
>>> =================
>>> My idea is that when the nr handled by io_submit_sqes is large, we do
>>> not call io_cqring_add_event directly; instead we put each completed req
>>> on a queue, call __io_cqring_fill_event for every req, and then call
>>> io_commit_cqring once at the end of io_submit_sqes. With this approach
>>> my simple local test looks good.
>> I think the solution here is to defer the cq ring filling + commit to
>> the caller instead of doing it deep down the stack; that's a nice win in
>> general. To do that, we need to be able to do it after io_submit_sqes()
>> has been called. We can either do that inline, by passing down a list or
>> struct that allows the caller to place the request there instead of
>> filling the event, or out-of-band by having e.g. a percpu struct that
>> allows the same thing. In both cases, the actual call site would do
>> something like:
>>
>> if (comp_list && successful_completion) {
>> 	req->result = ret;
>> 	list_add_tail(&req->list, comp_list);
>> } else {
>> 	io_cqring_add_event(req, ret);
>> 	if (!successful_completion)
>> 		req_set_fail_links(req);
>> 	io_put_req(req);
>> }
>>
>> and then have the caller iterate the list and fill completions, if it's
>> non-empty on return.
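A rough sketch of that caller-side flush, reusing the names from the snippet
above (comp_list, req->result); the function name is illustrative, and the
locking normally done under ctx->completion_lock is omitted for brevity:

	/* illustrative: flush completions deferred by the submission path */
	static void io_flush_comp_list(struct io_ring_ctx *ctx,
				       struct list_head *comp_list)
	{
		struct io_kiocb *req, *tmp;

		if (list_empty(comp_list))
			return;

		list_for_each_entry_safe(req, tmp, comp_list, list) {
			list_del(&req->list);
			__io_cqring_fill_event(req, req->result, 0);
			io_put_req(req);
		}
		/* a single tail update (smp_store_release) for the whole batch */
		io_commit_cqring(ctx);
		io_cqring_ev_posted(ctx);	/* wake cq waiters / eventfd */
	}
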
>>
>> I don't think this is necessarily hard, but to do it nicely it will
>> touch a bunch of code and hence be quite a bit of churn. I do think the
>> reward is worth it though, as this applies to the "normal" submission
>> path as well, not just the SQPOLL variant.
> Something like this series. I'd be interested to hear if it makes your
> specific test case any better.
>
> Patches are against my for-5.9/io_uring branch.
>
I applied your patches on top of for-5.9/io_uring and ran the same test.
The perf top data for io_sq_thread is as follows:


    19.99% [kernel] [k] copy_user_generic_unrolled
    11.63% [kernel] [k] skb_release_data
     9.36% [kernel] [k] udp_rmem_release
     8.64% [kernel] [k] udp_recvmsg
     6.21% [kernel] [k] __slab_free
     4.39% [kernel] [k] __check_object_size
     3.64% [kernel] [k] free_pcppages_bulk
     2.41% [kernel] [k] kmem_cache_free
     2.00% [kernel] [k] io_submit_sqes
     1.95% [kernel] [k] page_frag_free
     1.54% [kernel] [k] io_put_req

     ...

     0.07% [kernel] [k] io_commit_cqring
     0.44% [kernel] [k] __io_cqring_fill_event


The CPU share of io_commit_cqring and __io_cqring_fill_event has been
significantly reduced.

With this kernel version, setting the irq smp_affinity of the network card
has no effect, so the lock contention on UDP packet reception is relatively
heavy, which may be a bug. This causes large fluctuations in throughput, so
I am not providing UDP receive bandwidth numbers.

Well done.


Thread overview: 6+ messages
2020-06-22 13:29 [RFC] io_commit_cqring __io_cqring_fill_event take up too much cpu Xuan Zhuo
2020-06-22 14:50 ` Jens Axboe
2020-06-22 17:11   ` Jens Axboe
2020-06-23  8:42     ` xuanzhuo [this message]
2020-06-23 12:32     ` Pavel Begunkov
2020-06-23 14:44       ` Jens Axboe
