public inbox for [email protected]
* [RFC] io_commit_cqring __io_cqring_fill_event take up too much cpu
@ 2020-06-22 13:29 Xuan Zhuo
  2020-06-22 14:50 ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Xuan Zhuo @ 2020-06-22 13:29 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, Dust.li

Hi Jens,
I found a problem that I think needs to be solved. The change may be
relatively large, so I would like to ask you and everyone else for opinions.
Other ideas about this issue are welcome too:

Problem description:
===================
I found that in SQ thread mode, io_commit_cqring and __io_cqring_fill_event
account for a relatively large share of CPU time. The reason is the large
number of calls to smp_store_release and WRITE_ONCE. These two operations are
relatively slow, and smp_store_release has to be called every time a cqe is
posted. The sheer number of calls makes the problem very prominent.
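
To make the per-cqe pattern described above concrete, here is a minimal
userspace sketch using C11 atomics. It is not the kernel code: struct cq_ring,
cq_init and cq_post_one are hypothetical names, and the release store on the
tail only stands in for what smp_store_release does in io_commit_cqring.

```c
#include <stdatomic.h>
#include <stdint.h>

#define CQ_ENTRIES 8u /* hypothetical ring size, must be a power of two */

struct cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

/* Hypothetical userspace model of a completion ring. */
struct cq_ring {
	_Atomic uint32_t head; /* advanced by the consumer */
	_Atomic uint32_t tail; /* published by the producer */
	struct cqe cqes[CQ_ENTRIES];
};

static void cq_init(struct cq_ring *cq)
{
	atomic_init(&cq->head, 0);
	atomic_init(&cq->tail, 0);
}

/*
 * Post one completion and publish it immediately.
 * Every call pays for one release store on the shared tail.
 * Returns 0 on success, -1 if the ring is full.
 */
static int cq_post_one(struct cq_ring *cq, uint64_t user_data, int32_t res)
{
	uint32_t tail = atomic_load_explicit(&cq->tail, memory_order_relaxed);
	uint32_t head = atomic_load_explicit(&cq->head, memory_order_acquire);

	if (tail - head == CQ_ENTRIES)
		return -1;

	struct cqe *e = &cq->cqes[tail & (CQ_ENTRIES - 1)];
	e->user_data = user_data;
	e->res = res;
	e->flags = 0;

	/* The entry must be visible before the new tail is. */
	atomic_store_explicit(&cq->tail, tail + 1, memory_order_release);
	return 0;
}
```

With N completions, this shape performs N release stores on the tail, which
is the per-cqe cost the description above refers to.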

My test environment is qemu, using io_uring in SQ thread mode to receive a
large number of UDP packets at 800000 pps. At application start I submit 100
sqes to receive UDP packets, and every time I receive a cqe I submit another
sqe. The perf top result for the SQ thread is as follows:



17.97% [kernel] [k] copy_user_generic_unrolled
13.92% [kernel] [k] io_commit_cqring
11.04% [kernel] [k] __io_cqring_fill_event
10.33% [kernel] [k] udp_recvmsg
 5.94% [kernel] [k] skb_release_data
 4.31% [kernel] [k] udp_rmem_release
 2.68% [kernel] [k] __check_object_size
 2.24% [kernel] [k] __slab_free
 2.22% [kernel] [k] _raw_spin_lock_bh
 2.21% [kernel] [k] kmem_cache_free
 2.13% [kernel] [k] free_pcppages_bulk
 1.83% [kernel] [k] io_submit_sqes
 1.38% [kernel] [k] page_frag_free
 1.31% [kernel] [k] inet_recvmsg



It can be seen that io_commit_cqring and __io_cqring_fill_event together
account for 24.96% (13.92% + 11.04%). This is too much. In general, the share
taken by syscalls may not be this high, so I think we need to solve this
problem.


Solution:
=================
My idea is that when the nr passed to io_submit_sqes is large, we don't call
io_cqring_add_event directly. Instead, we put each completed req on a queue,
then call __io_cqring_fill_event for each queued req and call
io_commit_cqring once at the end of io_submit_sqes. In my simple local test
this looks good.
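
The batching idea above can be sketched in userspace C11 as follows. This is
only a model under stated assumptions, not a patch: struct done_req,
cq_fill_event, cq_commit and cq_flush_batch are hypothetical names standing in
for the queued reqs, __io_cqring_fill_event and io_commit_cqring, and the
producer-private cached_tail is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define CQ_ENTRIES 8u /* hypothetical ring size, must be a power of two */

struct cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

struct cq_ring {
	_Atomic uint32_t head; /* advanced by the consumer */
	_Atomic uint32_t tail; /* visible to the consumer */
	uint32_t cached_tail;  /* producer-private, not yet published */
	struct cqe cqes[CQ_ENTRIES];
};

/* Hypothetical stand-in for a completed request waiting on the queue. */
struct done_req {
	uint64_t user_data;
	int32_t res;
};

static void cq_init(struct cq_ring *cq)
{
	atomic_init(&cq->head, 0);
	atomic_init(&cq->tail, 0);
	cq->cached_tail = 0;
}

/* Fill one entry without publishing it. Returns -1 if the ring is full. */
static int cq_fill_event(struct cq_ring *cq, const struct done_req *req)
{
	uint32_t head = atomic_load_explicit(&cq->head, memory_order_acquire);

	if (cq->cached_tail - head == CQ_ENTRIES)
		return -1;

	struct cqe *e = &cq->cqes[cq->cached_tail & (CQ_ENTRIES - 1)];
	e->user_data = req->user_data;
	e->res = req->res;
	e->flags = 0;
	cq->cached_tail++;
	return 0;
}

/* Publish everything filled so far with a single release store. */
static void cq_commit(struct cq_ring *cq)
{
	atomic_store_explicit(&cq->tail, cq->cached_tail, memory_order_release);
}

/* Fill up to n queued completions, then commit once. Returns how many fit. */
static size_t cq_flush_batch(struct cq_ring *cq,
			     const struct done_req *reqs, size_t n)
{
	size_t posted = 0;

	while (posted < n && cq_fill_event(cq, &reqs[posted]) == 0)
		posted++;
	if (posted)
		cq_commit(cq);
	return posted;
}
```

In this shape a batch of n completions costs one release store instead of n.
The trade-off is that the consumer does not see any entry of the batch until
the final commit.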


Thanks for your feedback,
Xuan Zhuo




Thread overview: 6+ messages
2020-06-22 13:29 [RFC] io_commit_cqring __io_cqring_fill_event take up too much cpu Xuan Zhuo
2020-06-22 14:50 ` Jens Axboe
2020-06-22 17:11   ` Jens Axboe
2020-06-23  8:42     ` xuanzhuo
2020-06-23 12:32     ` Pavel Begunkov
2020-06-23 14:44       ` Jens Axboe
