From: Pavel Begunkov <[email protected]>
To: Olivier Langlois <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: io_uring NAPI busy poll RCU is causing 50 context switches/second to my sqpoll thread
Date: Wed, 31 Jul 2024 02:00:11 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 7/31/24 01:33, Pavel Begunkov wrote:
> On 7/31/24 00:14, Olivier Langlois wrote:
>> On Tue, 2024-07-30 at 21:25 +0100, Pavel Begunkov wrote:
>>>
>>> Removing an entry or two once every minute is definitely not
>>> going to take 50% CPU; the RCU machinery runs in the background
>>> regardless of whether io_uring uses it or not, and it's pretty
>>> cheap considering amortisation.
>>>
>>> If anything, your explanation makes it sound more like the
>>> scheduler makes a wrong decision and schedules out the sqpoll
>>> thread even though it could continue to run, but that needs
>>> confirmation. Does the CPU your SQPOLL thread is pinned to stay
>>> 100% utilised?
>>
>> Here are the facts as they are documented in the github issue.
>>
>> 1. despite thinking that I was doing NAPI busy polling, I was not,
>> because my ring was not receiving any sqe after its initial setup.
>>
>> This is what the patch developed with your input is addressing:
>> https://lore.kernel.org/io-uring/382791dc97d208d88ee31e5ebb5b661a0453fb79.1722374371.git.olivier@trillion01.com/T/#u
>>
>> (BTW, I should check if there is such a thing, but I would love to
>> know if the net code exposes a tracepoint when napi_busy_poll is
>> called, because it is very tricky to know whether it is really
>> happening or not)
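>>
>> (side note: if I read the net code right, there is at least a
>> napi:napi_poll tracepoint that fires from the napi polling paths;
>> it does not by itself say whether a poll came from busy polling or
>> from softirq, but enabling it and correlating its timestamps with
>> the sqpoll thread's activity could be a start:
>>
>> echo 1 > /sys/kernel/tracing/events/napi/napi_poll/enable
>> cat /sys/kernel/tracing/trace_pipe)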
>>
>> 2. the moment a second ring that was receiving a lot of sqes was
>> attached to the sqpoll thread, the NAPI busy loop started to run
>> for real and the sqpoll CPU usage inexplicably dropped from 99% to
>> 55%
>>
>> 3. here is my kernel cmdline:
>> hugepages=72 isolcpus=0,1,2 nohz_full=0,1,2 rcu_nocbs=0,1,2
>> rcu_nocb_poll irqaffinity=3 idle=nomwait processor.max_cstate=1
>> intel_idle.max_cstate=1 nmi_watchdog=0
>>
>> there is absolutely nothing else running on CPU0, where the sqpoll
>> thread's affinity is set.
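>>
>> (to double-check that the isolation actually took effect, the
>> kernel exposes the resulting masks; with the cmdline above both of
>> these should print 0-2:
>>
>> cat /sys/devices/system/cpu/isolated
>> cat /sys/devices/system/cpu/nohz_full)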
>>
>> 4. I got the idea of doing this:
>> echo common_pid == sqpoll_pid > /sys/kernel/tracing/events/sched/filter
>> echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
>>
>> and I recorded over 1,000 context switches in 23 seconds involving
>> RCU-related kernel threads.
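>>
>> (spelled out, with SQPOLL_PID as a stand-in for the sqpoll thread's
>> actual pid, the whole session amounts to roughly:
>>
>> echo "common_pid == $SQPOLL_PID" > /sys/kernel/tracing/events/sched/filter
>> echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
>> sleep 23
>> echo 0 > /sys/kernel/tracing/events/sched/sched_switch/enable
>> grep -c rcu /sys/kernel/tracing/trace
>>
>> sched_switch records prev_comm/next_comm, so rcu kthreads such as
>> rcuog/N and rcuop/N show up by name)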
>>
>> 5. just for the fun of checking it out, I disabled NAPI polling on
>> my io_uring rings and the sqpoll thread magically returned to 99%
>> CPU usage from 55%...
>>
>> I am open to other explanations for what I have observed, but my
>> current conclusion is based on what I am able to see... the
>> evidence appears very convincing to me...
> 
> You're seeing something that doesn't make much sense to me, and we need
> to understand what that is. There might be a bug _somewhere_, that's
> always a possibility, but before concluding that let's get a bit more data.
> 
> While the app is working, can you grab a profile and run mpstat for the
> CPU on which you have the SQPOLL task?
> 
> perf record -g -C <CPU number> --all-kernel &
> mpstat -u -P <CPU number> 5 10 &
> 
> And then, as usual, time it so that you have some activity going on;
> the mpstat interval may need adjusting, and run perf report as before.
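> 
> To make that concrete (assuming the sqpoll thread sits on CPU 0 and a
> 60 second capture window is enough), something like:
> 
> perf record -g -C 0 --all-kernel -- sleep 60 &
> mpstat -u -P 0 5 12 &
> wait
> perf report --stdio
> 
> The %usr/%sys/%soft split in the mpstat output is the part to look at.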

I forgot to add: ~50 switches/second for relatively brief RCU handling
is not much, not enough to take 50% of a CPU. I wonder if sqpoll was
still running but the napi busy polling time got accounted to softirq
because bh was disabled, and you didn't include that in the CPU usage,
hence my asking for CPU stats. Do you see any latency problems with
that configuration?
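
One more cheap check, as a sketch (APP_PID below is a stand-in for the
pid of the process that set up the ring): compare the task view with
the CPU view. pidstat -t lists the iou-sqp-* thread with its own %CPU,
so if that number stays low while mpstat shows the same CPU busy in
%soft, the time is being charged to softirq rather than to the thread:

pidstat -t -p $APP_PID 1 10
mpstat -u -P <CPU number> 1 10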

-- 
Pavel Begunkov

