From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, David Wei <[email protected]>,
	[email protected]
Subject: Re: [PATCH v2] io_uring: add IORING_ENTER_NO_IOWAIT to not set in_iowait
Date: Sun, 18 Aug 2024 03:27:38 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 8/18/24 02:08, Jens Axboe wrote:
> On 8/17/24 4:04 PM, Pavel Begunkov wrote:
>> On 8/17/24 22:09, Jens Axboe wrote:
>>> On 8/17/24 3:05 PM, Pavel Begunkov wrote:
>>>> On 8/17/24 21:20, Jens Axboe wrote:
>>>>> On 8/17/24 1:44 PM, Pavel Begunkov wrote:
>>>>>>> This patchset adds an IORING_ENTER_NO_IOWAIT flag that can be set on
>>>>>>> enter. If set, then current->in_iowait is not set. By default this flag
...
>> And that "use case" for iowait is directly linked to cpufreq, so
>> if it still counts, then we shouldn't be separating stats from
>> cpufreq at all.
> 
> This is what the cpufreq people want to do anyway, so it'll probably
> happen whether we like it or not.

Not against it, quite the opposite.


>>> case, yet I think we should cater to it as it very well could be legit,
>>> just in the tiny minority of cases.
>>
>> I explained why it's a confusing feature. We can make up some niche
>> case (with enough imagination we can justify basically anything),
>> but I explained why, IMHO, an accounting flag (let's forget about
>> cpufreq) would have a net negative effect. A sysctl knob would be
>> much more reasonable, but I don't think it's needed at all.
> 
> The main thing for me is policy vs flexibility. The fact that boost and
> iowait accounting is currently tied together is pretty ugly and will
> hopefully go away with my patches.
> 
>>>>> It's really simple for this stuff - the freq boost is useful (and
>>>>> needed) for some workloads, and the iowait accounting is never useful
>>>>> for anything but (currently) comes as an unfortunate side effect of the
>>>>> former. But even with those two separated, there are still going to be
>>>>> cases where you want to control when it happens.
>>>>
>>>> You can imagine such cases, but in reality I doubt it. If we
>>>> disable the stat part, nobody would notice, as nobody cared for
>>>> the 3-4 years before in_iowait was added.
>>>
>>> That would be ideal. You're saying Jamal's complaint was purely iowait
>>> based? Because it looked like power concerns to me... If it's just
>>> iowait, then they just need to stop looking at that, that's pretty
>>> simple.
>>
>> Power consumption, and then, in search of what's wrong, it was
>> correlated to high iowait as well as a difference in C state stats.
> 
> But this means that it was indeed power consumption, and iowait was just
> the canary in the coal mine that led them down the right path.
> 
> And this in turn means that even with the split, we want to
> differentiate between short/bursty sleeps and longer ones.

That's what I've been saying for a couple of months now: for
networking we have a well-measured energy consumption regression
caused by iowait, so we can't just leave it as it is. And for
lack of a good way to auto-tune this in the kernel, an enter flag
(described as a performance feature) looks good, I agree.
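
To make the opt-out concrete, here is a minimal userspace-side sketch of
how an application might build its enter flags. The flag name comes from
the patch subject; the bit value used for it below, the SKETCH_ prefix
and the build_enter_flags() helper are assumptions for illustration, not
the patch's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as the uapi IORING_ENTER_GETEVENTS flag. */
#define SKETCH_ENTER_GETEVENTS	(1U << 0)
/*
 * ASSUMPTION: placeholder bit for the proposed IORING_ENTER_NO_IOWAIT.
 * The real value is whatever the patch defines; this one is made up.
 */
#define SKETCH_ENTER_NO_IOWAIT	(1U << 7)

/*
 * Hypothetical helper: pick the flags for a waiting io_uring_enter()
 * call. When the caller does not want the iowait side effects (stats
 * and cpufreq boost), it asks the kernel not to set in_iowait.
 */
static unsigned int build_enter_flags(bool want_iowait_boost)
{
	unsigned int flags = SKETCH_ENTER_GETEVENTS;

	if (!want_iowait_boost)
		flags |= SKETCH_ENTER_NO_IOWAIT;
	return flags;
}
```

A network server that mostly sleeps on long waits would pass false here,
while a storage workload that depends on the boost would pass true and
keep today's behaviour.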

...
>>>
>>>> The name might also be confusing. We need an explanation when
>>>> it could be useful, and name it accordingly. DEEP/SHALLOW_WAIT?
>>>> Do you remember how cpufreq accounts for it?
>>>
>>> I don't remember how it accounts for it, and was just pondering that
>>> with the above reply. Because if it just decays the sleep state, then
>>> you could just use it generically. If it stays high regardless of how
>>> long you wait, then it could be a power issue. Not on servers really
>>> (well a bit, depending on boosting), but more so on desktop apps.
>>> Laptops tend to be pretty power conservative!
>>
>> {SHORT,BRIEF}/LONG_WAIT maybe?
> 
> I think that's a lot more descriptive. Ideally we'd want to tie this to
> wakeup latencies, eg we'd need to know about wakeup latencies. For
> example, if the user asks for a 100 usec wait, we'd want to influence
> what sleep state is picked in propagating that information. Things like
> the min-wait I posted would directly work for that, as it tells the
> story in two chapters on what waits we're expecting here. Currently
> there's no way to do that (hence iowait -> cpufreq boosting), but there
> clearly should be. Or even without min-wait, the timeout is clearly
> known here, combined with the expected/desired number of events the
> application is looking for.

Yeah, interesting, we can auto-apply it depending on the delta
time, etc. It might be worth asking the cpufreq guys about thresholds.
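
A rough sketch of what auto-applying by delta time could look like.
Everything here is hypothetical: the 100 usec cutoff is a placeholder
(the actual threshold is exactly what would need to be discussed with
the cpufreq side), and the names classify_wait() and
should_mark_iowait() are made up for illustration, not existing kernel
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: placeholder cutoff between a short and a long wait. */
#define SKETCH_SHORT_WAIT_NS	(100ULL * 1000ULL)	/* 100 usec */

enum sketch_wait_kind {
	SKETCH_WAIT_SHORT,
	SKETCH_WAIT_LONG,
};

/* A wait without a timeout has no known bound, so treat it as long. */
static enum sketch_wait_kind classify_wait(bool has_timeout,
					   uint64_t timeout_ns)
{
	if (!has_timeout)
		return SKETCH_WAIT_LONG;
	return timeout_ns <= SKETCH_SHORT_WAIT_NS ? SKETCH_WAIT_SHORT
						  : SKETCH_WAIT_LONG;
}

/*
 * Only mark the task as in iowait (and so get the boost) when there
 * is I/O in flight and the wait is expected to be short.
 */
static bool should_mark_iowait(bool has_timeout, uint64_t timeout_ns,
			       unsigned int inflight)
{
	if (!inflight)
		return false;
	return classify_wait(has_timeout, timeout_ns) == SKETCH_WAIT_SHORT;
}
```

With something like this the enter flag would only be needed as an
override for the cases where the heuristic guesses wrong.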

-- 
Pavel Begunkov

Thread overview: 11+ messages
2024-08-16 22:36 [PATCH v2] io_uring: add IORING_ENTER_NO_IOWAIT to not set in_iowait David Wei
2024-08-16 22:49 ` Jens Axboe
2024-08-17  1:23 ` Jeff Moyer
2024-08-19 23:03   ` David Wei
2024-08-17 19:44 ` Pavel Begunkov
2024-08-17 20:20   ` Jens Axboe
2024-08-17 21:05     ` Pavel Begunkov
2024-08-17 21:09       ` Jens Axboe
2024-08-17 22:04         ` Pavel Begunkov
2024-08-18  1:08           ` Jens Axboe
2024-08-18  2:27             ` Pavel Begunkov [this message]
