From: Andres Freund <[email protected]>
To: Christian Loehle <[email protected]>
Cc: Quentin Perret <[email protected]>,
	 "Rafael J. Wysocki" <[email protected]>,
	[email protected], [email protected],
	 [email protected], [email protected], [email protected],
	 [email protected], [email protected],
	[email protected],  [email protected],
	[email protected], [email protected],
	 [email protected], [email protected],
	[email protected],  [email protected],
	[email protected], [email protected], [email protected]
Subject: Re: [RFC PATCH 5/8] cpufreq/schedutil: Remove iowait boost
Date: Fri, 4 Oct 2024 20:39:09 -0400	[thread overview]
Message-ID: <io3xcj5vpqbkojoktbp3fuuj77gqqkf2v3gg62i4aep4ps36dc@we2zwwp5hsyt> (raw)
In-Reply-To: <[email protected]>

Hi,


A caveat: I'm a userspace developer who occasionally strays into kernel land
(see e.g. the io_uring iowait thing), so I'm likely to get some kernel-side
things wrong.


On 2024-10-03 11:30:52 +0100, Christian Loehle wrote:
> These are the main issues with transforming the existing mechanism into
> a per-task attribute.
> Almost unsolvable is: Does reducing "iowait pressure" (be it per-task or per-rq)
> actually even improve throughput (assuming for now that this throughput is
> something we care about; I'm sure you know that isn't always the case, e.g.
> background tasks)? With MCQ devices and some reasonable IO workload that is
> IO-bound, our iowait boosting is often just boosting CPU frequency (which uses
> power, obviously) to queue in yet another request for a device which already has
> essentially endless pending requests. Whether pending request N+1 arrives x usecs
> earlier or later at the device makes no difference to IO throughput.

That's sometimes true, but definitely not all the time? There are plenty of
workloads with low-queue-depth style IO, which often are also rather latency
sensitive.

E.g. the device a database journal resides on will typically have a low queue
depth. It's extremely common in OLTPish workloads to be bound by the latency
of journal flushes. If, after the journal flush completes, the CPU is clocked
low and takes a while to wake up, you'll see substantially worse performance.
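
To make that concrete, here's a minimal sketch (mine, purely illustrative - the
file name and record size are made up) of the QD=1 pattern a journal produces:
each commit appends a record and then blocks on fdatasync() before it can be
acknowledged, so the task sits in iowait between flushes, and any ramp-up delay
after the flush completes goes straight into commit latency:

/* Sketch: QD=1 journal-style IO - append a record, then wait for it to
 * reach stable storage before the "commit" can be acknowledged.
 * Illustrative only; "journal.dat" and the 4 KiB record size are made up. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fd = open("journal.dat", O_WRONLY | O_CREAT | O_APPEND, 0600);
	if (fd < 0) { perror("open"); return 1; }

	char record[4096];
	memset(record, 'x', sizeof(record));

	for (int commit = 0; commit < 1000; commit++) {
		if (write(fd, record, sizeof(record)) != sizeof(record)) {
			perror("write");
			return 1;
		}
		/* The task now blocks (in iowait) until the flush completes;
		 * how quickly the CPU ramps back up afterwards shows up
		 * directly in commit latency. */
		if (fdatasync(fd) != 0) {
			perror("fdatasync");
			return 1;
		}
	}
	close(fd);
	return 0;
}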




> Whether boosting would improve e.g. IOPS (of that device) is something the block layer
> (with a lot of added infrastructure, but at least in theory it would know what
> device we're iowaiting on, unlike the scheduler) could tell us about. If that is
> actually useful for user experience (i.e. worth the power) only userspace can decide
> (and then we're back at uclamp_min anyway).

I think there are many cases where userspace won't realistically be able to do
anything about that.

For one, just because a too-deep idle state is bad for some workload during
IO doesn't mean userspace never wants to clock down. And it's probably
going to be too expensive to change any attributes around idle states for
individual IOs.

Are there actually any non-privileged APIs around this that userspace *could*
even use? I'd not consider moving to busy-polling based APIs a realistic
alternative.
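
For context, the knob I know of here is the PM QoS interface at
/dev/cpu_dma_latency, and it is root-only. A minimal sketch of how it's
typically used (the 20 usec bound is an arbitrary example, not a
recommendation):

/* Sketch: constrain C-state exit latency via the PM QoS interface.
 * Requires privileges to open /dev/cpu_dma_latency; the constraint is
 * held only while the fd stays open. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t max_latency_us = 20;	/* example bound, not a recommendation */

	int fd = open("/dev/cpu_dma_latency", O_WRONLY);
	if (fd < 0) { perror("open /dev/cpu_dma_latency"); return 1; }

	if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
	    sizeof(max_latency_us)) {
		perror("write");
		return 1;
	}

	/* ... run the latency-sensitive work here; the constraint is
	 * dropped as soon as the fd is closed (or the process exits). */
	pause();
	return 0;
}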


For many workloads cpuidle is way too aggressive about dropping into deeper
idle states *despite* iowait. But just disabling all deeper idle states
obviously has undesirable energy usage implications. It surely is the answer
for some workloads, but I don't think it'd be good to promote it as the sole
solution.


It's easy to under-estimate the real-world impact of a change like this. When
benchmarking we tend to see what kind of throughput we can get, by having N
clients hammering the server as fast as they can. But in the real world it's
pretty rare for anything latency sensitive to go full blast - rather there's a
rate of incoming requests, and the clients are sensitive to requests being
processed more slowly.
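
To illustrate the distinction, here's a rough sketch of an "open-loop" client
that issues requests at a fixed rate and measures per-request latency, rather
than retrying as fast as possible; do_request() is a placeholder and the 1 ms
interval is arbitrary. With this pattern the CPU is mostly idle between
requests, so wake-up/ramp-up latency shows up in per-request latency rather
than in aggregate throughput:

/* Sketch: open-loop load generation - a fixed arrival rate instead of
 * "as fast as possible". do_request() is a stand-in for issuing one
 * request and waiting for the reply. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>

static void do_request(void) { /* placeholder: one request/response */ }

int main(void)
{
	const long interval_ns = 1 * 1000 * 1000;	/* one request per ms */
	struct timespec next, start, end;

	clock_gettime(CLOCK_MONOTONIC, &next);

	for (int i = 0; i < 10000; i++) {
		clock_gettime(CLOCK_MONOTONIC, &start);
		do_request();
		clock_gettime(CLOCK_MONOTONIC, &end);

		long lat_ns = (end.tv_sec - start.tv_sec) * 1000000000L +
			      (end.tv_nsec - start.tv_nsec);
		printf("request %d latency %ld ns\n", i, lat_ns);

		/* Sleep until the next scheduled arrival instead of
		 * immediately issuing the next request. */
		next.tv_nsec += interval_ns;
		if (next.tv_nsec >= 1000000000L) {
			next.tv_sec++;
			next.tv_nsec -= 1000000000L;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
	}
	return 0;
}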


That's not to say that the current situation can't be improved - I've seen way
too many workloads where the only way to get decent performance was one of:

- disable most idle states (via sysfs or /dev/cpu_dma_latency) - a sketch of
  the sysfs variant is below
- just have busy loops when idling - doesn't work when doing synchronous
  syscalls that block though
- have some lower priority tasks scheduled that just burn CPU
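
For what it's worth, the sysfs variant of the first workaround looks roughly
like this; a sketch that assumes the usual
/sys/devices/system/cpu/cpuN/cpuidle/stateM/disable layout, handles only cpu0,
and picks an arbitrary cutoff of "everything deeper than state1":

/* Sketch: disable deeper idle states via sysfs (the first workaround
 * above). Needs root; a real tool would iterate over all CPUs and
 * discover how many states exist instead of hardcoding cpu0 and the
 * state >= 2 cutoff. */
#include <stdio.h>

int main(void)
{
	for (int state = 2; state < 10; state++) {
		char path[128];
		snprintf(path, sizeof(path),
			 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/disable",
			 state);

		FILE *f = fopen(path, "w");
		if (!f)
			break;		/* no more states */
		fputs("1", f);		/* 1 = disable this state, 0 = re-enable */
		fclose(f);
	}
	return 0;
}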

I'm just worried that removing iowait boosting will make this worse.

Greetings,

Andres Freund

Thread overview: 25+ messages
2024-09-05  9:26 [RFT RFC PATCH 0/8] cpufreq: cpuidle: Remove iowait behaviour Christian Loehle
2024-09-05  9:26 ` [RFC PATCH 1/8] cpuidle: menu: Remove iowait influence Christian Loehle
2024-09-30 14:58   ` Rafael J. Wysocki
2024-09-05  9:26 ` [RFC PATCH 2/8] cpuidle: Prefer teo over menu governor Christian Loehle
2024-09-30 15:06   ` Rafael J. Wysocki
2024-09-30 16:12     ` Christian Loehle
2024-09-30 16:42       ` Rafael J. Wysocki
2024-09-05  9:26 ` [RFC PATCH 3/8] TEST: cpufreq/schedutil: Linear iowait boost step Christian Loehle
2024-09-05  9:26 ` [RFC PATCH 4/8] TEST: cpufreq/schedutil: iowait boost cap sysfs Christian Loehle
2024-09-05  9:26 ` [RFC PATCH 5/8] cpufreq/schedutil: Remove iowait boost Christian Loehle
2024-09-30 16:34   ` Rafael J. Wysocki
2024-10-03  9:10     ` Christian Loehle
2024-10-03  9:47     ` Quentin Perret
2024-10-03 10:30       ` Christian Loehle
2024-10-05  0:39         ` Andres Freund [this message]
2024-10-09  9:54           ` Christian Loehle
2024-09-05  9:26 ` [RFC PATCH 6/8] cpufreq: intel_pstate: " Christian Loehle
2024-09-12 11:22   ` [RFC PATCH] TEST: cpufreq: intel_pstate: sysfs iowait_boost_cap Christian Loehle
2024-09-30 18:03   ` [RFC PATCH 6/8] cpufreq: intel_pstate: Remove iowait boost Rafael J. Wysocki
2024-09-30 20:35     ` srinivas pandruvada
2024-10-01  9:57       ` Christian Loehle
2024-10-01 14:46         ` srinivas pandruvada
2024-09-05  9:26 ` [RFC PATCH 7/8] cpufreq: Remove SCHED_CPUFREQ_IOWAIT update Christian Loehle
2024-09-05  9:26 ` [RFC PATCH 8/8] io_uring: Do not set iowait before sleeping Christian Loehle
2024-09-05 12:31 ` [RFT RFC PATCH 0/8] cpufreq: cpuidle: Remove iowait behaviour Christian Loehle
