public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>,
	Xiaobing Li <[email protected]>
Cc: [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v6] io_uring: Statistics of the true utilization of sq threads.
Date: Sat, 30 Dec 2023 16:24:23 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 12/30/23 4:17 PM, Pavel Begunkov wrote:
> On 12/30/23 22:17, Jens Axboe wrote:
>> On 12/30/23 2:06 PM, Pavel Begunkov wrote:
>>> On 12/30/23 17:41, Jens Axboe wrote:
>>>> On 12/30/23 9:27 AM, Pavel Begunkov wrote:
>>>>> On 12/26/23 16:32, Jens Axboe wrote:
>>>>>>
>>>>>> On Mon, 25 Dec 2023 13:44:38 +0800, Xiaobing Li wrote:
>>>>>>> Count the running time and actual IO processing time of the sqpoll
>>>>>>> thread, and output the statistical data to fdinfo.
>>>>>>>
>>>>>>> Variable description:
>>>>>>> "work_time" in the code represents the sum of the jiffies of the sq
>>>>>>> thread actually processing IO, that is, how many milliseconds it
>>>>>>> actually takes to process IO. "total_time" represents the total time
>>>>>>> that the sq thread has elapsed from the beginning of the loop to the
>>>>>>> current time point, that is, how many milliseconds it has spent in
>>>>>>> total.
>>>>>>>
>>>>>>> [...]
>>>>>>
>>>>>> Applied, thanks!
>>>>>>
>>>>>> [1/1] io_uring: Statistics of the true utilization of sq threads.
>>>>>>          commit: 9f7e5872eca81d7341e3ec222ebdc202ff536655
>>>>>
>>>>> I don't believe the patch is near complete, there are still
>>>>> pending question that the author ignored (see replies to
>>>>> prev revisions).
>>>>
>>>> We can drop and defer, that's not an issue. It's still sitting top of
>>>> branch.
>>>>
>>>> Can you elaborate on the pending questions?
>>>
>>> I guess that wasn't clear, but I duplicated all of them in the
>>> email you're replying to for convenience
>>>
>>>>> Why it uses jiffies instead of some task run time?
>>>>> Consequently, why it's fine to account irq time and other
>>>>> preemption? (hint, it's not)
>>>>
>>>> Yeah that's a good point, might be better to use task run time. Jiffies
>>>> is also an annoying metric to expose, as you'd need to then get the tick
>>>> rate as well. Though I suspect the ratio is the interesting bit here.
>>>
>>> I agree that seconds are nicer, but that's not my point. That's
>>> not about jiffies, but that the patch keeps counting regardless
>>> whether the SQ task was actually running, or the CPU was serving
>>> irq, or even if it was force descheduled.
>>
>> Right, guess I wasn't clear, I did very much agree with using task run
>> time to avoid cases like that where it's perceived running, but really
>> isn't. For example.
>>
>>> I even outlined what a solution may look like, i.e. replace jiffies
>>> with task runtime, which should already be counted in the task.
>>
>> Would be a good change to make. And to be fair, I guess they originally
>> wanted something like that, as the very first patch had some scheduler
>> interactions. Just wasn't done quite right.
> 
> Right, just like what the v1 was doing but without touching
> core/sched.

Yep

>>>>> Why it can't be done with userspace and/or bpf? Why
>>>>> can't it be estimated by checking and tracking
>>>>> IORING_SQ_NEED_WAKEUP in userspace?
>>>>
>>>> Asking people to integrate bpf for this is a bit silly imho. Tracking
>>>
>>> I haven't seen any mention of the real use case, did I miss it?
>>> Because otherwise I fail to see how it can possibly be called
>>> silly when it's not clear how exactly it's used.
>>>
>>> Maybe it's a bash program printing stats to a curious user? Or
>>> maybe it's to track once at start, and then nobody cares about
>>> it, in which case NEED_WAKEUP would be justified.
>>>
>>> I can guess it's for adjusting the sq timeouts, but who knows.
>>
>> I only know what is in those threads, but the most obvious use case
>> would indeed be to vet the efficiency of the chosen timeout value and
>> balance cpu usage with latency like that.
>>
>>>> NEED_WAKEUP is also quite cumbersome and would most likely be higher
>>>> overhead as well.
>>>
>>> Comparing to reading a procfs file or doing an io_uring
>>> register syscall? I doubt that. It's also not everyone
>>> would be using that.
>>
>> What's the proposed integration to make NEED_WAKEUP sampling work? As
>> far as I can tell, you'd need to either do that kind of accounting every
>> time you do io_uring_submit(), or make it conditional which would then
>> at least still have a branch.
>>
>> The kernel side would obviously not be free either, but at least it
>> would be restricted to the SQPOLL side of things and not need to get
>> entangled with the general IO path that doesn't use SQPOLL.
>>
>> If we put it in there and have some way to enable/query/disable, then it
>> least it would just be a branch or two in there rather than in the
>> generic path.
> 
> It can be in the app without ever touching liburing/kernel. E.g. you
> can binary search the idle time, minimising it, while looking that
> sqpoll doesn't go to sleep too often topping with other conditions.
> Another stat you can get right away is to compare the current idle
> time with the total time since last wake up (which should be idle +
> extra).

Sure, it's more a question of if you want a simple to query metric, or
build something a lot more involved to do it. Guess it depends on the
use case. I think exposing a metric makes sense, as it'll provide an
easy way for an app to gauge efficiency of it and whether it'd make
sense to tweak settings. But I guess then what comes next is a way to
modify the timeout setting at runtime...

>>>>> What's the use case in particular? Considering that
>>>>> one of the previous revisions was uapi-less, something
>>>>> is really fishy here. Again, it's a procfs file nobody
>>>>> but a few would want to parse to use the feature.
>>>>
>>>> I brought this up earlier too, fdinfo is not a great API. For anything,
>>>> really.
>>>
>>> I saw that comment, that's why I mentioned, but the
>>> point is that I have doubts the author is even using
>>> the uapi.
>>
>> Not sure I follow... If they aren't using the API, what's the point of
>> the patch? Or are you questioning whether this is being done for an
>> actual use case, or just as a "why not, might be handy" kind of thing?
> I assume there is a use case, which hasn't been spelled out
> though AFAIK, but as a matter of fact earlier patches had no uapi.
> 
> https://lore.kernel.org/all/[email protected]/

Might be because they already have some patches exposing these metrics.
But ->

> and later patches use a suspiciously inconvenient interface,
> i.e. /proc. It's a good question how it was supposed to be used
> (and tested), but I have only guesses. Private patches? A custom
> module? Or maybe a genuine mistake, which is why I'm still very
> curious hearing about the application.

Agree, Xiaobing, can you expand on the actual use case here? How are you
intending to use this metric?

-- 
Jens Axboe


  reply	other threads:[~2023-12-30 23:24 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20231225055252epcas5p43ae8016d329b160f688def7b4f9d4ddb@epcas5p4.samsung.com>
2023-12-25  5:44 ` [PATCH v6] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
2023-12-26 16:32   ` Jens Axboe
2023-12-30 16:27     ` Pavel Begunkov
2023-12-30 17:41       ` Jens Axboe
2023-12-30 21:06         ` Pavel Begunkov
2023-12-30 22:17           ` Jens Axboe
2023-12-30 23:17             ` Pavel Begunkov
2023-12-30 23:24               ` Jens Axboe [this message]
     [not found]       ` <CGME20240103055746epcas5p148c2b06032e09956ddcfc72894abc82a@epcas5p1.samsung.com>
2024-01-03  5:49         ` Xiaobing Li
2024-01-05  4:02           ` Pavel Begunkov
     [not found]             ` <CGME20240110091327epcas5p493e0d77a122a067b6cd41ecbf92bd6eb@epcas5p4.samsung.com>
2024-01-10  9:05               ` Xiaobing Li
2024-01-10 16:15                 ` Jens Axboe
     [not found]                   ` <CGME20240112012013epcas5p38c70493069fb14da02befcf25e604bc1@epcas5p3.samsung.com>
2024-01-12  1:12                     ` Xiaobing Li
2024-01-12  2:58                       ` Jens Axboe
     [not found]                         ` <CGME20240117084516epcas5p2f0961781ff761ac3a3794c5ea80df45f@epcas5p2.samsung.com>
2024-01-17  8:37                           ` Xiaobing Li
2024-01-17 23:04                             ` Jens Axboe
     [not found]                               ` <CGME20240118023341epcas5p37b8c206d763fd56f8a9cfb3193744124@epcas5p3.samsung.com>
2024-01-18  2:25                                 ` Xiaobing Li
2024-01-18  2:56                             ` Pavel Begunkov
2024-01-11 13:12                 ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox