public inbox for [email protected]
 help / color / mirror / Atom feed
From: Xiaobing Li <[email protected]>
To: [email protected]
Cc: [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected]
Subject: Re: Re: [PATCH v6] io_uring: Statistics of the true utilization of sq threads.
Date: Wed, 17 Jan 2024 16:37:06 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 1/12/24 2:58 AM, Jens Axboe wrote:
>On 1/11/24 6:12 PM, Xiaobing Li wrote:
>> On 1/10/24 16:15 AM, Jens Axboe wrote:
>>> On 1/10/24 2:05 AM, Xiaobing Li wrote:
>>>> On 1/5/24 04:02 AM, Pavel Begunkov wrote:
>>>>> On 1/3/24 05:49, Xiaobing Li wrote:
>>>>>> On 12/30/23 9:27 AM, Pavel Begunkov wrote:
>>>>>>> Why it uses jiffies instead of some task run time?
>>>>>>> Consequently, why it's fine to account irq time and other
>>>>>>> preemption? (hint, it's not)
>>>>>>>
>>>>>>> Why it can't be done with userspace and/or bpf? Why
>>>>>>> can't it be estimated by checking and tracking
>>>>>>> IORING_SQ_NEED_WAKEUP in userspace?
>>>>>>>
>>>>>>> What's the use case in particular? Considering that
>>>>>>> one of the previous revisions was uapi-less, something
>>>>>>> is really fishy here. Again, it's a procfs file nobody
>>>>>>> but a few would want to parse to use the feature.
>>>>>>>
>>>>>>> Why it just keeps aggregating stats for the whole
>>>>>>> life time of the ring? If the workload changes,
>>>>>>> that would either totally screw the stats or would make
>>>>>>> it too inert to be useful. That's especially relevant
>>>>>>> for long running (days) processes. There should be a
>>>>>>> way to reset it so it starts counting anew.
>>>>>>
>>>>>> Hi, Jens and Pavel,
>>>>>> I carefully read the questions you raised.
>>>>>> First of all, as to why I use jiffies to statistics time, it
>>>>>> is because I have done some performance tests and found that
>>>>>> using jiffies has a relatively smaller loss of performance
>>>>>> than using task run time. Of course, using task run time is
>>>>>
>>>>> How does taking a measure for task runtime looks like? I expect it to
>>>>> be a simple read of a variable inside task_struct, maybe with READ_ONCE,
>>>>> in which case the overhead shouldn't be realistically measurable. Does
>>>>> it need locking?
>>>>
>>>> The task runtime I am talking about is similar to this:
>>>> start = get_system_time(current);
>>>> do_io_part();
>>>> sq->total_time += get_system_time(current) - start;
>>>
>>> Not sure what get_system_time() is, don't see that anywhere.
>>>
>>>> Currently, it is not possible to obtain the execution time of a piece of 
>>>> code by a simple read of a variable inside task_struct. 
>>>> Or do you have any good ideas?
>>>
>>> I must be missing something, because it seems like all you need is to
>>> read task->stime? You could possible even make do with just logging busy
>>> loop time, as getrusage(RUSAGE_THREAD, &stat) from userspace would then
>>> give you the total time.
>>>
>>> stat.ru_stime would then be the total time, the thread ran, and
>>> 1 - (above_busy_stime / stat.ru_stime) would give you the time the
>>> percentage of time the thread ran and did useful work (eg not busy
>>> looping.
>> 
>> getrusage can indeed get the total time of the thread, but this
>> introduces an extra function call, which is relatively more
>> complicated than defining a variable. In fact, recording the total
>> time of the loop and the time of processing the IO part can achieve
>> our observation purpose. Recording only two variables will have less
>> impact on the existing performance, so why not  choose a simpler and
>> effective method.
>
>I'm not opposed to exposing both of them, it does make the API simpler.
>If we can call it an API... I think the main point was using task->stime
>for it rather than jiffies etc.

Hi, Jens and Pavel
I modified the code according to your opinions.

I got the total time of the sqpoll thread through getrusage. 
eg:

fdinfo.c:
+long sq_total_time = 0;
+long sq_work_time = 0;
if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
	struct io_sq_data *sq = ctx->sq_data;

	sq_pid = sq->task_pid;
	sq_cpu = sq->sq_cpu;
+	struct rusage r;
+	getrusage(sq->thread, RUSAGE_SELF, &r);
+	sq_total_time = r.ru_stime.tv_sec * 1000000 + r.ru_stime.tv_usec;
+	sq_work_time = sq->work_time;
}

seq_printf(m, "SqThread:\t%d\n", sq_pid);
seq_printf(m, "SqThreadCpu:\t%d\n", sq_cpu);
+seq_printf(m, "SqTotalTime:\t%ldus\n", sq_total_time);
+seq_printf(m, "SqWorkTime:\t%ldus\n", sq_work_time);
seq_printf(m, "UserFiles:\t%u\n", ctx->nr_user_files);

The working time of the sqpoll thread is obtained through ktime_get().
eg:

sqpoll.c:
+ktime_t start, diff;
+start = ktime_get();
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
	int ret = __io_sq_thread(ctx, cap_entries);

	if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
		sqt_spin = true;
}
if (io_run_task_work())
	sqt_spin = true;

+diff = ktime_sub(ktime_get(), start);
+if (sqt_spin == true)
+	sqd->work_time += ktime_to_us(diff);

The test results are as follows:
Every 2.0s: cat /proc/9230/fdinfo/6 | grep -E Sq
SqMask: 0x3
SqHead: 3197153
SqTail: 3197153
CachedSqHead:   3197153
SqThread:       9231
SqThreadCpu:    11
SqTotalTime:    92215321us
SqWorkTime:     15106578us

Do you think this solution work?


  parent reply	other threads:[~2024-01-17  9:39 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20231225055252epcas5p43ae8016d329b160f688def7b4f9d4ddb@epcas5p4.samsung.com>
2023-12-25  5:44 ` [PATCH v6] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
2023-12-26 16:32   ` Jens Axboe
2023-12-30 16:27     ` Pavel Begunkov
2023-12-30 17:41       ` Jens Axboe
2023-12-30 21:06         ` Pavel Begunkov
2023-12-30 22:17           ` Jens Axboe
2023-12-30 23:17             ` Pavel Begunkov
2023-12-30 23:24               ` Jens Axboe
     [not found]       ` <CGME20240103055746epcas5p148c2b06032e09956ddcfc72894abc82a@epcas5p1.samsung.com>
2024-01-03  5:49         ` Xiaobing Li
2024-01-05  4:02           ` Pavel Begunkov
     [not found]             ` <CGME20240110091327epcas5p493e0d77a122a067b6cd41ecbf92bd6eb@epcas5p4.samsung.com>
2024-01-10  9:05               ` Xiaobing Li
2024-01-10 16:15                 ` Jens Axboe
     [not found]                   ` <CGME20240112012013epcas5p38c70493069fb14da02befcf25e604bc1@epcas5p3.samsung.com>
2024-01-12  1:12                     ` Xiaobing Li
2024-01-12  2:58                       ` Jens Axboe
     [not found]                         ` <CGME20240117084516epcas5p2f0961781ff761ac3a3794c5ea80df45f@epcas5p2.samsung.com>
2024-01-17  8:37                           ` Xiaobing Li [this message]
2024-01-17 23:04                             ` Jens Axboe
     [not found]                               ` <CGME20240118023341epcas5p37b8c206d763fd56f8a9cfb3193744124@epcas5p3.samsung.com>
2024-01-18  2:25                                 ` Xiaobing Li
2024-01-18  2:56                             ` Pavel Begunkov
2024-01-11 13:12                 ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox