* [PATCH 0/3] Sq thread real utilization statistics.
[not found] <CGME20230928023004epcas5p4a6dac4cda9c867f3673690521a8c18b6@epcas5p4.samsung.com>
@ 2023-09-28 2:22 ` Xiaobing Li
[not found] ` <CGME20230928023007epcas5p276b6e029a67001a6ed8ab28c05b2be9c@epcas5p2.samsung.com>
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28 2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li
Summary:
The kernel's current PELT scheduling algorithm computes utilization from
the running time of a thread. For some threads, however, this can hide a
waste of CPU resources; the sq thread in io_uring is one example.
Since the sq thread runs a while(1) loop, it may spend long stretches
not processing any IO while its timeout period has not yet expired, so
the sqpoll thread keeps running and occupying the CPU. That CPU time is
effectively wasted.
Our goal is to count the portion of time the sqpoll thread actually
spends processing IO, thereby reflecting the share of its CPU that does
useful work, which can be used to help improve actual CPU utilization
in the future.
The modifications to the scheduling module are also applicable to other
threads with the same needs.
We use fio (version 3.28) to measure performance. In the experiments,
each fio process is viewed as an application that starts a job with
sq_poll enabled. The tests are performed on a host with 256 CPUs and
64 GB of memory, the IO tasks run on a PM1743 SSD, and the OS is Ubuntu
22.04 with kernel version 6.4.0.
Parameters for sequential reads and writes:
bs=128k, numjobs=1, iodepth=64.
Parameters for random reads and writes:
bs=4k, numjobs=16, iodepth=64.
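A fio job matching the sequential-read parameters above might look like
the following sketch (the device path and job name are illustrative;
sqthread_poll is what creates the sqpoll thread being measured):

```ini
; hypothetical job file for the sequential-read case above
[global]
ioengine=io_uring
; offload submission to the sqpoll kernel thread
sqthread_poll=1
direct=1
time_based=1
runtime=60

[seq-read]
; illustrative device path
filename=/dev/nvme0n1
rw=read
bs=128k
numjobs=1
iodepth=64
```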
The test results are as follows:

Before modification:
             read    write  randread  randwrite
  IOPS(K)    53.7    46.1       849        293
  BW(MB/s)   7033    6037      3476       1199

After modification:
             read    write  randread  randwrite
  IOPS(K)    53.7    46.1       847        293
  BW(MB/s)   7033    6042      3471       1199
As can be seen from the test results, the modifications have almost no
impact on performance.
Xiaobing Li (3):
SCHEDULER: Add an interface for counting real utilization.
PROC FILESYSTEM: Add real utilization data of sq thread.
IO_URING: Statistics of the true utilization of sq threads.
fs/proc/stat.c | 25 ++++++++++++++++++++++++-
include/linux/kernel.h | 7 ++++++-
include/linux/kernel_stat.h | 3 +++
include/linux/sched.h | 1 +
io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
kernel/sched/cputime.c | 36 +++++++++++++++++++++++++++++++++++-
kernel/sched/pelt.c | 14 ++++++++++++++
7 files changed, 108 insertions(+), 4 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
[not found] ` <CGME20230928023007epcas5p276b6e029a67001a6ed8ab28c05b2be9c@epcas5p2.samsung.com>
@ 2023-09-28 2:22 ` Xiaobing Li
2023-09-28 7:52 ` kernel test robot
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28 2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li
PELT takes the running time of a thread as its utilization, but some
threads, although running, are not actually processing any transactions
and are effectively idling. Our goal is to count the effective working
time of a thread, so as to calculate its true utilization.
Signed-off-by: Xiaobing Li <[email protected]>
---
include/linux/kernel.h | 7 ++++++-
include/linux/sched.h | 1 +
kernel/sched/cputime.c | 36 +++++++++++++++++++++++++++++++++++-
kernel/sched/pelt.c | 14 ++++++++++++++
4 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index cee8fe87e9f4..c1557fa9cbbe 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -37,7 +37,8 @@
#include <uapi/linux/kernel.h>
#define STACK_MAGIC 0xdeadbeef
-
+struct cfs_rq;
+struct sched_entity;
/**
* REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
* @x: value to repeat
@@ -103,6 +104,10 @@ extern int __cond_resched(void);
#elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
+extern void __update_sq_avg_block(u64 now, struct sched_entity *se);
+
+extern void __update_sq_avg(u64 now, struct sched_entity *se);
+
extern int __cond_resched(void);
DECLARE_STATIC_CALL(might_resched, __cond_resched);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..403ccb456c9a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -583,6 +583,7 @@ struct sched_entity {
* collide with read-mostly values above.
*/
struct sched_avg avg;
+ struct sched_avg sq_avg;
#endif
};
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index af7952f12e6c..824203293fd9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -479,6 +479,40 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
#else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE: */
+void get_sqthread_util(struct task_struct *p)
+{
+ struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
+
+ for (int i = 0; i < MAX_SQ_NUM; i++) {
+ if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
+ || sqstat[i]->__state == TASK_DEAD))
+ sqstat[i] = NULL;
+ }
+
+ if (strncmp(p->comm, "iou-sqp", 7))
+ return;
+
+ if (!kcpustat_this_cpu->flag) {
+ for (int j = 0; j < MAX_SQ_NUM; j++)
+ kcpustat_this_cpu->sq_util[j] = NULL;
+ kcpustat_this_cpu->flag = true;
+ }
+ int index = MAX_SQ_NUM;
+ bool flag = true;
+
+ for (int i = 0; i < MAX_SQ_NUM; i++) {
+ if (sqstat[i] == p)
+ flag = false;
+ if (!sqstat[i] || task_cpu(sqstat[i]) != task_cpu(p)) {
+ sqstat[i] = NULL;
+ if (i < index)
+ index = i;
+ }
+ }
+ if (flag && index < MAX_SQ_NUM)
+ sqstat[index] = p;
+}
+
/*
* Account a single tick of CPU time.
* @p: the process that the CPU time gets accounted to
@@ -487,7 +521,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
void account_process_tick(struct task_struct *p, int user_tick)
{
u64 cputime, steal;
-
+ get_sqthread_util(p);
if (vtime_accounting_enabled_this_cpu())
return;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 0f310768260c..945efe80e08c 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -266,6 +266,20 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
}
+void __update_sq_avg_block(u64 now, struct sched_entity *se)
+{
+ if (___update_load_sum(now, &se->sq_avg, 0, 0, 0))
+ ___update_load_avg(&se->sq_avg, se_weight(se));
+}
+
+void __update_sq_avg(u64 now, struct sched_entity *se)
+{
+ struct cfs_rq *qcfs_rq = cfs_rq_of(se);
+
+ if (___update_load_sum(now, &se->sq_avg, !!se->on_rq, se_runnable(se), qcfs_rq->curr == se))
+ ___update_load_avg(&se->sq_avg, se_weight(se));
+}
+
/*
* sched_entity:
*
--
2.34.1
* [PATCH 2/3] PROC FILESYSTEM: Add real utilization data of sq thread.
[not found] ` <CGME20230928023011epcas5p32668b193b447f1cd6bf78035f4894c42@epcas5p3.samsung.com>
@ 2023-09-28 2:22 ` Xiaobing Li
0 siblings, 0 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28 2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li
Since the real utilization of the sq thread needs to be counted and
recorded, export it through the /proc/stat file.
Signed-off-by: Xiaobing Li <[email protected]>
---
fs/proc/stat.c | 25 ++++++++++++++++++++++++-
include/linux/kernel_stat.h | 3 +++
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index da60956b2915..bd86c0657874 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -79,6 +79,29 @@ static void show_all_irqs(struct seq_file *p)
show_irq_gap(p, nr_irqs - next);
}
+static void show_sqthread_util(struct seq_file *p)
+{
+ int i, j;
+
+ for_each_online_cpu(i) {
+ struct kernel_cpustat kcpustat;
+
+ kcpustat_cpu_fetch(&kcpustat, i);
+ struct task_struct **sqstat = kcpustat.sq_util;
+
+ for (j = 0; j < MAX_SQ_NUM; j++) {
+ if (sqstat[j]) {
+ seq_printf(p, "%d %s", sqstat[j]->pid, sqstat[j]->comm);
+ seq_put_decimal_ull(p, " pelt ",
+ (unsigned long long)sqstat[j]->se.avg.util_avg);
+ seq_put_decimal_ull(p, " real ",
+ (unsigned long long)sqstat[j]->se.sq_avg.util_avg);
+ seq_putc(p, '\n');
+ }
+ }
+ }
+}
+
static int show_stat(struct seq_file *p, void *v)
{
int i, j;
@@ -187,7 +210,7 @@ static int show_stat(struct seq_file *p, void *v)
for (i = 0; i < NR_SOFTIRQS; i++)
seq_put_decimal_ull(p, " ", per_softirq_sums[i]);
seq_putc(p, '\n');
-
+ show_sqthread_util(p);
return 0;
}
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 9935f7ecbfb9..722703da681e 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -11,6 +11,7 @@
#include <linux/vtime.h>
#include <asm/irq.h>
+#define MAX_SQ_NUM 16
/*
* 'kernel_stat.h' contains the definitions needed for doing
* some kernel statistics (CPU usage, context switches ...),
@@ -36,6 +37,8 @@ enum cpu_usage_stat {
struct kernel_cpustat {
u64 cpustat[NR_STATS];
+ bool flag;
+ struct task_struct *sq_util[MAX_SQ_NUM];
};
struct kernel_stat {
--
2.34.1
* [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
[not found] ` <CGME20230928023015epcas5p273b3eaebf3759790c278b03c7f0341c8@epcas5p2.samsung.com>
@ 2023-09-28 2:22 ` Xiaobing Li
2023-09-28 8:01 ` Peter Zijlstra
2023-09-28 18:33 ` kernel test robot
0 siblings, 2 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28 2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li
Since the sq thread runs a while(1) loop, it may spend long stretches
not processing IO while its timeout period has not yet expired, so the
sqpoll thread keeps running and occupying the CPU. That CPU time is
effectively wasted. Our goal is to count the portion of time the sqpoll
thread actually spends processing IO, thereby reflecting the share of
its CPU that does useful work, which can be used to help improve actual
CPU utilization in the future.
Signed-off-by: Xiaobing Li <[email protected]>
---
io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..2c5fc4d95fa8 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -11,6 +11,7 @@
#include <linux/audit.h>
#include <linux/security.h>
#include <linux/io_uring.h>
+#include <linux/time.h>
#include <uapi/linux/io_uring.h>
@@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
set_cpus_allowed_ptr(current, cpu_online_mask);
mutex_lock(&sqd->lock);
+ bool first = true;
+ struct timespec64 ts_start, ts_end;
+ struct timespec64 ts_delta;
+ struct sched_entity *se = &sqd->thread->se;
while (1) {
bool cap_entries, sqt_spin = false;
@@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
break;
timeout = jiffies + sqd->sq_thread_idle;
}
-
+ ktime_get_boottime_ts64(&ts_start);
+ ts_delta = timespec64_sub(ts_start, ts_end);
+ unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+ se->sq_avg.last_update_time;
+
+ if (first) {
+ now = 0;
+ first = false;
+ }
+ __update_sq_avg_block(now, se);
cap_entries = !list_is_singular(&sqd->ctx_list);
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
int ret = __io_sq_thread(ctx, cap_entries);
@@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
sqt_spin = true;
}
+
+ ktime_get_boottime_ts64(&ts_end);
+ ts_delta = timespec64_sub(ts_end, ts_start);
+ now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+ se->sq_avg.last_update_time;
+
+ if (sqt_spin)
+ __update_sq_avg(now, se);
+ else
+ __update_sq_avg_block(now, se);
if (io_run_task_work())
sqt_spin = true;
--
2.34.1
* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
2023-09-28 2:22 ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
@ 2023-09-28 7:52 ` kernel test robot
2023-09-28 8:02 ` Peter Zijlstra
2023-09-29 0:31 ` Thomas Gleixner
2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-09-28 7:52 UTC (permalink / raw)
To: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: oe-kbuild-all, linux-kernel, linux-fsdevel, io-uring, kun.dou,
peiwei.li, joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang,
Xiaobing Li
Hi Xiaobing,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.6-rc3 next-20230928]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219
base: tip/sched/core
patch link: https://lore.kernel.org/r/20230928022228.15770-2-xiaobing.li%40samsung.com
patch subject: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230928/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230928/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All warnings (new ones prefixed by >>):
In file included from kernel/sched/build_policy.c:52:
>> kernel/sched/cputime.c:482:6: warning: no previous prototype for 'get_sqthread_util' [-Wmissing-prototypes]
482 | void get_sqthread_util(struct task_struct *p)
| ^~~~~~~~~~~~~~~~~
kernel/sched/cputime.c: In function 'get_sqthread_util':
kernel/sched/cputime.c:484:56: error: 'struct kernel_cpustat' has no member named 'sq_util'
484 | struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
| ^~
kernel/sched/cputime.c:486:29: error: 'MAX_SQ_NUM' undeclared (first use in this function)
486 | for (int i = 0; i < MAX_SQ_NUM; i++) {
| ^~~~~~~~~~
kernel/sched/cputime.c:486:29: note: each undeclared identifier is reported only once for each function it appears in
kernel/sched/cputime.c:495:31: error: 'struct kernel_cpustat' has no member named 'flag'
495 | if (!kcpustat_this_cpu->flag) {
| ^~
kernel/sched/cputime.c:497:42: error: 'struct kernel_cpustat' has no member named 'sq_util'
497 | kcpustat_this_cpu->sq_util[j] = NULL;
| ^~
kernel/sched/cputime.c:498:34: error: 'struct kernel_cpustat' has no member named 'flag'
498 | kcpustat_this_cpu->flag = true;
| ^~
vim +/get_sqthread_util +482 kernel/sched/cputime.c
481
> 482 void get_sqthread_util(struct task_struct *p)
483 {
484 struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
485
486 for (int i = 0; i < MAX_SQ_NUM; i++) {
487 if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
488 || sqstat[i]->__state == TASK_DEAD))
489 sqstat[i] = NULL;
490 }
491
492 if (strncmp(p->comm, "iou-sqp", 7))
493 return;
494
495 if (!kcpustat_this_cpu->flag) {
496 for (int j = 0; j < MAX_SQ_NUM; j++)
497 kcpustat_this_cpu->sq_util[j] = NULL;
498 kcpustat_this_cpu->flag = true;
499 }
500 int index = MAX_SQ_NUM;
501 bool flag = true;
502
503 for (int i = 0; i < MAX_SQ_NUM; i++) {
504 if (sqstat[i] == p)
505 flag = false;
506 if (!sqstat[i] || task_cpu(sqstat[i]) != task_cpu(p)) {
507 sqstat[i] = NULL;
508 if (i < index)
509 index = i;
510 }
511 }
512 if (flag && index < MAX_SQ_NUM)
513 sqstat[index] = p;
514 }
515
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
2023-09-28 2:22 ` [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads Xiaobing Li
@ 2023-09-28 8:01 ` Peter Zijlstra
2023-09-28 8:23 ` Jens Axboe
2023-09-28 8:37 ` Matthew Wilcox
2023-09-28 18:33 ` kernel test robot
1 sibling, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2023-09-28 8:01 UTC (permalink / raw)
To: Xiaobing Li
Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang
On Thu, Sep 28, 2023 at 10:22:28AM +0800, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.
>
> Signed-off-by: Xiaobing Li <[email protected]>
> ---
> io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
> 1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..2c5fc4d95fa8 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -11,6 +11,7 @@
> #include <linux/audit.h>
> #include <linux/security.h>
> #include <linux/io_uring.h>
> +#include <linux/time.h>
>
> #include <uapi/linux/io_uring.h>
>
> @@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
> set_cpus_allowed_ptr(current, cpu_online_mask);
>
> mutex_lock(&sqd->lock);
> + bool first = true;
> + struct timespec64 ts_start, ts_end;
> + struct timespec64 ts_delta;
> + struct sched_entity *se = &sqd->thread->se;
> while (1) {
> bool cap_entries, sqt_spin = false;
>
> @@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
> break;
> timeout = jiffies + sqd->sq_thread_idle;
> }
> -
> + ktime_get_boottime_ts64(&ts_start);
> + ts_delta = timespec64_sub(ts_start, ts_end);
> + unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> + se->sq_avg.last_update_time;
> +
> + if (first) {
> + now = 0;
> + first = false;
> + }
> + __update_sq_avg_block(now, se);
> cap_entries = !list_is_singular(&sqd->ctx_list);
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> int ret = __io_sq_thread(ctx, cap_entries);
> @@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
> if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
> sqt_spin = true;
> }
> +
> + ktime_get_boottime_ts64(&ts_end);
> + ts_delta = timespec64_sub(ts_end, ts_start);
> + now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> + se->sq_avg.last_update_time;
> +
> + if (sqt_spin)
> + __update_sq_avg(now, se);
> + else
> + __update_sq_avg_block(now, se);
> if (io_run_task_work())
> sqt_spin = true;
>
All of this is quite insane, but the above is actually broken. You're
using wall-time to measure runtime of a preemptible thread.
On top of that, for extra insanity, you're using the frigging insane
timespec interface, because clearly the _ns() interfaces are too
complicated or something?
And that whole first thing is daft too, pull now out of the loop and
set it before, then all that goes away.
Now, I see what you're trying to do, but who actually uses this data?
Finally, please don't scream in the subject :/
* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
2023-09-28 2:22 ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
2023-09-28 7:52 ` kernel test robot
@ 2023-09-28 8:02 ` Peter Zijlstra
2023-09-29 0:31 ` Thomas Gleixner
2 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2023-09-28 8:02 UTC (permalink / raw)
To: Xiaobing Li
Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang
On Thu, Sep 28, 2023 at 10:22:26AM +0800, Xiaobing Li wrote:
> + for (int i = 0; i < MAX_SQ_NUM; i++) {
> + if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
> + || sqstat[i]->__state == TASK_DEAD))
> + sqstat[i] = NULL;
> + }
This coding style is horrific, please don't ever use that again.
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
2023-09-28 8:01 ` Peter Zijlstra
@ 2023-09-28 8:23 ` Jens Axboe
2023-09-28 8:37 ` Matthew Wilcox
1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2023-09-28 8:23 UTC (permalink / raw)
To: Peter Zijlstra, Xiaobing Li
Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
bsegall, mgorman, bristot, vschneid, asml.silence, linux-kernel,
linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
kundan.kumar, wenwen.chen, ruyi.zhang
On 9/28/23 2:01 AM, Peter Zijlstra wrote:
> All of this is quite insane, but the above is actually broken. You're
> using wall-time to measure runtime of a preemptible thread.
Would have to agree with that... wall-time specifics aside, this whole
thing seems to attempt to solve an issue quite the wrong way around.
> Now, I see what you're trying to do, but who actually uses this data?
This is the important question - and if you need this data, then why not
just account it in sqpoll itself and have some way to export it? Let's
please not implicate the core kernel bits for that.
--
Jens Axboe
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
2023-09-28 8:01 ` Peter Zijlstra
2023-09-28 8:23 ` Jens Axboe
@ 2023-09-28 8:37 ` Matthew Wilcox
2023-09-28 8:41 ` Jens Axboe
1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-09-28 8:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang
On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
> Now, I see what you're trying to do, but who actually uses this data?
I ... don't. There seems to be the notion that since we're polling, that
shouldn't count against the runtime of the thread. But the thread has
chosen to poll! It is doing something! For one thing, it's preventing
the CPU from entering an idle state. It seems absolutely fair to
account this poll time to the runtime of the thread. Clearly I'm
missing something.
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
2023-09-28 8:37 ` Matthew Wilcox
@ 2023-09-28 8:41 ` Jens Axboe
0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2023-09-28 8:41 UTC (permalink / raw)
To: Matthew Wilcox, Peter Zijlstra
Cc: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
rostedt, bsegall, mgorman, bristot, vschneid, asml.silence,
linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang
On 9/28/23 2:37 AM, Matthew Wilcox wrote:
> On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
>> Now, I see what you're trying to do, but who actually uses this data?
>
> I ... don't. There seems to be the notion that since we're polling, that
> shouldn't count against the runtime of the thread. But the thread has
> chosen to poll! It is doing something! For one thing, it's preventing
> the CPU from entering an idle state. It seems absolutely fair to
> accuont this poll time to the runtime of the thread. Clearly i'm
> missing something.
For sure, it should be accounted as CPU time, as it is exactly that. You
could argue that if we needed to preempt this task for something else we
would do that (and the code does check that on every loop), but it's
still using CPU.
I can see maybe wanting to know how much of the total time the thread
spent doing ACTUAL work rather than just polling for new work, but
that's not really something the scheduler should be involved in and
should be purely an io_uring sqpoll stat of some sort if that is truly
interesting for an application.
--
Jens Axboe
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
2023-09-28 2:22 ` [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads Xiaobing Li
2023-09-28 8:01 ` Peter Zijlstra
@ 2023-09-28 18:33 ` kernel test robot
1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-09-28 18:33 UTC (permalink / raw)
To: Xiaobing Li, mingo, peterz, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
axboe, asml.silence
Cc: llvm, oe-kbuild-all, linux-kernel, linux-fsdevel, io-uring,
kun.dou, peiwei.li, joshi.k, kundan.kumar, wenwen.chen,
ruyi.zhang, Xiaobing Li
Hi Xiaobing,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master v6.6-rc3 next-20230928]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219
base: tip/sched/core
patch link: https://lore.kernel.org/r/20230928022228.15770-4-xiaobing.li%40samsung.com
patch subject: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
config: arm-mmp2_defconfig (https://download.01.org/0day-ci/archive/20230929/[email protected]/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230929/[email protected]/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
All errors (new ones prefixed by >>):
io_uring/sqpoll.c:254:7: error: no member named 'sq_avg' in 'struct sched_entity'
se->sq_avg.last_update_time;
~~ ^
>> io_uring/sqpoll.c:260:3: error: call to undeclared function '__update_sq_avg_block'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
__update_sq_avg_block(now, se);
^
io_uring/sqpoll.c:272:7: error: no member named 'sq_avg' in 'struct sched_entity'
se->sq_avg.last_update_time;
~~ ^
>> io_uring/sqpoll.c:275:4: error: call to undeclared function '__update_sq_avg'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
__update_sq_avg(now, se);
^
4 errors generated.
vim +/__update_sq_avg_block +260 io_uring/sqpoll.c
221
222 static int io_sq_thread(void *data)
223 {
224 struct io_sq_data *sqd = data;
225 struct io_ring_ctx *ctx;
226 unsigned long timeout = 0;
227 char buf[TASK_COMM_LEN];
228 DEFINE_WAIT(wait);
229
230 snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
231 set_task_comm(current, buf);
232
233 if (sqd->sq_cpu != -1)
234 set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
235 else
236 set_cpus_allowed_ptr(current, cpu_online_mask);
237
238 mutex_lock(&sqd->lock);
239 bool first = true;
240 struct timespec64 ts_start, ts_end;
241 struct timespec64 ts_delta;
242 struct sched_entity *se = &sqd->thread->se;
243 while (1) {
244 bool cap_entries, sqt_spin = false;
245
246 if (io_sqd_events_pending(sqd) || signal_pending(current)) {
247 if (io_sqd_handle_event(sqd))
248 break;
249 timeout = jiffies + sqd->sq_thread_idle;
250 }
251 ktime_get_boottime_ts64(&ts_start);
252 ts_delta = timespec64_sub(ts_start, ts_end);
253 unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
254 se->sq_avg.last_update_time;
255
256 if (first) {
257 now = 0;
258 first = false;
259 }
> 260 __update_sq_avg_block(now, se);
261 cap_entries = !list_is_singular(&sqd->ctx_list);
262 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
263 int ret = __io_sq_thread(ctx, cap_entries);
264
265 if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
266 sqt_spin = true;
267 }
268
269 ktime_get_boottime_ts64(&ts_end);
270 ts_delta = timespec64_sub(ts_end, ts_start);
271 now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
272 se->sq_avg.last_update_time;
273
274 if (sqt_spin)
> 275 __update_sq_avg(now, se);
276 else
277 __update_sq_avg_block(now, se);
278 if (io_run_task_work())
279 sqt_spin = true;
280
281 if (sqt_spin || !time_after(jiffies, timeout)) {
282 if (sqt_spin)
283 timeout = jiffies + sqd->sq_thread_idle;
284 if (unlikely(need_resched())) {
285 mutex_unlock(&sqd->lock);
286 cond_resched();
287 mutex_lock(&sqd->lock);
288 }
289 continue;
290 }
291
292 prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
293 if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
294 bool needs_sched = true;
295
296 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
297 atomic_or(IORING_SQ_NEED_WAKEUP,
298 &ctx->rings->sq_flags);
299 if ((ctx->flags & IORING_SETUP_IOPOLL) &&
300 !wq_list_empty(&ctx->iopoll_list)) {
301 needs_sched = false;
302 break;
303 }
304
305 /*
306 * Ensure the store of the wakeup flag is not
307 * reordered with the load of the SQ tail
308 */
309 smp_mb__after_atomic();
310
311 if (io_sqring_entries(ctx)) {
312 needs_sched = false;
313 break;
314 }
315 }
316
317 if (needs_sched) {
318 mutex_unlock(&sqd->lock);
319 schedule();
320 mutex_lock(&sqd->lock);
321 }
322 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
323 atomic_andnot(IORING_SQ_NEED_WAKEUP,
324 &ctx->rings->sq_flags);
325 }
326
327 finish_wait(&sqd->wait, &wait);
328 timeout = jiffies + sqd->sq_thread_idle;
329 }
330
331 io_uring_cancel_generic(true, sqd);
332 sqd->thread = NULL;
333 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
334 atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
335 io_run_task_work();
336 mutex_unlock(&sqd->lock);
337
338 complete(&sqd->exited);
339 do_exit(0);
340 }
341
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
2023-09-28 2:22 ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
2023-09-28 7:52 ` kernel test robot
2023-09-28 8:02 ` Peter Zijlstra
@ 2023-09-29 0:31 ` Thomas Gleixner
2 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2023-09-29 0:31 UTC (permalink / raw)
To: Xiaobing Li, mingo, peterz, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li
On Thu, Sep 28 2023 at 10:22, Xiaobing Li wrote:
> Since pelt takes the running time of the thread as utilization, and for
> some threads, although they are running, they are not actually
> processing any transactions and are in an idling state.
> our goal is to count the effective working time of the thread, so as to
> Calculate the true utilization of threads.
Sorry. I can't figure out from the above what you are trying to achieve
and which problem this is actually solving.
> +void get_sqthread_util(struct task_struct *p)
> +{
> + struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
> +
> + for (int i = 0; i < MAX_SQ_NUM; i++) {
> + if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
> + || sqstat[i]->__state == TASK_DEAD))
> + sqstat[i] = NULL;
> + }
This is unreadable.
> +
> + if (strncmp(p->comm, "iou-sqp", 7))
> + return;
You really want to do hard coded string parsing on every invocation of
this function, which happens at least once per tick, right?
What's so special about iou-sqp?
Nothing at all. It might be special for your particular workload but its
completely irrelevant to everyone else.
We are not adding random statistics to the hotpath just because.
Please come back once you have a proper justification for imposing this
on everyone who is not using io_uring at all.
Thanks
tglx