public inbox for [email protected]
* [PATCH 0/3] Sq thread real utilization statistics.
       [not found] <CGME20230928023004epcas5p4a6dac4cda9c867f3673690521a8c18b6@epcas5p4.samsung.com>
@ 2023-09-28  2:22 ` Xiaobing Li
       [not found]   ` <CGME20230928023007epcas5p276b6e029a67001a6ed8ab28c05b2be9c@epcas5p2.samsung.com>
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
  Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

Summary:

The kernel's PELT scheduling algorithm derives utilization from a
thread's running time. For some threads, however, running time overstates
useful work and CPU resources are wasted; the sq thread in io_uring is
one example.
Because the sq thread runs a while(1) loop, it can spend a lot of time
not processing any IO while its idle timeout has not yet expired. During
that time the sqpoll thread keeps running and occupying the CPU, so the
CPU is effectively wasted.
Our goal is to count the portion of time the sqpoll thread actually
spends processing IO, reflecting how much of its CPU time goes to IO
work. This can be used to help improve actual CPU utilization in the
future.
The scheduler modifications are also usable by other threads with the
same need.
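
For reference, the sqpoll main loop has roughly the following shape (a
simplified sketch for illustration only; the helper names below are made
up and this is not the actual io_uring code):

	while (1) {
		/* try to submit/complete pending requests; may find nothing */
		did_work = process_pending_io();
		if (did_work)
			timeout = jiffies + sq_thread_idle;
		else if (time_after(jiffies, timeout))
			wait_for_wakeup();	/* only here does the thread stop using the CPU */
	}

PELT counts every iteration as runtime, including the iterations in which
process_pending_io() found nothing to do.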

We use fio (version 3.28) to measure performance. In the experiments,
each fio process is viewed as an application and starts its job with
sq_poll enabled. The tests run on a host with 256 CPUs and 64 GB of
memory, the IO targets a PM1743 SSD, and the OS is Ubuntu 22.04 with
kernel version 6.4.0.

The parameters for sequential read/write are: bs=128k, numjobs=1,
iodepth=64.
The parameters for random read/write are: bs=4k, numjobs=16, iodepth=64.
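
For reference, a representative fio command line for the sequential read
case could look like the following (the device path, job name and runtime
options are placeholders; the cover letter does not specify them):

  fio --name=seqread --filename=/dev/nvme0n1 --direct=1 \
      --ioengine=io_uring --sqthread_poll=1 --rw=read \
      --bs=128k --numjobs=1 --iodepth=64 --runtime=60 --time_based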

The test results are as follows:
Before modification
         read   write   randread   randwrite
IOPS(K)  53.7   46.1    849        293
BW(MB/S) 7033   6037    3476       1199

After modification
         read   write   randread   randwrite
IOPS(K)  53.7   46.1    847        293
BW(MB/S) 7033   6042    3471       1199

It can be seen from the test results that my modifications have almost
no impact on performance.

Xiaobing Li (3):
  SCHEDULER: Add an interface for counting real utilization.
  PROC FILESYSTEM: Add real utilization data of sq thread.
  IO_URING: Statistics of the true utilization of sq threads.

 fs/proc/stat.c              | 25 ++++++++++++++++++++++++-
 include/linux/kernel.h      |  7 ++++++-
 include/linux/kernel_stat.h |  3 +++
 include/linux/sched.h       |  1 +
 io_uring/sqpoll.c           | 26 +++++++++++++++++++++++++-
 kernel/sched/cputime.c      | 36 +++++++++++++++++++++++++++++++++++-
 kernel/sched/pelt.c         | 14 ++++++++++++++
 7 files changed, 108 insertions(+), 4 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
       [not found]   ` <CGME20230928023007epcas5p276b6e029a67001a6ed8ab28c05b2be9c@epcas5p2.samsung.com>
@ 2023-09-28  2:22     ` Xiaobing Li
  2023-09-28  7:52       ` kernel test robot
                         ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
  Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

PELT takes a thread's running time as its utilization, but some threads,
although running, are not actually processing any work and are
effectively idling.
Our goal is to count the effective working time of such threads, so that
their true utilization can be calculated.

Signed-off-by: Xiaobing Li <[email protected]>
---
 include/linux/kernel.h |  7 ++++++-
 include/linux/sched.h  |  1 +
 kernel/sched/cputime.c | 36 +++++++++++++++++++++++++++++++++++-
 kernel/sched/pelt.c    | 14 ++++++++++++++
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index cee8fe87e9f4..c1557fa9cbbe 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -37,7 +37,8 @@
 #include <uapi/linux/kernel.h>
 
 #define STACK_MAGIC	0xdeadbeef
-
+struct cfs_rq;
+struct sched_entity;
 /**
  * REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
  * @x: value to repeat
@@ -103,6 +104,10 @@ extern int __cond_resched(void);
 
 #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
 
+extern void __update_sq_avg_block(u64 now, struct sched_entity *se);
+
+extern void __update_sq_avg(u64 now, struct sched_entity *se);
+
 extern int __cond_resched(void);
 
 DECLARE_STATIC_CALL(might_resched, __cond_resched);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..403ccb456c9a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -583,6 +583,7 @@ struct sched_entity {
 	 * collide with read-mostly values above.
 	 */
 	struct sched_avg		avg;
+	struct sched_avg		sq_avg;
 #endif
 };
 
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index af7952f12e6c..824203293fd9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -479,6 +479,40 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 
 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE: */
 
+void get_sqthread_util(struct task_struct *p)
+{
+	struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
+
+	for (int i = 0; i < MAX_SQ_NUM; i++) {
+		if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
+		|| sqstat[i]->__state == TASK_DEAD))
+			sqstat[i] = NULL;
+	}
+
+	if (strncmp(p->comm, "iou-sqp", 7))
+		return;
+
+	if (!kcpustat_this_cpu->flag) {
+		for (int j = 0; j < MAX_SQ_NUM; j++)
+			kcpustat_this_cpu->sq_util[j] = NULL;
+		kcpustat_this_cpu->flag = true;
+	}
+	int index = MAX_SQ_NUM;
+	bool flag = true;
+
+	for (int i = 0; i < MAX_SQ_NUM; i++) {
+		if (sqstat[i] == p)
+			flag = false;
+		if (!sqstat[i] || task_cpu(sqstat[i]) != task_cpu(p)) {
+			sqstat[i] = NULL;
+			if (i < index)
+				index = i;
+		}
+	}
+	if (flag && index < MAX_SQ_NUM)
+		sqstat[index] = p;
+}
+
 /*
  * Account a single tick of CPU time.
  * @p: the process that the CPU time gets accounted to
@@ -487,7 +521,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 void account_process_tick(struct task_struct *p, int user_tick)
 {
 	u64 cputime, steal;
-
+	get_sqthread_util(p);
 	if (vtime_accounting_enabled_this_cpu())
 		return;
 
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 0f310768260c..945efe80e08c 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -266,6 +266,20 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
 	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
 }
 
+void __update_sq_avg_block(u64 now, struct sched_entity *se)
+{
+	if (___update_load_sum(now, &se->sq_avg, 0, 0, 0))
+		___update_load_avg(&se->sq_avg, se_weight(se));
+}
+
+void __update_sq_avg(u64 now, struct sched_entity *se)
+{
+	struct cfs_rq *qcfs_rq = cfs_rq_of(se);
+
+	if (___update_load_sum(now, &se->sq_avg, !!se->on_rq, se_runnable(se), qcfs_rq->curr == se))
+		___update_load_avg(&se->sq_avg, se_weight(se));
+}
+
 /*
  * sched_entity:
  *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/3] PROC FILESYSTEM: Add real utilization data of sq thread.
       [not found]   ` <CGME20230928023011epcas5p32668b193b447f1cd6bf78035f4894c42@epcas5p3.samsung.com>
@ 2023-09-28  2:22     ` Xiaobing Li
  0 siblings, 0 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
  Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

The real utilization of the sq thread needs to be counted and reported,
so record it in the /proc/stat file.
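
For illustration, the lines appended to /proc/stat look roughly like the
following (pid, comm and values here are made up):

  1234 iou-sqp-1230 pelt 512 real 87
  1240 iou-sqp-1236 pelt 498 real 123

where "pelt" is the existing se.avg.util_avg of the sq thread and "real"
is the new se.sq_avg.util_avg introduced by this series.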

Signed-off-by: Xiaobing Li <[email protected]>
---
 fs/proc/stat.c              | 25 ++++++++++++++++++++++++-
 include/linux/kernel_stat.h |  3 +++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index da60956b2915..bd86c0657874 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -79,6 +79,29 @@ static void show_all_irqs(struct seq_file *p)
 	show_irq_gap(p, nr_irqs - next);
 }
 
+static void show_sqthread_util(struct seq_file *p)
+{
+	int i, j;
+
+	for_each_online_cpu(i) {
+		struct kernel_cpustat kcpustat;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
+		struct task_struct **sqstat = kcpustat.sq_util;
+
+		for (j = 0; j < MAX_SQ_NUM; j++) {
+			if (sqstat[j]) {
+				seq_printf(p, "%d %s", sqstat[j]->pid, sqstat[j]->comm);
+				seq_put_decimal_ull(p, " pelt ",
+				(unsigned long long)sqstat[j]->se.avg.util_avg);
+				seq_put_decimal_ull(p, " real ",
+				(unsigned long long)sqstat[j]->se.sq_avg.util_avg);
+				seq_putc(p, '\n');
+			}
+		}
+	}
+}
+
 static int show_stat(struct seq_file *p, void *v)
 {
 	int i, j;
@@ -187,7 +210,7 @@ static int show_stat(struct seq_file *p, void *v)
 	for (i = 0; i < NR_SOFTIRQS; i++)
 		seq_put_decimal_ull(p, " ", per_softirq_sums[i]);
 	seq_putc(p, '\n');
-
+	show_sqthread_util(p);
 	return 0;
 }
 
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 9935f7ecbfb9..722703da681e 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -11,6 +11,7 @@
 #include <linux/vtime.h>
 #include <asm/irq.h>
 
+#define MAX_SQ_NUM 16
 /*
  * 'kernel_stat.h' contains the definitions needed for doing
  * some kernel statistics (CPU usage, context switches ...),
@@ -36,6 +37,8 @@ enum cpu_usage_stat {
 
 struct kernel_cpustat {
 	u64 cpustat[NR_STATS];
+	bool flag;
+	struct task_struct *sq_util[MAX_SQ_NUM];
 };
 
 struct kernel_stat {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
       [not found]   ` <CGME20230928023015epcas5p273b3eaebf3759790c278b03c7f0341c8@epcas5p2.samsung.com>
@ 2023-09-28  2:22     ` Xiaobing Li
  2023-09-28  8:01       ` Peter Zijlstra
  2023-09-28 18:33       ` kernel test robot
  0 siblings, 2 replies; 12+ messages in thread
From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
  Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

Because the sq thread runs a while(1) loop, it can spend a lot of time
not processing IO while its idle timeout has not yet expired. The sqpoll
thread therefore keeps running and occupying the CPU, and that CPU time
is effectively wasted. Our goal is to count the portion of time the
sqpoll thread actually spends processing IO, reflecting how much of its
CPU time goes to IO work; this can be used to help improve actual CPU
utilization in the future.

Signed-off-by: Xiaobing Li <[email protected]>
---
 io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..2c5fc4d95fa8 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -11,6 +11,7 @@
 #include <linux/audit.h>
 #include <linux/security.h>
 #include <linux/io_uring.h>
+#include <linux/time.h>
 
 #include <uapi/linux/io_uring.h>
 
@@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
 		set_cpus_allowed_ptr(current, cpu_online_mask);
 
 	mutex_lock(&sqd->lock);
+	bool first = true;
+	struct timespec64 ts_start, ts_end;
+	struct timespec64 ts_delta;
+	struct sched_entity *se = &sqd->thread->se;
 	while (1) {
 		bool cap_entries, sqt_spin = false;
 
@@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
 				break;
 			timeout = jiffies + sqd->sq_thread_idle;
 		}
-
+		ktime_get_boottime_ts64(&ts_start);
+		ts_delta = timespec64_sub(ts_start, ts_end);
+		unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+		se->sq_avg.last_update_time;
+
+		if (first) {
+			now = 0;
+			first = false;
+		}
+		__update_sq_avg_block(now, se);
 		cap_entries = !list_is_singular(&sqd->ctx_list);
 		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
 			int ret = __io_sq_thread(ctx, cap_entries);
@@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
 			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
 				sqt_spin = true;
 		}
+
+		ktime_get_boottime_ts64(&ts_end);
+		ts_delta = timespec64_sub(ts_end, ts_start);
+		now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+		se->sq_avg.last_update_time;
+
+		if (sqt_spin)
+			__update_sq_avg(now, se);
+		else
+			__update_sq_avg_block(now, se);
 		if (io_run_task_work())
 			sqt_spin = true;
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
  2023-09-28  2:22     ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
@ 2023-09-28  7:52       ` kernel test robot
  2023-09-28  8:02       ` Peter Zijlstra
  2023-09-29  0:31       ` Thomas Gleixner
  2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-09-28  7:52 UTC (permalink / raw)
  To: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
  Cc: oe-kbuild-all, linux-kernel, linux-fsdevel, io-uring, kun.dou,
	peiwei.li, joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang,
	Xiaobing Li

Hi Xiaobing,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.6-rc3 next-20230928]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20230928022228.15770-2-xiaobing.li%40samsung.com
patch subject: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230928/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230928/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

   In file included from kernel/sched/build_policy.c:52:
>> kernel/sched/cputime.c:482:6: warning: no previous prototype for 'get_sqthread_util' [-Wmissing-prototypes]
     482 | void get_sqthread_util(struct task_struct *p)
         |      ^~~~~~~~~~~~~~~~~
   kernel/sched/cputime.c: In function 'get_sqthread_util':
   kernel/sched/cputime.c:484:56: error: 'struct kernel_cpustat' has no member named 'sq_util'
     484 |         struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
         |                                                        ^~
   kernel/sched/cputime.c:486:29: error: 'MAX_SQ_NUM' undeclared (first use in this function)
     486 |         for (int i = 0; i < MAX_SQ_NUM; i++) {
         |                             ^~~~~~~~~~
   kernel/sched/cputime.c:486:29: note: each undeclared identifier is reported only once for each function it appears in
   kernel/sched/cputime.c:495:31: error: 'struct kernel_cpustat' has no member named 'flag'
     495 |         if (!kcpustat_this_cpu->flag) {
         |                               ^~
   kernel/sched/cputime.c:497:42: error: 'struct kernel_cpustat' has no member named 'sq_util'
     497 |                         kcpustat_this_cpu->sq_util[j] = NULL;
         |                                          ^~
   kernel/sched/cputime.c:498:34: error: 'struct kernel_cpustat' has no member named 'flag'
     498 |                 kcpustat_this_cpu->flag = true;
         |                                  ^~


vim +/get_sqthread_util +482 kernel/sched/cputime.c

   481	
 > 482	void get_sqthread_util(struct task_struct *p)
   483	{
   484		struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
   485	
   486		for (int i = 0; i < MAX_SQ_NUM; i++) {
   487			if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
   488			|| sqstat[i]->__state == TASK_DEAD))
   489				sqstat[i] = NULL;
   490		}
   491	
   492		if (strncmp(p->comm, "iou-sqp", 7))
   493			return;
   494	
   495		if (!kcpustat_this_cpu->flag) {
   496			for (int j = 0; j < MAX_SQ_NUM; j++)
   497				kcpustat_this_cpu->sq_util[j] = NULL;
   498			kcpustat_this_cpu->flag = true;
   499		}
   500		int index = MAX_SQ_NUM;
   501		bool flag = true;
   502	
   503		for (int i = 0; i < MAX_SQ_NUM; i++) {
   504			if (sqstat[i] == p)
   505				flag = false;
   506			if (!sqstat[i] || task_cpu(sqstat[i]) != task_cpu(p)) {
   507				sqstat[i] = NULL;
   508				if (i < index)
   509					index = i;
   510			}
   511		}
   512		if (flag && index < MAX_SQ_NUM)
   513			sqstat[index] = p;
   514	}
   515	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
  2023-09-28  2:22     ` [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads Xiaobing Li
@ 2023-09-28  8:01       ` Peter Zijlstra
  2023-09-28  8:23         ` Jens Axboe
  2023-09-28  8:37         ` Matthew Wilcox
  2023-09-28 18:33       ` kernel test robot
  1 sibling, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2023-09-28  8:01 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
	linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang

On Thu, Sep 28, 2023 at 10:22:28AM +0800, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.
> 
> Signed-off-by: Xiaobing Li <[email protected]>
> ---
>  io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..2c5fc4d95fa8 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -11,6 +11,7 @@
>  #include <linux/audit.h>
>  #include <linux/security.h>
>  #include <linux/io_uring.h>
> +#include <linux/time.h>
>  
>  #include <uapi/linux/io_uring.h>
>  
> @@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
>  		set_cpus_allowed_ptr(current, cpu_online_mask);
>  
>  	mutex_lock(&sqd->lock);
> +	bool first = true;
> +	struct timespec64 ts_start, ts_end;
> +	struct timespec64 ts_delta;
> +	struct sched_entity *se = &sqd->thread->se;
>  	while (1) {
>  		bool cap_entries, sqt_spin = false;
>  
> @@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
>  				break;
>  			timeout = jiffies + sqd->sq_thread_idle;
>  		}
> -
> +		ktime_get_boottime_ts64(&ts_start);
> +		ts_delta = timespec64_sub(ts_start, ts_end);
> +		unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> +		se->sq_avg.last_update_time;
> +
> +		if (first) {
> +			now = 0;
> +			first = false;
> +		}
> +		__update_sq_avg_block(now, se);
>  		cap_entries = !list_is_singular(&sqd->ctx_list);
>  		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>  			int ret = __io_sq_thread(ctx, cap_entries);
> @@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
>  			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
>  				sqt_spin = true;
>  		}
> +
> +		ktime_get_boottime_ts64(&ts_end);
> +		ts_delta = timespec64_sub(ts_end, ts_start);
> +		now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> +		se->sq_avg.last_update_time;
> +
> +		if (sqt_spin)
> +			__update_sq_avg(now, se);
> +		else
> +			__update_sq_avg_block(now, se);
>  		if (io_run_task_work())
>  			sqt_spin = true;
>  

All of this is quite insane, but the above is actually broken. You're
using wall-time to measure runtime of a preemptible thread.

On top of that, for extra insanity, you're using the frigging insane
timespec interface, because clearly the _ns() interfaces are too
complicated or something?

And that whole first thing is daft too, pull now out of the loop and
set it before, then all that goes away.
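
(For illustration only: a minimal sketch of those two points, i.e.
switching to the ns-based clock API and dropping the timespec arithmetic
together with the "first" special case. This is not a reviewed fix, and
the objection about measuring wall time rather than runtime still
applies.)

	u64 now;

	while (1) {
		...
		now = ktime_get_boottime_ns();
		__update_sq_avg_block(now, se);

		/* __io_sq_thread() processing ... */

		now = ktime_get_boottime_ns();
		if (sqt_spin)
			__update_sq_avg(now, se);
		else
			__update_sq_avg_block(now, se);
		...
	}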

Now, I see what you're trying to do, but who actually uses this data?

Finally, please don't scream in the subject :/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
  2023-09-28  2:22     ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
  2023-09-28  7:52       ` kernel test robot
@ 2023-09-28  8:02       ` Peter Zijlstra
  2023-09-29  0:31       ` Thomas Gleixner
  2 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2023-09-28  8:02 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
	linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang

On Thu, Sep 28, 2023 at 10:22:26AM +0800, Xiaobing Li wrote:
> +	for (int i = 0; i < MAX_SQ_NUM; i++) {
> +		if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
> +		|| sqstat[i]->__state == TASK_DEAD))
> +			sqstat[i] = NULL;
> +	}

This coding style is horrific, please don't ever use that again.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
  2023-09-28  8:01       ` Peter Zijlstra
@ 2023-09-28  8:23         ` Jens Axboe
  2023-09-28  8:37         ` Matthew Wilcox
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2023-09-28  8:23 UTC (permalink / raw)
  To: Peter Zijlstra, Xiaobing Li
  Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, bristot, vschneid, asml.silence, linux-kernel,
	linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
	kundan.kumar, wenwen.chen, ruyi.zhang

On 9/28/23 2:01 AM, Peter Zijlstra wrote:
> All of this is quite insane, but the above is actually broken. You're
> using wall-time to measure runtime of a preemptible thread.

Would have to agree with that... wall-time specifics aside, this whole
thing seems to attempt to solve an issue quite the wrong way around.

> Now, I see what you're trying to do, but who actually uses this data?

This is the important question - and if you need this data, then why not
just account it in sqpoll itself and have some way to export it? Let's
please not implicate the core kernel bits for that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
  2023-09-28  8:01       ` Peter Zijlstra
  2023-09-28  8:23         ` Jens Axboe
@ 2023-09-28  8:37         ` Matthew Wilcox
  2023-09-28  8:41           ` Jens Axboe
  1 sibling, 1 reply; 12+ messages in thread
From: Matthew Wilcox @ 2023-09-28  8:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
	linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang

On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
> Now, I see what you're trying to do, but who actually uses this data?

I ... don't.  There seems to be the notion that since we're polling, that
shouldn't count against the runtime of the thread.  But the thread has
chosen to poll!  It is doing something!  For one thing, it's preventing
the CPU from entering an idle state.  It seems absolutely fair to
account this poll time to the runtime of the thread.  Clearly I'm
missing something.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
  2023-09-28  8:37         ` Matthew Wilcox
@ 2023-09-28  8:41           ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2023-09-28  8:41 UTC (permalink / raw)
  To: Matthew Wilcox, Peter Zijlstra
  Cc: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
	rostedt, bsegall, mgorman, bristot, vschneid, asml.silence,
	linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang

On 9/28/23 2:37 AM, Matthew Wilcox wrote:
> On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
>> Now, I see what you're trying to do, but who actually uses this data?
> 
> I ... don't.  There seems to be the notion that since we're polling, that
> shouldn't count against the runtime of the thread.  But the thread has
> chosen to poll!  It is doing something!  For one thing, it's preventing
> the CPU from entering an idle state.  It seems absolutely fair to
> accuont this poll time to the runtime of the thread.  Clearly i'm
> missing something.

For sure, it should be accounted as CPU time, as it is exactly that. You
could argue that if we needed to preempt this task for something else we
would do that (and the code does check that on every loop), but it's
still using CPU.

I can see maybe wanting to know how much of the total time the thread
spent doing ACTUAL work rather than just polling for new work, but
that's not really something the scheduler should be involved in and
should be purely an io_uring sqpoll stat of some sort if that is truly
interesting for an application.
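
(Sketch of that direction, purely illustrative: "work_time_ns" below is a
hypothetical counter, not an existing io_uring field, and how it would be
exported to userspace, e.g. through the ring's fdinfo output, is left
open.)

	u64 t0 = ktime_get_ns();

	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
		int ret = __io_sq_thread(ctx, cap_entries);

		if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
			sqt_spin = true;
	}

	if (sqt_spin)
		sqd->work_time_ns += ktime_get_ns() - t0;	/* hypothetical field */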

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
  2023-09-28  2:22     ` [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads Xiaobing Li
  2023-09-28  8:01       ` Peter Zijlstra
@ 2023-09-28 18:33       ` kernel test robot
  1 sibling, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-09-28 18:33 UTC (permalink / raw)
  To: Xiaobing Li, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	axboe, asml.silence
  Cc: llvm, oe-kbuild-all, linux-kernel, linux-fsdevel, io-uring,
	kun.dou, peiwei.li, joshi.k, kundan.kumar, wenwen.chen,
	ruyi.zhang, Xiaobing Li

Hi Xiaobing,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linus/master v6.6-rc3 next-20230928]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219
base:   tip/sched/core
patch link:    https://lore.kernel.org/r/20230928022228.15770-4-xiaobing.li%40samsung.com
patch subject: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
config: arm-mmp2_defconfig (https://download.01.org/0day-ci/archive/20230929/[email protected]/config)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230929/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

   io_uring/sqpoll.c:254:7: error: no member named 'sq_avg' in 'struct sched_entity'
                   se->sq_avg.last_update_time;
                   ~~  ^
>> io_uring/sqpoll.c:260:3: error: call to undeclared function '__update_sq_avg_block'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
                   __update_sq_avg_block(now, se);
                   ^
   io_uring/sqpoll.c:272:7: error: no member named 'sq_avg' in 'struct sched_entity'
                   se->sq_avg.last_update_time;
                   ~~  ^
>> io_uring/sqpoll.c:275:4: error: call to undeclared function '__update_sq_avg'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
                           __update_sq_avg(now, se);
                           ^
   4 errors generated.


vim +/__update_sq_avg_block +260 io_uring/sqpoll.c

   221	
   222	static int io_sq_thread(void *data)
   223	{
   224		struct io_sq_data *sqd = data;
   225		struct io_ring_ctx *ctx;
   226		unsigned long timeout = 0;
   227		char buf[TASK_COMM_LEN];
   228		DEFINE_WAIT(wait);
   229	
   230		snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
   231		set_task_comm(current, buf);
   232	
   233		if (sqd->sq_cpu != -1)
   234			set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
   235		else
   236			set_cpus_allowed_ptr(current, cpu_online_mask);
   237	
   238		mutex_lock(&sqd->lock);
   239		bool first = true;
   240		struct timespec64 ts_start, ts_end;
   241		struct timespec64 ts_delta;
   242		struct sched_entity *se = &sqd->thread->se;
   243		while (1) {
   244			bool cap_entries, sqt_spin = false;
   245	
   246			if (io_sqd_events_pending(sqd) || signal_pending(current)) {
   247				if (io_sqd_handle_event(sqd))
   248					break;
   249				timeout = jiffies + sqd->sq_thread_idle;
   250			}
   251			ktime_get_boottime_ts64(&ts_start);
   252			ts_delta = timespec64_sub(ts_start, ts_end);
   253			unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
   254			se->sq_avg.last_update_time;
   255	
   256			if (first) {
   257				now = 0;
   258				first = false;
   259			}
 > 260			__update_sq_avg_block(now, se);
   261			cap_entries = !list_is_singular(&sqd->ctx_list);
   262			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
   263				int ret = __io_sq_thread(ctx, cap_entries);
   264	
   265				if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
   266					sqt_spin = true;
   267			}
   268	
   269			ktime_get_boottime_ts64(&ts_end);
   270			ts_delta = timespec64_sub(ts_end, ts_start);
   271			now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
   272			se->sq_avg.last_update_time;
   273	
   274			if (sqt_spin)
 > 275				__update_sq_avg(now, se);
   276			else
   277				__update_sq_avg_block(now, se);
   278			if (io_run_task_work())
   279				sqt_spin = true;
   280	
   281			if (sqt_spin || !time_after(jiffies, timeout)) {
   282				if (sqt_spin)
   283					timeout = jiffies + sqd->sq_thread_idle;
   284				if (unlikely(need_resched())) {
   285					mutex_unlock(&sqd->lock);
   286					cond_resched();
   287					mutex_lock(&sqd->lock);
   288				}
   289				continue;
   290			}
   291	
   292			prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE);
   293			if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) {
   294				bool needs_sched = true;
   295	
   296				list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
   297					atomic_or(IORING_SQ_NEED_WAKEUP,
   298							&ctx->rings->sq_flags);
   299					if ((ctx->flags & IORING_SETUP_IOPOLL) &&
   300					    !wq_list_empty(&ctx->iopoll_list)) {
   301						needs_sched = false;
   302						break;
   303					}
   304	
   305					/*
   306					 * Ensure the store of the wakeup flag is not
   307					 * reordered with the load of the SQ tail
   308					 */
   309					smp_mb__after_atomic();
   310	
   311					if (io_sqring_entries(ctx)) {
   312						needs_sched = false;
   313						break;
   314					}
   315				}
   316	
   317				if (needs_sched) {
   318					mutex_unlock(&sqd->lock);
   319					schedule();
   320					mutex_lock(&sqd->lock);
   321				}
   322				list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
   323					atomic_andnot(IORING_SQ_NEED_WAKEUP,
   324							&ctx->rings->sq_flags);
   325			}
   326	
   327			finish_wait(&sqd->wait, &wait);
   328			timeout = jiffies + sqd->sq_thread_idle;
   329		}
   330	
   331		io_uring_cancel_generic(true, sqd);
   332		sqd->thread = NULL;
   333		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
   334			atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
   335		io_run_task_work();
   336		mutex_unlock(&sqd->lock);
   337	
   338		complete(&sqd->exited);
   339		do_exit(0);
   340	}
   341	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
  2023-09-28  2:22     ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
  2023-09-28  7:52       ` kernel test robot
  2023-09-28  8:02       ` Peter Zijlstra
@ 2023-09-29  0:31       ` Thomas Gleixner
  2 siblings, 0 replies; 12+ messages in thread
From: Thomas Gleixner @ 2023-09-29  0:31 UTC (permalink / raw)
  To: Xiaobing Li, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
	axboe, asml.silence
  Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

On Thu, Sep 28 2023 at 10:22, Xiaobing Li wrote:
> Since pelt takes the running time of the thread as utilization, and for
> some threads, although they are running, they are not actually
> processing any transactions and are in an idling state.
> our goal is to count the effective working time of the thread, so as to
> Calculate the true utilization of threads.

Sorry. I can't figure out from the above what you are trying to achieve
and which problem this is actually solving.

> +void get_sqthread_util(struct task_struct *p)
> +{
> +	struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
> +
> +	for (int i = 0; i < MAX_SQ_NUM; i++) {
> +		if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
> +		|| sqstat[i]->__state == TASK_DEAD))
> +			sqstat[i] = NULL;
> +	}

This is unreadable.

> +
> +	if (strncmp(p->comm, "iou-sqp", 7))
> +		return;

You really want to do hard coded string parsing on every invocation of
this function, which happens at least once per tick, right?

What's so special about iou-sqp?

Nothing at all. It might be special for your particular workload but it's
completely irrelevant to everyone else.

We are not adding random statistics to the hotpath just because.

Please come back once you have a proper justification for imposing this
on everyone who is not using io_uring at all.

Thanks

        tglx

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-09-29  0:32 UTC | newest]

Thread overview: 12+ messages
     [not found] <CGME20230928023004epcas5p4a6dac4cda9c867f3673690521a8c18b6@epcas5p4.samsung.com>
2023-09-28  2:22 ` [PATCH 0/3] Sq thread real utilization statistics Xiaobing Li
     [not found]   ` <CGME20230928023007epcas5p276b6e029a67001a6ed8ab28c05b2be9c@epcas5p2.samsung.com>
2023-09-28  2:22     ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li
2023-09-28  7:52       ` kernel test robot
2023-09-28  8:02       ` Peter Zijlstra
2023-09-29  0:31       ` Thomas Gleixner
     [not found]   ` <CGME20230928023011epcas5p32668b193b447f1cd6bf78035f4894c42@epcas5p3.samsung.com>
2023-09-28  2:22     ` [PATCH 2/3] PROC FILESYSTEM: Add real utilization data of sq thread Xiaobing Li
     [not found]   ` <CGME20230928023015epcas5p273b3eaebf3759790c278b03c7f0341c8@epcas5p2.samsung.com>
2023-09-28  2:22     ` [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads Xiaobing Li
2023-09-28  8:01       ` Peter Zijlstra
2023-09-28  8:23         ` Jens Axboe
2023-09-28  8:37         ` Matthew Wilcox
2023-09-28  8:41           ` Jens Axboe
2023-09-28 18:33       ` kernel test robot
