* [PATCH 0/3] Sq thread real utilization statistics.
@ 2023-09-28  2:22 ` Xiaobing Li
  0 siblings, 3 replies; 12+ messages in thread

From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
  rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

Summary:

The kernel's PELT scheduling metric is calculated from a thread's
running time, but for some threads this overstates useful work. The
io_uring sq thread, for example, runs a while(1) loop; much of that
time it is not processing IO yet has not exceeded its idle timeout, so
the sqpoll thread keeps running and occupying the CPU even though the
CPU is effectively wasted. Our goal is to count the part of the time
the sqpoll thread actually spends processing IO, so that the share of
its CPU used for real work is visible and can be used to help improve
actual CPU utilization in the future. The scheduler changes are also
applicable to other threads with the same needs.

We use fio (version 3.28) to test performance. In the experiments,
each fio process is viewed as an application and starts a job with
sq_poll enabled. The tests are performed on a host with 256 CPUs and
64 GB of memory, the IO runs against a PM1743 SSD, and the OS is
Ubuntu 22.04 with kernel version 6.4.0. Sequential read/write
parameters: bs=128k, numjobs=1, iodepth=64. Random read/write
parameters: bs=4k, numjobs=16, iodepth=64.

The test results are as follows:

  Before modification   read   write  randread  randwrite
    IOPS(K)             53.7   46.1    849       293
    BW(MB/s)            7033   6037   3476      1199

  After modification    read   write  randread  randwrite
    IOPS(K)             53.7   46.1    847       293
    BW(MB/s)            7033   6042   3471      1199

As the test results show, the modifications have almost no impact on
performance.

Xiaobing Li (3):
  SCHEDULER: Add an interface for counting real utilization.
  PROC FILESYSTEM: Add real utilization data of sq thread.
  IO_URING: Statistics of the true utilization of sq threads.

 fs/proc/stat.c              | 25 ++++++++++++++++++++++++-
 include/linux/kernel.h      |  7 ++++++-
 include/linux/kernel_stat.h |  3 +++
 include/linux/sched.h       |  1 +
 io_uring/sqpoll.c           | 26 +++++++++++++++++++++++++-
 kernel/sched/cputime.c      | 36 +++++++++++++++++++++++++++++++++++-
 kernel/sched/pelt.c         | 14 ++++++++++++++
 7 files changed, 108 insertions(+), 4 deletions(-)

-- 
2.34.1
* [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
@ 2023-09-28  2:22 ` Xiaobing Li
  2023-09-28  7:52 ` kernel test robot
  (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread

From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
  rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

PELT takes the running time of a thread as its utilization, but some
threads, although running, are not actually processing any work and
are effectively idling. Our goal is to count the effective working
time of a thread, so as to calculate its true utilization.

Signed-off-by: Xiaobing Li <[email protected]>
---
 include/linux/kernel.h |  7 ++++++-
 include/linux/sched.h  |  1 +
 kernel/sched/cputime.c | 36 +++++++++++++++++++++++++++++++++++-
 kernel/sched/pelt.c    | 14 ++++++++++++++
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index cee8fe87e9f4..c1557fa9cbbe 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -37,7 +37,8 @@
 #include <uapi/linux/kernel.h>

 #define STACK_MAGIC	0xdeadbeef
-
+struct cfs_rq;
+struct sched_entity;
 /**
  * REPEAT_BYTE - repeat the value @x multiple times as an unsigned long value
  * @x: value to repeat
@@ -103,6 +104,10 @@ extern int __cond_resched(void);

 #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)

+extern void __update_sq_avg_block(u64 now, struct sched_entity *se);
+
+extern void __update_sq_avg(u64 now, struct sched_entity *se);
+
 extern int __cond_resched(void);

 DECLARE_STATIC_CALL(might_resched, __cond_resched);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..403ccb456c9a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -583,6 +583,7 @@ struct sched_entity {
	 * collide with read-mostly values above.
	 */
	struct sched_avg		avg;
+	struct sched_avg		sq_avg;
 #endif
 };
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index af7952f12e6c..824203293fd9 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -479,6 +479,40 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)

 #else /* !CONFIG_VIRT_CPU_ACCOUNTING_NATIVE: */

+void get_sqthread_util(struct task_struct *p)
+{
+	struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
+
+	for (int i = 0; i < MAX_SQ_NUM; i++) {
+		if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
+				|| sqstat[i]->__state == TASK_DEAD))
+			sqstat[i] = NULL;
+	}
+
+	if (strncmp(p->comm, "iou-sqp", 7))
+		return;
+
+	if (!kcpustat_this_cpu->flag) {
+		for (int j = 0; j < MAX_SQ_NUM; j++)
+			kcpustat_this_cpu->sq_util[j] = NULL;
+		kcpustat_this_cpu->flag = true;
+	}
+	int index = MAX_SQ_NUM;
+	bool flag = true;
+
+	for (int i = 0; i < MAX_SQ_NUM; i++) {
+		if (sqstat[i] == p)
+			flag = false;
+		if (!sqstat[i] || task_cpu(sqstat[i]) != task_cpu(p)) {
+			sqstat[i] = NULL;
+			if (i < index)
+				index = i;
+		}
+	}
+	if (flag && index < MAX_SQ_NUM)
+		sqstat[index] = p;
+}
+
 /*
  * Account a single tick of CPU time.
  * @p: the process that the CPU time gets accounted to
@@ -487,7 +521,7 @@ void thread_group_cputime_adjusted(struct task_struct *p, u64 *ut, u64 *st)
 void account_process_tick(struct task_struct *p, int user_tick)
 {
	u64 cputime, steal;
-
+	get_sqthread_util(p);
	if (vtime_accounting_enabled_this_cpu())
		return;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 0f310768260c..945efe80e08c 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -266,6 +266,20 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
	WRITE_ONCE(sa->util_avg, sa->util_sum / divider);
 }

+void __update_sq_avg_block(u64 now, struct sched_entity *se)
+{
+	if (___update_load_sum(now, &se->sq_avg, 0, 0, 0))
+		___update_load_avg(&se->sq_avg, se_weight(se));
+}
+
+void __update_sq_avg(u64 now, struct sched_entity *se)
+{
+	struct cfs_rq *qcfs_rq = cfs_rq_of(se);
+
+	if (___update_load_sum(now, &se->sq_avg, !!se->on_rq, se_runnable(se), qcfs_rq->curr == se))
+		___update_load_avg(&se->sq_avg, se_weight(se));
+}
+
 /*
  * sched_entity:
  */
-- 
2.34.1
* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization. 2023-09-28 2:22 ` [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization Xiaobing Li @ 2023-09-28 7:52 ` kernel test robot 2023-09-28 8:02 ` Peter Zijlstra 2023-09-29 0:31 ` Thomas Gleixner 2 siblings, 0 replies; 12+ messages in thread From: kernel test robot @ 2023-09-28 7:52 UTC (permalink / raw) To: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence Cc: oe-kbuild-all, linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li Hi Xiaobing, kernel test robot noticed the following build warnings: [auto build test WARNING on tip/sched/core] [also build test WARNING on linus/master v6.6-rc3 next-20230928] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219 base: tip/sched/core patch link: https://lore.kernel.org/r/20230928022228.15770-2-xiaobing.li%40samsung.com patch subject: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization. config: m68k-allyesconfig (https://download.01.org/0day-ci/archive/20230928/[email protected]/config) compiler: m68k-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230928/[email protected]/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <[email protected]> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ All warnings (new ones prefixed by >>): In file included from kernel/sched/build_policy.c:52: >> kernel/sched/cputime.c:482:6: warning: no previous prototype for 'get_sqthread_util' [-Wmissing-prototypes] 482 | void get_sqthread_util(struct task_struct *p) | ^~~~~~~~~~~~~~~~~ kernel/sched/cputime.c: In function 'get_sqthread_util': kernel/sched/cputime.c:484:56: error: 'struct kernel_cpustat' has no member named 'sq_util' 484 | struct task_struct **sqstat = kcpustat_this_cpu->sq_util; | ^~ kernel/sched/cputime.c:486:29: error: 'MAX_SQ_NUM' undeclared (first use in this function) 486 | for (int i = 0; i < MAX_SQ_NUM; i++) { | ^~~~~~~~~~ kernel/sched/cputime.c:486:29: note: each undeclared identifier is reported only once for each function it appears in kernel/sched/cputime.c:495:31: error: 'struct kernel_cpustat' has no member named 'flag' 495 | if (!kcpustat_this_cpu->flag) { | ^~ kernel/sched/cputime.c:497:42: error: 'struct kernel_cpustat' has no member named 'sq_util' 497 | kcpustat_this_cpu->sq_util[j] = NULL; | ^~ kernel/sched/cputime.c:498:34: error: 'struct kernel_cpustat' has no member named 'flag' 498 | kcpustat_this_cpu->flag = true; | ^~ vim +/get_sqthread_util +482 kernel/sched/cputime.c 481 > 482 void get_sqthread_util(struct task_struct *p) 483 { 484 struct task_struct **sqstat = kcpustat_this_cpu->sq_util; 485 486 for (int i = 0; i < MAX_SQ_NUM; i++) { 487 if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p) 488 || sqstat[i]->__state == TASK_DEAD)) 489 sqstat[i] = NULL; 490 } 491 492 if (strncmp(p->comm, "iou-sqp", 7)) 493 return; 494 495 if (!kcpustat_this_cpu->flag) { 496 for (int j = 0; j < MAX_SQ_NUM; j++) 497 kcpustat_this_cpu->sq_util[j] = NULL; 498 kcpustat_this_cpu->flag = true; 499 } 500 int index = MAX_SQ_NUM; 501 bool flag = true; 502 503 for (int 
i = 0; i < MAX_SQ_NUM; i++) { 504 if (sqstat[i] == p) 505 flag = false; 506 if (!sqstat[i] || task_cpu(sqstat[i]) != task_cpu(p)) { 507 sqstat[i] = NULL; 508 if (i < index) 509 index = i; 510 } 511 } 512 if (flag && index < MAX_SQ_NUM) 513 sqstat[index] = p; 514 } 515 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
@ 2023-09-28  8:02 ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread

From: Peter Zijlstra @ 2023-09-28  8:02 UTC (permalink / raw)
To: Xiaobing Li
Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
  bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
  linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang

On Thu, Sep 28, 2023 at 10:22:26AM +0800, Xiaobing Li wrote:
> +	for (int i = 0; i < MAX_SQ_NUM; i++) {
> +		if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
> +				|| sqstat[i]->__state == TASK_DEAD))
> +			sqstat[i] = NULL;
> +	}

This coding style is horrific, please don't ever use that again.
* Re: [PATCH 1/3] SCHEDULER: Add an interface for counting real utilization.
@ 2023-09-29  0:31 ` Thomas Gleixner
  0 siblings, 0 replies; 12+ messages in thread

From: Thomas Gleixner @ 2023-09-29  0:31 UTC (permalink / raw)
To: Xiaobing Li, mingo, peterz, juri.lelli, vincent.guittot,
  dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid,
  axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

On Thu, Sep 28 2023 at 10:22, Xiaobing Li wrote:
> PELT takes the running time of a thread as its utilization, but some
> threads, although running, are not actually processing any work and
> are effectively idling. Our goal is to count the effective working
> time of a thread, so as to calculate its true utilization.

Sorry. I can't figure out from the above what you are trying to achieve
and which problem this is actually solving.

> +void get_sqthread_util(struct task_struct *p)
> +{
> +	struct task_struct **sqstat = kcpustat_this_cpu->sq_util;
> +
> +	for (int i = 0; i < MAX_SQ_NUM; i++) {
> +		if (sqstat[i] && (task_cpu(sqstat[i]) != task_cpu(p)
> +				|| sqstat[i]->__state == TASK_DEAD))
> +			sqstat[i] = NULL;
> +	}

This is unreadable.

> +
> +	if (strncmp(p->comm, "iou-sqp", 7))
> +		return;

You really want to do hard coded string parsing on every invocation of
this function, which happens at least once per tick, right?

What's so special about iou-sqp? Nothing at all. It might be special
for your particular workload but it's completely irrelevant to everyone
else.

We are not adding random statistics to the hotpath just because.

Please come back once you have a proper justification for imposing this
on everyone who is not using io_uring at all.

Thanks,

        tglx
* [PATCH 2/3] PROC FILESYSTEM: Add real utilization data of sq thread.
@ 2023-09-28  2:22 ` Xiaobing Li
  0 siblings, 0 replies; 12+ messages in thread

From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
  rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

Since the real utilization of the sq thread needs to be counted and
recorded, record it in the /proc/stat file.

Signed-off-by: Xiaobing Li <[email protected]>
---
 fs/proc/stat.c              | 25 ++++++++++++++++++++++++-
 include/linux/kernel_stat.h |  3 +++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fs/proc/stat.c b/fs/proc/stat.c
index da60956b2915..bd86c0657874 100644
--- a/fs/proc/stat.c
+++ b/fs/proc/stat.c
@@ -79,6 +79,29 @@ static void show_all_irqs(struct seq_file *p)
	show_irq_gap(p, nr_irqs - next);
 }

+static void show_sqthread_util(struct seq_file *p)
+{
+	int i, j;
+
+	for_each_online_cpu(i) {
+		struct kernel_cpustat kcpustat;
+
+		kcpustat_cpu_fetch(&kcpustat, i);
+		struct task_struct **sqstat = kcpustat.sq_util;
+
+		for (j = 0; j < MAX_SQ_NUM; j++) {
+			if (sqstat[j]) {
+				seq_printf(p, "%d %s", sqstat[j]->pid, sqstat[j]->comm);
+				seq_put_decimal_ull(p, " pelt ",
+					(unsigned long long)sqstat[j]->se.avg.util_avg);
+				seq_put_decimal_ull(p, " real ",
+					(unsigned long long)sqstat[j]->se.sq_avg.util_avg);
+				seq_putc(p, '\n');
+			}
+		}
+	}
+}
+
 static int show_stat(struct seq_file *p, void *v)
 {
	int i, j;
@@ -187,7 +210,7 @@ static int show_stat(struct seq_file *p, void *v)
	for (i = 0; i < NR_SOFTIRQS; i++)
		seq_put_decimal_ull(p, " ", per_softirq_sums[i]);
	seq_putc(p, '\n');
-
+	show_sqthread_util(p);
	return 0;
 }
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 9935f7ecbfb9..722703da681e 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -11,6 +11,7 @@
 #include <linux/vtime.h>
 #include <asm/irq.h>

+#define MAX_SQ_NUM 16
 /*
  * 'kernel_stat.h' contains the definitions needed for doing
  * some kernel statistics (CPU usage, context switches ...),
@@ -36,6 +37,8 @@ enum cpu_usage_stat {

 struct kernel_cpustat {
	u64 cpustat[NR_STATS];
+	bool flag;
+	struct task_struct *sq_util[MAX_SQ_NUM];
 };

 struct kernel_stat {
-- 
2.34.1
* [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
@ 2023-09-28  2:22 ` Xiaobing Li
  2023-09-28  8:01 ` Peter Zijlstra
  2023-09-28 18:33 ` kernel test robot
  0 siblings, 2 replies; 12+ messages in thread

From: Xiaobing Li @ 2023-09-28  2:22 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann,
  rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence
Cc: linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li

Since the sq thread runs a while(1) loop, there may be long stretches
where it is not processing IO but has not yet exceeded the timeout
period; the sqpoll thread therefore keeps running and occupying the
CPU, and that CPU time is wasted. Our goal is to count the part of the
time the sqpoll thread actually spends processing IO, reflecting the
share of its CPU used for IO, which can be used to help improve actual
CPU utilization in the future.

Signed-off-by: Xiaobing Li <[email protected]>
---
 io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..2c5fc4d95fa8 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -11,6 +11,7 @@
 #include <linux/audit.h>
 #include <linux/security.h>
 #include <linux/io_uring.h>
+#include <linux/time.h>

 #include <uapi/linux/io_uring.h>

@@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
		set_cpus_allowed_ptr(current, cpu_online_mask);

	mutex_lock(&sqd->lock);
+	bool first = true;
+	struct timespec64 ts_start, ts_end;
+	struct timespec64 ts_delta;
+	struct sched_entity *se = &sqd->thread->se;
	while (1) {
		bool cap_entries, sqt_spin = false;

@@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
				break;
			timeout = jiffies + sqd->sq_thread_idle;
		}
-
+		ktime_get_boottime_ts64(&ts_start);
+		ts_delta = timespec64_sub(ts_start, ts_end);
+		unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+			se->sq_avg.last_update_time;
+
+		if (first) {
+			now = 0;
+			first = false;
+		}
+		__update_sq_avg_block(now, se);
		cap_entries = !list_is_singular(&sqd->ctx_list);
		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
			int ret = __io_sq_thread(ctx, cap_entries);

@@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
				sqt_spin = true;
		}
+
+		ktime_get_boottime_ts64(&ts_end);
+		ts_delta = timespec64_sub(ts_end, ts_start);
+		now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
+			se->sq_avg.last_update_time;
+
+		if (sqt_spin)
+			__update_sq_avg(now, se);
+		else
+			__update_sq_avg_block(now, se);
		if (io_run_task_work())
			sqt_spin = true;
-- 
2.34.1
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
@ 2023-09-28  8:01 ` Peter Zijlstra
  2023-09-28  8:23 ` Jens Axboe
  2023-09-28  8:37 ` Matthew Wilcox
  1 sibling, 2 replies; 12+ messages in thread

From: Peter Zijlstra @ 2023-09-28  8:01 UTC (permalink / raw)
To: Xiaobing Li
Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
  bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
  linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang

On Thu, Sep 28, 2023 at 10:22:28AM +0800, Xiaobing Li wrote:
> Since the sq thread runs a while(1) loop, there may be long stretches
> where it is not processing IO but has not yet exceeded the timeout
> period; the sqpoll thread therefore keeps running and occupying the
> CPU, and that CPU time is wasted. Our goal is to count the part of the
> time the sqpoll thread actually spends processing IO, reflecting the
> share of its CPU used for IO, which can be used to help improve actual
> CPU utilization in the future.
>
> Signed-off-by: Xiaobing Li <[email protected]>
> ---
>  io_uring/sqpoll.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..2c5fc4d95fa8 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -11,6 +11,7 @@
>  #include <linux/audit.h>
>  #include <linux/security.h>
>  #include <linux/io_uring.h>
> +#include <linux/time.h>
>
>  #include <uapi/linux/io_uring.h>
>
> @@ -235,6 +236,10 @@ static int io_sq_thread(void *data)
>  		set_cpus_allowed_ptr(current, cpu_online_mask);
>
>  	mutex_lock(&sqd->lock);
> +	bool first = true;
> +	struct timespec64 ts_start, ts_end;
> +	struct timespec64 ts_delta;
> +	struct sched_entity *se = &sqd->thread->se;
>  	while (1) {
>  		bool cap_entries, sqt_spin = false;
>
> @@ -243,7 +248,16 @@ static int io_sq_thread(void *data)
>  				break;
>  			timeout = jiffies + sqd->sq_thread_idle;
>  		}
> -
> +		ktime_get_boottime_ts64(&ts_start);
> +		ts_delta = timespec64_sub(ts_start, ts_end);
> +		unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> +			se->sq_avg.last_update_time;
> +
> +		if (first) {
> +			now = 0;
> +			first = false;
> +		}
> +		__update_sq_avg_block(now, se);
>  		cap_entries = !list_is_singular(&sqd->ctx_list);
>  		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>  			int ret = __io_sq_thread(ctx, cap_entries);
>
> @@ -251,6 +265,16 @@ static int io_sq_thread(void *data)
>  			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
>  				sqt_spin = true;
>  		}
> +
> +		ktime_get_boottime_ts64(&ts_end);
> +		ts_delta = timespec64_sub(ts_end, ts_start);
> +		now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec +
> +			se->sq_avg.last_update_time;
> +
> +		if (sqt_spin)
> +			__update_sq_avg(now, se);
> +		else
> +			__update_sq_avg_block(now, se);
>  		if (io_run_task_work())
>  			sqt_spin = true;
>

All of this is quite insane, but the above is actually broken. You're
using wall-time to measure runtime of a preemptible thread.

On top of that, for extra insanity, you're using the frigging insane
timespec interface, because clearly the _ns() interfaces are too
complicated or something?

And that whole first thing is daft too; pull now out of the loop and
set it before, then all that goes away.

Now, I see what you're trying to do, but who actually uses this data?

Finally, please don't scream in the subject :/
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
@ 2023-09-28  8:23 ` Jens Axboe

From: Jens Axboe @ 2023-09-28  8:23 UTC (permalink / raw)
To: Peter Zijlstra, Xiaobing Li
Cc: mingo, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
  bsegall, mgorman, bristot, vschneid, asml.silence, linux-kernel,
  linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
  wenwen.chen, ruyi.zhang

On 9/28/23 2:01 AM, Peter Zijlstra wrote:
> All of this is quite insane, but the above is actually broken. You're
> using wall-time to measure runtime of a preemptible thread.

Would have to agree with that... wall-time specifics aside, this whole
thing seems to attempt to solve an issue quite the wrong way around.

> Now, I see what you're trying to do, but who actually uses this data?

This is the important question - and if you need this data, then why
not just account it in sqpoll itself and have some way to export it?
Let's please not implicate the core kernel bits for that.

-- 
Jens Axboe

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
@ 2023-09-28  8:37 ` Matthew Wilcox
  2023-09-28  8:41 ` Jens Axboe

From: Matthew Wilcox @ 2023-09-28  8:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
  rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence,
  linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang

On Thu, Sep 28, 2023 at 10:01:14AM +0200, Peter Zijlstra wrote:
> Now, I see what you're trying to do, but who actually uses this data?

I ... don't. There seems to be the notion that since we're polling,
that shouldn't count against the runtime of the thread. But the thread
has chosen to poll! It is doing something! For one thing, it's
preventing the CPU from entering an idle state. It seems absolutely
fair to account this poll time to the runtime of the thread. Clearly
I'm missing something.

* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads.
@ 2023-09-28  8:41 ` Jens Axboe

From: Jens Axboe @ 2023-09-28  8:41 UTC (permalink / raw)
To: Matthew Wilcox, Peter Zijlstra
Cc: Xiaobing Li, mingo, juri.lelli, vincent.guittot, dietmar.eggemann,
  rostedt, bsegall, mgorman, bristot, vschneid, asml.silence,
  linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k,
  kundan.kumar, wenwen.chen, ruyi.zhang

On 9/28/23 2:37 AM, Matthew Wilcox wrote:
> I ... don't. There seems to be the notion that since we're polling,
> that shouldn't count against the runtime of the thread. But the thread
> has chosen to poll! It is doing something! For one thing, it's
> preventing the CPU from entering an idle state. It seems absolutely
> fair to account this poll time to the runtime of the thread. Clearly
> I'm missing something.

For sure, it should be accounted as CPU time, as it is exactly that.
You could argue that if we needed to preempt this task for something
else we would do that (and the code does check that on every loop), but
it's still using CPU.

I can see maybe wanting to know how much of the total time the thread
spent doing ACTUAL work rather than just polling for new work, but
that's not really something the scheduler should be involved in and
should be purely an io_uring sqpoll stat of some sort if that is truly
interesting for an application.

-- 
Jens Axboe
* Re: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads. 2023-09-28 2:22 ` [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads Xiaobing Li 2023-09-28 8:01 ` Peter Zijlstra @ 2023-09-28 18:33 ` kernel test robot 1 sibling, 0 replies; 12+ messages in thread From: kernel test robot @ 2023-09-28 18:33 UTC (permalink / raw) To: Xiaobing Li, mingo, peterz, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman, bristot, vschneid, axboe, asml.silence Cc: llvm, oe-kbuild-all, linux-kernel, linux-fsdevel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang, Xiaobing Li Hi Xiaobing, kernel test robot noticed the following build errors: [auto build test ERROR on tip/sched/core] [also build test ERROR on linus/master v6.6-rc3 next-20230928] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Xiaobing-Li/SCHEDULER-Add-an-interface-for-counting-real-utilization/20230928-103219 base: tip/sched/core patch link: https://lore.kernel.org/r/20230928022228.15770-4-xiaobing.li%40samsung.com patch subject: [PATCH 3/3] IO_URING: Statistics of the true utilization of sq threads. config: arm-mmp2_defconfig (https://download.01.org/0day-ci/archive/20230929/[email protected]/config) compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230929/[email protected]/reproduce) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <[email protected]> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ All errors (new ones prefixed by >>): io_uring/sqpoll.c:254:7: error: no member named 'sq_avg' in 'struct sched_entity' se->sq_avg.last_update_time; ~~ ^ >> io_uring/sqpoll.c:260:3: error: call to undeclared function '__update_sq_avg_block'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration] __update_sq_avg_block(now, se); ^ io_uring/sqpoll.c:272:7: error: no member named 'sq_avg' in 'struct sched_entity' se->sq_avg.last_update_time; ~~ ^ >> io_uring/sqpoll.c:275:4: error: call to undeclared function '__update_sq_avg'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration] __update_sq_avg(now, se); ^ 4 errors generated. vim +/__update_sq_avg_block +260 io_uring/sqpoll.c 221 222 static int io_sq_thread(void *data) 223 { 224 struct io_sq_data *sqd = data; 225 struct io_ring_ctx *ctx; 226 unsigned long timeout = 0; 227 char buf[TASK_COMM_LEN]; 228 DEFINE_WAIT(wait); 229 230 snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid); 231 set_task_comm(current, buf); 232 233 if (sqd->sq_cpu != -1) 234 set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu)); 235 else 236 set_cpus_allowed_ptr(current, cpu_online_mask); 237 238 mutex_lock(&sqd->lock); 239 bool first = true; 240 struct timespec64 ts_start, ts_end; 241 struct timespec64 ts_delta; 242 struct sched_entity *se = &sqd->thread->se; 243 while (1) { 244 bool cap_entries, sqt_spin = false; 245 246 if (io_sqd_events_pending(sqd) || signal_pending(current)) { 247 if (io_sqd_handle_event(sqd)) 248 break; 249 timeout = jiffies + sqd->sq_thread_idle; 250 } 251 ktime_get_boottime_ts64(&ts_start); 252 ts_delta = timespec64_sub(ts_start, ts_end); 253 unsigned long long now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec + 254 
se->sq_avg.last_update_time; 255 256 if (first) { 257 now = 0; 258 first = false; 259 } > 260 __update_sq_avg_block(now, se); 261 cap_entries = !list_is_singular(&sqd->ctx_list); 262 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) { 263 int ret = __io_sq_thread(ctx, cap_entries); 264 265 if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list))) 266 sqt_spin = true; 267 } 268 269 ktime_get_boottime_ts64(&ts_end); 270 ts_delta = timespec64_sub(ts_end, ts_start); 271 now = ts_delta.tv_sec * NSEC_PER_SEC + ts_delta.tv_nsec + 272 se->sq_avg.last_update_time; 273 274 if (sqt_spin) > 275 __update_sq_avg(now, se); 276 else 277 __update_sq_avg_block(now, se); 278 if (io_run_task_work()) 279 sqt_spin = true; 280 281 if (sqt_spin || !time_after(jiffies, timeout)) { 282 if (sqt_spin) 283 timeout = jiffies + sqd->sq_thread_idle; 284 if (unlikely(need_resched())) { 285 mutex_unlock(&sqd->lock); 286 cond_resched(); 287 mutex_lock(&sqd->lock); 288 } 289 continue; 290 } 291 292 prepare_to_wait(&sqd->wait, &wait, TASK_INTERRUPTIBLE); 293 if (!io_sqd_events_pending(sqd) && !task_work_pending(current)) { 294 bool needs_sched = true; 295 296 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) { 297 atomic_or(IORING_SQ_NEED_WAKEUP, 298 &ctx->rings->sq_flags); 299 if ((ctx->flags & IORING_SETUP_IOPOLL) && 300 !wq_list_empty(&ctx->iopoll_list)) { 301 needs_sched = false; 302 break; 303 } 304 305 /* 306 * Ensure the store of the wakeup flag is not 307 * reordered with the load of the SQ tail 308 */ 309 smp_mb__after_atomic(); 310 311 if (io_sqring_entries(ctx)) { 312 needs_sched = false; 313 break; 314 } 315 } 316 317 if (needs_sched) { 318 mutex_unlock(&sqd->lock); 319 schedule(); 320 mutex_lock(&sqd->lock); 321 } 322 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) 323 atomic_andnot(IORING_SQ_NEED_WAKEUP, 324 &ctx->rings->sq_flags); 325 } 326 327 finish_wait(&sqd->wait, &wait); 328 timeout = jiffies + sqd->sq_thread_idle; 329 } 330 331 io_uring_cancel_generic(true, sqd); 332 
sqd->thread = NULL; 333 list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) 334 atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags); 335 io_run_task_work(); 336 mutex_unlock(&sqd->lock); 337 338 complete(&sqd->exited); 339 do_exit(0); 340 } 341 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki ^ permalink raw reply [flat|nested] 12+ messages in thread