* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
2023-11-08 8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
@ 2023-11-08 15:26 ` Jens Axboe
[not found] ` <CGME20231113031827epcas5p2e6e858292a3cd4b9e03b4615d972b22d@epcas5p2.samsung.com>
2023-11-08 16:23 ` Gabriel Krisman Bertazi
2023-11-09 16:14 ` Pavel Begunkov
2 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2023-11-08 15:26 UTC (permalink / raw)
To: Xiaobing Li, asml.silence
Cc: linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
wenwen.chen, ruyi.zhang
On 11/8/23 1:07 AM, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.
There should be an explanation in here on what 'work' and 'total' mean.
> The test results are as follows:
> cat /proc/11440/fdinfo/6
> pos: 0
> flags: 02000002
> mnt_id: 16
> ino: 94449
> SqMask: 0xf
> SqHead: 1845170
> SqTail: 1845170
> CachedSqHead: 1845170
> CqMask: 0xf
> CqHead: 1845154
> CqTail: 1845154
> CachedCqTail: 1845154
> SQEs: 0
> CQEs: 0
> SqThread: -1
> SqThreadCpu: -1
> UserFiles: 1
> UserBufs: 0
> PollList:
> CqOverflowList:
> PID: 11440
> work: 18794
> total: 19123
These should go with the other Sq thread related ones, eg be SqWork and
SqTotal. It's counted in jiffies right now which is a bit odd in terms
of being exposed, as you'd need to know what the base of that unit is.
But probably not much of a concern, as work/total is really the metric
you care about. Maybe it'd be better to expose it as a percentage, and
get rid of total? Eg just have SqBusy: xx% be the output.
> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
> index f04a43044d91..f0b79c533062 100644
> --- a/io_uring/fdinfo.c
> +++ b/io_uring/fdinfo.c
> @@ -213,6 +213,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>
> }
>
> + if (ctx->sq_data) {
> + seq_printf(m, "PID:\t%d\n", task_pid_nr(ctx->sq_data->thread));
> + seq_printf(m, "work:\t%lu\n", ctx->sq_data->work);
> + seq_printf(m, "total:\t%lu\n", ctx->sq_data->total);
> + }
> +
This doesn't work, it needs proper locking. See how we get the other sq
values.
--
Jens Axboe
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
2023-11-08 8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
2023-11-08 15:26 ` Jens Axboe
@ 2023-11-08 16:23 ` Gabriel Krisman Bertazi
2023-11-09 16:14 ` Pavel Begunkov
2 siblings, 0 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-11-08 16:23 UTC (permalink / raw)
To: Xiaobing Li
Cc: axboe, asml.silence, linux-kernel, io-uring, kun.dou, peiwei.li,
joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang
Xiaobing Li <[email protected]> writes:
> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
> index f04a43044d91..f0b79c533062 100644
> --- a/io_uring/fdinfo.c
> +++ b/io_uring/fdinfo.c
> @@ -213,6 +213,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>
> }
>
> + if (ctx->sq_data) {
> + seq_printf(m, "PID:\t%d\n", task_pid_nr(ctx->sq_data->thread));
ctx->sq_data and sq_data->thread can become NULL if the queue is being
setup/going away. You need to hold the uring_lock and the
ctx->sq_data->lock to access this.
But task_pid_nr is already published in this file a bit before this
hunk. Perhaps drop this line and move the two lines below together with
them.
> + seq_printf(m, "work:\t%lu\n", ctx->sq_data->work);
> + seq_printf(m, "total:\t%lu\n", ctx->sq_data->total);
> + }
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
2023-11-08 8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
2023-11-08 15:26 ` Jens Axboe
2023-11-08 16:23 ` Gabriel Krisman Bertazi
@ 2023-11-09 16:14 ` Pavel Begunkov
2 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2023-11-09 16:14 UTC (permalink / raw)
To: Xiaobing Li, axboe
Cc: linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
wenwen.chen, ruyi.zhang
On 11/8/23 08:07, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.
Let's pull the elephant out of the room, what's the use case? "Improve
in the future" doesn't sound too convincing. If it's a future kernel
change you have in mind, it has to go together with this patch. If it's
a userspace application, it'd be interesting to hear what that is,
especially if you have numbers ready.
And another classic question, why can't it be done with bpf?
> Signed-off-by: Xiaobing Li <[email protected]>
>
> v1 -> v2: Added method to query data.
>
...
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..c821273406bd 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -224,6 +224,7 @@ static int io_sq_thread(void *data)
> struct io_ring_ctx *ctx;
> unsigned long timeout = 0;
> char buf[TASK_COMM_LEN];
> + unsigned long start, begin, end;
start and begin used for just slightly different accounting,
it'll get confused anyone.
> DEFINE_WAIT(wait);
>
> snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
> @@ -235,6 +236,7 @@ static int io_sq_thread(void *data)
> set_cpus_allowed_ptr(current, cpu_online_mask);
>
> mutex_lock(&sqd->lock);
> + start = jiffies;
> while (1) {
> bool cap_entries, sqt_spin = false;
>
> @@ -245,12 +247,18 @@ static int io_sq_thread(void *data)
> }
>
> cap_entries = !list_is_singular(&sqd->ctx_list);
> + begin = jiffies;
There can be {hard,soft}irq in between jiffies reads, and it can even
be scheduled out in favour of another process, so it'd collect a lot
of garbage. There should be a per-task stat for system time you can
use:
start = get_system_time(current);
do_io_part();
sq->total_time += get_system_time(current) - start;
wait();
...
> list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
> int ret = __io_sq_thread(ctx, cap_entries);
>
> if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
> sqt_spin = true;
> }
> + end = jiffies;
> + sqd->total = end - start;
...and then you don't need to track total at all, it'd be your
total = get_system_time(sq_thread /* current */);
at any given point it time.
> + if (sqt_spin == true)
> + sqd->work += end - begin;
It should go after the io_run_task_work() below, task_work is a major
part of request execution.
> +
> if (io_run_task_work())
> sqt_spin = true;
>
> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
> index 8df37e8c9149..0aa4e2efa4db 100644
> --- a/io_uring/sqpoll.h
> +++ b/io_uring/sqpoll.h
> @@ -16,6 +16,8 @@ struct io_sq_data {
> pid_t task_pid;
> pid_t task_tgid;
>
> + unsigned long work;
> + unsigned long total;
> unsigned long state;
> struct completion exited;
> };
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 6+ messages in thread