public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
       [not found] <CGME20231108081516epcas5p442a11004e3b4e6339972fd6da4c6692b@epcas5p4.samsung.com>
@ 2023-11-08  8:07 ` Xiaobing Li
  2023-11-08 15:26   ` Jens Axboe
                     ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Xiaobing Li @ 2023-11-08  8:07 UTC (permalink / raw)
  To: axboe, asml.silence
  Cc: linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
	wenwen.chen, ruyi.zhang, Xiaobing Li

Since the sq thread has a while(1) structure, during this process, there
may be a lot of time that is not processing IO but does not exceed the
timeout period, therefore, the sqpoll thread will keep running and will
keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
goal is to count the part of the time that the sqpoll thread actually
processes IO, so as to reflect the part of the CPU it uses to process
IO, which can be used to help improve the actual utilization of the CPU
in the future.

Signed-off-by: Xiaobing Li <[email protected]>

v1 -> v2: Added method to query data.

The test results are as follows:
cat /proc/11440/fdinfo/6
pos:    0
flags:  02000002
mnt_id: 16
ino:    94449
SqMask: 0xf
SqHead: 1845170
SqTail: 1845170
CachedSqHead:   1845170
CqMask: 0xf
CqHead: 1845154
CqTail: 1845154
CachedCqTail:   1845154
SQEs:   0
CQEs:   0
SqThread:       -1
SqThreadCpu:    -1
UserFiles:      1
UserBufs:       0
PollList:
CqOverflowList:
PID:    11440
work:   18794
total:  19123

---
 io_uring/fdinfo.c | 6 ++++++
 io_uring/sqpoll.c | 8 ++++++++
 io_uring/sqpoll.h | 2 ++
 3 files changed, 16 insertions(+)

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f04a43044d91..f0b79c533062 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -213,6 +213,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 
 	}
 
+	if (ctx->sq_data) {
+		seq_printf(m, "PID:\t%d\n", task_pid_nr(ctx->sq_data->thread));
+		seq_printf(m, "work:\t%lu\n", ctx->sq_data->work);
+		seq_printf(m, "total:\t%lu\n", ctx->sq_data->total);
+	}
+
 	spin_unlock(&ctx->completion_lock);
 }
 #endif
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..c821273406bd 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -224,6 +224,7 @@ static int io_sq_thread(void *data)
 	struct io_ring_ctx *ctx;
 	unsigned long timeout = 0;
 	char buf[TASK_COMM_LEN];
+	unsigned long start, begin, end;
 	DEFINE_WAIT(wait);
 
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
@@ -235,6 +236,7 @@ static int io_sq_thread(void *data)
 		set_cpus_allowed_ptr(current, cpu_online_mask);
 
 	mutex_lock(&sqd->lock);
+	start = jiffies;
 	while (1) {
 		bool cap_entries, sqt_spin = false;
 
@@ -245,12 +247,18 @@ static int io_sq_thread(void *data)
 		}
 
 		cap_entries = !list_is_singular(&sqd->ctx_list);
+		begin = jiffies;
 		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
 			int ret = __io_sq_thread(ctx, cap_entries);
 
 			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
 				sqt_spin = true;
 		}
+		end = jiffies;
+		sqd->total = end - start;
+		if (sqt_spin == true)
+			sqd->work += end - begin;
+
 		if (io_run_task_work())
 			sqt_spin = true;
 
diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
index 8df37e8c9149..0aa4e2efa4db 100644
--- a/io_uring/sqpoll.h
+++ b/io_uring/sqpoll.h
@@ -16,6 +16,8 @@ struct io_sq_data {
 	pid_t			task_pid;
 	pid_t			task_tgid;
 
+	unsigned long       work;
+	unsigned long       total;
 	unsigned long		state;
 	struct completion	exited;
 };
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
  2023-11-08  8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
@ 2023-11-08 15:26   ` Jens Axboe
       [not found]     ` <CGME20231113031827epcas5p2e6e858292a3cd4b9e03b4615d972b22d@epcas5p2.samsung.com>
  2023-11-08 16:23   ` Gabriel Krisman Bertazi
  2023-11-09 16:14   ` Pavel Begunkov
  2 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2023-11-08 15:26 UTC (permalink / raw)
  To: Xiaobing Li, asml.silence
  Cc: linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
	wenwen.chen, ruyi.zhang

On 11/8/23 1:07 AM, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.

There should be an explanation in here on what 'work' and 'total' mean.

> The test results are as follows:
> cat /proc/11440/fdinfo/6
> pos:    0
> flags:  02000002
> mnt_id: 16
> ino:    94449
> SqMask: 0xf
> SqHead: 1845170
> SqTail: 1845170
> CachedSqHead:   1845170
> CqMask: 0xf
> CqHead: 1845154
> CqTail: 1845154
> CachedCqTail:   1845154
> SQEs:   0
> CQEs:   0
> SqThread:       -1
> SqThreadCpu:    -1
> UserFiles:      1
> UserBufs:       0
> PollList:
> CqOverflowList:
> PID:    11440
> work:   18794
> total:  19123

These should go with the other Sq thread related ones, eg be SqWork and
SqTotal. It's counted in jiffies right now which is a bit odd in terms
of being exposed, as you'd need to know what the base of that unit is.
But probably not much of a concern, as work/total is really the metric
you care about. Maybe it'd be better to expose it as a percentage, and
get rid of total? Eg just have SqBusy: xx% be the output.

> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
> index f04a43044d91..f0b79c533062 100644
> --- a/io_uring/fdinfo.c
> +++ b/io_uring/fdinfo.c
> @@ -213,6 +213,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	}
>  
> +	if (ctx->sq_data) {
> +		seq_printf(m, "PID:\t%d\n", task_pid_nr(ctx->sq_data->thread));
> +		seq_printf(m, "work:\t%lu\n", ctx->sq_data->work);
> +		seq_printf(m, "total:\t%lu\n", ctx->sq_data->total);
> +	}
> +

This doesn't work, it needs proper locking. See how we get the other sq
values.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
  2023-11-08  8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
  2023-11-08 15:26   ` Jens Axboe
@ 2023-11-08 16:23   ` Gabriel Krisman Bertazi
  2023-11-09 16:14   ` Pavel Begunkov
  2 siblings, 0 replies; 6+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-11-08 16:23 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: axboe, asml.silence, linux-kernel, io-uring, kun.dou, peiwei.li,
	joshi.k, kundan.kumar, wenwen.chen, ruyi.zhang

Xiaobing Li <[email protected]> writes:

> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
> index f04a43044d91..f0b79c533062 100644
> --- a/io_uring/fdinfo.c
> +++ b/io_uring/fdinfo.c
> @@ -213,6 +213,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
>  
>  	}
>  
> +	if (ctx->sq_data) {
> +		seq_printf(m, "PID:\t%d\n", task_pid_nr(ctx->sq_data->thread));

ctx->sq_data and sq_data->thread can become NULL if the queue is being
setup/going away. You need to hold the uring_lock and the
ctx->sq_data->lock to access this.

But task_pid_nr is already published in this file a bit before this
hunk.  Perhaps drop this line and move the two lines below together with
them.

> +		seq_printf(m, "work:\t%lu\n", ctx->sq_data->work);
> +		seq_printf(m, "total:\t%lu\n", ctx->sq_data->total);
> +	}

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
  2023-11-08  8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
  2023-11-08 15:26   ` Jens Axboe
  2023-11-08 16:23   ` Gabriel Krisman Bertazi
@ 2023-11-09 16:14   ` Pavel Begunkov
  2 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2023-11-09 16:14 UTC (permalink / raw)
  To: Xiaobing Li, axboe
  Cc: linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
	wenwen.chen, ruyi.zhang

On 11/8/23 08:07, Xiaobing Li wrote:
> Since the sq thread has a while(1) structure, during this process, there
> may be a lot of time that is not processing IO but does not exceed the
> timeout period, therefore, the sqpoll thread will keep running and will
> keep occupying the CPU. Obviously, the CPU is wasted at this time;Our
> goal is to count the part of the time that the sqpoll thread actually
> processes IO, so as to reflect the part of the CPU it uses to process
> IO, which can be used to help improve the actual utilization of the CPU
> in the future.

Let's pull the elephant out of the room, what's the use case? "Improve
in the future" doesn't sound too convincing. If it's a future kernel
change you have in mind, it has to go together with this patch. If it's
a userspace application, it'd be interesting to hear what that is,
especially if you have numbers ready.

And another classic question, why can't it be done with bpf?


> Signed-off-by: Xiaobing Li <[email protected]>
> 
> v1 -> v2: Added method to query data.
> 
...
> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
> index bd6c2c7959a5..c821273406bd 100644
> --- a/io_uring/sqpoll.c
> +++ b/io_uring/sqpoll.c
> @@ -224,6 +224,7 @@ static int io_sq_thread(void *data)
>   	struct io_ring_ctx *ctx;
>   	unsigned long timeout = 0;
>   	char buf[TASK_COMM_LEN];
> +	unsigned long start, begin, end;

start and begin used for just slightly different accounting,
it'll get confused anyone.

>   	DEFINE_WAIT(wait);
>   
>   	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
> @@ -235,6 +236,7 @@ static int io_sq_thread(void *data)
>   		set_cpus_allowed_ptr(current, cpu_online_mask);
>   
>   	mutex_lock(&sqd->lock);
> +	start = jiffies;
>   	while (1) {
>   		bool cap_entries, sqt_spin = false;
>   
> @@ -245,12 +247,18 @@ static int io_sq_thread(void *data)
>   		}
>   
>   		cap_entries = !list_is_singular(&sqd->ctx_list);
> +		begin = jiffies;

There can be {hard,soft}irq in between jiffies reads, and it can even
be scheduled out in favour of another process, so it'd collect a lot
of garbage. There should be a per-task stat for system time you can
use:

start = get_system_time(current);
do_io_part();
sq->total_time += get_system_time(current) - start;
wait();

...

>   		list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
>   			int ret = __io_sq_thread(ctx, cap_entries);
>   
>   			if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
>   				sqt_spin = true;
>   		}
> +		end = jiffies;
> +		sqd->total = end - start;

...and then you don't need to track total at all, it'd be your

total = get_system_time(sq_thread /* current */);

at any given point it time.


> +		if (sqt_spin == true)
> +			sqd->work += end - begin;

It should go after the io_run_task_work() below, task_work is a major
part of request execution.

> +
>   		if (io_run_task_work())
>   			sqt_spin = true;
>   
> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
> index 8df37e8c9149..0aa4e2efa4db 100644
> --- a/io_uring/sqpoll.h
> +++ b/io_uring/sqpoll.h
> @@ -16,6 +16,8 @@ struct io_sq_data {
>   	pid_t			task_pid;
>   	pid_t			task_tgid;
>   
> +	unsigned long       work;
> +	unsigned long       total;
>   	unsigned long		state;
>   	struct completion	exited;
>   };

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
       [not found]     ` <CGME20231113031827epcas5p2e6e858292a3cd4b9e03b4615d972b22d@epcas5p2.samsung.com>
@ 2023-11-13  3:10       ` Xiaobing Li
  2023-11-13 16:38         ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Xiaobing Li @ 2023-11-13  3:10 UTC (permalink / raw)
  To: axboe
  Cc: asml.silence, linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k,
	kundan.kumar, wenwen.chen, ruyi.zhang, xiaobing.li

After careful consideration and testing, I don't think getting the
uring_lock is possible here, for the following reasons:
Due to lock competition, ctx->uring_lock and sq->lock are usually not
available here. The best proof is that the values of SqThread and
SqThreadCpu always output -1. In this case, it is impossible to obtain
the required work_time and total_time values.
In fact, it should be feasible to obtain work_time and total_time by
judging that ctx->sq_data is not NULL, because if the sq thread exits,
the action of reading data will also stop, and the possibility of a null
pointer reference is very low.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] io_uring: Statistics of the true utilization of sq threads.
  2023-11-13  3:10       ` Xiaobing Li
@ 2023-11-13 16:38         ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2023-11-13 16:38 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: asml.silence, linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k,
	kundan.kumar, wenwen.chen, ruyi.zhang

On 11/12/23 8:10 PM, Xiaobing Li wrote:
> After careful consideration and testing, I don't think getting the
> uring_lock is possible here, for the following reasons:
> Due to lock competition, ctx->uring_lock and sq->lock are usually not
> available here. The best proof is that the values of SqThread and
> SqThreadCpu always output -1. In this case, it is impossible to obtain
> the required work_time and total_time values.
> In fact, it should be feasible to obtain work_time and total_time by
> judging that ctx->sq_data is not NULL, because if the sq thread exits,
> the action of reading data will also stop, and the possibility of a null
> pointer reference is very low.

We have that problem right now, in the current tree. And agree it's not
the best. sq_data should be fine under the (ctx) lock, it's just the
thread that may go away. Maybe we just cache the cpu/pid of it when we
create it, seems better than needing to query it. And for the other
stats, should be fine in ctx->sq_data.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-13 16:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CGME20231108081516epcas5p442a11004e3b4e6339972fd6da4c6692b@epcas5p4.samsung.com>
2023-11-08  8:07 ` [PATCH v2] io_uring: Statistics of the true utilization of sq threads Xiaobing Li
2023-11-08 15:26   ` Jens Axboe
     [not found]     ` <CGME20231113031827epcas5p2e6e858292a3cd4b9e03b4615d972b22d@epcas5p2.samsung.com>
2023-11-13  3:10       ` Xiaobing Li
2023-11-13 16:38         ` Jens Axboe
2023-11-08 16:23   ` Gabriel Krisman Bertazi
2023-11-09 16:14   ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox