* [PATCH v3] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
From: Jens Axboe @ 2023-11-15  2:36 UTC (permalink / raw)
  To: io-uring; +Cc: Pavel Begunkov

A previous commit added a trylock for getting the SQPOLL thread info via
fdinfo, but this introduced a regression where we often fail to get it if
the thread is busy. For that case, we end up not printing the current CPU
and PID info.

Rather than rely on this lock, just print the pid we already stored in
the io_sq_data struct, and ensure we update the current CPU every time
we've slept or potentially rescheduled. The latter may not be 100%
accurate, but that wasn't the case before either, as the task can get
migrated at any time unless it has been pinned at creation time.

We keep the io_sq_data dereference inside ctx->uring_lock, as it has
always been, since destruction of the thread and data happens under that
lock. We could make this RCU safe, but there's little point in doing that.

With this, we always print the last valid information we had, rather than
have spurious outputs with missing information.

Fixes: 7644b1a1c9a7 ("io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid")
Signed-off-by: Jens Axboe <[email protected]>

---

v3: Just use raw_smp_processor_id(); there is no need to query the task,
    as it's the current one. Also update it whenever we've re-acquired the
    lock after schedule or preemption, as it's cheap enough now to do so.
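
For context (not part of this patch), the io_sq_data fields involved look
roughly like the sketch below; the real definition is in io_uring/sqpoll.h,
so treat the layout and comments as illustrative:

struct io_sq_data {
	struct mutex		lock;		/* serializes thread parking/teardown */
	struct task_struct	*thread;	/* NULL once the thread has exited */
	int			sq_cpu;		/* advisory: last CPU the thread ran on */
	pid_t			task_pid;	/* pid reported via fdinfo */
	...
};

sq_cpu is advisory only: the thread can migrate the instant after
raw_smp_processor_id() is read, which is why the commit message hedges on
accuracy.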

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f04a43044d91..976e9500f651 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -145,13 +145,8 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
 		struct io_sq_data *sq = ctx->sq_data;
 
-		if (mutex_trylock(&sq->lock)) {
-			if (sq->thread) {
-				sq_pid = task_pid_nr(sq->thread);
-				sq_cpu = task_cpu(sq->thread);
-			}
-			mutex_unlock(&sq->lock);
-		}
+		sq_pid = sq->task_pid;
+		sq_cpu = sq->sq_cpu;
 	}
 
 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index bd6c2c7959a5..a604cb6c5272 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -214,6 +214,7 @@ static bool io_sqd_handle_event(struct io_sq_data *sqd)
 			did_sig = get_signal(&ksig);
 		cond_resched();
 		mutex_lock(&sqd->lock);
+		sqd->sq_cpu = raw_smp_processor_id();
 	}
 	return did_sig || test_bit(IO_SQ_THREAD_SHOULD_STOP, &sqd->state);
 }
@@ -229,10 +230,12 @@ static int io_sq_thread(void *data)
 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
 
-	if (sqd->sq_cpu != -1)
+	if (sqd->sq_cpu != -1) {
 		set_cpus_allowed_ptr(current, cpumask_of(sqd->sq_cpu));
-	else
+	} else {
 		set_cpus_allowed_ptr(current, cpu_online_mask);
+		sqd->sq_cpu = raw_smp_processor_id();
+	}
 
 	mutex_lock(&sqd->lock);
 	while (1) {
@@ -261,6 +264,7 @@ static int io_sq_thread(void *data)
 				mutex_unlock(&sqd->lock);
 				cond_resched();
 				mutex_lock(&sqd->lock);
+				sqd->sq_cpu = raw_smp_processor_id();
 			}
 			continue;
 		}
@@ -294,6 +298,7 @@ static int io_sq_thread(void *data)
 				mutex_unlock(&sqd->lock);
 				schedule();
 				mutex_lock(&sqd->lock);
+				sqd->sq_cpu = raw_smp_processor_id();
 			}
 			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 				atomic_andnot(IORING_SQ_NEED_WAKEUP,
-- 
Jens Axboe



* Re: [PATCH v3] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
From: Xiaobing Li @ 2023-11-15  6:10 UTC (permalink / raw)
  To: axboe
  Cc: asml.silence, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
	wenwen.chen, ruyi.zhang, xiaobing.li

On 11/15/23 2:36 AM, Jens Axboe wrote:
> 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
> 		struct io_sq_data *sq = ctx->sq_data;
> 
>-		if (mutex_trylock(&sq->lock)) {
>-			if (sq->thread) {
>-				sq_pid = task_pid_nr(sq->thread);
>-				sq_cpu = task_cpu(sq->thread);
>-			}
>-			mutex_unlock(&sq->lock);
>-		}
>+		sq_pid = sq->task_pid;
>+		sq_cpu = sq->sq_cpu;
> 	}

There are two problems:
1. The output of SqThread is inaccurate: what is actually recorded is the
   PID of the parent process, not the sqpoll thread itself.
2. The output is intermittent: sometimes the correct values are printed,
   and sometimes -1 is output.

The test results are as follows:
Every 0.5s: cat /proc/9572/fdinfo/6 | grep Sq
SqMask: 0x3
SqHead: 6765744
SqTail: 6765744
CachedSqHead:   6765744
SqThread:       -1
SqThreadCpu:    -1
SqBusy: 0%
-------------------------------------------
Every 0.5s: cat /proc/9572/fdinfo/6 | grep Sq
SqMask: 0x3
SqHead: 7348727
SqTail: 7348728
CachedSqHead:   7348728
SqThread:       9571
SqThreadCpu:    174
SqBusy: 95%



* Re: [PATCH v3] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
From: Jens Axboe @ 2023-11-15 13:42 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: asml.silence, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
	wenwen.chen, ruyi.zhang

On 11/14/23 11:10 PM, Xiaobing Li wrote:
> On 11/15/23 2:36 AM, Jens Axboe wrote:
>> 	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>> 		struct io_sq_data *sq = ctx->sq_data;
>>
>> -		if (mutex_trylock(&sq->lock)) {
>> -			if (sq->thread) {
>> -				sq_pid = task_pid_nr(sq->thread);
>> -				sq_cpu = task_cpu(sq->thread);
>> -			}
>> -			mutex_unlock(&sq->lock);
>> -		}
>> +		sq_pid = sq->task_pid;
>> +		sq_cpu = sq->sq_cpu;
>> 	}
> 
> There are two problems:
> 1. The output of SqThread is inaccurate: what is actually recorded is
> the PID of the parent process, not the sqpoll thread itself.

Doh yes, we need to reset this at the start of the thread, after
assigning task_comm. I'll send out a v4 today.
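
Something along these lines (sketch, untested):

 	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
 	set_task_comm(current, buf);
+	/* now that the comm is set, record the sqpoll thread's own pid */
+	sqd->task_pid = current->pid;

sqd->task_pid is initialized from the ring-creating task in
io_sq_offload_create(), which is why fdinfo currently shows the parent's
pid; overwriting it here keeps the "iou-sqp-<pid>" naming intact while
making fdinfo report the sqpoll thread's own pid.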

> 2. The output is intermittent: sometimes the correct values are printed,
> and sometimes -1 is output.
> 
> The test results are as follows:
> Every 0.5s: cat /proc/9572/fdinfo/6 | grep Sq
> SqMask: 0x3
> SqHead: 6765744
> SqTail: 6765744
> CachedSqHead:   6765744
> SqThread:       -1
> SqThreadCpu:    -1
> SqBusy: 0%
> -------------------------------------------
> Every 0.5s: cat /proc/9572/fdinfo/6 | grep Sq
> SqMask: 0x3
> SqHead: 7348727
> SqTail: 7348728
> CachedSqHead:   7348728
> SqThread:       9571
> SqThreadCpu:    174
> SqBusy: 95%

Right, this is due to the uring_lock. We got rid of the main regression,
which was the new trylock for the sqd->lock, but the old one remains. We
can fix this as well for sqpoll info, but it's not a regression from
past releases; it's always been like that.

Pavel and I discussed it yesterday, and the easy solution is to make
io_sq_data be under RCU protection. But that requires this patch first,
so we don't have to fiddle with the sqpoll task itself. I can try and
hack up the patch if you want to test it; it'd be on top of this one and
target the next kernel release rather than 6.7.

-- 
Jens Axboe



* Re: [PATCH v3] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
From: Jens Axboe @ 2023-11-15 13:49 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: asml.silence, io-uring, kun.dou, peiwei.li, joshi.k, kundan.kumar,
	wenwen.chen, ruyi.zhang

On 11/15/23 6:42 AM, Jens Axboe wrote:
>> 2. The output is intermittent: sometimes the correct values are printed,
>> and sometimes -1 is output.
>>
>> The test results are as follows:
>> Every 0.5s: cat /proc/9572/fdinfo/6 | grep Sq
>> SqMask: 0x3
>> SqHead: 6765744
>> SqTail: 6765744
>> CachedSqHead:   6765744
>> SqThread:       -1
>> SqThreadCpu:    -1
>> SqBusy: 0%
>> -------------------------------------------
>> Every 0.5s: cat /proc/9572/fdinfo/6 | grep Sq
>> SqMask: 0x3
>> SqHead: 7348727
>> SqTail: 7348728
>> CachedSqHead:   7348728
>> SqThread:       9571
>> SqThreadCpu:    174
>> SqBusy: 95%
> 
> Right, this is due to the uring_lock. We got rid of the main
> regression, which was the new trylock for the sqd->lock, but the old
> one remains. We can fix this as well for sqpoll info, but it's not a
> regression from past releases; it's always been like that.
> 
> Pavel and I discussed it yesterday, and the easy solution is to make
> io_sq_data be under RCU protection. But that requires this patch
> first, so we don't have to fiddle with the sqpoll task itself. I can
> try and hack up the patch if you want to test it; it'd be on top of
> this one and target the next kernel release rather than 6.7.

Something like this, totally untested.

diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 976e9500f651..434a21a6b653 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -142,11 +142,16 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f)
 	 */
 	has_lock = mutex_trylock(&ctx->uring_lock);
 
-	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
-		struct io_sq_data *sq = ctx->sq_data;
-
-		sq_pid = sq->task_pid;
-		sq_cpu = sq->sq_cpu;
+	if (ctx->flags & IORING_SETUP_SQPOLL) {
+		struct io_sq_data *sq;
+
+		rcu_read_lock();
+		sq = READ_ONCE(ctx->sq_data);
+		if (sq) {
+			sq_pid = sq->task_pid;
+			sq_cpu = sq->sq_cpu;
+		}
+		rcu_read_unlock();
 	}
 
 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 65b5dbe3c850..583c76945cdf 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -70,7 +70,7 @@ void io_put_sq_data(struct io_sq_data *sqd)
 		WARN_ON_ONCE(atomic_read(&sqd->park_pending));
 
 		io_sq_thread_stop(sqd);
-		kfree(sqd);
+		kfree_rcu(sqd, rcu);
 	}
 }
 
@@ -313,7 +313,7 @@ static int io_sq_thread(void *data)
 	}
 
 	io_uring_cancel_generic(true, sqd);
-	sqd->thread = NULL;
+	WRITE_ONCE(sqd->thread, NULL);
 	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
 		atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
 	io_run_task_work();
@@ -411,7 +411,7 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
 			goto err_sqpoll;
 		}
 
-		sqd->thread = tsk;
+		WRITE_ONCE(sqd->thread, tsk);
 		ret = io_uring_alloc_task_context(tsk, ctx);
 		wake_up_new_task(tsk);
 		if (ret)
diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
index 8df37e8c9149..0cf0c5833a27 100644
--- a/io_uring/sqpoll.h
+++ b/io_uring/sqpoll.h
@@ -18,6 +18,8 @@ struct io_sq_data {
 
 	unsigned long		state;
 	struct completion	exited;
+
+	struct rcu_head		rcu;
 };
 
 int io_sq_offload_create(struct io_ring_ctx *ctx, struct io_uring_params *p);

-- 
Jens Axboe



* Re: [PATCH v3] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
From: Xiaobing Li @ 2023-11-18  3:19 UTC (permalink / raw)
  To: axboe
  Cc: asml.silence, linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k,
	kundan.kumar, wenwen.chen, ruyi.zhang, xiaobing.li

On 11/15/23 6:42 AM, Jens Axboe wrote:
> 	 */
> 	has_lock = mutex_trylock(&ctx->uring_lock);
> 
>-	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>-		struct io_sq_data *sq = ctx->sq_data;
>-
>-		sq_pid = sq->task_pid;
>-		sq_cpu = sq->sq_cpu;
>+	if (ctx->flags & IORING_SETUP_SQPOLL) {
>+		struct io_sq_data *sq;
>+
>+		rcu_read_lock();
>+		sq = READ_ONCE(ctx->sq_data);
>+		if (sq) {
>+			sq_pid = sq->task_pid;
>+			sq_cpu = sq->sq_cpu;
>+		}
>+		rcu_read_unlock();
> 	}
> 
> 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
>diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
>index 65b5dbe3c850..583c76945cdf 100644
>--- a/io_uring/sqpoll.c
>+++ b/io_uring/sqpoll.c
>@@ -70,7 +70,7 @@ void io_put_sq_data(struct io_sq_data *sqd)
> 		WARN_ON_ONCE(atomic_read(&sqd->park_pending));
> 
> 		io_sq_thread_stop(sqd);
>-		kfree(sqd);
>+		kfree_rcu(sqd, rcu);
> 	}
> }
> 
>@@ -313,7 +313,7 @@ static int io_sq_thread(void *data)
> 	}
> 
> 	io_uring_cancel_generic(true, sqd);
>-	sqd->thread = NULL;
>+	WRITE_ONCE(sqd->thread, NULL);
> 	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
> 		atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
> 	io_run_task_work();
>@@ -411,7 +411,7 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
> 			goto err_sqpoll;
> 		}
> 
>-		sqd->thread = tsk;
>+		WRITE_ONCE(sqd->thread, tsk);
> 		ret = io_uring_alloc_task_context(tsk, ctx);
> 		wake_up_new_task(tsk);
> 		if (ret)
>diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
>index 8df37e8c9149..0cf0c5833a27 100644
>--- a/io_uring/sqpoll.h
>+++ b/io_uring/sqpoll.h
>@@ -18,6 +18,8 @@ struct io_sq_data {
> 
> 	unsigned long		state;
> 	struct completion	exited;
>+
>+	struct rcu_head		rcu;
> };
> 
> int io_sq_offload_create(struct io_ring_ctx *ctx, struct io_uring_params *p);

I tested this, and it works after adding the RCU protection; it
consistently outputs correct results.

The results of a simple test are as follows:
Every 0.5s: cat /proc/10212/fdinfo/6 | grep Sq
SqMask: 0x3
SqHead: 17422716
SqTail: 17422716
CachedSqHead:   17422716
SqThread:       10212
SqThreadCpu:    73
SqBusy: 97%
-------------------------------------------------------------
But the name of the sq thread is "iou-sqp-" plus the PID of its parent process:
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  10211 root      20   0  184408   8192      0 R  99.9   0.0   4:01.42 fio
  10212 root      20   0  184408   8192      0 R  99.9   0.0   4:01.48 iou-sqp-10211
Is this the originally desired effect?


* Re: [PATCH v3] io_uring/fdinfo: remove need for sqpoll lock for thread/pid retrieval
From: Jens Axboe @ 2023-11-19 21:23 UTC (permalink / raw)
  To: Xiaobing Li
  Cc: asml.silence, linux-kernel, io-uring, kun.dou, peiwei.li, joshi.k,
	kundan.kumar, wenwen.chen, ruyi.zhang

On 11/17/23 8:19 PM, Xiaobing Li wrote:
> On 11/15/23 6:42 AM, Jens Axboe wrote:
>> 	 */
>> 	has_lock = mutex_trylock(&ctx->uring_lock);
>>
>> -	if (has_lock && (ctx->flags & IORING_SETUP_SQPOLL)) {
>> -		struct io_sq_data *sq = ctx->sq_data;
>> -
>> -		sq_pid = sq->task_pid;
>> -		sq_cpu = sq->sq_cpu;
>> +	if (ctx->flags & IORING_SETUP_SQPOLL) {
>> +		struct io_sq_data *sq;
>> +
>> +		rcu_read_lock();
>> +		sq = READ_ONCE(ctx->sq_data);
>> +		if (sq) {
>> +			sq_pid = sq->task_pid;
>> +			sq_cpu = sq->sq_cpu;
>> +		}
>> +		rcu_read_unlock();
>> 	}
>>
>> 	seq_printf(m, "SqThread:\t%d\n", sq_pid);
>> diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
>> index 65b5dbe3c850..583c76945cdf 100644
>> --- a/io_uring/sqpoll.c
>> +++ b/io_uring/sqpoll.c
>> @@ -70,7 +70,7 @@ void io_put_sq_data(struct io_sq_data *sqd)
>> 		WARN_ON_ONCE(atomic_read(&sqd->park_pending));
>>
>> 		io_sq_thread_stop(sqd);
>> -		kfree(sqd);
>> +		kfree_rcu(sqd, rcu);
>> 	}
>> }
>>
>> @@ -313,7 +313,7 @@ static int io_sq_thread(void *data)
>> 	}
>>
>> 	io_uring_cancel_generic(true, sqd);
>> -	sqd->thread = NULL;
>> +	WRITE_ONCE(sqd->thread, NULL);
>> 	list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
>> 		atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
>> 	io_run_task_work();
>> @@ -411,7 +411,7 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
>> 			goto err_sqpoll;
>> 		}
>>
>> -		sqd->thread = tsk;
>> +		WRITE_ONCE(sqd->thread, tsk);
>> 		ret = io_uring_alloc_task_context(tsk, ctx);
>> 		wake_up_new_task(tsk);
>> 		if (ret)
>> diff --git a/io_uring/sqpoll.h b/io_uring/sqpoll.h
>> index 8df37e8c9149..0cf0c5833a27 100644
>> --- a/io_uring/sqpoll.h
>> +++ b/io_uring/sqpoll.h
>> @@ -18,6 +18,8 @@ struct io_sq_data {
>>
>> 	unsigned long		state;
>> 	struct completion	exited;
>> +
>> +	struct rcu_head		rcu;
>> };
>>
>> int io_sq_offload_create(struct io_ring_ctx *ctx, struct io_uring_params *p);
> 
> I tested this, and it works after adding the RCU protection; it
> consistently outputs correct results.
> 
> The results of a simple test are as follows:
> Every 0.5s: cat /proc/10212/fdinfo/6 | grep Sq
> SqMask: 0x3
> SqHead: 17422716
> SqTail: 17422716
> CachedSqHead:   17422716
> SqThread:       10212
> SqThreadCpu:    73
> SqBusy: 97%
> -------------------------------------------------------------
> But the name of the sq thread is "iou-sqp-" plus the PID of its parent process:
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   10211 root      20   0  184408   8192      0 R  99.9   0.0   4:01.42 fio
>   10212 root      20   0  184408   8192      0 R  99.9   0.0   4:01.48 iou-sqp-10211
> Is this the originally desired effect?

That's how it has always been; it's the sqpoll thread belonging to that
parent.
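
The name comes straight from io_sq_thread(), which formats the comm from
the pid recorded when the ring was set up; a simplified excerpt:

	/* sqd->task_pid is the pid of the task that created the ring */
	snprintf(buf, sizeof(buf), "iou-sqp-%d", sqd->task_pid);
	set_task_comm(current, buf);

So "iou-sqp-10211" is the sqpoll thread serving the fio task with pid
10211.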

-- 
Jens Axboe


