public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v1 0/3] io_uring: add option to not set in_iowait
@ 2024-08-16 18:01 David Wei
  2024-08-16 18:01 ` [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag David Wei
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: David Wei @ 2024-08-16 18:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

io_uring sets current->in_iowait when waiting for completions, which
achieves two things:

1. Proper accounting of the time as iowait time
2. Enable cpufreq optimisations, setting SCHED_CPUFREQ_IOWAIT on the rq

For block IO this makes sense as high iowait can be indicative of
issues. But for network IO especially recv, we do not control when the
completions happen. When doing network IO with the new min-wait feature
that lets io_uring wait for a certain number of completions before
returning, this manifests as high iowait time.

Some user tooling attributes iowait time as 'CPU utilisation' time, so
high iowait time looks like high CPU util even though the task is not
scheduled and the CPU is free to run other tasks.

This patchset adds a IOURING_ENTER_NO_IOWAIT flag that can be set on
enter. If set, then current->in_iowait is not set. By default this flag
is not set to maintain existing behaviour i.e. in_iowait is always set.

Not setting in_iowait does mean that we also lose cpufreq optimisations
above because in_iowait semantics couples 1 and 2 together. Eventually
we will untangle the two so the optimisations can be enabled
independently of the accounting.

David Wei (3):
  io_uring: add IORING_ENTER_NO_IOWAIT flag
  io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT
  io_uring: add IORING_FEAT_IOWAIT_TOGGLE feature flag

 include/uapi/linux/io_uring.h | 2 ++
 io_uring/io_uring.c           | 8 +++++---
 io_uring/io_uring.h           | 1 +
 3 files changed, 8 insertions(+), 3 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag
  2024-08-16 18:01 [PATCH v1 0/3] io_uring: add option to not set in_iowait David Wei
@ 2024-08-16 18:01 ` David Wei
  2024-08-16 18:35   ` Jens Axboe
  2024-08-16 18:01 ` [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT David Wei
  2024-08-16 18:01 ` [PATCH v1 3/3] io_uring: add IORING_FEAT_IOWAIT_TOGGLE feature flag David Wei
  2 siblings, 1 reply; 8+ messages in thread
From: David Wei @ 2024-08-16 18:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

Add IORING_ENTER_NO_IOWAIT flag. If this is set then io_uring will not
set current->in_iowait prior to waiting.

Signed-off-by: David Wei <[email protected]>
---
 include/uapi/linux/io_uring.h | 1 +
 io_uring/io_uring.c           | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 48c440edf674..2552d4927511 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -508,6 +508,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
 #define IORING_ENTER_ABS_TIMER		(1U << 5)
+#define IORING_ENTER_NO_IOWAIT		(1U << 6)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 20229e72b65c..4cc905b228a5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3155,7 +3155,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
 			       IORING_ENTER_REGISTERED_RING |
-			       IORING_ENTER_ABS_TIMER)))
+			       IORING_ENTER_ABS_TIMER | IORING_ENTER_NO_IOWAIT)))
 		return -EINVAL;
 
 	/*
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT
  2024-08-16 18:01 [PATCH v1 0/3] io_uring: add option to not set in_iowait David Wei
  2024-08-16 18:01 ` [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag David Wei
@ 2024-08-16 18:01 ` David Wei
  2024-08-16 18:38   ` Jens Axboe
  2024-08-16 18:49   ` Jens Axboe
  2024-08-16 18:01 ` [PATCH v1 3/3] io_uring: add IORING_FEAT_IOWAIT_TOGGLE feature flag David Wei
  2 siblings, 2 replies; 8+ messages in thread
From: David Wei @ 2024-08-16 18:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

Check for IORING_ENTER_NO_WAIT and do not set current->in_iowait if it
is set. To maintain existing behaviour, by default this flag is not set.

This is to prevent waiting for completions being accounted as iowait
time. Some userspace tools consider iowait time to be 'utilisation' time
which is misleading since the task is not scheduled and the CPU is free
to run other tasks.

High iowait time might be indicative of issues for block IO, but not for
network IO i.e. recv() where we do not control when IO happens.

Signed-off-by: David Wei <[email protected]>
---
 io_uring/io_uring.c | 4 +++-
 io_uring/io_uring.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4cc905b228a5..9438875e43ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2372,7 +2372,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	 * can take into account that the task is waiting for IO - turns out
 	 * to be important for low QD IO.
 	 */
-	if (current_pending_io())
+	if (!iowq->no_iowait && current_pending_io())
 		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
@@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 	iowq.timeout = KTIME_MAX;
+	if (flags & IORING_ENTER_NO_IOWAIT)
+		iowq.no_iowait = true;
 
 	if (uts) {
 		struct timespec64 ts;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9935819f12b7..e35fecca4445 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -41,6 +41,7 @@ struct io_wait_queue {
 	unsigned cq_tail;
 	unsigned nr_timeouts;
 	ktime_t timeout;
+	bool no_iowait;
 
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	ktime_t napi_busy_poll_dt;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v1 3/3] io_uring: add IORING_FEAT_IOWAIT_TOGGLE feature flag
  2024-08-16 18:01 [PATCH v1 0/3] io_uring: add option to not set in_iowait David Wei
  2024-08-16 18:01 ` [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag David Wei
  2024-08-16 18:01 ` [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT David Wei
@ 2024-08-16 18:01 ` David Wei
  2 siblings, 0 replies; 8+ messages in thread
From: David Wei @ 2024-08-16 18:01 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, Pavel Begunkov

Add IORING_FEAT_IOWAIT_TOGGLE and return it in io_uring_create(). This
will be used by liburing to check for this feature.

Signed-off-by: David Wei <[email protected]>
---
 include/uapi/linux/io_uring.h | 1 +
 io_uring/io_uring.c           | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2552d4927511..3a94afa8665e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -544,6 +544,7 @@ struct io_uring_params {
 #define IORING_FEAT_LINKED_FILE		(1U << 12)
 #define IORING_FEAT_REG_REG_RING	(1U << 13)
 #define IORING_FEAT_RECVSEND_BUNDLE	(1U << 14)
+#define IORING_FEAT_IOWAIT_TOGGLE	(1U << 15)
 
 /*
  * io_uring_register(2) opcodes and arguments
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9438875e43ea..006bccd55984 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3541,7 +3541,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
 			IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
 			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING |
-			IORING_FEAT_RECVSEND_BUNDLE;
+			IORING_FEAT_RECVSEND_BUNDLE | IORING_FEAT_IOWAIT_TOGGLE;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag
  2024-08-16 18:01 ` [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag David Wei
@ 2024-08-16 18:35   ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-08-16 18:35 UTC (permalink / raw)
  To: David Wei, io-uring; +Cc: Pavel Begunkov

On 8/16/24 12:01 PM, David Wei wrote:
> Add IORING_ENTER_NO_IOWAIT flag. If this is set then io_uring will not
> set current->in_iowait prior to waiting.

Ordering is a bit wrong here - with this patch, you can set
IORING_ENTER_NO_IOWAIT and it will appear to work (eg no -EINVAL
return), but it won't actually work.

I'd just squash patch 1 + 2, as just having patch 2 with the uapi hunk
be patch 1 leaves patch 2 as just adding it to the flags check.

And at that point you'd probably just want to fold in patch 3 as well,
not sure that makes sense as a separate patch.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT
  2024-08-16 18:01 ` [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT David Wei
@ 2024-08-16 18:38   ` Jens Axboe
  2024-08-16 22:23     ` David Wei
  2024-08-16 18:49   ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2024-08-16 18:38 UTC (permalink / raw)
  To: David Wei, io-uring; +Cc: Pavel Begunkov

On 8/16/24 12:01 PM, David Wei wrote:
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index 9935819f12b7..e35fecca4445 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -41,6 +41,7 @@ struct io_wait_queue {
>  	unsigned cq_tail;
>  	unsigned nr_timeouts;
>  	ktime_t timeout;
> +	bool no_iowait;
>  
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	ktime_t napi_busy_poll_dt;

I'd put that bool below the NAPI section, then it'll pack in with
napi_prefer_busy_poll rather than waste 7 bytes as it does here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT
  2024-08-16 18:01 ` [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT David Wei
  2024-08-16 18:38   ` Jens Axboe
@ 2024-08-16 18:49   ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-08-16 18:49 UTC (permalink / raw)
  To: David Wei, io-uring; +Cc: Pavel Begunkov

On 8/16/24 12:01 PM, David Wei wrote:
> @@ -2414,6 +2414,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
>  	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
>  	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
>  	iowq.timeout = KTIME_MAX;
> +	if (flags & IORING_ENTER_NO_IOWAIT)
> +		iowq.no_iowait = true;

Oh, and this should be:

	iowq.no_iowait = flags & IORING_ENTER_NO_IOWAIT;

to avoid leaving this field uninitialized by default if the flag isn't
set. The struct isn't initialized to zero.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT
  2024-08-16 18:38   ` Jens Axboe
@ 2024-08-16 22:23     ` David Wei
  0 siblings, 0 replies; 8+ messages in thread
From: David Wei @ 2024-08-16 22:23 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Pavel Begunkov

On 2024-08-16 11:38, Jens Axboe wrote:
> On 8/16/24 12:01 PM, David Wei wrote:
>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
>> index 9935819f12b7..e35fecca4445 100644
>> --- a/io_uring/io_uring.h
>> +++ b/io_uring/io_uring.h
>> @@ -41,6 +41,7 @@ struct io_wait_queue {
>>  	unsigned cq_tail;
>>  	unsigned nr_timeouts;
>>  	ktime_t timeout;
>> +	bool no_iowait;
>>  
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>>  	ktime_t napi_busy_poll_dt;
> 
> I'd put that bool below the NAPI section, then it'll pack in with
> napi_prefer_busy_poll rather than waste 7 bytes as it does here.
> 

Thanks, I will remember to always check with pahole next time!

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-08-16 22:23 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-16 18:01 [PATCH v1 0/3] io_uring: add option to not set in_iowait David Wei
2024-08-16 18:01 ` [PATCH v1 1/3] io_uring: add IORING_ENTER_NO_IOWAIT flag David Wei
2024-08-16 18:35   ` Jens Axboe
2024-08-16 18:01 ` [PATCH v1 2/3] io_uring: do not set no_iowait if IORING_ENTER_NO_WAIT David Wei
2024-08-16 18:38   ` Jens Axboe
2024-08-16 22:23     ` David Wei
2024-08-16 18:49   ` Jens Axboe
2024-08-16 18:01 ` [PATCH v1 3/3] io_uring: add IORING_FEAT_IOWAIT_TOGGLE feature flag David Wei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox