From: David Wei <[email protected]>
To: [email protected]
Cc: Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	David Wei <[email protected]>
Subject: [PATCH v2] io_uring: add IORING_ENTER_NO_IOWAIT to not set in_iowait
Date: Fri, 16 Aug 2024 15:36:40 -0700
Message-ID: <[email protected]>

io_uring sets current->in_iowait when waiting for completions, which
achieves two things:

1. Proper accounting of the time as iowait time
2. Enable cpufreq optimisations, setting SCHED_CPUFREQ_IOWAIT on the rq

For block IO this makes sense, as high iowait can be indicative of
issues. But for network IO, especially recv, the receiving side does
not control when the completions happen.

Some user tooling counts iowait time as CPU utilisation, i.e. not idle,
so high iowait time looks like high CPU util even though the task is
not scheduled and the CPU is free to run other tasks. When doing
network IO with e.g. the batch completion feature, the CPU may appear
to have high utilisation.
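
To make the accounting difference concrete, here is a minimal sketch of
how a monitoring tool might derive utilisation from one sample of the
aggregate "cpu" line in /proc/stat. The struct and function names are
hypothetical and chosen for illustration only; real tools work on
deltas between two samples rather than on a single one.

```c
#include <stdint.h>

/* One sample of the aggregate "cpu" line in /proc/stat, in clock ticks. */
struct cpu_sample {
	uint64_t user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Sum of all time buckets in the sample. */
static uint64_t cpu_total(const struct cpu_sample *s)
{
	return s->user + s->nice + s->system + s->idle + s->iowait +
	       s->irq + s->softirq + s->steal;
}

/* Utilisation in percent when iowait is counted as busy time. */
static double util_iowait_busy(const struct cpu_sample *s)
{
	uint64_t total = cpu_total(s);

	if (!total)
		return 0.0;
	return 100.0 * (double)(total - s->idle) / (double)total;
}

/* Utilisation in percent when iowait is counted as idle time. */
static double util_iowait_idle(const struct cpu_sample *s)
{
	uint64_t total = cpu_total(s);

	if (!total)
		return 0.0;
	return 100.0 * (double)(total - s->idle - s->iowait) / (double)total;
}
```

With a sample that is mostly iowait, the first function reports a busy
CPU and the second a nearly idle one, which is the discrepancy
described above.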

This patch adds an IORING_ENTER_NO_IOWAIT flag that can be set on
enter. If set, then current->in_iowait is not set while waiting, so
that waiting for completions is not accounted as CPU utilisation. By
default this flag is not set, to maintain existing behaviour, i.e.
in_iowait is set as before.

Not setting in_iowait does mean that we also lose the cpufreq
optimisations above, because in_iowait semantics couple 1 and 2
together. Eventually we will untangle the two so the optimisations can
be enabled independently of the accounting.

IORING_FEAT_IOWAIT_TOGGLE is set in the features returned by
io_uring_create() to indicate support. This will be used by liburing to
check for this feature.
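
For illustration, a minimal sketch of how userspace might pick its
enter flags based on the feature bit. The EX_-prefixed names and both
helper functions are hypothetical; the bit values are copied from this
patch and from the existing IORING_ENTER_GETEVENTS definition, and are
redefined locally so the sketch does not depend on the installed
<linux/io_uring.h>.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit values as proposed in this patch (assumption: they may differ in
 * whatever version is eventually merged). Local EX_ names avoid
 * clashing with the system header.
 */
#define EX_IORING_ENTER_GETEVENTS	(1U << 0)
#define EX_IORING_ENTER_NO_IOWAIT	(1U << 6)
#define EX_IORING_FEAT_IOWAIT_TOGGLE	(1U << 15)

/* True if the features word from io_uring_params advertises the toggle. */
static bool ex_has_iowait_toggle(uint32_t features)
{
	return (features & EX_IORING_FEAT_IOWAIT_TOGGLE) != 0;
}

/*
 * Build the flags for a waiting io_uring_enter(2) call. The no-iowait
 * bit is only added when the kernel advertises support, since unknown
 * enter flags are rejected with -EINVAL.
 */
static uint32_t ex_wait_flags(uint32_t features, bool want_no_iowait)
{
	uint32_t flags = EX_IORING_ENTER_GETEVENTS;

	if (want_no_iowait && ex_has_iowait_toggle(features))
		flags |= EX_IORING_ENTER_NO_IOWAIT;
	return flags;
}
```

Here features would be the value of io_uring_params.features after
io_uring_setup(2), and the result would be passed as the flags argument
of io_uring_enter(2).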

Signed-off-by: David Wei <[email protected]>
---
v2:
 - squash patches into one
 - move no_iowait in struct io_wait_queue to the end
 - always set iowq.no_iowait

---
 include/uapi/linux/io_uring.h | 2 ++
 io_uring/io_uring.c           | 7 ++++---
 io_uring/io_uring.h           | 1 +
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 48c440edf674..3a94afa8665e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -508,6 +508,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
 #define IORING_ENTER_ABS_TIMER		(1U << 5)
+#define IORING_ENTER_NO_IOWAIT		(1U << 6)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -543,6 +544,7 @@ struct io_uring_params {
 #define IORING_FEAT_LINKED_FILE		(1U << 12)
 #define IORING_FEAT_REG_REG_RING	(1U << 13)
 #define IORING_FEAT_RECVSEND_BUNDLE	(1U << 14)
+#define IORING_FEAT_IOWAIT_TOGGLE	(1U << 15)
 
 /*
  * io_uring_register(2) opcodes and arguments
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 20229e72b65c..5e75672525df 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2372,7 +2372,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 	 * can take into account that the task is waiting for IO - turns out
 	 * to be important for low QD IO.
 	 */
-	if (current_pending_io())
+	if (!iowq->no_iowait && current_pending_io())
 		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
@@ -2414,6 +2414,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.nr_timeouts = atomic_read(&ctx->cq_timeouts);
 	iowq.cq_tail = READ_ONCE(ctx->rings->cq.head) + min_events;
 	iowq.timeout = KTIME_MAX;
+	iowq.no_iowait = flags & IORING_ENTER_NO_IOWAIT;
 
 	if (uts) {
 		struct timespec64 ts;
@@ -3155,7 +3156,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
 			       IORING_ENTER_REGISTERED_RING |
-			       IORING_ENTER_ABS_TIMER)))
+			       IORING_ENTER_ABS_TIMER | IORING_ENTER_NO_IOWAIT)))
 		return -EINVAL;
 
 	/*
@@ -3539,7 +3540,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
 			IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
 			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING |
-			IORING_FEAT_RECVSEND_BUNDLE;
+			IORING_FEAT_RECVSEND_BUNDLE | IORING_FEAT_IOWAIT_TOGGLE;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9935819f12b7..426079a966ac 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -46,6 +46,7 @@ struct io_wait_queue {
 	ktime_t napi_busy_poll_dt;
 	bool napi_prefer_busy_poll;
 #endif
+	bool no_iowait;
 };
 
 static inline bool io_should_wake(struct io_wait_queue *iowq)
-- 
2.43.5

