public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Oleksandr Natalenko <[email protected]>,
	[email protected], Genes Lists <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>,
	Pavel Begunkov <[email protected]>,
	[email protected], [email protected],
	Andres Freund <[email protected]>
Subject: Re: [PATCH 6.4 800/800] io_uring: Use io_schedule* in cqring wait
Date: Mon, 24 Jul 2023 09:47:21 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 7/23/23 12:06?PM, Oleksandr Natalenko wrote:
> Hello.
> 
> On ned?le 23. ?ervence 2023 19:43:50 CEST Genes Lists wrote:
>> On 7/23/23 11:31, Jens Axboe wrote:
>> ...
>>> Just read the first one, but this is very much expected. It's now just
>>> correctly reflecting that one thread is waiting on IO. IO wait being
>>> 100% doesn't mean that one core is running 100% of the time, it just
>>> means it's WAITING on IO 100% of the time.
>>>
>>
>> Seems reasonable thank you.
>>
>> Question - do you expect the iowait to stay high for a freshly created 
>> mariadb doing nothing (as far as I can tell anyway) until process 
>> exited? Or Would you think it would drop in this case prior to the 
>> process exiting.
>>
>> For example I tried the following - is the output what you expect?
>>
>> Create a fresh mariab with no databases - monitor the core showing the 
>> iowaits with:
>>
>>     mpstat -P ALL 2 100
>>
>> # rm -f /var/lib/mysql/*
>> # mariadb-install-db --user=mysql --basedir=/usr --datadir=/var/lib/mysql
>>
>> # systemctl start mariadb      (iowaits -> 100%) 
>>  
>>
>> # iotop -bo |grep maria        (shows no output, iowait stays 100%)
>>
>> (this persists until mariadb process exits)
>>  
>>
>> # systemctl stop mariadb       (iowait drops to 0%) 
> 
> This is a visible userspace behaviour change with no changes in the
> userspace itself, so we cannot just ignore it. If for some reason this
> is how it should be now, how do we explain it to MariaDB devs to get
> this fixed?

It's not a behavioural change, it's a reporting change. There's no
functionality changing here. That said, I do think we should narrow it a
bit so we're only marked as in iowait if the task waiting has pending
IO. That should still satisfy the initial problem, and it won't flag
iowait on mariadb like cases where they have someone else just
perpetually waiting on requests.

As a side effect, this also removes the flush that wasn't at all
necessary on io_uring.

If folks are able to test the below, that would be appreciated.


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 89a611541bc4..f4591b912ea8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2493,11 +2493,20 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
 	return 0;
 }
 
+static bool current_pending_io(void)
+{
+	struct io_uring_task *tctx = current->io_uring;
+
+	if (!tctx)
+		return false;
+	return percpu_counter_read_positive(&tctx->inflight);
+}
+
 /* when returns >0, the caller should retry */
 static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 					  struct io_wait_queue *iowq)
 {
-	int token, ret;
+	int io_wait, ret;
 
 	if (unlikely(READ_ONCE(ctx->check_cq)))
 		return 1;
@@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 		return 0;
 
 	/*
-	 * Use io_schedule_prepare/finish, so cpufreq can take into account
-	 * that the task is waiting for IO - turns out to be important for low
-	 * QD IO.
+	 * Mark us as being in io_wait if we have pending requests, so cpufreq
+	 * can take into account that the task is waiting for IO - turns out
+	 * to be important for low QD IO.
 	 */
-	token = io_schedule_prepare();
+	io_wait = current->in_iowait;
+	if (current_pending_io())
+		current->in_iowait = 1;
 	ret = 0;
 	if (iowq->timeout == KTIME_MAX)
 		schedule();
 	else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
 		ret = -ETIME;
-	io_schedule_finish(token);
+	current->in_iowait = io_wait;
 	return ret;
 }
 

-- 
Jens Axboe


  parent reply	other threads:[~2023-07-24 15:47 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <[email protected]>
2023-07-16 19:50 ` [PATCH 6.4 800/800] io_uring: Use io_schedule* in cqring wait Greg Kroah-Hartman
2023-07-23  9:39   ` Oleksandr Natalenko
2023-07-23 10:50     ` Greg Kroah-Hartman
2023-07-23 10:55       ` Genes Lists
2023-07-23 10:56       ` Oleksandr Natalenko
2023-07-23 12:11     ` Linux regression tracking #adding (Thorsten Leemhuis)
2023-07-23 17:35       ` Linux regression tracking #update (Thorsten Leemhuis)
2023-07-23 15:31     ` Jens Axboe
2023-07-23 17:43       ` Genes Lists
2023-07-23 18:06         ` Oleksandr Natalenko
2023-07-23 18:58           ` Andres Freund
2023-07-23 19:44             ` Genes Lists
2023-07-24 15:47           ` Jens Axboe [this message]
2023-07-24 18:04             ` Genes Lists

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox