From: Fengnan Chang <changfengnan@bytedance.com>
To: axboe@kernel.dk, asml.silence@gmail.com
Cc: io-uring@vger.kernel.org,
Fengnan Chang <changfengnan@bytedance.com>,
Diangang Li <lidiangang@bytedance.com>
Subject: [RFC PATCH] io_uring: fix io worker thread that keeps creating and destroying
Date: Thu, 22 May 2025 17:09:09 +0800 [thread overview]
Message-ID: <20250522090909.73212-1-changfengnan@bytedance.com> (raw)
When running fio with buffered IO at stable IOPS, I observed that some
io_worker threads are repeatedly created and destroyed.
This can be reproduced with:
fio --ioengine=io_uring --rw=randrw --bs=4k --direct=0 --size=100G
--iodepth=256 --filename=/data03/fio-rand-read --name=test
With ps -L -p <pid> you can see about 256 io_worker threads, and the
thread ids keep changing.
After some debugging, I found that most worker creation happens in
create_worker_cb(). There, if all workers have gone to sleep and we
have more work, we try to create a new worker (call it worker B) to
handle it. Meanwhile, when new work arrives, io_wq_enqueue() activates
a free worker (call it worker A) or creates a new one. This can cause
workers A and B to compete for the same work. Since buffered writes
are hashed work, and buffered writes to a given file are serialized,
only one worker gets the work in the end and the other goes back to
sleep. Repeated many times, this creates a lot of io_worker threads
that handle only a few work items, or none at all, and then exit.
There are several possible solutions:
1. Since all work is inserted via io_wq_enqueue(), which can create
workers itself, remove the worker-creation path from create_worker_cb()
entirely. This might affect performance, though.
2. When the wq->hash->map bit is already set for a hashed work item,
only record the new work in wq->hash_tail instead of linking it into
work_list; io_worker_handle_work() would then need to check hash_tail
after finishing a whole dependent chain. io_acct_run_queue() would
return false while such work is pending, so no new thread would be
created in io_wq_dec_running() either.
3. In io_wq_dec_running(), check whether the pending work consists of
only a single hash chain. If so, don't create a worker; io_wq_enqueue()
will handle it.
I chose plan 3 to avoid this problem. In my testing there is no
performance degradation.
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Diangang Li <lidiangang@bytedance.com>
---
io_uring/io-wq.c | 73 +++++++++++++++++++++++++++++++++++-------------
1 file changed, 54 insertions(+), 19 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 04a75d666195..37723a785204 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -412,25 +412,6 @@ static bool io_queue_worker_create(struct io_worker *worker,
return false;
}
-static void io_wq_dec_running(struct io_worker *worker)
-{
- struct io_wq_acct *acct = io_wq_get_acct(worker);
- struct io_wq *wq = worker->wq;
-
- if (!test_bit(IO_WORKER_F_UP, &worker->flags))
- return;
-
- if (!atomic_dec_and_test(&acct->nr_running))
- return;
- if (!io_acct_run_queue(acct))
- return;
-
- raw_spin_unlock(&acct->lock);
- atomic_inc(&acct->nr_running);
- atomic_inc(&wq->worker_refs);
- io_queue_worker_create(worker, acct, create_worker_cb);
-}
-
/*
* Worker will start processing some work. Move it to the busy list, if
* it's currently on the freelist
@@ -484,6 +465,60 @@ static bool io_wait_on_hash(struct io_wq *wq, unsigned int hash)
return ret;
}
+static bool only_one_hashed_work(struct io_wq_acct *acct,
+ struct io_wq *wq)
+ __must_hold(acct->lock)
+{
+ struct io_wq_work_node *node, *prev;
+ struct io_wq_work *work, *tail;
+ unsigned int len = 0;
+
+ wq_list_for_each(node, prev, &acct->work_list) {
+ unsigned int work_flags;
+ unsigned int hash;
+
+ work = container_of(node, struct io_wq_work, list);
+ len += 1;
+ if (len > 1)
+ return false;
+ /* not hashed, can run anytime */
+ work_flags = atomic_read(&work->flags);
+ if (!__io_wq_is_hashed(work_flags))
+ return false;
+
+ hash = __io_get_work_hash(work_flags);
+ /* all items with this hash lie in [work, tail] */
+ tail = wq->hash_tail[hash];
+ /* fast forward to a next hash, for-each will fix up @prev */
+ node = &tail->list;
+ }
+
+ return true;
+}
+
+static void io_wq_dec_running(struct io_worker *worker)
+{
+ struct io_wq_acct *acct = io_wq_get_acct(worker);
+ struct io_wq *wq = worker->wq;
+
+ if (!test_bit(IO_WORKER_F_UP, &worker->flags))
+ return;
+
+ if (!atomic_dec_and_test(&acct->nr_running))
+ return;
+ if (!io_acct_run_queue(acct))
+ return;
+ if (only_one_hashed_work(acct, wq)) {
+ raw_spin_unlock(&acct->lock);
+ return;
+ }
+
+ raw_spin_unlock(&acct->lock);
+ atomic_inc(&acct->nr_running);
+ atomic_inc(&wq->worker_refs);
+ io_queue_worker_create(worker, acct, create_worker_cb);
+}
+
static struct io_wq_work *io_get_next_work(struct io_wq_acct *acct,
struct io_wq *wq)
__must_hold(acct->lock)
--
2.20.1
Thread overview: 9+ messages
2025-05-22 9:09 Fengnan Chang [this message]
2025-05-22 11:34 ` [RFC PATCH] io_uring: fix io worker thread that keeps creating and destroying Jens Axboe
2025-05-22 12:01 ` [External] " Fengnan Chang
2025-05-22 14:29 ` Jens Axboe
2025-05-23 7:57 ` Fengnan Chang
2025-05-23 15:20 ` Jens Axboe
2025-05-26 11:14 ` Fengnan Chang
2025-05-28 13:39 ` Jens Axboe
2025-05-28 13:56 ` Jens Axboe