From: Jens Axboe <axboe@kernel.dk>
To: Fengnan Chang <changfengnan@bytedance.com>
Cc: asml.silence@gmail.com, io-uring@vger.kernel.org,
Diangang Li <lidiangang@bytedance.com>
Subject: Re: [External] Re: [RFC PATCH] io_uring: fix io worker thread that keeps creating and destroying
Date: Fri, 23 May 2025 09:20:12 -0600
Message-ID: <7bf620dc-1b5c-4401-a03c-16978de0598a@kernel.dk>
In-Reply-To: <CAPFOzZtxXOQvC0wcNLaj-hZUOf2PWqon0uEvbQh7if7a7DdX=g@mail.gmail.com>
On 5/23/25 1:57 AM, Fengnan Chang wrote:
> Jens Axboe <axboe@kernel.dk> wrote on Thu, May 22, 2025 at 22:29:
>>
>> On 5/22/25 6:01 AM, Fengnan Chang wrote:
>>> Jens Axboe <axboe@kernel.dk> wrote on Thu, May 22, 2025 at 19:35:
>>>>
>>>> On 5/22/25 3:09 AM, Fengnan Chang wrote:
>>>>> When running fio with buffered I/O and stable IOPS, I observed that
>>>>> some of the io_worker threads keep being created and destroyed.
>>>>> The following command reproduces it:
>>>>> fio --ioengine=io_uring --rw=randrw --bs=4k --direct=0 --size=100G
>>>>> --iodepth=256 --filename=/data03/fio-rand-read --name=test
>>>>> With ps -L -p pid, you can see about 256 io_worker threads, and the
>>>>> thread IDs keep changing.
>>>>> After some debugging, I found that most worker creation happens in
>>>>> create_worker_cb. In create_worker_cb, if all workers have gone to
>>>>> sleep and there is more work, we try to create a new worker (let's
>>>>> call it worker B) to handle it. When new work arrives,
>>>>> io_wq_enqueue activates a free worker (let's call it worker A) or
>>>>> creates a new one. This can make workers A and B compete for the
>>>>> same work item. Since a buffered write is hashed work, buffered
>>>>> writes to a given file are serialized, so only one worker gets the
>>>>> work in the end and the other goes back to sleep. After this repeats
>>>>> many times, a lot of io_worker threads are created, handle little or
>>>>> no work, and exit.
>>>>> There are several possible solutions:
>>>>> 1. Since all work is inserted in io_wq_enqueue, and io_wq_enqueue
>>>>> creates workers too, removing the worker creation from
>>>>> create_worker_cb should be fine, but might it affect performance?
>>>>> 2. When the wq->hash->map bit is set and a hashed work item is
>>>>> inserted, put the new work only on wq->hash_tail without linking it
>>>>> into work_list. io_worker_handle_work then needs to check hash_tail
>>>>> after a whole dependent link. io_acct_run_queue will return false
>>>>> when new work is inserted, so no new thread will be created in
>>>>> io_wqe_dec_running either.
>>>>> 3. Check whether there is only one hash bucket in io_wqe_dec_running.
>>>>> If there is only one, don't create a worker; io_wq_enqueue will
>>>>> handle it.
>>>>
>>>> Nice catch on this! Does indeed look like a problem. Not a huge fan of
>>>> approach 3. Without having really looked into this yet, my initial idea
>>>> would've been to do some variant of solution 1 above. io_wq_enqueue()
>>>> checks if we need to create a worker, which basically boils down to "do
>>>> we have a free worker right now". If we do not, we create one. But the
>>>> question is really "do we need a new worker for this?", and if we're
>>>> inserting hashed work and we have existing hashed work for the SAME
>>>> hash and it's busy, then the answer should be "no" as it'd be pointless
>>>> to create that worker.
>>>
>>> Agree
>>>
>>>>
>>>> Would it be feasible to augment the check in there such that
>>>> io_wq_enqueue() doesn't create a new worker for that case? And I guess a
>>>> followup question is, would that even be enough, or do we also need to
>>>> cover the io_wq_dec_running() case, since io_acct_run_queue() will
>>>> return true there as well because it doesn't know about this either?
>>> Yes, it is feasible to avoid creating a worker by adding some checks in
>>> io_wq_enqueue. But what I have observed so far is that most workers are
>>> created in io_wq_dec_running (why are no workers created in
>>> io_wq_enqueue? I haven't figured that out yet), so it seems there is no
>>> need to check this in io_wq_enqueue. Covering io_wq_dec_running is
>>> necessary, though.
>>
>> The general concept for io-wq is that it's always assumed that a worker
>> won't block, and if it does AND more work is available, at that point a
>> new worker is created. io_wq_dec_running() is called by the scheduler
>> when a worker is scheduled out, eg blocking, and then an extra worker is
>> created at that point, if necessary.
>>
>> I wonder if we can get away with something like the below? Basically two
>> things in there:
>>
>> 1) If a worker goes to sleep AND it doesn't have a current work
>> assigned, just ignore it. Really a separate change, but seems to
>> conceptually make sense - a new worker should only be created off
>> that path, if it's currently handling a work item and goes to sleep.
>>
>> 2) If there is current work, defer if it's hashed and the next work item
>> in that list is also hashed and of the same value.
> I like this change, this makes the logic clearer. This patch looks good,
> I'll do more tests next week.
Thanks for taking a look - I've posted it as a 3 patch series, as 1+2
above are really two separate things that need sorting imho. I've queued
it up for the next kernel release, so please do test next week when you
have time.
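
For illustration only, the two checks from 1) and 2) quoted above can be
modeled in userspace roughly as below. This is a minimal sketch, not the
posted patches: struct work_item, struct worker, hash_defer() and
should_create_worker() are hypothetical names invented for this example,
and the real io-wq code tracks flags, locking and accounting that are left
out here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a queued io-wq work item. */
struct work_item {
	bool hashed;            /* item must be serialized against same-hash items */
	unsigned int hash;      /* hash key, only meaningful when hashed is true */
	struct work_item *next; /* next pending item (unused by the checks below) */
};

/* Hypothetical stand-in for a worker that is about to be scheduled out. */
struct worker {
	const struct work_item *cur_work; /* NULL when no work is assigned */
};

/*
 * Check 2): if the current item is hashed and the next pending item is
 * hashed with the same key, a new worker could not run it anyway, so
 * creation is deferred. Assumes pending is not NULL.
 */
static bool hash_defer(const struct work_item *cur,
		       const struct work_item *pending)
{
	if (!cur->hashed)
		return false;
	return pending->hashed && pending->hash == cur->hash;
}

/*
 * Decide whether a worker going to sleep should trigger creation of
 * another worker. 'pending' is the head of the pending list, or NULL
 * if nothing is queued.
 */
static bool should_create_worker(const struct worker *w,
				 const struct work_item *pending)
{
	/* Check 1): a worker with no current work assigned is ignored. */
	if (w->cur_work == NULL)
		return false;
	/* No queued work means nothing for a new worker to do. */
	if (pending == NULL)
		return false;
	/* Check 2): same-hash work is serialized behind the current item. */
	if (hash_defer(w->cur_work, pending))
		return false;
	return true;
}
```

In this model, a worker blocking on a hashed buffered write with another
write for the same hash queued behind it does not cause a new worker to be
created, while pending work with a different hash, or unhashed work, still
does.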
--
Jens Axboe