From: Fengnan Chang <fengnanchang@gmail.com>
To: Jens Axboe <axboe@kernel.dk>,
asml.silence@gmail.com, io-uring@vger.kernel.org
Cc: Fengnan Chang <changfengnan@bytedance.com>,
Diangang Li <lidiangang@bytedance.com>
Subject: Re: [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode
Date: Fri, 12 Dec 2025 17:49:29 +0800 [thread overview]
Message-ID: <21672cc5-abc2-4595-94b2-3ab0c2d40cf3@gmail.com> (raw)
In-Reply-To: <38972bbc-fb6f-42c5-bd17-b19db134dfad@kernel.dk>
On 2025/12/12 16:58, Jens Axboe wrote:
> On 12/11/25 10:11 PM, Jens Axboe wrote:
>> On 12/11/25 7:12 PM, Fengnan Chang wrote:
>>>
>>> On 2025/12/12 09:53, Jens Axboe wrote:
>>>> On 12/11/25 6:41 PM, Fengnan Chang wrote:
>>>>> Oh, we can't add the nr_events == iob.nr_reqs check: if
>>>>> blk_mq_add_to_batch() fails, the completed IO is not added to iob, so
>>>>> iob.nr_reqs stays 0, and that may cause an IO hang.
>>>> Indeed, won't work as-is.
>>>>
>>>> I do think we're probably making a bigger deal out of the full loop than
>>>> necessary. At least I'd be perfectly happy with just the current patch,
>>>> performance should be better there than we currently have it. Ideally
>>>> we'd have just one loop for polling and catching the completed items,
>>>> but that's a bit tricky with the batch completions.
>>> Yes, ideally one loop would be enough, but given that there are also
>>> multi_queue ctx, that doesn't seem to be possible.
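
(For reference, a rough, purely illustrative sketch of the problem I meant;
this is not the real nvme/io_uring code, the blk_mq_add_to_batch() signature
is approximated from memory, and the function name is made up:)

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Called once per completed CQE found while polling. */
static int poll_one_cqe_sketch(struct io_comp_batch *iob,
			       struct request *req,
			       void (*batch_complete)(struct io_comp_batch *))
{
	if (!blk_mq_add_to_batch(req, iob, 0, batch_complete))
		/* Batch add declined: complete it now, it never hits iob. */
		blk_mq_complete_request(req);

	/* We saw one event either way... */
	return 1;
}

/*
 * ...so a poll loop that sums these return values into nr_events can end
 * up with nr_events larger than iob->nr_reqs (iob->nr_reqs may even be 0)
 * even though no IO is still outstanding, which is why a
 * nr_events == iob.nr_reqs style check isn't reliable and could hang.
 */
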
>> It's not removing the double loop, but the below could help _only_
>> iterate completed requests at the end. Rather than move items around on
>> the current list in the completion callback, have a separate list just
>> for completed requests. Then we can simply iterate that, knowing all of
>> them have completed. Gets rid of the ->iopoll_completed as well, and
>> then we can move the poll_refs. Not really related at all, obviously
>> this patch should be split into multiple pieces.
>>
>> This uses a lockless list. But since the producer and consumer are
>> generally the same task, that should not add any real overhead. On top
>> of the previous one I sent. What do you think?
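
(Just to check I'm reading the idea right, here is a minimal sketch of the
completed-list scheme as I understand it, using <linux/llist.h>; the struct
and function names below are made up for illustration and are not from your
patch:)

#include <linux/llist.h>

struct req_sketch {
	struct llist_node comp_node;	/* made-up field: link on the completed list */
	/* ... */
};

/* Producer side: the iopoll completion callback just pushes the request. */
static void iopoll_complete_sketch(struct req_sketch *req,
				   struct llist_head *completed)
{
	llist_add(&req->comp_node, completed);
}

/* Consumer side: the reap loop only ever walks requests known to be done. */
static int iopoll_reap_sketch(struct llist_head *completed)
{
	/*
	 * Grab everything at once. Note llist_del_all() hands back entries
	 * newest-first, so ordering may want an llist_reverse_order() pass.
	 */
	struct llist_node *node = llist_del_all(completed);
	struct req_sketch *req, *tmp;
	int nr = 0;

	llist_for_each_entry_safe(req, tmp, node, comp_node) {
		/* post the CQE for req here */
		nr++;
	}
	return nr;
}

If that's the shape of it, then since the producer and consumer are usually
the same task, as you say, the atomic update in llist_add() should be
effectively uncontended.
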
> Ran some quick testing, as one interesting case is mixing slower and
> faster devices. Let's take this basic example:
>
> t/io_uring -p1 -d128 -b512 -s32 -c32 -F1 -B1 -R1 -X1 -n1 -P1 -t1 -n2 /dev/nvme32n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
>
> where nvme32n1 is about 1.27M IOPS, and the other 3 do about 3.3M IOPS,
> and we poll 2 devices with each IO thread. With the current kernel, we
> get:
>
> IOPS=5.18M, BW=2.53GiB/s, IOS/call=31/32
> IOPS=5.19M, BW=2.53GiB/s, IOS/call=31/31
> IOPS=5.17M, BW=2.53GiB/s, IOS/call=31/31
>
> and with the two patches we get:
>
> IOPS=6.54M, BW=3.19GiB/s, IOS/call=31/31
> IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
> IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
>
> or about a 25% improvement. This is mostly due to the issue you
> highlighted, where you end up with later completions (that are done)
> being stuck behind waiting on a slower completion.
>
> Note: obviously 1 thread driving multiple devices for polling could
> still be improved, and in fact it does improve if we simply change -c32
> to something lower. The more important case is the one you identified,
> where different completion times on the same device will hold
> completions up. Multi device polling is just an interesting way to kind
> of emulate that, to an extent.
>
> This effect is (naturally) also apparent in the completion latencies as
> well, particularly in the higher percentiles.
>
> Ran peak testing too, and it's better all around than before.
I love the patch; I had a similar thought, and this addresses my concern. I
ran a quick test and the performance is a bit better than with the previous
version: baseline IOPS is 725K, the previous patch gives 782K, and this one
gives 790K. It looks like all the problems are solved; I'll do more testing
next week.
>
Thread overview: 26+ messages
2025-12-10 8:54 [RFC PATCH 0/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-10 8:55 ` [RFC PATCH 1/2] blk-mq: delete task running check in blk_hctx_poll Fengnan Chang
2025-12-10 9:19 ` Jens Axboe
2025-12-10 9:53 ` Jens Axboe
2025-12-10 8:55 ` [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-11 2:15 ` Jens Axboe
2025-12-11 4:10 ` Jens Axboe
2025-12-11 7:38 ` Fengnan
2025-12-11 10:22 ` Jens Axboe
2025-12-11 10:33 ` Jens Axboe
2025-12-11 11:13 ` Fengnan Chang
2025-12-11 11:19 ` Jens Axboe
2025-12-12 1:41 ` Fengnan Chang
2025-12-12 1:53 ` Jens Axboe
2025-12-12 2:12 ` Fengnan Chang
2025-12-12 5:11 ` Jens Axboe
2025-12-12 8:58 ` Jens Axboe
2025-12-12 9:49 ` Fengnan Chang [this message]
2025-12-12 20:22 ` Jens Axboe
2025-12-12 13:32 ` Diangang Li
2025-12-12 20:09 ` Jens Axboe
2025-12-15 6:25 ` Diangang Li
2025-12-17 12:34 ` Diangang Li
2025-12-17 16:25 ` Jens Axboe
2025-12-19 5:43 ` Diangang Li
2025-12-10 9:53 ` (subset) [RFC PATCH 0/2] " Jens Axboe