From: Fengnan Chang <fengnanchang@gmail.com>
To: Jens Axboe <axboe@kernel.dk>,
asml.silence@gmail.com, io-uring@vger.kernel.org
Cc: Fengnan Chang <changfengnan@bytedance.com>,
Diangang Li <lidiangang@bytedance.com>
Subject: Re: [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode
Date: Fri, 12 Dec 2025 17:49:29 +0800 [thread overview]
Message-ID: <21672cc5-abc2-4595-94b2-3ab0c2d40cf3@gmail.com> (raw)
In-Reply-To: <38972bbc-fb6f-42c5-bd17-b19db134dfad@kernel.dk>
On 2025/12/12 16:58, Jens Axboe wrote:
> On 12/11/25 10:11 PM, Jens Axboe wrote:
>> On 12/11/25 7:12 PM, Fengnan Chang wrote:
>>>
>>> On 2025/12/12 09:53, Jens Axboe wrote:
>>>> On 12/11/25 6:41 PM, Fengnan Chang wrote:
>>>>> Oh, we can't add the nr_events == iob.nr_reqs check: if
>>>>> blk_mq_add_to_batch() fails, the completed IO is not added to iob, so
>>>>> iob.nr_reqs stays 0, and that may cause an IO hang.
>>>> Indeed, won't work as-is.
>>>>
>>>> I do think we're probably making a bigger deal out of the full loop than
>>>> necessary. At least I'd be perfectly happy with just the current patch,
>>>> performance should be better there than we currently have it. Ideally
>>>> we'd have just one loop for polling and catching the completed items,
>>>> but that's a bit tricky with the batch completions.
>>> Yes, ideally one loop would be enough, but given that there are also
>>> multi_queue ctx, that doesn't seem to be possible.
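
(For reference, a rough, purely illustrative sketch of the problem I meant;
this is not the real nvme/io_uring code, the blk_mq_add_to_batch() signature
is approximated from memory, and the function name is made up:)

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

/* Called once per completed CQE found while polling. */
static int poll_one_cqe_sketch(struct io_comp_batch *iob,
			       struct request *req,
			       void (*batch_complete)(struct io_comp_batch *))
{
	if (!blk_mq_add_to_batch(req, iob, 0, batch_complete))
		/* Batch add declined: complete it now, it never hits iob. */
		blk_mq_complete_request(req);

	/* We saw one event either way... */
	return 1;
}

/*
 * ...so a poll loop that sums these return values into nr_events can end
 * up with nr_events larger than iob->nr_reqs (iob->nr_reqs may even be 0)
 * even though no IO is still outstanding, which is why a
 * nr_events == iob.nr_reqs style check isn't reliable and could hang.
 */
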
>> It's not removing the double loop, but the below could help _only_
>> iterate completed requests at the end. Rather than move items around on
>> the current list in the completion callback, have a separate list just
>> for completed requests. Then we can simply iterate that, knowing all of
>> them have completed. Gets rid of the ->iopoll_completed as well, and
>> then we can move the poll_refs. Not really related at all, obviously
>> this patch should be split into multiple pieces.
>>
>> This uses a lockless list. But since the producer and consumer are
>> generally the same task, that should not add any real overhead. On top
>> of the previous one I sent. What do you think?
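
(Just to check I'm reading the idea right, here is a minimal sketch of the
completed-list scheme as I understand it, using <linux/llist.h>; the struct
and function names below are made up for illustration and are not from your
patch:)

#include <linux/llist.h>

struct req_sketch {
	struct llist_node comp_node;	/* made-up field: link on the completed list */
	/* ... */
};

/* Producer side: the iopoll completion callback just pushes the request. */
static void iopoll_complete_sketch(struct req_sketch *req,
				   struct llist_head *completed)
{
	llist_add(&req->comp_node, completed);
}

/* Consumer side: the reap loop only ever walks requests known to be done. */
static int iopoll_reap_sketch(struct llist_head *completed)
{
	/*
	 * Grab everything at once. Note llist_del_all() hands back entries
	 * newest-first, so ordering may want an llist_reverse_order() pass.
	 */
	struct llist_node *node = llist_del_all(completed);
	struct req_sketch *req, *tmp;
	int nr = 0;

	llist_for_each_entry_safe(req, tmp, node, comp_node) {
		/* post the CQE for req here */
		nr++;
	}
	return nr;
}

If that's the shape of it, then since the producer and consumer are usually
the same task, as you say, the atomic update in llist_add() should be
effectively uncontended.
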
> Ran some quick testing, as one interesting case is mixing slower and
> faster devices. Let's take this basic example:
>
> t/io_uring -p1 -d128 -b512 -s32 -c32 -F1 -B1 -R1 -X1 -n1 -P1 -t1 -n2 /dev/nvme32n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
>
> where nvme32n1 is about 1.27M IOPS, and the other 3 do about 3.3M IOPS,
> and we poll 2 devices with each IO thread. With the current kernel, we
> get:
>
> IOPS=5.18M, BW=2.53GiB/s, IOS/call=31/32
> IOPS=5.19M, BW=2.53GiB/s, IOS/call=31/31
> IOPS=5.17M, BW=2.53GiB/s, IOS/call=31/31
>
> and with the two patches we get:
>
> IOPS=6.54M, BW=3.19GiB/s, IOS/call=31/31
> IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
> IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
>
> or about a 25% improvement. This is mostly due to the issue you
> highlighted, where you end up with later completions (that are done)
> being stuck behind waiting on a slower completion.
>
> Note: obviously 1 thread driving multiple devices for polling could
> still be improved, and in fact it does improve if we simply change -c32
> to something lower. The more important case is the one you identified,
> where different completion times on the same device will hold
> completions up. Multi device polling is just an interesting way to kind
> of emulate that, to an extent.
>
> This effect is (naturally) also apparent in the completion latencies as
> well, particularly in the higher percentiles.
>
> Ran peak testing too, and it's better all around than before.
I love the patch; I had a similar thought, and this addresses my concern. I
ran a quick test and the performance is a bit better than with the previous
version: baseline IOPS is 725K, the previous patch gives 782K, and this one
gives 790K. It looks like all the problems are solved; I'll do more testing
next week.
>
Thread overview: 26+ messages
2025-12-10 8:54 [RFC PATCH 0/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-10 8:55 ` [RFC PATCH 1/2] blk-mq: delete task running check in blk_hctx_poll Fengnan Chang
2025-12-10 9:19 ` Jens Axboe
2025-12-10 9:53 ` Jens Axboe
2025-12-10 8:55 ` [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-11 2:15 ` Jens Axboe
2025-12-11 4:10 ` Jens Axboe
2025-12-11 7:38 ` Fengnan
2025-12-11 10:22 ` Jens Axboe
2025-12-11 10:33 ` Jens Axboe
2025-12-11 11:13 ` Fengnan Chang
2025-12-11 11:19 ` Jens Axboe
2025-12-12 1:41 ` Fengnan Chang
2025-12-12 1:53 ` Jens Axboe
2025-12-12 2:12 ` Fengnan Chang
2025-12-12 5:11 ` Jens Axboe
2025-12-12 8:58 ` Jens Axboe
2025-12-12 9:49 ` Fengnan Chang [this message]
2025-12-12 20:22 ` Jens Axboe
2025-12-12 13:32 ` Diangang Li
2025-12-12 20:09 ` Jens Axboe
2025-12-15 6:25 ` Diangang Li
2025-12-17 12:34 ` Diangang Li
2025-12-17 16:25 ` Jens Axboe
2025-12-19 5:43 ` Diangang Li
2025-12-10 9:53 ` (subset) [RFC PATCH 0/2] " Jens Axboe