public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Fengnan Chang <fengnanchang@gmail.com>,
	asml.silence@gmail.com, io-uring@vger.kernel.org
Cc: Fengnan Chang <changfengnan@bytedance.com>,
	Diangang Li <lidiangang@bytedance.com>
Subject: Re: [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode
Date: Fri, 12 Dec 2025 13:22:21 -0700	[thread overview]
Message-ID: <8035dbf3-8b6c-4a6a-875a-0c59d3800aab@kernel.dk> (raw)
In-Reply-To: <21672cc5-abc2-4595-94b2-3ab0c2d40cf3@gmail.com>

On 12/12/25 2:49 AM, Fengnan Chang wrote:
> 
> 
> ? 2025/12/12 16:58, Jens Axboe ??:
>> On 12/11/25 10:11 PM, Jens Axboe wrote:
>>> On 12/11/25 7:12 PM, Fengnan Chang wrote:
>>>>
>>>> ? 2025/12/12 09:53, Jens Axboe ??:
>>>>> On 12/11/25 6:41 PM, Fengnan Chang wrote:
>>>>>> Oh, we can't add nr_events == iob.nr_reqs check, if
>>>>>> blk_mq_add_to_batch add failed, completed IO will not add into iob,
>>>>>> iob.nr_reqs will be 0, this may cause io hang.
>>>>> Indeed, won't work as-is.
>>>>>
>>>>> I do think we're probably making a bigger deal out of the full loop than
>>>>> necessary. At least I'd be perfectly happy with just the current patch,
>>>>> performance should be better there than we currently have it. Ideally
>>>>> we'd have just one loop for polling and catching the completed items,
>>>>> but that's a bit tricky with the batch completions.
>>>> Yes, ideally one loop would be enough, but given that there are also
>>>> multi_queue ctx, that doesn't seem to be possible.
>>> It's not removing the double loop, but the below could help _only_
>>> iterate completed requests at the end. Rather than move items between
>>> the current list at the completion callback, have a separate list just
>>> for completed requests. Then we can simply iterate that, knowing all of
>>> them have completed. Gets rid of the ->iopoll_completed as well, and
>>> then we can move the poll_refs. Not really related at all, obviously
>>> this patch should be split into multiple pieces.
>>>
>>> This uses a lockless list. But since the producer and consumer are
>>> generally the same task, that should not add any real overhead. On top
>>> of the previous one I sent. What do you think?
>> Ran some quick testing, as one interesting case is mixing slower and
>> faster devices. Let's take this basic example:
>>
>> t/io_uring -p1 -d128 -b512 -s32 -c32 -F1 -B1 -R1 -X1 -n1 -P1 -t1 -n2 /dev/nvme32n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
>>
>> where nvme32n1 is about 1.27M IOPS, and the other 3 do about 3.3M IOPS,
>> and we poll 2 devices with each IO thread. With the current kernel, we
>> get:
>>
>> IOPS=5.18M, BW=2.53GiB/s, IOS/call=31/32
>> IOPS=5.19M, BW=2.53GiB/s, IOS/call=31/31
>> IOPS=5.17M, BW=2.53GiB/s, IOS/call=31/31
>>
>> and with the two patches we get:
>>
>> IOPS=6.54M, BW=3.19GiB/s, IOS/call=31/31
>> IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
>> IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
>>
>> or about a 25% improvement. This is mostly due to the issue you
>> highlighted, where you end up with later completions (that are done)
>> being stuck behind waiting on a slower completion.
>>
>> Note: obviously 1 thread driving multiple devices for polling could
>> still be improved, and in fact it does improve if we simply change -c32
>> to something lower. The more important case is the one you identified,
>> where different completion times on the same device will hold
>> completions up. Multi device polling is just an interesting way to kind
>> of emulate that, to an extent.
>>
>> This effect is (naturally) also apparent in the completion latencies as
>> well, particularly in the higher percentiles.
>>
>> Ran peak testing too, and it's better all around than before.
> I love the patch, I had a similar thought, this addresses my concern,  I simple tested it
> and the performance is a bit better than the previous performance.
> 
> base IOPS is 725K,  previous IOPS is 782K, now 790k.
> It looks like all the problems are solved,I 'll do more testing next week.

Nice, thanks! FWIW, I put the patches in a branch here, just to have
them somewhat organized and easier to iterate/test on:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git/log/?h=io_uring-iopoll

-- 
Jens Axboe

  reply	other threads:[~2025-12-12 20:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-10  8:54 [RFC PATCH 0/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-10  8:55 ` [RFC PATCH 1/2] blk-mq: delete task running check in blk_hctx_poll Fengnan Chang
2025-12-10  9:19   ` Jens Axboe
2025-12-10  9:53   ` Jens Axboe
2025-12-10  8:55 ` [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-11  2:15   ` Jens Axboe
2025-12-11  4:10     ` Jens Axboe
2025-12-11  7:38       ` Fengnan
2025-12-11 10:22         ` Jens Axboe
2025-12-11 10:33           ` Jens Axboe
2025-12-11 11:13             ` Fengnan Chang
2025-12-11 11:19               ` Jens Axboe
2025-12-12  1:41             ` Fengnan Chang
2025-12-12  1:53               ` Jens Axboe
2025-12-12  2:12                 ` Fengnan Chang
2025-12-12  5:11                   ` Jens Axboe
2025-12-12  8:58                     ` Jens Axboe
2025-12-12  9:49                       ` Fengnan Chang
2025-12-12 20:22                         ` Jens Axboe [this message]
2025-12-12 13:32                     ` Diangang Li
2025-12-12 20:09                       ` Jens Axboe
2025-12-15  6:25                         ` Diangang Li
2025-12-10  9:53 ` (subset) [RFC PATCH 0/2] " Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8035dbf3-8b6c-4a6a-875a-0c59d3800aab@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=asml.silence@gmail.com \
    --cc=changfengnan@bytedance.com \
    --cc=fengnanchang@gmail.com \
    --cc=io-uring@vger.kernel.org \
    --cc=lidiangang@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox