From: Jens Axboe <axboe@kernel.dk>
To: Diangang Li <lidiangang@bytedance.com>,
Fengnan Chang <fengnanchang@gmail.com>,
asml.silence@gmail.com, io-uring@vger.kernel.org
Cc: Fengnan Chang <changfengnan@bytedance.com>
Subject: Re: [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode
Date: Fri, 9 Jan 2026 16:27:44 -0700
Message-ID: <9006a5ad-11c0-4b37-8c7c-ad20a09da081@kernel.dk>
In-Reply-To: <c360b8bc-fcf9-4a36-8208-9451aaeb9f41@bytedance.com>

On 1/9/26 1:35 AM, Diangang Li wrote:
> On 2025/12/19 13:43, Diangang Li wrote:
>>
>>
>> On 2025/12/18 00:25, Jens Axboe wrote:
>>> On 12/17/25 5:34 AM, Diangang Li wrote:
>>>> Hi Jens,
>>>>
>>>> We've identified a critical panic issue here.
>>>>
>>>> [ 4504.422964] [ T63683] list_del corruption, ff2adc9b51d19a90->next is LIST_POISON1 (dead000000000100)
>>>> [ 4504.422994] [ T63683] ------------[ cut here ]------------
>>>> [ 4504.422995] [ T63683] kernel BUG at lib/list_debug.c:56!
>>>> [ 4504.423006] [ T63683] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>>>> [ 4504.423017] [ T63683] CPU: 38 UID: 0 PID: 63683 Comm: io_uring Kdump: loaded Tainted: G S E 6.19.0-rc1+ #1 PREEMPT(voluntary)
>>>> [ 4504.423032] [ T63683] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
>>>> [ 4504.423040] [ T63683] Hardware name: Inventec S520-A6/Nanping MLB, BIOS 01.01.01.06.03 03/03/2023
>>>> [ 4504.423050] [ T63683] RIP: 0010:__list_del_entry_valid_or_report+0x94/0x100
>>>> [ 4504.423064] [ T63683] Code: 89 fe 48 c7 c7 f0 78 87 b5 e8 38 07 ae ff 0f 0b 48 89 ef e8 6e 40 cd ff 48 89 ea 48 89 de 48 c7 c7 20 79 87 b5 e8 1c 07 ae ff <0f> 0b 4c 89 e7 e8 52 40 cd ff 4c 89 e2 48 89 de 48 c7 c7 58 79 87
>>>> [ 4504.423085] [ T63683] RSP: 0018:ff4efd9f3838fdb0 EFLAGS: 00010246
>>>> [ 4504.423093] [ T63683] RAX: 000000000000004e RBX: ff2adc9b51d19a90 RCX: 0000000000000027
>>>> [ 4504.423103] [ T63683] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ff2add151cf99580
>>>> [ 4504.423112] [ T63683] RBP: dead000000000100 R08: 0000000000000000 R09: 0000000000000003
>>>> [ 4504.423120] [ T63683] R10: ff4efd9f3838fc60 R11: ff2add151cdfffe8 R12: dead000000000122
>>>> [ 4504.423130] [ T63683] R13: ff2adc9b51d19a00 R14: 0000000000000000 R15: 0000000000000000
>>>> [ 4504.423139] [ T63683] FS: 00007fae4f7ff6c0(0000) GS:ff2add15665f5000(0000) knlGS:0000000000000000
>>>> [ 4504.423148] [ T63683] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [ 4504.423157] [ T63683] CR2: 000055aa8afe5000 CR3: 00000083037ee006 CR4: 0000000000773ef0
>>>> [ 4504.423166] [ T63683] PKRU: 55555554
>>>> [ 4504.423171] [ T63683] Call Trace:
>>>> [ 4504.423178] [ T63683] <TASK>
>>>> [ 4504.423184] [ T63683] io_do_iopoll+0x298/0x330
>>>> [ 4504.423193] [ T63683] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
>>>> [ 4504.423204] [ T63683] __do_sys_io_uring_enter+0x421/0x770
>>>> [ 4504.423214] [ T63683] do_syscall_64+0x67/0xf00
>>>> [ 4504.423223] [ T63683] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>>> [ 4504.423232] [ T63683] RIP: 0033:0x55aa707e99c3
>>>>
>>>> It can be reproduced in three ways:
>>>> - Running iopoll tests while switching the block scheduler
>>>> - A split IO scenario in iopoll (e.g., bs=512k with max_sectors_kb=256k)
>>>> - Multiple poll queues with multiple threads
>>>>
>>>> All cases appear to be related to IO completions occurring outside
>>>> the io_do_iopoll() loop. The root cause remains unclear.
>>>
>>> Ah, I see what it is - we can get multiple completions on the iopoll
>>> side if you have multiple bios per request. This didn't matter before
>>> the patch that uses a lockless list to collect them, as it just marked
>>> the request completed by writing to ->iopoll_complete and letting the
>>> reaper find them. But it matters with the llist change, as then we're
>>> adding the request to the llist more than once.
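>>>
>>> To see why a double add corrupts things, here's a minimal userspace
>>> model (a compare-and-swap push standing in for llist_add(); an
>>> illustration only, not the kernel implementation). Pushing the same
>>> node twice leaves it pointing at itself, so any later walk of the
>>> list loops or trips the list debugging checks:
>>>
>>> #include <stdatomic.h>
>>> #include <stdio.h>
>>>
>>> struct node { struct node *next; };
>>>
>>> static _Atomic(struct node *) head;
>>>
>>> /* Lock-free push, modeled after llist_add() semantics */
>>> static void push(struct node *n)
>>> {
>>>         struct node *first = atomic_load(&head);
>>>
>>>         do {
>>>                 n->next = first;
>>>         } while (!atomic_compare_exchange_weak(&head, &first, n));
>>> }
>>>
>>> int main(void)
>>> {
>>>         struct node a = { 0 };
>>>
>>>         push(&a);
>>>         push(&a);       /* second completion for the same request */
>>>
>>>         /* a.next == &a: the list is now a self-cycle */
>>>         printf("self-cycle: %d\n", a.next == &a);
>>>         return 0;
>>> }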
>>>
>>>
>>
>> From e2f749299e3c76ef92d3edfd9f8f7fc9a029129a Mon Sep 17 00:00:00 2001
>> From: Diangang Li <lidiangang@bytedance.com>
>> Date: Fri, 19 Dec 2025 10:14:33 +0800
>> Subject: [PATCH] io_uring: fix race between adding to ctx->iopoll_list and ctx->iopoll_complete
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset=UTF-8
>> Content-Transfer-Encoding: 8bit
>>
>> Since commit 316693eb8aed ("io_uring: be smarter about handling IOPOLL
>> completions") introduced ctx->iopoll_complete to cache polled
>> completions, a request can be enqueued to ctx->iopoll_complete as part
>> of a batched poll while it is still in the issuing path.
>>
>> If the IO was submitted via io_wq_submit_work(), it may still be stuck
>> in io_iopoll_req_issued() waiting for ctx->uring_lock, which is held by
>> io_do_iopoll(). In this state, io_do_iopoll() may attempt to delete the
>> request from ctx->iopoll_list before it has ever been linked, leading to
>> a list_del() corruption.
>>
>> Fix this by introducing an iopoll_state flag to mark whether the request
>> has been inserted into ctx->iopoll_list. When io_do_iopoll() tries to
>> unlink a request and the flag indicates it hasn't been linked yet, skip
>> the list_del() and just requeue the completion to ctx->iopoll_complete
>> for later reap.
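>>
>> To illustrate the intended pairing, here is a simplified userspace
>> model (illustration only, with hypothetical issuer/reaper names; the
>> patch itself uses READ_ONCE() rather than an acquire load). The
>> issuer links the node and only then publishes the flag with a release
>> store, so a reaper that observes the flag can safely unlink:
>>
>> #include <pthread.h>
>> #include <stdatomic.h>
>> #include <stdio.h>
>>
>> struct req {
>>         struct req *prev, *next;        /* ~ req->iopoll_node */
>>         _Atomic int linked;             /* ~ req->iopoll_state */
>> };
>>
>> static struct req list = { &list, &list };      /* circular list head */
>> static struct req r;
>>
>> static void *issuer(void *arg)          /* ~ io_iopoll_req_issued() */
>> {
>>         r.prev = list.prev;
>>         r.next = &list;
>>         list.prev->next = &r;
>>         list.prev = &r;
>>         /* publish only after the node is fully linked */
>>         atomic_store_explicit(&r.linked, 1, memory_order_release);
>>         return NULL;
>> }
>>
>> static void *reaper(void *arg)          /* ~ io_do_iopoll() */
>> {
>>         if (!atomic_load_explicit(&r.linked, memory_order_acquire)) {
>>                 puts("not linked yet: requeue for a later reap");
>>                 return NULL;
>>         }
>>         r.prev->next = r.next;          /* safe: link fully visible */
>>         r.next->prev = r.prev;
>>         puts("unlinked");
>>         return NULL;
>> }
>>
>> int main(void)
>> {
>>         pthread_t a, b;
>>
>>         pthread_create(&a, NULL, issuer, NULL);
>>         pthread_create(&b, NULL, reaper, NULL);
>>         pthread_join(a, NULL);
>>         pthread_join(b, NULL);
>>         return 0;
>> }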
>>
>> Signed-off-by: Diangang Li <lidiangang@bytedance.com>
>> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
>> ---
>>  include/linux/io_uring_types.h | 1 +
>>  io_uring/io_uring.c            | 1 +
>>  io_uring/rw.c                  | 7 +++++++
>>  io_uring/uring_cmd.c           | 1 +
>>  4 files changed, 10 insertions(+)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 0f619c37dce4..aaf26911badb 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -677,6 +677,7 @@ struct io_kiocb {
>> };
>>
>> u8 opcode;
>> + u8 iopoll_state;
>>
>> bool cancel_seq_set;
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 5e503a0bfcfc..4eb206359d05 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1692,6 +1692,7 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
>> }
>>
>> list_add_tail(&req->iopoll_node, &ctx->iopoll_list);
>> + smp_store_release(&req->iopoll_state, 1);
>>
>> if (unlikely(needs_lock)) {
>> /*
>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>> index ad481ca74a46..d1397739c58b 100644
>> --- a/io_uring/rw.c
>> +++ b/io_uring/rw.c
>> @@ -869,6 +869,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
>> return -EOPNOTSUPP;
>> kiocb->private = NULL;
>> kiocb->ki_flags |= IOCB_HIPRI;
>> + req->iopoll_state = 0;
>> if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
>> /* make sure every req only blocks once*/
>> req->flags &= ~REQ_F_IOPOLL_STATE;
>> @@ -1355,6 +1356,12 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
>> struct llist_node *next = node->next;
>>
>> req = container_of(node, struct io_kiocb, iopoll_done_list);
>> + if (!READ_ONCE(req->iopoll_state)) {
>> + node->next = NULL;
>> + llist_add(&req->iopoll_done_list, &ctx->iopoll_complete);
>> + node = next;
>> + continue;
>> + }
>> list_del(&req->iopoll_node);
>> wq_list_add_tail(&req->comp_list, &ctx->submit_state.compl_reqs);
>> nr_events++;
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index 0841fa541f5d..cf2eacea5be8 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -251,6 +251,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>> if (!file->f_op->uring_cmd_iopoll)
>> return -EOPNOTSUPP;
>> issue_flags |= IO_URING_F_IOPOLL;
>> + req->iopoll_state = 0;
>> if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
>> /* make sure every req only blocks once */
>> req->flags &= ~REQ_F_IOPOLL_STATE;
>
> Hi Jens,
>
> Regarding the analysis of this list_del corruption issue and the fix
> patch, do you have any other comments?

I just dropped the second part of the iopoll changes, so it's back to
just this one:

https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git/commit/?h=for-7.0/io_uring&id=3c7d76d6128a0fef68e6540754bf85a44a29bb59

I didn't have an immediately good idea for solving it without doing more
locking and/or synchronization, and I wasn't convinced it was worth it.
I'll ponder it some more next week and pick it back up.
--
Jens Axboe