From: "Diangang Li" <lidiangang@bytedance.com>
To: "Jens Axboe" <axboe@kernel.dk>,
"Fengnan Chang" <fengnanchang@gmail.com>,
<asml.silence@gmail.com>, <io-uring@vger.kernel.org>
Cc: "Fengnan Chang" <changfengnan@bytedance.com>
Subject: Re: [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode
Date: Fri, 9 Jan 2026 16:35:18 +0800 [thread overview]
Message-ID: <c360b8bc-fcf9-4a36-8208-9451aaeb9f41@bytedance.com> (raw)
In-Reply-To: <e0dfa76c-c28a-4684-81b4-6ce784ee9a3c@bytedance.com>
On 2025/12/19 13:43, Diangang Li wrote:
>
>
> On 2025/12/18 00:25, Jens Axboe wrote:
>> On 12/17/25 5:34 AM, Diangang Li wrote:
>>> Hi Jens,
>>>
>>> We've identified one critical panic issue here.
>>>
>>> [ 4504.422964] [ T63683] list_del corruption, ff2adc9b51d19a90->next is
>>> LIST_POISON1 (dead000000000100)
>>> [ 4504.422994] [ T63683] ------------[ cut here ]------------
>>> [ 4504.422995] [ T63683] kernel BUG at lib/list_debug.c:56!
>>> [ 4504.423006] [ T63683] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>>> [ 4504.423017] [ T63683] CPU: 38 UID: 0 PID: 63683 Comm: io_uring
>>> Kdump: loaded Tainted: G S E 6.19.0-rc1+ #1
>>> PREEMPT(voluntary)
>>> [ 4504.423032] [ T63683] Tainted: [S]=CPU_OUT_OF_SPEC,
>>> [E]=UNSIGNED_MODULE
>>> [ 4504.423040] [ T63683] Hardware name: Inventec S520-A6/Nanping MLB,
>>> BIOS 01.01.01.06.03 03/03/2023
>>> [ 4504.423050] [ T63683] RIP:
>>> 0010:__list_del_entry_valid_or_report+0x94/0x100
>>> [ 4504.423064] [ T63683] Code: 89 fe 48 c7 c7 f0 78 87 b5 e8 38 07 ae
>>> ff 0f 0b 48 89 ef e8 6e 40 cd ff 48 89 ea 48 89 de 48 c7 c7 20 79 87 b5
>>> e8 1c 07 ae ff <0f> 0b 4c 89 e7 e8 52 40 cd ff 4c 89 e2 48 89 de 48 c7
>>> c7 58 79 87
>>> [ 4504.423085] [ T63683] RSP: 0018:ff4efd9f3838fdb0 EFLAGS: 00010246
>>> [ 4504.423093] [ T63683] RAX: 000000000000004e RBX: ff2adc9b51d19a90
>>> RCX: 0000000000000027
>>> [ 4504.423103] [ T63683] RDX: 0000000000000000 RSI: 0000000000000001
>>> RDI: ff2add151cf99580
>>> [ 4504.423112] [ T63683] RBP: dead000000000100 R08: 0000000000000000
>>> R09: 0000000000000003
>>> [ 4504.423120] [ T63683] R10: ff4efd9f3838fc60 R11: ff2add151cdfffe8
>>> R12: dead000000000122
>>> [ 4504.423130] [ T63683] R13: ff2adc9b51d19a00 R14: 0000000000000000
>>> R15: 0000000000000000
>>> [ 4504.423139] [ T63683] FS: 00007fae4f7ff6c0(0000)
>>> GS:ff2add15665f5000(0000) knlGS:0000000000000000
>>> [ 4504.423148] [ T63683] CS: 0010 DS: 0000 ES: 0000 CR0:
>>> 0000000080050033
>>> [ 4504.423157] [ T63683] CR2: 000055aa8afe5000 CR3: 00000083037ee006
>>> CR4: 0000000000773ef0
>>> [ 4504.423166] [ T63683] PKRU: 55555554
>>> [ 4504.423171] [ T63683] Call Trace:
>>> [ 4504.423178] [ T63683] <TASK>
>>> [ 4504.423184] [ T63683] io_do_iopoll+0x298/0x330
>>> [ 4504.423193] [ T63683] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
>>> [ 4504.423204] [ T63683] __do_sys_io_uring_enter+0x421/0x770
>>> [ 4504.423214] [ T63683] do_syscall_64+0x67/0xf00
>>> [ 4504.423223] [ T63683] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> [ 4504.423232] [ T63683] RIP: 0033:0x55aa707e99c3
>>>
>>> It can be reproduced in three ways:
>>> - Running iopoll tests while switching the block scheduler
>>> - A split IO scenario in iopoll (e.g., bs=512k with max_sectors_kb=256k)
>>> - Multi poll queues with multi threads
>>>
>>> All cases appear related to IO completions occurring outside the
>>> io_do_iopoll() loop. The root cause remains unclear.
>>
>> Ah, I see what it is - we can get multiple completions on the iopoll
>> side if you have multiple bios per request. This didn't matter before
>> the patch that uses a lockless list to collect them, as it just marked
>> the request completed by writing to ->iopoll_completed and letting the
>> reaper find them. But it matters with the llist change, as then we're
>> adding the request to the llist more than once.
>>
>>
>
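> To make the double-add hazard concrete: pushing the same node onto a
> lock-free list twice links it back into the list it is already on, so
> the reaper's traversal never terminates. A minimal userspace sketch
> (illustrative names only; a plain pointer push standing in for
> llist_add(), not the kernel <linux/llist.h> code):
>
> #include <stdio.h>
> #include <stddef.h>
>
> struct node { struct node *next; int id; };
>
> static struct node *head;
>
> static void push(struct node *n)    /* stand-in for llist_add() */
> {
>     n->next = head;
>     head = n;
> }
>
> int main(void)
> {
>     struct node a = { .next = NULL, .id = 1 };
>     struct node b = { .next = NULL, .id = 2 };
>     int steps = 0;
>
>     push(&a);
>     push(&b);
>     push(&a);    /* the same request completed a second time */
>
>     /* head -> a -> b -> a -> ...: the list now loops forever */
>     for (struct node *n = head; n && steps < 6; n = n->next, steps++)
>         printf("node %d\n", n->id);
>     return 0;
> }
>
> Built with any C compiler, the loop keeps printing node 1 / node 2; the
> six-step cap is only there so the demo terminates.
>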
> From e2f749299e3c76ef92d3edfd9f8f7fc9a029129a Mon Sep 17 00:00:00 2001
> From: Diangang Li <lidiangang@bytedance.com>
> Date: Fri, 19 Dec 2025 10:14:33 +0800
> Subject: [PATCH] io_uring: fix race between adding to ctx->iopoll_list
> and ctx->iopoll_complete
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Since commit 316693eb8aed ("io_uring: be smarter about handling IOPOLL
> completions") introduced ctx->iopoll_complete to cache polled
> completions, a request can be enqueued to ctx->iopoll_complete as part
> of a batched poll while it is still in the issuing path.
>
> If the IO was submitted via io_wq_submit_work(), it may still be stuck
> in io_iopoll_req_issued() waiting for ctx->uring_lock, which is held by
> io_do_iopoll(). In this state, io_do_iopoll() may attempt to delete the
> request from ctx->iopoll_list before it has ever been linked, leading to
> a list_del() corruption.
>
> Fix this by introducing an iopoll_state flag that marks whether the
> request has been inserted into ctx->iopoll_list. When io_do_iopoll()
> tries to unlink a request and the flag shows it has not been linked yet,
> skip the list_del() and requeue the completion to ctx->iopoll_complete
> so it can be reaped on a later pass.
>
> Signed-off-by: Diangang Li <lidiangang@bytedance.com>
> Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
> ---
> include/linux/io_uring_types.h | 1 +
> io_uring/io_uring.c | 1 +
> io_uring/rw.c | 7 +++++++
> io_uring/uring_cmd.c | 1 +
> 4 files changed, 10 insertions(+)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 0f619c37dce4..aaf26911badb 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -677,6 +677,7 @@ struct io_kiocb {
> };
>
> u8 opcode;
> + u8 iopoll_state;
>
> bool cancel_seq_set;
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 5e503a0bfcfc..4eb206359d05 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1692,6 +1692,7 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
> }
>
> list_add_tail(&req->iopoll_node, &ctx->iopoll_list);
> + smp_store_release(&req->iopoll_state, 1);
>
> if (unlikely(needs_lock)) {
> /*
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index ad481ca74a46..d1397739c58b 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -869,6 +869,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
> return -EOPNOTSUPP;
> kiocb->private = NULL;
> kiocb->ki_flags |= IOCB_HIPRI;
> + req->iopoll_state = 0;
> if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
> /* make sure every req only blocks once*/
> req->flags &= ~REQ_F_IOPOLL_STATE;
> @@ -1355,6 +1356,12 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
> struct llist_node *next = node->next;
>
> req = container_of(node, struct io_kiocb, iopoll_done_list);
> + if (!READ_ONCE(req->iopoll_state)) {
> + node->next = NULL;
> + llist_add(&req->iopoll_done_list, &ctx->iopoll_complete);
> + node = next;
> + continue;
> + }
> list_del(&req->iopoll_node);
> wq_list_add_tail(&req->comp_list, &ctx->submit_state.compl_reqs);
> nr_events++;
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 0841fa541f5d..cf2eacea5be8 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -251,6 +251,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
> if (!file->f_op->uring_cmd_iopoll)
> return -EOPNOTSUPP;
> issue_flags |= IO_URING_F_IOPOLL;
> + req->iopoll_state = 0;
> if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
> /* make sure every req only blocks once */
> req->flags &= ~REQ_F_IOPOLL_STATE;
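>
> For completeness, the publish/observe handshake the patch relies on can
> be modelled in userspace with C11 atomics (hypothetical names; the
> kernel side pairs smp_store_release() with READ_ONCE() under
> ctx->uring_lock, which this sketch only approximates with an acquire
> load):
>
> #include <stdatomic.h>
> #include <pthread.h>
> #include <stdio.h>
>
> struct fake_req {
>     int iopoll_node;            /* stands in for the iopoll_list linkage */
>     atomic_int iopoll_state;    /* 0 = not linked yet, 1 = linked */
> };
>
> static struct fake_req req;
>
> static void *issuer(void *arg)  /* models io_iopoll_req_issued() */
> {
>     req.iopoll_node = 1;                              /* list_add_tail() */
>     atomic_store_explicit(&req.iopoll_state, 1,
>                           memory_order_release);      /* smp_store_release() */
>     return NULL;
> }
>
> static void *reaper(void *arg)  /* models io_do_iopoll() */
> {
>     if (!atomic_load_explicit(&req.iopoll_state, memory_order_acquire))
>         puts("not linked yet: requeue to iopoll_complete for a later pass");
>     else
>         puts("linked: safe to list_del() and complete the request");
>     return NULL;
> }
>
> int main(void)
> {
>     pthread_t a, b;
>
>     pthread_create(&a, NULL, issuer, NULL);
>     pthread_create(&b, NULL, reaper, NULL);
>     pthread_join(a, NULL);
>     pthread_join(b, NULL);
>     return 0;
> }
>
> Compile with cc -pthread; depending on scheduling either message can be
> printed, which is exactly why the reaper must handle the not-yet-linked
> case instead of assuming the list_add_tail() has already happened.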
Hi Jens,

Regarding the analysis of this list_del corruption issue and the fix
patch, do you have any other comments?

Best regards,
Diangang Li
Thread overview: 28+ messages
2025-12-10 8:54 [RFC PATCH 0/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-10 8:55 ` [RFC PATCH 1/2] blk-mq: delete task running check in blk_hctx_poll Fengnan Chang
2025-12-10 9:19 ` Jens Axboe
2025-12-10 9:53 ` Jens Axboe
2025-12-10 8:55 ` [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode Fengnan Chang
2025-12-11 2:15 ` Jens Axboe
2025-12-11 4:10 ` Jens Axboe
2025-12-11 7:38 ` Fengnan
2025-12-11 10:22 ` Jens Axboe
2025-12-11 10:33 ` Jens Axboe
2025-12-11 11:13 ` Fengnan Chang
2025-12-11 11:19 ` Jens Axboe
2025-12-12 1:41 ` Fengnan Chang
2025-12-12 1:53 ` Jens Axboe
2025-12-12 2:12 ` Fengnan Chang
2025-12-12 5:11 ` Jens Axboe
2025-12-12 8:58 ` Jens Axboe
2025-12-12 9:49 ` Fengnan Chang
2025-12-12 20:22 ` Jens Axboe
2025-12-12 13:32 ` Diangang Li
2025-12-12 20:09 ` Jens Axboe
2025-12-15 6:25 ` Diangang Li
2025-12-17 12:34 ` Diangang Li
2025-12-17 16:25 ` Jens Axboe
2025-12-19 5:43 ` Diangang Li
2026-01-09 8:35 ` Diangang Li [this message]
2026-01-09 23:27 ` Jens Axboe
2025-12-10 9:53 ` (subset) [RFC PATCH 0/2] " Jens Axboe