public inbox for io-uring@vger.kernel.org
From: "Diangang Li" <lidiangang@bytedance.com>
To: "Jens Axboe" <axboe@kernel.dk>,
	"Fengnan Chang" <fengnanchang@gmail.com>,
	 <asml.silence@gmail.com>, <io-uring@vger.kernel.org>
Cc: "Fengnan Chang" <changfengnan@bytedance.com>
Subject: Re: [RFC PATCH 2/2] io_uring: fix possible IO accumulation in poll mode
Date: Fri, 19 Dec 2025 13:43:15 +0800	[thread overview]
Message-ID: <e0dfa76c-c28a-4684-81b4-6ce784ee9a3c@bytedance.com> (raw)
In-Reply-To: <9a8418d8-439f-4dd2-b3fe-33567129861e@kernel.dk>

On 2025/12/18 00:25, Jens Axboe wrote:
> On 12/17/25 5:34 AM, Diangang Li wrote:
>> Hi Jens,
>>
>> We've identified one critical panic issue here.
>>
>> [ 4504.422964] [  T63683] list_del corruption, ff2adc9b51d19a90->next is
>> LIST_POISON1 (dead000000000100)
>> [ 4504.422994] [  T63683] ------------[ cut here ]------------
>> [ 4504.422995] [  T63683] kernel BUG at lib/list_debug.c:56!
>> [ 4504.423006] [  T63683] Oops: invalid opcode: 0000 [#1] SMP NOPTI
>> [ 4504.423017] [  T63683] CPU: 38 UID: 0 PID: 63683 Comm: io_uring
>> Kdump: loaded Tainted: G S          E       6.19.0-rc1+ #1
>> PREEMPT(voluntary)
>> [ 4504.423032] [  T63683] Tainted: [S]=CPU_OUT_OF_SPEC, [E]=UNSIGNED_MODULE
>> [ 4504.423040] [  T63683] Hardware name: Inventec S520-A6/Nanping MLB,
>> BIOS 01.01.01.06.03 03/03/2023
>> [ 4504.423050] [  T63683] RIP:
>> 0010:__list_del_entry_valid_or_report+0x94/0x100
>> [ 4504.423064] [  T63683] Code: 89 fe 48 c7 c7 f0 78 87 b5 e8 38 07 ae
>> ff 0f 0b 48 89 ef e8 6e 40 cd ff 48 89 ea 48 89 de 48 c7 c7 20 79 87 b5
>> e8 1c 07 ae ff <0f> 0b 4c 89 e7 e8 52 40 cd ff 4c 89 e2 48 89 de 48 c7
>> c7 58 79 87
>> [ 4504.423085] [  T63683] RSP: 0018:ff4efd9f3838fdb0 EFLAGS: 00010246
>> [ 4504.423093] [  T63683] RAX: 000000000000004e RBX: ff2adc9b51d19a90
>> RCX: 0000000000000027
>> [ 4504.423103] [  T63683] RDX: 0000000000000000 RSI: 0000000000000001
>> RDI: ff2add151cf99580
>> [ 4504.423112] [  T63683] RBP: dead000000000100 R08: 0000000000000000
>> R09: 0000000000000003
>> [ 4504.423120] [  T63683] R10: ff4efd9f3838fc60 R11: ff2add151cdfffe8
>> R12: dead000000000122
>> [ 4504.423130] [  T63683] R13: ff2adc9b51d19a00 R14: 0000000000000000
>> R15: 0000000000000000
>> [ 4504.423139] [  T63683] FS:  00007fae4f7ff6c0(0000)
>> GS:ff2add15665f5000(0000) knlGS:0000000000000000
>> [ 4504.423148] [  T63683] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 4504.423157] [  T63683] CR2: 000055aa8afe5000 CR3: 00000083037ee006
>> CR4: 0000000000773ef0
>> [ 4504.423166] [  T63683] PKRU: 55555554
>> [ 4504.423171] [  T63683] Call Trace:
>> [ 4504.423178] [  T63683]  <TASK>
>> [ 4504.423184] [  T63683]  io_do_iopoll+0x298/0x330
>> [ 4504.423193] [  T63683]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
>> [ 4504.423204] [  T63683]  __do_sys_io_uring_enter+0x421/0x770
>> [ 4504.423214] [  T63683]  do_syscall_64+0x67/0xf00
>> [ 4504.423223] [  T63683]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 4504.423232] [  T63683] RIP: 0033:0x55aa707e99c3
>>
>> It can be reproduced in three ways:
>> - Running iopoll tests while switching the block scheduler
>> - A split IO scenario in iopoll (e.g., bs=512k with max_sectors_kb=256)
>> - Multiple poll queues with multiple threads
>>
>> All cases appear related to IO completions occurring outside the
>> io_do_iopoll() loop. The root cause remains unclear.
> 
> Ah, I see what it is - we can get multiple completions on the iopoll
> side if you have multiple bios per request. This didn't matter before
> the patch that uses a lockless list to collect them, as it just marked
> the request completed by writing to ->iopoll_completed and letting the
> reaper find them. But it matters with the llist change, as then we're
> adding the request to the llist more than once.
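
Right, that matches what we observe here. To make the failure mode
concrete, below is a minimal userspace sketch - hypothetical code, not
io_uring internals: C11 atomics stand in for the kernel's llist_add(),
and node/push/head are made-up names - showing why pushing the same
intrusive node onto a lockless stack twice corrupts it: the second push
rewrites node->next and turns the list into a cycle.

#include <stdatomic.h>
#include <stdio.h>

struct node {
	struct node *next;
	int id;
};

static _Atomic(struct node *) head;

/* Treiber-style push, shaped like llist_add(): point n->next at the
 * current first entry, then swing head to n with a CAS. */
static void push(struct node *n)
{
	struct node *first = atomic_load(&head);

	do {
		n->next = first;	/* a second push clobbers n->next */
	} while (!atomic_compare_exchange_weak(&head, &first, n));
}

int main(void)
{
	struct node a = { .id = 1 }, b = { .id = 2 };
	struct node *n;
	int steps = 0;

	push(&a);
	push(&b);
	push(&a);	/* "second completion" of the same request */

	/* The walk no longer terminates: a -> b -> a -> b -> ... */
	for (n = atomic_load(&head); n && steps++ < 6; n = n->next)
		printf("node %d\n", n->id);
	return 0;
}

After the third push the stack is the cycle a -> b -> a -> ... and the
original tail is lost, so any consumer walking it operates on stale
linkage. Separately, the crash we reproduce also points at a window on
the issue side, between io_iopoll_req_issued() and the reaper; the
patch below targets that window: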

 From e2f749299e3c76ef92d3edfd9f8f7fc9a029129a Mon Sep 17 00:00:00 2001
From: Diangang Li <lidiangang@bytedance.com>
Date: Fri, 19 Dec 2025 10:14:33 +0800
Subject: [PATCH] io_uring: fix race between adding to ctx->iopoll_list and ctx->iopoll_complete
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Since commit 316693eb8aed ("io_uring: be smarter about handling IOPOLL
completions") introduced ctx->iopoll_complete to cache polled 
completions, a request can be enqueued to ctx->iopoll_complete as part 
of a batched poll while it is still in the issuing path.

If the IO was submitted via io_wq_submit_work(), it may still be stuck
in io_iopoll_req_issued() waiting for ctx->uring_lock, which is held by
io_do_iopoll(). In this state, io_do_iopoll() may attempt to delete the
request from ctx->iopoll_list before it has ever been linked, leading to
list_del() corruption.

Fix this by introducing an iopoll_state flag that marks whether the
request has been inserted into ctx->iopoll_list. When io_do_iopoll()
tries to unlink a request and the flag shows it has not been linked yet,
skip the list_del() and requeue the completion to ctx->iopoll_complete
to be reaped later.

Signed-off-by: Diangang Li <lidiangang@bytedance.com>
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
---
  include/linux/io_uring_types.h | 1 +
  io_uring/io_uring.c            | 1 +
  io_uring/rw.c                  | 7 +++++++
  io_uring/uring_cmd.c           | 1 +
  4 files changed, 10 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 0f619c37dce4..aaf26911badb 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -677,6 +677,7 @@ struct io_kiocb {
  	};

  	u8				opcode;
+	u8				iopoll_state;

  	bool				cancel_seq_set;

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5e503a0bfcfc..4eb206359d05 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1692,6 +1692,7 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
  	}

  	list_add_tail(&req->iopoll_node, &ctx->iopoll_list);
+	smp_store_release(&req->iopoll_state, 1);

  	if (unlikely(needs_lock)) {
  		/*
diff --git a/io_uring/rw.c b/io_uring/rw.c
index ad481ca74a46..d1397739c58b 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -869,6 +869,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
  			return -EOPNOTSUPP;
  		kiocb->private = NULL;
  		kiocb->ki_flags |= IOCB_HIPRI;
+		req->iopoll_state = 0;
  		if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
  			/* make sure every req only blocks once*/
  			req->flags &= ~REQ_F_IOPOLL_STATE;
@@ -1355,6 +1356,12 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
  		struct llist_node *next = node->next;

  		req = container_of(node, struct io_kiocb, iopoll_done_list);
+		if (!READ_ONCE(req->iopoll_state)) {
+			node->next = NULL;
+			llist_add(&req->iopoll_done_list, &ctx->iopoll_complete);
+			node = next;
+			continue;
+		}
  		list_del(&req->iopoll_node);
  		wq_list_add_tail(&req->comp_list, &ctx->submit_state.compl_reqs);
  		nr_events++;
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 0841fa541f5d..cf2eacea5be8 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -251,6 +251,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
  		if (!file->f_op->uring_cmd_iopoll)
  			return -EOPNOTSUPP;
  		issue_flags |= IO_URING_F_IOPOLL;
+		req->iopoll_state = 0;
  		if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
  			/* make sure every req only blocks once */
  			req->flags &= ~REQ_F_IOPOLL_STATE;
-- 
2.20.1
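
For reference, here is the ordering the fix relies on, as a minimal
userspace sketch - hypothetical code: C11 atomics stand in for the
kernel's smp_store_release()/smp_load_acquire(), 'linked' plays the
role of req->iopoll_state, and the prev/next pair plays the role of
req->iopoll_node on ctx->iopoll_list.

#include <stdatomic.h>
#include <stdbool.h>

struct request {
	struct request *prev, *next;	/* stands in for req->iopoll_node */
	atomic_bool linked;		/* stands in for req->iopoll_state */
};

/* Issue side: link the request first, then publish that fact with a
 * release store, so the list writes are visible before the flag is. */
static void issue_side(struct request *req, struct request *list)
{
	req->prev = list;
	req->next = list->next;
	list->next->prev = req;
	list->next = req;
	atomic_store_explicit(&req->linked, true, memory_order_release);
}

/* Reap side: unlink only once the flag is observed; otherwise the
 * caller requeues the completion for a later pass, as the patch does
 * with ctx->iopoll_complete. */
static bool reap_side(struct request *req)
{
	if (!atomic_load_explicit(&req->linked, memory_order_acquire))
		return false;
	req->prev->next = req->next;
	req->next->prev = req->prev;
	return true;
}

int main(void)
{
	struct request list = { .prev = &list, .next = &list };
	struct request req = { 0 };

	issue_side(&req, &list);
	return reap_side(&req) ? 0 : 1;
}

Note that the patch itself reads the flag with READ_ONCE(), i.e.
without acquire semantics; the acquire load above is the conservative
analogue, since the reaper only touches the list linkage after it has
observed the flag.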

Thread overview: 26+ messages
2025-12-10  8:54 [RFC PATCH 0/2] io_uring: fix possible IO accumulation in poll mode Fengnan Chang
2025-12-10  8:55 ` [RFC PATCH 1/2] blk-mq: delete task running check in blk_hctx_poll Fengnan Chang
2025-12-10  9:19   ` Jens Axboe
2025-12-10  9:53   ` Jens Axboe
2025-12-10  8:55 ` [RFC PATCH 2/2] io_uring: fix possible IO accumulation in poll mode Fengnan Chang
2025-12-11  2:15   ` Jens Axboe
2025-12-11  4:10     ` Jens Axboe
2025-12-11  7:38       ` Fengnan
2025-12-11 10:22         ` Jens Axboe
2025-12-11 10:33           ` Jens Axboe
2025-12-11 11:13             ` Fengnan Chang
2025-12-11 11:19               ` Jens Axboe
2025-12-12  1:41             ` Fengnan Chang
2025-12-12  1:53               ` Jens Axboe
2025-12-12  2:12                 ` Fengnan Chang
2025-12-12  5:11                   ` Jens Axboe
2025-12-12  8:58                     ` Jens Axboe
2025-12-12  9:49                       ` Fengnan Chang
2025-12-12 20:22                         ` Jens Axboe
2025-12-12 13:32                     ` Diangang Li
2025-12-12 20:09                       ` Jens Axboe
2025-12-15  6:25                         ` Diangang Li
2025-12-17 12:34                     ` Diangang Li
2025-12-17 16:25                       ` Jens Axboe
2025-12-19  5:43                         ` Diangang Li [this message]
2025-12-10  9:53 ` (subset) [RFC PATCH 0/2] " Jens Axboe
