* [PATCH] io_uring/tctx: add separate lock for list of tctx's in ctx
@ 2025-12-31 15:41 Jens Axboe
2026-01-01 9:47 ` [syzbot ci] " syzbot ci
0 siblings, 1 reply; 2+ messages in thread
From: Jens Axboe @ 2025-12-31 15:41 UTC (permalink / raw)
To: io-uring
ctx->tcxt_list holds the tasks using this ring, and it's currently
protected by the normal ctx->uring_lock. However, this can cause a
circular locking issue, as reported by syzbot, where cancelations off
exec end up needing to remove an entry from this list:
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G L
------------------------------------------------------
syz.0.9999/12287 is trying to acquire lock:
ffff88805851c0a8 (&ctx->uring_lock){+.+.}-{4:4}, at: io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
but task is already holding lock:
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&sig->cred_guard_mutex){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
proc_pid_attr_write+0x547/0x630 fs/proc/base.c:2837
vfs_write+0x27e/0xb30 fs/read_write.c:684
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #1 (sb_writers#3){.+.+}-{0:0}:
percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
__sb_start_write include/linux/fs/super.h:19 [inline]
sb_start_write+0x4d/0x1c0 include/linux/fs/super.h:125
mnt_want_write+0x41/0x90 fs/namespace.c:499
open_last_lookups fs/namei.c:4529 [inline]
path_openat+0xadd/0x3dd0 fs/namei.c:4784
do_filp_open+0x1fa/0x410 fs/namei.c:4814
io_openat2+0x3e0/0x5c0 io_uring/openclose.c:143
__io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1792
io_issue_sqe+0x165/0x1060 io_uring/io_uring.c:1815
io_queue_sqe io_uring/io_uring.c:2042 [inline]
io_submit_sqe io_uring/io_uring.c:2320 [inline]
io_submit_sqes+0xbf4/0x2140 io_uring/io_uring.c:2434
__do_sys_io_uring_enter io_uring/io_uring.c:3280 [inline]
__se_sys_io_uring_enter+0x2e0/0x2b60 io_uring/io_uring.c:3219
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&ctx->uring_lock){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
&ctx->uring_lock --> sb_writers#3 --> &sig->cred_guard_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sig->cred_guard_mutex);
lock(sb_writers#3);
lock(&sig->cred_guard_mutex);
lock(&ctx->uring_lock);
*** DEADLOCK ***
1 lock held by syz.0.9999/12287:
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733
stack backtrace:
CPU: 0 UID: 0 PID: 12287 Comm: syz.0.9999 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_circular_bug+0x2e2/0x300 kernel/locking/lockdep.c:2043
check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff3a8b8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a9a97038 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00007ff3a8de5fa0 RCX: 00007ff3a8b8f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000400
RBP: 00007ff3a8c13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff3a8de6038 R14: 00007ff3a8de5fa0 R15: 00007ff3a8f0fa28
</TASK>
Add a separate lock just for the tctx_list, tctx_lock. This can nest
under ->uring_lock, where necessary, and be used separately for list
manipulation. For the cancelation off exec side, this removes the
need to grab ->uring_lock, hence fixing the circular locking
dependency.
Reported-by: syzbot+b0e3b77ffaa8a4067ce5@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index e1adb0d20a0a..a3e8ddc9b380 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -424,11 +424,17 @@ struct io_ring_ctx {
struct user_struct *user;
struct mm_struct *mm_account;
+ /*
+ * List of tctx nodes for this ctx, protected by tctx_lock. For
+ * cancelation purposes, nests under uring_lock.
+ */
+ struct list_head tctx_list;
+ struct mutex tctx_lock;
+
/* ctx exit and cancelation */
struct llist_head fallback_llist;
struct delayed_work fallback_work;
struct work_struct exit_work;
- struct list_head tctx_list;
struct completion ref_comp;
/* io-wq management, e.g. thread count */
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index ca12ac10c0ae..e1f9d5688bba 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -185,6 +185,7 @@ static int __io_async_cancel(struct io_cancel_data *cd,
/* slow path, try all io-wq's */
io_ring_submit_lock(ctx, issue_flags);
+ mutex_lock(&ctx->tctx_lock);
ret = -ENOENT;
list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
ret = io_async_cancel_one(node->task->io_uring, cd);
@@ -194,6 +195,7 @@ static int __io_async_cancel(struct io_cancel_data *cd,
nr++;
}
}
+ mutex_unlock(&ctx->tctx_lock);
io_ring_submit_unlock(ctx, issue_flags);
return all ? nr : ret;
}
@@ -484,6 +486,7 @@ static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
bool ret = false;
mutex_lock(&ctx->uring_lock);
+ mutex_lock(&ctx->tctx_lock);
list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
struct io_uring_task *tctx = node->task->io_uring;
@@ -496,6 +499,7 @@ static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
}
+ mutex_unlock(&ctx->tctx_lock);
mutex_unlock(&ctx->uring_lock);
return ret;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 709943fedaf4..87a87396e940 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -340,6 +340,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_LIST_HEAD(&ctx->ltimeout_list);
init_llist_head(&ctx->work_llist);
INIT_LIST_HEAD(&ctx->tctx_list);
+ mutex_init(&ctx->tctx_lock);
ctx->submit_state.free_list.next = NULL;
INIT_HLIST_HEAD(&ctx->waitid_list);
xa_init_flags(&ctx->zcrx_ctxs, XA_FLAGS_ALLOC);
@@ -3045,6 +3046,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
exit.ctx = ctx;
mutex_lock(&ctx->uring_lock);
+ mutex_lock(&ctx->tctx_lock);
while (!list_empty(&ctx->tctx_list)) {
WARN_ON_ONCE(time_after(jiffies, timeout));
@@ -3056,6 +3058,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
if (WARN_ON_ONCE(ret))
continue;
+ mutex_unlock(&ctx->tctx_lock);
mutex_unlock(&ctx->uring_lock);
/*
* See comment above for
@@ -3064,7 +3067,9 @@ static __cold void io_ring_exit_work(struct work_struct *work)
*/
wait_for_completion_interruptible(&exit.completion);
mutex_lock(&ctx->uring_lock);
+ mutex_lock(&ctx->tctx_lock);
}
+ mutex_unlock(&ctx->tctx_lock);
mutex_unlock(&ctx->uring_lock);
spin_lock(&ctx->completion_lock);
spin_unlock(&ctx->completion_lock);
diff --git a/io_uring/register.c b/io_uring/register.c
index 62d39b3ff317..3d3822ff3fd9 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -320,6 +320,7 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
return 0;
/* now propagate the restriction to all registered users */
+ mutex_lock(&ctx->tctx_lock);
list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
tctx = node->task->io_uring;
if (WARN_ON_ONCE(!tctx->io_wq))
@@ -330,6 +331,7 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
/* ignore errors, it always returns zero anyway */
(void)io_wq_max_workers(tctx->io_wq, new_count);
}
+ mutex_unlock(&ctx->tctx_lock);
return 0;
err:
if (sqd) {
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 5b66755579c0..6d6f44215ec8 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -136,9 +136,9 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
return ret;
}
- mutex_lock(&ctx->uring_lock);
+ mutex_lock(&ctx->tctx_lock);
list_add(&node->ctx_node, &ctx->tctx_list);
- mutex_unlock(&ctx->uring_lock);
+ mutex_unlock(&ctx->tctx_lock);
}
return 0;
}
@@ -176,9 +176,9 @@ __cold void io_uring_del_tctx_node(unsigned long index)
WARN_ON_ONCE(current != node->task);
WARN_ON_ONCE(list_empty(&node->ctx_node));
- mutex_lock(&node->ctx->uring_lock);
+ mutex_lock(&node->ctx->tctx_lock);
list_del(&node->ctx_node);
- mutex_unlock(&node->ctx->uring_lock);
+ mutex_unlock(&node->ctx->tctx_lock);
if (tctx->last == node->ctx)
tctx->last = NULL;
--
Jens Axboe
^ permalink raw reply related [flat|nested] 2+ messages in thread
* [syzbot ci] Re: io_uring/tctx: add separate lock for list of tctx's in ctx
2025-12-31 15:41 [PATCH] io_uring/tctx: add separate lock for list of tctx's in ctx Jens Axboe
@ 2026-01-01 9:47 ` syzbot ci
0 siblings, 0 replies; 2+ messages in thread
From: syzbot ci @ 2026-01-01 9:47 UTC (permalink / raw)
To: axboe, io-uring; +Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] io_uring/tctx: add separate lock for list of tctx's in ctx
https://lore.kernel.org/all/4cffd2b8-e7c9-48df-9207-0df6e3d5c6ce@kernel.dk
* [PATCH] io_uring/tctx: add separate lock for list of tctx's in ctx
and found the following issue:
WARNING in __io_async_cancel
Full report is available here:
https://ci.syzbot.org/series/98951813-facd-4de5-8d20-609c804dc446
***
WARNING in __io_async_cancel
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: f8f9c1f4d0c7a64600e2ca312dec824a0bc2f1da
arch: amd64
compiler: Debian clang version 21.1.8 (++20251202083448+f68f64eb8130-1~exp1~20251202083504.46), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/9e0d952e-d592-4cca-9208-4b87430b4e6f/config
C repro: https://ci.syzbot.org/findings/103d3164-79b9-4a03-bdf3-9bb057961994/c_repro
syz repro: https://ci.syzbot.org/findings/103d3164-79b9-4a03-bdf3-9bb057961994/syz_repro
------------[ cut here ]------------
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff819c3a04>] prepare_to_wait+0x174/0x210 kernel/sched/wait.c:256
WARNING: kernel/sched/core.c:8754 at __might_sleep+0x92/0xf0 kernel/sched/core.c:8750, CPU#1: syz.0.17/5989
Modules linked in:
CPU: 1 UID: 0 PID: 5989 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:__might_sleep+0xac/0xf0 kernel/sched/core.c:8750
Code: 00 00 48 89 3c 24 41 89 f5 4c 8d 35 6e 25 32 0e 43 80 3c 3c 00 74 08 48 89 df e8 0f 31 96 00 48 8b 0b 4c 89 f7 89 ee 48 89 ca <67> 48 0f b9 3a 44 89 ee 48 8b 3c 24 eb b5 44 89 f1 80 e1 07 80 c1
RSP: 0018:ffffc90003c57960 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888112e4ef78 RCX: ffffffff819c3a04
RDX: ffffffff819c3a04 RSI: 0000000000000001 RDI: ffffffff8fc561c0
RBP: 0000000000000001 R08: ffffc90003c57ac7 R09: 0000000000000000
R10: ffffc90003c57aa0 R11: fffff5200078af59 R12: 1ffff110225c9def
R13: 000000000000024f R14: ffffffff8fc561c0 R15: dffffc0000000000
FS: 0000555558120500(0000) GS:ffff8882a9a0f000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000001c0 CR3: 00000001165c4000 CR4: 00000000000006f0
Call Trace:
<TASK>
__mutex_lock_common kernel/locking/mutex.c:591 [inline]
__mutex_lock+0x119/0x1340 kernel/locking/mutex.c:776
__io_async_cancel+0x183/0x490 io_uring/cancel.c:188
__io_sync_cancel io_uring/cancel.c:258 [inline]
io_sync_cancel+0x7d2/0x940 io_uring/cancel.c:322
__io_uring_register io_uring/register.c:761 [inline]
__do_sys_io_uring_register io_uring/register.c:921 [inline]
__se_sys_io_uring_register+0xe1b/0x12a0 io_uring/register.c:898
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xe2/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fc802f9acb9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe5ca81768 EFLAGS: 00000246 ORIG_RAX: 00000000000001ab
RAX: ffffffffffffffda RBX: 00007fc803205fa0 RCX: 00007fc802f9acb9
RDX: 00002000000001c0 RSI: 0000000000000018 RDI: 0000000000000003
RBP: 00007fc803008bf7 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fc803205fac R14: 00007fc803205fa0 R15: 00007fc803205fa0
</TASK>
----------------
Code disassembly (best guess):
0: 00 00 add %al,(%rax)
2: 48 89 3c 24 mov %rdi,(%rsp)
6: 41 89 f5 mov %esi,%r13d
9: 4c 8d 35 6e 25 32 0e lea 0xe32256e(%rip),%r14 # 0xe32257e
10: 43 80 3c 3c 00 cmpb $0x0,(%r12,%r15,1)
15: 74 08 je 0x1f
17: 48 89 df mov %rbx,%rdi
1a: e8 0f 31 96 00 call 0x96312e
1f: 48 8b 0b mov (%rbx),%rcx
22: 4c 89 f7 mov %r14,%rdi
25: 89 ee mov %ebp,%esi
27: 48 89 ca mov %rcx,%rdx
* 2a: 67 48 0f b9 3a ud1 (%edx),%rdi <-- trapping instruction
2f: 44 89 ee mov %r13d,%esi
32: 48 8b 3c 24 mov (%rsp),%rdi
36: eb b5 jmp 0xffffffed
38: 44 89 f1 mov %r14d,%ecx
3b: 80 e1 07 and $0x7,%cl
3e: 80 .byte 0x80
3f: c1 .byte 0xc1
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-01-01 9:47 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-31 15:41 [PATCH] io_uring/tctx: add separate lock for list of tctx's in ctx Jens Axboe
2026-01-01 9:47 ` [syzbot ci] " syzbot ci
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox