* [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-09-03 3:26 [PATCH 0/4] " Caleb Sander Mateos
@ 2025-09-03 21:55 ` syzbot ci
2025-09-03 23:29 ` Jens Axboe
From: syzbot ci @ 2025-09-03 21:55 UTC
To: axboe, csander, io-uring, linux-kernel; +Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
https://lore.kernel.org/all/20250903032656.2012337-1-csander@purestorage.com
* [PATCH 1/4] io_uring: don't include filetable.h in io_uring.h
* [PATCH 2/4] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
* [PATCH 3/4] io_uring: factor out uring_lock helpers
* [PATCH 4/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
and found the following issue:
WARNING in io_handle_tw_list
Full report is available here:
https://ci.syzbot.org/series/54ae0eae-5e47-4cfe-9ae7-9eaaf959b5ae
***
WARNING in io_handle_tw_list
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 5d50cf9f7cf20a17ac469c20a2e07c29c1f6aab7
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/1de646dd-4ee2-418d-9c62-617d88ed4fd2/config
syz repro: https://ci.syzbot.org/findings/e229a878-375f-4286-89fe-b6724c23addd/syz_repro
------------[ cut here ]------------
WARNING: io_uring/io_uring.h:127 at io_ring_ctx_lock io_uring/io_uring.h:127 [inline], CPU#1: iou-sqp-6294/6297
WARNING: io_uring/io_uring.h:127 at io_handle_tw_list+0x234/0x2e0 io_uring/io_uring.c:1155, CPU#1: iou-sqp-6294/6297
Modules linked in:
CPU: 1 UID: 0 PID: 6297 Comm: iou-sqp-6294 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:io_ring_ctx_lock io_uring/io_uring.h:127 [inline]
RIP: 0010:io_handle_tw_list+0x234/0x2e0 io_uring/io_uring.c:1155
Code: 00 00 48 c7 c7 e0 90 02 8c be 8e 04 00 00 31 d2 e8 01 e5 d2 fc 2e 2e 2e 31 c0 45 31 e4 4d 85 ff 75 89 eb 7c e8 ad fb 00 fd 90 <0f> 0b 90 e9 cf fe ff ff 89 e9 80 e1 07 80 c1 03 38 c1 0f 8c 22 ff
RSP: 0018:ffffc900032cf938 EFLAGS: 00010293
RAX: ffffffff84bfcba3 RBX: dffffc0000000000 RCX: ffff888107f61cc0
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000000
RBP: ffff8881119a8008 R08: ffff888110bb69c7 R09: 1ffff11022176d38
R10: dffffc0000000000 R11: ffffed1022176d39 R12: ffff8881119a8000
R13: ffff888108441e90 R14: ffff888107f61cc0 R15: 0000000000000000
FS: 00007f81f25716c0(0000) GS:ffff8881a39f5000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31b63fff CR3: 000000010f24c000 CR4: 00000000000006f0
Call Trace:
<TASK>
tctx_task_work_run+0x99/0x370 io_uring/io_uring.c:1223
io_sq_tw io_uring/sqpoll.c:244 [inline]
io_sq_thread+0xed1/0x1e50 io_uring/sqpoll.c:327
ret_from_fork+0x47f/0x820 arch/x86/kernel/process.c:148
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-09-03 21:55 ` [syzbot ci] " syzbot ci
@ 2025-09-03 23:29 ` Jens Axboe
2025-09-04 14:52 ` Caleb Sander Mateos
From: Jens Axboe @ 2025-09-03 23:29 UTC
To: syzbot ci, csander, io-uring, linux-kernel; +Cc: syzbot, syzkaller-bugs
Probably the sanest thing to do here is to clear
IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
allow it, it'll be impossible to uphold the locking criteria on both the
issue and register side.
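
(Illustrative sketch, not part of the posted series.) The suggestion above amounts to masking one setup flag when the other is present. The helper name `sketch_sanitize_setup_flags()` and the `SKETCH_` macros are hypothetical; the bit positions are assumed to match include/uapi/linux/io_uring.h and should be checked against that header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror IORING_SETUP_SQPOLL / IORING_SETUP_SINGLE_ISSUER. */
#define SKETCH_SETUP_SQPOLL        (1U << 1)
#define SKETCH_SETUP_SINGLE_ISSUER (1U << 12)

/* Hypothetical helper: drop SINGLE_ISSUER whenever SQPOLL is requested. */
static uint32_t sketch_sanitize_setup_flags(uint32_t flags)
{
	if (flags & SKETCH_SETUP_SQPOLL)
		flags &= ~SKETCH_SETUP_SINGLE_ISSUER;

	/* Postcondition: the two flags never survive together. */
	assert(!((flags & SKETCH_SETUP_SQPOLL) &&
		 (flags & SKETCH_SETUP_SINGLE_ISSUER)));
	return flags;
}
```

With this shape, a ring created with both flags would simply behave like a plain SQPOLL ring, and all other flag combinations pass through unchanged.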
--
Jens Axboe
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-09-03 23:29 ` Jens Axboe
@ 2025-09-04 14:52 ` Caleb Sander Mateos
2025-09-04 16:46 ` Caleb Sander Mateos
From: Caleb Sander Mateos @ 2025-09-04 14:52 UTC
To: Jens Axboe; +Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs
On Wed, Sep 3, 2025 at 4:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> Probably the sanest thing to do here is to clear
> IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
> allow it, it'll be impossible to uphold the locking criteria on both the
> issue and register side.
Yup, I was thinking the same thing. Thanks for taking a look.
Best,
Caleb
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-09-04 14:52 ` Caleb Sander Mateos
@ 2025-09-04 16:46 ` Caleb Sander Mateos
2025-09-04 16:50 ` Caleb Sander Mateos
From: Caleb Sander Mateos @ 2025-09-04 16:46 UTC
To: Jens Axboe; +Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs
On Thu, Sep 4, 2025 at 7:52 AM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Wed, Sep 3, 2025 at 4:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> > Probably the sanest thing to do here is to clear
> > IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
> > allow it, it'll be impossible to uphold the locking criteria on both the
> > issue and register side.
>
> Yup, I was thinking the same thing. Thanks for taking a look.
On further thought, IORING_SETUP_SQPOLL actually does guarantee a
single issuer. io_uring_enter() already avoids taking the uring_lock
in the IORING_SETUP_SQPOLL case because it doesn't issue any SQEs
itself. Only the SQ thread does that, so it *is* the single issuer.
The assertions I added in io_ring_ctx_lock()/io_ring_ctx_unlock() are
just unnecessarily strict. It should expect current ==
ctx->sq_data->thread in the IORING_SETUP_SQPOLL case.
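
(Illustrative sketch, not the code from the series.) The difference between the strict and the relaxed expectation can be modelled in plain C. All `sketch_` names are hypothetical, `sq_thread` stands in for ctx->sq_data->thread, and the strict variant assumes the assertion compares against submitter_task, which the thread does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror IORING_SETUP_SQPOLL. */
#define SKETCH_SETUP_SQPOLL (1U << 1)

struct sketch_task { int pid; };

struct sketch_ctx {
	unsigned int flags;
	const struct sketch_task *submitter_task; /* task that created the ring */
	const struct sketch_task *sq_thread;      /* stand-in for ctx->sq_data->thread */
};

/* The task that issues SQEs: the SQ thread for SQPOLL, else the submitter. */
static const struct sketch_task *
sketch_expected_issuer(const struct sketch_ctx *ctx)
{
	if (ctx->flags & SKETCH_SETUP_SQPOLL)
		return ctx->sq_thread;
	return ctx->submitter_task;
}

/* Strict check (assumed): always compare against submitter_task. */
static bool sketch_strict_check(const struct sketch_ctx *ctx,
				const struct sketch_task *current_task)
{
	return current_task == ctx->submitter_task;
}

/* Relaxed check: compare against whichever task actually issues. */
static bool sketch_relaxed_check(const struct sketch_ctx *ctx,
				 const struct sketch_task *current_task)
{
	return current_task == sketch_expected_issuer(ctx);
}
```

In this model the strict check fails when the SQ thread runs task work on an SQPOLL ring, which is the situation in the reported call trace, while the relaxed check accepts it.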
Best,
Caleb
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-09-04 16:46 ` Caleb Sander Mateos
@ 2025-09-04 16:50 ` Caleb Sander Mateos
2025-09-04 23:25 ` Jens Axboe
From: Caleb Sander Mateos @ 2025-09-04 16:50 UTC
To: Jens Axboe; +Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs
On Thu, Sep 4, 2025 at 9:46 AM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Thu, Sep 4, 2025 at 7:52 AM Caleb Sander Mateos
> <csander@purestorage.com> wrote:
> >
> > On Wed, Sep 3, 2025 at 4:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> > > Probably the sanest thing to do here is to clear
> > > IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
> > > allow it, it'll be impossible to uphold the locking criteria on both the
> > > issue and register side.
> >
> > Yup, I was thinking the same thing. Thanks for taking a look.
>
> On further thought, IORING_SETUP_SQPOLL actually does guarantee a
> single issuer. io_uring_enter() already avoids taking the uring_lock
> in the IORING_SETUP_SQPOLL case because it doesn't issue any SQEs
> itself. Only the SQ thread does that, so it *is* the single issuer.
> The assertions I added in io_ring_ctx_lock()/io_ring_ctx_unlock() are
> just unnecessarily strict. It should expect current ==
> ctx->sq_data->thread in the IORING_SETUP_SQPOLL case.
Oh, but you are totally correct about needing the mutex to synchronize
between issue on the SQ thread and io_uring_register() on other
threads. Yeah, I don't see an easy way to avoid taking the mutex on
the SQ thread unless we disallowed io_uring_register() completely.
Clearing IORING_SETUP_SINGLE_ISSUER seems like the best option for
now.
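
(Illustrative sketch, not the code from the series.) A toy, single-threaded model of the resulting behaviour: once SINGLE_ISSUER is cleared for SQPOLL rings, the lock helper falls back to taking the mutex for them, and only non-SQPOLL single-issuer rings skip it. The `sketch_` names and the counter-based "mutex" are hypothetical stand-ins for uring_lock and its helpers; the flag values are assumed to match the UAPI header.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed to mirror IORING_SETUP_SQPOLL / IORING_SETUP_SINGLE_ISSUER. */
#define SKETCH_SETUP_SQPOLL        (1U << 1)
#define SKETCH_SETUP_SINGLE_ISSUER (1U << 12)

struct sketch_ring {
	unsigned int flags;
	int lock_depth;    /* stand-in for the uring_lock mutex state */
	int lock_acquires; /* how many times the "mutex" was taken */
};

static void sketch_ring_init(struct sketch_ring *r, unsigned int flags)
{
	/* SQPOLL rings lose SINGLE_ISSUER, as agreed in the thread. */
	if (flags & SKETCH_SETUP_SQPOLL)
		flags &= ~SKETCH_SETUP_SINGLE_ISSUER;
	r->flags = flags;
	r->lock_depth = 0;
	r->lock_acquires = 0;
}

/* Returns true if the "mutex" was actually taken. */
static bool sketch_ring_lock(struct sketch_ring *r)
{
	if (r->flags & SKETCH_SETUP_SINGLE_ISSUER)
		return false; /* lock-free fast path */
	assert(r->lock_depth == 0);
	r->lock_depth++;
	r->lock_acquires++;
	return true;
}

static void sketch_ring_unlock(struct sketch_ring *r, bool locked)
{
	if (!locked)
		return;
	assert(r->lock_depth == 1);
	r->lock_depth--;
}
```

The point of the model is only the branch in `sketch_ring_lock()`: the SQ thread and io_uring_register() callers on other threads keep serializing on the mutex, because the fast path is never enabled for SQPOLL.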
Best,
Caleb
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-09-04 16:50 ` Caleb Sander Mateos
@ 2025-09-04 23:25 ` Jens Axboe
From: Jens Axboe @ 2025-09-04 23:25 UTC
To: Caleb Sander Mateos
Cc: syzbot ci, io-uring, linux-kernel, syzbot, syzkaller-bugs
On 9/4/25 10:50 AM, Caleb Sander Mateos wrote:
> On Thu, Sep 4, 2025 at 9:46 AM Caleb Sander Mateos
> <csander@purestorage.com> wrote:
>>
>> On Thu, Sep 4, 2025 at 7:52 AM Caleb Sander Mateos
>> <csander@purestorage.com> wrote:
>>>
>>> On Wed, Sep 3, 2025 at 4:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> Probably the sanest thing to do here is to clear
>>>> IORING_SETUP_SINGLE_ISSUER if it's set with IORING_SETUP_SQPOLL. If we
>>>> allow it, it'll be impossible to uphold the locking criteria on both the
>>>> issue and register side.
>>>
>>> Yup, I was thinking the same thing. Thanks for taking a look.
>>
>> On further thought, IORING_SETUP_SQPOLL actually does guarantee a
>> single issuer. io_uring_enter() already avoids taking the uring_lock
>> in the IORING_SETUP_SQPOLL case because it doesn't issue any SQEs
>> itself. Only the SQ thread does that, so it *is* the single issuer.
>> The assertions I added in io_ring_ctx_lock()/io_ring_ctx_unlock() are
>> just unnecessarily strict. It should expect current ==
>> ctx->sq_data->thread in the IORING_SETUP_SQPOLL case.
>
> Oh, but you are totally correct about needing the mutex to synchronize
> between issue on the SQ thread and io_uring_register() on other
> threads. Yeah, I don't see an easy way to avoid taking the mutex on
> the SQ thread unless we disallowed io_uring_register() completely.
> Clearing IORING_SETUP_SINGLE_ISSUER seems like the best option for
> now.
Right - I don't disagree that SQPOLL is the very definition of "single
issuer", but it'll still have to contend with the creating task doing
other operations that they would need mutual exclusion for. I don't
think clearing SINGLE_ISSUER on SQPOLL is a big deal, it's not like it's
worse off than before. It's just not getting the same optimizations that
the !SQPOLL single issuer path would get.
--
Jens Axboe
* [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-11-25 23:39 [PATCH v3 0/4] " Caleb Sander Mateos
@ 2025-11-26 8:15 ` syzbot ci
2025-11-26 17:30 ` Caleb Sander Mateos
From: syzbot ci @ 2025-11-26 8:15 UTC
To: axboe, csander, io-uring, linux-kernel; +Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v3] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
https://lore.kernel.org/all/20251125233928.3962947-1-csander@purestorage.com
* [PATCH v3 1/4] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
* [PATCH v3 2/4] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
* [PATCH v3 3/4] io_uring: factor out uring_lock helpers
* [PATCH v3 4/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
and found the following issues:
* SYZFAIL: failed to recv rpc
* WARNING in io_ring_ctx_wait_and_kill
* WARNING in io_uring_alloc_task_context
* WARNING: suspicious RCU usage in io_eventfd_unregister
Full report is available here:
https://ci.syzbot.org/series/dde98852-0135-44b2-bbef-9ff9d772f924
***
SYZFAIL: failed to recv rpc
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
C repro: https://ci.syzbot.org/findings/19ae4090-3486-4e2a-973e-dcb6ec3ba0d1/c_repro
syz repro: https://ci.syzbot.org/findings/19ae4090-3486-4e2a-973e-dcb6ec3ba0d1/syz_repro
SYZFAIL: failed to recv rpc
fd=3 want=4 recv=0 n=0 (errno 9: Bad file descriptor)
***
WARNING in io_ring_ctx_wait_and_kill
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
C repro: https://ci.syzbot.org/findings/f5ff9320-bf6f-40b4-a6b3-eee18fa83053/c_repro
syz repro: https://ci.syzbot.org/findings/f5ff9320-bf6f-40b4-a6b3-eee18fa83053/syz_repro
------------[ cut here ]------------
WARNING: io_uring/io_uring.h:266 at io_ring_ctx_lock io_uring/io_uring.h:266 [inline], CPU#0: syz.0.17/5967
WARNING: io_uring/io_uring.h:266 at io_ring_ctx_wait_and_kill+0x35f/0x490 io_uring/io_uring.c:3119, CPU#0: syz.0.17/5967
Modules linked in:
CPU: 0 UID: 0 PID: 5967 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:io_ring_ctx_lock io_uring/io_uring.h:266 [inline]
RIP: 0010:io_ring_ctx_wait_and_kill+0x35f/0x490 io_uring/io_uring.c:3119
Code: 4e 11 48 3b 84 24 20 01 00 00 0f 85 1e 01 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 92 fa 96 00 90 <0f> 0b 90 e9 be fd ff ff 48 8d 7c 24 40 ba 70 00 00 00 31 f6 e8 08
RSP: 0018:ffffc90004117b80 EFLAGS: 00010293
RAX: ffffffff812ac5ee RBX: ffff88810d784000 RCX: ffff888104363a80
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000000
RBP: ffffc90004117d00 R08: ffffc90004117c7f R09: 0000000000000000
R10: ffffc90004117c40 R11: fffff52000822f90 R12: 1ffff92000822f74
R13: dffffc0000000000 R14: ffffc90004117c70 R15: 0000000000000000
FS: 000055558ddb3500(0000) GS:ffff88818e88a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f07135e7dac CR3: 00000001728f4000 CR4: 00000000000006f0
Call Trace:
<TASK>
io_uring_create+0x6b6/0x940 io_uring/io_uring.c:3738
io_uring_setup io_uring/io_uring.c:3764 [inline]
__do_sys_io_uring_setup io_uring/io_uring.c:3798 [inline]
__se_sys_io_uring_setup+0x235/0x240 io_uring/io_uring.c:3789
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f071338f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fff80b05b58 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 00007f07135e5fa0 RCX: 00007f071338f749
RDX: 0000000000000000 RSI: 0000200000000040 RDI: 0000000000000024
RBP: 00007f0713413f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f07135e5fa0 R14: 00007f07135e5fa0 R15: 0000000000000002
</TASK>
***
WARNING in io_uring_alloc_task_context
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
C repro: https://ci.syzbot.org/findings/7aa56677-dbe1-4fdc-bbc4-cc701c10fa7e/c_repro
syz repro: https://ci.syzbot.org/findings/7aa56677-dbe1-4fdc-bbc4-cc701c10fa7e/syz_repro
------------[ cut here ]------------
WARNING: io_uring/io_uring.h:266 at io_ring_ctx_lock io_uring/io_uring.h:266 [inline], CPU#0: syz.0.17/5982
WARNING: io_uring/io_uring.h:266 at io_init_wq_offload io_uring/tctx.c:23 [inline], CPU#0: syz.0.17/5982
WARNING: io_uring/io_uring.h:266 at io_uring_alloc_task_context+0x677/0x8c0 io_uring/tctx.c:86, CPU#0: syz.0.17/5982
Modules linked in:
CPU: 0 UID: 0 PID: 5982 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:io_ring_ctx_lock io_uring/io_uring.h:266 [inline]
RIP: 0010:io_init_wq_offload io_uring/tctx.c:23 [inline]
RIP: 0010:io_uring_alloc_task_context+0x677/0x8c0 io_uring/tctx.c:86
Code: d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 3d ad 96 00 bb f4 ff ff ff eb ab e8 31 ad 96 00 eb 9c e8 2a ad 96 00 90 <0f> 0b 90 e9 12 fb ff ff 4c 8d 64 24 60 4c 8d b4 24 f0 00 00 00 ba
RSP: 0018:ffffc90003dcf9c0 EFLAGS: 00010293
RAX: ffffffff812b1356 RBX: 0000000000000000 RCX: ffff8881777957c0
RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000000
RBP: ffffc90003dcfb50 R08: ffffffff8f7de377 R09: 1ffffffff1efbc6e
R10: dffffc0000000000 R11: fffffbfff1efbc6f R12: ffff8881052bf000
R13: ffff888104bf2000 R14: 0000000000001000 R15: 1ffff1102097e400
FS: 00005555613bd500(0000) GS:ffff88818e88a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7773fe7dac CR3: 000000016cd1c000 CR4: 00000000000006f0
Call Trace:
<TASK>
__io_uring_add_tctx_node+0x455/0x710 io_uring/tctx.c:112
io_uring_create+0x559/0x940 io_uring/io_uring.c:3719
io_uring_setup io_uring/io_uring.c:3764 [inline]
__do_sys_io_uring_setup io_uring/io_uring.c:3798 [inline]
__se_sys_io_uring_setup+0x235/0x240 io_uring/io_uring.c:3789
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f7773d8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffe094f0b68 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 00007f7773fe5fa0 RCX: 00007f7773d8f749
RDX: 0000000000000000 RSI: 0000200000000780 RDI: 0000000000000f08
RBP: 00007f7773e13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f7773fe5fa0 R14: 00007f7773fe5fa0 R15: 0000000000000002
</TASK>
***
WARNING: suspicious RCU usage in io_eventfd_unregister
tree: linux-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
C repro: https://ci.syzbot.org/findings/84c08f15-f4f9-4123-b889-1d8d19f3e0b1/c_repro
syz repro: https://ci.syzbot.org/findings/84c08f15-f4f9-4123-b889-1d8d19f3e0b1/syz_repro
=============================
WARNING: suspicious RCU usage
syzkaller #0 Not tainted
-----------------------------
io_uring/eventfd.c:160 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by kworker/u10:12/3941:
#0: ffff888168f41148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x841/0x15a0 kernel/workqueue.c:3236
#1: ffffc90021f3fb80 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x868/0x15a0 kernel/workqueue.c:3237
stack backtrace:
CPU: 1 UID: 0 PID: 3941 Comm: kworker/u10:12 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: iou_exit io_ring_exit_work
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
lockdep_rcu_suspicious+0x140/0x1d0 kernel/locking/lockdep.c:6876
io_eventfd_unregister+0x18b/0x1c0 io_uring/eventfd.c:159
io_ring_ctx_free+0x18a/0x820 io_uring/io_uring.c:2882
io_ring_exit_work+0xe71/0x1030 io_uring/io_uring.c:3110
process_one_work+0x93a/0x15a0 kernel/workqueue.c:3261
process_scheduled_works kernel/workqueue.c:3344 [inline]
worker_thread+0x9b0/0xee0 kernel/workqueue.c:3425
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-11-26 8:15 ` [syzbot ci] " syzbot ci
@ 2025-11-26 17:30 ` Caleb Sander Mateos
0 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-11-26 17:30 UTC (permalink / raw)
To: syzbot ci; +Cc: axboe, io-uring, linux-kernel, syzbot, syzkaller-bugs
On Wed, Nov 26, 2025 at 12:15 AM syzbot ci
<syzbot+ci500177af251d1ddc@syzkaller.appspotmail.com> wrote:
>
> syzbot ci has tested the following series
>
> [v3] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> https://lore.kernel.org/all/20251125233928.3962947-1-csander@purestorage.com
> * [PATCH v3 1/4] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> * [PATCH v3 2/4] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
> * [PATCH v3 3/4] io_uring: factor out uring_lock helpers
> * [PATCH v3 4/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>
> and found the following issues:
> * SYZFAIL: failed to recv rpc
Looks like this might be a side effect of the "WARNING: suspicious RCU
usage in io_eventfd_unregister" report.
> * WARNING in io_ring_ctx_wait_and_kill
Looks like io_ring_ctx_wait_and_kill() can be called on an
IORING_SETUP_SINGLE_ISSUER io_ring_ctx before submitter_task has been
set, either when io_uring_create() errors out or when an
IORING_SETUP_R_DISABLED io_ring_ctx is never enabled. I can relax this
WARN_ON_ONCE() condition.
> * WARNING in io_uring_alloc_task_context
Similar issue: __io_uring_add_tctx_node() is always called in
io_uring_create(), where submitter_task won't exist yet for a ring
created with IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_R_DISABLED.
> * WARNING: suspicious RCU usage in io_eventfd_unregister
Missed that io_eventfd_unregister() is also called from
io_ring_ctx_free(), not just __io_uring_register(). So we can't assert
that the uring_lock mutex is held.
Thanks, syzbot!
>
> Full report is available here:
> https://ci.syzbot.org/series/dde98852-0135-44b2-bbef-9ff9d772f924
>
> ***
>
> SYZFAIL: failed to recv rpc
>
> tree: linux-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
> C repro: https://ci.syzbot.org/findings/19ae4090-3486-4e2a-973e-dcb6ec3ba0d1/c_repro
> syz repro: https://ci.syzbot.org/findings/19ae4090-3486-4e2a-973e-dcb6ec3ba0d1/syz_repro
>
> SYZFAIL: failed to recv rpc
> fd=3 want=4 recv=0 n=0 (errno 9: Bad file descriptor)
>
>
> ***
>
> WARNING in io_ring_ctx_wait_and_kill
>
> tree: linux-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
> C repro: https://ci.syzbot.org/findings/f5ff9320-bf6f-40b4-a6b3-eee18fa83053/c_repro
> syz repro: https://ci.syzbot.org/findings/f5ff9320-bf6f-40b4-a6b3-eee18fa83053/syz_repro
>
> ------------[ cut here ]------------
> WARNING: io_uring/io_uring.h:266 at io_ring_ctx_lock io_uring/io_uring.h:266 [inline], CPU#0: syz.0.17/5967
> WARNING: io_uring/io_uring.h:266 at io_ring_ctx_wait_and_kill+0x35f/0x490 io_uring/io_uring.c:3119, CPU#0: syz.0.17/5967
> Modules linked in:
> CPU: 0 UID: 0 PID: 5967 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:io_ring_ctx_lock io_uring/io_uring.h:266 [inline]
> RIP: 0010:io_ring_ctx_wait_and_kill+0x35f/0x490 io_uring/io_uring.c:3119
> Code: 4e 11 48 3b 84 24 20 01 00 00 0f 85 1e 01 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 92 fa 96 00 90 <0f> 0b 90 e9 be fd ff ff 48 8d 7c 24 40 ba 70 00 00 00 31 f6 e8 08
> RSP: 0018:ffffc90004117b80 EFLAGS: 00010293
> RAX: ffffffff812ac5ee RBX: ffff88810d784000 RCX: ffff888104363a80
> RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000000
> RBP: ffffc90004117d00 R08: ffffc90004117c7f R09: 0000000000000000
> R10: ffffc90004117c40 R11: fffff52000822f90 R12: 1ffff92000822f74
> R13: dffffc0000000000 R14: ffffc90004117c70 R15: 0000000000000000
> FS: 000055558ddb3500(0000) GS:ffff88818e88a000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f07135e7dac CR3: 00000001728f4000 CR4: 00000000000006f0
> Call Trace:
> <TASK>
> io_uring_create+0x6b6/0x940 io_uring/io_uring.c:3738
> io_uring_setup io_uring/io_uring.c:3764 [inline]
> __do_sys_io_uring_setup io_uring/io_uring.c:3798 [inline]
> __se_sys_io_uring_setup+0x235/0x240 io_uring/io_uring.c:3789
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f071338f749
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007fff80b05b58 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
> RAX: ffffffffffffffda RBX: 00007f07135e5fa0 RCX: 00007f071338f749
> RDX: 0000000000000000 RSI: 0000200000000040 RDI: 0000000000000024
> RBP: 00007f0713413f91 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f07135e5fa0 R14: 00007f07135e5fa0 R15: 0000000000000002
> </TASK>
>
>
> ***
>
> WARNING in io_uring_alloc_task_context
>
> tree: linux-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
> C repro: https://ci.syzbot.org/findings/7aa56677-dbe1-4fdc-bbc4-cc701c10fa7e/c_repro
> syz repro: https://ci.syzbot.org/findings/7aa56677-dbe1-4fdc-bbc4-cc701c10fa7e/syz_repro
>
> ------------[ cut here ]------------
> WARNING: io_uring/io_uring.h:266 at io_ring_ctx_lock io_uring/io_uring.h:266 [inline], CPU#0: syz.0.17/5982
> WARNING: io_uring/io_uring.h:266 at io_init_wq_offload io_uring/tctx.c:23 [inline], CPU#0: syz.0.17/5982
> WARNING: io_uring/io_uring.h:266 at io_uring_alloc_task_context+0x677/0x8c0 io_uring/tctx.c:86, CPU#0: syz.0.17/5982
> Modules linked in:
> CPU: 0 UID: 0 PID: 5982 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:io_ring_ctx_lock io_uring/io_uring.h:266 [inline]
> RIP: 0010:io_init_wq_offload io_uring/tctx.c:23 [inline]
> RIP: 0010:io_uring_alloc_task_context+0x677/0x8c0 io_uring/tctx.c:86
> Code: d8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc cc e8 3d ad 96 00 bb f4 ff ff ff eb ab e8 31 ad 96 00 eb 9c e8 2a ad 96 00 90 <0f> 0b 90 e9 12 fb ff ff 4c 8d 64 24 60 4c 8d b4 24 f0 00 00 00 ba
> RSP: 0018:ffffc90003dcf9c0 EFLAGS: 00010293
> RAX: ffffffff812b1356 RBX: 0000000000000000 RCX: ffff8881777957c0
> RDX: 0000000000000000 RSI: 0000000000001000 RDI: 0000000000000000
> RBP: ffffc90003dcfb50 R08: ffffffff8f7de377 R09: 1ffffffff1efbc6e
> R10: dffffc0000000000 R11: fffffbfff1efbc6f R12: ffff8881052bf000
> R13: ffff888104bf2000 R14: 0000000000001000 R15: 1ffff1102097e400
> FS: 00005555613bd500(0000) GS:ffff88818e88a000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f7773fe7dac CR3: 000000016cd1c000 CR4: 00000000000006f0
> Call Trace:
> <TASK>
> __io_uring_add_tctx_node+0x455/0x710 io_uring/tctx.c:112
> io_uring_create+0x559/0x940 io_uring/io_uring.c:3719
> io_uring_setup io_uring/io_uring.c:3764 [inline]
> __do_sys_io_uring_setup io_uring/io_uring.c:3798 [inline]
> __se_sys_io_uring_setup+0x235/0x240 io_uring/io_uring.c:3789
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f7773d8f749
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffe094f0b68 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
> RAX: ffffffffffffffda RBX: 00007f7773fe5fa0 RCX: 00007f7773d8f749
> RDX: 0000000000000000 RSI: 0000200000000780 RDI: 0000000000000f08
> RBP: 00007f7773e13f91 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f7773fe5fa0 R14: 00007f7773fe5fa0 R15: 0000000000000002
> </TASK>
>
>
> ***
>
> WARNING: suspicious RCU usage in io_eventfd_unregister
>
> tree: linux-next
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/next/linux-next
> base: 92fd6e84175befa1775e5c0ab682938eca27c0b2
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/9d67ded7-d9a8-41e3-8b58-51340991cf96/config
> C repro: https://ci.syzbot.org/findings/84c08f15-f4f9-4123-b889-1d8d19f3e0b1/c_repro
> syz repro: https://ci.syzbot.org/findings/84c08f15-f4f9-4123-b889-1d8d19f3e0b1/syz_repro
>
> =============================
> WARNING: suspicious RCU usage
> syzkaller #0 Not tainted
> -----------------------------
> io_uring/eventfd.c:160 suspicious rcu_dereference_protected() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 2, debug_locks = 1
> 2 locks held by kworker/u10:12/3941:
> #0: ffff888168f41148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x841/0x15a0 kernel/workqueue.c:3236
> #1: ffffc90021f3fb80 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x868/0x15a0 kernel/workqueue.c:3237
>
> stack backtrace:
> CPU: 1 UID: 0 PID: 3941 Comm: kworker/u10:12 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Workqueue: iou_exit io_ring_exit_work
> Call Trace:
> <TASK>
> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
> lockdep_rcu_suspicious+0x140/0x1d0 kernel/locking/lockdep.c:6876
> io_eventfd_unregister+0x18b/0x1c0 io_uring/eventfd.c:159
> io_ring_ctx_free+0x18a/0x820 io_uring/io_uring.c:2882
> io_ring_exit_work+0xe71/0x1030 io_uring/io_uring.c:3110
> process_one_work+0x93a/0x15a0 kernel/workqueue.c:3261
> process_scheduled_works kernel/workqueue.c:3344 [inline]
> worker_thread+0x9b0/0xee0 kernel/workqueue.c:3425
> kthread+0x711/0x8a0 kernel/kthread.c:463
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> </TASK>
>
>
> ***
>
> If these findings have caused you to resend the series or submit a
> separate fix, please add the following tag to your commit message:
> Tested-by: syzbot@syzkaller.appspotmail.com
>
> ---
> This report is generated by a bot. It may contain errors.
> syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-12-15 20:09 [PATCH v5 0/6] " Caleb Sander Mateos
@ 2025-12-16 5:21 ` syzbot ci
2025-12-18 1:24 ` Caleb Sander Mateos
0 siblings, 1 reply; 19+ messages in thread
From: syzbot ci @ 2025-12-16 5:21 UTC (permalink / raw)
To: axboe, csander, io-uring, joannelkoong, linux-kernel, oliver.sang,
syzbot
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
https://lore.kernel.org/all/20251215200909.3505001-1-csander@purestorage.com
* [PATCH v5 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
* [PATCH v5 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
* [PATCH v5 3/6] io_uring: ensure io_uring_create() initializes submitter_task
* [PATCH v5 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
* [PATCH v5 5/6] io_uring: factor out uring_lock helpers
* [PATCH v5 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
and found the following issue:
KASAN: slab-use-after-free Read in task_work_add
Full report is available here:
https://ci.syzbot.org/series/bce89909-ebf2-45f6-be49-bbd46e33e966
***
KASAN: slab-use-after-free Read in task_work_add
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: d358e5254674b70f34c847715ca509e46eb81e6f
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/db5ac991-f49c-460f-80e4-2a33be76fe7c/config
syz repro: https://ci.syzbot.org/findings/ddbf1feb-6618-4c0f-9a16-15b856f20d71/syz_repro
==================================================================
BUG: KASAN: slab-use-after-free in task_work_add+0xd7/0x440 kernel/task_work.c:73
Read of size 8 at addr ffff88816a8826f8 by task kworker/u9:2/54
CPU: 0 UID: 0 PID: 54 Comm: kworker/u9:2 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Workqueue: iou_exit io_ring_exit_work
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
task_work_add+0xd7/0x440 kernel/task_work.c:73
io_ring_ctx_lock_nested io_uring/io_uring.h:271 [inline]
io_ring_ctx_lock io_uring/io_uring.h:282 [inline]
io_req_caches_free+0x342/0x3e0 io_uring/io_uring.c:2869
io_ring_ctx_free+0x56a/0x8e0 io_uring/io_uring.c:2908
io_ring_exit_work+0xff9/0x1220 io_uring/io_uring.c:3113
process_one_work kernel/workqueue.c:3257 [inline]
process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
Allocated by task 7671:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
unpoison_slab_object mm/kasan/common.c:339 [inline]
__kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:365
kasan_slab_alloc include/linux/kasan.h:252 [inline]
slab_post_alloc_hook mm/slub.c:4953 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_node_noprof+0x43c/0x720 mm/slub.c:5315
alloc_task_struct_node kernel/fork.c:184 [inline]
dup_task_struct+0x57/0x9a0 kernel/fork.c:915
copy_process+0x4ea/0x3950 kernel/fork.c:2052
kernel_clone+0x21e/0x820 kernel/fork.c:2651
__do_sys_clone3 kernel/fork.c:2953 [inline]
__se_sys_clone3+0x256/0x2d0 kernel/fork.c:2932
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 6024:
kasan_save_stack mm/kasan/common.c:56 [inline]
kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
poison_slab_object mm/kasan/common.c:252 [inline]
__kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
kasan_slab_free include/linux/kasan.h:234 [inline]
slab_free_hook mm/slub.c:2540 [inline]
slab_free mm/slub.c:6668 [inline]
kmem_cache_free+0x197/0x620 mm/slub.c:6779
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core+0xd70/0x1870 kernel/rcu/tree.c:2857
handle_softirqs+0x27d/0x850 kernel/softirq.c:622
__do_softirq kernel/softirq.c:656 [inline]
invoke_softirq kernel/softirq.c:496 [inline]
__irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:723
irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
sysvec_call_function_single+0xa3/0xc0 arch/x86/kernel/smp.c:266
asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:704
Last potentially related work creation:
kasan_save_stack+0x3e/0x60 mm/kasan/common.c:56
kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
__call_rcu_common kernel/rcu/tree.c:3119 [inline]
call_rcu+0x157/0x9c0 kernel/rcu/tree.c:3239
rcu_do_batch kernel/rcu/tree.c:2605 [inline]
rcu_core+0xd70/0x1870 kernel/rcu/tree.c:2857
handle_softirqs+0x27d/0x850 kernel/softirq.c:622
run_ksoftirqd+0x9b/0x100 kernel/softirq.c:1063
smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
Second to last potentially related work creation:
kasan_save_stack+0x3e/0x60 mm/kasan/common.c:56
kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
__call_rcu_common kernel/rcu/tree.c:3119 [inline]
call_rcu+0x157/0x9c0 kernel/rcu/tree.c:3239
context_switch kernel/sched/core.c:5259 [inline]
__schedule+0x14c4/0x5000 kernel/sched/core.c:6863
preempt_schedule_irq+0xb5/0x150 kernel/sched/core.c:7190
irqentry_exit+0x5d8/0x660 kernel/entry/common.c:216
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
The buggy address belongs to the object at ffff88816a881d40
which belongs to the cache task_struct of size 7232
The buggy address is located 2488 bytes inside of
freed 7232-byte region [ffff88816a881d40, ffff88816a883980)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16a880
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
memcg:ffff8881726b0441
anon flags: 0x57ff00000000040(head|node=1|zone=2|lastcpupid=0x7ff)
page_type: f5(slab)
raw: 057ff00000000040 ffff88816040a500 0000000000000000 0000000000000001
raw: 0000000000000000 0000000080040004 00000000f5000000 ffff8881726b0441
head: 057ff00000000040 ffff88816040a500 0000000000000000 0000000000000001
head: 0000000000000000 0000000080040004 00000000f5000000 ffff8881726b0441
head: 057ff00000000003 ffffea0005aa2001 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 7291, tgid 7291 (syz.2.649), ts 88142964676, free_ts 88127352940
set_page_owner include/linux/page_owner.h:32 [inline]
post_alloc_hook+0x234/0x290 mm/page_alloc.c:1846
prep_new_page mm/page_alloc.c:1854 [inline]
get_page_from_freelist+0x2365/0x2440 mm/page_alloc.c:3915
__alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5210
alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2486
alloc_slab_page mm/slub.c:3075 [inline]
allocate_slab+0x86/0x3b0 mm/slub.c:3248
new_slab mm/slub.c:3302 [inline]
___slab_alloc+0xf2b/0x1960 mm/slub.c:4656
__slab_alloc+0x65/0x100 mm/slub.c:4779
__slab_alloc_node mm/slub.c:4855 [inline]
slab_alloc_node mm/slub.c:5251 [inline]
kmem_cache_alloc_node_noprof+0x4ce/0x720 mm/slub.c:5315
alloc_task_struct_node kernel/fork.c:184 [inline]
dup_task_struct+0x57/0x9a0 kernel/fork.c:915
copy_process+0x4ea/0x3950 kernel/fork.c:2052
kernel_clone+0x21e/0x820 kernel/fork.c:2651
__do_sys_clone3 kernel/fork.c:2953 [inline]
__se_sys_clone3+0x256/0x2d0 kernel/fork.c:2932
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
page last free pid 5275 tgid 5275 stack trace:
reset_page_owner include/linux/page_owner.h:25 [inline]
free_pages_prepare mm/page_alloc.c:1395 [inline]
__free_frozen_pages+0xbc8/0xd30 mm/page_alloc.c:2943
__slab_free+0x21b/0x2a0 mm/slub.c:6004
qlink_free mm/kasan/quarantine.c:163 [inline]
qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
__kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:349
kasan_slab_alloc include/linux/kasan.h:252 [inline]
slab_post_alloc_hook mm/slub.c:4953 [inline]
slab_alloc_node mm/slub.c:5263 [inline]
kmem_cache_alloc_noprof+0x37d/0x710 mm/slub.c:5270
getname_flags+0xb8/0x540 fs/namei.c:146
getname include/linux/fs.h:2498 [inline]
do_sys_openat2+0xbc/0x200 fs/open.c:1426
do_sys_open fs/open.c:1436 [inline]
__do_sys_openat fs/open.c:1452 [inline]
__se_sys_openat fs/open.c:1447 [inline]
__x64_sys_openat+0x138/0x170 fs/open.c:1447
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Memory state around the buggy address:
ffff88816a882580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88816a882600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88816a882680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88816a882700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88816a882780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-12-16 5:21 ` [syzbot ci] " syzbot ci
@ 2025-12-18 1:24 ` Caleb Sander Mateos
0 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 1:24 UTC (permalink / raw)
To: syzbot ci
Cc: axboe, io-uring, joannelkoong, linux-kernel, oliver.sang, syzbot,
syzbot, syzkaller-bugs
On Mon, Dec 15, 2025 at 9:21 PM syzbot ci
<syzbot+ci3ff889516a0b26a2@syzkaller.appspotmail.com> wrote:
>
> syzbot ci has tested the following series
>
> [v5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> https://lore.kernel.org/all/20251215200909.3505001-1-csander@purestorage.com
> * [PATCH v5 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
> * [PATCH v5 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> * [PATCH v5 3/6] io_uring: ensure io_uring_create() initializes submitter_task
> * [PATCH v5 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
> * [PATCH v5 5/6] io_uring: factor out uring_lock helpers
> * [PATCH v5 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>
> and found the following issue:
> KASAN: slab-use-after-free Read in task_work_add
>
> Full report is available here:
> https://ci.syzbot.org/series/bce89909-ebf2-45f6-be49-bbd46e33e966
>
> ***
>
> KASAN: slab-use-after-free Read in task_work_add
>
> tree: torvalds
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
> base: d358e5254674b70f34c847715ca509e46eb81e6f
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/db5ac991-f49c-460f-80e4-2a33be76fe7c/config
> syz repro: https://ci.syzbot.org/findings/ddbf1feb-6618-4c0f-9a16-15b856f20d71/syz_repro
>
> ==================================================================
> BUG: KASAN: slab-use-after-free in task_work_add+0xd7/0x440 kernel/task_work.c:73
> Read of size 8 at addr ffff88816a8826f8 by task kworker/u9:2/54
>
> CPU: 0 UID: 0 PID: 54 Comm: kworker/u9:2 Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Workqueue: iou_exit io_ring_exit_work
> Call Trace:
> <TASK>
> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
> print_address_description mm/kasan/report.c:378 [inline]
> print_report+0xca/0x240 mm/kasan/report.c:482
> kasan_report+0x118/0x150 mm/kasan/report.c:595
> task_work_add+0xd7/0x440 kernel/task_work.c:73
> io_ring_ctx_lock_nested io_uring/io_uring.h:271 [inline]
> io_ring_ctx_lock io_uring/io_uring.h:282 [inline]
> io_req_caches_free+0x342/0x3e0 io_uring/io_uring.c:2869
> io_ring_ctx_free+0x56a/0x8e0 io_uring/io_uring.c:2908
The call to io_req_caches_free() comes after the
put_task_struct(ctx->submitter_task) call in io_ring_ctx_free(), so I
guess the task_struct may have already been freed when
io_ring_ctx_lock() is called. Should be simple enough to fix by just
moving the put_task_struct() call to the end of io_ring_ctx_free().
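The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch: struct model_task, struct
model_ctx and every model_* function are hypothetical stand-ins, not
kernel code, and the "freed" flag replaces the real memory release so
that the bad access can be counted instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct: a refcount and a "freed" flag. */
struct model_task {
	int refs;
	bool freed;
};

/* Hypothetical stand-in for io_ring_ctx. */
struct model_ctx {
	struct model_task *submitter_task;
	int uaf_accesses;	/* number of touches of an already freed task */
};

/* Models put_task_struct(): dropping the last reference frees the task. */
static void model_put_task(struct model_task *t)
{
	assert(t->refs > 0);
	if (--t->refs == 0)
		t->freed = true;	/* the kernel would release the memory here */
}

/* Models a teardown step that still needs submitter_task. */
static void model_ctx_lock(struct model_ctx *ctx)
{
	if (ctx->submitter_task->freed)
		ctx->uaf_accesses++;
}

/* Ordering from the report: reference dropped before the last use. */
int model_free_put_first(struct model_ctx *ctx)
{
	model_put_task(ctx->submitter_task);
	model_ctx_lock(ctx);
	return ctx->uaf_accesses;
}

/* Proposed ordering: reference dropped at the very end. */
int model_free_put_last(struct model_ctx *ctx)
{
	model_ctx_lock(ctx);
	model_put_task(ctx->submitter_task);
	return ctx->uaf_accesses;
}
```

With a task holding a single reference, model_free_put_first() records
one access to a freed task, while model_free_put_last() records none.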
Looking at this made me realize one other small bug: it's incorrect to
assume that the uring lock has been acquired successfully when
task_work_add() fails because the submitter_task has exited. Even
though submitter_task will no longer be using the uring lock, other
tasks could. So this path needs to acquire the uring_lock mutex,
similar to the IORING_SETUP_SINGLE_ISSUER && IORING_SETUP_R_DISABLED
case.
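A rough userspace sketch of the fallback described above, under stated
assumptions: struct model_ring, enum model_lock_kind and the model_*
functions are hypothetical and do not reflect the actual patch. The only
behaviour taken from the kernel is that task_work_add() returns 0 on
success and -ESRCH when the target task has exited.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* How the model ended up excluding other users of the ring. */
enum model_lock_kind {
	MODEL_LOCK_VIA_SUBMITTER,	/* task work queued on the live submitter */
	MODEL_LOCK_VIA_MUTEX,		/* fell back to the uring_lock mutex */
};

/* Hypothetical stand-in for io_ring_ctx. */
struct model_ring {
	bool submitter_exited;
	bool mutex_held;
};

/* Models task_work_add(): 0 on success, -ESRCH if the task has exited. */
static int model_task_work_add(const struct model_ring *ring)
{
	return ring->submitter_exited ? -ESRCH : 0;
}

/*
 * A failed task_work_add() only shows that the submitter is gone.
 * Other tasks may still touch the ring, so the mutex must be taken.
 */
enum model_lock_kind model_ring_lock(struct model_ring *ring)
{
	if (model_task_work_add(ring) == 0)
		return MODEL_LOCK_VIA_SUBMITTER;

	assert(!ring->mutex_held);	/* the model does not lock recursively */
	ring->mutex_held = true;
	return MODEL_LOCK_VIA_MUTEX;
}
```

The point of the sketch is the second branch: failure to queue task work
is treated as "take the mutex", not as "lock already held".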
Thanks,
Caleb
> io_ring_exit_work+0xff9/0x1220 io_uring/io_uring.c:3113
> process_one_work kernel/workqueue.c:3257 [inline]
> process_scheduled_works+0xad1/0x1770 kernel/workqueue.c:3340
> worker_thread+0x8a0/0xda0 kernel/workqueue.c:3421
> kthread+0x711/0x8a0 kernel/kthread.c:463
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> </TASK>
>
> Allocated by task 7671:
> kasan_save_stack mm/kasan/common.c:56 [inline]
> kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
> unpoison_slab_object mm/kasan/common.c:339 [inline]
> __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:365
> kasan_slab_alloc include/linux/kasan.h:252 [inline]
> slab_post_alloc_hook mm/slub.c:4953 [inline]
> slab_alloc_node mm/slub.c:5263 [inline]
> kmem_cache_alloc_node_noprof+0x43c/0x720 mm/slub.c:5315
> alloc_task_struct_node kernel/fork.c:184 [inline]
> dup_task_struct+0x57/0x9a0 kernel/fork.c:915
> copy_process+0x4ea/0x3950 kernel/fork.c:2052
> kernel_clone+0x21e/0x820 kernel/fork.c:2651
> __do_sys_clone3 kernel/fork.c:2953 [inline]
> __se_sys_clone3+0x256/0x2d0 kernel/fork.c:2932
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Freed by task 6024:
> kasan_save_stack mm/kasan/common.c:56 [inline]
> kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
> kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:584
> poison_slab_object mm/kasan/common.c:252 [inline]
> __kasan_slab_free+0x5c/0x80 mm/kasan/common.c:284
> kasan_slab_free include/linux/kasan.h:234 [inline]
> slab_free_hook mm/slub.c:2540 [inline]
> slab_free mm/slub.c:6668 [inline]
> kmem_cache_free+0x197/0x620 mm/slub.c:6779
> rcu_do_batch kernel/rcu/tree.c:2605 [inline]
> rcu_core+0xd70/0x1870 kernel/rcu/tree.c:2857
> handle_softirqs+0x27d/0x850 kernel/softirq.c:622
> __do_softirq kernel/softirq.c:656 [inline]
> invoke_softirq kernel/softirq.c:496 [inline]
> __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:723
> irq_exit_rcu+0x9/0x30 kernel/softirq.c:739
> instr_sysvec_call_function_single arch/x86/kernel/smp.c:266 [inline]
> sysvec_call_function_single+0xa3/0xc0 arch/x86/kernel/smp.c:266
> asm_sysvec_call_function_single+0x1a/0x20 arch/x86/include/asm/idtentry.h:704
>
> Last potentially related work creation:
> kasan_save_stack+0x3e/0x60 mm/kasan/common.c:56
> kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
> __call_rcu_common kernel/rcu/tree.c:3119 [inline]
> call_rcu+0x157/0x9c0 kernel/rcu/tree.c:3239
> rcu_do_batch kernel/rcu/tree.c:2605 [inline]
> rcu_core+0xd70/0x1870 kernel/rcu/tree.c:2857
> handle_softirqs+0x27d/0x850 kernel/softirq.c:622
> run_ksoftirqd+0x9b/0x100 kernel/softirq.c:1063
> smpboot_thread_fn+0x542/0xa60 kernel/smpboot.c:160
> kthread+0x711/0x8a0 kernel/kthread.c:463
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
>
> Second to last potentially related work creation:
> kasan_save_stack+0x3e/0x60 mm/kasan/common.c:56
> kasan_record_aux_stack+0xbd/0xd0 mm/kasan/generic.c:556
> __call_rcu_common kernel/rcu/tree.c:3119 [inline]
> call_rcu+0x157/0x9c0 kernel/rcu/tree.c:3239
> context_switch kernel/sched/core.c:5259 [inline]
> __schedule+0x14c4/0x5000 kernel/sched/core.c:6863
> preempt_schedule_irq+0xb5/0x150 kernel/sched/core.c:7190
> irqentry_exit+0x5d8/0x660 kernel/entry/common.c:216
> asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:697
>
> The buggy address belongs to the object at ffff88816a881d40
> which belongs to the cache task_struct of size 7232
> The buggy address is located 2488 bytes inside of
> freed 7232-byte region [ffff88816a881d40, ffff88816a883980)
>
> The buggy address belongs to the physical page:
> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x16a880
> head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
> memcg:ffff8881726b0441
> anon flags: 0x57ff00000000040(head|node=1|zone=2|lastcpupid=0x7ff)
> page_type: f5(slab)
> raw: 057ff00000000040 ffff88816040a500 0000000000000000 0000000000000001
> raw: 0000000000000000 0000000080040004 00000000f5000000 ffff8881726b0441
> head: 057ff00000000040 ffff88816040a500 0000000000000000 0000000000000001
> head: 0000000000000000 0000000080040004 00000000f5000000 ffff8881726b0441
> head: 057ff00000000003 ffffea0005aa2001 00000000ffffffff 00000000ffffffff
> head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
> page dumped because: kasan: bad access detected
> page_owner tracks the page as allocated
> page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 7291, tgid 7291 (syz.2.649), ts 88142964676, free_ts 88127352940
> set_page_owner include/linux/page_owner.h:32 [inline]
> post_alloc_hook+0x234/0x290 mm/page_alloc.c:1846
> prep_new_page mm/page_alloc.c:1854 [inline]
> get_page_from_freelist+0x2365/0x2440 mm/page_alloc.c:3915
> __alloc_frozen_pages_noprof+0x181/0x370 mm/page_alloc.c:5210
> alloc_pages_mpol+0x232/0x4a0 mm/mempolicy.c:2486
> alloc_slab_page mm/slub.c:3075 [inline]
> allocate_slab+0x86/0x3b0 mm/slub.c:3248
> new_slab mm/slub.c:3302 [inline]
> ___slab_alloc+0xf2b/0x1960 mm/slub.c:4656
> __slab_alloc+0x65/0x100 mm/slub.c:4779
> __slab_alloc_node mm/slub.c:4855 [inline]
> slab_alloc_node mm/slub.c:5251 [inline]
> kmem_cache_alloc_node_noprof+0x4ce/0x720 mm/slub.c:5315
> alloc_task_struct_node kernel/fork.c:184 [inline]
> dup_task_struct+0x57/0x9a0 kernel/fork.c:915
> copy_process+0x4ea/0x3950 kernel/fork.c:2052
> kernel_clone+0x21e/0x820 kernel/fork.c:2651
> __do_sys_clone3 kernel/fork.c:2953 [inline]
> __se_sys_clone3+0x256/0x2d0 kernel/fork.c:2932
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> page last free pid 5275 tgid 5275 stack trace:
> reset_page_owner include/linux/page_owner.h:25 [inline]
> free_pages_prepare mm/page_alloc.c:1395 [inline]
> __free_frozen_pages+0xbc8/0xd30 mm/page_alloc.c:2943
> __slab_free+0x21b/0x2a0 mm/slub.c:6004
> qlink_free mm/kasan/quarantine.c:163 [inline]
> qlist_free_all+0x97/0x100 mm/kasan/quarantine.c:179
> kasan_quarantine_reduce+0x148/0x160 mm/kasan/quarantine.c:286
> __kasan_slab_alloc+0x22/0x80 mm/kasan/common.c:349
> kasan_slab_alloc include/linux/kasan.h:252 [inline]
> slab_post_alloc_hook mm/slub.c:4953 [inline]
> slab_alloc_node mm/slub.c:5263 [inline]
> kmem_cache_alloc_noprof+0x37d/0x710 mm/slub.c:5270
> getname_flags+0xb8/0x540 fs/namei.c:146
> getname include/linux/fs.h:2498 [inline]
> do_sys_openat2+0xbc/0x200 fs/open.c:1426
> do_sys_open fs/open.c:1436 [inline]
> __do_sys_openat fs/open.c:1452 [inline]
> __se_sys_openat fs/open.c:1447 [inline]
> __x64_sys_openat+0x138/0x170 fs/open.c:1447
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Memory state around the buggy address:
> ffff88816a882580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88816a882600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff88816a882680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff88816a882700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff88816a882780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================
>
>
> ***
>
> If these findings have caused you to resend the series or submit a
> separate fix, please add the following tag to your commit message:
> Tested-by: syzbot@syzkaller.appspotmail.com
>
> ---
> This report is generated by a bot. It may contain errors.
> syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
@ 2025-12-18 2:44 Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED Caleb Sander Mateos
` (6 more replies)
0 siblings, 7 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel; +Cc: Joanne Koong, Caleb Sander Mateos
Setting IORING_SETUP_SINGLE_ISSUER when creating an io_uring doesn't
actually enable any additional optimizations (aside from being a
requirement for IORING_SETUP_DEFER_TASKRUN). This series leverages
IORING_SETUP_SINGLE_ISSUER's guarantee that only one task submits SQEs
to skip taking the uring_lock mutex for the issue and task work paths.
First, we need to disable this optimization for IORING_SETUP_SQPOLL by
clearing the IORING_SETUP_SINGLE_ISSUER flag. For IORING_SETUP_SQPOLL,
the SQ thread is the one taking the uring_lock mutex in the issue path.
Since concurrent io_uring_register() syscalls are allowed on the thread
that created/enabled the io_uring, some additional synchronization
method would be required to synchronize the two threads. This is
possible in principle by having io_uring_register() schedule a task work
item to suspend the SQ thread, but seems complex for a niche use case.
Then we factor out helpers for interacting with uring_lock to centralize
the logic.
Finally, we implement the optimization for IORING_SETUP_SINGLE_ISSUER.
If the io_ring_ctx is set up with IORING_SETUP_SINGLE_ISSUER, skip the
uring_lock mutex_lock() and mutex_unlock() on the submitter_task. On
other tasks acquiring the ctx uring lock, use a task work item to
suspend the submitter_task for the critical section.
If the io_ring_ctx is IORING_SETUP_R_DISABLED (possible during
io_uring_setup(), io_uring_register(), or io_uring exit), submitter_task
may be set concurrently, so acquire the uring_lock before checking it.
If submitter_task isn't set yet, the uring_lock suffices to provide
mutual exclusion. If task work can't be queued because submitter_task
has exited, also use the uring_lock for mutual exclusion.
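The cases described above can be summarized as a small decision function.
This is only a userspace sketch of the rules as stated in this cover
letter, not the code from the series: every model_* name is hypothetical,
and the detail that the uring_lock is taken before submitter_task is
checked on an IORING_SETUP_R_DISABLED ring is not modeled.

```c
#include <stdbool.h>

/* Hypothetical outcome of choosing how to get mutual exclusion on a ctx. */
enum model_lock_path {
	MODEL_PATH_MUTEX,             /* take the uring_lock mutex */
	MODEL_PATH_SKIP,              /* caller is submitter_task: no mutex */
	MODEL_PATH_SUSPEND_SUBMITTER  /* other task: suspend submitter via task work */
};

/* Hypothetical inputs; the kernel derives these from ctx and current. */
struct model_lock_inputs {
	bool single_issuer;     /* IORING_SETUP_SINGLE_ISSUER is set */
	bool submitter_set;     /* submitter_task has been assigned */
	bool is_submitter;      /* caller is submitter_task */
	bool submitter_exited;  /* task work can no longer be queued */
};

static enum model_lock_path
model_choose_path(const struct model_lock_inputs *in)
{
	/* Without the flag, behaviour is unchanged: always use the mutex. */
	if (!in->single_issuer)
		return MODEL_PATH_MUTEX;
	/* No submitter yet (ring still disabled): the mutex suffices. */
	if (!in->submitter_set)
		return MODEL_PATH_MUTEX;
	/* The single issuer itself skips mutex_lock()/mutex_unlock(). */
	if (in->is_submitter)
		return MODEL_PATH_SKIP;
	/* Submitter has exited, so task work can't be queued: use the mutex. */
	if (in->submitter_exited)
		return MODEL_PATH_MUTEX;
	/* Any other task suspends the submitter for the critical section. */
	return MODEL_PATH_SUSPEND_SUBMITTER;
}
```

For example, a ring created without IORING_SETUP_SINGLE_ISSUER always maps
to MODEL_PATH_MUTEX in this model, so existing users see no change.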
v6:
- Release submitter_task reference last in io_ring_ctx_free() (syzbot)
- Use the uring_lock to provide mutual exclusion if task_work_add()
fails because submitter_task has exited
- Add Reviewed-by tag
v5:
- Ensure submitter_task is initialized in io_uring_create() before
calling io_ring_ctx_wait_and_kill() (kernel test robot)
- Correct Fixes tag (Joanne)
- Add Reviewed-by tag
v4:
- Handle IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_R_DISABLED
correctly (syzbot)
- Remove separate set of helpers for io_uring_register()
- Add preliminary fix to prevent races between accessing ctx->flags and
submitter_task
v3:
- Ensure mutual exclusion on threads other than submitter_task via a
task work item to suspend submitter_task
- Drop patches already merged
v2:
- Don't enable these optimizations for IORING_SETUP_SQPOLL, as we still
need to synchronize SQ thread submission with io_uring_register()
Caleb Sander Mateos (6):
io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime
io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
io_uring: factor out uring_lock helpers
io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
include/linux/io_uring_types.h | 12 +-
io_uring/cancel.c | 40 +++---
io_uring/cancel.h | 5 +-
io_uring/eventfd.c | 5 +-
io_uring/fdinfo.c | 8 +-
io_uring/filetable.c | 8 +-
io_uring/futex.c | 14 +-
io_uring/io_uring.c | 232 ++++++++++++++++++++-------------
io_uring/io_uring.h | 187 +++++++++++++++++++++++---
io_uring/kbuf.c | 32 +++--
io_uring/memmap.h | 2 +-
io_uring/msg_ring.c | 33 +++--
io_uring/notif.c | 5 +-
io_uring/notif.h | 3 +-
io_uring/openclose.c | 14 +-
io_uring/poll.c | 21 +--
io_uring/register.c | 81 ++++++------
io_uring/rsrc.c | 51 +++++---
io_uring/rsrc.h | 6 +-
io_uring/rw.c | 2 +-
io_uring/splice.c | 5 +-
io_uring/sqpoll.c | 5 +-
io_uring/tctx.c | 27 ++--
io_uring/tctx.h | 5 +-
io_uring/uring_cmd.c | 13 +-
io_uring/waitid.c | 13 +-
io_uring/zcrx.c | 2 +-
27 files changed, 555 insertions(+), 276 deletions(-)
--
2.45.2
* [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
@ 2025-12-18 2:44 ` Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
` (5 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel; +Cc: Joanne Koong, Caleb Sander Mateos
io_uring_enter() and io_msg_ring() read ctx->flags and
ctx->submitter_task without holding the ctx's uring_lock. This means
they may race with the assignment to ctx->submitter_task and the
clearing of IORING_SETUP_R_DISABLED from ctx->flags in
io_register_enable_rings(). Ensure the correct ordering of the
ctx->flags and ctx->submitter_task memory accesses by storing to
ctx->flags using release ordering and loading it using acquire ordering.
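The pairing described above can be illustrated in userspace with C11
atomics. This is a minimal sketch, assuming that memory_order_release and
memory_order_acquire are acceptable stand-ins for smp_store_release() and
smp_load_acquire(); struct model_ctx, MODEL_R_DISABLED and both functions
are hypothetical names, not kernel code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Stand-in for IORING_SETUP_R_DISABLED; the value is illustrative only. */
#define MODEL_R_DISABLED 0x40u

struct model_ctx {
	_Atomic unsigned int flags;
	const void *submitter_task;  /* plain field, published via flags */
};

/* Writer side, modelling io_register_enable_rings(). */
static void model_enable_rings(struct model_ctx *ctx, const void *task)
{
	unsigned int flags =
		atomic_load_explicit(&ctx->flags, memory_order_relaxed);

	/* Plain store; the release store below orders it before the flag. */
	ctx->submitter_task = task;
	atomic_store_explicit(&ctx->flags, flags & ~MODEL_R_DISABLED,
			      memory_order_release);
}

/* Reader side, modelling the check in io_uring_enter(). */
static const void *model_get_submitter(struct model_ctx *ctx)
{
	unsigned int flags =
		atomic_load_explicit(&ctx->flags, memory_order_acquire);

	if (flags & MODEL_R_DISABLED)
		return NULL;  /* ring still disabled; -EBADFD in the kernel */
	/* The acquire load guarantees the writer's task store is visible. */
	return ctx->submitter_task;
}
```

A reader that observes the flag cleared is therefore guaranteed to see the
submitter_task value stored before the release, which is the ordering this
patch relies on.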
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 4add705e4eeb ("io_uring: remove io_register_submitter")
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
---
io_uring/io_uring.c | 2 +-
io_uring/msg_ring.c | 4 ++--
io_uring/register.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6cb24cdf8e68..761b9612c5b6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3249,11 +3249,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
goto out;
}
ctx = file->private_data;
ret = -EBADFD;
- if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED))
+ if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED))
goto out;
/*
* For SQ polling, the thread will do all submissions and completions.
* Just return the requested submit count, and wake the thread if
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 7063ea7964e7..c48588e06bfb 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -123,11 +123,11 @@ static int __io_msg_ring_data(struct io_ring_ctx *target_ctx,
if (msg->src_fd || msg->flags & ~IORING_MSG_RING_FLAGS_PASS)
return -EINVAL;
if (!(msg->flags & IORING_MSG_RING_FLAGS_PASS) && msg->dst_fd)
return -EINVAL;
- if (target_ctx->flags & IORING_SETUP_R_DISABLED)
+ if (smp_load_acquire(&target_ctx->flags) & IORING_SETUP_R_DISABLED)
return -EBADFD;
if (io_msg_need_remote(target_ctx))
return io_msg_data_remote(target_ctx, msg);
@@ -243,11 +243,11 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
if (msg->len)
return -EINVAL;
if (target_ctx == ctx)
return -EINVAL;
- if (target_ctx->flags & IORING_SETUP_R_DISABLED)
+ if (smp_load_acquire(&target_ctx->flags) & IORING_SETUP_R_DISABLED)
return -EBADFD;
if (!msg->src_file) {
int ret = io_msg_grab_file(req, issue_flags);
if (unlikely(ret))
return ret;
diff --git a/io_uring/register.c b/io_uring/register.c
index 62d39b3ff317..9e473c244041 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -191,11 +191,11 @@ static int io_register_enable_rings(struct io_ring_ctx *ctx)
}
if (ctx->restrictions.registered)
ctx->restricted = 1;
- ctx->flags &= ~IORING_SETUP_R_DISABLED;
+ smp_store_release(&ctx->flags, ctx->flags & ~IORING_SETUP_R_DISABLED);
if (ctx->sq_data && wq_has_sleeper(&ctx->sq_data->wait))
wake_up(&ctx->sq_data->wait);
return 0;
}
--
2.45.2
* [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED Caleb Sander Mateos
@ 2025-12-18 2:44 ` Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime Caleb Sander Mateos
` (4 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel; +Cc: Joanne Koong, Caleb Sander Mateos
IORING_SETUP_SINGLE_ISSUER doesn't currently enable any optimizations,
but it will soon be used to avoid taking io_ring_ctx's uring_lock when
submitting from the single issuer task. If the IORING_SETUP_SQPOLL flag
is set, the SQ thread is the sole task issuing SQEs. However, other
tasks may make io_uring_register() syscalls, which must be synchronized
with SQE submission. So it wouldn't be safe to skip the uring_lock
around the SQ thread's submission even if IORING_SETUP_SINGLE_ISSUER is
set. Therefore, clear IORING_SETUP_SINGLE_ISSUER from the io_ring_ctx
flags if IORING_SETUP_SQPOLL is set.
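The effect on the setup flags can be shown with a standalone sketch. The
MODEL_* constants below are hypothetical stand-ins for the uapi flags
(their numeric values are not significant here), and
model_sanitise_flags() only mirrors the one check this patch adds.

```c
/* Hypothetical stand-ins for the uapi setup flags. */
#define MODEL_SETUP_SQPOLL        (1u << 1)
#define MODEL_SETUP_SINGLE_ISSUER (1u << 12)

/*
 * With SQPOLL, other tasks may still call io_uring_register() while the
 * SQ thread submits, so the single issuer optimization is turned off by
 * clearing the flag. All other flags pass through untouched.
 */
static unsigned int model_sanitise_flags(unsigned int flags)
{
	if (flags & MODEL_SETUP_SQPOLL)
		flags &= ~MODEL_SETUP_SINGLE_ISSUER;
	return flags;
}
```

Passing both flags returns only MODEL_SETUP_SQPOLL, while
MODEL_SETUP_SINGLE_ISSUER on its own is returned unchanged.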
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
io_uring/io_uring.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 761b9612c5b6..44ff5756b328 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3478,10 +3478,19 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
*/
if ((flags & (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED)) ==
(IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED))
return -EINVAL;
+ /*
+ * If IORING_SETUP_SQPOLL is set, only the SQ thread issues SQEs,
+ * but other threads may call io_uring_register() concurrently.
+ * We still need ctx uring lock to synchronize these io_ring_ctx
+ * accesses, so disable the single issuer optimizations.
+ */
+ if (flags & IORING_SETUP_SQPOLL)
+ p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
+
return 0;
}
static int io_uring_fill_params(struct io_uring_params *p)
{
--
2.45.2
* [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
@ 2025-12-18 2:44 ` Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued() Caleb Sander Mateos
` (3 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel
Cc: Joanne Koong, Caleb Sander Mateos, kernel test robot, syzbot
If io_uring_create() fails after allocating the struct io_ring_ctx, it
may call io_ring_ctx_wait_and_kill() before submitter_task has been
assigned. This is currently harmless, as the submit and register paths
that check submitter_task aren't reachable until the io_ring_ctx has
been successfully created. However, a subsequent commit will expect
submitter_task to be set for every IORING_SETUP_SINGLE_ISSUER &&
!IORING_SETUP_R_DISABLED ctx. So assign ctx->submitter_task immediately
after allocating the ctx in io_uring_create().
Similarly, the reference on submitter_task is currently released early
in io_ring_ctx_free(). But it will soon be needed to acquire the uring
lock during the later call to io_req_caches_free(). So release the
submitter_task reference as the last thing before freeing the ctx.
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202512101405.a7a2bdb2-lkp@intel.com
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
io_uring/io_uring.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 44ff5756b328..22086ac84278 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2852,12 +2852,10 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
io_destroy_buffers(ctx);
io_free_region(ctx->user, &ctx->param_region);
mutex_unlock(&ctx->uring_lock);
if (ctx->sq_creds)
put_cred(ctx->sq_creds);
- if (ctx->submitter_task)
- put_task_struct(ctx->submitter_task);
WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
if (ctx->mm_account) {
mmdrop(ctx->mm_account);
@@ -2877,10 +2875,13 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
if (ctx->hash_map)
io_wq_put_hash(ctx->hash_map);
io_napi_free(ctx);
kvfree(ctx->cancel_table.hbs);
xa_destroy(&ctx->io_bl_xa);
+ /* Release submitter_task last, as any io_ring_ctx_lock() may use it */
+ if (ctx->submitter_task)
+ put_task_struct(ctx->submitter_task);
kfree(ctx);
}
static __cold void io_activate_pollwq_cb(struct callback_head *cb)
{
@@ -3594,10 +3595,20 @@ static __cold int io_uring_create(struct io_ctx_config *config)
ctx = io_ring_ctx_alloc(p);
if (!ctx)
return -ENOMEM;
+ /* Assign submitter_task first, as any io_ring_ctx_lock() may use it */
+ if (ctx->flags & IORING_SETUP_SINGLE_ISSUER
+ && !(ctx->flags & IORING_SETUP_R_DISABLED)) {
+ /*
+ * Unlike io_register_enable_rings(), don't need WRITE_ONCE()
+ * since ctx isn't yet accessible from other tasks
+ */
+ ctx->submitter_task = get_task_struct(current);
+ }
+
ctx->clockid = CLOCK_MONOTONIC;
ctx->clock_offset = 0;
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
static_branch_inc(&io_key_has_sqarray);
@@ -3662,19 +3673,10 @@ static __cold int io_uring_create(struct io_ctx_config *config)
if (copy_to_user(config->uptr, p, sizeof(*p))) {
ret = -EFAULT;
goto err;
}
- if (ctx->flags & IORING_SETUP_SINGLE_ISSUER
- && !(ctx->flags & IORING_SETUP_R_DISABLED)) {
- /*
- * Unlike io_register_enable_rings(), don't need WRITE_ONCE()
- * since ctx isn't yet accessible from other tasks
- */
- ctx->submitter_task = get_task_struct(current);
- }
-
file = io_uring_get_file(ctx);
if (IS_ERR(file)) {
ret = PTR_ERR(file);
goto err;
}
--
2.45.2
* [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
` (2 preceding siblings ...)
2025-12-18 2:44 ` [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime Caleb Sander Mateos
@ 2025-12-18 2:44 ` Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 5/6] io_uring: factor out uring_lock helpers Caleb Sander Mateos
` (2 subsequent siblings)
6 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel; +Cc: Joanne Koong, Caleb Sander Mateos
Use the io_ring_submit_lock() helper in io_iopoll_req_issued() instead
of open-coding the logic. io_ring_submit_unlock() can't be used for the
unlock, though, due to the extra logic before releasing the mutex.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
---
io_uring/io_uring.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 22086ac84278..ab0af4a38714 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1670,15 +1670,13 @@ void io_req_task_complete(struct io_tw_req tw_req, io_tw_token_t tw)
* accessing the kiocb cookie.
*/
static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_ring_ctx *ctx = req->ctx;
- const bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;
/* workqueue context doesn't hold uring_lock, grab it now */
- if (unlikely(needs_lock))
- mutex_lock(&ctx->uring_lock);
+ io_ring_submit_lock(ctx, issue_flags);
/*
* Track whether we have multiple files in our lists. This will impact
* how we do polling eventually, not spinning if we're on potentially
* different devices.
@@ -1701,11 +1699,11 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
if (READ_ONCE(req->iopoll_completed))
wq_list_add_head(&req->comp_list, &ctx->iopoll_list);
else
wq_list_add_tail(&req->comp_list, &ctx->iopoll_list);
- if (unlikely(needs_lock)) {
+ if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
/*
* If IORING_SETUP_SQPOLL is enabled, sqes are either handle
* in sq thread task context or in io worker task context. If
* current task context is sq thread, we don't need to check
* whether should wake up sq thread.
--
2.45.2
* [PATCH v6 5/6] io_uring: factor out uring_lock helpers
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
` (3 preceding siblings ...)
2025-12-18 2:44 ` [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued() Caleb Sander Mateos
@ 2025-12-18 2:44 ` Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-18 8:01 ` [syzbot ci] " syzbot ci
6 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel; +Cc: Joanne Koong, Caleb Sander Mateos
A subsequent commit will skip acquiring the io_ring_ctx uring_lock in
io_uring_enter() and io_handle_tw_list() for IORING_SETUP_SINGLE_ISSUER.
Prepare for this change by factoring out the uring_lock accesses under
these functions into helpers. Aside from the helpers, the only remaining
access of uring_lock is its mutex_init() call. Define a struct
io_ring_ctx_lock_state to pass state from io_ring_ctx_lock() to
io_ring_ctx_unlock(). It's currently empty but a subsequent commit will
add fields.
Helpers:
- io_ring_ctx_lock() for mutex_lock()
- io_ring_ctx_lock_nested() for mutex_lock_nested()
- io_ring_ctx_trylock() for mutex_trylock()
- io_ring_ctx_unlock() for mutex_unlock()
- io_ring_ctx_lock_held() for lockdep_is_held()
- io_ring_ctx_assert_locked() for lockdep_assert_held()
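As a rough userspace analogue of the lock, trylock and unlock helpers
listed above, the sketch below wraps a pthread mutex the same way. All
model_* names are hypothetical. The lock state struct is empty in this
patch, so a placeholder member is used here because ISO C does not allow
empty structs. That the trylock helper also takes the state argument is an
assumption.

```c
#include <pthread.h>
#include <stdbool.h>

struct model_ctx {
	pthread_mutex_t uring_lock;
};

/* Placeholder member only; a later patch is said to add real fields. */
struct model_lock_state {
	char unused;
};

static void model_ctx_init(struct model_ctx *ctx)
{
	pthread_mutex_init(&ctx->uring_lock, NULL);
}

/* Thin wrapper, standing in for io_ring_ctx_lock(). */
static void model_ctx_lock(struct model_ctx *ctx,
			   struct model_lock_state *state)
{
	(void)state;  /* nothing to record yet */
	pthread_mutex_lock(&ctx->uring_lock);
}

/* Standing in for io_ring_ctx_trylock(); returns true on success. */
static bool model_ctx_trylock(struct model_ctx *ctx,
			      struct model_lock_state *state)
{
	(void)state;
	return pthread_mutex_trylock(&ctx->uring_lock) == 0;
}

/* Standing in for io_ring_ctx_unlock(). */
static void model_ctx_unlock(struct model_ctx *ctx,
			     struct model_lock_state *state)
{
	(void)state;
	pthread_mutex_unlock(&ctx->uring_lock);
}
```

Routing every caller through wrappers like these means the locking
strategy can later change in one place, with the state struct carrying
whatever the lock side needs to hand to the unlock side.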
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
include/linux/io_uring_types.h | 12 +--
io_uring/cancel.c | 40 ++++----
io_uring/cancel.h | 5 +-
io_uring/eventfd.c | 5 +-
io_uring/fdinfo.c | 8 +-
io_uring/filetable.c | 8 +-
io_uring/futex.c | 14 +--
io_uring/io_uring.c | 181 +++++++++++++++++++--------------
io_uring/io_uring.h | 75 +++++++++++---
io_uring/kbuf.c | 32 +++---
io_uring/memmap.h | 2 +-
io_uring/msg_ring.c | 29 ++++--
io_uring/notif.c | 5 +-
io_uring/notif.h | 3 +-
io_uring/openclose.c | 14 +--
io_uring/poll.c | 21 ++--
io_uring/register.c | 79 +++++++-------
io_uring/rsrc.c | 51 ++++++----
io_uring/rsrc.h | 6 +-
io_uring/rw.c | 2 +-
io_uring/splice.c | 5 +-
io_uring/sqpoll.c | 5 +-
io_uring/tctx.c | 27 +++--
io_uring/tctx.h | 5 +-
io_uring/uring_cmd.c | 13 ++-
io_uring/waitid.c | 13 +--
io_uring/zcrx.c | 2 +-
27 files changed, 404 insertions(+), 258 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index e1adb0d20a0a..74d202394b20 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -86,11 +86,11 @@ struct io_mapped_region {
/*
* Return value from io_buffer_list selection, to avoid stashing it in
* struct io_kiocb. For legacy/classic provided buffers, keeping a reference
* across execution contexts are fine. But for ring provided buffers, the
- * list may go away as soon as ->uring_lock is dropped. As the io_kiocb
+ * list may go away as soon as the ctx uring lock is dropped. As the io_kiocb
* persists, it's better to just keep the buffer local for those cases.
*/
struct io_br_sel {
struct io_buffer_list *buf_list;
/*
@@ -231,11 +231,11 @@ struct io_submit_link {
struct io_kiocb *head;
struct io_kiocb *last;
};
struct io_submit_state {
- /* inline/task_work completion list, under ->uring_lock */
+ /* inline/task_work completion list, under ctx uring lock */
struct io_wq_work_node free_list;
/* batch completion logic */
struct io_wq_work_list compl_reqs;
struct io_submit_link link;
@@ -303,16 +303,16 @@ struct io_ring_ctx {
unsigned cached_sq_head;
unsigned sq_entries;
/*
* Fixed resources fast path, should be accessed only under
- * uring_lock, and updated through io_uring_register(2)
+ * ctx uring lock, and updated through io_uring_register(2)
*/
atomic_t cancel_seq;
/*
- * ->iopoll_list is protected by the ctx->uring_lock for
+ * ->iopoll_list is protected by the ctx uring lock for
* io_uring instances that don't use IORING_SETUP_SQPOLL.
* For SQPOLL, only the single threaded io_sq_thread() will
* manipulate the list, hence no extra locking is needed there.
*/
bool poll_multi_queue;
@@ -324,11 +324,11 @@ struct io_ring_ctx {
struct io_alloc_cache imu_cache;
struct io_submit_state submit_state;
/*
- * Modifications are protected by ->uring_lock and ->mmap_lock.
+ * Modifications protected by ctx uring lock and ->mmap_lock.
* The buffer list's io mapped region should be stable once
* published.
*/
struct xarray io_bl_xa;
@@ -467,11 +467,11 @@ struct io_ring_ctx {
struct io_mapped_region param_region;
};
/*
* Token indicating function is called in task work context:
- * ctx->uring_lock is held and any completions generated will be flushed.
+ * ctx uring lock is held and any completions generated will be flushed.
* ONLY core io_uring.c should instantiate this struct.
*/
struct io_tw_state {
bool cancel;
};
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index ca12ac10c0ae..68b58c7765ef 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -168,10 +168,11 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
static int __io_async_cancel(struct io_cancel_data *cd,
struct io_uring_task *tctx,
unsigned int issue_flags)
{
bool all = cd->flags & (IORING_ASYNC_CANCEL_ALL|IORING_ASYNC_CANCEL_ANY);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = cd->ctx;
struct io_tctx_node *node;
int ret, nr = 0;
do {
@@ -182,21 +183,21 @@ static int __io_async_cancel(struct io_cancel_data *cd,
return ret;
nr++;
} while (1);
/* slow path, try all io-wq's */
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = -ENOENT;
list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
ret = io_async_cancel_one(node->task->io_uring, cd);
if (ret != -ENOENT) {
if (!all)
break;
nr++;
}
}
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return all ? nr : ret;
}
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
{
@@ -238,11 +239,11 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
static int __io_sync_cancel(struct io_uring_task *tctx,
struct io_cancel_data *cd, int fd)
{
struct io_ring_ctx *ctx = cd->ctx;
- /* fixed must be grabbed every time since we drop the uring_lock */
+ /* fixed must be grabbed every time since we drop the ctx uring lock */
if ((cd->flags & IORING_ASYNC_CANCEL_FD) &&
(cd->flags & IORING_ASYNC_CANCEL_FD_FIXED)) {
struct io_rsrc_node *node;
node = io_rsrc_node_lookup(&ctx->file_table.data, fd);
@@ -254,12 +255,12 @@ static int __io_sync_cancel(struct io_uring_task *tctx,
}
return __io_async_cancel(cd, tctx, 0);
}
-int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
- __must_hold(&ctx->uring_lock)
+int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg,
+ struct io_ring_ctx_lock_state *lock_state)
{
struct io_cancel_data cd = {
.ctx = ctx,
.seq = atomic_inc_return(&ctx->cancel_seq),
};
@@ -267,10 +268,12 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
struct io_uring_sync_cancel_reg sc;
struct file *file = NULL;
DEFINE_WAIT(wait);
int ret, i;
+ io_ring_ctx_assert_locked(ctx);
+
if (copy_from_user(&sc, arg, sizeof(sc)))
return -EFAULT;
if (sc.flags & ~CANCEL_FLAGS)
return -EINVAL;
for (i = 0; i < ARRAY_SIZE(sc.pad); i++)
@@ -317,11 +320,11 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
prepare_to_wait(&ctx->cq_wait, &wait, TASK_INTERRUPTIBLE);
ret = __io_sync_cancel(current->io_uring, &cd, sc.fd);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
if (ret != -EALREADY)
break;
ret = io_run_task_work_sig(ctx);
if (ret < 0)
@@ -329,15 +332,15 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg)
ret = schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS);
if (!ret) {
ret = -ETIME;
break;
}
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
} while (1);
finish_wait(&ctx->cq_wait, &wait);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
if (ret == -ENOENT || ret > 0)
ret = 0;
out:
if (file)
@@ -351,11 +354,11 @@ bool io_cancel_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
{
struct hlist_node *tmp;
struct io_kiocb *req;
bool found = false;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
hlist_for_each_entry_safe(req, tmp, list, hash_node) {
if (!io_match_task_safe(req, tctx, cancel_all))
continue;
hlist_del_init(&req->hash_node);
@@ -368,24 +371,25 @@ bool io_cancel_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
int io_cancel_remove(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
unsigned int issue_flags, struct hlist_head *list,
bool (*cancel)(struct io_kiocb *))
{
+ struct io_ring_ctx_lock_state lock_state;
struct hlist_node *tmp;
struct io_kiocb *req;
int nr = 0;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
hlist_for_each_entry_safe(req, tmp, list, hash_node) {
if (!io_cancel_req_match(req, cd))
continue;
if (cancel(req))
nr++;
if (!(cd->flags & IORING_ASYNC_CANCEL_ALL))
break;
}
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return nr ?: -ENOENT;
}
static bool io_match_linked(struct io_kiocb *head)
{
@@ -477,37 +481,39 @@ __cold bool io_cancel_ctx_cb(struct io_wq_work *work, void *data)
return req->ctx == data;
}
static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_tctx_node *node;
enum io_wq_cancel cret;
bool ret = false;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
list_for_each_entry(node, &ctx->tctx_list, ctx_node) {
struct io_uring_task *tctx = node->task->io_uring;
/*
- * io_wq will stay alive while we hold uring_lock, because it's
- * killed after ctx nodes, which requires to take the lock.
+ * io_wq will stay alive while we hold ctx uring lock, because
+ * it's killed after ctx nodes, which requires to take the lock.
*/
if (!tctx || !tctx->io_wq)
continue;
cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true);
ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
}
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
return ret;
}
__cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
struct io_uring_task *tctx,
bool cancel_all, bool is_sqpoll_thread)
{
struct io_task_cancel cancel = { .tctx = tctx, .all = cancel_all, };
+ struct io_ring_ctx_lock_state lock_state;
enum io_wq_cancel cret;
bool ret = false;
/* set it so io_req_local_work_add() would wake us up */
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
@@ -542,17 +548,17 @@ __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
}
if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
io_allowed_defer_tw_run(ctx))
ret |= io_run_local_work(ctx, INT_MAX, INT_MAX) > 0;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
ret |= io_cancel_defer_files(ctx, tctx, cancel_all);
ret |= io_poll_remove_all(ctx, tctx, cancel_all);
ret |= io_waitid_remove_all(ctx, tctx, cancel_all);
ret |= io_futex_remove_all(ctx, tctx, cancel_all);
ret |= io_uring_try_cancel_uring_cmd(ctx, tctx, cancel_all);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
ret |= io_kill_timeouts(ctx, tctx, cancel_all);
if (tctx)
ret |= io_run_task_work() > 0;
else
ret |= flush_delayed_work(&ctx->fallback_work);
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 6783961ede1b..ce4f6b69218e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -2,10 +2,12 @@
#ifndef IORING_CANCEL_H
#define IORING_CANCEL_H
#include <linux/io_uring_types.h>
+#include "io_uring.h"
+
struct io_cancel_data {
struct io_ring_ctx *ctx;
union {
u64 data;
struct file *file;
@@ -19,11 +21,12 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
int io_try_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd,
unsigned int issue_flags);
-int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg);
+int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg,
+ struct io_ring_ctx_lock_state *lock_state);
bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd);
bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx,
bool cancel_all);
bool io_cancel_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
diff --git a/io_uring/eventfd.c b/io_uring/eventfd.c
index 78f8ab7db104..0c615be71edf 100644
--- a/io_uring/eventfd.c
+++ b/io_uring/eventfd.c
@@ -6,10 +6,11 @@
#include <linux/eventfd.h>
#include <linux/eventpoll.h>
#include <linux/io_uring.h>
#include <linux/io_uring_types.h>
+#include "io_uring.h"
#include "io-wq.h"
#include "eventfd.h"
struct io_ev_fd {
struct eventfd_ctx *cq_ev_fd;
@@ -118,11 +119,11 @@ int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg,
struct io_ev_fd *ev_fd;
__s32 __user *fds = arg;
int fd;
ev_fd = rcu_dereference_protected(ctx->io_ev_fd,
- lockdep_is_held(&ctx->uring_lock));
+ io_ring_ctx_lock_held(ctx));
if (ev_fd)
return -EBUSY;
if (copy_from_user(&fd, fds, sizeof(*fds)))
return -EFAULT;
@@ -154,11 +155,11 @@ int io_eventfd_register(struct io_ring_ctx *ctx, void __user *arg,
int io_eventfd_unregister(struct io_ring_ctx *ctx)
{
struct io_ev_fd *ev_fd;
ev_fd = rcu_dereference_protected(ctx->io_ev_fd,
- lockdep_is_held(&ctx->uring_lock));
+ io_ring_ctx_lock_held(ctx));
if (ev_fd) {
ctx->has_evfd = false;
rcu_assign_pointer(ctx->io_ev_fd, NULL);
io_eventfd_put(ev_fd);
return 0;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index a87d4e26eee8..886c06278a9b 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -9,10 +9,11 @@
#include <linux/io_uring.h>
#include <uapi/linux/io_uring.h>
#include "filetable.h"
+#include "io_uring.h"
#include "sqpoll.h"
#include "fdinfo.h"
#include "cancel.h"
#include "rsrc.h"
#include "opdef.h"
@@ -75,11 +76,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
if (ctx->flags & IORING_SETUP_SQE128)
sq_shift = 1;
/*
* we may get imprecise sqe and cqe info if uring is actively running
- * since we get cached_sq_head and cached_cq_tail without uring_lock
+ * since we get cached_sq_head and cached_cq_tail without ctx uring lock
* and sq_tail and cq_head are changed by userspace. But it's ok since
* we usually use these info when it is stuck.
*/
seq_printf(m, "SqMask:\t0x%x\n", sq_mask);
seq_printf(m, "SqHead:\t%u\n", sq_head);
@@ -249,16 +250,17 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
* anything else to get an extra reference.
*/
__cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
{
struct io_ring_ctx *ctx = file->private_data;
+ struct io_ring_ctx_lock_state lock_state;
/*
* Avoid ABBA deadlock between the seq lock and the io_uring mutex,
* since fdinfo case grabs it in the opposite direction of normal use
* cases.
*/
- if (mutex_trylock(&ctx->uring_lock)) {
+ if (io_ring_ctx_trylock(ctx, &lock_state)) {
__io_uring_show_fdinfo(ctx, m);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
}
diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index 794ef95df293..40ad4a08dc89 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -55,14 +55,15 @@ void io_free_file_tables(struct io_ring_ctx *ctx, struct io_file_table *table)
table->bitmap = NULL;
}
static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
u32 slot_index)
- __must_hold(&ctx->uring_lock)
{
struct io_rsrc_node *node;
+ io_ring_ctx_assert_locked(ctx);
+
if (io_is_uring_fops(file))
return -EBADF;
if (!ctx->file_table.data.nr)
return -ENXIO;
if (slot_index >= ctx->file_table.data.nr)
@@ -105,16 +106,17 @@ int __io_fixed_fd_install(struct io_ring_ctx *ctx, struct file *file,
* fput() is called correspondingly.
*/
int io_fixed_fd_install(struct io_kiocb *req, unsigned int issue_flags,
struct file *file, unsigned int file_slot)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
int ret;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = __io_fixed_fd_install(ctx, file, file_slot);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
if (unlikely(ret < 0))
fput(file);
return ret;
}
diff --git a/io_uring/futex.c b/io_uring/futex.c
index 11bfff5a80df..aeda00981c7a 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -220,22 +220,23 @@ static void io_futex_wake_fn(struct wake_q_head *wake_q, struct futex_q *q)
int io_futexv_wait(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
struct io_futexv_data *ifd = req->async_data;
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
int ret, woken = -1;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = futex_wait_multiple_setup(ifd->futexv, iof->futex_nr, &woken);
/*
* Error case, ret is < 0. Mark the request as failed.
*/
if (unlikely(ret < 0)) {
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
req_set_fail(req);
io_req_set_res(req, ret, 0);
io_req_async_data_free(req);
return IOU_COMPLETE;
}
@@ -265,27 +266,28 @@ int io_futexv_wait(struct io_kiocb *req, unsigned int issue_flags)
iof->futexv_unqueued = 1;
if (woken != -1)
io_req_set_res(req, woken, 0);
}
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return IOU_ISSUE_SKIP_COMPLETE;
}
int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_futex *iof = io_kiocb_to_cmd(req, struct io_futex);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_futex_data *ifd = NULL;
int ret;
if (!iof->futex_mask) {
ret = -EINVAL;
goto done;
}
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ifd = io_cache_alloc(&ctx->futex_cache, GFP_NOWAIT);
if (!ifd) {
ret = -ENOMEM;
goto done_unlock;
}
@@ -299,17 +301,17 @@ int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
ret = futex_wait_setup(iof->uaddr, iof->futex_val, iof->futex_flags,
&ifd->q, NULL, NULL);
if (!ret) {
hlist_add_head(&req->hash_node, &ctx->futex_list);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return IOU_ISSUE_SKIP_COMPLETE;
}
done_unlock:
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
done:
if (ret < 0)
req_set_fail(req);
io_req_set_res(req, ret, 0);
io_req_async_data_free(req);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ab0af4a38714..237663382a5e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -234,20 +234,21 @@ static inline bool io_should_terminate_tw(struct io_ring_ctx *ctx)
static __cold void io_fallback_req_func(struct work_struct *work)
{
struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
fallback_work.work);
struct llist_node *node = llist_del_all(&ctx->fallback_llist);
+ struct io_ring_ctx_lock_state lock_state;
struct io_kiocb *req, *tmp;
struct io_tw_state ts = {};
percpu_ref_get(&ctx->refs);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
ts.cancel = io_should_terminate_tw(ctx);
llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
req->io_task_work.func((struct io_tw_req){req}, ts);
io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
percpu_ref_put(&ctx->refs);
}
static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
{
@@ -514,11 +515,11 @@ unsigned io_linked_nr(struct io_kiocb *req)
static __cold noinline void io_queue_deferred(struct io_ring_ctx *ctx)
{
bool drain_seen = false, first = true;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
__io_req_caches_free(ctx);
while (!list_empty(&ctx->defer_list)) {
struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
struct io_defer_entry, list);
@@ -577,13 +578,15 @@ static void io_cq_unlock_post(struct io_ring_ctx *ctx)
spin_unlock(&ctx->completion_lock);
io_cqring_wake(ctx);
io_commit_cqring_flush(ctx);
}
-static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying)
+static void
+__io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying,
+ struct io_ring_ctx_lock_state *lock_state)
{
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
/* don't abort if we're dying, entries must get freed */
if (!dying && __io_cqring_events(ctx) == ctx->cq_entries)
return;
@@ -620,13 +623,13 @@ static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying)
* to care for a non-real case.
*/
if (need_resched()) {
ctx->cqe_sentinel = ctx->cqe_cached;
io_cq_unlock_post(ctx);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
cond_resched();
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
io_cq_lock(ctx);
}
}
if (list_empty(&ctx->cq_overflow_list)) {
@@ -634,21 +637,24 @@ static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying)
atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
io_cq_unlock_post(ctx);
}
-static void io_cqring_overflow_kill(struct io_ring_ctx *ctx)
+static void io_cqring_overflow_kill(struct io_ring_ctx *ctx,
+ struct io_ring_ctx_lock_state *lock_state)
{
if (ctx->rings)
- __io_cqring_overflow_flush(ctx, true);
+ __io_cqring_overflow_flush(ctx, true, lock_state);
}
static void io_cqring_do_overflow_flush(struct io_ring_ctx *ctx)
{
- mutex_lock(&ctx->uring_lock);
- __io_cqring_overflow_flush(ctx, false);
- mutex_unlock(&ctx->uring_lock);
+ struct io_ring_ctx_lock_state lock_state;
+
+ io_ring_ctx_lock(ctx, &lock_state);
+ __io_cqring_overflow_flush(ctx, false, &lock_state);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
/* must to be called somewhat shortly after putting a request */
static inline void io_put_task(struct io_kiocb *req)
{
@@ -883,15 +889,15 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
return filled;
}
/*
* Must be called from inline task_work so we know a flush will happen later,
- * and obviously with ctx->uring_lock held (tw always has that).
+ * and obviously with ctx uring lock held (tw always has that).
*/
void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags)
{
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
lockdep_assert(ctx->lockless_cq);
if (!io_fill_cqe_aux(ctx, user_data, res, cflags)) {
struct io_cqe cqe = io_init_cqe(user_data, res, cflags);
@@ -916,11 +922,11 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
*/
if (!wq_list_empty(&ctx->submit_state.compl_reqs))
__io_submit_flush_completions(ctx);
lockdep_assert(!io_wq_current_is_worker());
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (!ctx->lockless_cq) {
spin_lock(&ctx->completion_lock);
posted = io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags);
spin_unlock(&ctx->completion_lock);
@@ -940,11 +946,11 @@ bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
{
struct io_ring_ctx *ctx = req->ctx;
bool posted;
lockdep_assert(!io_wq_current_is_worker());
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
cqe[0].user_data = req->cqe.user_data;
if (!ctx->lockless_cq) {
spin_lock(&ctx->completion_lock);
posted = io_fill_cqe_aux32(ctx, cqe);
@@ -969,11 +975,11 @@ static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_IOWQ)))
return;
/*
* Handle special CQ sync cases via task_work. DEFER_TASKRUN requires
- * the submitter task context, IOPOLL protects with uring_lock.
+ * the submitter task context, IOPOLL protects with ctx uring lock.
*/
if (ctx->lockless_cq || (req->flags & REQ_F_REISSUE)) {
defer_complete:
req->io_task_work.func = io_req_task_complete;
io_req_task_work_add(req);
@@ -994,15 +1000,14 @@ static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
*/
req_ref_put(req);
}
void io_req_defer_failed(struct io_kiocb *req, s32 res)
- __must_hold(&ctx->uring_lock)
{
const struct io_cold_def *def = &io_cold_defs[req->opcode];
- lockdep_assert_held(&req->ctx->uring_lock);
+ io_ring_ctx_assert_locked(req->ctx);
req_set_fail(req);
io_req_set_res(req, res, io_put_kbuf(req, res, NULL));
if (def->fail)
def->fail(req);
@@ -1010,20 +1015,21 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res)
}
/*
* A request might get retired back into the request caches even before opcode
* handlers and io_issue_sqe() are done with it, e.g. inline completion path.
- * Because of that, io_alloc_req() should be called only under ->uring_lock
+ * Because of that, io_alloc_req() should be called only under ctx uring lock
* and with extra caution to not get a request that is still worked on.
*/
__cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
{
gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO;
void *reqs[IO_REQ_ALLOC_BATCH];
int ret;
+ io_ring_ctx_assert_locked(ctx);
+
ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
/*
* Bulk alloc is all-or-nothing. If we fail to get a batch,
* retry single alloc to be on the safe side.
@@ -1080,19 +1086,20 @@ static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
nxt = req->link;
req->link = NULL;
return nxt;
}
-static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw)
+static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw,
+ struct io_ring_ctx_lock_state *lock_state)
{
if (!ctx)
return;
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
percpu_ref_put(&ctx->refs);
}
/*
* Run queued task_work, returning the number of entries processed in *count.
@@ -1101,38 +1108,39 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw)
*/
struct llist_node *io_handle_tw_list(struct llist_node *node,
unsigned int *count,
unsigned int max_entries)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = NULL;
struct io_tw_state ts = { };
do {
struct llist_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
io_task_work.node);
if (req->ctx != ctx) {
- ctx_flush_and_put(ctx, ts);
+ ctx_flush_and_put(ctx, ts, &lock_state);
ctx = req->ctx;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
percpu_ref_get(&ctx->refs);
ts.cancel = io_should_terminate_tw(ctx);
}
INDIRECT_CALL_2(req->io_task_work.func,
io_poll_task_func, io_req_rw_complete,
(struct io_tw_req){req}, ts);
node = next;
(*count)++;
if (unlikely(need_resched())) {
- ctx_flush_and_put(ctx, ts);
+ ctx_flush_and_put(ctx, ts, &lock_state);
ctx = NULL;
cond_resched();
}
} while (node && *count < max_entries);
- ctx_flush_and_put(ctx, ts);
+ ctx_flush_and_put(ctx, ts, &lock_state);
return node;
}
static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
{
@@ -1401,16 +1409,17 @@ static inline int io_run_local_work_locked(struct io_ring_ctx *ctx,
max(IO_LOCAL_TW_DEFAULT_MAX, min_events));
}
int io_run_local_work(struct io_ring_ctx *ctx, int min_events, int max_events)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_tw_state ts = {};
int ret;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
ret = __io_run_local_work(ctx, ts, min_events, max_events);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
return ret;
}
static void io_req_task_cancel(struct io_tw_req tw_req, io_tw_token_t tw)
{
@@ -1465,12 +1474,13 @@ static inline void io_req_put_rsrc_nodes(struct io_kiocb *req)
io_put_rsrc_node(req->ctx, req->buf_node);
}
static void io_free_batch_list(struct io_ring_ctx *ctx,
struct io_wq_work_node *node)
- __must_hold(&ctx->uring_lock)
{
+ io_ring_ctx_assert_locked(ctx);
+
do {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
@@ -1506,15 +1516,16 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
io_req_add_to_cache(req, ctx);
} while (node);
}
void __io_submit_flush_completions(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
{
struct io_submit_state *state = &ctx->submit_state;
struct io_wq_work_node *node;
+ io_ring_ctx_assert_locked(ctx);
+
__io_cq_lock(ctx);
__wq_list_for_each(node, &state->compl_reqs) {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
@@ -1555,51 +1566,54 @@ static unsigned io_cqring_events(struct io_ring_ctx *ctx)
* We can't just wait for polled events to come to us, we have to actively
* find and complete them.
*/
__cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
{
+ struct io_ring_ctx_lock_state lock_state;
+
if (!(ctx->flags & IORING_SETUP_IOPOLL))
return;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
while (!wq_list_empty(&ctx->iopoll_list)) {
/* let it sleep and repeat later if can't complete a request */
if (io_do_iopoll(ctx, true) == 0)
break;
/*
* Ensure we allow local-to-the-cpu processing to take place,
* in this case we need to ensure that we reap all events.
* Also let task_work, etc. to progress by releasing the mutex
*/
if (need_resched()) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
cond_resched();
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
}
}
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
io_move_task_work_from_local(ctx);
}
-static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
+static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events,
+ struct io_ring_ctx_lock_state *lock_state)
{
unsigned int nr_events = 0;
unsigned long check_cq;
min_events = min(min_events, ctx->cq_entries);
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (!io_allowed_run_tw(ctx))
return -EEXIST;
check_cq = READ_ONCE(ctx->check_cq);
if (unlikely(check_cq)) {
if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
- __io_cqring_overflow_flush(ctx, false);
+ __io_cqring_overflow_flush(ctx, false, lock_state);
/*
* Similarly do not spin if we have not informed the user of any
* dropped CQE.
*/
if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
@@ -1617,11 +1631,11 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
int ret = 0;
/*
* If a submit got punted to a workqueue, we can have the
* application entering polling for a command before it gets
- * issued. That app will hold the uring_lock for the duration
+ * issued. That app holds the ctx uring lock for the duration
* of the poll right here, so we need to take a breather every
* now and then to ensure that the issue has a chance to add
* the poll to the issued list. Otherwise we can spin here
* forever, while the workqueue is stuck trying to acquire the
* very same mutex.
@@ -1632,13 +1646,13 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_events)
(void) io_run_local_work_locked(ctx, min_events);
if (task_work_pending(current) ||
wq_list_empty(&ctx->iopoll_list)) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
io_run_task_work();
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
}
/* some requests don't go through iopoll_list */
if (tail != ctx->cached_cq_tail ||
wq_list_empty(&ctx->iopoll_list))
break;
@@ -1669,14 +1683,15 @@ void io_req_task_complete(struct io_tw_req tw_req, io_tw_token_t tw)
* find it from a io_do_iopoll() thread before the issuer is done
* accessing the kiocb cookie.
*/
static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
- /* workqueue context doesn't hold uring_lock, grab it now */
- io_ring_submit_lock(ctx, issue_flags);
+ /* workqueue context doesn't hold ctx uring lock, grab it now */
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
/*
* Track whether we have multiple files in our lists. This will impact
* how we do polling eventually, not spinning if we're on potentially
* different devices.
@@ -1710,11 +1725,11 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
*/
if ((ctx->flags & IORING_SETUP_SQPOLL) &&
wq_has_sleeper(&ctx->sq_data->wait))
wake_up(&ctx->sq_data->wait);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
}
io_req_flags_t io_file_get_flags(struct file *file)
{
@@ -1728,16 +1743,17 @@ io_req_flags_t io_file_get_flags(struct file *file)
res |= REQ_F_SUPPORT_NOWAIT;
return res;
}
static __cold void io_drain_req(struct io_kiocb *req)
- __must_hold(&ctx->uring_lock)
{
struct io_ring_ctx *ctx = req->ctx;
bool drain = req->flags & IOSQE_IO_DRAIN;
struct io_defer_entry *de;
+ io_ring_ctx_assert_locked(ctx);
+
de = kmalloc(sizeof(*de), GFP_KERNEL_ACCOUNT);
if (!de) {
io_req_defer_failed(req, -ENOMEM);
return;
}
@@ -1960,23 +1976,24 @@ void io_wq_submit_work(struct io_wq_work *work)
}
inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
unsigned int issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_rsrc_node *node;
struct file *file = NULL;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
node = io_rsrc_node_lookup(&ctx->file_table.data, fd);
if (node) {
node->refs++;
req->file_node = node;
req->flags |= io_slot_flags(node);
file = io_slot_file(node);
}
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return file;
}
struct file *io_file_get_normal(struct io_kiocb *req, int fd)
{
@@ -2004,12 +2021,13 @@ static int io_req_sqe_copy(struct io_kiocb *req, unsigned int issue_flags)
def->sqe_copy(req);
return 0;
}
static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags, int ret)
- __must_hold(&req->ctx->uring_lock)
{
+ io_ring_ctx_assert_locked(req->ctx);
+
if (ret != -EAGAIN || (req->flags & REQ_F_NOWAIT)) {
fail:
io_req_defer_failed(req, ret);
return;
}
@@ -2029,16 +2047,17 @@ static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags, int r
break;
}
}
static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_flags)
- __must_hold(&req->ctx->uring_lock)
{
unsigned int issue_flags = IO_URING_F_NONBLOCK |
IO_URING_F_COMPLETE_DEFER | extra_flags;
int ret;
+ io_ring_ctx_assert_locked(req->ctx);
+
ret = io_issue_sqe(req, issue_flags);
/*
* We async punt it if the file wasn't marked NOWAIT, or if the file
* doesn't support non-blocking read/write attempts
@@ -2046,12 +2065,13 @@ static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_flags)
if (unlikely(ret))
io_queue_async(req, issue_flags, ret);
}
static void io_queue_sqe_fallback(struct io_kiocb *req)
- __must_hold(&req->ctx->uring_lock)
{
+ io_ring_ctx_assert_locked(req->ctx);
+
if (unlikely(req->flags & REQ_F_FAIL)) {
/*
* We don't submit, fail them all, for that replace hardlinks
* with normal links. Extra REQ_F_LINK is tolerated.
*/
@@ -2116,17 +2136,18 @@ static __cold int io_init_fail_req(struct io_kiocb *req, int err)
return err;
}
static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
const struct io_uring_sqe *sqe, unsigned int *left)
- __must_hold(&ctx->uring_lock)
{
const struct io_issue_def *def;
unsigned int sqe_flags;
int personality;
u8 opcode;
+ io_ring_ctx_assert_locked(ctx);
+
req->ctx = ctx;
req->opcode = opcode = READ_ONCE(sqe->opcode);
/* same numerical values with corresponding REQ_F_*, safe to copy */
sqe_flags = READ_ONCE(sqe->flags);
req->flags = (__force io_req_flags_t) sqe_flags;
@@ -2269,15 +2290,16 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
return 0;
}
static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
const struct io_uring_sqe *sqe, unsigned int *left)
- __must_hold(&ctx->uring_lock)
{
struct io_submit_link *link = &ctx->submit_state.link;
int ret;
+ io_ring_ctx_assert_locked(ctx);
+
ret = io_init_req(ctx, req, sqe, left);
if (unlikely(ret))
return io_submit_fail_init(sqe, req, ret);
trace_io_uring_submit_req(req);
@@ -2398,16 +2420,17 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
*sqe = &ctx->sq_sqes[head];
return true;
}
int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
- __must_hold(&ctx->uring_lock)
{
unsigned int entries = io_sqring_entries(ctx);
unsigned int left;
int ret;
+ io_ring_ctx_assert_locked(ctx);
+
entries = min(nr, entries);
if (unlikely(!entries))
return 0;
ret = left = entries;
@@ -2830,28 +2853,33 @@ static __cold void __io_req_caches_free(struct io_ring_ctx *ctx)
}
}
static __cold void io_req_caches_free(struct io_ring_ctx *ctx)
{
- guard(mutex)(&ctx->uring_lock);
+ struct io_ring_ctx_lock_state lock_state;
+
+ io_ring_ctx_lock(ctx, &lock_state);
__io_req_caches_free(ctx);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
{
+ struct io_ring_ctx_lock_state lock_state;
+
io_sq_thread_finish(ctx);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
io_sqe_buffers_unregister(ctx);
io_sqe_files_unregister(ctx);
io_unregister_zcrx_ifqs(ctx);
- io_cqring_overflow_kill(ctx);
+ io_cqring_overflow_kill(ctx, &lock_state);
io_eventfd_unregister(ctx);
io_free_alloc_caches(ctx);
io_destroy_buffers(ctx);
io_free_region(ctx->user, &ctx->param_region);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
if (ctx->sq_creds)
put_cred(ctx->sq_creds);
WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));
@@ -2883,14 +2911,15 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
static __cold void io_activate_pollwq_cb(struct callback_head *cb)
{
struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
poll_wq_task_work);
+ struct io_ring_ctx_lock_state lock_state;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
ctx->poll_activated = true;
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
/*
* Wake ups for some events between start of polling and activation
* might've been lost due to loose synchronisation.
*/
@@ -2980,10 +3009,11 @@ static __cold void io_tctx_exit_cb(struct callback_head *cb)
}
static __cold void io_ring_exit_work(struct work_struct *work)
{
struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx, exit_work);
+ struct io_ring_ctx_lock_state lock_state;
unsigned long timeout = jiffies + HZ * 60 * 5;
unsigned long interval = HZ / 20;
struct io_tctx_exit exit;
struct io_tctx_node *node;
int ret;
@@ -2994,13 +3024,13 @@ static __cold void io_ring_exit_work(struct work_struct *work)
* we're waiting for refs to drop. We need to reap these manually,
* as nobody else will be looking for them.
*/
do {
if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) {
- mutex_lock(&ctx->uring_lock);
- io_cqring_overflow_kill(ctx);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
+ io_cqring_overflow_kill(ctx, &lock_state);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
io_move_task_work_from_local(ctx);
@@ -3041,11 +3071,11 @@ static __cold void io_ring_exit_work(struct work_struct *work)
init_completion(&exit.completion);
init_task_work(&exit.task_work, io_tctx_exit_cb);
exit.ctx = ctx;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
while (!list_empty(&ctx->tctx_list)) {
WARN_ON_ONCE(time_after(jiffies, timeout));
node = list_first_entry(&ctx->tctx_list, struct io_tctx_node,
ctx_node);
@@ -3053,20 +3083,20 @@ static __cold void io_ring_exit_work(struct work_struct *work)
list_rotate_left(&ctx->tctx_list);
ret = task_work_add(node->task, &exit.task_work, TWA_SIGNAL);
if (WARN_ON_ONCE(ret))
continue;
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
/*
* See comment above for
* wait_for_completion_interruptible_timeout() on why this
* wait is marked as interruptible.
*/
wait_for_completion_interruptible(&exit.completion);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
}
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
spin_lock(&ctx->completion_lock);
spin_unlock(&ctx->completion_lock);
/* pairs with RCU read section in io_req_local_work_add() */
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
@@ -3075,18 +3105,19 @@ static __cold void io_ring_exit_work(struct work_struct *work)
io_ring_ctx_free(ctx);
}
static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
{
+ struct io_ring_ctx_lock_state lock_state;
unsigned long index;
struct creds *creds;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
percpu_ref_kill(&ctx->refs);
xa_for_each(&ctx->personalities, index, creds)
io_unregister_personality(ctx, index);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
flush_delayed_work(&ctx->fallback_work);
INIT_WORK(&ctx->exit_work, io_ring_exit_work);
/*
@@ -3217,10 +3248,11 @@ static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
u32, min_complete, u32, flags, const void __user *, argp,
size_t, argsz)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx;
struct file *file;
long ret;
if (unlikely(flags & ~IORING_ENTER_FLAGS))
@@ -3273,14 +3305,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
} else if (to_submit) {
ret = io_uring_add_tctx_node(ctx);
if (unlikely(ret))
goto out;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
ret = io_submit_sqes(ctx, to_submit);
if (ret != to_submit) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
goto out;
}
if (flags & IORING_ENTER_GETEVENTS) {
if (ctx->syscall_iopoll)
goto iopoll_locked;
@@ -3289,11 +3321,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
* it should handle ownership problems if any.
*/
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
(void)io_run_local_work_locked(ctx, min_complete);
}
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
if (flags & IORING_ENTER_GETEVENTS) {
int ret2;
@@ -3302,16 +3334,17 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
* We disallow the app entering submit/complete with
* polling, but we still need to lock the ring to
* prevent racing with polled issue that got punted to
* a workqueue.
*/
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
iopoll_locked:
ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
if (likely(!ret2))
- ret2 = io_iopoll_check(ctx, min_complete);
- mutex_unlock(&ctx->uring_lock);
+ ret2 = io_iopoll_check(ctx, min_complete,
+ &lock_state);
+ io_ring_ctx_unlock(ctx, &lock_state);
} else {
struct ext_arg ext_arg = { .argsz = argsz };
ret2 = io_get_ext_arg(ctx, flags, argp, &ext_arg);
if (likely(!ret2))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index a790c16854d3..57c3eef26a88 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -195,20 +195,64 @@ void io_queue_next(struct io_kiocb *req);
void io_task_refs_refill(struct io_uring_task *tctx);
bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
void io_activate_pollwq(struct io_ring_ctx *ctx);
+struct io_ring_ctx_lock_state {
+};
+
+/* Acquire the ctx uring lock with the given nesting level */
+static inline void io_ring_ctx_lock_nested(struct io_ring_ctx *ctx,
+ unsigned int subclass,
+ struct io_ring_ctx_lock_state *state)
+{
+ mutex_lock_nested(&ctx->uring_lock, subclass);
+}
+
+/* Acquire the ctx uring lock */
+static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx,
+ struct io_ring_ctx_lock_state *state)
+{
+ io_ring_ctx_lock_nested(ctx, 0, state);
+}
+
+/* Attempt to acquire the ctx uring lock without blocking */
+static inline bool io_ring_ctx_trylock(struct io_ring_ctx *ctx,
+ struct io_ring_ctx_lock_state *state)
+{
+ return mutex_trylock(&ctx->uring_lock);
+}
+
+/* Release the ctx uring lock */
+static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx,
+ struct io_ring_ctx_lock_state *state)
+{
+ mutex_unlock(&ctx->uring_lock);
+}
+
+/* Return (if CONFIG_LOCKDEP) whether the ctx uring lock is held */
+static inline bool io_ring_ctx_lock_held(const struct io_ring_ctx *ctx)
+{
+ return lockdep_is_held(&ctx->uring_lock);
+}
+
+/* Assert (if CONFIG_LOCKDEP) that the ctx uring lock is held */
+static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
+{
+ lockdep_assert_held(&ctx->uring_lock);
+}
+
static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
{
#if defined(CONFIG_PROVE_LOCKING)
lockdep_assert(in_task());
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (ctx->flags & IORING_SETUP_IOPOLL) {
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
} else if (!ctx->task_complete) {
lockdep_assert_held(&ctx->completion_lock);
} else if (ctx->submitter_task) {
/*
* ->submitter_task may be NULL and we can still post a CQE,
@@ -373,30 +417,32 @@ static inline void io_put_file(struct io_kiocb *req)
{
if (!(req->flags & REQ_F_FIXED_FILE) && req->file)
fput(req->file);
}
-static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx,
- unsigned issue_flags)
+static inline void
+io_ring_submit_unlock(struct io_ring_ctx *ctx, unsigned issue_flags,
+ struct io_ring_ctx_lock_state *lock_state)
{
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
}
-static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
- unsigned issue_flags)
+static inline void
+io_ring_submit_lock(struct io_ring_ctx *ctx, unsigned issue_flags,
+ struct io_ring_ctx_lock_state *lock_state)
{
/*
- * "Normal" inline submissions always hold the uring_lock, since we
+ * "Normal" inline submissions always hold the ctx uring lock, since we
* grab it from the system call. Same is true for the SQPOLL offload.
* The only exception is when we've detached the request and issue it
* from an async worker thread, grab the lock for that case.
*/
if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
- mutex_lock(&ctx->uring_lock);
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
+ io_ring_ctx_assert_locked(ctx);
}
static inline void io_commit_cqring(struct io_ring_ctx *ctx)
{
/* order cqe stores with ring update */
@@ -504,24 +550,23 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
return task_work_pending(current) || io_local_work_pending(ctx);
}
static inline void io_tw_lock(struct io_ring_ctx *ctx, io_tw_token_t tw)
{
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
}
/*
* Don't complete immediately but use deferred completion infrastructure.
- * Protected by ->uring_lock and can only be used either with
+ * Protected by ctx uring lock and can only be used either with
* IO_URING_F_COMPLETE_DEFER or inside a tw handler holding the mutex.
*/
static inline void io_req_complete_defer(struct io_kiocb *req)
- __must_hold(&req->ctx->uring_lock)
{
struct io_submit_state *state = &req->ctx->submit_state;
- lockdep_assert_held(&req->ctx->uring_lock);
+ io_ring_ctx_assert_locked(req->ctx);
wq_list_add_tail(&req->comp_list, &state->compl_reqs);
}
static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 796d131107dd..0fb9b22171d4 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -72,22 +72,22 @@ bool io_kbuf_commit(struct io_kiocb *req,
}
static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
unsigned int bgid)
{
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
return xa_load(&ctx->io_bl_xa, bgid);
}
static int io_buffer_add_list(struct io_ring_ctx *ctx,
struct io_buffer_list *bl, unsigned int bgid)
{
/*
* Store buffer group ID and finally mark the list as visible.
* The normal lookup doesn't care about the visibility as we're
- * always under the ->uring_lock, but lookups from mmap do.
+ * always under the ctx uring lock, but lookups from mmap do.
*/
bl->bgid = bgid;
guard(mutex)(&ctx->mmap_lock);
return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
}
@@ -101,23 +101,24 @@ void io_kbuf_drop_legacy(struct io_kiocb *req)
req->kbuf = NULL;
}
bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_buffer_list *bl;
struct io_buffer *buf;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
buf = req->kbuf;
bl = io_buffer_get_list(ctx, buf->bgid);
list_add(&buf->list, &bl->buf_list);
bl->nbufs++;
req->flags &= ~REQ_F_BUFFER_SELECTED;
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return true;
}
static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
struct io_buffer_list *bl)
@@ -210,24 +211,25 @@ static struct io_br_sel io_ring_buffer_select(struct io_kiocb *req, size_t *len,
}
struct io_br_sel io_buffer_select(struct io_kiocb *req, size_t *len,
unsigned buf_group, unsigned int issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_br_sel sel = { };
struct io_buffer_list *bl;
- io_ring_submit_lock(req->ctx, issue_flags);
+ io_ring_submit_lock(req->ctx, issue_flags, &lock_state);
bl = io_buffer_get_list(ctx, buf_group);
if (likely(bl)) {
if (bl->flags & IOBL_BUF_RING)
sel = io_ring_buffer_select(req, len, bl, issue_flags);
else
sel.addr = io_provided_buffer_select(req, len, bl);
}
- io_ring_submit_unlock(req->ctx, issue_flags);
+ io_ring_submit_unlock(req->ctx, issue_flags, &lock_state);
return sel;
}
/* cap it at a reasonable 256, will be one page even for 4K */
#define PEEK_MAX_IMPORT 256
@@ -315,14 +317,15 @@ static int io_ring_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
}
int io_buffers_select(struct io_kiocb *req, struct buf_sel_arg *arg,
struct io_br_sel *sel, unsigned int issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
int ret = -ENOENT;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
sel->buf_list = io_buffer_get_list(ctx, arg->buf_group);
if (unlikely(!sel->buf_list))
goto out_unlock;
if (sel->buf_list->flags & IOBL_BUF_RING) {
@@ -342,11 +345,11 @@ int io_buffers_select(struct io_kiocb *req, struct buf_sel_arg *arg,
ret = io_provided_buffers_select(req, &arg->out_len, sel->buf_list, arg->iovs);
}
out_unlock:
if (issue_flags & IO_URING_F_UNLOCKED) {
sel->buf_list = NULL;
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
return ret;
}
int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
@@ -354,11 +357,11 @@ int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
{
struct io_ring_ctx *ctx = req->ctx;
struct io_buffer_list *bl;
int ret;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
bl = io_buffer_get_list(ctx, arg->buf_group);
if (unlikely(!bl))
return -ENOENT;
@@ -410,11 +413,11 @@ static int io_remove_buffers_legacy(struct io_ring_ctx *ctx,
{
unsigned long i = 0;
struct io_buffer *nxt;
/* protects io_buffers_cache */
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
WARN_ON_ONCE(bl->flags & IOBL_BUF_RING);
for (i = 0; i < nbufs && !list_empty(&bl->buf_list); i++) {
nxt = list_first_entry(&bl->buf_list, struct io_buffer, list);
list_del(&nxt->list);
@@ -579,18 +582,19 @@ static int __io_manage_buffers_legacy(struct io_kiocb *req,
}
int io_manage_buffers_legacy(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_provide_buf *p = io_kiocb_to_cmd(req, struct io_provide_buf);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_buffer_list *bl;
int ret;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
bl = io_buffer_get_list(ctx, p->bgid);
ret = __io_manage_buffers_legacy(req, bl);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
if (ret < 0)
req_set_fail(req);
io_req_set_res(req, ret, 0);
return IOU_COMPLETE;
@@ -604,11 +608,11 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
struct io_uring_buf_ring *br;
unsigned long mmap_offset;
unsigned long ring_size;
int ret;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (copy_from_user(&reg, arg, sizeof(reg)))
return -EFAULT;
if (!mem_is_zero(reg.resv, sizeof(reg.resv)))
return -EINVAL;
@@ -680,11 +684,11 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
{
struct io_uring_buf_reg reg;
struct io_buffer_list *bl;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (copy_from_user(&reg, arg, sizeof(reg)))
return -EFAULT;
if (!mem_is_zero(reg.resv, sizeof(reg.resv)) || reg.flags)
return -EINVAL;
diff --git a/io_uring/memmap.h b/io_uring/memmap.h
index a39d9e518905..080285686a05 100644
--- a/io_uring/memmap.h
+++ b/io_uring/memmap.h
@@ -35,11 +35,11 @@ static inline void io_region_publish(struct io_ring_ctx *ctx,
struct io_mapped_region *src_region,
struct io_mapped_region *dst_region)
{
/*
* Once published mmap can find it without holding only the ->mmap_lock
- * and not ->uring_lock.
+ * and not the ctx uring lock.
*/
guard(mutex)(&ctx->mmap_lock);
*dst_region = *src_region;
}
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index c48588e06bfb..47c7cc56782d 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -30,29 +30,31 @@ struct io_msg {
u32 cqe_flags;
};
u32 flags;
};
-static void io_double_unlock_ctx(struct io_ring_ctx *octx)
+static void io_double_unlock_ctx(struct io_ring_ctx *octx,
+ struct io_ring_ctx_lock_state *lock_state)
{
- mutex_unlock(&octx->uring_lock);
+ io_ring_ctx_unlock(octx, lock_state);
}
static int io_lock_external_ctx(struct io_ring_ctx *octx,
- unsigned int issue_flags)
+ unsigned int issue_flags,
+ struct io_ring_ctx_lock_state *lock_state)
{
/*
* To ensure proper ordering between the two ctxs, we can only
* attempt a trylock on the target. If that fails and we already have
* the source ctx lock, punt to io-wq.
*/
if (!(issue_flags & IO_URING_F_UNLOCKED)) {
- if (!mutex_trylock(&octx->uring_lock))
+ if (!io_ring_ctx_trylock(octx, lock_state))
return -EAGAIN;
return 0;
}
- mutex_lock(&octx->uring_lock);
+ io_ring_ctx_lock(octx, lock_state);
return 0;
}
void io_msg_ring_cleanup(struct io_kiocb *req)
{
@@ -116,10 +118,11 @@ static int io_msg_data_remote(struct io_ring_ctx *target_ctx,
}
static int __io_msg_ring_data(struct io_ring_ctx *target_ctx,
struct io_msg *msg, unsigned int issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
u32 flags = 0;
int ret;
if (msg->src_fd || msg->flags & ~IORING_MSG_RING_FLAGS_PASS)
return -EINVAL;
@@ -134,17 +137,18 @@ static int __io_msg_ring_data(struct io_ring_ctx *target_ctx,
if (msg->flags & IORING_MSG_RING_FLAGS_PASS)
flags = msg->cqe_flags;
ret = -EOVERFLOW;
if (target_ctx->flags & IORING_SETUP_IOPOLL) {
- if (unlikely(io_lock_external_ctx(target_ctx, issue_flags)))
+ if (unlikely(io_lock_external_ctx(target_ctx, issue_flags,
+ &lock_state)))
return -EAGAIN;
}
if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
ret = 0;
if (target_ctx->flags & IORING_SETUP_IOPOLL)
- io_double_unlock_ctx(target_ctx);
+ io_double_unlock_ctx(target_ctx, &lock_state);
return ret;
}
static int io_msg_ring_data(struct io_kiocb *req, unsigned int issue_flags)
{
@@ -155,35 +159,38 @@ static int io_msg_ring_data(struct io_kiocb *req, unsigned int issue_flags)
}
static int io_msg_grab_file(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_rsrc_node *node;
int ret = -EBADF;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
node = io_rsrc_node_lookup(&ctx->file_table.data, msg->src_fd);
if (node) {
msg->src_file = io_slot_file(node);
if (msg->src_file)
get_file(msg->src_file);
req->flags |= REQ_F_NEED_CLEANUP;
ret = 0;
}
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return ret;
}
static int io_msg_install_complete(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_ring_ctx *target_ctx = req->file->private_data;
struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
+ struct io_ring_ctx_lock_state lock_state;
struct file *src_file = msg->src_file;
int ret;
- if (unlikely(io_lock_external_ctx(target_ctx, issue_flags)))
+ if (unlikely(io_lock_external_ctx(target_ctx, issue_flags,
+ &lock_state)))
return -EAGAIN;
ret = __io_fixed_fd_install(target_ctx, src_file, msg->dst_fd);
if (ret < 0)
goto out_unlock;
@@ -200,11 +207,11 @@ static int io_msg_install_complete(struct io_kiocb *req, unsigned int issue_flag
* later IORING_OP_MSG_RING delivers the message.
*/
if (!io_post_aux_cqe(target_ctx, msg->user_data, ret, 0))
ret = -EOVERFLOW;
out_unlock:
- io_double_unlock_ctx(target_ctx);
+ io_double_unlock_ctx(target_ctx, &lock_state);
return ret;
}
static void io_msg_tw_fd_complete(struct callback_head *head)
{
diff --git a/io_uring/notif.c b/io_uring/notif.c
index f476775ba44b..8099b87af588 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -15,11 +15,11 @@ static void io_notif_tw_complete(struct io_tw_req tw_req, io_tw_token_t tw)
{
struct io_kiocb *notif = tw_req.req;
struct io_notif_data *nd = io_notif_to_data(notif);
struct io_ring_ctx *ctx = notif->ctx;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
do {
notif = cmd_to_io_kiocb(nd);
if (WARN_ON_ONCE(ctx != notif->ctx))
@@ -109,15 +109,16 @@ static const struct ubuf_info_ops io_ubuf_ops = {
.complete = io_tx_ubuf_complete,
.link_skb = io_link_skb,
};
struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
{
struct io_kiocb *notif;
struct io_notif_data *nd;
+ io_ring_ctx_assert_locked(ctx);
+
if (unlikely(!io_alloc_req(ctx, &notif)))
return NULL;
notif->ctx = ctx;
notif->opcode = IORING_OP_NOP;
notif->flags = 0;
diff --git a/io_uring/notif.h b/io_uring/notif.h
index f3589cfef4a9..c33c9a1179c9 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -31,14 +31,15 @@ static inline struct io_notif_data *io_notif_to_data(struct io_kiocb *notif)
{
return io_kiocb_to_cmd(notif, struct io_notif_data);
}
static inline void io_notif_flush(struct io_kiocb *notif)
- __must_hold(&notif->ctx->uring_lock)
{
struct io_notif_data *nd = io_notif_to_data(notif);
+ io_ring_ctx_assert_locked(notif->ctx);
+
io_tx_ubuf_complete(NULL, &nd->uarg, true);
}
static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
{
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index bfeb91b31bba..432a7a68eec1 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -189,15 +189,16 @@ void io_open_cleanup(struct io_kiocb *req)
}
int __io_close_fixed(struct io_ring_ctx *ctx, unsigned int issue_flags,
unsigned int offset)
{
+ struct io_ring_ctx_lock_state lock_state;
int ret;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = io_fixed_fd_remove(ctx, offset);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return ret;
}
static inline int io_close_fixed(struct io_kiocb *req, unsigned int issue_flags)
@@ -333,18 +334,19 @@ int io_pipe_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
static int io_pipe_fixed(struct io_kiocb *req, struct file **files,
unsigned int issue_flags)
{
struct io_pipe *p = io_kiocb_to_cmd(req, struct io_pipe);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
int ret, fds[2] = { -1, -1 };
int slot = p->file_slot;
if (p->flags & O_CLOEXEC)
return -EINVAL;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = __io_fixed_fd_install(ctx, files[0], slot);
if (ret < 0)
goto err;
fds[0] = ret;
@@ -361,23 +363,23 @@ static int io_pipe_fixed(struct io_kiocb *req, struct file **files,
if (ret < 0)
goto err;
fds[1] = ret;
files[1] = NULL;
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
if (!copy_to_user(p->fds, fds, sizeof(fds)))
return 0;
ret = -EFAULT;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
err:
if (fds[0] != -1)
io_fixed_fd_remove(ctx, fds[0]);
if (fds[1] != -1)
io_fixed_fd_remove(ctx, fds[1]);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return ret;
}
static int io_pipe_fd(struct io_kiocb *req, struct file **files)
{
diff --git a/io_uring/poll.c b/io_uring/poll.c
index aac4b3b881fb..9e82315f977b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -121,11 +121,11 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
static void io_poll_req_insert(struct io_kiocb *req)
{
struct io_hash_table *table = &req->ctx->cancel_table;
u32 index = hash_long(req->cqe.user_data, table->hash_bits);
- lockdep_assert_held(&req->ctx->uring_lock);
+ io_ring_ctx_assert_locked(req->ctx);
hlist_add_head(&req->hash_node, &table->hbs[index].list);
}
static void io_init_poll_iocb(struct io_poll *poll, __poll_t events)
@@ -339,11 +339,11 @@ void io_poll_task_func(struct io_tw_req tw_req, io_tw_token_t tw)
} else if (ret == IOU_POLL_REQUEUE) {
__io_poll_execute(req, 0);
return;
}
io_poll_remove_entries(req);
- /* task_work always has ->uring_lock held */
+ /* task_work always holds ctx uring lock */
hash_del(&req->hash_node);
if (req->opcode == IORING_OP_POLL_ADD) {
if (ret == IOU_POLL_DONE) {
struct io_poll *poll;
@@ -525,15 +525,16 @@ static bool io_poll_can_finish_inline(struct io_kiocb *req,
return pt->owning || io_poll_get_ownership(req);
}
static void io_poll_add_hash(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
io_poll_req_insert(req);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
}
/*
* Returns 0 when it's handed over for polling. The caller owns the requests if
* it returns non-zero, but otherwise should not touch it. Negative values
@@ -728,11 +729,11 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tc
struct hlist_node *tmp;
struct io_kiocb *req;
bool found = false;
int i;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
for (i = 0; i < nr_buckets; i++) {
struct io_hash_bucket *hb = &ctx->cancel_table.hbs[i];
hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
@@ -814,15 +815,16 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
}
int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
unsigned issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
int ret;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = __io_poll_cancel(ctx, cd);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return ret;
}
static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
unsigned int flags)
@@ -905,16 +907,17 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
}
int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_poll_update *poll_update = io_kiocb_to_cmd(req, struct io_poll_update);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_cancel_data cd = { .ctx = ctx, .data = poll_update->old_user_data, };
struct io_kiocb *preq;
int ret2, ret = 0;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
preq = io_poll_find(ctx, true, &cd);
ret2 = io_poll_disarm(preq);
if (ret2) {
ret = ret2;
goto out;
@@ -950,11 +953,11 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
if (preq->cqe.res < 0)
req_set_fail(preq);
preq->io_task_work.func = io_req_task_complete;
io_req_task_work_add(preq);
out:
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
if (ret < 0) {
req_set_fail(req);
return ret;
}
/* complete update request, we're done with it */
diff --git a/io_uring/register.c b/io_uring/register.c
index 9e473c244041..da5030bcae2f 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -197,28 +197,30 @@ static int io_register_enable_rings(struct io_ring_ctx *ctx)
if (ctx->sq_data && wq_has_sleeper(&ctx->sq_data->wait))
wake_up(&ctx->sq_data->wait);
return 0;
}
-static __cold int __io_register_iowq_aff(struct io_ring_ctx *ctx,
- cpumask_var_t new_mask)
+static __cold int
+__io_register_iowq_aff(struct io_ring_ctx *ctx, cpumask_var_t new_mask,
+ struct io_ring_ctx_lock_state *lock_state)
{
int ret;
if (!(ctx->flags & IORING_SETUP_SQPOLL)) {
ret = io_wq_cpu_affinity(current->io_uring, new_mask);
} else {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
ret = io_sqpoll_wq_cpu_affinity(ctx, new_mask);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
}
return ret;
}
-static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx,
- void __user *arg, unsigned len)
+static __cold int
+io_register_iowq_aff(struct io_ring_ctx *ctx, void __user *arg, unsigned len,
+ struct io_ring_ctx_lock_state *lock_state)
{
cpumask_var_t new_mask;
int ret;
if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
@@ -240,30 +242,34 @@ static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx,
if (ret) {
free_cpumask_var(new_mask);
return -EFAULT;
}
- ret = __io_register_iowq_aff(ctx, new_mask);
+ ret = __io_register_iowq_aff(ctx, new_mask, lock_state);
free_cpumask_var(new_mask);
return ret;
}
-static __cold int io_unregister_iowq_aff(struct io_ring_ctx *ctx)
+static __cold int
+io_unregister_iowq_aff(struct io_ring_ctx *ctx,
+ struct io_ring_ctx_lock_state *lock_state)
{
- return __io_register_iowq_aff(ctx, NULL);
+ return __io_register_iowq_aff(ctx, NULL, lock_state);
}
-static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
- void __user *arg)
- __must_hold(&ctx->uring_lock)
+static __cold int
+io_register_iowq_max_workers(struct io_ring_ctx *ctx, void __user *arg,
+ struct io_ring_ctx_lock_state *lock_state)
{
struct io_tctx_node *node;
struct io_uring_task *tctx = NULL;
struct io_sq_data *sqd = NULL;
__u32 new_count[2];
int i, ret;
+ io_ring_ctx_assert_locked(ctx);
+
if (copy_from_user(new_count, arg, sizeof(new_count)))
return -EFAULT;
for (i = 0; i < ARRAY_SIZE(new_count); i++)
if (new_count[i] > INT_MAX)
return -EINVAL;
@@ -272,18 +278,18 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
sqd = ctx->sq_data;
if (sqd) {
struct task_struct *tsk;
/*
- * Observe the correct sqd->lock -> ctx->uring_lock
- * ordering. Fine to drop uring_lock here, we hold
+ * Observe the correct sqd->lock -> ctx uring lock
+ * ordering. Fine to drop ctx uring lock here, we hold
* a ref to the ctx.
*/
refcount_inc(&sqd->refs);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
mutex_lock(&sqd->lock);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
tsk = sqpoll_task_locked(sqd);
if (tsk)
tctx = tsk->io_uring;
}
} else {
@@ -304,14 +310,14 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
} else {
memset(new_count, 0, sizeof(new_count));
}
if (sqd) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
mutex_unlock(&sqd->lock);
io_put_sq_data(sqd);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
}
if (copy_to_user(arg, new_count, sizeof(new_count)))
return -EFAULT;
@@ -331,14 +337,14 @@ static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx,
(void)io_wq_max_workers(tctx->io_wq, new_count);
}
return 0;
err:
if (sqd) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
mutex_unlock(&sqd->lock);
io_put_sq_data(sqd);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
}
return ret;
}
static int io_register_clock(struct io_ring_ctx *ctx,
@@ -394,11 +400,12 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
#define RESIZE_FLAGS (IORING_SETUP_CQSIZE | IORING_SETUP_CLAMP)
#define COPY_FLAGS (IORING_SETUP_NO_SQARRAY | IORING_SETUP_SQE128 | \
IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
-static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
+static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg,
+ struct io_ring_ctx_lock_state *lock_state)
{
struct io_ctx_config config;
struct io_uring_region_desc rd;
struct io_ring_ctx_rings o = { }, n = { }, *to_free = NULL;
unsigned i, tail, old_head;
@@ -468,13 +475,13 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
/*
* If using SQPOLL, park the thread
*/
if (ctx->sq_data) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
io_sq_thread_park(ctx->sq_data);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
}
/*
* We'll do the swap. Grab the ctx->mmap_lock, which will exclude
* any new mmap's on the ring fd. Clear out existing mappings to prevent
@@ -605,13 +612,12 @@ static int io_register_mem_region(struct io_ring_ctx *ctx, void __user *uarg)
io_region_publish(ctx, &region, &ctx->param_region);
return 0;
}
static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
- void __user *arg, unsigned nr_args)
- __releases(ctx->uring_lock)
- __acquires(ctx->uring_lock)
+ void __user *arg, unsigned nr_args,
+ struct io_ring_ctx_lock_state *lock_state)
{
int ret;
/*
* We don't quiesce the refs for register anymore and so it can't be
@@ -718,26 +724,26 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
case IORING_REGISTER_IOWQ_AFF:
ret = -EINVAL;
if (!arg || !nr_args)
break;
- ret = io_register_iowq_aff(ctx, arg, nr_args);
+ ret = io_register_iowq_aff(ctx, arg, nr_args, lock_state);
break;
case IORING_UNREGISTER_IOWQ_AFF:
ret = -EINVAL;
if (arg || nr_args)
break;
- ret = io_unregister_iowq_aff(ctx);
+ ret = io_unregister_iowq_aff(ctx, lock_state);
break;
case IORING_REGISTER_IOWQ_MAX_WORKERS:
ret = -EINVAL;
if (!arg || nr_args != 2)
break;
- ret = io_register_iowq_max_workers(ctx, arg);
+ ret = io_register_iowq_max_workers(ctx, arg, lock_state);
break;
case IORING_REGISTER_RING_FDS:
- ret = io_ringfd_register(ctx, arg, nr_args);
+ ret = io_ringfd_register(ctx, arg, nr_args, lock_state);
break;
case IORING_UNREGISTER_RING_FDS:
ret = io_ringfd_unregister(ctx, arg, nr_args);
break;
case IORING_REGISTER_PBUF_RING:
@@ -754,11 +760,11 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
case IORING_REGISTER_SYNC_CANCEL:
ret = -EINVAL;
if (!arg || nr_args != 1)
break;
- ret = io_sync_cancel(ctx, arg);
+ ret = io_sync_cancel(ctx, arg, lock_state);
break;
case IORING_REGISTER_FILE_ALLOC_RANGE:
ret = -EINVAL;
if (!arg || nr_args)
break;
@@ -790,11 +796,11 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
case IORING_REGISTER_CLONE_BUFFERS:
ret = -EINVAL;
if (!arg || nr_args != 1)
break;
- ret = io_register_clone_buffers(ctx, arg);
+ ret = io_register_clone_buffers(ctx, arg, lock_state);
break;
case IORING_REGISTER_ZCRX_IFQ:
ret = -EINVAL;
if (!arg || nr_args != 1)
break;
@@ -802,11 +808,11 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
break;
case IORING_REGISTER_RESIZE_RINGS:
ret = -EINVAL;
if (!arg || nr_args != 1)
break;
- ret = io_register_resize_rings(ctx, arg);
+ ret = io_register_resize_rings(ctx, arg, lock_state);
break;
case IORING_REGISTER_MEM_REGION:
ret = -EINVAL;
if (!arg || nr_args != 1)
break;
@@ -894,10 +900,11 @@ static int io_uring_register_blind(unsigned int opcode, void __user *arg,
}
SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
void __user *, arg, unsigned int, nr_args)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx;
long ret = -EBADF;
struct file *file;
bool use_registered_ring;
@@ -913,15 +920,15 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
file = io_uring_register_get_file(fd, use_registered_ring);
if (IS_ERR(file))
return PTR_ERR(file);
ctx = file->private_data;
- mutex_lock(&ctx->uring_lock);
- ret = __io_uring_register(ctx, opcode, arg, nr_args);
+ io_ring_ctx_lock(ctx, &lock_state);
+ ret = __io_uring_register(ctx, opcode, arg, nr_args, &lock_state);
trace_io_uring_register(ctx, opcode, ctx->file_table.data.nr,
ctx->buf_table.nr, ret);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
fput(file);
return ret;
}
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 41c89f5c616d..19ccfb1ee612 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -349,11 +349,11 @@ static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
struct io_uring_rsrc_update2 *up,
unsigned nr_args)
{
__u32 tmp;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (check_add_overflow(up->offset, nr_args, &tmp))
return -EOVERFLOW;
switch (type) {
@@ -497,14 +497,16 @@ int io_files_update(struct io_kiocb *req, unsigned int issue_flags)
up2.resv2 = 0;
if (up->offset == IORING_FILE_INDEX_ALLOC) {
ret = io_files_update_with_index_alloc(req, issue_flags);
} else {
- io_ring_submit_lock(ctx, issue_flags);
+ struct io_ring_ctx_lock_state lock_state;
+
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
ret = __io_register_rsrc_update(ctx, IORING_RSRC_FILE,
&up2, up->nr_args);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
}
if (ret < 0)
req_set_fail(req);
io_req_set_res(req, ret, 0);
@@ -940,18 +942,19 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
void (*release)(void *), unsigned int index,
unsigned int issue_flags)
{
struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
struct io_rsrc_data *data = &ctx->buf_table;
+ struct io_ring_ctx_lock_state lock_state;
struct req_iterator rq_iter;
struct io_mapped_ubuf *imu;
struct io_rsrc_node *node;
struct bio_vec bv;
unsigned int nr_bvecs = 0;
int ret = 0;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
if (index >= data->nr) {
ret = -EINVAL;
goto unlock;
}
index = array_index_nospec(index, data->nr);
@@ -993,24 +996,25 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
imu->nr_bvecs = nr_bvecs;
node->buf = imu;
data->nodes[index] = node;
unlock:
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return ret;
}
EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
int io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
unsigned int issue_flags)
{
struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
struct io_rsrc_data *data = &ctx->buf_table;
+ struct io_ring_ctx_lock_state lock_state;
struct io_rsrc_node *node;
int ret = 0;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
if (index >= data->nr) {
ret = -EINVAL;
goto unlock;
}
index = array_index_nospec(index, data->nr);
@@ -1026,11 +1030,11 @@ int io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
}
io_put_rsrc_node(ctx, node);
data->nodes[index] = NULL;
unlock:
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return ret;
}
EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
static int validate_fixed_range(u64 buf_addr, size_t len,
@@ -1118,27 +1122,28 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
}
inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
unsigned issue_flags)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_rsrc_node *node;
if (req->flags & REQ_F_BUF_NODE)
return req->buf_node;
req->flags |= REQ_F_BUF_NODE;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
if (node) {
node->refs++;
req->buf_node = node;
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return node;
}
req->flags &= ~REQ_F_BUF_NODE;
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return NULL;
}
int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
u64 buf_addr, size_t len, int ddir,
@@ -1151,28 +1156,32 @@ int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter,
return -EFAULT;
return io_import_fixed(ddir, iter, node->buf, buf_addr, len);
}
/* Lock two rings at once. The rings must be different! */
-static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
+static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2,
+ struct io_ring_ctx_lock_state *lock_state1,
+ struct io_ring_ctx_lock_state *lock_state2)
{
- if (ctx1 > ctx2)
+ if (ctx1 > ctx2) {
swap(ctx1, ctx2);
- mutex_lock(&ctx1->uring_lock);
- mutex_lock_nested(&ctx2->uring_lock, SINGLE_DEPTH_NESTING);
+ swap(lock_state1, lock_state2);
+ }
+ io_ring_ctx_lock(ctx1, lock_state1);
+ io_ring_ctx_lock_nested(ctx2, SINGLE_DEPTH_NESTING, lock_state2);
}
/* Both rings are locked by the caller. */
static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
struct io_uring_clone_buffers *arg)
{
struct io_rsrc_data data;
int i, ret, off, nr;
unsigned int nbufs;
- lockdep_assert_held(&ctx->uring_lock);
- lockdep_assert_held(&src_ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
+ io_ring_ctx_assert_locked(src_ctx);
/*
* Accounting state is shared between the two rings; that only works if
* both rings are accounted towards the same counters.
*/
@@ -1272,12 +1281,14 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
* is given in the src_fd to the current ring. This is identical to registering
* the buffers with ctx, except faster as mappings already exist.
*
* Since the memory is already accounted once, don't account it again.
*/
-int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
+int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg,
+ struct io_ring_ctx_lock_state *lock_state)
{
+ struct io_ring_ctx_lock_state lock_state2;
struct io_uring_clone_buffers buf;
struct io_ring_ctx *src_ctx;
bool registered_src;
struct file *file;
int ret;
@@ -1296,12 +1307,12 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
if (IS_ERR(file))
return PTR_ERR(file);
src_ctx = file->private_data;
if (src_ctx != ctx) {
- mutex_unlock(&ctx->uring_lock);
- lock_two_rings(ctx, src_ctx);
+ io_ring_ctx_unlock(ctx, lock_state);
+ lock_two_rings(ctx, src_ctx, lock_state, &lock_state2);
if (src_ctx->submitter_task &&
src_ctx->submitter_task != current) {
ret = -EEXIST;
goto out;
@@ -1310,11 +1321,11 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
ret = io_clone_buffers(ctx, src_ctx, &buf);
out:
if (src_ctx != ctx)
- mutex_unlock(&src_ctx->uring_lock);
+ io_ring_ctx_unlock(src_ctx, &lock_state2);
fput(file);
return ret;
}
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index d603f6a47f5e..388a0508ec59 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -2,10 +2,11 @@
#ifndef IOU_RSRC_H
#define IOU_RSRC_H
#include <linux/io_uring_types.h>
#include <linux/lockdep.h>
+#include "io_uring.h"
#define IO_VEC_CACHE_SOFT_CAP 256
enum {
IORING_RSRC_FILE = 0,
@@ -68,11 +69,12 @@ int io_import_reg_vec(int ddir, struct iov_iter *iter,
struct io_kiocb *req, struct iou_vec *vec,
unsigned nr_iovs, unsigned issue_flags);
int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv,
const struct iovec __user *uvec, size_t uvec_segs);
-int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
+int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg,
+ struct io_ring_ctx_lock_state *lock_state);
int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
unsigned int nr_args, u64 __user *tags);
int io_sqe_files_unregister(struct io_ring_ctx *ctx);
int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
@@ -97,11 +99,11 @@ static inline struct io_rsrc_node *io_rsrc_node_lookup(struct io_rsrc_data *data
return NULL;
}
static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
{
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
if (!--node->refs)
io_free_rsrc_node(ctx, node);
}
static inline bool io_reset_rsrc_node(struct io_ring_ctx *ctx,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 331af6bf4234..4688b210cff8 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -462,11 +462,11 @@ int io_read_mshot_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
void io_readv_writev_cleanup(struct io_kiocb *req)
{
struct io_async_rw *rw = req->async_data;
- lockdep_assert_held(&req->ctx->uring_lock);
+ io_ring_ctx_assert_locked(req->ctx);
io_vec_free(&rw->vec);
io_rw_recycle(req, 0);
}
static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
diff --git a/io_uring/splice.c b/io_uring/splice.c
index e81ebbb91925..567695c39091 100644
--- a/io_uring/splice.c
+++ b/io_uring/splice.c
@@ -58,26 +58,27 @@ void io_splice_cleanup(struct io_kiocb *req)
static struct file *io_splice_get_file(struct io_kiocb *req,
unsigned int issue_flags)
{
struct io_splice *sp = io_kiocb_to_cmd(req, struct io_splice);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
struct io_rsrc_node *node;
struct file *file = NULL;
if (!(sp->flags & SPLICE_F_FD_IN_FIXED))
return io_file_get_normal(req, sp->splice_fd_in);
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
node = io_rsrc_node_lookup(&ctx->file_table.data, sp->splice_fd_in);
if (node) {
node->refs++;
sp->rsrc_node = node;
file = io_slot_file(node);
req->flags |= REQ_F_NEED_CLEANUP;
}
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return file;
}
int io_tee(struct io_kiocb *req, unsigned int issue_flags)
{
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index 74c1a130cd87..0b4573b53cf3 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -211,29 +211,30 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, struct io_sq_data *sqd,
/* if we're handling multiple rings, cap submit size for fairness */
if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
if (to_submit || !wq_list_empty(&ctx->iopoll_list)) {
+ struct io_ring_ctx_lock_state lock_state;
const struct cred *creds = NULL;
io_sq_start_worktime(ist);
if (ctx->sq_creds != current_cred())
creds = override_creds(ctx->sq_creds);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
if (!wq_list_empty(&ctx->iopoll_list))
io_do_iopoll(ctx, true);
/*
* Don't submit if refs are dying, good for io_uring_register(),
* but also it is relied upon by io_ring_exit_work()
*/
if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) &&
!(ctx->flags & IORING_SETUP_R_DISABLED))
ret = io_submit_sqes(ctx, to_submit);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait))
wake_up(&ctx->sqo_sq_wait);
if (creds)
revert_creds(creds);
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 5b66755579c0..add6134e934d 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -13,27 +13,28 @@
#include "tctx.h"
static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx,
struct task_struct *task)
{
+ struct io_ring_ctx_lock_state lock_state;
struct io_wq_hash *hash;
struct io_wq_data data;
unsigned int concurrency;
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
hash = ctx->hash_map;
if (!hash) {
hash = kzalloc(sizeof(*hash), GFP_KERNEL);
if (!hash) {
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
return ERR_PTR(-ENOMEM);
}
refcount_set(&hash->refs, 1);
init_waitqueue_head(&hash->wait);
ctx->hash_map = hash;
}
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
data.hash = hash;
data.task = task;
/* Do QD, or 4 * CPUS, whatever is smallest */
@@ -121,10 +122,12 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
if (ret)
return ret;
}
}
if (!xa_load(&tctx->xa, (unsigned long)ctx)) {
+ struct io_ring_ctx_lock_state lock_state;
+
node = kmalloc(sizeof(*node), GFP_KERNEL);
if (!node)
return -ENOMEM;
node->ctx = ctx;
node->task = current;
@@ -134,13 +137,13 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
if (ret) {
kfree(node);
return ret;
}
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, &lock_state);
list_add(&node->ctx_node, &ctx->tctx_list);
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, &lock_state);
}
return 0;
}
int __io_uring_add_tctx_node_from_submit(struct io_ring_ctx *ctx)
@@ -163,10 +166,11 @@ int __io_uring_add_tctx_node_from_submit(struct io_ring_ctx *ctx)
* Remove this io_uring_file -> task mapping.
*/
__cold void io_uring_del_tctx_node(unsigned long index)
{
struct io_uring_task *tctx = current->io_uring;
+ struct io_ring_ctx_lock_state lock_state;
struct io_tctx_node *node;
if (!tctx)
return;
node = xa_erase(&tctx->xa, index);
@@ -174,13 +178,13 @@ __cold void io_uring_del_tctx_node(unsigned long index)
return;
WARN_ON_ONCE(current != node->task);
WARN_ON_ONCE(list_empty(&node->ctx_node));
- mutex_lock(&node->ctx->uring_lock);
+ io_ring_ctx_lock(node->ctx, &lock_state);
list_del(&node->ctx_node);
- mutex_unlock(&node->ctx->uring_lock);
+ io_ring_ctx_unlock(node->ctx, &lock_state);
if (tctx->last == node->ctx)
tctx->last = NULL;
kfree(node);
}
@@ -196,11 +200,11 @@ __cold void io_uring_clean_tctx(struct io_uring_task *tctx)
cond_resched();
}
if (wq) {
/*
* Must be after io_uring_del_tctx_node() (removes nodes under
- * uring_lock) to avoid race with io_uring_try_cancel_iowq().
+ * ctx uring lock) to avoid race with io_uring_try_cancel_iowq()
*/
io_wq_put_and_exit(wq);
tctx->io_wq = NULL;
}
}
@@ -259,23 +263,24 @@ static int io_ring_add_registered_fd(struct io_uring_task *tctx, int fd,
* index. If no index is desired, application may set ->offset == -1U
* and we'll find an available index. Returns number of entries
* successfully processed, or < 0 on error if none were processed.
*/
int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
- unsigned nr_args)
+ unsigned nr_args,
+ struct io_ring_ctx_lock_state *lock_state)
{
struct io_uring_rsrc_update __user *arg = __arg;
struct io_uring_rsrc_update reg;
struct io_uring_task *tctx;
int ret, i;
if (!nr_args || nr_args > IO_RINGFD_REG_MAX)
return -EINVAL;
- mutex_unlock(&ctx->uring_lock);
+ io_ring_ctx_unlock(ctx, lock_state);
ret = __io_uring_add_tctx_node(ctx);
- mutex_lock(&ctx->uring_lock);
+ io_ring_ctx_lock(ctx, lock_state);
if (ret)
return ret;
tctx = current->io_uring;
for (i = 0; i < nr_args; i++) {
diff --git a/io_uring/tctx.h b/io_uring/tctx.h
index 608e96de70a2..f35dbf19bb80 100644
--- a/io_uring/tctx.h
+++ b/io_uring/tctx.h
@@ -1,7 +1,9 @@
// SPDX-License-Identifier: GPL-2.0
+#include "io_uring.h"
+
struct io_tctx_node {
struct list_head ctx_node;
struct task_struct *task;
struct io_ring_ctx *ctx;
};
@@ -13,11 +15,12 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx);
int __io_uring_add_tctx_node_from_submit(struct io_ring_ctx *ctx);
void io_uring_clean_tctx(struct io_uring_task *tctx);
void io_uring_unreg_ringfd(void);
int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
- unsigned nr_args);
+ unsigned nr_args,
+ struct io_ring_ctx_lock_state *lock_state);
int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg,
unsigned nr_args);
/*
* Note that this task has used io_uring. We use it for cancelation purposes.
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 197474911f04..a8a128a3f0a2 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -51,11 +51,11 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
{
struct hlist_node *tmp;
struct io_kiocb *req;
bool ret = false;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
hlist_for_each_entry_safe(req, tmp, &ctx->cancelable_uring_cmd,
hash_node) {
struct io_uring_cmd *cmd = io_kiocb_to_cmd(req,
struct io_uring_cmd);
@@ -76,19 +76,20 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *ctx,
static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd,
unsigned int issue_flags)
{
struct io_kiocb *req = cmd_to_io_kiocb(cmd);
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
if (!(cmd->flags & IORING_URING_CMD_CANCELABLE))
return;
cmd->flags &= ~IORING_URING_CMD_CANCELABLE;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
hlist_del(&req->hash_node);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
}
/*
* Mark this command as concelable, then io_uring_try_cancel_uring_cmd()
* will try to cancel this issued command by sending ->uring_cmd() with
@@ -103,14 +104,16 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
{
struct io_kiocb *req = cmd_to_io_kiocb(cmd);
struct io_ring_ctx *ctx = req->ctx;
if (!(cmd->flags & IORING_URING_CMD_CANCELABLE)) {
+ struct io_ring_ctx_lock_state lock_state;
+
cmd->flags |= IORING_URING_CMD_CANCELABLE;
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
hlist_add_head(&req->hash_node, &ctx->cancelable_uring_cmd);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
}
}
EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable);
void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
diff --git a/io_uring/waitid.c b/io_uring/waitid.c
index 2d4cbd47c67c..a69eb1b30b89 100644
--- a/io_uring/waitid.c
+++ b/io_uring/waitid.c
@@ -130,11 +130,11 @@ static void io_waitid_complete(struct io_kiocb *req, int ret)
struct io_waitid *iw = io_kiocb_to_cmd(req, struct io_waitid);
/* anyone completing better be holding a reference */
WARN_ON_ONCE(!(atomic_read(&iw->refs) & IO_WAITID_REF_MASK));
- lockdep_assert_held(&req->ctx->uring_lock);
+ io_ring_ctx_assert_locked(req->ctx);
hlist_del_init(&req->hash_node);
io_waitid_remove_wq(req);
ret = io_waitid_finish(req, ret);
@@ -145,11 +145,11 @@ static void io_waitid_complete(struct io_kiocb *req, int ret)
static bool __io_waitid_cancel(struct io_kiocb *req)
{
struct io_waitid *iw = io_kiocb_to_cmd(req, struct io_waitid);
- lockdep_assert_held(&req->ctx->uring_lock);
+ io_ring_ctx_assert_locked(req->ctx);
/*
* Mark us canceled regardless of ownership. This will prevent a
* potential retry from a spurious wakeup.
*/
@@ -280,10 +280,11 @@ int io_waitid_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
int io_waitid(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_waitid *iw = io_kiocb_to_cmd(req, struct io_waitid);
struct io_waitid_async *iwa = req->async_data;
+ struct io_ring_ctx_lock_state lock_state;
struct io_ring_ctx *ctx = req->ctx;
int ret;
ret = kernel_waitid_prepare(&iwa->wo, iw->which, iw->upid, &iw->info,
iw->options, NULL);
@@ -301,11 +302,11 @@ int io_waitid(struct io_kiocb *req, unsigned int issue_flags)
* Cancel must hold the ctx lock, so there's no risk of cancelation
* finding us until a) we remain on the list, and b) the lock is
* dropped. We only need to worry about racing with the wakeup
* callback.
*/
- io_ring_submit_lock(ctx, issue_flags);
+ io_ring_submit_lock(ctx, issue_flags, &lock_state);
/*
* iw->head is valid under the ring lock, and as long as the request
* is on the waitid_list where cancelations may find it.
*/
@@ -321,27 +322,27 @@ int io_waitid(struct io_kiocb *req, unsigned int issue_flags)
/*
* Nobody else grabbed a reference, it'll complete when we get
* a waitqueue callback, or if someone cancels it.
*/
if (!io_waitid_drop_issue_ref(req)) {
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return IOU_ISSUE_SKIP_COMPLETE;
}
/*
* Wakeup triggered, racing with us. It was prevented from
* completing because of that, queue up the tw to do that.
*/
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
return IOU_ISSUE_SKIP_COMPLETE;
}
hlist_del_init(&req->hash_node);
io_waitid_remove_wq(req);
ret = io_waitid_finish(req, ret);
- io_ring_submit_unlock(ctx, issue_flags);
+ io_ring_submit_unlock(ctx, issue_flags, &lock_state);
done:
if (ret < 0)
req_set_fail(req);
io_req_set_res(req, ret, 0);
return IOU_COMPLETE;
diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c
index b99cf2c6670a..f2ed49bbad63 100644
--- a/io_uring/zcrx.c
+++ b/io_uring/zcrx.c
@@ -851,11 +851,11 @@ static struct net_iov *__io_zcrx_get_free_niov(struct io_zcrx_area *area)
void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx)
{
struct io_zcrx_ifq *ifq;
- lockdep_assert_held(&ctx->uring_lock);
+ io_ring_ctx_assert_locked(ctx);
while (1) {
scoped_guard(mutex, &ctx->mmap_lock) {
unsigned long id = 0;
--
2.45.2
* [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
` (4 preceding siblings ...)
2025-12-18 2:44 ` [PATCH v6 5/6] io_uring: factor out uring_lock helpers Caleb Sander Mateos
@ 2025-12-18 2:44 ` Caleb Sander Mateos
2025-12-18 8:01 ` [syzbot ci] " syzbot ci
6 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-18 2:44 UTC (permalink / raw)
To: Jens Axboe, io-uring, linux-kernel
Cc: Joanne Koong, Caleb Sander Mateos, syzbot
io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
workloads. Even when only one thread pinned to a single CPU is accessing
the io_ring_ctx, the atomic CASes required to lock and unlock the mutex
are very hot instructions. The mutex's primary purpose is to prevent
concurrent io_uring system calls on the same io_ring_ctx. However, there
is already a flag IORING_SETUP_SINGLE_ISSUER that promises only one
task will make io_uring_enter() and io_uring_register() system calls on
the io_ring_ctx once it's enabled.

So if the io_ring_ctx is set up with IORING_SETUP_SINGLE_ISSUER, skip the
uring_lock mutex_lock() and mutex_unlock() on the submitter_task. On
other tasks acquiring the ctx uring lock, use a task work item to
suspend the submitter_task for the critical section.

If the io_ring_ctx is IORING_SETUP_R_DISABLED (possible during
io_uring_setup(), io_uring_register(), or io_uring exit), submitter_task
may be set concurrently, so acquire the uring_lock before checking it.
If submitter_task isn't set yet, the uring_lock suffices to provide
mutual exclusion. If task work can't be queued because submitter_task
has exited, also use the uring_lock for mutual exclusion.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Tested-by: syzbot@syzkaller.appspotmail.com
---
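For readers who want to see the suspend handshake in isolation, here is a
rough userspace model of it. It is only a sketch under stated assumptions,
not the kernel code: every demo_* name and run_suspend_demo() are
hypothetical, a pthread mutex plus condition variable stands in for
struct completion, and a "queued" completion stands in for task_work_add().
The submitter thread publishes a pointer to its on-stack suspend_end,
signals suspend_start, and parks; the other thread owns the shared state
until it completes suspend_end.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct demo_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void demo_init(struct demo_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void demo_complete(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void demo_wait(struct demo_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Loosely mirrors struct io_ring_suspend_work from the patch. */
struct demo_suspend_work {
	struct demo_completion queued;		/* models task_work_add() */
	struct demo_completion suspend_start;
	struct demo_completion **suspend_end;
};

/* State the submitter normally touches without taking any lock. */
static long demo_counter;

/* Submitter thread: runs the "task work", parks, then resumes. */
static void *demo_submitter(void *arg)
{
	struct demo_suspend_work *work = arg;
	struct demo_completion suspend_end;

	demo_wait(&work->queued);		/* suspend work arrives */
	demo_init(&suspend_end);
	*work->suspend_end = &suspend_end;	/* tell locker what to signal */
	demo_complete(&work->suspend_start);	/* locker may proceed */
	demo_wait(&suspend_end);		/* parked until "unlock" */

	demo_counter *= 2;			/* back to lockless access */
	return NULL;
}

/* Other thread: "lock" by suspending the submitter, mutate, "unlock". */
long run_suspend_demo(void)
{
	struct demo_suspend_work work;
	struct demo_completion *suspend_end = NULL;
	pthread_t submitter;
	int err;

	demo_counter = 0;
	demo_init(&work.queued);
	demo_init(&work.suspend_start);
	work.suspend_end = &suspend_end;

	err = pthread_create(&submitter, NULL, demo_submitter, &work);
	assert(err == 0);

	demo_complete(&work.queued);		/* queue the suspend work */
	demo_wait(&work.suspend_start);		/* submitter is now parked */

	demo_counter = 10;			/* critical section */

	demo_complete(suspend_end);		/* let the submitter resume */

	err = pthread_join(submitter, NULL);
	assert(err == 0);
	return demo_counter;
}
```

In this model the submitter's doubling can only run after the other thread
has stored 10, which is the ordering the suspend/resume pair is meant to
provide. The model leaves out the R_DISABLED and exiting-task fallbacks to
the mutex described above.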
io_uring/io_uring.c | 12 +++++
io_uring/io_uring.h | 118 ++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 127 insertions(+), 3 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 237663382a5e..38390c8c54e0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -363,10 +363,22 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
xa_destroy(&ctx->io_bl_xa);
kfree(ctx);
return NULL;
}
+void io_ring_suspend_work(struct callback_head *cb_head)
+{
+ struct io_ring_suspend_work *suspend_work =
+ container_of(cb_head, struct io_ring_suspend_work, cb_head);
+ DECLARE_COMPLETION_ONSTACK(suspend_end);
+
+ *suspend_work->suspend_end = &suspend_end;
+ complete(&suspend_work->suspend_start);
+
+ wait_for_completion(&suspend_end);
+}
+
static void io_clean_op(struct io_kiocb *req)
{
if (unlikely(req->flags & REQ_F_BUFFER_SELECTED))
io_kbuf_drop_legacy(req);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 57c3eef26a88..c2e39ca55569 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -1,8 +1,9 @@
#ifndef IOU_CORE_H
#define IOU_CORE_H
+#include <linux/completion.h>
#include <linux/errno.h>
#include <linux/lockdep.h>
#include <linux/resume_user_mode.h>
#include <linux/kasan.h>
#include <linux/poll.h>
@@ -195,19 +196,93 @@ void io_queue_next(struct io_kiocb *req);
void io_task_refs_refill(struct io_uring_task *tctx);
bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
void io_activate_pollwq(struct io_ring_ctx *ctx);
+/*
+ * The ctx uring lock protects most of the mutable struct io_ring_ctx state
+ * accessed in the struct io_kiocb issue path. In the I/O path, it is typically
+ * acquired in the io_uring_enter() syscall and in io_handle_tw_list(). For
+ * IORING_SETUP_SQPOLL, it's acquired by io_sq_thread() instead. io_kiocb's
+ * issued with IO_URING_F_UNLOCKED in issue_flags (e.g. by io_wq_submit_work())
+ * acquire and release the ctx uring lock whenever they must touch io_ring_ctx
+ * state. io_uring_register() also acquires the ctx uring lock because most
+ * opcodes mutate io_ring_ctx state accessed in the issue path.
+ *
+ * For !IORING_SETUP_SINGLE_ISSUER io_ring_ctx's, acquiring the ctx uring lock
+ * is done via mutex_(try)lock(&ctx->uring_lock).
+ *
+ * However, for IORING_SETUP_SINGLE_ISSUER, we can avoid the mutex_lock() +
+ * mutex_unlock() overhead on submitter_task because a single thread can't race
+ * with itself. In the uncommon case where the ctx uring lock is needed on
+ * another thread, it must suspend submitter_task by scheduling a task work item
+ * on it. io_ring_ctx_lock() returns once the task work item has started.
+ * io_ring_ctx_unlock() allows the task work item to complete.
+ * If io_ring_ctx_lock() is called while the ctx is IORING_SETUP_R_DISABLED
+ * (e.g. during ctx create or exit), io_ring_ctx_lock() must acquire uring_lock
+ * because submitter_task isn't set yet. submitter_task can be accessed once
+ * uring_lock is held. If submitter_task exists, we do the same thing as in the
+ * non-IORING_SETUP_R_DISABLED case. If submitter_task isn't set, all other
+ * io_ring_ctx_lock() callers will also acquire uring_lock, so it suffices for
+ * mutual exclusion.
+ * Similarly, if io_ring_ctx_lock() is called after submitter_task has exited,
+ * task work can't be queued on it. Acquire uring_lock to exclude other callers.
+ */
+
+struct io_ring_suspend_work {
+ struct callback_head cb_head;
+ struct completion suspend_start;
+ struct completion **suspend_end;
+};
+
+void io_ring_suspend_work(struct callback_head *cb_head);
+
struct io_ring_ctx_lock_state {
+ bool mutex_held;
+ struct completion *suspend_end;
};
/* Acquire the ctx uring lock with the given nesting level */
static inline void io_ring_ctx_lock_nested(struct io_ring_ctx *ctx,
unsigned int subclass,
struct io_ring_ctx_lock_state *state)
{
- mutex_lock_nested(&ctx->uring_lock, subclass);
+ struct io_ring_suspend_work suspend_work;
+
+ if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) {
+ mutex_lock_nested(&ctx->uring_lock, subclass);
+ return;
+ }
+
+ state->mutex_held = false;
+ state->suspend_end = NULL;
+ if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED)) {
+ mutex_lock_nested(&ctx->uring_lock, subclass);
+ if (likely(!ctx->submitter_task)) {
+ state->mutex_held = true;
+ return;
+ }
+
+ /* submitter_task set concurrently, must suspend it */
+ mutex_unlock(&ctx->uring_lock);
+ } else if (likely(current == ctx->submitter_task)) {
+ return;
+ }
+
+ /* Use task work to suspend submitter_task */
+ init_task_work(&suspend_work.cb_head, io_ring_suspend_work);
+ init_completion(&suspend_work.suspend_start);
+ suspend_work.suspend_end = &state->suspend_end;
+ if (unlikely(task_work_add(ctx->submitter_task, &suspend_work.cb_head,
+ TWA_SIGNAL))) {
+ /* submitter_task is exiting, use mutex instead */
+ state->mutex_held = true;
+ mutex_lock_nested(&ctx->uring_lock, subclass);
+ return;
+ }
+
+ wait_for_completion(&suspend_work.suspend_start);
}
/* Acquire the ctx uring lock */
static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx,
struct io_ring_ctx_lock_state *state)
@@ -217,29 +292,66 @@ static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx,
/* Attempt to acquire the ctx uring lock without blocking */
static inline bool io_ring_ctx_trylock(struct io_ring_ctx *ctx,
struct io_ring_ctx_lock_state *state)
{
- return mutex_trylock(&ctx->uring_lock);
+ if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
+ return mutex_trylock(&ctx->uring_lock);
+
+ state->suspend_end = NULL;
+ if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED)) {
+ if (!mutex_trylock(&ctx->uring_lock))
+ return false;
+ if (likely(!ctx->submitter_task)) {
+ state->mutex_held = true;
+ return true;
+ }
+
+ mutex_unlock(&ctx->uring_lock);
+ return false;
+ }
+
+ state->mutex_held = false;
+ return current == ctx->submitter_task;
}
/* Release the ctx uring lock */
static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx,
struct io_ring_ctx_lock_state *state)
{
- mutex_unlock(&ctx->uring_lock);
+ if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) {
+ mutex_unlock(&ctx->uring_lock);
+ return;
+ }
+
+ if (unlikely(state->mutex_held))
+ mutex_unlock(&ctx->uring_lock);
+ if (unlikely(state->suspend_end))
+ complete(state->suspend_end);
}
/* Return (if CONFIG_LOCKDEP) whether the ctx uring lock is held */
static inline bool io_ring_ctx_lock_held(const struct io_ring_ctx *ctx)
{
+ /*
+ * No straightforward way to check that submitter_task is suspended
+ * without access to struct io_ring_ctx_lock_state
+ */
+ if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+ !(ctx->flags & IORING_SETUP_R_DISABLED))
+ return true;
+
return lockdep_is_held(&ctx->uring_lock);
}
/* Assert (if CONFIG_LOCKDEP) that the ctx uring lock is held */
static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
{
+ if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+ !(ctx->flags & IORING_SETUP_R_DISABLED))
+ return;
+
lockdep_assert_held(&ctx->uring_lock);
}
static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
{
--
2.45.2
* [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
` (5 preceding siblings ...)
2025-12-18 2:44 ` [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
@ 2025-12-18 8:01 ` syzbot ci
2025-12-22 20:19 ` Caleb Sander Mateos
6 siblings, 1 reply; 19+ messages in thread
From: syzbot ci @ 2025-12-18 8:01 UTC (permalink / raw)
To: axboe, csander, io-uring, joannelkoong, linux-kernel, oliver.sang,
syzbot
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
https://lore.kernel.org/all/20251218024459.1083572-1-csander@purestorage.com
* [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
* [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
* [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime
* [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
* [PATCH v6 5/6] io_uring: factor out uring_lock helpers
* [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
and found the following issue:
INFO: task hung in io_wq_put_and_exit
Full report is available here:
https://ci.syzbot.org/series/21eac721-670b-4f34-9696-66f9b28233ac
***
INFO: task hung in io_wq_put_and_exit
tree: torvalds
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
base: d358e5254674b70f34c847715ca509e46eb81e6f
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/1710cffe-7d78-4489-9aa1-823b8c2532ed/config
syz repro: https://ci.syzbot.org/findings/74ae8703-9484-4d82-aa78-84cc37dcb1ef/syz_repro
INFO: task syz.1.18:6046 blocked for more than 143 seconds.
Not tainted syzkaller #0
Blocked by coredump.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.1.18 state:D stack:25672 pid:6046 tgid:6045 ppid:5971 task_flags:0x400548 flags:0x00080004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x14bc/0x5000 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6960
schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common kernel/sched/completion.c:132 [inline]
wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
io_wq_put_and_exit+0x316/0x650 io_uring/io-wq.c:1356
io_uring_clean_tctx+0x11f/0x1a0 io_uring/tctx.c:207
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:652
io_uring_files_cancel include/linux/io_uring.h:19 [inline]
do_exit+0x345/0x2310 kernel/exit.c:911
do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
get_signal+0x1285/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0x9a/0x7a0 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
exit_to_user_mode_loop+0x87/0x4f0 kernel/entry/common.c:75
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
do_syscall_64+0x2e3/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f6a8b58f7c9
RSP: 002b:00007f6a8c4a00e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007f6a8b7e5fa8 RCX: 00007f6a8b58f7c9
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f6a8b7e5fac
RBP: 00007f6a8b7e5fa0 R08: 3fffffffffffffff R09: 0000000000000000
R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f6a8b7e6038 R14: 00007ffcac96d220 R15: 00007ffcac96d308
</TASK>
INFO: task iou-wrk-6046:6047 blocked for more than 143 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6046 state:D stack:27760 pid:6047 tgid:6045 ppid:5971 task_flags:0x404050 flags:0x00080002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x14bc/0x5000 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6960
schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common kernel/sched/completion.c:132 [inline]
wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
io_ring_ctx_lock_nested+0x2b3/0x380 io_uring/io_uring.h:283
io_ring_ctx_lock io_uring/io_uring.h:290 [inline]
io_ring_submit_lock io_uring/io_uring.h:554 [inline]
io_files_update+0x677/0x7f0 io_uring/rsrc.c:504
__io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1818
io_issue_sqe+0x1de/0x1190 io_uring/io_uring.c:1841
io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1953
io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:650
io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:704
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
INFO: task syz.0.17:6049 blocked for more than 143 seconds.
Not tainted syzkaller #0
Blocked by coredump.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.0.17 state:D stack:25592 pid:6049 tgid:6048 ppid:5967 task_flags:0x400548 flags:0x00080004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x14bc/0x5000 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6960
schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common kernel/sched/completion.c:132 [inline]
wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
io_wq_put_and_exit+0x316/0x650 io_uring/io-wq.c:1356
io_uring_clean_tctx+0x11f/0x1a0 io_uring/tctx.c:207
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:652
io_uring_files_cancel include/linux/io_uring.h:19 [inline]
do_exit+0x345/0x2310 kernel/exit.c:911
do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
get_signal+0x1285/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0x9a/0x7a0 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
exit_to_user_mode_loop+0x87/0x4f0 kernel/entry/common.c:75
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
do_syscall_64+0x2e3/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa96a98f7c9
RSP: 002b:00007fa96b7430e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007fa96abe5fa8 RCX: 00007fa96a98f7c9
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007fa96abe5fac
RBP: 00007fa96abe5fa0 R08: 3fffffffffffffff R09: 0000000000000000
R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fa96abe6038 R14: 00007ffd9fcc00d0 R15: 00007ffd9fcc01b8
</TASK>
INFO: task iou-wrk-6049:6050 blocked for more than 143 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6049 state:D stack:27760 pid:6050 tgid:6048 ppid:5967 task_flags:0x404050 flags:0x00080002
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x14bc/0x5000 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6960
schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common kernel/sched/completion.c:132 [inline]
wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
io_ring_ctx_lock_nested+0x2b3/0x380 io_uring/io_uring.h:283
io_ring_ctx_lock io_uring/io_uring.h:290 [inline]
io_ring_submit_lock io_uring/io_uring.h:554 [inline]
io_files_update+0x677/0x7f0 io_uring/rsrc.c:504
__io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1818
io_issue_sqe+0x1de/0x1190 io_uring/io_uring.c:1841
io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1953
io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:650
io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:704
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
INFO: task syz.2.19:6052 blocked for more than 144 seconds.
Not tainted syzkaller #0
Blocked by coredump.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz.2.19 state:D stack:26208 pid:6052 tgid:6051 ppid:5972 task_flags:0x400548 flags:0x00080004
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x14bc/0x5000 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6960
schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common kernel/sched/completion.c:132 [inline]
wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
io_wq_put_and_exit+0x316/0x650 io_uring/io-wq.c:1356
io_uring_clean_tctx+0x11f/0x1a0 io_uring/tctx.c:207
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:652
io_uring_files_cancel include/linux/io_uring.h:19 [inline]
do_exit+0x345/0x2310 kernel/exit.c:911
do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
get_signal+0x1285/0x1340 kernel/signal.c:3034
arch_do_signal_or_restart+0x9a/0x7a0 arch/x86/kernel/signal.c:337
__exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
exit_to_user_mode_loop+0x87/0x4f0 kernel/entry/common.c:75
__exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
do_syscall_64+0x2e3/0xf80 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f4b5cb8f7c9
RSP: 002b:00007f4b5d9a80e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00007f4b5cde5fa8 RCX: 00007f4b5cb8f7c9
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f4b5cde5fac
RBP: 00007f4b5cde5fa0 R08: 3fffffffffffffff R09: 0000000000000000
R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f4b5cde6038 R14: 00007ffcdd64aed0 R15: 00007ffcdd64afb8
</TASK>
INFO: task iou-wrk-6052:6053 blocked for more than 144 seconds.
Not tainted syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6052 state:D stack:27760 pid:6053 tgid:6051 ppid:5972 task_flags:0x404050 flags:0x00080006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5256 [inline]
__schedule+0x14bc/0x5000 kernel/sched/core.c:6863
__schedule_loop kernel/sched/core.c:6945 [inline]
schedule+0x165/0x360 kernel/sched/core.c:6960
schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
do_wait_for_common kernel/sched/completion.c:100 [inline]
__wait_for_common kernel/sched/completion.c:121 [inline]
wait_for_common kernel/sched/completion.c:132 [inline]
wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
io_ring_ctx_lock_nested+0x2b3/0x380 io_uring/io_uring.h:283
io_ring_ctx_lock io_uring/io_uring.h:290 [inline]
io_ring_submit_lock io_uring/io_uring.h:554 [inline]
io_files_update+0x677/0x7f0 io_uring/rsrc.c:504
__io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1818
io_issue_sqe+0x1de/0x1190 io_uring/io_uring.c:1841
io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1953
io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:650
io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:704
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
Showing all locks held in the system:
1 lock held by khungtaskd/35:
#0: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#0: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#0: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
5 locks held by kworker/u10:8/1120:
#0: ffff88823c63a918 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:639
#1: ffff88823c624588 (psi_seq){-.-.}-{0:0}, at: psi_task_switch+0x53/0x880 kernel/sched/psi.c:933
#2: ffff88810ac50788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: class_wiphy_constructor include/net/cfg80211.h:6363 [inline]
#2: ffff88810ac50788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: cfg80211_wiphy_work+0xb4/0x450 net/wireless/core.c:424
#3: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#3: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#3: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: ieee80211_sta_active_ibss+0xc3/0x330 net/mac80211/ibss.c:635
#4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
#4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
#4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
#4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: unwind_next_frame+0xa5/0x2390 arch/x86/kernel/unwind_orc.c:479
2 locks held by getty/5656:
#0: ffff8881133040a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
#1: ffffc900035732f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x449/0x1460 drivers/tty/n_tty.c:2211
3 locks held by kworker/0:9/6480:
#0: ffff888100075948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3232 [inline]
#0: ffff888100075948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x9b4/0x1770 kernel/workqueue.c:3340
#1: ffffc9000546fb80 (deferred_process_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3233 [inline]
#1: ffffc9000546fb80 (deferred_process_work){+.+.}-{0:0}, at: process_scheduled_works+0x9ef/0x1770 kernel/workqueue.c:3340
#2: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: switchdev_deferred_process_work+0xe/0x20 net/switchdev/switchdev.c:104
1 lock held by syz-executor/6649:
#0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
#0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
#0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x8ec/0x1c90 net/core/rtnetlink.c:4071
2 locks held by syz-executor/6651:
#0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
#0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
#0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x8ec/0x1c90 net/core/rtnetlink.c:4071
#1: ffff88823c63a918 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:639
4 locks held by syz-executor/6653:
=============================================
NMI backtrace for cpu 0
CPU: 0 UID: 0 PID: 35 Comm: khungtaskd Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
nmi_cpu_backtrace+0x39e/0x3d0 lib/nmi_backtrace.c:113
nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
__sys_info lib/sys_info.c:157 [inline]
sys_info+0x135/0x170 lib/sys_info.c:165
check_hung_uninterruptible_tasks kernel/hung_task.c:346 [inline]
watchdog+0xf95/0xfe0 kernel/hung_task.c:515
kthread+0x711/0x8a0 kernel/kthread.c:463
ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
</TASK>
Sending NMI from CPU 0 to CPUs 1:
NMI backtrace for cpu 1
CPU: 1 UID: 0 PID: 6653 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:io_serial_out+0x7c/0xc0 drivers/tty/serial/8250/8250_port.c:407
Code: 3f a6 fc 44 89 f9 d3 e5 49 83 c6 40 4c 89 f0 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 f7 e8 ec 91 0c fd 41 03 2e 89 d8 89 ea ee <5b> 41 5c 41 5e 41 5f 5d c3 cc cc cc cc cc 44 89 f9 80 e1 07 38 c1
RSP: 0018:ffffc90008156590 EFLAGS: 00000002
RAX: 000000000000005b RBX: 000000000000005b RCX: 0000000000000000
RDX: 00000000000003f8 RSI: 0000000000000000 RDI: 0000000000000020
RBP: 00000000000003f8 R08: ffff888102f08237 R09: 1ffff110205e1046
R10: dffffc0000000000 R11: ffffffff851b9060 R12: dffffc0000000000
R13: ffffffff998dd9e1 R14: ffffffff99bf2420 R15: 0000000000000000
FS: 0000555595186500(0000) GS:ffff8882a9e37000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055599f9c9018 CR3: 0000000112ed8000 CR4: 00000000000006f0
Call Trace:
<TASK>
serial_port_out include/linux/serial_core.h:811 [inline]
serial8250_console_putchar drivers/tty/serial/8250/8250_port.c:3192 [inline]
serial8250_console_fifo_write drivers/tty/serial/8250/8250_port.c:-1 [inline]
serial8250_console_write+0x1410/0x1ba0 drivers/tty/serial/8250/8250_port.c:3342
console_emit_next_record kernel/printk/printk.c:3129 [inline]
console_flush_one_record kernel/printk/printk.c:3215 [inline]
console_flush_all+0x745/0xb60 kernel/printk/printk.c:3289
__console_flush_and_unlock kernel/printk/printk.c:3319 [inline]
console_unlock+0xbb/0x190 kernel/printk/printk.c:3359
vprintk_emit+0x4f8/0x5f0 kernel/printk/printk.c:2426
_printk+0xcf/0x120 kernel/printk/printk.c:2451
br_set_state+0x475/0x710 net/bridge/br_stp.c:57
br_init_port+0x99/0x200 net/bridge/br_stp_if.c:39
new_nbp+0x2f9/0x440 net/bridge/br_if.c:443
br_add_if+0x283/0xeb0 net/bridge/br_if.c:586
do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2963
do_setlink+0xcf0/0x41c0 net/core/rtnetlink.c:3165
rtnl_changelink net/core/rtnetlink.c:3776 [inline]
__rtnl_newlink net/core/rtnetlink.c:3935 [inline]
rtnl_newlink+0x161c/0x1c90 net/core/rtnetlink.c:4072
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6958
netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2550
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg+0x21c/0x270 net/socket.c:742
__sys_sendto+0x3bd/0x520 net/socket.c:2206
__do_sys_sendto net/socket.c:2213 [inline]
__se_sys_sendto net/socket.c:2209 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2209
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f780c39165c
Code: 2a 5f 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 70 5f 02 00 48 8b
RSP: 002b:00007ffcecb618b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f780d114620 RCX: 00007f780c39165c
RDX: 0000000000000028 RSI: 00007f780d114670 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007ffcecb61904 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f780d114670 R15: 0000000000000000
</TASK>
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
* Re: [syzbot ci] Re: io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
2025-12-18 8:01 ` [syzbot ci] " syzbot ci
@ 2025-12-22 20:19 ` Caleb Sander Mateos
0 siblings, 0 replies; 19+ messages in thread
From: Caleb Sander Mateos @ 2025-12-22 20:19 UTC (permalink / raw)
To: syzbot ci
Cc: axboe, io-uring, joannelkoong, linux-kernel, oliver.sang, syzbot,
syzbot, syzkaller-bugs
On Thu, Dec 18, 2025 at 3:01 AM syzbot ci
<syzbot+ci6d21afd0455de45a@syzkaller.appspotmail.com> wrote:
>
> syzbot ci has tested the following series
>
> [v6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> https://lore.kernel.org/all/20251218024459.1083572-1-csander@purestorage.com
> * [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
> * [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> * [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime
> * [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
> * [PATCH v6 5/6] io_uring: factor out uring_lock helpers
> * [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>
> and found the following issue:
> INFO: task hung in io_wq_put_and_exit
>
> Full report is available here:
> https://ci.syzbot.org/series/21eac721-670b-4f34-9696-66f9b28233ac
>
> ***
>
> INFO: task hung in io_wq_put_and_exit
>
> tree: torvalds
> URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux
> base: d358e5254674b70f34c847715ca509e46eb81e6f
> arch: amd64
> compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
> config: https://ci.syzbot.org/builds/1710cffe-7d78-4489-9aa1-823b8c2532ed/config
> syz repro: https://ci.syzbot.org/findings/74ae8703-9484-4d82-aa78-84cc37dcb1ef/syz_repro
>
> INFO: task syz.1.18:6046 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> Blocked by coredump.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.1.18 state:D stack:25672 pid:6046 tgid:6045 ppid:5971 task_flags:0x400548 flags:0x00080004
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5256 [inline]
> __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
> __schedule_loop kernel/sched/core.c:6945 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:6960
> schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
> io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
> io_wq_put_and_exit+0x316/0x650 io_uring/io-wq.c:1356
> io_uring_clean_tctx+0x11f/0x1a0 io_uring/tctx.c:207
> io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:652
> io_uring_files_cancel include/linux/io_uring.h:19 [inline]
> do_exit+0x345/0x2310 kernel/exit.c:911
> do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
> get_signal+0x1285/0x1340 kernel/signal.c:3034
> arch_do_signal_or_restart+0x9a/0x7a0 arch/x86/kernel/signal.c:337
> __exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
> exit_to_user_mode_loop+0x87/0x4f0 kernel/entry/common.c:75
> __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
> syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
> syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
> do_syscall_64+0x2e3/0xf80 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f6a8b58f7c9
> RSP: 002b:00007f6a8c4a00e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000001 RBX: 00007f6a8b7e5fa8 RCX: 00007f6a8b58f7c9
> RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f6a8b7e5fac
> RBP: 00007f6a8b7e5fa0 R08: 3fffffffffffffff R09: 0000000000000000
> R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f6a8b7e6038 R14: 00007ffcac96d220 R15: 00007ffcac96d308
> </TASK>
> INFO: task iou-wrk-6046:6047 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-wrk-6046 state:D stack:27760 pid:6047 tgid:6045 ppid:5971 task_flags:0x404050 flags:0x00080002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5256 [inline]
> __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
> __schedule_loop kernel/sched/core.c:6945 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:6960
> schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
> io_ring_ctx_lock_nested+0x2b3/0x380 io_uring/io_uring.h:283
> io_ring_ctx_lock io_uring/io_uring.h:290 [inline]
> io_ring_submit_lock io_uring/io_uring.h:554 [inline]
> io_files_update+0x677/0x7f0 io_uring/rsrc.c:504
> __io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1818
> io_issue_sqe+0x1de/0x1190 io_uring/io_uring.c:1841
> io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1953
> io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:650
> io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:704
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> </TASK>
Interesting, this is a deadlock between io_wq_exit_workers() on
submitter_task (which is exiting) and io_ring_ctx_lock() on an io_uring
worker thread. io_ring_ctx_lock() blocks until submitter_task runs task
work, but that will never happen because submitter_task is itself
blocked in wait_for_completion(), waiting for the worker to exit. I'm
not sure what the best approach is here. Maybe have submitter_task
alternate between running task work and waiting on the completion? Or
have some way for submitter_task to indicate that it's exiting and
disable the IORING_SETUP_SINGLE_ISSUER optimization in
io_ring_ctx_lock()?
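To make the cycle and the first option concrete, here is a minimal
user-space sketch in Python. It is only a toy model, not kernel code,
and it assumes that the worker's lock path boils down to "queue task
work for the submitter, then wait for it to run". RingModel,
exit_naive(), exit_alternating() and demo() are hypothetical names made
up for this illustration.

```python
import queue
import threading


class RingModel:
    """Toy stand-in for an io_uring context (assumption, not kernel code)."""

    def __init__(self):
        self.task_work = queue.Queue()          # callbacks only the submitter runs
        self.worker_exited = threading.Event()  # what the exiting submitter waits on

    def worker(self):
        # Models the worker's lock path: ask the submitter to hand over
        # the ring via task work, then block until that task work runs.
        granted = threading.Event()
        self.task_work.put(granted.set)
        granted.wait()
        self.worker_exited.set()

    def run_task_work(self):
        # Run every callback queued for the submitter so far.
        while True:
            try:
                callback = self.task_work.get_nowait()
            except queue.Empty:
                return
            callback()


def exit_naive(ring, timeout):
    """Wait for the worker without running task work (models the hang)."""
    return ring.worker_exited.wait(timeout)


def exit_alternating(ring, poll=0.01):
    """Alternate between waiting on the completion and running task work."""
    while not ring.worker_exited.wait(poll):
        ring.run_task_work()
    return True


def demo(exit_fn, *args):
    # Start one worker, then run the chosen exit strategy on the submitter.
    ring = RingModel()
    threading.Thread(target=ring.worker, daemon=True).start()
    return exit_fn(ring, *args)
```

In this model, demo(exit_naive, 0.2) times out and returns False because
nobody ever runs the queued task work, while demo(exit_alternating)
returns True once the submitter drains the queue between waits. The
timeout in exit_naive() exists only so the sketch terminates; the kernel
wait in the report has none.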
Thanks,
Caleb
> INFO: task syz.0.17:6049 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> Blocked by coredump.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.0.17 state:D stack:25592 pid:6049 tgid:6048 ppid:5967 task_flags:0x400548 flags:0x00080004
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5256 [inline]
> __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
> __schedule_loop kernel/sched/core.c:6945 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:6960
> schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
> io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
> io_wq_put_and_exit+0x316/0x650 io_uring/io-wq.c:1356
> io_uring_clean_tctx+0x11f/0x1a0 io_uring/tctx.c:207
> io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:652
> io_uring_files_cancel include/linux/io_uring.h:19 [inline]
> do_exit+0x345/0x2310 kernel/exit.c:911
> do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
> get_signal+0x1285/0x1340 kernel/signal.c:3034
> arch_do_signal_or_restart+0x9a/0x7a0 arch/x86/kernel/signal.c:337
> __exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
> exit_to_user_mode_loop+0x87/0x4f0 kernel/entry/common.c:75
> __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
> syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
> syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
> do_syscall_64+0x2e3/0xf80 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7fa96a98f7c9
> RSP: 002b:00007fa96b7430e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000001 RBX: 00007fa96abe5fa8 RCX: 00007fa96a98f7c9
> RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007fa96abe5fac
> RBP: 00007fa96abe5fa0 R08: 3fffffffffffffff R09: 0000000000000000
> R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007fa96abe6038 R14: 00007ffd9fcc00d0 R15: 00007ffd9fcc01b8
> </TASK>
> INFO: task iou-wrk-6049:6050 blocked for more than 143 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-wrk-6049 state:D stack:27760 pid:6050 tgid:6048 ppid:5967 task_flags:0x404050 flags:0x00080002
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5256 [inline]
> __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
> __schedule_loop kernel/sched/core.c:6945 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:6960
> schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
> io_ring_ctx_lock_nested+0x2b3/0x380 io_uring/io_uring.h:283
> io_ring_ctx_lock io_uring/io_uring.h:290 [inline]
> io_ring_submit_lock io_uring/io_uring.h:554 [inline]
> io_files_update+0x677/0x7f0 io_uring/rsrc.c:504
> __io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1818
> io_issue_sqe+0x1de/0x1190 io_uring/io_uring.c:1841
> io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1953
> io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:650
> io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:704
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> </TASK>
> INFO: task syz.2.19:6052 blocked for more than 144 seconds.
> Not tainted syzkaller #0
> Blocked by coredump.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:syz.2.19 state:D stack:26208 pid:6052 tgid:6051 ppid:5972 task_flags:0x400548 flags:0x00080004
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5256 [inline]
> __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
> __schedule_loop kernel/sched/core.c:6945 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:6960
> schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
> io_wq_exit_workers io_uring/io-wq.c:1328 [inline]
> io_wq_put_and_exit+0x316/0x650 io_uring/io-wq.c:1356
> io_uring_clean_tctx+0x11f/0x1a0 io_uring/tctx.c:207
> io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:652
> io_uring_files_cancel include/linux/io_uring.h:19 [inline]
> do_exit+0x345/0x2310 kernel/exit.c:911
> do_group_exit+0x21c/0x2d0 kernel/exit.c:1112
> get_signal+0x1285/0x1340 kernel/signal.c:3034
> arch_do_signal_or_restart+0x9a/0x7a0 arch/x86/kernel/signal.c:337
> __exit_to_user_mode_loop kernel/entry/common.c:41 [inline]
> exit_to_user_mode_loop+0x87/0x4f0 kernel/entry/common.c:75
> __exit_to_user_mode_prepare include/linux/irq-entry-common.h:226 [inline]
> syscall_exit_to_user_mode_prepare include/linux/irq-entry-common.h:256 [inline]
> syscall_exit_to_user_mode_work include/linux/entry-common.h:159 [inline]
> syscall_exit_to_user_mode include/linux/entry-common.h:194 [inline]
> do_syscall_64+0x2e3/0xf80 arch/x86/entry/syscall_64.c:100
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f4b5cb8f7c9
> RSP: 002b:00007f4b5d9a80e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> RAX: 0000000000000001 RBX: 00007f4b5cde5fa8 RCX: 00007f4b5cb8f7c9
> RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00007f4b5cde5fac
> RBP: 00007f4b5cde5fa0 R08: 3fffffffffffffff R09: 0000000000000000
> R10: 0000000000000800 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007f4b5cde6038 R14: 00007ffcdd64aed0 R15: 00007ffcdd64afb8
> </TASK>
> INFO: task iou-wrk-6052:6053 blocked for more than 144 seconds.
> Not tainted syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:iou-wrk-6052 state:D stack:27760 pid:6053 tgid:6051 ppid:5972 task_flags:0x404050 flags:0x00080006
> Call Trace:
> <TASK>
> context_switch kernel/sched/core.c:5256 [inline]
> __schedule+0x14bc/0x5000 kernel/sched/core.c:6863
> __schedule_loop kernel/sched/core.c:6945 [inline]
> schedule+0x165/0x360 kernel/sched/core.c:6960
> schedule_timeout+0x9a/0x270 kernel/time/sleep_timeout.c:75
> do_wait_for_common kernel/sched/completion.c:100 [inline]
> __wait_for_common kernel/sched/completion.c:121 [inline]
> wait_for_common kernel/sched/completion.c:132 [inline]
> wait_for_completion+0x2bf/0x5d0 kernel/sched/completion.c:153
> io_ring_ctx_lock_nested+0x2b3/0x380 io_uring/io_uring.h:283
> io_ring_ctx_lock io_uring/io_uring.h:290 [inline]
> io_ring_submit_lock io_uring/io_uring.h:554 [inline]
> io_files_update+0x677/0x7f0 io_uring/rsrc.c:504
> __io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1818
> io_issue_sqe+0x1de/0x1190 io_uring/io_uring.c:1841
> io_wq_submit_work+0x6e9/0xb90 io_uring/io_uring.c:1953
> io_worker_handle_work+0x7cd/0x1180 io_uring/io-wq.c:650
> io_wq_worker+0x42f/0xeb0 io_uring/io-wq.c:704
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> </TASK>
>
> Showing all locks held in the system:
> 1 lock held by khungtaskd/35:
> #0: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> #0: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
> #0: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x2e/0x180 kernel/locking/lockdep.c:6775
> 5 locks held by kworker/u10:8/1120:
> #0: ffff88823c63a918 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:639
> #1: ffff88823c624588 (psi_seq){-.-.}-{0:0}, at: psi_task_switch+0x53/0x880 kernel/sched/psi.c:933
> #2: ffff88810ac50788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: class_wiphy_constructor include/net/cfg80211.h:6363 [inline]
> #2: ffff88810ac50788 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: cfg80211_wiphy_work+0xb4/0x450 net/wireless/core.c:424
> #3: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> #3: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
> #3: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: ieee80211_sta_active_ibss+0xc3/0x330 net/mac80211/ibss.c:635
> #4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:331 [inline]
> #4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:867 [inline]
> #4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: class_rcu_constructor include/linux/rcupdate.h:1195 [inline]
> #4: ffffffff8df419e0 (rcu_read_lock){....}-{1:3}, at: unwind_next_frame+0xa5/0x2390 arch/x86/kernel/unwind_orc.c:479
> 2 locks held by getty/5656:
> #0: ffff8881133040a0 (&tty->ldisc_sem){++++}-{0:0}, at: tty_ldisc_ref_wait+0x25/0x70 drivers/tty/tty_ldisc.c:243
> #1: ffffc900035732f0 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x449/0x1460 drivers/tty/n_tty.c:2211
> 3 locks held by kworker/0:9/6480:
> #0: ffff888100075948 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3232 [inline]
> #0: ffff888100075948 ((wq_completion)events){+.+.}-{0:0}, at: process_scheduled_works+0x9b4/0x1770 kernel/workqueue.c:3340
> #1: ffffc9000546fb80 (deferred_process_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3233 [inline]
> #1: ffffc9000546fb80 (deferred_process_work){+.+.}-{0:0}, at: process_scheduled_works+0x9ef/0x1770 kernel/workqueue.c:3340
> #2: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: switchdev_deferred_process_work+0xe/0x20 net/switchdev/switchdev.c:104
> 1 lock held by syz-executor/6649:
> #0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
> #0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
> #0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x8ec/0x1c90 net/core/rtnetlink.c:4071
> 2 locks held by syz-executor/6651:
> #0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_lock net/core/rtnetlink.c:80 [inline]
> #0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_nets_lock net/core/rtnetlink.c:341 [inline]
> #0: ffffffff8f30ffc8 (rtnl_mutex){+.+.}-{4:4}, at: rtnl_newlink+0x8ec/0x1c90 net/core/rtnetlink.c:4071
> #1: ffff88823c63a918 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x2a/0x140 kernel/sched/core.c:639
> 4 locks held by syz-executor/6653:
>
> =============================================
>
> NMI backtrace for cpu 0
> CPU: 0 UID: 0 PID: 35 Comm: khungtaskd Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
> nmi_cpu_backtrace+0x39e/0x3d0 lib/nmi_backtrace.c:113
> nmi_trigger_cpumask_backtrace+0x17a/0x300 lib/nmi_backtrace.c:62
> trigger_all_cpu_backtrace include/linux/nmi.h:160 [inline]
> __sys_info lib/sys_info.c:157 [inline]
> sys_info+0x135/0x170 lib/sys_info.c:165
> check_hung_uninterruptible_tasks kernel/hung_task.c:346 [inline]
> watchdog+0xf95/0xfe0 kernel/hung_task.c:515
> kthread+0x711/0x8a0 kernel/kthread.c:463
> ret_from_fork+0x599/0xb30 arch/x86/kernel/process.c:158
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:246
> </TASK>
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1
> CPU: 1 UID: 0 PID: 6653 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:io_serial_out+0x7c/0xc0 drivers/tty/serial/8250/8250_port.c:407
> Code: 3f a6 fc 44 89 f9 d3 e5 49 83 c6 40 4c 89 f0 48 c1 e8 03 42 80 3c 20 00 74 08 4c 89 f7 e8 ec 91 0c fd 41 03 2e 89 d8 89 ea ee <5b> 41 5c 41 5e 41 5f 5d c3 cc cc cc cc cc 44 89 f9 80 e1 07 38 c1
> RSP: 0018:ffffc90008156590 EFLAGS: 00000002
> RAX: 000000000000005b RBX: 000000000000005b RCX: 0000000000000000
> RDX: 00000000000003f8 RSI: 0000000000000000 RDI: 0000000000000020
> RBP: 00000000000003f8 R08: ffff888102f08237 R09: 1ffff110205e1046
> R10: dffffc0000000000 R11: ffffffff851b9060 R12: dffffc0000000000
> R13: ffffffff998dd9e1 R14: ffffffff99bf2420 R15: 0000000000000000
> FS: 0000555595186500(0000) GS:ffff8882a9e37000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055599f9c9018 CR3: 0000000112ed8000 CR4: 00000000000006f0
> Call Trace:
> <TASK>
> serial_port_out include/linux/serial_core.h:811 [inline]
> serial8250_console_putchar drivers/tty/serial/8250/8250_port.c:3192 [inline]
> serial8250_console_fifo_write drivers/tty/serial/8250/8250_port.c:-1 [inline]
> serial8250_console_write+0x1410/0x1ba0 drivers/tty/serial/8250/8250_port.c:3342
> console_emit_next_record kernel/printk/printk.c:3129 [inline]
> console_flush_one_record kernel/printk/printk.c:3215 [inline]
> console_flush_all+0x745/0xb60 kernel/printk/printk.c:3289
> __console_flush_and_unlock kernel/printk/printk.c:3319 [inline]
> console_unlock+0xbb/0x190 kernel/printk/printk.c:3359
> vprintk_emit+0x4f8/0x5f0 kernel/printk/printk.c:2426
> _printk+0xcf/0x120 kernel/printk/printk.c:2451
> br_set_state+0x475/0x710 net/bridge/br_stp.c:57
> br_init_port+0x99/0x200 net/bridge/br_stp_if.c:39
> new_nbp+0x2f9/0x440 net/bridge/br_if.c:443
> br_add_if+0x283/0xeb0 net/bridge/br_if.c:586
> do_set_master+0x533/0x6d0 net/core/rtnetlink.c:2963
> do_setlink+0xcf0/0x41c0 net/core/rtnetlink.c:3165
> rtnl_changelink net/core/rtnetlink.c:3776 [inline]
> __rtnl_newlink net/core/rtnetlink.c:3935 [inline]
> rtnl_newlink+0x161c/0x1c90 net/core/rtnetlink.c:4072
> rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6958
> netlink_rcv_skb+0x208/0x470 net/netlink/af_netlink.c:2550
> netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
> netlink_unicast+0x82f/0x9e0 net/netlink/af_netlink.c:1344
> netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1894
> sock_sendmsg_nosec net/socket.c:727 [inline]
> __sock_sendmsg+0x21c/0x270 net/socket.c:742
> __sys_sendto+0x3bd/0x520 net/socket.c:2206
> __do_sys_sendto net/socket.c:2213 [inline]
> __se_sys_sendto net/socket.c:2209 [inline]
> __x64_sys_sendto+0xde/0x100 net/socket.c:2209
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0xf80 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f780c39165c
> Code: 2a 5f 02 00 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 70 5f 02 00 48 8b
> RSP: 002b:00007ffcecb618b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
> RAX: ffffffffffffffda RBX: 00007f780d114620 RCX: 00007f780c39165c
> RDX: 0000000000000028 RSI: 00007f780d114670 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 00007ffcecb61904 R09: 000000000000000c
> R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000003
> R13: 0000000000000000 R14: 00007f780d114670 R15: 0000000000000000
> </TASK>
>
>
> ***
>
> If these findings have caused you to resend the series or submit a
> separate fix, please add the following tag to your commit message:
> Tested-by: syzbot@syzkaller.appspotmail.com
>
> ---
> This report is generated by a bot. It may contain errors.
> syzbot ci engineers can be reached at syzkaller@googlegroups.com.
end of thread, other threads:[~2025-12-22 20:19 UTC | newest]
Thread overview: 19+ messages
2025-12-18 2:44 [PATCH v6 0/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued() Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 5/6] io_uring: factor out uring_lock helpers Caleb Sander Mateos
2025-12-18 2:44 ` [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-12-18 8:01 ` [syzbot ci] " syzbot ci
2025-12-22 20:19 ` Caleb Sander Mateos
-- strict thread matches above, loose matches on Subject: below --
2025-12-15 20:09 [PATCH v5 0/6] " Caleb Sander Mateos
2025-12-16 5:21 ` [syzbot ci] " syzbot ci
2025-12-18 1:24 ` Caleb Sander Mateos
2025-11-25 23:39 [PATCH v3 0/4] " Caleb Sander Mateos
2025-11-26 8:15 ` [syzbot ci] " syzbot ci
2025-11-26 17:30 ` Caleb Sander Mateos
2025-09-03 3:26 [PATCH 0/4] " Caleb Sander Mateos
2025-09-03 21:55 ` [syzbot ci] " syzbot ci
2025-09-03 23:29 ` Jens Axboe
2025-09-04 14:52 ` Caleb Sander Mateos
2025-09-04 16:46 ` Caleb Sander Mateos
2025-09-04 16:50 ` Caleb Sander Mateos
2025-09-04 23:25 ` Jens Axboe