* [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) @ 2024-11-04 10:36 syzbot 2024-11-04 11:31 ` syzbot 0 siblings, 1 reply; 11+ messages in thread From: syzbot @ 2024-11-04 10:36 UTC (permalink / raw) To: asml.silence, axboe, io-uring, linux-kernel, linux-usb, syzkaller-bugs Hello, syzbot found the following issue on: HEAD commit: c88416ba074a Add linux-next specific files for 20241101 git tree: linux-next console+strace: https://syzkaller.appspot.com/x/log.txt?x=14c04740580000 kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 compiler: Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/760a8c88d0c3/disk-c88416ba.raw.xz vmlinux: https://storage.googleapis.com/syzbot-assets/46e4b0a851a2/vmlinux-c88416ba.xz kernel image: https://storage.googleapis.com/syzbot-assets/428e2c784b75/bzImage-c88416ba.xz IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: [email protected] ------------[ cut here ]------------ WARNING: CPU: 1 PID: 3508 at io_uring/io_uring.h:142 io_lockdep_assert_cq_locked io_uring/io_uring.h:142 [inline] WARNING: CPU: 1 PID: 3508 at io_uring/io_uring.h:142 io_get_cqe_overflow+0x43f/0x590 io_uring/io_uring.h:166 Modules linked in: CPU: 1 UID: 0 PID: 3508 Comm: kworker/u8:8 Not tainted 6.12.0-rc5-next-20241101-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: iou_exit io_ring_exit_work RIP: 0010:io_lockdep_assert_cq_locked io_uring/io_uring.h:142 [inline] RIP: 0010:io_get_cqe_overflow+0x43f/0x590 io_uring/io_uring.h:166 Code: 0f 0b 90 e9 62 fc ff ff e8 fe 43 ec fc 90 0f 0b 90 e9 90 fe ff ff e8 f0 43 ec fc 90 0f 0b 90 e9 82 fe ff ff e8 e2 43 ec fc 90 <0f> 0b 90 e9 74 fe ff ff e8 d4 43 ec fc 90 0f 0b 90 e9 66 fe ff ff RSP: 0018:ffffc9000d0df810 EFLAGS: 00010293 RAX: ffffffff84a97a1e RBX: ffff888034e58000 RCX: ffff888032328000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: 0000000000000001 R08: ffffffff84a97821 R09: fffff52001a1befc R10: dffffc0000000000 R11: fffff52001a1befc R12: 0000000000000000 R13: dffffc0000000000 R14: dffffc0000000000 R15: ffffc9000d0df8a0 FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fd6dd6c11f0 CR3: 000000004b8ac000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> io_get_cqe io_uring/io_uring.h:182 [inline] io_fill_cqe_aux io_uring/io_uring.c:822 [inline] __io_post_aux_cqe io_uring/io_uring.c:843 [inline] io_post_aux_cqe+0xe5/0x420 io_uring/io_uring.c:855 io_free_rsrc_node+0xe3/0x220 io_uring/rsrc.c:453 io_put_rsrc_node io_uring/rsrc.h:81 [inline] io_rsrc_data_free+0xf2/0x200 io_uring/rsrc.c:140 io_free_file_tables+0x23/0x70 io_uring/filetable.c:52 io_sqe_files_unregister+0x53/0x140 io_uring/rsrc.c:477 io_ring_ctx_free+0x49/0xdb0 io_uring/io_uring.c:2715 io_ring_exit_work+0x80f/0x8a0 io_uring/io_uring.c:2952 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa63/0x1850 kernel/workqueue.c:3310 worker_thread+0x870/0xd30 kernel/workqueue.c:3391 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at [email protected]. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. If the report is already addressed, let syzbot know by replying with: #syz fix: exact-commit-title If you want syzbot to run the reproducer, reply with: #syz test: git://repo/address.git branch-or-commit-hash If you attach or paste a git patch, syzbot will apply it before testing. If you want to overwrite report's subsystems, reply with: #syz set subsystems: new-subsystem (See the list of subsystem names on the web dashboard) If the report is a duplicate of another one, reply with: #syz dup: exact-subject-of-another-report If you want to undo deduplication, reply with: #syz undup ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 10:36 [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) syzbot @ 2024-11-04 11:31 ` syzbot 2024-11-04 13:13 ` Pavel Begunkov 0 siblings, 1 reply; 11+ messages in thread From: syzbot @ 2024-11-04 11:31 UTC (permalink / raw) To: asml.silence, axboe, io-uring, linux-kernel, linux-usb, syzkaller-bugs syzbot has bisected this issue to: commit 3f1a546444738b21a8c312a4b49dc168b65c8706 Author: Jens Axboe <[email protected]> Date: Sat Oct 26 01:27:39 2024 +0000 io_uring/rsrc: get rid of per-ring io_rsrc_node list bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000 start commit: c88416ba074a Add linux-next specific files for 20241101 git tree: linux-next final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000 console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000 kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 Reported-by: [email protected] Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") For information about bisection process see: https://goo.gl/tpsmEJ#bisection ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 11:31 ` syzbot @ 2024-11-04 13:13 ` Pavel Begunkov 2024-11-04 15:08 ` Jens Axboe 0 siblings, 1 reply; 11+ messages in thread From: Pavel Begunkov @ 2024-11-04 13:13 UTC (permalink / raw) To: syzbot, axboe, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 11:31, syzbot wrote: > syzbot has bisected this issue to: > > commit 3f1a546444738b21a8c312a4b49dc168b65c8706 > Author: Jens Axboe <[email protected]> > Date: Sat Oct 26 01:27:39 2024 +0000 > > io_uring/rsrc: get rid of per-ring io_rsrc_node list > > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000 > start commit: c88416ba074a Add linux-next specific files for 20241101 > git tree: linux-next > final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000 > console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000 > kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f > dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 > > Reported-by: [email protected] > Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") > > For information about bisection process see: https://goo.gl/tpsmEJ#bisection Previously all puts were done by requests, which in case of an exiting ring were fallback'ed to normal tw. Now, the unregister path posts CQEs, while the original task is still alive. Should be fine in general because at this point there could be no requests posting in parallel and all is synchronised, so it's a false positive, but we need to change the assert or something else. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 13:13 ` Pavel Begunkov @ 2024-11-04 15:08 ` Jens Axboe 2024-11-04 15:27 ` Pavel Begunkov 0 siblings, 1 reply; 11+ messages in thread From: Jens Axboe @ 2024-11-04 15:08 UTC (permalink / raw) To: Pavel Begunkov, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 6:13 AM, Pavel Begunkov wrote: > On 11/4/24 11:31, syzbot wrote: >> syzbot has bisected this issue to: >> >> commit 3f1a546444738b21a8c312a4b49dc168b65c8706 >> Author: Jens Axboe <[email protected]> >> Date: Sat Oct 26 01:27:39 2024 +0000 >> >> io_uring/rsrc: get rid of per-ring io_rsrc_node list >> >> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000 >> start commit: c88416ba074a Add linux-next specific files for 20241101 >> git tree: linux-next >> final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000 >> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000 >> kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f >> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 >> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 >> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 >> >> Reported-by: [email protected] >> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") >> >> For information about bisection process see: https://goo.gl/tpsmEJ#bisection > > Previously all puts were done by requests, which in case of an exiting > ring were fallback'ed to normal tw. Now, the unregister path posts CQEs, > while the original task is still alive. Should be fine in general because > at this point there could be no requests posting in parallel and all > is synchronised, so it's a false positive, but we need to change the assert > or something else. Maybe something ala the below? Also changes these triggers to be _once(), no point spamming them. diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 00409505bf07..7792ed91469b 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) * Not from an SQE, as those cannot be submitted, but via * updating tagged resources. */ - if (ctx->submitter_task->flags & PF_EXITING) - lockdep_assert(current_work()); + if (ctx->submitter_task->flags & PF_EXITING || + percpu_ref_is_dying(&ctx->refs)) + lockdep_assert_once(current_work()); else - lockdep_assert(current == ctx->submitter_task); + lockdep_assert_once(current == ctx->submitter_task); } #endif } -- Jens Axboe ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 15:08 ` Jens Axboe @ 2024-11-04 15:27 ` Pavel Begunkov 2024-11-04 15:34 ` Pavel Begunkov 0 siblings, 1 reply; 11+ messages in thread From: Pavel Begunkov @ 2024-11-04 15:27 UTC (permalink / raw) To: Jens Axboe, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 15:08, Jens Axboe wrote: > On 11/4/24 6:13 AM, Pavel Begunkov wrote: >> On 11/4/24 11:31, syzbot wrote: >>> syzbot has bisected this issue to: >>> >>> commit 3f1a546444738b21a8c312a4b49dc168b65c8706 >>> Author: Jens Axboe <[email protected]> >>> Date: Sat Oct 26 01:27:39 2024 +0000 >>> >>> io_uring/rsrc: get rid of per-ring io_rsrc_node list >>> >>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000 >>> start commit: c88416ba074a Add linux-next specific files for 20241101 >>> git tree: linux-next >>> final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000 >>> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000 >>> kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f >>> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 >>> >>> Reported-by: [email protected] >>> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") >>> >>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection >> >> Previously all puts were done by requests, which in case of an exiting >> ring were fallback'ed to normal tw. Now, the unregister path posts CQEs, >> while the original task is still alive. Should be fine in general because >> at this point there could be no requests posting in parallel and all >> is synchronised, so it's a false positive, but we need to change the assert >> or something else. > > Maybe something ala the below? Also changes these triggers to be > _once(), no point spamming them. > > diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h > index 00409505bf07..7792ed91469b 100644 > --- a/io_uring/io_uring.h > +++ b/io_uring/io_uring.h > @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) > * Not from an SQE, as those cannot be submitted, but via > * updating tagged resources. > */ > - if (ctx->submitter_task->flags & PF_EXITING) > - lockdep_assert(current_work()); > + if (ctx->submitter_task->flags & PF_EXITING || > + percpu_ref_is_dying(&ctx->refs)) io_move_task_work_from_local() executes requests with a normal task_work of a possible alive task, which which will the check. I was thinking to kill the extra step as it doesn't make sense, git garbage digging shows the patch below, but I don't remember if it has ever been tested. commit 65560732da185c85f472e9c94e6b8ff147fc4b96 Author: Pavel Begunkov <[email protected]> Date: Fri Jun 7 13:13:06 2024 +0100 io_uring: skip normal tw with DEFER_TASKRUN DEFER_TASKRUN execution first falls back to normal task_work and only then, when the task is dying, to workers. It's cleaner to remove the middle step and use workers as the only fallback. It also detaches DEFER_TASKRUN and normal task_work handling from each other. Signed-off-by: Pavel Begunkov <[email protected]> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 9789cf8c68c1..d9e3661ff93d 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1111,9 +1111,8 @@ static inline struct llist_node *io_llist_xchg(struct llist_head *head, return xchg(&head->first, new); } -static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) +static __cold void __io_fallback_tw(struct llist_node *node, bool sync) { - struct llist_node *node = llist_del_all(&tctx->task_list); struct io_ring_ctx *last_ctx = NULL; struct io_kiocb *req; @@ -1139,6 +1138,13 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) } } +static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) +{ + struct llist_node *node = llist_del_all(&tctx->task_list); + + __io_fallback_tw(node, sync); +} + struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, unsigned int max_entries, unsigned int *count) @@ -1287,13 +1293,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) struct llist_node *node; node = llist_del_all(&ctx->work_llist); - while (node) { - struct io_kiocb *req = container_of(node, struct io_kiocb, - io_task_work.node); - - node = node->next; - io_req_normal_work_add(req); - } + __io_fallback_tw(node, false); } static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index e46d13e8a215..bc0a800b5ae7 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -128,7 +128,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) * Not from an SQE, as those cannot be submitted, but via * updating tagged resources. */ - if (ctx->submitter_task->flags & PF_EXITING) + if (percpu_ref_is_dying(&ctx->refs)) lockdep_assert(current_work()); else lockdep_assert(current == ctx->submitter_task); -- Pavel Begunkov ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 15:27 ` Pavel Begunkov @ 2024-11-04 15:34 ` Pavel Begunkov 2024-11-04 15:43 ` Jens Axboe 0 siblings, 1 reply; 11+ messages in thread From: Pavel Begunkov @ 2024-11-04 15:34 UTC (permalink / raw) To: Jens Axboe, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 15:27, Pavel Begunkov wrote: > On 11/4/24 15:08, Jens Axboe wrote: >> On 11/4/24 6:13 AM, Pavel Begunkov wrote: >>> On 11/4/24 11:31, syzbot wrote: >>>> syzbot has bisected this issue to: >>>> >>>> commit 3f1a546444738b21a8c312a4b49dc168b65c8706 >>>> Author: Jens Axboe <[email protected]> >>>> Date: Sat Oct 26 01:27:39 2024 +0000 >>>> >>>> io_uring/rsrc: get rid of per-ring io_rsrc_node list >>>> >>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000 >>>> start commit: c88416ba074a Add linux-next specific files for 20241101 >>>> git tree: linux-next >>>> final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000 >>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000 >>>> kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f >>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 >>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 >>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 >>>> >>>> Reported-by: [email protected] >>>> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") >>>> >>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection >>> >>> Previously all puts were done by requests, which in case of an exiting >>> ring were fallback'ed to normal tw. Now, the unregister path posts CQEs, >>> while the original task is still alive. Should be fine in general because >>> at this point there could be no requests posting in parallel and all >>> is synchronised, so it's a false positive, but we need to change the assert >>> or something else. >> >> Maybe something ala the below? Also changes these triggers to be >> _once(), no point spamming them. >> >> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h >> index 00409505bf07..7792ed91469b 100644 >> --- a/io_uring/io_uring.h >> +++ b/io_uring/io_uring.h >> @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) >> * Not from an SQE, as those cannot be submitted, but via >> * updating tagged resources. >> */ >> - if (ctx->submitter_task->flags & PF_EXITING) >> - lockdep_assert(current_work()); >> + if (ctx->submitter_task->flags & PF_EXITING || >> + percpu_ref_is_dying(&ctx->refs)) > > io_move_task_work_from_local() executes requests with a normal > task_work of a possible alive task, which which will the check. > > I was thinking to kill the extra step as it doesn't make sense, > git garbage digging shows the patch below, but I don't remember > if it has ever been tested. > > > commit 65560732da185c85f472e9c94e6b8ff147fc4b96 > Author: Pavel Begunkov <[email protected]> > Date: Fri Jun 7 13:13:06 2024 +0100 > > io_uring: skip normal tw with DEFER_TASKRUN > DEFER_TASKRUN execution first falls back to normal task_work and only > then, when the task is dying, to workers. It's cleaner to remove the > middle step and use workers as the only fallback. It also detaches > DEFER_TASKRUN and normal task_work handling from each other. > Signed-off-by: Pavel Begunkov <[email protected]> Not sure what spacing got broken here. Regardless, the rule with sth like that should be simpler, i.e. a ctx is getting killed => everything is run from fallback/kthread. > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index 9789cf8c68c1..d9e3661ff93d 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -1111,9 +1111,8 @@ static inline struct llist_node *io_llist_xchg(struct llist_head *head, > return xchg(&head->first, new); > } > > -static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) > +static __cold void __io_fallback_tw(struct llist_node *node, bool sync) > { > - struct llist_node *node = llist_del_all(&tctx->task_list); > struct io_ring_ctx *last_ctx = NULL; > struct io_kiocb *req; > > @@ -1139,6 +1138,13 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) > } > } > > +static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync) > +{ > + struct llist_node *node = llist_del_all(&tctx->task_list); > + > + __io_fallback_tw(node, sync); > +} > + > struct llist_node *tctx_task_work_run(struct io_uring_task *tctx, > unsigned int max_entries, > unsigned int *count) > @@ -1287,13 +1293,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx) > struct llist_node *node; > > node = llist_del_all(&ctx->work_llist); > - while (node) { > - struct io_kiocb *req = container_of(node, struct io_kiocb, > - io_task_work.node); > - > - node = node->next; > - io_req_normal_work_add(req); > - } > + __io_fallback_tw(node, false); > } > > static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events, > diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h > index e46d13e8a215..bc0a800b5ae7 100644 > --- a/io_uring/io_uring.h > +++ b/io_uring/io_uring.h > @@ -128,7 +128,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) > * Not from an SQE, as those cannot be submitted, but via > * updating tagged resources. > */ > - if (ctx->submitter_task->flags & PF_EXITING) > + if (percpu_ref_is_dying(&ctx->refs)) > lockdep_assert(current_work()); > else > lockdep_assert(current == ctx->submitter_task); > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 15:34 ` Pavel Begunkov @ 2024-11-04 15:43 ` Jens Axboe 2024-11-04 16:54 ` Pavel Begunkov 0 siblings, 1 reply; 11+ messages in thread From: Jens Axboe @ 2024-11-04 15:43 UTC (permalink / raw) To: Pavel Begunkov, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 8:34 AM, Pavel Begunkov wrote: > On 11/4/24 15:27, Pavel Begunkov wrote: >> On 11/4/24 15:08, Jens Axboe wrote: >>> On 11/4/24 6:13 AM, Pavel Begunkov wrote: >>>> On 11/4/24 11:31, syzbot wrote: >>>>> syzbot has bisected this issue to: >>>>> >>>>> commit 3f1a546444738b21a8c312a4b49dc168b65c8706 >>>>> Author: Jens Axboe <[email protected]> >>>>> Date: Sat Oct 26 01:27:39 2024 +0000 >>>>> >>>>> io_uring/rsrc: get rid of per-ring io_rsrc_node list >>>>> >>>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000 >>>>> start commit: c88416ba074a Add linux-next specific files for 20241101 >>>>> git tree: linux-next >>>>> final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000 >>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000 >>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f >>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2 >>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000 >>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000 >>>>> >>>>> Reported-by: [email protected] >>>>> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list") >>>>> >>>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection >>>> >>>> Previously all puts were done by requests, which in case of an exiting >>>> ring were fallback'ed to normal tw. Now, the unregister path posts CQEs, >>>> while the original task is still alive. Should be fine in general because >>>> at this point there could be no requests posting in parallel and all >>>> is synchronised, so it's a false positive, but we need to change the assert >>>> or something else. >>> >>> Maybe something ala the below? Also changes these triggers to be >>> _once(), no point spamming them. >>> >>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h >>> index 00409505bf07..7792ed91469b 100644 >>> --- a/io_uring/io_uring.h >>> +++ b/io_uring/io_uring.h >>> @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) >>> * Not from an SQE, as those cannot be submitted, but via >>> * updating tagged resources. >>> */ >>> - if (ctx->submitter_task->flags & PF_EXITING) >>> - lockdep_assert(current_work()); >>> + if (ctx->submitter_task->flags & PF_EXITING || >>> + percpu_ref_is_dying(&ctx->refs)) >> >> io_move_task_work_from_local() executes requests with a normal >> task_work of a possible alive task, which which will the check. >> >> I was thinking to kill the extra step as it doesn't make sense, >> git garbage digging shows the patch below, but I don't remember >> if it has ever been tested. >> >> >> commit 65560732da185c85f472e9c94e6b8ff147fc4b96 >> Author: Pavel Begunkov <[email protected]> >> Date: Fri Jun 7 13:13:06 2024 +0100 >> >> io_uring: skip normal tw with DEFER_TASKRUN >> DEFER_TASKRUN execution first falls back to normal task_work and only >> then, when the task is dying, to workers. It's cleaner to remove the >> middle step and use workers as the only fallback. It also detaches >> DEFER_TASKRUN and normal task_work handling from each other. >> Signed-off-by: Pavel Begunkov <[email protected]> > > Not sure what spacing got broken here. > > Regardless, the rule with sth like that should be simpler, > i.e. a ctx is getting killed => everything is run from fallback/kthread. I like it, and now there's another reason to do it. Can you out the patch? -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 15:43 ` Jens Axboe @ 2024-11-04 16:54 ` Pavel Begunkov 2024-11-04 17:03 ` Jens Axboe 0 siblings, 1 reply; 11+ messages in thread From: Pavel Begunkov @ 2024-11-04 16:54 UTC (permalink / raw) To: Jens Axboe, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 15:43, Jens Axboe wrote: > On 11/4/24 8:34 AM, Pavel Begunkov wrote: >> On 11/4/24 15:27, Pavel Begunkov wrote: ... >> Regardless, the rule with sth like that should be simpler, >> i.e. a ctx is getting killed => everything is run from fallback/kthread. > > I like it, and now there's another reason to do it. Can you out the > patch? Let's see if it works, hopefully will try today. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 16:54 ` Pavel Begunkov @ 2024-11-04 17:03 ` Jens Axboe 2024-11-04 17:05 ` Jens Axboe 0 siblings, 1 reply; 11+ messages in thread From: Jens Axboe @ 2024-11-04 17:03 UTC (permalink / raw) To: Pavel Begunkov, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 9:54 AM, Pavel Begunkov wrote: > On 11/4/24 15:43, Jens Axboe wrote: >> On 11/4/24 8:34 AM, Pavel Begunkov wrote: >>> On 11/4/24 15:27, Pavel Begunkov wrote: > ... >>> Regardless, the rule with sth like that should be simpler, >>> i.e. a ctx is getting killed => everything is run from fallback/kthread. >> >> I like it, and now there's another reason to do it. Can you out the >> patch? > > Let's see if it works, hopefully will try today. I already tried it here fwiw, does fix the issue (as expected) and it passes the full testing too. -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 17:03 ` Jens Axboe @ 2024-11-04 17:05 ` Jens Axboe 2024-11-05 1:20 ` Pavel Begunkov 0 siblings, 1 reply; 11+ messages in thread From: Jens Axboe @ 2024-11-04 17:05 UTC (permalink / raw) To: Pavel Begunkov, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 10:03 AM, Jens Axboe wrote: > On 11/4/24 9:54 AM, Pavel Begunkov wrote: >> On 11/4/24 15:43, Jens Axboe wrote: >>> On 11/4/24 8:34 AM, Pavel Begunkov wrote: >>>> On 11/4/24 15:27, Pavel Begunkov wrote: >> ... >>>> Regardless, the rule with sth like that should be simpler, >>>> i.e. a ctx is getting killed => everything is run from fallback/kthread. >>> >>> I like it, and now there's another reason to do it. Can you out the >>> patch? >> >> Let's see if it works, hopefully will try today. > > I already tried it here fwiw, does fix the issue (as expected) and it > passes the full testing too. Forgot to include the basic reproducer I wrote for this report, it's below. #include <stdio.h> #include <inttypes.h> #include <stdlib.h> #include <unistd.h> #include <liburing.h> int main(int argc, char *argv[]) { struct io_uring ring; int fds[2], ret; __u64 tags[2]; if (pipe(fds) < 0) { perror("pipe"); return 1; } tags[0] = 1; tags[1] = 2; io_uring_queue_init(4, &ring, IORING_SETUP_SINGLE_ISSUER|IORING_SETUP_DEFER_TASKRUN); io_uring_register_files_tags(&ring, fds, tags, 2); io_uring_queue_exit(&ring); sleep(1); return 0; } -- Jens Axboe ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) 2024-11-04 17:05 ` Jens Axboe @ 2024-11-05 1:20 ` Pavel Begunkov 0 siblings, 0 replies; 11+ messages in thread From: Pavel Begunkov @ 2024-11-05 1:20 UTC (permalink / raw) To: Jens Axboe, syzbot, io-uring, linux-kernel, linux-usb, syzkaller-bugs On 11/4/24 17:05, Jens Axboe wrote: > On 11/4/24 10:03 AM, Jens Axboe wrote: >> On 11/4/24 9:54 AM, Pavel Begunkov wrote: >>> On 11/4/24 15:43, Jens Axboe wrote: >>>> On 11/4/24 8:34 AM, Pavel Begunkov wrote: >>>>> On 11/4/24 15:27, Pavel Begunkov wrote: >>> ... >>>>> Regardless, the rule with sth like that should be simpler, >>>>> i.e. a ctx is getting killed => everything is run from fallback/kthread. >>>> >>>> I like it, and now there's another reason to do it. Can you out the >>>> patch? >>> >>> Let's see if it works, hopefully will try today. >> >> I already tried it here fwiw, does fix the issue (as expected) and it >> passes the full testing too. > > Forgot to include the basic reproducer I wrote for this report, it's > below. Thanks for testing. I was just looking at the patch, and the fun part is that it depends on failing tw off PF_KTHREAD tasks... Otherwise when the fallback executes a request the original task could still be alive => (req->task->flags & PF_EXITING) check passes and the request continues doing stuff from the kthread (with no mm and so). A side note that it should be marginally better than before on the cancellation front because we're killing task_work more eagerly for a dying ctx but alive tasks. I'll send the patch properly. -- Pavel Begunkov ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-11-05 1:20 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-04 10:36 [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2) syzbot 2024-11-04 11:31 ` syzbot 2024-11-04 13:13 ` Pavel Begunkov 2024-11-04 15:08 ` Jens Axboe 2024-11-04 15:27 ` Pavel Begunkov 2024-11-04 15:34 ` Pavel Begunkov 2024-11-04 15:43 ` Jens Axboe 2024-11-04 16:54 ` Pavel Begunkov 2024-11-04 17:03 ` Jens Axboe 2024-11-04 17:05 ` Jens Axboe 2024-11-05 1:20 ` Pavel Begunkov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox