INFO: task can't die in io_sq_thread_stop


* INFO: task can't die in io_sq_thread_stop
@ 2020-11-15  8:30 syzbot
  2020-11-16  9:32 ` Xiaoguang Wang
  0 siblings, 1 reply; 4+ messages in thread
From: syzbot @ 2020-11-15  8:30 UTC (permalink / raw)
  To: axboe, io-uring, linux-fsdevel, linux-kernel, syzkaller-bugs,
	viro

Hello,

syzbot found the following issue on:

HEAD commit:    6dd65e60 Add linux-next specific files for 20201110
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14727d42500000
kernel config:  https://syzkaller.appspot.com/x/.config?x=4fab43daf5c54712
dashboard link: https://syzkaller.appspot.com/bug?extid=03beeb595f074db9cfd1
compiler:       gcc (GCC) 10.1.0-syz 20200507

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: [email protected]

INFO: task syz-executor.2:12399 can't die for more than 143 seconds.
task:syz-executor.2  state:D stack:28744 pid:12399 ppid:  8504 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3773 [inline]
 __schedule+0x893/0x2170 kernel/sched/core.c:4522
 schedule+0xcf/0x270 kernel/sched/core.c:4600
 schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
 kthread_stop+0x17a/0x720 kernel/kthread.c:596
 io_put_sq_data fs/io_uring.c:7193 [inline]
 io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
 io_finish_async fs/io_uring.c:7297 [inline]
 io_sq_offload_create fs/io_uring.c:8015 [inline]
 io_uring_create fs/io_uring.c:9433 [inline]
 io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45deb9
Code: Unable to access opcode bytes at RIP 0x45de8f.
RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5
RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c
INFO: task syz-executor.2:12399 blocked for more than 143 seconds.
      Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor.2  state:D stack:28744 pid:12399 ppid:  8504 flags:0x00004004
Call Trace:
 context_switch kernel/sched/core.c:3773 [inline]
 __schedule+0x893/0x2170 kernel/sched/core.c:4522
 schedule+0xcf/0x270 kernel/sched/core.c:4600
 schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
 kthread_stop+0x17a/0x720 kernel/kthread.c:596
 io_put_sq_data fs/io_uring.c:7193 [inline]
 io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
 io_finish_async fs/io_uring.c:7297 [inline]
 io_sq_offload_create fs/io_uring.c:8015 [inline]
 io_uring_create fs/io_uring.c:9433 [inline]
 io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45deb9
Code: Unable to access opcode bytes at RIP 0x45de8f.
RSP: 002b:00007f174e51ac78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000008640 RCX: 000000000045deb9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 00000000000050e5
RBP: 000000000118bf58 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
R13: 00007ffed9ca723f R14: 00007f174e51b9c0 R15: 000000000118bf2c

Showing all locks held in the system:
1 lock held by khungtaskd/1653:
 #0: ffffffff8b3386a0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6253
1 lock held by systemd-journal/4873:
1 lock held by in:imklog/8167:
 #0: ffff88801c86e0f0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:932

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1653 Comm: khungtaskd Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x44/0xd7 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1b3/0x230 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:147 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:253 [inline]
 watchdog+0xd89/0xf30 kernel/hung_task.c:338
 kthread+0x3af/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
CPU: 0 PID: 5 Comm: kworker/0:0 Not tainted 5.10.0-rc3-next-20201110-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_dev_trap_report_work
RIP: 0010:mark_lock+0x30/0x24c0 kernel/locking/lockdep.c:4371
Code: 41 54 41 89 d4 48 ba 00 00 00 00 00 fc ff df 55 53 48 81 ec 18 01 00 00 48 8d 5c 24 38 48 89 3c 24 48 c7 44 24 38 b3 8a b5 41 <48> c1 eb 03 48 c7 44 24 40 30 1b c6 8a 48 8d 04 13 48 c7 44 24 48
RSP: 0018:ffffc90000ca7988 EFLAGS: 00000096
RAX: 0000000000000004 RBX: ffffc90000ca79c0 RCX: ffffffff8155b947
RDX: dffffc0000000000 RSI: ffff888010d20918 RDI: ffff888010d20000
RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8ebb477f
R10: fffffbfff1d768ef R11: 000000004fb6aa4b R12: 0000000000000006
R13: dffffc0000000000 R14: ffff888010d20918 R15: 0000000000000022
FS:  0000000000000000(0000) GS:ffff8880b9e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8ffcf99000 CR3: 000000001b2e7000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mark_held_locks+0x9f/0xe0 kernel/locking/lockdep.c:4011
 __trace_hardirqs_on_caller kernel/locking/lockdep.c:4037 [inline]
 lockdep_hardirqs_on_prepare kernel/locking/lockdep.c:4097 [inline]
 lockdep_hardirqs_on_prepare+0x28b/0x400 kernel/locking/lockdep.c:4049
 trace_hardirqs_on+0x5b/0x1c0 kernel/trace/trace_preemptirq.c:49
 __raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline]
 _raw_spin_unlock_irqrestore+0x42/0x50 kernel/locking/spinlock.c:191
 extract_crng drivers/char/random.c:1026 [inline]
 _get_random_bytes+0x229/0x670 drivers/char/random.c:1562
 nsim_dev_trap_skb_build drivers/net/netdevsim/dev.c:538 [inline]
 nsim_dev_trap_report drivers/net/netdevsim/dev.c:568 [inline]
 nsim_dev_trap_report_work+0x740/0xbd0 drivers/net/netdevsim/dev.c:609
 process_one_work+0x933/0x15a0 kernel/workqueue.c:2272
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2418
 kthread+0x3af/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: INFO: task can't die in io_sq_thread_stop
  2020-11-15  8:30 INFO: task can't die in io_sq_thread_stop syzbot
@ 2020-11-16  9:32 ` Xiaoguang Wang
  2020-11-18  3:27   ` Xiaoguang Wang
  0 siblings, 1 reply; 4+ messages in thread
From: Xiaoguang Wang @ 2020-11-16  9:32 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-fsdevel, linux-kernel,
	syzkaller-bugs, viro

Hi Jens,

> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    6dd65e60 Add linux-next specific files for 20201110
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=14727d42500000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=4fab43daf5c54712
> dashboard link: https://syzkaller.appspot.com/bug?extid=03beeb595f074db9cfd1
> compiler:       gcc (GCC) 10.1.0-syz 20200507
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: [email protected]
> 
> INFO: task syz-executor.2:12399 can't die for more than 143 seconds.
> task:syz-executor.2  state:D stack:28744 pid:12399 ppid:  8504 flags:0x00004004
> Call Trace:
>   context_switch kernel/sched/core.c:3773 [inline]
>   __schedule+0x893/0x2170 kernel/sched/core.c:4522
>   schedule+0xcf/0x270 kernel/sched/core.c:4600
>   schedule_timeout+0x1d8/0x250 kernel/time/timer.c:1847
>   do_wait_for_common kernel/sched/completion.c:85 [inline]
>   __wait_for_common kernel/sched/completion.c:106 [inline]
>   wait_for_common kernel/sched/completion.c:117 [inline]
>   wait_for_completion+0x163/0x260 kernel/sched/completion.c:138
>   kthread_stop+0x17a/0x720 kernel/kthread.c:596
>   io_put_sq_data fs/io_uring.c:7193 [inline]
>   io_sq_thread_stop+0x452/0x570 fs/io_uring.c:7290
>   io_finish_async fs/io_uring.c:7297 [inline]
>   io_sq_offload_create fs/io_uring.c:8015 [inline]
>   io_uring_create fs/io_uring.c:9433 [inline]
>   io_uring_setup+0x19b7/0x3730 fs/io_uring.c:9507
>   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
I also don't have a reproducer yet, but it seems that there is a race
in the current code:

=> io_put_sq_data                  |
==> kthread_park(sqd->thread);     |
                                   | T1: sq thread is parked now.
==> kthread_stop(sqd->thread);     |
===> kthread_unpark(k);            |
                                   | T2: sq thread is now unparked and can run again.
                                   |
                                   | T3: sq thread is now preempted.
                                   |
===> wake_up_process(k);           |
                                   |
                                   | T4: Since the sqd ctx list is empty, needs_sched will be true,
                                   | so the sq thread sets its task state to TASK_INTERRUPTIBLE
                                   | and calls schedule(). Now the sq thread will never be woken up,
                                   | because wake_up_process(k) has already run.
===> wait_for_completion           |
I artificially inserted mdelay() to simulate the above race and got the same
stack as in this syzbot report.

-               if (kthread_should_park())
+               if (kthread_should_park()) {
                         kthread_parkme();
+                       if (kthread_should_stop())
+                               break;
+               }
This diff fixes the issue. Also, if ctx_list is empty, we don't need to call schedule().

Regards,
Xiaoguang Wang




* Re: INFO: task can't die in io_sq_thread_stop
  2020-11-16  9:32 ` Xiaoguang Wang
@ 2020-11-18  3:27   ` Xiaoguang Wang
  2020-11-18 17:58     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Xiaoguang Wang @ 2020-11-18  3:27 UTC (permalink / raw)
  To: syzbot, axboe, io-uring, linux-fsdevel, linux-kernel,
	syzkaller-bugs, viro

Hi,

A gentle reminder, in case you overlooked this syzbot report.

Regards,
Xiaoguang Wang



* Re: INFO: task can't die in io_sq_thread_stop
  2020-11-18  3:27   ` Xiaoguang Wang
@ 2020-11-18 17:58     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2020-11-18 17:58 UTC (permalink / raw)
  To: Xiaoguang Wang, syzbot, io-uring, linux-fsdevel, linux-kernel,
	syzkaller-bugs, viro

On 11/17/20 8:27 PM, Xiaoguang Wang wrote:
> hi,
> 
> A gentle reminder, in case you overlooked this syzbot report.

Did see it (and your reply), was hoping you'd send an actual patch
for this (nudge, nudge).

With luck, maybe we'll see a reproducer out of syzbot at some point too.

-- 
Jens Axboe

