INFO: task hung in io_uring_cancel

public inbox for [email protected]
 help / color / mirror / Atom feed

* INFO: task hung in io_uring_cancel_generic
@ 2021-09-07 11:50 Hao Sun
  2021-09-07 19:30 ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Hao Sun @ 2021-09-07 11:50 UTC (permalink / raw)
  To: axboe, io-uring; +Cc: Pavel Begunkov, linux-kernel

Hello,

When using Healer to fuzz the latest Linux kernel, the following crash
was triggered.

HEAD commit: 7d2a07b76933 Linux 5.14
git tree: upstream
console output:
https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
Similar report:
https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ

Sorry, I don't have a reproducer for this crash, hope the symbolized
report can help.
If you fix this issue, please add the following tag to the commit:
Reported-by: Hao Sun <[email protected]>

INFO: task syz-executor:10969 blocked for more than 143 seconds.
      Not tainted 5.14.0+ #26
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor    state:D stack:28528 pid:10969 ppid:  9213 flags:0x00024004
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 io_uring_cancel_generic+0x458/0x740 fs/io_uring.c:9646
 io_uring_files_cancel include/linux/io_uring.h:16 [inline]
 do_exit+0x25c/0x2dd0 kernel/exit.c:780
 do_group_exit+0x125/0x340 kernel/exit.c:922
 get_signal+0x4d5/0x25a0 kernel/signal.c:2868
 arch_do_signal_or_restart+0x2ed/0x1c40 arch/x86/kernel/signal.c:865
 handle_signal_work kernel/entry/common.c:148 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
 exit_to_user_mode_prepare+0x192/0x2a0 kernel/entry/common.c:209
 __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
 syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
 do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4739cd
RSP: 002b:00007fb8de229218 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000059c0a0 RCX: 00000000004739cd
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 000000000059c0a8
RBP: 000000000059c0a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000059c0ac
R13: 00007ffc3738731f R14: 00007ffc373874c0 R15: 00007fb8de229300

Showing all locks held in the system:
1 lock held by khungtaskd/1674:
 #0: ffffffff8b97dde0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8129:
 #0: ffff888016f972f0 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0xe9/0x100 fs/file.c:990

=============================================

NMI backtrace for cpu 1
CPU: 1 PID: 1674 Comm: khungtaskd Not tainted 5.14.0+ #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1e1/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xcc8/0x1010 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 1 to CPUs 0,2-3:
NMI backtrace for cpu 2
CPU: 2 PID: 4927 Comm: systemd-journal Not tainted 5.14.0+ #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:preempt_count arch/x86/include/asm/preempt.h:27 [inline]
RIP: 0010:check_kcov_mode+0x0/0x40 kernel/kcov.c:163
Code: 48 89 df e8 02 80 47 00 e9 59 fe ff ff 48 8b 7c 24 08 e8 f3 7f
47 00 e9 61 fd ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc <65> 8b
05 29 d0 8b 7e 89 c2 81 e2 00 01 00 00 a9 00 01 ff 00 74 10
RSP: 0018:ffffc90001077d08 EFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff888014fa9d40
RDX: 0000000000000000 RSI: ffff888014fa9d40 RDI: 0000000000000002
RBP: ffff88802cb20000 R08: ffffffff817c1bfe R09: 0000000000000000
R10: 0000000000000005 R11: ffffed100c7e6531 R12: ffff888010e0af00
R13: 0000000000000200 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f9028f938c0(0000) GS:ffff888063f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f90256da000 CR3: 00000000193df000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 __sanitizer_cov_trace_pc+0x1a/0x40 kernel/kcov.c:197
 trace_hardirqs_on+0x6e/0x1c0 kernel/trace/trace_preemptirq.c:44
 kasan_quarantine_put+0x11d/0x1c0 mm/kasan/quarantine.c:220
 kasan_slab_free include/linux/kasan.h:230 [inline]
 slab_free_hook mm/slub.c:1628 [inline]
 slab_free_freelist_hook+0xe1/0x240 mm/slub.c:1653
 slab_free mm/slub.c:3213 [inline]
 kmem_cache_free+0x8a/0x5c0 mm/slub.c:3229
 putname+0x10c/0x150 fs/namei.c:270
 do_mkdirat+0x18a/0x2b0 fs/namei.c:3920
 __do_sys_mkdir fs/namei.c:3931 [inline]
 __se_sys_mkdir fs/namei.c:3929 [inline]
 __x64_sys_mkdir+0x61/0x80 fs/namei.c:3929
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f902824f687
Code: 00 b8 ff ff ff ff c3 0f 1f 40 00 48 8b 05 09 d8 2b 00 64 c7 00
5f 00 00 00 b8 ff ff ff ff c3 0f 1f 40 00 b8 53 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d e1 d7 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffd1a764c18 EFLAGS: 00000293 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00007ffd1a767b30 RCX: 00007f902824f687
RDX: 00007f9028cc0a00 RSI: 00000000000001ed RDI: 00005650418998a0
RBP: 00007ffd1a764c50 R08: 000000000000eec0 R09: 0000000000000000
R10: 0000000000000069 R11: 0000000000000293 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffd1a767b30 R15: 00007ffd1a765140
NMI backtrace for cpu 3
CPU: 3 PID: 15560 Comm: syz-executor Not tainted 5.14.0+ #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:memchr+0x2e/0x80 lib/string.c:1070
Code: 48 bd 00 00 00 00 00 fc ff df 53 48 83 ec 18 eb 26 48 8d 5f 01
48 89 f8 48 89 f9 48 c1 e8 03 83 e1 07 0f b6 04 28 38 c8 7f 04 <84> c0
75 21 40 3a 73 ff 74 11 48 89 df 48 39 d7 75 d5 48 83 c4 18
RSP: 0018:ffffc90007f7f910 EFLAGS: 00000293
RAX: 0000000000000000 RBX: ffffffff90212e86 RCX: 0000000000000005
RDX: ffffffff90212e93 RSI: 000000000000000a RDI: ffffffff90212e85
RBP: dffffc0000000000 R08: ffffffff815dff8f R09: 0000000000000000
R10: 0000000000000005 R11: fffff52000feff84 R12: 0000000000000001
R13: dffffc0000000000 R14: 000000000000001b R15: ffffffff90212e78
FS:  00007f917dcd5700(0000) GS:ffff888135d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000005f CR3: 00000001133e5000 CR4: 0000000000750ee0
PKRU: 55555554
Call Trace:
 memchr include/linux/fortify-string.h:246 [inline]
 count_lines kernel/printk/printk_ringbuffer.c:1675 [inline]
 copy_data kernel/printk/printk_ringbuffer.c:1721 [inline]
 prb_read kernel/printk/printk_ringbuffer.c:1814 [inline]
 _prb_read_valid+0x3ae/0x6b0 kernel/printk/printk_ringbuffer.c:1880
 prb_read_valid_info+0xa7/0xe0 kernel/printk/printk_ringbuffer.c:1963
 find_first_fitting_seq+0xfd/0x210 kernel/printk/printk.c:1450
 syslog_print_all+0x15a/0x3a0 kernel/printk/printk.c:1590
 do_syslog kernel/printk/printk.c:1669 [inline]
 do_syslog+0x53f/0x650 kernel/printk/printk.c:1632
 __do_sys_syslog kernel/printk/printk.c:1747 [inline]
 __se_sys_syslog kernel/printk/printk.c:1745 [inline]
 __x64_sys_syslog+0x71/0xb0 kernel/printk/printk.c:1745
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x2000010c
Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 fc 0d 00 00 00 00 00 00 36 e1 06 4a 2a e9 2c b8 b6 4c 0f 05 <bf> 03
00 00 00 c4 a3 7b f0 c5 01 41 e2 e9 c4 22 e9 aa bb 3c 00 00
RSP: 002b:00007f917dcd50f8 EFLAGS: 00000203 ORIG_RAX: 0000000000000067
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 000000002000010c
RDX: 0000000000004c01 RSI: 0000000000000003 RDI: 0000000000000003
RBP: 0000000000000033 R08: 0000000000000005 R09: 0000000000000006
R10: 0000000000000007 R11: 0000000000000203 R12: 000000000000000b
R13: 000000000000000c R14: 000000000000000d R15: 00007f917dcd5300
NMI backtrace for cpu 0 skipped: idling at native_safe_halt
arch/x86/include/asm/irqflags.h:51 [inline]
NMI backtrace for cpu 0 skipped: idling at arch_safe_halt
arch/x86/include/asm/irqflags.h:89 [inline]
NMI backtrace for cpu 0 skipped: idling at default_idle+0xb/0x10
arch/x86/kernel/process.c:716
----------------
Code disassembly (best guess):
   0: 48 89 df              mov    %rbx,%rdi
   3: e8 02 80 47 00        callq  0x47800a
   8: e9 59 fe ff ff        jmpq   0xfffffe66
   d: 48 8b 7c 24 08        mov    0x8(%rsp),%rdi
  12: e8 f3 7f 47 00        callq  0x47800a
  17: e9 61 fd ff ff        jmpq   0xfffffd7d
  1c: cc                    int3
  1d: cc                    int3
  1e: cc                    int3
  1f: cc                    int3
  20: cc                    int3
  21: cc                    int3
  22: cc                    int3
  23: cc                    int3
  24: cc                    int3
  25: cc                    int3
  26: cc                    int3
  27: cc                    int3
  28: cc                    int3
  29: cc                    int3
* 2a: 65 8b 05 29 d0 8b 7e mov    %gs:0x7e8bd029(%rip),%eax        #
0x7e8bd05a <-- trapping instruction
  31: 89 c2                mov    %eax,%edx
  33: 81 e2 00 01 00 00    and    $0x100,%edx
  39: a9 00 01 ff 00        test   $0xff0100,%eax
  3e: 74 10                je     0x50%

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-07 11:50 INFO: task hung in io_uring_cancel_generic Hao Sun
@ 2021-09-07 19:30 ` Jens Axboe
  2021-09-07 21:31   ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2021-09-07 19:30 UTC (permalink / raw)
  To: Hao Sun, io-uring; +Cc: Pavel Begunkov, linux-kernel

On 9/7/21 5:50 AM, Hao Sun wrote:
> Hello,
> 
> When using Healer to fuzz the latest Linux kernel, the following crash
> was triggered.
> 
> HEAD commit: 7d2a07b76933 Linux 5.14
> git tree: upstream
> console output:
> https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
> kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
> Similar report:
> https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ
> 
> Sorry, I don't have a reproducer for this crash, hope the symbolized
> report can help.
> If you fix this issue, please add the following tag to the commit:
> Reported-by: Hao Sun <[email protected]>

Would be great with a reproducer for this one, though...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-07 19:30 ` Jens Axboe
@ 2021-09-07 21:31   ` Pavel Begunkov
  2021-09-08  1:01     ` Hao Sun
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-07 21:31 UTC (permalink / raw)
  To: Jens Axboe, Hao Sun, io-uring; +Cc: linux-kernel

On 9/7/21 8:30 PM, Jens Axboe wrote:
> On 9/7/21 5:50 AM, Hao Sun wrote:
>> Hello,
>>
>> When using Healer to fuzz the latest Linux kernel, the following crash
>> was triggered.
>>
>> HEAD commit: 7d2a07b76933 Linux 5.14
>> git tree: upstream
>> console output:
>> https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
>> kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
>> Similar report:
>> https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ
>>
>> Sorry, I don't have a reproducer for this crash, hope the symbolized
>> report can help.
>> If you fix this issue, please add the following tag to the commit:
>> Reported-by: Hao Sun <[email protected]>
> 
> Would be great with a reproducer for this one, though...

And syzbot usually sends an execution log with all syz programs
it run, which may be helpful. Any chance you have anything similar
left?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-07 21:31   ` Pavel Begunkov
@ 2021-09-08  1:01     ` Hao Sun
  2021-09-08  6:57       ` Hao Sun
  0 siblings, 1 reply; 11+ messages in thread
From: Hao Sun @ 2021-09-08  1:01 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, linux-kernel

Pavel Begunkov <[email protected]> 于2021年9月8日周三 上午5:31写道：
>
> On 9/7/21 8:30 PM, Jens Axboe wrote:
> > On 9/7/21 5:50 AM, Hao Sun wrote:
> >> Hello,
> >>
> >> When using Healer to fuzz the latest Linux kernel, the following crash
> >> was triggered.
> >>
> >> HEAD commit: 7d2a07b76933 Linux 5.14
> >> git tree: upstream
> >> console output:
> >> https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
> >> kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
> >> Similar report:
> >> https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ
> >>
> >> Sorry, I don't have a reproducer for this crash, hope the symbolized
> >> report can help.
> >> If you fix this issue, please add the following tag to the commit:
> >> Reported-by: Hao Sun <[email protected]>
> >
> > Would be great with a reproducer for this one, though...
>
> And syzbot usually sends an execution log with all syz programs
> it run, which may be helpful. Any chance you have anything similar
> left?
>

Yes, found it[1]. Here is an execution history with latest 1024
executed progs before crash saved.
Hope it can help. I'll also follow this crash closely, see if Healer
can find a reproducer and send it to you once it found.

[1] https://drive.google.com/file/d/14k8qOFeyKPD4HsqOpIjud3b9jsxFSo-u/view?usp=sharing

> --
> Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-08  1:01     ` Hao Sun
@ 2021-09-08  6:57       ` Hao Sun
  2021-09-08 20:16         ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Hao Sun @ 2021-09-08  6:57 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, linux-kernel

Another similar report: INFO: task hung in io_ring_exit_work

INFO: task kworker/u9:3:1867 blocked for more than 143 seconds.
      Not tainted 5.14.0+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u9:3    state:D stack:12344 pid: 1867 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0x323/0xae0 kernel/sched/core.c:6287
 schedule+0x36/0xe0 kernel/sched/core.c:6366
 schedule_timeout+0x189/0x430 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0xb4/0x110 kernel/sched/completion.c:138
 io_ring_exit_work+0x287/0x850 fs/io_uring.c:9297
 process_one_work+0x359/0x850 kernel/workqueue.c:2297
 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444
 kthread+0x178/0x1b0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
INFO: task kworker/u9:4:2050 blocked for more than 143 seconds.
      Not tainted 5.14.0+ #13
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u9:4    state:D stack:12232 pid: 2050 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0x323/0xae0 kernel/sched/core.c:6287
 schedule+0x36/0xe0 kernel/sched/core.c:6366
 schedule_timeout+0x189/0x430 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0xb4/0x110 kernel/sched/completion.c:138
 io_ring_exit_work+0x287/0x850 fs/io_uring.c:9297
 process_one_work+0x359/0x850 kernel/workqueue.c:2297
 worker_thread+0x41/0x4d0 kernel/workqueue.c:2444
 kthread+0x178/0x1b0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
1 lock held by khungtaskd/1340:
 #0: ffffffff85a1d3a0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0xe/0x1a0 kernel/locking/lockdep.c:6446
2 locks held by kworker/u9:3/1867:
 #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:633 [inline]
 #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
 #1: ffffc90003e67e70
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: set_work_data
kernel/workqueue.c:633 [inline]
 #1: ffffc90003e67e70
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #1: ffffc90003e67e70
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
2 locks held by kworker/u9:4/2050:
 #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:633 [inline]
 #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
 #1: ffffc90004327e70
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: set_work_data
kernel/workqueue.c:633 [inline]
 #1: ffffc90004327e70
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #1: ffffc90004327e70
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
1 lock held by in:imklog/8089:
 #0: ffff88800ece7ef0 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0x55/0x60 fs/file.c:990

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1340 Comm: khungtaskd Not tainted 5.14.0+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:105
 nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0x4e1/0x980 kernel/hung_task.c:295
 kthread+0x178/0x1b0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 0 to CPUs 1-3:
NMI backtrace for cpu 1
CPU: 1 PID: 4934 Comm: systemd-journal Not tainted 5.14.0+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:lockdep_enabled kernel/locking/lockdep.c:91 [inline]
RIP: 0010:lock_acquire+0x1a4/0x340 kernel/locking/lockdep.c:5601
Code: 04 25 28 00 00 00 0f 85 b5 01 00 00 48 83 c4 38 5b 5d 41 5c 41
5d 41 5e 41 5f c3 65 48 8b 04 25 40 70 01 00 8b 80 f4 09 00 00 <85> c0
0f 85 c5 fe ff ff 9c 8f 04 24 fa 48 c7 c7 a0 36 5d 85 e8 e3
RSP: 0018:ffffc9000086fe18 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffffffff852a8b94 RDI: ffffffff853cae8e
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85a1d3a0
R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f940a97b8c0(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9407aa3000 CR3: 000000000f49a000 CR4: 0000000000750ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
 rcu_read_lock include/linux/rcupdate.h:687 [inline]
 dput+0x72/0x650 fs/dcache.c:874
 path_put+0x12/0x20 fs/namei.c:557
 do_faccessat+0x1fa/0x370 fs/open.c:455
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f9409c379c7
Code: 83 c4 08 48 3d 01 f0 ff ff 73 01 c3 48 8b 0d c8 d4 2b 00 f7 d8
64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 b8 15 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d a1 d4 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffeead2d338 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ffeead30250 RCX: 00007f9409c379c7
RDX: 00007f940a6a8a00 RSI: 0000000000000000 RDI: 000055d9e89aa9a3
RBP: 00007ffeead2d370 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffeead30250 R15: 00007ffeead2d860
NMI backtrace for cpu 2 skipped: idling at native_safe_halt
arch/x86/include/asm/irqflags.h:51 [inline]
NMI backtrace for cpu 2 skipped: idling at arch_safe_halt
arch/x86/include/asm/irqflags.h:89 [inline]
NMI backtrace for cpu 2 skipped: idling at default_idle+0xb/0x10
arch/x86/kernel/process.c:716
NMI backtrace for cpu 3
CPU: 3 PID: 8091 Comm: rs:main Q:Reg Not tainted 5.14.0+ #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6
f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa
4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
RSP: 0018:ffffc90007a23ba8 EFLAGS: 00010216
RAX: 0000000000000000 RBX: ffff88810cd3ec78 RCX: 0000000000000c85
RDX: 0000000000000d05 RSI: 0000000000000000 RDI: ffff888107f7b37b
RBP: 0000000000001000 R08: 0000000000000000 R09: ffff888107f7b2fb
R10: ffffc90007a23978 R11: 0000000000000005 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000001000 R15: ffffea00041fdec0
FS:  00007fd86dba4700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d6e0599858 CR3: 0000000012b38000 CR4: 0000000000750ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 memset include/linux/fortify-string.h:175 [inline]
 zero_user_segments include/linux/highmem.h:211 [inline]
 ext4_block_write_begin+0x63e/0x980 fs/ext4/inode.c:1089
 ext4_da_write_begin+0x275/0x610 fs/ext4/inode.c:3021
 generic_perform_write+0xce/0x220 mm/filemap.c:3770
 ext4_buffered_write_iter+0xd6/0x190 fs/ext4/file.c:269
 ext4_file_write_iter+0x80/0x940 fs/ext4/file.c:680
 call_write_iter include/linux/fs.h:2163 [inline]
 new_sync_write+0x18d/0x260 fs/read_write.c:507
 vfs_write+0x43b/0x4a0 fs/read_write.c:594
 ksys_write+0xd2/0x120 fs/read_write.c:647
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8705e81cd
Code: c2 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31
c3 48 83 ec 08 e8 ae fc ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b
3c 24 48 89 c2 e8 f7 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007fd86dba3590 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fd8640268f0 RCX: 00007fd8705e81cd
RDX: 00000000000004ce RSI: 00007fd8640268f0 RDI: 0000000000000008
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000293 R12: 00007fd864026670
R13: 00007fd86dba35b0 R14: 00005563e5cf4440 R15: 00000000000004ce
----------------
Code disassembly (best guess):
   0: 04 25                add    $0x25,%al
   2: 28 00                sub    %al,(%rax)
   4: 00 00                add    %al,(%rax)
   6: 0f 85 b5 01 00 00    jne    0x1c1
   c: 48 83 c4 38          add    $0x38,%rsp
  10: 5b                    pop    %rbx
  11: 5d                    pop    %rbp
  12: 41 5c                pop    %r12
  14: 41 5d                pop    %r13
  16: 41 5e                pop    %r14
  18: 41 5f                pop    %r15
  1a: c3                    retq
  1b: 65 48 8b 04 25 40 70 mov    %gs:0x17040,%rax
  22: 01 00
  24: 8b 80 f4 09 00 00    mov    0x9f4(%rax),%eax
* 2a: 85 c0                test   %eax,%eax <-- trapping instruction
  2c: 0f 85 c5 fe ff ff    jne    0xfffffef7
  32: 9c                    pushfq
  33: 8f 04 24              popq   (%rsp)
  36: fa                    cli
  37: 48 c7 c7 a0 36 5d 85 mov    $0xffffffff855d36a0,%rdi
  3e: e8                    .byte 0xe8
  3f: e3                    .byte 0xe3%

Hao Sun <[email protected]> 于2021年9月8日周三 上午9:01写道：
>
> Pavel Begunkov <[email protected]> 于2021年9月8日周三 上午5:31写道：
> >
> > On 9/7/21 8:30 PM, Jens Axboe wrote:
> > > On 9/7/21 5:50 AM, Hao Sun wrote:
> > >> Hello,
> > >>
> > >> When using Healer to fuzz the latest Linux kernel, the following crash
> > >> was triggered.
> > >>
> > >> HEAD commit: 7d2a07b76933 Linux 5.14
> > >> git tree: upstream
> > >> console output:
> > >> https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
> > >> kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
> > >> Similar report:
> > >> https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ
> > >>
> > >> Sorry, I don't have a reproducer for this crash, hope the symbolized
> > >> report can help.
> > >> If you fix this issue, please add the following tag to the commit:
> > >> Reported-by: Hao Sun <[email protected]>
> > >
> > > Would be great with a reproducer for this one, though...
> >
> > And syzbot usually sends an execution log with all syz programs
> > it run, which may be helpful. Any chance you have anything similar
> > left?
> >
>
> Yes, found it[1]. Here is an execution history with latest 1024
> executed progs before crash saved.
> Hope it can help. I'll also follow this crash closely, see if Healer
> can find a reproducer and send it to you once it found.
>
> [1] https://drive.google.com/file/d/14k8qOFeyKPD4HsqOpIjud3b9jsxFSo-u/view?usp=sharing
>
> > --
> > Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-08  6:57       ` Hao Sun
@ 2021-09-08 20:16         ` Pavel Begunkov
  2021-09-13  2:26           ` Hao Sun
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-08 20:16 UTC (permalink / raw)
  To: Hao Sun; +Cc: Jens Axboe, io-uring, linux-kernel

On 9/8/21 7:57 AM, Hao Sun wrote:
> Another similar report: INFO: task hung in io_ring_exit_work
> 
> INFO: task kworker/u9:3:1867 blocked for more than 143 seconds.
>       Not tainted 5.14.0+ #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u9:3    state:D stack:12344 pid: 1867 ppid:     2 flags:0x00004000
> Workqueue: events_unbound io_ring_exit_work

This one is different. I guess it may be difficult for syz to
find a repro because of the delay before the warning appears.
How about to reduce it to a lower value? like 2-5 seconds

echo 5 > /proc/sys/kernel/hung_task_timeout_secs

> Call Trace:
>  context_switch kernel/sched/core.c:4940 [inline]
>  __schedule+0x323/0xae0 kernel/sched/core.c:6287
>  schedule+0x36/0xe0 kernel/sched/core.c:6366
>  schedule_timeout+0x189/0x430 kernel/time/timer.c:1857
>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>  __wait_for_common kernel/sched/completion.c:106 [inline]
>  wait_for_common kernel/sched/completion.c:117 [inline]
>  wait_for_completion+0xb4/0x110 kernel/sched/completion.c:138
>  io_ring_exit_work+0x287/0x850 fs/io_uring.c:9297
>  process_one_work+0x359/0x850 kernel/workqueue.c:2297
>  worker_thread+0x41/0x4d0 kernel/workqueue.c:2444
>  kthread+0x178/0x1b0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> INFO: task kworker/u9:4:2050 blocked for more than 143 seconds.
>       Not tainted 5.14.0+ #13
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u9:4    state:D stack:12232 pid: 2050 ppid:     2 flags:0x00004000
> Workqueue: events_unbound io_ring_exit_work
> Call Trace:
>  context_switch kernel/sched/core.c:4940 [inline]
>  __schedule+0x323/0xae0 kernel/sched/core.c:6287
>  schedule+0x36/0xe0 kernel/sched/core.c:6366
>  schedule_timeout+0x189/0x430 kernel/time/timer.c:1857
>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>  __wait_for_common kernel/sched/completion.c:106 [inline]
>  wait_for_common kernel/sched/completion.c:117 [inline]
>  wait_for_completion+0xb4/0x110 kernel/sched/completion.c:138
>  io_ring_exit_work+0x287/0x850 fs/io_uring.c:9297
>  process_one_work+0x359/0x850 kernel/workqueue.c:2297
>  worker_thread+0x41/0x4d0 kernel/workqueue.c:2444
>  kthread+0x178/0x1b0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1340:
>  #0: ffffffff85a1d3a0 (rcu_read_lock){....}-{1:2}, at:
> debug_show_all_locks+0xe/0x1a0 kernel/locking/lockdep.c:6446
> 2 locks held by kworker/u9:3/1867:
>  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> set_work_data kernel/workqueue.c:633 [inline]
>  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
>  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
>  #1: ffffc90003e67e70
> ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: set_work_data
> kernel/workqueue.c:633 [inline]
>  #1: ffffc90003e67e70
> ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
>  #1: ffffc90003e67e70
> ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
> 2 locks held by kworker/u9:4/2050:
>  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> set_work_data kernel/workqueue.c:633 [inline]
>  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
>  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
>  #1: ffffc90004327e70
> ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: set_work_data
> kernel/workqueue.c:633 [inline]
>  #1: ffffc90004327e70
> ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
>  #1: ffffc90004327e70
> ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
> 1 lock held by in:imklog/8089:
>  #0: ffff88800ece7ef0 (&f->f_pos_lock){+.+.}-{3:3}, at:
> __fdget_pos+0x55/0x60 fs/file.c:990
> 
> =============================================
> 
> NMI backtrace for cpu 0
> CPU: 0 PID: 1340 Comm: khungtaskd Not tainted 5.14.0+ #13
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:88 [inline]
>  dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:105
>  nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105
>  nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
>  watchdog+0x4e1/0x980 kernel/hung_task.c:295
>  kthread+0x178/0x1b0 kernel/kthread.c:319
>  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> Sending NMI from CPU 0 to CPUs 1-3:
> NMI backtrace for cpu 1
> CPU: 1 PID: 4934 Comm: systemd-journal Not tainted 5.14.0+ #13
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> RIP: 0010:lockdep_enabled kernel/locking/lockdep.c:91 [inline]
> RIP: 0010:lock_acquire+0x1a4/0x340 kernel/locking/lockdep.c:5601
> Code: 04 25 28 00 00 00 0f 85 b5 01 00 00 48 83 c4 38 5b 5d 41 5c 41
> 5d 41 5e 41 5f c3 65 48 8b 04 25 40 70 01 00 8b 80 f4 09 00 00 <85> c0
> 0f 85 c5 fe ff ff 9c 8f 04 24 fa 48 c7 c7 a0 36 5d 85 e8 e3
> RSP: 0018:ffffc9000086fe18 EFLAGS: 00000246
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: ffffffff852a8b94 RDI: ffffffff853cae8e
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85a1d3a0
> R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
> FS:  00007f940a97b8c0(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f9407aa3000 CR3: 000000000f49a000 CR4: 0000000000750ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
>  rcu_read_lock include/linux/rcupdate.h:687 [inline]
>  dput+0x72/0x650 fs/dcache.c:874
>  path_put+0x12/0x20 fs/namei.c:557
>  do_faccessat+0x1fa/0x370 fs/open.c:455
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f9409c379c7
> Code: 83 c4 08 48 3d 01 f0 ff ff 73 01 c3 48 8b 0d c8 d4 2b 00 f7 d8
> 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 b8 15 00 00 00 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d a1 d4 2b 00 f7 d8 64 89 01 48
> RSP: 002b:00007ffeead2d338 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> RAX: ffffffffffffffda RBX: 00007ffeead30250 RCX: 00007f9409c379c7
> RDX: 00007f940a6a8a00 RSI: 0000000000000000 RDI: 000055d9e89aa9a3
> RBP: 00007ffeead2d370 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007ffeead30250 R15: 00007ffeead2d860
> NMI backtrace for cpu 2 skipped: idling at native_safe_halt
> arch/x86/include/asm/irqflags.h:51 [inline]
> NMI backtrace for cpu 2 skipped: idling at arch_safe_halt
> arch/x86/include/asm/irqflags.h:89 [inline]
> NMI backtrace for cpu 2 skipped: idling at default_idle+0xb/0x10
> arch/x86/kernel/process.c:716
> NMI backtrace for cpu 3
> CPU: 3 PID: 8091 Comm: rs:main Q:Reg Not tainted 5.14.0+ #13
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
> Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6
> f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa
> 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> RSP: 0018:ffffc90007a23ba8 EFLAGS: 00010216
> RAX: 0000000000000000 RBX: ffff88810cd3ec78 RCX: 0000000000000c85
> RDX: 0000000000000d05 RSI: 0000000000000000 RDI: ffff888107f7b37b
> RBP: 0000000000001000 R08: 0000000000000000 R09: ffff888107f7b2fb
> R10: ffffc90007a23978 R11: 0000000000000005 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000001000 R15: ffffea00041fdec0
> FS:  00007fd86dba4700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055d6e0599858 CR3: 0000000012b38000 CR4: 0000000000750ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  memset include/linux/fortify-string.h:175 [inline]
>  zero_user_segments include/linux/highmem.h:211 [inline]
>  ext4_block_write_begin+0x63e/0x980 fs/ext4/inode.c:1089
>  ext4_da_write_begin+0x275/0x610 fs/ext4/inode.c:3021
>  generic_perform_write+0xce/0x220 mm/filemap.c:3770
>  ext4_buffered_write_iter+0xd6/0x190 fs/ext4/file.c:269
>  ext4_file_write_iter+0x80/0x940 fs/ext4/file.c:680
>  call_write_iter include/linux/fs.h:2163 [inline]
>  new_sync_write+0x18d/0x260 fs/read_write.c:507
>  vfs_write+0x43b/0x4a0 fs/read_write.c:594
>  ksys_write+0xd2/0x120 fs/read_write.c:647
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7fd8705e81cd
> Code: c2 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31
> c3 48 83 ec 08 e8 ae fc ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b
> 3c 24 48 89 c2 e8 f7 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> RSP: 002b:00007fd86dba3590 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 00007fd8640268f0 RCX: 00007fd8705e81cd
> RDX: 00000000000004ce RSI: 00007fd8640268f0 RDI: 0000000000000008
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000293 R12: 00007fd864026670
> R13: 00007fd86dba35b0 R14: 00005563e5cf4440 R15: 00000000000004ce
> ----------------
> Code disassembly (best guess):
>    0: 04 25                add    $0x25,%al
>    2: 28 00                sub    %al,(%rax)
>    4: 00 00                add    %al,(%rax)
>    6: 0f 85 b5 01 00 00    jne    0x1c1
>    c: 48 83 c4 38          add    $0x38,%rsp
>   10: 5b                    pop    %rbx
>   11: 5d                    pop    %rbp
>   12: 41 5c                pop    %r12
>   14: 41 5d                pop    %r13
>   16: 41 5e                pop    %r14
>   18: 41 5f                pop    %r15
>   1a: c3                    retq
>   1b: 65 48 8b 04 25 40 70 mov    %gs:0x17040,%rax
>   22: 01 00
>   24: 8b 80 f4 09 00 00    mov    0x9f4(%rax),%eax
> * 2a: 85 c0                test   %eax,%eax <-- trapping instruction
>   2c: 0f 85 c5 fe ff ff    jne    0xfffffef7
>   32: 9c                    pushfq
>   33: 8f 04 24              popq   (%rsp)
>   36: fa                    cli
>   37: 48 c7 c7 a0 36 5d 85 mov    $0xffffffff855d36a0,%rdi
>   3e: e8                    .byte 0xe8
>   3f: e3                    .byte 0xe3%
> 
> Hao Sun <[email protected]> 于2021年9月8日周三 上午9:01写道：
>>
>> Pavel Begunkov <[email protected]> 于2021年9月8日周三 上午5:31写道：
>>>
>>> On 9/7/21 8:30 PM, Jens Axboe wrote:
>>>> On 9/7/21 5:50 AM, Hao Sun wrote:
>>>>> Hello,
>>>>>
>>>>> When using Healer to fuzz the latest Linux kernel, the following crash
>>>>> was triggered.
>>>>>
>>>>> HEAD commit: 7d2a07b76933 Linux 5.14
>>>>> git tree: upstream
>>>>> console output:
>>>>> https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
>>>>> kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
>>>>> Similar report:
>>>>> https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ
>>>>>
>>>>> Sorry, I don't have a reproducer for this crash, hope the symbolized
>>>>> report can help.
>>>>> If you fix this issue, please add the following tag to the commit:
>>>>> Reported-by: Hao Sun <[email protected]>
>>>>
>>>> Would be great with a reproducer for this one, though...
>>>
>>> And syzbot usually sends an execution log with all syz programs
>>> it run, which may be helpful. Any chance you have anything similar
>>> left?
>>>
>>
>> Yes, found it[1]. Here is an execution history with latest 1024
>> executed progs before crash saved.
>> Hope it can help. I'll also follow this crash closely, see if Healer
>> can find a reproducer and send it to you once it found.
>>
>> [1] https://drive.google.com/file/d/14k8qOFeyKPD4HsqOpIjud3b9jsxFSo-u/view?usp=sharing
>>
>>> --
>>> Pavel Begunkov

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-08 20:16         ` Pavel Begunkov
@ 2021-09-13  2:26           ` Hao Sun
  2021-09-13  2:50             ` Jens Axboe
  2021-09-13  8:30             ` Pavel Begunkov
  0 siblings, 2 replies; 11+ messages in thread
From: Hao Sun @ 2021-09-13  2:26 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, linux-kernel

Hi

Healer found a C reproducer for this crash ("INFO: task hung in
io_ring_exit_work").

HEAD commit: 4b93c544e90e-thunderbolt: test: split up test cases
git tree: upstream
console output:
https://drive.google.com/file/d/1NswMU2yMRTc8-EqbZcVvcJejV92cuZIk/view?usp=sharing
kernel config: https://drive.google.com/file/d/1c0u2EeRDhRO-ZCxr9MP2VvAtJd6kfg-p/view?usp=sharing
C reproducer: https://drive.google.com/file/d/170wk5_T8mYDaAtDcrdVi2UU9_dW1894s/view?usp=sharing
Syzlang reproducer:
https://drive.google.com/file/d/1eo-jAS9lncm4i-1kaCBkexrjpQHXboBq/view?usp=sharing

If you fix this issue, please add the following tag to the commit:
Reported-by: Hao Sun <[email protected]>

INFO: task kworker/u10:0:24 blocked for more than 143 seconds.
      Not tainted 5.14.0+ #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u10:0   state:D stack:27224 pid:   24 ppid:     2 flags:0x00004000
Workqueue: events_unbound io_ring_exit_work
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0xcd9/0x2530 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x5e5/0x890 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x17d/0x280 kernel/sched/completion.c:138
 io_ring_exit_work+0x530/0x1540 fs/io_uring.c:9333
 process_one_work+0x9df/0x16d0 kernel/workqueue.c:2297
 worker_thread+0x90/0xed0 kernel/workqueue.c:2444
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Showing all locks held in the system:
2 locks held by kworker/u10:0/24:
 #0: ffff888010c71138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: ffff888010c71138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
arch_atomic_long_set include/linux/atomic/atomic-long.h:41 [inline]
 #0: ffff888010c71138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
atomic_long_set include/linux/atomic/atomic-instrumented.h:1198
[inline]
 #0: ffff888010c71138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
set_work_data kernel/workqueue.c:634 [inline]
 #0: ffff888010c71138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
 #0: ffff888010c71138 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
process_one_work+0x8c3/0x16d0 kernel/workqueue.c:2268
 #1: ffffc900007efdc8
((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
process_one_work+0x8f7/0x16d0 kernel/workqueue.c:2272
1 lock held by khungtaskd/1161:
 #0: ffffffff8b97dde0 (rcu_read_lock){....}-{1:2}, at:
debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:6446
1 lock held by in:imklog/8220:
 #0: ffff88802031d4f0 (&f->f_pos_lock){+.+.}-{3:3}, at:
__fdget_pos+0xe9/0x100 fs/file.c:990

=============================================

NMI backtrace for cpu 0
CPU: 0 PID: 1161 Comm: khungtaskd Not tainted 5.14.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 nmi_cpu_backtrace.cold+0x47/0x144 lib/nmi_backtrace.c:105
 nmi_trigger_cpumask_backtrace+0x1e1/0x220 lib/nmi_backtrace.c:62
 trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
 check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
 watchdog+0xcc8/0x1010 kernel/hung_task.c:295
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Sending NMI from CPU 0 to CPUs 1-3:
NMI backtrace for cpu 1
CPU: 1 PID: 10654 Comm: syz-executor Not tainted 5.14.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:__sanitizer_cov_trace_pc+0x1c/0x40 kernel/kcov.c:197
Code: 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 65 48 8b 0c 25 40 f0
01 00 bf 02 00 00 00 48 89 ce 4c 8b 04 24 e8 76 ff ff ff 84 c0 <74> 20
48 8b 91 20 15 00 00 8b 89 1c 15 00 00 48 8b 02 48 83 c0 01
RSP: 0018:ffffc900085df560 EFLAGS: 00000202
RAX: 0000000000000001 RBX: 0000000000000002 RCX: ffff888011b08000
RDX: 0000000000000000 RSI: ffff888011b08000 RDI: 0000000000000002
RBP: ffffc900085df728 R08: ffffffff88b6a994 R09: 0000000000000002
R10: 0000000000000005 R11: ffffed1002361000 R12: ffff88802d8fa100
R13: ffffc900085df620 R14: 00000000fffffe00 R15: 0000000000000002
FS:  00007f7828b86700(0000) GS:ffff888135c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3570c3c0a0 CR3: 0000000106990000 CR4: 0000000000350ee0
Call Trace:
 p9_client_rpc+0x3d4/0x11c0 net/9p/client.c:759
 p9_client_flush+0xe9/0x1e0 net/9p/client.c:667
 p9_client_rpc+0x107e/0x11c0 net/9p/client.c:784
 p9_client_version net/9p/client.c:955 [inline]
 p9_client_create+0xbe0/0x13e0 net/9p/client.c:1055
 v9fs_session_init+0x1e4/0x1900 fs/9p/v9fs.c:406
 v9fs_mount+0x73/0x9d0 fs/9p/vfs_super.c:126
 legacy_get_tree+0x105/0x220 fs/fs_context.c:610
 vfs_get_tree+0x89/0x2f0 fs/super.c:1498
 do_new_mount fs/namespace.c:2988 [inline]
 path_mount+0x1228/0x1cb0 fs/namespace.c:3318
 do_mount+0xf3/0x110 fs/namespace.c:3331
 __do_sys_mount fs/namespace.c:3539 [inline]
 __se_sys_mount fs/namespace.c:3516 [inline]
 __x64_sys_mount+0x18f/0x230 fs/namespace.c:3516
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4739cd
Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48
89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f7828b85c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 000000000059c0a0 RCX: 00000000004739cd
RDX: 0000000020000380 RSI: 0000000020000340 RDI: 0000000000000000
RBP: 00000000004ebd80 R08: 00000000200003c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000059c0a0
R13: 00007ffd0845607f R14: 00007ffd08456220 R15: 00007f7828b85dc0
NMI backtrace for cpu 3
CPU: 3 PID: 4948 Comm: systemd-journal Not tainted 5.14.0+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:lookup_chain_cache kernel/locking/lockdep.c:3682 [inline]
RIP: 0010:lookup_chain_cache_add kernel/locking/lockdep.c:3702 [inline]
RIP: 0010:validate_chain kernel/locking/lockdep.c:3757 [inline]
RIP: 0010:__lock_acquire+0x1877/0x57e0 kernel/locking/lockdep.c:5015
Code: e5 40 08 c7 8f 48 89 5c 24 08 0f 84 6f 04 00 00 4a 8d 14 e5 40
08 c7 8f 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 <0f> 85
66 2f 00 00 4a 8b 1c e5 40 08 c7 8f 48 85 db 74 58 48 83 eb
RSP: 0018:ffffc9000107f868 EFLAGS: 00000046
RAX: dffffc0000000000 RBX: ffffffff8fc8e4b0 RCX: ffffffff815b148b
RDX: 1ffffffff1f91c96 RSI: 0000000000000008 RDI: ffffffff8fd008f8
RBP: ffff888013f4dfa0 R08: 0000000000000001 R09: fffffbfff1fa0120
R10: ffffffff8fd008ff R11: fffffbfff1fa011f R12: 0000000000003b8e
R13: ffff888013f4d580 R14: d03c86651a37c6c3 R15: 0000000000000000
FS:  00007f35737908c0(0000) GS:ffff888135d00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3570caf000 CR3: 000000001b4d1000 CR4: 0000000000350ee0
Call Trace:
 lock_acquire kernel/locking/lockdep.c:5625 [inline]
 lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5590
 seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline]
 set_root+0x3be/0x560 fs/namei.c:940
 nd_jump_root+0x38d/0x520 fs/namei.c:961
 path_init+0xf81/0x1700 fs/namei.c:2359
 path_lookupat.isra.0+0x30/0x580 fs/namei.c:2439
 __filename_lookup+0x1ca/0x410 fs/namei.c:2478
 filename_lookup fs/namei.c:2494 [inline]
 user_path_at_empty+0x42/0x60 fs/namei.c:2801
 user_path_at include/linux/namei.h:57 [inline]
 do_faccessat+0x127/0x850 fs/open.c:421
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f3572a4c9c7
Code: 83 c4 08 48 3d 01 f0 ff ff 73 01 c3 48 8b 0d c8 d4 2b 00 f7 d8
64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 b8 15 00 00 00 0f 05 <48> 3d
01 f0 ff ff 73 01 c3 48 8b 0d a1 d4 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcba7c9ca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
RAX: ffffffffffffffda RBX: 00007ffcba7cccd0 RCX: 00007f3572a4c9c7
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00005623bd6869a3
RBP: 00007ffcba7c9df0 R08: 00005623bd67c3e5 R09: 0000000000000018
R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00005623bd7558a0 R15: 00007ffcba7ca2e0
NMI backtrace for cpu 2 skipped: idling at native_safe_halt
arch/x86/include/asm/irqflags.h:51 [inline]
NMI backtrace for cpu 2 skipped: idling at arch_safe_halt
arch/x86/include/asm/irqflags.h:89 [inline]
NMI backtrace for cpu 2 skipped: idling at default_idle+0xb/0x10
arch/x86/kernel/process.c:716
----------------
Code disassembly (best guess):
   0: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
   7: 00 00 00 00
   b: 0f 1f 00              nopl   (%rax)
   e: 65 48 8b 0c 25 40 f0 mov    %gs:0x1f040,%rcx
  15: 01 00
  17: bf 02 00 00 00        mov    $0x2,%edi
  1c: 48 89 ce              mov    %rcx,%rsi
  1f: 4c 8b 04 24          mov    (%rsp),%r8
  23: e8 76 ff ff ff        callq  0xffffff9e
  28: 84 c0                test   %al,%al
* 2a: 74 20                je     0x4c <-- trapping instruction
  2c: 48 8b 91 20 15 00 00 mov    0x1520(%rcx),%rdx
  33: 8b 89 1c 15 00 00    mov    0x151c(%rcx),%ecx
  39: 48 8b 02              mov    (%rdx),%rax
  3c: 48 83 c0 01          add    $0x1,%rax%

Pavel Begunkov <[email protected]> 于2021年9月9日周四 上午4:16写道：

>
> On 9/8/21 7:57 AM, Hao Sun wrote:
> > Another similar report: INFO: task hung in io_ring_exit_work
> >
> > INFO: task kworker/u9:3:1867 blocked for more than 143 seconds.
> >       Not tainted 5.14.0+ #13
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/u9:3    state:D stack:12344 pid: 1867 ppid:     2 flags:0x00004000
> > Workqueue: events_unbound io_ring_exit_work
>
> This one is different. I guess it may be difficult for syz to
> find a repro because of the delay before the warning appears.
> How about to reduce it to a lower value? like 2-5 seconds
>
> echo 5 > /proc/sys/kernel/hung_task_timeout_secs
>
> > Call Trace:
> >  context_switch kernel/sched/core.c:4940 [inline]
> >  __schedule+0x323/0xae0 kernel/sched/core.c:6287
> >  schedule+0x36/0xe0 kernel/sched/core.c:6366
> >  schedule_timeout+0x189/0x430 kernel/time/timer.c:1857
> >  do_wait_for_common kernel/sched/completion.c:85 [inline]
> >  __wait_for_common kernel/sched/completion.c:106 [inline]
> >  wait_for_common kernel/sched/completion.c:117 [inline]
> >  wait_for_completion+0xb4/0x110 kernel/sched/completion.c:138
> >  io_ring_exit_work+0x287/0x850 fs/io_uring.c:9297
> >  process_one_work+0x359/0x850 kernel/workqueue.c:2297
> >  worker_thread+0x41/0x4d0 kernel/workqueue.c:2444
> >  kthread+0x178/0x1b0 kernel/kthread.c:319
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > INFO: task kworker/u9:4:2050 blocked for more than 143 seconds.
> >       Not tainted 5.14.0+ #13
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:kworker/u9:4    state:D stack:12232 pid: 2050 ppid:     2 flags:0x00004000
> > Workqueue: events_unbound io_ring_exit_work
> > Call Trace:
> >  context_switch kernel/sched/core.c:4940 [inline]
> >  __schedule+0x323/0xae0 kernel/sched/core.c:6287
> >  schedule+0x36/0xe0 kernel/sched/core.c:6366
> >  schedule_timeout+0x189/0x430 kernel/time/timer.c:1857
> >  do_wait_for_common kernel/sched/completion.c:85 [inline]
> >  __wait_for_common kernel/sched/completion.c:106 [inline]
> >  wait_for_common kernel/sched/completion.c:117 [inline]
> >  wait_for_completion+0xb4/0x110 kernel/sched/completion.c:138
> >  io_ring_exit_work+0x287/0x850 fs/io_uring.c:9297
> >  process_one_work+0x359/0x850 kernel/workqueue.c:2297
> >  worker_thread+0x41/0x4d0 kernel/workqueue.c:2444
> >  kthread+0x178/0x1b0 kernel/kthread.c:319
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> >
> > Showing all locks held in the system:
> > 1 lock held by khungtaskd/1340:
> >  #0: ffffffff85a1d3a0 (rcu_read_lock){....}-{1:2}, at:
> > debug_show_all_locks+0xe/0x1a0 kernel/locking/lockdep.c:6446
> > 2 locks held by kworker/u9:3/1867:
> >  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> > set_work_data kernel/workqueue.c:633 [inline]
> >  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> > set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
> >  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> > process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
> >  #1: ffffc90003e67e70
> > ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: set_work_data
> > kernel/workqueue.c:633 [inline]
> >  #1: ffffc90003e67e70
> > ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> > set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
> >  #1: ffffc90003e67e70
> > ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> > process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
> > 2 locks held by kworker/u9:4/2050:
> >  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> > set_work_data kernel/workqueue.c:633 [inline]
> >  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> > set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
> >  #0: ffff888008c5c938 ((wq_completion)events_unbound){+.+.}-{0:0}, at:
> > process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
> >  #1: ffffc90004327e70
> > ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: set_work_data
> > kernel/workqueue.c:633 [inline]
> >  #1: ffffc90004327e70
> > ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> > set_work_pool_and_clear_pending kernel/workqueue.c:661 [inline]
> >  #1: ffffc90004327e70
> > ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at:
> > process_one_work+0x2a0/0x850 kernel/workqueue.c:2268
> > 1 lock held by in:imklog/8089:
> >  #0: ffff88800ece7ef0 (&f->f_pos_lock){+.+.}-{3:3}, at:
> > __fdget_pos+0x55/0x60 fs/file.c:990
> >
> > =============================================
> >
> > NMI backtrace for cpu 0
> > CPU: 0 PID: 1340 Comm: khungtaskd Not tainted 5.14.0+ #13
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> >  __dump_stack lib/dump_stack.c:88 [inline]
> >  dump_stack_lvl+0x8d/0xcf lib/dump_stack.c:105
> >  nmi_cpu_backtrace+0x1e9/0x210 lib/nmi_backtrace.c:105
> >  nmi_trigger_cpumask_backtrace+0x120/0x180 lib/nmi_backtrace.c:62
> >  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
> >  check_hung_uninterruptible_tasks kernel/hung_task.c:210 [inline]
> >  watchdog+0x4e1/0x980 kernel/hung_task.c:295
> >  kthread+0x178/0x1b0 kernel/kthread.c:319
> >  ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> > Sending NMI from CPU 0 to CPUs 1-3:
> > NMI backtrace for cpu 1
> > CPU: 1 PID: 4934 Comm: systemd-journal Not tainted 5.14.0+ #13
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:lockdep_enabled kernel/locking/lockdep.c:91 [inline]
> > RIP: 0010:lock_acquire+0x1a4/0x340 kernel/locking/lockdep.c:5601
> > Code: 04 25 28 00 00 00 0f 85 b5 01 00 00 48 83 c4 38 5b 5d 41 5c 41
> > 5d 41 5e 41 5f c3 65 48 8b 04 25 40 70 01 00 8b 80 f4 09 00 00 <85> c0
> > 0f 85 c5 fe ff ff 9c 8f 04 24 fa 48 c7 c7 a0 36 5d 85 e8 e3
> > RSP: 0018:ffffc9000086fe18 EFLAGS: 00000246
> > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > RDX: 0000000000000001 RSI: ffffffff852a8b94 RDI: ffffffff853cae8e
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85a1d3a0
> > R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
> > FS:  00007f940a97b8c0(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00007f9407aa3000 CR3: 000000000f49a000 CR4: 0000000000750ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> >  rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
> >  rcu_read_lock include/linux/rcupdate.h:687 [inline]
> >  dput+0x72/0x650 fs/dcache.c:874
> >  path_put+0x12/0x20 fs/namei.c:557
> >  do_faccessat+0x1fa/0x370 fs/open.c:455
> >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >  do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7f9409c379c7
> > Code: 83 c4 08 48 3d 01 f0 ff ff 73 01 c3 48 8b 0d c8 d4 2b 00 f7 d8
> > 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 b8 15 00 00 00 0f 05 <48> 3d
> > 01 f0 ff ff 73 01 c3 48 8b 0d a1 d4 2b 00 f7 d8 64 89 01 48
> > RSP: 002b:00007ffeead2d338 EFLAGS: 00000246 ORIG_RAX: 0000000000000015
> > RAX: ffffffffffffffda RBX: 00007ffeead30250 RCX: 00007f9409c379c7
> > RDX: 00007f940a6a8a00 RSI: 0000000000000000 RDI: 000055d9e89aa9a3
> > RBP: 00007ffeead2d370 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000069 R11: 0000000000000246 R12: 0000000000000000
> > R13: 0000000000000000 R14: 00007ffeead30250 R15: 00007ffeead2d860
> > NMI backtrace for cpu 2 skipped: idling at native_safe_halt
> > arch/x86/include/asm/irqflags.h:51 [inline]
> > NMI backtrace for cpu 2 skipped: idling at arch_safe_halt
> > arch/x86/include/asm/irqflags.h:89 [inline]
> > NMI backtrace for cpu 2 skipped: idling at default_idle+0xb/0x10
> > arch/x86/kernel/process.c:716
> > NMI backtrace for cpu 3
> > CPU: 3 PID: 8091 Comm: rs:main Q:Reg Not tainted 5.14.0+ #13
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> > RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
> > Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6
> > f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa
> > 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
> > RSP: 0018:ffffc90007a23ba8 EFLAGS: 00010216
> > RAX: 0000000000000000 RBX: ffff88810cd3ec78 RCX: 0000000000000c85
> > RDX: 0000000000000d05 RSI: 0000000000000000 RDI: ffff888107f7b37b
> > RBP: 0000000000001000 R08: 0000000000000000 R09: ffff888107f7b2fb
> > R10: ffffc90007a23978 R11: 0000000000000005 R12: 0000000000000000
> > R13: 0000000000000000 R14: 0000000000001000 R15: ffffea00041fdec0
> > FS:  00007fd86dba4700(0000) GS:ffff88813dd00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 000055d6e0599858 CR3: 0000000012b38000 CR4: 0000000000750ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> >  memset include/linux/fortify-string.h:175 [inline]
> >  zero_user_segments include/linux/highmem.h:211 [inline]
> >  ext4_block_write_begin+0x63e/0x980 fs/ext4/inode.c:1089
> >  ext4_da_write_begin+0x275/0x610 fs/ext4/inode.c:3021
> >  generic_perform_write+0xce/0x220 mm/filemap.c:3770
> >  ext4_buffered_write_iter+0xd6/0x190 fs/ext4/file.c:269
> >  ext4_file_write_iter+0x80/0x940 fs/ext4/file.c:680
> >  call_write_iter include/linux/fs.h:2163 [inline]
> >  new_sync_write+0x18d/0x260 fs/read_write.c:507
> >  vfs_write+0x43b/0x4a0 fs/read_write.c:594
> >  ksys_write+0xd2/0x120 fs/read_write.c:647
> >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >  do_syscall_64+0x34/0xb0 arch/x86/entry/common.c:80
> >  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > RIP: 0033:0x7fd8705e81cd
> > Code: c2 20 00 00 75 10 b8 01 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31
> > c3 48 83 ec 08 e8 ae fc ff ff 48 89 04 24 b8 01 00 00 00 0f 05 <48> 8b
> > 3c 24 48 89 c2 e8 f7 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
> > RSP: 002b:00007fd86dba3590 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
> > RAX: ffffffffffffffda RBX: 00007fd8640268f0 RCX: 00007fd8705e81cd
> > RDX: 00000000000004ce RSI: 00007fd8640268f0 RDI: 0000000000000008
> > RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000293 R12: 00007fd864026670
> > R13: 00007fd86dba35b0 R14: 00005563e5cf4440 R15: 00000000000004ce
> > ----------------
> > Code disassembly (best guess):
> >    0: 04 25                add    $0x25,%al
> >    2: 28 00                sub    %al,(%rax)
> >    4: 00 00                add    %al,(%rax)
> >    6: 0f 85 b5 01 00 00    jne    0x1c1
> >    c: 48 83 c4 38          add    $0x38,%rsp
> >   10: 5b                    pop    %rbx
> >   11: 5d                    pop    %rbp
> >   12: 41 5c                pop    %r12
> >   14: 41 5d                pop    %r13
> >   16: 41 5e                pop    %r14
> >   18: 41 5f                pop    %r15
> >   1a: c3                    retq
> >   1b: 65 48 8b 04 25 40 70 mov    %gs:0x17040,%rax
> >   22: 01 00
> >   24: 8b 80 f4 09 00 00    mov    0x9f4(%rax),%eax
> > * 2a: 85 c0                test   %eax,%eax <-- trapping instruction
> >   2c: 0f 85 c5 fe ff ff    jne    0xfffffef7
> >   32: 9c                    pushfq
> >   33: 8f 04 24              popq   (%rsp)
> >   36: fa                    cli
> >   37: 48 c7 c7 a0 36 5d 85 mov    $0xffffffff855d36a0,%rdi
> >   3e: e8                    .byte 0xe8
> >   3f: e3                    .byte 0xe3%
> >
> > Hao Sun <[email protected]> 于2021年9月8日周三 上午9:01写道：
> >>
> >> Pavel Begunkov <[email protected]> 于2021年9月8日周三 上午5:31写道：
> >>>
> >>> On 9/7/21 8:30 PM, Jens Axboe wrote:
> >>>> On 9/7/21 5:50 AM, Hao Sun wrote:
> >>>>> Hello,
> >>>>>
> >>>>> When using Healer to fuzz the latest Linux kernel, the following crash
> >>>>> was triggered.
> >>>>>
> >>>>> HEAD commit: 7d2a07b76933 Linux 5.14
> >>>>> git tree: upstream
> >>>>> console output:
> >>>>> https://drive.google.com/file/d/1c8uRooM0TwJiTIwEviOCB4RC-hhOgGHR/view?usp=sharing
> >>>>> kernel config: https://drive.google.com/file/d/1XD9WYDViQLSXN7RGwH8AGGDvP9JvOghx/view?usp=sharing
> >>>>> Similar report:
> >>>>> https://groups.google.com/u/1/g/syzkaller-bugs/c/FvdcTiJIGtY/m/PcXkoenUAAAJ
> >>>>>
> >>>>> Sorry, I don't have a reproducer for this crash, hope the symbolized
> >>>>> report can help.
> >>>>> If you fix this issue, please add the following tag to the commit:
> >>>>> Reported-by: Hao Sun <[email protected]>
> >>>>
> >>>> Would be great with a reproducer for this one, though...
> >>>
> >>> And syzbot usually sends an execution log with all syz programs
> >>> it run, which may be helpful. Any chance you have anything similar
> >>> left?
> >>>
> >>
> >> Yes, found it[1]. Here is an execution history with latest 1024
> >> executed progs before crash saved.
> >> Hope it can help. I'll also follow this crash closely, see if Healer
> >> can find a reproducer and send it to you once it found.
> >>
> >> [1] https://drive.google.com/file/d/14k8qOFeyKPD4HsqOpIjud3b9jsxFSo-u/view?usp=sharing
> >>
> >>> --
> >>> Pavel Begunkov
>
> --
> Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-13  2:26           ` Hao Sun
@ 2021-09-13  2:50             ` Jens Axboe
  2021-09-13  3:21               ` Hao Sun
  2021-09-13  8:30             ` Pavel Begunkov
  1 sibling, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2021-09-13  2:50 UTC (permalink / raw)
  To: Hao Sun, Pavel Begunkov; +Cc: io-uring, linux-kernel

On 9/12/21 8:26 PM, Hao Sun wrote:
> Hi
> 
> Healer found a C reproducer for this crash ("INFO: task hung in
> io_ring_exit_work").
> 
> HEAD commit: 4b93c544e90e-thunderbolt: test: split up test cases

Does this reproduce on 5.15-rc1? We had a few hang cases fixed for io-wq
since that commit 6 days ago.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-13  2:50             ` Jens Axboe
@ 2021-09-13  3:21               ` Hao Sun
  0 siblings, 0 replies; 11+ messages in thread
From: Hao Sun @ 2021-09-13  3:21 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-kernel

> Jens Axboe <[email protected]> 于2021年9月13日周一 上午10:50写道：
>
> On 9/12/21 8:26 PM, Hao Sun wrote:
> > Hi
> >
> > Healer found a C reproducer for this crash ("INFO: task hung in
> > io_ring_exit_work").
> >
> > HEAD commit: 4b93c544e90e-thunderbolt: test: split up test cases
>
> Does this reproduce on 5.15-rc1? We had a few hang cases fixed for io-wq
> since that commit 6 days ago.

Just tried it. No, it did not crash the kernel on 5.15-rc1.
Following log was printed repeatedly:
[  647.478557][ T6807]  loop0: p1 p2 < > p3 p4
[  647.478922][ T6807] loop0: p1 size 11290111 extends beyond EOD, truncated
[  647.481111][ T6807] loop0: p3 size 1914664839 extends beyond EOD, truncated
[  647.482512][ T6807] loop0: p4 size 3389030400 extends beyond EOD, truncated

It seems that this crash was fixed. Sorry for the dup report.

>
> --
> Jens Axboe
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-13  2:26           ` Hao Sun
  2021-09-13  2:50             ` Jens Axboe
@ 2021-09-13  8:30             ` Pavel Begunkov
  2021-09-14  3:01               ` Hao Sun
  1 sibling, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2021-09-13  8:30 UTC (permalink / raw)
  To: Hao Sun, Jens Axboe; +Cc: io-uring, linux-kernel

On 9/13/21 3:26 AM, Hao Sun wrote:
> Hi
> 
> Healer found a C reproducer for this crash ("INFO: task hung in
> io_ring_exit_work").
> 
> HEAD commit: 4b93c544e90e-thunderbolt: test: split up test cases
> git tree: upstream
> console output:
> https://drive.google.com/file/d/1NswMU2yMRTc8-EqbZcVvcJejV92cuZIk/view?usp=sharing
> kernel config: https://drive.google.com/file/d/1c0u2EeRDhRO-ZCxr9MP2VvAtJd6kfg-p/view?usp=sharing
> C reproducer: https://drive.google.com/file/d/170wk5_T8mYDaAtDcrdVi2UU9_dW1894s/view?usp=sharing
> Syzlang reproducer:
> https://drive.google.com/file/d/1eo-jAS9lncm4i-1kaCBkexrjpQHXboBq/view?usp=sharing
> 
> If you fix this issue, please add the following tag to the commit:
> Reported-by: Hao Sun <[email protected]>

I don't see the repro using io_uring at all. Can it be because of
the delay before the warning shows itself? 120 secs, this appeared
after 143.

[...]

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: INFO: task hung in io_uring_cancel_generic
  2021-09-13  8:30             ` Pavel Begunkov
@ 2021-09-14  3:01               ` Hao Sun
  0 siblings, 0 replies; 11+ messages in thread
From: Hao Sun @ 2021-09-14  3:01 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, linux-kernel

Pavel Begunkov <[email protected]> 于2021年9月13日周一 下午4:30写道：
>
> On 9/13/21 3:26 AM, Hao Sun wrote:
> > Hi
> >
> > Healer found a C reproducer for this crash ("INFO: task hung in
> > io_ring_exit_work").
> >
> > HEAD commit: 4b93c544e90e-thunderbolt: test: split up test cases
> > git tree: upstream
> > console output:
> > https://drive.google.com/file/d/1NswMU2yMRTc8-EqbZcVvcJejV92cuZIk/view?usp=sharing
> > kernel config: https://drive.google.com/file/d/1c0u2EeRDhRO-ZCxr9MP2VvAtJd6kfg-p/view?usp=sharing
> > C reproducer: https://drive.google.com/file/d/170wk5_T8mYDaAtDcrdVi2UU9_dW1894s/view?usp=sharing
> > Syzlang reproducer:
> > https://drive.google.com/file/d/1eo-jAS9lncm4i-1kaCBkexrjpQHXboBq/view?usp=sharing
> >
> > If you fix this issue, please add the following tag to the commit:
> > Reported-by: Hao Sun <[email protected]>
>
> I don't see the repro using io_uring at all. Can it be because of
> the delay before the warning shows itself? 120 secs, this appeared
> after 143.
>

I think the crash was most likely fixed. Here is what I've done.
First, I re-run the whole execution history
(https://drive.google.com/file/d/1NswMU2yMRTc8-EqbZcVvcJejV92cuZIk/view?usp=sharing)
with `syz-repro` on  latest kernel (6880fa6c5660 Linux 5.15-rc1). The
kernel did not crash at all.
Then, I re-run the history on the original version of the kernel
(4b93c544e90e-thunderbolt: test: split up test cases). It crashed and
task hang happened but with a different location
("io_wq_submit_work").
Since `syz-repro` is smart enough and will give prog enough timeout to
be executed when the crash type is `Hang` (see
https://github.com/google/syzkaller/blob/master/pkg/repro/repro.go#L98),
the delay before a warning can be handled properly.

However, I'll still keep track of this crash since it was still not
reproduced yet.

> [...]

>
> --
> Pavel Begunkov

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-09-14  3:01 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-09-07 11:50 INFO: task hung in io_uring_cancel_generic Hao Sun
2021-09-07 19:30 ` Jens Axboe
2021-09-07 21:31   ` Pavel Begunkov
2021-09-08  1:01     ` Hao Sun
2021-09-08  6:57       ` Hao Sun
2021-09-08 20:16         ` Pavel Begunkov
2021-09-13  2:26           ` Hao Sun
2021-09-13  2:50             ` Jens Axboe
2021-09-13  3:21               ` Hao Sun
2021-09-13  8:30             ` Pavel Begunkov
2021-09-14  3:01               ` Hao Sun

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox