Re: INFO: task hung in linkwatch_event (2)

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Yunsheng Lin <[email protected]>
To: syzbot <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>, <[email protected]>,
	<[email protected]>, <[email protected]>
Subject: Re: INFO: task hung in linkwatch_event (2)
Date: Wed, 6 May 2020 09:38:21 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 2020/4/29 17:59, syzbot wrote:
> Hello,
> 
> syzbot found the following crash on:
> 
> HEAD commit:    b4f63322 Merge branch 'for-linus' of git://git.kernel.org/..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1558936fe00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
> dashboard link: https://syzkaller.appspot.com/bug?extid=96ff6cfc4551fcc29342
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14a57828100000
> 
> The bug was bisected to:
> 
> commit 386d4716fd91869e07c731657f2cde5a33086516
> Author: Luo bin <[email protected]>
> Date:   Thu Feb 27 06:34:44 2020 +0000
> 
>     hinic: fix a bug of rss configuration

The above patch does not seem to be the cause of the crash.

From the below call trace, it seems the blocking is caused by
the tun_detach() which need to flush the all the pending work
for each online cpu, it is the linkwatch_work that need to be
flushed in this crash case. But the linkwatch_work() need to take
RTNL lock, which is already taken by the tun_detach(), and that is
where the blocking is happening.

Possible way to fix or avoid this:
1. Call flush_all_backlogs() without holding the RTNL lock, I am not
   sure it is safe to do this.
2. Disabling adding link event to the unregisterring netdev, and flush
   all the pending link event without taking RTNL lock before calling
   unregister_netdevice() in tun_detach().

Any better suggestion? Thanks.

> 
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=16626fcfe00000
> final crash:    https://syzkaller.appspot.com/x/report.txt?x=15626fcfe00000
> console output: https://syzkaller.appspot.com/x/log.txt?x=11626fcfe00000
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: [email protected]
> Fixes: 386d4716fd91 ("hinic: fix a bug of rss configuration")
> 
> INFO: task kworker/1:5:2724 blocked for more than 143 seconds.
>       Not tainted 5.7.0-rc2-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> kworker/1:5     D27416  2724      2 0x80004000
> Workqueue: events linkwatch_event
> Call Trace:
>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>  schedule_preempt_disabled+0xf/0x20 kernel/sched/core.c:4222
>  __mutex_lock_common kernel/locking/mutex.c:1033 [inline]
>  __mutex_lock+0x7ab/0x13c0 kernel/locking/mutex.c:1103
>  linkwatch_event+0xb/0x60 net/core/link_watch.c:242
>  process_one_work+0x965/0x16a0 kernel/workqueue.c:2268
>  worker_thread+0x96/0xe20 kernel/workqueue.c:2414
>  kthread+0x388/0x470 kernel/kthread.c:268
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> INFO: task syz-executor.0:7053 blocked for more than 143 seconds.
>       Not tainted 5.7.0-rc2-syzkaller #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> syz-executor.0  D23512  7053      1 0x80004006
> Call Trace:
>  schedule+0xd0/0x2a0 kernel/sched/core.c:4163
>  schedule_timeout+0x55b/0x850 kernel/time/timer.c:1874
>  do_wait_for_common kernel/sched/completion.c:85 [inline]
>  __wait_for_common kernel/sched/completion.c:106 [inline]
>  wait_for_common kernel/sched/completion.c:117 [inline]
>  wait_for_completion+0x16a/0x270 kernel/sched/completion.c:138
>  __flush_work+0x4fd/0xa80 kernel/workqueue.c:3045
>  flush_all_backlogs net/core/dev.c:5527 [inline]
>  rollback_registered_many+0x562/0xe70 net/core/dev.c:8813
>  rollback_registered+0xf2/0x1c0 net/core/dev.c:8873
>  unregister_netdevice_queue net/core/dev.c:9969 [inline]
>  unregister_netdevice_queue+0x1d7/0x2b0 net/core/dev.c:9962
>  unregister_netdevice include/linux/netdevice.h:2725 [inline]
>  __tun_detach+0xe42/0x1110 drivers/net/tun.c:690
>  tun_detach drivers/net/tun.c:707 [inline]
>  tun_chr_close+0xd9/0x180 drivers/net/tun.c:3413
>  __fput+0x33e/0x880 fs/file_table.c:280
>  task_work_run+0xf4/0x1b0 kernel/task_work.c:123
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0xb34/0x2dd0 kernel/exit.c:795
>  do_group_exit+0x125/0x340 kernel/exit.c:893
>  get_signal+0x47b/0x24e0 kernel/signal.c:2739
>  do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784
>  exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
>  do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x4166ca
> Code: Bad RIP value.
> RSP: 002b:00007ffd4022d478 EFLAGS: 00000246 ORIG_RAX: 000000000000003d
> RAX: fffffffffffffe00 RBX: 0000000001d60940 RCX: 00000000004166ca
> RDX: 0000000040000000 RSI: 00007ffd4022d4b0 RDI: ffffffffffffffff
> RBP: 0000000000002996 R08: 0000000000000001 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> R13: 00007ffd4022d4b0 R14: 0000000001d6099b R15: 00007ffd4022d4c0
> 
> Showing all locks held in the system:
> 1 lock held by khungtaskd/1125:
>  #0: ffffffff899beb00 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x53/0x260 kernel/locking/lockdep.c:5754
> 3 locks held by kworker/1:5/2724:
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: __write_once_size include/linux/compiler.h:226 [inline]
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline]
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:40 [inline]
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:615 [inline]
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
>  #0: ffff8880aa026d38 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x844/0x16a0 kernel/workqueue.c:2239
>  #1: ffffc90008367dc0 ((linkwatch_work).work){+.+.}-{0:0}, at: process_one_work+0x878/0x16a0 kernel/workqueue.c:2243
>  #2: ffffffff8a582268 (rtnl_mutex){+.+.}-{3:3}, at: linkwatch_event+0xb/0x60 net/core/link_watch.c:242
> 1 lock held by in:imklog/6717:
>  #0: ffff888098d271b0 (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100 fs/file.c:826
> 2 locks held by syz-executor.0/7053:
>  #0: ffffffff8a582268 (rtnl_mutex){+.+.}-{3:3}, at: tun_detach drivers/net/tun.c:704 [inline]
>  #0: ffffffff8a582268 (rtnl_mutex){+.+.}-{3:3}, at: tun_chr_close+0x3a/0x180 drivers/net/tun.c:3413
>  #1: ffffffff89979ad0 (cpu_hotplug_lock){++++}-{0:0}, at: get_online_cpus include/linux/cpu.h:143 [inline]
>  #1: ffffffff89979ad0 (cpu_hotplug_lock){++++}-{0:0}, at: flush_all_backlogs net/core/dev.c:5520 [inline]
>  #1: ffffffff89979ad0 (cpu_hotplug_lock){++++}-{0:0}, at: rollback_registered_many+0x45b/0xe70 net/core/dev.c:8813
> 3 locks held by kworker/1:6/14336:
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: __write_once_size include/linux/compiler.h:226 [inline]
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: atomic64_set include/asm-generic/atomic-instrumented.h:855 [inline]
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: atomic_long_set include/asm-generic/atomic-long.h:40 [inline]
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: set_work_data kernel/workqueue.c:615 [inline]
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: set_work_pool_and_clear_pending kernel/workqueue.c:642 [inline]
>  #0: ffff88809ace8d38 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x844/0x16a0 kernel/workqueue.c:2239
>  #1: ffffc90004637dc0 ((addr_chk_work).work){+.+.}-{0:0}, at: process_one_work+0x878/0x16a0 kernel/workqueue.c:2243
>  #2: ffffffff8a582268 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0xa/0x20 net/ipv6/addrconf.c:4584
> 
> =============================================
> 
> NMI backtrace for cpu 1
> CPU: 1 PID: 1125 Comm: khungtaskd Not tainted 5.7.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0x188/0x20d lib/dump_stack.c:118
>  nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
>  nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
>  trigger_all_cpu_backtrace include/linux/nmi.h:146 [inline]
>  check_hung_uninterruptible_tasks kernel/hung_task.c:205 [inline]
>  watchdog+0xa8c/0x1010 kernel/hung_task.c:289
>  kthread+0x388/0x470 kernel/kthread.c:268
>  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
> Sending NMI from CPU 1 to CPUs 0:
> NMI backtrace for cpu 0
> CPU: 0 PID: 28894 Comm: syz-executor.0 Not tainted 5.7.0-rc2-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:io_ring_ctx_wait_and_kill+0x98/0x5e0 fs/io_uring.c:7329
> Code: 01 00 00 4d 89 f4 48 b8 00 00 00 00 00 fc ff df 4c 89 ed 49 c1 ec 03 48 c1 ed 03 49 01 c4 48 01 c5 eb 1c e8 6a f2 9d ff f3 90 <41> 80 3c 24 00 0f 85 b0 04 00 00 48 83 bb 10 01 00 00 00 74 21 e8
> RSP: 0018:ffffc90004e17a48 EFLAGS: 00000293
> RAX: ffff888091758480 RBX: ffff888094860000 RCX: 1ffff920009c2f36
> RDX: 0000000000000000 RSI: ffffffff81d53c26 RDI: ffff888094860300
> RBP: ffffed101290c02c R08: 0000000000000001 R09: ffffed101290c061
> R10: ffff888094860307 R11: ffffed101290c060 R12: ffffed101290c022
> R13: ffff888094860160 R14: ffff888094860110 R15: ffffffff81d54170
> FS:  00007fac6c1a8700(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000560ad6a654a7 CR3: 0000000009879000 CR4: 00000000001406f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  io_uring_release+0x3e/0x50 fs/io_uring.c:7352
>  __fput+0x33e/0x880 fs/file_table.c:280
>  task_work_run+0xf4/0x1b0 kernel/task_work.c:123
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0xb34/0x2dd0 kernel/exit.c:795
>  do_group_exit+0x125/0x340 kernel/exit.c:893
>  get_signal+0x47b/0x24e0 kernel/signal.c:2739
>  do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784
>  exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
>  do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
>  entry_SYSCALL_64_after_hwframe+0x49/0xb3
> RIP: 0033:0x45c829
> Code: Bad RIP value.
> RSP: 002b:00007fac6c1a7c78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
> RAX: 0000000000000003 RBX: 00000000004e0bc0 RCX: 000000000045c829
> RDX: 0000000000000000 RSI: 0000000020000580 RDI: 00000000000000f1
> RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000000204 R14: 00000000004c425f R15: 00007fac6c1a86d4
> 
> 
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at [email protected].
> 
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
> .
>

next prev parent reply	other threads:[~2020-05-06  1:38 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-29  9:59 INFO: task hung in linkwatch_event (2) syzbot
2020-05-06  1:38 ` Yunsheng Lin [this message]
     [not found] ` <[email protected]>
2020-05-06 11:20   ` Yunsheng Lin
2020-12-11  2:25 ` syzbot
2022-04-05  7:38 ` [syzbot] " syzbot
2022-05-12 13:26   ` Dmitry Vyukov
     [not found] <[email protected]>
2020-05-06 11:22 ` Yunsheng Lin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox