INFO: rcu detected stall in io_uring


* INFO: rcu detected stall in io_uring_release
@ 2020-04-18 18:59 syzbot
       [not found] ` <[email protected]>
  2020-05-12  2:17 ` syzbot
  0 siblings, 2 replies; 6+ messages in thread
From: syzbot @ 2020-04-18 18:59 UTC
  To: axboe, io-uring, linux-fsdevel, linux-kernel, syzkaller-bugs,
	viro

Hello,

syzbot found the following crash on:

HEAD commit:    8f3d9f35 Linux 5.7-rc1
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=115720c3e00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=5d351a1019ed81a2
dashboard link: https://syzkaller.appspot.com/bug?extid=66243bb7126c410cefe6
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	0-....: (10500 ticks this GP) idle=57e/1/0x4000000000000002 softirq=44329/44329 fqs=5245 
	(t=10502 jiffies g=79401 q=2096)
NMI backtrace for cpu 0
CPU: 0 PID: 23184 Comm: syz-executor.5 Not tainted 5.7.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:164 [inline]
 rcu_dump_cpu_stacks+0x19b/0x1e5 kernel/rcu/tree_stall.h:254
 print_cpu_stall kernel/rcu/tree_stall.h:475 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:549 [inline]
 rcu_pending kernel/rcu/tree.c:3225 [inline]
 rcu_sched_clock_irq.cold+0x55d/0xcfa kernel/rcu/tree.c:2296
 update_process_times+0x25/0x60 kernel/time/timer.c:1727
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176
 tick_sched_timer+0x4e/0x140 kernel/time/tick-sched.c:1320
 __run_hrtimer kernel/time/hrtimer.c:1520 [inline]
 __hrtimer_run_queues+0x5ca/0xed0 kernel/time/hrtimer.c:1584
 hrtimer_interrupt+0x312/0x770 kernel/time/hrtimer.c:1646
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113 [inline]
 smp_apic_timer_interrupt+0x15b/0x600 arch/x86/kernel/apic/apic.c:1138
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
 </IRQ>
RIP: 0010:io_ring_ctx_wait_and_kill+0x98/0x5a0 fs/io_uring.c:7301
Code: 01 00 00 4d 89 f4 48 b8 00 00 00 00 00 fc ff df 4c 89 ed 49 c1 ec 03 48 c1 ed 03 49 01 c4 48 01 c5 eb 1c e8 3a ea 9d ff f3 90 <41> 80 3c 24 00 0f 85 53 04 00 00 48 83 bb 10 01 00 00 00 74 21 e8
RSP: 0018:ffffc9000897fdf0 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: ffff888024082080 RBX: ffff88808df8e000 RCX: 1ffff9200112ffab
RDX: 0000000000000000 RSI: ffffffff81d549c6 RDI: ffff88808df8e300
RBP: ffffed1011bf1c2c R08: 0000000000000001 R09: ffffed1011bf1c61
R10: ffff88808df8e307 R11: ffffed1011bf1c60 R12: ffffed1011bf1c22
R13: ffff88808df8e160 R14: ffff88808df8e110 R15: ffffffff81d54ed0
 io_uring_release+0x3e/0x50 fs/io_uring.c:7324
 __fput+0x33e/0x880 fs/file_table.c:280
 task_work_run+0xf4/0x1b0 kernel/task_work.c:123
 tracehook_notify_resume include/linux/tracehook.h:188 [inline]
 exit_to_usermode_loop+0x2fa/0x360 arch/x86/entry/common.c:165
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x416421
Code: 75 14 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 0f 83 04 1b 00 00 c3 48 83 ec 08 e8 0a fc ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 53 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00007ffc3c9f63d0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000416421
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 01ffffffffffffff
R10: 0000000000770b20 R11: 0000000000000293 R12: 000000000076bfa0
R13: 0000000000770b30 R14: 0000000000000001 R15: 000000000076bfac


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at [email protected].

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: INFO: rcu detected stall in io_uring_release
       [not found] ` <[email protected]>
@ 2020-04-19 19:57   ` Jens Axboe
  2020-04-20 11:47   ` Dan Carpenter
  1 sibling, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2020-04-19 19:57 UTC
  To: Hillf Danton, syzbot
  Cc: io-uring, linux-fsdevel, linux-kernel, syzkaller-bugs, viro

On 4/18/20 10:06 PM, Hillf Danton wrote:
> 
> Sat, 18 Apr 2020 11:59:13 -0700
>>
> 
> Make io ring ctx's percpu_ref balanced.
> 
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5904,6 +5904,7 @@ static int io_submit_sqes(struct io_ring
>  fail_req:
>  			io_cqring_add_event(req, err);
>  			io_double_put_req(req);
> +			--submitted;
>  			break;
>  		}

Not sure this is right, need to look closer. But if we post a completion
event, the event has been consumed and should be accounted as such.

-- 
Jens Axboe



* Re: INFO: rcu detected stall in io_uring_release
       [not found] ` <[email protected]>
  2020-04-19 19:57   ` Jens Axboe
@ 2020-04-20 11:47   ` Dan Carpenter
  2020-04-20 12:57     ` Pavel Begunkov
  1 sibling, 1 reply; 6+ messages in thread
From: Dan Carpenter @ 2020-04-20 11:47 UTC
  To: Hillf Danton
  Cc: syzbot, axboe, io-uring, linux-fsdevel, linux-kernel,
	syzkaller-bugs, viro

On Sun, Apr 19, 2020 at 12:06:26PM +0800, Hillf Danton wrote:
> 
> Sat, 18 Apr 2020 11:59:13 -0700
> > 
> 
> Make io ring ctx's percpu_ref balanced.
> 
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -5904,6 +5904,7 @@ static int io_submit_sqes(struct io_ring
>  fail_req:
>  			io_cqring_add_event(req, err);
>  			io_double_put_req(req);
> +			--submitted;
>  			break;
>  		}


fs/io_uring.c
  5880          for (i = 0; i < nr; i++) {
  5881                  const struct io_uring_sqe *sqe;
  5882                  struct io_kiocb *req;
  5883                  int err;
  5884  
  5885                  sqe = io_get_sqe(ctx);
  5886                  if (unlikely(!sqe)) {
  5887                          io_consume_sqe(ctx);
  5888                          break;
  5889                  }
  5890                  req = io_alloc_req(ctx, statep);
  5891                  if (unlikely(!req)) {
  5892                          if (!submitted)
  5893                                  submitted = -EAGAIN;
  5894                          break;
  5895                  }
  5896  
  5897                  err = io_init_req(ctx, req, sqe, statep, async);
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
On the success path io_init_req() takes some references like:

	get_cred(req->work.creds);

That one is probably buggy and should be put if the call to:

	return io_req_set_file(state, req, fd, sqe_flags);

fails...  But io_req_set_file() takes some other references if it
succeeds like percpu_ref_get(req->fixed_file_refs); and it's not clear
that those are released if io_submit_sqe() fails.

  5898                  io_consume_sqe(ctx);
  5899                  /* will complete beyond this point, count as submitted */
  5900                  submitted++;
  5901  
  5902                  if (unlikely(err)) {
  5903  fail_req:
  5904                          io_cqring_add_event(req, err);
  5905                          io_double_put_req(req);
  5906                          break;
  5907                  }
  5908  
  5909                  trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
  5910                                                  true, async);
  5911                  err = io_submit_sqe(req, sqe, statep, &link);
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
here

  5912                  if (err)
  5913                          goto fail_req;
  5914          }

regards,
dan carpenter
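
The question raised above is where a reference taken early in request
setup gets dropped when a later step fails. A minimal userspace sketch of
one common answer follows: the reference is recorded on the request and
dropped in the shared put path rather than in each error branch. All
fake_* names and the file_ok parameter are hypothetical; this is not the
fs/io_uring.c code, only an illustration of the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object, standing in for a cred. */
struct fake_cred {
	int usage;
};

/* Hypothetical request; creds stays NULL until a reference is taken. */
struct fake_req {
	struct fake_cred *creds;
};

/*
 * Take a reference first, then run a step that may fail.
 * On failure the reference is deliberately left on the request,
 * so the common put path below is responsible for dropping it.
 */
static int fake_init_req(struct fake_req *req, struct fake_cred *cred,
			 int file_ok)
{
	cred->usage++;		/* loosely analogous to get_cred() */
	req->creds = cred;
	if (!file_ok)
		return -1;	/* later step failed; reference still held */
	return 0;
}

/* Common teardown: drops whatever the request still holds, exactly once. */
static void fake_put_req(struct fake_req *req)
{
	if (req->creds) {
		assert(req->creds->usage > 0);
		req->creds->usage--;
		req->creds = NULL;
	}
}
```

As long as every request that entered fake_init_req() eventually reaches
fake_put_req(), the count balances on both the success and failure paths.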


* Re: INFO: rcu detected stall in io_uring_release
  2020-04-20 11:47   ` Dan Carpenter
@ 2020-04-20 12:57     ` Pavel Begunkov
  0 siblings, 0 replies; 6+ messages in thread
From: Pavel Begunkov @ 2020-04-20 12:57 UTC
  To: Dan Carpenter, Hillf Danton
  Cc: syzbot, axboe, io-uring, linux-fsdevel, linux-kernel,
	syzkaller-bugs, viro

On 4/20/2020 2:47 PM, Dan Carpenter wrote:
> On Sun, Apr 19, 2020 at 12:06:26PM +0800, Hillf Danton wrote:
>>
>> Sat, 18 Apr 2020 11:59:13 -0700
>>>
>>
>> Make io ring ctx's percpu_ref balanced.
>>
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5904,6 +5904,7 @@ static int io_submit_sqes(struct io_ring
>>  fail_req:
>>  			io_cqring_add_event(req, err);
>>  			io_double_put_req(req);
>> +			--submitted;
>>  			break;
>>  		}
> 
> 
> fs/io_uring.c
>   5880          for (i = 0; i < nr; i++) {
>   5881                  const struct io_uring_sqe *sqe;
>   5882                  struct io_kiocb *req;
>   5883                  int err;
>   5884  
>   5885                  sqe = io_get_sqe(ctx);
>   5886                  if (unlikely(!sqe)) {
>   5887                          io_consume_sqe(ctx);
>   5888                          break;
>   5889                  }
>   5890                  req = io_alloc_req(ctx, statep);
>   5891                  if (unlikely(!req)) {
>   5892                          if (!submitted)
>   5893                                  submitted = -EAGAIN;
>   5894                          break;
>   5895                  }
>   5896  
>   5897                  err = io_init_req(ctx, req, sqe, statep, async);
>                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> On the success path io_init_req() takes some references like:
> 
> 	get_cred(req->work.creds);

If a req has made it into io_init_req(), then it'll be put at some point
with io_put_req(). io_req_work_drop_env(), called from there, will clean
up req->work.creds.

>  
> That one is probably buggy and should be put if the call to:
> 
> 	return io_req_set_file(state, req, fd, sqe_flags);
> 
> fails...  But io_req_set_file() takes some other references if it
> succeeds like percpu_ref_get(req->fixed_file_refs); and it's not clear
> that those are released if io_submit_sqe() fails.

The same should happen with req->fixed_file_refs, though I don't
remember the details.

> 
>   5898                  io_consume_sqe(ctx);
>   5899                  /* will complete beyond this point, count as submitted */
>   5900                  submitted++;

Regarding the "--submitted" patch: we take one ctx->refs reference per
request, which is put in io_put_req(). So after a request passes the line
above (5900), its ref will eventually be dropped in io_put_req() and friends.

It's also a bit more peculiar, because io_submit_sqes() batch-takes N refs
first and then puts the unused ones back at the end.

>   5901  
>   5902                  if (unlikely(err)) {
>   5903  fail_req:
>   5904                          io_cqring_add_event(req, err);
>   5905                          io_double_put_req(req);
>   5906                          break;
>   5907                  }
>   5908  
>   5909                  trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
>   5910                                                  true, async);
>   5911                  err = io_submit_sqe(req, sqe, statep, &link);
>                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> here
> 
>   5912                  if (err)
>   5913                          goto fail_req;
>   5914          }
> 
> regards,
> dan carpenter
> 

-- 
Pavel Begunkov
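
The accounting described above (one ctx->refs reference per request,
taken in a batch of N up front, with the unused ones returned at the end)
can be modelled in a few lines of userspace C. This is only a sketch under
stated assumptions: the fake_* names, the fail_at parameter and the
dec_on_fail switch are hypothetical, and the real io_submit_sqes() is more
involved. It suggests why decrementing submitted for a request that has
already been put would hand that request's reference back twice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring context with a plain counter instead of a percpu_ref. */
struct fake_ring {
	long refs;
};

/* Every request that was counted as submitted drops one reference here. */
static void fake_put_req(struct fake_ring *ring)
{
	ring->refs--;
}

/*
 * Model of a batched submit loop.
 * fail_at:     index of the request that fails after being counted, or -1.
 * dec_on_fail: if true, mimic the proposed "--submitted" on the failure path.
 * Requests that succeed stay in flight; the caller puts them later.
 */
static int fake_submit(struct fake_ring *ring, int nr, int fail_at,
		       bool dec_on_fail)
{
	int submitted = 0;

	assert(nr >= 0);
	ring->refs += nr;		/* batch-take one ref per possible request */

	for (int i = 0; i < nr; i++) {
		submitted++;		/* this request now owns one reference */
		if (i == fail_at) {
			fake_put_req(ring);	/* failed request is completed and put */
			if (dec_on_fail)
				submitted--;
			break;
		}
	}

	ring->refs -= nr - submitted;	/* return the unused references */
	return submitted;
}
```

With dec_on_fail false the counter returns to its starting value once the
in-flight request is put; with it set, the same sequence ends one
reference short.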


* Re: INFO: rcu detected stall in io_uring_release
  2020-04-18 18:59 INFO: rcu detected stall in io_uring_release syzbot
       [not found] ` <[email protected]>
@ 2020-05-12  2:17 ` syzbot
  2020-05-12 15:25   ` Jens Axboe
  1 sibling, 1 reply; 6+ messages in thread
From: syzbot @ 2020-05-12  2:17 UTC
  To: asml.silence, axboe, dan.carpenter, hdanton, io-uring,
	linux-fsdevel, linux-kernel, syzkaller-bugs, viro

syzbot has found a reproducer for the following crash on:

HEAD commit:    ac935d22 Add linux-next specific files for 20200415
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13a38f0a100000
kernel config:  https://syzkaller.appspot.com/x/.config?x=bc498783097e9019
dashboard link: https://syzkaller.appspot.com/bug?extid=66243bb7126c410cefe6
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=179b3b32100000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: [email protected]

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	0-....: (10533 ticks this GP) idle=aea/1/0x4000000000000002 softirq=10553/10555 fqs=5249 
	(t=10501 jiffies g=12777 q=556)
NMI backtrace for cpu 0
CPU: 0 PID: 8729 Comm: syz-executor.3 Not tainted 5.7.0-rc1-next-20200415-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x188/0x20d lib/dump_stack.c:118
 nmi_cpu_backtrace.cold+0x70/0xb1 lib/nmi_backtrace.c:101
 nmi_trigger_cpumask_backtrace+0x231/0x27e lib/nmi_backtrace.c:62
 trigger_single_cpu_backtrace include/linux/nmi.h:165 [inline]
 rcu_dump_cpu_stacks+0x19b/0x1e5 kernel/rcu/tree_stall.h:254
 print_cpu_stall kernel/rcu/tree_stall.h:475 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:549 [inline]
 rcu_pending kernel/rcu/tree.c:3225 [inline]
 rcu_sched_clock_irq.cold+0x55d/0xd00 kernel/rcu/tree.c:2296
 update_process_times+0x25/0x60 kernel/time/timer.c:1727
 tick_sched_handle+0x9b/0x180 kernel/time/tick-sched.c:176
 tick_sched_timer+0x4e/0x140 kernel/time/tick-sched.c:1320
 __run_hrtimer kernel/time/hrtimer.c:1520 [inline]
 __hrtimer_run_queues+0x5ca/0xed0 kernel/time/hrtimer.c:1584
 hrtimer_interrupt+0x312/0x770 kernel/time/hrtimer.c:1646
 local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113 [inline]
 smp_apic_timer_interrupt+0x15b/0x600 arch/x86/kernel/apic/apic.c:1138
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:829
 </IRQ>
RIP: 0010:io_ring_ctx_wait_and_kill+0x98/0x5a0 fs/io_uring.c:7301
Code: 01 00 00 4d 89 f4 48 b8 00 00 00 00 00 fc ff df 4c 89 ed 49 c1 ec 03 48 c1 ed 03 49 01 c4 48 01 c5 eb 1c e8 ba 65 9d ff f3 90 <41> 80 3c 24 00 0f 85 53 04 00 00 48 83 bb 10 01 00 00 00 74 21 e8
RSP: 0018:ffffc900054d7a50 EFLAGS: 00000293 ORIG_RAX: ffffffffffffff13
RAX: ffff888097c14400 RBX: ffff8880963a0000 RCX: 1ffff92000a9af37
RDX: 0000000000000000 RSI: ffffffff81d5ced6 RDI: ffff8880963a0300
RBP: ffffed1012c7402c R08: 0000000000000001 R09: ffffed1012c74061
R10: ffff8880963a0307 R11: ffffed1012c74060 R12: ffffed1012c74022
R13: ffff8880963a0160 R14: ffff8880963a0110 R15: ffffffff81d5d3e0
 io_uring_release+0x3e/0x50 fs/io_uring.c:7324
 __fput+0x33e/0x880 fs/file_table.c:280
 task_work_run+0xf4/0x1b0 kernel/task_work.c:123
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0xb53/0x2e10 kernel/exit.c:795
 do_group_exit+0x125/0x340 kernel/exit.c:893
 get_signal+0x47b/0x24e0 kernel/signal.c:2735
 do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784
 exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:279 [inline]
 do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305
 entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x45c829
Code: Bad RIP value.
RSP: 002b:00007fdd2a37fc78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: 0000000000000003 RBX: 00000000004e0ce0 RCX: 000000000045c829
RDX: 0000000000000000 RSI: 0000000020000580 RDI: 00000000000000f1
RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000204 R14: 00000000004c425f R15: 00007fdd2a3806d4



* Re: INFO: rcu detected stall in io_uring_release
  2020-05-12  2:17 ` syzbot
@ 2020-05-12 15:25   ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2020-05-12 15:25 UTC
  To: syzbot, asml.silence, dan.carpenter, hdanton, io-uring,
	linux-fsdevel, linux-kernel, syzkaller-bugs, viro

On 5/11/20 8:17 PM, syzbot wrote:
> syzbot has found a reproducer for the following crash on:
> 

#syz fix: io_uring: use cond_resched() in io_ring_ctx_wait_and_kill()


-- 
Jens Axboe
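
The fix named above replaces a tight wait with one that gives the
scheduler a chance to run. A rough userspace analogue, assuming
sched_yield() as a stand-in for the kernel's cond_resched(), is sketched
below. The fake_* names and the drain condition are hypothetical and say
nothing about what io_ring_ctx_wait_and_kill() actually waits on; the
point is only the shape of the loop.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical stand-in for the state the teardown path waits on. */
struct fake_ctx {
	int pending;		/* work items that must drain before teardown */
	unsigned long yields;	/* times the waiter gave up the CPU */
};

/* Hypothetical: retire one pending item; returns nonzero once drained. */
static int fake_retire_one(struct fake_ctx *ctx)
{
	if (ctx->pending > 0)
		ctx->pending--;
	return ctx->pending == 0;
}

/* Wait for the context to drain without monopolizing the CPU. */
static void fake_wait_and_kill(struct fake_ctx *ctx)
{
	assert(ctx->pending >= 0);
	while (!fake_retire_one(ctx)) {
		/* Userspace analogue of cond_resched(): let other tasks run. */
		sched_yield();
		ctx->yields++;
	}
}
```

A loop that only spins never passes through a point where the scheduler
can switch tasks, which is consistent with the self-detected stall in the
reports; yielding on each iteration avoids that.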


