* Re: INFO: task hung in __io_uring_task_cancel
[not found] <CAGyP=7cFM6BJE7X2PN9YUptQgt5uQYwM4aVmOiVayQPJg1pqaA@mail.gmail.com>
@ 2021-01-03 21:53 ` Jens Axboe
[not found] ` <[email protected]>
1 sibling, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2021-01-03 21:53 UTC (permalink / raw)
To: Palash Oswal, io-uring, linux-fsdevel, linux-kernel, mingo, mingo,
peterz, rostedt, syzkaller-bugs, viro, will
On 1/2/21 9:14 PM, Palash Oswal wrote:
> Hello,
>
> I was running syzkaller and I found the following issue :
>
> Head Commit : b1313fe517ca3703119dcc99ef3bbf75ab42bcfb ( v5.10.4 )
> Git Tree : stable
> Console Output :
> [ 242.769080] INFO: task repro:2639 blocked for more than 120 seconds.
> [ 242.769096] Not tainted 5.10.4 #8
> [ 242.769103] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [ 242.769112] task:repro state:D stack: 0 pid: 2639
> ppid: 2638 flags:0x00000004
> [ 242.769126] Call Trace:
> [ 242.769148] __schedule+0x28d/0x7e0
> [ 242.769162] ? __percpu_counter_sum+0x75/0x90
> [ 242.769175] schedule+0x4f/0xc0
> [ 242.769187] __io_uring_task_cancel+0xad/0xf0
> [ 242.769198] ? wait_woken+0x80/0x80
> [ 242.769210] bprm_execve+0x67/0x8a0
> [ 242.769223] do_execveat_common+0x1d2/0x220
> [ 242.769235] __x64_sys_execveat+0x5d/0x70
> [ 242.769249] do_syscall_64+0x38/0x90
> [ 242.769260] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 242.769270] RIP: 0033:0x7f59ce45967d
> [ 242.769277] RSP: 002b:00007ffd05d10a58 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000142
> [ 242.769290] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f59ce45967d
> [ 242.769297] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
> [ 242.769304] RBP: 00007ffd05d10a70 R08: 0000000000000000 R09: 00007ffd05d10a70
> [ 242.769311] R10: 0000000000000000 R11: 0000000000000246 R12: 000055a91d37d320
> [ 242.769318] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Can you see if this helps? The reproducer is pretty brutal, it'll fork
thousands of tasks with rings! But should work of course. I think this
one is pretty straight forward, and actually an older issue with the
poll rewaiting.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca46f314640b..539de04f9183 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5103,6 +5103,12 @@ static bool io_poll_rewait(struct io_kiocb *req, struct io_poll_iocb *poll)
{
struct io_ring_ctx *ctx = req->ctx;
+ /* Never re-wait on poll if the ctx or task is going away */
+ if (percpu_ref_is_dying(&ctx->refs) || req->task->flags & PF_EXITING) {
+ spin_lock_irq(&ctx->completion_lock);
+ return false;
+ }
+
if (!req->result && !READ_ONCE(poll->canceled)) {
struct poll_table_struct pt = { ._key = poll->events };
--
Jens Axboe
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: INFO: task hung in __io_uring_task_cancel
[not found] ` <[email protected]>
@ 2021-01-04 5:23 ` Palash Oswal
[not found] ` <[email protected]>
0 siblings, 1 reply; 3+ messages in thread
From: Palash Oswal @ 2021-01-04 5:23 UTC (permalink / raw)
To: Hillf Danton, axboe
Cc: io-uring, linux-kernel, syzkaller-bugs, Pavel Begunkov, viro,
will, rostedt, peterz, mingo, mingo, linux-fsdevel
Hillf -
> Can you reproduce it again against 5.11-rc1 with the tiny diff applied
> to see if there is a missing wakeup in the mainline?
Hey Hillf, thanks for sharing the diff. It seems like the reproducer
that I had sent did not work on 5.11-rc1 itself, so I'm trying to get
an updated reproducer for that.
I'm not well versed with the io_uring code yet, and therefore it'll
take me longer to get the reproducer going for 5.11-rc1.
Jens -
> Can you see if this helps? The reproducer is pretty brutal, it'll fork
> thousands of tasks with rings! But should work of course. I think this
> one is pretty straight forward, and actually an older issue with the
> poll rewaiting.
Hey Jens, I applied your diff to 5.10.4 (
b1313fe517ca3703119dcc99ef3bbf75ab42bcfb ), and unfortunately, I'm
still seeing the task being hung. Here's the console log if this helps
further -
root@syzkaller:~# [ 242.840696] INFO: task repro:395 blocked for more
than 120 seconds.
[ 242.846353] Not tainted 5.10.4+ #9
[ 242.849951] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 242.857665] task:repro state:D stack: 0 pid: 395
ppid: 394 flags:0x00000004
[ 242.867346] Call Trace:
[ 242.870521] __schedule+0x28d/0x7e0
[ 242.873597] ? __percpu_counter_sum+0x75/0x90
[ 242.876794] schedule+0x4f/0xc0
[ 242.878803] __io_uring_task_cancel+0xad/0xf0
[ 242.880952] ? wait_woken+0x80/0x80
[ 242.882330] bprm_execve+0x67/0x8a0
[ 242.884142] do_execveat_common+0x1d2/0x220
[ 242.885610] __x64_sys_execveat+0x5d/0x70
[ 242.886708] do_syscall_64+0x38/0x90
[ 242.887727] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 242.889298] RIP: 0033:0x7ffabedd6469
[ 242.890265] RSP: 002b:00007ffc56b8bc78 EFLAGS: 00000246 ORIG_RAX:
0000000000000142
[ 242.892055] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffabedd6469
[ 242.893776] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
[ 242.895400] RBP: 00007ffc56b8bc90 R08: 0000000000000000 R09: 00007ffc56b8bc90
[ 242.896879] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559c19400bf0
[ 242.898335] R13: 00007ffc56b8bdb0 R14: 0000000000000000 R15: 0000000000000000
[ 363.691144] INFO: task repro:395 blocked for more than 241 seconds.
[ 363.693724] Not tainted 5.10.4+ #9
[ 363.695513] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 363.700543] task:repro state:D stack: 0 pid: 395
ppid: 394 flags:0x00000004
[ 363.705747] Call Trace:
[ 363.707359] __schedule+0x28d/0x7e0
[ 363.709603] ? __percpu_counter_sum+0x75/0x90
[ 363.712900] schedule+0x4f/0xc0
[ 363.715002] __io_uring_task_cancel+0xad/0xf0
[ 363.718026] ? wait_woken+0x80/0x80
[ 363.720137] bprm_execve+0x67/0x8a0
[ 363.721992] do_execveat_common+0x1d2/0x220
[ 363.723997] __x64_sys_execveat+0x5d/0x70
[ 363.725857] do_syscall_64+0x38/0x90
[ 363.727501] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 363.729510] RIP: 0033:0x7ffabedd6469
[ 363.730913] RSP: 002b:00007ffc56b8bc78 EFLAGS: 00000246 ORIG_RAX:
0000000000000142
[ 363.733747] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffabedd6469
[ 363.736138] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
[ 363.738431] RBP: 00007ffc56b8bc90 R08: 0000000000000000 R09: 00007ffc56b8bc90
[ 363.740504] R10: 0000000000000000 R11: 0000000000000246 R12: 0000559c19400bf0
[ 363.742560] R13: 00007ffc56b8bdb0 R14: 0000000000000000 R15: 0000000000000000
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: INFO: task hung in __io_uring_task_cancel
[not found] ` <[email protected]>
@ 2021-01-04 9:40 ` Palash Oswal
0 siblings, 0 replies; 3+ messages in thread
From: Palash Oswal @ 2021-01-04 9:40 UTC (permalink / raw)
To: Hillf Danton
Cc: axboe, io-uring, linux-kernel, syzkaller-bugs, Pavel Begunkov
On Mon, Jan 4, 2021 at 12:22 PM Hillf Danton <[email protected]> wrote:
> It is now updated.
Hello Hilf,
Thanks for the new diff. I tested by applying the diff on 5.10.4 with
the original reproducer, and the issue still persists.
root@syzkaller:~# [ 242.925799] INFO: task repro:416 blocked for more
than 120 seconds.
[ 242.928095] Not tainted 5.10.4+ #12
[ 242.929034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 242.930825] task:repro state:D stack: 0 pid: 416
ppid: 415 flags:0x00000004
[ 242.933404] Call Trace:
[ 242.934365] __schedule+0x28d/0x7e0
[ 242.935199] ? __percpu_counter_sum+0x75/0x90
[ 242.936265] schedule+0x4f/0xc0
[ 242.937159] __io_uring_task_cancel+0xc0/0xf0
[ 242.938340] ? wait_woken+0x80/0x80
[ 242.939380] bprm_execve+0x67/0x8a0
[ 242.940163] do_execveat_common+0x1d2/0x220
[ 242.941090] __x64_sys_execveat+0x5d/0x70
[ 242.942056] do_syscall_64+0x38/0x90
[ 242.943088] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 242.944511] RIP: 0033:0x7fd0b781e469
[ 242.945422] RSP: 002b:00007fffda20e9c8 EFLAGS: 00000246 ORIG_RAX:
0000000000000142
[ 242.947289] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fd0b781e469
[ 242.949031] RDX: 0000000000000000 RSI: 0000000020000180 RDI: 00000000ffffffff
[ 242.950683] RBP: 00007fffda20e9e0 R08: 0000000000000000 R09: 00007fffda20e9e0
[ 242.952450] R10: 0000000000000000 R11: 0000000000000246 R12: 0000556068200bf0
[ 242.954045] R13: 00007fffda20eb00 R14: 0000000000000000 R15: 0000000000000000
linux git:(b1313fe517ca) git diff
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0fcd065baa76..e0c5424e28b1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1867,8 +1867,7 @@ static void __io_free_req(struct io_kiocb *req)
io_dismantle_req(req);
percpu_counter_dec(&tctx->inflight);
- if (atomic_read(&tctx->in_idle))
- wake_up(&tctx->wait);
+ wake_up(&tctx->wait);
put_task_struct(req->task);
if (likely(!io_is_fallback_req(req)))
@@ -8853,12 +8852,11 @@ void __io_uring_task_cancel(void)
* If we've seen completions, retry. This avoids a race where
* a completion comes in before we did prepare_to_wait().
*/
- if (inflight != tctx_inflight(tctx))
- continue;
- schedule();
+ if (inflight == tctx_inflight(tctx))
+ schedule();
+ finish_wait(&tctx->wait, &wait);
} while (1);
- finish_wait(&tctx->wait, &wait);
atomic_dec(&tctx->in_idle);
}
^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-01-04 9:42 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAGyP=7cFM6BJE7X2PN9YUptQgt5uQYwM4aVmOiVayQPJg1pqaA@mail.gmail.com>
2021-01-03 21:53 ` INFO: task hung in __io_uring_task_cancel Jens Axboe
[not found] ` <[email protected]>
2021-01-04 5:23 ` Palash Oswal
[not found] ` <[email protected]>
2021-01-04 9:40 ` Palash Oswal
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox