* PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59
From: Andrew Marshall @ 2024-11-03 23:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

Hi,

I, and others (see downstream report below), are seeing io_uring hang intermittently on 6.6.59 LTS. If the process is killed, it remains stuck in uninterruptible sleep ("D"). The failure can be reproduced fairly reliably via Node.js with `npm ci` in at least some projects; disabling that tool’s use of io_uring via its configuration makes it succeed. I have identified what seems to be the problematic commit on linux-6.6.y (f4ce3b5).
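
As a rough illustration of the workaround, the sketch below runs `npm ci` with io_uring disabled in libuv. It assumes the configuration in question is libuv's `UV_USE_IO_URING` environment variable, which the report itself does not name; the helper names `env_without_io_uring` and `run_npm_ci` are hypothetical.

```python
import os
import subprocess


def env_without_io_uring(base=None):
    """Return a copy of the environment with libuv's io_uring use disabled."""
    env = dict(os.environ if base is None else base)
    # Assumption: UV_USE_IO_URING is the switch meant by "its configuration";
    # the value "0" asks libuv not to use io_uring.
    env["UV_USE_IO_URING"] = "0"
    return env


def run_npm_ci(project_dir):
    """Run `npm ci` in project_dir with io_uring disabled; return the exit code."""
    proc = subprocess.run(["npm", "ci"], cwd=project_dir, env=env_without_io_uring())
    return proc.returncode
```

Calling `run_npm_ci("/path/to/project")` on an affected kernel would be one way to check whether the hang goes away with io_uring out of the picture.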

Summary of kernel version triage:

- 6.6.56: succeeds
- 6.6.57: fails
- 6.6.58: fails
- 6.6.59: fails
- 6.6.59 (with f4ce3b5 reverted): succeeds
- 6.11.6: succeeds
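
For readers who want to compare their own kernel against these results, here is a minimal sketch of a lookup. The table holds only the versions listed above; anything else is reported as untested rather than guessed. The names `TRIAGE`, `parse_release`, and `triage_status` are hypothetical.

```python
import re

# Results copied from the triage list above. A version number alone cannot
# tell whether f4ce3b5 has been reverted, so that case is not represented.
TRIAGE = {
    (6, 6, 56): "succeeds",
    (6, 6, 57): "fails",
    (6, 6, 58): "fails",
    (6, 6, 59): "fails",
    (6, 11, 6): "succeeds",
}


def parse_release(release):
    """Turn a release string such as '6.6.58' into (6, 6, 58), or None."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def triage_status(release):
    """Look up a kernel release in the triage table."""
    return TRIAGE.get(parse_release(release), "untested")
```

On a running system, `triage_status(platform.release())` (after `import platform`) gives the entry for the current kernel, if one exists.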

System logs upon failure indicate a hung task:

kernel: INFO: task npm ci:47920 blocked for more than 245 seconds.
kernel:       Tainted: P           O       6.6.58 #1-NixOS
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: task:npm ci          state:D stack:0     pid:47920 ppid:47710  flags:0x00004006
kernel: Call Trace:
kernel:  <TASK>
kernel:  __schedule+0x3fc/0x1430
kernel:  ? sysvec_apic_timer_interrupt+0xe/0x90
kernel:  schedule+0x5e/0xe0
kernel:  schedule_preempt_disabled+0x15/0x30
kernel:  __mutex_lock.constprop.0+0x3a2/0x6b0
kernel:  io_uring_del_tctx_node+0x61/0xf0
kernel:  io_uring_clean_tctx+0x5c/0xc0
kernel:  io_uring_cancel_generic+0x198/0x350
kernel:  ? srso_return_thunk+0x5/0x5f
kernel:  ? timerqueue_del+0x2e/0x50
kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
kernel:  do_exit+0x167/0xad0
kernel:  ? __pfx_hrtimer_wakeup+0x10/0x10
kernel:  do_group_exit+0x31/0x80
kernel:  get_signal+0xa60/0xa60
kernel:  arch_do_signal_or_restart+0x3e/0x280
kernel:  exit_to_user_mode_prepare+0x1d4/0x230
kernel:  syscall_exit_to_user_mode+0x1b/0x50
kernel:  do_syscall_64+0x45/0x90
kernel:  entry_SYSCALL_64_after_hwframe+0x78/0xe2
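
To spot a stuck process without waiting for the hung-task message, one option is to scan `/proc` for tasks in state D. The following is a minimal sketch, not a complete tool: it reads only `/proc/<pid>/stat`, which reflects the main thread, so a hang in another thread would need `/proc/<pid>/task/<tid>/stat` instead. The function names are hypothetical.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may contain spaces (e.g. "npm ci")
    # or even ')', so locate the last ')' instead of splitting on whitespace.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def tasks_in_d_state(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited mid-scan or the file was unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

A task that appears in `tasks_in_d_state()` across repeated calls, minutes apart, is a candidate for the kind of hang shown in the trace; brief D states are normal during ordinary I/O.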

For more details, see the downstream bug report in Node.js: https://github.com/nodejs/node/issues/55587

I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely problematic commit simply by browsing the git log. As indicated above, reverting it atop 6.6.59 results in success. Since 6.11.6 passes, I suspect there is a missing backport to 6.6.x, or some other semantic merge conflict. Unfortunately I do not have a compact, minimal reproducer, but I can provide my large one (it tests a larger build process in a VM) if needed; there are some additional details in the above-linked downstream bug report. I hope that having identified the problematic commit is enough for someone with more context to go on. Happy to provide more information if needed.


Thanks,
Andrew


Thread overview: 11+ messages
2024-11-03 23:47 PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59 Andrew Marshall
2024-11-03 23:53 ` Jens Axboe
2024-11-03 23:58   ` Jens Axboe
2024-11-04  0:01   ` Keith Busch
2024-11-04  0:06     ` Jens Axboe
2024-11-04  2:38       ` Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59") Jens Axboe
2024-11-04  4:25         ` Andrew Marshall
2024-11-04 13:17           ` Andrew Marshall
2024-11-04 15:58             ` Jens Axboe
2024-11-06  6:05         ` Greg Kroah-Hartman
2024-11-06 14:11           ` Jens Axboe
