From: Stefan Metzmacher <[email protected]>
To: Steven Rostedt <[email protected]>, Ingo Molnar <[email protected]>
Cc: [email protected], io-uring <[email protected]>
Subject: Tracing busy processes/threads freezes/stalls the whole machine
Date: Thu, 22 Apr 2021 16:26:57 +0200 [thread overview]
Message-ID: <[email protected]> (raw)
Hi Steven, hi Ingo,
I recently tried to analyze the performance of Samba using io-uring.
I was using ubuntu 20.04 with the 5.10.0-1023-oem kernel, which is based on v5.10.25, see:
https://kernel.ubuntu.com/git/kernel-ppa/mirror/ubuntu-oem-5.10-focal.git/log/?h=oem-5.10-prep
trace-cmd is at version 2.8.3-4build1.
In order to find the bottleneck I tried to use:
trace-cmd record -e all -P ${pid_of_io_uring_worker}
As a result, the server became completely unresponsive immediately.
I tried to reproduce this in a virtual machine (inside virtualbox).
I used a modified 'io_uring-cp' that loops forever, see:
https://github.com/metze-samba/liburing/commit/5e98efed053baf03521692e786c1c55690b04d8e
When I run './io_uring-cp-forever link-cp.c file',
I see an 'io_wq_manager' and an 'io_wqe_worker-0' kernel thread,
and the 'io_uring-cp-forever' process as well as 'io_wqe_worker-0'
each consume about 25% CPU.
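(For reference, the worker threads can be listed like this; a sketch assuming procps 'ps' is available and the reproducer is already running, with the 5.10-era io-wq thread naming:)

```shell
# Sketch: show the reproducer's threads plus the io-wq kernel threads.
# Assumes 'io_uring-cp-forever' is already running.
pid=$(pidof io_uring-cp-forever)
ps -eLo pid,tid,pcpu,comm | grep -E "io_wq|^ *$pid "
```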
When I run 'trace-cmd record -e all -P $pid' against 'io_wqe_worker-0' or
'io_wq_manager', I can reproduce the problem. I then found that the same
seems to happen for other kernel threads as well, e.g. '[kworker/1:1-events]';
it appears to happen for any kernel thread that is not completely idle.
From 'top':
1341 root 20 0 2512 576 508 S 33,4 0,1 0:10.39 io_uring-cp-for
1343 root 20 0 0 0 0 R 29,8 0,0 0:08.43 io_wqe_worker-0
7 root 20 0 0 0 0 I 0,3 0,0 0:00.31 kworker/0:1-events
PID 5 is [kworker/0:0-ata_sff]
# trace-cmd record -e all -P 5
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x7b8000
0 bytes in size
CPU1 data recorded at offset=0x7b8000
69632 bytes in size
# But
# trace-cmd record -e all -P 7
=> machine unresponsive (no blinking cursor on the console anymore)
On the host, 'top' shows that the VirtualBox CPU emulator thread 'EMT-1'
uses 100% CPU, so I guess the guest kernel is stuck in something like an
endless recursion loop. Maybe a trace function recursing into itself?
On the same VM I tried a 5.12-rc8 kernel, where I can also reproduce the
problem.
I also managed to reproduce the problem without io-uring, just using:
while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
in order to keep some kernel threads moving.
This happens with 5.12-rc8 and 5.10.0-1023-oem, but I was not able to
reproduce any of this using the 5.8.0-50-generic kernel, see
https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/log/?h=Ubuntu-hwe-5.8-5.8.0-50.56_20.04.1
I was also able to reproduce this with an Ubuntu 21.04 VM using
the 5.11.0-14-generic kernel, see:
https://kernel.ubuntu.com/git/ubuntu/ubuntu-hirsute.git/log/?h=Ubuntu-5.11.0-14.15
On this one I only managed to reproduce the problem with
'./io_uring-cp-forever link-cp.c file', but not with
'while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done'
So it seems the problem was introduced after 5.8 and is not really related
to io-uring. And it may not be limited to kernel threads.
With 5.12-rc8 (again), 'pstree -a -t -p' shows:
└─tmux: server,903
├─bash,904
│ └─io_uring-cp-for,925 link-cp.c file
│ ├─{iou-mgr-925},926
│ └─{iou-wrk-925},927
└─bash,929
└─pstree,938 -a -t -p
I was able to trace once:
root@ub1704-166:~# trace-cmd record -e all -P 925
Hit Ctrl^C to stop recording
^CCPU0 data recorded at offset=0x7b8000
10842112 bytes in size
CPU1 data recorded at offset=0x120f000
36450304 bytes in size
But the 2nd run reproduced the problem:
root@ub1704-166:~# trace-cmd record -e all -P 925
I was also able to reproduce it with:
while true; do cat linux-image-5.12.0-rc8-dbg_5.12.0-rc8-22_amd64.deb > /dev/null; done
and
pidofcat=$(pidof cat); echo $pidofcat; trace-cmd record -e all -P $pidofcat
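(The same pattern generalizes to any busy target; a sketch where a plain shell busy-loop stands in for the workload, and only the echoed trace-cmd line is the actual trigger:)

```shell
# Sketch: any CPU-bound task works as a tracing target. The shell
# busy-loop below stands in for cat/io_uring-cp-forever.
while :; do :; done &
busy=$!
echo "now run: trace-cmd record -e all -P $busy"
kill "$busy"
```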
So it seems any busy thread (user or kernel) triggers the problem.
Any ideas what has changed after 5.8?
Thanks!
metze
Thread overview (17+ messages):
2021-04-22 14:26 Stefan Metzmacher [this message]
2021-04-22 14:55 ` Tracing busy processes/threads freezes/stalls the whole machine Stefan Metzmacher
2021-05-04 12:26 ` Stefan Metzmacher
2021-05-04 13:24 ` Steven Rostedt
2021-05-04 13:28 ` Stefan Metzmacher
2021-05-04 13:32 ` Steven Rostedt
2021-05-04 13:35 ` Steven Rostedt
2021-05-05 9:50 ` Stefan Metzmacher
2021-05-31 15:39 ` Stefan Metzmacher
2021-07-19 17:07 ` Stefan Metzmacher
2021-07-22 22:43 ` sched_waking vs. set_event_pid crash (Re: Tracing busy processes/threads freezes/stalls the whole machine) Stefan Metzmacher
2021-07-23 1:41 ` Steven Rostedt
2021-07-23 2:51 ` Steven Rostedt
2021-07-23 10:35 ` Stefan Metzmacher
2021-07-23 11:29 ` Steven Rostedt
2021-07-23 11:53 ` Stefan Metzmacher
2021-07-23 12:41 ` Steven Rostedt