From: Peter Mann <[email protected]>
To: [email protected]
Cc: [email protected], [email protected]
Subject: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
Date: Thu, 31 Oct 2024 12:20:41 +0100
Message-ID: <[email protected]>

Hello,

it appears that fsfreeze is very likely to deadlock when it is run on a
filesystem that is servicing multiple concurrent io_uring O_DIRECT
writes.

Steps to reproduce:
1. Mount xfs or ext4 filesystem on /mnt

2. Start writing to the filesystem. Reproducing requires io_uring,
direct I/O and iodepth>1:
fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite \
    --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test

3. Run this in another shell. For me it deadlocks almost immediately:
while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done

4. fsfreeze and every task attempting to write to /mnt get stuck:
The stuck processes are in uninterruptible sleep and cannot be killed,
not even with SIGKILL.
Any new writer, for example 'touch /mnt/a', gets stuck in exactly the
same way.
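
One way to confirm the hang from a third shell is to list the tasks that
are in uninterruptible sleep (state D). The following is a minimal
sketch, not part of the original reproduction: it assumes a procps-style
`ps`, and the function names filter_dstate and list_dstate are
hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose second column (STAT) starts
# with "D", i.e. tasks in uninterruptible sleep. Expects ps-style rows on
# stdin in the order: PID STAT WCHAN COMMAND. The header row is dropped
# as a side effect because "STAT" does not start with "D".
filter_dstate() {
    awk '$2 ~ /^D/ { print }'
}

# Hypothetical helper: list every task currently in state D together
# with the kernel function it is waiting in (wchan).
list_dstate() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

If the hang reproduces, calling list_dstate after step 3 would be
expected to show the fio workers, fsfreeze and any later writer such as
touch.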

This gets printed when running 6.11.4 with some debug options enabled:
[  539.586122] Showing all locks held in the system:
[  539.612972] 1 lock held by khungtaskd/35:
[  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
[  539.640561] 1 lock held by dmesg/640:
[  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
[  539.669220] 2 locks held by fio/647:
[  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.715587] 2 locks held by fio/648:
[  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.765484] 2 locks held by fio/649:
[  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.815466] 2 locks held by fio/650:
[  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.867469] 1 lock held by fsfreeze/696:
[  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
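
The dump is easier to read when grouped by lock rather than by task. The
following is a rough sketch, assuming the log is fed in as unwrapped
dmesg lines in the format shown above (one "held by" line per task, one
"#N:" line per lock). The function name summarize_locks is hypothetical,
and task names containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read a lockdep "Showing all locks held" dump on
# stdin and print one line per lock name, followed by the comm/pid of
# every task reported as holding it.
summarize_locks() {
    awk '
    # "N lock(s) held by comm/pid:" opens the section for one task.
    / held by / {
        holder = $NF
        sub(/:$/, "", holder)
        next
    }
    # " #N: address (name){...}" is one held lock; the lock name is the
    # first parenthesised token on the line.
    / #[0-9]+: / && holder != "" {
        if (match($0, /\([^)]*\)/)) {
            name = substr($0, RSTART + 1, RLENGTH - 2)
            holders[name] = holders[name] " " holder
        }
    }
    END {
        for (name in holders) print name ":" holders[name]
    }' | sort
}
```

Run as `dmesg | summarize_locks`. On the dump above, this would list all
four fio tasks and fsfreeze/696 against the same sb_writers#15 entry,
with each fio task also appearing under &ctx->uring_lock.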

I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical 
volumes.
It deadlocks on all kernels that I tried (all on amd64):
6.12-rc5 (compiled from kernel.org)
6.11.4 (compiled from kernel.org)
6.10.11-1~bpo12+1 (debian)
6.1.0-23 (debian)
5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
5.10.0-33-amd64 (debian)

I tried to compile some older ones to check if it's a regression, but 
those either didn't compile or didn't boot in my VM, sorry about that.
If you have anything specific for me to try, I'm happy to help.

Found this issue as well, so it seems like it's not just me:
https://gitlab.com/qemu-project/qemu/-/issues/881
Note that mariadb 10.6 adds support for io_uring, and that proxmox 
backups perform fsfreeze in the guest VM.

Originally I discovered this after a scheduled lvm snapshot of mariadb 
got stuck.
It appears that lvm calls dm_suspend, which then calls freeze_super, so 
it looks like the same bug to me.
I discovered the simpler fsfreeze/fio reproduction method when I tried 
to find a workaround.

Regards,
Peter Mann

