Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes

public inbox for [email protected]

From: Peter Mann <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], [email protected]
Subject: Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
Date: Thu, 31 Oct 2024 16:37:33 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 10/31/24 15:02, Jens Axboe wrote:
> On 10/31/24 7:54 AM, Jens Axboe wrote:
>> On 10/31/24 5:20 AM, Peter Mann wrote:
>>> Hello,
>>>
>>> it appears that there is a high probability of a deadlock occurring when performing fsfreeze on a filesystem which is currently performing multiple io_uring O_DIRECT writes.
>>>
>>> Steps to reproduce:
>>> 1. Mount xfs or ext4 filesystem on /mnt
>>>
>>> 2. Start writing to the filesystem. Must use io_uring, direct io and iodepth>1 to reproduce:
>>> fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test
>>>
>>> 3. Run this in another shell. For me it deadlocks almost immediately:
>>> while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done
>>>
>>> 4. fsfreeze and all tasks attempting to write to /mnt get stuck:
>>> At this point the stuck processes are in uninterruptible sleep and cannot be killed, even with SIGKILL.
>>> If you try 'touch /mnt/a', for example, the new process gets stuck in exactly the same way.
>>>
>>> This gets printed when running 6.11.4 with some debug options enabled:
>>> [  539.586122] Showing all locks held in the system:
>>> [  539.612972] 1 lock held by khungtaskd/35:
>>> [  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
>>> [  539.640561] 1 lock held by dmesg/640:
>>> [  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
>>> [  539.669220] 2 locks held by fio/647:
>>> [  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.715587] 2 locks held by fio/648:
>>> [  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.765484] 2 locks held by fio/649:
>>> [  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.815466] 2 locks held by fio/650:
>>> [  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.867469] 1 lock held by fsfreeze/696:
>>> [  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
>>>
>>> I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical volumes.
>>> It deadlocks on all kernels that I tried (all on amd64):
>>> 6.12-rc5 (compiled from kernel.org)
>>> 6.11.4 (compiled from kernel.org)
>>> 6.10.11-1~bpo12+1 (debian)
>>> 6.1.0-23 (debian)
>>> 5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
>>> 5.10.0-33-amd64 (debian)
>>>
>>> I tried to compile some older ones to check if it's a regression, but
>>> those either didn't compile or didn't boot in my VM, sorry about that.
>>> If you have anything specific for me to try, I'm happy to help.
>>>
>>> Found this issue as well, so it seems like it's not just me:
>>> https://gitlab.com/qemu-project/qemu/-/issues/881
>>> Note that mariadb 10.6 adds support for io_uring, and that proxmox backups perform fsfreeze in the guest VM.
>>>
>>> Originally I discovered this after a scheduled lvm snapshot of mariadb
>>> got stuck. It appears that lvm calls dm_suspend, which then calls
>>> freeze_super, so it looks like the same bug to me. I discovered the
>>> simpler fsfreeze/fio reproduction method when I tried to find a
>>> workaround.
>> Thanks for the report! I'm pretty sure this is due to the freezing not
>> allowing task_work to run, which prevents completions from being run.
>> Hence you run into a situation where freezing isn't running the very IO
>> completions that will free up the rwsem, with IO issue being stuck on
>> the freeze having started.
>>
>> I'll take a look...
> Can you try the below? Probably easiest on 6.12-rc5 as you already
> tested that and should apply directly.
>
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 30448f343c7f..ea057ec4365f 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1013,6 +1013,18 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
>   	return IOU_OK;
>   }
>   
> +static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
> +{
> +	if (!(req->flags & REQ_F_ISREG))
> +		return true;
> +	if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
> +		kiocb_start_write(kiocb);
> +		return true;
> +	}
> +
> +	return sb_start_write_trylock(file_inode(kiocb->ki_filp)->i_sb);
> +}
> +
>   int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   {
>   	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
> @@ -1050,8 +1062,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   	if (unlikely(ret))
>   		return ret;
>   
> -	if (req->flags & REQ_F_ISREG)
> -		kiocb_start_write(kiocb);
> +	if (unlikely(!io_kiocb_start_write(req, kiocb)))
> +		return -EAGAIN;
>   	kiocb->ki_flags |= IOCB_WRITE;
>   
>   	if (likely(req->file->f_op->write_iter))
>

I can confirm this fixes both the fsfreeze and lvm snapshot issues.

Thank you very much!

-- 
Peter Mann