public inbox for [email protected]
* [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
@ 2024-10-31 11:20 Peter Mann
  2024-10-31 13:54 ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Mann @ 2024-10-31 11:20 UTC (permalink / raw)
  To: axboe; +Cc: asml.silence, io-uring

Hello,

it appears that there is a high probability of a deadlock occurring when 
performing fsfreeze on a filesystem which is currently performing 
multiple io_uring O_DIRECT writes.

Steps to reproduce:
1. Mount xfs or ext4 filesystem on /mnt

2. Start writing to the filesystem. io_uring, direct I/O and iodepth>1 
are required to reproduce (a minimal liburing sketch of the same write 
pattern follows step 4):
fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test

3. Run this in another shell. For me it deadlocks almost immediately:
while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done

4. Fsfreeze and all tasks attempting to write to /mnt get stuck:
At this point the stuck processes cannot be killed by SIGKILL and they 
are stuck in uninterruptible sleep.
If you try 'touch /mnt/a', for example, the new process gets stuck in 
the exact same way as well.
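
In case it is useful, here is a rough sketch of a liburing-based 
O_DIRECT write loop that should exercise the same path as the fio job 
in step 2. It is untested and purely illustrative; the file name, 
queue depth and block size simply mirror the fio options above, and it 
assumes liburing is installed:

#define _GNU_SOURCE	/* O_DIRECT */
#include <fcntl.h>
#include <liburing.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define QD	32			/* matches --iodepth=32 */
#define BS	4096			/* matches --bs=4k */
#define FSIZE	(100 * 1024 * 1024)	/* matches --size=100M */

/* pick a random BS-aligned offset inside the file */
static off_t rand_off(void)
{
	return (off_t)(random() % (FSIZE / BS)) * BS;
}

int main(void)
{
	struct io_uring ring;
	void *buf;
	int fd, i;

	fd = open("/mnt/fio_test", O_WRONLY | O_CREAT | O_DIRECT, 0644);
	if (fd < 0 || posix_memalign(&buf, BS, BS))
		return 1;
	memset(buf, 0xaa, BS);
	if (io_uring_queue_init(QD, &ring, 0))
		return 1;

	/* queue up QD direct writes, then keep the queue full forever */
	for (i = 0; i < QD; i++)
		io_uring_prep_write(io_uring_get_sqe(&ring), fd, buf, BS, rand_off());
	io_uring_submit(&ring);

	for (;;) {
		struct io_uring_cqe *cqe;

		if (io_uring_wait_cqe(&ring, &cqe))
			break;
		io_uring_cqe_seen(&ring, cqe);
		/* resubmit one write per completion to stay at depth QD */
		io_uring_prep_write(io_uring_get_sqe(&ring), fd, buf, BS, rand_off());
		io_uring_submit(&ring);
	}
	io_uring_queue_exit(&ring);
	return 0;
}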

This gets printed when running 6.11.4 with some debug options enabled:
[  539.586122] Showing all locks held in the system:
[  539.612972] 1 lock held by khungtaskd/35:
[  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
[  539.640561] 1 lock held by dmesg/640:
[  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
[  539.669220] 2 locks held by fio/647:
[  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.715587] 2 locks held by fio/648:
[  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.765484] 2 locks held by fio/649:
[  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.815466] 2 locks held by fio/650:
[  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
[  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
[  539.867469] 1 lock held by fsfreeze/696:
[  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600

I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical 
volumes.
It deadlocks on all kernels that I tried (all on amd64):
6.12-rc5 (compiled from kernel.org)
6.11.4 (compiled from kernel.org)
6.10.11-1~bpo12+1 (debian)
6.1.0-23 (debian)
5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
5.10.0-33-amd64 (debian)

I tried to compile some older ones to check if it's a regression, but 
those either didn't compile or didn't boot in my VM, sorry about that.
If you have anything specific for me to try, I'm happy to help.

I also found this existing report, so it seems like it's not just me:
https://gitlab.com/qemu-project/qemu/-/issues/881
Note that mariadb 10.6 adds support for io_uring, and that proxmox 
backups perform fsfreeze in the guest VM.

Originally I discovered this after a scheduled lvm snapshot of mariadb 
got stuck.
It appears that lvm calls dm_suspend, which then calls freeze_super, so 
it looks like the same bug to me.
I discovered the simpler fsfreeze/fio reproduction method when I tried 
to find a workaround.
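
For what it's worth, fsfreeze -f/-u (and, as far as I can tell, the 
guest agent freeze that Proxmox backups trigger) come down to the 
FIFREEZE and FITHAW ioctls on the mount point, which should end up in 
the same freeze_super() path that dm_suspend takes. A minimal 
illustration, with /mnt standing in for the real mount point:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/fs.h>	/* FIFREEZE, FITHAW */
#include <unistd.h>

int main(void)
{
	int fd = open("/mnt", O_RDONLY);

	if (fd < 0)
		return 1;
	ioctl(fd, FIFREEZE, 0);		/* equivalent of fsfreeze -f /mnt */
	/* a snapshot or backup would normally happen here */
	ioctl(fd, FITHAW, 0);		/* equivalent of fsfreeze -u /mnt */
	close(fd);
	return 0;
}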

Regards,
Peter Mann


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
  2024-10-31 11:20 [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes Peter Mann
@ 2024-10-31 13:54 ` Jens Axboe
  2024-10-31 14:02   ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2024-10-31 13:54 UTC (permalink / raw)
  To: Peter Mann; +Cc: asml.silence, io-uring

On 10/31/24 5:20 AM, Peter Mann wrote:
> Hello,
> 
> it appears that there is a high probability of a deadlock occurring when performing fsfreeze on a filesystem which is currently performing multiple io_uring O_DIRECT writes.
> 
> Steps to reproduce:
> 1. Mount xfs or ext4 filesystem on /mnt
> 
> 2. Start writing to the filesystem. Must use io_uring, direct io and iodepth>1 to reproduce:
> fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test
> 
> 3. Run this in another shell. For me it deadlocks almost immediately:
> while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done
> 
> 4. Fsfreeze and all tasks attempting to write /mnt get stuck:
> At this point all stuck processes cannot be killed by SIGKILL and they are stuck in uninterruptible sleep.
> If you try 'touch /mnt/a' for example, the new process gets stuck in the exact same way as well.
> 
> This gets printed when running 6.11.4 with some debug options enabled:
> [  539.586122] Showing all locks held in the system:
> [  539.612972] 1 lock held by khungtaskd/35:
> [  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
> [  539.640561] 1 lock held by dmesg/640:
> [  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
> [  539.669220] 2 locks held by fio/647:
> [  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.715587] 2 locks held by fio/648:
> [  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.765484] 2 locks held by fio/649:
> [  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.815466] 2 locks held by fio/650:
> [  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
> [  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
> [  539.867469] 1 lock held by fsfreeze/696:
> [  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
> 
> I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical volumes.
> It deadlocks on all kernels that I tried (all on amd64):
> 6.12-rc5 (compiled from kernel.org)
> 6.11.4 (compiled from kernel.org)
> 6.10.11-1~bpo12+1 (debian)
> 6.1.0-23 (debian)
> 5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
> 5.10.0-33-amd64 (debian)
> 
> I tried to compile some older ones to check if it's a regression, but
> those either didn't compile or didn't boot in my VM, sorry about that.
> If you have anything specific for me to try, I'm happy to help.
> 
> Found this issue as well, so it seems like it's not just me:
> https://gitlab.com/qemu-project/qemu/-/issues/881
> Note that mariadb 10.6 adds support for io_uring, and that proxmox backups perform fsfreeze in the guest VM.
> 
> Originally I discovered this after a scheduled lvm snapshot of mariadb
> got stuck. It appears that lvm calls dm_suspend, which then calls
> freeze_super, so it looks like the same bug to me. I discovered the
> simpler fsfreeze/fio reproduction method when I tried to find a
> workaround.

Thanks for the report! I'm pretty sure this is due to the freezing not
allowing task_work to run, which prevents completions from being run.
Hence you run into a situation where freezing isn't running the very IO
completions that will free up the rwsem, with IO issue being stuck on
the freeze having started.

I'll take a look...

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
  2024-10-31 13:54 ` Jens Axboe
@ 2024-10-31 14:02   ` Jens Axboe
  2024-10-31 15:37     ` Peter Mann
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2024-10-31 14:02 UTC (permalink / raw)
  To: Peter Mann; +Cc: asml.silence, io-uring

On 10/31/24 7:54 AM, Jens Axboe wrote:
> On 10/31/24 5:20 AM, Peter Mann wrote:
>> Hello,
>>
>> it appears that there is a high probability of a deadlock occurring when performing fsfreeze on a filesystem which is currently performing multiple io_uring O_DIRECT writes.
>>
>> Steps to reproduce:
>> 1. Mount xfs or ext4 filesystem on /mnt
>>
>> 2. Start writing to the filesystem. Must use io_uring, direct io and iodepth>1 to reproduce:
>> fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test
>>
>> 3. Run this in another shell. For me it deadlocks almost immediately:
>> while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done
>>
>> 4. Fsfreeze and all tasks attempting to write /mnt get stuck:
>> At this point all stuck processes cannot be killed by SIGKILL and they are stuck in uninterruptible sleep.
>> If you try 'touch /mnt/a' for example, the new process gets stuck in the exact same way as well.
>>
>> This gets printed when running 6.11.4 with some debug options enabled:
>> [  539.586122] Showing all locks held in the system:
>> [  539.612972] 1 lock held by khungtaskd/35:
>> [  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
>> [  539.640561] 1 lock held by dmesg/640:
>> [  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
>> [  539.669220] 2 locks held by fio/647:
>> [  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>> [  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>> [  539.715587] 2 locks held by fio/648:
>> [  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>> [  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>> [  539.765484] 2 locks held by fio/649:
>> [  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>> [  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>> [  539.815466] 2 locks held by fio/650:
>> [  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>> [  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>> [  539.867469] 1 lock held by fsfreeze/696:
>> [  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
>>
>> I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical volumes.
>> It deadlocks on all kernels that I tried (all on amd64):
>> 6.12-rc5 (compiled from kernel.org)
>> 6.11.4 (compiled from kernel.org)
>> 6.10.11-1~bpo12+1 (debian)
>> 6.1.0-23 (debian)
>> 5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
>> 5.10.0-33-amd64 (debian)
>>
>> I tried to compile some older ones to check if it's a regression, but
>> those either didn't compile or didn't boot in my VM, sorry about that.
>> If you have anything specific for me to try, I'm happy to help.
>>
>> Found this issue as well, so it seems like it's not just me:
>> https://gitlab.com/qemu-project/qemu/-/issues/881
>> Note that mariadb 10.6 adds support for io_uring, and that proxmox backups perform fsfreeze in the guest VM.
>>
>> Originally I discovered this after a scheduled lvm snapshot of mariadb
>> got stuck. It appears that lvm calls dm_suspend, which then calls
>> freeze_super, so it looks like the same bug to me. I discovered the
>> simpler fsfreeze/fio reproduction method when I tried to find a
>> workaround.
> 
> Thanks for the report! I'm pretty sure this is due to the freezing not
> allowing task_work to run, which prevents completions from being run.
> Hence you run into a situation where freezing isn't running the very IO
> completions that will free up the rwsem, with IO issue being stuck on
> the freeze having started.
> 
> I'll take a look...

Can you try the below? Probably easiest on 6.12-rc5, since you already
tested that and the patch should apply directly.

diff --git a/io_uring/rw.c b/io_uring/rw.c
index 30448f343c7f..ea057ec4365f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1013,6 +1013,18 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
+static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
+{
+	if (!(req->flags & REQ_F_ISREG))
+		return true;
+	if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
+		kiocb_start_write(kiocb);
+		return true;
+	}
+
+	return sb_start_write_trylock(file_inode(kiocb->ki_filp)->i_sb);
+}
+
 int io_write(struct io_kiocb *req, unsigned int issue_flags)
 {
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
@@ -1050,8 +1062,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		return ret;
 
-	if (req->flags & REQ_F_ISREG)
-		kiocb_start_write(kiocb);
+	if (unlikely(!io_kiocb_start_write(req, kiocb)))
+		return -EAGAIN;
 	kiocb->ki_flags |= IOCB_WRITE;
 
 	if (likely(req->file->f_op->write_iter))

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
  2024-10-31 14:02   ` Jens Axboe
@ 2024-10-31 15:37     ` Peter Mann
  2024-10-31 15:43       ` Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Mann @ 2024-10-31 15:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: asml.silence, io-uring

On 10/31/24 15:02, Jens Axboe wrote:
> On 10/31/24 7:54 AM, Jens Axboe wrote:
>> On 10/31/24 5:20 AM, Peter Mann wrote:
>>> Hello,
>>>
>>> it appears that there is a high probability of a deadlock occurring when performing fsfreeze on a filesystem which is currently performing multiple io_uring O_DIRECT writes.
>>>
>>> Steps to reproduce:
>>> 1. Mount xfs or ext4 filesystem on /mnt
>>>
>>> 2. Start writing to the filesystem. Must use io_uring, direct io and iodepth>1 to reproduce:
>>> fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test
>>>
>>> 3. Run this in another shell. For me it deadlocks almost immediately:
>>> while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done
>>>
>>> 4. Fsfreeze and all tasks attempting to write /mnt get stuck:
>>> At this point all stuck processes cannot be killed by SIGKILL and they are stuck in uninterruptible sleep.
>>> If you try 'touch /mnt/a' for example, the new process gets stuck in the exact same way as well.
>>>
>>> This gets printed when running 6.11.4 with some debug options enabled:
>>> [  539.586122] Showing all locks held in the system:
>>> [  539.612972] 1 lock held by khungtaskd/35:
>>> [  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
>>> [  539.640561] 1 lock held by dmesg/640:
>>> [  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
>>> [  539.669220] 2 locks held by fio/647:
>>> [  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.715587] 2 locks held by fio/648:
>>> [  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.765484] 2 locks held by fio/649:
>>> [  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.815466] 2 locks held by fio/650:
>>> [  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>> [  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>> [  539.867469] 1 lock held by fsfreeze/696:
>>> [  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
>>>
>>> I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical volumes.
>>> It deadlocks on all kernels that I tried (all on amd64):
>>> 6.12-rc5 (compiled from kernel.org)
>>> 6.11.4 (compiled from kernel.org)
>>> 6.10.11-1~bpo12+1 (debian)
>>> 6.1.0-23 (debian)
>>> 5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
>>> 5.10.0-33-amd64 (debian)
>>>
>>> I tried to compile some older ones to check if it's a regression, but
>>> those either didn't compile or didn't boot in my VM, sorry about that.
>>> If you have anything specific for me to try, I'm happy to help.
>>>
>>> Found this issue as well, so it seems like it's not just me:
>>> https://gitlab.com/qemu-project/qemu/-/issues/881
>>> Note that mariadb 10.6 adds support for io_uring, and that proxmox backups perform fsfreeze in the guest VM.
>>>
>>> Originally I discovered this after a scheduled lvm snapshot of mariadb
>>> got stuck. It appears that lvm calls dm_suspend, which then calls
>>> freeze_super, so it looks like the same bug to me. I discovered the
>>> simpler fsfreeze/fio reproduction method when I tried to find a
>>> workaround.
>> Thanks for the report! I'm pretty sure this is due to the freezing not
>> allowing task_work to run, which prevents completions from being run.
>> Hence you run into a situation where freezing isn't running the very IO
>> completions that will free up the rwsem, with IO issue being stuck on
>> the freeze having started.
>>
>> I'll take a look...
> Can you try the below? Probably easiest on 6.12-rc5 as you already
> tested that and should apply directly.
>
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 30448f343c7f..ea057ec4365f 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -1013,6 +1013,18 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
>   	return IOU_OK;
>   }
>   
> +static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
> +{
> +	if (!(req->flags & REQ_F_ISREG))
> +		return true;
> +	if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
> +		kiocb_start_write(kiocb);
> +		return true;
> +	}
> +
> +	return sb_start_write_trylock(file_inode(kiocb->ki_filp)->i_sb);
> +}
> +
>   int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   {
>   	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
> @@ -1050,8 +1062,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>   	if (unlikely(ret))
>   		return ret;
>   
> -	if (req->flags & REQ_F_ISREG)
> -		kiocb_start_write(kiocb);
> +	if (unlikely(!io_kiocb_start_write(req, kiocb)))
> +		return -EAGAIN;
>   	kiocb->ki_flags |= IOCB_WRITE;
>   
>   	if (likely(req->file->f_op->write_iter))
>

I can confirm this fixes both the fsfreeze and lvm snapshot issues.

Thank you very much!

-- 
Peter Mann

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes
  2024-10-31 15:37     ` Peter Mann
@ 2024-10-31 15:43       ` Jens Axboe
  0 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2024-10-31 15:43 UTC (permalink / raw)
  To: Peter Mann; +Cc: asml.silence, io-uring

On 10/31/24 9:37 AM, Peter Mann wrote:
> On 10/31/24 15:02, Jens Axboe wrote:
>> On 10/31/24 7:54 AM, Jens Axboe wrote:
>>> On 10/31/24 5:20 AM, Peter Mann wrote:
>>>> Hello,
>>>>
>>>> it appears that there is a high probability of a deadlock occurring when performing fsfreeze on a filesystem which is currently performing multiple io_uring O_DIRECT writes.
>>>>
>>>> Steps to reproduce:
>>>> 1. Mount xfs or ext4 filesystem on /mnt
>>>>
>>>> 2. Start writing to the filesystem. Must use io_uring, direct io and iodepth>1 to reproduce:
>>>> fio --ioengine=io_uring --direct=1 --bs=4k --size=100M --rw=randwrite --loops=100000 --iodepth=32 --name=test --filename=/mnt/fio_test
>>>>
>>>> 3. Run this in another shell. For me it deadlocks almost immediately:
>>>> while true; do fsfreeze -f /mnt/; echo froze; fsfreeze -u /mnt/; echo unfroze; done
>>>>
>>>> 4. Fsfreeze and all tasks attempting to write /mnt get stuck:
>>>> At this point all stuck processes cannot be killed by SIGKILL and they are stuck in uninterruptible sleep.
>>>> If you try 'touch /mnt/a' for example, the new process gets stuck in the exact same way as well.
>>>>
>>>> This gets printed when running 6.11.4 with some debug options enabled:
>>>> [  539.586122] Showing all locks held in the system:
>>>> [  539.612972] 1 lock held by khungtaskd/35:
>>>> [  539.626204]  #0: ffffffffb3b1c100 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x32/0x1e0
>>>> [  539.640561] 1 lock held by dmesg/640:
>>>> [  539.654282]  #0: ffff9fd541a8e0e0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x74/0x2d0
>>>> [  539.669220] 2 locks held by fio/647:
>>>> [  539.684253]  #0: ffff9fd54fe720b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>>> [  539.699565]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>>> [  539.715587] 2 locks held by fio/648:
>>>> [  539.732293]  #0: ffff9fd54fe710b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>>> [  539.749121]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>>> [  539.765484] 2 locks held by fio/649:
>>>> [  539.781483]  #0: ffff9fd541a8f0b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>>> [  539.798785]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>>> [  539.815466] 2 locks held by fio/650:
>>>> [  539.831966]  #0: ffff9fd54fe740b0 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_enter+0x5c2/0x820
>>>> [  539.849527]  #1: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: io_issue_sqe+0x9c/0x780
>>>> [  539.867469] 1 lock held by fsfreeze/696:
>>>> [  539.884565]  #0: ffff9fd541a8d450 (sb_writers#15){++++}-{0:0}, at: freeze_super+0x20a/0x600
>>>>
>>>> I reproduced this bug on nvme, sata ssd, virtio disks and lvm logical volumes.
>>>> It deadlocks on all kernels that I tried (all on amd64):
>>>> 6.12-rc5 (compiled from kernel.org)
>>>> 6.11.4 (compiled from kernel.org)
>>>> 6.10.11-1~bpo12+1 (debian)
>>>> 6.1.0-23 (debian)
>>>> 5.14.0-427.40.1.el9_4.x86_64 (rocky linux)
>>>> 5.10.0-33-amd64 (debian)
>>>>
>>>> I tried to compile some older ones to check if it's a regression, but
>>>> those either didn't compile or didn't boot in my VM, sorry about that.
>>>> If you have anything specific for me to try, I'm happy to help.
>>>>
>>>> Found this issue as well, so it seems like it's not just me:
>>>> https://gitlab.com/qemu-project/qemu/-/issues/881
>>>> Note that mariadb 10.6 adds support for io_uring, and that proxmox backups perform fsfreeze in the guest VM.
>>>>
>>>> Originally I discovered this after a scheduled lvm snapshot of mariadb
>>>> got stuck. It appears that lvm calls dm_suspend, which then calls
>>>> freeze_super, so it looks like the same bug to me. I discovered the
>>>> simpler fsfreeze/fio reproduction method when I tried to find a
>>>> workaround.
>>> Thanks for the report! I'm pretty sure this is due to the freezing not
>>> allowing task_work to run, which prevents completions from being run.
>>> Hence you run into a situation where freezing isn't running the very IO
>>> completions that will free up the rwsem, with IO issue being stuck on
>>> the freeze having started.
>>>
>>> I'll take a look...
>> Can you try the below? Probably easiest on 6.12-rc5 as you already
>> tested that and should apply directly.
>>
>> diff --git a/io_uring/rw.c b/io_uring/rw.c
>> index 30448f343c7f..ea057ec4365f 100644
>> --- a/io_uring/rw.c
>> +++ b/io_uring/rw.c
>> @@ -1013,6 +1013,18 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags)
>>       return IOU_OK;
>>   }
>>   +static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
>> +{
>> +    if (!(req->flags & REQ_F_ISREG))
>> +        return true;
>> +    if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
>> +        kiocb_start_write(kiocb);
>> +        return true;
>> +    }
>> +
>> +    return sb_start_write_trylock(file_inode(kiocb->ki_filp)->i_sb);
>> +}
>> +
>>   int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>   {
>>       bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
>> @@ -1050,8 +1062,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
>>       if (unlikely(ret))
>>           return ret;
>>   -    if (req->flags & REQ_F_ISREG)
>> -        kiocb_start_write(kiocb);
>> +    if (unlikely(!io_kiocb_start_write(req, kiocb)))
>> +        return -EAGAIN;
>>       kiocb->ki_flags |= IOCB_WRITE;
>>         if (likely(req->file->f_op->write_iter))
>>
> 
> I can confirm this fixes both the fsfreeze and lvm snapshot issues.
> 
> Thank you very much!

Thanks for the great bug report and the quick testing! Much appreciated.
Patch will land in 6.12-rc6 and make its way back to stable after that.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-10-31 15:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-31 11:20 [bug report] io_uring: fsfreeze deadlocks when performing O_DIRECT writes Peter Mann
2024-10-31 13:54 ` Jens Axboe
2024-10-31 14:02   ` Jens Axboe
2024-10-31 15:37     ` Peter Mann
2024-10-31 15:43       ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox