Possible io_uring regression with QEMU on Ubuntu's kernel

* Possible io_uring regression with QEMU on Ubuntu's kernel
@ 2021-06-30  8:47 Juhyung Park
       [not found] ` <CAEO-eVO_hEvGzoUdoExs67ybfQC0WgpwOLbg3n9fc+R4JfikZQ@mail.gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Juhyung Park @ 2021-06-30  8:47 UTC (permalink / raw)
  To: Kamal Mostafa, Stefan Bader, io-uring
  Cc: Jens Axboe, Stefano Garzarella, qemu-devel

Hi everyone.

With Ubuntu 20.04's latest HWE kernel, 5.8.0-59, I'm noticing some
weirdness when using QEMU/libvirt with the following storage
configuration:

<disk type="block" device="disk">
  <driver name="qemu" type="raw" cache="none" io="io_uring" discard="unmap" detect_zeroes="unmap"/>
  <source dev="/dev/disk/by-id/md-uuid-df271a1e:9dfb7edb:8dc4fbb8:c43e652f-part1" index="1"/>
  <backingStore/>
  <target dev="vda" bus="virtio"/>
  <alias name="virtio-disk0"/>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</disk>

QEMU version is 5.2+dfsg-9ubuntu3 and libvirt version is 7.0.0-2ubuntu2.

The guest VM is unable to handle I/O properly with io_uring, and
removing io="io_uring" fixes the issue.
On one machine (EPYC 7742), the partition table cannot be read, and on
another (Ryzen 9 3950X), ext4 detects problems with journaling and
ultimately remounts the guest disk read-only:

[    2.712321] virtio_blk virtio5: [vda] 3906519775 512-byte logical blocks (2.00 TB/1.82 TiB)
[    2.714054] vda: detected capacity change from 0 to 2000138124800
[    2.963671] blk_update_request: I/O error, dev vda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.964909] Buffer I/O error on dev vda, logical block 0, async page read
[    2.966021] blk_update_request: I/O error, dev vda, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.967177] Buffer I/O error on dev vda, logical block 1, async page read
[    2.968330] blk_update_request: I/O error, dev vda, sector 2 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.969504] Buffer I/O error on dev vda, logical block 2, async page read
[    2.970767] blk_update_request: I/O error, dev vda, sector 3 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.971624] Buffer I/O error on dev vda, logical block 3, async page read
[    2.972170] blk_update_request: I/O error, dev vda, sector 4 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.972728] Buffer I/O error on dev vda, logical block 4, async page read
[    2.973308] blk_update_request: I/O error, dev vda, sector 5 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.973920] Buffer I/O error on dev vda, logical block 5, async page read
[    2.974496] blk_update_request: I/O error, dev vda, sector 6 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.975093] Buffer I/O error on dev vda, logical block 6, async page read
[    2.975685] blk_update_request: I/O error, dev vda, sector 7 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.976295] Buffer I/O error on dev vda, logical block 7, async page read
[    2.980074] blk_update_request: I/O error, dev vda, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.981104] Buffer I/O error on dev vda, logical block 0, async page read
[    2.981786] blk_update_request: I/O error, dev vda, sector 1 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[    2.982083] ixgbe 0000:06:00.0: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
[    2.982442] Buffer I/O error on dev vda, logical block 1, async page read
[    2.983642] ldm_validate_partition_table(): Disk read failed.
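
The workaround mentioned above (dropping io="io_uring" from the <driver> element) could be sketched as a small shell filter. This is only an illustration, not something taken from the report: the function name strip_io_uring and the domain name "guest" are hypothetical, and editing the domain by hand with `virsh edit` should achieve the same result.

```shell
# strip_io_uring: read libvirt domain XML on stdin and drop the
# io="io_uring" attribute (double- or single-quoted), so that QEMU
# falls back to its default AIO backend for that disk.
strip_io_uring() {
    sed -E -e 's/[[:space:]]+io="io_uring"//g' \
           -e "s/[[:space:]]+io='io_uring'//g"
}

# Hypothetical usage ("guest" is a placeholder domain name):
#   virsh dumpxml --inactive guest | strip_io_uring > guest.xml
#   virsh define guest.xml
```

The filter only touches the io attribute; cache, discard, and detect_zeroes are left as they were.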

Kernel 5.8.0-55 is fine, and the only io_uring-related change between
5.8.0-55 and 5.8.0-59 is the commit 4b982bd0f383 ("io_uring: don't
mark S_ISBLK async work as unbounded").
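
A comparison like the one above can be approximated by limiting `git log` to the io_uring sources between the two release tags. The sketch below assumes a local clone of the Ubuntu kernel tree; the function name and the placeholder for the 5.8.0-55 tag are hypothetical, and the path list (fs/io_uring.c, fs/io-wq.c, fs/io-wq.h) is my assumption about where io_uring lives in a 5.8 tree.

```shell
# list_io_uring_commits REPO GOOD BAD
# Print one-line summaries of the commits in GOOD..BAD that touch
# the io_uring source files.
list_io_uring_commits() {
    git -C "$1" log --oneline "$2..$3" -- \
        fs/io_uring.c fs/io-wq.c fs/io-wq.h
}

# Hypothetical usage (replace the placeholder with the real -55 tag):
#   list_io_uring_commits ~/src/ubuntu-kernel \
#       <tag-for-5.8.0-55> Ubuntu-hwe-5.8-5.8.0-59.66_20.04.1
```

A path-limited log only narrows the search. A regression that shows up through io_uring can still come from a commit outside these files.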

The weird thing is that this commit was first introduced with v5.12,
but neither mainline v5.12.0 nor v5.13.0 is affected by this issue.

My guess is that one of the following v5.12 commits, which came after
the backported one, fixes the issue, but it could also be an earlier
commit:
c7d95613c7d6 io_uring: fix early sqd_list removal sqpoll hangs
9728463737db io_uring: fix rw req completion
6ad7f2332e84 io_uring: clear F_REISSUE right after getting it
e82ad4853948 io_uring: fix !CONFIG_BLOCK compilation failure
230d50d448ac io_uring: move reissue into regular IO path
07204f21577a io_uring: fix EIOCBQUEUED iter revert
696ee88a7c50 io_uring/io-wq: protect against sprintf overflow

It would be much appreciated if Jens could give pointers to Canonical
developers on how to fix the issue, and hopefully a suggestion to
prevent this from happening again.

Thanks,
Regards


[parent not found: <CAEO-eVO_hEvGzoUdoExs67ybfQC0WgpwOLbg3n9fc+R4JfikZQ@mail.gmail.com>]

* Re: Possible io_uring regression with QEMU on Ubuntu's kernel
       [not found] ` <CAEO-eVO_hEvGzoUdoExs67ybfQC0WgpwOLbg3n9fc+R4JfikZQ@mail.gmail.com>
@ 2021-07-01 18:16   ` Juhyung Park
       [not found]     ` <CAEO-eVM49rB_a21hiQdpK_FQYPy=mUKtLuh4Aa2J3fGhw91isg@mail.gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Juhyung Park @ 2021-07-01 18:16 UTC (permalink / raw)
  To: Kamal Mostafa
  Cc: Stefan Bader, io-uring, Jens Axboe, Stefano Garzarella,
	qemu-devel, Ubuntu Kernel Team

Hi Kamal.

Thanks for the timely response.
For now, we've worked around the issue by installing linux-generic-hwe-20.04-edge.

I've just installed the new build that you provided but I'm afraid the
same issue persists.

I've double-checked that the kernel is installed properly:
root@datai-ampere:~# uname -a
Linux datai-ampere 5.8.0-59-generic #66~20.04.1+uringrevert0 SMP Thu Jul 1 16:50:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@datai-ampere:~# cat /proc/version
Linux version 5.8.0-59-generic (ubuntu@ip-10-0-33-11) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #66~20.04.1+uringrevert0 SMP Thu Jul 1 16:50:12 UTC 2021

The guest VM is still unable to read /dev/vda's partition table with
READ errors.

Is the commit reverted properly?
If it is, I'm afraid that it might be something else, hmm..

I'm still certain that it's a regression from 5.8.0-55 to 5.8.0-59.

Thanks.

On Fri, Jul 2, 2021 at 2:50 AM Kamal Mostafa wrote:
>
> Hi-
>
> Thanks very much for reporting this.  We picked up that patch ("io_uring: don't mark S_ISBLK async work as unbounded") for our Ubuntu v5.8 kernel from linux-stable/v5.10.31.  Since it's not clear that it's appropriate for v5.8 (or even v5.10-stable?) we'll revert it from Ubuntu v5.8 if you can confirm that actually fixes the problem.
>
> Here's a test build of that (5.8.0-59 with that commit reverted).  The full set of packages is provided, but you probably only actually need to install the linux-image and linux-modules[-extra] deb's. We'll stand by for your results:
> https://kernel.ubuntu.com/~kamal/uringrevert0/
>
> Thanks again,
>
>  -Kamal Mostafa (Canonical Kernel Team)
>


[parent not found: <CAEO-eVM49rB_a21hiQdpK_FQYPy=mUKtLuh4Aa2J3fGhw91isg@mail.gmail.com>]

* Re: Possible io_uring regression with QEMU on Ubuntu's kernel
       [not found]     ` <CAEO-eVM49rB_a21hiQdpK_FQYPy=mUKtLuh4Aa2J3fGhw91isg@mail.gmail.com>
@ 2021-07-08 10:02       ` Juhyung Park
  0 siblings, 0 replies; 3+ messages in thread
From: Juhyung Park @ 2021-07-08 10:02 UTC (permalink / raw)
  To: Kamal Mostafa; +Cc: Ubuntu Kernel Team, Jens Axboe, io-uring

Hi Kamal.

On Sat, Jul 3, 2021 at 2:33 AM Kamal Mostafa wrote:
>
> Hi Juhyung-
> [trimmed the cc: list for now]

Let me add Jens and io-uring list back, juuust in case this affects
mainline too in a way that I didn't notice.

>
> We don't doubt it.  Before we ask you to start trying all the intervening kernels, let's try one more targeted shot.  Here's another test kernel which is 5.8.0-59 with a set of md/raid patches reverted.  Those patches -- backports targeting the bug "raid10: Block discard is very slow" https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1896578 -- landed in 5.8.0-56.63_20.04.1.
>
> https://kernel.ubuntu.com/~kamal/uring-mdrevert1/
>
> TEST KERNEL 5.8.0-59.66~20.04.1+mdrevert1
>
> Revert "md: add md_submit_discard_bio() for submitting discard bio"
>
> Revert "md/raid10: extend r10bio devs to raid disks"
>
> Revert "md/raid10: pull the code that wait for blocked dev into one function"
>
> Revert "md/raid10: improve raid10 discard request"
>
> Revert "md/raid10: improve discard request for far layout"
>
> Revert "dm raid: remove unnecessary discard limits for raid0 and raid10"
>

The 3950X machine that also had this issue didn't pass an md device
to QEMU; it simply used a partition on an NVMe device, so it was
unlikely that an md patch would cause the issue.

I've set up a kernel build environment and manually bisected the issue
(hence the delayed reply, apologies).
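
A manual bisection of this kind might look roughly like the sketch below. The helper names are hypothetical, and the build, install, and boot steps between verdicts are left out because they depend on the local setup; only `git bisect start`, `git bisect good`, and `git bisect bad` are used.

```shell
# bisect_begin REPO GOOD BAD
# Start a bisect between a known-good and a known-bad ref.
bisect_begin() {
    git -C "$1" bisect start "$3" "$2"
}

# bisect_mark REPO good|bad
# Record the verdict for the currently checked-out commit, after
# building that kernel, booting it, and starting the guest.
bisect_mark() {
    git -C "$1" bisect "$2"
}

# Hypothetical session (tag placeholder and path are assumptions):
#   bisect_begin ~/src/ubuntu-kernel <tag-for-5.8.0-55> \
#       Ubuntu-hwe-5.8-5.8.0-59.66_20.04.1
#   ... build, boot, test the guest ...
#   bisect_mark ~/src/ubuntu-kernel bad    # or: good
```

Each verdict makes git check out the next candidate, and the process ends when git reports the first bad commit.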

It was the commit 87c9cfe0fa1fb ("block: don't ignore REQ_NOWAIT for
direct IO").
(Upstream commit f8b78caf21d5bc3fcfc40c18898f9d52ed1451a5)

I've double-checked by resetting the Git tree to
Ubuntu-hwe-5.8-5.8.0-59.66_20.04.1 and reverting that patch alone.
It fixes the issue.
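
The confirmation step above can be sketched as follows. The tag and commit ID come from this message; the function name and the repository path are hypothetical, and the kernel still has to be rebuilt and booted afterwards.

```shell
# revert_on_tag REPO TAG COMMIT
# Reset the work tree to TAG, then revert COMMIT on top of it.
# Warning: `git reset --hard` discards uncommitted changes in REPO.
revert_on_tag() {
    git -C "$1" reset --hard "$2" &&
    git -C "$1" revert --no-edit "$3"
}

# Hypothetical usage (the path is a placeholder):
#   revert_on_tag ~/src/ubuntu-kernel \
#       Ubuntu-hwe-5.8-5.8.0-59.66_20.04.1 87c9cfe0fa1fb
```

Reverting a single commit on top of the affected release keeps every other change in place, so a fixed guest points at that commit alone.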

It seems like this patch was backported to multiple stable trees, so
I'm not exactly sure why only Canonical's 5.8 is affected.
FWIW, 5.8.0-61 is also affected.

>
> Also (regardless of the outcome of that test kernel), we would like to start tracking this with a Launchpad.net bug.  If you'd be so kind as to file one via https://bugs.launchpad.net/ubuntu/+source/linux/+filebug it would be much appreciated.
>

Yep, will do this as well.

Thanks.

>  -Kamal
>
>
