Re: read corruption with qemu master io_uring engine / linux master / btrfs(?)

From: Jens Axboe <[email protected]>
To: Dominique MARTINET <[email protected]>,
	[email protected], [email protected]
Subject: Re: read corruption with qemu master io_uring engine / linux master / btrfs(?)
Date: Tue, 28 Jun 2022 13:12:54 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 6/28/22 3:08 AM, Dominique MARTINET wrote:
> I don't have any good reproducer so it's a bit difficult to specify,
> let's start with what I have...
> 
> I've got this one VM which has various segfaults all over the place when
> starting it with aio=io_uring for its disk as follows:
> 
>   qemu-system-x86_64 -drive file=qemu/atde-test,if=none,id=hd0,format=raw,cache=none,aio=io_uring \
>       -device virtio-blk-pci,drive=hd0 -m 8G -smp 4 -serial mon:stdio -enable-kvm
> 
> It also happens with virtio-scsi:
>   -device virtio-scsi-pci,id=scsihw0 \
>   -drive file=qemu/atde-test,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring \
>   -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100
> 
> It also happened when the disk I was using was a qcow file backing up a
> vmdk image (this VM's original disk is for vmware), so while I assume
> qemu reading code and qemu-img convert code are similar I'll pretend
> image format doesn't matter at this point...
> It's happened with two such images, but I haven't been able to reproduce
> with any other VMs yet.
> 
> I can also reproduce this on a second host machine with a completely
> different ssd (WD sata in one vs. samsung nvme), so probably not a
> firmware bug.
> 
> scrub sees no problem with my filesystems on the host.
> 
> I've confirmed it happens with at least debian testing's 5.16.0-4-amd64
> and 5.17.0-1-amd64 kernels, as well as 5.19.0-rc4.
> It also happens with both debian's qemu 7.0.0 and the qemu master
> branch (v7.0.0-2031-g40d522490714)
> 
> 
> These factors aside, anything else I tried changing made this bug no
> longer reproduce:
>  - I'm not sure what the rule is but it sometimes doesn't happen when
> running the VM twice in a row, sometimes it happens again. Making a
> fresh copy with `cp --reflink=always` of my source image seems to be
> reliable.
>  - it stops happening without io_uring
>  - it stops happening if I copy the disk image with --reflink=never
>  - it stops happening if I copy the disk image to another btrfs
> partition, created in the same lv, so something about my partition
> history matters?...
> (I have ssd > GPT partitions > luks > lvm > btrfs with a single disk as
> metadata DUP data single)
>  - I was unable to reproduce on xfs (with a reflink copy) either but I
> also was only able to try on a new fs...
>  - I've never been able to reproduce on other VMs
> 
> 
> If you'd like to give it a try, my reproducer source image is
> ---
> curl -O https://download.atmark-techno.com/atde/atde9-amd64-20220624.tar.xz
> tar xf atde9-amd64-20220624.tar.xz
> qemu-img convert -O raw atde9-amd64-20220624/atde9-amd64.vmdk atde-test
> cp --reflink=always atde-test atde-test2
> ---
> and using 'atde-test'.
> For further attempts I've removed atde-test and copied back from
> atde-test2 with cp --reflink=always.
> This VM's graphical output is borked, but ssh listens so something like
> `-netdev user,id=net0,hostfwd=tcp::2227-:22 -device virtio-net-pci,netdev=net0`
> and 'ssh -p 2227 -l atmark localhost' should allow login with password
> 'atmark' or you can change vt on the console (root password 'root')
> 
> I also had similar problems with atde9-amd64-20211201.tar.xz .
> 
> 
> When reproducing I've had either segfaults in the initrd and complete
> boot failures, or boot working and login failures but ssh working
> without login shell (ssh ... -tt localhost sh)
> that allowed me to dump content of a couple of corrupted files.
> When I looked:
> - /usr/lib/x86_64-linux-gnu/libc-2.31.so had zeroes instead of data from
> offset 0xb6000 to 0xb7fff; rest of file was identical.
> - /usr/bin/dmesg had garbage from 0x05000 until 0x149d8 (end of file).
> I was lucky and could match the garbage quickly: it is identical to the
> content from 0x1000-0x109d8 within the disk itself.
> 
> I've rebooted a few times and it looks like the corruption is identical
> every time for this machine as long as I keep using the same source file;
> running from qemu-img convert again seems to change things a bit?
> but whatever it is that is specific to these files is stable, even
> through host reboots.
> 
> 
> 
> I'm sorry I haven't been able to make a better reproducer, I'll keep
> trying a bit more tomorrow but maybe someone has an idea with what I've
> had so far :/
> 
> Perhaps at this point it might be simpler to just try to take qemu out
> of the equation and issue many parallel reads to different offsets
> (overlapping?) of a large file in a similar way qemu io_uring engine
> does and check their contents?
> 
> 
> Thanks, and I'll probably follow up a bit tomorrow even if no-one has
> any idea, but even ideas of where to look would be appreciated.

Not sure what's going on here, but I use qemu with io_uring many times
each day and haven't seen anything odd. This is on ext4 and xfs however,
I haven't used btrfs as the backing file system. I wonder if we can boil
this down into a test case and try and figure out what is going on here.
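
As a possible first step toward such a test case, here is a minimal
sketch (Python; the helper name diff_ranges and the 4 KiB block size are
assumptions made for illustration, nothing from the report above) that
lists the byte ranges where a suspect copy of a file differs from a
known-good one, similar to the zeroed 0xb6000-0xb7fff range seen in
libc. It only compares data that has already been read back; it does
not use io_uring and does not by itself reproduce the problem.

```python
BLOCK = 4096  # comparison granularity in bytes (assumed, matches a 4 KiB page)


def diff_ranges(path_a, path_b, block=BLOCK):
    """Return [(start, end), ...] byte ranges (end exclusive) where the
    two files differ, compared block by block and merged when adjacent."""
    ranges = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break  # both files are exhausted
            step = max(len(a), len(b))
            if a != b:
                end = offset + step
                if ranges and ranges[-1][1] == offset:
                    # extend the previous range instead of starting a new one
                    ranges[-1] = (ranges[-1][0], end)
                else:
                    ranges.append((offset, end))
            offset += step
    return ranges
```

Calling diff_ranges() on a file dumped from the guest and the same file
extracted from a pristine image, then printing each pair with hex(),
should give block-aligned ranges that can be matched against the host
file's extent layout.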

-- 
Jens Axboe
