From: Dominique MARTINET <[email protected]>
To: Filipe Manana <[email protected]>
Cc: Nikolay Borisov <[email protected]>, Jens Axboe <[email protected]>,
	[email protected], [email protected]
Subject: Re: read corruption with qemu master io_uring engine / linux master / btrfs(?)
Date: Thu, 30 Jun 2022 20:27:50 +0900
Message-ID: <[email protected]>
In-Reply-To: <20220630104536.GA434846@falcondesktop>

Filipe Manana wrote on Thu, Jun 30, 2022 at 11:45:36AM +0100:
> So here's a patch for you to try:
> 
> https://gist.githubusercontent.com/fdmanana/4b24d6b30983e956bb1784a44873c5dd/raw/572490b127071bf827c3bc05dd58dcb7bcff373a/dio.patch

Thanks.
Unfortunately I still hit short reads with this. I can't really tell
whether there are more or fewer than before (the parallelism of my
reproducer means that even after dropping caches and restarting with
the same seed, the short read lands at a different offset), but it
looks fairly similar: with or without the patch, it usually happens
within the first 1000 operations, sometimes a bit later.

I went ahead and added a printk in dio_fault_in_size to see whether it
was used, and it looks like it is, but that doesn't really tell whether
it is the reason for the short reads (hm, thinking back, I could just
have printed the offsets...):
----
/ # /mnt/repro /mnt/t/t/atde-test 
random seed 4061910570
Starting io_uring reads...
[   17.872992] dio_fault_in_size: left 3710976 prev_left 0 size 131072
[   17.873958] dio_fault_in_size: left 3579904 prev_left 3710976 size 131072
[   17.874246] dio_fault_in_size: left 1933312 prev_left 0 size 131072
[   17.875111] dio_fault_in_size: left 3448832 prev_left 3579904 size 131072
[   17.876446] dio_fault_in_size: left 3317760 prev_left 3448832 size 131072
[   17.877493] dio_fault_in_size: left 3186688 prev_left 3317760 size 131072
[   17.878667] dio_fault_in_size: left 3055616 prev_left 3186688 size 131072
[   17.880001] dio_fault_in_size: left 2924544 prev_left 3055616 size 131072
[   17.881524] dio_fault_in_size: left 2793472 prev_left 2924544 size 131072
[   17.882462] dio_fault_in_size: left 2662400 prev_left 2793472 size 131072
[   17.883433] dio_fault_in_size: left 2531328 prev_left 2662400 size 131072
[   17.884573] dio_fault_in_size: left 2400256 prev_left 2531328 size 131072
[   17.886008] dio_fault_in_size: left 2269184 prev_left 2400256 size 131072
[   17.887058] dio_fault_in_size: left 2138112 prev_left 2269184 size 131072
[   17.888313] dio_fault_in_size: left 2007040 prev_left 2138112 size 131072
[   17.889873] dio_fault_in_size: left 1875968 prev_left 2007040 size 131072
[   17.891041] dio_fault_in_size: left 1744896 prev_left 1875968 size 131072
[   17.893174] dio_fault_in_size: left 802816 prev_left 1744896 size 131072
[   17.930249] dio_fault_in_size: left 3325952 prev_left 0 size 131072
[   17.931472] dio_fault_in_size: left 1699840 prev_left 0 size 131072
[   17.956509] dio_fault_in_size: left 1699840 prev_left 0 size 131072
[   17.957522] dio_fault_in_size: left 1888256 prev_left 0 size 131072
bad read result for io 3, offset 4022030336: 176128 should be 1531904
----
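
For reference, a log like the one above can be untangled mechanically.
The following is a minimal Python sketch, not part of the original test
setup. It assumes that `left` is the number of bytes still to be read
and that `prev_left` is the value of `left` at the previous call for the
same read (an interpretation of the printk field names, not something
the log itself confirms). The names `group_chains` and `progress` are
made up for this illustration.

```python
import re
from typing import Dict, List

# Matches the printk format seen in the log above.
LINE_RE = re.compile(
    r"dio_fault_in_size: left (\d+) prev_left (\d+) size (\d+)"
)

# First four lines of the log above, kept verbatim.
SAMPLE = """\
[   17.872992] dio_fault_in_size: left 3710976 prev_left 0 size 131072
[   17.873958] dio_fault_in_size: left 3579904 prev_left 3710976 size 131072
[   17.874246] dio_fault_in_size: left 1933312 prev_left 0 size 131072
[   17.875111] dio_fault_in_size: left 3448832 prev_left 3579904 size 131072
"""


def group_chains(log: str) -> List[List[int]]:
    """Group interleaved calls into per-read chains of `left` values.

    Assumption: prev_left == 0 starts a new read, and a non-zero
    prev_left continues the read whose latest `left` had that value.
    """
    chains: List[List[int]] = []
    open_by_left: Dict[int, List[int]] = {}  # latest `left` -> chain
    for m in LINE_RE.finditer(log):
        left, prev_left = int(m.group(1)), int(m.group(2))
        chain = open_by_left.pop(prev_left, None) if prev_left else None
        if chain is None:
            chain = []
            chains.append(chain)
        chain.append(left)
        open_by_left[left] = chain
    return chains


def progress(chain: List[int]) -> List[int]:
    """Bytes completed between consecutive calls of one chain."""
    return [a - b for a, b in zip(chain, chain[1:])]


for chain in group_chains(SAMPLE):
    print(chain, progress(chain))
```

On the four sample lines this separates the interleaved calls into one
read that advances by 131072 bytes per call and a second, unrelated
read. Two reads that happen to share the same `left` value would be
confused by this heuristic.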

(ugh, I only saw the second patch after writing all this... but the result is the same:
----
/ # /mnt/repro /mnt/t/t/atde-test 
random seed 634214270
Starting io_uring reads...
[   17.858718] dio_fault_in_size: left 1949696 prev_left 0 size 131072
[   18.193604] dio_fault_in_size: left 1142784 prev_left 0 size 131072
[   18.218500] dio_fault_in_size: left 528384 prev_left 0 size 131072
[   18.248184] dio_fault_in_size: left 643072 prev_left 0 size 131072
[   18.291639] dio_fault_in_size: left 131072 prev_left 0 size 131072
bad read result for io 4, offset 5079498752: 241664 should be 2142208
----
The rest of this mail is based on the first patch, since I used the
offset from its error message, but that shouldn't matter.)

Given my file has many many extents, my guess would be that short reads
happen when we're crossing an extent boundary.


Using the fiemap[1] command I can confirm that it is the case:
[1] https://github.com/ColinIanKing/fiemap

$ printf "%x\n" $((4022030336 + 176128))
efbe0000
$ fiemap /mnt/t/t/atde-test
File atde-test has 199533 extents:
#       Logical          Physical         Length           Flags
...
23205:  00000000efba0000 0000001324f00000 0000000000020000 0008
23206:  00000000efbc0000 00000013222af000 0000000000020000 0008
23207:  00000000efbe0000 00000013222bb000 0000000000020000 0008

However, given how many extents there are, that doesn't explain why the
read stopped at this offset within the file and not at an earlier
extent boundary: a transition from compressed to non-compressed or
something? I didn't find any tool able to show extent attributes; here's
what `btrfs insp dump-tree` has to say about these physical offsets:

$ printf "%d\n" 0x00000013222af000
82177617920
$ printf "%d\n" 0x00000013222bb000
82177667072
$ btrfs insp dump-tree /dev/vg/test
...
leaf 171360256 items 195 free space 29 generation 527 owner EXTENT_TREE
leaf 171360256 flags 0x1(WRITTEN) backref revision 1
checksum stored d9b6566b00000000000000000000000000000000000000000000000000000000
checksum calced d9b6566b00000000000000000000000000000000000000000000000000000000
fs uuid 3f85a731-21b4-4f3d-85b5-f9c45e8493f5
chunk uuid 77575a06-4d6f-4748-a62c-59e6d9221be8
        item 0 key (82177576960 EXTENT_ITEM 40960) itemoff 16230 itemsize 53
                refs 1 gen 527 flags DATA
                extent data backref root 256 objectid 257 offset 4021682176 count 1
        item 1 key (82177617920 EXTENT_ITEM 49152) itemoff 16177 itemsize 53
                refs 1 gen 527 flags DATA
                extent data backref root 256 objectid 257 offset 4022075392 count 1
        item 2 key (82177667072 EXTENT_ITEM 36864) itemoff 16124 itemsize 53
                refs 1 gen 527 flags DATA
                extent data backref root 256 objectid 257 offset 4022206464 count 1

... but that doesn't really help me understand what is going on here.
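
The numbers above can be cross-checked with a short Python sketch. It
only uses values quoted in this mail (the failing read, the three fiemap
rows and the two EXTENT_ITEM keys); the helper name `boundaries_inside`
is hypothetical. Two assumptions are labelled in the code: that the
fiemap `Flags` column is the raw `fe_flags` value in hex, in which case
0x8 is `FIEMAP_EXTENT_ENCODED` from linux/fiemap.h, and that the third
field of an EXTENT_ITEM key is the on-disk size of the extent.

```python
# Failing read from the first run, copied from the "bad read result" line.
READ_OFFSET = 4022030336
READ_GOT = 176128

# Assumption: the fiemap "Flags" column is fe_flags printed in hex.
FIEMAP_EXTENT_ENCODED = 0x8  # value from linux/fiemap.h

# (logical start, physical start, logical length, flags) from fiemap.
EXTENTS = [
    (0xEFBA0000, 0x1324F00000, 0x20000, 0x8),
    (0xEFBC0000, 0x13222AF000, 0x20000, 0x8),
    (0xEFBE0000, 0x13222BB000, 0x20000, 0x8),
]

# Assumption: the third field of an EXTENT_ITEM key is the on-disk size.
# physical start -> on-disk bytes, from dump-tree items 1 and 2.
ON_DISK = {82177617920: 49152, 82177667072: 36864}


def boundaries_inside(start, end, extents):
    """Return extent starts that lie strictly between start and end."""
    return [lo for lo, _, _, _ in extents if start < lo < end]


stop = READ_OFFSET + READ_GOT
stops_at_extent_start = any(lo == stop for lo, _, _, _ in EXTENTS)
crossed = boundaries_inside(READ_OFFSET, stop, EXTENTS)

print(f"read {READ_OFFSET:#x}..{stop:#x}")
print("stops at an extent start:", stops_at_extent_start)
print("boundaries crossed before stopping:", [hex(b) for b in crossed])

for lo, phys, length, flags in EXTENTS:
    disk = ON_DISK.get(phys)
    if disk is None:
        continue  # no dump-tree item quoted for this extent
    encoded = bool(flags & FIEMAP_EXTENT_ENCODED)
    print(f"{lo:#x}: logical {length} on-disk {disk} encoded={encoded}")
```

Under those assumptions, the read starts at 0xefbb5000, gets past the
extent boundary at 0xefbc0000 and stops exactly at 0xefbe0000. Both
extents around the stopping point are flagged as encoded and take less
space on disk than their 128K logical length, which would be consistent
with both being compressed. That is a hint rather than a proof, and it
does not by itself explain why the read stopped there.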

Oh, well, passing you the ball again! :)
Please ask if there's any info I can get you.

-- 
Dominique
