From: Qu Wenruo <[email protected]>
To: Jens Axboe <[email protected]>,
"[email protected]" <[email protected]>,
Linux FS Devel <[email protected]>,
[email protected]
Subject: Re: Possible io_uring related race leads to btrfs data csum mismatch
Date: Thu, 17 Aug 2023 09:31:24 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 2023/8/17 09:23, Jens Axboe wrote:
> On 8/16/23 7:19 PM, Qu Wenruo wrote:
>> On 2023/8/17 09:12, Jens Axboe wrote:
>>> On 8/16/23 7:05 PM, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2023/8/17 06:28, Jens Axboe wrote:
>>>> [...]
>>>>>
>>>>>>> 2) What's the .config you are using?
>>>>>>
>>>>>> Pretty common config, no heavy debug options (KASAN etc).
>>>>>
>>>>> Please just send the .config, I'd rather not have to guess. Things like
>>>>> preempt etc may make a difference in reproducing this.
>>>>
>>>> Sure, please see the attached config.gz
>>>
>>> Thanks
>>>
>>>>> And just to be sure, this is not mixing dio and buffered, right?
>>>>
>>>> I'd say it's mixing, there are dwrite() and writev() for the same file,
>>>> but at least they are not overlapping with this particular seed, nor
>>>> are they concurrent (all inside the same process, sequentially).
>>>>
>>>> But if only uring_write is disabled, it no longer reproduces, so
>>>> there must be some untested btrfs path triggered by uring_write.
>>>
>>> That would be one conclusion, another would be that timing is just
>>> different and that triggers an issue. Or it could of course be a bug in
>>> io_uring, perhaps a short write that gets retried or something like
>>> that. I've run the tests for hours here and don't hit anything. I've
>>> pulled in the for-next branch for btrfs to see if that'll make a
>>> difference. I'll check your .config too.
>>
>> Just to mention, the problem itself was pretty hard to hit before if
>> using any debug kernel configs.
>
> The kernels I'm testing with don't have any debug options enabled,
> outside of the basic cheap stuff. I do notice you have all btrfs debug
> stuff enabled, I'll try and do that too.
>
>> Not sure why, but later I both switched my CPU (from a desktop i7-13700K
>> limited to 160W, to a laptop 7940HS) and dropped all heavy debug
>> kernel configs, and now it's 100% reproducible here.
>>
>> So I guess a faster CPU is also one factor?
>
> I've run this on kvm on an apple m1 max, no luck there. Ran it on a
> 7950X, no luck there. Fiddling config options on the 7950 and booting up
> the 7763 two socket box. Both that and the 7950 are using gen4 optane,
> should be plenty beefy. But if it's timing related, well...
Just to mention, the following progs are involved:
- btrfs-progs v6.3.3
  In theory anything newer than 5.15 should be fine; the difference is
  only a change in some default settings.
- fsstress from the xfstests project
  So it's not the one taken directly from LTP.
Hope this helps you reproduce the bug.
Thanks,
Qu
>
>>> Might not be a bad idea to have the writes contain known data, and when
>>> you hit the failure to verify the csum, dump the data where the csum
>>> says it's wrong and figure out at what offset, what content, etc it is?
>>> If that can get correlated to the log of what happened, that might shed
>>> some light on this.
>>>
>> Thanks for the advice, I'll definitely try this method and keep you
>> updated when I find something valuable.
>
> If I can't reproduce this, then this seems like the best way forward
> indeed.
>
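The "known data" idea quoted above can be sketched roughly as follows.
This is only a minimal illustration, not what fsstress or btrfs actually
does: the helper names (make_pattern, bad_sectors, decode_tag), the
16-byte tag layout and the 4096-byte sector size are all assumptions
made for the example. Each 16-byte word of a write carries its own file
offset and the sequence number of the operation that wrote it, so when
a sector fails verification its content says which write it came from.

```python
import struct

# Hypothetical tag layout: (file offset, op sequence number), 16 bytes.
WORD = struct.Struct("<QQ")


def make_pattern(offset: int, length: int, seq: int) -> bytes:
    """Build `length` bytes for a write at file `offset`, tagged with `seq`."""
    start = offset - offset % WORD.size  # align down to a word boundary
    blob = b"".join(
        WORD.pack(pos, seq) for pos in range(start, offset + length, WORD.size)
    )
    skip = offset - start
    return blob[skip:skip + length]


def bad_sectors(data: bytes, offset: int, seq: int, sector: int = 4096):
    """Return file offsets of sectors whose content differs from the pattern.

    Assumes `offset` is sector aligned and 4096 is the sector size.
    """
    expected = make_pattern(offset, len(data), seq)
    return [
        offset + i
        for i in range(0, len(data), sector)
        if data[i:i + sector] != expected[i:i + sector]
    ]


def decode_tag(data: bytes, offset: int, pos: int):
    """Decode the (file offset, seq) tag of the word covering file position `pos`.

    Returns None if that word is not fully contained in `data`.
    """
    start = pos - pos % WORD.size
    i = start - offset
    if i < 0 or i + WORD.size > len(data):
        return None
    return WORD.unpack_from(data, i)
```

On a mismatch, bad_sectors() gives the offsets to look at, and
decode_tag() on the data read back shows which earlier operation (if
any) the unexpected content belongs to, which can then be matched
against the fsstress operation log.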