public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: Qu Wenruo <[email protected]>,
	"[email protected]" <[email protected]>,
	Linux FS Devel <[email protected]>,
	[email protected]
Subject: Re: Possible io_uring related race leads to btrfs data csum mismatch
Date: Wed, 16 Aug 2023 19:12:11 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 8/16/23 7:05 PM, Qu Wenruo wrote:
> 
> 
> On 2023/8/17 06:28, Jens Axboe wrote:
> [...]
>>
>>>> 2) What's the .config you are using?
>>>
>>> Pretty common config, no heavy debug options (KASAN etc).
>>
>> Please just send the .config, I'd rather not have to guess. Things like
>> preempt etc may make a difference in reproducing this.
> 
> Sure, please see the attached config.gz

Thanks

>> And just to be sure, this is not mixing dio and buffered, right?
> 
> I'd say it's mixing, there are dwrite() and writev() for the same file,
> but at least not overlapping using this particular seed, nor are they
> concurrent (all inside the same process sequentially).
> 
> But considering that it no longer reproduces if only uring_write is
> disabled, there must be some untested btrfs path triggered by uring_write.

That would be one conclusion, another would be that timing is just
different and that triggers an issue. Or it could of course be a bug in
io_uring, perhaps a short write that gets retried or something like
that. I've run the tests for hours here and don't hit anything; I've
pulled in the for-next branch for btrfs to see if that'll make a
difference. I'll check your .config too.

Might not be a bad idea to have the writes contain known data, and when
you hit the csum verification failure, dump the data where the csum says
it's wrong and figure out what offset, what content, etc. it is. If that
can be correlated with the log of what happened, it might shed some
light on this.
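For what it's worth, here's a minimal sketch of that idea (all names and
the layout are made up for illustration, not anything fsstress does today):
stamp each 8-byte word of the write buffer with its own file offset, so
that a block failing csum verification can be decoded to see which write,
and which offset within it, the stale or garbled data came from.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Fill buf (len bytes) with self-describing stamps: each aligned 8-byte
 * word holds the file offset it was written to. */
static void fill_pattern(unsigned char *buf, size_t len, uint64_t file_off)
{
	size_t i;

	for (i = 0; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
		uint64_t stamp = file_off + i;

		memcpy(buf + i, &stamp, sizeof(stamp));
	}
}

/* On a csum failure, scan the suspect block and return the byte offset
 * within buf of the first stamp that doesn't match its expected file
 * offset, or (size_t)-1 if the block checks out. */
static size_t find_first_mismatch(const unsigned char *buf, size_t len,
				  uint64_t file_off)
{
	size_t i;

	for (i = 0; i + sizeof(uint64_t) <= len; i += sizeof(uint64_t)) {
		uint64_t stamp;

		memcpy(&stamp, buf + i, sizeof(stamp));
		if (stamp != file_off + i)
			return i;
	}
	return (size_t)-1;
}
```

Dumping the mismatching stamp then tells you whether the bad bytes look
like an earlier generation of the same offset, a different offset
entirely, or plain garbage, which narrows down the race considerably.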

>>>>> However I didn't see any io_uring related callback inside btrfs code,
>>>>> any advice on the io_uring part would be appreciated.
>>>>
>>>> io_uring doesn't do anything special here, it uses the normal page cache
>>>> read/write parts for buffered IO. But you may get extra parallelism
>>>> with io_uring here. For example, with the buffered write that this most
>>>> likely is, libaio would be exactly the same as a pwrite(2) on the file.
>>>> If this would've blocked, io_uring would offload this to a helper
>>>> thread. Depending on the workload, you could have multiple of those in
>>>> progress at the same time.
>>>
>>> My biggest concern is, would io_uring modify the page when it's still
>>> under writeback?
>>
>> No, of course not. Like I mentioned, io_uring doesn't do anything that
>> the normal read/write path isn't already doing - it's using the same
>> ->read_iter() and ->write_iter() that everything else is, there's no
>> page cache code in io_uring.
>>
>>> In that case, it's going to cause csum mismatch as btrfs relies on the
>>> page under writeback to be unchanged.
>>
>> Sure, I'm aware of the stable page requirements.
>>
>> See my followup email as well on a patch to test as well.
>>
> 
> Applied and tested, using "-p 10 -n 1000" as the fsstress workload;
> failed at the 23rd run.

OK, that rules out the multiple-writers theory.

-- 
Jens Axboe


Thread overview: 22+ messages
2023-08-16  6:52 Possible io_uring related race leads to btrfs data csum mismatch Qu Wenruo
2023-08-16 14:33 ` Jens Axboe
2023-08-16 14:49   ` Jens Axboe
2023-08-16 21:46   ` Qu Wenruo
2023-08-16 22:28     ` Jens Axboe
2023-08-17  1:05       ` Qu Wenruo
2023-08-17  1:12         ` Jens Axboe [this message]
2023-08-17  1:19           ` Qu Wenruo
2023-08-17  1:23             ` Jens Axboe
2023-08-17  1:31               ` Qu Wenruo
2023-08-17  1:32                 ` Jens Axboe
2023-08-19 23:59                   ` Qu Wenruo
2023-08-20  0:22                     ` Qu Wenruo
2023-08-20 13:26                       ` Jens Axboe
2023-08-20 14:11                         ` Jens Axboe
2023-08-20 18:18                           ` Matthew Wilcox
2023-08-20 18:40                             ` Jens Axboe
2023-08-21  0:38                           ` Qu Wenruo
2023-08-21 14:57                             ` Jens Axboe
2023-08-21 21:42                               ` Qu Wenruo
2023-08-16 22:36     ` Jens Axboe
2023-08-17  0:40       ` Qu Wenruo
