public inbox for [email protected]
From: Stefan Metzmacher <[email protected]>
To: Andreas Schneider <[email protected]>,
	Anoop C S <[email protected]>, Jeremy Allison <[email protected]>,
	[email protected]
Cc: Jens Axboe <[email protected]>, io-uring <[email protected]>,
	Pavel Begunkov <[email protected]>
Subject: Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
Date: Wed, 6 May 2020 17:06:44 +0200	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <27882183.aulKxNbigu@magrathea>


[-- Attachment #1.1: Type: text/plain, Size: 4455 bytes --]

Am 06.05.20 um 16:46 schrieb Andreas Schneider via samba-technical:
> On Wednesday, 6 May 2020 16:08:03 CEST Stefan Metzmacher via samba-technical 
> wrote:
>> Am 06.05.20 um 14:41 schrieb Anoop C S:
>>> On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
>>>> Hi Anoop,
>>>>
>>>>> I could reproduce the difference in SHA256 checksum after copying a
>>>>> directory with 100 copies of test file(provided by reporter) from
>>>>> io_uring VFS module enabled share using Windows explorer(right-
>>>>> click-
>>>>>
>>>>>> copy/paste). Only 5 out of 100 files had correct checksum after
>>>>>> copy
>>>>>
>>>>> operation :-/
>>>>
>>>> Great! Can you please try to collect level 1 log files with
>>>> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
>>>> applied?
>>>
>>> I have attached three log files.
>>> log.io_uring.smbd -- Copy using Windows explorer
>>> log.io_uring-mget.smd -- Copy using smbclient
>>> log.io_uring-powershell.smd -- Copy using `Copy-Item`
>>
>> Thanks! All of them show short reads like:
>>> [2020/05/06 17:27:28.130248,  1]
>>> ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000)
>>>   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
>>>   dir/1.bin fnum 1607026405> 
>>> [2020/05/06 17:27:28.131049,  1]
>>> ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680)
>>>   nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536
>>>   dir/1.bin fnum 1607026405> 
>>> [2020/05/06 17:27:28.133679,  1]
>>> ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000)
>>>   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
>>>   dir/1.bin fnum 1607026405> 
>>> [2020/05/06 17:27:28.140184,  0]
>>> ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576
>>>   (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096
>>>   blocks=19536 dir/1.bin fnum 1607026405
>> It seems the first request is at ofs=0 len=32768 (0x8000) and we get
>> 32768 back.
>> A follow up request with ofs=0 len=1048576 (0x100000) only gets the
>> first 32768 bytes which are already in the page cache.
>>
>> I can easily reproduce this with the Ubuntu 5.4 kernel once I add
>> state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
>> and run the following:
>>
>> echo 1 > /proc/sys/vm/drop_caches
>> head -c 1024 /root/samba-test/ff1.dat | hexdump -C
>> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
>>
>> |................|
>>
>> *
>> 00000400
>> smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"
>>
>> results in these log entries:
>>> [2020/05/06 06:51:57.069990,  0]
>>> ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608
>>>   (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096
>>>   blocks=16384 ff1.dat fnum 840153065> 
>>> [2020/05/06 06:51:57.076882,  1]
>>> ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000)
>>>   nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384
>>>   ff1.dat fnum 840153065
>> smbclient is just smart enough to recover itself from the short read,
>> but the Windows client isn't.
>>
>> The attached test against liburing (git://git.kernel.dk/liburing) should
>> be able to demonstrate the problem. It can also be found in
>> https://github.com/metze-samba/liburing/tree/implicit-rwf-nowait
>> https://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3
>>
>>
>> I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
>> against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
>>
>> Can someone run the unmodified test/implicit-rwf_nowait against
>> a newer kernel?
> 
> $ uname -a
> Linux magrathea 5.6.8-1-default #1 SMP Thu Apr 30 10:47:01 UTC 2020 (63116ab) 
> x86_64 x86_64 x86_64 GNU/Linux
> $ ./a.out 
> cqe res=4096 != 8192

Thanks for confirming! It passes fine on the 5.3 and 5.4 kernels.

metze



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


Thread overview: 27+ messages
2020-05-05 10:04 Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3 Stefan Metzmacher
2020-05-05 14:41 ` Jens Axboe
2020-05-05 15:44   ` Jens Axboe
2020-05-05 16:53     ` Jens Axboe
2020-05-05 17:39       ` Jens Axboe
2020-05-05 17:48         ` Jeremy Allison
2020-05-05 17:50           ` Jens Axboe
     [not found]           ` <[email protected]>
2020-05-06 10:33             ` Stefan Metzmacher
2020-05-06 10:41               ` Stefan Metzmacher
     [not found]               ` <[email protected]>
2020-05-06 14:08                 ` Stefan Metzmacher
2020-05-06 14:43                   ` Andreas Schneider
2020-05-06 14:46                   ` Andreas Schneider
2020-05-06 15:06                     ` Stefan Metzmacher [this message]
2020-05-06 17:03                   ` Jeremy Allison
2020-05-06 17:13                     ` Jeremy Allison
2020-05-06 18:01                     ` Jeremy Allison
2020-05-05 20:19       ` Stefan Metzmacher
2020-05-06 12:55         ` Pavel Begunkov
2020-05-06 15:20           ` Stefan Metzmacher
2020-05-06 15:42             ` Pavel Begunkov
2020-05-07 16:43               ` Jens Axboe
2020-05-07 16:48                 ` Jeremy Allison
2020-05-07 16:50                   ` Jens Axboe
2020-05-07 18:31                     ` Jeremy Allison
2020-05-07 18:35                       ` Jens Axboe
2020-05-07 18:55                         ` Jeremy Allison
2020-05-07 18:58                           ` Jens Axboe
