public inbox for [email protected]
 help / color / mirror / Atom feed
* Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
@ 2020-05-05 10:04 Stefan Metzmacher
  2020-05-05 14:41 ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-05 10:04 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Samba Technical, Jeremy Allison


[-- Attachment #1.1: Type: text/plain, Size: 583 bytes --]

Hi Jens,

we currently have a bug report [1][2] regarding a data corruption with
Samba's vfs_io_uring.c [3].

Is there're a know problem in newer kernels? It seems the 5.3 Kernel
doesn't have the problem (at least Jeremy wasn't able to reproduce it
on the Ubuntu 5.3 kernel).

Do you have any hints how to track that down?

Thanks!
metze

[1] https://bugzilla.samba.org/show_bug.cgi?id=14361
[2]
https://lists.samba.org/archive/samba/2020-April/thread.html#229372[3]
https://git.samba.org/?p=samba.git;a=blob;f=source3/modules/vfs_io_uring.c;hb=refs/heads/master


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 10:04 Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3 Stefan Metzmacher
@ 2020-05-05 14:41 ` Jens Axboe
  2020-05-05 15:44   ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2020-05-05 14:41 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Samba Technical, Jeremy Allison

On 5/5/20 4:04 AM, Stefan Metzmacher wrote:
> Hi Jens,
> 
> we currently have a bug report [1][2] regarding a data corruption with
> Samba's vfs_io_uring.c [3].
> 
> Is there're a know problem in newer kernels? It seems the 5.3 Kernel
> doesn't have the problem (at least Jeremy wasn't able to reproduce it
> on the Ubuntu 5.3 kernel).
> 
> Do you have any hints how to track that down?

I'll take a look at this! Any chance Jeremy can try 5.4 and 5.5 as well,
just to see where we're at, roughly? That might be very helpful.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 14:41 ` Jens Axboe
@ 2020-05-05 15:44   ` Jens Axboe
  2020-05-05 16:53     ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2020-05-05 15:44 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Samba Technical, Jeremy Allison

On 5/5/20 8:41 AM, Jens Axboe wrote:
> On 5/5/20 4:04 AM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>> we currently have a bug report [1][2] regarding a data corruption with
>> Samba's vfs_io_uring.c [3].
>>
>> Is there're a know problem in newer kernels? It seems the 5.3 Kernel
>> doesn't have the problem (at least Jeremy wasn't able to reproduce it
>> on the Ubuntu 5.3 kernel).
>>
>> Do you have any hints how to track that down?
> 
> I'll take a look at this! Any chance Jeremy can try 5.4 and 5.5 as well,
> just to see where we're at, roughly? That might be very helpful.

Trying to setup samba in a vm here to attempt to reproduce. I'm a major
samba noob, running with the smb.conf from the reporters email, I get:

[2020/05/05 15:43:07.126674,  0] ../../source4/smbd/server.c:629(binary_smbd_main)
  samba version 4.12.2 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2020
[2020/05/05 15:43:07.152828,  0] ../../source4/smbd/server.c:826(binary_smbd_main)
  At this time the 'samba' binary should only be used for either:
  'server role = active directory domain controller' or to access the ntvfs file server with 'server services = +smb' or the rpc proxy with 'dcerpc endpoint servers = remote'
  You should start smbd/nmbd/winbindd instead for domain member and standalone file server tasks
[2020/05/05 15:43:07.152937,  0] ../../lib/util/become_daemon.c:121(exit_daemon)
  exit_daemon: daemon failed to start: Samba detected misconfigured 'server role' and exited. Check logs for details, error code 22

Clue bat appreciated.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 15:44   ` Jens Axboe
@ 2020-05-05 16:53     ` Jens Axboe
  2020-05-05 17:39       ` Jens Axboe
  2020-05-05 20:19       ` Stefan Metzmacher
  0 siblings, 2 replies; 27+ messages in thread
From: Jens Axboe @ 2020-05-05 16:53 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Samba Technical, Jeremy Allison

On 5/5/20 9:44 AM, Jens Axboe wrote:
> On 5/5/20 8:41 AM, Jens Axboe wrote:
>> On 5/5/20 4:04 AM, Stefan Metzmacher wrote:
>>> Hi Jens,
>>>
>>> we currently have a bug report [1][2] regarding a data corruption with
>>> Samba's vfs_io_uring.c [3].
>>>
>>> Is there're a know problem in newer kernels? It seems the 5.3 Kernel
>>> doesn't have the problem (at least Jeremy wasn't able to reproduce it
>>> on the Ubuntu 5.3 kernel).
>>>
>>> Do you have any hints how to track that down?
>>
>> I'll take a look at this! Any chance Jeremy can try 5.4 and 5.5 as well,
>> just to see where we're at, roughly? That might be very helpful.
> 
> Trying to setup samba in a vm here to attempt to reproduce. I'm a major
> samba noob, running with the smb.conf from the reporters email, I get:
> 
> [2020/05/05 15:43:07.126674,  0] ../../source4/smbd/server.c:629(binary_smbd_main)
>   samba version 4.12.2 started.
>   Copyright Andrew Tridgell and the Samba Team 1992-2020
> [2020/05/05 15:43:07.152828,  0] ../../source4/smbd/server.c:826(binary_smbd_main)
>   At this time the 'samba' binary should only be used for either:
>   'server role = active directory domain controller' or to access the ntvfs file server with 'server services = +smb' or the rpc proxy with 'dcerpc endpoint servers = remote'
>   You should start smbd/nmbd/winbindd instead for domain member and standalone file server tasks
> [2020/05/05 15:43:07.152937,  0] ../../lib/util/become_daemon.c:121(exit_daemon)
>   exit_daemon: daemon failed to start: Samba detected misconfigured 'server role' and exited. Check logs for details, error code 22
> 
> Clue bat appreciated.

Got it working, but apparently the arch samba doesn't come with io_uring...
One question, though, from looking at the source:

static ssize_t vfs_io_uring_pread_recv(struct tevent_req *req,
				  struct vfs_aio_state *vfs_aio_state)
{
[...]
	if (state->ur.cqe.res < 0) {
		vfs_aio_state->error = -state->ur.cqe.res;
		ret = -1;
	} else {
		vfs_aio_state->error = 0;
		ret = state->ur.cqe.res;
	}

	tevent_req_received(req);
[...]

I'm assuming this is dealing with short reads?

I'll try and see if I can get an arch binary build that has the
vfs_io_uring module and reproduce.
	
-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 16:53     ` Jens Axboe
@ 2020-05-05 17:39       ` Jens Axboe
  2020-05-05 17:48         ` Jeremy Allison
  2020-05-05 20:19       ` Stefan Metzmacher
  1 sibling, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2020-05-05 17:39 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: io-uring, Samba Technical, Jeremy Allison

On 5/5/20 10:53 AM, Jens Axboe wrote:
> On 5/5/20 9:44 AM, Jens Axboe wrote:
>> On 5/5/20 8:41 AM, Jens Axboe wrote:
>>> On 5/5/20 4:04 AM, Stefan Metzmacher wrote:
>>>> Hi Jens,
>>>>
>>>> we currently have a bug report [1][2] regarding a data corruption with
>>>> Samba's vfs_io_uring.c [3].
>>>>
>>>> Is there're a know problem in newer kernels? It seems the 5.3 Kernel
>>>> doesn't have the problem (at least Jeremy wasn't able to reproduce it
>>>> on the Ubuntu 5.3 kernel).
>>>>
>>>> Do you have any hints how to track that down?
>>>
>>> I'll take a look at this! Any chance Jeremy can try 5.4 and 5.5 as well,
>>> just to see where we're at, roughly? That might be very helpful.
>>
>> Trying to setup samba in a vm here to attempt to reproduce. I'm a major
>> samba noob, running with the smb.conf from the reporters email, I get:
>>
>> [2020/05/05 15:43:07.126674,  0] ../../source4/smbd/server.c:629(binary_smbd_main)
>>   samba version 4.12.2 started.
>>   Copyright Andrew Tridgell and the Samba Team 1992-2020
>> [2020/05/05 15:43:07.152828,  0] ../../source4/smbd/server.c:826(binary_smbd_main)
>>   At this time the 'samba' binary should only be used for either:
>>   'server role = active directory domain controller' or to access the ntvfs file server with 'server services = +smb' or the rpc proxy with 'dcerpc endpoint servers = remote'
>>   You should start smbd/nmbd/winbindd instead for domain member and standalone file server tasks
>> [2020/05/05 15:43:07.152937,  0] ../../lib/util/become_daemon.c:121(exit_daemon)
>>   exit_daemon: daemon failed to start: Samba detected misconfigured 'server role' and exited. Check logs for details, error code 22
>>
>> Clue bat appreciated.
> 
> Got it working, but apparently the arch samba doesn't come with io_uring...
> One question, though, from looking at the source:
> 
> static ssize_t vfs_io_uring_pread_recv(struct tevent_req *req,
> 				  struct vfs_aio_state *vfs_aio_state)
> {
> [...]
> 	if (state->ur.cqe.res < 0) {
> 		vfs_aio_state->error = -state->ur.cqe.res;
> 		ret = -1;
> 	} else {
> 		vfs_aio_state->error = 0;
> 		ret = state->ur.cqe.res;
> 	}
> 
> 	tevent_req_received(req);
> [...]
> 
> I'm assuming this is dealing with short reads?
> 
> I'll try and see if I can get an arch binary build that has the
> vfs_io_uring module and reproduce.

Got that done, and I can now mount it on Linux. Been trying pretty
hard to trigger any corruptions on reads, but it works for me. Checked
that we see short reads, and we do, and that it handles it just fine.
So pretty blank right now on what this could be.

FWIW, I'm mounting on Linux as:

# mount -t cifs -o ro,guest //arch/data /smb

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 17:39       ` Jens Axboe
@ 2020-05-05 17:48         ` Jeremy Allison
  2020-05-05 17:50           ` Jens Axboe
       [not found]           ` <[email protected]>
  0 siblings, 2 replies; 27+ messages in thread
From: Jeremy Allison @ 2020-05-05 17:48 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Stefan Metzmacher, io-uring, Samba Technical, jra

On Tue, May 05, 2020 at 11:39:14AM -0600, Jens Axboe wrote:
> > 
> > I'll try and see if I can get an arch binary build that has the
> > vfs_io_uring module and reproduce.
> 
> Got that done, and I can now mount it on Linux. Been trying pretty
> hard to trigger any corruptions on reads, but it works for me. Checked
> that we see short reads, and we do, and that it handles it just fine.
> So pretty blank right now on what this could be.
> 
> FWIW, I'm mounting on Linux as:
> 
> # mount -t cifs -o ro,guest //arch/data /smb

The reporter claims this only happens with
a Windows client, as the Linux and Samba libsmb
clients don't pipeline the reads in the same
way for multiple files the same way that Windows
does.

Here is the bug report with all the details I've
managed to get out of the original reporter:

https://bugzilla.samba.org/show_bug.cgi?id=14361

I also can't reproduce the data corruption, even
from a Windows client although I've only tried
so far on my home Ubuntu 19.04 kernel 5.3.0-51-generic #44-Ubuntu SMP
box.

I tried running 2 powershell processes on the
Windows client and doing a copy in both of them
to ensure we are opening multiple files simultaneously
and going through the io_uring module (which I
saw happening by adding debug statements) but
I never see corruption.

I'm planning to try running against a Fedora32
VM next, as that's case the reporter claims will
reproduce the issue.

Thanks for helping me look at this !

Jeremy.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 17:48         ` Jeremy Allison
@ 2020-05-05 17:50           ` Jens Axboe
       [not found]           ` <[email protected]>
  1 sibling, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2020-05-05 17:50 UTC (permalink / raw)
  To: Jeremy Allison; +Cc: Stefan Metzmacher, io-uring, Samba Technical

On 5/5/20 11:48 AM, Jeremy Allison wrote:
> On Tue, May 05, 2020 at 11:39:14AM -0600, Jens Axboe wrote:
>>>
>>> I'll try and see if I can get an arch binary build that has the
>>> vfs_io_uring module and reproduce.
>>
>> Got that done, and I can now mount it on Linux. Been trying pretty
>> hard to trigger any corruptions on reads, but it works for me. Checked
>> that we see short reads, and we do, and that it handles it just fine.
>> So pretty blank right now on what this could be.
>>
>> FWIW, I'm mounting on Linux as:
>>
>> # mount -t cifs -o ro,guest //arch/data /smb
> 
> The reporter claims this only happens with
> a Windows client, as the Linux and Samba libsmb
> clients don't pipeline the reads in the same
> way for multiple files the same way that Windows
> does.
> 
> Here is the bug report with all the details I've
> managed to get out of the original reporter:
> 
> https://bugzilla.samba.org/show_bug.cgi?id=14361
> 
> I also can't reproduce the data corruption, even
> from a Windows client although I've only tried
> so far on my home Ubuntu 19.04 kernel 5.3.0-51-generic #44-Ubuntu SMP
> box.
> 
> I tried running 2 powershell processes on the
> Windows client and doing a copy in both of them
> to ensure we are opening multiple files simultaneously
> and going through the io_uring module (which I
> saw happening by adding debug statements) but
> I never see corruption.
> 
> I'm planning to try running against a Fedora32
> VM next, as that's case the reporter claims will
> reproduce the issue.
> 
> Thanks for helping me look at this !

Thanks for the detailed reply! If you figure out a way to
reproduce on Linux, please send me a note and I'll be happy
to dive further into this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 16:53     ` Jens Axboe
  2020-05-05 17:39       ` Jens Axboe
@ 2020-05-05 20:19       ` Stefan Metzmacher
  2020-05-06 12:55         ` Pavel Begunkov
  1 sibling, 1 reply; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-05 20:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Samba Technical, Jeremy Allison


[-- Attachment #1.1: Type: text/plain, Size: 1251 bytes --]

Hi Jens,

> Got it working, but apparently the arch samba doesn't come with io_uring...
> One question, though, from looking at the source:

Thanks for taking a look!

> static ssize_t vfs_io_uring_pread_recv(struct tevent_req *req,
> 				  struct vfs_aio_state *vfs_aio_state)
> {
> [...]
> 	if (state->ur.cqe.res < 0) {
> 		vfs_aio_state->error = -state->ur.cqe.res;
> 		ret = -1;
> 	} else {
> 		vfs_aio_state->error = 0;
> 		ret = state->ur.cqe.res;
> 	}
> 
> 	tevent_req_received(req);
> [...]
> 
> I'm assuming this is dealing with short reads?
> 
> I'll try and see if I can get an arch binary build that has the
> vfs_io_uring module and reproduce.

I guess I don't expect short reads for files unless the client asked
for a read beyond EOF. Does IORING_OP_READV may return short reads
similar to preadv2 with RWF_NOWAIT? And if so, did this got changed
since 5.3?

By default Samba uses pread()/pwrite() from within a helper thread
and I modeled the io_uring module with the same expecations that
we wouldn't get a short read if only part of the requested buffer (can
be up to 8MB) is returned because only some of it is already in the
buffer cache.

I'll try the ubuntu 5.4 kernel tomorrow.

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
       [not found]           ` <[email protected]>
@ 2020-05-06 10:33             ` Stefan Metzmacher
  2020-05-06 10:41               ` Stefan Metzmacher
       [not found]               ` <[email protected]>
  0 siblings, 2 replies; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-06 10:33 UTC (permalink / raw)
  To: Anoop C S, Jeremy Allison; +Cc: Samba Technical, io-uring, Jens Axboe


[-- Attachment #1.1: Type: text/plain, Size: 618 bytes --]

Hi Anoop,

> I could reproduce the difference in SHA256 checksum after copying a
> directory with 100 copies of test file(provided by reporter) from
> io_uring VFS module enabled share using Windows explorer(right-click-
>> copy/paste). Only 5 out of 100 files had correct checksum after copy
> operation :-/

Great! Can you please try to collect level 1 log files with
the patch https://bugzilla.samba.org/attachment.cgi?id=15955
applied?

If you use files with only 0xff please also use the
"io_uring:check_ff=yes" option.

Maybe the problem is really triggered by short reads...

Thanks!
metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 10:33             ` Stefan Metzmacher
@ 2020-05-06 10:41               ` Stefan Metzmacher
       [not found]               ` <[email protected]>
  1 sibling, 0 replies; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-06 10:41 UTC (permalink / raw)
  To: Anoop C S, Jeremy Allison; +Cc: Jens Axboe, Samba Technical, io-uring


[-- Attachment #1.1: Type: text/plain, Size: 699 bytes --]

Am 06.05.20 um 12:33 schrieb Stefan Metzmacher via samba-technical:
> Hi Anoop,
> 
>> I could reproduce the difference in SHA256 checksum after copying a
>> directory with 100 copies of test file(provided by reporter) from
>> io_uring VFS module enabled share using Windows explorer(right-click-
>>> copy/paste). Only 5 out of 100 files had correct checksum after copy
>> operation :-/
> 
> Great! Can you please try to collect level 1 log files with
> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
> applied?
> 
> If you use files with only 0xff please also use the
> "io_uring:check_ff=yes" option.

And "log level = 1" and "max log size = 0"

Thanks!
metze



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-05 20:19       ` Stefan Metzmacher
@ 2020-05-06 12:55         ` Pavel Begunkov
  2020-05-06 15:20           ` Stefan Metzmacher
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2020-05-06 12:55 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Samba Technical, Jeremy Allison

On 05/05/2020 23:19, Stefan Metzmacher wrote:
> Hi Jens,
> 
>> Got it working, but apparently the arch samba doesn't come with io_uring...
>> One question, though, from looking at the source:
> 
> Thanks for taking a look!
> 
>> static ssize_t vfs_io_uring_pread_recv(struct tevent_req *req,
>> 				  struct vfs_aio_state *vfs_aio_state)
>> {
>> [...]
>> 	if (state->ur.cqe.res < 0) {
>> 		vfs_aio_state->error = -state->ur.cqe.res;
>> 		ret = -1;
>> 	} else {
>> 		vfs_aio_state->error = 0;
>> 		ret = state->ur.cqe.res;
>> 	}
>>
>> 	tevent_req_received(req);
>> [...]
>>
>> I'm assuming this is dealing with short reads?
>>
>> I'll try and see if I can get an arch binary build that has the
>> vfs_io_uring module and reproduce.
> 
> I guess I don't expect short reads for files unless the client asked
> for a read beyond EOF. Does IORING_OP_READV may return short reads
> similar to preadv2 with RWF_NOWAIT? And if so, did this got changed
> since 5.3?

AFAIK, it can. io_uring first tries to submit a request with IOCB_NOWAIT,
in short for performance reasons. And it have been doing so from the beginning
or so. The same is true for writes.

> 
> By default Samba uses pread()/pwrite() from within a helper thread
> and I modeled the io_uring module with the same expecations that
> we wouldn't get a short read if only part of the requested buffer (can
> be up to 8MB) is returned because only some of it is already in the
> buffer cache.
> 
> I'll try the ubuntu 5.4 kernel tomorrow.
> 
> metze
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
       [not found]               ` <[email protected]>
@ 2020-05-06 14:08                 ` Stefan Metzmacher
  2020-05-06 14:43                   ` Andreas Schneider
                                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-06 14:08 UTC (permalink / raw)
  To: Anoop C S, Jeremy Allison; +Cc: Samba Technical, io-uring, Jens Axboe


[-- Attachment #1.1.1: Type: text/plain, Size: 3753 bytes --]

Am 06.05.20 um 14:41 schrieb Anoop C S:
> On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
>> Hi Anoop,
>>
>>> I could reproduce the difference in SHA256 checksum after copying a
>>> directory with 100 copies of test file(provided by reporter) from
>>> io_uring VFS module enabled share using Windows explorer(right-
>>> click-
>>>> copy/paste). Only 5 out of 100 files had correct checksum after
>>>> copy
>>> operation :-/
>>
>> Great! Can you please try to collect level 1 log files with
>> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
>> applied?
> 
> I have attached three log files.
> log.io_uring.smbd -- Copy using Windows explorer
> log.io_uring-mget.smd -- Copy using smbclient
> log.io_uring-powershell.smd -- Copy using `Copy-Item`

Thanks! All of them show short reads like:

> [2020/05/06 17:27:28.130248,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.131049,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.133679,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.140184,  0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405

It seems the first request is at ofs=0 len=32768 (0x8000) and we get
32768 back.
A follow up request with ofs=0 len=1048576 (0x100000) only gets the
first 32768 bytes which are already in the page cache.

I can easily reproduce this with the Ubuntu 5.4 kernel once I add
state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
and use this.

echo 1 > /proc/sys/vm/drop_caches
head -c 1024 /root/samba-test/ff1.dat | hexdump -C
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
|................|
*
00000400
smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"

results in this log entries:
> [2020/05/06 06:51:57.069990,  0] ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608 (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065
> [2020/05/06 06:51:57.076882,  1] ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000) nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065

smbclient is just smart enough to recover itself from the short read.
But the windows client isn't.

The attached test against liburing (git://git.kernel.dk/liburing) should
be able to demonstrate the problem. It can also be found in
https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3


I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.

Can someone run the unmodified test/implicit-rwf_nowait against
a newer kernel?

Thanks!
metze

[-- Attachment #1.1.2: tmp.diff.txt --]
[-- Type: text/plain, Size: 4710 bytes --]

From eb06dcfde747e46bd08bedf9def2e6cb536c39e3 Mon Sep 17 00:00:00 2001
From: Stefan Metzmacher <[email protected]>
Date: Wed, 6 May 2020 15:33:53 +0200
Subject: [PATCH] TODO!!! test/implicit-rwf_nowait.c

Once this proves the problem, we need a more detailed commit message...
---
 test/Makefile              |   4 +-
 test/implicit-rwf_nowait.c | 162 +++++++++++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+), 2 deletions(-)
 create mode 100644 test/implicit-rwf_nowait.c

diff --git a/test/Makefile b/test/Makefile
index ff4d4b8ec04a..50a75588aca7 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -21,7 +21,7 @@ all_targets += poll poll-cancel ring-leak fsync io_uring_setup io_uring_register
 		file-update accept-reuse poll-v-poll fadvise madvise \
 		short-read openat2 probe shared-wq personality eventfd \
 		send_recv eventfd-ring across-fork sq-poll-kthread splice \
-		lfs-openat lfs-openat-write
+		lfs-openat lfs-openat-write implicit-rwf_nowait
 
 include ../Makefile.quiet
 
@@ -53,7 +53,7 @@ test_srcs := poll.c poll-cancel.c ring-leak.c fsync.c io_uring_setup.c \
 	file-update.c accept-reuse.c poll-v-poll.c fadvise.c \
 	madvise.c short-read.c openat2.c probe.c shared-wq.c \
 	personality.c eventfd.c eventfd-ring.c across-fork.c sq-poll-kthread.c \
-	splice.c lfs-openat.c lfs-openat-write.c
+	splice.c lfs-openat.c lfs-openat-write.c implicit-rwf-nowait.c
 
 ifdef CONFIG_HAVE_STATX
 test_srcs += statx.c
diff --git a/test/implicit-rwf_nowait.c b/test/implicit-rwf_nowait.c
new file mode 100644
index 000000000000..581df209d295
--- /dev/null
+++ b/test/implicit-rwf_nowait.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Description: Implicit RWF_NOWAIT bug
+ *
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+
+#include "liburing.h"
+
+#define BLOCK	4096
+
+#ifndef RWF_NOWAIT
+#define RWF_NOWAIT	8
+#endif
+
+static int get_file_fd(void)
+{
+	ssize_t ret;
+	char *buf;
+	int fd;
+
+	fd = open("testfile", O_RDWR | O_CREAT, 0644);
+	if (fd < 0) {
+		perror("open file");
+		return -1;
+	}
+
+	buf = malloc(BLOCK);
+	memset(buf, 0xff, BLOCK);
+	ret = pwrite(fd, buf, BLOCK, 0);
+	if (ret != BLOCK) {
+		if (ret < 0)
+			perror("write");
+		else
+			printf("Short write\n");
+		goto err;
+	}
+	ret = pwrite(fd, buf, BLOCK, BLOCK);
+	if (ret != BLOCK) {
+		if (ret < 0)
+			perror("write");
+		else
+			printf("Short write\n");
+		goto err;
+	}
+	fsync(fd);
+
+	if (posix_fadvise(fd, BLOCK, BLOCK, POSIX_FADV_DONTNEED)) {
+		perror("fadvise");
+err:
+		close(fd);
+		free(buf);
+		return -1;
+	}
+
+	free(buf);
+	return fd;
+}
+
+static void put_file_fd(int fd)
+{
+	close(fd);
+	unlink("testfile");
+}
+
+int main(int argc, char *argv[])
+{
+	struct io_uring ring;
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	struct iovec iov;
+	int ret, fd;
+
+	iov.iov_base = malloc(2*BLOCK);
+	iov.iov_len = BLOCK;
+
+	ret = io_uring_queue_init(1, &ring, 0);
+	if (ret) {
+		printf("ring setup failed\n");
+		return 1;
+
+	}
+
+	sqe = io_uring_get_sqe(&ring);
+	if (!sqe) {
+		printf("get sqe failed\n");
+		return 1;
+	}
+
+	fd = get_file_fd();
+	if (fd < 0)
+		return 1;
+
+	io_uring_prep_readv(sqe, fd, &iov, 1, 0);
+	io_uring_sqe_set_data(sqe, (void *)(uintptr_t)0x11111111);
+	ret = io_uring_submit(&ring);
+	if (ret != 1) {
+		printf("Got submit %d, expected 1\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(&ring, &cqe);
+	if (ret) {
+		printf("Ring wait got %d\n", ret);
+		goto err;
+	}
+	ret = (uintptr_t)io_uring_cqe_get_data(cqe);
+	if (ret != 0x11111111) {
+		printf("Got invalid data 0x%08x, expected 0x11111111\n", ret);
+		goto err;
+	}
+	io_uring_cq_advance(&ring, 1);
+
+	if (cqe->res != BLOCK) {
+		printf("cqe res=%d != %u\n", cqe->res, BLOCK);
+		goto err;
+	}
+
+	sqe = io_uring_get_sqe(&ring);
+	if (!sqe) {
+		printf("get sqe failed\n");
+		return 1;
+	}
+
+	iov.iov_len = 2*BLOCK;
+	io_uring_prep_readv(sqe, fd, &iov, 1, 0);
+	// Impliet by broken kernels? sqe->rw_flags = RWF_NOWAIT;
+	io_uring_sqe_set_data(sqe, (void *)(uintptr_t)0x22222222);
+	ret = io_uring_submit(&ring);
+	if (ret != 1) {
+		printf("Got submit %d, expected 1\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(&ring, &cqe);
+	if (ret) {
+		printf("Ring peek got %d\n", ret);
+		goto err;
+	}
+	ret = (uintptr_t)io_uring_cqe_get_data(cqe);
+	if (ret != 0x22222222) {
+		printf("Got invalid data 0x%08x, expected 0x22222222\n", ret);
+		goto err;
+	}
+	io_uring_cq_advance(&ring, 1);
+
+	if (cqe->res != 2*BLOCK) {
+		printf("cqe res=%d != %u\n", cqe->res, 2*BLOCK);
+		goto err;
+	}
+
+	put_file_fd(fd);
+	return 0;
+err:
+	put_file_fd(fd);
+	return 1;
+}
-- 
2.17.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 14:08                 ` Stefan Metzmacher
@ 2020-05-06 14:43                   ` Andreas Schneider
  2020-05-06 14:46                   ` Andreas Schneider
  2020-05-06 17:03                   ` Jeremy Allison
  2 siblings, 0 replies; 27+ messages in thread
From: Andreas Schneider @ 2020-05-06 14:43 UTC (permalink / raw)
  To: Anoop C S, Jeremy Allison, samba-technical
  Cc: Jens Axboe, Samba Technical, io-uring, Stefan Metzmacher

On Wednesday, 6 May 2020 16:08:03 CEST Stefan Metzmacher via samba-technical 
wrote:
> Am 06.05.20 um 14:41 schrieb Anoop C S:
> > On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
> >> Hi Anoop,
> >> 
> >>> I could reproduce the difference in SHA256 checksum after copying a
> >>> directory with 100 copies of test file(provided by reporter) from
> >>> io_uring VFS module enabled share using Windows explorer(right-
> >>> click-
> >>> 
> >>>> copy/paste). Only 5 out of 100 files had correct checksum after
> >>>> copy
> >>> 
> >>> operation :-/
> >> 
> >> Great! Can you please try to collect level 1 log files with
> >> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
> >> applied?
> > 
> > I have attached three log files.
> > log.io_uring.smbd -- Copy using Windows explorer
> > log.io_uring-mget.smd -- Copy using smbclient
> > log.io_uring-powershell.smd -- Copy using `Copy-Item`
> 
> Thanks! All of them show short reads like:
> > [2020/05/06 17:27:28.130248,  1]
> > ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000)
> >   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
> >   dir/1.bin fnum 1607026405> 
> > [2020/05/06 17:27:28.131049,  1]
> > ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680)
> >   nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536
> >   dir/1.bin fnum 1607026405> 
> > [2020/05/06 17:27:28.133679,  1]
> > ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000)
> >   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
> >   dir/1.bin fnum 1607026405> 
> > [2020/05/06 17:27:28.140184,  0]
> > ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576
> >   (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096
> >   blocks=19536 dir/1.bin fnum 1607026405
> It seems the first request is at ofs=0 len=32768 (0x8000) and we get
> 32768 back.
> A follow up request with ofs=0 len=1048576 (0x100000) only gets the
> first 32768 bytes which are already in the page cache.
> 
> I can easily reproduce this with the Ubuntu 5.4 kernel once I add
> state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
> and use this.
> 
> echo 1 > /proc/sys/vm/drop_caches
> head -c 1024 /root/samba-test/ff1.dat | hexdump -C
> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
> 
> |................|
> 
> *
> 00000400
> smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"
> 
> results in this log entries:
> > [2020/05/06 06:51:57.069990,  0]
> > ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608
> >   (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096
> >   blocks=16384 ff1.dat fnum 840153065> 
> > [2020/05/06 06:51:57.076882,  1]
> > ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000)
> >   nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384
> >   ff1.dat fnum 840153065
> smbclient is just smart enough to recover itself from the short read.
> But the windows client isn't.
> 
> The attached test against liburing (git://git.kernel.dk/liburing) should
> be able to demonstrate the problem. It can also be found in
> https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://gith
> ub.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3

  ^^^ This link gives me 404 ...

> I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
> against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
> 
> Can someone run the unmodified test/implicit-rwf_nowait against
> a newer kernel?
> 
> Thanks!
> metze


-- 
Andreas Schneider                      [email protected]
Samba Team                             www.samba.org
GPG-ID:     8DFF53E18F2ABC8D8F3C92237EE0FC4DCC014E3D



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 14:08                 ` Stefan Metzmacher
  2020-05-06 14:43                   ` Andreas Schneider
@ 2020-05-06 14:46                   ` Andreas Schneider
  2020-05-06 15:06                     ` Stefan Metzmacher
  2020-05-06 17:03                   ` Jeremy Allison
  2 siblings, 1 reply; 27+ messages in thread
From: Andreas Schneider @ 2020-05-06 14:46 UTC (permalink / raw)
  To: Anoop C S, Jeremy Allison, samba-technical
  Cc: Jens Axboe, Samba Technical, io-uring, Stefan Metzmacher

On Wednesday, 6 May 2020 16:08:03 CEST Stefan Metzmacher via samba-technical 
wrote:
> Am 06.05.20 um 14:41 schrieb Anoop C S:
> > On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
> >> Hi Anoop,
> >> 
> >>> I could reproduce the difference in SHA256 checksum after copying a
> >>> directory with 100 copies of test file(provided by reporter) from
> >>> io_uring VFS module enabled share using Windows explorer(right-
> >>> click-
> >>> 
> >>>> copy/paste). Only 5 out of 100 files had correct checksum after
> >>>> copy
> >>> 
> >>> operation :-/
> >> 
> >> Great! Can you please try to collect level 1 log files with
> >> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
> >> applied?
> > 
> > I have attached three log files.
> > log.io_uring.smbd -- Copy using Windows explorer
> > log.io_uring-mget.smd -- Copy using smbclient
> > log.io_uring-powershell.smd -- Copy using `Copy-Item`
> 
> Thanks! All of them show short reads like:
> > [2020/05/06 17:27:28.130248,  1]
> > ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000)
> >   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
> >   dir/1.bin fnum 1607026405> 
> > [2020/05/06 17:27:28.131049,  1]
> > ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680)
> >   nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536
> >   dir/1.bin fnum 1607026405> 
> > [2020/05/06 17:27:28.133679,  1]
> > ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000)
> >   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
> >   dir/1.bin fnum 1607026405> 
> > [2020/05/06 17:27:28.140184,  0]
> > ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576
> >   (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096
> >   blocks=19536 dir/1.bin fnum 1607026405
> It seems the first request is at ofs=0 len=32768 (0x8000) and we get
> 32768 back.
> A follow up request with ofs=0 len=1048576 (0x100000) only gets the
> first 32768 bytes which are already in the page cache.
> 
> I can easily reproduce this with the Ubuntu 5.4 kernel once I add
> state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
> and use this.
> 
> echo 1 > /proc/sys/vm/drop_caches
> head -c 1024 /root/samba-test/ff1.dat | hexdump -C
> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
> 
> |................|
> 
> *
> 00000400
> smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"
> 
> results in this log entries:
> > [2020/05/06 06:51:57.069990,  0]
> > ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608
> >   (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096
> >   blocks=16384 ff1.dat fnum 840153065> 
> > [2020/05/06 06:51:57.076882,  1]
> > ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)> 
> >   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000)
> >   nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384
> >   ff1.dat fnum 840153065
> smbclient is just smart enough to recover itself from the short read.
> But the windows client isn't.
> 
> The attached test against liburing (git://git.kernel.dk/liburing) should
> be able to demonstrate the problem. It can also be found in
> https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://gith
> ub.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3
> 
> 
> I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
> against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
> 
> Can someone run the unmodified test/implicit-rwf_nowait against
> a newer kernel?

$ uname -a
Linux magrathea 5.6.8-1-default #1 SMP Thu Apr 30 10:47:01 UTC 2020 (63116ab) 
x86_64 x86_64 x86_64 GNU/Linux
$ ./a.out 
cqe res=4096 != 8192


	Andreas

-- 
Andreas Schneider                      [email protected]
Samba Team                             www.samba.org
GPG-ID:     8DFF53E18F2ABC8D8F3C92237EE0FC4DCC014E3D



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 14:46                   ` Andreas Schneider
@ 2020-05-06 15:06                     ` Stefan Metzmacher
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-06 15:06 UTC (permalink / raw)
  To: Andreas Schneider, Anoop C S, Jeremy Allison, samba-technical
  Cc: Jens Axboe, io-uring, Pavel Begunkov


[-- Attachment #1.1: Type: text/plain, Size: 4455 bytes --]

Am 06.05.20 um 16:46 schrieb Andreas Schneider via samba-technical:
> On Wednesday, 6 May 2020 16:08:03 CEST Stefan Metzmacher via samba-technical 
> wrote:
>> Am 06.05.20 um 14:41 schrieb Anoop C S:
>>> On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
>>>> Hi Anoop,
>>>>
>>>>> I could reproduce the difference in SHA256 checksum after copying a
>>>>> directory with 100 copies of test file(provided by reporter) from
>>>>> io_uring VFS module enabled share using Windows explorer(right-
>>>>> click-
>>>>>
>>>>>> copy/paste). Only 5 out of 100 files had correct checksum after
>>>>>> copy
>>>>>
>>>>> operation :-/
>>>>
>>>> Great! Can you please try to collect level 1 log files with
>>>> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
>>>> applied?
>>>
>>> I have attached three log files.
>>> log.io_uring.smbd -- Copy using Windows explorer
>>> log.io_uring-mget.smd -- Copy using smbclient
>>> log.io_uring-powershell.smd -- Copy using `Copy-Item`
>>
>> Thanks! All of them show short reads like:
>>> [2020/05/06 17:27:28.130248,  1]
>>> ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000)
>>>   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
>>>   dir/1.bin fnum 1607026405> 
>>> [2020/05/06 17:27:28.131049,  1]
>>> ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680)
>>>   nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536
>>>   dir/1.bin fnum 1607026405> 
>>> [2020/05/06 17:27:28.133679,  1]
>>> ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000)
>>>   nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536
>>>   dir/1.bin fnum 1607026405> 
>>> [2020/05/06 17:27:28.140184,  0]
>>> ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576
>>>   (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096
>>>   blocks=19536 dir/1.bin fnum 1607026405
>> It seems the first request is at ofs=0 len=32768 (0x8000) and we get
>> 32768 back.
>> A follow up request with ofs=0 len=1048576 (0x100000) only gets the
>> first 32768 bytes which are already in the page cache.
>>
>> I can easily reproduce this with the Ubuntu 5.4 kernel once I add
>> state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
>> and use this.
>>
>> echo 1 > /proc/sys/vm/drop_caches
>> head -c 1024 /root/samba-test/ff1.dat | hexdump -C
>> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
>>
>> |................|
>>
>> *
>> 00000400
>> smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"
>>
>> results in this log entries:
>>> [2020/05/06 06:51:57.069990,  0]
>>> ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608
>>>   (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096
>>>   blocks=16384 ff1.dat fnum 840153065> 
>>> [2020/05/06 06:51:57.076882,  1]
>>> ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)> 
>>>   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000)
>>>   nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384
>>>   ff1.dat fnum 840153065
>> smbclient is just smart enough to recover itself from the short read.
>> But the windows client isn't.
>>
>> The attached test against liburing (git://git.kernel.dk/liburing) should
>> be able to demonstrate the problem. It can also be found in
>> https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://gith
>> ub.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3
>>
>>
>> I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
>> against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
>>
>> Can someone run the unmodified test/implicit-rwf_nowait against
>> a newer kernel?
> 
> $ uname -a
> Linux magrathea 5.6.8-1-default #1 SMP Thu Apr 30 10:47:01 UTC 2020 (63116ab) 
> x86_64 x86_64 x86_64 GNU/Linux
> $ ./a.out 
> cqe res=4096 != 8192

Thanks for confirming, it passes fine on 5.3 and 5.4 kernels.

metze



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 12:55         ` Pavel Begunkov
@ 2020-05-06 15:20           ` Stefan Metzmacher
  2020-05-06 15:42             ` Pavel Begunkov
  0 siblings, 1 reply; 27+ messages in thread
From: Stefan Metzmacher @ 2020-05-06 15:20 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Samba Technical, Jeremy Allison


[-- Attachment #1.1: Type: text/plain, Size: 1653 bytes --]

Am 06.05.20 um 14:55 schrieb Pavel Begunkov:
> On 05/05/2020 23:19, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>>> Got it working, but apparently the arch samba doesn't come with io_uring...
>>> One question, though, from looking at the source:
>>
>> Thanks for taking a look!
>>
>>> static ssize_t vfs_io_uring_pread_recv(struct tevent_req *req,
>>> 				  struct vfs_aio_state *vfs_aio_state)
>>> {
>>> [...]
>>> 	if (state->ur.cqe.res < 0) {
>>> 		vfs_aio_state->error = -state->ur.cqe.res;
>>> 		ret = -1;
>>> 	} else {
>>> 		vfs_aio_state->error = 0;
>>> 		ret = state->ur.cqe.res;
>>> 	}
>>>
>>> 	tevent_req_received(req);
>>> [...]
>>>
>>> I'm assuming this is dealing with short reads?
>>>
>>> I'll try and see if I can get an arch binary build that has the
>>> vfs_io_uring module and reproduce.
>>
>> I guess I don't expect short reads for files unless the client asked
>> for a read beyond EOF. Does IORING_OP_READV may return short reads
>> similar to preadv2 with RWF_NOWAIT? And if so, did this got changed
>> since 5.3?
> 
> AFAIK, it can. io_uring first tries to submit a request with IOCB_NOWAIT,
> in short for performance reasons. And it have been doing so from the beginning
> or so. The same is true for writes.

See the other mails in the thread. The test I wrote shows the
implicit IOCB_NOWAIT was not exposed to the caller in  (at least in 5.3
and 5.4).

I think the typical user don't want it to be exposed!
I'm not sure for blocking reads on a socket, but for files
below EOF it's really not what's expected.

If that behavior is desired RWF_NOWAIT can be used explicitly.

metze


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 15:20           ` Stefan Metzmacher
@ 2020-05-06 15:42             ` Pavel Begunkov
  2020-05-07 16:43               ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Pavel Begunkov @ 2020-05-06 15:42 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe; +Cc: io-uring, Samba Technical, Jeremy Allison

On 06/05/2020 18:20, Stefan Metzmacher wrote:
> Am 06.05.20 um 14:55 schrieb Pavel Begunkov:
>> On 05/05/2020 23:19, Stefan Metzmacher wrote:
>> AFAIK, it can. io_uring first tries to submit a request with IOCB_NOWAIT,
>> in short for performance reasons. And it have been doing so from the beginning
>> or so. The same is true for writes.
> 
> See the other mails in the thread. The test I wrote shows the

Cool you resolved the issue!

> implicit IOCB_NOWAIT was not exposed to the caller in  (at least in 5.3
> and 5.4).
> 

# git show remotes/origin/for-5.3/io_uring:fs/io_uring grep "kiocb->ki_flags |=
IOCB_NOWAIT" -A 5 -B 5

if (force_nonblock)
        kiocb->ki_flags |= IOCB_NOWAIT;

And it have been there since 5.2 or even earlier. I don't know, your results
could be because of different policy in block layer, something unexpected in
io_uring, etc., but it's how it was intended to be.


> I think the typical user don't want it to be exposed!
> I'm not sure for blocking reads on a socket, but for files
> below EOF it's really not what's expected.

Hard to say, but even read(2) without any NONBLOCK doesn't guarantee that.
Hopefully, BPF will help us with that in the future.

> 
> If that behavior is desired RWF_NOWAIT can be used explicitly.
> 
> metze
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 14:08                 ` Stefan Metzmacher
  2020-05-06 14:43                   ` Andreas Schneider
  2020-05-06 14:46                   ` Andreas Schneider
@ 2020-05-06 17:03                   ` Jeremy Allison
  2020-05-06 17:13                     ` Jeremy Allison
  2020-05-06 18:01                     ` Jeremy Allison
  2 siblings, 2 replies; 27+ messages in thread
From: Jeremy Allison @ 2020-05-06 17:03 UTC (permalink / raw)
  To: Stefan Metzmacher; +Cc: Anoop C S, Samba Technical, io-uring, Jens Axboe, jra

On Wed, May 06, 2020 at 04:08:03PM +0200, Stefan Metzmacher wrote:
> Am 06.05.20 um 14:41 schrieb Anoop C S:
> > On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
> >> Hi Anoop,
> >>
> >>> I could reproduce the difference in SHA256 checksum after copying a
> >>> directory with 100 copies of test file(provided by reporter) from
> >>> io_uring VFS module enabled share using Windows explorer(right-
> >>> click-
> >>>> copy/paste). Only 5 out of 100 files had correct checksum after
> >>>> copy
> >>> operation :-/
> >>
> >> Great! Can you please try to collect level 1 log files with
> >> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
> >> applied?
> > 
> > I have attached three log files.
> > log.io_uring.smbd -- Copy using Windows explorer
> > log.io_uring-mget.smd -- Copy using smbclient
> > log.io_uring-powershell.smd -- Copy using `Copy-Item`
> 
> Thanks! All of them show short reads like:
> 
> > [2020/05/06 17:27:28.130248,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> > [2020/05/06 17:27:28.131049,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> > [2020/05/06 17:27:28.133679,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> > [2020/05/06 17:27:28.140184,  0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> 
> It seems the first request is at ofs=0 len=32768 (0x8000) and we get
> 32768 back.
> A follow up request with ofs=0 len=1048576 (0x100000) only gets the
> first 32768 bytes which are already in the page cache.
> 
> I can easily reproduce this with the Ubuntu 5.4 kernel once I add
> state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
> and use this.
> 
> echo 1 > /proc/sys/vm/drop_caches
> head -c 1024 /root/samba-test/ff1.dat | hexdump -C
> 00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
> |................|
> *
> 00000400
> smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"
> 
> results in this log entries:
> > [2020/05/06 06:51:57.069990,  0] ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608 (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065
> > [2020/05/06 06:51:57.076882,  1] ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)
> >   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000) nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065
> 
> smbclient is just smart enough to recover itself from the short read.
> But the windows client isn't.

Well we pay attention to the amount of data returned
and only increment the next read request by the amount
actually returned.

I'm amazed that the Windows client doesn't seem to
check this !

> The attached test against liburing (git://git.kernel.dk/liburing) should
> be able to demonstrate the problem. It can also be found in
> https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3
> 
> 
> I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
> against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
> 
> Can someone run the unmodified test/implicit-rwf_nowait against
> a newer kernel?

Aha. I wondered about the short read issue when this
was first reported but I could never catch it in the
act.

If the Windows client doesn't check and the kernel
returns short reads I guess we'll have to add logic
similar to tstream_readv_send()/tstream_writev_send()
that ensure all bytes requested/send actually go through
the interface and from/into the kernel unless a read
returns 0 (EOF) or a write returns an error.

What a pain though :-(. SMB2+ server implementors
really need to take note that Windows clients will corrupt
files if they get a short read/write return.

The fact that early kernels don't return short
reads on io_uring but later kernels do makes it
even worse :-(.

There's even an SMB2 protocol field in SMB2_READ:

"MinimumCount (4 bytes): The minimum number of bytes to be read for this operation to be
successful. If fewer than the minimum number of bytes are read by the server, the server
MUST return an error rather than the bytes read."

We correctly return EOF if the amount read from
the kernel is less than SMB2_READ.MinimumCount
so I'm guessing they're not using it or looking
at it (or setting it to zero).

MinimumCount is supposed to allow the client to cope with
this. Anoop, do you have wireshark traces so we can
see what the Windows clients are setting here ?

Jeremy.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 17:03                   ` Jeremy Allison
@ 2020-05-06 17:13                     ` Jeremy Allison
  2020-05-06 18:01                     ` Jeremy Allison
  1 sibling, 0 replies; 27+ messages in thread
From: Jeremy Allison @ 2020-05-06 17:13 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe, Anoop C S, Samba Technical,
	io-uring

On Wed, May 06, 2020 at 10:03:44AM -0700, Jeremy Allison via samba-technical wrote:
> 
> Well we pay attention to the amount of data returned
> and only increment the next read request by the amount
> actually returned.
> 
> I'm amazed that the Windows client doesn't seem to
> check this !
> 
> > The attached test against liburing (git://git.kernel.dk/liburing) should
> > be able to demonstrate the problem. It can also be found in
> > https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3
> > 
> > 
> > I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
> > against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.
> > 
> > Can someone run the unmodified test/implicit-rwf_nowait against
> > a newer kernel?
> 
> Aha. I wondered about the short read issue when this
> was first reported but I could never catch it in the
> act.
> 
> If the Windows client doesn't check and the kernel
> returns short reads I guess we'll have to add logic
> similar to tstream_readv_send()/tstream_writev_send()
> that ensure all bytes requested/send actually go through
> the interface and from/into the kernel unless a read
> returns 0 (EOF) or a write returns an error.
> 
> What a pain though :-(. SMB2+ server implementors
> really need to take note that Windows clients will corrupt
> files if they get a short read/write return.
> 
> The fact that early kernels don't return short
> reads on io_uring but later kernels do makes it
> even worse :-(.
> 
> There's even an SMB2 protocol field in SMB2_READ:
> 
> "MinimumCount (4 bytes): The minimum number of bytes to be read for this operation to be
> successful. If fewer than the minimum number of bytes are read by the server, the server
> MUST return an error rather than the bytes read."
> 
> We correctly return EOF if the amount read from
> the kernel is less than SMB2_READ.MinimumCount
> so I'm guessing they're not using it or looking
> at it (or setting it to zero).
> 
> MinimumCount is supposed to allow the client to cope with
> this. Anoop, do you have wireshark traces so we can
> see what the Windows clients are setting here ?

Just did a quick check myself and Windows10 clients
are setting Minimumcount==0 on read, so any amount
should be good here.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 17:03                   ` Jeremy Allison
  2020-05-06 17:13                     ` Jeremy Allison
@ 2020-05-06 18:01                     ` Jeremy Allison
  1 sibling, 0 replies; 27+ messages in thread
From: Jeremy Allison @ 2020-05-06 18:01 UTC (permalink / raw)
  To: Stefan Metzmacher, Jens Axboe, Anoop C S, Samba Technical,
	io-uring; +Cc: jra

[-- Attachment #1: Type: text/plain, Size: 802 bytes --]

On Wed, May 06, 2020 at 10:03:44AM -0700, Jeremy Allison via samba-technical wrote:
> On Wed, May 06, 2020 at 04:08:03PM +0200, Stefan Metzmacher wrote:
> > 
> > smbclient is just smart enough to recover itself from the short read.
> > But the windows client isn't.
> 
> Well we pay attention to the amount of data returned
> and only increment the next read request by the amount
> actually returned.
> 
> I'm amazed that the Windows client doesn't seem to
> check this !

Confirmed. I just ran a copy test from a Windows10
client copying from Samba with the attached patch
applied (when reading from offset zero, reduce
the amount of data returned by 2 bytes to force
a short read return), and it reliably corrupts files.

Windows isn't looking at the DataLength field
of the SMB2_READ response :-(.

[-- Attachment #2: look --]
[-- Type: text/plain, Size: 1015 bytes --]

diff --git a/source3/modules/vfs_io_uring.c b/source3/modules/vfs_io_uring.c
index 378e48d112f..d21a3485536 100644
--- a/source3/modules/vfs_io_uring.c
+++ b/source3/modules/vfs_io_uring.c
@@ -294,6 +294,7 @@ static void vfs_io_uring_fd_handler(struct tevent_context *ev,
 struct vfs_io_uring_pread_state {
 	struct vfs_io_uring_request ur;
 	struct iovec iov;
+	off_t offset;
 };
 
 static struct tevent_req *vfs_io_uring_pread_send(struct vfs_handle_struct *handle,
@@ -319,6 +320,7 @@ static struct tevent_req *vfs_io_uring_pread_send(struct vfs_handle_struct *hand
 	state->ur.config = config;
 	state->ur.req = req;
 	state->ur.state = state;
+	state->offset = offset;
 
 	SMBPROFILE_BYTES_ASYNC_START(syscall_asys_pread, profile_p,
 				     state->ur.profile_bytes, n);
@@ -367,6 +369,11 @@ static ssize_t vfs_io_uring_pread_recv(struct tevent_req *req,
 		ret = state->ur.cqe.res;
 	}
 
+	//JRATEST
+	if (ret > 2 && state->offset == 0) {
+		ret = ret - 2;
+	}
+
 	tevent_req_received(req);
 	return ret;
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-06 15:42             ` Pavel Begunkov
@ 2020-05-07 16:43               ` Jens Axboe
  2020-05-07 16:48                 ` Jeremy Allison
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2020-05-07 16:43 UTC (permalink / raw)
  To: Pavel Begunkov, Stefan Metzmacher
  Cc: io-uring, Samba Technical, Jeremy Allison

On 5/6/20 9:42 AM, Pavel Begunkov wrote:
> On 06/05/2020 18:20, Stefan Metzmacher wrote:
>> Am 06.05.20 um 14:55 schrieb Pavel Begunkov:
>>> On 05/05/2020 23:19, Stefan Metzmacher wrote:
>>> AFAIK, it can. io_uring first tries to submit a request with IOCB_NOWAIT,
>>> in short for performance reasons. And it have been doing so from the beginning
>>> or so. The same is true for writes.
>>
>> See the other mails in the thread. The test I wrote shows the
> 
> Cool you resolved the issue!
> 
>> implicit IOCB_NOWAIT was not exposed to the caller in  (at least in 5.3
>> and 5.4).
>>
> 
> # git show remotes/origin/for-5.3/io_uring:fs/io_uring grep "kiocb->ki_flags |=
> IOCB_NOWAIT" -A 5 -B 5
> 
> if (force_nonblock)
>         kiocb->ki_flags |= IOCB_NOWAIT;
> 
> And it have been there since 5.2 or even earlier. I don't know, your results
> could be because of different policy in block layer, something unexpected in
> io_uring, etc., but it's how it was intended to be.
> 
> 
>> I think the typical user don't want it to be exposed!
>> I'm not sure for blocking reads on a socket, but for files
>> below EOF it's really not what's expected.
> 
> Hard to say, but even read(2) without any NONBLOCK doesn't guarantee that.
> Hopefully, BPF will help us with that in the future.

Replying here, as I missed the storm yesterday... The reason why it's
different is that later kernels no longer attempt to prevent the short
reads. They happen when you get overlapping buffered IO. Then one sqe
will find that X of the Y range is already in cache, and return that.
We don't retry the latter blocking. We previously did, but there was
a few issues with it:

- You're redoing the whole IO, which means more copying

- It's not safe to retry, it'll depend on the file type. For socket,
  pipe, etc we obviously cannot. This is the real reason it got disabled,
  as it was broken there.

Just like for regular system calls, applications must be able to deal
with short IO.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-07 16:43               ` Jens Axboe
@ 2020-05-07 16:48                 ` Jeremy Allison
  2020-05-07 16:50                   ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Jeremy Allison @ 2020-05-07 16:48 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Samba Technical, jra

On Thu, May 07, 2020 at 10:43:17AM -0600, Jens Axboe wrote:
> 
> Replying here, as I missed the storm yesterday... The reason why it's
> different is that later kernels no longer attempt to prevent the short
> reads. They happen when you get overlapping buffered IO. Then one sqe
> will find that X of the Y range is already in cache, and return that.
> We don't retry the latter blocking. We previously did, but there was
> a few issues with it:
> 
> - You're redoing the whole IO, which means more copying
> 
> - It's not safe to retry, it'll depend on the file type. For socket,
>   pipe, etc we obviously cannot. This is the real reason it got disabled,
>   as it was broken there.
> 
> Just like for regular system calls, applications must be able to deal
> with short IO.

Thanks, that's a helpful definitive reply. Of course, the SMB3
protocol is designed to deal with short IO replies as well, and
the Samba and linux kernel clients are well-enough written that
they do so. MacOS and Windows however..

Unfortunately they're the most popular clients on the planet,
so we'll probably have to fix Samba to never return short IOs.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-07 16:48                 ` Jeremy Allison
@ 2020-05-07 16:50                   ` Jens Axboe
  2020-05-07 18:31                     ` Jeremy Allison
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2020-05-07 16:50 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Samba Technical

On 5/7/20 10:48 AM, Jeremy Allison wrote:
> On Thu, May 07, 2020 at 10:43:17AM -0600, Jens Axboe wrote:
>>
>> Replying here, as I missed the storm yesterday... The reason why it's
>> different is that later kernels no longer attempt to prevent the short
>> reads. They happen when you get overlapping buffered IO. Then one sqe
>> will find that X of the Y range is already in cache, and return that.
>> We don't retry the latter blocking. We previously did, but there was
>> a few issues with it:
>>
>> - You're redoing the whole IO, which means more copying
>>
>> - It's not safe to retry, it'll depend on the file type. For socket,
>>   pipe, etc we obviously cannot. This is the real reason it got disabled,
>>   as it was broken there.
>>
>> Just like for regular system calls, applications must be able to deal
>> with short IO.
> 
> Thanks, that's a helpful definitive reply. Of course, the SMB3
> protocol is designed to deal with short IO replies as well, and
> the Samba and linux kernel clients are well-enough written that
> they do so. MacOS and Windows however..

I'm honestly surprised that such broken clients exists! Even being
a somewhat old timer cynic...

> Unfortunately they're the most popular clients on the planet,
> so we'll probably have to fix Samba to never return short IOs.

That does sound like the best way forward, short IOs is possible
with regular system calls as well, but will definitely be a lot
more frequent with io_uring depending on the access patterns,
page cache, number of threads, and so on.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-07 16:50                   ` Jens Axboe
@ 2020-05-07 18:31                     ` Jeremy Allison
  2020-05-07 18:35                       ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Jeremy Allison @ 2020-05-07 18:31 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Samba Technical, jra

On Thu, May 07, 2020 at 10:50:40AM -0600, Jens Axboe wrote:
> On 5/7/20 10:48 AM, Jeremy Allison wrote:
> > On Thu, May 07, 2020 at 10:43:17AM -0600, Jens Axboe wrote:
> >>
> >> Just like for regular system calls, applications must be able to deal
> >> with short IO.
> > 
> > Thanks, that's a helpful definitive reply. Of course, the SMB3
> > protocol is designed to deal with short IO replies as well, and
> > the Samba and linux kernel clients are well-enough written that
> > they do so. MacOS and Windows however..
> 
> I'm honestly surprised that such broken clients exists! Even being
> a somewhat old timer cynic...
> 
> > Unfortunately they're the most popular clients on the planet,
> > so we'll probably have to fix Samba to never return short IOs.
> 
> That does sound like the best way forward, short IOs is possible
> with regular system calls as well, but will definitely be a lot
> more frequent with io_uring depending on the access patterns,
> page cache, number of threads, and so on.

OK, I just want to be *REALLY CLEAR* what you're telling me
(I've already written the pread/pwrite wrappers for Samba
that deal with short IO but want to ensure I understand
fully before making any changes to Samba).

You're saying that on a bog-standard ext4 disk file:

ret = pread(fd, buf, count, offset);

can return *less* than count bytes if there's no IO
error and the file size is greater than offset+count
and no one else is in the middle of a truncate etc. ?

And:

ret = pwrite(fd, buf, count, offset);

can return less* than count bytes if there's no IO
error and there's ample space on disk ?

I have to say I've *never* seen that happen, and
Samba is widely enough used that IO corruption from
short reads/writes from MacOSX and Windows clients
would have been widely reported by now.

Look at how quickly someone spotted disk corruption
because of the change in userspace-visible behavior
of the io_uring interface. We only shipped that code
03 March 2020 and someone *already* found it.

Jeremy.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-07 18:31                     ` Jeremy Allison
@ 2020-05-07 18:35                       ` Jens Axboe
  2020-05-07 18:55                         ` Jeremy Allison
  0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2020-05-07 18:35 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Samba Technical

On 5/7/20 12:31 PM, Jeremy Allison wrote:
> On Thu, May 07, 2020 at 10:50:40AM -0600, Jens Axboe wrote:
>> On 5/7/20 10:48 AM, Jeremy Allison wrote:
>>> On Thu, May 07, 2020 at 10:43:17AM -0600, Jens Axboe wrote:
>>>>
>>>> Just like for regular system calls, applications must be able to deal
>>>> with short IO.
>>>
>>> Thanks, that's a helpful definitive reply. Of course, the SMB3
>>> protocol is designed to deal with short IO replies as well, and
>>> the Samba and linux kernel clients are well-enough written that
>>> they do so. MacOS and Windows however..
>>
>> I'm honestly surprised that such broken clients exists! Even being
>> a somewhat old timer cynic...
>>
>>> Unfortunately they're the most popular clients on the planet,
>>> so we'll probably have to fix Samba to never return short IOs.
>>
>> That does sound like the best way forward, short IOs is possible
>> with regular system calls as well, but will definitely be a lot
>> more frequent with io_uring depending on the access patterns,
>> page cache, number of threads, and so on.
> 
> OK, I just want to be *REALLY CLEAR* what you're telling me
> (I've already written the pread/pwrite wrappers for Samba
> that deal with short IO but want to ensure I understand
> fully before making any changes to Samba).
> 
> You're saying that on a bog-standard ext4 disk file:
> 
> ret = pread(fd, buf, count, offset);
> 
> can return *less* than count bytes if there's no IO
> error and the file size is greater than offset+count
> and no one else is in the middle of a truncate etc. ?
> 
> And:
> 
> ret = pwrite(fd, buf, count, offset);
> 
> can return less* than count bytes if there's no IO
> error and there's ample space on disk ?
> 
> I have to say I've *never* seen that happen, and
> Samba is widely enough used that IO corruption from
> short reads/writes from MacOSX and Windows clients
> would have been widely reported by now.
> 
> Look at how quickly someone spotted disk corruption
> because of the change in userspace-visible behavior
> of the io_uring interface. We only shipped that code
> 03 March 2020 and someone *already* found it.

I _think_ that will only happen on regular files if you use RWF_NOWAIT
or similar, for regular blocking it should not happen. So I don't think
you're at risk there, though I do think that anyone should write
applications with short IOs in mind or they will run into surprises down
the line. Should have been more clear!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-07 18:35                       ` Jens Axboe
@ 2020-05-07 18:55                         ` Jeremy Allison
  2020-05-07 18:58                           ` Jens Axboe
  0 siblings, 1 reply; 27+ messages in thread
From: Jeremy Allison @ 2020-05-07 18:55 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Samba Technical, jra

On Thu, May 07, 2020 at 12:35:42PM -0600, Jens Axboe wrote:
> On 5/7/20 12:31 PM, Jeremy Allison wrote:
> > 
> > Look at how quickly someone spotted disk corruption
> > because of the change in userspace-visible behavior
> > of the io_uring interface. We only shipped that code
> > 03 March 2020 and someone *already* found it.
> 
> I _think_ that will only happen on regular files if you use RWF_NOWAIT
> or similar, for regular blocking it should not happen. So I don't think
> you're at risk there, though I do think that anyone should write
> applications with short IOs in mind or they will run into surprises down
> the line. Should have been more clear!

Well we definitely considered short IOs writing the
server code, but as the protocol allows that to be
visible to the clients (in fact it has explicit
fields meant to deal with it) it wasn't considered
vital to hide them from clients.

We'll certainly fix up short reads for the iouring
module, but it's less clear we should mess with
our existing blocking threaded pread/pwrite code
to deal with them. Possibly goes into the bucket
of "belt and braces, couldn't possibly hurt" :-).

Thanks for the clarification !

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
  2020-05-07 18:55                         ` Jeremy Allison
@ 2020-05-07 18:58                           ` Jens Axboe
  0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2020-05-07 18:58 UTC (permalink / raw)
  To: Jeremy Allison
  Cc: Pavel Begunkov, Stefan Metzmacher, io-uring, Samba Technical

On 5/7/20 12:55 PM, Jeremy Allison wrote:
> On Thu, May 07, 2020 at 12:35:42PM -0600, Jens Axboe wrote:
>> On 5/7/20 12:31 PM, Jeremy Allison wrote:
>>>
>>> Look at how quickly someone spotted disk corruption
>>> because of the change in userspace-visible behavior
>>> of the io_uring interface. We only shipped that code
>>> 03 March 2020 and someone *already* found it.
>>
>> I _think_ that will only happen on regular files if you use RWF_NOWAIT
>> or similar, for regular blocking it should not happen. So I don't think
>> you're at risk there, though I do think that anyone should write
>> applications with short IOs in mind or they will run into surprises down
>> the line. Should have been more clear!
> 
> Well we definitely considered short IOs writing the
> server code, but as the protocol allows that to be
> visible to the clients (in fact it has explicit
> fields meant to deal with it) it wasn't considered
> vital to hide them from clients.

Yes, and in case my reply wasn't totally clear, it was more of a general
observation, not directed at Samba specifically!

> We'll certainly fix up short reads for the iouring
> module, but it's less clear we should mess with
> our existing blocking threaded pread/pwrite code
> to deal with them. Possibly goes into the bucket
> of "belt and braces, couldn't possibly hurt" :-).

Agree, belts and suspenders for the regular pread/pwrite, that's a fair
position.

> Thanks for the clarification !

Thanks for getting this fleshed out! Impressed with the speed at which
we got to the bottom of this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-05-07 18:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-05-05 10:04 Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3 Stefan Metzmacher
2020-05-05 14:41 ` Jens Axboe
2020-05-05 15:44   ` Jens Axboe
2020-05-05 16:53     ` Jens Axboe
2020-05-05 17:39       ` Jens Axboe
2020-05-05 17:48         ` Jeremy Allison
2020-05-05 17:50           ` Jens Axboe
     [not found]           ` <[email protected]>
2020-05-06 10:33             ` Stefan Metzmacher
2020-05-06 10:41               ` Stefan Metzmacher
     [not found]               ` <[email protected]>
2020-05-06 14:08                 ` Stefan Metzmacher
2020-05-06 14:43                   ` Andreas Schneider
2020-05-06 14:46                   ` Andreas Schneider
2020-05-06 15:06                     ` Stefan Metzmacher
2020-05-06 17:03                   ` Jeremy Allison
2020-05-06 17:13                     ` Jeremy Allison
2020-05-06 18:01                     ` Jeremy Allison
2020-05-05 20:19       ` Stefan Metzmacher
2020-05-06 12:55         ` Pavel Begunkov
2020-05-06 15:20           ` Stefan Metzmacher
2020-05-06 15:42             ` Pavel Begunkov
2020-05-07 16:43               ` Jens Axboe
2020-05-07 16:48                 ` Jeremy Allison
2020-05-07 16:50                   ` Jens Axboe
2020-05-07 18:31                     ` Jeremy Allison
2020-05-07 18:35                       ` Jens Axboe
2020-05-07 18:55                         ` Jeremy Allison
2020-05-07 18:58                           ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox