public inbox for [email protected]
 help / color / mirror / Atom feed
From: Stefan Metzmacher <[email protected]>
To: Anoop C S <[email protected]>, Jeremy Allison <[email protected]>
Cc: Samba Technical <[email protected]>,
	io-uring <[email protected]>, Jens Axboe <[email protected]>
Subject: Re: Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3
Date: Wed, 6 May 2020 16:08:03 +0200	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>


[-- Attachment #1.1.1: Type: text/plain, Size: 3753 bytes --]

Am 06.05.20 um 14:41 schrieb Anoop C S:
> On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
>> Hi Anoop,
>>
>>> I could reproduce the difference in SHA256 checksum after copying a
>>> directory with 100 copies of test file(provided by reporter) from
>>> io_uring VFS module enabled share using Windows explorer(right-
>>> click-
>>>> copy/paste). Only 5 out of 100 files had correct checksum after
>>>> copy
>>> operation :-/
>>
>> Great! Can you please try to collect level 1 log files with
>> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
>> applied?
> 
> I have attached three log files.
> log.io_uring.smbd -- Copy using Windows explorer
> log.io_uring-mget.smd -- Copy using smbclient
> log.io_uring-powershell.smd -- Copy using `Copy-Item`

Thanks! All of them show short reads like:

> [2020/05/06 17:27:28.130248,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.131049,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.133679,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.140184,  0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405

It seems the first request is at ofs=0 len=32768 (0x8000) and we get
32768 back.
A follow up request with ofs=0 len=1048576 (0x100000) only gets the
first 32768 bytes which are already in the page cache.

I can easily reproduce this with the Ubuntu 5.4 kernel once I add
state->ur.sqe.rw_flags |= RWF_NOWAIT; to vfs_io_uring_pread_send()
and use this.

echo 1 > /proc/sys/vm/drop_caches
head -c 1024 /root/samba-test/ff1.dat | hexdump -C
00000000  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff
|................|
*
00000400
smbclient //172.31.9.167/uringff -Uroot%test -c "get ff1.dat"

results in this log entries:
> [2020/05/06 06:51:57.069990,  0] ../../source3/modules/vfs_io_uring.c:89(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=8388608 (0x800000) nread=16384 (0x4000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065
> [2020/05/06 06:51:57.076882,  1] ../../source3/modules/vfs_io_uring.c:104(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=16384 (0x4000) len=8372224 (0x7fc000) nread=8372224 (0x7fc000) eof=8388608 (0x800000) blks=4096 blocks=16384 ff1.dat fnum 840153065

smbclient is just smart enough to recover itself from the short read.
But the windows client isn't.

The attached test against liburing (git://git.kernel.dk/liburing) should
be able to demonstrate the problem. It can also be found in
https://github.com/metze-samba/liburing/tree/implicit-rwf-nowaithttps://github.com/metze-samba/liburing/commit/eb06dcfde747e46bd08bedf9def2e6cb536c39e3


I added the sqe->rw_flags = RWF_NOWAIT; line in order to demonstrate it
against the Ubuntu 5.3 and 5.4 kernels. They both seem to have the bug.

Can someone run the unmodified test/implicit-rwf_nowait against
a newer kernel?

Thanks!
metze

[-- Attachment #1.1.2: tmp.diff.txt --]
[-- Type: text/plain, Size: 4710 bytes --]

From eb06dcfde747e46bd08bedf9def2e6cb536c39e3 Mon Sep 17 00:00:00 2001
From: Stefan Metzmacher <[email protected]>
Date: Wed, 6 May 2020 15:33:53 +0200
Subject: [PATCH] TODO!!! test/implicit-rwf_nowait.c

Once this proves the problem, we need a more detailed commit message...
---
 test/Makefile              |   4 +-
 test/implicit-rwf_nowait.c | 162 +++++++++++++++++++++++++++++++++++++
 2 files changed, 164 insertions(+), 2 deletions(-)
 create mode 100644 test/implicit-rwf_nowait.c

diff --git a/test/Makefile b/test/Makefile
index ff4d4b8ec04a..50a75588aca7 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -21,7 +21,7 @@ all_targets += poll poll-cancel ring-leak fsync io_uring_setup io_uring_register
 		file-update accept-reuse poll-v-poll fadvise madvise \
 		short-read openat2 probe shared-wq personality eventfd \
 		send_recv eventfd-ring across-fork sq-poll-kthread splice \
-		lfs-openat lfs-openat-write
+		lfs-openat lfs-openat-write implicit-rwf_nowait
 
 include ../Makefile.quiet
 
@@ -53,7 +53,7 @@ test_srcs := poll.c poll-cancel.c ring-leak.c fsync.c io_uring_setup.c \
 	file-update.c accept-reuse.c poll-v-poll.c fadvise.c \
 	madvise.c short-read.c openat2.c probe.c shared-wq.c \
 	personality.c eventfd.c eventfd-ring.c across-fork.c sq-poll-kthread.c \
-	splice.c lfs-openat.c lfs-openat-write.c
+	splice.c lfs-openat.c lfs-openat-write.c implicit-rwf-nowait.c
 
 ifdef CONFIG_HAVE_STATX
 test_srcs += statx.c
diff --git a/test/implicit-rwf_nowait.c b/test/implicit-rwf_nowait.c
new file mode 100644
index 000000000000..581df209d295
--- /dev/null
+++ b/test/implicit-rwf_nowait.c
@@ -0,0 +1,162 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Description: Implicit RWF_NOWAIT bug
+ *
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+
+#include "liburing.h"
+
+#define BLOCK	4096
+
+#ifndef RWF_NOWAIT
+#define RWF_NOWAIT	8
+#endif
+
+static int get_file_fd(void)
+{
+	ssize_t ret;
+	char *buf;
+	int fd;
+
+	fd = open("testfile", O_RDWR | O_CREAT, 0644);
+	if (fd < 0) {
+		perror("open file");
+		return -1;
+	}
+
+	buf = malloc(BLOCK);
+	memset(buf, 0xff, BLOCK);
+	ret = pwrite(fd, buf, BLOCK, 0);
+	if (ret != BLOCK) {
+		if (ret < 0)
+			perror("write");
+		else
+			printf("Short write\n");
+		goto err;
+	}
+	ret = pwrite(fd, buf, BLOCK, BLOCK);
+	if (ret != BLOCK) {
+		if (ret < 0)
+			perror("write");
+		else
+			printf("Short write\n");
+		goto err;
+	}
+	fsync(fd);
+
+	if (posix_fadvise(fd, BLOCK, BLOCK, POSIX_FADV_DONTNEED)) {
+		perror("fadvise");
+err:
+		close(fd);
+		free(buf);
+		return -1;
+	}
+
+	free(buf);
+	return fd;
+}
+
+static void put_file_fd(int fd)
+{
+	close(fd);
+	unlink("testfile");
+}
+
+int main(int argc, char *argv[])
+{
+	struct io_uring ring;
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	struct iovec iov;
+	int ret, fd;
+
+	iov.iov_base = malloc(2*BLOCK);
+	iov.iov_len = BLOCK;
+
+	ret = io_uring_queue_init(1, &ring, 0);
+	if (ret) {
+		printf("ring setup failed\n");
+		return 1;
+
+	}
+
+	sqe = io_uring_get_sqe(&ring);
+	if (!sqe) {
+		printf("get sqe failed\n");
+		return 1;
+	}
+
+	fd = get_file_fd();
+	if (fd < 0)
+		return 1;
+
+	io_uring_prep_readv(sqe, fd, &iov, 1, 0);
+	io_uring_sqe_set_data(sqe, (void *)(uintptr_t)0x11111111);
+	ret = io_uring_submit(&ring);
+	if (ret != 1) {
+		printf("Got submit %d, expected 1\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(&ring, &cqe);
+	if (ret) {
+		printf("Ring wait got %d\n", ret);
+		goto err;
+	}
+	ret = (uintptr_t)io_uring_cqe_get_data(cqe);
+	if (ret != 0x11111111) {
+		printf("Got invalid data 0x%08x, expected 0x11111111\n", ret);
+		goto err;
+	}
+	io_uring_cq_advance(&ring, 1);
+
+	if (cqe->res != BLOCK) {
+		printf("cqe res=%d != %u\n", cqe->res, BLOCK);
+		goto err;
+	}
+
+	sqe = io_uring_get_sqe(&ring);
+	if (!sqe) {
+		printf("get sqe failed\n");
+		return 1;
+	}
+
+	iov.iov_len = 2*BLOCK;
+	io_uring_prep_readv(sqe, fd, &iov, 1, 0);
+	// Impliet by broken kernels? sqe->rw_flags = RWF_NOWAIT;
+	io_uring_sqe_set_data(sqe, (void *)(uintptr_t)0x22222222);
+	ret = io_uring_submit(&ring);
+	if (ret != 1) {
+		printf("Got submit %d, expected 1\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(&ring, &cqe);
+	if (ret) {
+		printf("Ring peek got %d\n", ret);
+		goto err;
+	}
+	ret = (uintptr_t)io_uring_cqe_get_data(cqe);
+	if (ret != 0x22222222) {
+		printf("Got invalid data 0x%08x, expected 0x22222222\n", ret);
+		goto err;
+	}
+	io_uring_cq_advance(&ring, 1);
+
+	if (cqe->res != 2*BLOCK) {
+		printf("cqe res=%d != %u\n", cqe->res, 2*BLOCK);
+		goto err;
+	}
+
+	put_file_fd(fd);
+	return 0;
+err:
+	put_file_fd(fd);
+	return 1;
+}
-- 
2.17.1


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2020-05-06 14:08 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-05 10:04 Data Corruption bug with Samba's vfs_iouring and Linux 5.6.7/5.7rc3 Stefan Metzmacher
2020-05-05 14:41 ` Jens Axboe
2020-05-05 15:44   ` Jens Axboe
2020-05-05 16:53     ` Jens Axboe
2020-05-05 17:39       ` Jens Axboe
2020-05-05 17:48         ` Jeremy Allison
2020-05-05 17:50           ` Jens Axboe
     [not found]           ` <[email protected]>
2020-05-06 10:33             ` Stefan Metzmacher
2020-05-06 10:41               ` Stefan Metzmacher
     [not found]               ` <[email protected]>
2020-05-06 14:08                 ` Stefan Metzmacher [this message]
2020-05-06 14:43                   ` Andreas Schneider
2020-05-06 14:46                   ` Andreas Schneider
2020-05-06 15:06                     ` Stefan Metzmacher
2020-05-06 17:03                   ` Jeremy Allison
2020-05-06 17:13                     ` Jeremy Allison
2020-05-06 18:01                     ` Jeremy Allison
2020-05-05 20:19       ` Stefan Metzmacher
2020-05-06 12:55         ` Pavel Begunkov
2020-05-06 15:20           ` Stefan Metzmacher
2020-05-06 15:42             ` Pavel Begunkov
2020-05-07 16:43               ` Jens Axboe
2020-05-07 16:48                 ` Jeremy Allison
2020-05-07 16:50                   ` Jens Axboe
2020-05-07 18:31                     ` Jeremy Allison
2020-05-07 18:35                       ` Jens Axboe
2020-05-07 18:55                         ` Jeremy Allison
2020-05-07 18:58                           ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox