public inbox for [email protected]
 help / color / mirror / Atom feed
From: Dominique MARTINET <[email protected]>
To: Nikolay Borisov <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], [email protected]
Subject: Re: read corruption with qemu master io_uring engine / linux master / btrfs(?)
Date: Wed, 29 Jun 2022 14:14:18 +0900	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

[-- Attachment #1: Type: text/plain, Size: 2008 bytes --]

Dominique MARTINET wrote on Wed, Jun 29, 2022 at 09:35:44AM +0900:
> I also agree writing a simple program like the io_uring test in the
> above commit that'd sort of do it like qemu and compare contents would
> be ideal.
> I'll have a stab at this today.

Okay, after half a day failing to reproduce I had a closer look at qemu
and... it's a qemu bug.

Well, there probably are two bugs, but one should be benign:

 - qemu short read handling was... rather disappointing.
Patch should appear here[1] eventually, but as it seems moderated?
I'm reposting it here:
-----
diff --git a/block/io_uring.c b/block/io_uring.c
index d48e472e74cb..d58aff9615ce 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -103,7 +103,7 @@ static void luring_resubmit_short_read(LuringState *s, LuringAIOCB *luringcb,
                       remaining);
 
     /* Update sqe */
-    luringcb->sqeq.off = nread;
+    luringcb->sqeq.off += nread;
     luringcb->sqeq.addr = (__u64)(uintptr_t)luringcb->resubmit_qiov.iov;
     luringcb->sqeq.len = luringcb->resubmit_qiov.niov;
 
-----
 (basically "just" a typo, but that must have never been tested!)
[1] https://lore.kernel.org/qemu-devel/[email protected]


 - comments there also say short reads should never happen on newer
kernels (assuming local filesystems?) -- how true is that? If we're
doing our best kernel side to avoid short reads I guess we probably
ought to have a look at this.

It can easily be reproduced with a simple io_uring program -- see
example attached that eventually fails with the following error on
btrfs:
bad read result for io 8, offset 792227840: 266240 should be 1466368

but doesn't fail on tmpfs or without O_DIRECT.
feel free to butcher it, it's already a quickly hacked downversion of my
original test that had hash computation etc so the flow might feel a bit
weird.
Just compile with `gcc -o shortreads uring_shortreads.c -luring` and run
with file to read in argument.


Thanks!
-- 
Dominique

[-- Attachment #2: uring_shortreads.c --]
[-- Type: text/x-csrc, Size: 3359 bytes --]

/* Get O_DIRECT */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <liburing.h>
#include <sys/random.h>
#include <sys/stat.h>

long pagesize;
size_t n_blocks;
#define QUEUE_SIZE 10
char *buffers[QUEUE_SIZE];
int bufsize[QUEUE_SIZE];
struct iovec iovec[QUEUE_SIZE];
long int offsets[QUEUE_SIZE];

void breakme(void) {
}

int submit_read(struct io_uring *ring, int fd, int i) {
	struct io_uring_sqe *sqe;
	int ret;

	sqe = io_uring_get_sqe(ring);
	if (!sqe) {
		fprintf(stderr, "Failed to get io_uring sqe\n");
		return 1;
	}
	if (i == 0 || rand() % 2 == 0 || offsets[i-1] > n_blocks - bufsize[i]) {
		offsets[i] = rand() % (n_blocks - bufsize[i] + 1);
	} else {
		offsets[i] = offsets[i - 1];
	}
	io_uring_prep_readv(sqe, fd, iovec + i, 1, offsets[i] * pagesize);
	io_uring_sqe_set_data(sqe, (void*)(uintptr_t)i);
	ret = io_uring_submit(ring);
	if (ret != 1) {
		fprintf(stderr,	"submit failed\n");
		return 1;
	}
	return 0;
}

int getsize(int fd) {
	struct stat sb;
	if (fstat(fd, &sb)) {
		fprintf(stderr, "fstat: %m\n");
		return 1;
	}
	n_blocks = sb.st_size / pagesize;
	return 0;
}

int main(int argc, char *argv[])
{
	char *file, *mapfile;
	unsigned int seed;
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	int fd, i;
	ssize_t ret;
	size_t total = 0;

	if (argc < 2 || argc > 3) {
		fprintf(stderr, "Use: %s <file> [<seed>]\n", argv[0]);
		return 1;
	}
	file = argv[1];
	if (argc == 3) {
		seed = atol(argv[2]);
	} else {
		getrandom(&seed, sizeof(seed), 0);
	}
	printf("random seed %u\n", seed);
	srand(seed);
	pagesize = sysconf(_SC_PAGE_SIZE);
	if (asprintf(&mapfile, "%s.map", file) < 0) {
		fprintf(stderr, "asprintf map %d\n", errno);
		return 1;
	}

	fd = open(file, O_RDONLY | O_DIRECT);
	if (fd == -1) {
		fprintf(stderr,
				"Failed to open file '%s': %s (errno %d)\n",
				file, strerror(errno), errno);
		return 1;
	}
	if (getsize(fd))
		return 1;

	for (i = 0 ; i < QUEUE_SIZE; i++) {
		bufsize[i] = (rand() % 1024) + 1;
		ret = posix_memalign((void**)&buffers[i], pagesize, bufsize[i] * pagesize);
		if (ret) {
			fprintf(stderr, "Failed to allocate read buffer\n");
			return 1;
		}
	}


	printf("Starting io_uring reads...\n");


	ret = io_uring_queue_init(QUEUE_SIZE, &ring, 0);
	if (ret != 0) {
		fprintf(stderr, "Failed to create io_uring queue\n");
		return 1;
	}


	for (i = 0 ; i < QUEUE_SIZE; i++) {
		iovec[i].iov_base = buffers[i];
		iovec[i].iov_len = bufsize[i] * pagesize;
		if (submit_read(&ring, fd, i))
			return 1;
	}

	while (total++ < 10000000) {
		if (total % 1000 == 0)
			printf("%zd\n", total);
		ret = io_uring_wait_cqe(&ring, &cqe);
		if (ret < 0) {
			fprintf(stderr, "Failed at io_uring_wait_cqe()\n");
			return 1;
		}
		i = (intptr_t)io_uring_cqe_get_data(cqe);
		if (cqe->res < 0) {
			fprintf(stderr, "bad read result for io %d, offset %zd: %d\n",
				i, offsets[i] * pagesize, cqe->res);
			breakme();
			return 1;
		}
		if (cqe->res != bufsize[i] * pagesize) {
			fprintf(stderr, "bad read result for io %d, offset %zd: %d should be %zd\n",
				i, offsets[i] * pagesize, cqe->res, bufsize[i] * pagesize);
			breakme();
			return 1;
		}
		io_uring_cqe_seen(&ring, cqe);

		// resubmit
		if (submit_read(&ring, fd, i))
			return 1;
	}
	io_uring_queue_exit(&ring);

	return 0;
}

  reply	other threads:[~2022-06-29  5:14 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-28  9:08 read corruption with qemu master io_uring engine / linux master / btrfs(?) Dominique MARTINET
2022-06-28 19:03 ` Nikolay Borisov
2022-06-29  0:35   ` Dominique MARTINET
2022-06-29  5:14     ` Dominique MARTINET [this message]
2022-06-29 15:37       ` Filipe Manana
2022-06-30  0:41         ` Dominique MARTINET
2022-06-30  7:56           ` Dominique MARTINET
2022-06-30 10:45             ` Filipe Manana
2022-06-30 11:09               ` Filipe Manana
2022-06-30 11:27               ` Dominique MARTINET
2022-06-30 12:51                 ` Filipe Manana
2022-06-30 13:08                   ` Dominique MARTINET
2022-06-30 15:10                     ` Filipe Manana
2022-07-01  1:25                       ` Dominique MARTINET
2022-07-01  8:45                         ` Filipe Manana
2022-07-01 10:33                           ` Filipe Manana
2022-07-01 23:48                             ` Dominique MARTINET
2022-06-28 19:05 ` Nikolay Borisov
2022-06-28 19:12 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox