From: Florian Fischer <[email protected]>
To: Jens Axboe <[email protected]>
Cc: Florian Schmaus <[email protected]>, [email protected]
Subject: Re: Canceled read requests never completed
Date: Wed, 19 Jan 2022 14:42:22 +0100 [thread overview]
Message-ID: <20220119134222.5wx7wrnalhrerspu@pasture> (raw)
In-Reply-To: <[email protected]>
[-- Attachment #1: Type: text/plain, Size: 2822 bytes --]
On 18.01.2022 19:32, Jens Axboe wrote:
> On 1/18/22 4:36 PM, Jens Axboe wrote:
> > On 1/18/22 1:05 PM, Florian Fischer wrote:
> >>>> After reading the io_uring_enter(2) man page a IORING_OP_ASYNC_CANCEL's return value of -EALREADY apparently
> >>>> may not cause the request to terminate. At least that is our interpretation of "…res field will contain -EALREADY.
> >>>> In this case, the request may or may not terminate."
> >>>
> >>> I took a look at this, and my theory is that the request cancelation
> >>> ends up happening right in between when the work item is moved between
> >>> the work list and to the worker itself. The way the async queue works,
> >>> the work item is sitting in a list until it gets assigned by a worker.
> >>> When that assignment happens, it's removed from the general work list
> >>> and then assigned to the worker itself. There's a small gap there where
> >>> the work cannot be found in the general list, and isn't yet findable in
> >>> the worker itself either.
> >>>
> >>> Do you always see -ENOENT from the cancel when you get the hang
> >>> condition?
> >>
> >> No we also and actually more commonly observe cancel returning
> >> -EALREADY and the canceled read request never gets completed.
> >>
> >> As shown in the log snippet I included below.
> >
> > I think there are a couple of different cases here. Can you try the
> > below patch? It's against current -git.
>
> Cleaned it up and split it into functional bits, end result is here:
>
> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-5.17
Thanks. I have built and tested your patches.
The most common error we observed (read -> read -> write -> 2x cancel)
is no longer reproducible and our original test case now works flawlessly :)
Nor could I reproduce any hangs with cancel returning -ENOENT.
But I can still reliably reproduce stuck threads when the write does not
increment the evfd count, so the reads never complete because no data becomes available.
(read -> read -> write (do not increment evfd count) -> 2x cancel)
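The trigger is the eventfd semantics: a write of 0 is valid but leaves the counter untouched, so the pending reads can never complete on their own and only a successful cancel can reap them. A standalone sketch of just that behaviour (plain libc, no io_uring involved; the helper name is mine, not from the reproducer):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Returns 1 if a read on a fresh eventfd completes after writing
 * `increment` to it, 0 if the read would still block. A write of 0 is
 * accepted but adds nothing to the counter, so a subsequent read finds
 * the counter at 0 and fails with EAGAIN (the fd is non-blocking). */
static int read_ready_after_write(uint64_t increment)
{
	int efd = eventfd(0, EFD_NONBLOCK);
	if (efd == -1)
		return -1;
	if (write(efd, &increment, sizeof(increment)) != (ssize_t)sizeof(increment)) {
		close(efd);
		return -1;
	}
	uint64_t buf;
	ssize_t r = read(efd, &buf, sizeof(buf));
	int ready = r == (ssize_t)sizeof(buf);
	close(efd);
	return ready;
}
```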
I further reduced the attached C program to reproduce the above problem.
The code is also available now at our gitlab [1].
The following log output was created with a less 'minimal' version that still includes
logging:
75 Collect write completion: 8
75 Collect cancel read 1 completion: 0
75 Collect cancel read 2 completion: -114
75 Collect read 1 completion: -125
75 Collect read 2 completion: -4
75 Collect write completion: 8
75 Collect cancel read 1 completion: 0
75 Collect cancel read 2 completion: -114
75 Collect read 1 completion: -125
thread 75 stuck here
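For reference (my annotation, not part of the program's output): the negative res values above are negated errno codes, which can be decoded with a small helper like the hypothetical one below, assuming Linux errno numbering:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Decode the negative CQE res values seen in the log into errno names.
 * On Linux: EALREADY == 114, ECANCELED == 125, EINTR == 4. So above,
 * "cancel read 2" -> -EALREADY, "read 1" -> -ECANCELED, "read 2" -> -EINTR. */
static const char* decode_cqe_res(int res)
{
	if (res >= 0)
		return "success";
	switch (-res) {
	case EALREADY:  return "EALREADY";
	case ECANCELED: return "ECANCELED";
	case EINTR:     return "EINTR";
	default:        return "unknown";
	}
}
```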
The scenario seems extremely artificial, but nonetheless I think it should
work regardless of its usefulness.
Flo Fischer
[1]: https://gitlab.cs.fau.de/aj46ezos/io_uring-cancel/-/tree/minimal-write0
[-- Attachment #2: io_uring-cancel.c --]
[-- Type: text/plain, Size: 4862 bytes --]
/**
* io_uring cancel miss reproducer
* Copyright 2022 Florian Fischer <[email protected]>
*
* io_uring's IORING_OP_ASYNC_CANCEL sometimes does not cancel in-flight requests.
*
* Tested systems:
* - Xeon E7-4830 Debian 11
* - liburing 2.1
* - reproducible on: linux-block/io_uring-5.17
*
* Building the reproducer:
* $ gcc -Werror -Wall -g -O3 io_uring-cancel.c -o io_uring-cancel -pthread -luring
*
* Steps to reproduce on the above system:
* 1. Compile the reproducer
* 2. Run the reproducer in a loop
* $ for i in $(seq 100); do echo $i; ./io_uring-cancel; done
*
* (I need < ~30 runs to reproduce a stuck thread with 96 threads)
*
* Explanation:
* The reproducer opens #CPUs event file descriptors and starts #CPUs threads.
* Each thread creates an io_uring and enters a loop.
* In each iteration the thread submits a read request for evfds[thread_id],
* then a second read request for evfds[(thread_id + 1) % #CPUs],
* and finally a write request for evfds[(thread_id + 2) % #CPUs]. The write
* does not increase the eventfd's count and thus will not complete any read.
* The write request is awaited.
* Then two cancel requests for the reads are submitted and the outstanding
* requests are awaited.
*
* The reproducer gets stuck because a cancel request returning -EALREADY
* did not properly cancel the read, and the read request is never completed.
*
* When passing a non-zero value to the reproducer as the write increment
* $ ./io_uring-cancel 1
* I could no longer reproduce stuck threads.
*/
#define _GNU_SOURCE
#include <assert.h>
#include <err.h>
#include <errno.h>
#include <liburing.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/sysinfo.h>
#include <unistd.h>
#define IORING_ENTRIES 2
#define ITERATIONS 1000
_Thread_local size_t thread_id;
static pthread_t* threads;
static int* evfds;
static unsigned nthreads;
static unsigned write_increment = 0;
static struct io_uring_sqe* checked_get_sqe(struct io_uring* ring) {
struct io_uring_sqe* sqe = io_uring_get_sqe(ring);
assert(sqe);
return sqe;
}
static void prep_evfd_read(struct io_uring* ring, int evfd, uint64_t* buf, uintptr_t tag) {
struct io_uring_sqe* sqe = checked_get_sqe(ring);
io_uring_prep_read(sqe, evfds[evfd], buf, sizeof(*buf), 0);
io_uring_sqe_set_data(sqe, (void*)tag);
}
static void prep_evfd_write(struct io_uring* ring, int evfd, uint64_t* buf) {
struct io_uring_sqe* sqe = checked_get_sqe(ring);
io_uring_prep_write(sqe, evfds[evfd], buf, sizeof(*buf), 0);
}
static void prep_cancel(struct io_uring* ring, uintptr_t tag) {
struct io_uring_sqe* sqe = checked_get_sqe(ring);
io_uring_prep_cancel(sqe, (void*)tag, 0);
}
static void checked_submit_n(struct io_uring* ring, int n) {
// io_uring_submit returns the number of submitted SQEs or a negative errno
int res = io_uring_submit(ring);
if (res < 0) { errno = -res; err(EXIT_FAILURE, "io_uring_submit failed"); }
if (res != n) errx(EXIT_FAILURE, "io_uring_submit submitted fewer sqes than prepared");
}
static void checked_submit_one(struct io_uring* ring) {
checked_submit_n(ring, 1);
}
static void collect_n_completions(struct io_uring* ring, int n) {
struct io_uring_cqe* cqe;
for (int i = 0; i < n; ++i) {
// io_uring_wait_cqe returns 0 on success or a negative errno
int res = io_uring_wait_cqe(ring, &cqe);
if (res) { errno = -res; err(EXIT_FAILURE, "io_uring_wait_cqe failed"); }
io_uring_cqe_seen(ring, cqe);
}
}
void* thread_func(void* arg) {
thread_id = (uintptr_t)(arg);
struct io_uring ring;
int res = io_uring_queue_init(IORING_ENTRIES, &ring, 0);
if (res) err(EXIT_FAILURE, "io_uring_queue_init failed");
for (unsigned i = 0; i < ITERATIONS; ++i) {
uint64_t rbuf = 0;
uint64_t wbuf = write_increment;
prep_evfd_read(&ring, thread_id, &rbuf, 1);
checked_submit_one(&ring);
prep_evfd_read(&ring, (thread_id + 1) % nthreads, &rbuf, 2);
checked_submit_one(&ring);
prep_evfd_write(&ring, (thread_id + 2) % nthreads, &wbuf);
checked_submit_one(&ring);
collect_n_completions(&ring, 1);
prep_cancel(&ring, 1);
prep_cancel(&ring, 2);
checked_submit_n(&ring, 2);
collect_n_completions(&ring, 4);
}
return NULL;
}
int main(int argc, char* argv[]) {
if (argc > 1)
write_increment = atoi(argv[1]);
nthreads = get_nprocs();
// nthreads = 10;
evfds = malloc(sizeof(int) * nthreads);
if (!evfds) err(EXIT_FAILURE, "malloc failed");
for (unsigned i = 0; i < nthreads; ++i) {
evfds[i] = eventfd(0, 0);
if (evfds[i] == -1) err(EXIT_FAILURE, "eventfd failed");
}
threads = malloc(sizeof(pthread_t) * nthreads);
if (!threads) err(EXIT_FAILURE, "malloc failed");
for (unsigned i = 0; i < nthreads; ++i) {
errno = pthread_create(&threads[i], NULL, thread_func, (void*)(uintptr_t)i);
if (errno) err(EXIT_FAILURE, "pthread_create failed");
}
for (unsigned i = 0; i < nthreads; ++i) {
errno = pthread_join(threads[i], NULL);
if (errno) err(EXIT_FAILURE, "pthread_join failed");
}
exit(EXIT_SUCCESS);
}
Thread overview: 6+ messages
2022-01-18 15:13 Canceled read requests never completed Florian Fischer
2022-01-18 19:07 ` Jens Axboe
2022-01-18 20:05 ` Florian Fischer
2022-01-18 23:36 ` Jens Axboe
2022-01-19 2:32 ` Jens Axboe
2022-01-19 13:42 ` Florian Fischer [this message]