* [PATCH liburing 0/5] segfault and not only fixes
@ 2021-02-11 23:08 Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
First 4 should be good and simple. 5/5 is my shot on the segfaults,
take it with a grain of salt.
link-timeout failure is a separate beast, it's from the old times,
and comes from the kernel's io_async_find_and_cancel() failing with
ENOENT(?) when a linked-timeout sees its master but fails to cancel
it, e.g. when the master is in IRQ or posting CQE.
Maybe we just need to fix the test.
Pavel Begunkov (5):
src/queue: don't re-wait for CQEs
src/queue: control kernel enter with a var
test/link-timeout: close pipes after yourself
test/sq-poll-share: don't ignore wait errors
src/queue: fix no-error with NULL cqe
src/include/liburing.h | 4 +++-
src/queue.c | 22 +++++++++-------------
test/link-timeout.c | 2 ++
test/sq-poll-share.c | 9 ++++++++-
4 files changed, 22 insertions(+), 15 deletions(-)
--
2.24.0
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH liburing 1/5] src/queue: don't re-wait for CQEs
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 2/5] src/queue: control kernel enter with a var Pavel Begunkov
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
If ret==to_submit then io_uring_enter did go to waiting path and we
should not repeat it. And that was the intention of a post syscall
data->submit check, but reshuffling spoiled it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
src/queue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/queue.c b/src/queue.c
index 8c394dd..c2facfd 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -124,7 +124,7 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
}
data->submit -= ret;
- if (ret == (int)data->submit) {
+ if (!data->submit) {
/*
* When SETUP_IOPOLL is set, __sys_io_uring enter()
* must be called to reap new completions but the call
--
2.24.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH liburing 2/5] src/queue: control kernel enter with a var
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 3/5] test/link-timeout: close pipes after yourself Pavel Begunkov
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
We check twice for all entering conditions in _io_uring_get_cqe(), first
to set flags, and the second to potentially break the loop. Save it into
a need_enter var.
Also, don't set IORING_ENTER_GETEVENTS when there is already enough of
events in the CQ.
Signed-off-by: Pavel Begunkov <[email protected]>
---
src/queue.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/queue.c b/src/queue.c
index c2facfd..4fb4ea7 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -92,6 +92,7 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
int err;
do {
+ bool need_enter = false;
bool cq_overflow_flush = false;
unsigned flags = 0;
unsigned nr_available;
@@ -107,12 +108,15 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
}
cq_overflow_flush = true;
}
- if (data->wait_nr || cq_overflow_flush)
+ if (data->wait_nr > nr_available || cq_overflow_flush) {
flags = IORING_ENTER_GETEVENTS | data->get_flags;
- if (data->submit)
+ need_enter = true;
+ }
+ if (data->submit) {
sq_ring_needs_enter(ring, &flags);
- if (data->wait_nr <= nr_available && !data->submit &&
- !cq_overflow_flush)
+ need_enter = true;
+ }
+ if (!need_enter)
break;
ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
--
2.24.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH liburing 3/5] test/link-timeout: close pipes after yourself
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 2/5] src/queue: control kernel enter with a var Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors Pavel Begunkov
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
Close pipe fds when we exit test_single_link_timeout().
Signed-off-by: Pavel Begunkov <[email protected]>
---
test/link-timeout.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/test/link-timeout.c b/test/link-timeout.c
index a517c5e..5d8417f 100644
--- a/test/link-timeout.c
+++ b/test/link-timeout.c
@@ -528,6 +528,8 @@ static int test_single_link_timeout(struct io_uring *ring, unsigned nsec)
io_uring_cqe_seen(ring, cqe);
}
+ close(fds[0]);
+ close(fds[1]);
return 0;
err:
return 1;
--
2.24.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
` (2 preceding siblings ...)
2021-02-11 23:08 ` [PATCH liburing 3/5] test/link-timeout: close pipes after yourself Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe Pavel Begunkov
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
Check io_uring_wait_cqe() result, it's not safe to poke into cqe on
error.
Signed-off-by: Pavel Begunkov <[email protected]>
---
test/sq-poll-share.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/test/sq-poll-share.c b/test/sq-poll-share.c
index 0f25389..02b008e 100644
--- a/test/sq-poll-share.c
+++ b/test/sq-poll-share.c
@@ -60,7 +60,14 @@ static int wait_io(struct io_uring *ring, int nr_ios)
struct io_uring_cqe *cqe;
while (nr_ios) {
- io_uring_wait_cqe(ring, &cqe);
+ int ret = io_uring_wait_cqe(ring, &cqe);
+
+ if (ret == -EAGAIN) {
+ continue;
+ } else if (ret) {
+ fprintf(stderr, "io_uring_wait_cqe failed %i\n", ret);
+ return 1;
+ }
if (cqe->res != BS) {
fprintf(stderr, "Unexpected ret %d\n", cqe->res);
return 1;
--
2.24.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
` (3 preceding siblings ...)
2021-02-11 23:08 ` [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
2021-02-11 23:10 ` [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:45 ` Jens Axboe
6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
Can happen that _io_uring_get_cqe() returns success without filled cqe,
and it's ok, especially for wait_n=0. Fix up error code for functions
that always want to have a valid CQE on success.
Also don't do cq_ring_needs_flush() check in a some weird place,
mainstream it together with wait_nr testing.
Signed-off-by: Pavel Begunkov <[email protected]>
---
src/include/liburing.h | 4 +++-
src/queue.c | 10 +---------
2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/src/include/liburing.h b/src/include/liburing.h
index 90403bc..27c5a14 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -601,7 +601,9 @@ static inline int io_uring_wait_cqe_nr(struct io_uring *ring,
struct io_uring_cqe **cqe_ptr,
unsigned wait_nr)
{
- return __io_uring_get_cqe(ring, cqe_ptr, 0, wait_nr, NULL);
+ int ret = __io_uring_get_cqe(ring, cqe_ptr, 0, wait_nr, NULL);
+
+ return (ret || *cqe_ptr) ? ret : -EAGAIN;
}
/*
diff --git a/src/queue.c b/src/queue.c
index 4fb4ea7..0b09a9c 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -93,7 +93,6 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
do {
bool need_enter = false;
- bool cq_overflow_flush = false;
unsigned flags = 0;
unsigned nr_available;
int ret;
@@ -101,14 +100,7 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
if (err)
break;
- if (!cqe && !to_wait && !data->submit) {
- if (!cq_ring_needs_flush(ring)) {
- err = -EAGAIN;
- break;
- }
- cq_overflow_flush = true;
- }
- if (data->wait_nr > nr_available || cq_overflow_flush) {
+ if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) {
flags = IORING_ENTER_GETEVENTS | data->get_flags;
need_enter = true;
}
--
2.24.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH liburing 0/5] segfault and not only fixes
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
` (4 preceding siblings ...)
2021-02-11 23:08 ` [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe Pavel Begunkov
@ 2021-02-11 23:10 ` Pavel Begunkov
2021-02-11 23:45 ` Jens Axboe
6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:10 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 11/02/2021 23:08, Pavel Begunkov wrote:
> First 4 should be good and simple. 5/5 is my shot on the segfaults,
> take it with a grain of salt.
>
> link-timeout failure is a separate beast, it's from the old times,
> and comes from the kernel's io_async_find_and_cancel() failing with
> ENOENT(?) when a linked-timeout sees its master but fails to cancel
> it, e.g. when the master is in IRQ or posting CQE.
> Maybe we just need to fix the test.
Easily reproducible if you apply 3/5 and do
while (1) {
int err = test_single_link_timeout(10);
assert(err == 0);
}
>
> Pavel Begunkov (5):
> src/queue: don't re-wait for CQEs
> src/queue: control kernel enter with a var
> test/link-timeout: close pipes after yourself
> test/sq-poll-share: don't ignore wait errors
> src/queue: fix no-error with NULL cqe
>
> src/include/liburing.h | 4 +++-
> src/queue.c | 22 +++++++++-------------
> test/link-timeout.c | 2 ++
> test/sq-poll-share.c | 9 ++++++++-
> 4 files changed, 22 insertions(+), 15 deletions(-)
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH liburing 0/5] segfault and not only fixes
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
` (5 preceding siblings ...)
2021-02-11 23:10 ` [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
@ 2021-02-11 23:45 ` Jens Axboe
6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2021-02-11 23:45 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 2/11/21 4:08 PM, Pavel Begunkov wrote:
> First 4 should be good and simple. 5/5 is my shot on the segfaults,
> take it with a grain of salt.
>
> link-timeout failure is a separate beast, it's from the old times,
> and comes from the kernel's io_async_find_and_cancel() failing with
> ENOENT(?) when a linked-timeout sees its master but fails to cancel
> it, e.g. when the master is in IRQ or posting CQE.
> Maybe we just need to fix the test.
1-4 look fine to me, I don't like 5. I've committed a different
variant that I think better fixes the real issue of doing a
return that's too early.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2021-02-11 23:47 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 2/5] src/queue: control kernel enter with a var Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 3/5] test/link-timeout: close pipes after yourself Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe Pavel Begunkov
2021-02-11 23:10 ` [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:45 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox