public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH liburing 0/5] segfault and not only fixes
@ 2021-02-11 23:08 Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring

First 4 should be good and simple. 5/5 is my shot on the segfaults,
take it with a grain of salt.

link-timeout failure is a separate beast, it's from the old times,
and comes from the kernel's io_async_find_and_cancel() failing with
ENOENT(?) when a linked-timeout sees its master but fails to cancel
it, e.g. when the master is in IRQ or posting CQE.
Maybe we just need to fix the test.

Pavel Begunkov (5):
  src/queue: don't re-wait for CQEs
  src/queue: control kernel enter with a var
  test/link-timeout: close pipes after yourself
  test/sq-poll-share: don't ignore wait errors
  src/queue: fix no-error with NULL cqe

 src/include/liburing.h |  4 +++-
 src/queue.c            | 22 +++++++++-------------
 test/link-timeout.c    |  2 ++
 test/sq-poll-share.c   |  9 ++++++++-
 4 files changed, 22 insertions(+), 15 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH liburing 1/5] src/queue: don't re-wait for CQEs
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 2/5] src/queue: control kernel enter with a var Pavel Begunkov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring

If ret==to_submit then io_uring_enter did go to waiting path and we
should not repeat it. And that was the intention of a post syscall
data->submit check, but reshuffling spoiled it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 src/queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/queue.c b/src/queue.c
index 8c394dd..c2facfd 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -124,7 +124,7 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 		}
 
 		data->submit -= ret;
-		if (ret == (int)data->submit) {
+		if (!data->submit) {
 			/*
 			 * When SETUP_IOPOLL is set, __sys_io_uring enter()
 			 * must be called to reap new completions but the call
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH liburing 2/5] src/queue: control kernel enter with a var
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 3/5] test/link-timeout: close pipes after yourself Pavel Begunkov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring

We check twice for all entering conditions in _io_uring_get_cqe(), first
to set flags, and the second to potentially break the loop. Save it into
a need_enter var.

Also, don't set IORING_ENTER_GETEVENTS when there is already enough of
events in the CQ.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 src/queue.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/queue.c b/src/queue.c
index c2facfd..4fb4ea7 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -92,6 +92,7 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 	int err;
 
 	do {
+		bool need_enter = false;
 		bool cq_overflow_flush = false;
 		unsigned flags = 0;
 		unsigned nr_available;
@@ -107,12 +108,15 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 			}
 			cq_overflow_flush = true;
 		}
-		if (data->wait_nr || cq_overflow_flush)
+		if (data->wait_nr > nr_available || cq_overflow_flush) {
 			flags = IORING_ENTER_GETEVENTS | data->get_flags;
-		if (data->submit)
+			need_enter = true;
+		}
+		if (data->submit) {
 			sq_ring_needs_enter(ring, &flags);
-		if (data->wait_nr <= nr_available && !data->submit &&
-		    !cq_overflow_flush)
+			need_enter = true;
+		}
+		if (!need_enter)
 			break;
 
 		ret = __sys_io_uring_enter2(ring->ring_fd, data->submit,
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH liburing 3/5] test/link-timeout: close pipes after yourself
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 2/5] src/queue: control kernel enter with a var Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors Pavel Begunkov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Close pipe fds when we exit test_single_link_timeout().

Signed-off-by: Pavel Begunkov <[email protected]>
---
 test/link-timeout.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/test/link-timeout.c b/test/link-timeout.c
index a517c5e..5d8417f 100644
--- a/test/link-timeout.c
+++ b/test/link-timeout.c
@@ -528,6 +528,8 @@ static int test_single_link_timeout(struct io_uring *ring, unsigned nsec)
 		io_uring_cqe_seen(ring, cqe);
 	}
 
+	close(fds[0]);
+	close(fds[1]);
 	return 0;
 err:
 	return 1;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-02-11 23:08 ` [PATCH liburing 3/5] test/link-timeout: close pipes after yourself Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
  2021-02-11 23:08 ` [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe Pavel Begunkov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Check io_uring_wait_cqe() result, it's not safe to poke into cqe on
error.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 test/sq-poll-share.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/test/sq-poll-share.c b/test/sq-poll-share.c
index 0f25389..02b008e 100644
--- a/test/sq-poll-share.c
+++ b/test/sq-poll-share.c
@@ -60,7 +60,14 @@ static int wait_io(struct io_uring *ring, int nr_ios)
 	struct io_uring_cqe *cqe;
 
 	while (nr_ios) {
-		io_uring_wait_cqe(ring, &cqe);
+		int ret = io_uring_wait_cqe(ring, &cqe);
+
+		if (ret == -EAGAIN) {
+			continue;
+		} else if (ret) {
+			fprintf(stderr, "io_uring_wait_cqe failed %i\n", ret);
+			return 1;
+		}
 		if (cqe->res != BS) {
 			fprintf(stderr, "Unexpected ret %d\n", cqe->res);
 			return 1;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
                   ` (3 preceding siblings ...)
  2021-02-11 23:08 ` [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors Pavel Begunkov
@ 2021-02-11 23:08 ` Pavel Begunkov
  2021-02-11 23:10 ` [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
  2021-02-11 23:45 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:08 UTC (permalink / raw)
  To: Jens Axboe, io-uring

Can happen that _io_uring_get_cqe() returns success without filled cqe,
and it's ok, especially for wait_n=0. Fix up error code for functions
that always want to have a valid CQE on success.

Also don't do cq_ring_needs_flush() check in a some weird place,
mainstream it together with wait_nr testing.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 src/include/liburing.h |  4 +++-
 src/queue.c            | 10 +---------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 90403bc..27c5a14 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -601,7 +601,9 @@ static inline int io_uring_wait_cqe_nr(struct io_uring *ring,
 				      struct io_uring_cqe **cqe_ptr,
 				      unsigned wait_nr)
 {
-	return __io_uring_get_cqe(ring, cqe_ptr, 0, wait_nr, NULL);
+	int ret = __io_uring_get_cqe(ring, cqe_ptr, 0, wait_nr, NULL);
+
+	return (ret || *cqe_ptr) ? ret : -EAGAIN;
 }
 
 /*
diff --git a/src/queue.c b/src/queue.c
index 4fb4ea7..0b09a9c 100644
--- a/src/queue.c
+++ b/src/queue.c
@@ -93,7 +93,6 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 
 	do {
 		bool need_enter = false;
-		bool cq_overflow_flush = false;
 		unsigned flags = 0;
 		unsigned nr_available;
 		int ret;
@@ -101,14 +100,7 @@ static int _io_uring_get_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_pt
 		err = __io_uring_peek_cqe(ring, &cqe, &nr_available);
 		if (err)
 			break;
-		if (!cqe && !to_wait && !data->submit) {
-			if (!cq_ring_needs_flush(ring)) {
-				err = -EAGAIN;
-				break;
-			}
-			cq_overflow_flush = true;
-		}
-		if (data->wait_nr > nr_available || cq_overflow_flush) {
+		if (data->wait_nr > nr_available || cq_ring_needs_flush(ring)) {
 			flags = IORING_ENTER_GETEVENTS | data->get_flags;
 			need_enter = true;
 		}
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH liburing 0/5] segfault and not only fixes
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
                   ` (4 preceding siblings ...)
  2021-02-11 23:08 ` [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe Pavel Begunkov
@ 2021-02-11 23:10 ` Pavel Begunkov
  2021-02-11 23:45 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2021-02-11 23:10 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 11/02/2021 23:08, Pavel Begunkov wrote:
> First 4 should be good and simple. 5/5 is my shot on the segfaults,
> take it with a grain of salt.
> 
> link-timeout failure is a separate beast, it's from the old times,
> and comes from the kernel's io_async_find_and_cancel() failing with
> ENOENT(?) when a linked-timeout sees its master but fails to cancel
> it, e.g. when the master is in IRQ or posting CQE.
> Maybe we just need to fix the test.

Easily reproducible if you apply 3/5 and do

while (1) {
	int err = test_single_link_timeout(10);
	assert(err == 0);
}

> 
> Pavel Begunkov (5):
>   src/queue: don't re-wait for CQEs
>   src/queue: control kernel enter with a var
>   test/link-timeout: close pipes after yourself
>   test/sq-poll-share: don't ignore wait errors
>   src/queue: fix no-error with NULL cqe
> 
>  src/include/liburing.h |  4 +++-
>  src/queue.c            | 22 +++++++++-------------
>  test/link-timeout.c    |  2 ++
>  test/sq-poll-share.c   |  9 ++++++++-
>  4 files changed, 22 insertions(+), 15 deletions(-)
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH liburing 0/5] segfault and not only fixes
  2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
                   ` (5 preceding siblings ...)
  2021-02-11 23:10 ` [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
@ 2021-02-11 23:45 ` Jens Axboe
  6 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2021-02-11 23:45 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 2/11/21 4:08 PM, Pavel Begunkov wrote:
> First 4 should be good and simple. 5/5 is my shot on the segfaults,
> take it with a grain of salt.
> 
> link-timeout failure is a separate beast, it's from the old times,
> and comes from the kernel's io_async_find_and_cancel() failing with
> ENOENT(?) when a linked-timeout sees its master but fails to cancel
> it, e.g. when the master is in IRQ or posting CQE.
> Maybe we just need to fix the test.

1-4 look fine to me, I don't like 5. I've committed a different
variant that I think better fixes the real issue of doing a
return that's too early.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2021-02-11 23:47 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-02-11 23:08 [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 1/5] src/queue: don't re-wait for CQEs Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 2/5] src/queue: control kernel enter with a var Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 3/5] test/link-timeout: close pipes after yourself Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 4/5] test/sq-poll-share: don't ignore wait errors Pavel Begunkov
2021-02-11 23:08 ` [PATCH liburing 5/5] src/queue: fix no-error with NULL cqe Pavel Begunkov
2021-02-11 23:10 ` [PATCH liburing 0/5] segfault and not only fixes Pavel Begunkov
2021-02-11 23:45 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox