public inbox for [email protected]
 help / color / mirror / Atom feed
* [v6.4 Regression] rust/io_uring: tests::net::test_tcp_recv_multi hangs
@ 2023-10-19  8:06 Ming Lei
  2023-10-19 11:31 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2023-10-19  8:06 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: ming.lei, Guang Wu

Hello Jens,

Guang Wu found that tests::net::test_tcp_recv_multi in rust:io_uring
hangs, and no such issue in RH test kernel.

- git clone https://github.com/tokio-rs/io-uring.git
- cd io-uring
- cargo run --package io-uring-test

I figured out that it is made by missing the last CQE with -ENOBUFS,
which is caused by commit a2741c58ac67 ("io_uring/net: don't retry recvmsg()
unnecessarily").

I am not sure if the last CQE should be returned and that depends how normal
recv_multi is written, but IORING_CQE_F_MORE in the previous CQE shouldn't be
returned at least.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [v6.4 Regression] rust/io_uring: tests::net::test_tcp_recv_multi hangs
  2023-10-19  8:06 [v6.4 Regression] rust/io_uring: tests::net::test_tcp_recv_multi hangs Ming Lei
@ 2023-10-19 11:31 ` Jens Axboe
  2023-10-19 11:47   ` Ming Lei
  0 siblings, 1 reply; 4+ messages in thread
From: Jens Axboe @ 2023-10-19 11:31 UTC (permalink / raw)
  To: Ming Lei, io-uring; +Cc: Guang Wu

On 10/19/23 2:06 AM, Ming Lei wrote:
> Hello Jens,
> 
> Guang Wu found that tests::net::test_tcp_recv_multi in rust:io_uring
> hangs, and no such issue in RH test kernel.
> 
> - git clone https://github.com/tokio-rs/io-uring.git
> - cd io-uring
> - cargo run --package io-uring-test
> 
> I figured out that it is made by missing the last CQE with -ENOBUFS,
> which is caused by commit a2741c58ac67 ("io_uring/net: don't retry recvmsg()
> unnecessarily").
> 
> I am not sure if the last CQE should be returned and that depends how normal
> recv_multi is written, but IORING_CQE_F_MORE in the previous CQE shouldn't be
> returned at least.

Is this because it depends on this spurious retry? IOW, it adds N
buffers and triggers N receives, then depends on an internal extra retry
which would then yield -ENOBUFS? Because that sounds like a broken test.
As long as the recv triggers successfully, IORING_CQE_F_MORE will be
set. Only if it his some terminating condition would it trigger a CQE
without the MORE flag set. If it remains armed and ready to trigger
again, it will have MORE set. I'll take a look, this is pure guesswork
on my side right now.

We've done quite a lot of testing with recv multishot with this change,
and haven't had any issues. Which is why I'm a bit skeptical.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [v6.4 Regression] rust/io_uring: tests::net::test_tcp_recv_multi hangs
  2023-10-19 11:31 ` Jens Axboe
@ 2023-10-19 11:47   ` Ming Lei
  2023-10-19 12:10     ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2023-10-19 11:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Guang Wu, ming.lei

On Thu, Oct 19, 2023 at 05:31:11AM -0600, Jens Axboe wrote:
> On 10/19/23 2:06 AM, Ming Lei wrote:
> > Hello Jens,
> > 
> > Guang Wu found that tests::net::test_tcp_recv_multi in rust:io_uring
> > hangs, and no such issue in RH test kernel.
> > 
> > - git clone https://github.com/tokio-rs/io-uring.git
> > - cd io-uring
> > - cargo run --package io-uring-test
> > 
> > I figured out that it is made by missing the last CQE with -ENOBUFS,
> > which is caused by commit a2741c58ac67 ("io_uring/net: don't retry recvmsg()
> > unnecessarily").
> > 
> > I am not sure if the last CQE should be returned and that depends how normal
> > recv_multi is written, but IORING_CQE_F_MORE in the previous CQE shouldn't be
> > returned at least.
> 
> Is this because it depends on this spurious retry? IOW, it adds N
> buffers and triggers N receives, then depends on an internal extra retry
> which would then yield -ENOBUFS? Because that sounds like a broken test.

Yeah, that is basically what the test does. 

The test gets two recv CQEs, both have IORING_CQE_F_MORE. And it waits for 3
CQEs, and never return because there isn't the 3rd CQE after
a2741c58ac67 ("io_uring/net: don't retry recvmsg() unnecessarily")
is merged.

> As long as the recv triggers successfully, IORING_CQE_F_MORE will be
> set. Only if it his some terminating condition would it trigger a CQE
> without the MORE flag set. If it remains armed and ready to trigger
> again, it will have MORE set. I'll take a look, this is pure guesswork
> on my side right now.

.B IORING_CQE_F_MORE
If set, the application should expect more completions from the request. This
is used for requests that can generate multiple completions, such as multi-shot
requests, receive, or accept.

I understand that if one CQE is received with IORING_CQE_F_MORE, it is
reasonable for userspace to wait for one extra CQE, is this expectation
correct? Or the documentation needs to be updated?


Thanks,
Ming


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [v6.4 Regression] rust/io_uring: tests::net::test_tcp_recv_multi hangs
  2023-10-19 11:47   ` Ming Lei
@ 2023-10-19 12:10     ` Jens Axboe
  0 siblings, 0 replies; 4+ messages in thread
From: Jens Axboe @ 2023-10-19 12:10 UTC (permalink / raw)
  To: Ming Lei; +Cc: io-uring, Guang Wu

On 10/19/23 5:47 AM, Ming Lei wrote:
> On Thu, Oct 19, 2023 at 05:31:11AM -0600, Jens Axboe wrote:
>> On 10/19/23 2:06 AM, Ming Lei wrote:
>>> Hello Jens,
>>>
>>> Guang Wu found that tests::net::test_tcp_recv_multi in rust:io_uring
>>> hangs, and no such issue in RH test kernel.
>>>
>>> - git clone https://github.com/tokio-rs/io-uring.git
>>> - cd io-uring
>>> - cargo run --package io-uring-test
>>>
>>> I figured out that it is made by missing the last CQE with -ENOBUFS,
>>> which is caused by commit a2741c58ac67 ("io_uring/net: don't retry recvmsg()
>>> unnecessarily").
>>>
>>> I am not sure if the last CQE should be returned and that depends how normal
>>> recv_multi is written, but IORING_CQE_F_MORE in the previous CQE shouldn't be
>>> returned at least.
>>
>> Is this because it depends on this spurious retry? IOW, it adds N
>> buffers and triggers N receives, then depends on an internal extra retry
>> which would then yield -ENOBUFS? Because that sounds like a broken test.
> 
> Yeah, that is basically what the test does. 
> 
> The test gets two recv CQEs, both have IORING_CQE_F_MORE. And it waits for 3
> CQEs, and never return because there isn't the 3rd CQE after
> a2741c58ac67 ("io_uring/net: don't retry recvmsg() unnecessarily")
> is merged.

Right, and this is why it's invalid. If you send two, you will get two.
This is a misunderstanding of how recv multishot works, and the test
relied on an odd quirk where we'd sometimes re-trigger a recv even
though we did not have to.

>> As long as the recv triggers successfully, IORING_CQE_F_MORE will be
>> set. Only if it his some terminating condition would it trigger a CQE
>> without the MORE flag set. If it remains armed and ready to trigger
>> again, it will have MORE set. I'll take a look, this is pure guesswork
>> on my side right now.
> 
> .B IORING_CQE_F_MORE
> If set, the application should expect more completions from the request. This
> is used for requests that can generate multiple completions, such as multi-shot
> requests, receive, or accept.
> 
> I understand that if one CQE is received with IORING_CQE_F_MORE, it is
> reasonable for userspace to wait for one extra CQE, is this expectation
> correct? Or the documentation needs to be updated?

This is correct, if IORING_CQE_F_MORE is set, the request remains armed
and will trigger an even again. That next event may not have
IORING_CQE_F_MORE set, but there will always be that next even _as long
as something causes a new cqe to be issued_.

The test sets up two buffers, and arms a recv multishot for them. It
then proceeds to send two buffers, which are received and completed.
Each of those CQEs will have MORE set, because they both completed
successfully, and if someone sends more data, it will trigger again.
What the test should do is ensure that another recv is triggered, I've
attached a dummy patch below.

This is simply a broken test. No errors have occurred (eg running out of
buffers, or receive being shorter than it should be), hence there's no
reason for io_uring to terminate the multishot request.


diff --git a/io-uring-test/src/tests/net.rs b/io-uring-test/src/tests/net.rs
index 18beb20773cf..82208443f2df 100644
--- a/io-uring-test/src/tests/net.rs
+++ b/io-uring-test/src/tests/net.rs
@@ -1100,7 +1100,9 @@ pub fn test_tcp_recv_multi<S: squeue::EntryMarker, C: cqueue::EntryMarker>(
     // Send one package made of two segments, and receive as two buffers, each max length 1024
     // so the first buffer received should be length 1024 and the second length 256.
     let mut input = vec![0xde; 1024];
-    input.extend_from_slice(&[0xad; 256]);
+    input.extend_from_slice(&[0xad; 1024]);
+    let mut enobufs = vec![0xff; 256];
+    input.append(&mut enobufs);
     let mut bufs = vec![0; 2 * 1024];
 
     // provide bufs
@@ -1118,7 +1120,7 @@ pub fn test_tcp_recv_multi<S: squeue::EntryMarker, C: cqueue::EntryMarker>(
     assert_eq!(cqe.user_data(), 0x21);
     assert_eq!(cqe.result(), 0);
 
-    // write all 1024 + 256
+    // write all 1024 + 1024 + 256
     send_stream.write_all(&input)?;
 
     // multishot recv using a buf_group with 1024 length buffers
@@ -1143,7 +1145,7 @@ pub fn test_tcp_recv_multi<S: squeue::EntryMarker, C: cqueue::EntryMarker>(
     assert_eq!(&bufs[..1024], &input[..1024]);
 
     assert_eq!(cqes[1].user_data(), 0x22);
-    assert_eq!(cqes[1].result(), 256); // length 256
+    assert_eq!(cqes[1].result(), 1024); // length 1024
     assert!(cqueue::more(cqes[1].flags()));
     assert_eq!(cqueue::buffer_select(cqes[1].flags()), Some(1));
     assert_eq!(&bufs[1024..(1024 + 256)], &input[1024..(1024 + 256)]);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-10-19 12:11 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-19  8:06 [v6.4 Regression] rust/io_uring: tests::net::test_tcp_recv_multi hangs Ming Lei
2023-10-19 11:31 ` Jens Axboe
2023-10-19 11:47   ` Ming Lei
2023-10-19 12:10     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox