public inbox for io-uring@vger.kernel.org
* [PATCHSET 0/3] Add cap for multishot recv receive size
@ 2025-07-08 14:26 Jens Axboe
  2025-07-08 14:26 ` [PATCH 1/3] io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Jens Axboe @ 2025-07-08 14:26 UTC
  To: io-uring

Hi,

When using multishot receive and handling many simultaneous streams,
there's a potential fairness issue that can occur. For each receive
operation, io_uring will keep retrying a request up to 32 times
as long as there's data pending in the socket. Depending on data
delivery times, the amount of data received can vary quite a bit.
If the multishot receive is using bundles as well, then each bundle
can use up to 256 vectors of data. This is good for efficiency, but
can skew the fairness between sockets.

Multishot recv does not currently support setting sqe->len; it
returns -EINVAL if that is done. Add support for specifying the length
in the SQE, and have it apply as a per-iteration limit for each
receive. For example, if sr->len is set to 512k, then each multishot
invocation of this request will transfer 512k bytes, at most.

As an example, this test case sets up 4 streams, and uses 32-byte buffers
for each stream. Each client will read 8k of data, or 256 buffers in
total per stream. If the per-invocation limit isn't set, it looks as
follows:

axboe@m2max-kvm ~> ./recv-streams
bundle=1, mshot=1
Will receive 32768 bytes total
cqe res 8192 (bid=0, id=1)
cqe res 0 (bid=0, id=1)
id=1, done, 8192 bytes
rd switch, prev id=1, bytes=8192, total_bytes=8192
cqe res 8192 (bid=256, id=2)
cqe res 0 (bid=0, id=2)
id=2, done, 8192 bytes
rd switch, prev id=2, bytes=8192, total_bytes=8192
cqe res 8192 (bid=512, id=3)
cqe res 0 (bid=0, id=3)
id=3, done, 8192 bytes
rd switch, prev id=3, bytes=8192, total_bytes=8192
cqe res 8192 (bid=768, id=4)
id=4, done, 8192 bytes

where each stream will end up reading the full 8k before the next stream
is able to make any progress. With this patchset and setting sr->len to
2048, it looks like this instead:

axboe@m2max-kvm ~> ./recv-streams
bundle=1, mshot=1
Will receive 32768 bytes total
cqe res 2048 (bid=0, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=2048
cqe res 2048 (bid=64, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=2048
cqe res 2048 (bid=128, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=2048
cqe res 2048 (bid=192, id=4)
rd switch, prev id=4, bytes=2048, total_bytes=2048
cqe res 2048 (bid=256, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=4096
cqe res 2048 (bid=320, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=4096
cqe res 2048 (bid=384, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=4096
cqe res 2048 (bid=448, id=4)
rd switch, prev id=4, bytes=2048, total_bytes=4096
cqe res 2048 (bid=512, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=6144
cqe res 2048 (bid=576, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=6144
cqe res 2048 (bid=640, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=6144
cqe res 2048 (bid=704, id=4)
rd switch, prev id=4, bytes=2048, total_bytes=6144
cqe res 2048 (bid=768, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=8192
cqe res 2048 (bid=832, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=8192
cqe res 2048 (bid=896, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=8192
cqe res 2048 (bid=960, id=4)
id=4, done, 8192 bytes

where each stream gets to read 2k before switching to the next stream,
and then this repeats until they've all read 8k of data.

Patches 1+2 are just prep patches, patch 3 implements the capping logic.

Can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-recv-mshot-len

 include/uapi/linux/io_uring.h |  9 ++++++
 io_uring/net.c                | 52 +++++++++++++++++++++++------------
 2 files changed, 44 insertions(+), 17 deletions(-)

-- 
Jens Axboe


* [PATCHSET v2] Add retry cap for multishot recv receive size
@ 2025-07-09 20:32 Jens Axboe
  2025-07-09 20:32 ` [PATCH 1/3] io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags Jens Axboe
  0 siblings, 1 reply; 5+ messages in thread
From: Jens Axboe @ 2025-07-09 20:32 UTC
  To: io-uring

Hi,

When using multishot receive and handling many simultaneous streams,
there's a potential fairness issue that can occur. For each receive
operation, io_uring will keep retrying a request up to 32 times
as long as there's data pending in the socket. Depending on data
delivery times, the amount of data received can vary quite a bit.
If the multishot receive is using bundles as well, then each bundle
can use up to 256 vectors of data. This is good for efficiency, but
can skew the fairness between sockets.

Multishot recv does not currently support setting sqe->len; it
returns -EINVAL if that is done. Add support for specifying the length
in the SQE, and have it apply as a per-iteration limit for each
receive. For example, if sr->len is set to 512k, then each multishot
invocation of this request will transfer 512k bytes, at most.

As an example, this test case sets up 4 streams, and uses 32-byte buffers
for each stream. Each client will read 8k of data, or 256 buffers in
total per stream. If the per-invocation limit isn't set, it looks as
follows:

axboe@m2max-kvm ~> ./recv-streams
bundle=1, mshot=1
Will receive 32768 bytes total
cqe res 8192 (bid=0, id=1)
cqe res 0 (bid=0, id=1)
id=1, done, 8192 bytes
rd switch, prev id=1, bytes=8192, total_bytes=8192
cqe res 8192 (bid=256, id=2)
cqe res 0 (bid=0, id=2)
id=2, done, 8192 bytes
rd switch, prev id=2, bytes=8192, total_bytes=8192
cqe res 8192 (bid=512, id=3)
cqe res 0 (bid=0, id=3)
id=3, done, 8192 bytes
rd switch, prev id=3, bytes=8192, total_bytes=8192
cqe res 8192 (bid=768, id=4)
id=4, done, 8192 bytes

where each stream will end up reading the full 8k before the next stream
is able to make any progress. With this patchset and setting sr->len to
2048, it looks like this instead:

axboe@m2max-kvm ~> ./recv-streams
bundle=1, mshot=1
Will receive 32768 bytes total
cqe res 2048 (bid=0, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=2048
cqe res 2048 (bid=64, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=2048
cqe res 2048 (bid=128, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=2048
cqe res 2048 (bid=192, id=4)
rd switch, prev id=4, bytes=2048, total_bytes=2048
cqe res 2048 (bid=256, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=4096
cqe res 2048 (bid=320, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=4096
cqe res 2048 (bid=384, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=4096
cqe res 2048 (bid=448, id=4)
rd switch, prev id=4, bytes=2048, total_bytes=4096
cqe res 2048 (bid=512, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=6144
cqe res 2048 (bid=576, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=6144
cqe res 2048 (bid=640, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=6144
cqe res 2048 (bid=704, id=4)
rd switch, prev id=4, bytes=2048, total_bytes=6144
cqe res 2048 (bid=768, id=1)
rd switch, prev id=1, bytes=2048, total_bytes=8192
cqe res 2048 (bid=832, id=2)
rd switch, prev id=2, bytes=2048, total_bytes=8192
cqe res 2048 (bid=896, id=3)
rd switch, prev id=3, bytes=2048, total_bytes=8192
cqe res 2048 (bid=960, id=4)
id=4, done, 8192 bytes

where each stream gets to read 2k before switching to the next stream,
and then this repeats until they've all read 8k of data.

Patches 1+2 are just prep patches, patch 3 implements the capping logic.

Can also be found here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-recv-mshot-len

 io_uring/net.c | 71 ++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 54 insertions(+), 17 deletions(-)

Since v1:
- Get rid of the need to check for flags overlaps and io_uring.h
  addition, as we can just document that the upper 8 bits are for
  internal uses. UAPI is only lower 8 bits anyway, as the ioprio
  field is that size.

-- 
Jens Axboe



Thread overview: 5+ messages
2025-07-08 14:26 [PATCHSET 0/3] Add cap for multishot recv receive size Jens Axboe
2025-07-08 14:26 ` [PATCH 1/3] io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags Jens Axboe
2025-07-08 14:26 ` [PATCH 2/3] io_uring/net: use passed in 'len' in io_recv_buf_select() Jens Axboe
2025-07-08 14:26 ` [PATCH 3/3] io_uring/net: allow multishot receive per-invocation cap Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2025-07-09 20:32 [PATCHSET v2] Add retry cap for multishot recv receive size Jens Axboe
2025-07-09 20:32 ` [PATCH 1/3] io_uring/net: move io_sr_msg->retry_flags to io_sr_msg->flags Jens Axboe
