* [PATCH 0/2] Introduce non circular SQ
@ 2025-10-14 10:58 Pavel Begunkov
2025-10-14 10:58 ` [PATCH 1/2] io_uring: check for user passing 0 nr_submit Pavel Begunkov
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 10:58 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Add a feature that makes the kernel to ignore SQ head/tail and
always start fetching SQ entries from index 0, which helps to
keep caches hot. See Patch 2 for more details.
liburing support:
https://github.com/isilence/liburing.git sq-rewind
Tested by forcing liburing to enable the flag for compatible setups.
Pavel Begunkov (2):
io_uring: check for user passing 0 nr_submit
io_uring: introduce non-circular SQ
include/uapi/linux/io_uring.h | 6 ++++++
io_uring/io_uring.c | 34 +++++++++++++++++++++++++---------
io_uring/io_uring.h | 3 ++-
3 files changed, 33 insertions(+), 10 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH 1/2] io_uring: check for user passing 0 nr_submit
2025-10-14 10:58 [PATCH 0/2] Introduce non circular SQ Pavel Begunkov
@ 2025-10-14 10:58 ` Pavel Begunkov
2025-10-14 10:58 ` [PATCH 2/2] io_uring: introduce non-circular SQ Pavel Begunkov
2025-10-14 15:05 ` [PATCH 0/2] Introduce non circular SQ Jens Axboe
2 siblings, 0 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 10:58 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
io_submit_sqes() shouldn't be stepping into its main loop when there is
nothing to submit, i.e. nr=0. Fix 0 submission queue entries checks,
which should follow after all user input truncations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/io_uring.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 820ef0527666..ee04ab9bf968 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2422,10 +2422,11 @@ int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
unsigned int left;
int ret;
+ entries = min(nr, entries);
if (unlikely(!entries))
return 0;
- /* make sure SQ entry isn't read before tail */
- ret = left = min(nr, entries);
+
+ ret = left = entries;
io_get_task_refs(left);
io_submit_state_start(&ctx->submit_state, left);
--
2.49.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 10:58 [PATCH 0/2] Introduce non circular SQ Pavel Begunkov
2025-10-14 10:58 ` [PATCH 1/2] io_uring: check for user passing 0 nr_submit Pavel Begunkov
@ 2025-10-14 10:58 ` Pavel Begunkov
2025-10-14 17:21 ` Jens Axboe
2025-10-14 18:37 ` Caleb Sander Mateos
2025-10-14 15:05 ` [PATCH 0/2] Introduce non circular SQ Jens Axboe
2 siblings, 2 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 10:58 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence
Outside of SQPOLL, normally SQ entries are consumed by the time the
submission syscall returns. For those cases we don't need a circular
buffer and the head/tail tracking, instead the kernel can assume that
entries always start from the beginning of the SQ at index 0. This patch
introduces a setup flag doing exactly that.
This method is simpler in general, needs fewer operations, doesn't
require looking up heads and tails, however, the main goal here is to
keep caches hot. The userspace might overprovision SQ, and in the normal
way we'd be touching all the cache lines, but with this feature it
reuses first entries and keeps them hot. This simplicity will also be
quite handy for bpf-io_uring.
To use the feature the user should set the IORING_SETUP_SQ_REWIND flag,
and have a compatible liburing/userspace. The flag is restricted to
IORING_SETUP_NO_SQARRAY rings and is not compatible with
IORING_SETUP_SQPOLL.
Note, it uses relaxed ring synchronisation as the userspace doing the
syscall is naturally in sync, and setups that share a SQ should be
rolling their own intra process/thread synchronisation.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
include/uapi/linux/io_uring.h | 6 ++++++
io_uring/io_uring.c | 29 ++++++++++++++++++++++-------
io_uring/io_uring.h | 3 ++-
3 files changed, 30 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a0cc1cc0dd01..d1c654a7fa9a 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -231,6 +231,12 @@ enum io_uring_sqe_flags_bit {
*/
#define IORING_SETUP_CQE_MIXED (1U << 18)
+/*
+ * SQEs always start at index 0 in the submission ring instead of using a
+ * wrap around indexing.
+ */
+#define IORING_SETUP_SQ_REWIND (1U << 19)
+
enum io_uring_op {
IORING_OP_NOP,
IORING_OP_READV,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ee04ab9bf968..e8af963d3233 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2367,12 +2367,16 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
{
struct io_rings *rings = ctx->rings;
- /*
- * Ensure any loads from the SQEs are done at this point,
- * since once we write the new head, the application could
- * write new data to them.
- */
- smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+ if (ctx->flags & IORING_SETUP_SQ_REWIND) {
+ ctx->cached_sq_head = 0;
+ } else {
+ /*
+ * Ensure any loads from the SQEs are done at this point,
+ * since once we write the new head, the application could
+ * write new data to them.
+ */
+ smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+ }
}
/*
@@ -2418,10 +2422,15 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
__must_hold(&ctx->uring_lock)
{
- unsigned int entries = io_sqring_entries(ctx);
+ unsigned int entries;
unsigned int left;
int ret;
+ if (ctx->flags & IORING_SETUP_SQ_REWIND)
+ entries = ctx->sq_entries;
+ else
+ entries = io_sqring_entries(ctx);
+
entries = min(nr, entries);
if (unlikely(!entries))
return 0;
@@ -3678,6 +3687,12 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
{
unsigned flags = p->flags;
+ if (flags & IORING_SETUP_SQ_REWIND) {
+ if ((flags & IORING_SETUP_SQPOLL) ||
+ !(flags & IORING_SETUP_NO_SQARRAY))
+ return -EINVAL;
+ }
+
/* There is no way to mmap rings without a real fd */
if ((flags & IORING_SETUP_REGISTERED_FD_ONLY) &&
!(flags & IORING_SETUP_NO_MMAP))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 46d9141d772a..b998ed57dd93 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -54,7 +54,8 @@
IORING_SETUP_REGISTERED_FD_ONLY |\
IORING_SETUP_NO_SQARRAY |\
IORING_SETUP_HYBRID_IOPOLL |\
- IORING_SETUP_CQE_MIXED)
+ IORING_SETUP_CQE_MIXED |\
+ IORING_SETUP_SQ_REWIND)
#define IORING_ENTER_FLAGS (IORING_ENTER_GETEVENTS |\
IORING_ENTER_SQ_WAKEUP |\
--
2.49.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Introduce non circular SQ
2025-10-14 10:58 [PATCH 0/2] Introduce non circular SQ Pavel Begunkov
2025-10-14 10:58 ` [PATCH 1/2] io_uring: check for user passing 0 nr_submit Pavel Begunkov
2025-10-14 10:58 ` [PATCH 2/2] io_uring: introduce non-circular SQ Pavel Begunkov
@ 2025-10-14 15:05 ` Jens Axboe
2025-10-14 16:02 ` Pavel Begunkov
2 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-14 15:05 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/14/25 4:58 AM, Pavel Begunkov wrote:
> Add a feature that makes the kernel to ignore SQ head/tail and
> always start fetching SQ entries from index 0, which helps to
> keep caches hot. See Patch 2 for more details.
>
> liburing support:
> https://github.com/isilence/liburing.git sq-rewind
>
> Tested by forcing liburing to enable the flag for compatible setups.
>
> Pavel Begunkov (2):
> io_uring: check for user passing 0 nr_submit
> io_uring: introduce non-circular SQ
>
> include/uapi/linux/io_uring.h | 6 ++++++
> io_uring/io_uring.c | 34 +++++++++++++++++++++++++---------
> io_uring/io_uring.h | 3 ++-
> 3 files changed, 33 insertions(+), 10 deletions(-)
I like the concept of this, makes a lot of sense. No need to keep
churning through the entire SQ ring, when apps mostly submit a few
requests at the time. Will help cut down on cacheline usage.
Curious, do you have any numbers on this for any kind of workload?
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Introduce non circular SQ
2025-10-14 15:05 ` [PATCH 0/2] Introduce non circular SQ Jens Axboe
@ 2025-10-14 16:02 ` Pavel Begunkov
2025-10-14 16:08 ` Pavel Begunkov
0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 16:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/14/25 16:05, Jens Axboe wrote:
> On 10/14/25 4:58 AM, Pavel Begunkov wrote:
>> Add a feature that makes the kernel to ignore SQ head/tail and
>> always start fetching SQ entries from index 0, which helps to
>> keep caches hot. See Patch 2 for more details.
>>
>> liburing support:
>> https://github.com/isilence/liburing.git sq-rewind
>>
>> Tested by forcing liburing to enable the flag for compatible setups.
>>
>> Pavel Begunkov (2):
>> io_uring: check for user passing 0 nr_submit
>> io_uring: introduce non-circular SQ
>>
>> include/uapi/linux/io_uring.h | 6 ++++++
>> io_uring/io_uring.c | 34 +++++++++++++++++++++++++---------
>> io_uring/io_uring.h | 3 ++-
>> 3 files changed, 33 insertions(+), 10 deletions(-)
>
> I like the concept of this, makes a lot of sense. No need to keep
> churning through the entire SQ ring, when apps mostly submit a few
> requests at the time. Will help cut down on cacheline usage.
>
> Curious, do you have any numbers on this for any kind of workload?
No, very likely it's a micro optimisation in the grand picture and
would be hard to measure for anything sensible / realistic. It
shouldn't be too difficult to come up with a test with a bunch of
pinned tasks putting a memory pressure, but that would be as useful
as it sounds.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Introduce non circular SQ
2025-10-14 16:02 ` Pavel Begunkov
@ 2025-10-14 16:08 ` Pavel Begunkov
2025-10-14 17:19 ` Jens Axboe
0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 16:08 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/14/25 17:02, Pavel Begunkov wrote:
> On 10/14/25 16:05, Jens Axboe wrote:
>> On 10/14/25 4:58 AM, Pavel Begunkov wrote:
>>> Add a feature that makes the kernel to ignore SQ head/tail and
>>> always start fetching SQ entries from index 0, which helps to
>>> keep caches hot. See Patch 2 for more details.
>>>
>>> liburing support:
>>> https://github.com/isilence/liburing.git sq-rewind
>>>
>>> Tested by forcing liburing to enable the flag for compatible setups.
>>>
>>> Pavel Begunkov (2):
>>> io_uring: check for user passing 0 nr_submit
>>> io_uring: introduce non-circular SQ
>>>
>>> include/uapi/linux/io_uring.h | 6 ++++++
>>> io_uring/io_uring.c | 34 +++++++++++++++++++++++++---------
>>> io_uring/io_uring.h | 3 ++-
>>> 3 files changed, 33 insertions(+), 10 deletions(-)
>>
>> I like the concept of this, makes a lot of sense. No need to keep
>> churning through the entire SQ ring, when apps mostly submit a few
>> requests at the time. Will help cut down on cacheline usage.
>>
>> Curious, do you have any numbers on this for any kind of workload?
>
> No, very likely it's a micro optimisation in the grand picture and
> would be hard to measure for anything sensible / realistic. It
> shouldn't be too difficult to come up with a test with a bunch of
> pinned tasks putting a memory pressure, but that would be as useful
> as it sounds.
Simplicity was not the last priority either. E.g. userspace can
replace all io_uring_get_sqe() with wrapping the SQ pointer
into some array_view at the beginning and work with that.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 0/2] Introduce non circular SQ
2025-10-14 16:08 ` Pavel Begunkov
@ 2025-10-14 17:19 ` Jens Axboe
0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2025-10-14 17:19 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/14/25 10:08 AM, Pavel Begunkov wrote:
> On 10/14/25 17:02, Pavel Begunkov wrote:
>> On 10/14/25 16:05, Jens Axboe wrote:
>>> On 10/14/25 4:58 AM, Pavel Begunkov wrote:
>>>> Add a feature that makes the kernel to ignore SQ head/tail and
>>>> always start fetching SQ entries from index 0, which helps to
>>>> keep caches hot. See Patch 2 for more details.
>>>>
>>>> liburing support:
>>>> https://github.com/isilence/liburing.git sq-rewind
>>>>
>>>> Tested by forcing liburing to enable the flag for compatible setups.
>>>>
>>>> Pavel Begunkov (2):
>>>> io_uring: check for user passing 0 nr_submit
>>>> io_uring: introduce non-circular SQ
>>>>
>>>> include/uapi/linux/io_uring.h | 6 ++++++
>>>> io_uring/io_uring.c | 34 +++++++++++++++++++++++++---------
>>>> io_uring/io_uring.h | 3 ++-
>>>> 3 files changed, 33 insertions(+), 10 deletions(-)
>>>
>>> I like the concept of this, makes a lot of sense. No need to keep
>>> churning through the entire SQ ring, when apps mostly submit a few
>>> requests at the time. Will help cut down on cacheline usage.
>>>
>>> Curious, do you have any numbers on this for any kind of workload?
>>
>> No, very likely it's a micro optimisation in the grand picture and
>> would be hard to measure for anything sensible / realistic. It
>> shouldn't be too difficult to come up with a test with a bunch of
>> pinned tasks putting a memory pressure, but that would be as useful
>> as it sounds.
>
> Simplicity was not the last priority either. E.g. userspace can
> replace all io_uring_get_sqe() with wrapping the SQ pointer
> into some array_view at the beginning and work with that.
Yeah agree, it's nice both ways.
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 10:58 ` [PATCH 2/2] io_uring: introduce non-circular SQ Pavel Begunkov
@ 2025-10-14 17:21 ` Jens Axboe
2025-10-14 18:58 ` Pavel Begunkov
2025-10-14 18:37 ` Caleb Sander Mateos
1 sibling, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2025-10-14 17:21 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/14/25 4:58 AM, Pavel Begunkov wrote:
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index a0cc1cc0dd01..d1c654a7fa9a 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -231,6 +231,12 @@ enum io_uring_sqe_flags_bit {
> */
> #define IORING_SETUP_CQE_MIXED (1U << 18)
>
> +/*
> + * SQEs always start at index 0 in the submission ring instead of using a
> + * wrap around indexing.
> + */
> +#define IORING_SETUP_SQ_REWIND (1U << 19)
No real comments on this patch, very straightforward. I do think this
comment should be expanded on considerably, mostly just by adding a
comparison to how it's different from the normal non-rewind way of
tracking which SQEs to submit. I can do it too, or send a v2.
--
Jens Axboe
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 10:58 ` [PATCH 2/2] io_uring: introduce non-circular SQ Pavel Begunkov
2025-10-14 17:21 ` Jens Axboe
@ 2025-10-14 18:37 ` Caleb Sander Mateos
2025-10-14 19:26 ` Pavel Begunkov
1 sibling, 1 reply; 13+ messages in thread
From: Caleb Sander Mateos @ 2025-10-14 18:37 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring
On Tue, Oct 14, 2025 at 3:57 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Outside of SQPOLL, normally SQ entries are consumed by the time the
> submission syscall returns. For those cases we don't need a circular
> buffer and the head/tail tracking, instead the kernel can assume that
> entries always start from the beginning of the SQ at index 0. This patch
> introduces a setup flag doing exactly that.
>
> This method is simpler in general, needs fewer operations, doesn't
> require looking up heads and tails, however, the main goal here is to
> keep caches hot. The userspace might overprovision SQ, and in the normal
> way we'd be touching all the cache lines, but with this feature it
> reuses first entries and keeps them hot. This simplicity will also be
> quite handy for bpf-io_uring.
>
> To use the feature the user should set the IORING_SETUP_SQ_REWIND flag,
> and have a compatible liburing/userspace. The flag is restricted to
> IORING_SETUP_NO_SQARRAY rings and is not compatible with
> IORING_SETUP_SQPOLL.
>
> Note, it uses relaxed ring synchronisation as the userspace doing the
> syscall is naturally in sync, and setups that share a SQ should be
> rolling their own intra process/thread synchronisation.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
> include/uapi/linux/io_uring.h | 6 ++++++
> io_uring/io_uring.c | 29 ++++++++++++++++++++++-------
> io_uring/io_uring.h | 3 ++-
> 3 files changed, 30 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index a0cc1cc0dd01..d1c654a7fa9a 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -231,6 +231,12 @@ enum io_uring_sqe_flags_bit {
> */
> #define IORING_SETUP_CQE_MIXED (1U << 18)
>
> +/*
> + * SQEs always start at index 0 in the submission ring instead of using a
> + * wrap around indexing.
> + */
> +#define IORING_SETUP_SQ_REWIND (1U << 19)
Keith's mixed-SQE-size patch series is already planning to use this
flag: https://lore.kernel.org/io-uring/20251013180011.134131-3-kbusch@meta.com/
> +
> enum io_uring_op {
> IORING_OP_NOP,
> IORING_OP_READV,
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index ee04ab9bf968..e8af963d3233 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2367,12 +2367,16 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
> {
> struct io_rings *rings = ctx->rings;
>
> - /*
> - * Ensure any loads from the SQEs are done at this point,
> - * since once we write the new head, the application could
> - * write new data to them.
> - */
> - smp_store_release(&rings->sq.head, ctx->cached_sq_head);
> + if (ctx->flags & IORING_SETUP_SQ_REWIND) {
> + ctx->cached_sq_head = 0;
The only awkward thing about this interface seems to be if
io_submit_sqes() aborts early without submitting all the requested
SQEs. Does userspace then need to memmove() the remaining SQEs to the
start of the ring? It's certainly an unlikely case but something
userspace has to handle because io_alloc_req() can fail for reasons
outside its control. Seems like it might simplify the userspace side
if cached_sq_head wasn't rewound if not all SQEs were consumed.
> + } else {
> + /*
> + * Ensure any loads from the SQEs are done at this point,
> + * since once we write the new head, the application could
> + * write new data to them.
> + */
> + smp_store_release(&rings->sq.head, ctx->cached_sq_head);
> + }
> }
>
> /*
> @@ -2418,10 +2422,15 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
> int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
> __must_hold(&ctx->uring_lock)
> {
> - unsigned int entries = io_sqring_entries(ctx);
> + unsigned int entries;
> unsigned int left;
> int ret;
>
> + if (ctx->flags & IORING_SETUP_SQ_REWIND)
> + entries = ctx->sq_entries;
> + else
> + entries = io_sqring_entries(ctx);
> +
> entries = min(nr, entries);
> if (unlikely(!entries))
> return 0;
> @@ -3678,6 +3687,12 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
> {
> unsigned flags = p->flags;
>
> + if (flags & IORING_SETUP_SQ_REWIND) {
> + if ((flags & IORING_SETUP_SQPOLL) ||
> + !(flags & IORING_SETUP_NO_SQARRAY))
Is there a reason IORING_SETUP_NO_SQARRAY is required? It seems like
the implementation would work just fine with the SQ indirection ring;
the rewind would just apply to the indirection ring instead of the SQE
array. The cache hit rate benefit would probably be smaller since many
more SQ indirection entries fit in a single cache line, but I don't
see a reason to explicitly forbid it.
Best,
Caleb
> + return -EINVAL;
> + }
> +
> /* There is no way to mmap rings without a real fd */
> if ((flags & IORING_SETUP_REGISTERED_FD_ONLY) &&
> !(flags & IORING_SETUP_NO_MMAP))
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index 46d9141d772a..b998ed57dd93 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -54,7 +54,8 @@
> IORING_SETUP_REGISTERED_FD_ONLY |\
> IORING_SETUP_NO_SQARRAY |\
> IORING_SETUP_HYBRID_IOPOLL |\
> - IORING_SETUP_CQE_MIXED)
> + IORING_SETUP_CQE_MIXED |\
> + IORING_SETUP_SQ_REWIND)
>
> #define IORING_ENTER_FLAGS (IORING_ENTER_GETEVENTS |\
> IORING_ENTER_SQ_WAKEUP |\
> --
> 2.49.0
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 17:21 ` Jens Axboe
@ 2025-10-14 18:58 ` Pavel Begunkov
0 siblings, 0 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 18:58 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/14/25 18:21, Jens Axboe wrote:
> On 10/14/25 4:58 AM, Pavel Begunkov wrote:
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index a0cc1cc0dd01..d1c654a7fa9a 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -231,6 +231,12 @@ enum io_uring_sqe_flags_bit {
>> */
>> #define IORING_SETUP_CQE_MIXED (1U << 18)
>>
>> +/*
>> + * SQEs always start at index 0 in the submission ring instead of using a
>> + * wrap around indexing.
>> + */
>> +#define IORING_SETUP_SQ_REWIND (1U << 19)
>
> No real comments on this patch, very straightforward. I do think this
> comment should be expanded on considerably, mostly just by adding a
> comparison to how it's different from the normal non-rewind way of
> tracking which SQEs to submit. I can do it too, or send a v2.
Please do if that's fine with you. Not entirely sure how you
want to put it into words.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 18:37 ` Caleb Sander Mateos
@ 2025-10-14 19:26 ` Pavel Begunkov
2025-10-14 19:46 ` Caleb Sander Mateos
0 siblings, 1 reply; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-14 19:26 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: io-uring
On 10/14/25 19:37, Caleb Sander Mateos wrote:
> On Tue, Oct 14, 2025 at 3:57 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
...>> + * SQEs always start at index 0 in the submission ring instead of using a
>> + * wrap around indexing.
>> + */
>> +#define IORING_SETUP_SQ_REWIND (1U << 19)
>
> Keith's mixed-SQE-size patch series is already planning to use this
> flag: https://lore.kernel.org/io-uring/20251013180011.134131-3-kbusch@meta.com/
I'll rebase it as ususual if that gets merged before.
>> - /*
>> - * Ensure any loads from the SQEs are done at this point,
>> - * since once we write the new head, the application could
>> - * write new data to them.
>> - */
>> - smp_store_release(&rings->sq.head, ctx->cached_sq_head);
>> + if (ctx->flags & IORING_SETUP_SQ_REWIND) {
>> + ctx->cached_sq_head = 0;
>
> The only awkward thing about this interface seems to be if
> io_submit_sqes() aborts early without submitting all the requested
> SQEs. Does userspace then need to memmove() the remaining SQEs to the
> start of the ring? It's certainly an unlikely case but something
> userspace has to handle because io_alloc_req() can fail for reasons
> outside its control. Seems like it might simplify the userspace side
> if cached_sq_head wasn't rewound if not all SQEs were consumed.
This kind of special rules is what usually makes interfaces a pain to
work with. What if you want to abort all un-submitted requests
instead? You can empty the queue, but then the next syscall will
still start from the middle. Or what if the application wants to
queue more requests before resubmitting previous ones? There are
reasons b/c the kernel will need to handle it in a less elegant way
than it potentially can otherwise. memmove sounds appropriate.
>> @@ -3678,6 +3687,12 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
>> {
>> unsigned flags = p->flags;
>>
>> + if (flags & IORING_SETUP_SQ_REWIND) {
>> + if ((flags & IORING_SETUP_SQPOLL) ||
>> + !(flags & IORING_SETUP_NO_SQARRAY))
>
> Is there a reason IORING_SETUP_NO_SQARRAY is required? It seems like
> the implementation would work just fine with the SQ indirection ring;
> the rewind would just apply to the indirection ring instead of the SQE
> array. The cache hit rate benefit would probably be smaller since many
> more SQ indirection entries fit in a single cache line, but I don't
> see a reason to explicitly forbid it.
B/c I don't care about sqarray setups, they are on the way out for soft
deprecation with liburing defaulting to NO_SQARRAY, and once you try
to optimise the kernel IORING_SETUP_SQ_REWIND handling it might turn
out that !NO_SQARRAY is in the way... or not, but you can always allow
it later while limiting it would break uapi. In short, it's weighting
chances of (micro) optimisations in the future vs supporting a case
which is unlikely going to be used.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 19:26 ` Pavel Begunkov
@ 2025-10-14 19:46 ` Caleb Sander Mateos
2025-10-16 11:38 ` Pavel Begunkov
0 siblings, 1 reply; 13+ messages in thread
From: Caleb Sander Mateos @ 2025-10-14 19:46 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring
On Tue, Oct 14, 2025 at 12:25 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 10/14/25 19:37, Caleb Sander Mateos wrote:
> > On Tue, Oct 14, 2025 at 3:57 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> ...>> + * SQEs always start at index 0 in the submission ring instead of using a
> >> + * wrap around indexing.
> >> + */
> >> +#define IORING_SETUP_SQ_REWIND (1U << 19)
> >
> > Keith's mixed-SQE-size patch series is already planning to use this
> > flag: https://lore.kernel.org/io-uring/20251013180011.134131-3-kbusch@meta.com/
>
> I'll rebase it as ususual if that gets merged before.
> >> - /*
> >> - * Ensure any loads from the SQEs are done at this point,
> >> - * since once we write the new head, the application could
> >> - * write new data to them.
> >> - */
> >> - smp_store_release(&rings->sq.head, ctx->cached_sq_head);
> >> + if (ctx->flags & IORING_SETUP_SQ_REWIND) {
> >> + ctx->cached_sq_head = 0;
> >
> > The only awkward thing about this interface seems to be if
> > io_submit_sqes() aborts early without submitting all the requested
> > SQEs. Does userspace then need to memmove() the remaining SQEs to the
> > start of the ring? It's certainly an unlikely case but something
> > userspace has to handle because io_alloc_req() can fail for reasons
> > outside its control. Seems like it might simplify the userspace side
> > if cached_sq_head wasn't rewound if not all SQEs were consumed.
> This kind of special rules is what usually makes interfaces a pain to
> work with. What if you want to abort all un-submitted requests
> instead? You can empty the queue, but then the next syscall will
> still start from the middle. Or what if the application wants to
> queue more requests before resubmitting previous ones? There are
> reasons b/c the kernel will need to handle it in a less elegant way
> than it potentially can otherwise. memmove sounds appropriate.
Maybe most convenient would be a way for userspace to pass both a head
and a nr/tail value to the syscall instead of assuming the head is
always 0. But it's probably difficult to modify the existing syscall
interface without an indirection to the head value, which seems to be
a main point of this series. So always resetting to 0 and requiring
userspace to memmove() the remaining SQEs in the rare case that
io_uring_enter() doesn't consume all of them seems like a reasonable
approach.
>
> >> @@ -3678,6 +3687,12 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
> >> {
> >> unsigned flags = p->flags;
> >>
> >> + if (flags & IORING_SETUP_SQ_REWIND) {
> >> + if ((flags & IORING_SETUP_SQPOLL) ||
> >> + !(flags & IORING_SETUP_NO_SQARRAY))
> >
> > Is there a reason IORING_SETUP_NO_SQARRAY is required? It seems like
> > the implementation would work just fine with the SQ indirection ring;
> > the rewind would just apply to the indirection ring instead of the SQE
> > array. The cache hit rate benefit would probably be smaller since many
> > more SQ indirection entries fit in a single cache line, but I don't
> > see a reason to explicitly forbid it.
>
> B/c I don't care about sqarray setups, they are on the way out for soft
> deprecation with liburing defaulting to NO_SQARRAY, and once you try
> to optimise the kernel IORING_SETUP_SQ_REWIND handling it might turn
> out that !NO_SQARRAY is in the way... or not, but you can always allow
> it later while limiting it would break uapi. In short, it's weighting
> chances of (micro) optimisations in the future vs supporting a case
> which is unlikely going to be used.
Fair point, tradeoffs either way.
Best,
Caleb
>
> --
> Pavel Begunkov
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 2/2] io_uring: introduce non-circular SQ
2025-10-14 19:46 ` Caleb Sander Mateos
@ 2025-10-16 11:38 ` Pavel Begunkov
0 siblings, 0 replies; 13+ messages in thread
From: Pavel Begunkov @ 2025-10-16 11:38 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: io-uring
On 10/14/25 20:46, Caleb Sander Mateos wrote:
> On Tue, Oct 14, 2025 at 12:25 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 10/14/25 19:37, Caleb Sander Mateos wrote:
>>> On Tue, Oct 14, 2025 at 3:57 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>> ...>> + * SQEs always start at index 0 in the submission ring instead of using a
>>>> + * wrap around indexing.
>>>> + */
>>>> +#define IORING_SETUP_SQ_REWIND (1U << 19)
>>>
>>> Keith's mixed-SQE-size patch series is already planning to use this
>>> flag: https://lore.kernel.org/io-uring/20251013180011.134131-3-kbusch@meta.com/
>>
>> I'll rebase it as ususual if that gets merged before.
>>>> - /*
>>>> - * Ensure any loads from the SQEs are done at this point,
>>>> - * since once we write the new head, the application could
>>>> - * write new data to them.
>>>> - */
>>>> - smp_store_release(&rings->sq.head, ctx->cached_sq_head);
>>>> + if (ctx->flags & IORING_SETUP_SQ_REWIND) {
>>>> + ctx->cached_sq_head = 0;
>>>
>>> The only awkward thing about this interface seems to be if
>>> io_submit_sqes() aborts early without submitting all the requested
>>> SQEs. Does userspace then need to memmove() the remaining SQEs to the
>>> start of the ring? It's certainly an unlikely case but something
>>> userspace has to handle because io_alloc_req() can fail for reasons
>>> outside its control. Seems like it might simplify the userspace side
>>> if cached_sq_head wasn't rewound if not all SQEs were consumed.
>> This kind of special rules is what usually makes interfaces a pain to
>> work with. What if you want to abort all un-submitted requests
>> instead? You can empty the queue, but then the next syscall will
>> still start from the middle. Or what if the application wants to
>> queue more requests before resubmitting previous ones? There are
>> reasons b/c the kernel will need to handle it in a less elegant way
>> than it potentially can otherwise. memmove sounds appropriate.
>
> Maybe most convenient would be a way for userspace to pass both a head
> and a nr/tail value to the syscall instead of assuming the head is
> always 0. But it's probably difficult to modify the existing syscall
It feels fine from the API perspective, but you still need head/tail
fetching and care, additional index sanitisation (Spectre), and either
handling wrap around or extra border checks. All minor points, but
the index handling will likely be more annoying than just doing a
memmove.
> interface without an indirection to the head value, which seems to be
> a main point of this series. So always resetting to 0 and requiring
> userspace to memmove() the remaining SQEs in the rare case that
> io_uring_enter() doesn't consume all of them seems like a reasonable
> approach.
If that's what the user wants to do as there are other ways
it could be handled.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2025-10-16 11:36 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-14 10:58 [PATCH 0/2] Introduce non circular SQ Pavel Begunkov
2025-10-14 10:58 ` [PATCH 1/2] io_uring: check for user passing 0 nr_submit Pavel Begunkov
2025-10-14 10:58 ` [PATCH 2/2] io_uring: introduce non-circular SQ Pavel Begunkov
2025-10-14 17:21 ` Jens Axboe
2025-10-14 18:58 ` Pavel Begunkov
2025-10-14 18:37 ` Caleb Sander Mateos
2025-10-14 19:26 ` Pavel Begunkov
2025-10-14 19:46 ` Caleb Sander Mateos
2025-10-16 11:38 ` Pavel Begunkov
2025-10-14 15:05 ` [PATCH 0/2] Introduce non circular SQ Jens Axboe
2025-10-14 16:02 ` Pavel Begunkov
2025-10-14 16:08 ` Pavel Begunkov
2025-10-14 17:19 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox