* [PATCH v3 1/1] io_uring: introduce non-circular SQ
@ 2026-01-20 20:47 Pavel Begunkov
2026-01-21 14:59 ` Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Pavel Begunkov @ 2026-01-20 20:47 UTC (permalink / raw)
To: io-uring; +Cc: asml.silence, axboe
Outside of SQPOLL, normally SQ entries are consumed by the time the
submission syscall returns. For those cases we don't need a circular
buffer and the head/tail tracking, instead the kernel can assume that
entries always start from the beginning of the SQ at index 0. This patch
introduces a setup flag doing exactly that. It's simpler and helps
to keep SQEs hot in cache.
The feature is optional and enabled by setting IORING_SETUP_SQ_REWIND.
The flag is rejected if passed together with SQPOLL as it'd require
waiting for SQ before each submission. It also requires
IORING_SETUP_NO_SQARRAY; the SQ array could be supported, but it's
unlikely there will be users, so leave more space for future optimisations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
v3: rebase
v2: expanded comments
include/uapi/linux/io_uring.h | 12 ++++++++++++
io_uring/io_uring.c | 29 ++++++++++++++++++++++-------
io_uring/io_uring.h | 3 ++-
3 files changed, 36 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index b5b23c0d5283..475094c7a668 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -237,6 +237,18 @@ enum io_uring_sqe_flags_bit {
*/
#define IORING_SETUP_SQE_MIXED (1U << 19)
+/*
+ * When set, io_uring ignores SQ head and tail and fetches SQEs to submit
+ * starting from index 0 instead of from the index stored in the head pointer.
+ * IOW, the user should place all SQEs at the beginning of the SQ memory
+ * before issuing a submission syscall.
+ *
+ * It requires IORING_SETUP_NO_SQARRAY and is incompatible with
+ * IORING_SETUP_SQPOLL. The user must also never change the SQ head and tail
+ * values and keep them set to 0. Any other value is undefined behaviour.
+ */
+#define IORING_SETUP_SQ_REWIND (1U << 20)
+
enum io_uring_op {
IORING_OP_NOP,
IORING_OP_READV,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ea4eb3eedb43..4e97cfe488c6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2333,12 +2333,16 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;

-	/*
-	 * Ensure any loads from the SQEs are done at this point,
-	 * since once we write the new head, the application could
-	 * write new data to them.
-	 */
-	smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+	if (ctx->flags & IORING_SETUP_SQ_REWIND) {
+		ctx->cached_sq_head = 0;
+	} else {
+		/*
+		 * Ensure any loads from the SQEs are done at this point,
+		 * since once we write the new head, the application could
+		 * write new data to them.
+		 */
+		smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+	}
 }

 /*
@@ -2384,10 +2388,15 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	__must_hold(&ctx->uring_lock)
 {
-	unsigned int entries = io_sqring_entries(ctx);
+	unsigned int entries;
 	unsigned int left;
 	int ret;

+	if (ctx->flags & IORING_SETUP_SQ_REWIND)
+		entries = ctx->sq_entries;
+	else
+		entries = io_sqring_entries(ctx);
+
 	entries = min(nr, entries);
 	if (unlikely(!entries))
 		return 0;
@@ -3422,6 +3431,12 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
 	if (flags & ~IORING_SETUP_FLAGS)
 		return -EINVAL;

+	if (flags & IORING_SETUP_SQ_REWIND) {
+		if ((flags & IORING_SETUP_SQPOLL) ||
+		    !(flags & IORING_SETUP_NO_SQARRAY))
+			return -EINVAL;
+	}
+
 	/* There is no way to mmap rings without a real fd */
 	if ((flags & IORING_SETUP_REGISTERED_FD_ONLY) &&
 	    !(flags & IORING_SETUP_NO_MMAP))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c5bbb43b5842..baa1a20d0d6a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -68,7 +68,8 @@ struct io_ctx_config {
IORING_SETUP_NO_SQARRAY |\
IORING_SETUP_HYBRID_IOPOLL |\
IORING_SETUP_CQE_MIXED |\
- IORING_SETUP_SQE_MIXED)
+ IORING_SETUP_SQE_MIXED |\
+ IORING_SETUP_SQ_REWIND)
#define IORING_ENTER_FLAGS (IORING_ENTER_GETEVENTS |\
IORING_ENTER_SQ_WAKEUP |\
--
2.52.0
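The effect of the patch on SQ consumption can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
sq_model, model_submit() and MODEL_SQ_ENTRIES are hypothetical names that
do not exist in the kernel, and the slot calculation (cursor masked by the
ring size) is an assumption about how the NO_SQARRAY path picks an SQE. It
mirrors the two branches added to io_submit_sqes() and io_commit_sqring().

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SQ_ENTRIES 8u	/* hypothetical ring size, power of two */

/* Hypothetical stand-in for the parts of io_ring_ctx the patch touches. */
struct sq_model {
	bool rewind;			/* models IORING_SETUP_SQ_REWIND */
	uint32_t head;			/* models the shared rings->sq.head */
	uint32_t tail;			/* models the shared rings->sq.tail */
	uint32_t cached_sq_head;	/* kernel-private consumption cursor */
};

/*
 * Consume up to nr entries and store the SQ slot index of each one in
 * out[]. Returns the number of entries consumed.
 */
unsigned int model_submit(struct sq_model *sq, unsigned int nr, uint32_t *out)
{
	unsigned int entries, i;

	/* Mirrors the new branch in io_submit_sqes(). */
	if (sq->rewind)
		entries = MODEL_SQ_ENTRIES;
	else
		entries = sq->tail - sq->cached_sq_head;
	if (nr < entries)
		entries = nr;

	/* Assumed slot selection: cursor masked by the ring size. */
	for (i = 0; i < entries; i++)
		out[i] = sq->cached_sq_head++ & (MODEL_SQ_ENTRIES - 1);

	/* Mirrors io_commit_sqring(): rewind, or publish the new head. */
	if (sq->rewind)
		sq->cached_sq_head = 0;
	else
		sq->head = sq->cached_sq_head;

	return entries;
}
```

In the model, a rewind queue hands out slots 0, 1, 2, ... on every call and
never writes head, while the circular queue continues from where the
previous call stopped and is bounded by tail.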
* Re: [PATCH v3 1/1] io_uring: introduce non-circular SQ
From: Jens Axboe @ 2026-01-21 14:59 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 1/20/26 1:47 PM, Pavel Begunkov wrote:
> Outside of SQPOLL, normally SQ entries are consumed by the time the
> submission syscall returns. For those cases we don't need a circular
> buffer and the head/tail tracking, instead the kernel can assume that
> entries always start from the beginning of the SQ at index 0. This patch
> introduces a setup flag doing exactly that. It's simpler and helps
> to keep SQEs hot in cache.
>
> The feature is optional and enabled by setting IORING_SETUP_SQ_REWIND.
> The flag is rejected if passed together with SQPOLL as it'd require
> waiting for SQ before each submission. It also requires
> IORING_SETUP_NO_SQARRAY, which can be supported but it's unlikely there
> will be users, so leave more space for future optimisations.
Do you have liburing tests and man page updates for this too?
The feature itself looks fine, makes sense to keep reusing SQEs
rather than always going around the wheel.
--
Jens Axboe
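As a rough idea of what such a test would need to cover, the flag
combinations rejected by the patch can be modelled in plain C. This is not
a liburing test and calls no liburing API: model_check_sq_rewind() is a
hypothetical helper that copies the check added to
io_uring_sanitise_params(). The IORING_SETUP_SQ_REWIND value is the one
from this patch; the other two flag values are assumed to match the
current uapi header.

```c
#include <errno.h>

/* Assumed to match include/uapi/linux/io_uring.h. */
#define IORING_SETUP_SQPOLL		(1U << 1)
#define IORING_SETUP_NO_SQARRAY		(1U << 16)
/* Introduced by this patch. */
#define IORING_SETUP_SQ_REWIND		(1U << 20)

/*
 * Hypothetical userspace copy of the new setup-time check: SQ_REWIND
 * needs NO_SQARRAY and cannot be combined with SQPOLL.
 */
int model_check_sq_rewind(unsigned int flags)
{
	if (flags & IORING_SETUP_SQ_REWIND) {
		if ((flags & IORING_SETUP_SQPOLL) ||
		    !(flags & IORING_SETUP_NO_SQARRAY))
			return -EINVAL;
	}
	return 0;
}
```

A real test would pass the same combinations to io_uring_setup() and
compare the result with what this model returns.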
* Re: [PATCH v3 1/1] io_uring: introduce non-circular SQ
From: Gabriel Krisman Bertazi @ 2026-01-21 18:20 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, axboe
Pavel Begunkov <asml.silence@gmail.com> writes:
> Outside of SQPOLL, normally SQ entries are consumed by the time the
> submission syscall returns. For those cases we don't need a circular
> buffer and the head/tail tracking, instead the kernel can assume that
> entries always start from the beginning of the SQ at index 0. This patch
> introduces a setup flag doing exactly that. It's simpler and helps
> to keep SQEs hot in cache.
>
> The feature is optional and enabled by setting IORING_SETUP_SQ_REWIND.
> The flag is rejected if passed together with SQPOLL as it'd require
> waiting for SQ before each submission. It also requires
> IORING_SETUP_NO_SQARRAY, which can be supported but it's unlikely there
> will be users, so leave more space for future optimisations.
This patch got me wondering if it would make sense to have a way to
point to different buffers as the SQE map and execute them. This way
the user could initialize a set of operations in a specific region of
the sq ring (or a separate buffer) once and have them repeatedly
executed with a single command, similar to a procedure call.
Say we have a preloaded ring with some sqes to accept a new connection,
and immediately some fixed data, etc. When I want to run it, I push a
SQE OP_EXECUTE pointing to this buffer to the "main" ring and io_uring
will queue everything in this pre-registered buffer.
I imagine it would save nothing beyond SQ initialization. Just curious
if you see a use case for something like this?
--
Gabriel Krisman Bertazi
* Re: [PATCH v3 1/1] io_uring: introduce non-circular SQ
From: Pavel Begunkov @ 2026-01-21 21:55 UTC (permalink / raw)
To: Gabriel Krisman Bertazi; +Cc: io-uring, axboe
On 1/21/26 18:20, Gabriel Krisman Bertazi wrote:
> Pavel Begunkov <asml.silence@gmail.com> writes:
>
>> Outside of SQPOLL, normally SQ entries are consumed by the time the
>> submission syscall returns. For those cases we don't need a circular
>> buffer and the head/tail tracking, instead the kernel can assume that
>> entries always start from the beginning of the SQ at index 0. This patch
>> introduces a setup flag doing exactly that. It's simpler and helps
>> to keep SQEs hot in cache.
>>
>> The feature is optional and enabled by setting IORING_SETUP_SQ_REWIND.
>> The flag is rejected if passed together with SQPOLL as it'd require
>> waiting for SQ before each submission. It also requires
>> IORING_SETUP_NO_SQARRAY, which can be supported but it's unlikely there
>> will be users, so leave more space for future optimisations.
>
> This patch got me wondering if it would make sense to have a way to
> point to different buffers as the SQE map and execute them. This way
> the user could initialize a set of operations in a specific region of
> the sq ring (or a separate buffer) once and have them repeatedly
> executed with a single command, similar to a procedure call.
>
> Say we have a preloaded ring with some sqes to accept a new connection,
> and immediately some fixed data, etc. When I want to run it, I push a
> SQE OP_EXECUTE pointing to this buffer to the "main" ring and io_uring
> will queue everything in this pre-registered buffer.
>
> I imagine it would save nothing beyond SQ initialization. Just curious
> if you see a use case for something like this?
You already can do it with the sq array. Never heard of anyone
using it, but liburing never exposed it to users either.
--
Pavel Begunkov
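The SQ array indirection mentioned in the reply above can be sketched as a
userspace model. Everything here is hypothetical and for illustration
only: struct sqarray_model, model_push() and model_pop() are not kernel or
liburing names. The point is that each ring slot holds an index into the
SQE memory, so pre-built SQEs can be queued again by writing only their
indices.

```c
#include <stdint.h>

#define MODEL_SQ_ENTRIES 8u	/* hypothetical ring size, power of two */

/* Hypothetical model of an SQ with the index array enabled. */
struct sqarray_model {
	uint32_t sq_array[MODEL_SQ_ENTRIES];	/* ring slot -> SQE index */
	uint32_t head;				/* consumer position */
	uint32_t tail;				/* producer position */
};

/* Queue the SQE stored at sqe_index without rewriting the SQE itself. */
void model_push(struct sqarray_model *sq, uint32_t sqe_index)
{
	sq->sq_array[sq->tail++ & (MODEL_SQ_ENTRIES - 1)] = sqe_index;
}

/*
 * Return the SQE index that would be fetched next, or UINT32_MAX if the
 * queue is empty.
 */
uint32_t model_pop(struct sqarray_model *sq)
{
	if (sq->head == sq->tail)
		return UINT32_MAX;
	return sq->sq_array[sq->head++ & (MODEL_SQ_ENTRIES - 1)];
}
```

Pushing indices 5 and 6 twice makes the consumer see SQEs 5, 6, 5, 6,
which is the "procedure call" style of reuse discussed above. The model
does not check for overflow of the ring.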
* Re: [PATCH v3 1/1] io_uring: introduce non-circular SQ
From: Gabriel Krisman Bertazi @ 2026-01-22 16:18 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: io-uring, axboe
Pavel Begunkov <asml.silence@gmail.com> writes:
> On 1/21/26 18:20, Gabriel Krisman Bertazi wrote:
>> and immediately some fixed data, etc. When I want to run it, I push a
>> SQE OP_EXECUTE pointing to this buffer to the "main" ring and io_uring
>> will queue everything in this pre-registered buffer.
>> I imagine it would save nothing beyond SQ initialization. Just curious
>> if you see a use case for something like this?
>
> You already can do it with the sq array. Never heard of anyone
> using it, but liburing never exposed it to users either.
Thanks for the explanation. Makes sense!
--
Gabriel Krisman Bertazi
* Re: [PATCH v3 1/1] io_uring: introduce non-circular SQ
From: Jens Axboe @ 2026-01-22 22:59 UTC (permalink / raw)
To: io-uring, Pavel Begunkov
On Tue, 20 Jan 2026 20:47:40 +0000, Pavel Begunkov wrote:
> Outside of SQPOLL, normally SQ entries are consumed by the time the
> submission syscall returns. For those cases we don't need a circular
> buffer and the head/tail tracking, instead the kernel can assume that
> entries always start from the beginning of the SQ at index 0. This patch
> introduces a setup flag doing exactly that. It's simpler and helps
> to keep SQEs hot in cache.
>
> [...]
Applied, thanks!
[1/1] io_uring: introduce non-circular SQ
commit: 5247c034a67f5a93cc1faa15e9867eec5b22f38a
Best regards,
--
Jens Axboe