From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, axboe@kernel.dk
Subject: [PATCH v3 1/1] io_uring: introduce non-circular SQ
Date: Tue, 20 Jan 2026 20:47:40 +0000
Message-ID: <b7a5502ee3da7ef096455498cd1ad3efbdbee288.1768940337.git.asml.silence@gmail.com>

Outside of SQPOLL, SQ entries are normally consumed by the time the
submission syscall returns. For those cases we don't need a circular
buffer or head/tail tracking; instead, the kernel can assume that
entries always start from the beginning of the SQ at index 0. This patch
introduces a setup flag doing exactly that. It's simpler and helps
to keep SQEs hot in cache.

The feature is optional and enabled by setting IORING_SETUP_SQ_REWIND.
The flag is rejected if passed together with SQPOLL, as that would require
waiting for the SQ to be consumed before each submission. It also requires
IORING_SETUP_NO_SQARRAY. The SQ array could be supported, but it's unlikely
there will be users, so leave more room for future optimisations.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
v3: rebase
v2: expanded comments
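
For illustration only, not part of the patch: a toy userspace model of
which SQ slots get consumed with and without the flag. All names here
(toy_sq, toy_queue, toy_submit, TOY_SQ_ENTRIES) are hypothetical and
only mimic the cached head handling touched by the diff; this is a
sketch of the semantics, not kernel code.

```c
#include <assert.h>

#define TOY_SQ_ENTRIES 8u	/* hypothetical ring size, power of two */

/* Toy stand-in for the SQ state; not the real struct io_ring_ctx. */
struct toy_sq {
	unsigned int head;	/* next index the consumer reads */
	unsigned int tail;	/* producer index, used in circular mode only */
	int rewind;		/* models IORING_SETUP_SQ_REWIND */
};

/* Producer side: announce n new entries before a submission. */
static void toy_queue(struct toy_sq *sq, unsigned int n)
{
	if (sq->rewind) {
		/* Entries sit in slots 0..n-1; head and tail stay at 0. */
		assert(n <= TOY_SQ_ENTRIES);
		return;
	}
	assert(sq->tail - sq->head + n <= TOY_SQ_ENTRIES);
	sq->tail += n;
}

/*
 * Consumer side: consume up to nr entries and record the slot index of
 * each one in slots[]. Returns the number of entries consumed.
 */
static unsigned int toy_submit(struct toy_sq *sq, unsigned int nr,
			       unsigned int *slots)
{
	unsigned int avail = sq->rewind ? TOY_SQ_ENTRIES : sq->tail - sq->head;
	unsigned int n = nr < avail ? nr : avail;
	unsigned int i;

	for (i = 0; i < n; i++)
		slots[i] = sq->head++ & (TOY_SQ_ENTRIES - 1);

	if (sq->rewind)
		sq->head = 0;	/* the next submission starts at slot 0 again */
	return n;
}
```

In this model, two back-to-back batches of three entries land in slots
0-2 and then 3-5 in circular mode, while in rewind mode both batches
reuse slots 0-2, which is the cache-locality argument made above.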
include/uapi/linux/io_uring.h | 12 ++++++++++++
io_uring/io_uring.c | 29 ++++++++++++++++++++++-------
io_uring/io_uring.h | 3 ++-
3 files changed, 36 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index b5b23c0d5283..475094c7a668 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -237,6 +237,18 @@ enum io_uring_sqe_flags_bit {
  */
 #define IORING_SETUP_SQE_MIXED (1U << 19)
 
+/*
+ * When set, io_uring ignores SQ head and tail and fetches SQEs to submit
+ * starting from index 0 instead of from the index stored in the head pointer.
+ * IOW, the user should place all SQEs at the beginning of the SQ memory
+ * before issuing a submission syscall.
+ *
+ * It requires IORING_SETUP_NO_SQARRAY and is incompatible with
+ * IORING_SETUP_SQPOLL. The user must also never change the SQ head and tail
+ * values and must keep them set to 0. Any other value is undefined behaviour.
+ */
+#define IORING_SETUP_SQ_REWIND (1U << 20)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ea4eb3eedb43..4e97cfe488c6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2333,12 +2333,16 @@ static void io_commit_sqring(struct io_ring_ctx *ctx)
 {
 	struct io_rings *rings = ctx->rings;
 
-	/*
-	 * Ensure any loads from the SQEs are done at this point,
-	 * since once we write the new head, the application could
-	 * write new data to them.
-	 */
-	smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+	if (ctx->flags & IORING_SETUP_SQ_REWIND) {
+		ctx->cached_sq_head = 0;
+	} else {
+		/*
+		 * Ensure any loads from the SQEs are done at this point,
+		 * since once we write the new head, the application could
+		 * write new data to them.
+		 */
+		smp_store_release(&rings->sq.head, ctx->cached_sq_head);
+	}
 }
 
 /*
@@ -2384,10 +2388,15 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
 	__must_hold(&ctx->uring_lock)
 {
-	unsigned int entries = io_sqring_entries(ctx);
+	unsigned int entries;
 	unsigned int left;
 	int ret;
 
+	if (ctx->flags & IORING_SETUP_SQ_REWIND)
+		entries = ctx->sq_entries;
+	else
+		entries = io_sqring_entries(ctx);
+
 	entries = min(nr, entries);
 	if (unlikely(!entries))
 		return 0;
@@ -3422,6 +3431,12 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
 	if (flags & ~IORING_SETUP_FLAGS)
 		return -EINVAL;
 
+	if (flags & IORING_SETUP_SQ_REWIND) {
+		if ((flags & IORING_SETUP_SQPOLL) ||
+		    !(flags & IORING_SETUP_NO_SQARRAY))
+			return -EINVAL;
+	}
+
 	/* There is no way to mmap rings without a real fd */
 	if ((flags & IORING_SETUP_REGISTERED_FD_ONLY) &&
 	    !(flags & IORING_SETUP_NO_MMAP))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c5bbb43b5842..baa1a20d0d6a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -68,7 +68,8 @@ struct io_ctx_config {
 			IORING_SETUP_NO_SQARRAY |\
 			IORING_SETUP_HYBRID_IOPOLL |\
 			IORING_SETUP_CQE_MIXED |\
-			IORING_SETUP_SQE_MIXED)
+			IORING_SETUP_SQE_MIXED |\
+			IORING_SETUP_SQ_REWIND)
 
 #define IORING_ENTER_FLAGS (IORING_ENTER_GETEVENTS |\
 			IORING_ENTER_SQ_WAKEUP |\
--
2.52.0
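
As a userspace-side illustration of the constraints enforced in
io_uring_sanitise_params() above, an application could pre-check its
setup flags before calling io_uring_setup(). This is a minimal sketch:
the TOY_* names and sq_rewind_flags_ok() are hypothetical, the first two
bit values mirror the existing uapi definitions of IORING_SETUP_SQPOLL
and IORING_SETUP_NO_SQARRAY, and the third is the value proposed by this
patch.

```c
#include <stdbool.h>

/* Local copies for illustration; real code would use <linux/io_uring.h>. */
#define TOY_SETUP_SQPOLL	(1U << 1)	/* IORING_SETUP_SQPOLL */
#define TOY_SETUP_NO_SQARRAY	(1U << 16)	/* IORING_SETUP_NO_SQARRAY */
#define TOY_SETUP_SQ_REWIND	(1U << 20)	/* as proposed in this patch */

/*
 * Returns true if the SQ_REWIND-related flag combination would pass the
 * check added by the patch: SQ_REWIND must not be combined with SQPOLL
 * and must come together with NO_SQARRAY. Other flags are not examined.
 */
static bool sq_rewind_flags_ok(unsigned int flags)
{
	if (!(flags & TOY_SETUP_SQ_REWIND))
		return true;
	if (flags & TOY_SETUP_SQPOLL)
		return false;
	return (flags & TOY_SETUP_NO_SQARRAY) != 0;
}
```

A caller would fall back to a regular circular SQ when the helper
returns false or when the kernel itself rejects the flag with -EINVAL,
as older kernels will.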