public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCHv2 0/1]
@ 2025-09-04 19:27 Keith Busch
  2025-09-04 19:27 ` [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED Keith Busch
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Keith Busch @ 2025-09-04 19:27 UTC (permalink / raw)
  To: io-uring, axboe, csander; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

The CQ supports mixed size entries, so why not do the same for SQs?
There are use cases that currently allocate separate rings just to keep
the two SQE sizes apart, but we can efficiently handle both in a single
ring.

This RFC takes the last SQE flags bit for the purpose. That might not
be acceptable, but it serves as a proof of concept for an initial look.

Changes from v1:

  - Fixed up the kernel's handling for big SQEs on a mixed SQ for
    uring_cmd and fdinfo

  - Added liburing tests for nvme uring_cmd usage, and for inserting a
    128b SQE in the last entry to test the kernel's handling of a bad
    wrap.

kernel:

Keith Busch (1):
  io_uring: add support for IORING_SETUP_SQE_MIXED

 include/linux/io_uring_types.h |  3 +++
 include/uapi/linux/io_uring.h  |  9 +++++++++
 io_uring/fdinfo.c              | 32 +++++++++++++++++++++++++-------
 io_uring/io_uring.c            | 20 +++++++++++++++++++-
 io_uring/register.c            |  2 +-
 io_uring/uring_cmd.c           |  4 +++-
 6 files changed, 60 insertions(+), 10 deletions(-)

liburing:

Keith Busch (3):
  Add support IORING_SETUP_SQE_MIXED
  Add nop testing for IORING_SETUP_SQE_MIXED
  Add mixed sqe test for uring commands

 src/include/liburing.h          |  31 ++++++++
 src/include/liburing/io_uring.h |   9 +++
 test/Makefile                   |   3 +
 test/sqe-mixed-bad-wrap.c       | 121 +++++++++++++++++++++++++++++
 test/sqe-mixed-nop.c            | 104 +++++++++++++++++++++++++
 test/sqe-mixed-uring_cmd.c      | 169 +++++++++++++++++++++++++++++++++++++++++
 6 files changed, 437 insertions(+)

-- 
2.47.3


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED
  2025-09-04 19:27 [RFC PATCHv2 0/1] Keith Busch
@ 2025-09-04 19:27 ` Keith Busch
  2025-09-11 16:27   ` Caleb Sander Mateos
  2025-09-04 19:27 ` [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED Keith Busch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-09-04 19:27 UTC (permalink / raw)
  To: io-uring, axboe, csander; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

This adds core support for mixed-size SQEs in the same SQ ring. Before
this, SQEs were either 64b in size (the normal size), or 128b if
IORING_SETUP_SQE128 was set at ring initialization. With mixed support,
an SQE may be either 64b or 128b on the same SQ ring. If the SQE is
128b in size, then IOSQE_SQE_128B will be set in the SQE flags. The
client may post a NOP SQE with IOSQE_CQE_SKIP_SUCCESS set, which the
kernel should simply ignore; it's just a pad filler, posted when
required.
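
As a usage sketch (error handling elided; the helper is added below):

	struct io_uring_sqe *sqe;

	/* returns a 128b entry with IOSQE_SQE_128B already set, posting a
	 * skip-success NOP pad first if the 128b SQE would wrap the ring */
	sqe = io_uring_get_sqe128_mixed(&ring);
	io_uring_prep_nop(sqe);
	sqe->user_data = 42;
	io_uring_submit(&ring);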

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 src/include/liburing.h          | 31 +++++++++++++++++++++++++++++++
 src/include/liburing/io_uring.h |  9 +++++++++
 2 files changed, 40 insertions(+)

diff --git a/src/include/liburing.h b/src/include/liburing.h
index 7ea876e1..97c70fa7 100644
--- a/src/include/liburing.h
+++ b/src/include/liburing.h
@@ -1853,6 +1853,37 @@ IOURINGINLINE struct io_uring_sqe *_io_uring_get_sqe(struct io_uring *ring)
 	return sqe;
 }
 
+IOURINGINLINE struct io_uring_sqe *io_uring_get_sqe128_mixed(struct io_uring *ring)
+	LIBURING_NOEXCEPT
+{
+	struct io_uring_sq *sq = &ring->sq;
+	unsigned head = io_uring_load_sq_head(ring), tail = sq->sqe_tail;
+	struct io_uring_sqe *sqe;
+
+	if (!(ring->flags & IORING_SETUP_SQE_MIXED))
+		return NULL;
+
+	if ((tail & sq->ring_mask) + 1 == sq->ring_entries) {
+		sqe = _io_uring_get_sqe(ring);
+		if (!sqe)
+			return NULL;
+
+		io_uring_prep_nop(sqe);
+		sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
+		tail = sq->sqe_tail;
+	}
+
+	if ((tail + 1) - head >= sq->ring_entries)
+		return NULL;
+
+	sqe = &sq->sqes[tail & sq->ring_mask];
+	sq->sqe_tail = tail + 2;
+	io_uring_initialize_sqe(sqe);
+	sqe->flags |= IOSQE_SQE_128B;
+
+	return sqe;
+}
+
 /*
  * Return the appropriate mask for a buffer ring of size 'ring_entries'
  */
diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
index 643514e5..fd02fa52 100644
--- a/src/include/liburing/io_uring.h
+++ b/src/include/liburing/io_uring.h
@@ -126,6 +126,7 @@ enum io_uring_sqe_flags_bit {
 	IOSQE_ASYNC_BIT,
 	IOSQE_BUFFER_SELECT_BIT,
 	IOSQE_CQE_SKIP_SUCCESS_BIT,
+	IOSQE_SQE_128B_BIT,
 };
 
 /*
@@ -145,6 +146,8 @@ enum io_uring_sqe_flags_bit {
 #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
 /* don't post CQE if request succeeded */
 #define IOSQE_CQE_SKIP_SUCCESS	(1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
+/* this is a 128b/big-sqe posting */
+#define IOSQE_SQE_128B          (1U << IOSQE_SQE_128B_BIT)
 
 /*
  * io_uring_setup() flags
@@ -211,6 +214,12 @@ enum io_uring_sqe_flags_bit {
  */
 #define IORING_SETUP_CQE_MIXED		(1U << 18)
 
+/*
+ *  Allow both 64b and 128b SQEs. If a 128b SQE is posted, it will have
+ *  IOSQE_SQE_128B set in sqe->flags.
+ */
+#define IORING_SETUP_SQE_MIXED		(1U << 19)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-04 19:27 [RFC PATCHv2 0/1] Keith Busch
  2025-09-04 19:27 ` [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED Keith Busch
@ 2025-09-04 19:27 ` Keith Busch
  2025-09-10 17:44   ` Caleb Sander Mateos
  2025-09-04 19:27 ` [RFC PATCHv2 2/3] Add nop testing " Keith Busch
  2025-09-04 19:27 ` [RFC PATCHv2 3/3] Add mixed sqe test for uring commands Keith Busch
  3 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-09-04 19:27 UTC (permalink / raw)
  To: io-uring, axboe, csander; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

Normal rings support 64b SQEs for posting submissions, while certain
features require the ring to be configured with IORING_SETUP_SQE128, as
they need to convey more information per submission. This, in turn,
makes ALL SQEs 128b in size. This is somewhat wasteful and
inefficient, particularly when only certain SQEs need to be of the
bigger variant.

This adds support for setting up a ring with mixed SQE sizes, using
IORING_SETUP_SQE_MIXED. When set up in this mode, SQEs posted to the ring
may be either 64b or 128b in size. If an SQE is 128b in size, then the
IOSQE_SQE_128B flag is set in the SQE flags to indicate that this is the
case. If this flag isn't set, the SQE is the normal 64b variant.

SQEs on these types of mixed rings may also make use of NOP SQEs with
skip-success set. This can happen if the ring is one (small) SQE entry
away from wrapping and an attempt is made to post a 128b SQE. As SQEs
must be contiguous in the SQ ring, a 128b SQE cannot wrap the ring. For
this case, a single NOP SQE is posted with the SKIP flag set, which the
kernel should simply ignore.
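
As a concrete sketch (8-entry SQ, made-up state): if the tail sits on
the last slot when a 128b SQE is wanted, userspace posts:

	sq[7] = NOP, flags = IOSQE_CQE_SKIP_SUCCESS	/* pad filler */
	sq[0] = 128b SQE, flags = IOSQE_SQE_128B	/* first 64b half */
	sq[1] = second 64b half of the same SQE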

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/linux/io_uring_types.h |  3 +++
 include/uapi/linux/io_uring.h  |  9 +++++++++
 io_uring/fdinfo.c              | 32 +++++++++++++++++++++++++-------
 io_uring/io_uring.c            | 20 +++++++++++++++++++-
 io_uring/register.c            |  2 +-
 io_uring/uring_cmd.c           |  4 +++-
 6 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index d1e25f3fe0b3a..d6e1c73400820 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -489,6 +489,7 @@ enum {
 	REQ_F_FORCE_ASYNC_BIT	= IOSQE_ASYNC_BIT,
 	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
 	REQ_F_CQE_SKIP_BIT	= IOSQE_CQE_SKIP_SUCCESS_BIT,
+	REQ_F_SQE_128B_BIT	= IOSQE_SQE_128B_BIT,
 
 	/* first byte is taken by user flags, shift it to not overlap */
 	REQ_F_FAIL_BIT		= 8,
@@ -547,6 +548,8 @@ enum {
 	REQ_F_BUFFER_SELECT	= IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
 	/* IOSQE_CQE_SKIP_SUCCESS */
 	REQ_F_CQE_SKIP		= IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
+	/* IOSQE_SQE_128B */
+	REQ_F_SQE_128B		= IO_REQ_FLAG(REQ_F_SQE_128B_BIT),
 
 	/* fail rest of links */
 	REQ_F_FAIL		= IO_REQ_FLAG(REQ_F_FAIL_BIT),
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 04ebff33d0e62..9cef9085f52ee 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -146,6 +146,7 @@ enum io_uring_sqe_flags_bit {
 	IOSQE_ASYNC_BIT,
 	IOSQE_BUFFER_SELECT_BIT,
 	IOSQE_CQE_SKIP_SUCCESS_BIT,
+	IOSQE_SQE_128B_BIT,
 };
 
 /*
@@ -165,6 +166,8 @@ enum io_uring_sqe_flags_bit {
 #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
 /* don't post CQE if request succeeded */
 #define IOSQE_CQE_SKIP_SUCCESS	(1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
+/* this is a 128b/big-sqe posting */
+#define IOSQE_SQE_128B		(1U << IOSQE_SQE_128B_BIT)
 
 /*
  * io_uring_setup() flags
@@ -231,6 +234,12 @@ enum io_uring_sqe_flags_bit {
  */
 #define IORING_SETUP_CQE_MIXED		(1U << 18)
 
+/*
+ * Allow both 64b and 128b SQEs. If a 128b SQE is posted, it will have
+ * IOSQE_SQE_128B set in sqe->flags.
+ */
+#define IORING_SETUP_SQE_MIXED		(1U << 19)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 5c73398387690..ef0d17876a7b9 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -66,7 +66,6 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 	unsigned int cq_head = READ_ONCE(r->cq.head);
 	unsigned int cq_tail = READ_ONCE(r->cq.tail);
 	unsigned int sq_shift = 0;
-	unsigned int sq_entries;
 	int sq_pid = -1, sq_cpu = -1;
 	u64 sq_total_time = 0, sq_work_time = 0;
 	unsigned int i;
@@ -89,26 +88,44 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 	seq_printf(m, "CqTail:\t%u\n", cq_tail);
 	seq_printf(m, "CachedCqTail:\t%u\n", data_race(ctx->cached_cq_tail));
 	seq_printf(m, "SQEs:\t%u\n", sq_tail - sq_head);
-	sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
-	for (i = 0; i < sq_entries; i++) {
-		unsigned int entry = i + sq_head;
+	while (sq_head < sq_tail) {
 		struct io_uring_sqe *sqe;
 		unsigned int sq_idx;
+		bool sqe128 = false;
+		u8 sqe_flags;
 
 		if (ctx->flags & IORING_SETUP_NO_SQARRAY)
 			break;
-		sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
+		sq_idx = READ_ONCE(ctx->sq_array[sq_head & sq_mask]);
 		if (sq_idx > sq_mask)
-			continue;
+			break;
 		sqe = &ctx->sq_sqes[sq_idx << sq_shift];
+		sqe_flags = READ_ONCE(sqe->flags);
+		if (sq_shift)
+			sqe128 = true;
+		else if (sqe_flags & IOSQE_SQE_128B) {
+			if (!(ctx->flags & IORING_SETUP_SQE_MIXED)) {
+				seq_printf(m,
+					"%5u: invalid sqe, 128B entry on non-mixed sq\n",
+					sq_idx);
+				break;
+			}
+			if ((++sq_head & sq_mask) == 0) {
+				seq_printf(m,
+					"%5u: corrupted sqe, wrapping 128B entry\n",
+					sq_idx);
+				break;
+			}
+			sqe128 = true;
+		}
 		seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "
 			      "addr:0x%llx, rw_flags:0x%x, buf_index:%d "
 			      "user_data:%llu",
 			   sq_idx, io_uring_get_opcode(sqe->opcode), sqe->fd,
-			   sqe->flags, (unsigned long long) sqe->off,
+			   sqe_flags, (unsigned long long) sqe->off,
 			   (unsigned long long) sqe->addr, sqe->rw_flags,
 			   sqe->buf_index, sqe->user_data);
-		if (sq_shift) {
+		if (sqe128) {
 			u64 *sqeb = (void *) (sqe + 1);
 			int size = sizeof(struct io_uring_sqe) / sizeof(u64);
 			int j;
@@ -120,6 +137,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
 			}
 		}
 		seq_printf(m, "\n");
+		sq_head++;
 	}
 	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
 	while (cq_head < cq_tail) {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6c07efac977ce..78a81e882fce7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2180,6 +2180,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	}
 	opcode = array_index_nospec(opcode, IORING_OP_LAST);
 
+	if (ctx->flags & IORING_SETUP_SQE_MIXED &&
+	    req->flags & REQ_F_SQE_128B) {
+		if ((ctx->cached_sq_head & (ctx->sq_entries - 1)) == 0)
+			return io_init_fail_req(req, -EINVAL);
+		ctx->cached_sq_head++;
+	}
+
 	def = &io_issue_defs[opcode];
 	if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) {
 		/* enforce forwards compatibility on users */
@@ -2793,6 +2800,10 @@ unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
 		if (cq_entries < 2)
 			return SIZE_MAX;
 	}
+	if (flags & IORING_SETUP_SQE_MIXED) {
+		if (sq_entries < 2)
+			return SIZE_MAX;
+	}
 
 #ifdef CONFIG_SMP
 	off = ALIGN(off, SMP_CACHE_BYTES);
@@ -3724,6 +3735,13 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
 	if ((flags & (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED)) ==
 	    (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED))
 		return -EINVAL;
+	/*
+	 * Nonsensical to ask for SQE128 and mixed SQE support, it's not
+	 * supported to post 64b SQEs on a ring setup with SQE128.
+	 */
+	if ((flags & (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED)) ==
+	    (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED))
+		return -EINVAL;
 
 	return 0;
 }
@@ -3952,7 +3970,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
 			IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY |
 			IORING_SETUP_NO_SQARRAY | IORING_SETUP_HYBRID_IOPOLL |
-			IORING_SETUP_CQE_MIXED))
+			IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/register.c b/io_uring/register.c
index aa5f56ad83584..29aa7d3b4e820 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -397,7 +397,7 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
 #define RESIZE_FLAGS	(IORING_SETUP_CQSIZE | IORING_SETUP_CLAMP)
 #define COPY_FLAGS	(IORING_SETUP_NO_SQARRAY | IORING_SETUP_SQE128 | \
 			 IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
-			 IORING_SETUP_CQE_MIXED)
+			 IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
 
 static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 5562e8491c5bd..8dd409774fd87 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -239,7 +239,9 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 	if (ret)
 		return ret;
 
-	if (ctx->flags & IORING_SETUP_SQE128)
+	if (ctx->flags & IORING_SETUP_SQE128 ||
+	    (ctx->flags & IORING_SETUP_SQE_MIXED &&
+	     req->flags & REQ_F_SQE_128B))
 		issue_flags |= IO_URING_F_SQE128;
 	if (ctx->flags & (IORING_SETUP_CQE32 | IORING_SETUP_CQE_MIXED))
 		issue_flags |= IO_URING_F_CQE32;
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCHv2 2/3] Add nop testing for IORING_SETUP_SQE_MIXED
  2025-09-04 19:27 [RFC PATCHv2 0/1] Keith Busch
  2025-09-04 19:27 ` [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED Keith Busch
  2025-09-04 19:27 ` [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED Keith Busch
@ 2025-09-04 19:27 ` Keith Busch
  2025-09-04 19:27 ` [RFC PATCHv2 3/3] Add mixed sqe test for uring commands Keith Busch
  3 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2025-09-04 19:27 UTC (permalink / raw)
  To: io-uring, axboe, csander; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

Add NOP testing for mixed sized SQEs.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 test/Makefile              |   2 +
 test/sqe-mixed-bad-wrap.c  | 121 +++++++++++++++++++++++++++++++++++++
 test/sqe-mixed-nop.c       | 104 +++++++++++++++++++++++++++++++
 test/sqe-mixed-uring_cmd.c |   0
 4 files changed, 227 insertions(+)
 create mode 100644 test/sqe-mixed-bad-wrap.c
 create mode 100644 test/sqe-mixed-nop.c
 create mode 100644 test/sqe-mixed-uring_cmd.c

diff --git a/test/Makefile b/test/Makefile
index 54251bf2..a36c70a4 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -232,6 +232,8 @@ test_srcs := \
 	sq-poll-share.c \
 	sqpoll-sleep.c \
 	sq-space_left.c \
+	sqe-mixed-nop.c \
+	sqe-mixed-bad-wrap.c \
 	sqwait.c \
 	stdout.c \
 	submit-and-wait.c \
diff --git a/test/sqe-mixed-bad-wrap.c b/test/sqe-mixed-bad-wrap.c
new file mode 100644
index 00000000..25764fff
--- /dev/null
+++ b/test/sqe-mixed-bad-wrap.c
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Description: test posting a 128b SQE into the last SQ slot (bad wrap)
+ *
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+
+#include "liburing.h"
+#include "helpers.h"
+#include "test.h"
+
+static int seq;
+
+static int test_single_nop(struct io_uring *ring, unsigned req_flags,
+			   bool should_fail)
+{
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	int ret;
+
+	sqe = io_uring_get_sqe(ring);
+	if (!sqe) {
+		fprintf(stderr, "get sqe failed\n");
+		goto err;
+	}
+
+	io_uring_prep_nop(sqe);
+	sqe->user_data = ++seq;
+	sqe->flags |= req_flags;
+
+	ret = io_uring_submit(ring);
+	if (ret <= 0) {
+		fprintf(stderr, "sqe submit failed: %d\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(ring, &cqe);
+	if (ret < 0) {
+		fprintf(stderr, "wait completion %d\n", ret);
+		goto err;
+	}
+	if (should_fail && cqe->res == 0) {
+		fprintf(stderr, "Unexpected success\n");
+		goto err;
+	}
+	if (!should_fail && cqe->res != 0) {
+		fprintf(stderr, "Completion error:%d\n", cqe->res);
+		goto err;
+	}
+	if (cqe->res == 0 && cqe->user_data != seq) {
+		fprintf(stderr, "Unexpected user_data: %ld\n", (long) cqe->user_data);
+		goto err;
+	}
+	io_uring_cqe_seen(ring, cqe);
+
+	return T_EXIT_PASS;
+err:
+	return T_EXIT_FAIL;
+}
+
+static int test_ring(unsigned flags)
+{
+	struct io_uring_params p = { };
+	struct io_uring ring;
+	int ret, i;
+
+	p.flags = flags;
+	ret = io_uring_queue_init_params(8, &ring, &p);
+	if (ret) {
+		if (ret == -EINVAL)
+			return T_EXIT_SKIP;
+		fprintf(stderr, "ring setup failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	for (i = 0; i < 7; i++) {
+		ret = test_single_nop(&ring, 0, false);
+		if (ret)
+			goto err;
+	}
+
+	/* inserting a 128b sqe at the end should fail */
+	ret = test_single_nop(&ring, IOSQE_SQE_128B, true);
+	if (ret)
+		goto err;
+
+	/* proceeding from the bad wrap should succeed */
+	ret = test_single_nop(&ring, 0, false);
+	if (ret)
+		goto err;
+
+	io_uring_queue_exit(&ring);
+	return ret;
+err:
+	fprintf(stderr, "test_single_nop failed\n");
+	io_uring_queue_exit(&ring);
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test_ring(IORING_SETUP_SQE_MIXED);
+	if (ret == T_EXIT_SKIP) {
+		return T_EXIT_SKIP;
+	} else if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "Mixed ring test failed\n");
+		return ret;
+	}
+
+	return T_EXIT_PASS;
+}
diff --git a/test/sqe-mixed-nop.c b/test/sqe-mixed-nop.c
new file mode 100644
index 00000000..91c1c795
--- /dev/null
+++ b/test/sqe-mixed-nop.c
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Description: run nop tests on a ring with mixed SQE sizes
+ *
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+
+#include "liburing.h"
+#include "helpers.h"
+#include "test.h"
+
+static int seq;
+
+static int test_single_nop(struct io_uring *ring, unsigned req_flags, bool sqe128)
+{
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	int ret;
+
+	if (sqe128)
+		sqe = io_uring_get_sqe128_mixed(ring);
+	else
+		sqe = io_uring_get_sqe(ring);
+	if (!sqe) {
+		fprintf(stderr, "get sqe failed\n");
+		goto err;
+	}
+
+	io_uring_prep_nop(sqe);
+	sqe->user_data = ++seq;
+	sqe->flags |= req_flags;
+
+	ret = io_uring_submit(ring);
+	if (ret <= 0) {
+		fprintf(stderr, "sqe submit failed: %d\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(ring, &cqe);
+	if (ret < 0) {
+		fprintf(stderr, "wait completion %d\n", ret);
+		goto err;
+	}
+	if (cqe->user_data != seq) {
+		fprintf(stderr, "Unexpected user_data: %ld\n", (long) cqe->user_data);
+		goto err;
+	}
+	io_uring_cqe_seen(ring, cqe);
+	return T_EXIT_PASS;
+err:
+	return T_EXIT_FAIL;
+}
+
+static int test_ring(unsigned flags)
+{
+	struct io_uring_params p = { };
+	struct io_uring ring;
+	int ret, i;
+
+	p.flags = flags;
+	ret = io_uring_queue_init_params(8, &ring, &p);
+	if (ret) {
+		if (ret == -EINVAL)
+			return T_EXIT_SKIP;
+		fprintf(stderr, "ring setup failed: %d\n", ret);
+		return T_EXIT_FAIL;
+	}
+
+	/* alternate pairs of big and little sqes */
+	for (i = 0; i < 32; i++) {
+		ret = test_single_nop(&ring, 0, i & 2);
+		if (ret) {
+			fprintf(stderr, "fail off %d\n", i);
+			fprintf(stderr, "test_single_nop failed\n");
+			goto err;
+		}
+	}
+err:
+	io_uring_queue_exit(&ring);
+	return ret;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc > 1)
+		return T_EXIT_SKIP;
+
+	ret = test_ring(IORING_SETUP_SQE_MIXED);
+	if (ret == T_EXIT_SKIP) {
+		return T_EXIT_SKIP;
+	} else if (ret != T_EXIT_PASS) {
+		fprintf(stderr, "Mixed ring test failed\n");
+		return ret;
+	}
+
+	return T_EXIT_PASS;
+}
diff --git a/test/sqe-mixed-uring_cmd.c b/test/sqe-mixed-uring_cmd.c
new file mode 100644
index 00000000..e69de29b
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCHv2 3/3] Add mixed sqe test for uring commands
  2025-09-04 19:27 [RFC PATCHv2 0/1] Keith Busch
                   ` (2 preceding siblings ...)
  2025-09-04 19:27 ` [RFC PATCHv2 2/3] Add nop testing " Keith Busch
@ 2025-09-04 19:27 ` Keith Busch
  3 siblings, 0 replies; 16+ messages in thread
From: Keith Busch @ 2025-09-04 19:27 UTC (permalink / raw)
  To: io-uring, axboe, csander; +Cc: Keith Busch

From: Keith Busch <kbusch@kernel.org>

Exercise mixed SQE sizes with real commands: alternate plain 64b NOP
SQEs with 128b nvme uring_cmd passthrough reads on the same ring.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 test/Makefile              |   1 +
 test/sqe-mixed-uring_cmd.c | 169 ++++++++++++++++++++++++++++++++++++
 2 files changed, 170 insertions(+)

diff --git a/test/Makefile b/test/Makefile
index a36c70a4..25af26a2 100644
--- a/test/Makefile
+++ b/test/Makefile
@@ -234,6 +234,7 @@ test_srcs := \
 	sq-space_left.c \
 	sqe-mixed-nop.c \
 	sqe-mixed-bad-wrap.c \
+	sqe-mixed-uring_cmd.c \
 	sqwait.c \
 	stdout.c \
 	submit-and-wait.c \
diff --git a/test/sqe-mixed-uring_cmd.c b/test/sqe-mixed-uring_cmd.c
index e69de29b..b213270b 100644
--- a/test/sqe-mixed-uring_cmd.c
+++ b/test/sqe-mixed-uring_cmd.c
@@ -0,0 +1,169 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Description: mixed sqes utilizing basic nop and io_uring passthrough commands
+ */
+#include <errno.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <string.h>
+#include <fcntl.h>
+
+#include "helpers.h"
+#include "liburing.h"
+#include "../src/syscall.h"
+#include "nvme.h"
+
+#define min(a, b)	((a) < (b) ? (a) : (b))
+
+static int seq;
+
+static int test_single_nop(struct io_uring *ring, unsigned req_flags)
+{
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	int ret;
+
+	sqe = io_uring_get_sqe(ring);
+	if (!sqe) {
+		fprintf(stderr, "get sqe failed\n");
+		goto err;
+	}
+
+	io_uring_prep_nop(sqe);
+	sqe->user_data = ++seq;
+	sqe->flags |= req_flags;
+
+	ret = io_uring_submit(ring);
+	if (ret <= 0) {
+		fprintf(stderr, "sqe submit failed: %d\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(ring, &cqe);
+	if (ret < 0) {
+		fprintf(stderr, "wait completion %d\n", ret);
+		goto err;
+	}
+	if (cqe->user_data != seq) {
+		fprintf(stderr, "Unexpected user_data: %ld\n", (long) cqe->user_data);
+		goto err;
+	}
+	io_uring_cqe_seen(ring, cqe);
+	return T_EXIT_PASS;
+err:
+	return T_EXIT_FAIL;
+}
+
+static int test_single_nvme_read(struct io_uring *ring, int fd)
+{
+	struct nvme_uring_cmd *cmd;
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	int ret;
+
+	unsigned len = 0x1000;
+	unsigned char buf[0x1000] = {};
+
+	sqe = io_uring_get_sqe128_mixed(ring);
+	if (!sqe) {
+		fprintf(stderr, "get sqe failed\n");
+		goto err;
+	}
+
+	sqe->fd = fd;
+	sqe->user_data = ++seq;
+	sqe->opcode = IORING_OP_URING_CMD;
+	sqe->cmd_op = NVME_URING_CMD_IO;
+
+	cmd = (struct nvme_uring_cmd *)sqe->cmd;
+	memset(cmd, 0, sizeof(struct nvme_uring_cmd));
+	cmd->opcode = nvme_cmd_read;
+	cmd->cdw12 = (len >> lba_shift) - 1;
+	cmd->addr = (__u64)(uintptr_t)buf;
+	cmd->data_len = len;
+	cmd->nsid = nsid;
+
+	ret = io_uring_submit(ring);
+	if (ret <= 0) {
+		fprintf(stderr, "sqe submit failed: %d\n", ret);
+		goto err;
+	}
+
+	ret = io_uring_wait_cqe(ring, &cqe);
+	if (ret < 0) {
+		fprintf(stderr, "wait completion %d\n", ret);
+		goto err;
+	}
+	if (cqe->res != 0) {
+		fprintf(stderr, "cqe res %d, wanted 0\n", cqe->res);
+		goto err;
+	}
+	if (cqe->user_data != seq) {
+		fprintf(stderr, "Unexpected user_data: %ld\n", (long) cqe->user_data);
+		goto err;
+	}
+
+	io_uring_cqe_seen(ring, cqe);
+	return T_EXIT_PASS;
+err:
+	return T_EXIT_FAIL;
+}
+
+static int test_io(const char *file)
+{
+	struct io_uring_params p = { .flags = IORING_SETUP_SQE_MIXED };
+	struct io_uring ring;
+	int fd, ret, i;
+
+	fd = open(file, O_RDONLY);
+	if (fd < 0) {
+		if (errno == EACCES || errno == EPERM)
+			return T_EXIT_SKIP;
+		perror("file open");
+		goto err;
+	}
+
+	ret = io_uring_queue_init_params(8, &ring, &p);
+	if (ret) {
+		if (ret == -EINVAL)
+			return T_EXIT_SKIP;
+		fprintf(stderr, "ring setup failed: %d\n", ret);
+		goto close;
+	}
+
+	for (i = 0; i < 32; i++) {
+		if (i & 2)
+			ret = test_single_nvme_read(&ring, fd);
+		else
+			ret = test_single_nop(&ring, 0);
+
+		if (ret) {
+			fprintf(stderr, "test case %d failed\n", i);
+			goto exit;
+		}
+	}
+
+	io_uring_queue_exit(&ring);
+	return T_EXIT_PASS;
+exit:
+	io_uring_queue_exit(&ring);
+close:
+	close(fd);
+err:
+	return T_EXIT_FAIL;
+}
+
+int main(int argc, char *argv[])
+{
+	int ret;
+
+	if (argc < 2)
+		return T_EXIT_SKIP;
+
+	ret = nvme_get_info(argv[1]);
+	if (ret)
+		return T_EXIT_SKIP;
+
+	return test_io(argv[1]);
+}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-04 19:27 ` [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED Keith Busch
@ 2025-09-10 17:44   ` Caleb Sander Mateos
  2025-09-11  0:28     ` Jens Axboe
  2025-09-11  2:06     ` Keith Busch
  0 siblings, 2 replies; 16+ messages in thread
From: Caleb Sander Mateos @ 2025-09-10 17:44 UTC (permalink / raw)
  To: Keith Busch; +Cc: io-uring, axboe, Keith Busch

On Thu, Sep 4, 2025 at 12:27 PM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> Normal rings support 64b SQEs for posting submissions, while certain
> features require the ring to be configured with IORING_SETUP_SQE128, as
> they need to convey more information per submission. This, in turn,
> makes ALL SQEs 128b in size. This is somewhat wasteful and
> inefficient, particularly when only certain SQEs need to be of the
> bigger variant.
>
> This adds support for setting up a ring with mixed SQE sizes, using
> IORING_SETUP_SQE_MIXED. When set up in this mode, SQEs posted to the ring
> may be either 64b or 128b in size. If an SQE is 128b in size, then the
> IOSQE_SQE_128B flag is set in the SQE flags to indicate that this is the
> case. If this flag isn't set, the SQE is the normal 64b variant.

I mentioned this on the previous revision, but it might be helpful to
document that IOSQE_SQE_128B also implies skipping an entry in the SQ
indirection array (when not using IORING_SETUP_NO_SQARRAY). There's no
obvious need to burn an entry in this array, as only a single u32
index is needed to refer to the SQE regardless of the SQE size. But it
certainly simplifies the implementation on the userspace side if it's
using an identity mapping in the SQ indirection array.
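
E.g. (identity mapping, made-up indices): a 128b SQE at slot 2 consumes
SQE slots 2 and 3, and sq_array[3] is simply burned along with it:

	sq_array: [ 0 ][ 1 ][ 2 ][ 3=skipped ][ 4 ] ...
	sqes:      64b  64b  |-- 128b SQE  --|  64b ...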

>
> SQEs on these types of mixed rings may also make use of NOP SQEs with
> skip-success set. This can happen if the ring is one (small) SQE entry
> away from wrapping and an attempt is made to post a 128b SQE. As SQEs
> must be contiguous in the SQ ring, a 128b SQE cannot wrap the ring. For
> this case, a single NOP SQE is posted with the SKIP flag set, which the
> kernel should simply ignore.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  include/linux/io_uring_types.h |  3 +++
>  include/uapi/linux/io_uring.h  |  9 +++++++++
>  io_uring/fdinfo.c              | 32 +++++++++++++++++++++++++-------
>  io_uring/io_uring.c            | 20 +++++++++++++++++++-
>  io_uring/register.c            |  2 +-
>  io_uring/uring_cmd.c           |  4 +++-
>  6 files changed, 60 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index d1e25f3fe0b3a..d6e1c73400820 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -489,6 +489,7 @@ enum {
>         REQ_F_FORCE_ASYNC_BIT   = IOSQE_ASYNC_BIT,
>         REQ_F_BUFFER_SELECT_BIT = IOSQE_BUFFER_SELECT_BIT,
>         REQ_F_CQE_SKIP_BIT      = IOSQE_CQE_SKIP_SUCCESS_BIT,
> +       REQ_F_SQE_128B_BIT      = IOSQE_SQE_128B_BIT,
>
>         /* first byte is taken by user flags, shift it to not overlap */
>         REQ_F_FAIL_BIT          = 8,
> @@ -547,6 +548,8 @@ enum {
>         REQ_F_BUFFER_SELECT     = IO_REQ_FLAG(REQ_F_BUFFER_SELECT_BIT),
>         /* IOSQE_CQE_SKIP_SUCCESS */
>         REQ_F_CQE_SKIP          = IO_REQ_FLAG(REQ_F_CQE_SKIP_BIT),
> +       /* IOSQE_SQE_128B */
> +       REQ_F_SQE_128B          = IO_REQ_FLAG(REQ_F_SQE_128B_BIT),
>
>         /* fail rest of links */
>         REQ_F_FAIL              = IO_REQ_FLAG(REQ_F_FAIL_BIT),
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 04ebff33d0e62..9cef9085f52ee 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -146,6 +146,7 @@ enum io_uring_sqe_flags_bit {
>         IOSQE_ASYNC_BIT,
>         IOSQE_BUFFER_SELECT_BIT,
>         IOSQE_CQE_SKIP_SUCCESS_BIT,
> +       IOSQE_SQE_128B_BIT,

Have you given any thought to how we would handle the likely scenario
that we want to define more SQE flags in the future? Are there
existing unused bytes of the SQE where the new flags could go? If not,
we may need to repurpose some existing but rarely used field. And then
we'd likely want to reserve this last flag bit to specify whether the
SQE is using this "extended flags" field.

>  };
>
>  /*
> @@ -165,6 +166,8 @@ enum io_uring_sqe_flags_bit {
>  #define IOSQE_BUFFER_SELECT    (1U << IOSQE_BUFFER_SELECT_BIT)
>  /* don't post CQE if request succeeded */
>  #define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
> +/* this is a 128b/big-sqe posting */
> +#define IOSQE_SQE_128B         (1U << IOSQE_SQE_128B_BIT)
>
>  /*
>   * io_uring_setup() flags
> @@ -231,6 +234,12 @@ enum io_uring_sqe_flags_bit {
>   */
>  #define IORING_SETUP_CQE_MIXED         (1U << 18)
>
> +/*
> + * Allow both 64b and 128b SQEs. If a 128b SQE is posted, it will have
> + * IOSQE_SQE_128B set in sqe->flags.
> + */
> +#define IORING_SETUP_SQE_MIXED         (1U << 19)
> +
>  enum io_uring_op {
>         IORING_OP_NOP,
>         IORING_OP_READV,
> diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
> index 5c73398387690..ef0d17876a7b9 100644
> --- a/io_uring/fdinfo.c
> +++ b/io_uring/fdinfo.c
> @@ -66,7 +66,6 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
>         unsigned int cq_head = READ_ONCE(r->cq.head);
>         unsigned int cq_tail = READ_ONCE(r->cq.tail);
>         unsigned int sq_shift = 0;
> -       unsigned int sq_entries;
>         int sq_pid = -1, sq_cpu = -1;
>         u64 sq_total_time = 0, sq_work_time = 0;
>         unsigned int i;
> @@ -89,26 +88,44 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
>         seq_printf(m, "CqTail:\t%u\n", cq_tail);
>         seq_printf(m, "CachedCqTail:\t%u\n", data_race(ctx->cached_cq_tail));
>         seq_printf(m, "SQEs:\t%u\n", sq_tail - sq_head);
> -       sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
> -       for (i = 0; i < sq_entries; i++) {
> -               unsigned int entry = i + sq_head;
> +       while (sq_head < sq_tail) {
>                 struct io_uring_sqe *sqe;
>                 unsigned int sq_idx;
> +               bool sqe128 = false;
> +               u8 sqe_flags;
>
>                 if (ctx->flags & IORING_SETUP_NO_SQARRAY)
>                         break;
> -               sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
> +               sq_idx = READ_ONCE(ctx->sq_array[sq_head & sq_mask]);
>                 if (sq_idx > sq_mask)
> -                       continue;
> +                       break;
>                 sqe = &ctx->sq_sqes[sq_idx << sq_shift];
> +               sqe_flags = READ_ONCE(sqe->flags);
> +               if (sq_shift)
> +                       sqe128 = true;
> +               else if (sqe_flags & IOSQE_SQE_128B) {
> +                       if (!(ctx->flags & IORING_SETUP_SQE_MIXED)) {
> +                               seq_printf(m,
> +                                       "%5u: invalid sqe, 128B entry on non-mixed sq\n",
> +                                       sq_idx);
> +                               break;
> +                       }
> +                       if ((++sq_head & sq_mask) == 0) {
> +                               seq_printf(m,
> +                                       "%5u: corrupted sqe, wrapping 128B entry\n",
> +                                       sq_idx);
> +                               break;
> +                       }
> +                       sqe128 = true;
> +               }
>                 seq_printf(m, "%5u: opcode:%s, fd:%d, flags:%x, off:%llu, "
>                               "addr:0x%llx, rw_flags:0x%x, buf_index:%d "
>                               "user_data:%llu",
>                            sq_idx, io_uring_get_opcode(sqe->opcode), sqe->fd,
> -                          sqe->flags, (unsigned long long) sqe->off,
> +                          sqe_flags, (unsigned long long) sqe->off,
>                            (unsigned long long) sqe->addr, sqe->rw_flags,
>                            sqe->buf_index, sqe->user_data);
> -               if (sq_shift) {
> +               if (sqe128) {
>                         u64 *sqeb = (void *) (sqe + 1);
>                         int size = sizeof(struct io_uring_sqe) / sizeof(u64);
>                         int j;
> @@ -120,6 +137,7 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *ctx, struct seq_file *m)
>                         }
>                 }
>                 seq_printf(m, "\n");
> +               sq_head++;
>         }
>         seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
>         while (cq_head < cq_tail) {
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 6c07efac977ce..78a81e882fce7 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2180,6 +2180,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
>         }
>         opcode = array_index_nospec(opcode, IORING_OP_LAST);
>
> +       if (ctx->flags & IORING_SETUP_SQE_MIXED &&
> +           req->flags & REQ_F_SQE_128B) {

Could use the local variable sqe_flags instead of req->flags

> +               if ((ctx->cached_sq_head & (ctx->sq_entries - 1)) == 0)
> +                       return io_init_fail_req(req, -EINVAL);
> +               ctx->cached_sq_head++;

I think the double increment of cached_sq_head breaks the logic in
io_sqring_entries(). The current assumption is that the difference
between the SQ tail and cached_sq_head is the number of SQEs available
to dispatch. And io_submit_sqes() will happily submit that many SQEs
without checking whether cached_sq_head has reached the SQ tail. This
assumption is broken now that one SQE may count for 2 increments of
the SQ indices.
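
Concretely (made-up numbers): with cached_sq_head == 0 and an SQ tail
of 4, io_sqring_entries() reports 4 entries, so io_submit_sqes() will
try to initialize 4 requests. But if the first two SQEs are both 128b,
they already consume all 4 slots, and the last two fetches read past
the tail.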

> +       }
> +
>         def = &io_issue_defs[opcode];
>         if (unlikely(sqe_flags & ~SQE_COMMON_FLAGS)) {
>                 /* enforce forwards compatibility on users */
> @@ -2793,6 +2800,10 @@ unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
>                 if (cq_entries < 2)
>                         return SIZE_MAX;
>         }
> +       if (flags & IORING_SETUP_SQE_MIXED) {
> +               if (sq_entries < 2)
> +                       return SIZE_MAX;
> +       }
>
>  #ifdef CONFIG_SMP
>         off = ALIGN(off, SMP_CACHE_BYTES);
> @@ -3724,6 +3735,13 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
>         if ((flags & (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED)) ==
>             (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED))
>                 return -EINVAL;
> +       /*
> +        * Nonsensical to ask for SQE128 and mixed SQE support, it's not
> +        * supported to post 64b SQEs on a ring setup with SQE128.
> +        */
> +       if ((flags & (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED)) ==
> +           (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED))
> +               return -EINVAL;
>
>         return 0;
>  }
> @@ -3952,7 +3970,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
>                         IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
>                         IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY |
>                         IORING_SETUP_NO_SQARRAY | IORING_SETUP_HYBRID_IOPOLL |
> -                       IORING_SETUP_CQE_MIXED))
> +                       IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED))
>                 return -EINVAL;
>
>         return io_uring_create(entries, &p, params);
> diff --git a/io_uring/register.c b/io_uring/register.c
> index aa5f56ad83584..29aa7d3b4e820 100644
> --- a/io_uring/register.c
> +++ b/io_uring/register.c
> @@ -397,7 +397,7 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
>  #define RESIZE_FLAGS   (IORING_SETUP_CQSIZE | IORING_SETUP_CLAMP)
>  #define COPY_FLAGS     (IORING_SETUP_NO_SQARRAY | IORING_SETUP_SQE128 | \
>                          IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
> -                        IORING_SETUP_CQE_MIXED)
> +                        IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
>
>  static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>  {
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 5562e8491c5bd..8dd409774fd87 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -239,7 +239,9 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>         if (ret)
>                 return ret;
>
> -       if (ctx->flags & IORING_SETUP_SQE128)
> +       if (ctx->flags & IORING_SETUP_SQE128 ||
> +           (ctx->flags & IORING_SETUP_SQE_MIXED &&
> +            req->flags & REQ_F_SQE_128B))
>                 issue_flags |= IO_URING_F_SQE128;

uring_sqe_size() still seems to be missing this REQ_F_SQE_128B logic?

Best,
Caleb

>         if (ctx->flags & (IORING_SETUP_CQE32 | IORING_SETUP_CQE_MIXED))
>                 issue_flags |= IO_URING_F_CQE32;
> --
> 2.47.3
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-10 17:44   ` Caleb Sander Mateos
@ 2025-09-11  0:28     ` Jens Axboe
  2025-09-11  2:11       ` Ming Lei
  2025-09-11  2:06     ` Keith Busch
  1 sibling, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2025-09-11  0:28 UTC (permalink / raw)
  To: Caleb Sander Mateos, Keith Busch; +Cc: io-uring, Keith Busch

On 9/10/25 11:44 AM, Caleb Sander Mateos wrote:
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index 04ebff33d0e62..9cef9085f52ee 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -146,6 +146,7 @@ enum io_uring_sqe_flags_bit {
>>         IOSQE_ASYNC_BIT,
>>         IOSQE_BUFFER_SELECT_BIT,
>>         IOSQE_CQE_SKIP_SUCCESS_BIT,
>> +       IOSQE_SQE_128B_BIT,
> 
> Have you given any thought to how we would handle the likely scenario
> that we want to define more SQE flags in the future? Are there
> existing unused bytes of the SQE where the new flags could go? If not,
> we may need to repurpose some existing but rarely used field. And then
> we'd likely want to reserve this last flag bit to specify whether the
> SQE is using this "extended flags" field.

Yep this is my main problem with this change. If you search the io_uring
list on lore you can find discussions about this in relation to when
Ming had his SQE grouping patches that also used this last bit. My
suggestion then was indeed to have this last flag be "look at XX for
IOSQE2_* flags". But it never quite got finalized. IIRC, my suggestion
back then was to use the personality field, since it's a pretty
specialized use case. Only issue with that is that you could then not
use IOSQE2_* flags with personality.
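
Roughly what I mean, as a sketch only (names and field reuse made up):

	/* last iosqe flag bit says "personality carries IOSQE2_* flags" */
	#define IOSQE_EXT_FLAGS		(1U << 7)
	#define IOSQE2_SQE_128B		(1U << 0)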

IOW, I think the IOSQE_SQE_128B flag is fine for prototyping and testing
these patches, but we unfortunately do need to iron out how on earth
we'll expose some more flags before this can go in.

Suggestions welcome...

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-10 17:44   ` Caleb Sander Mateos
  2025-09-11  0:28     ` Jens Axboe
@ 2025-09-11  2:06     ` Keith Busch
  1 sibling, 0 replies; 16+ messages in thread
From: Keith Busch @ 2025-09-11  2:06 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: Keith Busch, io-uring, axboe

On Wed, Sep 10, 2025 at 10:44:10AM -0700, Caleb Sander Mateos wrote:
> On Thu, Sep 4, 2025 at 12:27 PM Keith Busch <kbusch@meta.com> wrote:
> > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > index 04ebff33d0e62..9cef9085f52ee 100644
> > --- a/include/uapi/linux/io_uring.h
> > +++ b/include/uapi/linux/io_uring.h
> > @@ -146,6 +146,7 @@ enum io_uring_sqe_flags_bit {
> >         IOSQE_ASYNC_BIT,
> >         IOSQE_BUFFER_SELECT_BIT,
> >         IOSQE_CQE_SKIP_SUCCESS_BIT,
> > +       IOSQE_SQE_128B_BIT,
> 
> Have you given any thought to how we would handle the likely scenario
> that we want to define more SQE flags in the future? Are there
> existing unused bytes of the SQE where the new flags could go? If not,
> we may need to repurpose some existing but rarely used field. And then
> we'd likely want to reserve this last flag bit to specify whether the
> SQE is using this "extended flags" field.

Yeah, I mentioned in the cover letter that it may not be okay to take
this bit for the cause. Using it this way is just a simple way forward
for the proof-of-concept, to iron out handling mixed SQEs everywhere
else. I wouldn't remove the "RFC" prefix until we have agreement on how
to flag a big SQE command on a mixed SQ. One option, for example, might
be to take the highest opcode bit, since we're a ways off from needing
it for more ops.
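
Roughly (sketch only, name made up):

	/* top opcode bit flags a 128b SQE on a mixed SQ */
	#define IORING_OP_BIG_SQE	0x80

	u8 opcode = sqe->opcode & ~IORING_OP_BIG_SQE;
	bool sqe128 = sqe->opcode & IORING_OP_BIG_SQE;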

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-11  0:28     ` Jens Axboe
@ 2025-09-11  2:11       ` Ming Lei
  2025-09-11  2:19         ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2025-09-11  2:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, Keith Busch, io-uring, Keith Busch

On Wed, Sep 10, 2025 at 06:28:43PM -0600, Jens Axboe wrote:
> On 9/10/25 11:44 AM, Caleb Sander Mateos wrote:
> >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> >> index 04ebff33d0e62..9cef9085f52ee 100644
> >> --- a/include/uapi/linux/io_uring.h
> >> +++ b/include/uapi/linux/io_uring.h
> >> @@ -146,6 +146,7 @@ enum io_uring_sqe_flags_bit {
> >>         IOSQE_ASYNC_BIT,
> >>         IOSQE_BUFFER_SELECT_BIT,
> >>         IOSQE_CQE_SKIP_SUCCESS_BIT,
> >> +       IOSQE_SQE_128B_BIT,
> > 
> > Have you given any thought to how we would handle the likely scenario
> > that we want to define more SQE flags in the future? Are there
> > existing unused bytes of the SQE where the new flags could go? If not,
> > we may need to repurpose some existing but rarely used field. And then
> > we'd likely want to reserve this last flag bit to specify whether the
> > SQE is using this "extended flags" field.
> 
> Yep this is my main problem with this change. If you search the io_uring
> list on lore you can find discussions about this in relation to when
> Ming had his SQE grouping patches that also used this last bit. My
> suggestion then was indeed to have this last flag be "look at XX for
> IOSQE2_* flags". But it never quite got finalized. IIRC, my suggestion
> back then was to use the personality field, since it's a pretty
> specialized use case. Only issue with that is that you could then not
> use IOSQE2_* flags with personality.
> 
> IOW, I think the IOSQE_SQE_128B flag is fine for prototyping and testing
> these patches, but we unfortunately do need to iron out how on earth
> we'll expose some more flags before this can go in.

SQE128 is used for uring_cmd only, so it could be one uring_cmd
private flag. However, the implementation may be ugly and fragile.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-11  2:11       ` Ming Lei
@ 2025-09-11  2:19         ` Ming Lei
  2025-09-11 13:02           ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2025-09-11  2:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Caleb Sander Mateos, Keith Busch, io-uring, Keith Busch

On Thu, Sep 11, 2025 at 10:11:47AM +0800, Ming Lei wrote:
> On Wed, Sep 10, 2025 at 06:28:43PM -0600, Jens Axboe wrote:
> > On 9/10/25 11:44 AM, Caleb Sander Mateos wrote:
> > >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > >> index 04ebff33d0e62..9cef9085f52ee 100644
> > >> --- a/include/uapi/linux/io_uring.h
> > >> +++ b/include/uapi/linux/io_uring.h
> > >> @@ -146,6 +146,7 @@ enum io_uring_sqe_flags_bit {
> > >>         IOSQE_ASYNC_BIT,
> > >>         IOSQE_BUFFER_SELECT_BIT,
> > >>         IOSQE_CQE_SKIP_SUCCESS_BIT,
> > >> +       IOSQE_SQE_128B_BIT,
> > > 
> > > Have you given any thought to how we would handle the likely scenario
> > > that we want to define more SQE flags in the future? Are there
> > > existing unused bytes of the SQE where the new flags could go? If not,
> > > we may need to repurpose some existing but rarely used field. And then
> > > we'd likely want to reserve this last flag bit to specify whether the
> > > SQE is using this "extended flags" field.
> > 
> > Yep this is my main problem with this change. If you search the io_uring
> > list on lore you can find discussions about this in relation to when
> > Ming had his SQE grouping patches that also used this last bit. My
> > suggestion then was indeed to have this last flag be "look at XX for
> > IOSQE2_* flags". But it never quite got finalized. IIRC, my suggestion
> > back then was to use the personality field, since it's a pretty
> > specialized use case. Only issue with that is that you could then not
> > use IOSQE2_* flags with personality.
> > 
> > IOW, I think the IOSQE_SQE_128B flag is fine for prototyping and testing
> > these patches, but we unfortunately do need to iron out how on earth
> > we'll expose some more flags before this can go in.
> 
> SQE128 is used for uring_cmd only, so it could be one uring_cmd
> private flag. However, the implementation may be ugly and fragile.

Or in case of IORING_SETUP_SQE_MIXED, IORING_OP_URING_CMD is always interpreted
as plain 64bit SQE, also add IORING_OP_URING_CMD128 for SQE128 only.
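
Something like (illustrative only):

	IORING_OP_URING_CMD,	/* plain 64b SQE on SQE_MIXED rings */
	IORING_OP_URING_CMD128,	/* always occupies two 64b slots */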


Thanks,
Ming


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-11  2:19         ` Ming Lei
@ 2025-09-11 13:02           ` Keith Busch
  2025-09-11 13:07             ` Ming Lei
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-09-11 13:02 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, Caleb Sander Mateos, Keith Busch, io-uring

On Thu, Sep 11, 2025 at 10:19:06AM +0800, Ming Lei wrote:
> On Thu, Sep 11, 2025 at 10:11:47AM +0800, Ming Lei wrote:
> > SQE128 is used for uring_cmd only, so it could be one uring_cmd
> > private flag. However, the implementation may be ugly and fragile.
> 
> Or in case of IORING_SETUP_SQE_MIXED, IORING_OP_URING_CMD is always interpreted
> as plain 64bit SQE, also add IORING_OP_URING_CMD128 for SQE128 only.

Maybe that's good enough, but I was looking for more flexibility to have
big SQEs for read/write too. Not that I have a strong use case for it
now, but in hindsight, that's where "io_uring_attr_pi" should have been
placed instead of outside the submission queue.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-11 13:02           ` Keith Busch
@ 2025-09-11 13:07             ` Ming Lei
  2025-09-17 14:44               ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2025-09-11 13:07 UTC (permalink / raw)
  To: Keith Busch; +Cc: Jens Axboe, Caleb Sander Mateos, Keith Busch, io-uring

On Thu, Sep 11, 2025 at 9:02 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Thu, Sep 11, 2025 at 10:19:06AM +0800, Ming Lei wrote:
> > On Thu, Sep 11, 2025 at 10:11:47AM +0800, Ming Lei wrote:
> > > SQE128 is used for uring_cmd only, so it could be one uring_cmd
> > > private flag. However, the implementation may be ugly and fragile.
> >
> > Or in case of IORING_SETUP_SQE_MIXED, IORING_OP_URING_CMD is always interpreted
> > as plain 64bit SQE, also add IORING_OP_URING_CMD128 for SQE128 only.
>
> Maybe that's good enough, but I was looking for more flexibility to have
> big SQEs for read/write too. Not that I have a strong use case for it
> now, but in hindsight, that's where "io_uring_attr_pi" should have been
> placed instead of outside the submission queue.

Then you can add READ128/WRITE128...

Thanks,


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED
  2025-09-04 19:27 ` [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED Keith Busch
@ 2025-09-11 16:27   ` Caleb Sander Mateos
  0 siblings, 0 replies; 16+ messages in thread
From: Caleb Sander Mateos @ 2025-09-11 16:27 UTC (permalink / raw)
  To: Keith Busch; +Cc: io-uring, axboe, Keith Busch

On Thu, Sep 4, 2025 at 12:27 PM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> This adds core support for mixed-size SQEs in the same SQ ring. Before
> this, SQEs were either 64b in size (the normal size), or 128b if
> IORING_SETUP_SQE128 was set at ring initialization. With mixed support,
> an SQE may be either 64b or 128b on the same SQ ring. If the SQE is
> 128b in size, then IOSQE_SQE_128B will be set in the SQE flags. The
> client may post a NOP SQE with IOSQE_CQE_SKIP_SUCCESS set, which the
> kernel should simply ignore; it's just a pad filler, posted when
> required.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  src/include/liburing.h          | 31 +++++++++++++++++++++++++++++++
>  src/include/liburing/io_uring.h |  9 +++++++++
>  2 files changed, 40 insertions(+)
>
> diff --git a/src/include/liburing.h b/src/include/liburing.h
> index 7ea876e1..97c70fa7 100644
> --- a/src/include/liburing.h
> +++ b/src/include/liburing.h
> @@ -1853,6 +1853,37 @@ IOURINGINLINE struct io_uring_sqe *_io_uring_get_sqe(struct io_uring *ring)
>         return sqe;
>  }
>
> +IOURINGINLINE struct io_uring_sqe *io_uring_get_sqe128_mixed(struct io_uring *ring)
> +       LIBURING_NOEXCEPT
> +{
> +       struct io_uring_sq *sq = &ring->sq;
> +       unsigned head = io_uring_load_sq_head(ring), tail = sq->sqe_tail;
> +       struct io_uring_sqe *sqe;
> +
> +       if (!(ring->flags & IORING_SETUP_SQE_MIXED))
> +               return NULL;
> +
> +       if ((tail & sq->ring_mask) + 1 == sq->ring_entries) {

The condition you used on the kernel side is probably a bit more efficient:
((tail + 1) & sq->ring_mask) == 0

> +               sqe = _io_uring_get_sqe(ring);
> +               if (!sqe)
> +                       return NULL;
> +
> +               io_uring_prep_nop(sqe);
> +               sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
> +               tail = sq->sqe_tail;
> +       }
> +
> +       if ((tail + 1) - head >= sq->ring_entries)
> +               return NULL;

Would it make sense to check for a full SQ before creating a NOP SQE
to avoid wasted work if the actual SQE can't be posted?
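
Roughly (untested sketch):

	/* room for the 128b SQE, plus a pad NOP if it would wrap? */
	unsigned need = 2 + (((tail + 1) & sq->ring_mask) == 0);

	if ((tail + need) - head > sq->ring_entries)
		return NULL;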

Best,
Caleb

> +
> +       sqe = &sq->sqes[tail & sq->ring_mask];
> +       sq->sqe_tail = tail + 2;
> +       io_uring_initialize_sqe(sqe);
> +       sqe->flags |= IOSQE_SQE_128B;
> +
> +       return sqe;
> +}
> +
>  /*
>   * Return the appropriate mask for a buffer ring of size 'ring_entries'
>   */
> diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
> index 643514e5..fd02fa52 100644
> --- a/src/include/liburing/io_uring.h
> +++ b/src/include/liburing/io_uring.h
> @@ -126,6 +126,7 @@ enum io_uring_sqe_flags_bit {
>         IOSQE_ASYNC_BIT,
>         IOSQE_BUFFER_SELECT_BIT,
>         IOSQE_CQE_SKIP_SUCCESS_BIT,
> +       IOSQE_SQE_128B_BIT,
>  };
>
>  /*
> @@ -145,6 +146,8 @@ enum io_uring_sqe_flags_bit {
>  #define IOSQE_BUFFER_SELECT    (1U << IOSQE_BUFFER_SELECT_BIT)
>  /* don't post CQE if request succeeded */
>  #define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
> +/* this is a 128b/big-sqe posting */
> +#define IOSQE_SQE_128B          (1U << IOSQE_SQE_128B_BIT)
>
>  /*
>   * io_uring_setup() flags
> @@ -211,6 +214,12 @@ enum io_uring_sqe_flags_bit {
>   */
>  #define IORING_SETUP_CQE_MIXED         (1U << 18)
>
> +/*
> + *  Allow both 64b and 128b SQEs. If a 128b SQE is posted, it will have
> + *  IOSQE_SQE_128B set in sqe->flags.
> + */
> +#define IORING_SETUP_SQE_MIXED         (1U << 19)
> +
>  enum io_uring_op {
>         IORING_OP_NOP,
>         IORING_OP_READV,
> --
> 2.47.3
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-11 13:07             ` Ming Lei
@ 2025-09-17 14:44               ` Jens Axboe
  2025-09-18 21:22                 ` Keith Busch
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2025-09-17 14:44 UTC (permalink / raw)
  To: Ming Lei, Keith Busch; +Cc: Caleb Sander Mateos, Keith Busch, io-uring

On 9/11/25 7:07 AM, Ming Lei wrote:
> On Thu, Sep 11, 2025 at 9:02?PM Keith Busch <kbusch@kernel.org> wrote:
>>
>> On Thu, Sep 11, 2025 at 10:19:06AM +0800, Ming Lei wrote:
>>> On Thu, Sep 11, 2025 at 10:11:47AM +0800, Ming Lei wrote:
>>>> SQE128 is used for uring_cmd only, so it could be one uring_cmd
>>>> private flag. However, the implementation may be ugly and fragile.
>>>
>>> Or in case of IORING_SETUP_SQE_MIXED, IORING_OP_URING_CMD is always interpreted
>>> as plain 64bit SQE, also add IORING_OP_URING_CMD128 for SQE128 only.
>>
>> Maybe that's good enough, but I was looking for more flexibility to have
>> big SQEs for read/write too. Not that I have a strong use case for it
>> now, but in hindsight, that's where "io_uring_attr_pi" should have been
>> placed instead of outside the submission queue.
> 
> Then you can add READ128/WRITE128...

Yeah, I do think this is the best approach - make it implied by the
opcode. Doesn't mean we have to bifurcate the whole opcode space,
as generally not a lot of opcodes will want/need a 128b SQE.

And it also nicely avoids needing to solve the flags space issue.

So maybe spin a v3 with that approach?

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-17 14:44               ` Jens Axboe
@ 2025-09-18 21:22                 ` Keith Busch
  2025-09-18 23:35                   ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Keith Busch @ 2025-09-18 21:22 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Ming Lei, Caleb Sander Mateos, Keith Busch, io-uring

On Wed, Sep 17, 2025 at 08:44:13AM -0600, Jens Axboe wrote:
> On 9/11/25 7:07 AM, Ming Lei wrote:
> > On Thu, Sep 11, 2025 at 9:02?PM Keith Busch <kbusch@kernel.org> wrote:
> >>
> >> On Thu, Sep 11, 2025 at 10:19:06AM +0800, Ming Lei wrote:
> >>> On Thu, Sep 11, 2025 at 10:11:47AM +0800, Ming Lei wrote:
> >>>> SQE128 is used for uring_cmd only, so it could be one uring_cmd
> >>>> private flag. However, the implementation may be ugly and fragile.
> >>>
> >>> Or in case of IORING_SETUP_SQE_MIXED, IORING_OP_URING_CMD is always interpreted
> >>> as plain 64bit SQE, also add IORING_OP_URING_CMD128 for SQE128 only.
> >>
> >> Maybe that's good enough, but I was looking for more flexibility to have
> >> big SQEs for read/write too. Not that I have a strong use case for it
> >> now, but in hindsight, that's where "io_uring_attr_pi" should have been
> >> placed instead of outside the submission queue.
> > 
> > Then you can add READ128/WRITE128...
> 
> > Yeah, I do think this is the best approach - make it implied by the
> > opcode. Doesn't mean we have to bifurcate the whole opcode space,
> > as generally not a lot of opcodes will want/need a 128b SQE.
> > 
> > And it also nicely avoids needing to solve the flags space issue.
> 
> So maybe spin a v3 with that approach?

Yep, almost got it ready. I had to introduce NOP128 because that's a
very convenient op for testing. I hope that's okay.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED
  2025-09-18 21:22                 ` Keith Busch
@ 2025-09-18 23:35                   ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2025-09-18 23:35 UTC (permalink / raw)
  To: Keith Busch; +Cc: Ming Lei, Caleb Sander Mateos, Keith Busch, io-uring

On 9/18/25 3:22 PM, Keith Busch wrote:
> On Wed, Sep 17, 2025 at 08:44:13AM -0600, Jens Axboe wrote:
>> On 9/11/25 7:07 AM, Ming Lei wrote:
>>> On Thu, Sep 11, 2025 at 9:02?PM Keith Busch <kbusch@kernel.org> wrote:
>>>>
>>>> On Thu, Sep 11, 2025 at 10:19:06AM +0800, Ming Lei wrote:
>>>>> On Thu, Sep 11, 2025 at 10:11:47AM +0800, Ming Lei wrote:
>>>>>> SQE128 is used for uring_cmd only, so it could be one uring_cmd
>>>>>> private flag. However, the implementation may be ugly and fragile.
>>>>>
>>>>> Or in case of IORING_SETUP_SQE_MIXED, IORING_OP_URING_CMD is always interpreted
>>>>> as plain 64bit SQE, also add IORING_OP_URING_CMD128 for SQE128 only.
>>>>
>>>> Maybe that's good enough, but I was looking for more flexibility to have
>>>> big SQEs for read/write too. Not that I have a strong use case for it
>>>> now, but in hindsight, that's where "io_uring_attr_pi" should have been
>>>> placed instead of outside the submission queue.
>>>
>>> Then you can add READ128/WRITE128...
>>
>> Yeah, I do think this is the best approach - make it implied by the
>> opcode. Doesn't mean we have to bifurcate the whole opcode space,
>> as generally not a lot of opcodes will want/need a 128b SQE.
>>
>> And it also nicely avoids needing to solve the flags space issue.
>>
>> So maybe spin a v3 with that approach?
> 
> Yep, almost got it ready. I had to introduce NOP128 because that's a
> very convenient op for testing. I hope that's okay.

Yep that's fine, I'm assuming they'd basically use the same opcode
handler anyway.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2025-09-18 23:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-04 19:27 [RFC PATCHv2 0/1] Keith Busch
2025-09-04 19:27 ` [RFC PATCHv2 1/3] Add support IORING_SETUP_SQE_MIXED Keith Busch
2025-09-11 16:27   ` Caleb Sander Mateos
2025-09-04 19:27 ` [RFC PATCHv2 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED Keith Busch
2025-09-10 17:44   ` Caleb Sander Mateos
2025-09-11  0:28     ` Jens Axboe
2025-09-11  2:11       ` Ming Lei
2025-09-11  2:19         ` Ming Lei
2025-09-11 13:02           ` Keith Busch
2025-09-11 13:07             ` Ming Lei
2025-09-17 14:44               ` Jens Axboe
2025-09-18 21:22                 ` Keith Busch
2025-09-18 23:35                   ` Jens Axboe
2025-09-11  2:06     ` Keith Busch
2025-09-04 19:27 ` [RFC PATCHv2 2/3] Add nop testing " Keith Busch
2025-09-04 19:27 ` [RFC PATCHv2 3/3] Add mixed sqe test for uring commands Keith Busch
