From: Caleb Sander Mateos <csander@purestorage.com>
To: Keith Busch <kbusch@meta.com>
Cc: io-uring@vger.kernel.org, axboe@kernel.dk, ming.lei@redhat.com,
Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCHv3 1/3] Add support IORING_SETUP_SQE_MIXED
Date: Wed, 24 Sep 2025 13:20:44 -0700
Message-ID: <CADUfDZrmFphH5AwNkLs=OtPg9qfnpciJB--28PVQ4q=5Fh21TQ@mail.gmail.com>
In-Reply-To: <20250924151210.619099-2-kbusch@meta.com>
On Wed, Sep 24, 2025 at 8:12 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> This adds core support for mixed-size SQEs in the same SQ ring. Before
> this, SQEs were either 64b in size (the normal size), or 128b if
> IORING_SETUP_SQE128 was set at ring initialization. With mixed support,
> an SQE may be either 64b or 128b on the same SQ ring. If the SQE is
> 128b in size, a 128b opcode will be set in the sqe's opcode field. When
> acquiring a large SQE at the end of the SQ, the client may post a NOP
> SQE with IOSQE_CQE_SKIP_SUCCESS set, which the kernel should simply
> ignore as it's just pad filler posted when required.
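If I'm reading the scheme right, the wrap case looks something like
this (my own sketch, with a hypothetical 8-entry SQ where the tail has
reached the last slot):

    slot:    0    1    2 .. 6    7
           [128b SQE ]  (free)  [64b NOP pad, IOSQE_CQE_SKIP_SUCCESS]

i.e. a 128b SQE that would otherwise start in the last slot is instead
preceded by a 64b NOP pad there, so the big SQE lands contiguously at
the front of the ring.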
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> src/include/liburing.h | 50 +++++++++++++++++++++++++++++++++
> src/include/liburing/io_uring.h | 11 ++++++++
> 2 files changed, 61 insertions(+)
>
> diff --git a/src/include/liburing.h b/src/include/liburing.h
> index 052d6b56..66f1b990 100644
> --- a/src/include/liburing.h
> +++ b/src/include/liburing.h
> @@ -575,6 +575,7 @@ IOURINGINLINE void io_uring_initialize_sqe(struct io_uring_sqe *sqe)
> sqe->buf_index = 0;
> sqe->personality = 0;
> sqe->file_index = 0;
> + sqe->addr2 = 0;
Why is this necessary for mixed SQE size support? It looks like this
field is already initialized in io_uring_prep_rw() via the unioned off
field. Though, to be honest, I can't say I understand why the
initialization of the SQE fields is split between
io_uring_initialize_sqe() and io_uring_prep_rw().
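For reference, this is how I read the relevant union in the uapi
struct (paraphrased, other fields elided):

    struct io_uring_sqe {
            ...
            union {
                    __u64   off;    /* offset into file */
                    __u64   addr2;
                    ...
            };
            ...
    };

so io_uring_prep_rw()'s store to sqe->off should already be writing
the same bytes as sqe->addr2, which makes this extra zeroing look
redundant to me.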
> sqe->addr3 = 0;
> sqe->__pad2[0] = 0;
> }
> @@ -799,6 +800,12 @@ IOURINGINLINE void io_uring_prep_nop(struct io_uring_sqe *sqe)
> io_uring_prep_rw(IORING_OP_NOP, sqe, -1, NULL, 0, 0);
> }
>
> +IOURINGINLINE void io_uring_prep_nop128(struct io_uring_sqe *sqe)
> + LIBURING_NOEXCEPT
> +{
> + io_uring_prep_rw(IORING_OP_NOP128, sqe, -1, NULL, 0, 0);
> +}
> +
> IOURINGINLINE void io_uring_prep_timeout(struct io_uring_sqe *sqe,
> struct __kernel_timespec *ts,
> unsigned count, unsigned flags)
> @@ -1882,6 +1889,49 @@ IOURINGINLINE struct io_uring_sqe *_io_uring_get_sqe(struct io_uring *ring)
> return sqe;
> }
>
> +/*
> + * Return a 128B sqe to fill. Applications must later call io_uring_submit()
> + * when they're ready to tell the kernel about it. The caller may call this
> + * function multiple times before calling io_uring_submit().
> + *
> + * Returns a vacant 128B sqe, or NULL if we're full. If the current tail is the
> + * last entry in the ring, this function inserts a nop + skip-complete pad so
> + * that the 128b entry starts back at the beginning of the queue and stays
> + * contiguous. It's up to the caller to use a 128b opcode so the kernel knows
> + * how to advance its sq head pointer.
> + */
> +IOURINGINLINE struct io_uring_sqe *io_uring_get_sqe128_mixed(struct io_uring *ring)
> + LIBURING_NOEXCEPT
> +{
> + struct io_uring_sq *sq = &ring->sq;
> + unsigned head = io_uring_load_sq_head(ring), tail = sq->sqe_tail;
> + struct io_uring_sqe *sqe;
> +
> + if (!(ring->flags & IORING_SETUP_SQE_MIXED))
> + return NULL;
> +
> + if (((tail + 1) & sq->ring_mask) == 0) {
> + if ((tail + 2) - head >= sq->ring_entries)
> + return NULL;
> +
> + sqe = _io_uring_get_sqe(ring);
> + if (!sqe)
> + return NULL;
This case should be impossible: the (tail + 2) - head check above
guarantees at least three free slots, one for the pad at the end of
the ring plus two more at the beginning.
> +
> + io_uring_prep_nop(sqe);
> + sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
> + tail = sq->sqe_tail;
> + } else if ((tail + 1) - head >= sq->ring_entries) {
> + return NULL;
> + }
> +
> + sqe = &sq->sqes[tail & sq->ring_mask];
> + sq->sqe_tail = tail + 2;
> + io_uring_initialize_sqe(sqe);
> +
> + return sqe;
> +}
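As an aside, the usage pattern I have in mind for this helper is
something like the following (sketch only, assuming a ring already set
up with IORING_SETUP_SQE_MIXED):

    struct io_uring_sqe *sqe = io_uring_get_sqe128_mixed(&ring);

    if (sqe) {
            /* must be a 128b opcode, e.g. the new IORING_OP_NOP128 */
            io_uring_prep_nop128(sqe);
            io_uring_submit(&ring);
    }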
> +
> /*
> * Return the appropriate mask for a buffer ring of size 'ring_entries'
> */
> diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
> index 31396057..1e0b6398 100644
> --- a/src/include/liburing/io_uring.h
> +++ b/src/include/liburing/io_uring.h
> @@ -126,6 +126,7 @@ enum io_uring_sqe_flags_bit {
> IOSQE_ASYNC_BIT,
> IOSQE_BUFFER_SELECT_BIT,
> IOSQE_CQE_SKIP_SUCCESS_BIT,
> + IOSQE_SQE_128B_BIT,
I thought we decided against using an SQE flag bit for this? Looks
like this needs to be re-synced with the kernel uapi header.
Best,
Caleb
> };
>
> /*
> @@ -145,6 +146,8 @@ enum io_uring_sqe_flags_bit {
> #define IOSQE_BUFFER_SELECT (1U << IOSQE_BUFFER_SELECT_BIT)
> /* don't post CQE if request succeeded */
> #define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
> +/* this is a 128b/big-sqe posting */
> +#define IOSQE_SQE_128B (1U << IOSQE_SQE_128B_BIT)
>
> /*
> * io_uring_setup() flags
> @@ -211,6 +214,12 @@ enum io_uring_sqe_flags_bit {
> */
> #define IORING_SETUP_CQE_MIXED (1U << 18)
>
> +/*
> + * Allow both 64b and 128b SQEs. If a 128b SQE is posted, it will have
> + * IOSQE_SQE_128B set in sqe->flags.
> + */
> +#define IORING_SETUP_SQE_MIXED (1U << 19)
> +
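For anyone following along, ring setup with the new flag would
presumably look like this (sketch only, given the kernel side is
still under review):

    struct io_uring ring;
    int ret = io_uring_queue_init(64, &ring, IORING_SETUP_SQE_MIXED);

    if (ret < 0)
            return ret; /* e.g. a kernel without mixed SQE support */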
> enum io_uring_op {
> IORING_OP_NOP,
> IORING_OP_READV,
> @@ -275,6 +284,8 @@ enum io_uring_op {
> IORING_OP_READV_FIXED,
> IORING_OP_WRITEV_FIXED,
> IORING_OP_PIPE,
> + IORING_OP_NOP128,
> + IORING_OP_URING_CMD128,
>
> /* this goes last, obviously */
> IORING_OP_LAST,
> --
> 2.47.3
>