public inbox for io-uring@vger.kernel.org
From: Caleb Sander Mateos <csander@purestorage.com>
To: Keith Busch <kbusch@meta.com>
Cc: io-uring@vger.kernel.org, axboe@kernel.dk, ming.lei@redhat.com,
	 Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCHv3 1/3] Add support IORING_SETUP_SQE_MIXED
Date: Wed, 24 Sep 2025 13:20:44 -0700	[thread overview]
Message-ID: <CADUfDZrmFphH5AwNkLs=OtPg9qfnpciJB--28PVQ4q=5Fh21TQ@mail.gmail.com> (raw)
In-Reply-To: <20250924151210.619099-2-kbusch@meta.com>

On Wed, Sep 24, 2025 at 8:12 AM Keith Busch <kbusch@meta.com> wrote:
>
> From: Keith Busch <kbusch@kernel.org>
>
> This adds core support for mixed sized SQEs in the same SQ ring. Before
> this, SQEs were either 64b in size (the normal size), or 128b if
> IORING_SETUP_SQE128 was set in the ring initialization. With the mixed
> support, an SQE may be either 64b or 128b on the same SQ ring. If the
> SQE is 128b in size, then a 128b opcode will be set in the sqe op. When
> acquiring a large sqe at the end of the sq, the client may post a NOP
> SQE with IOSQE_CQE_SKIP_SUCCESS set that the kernel should simply ignore
> as it's just a pad filler that is posted when required.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  src/include/liburing.h          | 50 +++++++++++++++++++++++++++++++++
>  src/include/liburing/io_uring.h | 11 ++++++++
>  2 files changed, 61 insertions(+)
>
> diff --git a/src/include/liburing.h b/src/include/liburing.h
> index 052d6b56..66f1b990 100644
> --- a/src/include/liburing.h
> +++ b/src/include/liburing.h
> @@ -575,6 +575,7 @@ IOURINGINLINE void io_uring_initialize_sqe(struct io_uring_sqe *sqe)
>         sqe->buf_index = 0;
>         sqe->personality = 0;
>         sqe->file_index = 0;
> +       sqe->addr2 = 0;

Why is this necessary for mixed SQE size support? It looks like this
field is already initialized in io_uring_prep_rw() via the unioned off
field. Though, to be honest, I can't say I understand why the
initialization of the SQE fields is split between
io_uring_initialize_sqe() and io_uring_prep_rw().

>         sqe->addr3 = 0;
>         sqe->__pad2[0] = 0;
>  }
> @@ -799,6 +800,12 @@ IOURINGINLINE void io_uring_prep_nop(struct io_uring_sqe *sqe)
>         io_uring_prep_rw(IORING_OP_NOP, sqe, -1, NULL, 0, 0);
>  }
>
> +IOURINGINLINE void io_uring_prep_nop128(struct io_uring_sqe *sqe)
> +       LIBURING_NOEXCEPT
> +{
> +       io_uring_prep_rw(IORING_OP_NOP128, sqe, -1, NULL, 0, 0);
> +}
> +
>  IOURINGINLINE void io_uring_prep_timeout(struct io_uring_sqe *sqe,
>                                          struct __kernel_timespec *ts,
>                                          unsigned count, unsigned flags)
> @@ -1882,6 +1889,49 @@ IOURINGINLINE struct io_uring_sqe *_io_uring_get_sqe(struct io_uring *ring)
>         return sqe;
>  }
>
> +/*
> + * Return a 128B sqe to fill. Applications must later call io_uring_submit()
> + * when they're ready to tell the kernel about it. The caller may call this
> + * function multiple times before calling io_uring_submit().
> + *
> + * Returns a vacant 128B sqe, or NULL if we're full. If the current tail is the
> + * last entry in the ring, this function will insert a nop + skip complete such
> + * that the 128b entry wraps back to the beginning of the queue for a
> + * contiguous big sq entry. It's up to the caller to use a 128b opcode in order
> + * for the kernel to know how to advance its sq head pointer.
> + */
> +IOURINGINLINE struct io_uring_sqe *io_uring_get_sqe128_mixed(struct io_uring *ring)
> +       LIBURING_NOEXCEPT
> +{
> +       struct io_uring_sq *sq = &ring->sq;
> +       unsigned head = io_uring_load_sq_head(ring), tail = sq->sqe_tail;
> +       struct io_uring_sqe *sqe;
> +
> +       if (!(ring->flags & IORING_SETUP_SQE_MIXED))
> +               return NULL;
> +
> +       if (((tail + 1) & sq->ring_mask) == 0) {
> +               if ((tail + 2) - head >= sq->ring_entries)
> +                       return NULL;
> +
> +               sqe = _io_uring_get_sqe(ring);
> +               if (!sqe)
> +                       return NULL;

This case should be impossible since we just checked there is an empty
SQ slot at the end of the ring plus two more at the beginning.

> +
> +               io_uring_prep_nop(sqe);
> +               sqe->flags |= IOSQE_CQE_SKIP_SUCCESS;
> +               tail = sq->sqe_tail;
> +       } else if ((tail + 1) - head >= sq->ring_entries) {
> +               return NULL;
> +       }
> +
> +       sqe = &sq->sqes[tail & sq->ring_mask];
> +       sq->sqe_tail = tail + 2;
> +       io_uring_initialize_sqe(sqe);
> +
> +       return sqe;
> +}
> +
>  /*
>   * Return the appropriate mask for a buffer ring of size 'ring_entries'
>   */
> diff --git a/src/include/liburing/io_uring.h b/src/include/liburing/io_uring.h
> index 31396057..1e0b6398 100644
> --- a/src/include/liburing/io_uring.h
> +++ b/src/include/liburing/io_uring.h
> @@ -126,6 +126,7 @@ enum io_uring_sqe_flags_bit {
>         IOSQE_ASYNC_BIT,
>         IOSQE_BUFFER_SELECT_BIT,
>         IOSQE_CQE_SKIP_SUCCESS_BIT,
> +       IOSQE_SQE_128B_BIT,

I thought we decided against using an SQE flag bit for this? Looks
like this needs to be re-synced with the kernel uapi header.

Best,
Caleb

>  };
>
>  /*
> @@ -145,6 +146,8 @@ enum io_uring_sqe_flags_bit {
>  #define IOSQE_BUFFER_SELECT    (1U << IOSQE_BUFFER_SELECT_BIT)
>  /* don't post CQE if request succeeded */
>  #define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
> +/* this is a 128b/big-sqe posting */
> +#define IOSQE_SQE_128B          (1U << IOSQE_SQE_128B_BIT)
>
>  /*
>   * io_uring_setup() flags
> @@ -211,6 +214,12 @@ enum io_uring_sqe_flags_bit {
>   */
>  #define IORING_SETUP_CQE_MIXED         (1U << 18)
>
> +/*
> + *  Allow both 64b and 128b SQEs. If a 128b SQE is posted, it will have
> + *  IOSQE_SQE_128B set in sqe->flags.
> + */
> +#define IORING_SETUP_SQE_MIXED         (1U << 19)
> +
>  enum io_uring_op {
>         IORING_OP_NOP,
>         IORING_OP_READV,
> @@ -275,6 +284,8 @@ enum io_uring_op {
>         IORING_OP_READV_FIXED,
>         IORING_OP_WRITEV_FIXED,
>         IORING_OP_PIPE,
> +       IORING_OP_NOP128,
> +       IORING_OP_URING_CMD128,
>
>         /* this goes last, obviously */
>         IORING_OP_LAST,
> --
> 2.47.3
>

Thread overview: 12+ messages
2025-09-24 15:12 [PATCHv3 0/3] Keith Busch
2025-09-24 15:12 ` [PATCHv3 1/3] Add support IORING_SETUP_SQE_MIXED Keith Busch
2025-09-24 20:20   ` Caleb Sander Mateos [this message]
2025-09-24 20:30     ` Keith Busch
2025-09-24 20:37       ` Caleb Sander Mateos
2025-09-24 15:12 ` [PATCHv3 1/1] io_uring: add support for IORING_SETUP_SQE_MIXED Keith Busch
2025-09-25 15:03   ` Jens Axboe
2025-09-25 18:21     ` Caleb Sander Mateos
2025-09-25 18:44       ` Jens Axboe
2025-09-24 15:12 ` [PATCHv3 2/3] Add nop testing " Keith Busch
2025-09-24 15:12 ` [PATCHv3 3/3] Add mixed sqe test for uring commands Keith Busch
2025-09-24 15:54 ` [PATCHv3 0/3] io_uring: mixed submission queue size support Keith Busch
