public inbox for io-uring@vger.kernel.org
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org, Keith Busch <kbusch@kernel.org>
Subject: Re: [PATCHSET v2 0/8] Add support for mixed sized CQEs
Date: Thu, 21 Aug 2025 11:19:58 -0700	[thread overview]
Message-ID: <CADUfDZrbt1Yz7KwxEmOUc+Z+jgOvTzzqOq2cM91VBPXw-PEDAQ@mail.gmail.com> (raw)
In-Reply-To: <670929ea-b614-40cf-b5cc-929a39d9e59d@kernel.dk>

On Thu, Aug 21, 2025 at 10:46 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/21/25 11:41 AM, Caleb Sander Mateos wrote:
> > On Thu, Aug 21, 2025 at 10:12 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>
> >> On 8/21/25 11:02 AM, Caleb Sander Mateos wrote:
> >>> On Thu, Aug 21, 2025 at 7:28 AM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Currently io_uring supports two modes for CQEs:
> >>>>
> >>>> 1) The standard mode, where 16b CQEs are used
> >>>> 2) Setting IORING_SETUP_CQE32, which makes all CQEs posted 32b
> >>>>
> >>>> Certain features need to pass more information back than just a single
> >>>> 32-bit res field, and hence mandate the use of CQE32 to be able to work.
> >>>> Examples of that include passthrough or other uses of ->uring_cmd() like
> >>>> socket option getting and setting, including timestamps.
> >>>>
> >>>> This patchset adds support for IORING_SETUP_CQE_MIXED, which allows
> >>>> posting both 16b and 32b CQEs on the same CQ ring. The idea here is that
> >>>> we need not waste twice the space for CQ rings, or use twice the space
> >>>> per CQE posted, if only some of the CQEs posted require the use of 32b
> >>>> CQEs. On a ring setup in CQE mixed mode, 32b posted CQEs will have
> >>>> IORING_CQE_F_32 set in cqe->flags to tell the application (or liburing)
> >>>> about this fact.
> >>>
> >>> This makes a lot of sense. Have you considered something analogous for
> >>> SQEs? Requiring all SQEs to be 128 bytes when an io_uring is used for
> >>> a mix of 64-byte and 128-byte SQEs also wastes memory, probably even
> >>> more since SQEs are 4x larger than CQEs.
> >>
> >> Adding Keith, as he and I literally just talked about that. My answer
> >> was that the case is a bit different in that 32b CQEs can be useful in
> >> cases that are predominately 16b in the first place. For example,
> >> networking workload doing send/recv/etc and the occasional
> >> get/setsockopt kind of thing. Or maybe a mix of normal recv and zero
> >> copy rx.
> >>
> >> For the SQE case, I think it's a bit different. At least the cases I
> >> know of, it's mostly 100% 64b SQEs or 128b SQEs. I'm certainly willing
> >> to be told otherwise! Because that is kind of the key question that
> >> needs answering before even thinking about doing that kind of work.
> >
> > We certainly have a use case that mixes the two on the same io_uring:
> > ublk commit/buffer register/unregister commands (64 byte SQEs) and
> > NVMe passthru commands (128 byte SQEs). I could also imagine an
> > application issuing both normal read/write commands and NVMe passthru
> > commands. But you're probably right that this isn't a super common use
> > case.
>
> Yes that's a good point, and that would roughly be 50/50 in terms of 64b
> vs 128b SQEs?

For our application, the ratio of 64-byte to 128-byte SQEs depends
on the ublk workload. Small ublk I/Os are translated 1-1 into NVMe
passthru I/Os, so there can be as many as three 64-byte ublk SQEs
(register buffer, unregister buffer, and commit) for each 128-byte
NVMe passthru SQE. Larger I/Os are sharded into more NVMe passthru
commands, so there are relatively more 128-byte SQEs. And some
workloads can't use ublk zero-copy (since the data needs to go through
a RAID computation), in which case the only 64-byte SQE is the ublk
commit.

Best,
Caleb

>
> And yes, I can imagine other use cases too, but I'm also having a hard
> time justifying those as likely. On the other hand, people do the
> weirdest things...
>
> >> But yes, it could be supported, and Keith (kind of) signed himself up to
> >> do that. One oddity I see on that side is that while with CQE32 the
> >> kernel can manage the potential wrap-around gap, for SQEs that's
> >> obviously on the application to do. That could just be a NOP or
> >> something like that, but you do need something to fill/skip that space.
> >> I guess that could be as simple as having an opcode that is simply "skip
> >> me", so on the kernel side it'd be easy as it'd just drop it on the
> >> floor. You still need the app side to fill one, however, and then deal
> >> with "oops SQ ring is now full" too.
> >
> > Sure, of course userspace would need to handle a misaligned big SQE at
> > the end of the SQ analogously to mixed CQE sizes. I assume liburing
> > should be able to do that mostly transparently; that logic could all
> > be encapsulated by io_uring_get_sqe().
>
> Yep I think so, we'd need a new helper to return the kind of SQE you
> want, and it'd just need to get a 64b one and mark it with the SKIP
> opcode first if being asked for a 128b one and we're one off from
> wrapping around.
>
> --
> Jens Axboe


Thread overview: 16+ messages
2025-08-21 14:18 [PATCHSET v2 0/8] Add support for mixed sized CQEs Jens Axboe
2025-08-21 14:18 ` [PATCH 1/8] io_uring: remove io_ctx_cqe32() helper Jens Axboe
2025-08-21 14:18 ` [PATCH 2/8] io_uring: add UAPI definitions for mixed CQE postings Jens Axboe
2025-08-21 14:18 ` [PATCH 3/8] io_uring/fdinfo: handle mixed sized CQEs Jens Axboe
2025-08-21 14:18 ` [PATCH 4/8] io_uring/trace: support completion tracing of mixed 32b CQEs Jens Axboe
2025-08-21 14:18 ` [PATCH 5/8] io_uring: add support for IORING_SETUP_CQE_MIXED Jens Axboe
2025-08-21 14:18 ` [PATCH 6/8] io_uring/nop: " Jens Axboe
2025-08-21 14:18 ` [PATCH 7/8] io_uring/uring_cmd: " Jens Axboe
2025-08-21 14:18 ` [PATCH 8/8] io_uring/zcrx: " Jens Axboe
2025-08-21 17:02 ` [PATCHSET v2 0/8] Add support for mixed sized CQEs Caleb Sander Mateos
2025-08-21 17:12   ` Jens Axboe
2025-08-21 17:40     ` Keith Busch
2025-08-21 17:47       ` Jens Axboe
2025-08-21 17:41     ` Caleb Sander Mateos
2025-08-21 17:46       ` Jens Axboe
2025-08-21 18:19         ` Caleb Sander Mateos [this message]
