From: Pavel Begunkov <asml.silence@gmail.com>
To: Joanne Koong <joannelkoong@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	axboe@kernel.dk, io-uring@vger.kernel.org,
	csander@purestorage.com, krisman@suse.de, bernd@bsbernd.com,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
Date: Wed, 18 Feb 2026 12:36:43 +0000
Message-ID: <7a62c5a9-1ac2-4cc2-a22f-e5b0c52dabea@gmail.com>
In-Reply-To: <CAJnrk1a+YuPpoLghA01uJhEKrhmrLhQ+5bw2OeeuLG3tG8p6Ew@mail.gmail.com>

On 2/13/26 22:04, Joanne Koong wrote:
> On Fri, Feb 13, 2026 at 4:41 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
...
>> Fuse is doing both adding (kernel) buffers to the ring and consuming
>> them. At which point it's not clear:
>>
>> 1. Why it even needs io_uring provided buffer rings, it can be all
>>      contained in fuse. Maybe it's trying to reuse pbuf ring code as
>>      basically an internal memory allocator, but then why expose buffer
>>      rings as an io_uring uapi instead of keeping it internally.
>>
>>      That's also why I mentioned whether those buffers are supposed to
>>      be used with other types of io_uring requests like recv, etc.
> 
> On the userspace/server side, it uses the buffers for other io-uring
> operations (eg reading or writing the contents from/to a
> locally-backed file).

Oops, typo. I was asking whether the buffer rings (not buffers) are
supposed to be used with other requests, e.g. submitting an
IORING_OP_RECV with IOSQE_BUFFER_SELECT set and the bgid specifying
your kernel-managed buffer ring.
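
Roughly, from the userspace side (liburing-style sketch; the fd and
bgid here are just placeholders):

    /* recv that picks its buffer from the buffer ring with group id bgid */
    struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

    io_uring_prep_recv(sqe, sockfd, NULL, 0, 0); /* no buffer passed up front */
    sqe->flags |= IOSQE_BUFFER_SELECT;           /* select one at issue time */
    sqe->buf_group = bgid;                       /* the buffer ring's group id */
    io_uring_submit(&ring);

    /* on completion IORING_CQE_F_BUFFER is set and the picked buffer id
     * is in cqe->flags >> IORING_CQE_BUFFER_SHIFT */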

>> 2. Why making io_uring to allocate payload memory. The answer to which
>>      is probably to reuse the region api with mmap and so on. And why
>>      payload buffers are inseparably created together with the ring
> 
> My main motivation for this is simplicity. I see (and thanks for
> explaining) that using a registered mem region allows the use of some
> optimizations (the only one I know of right now is the PMD one you
> mentioned but maybe there's more I'm missing) that could be useful for
> some workloads, but I don't think (and this could just be my lack of
> understanding of what more optimizations there are) most use cases of
> kmbufs benefit from those optimizations, so to me it feels like we're
> adding non-trivial complexity for no noticeable benefit.

There are two separate arguments. The first is about not making buffers
inseparable from buffer rings in the io_uring user API. Whether it's
IORING_REGISTER_MEM_REGION or something else is not that important.
I have no objection if it's a part of fuse instead though, e.g. if
fuse binds two objects together when you register it with fuse, or even
if fuse creates a buffer ring internally (assuming it doesn't indirectly
leak into io_uring uapi).

The second is about optionally allowing user memory for buffer
creation since you're reusing the region abstraction. You can find pros
and cons for both modes, and funnily enough, SQ/CQ were first kernel
allocated and then people asked for backing them with user memory, and
IIRC it was the reverse order for pbuf rings.

Implementing this is trivial as well; you just need to pass an argument
while creating a region. All new region users take struct
io_uring_region_desc in their uapi and forward it to io_create_region()
without caring whether the memory is user or kernel allocated.
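
E.g. (sketching from memory, double check the field/flag names against
include/uapi/linux/io_uring.h):

    /* region backed by user memory the application allocated itself */
    struct io_uring_region_desc rd = {
        .user_addr = (__u64)(unsigned long)buf,
        .size      = region_size,
        .flags     = IORING_MEM_REGION_TYPE_USER,
    };

    /* or kernel-allocated: leave user_addr/flags zeroed, the kernel
     * fills in mmap_offset and userspace mmap()s the region from there */
    struct io_uring_region_desc rd_kernel = { .size = region_size };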

> I feel like we get the best of both worlds by letting users have both:
> the simple kernel-managed pbuf where the kernel allocates the buffers
> and the buffers are tied to the lifecycle of the ring, and the more
> advanced kernel-managed pbuf where buffers are tied to a registered
> memory region that the subsystem is responsible for later populating
> the ring with.
> 
>>      and via a new io_uring uapi.
> 
> imo it felt cleaner to have a new uapi for it because kmbufs and pbufs

The stress is on why it's an _io_uring_ API. It doesn't matter to me
whether it's a separate opcode or not. Currently, buffer rings don't give
you anything that can't be pure fuse, and it might be simpler to have
it implemented in fuse than bound to some io_uring object. Or fuse could
create buffer rings internally to reuse code, but then they don't become
an io_uring uapi but rather an implementation detail. And that hinges on
whether km rings are intended to be used with other / non-fuse requests.

> have different expectations and behaviors (eg pbufs only work with
> user-provided buffers and requires userspace to populate the ring
> before using it, whereas for kmbufs the kernel allocates the buffers
> and populates it for you; pbufs require userspace to recycle back the
> buffer, whereas for kmbufs the kernel is the one in control of
> recycling) and from the user pov it seemed confusing to have kmbufs as
> part of the pbuf ring uapi, instead of separating it out as a
> different type of ringbuffer with a different expectation and

I believe the source of the disagreement is that you're thinking
about how it's going to look for fuse specifically, and I believe
you that it'll be nicer for the fuse use case. On the other hand,
it's an io_uring uapi, and if it is an io_uring uapi, we need
reusable building blocks that are not specific to particular users.

If km rings have to stay an io_uring uapi, I guess a middle
ground would be to allow registering km rings together with memory,
but make it a pure region without a notion of a buffer, and let
fuse chunk it. Later, we can make payload memory allocation
optional.

> behavior. I was trying to make the point that combining the interface
> if we go with IORING_MEM_REGION gets even more confusing because now
> pbufs that are kernel-managed are also empty at initialization and
> only can point to areas inside a registered mem region and the
> responsibility of populating it is now on whatever subsystem is using
> it.

Right, intentionally so, because otherwise it's a fuse uapi that
pretends to be a generic io_uring uapi but isn't, because of all
the assumptions scattered across different places.

> I still have this opinion but I also think in general, you likely know
> better than I do what kind of io-uring uapi is best for io-uring's
> users. For v2 I'll have kmbufs go through the pbuf uapi.
> 
>>
>>      And yes, I believe in the current form it's inflexible, it requires
>>      a new io_uring uapi. It requires the number of buffers to match
>>      the number of ring entries, which are related but not the same
> 
> I'm not really seeing what the purpose of having a ring entry with no
> buffer associated with it is. In the existing code for non-kernel
> managed pbuf rings, there's the same tie between reg->ring_entries
> being used as the marker for how many buffers the ring supports. But

Not really, it tells you the buffer ring depth but says nothing about
how much memory user space allocated or how it's pushed. It's a
reasonable default but they could be different. For example, if you
expect to add more memory at runtime, you might create the buffer
ring a bit larger. Or when server processing takes a while and you
can't recycle until it finishes, you might have more buffers than
ring entries. Or you might decide to split buffers and, as you
mentioned, use incremental consumption, which is an entirely
separate topic because it doesn't do de-fragmentation and you'd
need to have it in fuse, just like user space does with pbufs.

> if the number of buffers should be different than the number of ring
> entries, this can be easily fixed by passing in the number of buffers
> from the uapi for kernel-managed pbuf rings.

My entire point is that we're making lots of assumptions for an io_uring
uapi, and if it's moved to fuse, which knows better what it needs,
it should be a win.

IOW, it sounds better if, instead of passing the number of buffers to
io_uring, you just ask it to create a large chunk of memory, and then
fuse chunks it up and puts it into the ring.
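
Roughly the same thing user space already does with pbufs, just done
by fuse on its side (userspace liburing calls here purely for
illustration):

    /* one big region chunked into equally sized buffers */
    size_t chunk = 128 * 1024;
    unsigned int nbufs = region_size / chunk;

    for (unsigned int bid = 0; bid < nbufs; bid++)
        io_uring_buf_ring_add(br, (char *)region + (size_t)bid * chunk,
                              chunk, bid,
                              io_uring_buf_ring_mask(ring_entries), bid);
    io_uring_buf_ring_advance(br, nbufs);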

>>      thing. You can't easily add more memory as it's bound to the ring
>>      object. The buffer memory won't even have same lifetime as the
> 
> To play devil's advocate, we also can't easily add more memory to the
> mem region once it's been registered. I think there's also a worse
> penalty where the user needs to know upfront how much memory to
> allocate for the mem region for the lifetime of the ring, which imo
> may be hard to do (eg if a kernel-managed buf ring only needs to be
> registered for some code paths and not others, the mem region
> registration would still have to allocate the memory a potential kbuf
> ring would use).

I agree, and you'd need something new in either case to add more
memory, and it doesn't need to be IORING_REGISTER_MEM_REGION
specifically.

>>      ring object -- allow using that km buffer ring with recv requests
>>      and most likely I'll give you a way to crash the
>>      kernel.
> 
> I'm a bit confused by this part. The buffer memory does have the same
> lifetime as the ring object, no? The buffers only get freed when the
> ring itself is freed.

Unregistering a buffer ring doesn't guarantee that there are no
inflight requests still using buffers that came out of it. The fuse
driver can wait for / terminate its requests before unregistration,
but if userspace-issued IORING_OP_RECV is allowed to use this km
buffer ring, you'll need to somehow synchronise with all other
io_uring requests as well.

-- 
Pavel Begunkov


