From: Joanne Koong <joannelkoong@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
axboe@kernel.dk, io-uring@vger.kernel.org,
csander@purestorage.com, krisman@suse.de, bernd@bsbernd.com,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
Date: Fri, 13 Feb 2026 14:04:06 -0800
Message-ID: <CAJnrk1a+YuPpoLghA01uJhEKrhmrLhQ+5bw2OeeuLG3tG8p6Ew@mail.gmail.com>
In-Reply-To: <34cf24a3-f7f3-46ed-96be-bf716b2db060@gmail.com>
On Fri, Feb 13, 2026 at 4:41 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/13/26 07:18, Christoph Hellwig wrote:
> > On Thu, Feb 12, 2026 at 10:44:44AM +0000, Pavel Begunkov wrote:
> >>>
> > Can you clarify what you mean with 'pbuf'? The only fixed buffer API I
> > know is io_uring_register_buffers* which always takes user provided
> > buffers, so I have a hard time parsing what you're saying there. But
> > that might just be sign that I'm no expert in io_uring APIs, and that
> > web searches have degraded to the point of not being very useful
> > anymore.
>
> Registered, aka fixed, buffers are the ones you pass to
> IORING_OP_[READ,WRITE]_FIXED and some other requests. It's normally
> created by io_uring_register_buffers*() / IORING_REGISTER_BUFFERS*
> with user memory, but there are special cases when it's installed
> internally by other kernel components, e.g. ublk.
> This series has nothing to do with them, and relevant parts of
> the discussion here don't mention them either.
>
> Provided buffer rings, a.k.a pbuf rings, IORING_REGISTER_PBUF_RING
> is a kernel-user shared ring. The entries are user buffers
> {uaddr, size}. The user space adds entries, the kernel (io_uring
> requests) consumes them and issues I/O using the user addresses.
> E.g. you can issue a IORING_OP_RECV request (+IOSQE_BUFFER_SELECT)
> and it'll grab a buffer from the ring instead of using sqe->addr.
>
> pbuf rings, IORING_REGISTER_MEM_REGION, completion/submission
> queues and all other kernel-user rings/etc. are internally based
> on so called regions. All of them support both user allocated
> memory and kernel allocations + mmap.
>
> This series essentially creates provided buffer rings, where
> 1. the ring now contains kernel addresses
> 2. the ring itself is in-kernel only and not shared with user space
> 3. it also allocates kernel buffers (as a region), populates the ring
> with them, and allows mapping the buffers into the user space.

The most important part, and the whole reason fuse needs the buffer
ring to be kernel-managed, is that the kernel needs to control when
buffers get recycled back into the ring. For fuse's use case, the
buffer is used for passing data between the kernel and the server. We
can't have the server recycle the buffer because the server writes
its reply to the kernel into that same buffer when it submits the sqe.
After fuse receives the sqe and reads the reply from the server, it
then needs to recycle that buffer back into the ring so it can be
reused for a future cqe (e.g. sending a future request).
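
As a rough illustration of that ordering (a toy userspace model, not
code from the series; every name in it is hypothetical), the kernel
selects a buffer, the server overwrites the request with its reply in
place, and only after the kernel has read the reply does the buffer go
back to the ring:

```c
/* Toy userspace model of the buffer lifecycle; not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define KM_NR_BUFS  4
#define KM_BUF_SIZE 64

struct km_ring {
	char bufs[KM_NR_BUFS][KM_BUF_SIZE]; /* kernel-allocated payload memory */
	bool in_use[KM_NR_BUFS];            /* true while a request owns the buffer */
};

/* Kernel side: pick the first free buffer; returns -1 if none is free. */
static int km_select(struct km_ring *r)
{
	for (int i = 0; i < KM_NR_BUFS; i++) {
		if (!r->in_use[i]) {
			r->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Kernel side: return a buffer to the ring. The server never calls this. */
static void km_recycle(struct km_ring *r, int bid)
{
	assert(bid >= 0 && bid < KM_NR_BUFS && r->in_use[bid]);
	r->in_use[bid] = false;
}

static int km_free_count(const struct km_ring *r)
{
	int n = 0;

	for (int i = 0; i < KM_NR_BUFS; i++)
		n += !r->in_use[i];
	return n;
}

/* Server side: overwrite the request in place with the reply. */
static void server_reply(struct km_ring *r, int bid, const char *reply)
{
	strncpy(r->bufs[bid], reply, KM_BUF_SIZE - 1);
	r->bufs[bid][KM_BUF_SIZE - 1] = '\0';
}

/* One request/reply round trip; returns the buffer id used, or -1. */
static int km_round_trip(struct km_ring *r, const char *req,
			 const char *reply, char *out, size_t out_len)
{
	int bid = km_select(r);

	if (bid < 0)
		return -1;

	/* kernel -> server: the request is delivered in the buffer */
	strncpy(r->bufs[bid], req, KM_BUF_SIZE - 1);
	r->bufs[bid][KM_BUF_SIZE - 1] = '\0';

	/* server -> kernel: the reply comes back in the same buffer */
	server_reply(r, bid, reply);

	/* The buffer must still be owned here, or the reply could be clobbered. */
	assert(r->in_use[bid]);
	strncpy(out, r->bufs[bid], out_len - 1);
	out[out_len - 1] = '\0';

	/* Only now is it safe to hand the buffer back to the ring. */
	km_recycle(r, bid);
	return bid;
}
```

If the server were the one recycling, the buffer could be handed to a
new request between server_reply() and the kernel's read of the reply.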
>
> Fuse is doing both adding (kernel) buffers to the ring and consuming
> them. At which point it's not clear:
>
> 1. Why it even needs io_uring provided buffer rings, it can be all
> contained in fuse. Maybe it's trying to reuse pbuf ring code as
> basically an internal memory allocator, but then why expose buffer
> rings as an io_uring uapi instead of keeping it internally.
>
> That's also why I mentioned whether those buffers are supposed to
> be used with other types of io_uring requests like recv, etc.

On the userspace/server side, the server uses the buffers for other
io_uring operations (e.g. reading or writing the contents from/to a
locally-backed file).
>
> 2. Why making io_uring to allocate payload memory. The answer to which
> is probably to reuse the region api with mmap and so on. And why
> payload buffers are inseparably created together with the ring

My main motivation for this is simplicity. I see (and thanks for
explaining) that using a registered mem region allows some
optimizations that could be useful for some workloads (the only one I
know of right now is the PMD one you mentioned, but maybe there are
more I'm missing). But I don't think most use cases of kmbufs benefit
from those optimizations (this could just be my lack of understanding
of what other optimizations there are), so to me it feels like we're
adding non-trivial complexity for no noticeable benefit.

I feel like we get the best of both worlds by letting users have both:
the simple kernel-managed pbuf, where the kernel allocates the buffers
and the buffers are tied to the lifecycle of the ring, and the more
advanced kernel-managed pbuf, where buffers are tied to a registered
memory region and the subsystem is responsible for later populating
the ring with them.

> and via a new io_uring uapi.

imo it felt cleaner to have a new uapi for it because kmbufs and pbufs
have different expectations and behaviors. pbufs only work with
user-provided buffers and require userspace to populate the ring
before using it, whereas for kmbufs the kernel allocates the buffers
and populates the ring for you. pbufs require userspace to recycle the
buffer, whereas for kmbufs the kernel is the one in control of
recycling. From the user pov it seemed confusing to have kmbufs as
part of the pbuf ring uapi instead of separating them out as a
different type of ring buffer with different expectations and
behavior.

I was trying to make the point that combining the interfaces gets even
more confusing if we go with IORING_MEM_REGION, because then pbufs
that are kernel-managed are also empty at initialization, can only
point to areas inside a registered mem region, and the responsibility
of populating them falls on whatever subsystem is using them.

I still have this opinion, but I also think that in general you likely
know better than I do what kind of io_uring uapi is best for
io_uring's users. For v2 I'll have kmbufs go through the pbuf uapi.
>
> And yes, I believe in the current form it's inflexible, it requires
> a new io_uring uapi. It requires the number of buffers to match
> the number of ring entries, which are related but not the same

I'm not really seeing the purpose of having a ring entry with no
buffer associated with it. The existing code for non-kernel-managed
pbuf rings has the same tie: reg->ring_entries is used as the marker
for how many buffers the ring supports. But if the number of buffers
should be different from the number of ring entries, this can easily
be fixed by passing in the number of buffers from the uapi for
kernel-managed pbuf rings.
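
To make that concrete, here is a minimal sketch of what decoupling the
two counts could look like. The struct and helpers are hypothetical
and are not the uapi from this series; the only rule assumed is that a
ring cannot hold more buffers than it has slots.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical registration arguments; not the uapi struct from the series. */
struct km_reg_args {
	uint32_t ring_entries; /* slots in the ring */
	uint32_t nr_bufs;      /* buffers the kernel allocates up front */
	uint32_t buf_size;     /* size of each buffer in bytes */
};

/* Reject argument combinations that cannot describe a usable ring. */
static int km_validate(const struct km_reg_args *a)
{
	if (!a->ring_entries || !a->buf_size)
		return -EINVAL;
	/* Assumption: a ring cannot hold more buffers than it has slots. */
	if (a->nr_bufs > a->ring_entries)
		return -EINVAL;
	return 0;
}

/* Slots that start out empty and could be populated later. */
static uint32_t km_spare_slots(const struct km_reg_args *a)
{
	return a->ring_entries - a->nr_bufs;
}
```

With ring_entries = 8 and nr_bufs = 4, validation passes and four
slots are left over; setting nr_bufs equal to ring_entries gives the
behavior described for v1.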
> thing. You can't easily add more memory as it's bound to the ring
> object. The buffer memory won't even have same lifetime as the

To play devil's advocate, we also can't easily add more memory to the
mem region once it's been registered. I think there's also a worse
penalty: the user needs to know upfront how much memory to allocate
for the mem region for the lifetime of the ring, which imo may be hard
to do (e.g. if a kernel-managed buf ring only needs to be registered
for some code paths and not others, the mem region registration would
still have to allocate the memory a potential kbuf ring would use).

> ring object -- allow using that km buffer ring with recv requests
> and I'll most likely give you a way to crash the
> kernel.
I'm a bit confused by this part. The buffer memory does have the same
lifetime as the ring object, no? The buffers only get freed when the
ring itself is freed.
>
> But hey, I'm tired. I don't have any beef here and am only trying
> to make it a bit cleaner and flexible for fuse in the first place
> without even questioning the I/O path. If everyone believes

I appreciate you looking at this and giving your feedback and insight.
Thank you for doing so. I don't want to merge in something you're
unhappy with.

Are you open to having support for both a simple kernel-managed pbuf
interface and, later on if/when the need arises, a kernel-managed pbuf
interface that goes through a registered memory region? If the answer
is no, then I'll make the change to have kmbufs go through the
registered memory region.

Thanks,
Joanne

> everything is right, just ask Jens to merge it.
>
> --
> Pavel Begunkov
>