From: Joanne Koong <joannelkoong@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>,
axboe@kernel.dk, io-uring@vger.kernel.org,
csander@purestorage.com, krisman@suse.de, bernd@bsbernd.com,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
Date: Wed, 18 Feb 2026 13:43:38 -0800 [thread overview]
Message-ID: <CAJnrk1Y5iTOhj4_RbnR7RJPkr7fFcCdh1gY=3Hm72M91D-SnyQ@mail.gmail.com> (raw)
In-Reply-To: <7a62c5a9-1ac2-4cc2-a22f-e5b0c52dabea@gmail.com>
On Wed, Feb 18, 2026 at 4:36 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/13/26 22:04, Joanne Koong wrote:
> > On Fri, Feb 13, 2026 at 4:41 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> ...
> >> Fuse is doing both adding (kernel) buffers to the ring and consuming
> >> them. At which point it's not clear:
> >>
> >> 1. Why it even needs io_uring provided buffer rings, it can be all
> >> contained in fuse. Maybe it's trying to reuse pbuf ring code as
> >> basically an internal memory allocator, but then why expose buffer
> >> rings as an io_uring uapi instead of keeping it internally.
> >>
> >> That's also why I mentioned whether those buffers are supposed to
> >> be used with other types of io_uring requests like recv, etc.
> >
> > On the userspace/server side, it uses the buffers for other io-uring
> > operations (eg reading or writing the contents from/to a
> > locally-backed file).
>
Sorry, I submitted v2 last night thinking the conversation on this
thread had died. After reading through your reply, I'll modify v2.
> Oops, typo. I was asking whether the buffer rings (not buffers) are
> supposed to be used with other requests. E.g. submitting a
> IORING_OP_RECV with IOSQE_BUFFER_SELECT set and the bgid specifying
> your kernel-managed buffer ring.
Yes, the buffer rings are intended to be used with other io-uring
requests. The ideal scenario is that the user can then do the
equivalent of IORING_OP_READ_FIXED/IORING_OP_WRITE_FIXED operations on
the kernel-managed buffers and avoid the per-I/O page-pinning overhead.
>
> >> 2. Why making io_uring to allocate payload memory. The answer to which
> >> is probably to reuse the region api with mmap and so on. And why
> >> payload buffers are inseparably created together with the ring
> >
> > My main motivation for this is simplicity. I see (and thanks for
> > explaining) that using a registered mem region allows the use of some
> > optimizations (the only one I know of right now is the PMD one you
> > mentioned but maybe there's more I'm missing) that could be useful for
> > some workloads, but I don't think (and this could just be my lack of
> > understanding of what more optimizations there are) most use cases of
> > kmbufs benefit from those optimizations, so to me it feels like we're
> > adding non-trivial complexity for no noticeable benefit.
>
> There are two separate arguments. The first is about not making buffers
> inseparable from buffer rings in the io_uring user API. Whether it's
> IORING_REGISTER_MEM_REGION or something else is not that important.
> I have no objection if it's a part of fuse instead though, e.g. if
> fuse binds two objects together when you register it with fuse, or even
> if fuse creates a buffer ring internally (assuming it doesn't indirectly
> leak into io_uring uapi).
>
> And the second was about optionally allowing user memory for buffer
> creation as you're reusing the region abstraction. You can find pros
> and cons for both modes, and funnily enough, SQ/CQ were first kernel
> allocated and then people asked for backing it by user memory, and IIRC
> it was in the reverse order for pbuf rings.
>
> Implementing this is trivial as well, you just need to pass an argument
> while creating a region. All new region users use struct
> io_uring_region_desc for uapi and forward it to io_create_region()
> without caring if it's user or kernel allocated memory.
>
> > I feel like we get the best of both worlds by letting users have both:
> > the simple kernel-managed pbuf where the kernel allocates the buffers
> > and the buffers are tied to the lifecycle of the ring, and the more
> > advanced kernel-managed pbuf where buffers are tied to a registered
> > memory region that the subsystem is responsible for later populating
> > the ring with.
> >
> >> and via a new io_uring uapi.
> >
> > imo it felt cleaner to have a new uapi for it because kmbufs and pbufs
>
> The stress is on why it's an _io_uring_ API. It doesn't matter to me
> whether it's a separate opcode or not. Currently, buffer rings don't give
> you anything that can't be pure fuse, and it might be simpler to have
> it implemented in fuse than binding to some io_uring object. Or it could
> create buffer rings internally to reuse code, so it doesn't become an
> io_uring uapi but rather an implementation detail. And that depends on
> whether km rings are intended to be used with other / non-fuse requests.
>
> > have different expectations and behaviors (eg pbufs only work with
> > user-provided buffers and requires userspace to populate the ring
> > before using it, whereas for kmbufs the kernel allocates the buffers
> > and populates it for you; pbufs require userspace to recycle back the
> > buffer, whereas for kmbufs the kernel is the one in control of
> > recycling) and from the user pov it seemed confusing to have kmbufs as
> > part of the pbuf ring uapi, instead of separating it out as a
> > different type of ringbuffer with a different expectation and
>
> I believe the source of disagreement is that you're thinking
> about how it's going to look for fuse specifically, and I
> believe you that it'll be nicer for the fuse use case. However,
> on the other hand it's an io_uring uapi, and if it is an io_uring
> uapi, we need reusable blocks that are not specific to particular
> users.
I agree 100%. The API we add should be what's best for io-uring, not fuse.
For the majority of use cases, it seemed to me that having the buffers
separated from the buffer rings didn't yield perceptible benefits but
added complexity and more restrictions, like having to know statically
up front how big the mem region needs to be, across the lifetime of
the io-uring, for anything the io-uring might use the mem region for.
It seems more generically useful as a concept to have the buffers
owned by the ring and tied to the lifetime of the ring. I like that
with this design everything is self-contained and multiple subsystems
can use it without having to reimplement the functionality locally.
On the other hand, I see your point about how it might be something
users want in the future if they want complete control over which
parts of the mem region get used as the backing buffers to do stuff
like PMD optimizations.
I think this is a matter of opinion/preference and I think in general
for anything io-uring related, yours should take precedence.
With it going through a mem region, I don't think it should even go
through the "pbuf ring" interface then, since it won't specify the
number of entries and buffer sizes upfront. That assumes support is
added for normal io-uring requests (eg IORING_OP_READ/WRITE) to use
the backing pages from a memory region, and that we can guarantee the
user will never be able to unregister the registered memory region.
I think if we repurpose the

	union {
		__u64	addr;		/* pointer to buffer or iovecs */
		__u64	splice_off_in;
	};

fields in struct io_uring_sqe to

	union {
		__u64	addr;		/* pointer to buffer or iovecs */
		__u64	splice_off_in;
		__u64	offset;		/* offset into registered mem region */
	};

and add some IOSQE_ flag to indicate that the pages should come from
the registered mem region, then that should work for normal requests.
On the kernel side, it would look up the associated pages stored in
the io_mapped_region's pages array for the offset passed in.
Right now there's only a uapi to register a memory region and none to
unregister one. Is it guaranteed that io-uring will never add
something in the future that lets userspace unregister the memory
region, or at least never unregister it while it's in use (eg if
refcounting is added later to track active uses)?
If so, then end-to-end, with it going through the mem region, it would
be something like:
* user creates a mem region for the io-uring
* user mmaps the mem region
* user passes in offset into region, length of each buffer, and number
of entries in the ring to the subsystem
* subsystem creates a locally managed bufring and adds buffers to that
ring from the mem region
* on the cqe side, it sends the buffer id of the registered mem region
  through the same "IORING_CQE_F_BUFFER | (buf_id << IORING_CQE_BUFFER_SHIFT)"
  mechanism
Does this design match what you had in mind / prefer?
I think the above works for Christoph's use case too (as his use case
and mine are the same), but if not, please let me know.
Thanks,
Joanne