From: Pavel Begunkov <asml.silence@gmail.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Joanne Koong <joannelkoong@gmail.com>,
axboe@kernel.dk, io-uring@vger.kernel.org,
csander@purestorage.com, krisman@suse.de, bernd@bsbernd.com,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
Date: Fri, 13 Feb 2026 12:41:07 +0000
Message-ID: <34cf24a3-f7f3-46ed-96be-bf716b2db060@gmail.com>
In-Reply-To: <aY7QX-BIW-SMJ3h_@infradead.org>

On 2/13/26 07:18, Christoph Hellwig wrote:
> On Thu, Feb 12, 2026 at 10:44:44AM +0000, Pavel Begunkov wrote:
>>>
>>> Any pages mapped to userspace can be allocated in the kernel as well.
>>
>> pow2 round ups will waste memory. 1MB allocations will never
>> become 2MB huge pages. And there is a separate question of
>> 1GB huge pages. The user can be smarter about all placement
>> decisions.
>
> Sure. But if the application cares that much about TLB pressure
> I'd just round up to a nice multiple of PTE levels.
>
>>
>>> And I really do like this design, because it means we can have a
>>> buffer ring that is only mapped read-only into userspace. That way
>>> we can still do zero-copy raids if the device requires stable pages
>>> for checksumming or raid. I was going to implement this as soon
>>> as this series lands upstream.
>>
>> That's an interesting case. To be clear, user provided memory is
>> an optional feature for pbuf rings / regions / etc., and I think
>> the io_uring uapi should leave fields for the feature. However, I
>> have nothing against fuse refusing to bind to buffer rings it
>> doesn't like.
>
> Can you clarify what you mean with 'pbuf'? The only fixed buffer API I
> know is io_uring_register_buffers* which always takes user provided
> buffers, so I have a hard time parsing what you're saying there. But
> that might just be a sign that I'm no expert in io_uring APIs, and that
> web searches have degraded to the point of not being very useful
> anymore.

Registered, aka fixed, buffers are the ones you pass to
IORING_OP_[READ,WRITE]_FIXED and some other requests. They're normally
created by io_uring_register_buffers*() / IORING_REGISTER_BUFFERS*
with user memory, but there are special cases where they're installed
internally by other kernel components, e.g. ublk.

This series has nothing to do with them, and relevant parts of
the discussion here don't mention them either.

Provided buffer rings, a.k.a. pbuf rings (IORING_REGISTER_PBUF_RING),
are kernel-user shared rings. The entries are user buffers
{uaddr, size}. User space adds entries, the kernel (io_uring
requests) consumes them and issues I/O using the user addresses.
E.g. you can issue an IORING_OP_RECV request (+IOSQE_BUFFER_SELECT)
and it'll grab a buffer from the ring instead of using sqe->addr.
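
To make the producer/consumer flow above concrete, here is a toy
user-space model in C. It is only a sketch: every name in it
(toy_buf, toy_pbuf_ring, toy_ring_add, toy_ring_select, toy_demo) is
made up for illustration and is not the io_uring uapi, and it leaves
out the memory barriers and shared-memory setup a real kernel-user
ring needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the io_uring uapi. */
#define RING_ENTRIES 4u /* number of slots, must be a power of two */

/* One ring entry: a user buffer description plus a buffer id. */
struct toy_buf {
	uint64_t addr;
	uint32_t len;
	uint16_t bid;
};

struct toy_pbuf_ring {
	struct toy_buf bufs[RING_ENTRIES];
	uint16_t head; /* advanced by the consumer (the "kernel" side) */
	uint16_t tail; /* advanced by the producer (the "user" side) */
};

/* Producer: publish one buffer. Returns 0, or -1 if the ring is full. */
static int toy_ring_add(struct toy_pbuf_ring *r, void *addr,
			uint32_t len, uint16_t bid)
{
	struct toy_buf *b;

	if ((uint16_t)(r->tail - r->head) == RING_ENTRIES)
		return -1;
	b = &r->bufs[r->tail & (RING_ENTRIES - 1)];
	b->addr = (uint64_t)(uintptr_t)addr;
	b->len = len;
	b->bid = bid;
	r->tail++;
	return 0;
}

/* Consumer: take the oldest buffer, or NULL if the ring is empty. */
static const struct toy_buf *toy_ring_select(struct toy_pbuf_ring *r)
{
	if (r->head == r->tail)
		return NULL;
	return &r->bufs[r->head++ & (RING_ENTRIES - 1)];
}

/* Usage: post two buffers, then consume them in FIFO order. */
static int toy_demo(void)
{
	static char mem[2][4096];
	struct toy_pbuf_ring ring = { .head = 0, .tail = 0 };
	const struct toy_buf *b;
	int consumed = 0;

	for (uint16_t i = 0; i < 2; i++)
		if (toy_ring_add(&ring, mem[i], sizeof(mem[i]), i))
			return -1;

	while ((b = toy_ring_select(&ring)) != NULL) {
		assert(b->bid == consumed); /* buffers come out in order */
		consumed++;
	}
	return consumed;
}
```

In this model the "request" is just toy_ring_select(): it picks the
next posted buffer instead of being handed an address up front, which
is the role IOSQE_BUFFER_SELECT plays for a recv request.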

pbuf rings, IORING_REGISTER_MEM_REGION, completion/submission
queues and all other kernel-user rings/etc. are internally based
on so-called regions. All of them support both user-allocated
memory and kernel allocations + mmap.

This series essentially creates provided buffer rings, where
1. the ring now contains kernel addresses
2. the ring itself is in-kernel only and not shared with user space
3. it also allocates kernel buffers (as a region), populates the ring
   with them, and allows mapping the buffers into user space.

Fuse is doing both adding (kernel) buffers to the ring and consuming
them. At which point it's not clear:
1. Why it even needs io_uring provided buffer rings; it can all be
   contained in fuse. Maybe it's trying to reuse pbuf ring code as
   basically an internal memory allocator, but then why expose buffer
   rings as an io_uring uapi instead of keeping them internal?
   That's also why I asked whether those buffers are supposed to
   be used with other types of io_uring requests like recv, etc.
2. Why make io_uring allocate payload memory? The answer to which
   is probably to reuse the region api with mmap and so on. And why
   are payload buffers inseparably created together with the ring
   and via a new io_uring uapi?

And yes, I believe in the current form it's inflexible: it requires
a new io_uring uapi. It requires the number of buffers to match
the number of ring entries, which are related but not the same
thing. You can't easily add more memory as it's bound to the ring
object. The buffer memory won't even have the same lifetime as the
ring object -- allow using that km buffer ring with recv requests
and I'll most likely give you a way to crash the kernel.
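
To illustrate why "number of buffers" and "number of ring entries" are
related but not the same thing, here is a small toy model in C. It is
a sketch with made-up names (slot_ring, refill, drain_all, ENTRIES,
NBUFS), not code from the series or from io_uring: a ring with 2 slots
is backed by a pool of 5 buffers, and the producer simply posts more
buffer ids whenever slots free up.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: hypothetical names, not the io_uring uapi. */
#define ENTRIES 2u /* ring slots, must be a power of two */
#define NBUFS   5u /* buffers behind the ring, deliberately != ENTRIES */

struct slot_ring {
	uint16_t bid[ENTRIES]; /* each slot holds a buffer id */
	uint16_t head;         /* consumer position */
	uint16_t tail;         /* producer position */
};

static int ring_full(const struct slot_ring *r)
{
	return (uint16_t)(r->tail - r->head) == ENTRIES;
}

static int ring_empty(const struct slot_ring *r)
{
	return r->head == r->tail;
}

/* Post as many unposted buffer ids as fit; returns the next unposted id. */
static uint16_t refill(struct slot_ring *r, uint16_t next_bid)
{
	while (next_bid < NBUFS && !ring_full(r))
		r->bid[r->tail++ & (ENTRIES - 1)] = next_bid++;
	return next_bid;
}

/* Push all NBUFS buffers through the ring; returns the refill rounds used. */
static int drain_all(void)
{
	struct slot_ring r = { .head = 0, .tail = 0 };
	uint16_t next = 0, expect = 0;
	int rounds = 0;

	while (expect < NBUFS) {
		next = refill(&r, next);
		rounds++;
		while (!ring_empty(&r)) {
			uint16_t bid = r.bid[r.head++ & (ENTRIES - 1)];

			assert(bid == expect); /* ids come out in posting order */
			expect++;
		}
	}
	return rounds;
}
```

With 5 buffers and 2 slots, drain_all() needs 3 refill rounds. The
point of the model is only that the ring is a queue of descriptors,
so the pool behind it can be sized (and grown) independently.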

But hey, I'm tired. I don't have any beef here and am only trying
to make it a bit cleaner and more flexible for fuse in the first place
without even questioning the I/O path. If everyone believes
everything is right, just ask Jens to merge it.

--
Pavel Begunkov