From: Joanne Koong <joannelkoong@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: axboe@kernel.dk, io-uring@vger.kernel.org,
csander@purestorage.com, krisman@suse.de, bernd@bsbernd.com,
hch@infradead.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
Date: Mon, 2 Mar 2026 12:50:55 -0800
Message-ID: <CAJnrk1YGaF=5TOgeUo34=iOYRdxz+xBEwg7+A=2QjTBxUp=c4g@mail.gmail.com>
In-Reply-To: <ae3d2ea3-c835-495b-a033-01a5c9fd82fc@gmail.com>
On Fri, Feb 27, 2026 at 12:48 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/27/26 01:12, Joanne Koong wrote:
> ...
> >>> Regions shouldn't know anything about your buffers, how it's
> >>> subdivided after, etc.
> >
> > I still think the memory for the buffers should be tied to the ring
> > itself and allocated physically contiguously per buffer. Per-buffer
> > contiguity will enable the most efficient DMA path for servers to send
> > read/write data to local storage or the network. If the buffers for
> > the bufring have to be allocated as one single memory region, the
> > io_mem_alloc_compound() call will fail for this large allocation size.
> > Even if io_mem_alloc_compound() did succeed, this is a waste as the
> > buffer pool as an entity doesn't need to be physically contiguous,
> > just the individual buffers themselves. For fuse, the server
> > configures what buffer pool size it wants to use, depending on what
> > queue depth and max request size it needs. So for most use cases, at
> > least for high-performance servers, allocation will have to fall back
> > to alloc_pages_bulk_node(), which doesn't allocate contiguously. You
> > mentioned in an earlier comment that this "only violates abstractions"
> > - which abstractions does this break? The pre-existing behavior
> > already defaults to allocating pages non-contiguously if the mem
> > region can't be allocated fully contiguously.
>
> Regions has uapi (see struct io_uring_region_desc) so that users
> can operate with them in a unified manner. If you want regions to
> be allocated in some special way, just extend it.
You can't say "regions shouldn't know anything about your buffers, how
it's subdivided, etc" and then also say "extend the region uapi for
special allocation to make it buffer-compatible". If we extend the
region uapi to specify contiguous chunks of size X starting at offset
Y for len Z, that is basically encoding buffer layout information into
the region. The buffer ring already knows buffer sizes and count - it
is the natural place to express contiguity requirements.
Pushing this into the region abstraction muddies the uapi and forces
awkward indirection where callers now need to manually synchronize
region chunk specifications with their buffer layout. Memory regions
are generic and will be used for purposes beyond kmbufs. Forcing
buffer-specific allocation semantics into the region UAPI pollutes a
general abstraction with domain-specific details.
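As a rough illustration of the sizes involved (a userspace sketch, not
kernel code): order_for_size() below is a hypothetical stand-in for the
kernel's get_order(), and per_buffer_order() / whole_pool_order() are
made-up helper names, not part of this patchset. It compares the
allocation order needed when each buffer is contiguous on its own with
the order needed when the whole pool has to be one contiguous chunk.
Only the buffer size and count are needed, which the buffer ring
already has.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel's get_order(): the smallest
 * order such that (page_size << order) >= size.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	while (span < size) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Order needed when each buffer is physically contiguous on its own. */
static unsigned int per_buffer_order(size_t buf_size, size_t page_size)
{
	return order_for_size(buf_size, page_size);
}

/* Order needed when the entire pool must be one contiguous chunk. */
static unsigned int whole_pool_order(size_t buf_size, unsigned int nr_bufs,
				     size_t page_size)
{
	return order_for_size(buf_size * nr_bufs, page_size);
}
```

With example values of 64 buffers of 1 MiB each and 4 KiB pages
(numbers chosen for illustration only), per_buffer_order() gives 8
while whole_pool_order() gives 14.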
>
> > Going through registered buffers doesn't help either. Fuse servers can
> > be unprivileged and it's not guaranteed that there are enough huge
> > pages reserved or that another process hasn't taken them or that the
> > server has privileges to pre-reserve pages for the allocation. Also
>
> There is THP these days. And FWIW, we should be vigilant about not
THP is opportunistic and not guaranteed. It depends on external
factors like fragmentation, memory pressure, system settings, etc. For
high-performance FUSE servers where deterministic DMA efficiency is
required, this doesn't suffice.
> using io_uring to work around capabilities and mm policies. If user
This isn't working around capabilities / mm policies. The user isn't
getting contiguous physical memory to use freely; the kernel is
allocating it internally to service I/O efficiently. Providing
infrastructure for efficient DMA isn't a capability / mm bypass; it
is standard kernel behavior. When userspace does I/O through sockets
or block devices, the kernel routinely allocates contiguous memory
with dma_alloc_coherent() or alloc_pages() with order > 0. That's
exactly the point I'm trying to make - users shouldn't have to do this
themselves (e.g., going through registered buffers with user-allocated
buffers). The kernel should handle it internally.
> can't do it, io_uring shouldn't either. It's also all accounted
> against mlock, if the limit is not high enough, you won't be able
> to use this feature at all.
The mlock point is orthogonal. It restricts how much memory a user can
pin, but contiguous and noncontiguous allocations of the same size
consume the same mlock budget.
Thanks,
Joanne
>
> > the 2 MB granularity is inflexible while 1 GB is too much.
>
> --
> Pavel Begunkov
>