public inbox for io-uring@vger.kernel.org
From: Joanne Koong <joannelkoong@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: axboe@kernel.dk, io-uring@vger.kernel.org,
	csander@purestorage.com,  krisman@suse.de, bernd@bsbernd.com,
	hch@infradead.org,  linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 03/11] io_uring/kbuf: add support for kernel-managed buffer rings
Date: Thu, 26 Feb 2026 17:12:01 -0800
Message-ID: <CAJnrk1YoaHnCmuwQra0XwOxf0aC_PQGby-DT1y_p=YRzotiE-w@mail.gmail.com>
In-Reply-To: <CAJnrk1YXmxqUnT561-J7seaicxFRJTyJ=F3_MX1rmtAROC6Ybg@mail.gmail.com>

On Wed, Feb 11, 2026 at 2:06 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Wed, Feb 11, 2026 at 4:01 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >
> > On 2/10/26 19:39, Joanne Koong wrote:
> > > On Tue, Feb 10, 2026 at 8:34 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> > >
> > >>> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> > >>> index aa9b70b72db4..9bc36451d083 100644
> > >>> --- a/io_uring/kbuf.c
> > >>> +++ b/io_uring/kbuf.c
> > >> ...
> > >>> +static int io_setup_kmbuf_ring(struct io_ring_ctx *ctx,
> > >>> +                            struct io_buffer_list *bl,
> > >>> +                            struct io_uring_buf_reg *reg)
> > >>> +{
> > >>> +     struct io_uring_buf_ring *ring;
> > >>> +     unsigned long ring_size;
> > >>> +     void *buf_region;
> > >>> +     unsigned int i;
> > >>> +     int ret;
> > >>> +
> > >>> +     /* allocate pages for the ring structure */
> > >>> +     ring_size = flex_array_size(ring, bufs, bl->nr_entries);
> > >>> +     ring = kzalloc(ring_size, GFP_KERNEL_ACCOUNT);
> > >>> +     if (!ring)
> > >>> +             return -ENOMEM;
> > >>> +
> > >>> +     ret = io_create_region_multi_buf(ctx, &bl->region, bl->nr_entries,
> > >>> +                                      reg->buf_size);
> > >>
> > >> Please use io_create_region(), the new function does nothing new
> > >> and only violates abstractions.
> > >
> > > There's separate checks needed between io_create_region() and
> > > io_create_region_multi_buf() (eg IORING_MEM_REGION_TYPE_USER flag
> >
> > If io_create_region() is too strict, let's discuss that in
> > examples if there are any, but it's likely not a good idea changing
> > that. If it's too lax, filter arguments in the caller. IOW, don't
> > pass IORING_MEM_REGION_TYPE_USER if it's not used.
> >
> > > checking) and different allocation calls (eg
> > > io_region_allocate_pages() vs io_region_allocate_pages_multi_buf()).
> >
> > I saw that, and I'm saying that all memmap.c changes can get dropped.
> > You're using it as one big virtually contig kernel memory range then
> > chunked into buffers, and that's pretty much what you're getting with
> > normal io_create_region(). I get that you only need it to be
> > contiguous within a single buffer, but that's not what you're doing,
> > and it'll be only worse than default io_create_region() e.g.
> > effectively disabling any usefulness of io_mem_alloc_compound(),
> > and ultimately you don't need to care.
>
> When I originally implemented it, I had it use
> io_region_allocate_pages() but this fails because it's allocating way
> too much memory at once. For fuse's use case, each buffer is usually
> at least 1 MB if not more. Allocating the memory one buffer at a time in
> io_region_allocate_pages_multi_buf() bypasses the allocation errors I
> was seeing. That's the main reason I don't think this can just use
> io_create_region().
>
> >
> > Regions shouldn't know anything about your buffers, how it's
> > subdivided after, etc.
> >

I still think the memory for the buffers should be tied to the ring
itself and allocated physically contiguously per buffer. Per-buffer
contiguity enables the most efficient DMA path for servers to send
read/write data to local storage or the network.

If the buffers for the bufring have to be allocated as one single
memory region, the io_mem_alloc_compound() call will fail for this
large allocation size. Even if io_mem_alloc_compound() did succeed, it
would be wasteful: the buffer pool as a whole doesn't need to be
physically contiguous, only the individual buffers do. For fuse, the
server configures the buffer pool size it wants, depending on the
queue depth and max request size it needs. So for most use cases, at
least for high-performance servers, allocation will have to fall back
to alloc_pages_bulk_node(), which doesn't allocate contiguously.

You mentioned in an earlier comment that this "only violates
abstractions". Which abstractions does this break? The pre-existing
behavior already falls back to allocating pages non-contiguously if
the mem region can't be allocated fully contiguously.
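
To put rough numbers on the allocation-size point above, here is a
small userspace sketch (not kernel code). The 4 KiB page size, the
order-10 limit for a single contiguous allocation, and the ex_* names
are all assumptions made for illustration; none of them come from the
patch.

```c
#include <stddef.h>

/*
 * Assumed values, not taken from the thread: 4 KiB pages and a buddy
 * allocator whose largest single allocation is order 10 (4 MiB with
 * 4 KiB pages), a common x86-64 default.
 */
#define EX_PAGE_SIZE      4096UL
#define EX_MAX_PAGE_ORDER 10U

/* Smallest order such that (EX_PAGE_SIZE << order) >= bytes. */
static unsigned int ex_order_for_bytes(unsigned long bytes)
{
	unsigned int order = 0;

	while ((EX_PAGE_SIZE << order) < bytes)
		order++;
	return order;
}

/*
 * Nonzero if one physically contiguous allocation of this size stays
 * within the assumed maximum order.
 */
static int ex_fits_buddy(unsigned long bytes)
{
	return ex_order_for_bytes(bytes) <= EX_MAX_PAGE_ORDER;
}
```

Under those assumptions, ex_order_for_bytes(1UL << 20) is 8, so a
single 1 MB buffer fits. A hypothetical pool of 64 such buffers
requested as one contiguous region (64UL << 20) would need order 14,
and ex_fits_buddy() returns 0 for it.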

Going through registered buffers doesn't help either. Fuse servers can
be unprivileged, and there is no guarantee that enough huge pages are
reserved, that another process hasn't already taken them, or that the
server has the privileges to pre-reserve pages for the allocation.
Also, the 2 MB granularity is inflexible, while 1 GB is too much.

I'm not really seeing a way to honor the physical contiguity
requirements for the buffers without going through kernel-managed
bufrings with the allocation done on a per-buffer basis. Or am I
missing something here?

Thanks,
Joanne


