public inbox for io-uring@vger.kernel.org
From: Joanne Koong <joannelkoong@gmail.com>
To: Bernd Schubert <bernd@bsbernd.com>
Cc: axboe@kernel.dk, hch@infradead.org, asml.silence@gmail.com,
	 csander@purestorage.com, krisman@suse.de,
	linux-fsdevel@vger.kernel.org,  io-uring@vger.kernel.org,
	Horst Birthelmer <hbirthelmer@ddn.com>
Subject: Re: [PATCH v3 0/8] io_uring: add kernel-managed buffer rings
Date: Fri, 20 Mar 2026 12:20:32 -0700
Message-ID: <CAJnrk1YLtQF=SF-GoG4irKYzzePNewNgyTeU7VLvUN6Ub_NFVw@mail.gmail.com>
In-Reply-To: <59dcb27f-875c-4a2a-82dc-63b832f8eb1e@bsbernd.com>

On Fri, Mar 20, 2026 at 10:16 AM Bernd Schubert <bernd@bsbernd.com> wrote:
>
> On 3/6/26 01:32, Joanne Koong wrote:
> > Currently, io_uring buffer rings require the application to allocate and
> > manage the backing buffers. This series introduces buffer rings where the
> > kernel allocates and manages the buffers on behalf of the application. From
> > the uapi side, this goes through the pbuf ring interface, through the
> > IOU_PBUF_RING_KERNEL_MANAGED flag.
> >
> > There was a long discussion with Pavel on v1 [1] regarding the design. The
> > alternatives were to have the buffers allocated and registered through a
> > memory region or through the registered buffers interface and have fuse
> > implement ring buffer logic internally outside of io-uring. However, because
> > the buffers need to be contiguous for DMA and some high-performance fuse
> > servers may need non-fuse io-uring requests to use the buffer ring directly,
> > v3 keeps the design.
> >
> > This is split out from the fuse-over-io_uring series in [2], which needs the
> > kernel to own and manage buffers shared between the fuse server and the
> > kernel. The link to the fuse tree that uses the commits in this series is in
> > [3].
> >
> > This series is on top of the for-7.1/io_uring branch in Jens' io-uring
> > tree (commit ee1d7dc33990). The corresponding liburing changes are in [4] and
> > will be submitted after the changes in this patchset have landed.
> >
> > Thanks,
> > Joanne
> >
> > [1] https://lore.kernel.org/linux-fsdevel/20260210002852.1394504-1-joannelkoong@gmail.com/T/#t
> > [2] https://lore.kernel.org/linux-fsdevel/20260116233044.1532965-1-joannelkoong@gmail.com/
> > [3] https://github.com/joannekoong/linux/commits/fuse_zero_copy_for_v3/
> > [4] https://github.com/joannekoong/liburing/commits/pbuf_kernel_managed/
>

Hi Bernd,

> Hi Joanne,
>
> I'm a bit late, but could we have a design discussion about fuse here?
> From my point of view it would be good if we could have different
> request sizes for the ring buffers. Without kbuf I thought we would just

Is your motivation for wanting different request sizes for the ring
buffers to reduce the memory cost of the buffers? I agree that
reducing the memory footprint of the buffers is very important, and
that was the main reason I ended up going with the buffer ring design.
When kbuf incremental buffer consumption is added in the future (I
plan to submit it separately once all the io-uring pieces of the
fuse-zero-copy patchset land), non-overlapping regions of an
individual buffer can be used by multiple different-sized requests
concurrently.
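
To make the idea concrete, here is a minimal sketch of incremental
consumption for a single buffer. It is an illustrative user-space
model, not the kernel implementation: struct inc_buf and
inc_buf_reserve() are hypothetical names, and locking and the release
of regions are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one ring buffer that is consumed incrementally. */
struct inc_buf {
	size_t len;	/* total size of the buffer in bytes */
	size_t head;	/* offset of the first byte not yet reserved */
};

/*
 * Reserve `need` bytes for one request. Returns the offset of the
 * reserved region, or (size_t)-1 if the remaining space is too small.
 * Regions handed out by successive calls never overlap.
 */
static size_t inc_buf_reserve(struct inc_buf *b, size_t need)
{
	assert(b->head <= b->len);
	if (need > b->len - b->head)
		return (size_t)-1;

	size_t off = b->head;
	b->head += need;
	return off;
}
```

With a 1 MB buffer, a 4096-byte request followed by a 128-byte request
would get offsets 0 and 4096, and the rest of the buffer stays
available to later requests.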

From my point of view, this is better than allocating variable-sized
buffers upfront because:
a) ring entries are fully utilized. With variable-sized buffers, the
big buffers would be reserved specifically for payload requests while
the small buffers would be reserved specifically for metadata
requests. We could allocate '# entries' small buffers, but there would
be fewer than '# entries' big buffers. If the server needs to service
a lot of concurrent I/O requests, the ring gets throttled on the
limited number of big buffers available.

b) it makes the best use of buffer memory. A request could need a
buffer of any size, so with variable-sized buffers there is still
extra space in the buffer being wasted. For example, for large payload
requests, the big buffers would need to be the size of the max payload
(e.g. the 1 MB default), but a lot of requests will fall under that.
With incremental buffer consumption, only as many bytes as the request
uses are reserved in the buffer.

c) there's no overhead from having to (as you pointed out) keep the
buffers tracked and sorted into per-size lists. If we wanted to use
variable-sized buffers with kbufs instead of incremental buffer
consumption, the best way to do that would be to allocate separate
kbuf rings for payload requests and metadata requests.
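
The memory argument in (b) can be checked with simple arithmetic. The
sketch below compares the bytes tied up when every request occupies a
whole max-sized buffer against the bytes tied up when each request
reserves only what it uses. The function names and the sample request
sizes are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes tied up when each request occupies one whole buf_size buffer. */
static size_t reserved_whole_buffers(const size_t *req, size_t n,
				     size_t buf_size)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		assert(req[i] <= buf_size);	/* request must fit in a buffer */
		total += buf_size;
	}
	return total;
}

/* Bytes tied up when each request reserves only the bytes it uses. */
static size_t reserved_incremental(const size_t *req, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += req[i];
	return total;
}
```

For three requests of 4096, 65536 and 1048576 bytes with 1 MB buffers,
the first function gives 3145728 bytes and the second gives 1118208
bytes.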

> register entries with different sizes, which would then get sorted into
> per size lists. Now with kbuf that will not work anymore and we need
> different kbuf sizes. But then kbuf is not suitable for non-privileged
> users. So in order to support different request sizes one basically has

Non-privileged fuse servers use kbufs as well. It's only zero-copying
that is not possible for non-privileged servers.

> to implement things two times - not ideal. Couldn't we have pbuf for
> non-privileged users and basically deprecate the existing fuse io-uring

I don't think this is necessary because kbufs work for both
non-privileged and privileged servers. Given how the buffer is used by
the server and the kernel, pbufs are not an option here: the kernel
has to be the one to recycle the buffer, since it needs to read / copy
the data the server returns in it.
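
As a rough illustration of that ownership model, the sketch below has
the ring owner both select buffers and put them back once it has
finished reading the reply. This is a simplified user-space model
under assumed names (struct km_ring, km_ring_select(),
km_ring_recycle()); it is not the code from this series.

```c
#include <assert.h>

#define KM_RING_ENTRIES 4	/* hypothetical ring size, power of two */

/* Hypothetical ring of free buffer ids, owned by the kernel side. */
struct km_ring {
	unsigned short bids[KM_RING_ENTRIES];
	unsigned int head;	/* next slot to select from */
	unsigned int tail;	/* next slot to recycle into */
};

/* Start with every buffer id available. */
static void km_ring_init(struct km_ring *r)
{
	for (unsigned int i = 0; i < KM_RING_ENTRIES; i++)
		r->bids[i] = (unsigned short)i;
	r->head = 0;
	r->tail = KM_RING_ENTRIES;
}

/* Take the next free buffer id, or return -1 if none is available. */
static int km_ring_select(struct km_ring *r)
{
	if (r->head == r->tail)
		return -1;
	return r->bids[r->head++ & (KM_RING_ENTRIES - 1)];
}

/* Return a buffer id to the ring after the reply has been read. */
static void km_ring_recycle(struct km_ring *r, unsigned short bid)
{
	assert(r->tail - r->head < KM_RING_ENTRIES);	/* ring not full */
	r->bids[r->tail++ & (KM_RING_ENTRIES - 1)] = bid;
}
```

In this model the server never writes to the ring itself. A buffer
only becomes selectable again after the owner calls km_ring_recycle().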

> buffer API? In the sense that it needs to be further supported for some
> time, but won't get any new feature. Different buffer sizes would then
> only be supported through kbuf/pbuf?

I hope I understood your questions correctly, but if I misread
anything, please let me know. I am going to be updating and submitting
the fuse patches next week. The main update will be changing the
headers to go through a registered memory region (which I only
realized existed after the discussion with Pavel in v1) instead of a
registered buffer, as that will allow us to avoid the per-I/O lookup
overhead and drop the patch for the
"io_uring_fixed_index_get()/io_uring_fixed_index_put()" refcount dance
altogether.

Thanks,
Joanne

>
>
> Thanks,
> Bernd

Thread overview: 14+ messages
2026-03-06  0:32 [PATCH v3 0/8] io_uring: add kernel-managed buffer rings Joanne Koong
2026-03-06  0:32 ` [PATCH v3 1/8] io_uring/kbuf: add support for " Joanne Koong
2026-03-06  0:32 ` [PATCH v3 2/8] io_uring/kbuf: support kernel-managed buffer rings in buffer selection Joanne Koong
2026-03-06  0:32 ` [PATCH v3 3/8] io_uring/kbuf: add buffer ring pinning/unpinning Joanne Koong
2026-03-06  0:32 ` [PATCH v3 4/8] io_uring/kbuf: return buffer id in buffer selection Joanne Koong
2026-03-06  0:32 ` [PATCH v3 5/8] io_uring/kbuf: add recycling for kernel managed buffer rings Joanne Koong
2026-03-06  0:32 ` [PATCH v3 6/8] io_uring/kbuf: add io_uring_is_kmbuf_ring() Joanne Koong
2026-03-06  0:32 ` [PATCH v3 7/8] io_uring/kbuf: export io_ring_buffer_select() Joanne Koong
2026-03-06  0:32 ` [PATCH v3 8/8] io_uring/cmd: set selected buffer index in __io_uring_cmd_done() Joanne Koong
2026-03-20 16:45 ` [PATCH v3 0/8] io_uring: add kernel-managed buffer rings Jens Axboe
2026-03-20 17:16 ` Bernd Schubert
2026-03-20 19:20   ` Joanne Koong [this message]
2026-03-20 19:45     ` Bernd Schubert
2026-03-20 21:58       ` Joanne Koong
