public inbox for io-uring@vger.kernel.org
* fuse/io-uring: Proposal to support pBuf in addition to kBuf
@ 2026-04-13 21:33 Bernd Schubert
From: Bernd Schubert @ 2026-04-13 21:33 UTC
  To: fuse-devel
  Cc: Joanne Koong, io-uring, Jens Axboe, Pavel Begunkov, Ming Lei,
	Miklos Szeredi

Hi Joanne, et al,

this partly duplicates the discussion we had before, but I was badly
distracted by other work and by switching employers, and didn't manage
to reply [1].


I'm still not too happy about kBuf and its restriction to locked-only
memory. Right now I'm reviewing your patches from the point of view of
what needs to be done for ublk (for my current employer) and also for
fuse to support different buffer sizes. Let's say fuse only supports kBuf
with its restriction to pinned memory: I think we would then be forced to
add support for different buffer sizes to both the current
ring-entry-provides-the-buffer interface and the new kBuf interface,
which from my point of view is code duplication.
If we allowed pBuf for fuse, we could put the current
'ring-entry-provides-the-buffer' interface into maintenance mode and
support new features with the new interface only. I know you disagree
with using pBuf [1], with the argument that userspace could free the
buffer. Well, if it does, it is doing something totally wrong, and the
same could happen today over /dev/fuse and also with the existing
fuse-over-io-uring. The window is just smaller there, as the pages are
extracted from the buffer during the copy.

I was looking into what would be needed to support pBuf and I think
io-uring could extract pages from pBuf when the buffer is obtained. That
would limit the window in which userspace can do something wrong,
similar to how current fuse and ublk work.

Suggested changes:

io_uring:

  - io_pin_pages() gets a 'bool longterm' parameter. The new pBuf path
    would pass false, every other existing caller true.
  - io_ring_buf_pin_user() / io_ring_buf_unpin_user()
  - io_ring_buf_get_pages() / io_ring_buf_put_pages() -> fills the
    provided bvec
  - New struct io_ring_buf (in cmd.h)

struct io_ring_buf {
       size_t                  len;
       unsigned int            buf_id;
       unsigned int            nr_bvecs;

       /* private */
       u64                     addr;
       u8                      is_pinned;
};
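
To make the intended lifecycle a bit more concrete, here is a minimal
userspace model, not kernel code. It assumes that 'longterm' simply
selects whether a long-term pin flag is added, that get_pages splits the
user buffer into per-page segments, and that is_pinned tracks whether
pages are currently held. All model_* names, the flag values and the
function bodies are hypothetical; only the struct layout is taken from
the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE      4096u
/* Placeholder flag values for this model only, not the kernel's. */
#define MODEL_FOLL_WRITE     0x1u
#define MODEL_FOLL_LONGTERM  0x2u

/* Layout as proposed; field semantics below are assumptions. */
struct io_ring_buf {
	size_t       len;
	unsigned int buf_id;
	unsigned int nr_bvecs;

	/* private */
	uint64_t     addr;
	uint8_t      is_pinned;
};

/* Stand-in for a bio_vec: one segment that lies within a single page. */
struct model_bvec {
	uint64_t     page_addr;	/* page-aligned start of the page */
	unsigned int offset;	/* offset of the segment in the page */
	unsigned int len;	/* segment length in bytes */
};

/* Assumed meaning of the new parameter: pBuf passes false. */
static unsigned int model_pin_flags(bool longterm)
{
	unsigned int flags = MODEL_FOLL_WRITE;

	if (longterm)
		flags |= MODEL_FOLL_LONGTERM;
	return flags;
}

/* Fill bv[] with one entry per page touched by [addr, addr + len). */
static int model_get_pages(struct io_ring_buf *rb, struct model_bvec *bv,
			   unsigned int max_bvecs)
{
	uint64_t cur = rb->addr;
	size_t left = rb->len;
	unsigned int n = 0;

	if (rb->is_pinned)
		return -1;	/* pages already held, caller bug */

	while (left > 0) {
		unsigned int off = (unsigned int)(cur % MODEL_PAGE_SIZE);
		size_t seg = MODEL_PAGE_SIZE - off;

		if (seg > left)
			seg = left;
		if (n == max_bvecs)
			return -1;	/* pre-allocated array too small */

		bv[n].page_addr = cur - off;
		bv[n].offset = off;
		bv[n].len = (unsigned int)seg;
		n++;
		cur += seg;
		left -= seg;
	}

	rb->nr_bvecs = n;
	rb->is_pinned = 1;
	return 0;
}

/* Release the pages again; the window for userspace mistakes ends here. */
static void model_put_pages(struct io_ring_buf *rb)
{
	rb->nr_bvecs = 0;
	rb->is_pinned = 0;
}
```

In this model an 8192-byte buffer starting 100 bytes into a page ends up
as three segments, which is why the bvec array would have to be sized for
the unaligned worst case.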


Fuse changes:

  - fuse_ring_ent (bufring union side): payload_kvec and ringbuf_buf_id
    replaced by io_ring_buf + pre-allocated bvec array.
  - Buffer selection under queue->lock removed.  The lock only protects
    request dequeue and entry state transitions.  Page access happens
    after the lock is dropped, in the context where the copy runs.
  - setup_fuse_copy_state bufring branch: is_kaddr/kaddr replaced by
    iov_iter_bvec(); the copy would continue to use
    iov_iter_get_pages2().
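
The locking change can be sketched in userspace as follows. This is only
a model under assumptions: a pthread mutex stands in for queue->lock, a
memcpy stands in for the page-based copy, and all model_* names and
states are hypothetical rather than the fuse code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

struct model_req {
	struct model_req *next;
	const char       *payload;
	size_t            len;
};

struct model_queue {
	pthread_mutex_t   lock;	/* stands in for queue->lock */
	struct model_req *head;
};

enum model_ent_state { MODEL_ENT_AVAILABLE, MODEL_ENT_COPYING, MODEL_ENT_SENT };

struct model_ent {
	enum model_ent_state state;
	char                 buf[64];	/* stands in for the ring buffer */
	size_t               used;
};

static int model_send_one(struct model_queue *q, struct model_ent *ent)
{
	struct model_req *req;

	/* Under the lock: request dequeue and entry state transition only. */
	pthread_mutex_lock(&q->lock);
	req = q->head;
	if (!req || ent->state != MODEL_ENT_AVAILABLE ||
	    req->len > sizeof(ent->buf)) {
		pthread_mutex_unlock(&q->lock);
		return -1;
	}
	q->head = req->next;
	ent->state = MODEL_ENT_COPYING;
	pthread_mutex_unlock(&q->lock);

	/* Lock dropped: buffer access and copy happen in this context. */
	memcpy(ent->buf, req->payload, req->len);
	ent->used = req->len;

	/* Re-take the lock only for the next state transition. */
	pthread_mutex_lock(&q->lock);
	ent->state = MODEL_ENT_SENT;
	pthread_mutex_unlock(&q->lock);
	return 0;
}
```

The entry is already marked as copying before the lock is dropped, so in
this model no other context can pick the same entry while the copy runs.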

What do you think?

And my current primary goal is to let ublk support multiple buffer
sizes. ublk would also need to get support for kBuf/pBuf, and I'm
currently assuming that fuse and ublk rings should just get multiple
kBufs/pBufs and a config option that maps bufs to io-size. I'm still
looking into the details of that.
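
One possible shape of such a mapping, purely as a sketch: a table of
buffer classes sorted by size, where each IO picks the smallest class
that fits. The struct, the function name and the "smallest fit" policy
are assumptions for illustration, not an existing fuse or ublk
interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical config entry: one buffer ring per size class. */
struct model_buf_class {
	unsigned int bgid;	/* assumed buffer group id of the ring */
	size_t       buf_size;	/* size of each buffer in that ring */
};

/*
 * Return the index of the smallest class whose buffers can hold io_size,
 * or -1 if no class is large enough. 'classes' must be sorted by
 * ascending buf_size.
 */
static int model_pick_class(const struct model_buf_class *classes,
			    size_t nr_classes, size_t io_size)
{
	size_t i;

	for (i = 0; i < nr_classes; i++) {
		if (io_size <= classes[i].buf_size)
			return (int)i;
	}
	return -1;
}
```

With classes of 4 KiB, 128 KiB and 1 MiB, a 512-byte request would use
the 4 KiB ring and a 4097-byte request the 128 KiB ring; anything above
1 MiB would have to be rejected or split by the caller.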


Thanks,
Bernd


[1]
https://lore.kernel.org/r/CAJnrk1armV9VzBqrrdfr15K5ySBx2YJRk_P0okGnkzyMx_eDOw@mail.gmail.com


