From: Joanne Koong <joannelkoong@gmail.com>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: miklos@szeredi.hu, axboe@kernel.dk, bschubert@ddn.com,
asml.silence@gmail.com, io-uring@vger.kernel.org,
xiaobing.li@samsung.com, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v1 07/30] io_uring/rsrc: add fixed buffer table pinning/unpinning
Date: Wed, 3 Dec 2025 14:52:31 -0800
Message-ID: <CAJnrk1Z_UZxmppmXXQr3joGzMSdU4ycnnGt=SacQT+6DbALDmA@mail.gmail.com>
In-Reply-To: <CADUfDZosVLf4vGm4_kNFReaNH3wSi2RoLXwZBc6TN0Jw__s1OQ@mail.gmail.com>

On Tue, Dec 2, 2025 at 8:49 PM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> On Tue, Dec 2, 2025 at 4:36 PM Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > Add kernel APIs to pin and unpin the buffer table for fixed buffers,
> > preventing userspace from unregistering or updating the fixed buffers
> > table while it is pinned by the kernel.
> >
> > This has two advantages:
> > a) Eliminating the overhead of having to fetch and construct an iter for
> > a fixed buffer per every cmd. Instead, the caller can pin the buffer
> > table, fetch/construct the iter once, and use that across cmds for
> > however long it needs to until it is ready to unpin the buffer table.
> >
> > b) Allowing a fixed buffer lookup at any index. The buffer table must be
> > pinned in order to allow this, otherwise we would have to keep track of
> > all the nodes that have been looked up by the io_kiocb so that we can
> > properly adjust the refcounts for those nodes. Ensuring that the buffer
> > table must first be pinned before being able to fetch a buffer at any
> > index makes things logistically a lot neater.
>
> Why is it necessary to pin the entire buffer table rather than
> specific entries? That's the purpose of the existing io_rsrc_node refs
> field.

How would per-entry pinning work with userspace buffer unregistration,
which operates at the table level? If buffer unregistration should
still succeed, then fuse would need a way to be notified that the
buffer has been unregistered, since the buffer belongs to userspace
(e.g. it would be wrong for fuse to keep using it even though fuse
still holds a refcount on it). If buffer unregistration should fail,
then we would need to track the pinned state inside the node rather
than rely on the refs field alone, since buffers can be unregistered
even while there are in-flight refs (i.e. we would need to distinguish
a ref taken by a pin from an ordinary ref). I think this would also
make unregistration more cumbersome (e.g. we would have to iterate
through all the entries to check whether any are pinned before
iterating through them again to do the actual unregistration).
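
To make the comparison concrete, here is a rough userspace model of the
table-level approach. It is only a sketch and not the kernel code: the
model_* names are hypothetical, and the error codes returned by the pin
and unpin helpers are assumptions for illustration. It shows that one
flag check at the top of unregistration is enough, and that a node with
in-flight refs can still be unregistered once the table is unpinned.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel implementation. */
#define MODEL_RSRC_DATA_PINNED (1u << 0)

struct model_node {
	int refs;        /* the table's own ref plus any in-flight refs */
	bool registered; /* still reachable through the table */
};

struct model_table {
	unsigned int nr;
	unsigned char flags;
	struct model_node *nodes;
};

/* Assumed semantics: pinning an empty or already pinned table fails. */
static int model_table_pin(struct model_table *t)
{
	if (!t->nr)
		return -ENXIO;
	if (t->flags & MODEL_RSRC_DATA_PINNED)
		return -EBUSY;
	t->flags |= MODEL_RSRC_DATA_PINNED;
	return 0;
}

/* Assumed semantics: unpinning a table that is not pinned is an error. */
static int model_table_unpin(struct model_table *t)
{
	if (!(t->flags & MODEL_RSRC_DATA_PINNED))
		return -EINVAL;
	t->flags &= ~MODEL_RSRC_DATA_PINNED;
	return 0;
}

/* One flag check up front, then a single pass over the entries. */
static int model_table_unregister(struct model_table *t)
{
	unsigned int i;

	if (!t->nr)
		return -ENXIO;
	if (t->flags & MODEL_RSRC_DATA_PINNED)
		return -EBUSY;
	for (i = 0; i < t->nr; i++) {
		/* Drop only the table's ref; in-flight refs keep the node alive. */
		assert(t->nodes[i].refs > 0);
		t->nodes[i].refs--;
		t->nodes[i].registered = false;
	}
	t->nr = 0;
	return 0;
}
```

With per-entry pinning, model_table_unregister() would instead need a
first loop over every node to look for a pinned one before the loop
that drops the refs.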
>
> >
> > This is a preparatory patch for fuse io-uring's usage of fixed buffers.
> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> > include/linux/io_uring/buf.h | 13 +++++++++++
> > include/linux/io_uring_types.h | 9 ++++++++
> > io_uring/rsrc.c | 42 ++++++++++++++++++++++++++++++++++
> > 3 files changed, 64 insertions(+)
> >
> > diff --git a/include/linux/io_uring/buf.h b/include/linux/io_uring/buf.h
> > index 7a1cf197434d..c997c01c24c4 100644
> > --- a/include/linux/io_uring/buf.h
> > +++ b/include/linux/io_uring/buf.h
> > @@ -9,6 +9,9 @@ int io_uring_buf_ring_pin(struct io_ring_ctx *ctx, unsigned buf_group,
> > unsigned issue_flags, struct io_buffer_list **bl);
> > int io_uring_buf_ring_unpin(struct io_ring_ctx *ctx, unsigned buf_group,
> > unsigned issue_flags);
> > +
> > +int io_uring_buf_table_pin(struct io_ring_ctx *ctx, unsigned issue_flags);
> > +int io_uring_buf_table_unpin(struct io_ring_ctx *ctx, unsigned issue_flags);
> > #else
> > static inline int io_uring_buf_ring_pin(struct io_ring_ctx *ctx,
> > unsigned buf_group,
> > @@ -23,6 +26,16 @@ static inline int io_uring_buf_ring_unpin(struct io_ring_ctx *ctx,
> > {
> > return -EOPNOTSUPP;
> > }
> > +static inline int io_uring_buf_table_pin(struct io_ring_ctx *ctx,
> > + unsigned issue_flags)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +static inline int io_uring_buf_table_unpin(struct io_ring_ctx *ctx,
> > + unsigned issue_flags)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > #endif /* CONFIG_IO_URING */
> >
> > #endif /* _LINUX_IO_URING_BUF_H */
> > diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> > index 36fac08db636..e1a75cfe57d9 100644
> > --- a/include/linux/io_uring_types.h
> > +++ b/include/linux/io_uring_types.h
> > @@ -57,8 +57,17 @@ struct io_wq_work {
> > int cancel_seq;
> > };
> >
> > +/*
> > + * struct io_rsrc_data flag values:
> > + *
> > + * IO_RSRC_DATA_PINNED: data is pinned and cannot be unregistered by userspace
> > + * until it has been unpinned. Currently this is only possible on buffer tables.
> > + */
> > +#define IO_RSRC_DATA_PINNED BIT(0)
> > +
> > struct io_rsrc_data {
> > unsigned int nr;
> > + u8 flags;
> > struct io_rsrc_node **nodes;
> > };
> >
> > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> > index 3765a50329a8..67331cae0a5a 100644
> > --- a/io_uring/rsrc.c
> > +++ b/io_uring/rsrc.c
> > @@ -9,6 +9,7 @@
> > #include <linux/hugetlb.h>
> > #include <linux/compat.h>
> > #include <linux/io_uring.h>
> > +#include <linux/io_uring/buf.h>
> > #include <linux/io_uring/cmd.h>
> >
> > #include <uapi/linux/io_uring.h>
> > @@ -304,6 +305,8 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
> > return -ENXIO;
> > if (up->offset + nr_args > ctx->buf_table.nr)
> > return -EINVAL;
> > + if (ctx->buf_table.flags & IO_RSRC_DATA_PINNED)
> > + return -EBUSY;
>
> IORING_REGISTER_CLONE_BUFFERS can also be used to unregister existing
> buffers, so it may need the check too?

Ah, I didn't realize this existed, thanks. I think it's okay to clone
the buffers in a source ring's pinned buffer table to a destination
ring whose buffer table is unpinned, since the clone acquires its own
refcounts on the underlying nodes and is its own entity. Does this
make sense to you, or do you think it's better to just not allow this?
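
As a rough illustration of what I mean, here is a small userspace model
of cloning from a pinned source table. It is a sketch under my own
assumptions rather than the actual clone path: the model_* names are
hypothetical, tables are assumed to be heap-allocated, and the
destination is assumed to be empty and to not inherit the pinned flag.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel implementation. */
#define MODEL_RSRC_DATA_PINNED (1u << 0)

struct model_node {
	int refs; /* one ref per table that holds this node */
};

struct model_table {
	unsigned int nr;
	unsigned char flags;
	struct model_node **nodes; /* heap-allocated array of node pointers */
};

/*
 * Clone src into an empty dst. The nodes are shared, dst takes its own
 * ref on each one, and dst starts out unpinned even if src is pinned.
 */
static int model_clone(struct model_table *dst, const struct model_table *src)
{
	unsigned int i;

	if (!src->nr)
		return -ENXIO;
	if (dst->nr)
		return -EBUSY;
	dst->nodes = calloc(src->nr, sizeof(*dst->nodes));
	if (!dst->nodes)
		return -ENOMEM;
	for (i = 0; i < src->nr; i++) {
		dst->nodes[i] = src->nodes[i];
		dst->nodes[i]->refs++;
	}
	dst->nr = src->nr;
	dst->flags = 0;
	return 0;
}

/* Unregistering drops only this table's refs and frees its array. */
static int model_unregister(struct model_table *t)
{
	unsigned int i;

	if (!t->nr)
		return -ENXIO;
	if (t->flags & MODEL_RSRC_DATA_PINNED)
		return -EBUSY;
	for (i = 0; i < t->nr; i++) {
		assert(t->nodes[i]->refs > 0);
		t->nodes[i]->refs--;
	}
	free(t->nodes);
	t->nodes = NULL;
	t->nr = 0;
	return 0;
}
```

In this model, unregistering the clone only drops the clone's refs, so
the pinned source table and its nodes are left untouched.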
>
> >
> > for (done = 0; done < nr_args; done++) {
> > struct io_rsrc_node *node;
> > @@ -615,6 +618,8 @@ int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
> > {
> > if (!ctx->buf_table.nr)
> > return -ENXIO;
> > + if (ctx->buf_table.flags & IO_RSRC_DATA_PINNED)
> > + return -EBUSY;
>
> io_buffer_unregister_bvec() can also be used to unregister ublk
> zero-copy buffers (also under control of userspace), so it may need
> the check too? But maybe fuse ensures that it never uses a ublk
> zero-copy buffer?

fuse doesn't expose a way for userspace to unregister a zero-copy
buffer, but thanks for considering this possibility.

Thanks,
Joanne
>
> Best,
> Caleb