From: Ming Lei <[email protected]>
To: Keith Busch <[email protected]>
Cc: Keith Busch <[email protected]>,
[email protected], [email protected],
[email protected], [email protected],
Bernd Schubert <[email protected]>
Subject: Re: [PATCH 0/6] ublk zero-copy support
Date: Sat, 8 Feb 2025 13:44:41 +0800
Message-ID: <Z6bvSXKF9ESwJ61r@fedora>
In-Reply-To: <Z6YTfi29FcSQ1cSe@kbusch-mbp>
On Fri, Feb 07, 2025 at 07:06:54AM -0700, Keith Busch wrote:
> On Fri, Feb 07, 2025 at 11:51:49AM +0800, Ming Lei wrote:
> > On Mon, Feb 03, 2025 at 07:45:11AM -0800, Keith Busch wrote:
> > >
> > > The previous version from Ming can be viewed here:
> > >
> > > https://lore.kernel.org/linux-block/[email protected]/
> > >
> > > Based on the feedback from that thread, the desired io_uring interfaces
> > > needed to be simpler, and the kernel registered resources need to behave
> > > more similar to user registered buffers.
> > >
> > > This series introduces a new resource node type, KBUF, which, like the
> > > BUFFER resource, needs to be installed into an io_uring buf_node table
> > > in order for the user to access it in a fixed buffer command. The
> > > new io_uring kernel API provides a way for a user to register a struct
> > > request's bvec to a specific index, and a way to unregister it.
> > >
> > > When the ublk server receives notification of a new command, it must
> > > first select an index and register the zero copy buffer. It may use that
> > > index for any number of fixed buffer commands, then it must unregister
> > > the index when it's done. This can all be done in a single io_uring_enter
> > > if desired, or it can be split into multiple enters if needed.
> >
> > I suspect it may not be possible in a single io_uring_enter() because
> > there is a strict dependency among the three OPs (register buffer,
> > read/write, unregister buffer).
>
> The registration is synchronous. io_uring completes the SQE entirely
> before it even looks at the read command in the next SQE.
Can you explain a bit what "synchronous" means here?
In patch 4, two ublk uring_cmds (UBLK_U_IO_REGISTER_IO_BUF and
UBLK_U_IO_UNREGISTER_IO_BUF) are added, and their handlers are called
from uring_cmd's ->issue().
>
> The read or write is asynchronous, but its prep takes a reference on
> the node before moving on to the next SQE.
The buffer is registered in ->issue() of UBLK_U_IO_REGISTER_IO_BUF, and
that may not have run yet when ->prep() of read_fixed/write_fixed is
called, which is where the buffer is looked up.
>
> The unregister is synchronous, and clears the index node, but the
> possibly inflight read or write has a reference on that node, so all
> good.
UBLK_U_IO_UNREGISTER_IO_BUF tells ublk that the buffer isn't used any
more, but it is still being used by the async read/write.
It might work, but it looks a bit fragile. For example, a buggy
application may panic the kernel if the IO command is completed before
the read/write is done.
>
> > > + ublk_get_sqe_three(q->ring_ptr, &reg, &read, &ureg);
> > > +
> > > + io_uring_prep_buf_register(reg, 0, tag, q->q_id, tag);
> > > +
> > > + io_uring_prep_read_fixed(read, 1 /*fds[1]*/,
> > > + 0,
> > > + iod->nr_sectors << 9,
> > > + iod->start_sector << 9,
> > > + tag);
> > > + io_uring_sqe_set_flags(read, IOSQE_FIXED_FILE);
> > > + read->user_data = build_user_data(tag, ublk_op, 0, 1);
> >
> > Does this interface support reading into part of the buffer? That is
> > useful for stacking device cases.
>
> Are you wanting to read into this buffer without copying in parts? As in
> provide an offset and/or smaller length across multiple commands? If
> that's what you mean, then yes, you can do that here.
OK.
>
> > Also, does this interface support consuming the buffer from multiple
> > OPs concurrently?
>
> You can register as many kernel buffers from as many OPs as you have
> space for in your table, and you can use them all concurrently. Pretty
> much the same as user registered fixed buffers. The main difference from
> user buffers is how you register them.
Here it depends on whether a LINK between the buffer register and the
read/write is required. If it is, multiple OPs consuming the buffer have
to be linked one by one, and then they can't be issued concurrently.
Thanks,
Ming