From: Ming Lei <[email protected]>
To: Keith Busch <[email protected]>
Cc: Keith Busch <[email protected]>,
	[email protected], [email protected],
	[email protected], [email protected],
	Bernd Schubert <[email protected]>
Subject: Re: [PATCH 0/6] ublk zero-copy support
Date: Sat, 8 Feb 2025 13:44:41 +0800
Message-ID: <Z6bvSXKF9ESwJ61r@fedora>
In-Reply-To: <Z6YTfi29FcSQ1cSe@kbusch-mbp>

On Fri, Feb 07, 2025 at 07:06:54AM -0700, Keith Busch wrote:
> On Fri, Feb 07, 2025 at 11:51:49AM +0800, Ming Lei wrote:
> > On Mon, Feb 03, 2025 at 07:45:11AM -0800, Keith Busch wrote:
> > > 
> > > The previous version from Ming can be viewed here:
> > > 
> > >   https://lore.kernel.org/linux-block/[email protected]/
> > > 
> > > Based on the feedback from that thread, the desired io_uring interfaces
> > > needed to be simpler, and the kernel registered resources need to behave
> > > more similar to user registered buffers.
> > > 
> > > This series introduces a new resource node type, KBUF, which, like the
> > > BUFFER resource, needs to be installed into an io_uring buf_node table
> > > in order for the user to access it in a fixed buffer command. The
> > > new io_uring kernel API provides a way for a user to register a struct
> > > request's bvec to a specific index, and a way to unregister it.
> > > 
> > > When the ublk server receives notification of a new command, it must
> > > first select an index and register the zero copy buffer. It may use that
> > > index for any number of fixed buffer commands, then it must unregister
> > > the index when it's done. This can all be done in a single io_uring_enter
> > > if desired, or it can be split into multiple enters if needed.
> > 
> > I suspect it may not be possible in a single io_uring_enter() because
> > there is a strict dependency among the three OPs (register buffer,
> > read/write, unregister buffer).
> 
> The registration is synchronous. io_uring completes the SQE entirely
> before it even looks at the read command in the next SQE.

Can you explain a bit what "synchronous" means here?

In patch 4, two ublk uring_cmds (UBLK_U_IO_REGISTER_IO_BUF and
UBLK_U_IO_UNREGISTER_IO_BUF) are added, and their handlers are called
from uring_cmd's ->issue().

> 
> The read or write is asynchronous, but its prep takes a reference on
> the node before moving on to the next SQE.

The buffer is registered in ->issue() of UBLK_U_IO_REGISTER_IO_BUF, so
the registration may not have happened yet when ->prep() of
read_fixed/write_fixed is called, and ->prep() is where the buffer is
looked up.
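
The ordering question can be illustrated with a toy model. This is only
a sketch and not kernel code: the toy_* types and the two submit
functions are hypothetical names made up for illustration. It assumes
the lookup happens at prep time and the registration at issue time, as
described above, and compares "prep and issue each SQE in turn" with
"prep every SQE first, then issue".

```c
/* Toy model of prep/issue ordering. None of this is kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 4

enum op_type { OP_REGISTER, OP_READ_FIXED };

struct toy_sqe {
	enum op_type type;
	unsigned int index;	/* buffer table slot */
	bool prep_ok;		/* result of the prep-time lookup */
};

struct toy_ring {
	bool table[TABLE_SIZE];	/* true if a buffer is installed */
};

/* prep: read_fixed looks up its buffer here; register does nothing yet */
static void toy_prep(struct toy_ring *r, struct toy_sqe *s)
{
	assert(s->index < TABLE_SIZE);
	s->prep_ok = (s->type == OP_REGISTER) || r->table[s->index];
}

/* issue: register installs the buffer into the table here */
static void toy_issue(struct toy_ring *r, struct toy_sqe *s)
{
	if (s->type == OP_REGISTER)
		r->table[s->index] = true;
}

/* Model A: each SQE is prepped and issued before the next is looked at */
static void submit_one_by_one(struct toy_ring *r, struct toy_sqe *sqes,
			      size_t n)
{
	for (size_t i = 0; i < n; i++) {
		toy_prep(r, &sqes[i]);
		toy_issue(r, &sqes[i]);
	}
}

/* Model B: every SQE is prepped first, then all are issued in order */
static void submit_prep_all_first(struct toy_ring *r, struct toy_sqe *sqes,
				  size_t n)
{
	for (size_t i = 0; i < n; i++)
		toy_prep(r, &sqes[i]);
	for (size_t i = 0; i < n; i++)
		toy_issue(r, &sqes[i]);
}
```

With a register SQE followed by a read_fixed SQE on the same index, the
read's prep-time lookup succeeds in model A and fails in model B, so the
answer depends on which ordering actually applies to the submission.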

> 
> The unregister is synchronous, and clears the index node, but the
> possibly inflight read or write has a reference on that node, so all
> good.

UBLK_U_IO_UNREGISTER_IO_BUF tells ublk that the buffer is no longer
used, but it may still be in use by the async read/write.

It might work, but it looks a bit fragile. For example, a buggy
application may panic the kernel if the IO command is completed before
the read/write is done.
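
The lifetime rule under discussion can be sketched as a toy reference
count. This is not kernel code: toy_node, toy_table and the helpers are
hypothetical names, and the model assumes one reference held by the
table slot plus one per in-flight read/write, as Keith describes.

```c
/* Toy refcount model of a registered buffer node. Not kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 4

struct toy_node {
	int refs;	/* one for the table slot, one per in-flight user */
	bool released;	/* set when the last reference is dropped */
};

struct toy_table {
	struct toy_node *slot[TABLE_SIZE];
};

/* Drop one reference; the buffer may be given back only at zero */
static void toy_put(struct toy_node *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0)
		n->released = true;
}

/* Install the node; the table slot holds the initial reference */
static void toy_register(struct toy_table *t, unsigned int idx,
			 struct toy_node *n)
{
	assert(idx < TABLE_SIZE);
	n->refs = 1;
	n->released = false;
	t->slot[idx] = n;
}

/* What a read/write would do when it looks the buffer up */
static struct toy_node *toy_lookup_get(struct toy_table *t, unsigned int idx)
{
	assert(idx < TABLE_SIZE);
	struct toy_node *n = t->slot[idx];

	if (n)
		n->refs++;
	return n;
}

/* Clear the slot and drop only the table's reference */
static void toy_unregister(struct toy_table *t, unsigned int idx)
{
	assert(idx < TABLE_SIZE);
	struct toy_node *n = t->slot[idx];

	t->slot[idx] = NULL;
	if (n)
		toy_put(n);
}
```

In this model, unregistering while a read is in flight clears the slot
but leaves the node alive until the read calls toy_put(). The scheme is
safe only if the underlying request is not completed before `released`
becomes true, which is the fragile part.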

> 
> > > +		ublk_get_sqe_three(q->ring_ptr, &reg, &read, &ureg);
> > > +
> > > +		io_uring_prep_buf_register(reg, 0, tag, q->q_id, tag);
> > > +
> > > +		io_uring_prep_read_fixed(read, 1 /*fds[1]*/,
> > > +			0,
> > > +			iod->nr_sectors << 9,
> > > +			iod->start_sector << 9,
> > > +			tag);
> > > +		io_uring_sqe_set_flags(read, IOSQE_FIXED_FILE);
> > > +		read->user_data = build_user_data(tag, ublk_op, 0, 1);
> > 
> > Does this interface support reading into part of the buffer? That is
> > useful for stacking device cases.
> 
> Are you wanting to read into this buffer without copying in parts? As in
> provide an offset and/or smaller length across multiple commands? If
> that's what you mean, then yes, you can do that here.

OK.
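
For a stacking target, the partial ranges could be computed along these
lines. This is a sketch under assumptions: sub_range and nth_chunk() are
hypothetical helpers, and it assumes each command passes a byte offset
into the registered buffer plus a shorter length, which is what the
reply above suggests but which is not shown in the quoted code.

```c
/* Split one request buffer into fixed-size chunks. Illustrative only. */
#include <assert.h>
#include <stdint.h>

struct sub_range {
	uint64_t buf_off;	/* byte offset into the registered buffer */
	uint64_t len;		/* bytes covered by this chunk, 0 if none */
};

/*
 * Return the i-th chunk of a buffer of total_len bytes cut into pieces
 * of chunk_len bytes. The last chunk may be short. len == 0 means that
 * i is past the end of the buffer.
 */
static struct sub_range nth_chunk(uint64_t total_len, uint64_t chunk_len,
				  uint64_t i)
{
	struct sub_range r = { 0, 0 };

	assert(chunk_len > 0);
	if (i > total_len / chunk_len)
		return r;	/* also keeps i * chunk_len from overflowing */

	uint64_t off = i * chunk_len;

	if (off >= total_len)
		return r;

	r.buf_off = off;
	r.len = (total_len - off < chunk_len) ? total_len - off : chunk_len;
	return r;
}
```

For example, a 20-sector request (20 << 9 = 10240 bytes) cut into 4096
byte chunks yields ranges at offsets 0, 4096 and 8192, the last one
2048 bytes long.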

>  
> > Also, does this interface support consuming the buffer from multiple
> > OPs concurrently?
> 
> You can register as many kernel buffers from as many OPs as you have
> space for in your table, and you can use them all concurrently. Pretty
> much the same as user registered fixed buffers. The main difference from
> user buffers is how you register them.

That depends on whether a LINK between the buffer register and the
read/write is required. If it is, multiple OPs consuming the buffer have
to be linked one by one, so they can't be issued concurrently.


Thanks,
Ming

