public inbox for [email protected]
 help / color / mirror / Atom feed
From: Ming Lei <[email protected]>
To: Bernd Schubert <[email protected]>
Cc: Jens Axboe <[email protected]>,
	[email protected], [email protected],
	[email protected], [email protected],
	Miklos Szeredi <[email protected]>,
	Stefan Hajnoczi <[email protected]>,
	ZiyangZhang <[email protected]>
Subject: Re: [RFC PATCH 4/4] ublk_drv: support splice based read/write zero copy
Date: Mon, 7 Nov 2022 09:05:05 +0800	[thread overview]
Message-ID: <Y2hZwWdY28bCn+iT@T590> (raw)
In-Reply-To: <[email protected]>

On Sat, Nov 05, 2022 at 12:37:21AM +0100, Bernd Schubert wrote:
> 
> 
> On 11/4/22 01:44, Ming Lei wrote:
> > On Thu, Nov 03, 2022 at 11:28:29PM +0100, Bernd Schubert wrote:
> > > 
> > > 
> > > On 11/3/22 09:50, Ming Lei wrote:
> > > > Pass ublk block IO request pages to kernel backend IO handling code via
> > > > pipe, and request page copy can be avoided. So far, the existed
> > > > pipe/splice mechanism works for handling write request only.
> > > > 
> > > > The initial idea of using splice for zero copy is from Miklos and Stefan.
> > > > 
> > > > Read request's zero copy requires pipe's change to allow one read end to
> > > > produce buffers for another read end to consume. The added SPLICE_F_READ_TO_READ
> > > > flag is for supporting this feature.
> > > > 
> > > > READ is handled by sending IORING_OP_SPLICE with SPLICE_F_DIRECT |
> > > > SPLICE_F_READ_TO_READ. WRITE is handled by sending IORING_OP_SPLICE with
> > > > SPLICE_F_DIRECT. Kernel internal pipe is used for simplifying userspace,
> > > > meantime potential info leak could be avoided.
> > > 
> > > 
> > > Sorry to ask, do you have an ublk branch that gives an example how to use
> > > this?
> > 
> > Follows the ublk splice-zc branch:
> > 
> > https://github.com/ming1/ubdsrv/commits/splice-zc
> > 
> > which is mentioned in cover letter, but I guess it should be added to
> > here too, sorry for that, so far only ublk-loop supports it by:
> > 
> >     ublk add -t loop -f $BACKING -z
> > 
> > without '-z', ublk-loop is created with zero copy disabled.
> 
> Ah, thanks a lot! And sorry, I had missed this part in the cover letter.
> 
> I will take a look on your new zero copy code on Monday.
> 
> 
> > 
> > > 
> > > I still have several things to fix in my branches, but I got basic fuse
> > > uring with copies working. Adding back splice would be next after posting
> > > rfc patches. My initial assumption was that I needed to duplicate everything
> > > splice does into the fuse .uring_cmd handler - obviously there is a better
> > > way with your patches.
> > > 
> > > This week I have a few days off, by end of next week or the week after I
> > > might have patches in an rfc state (one thing I'm going to ask about is how
> > > do I know what is the next CQE in the kernel handler - ublk does this with
> > > tags through mq, but I don't understand yet where the tag is increased and
> > > what the relation between tag and right CQE order is).
> > 
> > tag is one attribute of io request, which is originated from ublk
> > driver, and it is unique for each request among one queue. So ublksrv
> > won't change it at all, just use it, and ublk driver guarantees that
> > it is unique.
> > 
> > In ublkserv implementation, the tag info is set in cqe->user_data, so
> > we can retrieve the io request via tag part of cqe->user_data.
> 
> Yeah, this is the easy part I understood. At least I hope so :)
> 
> > 
> > Also I may not understand your question of 'the relation between tag and right
> > CQE order', io_uring provides IOSQE_IO_DRAIN/IOSQE_IO_LINK for ordering
> > SQE, and ublksrv only applies IOSQE_IO_LINK in ublk-qcow2, so care to
> > explain it in a bit details about the "the relation between tag and right
> > CQE order"?
> 
> 
> For fuse (kernel) a vfs request comes in and I need to choose a command in
> the ring queue. Right now this is just an atomic counter % queue_size
> 
> fuse_request_alloc_ring()
> 	req_cnt = atomic_inc_return(&queue->req_cnt);
> 	tag = req_cnt & (fc->ring.queue_depth - 1); /* cnt % queue_depth */
> 
> 	ring_req = &queue->ring_req[tag];
> 
> 
> 
> I might be wrong, but I think that can be compared a bit to ublk_queue_rq().
> Looks like ublk_queue_rq gets called in blk-mq context and blk-mq seems to
> provide rq->tag, which then determines the command in the ring queue -
> completion of commands is done in tag-order provided by blk-mq? The part I

The two are not related, blk-mq tag number means nothing wrt. io
handling order:

- tag is allocated via sbitmap, which may return tag number in any
  order, you may think the returned number is just random
- blk-mq may re-order requests and dispatch them with any order
- once requests are issued to io_uring, userspace may handles these IOs
  with any order
- after backend io is queued via io_uring or libaio or whatever to kernel, it
could be completed at any order

> didn't figure out yet is where the tag value gets set.
> Also interesting is that there is no handler if the ring is already full -
> like the ublk_io command is currently busy in ublksrv (user space). Handled
> auto-magically with blk-mq?

For ublk, the queue has fixed depth, so the pre-allocated io_uring size is
enough, and blk-mq can throttle IOs from the beginning if the max queue depth is
reached, so ublk needn't to worry about io_uring size/depth.

But fuse may have to consider request throttle.


Thanks, 
Ming


  reply	other threads:[~2022-11-07  1:06 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-03  8:50 [RFC PATCH 0/4] io_uring/splice: extend splice for supporting ublk zero copy Ming Lei
2022-11-03  8:50 ` [RFC PATCH 1/4] io_uring/splice: support do_splice_direct Ming Lei
2022-11-08  7:42   ` Christoph Hellwig
2022-11-03  8:50 ` [RFC PATCH 2/4] fs/splice: add helper of splice_dismiss_pipe() Ming Lei
2022-11-03  8:50 ` [RFC PATCH 3/4] io_uring/splice: support splice from ->splice_read to ->splice_read Ming Lei
2022-11-08  7:45   ` Christoph Hellwig
2022-11-08  8:29     ` Ming Lei
2022-11-03  8:50 ` [RFC PATCH 4/4] ublk_drv: support splice based read/write zero copy Ming Lei
2022-11-03 22:28   ` Bernd Schubert
2022-11-04  0:44     ` Ming Lei
2022-11-04 23:37       ` Bernd Schubert
2022-11-07  1:05         ` Ming Lei [this message]
2022-11-04  9:15 ` [RFC PATCH 0/4] io_uring/splice: extend splice for supporting ublk " Ziyang Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y2hZwWdY28bCn+iT@T590 \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox