From: Ming Lei <[email protected]>
To: Pavel Begunkov <[email protected]>
Cc: Ziyang Zhang <[email protected]>,
	Miklos Szeredi <[email protected]>,
	Bernd Schubert <[email protected]>, Jens Axboe <[email protected]>,
	Xiaoguang Wang <[email protected]>,
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH V3 00/16] io_uring/ublk: add IORING_OP_FUSED_CMD
Date: Tue, 28 Mar 2023 09:01:21 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On Mon, Mar 27, 2023 at 05:04:01PM +0100, Pavel Begunkov wrote:
> On 3/21/23 09:17, Ziyang Zhang wrote:
> > On 2023/3/19 00:23, Pavel Begunkov wrote:
> > > On 3/16/23 03:13, Xiaoguang Wang wrote:
> > > > > Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to
> > > > > be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd
> > > > > 64byte SQE(slave) is another normal 64byte OP. For any OP which needs
> > > > > to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1,
> > > > > and its ->issue() can retrieve/import buffer from master request's
> > > > > fused_cmd_kbuf. The slave OP is actually submitted from kernel, part of
> > > > > this idea is from Xiaoguang's ublk ebpf patchset, but this patchset
> > > > > submits slave OP just like normal OP issued from userspace, that said,
> > > > > SQE order is kept, and batching handling is done too.
> > > > Thanks for this great work, seems that we're now in the right direction
> > > > to support ublk zero copy, I believe this feature will improve io throughput
> > > > greatly and reduce ublk's cpu resource usage.
> > > > 
> > > > I have gone through your 2nd patch, and have some little concerns here:
> > > > Say we have one ublk loop target device, but it has 4 backend files,
> > > > every file will carry 25% of device capacity and it's implemented in striped
> > > > way, then for every io request, current implementation will need to issue 4
> > > > fused_cmd, right? 4 slave sqes are necessary, but it would be better to
> > > > have just one master sqe, so I wonder whether we can have another
> > > > method. The key point is to let io_uring support register various kernel
> > > > memory objects, which come from kernel, such as ITER_BVEC or
> > > > ITER_KVEC. so how about below actions:
> > > > 1. add a new infrastructure in io_uring, which will support to register
> > > > various kernel memory objects in it, this new infrastructure could be
> > > > maintained in a xarray structure, every memory objects in it will have
> > > > a unique id. This registration could be done in a ublk uring cmd, io_uring
> > > > offers registration interface.
> > > > 2. then any sqe can use these memory objects freely, so long as it
> > > > passes above unique id in sqe properly.
> > > > Above are just rough ideas, just for your reference.
> > > 
> > > It precisely hints on what I proposed a bit earlier, that makes
> > > me not alone thinking that it's a good idea to have a design allowing
> > > 1) multiple ops using a buffer and 2) not limiting it to one single
> > > submission because the userspace might want to preprocess a part
> > > of the data, multiplex it or on the opposite divide. I was mostly
> > > coming from non ublk cases, and one example would be such zc recv,
> > > parsing the app level headers and redirecting the rest of the data
> > > somewhere.
> > > 
> > > I haven't got a chance to work on it but will return to it in
> > > a week. The discussion was here:
> > > 
> > > https://lore.kernel.org/all/[email protected]/
> > > 
> > 
> > Hi Pavel and all,
> > 
> > I think it is a good idea to register some kernel objects(such as bvec)
> > in io_uring and return a cookie(such as buf_idx) for READ/WRITE/SEND/RECV sqes.
> > There are some ways to register user's buffer such as IORING_OP_PROVIDE_BUFFERS
> > and IORING_REGISTER_PBUF_RING but there is not a way to register kernel buffer(bvec).
> > 
> > I do not think reusing splice is a good idea because splice should run in io-wq.
> 
> The reason why I disabled inline splice execution is because do_splice()
> and below the stack doesn't support nowait well enough, which is not a
> problem when we hook directly under the ->splice_read() callback and
> operate only with one file at a time at the io_uring level.

I believe I have explained several times[1][2] that it isn't a good solution for
ublk zero copy.

But if you insist on reusing splice for this feature, please share your code and
I'm happy to give a review.

[1] https://lore.kernel.org/linux-block/ZB8B8cr1%[email protected]/T/#m1bfa358524b6af94731bcd5be28056f9f4408ecf
[2] https://github.com/ming1/linux/blob/my_v6.3-io_uring_fuse_cmd_v4/Documentation/block/ublk.rst#zero-copy

Thanks,
Ming