public inbox for [email protected]
 help / color / mirror / Atom feed
From: Ming Lei <[email protected]>
To: Bernd Schubert <[email protected]>
Cc: Jens Axboe <[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>,
	Miklos Szeredi <[email protected]>,
	ZiyangZhang <[email protected]>,
	Xiaoguang Wang <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Stefan Hajnoczi <[email protected]>,
	Dan Williams <[email protected]>,
	Amir Goldstein <[email protected]>,
	[email protected]
Subject: Re: [PATCH V6 00/17] io_uring/ublk: add generic IORING_OP_FUSED_CMD
Date: Wed, 19 Apr 2023 09:51:31 +0800	[thread overview]
Message-ID: <ZD9JI/[email protected]> (raw)
In-Reply-To: <[email protected]>

On Tue, Apr 18, 2023 at 07:38:03PM +0000, Bernd Schubert wrote:
> On 3/30/23 13:36, Ming Lei wrote:
> [...]
> > V6:
> > 	- re-design fused command, and make it more generic, moving sharing buffer
> > 	as one plugin of fused command, so in future we can implement more plugins
> > 	- document potential other use cases of fused command
> > 	- drop support for builtin secondary sqe in SQE128, so all secondary
> > 	  requests has standalone SQE
> > 	- make fused command as one feature
> > 	- cleanup & improve naming
> 
> Hi Ming, et al.,
> 
> I started to wonder if fused SQE could be extended to combine multiple 
> syscalls, for example open/read/close.  Which would be another solution 
> for the readfile syscall Miklos had proposed some time ago.
> 
> https://lore.kernel.org/lkml/CAJfpegusi8BjWFzEi05926d4RsEQvPnRW-w7My=ibBHQ8NgCuw@mail.gmail.com/
> 
> If fused SQEs could be extended, I think it would be quite helpful for 
> many other patterns. Another similar examples would open/write/close, 
> but ideal would be also to allow to have it more complex like 
> "open/write/sync_file_range/close" - open/write/close might be the 
> fastest and could possibly return before sync_file_range. Use case for 
> the latter would be a file server that wants to give notifications to 
> client when pages have been written out.

The above pattern needn't fused command, and it can be done by plain
SQEs chain, follows the usage:

1) suppose you get one command from /dev/fuse, then FUSE daemon
needs to handle the command as open/write/sync/close
2) get sqe1, prepare it for open syscall, mark it as IOSQE_IO_LINK;
3) get sqe2, prepare it for write syscall, mark it as IOSQE_IO_LINK;
4) get sqe3, prepare it for sync file range syscall, mark it as IOSQE_IO_LINK;
5) get sqe4, prepare it for close syscall
6) io_uring_enter();	//for submit and get events

Then all the four OPs are done one by one by io_uring internal
machinery, and you can choose to get successful CQE for each OP.

Is the above what you want to do?

The fused command proposal is actually for zero copy(but not limited to zc).

If the above write OP need to write to file with in-kernel buffer
of /dev/fuse directly, you can get one sqe0 and prepare it for primary command
before 1), and set sqe2->addr to offet of the buffer in 3).

However, fused command is usually used in the following way, such as FUSE daemon
gets one READ request from /dev/fuse, FUSE userspace can handle the READ request
as io_uring fused command:

1) get sqe0 and prepare it for primary command, in which you need to
provide info for retrieving kernel buffer/pages of this READ request

2) suppose this READ request needs to be handled by translating it to
READs to two files/devices, considering it as one mirror:

- get sqe1, prepare it for read from file1, and set sqe->addr to offset
  of the buffer in 1), set sqe->len as length for read; this READ OP
  uses the kernel buffer in 1) directly 

- get sqe2, prepare it for read from file2, and set sqe->addr to offset
  of buffer in 1), set sqe->len as length for read;  this READ OP
  uses the kernel buffer in 1) directly 

3) submit the three sqe by io_uring_enter()

sqe1 and sqe2 can be submitted concurrently or be issued one by one
in order, fused command supports both, and depends on user requirement.
But io_uring linked OPs is usually slower.

Also file1/file2 needs to be opened beforehand in this example, and FD is
passed to sqe1/sqe2, another choice is to use fixed File; Also you can
add the open/close() OPs into above steps, which need these open/close/READ
to be linked in order, usually slower tnan non-linked OPs.


Thanks, 
Ming


  reply	other threads:[~2023-04-19  1:52 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-30 11:36 [PATCH V6 00/17] io_uring/ublk: add generic IORING_OP_FUSED_CMD Ming Lei
2023-03-30 11:36 ` [PATCH V6 01/17] io_uring: increase io_kiocb->flags into 64bit Ming Lei
2023-03-30 11:36 ` [PATCH V6 02/17] io_uring: use ctx->cached_sq_head to calculate left sqes Ming Lei
2023-03-30 11:36 ` [PATCH V6 03/17] io_uring: add generic IORING_OP_FUSED_CMD Ming Lei
2023-04-01 14:35   ` Ming Lei
2023-03-30 11:36 ` [PATCH V6 04/17] io_uring: support providing buffer by IORING_OP_FUSED_CMD Ming Lei
2023-03-30 11:36 ` [PATCH V6 05/17] io_uring: support OP_READ/OP_WRITE for fused secondary request Ming Lei
2023-03-30 11:36 ` [PATCH V6 06/17] io_uring: support OP_SEND_ZC/OP_RECV " Ming Lei
2023-03-30 11:36 ` [PATCH V6 07/17] block: ublk_drv: add common exit handling Ming Lei
2023-03-30 11:36 ` [PATCH V6 08/17] block: ublk_drv: don't consider flush request in map/unmap io Ming Lei
2023-03-30 11:36 ` [PATCH V6 09/17] block: ublk_drv: add two helpers to clean up map/unmap request Ming Lei
2023-03-30 11:36 ` [PATCH V6 10/17] block: ublk_drv: clean up several helpers Ming Lei
2023-03-30 11:36 ` [PATCH V6 11/17] block: ublk_drv: cleanup 'struct ublk_map_data' Ming Lei
2023-03-30 11:36 ` [PATCH V6 12/17] block: ublk_drv: cleanup ublk_copy_user_pages Ming Lei
2023-03-31 16:22   ` Bernd Schubert
2023-03-30 11:36 ` [PATCH V6 13/17] block: ublk_drv: grab request reference when the request is handled by userspace Ming Lei
2023-03-30 11:36 ` [PATCH V6 14/17] block: ublk_drv: support to copy any part of request pages Ming Lei
2023-03-30 11:36 ` [PATCH V6 15/17] block: ublk_drv: add read()/write() support for ublk char device Ming Lei
2023-03-30 11:36 ` [PATCH V6 16/17] block: ublk_drv: don't check buffer in case of zero copy Ming Lei
2023-03-30 11:36 ` [PATCH V6 17/17] block: ublk_drv: apply io_uring FUSED_CMD for supporting " Ming Lei
2023-03-31 19:13   ` Bernd Schubert
2023-04-01 13:19     ` Ming Lei
2023-03-31 19:55   ` Bernd Schubert
2023-04-01 13:22     ` Ming Lei
2023-04-03  9:25     ` Bernd Schubert
2023-04-03  1:11 ` [PATCH V6 00/17] io_uring/ublk: add generic IORING_OP_FUSED_CMD Ming Lei
2023-04-03  1:24   ` Jens Axboe
2023-04-04  7:48     ` Ming Lei
2023-04-03  1:23 ` (subset) " Jens Axboe
2023-04-18 19:38 ` Bernd Schubert
2023-04-19  1:51   ` Ming Lei [this message]
2023-04-19  9:56     ` Bernd Schubert
2023-04-19 11:19       ` Ming Lei
2023-04-19 15:42         ` Bernd Schubert
2023-04-20  1:18           ` Pavel Begunkov
2023-04-20  1:38           ` Ming Lei
2023-04-21 22:38             ` Bernd Schubert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZD9JI/[email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox