From: Ming Lei <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], [email protected],
[email protected],
Miklos Szeredi <[email protected]>,
ZiyangZhang <[email protected]>,
Xiaoguang Wang <[email protected]>,
Bernd Schubert <[email protected]>,
Pavel Begunkov <[email protected]>,
Stefan Hajnoczi <[email protected]>,
Dan Williams <[email protected]>,
[email protected]
Subject: Re: [PATCH V6 00/17] io_uring/ublk: add generic IORING_OP_FUSED_CMD
Date: Tue, 4 Apr 2023 15:48:50 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
Hello Jens and Everyone,
On Sun, Apr 02, 2023 at 07:24:17PM -0600, Jens Axboe wrote:
> On 4/2/23 7:11 PM, Ming Lei wrote:
> > On Thu, Mar 30, 2023 at 07:36:13PM +0800, Ming Lei wrote:
> >> Hello Jens and Guys,
> >>
> >> Add generic fused command, which can include one primary command and multiple
> >> secondary requests. This command provides one safe way to share resource between
> >> primary command and secondary requests, and primary command is always
> >> completed after all secondary requests are done, and resource lifetime
> >> is bound with primary command.
> >>
> >> With this way, it is easy to support zero copy for ublk/fuse device, and
> >> there could be more potential use cases, such as offloading complicated logic
> >> into userspace, or decouple kernel subsystems.
> >>
> >> The following ublksrv code implements zero copy for the loop, nbd and
> >> qcow2 targets with fused command:
> >>
> >> https://github.com/ming1/ubdsrv/tree/fused-cmd-zc-for-v6
> >>
> >> All three (loop, nbd and qcow2) ublk targets support zero copy by passing:
> >>
> >> ublk add -t [loop|nbd|qcow2] -z ....
> >>
> >> Also add liburing test case for covering fused command based on miniublk
> >> of blktest.
> >>
> >> https://github.com/ming1/liburing/tree/fused_cmd_miniublk_for_v6
> >>
> >> The performance improvement is obvious on memory bandwidth related
> >> workloads, such as a 1~2X improvement in the 64K/512K BS IO test on loop
> >> with a ramfs backing file. ublk-null shows a 5X IOPS improvement in the
> >> big BS test when the copy is avoided.
> >>
> >> Please review and consider for v6.4.
> >>
> >> V6:
> >> - re-design fused command, and make it more generic, moving sharing buffer
> >> as one plugin of fused command, so in future we can implement more plugins
> >> - document potential other use cases of fused command
> >> - drop support for builtin secondary sqe in SQE128, so all secondary
> >> requests have a standalone SQE
> >> - make fused command as one feature
> >> - cleanup & improve naming
> >
> > Hi Jens,
> >
> > Can you apply ublk cleanup patches 7~11 to for-6.4? For the others, we may
> > delay to 6.5, and I am looking at other approaches too.
>
> Done - and yes, we're probably looking at 6.5 for the rest. But that's
> fine, I'd rather end up with the right interface than try and rush one.

Thanks!
I'd also like to provide a summary of this work here, in case it helps
anyone interested in it. These are the three approaches we have tried or
proposed so far:
1) splice can't do this job[1][2]
2) fused command in this patchset
- it is more like sendfile() or copy_file_range(), because the internal
buffer isn't exposed outside
- v6 becomes a bit more generic: the idea is that one SQE list is submitted
as a single logical request; the 1st SQE is the primary command, which
provides the buffer for the others and is responsible for submitting the other
(secondary) SQEs in this list; the primary command isn't completed until all
secondary requests are done
- this approach solves two problems efficiently in one simple way:
a) buffer lifetime: the buffer lifetime is the same as the primary command's,
so all secondary OPs can be submitted & completed safely
b) request dependency: all secondary requests depend on the primary command,
but the secondary requests themselves can be independent of each other, so
they can be submitted in non-async style, and all of them can be issued
concurrently
- this approach is simple, because we don't expose the buffer outside, and
the buffer is just shared among these secondary requests; meanwhile, keeping
the buffer internal saves us from the complicated OP dependency issue, and
avoids the contention caused by registering the buffer anywhere between the
submission and completion code paths
- the drawback is that we add one new SQE usage model, a primary SQE plus
secondary SQEs forming one whole logical request in concept, which is
like sendfile() or copy_file_range()
3) register transient buffers for OPs[3]
- it is more like splice(), which is flexible and could be more generic, but
the internal buffer is added to a pipe which is visible outside, so the
implementation becomes complicated; and it has to do more than splice(),
because the io buffer needs to be shared among multiple OPs
- inefficient & complicated
a) the buffer has to be added to one global container (suppose it is the
io_uring context pipe) by an ADD_BUF OP, and then either the buffer is removed
after the consumer OPs are completed, or a DEL_BUF OP is run to remove the
buffer explicitly; so either contention on the io_uring pipe is added, or
another new dependency is added (DEL_BUF depends on all normal OPs)
b) an ADD_BUF OP is needed, and normal OPs have to depend on this new
OP via IOSQE_IO_LINK, so all normal OPs will be submitted in async way;
even worse, the normal OPs have to be issued one by one, because io_uring
isn't capable of handling the 1:N dependency issue[5]
c) if a DEL_BUF OP is needed, then it is basically not possible to solve
the 1:N dependency any more, given that DEL_BUF starts to depend on the
previous N OPs; otherwise, contention on the pipe is inevitable.
d) the 1:N dependency issue has to be solved generically
- advantage
Follows current io_uring SQE usage, and looks more generic/flexible,
like splice().
4) other approaches or suggestions?
Any idea is welcome as usual.
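
To make the lifetime rule of approach 2) a bit more concrete, here is a
minimal userspace model in C. It is only a sketch and not code from the
patchset: struct fused_primary and the fused_*() helpers are hypothetical
names, and I assume a plain (non-atomic) counter in which the primary holds
one reference for itself plus one per secondary request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; these are not the kernel's data structures. */
struct fused_primary {
	int refs;          /* 1 for the primary + 1 per in-flight secondary */
	bool buffer_live;  /* true while secondaries may use the shared buffer */
	bool completed;    /* true once the primary's completion is posted */
};

/* Primary command provides the buffer for nr_secondary requests. */
static void fused_primary_start(struct fused_primary *p, int nr_secondary)
{
	p->refs = 1 + nr_secondary;
	p->buffer_live = true;
	p->completed = false;
}

/* Drop one reference; the last one releases the buffer and completes. */
static void fused_put(struct fused_primary *p)
{
	assert(p->refs > 0);
	if (--p->refs == 0) {
		p->buffer_live = false;
		p->completed = true;
	}
}

/* The primary has issued every secondary and drops its own reference. */
static void fused_primary_issued(struct fused_primary *p)
{
	fused_put(p);
}

/* One secondary request is done, in any order relative to its siblings. */
static void fused_secondary_done(struct fused_primary *p)
{
	fused_put(p);
}
```

With fused_primary_start(&p, 3), p.completed stays false and p.buffer_live
stays true until fused_primary_issued() and all three fused_secondary_done()
calls have run, whatever their order, which is the property described in 2).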
Finally, from the problem viewpoint: if the problem domain is just ublk/fuse
zero copy or other similar problems[6], fused command might be the simpler &
more efficient approach compared with approach 3). However, are there any
other problems we want to cover with one more generic/flexible interface? If
not, would we want to pay for the complexity & inefficiency of approach 3)
just to solve this one, less generic, kind of problem?
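
As a rough illustration of the two dependency shapes compared above, the
sketch below models requests as a small dependency graph and counts how many
sequential "waves" are needed when every ready OP is issued at once. It is a
toy model under stated assumptions, not io_uring code: struct op_graph,
graph_waves(), build_link_chain() and build_fan_out() are hypothetical names,
ADD_BUF/DEL_BUF exist only in the proposal[3], and each OP is assumed to take
exactly one wave.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPS 16

/* dep[i][j] != 0: op i may only be issued after op j completes (j < i). */
struct op_graph {
	int nr_ops;
	unsigned char dep[MAX_OPS][MAX_OPS];
};

/* Sequential waves needed if all ready ops are issued concurrently. */
static int graph_waves(const struct op_graph *g)
{
	int wave[MAX_OPS];
	int i, j, max = 0;

	for (i = 0; i < g->nr_ops; i++) {
		wave[i] = 1;
		for (j = 0; j < i; j++)
			if (g->dep[i][j] && wave[j] + 1 > wave[i])
				wave[i] = wave[j] + 1;
		if (wave[i] > max)
			max = wave[i];
	}
	return max;
}

/* ADD_BUF -> op1 -> ... -> opN -> DEL_BUF, serialized like a link chain. */
static void build_link_chain(struct op_graph *g, int nr_normal)
{
	int i;

	assert(nr_normal >= 0 && nr_normal + 2 <= MAX_OPS);
	memset(g, 0, sizeof(*g));
	g->nr_ops = nr_normal + 2;
	for (i = 1; i < g->nr_ops; i++)
		g->dep[i][i - 1] = 1;
}

/*
 * 1:N shape: op 0 provides the buffer, ops 1..N depend only on op 0, and
 * the final op (buffer release / primary completion) depends on all N.
 */
static void build_fan_out(struct op_graph *g, int nr_normal)
{
	int i;

	assert(nr_normal >= 0 && nr_normal + 2 <= MAX_OPS);
	memset(g, 0, sizeof(*g));
	g->nr_ops = nr_normal + 2;
	for (i = 1; i <= nr_normal; i++) {
		g->dep[i][0] = 1;
		g->dep[nr_normal + 1][i] = 1;
	}
}
```

In this model, with 4 normal OPs the link chain needs 6 waves while the 1:N
shape needs 3, and the gap grows linearly with the number of OPs sharing the
buffer.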
[1] https://lore.kernel.org/linux-block/[email protected]/T/#m1bfa358524b6af94731bcd5be28056f9f4408ecf
[2] https://github.com/ming1/linux/blob/my_v6.3-io_uring_fuse_cmd_v6/Documentation/block/ublk.rst#zero-copy
[3] https://lore.kernel.org/linux-block/[email protected]/T/#mbe428dfeb0417487cd1db7e6dabca7399a3c265b
[4] https://lore.kernel.org/linux-block/[email protected]/T/#md035ffa4c6b69e85de2ab145418a9849a3b33741
[5] https://lore.kernel.org/linux-block/[email protected]/T/#m5e0c282ad26d9f3d8e519645168aeb3a19b5740b
[6] https://lore.kernel.org/linux-block/[email protected]/T/#me5cca4db606541fae452d625780635fcedcd5c6c
Thanks,
Ming