From: Bernd Schubert <[email protected]>
To: Ming Lei <[email protected]>, Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Miklos Szeredi <[email protected]>,
	Christoph Hellwig <[email protected]>,
	Ziyang Zhang <[email protected]>,
	Xiaoguang Wang <[email protected]>
Cc: "[email protected]" 
	<[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>
Subject: Re: [LSF/MM/BPF TOPIC] ublk & io_uring: ublk zero copy support
Date: Fri, 5 May 2023 21:57:47 +0000
Message-ID: <[email protected]>
In-Reply-To: <ZEx+h/[email protected]>

Hi Ming,

On 4/29/23 04:18, Ming Lei wrote:
> Hello,
> 
> ublk zero copy has been observed to improve large-chunk (64KB+) sequential IO
> performance a lot. For example, IOPS of ublk-loop over tmpfs increases by 1~2X[1],
> and Jens also observed that IOPS of ublk-qcow2 can increase by ~1X[2]. It also
> saves memory bandwidth.
> 
> So this is an important performance improvement.
> 
> So far there are three proposals:

It looks like there is no dedicated session. Could we still have a
discussion in a free slot, if possible?

Thanks,
Bernd


> 
> 1) splice based
> 
> - a spliced page from ->splice_read() can't be written to
> 
> A ublk READ request can't be handled because the spliced page can't be written
> to, and extending splice for ublk zero copy isn't a good solution[3]
> 
> - it is very hard to meet the requirements wrt. request buffer lifetime
> 
> splice/pipe focuses on page reference lifetime, but ublk zero copy pays more
> attention to ublk request buffer lifetime. It is very inefficient to respect
> request buffer lifetime by using every pipe buffer's ->release(), which requires
> all pipe buffers and the pipe to be kept while the ublk server handles IO. That
> means a dedicated ``pipe_inode_info`` has to be allocated at runtime for each
> provided buffer, and the pipe needs to be populated with the pages of the ublk
> request buffer.
> 
> IMO, splice isn't a good approach from either a correctness or a performance
> viewpoint.
> 
> 2) io_uring register buffer based
> 
> - the main idea is to register a runtime buffer in the fast IO path, and
>    unregister it after the buffer has been used by the following OPs
> 
> - the main problem is the bad performance caused by the io_uring link model
> 
> Registering the buffer has to be one OP, and so does unregistering it; the
> following normal OPs (such as FS IO) depend on the buffer-registering OP,
> so io_uring link has to be used.
> 
> It is normal for more than one normal OP to depend on the registering
> OP, so all these OPs (registering buffer, normal (FS IO) OPs and
> unregistering buffer) have to be linked together. The normal (FS IO) OPs
> then have to be submitted one by one, which is slow, because there is
> often no dependency among these normal FS OPs. Basically, the io_uring
> link model does not support this kind of 1:N dependency.
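
To make the cost of the 1:N limitation concrete, here is a toy latency model
in Python, not io_uring code. It assumes that a chain built with
IOSQE_IO_LINK starts each SQE only after the previous one has completed. The
function names and cost units are hypothetical, and the second function
describes an idealized 1:N dependency which, per the text above, the link
model does not provide.

```python
def linked_chain_latency(register_cost, op_costs, unregister_cost):
    """Everything sits in one link chain, so each OP waits for the previous one."""
    return register_cost + sum(op_costs) + unregister_cost


def ideal_one_to_n_latency(register_cost, op_costs, unregister_cost):
    """Hypothetical 1:N dependency: the FS OPs wait only for the register
    step and then run concurrently, so the slowest one bounds the middle phase."""
    return register_cost + max(op_costs, default=0) + unregister_cost


if __name__ == "__main__":
    fs_ops = [10, 10, 10, 10]  # four independent FS IOs, arbitrary time units
    print(linked_chain_latency(1, fs_ops, 1))    # serialized by the link chain
    print(ideal_one_to_n_latency(1, fs_ops, 1))  # what a 1:N model would allow
```

With four FS IOs of 10 units each and register/unregister costs of 1 unit,
the chain takes 42 units in this model while the idealized case takes 12.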
> 
> No one has posted code showing this approach yet.
> 
> 3) io_uring fused command[1]
> 
> - fused command extends current io_uring usage by allowing the following
> FS OPs (called secondary OPs) to be submitted after the primary command provides
> the buffer, and the primary command won't be completed until all secondary OPs
> are done.
> 
> This solves the problem in 2), and also avoids the buffer registration cost in
> both the submission and completion fast paths: because the primary command won't
> be completed until all secondary OPs are done, there is no need to write/read the
> buffer into a per-context global data structure.
> 
> The buffer lifetime problem is also addressed simply, so correctness is guaranteed,
> and performance is pretty good: even IOPS of 4k IO improves a little
> in some workloads, or at least no perf regression is observed
> for small IO.
> 
> A fused command can be thought of as one single request logically; it just has
> more than one SQE (all sharing the same link flag), which is why it is named
> fused command.
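
The lifetime rule described above can be sketched as a simple countdown. This
is a toy Python model under assumed names (FusedCommand and secondary_done are
hypothetical), not the kernel implementation: the primary command owns the
buffer and completes only when the last secondary OP finishes, so the buffer
is never released while a secondary OP may still use it.

```python
class FusedCommand:
    """Toy model: one primary command that lends its buffer to N secondary OPs."""

    def __init__(self, buffer, nr_secondary):
        if nr_secondary < 1:
            raise ValueError("need at least one secondary OP")
        self.buffer = buffer          # stays valid while the primary is in flight
        self.pending = nr_secondary   # secondary OPs that have not finished yet
        self.completed = False        # becomes True when the primary completes

    def secondary_done(self):
        """Record one finished secondary OP; return True once the primary completes."""
        if self.completed:
            raise RuntimeError("primary command already completed")
        self.pending -= 1
        if self.pending == 0:
            # Last secondary OP is done: complete the primary and end the
            # buffer's lifetime at the same moment.
            self.completed = True
            self.buffer = None
        return self.completed


if __name__ == "__main__":
    cmd = FusedCommand(bytearray(4096), nr_secondary=3)
    for _ in range(3):
        print(cmd.secondary_done(), cmd.buffer is not None)
```

In this model no separate register/unregister step is needed, because the
buffer's validity is tied directly to the primary command being in flight.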
> 
> - the only concern is that fused command starts a new usage of io_uring, but
> I have still not seen comments wrt. what/why is bad about this kind of new
> usage/interface.
> 
> I propose this topic and want to discuss how to move forward with this
> feature.
> 
> 
> [1] https://lore.kernel.org/linux-block/[email protected]/
> [2] https://lore.kernel.org/linux-block/[email protected]/
> [3] https://lore.kernel.org/linux-block/CAHk-=wgJsi7t7YYpuo6ewXGnHz2nmj67iWR6KPGoz5TBu34mWQ@mail.gmail.com/
> 
> 
> Thanks,
> Ming
> 


Thread overview: 4+ messages
2023-04-29  2:18 [LSF/MM/BPF TOPIC] ublk & io_uring: ublk zero copy support Ming Lei
2023-05-05 21:57 ` Bernd Schubert [this message]
2023-05-06  1:38   ` Ming Lei
2023-05-08  2:16     ` Pavel Begunkov
