public inbox for [email protected]
From: Mike Snitzer <[email protected]>
To: JeffleXu <[email protected]>
Cc: [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [RFC 0/3] Add support of iopoll for dm device
Date: Wed, 4 Nov 2020 10:08:48 -0500	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Wed, Nov 04 2020 at  1:47am -0500,
JeffleXu <[email protected]> wrote:

> 
> On 11/2/20 11:28 PM, Mike Snitzer wrote:
> >On Sun, Nov 01 2020 at 10:14pm -0500,
> >JeffleXu <[email protected]> wrote:
> >
> >>On 10/27/20 2:53 AM, Mike Snitzer wrote:
> >>>What you detailed there isn't properly modeling what it needs to.
> >>>A given dm_target_io could result in quite a few bios (e.g. for
> >>>dm-striped we clone each bio for each of N stripes).  So the fan-out,
> >>>especially if then stacked on N layers of stacked devices, to all the
> >>>various hctx at the lowest layers is like herding cats.
> >>>
> >>>But the recursion in block core's submit_bio path makes that challenging
> >>>to say the least.  So much so that any solution related to enabling
> >>>proper bio-based IO polling is going to need a pretty significant
> >>>investment in fixing block core (storing __submit_bio()'s cookie during
> >>>recursion, possibly storing to driver provided memory location,
> >>>e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM
> >>>clone bio's per-bio-data).
> >>>
> >>>SO __submit_bio_noacct would become:
> >>>
> >>>    retp = &ret;
> >>>    if (bio->submit_cookie)
> >>>           retp = bio->submit_cookie;
> >>>    *retp = __submit_bio(bio);
> >>Sorry for the late reply. Indeed, I missed this point before. If you
> >>have not started working on this, I'd like to try implementing this
> >>as an RFC.
> >I did start on this line of development but it needs quite a bit more
> >work.  Even the pseudo code I provided above isn't useful in the
> >context of DM clone bios that have their own per-bio-data to assist
> >with this implementation, because the __submit_bio_noacct() recursive
> >call that drivers/md/dm.c:__split_and_process_bio() makes is supplying
> >the original bio (modified to only point to remaining work).
> 
> Yes, I noticed this recently. Since the depth-first splitting
> introduced in commit 18a25da84354 ("dm: ensure bio submission follows
> a depth-first tree walk"), one bio to a dm device can be split into
> multiple bios to this same dm device:
> 
> ```
> one bio to dm device (dm0) = one dm_io (to nvme0) + one bio to this
> same dm device (dm0)
> ```
> 
> 
> In this case we need a mechanism to track all of the split sub-bios
> of the original, top-level bio.

Yes, splitting causes additional potential for sub-bios.  There are
other cases that cause 1-to-many bio generation (e.g. dm-striped) or
splitting cases where a DM target makes use of dm_accept_partial_bio
(e.g. dm-snapshot, dm-integrity, dm-writecache, etc.).
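
Roughly, the dm_accept_partial_bio() pattern looks like the following
(the target and the 8-sector limit are made up, purely to illustrate;
error handling and the actual remapping are elided):

```
/*
 * Illustrative sketch only.  If the target accepts just part of the bio,
 * DM core splits off the remainder and resubmits it, so one original bio
 * fans out into multiple clones (and eventually multiple cookies).
 */
static int example_target_map(struct dm_target *ti, struct bio *bio)
{
	unsigned int max_sectors = 8;	/* hypothetical per-target boundary */

	if (bio_sectors(bio) > max_sectors)
		dm_accept_partial_bio(bio, max_sectors);

	/* remap to the underlying device as usual ... */
	return DM_MAPIO_REMAPPED;
}
```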


> I doubt whether this should be implemented in the block layer, like:
> 
> ```
> struct bio {
>     ...
>     struct list_head cookies;
> };
> ```
> 
> After all, it's only used by bio-based queues, or more specifically,
> only by dm devices currently.

I do think this line of work really should be handled in block core
because I cannot see any reason why MD or bcache or whatever bio-based
device wouldn't want the ability to better support io_uring (with IO
poll).

> Another design I can come up with is to maintain a global data
> structure for the original, top-level bio. Currently the blocking
> point is that one original bio to the dm device (@bio of dm_submit())
> can correspond to multiple dm_io, and thus we have nowhere to place
> the @cookies list.

Yes, and that will always be the case.  We need the design to handle an
arbitrary sprawl of splitting from a given bio.  The graph of bios
resulting from that fan-out needs to be walked at various levels -- be
it the top-level original bio's submit_bio() returned cookie or some
intermediate point in the chain of bios.
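
Just to make the disconnect concrete from the polling caller's side,
here is a rough (and heavily simplified) sketch of the current sync
polled-read pattern, cf. __blkdev_direct_IO_simple(); 'done' stands in
for state the caller's endio would set:

```
	/*
	 * Simplified sketch: submit_bio() hands back exactly one blk_qc_t
	 * for one request_queue.  Once DM splits the original bio into
	 * clones that land on several underlying queues, there is no
	 * single (q, cookie) pair left for the caller to poll on.
	 */
	struct request_queue *q = bio->bi_disk->queue;
	blk_qc_t cookie = submit_bio(bio);

	while (!READ_ONCE(done))
		blk_poll(q, cookie, true); /* real callers also sleep on no progress */
```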

The problem is the lifetime of the data structure created for a given
split bio versus layering boundaries (that come from block core's
simplistic recursion via bio using submit_bio).

> Now we have to maintain one data structure for every original bio,
> something like:
> 
> ```
> struct dm_poll_instance {
>     ...
>     struct list_head cookies;
> };
> ```

I do think we need a hybrid where, at the point of recursion, we're able
to make the associated data structure available across the recursion
boundary so that modeling the association in a chain of split bios is
possible (e.g. struct dm_poll_data, or dm_poll_instance as you named it;
that struct definition would live in block core but would be part of
per-bio-data, so 'struct blk_poll_data' is a more logical name when
elevated to block core).

It _might_ be worthwhile to see if a new BIO_ flag could be added to
allow augmenting the bio_split + bio_chain pattern to also track this
additional case of carrying data per-bio while creating bio chains.
In case that isn't clear, said differently: augmenting bio_chain to not
only chain bios, but to _also_ thread/chain together the per-bio-data
that lives within those chained bios.  So you have the chain of bios
_and_ a chain of potentially opaque void * that happens to point to a
list head for a list of 'struct blk_poll_data'.

Does that make sense?
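
Purely as a sketch of what I mean (none of these names exist in block
core today; BIO_POLL_CHAINED and bio_poll_data() are made up, and list
initialization/locking are elided):

```
/* Hypothetical: defined by block core, embedded in driver per-bio-data. */
struct blk_poll_data {
	blk_qc_t		cookie;	/* cookie from the underlying device */
	struct list_head	list;	/* threads split/chained siblings */
};

/* Hypothetical hook called wherever bio_chain()/bio_split() link bios. */
static void blk_chain_poll_data(struct bio *parent, struct bio *child)
{
	if (bio_flagged(parent, BIO_POLL_CHAINED) &&
	    bio_flagged(child, BIO_POLL_CHAINED)) {
		struct blk_poll_data *ppd = bio_poll_data(parent);
		struct blk_poll_data *cpd = bio_poll_data(child);

		list_add_tail(&cpd->list, &ppd->list);
	}
}
```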

> We can transfer this dm_poll_instance between split bios by
> bio->bi_private, like:
> 
> ```
> dm_submit_bio(...) {
>     struct dm_poll_instance *ins;
> 
>     if (bio->bi_private)
>         ins = bio->bi_private;
>     else {
>         ins = alloc_poll_instance();
>         bio->bi_private = ins;
>     }
>     ...
> }
> ```

Sadly, we cannot (ab)use bi_private for this given it's (ab)used via the
bio_chain() interface.  It's almost like we need to add a new pointer in
the bio that isn't left for block core to hijack.

There is the well-worn pattern of saving off the original bi_private,
hooking a new endio method, and then restoring bi_private when that
endio is called; but we really want to avoid excessive indirect
function calls for this use case.  The entire point of implementing
blk_poll support is performance, after all.
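
(For reference, the pattern I mean is roughly the following; the names
are only illustrative, and it costs an allocation plus an extra
indirect call per bio, which is exactly what we want to avoid on the
polled fast path.)

```
struct hooked_bio_data {		/* illustrative name */
	void		*saved_private;
	bio_end_io_t	*saved_end_io;
	blk_qc_t	cookie;
};

static void hooked_end_io(struct bio *bio)
{
	struct hooked_bio_data *h = bio->bi_private;

	/* restore what the submitter originally set */
	bio->bi_private = h->saved_private;
	bio->bi_end_io = h->saved_end_io;
	kfree(h);

	if (bio->bi_end_io)
		bio->bi_end_io(bio);	/* the extra indirect call */
}

static void hook_bio(struct bio *bio)
{
	struct hooked_bio_data *h = kmalloc(sizeof(*h), GFP_NOIO);

	if (!h)
		return;			/* error handling elided in this sketch */

	h->saved_private = bio->bi_private;
	h->saved_end_io = bio->bi_end_io;
	bio->bi_private = h;
	bio->bi_end_io = hooked_end_io;
}
```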

Mike


Thread overview: 12+ messages
     [not found] <[email protected]>
     [not found] ` <[email protected]>
2020-10-22  5:28   ` [RFC 0/3] Add support of iopoll for dm device JeffleXu
2020-10-26 18:53     ` Mike Snitzer
2020-11-02  3:14       ` JeffleXu
2020-11-02 15:28         ` Mike Snitzer
2020-11-03  8:59           ` JeffleXu
2020-11-04  6:47           ` JeffleXu
2020-11-04 15:08             ` Mike Snitzer [this message]
2020-11-06  2:51               ` [dm-devel] " JeffleXu
2020-11-06 17:45                 ` Mike Snitzer
2020-11-08  1:09                   ` [dm-devel] " JeffleXu
2020-11-09 18:15                     ` Mike Snitzer
2020-11-10  1:43                       ` [dm-devel] " JeffleXu
