From: Mike Snitzer <[email protected]>
To: JeffleXu <[email protected]>
Cc: [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: Re: [RFC 0/3] Add support of iopoll for dm device
Date: Mon, 26 Oct 2020 14:53:35 -0400 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On Thu, Oct 22 2020 at 1:28am -0400,
JeffleXu <[email protected]> wrote:
>
> On 10/22/20 4:39 AM, Mike Snitzer wrote:
>
> >What you've _done_ could serve as a stop-gap but I'd really rather we
> >get it properly designed from the start.
>
> Indeed, I totally agree that the design should be done properly from
> the very beginning, and that is exactly the purpose of this RFC patch.
>
>
> >>This patch set adds support of iopoll for dm device.
> >>
> >>This is only an RFC patch. I'm really looking forward to getting your
> >>feedback on whether you're interested in supporting iopoll for dm
> >>devices, or if there's a better design to implement it.
> >>
> >>Thanks.
> >>
> >>
> >>[Purpose]
> >>IO polling is an important mode of io_uring. Currently only mq devices
> >>support iopoll. As for dm devices, only dm-multipath is request-based,
> >>while the others are all bio-based and have no support for iopoll.
> >>Supporting iopoll for dm devices is of great value when the device
> >>seen by the application is a dm device such as dm-linear/dm-stripe,
> >>in which case iopoll is otherwise not usable from io_uring.
> >I appreciate you taking the initiative on this; polling support is on my
> >TODO so your work serves as a nice reminder to pursue this more
> >urgently.
>
> It's good news that iopoll for DM is considered meaningful.
>
>
> >but we cannot avoid properly mapping a cookie to each
> >split bio. Otherwise you resort to inefficiently polling everything.
>
> Yes. At the very beginning I tried to build a mapping from a cookie to
> each bio, but I ran into several blocking design issues. Maybe we
> could clarify these design issues here, if you'd like.
Biggest issue I'm seeing is that block core's bio-based IO submission
implementation never seriously carried the blk_qc_t changes through.
The cookie returned from __submit_bio is thrown away when recursion
occurs in __submit_bio_noacct -- only the last bio submission's cookie
is returned to the caller, and that cookie could be very far removed
from all the others returned earlier in the recursion.
Fixing this would require quite some design and cleanup.
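To make that lossage concrete, here is a throwaway userspace model --
illustrative names only, _not_ the real block core API -- showing how
only the last split bio's cookie survives the recursion:

```c
#include <assert.h>

/*
 * Toy model of the problem above.  Each "bio" carries the cookie its
 * driver would return; the noacct loop models the recursion over split
 * bios, where each iteration's cookie overwrites the previous one.
 */

typedef unsigned int blk_qc_t;

struct bio {
	blk_qc_t cookie;	/* cookie the driver would return */
	struct bio *next;	/* models the recursion/bio_list chain */
};

/* Stand-in for __submit_bio(): hand back this bio's cookie. */
static blk_qc_t submit_bio_one(struct bio *bio)
{
	return bio->cookie;
}

/* Stand-in for __submit_bio_noacct(): only the last cookie survives. */
static blk_qc_t submit_bio_noacct_model(struct bio *bio)
{
	blk_qc_t ret = 0;

	for (; bio; bio = bio->next)
		ret = submit_bio_one(bio);	/* earlier cookies lost */
	return ret;
}
```

Polling whatever comes back can only ever cover the last split bio; the
cookies of its siblings are simply unreachable from the caller.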
> >Seems your attempt to have the cookie point to a dm_io object was likely
> >too coarse-grained (when bios are split they get their own dm_io on
> >recursive re-entry to DM core from ->submit_bio); but isn't having a
> >list of cookies still too imprecise for polling purposes? You could
> >easily have a list that translates to many blk-mq queues. Possibly
> >better than your current approach of polling everything -- but not
> >much.
>
> To make the discussion more specific, assume that dm0 is mapped to
> dm1/2/3, while dm1 is mapped to nvme1, dm2 to nvme2, etc.:
>
>            dm0
>          /  |  \
>       dm1  dm2  dm3
>        |    |    |
>     nvme1 nvme2 nvme3
>
> Then the returned cookie of dm0 could be a pointer to the dm_io
> object of dm0.
>
> struct dm_io { // the returned cookie points to dm_io object
> 	...
> +	struct list_head cookies;
> };
>
> struct dm_target_io {
> 	...
> 	/*
> 	 * The corresponding polling hw queue if submitted to an mq device
> 	 * (such as nvme1/2/3), NULL if submitted to a dm device (such as
> 	 * dm1/2/3).
> 	 */
> +	struct blk_mq_hw_ctx *hctx;
> +	struct list_head node; // linked into the @cookies list
> };
>
> The @cookies list of the dm_io object could maintain the dm_target_io
> objects of all **non-dm** devices, that is, all hw queues that we
> should poll on.
>
>
> returned        @cookies list of
> cookie     ->   dm_io object of dm0
>                   |
>                   +--> dm_target_io  ->  dm_target_io  ->  dm_target_io
>                        (for nvme1)       (for nvme2)       (for nvme3)
>
> When polling the returned cookie of dm0, we're actually polling the
> @cookies list. Once one of the dm_target_io objects completes (e.g.
> nvme2's), it is removed from the @cookies list, so we only keep
> polling the hw queues that have not yet completed.
What you detailed there isn't properly modeling what it needs to.
A given dm_target_io could result in quite a few bios (e.g. for
dm-stripe we clone each bio for each of the N stripes). So the fan-out,
especially if then stacked on N layers of stacked devices, to all the
various hctx at the lowest layers is like herding cats.
But the recursion in block core's submit_bio path makes that challenging
to say the least. So much so that any solution related to enabling
proper bio-based IO polling is going to need a pretty significant
investment in fixing block core (storing __submit_bio()'s cookie during
recursion, possibly storing to driver provided memory location,
e.g. DM initialized bio->submit_cookie pointer to a blk_qc_t within a DM
clone bio's per-bio-data).
So __submit_bio_noacct would become:

	retp = &ret;
	if (bio->submit_cookie)
		retp = bio->submit_cookie;
	*retp = __submit_bio(bio);
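A userspace sketch of that redirection, with illustrative names (the
bio->submit_cookie field does not exist upstream; it is the proposed
driver-provided storage):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: if a driver (e.g. DM) points bio->submit_cookie at
 * per-bio-data storage, the cookie is written there instead of being
 * lost in the soon-overwritten return value.
 */

typedef unsigned int blk_qc_t;

struct bio {
	blk_qc_t cookie;		/* cookie the driver would return */
	blk_qc_t *submit_cookie;	/* driver-provided storage, may be NULL */
};

/* Stand-in for __submit_bio(). */
static blk_qc_t submit_bio_model(struct bio *bio)
{
	return bio->cookie;
}

/* Stand-in for the modified __submit_bio_noacct(). */
static blk_qc_t submit_bio_noacct_model(struct bio *bio)
{
	blk_qc_t ret = 0;
	blk_qc_t *retp = &ret;

	if (bio->submit_cookie)
		retp = bio->submit_cookie;	/* redirect to driver storage */
	*retp = submit_bio_model(bio);
	return ret;
}
```

With the redirection in place the cookie lands in memory the submitting
driver still owns, so DM can collect one cookie per clone regardless of
how deep the recursion goes.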
> >>[Design Notes]
> >>
> >>cookie
> >>------
> >>Let's start from cookie. Cookie is one important concept in iopoll. It
> >>is used to identify one specific request in one specific hardware queue.
> >>The concept of cookie is initially designed as a per-bio concept, and
> >>thus it doesn't work well when bio-split involved. When bio is split,
> >>the returned cookie is indeed the cookie of one of the split bio, and
> >>the following polling on this returned cookie can only guarantee the
> >>completion of this specific split bio, while the other split bios may
> >>be still incomplete. Bio-split also arises for dm devices, though in
> >>another form, in which case the original bio submitted to the dm
> >>device may be split into multiple bios submitted to the underlying
> >>devices.
> >>
> >>In previous discussion, Ming Lei suggested that iopoll should be
> >>disabled when a bio is split. This works for the normal bio-split
> >>(done in blk_mq_submit_bio->__blk_queue_split), but iopoll would
> >>never be usable for dm devices if the same rule applied to them.
> >>
> >>So come back to the design of the cookie for dm devices. At the very
> >>beginning I wanted to change the cookie from an 'unsigned int' to a
> >>pointer type for dm devices, so that the cookie can point to
> >>something, e.g. a list containing all cookies of one original bio,
> >>something like this:
> >>
> >>struct dm_io { // the returned cookie points to dm_io
> >>	...
> >>	struct list_head cookies;
> >>};
> >>
> >>In this design, we can fetch all cookies of one original bio, but the
> >>implementation can be difficult and buggy. For example, the
> >>'struct dm_io' structure may already be freed by the time the returned
> >>cookie is used in blk_poll(). Then what if we maintain a refcount in
> >>struct dm_io so that the structure cannot be freed until blk_poll() is
> >>called? But then the 'struct dm_io' structure will never be freed if
> >>IO polling is not used at all.
> >I'd have to look closer at the race in the code you wrote (though you
> >didn't share it);
>
> I worried that dm_target_io/dm_io objects could have been freed
> before/while we are polling on them, and thus could cause a
> use-after-free when accessing the @cookies list in dm_target_io. It
> could happen when there are multiple polling instances, since io_uring
> implements a per-instance polling thread. If there are two bios
> submitted to dm0, please consider the following race sequence:
The lifetime of the bios should be fine given that the cloning nature of
DM requires that all clones complete before the origin may complete.
I think you probably just got caught out by the recursive nature of the bio
submission path -- makes creating a graph of submitted bios and their
associated per-bio-data and generated cookies a mess to track (again,
like herding cats).
Could also be you didn't quite understand the DM code's various
structures.
In any case, the block core changes needed to make bio-based IO polling
work are the main limiting factor right now.
Not sure it is worth the investment... but I could be persuaded to try
harder! ;)
But then once block core is fixed to allow this, we _still_ need to link
all the various 'struct dm_poll_data' structures so that blk_poll() can
call a DM-specific method that walks the list, calling blk_poll() for
each stored cookie and request_queue, e.g.:
struct dm_poll_data {
	blk_qc_t cookie;
	struct request_queue *queue;
	struct list_head list;
};
Again, it is the recursive nature of submit_bio() that makes this
challenging.
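A toy model of that walk -- userspace only, with a plain 'next' pointer
standing in for the kernel's list_head/list_for_each_entry, and
blk_poll() stubbed out to mark one cookie per queue as completed:

```c
#include <assert.h>

/*
 * Sketch of the DM-specific poll walk: visit every outstanding
 * (cookie, request_queue) pair and count how many have completed.
 */

typedef unsigned int blk_qc_t;

struct request_queue {
	blk_qc_t done_cookie;	/* stub: the one cookie "completed" so far */
};

struct dm_poll_data {
	blk_qc_t cookie;
	struct request_queue *queue;
	struct dm_poll_data *next;	/* kernel code would use list_head */
};

/* Stub for blk_poll(): >0 if that cookie has completed on that queue. */
static int blk_poll_model(struct request_queue *q, blk_qc_t cookie)
{
	return q->done_cookie == cookie;
}

/* Walk the list, polling each stored cookie on its own queue. */
static int dm_poll_list(struct dm_poll_data *pd)
{
	int completed = 0;

	for (; pd; pd = pd->next)
		completed += blk_poll_model(pd->queue, pd->cookie);
	return completed;
}
```

The walk stays cheap as long as completed entries are unlinked as they
finish, so repeated polls only touch the queues still outstanding.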
Mike
Thread overview: 12+ messages
[not found] <[email protected]>
[not found] ` <[email protected]>
2020-10-22 5:28 ` [RFC 0/3] Add support of iopoll for dm device JeffleXu
2020-10-26 18:53 ` Mike Snitzer [this message]
2020-11-02 3:14 ` JeffleXu
2020-11-02 15:28 ` Mike Snitzer
2020-11-03 8:59 ` JeffleXu
2020-11-04 6:47 ` JeffleXu
2020-11-04 15:08 ` Mike Snitzer
2020-11-06 2:51 ` [dm-devel] " JeffleXu
2020-11-06 17:45 ` Mike Snitzer
2020-11-08 1:09 ` [dm-devel] " JeffleXu
2020-11-09 18:15 ` Mike Snitzer
2020-11-10 1:43 ` [dm-devel] " JeffleXu