From: JeffleXu <[email protected]>
To: Mike Snitzer <[email protected]>
Cc: [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected]
Subject: Re: [RFC 0/3] Add support of iopoll for dm device
Date: Thu, 22 Oct 2020 13:28:01 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 10/22/20 4:39 AM, Mike Snitzer wrote:
> What you've _done_ could serve as a stop-gap but I'd really rather we
> get it properly designed from the start.
Indeed I totally agree with you that the design should be done nicely at
the very beginning. And this
is indeed the purpose of this RFC patch.
>> This patch set adds support of iopoll for dm device.
>>
>> This is only an RFC patch. I'm really looking forward getting your
>> feedbacks on if you're interested in supporting iopoll for dm device,
>> or if there's a better design to implement that.
>>
>> Thanks.
>>
>>
>> [Purpose]
>> IO polling is an important mode of io_uring. Currently only mq devices
>> support iopoll. As for dm devices, only dm-multipath is request-base,
>> while others are all bio-based and have no support for iopoll.
>> Supporting iopoll for dm devices can be of great meaning when the
>> device seen by application is dm device such as dm-linear/dm-stripe,
>> in which case iopoll is not usable for io_uring.
> I appreciate you taking the initiative on this; polling support is on my
> TODO so your work serves as a nice reminder to pursue this more
> urgently.
It's a good news that iopoll for DM is meaningful.
> but we cannot avoid properly mapping a cookie to each
> split bio. Otherwise you resort to inefficiently polling everything.
Yes. At the very beginning I tried to build the mapping a cookie to
each bio, but I failed with several
blocking design issues. By the way maybe we could clarify these design
issues here, if you'd like.
>
> Seems your attempt to have the cookie point to a dm_io object was likely
> too coarse-grained (when bios are split they get their own dm_io on
> recursive re-entry to DM core from ->submit_bio); but isn't having a
> list of cookies still too imprecise for polling purposes? You could
> easily have a list that translates to many blk-mq queues. Possibly
> better than your current approach of polling everything -- but not
> much.
To make the discussion more specific, assume that dm0 is mapped to
dm1/2/3, while dm1 mapped to
nvme1, dm2 mapped to dm2, etc..
dm0
dm1 dm2 dm3
nvme1 nvme2 nvme3
Then the returned cookie of dm0 could be pointer pointing to dm_io
object of dm0.
struct dm_io { // the returned cookie points to dm_io object
...
+ struct list_head cookies;
};
struct dm_target_io {
...
/*
* The corresponding polling hw queue if submitted to mq device (such as nvme1/2/3),
* NULL if submitted to dm device (such as dm1/2/3)
*/
+ struct blk_mq_hw_ctx *hctx;
+ struct list_head node; // add to @cookies list
};
The @cookies list of dm_io object could maintain all dm_target_io objects
of all **none-dm** devices, that is, all hw queues that we should poll on.
returned -> @cookies list
cookie of dm_io object of dm0
|
+--> dm_target_io -> dm_target_io -> dm_target_io
object of nvme1 object of nvme2 object of nvme3
When polling returned cookie of dm0, actually we're polling @cookies
list. Once one of the dm_target_io
completed (e.g. nvme2), it should be removed from the @cookies list.,
and thus we should only focus on
hw queues that have not completed.
>> [Design Notes]
>>
>> cookie
>> ------
>> Let's start from cookie. Cookie is one important concept in iopoll. It
>> is used to identify one specific request in one specific hardware queue.
>> The concept of cookie is initially designed as a per-bio concept, and
>> thus it doesn't work well when bio-split involved. When bio is split,
>> the returned cookie is indeed the cookie of one of the split bio, and
>> the following polling on this returned cookie can only guarantee the
>> completion of this specific split bio, while the other split bios may
>> be still uncompleted. Bio-split is also resolved for dm device, though
>> in another form, in which case the original bio submitted to the dm
>> device may be split into multiple bios submitted to the underlying
>> devices.
>>
>> In previous discussion, Lei Ming has suggested that iopoll should be
>> disabled for bio-split. This works for the normal bio-split (done in
>> blk_mq_submit_bio->__blk_queue_split), while iopoll will never be
>> usable for dm devices if this also applies for dm device.
>>
>> So come back to the design of the cookie for dm devices. At the very
>> beginning I want to refactor the design of cookie, from 'unsigned int'
>> type to the pointer type for dm device, so that cookie can point to
>> something, e.g. a list containing all cookies of one original bio,
>> something like this:
>>
>> struct dm_io { // the returned cookie points to dm_io
>> ...
>> struct list_head cookies;
>> };
>>
>> In this design, we can fetch all cookies of one original bio, but the
>> implementation can be difficult and buggy. For example, the
>> 'struct dm_io' structure may be already freed when the returned cookie
>> is used in blk_poll(). Then what if maintain a refcount in struct dm_io
>> so that 'struct dm_io' structure can not be freed until blk_poll()
>> called? Then the 'struct dm_io' structure will never be freed if the
>> IO polling is not used at all.
> I'd have to look closer at the race in the code you wrote (though you
> didn't share it);
I worried that dm_target_io/dm_io objects could have been freed
before/when we are polling on them,
and thus could cause use-after-free when accessing @cookies list in
dm_target_io. It could happen
when there are multiple polling instance. io_uring has implemented
per-instance polling thread. If
there are two bios submitted to dm0, please consider the following race
sequence:
1. race 1: dm_target_io/dm_io objects could have been freed ****when****
we are polling on them
```*
*
*thread1 polling on bio1 thread2 polling on bio2*
fetch dm_io object by the cookie
reaps completions in nvme 1/2/3,
completes bio1 and frees dm_io object of bio1 by the way
use-after-free when accessing dm_io object
```
Maybe we should get a refcount of dm_io of bio1 when start polling, but
I'm not sure if the design is
elegant or not.
2. race 2: dm_target_io/dm_io objects could have been freed
****before**** we are polling on them
```
*thread1 polling on bio1 thread2 polling on bio2*
reaps completions in nvme 1/2/3,
clone_endio
dec_pending
free_io(md, io); // free dm_io object
__blkdev_direct_IO
READ_ONCE(dio->waiter) // dio->waiter is still none-NULL
bio_endio(io->orig_bio) //call bi_end_io() of original bio,
that is blkdev_bio_end_io
WRITE_ONCE(dio->waiter, NULL);
blk_poll
fetch dm_io object by the cookie // use-after-free!
```
Thanks.
Jeffle.
next parent reply other threads:[~2020-10-22 5:28 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <[email protected]>
[not found] ` <[email protected]>
2020-10-22 5:28 ` JeffleXu [this message]
2020-10-26 18:53 ` [RFC 0/3] Add support of iopoll for dm device Mike Snitzer
2020-11-02 3:14 ` JeffleXu
2020-11-02 15:28 ` Mike Snitzer
2020-11-03 8:59 ` JeffleXu
2020-11-04 6:47 ` JeffleXu
2020-11-04 15:08 ` Mike Snitzer
2020-11-06 2:51 ` [dm-devel] " JeffleXu
2020-11-06 17:45 ` Mike Snitzer
2020-11-08 1:09 ` [dm-devel] " JeffleXu
2020-11-09 18:15 ` Mike Snitzer
2020-11-10 1:43 ` [dm-devel] " JeffleXu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=da936cfa-93a8-d6ec-bd88-c0fad6c67c8b@linux.alibaba.com \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox