From: JeffleXu <[email protected]>
To: Mike Snitzer <[email protected]>
Cc: [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [RFC 0/3] Add support of iopoll for dm device
Date: Thu, 22 Oct 2020 13:28:01 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>


On 10/22/20 4:39 AM, Mike Snitzer wrote:

> What you've _done_ could serve as a stop-gap but I'd really rather we
> get it properly designed from the start.

I completely agree that the design should be done properly from the very
beginning, and that is indeed the purpose of this RFC patch.


>> This patch set adds support of iopoll for dm device.
>>
>> This is only an RFC patch. I'm really looking forward to getting your
>> feedback on whether you're interested in supporting iopoll for dm devices,
>> or whether there's a better design to implement it.
>>
>> Thanks.
>>
>>
>> [Purpose]
>> IO polling is an important mode of io_uring. Currently only mq devices
>> support iopoll. As for dm devices, only dm-multipath is request-based,
>> while the others are all bio-based and have no support for iopoll.
>> Supporting iopoll for dm devices is of great value when the device seen
>> by the application is a dm device such as dm-linear/dm-stripe, where
>> iopoll is currently not usable from io_uring.
> I appreciate you taking the initiative on this; polling support is on my
> TODO so your work serves as a nice reminder to pursue this more
> urgently.

It's good news that iopoll support for DM is considered meaningful.


> but we cannot avoid properly mapping a cookie to each
> split bio.  Otherwise you resort to inefficiently polling everything.

Yes. At the very beginning I tried to build a mapping from a cookie to each
split bio, but I failed due to several blocking design issues. Maybe we could
clarify these design issues here, if you'd like.


>
> Seems your attempt to have the cookie point to a dm_io object was likely
> too coarse-grained (when bios are split they get their own dm_io on
> recursive re-entry to DM core from ->submit_bio); but isn't having a
> list of cookies still too imprecise for polling purposes?  You could
> easily have a list that translates to many blk-mq queues.  Possibly
> better than your current approach of polling everything -- but not
> much.

To make the discussion more specific, assume that dm0 is mapped to dm1/2/3,
while dm1 is mapped to nvme1, dm2 to nvme2, and dm3 to nvme3:

                 dm0
        __________|__________
        |         |         |
       dm1       dm2       dm3
        |         |         |
      nvme1     nvme2     nvme3


Then the returned cookie of dm0 could be a pointer to the dm_io object of dm0.

struct dm_io {  // the returned cookie points to dm_io object
	...
+	struct list_head cookies;
};

struct dm_target_io {
	...
	/*
	 * The polling hw queue the clone was submitted to, if it went to an mq
	 * device (such as nvme1/2/3); NULL if it went to a dm device (such as
	 * dm1/2/3).
	 */
+	struct blk_mq_hw_ctx *hctx;
+	struct list_head      node;  // linked into the @cookies list
};

The @cookies list of the dm_io object would maintain the dm_target_io objects
of all **non-dm** devices, that is, all hw queues that we should poll on.


returned      @cookies list of the
cookie   -->  dm_io object of dm0
                 |
                 +--> dm_target_io      -->  dm_target_io      -->  dm_target_io
                      object of nvme1        object of nvme2        object of nvme3
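
To make the bookkeeping side concrete, below is a minimal sketch of what I
have in mind for the submission path. The function name dm_attach_poll_cookie()
is made up, and locking of the @cookies list is omitted:

```c
/*
 * Hypothetical sketch: record where a clone bio ended up, right after it has
 * been submitted and its cookie is known.  @q is the request_queue of the
 * underlying device the clone was submitted to.
 */
static void dm_attach_poll_cookie(struct dm_target_io *tio,
				  struct request_queue *q, blk_qc_t cookie)
{
	if (!queue_is_mq(q) || !blk_qc_t_valid(cookie)) {
		/* clone went to another bio-based (dm) device: nothing to poll here */
		tio->hctx = NULL;
		return;
	}

	/* remember the polling hw queue this clone was queued on ... */
	tio->hctx = q->queue_hw_ctx[blk_qc_t_to_queue_num(cookie)];

	/* ... and make it reachable from the dm_io the top-level cookie points to */
	list_add_tail(&tio->node, &tio->io->cookies);
}
```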

When polling the returned cookie of dm0, we are actually polling the @cookies
list. Once one of the dm_target_io objects completes (e.g. the one for nvme2),
it is removed from the @cookies list, so that we only keep polling the hw
queues whose IOs have not completed yet.
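
The poll side would then look roughly like the sketch below. dm_poll_dm_io()
is a made-up name, the lifetime/locking of the list is exactly the open
problem discussed further down, and I'm assuming ->poll() can simply be
called per hw queue:

```c
/*
 * Hypothetical sketch: poll one dm_io by walking its @cookies list.  Entries
 * are removed by the completion path (clone_endio), so only the hw queues
 * whose clones have not completed yet get polled here.
 */
static int dm_poll_dm_io(struct dm_io *io)
{
	struct dm_target_io *tio;
	int found = 0;

	list_for_each_entry(tio, &io->cookies, node) {
		struct blk_mq_hw_ctx *hctx = tio->hctx;

		/* drive completions on the hw queue this clone was submitted to */
		found += hctx->queue->mq_ops->poll(hctx);
	}

	return found;
}
```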



>> [Design Notes]
>>
>> cookie
>> ------
>> Let's start from cookie. Cookie is one important concept in iopoll. It
>> is used to identify one specific request in one specific hardware queue.
>> The cookie was initially designed as a per-bio concept, and thus it
>> doesn't work well when bio splitting is involved. When a bio is split,
>> the returned cookie is actually the cookie of only one of the split bios,
>> and polling on this returned cookie can only guarantee the completion of
>> that specific split bio, while the other split bios may still be
>> incomplete. Bio splitting is also involved for dm devices, though in
>> another form: the original bio submitted to the dm device may be split
>> into multiple bios submitted to the underlying devices.
>>
>> In a previous discussion, Ming Lei suggested that iopoll should be
>> disabled when a bio is split. This works for the normal bio split (done in
>> blk_mq_submit_bio->__blk_queue_split), but iopoll will never be
>> usable for dm devices if the same rule also applies to them.
>>
>> So, coming back to the design of the cookie for dm devices: at the very
>> beginning I wanted to refactor the cookie from an 'unsigned int' into a
>> pointer type for dm devices, so that the cookie can point to something,
>> e.g. a list containing all cookies of one original bio, something like
>> this:
>>
>> struct dm_io { // the returned cookie points to dm_io
>> 	...
>> 	struct list_head cookies;
>> };
>>
>> In this design, we can fetch all cookies of one original bio, but the
>> implementation can be difficult and buggy. For example, the
>> 'struct dm_io' structure may already be freed by the time the returned
>> cookie is used in blk_poll(). Then what if we maintain a refcount in
>> struct dm_io so that the 'struct dm_io' structure cannot be freed until
>> blk_poll() is called? Then the 'struct dm_io' structure will never be
>> freed if IO polling is not used at all.
> I'd have to look closer at the race in the code you wrote (though you
> didn't share it);

I worried that dm_target_io/dm_io objects could have been freed before or
while we are polling on them, which would cause a use-after-free when
accessing the @cookies list in dm_target_io. It can happen when there are
multiple polling instances (io_uring implements a per-instance polling
thread). If there are two bios submitted to dm0, please consider the
following race sequences:


1. Race 1: dm_target_io/dm_io objects could be freed *when* we are polling
on them

```
thread1 polling on bio1                  thread2 polling on bio2
-----------------------                  -----------------------
fetch dm_io object by the cookie
                                         reaps completions in nvme 1/2/3,
                                         completes bio1 and frees the dm_io
                                         object of bio1 along the way
use-after-free when accessing
the dm_io object of bio1
```

Maybe we should take a refcount on the dm_io of bio1 when we start polling,
but I'm not sure whether that design is elegant or not.
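
For reference, the shape I had in mind looks roughly like the sketch below.
poll_ref, dm_io_get() and dm_io_put() are all hypothetical, and the extra
reference would have to be taken while the dm_io is still known to be alive
(e.g. in the submission path), otherwise the very first dereference of the
cookie is already the use-after-free:

```c
/* Hypothetical sketch only: pin the dm_io across one polling pass. */
static bool dm_io_get(struct dm_io *io)
{
	/* fails once the completion path has dropped the last reference */
	return refcount_inc_not_zero(&io->poll_ref);
}

static void dm_io_put(struct dm_io *io)
{
	if (refcount_dec_and_test(&io->poll_ref))
		free_io(io->md, io);	/* the last user actually frees the dm_io */
}
```

The obvious downside is the extra atomic (and the deferred free) paid by
every bio even when nobody ever polls, which is the concern already raised
in the cover letter.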



2. Race 2: dm_target_io/dm_io objects could be freed *before* we are polling
on them

```
thread1 polling on bio1                  thread2 polling on bio2
-----------------------                  -----------------------
                                         reaps completions in nvme 1/2/3:
                                           clone_endio
                                             dec_pending
                                               free_io(md, io)  // frees the dm_io object
__blkdev_direct_IO
  READ_ONCE(dio->waiter)
  // dio->waiter is still non-NULL
                                               bio_endio(io->orig_bio)
                                               // calls bi_end_io() of the original
                                               // bio, i.e. blkdev_bio_end_io
                                                 WRITE_ONCE(dio->waiter, NULL);
  blk_poll
    fetch dm_io object by the cookie  // use-after-free!
```
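
For context, the window comes from the polling loop in __blkdev_direct_IO(),
which looks roughly like this (paraphrased from fs/block_dev.c):

```c
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (!READ_ONCE(dio->waiter))
			break;

		if (!(iocb->ki_flags & IOCB_HIPRI) ||
		    !blk_poll(bdev_get_queue(bdev), qc, true))
			blk_io_schedule();
	}
```

So dio->waiter can still be observed as non-NULL after free_io() has already
run on the other CPU, and the subsequent blk_poll() will then dereference a
stale cookie.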


Thanks.

Jeffle.

