Re: [PATCH v3 09/11] dm: support IO polling for bio-based dm device

From: JeffleXu <[email protected]>
To: Ming Lei <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v3 09/11] dm: support IO polling for bio-based dm device
Date: Tue, 9 Feb 2021 16:46:01 +0800
Message-ID: <[email protected]>
In-Reply-To: <20210209080739.GB94287@T590>



On 2/9/21 4:07 PM, Ming Lei wrote:
> On Tue, Feb 09, 2021 at 02:13:38PM +0800, JeffleXu wrote:
>>
>>
>> On 2/9/21 11:11 AM, Ming Lei wrote:
>>> On Mon, Feb 08, 2021 at 04:52:41PM +0800, Jeffle Xu wrote:
>>>> DM will iterate and poll all polling hardware queues of all target mq
>>>> devices when polling IO for dm device. To mitigate the race introduced
>>>> by iterating all target hw queues, a per-hw-queue flag is maintained
>>>
>>> What is the per-hw-queue flag?
>>
>> Sorry I forgot to update the commit message as the implementation
>> changed. Actually this mechanism is implemented by patch 10 of this
>> patch set.
> 
> It is hard to associate patch 10's spin_trylock() with per-hw-queue
> flag. 

You're right, the commit message here is simply wrong. I did at one
point implement a per-hw-queue flag, such as

```
struct blk_mq_hw_ctx {
	atomic_t busy;
	...
};
```

In that version, the skipping mechanism was implemented in the block layer.
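
As a rough user-space illustration of such a flag-based skip (this is
not the code from that earlier revision): `struct hw_queue_model`,
`poll_one_queue()` and `poll_all_queues()` are hypothetical names, the
`pending` counter is invented for the model, and C11 atomics stand in
for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_mq_hw_ctx; only the flag matters here. */
struct hw_queue_model {
	atomic_int busy;	/* 1 while some polling instance owns this queue */
	int pending;		/* completions waiting to be reaped (model only) */
};

/* Reap everything pending on one queue; returns the number reaped. */
static int poll_one_queue(struct hw_queue_model *q)
{
	int found = q->pending;

	q->pending = 0;
	return found;
}

/*
 * Iterate all polling queues, skipping any queue whose busy flag is
 * already set by another polling instance instead of waiting for it.
 */
static int poll_all_queues(struct hw_queue_model *queues, size_t nr)
{
	int found = 0;

	assert(queues != NULL);
	for (size_t i = 0; i < nr; i++) {
		int idle = 0;

		/* Claim the queue only if busy is currently 0. */
		if (!atomic_compare_exchange_strong(&queues[i].busy, &idle, 1))
			continue;	/* owned elsewhere: move on to the next queue */

		found += poll_one_queue(&queues[i]);
		atomic_store(&queues[i].busy, 0);
	}
	return found;
}
```

With the flag living in the hw queue, the skip decision is made by the
iterating code and every driver gets it without changes.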


But later I refactored the code and moved the implementation to the
device driver layer, as described in patch 10, and forgot to update the
commit message. The reason I implemented it in the device driver layer
is that the contention actually stems from the underlying device driver
(e.g., the nvme driver), as shown in the following snippet.

```
nvme_poll
	spin_lock(&nvmeq->cq_poll_lock);
	found = nvme_process_cq(nvmeq);
	spin_unlock(&nvmeq->cq_poll_lock);
```

It is @nvmeq->cq_poll_lock, i.e., the implementation of the underlying
device driver, that causes the contention. So maybe it is reasonable to
handle the contention in the device driver layer?
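
To make the driver-layer alternative concrete, here is a small
user-space model of a poll function that gives up instead of waiting
when the lock is contended. It is only a sketch of the idea, not the
code of patch 10: `struct nvme_queue_model` and all `model_*` helpers
are hypothetical, and an atomic_flag stands in for the kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an nvme queue; cq_poll_lock is a toy spinlock. */
struct nvme_queue_model {
	atomic_flag cq_poll_lock;
	int cqe_pending;	/* completion entries waiting in the CQ (model only) */
};

static bool model_spin_trylock(atomic_flag *lock)
{
	/* test_and_set returns the previous value: false means we got the lock. */
	return !atomic_flag_test_and_set_explicit(lock, memory_order_acquire);
}

static void model_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Stand-in for nvme_process_cq(): reap all pending entries. */
static int model_process_cq(struct nvme_queue_model *nvmeq)
{
	int found = nvmeq->cqe_pending;

	nvmeq->cqe_pending = 0;
	return found;
}

/* Poll without waiting: a contended lock means someone else is already polling. */
static int model_nvme_poll(struct nvme_queue_model *nvmeq)
{
	int found;

	assert(nvmeq != NULL);
	if (!model_spin_trylock(&nvmeq->cq_poll_lock))
		return 0;

	found = model_process_cq(nvmeq);
	model_spin_unlock(&nvmeq->cq_poll_lock);
	return found;
}
```

A caller that gets 0 back cannot tell "nothing completed" from "queue
was busy", which is acceptable for a poller that simply moves on to the
next hw queue.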


> Also scsi's poll implementation is in-progress, and scsi's poll may
> not be implemented in this way.

Yes. The drawback of leaving the contention issue to the device driver
layer is that every device driver supporting polling needs to be
optimized individually. I have not yet taken a close look at the other
two nvme drivers (drivers/nvme/host/tcp.c and
drivers/nvme/host/rdma.c), which also support polling.



>>
>>>
>>>> to indicate whether this polling hw queue is currently being polled or
>>>> not. Every polling hw queue is exclusive to one polling instance, i.e.,
>>>> the polling instance will skip this polling hw queue if this hw queue
>>>> currently is being polled by another polling instance, and start
>>>> polling on the next hw queue.
>>>
>>> I don't see such a skip in dm_poll_one_dev(), in which
>>> queue_for_each_poll_hw_ctx() is called directly to poll all POLL
>>> hctxs of the request queue, so can you explain this skip mechanism
>>> a bit more?
>>>
>>
>> It is implemented as patch 10 of this patch set. When spin_trylock()
>> fails, the polling instance will return immediately, instead of busy
>> waiting.
>>
>>
>>> Even though such skipping is implemented, not sure if good performance
>>> can be reached because hctx poll may be done in ping-pong style
>>> among several CPUs. But blk-mq hctx is supposed to have its cpu affinities.
>>>
>>
>> Yes, the mechanism of iterating all hw queues can make the competition
>> worse.
>>
>> If every underlying data device has **only** one polling hw queue, then
>> this ping-pong style polling still exists, even when we implement the
>> split bio tracking mechanism, i.e., acquiring the specific hw queue the
>> bio was enqueued into, because multiple polling instances have to
>> compete for the only polling hw queue.
>>
>> But if multiple polling hw queues per device are reserved for multiple
>> polling instances (e.g., every underlying data device has 3 polling hw
>> queues when there are 3 polling instances), just as we do in practice
>> for mq polling, then the current implementation of iterating all hw
>> queues will indeed work in a ping-pong style, while this issue should
>> not exist once an accurate split bio tracking mechanism is implemented.
> 
> In reality it could be possible to have one hw queue for each numa node.
> 
> And you may re-use blk_mq_map_queue() for getting the proper hw queue for poll.

Thanks. But the optimization I proposed in [1] may not work well when
the IO submitting process migrates to another CPU halfway through. I
mean, the process has submitted several split bios, then migrates to
another CPU and goes on to submit the remaining split bios.
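
A toy model of the migration problem, assuming the hw queue is derived
from the submitting CPU. The mapping `cpu_to_poll_hctx()`, the constant
`NR_POLL_HCTX` and the bitmask helpers are all hypothetical and only
illustrate why remembering a single hw queue per original bio is not
enough; they do not restate what [1] proposes.

```c
#include <assert.h>
#include <stddef.h>

#define NR_POLL_HCTX 4	/* assumed number of polling hw queues */

/* Assumed CPU-to-queue mapping; the real one comes from the blk-mq queue map. */
static unsigned int cpu_to_poll_hctx(unsigned int cpu)
{
	return cpu % NR_POLL_HCTX;
}

/*
 * Record, as a bitmask, every polling hw queue that received a split bio.
 * submit_cpus[i] is the CPU on which the i-th split bio was submitted.
 */
static unsigned int hctx_mask_for_splits(const unsigned int *submit_cpus, size_t nr)
{
	unsigned int mask = 0;

	assert(submit_cpus != NULL);
	for (size_t i = 0; i < nr; i++)
		mask |= 1u << cpu_to_poll_hctx(submit_cpus[i]);
	return mask;
}

/* Number of distinct hw queues a poller would have to visit. */
static unsigned int hctx_count(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}
```

If all four split bios are submitted on CPU 1, the mask names one hw
queue; if the process migrates from CPU 1 to CPU 2 after the second
split bio, the mask names two, so a poller that tracks only the first
queue would miss the completions of the later split bios.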

[1]
https://lore.kernel.org/io-uring/[email protected]/T/#m0d9a0e55e11874a70c6a3491f191289df72a84f8

-- 
Thanks,
Jeffle
