From: Pavel Begunkov <[email protected]>
To: Hao Xu <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [POC RFC 0/3] support graph like dependent sqes
Date: Tue, 21 Dec 2021 16:19:30 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 12/18/21 06:57, Hao Xu wrote:
> On 2021/12/18 3:33 AM, Pavel Begunkov wrote:
>> On 12/16/21 16:55, Hao Xu wrote:
>>> On 2021/12/15 2:16 AM, Pavel Begunkov wrote:
>>>> On 12/14/21 16:53, Hao Xu wrote:
>>>>> On 2021/12/14 11:21 PM, Pavel Begunkov wrote:
>>>>>> On 12/14/21 05:57, Hao Xu wrote:
>>>>>>> This is just a proof of concept which is incomplete; sending it early for
>>>>>>> thoughts and suggestions.
>>>>>>>
>>>>>>> We already have IOSQE_IO_LINK to describe a linear dependency
>>>>>>> relationship between sqes, while this patchset provides a new feature to
>>>>>>> support DAG dependencies. For instance, 4 sqes have a relationship
>>>>>>> as below:
>>>>>>>        --> 2 --
>>>>>>>       /        \
>>>>>>> 1 ---           ---> 4
>>>>>>>       \        /
>>>>>>>        --> 3 --
>>>>>>> IOSQE_IO_LINK serializes them to 1-->2-->3-->4, which unnecessarily
>>>>>>> serializes 2 and 3. But a DAG can fully describe it.
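>>>>>>>
>>>>>>> (For illustration, a minimal liburing sketch of the plain IOSQE_IO_LINK
>>>>>>> chain; the ring, fd and buffers are assumed to be set up elsewhere and
>>>>>>> are hypothetical here:)
>>>>>>>
>>>>>>> #include <errno.h>
>>>>>>> #include <liburing.h>
>>>>>>>
>>>>>>> /* chain four 4k reads so they run strictly as 1 -> 2 -> 3 -> 4 */
>>>>>>> static int submit_linked_reads(struct io_uring *ring, int fd,
>>>>>>>                                char bufs[4][4096])
>>>>>>> {
>>>>>>>         for (int i = 0; i < 4; i++) {
>>>>>>>                 struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
>>>>>>>
>>>>>>>                 if (!sqe)
>>>>>>>                         return -EBUSY;
>>>>>>>                 io_uring_prep_read(sqe, fd, bufs[i], 4096, i * 4096ULL);
>>>>>>>                 if (i < 3)              /* the last sqe ends the chain */
>>>>>>>                         io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);
>>>>>>>         }
>>>>>>>         return io_uring_submit(ring);
>>>>>>> }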
>>>>>>>
>>>>>>> For the detailed usage, see the following patches' messages.
>>>>>>>
>>>>>>> Tested it with 100 direct read sqes, each one reading a BS=4k block
>>>>>>> in the same file; blocks are not overlapped. These sqes form a graph:
>>>>>>>       2
>>>>>>>       3
>>>>>>> 1 --> 4 --> 100
>>>>>>>      ...
>>>>>>>       99
>>>>>>>
>>>>>>> This is an extreme case, just to show the idea.
>>>>>>>
>>>>>>> results below:
>>>>>>> io_link:
>>>>>>> IOPS: 15898251
>>>>>>> graph_link:
>>>>>>> IOPS: 29325513
>>>>>>> io_link:
>>>>>>> IOPS: 16420361
>>>>>>> graph_link:
>>>>>>> IOPS: 29585798
>>>>>>> io_link:
>>>>>>> IOPS: 18148820
>>>>>>> graph_link:
>>>>>>> IOPS: 27932960
>>>>>>
>>>>>> Hmm, what do we compare here? IIUC,
>>>>>> "io_link" is a huge link of 100 requests. Around 15898251 IOPS
>>>>>> "graph_link" is a graph of diameter 3. Around 29585798 IOPS
>>>>
>>>> Diam 2 graph, my bad
>>>>
>>>>
>>>>>> Is that right? If so, it'd be more fair to compare with a
>>>>>> similar graph-like scheduling on the userspace side.
>>>>>
>>>>> The above test is mostly meant to show the disadvantage of LINK
>>>>
>>>> Oh yeah, links can be slow, especially when they kill potential
>>>> parallelism or need extra allocations for keeping state, like
>>>> READV and WRITEV.
>>>>
>>>>
>>>>> But yes, it's better to test against similar userspace scheduling, since
>>>>> LINK is definitely not a good choice, so we have to prove the graph stuff
>>>>> beats the userspace scheduling. Will test that soon. Thanks.
>>>>
>>>> Would also be great if you could post the benchmark once
>>>> it's done.
>>>
>>> Wrote a new test with nop sqes forming a full binary tree of (2^10)-1 nodes,
>>> which I think is a more general case. Turns out the result is still not stable and
>>> the kernel-side graph link is much slower. I'll try to optimize it.
>>
>> That's expected, unfortunately. And without reacting to the results
>> of previous requests, it's hard to imagine it being useful. BPF may
>> have helped, e.g. not keeping an explicit graph but just generating
>> new requests from the kernel... But apparently even with this it's
>> hard to compete with just leaving it in userspace.
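>>
>> (For reference, "leaving it in userspace" for the 1 -> {2..99} -> 100
>> shape from the cover letter might look roughly like the sketch below;
>> it is only an assumed liburing example with NOP requests standing in
>> for the reads, not the benchmark that was actually run:)
>>
>> #include <liburing.h>
>>
>> /* userspace "graph" scheduling for a fan-out/fan-in of n nodes:
>>  * submit node 1, wait for it, fan out nodes 2..n-1, wait for them
>>  * all, then submit node n. Assumes the SQ ring holds >= n entries. */
>> static int run_fanout(struct io_uring *ring, int n)
>> {
>>         struct io_uring_cqe *cqe;
>>         struct io_uring_sqe *sqe;
>>         int i, ret;
>>
>>         sqe = io_uring_get_sqe(ring);          /* node 1 */
>>         io_uring_prep_nop(sqe);
>>         io_uring_submit(ring);
>>         ret = io_uring_wait_cqe(ring, &cqe);
>>         if (ret)
>>                 return ret;
>>         io_uring_cqe_seen(ring, cqe);
>>
>>         for (i = 2; i < n; i++) {              /* nodes 2 .. n-1 */
>>                 sqe = io_uring_get_sqe(ring);
>>                 io_uring_prep_nop(sqe);
>>         }
>>         io_uring_submit(ring);
>>         for (i = 2; i < n; i++) {
>>                 ret = io_uring_wait_cqe(ring, &cqe);
>>                 if (ret)
>>                         return ret;
>>                 io_uring_cqe_seen(ring, cqe);
>>         }
>>
>>         sqe = io_uring_get_sqe(ring);          /* node n */
>>         io_uring_prep_nop(sqe);
>>         io_uring_submit(ring);
>>         ret = io_uring_wait_cqe(ring, &cqe);
>>         if (ret)
>>                 return ret;
>>         io_uring_cqe_seen(ring, cqe);
>>         return 0;
>> }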
>>
> Tried to exclude the memory allocation stuff; it seems a bit better than the userspace graph.
>
> For the result delivery, I was thinking of attaching a BPF program within an sqe, not creating
> a single BPF-type sqe. Then we can have data flow in the graph or link chain. But I haven't
> got a clear draft for it yet.
Oh, I dismissed this idea before. Even if it can be done in-place without any
additional tw (consider recursion and submit_state not prepared for that), it'll
be a horror to maintain. And I also don't see it being flexible enough.
There is one idea from other folks that I have to implement, i.e. having a per-CQ
callback. Might be interesting to experiment with, but I don't see it being viable
in the long run.
>>> Btw, is there any comparison data between the current io link feature and
>>> userspace scheduling?
>>
>> Don't remember. I'd try to look up the cover letter for the patches
>> implementing it; I believe there should've been some numbers and
>> hopefully a test description.
>>
>> fwiw, before the io_uring mailing list got established, patches/etc.
>> were mostly going through the linux-block mailing list. Links are old, so the
>> patches might be there.
>>
--
Pavel Begunkov
Thread overview: 15+ messages
2021-12-14 5:57 [POC RFC 0/3] support graph like dependent sqes Hao Xu
2021-12-14 5:57 ` [PATCH 1/3] io_uring: add data structure for graph sqe feature Hao Xu
2021-12-14 5:57 ` [PATCH 2/3] io_uring: implement new sqe opcode to build graph like links Hao Xu
2021-12-14 5:57 ` [PATCH 3/3] io_uring: implement logic of IOSQE_GRAPH request Hao Xu
2021-12-14 15:21 ` [POC RFC 0/3] support graph like dependent sqes Pavel Begunkov
2021-12-14 16:53 ` Hao Xu
2021-12-14 18:16 ` Pavel Begunkov
2021-12-16 16:55 ` Hao Xu
2021-12-17 19:33 ` Pavel Begunkov
2021-12-18 6:57 ` Hao Xu
2021-12-21 16:19 ` Pavel Begunkov [this message]
2021-12-23 4:14 ` Hao Xu
2021-12-23 10:06 ` Christian Dietrich
2021-12-27 3:27 ` Hao Xu
2021-12-27 5:49 ` Christian Dietrich