public inbox for [email protected]
 help / color / mirror / Atom feed
From: Hao Xu <[email protected]>
To: Pavel Begunkov <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH for-5.16 v3 0/8] task work optimization
Date: Fri, 29 Oct 2021 14:18:41 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

在 2021/10/29 上午3:08, Pavel Begunkov 写道:
> On 10/28/21 12:46, Hao Xu wrote:
>> 在 2021/10/28 下午2:07, Hao Xu 写道:
>>> 在 2021/10/28 上午2:15, Pavel Begunkov 写道:
>>>> On 10/27/21 15:02, Hao Xu wrote:
>>>>> Tested this patchset by manually replace __io_queue_sqe() in
>>>>> io_queue_sqe() by io_req_task_queue() to construct 'heavy' task works.
>>>>> Then test with fio:
>>>>
>>>> If submissions and completions are done by the same task it doesn't
>>>> really matter in which order they're executed because the task won't
>>>> get back to userspace execution to see CQEs until tw returns.
>>> It may matter, it depends on the time cost of submittion
>>> and the DMA IO time. Pick up sqpoll mode as example,
>>> we submit 10 reqs:
>>> t1          io_submit_sqes
>>>              -->io_req_task_queue
>>> t2          io_task_work_run
>>> we actually do the submittion in t2,  but if the workload
>>> is big engough, the 'irq completion TW' will be inserted
>>> to the TW list after t2 is fully done, then those
>>> 'irq completion TW' will be delayed to the next round.
>>> With this patchset, we can handle them first.
>>>> Furthermore, it even might be worse because the earlier you submit
>>>> the better with everything else equal.
>>>>
>>>> IIRC, that's how it's with fio, right? If so, you may get better
>>>> numbers with a test that does submissions and completions in
>>>> different threads.
>>> Because of the completion cache, I doubt if it works.
>>> For single ctx, it seems we always update the cqring
>>> pointer after all the TWs in the list are done.
>> I suddenly realized sqpoll mode does submissions and completions
>> in different threads, and in this situation this patchset always
>> first commit_cqring() after handling TWs in priority list.
>> So this is the right test, do I miss something?
> 
> Yep, should be it. So the scope of the feature is SQPOLL or
> completion/submission with different tasks.
 From the test results, it's a bit risk to apply this feature to
normal mode(no good, but have to ensure no bad), so I'd like to
apply it to sqpoll mode for now. For completion/submission
decoupled situation, maybe we can include it later.
> 
>>>>
>>>> Also interesting to find an explanation for you numbers assuming
>>> The reason may be what I said above, but I don't have a
>>> strict proof now.
>>>> they're stable. 7/8 batching? How often it does it go this path?
>>>> If only one task submits requests it should already be covered
>>>> with existing batching.
>>> the problem of the existing batch is(given there is only
>>> one ctx):
>>> 1. we flush it after all the TWs done
>>> 2. we batch them if we have uring lock.
>>> the new batch is:
>>> 1. don't care about uring lock
>>> 2. we can flush the completions in the priority list
>>>     in advance.(which means userland can see it earlier.)
>>>>
>>>>
>>>>> ioengine=io_uring
>>>>> sqpoll=1
>>>>> thread=1
>>>>> bs=4k
>>>>> direct=1
>>>>> rw=randread
>>>>> time_based=1
>>>>> runtime=600
>>>>> randrepeat=0
>>>>> group_reporting=1
>>>>> filename=/dev/nvme0n1
>>>>>
>>>>> 2/8 set unlimited priority_task_list, 8/8 set a limitation of
>>>>> 1/3 * (len_prority_list + len_normal_list), data below:
>>>>>     depth     no 8/8   include 8/8      before this patchset
>>>>>      1        7.05         7.82              7.10
>>>>>      2        8.47         8.48              8.60
>>>>>      4        10.42        9.99              10.42
>>>>>      8        13.78        13.13             13.22
>>>>>      16       27.41        27.92             24.33
>>>>>      32       49.40        46.16             53.08
>>>>>      64       102.53       105.68            103.36
>>>>>      128      196.98       202.76            205.61
>>>>>      256      372.99       375.61            414.88
>>>>>      512      747.23       763.95            791.30
>>>>>      1024     1472.59      1527.46           1538.72
>>>>>      2048     3153.49      3129.22           3329.01
>>>>>      4096     6387.86      5899.74           6682.54
>>>>>      8192     12150.25     12433.59          12774.14
>>>>>      16384    23085.58     24342.84          26044.71
>>>>>
>>>>> It seems 2/8 is better, haven't tried other choices other than 1/3,
>>>>> still put 8/8 here for people's further thoughts.
>>>>>
>>>>> Hao Xu (8):
>>>>>    io-wq: add helper to merge two wq_lists
>>>>>    io_uring: add a priority tw list for irq completion work
>>>>>    io_uring: add helper for task work execution code
>>>>>    io_uring: split io_req_complete_post() and add a helper
>>>>>    io_uring: move up io_put_kbuf() and io_put_rw_kbuf()
>>>>>    io_uring: add nr_ctx to record the number of ctx in a task
>>>>>    io_uring: batch completion in prior_task_list
>>>>>    io_uring: add limited number of TWs to priority task list
>>>>>
>>>>>   fs/io-wq.h    |  21 +++++++
>>>>>   fs/io_uring.c | 168 
>>>>> +++++++++++++++++++++++++++++++++++---------------
>>>>>   2 files changed, 138 insertions(+), 51 deletions(-)
>>>>>
>>>>
>>
> 


      reply	other threads:[~2021-10-29  6:18 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-27 14:02 [PATCH for-5.16 v3 0/8] task work optimization Hao Xu
2021-10-27 14:02 ` [PATCH 1/8] io-wq: add helper to merge two wq_lists Hao Xu
2021-10-27 14:02 ` [PATCH 2/8] io_uring: add a priority tw list for irq completion work Hao Xu
2021-10-27 14:02 ` [PATCH 3/8] io_uring: add helper for task work execution code Hao Xu
2021-10-27 14:02 ` [PATCH 4/8] io_uring: split io_req_complete_post() and add a helper Hao Xu
2021-10-27 14:02 ` [PATCH 5/8] io_uring: move up io_put_kbuf() and io_put_rw_kbuf() Hao Xu
2021-10-27 14:02 ` [PATCH 6/8] io_uring: add nr_ctx to record the number of ctx in a task Hao Xu
2021-10-28  6:27   ` Hao Xu
2021-10-27 14:02 ` [PATCH 7/8] io_uring: batch completion in prior_task_list Hao Xu
2021-10-27 14:02 ` [PATCH 8/8] io_uring: add limited number of TWs to priority task list Hao Xu
2021-10-27 16:39 ` [PATCH for-5.16 v3 0/8] task work optimization Hao Xu
2021-10-27 18:15 ` Pavel Begunkov
2021-10-28  6:07   ` Hao Xu
2021-10-28 11:46     ` Hao Xu
2021-10-28 19:08       ` Pavel Begunkov
2021-10-29  6:18         ` Hao Xu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4bc45226-8b27-500a-58e7-36da2eb5f92e@linux.alibaba.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox