From: Pavel Begunkov <[email protected]>
To: Hao Xu <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH 3/8] io_uring: add a limited tw list for irq completion work
Date: Thu, 30 Sep 2021 10:02:07 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 9/29/21 12:38 PM, Hao Xu wrote:
> 在 2021/9/28 下午7:29, Pavel Begunkov 写道:
[...]
>>> @@ -2132,12 +2136,16 @@ static void tctx_task_work(struct callback_head *cb)
>>> while (1) {
>>> struct io_wq_work_node *node;
>>> - if (!tctx->task_list.first && locked)
>>> + if (!tctx->prior_task_list.first &&
>>> + !tctx->task_list.first && locked)
>>> io_submit_flush_completions(ctx);
>>> spin_lock_irq(&tctx->task_lock);
>>> - node = tctx->task_list.first;
>>> + wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
>>> + node = tctx->prior_task_list.first;
>>
>> I find all this accounting expensive, sure I'll see it for my BPF tests.
> May I ask how you evaluate the overhead with BPF here?
It's a custom branch and apparently would need some thinking on how
to apply your stuff on top, because of yet another list in [1]. In
short, the case in mind spins inside of tctx_task_work() doing one
request at a time.
I think it would be easier if I try it out myself.
[1] https://github.com/isilence/linux/commit/d6285a9817eb26aa52ad54a79584512d7efa82fd
>>
>> How about
>> 1) remove MAX_EMERGENCY_TW_RATIO and all the counters,
>> prior_nr and others.
>>
>> 2) rely solely on list merging
>>
>> So, when an iteration of the loop finds a set of requests to run, it
>> first executes all priority ones of that set and then the rest (just
>> by the fact that the lists are merged and everything on them gets
>> executed).
>>
>> It solves the problem of total starvation of non-prio requests, e.g.
>> if new completions come in as fast as you complete previous ones. One
>> downside is that prio requests coming while we execute a previous
>> batch will be executed only after a previous batch of non-prio
>> requests, I don't think it's much of a problem but interesting to
>> see numbers.
> hmm, this probably doesn't solve the starvation, since there may be
> a number of priority TWs ahead of the non-prio TWs in one iteration, in
> the case of submitting many sqes in one io_submit_sqes. That's why I
> cap priority TWs at 1/3 there.
I don't think it's a problem: they should be fast enough, and we have
a forward progress guarantee for non-prio requests. IMHO that should be enough.
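For reference, the merge-only scheme can be sketched in userspace like
this (simplified stand-ins for the kernel's singly linked wq_list, not
the real io_wq_work_list helpers from io-wq.h):

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's singly linked work list. */
struct wq_node { struct wq_node *next; };
struct wq_list { struct wq_node *first, *last; };

static void wq_add_tail(struct wq_list *l, struct wq_node *n)
{
	n->next = NULL;
	if (l->last)
		l->last->next = n;
	else
		l->first = n;
	l->last = n;
}

/*
 * Splice the normal list onto the tail of the priority list, so a
 * single walk of the result runs every priority entry of this batch
 * first and then the rest -- non-prio work still makes forward
 * progress even if new completions keep arriving.
 */
static struct wq_node *wq_merge(struct wq_list *prio, struct wq_list *list)
{
	if (!prio->first)
		return list->first;
	if (list->first) {
		prio->last->next = list->first;
		prio->last = list->last;
	}
	return prio->first;
}
```

No counters are needed: ordering alone bounds how long non-prio
requests can wait behind priority ones within a batch.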
>>
>>
>>> INIT_WQ_LIST(&tctx->task_list);
>>> + INIT_WQ_LIST(&tctx->prior_task_list);
>>> + tctx->nr = tctx->prior_nr = 0;
>>> if (!node)
>>> tctx->task_running = false;
>>> spin_unlock_irq(&tctx->task_lock);
>>> @@ -2166,7 +2174,7 @@ static void tctx_task_work(struct callback_head *cb)
>>> ctx_flush_and_put(ctx, &locked);
>>> }
>>> -static void io_req_task_work_add(struct io_kiocb *req)
>>> +static void io_req_task_work_add(struct io_kiocb *req, bool emergency)
>>
>> I think "priority" instead of "emergency" will be more accurate
>>
>>> {
>>> struct task_struct *tsk = req->task;
>>> struct io_uring_task *tctx = tsk->io_uring;
>>> @@ -2178,7 +2186,13 @@ static void io_req_task_work_add(struct io_kiocb *req)
>>> WARN_ON_ONCE(!tctx);
>>> spin_lock_irqsave(&tctx->task_lock, flags);
>>> - wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>> + if (emergency && tctx->prior_nr * MAX_EMERGENCY_TW_RATIO < tctx->nr) {
>>> + wq_list_add_tail(&req->io_task_work.node, &tctx->prior_task_list);
>>> + tctx->prior_nr++;
>>> + } else {
>>> + wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
>>> + }
>>> + tctx->nr++;
>>> running = tctx->task_running;
>>> if (!running)
>>> tctx->task_running = true;
>>
>>
>>
>
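For readers following along, the ratio-gated enqueue from the quoted
hunk looks roughly like this as a standalone userspace sketch (names
simplified; the patch itself uses MAX_EMERGENCY_TW_RATIO,
prior_task_list and prior_nr):

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the patch's MAX_EMERGENCY_TW_RATIO. */
#define MAX_PRIO_TW_RATIO 3

struct wq_node { struct wq_node *next; };
struct wq_list { struct wq_node *first, *last; };

struct tctx {
	struct wq_list prio_list, list;
	unsigned int prio_nr, nr;
};

static void wq_add_tail(struct wq_list *l, struct wq_node *n)
{
	n->next = NULL;
	if (l->last)
		l->last->next = n;
	else
		l->first = n;
	l->last = n;
}

/*
 * Admit a request to the priority list only while priority entries
 * stay below roughly 1/MAX_PRIO_TW_RATIO of the pending batch; the
 * excess spills into the normal list, bounding non-prio starvation.
 */
static void tw_add(struct tctx *t, struct wq_node *n, bool prio)
{
	if (prio && t->prio_nr * MAX_PRIO_TW_RATIO < t->nr) {
		wq_add_tail(&t->prio_list, n);
		t->prio_nr++;
	} else {
		wq_add_tail(&t->list, n);
	}
	t->nr++;
}
```

With a ratio of 3 this matches the "1/3 priority TWs" cap discussed
above: a burst of priority completions cannot crowd the batch beyond
that fraction.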
--
Pavel Begunkov
Thread overview: 25+ messages
2021-09-27 6:17 [PATCH 0/6] task_work optimization Hao Xu
2021-09-27 6:17 ` [PATCH 1/8] io-wq: code clean for io_wq_add_work_after() Hao Xu
2021-09-28 11:08 ` Pavel Begunkov
2021-09-29 7:36 ` Hao Xu
2021-09-29 11:23 ` Pavel Begunkov
2021-09-27 6:17 ` [PATCH 2/8] io-wq: add helper to merge two wq_lists Hao Xu
2021-09-27 10:17 ` Hao Xu
2021-09-28 11:10 ` Pavel Begunkov
2021-09-28 16:48 ` Hao Xu
2021-09-29 11:23 ` Pavel Begunkov
2021-09-27 6:17 ` [PATCH 3/8] io_uring: add a limited tw list for irq completion work Hao Xu
2021-09-28 11:29 ` Pavel Begunkov
2021-09-28 16:55 ` Hao Xu
2021-09-29 11:25 ` Pavel Begunkov
2021-09-29 11:38 ` Hao Xu
2021-09-30 9:02 ` Pavel Begunkov [this message]
2021-09-30 3:21 ` Hao Xu
2021-09-27 6:17 ` [PATCH 4/8] io_uring: add helper for task work execution code Hao Xu
2021-09-27 6:17 ` [PATCH 5/8] io_uring: split io_req_complete_post() and add a helper Hao Xu
2021-09-27 6:17 ` [PATCH 6/8] io_uring: move up io_put_kbuf() and io_put_rw_kbuf() Hao Xu
2021-09-27 6:17 ` [PATCH 7/8] io_uring: add tw_ctx for io_uring_task Hao Xu
2021-09-27 6:17 ` [PATCH 8/8] io_uring: batch completion in prior_task_list Hao Xu
2021-09-27 6:21 ` [PATCH 0/6] task_work optimization Hao Xu
-- strict thread matches above, loose matches on Subject: below --
2021-09-27 10:51 [PATCH v2 0/8] " Hao Xu
2021-09-27 10:51 ` [PATCH 3/8] io_uring: add a limited tw list for irq completion work Hao Xu
2021-09-29 12:31 ` Hao Xu