public inbox for [email protected]
From: Hao Xu <[email protected]>
To: Pavel Begunkov <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH v6 0/6] task work optimization
Date: Fri, 3 Dec 2021 11:24:08 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 2021/12/3 9:39 AM, Pavel Begunkov wrote:
> On 11/26/21 10:07, Hao Xu wrote:
>> v4->v5
>> - change the implementation of merge_wq_list
>>
>> v5->v6
>> - change the logic of handling prior task list to:
>>    1) grabbed uring_lock: leverage the inline completion infra
>>    2) otherwise: batch __req_complete_post() calls to save
>>       completion_lock operations.
> 
> Some testing for v6: first with only the first 5 patches applied
> (1-5), and then with all 6 (see 1-6).
> 
> modprobe null_blk no_sched=1 irqmode=1 completion_nsec=0 submit_queues=16 poll_queues=32 hw_queue_depth=128
> echo 2 | sudo tee /sys/block/nullb0/queue/nomerges
> echo 0 | sudo tee /sys/block/nullb0/queue/iostats
> mitigations=off
> 
> added this to test non-sqpoll:
> 
> @@ -2840,7 +2840,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
>                  return;
>          req->result = res;
>          req->io_task_work.func = io_req_task_complete;
> -       io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
> +       io_req_task_work_add(req, true);
>   }
> 
> # 1-5, sqpoll=0
> nice -n -20 taskset -c 0 ./io_uring -d32 -s32 -c32 -p0 -B1 -F1 -b512 /dev/nullb0
> IOPS=3238688, IOS/call=32/32, inflight=32 (32)
> IOPS=3299776, IOS/call=32/32, inflight=32 (32)
> IOPS=3328416, IOS/call=32/32, inflight=32 (32)
> IOPS=3291488, IOS/call=32/32, inflight=32 (32)
> IOPS=3284480, IOS/call=32/32, inflight=32 (32)
> IOPS=3305248, IOS/call=32/32, inflight=32 (32)
> IOPS=3275392, IOS/call=32/32, inflight=32 (32)
> IOPS=3301376, IOS/call=32/32, inflight=32 (32)
> IOPS=3287392, IOS/call=32/32, inflight=32 (32)
> 
> # 1-5, sqpoll=1
> nice -n -20  ./io_uring -d32 -s32 -c32 -p0 -B1 -F1 -b512 /dev/nullb0
> IOPS=2730752, IOS/call=2730752/2730752, inflight=32 (32)
> IOPS=2822432, IOS/call=-1/-1, inflight=0 (32)
> IOPS=2818464, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2802880, IOS/call=-1/-1, inflight=0 (32)
> IOPS=2773440, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2827296, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2808320, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2793120, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2769632, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2752896, IOS/call=-1/-1, inflight=32 (32)
> 
> # 1-6, sqpoll=0
> nice -n -20 taskset -c 0 ./io_uring -d32 -s32 -c32 -p0 -B1 -F1 -b512 /dev/nullb0
> IOPS=3219552, IOS/call=32/32, inflight=32 (32)
> IOPS=3284128, IOS/call=32/32, inflight=32 (32)
> IOPS=3305024, IOS/call=32/32, inflight=32 (32)
> IOPS=3301920, IOS/call=32/32, inflight=32 (32)
> IOPS=3330592, IOS/call=32/32, inflight=32 (32)
> IOPS=3286496, IOS/call=32/32, inflight=32 (32)
> IOPS=3236160, IOS/call=32/32, inflight=32 (32)
> IOPS=3307552, IOS/call=32/32, inflight=32 (32)
> 
> # 1-6, sqpoll=1
> nice -n -20  ./io_uring -d32 -s32 -c32 -p0 -B1 -F1 -b512 /dev/nullb0
> IOPS=2777152, IOS/call=2777152/2777152, inflight=32 (32)
> IOPS=2822080, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2785472, IOS/call=-1/-1, inflight=0 (32)
> IOPS=2763360, IOS/call=-1/-1, inflight=0 (32)
> IOPS=2789856, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2783296, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2786016, IOS/call=-1/-1, inflight=0 (32)
> IOPS=2773760, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2745408, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2764352, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2766912, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2757216, IOS/call=-1/-1, inflight=32 (32)
> 
> So, no difference here, as expected: it just takes uring_lock
> as per the v6 changes and goes through the old path. Then I added
> this to compare the old vs new paths:
> 
> @@ -2283,7 +2283,7 @@ static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ct
>                          ctx_flush_and_put(*ctx, locked);
>                          *ctx = req->ctx;
>                          /* if not contended, grab and improve batching */
> -                       *locked = mutex_trylock(&(*ctx)->uring_lock);
> +                       // *locked = mutex_trylock(&(*ctx)->uring_lock);
>                          percpu_ref_get(&(*ctx)->refs);
>                          if (unlikely(!*locked))
>                                  spin_lock(&(*ctx)->completion_lock);
> 
> 
> # 1-6 + no trylock, sqpoll=0
> nice -n -20 taskset -c 0 ./io_uring -d32 -s32 -c32 -p0 -B1 -F1 -b512 /dev/nullb0
> IOPS=3239040, IOS/call=32/32, inflight=32 (32)
> IOPS=3244800, IOS/call=32/32, inflight=32 (32)
> IOPS=3208544, IOS/call=32/32, inflight=32 (32)
> IOPS=3264384, IOS/call=32/32, inflight=32 (32)
> IOPS=3264000, IOS/call=32/32, inflight=32 (32)
> IOPS=3296960, IOS/call=32/32, inflight=32 (32)
> IOPS=3283424, IOS/call=32/32, inflight=32 (32)
> IOPS=3284064, IOS/call=32/32, inflight=32 (32)
> IOPS=3275232, IOS/call=32/32, inflight=32 (32)
> IOPS=3261248, IOS/call=32/32, inflight=32 (32)
> IOPS=3273792, IOS/call=32/32, inflight=32 (32)
> 
> # 1-6 + no trylock, sqpoll=1
> nice -n -20  ./io_uring -d32 -s32 -c32 -p0 -B1 -F1 -b512 /dev/nullb0
> IOPS=2676736, IOS/call=2676736/2676736, inflight=32 (32)
> IOPS=2639776, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2660000, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2639584, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2634592, IOS/call=-1/-1, inflight=0 (32)
> IOPS=2611488, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2647360, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2630720, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2663200, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2694240, IOS/call=-1/-1, inflight=32 (32)
> IOPS=2674592, IOS/call=-1/-1, inflight=32 (32)
> 
> Seems it goes a little bit down, but not much. Considering that
> it's an optimisation for cases where there is no batching at all,
> that's good.
Nice, thanks for testing this. Now it's clear that the inline completion
path is faster.
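
For readers following along, the two paths compared above can be sketched
in userspace. This is only an illustration of the "trylock, else fall back
to a batched lock" pattern, not the kernel code: struct demo_ctx,
demo_ctx_init() and demo_complete_batch() are hypothetical names, and
plain pthread mutexes stand in for uring_lock and completion_lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a ring context (not the kernel struct). */
struct demo_ctx {
	pthread_mutex_t big_lock;        /* plays the role of uring_lock */
	pthread_mutex_t completion_lock; /* plays the role of completion_lock */
	size_t inline_done;              /* items completed on the inline path */
	size_t batched_done;             /* items completed on the fallback path */
	size_t completion_lock_rounds;   /* how often completion_lock was taken */
};

void demo_ctx_init(struct demo_ctx *ctx)
{
	int rc;

	rc = pthread_mutex_init(&ctx->big_lock, NULL);
	assert(rc == 0);
	rc = pthread_mutex_init(&ctx->completion_lock, NULL);
	assert(rc == 0);
	(void)rc; /* silence unused warning when NDEBUG is defined */

	ctx->inline_done = 0;
	ctx->batched_done = 0;
	ctx->completion_lock_rounds = 0;
}

/*
 * Complete nr items. Returns true if the big lock was uncontended and the
 * inline path was used, false if we fell back to the batched path.
 */
bool demo_complete_batch(struct demo_ctx *ctx, size_t nr)
{
	/* if not contended, grab the big lock and complete inline */
	if (pthread_mutex_trylock(&ctx->big_lock) == 0) {
		ctx->inline_done += nr;
		pthread_mutex_unlock(&ctx->big_lock);
		return true;
	}

	/* otherwise take completion_lock once for the whole batch */
	pthread_mutex_lock(&ctx->completion_lock);
	ctx->completion_lock_rounds++;
	for (size_t i = 0; i < nr; i++)
		ctx->batched_done++;
	pthread_mutex_unlock(&ctx->completion_lock);
	return false;
}
```

The point of the fallback is that completion_lock is taken once per batch
rather than once per item. Commenting out the trylock, as in the test
above, corresponds to always taking the second branch.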

Regards,
Hao