From: Pavel Begunkov <[email protected]>
To: Anuj gupta <[email protected]>
Cc: [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected], [email protected]
Subject: Re: [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough
Date: Tue, 16 May 2023 19:38:20 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CACzX3Av9yOkAK16QRJ7npQUVAiTjA-nqLR2Doob9p6nYYYkyOg@mail.gmail.com>

On 5/16/23 12:42, Anuj gupta wrote:
> On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <[email protected]> wrote:
>>
>> Let cmds use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>>
>> The result should be the same as in the tests for the original
>> IOU_F_TWQ_LAZY_WAKE patchset [1], but for a quick test I took fio/t/io_uring
>> with 4 threads, each reading its own drive and all pinned to the same CPU
>> to make it CPU bound, and got a +10% throughput improvement.
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>>
>> Pavel Begunkov (2):
>>    io_uring/cmd: add cmd lazy tw wake helper
>>    nvme: optimise io_uring passthrough completion
>>
>>   drivers/nvme/host/ioctl.c |  4 ++--
>>   include/linux/io_uring.h  | 18 ++++++++++++++++--
>>   io_uring/uring_cmd.c      | 16 ++++++++++++----
>>   3 files changed, 30 insertions(+), 8 deletions(-)
>>
>>
>> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
>> --
>> 2.40.0
>>
> 
> I tried to run a few workloads on my setup with your patches applied. However, I
> couldn't see any difference in io passthrough performance. I might have missed
> something. Can you share the workload that you ran which gave you the perf
> improvement? Here is the workload that I ran -

The patch is a way to make completion batching more consistent. If you're
lucky enough that all IO completes before task_work runs, batching is
already perfect and there is nothing to improve. That often happens with
high-throughput benchmarks because of how consistent they are: no writes,
same request size, everything issued at the same time, and so on. In
reality it depends on your usage pattern, timings, nvme coalescing; it
will also change if you introduce a second drive, and so on.
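For reference, the nvme side (patch 2/2) should amount to swapping the
completion helper at the passthrough call sites. A sketch based on the
diffstat above, assuming the new helper keeps the same callback signature
as io_uring_cmd_complete_in_task():

	/* drivers/nvme/host/ioctl.c (sketch, not the exact diff) */
	static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
							blk_status_t err)
	{
		struct io_uring_cmd *ioucmd = req->end_io_data;

		/* ... result/status handling elided ... */

		/* Queue the callback to task context as before, but with
		 * the lazy variant: the task is only woken once enough
		 * task_work has accumulated to satisfy its wait_nr.
		 */
		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
		return RQ_END_IO_NONE;
	}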

With the patch, t/io_uring should run task_work once for exactly the
number of CQEs the user is waiting for, i.e. -c<N>, regardless of
circumstances.
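
To put -c<N> into userspace terms, here is a minimal liburing sketch (my
illustration, not code from the patchset) of a waiter asking for a batch
of 4 CQEs; that wait_nr is what the lazy wake can defer wakeups against:

	#include <liburing.h>

	/* Reap completions in batches of 4. With DEFER_TASKRUN rings and
	 * the lazy-wake patches, task_work should run once per batch.
	 */
	static int wait_for_batch(struct io_uring *ring)
	{
		struct io_uring_cqe *cqe;
		unsigned head, seen = 0;

		/* the equivalent of t/io_uring's -c4: wait for 4 CQEs */
		int ret = io_uring_submit_and_wait(ring, 4);
		if (ret < 0)
			return ret;

		io_uring_for_each_cqe(ring, head, cqe)
			seen++;
		io_uring_cq_advance(ring, seen);
		return seen;
	}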

Just tried it out to confirm:

taskset -c 0 nice -n -20 /t/io_uring -p0 -d4 -b8192 -s4 -c4 -F1 -B1 -R0 -X1 -u1 -O0 /dev/ng0n1

Without:
12:11:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:11:20 PM    0    2.03    0.00   25.95    0.00    0.00    0.00    0.00    0.00    0.00   72.03
With:
12:12:00 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:12:10 PM    0    2.22    0.00   17.39    0.00    0.00    0.00    0.00    0.00    0.00   80.40


Double-checking it works:

echo 1 > /sys/kernel/debug/tracing/events/io_uring/io_uring_local_work_run/enable
cat /sys/kernel/debug/tracing/trace_pipe

Without the patches I see:

io_uring-4108    [000] .....   653.820369: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820371: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820382: io_uring_local_work_run: ring 00000000b843f57f, count 2, loops 1
io_uring-4108    [000] .....   653.820383: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820386: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820398: io_uring_local_work_run: ring 00000000b843f57f, count 2, loops 1
io_uring-4108    [000] .....   653.820398: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1

And with the patches it's strictly count=4.
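
(The mechanism behind the strict count, in rough pseudocode of my own
rather than the actual kernel code: instead of waking the submitter for
every queued item, the lazy flag defers the wakeup until the local
task_work list covers what the waiter asked for.)

	/* pseudocode; add_to_local_tw_list() is a made-up name */
	nr_tw = add_to_local_tw_list(ctx, req);
	if (!lazy || nr_tw >= atomic_read(&ctx->cq_wait_nr))
		wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);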

Another way would be to add more SSDs to the picture and hope they don't
conspire to complete at the same time.


-- 
Pavel Begunkov
