From: Jens Axboe <[email protected]>
To: Dylan Yudaken <[email protected]>,
	[email protected], [email protected]
Cc: [email protected]
Subject: Re: [PATCH v2 for-next 0/8] io_uring: tw contention improvments
Date: Wed, 22 Jun 2022 09:21:53 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 6/22/22 7:40 AM, Dylan Yudaken wrote:
> Task work currently uses a spin lock to guard task_list and
> task_running. Some use cases such as networking can trigger task_work_add
> from multiple threads all at once, which suffers from contention here.
> 
> This can be changed to use a lockless list which seems to have better
> performance. Running the micro benchmark in [1] I see 20% improvement in
> multithreaded task work add. It required removing the priority tw list
> optimisation, however it isn't clear how important that optimisation is.
> Additionally it has fairly easy to break semantics.
> 
> Patch 1-2 remove the priority tw list optimisation
> Patch 3-5 add lockless lists for task work
> Patch 6 fixes a bug I noticed in io_uring event tracing
> Patch 7-8 adds tracing for task_work_run

I ran some IRQ driven workloads on this: basic 512b random read, DIO,
IRQ, at queue depths 1-64, doubling every time. From QD=8 onward, the
submit/complete batch is 1/4th of the QD, so that ramps up too. Results
are below; the first set is 5.19-rc3 + for-5.20/io_uring, the second
set is that plus this series.

This is what I ran:

sudo taskset -c 12 t/io_uring -d<QD> -b512 -s<batch> -c<batch> -p0 -F1 -B1 -n1 -D0 -R0 -X1 -R1 -t1 -r5 /dev/nvme0n1

on a gen2 optane drive.
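
For anyone wanting to reproduce the sweep, here is a minimal sketch of how
the <QD> and <batch> placeholders could be expanded. It assumes the rule
described above (QD doubles from 1 to 64, batch is 1 below QD=8 and QD/4
from there on). The helper names sweep() and command() are made up for
illustration and are not part of t/io_uring; the flags are copied verbatim
from the command line above.

```python
def sweep(max_qd=64):
    """Yield (queue_depth, batch) pairs: QD doubles from 1, batch is QD/4 from QD=8."""
    qd = 1
    while qd <= max_qd:
        # Below QD=8 the batch stays at 1; from QD=8 on it is a quarter of the QD.
        batch = qd // 4 if qd >= 8 else 1
        yield qd, batch
        qd *= 2


def command(qd, batch, dev="/dev/nvme0n1"):
    """Fill the <QD> and <batch> placeholders of the command line quoted above."""
    return (
        f"sudo taskset -c 12 t/io_uring -d{qd} -b512 -s{batch} -c{batch} "
        f"-p0 -F1 -B1 -n1 -D0 -R0 -X1 -R1 -t1 -r5 {dev}"
    )


# Print one command line per step of the sweep (does not run anything).
for qd, batch in sweep():
    print(command(qd, batch))
```

This only prints the command lines; running them is left to the reader.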

tldr - looks like an improvement there too, and no ill effects seen on
latency.

5.19-rc3 + for-5.20/io_uring:

QD=1, Batch=1
Maximum IOPS=244K
1509: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3996],  5.0000th=[ 3996], 10.0000th=[ 3996],
     | 20.0000th=[ 4036], 30.0000th=[ 4036], 40.0000th=[ 4036],
     | 50.0000th=[ 4036], 60.0000th=[ 4036], 70.0000th=[ 4036],
     | 80.0000th=[ 4076], 90.0000th=[ 4116], 95.0000th=[ 4196],
     | 99.0000th=[ 4437], 99.5000th=[ 5421], 99.9000th=[ 7590],
     | 99.9500th=[ 9518], 99.9900th=[32289]

QD=2, Batch=1
Maximum IOPS=483K
1533: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3714],  5.0000th=[ 3755], 10.0000th=[ 3795],
     | 20.0000th=[ 3795], 30.0000th=[ 3835], 40.0000th=[ 3955],
     | 50.0000th=[ 4036], 60.0000th=[ 4076], 70.0000th=[ 4076],
     | 80.0000th=[ 4076], 90.0000th=[ 4116], 95.0000th=[ 4156],
     | 99.0000th=[ 4518], 99.5000th=[ 6144], 99.9000th=[ 7510],
     | 99.9500th=[ 9839], 99.9900th=[32289]

QD=4, Batch=1
Maximum IOPS=907K
1583: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3393],  5.0000th=[ 3514], 10.0000th=[ 3594],
     | 20.0000th=[ 3634], 30.0000th=[ 3795], 40.0000th=[ 3875],
     | 50.0000th=[ 3955], 60.0000th=[ 4076], 70.0000th=[ 4156],
     | 80.0000th=[ 4277], 90.0000th=[ 4397], 95.0000th=[ 4477],
     | 99.0000th=[ 5120], 99.5000th=[ 5903], 99.9000th=[ 9357],
     | 99.9500th=[11004], 99.9900th=[32289]

QD=8, Batch=2
Maximum IOPS=1688K
1631: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3353],  5.0000th=[ 3554], 10.0000th=[ 3634],
     | 20.0000th=[ 3755], 30.0000th=[ 3875], 40.0000th=[ 4036],
     | 50.0000th=[ 4156], 60.0000th=[ 4277], 70.0000th=[ 4437],
     | 80.0000th=[ 4678], 90.0000th=[ 4839], 95.0000th=[ 5040],
     | 99.0000th=[ 6305], 99.5000th=[ 7028], 99.9000th=[10080],
     | 99.9500th=[15502], 99.9900th=[32932]

QD=16, Batch=4
Maximum IOPS=2613K
1680: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3955],  5.0000th=[ 4397], 10.0000th=[ 4558],
     | 20.0000th=[ 4759], 30.0000th=[ 4959], 40.0000th=[ 5120],
     | 50.0000th=[ 5261], 60.0000th=[ 5502], 70.0000th=[ 5743],
     | 80.0000th=[ 5903], 90.0000th=[ 6305], 95.0000th=[ 6706],
     | 99.0000th=[ 8393], 99.5000th=[ 8955], 99.9000th=[11325],
     | 99.9500th=[31968], 99.9900th=[34217]

QD=32, Batch=8
Maximum IOPS=3573K
1706: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 4919],  5.0000th=[ 5662], 10.0000th=[ 5903],
     | 20.0000th=[ 6144], 30.0000th=[ 6465], 40.0000th=[ 6626],
     | 50.0000th=[ 6867], 60.0000th=[ 7188], 70.0000th=[ 7510],
     | 80.0000th=[ 7992], 90.0000th=[ 8714], 95.0000th=[ 9357],
     | 99.0000th=[11325], 99.5000th=[11967], 99.9000th=[16626],
     | 99.9500th=[34217], 99.9900th=[37108]

QD=64, Batch=16
Maximum IOPS=3953K
1735: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 6626],  5.0000th=[ 7188], 10.0000th=[ 7510],
     | 20.0000th=[ 7992], 30.0000th=[ 8393], 40.0000th=[ 9116],
     | 50.0000th=[10160], 60.0000th=[11164], 70.0000th=[11646],
     | 80.0000th=[12128], 90.0000th=[12931], 95.0000th=[13735],
     | 99.0000th=[15984], 99.5000th=[16787], 99.9000th=[34217],
     | 99.9500th=[38072], 99.9900th=[40964]


============


5.19-rc3 + for-5.20/io_uring + this series:

QD=1, Batch=1
Maximum IOPS=246K
909: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3955],  5.0000th=[ 3996], 10.0000th=[ 3996],
     | 20.0000th=[ 3996], 30.0000th=[ 3996], 40.0000th=[ 3996],
     | 50.0000th=[ 3996], 60.0000th=[ 3996], 70.0000th=[ 4036],
     | 80.0000th=[ 4036], 90.0000th=[ 4076], 95.0000th=[ 4116],
     | 99.0000th=[ 4196], 99.5000th=[ 5341], 99.9000th=[ 7590],
     | 99.9500th=[ 9357], 99.9900th=[32289]

QD=2, Batch=1
Maximum IOPS=487K
932: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3714],  5.0000th=[ 3755], 10.0000th=[ 3755],
     | 20.0000th=[ 3755], 30.0000th=[ 3795], 40.0000th=[ 3795],
     | 50.0000th=[ 3996], 60.0000th=[ 4036], 70.0000th=[ 4036],
     | 80.0000th=[ 4036], 90.0000th=[ 4076], 95.0000th=[ 4116],
     | 99.0000th=[ 4437], 99.5000th=[ 6224], 99.9000th=[ 7510],
     | 99.9500th=[ 9598], 99.9900th=[32289]

QD=4, Batch=1
Maximum IOPS=921K
955: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3393],  5.0000th=[ 3433], 10.0000th=[ 3514],
     | 20.0000th=[ 3594], 30.0000th=[ 3674], 40.0000th=[ 3795],
     | 50.0000th=[ 3875], 60.0000th=[ 3996], 70.0000th=[ 4036],
     | 80.0000th=[ 4156], 90.0000th=[ 4317], 95.0000th=[ 4678],
     | 99.0000th=[ 5120], 99.5000th=[ 5903], 99.9000th=[ 9116],
     | 99.9500th=[10522], 99.9900th=[32289]

QD=8, Batch=2
Maximum IOPS=1658K
981: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3313],  5.0000th=[ 3514], 10.0000th=[ 3594],
     | 20.0000th=[ 3714], 30.0000th=[ 3835], 40.0000th=[ 3996],
     | 50.0000th=[ 4116], 60.0000th=[ 4196], 70.0000th=[ 4397],
     | 80.0000th=[ 4598], 90.0000th=[ 4718], 95.0000th=[ 4919],
     | 99.0000th=[ 6385], 99.5000th=[ 6947], 99.9000th=[10000],
     | 99.9500th=[15180], 99.9900th=[32932]

QD=16, Batch=4
Maximum IOPS=2749K
1010: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 3955],  5.0000th=[ 4437], 10.0000th=[ 4558],
     | 20.0000th=[ 4759], 30.0000th=[ 4959], 40.0000th=[ 5120],
     | 50.0000th=[ 5261], 60.0000th=[ 5502], 70.0000th=[ 5743],
     | 80.0000th=[ 5903], 90.0000th=[ 6224], 95.0000th=[ 6626],
     | 99.0000th=[ 8313], 99.5000th=[ 9036], 99.9000th=[11967],
     | 99.9500th=[32289], 99.9900th=[34217]

QD=32, Batch=8
Maximum IOPS=3583K
1050: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 4879],  5.0000th=[ 5582], 10.0000th=[ 5903],
     | 20.0000th=[ 6224], 30.0000th=[ 6465], 40.0000th=[ 6626],
     | 50.0000th=[ 6787], 60.0000th=[ 7028], 70.0000th=[ 7349],
     | 80.0000th=[ 7911], 90.0000th=[ 8634], 95.0000th=[ 9196],
     | 99.0000th=[11164], 99.5000th=[11967], 99.9000th=[16305],
     | 99.9500th=[34217], 99.9900th=[37108]

QD=64, Batch=16
Maximum IOPS=3959K
1081: Latency percentiles:
    percentiles (nsec):
     |  1.0000th=[ 6546],  5.0000th=[ 7108], 10.0000th=[ 7429],
     | 20.0000th=[ 7992], 30.0000th=[ 8313], 40.0000th=[ 8955],
     | 50.0000th=[10000], 60.0000th=[11004], 70.0000th=[11646],
     | 80.0000th=[12128], 90.0000th=[12931], 95.0000th=[13735],
     | 99.0000th=[15984], 99.5000th=[16787], 99.9000th=[33253],
     | 99.9500th=[38072], 99.9900th=[41446]
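
To put the two sets side by side, a small sketch that computes the relative
change in peak IOPS per queue depth. The numbers are the "Maximum IOPS"
lines copied from the two result sets above (in thousands); the variable
and function names are illustrative only.

```python
# Peak IOPS in thousands, keyed by queue depth, copied from the results above.
baseline = {1: 244, 2: 483, 4: 907, 8: 1688, 16: 2613, 32: 3573, 64: 3953}
series = {1: 246, 2: 487, 4: 921, 8: 1658, 16: 2749, 32: 3583, 64: 3959}


def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Percentage change per queue depth: positive means the series is faster.
deltas = {qd: pct_change(baseline[qd], series[qd]) for qd in baseline}

for qd, delta in deltas.items():
    print(f"QD={qd:<2} {baseline[qd]:>5}K -> {series[qd]:>5}K  {delta:+.2f}%")
```

These are single runs, so small differences may well be within run-to-run
noise.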

-- 
Jens Axboe
