From: Jens Axboe <[email protected]>
To: io-uring <[email protected]>,
"[email protected]" <[email protected]>
Subject: io_uring and Optane2
Date: Fri, 21 Aug 2020 09:58:57 -0600
Message-ID: <[email protected]>
Hi,
One of the key elements for eking out the very last bit of performance
with io_uring is being able to test your design and improvements. I had
a bit of help on that front since Intel got me some Gen2 Optane SSD
samples a while back, and I've been using those to guide improvements -
and vice versa, to see which changes end up being detrimental to
latencies or scalability. I haven't been able to share any numbers on
that until now. So without further ado, here's some insight into what is
possible with io_uring, and the Linux IO stack, today.
Test setup:
Kernel: 5.9.0-rc1
System: Intel Ice Lake-SP Next-Gen Xeon (https://www.servethehome.com/intel-ice-lake-sp-next-gen-xeon-architecture-at-hc32/)
Storage device: Single Gen2 Optane SSD (https://blocksandfiles.com/2020/08/14/intel-gen-2-optane-details/)
Benchmark: t/io_uring from fio
Workload: Single thread random 512b O_DIRECT reads
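For reference, a queue depth 128 run is kicked off roughly like the
below - the exact switches are approximate and can differ between fio
versions (check t/io_uring itself for the current options), and the
device path is just an example:

  t/io_uring -d128 -s32 -c32 -b512 -p1 -B1 -F1 /dev/nvme0n1

where -d is the queue depth, -s/-c the submission/completion batch
sizes, -b the block size, and -p/-B/-F turn on polled IO, registered
(fixed) buffers, and registered files.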
Note that this is utilizing a single core in the system, out of the many
available. t/io_uring is used for low-overhead IO generation, and we're
using polled IO with io_uring, along with registered buffers and
registered files. 512b IOs are used to keep us well below the bandwidth
ceiling.
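To make that concrete, here's a minimal liburing sketch of the features
the test exercises - IOPOLL, a registered (fixed) buffer, and a
registered file - doing a single 512b O_DIRECT read. It's illustrative
only (example device path, single IO, minimal error handling), not the
t/io_uring tool itself, which keeps many IOs in flight and batches
submissions and completions:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <liburing.h>

#define BS 512

int main(void)
{
	struct io_uring ring;
	struct io_uring_params p = { 0 };
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	struct iovec iov;
	void *buf;
	int fd, ret;

	/* Completions are reaped by polling the device, not via interrupts */
	p.flags = IORING_SETUP_IOPOLL;
	if (io_uring_queue_init_params(128, &ring, &p) < 0)
		return 1;

	/* O_DIRECT read from the raw device, path is just an example */
	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0)
		return 1;

	/* Register the file and one aligned 512b buffer up front, so the
	 * per-IO fget/fput and buffer pinning costs are paid only once */
	if (io_uring_register_files(&ring, &fd, 1) < 0)
		return 1;
	if (posix_memalign(&buf, 4096, BS))
		return 1;
	iov.iov_base = buf;
	iov.iov_len = BS;
	if (io_uring_register_buffers(&ring, &iov, 1) < 0)
		return 1;

	/* One 512b read at offset 0, fixed buffer 0, fixed file index 0 */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read_fixed(sqe, 0, buf, BS, 0, 0);
	sqe->flags |= IOSQE_FIXED_FILE;
	io_uring_submit(&ring);

	/* With IOPOLL set, waiting spins polling for the completion */
	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret == 0) {
		if (cqe->res != BS)
			fprintf(stderr, "read failed: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}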
Throughput is easy, IOPS and latency are harder. My goal here was to
demonstrate what is possible today with io_uring in terms of efficiency.
Results
-----------------------------------------------------------
QD128 : 2.58M IOPS per core (34.9 usec avg latency)
QD16 : 2.06M IOPS per core ( 6.9 usec avg latency)
QD1 : 290K IOPS per core ( 3.4 usec avg latency)
Outside of showing what's possible with io_uring today, these results
are also a testament to the general Linux IO stack efficiency. The
introduction of blk-mq was as much about general efficiency as it was
about scalability. That was a design criterion for both blk-mq and
io_uring from day 1. Even if you aren't driving millions of IOPS, or
using tons of threads/cores, you still care about getting your work done
in the shortest amount of time, wasting as few cycles as possible.
More to come...
--
Jens Axboe