From: Jens Axboe <[email protected]>
To: [email protected]
Cc: [email protected]
Subject: [PATCHSET v3 0/9] Improve MSG_RING DEFER_TASKRUN performance
Date: Wed,  5 Jun 2024 07:51:08 -0600
Message-ID: <[email protected]> (raw)

Hi,

For v1, the replies to it, and tons of perf measurements, go here:

https://lore.kernel.org/io-uring/[email protected]/T/#m12f44c0a9ee40a59b0dcc226e22a0d031903aa73

and find v2 here:

https://lore.kernel.org/io-uring/[email protected]/

and you can find the git tree here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-msg_ring

Patches are based on top of current Linus -git, with the 6.10 and 6.11
pending io_uring changes pulled in.

tldr is that this series greatly improves the latency, overhead, and
throughput of sending messages to other rings. It does so by using the
CQE overflow framework rather than attempting to lock remote rings,
which can potentially cause spurious -EAGAIN and io-wq usage. Outside
of that, it also unifies how message posting is done, ending up with a
single method across target ring types.
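
For reference, posting a message to another ring goes through the
public io_uring_prep_msg_ring() liburing helper. Here's a minimal
sketch (not from this series; ring setup and most error handling are
omitted, and the setup flags mentioned in the comment are the
DEFER_TASKRUN target case these patches optimize):

#include <liburing.h>

/*
 * Post a message into another ring: 'len' ends up in the target CQE's
 * res field and 'data' in its user_data field. A target ring created
 * with IORING_SETUP_DEFER_TASKRUN | IORING_SETUP_SINGLE_ISSUER is the
 * case this series speeds up.
 */
static int send_msg(struct io_uring *src, int target_ring_fd, __u64 data)
{
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	sqe = io_uring_get_sqe(src);
	if (!sqe)
		return -EBUSY;

	io_uring_prep_msg_ring(sqe, target_ring_fd, 0, data, 0);
	io_uring_submit(src);

	/* sender-side completion tells us whether the post succeeded */
	ret = io_uring_wait_cqe(src, &cqe);
	if (ret)
		return ret;
	ret = cqe->res;
	io_uring_cqe_seen(src, cqe);
	return ret;
}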

Some select performance results:

Sender using 10 usec delay, sending ~100K messages per second:

Pre-patches:

Latencies for: Sender (msg=131950)
    percentiles (nsec):
     |  1.0000th=[ 1896],  5.0000th=[ 2064], 10.0000th=[ 2096],
     | 20.0000th=[ 2192], 30.0000th=[ 2352], 40.0000th=[ 2480],
     | 50.0000th=[ 2544], 60.0000th=[ 2608], 70.0000th=[ 2896],
     | 80.0000th=[ 2992], 90.0000th=[ 3376], 95.0000th=[ 3472],
     | 99.0000th=[ 3568], 99.5000th=[ 3728], 99.9000th=[ 6880],
     | 99.9500th=[14656], 99.9900th=[42752]
Latencies for: Receiver (msg=131950)
    percentiles (nsec):
     |  1.0000th=[ 1160],  5.0000th=[ 1288], 10.0000th=[ 1336],
     | 20.0000th=[ 1384], 30.0000th=[ 1448], 40.0000th=[ 1624],
     | 50.0000th=[ 1688], 60.0000th=[ 1736], 70.0000th=[ 1768],
     | 80.0000th=[ 1848], 90.0000th=[ 2256], 95.0000th=[ 2320],
     | 99.0000th=[ 2416], 99.5000th=[ 2480], 99.9000th=[ 3184],
     | 99.9500th=[14400], 99.9900th=[18304]
Expected messages: 299882

and with the patches:

Latencies for: Sender (msg=247931)
    percentiles (nsec):
     |  1.0000th=[  181],  5.0000th=[  191], 10.0000th=[  201],
     | 20.0000th=[  211], 30.0000th=[  231], 40.0000th=[  262],
     | 50.0000th=[  290], 60.0000th=[  322], 70.0000th=[  390],
     | 80.0000th=[  482], 90.0000th=[  748], 95.0000th=[  892],
     | 99.0000th=[ 1032], 99.5000th=[ 1096], 99.9000th=[ 1336],
     | 99.9500th=[ 1512], 99.9900th=[ 1992]
Latencies for: Receiver (msg=247931)
    percentiles (nsec):
     |  1.0000th=[  350],  5.0000th=[  382], 10.0000th=[  410],
     | 20.0000th=[  482], 30.0000th=[  572], 40.0000th=[  652],
     | 50.0000th=[  764], 60.0000th=[  860], 70.0000th=[ 1080],
     | 80.0000th=[ 1480], 90.0000th=[ 1768], 95.0000th=[ 1896],
     | 99.0000th=[ 2448], 99.5000th=[ 2576], 99.9000th=[ 3184],
     | 99.9500th=[ 3792], 99.9900th=[17280]
Expected messages: 299926

which is an ~8.7x improvement in 50th percentile latency for the sender,
~3.5x at the 99th percentile, and a ~2.2x receiver side improvement at
the 50th percentile. The higher receiver percentiles are pretty
similar, but note that this is accomplished while throughput is almost
twice what it was before (~248K messages over 3 seconds vs ~132K
before).

Using a 20 usec message delay and targeting 50K messages per second,
the latency picture is close to the same as above. However, pre-patches
we get ~110K messages and with the patches we get ~142K messages.
Pre-patches that is ~37% off the target rate; with the patches we're
within 5% of the target.

One interesting use case for message passing is sending work items
between rings. For example, you can have a ring that accepts connections
and then passes them to worker threads that each have their own ring. Or
you can have threads that receive data and need to pass a work item to
another thread for processing. Normally that would be done with some
kind of serialized queue, paired with a remote wakeup on the other end,
e.g. an eventfd monitored with epoll. That isn't very efficient. With
message passing, you can simply hand over the work item rather than
having to manage both a queue and a wakeup mechanism in userspace.
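
As a hypothetical sketch of the receiving side (struct work_item and
handle_item() are made up for illustration): the worker just waits on
its own ring, and each MSG_RING completion carries the work item
pointer in user_data, so no userspace queue or eventfd is needed.

#include <stdint.h>
#include <liburing.h>

struct work_item {
	int fd;		/* e.g. an accepted connection */
};

static void handle_item(struct work_item *w)
{
	/* do the actual processing; stubbed out here */
	(void)w;
}

/* Worker thread: its ring receives work items posted via MSG_RING. */
static void worker_loop(struct io_uring *ring)
{
	struct io_uring_cqe *cqe;
	struct work_item *w;

	for (;;) {
		if (io_uring_wait_cqe(ring, &cqe))
			break;
		/* the sender passed the work_item pointer as message data */
		w = (struct work_item *)(uintptr_t)cqe->user_data;
		handle_item(w);
		io_uring_cqe_seen(ring, cqe);
	}
}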

 include/linux/io_uring_types.h |   8 ++
 io_uring/io_uring.c            |  33 ++----
 io_uring/io_uring.h            |  44 +++++++
 io_uring/msg_ring.c            | 211 +++++++++++++++++----------------
 io_uring/msg_ring.h            |   3 +
 5 files changed, 176 insertions(+), 123 deletions(-)

Changes since v2:
- Add wakeup batching for MSG_RING with DEFER_TASKRUN by refactoring
  the helpers that we use for local task_work.
- Drop the patch splitting fd installing into a separate helper, as it
  gets removed at the end anyway when the old MSG_RING posting code
  goes away.
- Little cleanups

-- 
Jens Axboe


Thread overview: 12+ messages
2024-06-05 13:51 Jens Axboe [this message]
2024-06-05 13:51 ` [PATCH 1/9] io_uring/msg_ring: tighten requirement for remote posting Jens Axboe
2024-06-05 13:51 ` [PATCH 2/9] io_uring: keep track of overflow entry count Jens Axboe
2024-06-05 13:51 ` [PATCH 3/9] io_uring: abstract out helpers for DEFER_TASKRUN wakeup batching Jens Axboe
2024-06-05 13:51 ` [PATCH 4/9] io_uring/msg_ring: avoid double indirection task_work for data messages Jens Axboe
2024-06-05 13:51 ` [PATCH 5/9] io_uring/msg_ring: avoid double indirection task_work for fd passing Jens Axboe
2024-06-05 13:51 ` [PATCH 6/9] io_uring/msg_ring: add an alloc cache for CQE entries Jens Axboe
2024-06-05 13:51 ` [PATCH 7/9] io_uring/msg_ring: remove callback_head from struct io_msg Jens Axboe
2024-06-05 13:51 ` [PATCH 8/9] io_uring/msg_ring: add basic wakeup batch support Jens Axboe
2024-06-05 15:32   ` Pavel Begunkov
2024-06-05 15:50     ` Jens Axboe
2024-06-05 13:51 ` [PATCH 9/9] io_uring/msg_ring: remove non-remote message passing Jens Axboe
