[PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention)

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Max Kellermann <[email protected]>
To: [email protected], [email protected],
	[email protected], [email protected]
Cc: Max Kellermann <[email protected]>
Subject: [PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention)
Date: Tue, 28 Jan 2025 14:39:19 +0100	[thread overview]
Message-ID: <[email protected]> (raw)

While optimizing my io_uring-based web server, I found that the kernel
spends 35% of the CPU time waiting for `io_wq_acct.lock`.  This patch
set reduces contention of this lock, though I believe much more should
be done in order to allow more worker concurrency.

I measured these patches with my HTTP server (serving static files and
running a tiny PHP script) and with a micro-benchmark that submits
millions of `IORING_OP_NOP` entries (with `IOSQE_ASYNC` to force
offloading the operation to a worker, so this offload overhead can be
measured).

Some of the optimizations eliminate memory accesses, e.g. by passing
values that are already known to (inlined) functions and by caching
values in local variables.  These are useful optimizations, but they
are too small to measure them in a benchmark (too much noise).

Some of the patches have a measurable effect and they contain
benchmark numbers that I could reproduce in repeated runs, despite the
noise.

I'm not confident about the correctness of the last patch ("io_uring:
skip redundant poll wakeups").  This seemed like low-hanging fruit, so
low that it seemed suspicious to me.  If this is a useful
optimization, the idea could probably be ported to other wait_queue
users, or even into the wait_queue library.  What I'm not confident
about is whether the optimization is valid or whether it may miss
wakeups, leading to stalls.  Please advise!

Total "perf diff" for `IORING_OP_NOP`:

    42.25%     -9.24%  [kernel.kallsyms]     [k] queued_spin_lock_slowpath
     4.79%     +2.83%  [kernel.kallsyms]     [k] io_worker_handle_work
     7.23%     -1.41%  [kernel.kallsyms]     [k] io_wq_submit_work
     6.80%     +1.23%  [kernel.kallsyms]     [k] io_wq_free_work
     3.19%     +1.10%  [kernel.kallsyms]     [k] io_req_task_complete
     2.45%     +0.94%  [kernel.kallsyms]     [k] try_to_wake_up
               +0.81%  [kernel.kallsyms]     [k] io_acct_activate_free_worker
     0.79%     +0.64%  [kernel.kallsyms]     [k] __schedule

Serving static files with HTTP (send+receive on local+TCP,splice
file->pipe->TCP):

    42.92%     -7.84%  [kernel.kallsyms]     [k] queued_spin_lock_slowpath
     1.53%     -1.51%  [kernel.kallsyms]     [k] ep_poll_callback
     1.18%     +1.49%  [kernel.kallsyms]     [k] io_wq_free_work
     0.61%     +0.60%  [kernel.kallsyms]     [k] try_to_wake_up
     0.76%     -0.43%  [kernel.kallsyms]     [k] _raw_spin_lock_irqsave
     2.22%     -0.33%  [kernel.kallsyms]     [k] io_wq_submit_work

Running PHP script (send+receive on local+TCP, splice pipe->TCP):

    33.01%     -4.13%  [kernel.kallsyms]     [k] queued_spin_lock_slowpath
     1.57%     -1.56%  [kernel.kallsyms]     [k] ep_poll_callback
     1.36%     +1.19%  [kernel.kallsyms]     [k] io_wq_free_work
     0.94%     -0.61%  [kernel.kallsyms]     [k] _raw_spin_lock_irqsave
     2.56%     -0.36%  [kernel.kallsyms]     [k] io_wq_submit_work
     2.06%     +0.36%  [kernel.kallsyms]     [k] io_worker_handle_work
     1.00%     +0.35%  [kernel.kallsyms]     [k] try_to_wake_up

(The `IORING_OP_NOP` benchmark finishes after a hardcoded number of
operations; the two HTTP benchmarks finish after a certain wallclock
duration, and therefore more HTTP requests were handled.)

Max Kellermann (8):
  io_uring/io-wq: eliminate redundant io_work_get_acct() calls
  io_uring/io-wq: add io_worker.acct pointer
  io_uring/io-wq: move worker lists to struct io_wq_acct
  io_uring/io-wq: cache work->flags in variable
  io_uring/io-wq: do not use bogus hash value
  io_uring/io-wq: pass io_wq to io_get_next_work()
  io_uring: cache io_kiocb->flags in variable
  io_uring: skip redundant poll wakeups

 include/linux/io_uring_types.h |  10 ++
 io_uring/io-wq.c               | 230 +++++++++++++++++++--------------
 io_uring/io-wq.h               |   7 +-
 io_uring/io_uring.c            |  63 +++++----
 io_uring/io_uring.h            |   2 +-
 5 files changed, 187 insertions(+), 125 deletions(-)

-- 
2.45.2

next             reply	other threads:[~2025-01-28 13:39 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-28 13:39 Max Kellermann [this message]
2025-01-28 13:39 ` [PATCH 1/8] io_uring/io-wq: eliminate redundant io_work_get_acct() calls Max Kellermann
2025-01-28 13:39 ` [PATCH 2/8] io_uring/io-wq: add io_worker.acct pointer Max Kellermann
2025-01-28 13:39 ` [PATCH 3/8] io_uring/io-wq: move worker lists to struct io_wq_acct Max Kellermann
2025-01-28 13:39 ` [PATCH 4/8] io_uring/io-wq: cache work->flags in variable Max Kellermann
2025-01-29 18:57   ` Pavel Begunkov
2025-01-29 19:11     ` Max Kellermann
2025-01-29 23:41       ` Pavel Begunkov
2025-01-30  5:36         ` Max Kellermann
2025-01-30 14:57           ` Jens Axboe
2025-01-31 14:06             ` Pavel Begunkov
2025-01-30 14:54         ` Jens Axboe
2025-01-28 13:39 ` [PATCH 5/8] io_uring/io-wq: do not use bogus hash value Max Kellermann
2025-01-28 13:39 ` [PATCH 6/8] io_uring/io-wq: pass io_wq to io_get_next_work() Max Kellermann
2025-01-28 13:39 ` [PATCH 7/8] io_uring: cache io_kiocb->flags in variable Max Kellermann
2025-01-29 19:11   ` Pavel Begunkov
2025-01-28 13:39 ` [PATCH 8/8] io_uring: skip redundant poll wakeups Max Kellermann
2025-01-31 13:54   ` Pavel Begunkov
2025-01-31 17:16     ` Max Kellermann
2025-01-31 17:25       ` Pavel Begunkov
2025-01-29 17:18 ` [PATCH 0/8] Various io_uring micro-optimizations (reducing lock contention) Jens Axboe
2025-01-29 17:39   ` Max Kellermann
2025-01-29 17:45     ` Jens Axboe
2025-01-29 18:01       ` Max Kellermann
2025-01-31 16:13         ` Jens Axboe
2025-02-01 15:25           ` Jens Axboe
2025-02-01 15:30             ` Max Kellermann
2025-02-01 15:38               ` Jens Axboe
2025-01-29 19:30     ` Pavel Begunkov
2025-01-29 19:43       ` Max Kellermann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox