* [PATCH v2 0/8] optimise resheduling due to deferred tw
From: Pavel Begunkov @ 2023-04-06 13:20 UTC
  To: io-uring; +Cc: Jens Axboe, asml.silence, linux-kernel

io_uring uses task_work extensively, but when a task is waiting,
every newly queued task_work batch tries to wake it up, which
causes a lot of scheduling activity. This series optimises that.
For now it is applied only to rw completions and send-zc
notifications, and it will be helpful for further optimisations.

Quick testing shows results similar to v1. Numbers from v1:
for my zc net test, which once in a while waits for a portion of
buffers, I got a 10x decrease in the number of context switches and
a 2x improvement in CPU util (17% vs 8%). In profiles, io_cqring_work()
went down from 40-50% of CPU to ~13%.

There is also an improvement on the softirq side for io_uring
notifications, as io_req_local_work_add() doesn't trigger wake_up()
as often. System-wide profiles show a reduction in cycles taken
by io_req_local_work_add() from 3% to 0.5%, which is mostly not
reflected in the numbers above as it was running on a different
CPU.

v2: Remove atomic decrements on the queueing side and instead carry
    all the info in requests. It's definitely simpler and removes extra
    atomics; the downside is touching the previous request, which might
    not be cached.

    Add a couple of patches from backlog optimising and cleaning
    io_req_local_work_add().
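
To illustrate the v2 idea of carrying the count in the requests, here is
a minimal user-space sketch in C11. It is not the kernel code: tw_req,
tw_ctx, tw_add(), nr_tw, wait_nr and the wakeups counter are all
hypothetical names chosen for the example, and node lifetime and the
consumer side are ignored. Each queued item records its position in the
pending list by reading the count stored in the previous head, and only
the add that crosses the waiter's threshold reports a wakeup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: nr_tw is its 1-based position in the pending list. */
struct tw_req {
    struct tw_req *next;
    unsigned nr_tw;
};

/* Hypothetical context shared by producers and one waiter. */
struct tw_ctx {
    _Atomic(struct tw_req *) head; /* lock-free LIFO of pending items */
    atomic_uint wait_nr;           /* items the waiter needs; 0 = nobody waiting */
    atomic_uint wakeups;           /* stand-in for an actual task wake-up */
};

/* Queue req; returns true if this add decided to wake the waiter. */
static bool tw_add(struct tw_ctx *ctx, struct tw_req *req)
{
    struct tw_req *first = atomic_load(&ctx->head);
    unsigned nr_prev, nr_wait;

    assert(req != NULL);

    do {
        /*
         * Read the count from the previous head. This is the extra
         * touch of the previous request, which may miss the cache.
         */
        nr_prev = first ? first->nr_tw : 0;
        req->nr_tw = nr_prev + 1;
        req->next = first;
        /* On failure, first is reloaded with the current head. */
    } while (!atomic_compare_exchange_weak(&ctx->head, &first, req));

    nr_wait = atomic_load(&ctx->wait_nr);
    if (!nr_wait)
        return false;           /* nobody is waiting */
    if (req->nr_tw < nr_wait)
        return false;           /* not enough items queued yet */
    if (nr_prev >= nr_wait)
        return false;           /* an earlier add already crossed the threshold */

    atomic_fetch_add(&ctx->wakeups, 1);
    return true;
}
```

With wait_nr set to 3, the first two calls to tw_add() return false, the
third returns true, and later calls return false again, so the waiter is
woken once per batch rather than once per queued item.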

Pavel Begunkov (8):
  io_uring: move pinning out of io_req_local_work_add
  io_uring: optimie local tw add ctx pinning
  io_uring: refactor __io_cq_unlock_post_flush()
  io_uring: add tw add flags
  io_uring: inline llist_add()
  io_uring: reduce scheduling due to tw
  io_uring: refactor __io_cq_unlock_post_flush()
  io_uring: optimise io_req_local_work_add

 include/linux/io_uring_types.h |   3 +-
 io_uring/io_uring.c            | 131 ++++++++++++++++++++++-----------
 io_uring/io_uring.h            |  29 +++++---
 io_uring/notif.c               |   2 +-
 io_uring/notif.h               |   2 +-
 io_uring/rw.c                  |   2 +-
 6 files changed, 110 insertions(+), 59 deletions(-)

-- 
2.40.0



Thread overview: 11+ messages
2023-04-06 13:20 [PATCH v2 0/8] optimise resheduling due to deferred tw Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 1/8] io_uring: move pinning out of io_req_local_work_add Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 2/8] io_uring: optimie local tw add ctx pinning Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 3/8] io_uring: refactor __io_cq_unlock_post_flush() Pavel Begunkov
2023-04-06 14:23   ` Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 4/8] io_uring: add tw add flags Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 5/8] io_uring: inline llist_add() Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 6/8] io_uring: reduce scheduling due to tw Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 7/8] io_uring: refactor __io_cq_unlock_post_flush() Pavel Begunkov
2023-04-06 13:20 ` [PATCH v2 8/8] io_uring: optimise io_req_local_work_add Pavel Begunkov
2023-04-12  1:53 ` [PATCH v2 0/8] optimise resheduling due to deferred tw Jens Axboe
