From: Pavel Begunkov <[email protected]>
To: [email protected]
Cc: Jens Axboe <[email protected]>, [email protected]
Subject: [RFC v2 10/13] io_uring: add lazy poll_wq activation
Date: Tue, 3 Jan 2023 03:04:01 +0000
Message-ID: <81e49bfc364b3b385fa405adf4065a41dcaf9141.1672713341.git.asml.silence@gmail.com>
In-Reply-To: <[email protected]>

Even though the waitqueue_active() check in io_poll_wq_wake() currently
reuses a barrier that we issue for another waitqueue, that won't be the
case in the future, so we want a fast path for it when the ring has
never been polled.

Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag
called ->poll_activated. The idea behind the flag is to set it when the
ring is polled for the first time. This requires additional
synchronisation to not miss events, which is done here by using
task_work for ->task_complete rings, and by enabling the flag by default
for all other types of rings.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/io_uring_types.h |  2 ++
 io_uring/io_uring.c            | 40 ++++++++++++++++++++++++++++++++++
 io_uring/io_uring.h            |  7 +++---
 3 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index cbcd3aaddd9d..1452ff745e5c 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -210,6 +210,7 @@ struct io_ring_ctx {
unsigned int syscall_iopoll: 1;
/* all CQEs should be posted only by the submitter task */
unsigned int task_complete: 1;
+ unsigned int poll_activated: 1;
} ____cacheline_aligned_in_smp;
/* submission data */
@@ -357,6 +358,7 @@ struct io_ring_ctx {
u32 iowq_limits[2];
bool iowq_limits_set;
+ struct callback_head poll_wq_task_work;
struct list_head defer_list;
unsigned sq_thread_idle;
/* protected by ->completion_lock */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 42f512c42099..d2a3d9928ba3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -573,6 +573,8 @@ static void io_eventfd_flush_signal(struct io_ring_ctx *ctx)
void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
{
+ if (ctx->poll_activated)
+ io_poll_wq_wake(ctx);
if (ctx->off_timeout_used)
io_flush_timeouts(ctx);
if (ctx->drain_active) {
@@ -2764,11 +2766,42 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
kfree(ctx);
}
+static __cold void io_lazy_activate_poll(struct callback_head *cb)
+{
+ struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
+ poll_wq_task_work);
+
+ mutex_lock(&ctx->uring_lock);
+ ctx->poll_activated = true;
+ mutex_unlock(&ctx->uring_lock);
+
+ /*
+ * Wake ups for some events between start of polling and activation
+ * might've been lost due to loose synchronisation.
+ */
+ io_poll_wq_wake(ctx);
+ percpu_ref_put(&ctx->refs);
+}
+
static __poll_t io_uring_poll(struct file *file, poll_table *wait)
{
struct io_ring_ctx *ctx = file->private_data;
__poll_t mask = 0;
+ if (unlikely(!ctx->poll_activated)) {
+ spin_lock(&ctx->completion_lock);
+ if (!ctx->poll_activated && !ctx->poll_wq_task_work.func &&
+ ctx->submitter_task) {
+ init_task_work(&ctx->poll_wq_task_work, io_lazy_activate_poll);
+ percpu_ref_get(&ctx->refs);
+
+ if (task_work_add(ctx->submitter_task,
+ &ctx->poll_wq_task_work, TWA_SIGNAL))
+ percpu_ref_put(&ctx->refs);
+ }
+ spin_unlock(&ctx->completion_lock);
+ }
+
poll_wait(file, &ctx->poll_wq, wait);
/*
* synchronizes with barrier from wq_has_sleeper call in
@@ -3575,6 +3608,13 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
!(ctx->flags & IORING_SETUP_SQPOLL))
ctx->task_complete = true;
+ /*
+ * Lazy poll_wq activation requires sync with all potential completors,
+ * ->task_complete guarantees a single completor
+ */
+ if (!ctx->task_complete)
+ ctx->poll_activated = true;
+
/*
* When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
* space applications don't need to do io completion events
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 645ace377d7e..e9819872c186 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -209,7 +209,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
{
- if (waitqueue_active(&ctx->poll_wq))
+ if (wq_has_sleeper(&ctx->poll_wq))
__wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
}
@@ -217,8 +217,6 @@ static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
/* requires smb_mb() prior, see wq_has_sleeper() */
static inline void __io_cqring_wake(struct io_ring_ctx *ctx)
{
- io_poll_wq_wake(ctx);
-
/*
* Trigger waitqueue handler on all waiters on our waitqueue. This
* won't necessarily wake up all the tasks, io_should_wake() will make
@@ -319,7 +317,8 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx)
{
- if (unlikely(ctx->off_timeout_used || ctx->drain_active || ctx->has_evfd))
+ if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
+ ctx->has_evfd || ctx->poll_activated))
__io_commit_cqring_flush(ctx);
}
--
2.38.1
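
For anyone following the synchronisation argument, a minimal user-space
sketch of the lazy activation scheme may help. It is only a
single-threaded model, not kernel code: every model_* name is
hypothetical, locking and reference counting are left out, and the
"task work" is just a pending flag that the submitter runs by hand. It
shows the three pieces the patch relies on: completions skip the poll
wake up until the flag is set, the first poll queues activation to the
submitter, and activation issues one catch-up wake for events that
raced with it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_ring {
	bool task_complete;	/* only the submitter posts completions */
	bool poll_activated;	/* completions must wake poll waiters */
	bool activation_queued;	/* deferred activation is pending */
	int poll_waiters;	/* tasks parked in poll() */
	int wakeups;		/* wake ups delivered to poll waiters */
};

static void model_ring_init(struct model_ring *r, bool task_complete)
{
	*r = (struct model_ring){ .task_complete = task_complete };
	/* With several completers there is no single task to sync with,
	 * so such rings start with the flag already set. */
	if (!task_complete)
		r->poll_activated = true;
}

static void model_poll_wake(struct model_ring *r)
{
	if (r->poll_waiters > 0)
		r->wakeups++;
}

/* Completion side: the fast path skips the wake up entirely. */
static void model_post_cqe(struct model_ring *r)
{
	if (r->poll_activated)
		model_poll_wake(r);
}

/* poll() side: the first poll queues activation, exactly once. */
static void model_poll(struct model_ring *r)
{
	if (!r->poll_activated && !r->activation_queued)
		r->activation_queued = true;
	r->poll_waiters++;
}

/* Runs in the submitter's context, standing in for the task_work. */
static void model_run_task_work(struct model_ring *r)
{
	if (!r->activation_queued)
		return;
	r->activation_queued = false;
	r->poll_activated = true;
	/* Events posted between poll() and activation were not
	 * signalled, so wake once to let the waiter recheck. */
	model_poll_wake(r);
}

static int model_demo(void)
{
	struct model_ring r;

	model_ring_init(&r, true);
	model_post_cqe(&r);		/* never polled: no wake attempted */
	assert(r.wakeups == 0);
	model_poll(&r);			/* queues the deferred activation */
	model_post_cqe(&r);		/* races with activation: no wake */
	assert(r.wakeups == 0);
	model_run_task_work(&r);	/* activate plus catch-up wake */
	assert(r.poll_activated && r.wakeups == 1);
	model_post_cqe(&r);		/* from now on every CQE wakes */
	return r.wakeups;
}
```

Calling model_demo() walks through the never-polled fast path, the
window where a wake up is skipped, and the catch-up wake that covers
it. A ring initialised with task_complete == false never enters that
window, which mirrors why the patch sets the flag up front for those
rings.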