* [PATCH v3 01/11] io_uring: move submitter_task out of cold cacheline
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
->submitter_task is used more frequently now than before, i.e. for local
tw enqueue and run, so let's move it from the end of ctx, which is full
of cold data, to the first cacheline, which holds mostly constants.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 128a67a40065..8dfb6c4a35d9 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -195,11 +195,7 @@ struct io_alloc_cache {
struct io_ring_ctx {
/* const or read-mostly hot data */
struct {
- struct percpu_ref refs;
-
- struct io_rings *rings;
unsigned int flags;
- enum task_work_notify_mode notify_method;
unsigned int compat: 1;
unsigned int drain_next: 1;
unsigned int restricted: 1;
@@ -210,6 +206,11 @@ struct io_ring_ctx {
unsigned int syscall_iopoll: 1;
/* all CQEs should be posted only by the submitter task */
unsigned int task_complete: 1;
+
+ enum task_work_notify_mode notify_method;
+ struct io_rings *rings;
+ struct task_struct *submitter_task;
+ struct percpu_ref refs;
} ____cacheline_aligned_in_smp;
/* submission data */
@@ -320,7 +321,6 @@ struct io_ring_ctx {
/* Keep this last, we don't need it for the fast path */
struct io_restriction restrictions;
- struct task_struct *submitter_task;
/* slow path rsrc auxilary data, used by update/register */
struct io_rsrc_node *rsrc_backup_node;
--
2.38.1
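One way to double-check a layout change like this during development is
pahole (pahole -C io_ring_ctx vmlinux); a compile-time variant, purely
illustrative, not part of the patch, and assuming the usual 64-byte
cacheline size, could look like:

	/* hypothetical development-time check, not in this patch */
	static_assert(offsetof(struct io_ring_ctx, submitter_task) +
		      sizeof(struct task_struct *) <= SMP_CACHE_BYTES);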
* [PATCH v3 02/11] io_uring: refactor io_wake_function
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Remove the local ctx variable in io_wake_function(); we don't need to
load iowq->ctx when io_should_wake() already triggers a wake up.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 89dd2827a977..157e6ef6da7c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2441,15 +2441,13 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
int wake_flags, void *key)
{
- struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue,
- wq);
- struct io_ring_ctx *ctx = iowq->ctx;
+ struct io_wait_queue *iowq = container_of(curr, struct io_wait_queue, wq);
/*
* Cannot safely flush overflowed CQEs from here, ensure we wake up
* the task, and the next invocation will do it.
*/
- if (io_should_wake(iowq) || io_has_work(ctx))
+ if (io_should_wake(iowq) || io_has_work(iowq->ctx))
return autoremove_wake_function(curr, mode, wake_flags, key);
return -1;
}
--
2.38.1
* [PATCH v3 03/11] io_uring: don't set TASK_RUNNING in local tw runner
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
The CQ waiting loop sets TASK_RUNNING before trying to execute
task_work, so there is no need to repeat it in io_run_local_work().
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 157e6ef6da7c..961ebc031992 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1339,11 +1339,9 @@ int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
int io_run_local_work(struct io_ring_ctx *ctx)
{
- bool locked;
+ bool locked = mutex_trylock(&ctx->uring_lock);
int ret;
- __set_current_state(TASK_RUNNING);
- locked = mutex_trylock(&ctx->uring_lock);
ret = __io_run_local_work(ctx, &locked);
if (locked)
mutex_unlock(&ctx->uring_lock);
@@ -2455,6 +2453,7 @@ static int io_wake_function(struct wait_queue_entry *curr, unsigned int mode,
int io_run_task_work_sig(struct io_ring_ctx *ctx)
{
if (!llist_empty(&ctx->work_llist)) {
+ __set_current_state(TASK_RUNNING);
if (io_run_local_work(ctx) > 0)
return 1;
}
--
2.38.1
* [PATCH v3 04/11] io_uring: mark io_run_local_work static
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
io_run_local_work() is only used within io_uring.c, so we don't need to
export it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 2 +-
io_uring/io_uring.h | 1 -
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 961ebc031992..5b62386413d9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1337,7 +1337,7 @@ int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
}
-int io_run_local_work(struct io_ring_ctx *ctx)
+static int io_run_local_work(struct io_ring_ctx *ctx)
{
bool locked = mutex_trylock(&ctx->uring_lock);
int ret;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 011932935c36..78896d1f7916 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -29,7 +29,6 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow);
bool io_req_cqe_overflow(struct io_kiocb *req);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked);
-int io_run_local_work(struct io_ring_ctx *ctx);
void io_req_defer_failed(struct io_kiocb *req, s32 res);
void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
--
2.38.1
* [PATCH v3 05/11] io_uring: move io_run_local_work_locked
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
io_run_local_work_locked() is only used in io_uring.c, so move it there.
With that we can also make __io_run_local_work() static.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 18 +++++++++++++++++-
io_uring/io_uring.h | 17 -----------------
2 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5b62386413d9..c251acf0964e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1296,7 +1296,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
}
}
-int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
+static int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
{
struct llist_node *node;
struct llist_node fake;
@@ -1337,6 +1337,22 @@ int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
}
+static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
+{
+ bool locked;
+ int ret;
+
+ if (llist_empty(&ctx->work_llist))
+ return 0;
+
+ locked = true;
+ ret = __io_run_local_work(ctx, &locked);
+ /* shouldn't happen! */
+ if (WARN_ON_ONCE(!locked))
+ mutex_lock(&ctx->uring_lock);
+ return ret;
+}
+
static int io_run_local_work(struct io_ring_ctx *ctx)
{
bool locked = mutex_trylock(&ctx->uring_lock);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 78896d1f7916..b5975e353aa1 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -28,7 +28,6 @@ enum {
struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx, bool overflow);
bool io_req_cqe_overflow(struct io_kiocb *req);
int io_run_task_work_sig(struct io_ring_ctx *ctx);
-int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked);
void io_req_defer_failed(struct io_kiocb *req, s32 res);
void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags);
bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
@@ -283,22 +282,6 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
return task_work_pending(current) || !wq_list_empty(&ctx->work_llist);
}
-static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
-{
- bool locked;
- int ret;
-
- if (llist_empty(&ctx->work_llist))
- return 0;
-
- locked = true;
- ret = __io_run_local_work(ctx, &locked);
- /* shouldn't happen! */
- if (WARN_ON_ONCE(!locked))
- mutex_lock(&ctx->uring_lock);
- return ret;
-}
-
static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
{
if (!*locked) {
--
2.38.1
* [PATCH v3 06/11] io_uring: separate wq for ring polling
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Don't use ->cq_wait for ring polling; add a separate wait queue for it
instead. We'll need it in the following patches.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 2 +-
io_uring/io_uring.c | 3 ++-
io_uring/io_uring.h | 9 +++++++++
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 8dfb6c4a35d9..0d94ee191c15 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -319,7 +319,7 @@ struct io_ring_ctx {
} ____cacheline_aligned_in_smp;
/* Keep this last, we don't need it for the fast path */
-
+ struct wait_queue_head poll_wq;
struct io_restriction restrictions;
/* slow path rsrc auxilary data, used by update/register */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c251acf0964e..8d19b0812f30 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -316,6 +316,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
xa_init_flags(&ctx->personalities, XA_FLAGS_ALLOC1);
mutex_init(&ctx->uring_lock);
init_waitqueue_head(&ctx->cq_wait);
+ init_waitqueue_head(&ctx->poll_wq);
spin_lock_init(&ctx->completion_lock);
spin_lock_init(&ctx->timeout_lock);
INIT_WQ_LIST(&ctx->iopoll_list);
@@ -2788,7 +2789,7 @@ static __poll_t io_uring_poll(struct file *file, poll_table *wait)
struct io_ring_ctx *ctx = file->private_data;
__poll_t mask = 0;
- poll_wait(file, &ctx->cq_wait, wait);
+ poll_wait(file, &ctx->poll_wq, wait);
/*
* synchronizes with barrier from wq_has_sleeper call in
* io_commit_cqring
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index b5975e353aa1..c75bbb94703c 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -220,9 +220,18 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
smp_store_release(&ctx->rings->cq.tail, ctx->cached_cq_tail);
}
+static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
+{
+ if (waitqueue_active(&ctx->poll_wq))
+ __wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
+ poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
+}
+
/* requires smb_mb() prior, see wq_has_sleeper() */
static inline void __io_cqring_wake(struct io_ring_ctx *ctx)
{
+ io_poll_wq_wake(ctx);
+
/*
* Trigger waitqueue handler on all waiters on our waitqueue. This
* won't necessarily wake up all the tasks, io_should_wake() will make
--
2.38.1
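For context, "ring polling" here means waiting for the ring file
descriptor itself to become readable, e.g. through epoll. A minimal
userspace sketch of that usage (assuming liburing; error handling
omitted for brevity):

	#include <liburing.h>
	#include <sys/epoll.h>

	/* Block until the ring fd signals readiness, i.e. CQEs are available. */
	static int wait_on_ring_fd(struct io_uring *ring)
	{
		struct epoll_event ev = { .events = EPOLLIN };
		struct epoll_event out;
		int epfd = epoll_create1(0);

		epoll_ctl(epfd, EPOLL_CTL_ADD, ring->ring_fd, &ev);
		return epoll_wait(epfd, &out, 1, -1); /* woken via ctx->poll_wq */
	}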
* [PATCH v3 07/11] io_uring: add lazy poll_wq activation
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Even though io_poll_wq_wake()'s waitqueue_active check reuses a barrier
we do for another waitqueue, that won't be the case in the future, so we
want a fast path for rings that have never been polled.
Move poll_wq wake ups into __io_commit_cqring_flush() using a new flag
called ->poll_activated. The idea behind the flag is to set it when the
ring is polled for the first time. This requires additional
synchronisation to not miss events; it is done here by using task_work
for ->task_complete rings, and by enabling the flag by default for all
other types of rings.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 2 ++
io_uring/io_uring.c | 60 +++++++++++++++++++++++++++++++++-
io_uring/io_uring.h | 7 ++--
3 files changed, 64 insertions(+), 5 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 0d94ee191c15..7b5e90520278 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -206,6 +206,7 @@ struct io_ring_ctx {
unsigned int syscall_iopoll: 1;
/* all CQEs should be posted only by the submitter task */
unsigned int task_complete: 1;
+ unsigned int poll_activated: 1;
enum task_work_notify_mode notify_method;
struct io_rings *rings;
@@ -357,6 +358,7 @@ struct io_ring_ctx {
u32 iowq_limits[2];
bool iowq_limits_set;
+ struct callback_head poll_wq_task_work;
struct list_head defer_list;
unsigned sq_thread_idle;
/* protected by ->completion_lock */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8d19b0812f30..a6ed022c1356 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -573,6 +573,8 @@ static void io_eventfd_flush_signal(struct io_ring_ctx *ctx)
void __io_commit_cqring_flush(struct io_ring_ctx *ctx)
{
+ if (ctx->poll_activated)
+ io_poll_wq_wake(ctx);
if (ctx->off_timeout_used)
io_flush_timeouts(ctx);
if (ctx->drain_active) {
@@ -2784,11 +2786,53 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
kfree(ctx);
}
+static __cold void io_activate_pollwq_cb(struct callback_head *cb)
+{
+ struct io_ring_ctx *ctx = container_of(cb, struct io_ring_ctx,
+ poll_wq_task_work);
+
+ mutex_lock(&ctx->uring_lock);
+ ctx->poll_activated = true;
+ mutex_unlock(&ctx->uring_lock);
+
+ /*
+ * Wake ups for some events between start of polling and activation
+ * might've been lost due to loose synchronisation.
+ */
+ wake_up_all(&ctx->poll_wq);
+ percpu_ref_put(&ctx->refs);
+}
+
+static __cold void io_activate_pollwq(struct io_ring_ctx *ctx)
+{
+ spin_lock(&ctx->completion_lock);
+ /* already activated or in progress */
+ if (ctx->poll_activated || ctx->poll_wq_task_work.func)
+ goto out;
+ if (WARN_ON_ONCE(!ctx->task_complete))
+ goto out;
+ if (!ctx->submitter_task)
+ goto out;
+ /*
+ * with ->submitter_task only the submitter task completes requests, we
+ * only need to sync with it, which is done by injecting a tw
+ */
+ init_task_work(&ctx->poll_wq_task_work, io_activate_pollwq_cb);
+ percpu_ref_get(&ctx->refs);
+ if (task_work_add(ctx->submitter_task, &ctx->poll_wq_task_work, TWA_SIGNAL))
+ percpu_ref_put(&ctx->refs);
+out:
+ spin_unlock(&ctx->completion_lock);
+}
+
static __poll_t io_uring_poll(struct file *file, poll_table *wait)
{
struct io_ring_ctx *ctx = file->private_data;
__poll_t mask = 0;
+ if (unlikely(!ctx->poll_activated))
+ io_activate_pollwq(ctx);
+
poll_wait(file, &ctx->poll_wq, wait);
/*
* synchronizes with barrier from wq_has_sleeper call in
@@ -3595,6 +3639,13 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
!(ctx->flags & IORING_SETUP_SQPOLL))
ctx->task_complete = true;
+ /*
+ * lazy poll_wq activation relies on ->task_complete for synchronisation
+ * purposes, see io_activate_pollwq()
+ */
+ if (!ctx->task_complete)
+ ctx->poll_activated = true;
+
/*
* When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, user
* space applications don't need to do io completion events
@@ -3888,8 +3939,15 @@ static int io_register_enable_rings(struct io_ring_ctx *ctx)
if (!(ctx->flags & IORING_SETUP_R_DISABLED))
return -EBADFD;
- if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && !ctx->submitter_task)
+ if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && !ctx->submitter_task) {
ctx->submitter_task = get_task_struct(current);
+ /*
+ * Lazy activation attempts would fail if it was polled before
+ * submitter_task is set.
+ */
+ if (wq_has_sleeper(&ctx->poll_wq))
+ io_activate_pollwq(ctx);
+ }
if (ctx->restrictions.registered)
ctx->restricted = 1;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index c75bbb94703c..5113e0ddb01d 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -222,7 +222,7 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
{
- if (waitqueue_active(&ctx->poll_wq))
+ if (wq_has_sleeper(&ctx->poll_wq))
__wake_up(&ctx->poll_wq, TASK_NORMAL, 0,
poll_to_key(EPOLL_URING_WAKE | EPOLLIN));
}
@@ -230,8 +230,6 @@ static inline void io_poll_wq_wake(struct io_ring_ctx *ctx)
/* requires smb_mb() prior, see wq_has_sleeper() */
static inline void __io_cqring_wake(struct io_ring_ctx *ctx)
{
- io_poll_wq_wake(ctx);
-
/*
* Trigger waitqueue handler on all waiters on our waitqueue. This
* won't necessarily wake up all the tasks, io_should_wake() will make
@@ -316,7 +314,8 @@ static inline void io_req_complete_defer(struct io_kiocb *req)
static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx)
{
- if (unlikely(ctx->off_timeout_used || ctx->drain_active || ctx->has_evfd))
+ if (unlikely(ctx->off_timeout_used || ctx->drain_active ||
+ ctx->has_evfd || ctx->poll_activated))
__io_commit_cqring_flush(ctx);
}
--
2.38.1
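As a reading aid, the window the lazy activation has to close looks
roughly like this (schematic summary, not text from the patch):

	/*
	 * poller (io_uring_poll)          submitter task (posts CQEs)
	 * ----------------------          ---------------------------
	 * poll_wait(&ctx->poll_wq)
	 *                                 posts CQE
	 *                                 poll_activated == false -> no wake
	 *
	 * io_activate_pollwq() injects task_work into the submitter task;
	 * with ->task_complete only that task posts CQEs, so once the
	 * callback has set poll_activated, later completions go through
	 * io_poll_wq_wake(), and anything posted earlier is covered by the
	 * callback's wake_up_all(&ctx->poll_wq). Rings without
	 * ->task_complete simply start with the flag set.
	 */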
* [PATCH v3 08/11] io_uring: wake up optimisations
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Completion flushing is done either from the submit syscall or by
task_work; both run in the context of the submitter task. For
single-threaded rings, as implied by ->task_complete, there won't be any
waiters on ->cq_wait other than the master task. That means no task can
be sleeping on cq_wait while we run __io_submit_flush_completions(), so
waking it up can be skipped.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a6ed022c1356..0dabd0f3271f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -621,6 +621,25 @@ static inline void __io_cq_unlock_post(struct io_ring_ctx *ctx)
io_cqring_wake(ctx);
}
+static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
+ __releases(ctx->completion_lock)
+{
+ io_commit_cqring(ctx);
+ __io_cq_unlock(ctx);
+ io_commit_cqring_flush(ctx);
+
+ /*
+ * As ->task_complete implies that the ring is single tasked, cq_wait
+ * may only be waited on by the current in io_cqring_wait(), but since
+ * it will re-check the wakeup conditions once we return we can safely
+ * skip waking it up.
+ */
+ if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
+ smp_mb();
+ __io_cqring_wake(ctx);
+ }
+}
+
void io_cq_unlock_post(struct io_ring_ctx *ctx)
__releases(ctx->completion_lock)
{
@@ -1480,7 +1499,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
}
}
}
- __io_cq_unlock_post(ctx);
+ __io_cq_unlock_post_flush(ctx);
if (!wq_list_empty(&ctx->submit_state.compl_reqs)) {
io_free_batch_list(ctx, state->compl_reqs.first);
--
2.38.1
* [PATCH v3 09/11] io_uring: waitqueue-less cq waiting
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
With DEFER_TASKRUN only ctx->submitter_task may be waiting for CQEs; we
can use this to optimise io_cqring_wait(). Replace the ->cq_wait
waitqueue with waking the task directly.
It works, but misses an important optimisation covered by the following
patch, so this patch without the follow-ups might hurt performance.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 0dabd0f3271f..62d879b14873 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1263,7 +1263,7 @@ static void io_req_local_work_add(struct io_kiocb *req)
percpu_ref_put(&ctx->refs);
return;
}
- /* need it for the following io_cqring_wake() */
+ /* needed for the following wake up */
smp_mb__after_atomic();
if (unlikely(atomic_read(&req->task->io_uring->in_idle))) {
@@ -1274,10 +1274,9 @@ static void io_req_local_work_add(struct io_kiocb *req)
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
-
if (ctx->has_evfd)
io_eventfd_signal(ctx);
- __io_cqring_wake(ctx);
+ wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
percpu_ref_put(&ctx->refs);
}
@@ -2578,12 +2577,17 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
do {
unsigned long check_cq;
- prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
- TASK_INTERRUPTIBLE);
+ if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ } else {
+ prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
+ TASK_INTERRUPTIBLE);
+ }
+
ret = io_cqring_wait_schedule(ctx, &iowq);
+ __set_current_state(TASK_RUNNING);
if (ret < 0)
break;
- __set_current_state(TASK_RUNNING);
/*
* Run task_work after scheduling and before io_should_wake().
* If we got woken because of task_work being processed, run it
@@ -2611,7 +2615,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
cond_resched();
} while (1);
- finish_wait(&ctx->cq_wait, &iowq.wq);
+ if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+ finish_wait(&ctx->cq_wait, &iowq.wq);
restore_saved_sigmask_unless(ret == -EINTR);
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
--
2.38.1
* [PATCH v3 10/11] io_uring: add io_req_local_work_add wake fast path
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Don't wake the master task after queueing a deferred tw unless it's
currently waiting in io_cqring_wait.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 1 +
io_uring/io_uring.c | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 7b5e90520278..cc0cf0705b8f 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -295,6 +295,7 @@ struct io_ring_ctx {
spinlock_t completion_lock;
bool poll_multi_queue;
+ bool cq_waiting;
/*
* ->iopoll_list is protected by the ctx->uring_lock for
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 62d879b14873..9c95ceb1a9f2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1276,7 +1276,9 @@ static void io_req_local_work_add(struct io_kiocb *req)
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
if (ctx->has_evfd)
io_eventfd_signal(ctx);
- wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
+
+ if (READ_ONCE(ctx->cq_waiting))
+ wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
percpu_ref_put(&ctx->refs);
}
@@ -2578,6 +2580,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
unsigned long check_cq;
if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
+ WRITE_ONCE(ctx->cq_waiting, 1);
set_current_state(TASK_INTERRUPTIBLE);
} else {
prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
@@ -2586,6 +2589,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
ret = io_cqring_wait_schedule(ctx, &iowq);
__set_current_state(TASK_RUNNING);
+ WRITE_ONCE(ctx->cq_waiting, 0);
+
if (ret < 0)
break;
/*
--
2.38.1
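Taking patches 9 and 10 together, the waiter/waker pairing for
DEFER_TASKRUN rings condenses to roughly the following (excerpted and
simplified from the diffs above, not compilable on its own):

	/* waiter side, io_cqring_wait() */
	WRITE_ONCE(ctx->cq_waiting, 1);
	set_current_state(TASK_INTERRUPTIBLE);
	ret = io_cqring_wait_schedule(ctx, &iowq);
	__set_current_state(TASK_RUNNING);
	WRITE_ONCE(ctx->cq_waiting, 0);

	/* waker side, io_req_local_work_add(), after the llist_add() of the tw item */
	smp_mb__after_atomic();	/* order the queueing against the flag read */
	if (READ_ONCE(ctx->cq_waiting))
		wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);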
* [PATCH v3 11/11] io_uring: optimise deferred tw execution
From: Pavel Begunkov @ 2023-01-09 14:46 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
We needed fake nodes in __io_run_local_work() to avoid unnecessary wake
ups while the task is already running task_works, but we don't need them
anymore since wake ups are now gated by cq_waiting, which is always
cleared by the time we're executing deferred task_work items.
Note that because of loose synchronisation around clearing cq_waiting,
io_req_local_work_add() may wake the task more than once, but that's
fine, should be rare, and won't hurt performance.
Signed-off-by: Pavel Begunkov <[email protected]>
---
io_uring/io_uring.c | 24 +++++++-----------------
1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9c95ceb1a9f2..1dd0fc0412c8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1322,17 +1322,16 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
static int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
{
struct llist_node *node;
- struct llist_node fake;
- struct llist_node *current_final = NULL;
+ unsigned int loops = 0;
int ret = 0;
- unsigned int loops = 1;
if (WARN_ON_ONCE(ctx->submitter_task != current))
return -EEXIST;
-
- node = io_llist_xchg(&ctx->work_llist, &fake);
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
again:
- while (node != current_final) {
+ node = io_llist_xchg(&ctx->work_llist, NULL);
+ while (node) {
struct llist_node *next = node->next;
struct io_kiocb *req = container_of(node, struct io_kiocb,
io_task_work.node);
@@ -1341,23 +1340,14 @@ static int __io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
ret++;
node = next;
}
+ loops++;
- if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
- atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
-
- node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL);
- if (node != &fake) {
- loops++;
- current_final = &fake;
- node = io_llist_xchg(&ctx->work_llist, &fake);
+ if (!llist_empty(&ctx->work_llist))
goto again;
- }
-
if (*locked)
io_submit_flush_completions(ctx);
trace_io_uring_local_work_run(ctx, ret, loops);
return ret;
-
}
static inline int io_run_local_work_locked(struct io_ring_ctx *ctx)
--
2.38.1
* Re: [PATCH v3 00/11] CQ waiting / task_work optimisations
From: Jens Axboe @ 2023-01-11 18:00 UTC (permalink / raw)
To: io-uring, Pavel Begunkov
On Mon, 09 Jan 2023 14:46:02 +0000, Pavel Begunkov wrote:
> For DEFER_TASKRUN rings replace CQ waitqueues with a custom implementation
> based on the fact that only one task may be waiting for completions. Also,
> improve deferred task running by removing one atomic in patch 11
>
> Benchmarking QD1 with simulated tw arrival right after we start waiting:
> 7.5 MIOPS -> 9.3 (+23%), where half of CPU cycles goes to syscall overhead.
>
> [...]
Applied, thanks!
[01/11] io_uring: move submitter_task out of cold cacheline
commit: 8516c8b514839600b7e63090f2dce5b4d658fd68
[02/11] io_uring: refactor io_wake_function
commit: 291f31bf963c0018a2b84a94388a0e7b535c3dae
[03/11] io_uring: don't set TASK_RUNNING in local tw runner
commit: 5eb30c28823aed63946c9d2a222bf1158a3aecae
[04/11] io_uring: mark io_run_local_work static
commit: 88d14c077c1a04555978c499acd12f5f55de51da
[05/11] io_uring: move io_run_local_work_locked
commit: 78c37b460a63c866050d3e05d6d4bfddf654075e
[06/11] io_uring: separate wq for ring polling
commit: 6b40f3c9a37b97e629a99a92d2c392d77ae20f60
[07/11] io_uring: add lazy poll_wq activation
commit: e05f6f47bf8aed0e97d9ba1d52e2a10ea542609c
[08/11] io_uring: wake up optimisations
commit: ef3ddc6ac629fc829ed6e08418e1c070332dde63
[09/11] io_uring: waitqueue-less cq waiting
commit: 65ca9dd8ce5e3de42b100f0e7d2ae360e9b8d14e
[10/11] io_uring: add io_req_local_work_add wake fast path
commit: 6cd16656e2ddc63ee7aae7c7f27edcab933a0e09
[11/11] io_uring: optimise deferred tw execution
commit: 607947314b4c9f8c979f79c095da9156b41c82b8
Best regards,
--
Jens Axboe