* [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
2022-04-25 14:21 [PATCHSET v3 " Jens Axboe
@ 2022-04-25 14:21 ` Jens Axboe
0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-25 14:21 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
Rather than require ctx->completion_lock for ensuring that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 36 ++++++++++--------------------------
1 file changed, 10 insertions(+), 26 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bf95ef9240e5..511b52e4b9fd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -170,7 +170,7 @@ struct io_rings {
* The application needs a full memory barrier before checking
* for IORING_SQ_NEED_WAKEUP after updating the sq tail.
*/
- u32 sq_flags;
+ atomic_t sq_flags;
/*
* Runtime CQ flags
*
@@ -2060,8 +2060,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
all_flushed = list_empty(&ctx->cq_overflow_list);
if (all_flushed) {
clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
+ atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
io_commit_cqring(ctx);
@@ -2155,8 +2154,7 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
}
if (list_empty(&ctx->cq_overflow_list)) {
set_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);
+ atomic_or(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
ocqe->cqe.user_data = user_data;
@@ -8477,23 +8475,6 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
return READ_ONCE(sqd->state);
}
-static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
-{
- /* Tell userspace we may need a wakeup call */
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
-static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
-{
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
{
unsigned int to_submit;
@@ -8609,8 +8590,8 @@ static int io_sq_thread(void *data)
bool needs_sched = true;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
- io_ring_set_wakeup_flag(ctx);
-
+ atomic_or(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
if ((ctx->flags & IORING_SETUP_IOPOLL) &&
!wq_list_empty(&ctx->iopoll_list)) {
needs_sched = false;
@@ -8635,7 +8616,8 @@ static int io_sq_thread(void *data)
mutex_lock(&sqd->lock);
}
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_clear_wakeup_flag(ctx);
+ atomic_andnot(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
}
finish_wait(&sqd->wait, &wait);
@@ -8645,7 +8627,7 @@ static int io_sq_thread(void *data)
io_uring_cancel_generic(true, sqd);
sqd->thread = NULL;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_set_wakeup_flag(ctx);
+ atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
io_run_task_work();
mutex_unlock(&sqd->lock);
@@ -12399,6 +12381,8 @@ static int __init io_uring_init(void)
BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+ BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
+
req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
SLAB_ACCOUNT);
return 0;
--
2.35.1
* [PATCHSET v4 next 0/5] Add support for non-IPI task_work
@ 2022-04-26 1:48 Jens Axboe
2022-04-26 1:48 ` [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI Jens Axboe
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:48 UTC (permalink / raw)
To: io-uring
Hi,
Unless we're using SQPOLL, queueing task_work results in an IPI to the
target task unless it's already running in the kernel. This isn't
always needed, particularly not for the common case of not sharing the
ring. In certain workloads, avoiding the IPI can provide a 5-10%
improvement. Some of this is due to the cost of the IPI itself, and
some comes from no longer needlessly interrupting the target task when
the work could just be run when completions are being waited for.
Patches 1..4 are prep patches, patch 5 is the actual change, and patch 6
adds support for IORING_SQ_TASKRUN so that applications may use this
feature and still rely on io_uring_peek_cqe().
v4:
- Make SQPOLL incompatible with the IPI flags. It makes no sense for
SQPOLL as no IPIs are ever used there anyway, so make that explicit
and fail a request to set up a ring like that.
* [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
@ 2022-04-26 1:48 ` Jens Axboe
2022-04-26 14:39 ` Jens Axboe
2022-04-26 1:49 ` [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and Jens Axboe
` (5 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:48 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
except it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.
Signed-off-by: Jens Axboe <[email protected]>
---
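As a reading aid for the commit message above, here is a minimal
userspace model of the split between the two helpers. It is a sketch
only, not kernel code: struct fake_task, fake_wake_up(),
fake_set_notify_signal_noipi() and fake_set_notify_signal() are
hypothetical names, and the "kick" is just a counter standing in for
kick_process(). The only thing it tries to mirror is the boolean logic
in the diff below: a kick is wanted only when the flag was newly set
and no sleeping task was woken.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct fake_task {
	atomic_bool notify_signal;	/* models TIF_NOTIFY_SIGNAL */
	bool sleeping;			/* models an interruptible sleep */
	int kicks;			/* counts simulated IPIs */
};

/* Models wake_up_state(): true only if a sleeping task was woken. */
static bool fake_wake_up(struct fake_task *t)
{
	if (!t->sleeping)
		return false;
	t->sleeping = false;
	return true;
}

/* Models __set_notify_signal(): true means a kick would still be needed. */
static bool fake_set_notify_signal_noipi(struct fake_task *t)
{
	/* atomic_exchange() returns the old value, like test_and_set. */
	return !atomic_exchange(&t->notify_signal, true) && !fake_wake_up(t);
}

/* Models set_notify_signal(): the helper above, plus the kick. */
static void fake_set_notify_signal(struct fake_task *t)
{
	if (fake_set_notify_signal_noipi(t))
		t->kicks++;
}
```

In this model a TWA_SIGNAL_NO_IPI caller corresponds to calling
fake_set_notify_signal_noipi() and ignoring the return value, so the
kick counter never moves.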
include/linux/sched/signal.h | 13 +++++++++++--
include/linux/task_work.h | 1 +
kernel/task_work.c | 15 ++++++++++-----
3 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..66b689f6cfcb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -355,14 +355,23 @@ static inline void clear_notify_signal(void)
smp_mb__after_atomic();
}
+/*
+ * Returns 'true' if kick_process() is needed to force a transition from
+ * user -> kernel to guarantee expedient run of TWA_SIGNAL based task_work.
+ */
+static inline bool __set_notify_signal(struct task_struct *task)
+{
+ return !test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
+ !wake_up_state(task, TASK_INTERRUPTIBLE);
+}
+
/*
* Called to break out of interruptible wait loops, and enter the
* exit_to_user_mode_loop().
*/
static inline void set_notify_signal(struct task_struct *task)
{
- if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
- !wake_up_state(task, TASK_INTERRUPTIBLE))
+ if (__set_notify_signal(task))
kick_process(task);
}
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 897494b597ba..795ef5a68429 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -17,6 +17,7 @@ enum task_work_notify_mode {
TWA_NONE,
TWA_RESUME,
TWA_SIGNAL,
+ TWA_SIGNAL_NO_IPI,
};
static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..fa8fdd04aa17 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -13,11 +13,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
*
* Queue @work for task_work_run() below and notify the @task if @notify
* is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL works like signals, in that the
- * it will interrupt the targeted task and run the task_work. @TWA_RESUME
- * work is run only when the task exits the kernel and returns to user mode,
- * or before entering guest mode. Fails if the @task is exiting/exited and thus
- * it can't process this @work. Otherwise @work->func() will be called when the
- * @task goes through one of the aforementioned transitions, or exits.
+ * it will interrupt the targeted task and run the task_work. @TWA_SIGNAL_NO_IPI
+ * works like @TWA_SIGNAL, except it doesn't send a reschedule IPI to force the
+ * targeted task to reschedule and run task_work. @TWA_RESUME work is run only
+ * when the task exits the kernel and returns to user mode, or before entering
+ * guest mode. Fails if the @task is exiting/exited and thus it can't process
+ * this @work. Otherwise @work->func() will be called when the @task goes
+ * through one of the aforementioned transitions, or exits.
*
* If the targeted task is exiting, then an error is returned and the work item
* is not queued. It's up to the caller to arrange for an alternative mechanism
@@ -53,6 +55,9 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
case TWA_SIGNAL:
set_notify_signal(task);
break;
+ case TWA_SIGNAL_NO_IPI:
+ __set_notify_signal(task);
+ break;
default:
WARN_ON_ONCE(1);
break;
--
2.35.1
* [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
2022-04-26 1:48 ` [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI Jens Axboe
@ 2022-04-26 1:49 ` Jens Axboe
2022-04-26 15:03 ` Almog Khaikin
2022-04-26 1:49 ` [PATCH 3/6] io-wq: use __set_notify_signal() to wake workers Jens Axboe
` (4 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:49 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
Rather than require ctx->completion_lock for ensuring that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.
Signed-off-by: Jens Axboe <[email protected]>
---
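To illustrate why the lock can go away, here is a small userspace
sketch using C11 atomics as a stand-in for the kernel's atomic_or()
and atomic_andnot(). The helper names sq_flags_set() and
sq_flags_clear() are made up for this example. The old pattern, a
plain read followed by WRITE_ONCE(), is two steps, so two writers
touching different bits could lose an update unless a lock serialized
them; a single atomic read-modify-write cannot.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/io_uring.h. */
#define IORING_SQ_NEED_WAKEUP	(1U << 0)
#define IORING_SQ_CQ_OVERFLOW	(1U << 1)

/* Userspace analogue of atomic_or(): set bits in one atomic RMW. */
static void sq_flags_set(_Atomic uint32_t *flags, uint32_t mask)
{
	atomic_fetch_or(flags, mask);
}

/* Userspace analogue of atomic_andnot(): clear bits in one atomic RMW. */
static void sq_flags_clear(_Atomic uint32_t *flags, uint32_t mask)
{
	atomic_fetch_and(flags, ~mask);
}
```

Setting NEED_WAKEUP, setting CQ_OVERFLOW and then clearing NEED_WAKEUP
leaves only CQ_OVERFLOW set, regardless of which thread performs each
step.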
fs/io_uring.c | 36 ++++++++++--------------------------
1 file changed, 10 insertions(+), 26 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bf95ef9240e5..511b52e4b9fd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -170,7 +170,7 @@ struct io_rings {
* The application needs a full memory barrier before checking
* for IORING_SQ_NEED_WAKEUP after updating the sq tail.
*/
- u32 sq_flags;
+ atomic_t sq_flags;
/*
* Runtime CQ flags
*
@@ -2060,8 +2060,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
all_flushed = list_empty(&ctx->cq_overflow_list);
if (all_flushed) {
clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
+ atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
io_commit_cqring(ctx);
@@ -2155,8 +2154,7 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
}
if (list_empty(&ctx->cq_overflow_list)) {
set_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);
+ atomic_or(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
ocqe->cqe.user_data = user_data;
@@ -8477,23 +8475,6 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
return READ_ONCE(sqd->state);
}
-static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
-{
- /* Tell userspace we may need a wakeup call */
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
-static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
-{
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
{
unsigned int to_submit;
@@ -8609,8 +8590,8 @@ static int io_sq_thread(void *data)
bool needs_sched = true;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
- io_ring_set_wakeup_flag(ctx);
-
+ atomic_or(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
if ((ctx->flags & IORING_SETUP_IOPOLL) &&
!wq_list_empty(&ctx->iopoll_list)) {
needs_sched = false;
@@ -8635,7 +8616,8 @@ static int io_sq_thread(void *data)
mutex_lock(&sqd->lock);
}
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_clear_wakeup_flag(ctx);
+ atomic_andnot(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
}
finish_wait(&sqd->wait, &wait);
@@ -8645,7 +8627,7 @@ static int io_sq_thread(void *data)
io_uring_cancel_generic(true, sqd);
sqd->thread = NULL;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_set_wakeup_flag(ctx);
+ atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
io_run_task_work();
mutex_unlock(&sqd->lock);
@@ -12399,6 +12381,8 @@ static int __init io_uring_init(void)
BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+ BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
+
req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
SLAB_ACCOUNT);
return 0;
--
2.35.1
* [PATCH 3/6] io-wq: use __set_notify_signal() to wake workers
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
2022-04-26 1:48 ` [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI Jens Axboe
2022-04-26 1:49 ` [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and Jens Axboe
@ 2022-04-26 1:49 ` Jens Axboe
2022-04-26 1:49 ` [PATCH 4/6] io_uring: set task_work notify method at init time Jens Axboe
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:49 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
The only difference between set_notify_signal() and __set_notify_signal()
is that the former checks if it needs to deliver an IPI to force a
reschedule. As the io-wq workers never leave the kernel, an IPI is never
needed; they simply need a wakeup.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io-wq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 32aeb2c581c5..824623bcf1a5 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
static bool io_wq_worker_wake(struct io_worker *worker, void *data)
{
- set_notify_signal(worker->task);
+ __set_notify_signal(worker->task);
wake_up_process(worker->task);
return false;
}
@@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
{
if (work && match->fn(work, match->data)) {
work->flags |= IO_WQ_WORK_CANCEL;
- set_notify_signal(worker->task);
+ __set_notify_signal(worker->task);
return true;
}
--
2.35.1
* [PATCH 4/6] io_uring: set task_work notify method at init time
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
` (2 preceding siblings ...)
2022-04-26 1:49 ` [PATCH 3/6] io-wq: use __set_notify_signal() to wake workers Jens Axboe
@ 2022-04-26 1:49 ` Jens Axboe
2022-04-26 1:49 ` [PATCH 5/6] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used Jens Axboe
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:49 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
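Set the task_work notify method once, when the ring is created, rather
than working it out on every io_req_task_work_add() call.
While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well, as that
just does a task wakeup, and then we can remove the special wakeup we
have in io_req_task_work_add().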
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 511b52e4b9fd..7e9ac5fd3a8c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -367,6 +367,7 @@ struct io_ring_ctx {
struct io_rings *rings;
unsigned int flags;
+ enum task_work_notify_mode notify_method;
unsigned int compat: 1;
unsigned int drain_next: 1;
unsigned int restricted: 1;
@@ -2651,8 +2652,8 @@ static void tctx_task_work(struct callback_head *cb)
static void io_req_task_work_add(struct io_kiocb *req, bool priority)
{
struct task_struct *tsk = req->task;
+ struct io_ring_ctx *ctx = req->ctx;
struct io_uring_task *tctx = tsk->io_uring;
- enum task_work_notify_mode notify;
struct io_wq_work_node *node;
unsigned long flags;
bool running;
@@ -2675,18 +2676,8 @@ static void io_req_task_work_add(struct io_kiocb *req, bool priority)
if (running)
return;
- /*
- * SQPOLL kernel thread doesn't need notification, just a wakeup. For
- * all other cases, use TWA_SIGNAL unconditionally to ensure we're
- * processing task_work. There's no reliable way to tell if TWA_RESUME
- * will do the job.
- */
- notify = (req->ctx->flags & IORING_SETUP_SQPOLL) ? TWA_NONE : TWA_SIGNAL;
- if (likely(!task_work_add(tsk, &tctx->task_work, notify))) {
- if (notify == TWA_NONE)
- wake_up_process(tsk);
+ if (likely(!task_work_add(tsk, &tctx->task_work, ctx->notify_method)))
return;
- }
spin_lock_irqsave(&tctx->task_lock, flags);
tctx->task_running = false;
@@ -11704,6 +11695,14 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
if (!capable(CAP_IPC_LOCK))
ctx->user = get_uid(current_user());
+ /*
+ * For SQPOLL, we just need a wakeup, always.
+ */
+ if (ctx->flags & IORING_SETUP_SQPOLL)
+ ctx->notify_method = TWA_SIGNAL_NO_IPI;
+ else
+ ctx->notify_method = TWA_SIGNAL;
+
/*
* This is just grabbed for accounting purposes. When a process exits,
* the mm is exited and dropped before the files, hence we need to hang
--
2.35.1
* [PATCH 5/6] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
` (3 preceding siblings ...)
2022-04-26 1:49 ` [PATCH 4/6] io_uring: set task_work notify method at init time Jens Axboe
@ 2022-04-26 1:49 ` Jens Axboe
2022-04-26 1:49 ` [PATCH 6/6] io_uring: add IORING_SETUP_TASKRUN_FLAG Jens Axboe
2022-04-26 14:02 ` [PATCHSET v4 next 0/5] Add support for non-IPI task_work Pavel Begunkov
6 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:49 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
If IORING_SETUP_COOP_TASKRUN is set, io_uring will never use an IPI to
deliver a task_work notification. This can be used in the common case
where a single task or thread communicates with the ring, and doesn't
rely on io_uring_peek_cqe().
This provides a noticeable win in performance, both from eliminating
the IPI itself and from avoiding interrupting the submitting
task unnecessarily.
Signed-off-by: Jens Axboe <[email protected]>
---
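The selection logic described above can be sketched as a standalone
function. This is an illustration under assumptions, not the kernel
code: pick_notify_mode() and enum notify_mode are hypothetical names,
and the enum is a local copy of just the two modes that matter here.
The setup flag values are the ones from include/uapi/linux/io_uring.h.

```c
#include <errno.h>

/* Setup flag values as defined in include/uapi/linux/io_uring.h. */
#define IORING_SETUP_SQPOLL		(1U << 1)
#define IORING_SETUP_COOP_TASKRUN	(1U << 8)

/* Local copy of the two relevant notify modes, for illustration only. */
enum notify_mode { NOTIFY_SIGNAL, NOTIFY_SIGNAL_NO_IPI };

/*
 * Hypothetical helper: returns 0 and stores the chosen mode, or
 * -EINVAL when SQPOLL is combined with COOP_TASKRUN.
 */
static int pick_notify_mode(unsigned int flags, enum notify_mode *mode)
{
	if (flags & IORING_SETUP_SQPOLL) {
		/* IPI related flags don't make sense with SQPOLL. */
		if (flags & IORING_SETUP_COOP_TASKRUN)
			return -EINVAL;
		*mode = NOTIFY_SIGNAL_NO_IPI;
	} else if (flags & IORING_SETUP_COOP_TASKRUN) {
		*mode = NOTIFY_SIGNAL_NO_IPI;
	} else {
		*mode = NOTIFY_SIGNAL;
	}
	return 0;
}
```

Only the plain case, with neither SQPOLL nor COOP_TASKRUN, keeps the
IPI-capable mode.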
fs/io_uring.c | 17 +++++++++++++----
include/uapi/linux/io_uring.h | 8 ++++++++
2 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7e9ac5fd3a8c..5e4842cd21c2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11696,12 +11696,20 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
ctx->user = get_uid(current_user());
/*
- * For SQPOLL, we just need a wakeup, always.
+ * For SQPOLL, we just need a wakeup, always. For !SQPOLL, if
+ * COOP_TASKRUN is set, then IPIs are never needed by the app.
*/
- if (ctx->flags & IORING_SETUP_SQPOLL)
+ ret = -EINVAL;
+ if (ctx->flags & IORING_SETUP_SQPOLL) {
+ /* IPI related flags don't make sense with SQPOLL */
+ if (ctx->flags & IORING_SETUP_COOP_TASKRUN)
+ goto err;
ctx->notify_method = TWA_SIGNAL_NO_IPI;
- else
+ } else if (ctx->flags & IORING_SETUP_COOP_TASKRUN) {
+ ctx->notify_method = TWA_SIGNAL_NO_IPI;
+ } else {
ctx->notify_method = TWA_SIGNAL;
+ }
/*
* This is just grabbed for accounting purposes. When a process exits,
@@ -11800,7 +11808,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL |
IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
- IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL))
+ IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
+ IORING_SETUP_COOP_TASKRUN))
return -EINVAL;
return io_uring_create(entries, &p, params);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 5fb52bf32435..4654842ace88 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -104,6 +104,14 @@ enum {
#define IORING_SETUP_ATTACH_WQ (1U << 5) /* attach to existing wq */
#define IORING_SETUP_R_DISABLED (1U << 6) /* start with ring disabled */
#define IORING_SETUP_SUBMIT_ALL (1U << 7) /* continue submit on error */
+/*
+ * Cooperative task running. When requests complete, they often require
+ * forcing the submitter to transition to the kernel to complete. If this
+ * flag is set, work will be done when the task transitions anyway, rather
+ * than force an inter-processor interrupt reschedule. This avoids interrupting
+ * a task running in userspace, and saves an IPI.
+ */
+#define IORING_SETUP_COOP_TASKRUN (1U << 8)
enum {
IORING_OP_NOP,
--
2.35.1
* [PATCH 6/6] io_uring: add IORING_SETUP_TASKRUN_FLAG
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
` (4 preceding siblings ...)
2022-04-26 1:49 ` [PATCH 5/6] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used Jens Axboe
@ 2022-04-26 1:49 ` Jens Axboe
2022-04-26 14:02 ` [PATCHSET v4 next 0/5] Add support for non-IPI task_work Pavel Begunkov
6 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 1:49 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
If IORING_SETUP_COOP_TASKRUN is set to use cooperative scheduling for
running task_work, then IORING_SETUP_TASKRUN_FLAG can be set so the
application can tell if task_work is pending in the kernel for this
ring. This allows use cases like io_uring_peek_cqe() to still function
appropriately, or for the task to know when it would be useful to
call io_uring_wait_cqe() to run pending events.
Signed-off-by: Jens Axboe <[email protected]>
---
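From the application side, the check this flag enables could look
roughly like the sketch below. cq_needs_kernel_enter() is a
hypothetical helper, not a liburing function, and the sketch assumes
the caller already has a pointer to the mmap'ed SQ ring flags word.
The idea is that when the CQ ring looks empty but IORING_SQ_TASKRUN is
set, completions may still be sitting in task_work, so the application
should enter the kernel before concluding there is nothing to reap.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Value introduced by this patch in include/uapi/linux/io_uring.h. */
#define IORING_SQ_TASKRUN	(1U << 2)

/*
 * Hypothetical helper: true if the kernel has flagged pending
 * task_work for this ring, meaning a kernel transition is needed
 * to run it and post any resulting completions.
 */
static bool cq_needs_kernel_enter(_Atomic uint32_t *sq_flags)
{
	return (atomic_load(sq_flags) & IORING_SQ_TASKRUN) != 0;
}
```

A peek-style loop would call this only after finding no CQEs, so the
common path stays a plain ring read.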
fs/io_uring.c | 14 +++++++++++---
include/uapi/linux/io_uring.h | 7 +++++++
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5e4842cd21c2..2c859ab326cd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2536,6 +2536,8 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
{
if (!ctx)
return;
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
if (*locked) {
io_submit_flush_completions(ctx);
mutex_unlock(&ctx->uring_lock);
@@ -2676,6 +2678,9 @@ static void io_req_task_work_add(struct io_kiocb *req, bool priority)
if (running)
return;
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+
if (likely(!task_work_add(tsk, &tctx->task_work, ctx->notify_method)))
return;
@@ -11702,12 +11707,15 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
ret = -EINVAL;
if (ctx->flags & IORING_SETUP_SQPOLL) {
/* IPI related flags don't make sense with SQPOLL */
- if (ctx->flags & IORING_SETUP_COOP_TASKRUN)
+ if (ctx->flags & (IORING_SETUP_COOP_TASKRUN |
+ IORING_SETUP_TASKRUN_FLAG))
goto err;
ctx->notify_method = TWA_SIGNAL_NO_IPI;
} else if (ctx->flags & IORING_SETUP_COOP_TASKRUN) {
ctx->notify_method = TWA_SIGNAL_NO_IPI;
} else {
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ goto err;
ctx->notify_method = TWA_SIGNAL;
}
@@ -11809,10 +11817,10 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
- IORING_SETUP_COOP_TASKRUN))
+ IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG))
return -EINVAL;
- return io_uring_create(entries, &p, params);
+ return io_uring_create(entries, &p, params);
}
SYSCALL_DEFINE2(io_uring_setup, u32, entries,
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 4654842ace88..ad53def6abb8 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -112,6 +112,12 @@ enum {
* a task running in userspace, and saves an IPI.
*/
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
+/*
+ * If COOP_TASKRUN is set, get notified if task work is available for
+ * running and a kernel transition would be needed to run it. This sets
+ * IORING_SQ_TASKRUN in the sq ring flags. Not valid without COOP_TASKRUN.
+ */
+#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
enum {
IORING_OP_NOP,
@@ -263,6 +269,7 @@ struct io_sqring_offsets {
*/
#define IORING_SQ_NEED_WAKEUP (1U << 0) /* needs io_uring_enter wakeup */
#define IORING_SQ_CQ_OVERFLOW (1U << 1) /* CQ ring is overflown */
+#define IORING_SQ_TASKRUN (1U << 2) /* task should enter the kernel */
struct io_cqring_offsets {
__u32 head;
--
2.35.1
* Re: [PATCHSET v4 next 0/5] Add support for non-IPI task_work
2022-04-26 1:48 [PATCHSET v4 next 0/5] Add support for non-IPI task_work Jens Axboe
` (5 preceding siblings ...)
2022-04-26 1:49 ` [PATCH 6/6] io_uring: add IORING_SETUP_TASKRUN_FLAG Jens Axboe
@ 2022-04-26 14:02 ` Pavel Begunkov
6 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-04-26 14:02 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 4/26/22 02:48, Jens Axboe wrote:
> Hi,
>
> Unless we're using SQPOLL, any task_work queue will result in an IPI
> to the target task unless it's running in the kernel already. This isn't
> always needed, particularly not for the common case of not sharing the
> ring. In certain workloads, this can provide a 5-10% improvement. Some
> of this is due to the cost of the IPI, and some from needlessly
> interrupting the target task when the work could just get run when
> completions are being waited for.
>
> Patches 1..4 are prep patches, patch 5 is the actual change, and patch 6
> adds support for IORING_SQ_TASKRUN so that applications may use this
> feature and still rely on io_uring_peek_cqe().
>
> v4:
> - Make SQPOLL incompatible with the IPI flags. It makes no sense for
> SQPOLL as no IPIs are ever used there anyway, so make that explicit
> and fail a request to setup a ring like that.
looks good
Reviewed-by: Pavel Begunkov <[email protected]>
--
Pavel Begunkov
* Re: [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI
2022-04-26 1:48 ` [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI Jens Axboe
@ 2022-04-26 14:39 ` Jens Axboe
0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 14:39 UTC (permalink / raw)
To: io-uring, axboe
On Mon, 25 Apr 2022 19:48:59 -0600, Jens Axboe wrote:
> Some use cases don't always need an IPI when sending a TWA_SIGNAL
> notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
> except it doesn't send an IPI to the target task. It merely sets
> TIF_NOTIFY_SIGNAL and wakes up the task.
>
>
Applied, thanks!
[1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI
commit: c0c84594c0234aac5d09af8a595d25d822c6dcc8
[2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
commit: 8018823e6987032d3d751263872b5385359c2819
[3/6] io-wq: use __set_notify_signal() to wake workers
commit: 8a68648b353bb6e20a3dc8c0b914792ce0a0391f
[4/6] io_uring: set task_work notify method at init time
commit: 35ac0da1d1346d182003db278c2d7b2ac32420a7
[5/6] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
commit: a933a9031e40c972c24ce6406e7cea73657728a5
[6/6] io_uring: add IORING_SETUP_TASKRUN_FLAG
commit: 6f07a54a90ee98ae13b37ac358624d7cc7e57850
Best regards,
--
Jens Axboe
* Re: [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
2022-04-26 1:49 ` [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and Jens Axboe
@ 2022-04-26 15:03 ` Almog Khaikin
2022-04-26 15:32 ` Jens Axboe
0 siblings, 1 reply; 12+ messages in thread
From: Almog Khaikin @ 2022-04-26 15:03 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 4/26/22 04:49, Jens Axboe wrote:
> Rather than require ctx->completion_lock for ensuring that we don't
> clobber the flags, use the atomic bitop helpers instead. This removes
> the need to grab the completion_lock, in preparation for needing to set
> or clear sq_flags when we don't know the status of this lock.
The smp_mb() in io_sq_thread() should also be changed to
smp_mb__after_atomic()
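A rough userspace analogue of that ordering requirement, as a sketch
under assumptions: in C11 terms, atomic_or() is treated here as a
relaxed read-modify-write and smp_mb__after_atomic() as a full fence
placed right after it. sq_thread_may_sleep() is a made-up name, and
sq_tail/cached_head are simplified stand-ins for the ring state. The
point is only that the store of NEED_WAKEUP must not be reordered with
the later load of the SQ tail.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define IORING_SQ_NEED_WAKEUP	(1U << 0)

/*
 * Hypothetical model of the SQ thread's pre-sleep check: publish
 * NEED_WAKEUP, issue a full fence, then re-check for new entries.
 * Returns true if no new entries were seen and sleeping is safe.
 */
static bool sq_thread_may_sleep(_Atomic uint32_t *sq_flags,
				_Atomic uint32_t *sq_tail,
				uint32_t cached_head)
{
	/* Stand-in for atomic_or(): unordered RMW on its own. */
	atomic_fetch_or_explicit(sq_flags, IORING_SQ_NEED_WAKEUP,
				 memory_order_relaxed);
	/* Stand-in for smp_mb__after_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(sq_tail, memory_order_relaxed) ==
	       cached_head;
}
```

The application side pairs with this by issuing its own full barrier
between updating the SQ tail and checking IORING_SQ_NEED_WAKEUP.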
--
Almog Khaikin
* Re: [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
2022-04-26 15:03 ` Almog Khaikin
@ 2022-04-26 15:32 ` Jens Axboe
0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2022-04-26 15:32 UTC (permalink / raw)
To: Almog Khaikin, io-uring
On 4/26/22 9:03 AM, Almog Khaikin wrote:
> On 4/26/22 04:49, Jens Axboe wrote:
>> Rather than require ctx->completion_lock for ensuring that we don't
>> clobber the flags, use the atomic bitop helpers instead. This removes
>> the need to grab the completion_lock, in preparation for needing to set
>> or clear sq_flags when we don't know the status of this lock.
>
> The smp_mb() in io_sq_thread() should also be changed to
> smp_mb__after_atomic()
Indeed, want to send a patch?
--
Jens Axboe