* [PATCH 1/5] task_work: allow TWA_SIGNAL without a rescheduling IPI
2022-04-23 17:39 [PATCHSET v2 next 0/5] Add support for non-IPI task_work Jens Axboe
@ 2022-04-23 17:39 ` Jens Axboe
2022-04-23 17:39 ` [PATCH 2/5] io_uring: serialize ctx->rings->sq_flags with atomic_or/and Jens Axboe
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2022-04-23 17:39 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
except it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/sched/signal.h | 13 +++++++++++--
include/linux/task_work.h | 1 +
kernel/task_work.c | 15 ++++++++++-----
3 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..66b689f6cfcb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -355,14 +355,23 @@ static inline void clear_notify_signal(void)
smp_mb__after_atomic();
}
+/*
+ * Returns 'true' if kick_process() is needed to force a transition from
+ * user -> kernel to guarantee expedient run of TWA_SIGNAL based task_work.
+ */
+static inline bool __set_notify_signal(struct task_struct *task)
+{
+ return !test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
+ !wake_up_state(task, TASK_INTERRUPTIBLE);
+}
+
/*
* Called to break out of interruptible wait loops, and enter the
* exit_to_user_mode_loop().
*/
static inline void set_notify_signal(struct task_struct *task)
{
- if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
- !wake_up_state(task, TASK_INTERRUPTIBLE))
+ if (__set_notify_signal(task))
kick_process(task);
}
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 897494b597ba..795ef5a68429 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -17,6 +17,7 @@ enum task_work_notify_mode {
TWA_NONE,
TWA_RESUME,
TWA_SIGNAL,
+ TWA_SIGNAL_NO_IPI,
};
static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..fa8fdd04aa17 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -13,11 +13,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
*
* Queue @work for task_work_run() below and notify the @task if @notify
* is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL works like signals, in that the
- * it will interrupt the targeted task and run the task_work. @TWA_RESUME
- * work is run only when the task exits the kernel and returns to user mode,
- * or before entering guest mode. Fails if the @task is exiting/exited and thus
- * it can't process this @work. Otherwise @work->func() will be called when the
- * @task goes through one of the aforementioned transitions, or exits.
+ * it will interrupt the targeted task and run the task_work. @TWA_SIGNAL_NO_IPI
+ * works like @TWA_SIGNAL, except it doesn't send a reschedule IPI to force the
+ * targeted task to reschedule and run task_work. @TWA_RESUME work is run only
+ * when the task exits the kernel and returns to user mode, or before entering
+ * guest mode. Fails if the @task is exiting/exited and thus it can't process
+ * this @work. Otherwise @work->func() will be called when the @task goes
+ * through one of the aforementioned transitions, or exits.
*
* If the targeted task is exiting, then an error is returned and the work item
* is not queued. It's up to the caller to arrange for an alternative mechanism
@@ -53,6 +55,9 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
case TWA_SIGNAL:
set_notify_signal(task);
break;
+ case TWA_SIGNAL_NO_IPI:
+ __set_notify_signal(task);
+ break;
default:
WARN_ON_ONCE(1);
break;
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 2/5] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
2022-04-23 17:39 [PATCHSET v2 next 0/5] Add support for non-IPI task_work Jens Axboe
2022-04-23 17:39 ` [PATCH 1/5] task_work: allow TWA_SIGNAL without a rescheduling IPI Jens Axboe
@ 2022-04-23 17:39 ` Jens Axboe
2022-04-23 17:39 ` [PATCH 3/5] io-wq: use __set_notify_signal() to wake workers Jens Axboe
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2022-04-23 17:39 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
Rather than require ctx->completion_lock for ensuring that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 36 ++++++++++--------------------------
1 file changed, 10 insertions(+), 26 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 626bf840bed2..4fe3a53187cf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -169,7 +169,7 @@ struct io_rings {
* The application needs a full memory barrier before checking
* for IORING_SQ_NEED_WAKEUP after updating the sq tail.
*/
- u32 sq_flags;
+ atomic_t sq_flags;
/*
* Runtime CQ flags
*
@@ -2030,8 +2030,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
all_flushed = list_empty(&ctx->cq_overflow_list);
if (all_flushed) {
clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
+ atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
io_commit_cqring(ctx);
@@ -2125,8 +2124,7 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
}
if (list_empty(&ctx->cq_overflow_list)) {
set_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);
+ atomic_or(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
ocqe->cqe.user_data = user_data;
@@ -8105,23 +8103,6 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
return READ_ONCE(sqd->state);
}
-static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
-{
- /* Tell userspace we may need a wakeup call */
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
-static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
-{
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
{
unsigned int to_submit;
@@ -8237,8 +8218,8 @@ static int io_sq_thread(void *data)
bool needs_sched = true;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
- io_ring_set_wakeup_flag(ctx);
-
+ atomic_or(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
if ((ctx->flags & IORING_SETUP_IOPOLL) &&
!wq_list_empty(&ctx->iopoll_list)) {
needs_sched = false;
@@ -8263,7 +8244,8 @@ static int io_sq_thread(void *data)
mutex_lock(&sqd->lock);
}
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_clear_wakeup_flag(ctx);
+ atomic_andnot(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
}
finish_wait(&sqd->wait, &wait);
@@ -8273,7 +8255,7 @@ static int io_sq_thread(void *data)
io_uring_cancel_generic(true, sqd);
sqd->thread = NULL;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_set_wakeup_flag(ctx);
+ atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
io_run_task_work();
mutex_unlock(&sqd->lock);
@@ -12026,6 +12008,8 @@ static int __init io_uring_init(void)
BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+ BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
+
req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
SLAB_ACCOUNT);
return 0;
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 3/5] io-wq: use __set_notify_signal() to wake workers
2022-04-23 17:39 [PATCHSET v2 next 0/5] Add support for non-IPI task_work Jens Axboe
2022-04-23 17:39 ` [PATCH 1/5] task_work: allow TWA_SIGNAL without a rescheduling IPI Jens Axboe
2022-04-23 17:39 ` [PATCH 2/5] io_uring: serialize ctx->rings->sq_flags with atomic_or/and Jens Axboe
@ 2022-04-23 17:39 ` Jens Axboe
2022-04-23 17:39 ` [PATCH 4/5] io_uring: set task_work notify method at init time Jens Axboe
2022-04-23 17:39 ` [PATCH 5/5] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_NO_RESCHED is used Jens Axboe
4 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2022-04-23 17:39 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
The only difference between set_notify_signal() and __set_notify_signal()
is that the former checks if it needs to deliver an IPI to force a
reschedule. As the io-wq workers never leave the kernel, and IPI is never
needed, they simply need a wakeup.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io-wq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 32aeb2c581c5..824623bcf1a5 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
static bool io_wq_worker_wake(struct io_worker *worker, void *data)
{
- set_notify_signal(worker->task);
+ __set_notify_signal(worker->task);
wake_up_process(worker->task);
return false;
}
@@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
{
if (work && match->fn(work, match->data)) {
work->flags |= IO_WQ_WORK_CANCEL;
- set_notify_signal(worker->task);
+ __set_notify_signal(worker->task);
return true;
}
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 4/5] io_uring: set task_work notify method at init time
2022-04-23 17:39 [PATCHSET v2 next 0/5] Add support for non-IPI task_work Jens Axboe
` (2 preceding siblings ...)
2022-04-23 17:39 ` [PATCH 3/5] io-wq: use __set_notify_signal() to wake workers Jens Axboe
@ 2022-04-23 17:39 ` Jens Axboe
2022-04-23 17:39 ` [PATCH 5/5] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_NO_RESCHED is used Jens Axboe
4 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2022-04-23 17:39 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well, as that
just does a task wakeup and then we can remove the special wakeup we
have in task_work_add.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4fe3a53187cf..45992ada2f06 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -366,6 +366,7 @@ struct io_ring_ctx {
struct io_rings *rings;
unsigned int flags;
+ enum task_work_notify_mode notify_method;
unsigned int compat: 1;
unsigned int drain_next: 1;
unsigned int restricted: 1;
@@ -2621,8 +2622,8 @@ static void tctx_task_work(struct callback_head *cb)
static void io_req_task_work_add(struct io_kiocb *req, bool priority)
{
struct task_struct *tsk = req->task;
+ struct io_ring_ctx *ctx = req->ctx;
struct io_uring_task *tctx = tsk->io_uring;
- enum task_work_notify_mode notify;
struct io_wq_work_node *node;
unsigned long flags;
bool running;
@@ -2645,18 +2646,8 @@ static void io_req_task_work_add(struct io_kiocb *req, bool priority)
if (running)
return;
- /*
- * SQPOLL kernel thread doesn't need notification, just a wakeup. For
- * all other cases, use TWA_SIGNAL unconditionally to ensure we're
- * processing task_work. There's no reliable way to tell if TWA_RESUME
- * will do the job.
- */
- notify = (req->ctx->flags & IORING_SETUP_SQPOLL) ? TWA_NONE : TWA_SIGNAL;
- if (likely(!task_work_add(tsk, &tctx->task_work, notify))) {
- if (notify == TWA_NONE)
- wake_up_process(tsk);
+ if (likely(!task_work_add(tsk, &tctx->task_work, ctx->notify_method)))
return;
- }
spin_lock_irqsave(&tctx->task_lock, flags);
tctx->task_running = false;
@@ -11332,6 +11323,14 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
if (!capable(CAP_IPC_LOCK))
ctx->user = get_uid(current_user());
+ /*
+ * For SQPOLL, we just need a wakeup, always.
+ */
+ if (ctx->flags & IORING_SETUP_SQPOLL)
+ ctx->notify_method = TWA_SIGNAL_NO_IPI;
+ else
+ ctx->notify_method = TWA_SIGNAL;
+
/*
* This is just grabbed for accounting purposes. When a process exits,
* the mm is exited and dropped before the files, hence we need to hang
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH 5/5] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_NO_RESCHED is used
2022-04-23 17:39 [PATCHSET v2 next 0/5] Add support for non-IPI task_work Jens Axboe
` (3 preceding siblings ...)
2022-04-23 17:39 ` [PATCH 4/5] io_uring: set task_work notify method at init time Jens Axboe
@ 2022-04-23 17:39 ` Jens Axboe
4 siblings, 0 replies; 7+ messages in thread
From: Jens Axboe @ 2022-04-23 17:39 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
If this is set, io_uring will never use an IPI to deliver a task_work
notification. This can be used in the common case where a single task or
thread communicates with the ring, and doesn't rely on
io_uring_cqe_peek().
This provides a noticeable win in performance, both from eliminating
the IPI itself, but also from avoiding interrupting the submitting
task unnecessarily.
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 8 +++++---
include/uapi/linux/io_uring.h | 1 +
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 45992ada2f06..7824a615f06d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11324,9 +11324,10 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
ctx->user = get_uid(current_user());
/*
- * For SQPOLL, we just need a wakeup, always.
+ * For SQPOLL, we just need a wakeup, always. For !SQPOLL, if
+ * NO_RESCHED is set, then IPIs are never needed by the app.
*/
- if (ctx->flags & IORING_SETUP_SQPOLL)
+ if (ctx->flags & (IORING_SETUP_SQPOLL|IORING_SETUP_NO_RESCHED))
ctx->notify_method = TWA_SIGNAL_NO_IPI;
else
ctx->notify_method = TWA_SIGNAL;
@@ -11428,7 +11429,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL |
IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
- IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL))
+ IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
+ IORING_SETUP_NO_RESCHED))
return -EINVAL;
return io_uring_create(entries, &p, params);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 980d82eb196e..8a32230aa6f4 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -102,6 +102,7 @@ enum {
#define IORING_SETUP_ATTACH_WQ (1U << 5) /* attach to existing wq */
#define IORING_SETUP_R_DISABLED (1U << 6) /* start with ring disabled */
#define IORING_SETUP_SUBMIT_ALL (1U << 7) /* continue submit on error */
+#define IORING_SETUP_NO_RESCHED (1U << 8) /* work doesn't need resched */
enum {
IORING_OP_NOP,
--
2.35.1
^ permalink raw reply related [flat|nested] 7+ messages in thread