* [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI
From: Jens Axboe @ 2022-04-26 1:48 UTC
To: io-uring; +Cc: Jens Axboe
Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
except it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.
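A minimal sketch of a kernel-side caller (hypothetical; req, twork,
my_work_fn, and my_complete_inline are made-up names, not part of this
patch):

	/* Queue work that only needs to run by the time @task next
	 * enters the kernel anyway, so no rescheduling IPI is sent.
	 */
	init_task_work(&req->twork, my_work_fn);
	if (task_work_add(task, &req->twork, TWA_SIGNAL_NO_IPI))
		my_complete_inline(req);	/* task is exiting, fall back */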
Signed-off-by: Jens Axboe <[email protected]>
---
include/linux/sched/signal.h | 13 +++++++++++--
include/linux/task_work.h | 1 +
kernel/task_work.c | 15 ++++++++++-----
3 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 3c8b34876744..66b689f6cfcb 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -355,14 +355,23 @@ static inline void clear_notify_signal(void)
smp_mb__after_atomic();
}
+/*
+ * Returns 'true' if kick_process() is needed to force a transition from
+ * user -> kernel to guarantee expedient run of TWA_SIGNAL based task_work.
+ */
+static inline bool __set_notify_signal(struct task_struct *task)
+{
+ return !test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
+ !wake_up_state(task, TASK_INTERRUPTIBLE);
+}
+
/*
* Called to break out of interruptible wait loops, and enter the
* exit_to_user_mode_loop().
*/
static inline void set_notify_signal(struct task_struct *task)
{
- if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
- !wake_up_state(task, TASK_INTERRUPTIBLE))
+ if (__set_notify_signal(task))
kick_process(task);
}
diff --git a/include/linux/task_work.h b/include/linux/task_work.h
index 897494b597ba..795ef5a68429 100644
--- a/include/linux/task_work.h
+++ b/include/linux/task_work.h
@@ -17,6 +17,7 @@ enum task_work_notify_mode {
TWA_NONE,
TWA_RESUME,
TWA_SIGNAL,
+ TWA_SIGNAL_NO_IPI,
};
static inline bool task_work_pending(struct task_struct *task)
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c59e1a49bc40..fa8fdd04aa17 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -13,11 +13,13 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
*
* Queue @work for task_work_run() below and notify the @task if @notify
* is @TWA_RESUME or @TWA_SIGNAL. @TWA_SIGNAL works like signals, in that the
- * it will interrupt the targeted task and run the task_work. @TWA_RESUME
- * work is run only when the task exits the kernel and returns to user mode,
- * or before entering guest mode. Fails if the @task is exiting/exited and thus
- * it can't process this @work. Otherwise @work->func() will be called when the
- * @task goes through one of the aforementioned transitions, or exits.
+ * it will interrupt the targeted task and run the task_work. @TWA_SIGNAL_NO_IPI
+ * works like @TWA_SIGNAL, except it doesn't send a reschedule IPI to force the
+ * targeted task to reschedule and run task_work. @TWA_RESUME work is run only
+ * when the task exits the kernel and returns to user mode, or before entering
+ * guest mode. Fails if the @task is exiting/exited and thus it can't process
+ * this @work. Otherwise @work->func() will be called when the @task goes
+ * through one of the aforementioned transitions, or exits.
*
* If the targeted task is exiting, then an error is returned and the work item
* is not queued. It's up to the caller to arrange for an alternative mechanism
@@ -53,6 +55,9 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
case TWA_SIGNAL:
set_notify_signal(task);
break;
+ case TWA_SIGNAL_NO_IPI:
+ __set_notify_signal(task);
+ break;
default:
WARN_ON_ONCE(1);
break;
--
2.35.1
* Re: [PATCH 1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI
From: Jens Axboe @ 2022-04-26 14:39 UTC
To: io-uring, axboe
On Mon, 25 Apr 2022 19:48:59 -0600, Jens Axboe wrote:
> Some use cases don't always need an IPI when sending a TWA_SIGNAL
> notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL,
> except it doesn't send an IPI to the target task. It merely sets
> TIF_NOTIFY_SIGNAL and wakes up the task.
>
>
Applied, thanks!
[1/6] task_work: allow TWA_SIGNAL without a rescheduling IPI
commit: c0c84594c0234aac5d09af8a595d25d822c6dcc8
[2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
commit: 8018823e6987032d3d751263872b5385359c2819
[3/6] io-wq: use __set_notify_signal() to wake workers
commit: 8a68648b353bb6e20a3dc8c0b914792ce0a0391f
[4/6] io_uring: set task_work notify method at init time
commit: 35ac0da1d1346d182003db278c2d7b2ac32420a7
[5/6] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
commit: a933a9031e40c972c24ce6406e7cea73657728a5
[6/6] io_uring: add IORING_SETUP_TASKRUN_FLAG
commit: 6f07a54a90ee98ae13b37ac358624d7cc7e57850
Best regards,
--
Jens Axboe
* [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
From: Jens Axboe @ 2022-04-26 1:49 UTC
To: io-uring; +Cc: Jens Axboe
Rather than require ctx->completion_lock to ensure that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.
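To spell out the race being closed, a sketch (mirroring the hunks
below, not new code in the patch):

	/* Before: load/modify/store; two CPUs doing this concurrently
	 * can lose an update, hence the completion_lock.
	 */
	WRITE_ONCE(ctx->rings->sq_flags,
		   ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);

	/* After: each update is one indivisible RMW, no lock needed */
	atomic_or(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
	atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);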
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 36 ++++++++++--------------------------
1 file changed, 10 insertions(+), 26 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bf95ef9240e5..511b52e4b9fd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -170,7 +170,7 @@ struct io_rings {
* The application needs a full memory barrier before checking
* for IORING_SQ_NEED_WAKEUP after updating the sq tail.
*/
- u32 sq_flags;
+ atomic_t sq_flags;
/*
* Runtime CQ flags
*
@@ -2060,8 +2060,7 @@ static bool __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool force)
all_flushed = list_empty(&ctx->cq_overflow_list);
if (all_flushed) {
clear_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_CQ_OVERFLOW);
+ atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
io_commit_cqring(ctx);
@@ -2155,8 +2154,7 @@ static bool io_cqring_event_overflow(struct io_ring_ctx *ctx, u64 user_data,
}
if (list_empty(&ctx->cq_overflow_list)) {
set_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_CQ_OVERFLOW);
+ atomic_or(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags);
}
ocqe->cqe.user_data = user_data;
@@ -8477,23 +8475,6 @@ static inline bool io_sqd_events_pending(struct io_sq_data *sqd)
return READ_ONCE(sqd->state);
}
-static inline void io_ring_set_wakeup_flag(struct io_ring_ctx *ctx)
-{
- /* Tell userspace we may need a wakeup call */
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags | IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
-static inline void io_ring_clear_wakeup_flag(struct io_ring_ctx *ctx)
-{
- spin_lock(&ctx->completion_lock);
- WRITE_ONCE(ctx->rings->sq_flags,
- ctx->rings->sq_flags & ~IORING_SQ_NEED_WAKEUP);
- spin_unlock(&ctx->completion_lock);
-}
-
static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
{
unsigned int to_submit;
@@ -8609,8 +8590,8 @@ static int io_sq_thread(void *data)
bool needs_sched = true;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
- io_ring_set_wakeup_flag(ctx);
-
+ atomic_or(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
if ((ctx->flags & IORING_SETUP_IOPOLL) &&
!wq_list_empty(&ctx->iopoll_list)) {
needs_sched = false;
@@ -8635,7 +8616,8 @@ static int io_sq_thread(void *data)
mutex_lock(&sqd->lock);
}
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_clear_wakeup_flag(ctx);
+ atomic_andnot(IORING_SQ_NEED_WAKEUP,
+ &ctx->rings->sq_flags);
}
finish_wait(&sqd->wait, &wait);
@@ -8645,7 +8627,7 @@ static int io_sq_thread(void *data)
io_uring_cancel_generic(true, sqd);
sqd->thread = NULL;
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
- io_ring_set_wakeup_flag(ctx);
+ atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
io_run_task_work();
mutex_unlock(&sqd->lock);
@@ -12399,6 +12381,8 @@ static int __init io_uring_init(void)
BUILD_BUG_ON(ARRAY_SIZE(io_op_defs) != IORING_OP_LAST);
BUILD_BUG_ON(__REQ_F_LAST_BIT > 8 * sizeof(int));
+ BUILD_BUG_ON(sizeof(atomic_t) != sizeof(u32));
+
req_cachep = KMEM_CACHE(io_kiocb, SLAB_HWCACHE_ALIGN | SLAB_PANIC |
SLAB_ACCOUNT);
return 0;
--
2.35.1
* Re: [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
From: Almog Khaikin @ 2022-04-26 15:03 UTC
To: Jens Axboe, io-uring
On 4/26/22 04:49, Jens Axboe wrote:
> Rather than require ctx->completion_lock for ensuring that we don't
> clobber the flags, use the atomic bitop helpers instead. This removes
> the need to grab the completion_lock, in preparation for needing to set
> or clear sq_flags when we don't know the status of this lock.
The smp_mb() in io_sq_thread() should also be changed to
smp_mb__after_atomic().
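I.e., something along these lines at the flag-set site (a sketch of
the suggestion, not a tested patch):

	atomic_or(IORING_SQ_NEED_WAKEUP, &ctx->rings->sq_flags);
	/* non-value-returning atomics don't imply ordering; pair the
	 * store with the cheaper barrier instead of a full smp_mb()
	 */
	smp_mb__after_atomic();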
--
Almog Khaikin
* Re: [PATCH 2/6] io_uring: serialize ctx->rings->sq_flags with atomic_or/and
From: Jens Axboe @ 2022-04-26 15:32 UTC
To: Almog Khaikin, io-uring
On 4/26/22 9:03 AM, Almog Khaikin wrote:
> On 4/26/22 04:49, Jens Axboe wrote:
>> Rather than require ctx->completion_lock for ensuring that we don't
>> clobber the flags, use the atomic bitop helpers instead. This removes
>> the need to grab the completion_lock, in preparation for needing to set
>> or clear sq_flags when we don't know the status of this lock.
>
> The smp_mb() in io_sq_thread() should also be changed to
> smp_mb__after_atomic()
Indeed, want to send a patch?
--
Jens Axboe
* [PATCH 3/6] io-wq: use __set_notify_signal() to wake workers
From: Jens Axboe @ 2022-04-26 1:49 UTC
To: io-uring; +Cc: Jens Axboe
The only difference between set_notify_signal() and __set_notify_signal()
is that the former checks if it needs to deliver an IPI to force a
reschedule. As the io-wq workers never leave the kernel, an IPI is never
needed; they simply need a wakeup.
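For reference, patch 1/6 defines the former in terms of the latter:

	static inline void set_notify_signal(struct task_struct *task)
	{
		if (__set_notify_signal(task))	/* flag set, task not woken */
			kick_process(task);	/* IPI, for userspace execution */
	}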
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io-wq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c
index 32aeb2c581c5..824623bcf1a5 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
static bool io_wq_worker_wake(struct io_worker *worker, void *data)
{
- set_notify_signal(worker->task);
+ __set_notify_signal(worker->task);
wake_up_process(worker->task);
return false;
}
@@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
{
if (work && match->fn(work, match->data)) {
work->flags |= IO_WQ_WORK_CANCEL;
- set_notify_signal(worker->task);
+ __set_notify_signal(worker->task);
return true;
}
--
2.35.1
* [PATCH 4/6] io_uring: set task_work notify method at init time
From: Jens Axboe @ 2022-04-26 1:49 UTC
To: io-uring; +Cc: Jens Axboe
While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well, as that
just does a task wakeup; we can then remove the special wakeup we have
in io_req_task_work_add().
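In effect (sketch; compare the io_req_task_work_add() hunk below):

	/* old SQPOLL path: queue with no notification, wake manually */
	if (!task_work_add(tsk, &tctx->task_work, TWA_NONE))
		wake_up_process(tsk);

	/* new: the per-ring notify method does the wakeup itself */
	task_work_add(tsk, &tctx->task_work, ctx->notify_method);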
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 511b52e4b9fd..7e9ac5fd3a8c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -367,6 +367,7 @@ struct io_ring_ctx {
struct io_rings *rings;
unsigned int flags;
+ enum task_work_notify_mode notify_method;
unsigned int compat: 1;
unsigned int drain_next: 1;
unsigned int restricted: 1;
@@ -2651,8 +2652,8 @@ static void tctx_task_work(struct callback_head *cb)
static void io_req_task_work_add(struct io_kiocb *req, bool priority)
{
struct task_struct *tsk = req->task;
+ struct io_ring_ctx *ctx = req->ctx;
struct io_uring_task *tctx = tsk->io_uring;
- enum task_work_notify_mode notify;
struct io_wq_work_node *node;
unsigned long flags;
bool running;
@@ -2675,18 +2676,8 @@ static void io_req_task_work_add(struct io_kiocb *req, bool priority)
if (running)
return;
- /*
- * SQPOLL kernel thread doesn't need notification, just a wakeup. For
- * all other cases, use TWA_SIGNAL unconditionally to ensure we're
- * processing task_work. There's no reliable way to tell if TWA_RESUME
- * will do the job.
- */
- notify = (req->ctx->flags & IORING_SETUP_SQPOLL) ? TWA_NONE : TWA_SIGNAL;
- if (likely(!task_work_add(tsk, &tctx->task_work, notify))) {
- if (notify == TWA_NONE)
- wake_up_process(tsk);
+ if (likely(!task_work_add(tsk, &tctx->task_work, ctx->notify_method)))
return;
- }
spin_lock_irqsave(&tctx->task_lock, flags);
tctx->task_running = false;
@@ -11704,6 +11695,14 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
if (!capable(CAP_IPC_LOCK))
ctx->user = get_uid(current_user());
+ /*
+ * For SQPOLL, we just need a wakeup, always.
+ */
+ if (ctx->flags & IORING_SETUP_SQPOLL)
+ ctx->notify_method = TWA_SIGNAL_NO_IPI;
+ else
+ ctx->notify_method = TWA_SIGNAL;
+
/*
* This is just grabbed for accounting purposes. When a process exits,
* the mm is exited and dropped before the files, hence we need to hang
--
2.35.1
* [PATCH 5/6] io_uring: use TWA_SIGNAL_NO_IPI if IORING_SETUP_COOP_TASKRUN is used
From: Jens Axboe @ 2022-04-26 1:49 UTC
To: io-uring; +Cc: Jens Axboe
If this is set, io_uring will never use an IPI to deliver a task_work
notification. This can be used in the common case where a single task or
thread communicates with the ring, and doesn't rely on
io_uring_peek_cqe().

This provides a noticeable win in performance, both from eliminating
the IPI itself and from avoiding needlessly interrupting the submitting
task.
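For example, an application could opt in like this (a sketch using
liburing's io_uring_queue_init_params(); error handling trimmed):

	struct io_uring ring;
	struct io_uring_params p = { };
	int ret;

	p.flags = IORING_SETUP_COOP_TASKRUN;	/* no task_work IPIs */
	ret = io_uring_queue_init_params(8, &ring, &p);
	if (ret < 0)
		return ret;	/* e.g. -EINVAL on kernels without the flag */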
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 17 +++++++++++++----
include/uapi/linux/io_uring.h | 8 ++++++++
2 files changed, 21 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7e9ac5fd3a8c..5e4842cd21c2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11696,12 +11696,20 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
ctx->user = get_uid(current_user());
/*
- * For SQPOLL, we just need a wakeup, always.
+ * For SQPOLL, we just need a wakeup, always. For !SQPOLL, if
+ * COOP_TASKRUN is set, then IPIs are never needed by the app.
*/
- if (ctx->flags & IORING_SETUP_SQPOLL)
+ ret = -EINVAL;
+ if (ctx->flags & IORING_SETUP_SQPOLL) {
+ /* IPI related flags don't make sense with SQPOLL */
+ if (ctx->flags & IORING_SETUP_COOP_TASKRUN)
+ goto err;
ctx->notify_method = TWA_SIGNAL_NO_IPI;
- else
+ } else if (ctx->flags & IORING_SETUP_COOP_TASKRUN) {
+ ctx->notify_method = TWA_SIGNAL_NO_IPI;
+ } else {
ctx->notify_method = TWA_SIGNAL;
+ }
/*
* This is just grabbed for accounting purposes. When a process exits,
@@ -11800,7 +11808,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
if (p.flags & ~(IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL |
IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
- IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL))
+ IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
+ IORING_SETUP_COOP_TASKRUN))
return -EINVAL;
return io_uring_create(entries, &p, params);
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 5fb52bf32435..4654842ace88 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -104,6 +104,14 @@ enum {
#define IORING_SETUP_ATTACH_WQ (1U << 5) /* attach to existing wq */
#define IORING_SETUP_R_DISABLED (1U << 6) /* start with ring disabled */
#define IORING_SETUP_SUBMIT_ALL (1U << 7) /* continue submit on error */
+/*
+ * Cooperative task running. When requests complete, they often require
+ * forcing the submitter to transition to the kernel to complete. If this
+ * flag is set, work will be done when the task transitions anyway, rather
+ * than force an inter-processor interrupt reschedule. This avoids interrupting
+ * a task running in userspace, and saves an IPI.
+ */
+#define IORING_SETUP_COOP_TASKRUN (1U << 8)
enum {
IORING_OP_NOP,
--
2.35.1
* [PATCH 6/6] io_uring: add IORING_SETUP_TASKRUN_FLAG
From: Jens Axboe @ 2022-04-26 1:49 UTC
To: io-uring; +Cc: Jens Axboe
If IORING_SETUP_COOP_TASKRUN is set to use cooperative scheduling for
running task_work, then IORING_SETUP_TASKRUN_FLAG can be set so the
application can tell if task_work is pending in the kernel for this
ring. This allows use cases like io_uring_peek_cqe() to still function
appropriately, and lets the task know when it would be useful to call
io_uring_wait_cqe() to run pending events.
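Userspace usage might look like this (sketch: sq_flags is assumed to
point at the mmap'ed SQ ring flags word, and io_uring_enter() stands in
for the raw syscall):

	unsigned flags = *(volatile unsigned *)sq_flags;

	if (flags & IORING_SQ_TASKRUN)
		/* task_work is pending; enter the kernel so it runs and
		 * completions become visible before peeking the CQ ring
		 */
		io_uring_enter(ring_fd, 0, 0, IORING_ENTER_GETEVENTS, NULL);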
Signed-off-by: Jens Axboe <[email protected]>
---
fs/io_uring.c | 14 +++++++++++---
include/uapi/linux/io_uring.h | 7 +++++++
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5e4842cd21c2..2c859ab326cd 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2536,6 +2536,8 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
{
if (!ctx)
return;
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
if (*locked) {
io_submit_flush_completions(ctx);
mutex_unlock(&ctx->uring_lock);
@@ -2676,6 +2678,9 @@ static void io_req_task_work_add(struct io_kiocb *req, bool priority)
if (running)
return;
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+
if (likely(!task_work_add(tsk, &tctx->task_work, ctx->notify_method)))
return;
@@ -11702,12 +11707,15 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
ret = -EINVAL;
if (ctx->flags & IORING_SETUP_SQPOLL) {
/* IPI related flags don't make sense with SQPOLL */
- if (ctx->flags & IORING_SETUP_COOP_TASKRUN)
+ if (ctx->flags & (IORING_SETUP_COOP_TASKRUN |
+ IORING_SETUP_TASKRUN_FLAG))
goto err;
ctx->notify_method = TWA_SIGNAL_NO_IPI;
} else if (ctx->flags & IORING_SETUP_COOP_TASKRUN) {
ctx->notify_method = TWA_SIGNAL_NO_IPI;
} else {
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
+ goto err;
ctx->notify_method = TWA_SIGNAL;
}
@@ -11809,10 +11817,10 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
IORING_SETUP_SQ_AFF | IORING_SETUP_CQSIZE |
IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
- IORING_SETUP_COOP_TASKRUN))
+ IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG))
return -EINVAL;
- return  io_uring_create(entries, &p, params);
+ return io_uring_create(entries, &p, params);
}
SYSCALL_DEFINE2(io_uring_setup, u32, entries,
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 4654842ace88..ad53def6abb8 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -112,6 +112,12 @@ enum {
* a task running in userspace, and saves an IPI.
*/
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
+/*
+ * If COOP_TASKRUN is set, get notified if task work is available for
+ * running and a kernel transition would be needed to run it. This sets
+ * IORING_SQ_TASKRUN in the sq ring flags. Not valid without COOP_TASKRUN.
+ */
+#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
enum {
IORING_OP_NOP,
@@ -263,6 +269,7 @@ struct io_sqring_offsets {
*/
#define IORING_SQ_NEED_WAKEUP (1U << 0) /* needs io_uring_enter wakeup */
#define IORING_SQ_CQ_OVERFLOW (1U << 1) /* CQ ring is overflown */
+#define IORING_SQ_TASKRUN (1U << 2) /* task should enter the kernel */
struct io_cqring_offsets {
__u32 head;
--
2.35.1
* Re: [PATCHSET v4 next 0/5] Add support for non-IPI task_work
From: Pavel Begunkov @ 2022-04-26 14:02 UTC
To: Jens Axboe, io-uring
On 4/26/22 02:48, Jens Axboe wrote:
> Hi,
>
> Unless we're using SQPOLL, any task_work queueing will result in an IPI
> to the target task unless it's running in the kernel already. This isn't
> always needed, particularly not for the common case of not sharing the
> ring. In certain workloads, this can provide a 5-10% improvement. Some
> of this is due to the cost of the IPI, and some from needlessly
> interrupting the target task when the work could just get run when
> completions are being waited for.
>
> Patches 1..4 are prep patches, patch 5 is the actual change, and patch 6
> adds support for IORING_SQ_TASKRUN so that applications may use this
> feature and still rely on io_uring_peek_cqe().
>
> v4:
> - Make SQPOLL incompatible with the IPI flags. It makes no sense for
> SQPOLL as no IPIs are ever used there anyway, so make that explicit
> and fail a request to setup a ring like that.
looks good
Reviewed-by: Pavel Begunkov <[email protected]>
--
Pavel Begunkov