From: Pavel Begunkov <[email protected]>
To: [email protected]
Cc: Jens Axboe <[email protected]>
Subject: Re: [RFC] io_uring: wake up optimisations
Date: Tue, 20 Dec 2022 18:06:20 +0000
Message-ID: <[email protected]>
In-Reply-To: <81104db1a04efbfcec90f5819081b4299542671a.1671559005.git.asml.silence@gmail.com>
On 12/20/22 17:58, Pavel Begunkov wrote:
> NOT FOR INCLUSION, needs some ring poll workarounds
>
> Flush completions is done either from the submit syscall or by the
> task_work, both are in the context of the submitter task, and when it
> goes for a single threaded rings like implied by ->task_complete, there
> won't be any waiters on ->cq_wait but the master task. That means that
> there can be no tasks sleeping on cq_wait while we run
> __io_submit_flush_completions() and so waking up can be skipped.
This is not trivial to benchmark, as we need something to emulate
task_work (tw) coming in the middle of waiting. I used the diff below,
which completes nops via tw and removes the preliminary tw runs, so that
completions arrive while the task is already waiting.
IORING_SETUP_SKIP_CQWAKE controls whether the optimisation is used.
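For readers who want to see the toggle in isolation, here is a minimal
userspace model of the check the diff puts into
__io_cq_unlock_post_flush(). The two flag values are copied from the
uapi hunk of this patch; struct model_ctx and model_should_wake() are
hypothetical names made up for the illustration and exist nowhere in
the kernel. It is only a sketch of the condition, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in the uapi hunk of this RFC. */
#define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
#define IORING_SETUP_SKIP_CQWAKE	(1U << 14)	/* test-only flag from this RFC */

/* Hypothetical stand-in for the single io_ring_ctx field the check reads. */
struct model_ctx {
	uint32_t flags;
};

/*
 * Mirrors the patched condition: the completion flush wakes ->cq_wait
 * unless the ring was created with IORING_SETUP_SKIP_CQWAKE.
 */
static bool model_should_wake(const struct model_ctx *ctx)
{
	return !(ctx->flags & IORING_SETUP_SKIP_CQWAKE);
}
```

With only IORING_SETUP_DEFER_TASKRUN set the model reports that a wakeup
is issued; adding IORING_SETUP_SKIP_CQWAKE suppresses it, which is the
A/B switch used for the numbers in this mail.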
It gets around 15% more IOPS (6769526 -> 7803304), which correlates
with the roughly 10% wakeup cost seen in profiles. Another interesting
point is that waitqueues are excessive for our purposes, and we can
replace cq_wait with something lighter, e.g. an atomic bit set.
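To make the "atomic bit" idea a bit more concrete, below is a rough
userspace sketch using C11 atomics. It is only one possible shape of
such a scheme and not what the patch implements: struct cq_waiter,
CQ_WAITING, cq_waiter_arm() and cq_waiter_claim_wake() are all
hypothetical names. The waiter sets a bit before it re-checks the CQ
and goes to sleep, and the completion side issues a wakeup only if it
manages to clear that bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical replacement for a waitqueue head: a single bit of state. */
struct cq_waiter {
	atomic_uint state;
};

#define CQ_WAITING	1U

/*
 * Waiter side: announce the intent to sleep. The seq_cst RMW also acts
 * as a full barrier, so a later re-check of the CQ tail is ordered
 * after the bit becomes visible.
 */
static void cq_waiter_arm(struct cq_waiter *w)
{
	atomic_fetch_or(&w->state, CQ_WAITING);
}

/*
 * Completion side: returns true only if a waiter was armed, and clears
 * the bit so that at most one wakeup is issued per arming.
 */
static bool cq_waiter_claim_wake(struct cq_waiter *w)
{
	return atomic_fetch_and(&w->state, ~CQ_WAITING) & CQ_WAITING;
}
```

The completion path would call cq_waiter_claim_wake() after posting
CQEs and wake the task only on a true return, so the common case of
nobody waiting costs one atomic operation and no list walking.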
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9d4c4078e8d0..5a4f03a4ea40 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -164,6 +164,7 @@ enum {
  * try to do it just before it is needed.
  */
 #define IORING_SETUP_DEFER_TASKRUN	(1U << 13)
+#define IORING_SETUP_SKIP_CQWAKE	(1U << 14)
 
 enum io_uring_op {
 	IORING_OP_NOP,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a57b9008807c..68556dea060b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -631,7 +631,7 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
 	 * it will re-check the wakeup conditions once we return we can safely
 	 * skip waking it up.
 	 */
-	if (!ctx->task_complete) {
+	if (!(ctx->flags & IORING_SETUP_SKIP_CQWAKE)) {
 		smp_mb();
 		__io_cqring_wake(ctx);
 	}
@@ -2519,18 +2519,6 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 	if (!io_allowed_run_tw(ctx))
 		return -EEXIST;
 
-	do {
-		/* always run at least 1 task work to process local work */
-		ret = io_run_task_work_ctx(ctx);
-		if (ret < 0)
-			return ret;
-		io_cqring_overflow_flush(ctx);
-
-		/* if user messes with these they will just get an early return */
-		if (__io_cqring_events_user(ctx) >= min_events)
-			return 0;
-	} while (ret > 0);
-
 	if (sig) {
 #ifdef CONFIG_COMPAT
 		if (in_compat_syscall())
@@ -3345,16 +3333,6 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			mutex_unlock(&ctx->uring_lock);
 			goto out;
 		}
-		if (flags & IORING_ENTER_GETEVENTS) {
-			if (ctx->syscall_iopoll)
-				goto iopoll_locked;
-			/*
-			 * Ignore errors, we'll soon call io_cqring_wait() and
-			 * it should handle ownership problems if any.
-			 */
-			if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
-				(void)io_run_local_work_locked(ctx);
-		}
 		mutex_unlock(&ctx->uring_lock);
 	}
 
@@ -3721,7 +3699,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
-			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
+			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
+			IORING_SETUP_SKIP_CQWAKE))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/nop.c b/io_uring/nop.c
index d956599a3c1b..77c686de3eb2 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -20,6 +20,6 @@ int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
  */
 int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 {
-	io_req_set_res(req, 0, 0);
-	return IOU_OK;
+	io_req_queue_tw_complete(req, 0);
+	return IOU_ISSUE_SKIP_COMPLETE;
 }
--
Pavel Begunkov