* [PATCH 0/8] second part of 5.12 patches
@ 2021-01-25 11:42 Pavel Begunkov
2021-01-25 11:42 ` [PATCH 1/8] io_uring: ensure only sqo_task has file notes Pavel Begunkov
` (8 more replies)
0 siblings, 9 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
Patches 1-2 are simple and can be considered separately.
Patches 3-8 are inline completion optimisations; they should affect buffered rw,
recv/send and other requests that can complete inline.
fio/t/io_uring do_nop=1 benchmark (batch=32) in KIOPS:
baseline (1-5 applied): qd32: 8001, qd1: 2015
arrays (+6/8): qd32: 8128, qd1: 2028
batching (+7/8): qd32: 10300, qd1: 1946
The downside is worse qd1 with batching. I don't think we should care much,
because most of the time is spent syscalling, and I can easily get another
~15-30% for qd32 and 5-10% for qd1 by making the ring's allocation cache
persistent and feeding the memory of inline-executed requests back into it.
Note: this should not affect async executed requests, e.g. block rw,
because they never hit this path.
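For reference, a roughly equivalent nop-only workload can be reproduced with a
small liburing program. This is only an illustrative sketch (queue depth and
batch size are assumptions chosen to match the numbers above), not the
t/io_uring tool actually used for the measurements:

#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	const int qd = 32, batch = 32;
	long iters, i;

	if (io_uring_queue_init(qd, &ring, 0) < 0)
		return 1;

	for (iters = 0; iters < 1000000; iters++) {
		/* queue a batch of no-op requests */
		for (i = 0; i < batch; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			if (!sqe)
				break;
			io_uring_prep_nop(sqe);
		}
		io_uring_submit(&ring);

		/* reap the completions for this batch */
		for (; i > 0; i--) {
			if (io_uring_wait_cqe(&ring, &cqe))
				break;
			io_uring_cqe_seen(&ring, cqe);
		}
	}
	io_uring_queue_exit(&ring);
	return 0;
}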
Pavel Begunkov (8):
io_uring: ensure only sqo_task has file notes
io_uring: consolidate putting reqs task
io_uring: don't keep submit_state on stack
io_uring: remove ctx from comp_state
io_uring: don't reinit submit state every time
io_uring: replace list with array for compl batch
io_uring: submit-completion free batching
io_uring: keep interrupts on on submit completion
fs/io_uring.c | 221 +++++++++++++++++++++++++-------------------------
1 file changed, 110 insertions(+), 111 deletions(-)
--
2.24.0
* [PATCH 1/8] io_uring: ensure only sqo_task has file notes
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 11:42 ` [PATCH 2/8] io_uring: consolidate putting reqs task Pavel Begunkov
` (7 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
For SQPOLL io_uring we want to have only one file note, held by
sqo_task. Add a warning to make sure that holds. It's deep in
io_uring_add_task_file(), out of the hot path, so it shouldn't hurt.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7a17c947e64b..8be7b4c6d304 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9074,6 +9074,10 @@ static int io_uring_add_task_file(struct io_ring_ctx *ctx, struct file *file)
fput(file);
return ret;
}
+
+ /* one and only SQPOLL file note, held by sqo_task */
+ WARN_ON_ONCE((ctx->flags & IORING_SETUP_SQPOLL) &&
+ current != ctx->sqo_task);
}
tctx->last = file;
}
--
2.24.0
* [PATCH 2/8] io_uring: consolidate putting reqs task
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
2021-01-25 11:42 ` [PATCH 1/8] io_uring: ensure only sqo_task has file notes Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 11:42 ` [PATCH 3/8] io_uring: don't keep submit_state on stack Pavel Begunkov
` (6 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
We grab a task for each request, and when putting it we also have to do
extra work like inflight accounting and waking up that task. This
sequence is duplicated in several places, so it's a good time to add a helper.
On top of that, the helper generates better code due to better locality,
and so doesn't defeat alias analysis.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 34 ++++++++++++++--------------------
1 file changed, 14 insertions(+), 20 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8be7b4c6d304..fa421555ab8b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2013,17 +2013,22 @@ static void io_dismantle_req(struct io_kiocb *req)
io_req_clean_work(req);
}
+static inline void io_put_task(struct task_struct *task, int nr)
+{
+ struct io_uring_task *tctx = task->io_uring;
+
+ percpu_counter_sub(&tctx->inflight, nr);
+ if (unlikely(atomic_read(&tctx->in_idle)))
+ wake_up(&tctx->wait);
+ put_task_struct_many(task, nr);
+}
+
static void __io_free_req(struct io_kiocb *req)
{
- struct io_uring_task *tctx = req->task->io_uring;
struct io_ring_ctx *ctx = req->ctx;
io_dismantle_req(req);
-
- percpu_counter_dec(&tctx->inflight);
- if (atomic_read(&tctx->in_idle))
- wake_up(&tctx->wait);
- put_task_struct(req->task);
+ io_put_task(req->task, 1);
if (likely(!io_is_fallback_req(req)))
kmem_cache_free(req_cachep, req);
@@ -2277,12 +2282,7 @@ static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
if (rb->to_free)
__io_req_free_batch_flush(ctx, rb);
if (rb->task) {
- struct io_uring_task *tctx = rb->task->io_uring;
-
- percpu_counter_sub(&tctx->inflight, rb->task_refs);
- if (atomic_read(&tctx->in_idle))
- wake_up(&tctx->wait);
- put_task_struct_many(rb->task, rb->task_refs);
+ io_put_task(rb->task, rb->task_refs);
rb->task = NULL;
}
}
@@ -2296,14 +2296,8 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
io_queue_next(req);
if (req->task != rb->task) {
- if (rb->task) {
- struct io_uring_task *tctx = rb->task->io_uring;
-
- percpu_counter_sub(&tctx->inflight, rb->task_refs);
- if (atomic_read(&tctx->in_idle))
- wake_up(&tctx->wait);
- put_task_struct_many(rb->task, rb->task_refs);
- }
+ if (rb->task)
+ io_put_task(rb->task, rb->task_refs);
rb->task = req->task;
rb->task_refs = 0;
}
--
2.24.0
* [PATCH 3/8] io_uring: don't keep submit_state on stack
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
2021-01-25 11:42 ` [PATCH 1/8] io_uring: ensure only sqo_task has file notes Pavel Begunkov
2021-01-25 11:42 ` [PATCH 2/8] io_uring: consolidate putting reqs task Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 16:00 ` Jens Axboe
2021-01-25 11:42 ` [PATCH 4/8] io_uring: remove ctx from comp_state Pavel Begunkov
` (5 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
struct io_submit_state is quite big (168 bytes) and going to grow. It's
better to not keep it on stack as it is now. Move it to context, it's
always protected by uring_lock, so it's fine to have only one instance
of it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 90 ++++++++++++++++++++++++++-------------------------
1 file changed, 46 insertions(+), 44 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fa421555ab8b..9ba33ee08d2a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -259,6 +259,39 @@ struct io_sq_data {
unsigned sq_thread_idle;
};
+#define IO_IOPOLL_BATCH 8
+
+struct io_comp_state {
+ unsigned int nr;
+ struct list_head list;
+ struct io_ring_ctx *ctx;
+};
+
+struct io_submit_state {
+ struct blk_plug plug;
+
+ /*
+ * io_kiocb alloc cache
+ */
+ void *reqs[IO_IOPOLL_BATCH];
+ unsigned int free_reqs;
+
+ bool plug_started;
+
+ /*
+ * Batch completion logic
+ */
+ struct io_comp_state comp;
+
+ /*
+ * File reference cache
+ */
+ struct file *file;
+ unsigned int fd;
+ unsigned int file_refs;
+ unsigned int ios_left;
+};
+
struct io_ring_ctx {
struct {
struct percpu_ref refs;
@@ -401,6 +434,7 @@ struct io_ring_ctx {
struct work_struct exit_work;
struct io_restriction restrictions;
+ struct io_submit_state submit_state;
};
/*
@@ -752,39 +786,6 @@ struct io_defer_entry {
u32 seq;
};
-#define IO_IOPOLL_BATCH 8
-
-struct io_comp_state {
- unsigned int nr;
- struct list_head list;
- struct io_ring_ctx *ctx;
-};
-
-struct io_submit_state {
- struct blk_plug plug;
-
- /*
- * io_kiocb alloc cache
- */
- void *reqs[IO_IOPOLL_BATCH];
- unsigned int free_reqs;
-
- bool plug_started;
-
- /*
- * Batch completion logic
- */
- struct io_comp_state comp;
-
- /*
- * File reference cache
- */
- struct file *file;
- unsigned int fd;
- unsigned int file_refs;
- unsigned int ios_left;
-};
-
struct io_op_def {
/* needs req->file assigned */
unsigned needs_file : 1;
@@ -1965,9 +1966,10 @@ static struct io_kiocb *io_get_fallback_req(struct io_ring_ctx *ctx)
return NULL;
}
-static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx,
- struct io_submit_state *state)
+static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
{
+ struct io_submit_state *state = &ctx->submit_state;
+
if (!state->free_reqs) {
gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
size_t sz;
@@ -6800,9 +6802,9 @@ static inline bool io_check_restriction(struct io_ring_ctx *ctx,
IOSQE_BUFFER_SELECT)
static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
- const struct io_uring_sqe *sqe,
- struct io_submit_state *state)
+ const struct io_uring_sqe *sqe)
{
+ struct io_submit_state *state;
unsigned int sqe_flags;
int id, ret;
@@ -6854,6 +6856,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
/* same numerical values with corresponding REQ_F_*, safe to copy */
req->flags |= sqe_flags;
+ state = &ctx->submit_state;
/*
* Plug now if we have more than 1 IO left after this, and the target
@@ -6881,7 +6884,6 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
{
- struct io_submit_state state;
struct io_submit_link link;
int i, submitted = 0;
@@ -6900,7 +6902,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
percpu_counter_add(¤t->io_uring->inflight, nr);
refcount_add(nr, ¤t->usage);
- io_submit_state_start(&state, ctx, nr);
+ io_submit_state_start(&ctx->submit_state, ctx, nr);
link.head = NULL;
for (i = 0; i < nr; i++) {
@@ -6913,7 +6915,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
io_consume_sqe(ctx);
break;
}
- req = io_alloc_req(ctx, &state);
+ req = io_alloc_req(ctx);
if (unlikely(!req)) {
if (!submitted)
submitted = -EAGAIN;
@@ -6923,7 +6925,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
/* will complete beyond this point, count as submitted */
submitted++;
- err = io_init_req(ctx, req, sqe, &state);
+ err = io_init_req(ctx, req, sqe);
if (unlikely(err)) {
fail_req:
io_put_req(req);
@@ -6933,7 +6935,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
trace_io_uring_submit_sqe(ctx, req->opcode, req->user_data,
true, ctx->flags & IORING_SETUP_SQPOLL);
- err = io_submit_sqe(req, sqe, &link, &state.comp);
+ err = io_submit_sqe(req, sqe, &link, &ctx->submit_state.comp);
if (err)
goto fail_req;
}
@@ -6948,8 +6950,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
put_task_struct_many(current, unused);
}
if (link.head)
- io_queue_link_head(link.head, &state.comp);
- io_submit_state_end(&state);
+ io_queue_link_head(link.head, &ctx->submit_state.comp);
+ io_submit_state_end(&ctx->submit_state);
/* Commit SQ ring head once we've consumed and submitted all SQEs */
io_commit_sqring(ctx);
--
2.24.0
* [PATCH 4/8] io_uring: remove ctx from comp_state
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
` (2 preceding siblings ...)
2021-01-25 11:42 ` [PATCH 3/8] io_uring: don't keep submit_state on stack Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 11:42 ` [PATCH 5/8] io_uring: don't reinit submit state every time Pavel Begunkov
` (4 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
Completion state is closely bound to ctx; we don't need to store ctx
inside it, as we always have the ctx around to pass to flush.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9ba33ee08d2a..7d811cf0c27b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -264,7 +264,6 @@ struct io_sq_data {
struct io_comp_state {
unsigned int nr;
struct list_head list;
- struct io_ring_ctx *ctx;
};
struct io_submit_state {
@@ -1892,10 +1891,9 @@ static void io_req_complete_nostate(struct io_kiocb *req, long res,
io_put_req(req);
}
-static void io_submit_flush_completions(struct io_comp_state *cs)
+static void io_submit_flush_completions(struct io_comp_state *cs,
+ struct io_ring_ctx *ctx)
{
- struct io_ring_ctx *ctx = cs->ctx;
-
spin_lock_irq(&ctx->completion_lock);
while (!list_empty(&cs->list)) {
struct io_kiocb *req;
@@ -6562,7 +6560,7 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
if (req->flags & REQ_F_COMPLETE_INLINE) {
list_add_tail(&req->compl.list, &cs->list);
if (++cs->nr >= 32)
- io_submit_flush_completions(cs);
+ io_submit_flush_completions(cs, req->ctx);
req = NULL;
} else {
req = io_put_req_find_next(req);
@@ -6697,10 +6695,11 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
/*
* Batched submission is done, ensure local IO is flushed out.
*/
-static void io_submit_state_end(struct io_submit_state *state)
+static void io_submit_state_end(struct io_submit_state *state,
+ struct io_ring_ctx *ctx)
{
if (!list_empty(&state->comp.list))
- io_submit_flush_completions(&state->comp);
+ io_submit_flush_completions(&state->comp, ctx);
if (state->plug_started)
blk_finish_plug(&state->plug);
io_state_file_put(state);
@@ -6712,12 +6711,11 @@ static void io_submit_state_end(struct io_submit_state *state)
* Start submission side cache.
*/
static void io_submit_state_start(struct io_submit_state *state,
- struct io_ring_ctx *ctx, unsigned int max_ios)
+ unsigned int max_ios)
{
state->plug_started = false;
state->comp.nr = 0;
INIT_LIST_HEAD(&state->comp.list);
- state->comp.ctx = ctx;
state->free_reqs = 0;
state->file_refs = 0;
state->ios_left = max_ios;
@@ -6902,7 +6900,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
percpu_counter_add(¤t->io_uring->inflight, nr);
refcount_add(nr, ¤t->usage);
- io_submit_state_start(&ctx->submit_state, ctx, nr);
+ io_submit_state_start(&ctx->submit_state, nr);
link.head = NULL;
for (i = 0; i < nr; i++) {
@@ -6951,7 +6949,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
}
if (link.head)
io_queue_link_head(link.head, &ctx->submit_state.comp);
- io_submit_state_end(&ctx->submit_state);
+ io_submit_state_end(&ctx->submit_state, ctx);
/* Commit SQ ring head once we've consumed and submitted all SQEs */
io_commit_sqring(ctx);
--
2.24.0
* [PATCH 5/8] io_uring: don't reinit submit state every time
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
` (3 preceding siblings ...)
2021-01-25 11:42 ` [PATCH 4/8] io_uring: remove ctx from comp_state Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 11:42 ` [PATCH 6/8] io_uring: replace list with array for compl batch Pavel Begunkov
` (3 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
Now that submit_state is retained across syscalls, we can save ourselves
from initialising it from the ground up for each io_submit_sqes(). Set some
fields during ctx allocation, and just keep them always consistent.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7d811cf0c27b..08d0c8b60c2a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1284,6 +1284,7 @@ static inline bool io_is_timeout_noseq(struct io_kiocb *req)
static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
{
+ struct io_submit_state *submit_state;
struct io_ring_ctx *ctx;
int hash_bits;
@@ -1335,6 +1336,12 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_LIST_HEAD(&ctx->rsrc_ref_list);
INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
init_llist_head(&ctx->rsrc_put_llist);
+
+ submit_state = &ctx->submit_state;
+ INIT_LIST_HEAD(&submit_state->comp.list);
+ submit_state->comp.nr = 0;
+ submit_state->file_refs = 0;
+ submit_state->free_reqs = 0;
return ctx;
err:
if (ctx->fallback_req)
@@ -6703,8 +6710,10 @@ static void io_submit_state_end(struct io_submit_state *state,
if (state->plug_started)
blk_finish_plug(&state->plug);
io_state_file_put(state);
- if (state->free_reqs)
+ if (state->free_reqs) {
kmem_cache_free_bulk(req_cachep, state->free_reqs, state->reqs);
+ state->free_reqs = 0;
+ }
}
/*
@@ -6714,10 +6723,6 @@ static void io_submit_state_start(struct io_submit_state *state,
unsigned int max_ios)
{
state->plug_started = false;
- state->comp.nr = 0;
- INIT_LIST_HEAD(&state->comp.list);
- state->free_reqs = 0;
- state->file_refs = 0;
state->ios_left = max_ios;
}
--
2.24.0
* [PATCH 6/8] io_uring: replace list with array for compl batch
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
` (4 preceding siblings ...)
2021-01-25 11:42 ` [PATCH 5/8] io_uring: don't reinit submit state every time Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 11:42 ` [PATCH 7/8] io_uring: submit-completion free batching Pavel Begunkov
` (2 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
Reincarnation of an old patch that replaces the list in struct
io_comp_state with an array. It's needed to avoid hooking requests via
their compl.list, because that won't always be available in the future.
It's also nice to split io_submit_flush_completions() so that requests are
freed outside the lock, removing the unlock/lock dance and the long comment
describing when it could be done.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 35 +++++++++++------------------------
1 file changed, 11 insertions(+), 24 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 08d0c8b60c2a..fcd8df43b6bf 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -260,10 +260,11 @@ struct io_sq_data {
};
#define IO_IOPOLL_BATCH 8
+#define IO_COMPL_BATCH 32
struct io_comp_state {
unsigned int nr;
- struct list_head list;
+ struct io_kiocb *reqs[IO_COMPL_BATCH];
};
struct io_submit_state {
@@ -1338,7 +1339,6 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
init_llist_head(&ctx->rsrc_put_llist);
submit_state = &ctx->submit_state;
- INIT_LIST_HEAD(&submit_state->comp.list);
submit_state->comp.nr = 0;
submit_state->file_refs = 0;
submit_state->free_reqs = 0;
@@ -1901,33 +1901,20 @@ static void io_req_complete_nostate(struct io_kiocb *req, long res,
static void io_submit_flush_completions(struct io_comp_state *cs,
struct io_ring_ctx *ctx)
{
+ int i, nr = cs->nr;
+
spin_lock_irq(&ctx->completion_lock);
- while (!list_empty(&cs->list)) {
- struct io_kiocb *req;
+ for (i = 0; i < nr; i++) {
+ struct io_kiocb *req = cs->reqs[i];
- req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
- list_del(&req->compl.list);
__io_cqring_fill_event(req, req->result, req->compl.cflags);
-
- /*
- * io_free_req() doesn't care about completion_lock unless one
- * of these flags is set. REQ_F_WORK_INITIALIZED is in the list
- * because of a potential deadlock with req->work.fs->lock
- * We defer both, completion and submission refs.
- */
- if (req->flags & (REQ_F_FAIL_LINK|REQ_F_LINK_TIMEOUT
- |REQ_F_WORK_INITIALIZED)) {
- spin_unlock_irq(&ctx->completion_lock);
- io_double_put_req(req);
- spin_lock_irq(&ctx->completion_lock);
- } else {
- io_double_put_req(req);
- }
}
io_commit_cqring(ctx);
spin_unlock_irq(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
+ for (i = 0; i < nr; i++)
+ io_double_put_req(cs->reqs[i]);
cs->nr = 0;
}
@@ -6565,8 +6552,8 @@ static void __io_queue_sqe(struct io_kiocb *req, struct io_comp_state *cs)
} else if (likely(!ret)) {
/* drop submission reference */
if (req->flags & REQ_F_COMPLETE_INLINE) {
- list_add_tail(&req->compl.list, &cs->list);
- if (++cs->nr >= 32)
+ cs->reqs[cs->nr++] = req;
+ if (cs->nr == IO_COMPL_BATCH)
io_submit_flush_completions(cs, req->ctx);
req = NULL;
} else {
@@ -6705,7 +6692,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
static void io_submit_state_end(struct io_submit_state *state,
struct io_ring_ctx *ctx)
{
- if (!list_empty(&state->comp.list))
+ if (state->comp.nr)
io_submit_flush_completions(&state->comp, ctx);
if (state->plug_started)
blk_finish_plug(&state->plug);
--
2.24.0
* [PATCH 7/8] io_uring: submit-completion free batching
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
` (5 preceding siblings ...)
2021-01-25 11:42 ` [PATCH 6/8] io_uring: replace list with array for compl batch Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 11:42 ` [PATCH 8/8] io_uring: keep interrupts on on submit completion Pavel Begunkov
2021-01-25 16:08 ` [PATCH 0/8] second part of 5.12 patches Jens Axboe
8 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_submit_flush_completions() does completion batching, but may also use
free batching as iopoll does. The main beneficiaries should be buffered
reads/writes and send/recv.
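For context, the "free batching" referenced here is the existing req_batch
machinery already used on the iopoll side. Its rough shape, reconstructed from
the hunks earlier in this series (treat the exact field layout as an
approximation rather than verbatim kernel source), is:

struct req_batch {
	void *reqs[IO_IOPOLL_BATCH];	/* requests queued for bulk freeing */
	int to_free;			/* filled entries of reqs[] */
	struct task_struct *task;	/* task whose refs are being batched */
	int task_refs;			/* refs to drop for ->task */
};

io_req_free_batch() collects requests into it, and io_req_free_batch_finish()
flushes whatever is left, dropping the batched task references via the
io_put_task() helper added in patch 2.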
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 49 +++++++++++++++++++++++++++++--------------------
1 file changed, 29 insertions(+), 20 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fcd8df43b6bf..7a720995e24f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1898,26 +1898,6 @@ static void io_req_complete_nostate(struct io_kiocb *req, long res,
io_put_req(req);
}
-static void io_submit_flush_completions(struct io_comp_state *cs,
- struct io_ring_ctx *ctx)
-{
- int i, nr = cs->nr;
-
- spin_lock_irq(&ctx->completion_lock);
- for (i = 0; i < nr; i++) {
- struct io_kiocb *req = cs->reqs[i];
-
- __io_cqring_fill_event(req, req->result, req->compl.cflags);
- }
- io_commit_cqring(ctx);
- spin_unlock_irq(&ctx->completion_lock);
-
- io_cqring_ev_posted(ctx);
- for (i = 0; i < nr; i++)
- io_double_put_req(cs->reqs[i]);
- cs->nr = 0;
-}
-
static void io_req_complete_state(struct io_kiocb *req, long res,
unsigned int cflags, struct io_comp_state *cs)
{
@@ -2303,6 +2283,35 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
__io_req_free_batch_flush(req->ctx, rb);
}
+static void io_submit_flush_completions(struct io_comp_state *cs,
+ struct io_ring_ctx *ctx)
+{
+ int i, nr = cs->nr;
+ struct io_kiocb *req;
+ struct req_batch rb;
+
+ io_init_req_batch(&rb);
+ spin_lock_irq(&ctx->completion_lock);
+ for (i = 0; i < nr; i++) {
+ req = cs->reqs[i];
+ __io_cqring_fill_event(req, req->result, req->compl.cflags);
+ }
+ io_commit_cqring(ctx);
+ spin_unlock_irq(&ctx->completion_lock);
+
+ io_cqring_ev_posted(ctx);
+ for (i = 0; i < nr; i++) {
+ req = cs->reqs[i];
+
+ /* submission and completion refs */
+ if (refcount_sub_and_test(2, &req->refs))
+ io_req_free_batch(&rb, req);
+ }
+
+ io_req_free_batch_finish(ctx, &rb);
+ cs->nr = 0;
+}
+
/*
* Drop reference to request, return next in chain (if there is one) if this
* was the last reference to this request.
--
2.24.0
* [PATCH 8/8] io_uring: keep interrupts on on submit completion
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
` (6 preceding siblings ...)
2021-01-25 11:42 ` [PATCH 7/8] io_uring: submit-completion free batching Pavel Begunkov
@ 2021-01-25 11:42 ` Pavel Begunkov
2021-01-25 16:02 ` Jens Axboe
2021-01-25 16:08 ` [PATCH 0/8] second part of 5.12 patches Jens Axboe
8 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 11:42 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't call io_submit_flush_completions() in interrupt context, no
need to use irq* versions of spinlock.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7a720995e24f..d5baf7a0d4e7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2291,13 +2291,13 @@ static void io_submit_flush_completions(struct io_comp_state *cs,
struct req_batch rb;
io_init_req_batch(&rb);
- spin_lock_irq(&ctx->completion_lock);
+ spin_lock(&ctx->completion_lock);
for (i = 0; i < nr; i++) {
req = cs->reqs[i];
__io_cqring_fill_event(req, req->result, req->compl.cflags);
}
io_commit_cqring(ctx);
- spin_unlock_irq(&ctx->completion_lock);
+ spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
for (i = 0; i < nr; i++) {
--
2.24.0
* Re: [PATCH 3/8] io_uring: don't keep submit_state on stack
2021-01-25 11:42 ` [PATCH 3/8] io_uring: don't keep submit_state on stack Pavel Begunkov
@ 2021-01-25 16:00 ` Jens Axboe
2021-01-25 16:25 ` Pavel Begunkov
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-01-25 16:00 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 1/25/21 4:42 AM, Pavel Begunkov wrote:
> struct io_submit_state is quite big (168 bytes) and going to grow. It's
> better to not keep it on stack as it is now. Move it to context, it's
> always protected by uring_lock, so it's fine to have only one instance
> of it.
I don't like this one. Unless you have plans to make it much bigger,
I think it should stay on the stack. On the stack, the ownership is
clear.
--
Jens Axboe
* Re: [PATCH 8/8] io_uring: keep interrupts on on submit completion
2021-01-25 11:42 ` [PATCH 8/8] io_uring: keep interrupts on on submit completion Pavel Begunkov
@ 2021-01-25 16:02 ` Jens Axboe
0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2021-01-25 16:02 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 1/25/21 4:42 AM, Pavel Begunkov wrote:
> We don't call io_submit_flush_completions() in interrupt context, no
> need to use irq* versions of spinlock.
_irq() isn't safe to use from IRQ context, it would have to be
_irqsave() for that. So the point here isn't to ensure that we work from
IRQ context, it's to protect against a deadlock when the lock is _also_
grabbed from IRQ context. And for that, we need to make sure that IRQs
are disabled, if this lock is already grabbed from IRQ context.
So this patch is wrong, the code is correct as-is.
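To illustrate with a generic sketch (not code from fs/io_uring.c; the lock and
handler names are made up): if a lock is also taken from hard-irq context, the
process-context side has to disable interrupts while holding it, otherwise an
interrupt arriving on the same CPU can spin forever on a lock that CPU already
owns:

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(demo_lock);

/* Process context: a plain spin_lock() here would be a bug, because
 * demo_irq_handler() could interrupt us on this CPU while we hold the lock. */
static void demo_process_side(void)
{
	spin_lock_irq(&demo_lock);
	/* ... touch shared state ... */
	spin_unlock_irq(&demo_lock);
}

/* Hard-irq context: interrupts are already disabled, so a plain spin_lock()
 * is enough. */
static irqreturn_t demo_irq_handler(int irq, void *data)
{
	spin_lock(&demo_lock);
	/* ... touch shared state ... */
	spin_unlock(&demo_lock);
	return IRQ_HANDLED;
}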
--
Jens Axboe
* Re: [PATCH 0/8] second part of 5.12 patches
2021-01-25 11:42 [PATCH 0/8] second part of 5.12 patches Pavel Begunkov
` (7 preceding siblings ...)
2021-01-25 11:42 ` [PATCH 8/8] io_uring: keep interrupts on on submit completion Pavel Begunkov
@ 2021-01-25 16:08 ` Jens Axboe
2021-01-25 16:56 ` Pavel Begunkov
8 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-01-25 16:08 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 1/25/21 4:42 AM, Pavel Begunkov wrote:
Applied 1-2 for now. Maybe the context state is workable for the
state, if the numbers are looking this good. I'll try and run
it here too and see what I find.
--
Jens Axboe
* Re: [PATCH 3/8] io_uring: don't keep submit_state on stack
2021-01-25 16:00 ` Jens Axboe
@ 2021-01-25 16:25 ` Pavel Begunkov
2021-01-25 16:31 ` Jens Axboe
0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 16:25 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 25/01/2021 16:00, Jens Axboe wrote:
> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>> struct io_submit_state is quite big (168 bytes) and going to grow. It's
>> better to not keep it on stack as it is now. Move it to context, it's
>> always protected by uring_lock, so it's fine to have only one instance
>> of it.
>
> I don't like this one. Unless you have plans to make it much bigger,
> I think it should stay on the stack. On the stack, the ownership is
> clear.
Thinking of it, it's not needed for this series, just traversing a list
twice is not nice but bearable.
For experiments I was using its persistency across syscalls + grew it
to 32 to match up completion flush (allocating still by 8) to add req
memory reuse, but that's out of scope of these patches.
I haven't got a strong opinion on that one yet, even though
alloc/dealloc are pretty heavy, this approach may lose allocation
locality.
--
Pavel Begunkov
* Re: [PATCH 3/8] io_uring: don't keep submit_state on stack
2021-01-25 16:25 ` Pavel Begunkov
@ 2021-01-25 16:31 ` Jens Axboe
2021-01-25 16:52 ` Pavel Begunkov
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-01-25 16:31 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 1/25/21 9:25 AM, Pavel Begunkov wrote:
> On 25/01/2021 16:00, Jens Axboe wrote:
>> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>>> struct io_submit_state is quite big (168 bytes) and going to grow. It's
>>> better to not keep it on stack as it is now. Move it to context, it's
>>> always protected by uring_lock, so it's fine to have only one instance
>>> of it.
>>
>> I don't like this one. Unless you have plans to make it much bigger,
>> I think it should stay on the stack. On the stack, the ownership is
>> clear.
>
> Thinking of it, it's not needed for this series, just traversing a list
> twice is not nice but bearable.
>
> For experiments I was using its persistency across syscalls + grew it
> to 32 to match up completion flush (allocating still by 8) to add req
> memory reuse, but that's out of scope of these patches.
> I haven't got a strong opinion on that one yet, even though
> alloc/dealloc are pretty heavy, this approach may lose allocation
> locality.
Agree on all of that. Locality is important, but reuse usually gets
pretty useful as long as the total number (and life time) can be
managed.
--
Jens Axboe
* Re: [PATCH 3/8] io_uring: don't keep submit_state on stack
2021-01-25 16:31 ` Jens Axboe
@ 2021-01-25 16:52 ` Pavel Begunkov
0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 16:52 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 25/01/2021 16:31, Jens Axboe wrote:
> On 1/25/21 9:25 AM, Pavel Begunkov wrote:
>> On 25/01/2021 16:00, Jens Axboe wrote:
>>> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>>>> struct io_submit_state is quite big (168 bytes) and going to grow. It's
>>>> better to not keep it on stack as it is now. Move it to context, it's
>>>> always protected by uring_lock, so it's fine to have only one instance
>>>> of it.
>>>
>>> I don't like this one. Unless you have plans to make it much bigger,
>>> I think it should stay on the stack. On the stack, the ownership is
>>> clear.
>>
>> Thinking of it, it's not needed for this series, just traversing a list
>> twice is not nice but bearable.
>>
>> For experiments I was using its persistency across syscalls + grew it
>> to 32 to match up completion flush (allocating still by 8) to add req
>> memory reuse, but that's out of scope of these patches.
>> I haven't got a strong opinion on that one yet, even though
>> alloc/dealloc are pretty heavy, this approach may lose allocation
>> locality.
>
> Agree on all of that. Locality is important, but reuse usually gets
> pretty useful as long as the total number (and life time) can be
> managed.
That all was about reqs completed inline, and for those it is pretty
easy and without any extra synchronisation. Depending on QD/etc.
it slashes 5-25% of overhead (~5-33% t-put boost), from what's left
with this series.
There are also other tricks extending it to async reqs, but that's
rather for hi QD with plugging off and ultra-fast devices.
Let's forget about these patches for now and I'll wrap experiments
into a patchset sometime later.
--
Pavel Begunkov
* Re: [PATCH 0/8] second part of 5.12 patches
2021-01-25 16:08 ` [PATCH 0/8] second part of 5.12 patches Jens Axboe
@ 2021-01-25 16:56 ` Pavel Begunkov
2021-01-25 17:04 ` Jens Axboe
0 siblings, 1 reply; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 16:56 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 25/01/2021 16:08, Jens Axboe wrote:
> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>
> Applied 1-2 for now. Maybe the context state is workable for the
> state, if the numbers are looking this good. I'll try and run
> it here too and see what I find.
I believe I've seen it jumping from ~9.5M to 14M with turbo boost
and not so strict environment, but this claim is not very reliable.
--
Pavel Begunkov
* Re: [PATCH 0/8] second part of 5.12 patches
2021-01-25 16:56 ` Pavel Begunkov
@ 2021-01-25 17:04 ` Jens Axboe
2021-01-25 17:05 ` Pavel Begunkov
0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2021-01-25 17:04 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 1/25/21 9:56 AM, Pavel Begunkov wrote:
> On 25/01/2021 16:08, Jens Axboe wrote:
>> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>>
>> Applied 1-2 for now. Maybe the context state is workable for the
>> state, if the numbers are looking this good. I'll try and run
>> it here too and see what I find.
>
> I believe I've seen it jumping from ~9.5M to 14M with turbo boost
> and not so strict environment, but this claim is not very reliable.
Would be nice to see some numbers using eg a network echo bench,
or storage that actually does IO. Even null_blk is better than just
a nop benchmark, even if that one is nice to use as well. But better
as a complement than the main thing.
--
Jens Axboe
* Re: [PATCH 0/8] second part of 5.12 patches
2021-01-25 17:04 ` Jens Axboe
@ 2021-01-25 17:05 ` Pavel Begunkov
0 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2021-01-25 17:05 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 25/01/2021 17:04, Jens Axboe wrote:
> On 1/25/21 9:56 AM, Pavel Begunkov wrote:
>> On 25/01/2021 16:08, Jens Axboe wrote:
>>> On 1/25/21 4:42 AM, Pavel Begunkov wrote:
>>>
>>> Applied 1-2 for now. Maybe the context state is workable for the
>>> state, if the numbers are looking this good. I'll try and run
>>> it here too and see what I find.
>>
>> I believe I've seen it jumping from ~9.5M to 14M with turbo boost
>> and not so strict environment, but this claim is not very reliable.
>
> Would be nice to see some numbers using eg a network echo bench,
> or storage that actually does IO. Even null_blk is better than just
> a nop benchmark, even if that one is nice to use as well. But better
> as a complement than the main thing.
I think direct I/O on null_blk won't cut it, because it completes through a
callback, but maybe some buffered rw would. Yeah, good idea to try something
more realistic.
--
Pavel Begunkov