* [RFC][PATCHSET 00/23] rework/optimise submission+completion paths
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Tested with fio/t/io_uring, nops, all batching set to 32:
24 vs 31.5 MIOPS, a ~30% win.

WARNING: there is one known problem with draining; it will be fixed in v2.
There are two parts:

1-14 optimise the completion path:
- replace lists with singly linked lists
- kill 64 * 8B of caches in ctx
- shuffle some iopoll bits around
- use a list splice instead of a per-req list_add in one place
- inline io_req_free_batch() and other helpers

15-22 inline __io_queue_sqe() so that the whole submission path
up to io_issue_sqe() is inlined, plus small tweaks.
Pavel Begunkov (23):
io_uring: mark having different creds unlikely
io_uring: force_nonspin
io_uring: make io_do_iopoll return number of reqs
io_uring: use slist for completion batching
io_uring: remove allocation cache array
io-wq: add io_wq_work_node based stack
io_uring: replace list with stack for req caches
io_uring: split iopoll loop
io_uring: use single linked list for iopoll
io_uring: add a helper for batch free
io_uring: convert iopoll_completed to store_release
io_uring: optimise batch completion
io_uring: inline completion batching helpers
io_uring: don't pass tail into io_free_batch_list
io_uring: don't pass state to io_submit_state_end
io_uring: deduplicate io_queue_sqe() call sites
io_uring: remove drain_active check from hot path
io_uring: split slow path from io_queue_sqe
io_uring: inline hot path of __io_queue_sqe()
io_uring: reshuffle queue_sqe completion handling
io_uring: restructure submit sqes to_submit checks
io_uring: kill off ->inflight_entry field
io_uring: comment why inline complete calls io_clean_op()
fs/io-wq.h | 60 +++++-
fs/io_uring.c | 503 +++++++++++++++++++++++---------------------------
2 files changed, 283 insertions(+), 280 deletions(-)
--
2.33.0
* [PATCH 01/23] io_uring: mark having different creds unlikely
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Hint the compiler that a request is unlikely to have creds attached that
differ from those of the current task. The current code generation is far
from ideal; hopefully this helps some compilers remove duplicated jump
tables and the like.
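For context, an illustrative note (not part of the patch): unlikely() is just a
wrapper around GCC's branch-prediction builtin, so the annotated condition is
treated as the cold path when laying out the code.

	/* roughly how the kernel defines these hints (branch profiling aside) */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)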
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index d7888bb78cbf..8d0751fba1c2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6629,7 +6629,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
const struct cred *creds = NULL;
int ret;
- if ((req->flags & REQ_F_CREDS) && req->creds != current_cred())
+ if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
creds = override_creds(req->creds);
switch (req->opcode) {
--
2.33.0
* [PATCH 02/23] io_uring: force_nonspin
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't really need to pass the number of requests to complete into
io_do_iopoll(); a flag saying whether to enforce non-spin mode is enough.
This should be straightforward everywhere, except maybe io_iopoll_check().
We pass !min there because we never enter with the number of already reaped
requests larger than the specified @min, apart from the first iteration,
where nr_events is 0 and so the final check is identical.
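A quick check of the two cases (illustrative, not part of the patch):

	/*
	 * min == 0: the old spin condition "*nr_events < min" was false
	 *           (nr_events is unsigned), and !min == true forces
	 *           non-spin, so spinning stays disabled -- same behaviour.
	 * min  > 0: callers only invoke io_do_iopoll() while nr_events < min,
	 *           so the old condition was true, and !min == false leaves
	 *           spinning allowed -- same behaviour again.
	 */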
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8d0751fba1c2..b5631dcc4540 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2455,7 +2455,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
}
static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
- long min)
+ bool force_nonspin)
{
struct io_kiocb *req, *tmp;
LIST_HEAD(done);
@@ -2463,9 +2463,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
/*
* Only spin for completions if we don't have multiple devices hanging
- * off our complete list, and we're under the requested amount.
+ * off our complete list.
*/
- spin = !ctx->poll_multi_queue && *nr_events < min;
+ spin = !ctx->poll_multi_queue && !force_nonspin;
list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, inflight_entry) {
struct kiocb *kiocb = &req->rw.kiocb;
@@ -2513,7 +2513,7 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
while (!list_empty(&ctx->iopoll_list)) {
unsigned int nr_events = 0;
- io_do_iopoll(ctx, &nr_events, 0);
+ io_do_iopoll(ctx, &nr_events, true);
/* let it sleep and repeat later if can't complete a request */
if (nr_events == 0)
@@ -2575,7 +2575,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
list_empty(&ctx->iopoll_list))
break;
}
- ret = io_do_iopoll(ctx, &nr_events, min);
+ ret = io_do_iopoll(ctx, &nr_events, !min);
} while (!ret && nr_events < min && !need_resched());
out:
mutex_unlock(&ctx->uring_lock);
@@ -7338,7 +7338,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
mutex_lock(&ctx->uring_lock);
if (!list_empty(&ctx->iopoll_list))
- io_do_iopoll(ctx, &nr_events, 0);
+ io_do_iopoll(ctx, &nr_events, true);
/*
* Don't submit if refs are dying, good for io_uring_register(),
--
2.33.0
* [PATCH 03/23] io_uring: make io_do_iopoll return number of reqs
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Don't pass the nr_events pointer around; return the count directly instead,
it's less expensive than pointer increments.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b5631dcc4540..13cb40d8ce08 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2427,8 +2427,7 @@ static inline bool io_run_task_work(void)
/*
* Find and free completed poll iocbs
*/
-static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
- struct list_head *done)
+static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
{
struct req_batch rb;
struct io_kiocb *req;
@@ -2443,7 +2442,6 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
__io_cqring_fill_event(ctx, req->user_data, req->result,
io_put_rw_kbuf(req));
- (*nr_events)++;
if (req_ref_put_and_test(req))
io_req_free_batch(&rb, req, &ctx->submit_state);
@@ -2454,11 +2452,11 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
io_req_free_batch_finish(ctx, &rb);
}
-static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
- bool force_nonspin)
+static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_kiocb *req, *tmp;
LIST_HEAD(done);
+ int nr_events = 0;
bool spin;
/*
@@ -2478,6 +2476,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
*/
if (READ_ONCE(req->iopoll_completed)) {
list_move_tail(&req->inflight_entry, &done);
+ nr_events++;
continue;
}
if (!list_empty(&done))
@@ -2490,14 +2489,16 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
spin = false;
/* iopoll may have completed current req */
- if (READ_ONCE(req->iopoll_completed))
+ if (READ_ONCE(req->iopoll_completed)) {
list_move_tail(&req->inflight_entry, &done);
+ nr_events++;
+ }
}
if (!list_empty(&done))
- io_iopoll_complete(ctx, nr_events, &done);
+ io_iopoll_complete(ctx, &done);
- return 0;
+ return nr_events;
}
/*
@@ -2511,12 +2512,8 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
mutex_lock(&ctx->uring_lock);
while (!list_empty(&ctx->iopoll_list)) {
- unsigned int nr_events = 0;
-
- io_do_iopoll(ctx, &nr_events, true);
-
/* let it sleep and repeat later if can't complete a request */
- if (nr_events == 0)
+ if (io_do_iopoll(ctx, true) == 0)
break;
/*
* Ensure we allow local-to-the-cpu processing to take place,
@@ -2575,8 +2572,12 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
list_empty(&ctx->iopoll_list))
break;
}
- ret = io_do_iopoll(ctx, &nr_events, !min);
- } while (!ret && nr_events < min && !need_resched());
+ ret = io_do_iopoll(ctx, !min);
+ if (ret < 0)
+ break;
+ nr_events += ret;
+ ret = 0;
+ } while (nr_events < min && !need_resched());
out:
mutex_unlock(&ctx->uring_lock);
return ret;
@@ -7330,7 +7331,6 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
if (!list_empty(&ctx->iopoll_list) || to_submit) {
- unsigned nr_events = 0;
const struct cred *creds = NULL;
if (ctx->sq_creds != current_cred())
@@ -7338,7 +7338,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
mutex_lock(&ctx->uring_lock);
if (!list_empty(&ctx->iopoll_list))
- io_do_iopoll(ctx, &nr_events, true);
+ io_do_iopoll(ctx, true);
/*
* Don't submit if refs are dying, good for io_uring_register(),
--
2.33.0
* [PATCH 04/23] io_uring: use slist for completion batching
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Currently we collect requests for completion batching in an array.
Replace it with a singly linked list. It's as fast as an array but
doesn't take as much space in ctx, and will be used in future patches.
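To illustrate the intrusive-list idea (a sketch, not part of the patch): the
io_wq_work_node is embedded in each io_kiocb as ->comp_list, so linking a
completed request costs no allocation, and the flush path recovers the request
with container_of():

	wq_list_add_tail(&req->comp_list, &ctx->submit_state.compl_reqs);

	wq_list_for_each(node, prev, &ctx->submit_state.compl_reqs) {
		struct io_kiocb *req = container_of(node, struct io_kiocb,
						    comp_list);
		/* post a CQE for req, then drop its reference */
	}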
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 52 +++++++++++++++++++++++++--------------------------
1 file changed, 25 insertions(+), 27 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 13cb40d8ce08..027e6595321e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -322,8 +322,8 @@ struct io_submit_state {
/*
* Batch completion logic
*/
- struct io_kiocb *compl_reqs[IO_COMPL_BATCH];
- unsigned int compl_nr;
+ struct io_wq_work_list compl_reqs;
+
/* inline/task_work completion list, under ->uring_lock */
struct list_head free_list;
};
@@ -882,6 +882,8 @@ struct io_kiocb {
struct io_wq_work work;
const struct cred *creds;
+ struct io_wq_work_node comp_list;
+
/* store used ubuf, so we can prevent reloading */
struct io_mapped_ubuf *imu;
};
@@ -1166,7 +1168,7 @@ static inline void req_ref_get(struct io_kiocb *req)
static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
{
- if (ctx->submit_state.compl_nr)
+ if (!wq_list_empty(&ctx->submit_state.compl_reqs))
__io_submit_flush_completions(ctx);
}
@@ -1323,6 +1325,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_LIST_HEAD(&ctx->submit_state.free_list);
INIT_LIST_HEAD(&ctx->locked_free_list);
INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
+ INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
return ctx;
err:
kfree(ctx->dummy_ubuf);
@@ -1828,11 +1831,16 @@ static inline bool io_req_needs_clean(struct io_kiocb *req)
static void io_req_complete_state(struct io_kiocb *req, long res,
unsigned int cflags)
{
+ struct io_submit_state *state;
+
if (io_req_needs_clean(req))
io_clean_op(req);
req->result = res;
req->compl.cflags = cflags;
req->flags |= REQ_F_COMPLETE_INLINE;
+
+ state = &req->ctx->submit_state;
+ wq_list_add_tail(&req->comp_list, &state->compl_reqs);
}
static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
@@ -2321,13 +2329,14 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
__must_hold(&ctx->uring_lock)
{
+ struct io_wq_work_node *node, *prev;
struct io_submit_state *state = &ctx->submit_state;
- int i, nr = state->compl_nr;
struct req_batch rb;
spin_lock(&ctx->completion_lock);
- for (i = 0; i < nr; i++) {
- struct io_kiocb *req = state->compl_reqs[i];
+ wq_list_for_each(node, prev, &state->compl_reqs) {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
__io_cqring_fill_event(ctx, req->user_data, req->result,
req->compl.cflags);
@@ -2337,15 +2346,18 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
io_cqring_ev_posted(ctx);
io_init_req_batch(&rb);
- for (i = 0; i < nr; i++) {
- struct io_kiocb *req = state->compl_reqs[i];
+ node = state->compl_reqs.first;
+ do {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
+ node = req->comp_list.next;
if (req_ref_put_and_test(req))
io_req_free_batch(&rb, req, &ctx->submit_state);
- }
+ } while (node);
io_req_free_batch_finish(ctx, &rb);
- state->compl_nr = 0;
+ INIT_WQ_LIST(&state->compl_reqs);
}
/*
@@ -2665,17 +2677,10 @@ static void io_req_task_complete(struct io_kiocb *req, bool *locked)
unsigned int cflags = io_put_rw_kbuf(req);
long res = req->result;
- if (*locked) {
- struct io_ring_ctx *ctx = req->ctx;
- struct io_submit_state *state = &ctx->submit_state;
-
+ if (*locked)
io_req_complete_state(req, res, cflags);
- state->compl_reqs[state->compl_nr++] = req;
- if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
- io_submit_flush_completions(ctx);
- } else {
+ else
io_req_complete_post(req, res, cflags);
- }
}
static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
@@ -6953,15 +6958,8 @@ static void __io_queue_sqe(struct io_kiocb *req)
* doesn't support non-blocking read/write attempts
*/
if (likely(!ret)) {
- if (req->flags & REQ_F_COMPLETE_INLINE) {
- struct io_ring_ctx *ctx = req->ctx;
- struct io_submit_state *state = &ctx->submit_state;
-
- state->compl_reqs[state->compl_nr++] = req;
- if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
- io_submit_flush_completions(ctx);
+ if (req->flags & REQ_F_COMPLETE_INLINE)
return;
- }
linked_timeout = io_prep_linked_timeout(req);
if (linked_timeout)
--
2.33.0
* [PATCH 05/23] io_uring: remove allocation cache array
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
We have several request allocation layers; remove the last one, the
submit->reqs array, and always use requests from the free_list instead.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 60 +++++++++++++++------------------------------------
1 file changed, 17 insertions(+), 43 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 027e6595321e..bf59ca19aef2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -310,12 +310,6 @@ struct io_submit_state {
struct blk_plug plug;
struct io_submit_link link;
- /*
- * io_kiocb alloc cache
- */
- void *reqs[IO_REQ_CACHE_SIZE];
- unsigned int free_reqs;
-
bool plug_started;
bool need_plug;
@@ -1900,7 +1894,6 @@ static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
{
struct io_submit_state *state = &ctx->submit_state;
- int nr;
/*
* If we have more than a batch's worth of requests in our IRQ side
@@ -1909,20 +1902,7 @@ static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
*/
if (READ_ONCE(ctx->locked_free_nr) > IO_COMPL_BATCH)
io_flush_cached_locked_reqs(ctx, state);
-
- nr = state->free_reqs;
- while (!list_empty(&state->free_list)) {
- struct io_kiocb *req = list_first_entry(&state->free_list,
- struct io_kiocb, inflight_entry);
-
- list_del(&req->inflight_entry);
- state->reqs[nr++] = req;
- if (nr == ARRAY_SIZE(state->reqs))
- break;
- }
-
- state->free_reqs = nr;
- return nr != 0;
+ return !list_empty(&state->free_list);
}
/*
@@ -1936,33 +1916,36 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
{
struct io_submit_state *state = &ctx->submit_state;
gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+ void *reqs[IO_REQ_ALLOC_BATCH];
+ struct io_kiocb *req;
int ret, i;
- BUILD_BUG_ON(ARRAY_SIZE(state->reqs) < IO_REQ_ALLOC_BATCH);
-
- if (likely(state->free_reqs || io_flush_cached_reqs(ctx)))
+ if (likely(!list_empty(&state->free_list) || io_flush_cached_reqs(ctx)))
goto got_req;
- ret = kmem_cache_alloc_bulk(req_cachep, gfp, IO_REQ_ALLOC_BATCH,
- state->reqs);
+ ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
/*
* Bulk alloc is all-or-nothing. If we fail to get a batch,
* retry single alloc to be on the safe side.
*/
if (unlikely(ret <= 0)) {
- state->reqs[0] = kmem_cache_alloc(req_cachep, gfp);
- if (!state->reqs[0])
+ reqs[0] = kmem_cache_alloc(req_cachep, gfp);
+ if (!reqs[0])
return NULL;
ret = 1;
}
- for (i = 0; i < ret; i++)
- io_preinit_req(state->reqs[i], ctx);
- state->free_reqs = ret;
+ for (i = 0; i < ret; i++) {
+ req = reqs[i];
+
+ io_preinit_req(req, ctx);
+ list_add(&req->inflight_entry, &state->free_list);
+ }
got_req:
- state->free_reqs--;
- return state->reqs[state->free_reqs];
+ req = list_first_entry(&state->free_list, struct io_kiocb, inflight_entry);
+ list_del(&req->inflight_entry);
+ return req;
}
static inline void io_put_file(struct file *file)
@@ -2320,10 +2303,7 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
rb->task_refs++;
rb->ctx_refs++;
- if (state->free_reqs != ARRAY_SIZE(state->reqs))
- state->reqs[state->free_reqs++] = req;
- else
- list_add(&req->inflight_entry, &state->free_list);
+ list_add(&req->inflight_entry, &state->free_list);
}
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
@@ -9179,12 +9159,6 @@ static void io_req_caches_free(struct io_ring_ctx *ctx)
struct io_submit_state *state = &ctx->submit_state;
mutex_lock(&ctx->uring_lock);
-
- if (state->free_reqs) {
- kmem_cache_free_bulk(req_cachep, state->free_reqs, state->reqs);
- state->free_reqs = 0;
- }
-
io_flush_cached_locked_reqs(ctx, state);
io_req_cache_free(&state->free_list);
mutex_unlock(&ctx->uring_lock);
--
2.33.0
* [PATCH 06/23] io-wq: add io_wq_work_node based stack
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Apart from plain lists (i.e. io_wq_work_list), we also want to have stacks,
which are a bit faster, and to have some interoperability between the two.
Add a stack implementation based on io_wq_work_node and some helpers.
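A usage sketch (illustrative, not part of the patch): the stack head is a bare
io_wq_work_node acting as a sentinel, wq_stack_add_head() pushes and
wq_stack_extract() pops the most recently added node:

	struct io_wq_work_node stack = { .next = NULL };	/* empty stack */

	/* push: the node is embedded in the object, e.g. req->comp_list */
	wq_stack_add_head(&req->comp_list, &stack);

	/* pop: LIFO order */
	if (stack.next) {
		struct io_wq_work_node *node = wq_stack_extract(&stack);
		struct io_kiocb *popped = container_of(node, struct io_kiocb,
						       comp_list);
	}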
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 50 insertions(+), 7 deletions(-)
diff --git a/fs/io-wq.h b/fs/io-wq.h
index bf5c4c533760..c870062105d1 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -29,6 +29,15 @@ struct io_wq_work_list {
struct io_wq_work_node *last;
};
+#define wq_list_for_each(pos, prv, head) \
+ for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+
+#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
+#define INIT_WQ_LIST(list) do { \
+ (list)->first = NULL; \
+ (list)->last = NULL; \
+} while (0)
+
static inline void wq_list_add_after(struct io_wq_work_node *node,
struct io_wq_work_node *pos,
struct io_wq_work_list *list)
@@ -54,6 +63,15 @@ static inline void wq_list_add_tail(struct io_wq_work_node *node,
}
}
+static inline void wq_list_add_head(struct io_wq_work_node *node,
+ struct io_wq_work_list *list)
+{
+ node->next = list->first;
+ if (!node->next)
+ list->last = node;
+ WRITE_ONCE(list->first, node);
+}
+
static inline void wq_list_cut(struct io_wq_work_list *list,
struct io_wq_work_node *last,
struct io_wq_work_node *prev)
@@ -69,6 +87,31 @@ static inline void wq_list_cut(struct io_wq_work_list *list,
last->next = NULL;
}
+static inline void __wq_list_splice(struct io_wq_work_list *list,
+ struct io_wq_work_node *to)
+{
+ list->last->next = to->next;
+ to->next = list->first;
+ INIT_WQ_LIST(list);
+}
+
+static inline bool wq_list_splice(struct io_wq_work_list *list,
+ struct io_wq_work_node *to)
+{
+ if (!wq_list_empty(list)) {
+ __wq_list_splice(list, to);
+ return true;
+ }
+ return false;
+}
+
+static inline void wq_stack_add_head(struct io_wq_work_node *node,
+ struct io_wq_work_node *stack)
+{
+ node->next = stack->next;
+ stack->next = node;
+}
+
static inline void wq_list_del(struct io_wq_work_list *list,
struct io_wq_work_node *node,
struct io_wq_work_node *prev)
@@ -76,14 +119,14 @@ static inline void wq_list_del(struct io_wq_work_list *list,
wq_list_cut(list, node, prev);
}
-#define wq_list_for_each(pos, prv, head) \
- for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+static inline
+struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
+{
+ struct io_wq_work_node *node = stack->next;
-#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
-#define INIT_WQ_LIST(list) do { \
- (list)->first = NULL; \
- (list)->last = NULL; \
-} while (0)
+ stack->next = node->next;
+ return node;
+}
struct io_wq_work {
struct io_wq_work_node list;
--
2.33.0
* [PATCH 07/23] io_uring: replace list with stack for req caches
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Replace the struct list_head free_list used for caching requests with a
singly linked stack, which is faster.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 51 ++++++++++++++++++++++++---------------------------
1 file changed, 24 insertions(+), 27 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bf59ca19aef2..67390b131fb0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -319,7 +319,7 @@ struct io_submit_state {
struct io_wq_work_list compl_reqs;
/* inline/task_work completion list, under ->uring_lock */
- struct list_head free_list;
+ struct io_wq_work_node free_list;
};
struct io_ring_ctx {
@@ -379,7 +379,7 @@ struct io_ring_ctx {
} ____cacheline_aligned_in_smp;
/* IRQ completion list, under ->completion_lock */
- struct list_head locked_free_list;
+ struct io_wq_work_list locked_free_list;
unsigned int locked_free_nr;
const struct cred *sq_creds; /* cred used for __io_sq_thread() */
@@ -1316,8 +1316,8 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
init_llist_head(&ctx->rsrc_put_llist);
INIT_LIST_HEAD(&ctx->tctx_list);
- INIT_LIST_HEAD(&ctx->submit_state.free_list);
- INIT_LIST_HEAD(&ctx->locked_free_list);
+ ctx->submit_state.free_list.next = NULL;
+ INIT_WQ_LIST(&ctx->locked_free_list);
INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
return ctx;
@@ -1808,7 +1808,7 @@ static void io_req_complete_post(struct io_kiocb *req, long res,
}
io_dismantle_req(req);
io_put_task(req->task, 1);
- list_add(&req->inflight_entry, &ctx->locked_free_list);
+ wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
ctx->locked_free_nr++;
percpu_ref_put(&ctx->refs);
}
@@ -1885,7 +1885,7 @@ static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
struct io_submit_state *state)
{
spin_lock(&ctx->completion_lock);
- list_splice_init(&ctx->locked_free_list, &state->free_list);
+ wq_list_splice(&ctx->locked_free_list, &state->free_list);
ctx->locked_free_nr = 0;
spin_unlock(&ctx->completion_lock);
}
@@ -1902,7 +1902,7 @@ static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
*/
if (READ_ONCE(ctx->locked_free_nr) > IO_COMPL_BATCH)
io_flush_cached_locked_reqs(ctx, state);
- return !list_empty(&state->free_list);
+ return !!state->free_list.next;
}
/*
@@ -1917,10 +1917,11 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
struct io_submit_state *state = &ctx->submit_state;
gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
void *reqs[IO_REQ_ALLOC_BATCH];
+ struct io_wq_work_node *node;
struct io_kiocb *req;
int ret, i;
- if (likely(!list_empty(&state->free_list) || io_flush_cached_reqs(ctx)))
+ if (likely(state->free_list.next || io_flush_cached_reqs(ctx)))
goto got_req;
ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
@@ -1940,12 +1941,11 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
req = reqs[i];
io_preinit_req(req, ctx);
- list_add(&req->inflight_entry, &state->free_list);
+ wq_stack_add_head(&req->comp_list, &state->free_list);
}
got_req:
- req = list_first_entry(&state->free_list, struct io_kiocb, inflight_entry);
- list_del(&req->inflight_entry);
- return req;
+ node = wq_stack_extract(&state->free_list);
+ return container_of(node, struct io_kiocb, comp_list);
}
static inline void io_put_file(struct file *file)
@@ -1978,7 +1978,7 @@ static void __io_free_req(struct io_kiocb *req)
io_put_task(req->task, 1);
spin_lock(&ctx->completion_lock);
- list_add(&req->inflight_entry, &ctx->locked_free_list);
+ wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
ctx->locked_free_nr++;
spin_unlock(&ctx->completion_lock);
@@ -2302,8 +2302,7 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
}
rb->task_refs++;
rb->ctx_refs++;
-
- list_add(&req->inflight_entry, &state->free_list);
+ wq_stack_add_head(&req->comp_list, &state->free_list);
}
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
@@ -7252,7 +7251,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
}
sqe = io_get_sqe(ctx);
if (unlikely(!sqe)) {
- list_add(&req->inflight_entry, &ctx->submit_state.free_list);
+ wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
break;
}
/* will complete beyond this point, count as submitted */
@@ -9144,23 +9143,21 @@ static void io_destroy_buffers(struct io_ring_ctx *ctx)
__io_remove_buffers(ctx, buf, index, -1U);
}
-static void io_req_cache_free(struct list_head *list)
-{
- struct io_kiocb *req, *nxt;
-
- list_for_each_entry_safe(req, nxt, list, inflight_entry) {
- list_del(&req->inflight_entry);
- kmem_cache_free(req_cachep, req);
- }
-}
-
static void io_req_caches_free(struct io_ring_ctx *ctx)
{
struct io_submit_state *state = &ctx->submit_state;
mutex_lock(&ctx->uring_lock);
io_flush_cached_locked_reqs(ctx, state);
- io_req_cache_free(&state->free_list);
+
+ while (state->free_list.next) {
+ struct io_wq_work_node *node;
+ struct io_kiocb *req;
+
+ node = wq_stack_extract(&state->free_list);
+ req = container_of(node, struct io_kiocb, comp_list);
+ kmem_cache_free(req_cachep, req);
+ }
mutex_unlock(&ctx->uring_lock);
}
--
2.33.0
* [PATCH 08/23] io_uring: split iopoll loop
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
The main loop of io_do_iopoll() iterates and calls ->iopoll() until it
meets the first completed request, then it continues from that position
and splices requests off to pass them through io_iopoll_complete().

Split the loop in two for clarity: one part doing the iopolling, the other
reaping completed requests off the list.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 67390b131fb0..e0f897c3779d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2443,6 +2443,12 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
io_req_free_batch_finish(ctx, &rb);
}
+/* same as "continue" but starts from the pos, not next to it */
+#define list_for_each_entry_safe_resume(pos, n, head, member) \
+ for (n = list_next_entry(pos, member); \
+ !list_entry_is_head(pos, head, member); \
+ pos = n, n = list_next_entry(n, member))
+
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_kiocb *req, *tmp;
@@ -2456,7 +2462,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
*/
spin = !ctx->poll_multi_queue && !force_nonspin;
- list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, inflight_entry) {
+ list_for_each_entry(req, &ctx->iopoll_list, inflight_entry) {
struct kiocb *kiocb = &req->rw.kiocb;
int ret;
@@ -2465,12 +2471,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
* If we find a request that requires polling, break out
* and complete those lists first, if we have entries there.
*/
- if (READ_ONCE(req->iopoll_completed)) {
- list_move_tail(&req->inflight_entry, &done);
- nr_events++;
- continue;
- }
- if (!list_empty(&done))
+ if (READ_ONCE(req->iopoll_completed))
break;
ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin);
@@ -2480,15 +2481,20 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
spin = false;
/* iopoll may have completed current req */
- if (READ_ONCE(req->iopoll_completed)) {
- list_move_tail(&req->inflight_entry, &done);
- nr_events++;
- }
+ if (READ_ONCE(req->iopoll_completed))
+ break;
}
- if (!list_empty(&done))
- io_iopoll_complete(ctx, &done);
+ list_for_each_entry_safe_resume(req, tmp, &ctx->iopoll_list,
+ inflight_entry) {
+ if (!READ_ONCE(req->iopoll_completed))
+ break;
+ list_move_tail(&req->inflight_entry, &done);
+ nr_events++;
+ }
+ if (nr_events)
+ io_iopoll_complete(ctx, &done);
return nr_events;
}
--
2.33.0
* [PATCH 09/23] io_uring: use single linked list for iopoll
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Use singly linked lists for keeping iopoll requests; they take less space,
may be faster, and mostly will be of benefit to further patches.
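For scale (illustrative, not part of the patch), the per-request link overhead
of the two list types on a 64-bit kernel:

	struct list_head {			/* 16 bytes: doubly linked */
		struct list_head *next, *prev;
	};

	struct io_wq_work_node {		/* 8 bytes: singly linked */
		struct io_wq_work_node *next;
	};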
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.h | 3 +++
fs/io_uring.c | 53 ++++++++++++++++++++++++++-------------------------
2 files changed, 30 insertions(+), 26 deletions(-)
diff --git a/fs/io-wq.h b/fs/io-wq.h
index c870062105d1..87ba6a733630 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -32,6 +32,9 @@ struct io_wq_work_list {
#define wq_list_for_each(pos, prv, head) \
for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+#define wq_list_for_each_resume(pos, prv) \
+ for (; pos; prv = pos, pos = (pos)->next)
+
#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
#define INIT_WQ_LIST(list) do { \
(list)->first = NULL; \
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e0f897c3779d..f4a7468c4570 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -413,7 +413,7 @@ struct io_ring_ctx {
* For SQPOLL, only the single threaded io_sq_thread() will
* manipulate the list, hence no extra locking is needed there.
*/
- struct list_head iopoll_list;
+ struct io_wq_work_list iopoll_list;
struct hlist_head *cancel_hash;
unsigned cancel_hash_bits;
bool poll_multi_queue;
@@ -1307,7 +1307,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
init_waitqueue_head(&ctx->cq_wait);
spin_lock_init(&ctx->completion_lock);
spin_lock_init(&ctx->timeout_lock);
- INIT_LIST_HEAD(&ctx->iopoll_list);
+ INIT_WQ_LIST(&ctx->iopoll_list);
INIT_LIST_HEAD(&ctx->defer_list);
INIT_LIST_HEAD(&ctx->timeout_list);
INIT_LIST_HEAD(&ctx->ltimeout_list);
@@ -2443,15 +2443,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
io_req_free_batch_finish(ctx, &rb);
}
-/* same as "continue" but starts from the pos, not next to it */
-#define list_for_each_entry_safe_resume(pos, n, head, member) \
- for (n = list_next_entry(pos, member); \
- !list_entry_is_head(pos, head, member); \
- pos = n, n = list_next_entry(n, member))
-
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
- struct io_kiocb *req, *tmp;
+ struct io_wq_work_node *pos, *start, *prev;
LIST_HEAD(done);
int nr_events = 0;
bool spin;
@@ -2462,7 +2456,8 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
*/
spin = !ctx->poll_multi_queue && !force_nonspin;
- list_for_each_entry(req, &ctx->iopoll_list, inflight_entry) {
+ wq_list_for_each(pos, start, &ctx->iopoll_list) {
+ struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
struct kiocb *kiocb = &req->rw.kiocb;
int ret;
@@ -2485,14 +2480,20 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
break;
}
- list_for_each_entry_safe_resume(req, tmp, &ctx->iopoll_list,
- inflight_entry) {
+ if (!pos)
+ return 0;
+
+ prev = start;
+ wq_list_for_each_resume(pos, prev) {
+ struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
+
if (!READ_ONCE(req->iopoll_completed))
break;
- list_move_tail(&req->inflight_entry, &done);
+ list_add_tail(&req->inflight_entry, &done);
nr_events++;
}
+ wq_list_cut(&ctx->iopoll_list, prev, start);
if (nr_events)
io_iopoll_complete(ctx, &done);
return nr_events;
@@ -2508,7 +2509,7 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
return;
mutex_lock(&ctx->uring_lock);
- while (!list_empty(&ctx->iopoll_list)) {
+ while (!wq_list_empty(&ctx->iopoll_list)) {
/* let it sleep and repeat later if can't complete a request */
if (io_do_iopoll(ctx, true) == 0)
break;
@@ -2557,7 +2558,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
* forever, while the workqueue is stuck trying to acquire the
* very same mutex.
*/
- if (list_empty(&ctx->iopoll_list)) {
+ if (wq_list_empty(&ctx->iopoll_list)) {
u32 tail = ctx->cached_cq_tail;
mutex_unlock(&ctx->uring_lock);
@@ -2566,7 +2567,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
/* some requests don't go through iopoll_list */
if (tail != ctx->cached_cq_tail ||
- list_empty(&ctx->iopoll_list))
+ wq_list_empty(&ctx->iopoll_list))
break;
}
ret = io_do_iopoll(ctx, !min);
@@ -2726,14 +2727,14 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
* how we do polling eventually, not spinning if we're on potentially
* different devices.
*/
- if (list_empty(&ctx->iopoll_list)) {
+ if (wq_list_empty(&ctx->iopoll_list)) {
ctx->poll_multi_queue = false;
} else if (!ctx->poll_multi_queue) {
struct io_kiocb *list_req;
unsigned int queue_num0, queue_num1;
- list_req = list_first_entry(&ctx->iopoll_list, struct io_kiocb,
- inflight_entry);
+ list_req = container_of(ctx->iopoll_list.first, struct io_kiocb,
+ comp_list);
if (list_req->file != req->file) {
ctx->poll_multi_queue = true;
@@ -2750,9 +2751,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
* it to the front so we find it first.
*/
if (READ_ONCE(req->iopoll_completed))
- list_add(&req->inflight_entry, &ctx->iopoll_list);
+ wq_list_add_head(&req->comp_list, &ctx->iopoll_list);
else
- list_add_tail(&req->inflight_entry, &ctx->iopoll_list);
+ wq_list_add_tail(&req->comp_list, &ctx->iopoll_list);
if (unlikely(in_async)) {
/*
@@ -7313,14 +7314,14 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
- if (!list_empty(&ctx->iopoll_list) || to_submit) {
+ if (!wq_list_empty(&ctx->iopoll_list) || to_submit) {
const struct cred *creds = NULL;
if (ctx->sq_creds != current_cred())
creds = override_creds(ctx->sq_creds);
mutex_lock(&ctx->uring_lock);
- if (!list_empty(&ctx->iopoll_list))
+ if (!wq_list_empty(&ctx->iopoll_list))
io_do_iopoll(ctx, true);
/*
@@ -7398,7 +7399,7 @@ static int io_sq_thread(void *data)
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
int ret = __io_sq_thread(ctx, cap_entries);
- if (!sqt_spin && (ret > 0 || !list_empty(&ctx->iopoll_list)))
+ if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
sqt_spin = true;
}
if (io_run_task_work())
@@ -7419,7 +7420,7 @@ static int io_sq_thread(void *data)
io_ring_set_wakeup_flag(ctx);
if ((ctx->flags & IORING_SETUP_IOPOLL) &&
- !list_empty_careful(&ctx->iopoll_list)) {
+ !wq_list_empty(&ctx->iopoll_list)) {
needs_sched = false;
break;
}
@@ -9541,7 +9542,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
/* SQPOLL thread does its own polling */
if ((!(ctx->flags & IORING_SETUP_SQPOLL) && cancel_all) ||
(ctx->sq_data && ctx->sq_data->thread == current)) {
- while (!list_empty_careful(&ctx->iopoll_list)) {
+ while (!wq_list_empty(&ctx->iopoll_list)) {
io_iopoll_try_reap_events(ctx);
ret = true;
}
--
2.33.0
* [PATCH 10/23] io_uring: add a helper for batch free
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Add a helper, io_free_batch_list(), which takes a singly linked list and
puts/frees all requests from it in an efficient manner. It will be reused
later.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f4a7468c4570..e1c8374954fc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2305,12 +2305,31 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
wq_stack_add_head(&req->comp_list, &state->free_list);
}
+static void io_free_batch_list(struct io_ring_ctx *ctx,
+ struct io_wq_work_list *list)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_wq_work_node *node;
+ struct req_batch rb;
+
+ io_init_req_batch(&rb);
+ node = list->first;
+ do {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
+
+ node = req->comp_list.next;
+ if (req_ref_put_and_test(req))
+ io_req_free_batch(&rb, req, &ctx->submit_state);
+ } while (node);
+ io_req_free_batch_finish(ctx, &rb);
+}
+
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
__must_hold(&ctx->uring_lock)
{
struct io_wq_work_node *node, *prev;
struct io_submit_state *state = &ctx->submit_state;
- struct req_batch rb;
spin_lock(&ctx->completion_lock);
wq_list_for_each(node, prev, &state->compl_reqs) {
@@ -2324,18 +2343,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
- io_init_req_batch(&rb);
- node = state->compl_reqs.first;
- do {
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- comp_list);
-
- node = req->comp_list.next;
- if (req_ref_put_and_test(req))
- io_req_free_batch(&rb, req, &ctx->submit_state);
- } while (node);
-
- io_req_free_batch_finish(ctx, &rb);
+ io_free_batch_list(ctx, &state->compl_reqs);
INIT_WQ_LIST(&state->compl_reqs);
}
--
2.33.0
* [PATCH 11/23] io_uring: convert iopoll_completed to store_release
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Bart Van Assche
Convert the explicit barrier pair around iopoll_completed into
smp_store_release() on the completion side and smp_load_acquire() on the
reaping side. The latter replaces a single smp_rmb() with a per-request
smp_load_acquire(); neither implies any extra CPU ordering on x86. Use
READ_ONCE as usual where it doesn't matter.

Also use this to move filling CQEs by iopoll earlier; that will be necessary
to avoid traversing the list one extra time in the future.
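The pairing this relies on, annotated (a sketch, not part of the patch):

	/* completion (writer) side, as in io_complete_rw_iopoll() */
	req->result = res;
	/* publish ->result before the completed flag becomes visible */
	smp_store_release(&req->iopoll_completed, 1);

	/* reaping (reader) side, as in io_do_iopoll() */
	if (smp_load_acquire(&req->iopoll_completed)) {
		/* guaranteed to observe the ->result stored above */
		__io_cqring_fill_event(ctx, req->user_data, req->result,
				       io_put_rw_kbuf(req));
	}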
Suggested-by: Bart Van Assche <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e1c8374954fc..e5d42ca45bce 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2431,17 +2431,11 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
struct req_batch rb;
struct io_kiocb *req;
- /* order with ->result store in io_complete_rw_iopoll() */
- smp_rmb();
-
io_init_req_batch(&rb);
while (!list_empty(done)) {
req = list_first_entry(done, struct io_kiocb, inflight_entry);
list_del(&req->inflight_entry);
- __io_cqring_fill_event(ctx, req->user_data, req->result,
- io_put_rw_kbuf(req));
-
if (req_ref_put_and_test(req))
io_req_free_batch(&rb, req, &ctx->submit_state);
}
@@ -2495,8 +2489,12 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
wq_list_for_each_resume(pos, prev) {
struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
- if (!READ_ONCE(req->iopoll_completed))
+ /* order with io_complete_rw_iopoll(), e.g. ->result updates */
+ if (!smp_load_acquire(&req->iopoll_completed))
break;
+ __io_cqring_fill_event(ctx, req->user_data, req->result,
+ io_put_rw_kbuf(req));
+
list_add_tail(&req->inflight_entry, &done);
nr_events++;
}
@@ -2709,10 +2707,9 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
}
}
- WRITE_ONCE(req->result, res);
- /* order with io_iopoll_complete() checking ->result */
- smp_wmb();
- WRITE_ONCE(req->iopoll_completed, 1);
+ req->result = res;
+ /* order with io_iopoll_complete() checking ->iopoll_completed */
+ smp_store_release(&req->iopoll_completed, 1);
}
/*
--
2.33.0
* [PATCH 12/23] io_uring: optimise batch completion
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
First, convert the rest of the iopoll bits to singly linked lists, and
replace the per-request list_add_tail() with splicing a part of the slist.
With that, use io_free_batch_list() to put/free requests. The main advantage
is that io_free_batch_list() is now the only user of struct req_batch and
friends, so they can be inlined. The main overhead there was the per-request
call to the not-inlined io_req_free_batch(), which is expensive enough.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 38 ++++++++++----------------------------
1 file changed, 10 insertions(+), 28 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e5d42ca45bce..70cd32fb4c70 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2423,32 +2423,10 @@ static inline bool io_run_task_work(void)
return false;
}
-/*
- * Find and free completed poll iocbs
- */
-static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
-{
- struct req_batch rb;
- struct io_kiocb *req;
-
- io_init_req_batch(&rb);
- while (!list_empty(done)) {
- req = list_first_entry(done, struct io_kiocb, inflight_entry);
- list_del(&req->inflight_entry);
-
- if (req_ref_put_and_test(req))
- io_req_free_batch(&rb, req, &ctx->submit_state);
- }
-
- io_commit_cqring(ctx);
- io_cqring_ev_posted_iopoll(ctx);
- io_req_free_batch_finish(ctx, &rb);
-}
-
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_wq_work_node *pos, *start, *prev;
- LIST_HEAD(done);
+ struct io_wq_work_list list;
int nr_events = 0;
bool spin;
@@ -2493,15 +2471,19 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
if (!smp_load_acquire(&req->iopoll_completed))
break;
__io_cqring_fill_event(ctx, req->user_data, req->result,
- io_put_rw_kbuf(req));
-
- list_add_tail(&req->inflight_entry, &done);
+ io_put_rw_kbuf(req));
nr_events++;
}
+ if (unlikely(!nr_events))
+ return 0;
+
+ io_commit_cqring(ctx);
+ io_cqring_ev_posted_iopoll(ctx);
+ list.first = start ? start->next : ctx->iopoll_list.first;
+ list.last = prev;
wq_list_cut(&ctx->iopoll_list, prev, start);
- if (nr_events)
- io_iopoll_complete(ctx, &done);
+ io_free_batch_list(ctx, &list);
return nr_events;
}
--
2.33.0
* [PATCH 13/23] io_uring: inline completion batching helpers
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
We now have a single function for the batched put of requests; just inline
struct req_batch and all related helpers into it.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 66 +++++++++++++++++----------------------------------
1 file changed, 22 insertions(+), 44 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 70cd32fb4c70..e1637ea4db4c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2266,63 +2266,41 @@ static void io_free_req_work(struct io_kiocb *req, bool *locked)
io_free_req(req);
}
-struct req_batch {
- struct task_struct *task;
- int task_refs;
- int ctx_refs;
-};
-
-static inline void io_init_req_batch(struct req_batch *rb)
-{
- rb->task_refs = 0;
- rb->ctx_refs = 0;
- rb->task = NULL;
-}
-
-static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
- struct req_batch *rb)
-{
- if (rb->ctx_refs)
- percpu_ref_put_many(&ctx->refs, rb->ctx_refs);
- if (rb->task)
- io_put_task(rb->task, rb->task_refs);
-}
-
-static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
- struct io_submit_state *state)
-{
- io_queue_next(req);
- io_dismantle_req(req);
-
- if (req->task != rb->task) {
- if (rb->task)
- io_put_task(rb->task, rb->task_refs);
- rb->task = req->task;
- rb->task_refs = 0;
- }
- rb->task_refs++;
- rb->ctx_refs++;
- wq_stack_add_head(&req->comp_list, &state->free_list);
-}
-
static void io_free_batch_list(struct io_ring_ctx *ctx,
struct io_wq_work_list *list)
__must_hold(&ctx->uring_lock)
{
struct io_wq_work_node *node;
- struct req_batch rb;
+ struct task_struct *task = NULL;
+ int task_refs = 0, ctx_refs = 0;
- io_init_req_batch(&rb);
node = list->first;
do {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
node = req->comp_list.next;
- if (req_ref_put_and_test(req))
- io_req_free_batch(&rb, req, &ctx->submit_state);
+ if (!req_ref_put_and_test(req))
+ continue;
+
+ io_queue_next(req);
+ io_dismantle_req(req);
+
+ if (req->task != task) {
+ if (task)
+ io_put_task(task, task_refs);
+ task = req->task;
+ task_refs = 0;
+ }
+ task_refs++;
+ ctx_refs++;
+ wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
} while (node);
- io_req_free_batch_finish(ctx, &rb);
+
+ if (ctx_refs)
+ percpu_ref_put_many(&ctx->refs, ctx_refs);
+ if (task)
+ io_put_task(task, task_refs);
}
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
--
2.33.0
* [PATCH 14/23] io_uring: don't pass tail into io_free_batch_list
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_free_batch_list() iterates all requests in the passed-in list, so we
don't really need to know the tail; we can just keep iterating until we
meet NULL. Passing only the first node into it is enough.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e1637ea4db4c..1d14d85377b5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2267,14 +2267,12 @@ static void io_free_req_work(struct io_kiocb *req, bool *locked)
}
static void io_free_batch_list(struct io_ring_ctx *ctx,
- struct io_wq_work_list *list)
+ struct io_wq_work_node *node)
__must_hold(&ctx->uring_lock)
{
- struct io_wq_work_node *node;
struct task_struct *task = NULL;
int task_refs = 0, ctx_refs = 0;
- node = list->first;
do {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
@@ -2321,7 +2319,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
- io_free_batch_list(ctx, &state->compl_reqs);
+ io_free_batch_list(ctx, state->compl_reqs.first);
INIT_WQ_LIST(&state->compl_reqs);
}
@@ -2404,7 +2402,6 @@ static inline bool io_run_task_work(void)
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_wq_work_node *pos, *start, *prev;
- struct io_wq_work_list list;
int nr_events = 0;
bool spin;
@@ -2458,10 +2455,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
io_commit_cqring(ctx);
io_cqring_ev_posted_iopoll(ctx);
- list.first = start ? start->next : ctx->iopoll_list.first;
- list.last = prev;
+ pos = start ? start->next : ctx->iopoll_list.first;
wq_list_cut(&ctx->iopoll_list, prev, start);
- io_free_batch_list(ctx, &list);
+ io_free_batch_list(ctx, pos);
return nr_events;
}
--
2.33.0
* [PATCH 15/23] io_uring: don't pass state to io_submit_state_end
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Submission state and ctx are coupled together, so there is no need to pass
the state into io_submit_state_end() separately; the ctx is enough.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1d14d85377b5..26342ff481cb 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7129,11 +7129,13 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
/*
* Batched submission is done, ensure local IO is flushed out.
*/
-static void io_submit_state_end(struct io_submit_state *state,
- struct io_ring_ctx *ctx)
+static void io_submit_state_end(struct io_ring_ctx *ctx)
{
+ struct io_submit_state *state = &ctx->submit_state;
+
if (state->link.head)
io_queue_sqe(state->link.head);
+ /* flush only after queuing links as they can generate completions */
io_submit_flush_completions(ctx);
if (state->plug_started)
blk_finish_plug(&state->plug);
@@ -7236,7 +7238,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
percpu_ref_put_many(&ctx->refs, unused);
}
- io_submit_state_end(&ctx->submit_state, ctx);
+ io_submit_state_end(ctx);
/* Commit SQ ring head once we've consumed and submitted all SQEs */
io_commit_sqring(ctx);
--
2.33.0
* [PATCH 16/23] io_uring: deduplicate io_queue_sqe() call sites
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
There are two call sites of io_queue_sqe() in io_submit_sqe(); combine them
into one, because io_queue_sqe() is inline, we don't want to bloat the
binary, and it will only grow bigger in following patches. size(1) output
before and after:
text data bss dec hex filename
92126 13986 8 106120 19e88 ./fs/io_uring.o
91966 13986 8 105960 19de8 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 26342ff481cb..5cffbfc8a61f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7109,20 +7109,18 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
link->last->link = req;
link->last = req;
+ if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK))
+ return 0;
/* last request of a link, enqueue the link */
- if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) {
- link->head = NULL;
- io_queue_sqe(head);
- }
- } else {
- if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
- link->head = req;
- link->last = req;
- } else {
- io_queue_sqe(req);
- }
+ link->head = NULL;
+ req = head;
+ } else if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
+ link->head = req;
+ link->last = req;
+ return 0;
}
+ io_queue_sqe(req);
return 0;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 17/23] io_uring: remove drain_active check from hot path
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (15 preceding siblings ...)
2021-09-24 16:31 ` [PATCH 16/23] io_uring: deduplicate io_queue_sqe() call sites Pavel Begunkov
@ 2021-09-24 16:31 ` Pavel Begunkov
2021-09-24 16:31 ` [PATCH 18/23] io_uring: split slow path from io_queue_sqe Pavel Begunkov
` (5 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Checking req->ctx->drain_active in the hot path is a bit too expensive,
partially because of two dereferences. Do a trick: if we see it set in
io_init_req(), set REQ_F_FORCE_ASYNC so the request automatically goes
through a slower path where we can catch it. It's nearly free to do in
io_init_req() because there is already a ->restricted check and it's in
the same byte of a bitmask.
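A minimal standalone sketch of the underlying pattern: fold rare per-context
conditions into the request's own flag word at init time, so the hot path
only tests one bitmask it already loads. Every name here is invented for
illustration; none of it is the real io_uring code.

#include <stdbool.h>
#include <stdio.h>

enum { F_FORCE_SLOW = 1u << 0, F_FAIL = 1u << 1 };

struct ctx { bool restricted, drain_active; };
struct req { unsigned flags; int id; };

static void init_req(struct ctx *ctx, struct req *req, int id)
{
	req->flags = 0;
	req->id = id;
	/* rare conditions are folded into the request's flags once */
	if (ctx->restricted || ctx->drain_active)
		req->flags |= F_FORCE_SLOW;
}

static void queue_slow(struct req *req) { printf("req %d: slow path\n", req->id); }
static void queue_fast(struct req *req) { printf("req %d: hot path\n", req->id); }

static void queue_req(struct req *req)
{
	/* hot path: one test on data we already have, no ctx dereference */
	if (req->flags & (F_FORCE_SLOW | F_FAIL))
		queue_slow(req);
	else
		queue_fast(req);
}

int main(void)
{
	struct ctx ctx = { .restricted = false, .drain_active = true };
	struct req a, b;

	init_req(&ctx, &a, 1);
	ctx.drain_active = false;
	init_req(&ctx, &b, 2);
	queue_req(&a);	/* slow path */
	queue_req(&b);	/* hot path */
	return 0;
}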
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 53 ++++++++++++++++++++++++++++-----------------------
1 file changed, 29 insertions(+), 24 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 5cffbfc8a61f..c1c967088252 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6430,23 +6430,15 @@ static bool io_drain_req(struct io_kiocb *req)
int ret;
u32 seq;
- if (req->flags & REQ_F_FAIL) {
- io_req_complete_fail_submit(req);
- return true;
- }
-
- /*
- * If we need to drain a request in the middle of a link, drain the
- * head request and the next request/link after the current link.
- * Considering sequential execution of links, IOSQE_IO_DRAIN will be
- * maintained for every request of our link.
- */
- if (ctx->drain_next) {
- req->flags |= REQ_F_IO_DRAIN;
- ctx->drain_next = false;
- }
/* not interested in head, start from the first linked */
io_for_each_link(pos, req->link) {
+ /*
+ * If we need to drain a request in the middle of a link, drain
+ * the head request and the next request/link after the current
+ * link. Considering sequential execution of links,
+ * IOSQE_IO_DRAIN will be maintained for every request of our
+ * link.
+ */
if (pos->flags & REQ_F_IO_DRAIN) {
ctx->drain_next = true;
req->flags |= REQ_F_IO_DRAIN;
@@ -6938,13 +6930,12 @@ static void __io_queue_sqe(struct io_kiocb *req)
static inline void io_queue_sqe(struct io_kiocb *req)
__must_hold(&req->ctx->uring_lock)
{
- if (unlikely(req->ctx->drain_active) && io_drain_req(req))
- return;
-
if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL)))) {
__io_queue_sqe(req);
} else if (req->flags & REQ_F_FAIL) {
io_req_complete_fail_submit(req);
+ } else if (unlikely(req->ctx->drain_active) && io_drain_req(req)) {
+ return;
} else {
int ret = io_req_prep_async(req);
@@ -6964,9 +6955,6 @@ static inline bool io_check_restriction(struct io_ring_ctx *ctx,
struct io_kiocb *req,
unsigned int sqe_flags)
{
- if (likely(!ctx->restricted))
- return true;
-
if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
return false;
@@ -7007,11 +6995,28 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
if ((sqe_flags & IOSQE_BUFFER_SELECT) &&
!io_op_defs[req->opcode].buffer_select)
return -EOPNOTSUPP;
- if (sqe_flags & IOSQE_IO_DRAIN)
+ if (sqe_flags & IOSQE_IO_DRAIN) {
+ struct io_submit_link *link = &ctx->submit_state.link;
+
ctx->drain_active = true;
+ req->flags |= REQ_F_FORCE_ASYNC;
+ if (link->head)
+ link->head->flags |= IOSQE_IO_DRAIN | REQ_F_FORCE_ASYNC;
+ }
+ }
+ if (unlikely(ctx->restricted || ctx->drain_active || ctx->drain_next)) {
+ if (ctx->restricted && !io_check_restriction(ctx, req, sqe_flags))
+ return -EACCES;
+ /* knock it to the slow queue path, will be drained there */
+ if (ctx->drain_active)
+ req->flags |= REQ_F_FORCE_ASYNC;
+ /* if there is no link, we're at "next" request and need to drain */
+ if (unlikely(ctx->drain_next) && !ctx->submit_state.link.head) {
+ ctx->drain_next = false;
+ ctx->drain_active = true;
+ req->flags |= REQ_F_FORCE_ASYNC | IOSQE_IO_DRAIN;
+ }
}
- if (!io_check_restriction(ctx, req, sqe_flags))
- return -EACCES;
personality = READ_ONCE(sqe->personality);
if (personality) {
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 18/23] io_uring: split slow path from io_queue_sqe
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (16 preceding siblings ...)
2021-09-24 16:31 ` [PATCH 17/23] io_uring: remove drain_active check from hot path Pavel Begunkov
@ 2021-09-24 16:31 ` Pavel Begunkov
2021-09-24 16:31 ` [PATCH 19/23] io_uring: inline hot path of __io_queue_sqe() Pavel Begunkov
` (4 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't want the slow path of io_queue_sqe() to be inlined, so extract
it into a separate function.
text data bss dec hex filename
91950 13986 8 105944 19dd8 ./fs/io_uring.o
91758 13986 8 105752 19d18 ./fs/io_uring.o
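A standalone sketch of the hot/cold split, assuming GCC-style attributes: the
common case stays in a small inline wrapper while the rare case is kept out
of line so its bulk isn't duplicated into every caller. Names are
illustrative only.

#include <stdio.h>

#define RARE_FLAGS 0x3u

/* cold part: deliberately kept out of line */
static __attribute__((noinline)) void handle_rare(unsigned flags)
{
	printf("rare flags 0x%x handled out of line\n", flags);
}

/* hot part: small enough that inlining it everywhere is cheap */
static inline void handle(unsigned flags)
{
	if (__builtin_expect(!(flags & RARE_FLAGS), 1))
		printf("fast path\n");
	else
		handle_rare(flags);
}

int main(void)
{
	handle(0);	/* fast path */
	handle(0x1);	/* falls back to the out-of-line helper */
	return 0;
}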
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c1c967088252..6d47b5150e80 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6927,12 +6927,10 @@ static void __io_queue_sqe(struct io_kiocb *req)
}
}
-static inline void io_queue_sqe(struct io_kiocb *req)
+static void io_queue_sqe_fallback(struct io_kiocb *req)
__must_hold(&req->ctx->uring_lock)
{
- if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL)))) {
- __io_queue_sqe(req);
- } else if (req->flags & REQ_F_FAIL) {
+ if (req->flags & REQ_F_FAIL) {
io_req_complete_fail_submit(req);
} else if (unlikely(req->ctx->drain_active) && io_drain_req(req)) {
return;
@@ -6946,6 +6944,15 @@ static inline void io_queue_sqe(struct io_kiocb *req)
}
}
+static inline void io_queue_sqe(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))))
+ __io_queue_sqe(req);
+ else
+ io_queue_sqe_fallback(req);
+}
+
/*
* Check SQE restrictions (opcode and flags).
*
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 19/23] io_uring: inline hot path of __io_queue_sqe()
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (17 preceding siblings ...)
2021-09-24 16:31 ` [PATCH 18/23] io_uring: split slow path from io_queue_sqe Pavel Begunkov
@ 2021-09-24 16:31 ` Pavel Begunkov
2021-09-24 16:31 ` [PATCH 20/23] io_uring: reshuffle queue_sqe completion handling Pavel Begunkov
` (3 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Extract slow paths from __io_queue_sqe() into a function and inline the
hot path. With that we have everything completely inlined on the
submission path up until io_issue_sqe().
-> io_submit_sqes()
-> io_submit_sqe() (inlined)
-> io_queue_sqe() (inlined)
-> __io_queue_sqe() (inlined)
-> io_issue_sqe()
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 48 ++++++++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6d47b5150e80..76838d18493f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6883,13 +6883,38 @@ static void io_queue_linked_timeout(struct io_kiocb *req)
io_put_req(req);
}
-static void __io_queue_sqe(struct io_kiocb *req)
+static void io_queue_sqe_arm_apoll(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
+
+ switch (io_arm_poll_handler(req)) {
+ case IO_APOLL_READY:
+ if (linked_timeout) {
+ io_unprep_linked_timeout(req);
+ linked_timeout = NULL;
+ }
+ io_req_task_queue(req);
+ break;
+ case IO_APOLL_ABORTED:
+ /*
+ * Queued up for async execution, worker will release
+ * submit reference when the iocb is actually submitted.
+ */
+ io_queue_async_work(req, NULL);
+ break;
+ }
+
+ if (linked_timeout)
+ io_queue_linked_timeout(linked_timeout);
+}
+
+static inline void __io_queue_sqe(struct io_kiocb *req)
__must_hold(&req->ctx->uring_lock)
{
struct io_kiocb *linked_timeout;
int ret;
-issue_sqe:
ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
/*
@@ -6904,24 +6929,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
if (linked_timeout)
io_queue_linked_timeout(linked_timeout);
} else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
- linked_timeout = io_prep_linked_timeout(req);
-
- switch (io_arm_poll_handler(req)) {
- case IO_APOLL_READY:
- if (linked_timeout)
- io_unprep_linked_timeout(req);
- goto issue_sqe;
- case IO_APOLL_ABORTED:
- /*
- * Queued up for async execution, worker will release
- * submit reference when the iocb is actually submitted.
- */
- io_queue_async_work(req, NULL);
- break;
- }
-
- if (linked_timeout)
- io_queue_linked_timeout(linked_timeout);
+ io_queue_sqe_arm_apoll(req);
} else {
io_req_complete_failed(req, ret);
}
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 20/23] io_uring: reshuffle queue_sqe completion handling
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (18 preceding siblings ...)
2021-09-24 16:31 ` [PATCH 19/23] io_uring: inline hot path of __io_queue_sqe() Pavel Begunkov
@ 2021-09-24 16:31 ` Pavel Begunkov
2021-09-24 16:31 ` [PATCH 21/23] io_uring: restructure submit sqes to_submit checks Pavel Begunkov
` (2 subsequent siblings)
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
If a request completed inline, the result should only be zero; it's a
grave error otherwise. So when we see REQ_F_COMPLETE_INLINE, it's not
even necessary to check the return code, and the flag check can be moved
earlier.
It's one "if" less for inline completions, and the same two checks for
requests that complete normally (ret == 0). Those are the two cases we
care about most.
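For illustration only, a tiny standalone sketch of the check-reordering idea:
when one outcome is signalled by a flag and implies the return-code test
would succeed anyway, testing the flag first lets the common paths retire
with fewer branches. The flag and helpers below are made up; this is not the
io_uring logic.

#include <stdio.h>

#define COMPLETED_INLINE 0x1u

static int do_issue(unsigned *flags, int want_inline)
{
	if (want_inline) {
		*flags |= COMPLETED_INLINE;
		printf("completed inline\n");
		return 0;	/* inline completion always reports success */
	}
	return 0;		/* queued for async execution */
}

static void queue(int want_inline)
{
	unsigned flags = 0;
	int ret = do_issue(&flags, want_inline);

	if (flags & COMPLETED_INLINE)
		return;		/* done, no need to look at ret at all */
	if (ret == 0)
		printf("armed async continuation\n");
	else
		printf("failed: %d\n", ret);
}

int main(void)
{
	queue(1);	/* inline completion: single flag test */
	queue(0);	/* normal completion: flag test + ret test */
	return 0;
}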
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 76838d18493f..7c91e194a43b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6917,14 +6917,13 @@ static inline void __io_queue_sqe(struct io_kiocb *req)
ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+ if (req->flags & REQ_F_COMPLETE_INLINE)
+ return;
/*
* We async punt it if the file wasn't marked NOWAIT, or if the file
* doesn't support non-blocking read/write attempts
*/
if (likely(!ret)) {
- if (req->flags & REQ_F_COMPLETE_INLINE)
- return;
-
linked_timeout = io_prep_linked_timeout(req);
if (linked_timeout)
io_queue_linked_timeout(linked_timeout);
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 21/23] io_uring: restructure submit sqes to_submit checks
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (19 preceding siblings ...)
2021-09-24 16:31 ` [PATCH 20/23] io_uring: reshuffle queue_sqe completion handling Pavel Begunkov
@ 2021-09-24 16:31 ` Pavel Begunkov
2021-09-24 16:32 ` [PATCH 22/23] io_uring: kill off ->inflight_entry field Pavel Begunkov
2021-09-24 16:32 ` [PATCH 23/23] io_uring: comment why inline complete calls io_clean_op() Pavel Begunkov
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:31 UTC (permalink / raw)
To: Jens Axboe, io-uring
Put an explicit check for the number of requests to submit. First, we can
turn the while loop into a do-while, which generates better code; second,
the check itself can be cheaper, e.g. by using CPU flags after the sub in
io_sqring_entries().
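A standalone sketch of the loop shape this aims for: hoist the emptiness
check so a do-while can drop the redundant test before the first iteration.
The entry accounting below is invented and has nothing to do with the real
SQ ring.

#include <stdio.h>

static unsigned pending_entries(unsigned produced, unsigned consumed)
{
	/* the result of this subtraction can often feed the branch directly */
	return produced - consumed;
}

static int submit_all(unsigned produced, unsigned consumed, unsigned limit)
{
	unsigned entries = pending_entries(produced, consumed);
	unsigned submitted = 0;

	if (!entries)
		return 0;
	if (entries < limit)
		limit = entries;

	do {
		/* body runs at least once; we already know entries != 0 */
		submitted++;
	} while (submitted < limit);

	return (int)submitted;
}

int main(void)
{
	printf("%d\n", submit_all(8, 3, 4));	/* 4 */
	printf("%d\n", submit_all(5, 5, 4));	/* 0, early return */
	return 0;
}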
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7c91e194a43b..3ddca031d7d5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7217,16 +7217,19 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
__must_hold(&ctx->uring_lock)
{
+ unsigned int entries = io_sqring_entries(ctx);
int submitted = 0;
+ if (!entries)
+ return 0;
/* make sure SQ entry isn't read before tail */
- nr = min3(nr, ctx->sq_entries, io_sqring_entries(ctx));
+ nr = min3(nr, ctx->sq_entries, entries);
if (!percpu_ref_tryget_many(&ctx->refs, nr))
return -EAGAIN;
io_get_task_refs(nr);
io_submit_state_start(&ctx->submit_state, nr);
- while (submitted < nr) {
+ do {
const struct io_uring_sqe *sqe;
struct io_kiocb *req;
@@ -7245,7 +7248,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
submitted++;
if (io_submit_sqe(ctx, req, sqe))
break;
- }
+ } while (submitted < nr);
if (unlikely(submitted != nr)) {
int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 22/23] io_uring: kill off ->inflight_entry field
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (20 preceding siblings ...)
2021-09-24 16:31 ` [PATCH 21/23] io_uring: restructure submit sqes to_submit checks Pavel Begunkov
@ 2021-09-24 16:32 ` Pavel Begunkov
2021-09-24 16:32 ` [PATCH 23/23] io_uring: comment why inline complete calls io_clean_op() Pavel Begunkov
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:32 UTC (permalink / raw)
To: Jens Axboe, io-uring
->inflight_entry is not used anymore after converting everything to
singly linked lists, so remove it. Also adjust the io_kiocb layout so all
hot bits are in the first 3 cachelines.
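A standalone sketch of one way to sanity-check such a layout with offsetof()
and a C11 static_assert; the struct and the 3-cacheline budget below are
illustrative stand-ins, not the actual io_kiocb.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <assert.h>

#define CACHELINE	64
#define HOT_LIMIT	(3 * CACHELINE)

struct fake_req {
	/* hot: touched on every submission/completion */
	uint64_t	flags;
	void		*file;
	void		*comp_list_next;
	int32_t		result;

	/* cold: only used on slow paths */
	char		slow_path_state[256];
};

/* compile-time check that the hot prefix fits in the first 3 cachelines */
static_assert(offsetof(struct fake_req, slow_path_state) <= HOT_LIMIT,
	      "hot fields spill past the third cacheline");

int main(void)
{
	printf("hot prefix ends at offset %zu (limit %d)\n",
	       offsetof(struct fake_req, slow_path_state), HOT_LIMIT);
	return 0;
}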
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3ddca031d7d5..c53e0f48dc69 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -868,18 +868,15 @@ struct io_kiocb {
struct percpu_ref *fixed_rsrc_refs;
/* used with ctx->iopoll_list with reads/writes */
- struct list_head inflight_entry;
+ struct io_wq_work_node comp_list;
struct io_task_work io_task_work;
/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
struct hlist_node hash_node;
struct async_poll *apoll;
- struct io_wq_work work;
- const struct cred *creds;
-
- struct io_wq_work_node comp_list;
-
/* store used ubuf, so we can prevent reloading */
struct io_mapped_ubuf *imu;
+ struct io_wq_work work;
+ const struct cred *creds;
};
struct io_tctx_node {
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 23/23] io_uring: comment why inline complete calls io_clean_op()
2021-09-24 16:31 [RFC][PATCHSET 00/23] rework/optimise submission+completion paths Pavel Begunkov
` (21 preceding siblings ...)
2021-09-24 16:32 ` [PATCH 22/23] io_uring: kill off ->inflight_entry field Pavel Begunkov
@ 2021-09-24 16:32 ` Pavel Begunkov
22 siblings, 0 replies; 24+ messages in thread
From: Pavel Begunkov @ 2021-09-24 16:32 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_req_complete_state() calls io_clean_op(), which may not be entirely
obvious, so leave a comment.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c53e0f48dc69..5d19016f663d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1824,6 +1824,7 @@ static void io_req_complete_state(struct io_kiocb *req, long res,
{
struct io_submit_state *state;
+ /* clean per-opcode space, because req->compl is aliased with it */
if (io_req_needs_clean(req))
io_clean_op(req);
req->result = res;
--
2.33.0
^ permalink raw reply related [flat|nested] 24+ messages in thread