* [PATCH v2 00/24] rework and optimise submission+completion paths
@ 2021-09-24 20:59 Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 01/24] io_uring: mark having different creds unlikely Pavel Begunkov
` (24 more replies)
0 siblings, 25 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
24 MIOPS vs 31.5, or ~30% win for fio/t/io_uring nops batching=32
Jens mentioned that with his standard test against Optane it gave
yet another +3% to throughput.
1-14 are about optimising the completion path:
- replaces lists with single linked lists
- kills 64 * 8B of caches in ctx
- adds some shuffling of iopoll bits
- list splice instead of per-req list_add in one place
- inlines io_req_free_batch() and other helpers
15-22: inlines __io_queue_sqe() so that the whole submission path
up to io_issue_sqe() is inlined, plus little tweaks
v2: rebase for-5.16/io_uring
multicqe_drain was hanging because it's a bit buggy, i.e. it doesn't
consider that requests may get punted, but the 24th patch is still
added to avoid it.
Pavel Begunkov (24):
io_uring: mark having different creds unlikely
io_uring: force_nonspin
io_uring: make io_do_iopoll return number of reqs
io_uring: use slist for completion batching
io_uring: remove allocation cache array
io-wq: add io_wq_work_node based stack
io_uring: replace list with stack for req caches
io_uring: split iopoll loop
io_uring: use single linked list for iopoll
io_uring: add a helper for batch free
io_uring: convert iopoll_completed to store_release
io_uring: optimise batch completion
io_uring: inline completion batching helpers
io_uring: don't pass tail into io_free_batch_list
io_uring: don't pass state to io_submit_state_end
io_uring: deduplicate io_queue_sqe() call sites
io_uring: remove drain_active check from hot path
io_uring: split slow path from io_queue_sqe
io_uring: inline hot path of __io_queue_sqe()
io_uring: reshuffle queue_sqe completion handling
io_uring: restructure submit sqes to_submit checks
io_uring: kill off ->inflight_entry field
io_uring: comment why inline complete calls io_clean_op()
io_uring: disable draining earlier
fs/io-wq.h | 60 +++++-
fs/io_uring.c | 508 +++++++++++++++++++++++---------------------------
2 files changed, 287 insertions(+), 281 deletions(-)
--
2.33.0
^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH v2 01/24] io_uring: mark having different creds unlikely
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 02/24] io_uring: force_nonspin Pavel Begunkov
` (23 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Hint the compiler that a request is unlikely to have creds different
from those of the current task. The current code generation is far from
ideal; hopefully the hint helps some compilers to remove duplicated jump
tables and the like.
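For reference, a minimal standalone sketch of the kind of hint being
added; likely()/unlikely() are redefined locally via __builtin_expect()
(as the kernel does), and the function below is purely illustrative:

#include <stdio.h>

/* essentially the kernel's definitions, redeclared for a standalone build */
#define likely(x)       __builtin_expect(!!(x), 1)
#define unlikely(x)     __builtin_expect(!!(x), 0)

static int issue(int creds_differ)
{
        /* the override branch is laid out out of line, keeping the
         * common path a straight fall-through */
        if (unlikely(creds_differ))
                return 1;       /* rare: override creds, issue, restore */
        return 0;               /* hot path */
}

int main(void)
{
        printf("%d %d\n", issue(0), issue(1));
        return 0;
}
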
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ad71c7ef7f6d..9e269fff3bbe 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6645,7 +6645,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
const struct cred *creds = NULL;
int ret;
- if ((req->flags & REQ_F_CREDS) && req->creds != current_cred())
+ if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
creds = override_creds(req->creds);
switch (req->opcode) {
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 02/24] io_uring: force_nonspin
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 01/24] io_uring: mark having different creds unlikely Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 03/24] io_uring: make io_do_iopoll return number of reqs Pavel Begunkov
` (22 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't really need to pass the number of requests to complete into
io_do_iopoll(); a flag saying whether to enforce non-spin mode is
enough. It should be straightforward, maybe except for
io_iopoll_check(). We pass !min there because, apart from the first
iteration, we never enter with the number of already reaped requests
larger than the specified @min, and on the first iteration nr_events is
0, so the final check is identical (e.g. with min == 0 the old
*nr_events < min test was always false and the new !min forces
non-spin, which is the same behaviour).
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9e269fff3bbe..b615fa7963ae 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2458,7 +2458,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
}
static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
- long min)
+ bool force_nonspin)
{
struct io_kiocb *req, *tmp;
LIST_HEAD(done);
@@ -2466,9 +2466,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
/*
* Only spin for completions if we don't have multiple devices hanging
- * off our complete list, and we're under the requested amount.
+ * off our complete list.
*/
- spin = !ctx->poll_multi_queue && *nr_events < min;
+ spin = !ctx->poll_multi_queue && !force_nonspin;
list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, inflight_entry) {
struct kiocb *kiocb = &req->rw.kiocb;
@@ -2516,7 +2516,7 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
while (!list_empty(&ctx->iopoll_list)) {
unsigned int nr_events = 0;
- io_do_iopoll(ctx, &nr_events, 0);
+ io_do_iopoll(ctx, &nr_events, true);
/* let it sleep and repeat later if can't complete a request */
if (nr_events == 0)
@@ -2578,7 +2578,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
list_empty(&ctx->iopoll_list))
break;
}
- ret = io_do_iopoll(ctx, &nr_events, min);
+ ret = io_do_iopoll(ctx, &nr_events, !min);
} while (!ret && nr_events < min && !need_resched());
out:
mutex_unlock(&ctx->uring_lock);
@@ -7354,7 +7354,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
mutex_lock(&ctx->uring_lock);
if (!list_empty(&ctx->iopoll_list))
- io_do_iopoll(ctx, &nr_events, 0);
+ io_do_iopoll(ctx, &nr_events, true);
/*
* Don't submit if refs are dying, good for io_uring_register(),
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 03/24] io_uring: make io_do_iopoll return number of reqs
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 01/24] io_uring: mark having different creds unlikely Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 02/24] io_uring: force_nonspin Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 04/24] io_uring: use slist for completion batching Pavel Begunkov
` (21 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Don't pass the nr_events pointer around but return the number directly;
it's less expensive than the pointer increments.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b615fa7963ae..9c14e9e722ba 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2430,8 +2430,7 @@ static inline bool io_run_task_work(void)
/*
* Find and free completed poll iocbs
*/
-static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
- struct list_head *done)
+static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
{
struct req_batch rb;
struct io_kiocb *req;
@@ -2446,7 +2445,6 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
__io_cqring_fill_event(ctx, req->user_data, req->result,
io_put_rw_kbuf(req));
- (*nr_events)++;
if (req_ref_put_and_test(req))
io_req_free_batch(&rb, req, &ctx->submit_state);
@@ -2457,11 +2455,11 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
io_req_free_batch_finish(ctx, &rb);
}
-static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
- bool force_nonspin)
+static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_kiocb *req, *tmp;
LIST_HEAD(done);
+ int nr_events = 0;
bool spin;
/*
@@ -2481,6 +2479,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
*/
if (READ_ONCE(req->iopoll_completed)) {
list_move_tail(&req->inflight_entry, &done);
+ nr_events++;
continue;
}
if (!list_empty(&done))
@@ -2493,14 +2492,16 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, unsigned int *nr_events,
spin = false;
/* iopoll may have completed current req */
- if (READ_ONCE(req->iopoll_completed))
+ if (READ_ONCE(req->iopoll_completed)) {
list_move_tail(&req->inflight_entry, &done);
+ nr_events++;
+ }
}
if (!list_empty(&done))
- io_iopoll_complete(ctx, nr_events, &done);
+ io_iopoll_complete(ctx, &done);
- return 0;
+ return nr_events;
}
/*
@@ -2514,12 +2515,8 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
mutex_lock(&ctx->uring_lock);
while (!list_empty(&ctx->iopoll_list)) {
- unsigned int nr_events = 0;
-
- io_do_iopoll(ctx, &nr_events, true);
-
/* let it sleep and repeat later if can't complete a request */
- if (nr_events == 0)
+ if (io_do_iopoll(ctx, true) == 0)
break;
/*
* Ensure we allow local-to-the-cpu processing to take place,
@@ -2578,8 +2575,12 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
list_empty(&ctx->iopoll_list))
break;
}
- ret = io_do_iopoll(ctx, &nr_events, !min);
- } while (!ret && nr_events < min && !need_resched());
+ ret = io_do_iopoll(ctx, !min);
+ if (ret < 0)
+ break;
+ nr_events += ret;
+ ret = 0;
+ } while (nr_events < min && !need_resched());
out:
mutex_unlock(&ctx->uring_lock);
return ret;
@@ -7346,7 +7347,6 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
if (!list_empty(&ctx->iopoll_list) || to_submit) {
- unsigned nr_events = 0;
const struct cred *creds = NULL;
if (ctx->sq_creds != current_cred())
@@ -7354,7 +7354,7 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
mutex_lock(&ctx->uring_lock);
if (!list_empty(&ctx->iopoll_list))
- io_do_iopoll(ctx, &nr_events, true);
+ io_do_iopoll(ctx, true);
/*
* Don't submit if refs are dying, good for io_uring_register(),
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 04/24] io_uring: use slist for completion batching
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (2 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 03/24] io_uring: make io_do_iopoll return number of reqs Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-26 6:57 ` Hao Xu
2021-09-24 20:59 ` [PATCH v2 05/24] io_uring: remove allocation cache array Pavel Begunkov
` (20 subsequent siblings)
24 siblings, 1 reply; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Currently we collect requests for completion batching in an array.
Replace it with a singly linked list. It's as fast as the array but
doesn't take as much space in ctx, and will be used in future patches.
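For illustration, a minimal userspace sketch of the pattern this moves
to: each request embeds an intrusive node, nodes are appended to a
per-context list, and container_of() recovers the request while
flushing. Struct and function names below are local to the sketch, not
the io_uring ones:

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

struct work_node {
        struct work_node *next;
};

struct work_list {
        struct work_node *first;
        struct work_node *last;
};

struct request {
        int result;
        struct work_node comp_list;     /* intrusive link, no allocation */
};

static void list_add_tail(struct work_node *node, struct work_list *list)
{
        node->next = NULL;
        if (!list->first)
                list->first = node;
        else
                list->last->next = node;
        list->last = node;
}

int main(void)
{
        struct request reqs[3] = {
                { .result = 10 }, { .result = 20 }, { .result = 30 },
        };
        struct work_list compl_reqs = { NULL, NULL };

        /* queueing side: add each inline-completed request to the list */
        for (int i = 0; i < 3; i++)
                list_add_tail(&reqs[i].comp_list, &compl_reqs);

        /* flushing side: walk the nodes and recover the owning requests */
        for (struct work_node *node = compl_reqs.first; node; node = node->next) {
                struct request *req = container_of(node, struct request, comp_list);

                printf("flush completion, res=%d\n", req->result);
        }
        return 0;
}
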
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 52 +++++++++++++++++++++++++--------------------------
1 file changed, 25 insertions(+), 27 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9c14e9e722ba..9a76c4f84311 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -322,8 +322,8 @@ struct io_submit_state {
/*
* Batch completion logic
*/
- struct io_kiocb *compl_reqs[IO_COMPL_BATCH];
- unsigned int compl_nr;
+ struct io_wq_work_list compl_reqs;
+
/* inline/task_work completion list, under ->uring_lock */
struct list_head free_list;
};
@@ -883,6 +883,8 @@ struct io_kiocb {
struct io_wq_work work;
const struct cred *creds;
+ struct io_wq_work_node comp_list;
+
/* store used ubuf, so we can prevent reloading */
struct io_mapped_ubuf *imu;
};
@@ -1169,7 +1171,7 @@ static inline void req_ref_get(struct io_kiocb *req)
static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
{
- if (ctx->submit_state.compl_nr)
+ if (!wq_list_empty(&ctx->submit_state.compl_reqs))
__io_submit_flush_completions(ctx);
}
@@ -1326,6 +1328,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_LIST_HEAD(&ctx->submit_state.free_list);
INIT_LIST_HEAD(&ctx->locked_free_list);
INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
+ INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
return ctx;
err:
kfree(ctx->dummy_ubuf);
@@ -1831,11 +1834,16 @@ static inline bool io_req_needs_clean(struct io_kiocb *req)
static void io_req_complete_state(struct io_kiocb *req, long res,
unsigned int cflags)
{
+ struct io_submit_state *state;
+
if (io_req_needs_clean(req))
io_clean_op(req);
req->result = res;
req->compl.cflags = cflags;
req->flags |= REQ_F_COMPLETE_INLINE;
+
+ state = &req->ctx->submit_state;
+ wq_list_add_tail(&req->comp_list, &state->compl_reqs);
}
static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
@@ -2324,13 +2332,14 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
__must_hold(&ctx->uring_lock)
{
+ struct io_wq_work_node *node, *prev;
struct io_submit_state *state = &ctx->submit_state;
- int i, nr = state->compl_nr;
struct req_batch rb;
spin_lock(&ctx->completion_lock);
- for (i = 0; i < nr; i++) {
- struct io_kiocb *req = state->compl_reqs[i];
+ wq_list_for_each(node, prev, &state->compl_reqs) {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
__io_cqring_fill_event(ctx, req->user_data, req->result,
req->compl.cflags);
@@ -2340,15 +2349,18 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
io_cqring_ev_posted(ctx);
io_init_req_batch(&rb);
- for (i = 0; i < nr; i++) {
- struct io_kiocb *req = state->compl_reqs[i];
+ node = state->compl_reqs.first;
+ do {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
+ node = req->comp_list.next;
if (req_ref_put_and_test(req))
io_req_free_batch(&rb, req, &ctx->submit_state);
- }
+ } while (node);
io_req_free_batch_finish(ctx, &rb);
- state->compl_nr = 0;
+ INIT_WQ_LIST(&state->compl_reqs);
}
/*
@@ -2668,17 +2680,10 @@ static void io_req_task_complete(struct io_kiocb *req, bool *locked)
unsigned int cflags = io_put_rw_kbuf(req);
long res = req->result;
- if (*locked) {
- struct io_ring_ctx *ctx = req->ctx;
- struct io_submit_state *state = &ctx->submit_state;
-
+ if (*locked)
io_req_complete_state(req, res, cflags);
- state->compl_reqs[state->compl_nr++] = req;
- if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
- io_submit_flush_completions(ctx);
- } else {
+ else
io_req_complete_post(req, res, cflags);
- }
}
static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
@@ -6969,15 +6974,8 @@ static void __io_queue_sqe(struct io_kiocb *req)
* doesn't support non-blocking read/write attempts
*/
if (likely(!ret)) {
- if (req->flags & REQ_F_COMPLETE_INLINE) {
- struct io_ring_ctx *ctx = req->ctx;
- struct io_submit_state *state = &ctx->submit_state;
-
- state->compl_reqs[state->compl_nr++] = req;
- if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
- io_submit_flush_completions(ctx);
+ if (req->flags & REQ_F_COMPLETE_INLINE)
return;
- }
linked_timeout = io_prep_linked_timeout(req);
if (linked_timeout)
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 05/24] io_uring: remove allocation cache array
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (3 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 04/24] io_uring: use slist for completion batching Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 06/24] io-wq: add io_wq_work_node based stack Pavel Begunkov
` (19 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
We have several layers of request allocation caching; remove the last
one, which is the submit->reqs array, and always use submit->free_list
instead.
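A rough userspace sketch of the resulting refill logic, where
batch_alloc() is a stand-in for kmem_cache_alloc_bulk() (all-or-nothing)
and a plain pointer stack stands in for the free list:

#include <stdio.h>
#include <stdlib.h>

#define ALLOC_BATCH 8

struct req {
        struct req *next;       /* links cached requests together */
        int data;
};

static struct req *free_list;   /* per-context cache of free requests */

/* stand-in for kmem_cache_alloc_bulk(): fill the whole array or return 0 */
static int batch_alloc(struct req **out, int nr)
{
        for (int i = 0; i < nr; i++) {
                out[i] = malloc(sizeof(**out));
                if (!out[i]) {
                        while (i--)
                                free(out[i]);
                        return 0;
                }
        }
        return nr;
}

static struct req *alloc_req(void)
{
        struct req *req;

        if (!free_list) {
                struct req *batch[ALLOC_BATCH];
                int got = batch_alloc(batch, ALLOC_BATCH);

                /* bulk alloc is all-or-nothing, retry a single alloc */
                if (!got) {
                        batch[0] = malloc(sizeof(*batch[0]));
                        if (!batch[0])
                                return NULL;
                        got = 1;
                }
                /* refill the cache, then serve from it as usual */
                for (int i = 0; i < got; i++) {
                        batch[i]->next = free_list;
                        free_list = batch[i];
                }
        }
        req = free_list;
        free_list = req->next;
        return req;
}

int main(void)
{
        struct req *req = alloc_req();

        if (req) {
                req->data = 42;
                printf("allocated request, data=%d\n", req->data);
        }
        return 0;
}
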
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 60 +++++++++++++++------------------------------------
1 file changed, 17 insertions(+), 43 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9a76c4f84311..9d8d79104d75 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -310,12 +310,6 @@ struct io_submit_state {
struct blk_plug plug;
struct io_submit_link link;
- /*
- * io_kiocb alloc cache
- */
- void *reqs[IO_REQ_CACHE_SIZE];
- unsigned int free_reqs;
-
bool plug_started;
bool need_plug;
@@ -1903,7 +1897,6 @@ static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
{
struct io_submit_state *state = &ctx->submit_state;
- int nr;
/*
* If we have more than a batch's worth of requests in our IRQ side
@@ -1912,20 +1905,7 @@ static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
*/
if (READ_ONCE(ctx->locked_free_nr) > IO_COMPL_BATCH)
io_flush_cached_locked_reqs(ctx, state);
-
- nr = state->free_reqs;
- while (!list_empty(&state->free_list)) {
- struct io_kiocb *req = list_first_entry(&state->free_list,
- struct io_kiocb, inflight_entry);
-
- list_del(&req->inflight_entry);
- state->reqs[nr++] = req;
- if (nr == ARRAY_SIZE(state->reqs))
- break;
- }
-
- state->free_reqs = nr;
- return nr != 0;
+ return !list_empty(&state->free_list);
}
/*
@@ -1939,33 +1919,36 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
{
struct io_submit_state *state = &ctx->submit_state;
gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
+ void *reqs[IO_REQ_ALLOC_BATCH];
+ struct io_kiocb *req;
int ret, i;
- BUILD_BUG_ON(ARRAY_SIZE(state->reqs) < IO_REQ_ALLOC_BATCH);
-
- if (likely(state->free_reqs || io_flush_cached_reqs(ctx)))
+ if (likely(!list_empty(&state->free_list) || io_flush_cached_reqs(ctx)))
goto got_req;
- ret = kmem_cache_alloc_bulk(req_cachep, gfp, IO_REQ_ALLOC_BATCH,
- state->reqs);
+ ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
/*
* Bulk alloc is all-or-nothing. If we fail to get a batch,
* retry single alloc to be on the safe side.
*/
if (unlikely(ret <= 0)) {
- state->reqs[0] = kmem_cache_alloc(req_cachep, gfp);
- if (!state->reqs[0])
+ reqs[0] = kmem_cache_alloc(req_cachep, gfp);
+ if (!reqs[0])
return NULL;
ret = 1;
}
- for (i = 0; i < ret; i++)
- io_preinit_req(state->reqs[i], ctx);
- state->free_reqs = ret;
+ for (i = 0; i < ret; i++) {
+ req = reqs[i];
+
+ io_preinit_req(req, ctx);
+ list_add(&req->inflight_entry, &state->free_list);
+ }
got_req:
- state->free_reqs--;
- return state->reqs[state->free_reqs];
+ req = list_first_entry(&state->free_list, struct io_kiocb, inflight_entry);
+ list_del(&req->inflight_entry);
+ return req;
}
static inline void io_put_file(struct file *file)
@@ -2323,10 +2306,7 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
rb->task_refs++;
rb->ctx_refs++;
- if (state->free_reqs != ARRAY_SIZE(state->reqs))
- state->reqs[state->free_reqs++] = req;
- else
- list_add(&req->inflight_entry, &state->free_list);
+ list_add(&req->inflight_entry, &state->free_list);
}
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
@@ -9235,12 +9215,6 @@ static void io_req_caches_free(struct io_ring_ctx *ctx)
struct io_submit_state *state = &ctx->submit_state;
mutex_lock(&ctx->uring_lock);
-
- if (state->free_reqs) {
- kmem_cache_free_bulk(req_cachep, state->free_reqs, state->reqs);
- state->free_reqs = 0;
- }
-
io_flush_cached_locked_reqs(ctx, state);
io_req_cache_free(&state->free_list);
mutex_unlock(&ctx->uring_lock);
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 06/24] io-wq: add io_wq_work_node based stack
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (4 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 05/24] io_uring: remove allocation cache array Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 07/24] io_uring: replace list with stack for req caches Pavel Begunkov
` (18 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Apart from just using lists (i.e. io_wq_work_list), we also want to have
stacks, which are a bit faster, and have some interoperability between
them. Add a stack implementation based on io_wq_work_node and some
helpers.
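A minimal standalone sketch of the stack half: a dummy head node whose
->next is the top of the stack, mirroring the wq_stack_add_head() /
wq_stack_extract() shape (unlike the kernel helper, the pop below also
tolerates an empty stack):

#include <stdio.h>

struct work_node {
        struct work_node *next;
};

/* LIFO built on the same node type the FIFO list uses */
static void stack_push(struct work_node *node, struct work_node *stack)
{
        node->next = stack->next;
        stack->next = node;
}

static struct work_node *stack_pop(struct work_node *stack)
{
        struct work_node *node = stack->next;

        if (node)
                stack->next = node->next;
        return node;
}

int main(void)
{
        struct work_node stack = { .next = NULL };
        struct work_node a, b, c;
        struct work_node *node;

        stack_push(&a, &stack);
        stack_push(&b, &stack);
        stack_push(&c, &stack);

        /* comes back in reverse order of insertion: c, b, a */
        while ((node = stack_pop(&stack)))
                printf("popped %s\n",
                       node == &a ? "a" : node == &b ? "b" : "c");
        return 0;
}
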
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 50 insertions(+), 7 deletions(-)
diff --git a/fs/io-wq.h b/fs/io-wq.h
index bf5c4c533760..c870062105d1 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -29,6 +29,15 @@ struct io_wq_work_list {
struct io_wq_work_node *last;
};
+#define wq_list_for_each(pos, prv, head) \
+ for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+
+#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
+#define INIT_WQ_LIST(list) do { \
+ (list)->first = NULL; \
+ (list)->last = NULL; \
+} while (0)
+
static inline void wq_list_add_after(struct io_wq_work_node *node,
struct io_wq_work_node *pos,
struct io_wq_work_list *list)
@@ -54,6 +63,15 @@ static inline void wq_list_add_tail(struct io_wq_work_node *node,
}
}
+static inline void wq_list_add_head(struct io_wq_work_node *node,
+ struct io_wq_work_list *list)
+{
+ node->next = list->first;
+ if (!node->next)
+ list->last = node;
+ WRITE_ONCE(list->first, node);
+}
+
static inline void wq_list_cut(struct io_wq_work_list *list,
struct io_wq_work_node *last,
struct io_wq_work_node *prev)
@@ -69,6 +87,31 @@ static inline void wq_list_cut(struct io_wq_work_list *list,
last->next = NULL;
}
+static inline void __wq_list_splice(struct io_wq_work_list *list,
+ struct io_wq_work_node *to)
+{
+ list->last->next = to->next;
+ to->next = list->first;
+ INIT_WQ_LIST(list);
+}
+
+static inline bool wq_list_splice(struct io_wq_work_list *list,
+ struct io_wq_work_node *to)
+{
+ if (!wq_list_empty(list)) {
+ __wq_list_splice(list, to);
+ return true;
+ }
+ return false;
+}
+
+static inline void wq_stack_add_head(struct io_wq_work_node *node,
+ struct io_wq_work_node *stack)
+{
+ node->next = stack->next;
+ stack->next = node;
+}
+
static inline void wq_list_del(struct io_wq_work_list *list,
struct io_wq_work_node *node,
struct io_wq_work_node *prev)
@@ -76,14 +119,14 @@ static inline void wq_list_del(struct io_wq_work_list *list,
wq_list_cut(list, node, prev);
}
-#define wq_list_for_each(pos, prv, head) \
- for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+static inline
+struct io_wq_work_node *wq_stack_extract(struct io_wq_work_node *stack)
+{
+ struct io_wq_work_node *node = stack->next;
-#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
-#define INIT_WQ_LIST(list) do { \
- (list)->first = NULL; \
- (list)->last = NULL; \
-} while (0)
+ stack->next = node->next;
+ return node;
+}
struct io_wq_work {
struct io_wq_work_node list;
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 07/24] io_uring: replace list with stack for req caches
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (5 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 06/24] io-wq: add io_wq_work_node based stack Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 08/24] io_uring: split iopoll loop Pavel Begunkov
` (17 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Replace the struct list_head free_list, which serves for caching
requests, with a singly linked stack, which is faster.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 51 ++++++++++++++++++++++++---------------------------
1 file changed, 24 insertions(+), 27 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9d8d79104d75..e29f75bc69ae 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -319,7 +319,7 @@ struct io_submit_state {
struct io_wq_work_list compl_reqs;
/* inline/task_work completion list, under ->uring_lock */
- struct list_head free_list;
+ struct io_wq_work_node free_list;
};
struct io_ring_ctx {
@@ -379,7 +379,7 @@ struct io_ring_ctx {
} ____cacheline_aligned_in_smp;
/* IRQ completion list, under ->completion_lock */
- struct list_head locked_free_list;
+ struct io_wq_work_list locked_free_list;
unsigned int locked_free_nr;
const struct cred *sq_creds; /* cred used for __io_sq_thread() */
@@ -1319,8 +1319,8 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
INIT_DELAYED_WORK(&ctx->rsrc_put_work, io_rsrc_put_work);
init_llist_head(&ctx->rsrc_put_llist);
INIT_LIST_HEAD(&ctx->tctx_list);
- INIT_LIST_HEAD(&ctx->submit_state.free_list);
- INIT_LIST_HEAD(&ctx->locked_free_list);
+ ctx->submit_state.free_list.next = NULL;
+ INIT_WQ_LIST(&ctx->locked_free_list);
INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
return ctx;
@@ -1811,7 +1811,7 @@ static void io_req_complete_post(struct io_kiocb *req, long res,
}
io_dismantle_req(req);
io_put_task(req->task, 1);
- list_add(&req->inflight_entry, &ctx->locked_free_list);
+ wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
ctx->locked_free_nr++;
percpu_ref_put(&ctx->refs);
}
@@ -1888,7 +1888,7 @@ static void io_flush_cached_locked_reqs(struct io_ring_ctx *ctx,
struct io_submit_state *state)
{
spin_lock(&ctx->completion_lock);
- list_splice_init(&ctx->locked_free_list, &state->free_list);
+ wq_list_splice(&ctx->locked_free_list, &state->free_list);
ctx->locked_free_nr = 0;
spin_unlock(&ctx->completion_lock);
}
@@ -1905,7 +1905,7 @@ static bool io_flush_cached_reqs(struct io_ring_ctx *ctx)
*/
if (READ_ONCE(ctx->locked_free_nr) > IO_COMPL_BATCH)
io_flush_cached_locked_reqs(ctx, state);
- return !list_empty(&state->free_list);
+ return !!state->free_list.next;
}
/*
@@ -1920,10 +1920,11 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
struct io_submit_state *state = &ctx->submit_state;
gfp_t gfp = GFP_KERNEL | __GFP_NOWARN;
void *reqs[IO_REQ_ALLOC_BATCH];
+ struct io_wq_work_node *node;
struct io_kiocb *req;
int ret, i;
- if (likely(!list_empty(&state->free_list) || io_flush_cached_reqs(ctx)))
+ if (likely(state->free_list.next || io_flush_cached_reqs(ctx)))
goto got_req;
ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
@@ -1943,12 +1944,11 @@ static struct io_kiocb *io_alloc_req(struct io_ring_ctx *ctx)
req = reqs[i];
io_preinit_req(req, ctx);
- list_add(&req->inflight_entry, &state->free_list);
+ wq_stack_add_head(&req->comp_list, &state->free_list);
}
got_req:
- req = list_first_entry(&state->free_list, struct io_kiocb, inflight_entry);
- list_del(&req->inflight_entry);
- return req;
+ node = wq_stack_extract(&state->free_list);
+ return container_of(node, struct io_kiocb, comp_list);
}
static inline void io_put_file(struct file *file)
@@ -1981,7 +1981,7 @@ static void __io_free_req(struct io_kiocb *req)
io_put_task(req->task, 1);
spin_lock(&ctx->completion_lock);
- list_add(&req->inflight_entry, &ctx->locked_free_list);
+ wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
ctx->locked_free_nr++;
spin_unlock(&ctx->completion_lock);
@@ -2305,8 +2305,7 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
}
rb->task_refs++;
rb->ctx_refs++;
-
- list_add(&req->inflight_entry, &state->free_list);
+ wq_stack_add_head(&req->comp_list, &state->free_list);
}
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
@@ -7268,7 +7267,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
}
sqe = io_get_sqe(ctx);
if (unlikely(!sqe)) {
- list_add(&req->inflight_entry, &ctx->submit_state.free_list);
+ wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
break;
}
/* will complete beyond this point, count as submitted */
@@ -9200,23 +9199,21 @@ static void io_destroy_buffers(struct io_ring_ctx *ctx)
}
}
-static void io_req_cache_free(struct list_head *list)
-{
- struct io_kiocb *req, *nxt;
-
- list_for_each_entry_safe(req, nxt, list, inflight_entry) {
- list_del(&req->inflight_entry);
- kmem_cache_free(req_cachep, req);
- }
-}
-
static void io_req_caches_free(struct io_ring_ctx *ctx)
{
struct io_submit_state *state = &ctx->submit_state;
mutex_lock(&ctx->uring_lock);
io_flush_cached_locked_reqs(ctx, state);
- io_req_cache_free(&state->free_list);
+
+ while (state->free_list.next) {
+ struct io_wq_work_node *node;
+ struct io_kiocb *req;
+
+ node = wq_stack_extract(&state->free_list);
+ req = container_of(node, struct io_kiocb, comp_list);
+ kmem_cache_free(req_cachep, req);
+ }
mutex_unlock(&ctx->uring_lock);
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 08/24] io_uring: split iopoll loop
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (6 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 07/24] io_uring: replace list with stack for req caches Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 09/24] io_uring: use single linked list for iopoll Pavel Begunkov
` (16 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
The main loop of io_do_iopoll() iterates and does ->iopoll() until it
meets the first completed request, then it continues from that position
and splices off completed requests to pass them through
io_iopoll_complete(). Split the loop in two for clarity: one part
iopolling, the other reaping completed requests from the list.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index e29f75bc69ae..0e683d0f5b73 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2446,6 +2446,12 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
io_req_free_batch_finish(ctx, &rb);
}
+/* same as "continue" but starts from the pos, not next to it */
+#define list_for_each_entry_safe_resume(pos, n, head, member) \
+ for (n = list_next_entry(pos, member); \
+ !list_entry_is_head(pos, head, member); \
+ pos = n, n = list_next_entry(n, member))
+
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_kiocb *req, *tmp;
@@ -2459,7 +2465,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
*/
spin = !ctx->poll_multi_queue && !force_nonspin;
- list_for_each_entry_safe(req, tmp, &ctx->iopoll_list, inflight_entry) {
+ list_for_each_entry(req, &ctx->iopoll_list, inflight_entry) {
struct kiocb *kiocb = &req->rw.kiocb;
int ret;
@@ -2468,12 +2474,7 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
* If we find a request that requires polling, break out
* and complete those lists first, if we have entries there.
*/
- if (READ_ONCE(req->iopoll_completed)) {
- list_move_tail(&req->inflight_entry, &done);
- nr_events++;
- continue;
- }
- if (!list_empty(&done))
+ if (READ_ONCE(req->iopoll_completed))
break;
ret = kiocb->ki_filp->f_op->iopoll(kiocb, spin);
@@ -2483,15 +2484,20 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
spin = false;
/* iopoll may have completed current req */
- if (READ_ONCE(req->iopoll_completed)) {
- list_move_tail(&req->inflight_entry, &done);
- nr_events++;
- }
+ if (READ_ONCE(req->iopoll_completed))
+ break;
}
- if (!list_empty(&done))
- io_iopoll_complete(ctx, &done);
+ list_for_each_entry_safe_resume(req, tmp, &ctx->iopoll_list,
+ inflight_entry) {
+ if (!READ_ONCE(req->iopoll_completed))
+ break;
+ list_move_tail(&req->inflight_entry, &done);
+ nr_events++;
+ }
+ if (nr_events)
+ io_iopoll_complete(ctx, &done);
return nr_events;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 09/24] io_uring: use single linked list for iopoll
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (7 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 08/24] io_uring: split iopoll loop Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 10/24] io_uring: add a helper for batch free Pavel Begunkov
` (15 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Use single linked lists for keeping iopoll requests: it takes less
space, may be faster, but mostly will be of benefit for further patches.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io-wq.h | 3 +++
fs/io_uring.c | 53 ++++++++++++++++++++++++++-------------------------
2 files changed, 30 insertions(+), 26 deletions(-)
diff --git a/fs/io-wq.h b/fs/io-wq.h
index c870062105d1..87ba6a733630 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -32,6 +32,9 @@ struct io_wq_work_list {
#define wq_list_for_each(pos, prv, head) \
for (pos = (head)->first, prv = NULL; pos; prv = pos, pos = (pos)->next)
+#define wq_list_for_each_resume(pos, prv) \
+ for (; pos; prv = pos, pos = (pos)->next)
+
#define wq_list_empty(list) (READ_ONCE((list)->first) == NULL)
#define INIT_WQ_LIST(list) do { \
(list)->first = NULL; \
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0e683d0f5b73..205127394649 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -413,7 +413,7 @@ struct io_ring_ctx {
* For SQPOLL, only the single threaded io_sq_thread() will
* manipulate the list, hence no extra locking is needed there.
*/
- struct list_head iopoll_list;
+ struct io_wq_work_list iopoll_list;
struct hlist_head *cancel_hash;
unsigned cancel_hash_bits;
bool poll_multi_queue;
@@ -1310,7 +1310,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
init_waitqueue_head(&ctx->cq_wait);
spin_lock_init(&ctx->completion_lock);
spin_lock_init(&ctx->timeout_lock);
- INIT_LIST_HEAD(&ctx->iopoll_list);
+ INIT_WQ_LIST(&ctx->iopoll_list);
INIT_LIST_HEAD(&ctx->defer_list);
INIT_LIST_HEAD(&ctx->timeout_list);
INIT_LIST_HEAD(&ctx->ltimeout_list);
@@ -2446,15 +2446,9 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
io_req_free_batch_finish(ctx, &rb);
}
-/* same as "continue" but starts from the pos, not next to it */
-#define list_for_each_entry_safe_resume(pos, n, head, member) \
- for (n = list_next_entry(pos, member); \
- !list_entry_is_head(pos, head, member); \
- pos = n, n = list_next_entry(n, member))
-
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
- struct io_kiocb *req, *tmp;
+ struct io_wq_work_node *pos, *start, *prev;
LIST_HEAD(done);
int nr_events = 0;
bool spin;
@@ -2465,7 +2459,8 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
*/
spin = !ctx->poll_multi_queue && !force_nonspin;
- list_for_each_entry(req, &ctx->iopoll_list, inflight_entry) {
+ wq_list_for_each(pos, start, &ctx->iopoll_list) {
+ struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
struct kiocb *kiocb = &req->rw.kiocb;
int ret;
@@ -2488,14 +2483,20 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
break;
}
- list_for_each_entry_safe_resume(req, tmp, &ctx->iopoll_list,
- inflight_entry) {
+ if (!pos)
+ return 0;
+
+ prev = start;
+ wq_list_for_each_resume(pos, prev) {
+ struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
+
if (!READ_ONCE(req->iopoll_completed))
break;
- list_move_tail(&req->inflight_entry, &done);
+ list_add_tail(&req->inflight_entry, &done);
nr_events++;
}
+ wq_list_cut(&ctx->iopoll_list, prev, start);
if (nr_events)
io_iopoll_complete(ctx, &done);
return nr_events;
@@ -2511,7 +2512,7 @@ static void io_iopoll_try_reap_events(struct io_ring_ctx *ctx)
return;
mutex_lock(&ctx->uring_lock);
- while (!list_empty(&ctx->iopoll_list)) {
+ while (!wq_list_empty(&ctx->iopoll_list)) {
/* let it sleep and repeat later if can't complete a request */
if (io_do_iopoll(ctx, true) == 0)
break;
@@ -2560,7 +2561,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
* forever, while the workqueue is stuck trying to acquire the
* very same mutex.
*/
- if (list_empty(&ctx->iopoll_list)) {
+ if (wq_list_empty(&ctx->iopoll_list)) {
u32 tail = ctx->cached_cq_tail;
mutex_unlock(&ctx->uring_lock);
@@ -2569,7 +2570,7 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
/* some requests don't go through iopoll_list */
if (tail != ctx->cached_cq_tail ||
- list_empty(&ctx->iopoll_list))
+ wq_list_empty(&ctx->iopoll_list))
break;
}
ret = io_do_iopoll(ctx, !min);
@@ -2729,14 +2730,14 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
* how we do polling eventually, not spinning if we're on potentially
* different devices.
*/
- if (list_empty(&ctx->iopoll_list)) {
+ if (wq_list_empty(&ctx->iopoll_list)) {
ctx->poll_multi_queue = false;
} else if (!ctx->poll_multi_queue) {
struct io_kiocb *list_req;
unsigned int queue_num0, queue_num1;
- list_req = list_first_entry(&ctx->iopoll_list, struct io_kiocb,
- inflight_entry);
+ list_req = container_of(ctx->iopoll_list.first, struct io_kiocb,
+ comp_list);
if (list_req->file != req->file) {
ctx->poll_multi_queue = true;
@@ -2753,9 +2754,9 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
* it to the front so we find it first.
*/
if (READ_ONCE(req->iopoll_completed))
- list_add(&req->inflight_entry, &ctx->iopoll_list);
+ wq_list_add_head(&req->comp_list, &ctx->iopoll_list);
else
- list_add_tail(&req->inflight_entry, &ctx->iopoll_list);
+ wq_list_add_tail(&req->comp_list, &ctx->iopoll_list);
if (unlikely(in_async)) {
/*
@@ -7329,14 +7330,14 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, bool cap_entries)
if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE)
to_submit = IORING_SQPOLL_CAP_ENTRIES_VALUE;
- if (!list_empty(&ctx->iopoll_list) || to_submit) {
+ if (!wq_list_empty(&ctx->iopoll_list) || to_submit) {
const struct cred *creds = NULL;
if (ctx->sq_creds != current_cred())
creds = override_creds(ctx->sq_creds);
mutex_lock(&ctx->uring_lock);
- if (!list_empty(&ctx->iopoll_list))
+ if (!wq_list_empty(&ctx->iopoll_list))
io_do_iopoll(ctx, true);
/*
@@ -7414,7 +7415,7 @@ static int io_sq_thread(void *data)
list_for_each_entry(ctx, &sqd->ctx_list, sqd_list) {
int ret = __io_sq_thread(ctx, cap_entries);
- if (!sqt_spin && (ret > 0 || !list_empty(&ctx->iopoll_list)))
+ if (!sqt_spin && (ret > 0 || !wq_list_empty(&ctx->iopoll_list)))
sqt_spin = true;
}
if (io_run_task_work())
@@ -7435,7 +7436,7 @@ static int io_sq_thread(void *data)
io_ring_set_wakeup_flag(ctx);
if ((ctx->flags & IORING_SETUP_IOPOLL) &&
- !list_empty_careful(&ctx->iopoll_list)) {
+ !wq_list_empty(&ctx->iopoll_list)) {
needs_sched = false;
break;
}
@@ -9597,7 +9598,7 @@ static void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
/* SQPOLL thread does its own polling */
if ((!(ctx->flags & IORING_SETUP_SQPOLL) && cancel_all) ||
(ctx->sq_data && ctx->sq_data->thread == current)) {
- while (!list_empty_careful(&ctx->iopoll_list)) {
+ while (!wq_list_empty(&ctx->iopoll_list)) {
io_iopoll_try_reap_events(ctx);
ret = true;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 10/24] io_uring: add a helper for batch free
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (8 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 09/24] io_uring: use single linked list for iopoll Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-26 3:36 ` Hao Xu
2021-09-24 20:59 ` [PATCH v2 11/24] io_uring: convert iopoll_completed to store_release Pavel Begunkov
` (14 subsequent siblings)
24 siblings, 1 reply; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Add a helper io_free_batch_list(), which takes a single linked list and
puts/frees all requests from it in an efficient manner. Will be reused
later.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 205127394649..ad8af05af4bc 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2308,12 +2308,31 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
wq_stack_add_head(&req->comp_list, &state->free_list);
}
+static void io_free_batch_list(struct io_ring_ctx *ctx,
+ struct io_wq_work_list *list)
+ __must_hold(&ctx->uring_lock)
+{
+ struct io_wq_work_node *node;
+ struct req_batch rb;
+
+ io_init_req_batch(&rb);
+ node = list->first;
+ do {
+ struct io_kiocb *req = container_of(node, struct io_kiocb,
+ comp_list);
+
+ node = req->comp_list.next;
+ if (req_ref_put_and_test(req))
+ io_req_free_batch(&rb, req, &ctx->submit_state);
+ } while (node);
+ io_req_free_batch_finish(ctx, &rb);
+}
+
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
__must_hold(&ctx->uring_lock)
{
struct io_wq_work_node *node, *prev;
struct io_submit_state *state = &ctx->submit_state;
- struct req_batch rb;
spin_lock(&ctx->completion_lock);
wq_list_for_each(node, prev, &state->compl_reqs) {
@@ -2327,18 +2346,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
- io_init_req_batch(&rb);
- node = state->compl_reqs.first;
- do {
- struct io_kiocb *req = container_of(node, struct io_kiocb,
- comp_list);
-
- node = req->comp_list.next;
- if (req_ref_put_and_test(req))
- io_req_free_batch(&rb, req, &ctx->submit_state);
- } while (node);
-
- io_req_free_batch_finish(ctx, &rb);
+ io_free_batch_list(ctx, &state->compl_reqs);
INIT_WQ_LIST(&state->compl_reqs);
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 11/24] io_uring: convert iopoll_completed to store_release
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (9 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 10/24] io_uring: add a helper for batch free Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 12/24] io_uring: optimise batch completion Pavel Begunkov
` (13 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Bart Van Assche
Convert the explicit barriers around iopoll_completed to
smp_store_release() on the completion side and smp_load_acquire() on the
reaping side, which replaces a single smp_rmb() with a per-request
smp_load_acquire(); neither implies any extra CPU ordering on x86. Use
READ_ONCE() as usual where it doesn't matter. Use it to move filling
CQEs by iopoll earlier; that will be necessary to avoid traversing the
list one extra time in the future.
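The ordering contract being relied on, sketched with C11 atomics in
userspace: memory_order_release/memory_order_acquire play the role of
smp_store_release()/smp_load_acquire(), and the struct, field and
function names are only illustrative:

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

struct req {
        long result;                    /* plain data written before publishing */
        atomic_int iopoll_completed;    /* flag that publishes it */
};

static struct req req;

static void *completion_side(void *arg)
{
        (void)arg;
        req.result = 4096;
        /* release: the ->result store above is ordered before the flag */
        atomic_store_explicit(&req.iopoll_completed, 1, memory_order_release);
        return NULL;
}

int main(void)
{
        pthread_t thread;

        pthread_create(&thread, NULL, completion_side, NULL);

        /* acquire: once the flag reads 1, ->result is guaranteed visible */
        while (!atomic_load_explicit(&req.iopoll_completed, memory_order_acquire))
                ;
        printf("reaped, result=%ld\n", req.result);

        pthread_join(thread, NULL);
        return 0;
}
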
Suggested-by: Bart Van Assche <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ad8af05af4bc..cf7fcbdb5563 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2434,17 +2434,11 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
struct req_batch rb;
struct io_kiocb *req;
- /* order with ->result store in io_complete_rw_iopoll() */
- smp_rmb();
-
io_init_req_batch(&rb);
while (!list_empty(done)) {
req = list_first_entry(done, struct io_kiocb, inflight_entry);
list_del(&req->inflight_entry);
- __io_cqring_fill_event(ctx, req->user_data, req->result,
- io_put_rw_kbuf(req));
-
if (req_ref_put_and_test(req))
io_req_free_batch(&rb, req, &ctx->submit_state);
}
@@ -2498,8 +2492,12 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
wq_list_for_each_resume(pos, prev) {
struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
- if (!READ_ONCE(req->iopoll_completed))
+ /* order with io_complete_rw_iopoll(), e.g. ->result updates */
+ if (!smp_load_acquire(&req->iopoll_completed))
break;
+ __io_cqring_fill_event(ctx, req->user_data, req->result,
+ io_put_rw_kbuf(req));
+
list_add_tail(&req->inflight_entry, &done);
nr_events++;
}
@@ -2712,10 +2710,9 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
}
}
- WRITE_ONCE(req->result, res);
- /* order with io_iopoll_complete() checking ->result */
- smp_wmb();
- WRITE_ONCE(req->iopoll_completed, 1);
+ req->result = res;
+ /* order with io_iopoll_complete() checking ->iopoll_completed */
+ smp_store_release(&req->iopoll_completed, 1);
}
/*
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 12/24] io_uring: optimise batch completion
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (10 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 11/24] io_uring: convert iopoll_completed to store_release Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
[not found] ` <CAFUsyfLSXMvd_MBAp83qriW7LD=bg2=25TC4e_X4oMO1atoPYg@mail.gmail.com>
2021-09-24 20:59 ` [PATCH v2 13/24] io_uring: inline completion batching helpers Pavel Begunkov
` (12 subsequent siblings)
24 siblings, 1 reply; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
First, convert the rest of the iopoll bits to single linked lists, and
also replace the per-request list_add_tail() with splicing off a part of
the slist. With that, use io_free_batch_list() to put/free requests. The
main advantage is that it's now the only user of struct req_batch and
friends, and so they can be inlined. The main overhead there was the
per-request call to the not-inlined io_req_free_batch(), which is
expensive enough.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 38 ++++++++++----------------------------
1 file changed, 10 insertions(+), 28 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index cf7fcbdb5563..c9588519e04a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2426,32 +2426,10 @@ static inline bool io_run_task_work(void)
return false;
}
-/*
- * Find and free completed poll iocbs
- */
-static void io_iopoll_complete(struct io_ring_ctx *ctx, struct list_head *done)
-{
- struct req_batch rb;
- struct io_kiocb *req;
-
- io_init_req_batch(&rb);
- while (!list_empty(done)) {
- req = list_first_entry(done, struct io_kiocb, inflight_entry);
- list_del(&req->inflight_entry);
-
- if (req_ref_put_and_test(req))
- io_req_free_batch(&rb, req, &ctx->submit_state);
- }
-
- io_commit_cqring(ctx);
- io_cqring_ev_posted_iopoll(ctx);
- io_req_free_batch_finish(ctx, &rb);
-}
-
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_wq_work_node *pos, *start, *prev;
- LIST_HEAD(done);
+ struct io_wq_work_list list;
int nr_events = 0;
bool spin;
@@ -2496,15 +2474,19 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
if (!smp_load_acquire(&req->iopoll_completed))
break;
__io_cqring_fill_event(ctx, req->user_data, req->result,
- io_put_rw_kbuf(req));
-
- list_add_tail(&req->inflight_entry, &done);
+ io_put_rw_kbuf(req));
nr_events++;
}
+ if (unlikely(!nr_events))
+ return 0;
+
+ io_commit_cqring(ctx);
+ io_cqring_ev_posted_iopoll(ctx);
+ list.first = start ? start->next : ctx->iopoll_list.first;
+ list.last = prev;
wq_list_cut(&ctx->iopoll_list, prev, start);
- if (nr_events)
- io_iopoll_complete(ctx, &done);
+ io_free_batch_list(ctx, &list);
return nr_events;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 13/24] io_uring: inline completion batching helpers
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (11 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 12/24] io_uring: optimise batch completion Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 14/24] io_uring: don't pass tail into io_free_batch_list Pavel Begunkov
` (11 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
We now have a single function for batched put of requests, just inline
struct req_batch and all related helpers into it.
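For illustration, the batching idea in a standalone sketch: one put per
run of same-task requests plus a single ctx put at the end, instead of
one call per request; put_task_refs() and put_ctx_refs() are stand-ins
for io_put_task() and percpu_ref_put_many():

#include <stdio.h>

struct task { const char *name; };

struct req {
        struct task *task;
        struct req *next;
};

static void put_task_refs(struct task *task, int nr)
{
        printf("put %d refs on task %s\n", nr, task->name);
}

static void put_ctx_refs(int nr)
{
        printf("put %d ctx refs\n", nr);
}

static void free_batch_list(struct req *req)
{
        struct task *task = NULL;
        int task_refs = 0, ctx_refs = 0;

        for (; req; req = req->next) {
                /* flush accumulated refs whenever the owning task changes */
                if (req->task != task) {
                        if (task)
                                put_task_refs(task, task_refs);
                        task = req->task;
                        task_refs = 0;
                }
                task_refs++;
                ctx_refs++;
        }
        if (ctx_refs)
                put_ctx_refs(ctx_refs);
        if (task)
                put_task_refs(task, task_refs);
}

int main(void)
{
        struct task a = { "A" }, b = { "B" };
        struct req r3 = { &b, NULL }, r2 = { &a, &r3 }, r1 = { &a, &r2 };

        /* prints: 2 refs on task A, 3 ctx refs, 1 ref on task B */
        free_batch_list(&r1);
        return 0;
}
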
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 66 +++++++++++++++++----------------------------------
1 file changed, 22 insertions(+), 44 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c9588519e04a..8d2aa0951579 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2269,63 +2269,41 @@ static void io_free_req_work(struct io_kiocb *req, bool *locked)
io_free_req(req);
}
-struct req_batch {
- struct task_struct *task;
- int task_refs;
- int ctx_refs;
-};
-
-static inline void io_init_req_batch(struct req_batch *rb)
-{
- rb->task_refs = 0;
- rb->ctx_refs = 0;
- rb->task = NULL;
-}
-
-static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
- struct req_batch *rb)
-{
- if (rb->ctx_refs)
- percpu_ref_put_many(&ctx->refs, rb->ctx_refs);
- if (rb->task)
- io_put_task(rb->task, rb->task_refs);
-}
-
-static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
- struct io_submit_state *state)
-{
- io_queue_next(req);
- io_dismantle_req(req);
-
- if (req->task != rb->task) {
- if (rb->task)
- io_put_task(rb->task, rb->task_refs);
- rb->task = req->task;
- rb->task_refs = 0;
- }
- rb->task_refs++;
- rb->ctx_refs++;
- wq_stack_add_head(&req->comp_list, &state->free_list);
-}
-
static void io_free_batch_list(struct io_ring_ctx *ctx,
struct io_wq_work_list *list)
__must_hold(&ctx->uring_lock)
{
struct io_wq_work_node *node;
- struct req_batch rb;
+ struct task_struct *task = NULL;
+ int task_refs = 0, ctx_refs = 0;
- io_init_req_batch(&rb);
node = list->first;
do {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
node = req->comp_list.next;
- if (req_ref_put_and_test(req))
- io_req_free_batch(&rb, req, &ctx->submit_state);
+ if (!req_ref_put_and_test(req))
+ continue;
+
+ io_queue_next(req);
+ io_dismantle_req(req);
+
+ if (req->task != task) {
+ if (task)
+ io_put_task(task, task_refs);
+ task = req->task;
+ task_refs = 0;
+ }
+ task_refs++;
+ ctx_refs++;
+ wq_stack_add_head(&req->comp_list, &ctx->submit_state.free_list);
} while (node);
- io_req_free_batch_finish(ctx, &rb);
+
+ if (ctx_refs)
+ percpu_ref_put_many(&ctx->refs, ctx_refs);
+ if (task)
+ io_put_task(task, task_refs);
}
static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 14/24] io_uring: don't pass tail into io_free_batch_list
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (12 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 13/24] io_uring: inline completion batching helpers Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 15/24] io_uring: don't pass state to io_submit_state_end Pavel Begunkov
` (10 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_free_batch_list() iterates all requests in the passed-in list, so we
don't really need to know the tail but can keep iterating until we meet
a NULL. Just pass in the first node; that will be enough.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8d2aa0951579..f8640959554b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2270,14 +2270,12 @@ static void io_free_req_work(struct io_kiocb *req, bool *locked)
}
static void io_free_batch_list(struct io_ring_ctx *ctx,
- struct io_wq_work_list *list)
+ struct io_wq_work_node *node)
__must_hold(&ctx->uring_lock)
{
- struct io_wq_work_node *node;
struct task_struct *task = NULL;
int task_refs = 0, ctx_refs = 0;
- node = list->first;
do {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
@@ -2324,7 +2322,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
spin_unlock(&ctx->completion_lock);
io_cqring_ev_posted(ctx);
- io_free_batch_list(ctx, &state->compl_reqs);
+ io_free_batch_list(ctx, state->compl_reqs.first);
INIT_WQ_LIST(&state->compl_reqs);
}
@@ -2407,7 +2405,6 @@ static inline bool io_run_task_work(void)
static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
{
struct io_wq_work_node *pos, *start, *prev;
- struct io_wq_work_list list;
int nr_events = 0;
bool spin;
@@ -2461,10 +2458,9 @@ static int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
io_commit_cqring(ctx);
io_cqring_ev_posted_iopoll(ctx);
- list.first = start ? start->next : ctx->iopoll_list.first;
- list.last = prev;
+ pos = start ? start->next : ctx->iopoll_list.first;
wq_list_cut(&ctx->iopoll_list, prev, start);
- io_free_batch_list(ctx, &list);
+ io_free_batch_list(ctx, pos);
return nr_events;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 15/24] io_uring: don't pass state to io_submit_state_end
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (13 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 14/24] io_uring: don't pass tail into io_free_batch_list Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 16/24] io_uring: deduplicate io_queue_sqe() call sites Pavel Begunkov
` (9 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Submission state and ctx are coupled together, so there is no need to
pass both; io_submit_state_end() can derive the state from the ctx.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index f8640959554b..bc2a5cd80f07 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7145,11 +7145,13 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
/*
* Batched submission is done, ensure local IO is flushed out.
*/
-static void io_submit_state_end(struct io_submit_state *state,
- struct io_ring_ctx *ctx)
+static void io_submit_state_end(struct io_ring_ctx *ctx)
{
+ struct io_submit_state *state = &ctx->submit_state;
+
if (state->link.head)
io_queue_sqe(state->link.head);
+ /* flush only after queuing links as they can generate completions */
io_submit_flush_completions(ctx);
if (state->plug_started)
blk_finish_plug(&state->plug);
@@ -7252,7 +7254,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
percpu_ref_put_many(&ctx->refs, unused);
}
- io_submit_state_end(&ctx->submit_state, ctx);
+ io_submit_state_end(ctx);
/* Commit SQ ring head once we've consumed and submitted all SQEs */
io_commit_sqring(ctx);
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 16/24] io_uring: deduplicate io_queue_sqe() call sites
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (14 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 15/24] io_uring: don't pass state to io_submit_state_end Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 17/24] io_uring: remove drain_active check from hot path Pavel Begunkov
` (8 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
There are two call sites of io_queue_sqe() in io_submit_sqe(); combine
them into one, because io_queue_sqe() is inline, we don't want to bloat
the binary, and it will become even bigger with the following patches:
text data bss dec hex filename
92126 13986 8 106120 19e88 ./fs/io_uring.o
91966 13986 8 105960 19de8 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index bc2a5cd80f07..271d921508f8 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7125,20 +7125,18 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
link->last->link = req;
link->last = req;
+ if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK))
+ return 0;
/* last request of a link, enqueue the link */
- if (!(req->flags & (REQ_F_LINK | REQ_F_HARDLINK))) {
- link->head = NULL;
- io_queue_sqe(head);
- }
- } else {
- if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
- link->head = req;
- link->last = req;
- } else {
- io_queue_sqe(req);
- }
+ link->head = NULL;
+ req = head;
+ } else if (req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) {
+ link->head = req;
+ link->last = req;
+ return 0;
}
+ io_queue_sqe(req);
return 0;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 17/24] io_uring: remove drain_active check from hot path
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (15 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 16/24] io_uring: deduplicate io_queue_sqe() call sites Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 18/24] io_uring: split slow path from io_queue_sqe Pavel Begunkov
` (7 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
The req->ctx->drain_active check is a bit too expensive in the hot path,
partially because of the two dereferences. Do a trick: if we see it set in
io_init_req(), set REQ_F_FORCE_ASYNC, and the request automatically goes
through a slower path where we can catch it. It's nearly free to do in
io_init_req() because there is already a ->restricted check and it's in
the same byte of a bitmask.
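Condensed, the resulting flow looks roughly like this (an illustrative
sketch of the hunks below, not a literal copy):

	/* io_init_req(): the drain check hides in an already-unlikely branch */
	if (unlikely(ctx->restricted || ctx->drain_active || ctx->drain_next)) {
		/* ... restriction / drain_next handling ... */
		if (ctx->drain_active)
			req->flags |= REQ_F_FORCE_ASYNC;
	}

	/* io_queue_sqe(): the hot path now tests req->flags only */
	if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL)))) {
		__io_queue_sqe(req);
	} else if (req->flags & REQ_F_FAIL) {
		io_req_complete_fail_submit(req);
	} else if (unlikely(req->ctx->drain_active) && io_drain_req(req)) {
		return;	/* deferred by the drain machinery */
	} else {
		/* prep async and punt to io-wq */
	}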
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 53 ++++++++++++++++++++++++++++-----------------------
1 file changed, 29 insertions(+), 24 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 271d921508f8..25f6096269c5 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6446,23 +6446,15 @@ static bool io_drain_req(struct io_kiocb *req)
int ret;
u32 seq;
- if (req->flags & REQ_F_FAIL) {
- io_req_complete_fail_submit(req);
- return true;
- }
-
- /*
- * If we need to drain a request in the middle of a link, drain the
- * head request and the next request/link after the current link.
- * Considering sequential execution of links, IOSQE_IO_DRAIN will be
- * maintained for every request of our link.
- */
- if (ctx->drain_next) {
- req->flags |= REQ_F_IO_DRAIN;
- ctx->drain_next = false;
- }
/* not interested in head, start from the first linked */
io_for_each_link(pos, req->link) {
+ /*
+ * If we need to drain a request in the middle of a link, drain
+ * the head request and the next request/link after the current
+ * link. Considering sequential execution of links,
+ * IOSQE_IO_DRAIN will be maintained for every request of our
+ * link.
+ */
if (pos->flags & REQ_F_IO_DRAIN) {
ctx->drain_next = true;
req->flags |= REQ_F_IO_DRAIN;
@@ -6954,13 +6946,12 @@ static void __io_queue_sqe(struct io_kiocb *req)
static inline void io_queue_sqe(struct io_kiocb *req)
__must_hold(&req->ctx->uring_lock)
{
- if (unlikely(req->ctx->drain_active) && io_drain_req(req))
- return;
-
if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL)))) {
__io_queue_sqe(req);
} else if (req->flags & REQ_F_FAIL) {
io_req_complete_fail_submit(req);
+ } else if (unlikely(req->ctx->drain_active) && io_drain_req(req)) {
+ return;
} else {
int ret = io_req_prep_async(req);
@@ -6980,9 +6971,6 @@ static inline bool io_check_restriction(struct io_ring_ctx *ctx,
struct io_kiocb *req,
unsigned int sqe_flags)
{
- if (likely(!ctx->restricted))
- return true;
-
if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
return false;
@@ -7023,11 +7011,28 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
if ((sqe_flags & IOSQE_BUFFER_SELECT) &&
!io_op_defs[req->opcode].buffer_select)
return -EOPNOTSUPP;
- if (sqe_flags & IOSQE_IO_DRAIN)
+ if (sqe_flags & IOSQE_IO_DRAIN) {
+ struct io_submit_link *link = &ctx->submit_state.link;
+
ctx->drain_active = true;
+ req->flags |= REQ_F_FORCE_ASYNC;
+ if (link->head)
+ link->head->flags |= IOSQE_IO_DRAIN | REQ_F_FORCE_ASYNC;
+ }
+ }
+ if (unlikely(ctx->restricted || ctx->drain_active || ctx->drain_next)) {
+ if (ctx->restricted && !io_check_restriction(ctx, req, sqe_flags))
+ return -EACCES;
+ /* knock it to the slow queue path, will be drained there */
+ if (ctx->drain_active)
+ req->flags |= REQ_F_FORCE_ASYNC;
+ /* if there is no link, we're at "next" request and need to drain */
+ if (unlikely(ctx->drain_next) && !ctx->submit_state.link.head) {
+ ctx->drain_next = false;
+ ctx->drain_active = true;
+ req->flags |= REQ_F_FORCE_ASYNC | IOSQE_IO_DRAIN;
+ }
}
- if (!io_check_restriction(ctx, req, sqe_flags))
- return -EACCES;
personality = READ_ONCE(sqe->personality);
if (personality) {
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 18/24] io_uring: split slow path from io_queue_sqe
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (16 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 17/24] io_uring: remove drain_active check from hot path Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 19/24] io_uring: inline hot path of __io_queue_sqe() Pavel Begunkov
` (6 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
We don't want the slow path of io_queue_sqe to be inlined, so extract a
function from it.
   text    data     bss     dec     hex filename
  91950   13986       8  105944   19dd8 ./fs/io_uring.o
  91758   13986       8  105752   19d18 ./fs/io_uring.o
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 25f6096269c5..54910ed86493 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6943,12 +6943,10 @@ static void __io_queue_sqe(struct io_kiocb *req)
}
}
-static inline void io_queue_sqe(struct io_kiocb *req)
+static void io_queue_sqe_fallback(struct io_kiocb *req)
__must_hold(&req->ctx->uring_lock)
{
- if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL)))) {
- __io_queue_sqe(req);
- } else if (req->flags & REQ_F_FAIL) {
+ if (req->flags & REQ_F_FAIL) {
io_req_complete_fail_submit(req);
} else if (unlikely(req->ctx->drain_active) && io_drain_req(req)) {
return;
@@ -6962,6 +6960,15 @@ static inline void io_queue_sqe(struct io_kiocb *req)
}
}
+static inline void io_queue_sqe(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ if (likely(!(req->flags & (REQ_F_FORCE_ASYNC | REQ_F_FAIL))))
+ __io_queue_sqe(req);
+ else
+ io_queue_sqe_fallback(req);
+}
+
/*
* Check SQE restrictions (opcode and flags).
*
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 19/24] io_uring: inline hot path of __io_queue_sqe()
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (17 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 18/24] io_uring: split slow path from io_queue_sqe Pavel Begunkov
@ 2021-09-24 20:59 ` Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 20/24] io_uring: reshuffle queue_sqe completion handling Pavel Begunkov
` (5 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 20:59 UTC (permalink / raw)
To: Jens Axboe, io-uring
Extract slow paths from __io_queue_sqe() into a function and inline the
hot path. With that we have everything completely inlined on the
submission path up until io_issue_sqe().
-> io_submit_sqes()
    -> io_submit_sqe() (inlined)
        -> io_queue_sqe() (inlined)
            -> __io_queue_sqe() (inlined)
                -> io_issue_sqe()
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 48 ++++++++++++++++++++++++++++--------------------
1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54910ed86493..0470e1cae582 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6899,13 +6899,38 @@ static void io_queue_linked_timeout(struct io_kiocb *req)
io_put_req(req);
}
-static void __io_queue_sqe(struct io_kiocb *req)
+static void io_queue_sqe_arm_apoll(struct io_kiocb *req)
+ __must_hold(&req->ctx->uring_lock)
+{
+ struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
+
+ switch (io_arm_poll_handler(req)) {
+ case IO_APOLL_READY:
+ if (linked_timeout) {
+ io_unprep_linked_timeout(req);
+ linked_timeout = NULL;
+ }
+ io_req_task_queue(req);
+ break;
+ case IO_APOLL_ABORTED:
+ /*
+ * Queued up for async execution, worker will release
+ * submit reference when the iocb is actually submitted.
+ */
+ io_queue_async_work(req, NULL);
+ break;
+ }
+
+ if (linked_timeout)
+ io_queue_linked_timeout(linked_timeout);
+}
+
+static inline void __io_queue_sqe(struct io_kiocb *req)
__must_hold(&req->ctx->uring_lock)
{
struct io_kiocb *linked_timeout;
int ret;
-issue_sqe:
ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
/*
@@ -6920,24 +6945,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
if (linked_timeout)
io_queue_linked_timeout(linked_timeout);
} else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
- linked_timeout = io_prep_linked_timeout(req);
-
- switch (io_arm_poll_handler(req)) {
- case IO_APOLL_READY:
- if (linked_timeout)
- io_unprep_linked_timeout(req);
- goto issue_sqe;
- case IO_APOLL_ABORTED:
- /*
- * Queued up for async execution, worker will release
- * submit reference when the iocb is actually submitted.
- */
- io_queue_async_work(req, NULL);
- break;
- }
-
- if (linked_timeout)
- io_queue_linked_timeout(linked_timeout);
+ io_queue_sqe_arm_apoll(req);
} else {
io_req_complete_failed(req, ret);
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 20/24] io_uring: reshuffle queue_sqe completion handling
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (18 preceding siblings ...)
2021-09-24 20:59 ` [PATCH v2 19/24] io_uring: inline hot path of __io_queue_sqe() Pavel Begunkov
@ 2021-09-24 21:00 ` Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 21/24] io_uring: restructure submit sqes to_submit checks Pavel Begunkov
` (4 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 21:00 UTC (permalink / raw)
To: Jens Axboe, io-uring
If a request completed inline, the return code can only be zero; anything
else is a grave error. So, when we see REQ_F_COMPLETE_INLINE it's not even
necessary to check the return code, and the flag check can be moved
earlier.
It's one "if" fewer for inline completions, and the same two checks for
requests completing normally (ret == 0). Those are the two cases we care
about the most.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0470e1cae582..695c751434c3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6933,14 +6933,13 @@ static inline void __io_queue_sqe(struct io_kiocb *req)
ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
+ if (req->flags & REQ_F_COMPLETE_INLINE)
+ return;
/*
* We async punt it if the file wasn't marked NOWAIT, or if the file
* doesn't support non-blocking read/write attempts
*/
if (likely(!ret)) {
- if (req->flags & REQ_F_COMPLETE_INLINE)
- return;
-
linked_timeout = io_prep_linked_timeout(req);
if (linked_timeout)
io_queue_linked_timeout(linked_timeout);
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 21/24] io_uring: restructure submit sqes to_submit checks
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (19 preceding siblings ...)
2021-09-24 21:00 ` [PATCH v2 20/24] io_uring: reshuffle queue_sqe completion handling Pavel Begunkov
@ 2021-09-24 21:00 ` Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 22/24] io_uring: kill off ->inflight_entry field Pavel Begunkov
` (3 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 21:00 UTC (permalink / raw)
To: Jens Axboe, io-uring
Add an explicit check for the number of requests to submit. First, it
lets us turn the while loop into a do-while, which generates better code;
second, the check itself can be cheaper, e.g. by reusing the CPU flags
set by the subtraction in io_sqring_entries().
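To illustrate the second point: io_sqring_entries() boils down to a
subtraction of two ring indices, so a check shaped like the sketch below
(simplified, not the exact kernel code) lets the compiler branch directly
on the flags that sub already set instead of re-testing the value:

	unsigned int entries = tail - head;	/* sub sets ZF on x86 */

	if (!entries)				/* can compile down to a bare jz */
		return 0;
	nr = min3(nr, ctx->sq_entries, entries);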
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 695c751434c3..b3444a7c8c89 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7233,16 +7233,19 @@ static const struct io_uring_sqe *io_get_sqe(struct io_ring_ctx *ctx)
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
__must_hold(&ctx->uring_lock)
{
+ unsigned int entries = io_sqring_entries(ctx);
int submitted = 0;
+ if (!entries)
+ return 0;
/* make sure SQ entry isn't read before tail */
- nr = min3(nr, ctx->sq_entries, io_sqring_entries(ctx));
+ nr = min3(nr, ctx->sq_entries, entries);
if (!percpu_ref_tryget_many(&ctx->refs, nr))
return -EAGAIN;
io_get_task_refs(nr);
io_submit_state_start(&ctx->submit_state, nr);
- while (submitted < nr) {
+ do {
const struct io_uring_sqe *sqe;
struct io_kiocb *req;
@@ -7261,7 +7264,7 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
submitted++;
if (io_submit_sqe(ctx, req, sqe))
break;
- }
+ } while (submitted < nr);
if (unlikely(submitted != nr)) {
int ref_used = (submitted == -EAGAIN) ? 0 : submitted;
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 22/24] io_uring: kill off ->inflight_entry field
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (20 preceding siblings ...)
2021-09-24 21:00 ` [PATCH v2 21/24] io_uring: restructure submit sqes to_submit checks Pavel Begunkov
@ 2021-09-24 21:00 ` Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 23/24] io_uring: comment why inline complete calls io_clean_op() Pavel Begunkov
` (2 subsequent siblings)
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 21:00 UTC (permalink / raw)
To: Jens Axboe, io-uring
->inflight_entry is not used anymore after converting everything to
single linked lists, so remove it. Also adjust the io_kiocb layout so
that all hot bits are in the first 3 cachelines.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b3444a7c8c89..34c222a32b12 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -869,18 +869,15 @@ struct io_kiocb {
struct percpu_ref *fixed_rsrc_refs;
/* used with ctx->iopoll_list with reads/writes */
- struct list_head inflight_entry;
+ struct io_wq_work_node comp_list;
struct io_task_work io_task_work;
/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
struct hlist_node hash_node;
struct async_poll *apoll;
- struct io_wq_work work;
- const struct cred *creds;
-
- struct io_wq_work_node comp_list;
-
/* store used ubuf, so we can prevent reloading */
struct io_mapped_ubuf *imu;
+ struct io_wq_work work;
+ const struct cred *creds;
};
struct io_tctx_node {
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 23/24] io_uring: comment why inline complete calls io_clean_op()
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (21 preceding siblings ...)
2021-09-24 21:00 ` [PATCH v2 22/24] io_uring: kill off ->inflight_entry field Pavel Begunkov
@ 2021-09-24 21:00 ` Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 24/24] io_uring: disable draining earlier Pavel Begunkov
2021-09-30 16:04 ` [PATCH v2 00/24] rework and optimise submission+completion paths Jens Axboe
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 21:00 UTC (permalink / raw)
To: Jens Axboe, io-uring
io_req_complete_state() calls io_clean_op(), and it may not be entirely
obvious why, so leave a comment.
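For context, ->compl lives in the per-opcode union of struct io_kiocb,
roughly (an abridged sketch, not the full definition):

	struct io_kiocb {
		union {
			struct file		*file;
			struct io_rw		rw;
			struct io_poll_iocb	poll;
			/* ... other per-opcode state ... */
			struct io_completion	compl;
		};
		/* ... */
	};

so the per-opcode data has to be cleaned out before ->compl is written.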
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 34c222a32b12..b6bdf8e72123 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1827,6 +1827,7 @@ static void io_req_complete_state(struct io_kiocb *req, long res,
{
struct io_submit_state *state;
+ /* clean per-opcode space, because req->compl is aliased with it */
if (io_req_needs_clean(req))
io_clean_op(req);
req->result = res;
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH v2 24/24] io_uring: disable draining earlier
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (22 preceding siblings ...)
2021-09-24 21:00 ` [PATCH v2 23/24] io_uring: comment why inline complete calls io_clean_op() Pavel Begunkov
@ 2021-09-24 21:00 ` Pavel Begunkov
2021-09-30 16:04 ` [PATCH v2 00/24] rework and optimise submission+completion paths Jens Axboe
24 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-24 21:00 UTC (permalink / raw)
To: Jens Axboe, io-uring
Clear ->drain_active in two more cases where we check whether draining
is needed. Not clearing it is not a bug, but it may still lead to some
extra requests being punted to io-wq, which is not desirable.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b6bdf8e72123..c6a82c67a93d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6469,8 +6469,10 @@ static bool io_drain_req(struct io_kiocb *req)
seq = io_get_sequence(req);
/* Still a chance to pass the sequence check */
- if (!req_need_defer(req, seq) && list_empty_careful(&ctx->defer_list))
+ if (!req_need_defer(req, seq) && list_empty_careful(&ctx->defer_list)) {
+ ctx->drain_active = false;
return false;
+ }
ret = io_req_prep_async(req);
if (ret)
@@ -6489,6 +6491,7 @@ static bool io_drain_req(struct io_kiocb *req)
spin_unlock(&ctx->completion_lock);
kfree(de);
io_queue_async_work(req, NULL);
+ ctx->drain_active = false;
return true;
}
--
2.33.0
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH v2 10/24] io_uring: add a helper for batch free
2021-09-24 20:59 ` [PATCH v2 10/24] io_uring: add a helper for batch free Pavel Begunkov
@ 2021-09-26 3:36 ` Hao Xu
2021-09-28 9:33 ` Pavel Begunkov
0 siblings, 1 reply; 32+ messages in thread
From: Hao Xu @ 2021-09-26 3:36 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/9/25 4:59 AM, Pavel Begunkov wrote:
> Add a helper io_free_batch_list(), which takes a single linked list and
> puts/frees all requests from it in an efficient manner. Will be reused
> later.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> fs/io_uring.c | 34 +++++++++++++++++++++-------------
> 1 file changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 205127394649..ad8af05af4bc 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2308,12 +2308,31 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
> wq_stack_add_head(&req->comp_list, &state->free_list);
> }
>
> +static void io_free_batch_list(struct io_ring_ctx *ctx,
> + struct io_wq_work_list *list)
> + __must_hold(&ctx->uring_lock)
> +{
> + struct io_wq_work_node *node;
> + struct req_batch rb;
> +
> + io_init_req_batch(&rb);
> + node = list->first;
> + do {
> + struct io_kiocb *req = container_of(node, struct io_kiocb,
> + comp_list);
> +
> + node = req->comp_list.next;
> + if (req_ref_put_and_test(req))
> + io_req_free_batch(&rb, req, &ctx->submit_state);
> + } while (node);
> + io_req_free_batch_finish(ctx, &rb);
> +}
Hi Pavel, why don't we use wq_list_for_each here as well?
> +
> static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
> __must_hold(&ctx->uring_lock)
> {
> struct io_wq_work_node *node, *prev;
> struct io_submit_state *state = &ctx->submit_state;
> - struct req_batch rb;
>
> spin_lock(&ctx->completion_lock);
> wq_list_for_each(node, prev, &state->compl_reqs) {
> @@ -2327,18 +2346,7 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
> spin_unlock(&ctx->completion_lock);
> io_cqring_ev_posted(ctx);
>
> - io_init_req_batch(&rb);
> - node = state->compl_reqs.first;
> - do {
> - struct io_kiocb *req = container_of(node, struct io_kiocb,
> - comp_list);
> -
> - node = req->comp_list.next;
> - if (req_ref_put_and_test(req))
> - io_req_free_batch(&rb, req, &ctx->submit_state);
> - } while (node);
> -
> - io_req_free_batch_finish(ctx, &rb);
> + io_free_batch_list(ctx, &state->compl_reqs);
> INIT_WQ_LIST(&state->compl_reqs);
> }
>
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 04/24] io_uring: use slist for completion batching
2021-09-24 20:59 ` [PATCH v2 04/24] io_uring: use slist for completion batching Pavel Begunkov
@ 2021-09-26 6:57 ` Hao Xu
2021-09-28 9:41 ` Pavel Begunkov
0 siblings, 1 reply; 32+ messages in thread
From: Hao Xu @ 2021-09-26 6:57 UTC (permalink / raw)
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/9/25 4:59 AM, Pavel Begunkov wrote:
> Currently we collect requests for completion batching in an array.
> Replace them with a singly linked list. It's as fast as arrays but
> doesn't take as much space in ctx, and will be used in future patches.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> fs/io_uring.c | 52 +++++++++++++++++++++++++--------------------------
> 1 file changed, 25 insertions(+), 27 deletions(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 9c14e9e722ba..9a76c4f84311 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -322,8 +322,8 @@ struct io_submit_state {
> /*
> * Batch completion logic
> */
> - struct io_kiocb *compl_reqs[IO_COMPL_BATCH];
> - unsigned int compl_nr;
> + struct io_wq_work_list compl_reqs;
Would it be better to rename struct io_wq_work_list to something more
generic? io_wq_work_list is a bit confusing, since we are now using this
type of linked list (and as a stack as well) for various purposes, not
just to link io-wq works.
> +
> /* inline/task_work completion list, under ->uring_lock */
> struct list_head free_list;
> };
> @@ -883,6 +883,8 @@ struct io_kiocb {
> struct io_wq_work work;
> const struct cred *creds;
>
> + struct io_wq_work_node comp_list;
> +
> /* store used ubuf, so we can prevent reloading */
> struct io_mapped_ubuf *imu;
> };
> @@ -1169,7 +1171,7 @@ static inline void req_ref_get(struct io_kiocb *req)
>
> static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
> {
> - if (ctx->submit_state.compl_nr)
> + if (!wq_list_empty(&ctx->submit_state.compl_reqs))
> __io_submit_flush_completions(ctx);
> }
>
> @@ -1326,6 +1328,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
> INIT_LIST_HEAD(&ctx->submit_state.free_list);
> INIT_LIST_HEAD(&ctx->locked_free_list);
> INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
> + INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
> return ctx;
> err:
> kfree(ctx->dummy_ubuf);
> @@ -1831,11 +1834,16 @@ static inline bool io_req_needs_clean(struct io_kiocb *req)
> static void io_req_complete_state(struct io_kiocb *req, long res,
> unsigned int cflags)
> {
> + struct io_submit_state *state;
> +
> if (io_req_needs_clean(req))
> io_clean_op(req);
> req->result = res;
> req->compl.cflags = cflags;
> req->flags |= REQ_F_COMPLETE_INLINE;
> +
> + state = &req->ctx->submit_state;
> + wq_list_add_tail(&req->comp_list, &state->compl_reqs);
> }
>
> static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
> @@ -2324,13 +2332,14 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
> static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
> __must_hold(&ctx->uring_lock)
> {
> + struct io_wq_work_node *node, *prev;
> struct io_submit_state *state = &ctx->submit_state;
> - int i, nr = state->compl_nr;
> struct req_batch rb;
>
> spin_lock(&ctx->completion_lock);
> - for (i = 0; i < nr; i++) {
> - struct io_kiocb *req = state->compl_reqs[i];
> + wq_list_for_each(node, prev, &state->compl_reqs) {
> + struct io_kiocb *req = container_of(node, struct io_kiocb,
> + comp_list);
>
> __io_cqring_fill_event(ctx, req->user_data, req->result,
> req->compl.cflags);
> @@ -2340,15 +2349,18 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
> io_cqring_ev_posted(ctx);
>
> io_init_req_batch(&rb);
> - for (i = 0; i < nr; i++) {
> - struct io_kiocb *req = state->compl_reqs[i];
> + node = state->compl_reqs.first;
> + do {
> + struct io_kiocb *req = container_of(node, struct io_kiocb,
> + comp_list);
>
> + node = req->comp_list.next;
> if (req_ref_put_and_test(req))
> io_req_free_batch(&rb, req, &ctx->submit_state);
> - }
> + } while (node);
>
> io_req_free_batch_finish(ctx, &rb);
> - state->compl_nr = 0;
> + INIT_WQ_LIST(&state->compl_reqs);
> }
>
> /*
> @@ -2668,17 +2680,10 @@ static void io_req_task_complete(struct io_kiocb *req, bool *locked)
> unsigned int cflags = io_put_rw_kbuf(req);
> long res = req->result;
>
> - if (*locked) {
> - struct io_ring_ctx *ctx = req->ctx;
> - struct io_submit_state *state = &ctx->submit_state;
> -
> + if (*locked)
> io_req_complete_state(req, res, cflags);
> - state->compl_reqs[state->compl_nr++] = req;
> - if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
> - io_submit_flush_completions(ctx);
> - } else {
> + else
> io_req_complete_post(req, res, cflags);
> - }
> }
>
> static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
> @@ -6969,15 +6974,8 @@ static void __io_queue_sqe(struct io_kiocb *req)
> * doesn't support non-blocking read/write attempts
> */
> if (likely(!ret)) {
> - if (req->flags & REQ_F_COMPLETE_INLINE) {
> - struct io_ring_ctx *ctx = req->ctx;
> - struct io_submit_state *state = &ctx->submit_state;
> -
> - state->compl_reqs[state->compl_nr++] = req;
> - if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
> - io_submit_flush_completions(ctx);
> + if (req->flags & REQ_F_COMPLETE_INLINE)
> return;
> - }
>
> linked_timeout = io_prep_linked_timeout(req);
> if (linked_timeout)
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 10/24] io_uring: add a helper for batch free
2021-09-26 3:36 ` Hao Xu
@ 2021-09-28 9:33 ` Pavel Begunkov
0 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-28 9:33 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 9/26/21 4:36 AM, Hao Xu wrote:
> On 2021/9/25 4:59 AM, Pavel Begunkov wrote:
>> Add a helper io_free_batch_list(), which takes a single linked list and
>> puts/frees all requests from it in an efficient manner. Will be reused
>> later.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> fs/io_uring.c | 34 +++++++++++++++++++++-------------
>> 1 file changed, 21 insertions(+), 13 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 205127394649..ad8af05af4bc 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2308,12 +2308,31 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
>> wq_stack_add_head(&req->comp_list, &state->free_list);
>> }
>> +static void io_free_batch_list(struct io_ring_ctx *ctx,
>> + struct io_wq_work_list *list)
>> + __must_hold(&ctx->uring_lock)
>> +{
>> + struct io_wq_work_node *node;
>> + struct req_batch rb;
>> +
>> + io_init_req_batch(&rb);
>> + node = list->first;
>> + do {
>> + struct io_kiocb *req = container_of(node, struct io_kiocb,
>> + comp_list);
>> +
>> + node = req->comp_list.next;
>> + if (req_ref_put_and_test(req))
>> + io_req_free_batch(&rb, req, &ctx->submit_state);
>> + } while (node);
>> + io_req_free_batch_finish(ctx, &rb);
>> +}
> Hi Pavel, why don't we use wq_list_for_each here as well?
Because we do wq_stack_add_head() in the middle of the loop, and the
refcounting also adds potential races. We'd need some for_each_*_safe
version of it.
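A safe variant in the style of wq_list_for_each might look like this
(a hypothetical sketch, not in the tree), grabbing ->next before the
loop body can free or re-link the node:

	#define wq_list_for_each_safe(pos, nxt, head)			\
		for (pos = (head)->first,				\
		     nxt = pos ? (pos)->next : NULL;			\
		     pos;						\
		     pos = nxt, nxt = pos ? (pos)->next : NULL)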
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 12/24] io_uring: optimise batch completion
[not found] ` <CAFUsyfLSXMvd_MBAp83qriW7LD=bg2=25TC4e_X4oMO1atoPYg@mail.gmail.com>
@ 2021-09-28 9:35 ` Pavel Begunkov
0 siblings, 0 replies; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-28 9:35 UTC (permalink / raw)
To: Noah Goldstein; +Cc: Jens Axboe, open list:IO_URING
On 9/25/21 8:58 PM, Noah Goldstein wrote:
> On Fri, Sep 24, 2021 at 5:45 PM Pavel Begunkov <[email protected]>
[...]
>> + if (unlikely(!nr_events))
>> + return 0;
>> +
>> + io_commit_cqring(ctx);
>> + io_cqring_ev_posted_iopoll(ctx);
>> + list.first = start ? start->next : ctx->iopoll_list.first;
>> + list.last = prev;
>> wq_list_cut(&ctx->iopoll_list, prev, start);
>> - if (nr_events)
>> - io_iopoll_complete(ctx, &done);
>> + io_free_batch_list(ctx, &list);
>>
>
> If it's logically feasible, it may be slightly faster on speculative
> machines to pass `nr_events` to `io_free_batch_list`, so that instead
> of having the loop condition on `node` you can use the counter and
> hopefully recover from the branch miss at the end of the loop before
> current execution catches up.
>
> return nr_events;
Maybe. We can experiment afterwards and see if the numbers get better.
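For reference, the counter-driven variant would look roughly like this
(untested sketch on top of this patch, assuming the cut list holds exactly
nr_events requests; the extra parameter is the only change):

	static void io_free_batch_list(struct io_ring_ctx *ctx,
				       struct io_wq_work_list *list,
				       unsigned int nr)
		__must_hold(&ctx->uring_lock)
	{
		struct io_wq_work_node *node = list->first;
		struct req_batch rb;

		io_init_req_batch(&rb);
		while (nr--) {
			struct io_kiocb *req = container_of(node, struct io_kiocb,
							    comp_list);

			/* grab ->next first, the put may free the request */
			node = req->comp_list.next;
			if (req_ref_put_and_test(req))
				io_req_free_batch(&rb, req, &ctx->submit_state);
		}
		io_req_free_batch_finish(ctx, &rb);
	}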
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 04/24] io_uring: use slist for completion batching
2021-09-26 6:57 ` Hao Xu
@ 2021-09-28 9:41 ` Pavel Begunkov
2021-09-28 15:32 ` Jens Axboe
0 siblings, 1 reply; 32+ messages in thread
From: Pavel Begunkov @ 2021-09-28 9:41 UTC (permalink / raw)
To: Hao Xu, Jens Axboe, io-uring
On 9/26/21 7:57 AM, Hao Xu wrote:
> On 2021/9/25 4:59 AM, Pavel Begunkov wrote:
>> Currently we collect requests for completion batching in an array.
>> Replace them with a singly linked list. It's as fast as arrays but
>> doesn't take as much space in ctx, and will be used in future patches.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> fs/io_uring.c | 52 +++++++++++++++++++++++++--------------------------
>> 1 file changed, 25 insertions(+), 27 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 9c14e9e722ba..9a76c4f84311 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -322,8 +322,8 @@ struct io_submit_state {
>> /*
>> * Batch completion logic
>> */
>> - struct io_kiocb *compl_reqs[IO_COMPL_BATCH];
>> - unsigned int compl_nr;
>> + struct io_wq_work_list compl_reqs;
> Would it be better to rename struct io_wq_work_list to something more
> generic? io_wq_work_list is a bit confusing, since we are now using this
> type of linked list (and as a stack as well) for various purposes, not
> just to link io-wq works.
Was thinking about it, e.g. io_slist, but it was already too late --
lots of conflicts and a good chance of adding a couple of extra bugs
on rebase. I think we can do it afterwards (if ever, considering it
troubles backporting).
>> +
>> /* inline/task_work completion list, under ->uring_lock */
>> struct list_head free_list;
>> };
>> @@ -883,6 +883,8 @@ struct io_kiocb {
>> struct io_wq_work work;
>> const struct cred *creds;
>> + struct io_wq_work_node comp_list;
>> +
>> /* store used ubuf, so we can prevent reloading */
>> struct io_mapped_ubuf *imu;
>> };
>> @@ -1169,7 +1171,7 @@ static inline void req_ref_get(struct io_kiocb *req)
>> static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
>> {
>> - if (ctx->submit_state.compl_nr)
>> + if (!wq_list_empty(&ctx->submit_state.compl_reqs))
>> __io_submit_flush_completions(ctx);
>> }
>> @@ -1326,6 +1328,7 @@ static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
>> INIT_LIST_HEAD(&ctx->submit_state.free_list);
>> INIT_LIST_HEAD(&ctx->locked_free_list);
>> INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
>> + INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
>> return ctx;
>> err:
>> kfree(ctx->dummy_ubuf);
>> @@ -1831,11 +1834,16 @@ static inline bool io_req_needs_clean(struct io_kiocb *req)
>> static void io_req_complete_state(struct io_kiocb *req, long res,
>> unsigned int cflags)
>> {
>> + struct io_submit_state *state;
>> +
>> if (io_req_needs_clean(req))
>> io_clean_op(req);
>> req->result = res;
>> req->compl.cflags = cflags;
>> req->flags |= REQ_F_COMPLETE_INLINE;
>> +
>> + state = &req->ctx->submit_state;
>> + wq_list_add_tail(&req->comp_list, &state->compl_reqs);
>> }
>> static inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags,
>> @@ -2324,13 +2332,14 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req,
>> static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
>> __must_hold(&ctx->uring_lock)
>> {
>> + struct io_wq_work_node *node, *prev;
>> struct io_submit_state *state = &ctx->submit_state;
>> - int i, nr = state->compl_nr;
>> struct req_batch rb;
>> spin_lock(&ctx->completion_lock);
>> - for (i = 0; i < nr; i++) {
>> - struct io_kiocb *req = state->compl_reqs[i];
>> + wq_list_for_each(node, prev, &state->compl_reqs) {
>> + struct io_kiocb *req = container_of(node, struct io_kiocb,
>> + comp_list);
>> __io_cqring_fill_event(ctx, req->user_data, req->result,
>> req->compl.cflags);
>> @@ -2340,15 +2349,18 @@ static void __io_submit_flush_completions(struct io_ring_ctx *ctx)
>> io_cqring_ev_posted(ctx);
>> io_init_req_batch(&rb);
>> - for (i = 0; i < nr; i++) {
>> - struct io_kiocb *req = state->compl_reqs[i];
>> + node = state->compl_reqs.first;
>> + do {
>> + struct io_kiocb *req = container_of(node, struct io_kiocb,
>> + comp_list);
>> + node = req->comp_list.next;
>> if (req_ref_put_and_test(req))
>> io_req_free_batch(&rb, req, &ctx->submit_state);
>> - }
>> + } while (node);
>> io_req_free_batch_finish(ctx, &rb);
>> - state->compl_nr = 0;
>> + INIT_WQ_LIST(&state->compl_reqs);
>> }
>> /*
>> @@ -2668,17 +2680,10 @@ static void io_req_task_complete(struct io_kiocb *req, bool *locked)
>> unsigned int cflags = io_put_rw_kbuf(req);
>> long res = req->result;
>> - if (*locked) {
>> - struct io_ring_ctx *ctx = req->ctx;
>> - struct io_submit_state *state = &ctx->submit_state;
>> -
>> + if (*locked)
>> io_req_complete_state(req, res, cflags);
>> - state->compl_reqs[state->compl_nr++] = req;
>> - if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
>> - io_submit_flush_completions(ctx);
>> - } else {
>> + else
>> io_req_complete_post(req, res, cflags);
>> - }
>> }
>> static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
>> @@ -6969,15 +6974,8 @@ static void __io_queue_sqe(struct io_kiocb *req)
>> * doesn't support non-blocking read/write attempts
>> */
>> if (likely(!ret)) {
>> - if (req->flags & REQ_F_COMPLETE_INLINE) {
>> - struct io_ring_ctx *ctx = req->ctx;
>> - struct io_submit_state *state = &ctx->submit_state;
>> -
>> - state->compl_reqs[state->compl_nr++] = req;
>> - if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
>> - io_submit_flush_completions(ctx);
>> + if (req->flags & REQ_F_COMPLETE_INLINE)
>> return;
>> - }
>> linked_timeout = io_prep_linked_timeout(req);
>> if (linked_timeout)
>>
>
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 04/24] io_uring: use slist for completion batching
2021-09-28 9:41 ` Pavel Begunkov
@ 2021-09-28 15:32 ` Jens Axboe
0 siblings, 0 replies; 32+ messages in thread
From: Jens Axboe @ 2021-09-28 15:32 UTC (permalink / raw)
To: Pavel Begunkov, Hao Xu, io-uring
On 9/28/21 3:41 AM, Pavel Begunkov wrote:
> On 9/26/21 7:57 AM, Hao Xu wrote:
>> On 2021/9/25 4:59 AM, Pavel Begunkov wrote:
>>> Currently we collect requests for completion batching in an array.
>>> Replace them with a singly linked list. It's as fast as arrays but
>>> doesn't take as much space in ctx, and will be used in future patches.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> fs/io_uring.c | 52 +++++++++++++++++++++++++--------------------------
>>> 1 file changed, 25 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 9c14e9e722ba..9a76c4f84311 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -322,8 +322,8 @@ struct io_submit_state {
>>> /*
>>> * Batch completion logic
>>> */
>>> - struct io_kiocb *compl_reqs[IO_COMPL_BATCH];
>>> - unsigned int compl_nr;
>>> + struct io_wq_work_list compl_reqs;
>> Would it be better to rename struct io_wq_work_list to something more
>> generic? io_wq_work_list is a bit confusing, since we are now using this
>> type of linked list (and as a stack as well) for various purposes, not
>> just to link io-wq works.
>
> Was thinking about it, e.g. io_slist, but it was already too late --
> lots of conflicts and a good chance of adding a couple of extra bugs
> on rebase. I think we can do it afterwards (if ever, considering it
> troubles backporting).
Agree with both of you - it should be renamed, but at the same time it's
also really annoying to get trivial conflicts for e.g. stable patches due
to random renaming. I was bitten by this again just recently, a function
rename in that case.
So let's keep the name for now, and once we have some quiet time, we
can get that done.
--
Jens Axboe
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH v2 00/24] rework and optimise submission+completion paths
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
` (23 preceding siblings ...)
2021-09-24 21:00 ` [PATCH v2 24/24] io_uring: disable draining earlier Pavel Begunkov
@ 2021-09-30 16:04 ` Jens Axboe
24 siblings, 0 replies; 32+ messages in thread
From: Jens Axboe @ 2021-09-30 16:04 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 9/24/21 2:59 PM, Pavel Begunkov wrote:
> 24 MIOPS vs 31.5, or ~30% win for fio/t/io_uring nops batching=32
> Jens mentioned that with his standard test against Optane it gave
> yet another +3% to throughput.
>
> 1-14 are about optimising the completion path:
> - replaces lists with single linked lists
> - kills 64 * 8B of caches in ctx
> - adds some shuffling of iopoll bits
> - list splice instead of per-req list_add in one place
> - inlines io_req_free_batch() and other helpers
>
> 15-22: inlines __io_queue_sqe() so all the submission path
> up to io_issue_sqe() is inlined + little tweaks
Applied for 5.16, thanks!
--
Jens Axboe
^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread
Thread overview: 32+ messages
2021-09-24 20:59 [PATCH v2 00/24] rework and optimise submission+completion paths Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 01/24] io_uring: mark having different creds unlikely Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 02/24] io_uring: force_nonspin Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 03/24] io_uring: make io_do_iopoll return number of reqs Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 04/24] io_uring: use slist for completion batching Pavel Begunkov
2021-09-26 6:57 ` Hao Xu
2021-09-28 9:41 ` Pavel Begunkov
2021-09-28 15:32 ` Jens Axboe
2021-09-24 20:59 ` [PATCH v2 05/24] io_uring: remove allocation cache array Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 06/24] io-wq: add io_wq_work_node based stack Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 07/24] io_uring: replace list with stack for req caches Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 08/24] io_uring: split iopoll loop Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 09/24] io_uring: use single linked list for iopoll Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 10/24] io_uring: add a helper for batch free Pavel Begunkov
2021-09-26 3:36 ` Hao Xu
2021-09-28 9:33 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 11/24] io_uring: convert iopoll_completed to store_release Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 12/24] io_uring: optimise batch completion Pavel Begunkov
[not found] ` <CAFUsyfLSXMvd_MBAp83qriW7LD=bg2=25TC4e_X4oMO1atoPYg@mail.gmail.com>
2021-09-28 9:35 ` Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 13/24] io_uring: inline completion batching helpers Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 14/24] io_uring: don't pass tail into io_free_batch_list Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 15/24] io_uring: don't pass state to io_submit_state_end Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 16/24] io_uring: deduplicate io_queue_sqe() call sites Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 17/24] io_uring: remove drain_active check from hot path Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 18/24] io_uring: split slow path from io_queue_sqe Pavel Begunkov
2021-09-24 20:59 ` [PATCH v2 19/24] io_uring: inline hot path of __io_queue_sqe() Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 20/24] io_uring: reshuffle queue_sqe completion handling Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 21/24] io_uring: restructure submit sqes to_submit checks Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 22/24] io_uring: kill off ->inflight_entry field Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 23/24] io_uring: comment why inline complete calls io_clean_op() Pavel Begunkov
2021-09-24 21:00 ` [PATCH v2 24/24] io_uring: disable draining earlier Pavel Begunkov
2021-09-30 16:04 ` [PATCH v2 00/24] rework and optimise submission+completion paths Jens Axboe