public inbox for [email protected]
* [PATCH 0/5] batch completion + freeing improvements
@ 2020-07-13 23:41 Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 1/5] io_uring: move io_req_complete() definition Pavel Begunkov
                   ` (5 more replies)
  0 siblings, 6 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-13 23:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Different batching improvements, that's it.

Unfortunately, I don't have a decent SSD/setup at hand to
benchmark it properly.

P.S. If the extra 32 pointers on the stack turn out to be a problem, I've
long wanted to put submit_state into the ctx itself.

Pavel Begunkov (5):
  io_uring: move io_req_complete() definition
  io_uring: replace list with array for compl batch
  io_uring: batch free in batched completion
  tasks: add put_task_struct_many()
  io_uring: batch put_task_struct()

 fs/io_uring.c              | 129 ++++++++++++++++++++++---------------
 include/linux/sched/task.h |   6 ++
 2 files changed, 82 insertions(+), 53 deletions(-)

-- 
2.24.0



* [PATCH 1/5] io_uring: move io_req_complete() definition
  2020-07-13 23:41 [PATCH 0/5] batch completion + freeing improvements Pavel Begunkov
@ 2020-07-13 23:41 ` Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 2/5] io_uring: replace list with array for compl batch Pavel Begunkov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-13 23:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Just a preparation patch, moving several functions
(namely [__]io_req_complete() and io_submit_flush_completions()) further
down in the file to avoid extra forward declarations.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 96 +++++++++++++++++++++++++--------------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 7038c4f08805..609c7da044d7 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1428,54 +1428,6 @@ static void io_cqring_add_event(struct io_kiocb *req, long res, long cflags)
 	io_cqring_ev_posted(ctx);
 }
 
-static void io_submit_flush_completions(struct io_comp_state *cs)
-{
-	struct io_ring_ctx *ctx = cs->ctx;
-
-	spin_lock_irq(&ctx->completion_lock);
-	while (!list_empty(&cs->list)) {
-		struct io_kiocb *req;
-
-		req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
-		list_del(&req->compl.list);
-		__io_cqring_fill_event(req, req->result, req->compl.cflags);
-		if (!(req->flags & REQ_F_LINK_HEAD)) {
-			req->flags |= REQ_F_COMP_LOCKED;
-			io_put_req(req);
-		} else {
-			spin_unlock_irq(&ctx->completion_lock);
-			io_put_req(req);
-			spin_lock_irq(&ctx->completion_lock);
-		}
-	}
-	io_commit_cqring(ctx);
-	spin_unlock_irq(&ctx->completion_lock);
-
-	io_cqring_ev_posted(ctx);
-	cs->nr = 0;
-}
-
-static void __io_req_complete(struct io_kiocb *req, long res, unsigned cflags,
-			      struct io_comp_state *cs)
-{
-	if (!cs) {
-		io_cqring_add_event(req, res, cflags);
-		io_put_req(req);
-	} else {
-		io_clean_op(req);
-		req->result = res;
-		req->compl.cflags = cflags;
-		list_add_tail(&req->compl.list, &cs->list);
-		if (++cs->nr >= 32)
-			io_submit_flush_completions(cs);
-	}
-}
-
-static void io_req_complete(struct io_kiocb *req, long res)
-{
-	__io_req_complete(req, res, 0, NULL);
-}
-
 static inline bool io_is_fallback_req(struct io_kiocb *req)
 {
 	return req == (struct io_kiocb *)
@@ -1840,6 +1792,54 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 		__io_req_free_batch_flush(req->ctx, rb);
 }
 
+static void io_submit_flush_completions(struct io_comp_state *cs)
+{
+	struct io_ring_ctx *ctx = cs->ctx;
+
+	spin_lock_irq(&ctx->completion_lock);
+	while (!list_empty(&cs->list)) {
+		struct io_kiocb *req;
+
+		req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
+		list_del(&req->compl.list);
+		__io_cqring_fill_event(req, req->result, req->compl.cflags);
+		if (!(req->flags & REQ_F_LINK_HEAD)) {
+			req->flags |= REQ_F_COMP_LOCKED;
+			io_put_req(req);
+		} else {
+			spin_unlock_irq(&ctx->completion_lock);
+			io_put_req(req);
+			spin_lock_irq(&ctx->completion_lock);
+		}
+	}
+	io_commit_cqring(ctx);
+	spin_unlock_irq(&ctx->completion_lock);
+
+	io_cqring_ev_posted(ctx);
+	cs->nr = 0;
+}
+
+static void __io_req_complete(struct io_kiocb *req, long res, unsigned cflags,
+			      struct io_comp_state *cs)
+{
+	if (!cs) {
+		io_cqring_add_event(req, res, cflags);
+		io_put_req(req);
+	} else {
+		io_clean_op(req);
+		req->result = res;
+		req->compl.cflags = cflags;
+		list_add_tail(&req->compl.list, &cs->list);
+		if (++cs->nr >= 32)
+			io_submit_flush_completions(cs);
+	}
+}
+
+static void io_req_complete(struct io_kiocb *req, long res)
+{
+	__io_req_complete(req, res, 0, NULL);
+}
+
 /*
  * Drop reference to request, return next in chain (if there is one) if this
  * was the last reference to this request.
-- 
2.24.0



* [PATCH 2/5] io_uring: replace list with array for compl batch
  2020-07-13 23:41 [PATCH 0/5] batch completion + freeing improvements Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 1/5] io_uring: move io_req_complete() definition Pavel Begunkov
@ 2020-07-13 23:41 ` Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 3/5] io_uring: batch free in batched completion Pavel Begunkov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-13 23:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Since we limit how many requests we batch on the completion path, use a
fixed-size array instead of a list for completion batching. That also
allows splitting io_submit_flush_completions() into 2 steps: the first
fills the CQEs, and the second actually frees the requests.

There are plenty of benefits:
- avoids expensive list head tossing and removes the LIST_INIT in state prep
- doesn't do extra unlock/lock to put a linked request
- filling CQEs first gives better latency
- will be used to handle list entry aliasing and to add batch freeing there
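
As a rough standalone sketch of the pattern (not the actual io_uring code,
locking omitted; publish_completion() and put_request() are made-up
stand-ins for __io_cqring_fill_event() and io_put_req()):

	#define MY_COMPL_BATCH	32

	struct my_comp_state {
		unsigned int	nr;
		void		*reqs[MY_COMPL_BATCH];	/* fixed array, no list_head */
	};

	/* assumed helpers, stubbed out so the sketch is self-contained */
	static void publish_completion(void *req) { (void)req; /* fill a CQE */ }
	static void put_request(void *req)        { (void)req; /* drop the ref */ }

	/* two-step flush: publish every completion first, then drop refs */
	static void flush_batch(struct my_comp_state *cs)
	{
		unsigned int i;

		for (i = 0; i < cs->nr; i++)
			publish_completion(cs->reqs[i]);
		for (i = 0; i < cs->nr; i++)
			put_request(cs->reqs[i]);
		cs->nr = 0;
	}

	/* queue a request for completion; flush once the array fills up */
	static void queue_completion(struct my_comp_state *cs, void *req)
	{
		cs->reqs[cs->nr++] = req;
		if (cs->nr == MY_COMPL_BATCH)
			flush_batch(cs);
	}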

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 33 +++++++++++++--------------------
 1 file changed, 13 insertions(+), 20 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 609c7da044d7..3277a06e2fb6 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -680,11 +680,12 @@ struct io_defer_entry {
 };
 
 #define IO_IOPOLL_BATCH			8
+#define IO_COMPL_BATCH			32
 
 struct io_comp_state {
-	unsigned int		nr;
-	struct list_head	list;
 	struct io_ring_ctx	*ctx;
+	unsigned int		nr;
+	struct io_kiocb		*reqs[IO_COMPL_BATCH];
 };
 
 struct io_submit_state {
@@ -1794,28 +1795,21 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 
 static void io_submit_flush_completions(struct io_comp_state *cs)
 {
+	struct io_kiocb *req;
 	struct io_ring_ctx *ctx = cs->ctx;
+	int i, nr = cs->nr;
 
 	spin_lock_irq(&ctx->completion_lock);
-	while (!list_empty(&cs->list)) {
-		struct io_kiocb *req;
-
-		req = list_first_entry(&cs->list, struct io_kiocb, compl.list);
-		list_del(&req->compl.list);
+	for (i = 0; i < nr; ++i) {
+		req = cs->reqs[i];
 		__io_cqring_fill_event(req, req->result, req->compl.cflags);
-		if (!(req->flags & REQ_F_LINK_HEAD)) {
-			req->flags |= REQ_F_COMP_LOCKED;
-			io_put_req(req);
-		} else {
-			spin_unlock_irq(&ctx->completion_lock);
-			io_put_req(req);
-			spin_lock_irq(&ctx->completion_lock);
-		}
 	}
 	io_commit_cqring(ctx);
 	spin_unlock_irq(&ctx->completion_lock);
-
 	io_cqring_ev_posted(ctx);
+
+	for (i = 0; i < nr; ++i)
+		io_put_req(cs->reqs[i]);
 	cs->nr = 0;
 }
 
@@ -1829,8 +1823,8 @@ static void __io_req_complete(struct io_kiocb *req, long res, unsigned cflags,
 		io_clean_op(req);
 		req->result = res;
 		req->compl.cflags = cflags;
-		list_add_tail(&req->compl.list, &cs->list);
-		if (++cs->nr >= 32)
+		cs->reqs[cs->nr++] = req;
+		if (cs->nr == IO_COMPL_BATCH)
 			io_submit_flush_completions(cs);
 	}
 }
@@ -6118,7 +6112,7 @@ static int io_submit_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
  */
 static void io_submit_state_end(struct io_submit_state *state)
 {
-	if (!list_empty(&state->comp.list))
+	if (state->comp.nr)
 		io_submit_flush_completions(&state->comp);
 	blk_finish_plug(&state->plug);
 	io_state_file_put(state);
@@ -6137,7 +6131,6 @@ static void io_submit_state_start(struct io_submit_state *state,
 	state->plug.nowait = true;
 #endif
 	state->comp.nr = 0;
-	INIT_LIST_HEAD(&state->comp.list);
 	state->comp.ctx = ctx;
 	state->free_reqs = 0;
 	state->file = NULL;
-- 
2.24.0



* [PATCH 3/5] io_uring: batch free in batched completion
  2020-07-13 23:41 [PATCH 0/5] batch completion + freeing improvements Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 1/5] io_uring: move io_req_complete() definition Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 2/5] io_uring: replace list with array for compl batch Pavel Begunkov
@ 2020-07-13 23:41 ` Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 4/5] tasks: add put_task_struct_many() Pavel Begunkov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-13 23:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

io_submit_flush_completions() already does batching, so also batch the
freeing by reusing the batch_free infrastructure from iopoll.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3277a06e2fb6..6f767781351f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1795,6 +1795,7 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 
 static void io_submit_flush_completions(struct io_comp_state *cs)
 {
+	struct req_batch rb;
 	struct io_kiocb *req;
 	struct io_ring_ctx *ctx = cs->ctx;
 	int i, nr = cs->nr;
@@ -1808,8 +1809,13 @@ static void io_submit_flush_completions(struct io_comp_state *cs)
 	spin_unlock_irq(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
 
-	for (i = 0; i < nr; ++i)
-		io_put_req(cs->reqs[i]);
+	rb.to_free = 0;
+	for (i = 0; i < nr; ++i) {
+		req = cs->reqs[i];
+		if (refcount_dec_and_test(&req->refs))
+			io_req_free_batch(&rb, req);
+	}
+	io_req_free_batch_finish(ctx, &rb);
 	cs->nr = 0;
 }
 
-- 
2.24.0



* [PATCH 4/5] tasks: add put_task_struct_many()
  2020-07-13 23:41 [PATCH 0/5] batch completion + freeing improvements Pavel Begunkov
                   ` (2 preceding siblings ...)
  2020-07-13 23:41 ` [PATCH 3/5] io_uring: batch free in batched completion Pavel Begunkov
@ 2020-07-13 23:41 ` Pavel Begunkov
  2020-07-13 23:41 ` [PATCH 5/5] io_uring: batch put_task_struct() Pavel Begunkov
  2020-07-14  1:08 ` [PATCH 0/5] batch completion + freeing improvements Jens Axboe
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-13 23:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

put_task_struct_many() is like put_task_struct() but drops several
references at once. Useful for batching.
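
A hedged caller-side sketch of the intended use (the task_ref_batch
structure and helpers here are illustrative only; the real user is the
following io_uring patch):

	#include <linux/sched/task.h>

	struct task_ref_batch {
		struct task_struct	*task;
		int			refs;
	};

	/* drop all accumulated references for the current task, if any */
	static void task_ref_batch_flush(struct task_ref_batch *b)
	{
		if (b->task) {
			put_task_struct_many(b->task, b->refs);
			b->task = NULL;
			b->refs = 0;
		}
	}

	/* record one reference to drop later; flush when the task changes */
	static void task_ref_batch_put(struct task_ref_batch *b,
				       struct task_struct *t)
	{
		if (b->task != t) {
			task_ref_batch_flush(b);
			b->task = t;
		}
		b->refs++;
	}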

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/sched/task.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 38359071236a..1301077f9c24 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -126,6 +126,12 @@ static inline void put_task_struct(struct task_struct *t)
 		__put_task_struct(t);
 }
 
+static inline void put_task_struct_many(struct task_struct *t, int nr)
+{
+	if (refcount_sub_and_test(nr, &t->usage))
+		__put_task_struct(t);
+}
+
 void put_task_struct_rcu_user(struct task_struct *task);
 
 #ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT
-- 
2.24.0



* [PATCH 5/5] io_uring: batch put_task_struct()
  2020-07-13 23:41 [PATCH 0/5] batch completion + freeing improvements Pavel Begunkov
                   ` (3 preceding siblings ...)
  2020-07-13 23:41 ` [PATCH 4/5] tasks: add put_task_struct_many() Pavel Begunkov
@ 2020-07-13 23:41 ` Pavel Begunkov
  2020-07-14  1:08 ` [PATCH 0/5] batch completion + freeing improvements Jens Axboe
  5 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-13 23:41 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

Each put_task_struct() is an atomic_dec. Do that in batches.

Tested with io_uring-bench (iopoll, QD=128) against a custom nullblk whose
added ->iopoll() is not optimised at all:

before: 529504 IOPS
after: 	538415 IOPS
diff:	~1.8%

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6f767781351f..3216cc00061b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1761,8 +1761,18 @@ static void io_free_req(struct io_kiocb *req)
 struct req_batch {
 	void *reqs[IO_IOPOLL_BATCH];
 	int to_free;
+
+	struct task_struct	*task;
+	int			task_refs;
 };
 
+static void io_init_req_batch(struct req_batch *rb)
+{
+	rb->to_free = 0;
+	rb->task_refs = 0;
+	rb->task = NULL;
+}
+
 static void __io_req_free_batch_flush(struct io_ring_ctx *ctx,
 				      struct req_batch *rb)
 {
@@ -1776,6 +1786,10 @@ static void io_req_free_batch_finish(struct io_ring_ctx *ctx,
 {
 	if (rb->to_free)
 		__io_req_free_batch_flush(ctx, rb);
+	if (rb->task) {
+		put_task_struct_many(rb->task, rb->task_refs);
+		rb->task = NULL;
+	}
 }
 
 static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
@@ -1787,6 +1801,16 @@ static void io_req_free_batch(struct req_batch *rb, struct io_kiocb *req)
 	if (req->flags & REQ_F_LINK_HEAD)
 		io_queue_next(req);
 
+	if (req->flags & REQ_F_TASK_PINNED) {
+		if (req->task != rb->task && rb->task) {
+			put_task_struct_many(rb->task, rb->task_refs);
+			rb->task = req->task;
+			rb->task_refs = 0;
+		}
+		rb->task_refs++;
+		req->flags &= ~REQ_F_TASK_PINNED;
+	}
+
 	io_dismantle_req(req);
 	rb->reqs[rb->to_free++] = req;
 	if (unlikely(rb->to_free == ARRAY_SIZE(rb->reqs)))
@@ -1809,7 +1833,7 @@ static void io_submit_flush_completions(struct io_comp_state *cs)
 	spin_unlock_irq(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
 
-	rb.to_free = 0;
+	io_init_req_batch(&rb);
 	for (i = 0; i < nr; ++i) {
 		req = cs->reqs[i];
 		if (refcount_dec_and_test(&req->refs))
@@ -1973,7 +1997,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 	/* order with ->result store in io_complete_rw_iopoll() */
 	smp_rmb();
 
-	rb.to_free = 0;
+	io_init_req_batch(&rb);
 	while (!list_empty(done)) {
 		int cflags = 0;
 
-- 
2.24.0



* Re: [PATCH 0/5] batch completion + freeing improvements
  2020-07-13 23:41 [PATCH 0/5] batch completion + freeing improvements Pavel Begunkov
                   ` (4 preceding siblings ...)
  2020-07-13 23:41 ` [PATCH 5/5] io_uring: batch put_task_struct() Pavel Begunkov
@ 2020-07-14  1:08 ` Jens Axboe
  2020-07-14  7:15   ` Pavel Begunkov
  5 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2020-07-14  1:08 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring, linux-kernel

On 7/13/20 5:41 PM, Pavel Begunkov wrote:
> Different batching improvements, that's it.
> 
> Unfortunately, I don't have a decent SSD/setup at hand to
> benchmark it properly.

I do though, but I'm not seeing any improvement with this, whereas
some of the previous series made nice improvements... If anything
maybe it's a bit slower.

> p.s. if extra 32 pointers on stack would be a problem, I wanted for
> long to put submit_state into ctx itself.

It's getting up there... But it really depends on how early in the stack it
sits, so 32 could _probably_ work, though usually batched on-stack counts
are a bit lower than that.

-- 
Jens Axboe



* Re: [PATCH 0/5] batch completion + freeing improvements
  2020-07-14  1:08 ` [PATCH 0/5] batch completion + freeing improvements Jens Axboe
@ 2020-07-14  7:15   ` Pavel Begunkov
  0 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2020-07-14  7:15 UTC (permalink / raw)
  To: Jens Axboe, io-uring, linux-kernel

On 14/07/2020 04:08, Jens Axboe wrote:
> On 7/13/20 5:41 PM, Pavel Begunkov wrote:
>> Different batching improvements, that's it.
>>
>> Unfortunately, I don't have a decent SSD/setup at hand to
>> benchmark it properly.
> 
> I do though, but I'm not seeing any improvement with this, whereas
> some of the previous series made nice improvements... If anything
> maybe it's a bit slower.

Thanks for testing it, I appreciate that. Probably the array did something
unfriendly to your caches, or the 2-step approach is to blame. I'll try to
refine and/or resend parts after closer benchmarking.

> 
>> p.s. if extra 32 pointers on stack would be a problem, I wanted for
>> long to put submit_state into ctx itself.
> 
> It's getting up there... But really depends on how early in the stack,
> so 32 could _probably_ work, though usually batched on-stack counts
> are a bit lower than that.

Looking at it with a fresh head, 250 bytes does look like too much, I agree.
Especially considering that io_uring sits on top of the VFS or close to it,
and fast_iovec/msg are already on the stack.

-- 
Pavel Begunkov

