public inbox for [email protected]
* [PATCH v7 0/5] task optimization
@ 2021-12-07  9:39 Hao Xu
  2021-12-07  9:39 ` [PATCH 1/5] io-wq: add helper to merge two wq_lists Hao Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07  9:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

v4->v5
- change the implementation of merge_wq_list

v5->v6
- change the logic of handling prior task list to:
  1) grabbed uring_lock: leverage the inline completion infra
  2) otherwise: batch __req_complete_post() calls to save
     completion_lock operations.

v6->v7
- add Pavel's fix of wrong spin unlock
- remove a patch and rebase work

Hao Xu (5):
  io-wq: add helper to merge two wq_lists
  io_uring: add a priority tw list for irq completion work
  io_uring: add helper for task work execution code
  io_uring: split io_req_complete_post() and add a helper
  io_uring: batch completion in prior_task_list

 fs/io-wq.h    |  22 ++++++++
 fs/io_uring.c | 142 ++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 126 insertions(+), 38 deletions(-)

-- 
2.25.1



* [PATCH 1/5] io-wq: add helper to merge two wq_lists
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
@ 2021-12-07  9:39 ` Hao Xu
  2021-12-07  9:39 ` [PATCH 2/5] io_uring: add a priority tw list for irq completion work Hao Xu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07  9:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Add a helper to merge two wq_lists; it will be useful in the next
patches.
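
For illustration only (not part of the patch): a minimal standalone sketch
of the O(1) splice the helper below performs. All types and names in the
sketch are made-up userspace stand-ins, not the kernel's
io_wq_work_node/io_wq_work_list.

#include <stdio.h>
#include <stddef.h>

/* Stand-ins for io_wq_work_node / io_wq_work_list. */
struct node {
	int val;
	struct node *next;
};

struct list {
	struct node *first;
	struct node *last;
};

#define INIT_LIST(l) do { (l)->first = NULL; (l)->last = NULL; } while (0)

/*
 * Same logic as the wq_list_merge() helper added below: splice list1
 * onto the tail of list0 in O(1), reinitialize both lists and return
 * the head of the merged chain.
 */
static struct node *list_merge(struct list *list0, struct list *list1)
{
	struct node *ret;

	if (!list0->first) {
		ret = list1->first;
	} else {
		ret = list0->first;
		list0->last->next = list1->first;
	}
	INIT_LIST(list0);
	INIT_LIST(list1);
	return ret;
}

int main(void)
{
	struct node a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };
	struct list l0, l1;
	struct node *n;

	INIT_LIST(&l0);
	INIT_LIST(&l1);
	l0.first = &a; a.next = &b; l0.last = &b;	/* l0: 1 -> 2 */
	l1.first = &c; l1.last = &c;			/* l1: 3 */

	for (n = list_merge(&l0, &l1); n; n = n->next)
		printf("%d\n", n->val);			/* prints 1 2 3 */
	return 0;
}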

Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
---
 fs/io-wq.h | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/fs/io-wq.h b/fs/io-wq.h
index 41bf37674a49..3709b7c5ec98 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -52,6 +52,28 @@ static inline void wq_list_add_after(struct io_wq_work_node *node,
 		list->last = node;
 }
 
+/**
+ * wq_list_merge - merge the second list into the first one.
+ * @list0: the first list
+ * @list1: the second list
+ * Return the first node after merging.
+ */
+static inline struct io_wq_work_node *wq_list_merge(struct io_wq_work_list *list0,
+						    struct io_wq_work_list *list1)
+{
+	struct io_wq_work_node *ret;
+
+	if (!list0->first) {
+		ret = list1->first;
+	} else {
+		ret = list0->first;
+		list0->last->next = list1->first;
+	}
+	INIT_WQ_LIST(list0);
+	INIT_WQ_LIST(list1);
+	return ret;
+}
+
 static inline void wq_list_add_tail(struct io_wq_work_node *node,
 				    struct io_wq_work_list *list)
 {
-- 
2.25.1



* [PATCH 2/5] io_uring: add a priority tw list for irq completion work
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
  2021-12-07  9:39 ` [PATCH 1/5] io-wq: add helper to merge two wq_lists Hao Xu
@ 2021-12-07  9:39 ` Hao Xu
  2021-12-07  9:39 ` [PATCH 3/5] io_uring: add helper for task work execution code Hao Xu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07  9:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Now we have a lot of task_work users, and some of them exist only to
complete a req and generate a cqe. Put this work on a new tw list which
has a higher priority, so that it is handled quickly; this reduces the
average req latency and lets users issue the next round of sqes earlier.
An explanatory case:

origin timeline:
    submit_sqe-->irq-->add completion task_work
    -->run heavy work0~n-->run completion task_work
now timeline:
    submit_sqe-->irq-->add completion task_work
    -->run completion task_work-->run heavy work0~n

Limitation: this optimization only applies when the submission and
reaping processes run in different threads. Otherwise we have to submit
new sqes after returning to userspace anyway, so the order of TWs doesn't
matter.
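
For illustration only (not part of the patch), a rough standalone sketch of
the queueing policy described above; all types and names are made-up
userspace stand-ins, not the kernel code. Completion work goes on a separate
list that the worker drains first:

#include <stdio.h>
#include <stdbool.h>

struct work {
	const char *name;
	struct work *next;
};

struct list {
	struct work *first, *last;
};

static void list_add_tail(struct list *l, struct work *w)
{
	w->next = NULL;
	if (!l->first)
		l->first = w;
	else
		l->last->next = w;
	l->last = w;
}

/* Mirrors io_req_task_work_add(req, priority): completion work is queued
 * on the priority list, everything else on the normal list. */
static void work_add(struct list *prior, struct list *normal,
		     struct work *w, bool priority)
{
	list_add_tail(priority ? prior : normal, w);
}

static void run(struct list *l)
{
	struct work *w;

	for (w = l->first; w; w = w->next)
		printf("run: %s\n", w->name);
	l->first = l->last = NULL;
}

int main(void)
{
	struct list prior = { NULL, NULL }, normal = { NULL, NULL };
	struct work heavy = { "heavy submission work", NULL };
	struct work cqe_work = { "completion (posts a CQE)", NULL };

	work_add(&prior, &normal, &heavy, false);
	work_add(&prior, &normal, &cqe_work, true);

	/* The worker drains the priority list first, so the completion runs
	 * before the heavy work even though it was queued later. */
	run(&prior);
	run(&normal);
	return 0;
}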

Tested this patch (and the following ones) by manually replacing
__io_queue_sqe() in io_queue_sqe() with io_req_task_queue() to construct
'heavy' task work, then tested with fio:

ioengine=io_uring
sqpoll=1
thread=1
bs=4k
direct=1
rw=randread
time_based=1
runtime=600
randrepeat=0
group_reporting=1
filename=/dev/nvme0n1

Tried various iodepths. The peak IOPS with this patch is 710K, while the
old code peaks at 665K. The difference in average latency shows up as the
iodepth grows; depth and avg latency (usec):
        depth     new          old
        1         7.05         7.10
        2         8.47         8.60
        4         10.42        10.42
        8         13.78        13.22
        16        27.41        24.33
        32        49.40        53.08
        64        102.53       103.36
        128       196.98       205.61
        256       372.99       414.88
        512       747.23       791.30
        1024      1472.59      1538.72
        2048      3153.49      3329.01
        4096      6387.86      6682.54
        8192      12150.25     12774.14
        16384     23085.58     26044.71

Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 1265dc1942eb..ad389466a912 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -474,6 +474,7 @@ struct io_uring_task {
 
 	spinlock_t		task_lock;
 	struct io_wq_work_list	task_list;
+	struct io_wq_work_list	prior_task_list;
 	struct callback_head	task_work;
 	bool			task_running;
 };
@@ -2226,12 +2227,12 @@ static void tctx_task_work(struct callback_head *cb)
 	while (1) {
 		struct io_wq_work_node *node;
 
-		if (!tctx->task_list.first && locked)
+		if (!tctx->prior_task_list.first &&
+		    !tctx->task_list.first && locked)
 			io_submit_flush_completions(ctx);
 
 		spin_lock_irq(&tctx->task_lock);
-		node = tctx->task_list.first;
-		INIT_WQ_LIST(&tctx->task_list);
+		node= wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
 		if (!node)
 			tctx->task_running = false;
 		spin_unlock_irq(&tctx->task_lock);
@@ -2260,7 +2261,7 @@ static void tctx_task_work(struct callback_head *cb)
 	ctx_flush_and_put(ctx, &locked);
 }
 
-static void io_req_task_work_add(struct io_kiocb *req)
+static void io_req_task_work_add(struct io_kiocb *req, bool priority)
 {
 	struct task_struct *tsk = req->task;
 	struct io_uring_task *tctx = tsk->io_uring;
@@ -2272,7 +2273,10 @@ static void io_req_task_work_add(struct io_kiocb *req)
 	WARN_ON_ONCE(!tctx);
 
 	spin_lock_irqsave(&tctx->task_lock, flags);
-	wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
+	if (priority)
+		wq_list_add_tail(&req->io_task_work.node, &tctx->prior_task_list);
+	else
+		wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
 	running = tctx->task_running;
 	if (!running)
 		tctx->task_running = true;
@@ -2297,8 +2301,7 @@ static void io_req_task_work_add(struct io_kiocb *req)
 
 	spin_lock_irqsave(&tctx->task_lock, flags);
 	tctx->task_running = false;
-	node = tctx->task_list.first;
-	INIT_WQ_LIST(&tctx->task_list);
+	node = wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
 	spin_unlock_irqrestore(&tctx->task_lock, flags);
 
 	while (node) {
@@ -2335,19 +2338,19 @@ static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
 {
 	req->result = ret;
 	req->io_task_work.func = io_req_task_cancel;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 }
 
 static void io_req_task_queue(struct io_kiocb *req)
 {
 	req->io_task_work.func = io_req_task_submit;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 }
 
 static void io_req_task_queue_reissue(struct io_kiocb *req)
 {
 	req->io_task_work.func = io_queue_async_work;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 }
 
 static inline void io_queue_next(struct io_kiocb *req)
@@ -2457,7 +2460,7 @@ static inline void io_put_req_deferred(struct io_kiocb *req)
 {
 	if (req_ref_put_and_test(req)) {
 		req->io_task_work.func = io_free_req_work;
-		io_req_task_work_add(req);
+		io_req_task_work_add(req, false);
 	}
 }
 
@@ -2744,7 +2747,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 		return;
 	req->result = res;
 	req->io_task_work.func = io_req_task_complete;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, true);
 }
 
 static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
@@ -2986,7 +2989,7 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret,
 			req_set_fail(req);
 			req->result = ret;
 			req->io_task_work.func = io_req_task_complete;
-			io_req_task_work_add(req);
+			io_req_task_work_add(req, false);
 		}
 	}
 }
@@ -5309,7 +5312,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 	 * of executing it. We can't safely execute it anyway, as we may not
 	 * have the needed state needed for it anyway.
 	 */
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 	return 1;
 }
 
@@ -5972,7 +5975,7 @@ static enum hrtimer_restart io_timeout_fn(struct hrtimer *timer)
 
 	req->result = -ETIME;
 	req->io_task_work.func = io_req_task_complete;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 	return HRTIMER_NORESTART;
 }
 
@@ -6947,7 +6950,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
 	spin_unlock_irqrestore(&ctx->timeout_lock, flags);
 
 	req->io_task_work.func = io_req_task_link_timeout;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 	return HRTIMER_NORESTART;
 }
 
@@ -8662,6 +8665,7 @@ static __cold int io_uring_alloc_task_context(struct task_struct *task,
 	task->io_uring = tctx;
 	spin_lock_init(&tctx->task_lock);
 	INIT_WQ_LIST(&tctx->task_list);
+	INIT_WQ_LIST(&tctx->prior_task_list);
 	init_task_work(&tctx->task_work, tctx_task_work);
 	return 0;
 }
-- 
2.25.1



* [PATCH 3/5] io_uring: add helper for task work execution code
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
  2021-12-07  9:39 ` [PATCH 1/5] io-wq: add helper to merge two wq_lists Hao Xu
  2021-12-07  9:39 ` [PATCH 2/5] io_uring: add a priority tw list for irq completion work Hao Xu
@ 2021-12-07  9:39 ` Hao Xu
  2021-12-07  9:39 ` [PATCH 4/5] io_uring: split io_req_complete_post() and add a helper Hao Xu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07  9:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Add a helper for task work execution code. We will use it later.

Reviewed-by: Pavel Begunkov <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
---
 fs/io_uring.c | 36 ++++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ad389466a912..85f9459e9072 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2217,6 +2217,25 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
 	percpu_ref_put(&ctx->refs);
 }
 
+static void handle_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx, bool *locked)
+{
+	do {
+		struct io_wq_work_node *next = node->next;
+		struct io_kiocb *req = container_of(node, struct io_kiocb,
+						    io_task_work.node);
+
+		if (req->ctx != *ctx) {
+			ctx_flush_and_put(*ctx, locked);
+			*ctx = req->ctx;
+			/* if not contended, grab and improve batching */
+			*locked = mutex_trylock(&(*ctx)->uring_lock);
+			percpu_ref_get(&(*ctx)->refs);
+		}
+		req->io_task_work.func(req, locked);
+		node = next;
+	} while (node);
+}
+
 static void tctx_task_work(struct callback_head *cb)
 {
 	bool locked = false;
@@ -2239,22 +2258,7 @@ static void tctx_task_work(struct callback_head *cb)
 		if (!node)
 			break;
 
-		do {
-			struct io_wq_work_node *next = node->next;
-			struct io_kiocb *req = container_of(node, struct io_kiocb,
-							    io_task_work.node);
-
-			if (req->ctx != ctx) {
-				ctx_flush_and_put(ctx, &locked);
-				ctx = req->ctx;
-				/* if not contended, grab and improve batching */
-				locked = mutex_trylock(&ctx->uring_lock);
-				percpu_ref_get(&ctx->refs);
-			}
-			req->io_task_work.func(req, &locked);
-			node = next;
-		} while (node);
-
+		handle_tw_list(node, &ctx, &locked);
 		cond_resched();
 	}
 
-- 
2.25.1



* [PATCH 4/5] io_uring: split io_req_complete_post() and add a helper
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
                   ` (2 preceding siblings ...)
  2021-12-07  9:39 ` [PATCH 3/5] io_uring: add helper for task work execution code Hao Xu
@ 2021-12-07  9:39 ` Hao Xu
  2021-12-07  9:39 ` [PATCH 5/5] io_uring: batch completion in prior_task_list Hao Xu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07  9:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Split io_req_complete_post(); this is a prep for the next patch.

Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 85f9459e9072..21738ed7521e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1891,12 +1891,11 @@ static noinline bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data,
 	return __io_fill_cqe(ctx, user_data, res, cflags);
 }
 
-static void io_req_complete_post(struct io_kiocb *req, s32 res,
-				 u32 cflags)
+static void __io_req_complete_post(struct io_kiocb *req, s32 res,
+				   u32 cflags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
-	spin_lock(&ctx->completion_lock);
 	if (!(req->flags & REQ_F_CQE_SKIP))
 		__io_fill_cqe(ctx, req->user_data, res, cflags);
 	/*
@@ -1918,6 +1917,15 @@ static void io_req_complete_post(struct io_kiocb *req, s32 res,
 		wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
 		ctx->locked_free_nr++;
 	}
+}
+
+static void io_req_complete_post(struct io_kiocb *req, s32 res,
+				 u32 cflags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	spin_lock(&ctx->completion_lock);
+	__io_req_complete_post(req, res, cflags);
 	io_commit_cqring(ctx);
 	spin_unlock(&ctx->completion_lock);
 	io_cqring_ev_posted(ctx);
-- 
2.25.1



* [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
                   ` (3 preceding siblings ...)
  2021-12-07  9:39 ` [PATCH 4/5] io_uring: split io_req_complete_post() and add a helper Hao Xu
@ 2021-12-07  9:39 ` Hao Xu
  2021-12-07 21:01   ` Pavel Begunkov
  2021-12-07 21:59   ` Jens Axboe
  2021-12-07 11:18 ` [PATCH v7 0/5] task optimization Hao Xu
  2021-12-07 16:48 ` Jens Axboe
  6 siblings, 2 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07  9:39 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

In previous patches, we have already gathered some tw with
io_req_task_complete() as the callback in prior_task_list; let's complete
them in a batch when we cannot grab the uring lock. In this way, we batch
the req_complete_post path.
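
For illustration only (not part of the patch), a rough standalone analogy
for the batching, using made-up userspace stand-ins instead of the kernel
locking and CQ code: fill all CQEs of an already-gathered run of requests
under a single completion-lock section and commit once, instead of locking
and committing per request.

#include <stdio.h>
#include <pthread.h>

static pthread_mutex_t completion_lock = PTHREAD_MUTEX_INITIALIZER;
static int cq[64];
static int cq_tail;

/* Unbatched path: one lock/unlock (and one "commit") per completion,
 * roughly what calling io_req_complete_post() per request costs. */
static void complete_one(int res)
{
	pthread_mutex_lock(&completion_lock);
	cq[cq_tail++] = res;		/* fill one CQE */
	/* commit + wakeup would happen here */
	pthread_mutex_unlock(&completion_lock);
}

/* Batched path: fill all CQEs for an already-gathered run of requests
 * under one lock section and commit once, roughly what the prior tw
 * list loop in this patch does via __io_req_complete_post() when the
 * uring lock could not be taken. */
static void complete_batch(const int *res, int nr)
{
	int i;

	pthread_mutex_lock(&completion_lock);
	for (i = 0; i < nr; i++)
		cq[cq_tail++] = res[i];	/* fill CQEs */
	/* single commit + wakeup for the whole batch */
	pthread_mutex_unlock(&completion_lock);
}

int main(void)
{
	int results[3] = { 4096, 4096, 4096 };

	complete_one(512);
	complete_batch(results, 3);
	printf("%d CQEs posted\n", cq_tail);
	return 0;
}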

Tested-by: Pavel Begunkov <[email protected]>
Signed-off-by: Hao Xu <[email protected]>
---

Hi Pavel,
May I add the above Tested-by tag here?

 fs/io_uring.c | 70 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 60 insertions(+), 10 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 21738ed7521e..f224f8df77a1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2225,6 +2225,49 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
 	percpu_ref_put(&ctx->refs);
 }
 
+static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
+{
+	io_commit_cqring(ctx);
+	spin_unlock(&ctx->completion_lock);
+	io_cqring_ev_posted(ctx);
+}
+
+static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx,
+				 bool *uring_locked, bool *compl_locked)
+{
+	do {
+		struct io_wq_work_node *next = node->next;
+		struct io_kiocb *req = container_of(node, struct io_kiocb,
+						    io_task_work.node);
+
+		if (req->ctx != *ctx) {
+			if (unlikely(*compl_locked)) {
+				ctx_commit_and_unlock(*ctx);
+				*compl_locked = false;
+			}
+			ctx_flush_and_put(*ctx, uring_locked);
+			*ctx = req->ctx;
+			/* if not contended, grab and improve batching */
+			*uring_locked = mutex_trylock(&(*ctx)->uring_lock);
+			percpu_ref_get(&(*ctx)->refs);
+			if (unlikely(!*uring_locked)) {
+				spin_lock(&(*ctx)->completion_lock);
+				*compl_locked = true;
+			}
+		}
+		if (likely(*uring_locked))
+			req->io_task_work.func(req, uring_locked);
+		else
+			__io_req_complete_post(req, req->result, io_put_kbuf(req));
+		node = next;
+	} while (node);
+
+	if (unlikely(*compl_locked)) {
+		ctx_commit_and_unlock(*ctx);
+		*compl_locked = false;
+	}
+}
+
 static void handle_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx, bool *locked)
 {
 	do {
@@ -2246,31 +2289,38 @@ static void handle_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ct
 
 static void tctx_task_work(struct callback_head *cb)
 {
-	bool locked = false;
+	bool uring_locked = false, compl_locked = false;
 	struct io_ring_ctx *ctx = NULL;
 	struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
 						  task_work);
 
 	while (1) {
-		struct io_wq_work_node *node;
+		struct io_wq_work_node *node1, *node2;
 
-		if (!tctx->prior_task_list.first &&
-		    !tctx->task_list.first && locked)
+		if (!tctx->task_list.first &&
+		    !tctx->prior_task_list.first && uring_locked)
 			io_submit_flush_completions(ctx);
 
 		spin_lock_irq(&tctx->task_lock);
-		node= wq_list_merge(&tctx->prior_task_list, &tctx->task_list);
-		if (!node)
+		node1 = tctx->prior_task_list.first;
+		node2 = tctx->task_list.first;
+		INIT_WQ_LIST(&tctx->task_list);
+		INIT_WQ_LIST(&tctx->prior_task_list);
+		if (!node2 && !node1)
 			tctx->task_running = false;
 		spin_unlock_irq(&tctx->task_lock);
-		if (!node)
+		if (!node2 && !node1)
 			break;
 
-		handle_tw_list(node, &ctx, &locked);
+		if (node1)
+			handle_prior_tw_list(node1, &ctx, &uring_locked, &compl_locked);
+
+		if (node2)
+			handle_tw_list(node2, &ctx, &uring_locked);
 		cond_resched();
 	}
 
-	ctx_flush_and_put(ctx, &locked);
+	ctx_flush_and_put(ctx, &uring_locked);
 }
 
 static void io_req_task_work_add(struct io_kiocb *req, bool priority)
@@ -2759,7 +2809,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res)
 		return;
 	req->result = res;
 	req->io_task_work.func = io_req_task_complete;
-	io_req_task_work_add(req, true);
+	io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
 }
 
 static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
-- 
2.25.1



* Re: [PATCH v7 0/5] task optimization
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
                   ` (4 preceding siblings ...)
  2021-12-07  9:39 ` [PATCH 5/5] io_uring: batch completion in prior_task_list Hao Xu
@ 2021-12-07 11:18 ` Hao Xu
  2021-12-07 16:48 ` Jens Axboe
  6 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-07 11:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 2021/12/7 5:39 PM, Hao Xu wrote:
> v4->v5
> - change the implementation of merge_wq_list
> 
> v5->v6
> - change the logic of handling prior task list to:
>    1) grabbed uring_lock: leverage the inline completion infra
>    2) otherwise: batch __req_complete_post() calls to save
>       completion_lock operations.
> 
> v6->v7
> - add Pavel's fix of wrong spin unlock
> - remove a patch and rebase work
> 
> Hao Xu (5):
>    io-wq: add helper to merge two wq_lists
>    io_uring: add a priority tw list for irq completion work
>    io_uring: add helper for task work execution code
>    io_uring: split io_req_complete_post() and add a helper
>    io_uring: batch completion in prior_task_list
> 
>   fs/io-wq.h    |  22 ++++++++
>   fs/io_uring.c | 142 ++++++++++++++++++++++++++++++++++++--------------
>   2 files changed, 126 insertions(+), 38 deletions(-)
> 
typo: the subject should be "task work optimization".



* Re: [PATCH v7 0/5] task optimization
  2021-12-07  9:39 [PATCH v7 0/5] task optimization Hao Xu
                   ` (5 preceding siblings ...)
  2021-12-07 11:18 ` [PATCH v7 0/5] task optimization Hao Xu
@ 2021-12-07 16:48 ` Jens Axboe
  6 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2021-12-07 16:48 UTC (permalink / raw)
  To: Hao Xu; +Cc: Pavel Begunkov, Joseph Qi, io-uring

On Tue, 7 Dec 2021 17:39:46 +0800, Hao Xu wrote:
> v4->v5
> - change the implementation of merge_wq_list
> 
> v5->v6
> - change the logic of handling prior task list to:
>   1) grabbed uring_lock: leverage the inline completion infra
>   2) otherwise: batch __req_complete_post() calls to save
>      completion_lock operations.
> 
> [...]

Applied, thanks!

[1/5] io-wq: add helper to merge two wq_lists
      commit: ce220e94513db1231a91847e2d5dd51baec6613c
[2/5] io_uring: add a priority tw list for irq completion work
      commit: e4780d989211b8d41571493d9423cc46a2cbe191
[3/5] io_uring: add helper for task work execution code
      commit: 9277479b763b9f210cab256022dd974341b87c69
[4/5] io_uring: split io_req_complete_post() and add a helper
      commit: 028b57a9cfb34267542484f26f25ae4d2dc31d67
[5/5] io_uring: batch completion in prior_task_list
      commit: 186cbb99b1e82b857b986ce886bafb255bd2f43c

Best regards,
-- 
Jens Axboe




* Re: [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07  9:39 ` [PATCH 5/5] io_uring: batch completion in prior_task_list Hao Xu
@ 2021-12-07 21:01   ` Pavel Begunkov
  2021-12-07 21:16     ` Pavel Begunkov
  2021-12-08  5:08     ` Hao Xu
  2021-12-07 21:59   ` Jens Axboe
  1 sibling, 2 replies; 14+ messages in thread
From: Pavel Begunkov @ 2021-12-07 21:01 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 12/7/21 09:39, Hao Xu wrote:
> In previous patches, we have already gathered some tw with
> io_req_task_complete() as callback in prior_task_list, let's complete
> them in batch while we cannot grab uring lock. In this way, we batch
> the req_complete_post path.
> 
> Tested-by: Pavel Begunkov <[email protected]>

Hao, please never add tags for other people unless they confirmed
that it's fine. I asked Jens to kill this one and my Signed-off-by on 4/5
from the io_uring branches.


> Signed-off-by: Hao Xu <[email protected]>
> ---
> 
> Hi Pavel,
> May I add the above Test-by tag here?
> 
>   fs/io_uring.c | 70 +++++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 60 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 21738ed7521e..f224f8df77a1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2225,6 +2225,49 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
>   	percpu_ref_put(&ctx->refs);
>   }
>   
> +static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
> +{
> +	io_commit_cqring(ctx);
> +	spin_unlock(&ctx->completion_lock);
> +	io_cqring_ev_posted(ctx);
> +}
> +
> +static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx,
> +				 bool *uring_locked, bool *compl_locked)

compl_locked can probably be a local var here; you're clearing it at the
end of the function anyway.

> +{
> +	do {
> +		struct io_wq_work_node *next = node->next;
> +		struct io_kiocb *req = container_of(node, struct io_kiocb,
> +						    io_task_work.node);
> +
> +		if (req->ctx != *ctx) {
> +			if (unlikely(*compl_locked)) {
> +				ctx_commit_and_unlock(*ctx);
> +				*compl_locked = false;
> +			}
> +			ctx_flush_and_put(*ctx, uring_locked);
> +			*ctx = req->ctx;
> +			/* if not contended, grab and improve batching */
> +			*uring_locked = mutex_trylock(&(*ctx)->uring_lock);
> +			percpu_ref_get(&(*ctx)->refs);
> +			if (unlikely(!*uring_locked)) {
> +				spin_lock(&(*ctx)->completion_lock);
> +				*compl_locked = true;
> +			}
> +		}
> +		if (likely(*uring_locked))
> +			req->io_task_work.func(req, uring_locked);
> +		else
> +			__io_req_complete_post(req, req->result, io_put_kbuf(req));

I think there is the same issue as last time: the first iteration of
tctx_task_work() sets ctx but doesn't get uring_lock. Then you go here,
find a request with the same ctx and end up here without the completion
lock held.


> +		node = next;
> +	} while (node);
> +
> +	if (unlikely(*compl_locked)) {
> +		ctx_commit_and_unlock(*ctx);
> +		*compl_locked = false;
> +	}
> +}


-- 
Pavel Begunkov


* Re: [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07 21:01   ` Pavel Begunkov
@ 2021-12-07 21:16     ` Pavel Begunkov
  2021-12-08  5:04       ` Hao Xu
  2021-12-08  5:08     ` Hao Xu
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2021-12-07 21:16 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 12/7/21 21:01, Pavel Begunkov wrote:
> On 12/7/21 09:39, Hao Xu wrote:
>> In previous patches, we have already gathered some tw with
>> io_req_task_complete() as callback in prior_task_list, let's complete
>> them in batch while we cannot grab uring lock. In this way, we batch
>> the req_complete_post path.
[...]
>> +        if (likely(*uring_locked))
>> +            req->io_task_work.func(req, uring_locked);
>> +        else
>> +            __io_req_complete_post(req, req->result, io_put_kbuf(req));
> 
> I think there is the same issue as last time: the first iteration of
> tctx_task_work() sets ctx but doesn't get uring_lock. Then you go here,
> find a request with the same ctx and end up here without the completion
> lock held.

Maybe something like below on top? Totally untested. We basically always
want *uring_locked != *compl_locked, so we don't even need to store
both vars.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index f224f8df77a1..dfa226bf2c53 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2233,27 +2233,28 @@ static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
  }
  
  static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx,
-				 bool *uring_locked, bool *compl_locked)
+				 bool *uring_locked)
  {
+	if (*ctx && !*uring_locked)
+		spin_lock(&(*ctx)->completion_lock);
+
  	do {
  		struct io_wq_work_node *next = node->next;
  		struct io_kiocb *req = container_of(node, struct io_kiocb,
  						    io_task_work.node);
  
  		if (req->ctx != *ctx) {
-			if (unlikely(*compl_locked)) {
+			if (unlikely(!*uring_locked && *ctx))
  				ctx_commit_and_unlock(*ctx);
-				*compl_locked = false;
-			}
+
  			ctx_flush_and_put(*ctx, uring_locked);
  			*ctx = req->ctx;
  			/* if not contended, grab and improve batching */
  			*uring_locked = mutex_trylock(&(*ctx)->uring_lock);
-			percpu_ref_get(&(*ctx)->refs);
-			if (unlikely(!*uring_locked)) {
+			if (unlikely(!*uring_locked))
  				spin_lock(&(*ctx)->completion_lock);
-				*compl_locked = true;
-			}
+
+			percpu_ref_get(&(*ctx)->refs);
  		}
  		if (likely(*uring_locked))
  			req->io_task_work.func(req, uring_locked);
@@ -2262,10 +2263,8 @@ static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ct
  		node = next;
  	} while (node);
  
-	if (unlikely(*compl_locked)) {
+	if (unlikely(!*uring_locked))
  		ctx_commit_and_unlock(*ctx);
-		*compl_locked = false;
-	}
  }
  
  static void handle_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx, bool *locked)
@@ -2289,7 +2288,7 @@ static void handle_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ct
  
  static void tctx_task_work(struct callback_head *cb)
  {
-	bool uring_locked = false, compl_locked = false;
+	bool uring_locked = false;
  	struct io_ring_ctx *ctx = NULL;
  	struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
  						  task_work);
@@ -2313,7 +2312,7 @@ static void tctx_task_work(struct callback_head *cb)
  			break;
  
  		if (node1)
-			handle_prior_tw_list(node1, &ctx, &uring_locked, &compl_locked);
+			handle_prior_tw_list(node1, &ctx, &uring_locked);
  
  		if (node2)
  			handle_tw_list(node2, &ctx, &uring_locked);


* Re: [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07  9:39 ` [PATCH 5/5] io_uring: batch completion in prior_task_list Hao Xu
  2021-12-07 21:01   ` Pavel Begunkov
@ 2021-12-07 21:59   ` Jens Axboe
  2021-12-08  5:23     ` Hao Xu
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2021-12-07 21:59 UTC (permalink / raw)
  To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 12/7/21 2:39 AM, Hao Xu wrote:
> In previous patches, we have already gathered some tw with
> io_req_task_complete() as callback in prior_task_list, let's complete
> them in batch while we cannot grab uring lock. In this way, we batch
> the req_complete_post path.
> 
> Tested-by: Pavel Begunkov <[email protected]>
> Signed-off-by: Hao Xu <[email protected]>
> ---
> 
> Hi Pavel,
> May I add the above Tested-by tag here?

When you fold in Pavel's fixes, please also address the below.

>  fs/io_uring.c | 70 +++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 60 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 21738ed7521e..f224f8df77a1 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2225,6 +2225,49 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
>  	percpu_ref_put(&ctx->refs);
>  }
>  
> +static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
> +{
> +	io_commit_cqring(ctx);
> +	spin_unlock(&ctx->completion_lock);
> +	io_cqring_ev_posted(ctx);
> +}
> +
> +static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx,
> +				 bool *uring_locked, bool *compl_locked)
> +{

Please wrap lines at 80 characters. And let's name this one
'handle_prev_tw_list' instead.

-- 
Jens Axboe



* Re: [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07 21:16     ` Pavel Begunkov
@ 2021-12-08  5:04       ` Hao Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-08  5:04 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

Hi,

On 2021/12/8 5:16 AM, Pavel Begunkov wrote:
> On 12/7/21 21:01, Pavel Begunkov wrote:
>> On 12/7/21 09:39, Hao Xu wrote:
>>> In previous patches, we have already gathered some tw with
>>> io_req_task_complete() as callback in prior_task_list, let's complete
>>> them in batch while we cannot grab uring lock. In this way, we batch
>>> the req_complete_post path.
> [...]
>>> +        if (likely(*uring_locked))
>>> +            req->io_task_work.func(req, uring_locked);
>>> +        else
>>> +            __io_req_complete_post(req, req->result, io_put_kbuf(req));
>>
>> I think there is the same issue as last time: the first iteration of
>> tctx_task_work() sets ctx but doesn't get uring_lock. Then you go here,
>> find a request with the same ctx and end up here without the completion
>> lock held.
>
> Maybe something like below on top? Totally untested. We basically always
> want *uring_locked != *compl_locked, so we don't even need to store
> both vars.
>
Thanks, Pavel. I forgot to think about the case when req->ctx == ctx. I
tested the fix here and it looks good.


Regards,
Hao



* Re: [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07 21:01   ` Pavel Begunkov
  2021-12-07 21:16     ` Pavel Begunkov
@ 2021-12-08  5:08     ` Hao Xu
  1 sibling, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-08  5:08 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

Hi,

On 2021/12/8 5:01 AM, Pavel Begunkov wrote:
> On 12/7/21 09:39, Hao Xu wrote:
>> In previous patches, we have already gathered some tw with
>> io_req_task_complete() as callback in prior_task_list, let's complete
>> them in batch while we cannot grab uring lock. In this way, we batch
>> the req_complete_post path.
>>
>> Tested-by: Pavel Begunkov <[email protected]>
>
> Hao, please never add tags for other people unless they confirmed
> that it's fine. I asked Jens to kill this one and my signed-off
> from 4/5 from io_uring branches.
>
Apologies for that; I'll remember this rule.


Regards,
Hao



* Re: [PATCH 5/5] io_uring: batch completion in prior_task_list
  2021-12-07 21:59   ` Jens Axboe
@ 2021-12-08  5:23     ` Hao Xu
  0 siblings, 0 replies; 14+ messages in thread
From: Hao Xu @ 2021-12-08  5:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi


On 2021/12/8 5:59 AM, Jens Axboe wrote:
> On 12/7/21 2:39 AM, Hao Xu wrote:
>> In previous patches, we have already gathered some tw with
>> io_req_task_complete() as callback in prior_task_list, let's complete
>> them in batch while we cannot grab uring lock. In this way, we batch
>> the req_complete_post path.
>>
>> Tested-by: Pavel Begunkov <[email protected]>
>> Signed-off-by: Hao Xu <[email protected]>
>> ---
>>
>> Hi Pavel,
>> May I add the above Tested-by tag here?
> When you fold in Pavel's fixes, please also address the below.
>
>>   fs/io_uring.c | 70 +++++++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 60 insertions(+), 10 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 21738ed7521e..f224f8df77a1 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2225,6 +2225,49 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
>>   	percpu_ref_put(&ctx->refs);
>>   }
>>   
>> +static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
>> +{
>> +	io_commit_cqring(ctx);
>> +	spin_unlock(&ctx->completion_lock);
>> +	io_cqring_ev_posted(ctx);
>> +}
>> +
>> +static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx,
>> +				 bool *uring_locked, bool *compl_locked)
>> +{
> Please wrap lines at 80 characters. And let's name this one
> 'handle_prev_tw_list' instead.
Gotcha.
>

