public inbox for [email protected]
* [PATCH 0/2] yet another optimisation for-next
@ 2021-10-09 22:14 Pavel Begunkov
  2021-10-09 22:14 ` [PATCH 1/2] io_uring: optimise io_req_set_rsrc_node() Pavel Begunkov
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-10-09 22:14 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

./io_uring -d 32 -s 32 -c 32 -b512 -p1 /dev/nullb0

3.43 MIOPS -> ~3.6 MIOPS, gaining us another 4-6% for nullblk I/O

Pavel Begunkov (2):
  io_uring: optimise io_req_set_rsrc_node()
  io_uring: optimise ctx referencing

 fs/io_uring.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 52 insertions(+), 7 deletions(-)

-- 
2.33.0



* [PATCH 1/2] io_uring: optimise io_req_set_rsrc_node()
  2021-10-09 22:14 [PATCH 0/2] yet another optimisation for-next Pavel Begunkov
@ 2021-10-09 22:14 ` Pavel Begunkov
  2021-10-09 22:14 ` [PATCH 2/2] io_uring: optimise ctx referencing Pavel Begunkov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-10-09 22:14 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

io_req_set_rsrc_node() reloads req->ctx, but the caller already has it
in registers in all use cases, so it's better to pass it as a parameter.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff732397264..24984b3f4a49 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1175,10 +1175,9 @@ static inline void io_req_set_refcount(struct io_kiocb *req)
 	__io_req_set_refcount(req, 1);
 }
 
-static inline void io_req_set_rsrc_node(struct io_kiocb *req)
+static inline void io_req_set_rsrc_node(struct io_kiocb *req,
+					struct io_ring_ctx *ctx)
 {
-	struct io_ring_ctx *ctx = req->ctx;
-
 	if (!req->fixed_rsrc_refs) {
 		req->fixed_rsrc_refs = &ctx->rsrc_node->refs;
 		percpu_ref_get(req->fixed_rsrc_refs);
@@ -2843,7 +2842,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	if (req->opcode == IORING_OP_READ_FIXED ||
 	    req->opcode == IORING_OP_WRITE_FIXED) {
 		req->imu = NULL;
-		io_req_set_rsrc_node(req);
+		io_req_set_rsrc_node(req, ctx);
 	}
 
 	req->rw.addr = READ_ONCE(sqe->addr);
@@ -6772,7 +6771,7 @@ static inline struct file *io_file_get_fixed(struct io_ring_ctx *ctx,
 	file_ptr &= ~FFS_MASK;
 	/* mask in overlapping REQ_F and FFS bits */
 	req->flags |= (file_ptr << REQ_F_NOWAIT_READ_BIT);
-	io_req_set_rsrc_node(req);
+	io_req_set_rsrc_node(req, ctx);
 	return file;
 }
 
-- 
2.33.0


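For readers following along outside the kernel tree, here is a minimal
standalone C sketch of the pattern the patch above applies: instead of
re-deriving the context through req->ctx (a dependent load) inside a hot
helper, the caller passes the pointer it already holds in a register.
All names below are invented for illustration and are not the kernel's.

#include <stdio.h>

struct ctx { int refs; };
struct req { struct ctx *ctx; };

/* before: the helper reloads ctx through the request on every call */
static inline void take_ref_reload(struct req *req)
{
	struct ctx *ctx = req->ctx;	/* extra dependent load */

	ctx->refs++;
}

/* after: the caller hands over the ctx it already has loaded */
static inline void take_ref_param(struct req *req, struct ctx *ctx)
{
	(void)req;	/* req kept only for parity with the kernel helper */
	ctx->refs++;
}

int main(void)
{
	struct ctx c = { 0 };
	struct req r = { &c };

	take_ref_reload(&r);
	take_ref_param(&r, r.ctx);
	printf("refs=%d\n", c.refs);	/* prints refs=2 */
	return 0;
}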

* [PATCH 2/2] io_uring: optimise ctx referencing
  2021-10-09 22:14 [PATCH 0/2] yet another optimisation for-next Pavel Begunkov
  2021-10-09 22:14 ` [PATCH 1/2] io_uring: optimise io_req_set_rsrc_node() Pavel Begunkov
@ 2021-10-09 22:14 ` Pavel Begunkov
  2021-10-14 13:50 ` (subset) [PATCH 0/2] yet another optimisation for-next Jens Axboe
  2021-10-14 13:54 ` Jens Axboe
  3 siblings, 0 replies; 5+ messages in thread
From: Pavel Begunkov @ 2021-10-09 22:14 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Apparently, percpu_ref_put/get() are expensive enough when done per
request, so take the refs in a batch and cache them on the submission
side to avoid getting them over and over again. Also, if we're
completing under uring_lock, return refs back into the cache instead
of calling percpu_ref_put(). This is pretty similar to how we do
tctx->cached_refs accounting, but we fall back to a normal put when
the rsrc node has already been changed by the time of free.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 24984b3f4a49..e558a68a371d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -361,6 +361,7 @@ struct io_ring_ctx {
 		 * uring_lock, and updated through io_uring_register(2)
 		 */
 		struct io_rsrc_node	*rsrc_node;
+		int			rsrc_cached_refs;
 		struct io_file_table	file_table;
 		unsigned		nr_user_files;
 		unsigned		nr_user_bufs;
@@ -1175,12 +1176,52 @@ static inline void io_req_set_refcount(struct io_kiocb *req)
 	__io_req_set_refcount(req, 1);
 }
 
+#define IO_RSRC_REF_BATCH	100
+
+static inline void io_req_put_rsrc_locked(struct io_kiocb *req,
+					  struct io_ring_ctx *ctx)
+	__must_hold(&ctx->uring_lock)
+{
+	struct percpu_ref *ref = req->fixed_rsrc_refs;
+
+	if (ref) {
+		if (ref == &ctx->rsrc_node->refs)
+			ctx->rsrc_cached_refs++;
+		else
+			percpu_ref_put(ref);
+	}
+}
+
+static inline void io_req_put_rsrc(struct io_kiocb *req, struct io_ring_ctx *ctx)
+{
+	if (req->fixed_rsrc_refs)
+		percpu_ref_put(req->fixed_rsrc_refs);
+}
+
+static __cold void io_rsrc_refs_drop(struct io_ring_ctx *ctx)
+	__must_hold(&ctx->uring_lock)
+{
+	if (ctx->rsrc_cached_refs) {
+		percpu_ref_put_many(&ctx->rsrc_node->refs, ctx->rsrc_cached_refs);
+		ctx->rsrc_cached_refs = 0;
+	}
+}
+
+static void io_rsrc_refs_refill(struct io_ring_ctx *ctx)
+	__must_hold(&ctx->uring_lock)
+{
+	ctx->rsrc_cached_refs += IO_RSRC_REF_BATCH;
+	percpu_ref_get_many(&ctx->rsrc_node->refs, IO_RSRC_REF_BATCH);
+}
+
 static inline void io_req_set_rsrc_node(struct io_kiocb *req,
 					struct io_ring_ctx *ctx)
 {
 	if (!req->fixed_rsrc_refs) {
 		req->fixed_rsrc_refs = &ctx->rsrc_node->refs;
-		percpu_ref_get(req->fixed_rsrc_refs);
+		ctx->rsrc_cached_refs--;
+		if (unlikely(ctx->rsrc_cached_refs < 0))
+			io_rsrc_refs_refill(ctx);
 	}
 }
 
@@ -1801,6 +1842,7 @@ static void io_req_complete_post(struct io_kiocb *req, s32 res,
 				req->link = NULL;
 			}
 		}
+		io_req_put_rsrc(req, ctx);
 		io_dismantle_req(req);
 		io_put_task(req->task, 1);
 		wq_list_add_head(&req->comp_list, &ctx->locked_free_list);
@@ -1957,14 +1999,13 @@ static inline void io_dismantle_req(struct io_kiocb *req)
 		io_clean_op(req);
 	if (!(flags & REQ_F_FIXED_FILE))
 		io_put_file(req->file);
-	if (req->fixed_rsrc_refs)
-		percpu_ref_put(req->fixed_rsrc_refs);
 }
 
 static __cold void __io_free_req(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 
+	io_req_put_rsrc(req, ctx);
 	io_dismantle_req(req);
 	io_put_task(req->task, 1);
 
@@ -2271,6 +2312,7 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
 			continue;
 		}
 
+		io_req_put_rsrc_locked(req, ctx);
 		io_queue_next(req);
 		io_dismantle_req(req);
 
@@ -7646,10 +7688,13 @@ static struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx)
 
 static void io_rsrc_node_switch(struct io_ring_ctx *ctx,
 				struct io_rsrc_data *data_to_kill)
+	__must_hold(&ctx->uring_lock)
 {
 	WARN_ON_ONCE(!ctx->rsrc_backup_node);
 	WARN_ON_ONCE(data_to_kill && !ctx->rsrc_node);
 
+	io_rsrc_refs_drop(ctx);
+
 	if (data_to_kill) {
 		struct io_rsrc_node *rsrc_node = ctx->rsrc_node;
 
@@ -9203,6 +9248,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		ctx->mm_account = NULL;
 	}
 
+	io_rsrc_refs_drop(ctx);
 	/* __io_rsrc_put_work() may need uring_lock to progress, wait w/o it */
 	io_wait_rsrc_data(ctx->buf_data);
 	io_wait_rsrc_data(ctx->file_data);
-- 
2.33.0


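The batching scheme above is easier to see in isolation. Below is a
minimal standalone C sketch of the same idea, with a plain counter
standing in for percpu_ref and invented names throughout; only the
shape of the logic mirrors the patch, and everything here runs under
what would be uring_lock.

#include <stdio.h>

#define REF_BATCH 100	/* plays the role of IO_RSRC_REF_BATCH */

struct node { long refs; };

struct ring {
	struct node *node;	/* current rsrc node */
	int cached_refs;	/* taken from the node, not yet handed out */
};

static void refill(struct ring *r)
{
	/* one bulk get instead of REF_BATCH individual gets */
	r->cached_refs += REF_BATCH;
	r->node->refs += REF_BATCH;
}

/* submission side: hand out one cached ref per request */
static struct node *take_ref(struct ring *r)
{
	if (--r->cached_refs < 0)
		refill(r);
	return r->node;
}

/* completion under the lock: return the ref to the cache if the node
 * hasn't been switched since submission, otherwise put it for real */
static void put_ref_locked(struct ring *r, struct node *n)
{
	if (n == r->node)
		r->cached_refs++;
	else
		n->refs--;
}

/* on node switch or ring teardown, flush the cache back to the node */
static void drop_cached(struct ring *r)
{
	if (r->cached_refs) {
		r->node->refs -= r->cached_refs;
		r->cached_refs = 0;
	}
}

int main(void)
{
	struct node n = { 0 };
	struct ring r = { &n, 0 };
	struct node *got = take_ref(&r);	/* first take triggers a refill */

	put_ref_locked(&r, got);
	drop_cached(&r);
	printf("node refs=%ld, cached=%d\n", n.refs, r.cached_refs);	/* 0, 0 */
	return 0;
}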

* Re: (subset) [PATCH 0/2] yet another optimisation for-next
  2021-10-09 22:14 [PATCH 0/2] yet another optimisation for-next Pavel Begunkov
  2021-10-09 22:14 ` [PATCH 1/2] io_uring: optimise io_req_set_rsrc_node() Pavel Begunkov
  2021-10-09 22:14 ` [PATCH 2/2] io_uring: optimise ctx referencing Pavel Begunkov
@ 2021-10-14 13:50 ` Jens Axboe
  2021-10-14 13:54 ` Jens Axboe
  3 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2021-10-14 13:50 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On Sat, 9 Oct 2021 23:14:39 +0100, Pavel Begunkov wrote:
> ./io_uring -d 32 -s 32 -c 32 -b512 -p1 /dev/nullb0
> 
> 3.43 MIOPS -> ~3.6 MIOPS, gaining us another 4-6% for nullblk I/O
> 
> Pavel Begunkov (2):
>   io_uring: optimise io_req_set_rsrc_node()
>   io_uring: optimise ctx referencing
> 
> [...]

Applied, thanks!

[1/2] io_uring: optimise io_req_set_rsrc_node()
      commit: 5d72a8b5371a761423c0bb781e717f8ff28c6851

Best regards,
-- 
Jens Axboe




* Re: (subset) [PATCH 0/2] yet another optimisation for-next
  2021-10-09 22:14 [PATCH 0/2] yet another optimisation for-next Pavel Begunkov
                   ` (2 preceding siblings ...)
  2021-10-14 13:50 ` (subset) [PATCH 0/2] yet another optimisation for-next Jens Axboe
@ 2021-10-14 13:54 ` Jens Axboe
  3 siblings, 0 replies; 5+ messages in thread
From: Jens Axboe @ 2021-10-14 13:54 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On Sat, 9 Oct 2021 23:14:39 +0100, Pavel Begunkov wrote:
> ./io_uring -d 32 -s 32 -c 32 -b512 -p1 /dev/nullb0
> 
> 3.43 MIOPS -> ~3.6 MIOPS, gaining us another 4-6% for nullblk I/O
> 
> Pavel Begunkov (2):
>   io_uring: optimise io_req_set_rsrc_node()
>   io_uring: optimise ctx referencing
> 
> [...]

Applied, thanks!

[2/2] io_uring: optimise ctx referencing
      commit: c267832ae16edac4a0a829efccf1910edda74b91

Best regards,
-- 
Jens Axboe


