[PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations

public inbox for [email protected]
 help / color / mirror / Atom feed

* [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations
@ 2022-06-16  9:21 Pavel Begunkov
  2022-06-16  9:21 ` [PATCH for-next v3 01/16] io_uring: rw: delegate sync completions to core io_uring Pavel Begunkov
                   ` (17 more replies)
  0 siblings, 18 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:21 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

1-4 kills REQ_F_COMPLETE_INLINE as we're out of bits.

Patch 5 from Hao should remove some overhead from poll requests

Patch 6 from Hao adds per-bucket spinlocks, and 16-19 do a little
bit of cleanup. The downside of per-bucket spinlocks is that it adds
additional spinlock/unlock pair in the poll request completion side,
which shouldn't matter much with 20/25.

Patch 11 uses inline completion infra for poll requests, this nicely
improves perf when there is a good tw batching.

Patch 12 implements the userspace visible side of
IORING_SETUP_SINGLE_ISSUER, it'll be used for poll requests and
later for spinlock optimisations.

13-16 introduces ->uring_lock protected cancellation hashing. It
requires us to grab ->uring_lock in the completion side, but saves
two spin lock/unlock pairs. We apply it automatically in cases the
mutex is already likely to be held (see 25/25 description), so there
is no additional mutex overhead and potential latency problemes.

Numbers:

The used poll benchmark each iteration queues a batch of 32 POLLIN
poll requests and triggers all of them with read (+write).

baseline (patches 1-10):
    11720 K req/s
base + 11 (+ inline completion infra)
    12419 K req/s, ~+6%
base + 11-16 (+ uring_lock hashing):
    12804 K req/s, +9.2% from the baseline, or +3.2% relative to patch 19.

Note that patch 11 only helps performance of poll-add requests, whenever
16/16 also improves apoll.

v2:
  don't move ->cancel_seq out of iowq work struct
  fix up single-issuer

v3:
  clarify locking expectation around ->uring_lock hashing
  don't complete by hand in io_read/write (see 1/16)

Hao Xu (2):
  io_uring: poll: remove unnecessary req->ref set
  io_uring: switch cancel_hash to use per entry spinlock

Pavel Begunkov (14):
  io_uring: rw: delegate sync completions to core io_uring
  io_uring: kill REQ_F_COMPLETE_INLINE
  io_uring: refactor io_req_task_complete()
  io_uring: don't inline io_put_kbuf
  io_uring: pass poll_find lock back
  io_uring: clean up io_try_cancel
  io_uring: limit the number of cancellation buckets
  io_uring: clean up io_ring_ctx_alloc
  io_uring: use state completion infra for poll reqs
  io_uring: add IORING_SETUP_SINGLE_ISSUER
  io_uring: pass hash table into poll_find
  io_uring: introduce a struct for hash table
  io_uring: propagate locking state to poll cancel
  io_uring: mutex locked poll hashing

 include/uapi/linux/io_uring.h |   5 +-
 io_uring/cancel.c             |  23 +++-
 io_uring/cancel.h             |   4 +-
 io_uring/fdinfo.c             |  11 +-
 io_uring/io_uring.c           |  84 ++++++++-----
 io_uring/io_uring.h           |   5 -
 io_uring/io_uring_types.h     |  21 +++-
 io_uring/kbuf.c               |  33 +++++
 io_uring/kbuf.h               |  38 +-----
 io_uring/poll.c               | 225 +++++++++++++++++++++++++---------
 io_uring/poll.h               |   3 +-
 io_uring/rw.c                 |  41 +++----
 io_uring/tctx.c               |  27 +++-
 io_uring/tctx.h               |   4 +-
 io_uring/timeout.c            |   3 +-
 15 files changed, 353 insertions(+), 174 deletions(-)

-- 
2.36.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 01/16] io_uring: rw: delegate sync completions to core io_uring
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
@ 2022-06-16  9:21 ` Pavel Begunkov
  2022-06-16  9:21 ` [PATCH for-next v3 02/16] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:21 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

io_issue_sqe() from the io_uring core knows how to complete requests
based on the returned error code, we can delegate io_read()/io_write()
completion to it. Make kiocb_done() to return the right completion
code and propagate it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/rw.c | 41 +++++++++++++++++++----------------------
 1 file changed, 19 insertions(+), 22 deletions(-)

diff --git a/io_uring/rw.c b/io_uring/rw.c
index fa1063c738f8..818692a83d75 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -207,15 +207,6 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
 	return false;
 }
 
-static void __io_complete_rw(struct io_kiocb *req, long res,
-			     unsigned int issue_flags)
-{
-	if (__io_complete_rw_common(req, res))
-		return;
-	io_req_set_res(req, req->cqe.res, io_put_kbuf(req, issue_flags));
-	__io_req_complete(req, issue_flags);
-}
-
 static void io_complete_rw(struct kiocb *kiocb, long res)
 {
 	struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
@@ -247,7 +238,7 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
 	smp_store_release(&req->iopoll_completed, 1);
 }
 
-static void kiocb_done(struct io_kiocb *req, ssize_t ret,
+static int kiocb_done(struct io_kiocb *req, ssize_t ret,
 		       unsigned int issue_flags)
 {
 	struct io_async_rw *io = req->async_data;
@@ -263,10 +254,15 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret,
 
 	if (req->flags & REQ_F_CUR_POS)
 		req->file->f_pos = rw->kiocb.ki_pos;
-	if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw))
-		__io_complete_rw(req, ret, issue_flags);
-	else
+	if (ret >= 0 && (rw->kiocb.ki_complete == io_complete_rw)) {
+		if (!__io_complete_rw_common(req, ret)) {
+			io_req_set_res(req, req->cqe.res,
+				       io_put_kbuf(req, issue_flags));
+			return IOU_OK;
+		}
+	} else {
 		io_rw_done(&rw->kiocb, ret);
+	}
 
 	if (req->flags & REQ_F_REISSUE) {
 		req->flags &= ~REQ_F_REISSUE;
@@ -275,6 +271,7 @@ static void kiocb_done(struct io_kiocb *req, ssize_t ret,
 		else
 			io_req_task_queue_fail(req, ret);
 	}
+	return IOU_ISSUE_SKIP_COMPLETE;
 }
 
 static int __io_import_fixed(struct io_kiocb *req, int ddir,
@@ -846,7 +843,9 @@ int io_read(struct io_kiocb *req, unsigned int issue_flags)
 			goto done;
 		ret = 0;
 	} else if (ret == -EIOCBQUEUED) {
-		goto out_free;
+		if (iovec)
+			kfree(iovec);
+		return IOU_ISSUE_SKIP_COMPLETE;
 	} else if (ret == req->cqe.res || ret <= 0 || !force_nonblock ||
 		   (req->flags & REQ_F_NOWAIT) || !need_read_all(req)) {
 		/* read all, failed, already did sync or don't want to retry */
@@ -904,12 +903,10 @@ int io_read(struct io_kiocb *req, unsigned int issue_flags)
 		iov_iter_restore(&s->iter, &s->iter_state);
 	} while (ret > 0);
 done:
-	kiocb_done(req, ret, issue_flags);
-out_free:
 	/* it's faster to check here then delegate to kfree */
 	if (iovec)
 		kfree(iovec);
-	return IOU_ISSUE_SKIP_COMPLETE;
+	return kiocb_done(req, ret, issue_flags);
 }
 
 int io_write(struct io_kiocb *req, unsigned int issue_flags)
@@ -959,8 +956,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	ppos = io_kiocb_update_pos(req);
 
 	ret = rw_verify_area(WRITE, req->file, ppos, req->cqe.res);
-	if (unlikely(ret))
-		goto out_free;
+	if (unlikely(ret)) {
+		kfree(iovec);
+		return ret;
+	}
 
 	/*
 	 * Open-code file_start_write here to grab freeze protection,
@@ -1002,15 +1001,13 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 		if (ret2 == -EAGAIN && (req->ctx->flags & IORING_SETUP_IOPOLL))
 			goto copy_iov;
 done:
-		kiocb_done(req, ret2, issue_flags);
-		ret = IOU_ISSUE_SKIP_COMPLETE;
+		ret = kiocb_done(req, ret2, issue_flags);
 	} else {
 copy_iov:
 		iov_iter_restore(&s->iter, &s->iter_state);
 		ret = io_setup_async_rw(req, iovec, s, false);
 		return ret ?: -EAGAIN;
 	}
-out_free:
 	/* it's reportedly faster than delegating the null check to kfree() */
 	if (iovec)
 		kfree(iovec);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 02/16] io_uring: kill REQ_F_COMPLETE_INLINE
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
  2022-06-16  9:21 ` [PATCH for-next v3 01/16] io_uring: rw: delegate sync completions to core io_uring Pavel Begunkov
@ 2022-06-16  9:21 ` Pavel Begunkov
  2022-06-16  9:21 ` [PATCH for-next v3 03/16] io_uring: refactor io_req_task_complete() Pavel Begunkov
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:21 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

REQ_F_COMPLETE_INLINE is only needed to delay queueing into the
completion list to io_queue_sqe() as __io_req_complete() is inlined and
we don't want to bloat the kernel.

As now we complete in a more centralised fashion in io_issue_sqe() we
can get rid of the flag and queue to the list directly.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/io_uring.c       | 18 +++++++-----------
 io_uring/io_uring.h       |  5 -----
 io_uring/io_uring_types.h |  3 ---
 3 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index f1ecfdd0166e..02a70e7eb774 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -742,10 +742,7 @@ void io_req_complete_post(struct io_kiocb *req)
 
 inline void __io_req_complete(struct io_kiocb *req, unsigned issue_flags)
 {
-	if (issue_flags & IO_URING_F_COMPLETE_DEFER)
-		req->flags |= REQ_F_COMPLETE_INLINE;
-	else
-		io_req_complete_post(req);
+	io_req_complete_post(req);
 }
 
 void io_req_complete_failed(struct io_kiocb *req, s32 res)
@@ -1581,9 +1578,12 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (creds)
 		revert_creds(creds);
 
-	if (ret == IOU_OK)
-		__io_req_complete(req, issue_flags);
-	else if (ret != IOU_ISSUE_SKIP_COMPLETE)
+	if (ret == IOU_OK) {
+		if (issue_flags & IO_URING_F_COMPLETE_DEFER)
+			io_req_add_compl_list(req);
+		else
+			io_req_complete_post(req);
+	} else if (ret != IOU_ISSUE_SKIP_COMPLETE)
 		return ret;
 
 	/* If the op doesn't have a file, we're not polling for it */
@@ -1748,10 +1748,6 @@ static inline void io_queue_sqe(struct io_kiocb *req)
 
 	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
 
-	if (req->flags & REQ_F_COMPLETE_INLINE) {
-		io_req_add_compl_list(req);
-		return;
-	}
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
 	 * doesn't support non-blocking read/write attempts
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e30e639c2822..3f6cad3d356c 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -217,11 +217,6 @@ static inline bool io_run_task_work(void)
 	return false;
 }
 
-static inline void io_req_complete_state(struct io_kiocb *req)
-{
-	req->flags |= REQ_F_COMPLETE_INLINE;
-}
-
 static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
 {
 	if (!*locked) {
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index ef1cf86e8932..4576ea8cad2e 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -301,7 +301,6 @@ enum {
 	REQ_F_POLLED_BIT,
 	REQ_F_BUFFER_SELECTED_BIT,
 	REQ_F_BUFFER_RING_BIT,
-	REQ_F_COMPLETE_INLINE_BIT,
 	REQ_F_REISSUE_BIT,
 	REQ_F_CREDS_BIT,
 	REQ_F_REFCOUNT_BIT,
@@ -356,8 +355,6 @@ enum {
 	REQ_F_BUFFER_SELECTED	= BIT(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
 	REQ_F_BUFFER_RING	= BIT(REQ_F_BUFFER_RING_BIT),
-	/* completion is deferred through io_comp_state */
-	REQ_F_COMPLETE_INLINE	= BIT(REQ_F_COMPLETE_INLINE_BIT),
 	/* caller should reissue async */
 	REQ_F_REISSUE		= BIT(REQ_F_REISSUE_BIT),
 	/* supports async reads/writes */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 03/16] io_uring: refactor io_req_task_complete()
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
  2022-06-16  9:21 ` [PATCH for-next v3 01/16] io_uring: rw: delegate sync completions to core io_uring Pavel Begunkov
  2022-06-16  9:21 ` [PATCH for-next v3 02/16] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
@ 2022-06-16  9:21 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 04/16] io_uring: don't inline io_put_kbuf Pavel Begunkov
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:21 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Clean up io_req_task_complete() and deduplicate io_put_kbuf() calls.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/io_uring.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 02a70e7eb774..1bcd2a8ebd4c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1306,15 +1306,19 @@ static int io_iopoll_check(struct io_ring_ctx *ctx, long min)
 
 	return ret;
 }
-inline void io_req_task_complete(struct io_kiocb *req, bool *locked)
+
+void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
-	if (*locked) {
-		req->cqe.flags |= io_put_kbuf(req, 0);
+	if (req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)) {
+		unsigned issue_flags = *locked ? 0 : IO_URING_F_UNLOCKED;
+
+		req->cqe.flags |= io_put_kbuf(req, issue_flags);
+	}
+
+	if (*locked)
 		io_req_add_compl_list(req);
-	} else {
-		req->cqe.flags |= io_put_kbuf(req, IO_URING_F_UNLOCKED);
+	else
 		io_req_complete_post(req);
-	}
 }
 
 /*
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 04/16] io_uring: don't inline io_put_kbuf
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-06-16  9:21 ` [PATCH for-next v3 03/16] io_uring: refactor io_req_task_complete() Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 05/16] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

io_put_kbuf() is huge, don't bloat the kernel with inlining.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/kbuf.c | 33 +++++++++++++++++++++++++++++++++
 io_uring/kbuf.h | 38 ++++++--------------------------------
 2 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 5885343705bd..223d9db2ba94 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -82,6 +82,39 @@ static int io_buffer_add_list(struct io_ring_ctx *ctx,
 	return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL));
 }
 
+unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags)
+{
+	unsigned int cflags;
+
+	/*
+	 * We can add this buffer back to two lists:
+	 *
+	 * 1) The io_buffers_cache list. This one is protected by the
+	 *    ctx->uring_lock. If we already hold this lock, add back to this
+	 *    list as we can grab it from issue as well.
+	 * 2) The io_buffers_comp list. This one is protected by the
+	 *    ctx->completion_lock.
+	 *
+	 * We migrate buffers from the comp_list to the issue cache list
+	 * when we need one.
+	 */
+	if (req->flags & REQ_F_BUFFER_RING) {
+		/* no buffers to recycle for this case */
+		cflags = __io_put_kbuf_list(req, NULL);
+	} else if (issue_flags & IO_URING_F_UNLOCKED) {
+		struct io_ring_ctx *ctx = req->ctx;
+
+		spin_lock(&ctx->completion_lock);
+		cflags = __io_put_kbuf_list(req, &ctx->io_buffers_comp);
+		spin_unlock(&ctx->completion_lock);
+	} else {
+		lockdep_assert_held(&req->ctx->uring_lock);
+
+		cflags = __io_put_kbuf_list(req, &req->ctx->io_buffers_cache);
+	}
+	return cflags;
+}
+
 static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 					      struct io_buffer_list *bl)
 {
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 80b6df2c7535..5da3d4039aed 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -47,6 +47,8 @@ int io_provide_buffers(struct io_kiocb *req, unsigned int issue_flags);
 int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
 int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg);
 
+unsigned int __io_put_kbuf(struct io_kiocb *req, unsigned issue_flags);
+
 static inline bool io_do_buffer_select(struct io_kiocb *req)
 {
 	if (!(req->flags & REQ_F_BUFFER_SELECT))
@@ -70,7 +72,8 @@ static inline void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
 	__io_kbuf_recycle(req, issue_flags);
 }
 
-static unsigned int __io_put_kbuf(struct io_kiocb *req, struct list_head *list)
+static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
+					      struct list_head *list)
 {
 	if (req->flags & REQ_F_BUFFER_RING) {
 		if (req->buf_list)
@@ -90,44 +93,15 @@ static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
 
 	if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
 		return 0;
-	return __io_put_kbuf(req, &req->ctx->io_buffers_comp);
+	return __io_put_kbuf_list(req, &req->ctx->io_buffers_comp);
 }
 
 static inline unsigned int io_put_kbuf(struct io_kiocb *req,
 				       unsigned issue_flags)
 {
-	unsigned int cflags;
 
 	if (!(req->flags & (REQ_F_BUFFER_SELECTED|REQ_F_BUFFER_RING)))
 		return 0;
-
-	/*
-	 * We can add this buffer back to two lists:
-	 *
-	 * 1) The io_buffers_cache list. This one is protected by the
-	 *    ctx->uring_lock. If we already hold this lock, add back to this
-	 *    list as we can grab it from issue as well.
-	 * 2) The io_buffers_comp list. This one is protected by the
-	 *    ctx->completion_lock.
-	 *
-	 * We migrate buffers from the comp_list to the issue cache list
-	 * when we need one.
-	 */
-	if (req->flags & REQ_F_BUFFER_RING) {
-		/* no buffers to recycle for this case */
-		cflags = __io_put_kbuf(req, NULL);
-	} else if (issue_flags & IO_URING_F_UNLOCKED) {
-		struct io_ring_ctx *ctx = req->ctx;
-
-		spin_lock(&ctx->completion_lock);
-		cflags = __io_put_kbuf(req, &ctx->io_buffers_comp);
-		spin_unlock(&ctx->completion_lock);
-	} else {
-		lockdep_assert_held(&req->ctx->uring_lock);
-
-		cflags = __io_put_kbuf(req, &req->ctx->io_buffers_cache);
-	}
-
-	return cflags;
+	return __io_put_kbuf(req, issue_flags);
 }
 #endif
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 05/16] io_uring: poll: remove unnecessary req->ref set
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 04/16] io_uring: don't inline io_put_kbuf Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 06/16] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence, Hao Xu

From: Hao Xu <[email protected]>

We now don't need to set req->refcount for poll requests since the
reworked poll code ensures no request release race.

Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/poll.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 0df5eca93b16..73584c4e3e9b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -683,7 +683,6 @@ int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if ((flags & IORING_POLL_ADD_MULTI) && (req->flags & REQ_F_CQE_SKIP))
 		return -EINVAL;
 
-	io_req_set_refcount(req);
 	req->apoll_events = poll->events = io_poll_parse_events(sqe, flags);
 	return 0;
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 06/16] io_uring: switch cancel_hash to use per entry spinlock
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 05/16] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 07/16] io_uring: pass poll_find lock back Pavel Begunkov
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence, Hao Xu

From: Hao Xu <[email protected]>

Add a new io_hash_bucket structure so that each bucket in cancel_hash
has separate spinlock. Use per entry lock for cancel_hash, this removes
some completion lock invocation and remove contension between different
cancel_hash entries.

Signed-off-by: Hao Xu <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/cancel.c         | 14 ++++++-
 io_uring/cancel.h         |  6 +++
 io_uring/fdinfo.c         |  9 +++--
 io_uring/io_uring.c       |  9 +++--
 io_uring/io_uring_types.h |  2 +-
 io_uring/poll.c           | 80 ++++++++++++++++++++++++---------------
 6 files changed, 80 insertions(+), 40 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 83cceb52d82d..6f2888388a40 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -93,14 +93,14 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 	if (!ret)
 		return 0;
 
-	spin_lock(&ctx->completion_lock);
 	ret = io_poll_cancel(ctx, cd);
 	if (ret != -ENOENT)
 		goto out;
+	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);
-out:
 	spin_unlock(&ctx->completion_lock);
+out:
 	return ret;
 }
 
@@ -192,3 +192,13 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
+
+void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
+{
+	unsigned int i;
+
+	for (i = 0; i < size; i++) {
+		spin_lock_init(&hash_table[i].lock);
+		INIT_HLIST_HEAD(&hash_table[i].list);
+	}
+}
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 4f35d8696325..556a7dcf160e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,3 +4,9 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
+
+struct io_hash_bucket {
+	spinlock_t		lock;
+	struct hlist_head	list;
+} ____cacheline_aligned_in_smp;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index fcedde4b4b1e..f941c73f5502 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -13,6 +13,7 @@
 #include "io_uring.h"
 #include "sqpoll.h"
 #include "fdinfo.h"
+#include "cancel.h"
 
 #ifdef CONFIG_PROC_FS
 static __cold int io_uring_show_cred(struct seq_file *m, unsigned int id,
@@ -157,17 +158,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 		mutex_unlock(&ctx->uring_lock);
 
 	seq_puts(m, "PollList:\n");
-	spin_lock(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct hlist_head *list = &ctx->cancel_hash[i];
+		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 		struct io_kiocb *req;
 
-		hlist_for_each_entry(req, list, hash_node)
+		spin_lock(&hb->lock);
+		hlist_for_each_entry(req, &hb->list, hash_node)
 			seq_printf(m, "  op=%d, task_works=%d\n", req->opcode,
 					task_work_pending(req->task));
+		spin_unlock(&hb->lock);
 	}
 
 	seq_puts(m, "CqOverflowList:\n");
+	spin_lock(&ctx->completion_lock);
 	list_for_each_entry(ocqe, &ctx->cq_overflow_list, list) {
 		struct io_uring_cqe *cqe = &ocqe->cqe;
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1bcd2a8ebd4c..d0242e5c8d0a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -89,6 +89,7 @@
 #include "fdinfo.h"
 #include "kbuf.h"
 #include "rsrc.h"
+#include "cancel.h"
 
 #include "timeout.h"
 #include "poll.h"
@@ -260,11 +261,13 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	if (hash_bits <= 0)
 		hash_bits = 1;
 	ctx->cancel_hash_bits = hash_bits;
-	ctx->cancel_hash = kmalloc((1U << hash_bits) * sizeof(struct hlist_head),
-					GFP_KERNEL);
+	ctx->cancel_hash =
+		kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
+			GFP_KERNEL);
 	if (!ctx->cancel_hash)
 		goto err;
-	__hash_init(ctx->cancel_hash, 1U << hash_bits);
+
+	init_hash_table(ctx->cancel_hash, 1U << hash_bits);
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 4576ea8cad2e..1f8db2dd7af7 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -224,7 +224,7 @@ struct io_ring_ctx {
 		 * manipulate the list, hence no extra locking is needed there.
 		 */
 		struct io_wq_work_list	iopoll_list;
-		struct hlist_head	*cancel_hash;
+		struct io_hash_bucket	*cancel_hash;
 		unsigned		cancel_hash_bits;
 		bool			poll_multi_queue;
 
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 73584c4e3e9b..7f6b16f687b0 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -19,6 +19,7 @@
 #include "opdef.h"
 #include "kbuf.h"
 #include "poll.h"
+#include "cancel.h"
 
 struct io_poll_update {
 	struct file			*file;
@@ -73,10 +74,22 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 static void io_poll_req_insert(struct io_kiocb *req)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	struct hlist_head *list;
+	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
 
-	list = &ctx->cancel_hash[hash_long(req->cqe.user_data, ctx->cancel_hash_bits)];
-	hlist_add_head(&req->hash_node, list);
+	spin_lock(&hb->lock);
+	hlist_add_head(&req->hash_node, &hb->list);
+	spin_unlock(&hb->lock);
+}
+
+static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
+{
+	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+	spinlock_t *lock = &ctx->cancel_hash[index].lock;
+
+	spin_lock(lock);
+	hash_del(&req->hash_node);
+	spin_unlock(lock);
 }
 
 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
@@ -220,8 +233,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	}
 
 	io_poll_remove_entries(req);
+	io_poll_req_delete(req, ctx);
 	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
 	req->cqe.flags = 0;
 	__io_req_complete_post(req);
 	io_commit_cqring(ctx);
@@ -231,7 +244,6 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 
 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
@@ -239,9 +251,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 		return;
 
 	io_poll_remove_entries(req);
-	spin_lock(&ctx->completion_lock);
-	hash_del(&req->hash_node);
-	spin_unlock(&ctx->completion_lock);
+	io_poll_req_delete(req, req->ctx);
 
 	if (!ret)
 		io_req_task_submit(req, locked);
@@ -435,9 +445,7 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 		return 0;
 	}
 
-	spin_lock(&ctx->completion_lock);
 	io_poll_req_insert(req);
-	spin_unlock(&ctx->completion_lock);
 
 	if (mask && (poll->events & EPOLLET)) {
 		/* can't multishot if failed, just queue the event we've got */
@@ -534,32 +542,31 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	bool found = false;
 	int i;
 
-	spin_lock(&ctx->completion_lock);
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct hlist_head *list;
+		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 
-		list = &ctx->cancel_hash[i];
-		hlist_for_each_entry_safe(req, tmp, list, hash_node) {
+		spin_lock(&hb->lock);
+		hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
 			if (io_match_task_safe(req, tsk, cancel_all)) {
 				hlist_del_init(&req->hash_node);
 				io_poll_cancel_req(req);
 				found = true;
 			}
 		}
+		spin_unlock(&hb->lock);
 	}
-	spin_unlock(&ctx->completion_lock);
 	return found;
 }
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
-	struct hlist_head *list;
 	struct io_kiocb *req;
+	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
+	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
 
-	list = &ctx->cancel_hash[hash_long(cd->data, ctx->cancel_hash_bits)];
-	hlist_for_each_entry(req, list, hash_node) {
+	spin_lock(&hb->lock);
+	hlist_for_each_entry(req, &hb->list, hash_node) {
 		if (cd->data != req->cqe.user_data)
 			continue;
 		if (poll_only && req->opcode != IORING_OP_POLL_ADD)
@@ -571,21 +578,21 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 		}
 		return req;
 	}
+	spin_unlock(&hb->lock);
 	return NULL;
 }
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct io_kiocb *req;
 	int i;
 
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct hlist_head *list;
+		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 
-		list = &ctx->cancel_hash[i];
-		hlist_for_each_entry(req, list, hash_node) {
+		spin_lock(&hb->lock);
+		hlist_for_each_entry(req, &hb->list, hash_node) {
 			if (!(cd->flags & IORING_ASYNC_CANCEL_ANY) &&
 			    req->file != cd->file)
 				continue;
@@ -594,12 +601,12 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 			req->work.cancel_seq = cd->seq;
 			return req;
 		}
+		spin_unlock(&hb->lock);
 	}
 	return NULL;
 }
 
 static bool io_poll_disarm(struct io_kiocb *req)
-	__must_hold(&ctx->completion_lock)
 {
 	if (!io_poll_get_ownership(req))
 		return false;
@@ -609,17 +616,23 @@ static bool io_poll_disarm(struct io_kiocb *req)
 }
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
-	__must_hold(&ctx->completion_lock)
 {
 	struct io_kiocb *req;
+	u32 index;
+	spinlock_t *lock;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
 		req = io_poll_file_find(ctx, cd);
 	else
 		req = io_poll_find(ctx, false, cd);
-	if (!req)
+	if (!req) {
 		return -ENOENT;
+	} else {
+		index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
+		lock = &ctx->cancel_hash[index].lock;
+	}
 	io_poll_cancel_req(req);
+	spin_unlock(lock);
 	return 0;
 }
 
@@ -713,18 +726,23 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_poll_update *poll_update = io_kiocb_to_cmd(req);
 	struct io_cancel_data cd = { .data = poll_update->old_user_data, };
 	struct io_ring_ctx *ctx = req->ctx;
+	u32 index = hash_long(cd.data, ctx->cancel_hash_bits);
+	spinlock_t *lock = &ctx->cancel_hash[index].lock;
 	struct io_kiocb *preq;
 	int ret2, ret = 0;
 	bool locked;
 
-	spin_lock(&ctx->completion_lock);
 	preq = io_poll_find(ctx, true, &cd);
-	if (!preq || !io_poll_disarm(preq)) {
-		spin_unlock(&ctx->completion_lock);
-		ret = preq ? -EALREADY : -ENOENT;
+	if (!preq) {
+		ret = -ENOENT;
+		goto out;
+	}
+	ret2 = io_poll_disarm(preq);
+	spin_unlock(lock);
+	if (!ret2) {
+		ret = -EALREADY;
 		goto out;
 	}
-	spin_unlock(&ctx->completion_lock);
 
 	if (poll_update->update_events || poll_update->update_user_data) {
 		/* only mask one event flags, keep behavior flags */
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 07/16] io_uring: pass poll_find lock back
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 06/16] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 08/16] io_uring: clean up io_try_cancel Pavel Begunkov
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Instead of using implicit knowledge of what is locked or not after
io_poll_find() and co returns, pass back a pointer to the locked
bucket if any. If set the user must to unlock the spinlock.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/poll.c | 46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7f6b16f687b0..7fc4aafcca95 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -559,12 +559,15 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 }
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
-				     struct io_cancel_data *cd)
+				     struct io_cancel_data *cd,
+				     struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
 	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
 	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
 
+	*out_bucket = NULL;
+
 	spin_lock(&hb->lock);
 	hlist_for_each_entry(req, &hb->list, hash_node) {
 		if (cd->data != req->cqe.user_data)
@@ -576,6 +579,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				continue;
 			req->work.cancel_seq = cd->seq;
 		}
+		*out_bucket = hb;
 		return req;
 	}
 	spin_unlock(&hb->lock);
@@ -583,11 +587,14 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 }
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
-					  struct io_cancel_data *cd)
+					  struct io_cancel_data *cd,
+					  struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
 	int i;
 
+	*out_bucket = NULL;
+
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
 		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
 
@@ -599,6 +606,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 			if (cd->seq == req->work.cancel_seq)
 				continue;
 			req->work.cancel_seq = cd->seq;
+			*out_bucket = hb;
 			return req;
 		}
 		spin_unlock(&hb->lock);
@@ -617,23 +625,19 @@ static bool io_poll_disarm(struct io_kiocb *req)
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 {
+	struct io_hash_bucket *bucket;
 	struct io_kiocb *req;
-	u32 index;
-	spinlock_t *lock;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
-		req = io_poll_file_find(ctx, cd);
+		req = io_poll_file_find(ctx, cd, &bucket);
 	else
-		req = io_poll_find(ctx, false, cd);
-	if (!req) {
-		return -ENOENT;
-	} else {
-		index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
-		lock = &ctx->cancel_hash[index].lock;
-	}
-	io_poll_cancel_req(req);
-	spin_unlock(lock);
-	return 0;
+		req = io_poll_find(ctx, false, cd, &bucket);
+
+	if (req)
+		io_poll_cancel_req(req);
+	if (bucket)
+		spin_unlock(&bucket->lock);
+	return req ? 0 : -ENOENT;
 }
 
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -726,19 +730,21 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_poll_update *poll_update = io_kiocb_to_cmd(req);
 	struct io_cancel_data cd = { .data = poll_update->old_user_data, };
 	struct io_ring_ctx *ctx = req->ctx;
-	u32 index = hash_long(cd.data, ctx->cancel_hash_bits);
-	spinlock_t *lock = &ctx->cancel_hash[index].lock;
+	struct io_hash_bucket *bucket;
 	struct io_kiocb *preq;
 	int ret2, ret = 0;
 	bool locked;
 
-	preq = io_poll_find(ctx, true, &cd);
+	preq = io_poll_find(ctx, true, &cd, &bucket);
+	if (preq)
+		ret2 = io_poll_disarm(preq);
+	if (bucket)
+		spin_unlock(&bucket->lock);
+
 	if (!preq) {
 		ret = -ENOENT;
 		goto out;
 	}
-	ret2 = io_poll_disarm(preq);
-	spin_unlock(lock);
 	if (!ret2) {
 		ret = -EALREADY;
 		goto out;
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 08/16] io_uring: clean up io_try_cancel
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 07/16] io_uring: pass poll_find lock back Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 09/16] io_uring: limit the number of cancellation buckets Pavel Begunkov
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Get rid of an unnecessary extra goto in io_try_cancel() and simplify the
function.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/cancel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 6f2888388a40..a253e2ad22eb 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -95,12 +95,12 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 
 	ret = io_poll_cancel(ctx, cd);
 	if (ret != -ENOENT)
-		goto out;
+		return ret;
+
 	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);
 	spin_unlock(&ctx->completion_lock);
-out:
 	return ret;
 }
 
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 09/16] io_uring: limit the number of cancellation buckets
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 08/16] io_uring: clean up io_try_cancel Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 10/16] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Don't allocate to many hash/cancellation buckets, there might be too
many, clamp it to 8 bits, or 256 * 64B = 16KB. We don't usually have too
many requests, and 256 buckets should be enough, especially since we
do hash search only in the cancellation path.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/io_uring.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index d0242e5c8d0a..97113c71e881 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -254,12 +254,12 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 
 	/*
 	 * Use 5 bits less than the max cq entries, that should give us around
-	 * 32 entries per hash list if totally full and uniformly spread.
+	 * 32 entries per hash list if totally full and uniformly spread, but
+	 * don't keep too many buckets to not overconsume memory.
 	 */
-	hash_bits = ilog2(p->cq_entries);
-	hash_bits -= 5;
-	if (hash_bits <= 0)
-		hash_bits = 1;
+	hash_bits = ilog2(p->cq_entries) - 5;
+	hash_bits = clamp(hash_bits, 1, 8);
+
 	ctx->cancel_hash_bits = hash_bits;
 	ctx->cancel_hash =
 		kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 10/16] io_uring: clean up io_ring_ctx_alloc
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 09/16] io_uring: limit the number of cancellation buckets Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 11/16] io_uring: use state completion infra for poll reqs Pavel Begunkov
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Add a variable for the number of hash buckets in io_ring_ctx_alloc(),
makes it more readable.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/io_uring.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 97113c71e881..b2bd71cb2be6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -244,6 +244,8 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 {
 	struct io_ring_ctx *ctx;
+	unsigned hash_buckets;
+	size_t hash_size;
 	int hash_bits;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -259,15 +261,15 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	 */
 	hash_bits = ilog2(p->cq_entries) - 5;
 	hash_bits = clamp(hash_bits, 1, 8);
+	hash_buckets = 1U << hash_bits;
+	hash_size = hash_buckets * sizeof(struct io_hash_bucket);
 
 	ctx->cancel_hash_bits = hash_bits;
-	ctx->cancel_hash =
-		kmalloc((1U << hash_bits) * sizeof(struct io_hash_bucket),
-			GFP_KERNEL);
+	ctx->cancel_hash = kmalloc(hash_size, GFP_KERNEL);
 	if (!ctx->cancel_hash)
 		goto err;
 
-	init_hash_table(ctx->cancel_hash, 1U << hash_bits);
+	init_hash_table(ctx->cancel_hash, hash_buckets);
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 11/16] io_uring: use state completion infra for poll reqs
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 10/16] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 12/16] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Use io_req_task_complete() for poll request completions, so it can
utilise state completions and save lots of unnecessary locking.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/poll.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 7fc4aafcca95..c4ce98504986 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -234,12 +234,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 
 	io_poll_remove_entries(req);
 	io_poll_req_delete(req, ctx);
-	spin_lock(&ctx->completion_lock);
-	req->cqe.flags = 0;
-	__io_req_complete_post(req);
-	io_commit_cqring(ctx);
-	spin_unlock(&ctx->completion_lock);
-	io_cqring_ev_posted(ctx);
+	io_req_set_res(req, req->cqe.res, 0);
+	io_req_task_complete(req, locked);
 }
 
 static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 12/16] io_uring: add IORING_SETUP_SINGLE_ISSUER
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (10 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 11/16] io_uring: use state completion infra for poll reqs Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 13/16] io_uring: pass hash table into poll_find Pavel Begunkov
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Add a new IORING_SETUP_SINGLE_ISSUER flag and the userspace visible part
of it, i.e. put limitations of submitters. Also, don't allow it together
with IOPOLL as we're not going to put it to good use.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/uapi/linux/io_uring.h |  5 ++++-
 io_uring/io_uring.c           |  7 +++++--
 io_uring/io_uring_types.h     |  1 +
 io_uring/tctx.c               | 27 ++++++++++++++++++++++++---
 io_uring/tctx.h               |  4 ++--
 5 files changed, 36 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1d176f935f5d..8715f0942ec2 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -137,9 +137,12 @@ enum {
  * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
  */
 #define IORING_SETUP_TASKRUN_FLAG	(1U << 9)
-
 #define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */
 #define IORING_SETUP_CQE32		(1U << 11) /* CQEs are 32 byte */
+/*
+ * Only one task is allowed to submit requests
+ */
+#define IORING_SETUP_SINGLE_ISSUER	(1U << 12)
 
 enum io_uring_op {
 	IORING_OP_NOP,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b2bd71cb2be6..1308a1f0c349 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2469,6 +2469,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_destroy_buffers(ctx);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
+	if (ctx->submitter_task)
+		put_task_struct(ctx->submitter_task);
 
 	/* there are no registered resources left, nobody uses it */
 	if (ctx->rsrc_node)
@@ -3201,7 +3203,7 @@ static int io_uring_install_fd(struct io_ring_ctx *ctx, struct file *file)
 	if (fd < 0)
 		return fd;
 
-	ret = io_uring_add_tctx_node(ctx);
+	ret = __io_uring_add_tctx_node(ctx, false);
 	if (ret) {
 		put_unused_fd(fd);
 		return ret;
@@ -3421,7 +3423,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_CLAMP | IORING_SETUP_ATTACH_WQ |
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
-			IORING_SETUP_SQE128 | IORING_SETUP_CQE32))
+			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
+			IORING_SETUP_SINGLE_ISSUER))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 1f8db2dd7af7..8b00243abf65 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -243,6 +243,7 @@ struct io_ring_ctx {
 	/* Keep this last, we don't need it for the fast path */
 
 	struct io_restriction		restrictions;
+	struct task_struct		*submitter_task;
 
 	/* slow path rsrc auxilary data, used by update/register */
 	struct io_rsrc_node		*rsrc_backup_node;
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 6adf659687f8..012be261dc50 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -81,12 +81,32 @@ __cold int io_uring_alloc_task_context(struct task_struct *task,
 	return 0;
 }
 
-int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
+static int io_register_submitter(struct io_ring_ctx *ctx)
+{
+	int ret = 0;
+
+	mutex_lock(&ctx->uring_lock);
+	if (!ctx->submitter_task)
+		ctx->submitter_task = get_task_struct(current);
+	else if (ctx->submitter_task != current)
+		ret = -EEXIST;
+	mutex_unlock(&ctx->uring_lock);
+
+	return ret;
+}
+
+int __io_uring_add_tctx_node(struct io_ring_ctx *ctx, bool submitter)
 {
 	struct io_uring_task *tctx = current->io_uring;
 	struct io_tctx_node *node;
 	int ret;
 
+	if ((ctx->flags & IORING_SETUP_SINGLE_ISSUER) && submitter) {
+		ret = io_register_submitter(ctx);
+		if (ret)
+			return ret;
+	}
+
 	if (unlikely(!tctx)) {
 		ret = io_uring_alloc_task_context(current, ctx);
 		if (unlikely(ret))
@@ -120,7 +140,8 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx)
 		list_add(&node->ctx_node, &ctx->tctx_list);
 		mutex_unlock(&ctx->uring_lock);
 	}
-	tctx->last = ctx;
+	if (submitter)
+		tctx->last = ctx;
 	return 0;
 }
 
@@ -228,7 +249,7 @@ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg,
 		return -EINVAL;
 
 	mutex_unlock(&ctx->uring_lock);
-	ret = io_uring_add_tctx_node(ctx);
+	ret = __io_uring_add_tctx_node(ctx, false);
 	mutex_lock(&ctx->uring_lock);
 	if (ret)
 		return ret;
diff --git a/io_uring/tctx.h b/io_uring/tctx.h
index 7684713e950f..dde82ce4d8e2 100644
--- a/io_uring/tctx.h
+++ b/io_uring/tctx.h
@@ -34,7 +34,7 @@ struct io_tctx_node {
 int io_uring_alloc_task_context(struct task_struct *task,
 				struct io_ring_ctx *ctx);
 void io_uring_del_tctx_node(unsigned long index);
-int __io_uring_add_tctx_node(struct io_ring_ctx *ctx);
+int __io_uring_add_tctx_node(struct io_ring_ctx *ctx, bool submitter);
 void io_uring_clean_tctx(struct io_uring_task *tctx);
 
 void io_uring_unreg_ringfd(void);
@@ -52,5 +52,5 @@ static inline int io_uring_add_tctx_node(struct io_ring_ctx *ctx)
 
 	if (likely(tctx && tctx->last == ctx))
 		return 0;
-	return __io_uring_add_tctx_node(ctx);
+	return __io_uring_add_tctx_node(ctx, true);
 }
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 13/16] io_uring: pass hash table into poll_find
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (11 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 12/16] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 14/16] io_uring: introduce a struct for hash table Pavel Begunkov
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

In preparation for having multiple cancellation hash tables, pass a
table pointer into io_poll_find() and other poll cancel functions.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/poll.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index c4ce98504986..5cc03be365e3 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -556,11 +556,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd,
+				     struct io_hash_bucket hash_table[],
 				     struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
 	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
-	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+	struct io_hash_bucket *hb = &hash_table[index];
 
 	*out_bucket = NULL;
 
@@ -584,6 +585,7 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd,
+					  struct io_hash_bucket hash_table[],
 					  struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
@@ -592,7 +594,7 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 	*out_bucket = NULL;
 
 	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+		struct io_hash_bucket *hb = &hash_table[i];
 
 		spin_lock(&hb->lock);
 		hlist_for_each_entry(req, &hb->list, hash_node) {
@@ -619,15 +621,16 @@ static bool io_poll_disarm(struct io_kiocb *req)
 	return true;
 }
 
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+			    struct io_hash_bucket hash_table[])
 {
 	struct io_hash_bucket *bucket;
 	struct io_kiocb *req;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
-		req = io_poll_file_find(ctx, cd, &bucket);
+		req = io_poll_file_find(ctx, cd, ctx->cancel_hash, &bucket);
 	else
-		req = io_poll_find(ctx, false, cd, &bucket);
+		req = io_poll_find(ctx, false, cd, ctx->cancel_hash, &bucket);
 
 	if (req)
 		io_poll_cancel_req(req);
@@ -636,6 +639,11 @@ int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 	return req ? 0 : -ENOENT;
 }
 
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+{
+	return __io_poll_cancel(ctx, cd, ctx->cancel_hash);
+}
+
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
 				     unsigned int flags)
 {
@@ -731,7 +739,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	int ret2, ret = 0;
 	bool locked;
 
-	preq = io_poll_find(ctx, true, &cd, &bucket);
+	preq = io_poll_find(ctx, true, &cd, ctx->cancel_hash, &bucket);
 	if (preq)
 		ret2 = io_poll_disarm(preq);
 	if (bucket)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 14/16] io_uring: introduce a struct for hash table
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (12 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 13/16] io_uring: pass hash table into poll_find Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 15/16] io_uring: propagate locking state to poll cancel Pavel Begunkov
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Instead of passing around a pointer to hash buckets, add a bit of type
safety and wrap it into a structure.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/cancel.c         |  6 +++---
 io_uring/cancel.h         |  7 +------
 io_uring/fdinfo.c         |  4 ++--
 io_uring/io_uring.c       | 29 ++++++++++++++++------------
 io_uring/io_uring_types.h | 13 +++++++++++--
 io_uring/poll.c           | 40 +++++++++++++++++++++------------------
 6 files changed, 56 insertions(+), 43 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index a253e2ad22eb..f28f0a7d1272 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -193,12 +193,12 @@ int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
-void init_hash_table(struct io_hash_bucket *hash_table, unsigned size)
+void init_hash_table(struct io_hash_table *table, unsigned size)
 {
 	unsigned int i;
 
 	for (i = 0; i < size; i++) {
-		spin_lock_init(&hash_table[i].lock);
-		INIT_HLIST_HEAD(&hash_table[i].list);
+		spin_lock_init(&table->hbs[i].lock);
+		INIT_HLIST_HEAD(&table->hbs[i].list);
 	}
 }
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index 556a7dcf160e..fd4cb1a2595d 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -4,9 +4,4 @@ int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
 int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
-void init_hash_table(struct io_hash_bucket *hash_table, unsigned size);
-
-struct io_hash_bucket {
-	spinlock_t		lock;
-	struct hlist_head	list;
-} ____cacheline_aligned_in_smp;
+void init_hash_table(struct io_hash_table *table, unsigned size);
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f941c73f5502..344e7d90d557 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -158,8 +158,8 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 		mutex_unlock(&ctx->uring_lock);
 
 	seq_puts(m, "PollList:\n");
-	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+	for (i = 0; i < (1U << ctx->cancel_table.hash_bits); i++) {
+		struct io_hash_bucket *hb = &ctx->cancel_table.hbs[i];
 		struct io_kiocb *req;
 
 		spin_lock(&hb->lock);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 1308a1f0c349..46db51641069 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -241,11 +241,23 @@ static __cold void io_fallback_req_func(struct work_struct *work)
 	percpu_ref_put(&ctx->refs);
 }
 
+static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits)
+{
+	unsigned hash_buckets = 1U << bits;
+	size_t hash_size = hash_buckets * sizeof(table->hbs[0]);
+
+	table->hbs = kmalloc(hash_size, GFP_KERNEL);
+	if (!table->hbs)
+		return -ENOMEM;
+
+	table->hash_bits = bits;
+	init_hash_table(table, hash_buckets);
+	return 0;
+}
+
 static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 {
 	struct io_ring_ctx *ctx;
-	unsigned hash_buckets;
-	size_t hash_size;
 	int hash_bits;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -261,16 +273,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	 */
 	hash_bits = ilog2(p->cq_entries) - 5;
 	hash_bits = clamp(hash_bits, 1, 8);
-	hash_buckets = 1U << hash_bits;
-	hash_size = hash_buckets * sizeof(struct io_hash_bucket);
-
-	ctx->cancel_hash_bits = hash_bits;
-	ctx->cancel_hash = kmalloc(hash_size, GFP_KERNEL);
-	if (!ctx->cancel_hash)
+	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
 		goto err;
 
-	init_hash_table(ctx->cancel_hash, hash_buckets);
-
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
 		goto err;
@@ -311,7 +316,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	return ctx;
 err:
 	kfree(ctx->dummy_ubuf);
-	kfree(ctx->cancel_hash);
+	kfree(ctx->cancel_table.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
@@ -2499,7 +2504,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_req_caches_free(ctx);
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
-	kfree(ctx->cancel_hash);
+	kfree(ctx->cancel_table.hbs);
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index 8b00243abf65..d3b9bde9c702 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -9,6 +9,16 @@
 #include "io-wq.h"
 #include "filetable.h"
 
+struct io_hash_bucket {
+	spinlock_t		lock;
+	struct hlist_head	list;
+} ____cacheline_aligned_in_smp;
+
+struct io_hash_table {
+	struct io_hash_bucket	*hbs;
+	unsigned		hash_bits;
+};
+
 struct io_uring {
 	u32 head ____cacheline_aligned_in_smp;
 	u32 tail ____cacheline_aligned_in_smp;
@@ -224,8 +234,7 @@ struct io_ring_ctx {
 		 * manipulate the list, hence no extra locking is needed there.
 		 */
 		struct io_wq_work_list	iopoll_list;
-		struct io_hash_bucket	*cancel_hash;
-		unsigned		cancel_hash_bits;
+		struct io_hash_table	cancel_table;
 		bool			poll_multi_queue;
 
 		struct list_head	io_buffers_comp;
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 5cc03be365e3..9c7793f5e93b 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -73,9 +73,9 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 
 static void io_poll_req_insert(struct io_kiocb *req)
 {
-	struct io_ring_ctx *ctx = req->ctx;
-	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
-	struct io_hash_bucket *hb = &ctx->cancel_hash[index];
+	struct io_hash_table *table = &req->ctx->cancel_table;
+	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+	struct io_hash_bucket *hb = &table->hbs[index];
 
 	spin_lock(&hb->lock);
 	hlist_add_head(&req->hash_node, &hb->list);
@@ -84,8 +84,9 @@ static void io_poll_req_insert(struct io_kiocb *req)
 
 static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
 {
-	u32 index = hash_long(req->cqe.user_data, ctx->cancel_hash_bits);
-	spinlock_t *lock = &ctx->cancel_hash[index].lock;
+	struct io_hash_table *table = &req->ctx->cancel_table;
+	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+	spinlock_t *lock = &table->hbs[index].lock;
 
 	spin_lock(lock);
 	hash_del(&req->hash_node);
@@ -533,13 +534,15 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 			       bool cancel_all)
 {
+	struct io_hash_table *table = &ctx->cancel_table;
+	unsigned nr_buckets = 1U << table->hash_bits;
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
 	bool found = false;
 	int i;
 
-	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &ctx->cancel_hash[i];
+	for (i = 0; i < nr_buckets; i++) {
+		struct io_hash_bucket *hb = &table->hbs[i];
 
 		spin_lock(&hb->lock);
 		hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) {
@@ -556,12 +559,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd,
-				     struct io_hash_bucket hash_table[],
+				     struct io_hash_table *table,
 				     struct io_hash_bucket **out_bucket)
 {
 	struct io_kiocb *req;
-	u32 index = hash_long(cd->data, ctx->cancel_hash_bits);
-	struct io_hash_bucket *hb = &hash_table[index];
+	u32 index = hash_long(cd->data, table->hash_bits);
+	struct io_hash_bucket *hb = &table->hbs[index];
 
 	*out_bucket = NULL;
 
@@ -585,16 +588,17 @@ static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 
 static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 					  struct io_cancel_data *cd,
-					  struct io_hash_bucket hash_table[],
+					  struct io_hash_table *table,
 					  struct io_hash_bucket **out_bucket)
 {
+	unsigned nr_buckets = 1U << table->hash_bits;
 	struct io_kiocb *req;
 	int i;
 
 	*out_bucket = NULL;
 
-	for (i = 0; i < (1U << ctx->cancel_hash_bits); i++) {
-		struct io_hash_bucket *hb = &hash_table[i];
+	for (i = 0; i < nr_buckets; i++) {
+		struct io_hash_bucket *hb = &table->hbs[i];
 
 		spin_lock(&hb->lock);
 		hlist_for_each_entry(req, &hb->list, hash_node) {
@@ -622,15 +626,15 @@ static bool io_poll_disarm(struct io_kiocb *req)
 }
 
 static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
-			    struct io_hash_bucket hash_table[])
+			    struct io_hash_table *table)
 {
 	struct io_hash_bucket *bucket;
 	struct io_kiocb *req;
 
 	if (cd->flags & (IORING_ASYNC_CANCEL_FD|IORING_ASYNC_CANCEL_ANY))
-		req = io_poll_file_find(ctx, cd, ctx->cancel_hash, &bucket);
+		req = io_poll_file_find(ctx, cd, table, &bucket);
 	else
-		req = io_poll_find(ctx, false, cd, ctx->cancel_hash, &bucket);
+		req = io_poll_find(ctx, false, cd, table, &bucket);
 
 	if (req)
 		io_poll_cancel_req(req);
@@ -641,7 +645,7 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 {
-	return __io_poll_cancel(ctx, cd, ctx->cancel_hash);
+	return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
 }
 
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -739,7 +743,7 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	int ret2, ret = 0;
 	bool locked;
 
-	preq = io_poll_find(ctx, true, &cd, ctx->cancel_hash, &bucket);
+	preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
 	if (preq)
 		ret2 = io_poll_disarm(preq);
 	if (bucket)
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 15/16] io_uring: propagate locking state to poll cancel
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (13 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 14/16] io_uring: introduce a struct for hash table Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-16  9:22 ` [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing Pavel Begunkov
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Poll cancellation will be soon need to grab ->uring_lock inside, pass
the locking state, i.e. issue_flags, inside the cancellation functions.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/cancel.c  | 7 ++++---
 io_uring/cancel.h  | 3 ++-
 io_uring/poll.c    | 3 ++-
 io_uring/poll.h    | 3 ++-
 io_uring/timeout.c | 3 ++-
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index f28f0a7d1272..f07bfd27c98a 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -78,7 +78,8 @@ static int io_async_cancel_one(struct io_uring_task *tctx,
 	return ret;
 }
 
-int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
+int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd,
+		  unsigned issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
@@ -93,7 +94,7 @@ int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd)
 	if (!ret)
 		return 0;
 
-	ret = io_poll_cancel(ctx, cd);
+	ret = io_poll_cancel(ctx, cd, issue_flags);
 	if (ret != -ENOENT)
 		return ret;
 
@@ -136,7 +137,7 @@ static int __io_async_cancel(struct io_cancel_data *cd, struct io_kiocb *req,
 	int ret, nr = 0;
 
 	do {
-		ret = io_try_cancel(req, cd);
+		ret = io_try_cancel(req, cd, issue_flags);
 		if (ret == -ENOENT)
 			break;
 		if (!all)
diff --git a/io_uring/cancel.h b/io_uring/cancel.h
index fd4cb1a2595d..8dd259dc383e 100644
--- a/io_uring/cancel.h
+++ b/io_uring/cancel.h
@@ -3,5 +3,6 @@
 int io_async_cancel_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags);
 
-int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd);
+int io_try_cancel(struct io_kiocb *req, struct io_cancel_data *cd,
+		  unsigned int issue_flags);
 void init_hash_table(struct io_hash_table *table, unsigned size);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 9c7793f5e93b..07157da1c2cb 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -643,7 +643,8 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 	return req ? 0 : -ENOENT;
 }
 
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+		   unsigned issue_flags)
 {
 	return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
 }
diff --git a/io_uring/poll.h b/io_uring/poll.h
index cc75c1567a84..fa3e19790281 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -24,7 +24,8 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags);
 int io_poll_remove_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags);
 
-int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd);
+int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+		   unsigned issue_flags);
 int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags);
 bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 			bool cancel_all);
diff --git a/io_uring/timeout.c b/io_uring/timeout.c
index 69cca42d6835..526fc8b2e3b6 100644
--- a/io_uring/timeout.c
+++ b/io_uring/timeout.c
@@ -262,6 +262,7 @@ int io_timeout_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd)
 
 static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
 {
+	unsigned issue_flags = *locked ? 0 : IO_URING_F_UNLOCKED;
 	struct io_timeout *timeout = io_kiocb_to_cmd(req);
 	struct io_kiocb *prev = timeout->prev;
 	int ret = -ENOENT;
@@ -273,7 +274,7 @@ static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
 				.data		= prev->cqe.user_data,
 			};
 
-			ret = io_try_cancel(req, &cd);
+			ret = io_try_cancel(req, &cd, issue_flags);
 		}
 		io_req_set_res(req, ret ?: -ETIME, 0);
 		io_req_complete_post(req);
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (14 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 15/16] io_uring: propagate locking state to poll cancel Pavel Begunkov
@ 2022-06-16  9:22 ` Pavel Begunkov
  2022-06-17 15:35   ` Nathan Chancellor
  2022-06-16 13:18 ` [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Jens Axboe
  2022-06-16 15:58 ` Hao Xu
  17 siblings, 1 reply; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-16  9:22 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe, asml.silence

Currently we do two extra spin lock/unlock pairs to add a poll/apoll
request to the cancellation hash table and remove it from there.

On the submission side we often already hold ->uring_lock and tw
completion is likely to hold it as well. Add a second cancellation hash
table protected by ->uring_lock. In concerns for latency because of a
need to have the mutex locked on the completion side, use the new table
only in following cases:

1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
   is little to no contention and so the main tw hander will almost
   always end up grabbing it before calling callbacks.

2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
   a major user of ->uring_lock.

3) apoll: we normally grab the lock on the completion side anyway to
   execute the request, so it's free.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 io_uring/io_uring.c       |   9 ++-
 io_uring/io_uring_types.h |   4 ++
 io_uring/poll.c           | 117 +++++++++++++++++++++++++++++++-------
 3 files changed, 108 insertions(+), 22 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 46db51641069..d256a611be4e 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -275,6 +275,8 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	hash_bits = clamp(hash_bits, 1, 8);
 	if (io_alloc_hash_table(&ctx->cancel_table, hash_bits))
 		goto err;
+	if (io_alloc_hash_table(&ctx->cancel_table_locked, hash_bits))
+		goto err;
 
 	ctx->dummy_ubuf = kzalloc(sizeof(*ctx->dummy_ubuf), GFP_KERNEL);
 	if (!ctx->dummy_ubuf)
@@ -317,6 +319,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 err:
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->cancel_table.hbs);
+	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
@@ -2505,6 +2508,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
 	kfree(ctx->cancel_table.hbs);
+	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->dummy_ubuf);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
@@ -2666,12 +2670,13 @@ static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx)
 		__io_cqring_overflow_flush(ctx, true);
 	xa_for_each(&ctx->personalities, index, creds)
 		io_unregister_personality(ctx, index);
+	if (ctx->rings)
+		io_poll_remove_all(ctx, NULL, true);
 	mutex_unlock(&ctx->uring_lock);
 
 	/* failed during ring init, it couldn't have issued any requests */
 	if (ctx->rings) {
 		io_kill_timeouts(ctx, NULL, true);
-		io_poll_remove_all(ctx, NULL, true);
 		/* if we failed setting up the ctx, we might not have any rings */
 		io_iopoll_try_reap_events(ctx);
 	}
@@ -2796,7 +2801,9 @@ static __cold void io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 		}
 
 		ret |= io_cancel_defer_files(ctx, task, cancel_all);
+		mutex_lock(&ctx->uring_lock);
 		ret |= io_poll_remove_all(ctx, task, cancel_all);
+		mutex_unlock(&ctx->uring_lock);
 		ret |= io_kill_timeouts(ctx, task, cancel_all);
 		if (task)
 			ret |= io_run_task_work();
diff --git a/io_uring/io_uring_types.h b/io_uring/io_uring_types.h
index d3b9bde9c702..65ac7cdaaa73 100644
--- a/io_uring/io_uring_types.h
+++ b/io_uring/io_uring_types.h
@@ -191,6 +191,7 @@ struct io_ring_ctx {
 		struct xarray		io_bl_xa;
 		struct list_head	io_buffers_cache;
 
+		struct io_hash_table	cancel_table_locked;
 		struct list_head	cq_overflow_list;
 		struct list_head	apoll_cache;
 		struct xarray		personalities;
@@ -323,6 +324,7 @@ enum {
 	REQ_F_CQE32_INIT_BIT,
 	REQ_F_APOLL_MULTISHOT_BIT,
 	REQ_F_CLEAR_POLLIN_BIT,
+	REQ_F_HASH_LOCKED_BIT,
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
@@ -393,6 +395,8 @@ enum {
 	REQ_F_CQE32_INIT	= BIT(REQ_F_CQE32_INIT_BIT),
 	/* recvmsg special flag, clear EPOLLIN */
 	REQ_F_CLEAR_POLLIN	= BIT(REQ_F_CLEAR_POLLIN_BIT),
+	/* hashed into ->cancel_hash_locked, protected by ->uring_lock */
+	REQ_F_HASH_LOCKED	= BIT(REQ_F_HASH_LOCKED_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 07157da1c2cb..2e068e05732a 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -93,6 +93,32 @@ static void io_poll_req_delete(struct io_kiocb *req, struct io_ring_ctx *ctx)
 	spin_unlock(lock);
 }
 
+static void io_poll_req_insert_locked(struct io_kiocb *req)
+{
+	struct io_hash_table *table = &req->ctx->cancel_table_locked;
+	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
+
+	hlist_add_head(&req->hash_node, &table->hbs[index].list);
+}
+
+static void io_poll_tw_hash_eject(struct io_kiocb *req, bool *locked)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+
+	if (req->flags & REQ_F_HASH_LOCKED) {
+		/*
+		 * ->cancel_table_locked is protected by ->uring_lock in
+		 * contrast to per bucket spinlocks. Likely, tctx_task_work()
+		 * already grabbed the mutex for us, but there is a chance it
+		 * failed.
+		 */
+		io_tw_lock(ctx, locked);
+		hash_del(&req->hash_node);
+	} else {
+		io_poll_req_delete(req, ctx);
+	}
+}
+
 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events,
 			      wait_queue_func_t wake_func)
 {
@@ -217,7 +243,6 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 
 static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
@@ -234,7 +259,8 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	}
 
 	io_poll_remove_entries(req);
-	io_poll_req_delete(req, ctx);
+	io_poll_tw_hash_eject(req, locked);
+
 	io_req_set_res(req, req->cqe.res, 0);
 	io_req_task_complete(req, locked);
 }
@@ -248,7 +274,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 		return;
 
 	io_poll_remove_entries(req);
-	io_poll_req_delete(req, req->ctx);
+	io_poll_tw_hash_eject(req, locked);
 
 	if (!ret)
 		io_req_task_submit(req, locked);
@@ -442,7 +468,10 @@ static int __io_arm_poll_handler(struct io_kiocb *req,
 		return 0;
 	}
 
-	io_poll_req_insert(req);
+	if (req->flags & REQ_F_HASH_LOCKED)
+		io_poll_req_insert_locked(req);
+	else
+		io_poll_req_insert(req);
 
 	if (mask && (poll->events & EPOLLET)) {
 		/* can't multishot if failed, just queue the event we've got */
@@ -480,6 +509,15 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	__poll_t mask = POLLPRI | POLLERR | EPOLLET;
 	int ret;
 
+	/*
+	 * apoll requests already grab the mutex to complete in the tw handler,
+	 * so removal from the mutex-backed hash is free, use it by default.
+	 */
+	if (issue_flags & IO_URING_F_UNLOCKED)
+		req->flags &= ~REQ_F_HASH_LOCKED;
+	else
+		req->flags |= REQ_F_HASH_LOCKED;
+
 	if (!def->pollin && !def->pollout)
 		return IO_APOLL_ABORTED;
 	if (!file_can_poll(req->file))
@@ -528,13 +566,10 @@ int io_arm_poll_handler(struct io_kiocb *req, unsigned issue_flags)
 	return IO_APOLL_OK;
 }
 
-/*
- * Returns true if we found and killed one or more poll requests
- */
-__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
-			       bool cancel_all)
+static __cold bool io_poll_remove_all_table(struct task_struct *tsk,
+					    struct io_hash_table *table,
+					    bool cancel_all)
 {
-	struct io_hash_table *table = &ctx->cancel_table;
 	unsigned nr_buckets = 1U << table->hash_bits;
 	struct hlist_node *tmp;
 	struct io_kiocb *req;
@@ -557,6 +592,17 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 	return found;
 }
 
+/*
+ * Returns true if we found and killed one or more poll requests
+ */
+__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
+			       bool cancel_all)
+	__must_hold(&ctx->uring_lock)
+{
+	return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
+	       io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
+}
+
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,
 				     struct io_cancel_data *cd,
 				     struct io_hash_table *table,
@@ -616,13 +662,15 @@ static struct io_kiocb *io_poll_file_find(struct io_ring_ctx *ctx,
 	return NULL;
 }
 
-static bool io_poll_disarm(struct io_kiocb *req)
+static int io_poll_disarm(struct io_kiocb *req)
 {
+	if (!req)
+		return -ENOENT;
 	if (!io_poll_get_ownership(req))
-		return false;
+		return -EALREADY;
 	io_poll_remove_entries(req);
 	hash_del(&req->hash_node);
-	return true;
+	return 0;
 }
 
 static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
@@ -646,7 +694,16 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
 		   unsigned issue_flags)
 {
-	return __io_poll_cancel(ctx, cd, &ctx->cancel_table);
+	int ret;
+
+	ret = __io_poll_cancel(ctx, cd, &ctx->cancel_table);
+	if (ret != -ENOENT)
+		return ret;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	ret = __io_poll_cancel(ctx, cd, &ctx->cancel_table_locked);
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
 }
 
 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe,
@@ -721,6 +778,16 @@ int io_poll_add(struct io_kiocb *req, unsigned int issue_flags)
 
 	ipt.pt._qproc = io_poll_queue_proc;
 
+	/*
+	 * If sqpoll or single issuer, there is no contention for ->uring_lock
+	 * and we'll end up holding it in tw handlers anyway.
+	 */
+	if (!(issue_flags & IO_URING_F_UNLOCKED) &&
+	    (req->ctx->flags & (IORING_SETUP_SQPOLL | IORING_SETUP_SINGLE_ISSUER)))
+		req->flags |= REQ_F_HASH_LOCKED;
+	else
+		req->flags &= ~REQ_F_HASH_LOCKED;
+
 	ret = __io_arm_poll_handler(req, poll, &ipt, poll->events);
 	if (ipt.error) {
 		return ipt.error;
@@ -745,20 +812,28 @@ int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags)
 	bool locked;
 
 	preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table, &bucket);
-	if (preq)
-		ret2 = io_poll_disarm(preq);
+	ret2 = io_poll_disarm(preq);
 	if (bucket)
 		spin_unlock(&bucket->lock);
-
-	if (!preq) {
-		ret = -ENOENT;
+	if (!ret2)
+		goto found;
+	if (ret2 != -ENOENT) {
+		ret = ret2;
 		goto out;
 	}
-	if (!ret2) {
-		ret = -EALREADY;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	preq = io_poll_find(ctx, true, &cd, &ctx->cancel_table_locked, &bucket);
+	ret2 = io_poll_disarm(preq);
+	if (bucket)
+		spin_unlock(&bucket->lock);
+	io_ring_submit_unlock(ctx, issue_flags);
+	if (ret2) {
+		ret = ret2;
 		goto out;
 	}
 
+found:
 	if (poll_update->update_events || poll_update->update_user_data) {
 		/* only mask one event flags, keep behavior flags */
 		if (poll_update->update_events) {
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (15 preceding siblings ...)
  2022-06-16  9:22 ` [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing Pavel Begunkov
@ 2022-06-16 13:18 ` Jens Axboe
  2022-06-16 15:58 ` Hao Xu
  17 siblings, 0 replies; 21+ messages in thread
From: Jens Axboe @ 2022-06-16 13:18 UTC (permalink / raw)
  To: io-uring, asml.silence

On Thu, 16 Jun 2022 10:21:56 +0100, Pavel Begunkov wrote:
> 1-4 kills REQ_F_COMPLETE_INLINE as we're out of bits.
> 
> Patch 5 from Hao should remove some overhead from poll requests
> 
> Patch 6 from Hao adds per-bucket spinlocks, and 16-19 do a little
> bit of cleanup. The downside of per-bucket spinlocks is that it adds
> additional spinlock/unlock pair in the poll request completion side,
> which shouldn't matter much with 20/25.
> 
> [...]

Applied, thanks!

[01/16] io_uring: rw: delegate sync completions to core io_uring
        commit: 45bfddb605aebe5298b054553ac8daa04bd73c67
[02/16] io_uring: kill REQ_F_COMPLETE_INLINE
        commit: c2399c806444c54c8414f2196e43ea22843e51a5
[03/16] io_uring: refactor io_req_task_complete()
        commit: a74ba63b8d9ed12fc06d6e122a94ebb53e8ae126
[04/16] io_uring: don't inline io_put_kbuf
        commit: 8fffc6537fb8d7e4e8901b0ca982396999c89c09
[05/16] io_uring: poll: remove unnecessary req->ref set
        commit: 0e769a4667807d1bb249b4bcd9cc6ac6cbdea3ab
[06/16] io_uring: switch cancel_hash to use per entry spinlock
        commit: 6c41fff4b73e393107a867c3259a7ce38e3d7137
[07/16] io_uring: pass poll_find lock back
        commit: 4488b60bf5d73e69c6e17f6296f71ab19f290fae
[08/16] io_uring: clean up io_try_cancel
        commit: 034c5701e192e9521dae1a60c295a3dea8bd9f07
[09/16] io_uring: limit the number of cancellation buckets
        commit: 8a0089110740eb78bb6592b12b73c15096cc5b41
[10/16] io_uring: clean up io_ring_ctx_alloc
        commit: 8797b59e7bd7463775690a0ee0de4c2121e39a90
[11/16] io_uring: use state completion infra for poll reqs
        commit: 60ad0a221eb26eb7c3babb000c9fe05f5f3f9231
[12/16] io_uring: add IORING_SETUP_SINGLE_ISSUER
        commit: d2fbea05a52db51e1939fe3f99fdc5086ff093c4
[13/16] io_uring: pass hash table into poll_find
        commit: fbd91877aac264a470b666fbd88a8a31d202993e
[14/16] io_uring: introduce a struct for hash table
        commit: 1aa9e1ce9887505ea87aa86128c1e0960e85e9dd
[15/16] io_uring: propagate locking state to poll cancel
        commit: 3f301363931da831687160817f4b31fad89b50de
[16/16] io_uring: mutex locked poll hashing
        commit: 154d61b44e7eee3b5db68d65d9cb7403c9f58e71

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations
  2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
                   ` (16 preceding siblings ...)
  2022-06-16 13:18 ` [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Jens Axboe
@ 2022-06-16 15:58 ` Hao Xu
  17 siblings, 0 replies; 21+ messages in thread
From: Hao Xu @ 2022-06-16 15:58 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: Jens Axboe

On 6/16/22 17:21, Pavel Begunkov wrote:
> 1-4 kills REQ_F_COMPLETE_INLINE as we're out of bits.
> 
> Patch 5 from Hao should remove some overhead from poll requests
> 
> Patch 6 from Hao adds per-bucket spinlocks, and 16-19 do a little
> bit of cleanup. The downside of per-bucket spinlocks is that it adds
> additional spinlock/unlock pair in the poll request completion side,
> which shouldn't matter much with 20/25.
> 
> Patch 11 uses inline completion infra for poll requests, this nicely
> improves perf when there is a good tw batching.
> 
> Patch 12 implements the userspace visible side of
> IORING_SETUP_SINGLE_ISSUER, it'll be used for poll requests and
> later for spinlock optimisations.
> 
> 13-16 introduces ->uring_lock protected cancellation hashing. It
> requires us to grab ->uring_lock in the completion side, but saves
> two spin lock/unlock pairs. We apply it automatically in cases the
> mutex is already likely to be held (see 25/25 description), so there
> is no additional mutex overhead and potential latency problemes.
> 

Reviewed-by: Hao Xu <[email protected]>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing
  2022-06-16  9:22 ` [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing Pavel Begunkov
@ 2022-06-17 15:35   ` Nathan Chancellor
  2022-06-18 12:07     ` Pavel Begunkov
  0 siblings, 1 reply; 21+ messages in thread
From: Nathan Chancellor @ 2022-06-17 15:35 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, Jens Axboe, llvm

On Thu, Jun 16, 2022 at 10:22:12AM +0100, Pavel Begunkov wrote:
> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
> request to the cancellation hash table and remove it from there.
> 
> On the submission side we often already hold ->uring_lock and tw
> completion is likely to hold it as well. Add a second cancellation hash
> table protected by ->uring_lock. In concerns for latency because of a
> need to have the mutex locked on the completion side, use the new table
> only in following cases:
> 
> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>    is little to no contention and so the main tw hander will almost
>    always end up grabbing it before calling callbacks.
> 
> 2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
>    a major user of ->uring_lock.
> 
> 3) apoll: we normally grab the lock on the completion side anyway to
>    execute the request, so it's free.
> 
> Signed-off-by: Pavel Begunkov <[email protected]>

<snip>

> -/*
> - * Returns true if we found and killed one or more poll requests
> - */
> -__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
> -			       bool cancel_all)
> +static __cold bool io_poll_remove_all_table(struct task_struct *tsk,
> +					    struct io_hash_table *table,
> +					    bool cancel_all)
>  {
> -	struct io_hash_table *table = &ctx->cancel_table;
>  	unsigned nr_buckets = 1U << table->hash_bits;
>  	struct hlist_node *tmp;
>  	struct io_kiocb *req;
> @@ -557,6 +592,17 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
>  	return found;
>  }
>  
> +/*
> + * Returns true if we found and killed one or more poll requests
> + */
> +__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
> +			       bool cancel_all)
> +	__must_hold(&ctx->uring_lock)
> +{
> +	return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
> +	       io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
> +}

Clang warns:

  io_uring/poll.c:602:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
          return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                                               ||
  io_uring/poll.c:602:9: note: cast one or both operands to int to silence this warning
  1 error generated.

I assume this is intentional so io_poll_remove_all_table() gets called
twice every time? If so, would you be opposed to unrolling this a bit to
make it clear to the compiler? Alternatively, we could change the return
type of io_poll_remove_all_table to be an int or add a cast to int like
the note mentioned but that is rather ugly to me. I can send a formal
patch depending on your preference.

Cheers,
Nathan

diff --git a/io_uring/poll.c b/io_uring/poll.c
index 2e068e05732a..6a70bc220971 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -599,8 +599,12 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
 			       bool cancel_all)
 	__must_hold(&ctx->uring_lock)
 {
-	return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
-	       io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
+	bool ret;
+
+	ret = io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all);
+	ret |= io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
+
+	return ret;
 }
 
 static struct io_kiocb *io_poll_find(struct io_ring_ctx *ctx, bool poll_only,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing
  2022-06-17 15:35   ` Nathan Chancellor
@ 2022-06-18 12:07     ` Pavel Begunkov
  0 siblings, 0 replies; 21+ messages in thread
From: Pavel Begunkov @ 2022-06-18 12:07 UTC (permalink / raw)
  To: Nathan Chancellor; +Cc: io-uring, Jens Axboe, llvm

On 6/17/22 16:35, Nathan Chancellor wrote:
> On Thu, Jun 16, 2022 at 10:22:12AM +0100, Pavel Begunkov wrote:
>> Currently we do two extra spin lock/unlock pairs to add a poll/apoll
>> request to the cancellation hash table and remove it from there.
>>
>> On the submission side we often already hold ->uring_lock and tw
>> completion is likely to hold it as well. Add a second cancellation hash
>> table protected by ->uring_lock. In concerns for latency because of a
>> need to have the mutex locked on the completion side, use the new table
>> only in following cases:
>>
>> 1) IORING_SETUP_SINGLE_ISSUER: only one task grabs uring_lock, so there
>>     is little to no contention and so the main tw hander will almost
>>     always end up grabbing it before calling callbacks.
>>
>> 2) IORING_SETUP_SQPOLL: same as with single issuer, only one task is
>>     a major user of ->uring_lock.
>>
>> 3) apoll: we normally grab the lock on the completion side anyway to
>>     execute the request, so it's free.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
> 
> <snip>
> 
>> -/*
>> - * Returns true if we found and killed one or more poll requests
>> - */
>> -__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
>> -			       bool cancel_all)
>> +static __cold bool io_poll_remove_all_table(struct task_struct *tsk,
>> +					    struct io_hash_table *table,
>> +					    bool cancel_all)
>>   {
>> -	struct io_hash_table *table = &ctx->cancel_table;
>>   	unsigned nr_buckets = 1U << table->hash_bits;
>>   	struct hlist_node *tmp;
>>   	struct io_kiocb *req;
>> @@ -557,6 +592,17 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
>>   	return found;
>>   }
>>   
>> +/*
>> + * Returns true if we found and killed one or more poll requests
>> + */
>> +__cold bool io_poll_remove_all(struct io_ring_ctx *ctx, struct task_struct *tsk,
>> +			       bool cancel_all)
>> +	__must_hold(&ctx->uring_lock)
>> +{
>> +	return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
>> +	       io_poll_remove_all_table(tsk, &ctx->cancel_table_locked, cancel_all);
>> +}
> 
> Clang warns:
> 
>    io_uring/poll.c:602:9: error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]
>            return io_poll_remove_all_table(tsk, &ctx->cancel_table, cancel_all) |
>                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                                                                                 ||
>    io_uring/poll.c:602:9: note: cast one or both operands to int to silence this warning
>    1 error generated.
> 
> I assume this is intentional so io_poll_remove_all_table() gets called
> twice every time? If so, would you be opposed to unrolling this a bit to
> make it clear to the compiler? Alternatively, we could change the return
> type of io_poll_remove_all_table to be an int or add a cast to int like
> the note mentioned but that is rather ugly to me. I can send a formal
> patch depending on your preference.

Thanks for the heads up. Yes, it's intentional, and I'm fine with either
version. Are you going to send a patch?


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-06-18 12:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-06-16  9:21 [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Pavel Begunkov
2022-06-16  9:21 ` [PATCH for-next v3 01/16] io_uring: rw: delegate sync completions to core io_uring Pavel Begunkov
2022-06-16  9:21 ` [PATCH for-next v3 02/16] io_uring: kill REQ_F_COMPLETE_INLINE Pavel Begunkov
2022-06-16  9:21 ` [PATCH for-next v3 03/16] io_uring: refactor io_req_task_complete() Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 04/16] io_uring: don't inline io_put_kbuf Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 05/16] io_uring: poll: remove unnecessary req->ref set Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 06/16] io_uring: switch cancel_hash to use per entry spinlock Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 07/16] io_uring: pass poll_find lock back Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 08/16] io_uring: clean up io_try_cancel Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 09/16] io_uring: limit the number of cancellation buckets Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 10/16] io_uring: clean up io_ring_ctx_alloc Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 11/16] io_uring: use state completion infra for poll reqs Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 12/16] io_uring: add IORING_SETUP_SINGLE_ISSUER Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 13/16] io_uring: pass hash table into poll_find Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 14/16] io_uring: introduce a struct for hash table Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 15/16] io_uring: propagate locking state to poll cancel Pavel Begunkov
2022-06-16  9:22 ` [PATCH for-next v3 16/16] io_uring: mutex locked poll hashing Pavel Begunkov
2022-06-17 15:35   ` Nathan Chancellor
2022-06-18 12:07     ` Pavel Begunkov
2022-06-16 13:18 ` [PATCH for-next v3 00/16] 5.20 cleanups and poll optimisations Jens Axboe
2022-06-16 15:58 ` Hao Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox