public inbox for io-uring@vger.kernel.org
* [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
@ 2025-09-04 17:08 Caleb Sander Mateos
  2025-09-04 17:08 ` [PATCH v2 1/5] io_uring: don't include filetable.h in io_uring.h Caleb Sander Mateos
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-04 17:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel, Caleb Sander Mateos

As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
an io_uring doesn't actually enable any additional optimizations (aside
from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
submits SQEs to skip taking the uring_lock mutex in the submission and
task work paths.
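
(Not part of the series, just a minimal userspace sketch for context:
opting in to the flag with liburing. io_uring_queue_init_params() and
the flag names are the existing liburing/uapi ones; error handling is
trimmed.)

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring_params p = { 0 };
	struct io_uring ring;
	int ret;

	/* Promise the kernel that only this task will submit SQEs */
	p.flags = IORING_SETUP_SINGLE_ISSUER;

	ret = io_uring_queue_init_params(8, &ring, &p);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}

	/* ... submit and reap I/O from this task only ... */

	io_uring_queue_exit(&ring);
	return 0;
}

If another task later calls io_uring_enter() on such a ring, the kernel
rejects it with -EEXIST, which is the guarantee that makes skipping
uring_lock on the submit path possible.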

First, we need to close a hole in the IORING_SETUP_SINGLE_ISSUER checks
where IORING_REGISTER_CLONE_BUFFERS only checks whether the thread is
allowed to access one of the two io_urings. It assumes the uring_lock
will prevent concurrent access to the other io_uring, but this will no
longer be the case after the optimization to skip taking uring_lock.
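
(Also not part of the series, just for context: IORING_REGISTER_CLONE_BUFFERS
copies the registered buffer table of a source ring into a destination
ring, e.g. via liburing's io_uring_clone_buffers(). A rough sketch of the
two-ring setup the check is concerned with, error handling trimmed:)

#include <liburing.h>

int clone_between_rings(void)
{
	struct io_uring src, dst;
	int ret;

	io_uring_queue_init(8, &src, 0);
	io_uring_queue_init(8, &dst, 0);

	/* ... register buffers on src with io_uring_register_buffers() ... */

	/* issues IORING_REGISTER_CLONE_BUFFERS on dst, referring to src */
	ret = io_uring_clone_buffers(&dst, &src);

	io_uring_queue_exit(&src);
	io_uring_queue_exit(&dst);
	return ret;
}

The gap is that only dst's submitter_task is checked, even though src's
uring_lock is also taken.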

We also need to remove the unused filetable.h #include from io_uring.h
to avoid an #include cycle.

v2:
- Don't enable these optimizations for IORING_SETUP_SQPOLL, as we still
  need to synchronize SQ thread submission with io_uring_register()

Caleb Sander Mateos (5):
  io_uring: don't include filetable.h in io_uring.h
  io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
  io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
  io_uring: factor out uring_lock helpers
  io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

 io_uring/cancel.c    |  1 +
 io_uring/fdinfo.c    |  2 +-
 io_uring/filetable.c |  3 +-
 io_uring/io_uring.c  | 67 +++++++++++++++++++++++++++++---------------
 io_uring/io_uring.h  | 43 ++++++++++++++++++++++------
 io_uring/kbuf.c      |  6 ++--
 io_uring/net.c       |  1 +
 io_uring/notif.c     |  5 ++--
 io_uring/notif.h     |  3 +-
 io_uring/openclose.c |  1 +
 io_uring/poll.c      |  2 +-
 io_uring/register.c  |  1 +
 io_uring/rsrc.c      | 10 ++++++-
 io_uring/rsrc.h      |  3 +-
 io_uring/rw.c        |  3 +-
 io_uring/splice.c    |  1 +
 io_uring/waitid.c    |  2 +-
 17 files changed, 111 insertions(+), 43 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/5] io_uring: don't include filetable.h in io_uring.h
  2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
@ 2025-09-04 17:08 ` Caleb Sander Mateos
  2025-09-04 17:08 ` [PATCH v2 2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() Caleb Sander Mateos
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-04 17:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel, Caleb Sander Mateos

io_uring/io_uring.h doesn't use anything declared in
io_uring/filetable.h, so drop the unnecessary #include. Add filetable.h
includes in .c files previously relying on the transitive include from
io_uring.h.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/cancel.c    | 1 +
 io_uring/fdinfo.c    | 2 +-
 io_uring/io_uring.c  | 1 +
 io_uring/io_uring.h  | 1 -
 io_uring/net.c       | 1 +
 io_uring/openclose.c | 1 +
 io_uring/register.c  | 1 +
 io_uring/rsrc.c      | 1 +
 io_uring/rw.c        | 1 +
 io_uring/splice.c    | 1 +
 10 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 6d57602304df..64b51e82baa2 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -9,10 +9,11 @@
 #include <linux/nospec.h>
 #include <linux/io_uring.h>
 
 #include <uapi/linux/io_uring.h>
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "tctx.h"
 #include "poll.h"
 #include "timeout.h"
 #include "waitid.h"
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index 5c7339838769..ff3364531c77 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -7,11 +7,11 @@
 #include <linux/seq_file.h>
 #include <linux/io_uring.h>
 
 #include <uapi/linux/io_uring.h>
 
-#include "io_uring.h"
+#include "filetable.h"
 #include "sqpoll.h"
 #include "fdinfo.h"
 #include "cancel.h"
 #include "rsrc.h"
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 20dfa5ef75dc..42f6bfbb99d3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -77,10 +77,11 @@
 
 #include <uapi/linux/io_uring.h>
 
 #include "io-wq.h"
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "opdef.h"
 #include "refs.h"
 #include "tctx.h"
 #include "register.h"
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index fa8a66b34d4e..d62b7d9fafed 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -9,11 +9,10 @@
 #include <linux/io_uring_types.h>
 #include <uapi/linux/eventpoll.h>
 #include "alloc_cache.h"
 #include "io-wq.h"
 #include "slist.h"
-#include "filetable.h"
 #include "opdef.h"
 
 #ifndef CREATE_TRACE_POINTS
 #include <trace/events/io_uring.h>
 #endif
diff --git a/io_uring/net.c b/io_uring/net.c
index 226fea2312b5..f99b90c762fc 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -8,10 +8,11 @@
 #include <net/compat.h>
 #include <linux/io_uring.h>
 
 #include <uapi/linux/io_uring.h>
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "kbuf.h"
 #include "alloc_cache.h"
 #include "net.h"
 #include "notif.h"
diff --git a/io_uring/openclose.c b/io_uring/openclose.c
index d70700e5cef8..bfeb91b31bba 100644
--- a/io_uring/openclose.c
+++ b/io_uring/openclose.c
@@ -12,10 +12,11 @@
 
 #include <uapi/linux/io_uring.h>
 
 #include "../fs/internal.h"
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "rsrc.h"
 #include "openclose.h"
 
 struct io_open {
diff --git a/io_uring/register.c b/io_uring/register.c
index aa5f56ad8358..5e493917a1a8 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -16,10 +16,11 @@
 #include <linux/nospec.h>
 #include <linux/compat.h>
 #include <linux/io_uring.h>
 #include <linux/io_uring_types.h>
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "opdef.h"
 #include "tctx.h"
 #include "rsrc.h"
 #include "sqpoll.h"
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index f75f5e43fa4a..2d15b8785a95 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -11,10 +11,11 @@
 #include <linux/io_uring.h>
 #include <linux/io_uring/cmd.h>
 
 #include <uapi/linux/io_uring.h>
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "openclose.h"
 #include "rsrc.h"
 #include "memmap.h"
 #include "register.h"
diff --git a/io_uring/rw.c b/io_uring/rw.c
index dcde5bb7421a..ab6b4afccec3 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -13,10 +13,11 @@
 #include <linux/io_uring/cmd.h>
 #include <linux/indirect_call_wrapper.h>
 
 #include <uapi/linux/io_uring.h>
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "opdef.h"
 #include "kbuf.h"
 #include "alloc_cache.h"
 #include "rsrc.h"
diff --git a/io_uring/splice.c b/io_uring/splice.c
index 35ce4e60b495..e81ebbb91925 100644
--- a/io_uring/splice.c
+++ b/io_uring/splice.c
@@ -9,10 +9,11 @@
 #include <linux/io_uring.h>
 #include <linux/splice.h>
 
 #include <uapi/linux/io_uring.h>
 
+#include "filetable.h"
 #include "io_uring.h"
 #include "splice.h"
 
 struct io_splice {
 	struct file			*file_out;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
  2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
  2025-09-04 17:08 ` [PATCH v2 1/5] io_uring: don't include filetable.h in io_uring.h Caleb Sander Mateos
@ 2025-09-04 17:08 ` Caleb Sander Mateos
  2025-09-09 13:35   ` Jens Axboe
  2025-09-04 17:09 ` [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-04 17:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel, Caleb Sander Mateos

An io_ring_ctx created with IORING_SETUP_SINGLE_ISSUER only allows a
single task to submit to the ctx. Although the documentation only
mentions this restriction applying to io_uring_enter() syscalls,
commit d7cce96c449e ("io_uring: limit registration w/ SINGLE_ISSUER")
extends it to io_uring_register(). Ensuring only one task interacts
with the io_ring_ctx will be important to allow this task to avoid
taking the uring_lock.

There is, however, one gap in these checks: io_register_clone_buffers()
may take the uring_lock on a second (source) io_ring_ctx, but
__io_uring_register() only checks the current thread against the
*destination* io_ring_ctx's submitter_task. Fail the
IORING_REGISTER_CLONE_BUFFERS operation with -EEXIST if the source
io_ring_ctx has a registered submitter_task other than the current
task.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/rsrc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 2d15b8785a95..1e5b7833076a 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1298,14 +1298,21 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 
 	src_ctx = file->private_data;
 	if (src_ctx != ctx) {
 		mutex_unlock(&ctx->uring_lock);
 		lock_two_rings(ctx, src_ctx);
+
+		if (src_ctx->submitter_task && 
+		    src_ctx->submitter_task != current) {
+			ret = -EEXIST;
+			goto out;
+		}
 	}
 
 	ret = io_clone_buffers(ctx, src_ctx, &buf);
 
+out:
 	if (src_ctx != ctx)
 		mutex_unlock(&src_ctx->uring_lock);
 
 	fput(file);
 	return ret;
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
  2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
  2025-09-04 17:08 ` [PATCH v2 1/5] io_uring: don't include filetable.h in io_uring.h Caleb Sander Mateos
  2025-09-04 17:08 ` [PATCH v2 2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() Caleb Sander Mateos
@ 2025-09-04 17:09 ` Caleb Sander Mateos
  2025-09-08 14:13   ` Jens Axboe
  2025-09-04 17:09 ` [PATCH v2 4/5] io_uring: factor out uring_lock helpers Caleb Sander Mateos
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-04 17:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel, Caleb Sander Mateos

IORING_SETUP_SINGLE_ISSUER doesn't currently enable any optimizations,
but it will soon be used to avoid taking io_ring_ctx's uring_lock when
submitting from the single issuer task. If the IORING_SETUP_SQPOLL flag
is set, the SQ thread is the sole task issuing SQEs. However, other
tasks may make io_uring_register() syscalls, which must be synchronized
with SQE submission. So it wouldn't be safe to skip the uring_lock
around the SQ thread's submission even if IORING_SETUP_SINGLE_ISSUER is
set. Therefore, clear IORING_SETUP_SINGLE_ISSUER from the io_ring_ctx
flags if IORING_SETUP_SQPOLL is set.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/io_uring.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 42f6bfbb99d3..c7af9dc3d95a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3724,10 +3724,19 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
 	 */
 	if ((flags & (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED)) ==
 	    (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED))
 		return -EINVAL;
 
+	/*
+	 * If IORING_SETUP_SQPOLL is set, only the SQ thread issues SQEs,
+	 * but other threads may call io_uring_register() concurrently.
+	 * We still need uring_lock to synchronize these io_ring_ctx accesses,
+	 * so disable the single issuer optimizations.
+	 */
+	if (flags & IORING_SETUP_SQPOLL)
+		p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
+
 	return 0;
 }
 
 int io_uring_fill_params(unsigned entries, struct io_uring_params *p)
 {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/5] io_uring: factor out uring_lock helpers
  2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
                   ` (2 preceding siblings ...)
  2025-09-04 17:09 ` [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
@ 2025-09-04 17:09 ` Caleb Sander Mateos
  2025-09-04 17:09 ` [PATCH v2 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
  2025-09-09 13:35 ` [PATCH v2 0/5] " Jens Axboe
  5 siblings, 0 replies; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-04 17:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel, Caleb Sander Mateos

A subsequent commit will skip acquiring the io_ring_ctx uring_lock in
io_uring_enter() and io_handle_tw_list() for IORING_SETUP_SINGLE_ISSUER.
Prepare for this change by factoring out the uring_lock accesses in
these functions and their callees into helper functions:
- io_ring_ctx_lock() for mutex_lock(&ctx->uring_lock)
- io_ring_ctx_unlock() for mutex_unlock(&ctx->uring_lock)
- io_ring_ctx_assert_locked() for lockdep_assert_held(&ctx->uring_lock)

For now, the helpers unconditionally call the mutex functions. But a
subsequent commit will condition them on !IORING_SETUP_SINGLE_ISSUER.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
---
 io_uring/filetable.c |  3 ++-
 io_uring/io_uring.c  | 51 ++++++++++++++++++++++++++------------------
 io_uring/io_uring.h  | 28 ++++++++++++++++++------
 io_uring/kbuf.c      |  6 +++---
 io_uring/notif.c     |  5 +++--
 io_uring/notif.h     |  3 ++-
 io_uring/poll.c      |  2 +-
 io_uring/rsrc.c      |  2 +-
 io_uring/rsrc.h      |  3 ++-
 io_uring/rw.c        |  2 +-
 io_uring/waitid.c    |  2 +-
 11 files changed, 67 insertions(+), 40 deletions(-)

diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index a21660e3145a..aae283e77856 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -55,14 +55,15 @@ void io_free_file_tables(struct io_ring_ctx *ctx, struct io_file_table *table)
 	table->bitmap = NULL;
 }
 
 static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
 				 u32 slot_index)
-	__must_hold(&req->ctx->uring_lock)
 {
 	struct io_rsrc_node *node;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	if (io_is_uring_fops(file))
 		return -EBADF;
 	if (!ctx->file_table.data.nr)
 		return -ENXIO;
 	if (slot_index >= ctx->file_table.data.nr)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c7af9dc3d95a..69e1175256bb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -554,11 +554,11 @@ static unsigned io_linked_nr(struct io_kiocb *req)
 
 static __cold noinline void io_queue_deferred(struct io_ring_ctx *ctx)
 {
 	bool drain_seen = false, first = true;
 
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	__io_req_caches_free(ctx);
 
 	while (!list_empty(&ctx->defer_list)) {
 		struct io_defer_entry *de = list_first_entry(&ctx->defer_list,
 						struct io_defer_entry, list);
@@ -925,11 +925,11 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags
  * Must be called from inline task_work so we now a flush will happen later,
  * and obviously with ctx->uring_lock held (tw always has that).
  */
 void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	lockdep_assert(ctx->lockless_cq);
 
 	if (!io_fill_cqe_aux(ctx, user_data, res, cflags)) {
 		struct io_cqe cqe = io_init_cqe(user_data, res, cflags);
 
@@ -954,11 +954,11 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
 	 */
 	if (!wq_list_empty(&ctx->submit_state.compl_reqs))
 		__io_submit_flush_completions(ctx);
 
 	lockdep_assert(!io_wq_current_is_worker());
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 
 	if (!ctx->lockless_cq) {
 		spin_lock(&ctx->completion_lock);
 		posted = io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags);
 		spin_unlock(&ctx->completion_lock);
@@ -978,11 +978,11 @@ bool io_req_post_cqe32(struct io_kiocb *req, struct io_uring_cqe cqe[2])
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	bool posted;
 
 	lockdep_assert(!io_wq_current_is_worker());
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 
 	cqe[0].user_data = req->cqe.user_data;
 	if (!ctx->lockless_cq) {
 		spin_lock(&ctx->completion_lock);
 		posted = io_fill_cqe_aux32(ctx, cqe);
@@ -1032,15 +1032,14 @@ static void io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
 	 */
 	req_ref_put(req);
 }
 
 void io_req_defer_failed(struct io_kiocb *req, s32 res)
-	__must_hold(&ctx->uring_lock)
 {
 	const struct io_cold_def *def = &io_cold_defs[req->opcode];
 
-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);
 
 	req_set_fail(req);
 	io_req_set_res(req, res, io_put_kbuf(req, res, NULL));
 	if (def->fail)
 		def->fail(req);
@@ -1052,16 +1051,17 @@ void io_req_defer_failed(struct io_kiocb *req, s32 res)
  * handlers and io_issue_sqe() are done with it, e.g. inline completion path.
  * Because of that, io_alloc_req() should be called only under ->uring_lock
  * and with extra caution to not get a request that is still worked on.
  */
 __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx)
-	__must_hold(&ctx->uring_lock)
 {
 	gfp_t gfp = GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO;
 	void *reqs[IO_REQ_ALLOC_BATCH];
 	int ret;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
 
 	/*
 	 * Bulk alloc is all-or-nothing. If we fail to get a batch,
 	 * retry single alloc to be on the safe side.
@@ -1126,11 +1126,11 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw)
 		return;
 	if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
 		atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 
 	io_submit_flush_completions(ctx);
-	mutex_unlock(&ctx->uring_lock);
+	io_ring_ctx_unlock(ctx);
 	percpu_ref_put(&ctx->refs);
 }
 
 /*
  * Run queued task_work, returning the number of entries processed in *count.
@@ -1150,11 +1150,11 @@ struct llist_node *io_handle_tw_list(struct llist_node *node,
 						    io_task_work.node);
 
 		if (req->ctx != ctx) {
 			ctx_flush_and_put(ctx, ts);
 			ctx = req->ctx;
-			mutex_lock(&ctx->uring_lock);
+			io_ring_ctx_lock(ctx);
 			percpu_ref_get(&ctx->refs);
 		}
 		INDIRECT_CALL_2(req->io_task_work.func,
 				io_poll_task_func, io_req_rw_complete,
 				req, ts);
@@ -1502,12 +1502,13 @@ static inline void io_req_put_rsrc_nodes(struct io_kiocb *req)
 		io_put_rsrc_node(req->ctx, req->buf_node);
 }
 
 static void io_free_batch_list(struct io_ring_ctx *ctx,
 			       struct io_wq_work_node *node)
-	__must_hold(&ctx->uring_lock)
 {
+	io_ring_ctx_assert_locked(ctx);
+
 	do {
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 						    comp_list);
 
 		if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) {
@@ -1543,15 +1544,16 @@ static void io_free_batch_list(struct io_ring_ctx *ctx,
 		io_req_add_to_cache(req, ctx);
 	} while (node);
 }
 
 void __io_submit_flush_completions(struct io_ring_ctx *ctx)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_state *state = &ctx->submit_state;
 	struct io_wq_work_node *node;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	__io_cq_lock(ctx);
 	__wq_list_for_each(node, &state->compl_reqs) {
 		struct io_kiocb *req = container_of(node, struct io_kiocb,
 					    comp_list);
 
@@ -1767,16 +1769,17 @@ io_req_flags_t io_file_get_flags(struct file *file)
 		res |= REQ_F_SUPPORT_NOWAIT;
 	return res;
 }
 
 static __cold void io_drain_req(struct io_kiocb *req)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	bool drain = req->flags & IOSQE_IO_DRAIN;
 	struct io_defer_entry *de;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	de = kmalloc(sizeof(*de), GFP_KERNEL_ACCOUNT);
 	if (!de) {
 		io_req_defer_failed(req, -ENOMEM);
 		return;
 	}
@@ -2043,12 +2046,13 @@ static int io_req_sqe_copy(struct io_kiocb *req, unsigned int issue_flags)
 	def->sqe_copy(req);
 	return 0;
 }
 
 static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags, int ret)
-	__must_hold(&req->ctx->uring_lock)
 {
+	io_ring_ctx_assert_locked(req->ctx);
+
 	if (ret != -EAGAIN || (req->flags & REQ_F_NOWAIT)) {
 fail:
 		io_req_defer_failed(req, ret);
 		return;
 	}
@@ -2068,16 +2072,17 @@ static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags, int r
 		break;
 	}
 }
 
 static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_flags)
-	__must_hold(&req->ctx->uring_lock)
 {
 	unsigned int issue_flags = IO_URING_F_NONBLOCK |
 				   IO_URING_F_COMPLETE_DEFER | extra_flags;
 	int ret;
 
+	io_ring_ctx_assert_locked(req->ctx);
+
 	ret = io_issue_sqe(req, issue_flags);
 
 	/*
 	 * We async punt it if the file wasn't marked NOWAIT, or if the file
 	 * doesn't support non-blocking read/write attempts
@@ -2085,12 +2090,13 @@ static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_flags)
 	if (unlikely(ret))
 		io_queue_async(req, issue_flags, ret);
 }
 
 static void io_queue_sqe_fallback(struct io_kiocb *req)
-	__must_hold(&req->ctx->uring_lock)
 {
+	io_ring_ctx_assert_locked(req->ctx);
+
 	if (unlikely(req->flags & REQ_F_FAIL)) {
 		/*
 		 * We don't submit, fail them all, for that replace hardlinks
 		 * with normal links. Extra REQ_F_LINK is tolerated.
 		 */
@@ -2155,17 +2161,18 @@ static __cold int io_init_fail_req(struct io_kiocb *req, int err)
 	return err;
 }
 
 static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 		       const struct io_uring_sqe *sqe)
-	__must_hold(&ctx->uring_lock)
 {
 	const struct io_issue_def *def;
 	unsigned int sqe_flags;
 	int personality;
 	u8 opcode;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	req->ctx = ctx;
 	req->opcode = opcode = READ_ONCE(sqe->opcode);
 	/* same numerical values with corresponding REQ_F_*, safe to copy */
 	sqe_flags = READ_ONCE(sqe->flags);
 	req->flags = (__force io_req_flags_t) sqe_flags;
@@ -2290,15 +2297,16 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
 	return 0;
 }
 
 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
 			 const struct io_uring_sqe *sqe)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_submit_link *link = &ctx->submit_state.link;
 	int ret;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	ret = io_init_req(ctx, req, sqe);
 	if (unlikely(ret))
 		return io_submit_fail_init(sqe, req, ret);
 
 	trace_io_uring_submit_req(req);
@@ -2419,16 +2427,17 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, const struct io_uring_sqe **sqe)
 	*sqe = &ctx->sq_sqes[head];
 	return true;
 }
 
 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr)
-	__must_hold(&ctx->uring_lock)
 {
 	unsigned int entries = io_sqring_entries(ctx);
 	unsigned int left;
 	int ret;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	if (unlikely(!entries))
 		return 0;
 	/* make sure SQ entry isn't read before tail */
 	ret = left = min(nr, entries);
 	io_get_task_refs(left);
@@ -3518,14 +3527,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	} else if (to_submit) {
 		ret = io_uring_add_tctx_node(ctx);
 		if (unlikely(ret))
 			goto out;
 
-		mutex_lock(&ctx->uring_lock);
+		io_ring_ctx_lock(ctx);
 		ret = io_submit_sqes(ctx, to_submit);
 		if (ret != to_submit) {
-			mutex_unlock(&ctx->uring_lock);
+			io_ring_ctx_unlock(ctx);
 			goto out;
 		}
 		if (flags & IORING_ENTER_GETEVENTS) {
 			if (ctx->syscall_iopoll)
 				goto iopoll_locked;
@@ -3534,11 +3543,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			 * it should handle ownership problems if any.
 			 */
 			if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
 				(void)io_run_local_work_locked(ctx, min_complete);
 		}
-		mutex_unlock(&ctx->uring_lock);
+		io_ring_ctx_unlock(ctx);
 	}
 
 	if (flags & IORING_ENTER_GETEVENTS) {
 		int ret2;
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d62b7d9fafed..a0580a1bf6b5 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -119,20 +119,35 @@ bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx,
 			bool cancel_all);
 
 void io_activate_pollwq(struct io_ring_ctx *ctx);
 
+static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
+{
+	mutex_lock(&ctx->uring_lock);
+}
+
+static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
+{
+	mutex_unlock(&ctx->uring_lock);
+}
+
+static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
+{
+	lockdep_assert_held(&ctx->uring_lock);
+}
+
 static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 {
 #if defined(CONFIG_PROVE_LOCKING)
 	lockdep_assert(in_task());
 
 	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
-		lockdep_assert_held(&ctx->uring_lock);
+		io_ring_ctx_assert_locked(ctx);
 
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
-		lockdep_assert_held(&ctx->uring_lock);
+		io_ring_ctx_assert_locked(ctx);
 	} else if (!ctx->task_complete) {
 		lockdep_assert_held(&ctx->completion_lock);
 	} else if (ctx->submitter_task) {
 		/*
 		 * ->submitter_task may be NULL and we can still post a CQE,
@@ -300,11 +315,11 @@ static inline void io_put_file(struct io_kiocb *req)
 }
 
 static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx,
 					 unsigned issue_flags)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
 		mutex_unlock(&ctx->uring_lock);
 }
 
 static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
@@ -316,11 +331,11 @@ static inline void io_ring_submit_lock(struct io_ring_ctx *ctx,
 	 * The only exception is when we've detached the request and issue it
 	 * from an async worker thread, grab the lock for that case.
 	 */
 	if (unlikely(issue_flags & IO_URING_F_UNLOCKED))
 		mutex_lock(&ctx->uring_lock);
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 }
 
 static inline void io_commit_cqring(struct io_ring_ctx *ctx)
 {
 	/* order cqe stores with ring update */
@@ -428,24 +443,23 @@ static inline bool io_task_work_pending(struct io_ring_ctx *ctx)
 	return task_work_pending(current) || io_local_work_pending(ctx);
 }
 
 static inline void io_tw_lock(struct io_ring_ctx *ctx, io_tw_token_t tw)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 }
 
 /*
  * Don't complete immediately but use deferred completion infrastructure.
  * Protected by ->uring_lock and can only be used either with
  * IO_URING_F_COMPLETE_DEFER or inside a tw handler holding the mutex.
  */
 static inline void io_req_complete_defer(struct io_kiocb *req)
-	__must_hold(&req->ctx->uring_lock)
 {
 	struct io_submit_state *state = &req->ctx->submit_state;
 
-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);
 
 	wq_list_add_tail(&req->comp_list, &state->compl_reqs);
 }
 
 static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 3e9aab21af9d..ea6f3588d875 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -68,11 +68,11 @@ bool io_kbuf_commit(struct io_kiocb *req,
 }
 
 static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx *ctx,
 							unsigned int bgid)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 
 	return xa_load(&ctx->io_bl_xa, bgid);
 }
 
 static int io_buffer_add_list(struct io_ring_ctx *ctx,
@@ -337,11 +337,11 @@ int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	struct io_buffer_list *bl;
 	int ret;
 
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 
 	bl = io_buffer_get_list(ctx, arg->buf_group);
 	if (unlikely(!bl))
 		return -ENOENT;
 
@@ -393,11 +393,11 @@ static int io_remove_buffers_legacy(struct io_ring_ctx *ctx,
 {
 	unsigned long i = 0;
 	struct io_buffer *nxt;
 
 	/* protects io_buffers_cache */
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	WARN_ON_ONCE(bl->flags & IOBL_BUF_RING);
 
 	for (i = 0; i < nbufs && !list_empty(&bl->buf_list); i++) {
 		nxt = list_first_entry(&bl->buf_list, struct io_buffer, list);
 		list_del(&nxt->list);
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 8c92e9cde2c6..9dd248fcb213 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -14,11 +14,11 @@ static const struct ubuf_info_ops io_ubuf_ops;
 static void io_notif_tw_complete(struct io_kiocb *notif, io_tw_token_t tw)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 	struct io_ring_ctx *ctx = notif->ctx;
 
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 
 	do {
 		notif = cmd_to_io_kiocb(nd);
 
 		if (WARN_ON_ONCE(ctx != notif->ctx))
@@ -108,15 +108,16 @@ static const struct ubuf_info_ops io_ubuf_ops = {
 	.complete = io_tx_ubuf_complete,
 	.link_skb = io_link_skb,
 };
 
 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx)
-	__must_hold(&ctx->uring_lock)
 {
 	struct io_kiocb *notif;
 	struct io_notif_data *nd;
 
+	io_ring_ctx_assert_locked(ctx);
+
 	if (unlikely(!io_alloc_req(ctx, &notif)))
 		return NULL;
 	notif->ctx = ctx;
 	notif->opcode = IORING_OP_NOP;
 	notif->flags = 0;
diff --git a/io_uring/notif.h b/io_uring/notif.h
index f3589cfef4a9..c33c9a1179c9 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -31,14 +31,15 @@ static inline struct io_notif_data *io_notif_to_data(struct io_kiocb *notif)
 {
 	return io_kiocb_to_cmd(notif, struct io_notif_data);
 }
 
 static inline void io_notif_flush(struct io_kiocb *notif)
-	__must_hold(&notif->ctx->uring_lock)
 {
 	struct io_notif_data *nd = io_notif_to_data(notif);
 
+	io_ring_ctx_assert_locked(notif->ctx);
+
 	io_tx_ubuf_complete(NULL, &nd->uarg, true);
 }
 
 static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
 {
diff --git a/io_uring/poll.c b/io_uring/poll.c
index ea75c5cd81a0..ba71403c8fd8 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -121,11 +121,11 @@ static struct io_poll *io_poll_get_single(struct io_kiocb *req)
 static void io_poll_req_insert(struct io_kiocb *req)
 {
 	struct io_hash_table *table = &req->ctx->cancel_table;
 	u32 index = hash_long(req->cqe.user_data, table->hash_bits);
 
-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);
 
 	hlist_add_head(&req->hash_node, &table->hbs[index].list);
 }
 
 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 1e5b7833076a..1c1753de7340 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -347,11 +347,11 @@ static int __io_register_rsrc_update(struct io_ring_ctx *ctx, unsigned type,
 				     struct io_uring_rsrc_update2 *up,
 				     unsigned nr_args)
 {
 	__u32 tmp;
 
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 
 	if (check_add_overflow(up->offset, nr_args, &tmp))
 		return -EOVERFLOW;
 
 	switch (type) {
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index a3ca6ba66596..d537a3b895d6 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -2,10 +2,11 @@
 #ifndef IOU_RSRC_H
 #define IOU_RSRC_H
 
 #include <linux/io_uring_types.h>
 #include <linux/lockdep.h>
+#include "io_uring.h"
 
 #define IO_VEC_CACHE_SOFT_CAP		256
 
 enum {
 	IORING_RSRC_FILE		= 0,
@@ -97,11 +98,11 @@ static inline struct io_rsrc_node *io_rsrc_node_lookup(struct io_rsrc_data *data
 	return NULL;
 }
 
 static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
-	lockdep_assert_held(&ctx->uring_lock);
+	io_ring_ctx_assert_locked(ctx);
 	if (!--node->refs)
 		io_free_rsrc_node(ctx, node);
 }
 
 static inline bool io_reset_rsrc_node(struct io_ring_ctx *ctx,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index ab6b4afccec3..f00e02a02dc7 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -461,11 +461,11 @@ int io_read_mshot_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
 void io_readv_writev_cleanup(struct io_kiocb *req)
 {
-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);
 	io_rw_recycle(req, 0);
 }
 
 static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
 {
diff --git a/io_uring/waitid.c b/io_uring/waitid.c
index 26c118f3918d..f7a5054d4d81 100644
--- a/io_uring/waitid.c
+++ b/io_uring/waitid.c
@@ -114,11 +114,11 @@ static void io_waitid_complete(struct io_kiocb *req, int ret)
 	struct io_waitid *iw = io_kiocb_to_cmd(req, struct io_waitid);
 
 	/* anyone completing better be holding a reference */
 	WARN_ON_ONCE(!(atomic_read(&iw->refs) & IO_WAITID_REF_MASK));
 
-	lockdep_assert_held(&req->ctx->uring_lock);
+	io_ring_ctx_assert_locked(req->ctx);
 
 	hlist_del_init(&req->hash_node);
 
 	ret = io_waitid_finish(req, ret);
 	if (ret < 0)
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
                   ` (3 preceding siblings ...)
  2025-09-04 17:09 ` [PATCH v2 4/5] io_uring: factor out uring_lock helpers Caleb Sander Mateos
@ 2025-09-04 17:09 ` Caleb Sander Mateos
  2025-09-08 19:20   ` Jens Axboe
  2025-09-09 13:35 ` [PATCH v2 0/5] " Jens Axboe
  5 siblings, 1 reply; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-04 17:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel, Caleb Sander Mateos, syzbot

io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
workloads. Even when only one thread pinned to a single CPU is accessing
the io_ring_ctx, the atomic CASes required to lock and unlock the mutex
are very hot instructions. The mutex's primary purpose is to prevent
concurrent io_uring system calls on the same io_ring_ctx. However, there
is already a flag IORING_SETUP_SINGLE_ISSUER that promises only one
task will make io_uring_enter() and io_uring_register() system calls on
the io_ring_ctx once it's enabled.

So if the io_ring_ctx is set up with IORING_SETUP_SINGLE_ISSUER, skip the
uring_lock mutex_lock() and mutex_unlock() for the io_uring_enter()
submission as well as for io_handle_tw_list(). io_uring_enter()
submission calls __io_uring_add_tctx_node_from_submit() to verify that
the current task matches submitter_task for IORING_SETUP_SINGLE_ISSUER.
And task work can only be scheduled on tasks that submit io_uring
requests, so io_handle_tw_list() will also only be called on
submitter_task.

There is a goto from the io_uring_enter() submission to the middle of
the IOPOLL block which assumed the uring_lock would already be held.
This is no longer the case for IORING_SETUP_SINGLE_ISSUER, so goto the
preceding mutex_lock() in that case.

It may be possible to avoid taking uring_lock in other places too for
IORING_SETUP_SINGLE_ISSUER, but these two cover the primary hot paths.
The uring_lock in io_uring_register() is necessary at least before the
io_uring is enabled because submitter_task isn't set yet. uring_lock is
also used to synchronize IOPOLL on submitting tasks with io_uring worker
tasks, so it's still needed there. But in principle, it should be
possible to remove the mutex entirely for IORING_SETUP_SINGLE_ISSUER by
running any code needing exclusive access to the io_ring_ctx in task
work context on submitter_task.

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Tested-by: syzbot@syzkaller.appspotmail.com
---
 io_uring/io_uring.c |  6 +++++-
 io_uring/io_uring.h | 14 ++++++++++++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 69e1175256bb..b743644a3fac 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3534,12 +3534,15 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		if (ret != to_submit) {
 			io_ring_ctx_unlock(ctx);
 			goto out;
 		}
 		if (flags & IORING_ENTER_GETEVENTS) {
-			if (ctx->syscall_iopoll)
+			if (ctx->syscall_iopoll) {
+				if (ctx->flags & IORING_SETUP_SINGLE_ISSUER)
+					goto iopoll;
 				goto iopoll_locked;
+			}
 			/*
 			 * Ignore errors, we'll soon call io_cqring_wait() and
 			 * it should handle ownership problems if any.
 			 */
 			if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
@@ -3556,10 +3559,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 			 * We disallow the app entering submit/complete with
 			 * polling, but we still need to lock the ring to
 			 * prevent racing with polled issue that got punted to
 			 * a workqueue.
 			 */
+iopoll:
 			mutex_lock(&ctx->uring_lock);
 iopoll_locked:
 			ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 			if (likely(!ret2))
 				ret2 = io_iopoll_check(ctx, min_complete);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index a0580a1bf6b5..7296b12b0897 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -121,20 +121,34 @@ bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx,
 
 void io_activate_pollwq(struct io_ring_ctx *ctx);
 
 static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER) {
+		WARN_ON_ONCE(current != ctx->submitter_task);
+		return;
+	}
+
 	mutex_lock(&ctx->uring_lock);
 }
 
 static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER) {
+		WARN_ON_ONCE(current != ctx->submitter_task);
+		return;
+	}
+
 	mutex_unlock(&ctx->uring_lock);
 }
 
 static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+	    current == ctx->submitter_task)
+		return;
+
 	lockdep_assert_held(&ctx->uring_lock);
 }
 
 static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
  2025-09-04 17:09 ` [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
@ 2025-09-08 14:13   ` Jens Axboe
  2025-09-08 18:11     ` Caleb Sander Mateos
  0 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2025-09-08 14:13 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: io-uring, linux-kernel

On 9/4/25 11:09 AM, Caleb Sander Mateos wrote:
> IORING_SETUP_SINGLE_ISSUER doesn't currently enable any optimizations,
> but it will soon be used to avoid taking io_ring_ctx's uring_lock when
> submitting from the single issuer task. If the IORING_SETUP_SQPOLL flag
> is set, the SQ thread is the sole task issuing SQEs. However, other
> tasks may make io_uring_register() syscalls, which must be synchronized
> with SQE submission. So it wouldn't be safe to skip the uring_lock
> around the SQ thread's submission even if IORING_SETUP_SINGLE_ISSUER is
> set. Therefore, clear IORING_SETUP_SINGLE_ISSUER from the io_ring_ctx
> flags if IORING_SETUP_SQPOLL is set.
> 
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> ---
>  io_uring/io_uring.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 42f6bfbb99d3..c7af9dc3d95a 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3724,10 +3724,19 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
>  	 */
>  	if ((flags & (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED)) ==
>  	    (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED))
>  		return -EINVAL;
>  
> +	/*
> +	 * If IORING_SETUP_SQPOLL is set, only the SQ thread issues SQEs,
> +	 * but other threads may call io_uring_register() concurrently.
> +	 * We still need uring_lock to synchronize these io_ring_ctx accesses,
> +	 * so disable the single issuer optimizations.
> +	 */
> +	if (flags & IORING_SETUP_SQPOLL)
> +		p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
> +

As mentioned I think this is fine. Just for posterity, one solution
here would be to require that the task doing e.g. io_uring_register() on
a setup with SINGLE_ISSUER|SQPOLL park and unpark the SQ thread before
doing what it needs to do. That should get us most/all of the way there
to enabling it with SQPOLL as well.
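
(Purely illustrative, not something proposed in this series: roughly what
that could look like, reusing the existing io_sq_thread_park()/
io_sq_thread_unpark() helpers. The wrapper name is made up, and lock
ordering against sqd->lock/uring_lock is glossed over.)

static int io_register_with_sqd_parked(struct io_ring_ctx *ctx,
				       unsigned int opcode,
				       void __user *arg, unsigned int nr_args)
{
	struct io_sq_data *sqd = ctx->sq_data;
	int ret;

	if (sqd)
		io_sq_thread_park(sqd);	/* SQ thread stops submitting */

	ret = __io_uring_register(ctx, opcode, arg, nr_args);

	if (sqd)
		io_sq_thread_unpark(sqd);
	return ret;
}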

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
  2025-09-08 14:13   ` Jens Axboe
@ 2025-09-08 18:11     ` Caleb Sander Mateos
  2025-09-08 19:19       ` Jens Axboe
  0 siblings, 1 reply; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-08 18:11 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, linux-kernel

On Mon, Sep 8, 2025 at 7:13 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/4/25 11:09 AM, Caleb Sander Mateos wrote:
> > IORING_SETUP_SINGLE_ISSUER doesn't currently enable any optimizations,
> > but it will soon be used to avoid taking io_ring_ctx's uring_lock when
> > submitting from the single issuer task. If the IORING_SETUP_SQPOLL flag
> > is set, the SQ thread is the sole task issuing SQEs. However, other
> > tasks may make io_uring_register() syscalls, which must be synchronized
> > with SQE submission. So it wouldn't be safe to skip the uring_lock
> > around the SQ thread's submission even if IORING_SETUP_SINGLE_ISSUER is
> > set. Therefore, clear IORING_SETUP_SINGLE_ISSUER from the io_ring_ctx
> > flags if IORING_SETUP_SQPOLL is set.
> >
> > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > ---
> >  io_uring/io_uring.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> > index 42f6bfbb99d3..c7af9dc3d95a 100644
> > --- a/io_uring/io_uring.c
> > +++ b/io_uring/io_uring.c
> > @@ -3724,10 +3724,19 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
> >        */
> >       if ((flags & (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED)) ==
> >           (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED))
> >               return -EINVAL;
> >
> > +     /*
> > +      * If IORING_SETUP_SQPOLL is set, only the SQ thread issues SQEs,
> > +      * but other threads may call io_uring_register() concurrently.
> > +      * We still need uring_lock to synchronize these io_ring_ctx accesses,
> > +      * so disable the single issuer optimizations.
> > +      */
> > +     if (flags & IORING_SETUP_SQPOLL)
> > +             p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
> > +
>
> As mentioned I think this is fine. Just for posterity, one solution
> here would be to require that the task doing eg io_uring_register() on a
> setup with SINGLE_ISSUER|SQPOLL would be required to park and unpark the
> SQ thread before doing what it needs to do. That should get us most/all
> of the way there to enabling it with SQPOLL as well.

Right, though that may make io_uring_register() significantly slower
and more disruptive to the I/O path. Another option would be to proxy all
registrations to the SQ thread via task_work. I think leaving the
current behavior as-is makes the most sense to avoid any regressions.
If someone is interested in optimizing the IORING_SETUP_SQPOLL &&
IORING_SETUP_SINGLE_ISSUER use case, they're more than welcome to!

I appreciate your feedback on the series. Do you have any other thoughts on it?

Best,
Caleb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
  2025-09-08 18:11     ` Caleb Sander Mateos
@ 2025-09-08 19:19       ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2025-09-08 19:19 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: io-uring, linux-kernel

On 9/8/25 12:11 PM, Caleb Sander Mateos wrote:
> On Mon, Sep 8, 2025 at 7:13 AM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 9/4/25 11:09 AM, Caleb Sander Mateos wrote:
>>> IORING_SETUP_SINGLE_ISSUER doesn't currently enable any optimizations,
>>> but it will soon be used to avoid taking io_ring_ctx's uring_lock when
>>> submitting from the single issuer task. If the IORING_SETUP_SQPOLL flag
>>> is set, the SQ thread is the sole task issuing SQEs. However, other
>>> tasks may make io_uring_register() syscalls, which must be synchronized
>>> with SQE submission. So it wouldn't be safe to skip the uring_lock
>>> around the SQ thread's submission even if IORING_SETUP_SINGLE_ISSUER is
>>> set. Therefore, clear IORING_SETUP_SINGLE_ISSUER from the io_ring_ctx
>>> flags if IORING_SETUP_SQPOLL is set.
>>>
>>> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
>>> ---
>>>  io_uring/io_uring.c | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 42f6bfbb99d3..c7af9dc3d95a 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -3724,10 +3724,19 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
>>>        */
>>>       if ((flags & (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED)) ==
>>>           (IORING_SETUP_CQE32|IORING_SETUP_CQE_MIXED))
>>>               return -EINVAL;
>>>
>>> +     /*
>>> +      * If IORING_SETUP_SQPOLL is set, only the SQ thread issues SQEs,
>>> +      * but other threads may call io_uring_register() concurrently.
>>> +      * We still need uring_lock to synchronize these io_ring_ctx accesses,
>>> +      * so disable the single issuer optimizations.
>>> +      */
>>> +     if (flags & IORING_SETUP_SQPOLL)
>>> +             p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
>>> +
>>
>> As mentioned I think this is fine. Just for posterity, one solution
>> here would be to require that the task doing eg io_uring_register() on a
>> setup with SINGLE_ISSUER|SQPOLL would be required to park and unpark the
>> SQ thread before doing what it needs to do. That should get us most/all
>> of the way there to enabling it with SQPOLL as well.
> 
> Right, though that may make io_uring_register() significantly slower
> and disruptive to the I/O path. Another option would be to proxy all
> registrations to the SQ thread via task_work. I think leaving the
> current behavior as-is makes the most sense to avoid any regressions.
> If someone is interested in optimizing the IORING_SETUP_SQPOLL &&
> IORING_SETUP_SINGLE_ISSUER use case, they're more than welcome to!

True, though for most cases that won't matter, but for some it certainly
could. I certainly agree that this is a problem that's best deferred
anyway; SQPOLL is a bit of an oddball use case.

> I appreciate your feedback on the series. Do you have any other
> thoughts on it?

Looks pretty clean to me, no big concerns honestly.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-04 17:09 ` [PATCH v2 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
@ 2025-09-08 19:20   ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2025-09-08 19:20 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: io-uring, linux-kernel, syzbot

On 9/4/25 11:09 AM, Caleb Sander Mateos wrote:
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index a0580a1bf6b5..7296b12b0897 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -121,20 +121,34 @@ bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx,
>  
>  void io_activate_pollwq(struct io_ring_ctx *ctx);
>  
>  static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
>  {
> +	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER) {
> +		WARN_ON_ONCE(current != ctx->submitter_task);
> +		return;
> +	}
> +
>  	mutex_lock(&ctx->uring_lock);
>  }
>  
>  static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
>  {
> +	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER) {
> +		WARN_ON_ONCE(current != ctx->submitter_task);
> +		return;
> +	}
> +
>  	mutex_unlock(&ctx->uring_lock);
>  }

I do want to get rid of these WARN_ON_ONCE() down the line, but it's
prudent to keep them there for now.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
  2025-09-04 17:08 ` [PATCH v2 2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() Caleb Sander Mateos
@ 2025-09-09 13:35   ` Jens Axboe
  0 siblings, 0 replies; 18+ messages in thread
From: Jens Axboe @ 2025-09-09 13:35 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: io-uring, linux-kernel

On 9/4/25 11:08 AM, Caleb Sander Mateos wrote:
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 2d15b8785a95..1e5b7833076a 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -1298,14 +1298,21 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
>  
>  	src_ctx = file->private_data;
>  	if (src_ctx != ctx) {
>  		mutex_unlock(&ctx->uring_lock);
>  		lock_two_rings(ctx, src_ctx);
> +
> +		if (src_ctx->submitter_task && 
> +		    src_ctx->submitter_task != current) {
> +			ret = -EEXIST;
> +			goto out;
> +		}

Spurious end-of-line space detected here, pruned while merging.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
                   ` (4 preceding siblings ...)
  2025-09-04 17:09 ` [PATCH v2 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
@ 2025-09-09 13:35 ` Jens Axboe
  2025-09-10 11:57   ` Pavel Begunkov
  5 siblings, 1 reply; 18+ messages in thread
From: Jens Axboe @ 2025-09-09 13:35 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: io-uring, linux-kernel


On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> an io_uring doesn't actually enable any additional optimizations (aside
> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> submits SQEs to skip taking the uring_lock mutex in the submission and
> task work paths.
> 
> [...]

Applied, thanks!

[1/5] io_uring: don't include filetable.h in io_uring.h
      commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
[2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
      commit: 2f076a453f75de691a081c89bce31b530153d53b
[3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
      commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
[4/5] io_uring: factor out uring_lock helpers
      commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
[5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
      commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-09 13:35 ` [PATCH v2 0/5] " Jens Axboe
@ 2025-09-10 11:57   ` Pavel Begunkov
  2025-09-10 15:36     ` Jens Axboe
  2025-09-11 16:14     ` Caleb Sander Mateos
  0 siblings, 2 replies; 18+ messages in thread
From: Pavel Begunkov @ 2025-09-10 11:57 UTC (permalink / raw)
  To: Jens Axboe, Caleb Sander Mateos; +Cc: io-uring, linux-kernel

On 9/9/25 14:35, Jens Axboe wrote:
> 
> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
>> an io_uring doesn't actually enable any additional optimizations (aside
>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
>> submits SQEs to skip taking the uring_lock mutex in the submission and
>> task work paths.
>>
>> [...]
> 
> Applied, thanks!
> 
> [1/5] io_uring: don't include filetable.h in io_uring.h
>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
>        commit: 2f076a453f75de691a081c89bce31b530153d53b
> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> [4/5] io_uring: factor out uring_lock helpers
>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2

FWIW, from a glance that should be quite broken, there is a bunch of
bits protected from parallel use by the lock. I described this
optimisation a few years back, around when SINGLE_ISSUER and the
DEFER_TASKRUN locking model were first introduced, but to this day I
think it's not worth it as it'll be a major pain for any future changes.
It would've been more feasible if links weren't a thing. Though, none of
it is my problem anymore, and I'm not insisting.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-10 11:57   ` Pavel Begunkov
@ 2025-09-10 15:36     ` Jens Axboe
  2025-09-11 10:36       ` Pavel Begunkov
                         ` (2 more replies)
  2025-09-11 16:14     ` Caleb Sander Mateos
  1 sibling, 3 replies; 18+ messages in thread
From: Jens Axboe @ 2025-09-10 15:36 UTC (permalink / raw)
  To: Pavel Begunkov, Caleb Sander Mateos; +Cc: io-uring, linux-kernel

On 9/10/25 5:57 AM, Pavel Begunkov wrote:
> On 9/9/25 14:35, Jens Axboe wrote:
>>
>> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
>>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
>>> an io_uring doesn't actually enable any additional optimizations (aside
>>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
>>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
>>> submits SQEs to skip taking the uring_lock mutex in the submission and
>>> task work paths.
>>>
>>> [...]
>>
>> Applied, thanks!
>>
>> [1/5] io_uring: don't include filetable.h in io_uring.h
>>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
>> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
>>        commit: 2f076a453f75de691a081c89bce31b530153d53b
>> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
>>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
>> [4/5] io_uring: factor out uring_lock helpers
>>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
>> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
> 
> FWIW, from a glance that should be quite broken, there is a bunch of
> bits protected from parallel use by the lock. I described this
> optimisation few years back around when first introduced SINGLE_ISSUER
> and the DEFER_TASKRUN locking model, but to this day think it's not
> worth it as it'll be a major pain for any future changes. It would've
> been more feasible if links wasn't a thing. Though, none of it is
> my problem anymore, and I'm not insisting.

Hmm yes, was actually pondering this last night as well and was going
to take a closer look today as I have a flight coming up. I'll leave
them in there for now just to see if syzbot finds anything, and take
that closer look and see if it's salvageable for now or if we just need
a new revised take on this.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-10 15:36     ` Jens Axboe
@ 2025-09-11 10:36       ` Pavel Begunkov
  2025-09-30 23:37       ` Caleb Sander Mateos
  2025-11-03 20:47       ` Caleb Sander Mateos
  2 siblings, 0 replies; 18+ messages in thread
From: Pavel Begunkov @ 2025-09-11 10:36 UTC (permalink / raw)
  To: Jens Axboe, Caleb Sander Mateos; +Cc: io-uring, linux-kernel

On 9/10/25 16:36, Jens Axboe wrote:
> On 9/10/25 5:57 AM, Pavel Begunkov wrote:
>> On 9/9/25 14:35, Jens Axboe wrote:
>>>
>>> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
>>>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
>>>> an io_uring doesn't actually enable any additional optimizations (aside
>>>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
>>>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
>>>> submits SQEs to skip taking the uring_lock mutex in the submission and
>>>> task work paths.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>>
>>> [1/5] io_uring: don't include filetable.h in io_uring.h
>>>         commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
>>> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
>>>         commit: 2f076a453f75de691a081c89bce31b530153d53b
>>> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
>>>         commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
>>> [4/5] io_uring: factor out uring_lock helpers
>>>         commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
>>> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>>>         commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
>>
>> FWIW, from a glance that should be quite broken, there is a bunch of
>> bits protected from parallel use by the lock. I described this
>> optimisation few years back around when first introduced SINGLE_ISSUER
>> and the DEFER_TASKRUN locking model, but to this day think it's not
>> worth it as it'll be a major pain for any future changes. It would've
>> been more feasible if links wasn't a thing. Though, none of it is
>> my problem anymore, and I'm not insisting.
> 
> Hmm yes, was actually pondering this last night as well and was going
> to take a closer look today as I have a flight coming up. I'll leave
> them in there for now just to see if syzbot finds anything, and take

Better not to? People are already nervous about security / bugs,
and there is no ambiguity about how and where it's wrong. Also, it
breaks uapi in a couple of places, with at least one of them being
a security hole.

> that closer look and see if it's salvageable for now or if we just need
> a new revised take on this.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-10 11:57   ` Pavel Begunkov
  2025-09-10 15:36     ` Jens Axboe
@ 2025-09-11 16:14     ` Caleb Sander Mateos
  1 sibling, 0 replies; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-11 16:14 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, linux-kernel

On Wed, Sep 10, 2025 at 4:56 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 9/9/25 14:35, Jens Axboe wrote:
> >
> > On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> >> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> >> an io_uring doesn't actually enable any additional optimizations (aside
> >> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> >> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> >> submits SQEs to skip taking the uring_lock mutex in the submission and
> >> task work paths.
> >>
> >> [...]
> >
> > Applied, thanks!
> >
> > [1/5] io_uring: don't include filetable.h in io_uring.h
> >        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> > [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
> >        commit: 2f076a453f75de691a081c89bce31b530153d53b
> > [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> >        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> > [4/5] io_uring: factor out uring_lock helpers
> >        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> > [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> >        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
>
> FWIW, from a glance that should be quite broken, there is a bunch of
> bits protected from parallel use by the lock. I described this
> optimisation few years back around when first introduced SINGLE_ISSUER
> and the DEFER_TASKRUN locking model, but to this day think it's not
> worth it as it'll be a major pain for any future changes. It would've
> been more feasible if links wasn't a thing. Though, none of it is
> my problem anymore, and I'm not insisting.

If you have a link to this prior discussion, I'd appreciate it. I had
tried searching through the io-uring lore archives, but apparently I
don't know the magic keywords. If there are specific issues with this
locking model, I'd like to understand and address them. But opaque
references to known "bugs" are not helpful in improving these patches.

Thanks,
Caleb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-10 15:36     ` Jens Axboe
  2025-09-11 10:36       ` Pavel Begunkov
@ 2025-09-30 23:37       ` Caleb Sander Mateos
  2025-11-03 20:47       ` Caleb Sander Mateos
  2 siblings, 0 replies; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-09-30 23:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-kernel

On Wed, Sep 10, 2025 at 8:36 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/10/25 5:57 AM, Pavel Begunkov wrote:
> > On 9/9/25 14:35, Jens Axboe wrote:
> >>
> >> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> >>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> >>> an io_uring doesn't actually enable any additional optimizations (aside
> >>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> >>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> >>> submits SQEs to skip taking the uring_lock mutex in the submission and
> >>> task work paths.
> >>>
> >>> [...]
> >>
> >> Applied, thanks!
> >>
> >> [1/5] io_uring: don't include filetable.h in io_uring.h
> >>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> >> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
> >>        commit: 2f076a453f75de691a081c89bce31b530153d53b
> >> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> >>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> >> [4/5] io_uring: factor out uring_lock helpers
> >>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> >> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> >>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
> >
> > FWIW, from a glance that should be quite broken, there is a bunch of
> > bits protected from parallel use by the lock. I described this
> > optimisation few years back around when first introduced SINGLE_ISSUER
> > and the DEFER_TASKRUN locking model, but to this day think it's not
> > worth it as it'll be a major pain for any future changes. It would've
> > been more feasible if links wasn't a thing. Though, none of it is
> > my problem anymore, and I'm not insisting.
>
> Hmm yes, was actually pondering this last night as well and was going
> to take a closer look today as I have a flight coming up. I'll leave
> them in there for now just to see if syzbot finds anything, and take
> that closer look and see if it's salvageable for now or if we just need
> a new revised take on this.

Hi Jens,
I'd love to understand the race condition concerns you have about not
holding the uring_lock during submission and task work. My limited
mental model of io_uring is that the io_ring_ctx is only accessed in
the context of the task that submitted work to it (during the
io_uring_enter() syscall or task work) or from some interrupt context
that reports a completed operation. Since it's not possible to block
on a mutex in interrupt context, the uring_lock couldn't be used to
synchronize anything running in interrupt context. And then all other
work would be running in the context of the single issuer task, which
couldn't race with itself. Perhaps io_uring worker threads complicate
this picture?
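To illustrate what I'm hoping to end up with, the locking would be
roughly like the sketch below; the helper names are hypothetical and
not necessarily what patch 4 actually adds:

/*
 * Hypothetical sketch only (would live in io_uring internals): skip the
 * mutex when SINGLE_ISSUER guarantees that only the submitter task can
 * reach this code.
 */
static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx)
{
        if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
                mutex_lock(&ctx->uring_lock);
}

static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx)
{
        if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
                mutex_unlock(&ctx->uring_lock);
}
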
No urgency on this, but I would love to find a way forward here.
Acquiring and releasing the uring_lock is among the hottest CPU
operations in some of our workloads, even though each mutex is only
ever accessed on one CPU. If the uring_lock is necessary to prevent some
other critical sections from racing, perhaps there's an alternate way
to synchronize them (e.g. by deferring them to task work). Or if the
racing sections are specific to certain uses of io_uring, maybe we
could add a setup flag allowing an io_uring user to indicate that it
won't use those features.

Thanks,
Caleb

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
  2025-09-10 15:36     ` Jens Axboe
  2025-09-11 10:36       ` Pavel Begunkov
  2025-09-30 23:37       ` Caleb Sander Mateos
@ 2025-11-03 20:47       ` Caleb Sander Mateos
  2 siblings, 0 replies; 18+ messages in thread
From: Caleb Sander Mateos @ 2025-11-03 20:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Pavel Begunkov, io-uring, linux-kernel

On Wed, Sep 10, 2025 at 8:36 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/10/25 5:57 AM, Pavel Begunkov wrote:
> > On 9/9/25 14:35, Jens Axboe wrote:
> >>
> >> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> >>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> >>> an io_uring doesn't actually enable any additional optimizations (aside
> >>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> >>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> >>> submits SQEs to skip taking the uring_lock mutex in the submission and
> >>> task work paths.
> >>>
> >>> [...]
> >>
> >> Applied, thanks!
> >>
> >> [1/5] io_uring: don't include filetable.h in io_uring.h
> >>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> >> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
> >>        commit: 2f076a453f75de691a081c89bce31b530153d53b
> >> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> >>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> >> [4/5] io_uring: factor out uring_lock helpers
> >>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> >> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> >>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
> >
> > FWIW, from a glance that should be quite broken, there is a bunch of
> > bits protected from parallel use by the lock. I described this
> > optimisation few years back around when first introduced SINGLE_ISSUER
> > and the DEFER_TASKRUN locking model, but to this day think it's not
> > worth it as it'll be a major pain for any future changes. It would've
> > been more feasible if links wasn't a thing. Though, none of it is
> > my problem anymore, and I'm not insisting.
>
> Hmm yes, was actually pondering this last night as well and was going
> to take a closer look today as I have a flight coming up. I'll leave
> them in there for now just to see if syzbot finds anything, and take
> that closer look and see if it's salvageable for now or if we just need
> a new revised take on this.

Is the concern the various IO_URING_F_UNLOCKED contexts (e.g. io_uring
worker threads) relying on uring_lock to synchronize access to the
io_ring_ctx with submitter_task? I think it would be possible to
provide mutual exclusion in those contexts using a task work item to
suspend submitter_task. When submitter_task picks up the task work, it
can unblock the thread running in IO_URING_F_UNLOCKED context, which
can then take the uring_lock as usual. Once it releases the
uring_lock, it can unblock submitter_task.
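A very rough sketch of that handshake, using hypothetical helpers that
don't exist in the tree today (error handling, e.g. for an exiting
submitter task, omitted):

#include <linux/completion.h>
#include <linux/task_work.h>

struct io_park {
        struct callback_head cb;
        struct completion parked;       /* submitter stopped in task work */
        struct completion resume;       /* worker done, submitter may go on */
};

static void io_park_submitter_tw(struct callback_head *cb)
{
        struct io_park *p = container_of(cb, struct io_park, cb);

        /* Runs on submitter_task: signal that it's quiescent, then wait. */
        complete(&p->parked);
        wait_for_completion(&p->resume);
}

/* Called from an IO_URING_F_UNLOCKED context, e.g. an io-wq worker. */
static void io_park_submitter(struct io_ring_ctx *ctx, struct io_park *p)
{
        init_completion(&p->parked);
        init_completion(&p->resume);
        init_task_work(&p->cb, io_park_submitter_tw);
        task_work_add(ctx->submitter_task, &p->cb, TWA_SIGNAL);
        wait_for_completion(&p->parked);
        /* Submitter is parked; taking uring_lock now excludes it as before. */
}

static void io_unpark_submitter(struct io_park *p)
{
        complete(&p->resume);
}

The IO_URING_F_UNLOCKED path would bracket its existing uring_lock
critical section with io_park_submitter()/io_unpark_submitter().
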
This approach could certainly add latency to taking uring_lock in
IO_URING_F_UNLOCKED contexts, though I don't expect that to be very
common in applications using io_uring. We could also add a new
setup flag to avoid changing the behavior for existing
IORING_SETUP_SINGLE_ISSUER users. What are your thoughts on this
approach?

Thanks,
Caleb

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-11-03 20:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-04 17:08 [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-09-04 17:08 ` [PATCH v2 1/5] io_uring: don't include filetable.h in io_uring.h Caleb Sander Mateos
2025-09-04 17:08 ` [PATCH v2 2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() Caleb Sander Mateos
2025-09-09 13:35   ` Jens Axboe
2025-09-04 17:09 ` [PATCH v2 3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL Caleb Sander Mateos
2025-09-08 14:13   ` Jens Axboe
2025-09-08 18:11     ` Caleb Sander Mateos
2025-09-08 19:19       ` Jens Axboe
2025-09-04 17:09 ` [PATCH v2 4/5] io_uring: factor out uring_lock helpers Caleb Sander Mateos
2025-09-04 17:09 ` [PATCH v2 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Caleb Sander Mateos
2025-09-08 19:20   ` Jens Axboe
2025-09-09 13:35 ` [PATCH v2 0/5] " Jens Axboe
2025-09-10 11:57   ` Pavel Begunkov
2025-09-10 15:36     ` Jens Axboe
2025-09-11 10:36       ` Pavel Begunkov
2025-09-30 23:37       ` Caleb Sander Mateos
2025-11-03 20:47       ` Caleb Sander Mateos
2025-09-11 16:14     ` Caleb Sander Mateos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox