public inbox for io-uring@vger.kernel.org
* [PATCHSET 0/2] Fix DEFER_TASKRUN ring resize flag manipulation
@ 2026-03-10 14:45 Jens Axboe
  2026-03-10 14:45 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
  2026-03-10 14:45 ` [PATCH 2/2] io_uring/eventfd: use ctx->rings_rcu for flags checking Jens Axboe
  0 siblings, 2 replies; 8+ messages in thread
From: Jens Axboe @ 2026-03-10 14:45 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, naup96721

Hi,

Two patches here:

1) Fix adding local task_work during a ring resize. There's a tiny
   window where a NULL ->rings pointer could be dereferenced.

2) The same issue exists in the eventfd handling, so apply the same kind
   of fix there.

Thanks to Hao-Yu Yang for the report and initial fix attempt, and Pavel
for a good suggestion on how best to handle this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
  2026-03-10 14:45 [PATCHSET 0/2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
@ 2026-03-10 14:45 ` Jens Axboe
  2026-03-11 11:13   ` Pavel Begunkov
  2026-03-10 14:45 ` [PATCH 2/2] io_uring/eventfd: use ctx->rings_rcu for flags checking Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-03-10 14:45 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, naup96721, Jens Axboe, stable

If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
the ring is being resized, it's possible for the OR'ing of
IORING_SQ_TASKRUN to happen in the small window between swapping in the
new rings and freeing the old ones.

Prevent this by adding a second ->rings pointer, ->rings_rcu, which is
protected by RCU. The task work flags manipulation already runs inside
an RCU read-side section, and if the rings replaced by a resize are only
freed after an RCU synchronize, then there's no need to add locking to
the fast path of task work additions.

Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
that supports ring resizing. If this ever changes, the other setup modes
will also need to use the io_ctx_mark_taskrun() helper.

Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  1 +
 io_uring/io_uring.c            |  2 ++
 io_uring/register.c            | 20 ++++++++++++++++++--
 io_uring/tw.c                  | 24 ++++++++++++++++++++++--
 4 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e4a82a6f817..dd1420bfcb73 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -388,6 +388,7 @@ struct io_ring_ctx {
 	 * regularly bounce b/w CPUs.
 	 */
 	struct {
+		struct io_rings	__rcu	*rings_rcu;
 		struct llist_head	work_llist;
 		struct llist_head	retry_llist;
 		unsigned long		check_cq;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ccab8562d273..20fdc442e014 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2066,6 +2066,7 @@ static void io_rings_free(struct io_ring_ctx *ctx)
 	io_free_region(ctx->user, &ctx->sq_region);
 	io_free_region(ctx->user, &ctx->ring_region);
 	ctx->rings = NULL;
+	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
 	ctx->sq_sqes = NULL;
 }
 
@@ -2703,6 +2704,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	if (ret)
 		return ret;
 	ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
+	rcu_assign_pointer(ctx->rings_rcu, rings);
 	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
 		ctx->sq_array = (u32 *)((char *)rings + rl->sq_array_offset);
 
diff --git a/io_uring/register.c b/io_uring/register.c
index a839b22fd392..5f2985ba0879 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -487,6 +487,18 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
 			 IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
 			 IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
 
+static void io_resize_assign_rings(struct io_ring_ctx *ctx, struct io_rings *rings)
+{
+	/*
+	 * Just mark any flag we may have missed and that the application
+	 * should act on unconditionally. Worst case it'll be an extra
+	 * syscall.
+	 */
+	atomic_or(IORING_SQ_TASKRUN | IORING_SQ_NEED_WAKEUP, &rings->sq_flags);
+	ctx->rings = rings;
+	rcu_assign_pointer(ctx->rings_rcu, rings);
+}
+
 static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 {
 	struct io_ctx_config config;
@@ -579,6 +591,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	spin_lock(&ctx->completion_lock);
 	o.rings = ctx->rings;
 	ctx->rings = NULL;
+	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
 	o.sq_sqes = ctx->sq_sqes;
 	ctx->sq_sqes = NULL;
 
@@ -604,7 +617,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	if (tail - old_head > p->cq_entries) {
 overflow:
 		/* restore old rings, and return -EOVERFLOW via cleanup path */
-		ctx->rings = o.rings;
+		io_resize_assign_rings(ctx, o.rings);
 		ctx->sq_sqes = o.sq_sqes;
 		to_free = &n;
 		ret = -EOVERFLOW;
@@ -633,7 +646,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	ctx->sq_entries = p->sq_entries;
 	ctx->cq_entries = p->cq_entries;
 
-	ctx->rings = n.rings;
+	io_resize_assign_rings(ctx, n.rings);
 	ctx->sq_sqes = n.sq_sqes;
 	swap_old(ctx, o, n, ring_region);
 	swap_old(ctx, o, n, sq_region);
@@ -642,6 +655,9 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 out:
 	spin_unlock(&ctx->completion_lock);
 	mutex_unlock(&ctx->mmap_lock);
+	/* Wait for concurrent io_ctx_mark_taskrun() */
+	if (to_free == &o)
+		synchronize_rcu();
 	io_register_free_rings(ctx, to_free);
 
 	if (ctx->sq_data)
diff --git a/io_uring/tw.c b/io_uring/tw.c
index 1ee2b8ab07c8..c104e1e30d7c 100644
--- a/io_uring/tw.c
+++ b/io_uring/tw.c
@@ -152,6 +152,23 @@ void tctx_task_work(struct callback_head *cb)
 	WARN_ON_ONCE(ret);
 }
 
+/*
+ * Sets IORING_SQ_TASKRUN in the sq_flags shared with userspace, using the
+ * RCU protected rings pointer to be safe against concurrent ring resizing.
+ * Must be called inside an RCU read-side critical section.
+ */
+static void io_ctx_mark_taskrun(struct io_ring_ctx *ctx)
+{
+	struct io_rings *rings;
+
+	if (!(ctx->flags & IORING_SETUP_TASKRUN_FLAG))
+		return;
+
+	rings = rcu_dereference(ctx->rings_rcu);
+	if (rings)
+		atomic_or(IORING_SQ_TASKRUN, &rings->sq_flags);
+}
+
 void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -206,8 +223,7 @@ void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
 	 */
 
 	if (!head) {
-		if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
-			atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+		io_ctx_mark_taskrun(ctx);
 		if (ctx->has_evfd)
 			io_eventfd_signal(ctx, false);
 	}
@@ -231,6 +247,10 @@ void io_req_normal_work_add(struct io_kiocb *req)
 	if (!llist_add(&req->io_task_work.node, &tctx->task_list))
 		return;
 
+	/*
+	 * Doesn't need to use ->rings_rcu, as resizing isn't supported for
+	 * !DEFER_TASKRUN.
+	 */
 	if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
 		atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] io_uring/eventfd: use ctx->rings_rcu for flags checking
  2026-03-10 14:45 [PATCHSET 0/2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
  2026-03-10 14:45 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
@ 2026-03-10 14:45 ` Jens Axboe
  1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-03-10 14:45 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, naup96721, Jens Axboe, stable

Similar to what commit e78f7b70e837 did for local task work additions,
use ->rings_rcu under RCU rather than dereferencing ->rings directly.
See that commit for more details.

Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/eventfd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/io_uring/eventfd.c b/io_uring/eventfd.c
index 78f8ab7db104..ab789e1ebe91 100644
--- a/io_uring/eventfd.c
+++ b/io_uring/eventfd.c
@@ -76,11 +76,15 @@ void io_eventfd_signal(struct io_ring_ctx *ctx, bool cqe_event)
 {
 	bool skip = false;
 	struct io_ev_fd *ev_fd;
-
-	if (READ_ONCE(ctx->rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
-		return;
+	struct io_rings *rings;
 
 	guard(rcu)();
+
+	rings = rcu_dereference(ctx->rings_rcu);
+	if (!rings)
+		return;
+	if (READ_ONCE(rings->cq_flags) & IORING_CQ_EVENTFD_DISABLED)
+		return;
 	ev_fd = rcu_dereference(ctx->io_ev_fd);
 	/*
 	 * Check again if ev_fd exists in case an io_eventfd_unregister call
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
  2026-03-10 14:45 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
@ 2026-03-11 11:13   ` Pavel Begunkov
  2026-03-11 13:05     ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2026-03-11 11:13 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: naup96721, stable

On 3/10/26 14:45, Jens Axboe wrote:
> If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
> the ring is being resized, it's possible for the OR'ing of
> IORING_SQ_TASKRUN to happen in the small window between swapping in the
> new rings and freeing the old ones.
> 
> Prevent this by adding a second ->rings pointer, ->rings_rcu, which is
> protected by RCU. The task work flags manipulation already runs inside
> an RCU read-side section, and if the rings replaced by a resize are only
> freed after an RCU synchronize, then there's no need to add locking to
> the fast path of task work additions.
> 
> Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
> that supports ring resizing. If this ever changes, the other setup modes
> will also need to use the io_ctx_mark_taskrun() helper.
> 
> Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
> Cc: stable@vger.kernel.org
> Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
> Reported-by: Hao-Yu Yang <naup96721@gmail.com>
> Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>   include/linux/io_uring_types.h |  1 +
>   io_uring/io_uring.c            |  2 ++
>   io_uring/register.c            | 20 ++++++++++++++++++--
>   io_uring/tw.c                  | 24 ++++++++++++++++++++++--
>   4 files changed, 43 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 3e4a82a6f817..dd1420bfcb73 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -388,6 +388,7 @@ struct io_ring_ctx {
>   	 * regularly bounce b/w CPUs.
>   	 */
>   	struct {
> +		struct io_rings	__rcu	*rings_rcu;
>   		struct llist_head	work_llist;
>   		struct llist_head	retry_llist;
>   		unsigned long		check_cq;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index ccab8562d273..20fdc442e014 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -2066,6 +2066,7 @@ static void io_rings_free(struct io_ring_ctx *ctx)
>   	io_free_region(ctx->user, &ctx->sq_region);
>   	io_free_region(ctx->user, &ctx->ring_region);
>   	ctx->rings = NULL;
> +	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
>   	ctx->sq_sqes = NULL;
>   }
>   
> @@ -2703,6 +2704,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
>   	if (ret)
>   		return ret;
>   	ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
> +	rcu_assign_pointer(ctx->rings_rcu, rings);
>   	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
>   		ctx->sq_array = (u32 *)((char *)rings + rl->sq_array_offset);
>   
> diff --git a/io_uring/register.c b/io_uring/register.c
> index a839b22fd392..5f2985ba0879 100644
> --- a/io_uring/register.c
> +++ b/io_uring/register.c
> @@ -487,6 +487,18 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
>   			 IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
>   			 IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
>   
> +static void io_resize_assign_rings(struct io_ring_ctx *ctx, struct io_rings *rings)
> +{
> +	/*
> +	 * Just mark any flag we may have missed and that the application
> +	 * should act on unconditionally. Worst case it'll be an extra
> +	 * syscall.
> +	 */
> +	atomic_or(IORING_SQ_TASKRUN | IORING_SQ_NEED_WAKEUP, &rings->sq_flags);
> +	ctx->rings = rings;
> +	rcu_assign_pointer(ctx->rings_rcu, rings);
> +}
> +
>   static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>   {
>   	struct io_ctx_config config;
> @@ -579,6 +591,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>   	spin_lock(&ctx->completion_lock);
>   	o.rings = ctx->rings;
>   	ctx->rings = NULL;
> +	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
>   	o.sq_sqes = ctx->sq_sqes;
>   	ctx->sq_sqes = NULL;

It would be better to not have a transient NULL, and then there is no
need to check for that in task_work. I.e., don't zero the pointer, and
only assign the new value once you have successfully created a new set
of rings.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
  2026-03-11 11:13   ` Pavel Begunkov
@ 2026-03-11 13:05     ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-03-11 13:05 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: naup96721, stable

On 3/11/26 5:13 AM, Pavel Begunkov wrote:
> On 3/10/26 14:45, Jens Axboe wrote:
>> If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
>> the ring is being resized, it's possible for the OR'ing of
>> IORING_SQ_TASKRUN to happen in the small window between swapping in the
>> new rings and freeing the old ones.
>>
>> Prevent this by adding a second ->rings pointer, ->rings_rcu, which is
>> protected by RCU. The task work flags manipulation already runs inside
>> an RCU read-side section, and if the rings replaced by a resize are only
>> freed after an RCU synchronize, then there's no need to add locking to
>> the fast path of task work additions.
>>
>> Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
>> that supports ring resizing. If this ever changes, the other setup modes
>> will also need to use the io_ctx_mark_taskrun() helper.
>>
>> Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
>> Cc: stable@vger.kernel.org
>> Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
>> Reported-by: Hao-Yu Yang <naup96721@gmail.com>
>> Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>   include/linux/io_uring_types.h |  1 +
>>   io_uring/io_uring.c            |  2 ++
>>   io_uring/register.c            | 20 ++++++++++++++++++--
>>   io_uring/tw.c                  | 24 ++++++++++++++++++++++--
>>   4 files changed, 43 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 3e4a82a6f817..dd1420bfcb73 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -388,6 +388,7 @@ struct io_ring_ctx {
>>        * regularly bounce b/w CPUs.
>>        */
>>       struct {
>> +        struct io_rings    __rcu    *rings_rcu;
>>           struct llist_head    work_llist;
>>           struct llist_head    retry_llist;
>>           unsigned long        check_cq;
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index ccab8562d273..20fdc442e014 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -2066,6 +2066,7 @@ static void io_rings_free(struct io_ring_ctx *ctx)
>>       io_free_region(ctx->user, &ctx->sq_region);
>>       io_free_region(ctx->user, &ctx->ring_region);
>>       ctx->rings = NULL;
>> +    RCU_INIT_POINTER(ctx->rings_rcu, NULL);
>>       ctx->sq_sqes = NULL;
>>   }
>>   @@ -2703,6 +2704,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
>>       if (ret)
>>           return ret;
>>       ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
>> +    rcu_assign_pointer(ctx->rings_rcu, rings);
>>       if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
>>           ctx->sq_array = (u32 *)((char *)rings + rl->sq_array_offset);
>>   diff --git a/io_uring/register.c b/io_uring/register.c
>> index a839b22fd392..5f2985ba0879 100644
>> --- a/io_uring/register.c
>> +++ b/io_uring/register.c
>> @@ -487,6 +487,18 @@ static void io_register_free_rings(struct io_ring_ctx *ctx,
>>                IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \
>>                IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED)
>>   +static void io_resize_assign_rings(struct io_ring_ctx *ctx, struct io_rings *rings)
>> +{
>> +    /*
>> +     * Just mark any flag we may have missed and that the application
>> +     * should act on unconditionally. Worst case it'll be an extra
>> +     * syscall.
>> +     */
>> +    atomic_or(IORING_SQ_TASKRUN | IORING_SQ_NEED_WAKEUP, &rings->sq_flags);
>> +    ctx->rings = rings;
>> +    rcu_assign_pointer(ctx->rings_rcu, rings);
>> +}
>> +
>>   static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>>   {
>>       struct io_ctx_config config;
>> @@ -579,6 +591,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>>       spin_lock(&ctx->completion_lock);
>>       o.rings = ctx->rings;
>>       ctx->rings = NULL;
>> +    RCU_INIT_POINTER(ctx->rings_rcu, NULL);
>>       o.sq_sqes = ctx->sq_sqes;
>>       ctx->sq_sqes = NULL;
> 
> It would be better to not have a transient NULL, and then there is no
> need to check for that in task_work. I.e., don't zero the pointer, and
> only assign the new value once you have successfully created a new set
> of rings.

That's a good idea, I like that. I'll make the change.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
  2026-03-11 13:11 [PATCHSET v2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
@ 2026-03-11 13:11 ` Jens Axboe
  2026-03-11 15:06   ` Keith Busch
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-03-11 13:11 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, naup96721, Jens Axboe, stable

If DEFER_TASKRUN | SETUP_TASKRUN is used and task work is added while
the ring is being resized, it's possible for the OR'ing of
IORING_SQ_TASKRUN to happen in the small window between swapping in the
new rings and freeing the old ones.

Prevent this by adding a second ->rings pointer, ->rings_rcu, which is
protected by RCU. The task work flags manipulation already runs inside
an RCU read-side section, and if the rings replaced by a resize are only
freed after an RCU synchronize, then there's no need to add locking to
the fast path of task work additions.

Note: this is only done for DEFER_TASKRUN, as that's the only setup mode
that supports ring resizing. If this ever changes, the other setup modes
will also need to use the io_ctx_mark_taskrun() helper.

Link: https://lore.kernel.org/io-uring/20260309062759.482210-1-naup96721@gmail.com/
Cc: stable@vger.kernel.org
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  1 +
 io_uring/io_uring.c            |  2 ++
 io_uring/register.c            | 11 +++++++++++
 io_uring/tw.c                  | 21 +++++++++++++++++++--
 4 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e4a82a6f817..dd1420bfcb73 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -388,6 +388,7 @@ struct io_ring_ctx {
 	 * regularly bounce b/w CPUs.
 	 */
 	struct {
+		struct io_rings	__rcu	*rings_rcu;
 		struct llist_head	work_llist;
 		struct llist_head	retry_llist;
 		unsigned long		check_cq;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index ccab8562d273..20fdc442e014 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2066,6 +2066,7 @@ static void io_rings_free(struct io_ring_ctx *ctx)
 	io_free_region(ctx->user, &ctx->sq_region);
 	io_free_region(ctx->user, &ctx->ring_region);
 	ctx->rings = NULL;
+	RCU_INIT_POINTER(ctx->rings_rcu, NULL);
 	ctx->sq_sqes = NULL;
 }
 
@@ -2703,6 +2704,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	if (ret)
 		return ret;
 	ctx->rings = rings = io_region_get_ptr(&ctx->ring_region);
+	rcu_assign_pointer(ctx->rings_rcu, rings);
 	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
 		ctx->sq_array = (u32 *)((char *)rings + rl->sq_array_offset);
 
diff --git a/io_uring/register.c b/io_uring/register.c
index a839b22fd392..6d3e65b17514 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -633,7 +633,15 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	ctx->sq_entries = p->sq_entries;
 	ctx->cq_entries = p->cq_entries;
 
+	/*
+	 * Just mark any flag we may have missed and that the application
+	 * should act on unconditionally. Worst case it'll be an extra
+	 * syscall.
+	 */
+	atomic_or(IORING_SQ_TASKRUN | IORING_SQ_NEED_WAKEUP, &n.rings->sq_flags);
 	ctx->rings = n.rings;
+	rcu_assign_pointer(ctx->rings_rcu, n.rings);
+
 	ctx->sq_sqes = n.sq_sqes;
 	swap_old(ctx, o, n, ring_region);
 	swap_old(ctx, o, n, sq_region);
@@ -642,6 +650,9 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 out:
 	spin_unlock(&ctx->completion_lock);
 	mutex_unlock(&ctx->mmap_lock);
+	/* Wait for concurrent io_ctx_mark_taskrun() */
+	if (to_free == &o)
+		synchronize_rcu();
 	io_register_free_rings(ctx, to_free);
 
 	if (ctx->sq_data)
diff --git a/io_uring/tw.c b/io_uring/tw.c
index 1ee2b8ab07c8..0c860a7e6c61 100644
--- a/io_uring/tw.c
+++ b/io_uring/tw.c
@@ -152,6 +152,20 @@ void tctx_task_work(struct callback_head *cb)
 	WARN_ON_ONCE(ret);
 }
 
+/*
+ * Sets IORING_SQ_TASKRUN in the sq_flags shared with userspace, using the
+ * RCU protected rings pointer to be safe against concurrent ring resizing.
+ * Must be called inside an RCU read-side critical section.
+ */
+static void io_ctx_mark_taskrun(struct io_ring_ctx *ctx)
+{
+	if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) {
+		struct io_rings *rings = rcu_dereference(ctx->rings_rcu);
+
+		atomic_or(IORING_SQ_TASKRUN, &rings->sq_flags);
+	}
+}
+
 void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
@@ -206,8 +220,7 @@ void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
 	 */
 
 	if (!head) {
-		if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
-			atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+		io_ctx_mark_taskrun(ctx);
 		if (ctx->has_evfd)
 			io_eventfd_signal(ctx, false);
 	}
@@ -231,6 +244,10 @@ void io_req_normal_work_add(struct io_kiocb *req)
 	if (!llist_add(&req->io_task_work.node, &tctx->task_list))
 		return;
 
+	/*
+	 * Doesn't need to use ->rings_rcu, as resizing isn't supported for
+	 * !DEFER_TASKRUN.
+	 */
 	if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
 		atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
  2026-03-11 13:11 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
@ 2026-03-11 15:06   ` Keith Busch
  2026-03-11 15:12     ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Keith Busch @ 2026-03-11 15:06 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, asml.silence, naup96721, stable

On Wed, Mar 11, 2026 at 07:11:55AM -0600, Jens Axboe wrote:
> +/*
> + * Sets IORING_SQ_TASKRUN in the sq_flags shared with userspace, using the
> + * RCU protected rings pointer to be safe against concurrent ring resizing.
> + * Must be called inside an RCU read-side critical section.

You can make the RCU requirement explicit in the code with:

	ASSERT(rcu_read_lock_held());

And debug kernels will catch misuse, too.

> + */
> +static void io_ctx_mark_taskrun(struct io_ring_ctx *ctx)
> +{
> +	if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) {
> +		struct io_rings *rings = rcu_dereference(ctx->rings_rcu);
> +
> +		atomic_or(IORING_SQ_TASKRUN, &rings->sq_flags);
> +	}
> +}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation
  2026-03-11 15:06   ` Keith Busch
@ 2026-03-11 15:12     ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-03-11 15:12 UTC (permalink / raw)
  To: Keith Busch; +Cc: io-uring, asml.silence, naup96721, stable

On 3/11/26 9:06 AM, Keith Busch wrote:
> On Wed, Mar 11, 2026 at 07:11:55AM -0600, Jens Axboe wrote:
>> +/*
>> + * Sets IORING_SQ_TASKRUN in the sq_flags shared with userspace, using the
>> + * RCU protected rings pointer to be safe against concurrent ring resizing.
>> + * Must be called inside an RCU read-side critical section.
> 
> You can make the rcu requirement explicit in the code with:
> 
> 	ASSERT(rcu_read_lock_held());
> 
> And debug kernels will catch misuse, too.

We have lockdep_assert_in_rcu_read_lock(), that should do it. I did
ponder that, and with it in place I could also drop the comment, as the
code is self-documenting at that point.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-03-11 15:12 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-10 14:45 [PATCHSET 0/2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
2026-03-10 14:45 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
2026-03-11 11:13   ` Pavel Begunkov
2026-03-11 13:05     ` Jens Axboe
2026-03-10 14:45 ` [PATCH 2/2] io_uring/eventfd: use ctx->rings_rcu for flags checking Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2026-03-11 13:11 [PATCHSET v2] Fix DEFER_TASKRUN ring resize flag manipulation Jens Axboe
2026-03-11 13:11 ` [PATCH 1/2] io_uring: ensure ctx->rings is stable for task work flags manipulation Jens Axboe
2026-03-11 15:06   ` Keith Busch
2026-03-11 15:12     ` Jens Axboe
