* [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
@ 2026-03-09 6:27 Hao-Yu Yang
2026-03-09 13:11 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: Hao-Yu Yang @ 2026-03-09 6:27 UTC (permalink / raw)
To: security; +Cc: naxboe, io-uring, linux-kernel, Hao-Yu Yang
During io_register_resize_rings execution, ctx->rings is temporarily
set to NULL before new ring memory is allocated. If a timer interrupt
fires during this window, the interrupt handler (via timerfd_tmrproc
-> io_poll_wake -> __io_req_task_work_add -> io_req_local_work_add)
attempts to access ctx->rings->sq_flags, causing a race condition and
a NULL pointer dereference.
BUG: kernel NULL pointer dereference, address: 0000000000000024
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<IRQ>
__io_poll_execute (io_uring/poll.c:223)
io_poll_wake (io_uring/poll.c:426)
__wake_up_common (kernel/sched/wait.c:109)
__wake_up_locked_key (kernel/sched/wait.c:167)
timerfd_tmrproc (./include/linux/spinlock.h:407 fs/timerfd.c:71 fs/timerfd.c:78)
? __pfx_timerfd_tmrproc (fs/timerfd.c:75)
__hrtimer_run_queues (kernel/time/hrtimer.c:1785 kernel/time/hrtimer.c:1849)
hrtimer_interrupt (kernel/time/hrtimer.c:1914)
__sysvec_apic_timer_interrupt (./arch/x86/include/asm/jump_label.h:37 ./arch/x86/include/asm/trace/irq_vectors.h:40 arch/x86/kernel/apic/apic.c:1063)
sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1056 arch/x86/kernel/apic/apic.c:1056)
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:697)
RIP: 0010:io_register_resize_rings (io_uring/register.c:593)
? io_register_resize_rings (io_uring/register.c:580)
__io_uring_register (io_uring/register.c:898)
? fget (fs/file.c:1114)
__x64_sys_io_uring_register (io_uring/register.c:1026 io_uring/register.c:1001 io_uring/register.c:1001)
x64_sys_call (arch/x86/entry/syscall_64.c:41)
do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
</TASK>
Fix by using spin_lock_irq/spin_unlock_irq instead of spin_lock/spin_unlock
in io_register_resize_rings. This disables IRQs while ctx->rings is set to
NULL, preventing interrupt handlers from executing during the window when
ctx->rings is NULL.
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS")
Reported-by: Hao-Yu Yang <naup96721@gmail.com>
Signed-off-by: Hao-Yu Yang <naup96721@gmail.com>
---
io_uring/register.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/register.c b/io_uring/register.c
index 6015a3e9ce69..0526301f7a25 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -576,7 +576,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
* duration of the actual swap.
*/
mutex_lock(&ctx->mmap_lock);
- spin_lock(&ctx->completion_lock);
+ spin_lock_irq(&ctx->completion_lock);
o.rings = ctx->rings;
ctx->rings = NULL;
o.sq_sqes = ctx->sq_sqes;
@@ -640,7 +640,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
to_free = &o;
ret = 0;
out:
- spin_unlock(&ctx->completion_lock);
+ spin_unlock_irq(&ctx->completion_lock);
mutex_unlock(&ctx->mmap_lock);
io_register_free_rings(ctx, to_free);
--
2.34.1
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 6:27 [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings Hao-Yu Yang
@ 2026-03-09 13:11 ` Jens Axboe
2026-03-09 16:04 ` Linus Torvalds
0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2026-03-09 13:11 UTC (permalink / raw)
To: Hao-Yu Yang, security; +Cc: io-uring, linux-kernel
On 3/9/26 12:27 AM, Hao-Yu Yang wrote:
> During io_register_resize_rings execution, ctx->rings is temporarily
> set to NULL before new ring memory is allocated. If a timer interrupt
> fires during this window, the interrupt handler (via timerfd_tmrproc
> -> io_poll_wake -> __io_req_task_work_add -> io_req_local_work_add)
> attempts to access ctx->rings->sq_flags, causing a race condition and
> a NULL pointer dereference.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000024
> PF: supervisor read access in kernel mode
> PF: error_code(0x0000) - not-present page
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> Call Trace:
> <IRQ>
> __io_poll_execute (io_uring/poll.c:223)
> io_poll_wake (io_uring/poll.c:426)
> __wake_up_common (kernel/sched/wait.c:109)
> __wake_up_locked_key (kernel/sched/wait.c:167)
> timerfd_tmrproc (./include/linux/spinlock.h:407 fs/timerfd.c:71 fs/timerfd.c:78)
> ? __pfx_timerfd_tmrproc (fs/timerfd.c:75)
> __hrtimer_run_queues (kernel/time/hrtimer.c:1785 kernel/time/hrtimer.c:1849)
> hrtimer_interrupt (kernel/time/hrtimer.c:1914)
> __sysvec_apic_timer_interrupt (./arch/x86/include/asm/jump_label.h:37 ./arch/x86/include/asm/trace/irq_vectors.h:40 arch/x86/kernel/apic/apic.c:1063)
> sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1056 arch/x86/kernel/apic/apic.c:1056)
> </IRQ>
> <TASK>
> asm_sysvec_apic_timer_interrupt (./arch/x86/include/asm/idtentry.h:697)
> RIP: 0010:io_register_resize_rings (io_uring/register.c:593)
> ? io_register_resize_rings (io_uring/register.c:580)
> __io_uring_register (io_uring/register.c:898)
> ? fget (fs/file.c:1114)
> __x64_sys_io_uring_register (io_uring/register.c:1026 io_uring/register.c:1001 io_uring/register.c:1001)
> x64_sys_call (arch/x86/entry/syscall_64.c:41)
> do_syscall_64 (arch/x86/entry/syscall_64.c:63 arch/x86/entry/syscall)
> entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
> </TASK>
>
> Fix by using spin_lock_irq/spin_unlock_irq instead of spin_lock/spin_unlock
> in io_register_resize_rings. This disables IRQs while ctx->rings is set to
> NULL, preventing interrupt handlers from executing during the window when
> ctx->rings is NULL.
I see your crash, but this isn't the right fix - you can't just have
that one spot turn the completion lock into an IRQ-disabling one,
as that'll mess up the rest of the use cases in terms of lockdep. This
lock is never grabbed from IRQ context, hence it'd be misleading too.
You probably want something ala:
mutex_lock(&ctx->mmap_lock);
spin_lock(&ctx->completion_lock);
+ local_irq_disable();
...
+ local_irq_enable();
spin_unlock(&ctx->completion_lock);
mutex_unlock(&ctx->mmap_lock);
which I think should make lockdep happy too.
And then I think it wants a comment on top of that local_irq_disable(),
explaining how this is meant to prevent any irq/bh from triggering
task_work additions that touch ctx->rings to set the IORING_SQ_TASKRUN
flag.
Does that make sense? If so, please do send a v2!
--
Jens Axboe
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 13:11 ` Jens Axboe
@ 2026-03-09 16:04 ` Linus Torvalds
2026-03-09 16:29 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2026-03-09 16:04 UTC (permalink / raw)
To: Jens Axboe; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On Mon, 9 Mar 2026 at 06:11, Jens Axboe <axboe@kernel.dk> wrote:
>
> You probably want something ala:
>
> mutex_lock(&ctx->mmap_lock);
> spin_lock(&ctx->completion_lock);
> + local_irq_disable();
How could that ever work?
Irqs will happily continue on other CPUs, so disabling interrupts is
complete nonsense as far as I can tell - whether done with
spin_lock_irq() *or* with local_irq_disable().
So basically, touching ctx->rings from irq context in this section is
simply not an option - or the rings pointer just needs to be updated
atomically so that it doesn't matter.
I assume this was tested in qemu on a single-core setup, so that
fundamental mistake was hidden by an irrelevant configuration.
Where is the actual oops - for some inexplicable reason that had been
edited out, and it only had the call trace leading up to it? Based on
the incomplete information and the faulting address of 0x24, I'm
*guessing* that it is
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
in io_req_normal_work_add(), but that may be complete garbage.
So the actual fix may be to just make damn sure that
IORING_SETUP_TASKRUN_FLAG is *not* set when the rings are resized.
But for all I know, (a) I may be looking at entirely the wrong place
and (b) there might be millions of other places that want to access
ctx->rings, so the above may be the rantings of a crazy old man.
Get off my lawn.
Linus
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 16:04 ` Linus Torvalds
@ 2026-03-09 16:29 ` Jens Axboe
2026-03-09 18:34 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2026-03-09 16:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On Mar 9, 2026, at 10:05 AM, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>
> On Mon, 9 Mar 2026 at 06:11, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> You probably want something ala:
>>
>> mutex_lock(&ctx->mmap_lock);
>> spin_lock(&ctx->completion_lock);
>> + local_irq_disable();
>
> How could that ever work?
>
> Irqs will happily continue on other CPUs, so disabling interrupts is
> complete nonsense as far as I can tell - whether done with
> spin_lock_irq() *or* with local_irq_disable().
>
> So basically, touching ctx->rings from irq context in this section is
> simply not an option - or the rings pointer just needs to be updated
> atomically so that it doesn't matter.
>
> I assume this was tested in qemu on a single-core setup, so that
> fundamental mistake was hidden by an irrelevant configuration.
>
> Where is the actual oops - for some inexplicable reason that had been
> edited out, and it only had the call trace leading up to it? Based on
> the incomplete information and the faulting address of 0x24, I'm
> *guessing* that it is
>
> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>
> in io_req_normal_work_add(), but that may be complete garbage.
>
> So the actual fix may be to just make damn sure that
> IORING_SETUP_TASKRUN_FLAG is *not* set when the rings are resized.
>
> But for all I know, (a) I may be looking at entirely the wrong place
> and (b) there might be millions of other places that want to access
> ctx->rings, so the above may be the rantings of a crazy old man.
Nah you’re totally right. I’m operating on a few hours of sleep and on a plane. I’ll take a closer look later. The flag mask protecting it is a good idea; another option would be a dedicated irq-safe resize lock.
Jens
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 16:29 ` Jens Axboe
@ 2026-03-09 18:34 ` Jens Axboe
2026-03-09 18:35 ` Jens Axboe
2026-03-09 18:57 ` Pavel Begunkov
0 siblings, 2 replies; 10+ messages in thread
From: Jens Axboe @ 2026-03-09 18:34 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On 3/9/26 10:29 AM, Jens Axboe wrote:
> On Mar 9, 2026, at 10:05 AM, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>>
>> On Mon, 9 Mar 2026 at 06:11, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> You probably want something ala:
>>>
>>> mutex_lock(&ctx->mmap_lock);
>>> spin_lock(&ctx->completion_lock);
>>> + local_irq_disable();
>>
>> How could that ever work?
>>
>> Irqs will happily continue on other CPUs, so disabling interrupts is
>> complete nonsense as far as I can tell - whether done with
>> spin_lock_irq() *or* with local_irq_disable().
>>
>> So basically, touching ctx->rings from irq context in this section is
>> simply not an option - or the rings pointer just needs to be updated
>> atomically so that it doesn't matter.
>>
>> I assume this was tested in qemu on a single-core setup, so that
>> fundamental mistake was hidden by an irrelevant configuration.
>>
>> Where is the actual oops - for some inexplicable reason that had been
>> edited out, and it only had the call trace leading up to it? Based on
>> the incomplete information and the faulting address of 0x24, I'm
>> *guessing* that it is
>>
>> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
>> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>>
>> in io_req_normal_work_add(), but that may be complete garbage.
>>
>> So the actual fix may be to just make damn sure that
>> IORING_SETUP_TASKRUN_FLAG is *not* set when the rings are resized.
>>
>> But for all I know, (a) I may be looking at entirely the wrong place
>> and (b) there might be millions of other places that want to access
>> ctx->rings, so the above may be the rantings of a crazy old man.
>
> Nah you’re totally right. I’m operating on a few hours of sleep and on a
> plane. I’ll take a closer look later. The flag mask protecting it is a
> good idea; another option would be a dedicated irq-safe resize lock.
How about something like this? I don't particularly like using ->flags
for this, as these are otherwise static after the ring has been set up.
Hence it'd be better to just use a separate value for this,
->in_resize, and use smp_load_acquire/release. The write side can be as
expensive as we want it to be, as it's not a hot path at all. And the
acquire read should be lightweight enough here.
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e4a82a6f817..428eb5b2c624 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -394,6 +394,7 @@ struct io_ring_ctx {
atomic_t cq_wait_nr;
atomic_t cq_timeouts;
struct wait_queue_head cq_wait;
+ int in_resize;
} ____cacheline_aligned_in_smp;
/* timeouts */
diff --git a/io_uring/register.c b/io_uring/register.c
index 3378014e51fb..048a1dcd9df1 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -575,6 +575,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
* ctx->mmap_lock as well. Likewise, hold the completion lock over the
* duration of the actual swap.
*/
+ smp_store_release(&ctx->in_resize, 1);
mutex_lock(&ctx->mmap_lock);
spin_lock(&ctx->completion_lock);
o.rings = ctx->rings;
@@ -647,6 +648,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
if (ctx->sq_data)
io_sq_thread_unpark(ctx->sq_data);
+ smp_store_release(&ctx->in_resize, 0);
return ret;
}
diff --git a/io_uring/tw.c b/io_uring/tw.c
index 1ee2b8ab07c8..c66ffa787ec7 100644
--- a/io_uring/tw.c
+++ b/io_uring/tw.c
@@ -152,6 +152,13 @@ void tctx_task_work(struct callback_head *cb)
WARN_ON_ONCE(ret);
}
+static void io_mark_taskrun(struct io_ring_ctx *ctx)
+{
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG &&
+ !smp_load_acquire(&ctx->in_resize))
+ atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+}
+
void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
{
struct io_ring_ctx *ctx = req->ctx;
@@ -206,6 +213,7 @@ void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
*/
if (!head) {
+ io_mark_taskrun(ctx);
if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
if (ctx->has_evfd)
@@ -231,8 +239,7 @@ void io_req_normal_work_add(struct io_kiocb *req)
if (!llist_add(&req->io_task_work.node, &tctx->task_list))
return;
- if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
- atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+ io_mark_taskrun(ctx);
/* SQPOLL doesn't need the task_work added, it'll run it itself */
if (ctx->flags & IORING_SETUP_SQPOLL) {
--
Jens Axboe
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 18:34 ` Jens Axboe
@ 2026-03-09 18:35 ` Jens Axboe
2026-03-09 19:03 ` Linus Torvalds
2026-03-09 18:57 ` Pavel Begunkov
1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2026-03-09 18:35 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On 3/9/26 12:34 PM, Jens Axboe wrote:
> How about something like this? I don't particularly like using ->flags
> for this, as these are otherwise static after the ring has been set up.
> Hence it'd be better to just use a separate value for this,
> ->in_resize, and use smp_load_acquire/release. The write side can be as
> expensive as we want it to be, as it's not a hot path at all. And the
> acquire read should be lightweight enough here.
_actual_ patch, doesn't help if we don't kill the manual atomic_or()...
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3e4a82a6f817..428eb5b2c624 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -394,6 +394,7 @@ struct io_ring_ctx {
atomic_t cq_wait_nr;
atomic_t cq_timeouts;
struct wait_queue_head cq_wait;
+ int in_resize;
} ____cacheline_aligned_in_smp;
/* timeouts */
diff --git a/io_uring/register.c b/io_uring/register.c
index 3378014e51fb..048a1dcd9df1 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -575,6 +575,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
* ctx->mmap_lock as well. Likewise, hold the completion lock over the
* duration of the actual swap.
*/
+ smp_store_release(&ctx->in_resize, 1);
mutex_lock(&ctx->mmap_lock);
spin_lock(&ctx->completion_lock);
o.rings = ctx->rings;
@@ -647,6 +648,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
if (ctx->sq_data)
io_sq_thread_unpark(ctx->sq_data);
+ smp_store_release(&ctx->in_resize, 0);
return ret;
}
diff --git a/io_uring/tw.c b/io_uring/tw.c
index 1ee2b8ab07c8..3414cb27879a 100644
--- a/io_uring/tw.c
+++ b/io_uring/tw.c
@@ -152,6 +152,13 @@ void tctx_task_work(struct callback_head *cb)
WARN_ON_ONCE(ret);
}
+static void io_mark_taskrun(struct io_ring_ctx *ctx)
+{
+ if (ctx->flags & IORING_SETUP_TASKRUN_FLAG &&
+ !smp_load_acquire(&ctx->in_resize))
+ atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+}
+
void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
{
struct io_ring_ctx *ctx = req->ctx;
@@ -206,8 +213,7 @@ void io_req_local_work_add(struct io_kiocb *req, unsigned flags)
*/
if (!head) {
- if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
- atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+ io_mark_taskrun(ctx);
if (ctx->has_evfd)
io_eventfd_signal(ctx, false);
}
@@ -231,8 +237,7 @@ void io_req_normal_work_add(struct io_kiocb *req)
if (!llist_add(&req->io_task_work.node, &tctx->task_list))
return;
- if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
- atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
+ io_mark_taskrun(ctx);
/* SQPOLL doesn't need the task_work added, it'll run it itself */
if (ctx->flags & IORING_SETUP_SQPOLL) {
--
Jens Axboe
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 18:34 ` Jens Axboe
2026-03-09 18:35 ` Jens Axboe
@ 2026-03-09 18:57 ` Pavel Begunkov
2026-03-09 19:16 ` Jens Axboe
1 sibling, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2026-03-09 18:57 UTC (permalink / raw)
To: Jens Axboe, Linus Torvalds; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On 3/9/26 18:34, Jens Axboe wrote:
> On 3/9/26 10:29 AM, Jens Axboe wrote:
>>> On Mar 9, 2026, at 10:05 AM, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>>>
>>>> On Mon, 9 Mar 2026 at 06:11, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> You probably want something ala:
>>>>
>>>> mutex_lock(&ctx->mmap_lock);
>>>> spin_lock(&ctx->completion_lock);
>>>> + local_irq_disable();
>>>
>>> How could that ever work?
>>>
>>> Irqs will happily continue on other CPUs, so disabling interrupts is
>>> complete nonsense as far as I can tell - whether done with
>>> spin_lock_irq() *or* with local_irq_disable().
>>>
>>> So basically, touching ctx->rings from irq context in this section is
>>> simply not an option - or the rings pointer just needs to be updated
>>> atomically so that it doesn't matter.
>>>
>>> I assume this was tested in qemu on a single-core setup, so that
>>> fundamental mistake was hidden by an irrelevant configuration.
>>>
>>> Where is the actual oops - for some inexplicable reason that had been
>>> edited out, and it only had the call trace leading up to it? Based on
>>> the incomplete information and the faulting address of 0x24, I'm
>>> *guessing* that it is
>>>
>>> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
>>> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>>>
>>> in io_req_normal_work_add(), but that may be complete garbage.
>>>
>>> So the actual fix may be to just make damn sure that
>>> IORING_SETUP_TASKRUN_FLAG is *not* set when the rings are resized.
>>>
>>> But for all I know, (a) I may be looking at entirely the wrong place
>>> and (b) there might be millions of other places that want to access
>>> ctx->rings, so the above may be the rantings of a crazy old man.
>>
>> Nah you’re totally right. I’m operating on a few hours of sleep and on a
>> plane. I’ll take a closer look later. The flag mask protecting it is a
>> good idea; another option would be a dedicated irq-safe resize lock.
>
> How about something like this? I don't particularly like using ->flags
> for this, as these are otherwise static after the ring has been set up.
> Hence it'd be better to just use a separate value for this,
> ->in_resize, and use smp_load_acquire/release. The write side can be as
> expensive as we want it to be, as it's not a hot path at all. And the
> acquire read should be lightweight enough here.
>
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 3e4a82a6f817..428eb5b2c624 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -394,6 +394,7 @@ struct io_ring_ctx {
> atomic_t cq_wait_nr;
> atomic_t cq_timeouts;
> struct wait_queue_head cq_wait;
> + int in_resize;
> } ____cacheline_aligned_in_smp;
>
> /* timeouts */
> diff --git a/io_uring/register.c b/io_uring/register.c
> index 3378014e51fb..048a1dcd9df1 100644
> --- a/io_uring/register.c
> +++ b/io_uring/register.c
> @@ -575,6 +575,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
> * ctx->mmap_lock as well. Likewise, hold the completion lock over the
> * duration of the actual swap.
> */
> + smp_store_release(&ctx->in_resize, 1);
> mutex_lock(&ctx->mmap_lock);
> spin_lock(&ctx->completion_lock);
> o.rings = ctx->rings;
> @@ -647,6 +648,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
> if (ctx->sq_data)
> io_sq_thread_unpark(ctx->sq_data);
>
> + smp_store_release(&ctx->in_resize, 0);
> return ret;
> }
>
> diff --git a/io_uring/tw.c b/io_uring/tw.c
> index 1ee2b8ab07c8..c66ffa787ec7 100644
> --- a/io_uring/tw.c
> +++ b/io_uring/tw.c
> @@ -152,6 +152,13 @@ void tctx_task_work(struct callback_head *cb)
> WARN_ON_ONCE(ret);
> }
>
> +static void io_mark_taskrun(struct io_ring_ctx *ctx)
> +{
> + if (ctx->flags & IORING_SETUP_TASKRUN_FLAG &&
> + !smp_load_acquire(&ctx->in_resize))
> + atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
> +}
That's not going to work, same raciness, but you can protect the
pointer with rcu + rcu sync on resize. Tips:
1) sq_flags might get out of sync at the end. Either say that
users should never try to resize with inflight reqs, or just
hand set all flags, e.g. SQ_WAKE can be set unconditionally
2) For a fix, it'll likely be cleaner to keep ->rings as is
and introduce a second pointer (rcu protected).
--
Pavel Begunkov
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 18:35 ` Jens Axboe
@ 2026-03-09 19:03 ` Linus Torvalds
2026-03-09 19:22 ` Jens Axboe
0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2026-03-09 19:03 UTC (permalink / raw)
To: Jens Axboe; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On Mon, 9 Mar 2026 at 11:35, Jens Axboe <axboe@kernel.dk> wrote:
>
> --- a/io_uring/register.c
> +++ b/io_uring/register.c
> @@ -575,6 +575,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
> * ctx->mmap_lock as well. Likewise, hold the completion lock over the
> * duration of the actual swap.
> */
> + smp_store_release(&ctx->in_resize, 1);
> mutex_lock(&ctx->mmap_lock);
> spin_lock(&ctx->completion_lock);
The store-release doesn't actually make sense here. It just says "this
store is visible after all previous stores".
It can still be delayed arbitrarily, and migrate down into the locked
regions, and be visible to other cpus much later.
On x86, getting a lock will be a full memory barrier, but that's not
true everywhere else: locks keep things *inside* the locked region
inside the lock, but don't stop things *outside* the locked region
from moving into it.
End result: the smp_store_release does nothing. You should use a write
barrier (or a smp_store_mb(), but that's expensive).
But even *that* won't work - because the irq can already be running on
another CPU, and maybe it already tested 'in_resize', and saw a zero,
and then did that
atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
afterwards.
> @@ -647,6 +648,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
> if (ctx->sq_data)
> io_sq_thread_unpark(ctx->sq_data);
>
> + smp_store_release(&ctx->in_resize, 0);
On the release side, the store_release would make sense - the store is
visible to others after all the other stores are done (including,
obviously, the new 'rings' value).
But see above. This just doesn't *work*, because the irq - running on
another cpu - will do the flag test and the cts->rings access as two
separate operations.
All these semantics mean that 'in_resize' basically needs to be a lock.
You can then use 'trylock()' in irq context *around* the whole
sequence of using ctx->rings, to avoid disabling interrupts.
Linus
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 18:57 ` Pavel Begunkov
@ 2026-03-09 19:16 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2026-03-09 19:16 UTC (permalink / raw)
To: Pavel Begunkov, Linus Torvalds
Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On 3/9/26 12:57 PM, Pavel Begunkov wrote:
> On 3/9/26 18:34, Jens Axboe wrote:
>> On 3/9/26 10:29 AM, Jens Axboe wrote:
>>> On Mar 9, 2026, at 10:05 AM, Linus Torvalds <torvalds@linuxfoundation.org> wrote:
>>>>
>>>> On Mon, 9 Mar 2026 at 06:11, Jens Axboe <axboe@kernel.dk> wrote:
>>>>>
>>>>> You probably want something ala:
>>>>>
>>>>> mutex_lock(&ctx->mmap_lock);
>>>>> spin_lock(&ctx->completion_lock);
>>>>> + local_irq_disable();
>>>>
>>>> How could that ever work?
>>>>
>>>> Irqs will happily continue on other CPUs, so disabling interrupts is
>>>> complete nonsense as far as I can tell - whether done with
>>>> spin_lock_irq() *or* with local_irq_disable().
>>>>
>>>> So basically, touching ctx->rings from irq context in this section is
>>>> simply not an option - or the rings pointer just needs to be updated
>>>> atomically so that it doesn't matter.
>>>>
>>>> I assume this was tested in qemu on a single-core setup, so that
>>>> fundamental mistake was hidden by an irrelevant configuration.
>>>>
>>>> Where is the actual oops - for some inexplicable reason that had been
>>>> edited out, and it only had the call trace leading up to it? Based on
>>>> the incomplete information and the faulting address of 0x24, I'm
>>>> *guessing* that it is
>>>>
>>>> if (ctx->flags & IORING_SETUP_TASKRUN_FLAG)
>>>> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>>>>
>>>> in io_req_normal_work_add(), but that may be complete garbage.
>>>>
>>>> So the actual fix may be to just make damn sure that
>>>> IORING_SETUP_TASKRUN_FLAG is *not* set when the rings are resized.
>>>>
>>>> But for all I know, (a) I may be looking at entirely the wrong place
>>>> and (b) there might be millions of other places that want to access
>>>> ctx->rings, so the above may be the rantings of a crazy old man.
>>>
>>> Nah you’re totally right. I’m operating on a few hours of sleep and on a
>>> plane. I’ll take a closer look later. The flag mask protecting it is a
>>> good idea; another option would be a dedicated irq-safe resize lock.
>>
>> How about something like this? I don't particularly like using ->flags
>> for this, as these are otherwise static after the ring has been set up.
>> Hence it'd be better to just use a separate value for this,
>> ->in_resize, and use smp_load_acquire/release. The write side can be as
>> expensive as we want it to be, as it's not a hot path at all. And the
>> acquire read should be lightweight enough here.
>>
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 3e4a82a6f817..428eb5b2c624 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -394,6 +394,7 @@ struct io_ring_ctx {
>> atomic_t cq_wait_nr;
>> atomic_t cq_timeouts;
>> struct wait_queue_head cq_wait;
>> + int in_resize;
>> } ____cacheline_aligned_in_smp;
>> /* timeouts */
>> diff --git a/io_uring/register.c b/io_uring/register.c
>> index 3378014e51fb..048a1dcd9df1 100644
>> --- a/io_uring/register.c
>> +++ b/io_uring/register.c
>> @@ -575,6 +575,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>> * ctx->mmap_lock as well. Likewise, hold the completion lock over the
>> * duration of the actual swap.
>> */
>> + smp_store_release(&ctx->in_resize, 1);
>> mutex_lock(&ctx->mmap_lock);
>> spin_lock(&ctx->completion_lock);
>> o.rings = ctx->rings;
>> @@ -647,6 +648,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>> if (ctx->sq_data)
>> io_sq_thread_unpark(ctx->sq_data);
>> + smp_store_release(&ctx->in_resize, 0);
>> return ret;
>> }
>> diff --git a/io_uring/tw.c b/io_uring/tw.c
>> index 1ee2b8ab07c8..c66ffa787ec7 100644
>> --- a/io_uring/tw.c
>> +++ b/io_uring/tw.c
>> @@ -152,6 +152,13 @@ void tctx_task_work(struct callback_head *cb)
>> WARN_ON_ONCE(ret);
>> }
>> +static void io_mark_taskrun(struct io_ring_ctx *ctx)
>> +{
>> + if (ctx->flags & IORING_SETUP_TASKRUN_FLAG &&
>> + !smp_load_acquire(&ctx->in_resize))
>> + atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>> +}
>
> That's not going to work, same raciness, but you can protect the
> pointer with rcu + rcu sync on resize. Tips:
>
> 1) sq_flags might get out of sync at the end. Either say that
> users should never try to resize with inflight reqs, or just
> hand set all flags, e.g. SQ_WAKE can be set unconditionally
>
> 2) For a fix, it'll likely be cleaner to keep ->rings as is
> and introduce a second pointer (rcu protected).
Yeah you are right, it's not enough, it just shrinks the gap but it's
still there. rcu sync on resize is probably the best way. I'll take
another look tomorrow after some sleep.
--
Jens Axboe
* Re: [PATCH v1] io_uring/register.c: fix NULL pointer dereference in io_register_resize_rings
2026-03-09 19:03 ` Linus Torvalds
@ 2026-03-09 19:22 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2026-03-09 19:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Hao-Yu Yang, security, io-uring, linux-kernel
On 3/9/26 1:03 PM, Linus Torvalds wrote:
> On Mon, 9 Mar 2026 at 11:35, Jens Axboe <axboe@kernel.dk> wrote:
>>
>> --- a/io_uring/register.c
>> +++ b/io_uring/register.c
>> @@ -575,6 +575,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>> * ctx->mmap_lock as well. Likewise, hold the completion lock over the
>> * duration of the actual swap.
>> */
>> + smp_store_release(&ctx->in_resize, 1);
>> mutex_lock(&ctx->mmap_lock);
>> spin_lock(&ctx->completion_lock);
>
> The store-release doesn't actually make sense here. It just says "this
> store is visible after all previous stores".
>
> It can still be delayed arbitrarily, and migrate down into the locked
> regions, and be visible to other cpus much later.
>
> On x86, getting a lock will be a full memory barrier, but that's not
> true everywhere else: locks keep things *inside* the locked region
> inside the lock, but don't stop things *outside* the locked region
> from moving into it.
>
> End result: the smp_store_release does nothing. You should use a write
> barrier (or a smp_store_mb(), but that's expensive).
>
> But even *that* won't work - because the irq can already be running on
> another CPU, and maybe it already tested 'in_resize', and saw a zero,
> and then did that
>
> atomic_or(IORING_SQ_TASKRUN, &ctx->rings->sq_flags);
>
> afterwards.
>
>> @@ -647,6 +648,7 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
>> if (ctx->sq_data)
>> io_sq_thread_unpark(ctx->sq_data);
>>
>> + smp_store_release(&ctx->in_resize, 0);
>
> On the release side, the store_release would make sense - the store is
> visible to others after all the other stores are done (including,
> obviously, the new 'rings' value).
>
> But see above. This just doesn't *work*, because the irq - running on
> another cpu - will do the flag test and the ctx->rings access as two
> separate operations.
>
> All these semantics mean that 'in_resize' basically needs to be a lock.
>
> You can then use 'trylock()' in irq context *around* the whole
> sequence of using ctx->rings, to avoid disabling interrupts.
Agree - I think Pavel's suggestion to use an RCU-protected pointer and
have the resize do an RCU sync is probably better though. As mentioned,
resize can be expensive, it's not a hot path operation. The
local_work_add() path is extremely hot, however.
I'll take a look with fresh eyes tomorrow.
--
Jens Axboe