[PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit

public inbox for [email protected]
 help / color / mirror / Atom feed

* [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
@ 2022-12-06  9:38 Harshit Mogalapalli
  2022-12-06 13:52 ` Vegard Nossum
  2022-12-06 21:15 ` Jens Axboe
  0 siblings, 2 replies; 6+ messages in thread
From: Harshit Mogalapalli @ 2022-12-06  9:38 UTC (permalink / raw)
  Cc: harshit.m.mogalapalli, harshit.m.mogalapalli, vegard.nossum,
	george.kennedy, darren.kenny, syzkaller, Jens Axboe,
	Pavel Begunkov, io-uring, linux-kernel

Syzkaller reports a NULL deref bug as follows:

 BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
 Read of size 4 at addr 0000000000000138 by task file1/1955

 CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0xcd/0x134
  ? io_tctx_exit_cb+0x53/0xd3
  kasan_report+0xbb/0x1f0
  ? io_tctx_exit_cb+0x53/0xd3
  kasan_check_range+0x140/0x190
  io_tctx_exit_cb+0x53/0xd3
  task_work_run+0x164/0x250
  ? task_work_cancel+0x30/0x30
  get_signal+0x1c3/0x2440
  ? lock_downgrade+0x6e0/0x6e0
  ? lock_downgrade+0x6e0/0x6e0
  ? exit_signals+0x8b0/0x8b0
  ? do_raw_read_unlock+0x3b/0x70
  ? do_raw_spin_unlock+0x50/0x230
  arch_do_signal_or_restart+0x82/0x2470
  ? kmem_cache_free+0x260/0x4b0
  ? putname+0xfe/0x140
  ? get_sigframe_size+0x10/0x10
  ? do_execveat_common.isra.0+0x226/0x710
  ? lockdep_hardirqs_on+0x79/0x100
  ? putname+0xfe/0x140
  ? do_execveat_common.isra.0+0x238/0x710
  exit_to_user_mode_prepare+0x15f/0x250
  syscall_exit_to_user_mode+0x19/0x50
  do_syscall_64+0x42/0xb0
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
 RIP: 0023:0x0
 Code: Unable to access opcode bytes at 0xffffffffffffffd6.
 RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
  </TASK>
 Kernel panic - not syncing: panic_on_warn set ...

Add a NULL check on tctx to prevent this.

Fixes: d56d938b4bef ("io_uring: do ctx initiated file note removal")
Reported-by: syzkaller <[email protected]>
Signed-off-by: Harshit Mogalapalli <[email protected]>
---
Could not find the root cause of this.
---
 io_uring/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8840cf3e20f2..20f7d8655b50 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2708,7 +2708,7 @@ static __cold void io_tctx_exit_cb(struct callback_head *cb)
 	 * When @in_idle, we're in cancellation and it's racy to remove the
 	 * node. It'll be removed by the end of cancellation, just ignore it.
 	 */
-	if (!atomic_read(&tctx->in_idle))
+	if (tctx && !atomic_read(&tctx->in_idle))
 		io_uring_del_tctx_node((unsigned long)work->ctx);
 	complete(&work->completion);
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
  2022-12-06  9:38 [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() Harshit Mogalapalli
@ 2022-12-06 13:52 ` Vegard Nossum
  2022-12-06 21:15 ` Jens Axboe
  1 sibling, 0 replies; 6+ messages in thread
From: Vegard Nossum @ 2022-12-06 13:52 UTC (permalink / raw)
  To: Harshit Mogalapalli
  Cc: harshit.m.mogalapalli, george.kennedy, darren.kenny, syzkaller,
	Jens Axboe, Pavel Begunkov, io-uring, linux-kernel

On 12/6/22 10:38, Harshit Mogalapalli wrote:
> Syzkaller reports a NULL deref bug as follows:
> 
>   BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3

[...]

> Add a NULL check on tctx to prevent this.
> 
> Fixes: d56d938b4bef ("io_uring: do ctx initiated file note removal")
> Reported-by: syzkaller <[email protected]>
> Signed-off-by: Harshit Mogalapalli <[email protected]>
> ---
> Could not find the root cause of this.

Hi,

I don't think the patch is correct as-is -- we should in any case
probably understand better what's going on.

I think what's happening is something like this, where tsk->io_uring is
set to NULL in begin_new_exec() while we have a pending callback:

fd = io_uring_setup()
[...]

close(fd) ?
- __fput()
   - io_uring_release()
     - io_ring_ctx_wait_and_kill()
       - init_task_work(..., io_tctx_exit_cb) // callback posted

exec()
- begin_new_exec()
   - io_uring_task_cancel()
     - __io_uring_cancel()
       - io_uring_cancel_generic()
         - __io_uring_free()
           - tsk->io_uring = NULL // pointer nulled
- syscall_exit_to_user_mode()
   - [...]
     - task_work_run()
       - io_tctx_exit_cb()
         - *current->io_uring // callback runs: oops

As far as I can tell, whatever is happening in io_ring_exit_work() is
happening too late, as task->io_uring has already been set to NULL.

It looks a bit like this is supposed to be handled in
io_uring_cancel_generic() already where it tries to cancel and wait for
all the outstanding work items to finish, but maybe that is not taking
into account the fact that the exit callback is still pending? Should
io_ring_ctx_wait_and_kill() bump the inflight counter..?

It's unclear to me whether the io_ring_ctx_wait_and_kill() call is
coming through close(), dup2(), or simply exec(), but it looks like this
could potentially get delayed (from the current syscall) and thus pushed
into the exec() call. Maybe flush_delayed_fput() needs to be called
somewhere..?

Anyway, I could be completely off base here as I'm not really familiar
with the code, just wanted to share my notes.

Vegard

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
  2022-12-06  9:38 [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() Harshit Mogalapalli
  2022-12-06 13:52 ` Vegard Nossum
@ 2022-12-06 21:15 ` Jens Axboe
  2022-12-06 21:30   ` Jens Axboe
  2022-12-07  2:40   ` Harshit Mogalapalli
  1 sibling, 2 replies; 6+ messages in thread
From: Jens Axboe @ 2022-12-06 21:15 UTC (permalink / raw)
  To: Harshit Mogalapalli
  Cc: harshit.m.mogalapalli, vegard.nossum, george.kennedy,
	darren.kenny, syzkaller, Pavel Begunkov, io-uring, linux-kernel

On 12/6/22 2:38?AM, Harshit Mogalapalli wrote:
> Syzkaller reports a NULL deref bug as follows:
> 
>  BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
>  Read of size 4 at addr 0000000000000138 by task file1/1955
> 
>  CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0xcd/0x134
>   ? io_tctx_exit_cb+0x53/0xd3
>   kasan_report+0xbb/0x1f0
>   ? io_tctx_exit_cb+0x53/0xd3
>   kasan_check_range+0x140/0x190
>   io_tctx_exit_cb+0x53/0xd3
>   task_work_run+0x164/0x250
>   ? task_work_cancel+0x30/0x30
>   get_signal+0x1c3/0x2440
>   ? lock_downgrade+0x6e0/0x6e0
>   ? lock_downgrade+0x6e0/0x6e0
>   ? exit_signals+0x8b0/0x8b0
>   ? do_raw_read_unlock+0x3b/0x70
>   ? do_raw_spin_unlock+0x50/0x230
>   arch_do_signal_or_restart+0x82/0x2470
>   ? kmem_cache_free+0x260/0x4b0
>   ? putname+0xfe/0x140
>   ? get_sigframe_size+0x10/0x10
>   ? do_execveat_common.isra.0+0x226/0x710
>   ? lockdep_hardirqs_on+0x79/0x100
>   ? putname+0xfe/0x140
>   ? do_execveat_common.isra.0+0x238/0x710
>   exit_to_user_mode_prepare+0x15f/0x250
>   syscall_exit_to_user_mode+0x19/0x50
>   do_syscall_64+0x42/0xb0
>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>  RIP: 0023:0x0
>  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>  RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
>  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>   </TASK>
>  Kernel panic - not syncing: panic_on_warn set ...
> 
> Add a NULL check on tctx to prevent this.

I agree with Vegard that I don't think this is fixing the core of
the issue. I think what is happening here is that we don't run the
task_work in io_uring_cancel_generic() unconditionally, if we don't
need to in the loop above. But we do need to ensure we run it before
we clear current->io_uring.

Do you have a reproducer? If so, can you try the below? I _think_
this is all we need. We can't be hitting the delayed fput path as
the task isn't exiting, and we're dealing with current here.


diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 36cb63e4174f..4791d94c88f5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3125,6 +3125,15 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
 
 	io_uring_clean_tctx(tctx);
 	if (cancel_all) {
+		/*
+		 * If we didn't run task_work in the loop above, ensure we
+		 * do so here. If an fput() queued up exit task_work for the
+		 * ring descriptor before we started the exec that led to this
+		 * cancelation, then we need to have that run before we proceed
+		 * with tearing down current->io_uring.
+		 */
+		io_run_task_work();
+
 		/*
 		 * We shouldn't run task_works after cancel, so just leave
 		 * ->in_idle set for normal exit.

-- 
Jens Axboe

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
  2022-12-06 21:15 ` Jens Axboe
@ 2022-12-06 21:30   ` Jens Axboe
  2022-12-06 22:51     ` Jens Axboe
  2022-12-07  2:40   ` Harshit Mogalapalli
  1 sibling, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2022-12-06 21:30 UTC (permalink / raw)
  To: Harshit Mogalapalli
  Cc: harshit.m.mogalapalli, vegard.nossum, george.kennedy,
	darren.kenny, syzkaller, Pavel Begunkov, io-uring, linux-kernel

On 12/6/22 2:15?PM, Jens Axboe wrote:
> On 12/6/22 2:38?AM, Harshit Mogalapalli wrote:
>> Syzkaller reports a NULL deref bug as follows:
>>
>>  BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
>>  Read of size 4 at addr 0000000000000138 by task file1/1955
>>
>>  CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
>>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
>>  Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0xcd/0x134
>>   ? io_tctx_exit_cb+0x53/0xd3
>>   kasan_report+0xbb/0x1f0
>>   ? io_tctx_exit_cb+0x53/0xd3
>>   kasan_check_range+0x140/0x190
>>   io_tctx_exit_cb+0x53/0xd3
>>   task_work_run+0x164/0x250
>>   ? task_work_cancel+0x30/0x30
>>   get_signal+0x1c3/0x2440
>>   ? lock_downgrade+0x6e0/0x6e0
>>   ? lock_downgrade+0x6e0/0x6e0
>>   ? exit_signals+0x8b0/0x8b0
>>   ? do_raw_read_unlock+0x3b/0x70
>>   ? do_raw_spin_unlock+0x50/0x230
>>   arch_do_signal_or_restart+0x82/0x2470
>>   ? kmem_cache_free+0x260/0x4b0
>>   ? putname+0xfe/0x140
>>   ? get_sigframe_size+0x10/0x10
>>   ? do_execveat_common.isra.0+0x226/0x710
>>   ? lockdep_hardirqs_on+0x79/0x100
>>   ? putname+0xfe/0x140
>>   ? do_execveat_common.isra.0+0x238/0x710
>>   exit_to_user_mode_prepare+0x15f/0x250
>>   syscall_exit_to_user_mode+0x19/0x50
>>   do_syscall_64+0x42/0xb0
>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>  RIP: 0023:0x0
>>  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>>  RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
>>  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>>  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>   </TASK>
>>  Kernel panic - not syncing: panic_on_warn set ...
>>
>> Add a NULL check on tctx to prevent this.
> 
> I agree with Vegard that I don't think this is fixing the core of
> the issue. I think what is happening here is that we don't run the
> task_work in io_uring_cancel_generic() unconditionally, if we don't
> need to in the loop above. But we do need to ensure we run it before
> we clear current->io_uring.
> 
> Do you have a reproducer? If so, can you try the below? I _think_
> this is all we need. We can't be hitting the delayed fput path as
> the task isn't exiting, and we're dealing with current here.

While I think the above is the right description of what happens, I
think there's still a race with the proposed solution. If the task_work
gets added right after the newly inserted io_run_task_work(), then we'll
still crash when the targeted task exits to userspace and runs the
task_work.

It should actually be fine to add that NULL check in io_tctx_exit_cb. We
can't be racing here, as both the clear and io_tctx_exit_cb() are run by
current itself. It's really just an ordering issue.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
  2022-12-06 21:30   ` Jens Axboe
@ 2022-12-06 22:51     ` Jens Axboe
  0 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2022-12-06 22:51 UTC (permalink / raw)
  To: Harshit Mogalapalli
  Cc: harshit.m.mogalapalli, vegard.nossum, george.kennedy,
	darren.kenny, syzkaller, Pavel Begunkov, io-uring, linux-kernel

On 12/6/22 2:30 PM, Jens Axboe wrote:
> On 12/6/22 2:15?PM, Jens Axboe wrote:
>> On 12/6/22 2:38?AM, Harshit Mogalapalli wrote:
>>> Syzkaller reports a NULL deref bug as follows:
>>>
>>>  BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
>>>  Read of size 4 at addr 0000000000000138 by task file1/1955
>>>
>>>  CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
>>>  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
>>>  Call Trace:
>>>   <TASK>
>>>   dump_stack_lvl+0xcd/0x134
>>>   ? io_tctx_exit_cb+0x53/0xd3
>>>   kasan_report+0xbb/0x1f0
>>>   ? io_tctx_exit_cb+0x53/0xd3
>>>   kasan_check_range+0x140/0x190
>>>   io_tctx_exit_cb+0x53/0xd3
>>>   task_work_run+0x164/0x250
>>>   ? task_work_cancel+0x30/0x30
>>>   get_signal+0x1c3/0x2440
>>>   ? lock_downgrade+0x6e0/0x6e0
>>>   ? lock_downgrade+0x6e0/0x6e0
>>>   ? exit_signals+0x8b0/0x8b0
>>>   ? do_raw_read_unlock+0x3b/0x70
>>>   ? do_raw_spin_unlock+0x50/0x230
>>>   arch_do_signal_or_restart+0x82/0x2470
>>>   ? kmem_cache_free+0x260/0x4b0
>>>   ? putname+0xfe/0x140
>>>   ? get_sigframe_size+0x10/0x10
>>>   ? do_execveat_common.isra.0+0x226/0x710
>>>   ? lockdep_hardirqs_on+0x79/0x100
>>>   ? putname+0xfe/0x140
>>>   ? do_execveat_common.isra.0+0x238/0x710
>>>   exit_to_user_mode_prepare+0x15f/0x250
>>>   syscall_exit_to_user_mode+0x19/0x50
>>>   do_syscall_64+0x42/0xb0
>>>   entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>  RIP: 0023:0x0
>>>  Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>>>  RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
>>>  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>>  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>>>  RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>>>  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>>  R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>   </TASK>
>>>  Kernel panic - not syncing: panic_on_warn set ...
>>>
>>> Add a NULL check on tctx to prevent this.
>>
>> I agree with Vegard that I don't think this is fixing the core of
>> the issue. I think what is happening here is that we don't run the
>> task_work in io_uring_cancel_generic() unconditionally, if we don't
>> need to in the loop above. But we do need to ensure we run it before
>> we clear current->io_uring.
>>
>> Do you have a reproducer? If so, can you try the below? I _think_
>> this is all we need. We can't be hitting the delayed fput path as
>> the task isn't exiting, and we're dealing with current here.
> 
> While I think the above is the right description of what happens, I
> think there's still a race with the proposed solution. If the task_work
> gets added right after the newly inserted io_run_task_work(), then we'll
> still crash when the targeted task exits to userspace and runs the
> task_work.
> 
> It should actually be fine to add that NULL check in io_tctx_exit_cb. We
> can't be racing here, as both the clear and io_tctx_exit_cb() are run by
> current itself. It's really just an ordering issue.

I've queued it up with an improved commit message, and also a code
comment:

https://git.kernel.dk/cgit/linux-block/commit/?h=for-6.2/io_uring-next&id=6d1b48314b989d059642958fc94ef0a58b25fc8c

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb()
  2022-12-06 21:15 ` Jens Axboe
  2022-12-06 21:30   ` Jens Axboe
@ 2022-12-07  2:40   ` Harshit Mogalapalli
  1 sibling, 0 replies; 6+ messages in thread
From: Harshit Mogalapalli @ 2022-12-07  2:40 UTC (permalink / raw)
  To: Jens Axboe
  Cc: harshit.m.mogalapalli, george.kennedy, darren.kenny, syzkaller,
	Pavel Begunkov, io-uring, linux-kernel, vegard.nossum



On 07/12/22 2:45 am, Jens Axboe wrote:
> On 12/6/22 2:38?AM, Harshit Mogalapalli wrote:
>> Syzkaller reports a NULL deref bug as follows:
>>
>>   BUG: KASAN: null-ptr-deref in io_tctx_exit_cb+0x53/0xd3
>>   Read of size 4 at addr 0000000000000138 by task file1/1955
>>
>>   CPU: 1 PID: 1955 Comm: file1 Not tainted 6.1.0-rc7-00103-gef4d3ea40565 #75
>>   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
>>   Call Trace:
>>    <TASK>
>>    dump_stack_lvl+0xcd/0x134
>>    ? io_tctx_exit_cb+0x53/0xd3
>>    kasan_report+0xbb/0x1f0
>>    ? io_tctx_exit_cb+0x53/0xd3
>>    kasan_check_range+0x140/0x190
>>    io_tctx_exit_cb+0x53/0xd3
>>    task_work_run+0x164/0x250
>>    ? task_work_cancel+0x30/0x30
>>    get_signal+0x1c3/0x2440
>>    ? lock_downgrade+0x6e0/0x6e0
>>    ? lock_downgrade+0x6e0/0x6e0
>>    ? exit_signals+0x8b0/0x8b0
>>    ? do_raw_read_unlock+0x3b/0x70
>>    ? do_raw_spin_unlock+0x50/0x230
>>    arch_do_signal_or_restart+0x82/0x2470
>>    ? kmem_cache_free+0x260/0x4b0
>>    ? putname+0xfe/0x140
>>    ? get_sigframe_size+0x10/0x10
>>    ? do_execveat_common.isra.0+0x226/0x710
>>    ? lockdep_hardirqs_on+0x79/0x100
>>    ? putname+0xfe/0x140
>>    ? do_execveat_common.isra.0+0x238/0x710
>>    exit_to_user_mode_prepare+0x15f/0x250
>>    syscall_exit_to_user_mode+0x19/0x50
>>    do_syscall_64+0x42/0xb0
>>    entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>   RIP: 0023:0x0
>>   Code: Unable to access opcode bytes at 0xffffffffffffffd6.
>>   RSP: 002b:00000000fffb7790 EFLAGS: 00000200 ORIG_RAX: 000000000000000b
>>   RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
>>   RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>>   RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>   R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>    </TASK>
>>   Kernel panic - not syncing: panic_on_warn set ...
>>
>> Add a NULL check on tctx to prevent this.
> 
> I agree with Vegard that I don't think this is fixing the core of
> the issue. I think what is happening here is that we don't run the
> task_work in io_uring_cancel_generic() unconditionally, if we don't
> need to in the loop above. But we do need to ensure we run it before
> we clear current->io_uring.
> 
> Do you have a reproducer? If so, can you try the below? I _think_
> this is all we need. We can't be hitting the delayed fput path as
> the task isn't exiting, and we're dealing with current here.
> 
> 
Thanks Jens and Vegard for the suggestions and analysis.

Yes, the below patch silences the reproducer.

> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 36cb63e4174f..4791d94c88f5 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3125,6 +3125,15 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
>   
>   	io_uring_clean_tctx(tctx);
>   	if (cancel_all) {
> +		/*
> +		 * If we didn't run task_work in the loop above, ensure we
> +		 * do so here. If an fput() queued up exit task_work for the
> +		 * ring descriptor before we started the exec that led to this
> +		 * cancelation, then we need to have that run before we proceed
> +		 * with tearing down current->io_uring.
> +		 */
> +		io_run_task_work();
> +
>   		/*
>   		 * We shouldn't run task_works after cancel, so just leave
>   		 * ->in_idle set for normal exit.
> 

Thanks,
Harshit

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-12-07  2:40 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-06  9:38 [PATCH] io_uring: Fix a null-ptr-deref in io_tctx_exit_cb() Harshit Mogalapalli
2022-12-06 13:52 ` Vegard Nossum
2022-12-06 21:15 ` Jens Axboe
2022-12-06 21:30   ` Jens Axboe
2022-12-06 22:51     ` Jens Axboe
2022-12-07  2:40   ` Harshit Mogalapalli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox