public inbox for [email protected]
* [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
@ 2020-12-02 11:31 Xiaoguang Wang
  2020-12-02 17:00 ` Pavel Begunkov
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Xiaoguang Wang @ 2020-12-02 11:31 UTC (permalink / raw)
  To: io-uring; +Cc: axboe, joseph.qi

Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
[   95.504842] BUG: KASAN: double-free or invalid-free in io_commit_cqring+0x3ec/0x8e0
[   95.505921]
[   95.506225] CPU: 0 PID: 4037 Comm: io_wqe_worker-0 Tainted: G    B W         5.10.0-rc5+ #1
[   95.507434] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[   95.508248] Call Trace:
[   95.508683]  dump_stack+0x107/0x163
[   95.509323]  ? io_commit_cqring+0x3ec/0x8e0
[   95.509982]  print_address_description.constprop.0+0x3e/0x60
[   95.510814]  ? vprintk_func+0x98/0x140
[   95.511399]  ? io_commit_cqring+0x3ec/0x8e0
[   95.512036]  ? io_commit_cqring+0x3ec/0x8e0
[   95.512733]  kasan_report_invalid_free+0x51/0x80
[   95.513431]  ? io_commit_cqring+0x3ec/0x8e0
[   95.514047]  __kasan_slab_free+0x141/0x160
[   95.514699]  kfree+0xd1/0x390
[   95.515182]  io_commit_cqring+0x3ec/0x8e0
[   95.515799]  __io_req_complete.part.0+0x64/0x90
[   95.516483]  io_wq_submit_work+0x1fa/0x260
[   95.517117]  io_worker_handle_work+0xeac/0x1c00
[   95.517828]  io_wqe_worker+0xc94/0x11a0
[   95.518438]  ? io_worker_handle_work+0x1c00/0x1c00
[   95.519151]  ? __kthread_parkme+0x11d/0x1d0
[   95.519806]  ? io_worker_handle_work+0x1c00/0x1c00
[   95.520512]  ? io_worker_handle_work+0x1c00/0x1c00
[   95.521211]  kthread+0x396/0x470
[   95.521727]  ? _raw_spin_unlock_irq+0x24/0x30
[   95.522380]  ? kthread_mod_delayed_work+0x180/0x180
[   95.523108]  ret_from_fork+0x22/0x30
[   95.523684]
[   95.523985] Allocated by task 4035:
[   95.524543]  kasan_save_stack+0x1b/0x40
[   95.525136]  __kasan_kmalloc.constprop.0+0xc2/0xd0
[   95.525882]  kmem_cache_alloc_trace+0x17b/0x310
[   95.533930]  io_queue_sqe+0x225/0xcb0
[   95.534505]  io_submit_sqes+0x1768/0x25f0
[   95.535164]  __x64_sys_io_uring_enter+0x89e/0xd10
[   95.535900]  do_syscall_64+0x33/0x40
[   95.536465]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   95.537199]
[   95.537505] Freed by task 4035:
[   95.538003]  kasan_save_stack+0x1b/0x40
[   95.538599]  kasan_set_track+0x1c/0x30
[   95.539177]  kasan_set_free_info+0x1b/0x30
[   95.539798]  __kasan_slab_free+0x112/0x160
[   95.540427]  kfree+0xd1/0x390
[   95.540910]  io_commit_cqring+0x3ec/0x8e0
[   95.541516]  io_iopoll_complete+0x914/0x1390
[   95.542150]  io_do_iopoll+0x580/0x700
[   95.542724]  io_iopoll_try_reap_events.part.0+0x108/0x200
[   95.543512]  io_ring_ctx_wait_and_kill+0x118/0x340
[   95.544206]  io_uring_release+0x43/0x50
[   95.544791]  __fput+0x28d/0x940
[   95.545291]  task_work_run+0xea/0x1b0
[   95.545873]  do_exit+0xb6a/0x2c60
[   95.546400]  do_group_exit+0x12a/0x320
[   95.546967]  __x64_sys_exit_group+0x3f/0x50
[   95.547605]  do_syscall_64+0x33/0x40
[   95.548155]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
we complete the req by calling io_req_complete(), which holds completion_lock
while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
may be accessed concurrently and a double free can happen.
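
The shape of the race, reduced to a minimal userspace sketch (illustrative
only; the names mirror the kernel paths, but this is not the actual io_uring
code):

/*
 * Build with: gcc -fsanitize=address -pthread race.c
 * Run it enough times and ASan may report a double-free analogous to the
 * KASAN splat above: one drain runs under the lock, the other lockless.
 */
#include <pthread.h>
#include <stdlib.h>

struct req { struct req *next; };

static pthread_mutex_t completion_lock = PTHREAD_MUTEX_INITIALIZER;
static struct req *defer_list;

/* io_req_complete() path: drains the deferred list under completion_lock. */
static void *complete_locked(void *arg)
{
	pthread_mutex_lock(&completion_lock);
	while (defer_list) {
		struct req *req = defer_list;

		defer_list = req->next;
		free(req);
	}
	pthread_mutex_unlock(&completion_lock);
	return NULL;
}

/* io_iopoll_complete() path: the same drain, but without taking the lock. */
static void *complete_iopoll(void *arg)
{
	/* Unsynchronized load; races with the locked drain above. */
	while (defer_list) {
		struct req *req = defer_list;

		defer_list = req->next;
		free(req);	/* both threads may free the same req */
	}
	return NULL;
}

int main(void)
{
	for (int i = 0; i < 100000; i++) {
		pthread_t a, b;

		defer_list = calloc(1, sizeof(struct req));
		pthread_create(&a, NULL, complete_locked, NULL);
		pthread_create(&b, NULL, complete_iopoll, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
	}
	return 0;
}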

To fix this bug, we always let io_iopoll_complete() complete polled io.

Reported-by: Abaci Fuzz <[email protected]>
Signed-off-by: Xiaoguang Wang <[email protected]>
---
 fs/io_uring.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a8c136a..901ca67 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6074,8 +6074,19 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
 	}
 
 	if (ret) {
-		req_set_fail_links(req);
-		io_req_complete(req, ret);
+		/*
+		 * io_iopoll_complete() does not hold completion_lock to complete
+		 * polled io, so here for polled io, just mark it done and still let
+		 * io_iopoll_complete() complete it.
+		 */
+		if (req->ctx->flags & IORING_SETUP_IOPOLL) {
+			struct kiocb *kiocb = &req->rw.kiocb;
+
+			kiocb_done(kiocb, ret, NULL);
+		} else {
+			req_set_fail_links(req);
+			io_req_complete(req, ret);
+		}
 	}
 
 	return io_steal_work(req);
-- 
1.8.3.1



* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-02 11:31 [PATCH] io_uring: always let io_iopoll_complete() complete polled io Xiaoguang Wang
@ 2020-12-02 17:00 ` Pavel Begunkov
  2020-12-02 17:09   ` Pavel Begunkov
  2020-12-03  2:30 ` Joseph Qi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-02 17:00 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe, joseph.qi

On 02/12/2020 11:31, Xiaoguang Wang wrote:
> Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
> [...]
> 
> The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
> we complete the req by calling io_req_complete(), which holds completion_lock
> while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
> calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
> may be accessed concurrently and a double free can happen.
> 
> To fix this bug, we always let io_iopoll_complete() complete polled io.

It makes sense if it got there through means of REQ_F_FORCE_ASYNC or as a linked
request, but a thing I'm afraid of is going twice through the end section of
io_issue_sqe() (i.e. io_iopoll_req_issued()). Shouldn't happen though.

Reviewed-by: Pavel Begunkov <[email protected]>

-- 
Pavel Begunkov


* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-02 17:00 ` Pavel Begunkov
@ 2020-12-02 17:09   ` Pavel Begunkov
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-02 17:09 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe, joseph.qi

On 02/12/2020 17:00, Pavel Begunkov wrote:
> On 02/12/2020 11:31, Xiaoguang Wang wrote:
>> Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
>> [...]
>>
>> The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
>> we complete the req by calling io_req_complete(), which holds completion_lock
>> while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
>> calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
>> may be accessed concurrently and a double free can happen.
>>
>> To fix this bug, we always let io_iopoll_complete() complete polled io.

This one could use a test, by the way. E.g. sending a bunch of iopoll requests,
both generic and REQ_F_FORCE_ASYNC, and expecting none of them to fail; see the
sketch below.
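
Something along these lines, assuming an existing file readable with O_DIRECT
(a rough sketch of the idea, not a polished test case; file name and sizes are
placeholders):

/*
 * Submit a mix of plain and IOSQE_ASYNC (-> REQ_F_FORCE_ASYNC) reads on an
 * IOPOLL ring and expect every completion to succeed.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <liburing.h>

#define NR_REQS	32

int main(void)
{
	struct io_uring ring;
	struct io_uring_cqe *cqe;
	void *buf;
	int fd, i;

	fd = open("testfile", O_RDONLY | O_DIRECT);
	if (fd < 0 || posix_memalign(&buf, 4096, 4096))
		return 1;
	if (io_uring_queue_init(NR_REQS, &ring, IORING_SETUP_IOPOLL))
		return 1;

	for (i = 0; i < NR_REQS; i++) {
		struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

		io_uring_prep_read(sqe, fd, buf, 4096, 0);
		if (i & 1)	/* force every other request through io-wq */
			sqe->flags |= IOSQE_ASYNC;
	}
	io_uring_submit(&ring);

	for (i = 0; i < NR_REQS; i++) {
		if (io_uring_wait_cqe(&ring, &cqe) || cqe->res < 0) {
			fprintf(stderr, "request failed\n");
			return 1;
		}
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}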

> 
> It makes sense if it got there through means of REQ_F_FORCE_ASYNC or as a linked
> request, but a thing I'm afraid of is going twice through the end section of
> io_issue_sqe() (i.e. io_iopoll_req_issued()). Shouldn't happen though.
> 
> Reviewed-by: Pavel Begunkov <[email protected]>

-- 
Pavel Begunkov


* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-02 11:31 [PATCH] io_uring: always let io_iopoll_complete() complete polled io Xiaoguang Wang
  2020-12-02 17:00 ` Pavel Begunkov
@ 2020-12-03  2:30 ` Joseph Qi
  2020-12-04 20:31   ` Pavel Begunkov
  2020-12-06 22:25 ` Pavel Begunkov
  2020-12-11 12:29 ` Pavel Begunkov
  3 siblings, 1 reply; 10+ messages in thread
From: Joseph Qi @ 2020-12-03  2:30 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe



On 12/2/20 7:31 PM, Xiaoguang Wang wrote:
> Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
> [...]
> 
> The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
> we complete the req by calling io_req_complete(), which holds completion_lock
> while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
> calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
> may be accessed concurrently and a double free can happen.
> 
> To fix this bug, we always let io_iopoll_complete() complete polled io.
> [...]

This patch can also fix another BUG I'm looking at:

[   61.359713] BUG: KASAN: double-free or invalid-free in io_dismantle_req+0x938/0xf40
...
[   61.409315] refcount_t: underflow; use-after-free.
[   61.410261] WARNING: CPU: 1 PID: 1022 at lib/refcount.c:28 refcount_warn_saturate+0x266/0x2a0
...

It looks like io_put_identity() has been called more than once, and then
identity->count underflows.

Reviewed-by: Joseph Qi <[email protected]>




* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-03  2:30 ` Joseph Qi
@ 2020-12-04 20:31   ` Pavel Begunkov
       [not found]     ` <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-04 20:31 UTC (permalink / raw)
  To: Joseph Qi, Xiaoguang Wang, io-uring; +Cc: axboe

On 03/12/2020 02:30, Joseph Qi wrote:
> This patch can also fix another BUG I'm looking at:
> 
> [   61.359713] BUG: KASAN: double-free or invalid-free in io_dismantle_req+0x938/0xf40
> ...
> [   61.409315] refcount_t: underflow; use-after-free.
> [   61.410261] WARNING: CPU: 1 PID: 1022 at lib/refcount.c:28 refcount_warn_saturate+0x266/0x2a0
> ...
> 
> It looks like io_put_identity() has been called more than once, and then
> identity->count underflows.

Joseph, regarding your double-free
1. did you figure out how exactly this happens?
2. does it appear consistently, so you can be sure that it's fixed?
3. do you have a reproducer?
4. can you paste a full log of this BUG? (not cutting the stacktrace)

There are problems left even with this patch applied, but I need to
confirm which bug you saw.

-- 
Pavel Begunkov


* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
       [not found]     ` <[email protected]>
@ 2020-12-05 16:16       ` Pavel Begunkov
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-05 16:16 UTC (permalink / raw)
  To: Joseph Qi, Xiaoguang Wang, io-uring; +Cc: axboe

On 05/12/2020 12:32, Joseph Qi wrote:
> On 12/5/20 4:31 AM, Pavel Begunkov wrote:
>> On 03/12/2020 02:30, Joseph Qi wrote:
>>> This patch can also fix another BUG I'm looking at:
>>>
>>> [   61.359713] BUG: KASAN: double-free or invalid-free in io_dismantle_req+0x938/0xf40
>>> ...
>>> [   61.409315] refcount_t: underflow; use-after-free.
>>> [   61.410261] WARNING: CPU: 1 PID: 1022 at lib/refcount.c:28 refcount_warn_saturate+0x266/0x2a0
>>> ...
>>>
>>> It looks like io_put_identity() has been called more than once, and then
>>> identity->count underflows.
>>
>> Joseph, regarding your double-free
>> 1. did you figure out how exactly this happens?
> From a deeper analysis, it looks like it's caused by a corrupted
> identity->count. Sorry for the misleading info before.
> 
>> 2. does it appear consistently, so you can be sure that it's fixed?
>> 3. do you have a reproducer?
> 
> Yes, it's a syzkaller C reproducer. It can be reproduced every
> time before applying this patch.
> Attached please find the details.

Perfect, thanks. It's important to understand what's really happening,
so I can find similar problems or make the code more resilient.

If that's one of those recent syzkaller reports posted to
[email protected], please reply to it so we can keep track
of what is fixed.

> 
>> 4. can you paste a full log of this BUG? (not cutting the stacktrace)
>>
> 
> [   61.358485] ==================================================================
> [   61.359713] BUG: KASAN: double-free or invalid-free in io_dismantle_req+0x938/0xf40
> [   61.360820]
> [   61.361096] CPU: 0 PID: 1022 Comm: io_wqe_worker-0 Not tainted 5.10.0-rc5+ #1
> [   61.362079] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [   61.362904] Call Trace:
> [   61.363370]  dump_stack+0x107/0x163
> [   61.363914]  ? io_dismantle_req+0x938/0xf40
> [   61.364584]  print_address_description.constprop.0+0x3e/0x60
> [   61.365583]  ? vprintk_func+0x98/0x140
> [   61.366246]  ? io_dismantle_req+0x938/0xf40
> [   61.366940]  ? io_dismantle_req+0x938/0xf40
> [   61.367643]  kasan_report_invalid_free+0x51/0x80
> [   61.368428]  ? io_dismantle_req+0x938/0xf40
> [   61.369192]  __kasan_slab_free+0x141/0x160
> [   61.369904]  kfree+0xd1/0x390
> [   61.370526]  io_dismantle_req+0x938/0xf40
> [   61.371199]  __io_free_req+0x95/0x5b0
> [   61.371823]  io_put_req+0x77/0xa0
> [   61.372386]  io_worker_handle_work+0xef3/0x1c00
> [   61.373130]  io_wqe_worker+0xc94/0x11a0
> [   61.373782]  ? io_worker_handle_work+0x1c00/0x1c00
> [   61.374579]  ? __kthread_parkme+0x11d/0x1d0
> [   61.375244]  ? io_worker_handle_work+0x1c00/0x1c00
> [   61.375968]  ? io_worker_handle_work+0x1c00/0x1c00
> [   61.376705]  kthread+0x396/0x470
> [   61.377271]  ? _raw_spin_unlock_irq+0x24/0x30
> [   61.377965]  ? kthread_mod_delayed_work+0x180/0x180
> [   61.378780]  ret_from_fork+0x22/0x30
> [   61.379457]
> [   61.379794] Allocated by task 1023:
> [   61.380413]  kasan_save_stack+0x1b/0x40
> [   61.381048]  __kasan_kmalloc.constprop.0+0xc2/0xd0
> [   61.381798]  kmem_cache_alloc_trace+0x17b/0x310
> [   61.382459]  io_uring_alloc_task_context+0x48/0x2b0
> [   61.383217]  io_uring_add_task_file+0x1f2/0x290
> [   61.383916]  io_uring_create+0x1745/0x2500
> [   61.384582]  io_uring_setup+0xbf/0x110
> [   61.385197]  do_syscall_64+0x33/0x40
> [   61.385797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   61.386580]
> [   61.386905] The buggy address belongs to the object at ffff8881091d0800
> [   61.386905]  which belongs to the cache kmalloc-512 of size 512
> [   61.388735] The buggy address is located 264 bytes inside of
> [   61.388735]  512-byte region [ffff8881091d0800, ffff8881091d0a00)
> [   61.390398] The buggy address belongs to the page:
> [   61.391107] page:00000000f5dbd49a refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1091d0
> [   61.392496] head:00000000f5dbd49a order:2 compound_mapcount:0 compound_pincount:0
> [   61.393642] flags: 0x17fffc00010200(slab|head)
> [   61.394341] raw: 0017fffc00010200 dead000000000100 dead000000000122 ffff888100041280
> [   61.395436] raw: 0000000000000000 0000000080100010 00000001ffffffff ffff888100bf0301
> [   61.396571] page dumped because: kasan: bad access detected
> [   61.397432] page->mem_cgroup:ffff888100bf0301
> [   61.398123]
> [   61.398443] Memory state around the buggy address:
> [   61.399217]  ffff8881091d0800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   61.400382]  ffff8881091d0880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [   61.401549] >ffff8881091d0900: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
> [   61.402654]                       ^
> [   61.403232]  ffff8881091d0980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   61.404343]  ffff8881091d0a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> [   61.405506] ==================================================================
> [   61.406619] Disabling lock debugging due to kernel taint
> [   61.408513] ------------[ cut here ]------------
> [   61.409315] refcount_t: underflow; use-after-free.
> [   61.410261] WARNING: CPU: 1 PID: 1022 at lib/refcount.c:28 refcount_warn_saturate+0x266/0x2a0
> [   61.411489] Modules linked in:
> [   61.411989] CPU: 1 PID: 1022 Comm: io_wqe_worker-0 Tainted: G    B             5.10.0-rc5+ #1
> [   61.413202] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [   61.414026] RIP: 0010:refcount_warn_saturate+0x266/0x2a0
> [   61.415488] Code: 01 31 ff 89 de e8 4a bc 34 ff 84 db 0f 85 bb fe ff ff e8 dd ba 34 ff 48 c7 c7 00 54 c7 83 c6 05 c2 b2 20 03 01 e8 40 56 22 01 <0f> 0b e9 9c fe ff ff 48 89 ef e8 2b 1d 77 ff e9 c9 fd ff ff e8 b1
> [   61.418943] RSP: 0018:ffff88810e467c98 EFLAGS: 00010286
> [   61.420053] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [   61.421434] RDX: 0000000000000000 RSI: ffffffff81348588 RDI: ffffed1021c8cf89
> [   61.422871] RBP: ffff8881091d0948 R08: ffff88810a4e2240 R09: ffffed1023466048
> [   61.424551] R10: ffff88811a33023b R11: ffffed1023466047 R12: ffff8881091d0908
> [   61.425955] R13: ffff8881090e30c8 R14: ffff8881091d0948 R15: ffff8881049b3300
> [   61.427361] FS:  0000000000000000(0000) GS:ffff88811a300000(0000) knlGS:0000000000000000
> [   61.428920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   61.430036] CR2: 00007ff46b29eef8 CR3: 000000010df64004 CR4: 00000000003706e0
> [   61.431380] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   61.432794] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   61.434178] Call Trace:
> [   61.434797]  io_dismantle_req+0x907/0xf40
> [   61.435676]  __io_free_req+0x95/0x5b0
> [   61.436408]  io_put_req+0x77/0xa0
> [   61.437141]  io_worker_handle_work+0xef3/0x1c00
> [   61.438073]  io_wqe_worker+0xc94/0x11a0
> [   61.438868]  ? io_worker_handle_work+0x1c00/0x1c00
> [   61.439862]  ? __kthread_parkme+0x11d/0x1d0
> [   61.440767]  ? io_worker_handle_work+0x1c00/0x1c00
> [   61.441760]  ? io_worker_handle_work+0x1c00/0x1c00
> [   61.442766]  kthread+0x396/0x470
> [   61.443491]  ? _raw_spin_unlock_irq+0x24/0x30
> [   61.444348]  ? kthread_mod_delayed_work+0x180/0x180
> [   61.445325]  ret_from_fork+0x22/0x30
> [   61.446118] irq event stamp: 38
> [   61.446867] hardirqs last  enabled at (37): [<ffffffff818e95d7>] kfree+0x1c7/0x390
> [   61.448351] hardirqs last disabled at (38): [<ffffffff834976e0>] _raw_spin_lock_irqsave+0x50/0x60
> [   61.449826] softirqs last  enabled at (0): [<ffffffff811a5989>] copy_process+0x18a9/0x67d0
> [   61.450930] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [   61.451756] ---[ end trace 67489de2f872e2d0 ]---
> [   61.498486] audit: type=1326 audit(1606805588.425:5): auid=0 uid=0 gid=0 ses=1 pid=1027 comm="a.out" exe="/root/a.out" sig=31 arch=c000003e syscall=202 compat=0 ip=0x7ff46abfc239 code=0x0
> [   61.503570] ------------[ cut here ]------------
> [   61.504505] WARNING: CPU: 0 PID: 725 at fs/io_uring.c:7769 __io_uring_free+0x179/0x1e0
> [   61.506114] Modules linked in:
> [   61.506869] CPU: 0 PID: 725 Comm: kworker/u4:3 Tainted: G    B   W         5.10.0-rc5+ #1
> [   61.508567] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [   61.509774] Workqueue: events_unbound io_ring_exit_work
> [   61.510863] RIP: 0010:__io_uring_free+0x179/0x1e0
> [   61.511979] Code: 00 00 fc ff df 48 c1 ea 03 80 3c 02 00 75 6c 48 c7 83 b0 07 00 00 00 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 e8 37 96 9a ff <0f> 0b e9 6e ff ff ff e8 2b 96 9a ff 0f 0b e9 de fe ff ff 4c 89 e7
> [   61.515559] RSP: 0018:ffff888109b17cb0 EFLAGS: 00010293
> [   61.516733] RAX: ffff888108fa2240 RBX: ffff888100f04480 RCX: ffffffff81b22c24
> [   61.518237] RDX: 0000000000000000 RSI: ffffffff81b22cb9 RDI: 0000000000000005
> [   61.519635] RBP: ffff8881091d0800 R08: ffff888108fa2240 R09: ffffed102123a12a
> [   61.521169] R10: ffff8881091d094b R11: ffffed102123a129 R12: 00000000c0000000
> [   61.522662] R13: ffff888100f04c30 R14: ffff8881091d0950 R15: ffff8881091d0908
> [   61.523840] FS:  0000000000000000(0000) GS:ffff88811a200000(0000) knlGS:0000000000000000
> [   61.525130] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   61.526063] CR2: 00007f05a8ecbfc0 CR3: 0000000108f80001 CR4: 00000000003706f0
> [   61.527285] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   61.528435] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   61.529717] Call Trace:
> [   61.530384]  __put_task_struct+0x104/0x400
> [   61.531340]  io_ring_exit_work+0x777/0x9d0
> [   61.532279]  process_one_work+0x8aa/0x1520
> [   61.533275]  ? check_flags+0x60/0x60
> [   61.534085]  ? pwq_dec_nr_in_flight+0x360/0x360
> [   61.535067]  ? rwlock_bug.part.0+0x90/0x90
> [   61.535885]  worker_thread+0x9b/0xe20
> [   61.536535]  ? process_one_work+0x1520/0x1520
> [   61.537242]  ? process_one_work+0x1520/0x1520
> [   61.537902]  kthread+0x396/0x470
> [   61.538570]  ? _raw_spin_unlock_irq+0x24/0x30
> [   61.539239]  ? kthread_mod_delayed_work+0x180/0x180
> [   61.540058]  ret_from_fork+0x22/0x30
> [   61.540678] irq event stamp: 1626
> [   61.541272] hardirqs last  enabled at (1625): [<ffffffff83497894>] _raw_spin_unlock_irq+0x24/0x30
> [   61.542633] hardirqs last disabled at (1626): [<ffffffff834860e4>] __schedule+0x1114/0x2190
> [   61.543882] softirqs last  enabled at (1586): [<ffffffff81a477ea>] wb_workfn+0x64a/0x830
> [   61.545094] softirqs last disabled at (1582): [<ffffffff817b63c7>] wb_wakeup_delayed+0x67/0xf0
> [   61.546416] ---[ end trace 67489de2f872e2d1 ]---
> 

-- 
Pavel Begunkov


* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-02 11:31 [PATCH] io_uring: always let io_iopoll_complete() complete polled io Xiaoguang Wang
  2020-12-02 17:00 ` Pavel Begunkov
  2020-12-03  2:30 ` Joseph Qi
@ 2020-12-06 22:25 ` Pavel Begunkov
  2020-12-11 12:29 ` Pavel Begunkov
  3 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-06 22:25 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe, joseph.qi

On 02/12/2020 11:31, Xiaoguang Wang wrote:
[...]
> The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
> we complete the req by calling io_req_complete(), which holds completion_lock
> while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
> calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
> may be accessed concurrently and a double free can happen.
> 
> To fix this bug, we always let io_iopoll_complete() complete polled io.

I took this one into a series.

-- 
Pavel Begunkov


* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-02 11:31 [PATCH] io_uring: always let io_iopoll_complete() complete polled io Xiaoguang Wang
                   ` (2 preceding siblings ...)
  2020-12-06 22:25 ` Pavel Begunkov
@ 2020-12-11 12:29 ` Pavel Begunkov
  2020-12-11 14:59   ` Xiaoguang Wang
  3 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-11 12:29 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe, joseph.qi

On 02/12/2020 11:31, Xiaoguang Wang wrote:
> Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
> [...]
> 
> The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
> we complete the req by calling io_req_complete(), which holds completion_lock
> while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
> calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
> may be accessed concurrently and a double free can happen.
> 
> To fix this bug, we always let io_iopoll_complete() complete polled io.
> 
> Reported-by: Abaci Fuzz <[email protected]>
> Signed-off-by: Xiaoguang Wang <[email protected]>
> ---
>  fs/io_uring.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index a8c136a..901ca67 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6074,8 +6074,19 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
>  	}
>  
>  	if (ret) {
> -		req_set_fail_links(req);
> -		io_req_complete(req, ret);
> +		/*
> +		 * io_iopoll_complete() does not hold completion_lock to complete
> +		 * polled io, so here for polled io, just mark it done and still let
> +		 * io_iopoll_complete() complete it.
> +		 */
> +		if (req->ctx->flags & IORING_SETUP_IOPOLL) {
> +			struct kiocb *kiocb = &req->rw.kiocb;

Also remembered that apart from rw, IOPOLL can do buf reg/unreg requests, so
this won't work for them. Could you fix it?

> +
> +			kiocb_done(kiocb, ret, NULL);
> +		} else {
> +			req_set_fail_links(req);
> +			io_req_complete(req, ret);
> +		}
>  	}
>  
>  	return io_steal_work(req);
> 

-- 
Pavel Begunkov


* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-11 12:29 ` Pavel Begunkov
@ 2020-12-11 14:59   ` Xiaoguang Wang
  2020-12-11 18:20     ` Pavel Begunkov
  0 siblings, 1 reply; 10+ messages in thread
From: Xiaoguang Wang @ 2020-12-11 14:59 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring; +Cc: axboe, joseph.qi

hi Pavel,

> On 02/12/2020 11:31, Xiaoguang Wang wrote:
>> Abaci Fuzz reported a double-free or invalid-free BUG in io_commit_cqring():
>> [...]
>>
>> The reason is that once we get a non-EAGAIN error in io_wq_submit_work(),
>> we complete the req by calling io_req_complete(), which holds completion_lock
>> while calling io_commit_cqring(). For polled io, however, io_iopoll_complete()
>> calls io_commit_cqring() without holding completion_lock, so ctx->defer_list
>> may be accessed concurrently and a double free can happen.
>>
>> To fix this bug, we always let io_iopoll_complete() complete polled io.
>>
>> Reported-by: Abaci Fuzz <[email protected]>
>> Signed-off-by: Xiaoguang Wang <[email protected]>
>> ---
>>   fs/io_uring.c | 15 +++++++++++++--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index a8c136a..901ca67 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -6074,8 +6074,19 @@ static struct io_wq_work *io_wq_submit_work(struct io_wq_work *work)
>>   	}
>>   
>>   	if (ret) {
>> -		req_set_fail_links(req);
>> -		io_req_complete(req, ret);
>> +		/*
>> +		 * io_iopoll_complete() does not hold completion_lock to complete
>> +		 * polled io, so here for polled io, just mark it done and still let
>> +		 * io_iopoll_complete() complete it.
>> +		 */
>> +		if (req->ctx->flags & IORING_SETUP_IOPOLL) {
>> +			struct kiocb *kiocb = &req->rw.kiocb;
> 
> Also remembered that apart from rw, IOPOLL can do buf reg/unreg requests, so
> this won't work for them. Could you fix it?
Sorry for the late reply; I saw your and Jens' discussion in a previous mail.
I was just busy recently and didn't have time to look into this issue. Now I
have time to handle it and will take a look tomorrow, thanks.

Regards,
Xiaoguang Wang



* Re: [PATCH] io_uring: always let io_iopoll_complete() complete polled io.
  2020-12-11 14:59   ` Xiaoguang Wang
@ 2020-12-11 18:20     ` Pavel Begunkov
  0 siblings, 0 replies; 10+ messages in thread
From: Pavel Begunkov @ 2020-12-11 18:20 UTC (permalink / raw)
  To: Xiaoguang Wang, io-uring; +Cc: axboe, joseph.qi

On 11/12/2020 14:59, Xiaoguang Wang wrote:
> Sorry for the late reply; I saw your and Jens' discussion in a previous mail.
> I was just busy recently and didn't have time to look into this issue. Now I
> have time to handle it and will take a look tomorrow, thanks.

All cool. As Jens dropped it and decided to do that for 5.11, we're not in a
hurry; I just wanted to point out stuff to look after! You could also look for
a similar problem with io_cqring_events() if you're interested.

One more thing to ponder is failing -EAGAIN'ed requests in io_iopoll_complete()
if NOWAIT is set.

-- 
Pavel Begunkov

