* [PATCH] io_uring: leave clean req to be done in flush overflow
From: Joseph Qi @ 2021-01-20 8:11 UTC
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Xiaoguang Wang

Abaci reported the following BUG:

[   27.629441] BUG: sleeping function called from invalid context at fs/file.c:402
[   27.631317] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1012, name: io_wqe_worker-0
[   27.633220] 1 lock held by io_wqe_worker-0/1012:
[   27.634286]  #0: ffff888105e26c98 (&ctx->completion_lock){....}-{2:2}, at: __io_req_complete.part.102+0x30/0x70
[   27.636487] irq event stamp: 66658
[   27.637302] hardirqs last enabled at (66657): [<ffffffff8144ba02>] kmem_cache_free+0x1f2/0x3b0
[   27.639211] hardirqs last disabled at (66658): [<ffffffff82003a77>] _raw_spin_lock_irqsave+0x17/0x50
[   27.641196] softirqs last enabled at (64686): [<ffffffff824003c5>] __do_softirq+0x3c5/0x5aa
[   27.643062] softirqs last disabled at (64681): [<ffffffff8220108f>] asm_call_irq_on_stack+0xf/0x20
[   27.645029] CPU: 1 PID: 1012 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc4+ #68
[   27.646651] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[   27.649249] Call Trace:
[   27.649874]  dump_stack+0xac/0xe3
[   27.650666]  ___might_sleep+0x284/0x2c0
[   27.651566]  put_files_struct+0xb8/0x120
[   27.652481]  __io_clean_op+0x10c/0x2a0
[   27.653362]  __io_cqring_fill_event+0x2c1/0x350
[   27.654399]  __io_req_complete.part.102+0x41/0x70
[   27.655464]  io_openat2+0x151/0x300
[   27.656297]  io_issue_sqe+0x6c/0x14e0
[   27.657170]  ? lock_acquire+0x31a/0x440
[   27.658068]  ? io_worker_handle_work+0x24e/0x8a0
[   27.659119]  ? find_held_lock+0x28/0xb0
[   27.660026]  ? io_wq_submit_work+0x7f/0x240
[   27.660991]  io_wq_submit_work+0x7f/0x240
[   27.661915]  ? trace_hardirqs_on+0x46/0x110
[   27.662890]  io_worker_handle_work+0x501/0x8a0
[   27.663917]  ? io_wqe_worker+0x135/0x520
[   27.664836]  io_wqe_worker+0x158/0x520
[   27.665719]  ? __kthread_parkme+0x96/0xc0
[   27.666663]  ? io_worker_handle_work+0x8a0/0x8a0
[   27.667726]  kthread+0x134/0x180
[   27.668506]  ? kthread_create_worker_on_cpu+0x90/0x90
[   27.669641]  ret_from_fork+0x1f/0x30

The report blames that we call cond_resched() while holding
completion_lock when cleaning the request. In fact the cleanup will be
done again when flushing the overflow list, and there seems to be no
reason to do it before that. So just remove the io_clean_op() call in
__io_cqring_fill_event() to fix this BUG.

Reported-by: Abaci <[email protected]>
Signed-off-by: Joseph Qi <[email protected]>
---
 fs/io_uring.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 985a9e3..9b937d1 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1860,7 +1860,6 @@ static void __io_cqring_fill_event(struct io_kiocb *req, long res, long cflags)
 		set_bit(0, &ctx->cq_check_overflow);
 		ctx->rings->sq_flags |= IORING_SQ_CQ_OVERFLOW;
 	}
-	io_clean_op(req);
 	req->result = res;
 	req->compl.cflags = cflags;
 	refcount_inc(&req->refs);
--
1.8.3.1
* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
From: Pavel Begunkov @ 2021-01-20 12:35 UTC
To: Joseph Qi, Jens Axboe, io-uring
Cc: Xiaoguang Wang

On 20/01/2021 08:11, Joseph Qi wrote:
> Abaci reported the following BUG:
>
> [... BUG splat snipped, quoted in full in the patch mail above ...]
>
> The report blames that we call cond_resched() while holding
> completion_lock when cleaning the request. In fact the cleanup will be
> done again when flushing the overflow list, and there seems to be no
> reason to do it before that. So just remove the io_clean_op() call in
> __io_cqring_fill_event() to fix this BUG.

Nope, that would be broken. You may overwrite, e.g., an iov pointer that
is dynamically allocated, and the function makes sure all those are
deleted and freed. Most probably there would be problems on the flush
side as well.

Looks like the problem is that we do spin_lock_irqsave() in
__io_req_complete() and then just spin_lock() for put_files_struct().
Jens, is it a real problem?

At least for 5.12 there is a cleanup as below, moving drop_files() into
io_req_clean_work()/io_free_req(), which is out of locks. It depends on
that don't-cancel-by-files patch, but I guess it can be for 5.11.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 4f702d03d375..3d3087851fed 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -614,7 +614,6 @@ enum {
 	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
 
 	REQ_F_FAIL_LINK_BIT,
-	REQ_F_INFLIGHT_BIT,
 	REQ_F_CUR_POS_BIT,
 	REQ_F_NOWAIT_BIT,
 	REQ_F_LINK_TIMEOUT_BIT,
@@ -647,8 +646,6 @@ enum {
 
 	/* fail rest of links */
 	REQ_F_FAIL_LINK		= BIT(REQ_F_FAIL_LINK_BIT),
-	/* on inflight list */
-	REQ_F_INFLIGHT		= BIT(REQ_F_INFLIGHT_BIT),
 	/* read/write uses file position */
 	REQ_F_CUR_POS		= BIT(REQ_F_CUR_POS_BIT),
 	/* must not punt to workers */
@@ -1057,8 +1054,7 @@ EXPORT_SYMBOL(io_uring_get_socket);
 
 static inline void io_clean_op(struct io_kiocb *req)
 {
-	if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED |
-			  REQ_F_INFLIGHT))
+	if (req->flags & (REQ_F_NEED_CLEANUP | REQ_F_BUFFER_SELECTED))
 		__io_clean_op(req);
 }
 
@@ -1375,6 +1371,11 @@ static void io_req_clean_work(struct io_kiocb *req)
 		free_fs_struct(fs);
 		req->work.flags &= ~IO_WQ_WORK_FS;
 	}
+	if (req->work.flags & IO_WQ_WORK_FILES) {
+		put_files_struct(req->work.identity->files);
+		put_nsproxy(req->work.identity->nsproxy);
+		req->work.flags &= ~IO_WQ_WORK_FILES;
+	}
 
 	io_put_identity(req->task->io_uring, req);
 }
@@ -1483,7 +1484,6 @@ static bool io_grab_identity(struct io_kiocb *req)
 			return false;
 		atomic_inc(&id->files->count);
 		get_nsproxy(id->nsproxy);
-		req->flags |= REQ_F_INFLIGHT;
 		req->work.flags |= IO_WQ_WORK_FILES;
 	}
 	if (!(req->work.flags & IO_WQ_WORK_MM) &&
@@ -6128,18 +6128,6 @@ static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return -EIOCBQUEUED;
 }
 
-static void io_req_drop_files(struct io_kiocb *req)
-{
-	struct io_uring_task *tctx = req->task->io_uring;
-
-	put_files_struct(req->work.identity->files);
-	put_nsproxy(req->work.identity->nsproxy);
-	req->flags &= ~REQ_F_INFLIGHT;
-	req->work.flags &= ~IO_WQ_WORK_FILES;
-	if (atomic_read(&tctx->in_idle))
-		wake_up(&tctx->wait);
-}
-
 static void __io_clean_op(struct io_kiocb *req)
 {
 	if (req->flags & REQ_F_BUFFER_SELECTED) {
@@ -6197,9 +6185,6 @@ static void __io_clean_op(struct io_kiocb *req)
 	}
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-
-	if (req->flags & REQ_F_INFLIGHT)
-		io_req_drop_files(req);
 }
 
 static int io_issue_sqe(struct io_kiocb *req, bool force_nonblock,

-- 
Pavel Begunkov
* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
From: Joseph Qi @ 2021-01-21 1:37 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
Cc: Xiaoguang Wang

On 1/20/21 8:35 PM, Pavel Begunkov wrote:
> On 20/01/2021 08:11, Joseph Qi wrote:
>> Abaci reported the following BUG:
>>
>> [... BUG splat and commit message snipped, see the patch mail above ...]
>
> Nope, that would be broken. You may overwrite, e.g., an iov pointer
> that is dynamically allocated, and the function makes sure all those
> are deleted and freed. Most probably there would be problems on the
> flush side as well.
>
> Looks like the problem is that we do spin_lock_irqsave() in
> __io_req_complete() and then just spin_lock() for put_files_struct().
> Jens, is it a real problem?

From the code, it is because it might sleep in close_files():

	...
	if (file) {
		filp_close(file, files);
		cond_resched();
	}

Thanks,
Joseph

> At least for 5.12 there is a cleanup as below, moving drop_files()
> into io_req_clean_work()/io_free_req(), which is out of locks. It
> depends on that don't-cancel-by-files patch, but I guess it can be
> for 5.11
* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
From: Pavel Begunkov @ 2021-01-21 2:00 UTC
To: Joseph Qi, Jens Axboe, io-uring
Cc: Xiaoguang Wang

On 21/01/2021 01:37, Joseph Qi wrote:
> On 1/20/21 8:35 PM, Pavel Begunkov wrote:
>> [... quoted BUG splat and earlier discussion snipped ...]
>
> From the code, it is because it might sleep in close_files():
>
> 	...
> 	if (file) {
> 		filp_close(file, files);
> 		cond_resched();
> 	}

Makes sense. The diff should handle it, but let's see if it would ever
be applicable after some other bug fixes.

-- 
Pavel Begunkov
* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
From: Xiaoguang Wang @ 2021-01-21 1:54 UTC
To: Pavel Begunkov, Joseph Qi, Jens Axboe, io-uring

hi Pavel,

> On 20/01/2021 08:11, Joseph Qi wrote:
>> Abaci reported the following BUG:
>>
>> [... BUG splat and commit message snipped, see the patch mail above ...]
>
> Nope, that would be broken. You may overwrite, e.g., an iov pointer
> that is dynamically allocated, and the function makes sure all those
> are deleted and freed. Most probably there would be problems on the
> flush side as well.

Could you please explain more about why this is a problem?
io_clean_op() just does some cleanup work: freeing allocated memory,
putting the file, etc., and these jobs can be done in
__io_cqring_overflow_flush():

	while (!list_empty(&list)) {
		req = list_first_entry(&list, struct io_kiocb, compl.list);
		list_del(&req->compl.list);
		io_put_req(req);	// will call io_clean_op
	}

And calling a single io_clean_op() in __io_cqring_fill_event() looks
weird.

Regards,
Xiaoguang Wang

> Looks like the problem is that we do spin_lock_irqsave() in
> __io_req_complete() and then just spin_lock() for put_files_struct().
> Jens, is it a real problem?
>
> At least for 5.12 there is a cleanup as below, moving drop_files()
> into io_req_clean_work()/io_free_req(), which is out of locks. It
> depends on that don't-cancel-by-files patch, but I guess it can be
> for 5.11
>
> [... diff snipped, quoted in full in Pavel's reply above ...]
* Re: [PATCH] io_uring: leave clean req to be done in flush overflow
From: Pavel Begunkov @ 2021-01-21 2:11 UTC
To: Xiaoguang Wang, Joseph Qi, Jens Axboe, io-uring

On 21/01/2021 01:54, Xiaoguang Wang wrote:
> hi Pavel,
>
>> [... quoted BUG splat and commit message snipped ...]
>>
>> Nope, that would be broken. You may overwrite, e.g., an iov pointer
>> that is dynamically allocated, and the function makes sure all those
>> are deleted and freed. Most probably there would be problems on the
>> flush side as well.
>
> Could you please explain more about why this is a problem?
> io_clean_op() just does some cleanup work: freeing allocated memory,
> putting the file, etc., and these jobs can be done in
> __io_cqring_overflow_flush():

struct io_kiocb {
	union {
		struct file		*file;
		struct io_rw		rw;
		...
		/* use only after cleaning per-op data, see io_clean_op() */
		struct io_completion	compl;
	};
};

io_clean_op() cleans everything in the first 64B (and not only that),
and that space is reused for the overflow list, etc.:

	io_clean_op(req);
	req->compl.cflags = cflags;
	-----
	list_add_tail(&req->compl.list, &ctx->cq_overflow_list);
	-----

That's the reason why we need to call it.

A bit of a different story is why it does drop_files(). At one time it
was in io_req_clean_work(), which is called without locks held, but
there were nasty races with cancellations of overflowed reqs, so it was
much easier to move it into io_clean_op(), so that we just never have
requests with ->files on the overflow list. As we have just changed that
cancellation scheme, those races no longer exist, and it can be moved
back as in the diff.

> 	while (!list_empty(&list)) {
> 		req = list_first_entry(&list, struct io_kiocb, compl.list);
> 		list_del(&req->compl.list);
> 		io_put_req(req);	// will call io_clean_op
> 	}

-- 
Pavel Begunkov
end of thread, other threads: [~2021-01-21 3:42 UTC | newest]

Thread overview: 6+ messages:
2021-01-20  8:11 [PATCH] io_uring: leave clean req to be done in flush overflow Joseph Qi
2021-01-20 12:35 ` Pavel Begunkov
2021-01-21  1:37   ` Joseph Qi
2021-01-21  2:00     ` Pavel Begunkov
2021-01-21  1:54   ` Xiaoguang Wang
2021-01-21  2:11     ` Pavel Begunkov