* iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Nadav Amit @ 2021-08-09 4:36 UTC
To: Jens Axboe; +Cc: io-uring

Jens, others,

Sorry for bothering again, but I encountered a lockdep assertion failure:

[ 106.009878] ------------[ cut here ]------------
[ 106.012487] WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
[ 106.014524] Modules linked in:
[ 106.015174] CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
[ 106.016653] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
[ 106.018959] RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
[ 106.020344] Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
[ 106.026258] RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
[ 106.028143] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
[ 106.029626] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
[ 106.031340] RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
[ 106.032938] R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
[ 106.034363] R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
[ 106.036115] FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
[ 106.037855] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 106.039010] CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
[ 106.040453] Call Trace:
[ 106.041245] _raw_spin_unlock_bh+0x31/0x40
[ 106.042543] io_rsrc_node_ref_zero+0x13e/0x190
[ 106.043471] io_dismantle_req+0x215/0x220
[ 106.044297] io_req_complete_post+0x1b8/0x720
[ 106.045456] __io_complete_rw.isra.0+0x16b/0x1f0
[ 106.046593] io_complete_rw+0x10/0x20

[ .... The rest of the call-stack is my stuff ]


Apparently, io_req_complete_post() disables IRQs and this code-path seems
valid (IOW: I did not somehow cause this failure). I am not familiar with
this code, so some feedback would be appreciated.

Thanks,
Nadav

* Re: iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Jens Axboe @ 2021-08-09 13:49 UTC
To: Nadav Amit; +Cc: io-uring

On 8/8/21 10:36 PM, Nadav Amit wrote:
> Jens, others,
>
> Sorry for bothering again, but I encountered a lockdep assertion failure:
>
> [ 106.009878] ------------[ cut here ]------------
> [ 106.012487] WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
> [ 106.014524] Modules linked in:
> [ 106.015174] CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
> [ 106.016653] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
> [ 106.018959] RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
> [ 106.020344] Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
> [ 106.026258] RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
> [ 106.028143] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
> [ 106.029626] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
> [ 106.031340] RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
> [ 106.032938] R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
> [ 106.034363] R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
> [ 106.036115] FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
> [ 106.037855] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 106.039010] CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
> [ 106.040453] Call Trace:
> [ 106.041245] _raw_spin_unlock_bh+0x31/0x40
> [ 106.042543] io_rsrc_node_ref_zero+0x13e/0x190
> [ 106.043471] io_dismantle_req+0x215/0x220
> [ 106.044297] io_req_complete_post+0x1b8/0x720
> [ 106.045456] __io_complete_rw.isra.0+0x16b/0x1f0
> [ 106.046593] io_complete_rw+0x10/0x20
>
> [ .... The rest of the call-stack is my stuff ]
>
>
> Apparently, io_req_complete_post() disables IRQs and this code-path seems
> valid (IOW: I did not somehow cause this failure). I am not familiar with
> this code, so some feedback would be appreciated.

Can you try with this patch?

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca064486cb41..6a8257233061 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -7138,16 +7138,6 @@ static void **io_alloc_page_table(size_t size)
 	return table;
 }

-static inline void io_rsrc_ref_lock(struct io_ring_ctx *ctx)
-{
-	spin_lock_bh(&ctx->rsrc_ref_lock);
-}
-
-static inline void io_rsrc_ref_unlock(struct io_ring_ctx *ctx)
-{
-	spin_unlock_bh(&ctx->rsrc_ref_lock);
-}
-
 static void io_rsrc_node_destroy(struct io_rsrc_node *ref_node)
 {
 	percpu_ref_exit(&ref_node->refs);
@@ -7164,9 +7154,9 @@ static void io_rsrc_node_switch(struct io_ring_ctx *ctx,
 	struct io_rsrc_node *rsrc_node = ctx->rsrc_node;

 	rsrc_node->rsrc_data = data_to_kill;
-	io_rsrc_ref_lock(ctx);
+	spin_lock_irq(&ctx->rsrc_ref_lock);
 	list_add_tail(&rsrc_node->node, &ctx->rsrc_ref_list);
-	io_rsrc_ref_unlock(ctx);
+	spin_unlock_irq(&ctx->rsrc_ref_lock);

 	atomic_inc(&data_to_kill->refs);
 	percpu_ref_kill(&rsrc_node->refs);
@@ -7674,9 +7664,10 @@ static void io_rsrc_node_ref_zero(struct percpu_ref *ref)
 {
 	struct io_rsrc_node *node = container_of(ref, struct io_rsrc_node, refs);
 	struct io_ring_ctx *ctx = node->rsrc_data->ctx;
+	unsigned long flags;
 	bool first_add = false;

-	io_rsrc_ref_lock(ctx);
+	spin_lock_irqsave(&ctx->rsrc_ref_lock, flags);
 	node->done = true;

 	while (!list_empty(&ctx->rsrc_ref_list)) {
@@ -7688,7 +7679,7 @@ static void io_rsrc_node_ref_zero(struct percpu_ref *ref)
 		list_del(&node->node);
 		first_add |= llist_add(&node->llist, &ctx->rsrc_put_llist);
 	}
-	io_rsrc_ref_unlock(ctx);
+	spin_unlock_irqrestore(&ctx->rsrc_ref_lock, flags);

 	if (first_add)
 		mod_delayed_work(system_wq, &ctx->rsrc_put_work, HZ);

--
Jens Axboe

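To illustrate why the change above plausibly addresses the warning, here is a minimal user-space sketch. It assumes, as the trace and the report suggest, that the check in __local_bh_enable_ip() fires when bottom halves are re-enabled while interrupts are off, and that the irqsave/irqrestore pair simply puts back whatever interrupt state the caller had. Everything in it (cpu_model, the model_* helpers, the warnings_with_* functions) is hypothetical and exists only for this toy model; it takes no real lock and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-CPU state (hypothetical; the kernel tracks this
 * in the CPU flags register and in preempt_count). */
struct cpu_model {
	bool irqs_enabled;     /* are hardware interrupts enabled? */
	int bh_disable_depth;  /* nesting depth of "bottom halves disabled" */
	int warnings;          /* number of times the modeled check fired */
};

/* Models the BH-disabling half of spin_lock_bh(). */
static void model_spin_lock_bh(struct cpu_model *cpu)
{
	cpu->bh_disable_depth++;
}

/* Models the BH-enabling half of spin_unlock_bh(): the modeled check
 * complains when interrupts are off at this point. */
static void model_spin_unlock_bh(struct cpu_model *cpu)
{
	if (!cpu->irqs_enabled)
		cpu->warnings++;
	cpu->bh_disable_depth--;
}

/* Models spin_lock_irqsave(): remember the current state, then disable. */
static unsigned long model_spin_lock_irqsave(struct cpu_model *cpu)
{
	unsigned long flags = cpu->irqs_enabled ? 1UL : 0UL;

	cpu->irqs_enabled = false;
	return flags;
}

/* Models spin_unlock_irqrestore(): put back exactly the saved state. */
static void model_spin_unlock_irqrestore(struct cpu_model *cpu,
					 unsigned long flags)
{
	cpu->irqs_enabled = (flags != 0);
}

/* Old locking scheme: _bh lock/unlock in whatever context the caller has. */
static int warnings_with_bh_variant(bool caller_irqs_enabled)
{
	struct cpu_model cpu = { .irqs_enabled = caller_irqs_enabled };

	model_spin_lock_bh(&cpu);
	model_spin_unlock_bh(&cpu);
	return cpu.warnings;
}

/* Patched scheme: irqsave/irqrestore; also reports the final IRQ state. */
static int warnings_with_irqsave_variant(bool caller_irqs_enabled,
					 bool *irqs_after)
{
	struct cpu_model cpu = { .irqs_enabled = caller_irqs_enabled };
	unsigned long flags = model_spin_lock_irqsave(&cpu);

	model_spin_unlock_irqrestore(&cpu, flags);
	*irqs_after = cpu.irqs_enabled;
	return cpu.warnings;
}

/* Self-check covering the scenario from the report (caller has IRQs off). */
static void run_model_checks(void)
{
	bool after = true;

	assert(warnings_with_bh_variant(false) == 1);
	assert(warnings_with_irqsave_variant(false, &after) == 0);
	assert(!after); /* the caller's IRQs-off state is preserved */
}
```

Calling run_model_checks() from a main() exercises the case from the report: with the caller's interrupts off, the _bh variant trips the modeled check once, while the irqsave variant does not and leaves interrupts off for the caller, as it found them.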
* Re: iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Nadav Amit @ 2021-08-09 22:01 UTC
To: Jens Axboe; +Cc: io-uring

> On Aug 9, 2021, at 6:49 AM, Jens Axboe <[email protected]> wrote:
>
> On 8/8/21 10:36 PM, Nadav Amit wrote:
>> Jens, others,
>>
>> Sorry for bothering again, but I encountered a lockdep assertion failure:
>>
>> [ 106.009878] ------------[ cut here ]------------
>> [ 106.012487] WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
>> [ 106.014524] Modules linked in:
>> [ 106.015174] CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
>> [ 106.016653] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
>> [ 106.018959] RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
>> [ 106.020344] Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
>> [ 106.026258] RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
>> [ 106.028143] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
>> [ 106.029626] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
>> [ 106.031340] RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
>> [ 106.032938] R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
>> [ 106.034363] R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
>> [ 106.036115] FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
>> [ 106.037855] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 106.039010] CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
>> [ 106.040453] Call Trace:
>> [ 106.041245] _raw_spin_unlock_bh+0x31/0x40
>> [ 106.042543] io_rsrc_node_ref_zero+0x13e/0x190
>> [ 106.043471] io_dismantle_req+0x215/0x220
>> [ 106.044297] io_req_complete_post+0x1b8/0x720
>> [ 106.045456] __io_complete_rw.isra.0+0x16b/0x1f0
>> [ 106.046593] io_complete_rw+0x10/0x20
>>
>> [ .... The rest of the call-stack is my stuff ]
>>
>>
>> Apparently, io_req_complete_post() disables IRQs and this code-path seems
>> valid (IOW: I did not somehow cause this failure). I am not familiar with
>> this code, so some feedback would be appreciated.
>
> Can you try with this patch?

Thanks! I might have hit another issue, but apparently even if it is
real, it is unrelated.

Tested-by: Nadav Amit <[email protected]>

* Re: iouring locking issue in io_req_complete_post() / io_rsrc_node_ref_zero()
From: Jens Axboe @ 2021-08-09 22:11 UTC
To: Nadav Amit; +Cc: io-uring

On 8/9/21 4:01 PM, Nadav Amit wrote:
>
>> On Aug 9, 2021, at 6:49 AM, Jens Axboe <[email protected]> wrote:
>>
>> On 8/8/21 10:36 PM, Nadav Amit wrote:
>>> Jens, others,
>>>
>>> Sorry for bothering again, but I encountered a lockdep assertion failure:
>>>
>>> [ 106.009878] ------------[ cut here ]------------
>>> [ 106.012487] WARNING: CPU: 2 PID: 1777 at kernel/softirq.c:364 __local_bh_enable_ip+0xaa/0xe0
>>> [ 106.014524] Modules linked in:
>>> [ 106.015174] CPU: 2 PID: 1777 Comm: umem Not tainted 5.13.1+ #161
>>> [ 106.016653] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/22/2020
>>> [ 106.018959] RIP: 0010:__local_bh_enable_ip+0xaa/0xe0
>>> [ 106.020344] Code: a9 00 ff ff 00 74 38 65 ff 0d a2 21 8c 7a e8 ed 1a 20 00 fb 66 0f 1f 44 00 00 5b 41 5c 5d c3 65 8b 05 e6 2d 8c 7a 85 c0 75 9a <0f> 0b eb 96 e8 2d 1f 20 00 eb a5 4c 89 e7 e8 73 4f 0c 00 eb ae 65
>>> [ 106.026258] RSP: 0018:ffff88812e58fcc8 EFLAGS: 00010046
>>> [ 106.028143] RAX: 0000000000000000 RBX: 0000000000000201 RCX: dffffc0000000000
>>> [ 106.029626] RDX: 0000000000000007 RSI: 0000000000000201 RDI: ffffffff8898c5ac
>>> [ 106.031340] RBP: ffff88812e58fcd8 R08: ffffffff8575dbbf R09: ffffed1028ef14f9
>>> [ 106.032938] R10: ffff88814778a7c3 R11: ffffed1028ef14f8 R12: ffffffff85c9e9ae
>>> [ 106.034363] R13: ffff88814778a000 R14: ffff88814778a7b0 R15: ffff8881086db890
>>> [ 106.036115] FS: 00007fbcfee17700(0000) GS:ffff8881e0300000(0000) knlGS:0000000000000000
>>> [ 106.037855] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 106.039010] CR2: 000000c0402a5008 CR3: 000000011c1ac003 CR4: 00000000003706e0
>>> [ 106.040453] Call Trace:
>>> [ 106.041245] _raw_spin_unlock_bh+0x31/0x40
>>> [ 106.042543] io_rsrc_node_ref_zero+0x13e/0x190
>>> [ 106.043471] io_dismantle_req+0x215/0x220
>>> [ 106.044297] io_req_complete_post+0x1b8/0x720
>>> [ 106.045456] __io_complete_rw.isra.0+0x16b/0x1f0
>>> [ 106.046593] io_complete_rw+0x10/0x20
>>>
>>> [ .... The rest of the call-stack is my stuff ]
>>>
>>>
>>> Apparently, io_req_complete_post() disables IRQs and this code-path seems
>>> valid (IOW: I did not somehow cause this failure). I am not familiar with
>>> this code, so some feedback would be appreciated.
>>
>> Can you try with this patch?
>
> Thanks! I might have hit another issue, but apparently even if it is
> real, it is unrelated.
>
> Tested-by: Nadav Amit <[email protected]>

Thanks for testing! And regarding another issue, I would expect nothing
less :-). It's always interesting to see new paths being paved, and
inevitably that'll shake out a few issues in code that's been less
exercised than the general part.

--
Jens Axboe