From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,UNPARSEABLE_RELAY,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0108CC433E9 for ; Mon, 25 Jan 2021 07:35:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id BE2CB22ADF for ; Mon, 25 Jan 2021 07:35:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727120AbhAYHfJ (ORCPT ); Mon, 25 Jan 2021 02:35:09 -0500 Received: from out30-43.freemail.mail.aliyun.com ([115.124.30.43]:41129 "EHLO out30-43.freemail.mail.aliyun.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727185AbhAYHac (ORCPT ); Mon, 25 Jan 2021 02:30:32 -0500 X-Alimail-AntiSpam: AC=PASS;BC=-1|-1;BR=01201311R521e4;CH=green;DM=||false|;DS=||;FP=0|-1|-1|-1|0|-1|-1|-1;HT=e01e01424;MF=haoxu@linux.alibaba.com;NM=1;PH=DS;RN=3;SR=0;TI=SMTPD_---0UMmqIr4_1611559732; Received: from B-25KNML85-0107.local(mailfrom:haoxu@linux.alibaba.com fp:SMTPD_---0UMmqIr4_1611559732) by smtp.aliyun-inc.com(127.0.0.1); Mon, 25 Jan 2021 15:28:52 +0800 Subject: Re: [PATCH] io_uring: don't recursively hold ctx->uring_lock in io_wq_submit_work() To: Jens Axboe Cc: io-uring@vger.kernel.org, Joseph Qi References: <1611394824-73078-1-git-send-email-haoxu@linux.alibaba.com> <45a0221a-bd2b-7183-e35d-2d2550f687b5@kernel.dk> From: Hao Xu Message-ID: Date: Mon, 25 Jan 2021 15:28:52 +0800 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Thunderbird/78.6.1 MIME-Version: 1.0 In-Reply-To: <45a0221a-bd2b-7183-e35d-2d2550f687b5@kernel.dk> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org 在 2021/1/25 下午12:31, Jens Axboe 写道: > On 1/23/21 2:40 AM, Hao Xu wrote: >> Abaci reported the following warning: >> >> [ 97.862205] ============================================ >> [ 97.863400] WARNING: possible recursive locking detected >> [ 97.864640] 5.11.0-rc4+ #12 Not tainted >> [ 97.865537] -------------------------------------------- >> [ 97.866748] a.out/2890 is trying to acquire lock: >> [ 97.867829] ffff8881046763e8 (&ctx->uring_lock){+.+.}-{3:3}, at: >> io_wq_submit_work+0x155/0x240 >> [ 97.869735] >> [ 97.869735] but task is already holding lock: >> [ 97.871033] ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: >> __x64_sys_io_uring_enter+0x3f0/0x5b0 >> [ 97.873074] >> [ 97.873074] other info that might help us debug this: >> [ 97.874520] Possible unsafe locking scenario: >> [ 97.874520] >> [ 97.875845] CPU0 >> [ 97.876440] ---- >> [ 97.877048] lock(&ctx->uring_lock); >> [ 97.877961] lock(&ctx->uring_lock); >> [ 97.878881] >> [ 97.878881] *** DEADLOCK *** >> [ 97.878881] >> [ 97.880341] May be due to missing lock nesting notation >> [ 97.880341] >> [ 97.881952] 1 lock held by a.out/2890: >> [ 97.882873] #0: ffff88810dfe0be8 (&ctx->uring_lock){+.+.}-{3:3}, at: >> __x64_sys_io_uring_enter+0x3f0/0x5b0 >> [ 97.885108] >> [ 97.885108] stack backtrace: >> [ 97.886209] CPU: 0 PID: 2890 Comm: a.out Not tainted 5.11.0-rc4+ #12 >> [ 97.887683] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS >> rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 >> [ 97.890457] Call Trace: >> [ 97.891121] dump_stack+0xac/0xe3 >> [ 97.891972] __lock_acquire+0xab6/0x13a0 >> [ 97.892940] lock_acquire+0x2c3/0x390 >> [ 97.893853] ? io_wq_submit_work+0x155/0x240 >> [ 97.894894] __mutex_lock+0xae/0x9f0 >> [ 97.895785] ? io_wq_submit_work+0x155/0x240 >> [ 97.896816] ? __lock_acquire+0x782/0x13a0 >> [ 97.897817] ? io_wq_submit_work+0x155/0x240 >> [ 97.898867] ? io_wq_submit_work+0x155/0x240 >> [ 97.899916] ? _raw_spin_unlock_irqrestore+0x2d/0x40 >> [ 97.901101] io_wq_submit_work+0x155/0x240 >> [ 97.902112] io_wq_cancel_cb+0x162/0x490 >> [ 97.903084] ? io_uring_get_socket+0x40/0x40 >> [ 97.904126] io_async_find_and_cancel+0x3b/0x140 >> [ 97.905247] io_issue_sqe+0x86d/0x13e0 >> [ 97.906186] ? __lock_acquire+0x782/0x13a0 >> [ 97.907195] ? __io_queue_sqe+0x10b/0x550 >> [ 97.908175] ? lock_acquire+0x2c3/0x390 >> [ 97.909122] __io_queue_sqe+0x10b/0x550 >> [ 97.910080] ? io_req_prep+0xd8/0x1090 >> [ 97.911044] ? mark_held_locks+0x5a/0x80 >> [ 97.912042] ? mark_held_locks+0x5a/0x80 >> [ 97.913014] ? io_queue_sqe+0x235/0x470 >> [ 97.913971] io_queue_sqe+0x235/0x470 >> [ 97.914894] io_submit_sqes+0xcce/0xf10 >> [ 97.915842] ? xa_store+0x3b/0x50 >> [ 97.916683] ? __x64_sys_io_uring_enter+0x3f0/0x5b0 >> [ 97.917872] __x64_sys_io_uring_enter+0x3fb/0x5b0 >> [ 97.918995] ? lockdep_hardirqs_on_prepare+0xde/0x180 >> [ 97.920204] ? syscall_enter_from_user_mode+0x26/0x70 >> [ 97.921424] do_syscall_64+0x2d/0x40 >> [ 97.922329] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> [ 97.923538] RIP: 0033:0x7f0b62601239 >> [ 97.924437] Code: 01 00 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 >> 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f >> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 27 ec 2c 00 f7 d8 64 89 01 >> 48 >> [ 97.928628] RSP: 002b:00007f0b62cc4d28 EFLAGS: 00000246 ORIG_RAX: >> 00000000000001aa >> [ 97.930422] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: >> 00007f0b62601239 >> [ 97.932073] RDX: 0000000000000000 RSI: 0000000000006cf6 RDI: >> 0000000000000005 >> [ 97.933710] RBP: 00007f0b62cc4e20 R08: 0000000000000000 R09: >> 0000000000000000 >> [ 97.935369] R10: 0000000000000000 R11: 0000000000000246 R12: >> 0000000000000000 >> [ 97.937008] R13: 0000000000021000 R14: 0000000000000000 R15: >> 00007f0b62cc5700 >> >> This is caused by try to hold uring_lock in io_wq_submit_work() without >> checking if we are in io-wq thread context or not. It can be in original >> context when io_wq_submit_work() is called from IORING_OP_ASYNC_CANCEL >> code path, where we already held uring_lock. > > Looks like another fallout of the split CLOSE handling. I've got the > right fixes pending for 5.12: > > https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.12/io_uring&id=6bb0079ef3420041886afe1bcd8e7a87e08992e1 > > (and the prep patch before that in the tree). But that won't really > help us for 5.11 and earlier, though we probably should just queue > those two patches for 5.11 and get them into stable. I really don't > like the below patch, though it should fix it. But the root cause > is really the weird open cancelation... > Hi Jens, Thank you for the reference, I've got it. Allow me to ask one question, I doubt this warning may be triggered as well by: static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); int work_flags; unsigned long flags; /* ¦* Do early check to see if we need a new unbound worker, and if we do, ¦* if we're allowed to do so. This isn't 100% accurate as there's a ¦* gap between this check and incrementing the value, but that's OK. ¦* It's close enough to not be an issue, fork() has the same delay. ¦*/ if (unlikely(!io_wq_can_queue(wqe, acct, work))) { io_run_cancel(work, wqe); // here return; } But I'm not sure. Thanks, Hao