From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7a083b4e-f9f3-552b-0e6c-32bf44982d8f@bytedance.com>
Date: Fri, 1 Sep 2023 09:50:02 +0800
Subject: Re: [PATCH] io_uring: fix IO hang in io_wq_put_and_exit from do_exit()
To: Ming Lei, Jens Axboe, io-uring@vger.kernel.org, linux-block@vger.kernel.org
Cc: David Howells
References: <20230831074221.2309565-1-ming.lei@redhat.com>
From: Chengming Zhou
In-Reply-To: <20230831074221.2309565-1-ming.lei@redhat.com>

On 2023/8/31 15:42, Ming Lei wrote:
> io_wq_put_and_exit() is called from do_exit(), but all requests in io_wq
> aren't cancelled in io_uring_cancel_generic() called from do_exit().
> Meantime io_wq IO code path may share resource with normal iopoll code
> path.
> 
> So if any HIPRI request is pending in io_wq_submit_work(), this request
> may not get resource for moving on, given iopoll isn't possible in
> io_wq_put_and_exit().
> 
> The issue can be triggered when terminating 't/io_uring -n4 /dev/nullb0'
> with default null_blk parameters.
> 
> Fix it by always cancelling all requests in io_wq from io_uring_cancel_generic(),
> and this way is reasonable because io_wq destroying follows cancelling
> requests immediately. Based on one patch from Chengming.

Thanks a lot for this work. I'm still learning this code, so maybe some
silly questions below.

> 
> Closes: https://lore.kernel.org/linux-block/3893581.1691785261@warthog.procyon.org.uk/
> Reported-by: David Howells
> Cc: Chengming Zhou,
> Signed-off-by: Ming Lei
> ---
>  io_uring/io_uring.c | 40 ++++++++++++++++++++++++++++------------
>  1 file changed, 28 insertions(+), 12 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index e7675355048d..18d5ab969c29 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -144,7 +144,7 @@ struct io_defer_entry {
>  
>  static bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>  					 struct task_struct *task,
> -					 bool cancel_all);
> +					 bool cancel_all, bool *wq_cancelled);
>  
>  static void io_queue_sqe(struct io_kiocb *req);
>  
> @@ -3049,7 +3049,7 @@ static __cold void io_ring_exit_work(struct work_struct *work)
>  		if (ctx->flags & IORING_SETUP_DEFER_TASKRUN)
>  			io_move_task_work_from_local(ctx);
>  
> -		while (io_uring_try_cancel_requests(ctx, NULL, true))
> +		while (io_uring_try_cancel_requests(ctx, NULL, true, NULL))
>  			cond_resched();
>  
>  		if (ctx->sq_data) {
> @@ -3231,12 +3231,13 @@ static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx)
>  
>  static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>  						struct task_struct *task,
> -						bool cancel_all)
> +						bool cancel_all, bool *wq_cancelled)
>  {
> -	struct io_task_cancel cancel = { .task = task, .all = cancel_all, };
> +	struct io_task_cancel cancel = { .task = task, .all = true, };
>  	struct io_uring_task *tctx = task ? task->io_uring : NULL;
>  	enum io_wq_cancel cret;
>  	bool ret = false;
> +	bool wq_active = false;
>  
>  	/* set it so io_req_local_work_add() would wake us up */
>  	if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
> @@ -3249,7 +3250,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>  		return false;
>  
>  	if (!task) {
> -		ret |= io_uring_try_cancel_iowq(ctx);
> +		wq_active = io_uring_try_cancel_iowq(ctx);
>  	} else if (tctx && tctx->io_wq) {
>  		/*
>  		 * Cancels requests of all rings, not only @ctx, but
> @@ -3257,11 +3258,20 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
>  		 */
>  		cret = io_wq_cancel_cb(tctx->io_wq, io_cancel_task_cb,
>  				       &cancel, true);
> -		ret |= (cret != IO_WQ_CANCEL_NOTFOUND);
> +		wq_active = (cret != IO_WQ_CANCEL_NOTFOUND);
>  	}
> +	ret |= wq_active;
> +	if (wq_cancelled)
> +		*wq_cancelled = !wq_active;

Here it seems "wq_cancelled" means there is no pending or running io_wq
work anymore.

Why not just use the return value "loop" instead of this new "wq_cancelled"?
If "loop" is true, we know there are still requests that need cancelling,
so we keep looping over the cancel process until nothing is left.

Ah, I guess you may want to cover one case: !wq_active && loop == true.
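To make sure I'm reading the two flags right, here is how I understand the
combinations (sketch only, my reading, not code from the patch):

	/*
	 * Sketch of my reading:
	 *
	 *   wq_active   loop    meaning
	 *   ---------   -----   ------------------------------------------
	 *   true        true    io_wq work still found -> keep cancelling
	 *   false       true    io_wq drained, but timeout/poll/deferred
	 *                       cancelling still found work -> keep looping
	 *   false       false   nothing found anywhere -> safe to stop
	 *
	 * so the caller could stop only when both conditions clear:
	 */
	if (!loop && wq_cancelled)	/* wq_cancelled == !wq_active */
		break;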
> 
> -	/* SQPOLL thread does its own polling */
> -	if ((!(ctx->flags & IORING_SETUP_SQPOLL) && cancel_all) ||
> +	/*
> +	 * SQPOLL thread does its own polling
> +	 *
> +	 * io_wq may share IO resources(such as requests) with iopoll, so
> +	 * iopoll requests have to be reapped for providing forward
> +	 * progress to io_wq cancelling
> +	 */
> +	if (!(ctx->flags & IORING_SETUP_SQPOLL) ||
>  	    (ctx->sq_data && ctx->sq_data->thread == current)) {
>  		while (!wq_list_empty(&ctx->iopoll_list)) {
>  			io_iopoll_try_reap_events(ctx);
> @@ -3313,11 +3323,12 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
>  	atomic_inc(&tctx->in_cancel);
>  	do {
>  		bool loop = false;
> +		bool wq_cancelled;
>  
>  		io_uring_drop_tctx_refs(current);
>  		/* read completions before cancelations */
>  		inflight = tctx_inflight(tctx, !cancel_all);
> -		if (!inflight)
> +		if (!inflight && !tctx->io_wq)
>  			break;
> 

I think this inflight check should be put after the cancel loop: the cancel
loop makes sure there is no request left to cancel, and then we can loop on
the inflight check to make sure all inflight requests complete.

>  		if (!sqd) {
> @@ -3326,20 +3337,25 @@ __cold void io_uring_cancel_generic(bool cancel_all, struct io_sq_data *sqd)
>  				if (node->ctx->sq_data)
>  					continue;
>  				loop |= io_uring_try_cancel_requests(node->ctx,
> -								     current, cancel_all);
> +								     current, cancel_all,
> +								     &wq_cancelled);
>  			}
>  		} else {
>  			list_for_each_entry(ctx, &sqd->ctx_list, sqd_list)
>  				loop |= io_uring_try_cancel_requests(ctx,
>  								     current,
> -								     cancel_all);
> +								     cancel_all,
> +								     &wq_cancelled);
>  		}
>  
> -		if (loop) {
> +		if (!wq_cancelled || (inflight && loop)) {
>  			cond_resched();
>  			continue;
>  		}

So here we would just need to check "loop", and continue cancelling if it
is true? Something like the sketch at the end of this mail. Thanks!

> 
> +		if (!inflight)
> +			break;
> +
>  		prepare_to_wait(&tctx->wait, &wait, TASK_INTERRUPTIBLE);
>  		io_run_task_work();
>  		io_uring_drop_tctx_refs(current);
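Putting my questions above together, the exit loop I have in mind would look
roughly like below. This is an untested sketch, not a real patch: it shows
only the !sqd branch, uses the same helpers this patch already touches, and
assumes the xa_for_each()/index/node iteration of the existing function:

	do {
		bool loop = false;

		io_uring_drop_tctx_refs(current);

		/* first: cancel until nothing more is found anywhere */
		xa_for_each(&tctx->xa, index, node)
			loop |= io_uring_try_cancel_requests(node->ctx,
							     current, cancel_all);
		if (loop) {
			cond_resched();
			continue;
		}

		/* only then: wait for the remaining inflight requests */
		inflight = tctx_inflight(tctx, !cancel_all);
		if (!inflight)
			break;

		prepare_to_wait(&tctx->wait, &wait, TASK_INTERRUPTIBLE);
		io_run_task_work();
		io_uring_drop_tctx_refs(current);
		...
	} while (1);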