Subject: Re: Canceled read requests never completed
To: io-uring@vger.kernel.org, flow@cs.fau.de
References: <20220118151337.fac6cthvbnu7icoc@pasture>
 <81656a38-3628-e32f-1092-bacf7468a6bf@kernel.dk>
 <20220118200549.qybt7fgfqznscidx@pasture>
From: Jens Axboe
Message-ID: <1ec0f92c-117e-d584-f456-036d41348332@kernel.dk>
Date: Tue, 18 Jan 2022 16:36:59 -0700
In-Reply-To: <20220118200549.qybt7fgfqznscidx@pasture>

On 1/18/22 1:05 PM, Florian Fischer wrote:
>>> After reading the io_uring_enter(2) man page, it seems that when an
>>> IORING_OP_ASYNC_CANCEL returns -EALREADY, the canceled request may not
>>> terminate. At least that is our interpretation of "…res field will
>>> contain -EALREADY. In this case, the request may or may not terminate."
>>
>> I took a look at this, and my theory is that the request cancelation
>> ends up happening right in the window between the work item being
>> removed from the work list and it being assigned to the worker itself.
>> The way the async queue works, the work item sits in a list until it
>> gets assigned by a worker. When that assignment happens, it's removed
>> from the general work list and then assigned to the worker itself.
>> There's a small gap there where the work cannot be found in the
>> general list, and isn't yet findable in the worker itself either.
>>
>> Do you always see -ENOENT from the cancel when you get the hang
>> condition?
> 
> No, we also, and actually more commonly, observe cancel returning
> -EALREADY while the canceled read request never gets completed, as
> shown in the log snippet I included below.

I think there are a couple of different cases here. Can you try the
below patch? It's against current -git.

diff --git a/fs/io-wq.c b/fs/io-wq.c
index 5c4f582d6549..e8f8a5ee93c1 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -48,7 +48,8 @@ struct io_worker {
 	struct io_wqe *wqe;
 
 	struct io_wq_work *cur_work;
-	spinlock_t lock;
+	struct io_wq_work *next_work;
+	raw_spinlock_t lock;
 
 	struct completion ref_done;
 
@@ -529,9 +530,11 @@ static void io_assign_current_work(struct io_worker *worker,
 		cond_resched();
 	}
 
-	spin_lock(&worker->lock);
+	WARN_ON_ONCE(work != worker->next_work);
+	raw_spin_lock(&worker->lock);
 	worker->cur_work = work;
-	spin_unlock(&worker->lock);
+	worker->next_work = NULL;
+	raw_spin_unlock(&worker->lock);
 }
 
 static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work);
@@ -555,9 +558,20 @@ static void io_worker_handle_work(struct io_worker *worker)
 		 * clear the stalled flag.
 		 */
 		work = io_get_next_work(acct, worker);
-		if (work)
+		if (work) {
 			__io_worker_busy(wqe, worker, work);
 
+			/*
+			 * Make sure cancelation can find this, even before
+			 * it becomes the active work. That avoids a window
+			 * where the work has been removed from our general
+			 * work list, but isn't yet discoverable as the
+			 * current work item for this worker.
+			 */
+			raw_spin_lock(&worker->lock);
+			worker->next_work = work;
+			raw_spin_unlock(&worker->lock);
+		}
 		raw_spin_unlock(&wqe->lock);
 		if (!work)
 			break;
@@ -815,7 +829,7 @@ static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
 
 	refcount_set(&worker->ref, 1);
 	worker->wqe = wqe;
-	spin_lock_init(&worker->lock);
+	raw_spin_lock_init(&worker->lock);
 	init_completion(&worker->ref_done);
 
 	if (index == IO_WQ_ACCT_BOUND)
@@ -973,6 +987,19 @@ void io_wq_hash_work(struct io_wq_work *work, void *val)
 	work->flags |= (IO_WQ_WORK_HASHED | (bit << IO_WQ_HASH_SHIFT));
 }
 
+static bool __io_wq_worker_cancel(struct io_worker *worker,
+				  struct io_cb_cancel_data *match,
+				  struct io_wq_work *work)
+{
+	if (work && match->fn(work, match->data)) {
+		work->flags |= IO_WQ_WORK_CANCEL;
+		set_notify_signal(worker->task);
+		return true;
+	}
+
+	return false;
+}
+
 static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
 {
 	struct io_cb_cancel_data *match = data;
@@ -981,13 +1008,11 @@ static bool io_wq_worker_cancel(struct io_worker *worker, void *data)
 	 * Hold the lock to avoid ->cur_work going out of scope, caller
 	 * may dereference the passed in work.
 	 */
-	spin_lock(&worker->lock);
-	if (worker->cur_work &&
-	    match->fn(worker->cur_work, match->data)) {
-		set_notify_signal(worker->task);
+	raw_spin_lock(&worker->lock);
+	if (__io_wq_worker_cancel(worker, match, worker->cur_work) ||
+	    __io_wq_worker_cancel(worker, match, worker->next_work))
 		match->nr_running++;
-	}
-	spin_unlock(&worker->lock);
+	raw_spin_unlock(&worker->lock);
 
 	return match->nr_running && !match->cancel_all;
 }
@@ -1039,17 +1064,16 @@ static void io_wqe_cancel_pending_work(struct io_wqe *wqe,
 {
 	int i;
 retry:
-	raw_spin_lock(&wqe->lock);
 	for (i = 0; i < IO_WQ_ACCT_NR; i++) {
 		struct io_wqe_acct *acct = io_get_acct(wqe, i == 0);
 
 		if (io_acct_cancel_pending_work(wqe, acct, match)) {
+			raw_spin_lock(&wqe->lock);
 			if (match->cancel_all)
 				goto retry;
-			return;
+			break;
 		}
 	}
-	raw_spin_unlock(&wqe->lock);
 }
 
 static void io_wqe_cancel_running_work(struct io_wqe *wqe,
@@ -1078,7 +1102,9 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
 	for_each_node(node) {
 		struct io_wqe *wqe = wq->wqes[node];
 
+		raw_spin_lock(&wqe->lock);
 		io_wqe_cancel_pending_work(wqe, &match);
+		raw_spin_unlock(&wqe->lock);
 		if (match.nr_pending && !match.cancel_all)
 			return IO_WQ_CANCEL_OK;
 	}
@@ -1092,7 +1118,15 @@ enum io_wq_cancel io_wq_cancel_cb(struct io_wq *wq, work_cancel_fn *cancel,
 	for_each_node(node) {
 		struct io_wqe *wqe = wq->wqes[node];
 
+		raw_spin_lock(&wqe->lock);
+		io_wqe_cancel_pending_work(wqe, &match);
+		if (match.nr_pending && !match.cancel_all) {
+			raw_spin_unlock(&wqe->lock);
+			return IO_WQ_CANCEL_OK;
+		}
+
 		io_wqe_cancel_running_work(wqe, &match);
+		raw_spin_unlock(&wqe->lock);
 		if (match.nr_running && !match.cancel_all)
 			return IO_WQ_CANCEL_RUNNING;
 	}
@@ -1263,7 +1297,9 @@ static void io_wq_destroy(struct io_wq *wq)
 			.fn		= io_wq_work_match_all,
 			.cancel_all	= true,
 		};
+		raw_spin_lock(&wqe->lock);
 		io_wqe_cancel_pending_work(wqe, &match);
+		raw_spin_unlock(&wqe->lock);
 		free_cpumask_var(wqe->cpu_mask);
 		kfree(wqe);
 	}
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 422d6de48688..49f115a2dec4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6386,7 +6386,8 @@ static int io_try_cancel_userdata(struct io_kiocb *req, u64 sqe_addr)
 	WARN_ON_ONCE(!io_wq_current_is_worker() && req->task != current);
 
 	ret = io_async_cancel_one(req->task->io_uring, sqe_addr, ctx);
-	if (ret != -ENOENT)
+	//if (ret != -ENOENT)
+	if (ret == 0)
 		return ret;
 
 	spin_lock(&ctx->completion_lock);
@@ -6892,7 +6893,8 @@ static void io_wq_submit_work(struct io_wq_work *work)
 		io_queue_linked_timeout(timeout);
 
 	/* either cancelled or io-wq is dying, so don't touch tctx->iowq */
-	if (work->flags & IO_WQ_WORK_CANCEL) {
+	if (work->flags & IO_WQ_WORK_CANCEL ||
+	    test_tsk_thread_flag(current, TIF_NOTIFY_SIGNAL)) {
 		io_req_task_queue_fail(req, -ECANCELED);
 		return;
 	}
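For reference, from the userspace side the sequence under discussion is
roughly the sketch below: queue a read that cannot complete, then issue
an IORING_OP_ASYNC_CANCEL against it and look at what the cancel CQE
reports. This is illustrative only, not the reproducer from this thread;
it assumes a liburing recent enough to provide io_uring_prep_cancel64(),
and it sets IOSQE_ASYNC so the read is punted to io-wq, the path the
patch above touches.

/*
 * Rough sketch: read from a pipe that is never written to, then cancel
 * that read by user_data and print both completions. Error handling is
 * deliberately minimal.
 */
#include <liburing.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[64];
	int fds[2], i;

	if (pipe(fds) || io_uring_queue_init(8, &ring, 0))
		return 1;

	/* Read from the never-written pipe end; IOSQE_ASYNC forces io-wq. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fds[0], buf, sizeof(buf), 0);
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
	sqe->user_data = 1;
	io_uring_submit(&ring);

	/* Now ask the kernel to cancel it, matching on user_data == 1. */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_cancel64(sqe, 1, 0);
	sqe->user_data = 2;
	io_uring_submit(&ring);

	/*
	 * Expect two completions: the cancel CQE with res of 0, -ENOENT or
	 * -EALREADY, and (on a well-behaved kernel) the read CQE with
	 * -ECANCELED. The bug in this thread is the read CQE never arriving
	 * after the cancel returned -EALREADY, so this loop would hang on
	 * the second wait.
	 */
	for (i = 0; i < 2; i++) {
		if (io_uring_wait_cqe(&ring, &cqe))
			break;
		printf("user_data=%llu res=%d\n",
		       (unsigned long long) cqe->user_data, cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}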

-- 
Jens Axboe