From: Pavel Begunkov
Subject: Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
Date: Fri, 29 Oct 2021 14:47:25 +0100
To: Xiaoguang Wang, io-uring@vger.kernel.org
Cc: axboe@kernel.dk
Message-ID: <99d6451e-db6b-76ce-b808-c0859a2764af@gmail.com>
References: <20211025053849.3139-1-xiaoguang.wang@linux.alibaba.com>
 <20211025053849.3139-3-xiaoguang.wang@linux.alibaba.com>
 <7dd1823d-0324-36d1-2562-362f2ef0399b@gmail.com>
 <7e6c2a36-adf5-ece5-9109-cd5c4429e79d@linux.alibaba.com>

On 10/29/21 14:37, Xiaoguang Wang wrote:
> hi,
>
>> On 10/29/21 03:57, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>>>> Run echo_server to evaluate io_uring's multi-shot poll performance;
>>>>> perf shows that add_wait_queue() has obvious overhead. Introduce a
>>>>> new state 'active' in io_poll_iocb to indicate whether io_poll_wake()
>>>>> should queue a task_work. This new state is set to true initially,
>>>>> set to false when starting to queue a task_work, and set to true
>>>>> again once a poll cqe has been committed. One concern is that this
>>>>> method may lose wake-up events, but that seems to be ok:
>>>>>
>>>>>     io_poll_wake                io_poll_task_func
>>>>> t1                       |
>>>>> t2                       |    WRITE_ONCE(req->poll.active, true);
>>>>> t3                       |
>>>>> t4                       |    io_commit_cqring(ctx);
>>>>> t5                       |
>>>>> t6                       |
>>>>>
>>>>> If a wake-up event happens before or at t4, it's ok: the user app
>>>>> will always see a cqe. If a wake-up event happens after t4 then,
>>>>> IIUC, io_poll_wake() will see the new req->poll.active value by
>>>>> using READ_ONCE().
>>>>>
>>>>> The echo_server code can be cloned from:
>>>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>>>>> branch xiaoguangwang/io_uring_multishot.
>>>>>
>>>>> Without this patch, the tps in our test environment is 284116; with
>>>>> this patch, the tps is 287832, about a 1.3% improvement, which is
>>>>> indeed in accord with the saved add_wait_queue() cost.
>>>>>
>>>>> Signed-off-by: Xiaoguang Wang
>>>>> ---
>>>>>   fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
>>>>>   1 file changed, 33 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 18af9bb9a4bc..e4c779dac953 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>>>       __poll_t            events;
>>>>>       bool                done;
>>>>>       bool                canceled;
>>>>> +    bool                active;
>>>>>       struct wait_queue_entry        wait;
>>>>>   };
>>>>>
>>>>> @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
>>>>>   {
>>>>>       trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>>>>>
>>>>> -    list_del_init(&poll->wait.entry);
>>>>> -
>>>>
>>>> As I mentioned to Hao some time ago, we can't allow this function, or
>>>> in particular io_req_task_work_add(), to happen twice before the
>>>> first task_work got executed: they use the same field in io_kiocb,
>>>> and those calls will corrupt the tw list.
>>>>
>>>> Looks like that's what can happen here.
>>> If I have understood the scenario you described correctly, I think it
>>> won't happen :)
>>> With this patch, once the first io_req_task_work_add() is issued,
>>> poll.active will be set to false, and no new io_req_task_work_add()
>>> will be issued. Only after the task_work installed by the first
>>> io_req_task_work_add() has completed will poll.active be set to true
>>> again.
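For reference, a condensed model of the protocol being debated, with
simplified names: poll_state, queue_task_work() and post_cqe() are
hypothetical stand-ins for io_poll_iocb, io_req_task_work_add() and
io_commit_cqring(); this is an illustrative sketch, not the actual
fs/io_uring.c code.

#include <linux/compiler.h>	/* READ_ONCE / WRITE_ONCE */
#include <linux/types.h>	/* bool */

struct poll_state {
	bool active;	/* true: the next wakeup may queue a task_work */
};

/* stand-in stubs for io_req_task_work_add() / io_commit_cqring() */
void queue_task_work(struct poll_state *p);
void post_cqe(struct poll_state *p);

/* waitqueue callback path */
int poll_wake(struct poll_state *p)
{
	if (!READ_ONCE(p->active))
		return 0;		/* a task_work is already in flight */
	WRITE_ONCE(p->active, false);	/* check and clear are two steps */
	queue_task_work(p);
	return 1;
}

/* task_work side */
void poll_task_func(struct poll_state *p)
{
	post_cqe(p);			/* commit the cqe first ... */
	WRITE_ONCE(p->active, true);	/* ... then reopen the window */
}

The check in poll_wake() and the clear that follows it are separate
operations, which is exactly the window the race sketched below relies
on.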
>> Ah, I see now, the active dance is in io_poll_wake(). That won't work
>> with READ_ONCE/WRITE_ONCE though; you would need real atomics.
>>
>> The easiest race to explain is:
>>
>> CPU1                        | CPU2
>> io_poll_wake                | io_poll_wake
>> if (p->active) return 0;    | if (p->active) return 0;
> it's "if (!p->active) return 0;" in my patch :)
>> // p->active is false in both cases, continue
>> p->active = false;          | p->active = false;
>> task_work_add()             | task_work_add()
> io_poll_wake() is called with poll->head->lock held, so there will be
> no concurrent io_poll_wake() calls.

Double poll (a second wait queue entry on a different wait queue), and
also there are no guarantees that CPUs don't see inconsistent values.

-- 
Pavel Begunkov
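To make the "real atomics" point concrete, here is the same sketch with
the plain bool replaced by an atomic flag, so that "test the flag and
clear it" becomes one indivisible step. This is a hypothetical
illustration, not code from the patch or from fs/io_uring.c:

#include <linux/atomic.h>

struct poll_state {
	atomic_t active;	/* 1: the next wakeup may queue a task_work */
};

/* stand-in stubs, as in the earlier sketch */
void queue_task_work(struct poll_state *p);
void post_cqe(struct poll_state *p);

int poll_wake(struct poll_state *p)
{
	/*
	 * atomic_xchg() returns the old value: of any number of racing
	 * wakeups, exactly one observes 1 and queues the task_work, so
	 * the double task_work_add() in the table above cannot happen.
	 */
	if (!atomic_xchg(&p->active, 0))
		return 0;
	queue_task_work(p);
	return 1;
}

void poll_task_func(struct poll_state *p)
{
	post_cqe(p);
	atomic_set(&p->active, 1);	/* reopen the window */
}

Unlike the poll->head->lock argument, this does not depend on both
wakeups arriving through the same wait queue, which is precisely what
double poll breaks; atomic_xchg() is also fully ordered, which covers
the inconsistent-values concern for this flag.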