From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0D1BC433FE for ; Mon, 21 Nov 2022 23:59:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230237AbiKUX7i (ORCPT ); Mon, 21 Nov 2022 18:59:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34506 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229739AbiKUX7g (ORCPT ); Mon, 21 Nov 2022 18:59:36 -0500 Received: from mail-pj1-x102f.google.com (mail-pj1-x102f.google.com [IPv6:2607:f8b0:4864:20::102f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C6CB10559 for ; Mon, 21 Nov 2022 15:59:29 -0800 (PST) Received: by mail-pj1-x102f.google.com with SMTP id b1-20020a17090a7ac100b00213fde52d49so12837486pjl.3 for ; Mon, 21 Nov 2022 15:59:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=jccJ4Bm2honkk55HL5L3ubecWLHTgsBISaT1aAHD2nQ=; b=5QhulFzI1G8WHE9oKX4gt1WUeh1Fynoxt5ZWH2vyk7Oo9+FSoHFRaWUZLEr3pywHNF nGb0E+jUepD7p20/iTExnyVbiziyc8byaeDw4KWtx5Z+RTbTKJ1cxsZBZCli/lr/yRI+ cAvro7xKn6Te2WNhjfzLv7u4hLfGfqYiUr5sePWuVMDXVHngVq9HENgpdAB5xVqc1l/x BVborIe3koNOM4bggrC9+HzY6mwzYgLUVM1gGVu+2vwcIGjmNToc0rqflPYE4TRgEEpf +5lnj9VX1VE3A2MrgNLnCB6v2AdV3wFaQ95in+L+uZ1Qr6Rm9qEemD8a3KJHbFhPBsi9 T/Rg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=jccJ4Bm2honkk55HL5L3ubecWLHTgsBISaT1aAHD2nQ=; b=A+utGZc6KVWnGSwcKfy8J2eA9P0739OdoyCiQWx5PzlH87lpvAxQehsPWGmliEThk7 q08If5IulArBN8N1Roi3PugVBjBM1eGyqRTYrzUh8n8cKvumsBWrcV6Sh+kwofhAoa/5 Ik6cpyuZYSWe9TAXHcR+QT31Br+kD0qAYxS02Qd0KzKXNqIf/u1MFFqUo7pVZH3KbjUr n9mDaEVLTf8qEIcdaylCmym/kfPIXhejGxEvym7GGnNr0SddfX9KFomFTXRlXC24MZ6v lxSVIwe81C+0TpRc1tyBRX+GJQpg1oajXNAMQFFwvwa4ghjvyNzIpL0NjM/Fd5m2Ps3O avvQ== X-Gm-Message-State: ANoB5pn13AisLWHkm2RUtUvjJ79GB7qiWAH7vG7Rfq+pDEcJHxV4zvwc D8FYwgDPKXdY1v3GIL1ngqWT4/zl37gwVQ== X-Google-Smtp-Source: AA0mqf6Qm/z49ORGSNpdORDWI9uJV5l4u+ZQ0/SajrYhCffr1+7KkL52SrzOdM2ahfQEvSvHL5HLNA== X-Received: by 2002:a17:903:2144:b0:188:a1eb:9a8a with SMTP id s4-20020a170903214400b00188a1eb9a8amr5571440ple.153.1669075168333; Mon, 21 Nov 2022 15:59:28 -0800 (PST) Received: from [192.168.1.136] ([198.8.77.157]) by smtp.gmail.com with ESMTPSA id t8-20020aa79468000000b005625d6d2999sm9231276pfq.187.2022.11.21.15.59.27 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 21 Nov 2022 15:59:27 -0800 (PST) Message-ID: <74feda24-37fd-11ea-af0e-1eff9ed4941e@kernel.dk> Date: Mon, 21 Nov 2022 16:59:26 -0700 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux aarch64; rv:102.0) Gecko/20100101 Thunderbird/102.5.0 Subject: Re: [PATCH v5 1/3] io_uring: add napi busy polling support Content-Language: en-US From: Jens Axboe To: Stefan Roesch , kernel-team@fb.com Cc: olivier@trillion01.com, netdev@vger.kernel.org, io-uring@vger.kernel.org, kuba@kernel.org References: <20221121191437.996297-1-shr@devkernel.io> <20221121191437.996297-2-shr@devkernel.io> <067a22bc-72ba-9035-05da-93c43ce356f2@kernel.dk> In-Reply-To: <067a22bc-72ba-9035-05da-93c43ce356f2@kernel.dk> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org On 11/21/22 12:45?PM, Jens Axboe wrote: > On 11/21/22 12:14?PM, Stefan Roesch wrote: >> +/* >> + * io_napi_add() - Add napi id to the busy poll list >> + * @file: file pointer for socket >> + * @ctx: io-uring context >> + * >> + * Add the napi id of the socket to the napi busy poll list. >> + */ >> +void io_napi_add(struct file *file, struct io_ring_ctx *ctx) >> +{ >> + unsigned int napi_id; >> + struct socket *sock; >> + struct sock *sk; >> + struct io_napi_entry *ne; >> + >> + if (!io_napi_busy_loop_on(ctx)) >> + return; >> + >> + sock = sock_from_file(file); >> + if (!sock) >> + return; >> + >> + sk = sock->sk; >> + if (!sk) >> + return; >> + >> + napi_id = READ_ONCE(sk->sk_napi_id); >> + >> + /* Non-NAPI IDs can be rejected */ >> + if (napi_id < MIN_NAPI_ID) >> + return; >> + >> + spin_lock(&ctx->napi_lock); >> + list_for_each_entry(ne, &ctx->napi_list, list) { >> + if (ne->napi_id == napi_id) { >> + ne->timeout = jiffies + NAPI_TIMEOUT; >> + goto out; >> + } >> + } >> + >> + ne = kmalloc(sizeof(*ne), GFP_NOWAIT); >> + if (!ne) >> + goto out; >> + >> + ne->napi_id = napi_id; >> + ne->timeout = jiffies + NAPI_TIMEOUT; >> + list_add_tail(&ne->list, &ctx->napi_list); >> + >> +out: >> + spin_unlock(&ctx->napi_lock); >> +} > > I think this all looks good now, just one minor comment on the above. Is > the expectation here that we'll basically always add to the napi list? > If so, then I think allocating 'ne' outside the spinlock would be a lot > saner, and then just kfree() it for the unlikely case where we find a > duplicate. After thinking about this a bit more, I don't think this is done in the most optimal fashion. If the list is longer than a few entries, this check (or check-alloc-insert) is pretty expensive and it'll add substantial overhead to the poll path for sockets if napi is enabled. I think we should do something ala: 1) When arming poll AND napi has been enabled for the ring, then alloc io_napi_entry upfront and store it in ->async_data. 2) Maintain the state in the io_napi_entry. If we're on the list, that can be checked with just list_empty(), for example. If not on the list, assign timeout and add. 3) Have regular request cleanup free it. This could be combined with an alloc cache, I would not do that for the first iteration though. This would make io_napi_add() cheap - no more list iteration, and no more allocations. And that is arguably the most important part, as that is called everytime the poll is woken up. Particularly for multishot that makes a big difference. It's also designed much better imho, moving the more expensive bits to the setup side. -- Jens Axboe