From: Eric Dumazet
Date: Tue, 12 Apr 2022 18:54:41 -0700
Subject: Re: [PATCHSET 0/4] Add support for no-lock sockets
To: Jens Axboe
Cc: Eric Dumazet, io-uring@vger.kernel.org, netdev
References: <20220412202613.234896-1-axboe@kernel.dk> <80ba97f9-3705-8fd6-8e7d-a934512d7ec0@kernel.dk>
In-Reply-To: <80ba97f9-3705-8fd6-8e7d-a934512d7ec0@kernel.dk>
X-Mailing-List: io-uring@vger.kernel.org

On Tue, Apr 12, 2022 at 6:26 PM Jens Axboe wrote:
>
> On 4/12/22 6:40 PM, Eric Dumazet wrote:
> >
> > On 4/12/22 13:26, Jens Axboe wrote:
> >> Hi,
> >>
> >> If we accept a connection directly, eg without installing a file
> >> descriptor for it, or if we use IORING_OP_SOCKET in direct mode, then
> >> we have a socket for recv/send that we can fully serialize access to.
> >>
> >> With that in mind, we can feasibly skip locking on the socket for TCP
> >> in that case. Some of the testing I've done has shown as much as 15%
> >> of overhead in the lock_sock/release_sock part, with this change then
> >> we see none.
> >>
> >> Comments welcome!
> >>
> > How BH handlers (including TCP timers) and io_uring are going to run
> > safely ? Even if a tcp socket had one user, (private fd opened by a
> > non multi-threaded program), we would still to use the spinlock.
>
> But we don't even hold the spinlock over lock_sock() and release_sock(),
> just the mutex. And we do check for running eg the backlog on release,
> which I believe is done safely and similarly in other places too.

So let's say the TCP stack receives a packet in a BH handler... it
proceeds using many tcp sock fields.

Then io_uring wants to read/write stuff from another cpu while the BH
handler(s) are not done yet, and will happily read/change many of the
same fields.

Writing a 1 and a 0 in a bit field to ensure mutual exclusion is not
going to work, even with the smp_rmb() and smp_wmb() you added (adding
more costs for non io_uring users, which already pay a high lock tax).

If we want to optimize lock_sock()/release_sock() for common cases
(a single user thread per TCP socket), then maybe we can play some kind
of cmpxchg() games, but that would be a generic change.
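
To make the difference concrete, here is a rough userspace sketch in C11 atomics. It is not kernel code and not the patch under discussion: struct fake_sock and every function name below are hypothetical, and it assumes both contexts would claim a single shared ownership word. It only illustrates why "check the flag, then write the flag" leaves a window that barriers cannot close, while a compare-and-exchange makes the test and the set one step.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a socket; not a kernel structure. */
struct fake_sock {
    atomic_int owned;   /* 0 = free, 1 = owned by some context */
};

static inline void fake_sock_init(struct fake_sock *sk)
{
    atomic_init(&sk->owned, 0);
}

/*
 * Flag-only scheme, BH side: look at the flag and proceed if it reads 0.
 * Nothing prevents the flag from changing right after this load returns.
 */
static inline bool bh_sees_free(struct fake_sock *sk)
{
    return atomic_load_explicit(&sk->owned, memory_order_acquire) == 0;
}

/*
 * Flag-only scheme, user side: unconditionally write 1.
 * The release ordering orders this store against earlier accesses, but it
 * cannot stop a BH context that has already passed bh_sees_free().
 */
static inline void user_sets_flag(struct fake_sock *sk)
{
    atomic_store_explicit(&sk->owned, 1, memory_order_release);
}

/*
 * cmpxchg-style claim: succeeds only if the word was 0, and sets it to 1
 * in the same atomic operation, so at most one caller can win.
 */
static inline bool claim_cmpxchg(struct fake_sock *sk)
{
    int expected = 0;

    return atomic_compare_exchange_strong_explicit(&sk->owned, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

/* Drop a claim previously taken with claim_cmpxchg(). */
static inline void release_claim(struct fake_sock *sk)
{
    assert(atomic_load_explicit(&sk->owned, memory_order_relaxed) == 1);
    atomic_store_explicit(&sk->owned, 0, memory_order_release);
}
```

With the flag-only pair, the interleaving "bh_sees_free() returns true, then user_sets_flag() runs" leaves both contexts touching the socket at once. With claim_cmpxchg(), a second claim fails until release_claim() is called, so the loser has to wait or defer its work. What the loser should actually do (spin, queue to a backlog, and so on) is left out of the sketch.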