Message-ID: <20adf5fe-98a0-06a0-7058-e6f9ba7d9e2a@kernel.dk>
Date: Wed, 21 Sep 2022 19:54:30 -0600
Subject: Re: Memory ordering description in io_uring.pdf
To: "J. Hanne", io-uring@vger.kernel.org
From: Jens Axboe
In-Reply-To: <20220918165616.38AC12FC059D@dd11108.kasserver.com>

On 9/18/22 10:56 AM, J. Hanne wrote:
> Hi,
>
> I have a couple of questions regarding the necessity of including memory
> barriers when using io_uring, as outlined in
> https://kernel.dk/io_uring.pdf. I'm fine with using liburing, but still I
> do want to understand what is going on behind the scenes, so any comment
> would be appreciated.
In terms of the barriers, that doc is somewhat outdated...

> Firstly, I wonder why memory barriers are required at all, when NOT using
> polled mode. Because requiring them in non-polled mode somehow implies that:
> - Memory re-ordering occurs across system-call boundaries (i.e. when
>   submitting, the tail write could happen after the io_uring_enter
>   syscall?!)
> - CPU data dependency checks do not work
> So, are memory barriers really required when just using a simple
> loop around io_uring_enter with completely synchronous processing?

No, I don't believe that they are. The exception is SQPOLL, as you mention,
as there's not necessarily a syscall involved with that.

> Secondly, the examples in io_uring.pdf suggest that checking completion
> entries requires a read_barrier and a write_barrier and submitting entries
> requires *two* write_barriers. Really?
>
> My expectation would be, just as with "normal" inter-thread userspace ipc,
> that plain store-release and load-acquire semantics are sufficient, e.g.:
> - For reading completion entries:
> -- first read the CQ ring head (without any ordering enforcement)
> -- then use __atomic_load(__ATOMIC_ACQUIRE) to read the CQ ring tail
> -- then use __atomic_store(__ATOMIC_RELEASE) to update the CQ ring head
> - For submitting entries:
> -- first read the SQ ring tail (without any ordering enforcement)
> -- then use __atomic_load(__ATOMIC_ACQUIRE) to read the SQ ring head
> -- then use __atomic_store(__ATOMIC_RELEASE) to update the SQ ring tail
> Wouldn't these be sufficient?!

Please check liburing to see what that does. Would be interested in your
feedback (and patches!). Largely because x86 doesn't care too much about
these, I think we've erred on the side of caution on that front.
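
For illustration only, the load-acquire/store-release scheme proposed in the
quoted question can be sketched as a small single-producer/single-consumer
ring. This is a minimal sketch under stated assumptions, not liburing's code
and not the kernel's ring layout: struct demo_ring, demo_produce() and
demo_consume() are hypothetical names, and the ring lives in ordinary process
memory rather than in the mmap'ed SQ/CQ regions. It uses the GCC/Clang
__atomic builtins mentioned in the question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* hypothetical size, power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Hypothetical stand-in for a shared ring; NOT the real io_uring layout. */
struct demo_ring {
    uint32_t head;                      /* written only by the consumer */
    uint32_t tail;                      /* written only by the producer */
    uint64_t entries[RING_ENTRIES];
};

/*
 * Producer side (the application's role for the SQ ring).
 * Returns 1 if the value was queued, 0 if the ring is full.
 */
static int demo_produce(struct demo_ring *r, uint64_t value)
{
    /* Our own index: nobody else writes it, so a plain load is enough. */
    uint32_t tail = r->tail;
    /* Acquire: the consumer's reads of old entries happen before we reuse
     * the slots it has released by advancing head. */
    uint32_t head = __atomic_load_n(&r->head, __ATOMIC_ACQUIRE);

    if (tail - head == RING_ENTRIES)
        return 0;

    r->entries[tail & RING_MASK] = value;
    /* Release: the entry write above is visible before the new tail. */
    __atomic_store_n(&r->tail, tail + 1, __ATOMIC_RELEASE);
    return 1;
}

/*
 * Consumer side (the application's role for the CQ ring).
 * Returns 1 and fills *out if an entry was available, 0 if the ring is empty.
 */
static int demo_consume(struct demo_ring *r, uint64_t *out)
{
    assert(out != NULL);

    uint32_t head = r->head;            /* our own index: plain load */
    /* Acquire: entry contents written before the tail update are visible. */
    uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

    if (head == tail)
        return 0;

    *out = r->entries[head & RING_MASK];
    /* Release: our read of the entry completes before the slot is freed. */
    __atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
    return 1;
}
```

A caller would zero-initialize a struct demo_ring, call demo_produce() from
one thread and demo_consume() from another, and retry when either returns 0.
Whether this pairing is sufficient against the kernel side of a real ring
depends on the barriers the kernel uses, so the comments in io_uring.c and
the liburing sources remain the reference.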

> Thirdly, io_uring.pdf and
> https://github.com/torvalds/linux/blob/master/io_uring/io_uring.c seem a
> little contradictory, at least from my reading:
>
> io_uring.pdf, in the completion entry example:
> - Includes a read_barrier() **BEFORE** it reads the CQ ring tail
> - Includes a write_barrier() **AFTER** updating CQ head
>
> io_uring.c says on completion entries:
> - **AFTER** the application reads the CQ ring tail, it must use an appropriate
>   smp_rmb() [...].
> - It also needs a smp_mb() **BEFORE** updating CQ head [...].
>
> io_uring.pdf, in the submission entry example:
> - Includes a write_barrier() **BEFORE** updating the SQ tail
> - Includes a write_barrier() **AFTER** updating the SQ tail
>
> io_uring.c says on submission entries:
> - [...] the application must use an appropriate smp_wmb() **BEFORE**
>   writing the SQ tail
>   (this matches io_uring.pdf)
> - And it needs a barrier ordering the SQ head load before writing new
>   SQ entries
>
> I know, io_uring.pdf does mention that the memory ordering description
> is simplified. So maybe this is the whole explanation for my confusion?

The canonical resource at this point is the kernel code, as some of the
revamping of the memory ordering happened way later than when that doc
was written. Would be nice to get it updated at some point.

-- 
Jens Axboe