Date: Fri, 10 Feb 2023 11:27:51 -0800
From: Jeremy Allison
To: Linus Torvalds
Cc: Andy Lutomirski, Jens Axboe, Linux API Mailing List, Dave Chinner,
    "linux-kernel@vger.kernel.org", Matthew Wilcox, Stefan Metzmacher,
    Al Viro, linux-fsdevel, Samba Technical, io-uring
Subject: Re: copy on write for splice() from file to pipe?
X-Mailing-List: io-uring@vger.kernel.org

On Fri, Feb 10, 2023 at 11:18:05AM -0800, Linus Torvalds via samba-technical wrote:
>
>We should point the fingers at either the _user_ of splice - as Jeremy
>Allison has done a couple of times - or we should point it at the sink
>that cannot deal with unstable sources.
>
....
> - it sounds like the particular user in question (samba) already very
>much has a reasonable model for "I have exclusive access to this" that
>just wasn't used

Having said that, I just had a phone discussion with Ralph Boehme
on the Samba Team, who has been following along with this in
read-only mode, and he did point out one case I had missed.

1). Client opens file with a lease. Hurrah, we think we can use splice() !
2). Client writes into file.
3). Client calls SMB_FLUSH to ensure data is on disk.
4). Client reads the data just written to ensure it's good.
5). Client overwrites the previously written data.

Now when client issues (4), the read request, if we
zero-copy using splice() - I don't think there's a way
we get notified when the data has finally left the
system and the mapped splice memory in the buffer cache
is safe to overwrite by the write (5).

So the read in (4) could potentially return the data
written in (5), if the buffer cache mapped memory has
not yet been sent out over the network.

That is certainly unexpected behavior for the client,
even if the client leased the file.

If that's the case, then splice() is unusable for
Samba even in the leased file case.
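
A minimal sketch of steps (2)-(5) above, to make the ordering problem
concrete. This is not Samba code. The names (OLD, NEW, read_by_copy,
read_by_splice, run_sequence) are hypothetical, os.fsync() stands in for
SMB_FLUSH, and a plain pipe stands in for the path out to the network.
It assumes Linux, and Python 3.10 or later for os.splice().

```python
import os
import tempfile

OLD = b"A" * 4096  # data the client writes in step (2)
NEW = b"B" * 4096  # data the client writes in step (5)


def read_by_copy(fd, length):
    """Step (4) via pread(): the reply is a private copy of the file data."""
    return os.pread(fd, length, 0)


def read_by_splice(fd, pipe_w, length):
    """Step (4) via splice(): the pipe may hold references to page-cache
    pages rather than a copy. Returns the number of bytes queued."""
    return os.splice(fd, pipe_w, length, offset_src=0)


def run_sequence(use_splice):
    """Replay steps (2)-(5) on a temporary file and return the bytes the
    'client' would receive as the reply to the read in step (4)."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.pwrite(fd, OLD, 0)  # (2) client writes
        os.fsync(fd)           # (3) flush, modelled here as fsync
        if not use_splice:
            reply = read_by_copy(fd, len(OLD))  # (4) copied out right away
            os.pwrite(fd, NEW, 0)               # (5) overwrite
            return reply
        pipe_r, pipe_w = os.pipe()
        try:
            queued = read_by_splice(fd, pipe_w, len(OLD))  # (4) queued only
            os.pwrite(fd, NEW, 0)            # (5) lands before the data is sent
            return os.read(pipe_r, queued)   # data finally "leaves the system"
        finally:
            os.close(pipe_r)
            os.close(pipe_w)


if __name__ == "__main__":
    print("copy  :", run_sequence(use_splice=False)[:8])
    print("splice:", run_sequence(use_splice=True)[:8])
```

The copy variant always returns the data from step (2), because pread()
has already copied it before the overwrite. What the splice variant
returns depends on the kernel version and filesystem, so no output is
claimed for it here; the case described above is the one where it
returns the data from step (5).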

> Maybe this thread raised some awareness of it for some people, but
>more realistically - maybe we can really document this whole issue
>somewhere much more clearly

Complete comprehensive documentation on this would be extremely
helpful (to say the least :-).