Date: Tue, 14 Feb 2023 22:35:24 +0800
From: Ming Lei
To: Miklos Szeredi
Cc: Linus Torvalds, Jens Axboe, io-uring@vger.kernel.org,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Alexander Viro, Stefan Hajnoczi, Miklos Szeredi, Bernd Schubert,
	Nitesh Shetty, Christoph Hellwig, Ziyang Zhang, ming.lei@redhat.com
Subject: Re: [PATCH 1/4] fs/splice: enhance direct pipe & splice for moving pages in kernel
References: <20230210153212.733006-1-ming.lei@redhat.com>
	<20230210153212.733006-2-ming.lei@redhat.com>
X-Mailing-List: io-uring@vger.kernel.org

Hi Miklos,

On Tue, Feb 14, 2023 at 12:03:44PM +0100, Miklos Szeredi wrote:
> On Mon, 13 Feb 2023 at 21:04, Linus Torvalds wrote:
> >
> > On Sat, Feb 11, 2023 at 5:39 PM Ming Lei wrote:
> > >
> > > >
> > > >  (a) what's the point of MAY_READ? A non-readable page sounds insane
> > > > and wrong. All sinks expect to be able to read.
> > >
> > > For example, it is one page which needs sink end to fill data, so
> > > we needn't to zero it in source end every time, just for avoiding
> > > leak kernel data if (unexpected)sink end simply tried to read from
> > > the spliced page instead of writing data to page.
> >
> > I still don't understand.
> >
> > A sink *reads* the data. It doesn't write the data.
>
> I think Ming is trying to generalize splice to allow flowing data in
> the opposite direction.

I don't think it is the opposite direction; it is just that the sink may
WRITE to the buffer, and the model is:

	device(produce buffer in ->splice_read()) -> direct pipe -> file(consume buffer via READ or WRITE)

The kernel-side implementation follows:

	splice_direct_to_actor(pipe, sd, source_actor)
		direct_actor():
			__splice_from_pipe(pipe, sd, sink_actor)
				sink_actor():
					get_page() then read from file/socket to page.

The current userspace owns the whole buffer, so I understand the buffer
ownership can be transferred to the consumer/sink side.

> So yes, sink would be writing to the buffer.
> And it MUST NOT be reading the data since the buffer may be
> uninitialized.

The added SPLICE_F_PRIV_FOR_READ[WRITE] is enough to avoid an unexpected
READ, but the source end needs to confirm that the buffer ownership can be
transferred to the consumer; PIPE_BUF_FLAG_GIFT can probably be used for
this purpose.

> The problem is how to tell the original source that the buffer is
> ready?  PG_uptodate comes to mind, but pipe buffers allow partial
> pages to be passed around, and there's no mechanism to describe a
> partially uptodate buffer.

I understand it isn't an issue from the block device driver's viewpoint, at
least, since the WRITE to the buffer at the sink end can be thought of as
DMA to the buffer from a device, and it is the upper layer (FS)'s
responsibility to update the page flag. The block driver needn't handle
the page status update.

So it seems to be a fuse-specific issue?

Thanks,
Ming
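
For illustration only, here is a minimal userspace C sketch of the model discussed in this message: a source hands out an uninitialized buffer that the sink is allowed to fill but not to read. It is not kernel code and not taken from the patch set. The names `model_buf`, `BUF_MAY_READ`, `BUF_MAY_WRITE`, `source_produce`, `sink_read`, `sink_write` and `run_model` are all hypothetical; the two flags only mirror the idea behind SPLICE_F_PRIV_FOR_READ[WRITE] and do not reproduce their definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical access rights the source attaches to a buffer. */
enum buf_access {
	BUF_MAY_READ  = 1 << 0,	/* sink may read the buffer contents */
	BUF_MAY_WRITE = 1 << 1,	/* sink may fill the buffer */
};

/* Hypothetical stand-in for a page travelling through the direct pipe. */
struct model_buf {
	unsigned char data[64];
	size_t len;
	unsigned int access;
};

/* Source end: hand out a buffer WITHOUT zeroing it; the sink must fill it. */
static void source_produce(struct model_buf *buf, size_t len)
{
	assert(buf != NULL);
	buf->len = len < sizeof(buf->data) ? len : sizeof(buf->data);
	buf->access = BUF_MAY_WRITE;
}

/* Sink consuming via READ: refused unless the source allowed reading. */
static int sink_read(const struct model_buf *buf, unsigned char *dst, size_t n)
{
	if (!(buf->access & BUF_MAY_READ))
		return -1;	/* would expose uninitialized data */
	if (n > buf->len)
		n = buf->len;
	memcpy(dst, buf->data, n);
	return (int)n;
}

/* Sink consuming via WRITE: fills the buffer, like reading a file into a page. */
static int sink_write(struct model_buf *buf, const unsigned char *src, size_t n)
{
	if (!(buf->access & BUF_MAY_WRITE))
		return -1;
	if (n > buf->len)
		n = buf->len;
	memcpy(buf->data, src, n);
	return (int)n;
}

/* Walk through the flow once; returns 0 when the model behaves as described. */
int run_model(void)
{
	static const unsigned char payload[] = "data from file";
	struct model_buf buf;
	unsigned char peek[16];

	source_produce(&buf, sizeof(payload));

	/* An unexpected READ by the sink is rejected. */
	if (sink_read(&buf, peek, sizeof(peek)) != -1)
		return 1;

	/* The expected WRITE by the sink fills the buffer. */
	if (sink_write(&buf, payload, sizeof(payload)) != (int)sizeof(payload))
		return 1;

	/* The source, as owner, now sees the data the sink produced. */
	return memcmp(buf.data, payload, sizeof(payload)) == 0 ? 0 : 1;
}
```

Calling `run_model()` from a `main()` exercises both paths. The sketch deliberately leaves out the open question raised above, namely how the source learns that a partially filled buffer is ready.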