From: Dave Chinner <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: Stefan Metzmacher <[email protected]>, Jens Axboe <[email protected]>,
	linux-fsdevel <[email protected]>,
	Linux API Mailing List <[email protected]>,
	io-uring <[email protected]>,
	"[email protected]" <[email protected]>,
	Al Viro <[email protected]>,
	Samba Technical <[email protected]>
Subject: Re: copy on write for splice() from file to pipe?
Date: Fri, 10 Feb 2023 17:19:53 +1100
Message-ID: <[email protected]>
In-Reply-To: <CAHk-=wip9xx367bfCV8xaF9Oaw4DZ6edF9Ojv10XoxJ-iUBwhA@mail.gmail.com>

On Thu, Feb 09, 2023 at 08:47:07PM -0800, Linus Torvalds wrote:
> On Thu, Feb 9, 2023 at 8:06 PM Dave Chinner <[email protected]> wrote:
> >>
> > So while I was pondering the complexity of this and watching a great
> > big shiny rocket create lots of heat, light and noise, it occurred
> > to me that we already have a mechanism for preventing page cache
> > data from being changed while the folios are under IO:
> > SB_I_STABLE_WRITES and folio_wait_stable().
> 
> No, Dave. Not at all.
> 
> Stop and think.

I have.

> splice() is not some "while under IO" thing. It's *UNBOUNDED*.

Splice has two sides - a source where we splice to the transport
pipe, then a destination where we splice pages from the transport
pipe. For better or worse, time in the transport pipe is unbounded,
but that does not mean the source or destination have unbounded
processing times.

However, transport times being unbounded is largely irrelevant, and
misses the fact that the application does not require pages in transit
to be stable.

The application we are talking about here is file -> pipe -> network
stack for zero copy sending of static file data and the problem is
that the file pages are not stable whilst they are under IO in the
network stack.

IOWs, the application does not care if the data changes whilst it
is in transport attached to the pipe - it only cares that the
contents are stable once they have been delivered and are now wholly
owned by the network stack IO path so that the OTW encodings
(checksum, encryption, whatever) done within the network IO path
don't get compromised.

i.e. the file pages only need to be stable whilst the network stack
IO path checksums and DMAs the data to the network hardware.
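
To illustrate why the contents must not change between checksum and DMA,
here is a minimal user-space sketch. It is a toy model, not kernel code:
send_zero_copy(), receiver_accepts() and the mutate_during_io hook are
hypothetical names, and CRC32 merely stands in for whatever OTW encoding
the real IO path applies.

```python
import zlib


def send_zero_copy(page, mutate_during_io=None):
    """Toy model of a zero-copy send: checksum first, 'DMA' afterwards."""
    # Step 1: the IO path checksums the live page (no private copy is taken).
    checksum = zlib.crc32(page)
    # Hypothetical hook: a write() to the file landing between the two steps.
    if mutate_during_io is not None:
        mutate_during_io(page)
    # Step 2: the hardware reads the page contents (modelled here as a copy).
    wire_payload = bytes(page)
    return wire_payload, checksum


def receiver_accepts(payload, checksum):
    """The peer recomputes the checksum over what actually arrived."""
    return zlib.crc32(payload) == checksum


def scribble(page):
    # Overwrite one byte, as a concurrent writer to the file would.
    page[0] = ord("S")


if __name__ == "__main__":
    stable = send_zero_copy(bytearray(b"static file data"))
    unstable = send_zero_copy(bytearray(b"static file data"), scribble)
    print("stable page accepted:  ", receiver_accepts(*stable))
    print("unstable page accepted:", receiver_accepts(*unstable))
```

The window that matters in this model is only the span between the two
steps inside send_zero_copy(); nothing that happens to the page before
step 1 affects the result.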

That's exactly the same IO context in which the block device stack
requires the page contents to be stable - across parity/checksum
calculations and the subsequent DMA transfers to the storage
hardware.

I'm suggesting that the page should only need to be held stable
whilst it is under IO, whether that IO is in the network stack via
skbs or in the block device stack via bios.  Both network and block
IO are bounded by fixed time limits, both IO paths typically only
need pages held stable for a few milliseconds at a time, and both
have worst case IO times in error situations that are typically
bounded at a few minutes.

IOWs, splice is a complete misdirection here - it doesn't need to
know a thing about stable data requirements at all. It's the
destination processing that requires stable data, not the transport
mechanism.

Hence if we have a generic mechanism that the network stack can use
to detect a file backed page and mark it needing to be stable whilst
the network stack is doing IO on it, everything on the filesystem
side should just work like it does for pages under IO in the block
device stack...
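
The "hold the page stable only whilst it is under IO" idea can be
sketched in user space as follows. This is a toy model under stated
assumptions, not the kernel implementation: ToyFolio and its methods are
hypothetical names, and a counter guarded by a condition variable stands
in for the folio state that a writer would wait on.

```python
import threading


class ToyFolio:
    """User-space stand-in for a page cache folio (illustration only)."""

    def __init__(self, data):
        self.data = bytearray(data)
        self._cond = threading.Condition()
        self._under_io = 0  # IO paths that currently need stable contents

    def begin_io(self):
        # An IO path (block or network) marks the folio as needing to be stable.
        with self._cond:
            self._under_io += 1

    def end_io(self):
        # IO completion: wake any writers once no IO path needs the folio.
        with self._cond:
            self._under_io -= 1
            if self._under_io == 0:
                self._cond.notify_all()

    def write(self, offset, buf, timeout=None):
        """Modify the folio, waiting first until no IO is in flight.

        Returns False if the wait timed out and nothing was written.
        """
        with self._cond:
            if not self._cond.wait_for(lambda: self._under_io == 0, timeout):
                return False
            self.data[offset:offset + len(buf)] = buf
            return True


if __name__ == "__main__":
    folio = ToyFolio(b"hello world")
    folio.begin_io()
    print("write during IO:", folio.write(0, b"J", timeout=0.05))
    folio.end_io()
    print("write after IO: ", folio.write(0, b"J", timeout=0.05))
    print(bytes(folio.data))
```

In this model the writer is delayed only for the duration of the IO
itself; how long the page sat in a pipe before begin_io() was called
plays no part.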

Indeed, I suspect that a filesystem -> pipe -> filesystem zero copy
path via splice probably also needs stable source pages for some
filesystems, in which case we need exactly the same mechanism as
we need for stable pages in the network stack zero copy splice
destination path....

Cheers,

Dave.
-- 
Dave Chinner
[email protected]
