public inbox for [email protected]
 help / color / mirror / Atom feed
From: Linus Torvalds <[email protected]>
To: Dave Chinner <[email protected]>
Cc: Stefan Metzmacher <[email protected]>, Jens Axboe <[email protected]>,
	linux-fsdevel <[email protected]>,
	Linux API Mailing List <[email protected]>,
	io-uring <[email protected]>,
	"[email protected]" <[email protected]>,
	Al Viro <[email protected]>,
	Samba Technical <[email protected]>
Subject: Re: copy on write for splice() from file to pipe?
Date: Thu, 9 Feb 2023 20:47:07 -0800	[thread overview]
Message-ID: <CAHk-=wip9xx367bfCV8xaF9Oaw4DZ6edF9Ojv10XoxJ-iUBwhA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>

On Thu, Feb 9, 2023 at 8:06 PM Dave Chinner <[email protected]> wrote:
>>
> So while I was pondering the complexity of this and watching a great
> big shiny rocket create lots of heat, light and noise, it occurred
> to me that we already have a mechanism for preventing page cache
> data from being changed while the folios are under IO:
> SB_I_STABLE_WRITES and folio_wait_stable().

No, Dave. Not at all.

Stop and think.

splice() is not some "while under IO" thing. It's *UNBOUNDED*.

Let me just give an example: random user A does

   fd = open("/dev/passwd", O_RDONLY);
   splice(fd, NULL, pipefd, NULL, ..);
   sleep(10000);

and you now want to block all writes to the page in that file as long
as it's being held on to, do you?

So no.

The above is also why something like IOMAP_F_SHARED is not relevant.
The whole point of splice is to act as a way to communicate pages
between *DIFFERENT* subsystems. The only thing they have in common is
the buffer (typically a page reference, but it can be other things)
that is getting transferred.

A spliced page - by definition - is not in some controlled state where
one filesystem (or one subsystem like networking) owns it, because the
whole and only point of splice is to act as that "take data from one
random source and feed it in to another random destination", and avoid
the N*M complexity matrix of N sources and M destinations.

So no. We cannot block writes, because there is no bounded time for
them. And no, we cannot use some kind of "mark this IO as shared",
because there is no "this IO".

It is also worth noting that the shared behavior (ie "splice acts as a
temporary shared buffer") might even be something that people actually
expect and depend on for semantics. I hope not, but it's not entirely
impossible that people change the source (could be file data for the
above "splice from file" case, but could be your own virtual memory
image for "vmsplice()") _after_ splicing the source, and before
splicing it to the destination.

(It sounds very unlikely that people would do that for file contents,
but vmsplice() might intentionally use buffers that may be "live").

Now, to be honest, I hope nobody plays games like that. In fact, I'm a
bit sorry that I was pushing splice() in the first place. Playing
games with zero-copy tends to cause problems, and we've had some nasty
security issues in this area too.

Now, splice() is a *lot* cleaner conceptually than "sendfile()" ever
was, exactly because it allows that whole "many different sources,
many different destinations" model. But this is also very much an
example of how "generic" may be something that is revered in computer
science, but is often a *horrible* thing in reality.

Because if this was just "sendfile()", and would be one operation that
moves file contents from the page cache to the network buffers, then
your idea of "prevent data from being changed while in transit" would
actually be valid.

Special cases are often much simpler and easier, and sometimes the
special cases are all you actually want.

Splice() is not a special case. It's sadly a very interesting and very
generic model for sharing buffers, and that does cause very real
problems.

            Linus

  parent reply	other threads:[~2023-02-10  4:47 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-09 13:55 copy on write for splice() from file to pipe? Stefan Metzmacher
2023-02-09 14:11 ` Matthew Wilcox
2023-02-09 14:29   ` Stefan Metzmacher
2023-02-09 16:41 ` Linus Torvalds
2023-02-09 19:17   ` Stefan Metzmacher
2023-02-09 19:36     ` Linus Torvalds
2023-02-09 19:48       ` Linus Torvalds
2023-02-09 20:33         ` Jeremy Allison
2023-02-10 20:45         ` Stefan Metzmacher
2023-02-10 20:51           ` Linus Torvalds
2023-02-10  2:16   ` Dave Chinner
2023-02-10  4:06     ` Dave Chinner
2023-02-10  4:44       ` Matthew Wilcox
2023-02-10  6:57         ` Dave Chinner
2023-02-10 15:14           ` Andy Lutomirski
2023-02-10 16:33             ` Linus Torvalds
2023-02-10 17:57               ` Andy Lutomirski
2023-02-10 18:19                 ` Jeremy Allison
2023-02-10 19:29                   ` Stefan Metzmacher
2023-02-10 18:37                 ` Linus Torvalds
2023-02-10 19:01                   ` Andy Lutomirski
2023-02-10 19:18                     ` Linus Torvalds
2023-02-10 19:27                       ` Jeremy Allison
2023-02-10 19:42                         ` Stefan Metzmacher
2023-02-10 19:42                         ` Linus Torvalds
2023-02-10 19:54                           ` Stefan Metzmacher
2023-02-10 19:29                       ` Linus Torvalds
2023-02-13  9:07                         ` Herbert Xu
2023-02-10 19:55                       ` Andy Lutomirski
2023-02-10 20:27                         ` Linus Torvalds
2023-02-10 20:32                           ` Jens Axboe
2023-02-10 20:36                             ` Linus Torvalds
2023-02-10 20:39                               ` Jens Axboe
2023-02-10 20:44                                 ` Linus Torvalds
2023-02-10 20:50                                   ` Jens Axboe
2023-02-10 21:14                                     ` Andy Lutomirski
2023-02-10 21:27                                       ` Jens Axboe
2023-02-10 21:51                                         ` Jens Axboe
2023-02-10 22:08                                           ` Linus Torvalds
2023-02-10 22:16                                             ` Jens Axboe
2023-02-10 22:17                                             ` Linus Torvalds
2023-02-10 22:25                                               ` Jens Axboe
2023-02-10 22:35                                                 ` Linus Torvalds
2023-02-10 22:51                                                   ` Jens Axboe
2023-02-11  3:18                                             ` Ming Lei
2023-02-11  6:17                                               ` Ming Lei
2023-02-11 14:13                                               ` Jens Axboe
2023-02-11 15:05                                                 ` Ming Lei
2023-02-11 15:33                                                   ` Jens Axboe
2023-02-11 18:57                                                     ` Linus Torvalds
2023-02-12  2:46                                                       ` Jens Axboe
2023-02-10  4:47       ` Linus Torvalds [this message]
2023-02-10  6:19         ` Dave Chinner
2023-02-10 17:23           ` Linus Torvalds
2023-02-10 17:47             ` Linus Torvalds
2023-02-13  9:28               ` Herbert Xu
2023-02-10 22:41             ` David Laight
2023-02-10 22:51               ` Jens Axboe
2023-02-13  9:30               ` Herbert Xu
2023-02-13  9:25           ` Herbert Xu
2023-02-13 18:01             ` Andy Lutomirski
2023-02-14  1:22               ` Herbert Xu
2023-02-17 23:13                 ` Andy Lutomirski
2023-02-20  4:54                   ` Herbert Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wip9xx367bfCV8xaF9Oaw4DZ6edF9Ojv10XoxJ-iUBwhA@mail.gmail.com' \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox