From: Andy Lutomirski <[email protected]>
To: Herbert Xu <[email protected]>
Cc: Dave Chinner <[email protected]>,
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected]
Subject: Re: copy on write for splice() from file to pipe?
Date: Mon, 13 Feb 2023 10:01:27 -0800
Message-ID: <CALCETrXKkZw3ojpmTftur1_-dEi6BOo9Q0cems_jgabntNFYig@mail.gmail.com>
In-Reply-To: <[email protected]>

On Mon, Feb 13, 2023 at 1:45 AM Herbert Xu <[email protected]> wrote:
>
> Dave Chinner <[email protected]> wrote:
> >
> > IOWs, the application does not care if the data changes whilst they
> > are in transport attached to the pipe - it only cares that the
> > contents are stable once they have been delivered and are now wholly
> > owned by the network stack IO path so that the OTW encodings
> > (checksum, encryption, whatever) done within the network IO path
> > don't get compromised.
>
> Is this even a real problem? The network stack doesn't care at
> all if you modify the pages while it's being processed.  All the
> things you've mentioned (checksum, encryption, etc.) will be
> self-consistent on the wire.
>
> Even when actual hardware offload is involved it's hard to see how
> things could possibly go wrong unless the hardware was going out of
> its way to do the wrong thing by fetching from memory twice.
>

There's a difference between "kernel speaks TCP (or whatever)
correctly" and "kernel does what the application needs it to do".
When I write programs that send data on the network, I want the kernel
to send the data that I asked it to send.  As a silly but obvious
example, suppose I have two threads, and all I/O is blocking
(O_NONBLOCK is not set, etc):

char buffer[1024] = "A";

Thread A:
send(fd, buffer, 1, 0);

Thread B:
mb();
buffer[0] = 'B';
mb();


Obviously, there are three possible valid outcomes: Thread A can go
first (send returns before B changes the buffer), and 'A' gets sent.
Thread B can go first (the buffer is changed before send() starts),
and 'B' gets sent.  Or both can run concurrently, in which case the
data sent is indeterminate.

But it is not valid for send() to return, then the buffer to change,
and 'B' to get sent, just like:

char foo[] = "A";
send(fd, foo, 1, 0);
foo[0] = 'B';

must send 'A', not 'B'.
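
A minimal sketch of that single-threaded case, for readers who want to
try it. The function name send_then_overwrite() and the use of an
AF_UNIX socketpair() as a stand-in for a real network socket are
assumptions made for illustration; they are not part of the original
message.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send `first` over a local socket pair, overwrite the source buffer
 * with `second` after send() has returned, then read what the peer
 * actually receives.  Returns the received byte, or -1 on error.
 */
int send_then_overwrite(char first, char second)
{
    int sv[2];
    char foo[1];
    char got = 0;
    int result = -1;

    /* The experiment only says something if the two bytes differ. */
    assert(first != second);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    foo[0] = first;
    /* A plain send() copies the byte into kernel memory before returning. */
    if (send(sv[0], foo, 1, 0) == 1) {
        foo[0] = second;    /* changed only after send() has returned */
        if (recv(sv[1], &got, 1, 0) == 1)
            result = (unsigned char)got;
    }

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Called as send_then_overwrite('A', 'B'), this should return 'A': once
send() has returned, the data is committed and later changes to the
buffer do not reach the peer.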

The trouble with splice() is that there is no clear point at which the
splice is complete and the data being sent is committed.  I don't
think user applications need the data committed particularly quickly,
but I do think it needs to be committed "eventually" and there needs
to be a point at which the application knows it's been committed.
Right now, if a user program does:

Write 'A' to a file
splice that file to a pipe
splice that pipe to a socket
... wait until when? ...
Write 'B' to the same file

There is nothing the user program can wait for to make sure that 'A'
gets sent.  Saying that the kernel speaks TCP correctly does not solve
that problem.
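
The sequence above can be sketched roughly as follows. This is an
illustration under stated assumptions, not code from the thread: the
function name splice_then_overwrite(), the temporary file under /tmp,
and the AF_UNIX socketpair() standing in for a network socket are all
hypothetical choices, and the "wait" step is simply omitted because
there is nothing to wait on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `first` to a temp file, splice file -> pipe -> socket, overwrite
 * the file with `second`, then read the byte from the peer socket.
 * Returns the byte received, or -1 on error.
 */
int splice_then_overwrite(char first, char second)
{
    char path[] = "/tmp/splice-demo-XXXXXX";   /* hypothetical location */
    int p[2] = { -1, -1 }, sv[2] = { -1, -1 };
    int result = -1;
    char got = 0;
    loff_t off = 0;
    int fd;

    assert(first != second);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                     /* file lives on until fd is closed */

    if (pipe(p) < 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        goto out;
    if (write(fd, &first, 1) != 1)                       /* Write 'A' to a file */
        goto out;
    if (splice(fd, &off, p[1], NULL, 1, 0) != 1)         /* file -> pipe */
        goto out;
    if (splice(p[0], NULL, sv[0], NULL, 1, 0) != 1)      /* pipe -> socket */
        goto out;

    /* Both splice() calls have returned, yet nothing says "committed". */
    if (pwrite(fd, &second, 1, 0) != 1)                  /* Write 'B' */
        goto out;

    if (recv(sv[1], &got, 1, 0) == 1)
        result = (unsigned char)got;
out:
    if (p[0] >= 0) { close(p[0]); close(p[1]); }
    if (sv[0] >= 0) { close(sv[0]); close(sv[1]); }
    close(fd);
    return result;
}
```

Unlike the send() case, the value returned by
splice_then_overwrite('A', 'B') is not something the program can rely
on: depending on the kernel, filesystem, and socket type, the peer may
see either byte, which is exactly the missing commit point described
above.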
