public inbox for [email protected]
From: Andy Lutomirski <[email protected]>
To: Linus Torvalds <[email protected]>
Cc: Andy Lutomirski <[email protected]>,
	Dave Chinner <[email protected]>,
	Matthew Wilcox <[email protected]>,
	Stefan Metzmacher <[email protected]>, Jens Axboe <[email protected]>,
	linux-fsdevel <[email protected]>,
	Linux API Mailing List <[email protected]>,
	io-uring <[email protected]>,
	"[email protected]" <[email protected]>,
	Al Viro <[email protected]>,
	Samba Technical <[email protected]>
Subject: Re: copy on write for splice() from file to pipe?
Date: Fri, 10 Feb 2023 11:01:46 -0800
Message-ID: <CALCETrWuRHWh5XFn8M8qx5z0FXAGHH=ysb+c6J+cqbYyTAHvhw@mail.gmail.com>
In-Reply-To: <CAHk-=wjQZWMeQ9OgXDNepf+TLijqj0Lm0dXWwWzDcbz6o7yy_g@mail.gmail.com>

On Fri, Feb 10, 2023 at 10:37 AM Linus Torvalds
<[email protected]> wrote:
>
> On Fri, Feb 10, 2023 at 9:57 AM Andy Lutomirski <[email protected]> wrote:
>
> I'm not convinced your suggestion of extending io_uring with new
> primitives is any better in practice, though.


I don't know if I'm really suggesting new primitives.  I think I'm
making two suggestions that go together.

First, let splice() and IORING_OP_SPLICE copy (or zero-copy) data
directly from a file to a socket.
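For context: splice(2) currently requires one side of each transfer to be a pipe, so moving file data to a socket takes two splice() calls through an intermediate pipe. The sketch below shows that two-step path, which the first suggestion would collapse into a single call. `splice_via_pipe()` and `demo_splice_file_to_socket()` are hypothetical helper names, an AF_UNIX socketpair stands in for a TCP socket, and error handling is deliberately minimal.

```c
#define _GNU_SOURCE            /* for splice(2) */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from in_fd (starting at *off)
 * to out_fd through a temporary pipe, because splice(2) needs a pipe on
 * one side.  Returns bytes delivered to out_fd, or -1 on error. */
static ssize_t splice_via_pipe(int in_fd, loff_t *off, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;
    while ((size_t)total < len) {
        /* Stage 1: file -> pipe.  The pipe may end up holding references
         * to page-cache pages rather than a private copy of the bytes. */
        ssize_t in = splice(in_fd, off, p[1], NULL, len - (size_t)total, 0);
        if (in <= 0) {
            if (in < 0)
                total = -1;
            break;              /* error, or EOF on the source file */
        }
        /* Stage 2: pipe -> destination, draining what stage 1 staged. */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in, 0);
            if (out <= 0) {
                total = -1;
                goto done;
            }
            in -= out;
            total += out;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    return total;
}

/* Usage: regular file -> AF_UNIX stream socket (stand-in for TCP). */
static int demo_splice_file_to_socket(void)
{
    int sv[2];
    char buf[16] = {0};
    loff_t off = 0;
    FILE *f = tmpfile();

    if (!f || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pwrite(fileno(f), "hello", 5, 0) != 5)
        return -1;
    if (splice_via_pipe(fileno(f), &off, sv[0], 5) != 5 || off != 5)
        return -1;
    if (read(sv[1], buf, sizeof buf) != 5 || memcmp(buf, "hello", 5) != 0)
        return -1;
    fclose(f);
    close(sv[0]);
    close(sv[1]);
    return 0;
}
```

Because stage 1 can leave page-cache references sitting in the pipe, this path has the non-strict semantics discussed in this thread.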

Second, either make splice more strict or add a new "strict splice"
variant.  Strict splice only completes when it can promise that writes
to the source that start after strict splice's completion won't change
what gets written to the destination.
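A toy user-space model may make the strict/non-strict distinction concrete. It is not kernel code: `struct page`, `struct pipe_slot`, `splice_nonstrict()`, `splice_strict()` and `pipe_read()` are hypothetical names, and the strict variant is modelled as a copy. A copy is only one way to provide the guarantee; a zero-copy version would need something like copy-on-write or change tracking instead.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: one "page" of file data and a one-slot "pipe". */
#define PAGE_SZ 16

struct page { char data[PAGE_SZ]; };

struct pipe_slot {
    const struct page *ref;   /* non-strict: borrowed reference to the page */
    char copy[PAGE_SZ];       /* strict: private snapshot taken at splice time */
    int strict;
};

/* Non-strict splice: completes by taking a reference to the source page,
 * so writes to the page after completion are visible to the reader. */
static void splice_nonstrict(struct pipe_slot *p, const struct page *src)
{
    p->ref = src;
    p->strict = 0;
}

/* Strict splice (modelled as a copy): once it completes, the destination
 * no longer depends on the source, so later writes cannot change it. */
static void splice_strict(struct pipe_slot *p, const struct page *src)
{
    memcpy(p->copy, src->data, PAGE_SZ);
    p->ref = NULL;
    p->strict = 1;
}

/* What the consumer of the pipe eventually reads. */
static const char *pipe_read(const struct pipe_slot *p)
{
    return p->strict ? p->copy : p->ref->data;
}
```

If "old" is spliced both ways and "new" is then written into the page, the non-strict slot reads "new" while the strict slot still reads "old".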


I think that strict splice fixes Stefan's use case.  It's also easier
to reason about than regular splice.


The major caveat here is that zero-copy strict splice is fundamentally
a potentially long-running operation in a way that zero-copy splice()
isn't right now.  So the combination of O_NONBLOCK and strict splice()
(the syscall, not necessarily the io_uring operation) to something
like a TCP socket requires complicated locking or change tracking to
make sense.  This means that a splice() syscall providing strict
semantics to a TCP socket may just need to do a copy, at least in many
cases.  But maybe that's fine -- very-high-performance networking is
moving pretty aggressively to io_uring anyway.


And my possibly-quite-out-there claim is that, if Linux implements
strict splice, maybe non-strict splice could be replaced, in a
user-ABI-compatible manner, with a much simpler non-zero-copy
implementation.  And strict splice from a file to a pipe could be
implemented as a copy -- high performance users can, if needed, start
strict-splicing from a file directly to a socket.
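A minimal user-space sketch of the copy fallback, assuming nothing beyond POSIX pread(2) and write(2). `strict_splice_copy()` and `demo_strict_copy()` are hypothetical helpers, not an existing syscall or kernel implementation. Because the bytes pass through a private buffer, anything reported as transferred cannot be changed by later writes to the source, and a non-blocking destination just yields a short count instead of requiring locking or change tracking.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical copy-based "strict splice".  Returns bytes transferred,
 * possibly short if out_fd is non-blocking and fills up, or -1 with errno
 * set (e.g. EAGAIN) if nothing could be transferred. */
static ssize_t strict_splice_copy(int in_fd, off_t *off, int out_fd, size_t len)
{
    char buf[4096];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t n = pread(in_fd, buf, want, *off);
        if (n < 0)
            return done > 0 ? (ssize_t)done : -1;
        if (n == 0)
            break;                          /* EOF on the source */
        ssize_t w = write(out_fd, buf, (size_t)n);
        if (w < 0)                          /* e.g. EAGAIN when full */
            return done > 0 ? (ssize_t)done : -1;
        done += (size_t)w;
        *off += w;                          /* advance only past bytes sent */
        if (w < n)
            break;                          /* destination is full */
    }
    return (ssize_t)done;
}

/* Usage: the pipe keeps the old bytes even after the file is rewritten. */
static int demo_strict_copy(void)
{
    int p[2];
    char buf[16] = {0};
    off_t off = 0;
    FILE *f = tmpfile();

    if (!f || pipe(p) < 0)
        return -1;
    if (pwrite(fileno(f), "hello", 5, 0) != 5)
        return -1;
    if (strict_splice_copy(fileno(f), &off, p[1], 5) != 5)
        return -1;
    /* Overwrite the source after the "strict splice" has completed. */
    if (pwrite(fileno(f), "HELLO", 5, 0) != 5)
        return -1;
    if (read(p[0], buf, sizeof buf) != 5 || memcmp(buf, "hello", 5) != 0)
        return -1;
    fclose(f);
    close(p[0]);
    close(p[1]);
    return 0;
}
```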
