From: Stefan Metzmacher <[email protected]>
To: Linus Torvalds <[email protected]>,
David Howells <[email protected]>
Cc: Jens Axboe <[email protected]>,
Linux API Mailing List <[email protected]>,
Samba Technical <[email protected]>,
"[email protected]" <[email protected]>,
Al Viro <[email protected]>,
linux-fsdevel <[email protected]>,
io-uring <[email protected]>,
[email protected]
Subject: Re: copy on write for splice() from file to pipe?
Date: Fri, 10 Feb 2023 21:45:29 +0100
Message-ID: <[email protected]>
In-Reply-To: <CAHk-=whprvcY=KRh15uqtmVqR2rL-H1yN6RaswHiWPsGHDqsSQ@mail.gmail.com>
On 09.02.23 20:48, Linus Torvalds via samba-technical wrote:
> On Thu, Feb 9, 2023 at 11:36 AM Linus Torvalds
> <[email protected]> wrote:
>>
>> I guarantee that you will only slow things down with some odd async_memcpy.
>
> Extended note: even if the copies themselves would then be done
> concurrently with other work (so "not faster, but more parallel"), the
> synchronization required at the end would then end up being costly
> enough to eat up any possible win. Plus you'd still end up with a
> fundamental problem of "what if the data changes in the meantime".
>
> And that's ignoring all the practical problems of actually starting
> the async copy, which traditionally requires virtual to physical
> translations (where "physical" is whatever the DMA address space is).
>
> So I don't think there are any actual real cases of async memory copy
> engines being even _remotely_ better than memcpy outside of
> microcontrollers (and historical computers before caches - people may
> still remember things like the Amiga blitter fondly).
>
> Again, the exception ends up being if you can actually use real DMA to
> not do a memory copy, but to transfer data directly to or from the
> device. That's in some way what 'splice()' is designed to allow you to
> do, but exactly by the pipe part ending up being the "conceptual
> buffer" for the zero-copy pages.
>
> So this is exactly *why* splicing from a file all the way to the
> network will then show any file changes that have happened in between
> that "splice started" and "network card got the data". You're supposed
> to use splice only when you can guarantee the data stability (or,
> alternatively, when you simply don't care about the data stability,
> and getting the changed data is perfectly fine).
Ok, thanks for the explanation!
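
Just so we're talking about the same pattern, this is roughly what I
have in mind (untested sketch, helper name made up, error handling kept
minimal):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Untested sketch: plain file -> pipe -> socket via splice().
 * The pipe only holds references to the page-cache pages, so what the
 * socket ends up transmitting can still change until the NIC has
 * actually consumed the pages - exactly the window described above.
 */
static int splice_file_to_socket(int file_fd, int sock_fd,
                                 loff_t off, size_t len)
{
        int pipefd[2];
        int ret = -1;

        if (pipe(pipefd) < 0)
                return -1;

        while (len > 0) {
                /* Link page-cache pages into the pipe, no copy. */
                ssize_t n = splice(file_fd, &off, pipefd[1], NULL,
                                   len, SPLICE_F_MOVE);
                if (n <= 0)
                        goto out;

                /* Push the same pages on to the socket, still no copy. */
                for (ssize_t done = 0; done < n; ) {
                        ssize_t m = splice(pipefd[0], NULL, sock_fd, NULL,
                                           n - done, SPLICE_F_MORE);
                        if (m <= 0)
                                goto out;
                        done += m;
                }
                len -= n;
        }
        ret = 0;
out:
        close(pipefd[0]);
        close(pipefd[1]);
        return ret;
}

So everything between the first splice() and the NIC actually consuming
the pages is the window in which the file content may still change
underneath us.
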
Looking at this patch from David Howells:
https://lore.kernel.org/linux-fsdevel/[email protected]/
it seems that we don't have that problem with O_DIRECT, as it operates
on dedicated pages (not part of the shared page cache), and these pages
might be filled via DMA (depending on the filesystem and block device).
Is my understanding correct?
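
If that's right, then from user space the only difference to the sketch
above would be the open() flags (plus O_DIRECT's alignment requirements
for offset and length); path, sock_fd and file_size are just
placeholders here:

/*
 * Untested, and assuming the patched splice-read behaviour:
 * with O_DIRECT the splice() reads into freshly allocated pages
 * (possibly filled via DMA) instead of referencing shared
 * page-cache pages, so later writes to the file should no longer
 * show up in data that is already sitting in the pipe.
 */
int file_fd = open(path, O_RDONLY | O_DIRECT);
if (file_fd < 0)
        return -1;
return splice_file_to_socket(file_fd, sock_fd, 0, file_size);
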
Together with this patch:
https://lore.kernel.org/linux-fsdevel/[email protected]/
I guess it would be easy to pass a flag (maybe SPLICE_F_FORCE_COPY)
down to generic_file_splice_read() and let it create dedicated pages
and use memcpy() from the page cache to the dedicated pages.
This would mean one memcpy(), but it would allow the pipe to be used
both for the splice() to the socket and for tee(). That might be easier
than using pread() followed by vmsplice(SPLICE_F_GIFT).
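
The pread() + vmsplice(SPLICE_F_GIFT) variant I mean would look roughly
like this (untested sketch, helper name made up; it assumes len fits
into the pipe, which defaults to 64 KiB, and a real implementation
would have to loop on short reads and short splices):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

static int send_stable_copy(int file_fd, int sock_fd, off_t off, size_t len)
{
        int pipefd[2];
        struct iovec iov;
        void *buf;
        ssize_t n;

        if (posix_memalign(&buf, sysconf(_SC_PAGESIZE), len) != 0)
                return -1;
        if (pipe(pipefd) < 0)
                return -1;

        /* The one copy: page cache (or disk) -> private buffer. */
        n = pread(file_fd, buf, len, off);
        if (n <= 0)
                return -1;

        /* Gift the buffer's pages to the pipe. */
        iov.iov_base = buf;
        iov.iov_len  = (size_t)n;
        if (vmsplice(pipefd[1], &iov, 1, SPLICE_F_GIFT) != n)
                return -1;

        /*
         * The pipe contents are now decoupled from the file; splice()
         * to the socket and/or tee() as needed.  buf is intentionally
         * not reused or freed here - whether the caller may ever touch
         * it again after SPLICE_F_GIFT is exactly the question below.
         */
        if (splice(pipefd[0], NULL, sock_fd, NULL, (size_t)n,
                   SPLICE_F_MORE) < 0)
                return -1;

        close(pipefd[0]);
        close(pipefd[1]);
        return 0;
}
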
The usage of SPLICE_F_GIFT is very confusing to me...
I found this example in libkcapi using the kernel crypto sockets:
https://github.com/smuellerDD/libkcapi/blob/master/lib/kcapi-kernel-if.c#L324
where it just passes SPLICE_F_GIFT together with an iovec passed from the
caller of the library.
It's not clear to me whether the caller can still use the buffers
referenced by the passed iovec...
metze