public inbox for [email protected]
 help / color / mirror / Atom feed
From: Ming Lei <[email protected]>
To: Jens Axboe <[email protected]>,
	[email protected], [email protected],
	[email protected],
	Alexander Viro <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>,
	Miklos Szeredi <[email protected]>,
	Bernd Schubert <[email protected]>,
	Nitesh Shetty <[email protected]>,
	Christoph Hellwig <[email protected]>,
	Ziyang Zhang <[email protected]>,
	Linus Torvalds <[email protected]>
Subject: Re: [PATCH 1/4] fs/splice: enhance direct pipe & splice for moving pages in kernel
Date: Sat, 11 Feb 2023 23:42:39 +0800	[thread overview]
Message-ID: <Y+e3b+Myg/30hlYk@T590> (raw)
In-Reply-To: <[email protected]>

On Fri, Feb 10, 2023 at 11:32:09PM +0800, Ming Lei wrote:
> Per-task direct pipe can transfer page between two files or one
> file and other kernel components, especially by splice_direct_to_actor
> and __splice_from_pipe().
> 
> This way is helpful for fuse/ublk to implement zero copy by transferring
> pages from device to file or socket. However, when device's ->splice_read()
> produces pages, the kernel consumer may read from or write to these pages,
> and from device viewpoint, there could be unexpected read or write on
> pages.
> 
> Enhance the limit by the following approach:
> 
> 1) add kernel splice flags of SPLICE_F_KERN_FOR_[READ|WRITE] which is
>    passed to device's ->read_splice(), then device can check if this
>    READ or WRITE is expected on pages filled to pipe together with
>    information from ppos & len
> 
> 2) add kernel splice flag of SPLICE_F_KERN_NEED_CONFIRM which is passed
>    to device's ->read_splice() for asking device to confirm if it
>    really supports this kind of usage of feeding pages by ->read_splice().
>    If device does support, pipe->ack_page_consuming is set. This way
>    can avoid misuse.
> 
> Signed-off-by: Ming Lei <[email protected]>
> ---
>  fs/splice.c               | 15 +++++++++++++++
>  include/linux/pipe_fs_i.h | 10 ++++++++++
>  include/linux/splice.h    | 22 ++++++++++++++++++++++
>  3 files changed, 47 insertions(+)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index 87d9b19349de..c4770e1644cc 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -792,6 +792,14 @@ static long do_splice_to(struct file *in, loff_t *ppos,
>  	return in->f_op->splice_read(in, ppos, pipe, len, flags);
>  }
>  
> +static inline bool slice_read_acked(const struct pipe_inode_info *pipe,
> +			       int flags)
> +{
> +	if (flags & SPLICE_F_KERN_NEED_CONFIRM)
> +		return pipe->ack_page_consuming;
> +	return true;
> +}
> +
>  /**
>   * splice_direct_to_actor - splices data directly between two non-pipes
>   * @in:		file to splice from
> @@ -861,10 +869,17 @@ ssize_t splice_direct_to_actor(struct file *in, struct splice_desc *sd,
>  		size_t read_len;
>  		loff_t pos = sd->pos, prev_pos = pos;
>  
> +		pipe->ack_page_consuming = false;
>  		ret = do_splice_to(in, &pos, pipe, len, flags);
>  		if (unlikely(ret <= 0))
>  			goto out_release;
>  
> +		if (!slice_read_acked(pipe, flags)) {
> +			bytes = 0;
> +			ret = -EACCES;
> +			goto out_release;
> +		}
> +
>  		read_len = ret;
>  		sd->total_len = read_len;
>  
> diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
> index 6cb65df3e3ba..09ee1a9380ec 100644
> --- a/include/linux/pipe_fs_i.h
> +++ b/include/linux/pipe_fs_i.h
> @@ -72,6 +72,7 @@ struct pipe_inode_info {
>  	unsigned int r_counter;
>  	unsigned int w_counter;
>  	bool poll_usage;
> +	bool ack_page_consuming;	/* only for direct pipe */
>  	struct page *tmp_page;
>  	struct fasync_struct *fasync_readers;
>  	struct fasync_struct *fasync_writers;
> @@ -218,6 +219,15 @@ static inline void pipe_discard_from(struct pipe_inode_info *pipe,
>  		pipe_buf_release(pipe, &pipe->bufs[--pipe->head & mask]);
>  }
>  
> +/*
> + * Called in ->splice_read() for confirming the READ/WRITE page is allowed
> + */
> +static inline void pipe_ack_page_consume(struct pipe_inode_info *pipe)
> +{
> +	if (!WARN_ON_ONCE(current->splice_pipe != pipe))
> +		pipe->ack_page_consuming = true;
> +}
> +
>  /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
>     memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */
>  #define PIPE_SIZE		PAGE_SIZE
> diff --git a/include/linux/splice.h b/include/linux/splice.h
> index a55179fd60fc..98c471fd918d 100644
> --- a/include/linux/splice.h
> +++ b/include/linux/splice.h
> @@ -23,6 +23,28 @@
>  
>  #define SPLICE_F_ALL (SPLICE_F_MOVE|SPLICE_F_NONBLOCK|SPLICE_F_MORE|SPLICE_F_GIFT)
>  
> +/*
> + * Flags used for kernel internal page move from ->splice_read()
> + * by internal direct pipe, and user pipe can't touch these
> + * flags.
> + *
> + * Pages filled from ->splice_read() are usually moved/copied to
> + * ->splice_write(). Here address fuse/ublk zero copy by transferring
> + * page from device to file/socket for either READ or WRITE. So we
> + * need ->splice_read() to confirm if this READ/WRITE is allowed on
> + * pages filled in ->splice_read().
> + */
> +/* The page consumer is for READ from pages moved from direct pipe */
> +#define SPLICE_F_KERN_FOR_READ	(0x100)
> +/* The page consumer is for WRITE to pages moved from direct pipe */
> +#define SPLICE_F_KERN_FOR_WRITE	(0x200)
> +/*
> + * ->splice_read() has to confirm if consumer's READ/WRITE on pages
> + * is allow. If yes, ->splice_read() has to set pipe->ack_page_consuming,
> + * otherwise pipe->ack_page_consuming has to be cleared.
> + */
> +#define SPLICE_F_KERN_NEED_CONFIRM	(0x400)
> +

As Linus commented in another thread, this patch is really ugly.

I'd suggest to change to the following one, any comment is welcome!


From fb9340ce72a1c58c9428d2af7cb00b55fa4ba799 Mon Sep 17 00:00:00 2001
From: Ming Lei <[email protected]>
Date: Fri, 10 Feb 2023 09:16:46 +0000
Subject: [PATCH 2/4] fs/splice: add private flags for cross-check in two ends

Pages are transferred via pipe/splice between different subsystems.

The source subsystem may know if spliced pages are readable or
writable. The sink subsystem may know if spliced pages need to
write to or read from.

Add two pair of private flags, so that the source subsystem and
sink subsystem can run cross-check about page use correctness,
and they are supposed to be used in case of communicating over
direct pipe only. Generic splice and pipe code is supposed to
not touch the two pair of private flags.

Signed-off-by: Ming Lei <[email protected]>
---
 include/linux/pipe_fs_i.h | 8 ++++++++
 include/linux/splice.h    | 9 +++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 6cb65df3e3ba..959c8b9287f4 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -14,6 +14,14 @@
 #define PIPE_BUF_FLAG_LOSS	0x40	/* Message loss happened after this buffer */
 #endif
 
+/*
+ * Used by source/sink end only, don't touch them in generic
+ * splice/pipe code. Set in source side, and check in sink
+ * side
+ */
+#define PIPE_BUF_PRIV_FLAG_MAY_READ	0x1000 /* sink can read from page */
+#define PIPE_BUF_PRIV_FLAG_MAY_WRITE	0x2000 /* sink can write to page */
+
 /**
  *	struct pipe_buffer - a linux kernel pipe buffer
  *	@page: the page containing the data for the pipe buffer
diff --git a/include/linux/splice.h b/include/linux/splice.h
index a55179fd60fc..90d1d2b5327d 100644
--- a/include/linux/splice.h
+++ b/include/linux/splice.h
@@ -23,6 +23,15 @@
 
 #define SPLICE_F_ALL (SPLICE_F_MOVE|SPLICE_F_NONBLOCK|SPLICE_F_MORE|SPLICE_F_GIFT)
 
+/*
+ * Use for source/sink side only, don't touch them in generic splice/pipe
+ * code. Pass from sink side, and check in source side.
+ *
+ * So far, it is only for communicating over direct pipe.
+ */
+#define SPLICE_F_PRIV_FOR_READ	(0x100)	/* sink side will read from page */
+#define SPLICE_F_PRIV_FOR_WRITE	(0x200) /* sink side will write page */
+
 /*
  * Passed to the actors
  */
-- 
2.38.1



Thanks,
Ming


  reply	other threads:[~2023-02-11 15:43 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-02-10 15:32 [PATCH 0/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF Ming Lei
2023-02-10 15:32 ` [PATCH 1/4] fs/splice: enhance direct pipe & splice for moving pages in kernel Ming Lei
2023-02-11 15:42   ` Ming Lei [this message]
2023-02-11 18:57     ` Linus Torvalds
2023-02-12  1:39       ` Ming Lei
2023-02-13 20:04         ` Linus Torvalds
2023-02-14  0:52           ` Ming Lei
2023-02-14  2:35             ` Ming Lei
2023-02-14 11:03           ` Miklos Szeredi
2023-02-14 14:35             ` Ming Lei
2023-02-14 15:39               ` Miklos Szeredi
2023-02-15  0:11                 ` Ming Lei
2023-02-15 10:36                   ` Miklos Szeredi
2023-02-10 15:32 ` [PATCH 2/4] fs/splice: allow to ignore signal in __splice_from_pipe Ming Lei
2023-02-10 15:32 ` [PATCH 3/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF Ming Lei
2023-02-11 15:45   ` Jens Axboe
2023-02-11 16:12     ` Ming Lei
2023-02-11 16:52       ` Jens Axboe
2023-02-12  3:22         ` Ming Lei
2023-02-12  3:55           ` Jens Axboe
2023-02-13  1:06             ` Ming Lei
2023-02-11 17:13   ` Jens Axboe
2023-02-12  1:48     ` Ming Lei
2023-02-12  2:42       ` Jens Axboe
2023-02-10 15:32 ` [PATCH 4/4] ublk_drv: support splice based read/write zero copy Ming Lei
2023-02-10 21:54 ` [PATCH 0/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF Jens Axboe
2023-02-10 22:19   ` Jens Axboe
2023-02-11  5:13   ` Ming Lei
2023-02-11 15:45     ` Jens Axboe
2023-02-14 16:36 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y+e3b+Myg/30hlYk@T590 \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox