From: Jens Axboe <[email protected]>
To: Ming Lei <[email protected]>,
[email protected], [email protected],
[email protected],
Alexander Viro <[email protected]>
Cc: Stefan Hajnoczi <[email protected]>,
Miklos Szeredi <[email protected]>,
Bernd Schubert <[email protected]>,
Nitesh Shetty <[email protected]>,
Christoph Hellwig <[email protected]>,
Ziyang Zhang <[email protected]>
Subject: Re: [PATCH 3/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF
Date: Sat, 11 Feb 2023 08:45:18 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 2/10/23 8:32 AM, Ming Lei wrote:
> IORING_OP_READ_SPLICE_BUF: read into a buffer built from
> ->read_splice() of the specified fd, so the user needs to provide
> (splice_fd, offset, len) for building the buffer.
>
> IORING_OP_WRITE_SPLICE_BUF: write from a buffer built from
> ->read_splice() of the specified fd, so the user needs to provide
> (splice_fd, offset, len) for building the buffer.
>
> The typical use case is supporting ublk/fuse io_uring zero copy:
> the READ/WRITE op retrieves the ublk/fuse request buffer via a direct
> pipe from device->read_splice(), and the READ/WRITE is then done
> to/from this buffer directly.
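A minimal userspace sketch of how a caller might fill in a request for these opcodes, assuming the field mapping visible in the prep code quoted below (`splice_fd_in` for the splice fd, `splice_off_in` for its offset, `len` for the length, `off` for the target file offset). `struct sqe_model` and `prep_rw_splice_buf()` are hypothetical, trimmed stand-ins for `struct io_uring_sqe` and a liburing-style prep helper, and the opcode is passed in because its numeric value isn't shown here. In the real UAPI `splice_off_in` shares a union with `addr`, which is presumably why the prep has to reset `rw->addr`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical trimmed stand-in for struct io_uring_sqe: only the fields
 * the SPLICE_BUF prep reads, with the same addr/splice_off_in aliasing
 * as the real UAPI union. */
struct sqe_model {
	uint8_t  opcode;
	int32_t  fd;            /* file to read into / write from */
	uint64_t off;           /* offset within fd */
	union {
		uint64_t addr;          /* plain READ/WRITE: user buffer */
		uint64_t splice_off_in; /* SPLICE_BUF: offset within splice_fd */
	};
	uint32_t len;           /* bytes to pull via ->read_splice() */
	int32_t  splice_fd_in;  /* fd (e.g. ublk/fuse device) supplying pages */
};

/* Hypothetical helper: build a READ/WRITE_SPLICE_BUF request from the
 * (splice_fd, offset, len) triple plus the target fd and offset. */
static void prep_rw_splice_buf(struct sqe_model *sqe, uint8_t opcode,
			       int fd, uint64_t off, int splice_fd,
			       uint64_t splice_off, uint32_t len)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->opcode = opcode;
	sqe->fd = fd;
	sqe->off = off;
	sqe->splice_off_in = splice_off; /* same storage as sqe->addr */
	sqe->len = len;
	sqe->splice_fd_in = splice_fd;
}
```

For example, `prep_rw_splice_buf(&s, op, target_fd, 4096, dev_fd, 512, 65536)` leaves `s.addr == 512`, since `addr` and `splice_off_in` are the same storage.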
Main question here: would this be better off not plumbed through the rw
path? That might be cleaner, even if it requires either a bit of helper
refactoring or accepting a bit of duplication. It would still be better
than polluting the rw fast path, imho.
It also seems like this should be separately testable. We can't add new
opcodes without at least a feature test, and there should also be tests
for the various corner cases. A few more comments below.
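On the feature-test point, a sketch of the usual pattern: probe the kernel's opcode table and skip, rather than fail, when the opcode is missing. The structs mirror `struct io_uring_probe_op` / `struct io_uring_probe` from `<linux/io_uring.h>` but use a fixed-size array so the sketch runs without a kernel; on a real system the probe would be filled via `io_uring_register()` with `IORING_REGISTER_PROBE` (or liburing's `io_uring_get_probe()`). `opcode_supported()` follows the same check as liburing's `io_uring_opcode_supported()`, 77 is the conventional "skip" exit code, and `feature_test()` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_SUPPORTED  (1U << 0) /* same bit as IO_URING_OP_SUPPORTED */
#define PROBE_OPS_MAX 256       /* opcodes are u8, so 256 slots suffice */

/* Mirrors struct io_uring_probe_op from <linux/io_uring.h>. */
struct probe_op {
	uint8_t  op;
	uint8_t  resv;
	uint16_t flags;
	uint32_t resv2;
};

/* Mirrors struct io_uring_probe, but with a fixed-size ops[] array
 * instead of a flexible array member so it can live on the stack. */
struct probe {
	uint8_t  last_op;  /* highest opcode this kernel knows about */
	uint8_t  ops_len;
	uint16_t resv;
	uint32_t resv2[3];
	struct probe_op ops[PROBE_OPS_MAX];
};

/* Same logic as liburing's io_uring_opcode_supported(). */
static bool opcode_supported(const struct probe *p, unsigned op)
{
	if (op > p->last_op)
		return false;
	return (p->ops[op].flags & OP_SUPPORTED) != 0;
}

enum { T_PASS = 0, T_FAIL = 1, T_SKIP = 77 };

/* Hypothetical test entry point: skip on kernels without the opcode,
 * otherwise run the real cases (elided here). */
static int feature_test(const struct probe *p, unsigned op)
{
	if (!opcode_supported(p, op))
		return T_SKIP;
	/* ... submit SPLICE_BUF requests and check CQE results, including
	 * corner cases such as len == 0 or a bad splice fd ... */
	return T_PASS;
}
```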
> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
> index 5238ecd7af6a..91e8d8f96134 100644
> --- a/io_uring/opdef.c
> +++ b/io_uring/opdef.c
> @@ -427,6 +427,31 @@ const struct io_issue_def io_issue_defs[] = {
>  		.prep = io_eopnotsupp_prep,
>  #endif
>  	},
> +	[IORING_OP_READ_SPLICE_BUF] = {
> +		.needs_file = 1,
> +		.unbound_nonreg_file = 1,
> +		.pollin = 1,
> +		.plug = 1,
> +		.audit_skip = 1,
> +		.ioprio = 1,
> +		.iopoll = 1,
> +		.iopoll_queue = 1,
> +		.prep = io_prep_rw,
> +		.issue = io_read,
> +	},
> +	[IORING_OP_WRITE_SPLICE_BUF] = {
> +		.needs_file = 1,
> +		.hash_reg_file = 1,
> +		.unbound_nonreg_file = 1,
> +		.pollout = 1,
> +		.plug = 1,
> +		.audit_skip = 1,
> +		.ioprio = 1,
> +		.iopoll = 1,
> +		.iopoll_queue = 1,
> +		.prep = io_prep_rw,
> +		.issue = io_write,
> +	},
Are these really safe with iopoll?
> +static int io_prep_rw_splice_buf(struct io_kiocb *req,
> +				 const struct io_uring_sqe *sqe)
> +{
> +	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
> +	unsigned nr_pages = io_rw_splice_buf_nr_bvecs(rw->len);
> +	loff_t splice_off = READ_ONCE(sqe->splice_off_in);
> +	struct io_rw_splice_buf_data data;
> +	struct io_mapped_ubuf *imu;
> +	struct fd splice_fd;
> +	int ret;
> +
> +	splice_fd = fdget(READ_ONCE(sqe->splice_fd_in));
> +	if (!splice_fd.file)
> +		return -EBADF;
Seems like this should check for SPLICE_F_FD_IN_FIXED, and also use
io_file_get_normal() for the non-fixed case in case someone passed in an
io_uring fd.
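A userspace model of the lookup suggested above, patterned on how io_uring's existing `io_splice()` resolves `splice_fd_in`: `SPLICE_F_FD_IN_FIXED` selects the registered-file table; otherwise a normal lookup is done, and if the file turns out to be an io_uring instance the request is flagged for inflight tracking (used by io_uring's cancellation on task exit) rather than rejected. All types and helper names here are hypothetical stand-ins for `io_file_get_fixed()` / `io_file_get_normal()`.

```c
#include <stdbool.h>
#include <stddef.h>

#define FD_IN_FIXED (1U << 31) /* same bit as SPLICE_F_FD_IN_FIXED */

/* Hypothetical stand-ins for struct file and the request context. */
struct file_model {
	bool is_io_uring;              /* would be io_is_uring_fops() */
};

struct req_model {
	struct file_model **fixed;     /* registered (fixed) file table */
	unsigned nr_fixed;
	struct file_model **fds;       /* process fd table */
	unsigned nr_fds;
	bool inflight_tracked;         /* would be REQ_F_INFLIGHT */
};

/* Models io_file_get_fixed(): index into the registered table. */
static struct file_model *get_fixed(struct req_model *req, unsigned idx)
{
	return idx < req->nr_fixed ? req->fixed[idx] : NULL;
}

/* Models io_file_get_normal(): a ring fd is accepted, but the request
 * is flagged for inflight tracking. */
static struct file_model *get_normal(struct req_model *req, int fd)
{
	struct file_model *f;

	if (fd < 0 || (unsigned)fd >= req->nr_fds)
		return NULL;
	f = req->fds[fd];
	if (f && f->is_io_uring)
		req->inflight_tracked = true;
	return f;
}

/* The dispatch io_splice() uses for splice_fd_in. A NULL result maps to
 * -EBADF; only the normal path would need a matching put later. */
static struct file_model *resolve_splice_fd(struct req_model *req, int fd,
					    unsigned flags)
{
	if (flags & FD_IN_FIXED)
		return get_fixed(req, (unsigned)fd);
	return get_normal(req, fd);
}
```

Note the reference asymmetry: a fixed file is owned by the ring's table, so, as in `io_splice()`, the put after use would be skipped when `SPLICE_F_FD_IN_FIXED` is set.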
> +	data.imu = &imu;
> +
> +	rw->addr = 0;
> +	req->flags |= REQ_F_NEED_CLEANUP;
> +
> +	ret = __io_prep_rw_splice_buf(req, &data, splice_fd.file, rw->len,
> +				      splice_off);
> +	imu = *data.imu;
> +	imu->acct_pages = 0;
> +	imu->ubuf = 0;
> +	imu->ubuf_end = data.total;
> +	rw->len = data.total;
> +	req->imu = imu;
> +	if (!data.total) {
> +		io_rw_cleanup_splice_buf(req);
> +	} else {
> +		ret = 0;
> +	}
> +out_put_fd:
> +	if (splice_fd.file)
> +		fdput(splice_fd);
> +
> +	return ret;
> +}
If the operation is done, clear NEED_CLEANUP and do the cleanup here?
That'll be faster.
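A toy model of the cleanup suggestion: when the request is known to be finished, release the spliced buffer inline and clear the cleanup flag, so the generic deferred cleanup pass (`io_clean_op()` in io_uring, which only acts on requests still carrying the flag) has nothing left to do. `NEED_CLEANUP` stands in for `REQ_F_NEED_CLEANUP`; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NEED_CLEANUP (1U << 0) /* stand-in for REQ_F_NEED_CLEANUP */

/* Hypothetical request carrying a buffer built from ->read_splice(). */
struct sb_req {
	unsigned flags;
	void *buf;      /* pages/bvecs pulled from the splice fd */
	int released;   /* number of times the buffer was released */
	int res;        /* completion result */
};

/* Release the buffer and clear the flag; a no-op once already done. */
static void splice_buf_cleanup(struct sb_req *r)
{
	if (!(r->flags & NEED_CLEANUP))
		return;
	r->buf = NULL;
	r->released++;
	r->flags &= ~NEED_CLEANUP;
}

/* Fast path: the op is finished, so clean up right here. */
static void complete_and_cleanup(struct sb_req *r, int res)
{
	r->res = res;
	splice_buf_cleanup(r);
}

/* What the generic, out-of-line cleanup pass would check later. */
static bool needs_deferred_cleanup(const struct sb_req *r)
{
	return (r->flags & NEED_CLEANUP) != 0;
}
```

Keying the release on the flag makes it idempotent, so the inline and deferred paths cannot both free the buffer.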
--
Jens Axboe