From: Jens Axboe <[email protected]>
To: Ming Lei <[email protected]>
Cc: [email protected], [email protected],
[email protected],
Alexander Viro <[email protected]>,
Stefan Hajnoczi <[email protected]>,
Miklos Szeredi <[email protected]>,
Bernd Schubert <[email protected]>,
Nitesh Shetty <[email protected]>,
Christoph Hellwig <[email protected]>,
Ziyang Zhang <[email protected]>
Subject: Re: [PATCH 3/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF
Date: Sat, 11 Feb 2023 09:52:58 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <Y+e+i5BXQHcqdDGo@T590>
On 2/11/23 9:12?AM, Ming Lei wrote:
> On Sat, Feb 11, 2023 at 08:45:18AM -0700, Jens Axboe wrote:
>> On 2/10/23 8:32?AM, Ming Lei wrote:
>>> IORING_OP_READ_SPLICE_BUF: read to buffer which is built from
>>> ->read_splice() of specified fd, so user needs to provide (splice_fd, offset, len)
>>> for building buffer.
>>>
>>> IORING_OP_WRITE_SPLICE_BUF: write from buffer which is built from
>>> ->read_splice() of specified fd, so user needs to provide (splice_fd, offset, len)
>>> for building buffer.
>>>
>>> The typical use case is for supporting ublk/fuse io_uring zero copy,
>>> and READ/WRITE OP retrieves ublk/fuse request buffer via direct pipe
>>> from device->read_splice(), then READ/WRITE can be done to/from this
>>> buffer directly.
>>
>> Main question here - would this be better not plumbed up through the rw
>> path? Might be cleaner, even if it either requires a bit of helper
>> refactoring or accepting a bit of duplication. But would still be better
>> than polluting the rw fast path imho.
>
> The buffer is actually IO buffer, which has to be plumbed up in IO path,
> and it can't be done like the registered buffer.
>
> The only affect on fast path is :
>
> if (io_rw_splice_buf(req)) //which just check opcode
> return io_prep_rw_splice_buf(req, sqe);
>
> and the cleanup code which is only done for the two new OPs.
>
> Or maybe I misunderstand your point? Or any detailed suggestion?
>
> Actually the code should be factored into generic helper, since net.c
> need to use them too. Probably it needs to move to rsrc.c?
Yep, just refactoring out those bits as a prep thing. rsrc could work,
or perhaps a new file for that.
>> Also seems like this should be separately testable. We can't add new
>> opcodes that don't have a feature test at least, and should also have
>> various corner case tests. A bit of commenting outside of this below.
>
> OK, I will write/add one very simple ublk userspace to liburing for
> test purpose.
Thanks!
>>> diff --git a/io_uring/opdef.c b/io_uring/opdef.c
>>> index 5238ecd7af6a..91e8d8f96134 100644
>>> --- a/io_uring/opdef.c
>>> +++ b/io_uring/opdef.c
>>> @@ -427,6 +427,31 @@ const struct io_issue_def io_issue_defs[] = {
>>> .prep = io_eopnotsupp_prep,
>>> #endif
>>> },
>>> + [IORING_OP_READ_SPLICE_BUF] = {
>>> + .needs_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollin = 1,
>>> + .plug = 1,
>>> + .audit_skip = 1,
>>> + .ioprio = 1,
>>> + .iopoll = 1,
>>> + .iopoll_queue = 1,
>>> + .prep = io_prep_rw,
>>> + .issue = io_read,
>>> + },
>>> + [IORING_OP_WRITE_SPLICE_BUF] = {
>>> + .needs_file = 1,
>>> + .hash_reg_file = 1,
>>> + .unbound_nonreg_file = 1,
>>> + .pollout = 1,
>>> + .plug = 1,
>>> + .audit_skip = 1,
>>> + .ioprio = 1,
>>> + .iopoll = 1,
>>> + .iopoll_queue = 1,
>>> + .prep = io_prep_rw,
>>> + .issue = io_write,
>>> + },
>>
>> Are these really safe with iopoll?
>
> Yeah, after the buffer is built, the handling is basically
> same with IORING_OP_WRITE_FIXED, so I think it is safe.
Yeah, on a second look, as these are just using the normal read/write
path after that should be fine indeed.
>>
>>> +static int io_prep_rw_splice_buf(struct io_kiocb *req,
>>> + const struct io_uring_sqe *sqe)
>>> +{
>>> + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
>>> + unsigned nr_pages = io_rw_splice_buf_nr_bvecs(rw->len);
>>> + loff_t splice_off = READ_ONCE(sqe->splice_off_in);
>>> + struct io_rw_splice_buf_data data;
>>> + struct io_mapped_ubuf *imu;
>>> + struct fd splice_fd;
>>> + int ret;
>>> +
>>> + splice_fd = fdget(READ_ONCE(sqe->splice_fd_in));
>>> + if (!splice_fd.file)
>>> + return -EBADF;
>>
>> Seems like this should check for SPLICE_F_FD_IN_FIXED, and also use
>> io_file_get_normal() for the non-fixed case in case someone passed in an
>> io_uring fd.
>
> SPLICE_F_FD_IN_FIXED needs one extra word for holding splice flags, if
> we can use sqe->addr3, I think it is doable.
I haven't checked the rest, but you can't just use ->splice_flags for
this?
In any case, the get path needs to look like io_tee() here, and:
>>> +out_put_fd:
>>> + if (splice_fd.file)
>>> + fdput(splice_fd);
this put needs to be gated on whether it's a fixed file or not.
>> If the operation is done, clear NEED_CLEANUP and do the cleanup here?
>> That'll be faster.
>
> The buffer has to be cleaned up after req is completed, since bvec
> table is needed for bio, and page reference need to be dropped after
> IO is done too.
I mean when you clear that flag, call the cleanup bits you otherwise
would've called on later cleanup.
--
Jens Axboe
next prev parent reply other threads:[~2023-02-11 16:53 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-02-10 15:32 [PATCH 0/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF Ming Lei
2023-02-10 15:32 ` [PATCH 1/4] fs/splice: enhance direct pipe & splice for moving pages in kernel Ming Lei
2023-02-11 15:42 ` Ming Lei
2023-02-11 18:57 ` Linus Torvalds
2023-02-12 1:39 ` Ming Lei
2023-02-13 20:04 ` Linus Torvalds
2023-02-14 0:52 ` Ming Lei
2023-02-14 2:35 ` Ming Lei
2023-02-14 11:03 ` Miklos Szeredi
2023-02-14 14:35 ` Ming Lei
2023-02-14 15:39 ` Miklos Szeredi
2023-02-15 0:11 ` Ming Lei
2023-02-15 10:36 ` Miklos Szeredi
2023-02-10 15:32 ` [PATCH 2/4] fs/splice: allow to ignore signal in __splice_from_pipe Ming Lei
2023-02-10 15:32 ` [PATCH 3/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF Ming Lei
2023-02-11 15:45 ` Jens Axboe
2023-02-11 16:12 ` Ming Lei
2023-02-11 16:52 ` Jens Axboe [this message]
2023-02-12 3:22 ` Ming Lei
2023-02-12 3:55 ` Jens Axboe
2023-02-13 1:06 ` Ming Lei
2023-02-11 17:13 ` Jens Axboe
2023-02-12 1:48 ` Ming Lei
2023-02-12 2:42 ` Jens Axboe
2023-02-10 15:32 ` [PATCH 4/4] ublk_drv: support splice based read/write zero copy Ming Lei
2023-02-10 21:54 ` [PATCH 0/4] io_uring: add IORING_OP_READ[WRITE]_SPLICE_BUF Jens Axboe
2023-02-10 22:19 ` Jens Axboe
2023-02-11 5:13 ` Ming Lei
2023-02-11 15:45 ` Jens Axboe
2023-02-14 16:36 ` Stefan Hajnoczi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox