From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>, Jens Axboe <[email protected]>,
[email protected]
Cc: [email protected], Miklos Szeredi <[email protected]>,
ZiyangZhang <[email protected]>,
Xiaoguang Wang <[email protected]>,
Bernd Schubert <[email protected]>
Subject: Re: [PATCH V2 00/17] io_uring/ublk: add IORING_OP_FUSED_CMD
Date: Tue, 7 Mar 2023 17:17:04 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 3/7/23 15:37, Pavel Begunkov wrote:
> On 3/7/23 14:15, Ming Lei wrote:
>> Hello,
>>
>> Add IORING_OP_FUSED_CMD, it is one special URING_CMD, which has to
>> be SQE128. The 1st SQE(master) is one 64byte URING_CMD, and the 2nd
>> 64byte SQE(slave) is another normal 64byte OP. For any OP which needs
>> to support slave OP, io_issue_defs[op].fused_slave needs to be set as 1,
>> and its ->issue() can retrieve/import buffer from master request's
>> fused_cmd_kbuf. The slave OP is actually submitted from kernel, part of
>> this idea is from Xiaoguang's ublk ebpf patchset, but this patchset
>> submits slave OP just like normal OP issued from userspace, that said,
>> SQE order is kept, and batching handling is done too.
>
> From a quick look through patches it all looks a bit complicated
> and intrusive, all over generic hot paths. I think instead we
> should be able to use registered buffer table as intermediary and
> reuse splicing. Let me try it out
Here we go, isolated in a new opcode, and in the end should work
with any file supporting splice. It's a quick prototype, it's lacking
and there are many obvious fatal bugs. It also needs some optimisations,
improvements on how executed by io_uring and extra stuff like
memcpy ops and fixed buf recv/send. I'll clean it up.
I used a test below, it essentially does zc recv.
https://github.com/isilence/liburing/commit/81fe705739af7d9b77266f9aa901c1ada870739d
From 87ad9e8e3aed683aa040fb4b9ae499f8726ba393 Mon Sep 17 00:00:00 2001
Message-Id: <87ad9e8e3aed683aa040fb4b9ae499f8726ba393.1678208911.git.asml.silence@gmail.com>
From: Pavel Begunkov <[email protected]>
Date: Tue, 7 Mar 2023 17:01:44 +0000
Subject: [POC 1/1] io_uring: splicing into reg buf table
EXTREMELY BUGGY! Not for inclusion.
Add a new operation called IORING_OP_SPLICE_FROM,
which splices from a file into the registered buffer table. This is
done in a zerocopy fashion with a caveat that the user won't have
direct access to the data, however it can use it with any io_uring
request supporting registered buffers.
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/uapi/linux/io_uring.h | 1 +
io_uring/io_uring.c | 4 +-
io_uring/opdef.c | 10 ++++
io_uring/splice.c | 98 +++++++++++++++++++++++++++++++++++
io_uring/splice.h | 3 ++
5 files changed, 114 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 709de6d4feb2..a91ce1d2ebd7 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -223,6 +223,7 @@ enum io_uring_op {
IORING_OP_URING_CMD,
IORING_OP_SEND_ZC,
IORING_OP_SENDMSG_ZC,
+ IORING_OP_SPLICE_FROM,
/* this goes last, obviously */
IORING_OP_LAST,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7625597b5227..b7389a6ea190 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2781,8 +2781,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
io_wait_rsrc_data(ctx->file_data);
mutex_lock(&ctx->uring_lock);
- if (ctx->buf_data)
- __io_sqe_buffers_unregister(ctx);
+ // if (ctx->buf_data)
+ // __io_sqe_buffers_unregister(ctx);
if (ctx->file_data)
__io_sqe_files_unregister(ctx);
io_cqring_overflow_kill(ctx);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index cca7c5b55208..28d4fa42676b 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -428,6 +428,13 @@ const struct io_issue_def io_issue_defs[] = {
.prep = io_eopnotsupp_prep,
#endif
},
+ [IORING_OP_SPLICE_FROM] = {
+ .needs_file = 1,
+ .unbound_nonreg_file = 1,
+ // .pollin = 1,
+ .prep = io_splice_from_prep,
+ .issue = io_splice_from,
+ }
};
@@ -648,6 +655,9 @@ const struct io_cold_def io_cold_defs[] = {
.fail = io_sendrecv_fail,
#endif
},
+ [IORING_OP_SPLICE_FROM] = {
+ .name = "SPLICE_FROM",
+ }
};
const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/splice.c b/io_uring/splice.c
index 2a4bbb719531..0467e9f46e99 100644
--- a/io_uring/splice.c
+++ b/io_uring/splice.c
@@ -8,11 +8,13 @@
#include <linux/namei.h>
#include <linux/io_uring.h>
#include <linux/splice.h>
+#include <linux/nospec.h>
#include <uapi/linux/io_uring.h>
#include "io_uring.h"
#include "splice.h"
+#include "rsrc.h"
struct io_splice {
struct file *file_out;
@@ -119,3 +121,99 @@ int io_splice(struct io_kiocb *req, unsigned int issue_flags)
io_req_set_res(req, ret, 0);
return IOU_OK;
}
+
+struct io_splice_from {
+ struct file *file;
+ loff_t off;
+ u64 len;
+};
+
+
+int io_splice_from_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ struct io_splice_from *sp = io_kiocb_to_cmd(req, struct io_splice_from);
+
+ if (unlikely(sqe->splice_flags || sqe->splice_fd_in || sqe->ioprio ||
+ sqe->addr || sqe->addr3))
+ return -EINVAL;
+
+ req->buf_index = READ_ONCE(sqe->buf_index);
+
+ sp->len = READ_ONCE(sqe->len);
+ if (unlikely(!sp->len))
+ return -EINVAL;
+
+ sp->off = READ_ONCE(sqe->off);
+ return 0;
+}
+
+int io_splice_from(struct io_kiocb *req, unsigned int issue_flags)
+{
+ struct io_splice_from *sp = io_kiocb_to_cmd(req, struct io_splice_from);
+ loff_t *ppos = (sp->off == -1) ? NULL : &sp->off;
+ struct io_mapped_ubuf *imu;
+ struct pipe_inode_info *pi;
+ struct io_ring_ctx *ctx;
+ unsigned int pipe_tail;
+ int ret, i, nr_pages;
+ u16 index;
+
+ if (!sp->file->f_op->splice_read)
+ return -ENOTSUPP;
+
+ pi = alloc_pipe_info();
+ if (!pi)
+ return -ENOMEM;
+ pi->readers = 1;
+
+ ret = sp->file->f_op->splice_read(sp->file, ppos, pi, sp->len, 0);
+ if (ret < 0)
+ goto done;
+
+ nr_pages = pipe_occupancy(pi->head, pi->tail);
+ imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+ if (!imu)
+ goto done;
+
+ ret = 0;
+ pipe_tail = pi->tail;
+ for (i = 0; !pipe_empty(pi->head, pipe_tail); i++) {
+ unsigned int mask = pi->ring_size - 1; // kill mask
+ struct pipe_buffer *buf = &pi->bufs[pipe_tail & mask];
+
+ bvec_set_page(&imu->bvec[i], buf->page, buf->len, buf->offset);
+ ret += buf->len;
+ pipe_tail++;
+ }
+ if (WARN_ON_ONCE(i != nr_pages))
+ return -EFAULT;
+
+ ctx = req->ctx;
+ io_ring_submit_lock(ctx, issue_flags);
+ if (unlikely(req->buf_index >= ctx->nr_user_bufs)) {
+ /* TODO: cleanup pages */
+ ret = -EFAULT;
+ kvfree(imu);
+ goto done_unlock;
+ }
+ index = array_index_nospec(req->buf_index, ctx->nr_user_bufs);
+ if (ctx->user_bufs[index] != ctx->dummy_ubuf) {
+ /* TODO: cleanup pages */
+ kvfree(imu);
+ ret = -EFAULT;
+ goto done_unlock;
+ }
+
+ imu->ubuf = 0;
+ imu->ubuf_end = ret;
+ imu->nr_bvecs = nr_pages;
+ ctx->user_bufs[index] = imu;
+done_unlock:
+ io_ring_submit_unlock(ctx, issue_flags);
+done:
+ free_pipe_info(pi);
+ if (ret != sp->len)
+ req_set_fail(req);
+ io_req_set_res(req, ret, 0);
+ return IOU_OK;
+}
diff --git a/io_uring/splice.h b/io_uring/splice.h
index 542f94168ad3..abdf5ad8e8d2 100644
--- a/io_uring/splice.h
+++ b/io_uring/splice.h
@@ -5,3 +5,6 @@ int io_tee(struct io_kiocb *req, unsigned int issue_flags);
int io_splice_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_splice(struct io_kiocb *req, unsigned int issue_flags);
+
+int io_splice_from_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_splice_from(struct io_kiocb *req, unsigned int issue_flags);
--
2.39.1
next prev parent reply other threads:[~2023-03-07 17:22 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-07 14:15 [PATCH V2 00/17] io_uring/ublk: add IORING_OP_FUSED_CMD Ming Lei
2023-03-07 14:15 ` [PATCH V2 01/17] io_uring: add IO_URING_F_FUSED and prepare for supporting OP_FUSED_CMD Ming Lei
2023-03-07 14:15 ` [PATCH V2 02/17] io_uring: increase io_kiocb->flags into 64bit Ming Lei
2023-03-07 14:15 ` [PATCH V2 03/17] io_uring: add IORING_OP_FUSED_CMD Ming Lei
2023-03-07 14:15 ` [PATCH V2 04/17] io_uring: support OP_READ/OP_WRITE for fused slave request Ming Lei
2023-03-07 14:15 ` [PATCH V2 05/17] io_uring: support OP_SEND_ZC/OP_RECV " Ming Lei
2023-03-09 7:46 ` kernel test robot
2023-03-09 17:22 ` kernel test robot
2023-03-07 14:15 ` [PATCH V2 06/17] block: ublk_drv: mark device as LIVE before adding disk Ming Lei
2023-03-08 3:48 ` Ziyang Zhang
2023-03-08 7:44 ` Ming Lei
2023-03-07 14:15 ` [PATCH V2 07/17] block: ublk_drv: add common exit handling Ming Lei
2023-03-14 17:15 ` kernel test robot
2023-03-07 14:15 ` [PATCH V2 08/17] block: ublk_drv: don't consider flush request in map/unmap io Ming Lei
2023-03-08 3:50 ` Ziyang Zhang
2023-03-07 14:15 ` [PATCH V2 09/17] block: ublk_drv: add two helpers to clean up map/unmap request Ming Lei
2023-03-09 3:12 ` Ziyang Zhang
2023-03-07 14:15 ` [PATCH V2 10/17] block: ublk_drv: clean up several helpers Ming Lei
2023-03-09 3:12 ` Ziyang Zhang
2023-03-07 14:15 ` [PATCH V2 11/17] block: ublk_drv: cleanup 'struct ublk_map_data' Ming Lei
2023-03-09 3:16 ` Ziyang Zhang
2023-03-07 14:15 ` [PATCH V2 12/17] block: ublk_drv: cleanup ublk_copy_user_pages Ming Lei
2023-03-07 23:57 ` kernel test robot
2023-03-15 7:05 ` Ziyang Zhang
2023-03-07 14:15 ` [PATCH V2 13/17] block: ublk_drv: grab request reference when the request is handled by userspace Ming Lei
2023-03-15 5:20 ` kernel test robot
2023-03-07 14:15 ` [PATCH V2 14/17] block: ublk_drv: support to copy any part of request pages Ming Lei
2023-03-07 14:15 ` [PATCH V2 15/17] block: ublk_drv: add read()/write() support for ublk char device Ming Lei
2023-03-07 14:15 ` [PATCH V2 16/17] block: ublk_drv: don't check buffer in case of zero copy Ming Lei
2023-03-07 14:15 ` [PATCH V2 17/17] block: ublk_drv: apply io_uring FUSED_CMD for supporting " Ming Lei
2023-03-07 15:37 ` [PATCH V2 00/17] io_uring/ublk: add IORING_OP_FUSED_CMD Pavel Begunkov
2023-03-07 17:17 ` Pavel Begunkov [this message]
2023-03-08 2:10 ` Ming Lei
2023-03-08 14:46 ` Pavel Begunkov
2023-03-08 16:17 ` Ming Lei
2023-03-08 16:54 ` Pavel Begunkov
2023-03-09 1:44 ` Ming Lei
2023-03-08 1:08 ` Ming Lei
2023-03-08 16:22 ` Pavel Begunkov
2023-03-09 2:05 ` Ming Lei
2023-03-15 7:08 ` Ziyang Zhang
2023-03-15 7:54 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox