From: Pavel Begunkov <[email protected]>
To: [email protected]
Cc: [email protected], [email protected]
Subject: [RFC 0/7] Rethinking splice
Date: Sun, 30 Apr 2023 10:35:22 +0100
Message-ID: <[email protected]>
IORING_OP_SPLICE has problems, many of them fundamental and rooted in
the uapi design; see the description of patch 7. This patchset
introduces a different approach, which came out of discussions about
splice and fused commands and absorbed ideas from both of them. We
remove the reliance on pipes and instead register a "spliced" buffer,
together with its data, as a normal io_uring registered buffer. The
user can then use it as any other registered buffer, e.g. pass it to
IORING_OP_WRITE_FIXED.

Once a buffer is released, it'll be returned back to the file it
originated from via a callback. This is carried out at the level of
the entire buffer rather than per-page as with splice, which, as
noted by Ming, will allow more optimisations.
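
Roughly, the intended userspace flow looks like the sketch below. The
uapi is not settled, there are no liburing helpers yet, so the SQE is
prepped by hand and the field usage is tentative:

	#include <liburing.h>

	/* Pull a buffer out of src_fd into registered buffer slot 0,
	 * then consume it with a linked WRITE_FIXED. No pipes. */
	static int forward_once(struct io_uring *ring, int src_fd,
				int dst_fd, unsigned len)
	{
		struct io_uring_sqe *sqe;

		sqe = io_uring_get_sqe(ring);
		/* no liburing helper for the new op yet */
		io_uring_prep_rw(IORING_OP_GET_BUF, sqe, src_fd,
				 NULL, len, 0);
		sqe->buf_index = 0;	/* slot to install the buffer into */
		sqe->flags |= IOSQE_IO_LINK;

		sqe = io_uring_get_sqe(ring);
		/* use the slot as a usual registered buffer; the addr
		 * semantics for such buffers are a placeholder here */
		io_uring_prep_write_fixed(sqe, dst_fd, NULL, len, 0, 0);

		return io_uring_submit(ring);
	}

Once slot 0 is dropped or replaced, the buffer travels back to src_fd
via the release callback mentioned below.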
The communication with the target file is done via a new fops
callback, though the exact mechanism for handing out a buffer may
still change. The approach also peels off layers of code compared to
splice requests, which helps it to be more flexible and to support
more cases. For instance, Ming has a case where it's beneficial for
the target file to provide a buffer to be filled with read/recv/etc.
requests and then returned back to the file.
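
The rough shape of the contract is sketched below; the real
definitions live in patches 4-7 and may well change:

	#include <linux/bvec.h>

	/* A buffer is described as a whole bvec array plus a release
	 * callback through which it's returned to the originating
	 * file, rather than being consumed page by page as pipes do. */
	struct iou_buf_desc {
		struct bio_vec	*bvecs;		/* pages carrying the data */
		unsigned int	nr_bvecs;
		void		*private;	/* owner's cookie */
		void		(*release)(struct iou_buf_desc *desc);
	};

	/*
	 * And a new hook in struct file_operations (naming tentative),
	 * asking the file to fill @desc with up to @len bytes of buffer:
	 *
	 *	int (*get_buf)(struct file *file,
	 *		       struct iou_buf_desc *desc,
	 *		       unsigned int len);
	 */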
Testing:
I benchmarked it with liburing/examples/splice-bench.t [1], which
also needs additional test kernel patches [2]. They implement get-buf
for /dev/null, and the test grabs one page from it, feeds the page
back without doing any actual IO, and repeats.
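
In terms of the sketch above, the /dev/null side could look roughly
like this (illustrative only, the real code is in the branch [2]):

	#include <linux/gfp.h>
	#include <linux/bvec.h>

	/* hand out a fresh page, free it when the buffer comes back */
	static void null_buf_release(struct iou_buf_desc *desc)
	{
		__free_page(desc->bvecs[0].bv_page);
	}

	static int null_get_buf(struct file *file,
				struct iou_buf_desc *desc,
				unsigned int len)
	{
		struct page *page = alloc_page(GFP_KERNEL);

		if (!page)
			return -ENOMEM;
		/* assumes the descriptor has space for one bvec */
		bvec_set_page(&desc->bvecs[0], page, PAGE_SIZE, 0);
		desc->nr_bvecs = 1;
		desc->release = null_buf_release;
		return 0;
	}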
Fairness:
IORING_OP_SPLICE performs very poorly, not even reaching 450K reqs/s,
so one of the patches enables inline execution of it to make the
comparison more interesting; that is acceptable for testing only.
Buffer removal is done by OP_GET_BUF itself without issuing a
separate request for it. The "GET_BUF + nop" variant emulates the
overhead of separate removal with additional nop requests.
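That is, in this variant each GET_BUF is followed by an extra request
of the form:

	/* emulating a separate buffer-removal request */
	sqe = io_uring_get_sqe(ring);
	io_uring_prep_nop(sqe);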
Another aspect is that the OP_GET_BUF tests issue OP_WRITE_FIXED,
which, as profiles show, is quite expensive. That is not exactly a
problem of GET_BUF, but it skews the results, e.g. io_get_buf() takes
10.7% of CPU time vs 24.3% for io_write().
The last bit is that buffer removal, if done by a separate request,
might and likely will be batched with other requests, so
"GET_BUF + nop" is rather the worst case.
The numbers below are "requests / s".
QD | splice2() | OP_SPLICE | OP_GET_BUF | GET_BUF, link | GET_BUF + nop
1  | 5009035   | 3697020   | 3886356    | 4616123       | 2886171
2  | 4859523   | 5205564   | 5309510    | 5591521       | 4139125
4  | 4908353   | 6265771   | 6415036    | 6331249       | 5198505
8  | 4955003   | 7141326   | 7243434    | 6850088       | 5984588
16 | 4959496   | 7640409   | 7794564    | 7208221       | 6587212
32 | 4937463   | 7868501   | 8103406    | 7385890       | 6844390
The test is obviously not exhaustive, and more complicated cases
should be tried as well. E.g. we need to quantify performance with
sockets, where the apoll feature will be involved and internal
partial IO retry support will be needed.
[1] https://github.com/isilence/liburing.git io_uring/get-buf-op
[2] https://github.com/isilence/linux.git io_uring/get-buf-op
Links for convenience:
https://github.com/isilence/liburing/tree/io_uring/get-buf-op
https://github.com/isilence/linux/tree/io_uring/get-buf-op
Pavel Begunkov (7):
io_uring: add io_mapped_ubuf caches
io_uring: add reg-buffer data directions
io_uring: fail loop_rw_iter with pure bvec bufs
io_uring/rsrc: introduce struct iou_buf_desc
io_uring/rsrc: add buffer release callbacks
io_uring/rsrc: introduce helper installing one buffer
io_uring,fs: introduce IORING_OP_GET_BUF
include/linux/fs.h | 2 +
include/linux/io_uring.h | 19 +++++++
include/linux/io_uring_types.h | 2 +
include/uapi/linux/io_uring.h | 1 +
io_uring/io_uring.c | 9 ++++
io_uring/opdef.c | 11 +++++
io_uring/rsrc.c | 80 ++++++++++++++++++++++++++----
io_uring/rsrc.h | 24 +++++++--
io_uring/rw.c | 7 +++
io_uring/splice.c | 90 ++++++++++++++++++++++++++++++++++
io_uring/splice.h | 4 ++
11 files changed, 235 insertions(+), 14 deletions(-)
--
2.40.0