From: Xiaoguang Wang <[email protected]>
To: [email protected], [email protected],
	[email protected]
Cc: [email protected], [email protected], [email protected],
	[email protected]
Subject: [RFC v2 0/4] Add io_uring & ebpf based methods to implement zero-copy for ublk
Date: Wed, 22 Feb 2023 21:25:30 +0800
Message-ID: <[email protected]>

Normally, userspace block device implementations need to copy data between
the kernel block layer's io requests and the userspace daemon's buffers.
For example, ublk and tcmu both have similar logic, and this copy consumes
noticeable cpu resources, especially for large ios.

There are methods that try to reduce this cpu overhead, so that the
userspace block device's io performance can be improved further. These
methods include: 1) using special hardware to do the memory copy, but it
seems that not all architectures have such hardware; 2) software methods,
such as mmap()ing the kernel block layer's io request data into the
userspace daemon [1], but this has page table map/unmap and tlb flush
overheads, security issues, etc., and is probably only friendly to large
ios.

To solve this problem, I'd like to propose a new method, which combines
the respective advantages of io_uring and ebpf. It adds a new program
type, BPF_PROG_TYPE_UBLK, for ublk; the userspace block device daemon
registers ebpf progs, which use the bpf helper offered by this ublk bpf
prog type to submit io requests in the kernel on behalf of the daemon
process. Note that these io requests use the kernel block layer io
requests' pages to do io, so the memory copy overhead is gone.
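
For illustration, here is a minimal sketch of what the daemon-side
registration flow could look like with libbpf. The function name
load_ublk_prog, the UBLK_CMD_REG_BPF command name, and the way the prog
fd is handed over are assumptions made for this sketch; see patch 4 for
the actual uapi:

    /* Load a BPF_PROG_TYPE_UBLK prog object with libbpf and obtain
     * its fd, which the daemon then passes to the ublk control
     * device (shown only as a commented-out pseudo call below). */
    #include <bpf/libbpf.h>

    static int load_ublk_prog(const char *obj_path)
    {
            struct bpf_object *obj;
            struct bpf_program *prog;

            obj = bpf_object__open_file(obj_path, NULL);
            if (!obj)
                    return -1;
            if (bpf_object__load(obj))
                    return -1;

            prog = bpf_object__next_program(obj, NULL);
            if (!prog)
                    return -1;

            /* hand the fd to ublk, e.g. (hypothetical command name):
             *   ublk_ctrl_cmd(ctrl_dev, UBLK_CMD_REG_BPF,
             *                 bpf_program__fd(prog));
             */
            return bpf_program__fd(prog);
    }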

Currently only one helper has been added:
    u64 bpf_ublk_queue_sqe(struct ublk_io_bpf_ctx *bpf_ctx,
                struct io_uring_sqe *sqe, u32 sqe_len, u32 fd)

This helper uses io_uring to submit io requests, so we need to make
io_uring able to submit an sqe located in the kernel (some of the code's
ideas come from Pavel's patchset [2], but Pavel's patch requires that
sqe->buf come from a userspace addr). The bpf prog initializes sqes, but
does not need to initialize the sqes' buf field; sqe->buf will come from
kernel block layer io requests in some form. See patch 2 for more.
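
To make this concrete, below is a minimal sketch of a ublk ebpf prog for
a loop-style target. The ublk_io_bpf_ctx layout, the "ublk" section name,
the helper id, and the backing file fd are all assumptions made for
illustration; the real definitions come with patches 1 and 4:

    #include <linux/types.h>
    #include <linux/io_uring.h>
    #include <bpf/bpf_helpers.h>

    /* assumed ctx layout; the real one is defined in patch 1 */
    struct ublk_io_bpf_ctx {
            __u32 op;            /* 0 == read, 1 == write (assumed) */
            __u32 nr_sectors;
            __u64 start_sector;
    };

    /* helper id below is a placeholder; patch 1 assigns the real one */
    static __u64 (*bpf_ublk_queue_sqe)(struct ublk_io_bpf_ctx *bpf_ctx,
                    struct io_uring_sqe *sqe, __u32 sqe_len,
                    __u32 fd) = (void *)212;

    #define BACKING_FILE_FD 0    /* assumed fd of the backing file */

    SEC("ublk")
    int ublk_loop_io_submit(struct ublk_io_bpf_ctx *ctx)
    {
            struct io_uring_sqe sqe = {};

            sqe.opcode = ctx->op ? IORING_OP_WRITE : IORING_OP_READ;
            sqe.fd = BACKING_FILE_FD;
            sqe.off = ctx->start_sector << 9;
            sqe.len = ctx->nr_sectors << 9;
            /* sqe.addr is deliberately left 0: the kernel attaches the
             * block layer request's pages, so no copy is needed */

            bpf_ublk_queue_sqe(ctx, &sqe, sizeof(sqe), BACKING_FILE_FD);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";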

By using ebpf, we can implement various userspace io logic in the kernel,
and the ultimate goal is to let users build an in-kernel io agent for the
userspace daemon: the userspace block device's daemon just registers an
ebpf prog at startup, through which all of its io is driven, though I
think there's still a long way to go. There will be at least these
advantages:
  1. Remove the memory copy between the kernel block layer and the
userspace daemon completely.
  2. Save memory. The userspace daemon doesn't need to maintain its own
buffers to issue and complete io requests; the kernel block layer io
requests' memory is used directly.
  3. We may reduce the number of round trips between the kernel and the
userspace daemon, and hence reduce kernel/userspace context switch
overheads.

HOW to test:
  git clone https://github.com/ming1/ubdsrv
  cd ubdsrv
  git am -3 0001-Add-ebpf-support.patch
  # replace "/root/ublk/" with your own linux build directory
  cd bpf; make; cd ..;
  ./build_with_liburing_src
  ./ublk add -t loop -q 1 -d 128 -f loop.file

fio job file:
  [global]
  direct=1
  filename=/dev/ublkb0
  time_based
  runtime=60
  numjobs=1
  cpus_allowed=1

  [rand-read-4k]
  bs=2048K
  iodepth=16
  ioengine=libaio
  rw=randwrite
  stonewall
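
With the ublk device created above, save the job file (the file name
ublk.fio below is just an example) and run it:

    fio ublk.fio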

Without this patch:
  READ: bw=373MiB/s (392MB/s), 373MiB/s-373MiB/s (392MB/s-392MB/s), io=21.9GiB (23.5GB), run=60042-60042msec
  WRITE: bw=371MiB/s (389MB/s), 371MiB/s-371MiB/s (389MB/s-389MB/s), io=21.8GiB (23.4GB), run=60042-60042msec
  ublk daemon's cpu utilization is about 12.5%, as shown by the top tool.

With this patch:
  READ: bw=373MiB/s (392MB/s), 373MiB/s-373MiB/s (392MB/s-392MB/s), io=21.9GiB (23.5GB), run=60043-60043msec
  WRITE: bw=371MiB/s (389MB/s), 371MiB/s-371MiB/s (389MB/s-389MB/s), io=21.8GiB (23.4GB), run=60043-60043msec
  ublk daemon's cpu utilization is about 1%, as shown by the top tool.

The above tests show that this method reduces the cpu copy overhead
significantly.

TODO:
I must say this patchset is still just an RFC of the design.

1. Currently for this patchset, I just make the ublk ebpf prog submit io
requests using io_uring in the kernel; cqe events still need to be handled
in the userspace daemon. Once we later succeed in making io_uring handle
cqes in the kernel, the ublk ebpf prog will be able to implement io
entirely in the kernel.

2. I have not done many tests yet; I will run liburing/ublk/blktests later.

3. Try to build more complicated ebpf progs.

Any review and suggestions are welcome, thanks.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/

Xiaoguang Wang (4):
  bpf: add UBLK program type
  io_uring: enable io_uring to submit sqes located in kernel
  io_uring: introduce IORING_URING_CMD_UNLOCK flag
  ublk_drv: add ebpf support

 drivers/block/ublk_drv.c       | 284 +++++++++++++++++++++++++++++++--
 include/linux/bpf_types.h      |   2 +
 include/linux/io_uring.h       |  12 ++
 include/linux/io_uring_types.h |   8 +-
 include/uapi/linux/bpf.h       |   2 +
 include/uapi/linux/io_uring.h  |   5 +
 include/uapi/linux/ublk_cmd.h  |  18 +++
 io_uring/io_uring.c            |  59 ++++++-
 io_uring/rsrc.c                |  18 +++
 io_uring/rsrc.h                |   4 +
 io_uring/rw.c                  |   7 +
 io_uring/uring_cmd.c           |   6 +-
 kernel/bpf/syscall.c           |   1 +
 kernel/bpf/verifier.c          |  10 +-
 scripts/bpf_doc.py             |   4 +
 tools/include/uapi/linux/bpf.h |  10 ++
 tools/lib/bpf/libbpf.c         |   1 +
 17 files changed, 434 insertions(+), 17 deletions(-)

-- 
2.31.1

