public inbox for io-uring@vger.kernel.org
From: Stefan Hajnoczi <stefanha@redhat.com>
To: ming.lei@redhat.com
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	io-uring@vger.kernel.org,
	Gabriel Krisman Bertazi <krisman@collabora.com>,
	ZiyangZhang <ZiyangZhang@linux.alibaba.com>,
	Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	kwolf@redhat.com, sgarzare@redhat.com
Subject: Re: [RFC PATCH] ubd: add io_uring based userspace block driver
Date: Mon, 16 May 2022 20:29:25 +0100
Message-ID: <YoKmFYjIe1AWk/P8@stefanha-x1.localdomain>
In-Reply-To: <20220509092312.254354-1-ming.lei@redhat.com>

Hi,
This looks interesting! I have some questions:

1. What is the ubdsrv permission model?

A big usability challenge for *-in-userspace interfaces is the balance
between security and allowing unprivileged processes to use these
features.

- Does /dev/ubd-control need to be privileged? I guess the answer is
  yes since an evil ubdsrv can hang I/O and corrupt data in hopes of
  triggering file system bugs.
- Can multiple processes that don't trust each other use UBD at the same
  time? I guess not since ubd_index_idr is global.
- What about containers and namespaces? They currently have (write)
  access to the same global ubd_index_idr.
- Maybe there should be a struct ubd_device "owner" (struct
  task_struct *) so only devices created by the current process can be
  modified?
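
To make the "owner" idea concrete, here is a minimal userspace model of
the check. It is only a sketch: every name in it (demo_ubd_device,
demo_add_dev, demo_del_dev) is hypothetical and not taken from the
patch, and an integer owner_id stands in for the struct task_struct *
a real driver would record.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_DEVS 4 /* arbitrary size for this model */

/* Hypothetical stand-in for struct ubd_device with an owner field. */
struct demo_ubd_device {
	bool in_use;
	int owner_id; /* models the creating task; an assumption of this sketch */
};

/* Models the single global index space shared by all callers. */
static struct demo_ubd_device demo_devs[DEMO_MAX_DEVS];

/* Create a device for caller_id. Returns its index, or -ENOSPC if full. */
static int demo_add_dev(int caller_id)
{
	assert(caller_id >= 0);
	for (int i = 0; i < DEMO_MAX_DEVS; i++) {
		if (!demo_devs[i].in_use) {
			demo_devs[i].in_use = true;
			demo_devs[i].owner_id = caller_id;
			return i;
		}
	}
	return -ENOSPC;
}

/* Delete a device. Only the creator may do so; others get -EPERM. */
static int demo_del_dev(int index, int caller_id)
{
	if (index < 0 || index >= DEMO_MAX_DEVS || !demo_devs[index].in_use)
		return -ENOENT;
	if (demo_devs[index].owner_id != caller_id)
		return -EPERM;
	demo_devs[index].in_use = false;
	return 0;
}
```

With this model, a second caller can still see that index 0 is taken,
but it cannot delete or modify a device it did not create. It does not
address namespaces, which would need a separate design.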

2. io_uring_cmd design

The cover letter does not explain the rationale for the io_uring_cmd
design, and I think it is worth spelling out. Here are my guesses:

The same thing can be achieved with just file_operations and io_uring.
ubdsrv could read I/O submissions with IORING_OP_READ and write I/O
completions with IORING_OP_WRITE. That would require 2 sqes per
roundtrip instead of 1, but the same number of io_uring_enter(2) calls
since multiple sqes/cqes can be batched per syscall:

- IORING_OP_READ, addr=(struct ubdsrv_io_desc*) (for submission)
- IORING_OP_WRITE, addr=(struct ubdsrv_io_cmd*) (for completion)

Both operations require a copy_to/from_user() to access the command
metadata.

The io_uring_cmd approach works differently. The IORING_OP_URING_CMD sqe
carries a 40-byte payload so it's possible to embed struct ubdsrv_io_cmd
inside it. In the other direction, io_uring cqes contain no payload, so
the driver needs a side-channel to transfer the request submission
details to ubdsrv. The struct ubdsrv_io_desc mmap is that side-channel.
I don't see much of a difference between IORING_OP_READ and the mmap
approach though.
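
To illustrate what "embed the command in the sqe payload" means, here is
a small sketch. The 40-byte figure is the one quoted above, not read
from a kernel header, and struct demo_io_cmd is a made-up layout rather
than the real struct ubdsrv_io_cmd from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SQE_PAYLOAD_BYTES 40 /* figure quoted above; an assumption here */

/* Hypothetical completion command, NOT the layout used by the patch. */
struct demo_io_cmd {
	uint16_t q_id;   /* queue the request belongs to */
	uint16_t tag;    /* request tag within that queue */
	int32_t  result; /* completion status reported by the server */
	uint64_t addr;   /* buffer address for the next request */
};

/* The whole point: the command must fit inline, so no extra copy is needed. */
_Static_assert(sizeof(struct demo_io_cmd) <= DEMO_SQE_PAYLOAD_BYTES,
	       "command must fit in the sqe payload");

/* Models the inline payload area of an sqe. */
struct demo_sqe_payload {
	uint8_t bytes[DEMO_SQE_PAYLOAD_BYTES];
};

/* Server side: place the command directly into the payload area. */
static void demo_pack_cmd(struct demo_sqe_payload *p,
			  const struct demo_io_cmd *cmd)
{
	assert(p && cmd);
	memset(p->bytes, 0, sizeof(p->bytes));
	memcpy(p->bytes, cmd, sizeof(*cmd));
}

/* Driver side: read the command back out of the payload area. */
static struct demo_io_cmd demo_unpack_cmd(const struct demo_sqe_payload *p)
{
	struct demo_io_cmd cmd;

	assert(p);
	memcpy(&cmd, p->bytes, sizeof(cmd));
	return cmd;
}
```

In the IORING_OP_WRITE variant the same struct would sit in a separate
user buffer and be fetched with copy_from_user().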

It's not obvious to me how much more efficient the io_uring_cmd approach
is, but taking fewer trips around the io_uring submission/completion
code path is likely to be faster. Something similar can be done with
file_operations ->ioctl(), but I guess the point of using io_uring is
that it composes. If ubdsrv itself wants to use io_uring for other I/O
activity (e.g. networking or disk I/O) then it can do so and won't be
stuck in a blocking ioctl() syscall.

It would be nice if you could write 2 or 3 paragraphs explaining why the
io_uring_cmd design and the struct ubdsrv_io_desc mmap were chosen.

3. Miscellaneous stuff

- There isn't much in the way of memory ordering in the code. I worry a
  little that changes to the struct ubdsrv_io_desc mmap may not be
  visible at the expected time with respect to the io_uring cq ring.
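
  The kind of ordering I have in mind can be sketched with C11 atomics
  in userspace. This is a model only: demo_shared, demo_publish and
  demo_consume are hypothetical names, the descriptor layout is made
  up, and a plain counter stands in for the cq ring tail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor; not the real struct ubdsrv_io_desc. */
struct demo_io_desc {
	uint32_t op;
	uint32_t nr_sectors;
	uint64_t start_sector;
};

struct demo_shared {
	struct demo_io_desc desc; /* models a slot in the mmap'ed area */
	atomic_uint cq_tail;      /* models the completion ring tail */
};

/* Producer: fill in the descriptor first, then publish with release. */
static void demo_publish(struct demo_shared *s, struct demo_io_desc d)
{
	unsigned int tail;

	assert(s);
	s->desc = d; /* plain store, ordered before the release below */
	tail = atomic_load_explicit(&s->cq_tail, memory_order_relaxed);
	atomic_store_explicit(&s->cq_tail, tail + 1, memory_order_release);
}

/*
 * Consumer: the acquire load pairs with the release store above, so once
 * a new tail value is observed the descriptor contents are visible too.
 * Returns false when nothing new has been published since *seen.
 */
static bool demo_consume(struct demo_shared *s, unsigned int *seen,
			 struct demo_io_desc *out)
{
	unsigned int tail;

	assert(s && seen && out);
	tail = atomic_load_explicit(&s->cq_tail, memory_order_acquire);
	if (tail == *seen)
		return false;
	*out = s->desc;
	*seen = tail;
	return true;
}
```

  If the descriptor write were not ordered before the tail update, the
  consumer could see the new tail and still read a stale descriptor.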

Thanks,
Stefan
