From: Ziyang Zhang <[email protected]>
To: Gabriel Krisman Bertazi <[email protected]>,
Ming Lei <[email protected]>
Cc: Jens Axboe <[email protected]>,
[email protected],
Harris James R <[email protected]>,
[email protected], [email protected],
Xiaoguang Wang <[email protected]>,
Stefan Hajnoczi <[email protected]>
Subject: Re: [PATCH V3 1/1] ublk: add io_uring based userspace block driver
Date: Tue, 5 Jul 2022 12:16:07 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 2022/7/5 06:10, Gabriel Krisman Bertazi wrote:
> Ming Lei <[email protected]> writes:
>
>> This is the driver part of userspace block driver(ublk driver), the other
>> part is userspace daemon part(ublksrv)[1].
>>
>> The two parts communicate by io_uring's IORING_OP_URING_CMD with one
>> shared cmd buffer for storing io command, and the buffer is read only for
>> ublksrv, each io command is indexed by io request tag directly, and
>> is written by ublk driver.
>>
>> For example, when one READ io request is submitted to ublk block driver, ublk
>> driver stores the io command into cmd buffer first, then completes one
>> IORING_OP_URING_CMD for notifying ublksrv, and the URING_CMD is issued to
>> ublk driver beforehand by ublksrv for getting notification of any new io request,
>> and each URING_CMD is associated with one io request by tag.
>>
>> After ublksrv gets the io command, it translates and handles the ublk io
>> request, such as, for the ublk-loop target, ublksrv translates the request
>> into same request on another file or disk, like the kernel loop block
>> driver. In ublksrv's implementation, the io is still handled by io_uring,
>> and share same ring with IORING_OP_URING_CMD command. When the target io
>> request is done, the same IORING_OP_URING_CMD is issued to ublk driver for
>> both committing io request result and getting future notification of new
>> io request.
>>
>> Another thing done by ublk driver is to copy data between kernel io
>> request and ublksrv's io buffer:
>>
>> 1) before ubsrv handles WRITE request, copy the request's data into
>> ublksrv's userspace io buffer, so that ublksrv can handle the write
>> request
>>
>> 2) after ubsrv handles READ request, copy ublksrv's userspace io buffer
>> into this READ request, then ublk driver can complete the READ request
>>
>> Zero copy may be switched if mm is ready to support it.
>>
>> ublk driver doesn't handle any logic of the specific user space driver,
>> so it should be small/simple enough.
>>
>> [1] ublksrv
>>
>> https://github.com/ming1/ubdsrv
>>
>> Signed-off-by: Ming Lei <[email protected]>
>
> Hi Ming,
>
> A few comments inline:
>
>
>> +#define UBLK_MINORS (1U << MINORBITS)
>> +
>> +struct ublk_rq_data {
>> + struct callback_head work;
>> +};
>> +
>> +/* io cmd is active: sqe cmd is received, and its cqe isn't done */
>> +#define UBLK_IO_FLAG_ACTIVE 0x01
>> +
>> +/*
>> + * FETCH io cmd is completed via cqe, and the io cmd is being handled by
>> + * ublksrv, and not committed yet
>> + */
>> +#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
>> +
>
> Minor nit: I wonder if the IO life cycle isn't better represented as a
> state machine than flags:
>
> enum {
> UBLK_IO_FREE,
> UBLK_IO_QUEUED
> UBLK_IO_OWNED_BY_SRV
> UBLK_IO_COMPLETED,
> UBLK_IO_ABORTED,
> }
>
> Since currently, IO_FLAG_ACTIVE and IO_OWNED_BY_SRV should (almost) be
> mutually exclusive.
>
>
>> +
>> +static int ublk_ctrl_stop_dev(struct ublk_device *ub)
>> +{
>> + ublk_stop_dev(ub);
>> + cancel_work_sync(&ub->stop_work);
>> + return 0;
>> +}
>> +
>> +static inline bool ublk_queue_ready(struct ublk_queue *ubq)
>> +{
>> + return ubq->nr_io_ready == ubq->q_depth;
>> +}
>> +
>> +/* device can only be started after all IOs are ready */
>> +static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
>> +{
>> + mutex_lock(&ub->mutex);
>> + ubq->nr_io_ready++;
>
> I think this is still problematic for the case where a FETCH_IO is sent
> from a different thread than the one originally set in ubq_daemon
> (i.e. a userspace bug). Since ubq_daemon is used to decide what task
> context will do the data copy, If an IO_FETCH_RQ is sent to the same queue
> from two threads, the data copy can happen in the context of the wrong
> task. I'd suggest something like the check below at the beginning of
> mark_io_ready and a similar on for IO_COMMIT_AND_FETCH_RQ
>
> mutex_lock(&ub->mutex);
> if (ub->ubq_daemon && ub->ubq_daemon != current) {
> mutex_unlock(&ub->mutex);
> return -EINVAL;
> }
> ubq->nr_io_ready++;
> ...
>> + if (ublk_queue_ready(ubq)) {
>> + ubq->ubq_daemon = current;
>> + get_task_struct(ubq->ubq_daemon);
>> + ub->nr_queues_ready++;
>> + }
>> + if (ub->nr_queues_ready == ub->dev_info.nr_hw_queues)
>> + complete_all(&ub->completion);
>> + mutex_unlock(&ub->mutex);
>> +}
>> +
>> +static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> +{
>> + struct ublksrv_io_cmd *ub_cmd = (struct ublksrv_io_cmd *)cmd->cmd;
>> + struct ublk_device *ub = cmd->file->private_data;
>> + struct ublk_queue *ubq;
>> + struct ublk_io *io;
>> + u32 cmd_op = cmd->cmd_op;
>> + unsigned tag = ub_cmd->tag;
>> + int ret = -EINVAL;
>> +
>> + pr_devel("%s: receieved: cmd op %d queue %d tag %d result %d\n",
>
> ^^^
> received
>
>
>> + __func__, cmd->cmd_op, ub_cmd->q_id, tag,
>> + ub_cmd->result);
>> +
>> + if (!(issue_flags & IO_URING_F_SQE128))
>> + goto out;
>> +
>> + ubq = ublk_get_queue(ub, ub_cmd->q_id);
>> + if (!ubq || ub_cmd->q_id != ubq->q_id)
>
> q_id is coming from userspace and is used to access an array inside
> ublk_get_queue(). I think you need to ensure qid < ub->dev_info.nr_hw_queues
> before calling ublk_get_queue() to protect from a kernel bad memory
> access triggered by userspace.
>
>> + goto out;
>> +
>> + if (WARN_ON_ONCE(tag >= ubq->q_depth))
>
> Userspace shouldn't be able to easily trigger a WARN_ON.
>
>> + goto out;
>> +
>> + io = &ubq->ios[tag];
>> +
>> + /* there is pending io cmd, something must be wrong */
>> + if (io->flags & UBLK_IO_FLAG_ACTIVE) {b
>> + ret = -EBUSY;
>> + goto out;
>> + }
>> +
>> + switch (cmd_op) {
>> + case UBLK_IO_FETCH_REQ:
>> + /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
>> + if (WARN_ON_ONCE(ublk_queue_ready(ubq))) {
>
> Likewise, this shouldn't trigger a WARN_ON, IMO.
>
>> + ret = -EBUSY;
>> + goto out;
>> + }
>> + /*
>> + * The io is being handled by server, so COMMIT_RQ is expected
>> + * instead of FETCH_REQ
>> + */
>> + if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
>> + goto out;
>> + /* FETCH_RQ has to provide IO buffer */
>> + if (!ub_cmd->addr)
>> + goto out;
>> + io->cmd = cmd;
>> + io->flags |= UBLK_IO_FLAG_ACTIVE;
>> + io->addr = ub_cmd->addr;
>> +
>> + ublk_mark_io_ready(ub, ubq);
>> + break;
>> + case UBLK_IO_COMMIT_AND_FETCH_REQ:
>> + /* FETCH_RQ has to provide IO buffer */
>> + if (!ub_cmd->addr)
>> + goto out;
>> + io->addr = ub_cmd->addr;
>> + io->flags |= UBLK_IO_FLAG_ACTIVE;
>> + fallthrough;
>> + case UBLK_IO_COMMIT_REQ:
>> + io->cmd = cmd;
>> + if (!(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV))
>> + goto out;
>> + ublk_commit_completion(ub, ub_cmd);
>> +
>> + /* COMMIT_REQ is supposed to not fetch req */
>
> I wonder if we could make it without IO_COMMIT_REQ. Is it useful to be
> able to commit without fetching a new request?
UBLK_IO_COMMIT_REQ is not necessary, IMO.
In current version of ubd_drv.c I find UBLK_IO_COMMIT_REQ is sent by ublksrv
after it gets one UBD_IO_RES_ABORT beacuse ubd_drv wants to abort IOs and let
the ublk daemon exit.
We can use UBLK_IO_COMMIT_AND_FETCH_REQ to replace UBLK_IO_COMMIT_REQ.
The data flow could be:
1) UBLK_IO_COMMIT_AND_FETCH_REQ from ublksrv
2) ubd_drv receives IO's sqe with UBLK_IO_COMMIT_AND_FETCH_REQ
and sets the IO's status to UBLK_IO_QUEUED
3) ubd_drv wants to abort IOs so it just completes
this IO's cqe(UBD_IO_RES_ABORT)
I successfully removed UBLK_IO_COMMIT_REQ when developing libubd
although I choose the earliest version of ubd_drv.c(v5.17-ubd-dev)
which may be a buggy version.
>
>> + if (cmd_op == UBLK_IO_COMMIT_REQ) {
>> + ret = UBLK_IO_RES_OK;
>> + goto out;
>> + }
>> + break;
>> + default:
>> + goto out;
>> + }
>> + return -EIOCBQUEUED;
>> +
>> + out:
>> + io->flags &= ~UBLK_IO_FLAG_ACTIVE;
>> + io_uring_cmd_done(cmd, ret, 0);
>> + pr_devel("%s: complete: cmd op %d, tag %d ret %x io_flags %x\n",
>> + __func__, cmd_op, tag, ret, io->flags);
>> + return -EIOCBQUEUED;
>> +}
>> +
>
> Thanks!
>
next prev parent reply other threads:[~2022-07-05 4:16 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-28 16:08 [PATCH V3 0/1] ublk: add io_uring based userspace block driver Ming Lei
2022-06-28 16:08 ` [PATCH V3 1/1] " Ming Lei
2022-06-30 11:35 ` Ziyang Zhang
2022-06-30 12:33 ` Ming Lei
2022-07-01 2:47 ` Ziyang Zhang
2022-07-01 4:06 ` Ming Lei
2022-07-04 11:17 ` Sagi Grimberg
2022-07-04 12:34 ` Ming Lei
2022-07-04 14:00 ` Sagi Grimberg
2022-07-04 16:13 ` Gabriel Krisman Bertazi
2022-07-04 16:19 ` Sagi Grimberg
2022-07-05 0:43 ` Ming Lei
2022-07-04 22:10 ` Gabriel Krisman Bertazi
2022-07-05 4:16 ` Ziyang Zhang [this message]
2022-07-05 8:12 ` Ming Lei
2022-07-05 8:06 ` Ming Lei
2022-07-07 7:49 ` Ming Lei
2022-07-07 7:58 ` Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ebd6754e-57bf-88a7-df04-3f38864b0c52@linux.alibaba.com \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox