public inbox for [email protected]
From: Ming Lei <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected], [email protected],
	[email protected],
	Gabriel Krisman Bertazi <[email protected]>,
	ZiyangZhang <[email protected]>,
	Xiaoguang Wang <[email protected]>
Subject: Re: [RFC PATCH] ubd: add io_uring based userspace block driver
Date: Tue, 10 May 2022 10:58:01 +0800
Message-ID: <YnnUuZve2b2LmInc@T590>
In-Reply-To: <[email protected]>

On Mon, May 09, 2022 at 10:09:10AM -0600, Jens Axboe wrote:
> On 5/9/22 3:23 AM, Ming Lei wrote:
> > This is the driver part of userspace block driver(ubd driver), the other
> > part is userspace daemon part(ubdsrv)[1].
> > 
> > The two parts communicate by io_uring's IORING_OP_URING_CMD with one
> > shared cmd buffer for storing io command, and the buffer is read only for
> > ubdsrv, each io command is indexed by io request tag directly, and
> > is written by ubd driver.
> > 
> > For example, when one READ io request is submitted to ubd block driver, ubd
> > driver stores the io command into cmd buffer first, then completes one
> > IORING_OP_URING_CMD for notifying ubdsrv, and the URING_CMD is issued to
> > ubd driver beforehand by ubdsrv for getting notification of any new io request,
> > and each URING_CMD is associated with one io request by tag.
> > 
> > After ubdsrv gets the io command, it translates and handles the ubd io
> > request, such as, for the ubd-loop target, ubdsrv translates the request
> > into same request on another file or disk, like the kernel loop block
> > driver. In ubdsrv's implementation, the io is still handled by io_uring,
> > and share same ring with IORING_OP_URING_CMD command. When the target io
> > request is done, the same IORING_OP_URING_CMD is issued to ubd driver for
> > both committing io request result and getting future notification of new
> > io request.
> > 
> > Another thing done by ubd driver is to copy data between kernel io
> > request and ubdsrv's io buffer:
> > 
> > 1) before ubdsrv handles WRITE request, copy the request's data into
> > ubdsrv's userspace io buffer, so that ubdsrv can handle the write
> > request
> > 
> > 2) after ubdsrv handles READ request, copy ubdsrv's userspace io buffer
> > into this READ request, then ubd driver can complete the READ request
> > 
> > Zero copy may be switched if mm is ready to support it.
> > 
> > ubd driver doesn't handle any logic of the specific user space driver,
> > so it should be small/simple enough.
> 
> This is pretty interesting! Just one small thing I noticed, since you
> want to make sure batching is Good Enough:
> 
> > +static blk_status_t ubd_queue_rq(struct blk_mq_hw_ctx *hctx,
> > +		const struct blk_mq_queue_data *bd)
> > +{
> > +	struct ubd_queue *ubq = hctx->driver_data;
> > +	struct request *rq = bd->rq;
> > +	struct ubd_io *io = &ubq->ios[rq->tag];
> > +	struct ubd_rq_data *data = blk_mq_rq_to_pdu(rq);
> > +	blk_status_t res;
> > +
> > +	if (ubq->aborted)
> > +		return BLK_STS_IOERR;
> > +
> > +	/* this io cmd slot isn't active, so have to fail this io */
> > +	if (WARN_ON_ONCE(!(io->flags & UBD_IO_FLAG_ACTIVE)))
> > +		return BLK_STS_IOERR;
> > +
> > +	/* fill iod to slot in io cmd buffer */
> > +	res = ubd_setup_iod(ubq, rq);
> > +	if (res != BLK_STS_OK)
> > +		return BLK_STS_IOERR;
> > +
> > +	blk_mq_start_request(bd->rq);
> > +
> > +	/* mark this cmd owned by ubdsrv */
> > +	io->flags |= UBD_IO_FLAG_OWNED_BY_SRV;
> > +
> > +	/*
> > +	 * clear ACTIVE since we are done with this sqe/cmd slot
> > +	 *
> > +	 * We can only accept io cmd in case of being not active.
> > +	 */
> > +	io->flags &= ~UBD_IO_FLAG_ACTIVE;
> > +
> > +	/*
> > +	 * run data copy in task work context for WRITE, and complete io_uring
> > +	 * cmd there too.
> > +	 *
> > +	 * This way should improve batching, meantime pinning pages in current
> > +	 * context is pretty fast.
> > +	 */
> > +	task_work_add(ubq->ubq_daemon, &data->work, TWA_SIGNAL);
> > +
> > +	return BLK_STS_OK;
> > +}
> 
> It'd be better to use bd->last to indicate what kind of signaling you
> need here. TWA_SIGNAL will force an immediate transition if the app is
> running in userspace, which may not be what you want. Also see:
> 
> https://git.kernel.dk/cgit/linux-block/commit/?h=for-5.19/io_uring&id=e788be95a57a9bebe446878ce9bf2750f6fe4974
> 
> But regardless of signaling needed, you don't need it except if bd->last
> is true. Would need a commit_rqs() as well, but that's trivial.

Good point. I think we can queue non-last requests via task_work_add(TWA_NONE)
and notify only via TWA_SIGNAL_NO_IPI when bd->last is set.
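
To make the batching idea concrete, here is a rough userspace model of it,
not driver code. All model_* names are hypothetical stand-ins invented for
this sketch; the real calls would be task_work_add() with TWA_NONE or
TWA_SIGNAL_NO_IPI from ->queue_rq(), plus a ->commit_rqs() that kicks the
daemon when a batch ends without bd->last having been seen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's notify modes (model only). */
enum model_notify { MODEL_TWA_NONE, MODEL_TWA_SIGNAL_NO_IPI };

struct model_queue {
	size_t nr_queued;               /* work items handed to the daemon */
	unsigned int nr_notifications;  /* how often the daemon was kicked */
	bool need_kick;                 /* work queued without a notification */
};

/* Stand-in for task_work_add(): always queue, notify only if asked to. */
void model_task_work_add(struct model_queue *q, enum model_notify mode)
{
	assert(q != NULL);
	q->nr_queued++;
	if (mode == MODEL_TWA_NONE) {
		q->need_kick = true;
	} else {
		q->nr_notifications++;
		q->need_kick = false;
	}
}

/* Stand-in for ->queue_rq(): only the last request of a batch notifies. */
void model_queue_rq(struct model_queue *q, bool last)
{
	model_task_work_add(q, last ? MODEL_TWA_SIGNAL_NO_IPI : MODEL_TWA_NONE);
}

/* Stand-in for ->commit_rqs(): kick if the batch ended without 'last'. */
void model_commit_rqs(struct model_queue *q)
{
	assert(q != NULL);
	if (q->need_kick) {
		q->nr_notifications++;
		q->need_kick = false;
	}
}
```

In this model a batch of N requests costs one notification instead of N,
and a batch that is cut short is still flushed by the commit_rqs stand-in.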

> 
> More importantly, what prevents ubq->ubq_daemon from going away after
> it's been assigned? I didn't look at the details, but is this relying on
> io_uring being closed to cancel pending requests? That should work, but

I don't think there is any way to prevent ubq->ubq_daemon from being killed
by 'kill -9', even though ubdsrv handles SIGTERM. That is why I suggest adding
a service that removes all ubd devices before shutdown:

https://github.com/ming1/ubdsrv/blob/devel/README

All of the UBD_IO_FETCH_REQ and UBD_IO_COMMIT_AND_FETCH_REQ commands have
already been submitted to the driver, and as I understand it io_uring can't
cancel them. Please correct me if that is wrong.

One solution I thought of is to use a watchdog that checks whether
ubq->ubq_daemon is dead and, if so, aborts the whole device. Any other
suggestions?
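
A rough userspace model of the watchdog idea, as a sketch only. The model_*
types and the 'exiting' field are hypothetical; in the driver the check
would presumably look at the daemon task's state (for example PF_EXITING)
from periodic work, and the 'aborted' flag is the one ubd_queue_rq() already
tests before failing a request with BLK_STS_IOERR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_QUEUES 4

/* Hypothetical stand-in for the daemon task; only liveness is modelled. */
struct model_task {
	bool exiting;
};

struct model_ubq {
	struct model_task *ubq_daemon;
	bool aborted;
};

struct model_dev {
	struct model_ubq queues[MODEL_MAX_QUEUES];
	size_t nr_queues;
};

/*
 * One watchdog tick: if any queue's daemon is dead, abort every queue of
 * the device. Returns true when the device has been aborted.
 */
bool model_watchdog_tick(struct model_dev *dev)
{
	bool dead = false;
	size_t i;

	assert(dev != NULL && dev->nr_queues <= MODEL_MAX_QUEUES);

	for (i = 0; i < dev->nr_queues; i++) {
		struct model_task *t = dev->queues[i].ubq_daemon;

		if (t == NULL || t->exiting)
			dead = true;
	}

	if (dead) {
		for (i = 0; i < dev->nr_queues; i++)
			dev->queues[i].aborted = true;
	}
	return dead;
}
```

The model aborts the whole device as soon as one queue's daemon is gone,
which matches the "abort whole device" wording above; aborting per queue
would be a different policy.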

> we need some way to ensure that ->ubq_daemon is always valid here.

Good catch.

get_task_struct() should be used for assigning ubq->ubq_daemon.
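
As a sketch of what that means, here is a small userspace model of the
reference counting. The model_* names and the 'usage' field are hypothetical
stand-ins; in the driver the calls would be get_task_struct() when the
daemon is assigned and put_task_struct() on teardown. Note that holding the
reference only keeps the task_struct memory valid; it does not stop the
daemon from exiting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct task_struct; only the refcount is modelled. */
struct model_task {
	int usage;
};

struct model_ubq {
	struct model_task *ubq_daemon;
};

/* Mirrors get_task_struct(): take a reference and return the task. */
struct model_task *model_get_task(struct model_task *t)
{
	assert(t != NULL && t->usage > 0);
	t->usage++;
	return t;
}

/* Mirrors put_task_struct(): drop one reference. */
void model_put_task(struct model_task *t)
{
	assert(t != NULL && t->usage > 0);
	t->usage--;
}

/* Pin the daemon for as long as the queue points at it. */
void model_ubq_set_daemon(struct model_ubq *ubq, struct model_task *daemon)
{
	ubq->ubq_daemon = model_get_task(daemon);
}

/* Drop the queue's reference when the queue is torn down. */
void model_ubq_clear_daemon(struct model_ubq *ubq)
{
	if (ubq->ubq_daemon != NULL) {
		model_put_task(ubq->ubq_daemon);
		ubq->ubq_daemon = NULL;
	}
}
```

With the reference held, the pointer stored in the queue stays safe to
dereference even after the daemon itself has dropped its own reference.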



thanks,
Ming

