public inbox for [email protected]
From: Ming Lei <[email protected]>
To: Xiaoguang Wang <[email protected]>
Cc: [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected]
Subject: Re: [RFC 3/3] ublk_drv: add ebpf support
Date: Fri, 17 Feb 2023 11:02:14 +0800
Message-ID: <Y+7uNpw7QBpJ4GHA@T590>
In-Reply-To: <[email protected]>

On Thu, Feb 16, 2023 at 08:12:18PM +0800, Xiaoguang Wang wrote:
> hello,
> 
> > On Wed, Feb 15, 2023 at 08:41:22AM +0800, Xiaoguang Wang wrote:
> >> Currently only one bpf_ublk_queue_sqe() ebpf is added, ublksrv target
> >> can use this helper to write ebpf prog to support ublk kernel & userspace
> >> zero copy, please see ublksrv test codes for more info.
> >>
> >>  	 */
> >> +	if ((req_op(req) == REQ_OP_WRITE) && ub->io_prep_prog)
> >> +		return rq_bytes;
> > Can you explain a bit why READ isn't supported? Because WRITE zero
> > copy is supposed to be supported easily with splice based approach,
> > and I am more interested in READ zc actually.
> No special reason, READ op can also be supported. I'll
> add this support in patch set v2.
> For this RFC patch set, I just tried to show the idea, so
> I must admit that current codes are not mature enough :)

OK.

> 
> >
> >> +
> >>  	if (req_op(req) != REQ_OP_WRITE && req_op(req) != REQ_OP_FLUSH)
> >>  		return rq_bytes;
> >>  
> >> @@ -860,6 +921,89 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, struct request *rq)
> >>  	}
> >>  }
> >>  
> >>
> >> +	kbuf->bvec = bvec;
> >> +	rq_for_each_bvec(tmp, rq, rq_iter) {
> >> +		*bvec = tmp;
> >> +		bvec++;
> >> +	}
> >> +
> >> +	kbuf->count = blk_rq_bytes(rq);
> >> +	kbuf->nr_bvecs = nr_bvec;
> >> +	data->kbuf = kbuf;
> >> +	return 0;
> > bio/req bvec table is immutable, so here you can pass its reference
> > to kbuf directly.
> Yeah, thanks.

Also, if this request has multiple bios, you need to either submit
multiple sqes or copy all bvecs into a single table. In the case of a
single bio, the table reference can be used directly.
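
The two cases above can be sketched in plain userspace C. This is only a
model, not driver code: struct seg, struct fake_bio, struct fake_rq and
struct seg_table are hypothetical stand-ins for bio_vec, bio, request and
the kbuf, and bio iterator offsets are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct seg { void *page; unsigned int len; unsigned int off; };
struct fake_bio { struct seg *vecs; size_t nr; struct fake_bio *next; };
struct fake_rq { struct fake_bio *head; };

/* owned == 0: vecs is borrowed from the bio; owned == 1: vecs was copied. */
struct seg_table { const struct seg *vecs; size_t nr; int owned; };

static int build_seg_table(const struct fake_rq *rq, struct seg_table *t)
{
	const struct fake_bio *b;
	size_t total = 0, i = 0;
	struct seg *copy;

	if (!rq->head)
		return -1;

	if (!rq->head->next) {
		/* Single bio: the table is immutable, so just reference it. */
		t->vecs = rq->head->vecs;
		t->nr = rq->head->nr;
		t->owned = 0;
		return 0;
	}

	/* Multiple bios: flatten every segment into one table. */
	for (b = rq->head; b; b = b->next)
		total += b->nr;

	copy = malloc((total ? total : 1) * sizeof(*copy));
	if (!copy)
		return -1;

	for (b = rq->head; b; b = b->next)
		for (size_t j = 0; j < b->nr; j++)
			copy[i++] = b->vecs[j];
	assert(i == total);

	t->vecs = copy;
	t->nr = total;
	t->owned = 1;
	return 0;
}

static void free_seg_table(struct seg_table *t)
{
	/* Only a copied table belongs to us; a borrowed one stays with the bio. */
	if (t->owned)
		free((void *)t->vecs);
	t->vecs = NULL;
	t->nr = 0;
	t->owned = 0;
}
```

The owned flag is what lets the completion path tell a borrowed table
from a copied one. The other option, one sqe per bio, avoids the copy at
the cost of more submissions.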

> 
> >
> >> +}
> >> +
> >> +static int ublk_run_bpf_prog(struct ublk_queue *ubq, struct request *rq)
> >> +{
> >> +	int err;
> >> +	struct ublk_device *ub = ubq->dev;
> >> +	struct bpf_prog *prog = ub->io_prep_prog;
> >> +	struct ublk_io_bpf_ctx *bpf_ctx;
> >> +
> >> +	if (!prog)
> >> +		return 0;
> >> +
> >> +	bpf_ctx = kmalloc(sizeof(struct ublk_io_bpf_ctx), GFP_NOIO);
> >> +	if (!bpf_ctx)
> >> +		return -EIO;
> >> +
> >> +	err = ublk_init_uring_kbuf(rq);
> >> +	if (err < 0) {
> >> +		kfree(bpf_ctx);
> >> +		return -EIO;
> >> +	}
> >> +	bpf_ctx->ub = ub;
> >> +	bpf_ctx->ctx.q_id = ubq->q_id;
> >> +	bpf_ctx->ctx.tag = rq->tag;
> >> +	bpf_ctx->ctx.op = req_op(rq);
> >> +	bpf_ctx->ctx.nr_sectors = blk_rq_sectors(rq);
> >> +	bpf_ctx->ctx.start_sector = blk_rq_pos(rq);
> > The above is for setting up target io parameter, which is supposed
> > to be from userspace, cause it is result of user space logic. If
> > these parameters are from kernel, the whole logic has to be done
> > in io_prep_prog.
> Yeah, it's designed that io_prep_prog implements user space
> io logic.

That could be the biggest weakness of this approach. People really want
to implement complicated logic in userspace, which should be the biggest
value of ublk, but now it seems you are moving kernel C programming into
ebpf userspace programming. I don't think ebpf is good at handling
complicated userspace logic.

> 
> >
> >> +	bpf_prog_run_pin_on_cpu(prog, bpf_ctx);
> >> +
> >> +	init_task_work(&bpf_ctx->work, ublk_bpf_io_submit_fn);
> >> +	if (task_work_add(ubq->ubq_daemon, &bpf_ctx->work, TWA_SIGNAL_NO_IPI))
> >> +		kfree(bpf_ctx);
> > task_work_add() is only available in case of ublk builtin.
> Yeah, I'm thinking how to work around it.
> 
> >
> >> +	return 0;
> >> +}
> >> +
> >>  static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >>  		const struct blk_mq_queue_data *bd)
> >>  {
> >> @@ -872,6 +1016,9 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
> >>  	if (unlikely(res != BLK_STS_OK))
> >>  		return BLK_STS_IOERR;
> >>  
> >> +	/* Currently just for test. */
> >> +	ublk_run_bpf_prog(ubq, rq);
> > Can you explain the above comment a bit? When is the io_prep_prog called
> > in the non-test version? Or can you post the non-test version in list
> > for review.
> Forgot to delete stale comments, sorry. I'm writing v2 patch set,

OK, got it. So it looks like ublk_run_bpf_prog is designed to run two
progs loaded from two control commands.

> 
> > Here it is the key for understanding the whole idea, especially when
> > is io_prep_prog called finally? How to pass parameters to io_prep_prog?
> Let me explain more about the design:
> io_prep_prog has two types of parameters:
> 1) its call argument: struct ublk_bpf_ctx, see ublk.bpf.c.
> ublk_bpf_ctx describes one kernel io request: its op, qid and
> sector info. io_prep_prog uses this info to map the
> target io.
> 2) ebpf map structure, user space daemon can use map
> structure to pass much information from user space to
> io_prep_prog, which will help it to initialize target io if necessary.
> 
> io_prep_prog is called when ublk_queue_rq() is called, this bpf
> prog will initialize one or more sqes according to user logic, and
> io_prep_prog will put these sqes in an ebpf map structure, then
> execute a task_work_add() to notify ubq_daemon to execute
> io_submit_prog. Note, we can not call io_uring_submit_sqe()
> in task context that calls ublk_queue_rq(), that context does not
> have io_uring instance owned by ubq_daemon.
> Later ubq_daemon will call io_submit_prog to submit sqes.

Submitting sqes from the kernel looks interesting, but I guess
performance may be hurt, since plugging (batching), which is supposed
to affect io perf a lot, can no longer be applied.
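
To make the batching concern concrete, here is a toy userspace C model.
It is not a measurement and does not touch io_uring: submit_batch() is a
hypothetical stand-in for one submission round trip, and all names are
made up for illustration. It only counts how many round trips each
strategy needs for the same number of requests.

```c
#include <stddef.h>

/* Hypothetical counters: one "call" stands for one submission round trip. */
struct submit_stats { size_t calls; size_t sqes; };

static void submit_batch(struct submit_stats *st, size_t nr)
{
	if (!nr)
		return;
	st->calls++;
	st->sqes += nr;
}

/* No plugging: every request is submitted on its own. */
static void submit_unplugged(struct submit_stats *st, size_t nr_reqs)
{
	for (size_t i = 0; i < nr_reqs; i++)
		submit_batch(st, 1);
}

/* Plugging: hold requests back and flush them in groups of 'batch'. */
static void submit_plugged(struct submit_stats *st, size_t nr_reqs,
			   size_t batch)
{
	size_t pending = 0;

	if (!batch)
		batch = 1;

	for (size_t i = 0; i < nr_reqs; i++) {
		if (++pending == batch) {
			submit_batch(st, pending);
			pending = 0;
		}
	}
	/* Flush whatever is left when the plug is released. */
	submit_batch(st, pending);
}
```

In this model both paths submit the same number of sqes and only the
number of round trips differs. How much that matters in practice would
have to be measured.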



Thanks,
Ming



Thread overview: 13+ messages
2023-02-15  0:41 [RFC 0/3] Add io_uring & ebpf based methods to implement zero-copy for ublk Xiaoguang Wang
2023-02-15  0:41 ` [RFC 1/3] bpf: add UBLK program type Xiaoguang Wang
2023-02-15  0:41 ` [RFC 2/3] io_uring: enable io_uring to submit sqes located in kernel Xiaoguang Wang
2023-02-15  0:41 ` [RFC 3/3] ublk_drv: add ebpf support Xiaoguang Wang
2023-02-16  8:11   ` Ming Lei
2023-02-16 12:12     ` Xiaoguang Wang
2023-02-17  3:02       ` Ming Lei [this message]
2023-02-17 10:46         ` Ming Lei
2023-02-22 14:13         ` Xiaoguang Wang
2023-02-15  0:46 ` [UBLKSRV] Add " Xiaoguang Wang
2023-02-16  8:28   ` Ming Lei
2023-02-16  9:17     ` Xiaoguang Wang
2023-02-15  8:40 ` [RFC 0/3] Add io_uring & ebpf based methods to implement zero-copy for ublk Ziyang Zhang
