From: Xiaoguang Wang <[email protected]>
To: Ming Lei <[email protected]>
Cc: Jens Axboe <[email protected]>,
[email protected], [email protected],
[email protected],
Gabriel Krisman Bertazi <[email protected]>,
ZiyangZhang <[email protected]>
Subject: Re: [PATCH V4 0/2] ublk: add io_uring based userspace block driver
Date: Tue, 12 Jul 2022 23:16:05 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <Ys1bTrVd+L5zYODg@T590>
hi,
>>>
>>> If we adopt to pass one io_uring fd per queue when starting device,
>>> blk-mq's queue_rq() will get corresponding io_uring file for this queue and
>>> use it to generate cqes directly to notify new io commands inserted,
>>> then UBLK_CMD_START_DEV doesn't need to wait, and can be in the
>>> same thread with ublksrv_queue_init or ublksrv_process_io.
>>> Seems that for demo_null.c, they are still in different threads.
>>>
>>> For current io_uring implementation, one sqe may generate one or more
>>> cqes, but indeed, we can generate cqes without submitting sqes, just
>>> fill one event to io_uring ctx.
>>> Just suggestions :)
> I don't have any interest in so unusual usage of io_uring, especially it
> needs fundamental change in io_uring.
Got it, I see your point now and will respect that.
>
> Compared with kernel side change, removing waiting until queue setup in
> userspace side simplifies nothing.
>
> You still need io_kiocb/io_uring_cmd and both are allocated in
> submission code path, since 'io_ring_ctx' is private for
> io_uring.
>
> And ublk server still needs one command to send io result to ublk driver,
> that said this way saves nothing for us cause ublk driver uses single
> command for both fetching io request and committing io result.
>
> Easy to say than done, you may try to write a patch to verify your
> ideas, especially no one uses io_ring in this way.
I have done a hack change in io_uring:
iff --git a/fs/io_uring.c b/fs/io_uring.c
index 5ff2cdb425bc..e6696319148e 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -11958,6 +11958,21 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp, size_t *argsz
return 0;
}
+static u64 tst_count;
+
+static void io_gen_cqe_direct(struct file *file)
+{
+ struct io_ring_ctx *ctx;
+ ctx = file->private_data;
+
+ printk(KERN_ERR "tst_count %llu\n", tst_count);
+ spin_lock(&ctx->completion_lock);
+ io_fill_cqe_aux(ctx, tst_count++, 0, 0);
+ io_commit_cqring(ctx);
+ spin_unlock(&ctx->completion_lock);
+ io_cqring_ev_posted(ctx);
+}
+
SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
u32, min_complete, u32, flags, const void __user *, argp,
size_t, argsz)
@@ -12005,6 +12020,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED))
goto out;
+ io_gen_cqe_direct(f.file);
/*
* For SQ polling, the thread will do all submissions and completions.
* Just return the requested submit count, and wake the thread if
And in user-space:
ret = io_uring_queue_init(QD, &ring, 0);
for (i = 0; i < 100; i++) {
io_uring_submit_and_wait(&ring, 0);
io_uring_wait_cqe(&ring, &cqe);
printf("lege user_data %llu\n", cqe->user_data);
io_uring_cqe_seen(&ring, cqe);
}
Now user-space app will get cqes continually, by it does not submit any sqes.
Indeed, io_fill_cqe_aux() has been used somewhere in kernel io_uring, for example,
sent one cqe from one io_uring instance to another instance.
I had planned to use io_gen_cqe_direct() in ublk's queue_rq to notify io requests
inserted. But now I'm fine that dropping this idea.
>
>> As I said before, there maybe such benefits:
>> 1. may decouple io command descriptor acquire and io command handling well.
>> At least helper like tcmulib_get_next_command maybe added easily. I'm not sure, some
>> applications based on tcmu previously may need this helper.
> Why do we need similar helper of tcmulib_get_next_command()? for what
> purpose? In current design of ublk server, it should be enough for target
> code to focus on ->handle_io_async(), ->target_io_done() in case of handling
> target io via ubq io_uring, ->handle_event() in case of handling target io in
> other contexts.
I'll have a deep think about it. Initially I tried to make libublk offer
similar programming interface to libtcmu, so apps can switch to ublk
easily, but indeed they maybe totally different things.
Sorry for noises again.
>
> ublk server is one io_uring application, and interfaces provided
> are based on io_uring design/interfaces. I guess TCMU isn't based on
> io_uring, so it may have original style interface. You may ask
> TCMU guys if it is possible to reuse current interface for supporting
> io_uring with expected performance.
>
>> 2. UBLK_CMD_START_DEV won't need to wait another thread context to submit
>> number of queue depth of sqes firstly, but I admit that it's not a big issue.
> Yeah, it simplifies nothing, compared with fundamental io_uring change
> and ublk driver side change.
>
>>
>> I admit batch will be good, and syscalls userspace and kernel context switch
>> introduce overhead. But for big io requests, batch in one context is not good. In
>> the example of read requests, if io request is big, say 1MB, io_uring will do
>> commit req sequentially, which indeed mainly do memcpy work. But if users
>> can choose to issue multiple ioctls which do commit req concurrently, I think
>> user may get higher io throughput.
> If BS is big, single job could saturate devices bw easily, multiple jobs won't
> make a difference, will it? Not mention sync cost among multiple jobs.n
No, I don't consider device scenes, at least for us, our target will visit distributed
file systems, which can offer very high io throughput, far greater than normal device.
>
> Not mention current ublk server does support multiple io jobs.
Yeah, I see that. But for read request, commit req commands will still be done
in ubq->ubq_daemon task context, and commit req commands mainly do memcpy work.
Regards,
Xiaoguang Wang
>
>> And in this case, user may not care userspace and kernel context switch overhead at all.
>>
>> Or to put it another way, should libublk offer synchronous programming interface ?
> It depends what the sync interface is.
>
> You can call sync read()/write() for target io directly in ->handdle_io_async(),
> or call them in other pthread(s) just by telling ubq after these sync ios are done
> with the eventfd API.
>
> demo_null.c is actually one such example.
>
>>> With ublk, usually we handle dozens or even more than hundred of IOs in
>>> single io_uring_enter() syscall.
>>>
>>>> io workers can run concurrently. Since GET_DATA(write request)
>>>> or COMMIT_REQ(read request) mainly do memcpy work, one
>>>> io_uring instance will just do these jobs sequentially, which may
>>>> not take advantage of multi-cpu.
>>> IMO you can implement target code to handle io in other pthreads against
>>> current libublksrv design, see demo_event.c. Or if you think it is still
>>> enough, please just share with us what the problem is. Without
>>> understanding the problem, I won't try to figure out any solution or
>>> change.
>> I need to read your ublk userspace codes carefully, if I made
> One small suggestion, you may start with the two builtin examples, both
> are simple & clean, and the big one is only 300+ LOC, really complicated?
>
>
> Thanks.
> Ming
prev parent reply other threads:[~2022-07-12 15:20 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-11 2:20 [PATCH V4 0/2] ublk: add io_uring based userspace block driver Ming Lei
2022-07-11 2:20 ` [PATCH V4 1/2] " Ming Lei
2022-07-11 2:20 ` [PATCH V4 2/2] ublk_drv: add UBLK_IO_REFETCH_REQ for supporting to build as module Ming Lei
2022-07-11 20:06 ` Gabriel Krisman Bertazi
2022-07-12 2:26 ` Ziyang Zhang
2022-07-12 2:46 ` Ming Lei
2022-07-12 2:33 ` Ming Lei
2022-07-12 10:08 ` Ming Lei
2022-07-11 11:58 ` [PATCH V4 0/2] ublk: add io_uring based userspace block driver Xiaoguang Wang
[not found] ` <[email protected]>
2022-07-11 14:03 ` Ming Lei
2022-07-12 8:44 ` Xiaoguang Wang
2022-07-12 11:30 ` Ming Lei
2022-07-12 15:16 ` Xiaoguang Wang [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afed0772-3626-44e6-a33c-7134a7d623f0@linux.alibaba.com \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox