public inbox for [email protected]
 help / color / mirror / Atom feed
From: Bernd Schubert <[email protected]>
To: Jens Axboe <[email protected]>,
	"[email protected]" <[email protected]>
Cc: Ming Lei <[email protected]>, Pavel Begunkov <[email protected]>
Subject: Re: SQPOLL / uring_cmd_iopoll
Date: Sat, 22 Apr 2023 13:40:24 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 4/22/23 04:13, Jens Axboe wrote:
> On 4/21/23 4:09?PM, Bernd Schubert wrote:
>> Hello,
>>
>> I was wondering if I could set up SQPOLL for fuse/IORING_OP_URING_CMD
>> and what would be the latency win. Now I get a bit confused what the
>> f_op->uring_cmd_iopoll() function is supposed to do.
> 
> Certainly, you can use SQPOLL with anything. Whether or not it'd be a
> win depends a lot on what you're doing, rate of IO, etc.
> 
> IOPOLL and SQPOLL are two different things. SQPOLL has a kernel side
> submission thread that polls for new SQ entries and submits them when it
> sees them. IOPOLL is a method for avoiding sleeping on waiting on CQ
> entries, where it will instead poll the target for completion instead.
> That's where ->uring_cmd_iopoll() comes in, that's the hook for polling
> for uring commands. For normal fs path read/write requests,
> ->uring_iopoll() is the hook that performs the same kind of action.
> 
>> Is it just there to check if SQEs are can be completed as CQE? In rw.c
> 
> Not sure I follow what you're trying to convey here, maybe you can
> expand on that? And maybe some of the confusion here is because of
> mixing up SQPOLL and IOPOLL?

Thanks a lot for your help!

I was confused when f_op->uring_cmd_iopoll gets called - if I needed to
check myself if the SQE was submitted or if this for CQE submission.
You already resolved my confusion with your comments below. Thanks a
lot for your help!

> 
>> io_do_iopoll() it looks like this. I don't follow all code paths in
>> __io_sq_thread yet, but it looks a like it already checks if the ring
>> has new entries
>>
>> to_submit = io_sqring_entries(ctx);
>> ...
>> ret = io_submit_sqes(ctx, to_submit);
>>
>>     --> it will eventually call into ->uring_cmd() ?
> 
> The SQPOLL thread will pull off new SQEs, and those will then at some
> point hit ->issue() which is an opcode dependent method for issuing the
> actual request. Once it's been issued, if the ring is IOPOLL, then
> io_iopoll_req_issued() will get called which adds the request to an
> internal poll list. When someone does io_uring_enter(2) to wait for
> events on a ring with IOPOLL, it will iterate that list and call
> ->uring_cmd_iopoll() for uring_cmd requests, and ->uring_iopoll() for
> "normal" requests.

Thanks, that basically answered my SQE confusion - it does all itself.

> 
> If the ring is using SQPOLL|IOPOLL, then the SQPOLL thread is also the
> one that does the polling. See __io_sq_thread() -> io_do_iopoll().
> 
>> And then io_do_iopoll ->  file->f_op->uring_cmd_iopoll is supposed to
>> check for available cq entries and will submit these? I.e. I just return
>> 1 if when the request is ready? And also ensure that
>> req->iopoll_completed is set?
> 
> The callback polls for a completion on the target side, which will mark
> is as ->iopoll_completed = true. That still leaves them on the iopoll
> list, and io_do_iopoll() will spot that and post CQEs for them.
> 
>> I'm also not sure what I should do with struct io_comp_batch * - I don't
>> have struct request *req_list anywhere in my fuse-uring changes, seems
>> to be blk-mq specific? So I should just ignore that parameter?
> 
> Hard to say since the above is a bit confusing and I haven't seen your
> code, but you can always start off just passing NULL. That's fine and
> just doesn't do any completion batching. The latter may or may not be
> useful for your case, but in any case, it's fine to pass NULL.

I had send patches to fsdevel and given it is mostly fuse related, didn't
add you to CC
https://lwn.net/Articles/926773/

The code is also here

https://github.com/bsbernd/linux/tree/fuse-uring-for-6.2
https://github.com/bsbernd/libfuse/tree/uring


I got SQPOLL working using this simple function

/**
  * This is called for requests when the ring is configured with
  * IORING_SETUP_IOPOLL.
  */
int fuse_uring_cmd_poll(struct io_uring_cmd *cmd, struct io_comp_batch *iob,
			unsigned int poll_flags)
{

	/* Not much to be done here, when IORING_SETUP_IOPOLL is set
	 * io_uring_cmd_done() already sets req->iopoll_completed.
	 * The caller (io_do_iopoll) already checks this flag
	 * and won't enter this function at all then.
	 * When we get called we just need to return 0 and tell the
          * caller that the cmd is not ready yet.
	 */
	return 0;
}


Just gave it a quick file creation/removal run with single threaded bonnie++
and performance is actually lower than before (around 8000 creates/s without
IORING_SETUP_SQPOLL (adding IORING_SETUP_IOPOLL doesn't help either) and
about 5000 creates/s with IORING_SETUP_SQPOLL). With plain /dev/fuse
it is about 2000 creates/s.
Main improvement comes from ensuring request submission
(application) and request handling (ring/thread) are on the same core.
I'm running into some scheduler issues, which I work around for now using
migrate_disable()/migrate_enable() in before/after fuse request waitq,
without that performance for metadata requests is similar to plain
/dev/fuse.

I will soon post an RFC v2 series with more benchmark results including
read and write.


> 
>> Btw, this might be useful for ublk as well?
> 
> Not sure what "this" is :-)

Ming already replied.



Thanks,
Bernd



  reply	other threads:[~2023-04-22 13:40 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-21 22:09 SQPOLL / uring_cmd_iopoll Bernd Schubert
2023-04-22  2:13 ` Jens Axboe
2023-04-22 13:40   ` Bernd Schubert [this message]
2023-04-24 12:55     ` Bernd Schubert
2023-04-22 12:55 ` Ming Lei
2023-04-22 14:08   ` Jens Axboe
2023-04-23  1:06     ` Ming Lei

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox