From: Pavel Begunkov <[email protected]>
To: Hao Xu <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH v3] io_uring: implement async hybrid mode for pollable requests
Date: Fri, 22 Oct 2021 20:49:24 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 10/18/21 14:34, Hao Xu wrote:
> The current logic for requests with IOSQE_ASYNC is to first queue them
> to an io-worker, then execute them synchronously. For unbound work like
> pollable requests (e.g. read/write on a socket fd), the io-worker may
> get stuck there waiting for events for a long time, and thus other work
> items wait in the list for a long time too.
> Let's introduce a new way to handle unbound work (currently pollable
> requests): a request is first queued to an io-worker, then issued with
> a nonblocking try rather than synchronously. If that fails, the worker
> arms the poll infrastructure and can then move on to other work.
> The detailed flow for this kind of request is:

Looks good, but there are a couple of places that could be better.
It would be good to have this for 5.16; if that's rolling out soon we
can queue this and do the cleanup afterwards.

nits below

> step1: original context:
>             queue it to io-worker
> step2: io-worker context:
>             nonblock try(the old logic is a synchronous try here)
>                 |
>                 |--fail--> arm poll
>                              |
>                              |--(fail/ready)-->synchronous issue
>                              |
>                              |--(succeed)-->worker finish it's job, tw
>                                             take over the req
> 
> This works much better than the old IOSQE_ASYNC logic in cases where
> the unbound max_worker limit is relatively small. There, the number of
> io-workers easily grows to max_worker, new workers cannot be created,
> and the running workers are stuck handling old work in the IOSQE_ASYNC
> mode.
> 
> On my 64-core machine, with unbound max_worker set to 20, running an
> echo-server gives:
> (arguments: register_file, connection number 1000, message size 12
> bytes)
> original IOSQE_ASYNC: 76664.151 tps
> after this patch: 166934.985 tps
> 
> Suggested-by: Jens Axboe <[email protected]>
> Signed-off-by: Hao Xu <[email protected]>
> ---
> 
> v1-->v2:
>   - tweak added code in io_wq_submit_work to reduce overhead
> v2-->v3:
>   - remove redundant IOSQE_ASYNC_HYBRID stuff
> 
> 
>   fs/io_uring.c | 36 +++++++++++++++++++++++++++++++++++-
>   1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index b3546eef0289..86819c7917df 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -6747,8 +6747,18 @@ static void io_wq_submit_work(struct io_wq_work *work)
>   		ret = -ECANCELED;

Too much indentation, it's hard to read. I'd suggest doing

if (unlikely(work->flags & IO_WQ_WORK_CANCEL)) {
	io_req_task_queue_fail(req, -ECANCELED);
	return;
}

and killing the following "if (!ret)" with its extra indentation


>   
>   	if (!ret) {
> +		bool needs_poll = false;
> +		unsigned int issue_flags = IO_URING_F_UNLOCKED;
> +
> +		if (req->flags & REQ_F_FORCE_ASYNC) {
> +			needs_poll = req->file && file_can_poll(req->file);

It goes through a weird flow if the opcode doesn't use polling, so also check:

needs_poll = needs_poll && (def->pollin || def->pollout);


We could even split the checks out of io_arm_poll_handler() into
an inline helper:


if (!req->file || !file_can_poll(req->file))
	return IO_APOLL_ABORTED;
if (req->flags & REQ_F_POLLED)
	return IO_APOLL_ABORTED;
if (!def->pollin && !def->pollout)
	return IO_APOLL_ABORTED;



> +			if (needs_poll)
> +				issue_flags |= IO_URING_F_NONBLOCK;
> +		}
> +
>   		do {
> -			ret = io_issue_sqe(req, IO_URING_F_UNLOCKED);
> +issue_sqe:
> +			ret = io_issue_sqe(req, issue_flags);
>   			/*
>   			 * We can get EAGAIN for polled IO even though we're
>   			 * forcing a sync submission from here, since we can't
> @@ -6756,6 +6766,30 @@ static void io_wq_submit_work(struct io_wq_work *work)
>   			 */
>   			if (ret != -EAGAIN)
>   				break;
> +			if (needs_poll) {

How about inverting it and killing yet another indentation level
for the polling stuff?

if (!needs_poll) {
	cond_resched();
	continue;
}

/* arm polling / etc. */


And that's the only place we need cond_resched(), so the one
at the end can be deleted.

> +				bool armed = false;
> +
> +				ret = 0;
> +				needs_poll = false;
> +				issue_flags &= ~IO_URING_F_NONBLOCK;
> +
> +				switch (io_arm_poll_handler(req)) {

1) Not a big fan of backward-hopping jumps
2) don't like the @armed logic

What we can do is replace the switch with ifs, and then you can freely
use break instead of @armed and continue instead of goto. Also,
considering the cond_resched() comment above, it can be:

if (io_arm_poll_handler(req) == IO_APOLL_OK)
	break;
continue;
  

> +				case IO_APOLL_READY:
> +					goto issue_sqe;
> +				case IO_APOLL_ABORTED:
> +					/*
> +					 * somehow we failed to arm the poll infra,
> +					 * fallback it to a normal async worker try.
> +					 */
> +					break;
> +				case IO_APOLL_OK:
> +					armed = true;
> +					break;
> +				}
> +
> +				if (armed)
> +					break;
> +			}
>   			cond_resched();
>   		} while (1);
>   	}
> 

With all but the (pollin || pollout) change, this looks to me like the
below (not tested); I guess it can be polished further.


static void io_wq_submit_work(struct io_wq_work *work)
{
	struct io_kiocb *req = container_of(work, struct io_kiocb, work);
	unsigned int issue_flags = IO_URING_F_UNLOCKED;
	bool needs_poll = false;
	struct io_kiocb *timeout;
	int ret;

	/* one will be dropped by ->io_free_work() after returning to io-wq */
	if (!(req->flags & REQ_F_REFCOUNT))
		__io_req_set_refcount(req, 2);
	else
		req_ref_get(req);

	timeout = io_prep_linked_timeout(req);
	if (timeout)
		io_queue_linked_timeout(timeout);

	/* either cancelled or io-wq is dying, so don't touch tctx->iowq */
	if (unlikely(work->flags & IO_WQ_WORK_CANCEL)) {
		io_req_task_queue_fail(req, -ECANCELED);
		return;
	}

	if (req->flags & REQ_F_FORCE_ASYNC) {
		needs_poll = req->file && file_can_poll(req->file);
		if (needs_poll)
			issue_flags |= IO_URING_F_NONBLOCK;
	}

	do {
		ret = io_issue_sqe(req, issue_flags);
		/*
		 * We can get EAGAIN for polled IO even though we're
		 * forcing a sync submission from here, since we can't
		 * wait for request slots on the block side.
		 */
		if (ret != -EAGAIN)
			break;
		if (!needs_poll) {
			cond_resched();
			continue;
		}

		if (io_arm_poll_handler(req) == IO_APOLL_OK)
			return;
		/* ready or aborted, in either case fall back to blocking */
		needs_poll = false;
		issue_flags &= ~IO_URING_F_NONBLOCK;
	} while (1);

	/* avoid locking problems by failing it from a clean context */
	if (ret)
		io_req_task_queue_fail(req, ret);
}

-- 
Pavel Begunkov


Thread overview: 5+ messages
2021-10-18 13:34 [PATCH v3] io_uring: implement async hybrid mode for pollable requests Hao Xu
2021-10-22  3:43 ` Hao Xu
2021-10-22  9:27 ` Pavel Begunkov
2021-10-22 19:49 ` Pavel Begunkov [this message]
2021-10-23  1:21 ` Jens Axboe
