From: Pavel Begunkov <[email protected]>
To: Hao Xu <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH 2/3] io_uring: maintain drain logic for multishot requests
Date: Wed, 7 Apr 2021 12:41:46 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 07/04/2021 12:23, Hao Xu wrote:
> Now that we have multishot poll requests, one sqe can emit multiple
> cqes. Given the example below:
> sqe0(multishot poll)-->sqe1-->sqe2(drain req)
> sqe2 is designed to issue after sqe0 and sqe1 have completed, but since
> sqe0 is a multishot poll request, sqe2 may be issued after sqe0's event
> has triggered twice but before sqe1 has completed. This isn't what users
> use drain requests for.
> Here a simple solution is to ignore all multishot poll cqes, which means
> drain requests won't wait for those requests to be done.
> To achieve this, we should reconsider the req_need_defer equation, the
> original one is:
>
> all_sqes(excluding dropped ones) == all_cqes(including dropped ones)
>
> this means we issue a drain request when all the previously submitted
> sqes have generated their cqes.
> Now we should ignore multishot requests, so:
> all_sqes - multishot_sqes == all_cqes - multishot_cqes ==>
> all_sqes + multishot_cqes - multishot_sqes == all_cqes
>
> Thus we have to track the submission of a multishot request and the cqes
> it generates, including the ECANCELED cqes. Here we introduce
> cq_extra = multishot_cqes - multishot_sqes for it.
>
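To make the accounting in the quoted description concrete, here is a minimal userspace sketch, not kernel code. The struct `ring_model` and the helpers `submit()`, `complete()` and `drain_must_wait()` are hypothetical names invented for illustration; the overflow term from req_need_defer() is left out for brevity. It replays the sqe0(multishot poll)-->sqe1-->sqe2(drain req) case with cq_extra = multishot_cqes - multishot_sqes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters involved; not the kernel structs. */
struct ring_model {
	uint32_t sq_seq;   /* sqes submitted so far */
	uint32_t cq_tail;  /* cqes posted so far */
	uint32_t cq_extra; /* multishot_cqes - multishot_sqes, wraps mod 2^32 */
};

static void submit(struct ring_model *r, bool multishot)
{
	r->sq_seq++;
	if (multishot)
		r->cq_extra--; /* take the multishot sqe out of the left side */
}

static void complete(struct ring_model *r, bool multishot)
{
	r->cq_tail++;
	if (multishot)
		r->cq_extra++; /* take the multishot cqe out of the right side */
}

/* seq is the number of sqes submitted before the drain request. */
static bool drain_must_wait(const struct ring_model *r, uint32_t seq)
{
	return (uint32_t)(seq + r->cq_extra) != r->cq_tail;
}

/* Replays sqe0(multishot poll) --> sqe1 --> sqe2(drain req). */
static bool drain_example(void)
{
	struct ring_model r = {0};
	uint32_t seq;

	submit(&r, true);   /* sqe0: multishot poll */
	submit(&r, false);  /* sqe1: ordinary request */
	seq = r.sq_seq;     /* two sqes precede the drain request */

	complete(&r, true); /* sqe0 fires once */
	complete(&r, true); /* sqe0 fires again */
	/* A plain seq != cq_tail check would compare 2 with 2 and let the
	 * drain go here; with cq_extra it still waits for sqe1. */
	assert(drain_must_wait(&r, seq));

	complete(&r, false); /* sqe1 completes */
	return drain_must_wait(&r, seq);
}
```

In this model `drain_example()` returns false, meaning the drain request is released only once sqe1 has completed, however many events sqe0 posted.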
> There are other solutions like:
> - just track multishot (non-ECANCELED) cqes, don't track multishot sqes.
> this way we include multishot sqes in the left side of the equation.
> this means we have to treat multishot sqes as normal ones, then we
> have to keep exactly one cqe for each multishot sqe. It's hard to do
> this since there may be some multishot sqes which triggered
> several events and then were cancelled, while other multishot
> sqes just triggered events but weren't cancelled. We would still need
> to track the number of multishot sqes that haven't been cancelled,
> which makes things complicated
>
> For the implementation, just do the submission tracking in
> io_submit_sqe() --> io_init_req() to make things simple. Otherwise if
> we do it in the per-opcode issue path, then we need to carefully consider
> each caller of io_req_complete_failed() because of tricky cases like
> cancelling multishot reqs in a link.
>
> Signed-off-by: Hao Xu <[email protected]>
> ---
> fs/io_uring.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 192463bb977a..a7bd223ce2cc 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -423,6 +423,7 @@ struct io_ring_ctx {
> unsigned cq_mask;
> atomic_t cq_timeouts;
> unsigned cq_last_tm_flush;
> + unsigned cq_extra;
> unsigned long cq_check_overflow;
> struct wait_queue_head cq_wait;
> struct fasync_struct *cq_fasync;
> @@ -879,6 +880,8 @@ struct io_op_def {
> unsigned needs_async_setup : 1;
> /* should block plug */
> unsigned plug : 1;
> + /* set if opcode may generate multiple cqes */
> + unsigned multi_cqes : 1;
> /* size of async data needed, if any */
> unsigned short async_size;
> };
> @@ -924,6 +927,7 @@ struct io_op_def {
> [IORING_OP_POLL_ADD] = {
> .needs_file = 1,
> .unbound_nonreg_file = 1,
> + .multi_cqes = 1,
> },
> [IORING_OP_POLL_REMOVE] = {},
> [IORING_OP_SYNC_FILE_RANGE] = {
> @@ -1186,7 +1190,7 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
> if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
> struct io_ring_ctx *ctx = req->ctx;
>
> - return seq != ctx->cached_cq_tail
> + return seq + ctx->cq_extra != ctx->cached_cq_tail
> + READ_ONCE(ctx->cached_cq_overflow);
> }
>
> @@ -1516,6 +1520,9 @@ static bool __io_cqring_fill_event(struct io_kiocb *req, long res,
>
> trace_io_uring_complete(ctx, req->user_data, res, cflags);
>
> + if (req->flags & REQ_F_MULTI_CQES)
> + req->ctx->cq_extra++;
> +
Here we go: additional overhead burdening everyone, but used only for
a small new feature. All of that can be done in poll or in *_prep()
on an opcode-by-opcode basis.
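As a rough illustration of that alternative, and only one possible reading of it, the sketch below keeps the shared completion path free of any multishot check and moves the cq_extra bookkeeping into poll-only helpers. It is a userspace model, not kernel code: `ring_model`, `fill_event()`, `poll_prep()`, `poll_post_event()` and `drain_must_wait()` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters; not the kernel structs. */
struct ring_model {
	uint32_t cq_tail;  /* cqes posted so far */
	uint32_t cq_extra; /* multishot_cqes - multishot_sqes, wraps mod 2^32 */
};

/* Generic completion path shared by every opcode: no extra branch. */
static void fill_event(struct ring_model *r)
{
	r->cq_tail++;
}

/* Poll-only prep step: account for a multishot sqe at submission. */
static void poll_prep(struct ring_model *r, bool multishot)
{
	if (multishot)
		r->cq_extra--;
}

/* Poll-only completion step: account for the cqe, then post it. */
static void poll_post_event(struct ring_model *r, bool multishot)
{
	if (multishot)
		r->cq_extra++;
	fill_event(r);
}

/* seq is the number of sqes submitted before the drain request. */
static bool drain_must_wait(const struct ring_model *r, uint32_t seq)
{
	return (uint32_t)(seq + r->cq_extra) != r->cq_tail;
}

/* Same scenario as the patch description, with the bookkeeping kept
 * inside the poll helpers. */
static bool per_opcode_example(void)
{
	struct ring_model r = {0};
	const uint32_t seq = 2; /* sqe0 (multishot poll) and sqe1 precede the drain */

	poll_prep(&r, true);
	poll_post_event(&r, true);
	poll_post_event(&r, true);
	assert(drain_must_wait(&r, seq)); /* sqe1 has not completed yet */

	fill_event(&r); /* sqe1 completes through the generic path */
	return drain_must_wait(&r, seq);
}
```

The drain check gives the same answer as before; the difference is that requests other than poll never touch cq_extra or test a flag on completion.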
> /*
> * If we can't get a cq entry, userspace overflowed the
> * submission (by quite a lot). Increment the overflow count in
> @@ -6504,6 +6511,13 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
> req->result = 0;
> req->work.creds = NULL;
>
> + if (sqe_flags & IOSQE_MULTI_CQES) {
> + ctx->cq_extra--;
> + if (!io_op_defs[req->opcode].multi_cqes) {
> + return -EOPNOTSUPP;
> + }
> + }
> +
see above
> /* enforce forwards compatibility on users */
> if (unlikely(sqe_flags & ~SQE_VALID_FLAGS)) {
> req->flags = 0;
>
--
Pavel Begunkov