Re: [PATCH for-5.13] io_uring: maintain drain requests' logic


From: Jens Axboe <[email protected]>
To: Hao Xu <[email protected]>,
	Pavel Begunkov <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
Date: Sun, 4 Apr 2021 17:07:38 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 4/3/21 12:58 AM, Hao Xu wrote:
> On 2021/4/2 6:29 AM, Pavel Begunkov wrote:
>> On 01/04/2021 15:55, Hao Xu wrote:
>>> On 2021/4/1 6:25 PM, Pavel Begunkov wrote:
>>>> On 01/04/2021 07:53, Hao Xu wrote:
>>>>> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>>>>>
>>>>>>
>>>>>> On 31/03/2021 10:01, Hao Xu wrote:
>>>>>>> Now that we have multishot poll requests, one sqe can emit multiple
>>>>>>> cqes. Given the example below:
>>>>>>>        sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>>>>>> sqe2 is designed to be issued after sqe0 and sqe1 have completed, but
>>>>>>> since sqe0 is a multishot poll request, sqe2 may be issued after sqe0's
>>>>>>> event has triggered twice but before sqe1 has completed. This isn't what
>>>>>>> users rely on drain requests for.
>>>>>>> Here a simple solution is to ignore all multishot poll cqes, which means
>>>>>>> drain requests won't wait for those requests to be done.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <[email protected]>
>>>>>>> ---
>>>>>>>     fs/io_uring.c | 9 +++++++--
>>>>>>>     1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index 513096759445..cd6d44cf5940 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>>>>>         struct callback_head        *exit_task_work;
>>>>>>>           struct wait_queue_head        hash_wait;
>>>>>>> +    unsigned                        multishot_cqes;
>>>>>>>           /* Keep this last, we don't need it for the fast path */
>>>>>>>         struct work_struct        exit_work;
>>>>>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>>>>>         if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>>>>>             struct io_ring_ctx *ctx = req->ctx;
>>>>>>> -        return seq != ctx->cached_cq_tail
>>>>>>> -                + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>> +        return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>>>>>> +            + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>>         }
>>>>>>>           return false;
>>>>>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>>     {
>>>>>>>         struct io_ring_ctx *ctx = req->ctx;
>>>>>>>         unsigned flags = IORING_CQE_F_MORE;
>>>>>>> +    bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>>>>>           if (!error && req->poll.canceled) {
>>>>>>>             error = -ECANCELED;
>>>>>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>>             req->poll.done = true;
>>>>>>>             flags = 0;
>>>>>>>         }
>>>>>>> +    if (multishot_poll)
>>>>>>> +        ctx->multishot_cqes++;
>>>>>>> +
>>>>>>
>>>>>> We need to make sure we do that only for a non-final complete, i.e.
>>>>>> not killing request, otherwise it'll double account the last one.
>>>>> Hi Pavel, I saw that a killing request such as iopoll_remove or
>>>>> async_cancel calls io_cqring_fill_event() to create an ECANCELED cqe
>>>>> for the original poll request. So there could be cases like these
>>>>> (even for a single poll request):
>>>>>     (1) add poll --> cancel poll, an ECANCELED cqe.
>>>>>         1 sqe : 1 cqe, all good
>>>>>     (2) add poll --> trigger event (queued to task_work) --> cancel poll,
>>>>>         an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>>>>>         1 sqe : 2 cqes
>>>>
>>>> Those should emit a CQE on behalf of the request they're cancelling
>>>> only when it's definitely cancelled and not going to fill it
>>>> itself. E.g. if io_poll_cancel() found it and removed it from
>>>> all the lists and the core's poll infra.
>>>>
>>>> At least before multi-cqe it should have been working fine.
>>>>
>>> I haven't tested this, but from the code logic, the case below could
>>> happen:
>>>
>>> io_poll_add()                         | io_poll_remove
>>> (event happen)io_poll_wake()          | io_poll_remove_one
>>>                                        | io_poll_remove_waitqs
>>>                                        | io_cqring_fill_event(-ECANCELED)
>>>                                        |
>>> task_work run(io_poll_task_func)      |
>>> io_poll_complete()                    |
>>> req->poll.canceled is true, \         |
>>> __io_cqring_fill_event(-ECANCELED)    |
>>>
>>> two ECANCELED cqes, is there anything I missed?
>>
>> Definitely may be, but I need to take a closer look
>>
> I'll run some tests to check whether this issue exists, and make changes
> if it does.

How about something like this? Seems pointless to have an extra
variable for this, when we already track if we're going to do more
completions for this event or not. Also places the variable where
it makes the most sense, and plenty of pad space there too.

Warning: totally untested. Would be great if you could test it, and I'm
hoping you're going to send out a v2.
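
To make the difference in accounting concrete, here is a small userspace
sketch, not kernel code. It assumes a hypothetical model_poll_complete()
whose 'final' argument stands in for whichever path clears the flags in
io_poll_complete(), and a hypothetical struct extra_counters holding one
counter per approach. It compares counting every multishot poll CQE (the
v1 patch) with counting only CQEs that still carry IORING_CQE_F_MORE.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit value as IORING_CQE_F_MORE in the io_uring uapi header. */
#define CQE_F_MORE (1U << 1)

/* Hypothetical counters; only the field names echo the two patches. */
struct extra_counters {
	unsigned multishot_cqes; /* v1: bumped for every multishot poll CQE */
	unsigned cq_extra;       /* proposal: bumped only while F_MORE is set */
};

/*
 * Model one poll completion. 'multishot' mirrors
 * !(req->poll.events & EPOLLONESHOT); 'final' is an assumption that
 * stands for any path that clears the flags. Returns true when the
 * request is finished, like io_poll_complete().
 */
static bool model_poll_complete(struct extra_counters *c, bool multishot,
				bool final)
{
	unsigned flags = CQE_F_MORE;

	if (final)
		flags = 0;
	if (multishot)
		c->multishot_cqes++;
	if (flags & CQE_F_MORE)
		c->cq_extra++;
	return !(flags & CQE_F_MORE);
}

/* Walk one multishot poll through two non-final CQEs and a final one. */
static void demo_counts(struct extra_counters *c)
{
	assert(!model_poll_complete(c, true, false));
	assert(!model_poll_complete(c, true, false));
	assert(model_poll_complete(c, true, true));
}
```

In this model the v1 counter also counts the final CQE, which is the
double accounting Pavel pointed out, while the F_MORE-based counter only
counts the two non-final ones.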


diff --git a/fs/io_uring.c b/fs/io_uring.c
index f94b32b43429..1eea4998ad9b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -423,6 +423,7 @@ struct io_ring_ctx {
 		unsigned		cq_mask;
 		atomic_t		cq_timeouts;
 		unsigned		cq_last_tm_flush;
+		unsigned		cq_extra;
 		unsigned long		cq_check_overflow;
 		struct wait_queue_head	cq_wait;
 		struct fasync_struct	*cq_fasync;
@@ -1183,8 +1184,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return seq != ctx->cached_cq_tail
-				+ READ_ONCE(ctx->cached_cq_overflow);
+		return seq + ctx->cq_extra != ctx->cached_cq_tail
+			+ READ_ONCE(ctx->cached_cq_overflow);
 	}
 
 	return false;
@@ -4894,6 +4895,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
 		req->poll.done = true;
 		flags = 0;
 	}
+	if (flags & IORING_CQE_F_MORE)
+		ctx->cq_extra++;
+
 	io_commit_cqring(ctx);
 	return !(flags & IORING_CQE_F_MORE);
 }
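
For anyone who wants to poke at the arithmetic outside the kernel, below
is a minimal userspace sketch of the two req_need_defer() checks. The
struct ring_model, post_cqe() and replay_example() names are made up for
illustration, and only the three fields used by the check are modelled.
It replays the sqe0(multishot poll)-->sqe1-->sqe2(drain req) example,
assuming seq is 2 for sqe2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few io_ring_ctx fields used here. */
struct ring_model {
	unsigned cached_cq_tail;     /* CQEs posted so far */
	unsigned cached_cq_overflow; /* CQEs dropped on overflow */
	unsigned cq_extra;           /* CQEs posted with F_MORE set */
};

/* Check before the patch: defer until posted CQEs equal seq. */
static bool need_defer_old(const struct ring_model *ctx, unsigned seq)
{
	return seq != ctx->cached_cq_tail + ctx->cached_cq_overflow;
}

/* Check from the diff above: non-final CQEs are added to seq. */
static bool need_defer_new(const struct ring_model *ctx, unsigned seq)
{
	return seq + ctx->cq_extra !=
		ctx->cached_cq_tail + ctx->cached_cq_overflow;
}

/* Post one CQE; 'more' says whether it carries IORING_CQE_F_MORE. */
static void post_cqe(struct ring_model *ctx, bool more)
{
	ctx->cached_cq_tail++;
	if (more)
		ctx->cq_extra++;
}

/* Replay sqe0 (multishot poll) --> sqe1 --> sqe2 (drain), seq == 2. */
static void replay_example(void)
{
	struct ring_model ctx = {0};
	const unsigned seq = 2;

	post_cqe(&ctx, true);  /* sqe0 fires, more to come */
	post_cqe(&ctx, true);  /* sqe0 fires again, sqe1 still pending */
	assert(!need_defer_old(&ctx, seq)); /* old check lets sqe2 go early */
	assert(need_defer_new(&ctx, seq));  /* new check keeps deferring */

	post_cqe(&ctx, false); /* sqe1 completes */
	assert(need_defer_new(&ctx, seq));
	post_cqe(&ctx, false); /* sqe0 posts its final CQE */
	assert(!need_defer_new(&ctx, seq));
}
```

In this model every CQE posted with F_MORE bumps both sides of the
comparison, so only final completions move a drain request closer to
being issued.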

-- 
Jens Axboe
