public inbox for [email protected]
* [PATCH for-5.13] io_uring: maintain drain requests' logic
@ 2021-03-31  9:01 Hao Xu
  2021-03-31 15:36 ` Jens Axboe
  2021-03-31 22:06 ` Pavel Begunkov
  0 siblings, 2 replies; 10+ messages in thread
From: Hao Xu @ 2021-03-31  9:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Now that we have multishot poll requests, one sqe can emit multiple
cqes. Consider the following example:
    sqe0(multishot poll)-->sqe1-->sqe2(drain req)
sqe2 is meant to be issued only after sqe0 and sqe1 have completed, but
since sqe0 is a multishot poll request, sqe2 may be issued once sqe0's
event has triggered twice, before sqe1 has completed. This isn't what
users rely on drain requests for.
A simple solution is to ignore all multishot poll cqes, which means
drain requests won't wait for those requests to be done.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/io_uring.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 513096759445..cd6d44cf5940 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -455,6 +455,7 @@ struct io_ring_ctx {
 	struct callback_head		*exit_task_work;
 
 	struct wait_queue_head		hash_wait;
+	unsigned                        multishot_cqes;
 
 	/* Keep this last, we don't need it for the fast path */
 	struct work_struct		exit_work;
@@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return seq != ctx->cached_cq_tail
-				+ READ_ONCE(ctx->cached_cq_overflow);
+		return seq + ctx->multishot_cqes != ctx->cached_cq_tail
+			+ READ_ONCE(ctx->cached_cq_overflow);
 	}
 
 	return false;
@@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
 {
 	struct io_ring_ctx *ctx = req->ctx;
 	unsigned flags = IORING_CQE_F_MORE;
+	bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
 
 	if (!error && req->poll.canceled) {
 		error = -ECANCELED;
@@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
 		req->poll.done = true;
 		flags = 0;
 	}
+	if (multishot_poll)
+		ctx->multishot_cqes++;
+
 	io_commit_cqring(ctx);
 	return !(flags & IORING_CQE_F_MORE);
 }
-- 
1.8.3.1
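
To make the accounting concrete, here is a minimal user-space sketch of the
two comparisons in req_need_defer() from the diff above. It is not kernel
code: struct ring_model and post_cqe() are hypothetical names introduced for
illustration, and seq is assumed to be the number of SQEs queued ahead of
the drain request.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the counters; not the kernel's struct io_ring_ctx. */
struct ring_model {
	uint32_t cached_cq_tail;     /* CQEs posted to the ring */
	uint32_t cached_cq_overflow; /* CQEs that did not fit in the ring */
	uint32_t multishot_cqes;     /* CQEs posted by multishot poll requests */
};

/* Pre-patch check: defer until completions equal the SQEs queued before us. */
static bool need_defer_old(const struct ring_model *r, uint32_t seq)
{
	return seq != r->cached_cq_tail + r->cached_cq_overflow;
}

/* Patched check: discount the CQEs that came from multishot poll requests. */
static bool need_defer_new(const struct ring_model *r, uint32_t seq)
{
	return seq + r->multishot_cqes !=
	       r->cached_cq_tail + r->cached_cq_overflow;
}

/* Post one CQE; multishot_poll mirrors the local flag added by the patch. */
static void post_cqe(struct ring_model *r, bool multishot_poll)
{
	r->cached_cq_tail++;
	if (multishot_poll)
		r->multishot_cqes++;
}
```

With seq = 2 (sqe0 and sqe1 ahead of sqe2) and two multishot events posted,
need_defer_old() returns false, so the drain request would be issued even
though sqe1 has not completed, while need_defer_new() still returns true.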


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-03-31  9:01 [PATCH for-5.13] io_uring: maintain drain requests' logic Hao Xu
@ 2021-03-31 15:36 ` Jens Axboe
  2021-04-01  6:58   ` Hao Xu
  2021-03-31 22:06 ` Pavel Begunkov
  1 sibling, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-03-31 15:36 UTC (permalink / raw)
  To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 3/31/21 3:01 AM, Hao Xu wrote:
> Now that we have multishot poll requests, one sqe can emit multiple
> cqes. given below example:
>     sqe0(multishot poll)-->sqe1-->sqe2(drain req)
> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
> is a multishot poll request, sqe2 may be issued after sqe0's event
> triggered twice before sqe1 completed. This isn't what users leverage
> drain requests for.
> Here a simple solution is to ignore all multishot poll cqes, which means
> drain requests  won't wait those request to be done.

Good point, we need to do something here... Looks simple enough to me,
though I'd probably prefer if we rename 'multishot_cqes' to
'persistent_sqes' or something like that. It's likely not the last
user of having 1:M mappings between sqe and cqe, so might as well
try and name it a bit more appropriately.

-- 
Jens Axboe



* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-03-31  9:01 [PATCH for-5.13] io_uring: maintain drain requests' logic Hao Xu
  2021-03-31 15:36 ` Jens Axboe
@ 2021-03-31 22:06 ` Pavel Begunkov
  2021-04-01  6:53   ` Hao Xu
  1 sibling, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2021-03-31 22:06 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi



On 31/03/2021 10:01, Hao Xu wrote:
> Now that we have multishot poll requests, one sqe can emit multiple
> cqes. given below example:
>     sqe0(multishot poll)-->sqe1-->sqe2(drain req)
> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
> is a multishot poll request, sqe2 may be issued after sqe0's event
> triggered twice before sqe1 completed. This isn't what users leverage
> drain requests for.
> Here a simple solution is to ignore all multishot poll cqes, which means
> drain requests  won't wait those request to be done.
> 
> Signed-off-by: Hao Xu <[email protected]>
> ---
>  fs/io_uring.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 513096759445..cd6d44cf5940 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>  	struct callback_head		*exit_task_work;
>  
>  	struct wait_queue_head		hash_wait;
> +	unsigned                        multishot_cqes;
>  
>  	/* Keep this last, we don't need it for the fast path */
>  	struct work_struct		exit_work;
> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>  	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>  		struct io_ring_ctx *ctx = req->ctx;
>  
> -		return seq != ctx->cached_cq_tail
> -				+ READ_ONCE(ctx->cached_cq_overflow);
> +		return seq + ctx->multishot_cqes != ctx->cached_cq_tail
> +			+ READ_ONCE(ctx->cached_cq_overflow);
>  	}
>  
>  	return false;
> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>  {
>  	struct io_ring_ctx *ctx = req->ctx;
>  	unsigned flags = IORING_CQE_F_MORE;
> +	bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>  
>  	if (!error && req->poll.canceled) {
>  		error = -ECANCELED;
> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>  		req->poll.done = true;
>  		flags = 0;
>  	}
> +	if (multishot_poll)
> +		ctx->multishot_cqes++;
> +

We need to make sure we do that only for a non-final completion, i.e.
one that doesn't kill the request, otherwise it'll double account the
last one. E.g. is a failed __io_cqring_fill_event() in
io_poll_complete() fine? What about other places?

Btw, we can use some tests :)


>  	io_commit_cqring(ctx);
>  	return !(flags & IORING_CQE_F_MORE);
>  }
> 

-- 
Pavel Begunkov


* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-03-31 22:06 ` Pavel Begunkov
@ 2021-04-01  6:53   ` Hao Xu
  2021-04-01 10:25     ` Pavel Begunkov
  0 siblings, 1 reply; 10+ messages in thread
From: Hao Xu @ 2021-04-01  6:53 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
> 
> 
> On 31/03/2021 10:01, Hao Xu wrote:
>> Now that we have multishot poll requests, one sqe can emit multiple
>> cqes. given below example:
>>      sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>> is a multishot poll request, sqe2 may be issued after sqe0's event
>> triggered twice before sqe1 completed. This isn't what users leverage
>> drain requests for.
>> Here a simple solution is to ignore all multishot poll cqes, which means
>> drain requests  won't wait those request to be done.
>>
>> Signed-off-by: Hao Xu <[email protected]>
>> ---
>>   fs/io_uring.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 513096759445..cd6d44cf5940 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>   	struct callback_head		*exit_task_work;
>>   
>>   	struct wait_queue_head		hash_wait;
>> +	unsigned                        multishot_cqes;
>>   
>>   	/* Keep this last, we don't need it for the fast path */
>>   	struct work_struct		exit_work;
>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>   	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>   		struct io_ring_ctx *ctx = req->ctx;
>>   
>> -		return seq != ctx->cached_cq_tail
>> -				+ READ_ONCE(ctx->cached_cq_overflow);
>> +		return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>> +			+ READ_ONCE(ctx->cached_cq_overflow);
>>   	}
>>   
>>   	return false;
>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>   {
>>   	struct io_ring_ctx *ctx = req->ctx;
>>   	unsigned flags = IORING_CQE_F_MORE;
>> +	bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>   
>>   	if (!error && req->poll.canceled) {
>>   		error = -ECANCELED;
>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>   		req->poll.done = true;
>>   		flags = 0;
>>   	}
>> +	if (multishot_poll)
>> +		ctx->multishot_cqes++;
>> +
> 
> We need to make sure we do that only for a non-final complete, i.e.
> not killing request, otherwise it'll double account the last one.
Hi Pavel, I saw that a killing request like io_poll_remove or
async_cancel calls io_cqring_fill_event() to create an ECANCELED cqe for
the original poll request. So there could be cases like the following
(even for a single-shot poll request):
   (1). add poll --> cancel poll, an ECANCELED cqe.
                                                   1sqe:1cqe   all good
   (2). add poll --> trigger event(queued to task_work) --> cancel poll, 
            an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
                                                   1sqe:2cqes
I suggest we only emit one ECANCELED cqe.
Currently I only account cqes through io_poll_complete(), so ECANCELED
cqes from io_poll_remove or async_cancel etc. are not counted.
> E.g. is failed __io_cqring_fill_event() in io_poll_complete() fine?
> Other places?
A failed __io_cqring_fill_event() doesn't produce a cqe but increments
ctx->cached_cq_overflow. As long as either a cqe is produced or
cached_cq_overflow is incremented by one, it is ok.
> 
> Btw, we can use some tests :)
I'll do more tests.
> 
> 
>>   	io_commit_cqring(ctx);
>>   	return !(flags & IORING_CQE_F_MORE);
>>   }
>>
> 



* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-03-31 15:36 ` Jens Axboe
@ 2021-04-01  6:58   ` Hao Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Hao Xu @ 2021-04-01  6:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 2021/3/31 11:36 PM, Jens Axboe wrote:
> On 3/31/21 3:01 AM, Hao Xu wrote:
>> Now that we have multishot poll requests, one sqe can emit multiple
>> cqes. given below example:
>>      sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>> is a multishot poll request, sqe2 may be issued after sqe0's event
>> triggered twice before sqe1 completed. This isn't what users leverage
>> drain requests for.
>> Here a simple solution is to ignore all multishot poll cqes, which means
>> drain requests  won't wait those request to be done.
> 
> Good point, we need to do something here... Looks simple enough to me,
> though I'd probably prefer if we rename 'multishot_cqes' to
> 'persistent_sqes' or something like that. It's likely not the last
> user of having 1:M mappings between sqe and cqe, so might as well
> try and name it a bit more appropriately.
> 
persistent_sqes makes sense to me.



* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-04-01  6:53   ` Hao Xu
@ 2021-04-01 10:25     ` Pavel Begunkov
       [not found]       ` <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2021-04-01 10:25 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 01/04/2021 07:53, Hao Xu wrote:
> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>
>>
>> On 31/03/2021 10:01, Hao Xu wrote:
>>> Now that we have multishot poll requests, one sqe can emit multiple
>>> cqes. given below example:
>>>      sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>>> is a multishot poll request, sqe2 may be issued after sqe0's event
>>> triggered twice before sqe1 completed. This isn't what users leverage
>>> drain requests for.
>>> Here a simple solution is to ignore all multishot poll cqes, which means
>>> drain requests  won't wait those request to be done.
>>>
>>> Signed-off-by: Hao Xu <[email protected]>
>>> ---
>>>   fs/io_uring.c | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 513096759445..cd6d44cf5940 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>       struct callback_head        *exit_task_work;
>>>         struct wait_queue_head        hash_wait;
>>> +    unsigned                        multishot_cqes;
>>>         /* Keep this last, we don't need it for the fast path */
>>>       struct work_struct        exit_work;
>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>       if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>           struct io_ring_ctx *ctx = req->ctx;
>>>   -        return seq != ctx->cached_cq_tail
>>> -                + READ_ONCE(ctx->cached_cq_overflow);
>>> +        return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>> +            + READ_ONCE(ctx->cached_cq_overflow);
>>>       }
>>>         return false;
>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>   {
>>>       struct io_ring_ctx *ctx = req->ctx;
>>>       unsigned flags = IORING_CQE_F_MORE;
>>> +    bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>         if (!error && req->poll.canceled) {
>>>           error = -ECANCELED;
>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>           req->poll.done = true;
>>>           flags = 0;
>>>       }
>>> +    if (multishot_poll)
>>> +        ctx->multishot_cqes++;
>>> +
>>
>> We need to make sure we do that only for a non-final complete, i.e.
>> not killing request, otherwise it'll double account the last one.
> Hi Pavel, I saw a killing request like iopoll_remove or async_cancel call io_cqring_fill_event() to create an ECANCELED cqe for the original poll request. So there could be cases like(even for single poll request):
>   (1). add poll --> cancel poll, an ECANCELED cqe.
>                                                   1sqe:1cqe   all good
>   (2). add poll --> trigger event(queued to task_work) --> cancel poll,            an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>                                                   1sqe:2cqes

Those should emit a CQE on behalf of the request they're cancelling
only when it's definitely cancelled and is not going to fill one
itself, e.g. if io_poll_cancel() found it and removed it from
all the lists and the core poll infra.

At least before multi-cqe it should have been working fine.

> I suggest we shall only emit one ECANCELED cqe.
> Currently I only account cqe through io_poll_complete(), so ECANCELED cqe from io_poll_remove or async_cancel etc are not counted in.
>> E.g. is failed __io_cqring_fill_event() in io_poll_complete() fine?
>> Other places?
> a failed __io_cqring_fill_event() doesn't produce a cqe but increment ctx->cached_cq_overflow, as long as a cqe is produced or cached_cq_overflow is +=1, it is ok.

Not claiming that the case is broken, but cached_cq_overflow is
considered in req_need_defer() as well, so from its perspective there
is not much difference between a successful fill_event() and a failed one.

>>
>> Btw, we can use some tests :)
> I'll do more tests.

Perfect!

>>
>>
>>>       io_commit_cqring(ctx);
>>>       return !(flags & IORING_CQE_F_MORE);
>>>   }
>>>
>>
> 

-- 
Pavel Begunkov
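
The overflow point above can be sketched in a few lines. This is a
simplified user-space model under stated assumptions, not the kernel's
implementation: RING_ENTRIES, struct cq_model, fill_event() and
completions_seen() are hypothetical names, and an overflowing CQE is
assumed to only bump cached_cq_overflow, as described in the exchange.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 2u /* hypothetical, deliberately tiny CQ ring */

struct cq_model {
	uint32_t head;               /* CQEs already consumed by user space */
	uint32_t cached_cq_tail;     /* CQEs posted to the ring */
	uint32_t cached_cq_overflow; /* CQEs that did not fit */
};

/* Returns true if the CQE fit in the ring, false if it was counted as overflow. */
static bool fill_event(struct cq_model *cq)
{
	if (cq->cached_cq_tail - cq->head >= RING_ENTRIES) {
		cq->cached_cq_overflow++;
		return false;
	}
	cq->cached_cq_tail++;
	return true;
}

/* The sum that the drain check compares against the sequence number. */
static uint32_t completions_seen(const struct cq_model *cq)
{
	return cq->cached_cq_tail + cq->cached_cq_overflow;
}
```

Whether fill_event() succeeds or fails, completions_seen() grows by exactly
one, which is why the drain check cannot tell the two outcomes apart.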


* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
       [not found]       ` <[email protected]>
@ 2021-04-01 22:29         ` Pavel Begunkov
  2021-04-03  6:58           ` Hao Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Pavel Begunkov @ 2021-04-01 22:29 UTC (permalink / raw)
  To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 01/04/2021 15:55, Hao Xu wrote:
> On 2021/4/1 6:25 PM, Pavel Begunkov wrote:
>> On 01/04/2021 07:53, Hao Xu wrote:
>>> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>>>
>>>>
>>>> On 31/03/2021 10:01, Hao Xu wrote:
>>>>> Now that we have multishot poll requests, one sqe can emit multiple
>>>>> cqes. given below example:
>>>>>       sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>>>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>>>>> is a multishot poll request, sqe2 may be issued after sqe0's event
>>>>> triggered twice before sqe1 completed. This isn't what users leverage
>>>>> drain requests for.
>>>>> Here a simple solution is to ignore all multishot poll cqes, which means
>>>>> drain requests  won't wait those request to be done.
>>>>>
>>>>> Signed-off-by: Hao Xu <[email protected]>
>>>>> ---
>>>>>    fs/io_uring.c | 9 +++++++--
>>>>>    1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 513096759445..cd6d44cf5940 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>>>        struct callback_head        *exit_task_work;
>>>>>          struct wait_queue_head        hash_wait;
>>>>> +    unsigned                        multishot_cqes;
>>>>>          /* Keep this last, we don't need it for the fast path */
>>>>>        struct work_struct        exit_work;
>>>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>>>        if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>>>            struct io_ring_ctx *ctx = req->ctx;
>>>>>    -        return seq != ctx->cached_cq_tail
>>>>> -                + READ_ONCE(ctx->cached_cq_overflow);
>>>>> +        return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>>>> +            + READ_ONCE(ctx->cached_cq_overflow);
>>>>>        }
>>>>>          return false;
>>>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>    {
>>>>>        struct io_ring_ctx *ctx = req->ctx;
>>>>>        unsigned flags = IORING_CQE_F_MORE;
>>>>> +    bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>>>          if (!error && req->poll.canceled) {
>>>>>            error = -ECANCELED;
>>>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>            req->poll.done = true;
>>>>>            flags = 0;
>>>>>        }
>>>>> +    if (multishot_poll)
>>>>> +        ctx->multishot_cqes++;
>>>>> +
>>>>
>>>> We need to make sure we do that only for a non-final complete, i.e.
>>>> not killing request, otherwise it'll double account the last one.
>>> Hi Pavel, I saw a killing request like iopoll_remove or async_cancel call io_cqring_fill_event() to create an ECANCELED cqe for the original poll request. So there could be cases like(even for single poll request):
>>>    (1). add poll --> cancel poll, an ECANCELED cqe.
>>>                                                    1sqe:1cqe   all good
>>>    (2). add poll --> trigger event(queued to task_work) --> cancel poll,            an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>>>                                                    1sqe:2cqes
>>
>> Those should emit a CQE on behalf of the request they're cancelling
>> only when it's definitely cancelled and not going to fill it
>> itself. E.g. if io_poll_cancel() found it and removed from
>> all the list and core's poll infra.
>>
>> At least before multi-cqe it should have been working fine.
>>
> I haven't done a test for this, but from the code logic, there could be
> case below:
> 
> io_poll_add()                         | io_poll_remove
> (event happen)io_poll_wake()          | io_poll_remove_one
>                                       | io_poll_remove_waitqs
>                                       | io_cqring_fill_event(-ECANCELED)
>                                       |
> task_work run(io_poll_task_func)      |
> io_poll_complete()                    |
> req->poll.canceled is true, \         |
> __io_cqring_fill_event(-ECANCELED)    |
> 
> two ECANCELED cqes, is there anything I missed?

It definitely may be, but I need to take a closer look.


>>> I suggest we shall only emit one ECANCELED cqe.
>>> Currently I only account cqe through io_poll_complete(), so ECANCELED cqe from io_poll_remove or async_cancel etc are not counted in.
>>>> E.g. is failed __io_cqring_fill_event() in io_poll_complete() fine?
>>>> Other places?
>>> a failed __io_cqring_fill_event() doesn't produce a cqe but increment ctx->cached_cq_overflow, as long as a cqe is produced or cached_cq_overflow is +=1, it is ok.
>>
>> Not claiming that the case is broken, but cached_cq_overflow is
>> considered in req_need_defer() as well, so from its perspective there
>> is no much difference between succeed fill_event() or not.
>>
>>>>
>>>> Btw, we can use some tests :)
>>> I'll do more tests.
>>
>> Perfect!
>>
>>>>
>>>>
>>>>>        io_commit_cqring(ctx);
>>>>>        return !(flags & IORING_CQE_F_MORE);
>>>>>    }

-- 
Pavel Begunkov
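
The interleaving in the quoted two-column trace can be replayed with a small
user-space model. This only encodes the sequence as Hao describes it; nobody
in the thread has confirmed that the kernel actually behaves this way, and
the real code may have guards this sketch omits. struct poll_model and all
function names below are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

struct poll_model {
	bool canceled;         /* set by the remove path */
	bool task_work_queued; /* set when the poll event fires */
	int cqes_posted;       /* CQEs emitted for this one SQE */
	int last_res;          /* result of the most recent CQE */
};

static void post(struct poll_model *p, int res)
{
	p->cqes_posted++;
	p->last_res = res;
}

/* Left column, first step: the event fires and completion is deferred. */
static void poll_wake(struct poll_model *p)
{
	p->task_work_queued = true;
}

/* Right column: the remove path marks the request and posts -ECANCELED. */
static void poll_remove(struct poll_model *p)
{
	p->canceled = true;
	post(p, -ECANCELED);
}

/* Left column, second step: the deferred completion finally runs. */
static void run_task_work(struct poll_model *p)
{
	if (!p->task_work_queued)
		return;
	p->task_work_queued = false;
	post(p, p->canceled ? -ECANCELED : 0);
}
```

Calling poll_wake(), poll_remove() and run_task_work() in that order leaves
cqes_posted at 2 in this model, which is the 1sqe:2cqes case being asked about.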


* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-04-01 22:29         ` Pavel Begunkov
@ 2021-04-03  6:58           ` Hao Xu
  2021-04-04 23:07             ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Hao Xu @ 2021-04-03  6:58 UTC (permalink / raw)
  To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/4/2 6:29 AM, Pavel Begunkov wrote:
> On 01/04/2021 15:55, Hao Xu wrote:
>> On 2021/4/1 6:25 PM, Pavel Begunkov wrote:
>>> On 01/04/2021 07:53, Hao Xu wrote:
>>>> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>>>>
>>>>>
>>>>> On 31/03/2021 10:01, Hao Xu wrote:
>>>>>> Now that we have multishot poll requests, one sqe can emit multiple
>>>>>> cqes. given below example:
>>>>>>        sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>>>>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>>>>>> is a multishot poll request, sqe2 may be issued after sqe0's event
>>>>>> triggered twice before sqe1 completed. This isn't what users leverage
>>>>>> drain requests for.
>>>>>> Here a simple solution is to ignore all multishot poll cqes, which means
>>>>>> drain requests  won't wait those request to be done.
>>>>>>
>>>>>> Signed-off-by: Hao Xu <[email protected]>
>>>>>> ---
>>>>>>     fs/io_uring.c | 9 +++++++--
>>>>>>     1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>> index 513096759445..cd6d44cf5940 100644
>>>>>> --- a/fs/io_uring.c
>>>>>> +++ b/fs/io_uring.c
>>>>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>>>>         struct callback_head        *exit_task_work;
>>>>>>           struct wait_queue_head        hash_wait;
>>>>>> +    unsigned                        multishot_cqes;
>>>>>>           /* Keep this last, we don't need it for the fast path */
>>>>>>         struct work_struct        exit_work;
>>>>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>>>>         if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>>>>             struct io_ring_ctx *ctx = req->ctx;
>>>>>>     -        return seq != ctx->cached_cq_tail
>>>>>> -                + READ_ONCE(ctx->cached_cq_overflow);
>>>>>> +        return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>>>>> +            + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>         }
>>>>>>           return false;
>>>>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>     {
>>>>>>         struct io_ring_ctx *ctx = req->ctx;
>>>>>>         unsigned flags = IORING_CQE_F_MORE;
>>>>>> +    bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>>>>           if (!error && req->poll.canceled) {
>>>>>>             error = -ECANCELED;
>>>>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>             req->poll.done = true;
>>>>>>             flags = 0;
>>>>>>         }
>>>>>> +    if (multishot_poll)
>>>>>> +        ctx->multishot_cqes++;
>>>>>> +
>>>>>
>>>>> We need to make sure we do that only for a non-final complete, i.e.
>>>>> not killing request, otherwise it'll double account the last one.
>>>> Hi Pavel, I saw a killing request like iopoll_remove or async_cancel call io_cqring_fill_event() to create an ECANCELED cqe for the original poll request. So there could be cases like(even for single poll request):
>>>>     (1). add poll --> cancel poll, an ECANCELED cqe.
>>>>                                                     1sqe:1cqe   all good
>>>>     (2). add poll --> trigger event(queued to task_work) --> cancel poll,            an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>>>>                                                     1sqe:2cqes
>>>
>>> Those should emit a CQE on behalf of the request they're cancelling
>>> only when it's definitely cancelled and not going to fill it
>>> itself. E.g. if io_poll_cancel() found it and removed from
>>> all the list and core's poll infra.
>>>
>>> At least before multi-cqe it should have been working fine.
>>>
>> I haven't done a test for this, but from the code logic, there could be
>> case below:
>>
>> io_poll_add()                         | io_poll_remove
>> (event happen)io_poll_wake()          | io_poll_remove_one
>>                                        | io_poll_remove_waitqs
>>                                        | io_cqring_fill_event(-ECANCELED)
>>                                        |
>> task_work run(io_poll_task_func)      |
>> io_poll_complete()                    |
>> req->poll.canceled is true, \         |
>> __io_cqring_fill_event(-ECANCELED)    |
>>
>> two ECANCELED cqes, is there anything I missed?
> 
> Definitely may be be, but need to take a closer look
> 
I'll run some tests to check whether this issue exists, and make
changes if it does.
> 
>>>> I suggest we shall only emit one ECANCELED cqe.
>>>> Currently I only account cqe through io_poll_complete(), so ECANCELED cqe from io_poll_remove or async_cancel etc are not counted in.
>>>>> E.g. is failed __io_cqring_fill_event() in io_poll_complete() fine?
>>>>> Other places?
>>>> a failed __io_cqring_fill_event() doesn't produce a cqe but increment ctx->cached_cq_overflow, as long as a cqe is produced or cached_cq_overflow is +=1, it is ok.
>>>
>>> Not claiming that the case is broken, but cached_cq_overflow is
>>> considered in req_need_defer() as well, so from its perspective there
>>> is no much difference between succeed fill_event() or not.
>>>
>>>>>
>>>>> Btw, we can use some tests :)
>>>> I'll do more tests.
>>>
>>> Perfect!
>>>
>>>>>
>>>>>
>>>>>>         io_commit_cqring(ctx);
>>>>>>         return !(flags & IORING_CQE_F_MORE);
>>>>>>     }
> 



* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-04-03  6:58           ` Hao Xu
@ 2021-04-04 23:07             ` Jens Axboe
  2021-04-05 16:11               ` Hao Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2021-04-04 23:07 UTC (permalink / raw)
  To: Hao Xu, Pavel Begunkov; +Cc: io-uring, Joseph Qi

On 4/3/21 12:58 AM, Hao Xu wrote:
> On 2021/4/2 6:29 AM, Pavel Begunkov wrote:
>> On 01/04/2021 15:55, Hao Xu wrote:
>>> On 2021/4/1 6:25 PM, Pavel Begunkov wrote:
>>>> On 01/04/2021 07:53, Hao Xu wrote:
>>>>> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>>>>>
>>>>>>
>>>>>> On 31/03/2021 10:01, Hao Xu wrote:
>>>>>>> Now that we have multishot poll requests, one sqe can emit multiple
>>>>>>> cqes. given below example:
>>>>>>>        sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>>>>>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>>>>>>> is a multishot poll request, sqe2 may be issued after sqe0's event
>>>>>>> triggered twice before sqe1 completed. This isn't what users leverage
>>>>>>> drain requests for.
>>>>>>> Here a simple solution is to ignore all multishot poll cqes, which means
>>>>>>> drain requests  won't wait those request to be done.
>>>>>>>
>>>>>>> Signed-off-by: Hao Xu <[email protected]>
>>>>>>> ---
>>>>>>>     fs/io_uring.c | 9 +++++++--
>>>>>>>     1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>> index 513096759445..cd6d44cf5940 100644
>>>>>>> --- a/fs/io_uring.c
>>>>>>> +++ b/fs/io_uring.c
>>>>>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>>>>>         struct callback_head        *exit_task_work;
>>>>>>>           struct wait_queue_head        hash_wait;
>>>>>>> +    unsigned                        multishot_cqes;
>>>>>>>           /* Keep this last, we don't need it for the fast path */
>>>>>>>         struct work_struct        exit_work;
>>>>>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>>>>>         if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>>>>>             struct io_ring_ctx *ctx = req->ctx;
>>>>>>>     -        return seq != ctx->cached_cq_tail
>>>>>>> -                + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>> +        return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>>>>>> +            + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>>         }
>>>>>>>           return false;
>>>>>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>>     {
>>>>>>>         struct io_ring_ctx *ctx = req->ctx;
>>>>>>>         unsigned flags = IORING_CQE_F_MORE;
>>>>>>> +    bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>>>>>           if (!error && req->poll.canceled) {
>>>>>>>             error = -ECANCELED;
>>>>>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>>             req->poll.done = true;
>>>>>>>             flags = 0;
>>>>>>>         }
>>>>>>> +    if (multishot_poll)
>>>>>>> +        ctx->multishot_cqes++;
>>>>>>> +
>>>>>>
>>>>>> We need to make sure we do that only for a non-final complete, i.e.
>>>>>> not killing request, otherwise it'll double account the last one.
>>>>> Hi Pavel, I saw a killing request like iopoll_remove or async_cancel call io_cqring_fill_event() to create an ECANCELED cqe for the original poll request. So there could be cases like(even for single poll request):
>>>>>     (1). add poll --> cancel poll, an ECANCELED cqe.
>>>>>                                                     1sqe:1cqe   all good
>>>>>     (2). add poll --> trigger event(queued to task_work) --> cancel poll,            an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>>>>>                                                     1sqe:2cqes
>>>>
>>>> Those should emit a CQE on behalf of the request they're cancelling
>>>> only when it's definitely cancelled and not going to fill it
>>>> itself. E.g. if io_poll_cancel() found it and removed from
>>>> all the list and core's poll infra.
>>>>
>>>> At least before multi-cqe it should have been working fine.
>>>>
>>> I haven't done a test for this, but from the code logic, there could be
>>> case below:
>>>
>>> io_poll_add()                         | io_poll_remove
>>> (event happen)io_poll_wake()          | io_poll_remove_one
>>>                                        | io_poll_remove_waitqs
>>>                                        | io_cqring_fill_event(-ECANCELED)
>>>                                        |
>>> task_work run(io_poll_task_func)      |
>>> io_poll_complete()                    |
>>> req->poll.canceled is true, \         |
>>> __io_cqring_fill_event(-ECANCELED)    |
>>>
>>> two ECANCELED cqes, is there anything I missed?
>>
>> Definitely may be, but need to take a closer look
>>
> I'll run some tests to check whether this issue exists, and make changes
> if it does.

How about something like this? Seems pointless to have an extra
variable for this, when we already track if we're going to do more
completions for this event or not. Also places the variable where
it makes the most sense, and plenty of pad space there too.

Warning: totally untested. Would be great if you could, and hoping
you're going to send out a v2.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index f94b32b43429..1eea4998ad9b 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -423,6 +423,7 @@ struct io_ring_ctx {
 		unsigned		cq_mask;
 		atomic_t		cq_timeouts;
 		unsigned		cq_last_tm_flush;
+		unsigned		cq_extra;
 		unsigned long		cq_check_overflow;
 		struct wait_queue_head	cq_wait;
 		struct fasync_struct	*cq_fasync;
@@ -1183,8 +1184,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
 	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
 		struct io_ring_ctx *ctx = req->ctx;
 
-		return seq != ctx->cached_cq_tail
-				+ READ_ONCE(ctx->cached_cq_overflow);
+		return seq + ctx->cq_extra != ctx->cached_cq_tail
+			+ READ_ONCE(ctx->cached_cq_overflow);
 	}
 
 	return false;
@@ -4894,6 +4895,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
 		req->poll.done = true;
 		flags = 0;
 	}
+	if (flags & IORING_CQE_F_MORE)
+		ctx->cq_extra++;
+
 	io_commit_cqring(ctx);
 	return !(flags & IORING_CQE_F_MORE);
 }
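
For readers following the arithmetic, here is a minimal userspace sketch of
the drain check in the diff above. It is not kernel code: every model_* name
is hypothetical, locking and overflow handling are left out, and it assumes
that seq equals the number of SQEs submitted ahead of the drain request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; field names mirror the diff, nothing else does. */
struct model_ctx {
	uint32_t cached_cq_tail;     /* CQEs posted so far */
	uint32_t cached_cq_overflow; /* CQEs that did not fit in the ring */
	uint32_t cq_extra;           /* CQEs posted with IORING_CQE_F_MORE */
};

/* Post one CQE; 'more' stands in for IORING_CQE_F_MORE being set. */
static void model_post_cqe(struct model_ctx *ctx, bool more)
{
	ctx->cached_cq_tail++;
	if (more)
		ctx->cq_extra++;
}

/* Check before the change: every CQE counts as one finished request. */
static bool model_need_defer_old(const struct model_ctx *ctx, uint32_t seq)
{
	return seq != ctx->cached_cq_tail + ctx->cached_cq_overflow;
}

/* Check after the change: non-final multishot CQEs are discounted. */
static bool model_need_defer_new(const struct model_ctx *ctx, uint32_t seq)
{
	return seq + ctx->cq_extra !=
	       ctx->cached_cq_tail + ctx->cached_cq_overflow;
}

/* sqe0 (multishot poll) --> sqe1 --> sqe2 (drain), so seq is assumed 2. */
void model_run_example(void)
{
	struct model_ctx ctx = {0};
	const uint32_t seq = 2;

	/* sqe0 fires twice, each CQE carrying the MORE flag. */
	model_post_cqe(&ctx, true);
	model_post_cqe(&ctx, true);
	assert(!model_need_defer_old(&ctx, seq)); /* old check lets sqe2 go early */
	assert(model_need_defer_new(&ctx, seq));  /* new check still defers */

	/* sqe1 completes; the poll is still armed, so sqe2 keeps waiting. */
	model_post_cqe(&ctx, false);
	assert(model_need_defer_new(&ctx, seq));

	/* sqe0 posts its final CQE (no MORE flag); now sqe2 may issue. */
	model_post_cqe(&ctx, false);
	assert(!model_need_defer_new(&ctx, seq));
}
```

In this model the final poll CQE is not added to cq_extra, so it is counted
exactly once, which is the double-accounting concern raised earlier in the
thread.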

-- 
Jens Axboe



* Re: [PATCH for-5.13] io_uring: maintain drain requests' logic
  2021-04-04 23:07             ` Jens Axboe
@ 2021-04-05 16:11               ` Hao Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Hao Xu @ 2021-04-05 16:11 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov; +Cc: io-uring, Joseph Qi

On 2021/4/5 7:07 AM, Jens Axboe wrote:
> On 4/3/21 12:58 AM, Hao Xu wrote:
>> On 2021/4/2 6:29 AM, Pavel Begunkov wrote:
>>> On 01/04/2021 15:55, Hao Xu wrote:
>>>> On 2021/4/1 6:25 PM, Pavel Begunkov wrote:
>>>>> On 01/04/2021 07:53, Hao Xu wrote:
>>>>>> On 2021/4/1 6:06 AM, Pavel Begunkov wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 31/03/2021 10:01, Hao Xu wrote:
>>>>>>>> Now that we have multishot poll requests, one sqe can emit multiple
>>>>>>>> cqes. Given the example below:
>>>>>>>>         sqe0(multishot poll)-->sqe1-->sqe2(drain req)
>>>>>>>> sqe2 is designed to issue after sqe0 and sqe1 completed, but since sqe0
>>>>>>>> is a multishot poll request, sqe2 may be issued after sqe0's event
>>>>>>>> triggered twice before sqe1 completed. This isn't what users leverage
>>>>>>>> drain requests for.
>>>>>>>> Here a simple solution is to ignore all multishot poll cqes, which means
>>>>>>>> drain requests won't wait for those requests to be done.
>>>>>>>>
>>>>>>>> Signed-off-by: Hao Xu <[email protected]>
>>>>>>>> ---
>>>>>>>>      fs/io_uring.c | 9 +++++++--
>>>>>>>>      1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>>>>> index 513096759445..cd6d44cf5940 100644
>>>>>>>> --- a/fs/io_uring.c
>>>>>>>> +++ b/fs/io_uring.c
>>>>>>>> @@ -455,6 +455,7 @@ struct io_ring_ctx {
>>>>>>>>          struct callback_head        *exit_task_work;
>>>>>>>>            struct wait_queue_head        hash_wait;
>>>>>>>> +    unsigned                        multishot_cqes;
>>>>>>>>            /* Keep this last, we don't need it for the fast path */
>>>>>>>>          struct work_struct        exit_work;
>>>>>>>> @@ -1181,8 +1182,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>>>>>>>>          if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>>>>>>>>              struct io_ring_ctx *ctx = req->ctx;
>>>>>>>>      -        return seq != ctx->cached_cq_tail
>>>>>>>> -                + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>>> +        return seq + ctx->multishot_cqes != ctx->cached_cq_tail
>>>>>>>> +            + READ_ONCE(ctx->cached_cq_overflow);
>>>>>>>>          }
>>>>>>>>            return false;
>>>>>>>> @@ -4897,6 +4898,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>>>      {
>>>>>>>>          struct io_ring_ctx *ctx = req->ctx;
>>>>>>>>          unsigned flags = IORING_CQE_F_MORE;
>>>>>>>> +    bool multishot_poll = !(req->poll.events & EPOLLONESHOT);
>>>>>>>>            if (!error && req->poll.canceled) {
>>>>>>>>              error = -ECANCELED;
>>>>>>>> @@ -4911,6 +4913,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>>>>>>>>              req->poll.done = true;
>>>>>>>>              flags = 0;
>>>>>>>>          }
>>>>>>>> +    if (multishot_poll)
>>>>>>>> +        ctx->multishot_cqes++;
>>>>>>>> +
>>>>>>>
>>>>>>> We need to make sure we do that only for a non-final complete, i.e.
>>>>>>> not killing request, otherwise it'll double account the last one.
>>>>>> Hi Pavel, I saw that a killing request like iopoll_remove or
>>>>>> async_cancel calls io_cqring_fill_event() to create an ECANCELED cqe
>>>>>> for the original poll request. So there could be cases like these
>>>>>> (even for a single poll request):
>>>>>>     (1) add poll --> cancel poll, an ECANCELED cqe.
>>>>>>         1 sqe : 1 cqe, all good
>>>>>>     (2) add poll --> trigger event (queued to task_work) --> cancel poll,
>>>>>>         an ECANCELED cqe --> task_work runs, another ECANCELED cqe.
>>>>>>         1 sqe : 2 cqes
>>>>>
>>>>> Those should emit a CQE on behalf of the request they're cancelling
>>>>> only when it's definitely cancelled and not going to fill it
>>>>> itself. E.g. if io_poll_cancel() found it and removed from
>>>>> all the list and core's poll infra.
>>>>>
>>>>> At least before multi-cqe it should have been working fine.
>>>>>
>>>> I haven't tested this, but from the code logic, the case below could
>>>> happen:
>>>>
>>>> io_poll_add()                         | io_poll_remove
>>>> (event happen)io_poll_wake()          | io_poll_remove_one
>>>>                                         | io_poll_remove_waitqs
>>>>                                         | io_cqring_fill_event(-ECANCELED)
>>>>                                         |
>>>> task_work run(io_poll_task_func)      |
>>>> io_poll_complete()                    |
>>>> req->poll.canceled is true, \         |
>>>> __io_cqring_fill_event(-ECANCELED)    |
>>>>
>>>> two ECANCELED cqes, is there anything I missed?
>>>
>>> Definitely may be, but need to take a closer look
>>>
>> I'll run some tests to check whether this issue exists, and make changes
>> if it does.
> 
> How about something like this? Seems pointless to have an extra
> variable for this, when we already track if we're going to do more
> completions for this event or not. Also places the variable where
> it makes the most sense, and plenty of pad space there too.
> 
> Warning: totally untested. Would be great if you could, and hoping
> you're going to send out a v2.
> 
I'm writing a test for it, will send them together soon.
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index f94b32b43429..1eea4998ad9b 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -423,6 +423,7 @@ struct io_ring_ctx {
>   		unsigned		cq_mask;
>   		atomic_t		cq_timeouts;
>   		unsigned		cq_last_tm_flush;
> +		unsigned		cq_extra;
>   		unsigned long		cq_check_overflow;
>   		struct wait_queue_head	cq_wait;
>   		struct fasync_struct	*cq_fasync;
> @@ -1183,8 +1184,8 @@ static bool req_need_defer(struct io_kiocb *req, u32 seq)
>   	if (unlikely(req->flags & REQ_F_IO_DRAIN)) {
>   		struct io_ring_ctx *ctx = req->ctx;
>   
> -		return seq != ctx->cached_cq_tail
> -				+ READ_ONCE(ctx->cached_cq_overflow);
> +		return seq + ctx->cq_extra != ctx->cached_cq_tail
> +			+ READ_ONCE(ctx->cached_cq_overflow);
>   	}
>   
>   	return false;
> @@ -4894,6 +4895,9 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask, int error)
>   		req->poll.done = true;
>   		flags = 0;
>   	}
> +	if (flags & IORING_CQE_F_MORE)
> +		ctx->cq_extra++;
> +
>   	io_commit_cqring(ctx);
>   	return !(flags & IORING_CQE_F_MORE);
>   }
> 



Thread overview: 10+ messages
2021-03-31  9:01 [PATCH for-5.13] io_uring: maintain drain requests' logic Hao Xu
2021-03-31 15:36 ` Jens Axboe
2021-04-01  6:58   ` Hao Xu
2021-03-31 22:06 ` Pavel Begunkov
2021-04-01  6:53   ` Hao Xu
2021-04-01 10:25     ` Pavel Begunkov
     [not found]       ` <[email protected]>
2021-04-01 22:29         ` Pavel Begunkov
2021-04-03  6:58           ` Hao Xu
2021-04-04 23:07             ` Jens Axboe
2021-04-05 16:11               ` Hao Xu
