public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: Marcelo Diop-Gonzalez <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH v2 2/2] io_uring: flush timeouts that should already have expired
Date: Sat, 2 Jan 2021 20:26:26 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 02/01/2021 19:54, Pavel Begunkov wrote:
> On 19/12/2020 19:15, Marcelo Diop-Gonzalez wrote:
>> Right now io_flush_timeouts() checks if the current number of events
>> is equal to ->timeout.target_seq, but this will miss some timeouts if
>> there have been more than 1 event added since the last time they were
>> flushed (possible in io_submit_flush_completions(), for example). Fix
>> it by recording the starting value of ->cached_cq_tail -
>> ->cq_timeouts instead of the target value, so that we can safely
>> (without overflow problems) compare the number of events that have
>> happened with the number of events needed to trigger the timeout.

https://www.spinics.net/lists/kernel/msg3475160.html

The idea was to replace u32 cached_cq_tail with u64 while keeping
timeout offsets u32. Assuming that we won't ever hit ~2^62 inflight
requests, complete all requests falling into some large enough window
behind that u64 cached_cq_tail.

Simplifying:

s64 d = target_off - ctx->u64_cq_tail;
if (d <= 0 && d > -(1LL << 32))
	complete_it();

Not fond of it, but at least it worked at that time. You can try out
this approach if you want, but it would be perfect if you found
something more elegant :)
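
To make the window check above concrete, here is a minimal userspace
sketch, not kernel code. It assumes the target is kept as a full 64-bit
sequence (the tail at submission time plus the u32 offset);
timeout_should_complete() and its parameter names are made up for
illustration and do not exist in fs/io_uring.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a timeout with 64-bit target
 * sequence target_seq should be completed, given a 64-bit CQ tail.
 * It fires once the tail has reached the target, as long as the tail
 * is fewer than 2^32 events past it.
 */
static bool timeout_should_complete(uint64_t target_seq, uint64_t cq_tail)
{
	/*
	 * The unsigned subtraction wraps modulo 2^64. Converting the
	 * result to int64_t gives the signed distance on two's complement
	 * targets (implementation-defined before C23, but this is what
	 * gcc and clang do).
	 */
	int64_t d = (int64_t)(target_seq - cq_tail);

	return d <= 0 && d > -((int64_t)1 << 32);
}
```

With this, a tail that jumps several events past the target in one go
still completes the timeout, while a tail that is still short of the
target (d > 0) leaves it pending.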

>>
>> Signed-off-by: Marcelo Diop-Gonzalez <[email protected]>
>> ---
>>  fs/io_uring.c | 30 +++++++++++++++++++++++-------
>>  1 file changed, 23 insertions(+), 7 deletions(-)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index f394bf358022..f62de0cb5fc4 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -444,7 +444,7 @@ struct io_cancel {
>>  struct io_timeout {
>>  	struct file			*file;
>>  	u32				off;
>> -	u32				target_seq;
>> +	u32				start_seq;
>>  	struct list_head		list;
>>  	/* head of the link, used by linked timeouts only */
>>  	struct io_kiocb			*head;
>> @@ -1629,6 +1629,24 @@ static void __io_queue_deferred(struct io_ring_ctx *ctx)
>>  	} while (!list_empty(&ctx->defer_list));
>>  }
>>  
>> +static inline u32 io_timeout_events_left(struct io_kiocb *req)
>> +{
>> +	struct io_ring_ctx *ctx = req->ctx;
>> +	u32 events;
>> +
>> +	/*
>> +	 * events -= req->timeout.start_seq and the comparison between
>> +	 * ->timeout.off and events will not overflow because each time
>> +	 * ->cq_timeouts is incremented, ->cached_cq_tail is incremented too.
>> +	 */
>> +
>> +	events = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
>> +	events -= req->timeout.start_seq;
> 
> It looks to me that events, before the start_seq subtraction, may already have
> wrapped around start_seq.
> 
> e.g.
> 1) you submit a timeout with off=0xff...ff (start_seq=0 for convenience)
> 
> 2) some time has passed, let @events = 0xff...ff - 1
> so the timeout still waits
> 
> 3) we commit 5 requests at once and call io_commit_cqring() only once for
> them, so we get @events == 0xff...ff - 1 + 5, which wraps to 3
> 
> @events == 3 < off == 0xff...ff,
> so we didn't trigger our timeout even though we should have
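
The scenario can be replayed in userspace with a rough sketch like the
one below. It only mirrors the u32 arithmetic of
io_timeout_events_left() from the quoted patch; the ring context is
reduced to plain integers, and events_left() with its parameters is a
stand-in written for this example, not kernel code.

```c
#include <stdint.h>

/*
 * Stand-in for io_timeout_events_left(): seq_now plays the role of
 * ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts), all in 32 bits.
 */
static uint32_t events_left(uint32_t seq_now, uint32_t start_seq, uint32_t off)
{
	/* Wraps modulo 2^32, exactly like the u32 math in the patch. */
	uint32_t events = seq_now - start_seq;

	if (off > events)
		return off - events;
	return 0;
}
```

With start_seq = 0 and off = 0xffffffff, events_left(0xfffffffe, 0,
0xffffffff) is 1, so the timeout is still pending. Five more completions
committed at once take the sequence to 0xfffffffe + 5, which is 3 modulo
2^32, and events_left(3, 0, 0xffffffff) comes out as 0xfffffffc instead
of 0, so the timeout is not flushed.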
> 
>> +	if (req->timeout.off > events)
>> +		return req->timeout.off - events;
>> +	return 0;
>> +}
>> +
>>  static void io_flush_timeouts(struct io_ring_ctx *ctx)
>>  {
>>  	while (!list_empty(&ctx->timeout_list)) {
>> @@ -1637,8 +1655,7 @@ static void io_flush_timeouts(struct io_ring_ctx *ctx)
>>  
>>  		if (io_is_timeout_noseq(req))
>>  			break;
>> -		if (req->timeout.target_seq != ctx->cached_cq_tail
>> -					- atomic_read(&ctx->cq_timeouts))
>> +		if (io_timeout_events_left(req) > 0)
>>  			break;
>>  
>>  		list_del_init(&req->timeout.list);
>> @@ -5785,7 +5802,6 @@ static int io_timeout(struct io_kiocb *req)
>>  	struct io_ring_ctx *ctx = req->ctx;
>>  	struct io_timeout_data *data = req->async_data;
>>  	struct list_head *entry;
>> -	u32 tail, off = req->timeout.off;
>>  
>>  	spin_lock_irq(&ctx->completion_lock);
>>  
>> @@ -5799,8 +5815,8 @@ static int io_timeout(struct io_kiocb *req)
>>  		goto add;
>>  	}
>>  
>> -	tail = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
>> -	req->timeout.target_seq = tail + off;
>> +	req->timeout.start_seq = ctx->cached_cq_tail -
>> +		atomic_read(&ctx->cq_timeouts);
>>  
>>  	/*
>>  	 * Insertion sort, ensuring the first entry in the list is always
>> @@ -5813,7 +5829,7 @@ static int io_timeout(struct io_kiocb *req)
>>  		if (io_is_timeout_noseq(nxt))
>>  			continue;
>>  		/* nxt.seq is behind @tail, otherwise would've been completed */
>> -		if (off >= nxt->timeout.target_seq - tail)
>> +		if (req->timeout.off >= io_timeout_events_left(nxt))
>>  			break;
>>  	}
>>  add:
>>
> 

-- 
Pavel Begunkov


