From: Pavel Begunkov <[email protected]>
To: Marcelo Diop-Gonzalez <[email protected]>
Cc: Jens Axboe <[email protected]>, io-uring <[email protected]>
Subject: Re: [PATCH v2 2/2] io_uring: flush timeouts that should already have expired
Date: Thu, 14 Jan 2021 21:04:23 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 14/01/2021 00:46, Marcelo Diop-Gonzalez wrote:
> On Tue, Jan 12, 2021 at 08:47:11PM +0000, Pavel Begunkov wrote:
>> On 08/01/2021 15:57, Marcelo Diop-Gonzalez wrote:
>>> On Sat, Jan 02, 2021 at 08:26:26PM +0000, Pavel Begunkov wrote:
>>>> On 02/01/2021 19:54, Pavel Begunkov wrote:
>>>>> On 19/12/2020 19:15, Marcelo Diop-Gonzalez wrote:
>>>>>> Right now io_flush_timeouts() checks if the current number of events
>>>>>> is equal to ->timeout.target_seq, but this will miss some timeouts if
>>>>>> there have been more than 1 event added since the last time they were
>>>>>> flushed (possible in io_submit_flush_completions(), for example). Fix
>>>>>> it by recording the starting value of ->cached_cq_overflow -
>>>>>> ->cq_timeouts instead of the target value, so that we can safely
>>>>>> (without overflow problems) compare the number of events that have
>>>>>> happened with the number of events needed to trigger the timeout.
>>>>
>>>> https://www.spinics.net/lists/kernel/msg3475160.html
>>>>
>>>> The idea was to replace u32 cached_cq_tail with u64 while keeping
>>>> timeout offsets u32. Assuming that we won't ever hit ~2^62 inflight
>>>> requests, complete all requests falling into some large enough window
>>>> behind that u64 cached_cq_tail.
>>>>
>>>> simplifying:
>>>>
>>>> 	s64 d = target_off - ctx->u64_cq_tail;
>>>> 	if (d <= 0 && d > -(1LL << 32))
>>>> 		complete_it();
>>>>
>>>> Not fond of it, but at least it worked at the time. You can try out
>>>> this approach if you want, but it would be perfect if you could find
>>>> something more elegant :)
>>>>
>>>
>>> What do you think about something like this? I think it's not totally
>>> correct, because it relies on holding ->completion_lock in io_timeout()
>>> so that ->cq_last_tm_flush is updated. But with IORING_SETUP_IOPOLL,
>>> io_iopoll_complete() doesn't take that lock, and ->uring_lock will not
>>> be held if io_timeout() is called from io_wq_submit_work(). Maybe it
>>> could still be worth it, since that was possibly already a problem?
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index cb57e0360fcb..50984709879c 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -353,6 +353,7 @@ struct io_ring_ctx {
>>> unsigned cq_entries;
>>> unsigned cq_mask;
>>> atomic_t cq_timeouts;
>>> + unsigned cq_last_tm_flush;
>>
>> It looks like "last flush" is a good direction.
>> I think there can be problems at extremes like completing 2^32
>> requests at once, but it should be okay in practice. Anyway, it's
>> better than what we have now.
>>
>> What about the first patch about overflows and cq_timeouts? I
>> assume that problem is still there, isn't it?
>>
>> See comments below, but if it passes liburing tests, please send
>> a patch.
>>
>>> unsigned long cq_check_overflow;
>>> struct wait_queue_head cq_wait;
>>> struct fasync_struct *cq_fasync;
>>> @@ -1633,19 +1634,26 @@ static void __io_queue_deferred(struct io_ring_ctx *ctx)
>>>
>>> static void io_flush_timeouts(struct io_ring_ctx *ctx)
>>> {
>>> + u32 seq = ctx->cached_cq_tail - atomic_read(&ctx->cq_timeouts);
>>> +
>>
>> a nit,
>>
>> if (list_empty()) return; + do {} while();
>
> Ah btw, so then we would have to add ->last_flush = seq in
> io_timeout() too? I think that should be correct but just wanna make
> sure that's what you meant. Because otherwise, if list_empty() is true
> for a while without ->last_flush being updated, there could be
> problems. For example: if there are no timeouts for a while and
> seq == 2^32-2, then we add a timeout with off == 4. If last_flush is
> still 0, then target - last_flush == 2 while seq - last_flush == 2^32-2,
> so the timeout would wrongly look expired right away.
You've just answered your own question :) You need to update it somehow:
either unconditionally on commit, in io_timeout(), or anywhere else that
works. Btw, I like your idea of doing it in io_timeout(), because it adds
a timeout anyway, so it makes list_empty() fail and kind of keeps all
that tracking up to date automatically.
--
Pavel Begunkov