From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>, Ming Lei <[email protected]>
Cc: [email protected], [email protected],
Kanchan Joshi <[email protected]>
Subject: Re: (subset) [PATCH 00/11] remove aux CQE caches
Date: Sun, 17 Mar 2024 16:24:07 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 3/17/24 4:07 PM, Jens Axboe wrote:
> On 3/17/24 3:51 PM, Jens Axboe wrote:
>> On 3/17/24 3:47 PM, Pavel Begunkov wrote:
>>> On 3/17/24 21:34, Pavel Begunkov wrote:
>>>> On 3/17/24 21:32, Jens Axboe wrote:
>>>>> On 3/17/24 3:29 PM, Pavel Begunkov wrote:
>>>>>> On 3/17/24 21:24, Jens Axboe wrote:
>>>>>>> On 3/17/24 2:55 PM, Pavel Begunkov wrote:
>>>>>>>> On 3/16/24 13:56, Ming Lei wrote:
>>>>>>>>> On Sat, Mar 16, 2024 at 01:27:17PM +0000, Pavel Begunkov wrote:
>>>>>>>>>> On 3/16/24 11:52, Ming Lei wrote:
>>>>>>>>>>> On Fri, Mar 15, 2024 at 04:53:21PM -0600, Jens Axboe wrote:
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>>> The following two errors can be triggered with this patchset
>>>>>>>>>>> when running some ublk stress tests (io vs. deletion), and I don't see
>>>>>>>>>>> such failures after reverting the 11 patches.
>>>>>>>>>>
>>>>>>>>>> I suppose it's with the fix from yesterday. How can I
>>>>>>>>>> reproduce it, blktests?
>>>>>>>>>
>>>>>>>>> Yeah, it needs yesterday's fix.
>>>>>>>>>
>>>>>>>>> You may need to run this test multiple times to trigger the problem:
>>>>>>>>
>>>>>>>> Thanks for all the testing. I've tried it, all ublk/generic tests hang
>>>>>>>> in userspace waiting for CQEs but no complaints from the kernel.
>>>>>>>> However, it seems the branch is buggy even without my patches, I
>>>>>>>> consistently (5-15 minutes of running in a slow VM) hit page underflow
>>>>>>>> by running liburing tests. Not sure what that is yet, but it might also
>>>>>>>> be the reason.
>>>>>>>
>>>>>>> Hmm odd, there's nothing in there but your series and then the
>>>>>>> io_uring-6.9 bits pulled in. Maybe it hit an unfortunate point in the
>>>>>>> merge window -git cycle? Does it happen with io_uring-6.9 as well? I
>>>>>>> haven't seen anything odd.
>>>>>>
>>>>>> Need to test io_uring-6.9. I actually checked the branch twice, both times
>>>>>> with the issue, and from the full recompilation and config prompts I assumed
>>>>>> you pulled something in between (maybe not).
>>>>>>
>>>>>> And yeah, I can't confirm it's specifically an io_uring bug, the
>>>>>> stack trace is usually some unmap or task exit, sometimes it only
>>>>>> shows when you try to shut down the VM after tests.
>>>>>
>>>>> Funky. I just ran a bunch of loops of liburing tests and Ming's ublksrv
>>>>> test case as well on io_uring-6.9 and it all worked fine. Trying
>>>>> liburing tests on for-6.10/io_uring as well now, but didn't see anything
>>>>> the other times I ran it. In any case, once you repost I'll rebase and
>>>>> then let's see if it hits again.
>>>>>
>>>>> Did you run with KASAN enabled?
>>>>
>>>> Yes, it's a debug kernel, full on KASANs, lockdeps and so on
>>>
>>> And another note, I triggered it once (IIRC on shutdown) with ublk
>>> tests only, w/o liburing/tests, which likely limits it to either the core
>>> io_uring infra or non-io_uring bugs.
>>
>> Been running on for-6.10/io_uring, and the only odd thing I see is that
>> the test output tends to stall here:
>>
>> Running test read-before-exit.t
>>
>> which then either leads to a connection disconnect from my ssh into that
>> vm, or just a long delay and then it picks up again. This did not happen
>> with io_uring-6.9.
>>
>> Maybe related? At least it's something new. Just checked again, and yeah
>> it seems to totally lock up the vm while that is running. Will try a
>> quick bisect of that series.
>
> Seems to be triggered by the top of branch patch in there, my poll and
> timeout special casing. While the above test case runs with that commit,
> it'll freeze the host.

Had a feeling this was the busy looping on cancelations, and flushing
the fallback task_work seems to fix it. I'll check more tomorrow.

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index a2cb8da3cc33..f1d3c5e065e9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3242,6 +3242,8 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 	ret |= io_kill_timeouts(ctx, task, cancel_all);
 	if (task)
 		ret |= io_run_task_work() > 0;
+	else if (ret)
+		flush_delayed_work(&ctx->fallback_work);
 	return ret;
 }
 
--
Jens Axboe
Thread overview: 54+ messages
2024-03-15 15:29 [PATCH 00/11] remove aux CQE caches Pavel Begunkov
2024-03-15 15:29 ` [PATCH 01/11] io_uring: fix poll_remove stalled req completion Pavel Begunkov
2024-03-15 15:29 ` [PATCH 02/11] io_uring/cmd: kill one issue_flags to tw conversion Pavel Begunkov
2024-03-15 15:29 ` [PATCH 03/11] io_uring/cmd: fix tw <-> issue_flags conversion Pavel Begunkov
2024-03-15 15:29 ` [PATCH 04/11] io_uring/cmd: introduce io_uring_cmd_complete Pavel Begunkov
2024-03-15 15:29 ` [PATCH 05/11] ublk: don't hard code IO_URING_F_UNLOCKED Pavel Begunkov
2024-03-15 15:29 ` [PATCH 06/11] nvme/io_uring: " Pavel Begunkov
2024-03-15 15:29 ` [PATCH 07/11] io_uring/rw: avoid punting to io-wq directly Pavel Begunkov
2024-03-15 15:29 ` [PATCH 08/11] io_uring: force tw ctx locking Pavel Begunkov
2024-03-15 15:40 ` Jens Axboe
2024-03-15 16:14 ` Pavel Begunkov
2024-03-15 15:29 ` [PATCH 09/11] io_uring: remove struct io_tw_state::locked Pavel Begunkov
2024-03-15 15:30 ` [PATCH 10/11] io_uring: refactor io_fill_cqe_req_aux Pavel Begunkov
2024-03-15 15:30 ` [PATCH 11/11] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
2024-03-15 16:20 ` Jens Axboe
2024-03-15 16:23 ` Pavel Begunkov
2024-03-15 16:25 ` Jens Axboe
2024-03-15 16:27 ` Jens Axboe
2024-03-15 16:44 ` Pavel Begunkov
2024-03-15 16:49 ` Jens Axboe
2024-03-15 17:26 ` Pavel Begunkov
2024-03-15 18:26 ` Jens Axboe
2024-03-15 18:51 ` Pavel Begunkov
2024-03-15 19:02 ` Jens Axboe
2024-03-15 16:29 ` Pavel Begunkov
2024-03-15 16:33 ` Jens Axboe
2024-03-15 15:42 ` [PATCH 00/11] remove aux CQE caches Jens Axboe
2024-03-15 16:00 ` Jens Axboe
2024-03-15 22:53 ` (subset) " Jens Axboe
2024-03-16 2:03 ` Ming Lei
2024-03-16 2:24 ` Ming Lei
2024-03-16 2:54 ` Pavel Begunkov
2024-03-16 3:54 ` Ming Lei
2024-03-16 4:13 ` Pavel Begunkov
2024-03-16 4:20 ` Pavel Begunkov
2024-03-16 9:53 ` Ming Lei
2024-03-16 11:52 ` Ming Lei
2024-03-16 13:27 ` Pavel Begunkov
2024-03-16 13:56 ` Ming Lei
2024-03-17 20:55 ` Pavel Begunkov
2024-03-17 21:24 ` Jens Axboe
2024-03-17 21:29 ` Pavel Begunkov
2024-03-17 21:32 ` Jens Axboe
2024-03-17 21:34 ` Pavel Begunkov
2024-03-17 21:47 ` Pavel Begunkov
2024-03-17 21:51 ` Jens Axboe
2024-03-17 22:07 ` Jens Axboe
2024-03-17 22:24 ` Jens Axboe [this message]
2024-03-18 0:15 ` Ming Lei
2024-03-18 1:34 ` Jens Axboe
2024-03-18 1:44 ` Jens Axboe
2024-03-18 1:49 ` Ming Lei
2024-03-17 23:16 ` Pavel Begunkov
2024-03-16 14:39 ` Jens Axboe