From: Pavel Begunkov <[email protected]>
To: Xan Charbonnet <[email protected]>, Jens Axboe <[email protected]>,
Salvatore Bonaccorso <[email protected]>
Cc: [email protected], Bernhard Schmidt <[email protected]>,
[email protected], [email protected],
[email protected]
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs
Date: Mon, 27 Jan 2025 16:49:26 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 1/26/25 22:48, Xan Charbonnet wrote:
> Since applying the final patch on Friday, I have seen no problems with either the backup snapshot or catching up with replication. It sure seems like things are all fixed. I haven't yet tried it on our production Galera cluster, but I expect to on Monday.
Great to hear that, thanks for the update. I've sent the fix;
hopefully it'll be merged in time for the next stable release.
> Here are Debian packages containing the modified kernel. Use at your own risk of course. Any feedback about how this works or doesn't work would be very helpful.
>
> https://charbonnet.com/linux-image-6.1.0-29-with-proposed-1093243-fix_amd64.deb
> https://charbonnet.com/linux-image-6.1.0-30-with-proposed-1093243-fix_amd64.deb
>
>
>
>
> On 1/24/25 14:51, Jens Axboe wrote:
>> On 1/24/25 1:33 PM, Salvatore Bonaccorso wrote:
>>> Hi Pavel,
>>>
>>> On Fri, Jan 24, 2025 at 06:40:51PM +0000, Pavel Begunkov wrote:
>>>> On 1/24/25 16:30, Xan Charbonnet wrote:
>>>>> On 1/24/25 04:33, Pavel Begunkov wrote:
>>>>>> Thanks for narrowing it down. Xan, can you try this change please?
>>>>>> Without it, waiters can miss wake-ups, which seems to match the description.
>>>>>>
>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>> index 9b58ba4616d40..e5a8ee944ef59 100644
>>>>>> --- a/io_uring/io_uring.c
>>>>>> +++ b/io_uring/io_uring.c
>>>>>> @@ -592,8 +592,10 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
>>>>>>  	io_commit_cqring(ctx);
>>>>>>  	spin_unlock(&ctx->completion_lock);
>>>>>>  	io_commit_cqring_flush(ctx);
>>>>>> -	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
>>>>>> +	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
>>>>>> +		smp_mb();
>>>>>>  		__io_cqring_wake(ctx);
>>>>>> +	}
>>>>>>  }
>>>>>>
>>>>>>  void io_cq_unlock_post(struct io_ring_ctx *ctx)
>>>>>>
>>>>>
>>>>>
>>>>> Thanks Pavel! Early results look very good for this change. I'm now running 6.1.120 with your added smp_mb() call. The backup process which had been quickly triggering the issue has been running longer than it ever did when it would ultimately fail. So that's great!
>>>>>
>>>>> One sour note: overnight, replication hung on this machine, which is another failure that started happening with the jump from 6.1.119 to 6.1.123. The machine was running 6.1.124 with the __io_cq_unlock_post_flush function removed completely. That's the kernel we had celebrated yesterday for running the backup process successfully.
>>>>>
>>>>> So, we might have two separate issues to deal with, unfortunately.
>>>>
>>>> Possible, but it could also be a side effect of reverting the patch.
>>>> Patches are usually backported either because they fix something
>>>> or because other fixes depend on them, and it's not yet apparent
>>>> to me what happened with this one.
>>>
>>> I searched the lists a bit, and there was an inclusion request on the
>>> stable list itself. Looking through the io-uring list I found
>>> https://lore.kernel.org/io-uring/CADZouDRFJ9jtXHqkX-PTKeT=GxSwdMC42zEsAKR34psuG9tUMQ@mail.gmail.com/
>>> which I think was the trigger for later including the commit
>>> in 6.1.120.
>>
>> Yep indeed, was just looking for the backstory and that is why it got
>> backported. Just missed the fact that it should've been an
>> io_cqring_wake() rather than __io_cqring_wake()...
>>
>
--
Pavel Begunkov
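
The missed wake-up discussed in this thread can be sketched in user-space C11. This is only an illustration under stated assumptions, not the kernel code: toy_ring, toy_wake_raw(), toy_post_completion(), toy_prepare_wait() and toy_demo() are hypothetical names, and atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(). It assumes the waiter first announces itself and then rechecks the completion count. With a full fence on both sides, at least one side is guaranteed to see the other's store; without the poster's fence, both could read stale values and the waiter could go to sleep with a completion already pending.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a completion ring; not the kernel's io_ring_ctx. */
struct toy_ring {
	atomic_uint cq_tail;    /* completions published so far */
	atomic_bool has_waiter; /* set by a task that is about to sleep */
	atomic_uint wakeups;    /* wake-ups delivered, kept for inspection */
};

/* Bare wake: checks for a waiter but provides no ordering of its own. */
static void toy_wake_raw(struct toy_ring *r)
{
	if (atomic_load_explicit(&r->has_waiter, memory_order_relaxed))
		atomic_fetch_add_explicit(&r->wakeups, 1, memory_order_relaxed);
}

/* Poster: publish the completion, full fence, then look for waiters. */
static void toy_post_completion(struct toy_ring *r)
{
	atomic_fetch_add_explicit(&r->cq_tail, 1, memory_order_relaxed);
	/* Plays the role of smp_mb(): orders the store above before the
	 * waiter check inside toy_wake_raw(). */
	atomic_thread_fence(memory_order_seq_cst);
	toy_wake_raw(r);
}

/* Waiter: announce itself, full fence, then recheck the tail.
 * Returns true if nothing new arrived and the caller should sleep. */
static bool toy_prepare_wait(struct toy_ring *r, unsigned seen_tail)
{
	atomic_store_explicit(&r->has_waiter, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&r->cq_tail, memory_order_relaxed) == seen_tail;
}

/* Single-threaded walk through the two orderings; returns wake-ups seen. */
static unsigned toy_demo(void)
{
	struct toy_ring r;

	atomic_init(&r.cq_tail, 0);
	atomic_init(&r.has_waiter, false);
	atomic_init(&r.wakeups, 0);

	/* Waiter registers first and sees no new completion: it would sleep. */
	assert(toy_prepare_wait(&r, 0));
	/* The poster then finds the waiter and delivers a wake-up. */
	toy_post_completion(&r);
	/* A waiter that rechecks afterwards sees the new tail and must not sleep. */
	assert(!toy_prepare_wait(&r, 0));

	return atomic_load(&r.wakeups);
}
```

Calling toy_demo() from a main() only exercises the sequential cases; the fence matters when toy_post_completion() and toy_prepare_wait() run concurrently on different CPUs.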