From: Xan Charbonnet <[email protected]>
To: Pavel Begunkov <[email protected]>,
Salvatore Bonaccorso <[email protected]>
Cc: [email protected], Jens Axboe <[email protected]>,
Bernhard Schmidt <[email protected]>,
[email protected], [email protected]
Subject: Re: Bug#1093243: Upgrade to 6.1.123 kernel causes mariadb hangs
Date: Fri, 24 Jan 2025 10:30:18 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 1/24/25 04:33, Pavel Begunkov wrote:
> Thanks for narrowing it down. Xan, can you try this change please?
> Waiters can miss wake ups without it, seems to match the description.
>
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 9b58ba4616d40..e5a8ee944ef59 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -592,8 +592,10 @@ static inline void __io_cq_unlock_post_flush(struct io_ring_ctx *ctx)
>  	io_commit_cqring(ctx);
>  	spin_unlock(&ctx->completion_lock);
>  	io_commit_cqring_flush(ctx);
> -	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
> +	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
> +		smp_mb();
>  		__io_cqring_wake(ctx);
> +	}
>  }
>  
>  void io_cq_unlock_post(struct io_ring_ctx *ctx)
>
Thanks Pavel! Early results look very good for this change. I'm now
running 6.1.120 with your added smp_mb() call. The backup process which
had been quickly triggering the issue has been running longer than it
ever did when it would ultimately fail. So that's great!
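Pavel's point that waiters can miss wakeups without the barrier is an
instance of the store-buffering pattern: the waker stores the new CQ tail
and then checks for sleepers without a lock, while the waiter registers
as a sleeper and then re-checks the tail. Unless both sides have a full
barrier between their store and their load, each load can miss the other
side's store, so the waker skips the wakeup and the waiter goes to sleep
anyway. The C sketch below is a userspace model of that handshake, not
the io_uring code: struct ring, post_completion(), wait_for() and
run_demo() are hypothetical names, pthread condition variables stand in
for the kernel wait queue, and it assumes the wake path's "is anyone
waiting?" test is lockless, as a waitqueue_active()-style check is.

```c
/* Userspace model of a lost-wakeup race; NOT kernel code.
 * Build with: cc -std=c11 -pthread */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the ring: `tail` plays the CQ tail and
 * `sleepers` plays a lockless waitqueue_active()-style check. */
struct ring {
	atomic_uint tail;
	atomic_int sleepers;
	pthread_mutex_t lock;	/* models the wait queue's internal lock */
	pthread_cond_t cond;
};

/* Waker side, loosely modelled on __io_cq_unlock_post_flush(). */
static void post_completion(struct ring *r)
{
	/* Publish one completion with a release store (single producer). */
	unsigned t = atomic_load_explicit(&r->tail, memory_order_relaxed);
	atomic_store_explicit(&r->tail, t + 1, memory_order_release);

	/* The smp_mb() from the patch. A release store does not order
	 * later loads, so without this fence the sleeper check below may
	 * read a stale 0 while the waiter reads a stale tail. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&r->sleepers, memory_order_relaxed) > 0) {
		pthread_mutex_lock(&r->lock);
		pthread_cond_broadcast(&r->cond);
		pthread_mutex_unlock(&r->lock);
	}
}

/* Waiter side: register as a sleeper, full barrier, re-check, sleep. */
static unsigned wait_for(struct ring *r, unsigned target)
{
	unsigned seen;

	pthread_mutex_lock(&r->lock);
	atomic_fetch_add_explicit(&r->sleepers, 1, memory_order_relaxed);
	/* Pairs with the fence in post_completion(); the kernel's wait
	 * path gets a full barrier from set_current_state(). */
	atomic_thread_fence(memory_order_seq_cst);
	while ((seen = atomic_load_explicit(&r->tail, memory_order_acquire)) < target)
		pthread_cond_wait(&r->cond, &r->lock);
	atomic_fetch_sub_explicit(&r->sleepers, 1, memory_order_relaxed);
	pthread_mutex_unlock(&r->lock);
	return seen;
}

struct demo {
	struct ring ring;
	unsigned n;
};

static void *producer(void *arg)
{
	struct demo *d = arg;

	for (unsigned i = 0; i < d->n; i++)
		post_completion(&d->ring);
	return NULL;
}

/* Hypothetical harness: a consumer waits for each of n completions in
 * turn. A lost wakeup would show up as a hang, not as a wrong count. */
unsigned run_demo(unsigned n)
{
	struct demo d = { .n = n };
	pthread_t t;

	atomic_init(&d.ring.tail, 0);
	atomic_init(&d.ring.sleepers, 0);
	pthread_mutex_init(&d.ring.lock, NULL);
	pthread_cond_init(&d.ring.cond, NULL);

	pthread_create(&t, NULL, producer, &d);
	for (unsigned target = 1; target <= n; target++) {
		unsigned seen = wait_for(&d.ring, target);
		assert(seen >= target);
		(void)seen;
	}
	pthread_join(t, NULL);

	pthread_cond_destroy(&d.ring.cond);
	pthread_mutex_destroy(&d.ring.lock);
	return atomic_load(&d.ring.tail);
}
```

With both fences in place, run_demo(10000) should return 10000.
Removing the fence from post_completion() reopens the store-buffering
window; whether it is actually hit depends on the CPU and compiler, and a
hit would show up as run_demo() hanging rather than returning a wrong
count.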
One sour note: overnight, replication hung on this machine, which is
another failure that started happening with the jump from 6.1.119 to
6.1.123. The machine was running 6.1.124 with the
__io_cq_unlock_post_flush function removed completely. That's the
kernel we had celebrated yesterday for running the backup process
successfully.
So, we might have two separate issues to deal with, unfortunately.
This morning, I found that replication had hung and was some 35,000
seconds behind. I attached gdb and then detached it, which got things
moving again (which strongly suggests that this is a very closely
related issue). Then it hung again at about 25,000 seconds behind. At
that point I rebooted into the new kernel, the 6.1.120 kernel with the
added smp_mb() call. The lag is now all the way down to 5,000 seconds,
with no further hangs.
It looks like there are 5 io_uring-related patches in 6.1.122 and
another one in 6.1.123. My guess is that replication is hitting a
problem introduced by one of those.
Unfortunately, a replication hang is much harder for me to reproduce
than the issue with the backup procedure, which always failed within 15
minutes. So far, the patched 6.1.120 does not appear to have the hang,
but it's hard to be 100% certain. Perhaps the next step is to apply the
extra smp_mb() call to 6.1.123 and see whether I can get replication to
hang.