public inbox for [email protected]
 help / color / mirror / Atom feed
From: Dave Chinner <[email protected]>
To: Jens Axboe <[email protected]>
Cc: "Andres Freund" <[email protected]>,
	"Theodore Ts'o" <[email protected]>,
	"Thorsten Leemhuis" <[email protected]>,
	"Shreeya Patel" <[email protected]>,
	[email protected],
	"Ricardo Cañuelo" <[email protected]>,
	[email protected], [email protected],
	[email protected],
	"Linux regressions mailing list" <[email protected]>,
	[email protected]
Subject: Re: task hung in ext4_fallocate #2
Date: Wed, 25 Oct 2023 11:06:25 +1100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Tue, Oct 24, 2023 at 12:35:26PM -0600, Jens Axboe wrote:
> On 10/24/23 8:30 AM, Jens Axboe wrote:
> > I don't think this is related to the io-wq workers doing non-blocking
> > IO.

The io-wq worker that has deadlocked _must_ be doing blocking IO. If
it was doing non-blocking IO (i.e. IOCB_NOWAIT) then it would have
done a trylock and returned -EAGAIN to the worker for it to try
again later. I'm not sure that would avoid the issue, however - it
seems to me like it might just turn it into a livelock rather than a
deadlock....

> > The callback is eventually executed by the task that originally
> > submitted the IO, which is the owner and not the async workers. But...
> > If that original task is blocked in eg fallocate, then I can see how
> > that would potentially be an issue.
> > 
> > I'll take a closer look.
> 
> I think the best way to fix this is likely to have inode_dio_wait() be
> interruptible, and return -ERESTARTSYS if it should be restarted. Now
> the below is obviously not a full patch, but I suspect it'll make ext4
> and xfs tick, because they should both be affected.

How does that solve the problem? Nothing will issue a signal to the
process that is waiting in inode_dio_wait() except userspace, so I
can't see how this does anything to solve the problem at hand...

I'm also very leary of adding new error handling complexity to paths
like truncate, extent cloning, fallocate, etc which expect to block
on locks until they can perform the operation safely.

On further thinking, this could be a self deadlock with
just async direct IO submission - submit an async DIO with
IOCB_CALLER_COMP, then run an unaligned async DIO that attempts to
drain in-flight DIO before continuing. Then the thread waits in
inode_dio_wait() because it can't run the completion that will drop
the i_dio_count to zero.

Hence it appears to me that we've missed some critical constraints
around nesting IO submission and completion when using
IOCB_CALLER_COMP. Further, it really isn't clear to me how deep the
scope of this problem is yet, let alone what the solution might be.

With all this in mind, and how late this is in the 6.6 cycle, can we
just revert the IOCB_CALLER_COMP changes for now?

-Dave.
-- 
Dave Chinner
[email protected]

  reply	other threads:[~2023-10-25  0:06 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <[email protected]>
     [not found] ` <[email protected]>
     [not found]   ` <[email protected]>
2023-10-24  1:12     ` task hung in ext4_fallocate #2 Dave Chinner
2023-10-24  1:36       ` Andres Freund
2023-10-24 14:30       ` Jens Axboe
2023-10-24 18:35         ` Jens Axboe
2023-10-25  0:06           ` Dave Chinner [this message]
2023-10-25  0:34             ` Jens Axboe
2023-10-25 15:31               ` Andres Freund
2023-10-25 15:36                 ` Jens Axboe
2023-10-25 16:14                   ` Andres Freund
2023-10-26  2:48                     ` Andres Freund
2023-10-25 19:55                   ` Theodore Ts'o
2023-10-25 22:28               ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox