public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: Nadav Amit <[email protected]>
Cc: [email protected]
Subject: Re: Race between io_wqe_worker() and io_wqe_wake_worker()
Date: Tue, 3 Aug 2021 07:22:11 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 8/2/21 7:05 PM, Nadav Amit wrote:
> Hello Jens,
> 
> I encountered an issue, which appears to be a race between
> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
> this issue and whether I am missing something, since this seems to
> occur in a common scenario. Your feedback (or fix ;-)) would be
> appreciated.
> 
> I run a workload on 5.13 that issues multiple async read operations
> that should run concurrently. Some read operations may not complete
> for an unbounded time (e.g., a read from a pipe that is never written
> to). The problem is that occasionally another read operation that
> should complete gets stuck. My understanding, based on debugging and
> the code, is that the following race (or similar) occurs:
> 
> 
>   cpu0					cpu1
>   ----					----
> 					io_wqe_worker()
> 					 schedule_timeout()
> 					 // timed out
>   io_wqe_enqueue()
>    io_wqe_wake_worker()
>     // work_flags & IO_WQ_WORK_CONCURRENT
>     io_wqe_activate_free_worker()
> 					 io_worker_exit()
> 
> 
> Basically, io_wqe_wake_worker() can find a worker, but this worker is
> about to exit and is not going to process further work. Once the
> worker exits, the concurrency level decreases and async work might be
> blocked behind other work. I had a look at 5.14, but did not see
> anything that might address this issue.
> 
> Am I missing something?
> 
> If not, all my ideas for a solution are either complicated (track
> the required concurrency level) or relaxed (spawn another worker in
> io_worker_exit() if the work_list of unbounded work is not empty).
> 
> As said, feedback would be appreciated.

You are right, there's definitely a race here: the freelist check can
find a worker that is already on its way to exiting. Let me mull over
this a bit, I'll post something for you to try later today.

-- 
Jens Axboe


Thread overview: 11+ messages
2021-08-03  1:05 Race between io_wqe_worker() and io_wqe_wake_worker() Nadav Amit
2021-08-03 13:22 ` Jens Axboe [this message]
2021-08-03 14:37   ` Jens Axboe
2021-08-03 17:25     ` Hao Xu
2021-08-03 18:04     ` Nadav Amit
2021-08-03 18:14       ` Jens Axboe
2021-08-03 19:20         ` Nadav Amit
2021-08-03 19:24           ` Jens Axboe
2021-08-03 19:53             ` Jens Axboe
2021-08-03 21:16               ` Nadav Amit
2021-08-03 21:25                 ` Jens Axboe
