public inbox for [email protected]
 help / color / mirror / Atom feed
From: Hao Xu <[email protected]>
To: Jens Axboe <[email protected]>, Nadav Amit <[email protected]>
Cc: [email protected]
Subject: Re: Race between io_wqe_worker() and io_wqe_wake_worker()
Date: Wed, 4 Aug 2021 01:25:37 +0800	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

在 2021/8/3 下午10:37, Jens Axboe 写道:
> On 8/3/21 7:22 AM, Jens Axboe wrote:
>> On 8/2/21 7:05 PM, Nadav Amit wrote:
>>> Hello Jens,
>>>
>>> I encountered an issue, which appears to be a race between
>>> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
>>> this issue and whether I am missing something, since this seems to
>>> occur in a common scenario. Your feedback (or fix ;-)) would be
>>> appreciated.
>>>
>>> I run on 5.13 a workload that issues multiple async read operations
>>> that should run concurrently. Some read operations can not complete
>>> for unbounded time (e.g., read from a pipe that is never written to).
>>> The problem is that occasionally another read operation that should
>>> complete gets stuck. My understanding, based on debugging and the code
>>> is that the following race (or similar) occurs:
>>>
>>>
>>>    cpu0					cpu1
>>>    ----					----
>>> 					io_wqe_worker()
>>> 					 schedule_timeout()
>>> 					 // timed out
>>>    io_wqe_enqueue()
>>>     io_wqe_wake_worker()
>>>      // work_flags & IO_WQ_WORK_CONCURRENT
>>>      io_wqe_activate_free_worker()
>>> 					 io_worker_exit()
>>>
>>>
>>> Basically, io_wqe_wake_worker() can find a worker, but this worker is
>>> about to exit and is not going to process further work. Once the
>>> worker exits, the concurrency level decreases and async work might be
>>> blocked by another work. I had a look at 5.14, but did not see
>>> anything that might address this issue.
>>>
>>> Am I missing something?
>>>
>>> If not, all my ideas for a solution are either complicated (track
>>> required concurrency-level) or relaxed (span another worker on
>>> io_worker_exit if work_list of unbounded work is not empty).
>>>
>>> As said, feedback would be appreciated.
>>
>> You are right that there's definitely a race here between checking the
>> freelist and finding a worker, but that worker is already exiting. Let
>> me mull over this a bit, I'll post something for you to try later today.
> 
> Can you try something like this? Just consider it a first tester, need
> to spend a bit more time on it to ensure we fully close the gap.
> 
> 
> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index cf086b01c6c6..e2da2042ee9e 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -42,6 +42,7 @@ struct io_worker {
>   	refcount_t ref;
>   	unsigned flags;
>   	struct hlist_nulls_node nulls_node;
> +	unsigned long exiting;
>   	struct list_head all_list;
>   	struct task_struct *task;
>   	struct io_wqe *wqe;
> @@ -214,15 +215,20 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe)
>   	struct hlist_nulls_node *n;
>   	struct io_worker *worker;
>   
> -	n = rcu_dereference(hlist_nulls_first_rcu(&wqe->free_list));
> -	if (is_a_nulls(n))
> -		return false;
> -
> -	worker = hlist_nulls_entry(n, struct io_worker, nulls_node);
> -	if (io_worker_get(worker)) {
> -		wake_up_process(worker->task);
> +	/*
> +	 * Iterate free_list and see if we can find an idle worker to
> +	 * activate. If a given worker is on the free_list but in the process
> +	 * of exiting, keep trying.
> +	 */
> +	hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
> +		if (!io_worker_get(worker))
> +			continue;
> +		if (!test_bit(0, &worker->exiting)) {
> +			wake_up_process(worker->task);
> +			io_worker_release(worker);
> +			return true;
> +		}
>   		io_worker_release(worker);
> -		return true;
>   	}
>   
>   	return false;
> @@ -560,8 +566,17 @@ static int io_wqe_worker(void *data)
>   		if (ret)
>   			continue;
>   		/* timed out, exit unless we're the fixed worker */
> -		if (!(worker->flags & IO_WORKER_F_FIXED))
> +		if (!(worker->flags & IO_WORKER_F_FIXED)) {
> +			/*
> +			 * Someone elevated our refs, which could be trying
> +			 * to re-activate for work. Loop one more time for
> +			 * that case.
> +			 */
> +			if (refcount_read(&worker->ref) != 1)
> +				continue;
> +			set_bit(0, &worker->exiting);
>   			break;
> +		}
>   	}
>   
>   	if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
> 

refcount check may not be enough, we may need another bit worker->in_use
and:
     io_wqe_activate_free_worker                io_wqe_worker

      set_bit(worker->in_use)               set_bit(worker->exiting)
      !test_bit(worker->exiting)            test_bit(worker->in_use)
      wake_up(worker)                       goto handle work

  reply	other threads:[~2021-08-03 17:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-03  1:05 Race between io_wqe_worker() and io_wqe_wake_worker() Nadav Amit
2021-08-03 13:22 ` Jens Axboe
2021-08-03 14:37   ` Jens Axboe
2021-08-03 17:25     ` Hao Xu [this message]
2021-08-03 18:04     ` Nadav Amit
2021-08-03 18:14       ` Jens Axboe
2021-08-03 19:20         ` Nadav Amit
2021-08-03 19:24           ` Jens Axboe
2021-08-03 19:53             ` Jens Axboe
2021-08-03 21:16               ` Nadav Amit
2021-08-03 21:25                 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=12e58686-bffe-ba42-d7a3-20d35b26eaf7@linux.alibaba.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox