From: Jens Axboe <[email protected]>
To: Nadav Amit <[email protected]>
Cc: [email protected]
Subject: Re: Race between io_wqe_worker() and io_wqe_wake_worker()
Date: Tue, 3 Aug 2021 08:37:52 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 8/3/21 7:22 AM, Jens Axboe wrote:
> On 8/2/21 7:05 PM, Nadav Amit wrote:
>> Hello Jens,
>>
>> I encountered an issue, which appears to be a race between
>> io_wqe_worker() and io_wqe_wake_worker(). I am not sure how to address
>> this issue and whether I am missing something, since this seems to
>> occur in a common scenario. Your feedback (or fix ;-)) would be
>> appreciated.
>>
>> I run on 5.13 a workload that issues multiple async read operations
>> that should run concurrently. Some read operations can not complete
>> for unbounded time (e.g., read from a pipe that is never written to).
>> The problem is that occasionally another read operation that should
>> complete gets stuck. My understanding, based on debugging and the code
>> is that the following race (or similar) occurs:
>>
>>
>> cpu0 cpu1
>> ---- ----
>> io_wqe_worker()
>> schedule_timeout()
>> // timed out
>> io_wqe_enqueue()
>> io_wqe_wake_worker()
>> // work_flags & IO_WQ_WORK_CONCURRENT
>> io_wqe_activate_free_worker()
>> io_worker_exit()
>>
>>
>> Basically, io_wqe_wake_worker() can find a worker, but this worker is
>> about to exit and is not going to process further work. Once the
>> worker exits, the concurrency level decreases and async work might be
>> blocked by another work. I had a look at 5.14, but did not see
>> anything that might address this issue.
>>
>> Am I missing something?
>>
>> If not, all my ideas for a solution are either complicated (track
>> required concurrency-level) or relaxed (span another worker on
>> io_worker_exit if work_list of unbounded work is not empty).
>>
>> As said, feedback would be appreciated.
>
> You are right that there's definitely a race here between checking the
> freelist and finding a worker, but that worker is already exiting. Let
> me mull over this a bit, I'll post something for you to try later today.
Can you try something like this? Just consider it a first tester, need
to spend a bit more time on it to ensure we fully close the gap.
diff --git a/fs/io-wq.c b/fs/io-wq.c
index cf086b01c6c6..e2da2042ee9e 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -42,6 +42,7 @@ struct io_worker {
refcount_t ref;
unsigned flags;
struct hlist_nulls_node nulls_node;
+ unsigned long exiting;
struct list_head all_list;
struct task_struct *task;
struct io_wqe *wqe;
@@ -214,15 +215,20 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe)
struct hlist_nulls_node *n;
struct io_worker *worker;
- n = rcu_dereference(hlist_nulls_first_rcu(&wqe->free_list));
- if (is_a_nulls(n))
- return false;
-
- worker = hlist_nulls_entry(n, struct io_worker, nulls_node);
- if (io_worker_get(worker)) {
- wake_up_process(worker->task);
+ /*
+ * Iterate free_list and see if we can find an idle worker to
+ * activate. If a given worker is on the free_list but in the process
+ * of exiting, keep trying.
+ */
+ hlist_nulls_for_each_entry_rcu(worker, n, &wqe->free_list, nulls_node) {
+ if (!io_worker_get(worker))
+ continue;
+ if (!test_bit(0, &worker->exiting)) {
+ wake_up_process(worker->task);
+ io_worker_release(worker);
+ return true;
+ }
io_worker_release(worker);
- return true;
}
return false;
@@ -560,8 +566,17 @@ static int io_wqe_worker(void *data)
if (ret)
continue;
/* timed out, exit unless we're the fixed worker */
- if (!(worker->flags & IO_WORKER_F_FIXED))
+ if (!(worker->flags & IO_WORKER_F_FIXED)) {
+ /*
+ * Someone elevated our refs, which could be trying
+ * to re-activate for work. Loop one more time for
+ * that case.
+ */
+ if (refcount_read(&worker->ref) != 1)
+ continue;
+ set_bit(0, &worker->exiting);
break;
+ }
}
if (test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
--
Jens Axboe
next prev parent reply other threads:[~2021-08-03 14:37 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-03 1:05 Race between io_wqe_worker() and io_wqe_wake_worker() Nadav Amit
2021-08-03 13:22 ` Jens Axboe
2021-08-03 14:37 ` Jens Axboe [this message]
2021-08-03 17:25 ` Hao Xu
2021-08-03 18:04 ` Nadav Amit
2021-08-03 18:14 ` Jens Axboe
2021-08-03 19:20 ` Nadav Amit
2021-08-03 19:24 ` Jens Axboe
2021-08-03 19:53 ` Jens Axboe
2021-08-03 21:16 ` Nadav Amit
2021-08-03 21:25 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox