* [PATCH 0/4] iowq fix @ 2021-09-11 19:40 Hao Xu 2021-09-11 19:40 ` [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() Hao Xu ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Hao Xu @ 2021-09-11 19:40 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi several iowq fixes, all in theory, haven't been manually triggered. Hao Xu (4): io-wq: tweak return value of io_wqe_create_worker() io-wq: code clean of io_wqe_create_worker() io-wq: fix worker->refcount when creating worker in work exit io-wq: fix potential race of acct->nr_workers fs/io-wq.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) -- 2.24.4 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() 2021-09-11 19:40 [PATCH 0/4] iowq fix Hao Xu @ 2021-09-11 19:40 ` Hao Xu 2021-09-12 18:10 ` Jens Axboe 2021-09-11 19:40 ` [PATCH 2/4] io-wq: code clean " Hao Xu ` (2 subsequent siblings) 3 siblings, 1 reply; 15+ messages in thread From: Hao Xu @ 2021-09-11 19:40 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi The return value of io_wqe_create_worker() should be false if we cannot create a new worker according to the name of this function. Signed-off-by: Hao Xu <[email protected]> --- fs/io-wq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 382efca4812b..1b102494e970 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -267,7 +267,7 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) return create_io_worker(wqe->wq, wqe, acct->index); } - return true; + return false; } static void io_wqe_inc_running(struct io_worker *worker) -- 2.24.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() 2021-09-11 19:40 ` [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() Hao Xu @ 2021-09-12 18:10 ` Jens Axboe 2021-09-12 19:02 ` Hao Xu 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2021-09-12 18:10 UTC (permalink / raw) To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi On 9/11/21 1:40 PM, Hao Xu wrote: > The return value of io_wqe_create_worker() should be false if we cannot > create a new worker according to the name of this function. > > Signed-off-by: Hao Xu <[email protected]> > --- > fs/io-wq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index 382efca4812b..1b102494e970 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -267,7 +267,7 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) > return create_io_worker(wqe->wq, wqe, acct->index); > } > > - return true; > + return false; > } I think this is just a bit confusing. It's not an error case, we just didn't need to create a worker. So don't return failure, or the caller will think that we failed while we did not. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() 2021-09-12 18:10 ` Jens Axboe @ 2021-09-12 19:02 ` Hao Xu 2021-09-12 21:34 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Hao Xu @ 2021-09-12 19:02 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi 在 2021/9/13 上午2:10, Jens Axboe 写道: > On 9/11/21 1:40 PM, Hao Xu wrote: >> The return value of io_wqe_create_worker() should be false if we cannot >> create a new worker according to the name of this function. >> >> Signed-off-by: Hao Xu <[email protected]> >> --- >> fs/io-wq.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/io-wq.c b/fs/io-wq.c >> index 382efca4812b..1b102494e970 100644 >> --- a/fs/io-wq.c >> +++ b/fs/io-wq.c >> @@ -267,7 +267,7 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) >> return create_io_worker(wqe->wq, wqe, acct->index); >> } >> >> - return true; >> + return false; >> } > > I think this is just a bit confusing. It's not an error case, we just > didn't need to create a worker. So don't return failure, or the caller > will think that we failed while we did not. hmm, I think it is an error case----'we failed to create a new worker since nr_worker == max_worker'. nr_worker == max_worker doesn't mean 'no need', we may meet situation describled in 4/4: max_worker is 1, currently 1 worker is running, and we return true here: did_create = io_wqe_create_worker(wqe, acct); //*******nr_workers changes******// if (unlikely(!did_create)) { raw_spin_lock(&wqe->lock); /* fatal condition, failed to create the first worker */ if (!acct->nr_workers) { raw_spin_unlock(&wqe->lock); goto run_cancel; } raw_spin_unlock(&wqe->lock); } we will miss the next check, but we have to do the check, since number of workers may decrease to 0 in //******// place. or maybe we can see the return value as 'do we create a new worker or not', and the next code do safe check if it is false. > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() 2021-09-12 19:02 ` Hao Xu @ 2021-09-12 21:34 ` Jens Axboe 2021-09-13 6:37 ` Hao Xu 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2021-09-12 21:34 UTC (permalink / raw) To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi On 9/12/21 1:02 PM, Hao Xu wrote: > 在 2021/9/13 上午2:10, Jens Axboe 写道: >> On 9/11/21 1:40 PM, Hao Xu wrote: >>> The return value of io_wqe_create_worker() should be false if we cannot >>> create a new worker according to the name of this function. >>> >>> Signed-off-by: Hao Xu <[email protected]> >>> --- >>> fs/io-wq.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>> index 382efca4812b..1b102494e970 100644 >>> --- a/fs/io-wq.c >>> +++ b/fs/io-wq.c >>> @@ -267,7 +267,7 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) >>> return create_io_worker(wqe->wq, wqe, acct->index); >>> } >>> >>> - return true; >>> + return false; >>> } >> >> I think this is just a bit confusing. It's not an error case, we just >> didn't need to create a worker. So don't return failure, or the caller >> will think that we failed while we did not. > hmm, I think it is an error case----'we failed to create a new worker > since nr_worker == max_worker'. nr_worker == max_worker doesn't mean > 'no need', we may meet situation describled in 4/4: max_worker is 1, But that's not an error case in the sense of "uh oh, we need to handle this as an error". If we're at the max worker count, the work simply has to wait for another work to be done and process it. > currently 1 worker is running, and we return true here: > > did_create = io_wqe_create_worker(wqe, acct); > > //*******nr_workers changes******// > > if (unlikely(!did_create)) { > raw_spin_lock(&wqe->lock); > /* fatal condition, failed to create the first worker */ > if (!acct->nr_workers) { > raw_spin_unlock(&wqe->lock); > goto run_cancel; > } > raw_spin_unlock(&wqe->lock); > } > > we will miss the next check, but we have to do the check, since > number of workers may decrease to 0 in //******// place. If that happens, then the work that we have inserted has already been run. Otherwise how else could we have dropped to zero workers? -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() 2021-09-12 21:34 ` Jens Axboe @ 2021-09-13 6:37 ` Hao Xu 0 siblings, 0 replies; 15+ messages in thread From: Hao Xu @ 2021-09-13 6:37 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi 在 2021/9/13 上午5:34, Jens Axboe 写道: > On 9/12/21 1:02 PM, Hao Xu wrote: >> 在 2021/9/13 上午2:10, Jens Axboe 写道: >>> On 9/11/21 1:40 PM, Hao Xu wrote: >>>> The return value of io_wqe_create_worker() should be false if we cannot >>>> create a new worker according to the name of this function. >>>> >>>> Signed-off-by: Hao Xu <[email protected]> >>>> --- >>>> fs/io-wq.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>>> index 382efca4812b..1b102494e970 100644 >>>> --- a/fs/io-wq.c >>>> +++ b/fs/io-wq.c >>>> @@ -267,7 +267,7 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) >>>> return create_io_worker(wqe->wq, wqe, acct->index); >>>> } >>>> >>>> - return true; >>>> + return false; >>>> } >>> >>> I think this is just a bit confusing. It's not an error case, we just >>> didn't need to create a worker. So don't return failure, or the caller >>> will think that we failed while we did not. >> hmm, I think it is an error case----'we failed to create a new worker >> since nr_worker == max_worker'. nr_worker == max_worker doesn't mean >> 'no need', we may meet situation describled in 4/4: max_worker is 1, > > But that's not an error case in the sense of "uh oh, we need to handle > this as an error". If we're at the max worker count, the work simply has > to wait for another work to be done and process it. > >> currently 1 worker is running, and we return true here: >> >> did_create = io_wqe_create_worker(wqe, acct); >> >> //*******nr_workers changes******// >> >> if (unlikely(!did_create)) { >> raw_spin_lock(&wqe->lock); >> /* fatal condition, failed to create the first worker */ >> if (!acct->nr_workers) { >> raw_spin_unlock(&wqe->lock); >> goto run_cancel; >> } >> raw_spin_unlock(&wqe->lock); >> } >> >> we will miss the next check, but we have to do the check, since >> number of workers may decrease to 0 in //******// place. > > If that happens, then the work that we have inserted has already been > run. Otherwise how else could we have dropped to zero workers? > Sorry, I see. I forgot the fix moved the place of nr_workers... There is no problems now. Thanks for explanation, Jens. io_wqe_enqueue worker1 no work there and timeout nr_workers--(after fix) unlock(wqe->lock) ->insert work ->io_wqe_create_worker ->io_worker_exit ->nr_workers--(before fix) ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/4] io-wq: code clean of io_wqe_create_worker() 2021-09-11 19:40 [PATCH 0/4] iowq fix Hao Xu 2021-09-11 19:40 ` [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() Hao Xu @ 2021-09-11 19:40 ` Hao Xu 2021-09-12 18:18 ` Jens Axboe 2021-09-13 8:30 ` Hao Xu 2021-09-11 19:40 ` [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit Hao Xu 2021-09-11 19:40 ` [PATCH 4/4] io-wq: fix potential race of acct->nr_workers Hao Xu 3 siblings, 2 replies; 15+ messages in thread From: Hao Xu @ 2021-09-11 19:40 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi Remove do_create to save a local variable. Signed-off-by: Hao Xu <[email protected]> --- fs/io-wq.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 1b102494e970..0e1288a549eb 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -246,8 +246,6 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe, */ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) { - bool do_create = false; - /* * Most likely an attempt to queue unbounded work on an io_wq that * wasn't setup with any unbounded workers. @@ -256,18 +254,15 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) pr_warn_once("io-wq is not configured for unbound workers"); raw_spin_lock(&wqe->lock); - if (acct->nr_workers < acct->max_workers) { - acct->nr_workers++; - do_create = true; + if (acct->nr_workers == acct->max_workers) { + raw_spin_unlock(&wqe->lock); + return false; } + acct->nr_workers++; raw_spin_unlock(&wqe->lock); - if (do_create) { - atomic_inc(&acct->nr_running); - atomic_inc(&wqe->wq->worker_refs); - return create_io_worker(wqe->wq, wqe, acct->index); - } - - return false; + atomic_inc(&acct->nr_running); + atomic_inc(&wqe->wq->worker_refs); + return create_io_worker(wqe->wq, wqe, acct->index); } static void io_wqe_inc_running(struct io_worker *worker) -- 2.24.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4] io-wq: code clean of io_wqe_create_worker() 2021-09-11 19:40 ` [PATCH 2/4] io-wq: code clean " Hao Xu @ 2021-09-12 18:18 ` Jens Axboe 2021-09-13 8:30 ` Hao Xu 1 sibling, 0 replies; 15+ messages in thread From: Jens Axboe @ 2021-09-12 18:18 UTC (permalink / raw) To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi On 9/11/21 1:40 PM, Hao Xu wrote: > Remove do_create to save a local variable. This one looks good, it's easier to follow as well. Applied. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/4] io-wq: code clean of io_wqe_create_worker() 2021-09-11 19:40 ` [PATCH 2/4] io-wq: code clean " Hao Xu 2021-09-12 18:18 ` Jens Axboe @ 2021-09-13 8:30 ` Hao Xu 1 sibling, 0 replies; 15+ messages in thread From: Hao Xu @ 2021-09-13 8:30 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi 在 2021/9/12 上午3:40, Hao Xu 写道: > Remove do_create to save a local variable. > > Signed-off-by: Hao Xu <[email protected]> > --- > fs/io-wq.c | 19 +++++++------------ > 1 file changed, 7 insertions(+), 12 deletions(-) > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index 1b102494e970..0e1288a549eb 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -246,8 +246,6 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe, > */ > static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) > { > - bool do_create = false; > - > /* > * Most likely an attempt to queue unbounded work on an io_wq that > * wasn't setup with any unbounded workers. > @@ -256,18 +254,15 @@ static bool io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) > pr_warn_once("io-wq is not configured for unbound workers"); > > raw_spin_lock(&wqe->lock); > - if (acct->nr_workers < acct->max_workers) { > - acct->nr_workers++; > - do_create = true; > + if (acct->nr_workers == acct->max_workers) { > + raw_spin_unlock(&wqe->lock); > + return false; Hi Jens, would you like to tweak it to return true or would like me to send a fix. > } > + acct->nr_workers++; > raw_spin_unlock(&wqe->lock); > - if (do_create) { > - atomic_inc(&acct->nr_running); > - atomic_inc(&wqe->wq->worker_refs); > - return create_io_worker(wqe->wq, wqe, acct->index); > - } > - > - return false; > + atomic_inc(&acct->nr_running); > + atomic_inc(&wqe->wq->worker_refs); > + return create_io_worker(wqe->wq, wqe, acct->index); > } > > static void io_wqe_inc_running(struct io_worker *worker) > ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit 2021-09-11 19:40 [PATCH 0/4] iowq fix Hao Xu 2021-09-11 19:40 ` [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() Hao Xu 2021-09-11 19:40 ` [PATCH 2/4] io-wq: code clean " Hao Xu @ 2021-09-11 19:40 ` Hao Xu 2021-09-11 22:13 ` Jens Axboe 2021-09-11 19:40 ` [PATCH 4/4] io-wq: fix potential race of acct->nr_workers Hao Xu 3 siblings, 1 reply; 15+ messages in thread From: Hao Xu @ 2021-09-11 19:40 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi We may enter the worker creation path from io_worker_exit(), and refcount is already zero, which causes definite failure of worker creation. io_worker_exit ref = 0 ->io_wqe_dec_running ->io_queue_worker_create ->io_worker_get failure since ref is 0 Signed-off-by: Hao Xu <[email protected]> --- fs/io-wq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/io-wq.c b/fs/io-wq.c index 0e1288a549eb..75e79571bdfd 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -188,7 +188,9 @@ static void io_worker_exit(struct io_worker *worker) list_del_rcu(&worker->all_list); acct->nr_workers--; preempt_disable(); + refcount_set(&worker->ref, 1); io_wqe_dec_running(worker); + refcount_set(&worker->ref, 0); worker->flags = 0; current->flags &= ~PF_IO_WORKER; preempt_enable(); -- 2.24.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit 2021-09-11 19:40 ` [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit Hao Xu @ 2021-09-11 22:13 ` Jens Axboe 2021-09-12 9:04 ` Hao Xu 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2021-09-11 22:13 UTC (permalink / raw) To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi On 9/11/21 1:40 PM, Hao Xu wrote: > We may enter the worker creation path from io_worker_exit(), and > refcount is already zero, which causes definite failure of worker > creation. > io_worker_exit > ref = 0 > ->io_wqe_dec_running > ->io_queue_worker_create > ->io_worker_get failure since ref is 0 > > Signed-off-by: Hao Xu <[email protected]> > --- > fs/io-wq.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/fs/io-wq.c b/fs/io-wq.c > index 0e1288a549eb..75e79571bdfd 100644 > --- a/fs/io-wq.c > +++ b/fs/io-wq.c > @@ -188,7 +188,9 @@ static void io_worker_exit(struct io_worker *worker) > list_del_rcu(&worker->all_list); > acct->nr_workers--; > preempt_disable(); > + refcount_set(&worker->ref, 1); > io_wqe_dec_running(worker); > + refcount_set(&worker->ref, 0); That kind of refcount manipulation is highly suspect. And in fact it should not be needed, io_worker_exit() clears ->flags before going on with worker teardown. Hence you can't hit worker creation from io_wqe_dec_running(). -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit 2021-09-11 22:13 ` Jens Axboe @ 2021-09-12 9:04 ` Hao Xu 2021-09-12 18:07 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Hao Xu @ 2021-09-12 9:04 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi 在 2021/9/12 上午6:13, Jens Axboe 写道: > On 9/11/21 1:40 PM, Hao Xu wrote: >> We may enter the worker creation path from io_worker_exit(), and >> refcount is already zero, which causes definite failure of worker >> creation. >> io_worker_exit >> ref = 0 >> ->io_wqe_dec_running >> ->io_queue_worker_create >> ->io_worker_get failure since ref is 0 >> >> Signed-off-by: Hao Xu <[email protected]> >> --- >> fs/io-wq.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/fs/io-wq.c b/fs/io-wq.c >> index 0e1288a549eb..75e79571bdfd 100644 >> --- a/fs/io-wq.c >> +++ b/fs/io-wq.c >> @@ -188,7 +188,9 @@ static void io_worker_exit(struct io_worker *worker) >> list_del_rcu(&worker->all_list); >> acct->nr_workers--; >> preempt_disable(); >> + refcount_set(&worker->ref, 1); >> io_wqe_dec_running(worker); >> + refcount_set(&worker->ref, 0); > > That kind of refcount manipulation is highly suspect. And in fact it > should not be needed, io_worker_exit() clears ->flags before going on > with worker teardown. Hence you can't hit worker creation from > io_wqe_dec_running(). Doesn't see the relationship between worker->flags and the creation. But yes, the creation path does io_worker_get() which causes failure if it's from io_worker_exit(), Now I understand it is more like a feature, isn't it? Anyway, the issue in 4/4 seems still there. Regards, Hao > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit 2021-09-12 9:04 ` Hao Xu @ 2021-09-12 18:07 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2021-09-12 18:07 UTC (permalink / raw) To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi On 9/12/21 3:04 AM, Hao Xu wrote: > 在 2021/9/12 上午6:13, Jens Axboe 写道: >> On 9/11/21 1:40 PM, Hao Xu wrote: >>> We may enter the worker creation path from io_worker_exit(), and >>> refcount is already zero, which causes definite failure of worker >>> creation. >>> io_worker_exit >>> ref = 0 >>> ->io_wqe_dec_running >>> ->io_queue_worker_create >>> ->io_worker_get failure since ref is 0 >>> >>> Signed-off-by: Hao Xu <[email protected]> >>> --- >>> fs/io-wq.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/fs/io-wq.c b/fs/io-wq.c >>> index 0e1288a549eb..75e79571bdfd 100644 >>> --- a/fs/io-wq.c >>> +++ b/fs/io-wq.c >>> @@ -188,7 +188,9 @@ static void io_worker_exit(struct io_worker *worker) >>> list_del_rcu(&worker->all_list); >>> acct->nr_workers--; >>> preempt_disable(); >>> + refcount_set(&worker->ref, 1); >>> io_wqe_dec_running(worker); >>> + refcount_set(&worker->ref, 0); >> >> That kind of refcount manipulation is highly suspect. And in fact it >> should not be needed, io_worker_exit() clears ->flags before going on >> with worker teardown. Hence you can't hit worker creation from >> io_wqe_dec_running(). > Doesn't see the relationship between worker->flags and the creation. > But yes, the creation path does io_worker_get() which causes failure > if it's from io_worker_exit(), Now I understand it is more like a > feature, isn't it? Anyway, the issue in 4/4 seems still there. Right, that's on purpose. In any case, the above would fail miserably if it raced with someone trying to get a reference on the worker: A B refcount_set(ref, 1) io_worker_get(), succeeds now 2 refcount_set(ref, 0) oops... -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/4] io-wq: fix potential race of acct->nr_workers 2021-09-11 19:40 [PATCH 0/4] iowq fix Hao Xu ` (2 preceding siblings ...) 2021-09-11 19:40 ` [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit Hao Xu @ 2021-09-11 19:40 ` Hao Xu 2021-09-12 18:23 ` Jens Axboe 3 siblings, 1 reply; 15+ messages in thread From: Hao Xu @ 2021-09-11 19:40 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi Given max_worker is 1, and we currently have 1 running and it is exiting. There may be race like: io_wqe_enqueue worker1 no work there and timeout unlock(wqe->lock) ->insert work -->io_worker_exit lock(wqe->lock) ->if(!nr_workers) //it's still 1 unlock(wqe->lock) goto run_cancel lock(wqe->lock) nr_workers-- ->dec_running ->worker creation fails unlock(wqe->lock) We enqueued one work but there is no workers, causes hung. Signed-off-by: Hao Xu <[email protected]> --- fs/io-wq.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/io-wq.c b/fs/io-wq.c index 75e79571bdfd..b84dc8df6c68 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -176,7 +176,6 @@ static void io_worker_ref_put(struct io_wq *wq) static void io_worker_exit(struct io_worker *worker) { struct io_wqe *wqe = worker->wqe; - struct io_wqe_acct *acct = io_wqe_get_acct(worker); if (refcount_dec_and_test(&worker->ref)) complete(&worker->ref_done); @@ -186,7 +185,6 @@ static void io_worker_exit(struct io_worker *worker) if (worker->flags & IO_WORKER_F_FREE) hlist_nulls_del_rcu(&worker->nulls_node); list_del_rcu(&worker->all_list); - acct->nr_workers--; preempt_disable(); refcount_set(&worker->ref, 1); io_wqe_dec_running(worker); @@ -571,6 +569,7 @@ static int io_wqe_worker(void *data) } /* timed out, exit unless we're the last worker */ if (last_timeout && acct->nr_workers > 1) { + acct->nr_workers--; raw_spin_unlock(&wqe->lock); __set_current_state(TASK_RUNNING); break; -- 2.24.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io-wq: fix potential race of acct->nr_workers 2021-09-11 19:40 ` [PATCH 4/4] io-wq: fix potential race of acct->nr_workers Hao Xu @ 2021-09-12 18:23 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2021-09-12 18:23 UTC (permalink / raw) To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi On 9/11/21 1:40 PM, Hao Xu wrote: > Given max_worker is 1, and we currently have 1 running and it is > exiting. There may be race like: > io_wqe_enqueue worker1 > no work there and timeout > unlock(wqe->lock) > ->insert work > -->io_worker_exit > lock(wqe->lock) > ->if(!nr_workers) //it's still 1 > unlock(wqe->lock) > goto run_cancel > lock(wqe->lock) > nr_workers-- > ->dec_running > ->worker creation fails > unlock(wqe->lock) > > We enqueued one work but there is no workers, causes hung. This one also looks good, applied. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2021-09-13 8:30 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-09-11 19:40 [PATCH 0/4] iowq fix Hao Xu 2021-09-11 19:40 ` [PATCH 1/4] io-wq: tweak return value of io_wqe_create_worker() Hao Xu 2021-09-12 18:10 ` Jens Axboe 2021-09-12 19:02 ` Hao Xu 2021-09-12 21:34 ` Jens Axboe 2021-09-13 6:37 ` Hao Xu 2021-09-11 19:40 ` [PATCH 2/4] io-wq: code clean " Hao Xu 2021-09-12 18:18 ` Jens Axboe 2021-09-13 8:30 ` Hao Xu 2021-09-11 19:40 ` [PATCH 3/4] io-wq: fix worker->refcount when creating worker in work exit Hao Xu 2021-09-11 22:13 ` Jens Axboe 2021-09-12 9:04 ` Hao Xu 2021-09-12 18:07 ` Jens Axboe 2021-09-11 19:40 ` [PATCH 4/4] io-wq: fix potential race of acct->nr_workers Hao Xu 2021-09-12 18:23 ` Jens Axboe
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox