* [PATCH 0/2] fix hangs with shared sqpoll
From: Pavel Begunkov @ 2021-04-16  0:22 UTC
To: Jens Axboe, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

A late-caught 5.12 bug with nasty hangs. Thanks Jens for a reproducer.

Pavel Begunkov (2):
  percpu_ref: add percpu_ref_atomic_count()
  io_uring: fix shared sqpoll cancellation hangs

 fs/io_uring.c                   |  5 +++--
 include/linux/percpu-refcount.h |  1 +
 lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
 3 files changed, 30 insertions(+), 2 deletions(-)

-- 
2.24.0
* [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
From: Pavel Begunkov @ 2021-04-16  0:22 UTC
To: Jens Axboe, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

Add percpu_ref_atomic_count(), which returns number of references of a
percpu_ref switched prior into atomic mode, so the caller is responsible
to make sure it's in the right mode.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/percpu-refcount.h |  1 +
 lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index 16c35a728b4c..0ff40e79efa2 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
 void percpu_ref_resurrect(struct percpu_ref *ref);
 void percpu_ref_reinit(struct percpu_ref *ref);
 bool percpu_ref_is_zero(struct percpu_ref *ref);
+unsigned long percpu_ref_atomic_count(struct percpu_ref *ref);
 
 /**
  * percpu_ref_kill - drop the initial ref
diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
index a1071cdefb5a..56286995e2b8 100644
--- a/lib/percpu-refcount.c
+++ b/lib/percpu-refcount.c
@@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
 }
 EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
 
+/**
+ * percpu_ref_atomic_count - returns number of left references
+ * @ref: percpu_ref to test
+ *
+ * This function is safe to call as long as @ref is switch into atomic mode,
+ * and is between init and exit.
+ */
+unsigned long percpu_ref_atomic_count(struct percpu_ref *ref)
+{
+	unsigned long __percpu *percpu_count;
+	unsigned long count, flags;
+
+	if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count)))
+		return -1UL;
+
+	/* protect us from being destroyed */
+	spin_lock_irqsave(&percpu_ref_switch_lock, flags);
+	if (ref->data)
+		count = atomic_long_read(&ref->data->count);
+	else
+		count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS;
+	spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);
+
+	return count;
+}
+
 /**
  * percpu_ref_reinit - re-initialize a percpu refcount
  * @ref: perpcu_ref to re-initialize
-- 
2.24.0
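A minimal usage sketch for the new helper (not part of the patch; the wrapper
name is made up for illustration). The ref must already be in atomic mode,
e.g. via a synchronous switch, before the count can be read:

	/* illustrative only: fold the percpu counters, then read the remainder */
	static unsigned long remaining_refs(struct percpu_ref *ref)
	{
		/* blocks until all per-CPU counts are summed into the atomic counter */
		percpu_ref_switch_to_atomic_sync(ref);

		/* now the WARN_ON_ONCE() in the helper cannot trigger */
		return percpu_ref_atomic_count(ref);
	}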
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() 2021-04-16 0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov @ 2021-04-16 4:45 ` Dennis Zhou 2021-04-16 13:16 ` Pavel Begunkov 2021-04-16 15:31 ` Bart Van Assche 1 sibling, 1 reply; 17+ messages in thread From: Dennis Zhou @ 2021-04-16 4:45 UTC (permalink / raw) To: Pavel Begunkov Cc: Jens Axboe, io-uring, Tejun Heo, Christoph Lameter, Joakim Hassila Hello, On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote: > Add percpu_ref_atomic_count(), which returns number of references of a > percpu_ref switched prior into atomic mode, so the caller is responsible > to make sure it's in the right mode. > > Signed-off-by: Pavel Begunkov <[email protected]> > --- > include/linux/percpu-refcount.h | 1 + > lib/percpu-refcount.c | 26 ++++++++++++++++++++++++++ > 2 files changed, 27 insertions(+) > > diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h > index 16c35a728b4c..0ff40e79efa2 100644 > --- a/include/linux/percpu-refcount.h > +++ b/include/linux/percpu-refcount.h > @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, > void percpu_ref_resurrect(struct percpu_ref *ref); > void percpu_ref_reinit(struct percpu_ref *ref); > bool percpu_ref_is_zero(struct percpu_ref *ref); > +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref); > > /** > * percpu_ref_kill - drop the initial ref > diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c > index a1071cdefb5a..56286995e2b8 100644 > --- a/lib/percpu-refcount.c > +++ b/lib/percpu-refcount.c > @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref) > } > EXPORT_SYMBOL_GPL(percpu_ref_is_zero); > > +/** > + * percpu_ref_atomic_count - returns number of left references > + * @ref: percpu_ref to test > + * > + * This function is safe to call as long as @ref is switch into atomic mode, > + * and is between init and exit. > + */ > +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref) > +{ > + unsigned long __percpu *percpu_count; > + unsigned long count, flags; > + > + if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count))) > + return -1UL; > + > + /* protect us from being destroyed */ > + spin_lock_irqsave(&percpu_ref_switch_lock, flags); > + if (ref->data) > + count = atomic_long_read(&ref->data->count); > + else > + count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS; Sorry I missed Jens' patch before and also the update to percpu_ref. However, I feel like I'm missing something. This isn't entirely related to your patch, but I'm not following why percpu_count_ptr stores the excess count of an exited percpu_ref and doesn't warn when it's not zero. It seems like this should be an error if it's not 0? Granted we have made some contract with the user to do the right thing, but say someone does mess up, we don't indicate to them hey this ref is actually dead and if they're waiting for it to go to 0, it never will. > + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); > + > + return count; > +} > + > /** > * percpu_ref_reinit - re-initialize a percpu refcount > * @ref: perpcu_ref to re-initialize > -- > 2.24.0 > Thanks, Dennis ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() 2021-04-16 4:45 ` Dennis Zhou @ 2021-04-16 13:16 ` Pavel Begunkov 2021-04-16 14:10 ` Ming Lei 0 siblings, 1 reply; 17+ messages in thread From: Pavel Begunkov @ 2021-04-16 13:16 UTC (permalink / raw) To: Dennis Zhou Cc: Jens Axboe, io-uring, Tejun Heo, Christoph Lameter, Joakim Hassila, Ming Lei On 16/04/2021 05:45, Dennis Zhou wrote: > Hello, > > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote: >> Add percpu_ref_atomic_count(), which returns number of references of a >> percpu_ref switched prior into atomic mode, so the caller is responsible >> to make sure it's in the right mode. >> >> Signed-off-by: Pavel Begunkov <[email protected]> >> --- >> include/linux/percpu-refcount.h | 1 + >> lib/percpu-refcount.c | 26 ++++++++++++++++++++++++++ >> 2 files changed, 27 insertions(+) >> >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h >> index 16c35a728b4c..0ff40e79efa2 100644 >> --- a/include/linux/percpu-refcount.h >> +++ b/include/linux/percpu-refcount.h >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, >> void percpu_ref_resurrect(struct percpu_ref *ref); >> void percpu_ref_reinit(struct percpu_ref *ref); >> bool percpu_ref_is_zero(struct percpu_ref *ref); >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref); >> >> /** >> * percpu_ref_kill - drop the initial ref >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c >> index a1071cdefb5a..56286995e2b8 100644 >> --- a/lib/percpu-refcount.c >> +++ b/lib/percpu-refcount.c >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref) >> } >> EXPORT_SYMBOL_GPL(percpu_ref_is_zero); >> >> +/** >> + * percpu_ref_atomic_count - returns number of left references >> + * @ref: percpu_ref to test >> + * >> + * This function is safe to call as long as @ref is switch into atomic mode, >> + * and is between init and exit. >> + */ >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref) >> +{ >> + unsigned long __percpu *percpu_count; >> + unsigned long count, flags; >> + >> + if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count))) >> + return -1UL; >> + >> + /* protect us from being destroyed */ >> + spin_lock_irqsave(&percpu_ref_switch_lock, flags); >> + if (ref->data) >> + count = atomic_long_read(&ref->data->count); >> + else >> + count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS; > > Sorry I missed Jens' patch before and also the update to percpu_ref. > However, I feel like I'm missing something. This isn't entirely related > to your patch, but I'm not following why percpu_count_ptr stores the > excess count of an exited percpu_ref and doesn't warn when it's not > zero. It seems like this should be an error if it's not 0? > > Granted we have made some contract with the user to do the right thing, > but say someone does mess up, we don't indicate to them hey this ref is > actually dead and if they're waiting for it to go to 0, it never will. fwiw, I copied is_zero, but skimming through the code don't immediately see myself why it is so... Cc Ming, he split out some parts of it to dynamic allocation not too long ago, maybe he knows the trick. > >> + spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); >> + >> + return count; >> +} >> + >> /** >> * percpu_ref_reinit - re-initialize a percpu refcount >> * @ref: perpcu_ref to re-initialize >> -- >> 2.24.0 >> -- Pavel Begunkov ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() 2021-04-16 13:16 ` Pavel Begunkov @ 2021-04-16 14:10 ` Ming Lei 2021-04-16 14:37 ` Dennis Zhou 0 siblings, 1 reply; 17+ messages in thread From: Ming Lei @ 2021-04-16 14:10 UTC (permalink / raw) To: Pavel Begunkov Cc: Dennis Zhou, Jens Axboe, io-uring, Tejun Heo, Christoph Lameter, Joakim Hassila On Fri, Apr 16, 2021 at 02:16:41PM +0100, Pavel Begunkov wrote: > On 16/04/2021 05:45, Dennis Zhou wrote: > > Hello, > > > > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote: > >> Add percpu_ref_atomic_count(), which returns number of references of a > >> percpu_ref switched prior into atomic mode, so the caller is responsible > >> to make sure it's in the right mode. > >> > >> Signed-off-by: Pavel Begunkov <[email protected]> > >> --- > >> include/linux/percpu-refcount.h | 1 + > >> lib/percpu-refcount.c | 26 ++++++++++++++++++++++++++ > >> 2 files changed, 27 insertions(+) > >> > >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h > >> index 16c35a728b4c..0ff40e79efa2 100644 > >> --- a/include/linux/percpu-refcount.h > >> +++ b/include/linux/percpu-refcount.h > >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, > >> void percpu_ref_resurrect(struct percpu_ref *ref); > >> void percpu_ref_reinit(struct percpu_ref *ref); > >> bool percpu_ref_is_zero(struct percpu_ref *ref); > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref); > >> > >> /** > >> * percpu_ref_kill - drop the initial ref > >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c > >> index a1071cdefb5a..56286995e2b8 100644 > >> --- a/lib/percpu-refcount.c > >> +++ b/lib/percpu-refcount.c > >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref) > >> } > >> EXPORT_SYMBOL_GPL(percpu_ref_is_zero); > >> > >> +/** > >> + * percpu_ref_atomic_count - returns number of left references > >> + * @ref: percpu_ref to test > >> + * > >> + * This function is safe to call as long as @ref is switch into atomic mode, > >> + * and is between init and exit. > >> + */ > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref) > >> +{ > >> + unsigned long __percpu *percpu_count; > >> + unsigned long count, flags; > >> + > >> + if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count))) > >> + return -1UL; > >> + > >> + /* protect us from being destroyed */ > >> + spin_lock_irqsave(&percpu_ref_switch_lock, flags); > >> + if (ref->data) > >> + count = atomic_long_read(&ref->data->count); > >> + else > >> + count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS; > > > > Sorry I missed Jens' patch before and also the update to percpu_ref. > > However, I feel like I'm missing something. This isn't entirely related > > to your patch, but I'm not following why percpu_count_ptr stores the > > excess count of an exited percpu_ref and doesn't warn when it's not > > zero. It seems like this should be an error if it's not 0? > > > > Granted we have made some contract with the user to do the right thing, > > but say someone does mess up, we don't indicate to them hey this ref is > > actually dead and if they're waiting for it to go to 0, it never will. > > fwiw, I copied is_zero, but skimming through the code don't immediately > see myself why it is so... > > Cc Ming, he split out some parts of it to dynamic allocation not too > long ago, maybe he knows the trick. 
I remembered that percpu_ref_is_zero() can be called even after
percpu_ref_exit() returns, and it looks like percpu_ref_is_zero() isn't
classified as an 'active use'.

Thanks,
Ming
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() 2021-04-16 14:10 ` Ming Lei @ 2021-04-16 14:37 ` Dennis Zhou 2021-04-19 2:03 ` Ming Lei 0 siblings, 1 reply; 17+ messages in thread From: Dennis Zhou @ 2021-04-16 14:37 UTC (permalink / raw) To: Ming Lei Cc: Pavel Begunkov, Jens Axboe, io-uring, Tejun Heo, Christoph Lameter, Joakim Hassila On Fri, Apr 16, 2021 at 10:10:07PM +0800, Ming Lei wrote: > On Fri, Apr 16, 2021 at 02:16:41PM +0100, Pavel Begunkov wrote: > > On 16/04/2021 05:45, Dennis Zhou wrote: > > > Hello, > > > > > > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote: > > >> Add percpu_ref_atomic_count(), which returns number of references of a > > >> percpu_ref switched prior into atomic mode, so the caller is responsible > > >> to make sure it's in the right mode. > > >> > > >> Signed-off-by: Pavel Begunkov <[email protected]> > > >> --- > > >> include/linux/percpu-refcount.h | 1 + > > >> lib/percpu-refcount.c | 26 ++++++++++++++++++++++++++ > > >> 2 files changed, 27 insertions(+) > > >> > > >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h > > >> index 16c35a728b4c..0ff40e79efa2 100644 > > >> --- a/include/linux/percpu-refcount.h > > >> +++ b/include/linux/percpu-refcount.h > > >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, > > >> void percpu_ref_resurrect(struct percpu_ref *ref); > > >> void percpu_ref_reinit(struct percpu_ref *ref); > > >> bool percpu_ref_is_zero(struct percpu_ref *ref); > > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref); > > >> > > >> /** > > >> * percpu_ref_kill - drop the initial ref > > >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c > > >> index a1071cdefb5a..56286995e2b8 100644 > > >> --- a/lib/percpu-refcount.c > > >> +++ b/lib/percpu-refcount.c > > >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref) > > >> } > > >> EXPORT_SYMBOL_GPL(percpu_ref_is_zero); > > >> > > >> +/** > > >> + * percpu_ref_atomic_count - returns number of left references > > >> + * @ref: percpu_ref to test > > >> + * > > >> + * This function is safe to call as long as @ref is switch into atomic mode, > > >> + * and is between init and exit. > > >> + */ > > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref) > > >> +{ > > >> + unsigned long __percpu *percpu_count; > > >> + unsigned long count, flags; > > >> + > > >> + if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count))) > > >> + return -1UL; > > >> + > > >> + /* protect us from being destroyed */ > > >> + spin_lock_irqsave(&percpu_ref_switch_lock, flags); > > >> + if (ref->data) > > >> + count = atomic_long_read(&ref->data->count); > > >> + else > > >> + count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS; > > > > > > Sorry I missed Jens' patch before and also the update to percpu_ref. > > > However, I feel like I'm missing something. This isn't entirely related > > > to your patch, but I'm not following why percpu_count_ptr stores the > > > excess count of an exited percpu_ref and doesn't warn when it's not > > > zero. It seems like this should be an error if it's not 0? > > > > > > Granted we have made some contract with the user to do the right thing, > > > but say someone does mess up, we don't indicate to them hey this ref is > > > actually dead and if they're waiting for it to go to 0, it never will. > > > > fwiw, I copied is_zero, but skimming through the code don't immediately > > see myself why it is so... 
> >
> > Cc Ming, he split out some parts of it to dynamic allocation not too
> > long ago, maybe he knows the trick.
>
> I remembered that percpu_ref_is_zero() can be called even after percpu_ref_exit()
> returns, and looks percpu_ref_is_zero() isn't classified into 'active use'.
>

Looking at the commit prior, it seems like percpu_ref_is_zero() was
subject to the usual init and exit lifetime. I guess I'm just not
convinced it should ever be > 0. I'll think about it a little longer and
might fix it.

Thanks,
Dennis
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() 2021-04-16 14:37 ` Dennis Zhou @ 2021-04-19 2:03 ` Ming Lei 0 siblings, 0 replies; 17+ messages in thread From: Ming Lei @ 2021-04-19 2:03 UTC (permalink / raw) To: Dennis Zhou Cc: Pavel Begunkov, Jens Axboe, io-uring, Tejun Heo, Christoph Lameter, Joakim Hassila On Fri, Apr 16, 2021 at 02:37:03PM +0000, Dennis Zhou wrote: > On Fri, Apr 16, 2021 at 10:10:07PM +0800, Ming Lei wrote: > > On Fri, Apr 16, 2021 at 02:16:41PM +0100, Pavel Begunkov wrote: > > > On 16/04/2021 05:45, Dennis Zhou wrote: > > > > Hello, > > > > > > > > On Fri, Apr 16, 2021 at 01:22:51AM +0100, Pavel Begunkov wrote: > > > >> Add percpu_ref_atomic_count(), which returns number of references of a > > > >> percpu_ref switched prior into atomic mode, so the caller is responsible > > > >> to make sure it's in the right mode. > > > >> > > > >> Signed-off-by: Pavel Begunkov <[email protected]> > > > >> --- > > > >> include/linux/percpu-refcount.h | 1 + > > > >> lib/percpu-refcount.c | 26 ++++++++++++++++++++++++++ > > > >> 2 files changed, 27 insertions(+) > > > >> > > > >> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h > > > >> index 16c35a728b4c..0ff40e79efa2 100644 > > > >> --- a/include/linux/percpu-refcount.h > > > >> +++ b/include/linux/percpu-refcount.h > > > >> @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, > > > >> void percpu_ref_resurrect(struct percpu_ref *ref); > > > >> void percpu_ref_reinit(struct percpu_ref *ref); > > > >> bool percpu_ref_is_zero(struct percpu_ref *ref); > > > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref); > > > >> > > > >> /** > > > >> * percpu_ref_kill - drop the initial ref > > > >> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c > > > >> index a1071cdefb5a..56286995e2b8 100644 > > > >> --- a/lib/percpu-refcount.c > > > >> +++ b/lib/percpu-refcount.c > > > >> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref) > > > >> } > > > >> EXPORT_SYMBOL_GPL(percpu_ref_is_zero); > > > >> > > > >> +/** > > > >> + * percpu_ref_atomic_count - returns number of left references > > > >> + * @ref: percpu_ref to test > > > >> + * > > > >> + * This function is safe to call as long as @ref is switch into atomic mode, > > > >> + * and is between init and exit. > > > >> + */ > > > >> +unsigned long percpu_ref_atomic_count(struct percpu_ref *ref) > > > >> +{ > > > >> + unsigned long __percpu *percpu_count; > > > >> + unsigned long count, flags; > > > >> + > > > >> + if (WARN_ON_ONCE(__ref_is_percpu(ref, &percpu_count))) > > > >> + return -1UL; > > > >> + > > > >> + /* protect us from being destroyed */ > > > >> + spin_lock_irqsave(&percpu_ref_switch_lock, flags); > > > >> + if (ref->data) > > > >> + count = atomic_long_read(&ref->data->count); > > > >> + else > > > >> + count = ref->percpu_count_ptr >> __PERCPU_REF_FLAG_BITS; > > > > > > > > Sorry I missed Jens' patch before and also the update to percpu_ref. > > > > However, I feel like I'm missing something. This isn't entirely related > > > > to your patch, but I'm not following why percpu_count_ptr stores the > > > > excess count of an exited percpu_ref and doesn't warn when it's not > > > > zero. It seems like this should be an error if it's not 0? > > > > > > > > Granted we have made some contract with the user to do the right thing, > > > > but say someone does mess up, we don't indicate to them hey this ref is > > > > actually dead and if they're waiting for it to go to 0, it never will. 
> > >
> > > fwiw, I copied is_zero, but skimming through the code don't immediately
> > > see myself why it is so...
> > >
> > > Cc Ming, he split out some parts of it to dynamic allocation not too
> > > long ago, maybe he knows the trick.
> >
> > I remembered that percpu_ref_is_zero() can be called even after percpu_ref_exit()
> > returns, and looks percpu_ref_is_zero() isn't classified into 'active use'.
> >
>
> Looking at the commit prior, it seems like percpu_ref_is_zero() was
> subject to the usual init and exit lifetime. I guess I'm just not
> convinced it should ever be > 0. I'll think about it a little longer and
> might fix it.

There may not be > 0 at that time, but it was allowed for
percpu_ref_is_zero() to read an un-initialized refcount, and there was
such a kernel oops report:

https://lore.kernel.org/lkml/[email protected]/#r

Thanks,
Ming
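For context on the lifetime question discussed above: after the percpu_ref
rework that split out struct percpu_ref_data, percpu_ref_exit() folds
whatever is left in the atomic counter into the spare bits of
percpu_count_ptr before freeing ref->data, which is why readers such as
percpu_ref_is_zero() (and the helper proposed in this series) fall back to
that field. A rough sketch of the hand-off, paraphrased from
lib/percpu-refcount.c rather than quoted verbatim:

	void percpu_ref_exit(struct percpu_ref *ref)
	{
		struct percpu_ref_data *data = ref->data;
		unsigned long flags;

		__percpu_ref_exit(ref);		/* frees the per-CPU counters */

		if (!data)
			return;

		spin_lock_irqsave(&percpu_ref_switch_lock, flags);
		/* stash the leftover count above the flag bits so readers still work */
		ref->percpu_count_ptr |= atomic_long_read(&data->count) <<
			__PERCPU_REF_FLAG_BITS;
		ref->data = NULL;
		spin_unlock_irqrestore(&percpu_ref_switch_lock, flags);

		kfree(data);
	}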
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
From: Bart Van Assche @ 2021-04-16 15:31 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/15/21 5:22 PM, Pavel Begunkov wrote:
> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
> index a1071cdefb5a..56286995e2b8 100644
> --- a/lib/percpu-refcount.c
> +++ b/lib/percpu-refcount.c
> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
>  }
>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
>  
> +/**
> + * percpu_ref_atomic_count - returns number of left references
> + * @ref: percpu_ref to test
> + *
> + * This function is safe to call as long as @ref is switch into atomic mode,
> + * and is between init and exit.
> + */

How about using the name percpu_ref_read() instead of
percpu_ref_atomic_count()?

Thanks,

Bart.
* Re: [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count()
From: Jens Axboe @ 2021-04-16 15:34 UTC
To: Bart Van Assche, Pavel Begunkov, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/16/21 9:31 AM, Bart Van Assche wrote:
> On 4/15/21 5:22 PM, Pavel Begunkov wrote:
>> diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c
>> index a1071cdefb5a..56286995e2b8 100644
>> --- a/lib/percpu-refcount.c
>> +++ b/lib/percpu-refcount.c
>> @@ -425,6 +425,32 @@ bool percpu_ref_is_zero(struct percpu_ref *ref)
>>  }
>>  EXPORT_SYMBOL_GPL(percpu_ref_is_zero);
>>  
>> +/**
>> + * percpu_ref_atomic_count - returns number of left references
>> + * @ref: percpu_ref to test
>> + *
>> + * This function is safe to call as long as @ref is switch into atomic mode,
>> + * and is between init and exit.
>> + */
>
> How about using the name percpu_ref_read() instead of
> percpu_ref_atomic_count()?

Not sure we're going that route, but in any case, I think it's important
to have it visibly require the ref to be in atomic mode. Maybe
percpu_ref_read_atomic() would be better, but I do think 'atomic' has to
be in the name.

-- 
Jens Axboe
* [PATCH 2/2] io_uring: fix shared sqpoll cancellation hangs
From: Pavel Begunkov @ 2021-04-16  0:22 UTC
To: Jens Axboe, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

[  736.982891] INFO: task iou-sqp-4294:4295 blocked for more than 122 seconds.
[  736.982897] Call Trace:
[  736.982901]  schedule+0x68/0xe0
[  736.982903]  io_uring_cancel_sqpoll+0xdb/0x110
[  736.982908]  io_sqpoll_cancel_cb+0x24/0x30
[  736.982911]  io_run_task_work_head+0x28/0x50
[  736.982913]  io_sq_thread+0x4e3/0x720

We call io_uring_cancel_sqpoll() one by one for each ctx either in
sq_thread() itself or via task works, and it's intended to cancel all
requests of a specified context. However the function uses per-task
counters to track the number of inflight requests, so it counts more
requests than available via the current io_uring ctx and goes to sleep
for them to appear (e.g. from IRQ), which will never happen.

Reported-by: Joakim Hassila <[email protected]>
Reported-by: Jens Axboe <[email protected]>
Fixes: 37d1e2e3642e2 ("io_uring: move SQPOLL thread io-wq forked worker")
Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index dff34975d86b..c1c843b044c0 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -9000,10 +9000,11 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 
 	WARN_ON_ONCE(!sqd || ctx->sq_data->thread != current);
 
+	percpu_ref_switch_to_atomic_sync(&ctx->refs);
 	atomic_inc(&tctx->in_idle);
 	do {
 		/* read completions before cancelations */
-		inflight = tctx_inflight(tctx);
+		inflight = percpu_ref_atomic_count(&ctx->refs);
 		if (!inflight)
 			break;
 		io_uring_try_cancel_requests(ctx, current, NULL);
@@ -9014,7 +9015,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx)
 		 * avoids a race where a completion comes in before we did
 		 * prepare_to_wait().
 		 */
-		if (inflight == tctx_inflight(tctx))
+		if (inflight == percpu_ref_atomic_count(&ctx->refs))
 			schedule();
 		finish_wait(&tctx->wait, &wait);
 	} while (1);
-- 
2.24.0
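To spell out the hang scenario the commit message describes (an
illustrative note, not kernel code): with a shared SQPOLL thread, one task
drives several rings, so the per-task counter mixes them.

	/*
	 * Illustration only: two rings share one sqpoll task.
	 *
	 *     tctx_inflight(tctx) == inflight(ctx_A) + inflight(ctx_B)
	 *
	 * io_uring_cancel_sqpoll(ctx_A) can only cancel ctx_A's requests, so
	 * waiting for the per-task total to reach zero hangs for as long as
	 * ctx_B still has requests in flight; counting only ctx_A's own
	 * references (ctx->refs), as in the patch above, cannot get stuck
	 * that way.
	 */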
* Re: [PATCH 0/2] fix hangs with shared sqpoll
From: Pavel Begunkov @ 2021-04-16  0:26 UTC
To: Jens Axboe, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 16/04/2021 01:22, Pavel Begunkov wrote:
> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.

1/2 is basically a rip-off of one of Jens' old patches, but I can't find
it anywhere. If you still have it, especially if it was reviewed/etc.,
it may make sense to go with it instead.

> 
> Pavel Begunkov (2):
>   percpu_ref: add percpu_ref_atomic_count()
>   io_uring: fix shared sqpoll cancellation hangs
> 
>  fs/io_uring.c                   |  5 +++--
>  include/linux/percpu-refcount.h |  1 +
>  lib/percpu-refcount.c           | 26 ++++++++++++++++++++++++++
>  3 files changed, 30 insertions(+), 2 deletions(-)
> 

-- 
Pavel Begunkov
* Re: [PATCH 0/2] fix hangs with shared sqpoll 2021-04-16 0:26 ` [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov @ 2021-04-16 13:04 ` Jens Axboe 2021-04-16 13:12 ` Pavel Begunkov 0 siblings, 1 reply; 17+ messages in thread From: Jens Axboe @ 2021-04-16 13:04 UTC (permalink / raw) To: Pavel Begunkov, io-uring Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila On 4/15/21 6:26 PM, Pavel Begunkov wrote: > On 16/04/2021 01:22, Pavel Begunkov wrote: >> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. > > 1/2 is basically a rip off of one of old Jens' patches, but can't > find it anywhere. If you still have it, especially if it was > reviewed/etc., may make sense to go with it instead I wonder if we can do something like the below instead - we don't care about a particularly stable count in terms of wakeup reliance, and it'd save a nasty sync atomic switch. Totally untested... diff --git a/fs/io_uring.c b/fs/io_uring.c index 6c182a3a221b..9edbcf01ea49 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -8928,7 +8928,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx) atomic_inc(&tctx->in_idle); do { /* read completions before cancelations */ - inflight = tctx_inflight(tctx, false); + inflight = percpu_ref_sum(&ctx->refs); if (!inflight) break; io_uring_try_cancel_requests(ctx, current, NULL); @@ -8939,7 +8939,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx) * avoids a race where a completion comes in before we did * prepare_to_wait(). */ - if (inflight == tctx_inflight(tctx, false)) + if (inflight == percpu_ref_sum(&ctx->refs)) schedule(); finish_wait(&tctx->wait, &wait); } while (1); diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h index 16c35a728b4c..2f29f34bc993 100644 --- a/include/linux/percpu-refcount.h +++ b/include/linux/percpu-refcount.h @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, void percpu_ref_resurrect(struct percpu_ref *ref); void percpu_ref_reinit(struct percpu_ref *ref); bool percpu_ref_is_zero(struct percpu_ref *ref); +long percpu_ref_sum(struct percpu_ref *ref); /** * percpu_ref_kill - drop the initial ref diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c index a1071cdefb5a..b09ed9fdd32d 100644 --- a/lib/percpu-refcount.c +++ b/lib/percpu-refcount.c @@ -475,3 +475,31 @@ void percpu_ref_resurrect(struct percpu_ref *ref) spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); } EXPORT_SYMBOL_GPL(percpu_ref_resurrect); + +/** + * percpu_ref_sum - return approximate ref counts + * @ref: perpcu_ref to sum + * + * Note that this should only really be used to compare refs, as by the + * very nature of percpu references, the value may be stale even before it + * has been returned. + */ +long percpu_ref_sum(struct percpu_ref *ref) +{ + unsigned long __percpu *percpu_count; + long ret; + + rcu_read_lock(); + if (__ref_is_percpu(ref, &percpu_count)) { + ret = atomic_long_read(&ref->data->count); + } else { + int cpu; + + ret = 0; + for_each_possible_cpu(cpu) + ret += *per_cpu_ptr(percpu_count, cpu); + } + rcu_read_unlock(); + + return ret; +} -- Jens Axboe ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] fix hangs with shared sqpoll 2021-04-16 13:04 ` Jens Axboe @ 2021-04-16 13:12 ` Pavel Begunkov 2021-04-16 13:58 ` Jens Axboe 0 siblings, 1 reply; 17+ messages in thread From: Pavel Begunkov @ 2021-04-16 13:12 UTC (permalink / raw) To: Jens Axboe, io-uring Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila On 16/04/2021 14:04, Jens Axboe wrote: > On 4/15/21 6:26 PM, Pavel Begunkov wrote: >> On 16/04/2021 01:22, Pavel Begunkov wrote: >>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. >> >> 1/2 is basically a rip off of one of old Jens' patches, but can't >> find it anywhere. If you still have it, especially if it was >> reviewed/etc., may make sense to go with it instead > > I wonder if we can do something like the below instead - we don't > care about a particularly stable count in terms of wakeup > reliance, and it'd save a nasty sync atomic switch. But we care about it being monotonous. There are nuances with it. I think, non sync'ed summing may put it to eternal sleep. Are you looking to save on switching? It's almost always is already dying with prior ref_kill > > Totally untested... > > > diff --git a/fs/io_uring.c b/fs/io_uring.c > index 6c182a3a221b..9edbcf01ea49 100644 > --- a/fs/io_uring.c > +++ b/fs/io_uring.c > @@ -8928,7 +8928,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx) > atomic_inc(&tctx->in_idle); > do { > /* read completions before cancelations */ > - inflight = tctx_inflight(tctx, false); > + inflight = percpu_ref_sum(&ctx->refs); > if (!inflight) > break; > io_uring_try_cancel_requests(ctx, current, NULL); > @@ -8939,7 +8939,7 @@ static void io_uring_cancel_sqpoll(struct io_ring_ctx *ctx) > * avoids a race where a completion comes in before we did > * prepare_to_wait(). > */ > - if (inflight == tctx_inflight(tctx, false)) > + if (inflight == percpu_ref_sum(&ctx->refs)) > schedule(); > finish_wait(&tctx->wait, &wait); > } while (1); > diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h > index 16c35a728b4c..2f29f34bc993 100644 > --- a/include/linux/percpu-refcount.h > +++ b/include/linux/percpu-refcount.h > @@ -131,6 +131,7 @@ void percpu_ref_kill_and_confirm(struct percpu_ref *ref, > void percpu_ref_resurrect(struct percpu_ref *ref); > void percpu_ref_reinit(struct percpu_ref *ref); > bool percpu_ref_is_zero(struct percpu_ref *ref); > +long percpu_ref_sum(struct percpu_ref *ref); > > /** > * percpu_ref_kill - drop the initial ref > diff --git a/lib/percpu-refcount.c b/lib/percpu-refcount.c > index a1071cdefb5a..b09ed9fdd32d 100644 > --- a/lib/percpu-refcount.c > +++ b/lib/percpu-refcount.c > @@ -475,3 +475,31 @@ void percpu_ref_resurrect(struct percpu_ref *ref) > spin_unlock_irqrestore(&percpu_ref_switch_lock, flags); > } > EXPORT_SYMBOL_GPL(percpu_ref_resurrect); > + > +/** > + * percpu_ref_sum - return approximate ref counts > + * @ref: perpcu_ref to sum > + * > + * Note that this should only really be used to compare refs, as by the > + * very nature of percpu references, the value may be stale even before it > + * has been returned. 
> + */ > +long percpu_ref_sum(struct percpu_ref *ref) > +{ > + unsigned long __percpu *percpu_count; > + long ret; > + > + rcu_read_lock(); > + if (__ref_is_percpu(ref, &percpu_count)) { > + ret = atomic_long_read(&ref->data->count); > + } else { > + int cpu; > + > + ret = 0; > + for_each_possible_cpu(cpu) > + ret += *per_cpu_ptr(percpu_count, cpu); > + } > + rcu_read_unlock(); > + > + return ret; > +} > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] fix hangs with shared sqpoll
From: Jens Axboe @ 2021-04-16 13:58 UTC
To: Pavel Begunkov, io-uring
Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila

On 4/16/21 7:12 AM, Pavel Begunkov wrote:
> On 16/04/2021 14:04, Jens Axboe wrote:
>> On 4/15/21 6:26 PM, Pavel Begunkov wrote:
>>> On 16/04/2021 01:22, Pavel Begunkov wrote:
>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer.
>>>
>>> 1/2 is basically a rip off of one of old Jens' patches, but can't
>>> find it anywhere. If you still have it, especially if it was
>>> reviewed/etc., may make sense to go with it instead
>>
>> I wonder if we can do something like the below instead - we don't
>> care about a particularly stable count in terms of wakeup
>> reliance, and it'd save a nasty sync atomic switch.
>
> But we care about it being monotonous. There are nuances with it.

Do we, though? We care about it changing when something has happened,
but not about it being monotonic.

> I think, non sync'ed summing may put it to eternal sleep.

That's what the two reads are about, that's the same as before. The
numbers are racy in both cases, but that's why we compare after having
added ourselves to the wait queue.

> Are you looking to save on switching? It's almost always is already
> dying with prior ref_kill

Yep, always looking to avoid a sync switch if at all possible. For 99%
of the cases it's fine, it's the last case in busy prod that wreaks
havoc.

-- 
Jens Axboe
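The two-read pattern Jens refers to, written out as a generic sketch (the
read_inflight()/cancel_outstanding_requests() helpers are placeholders for
illustration, not real io_uring functions):

	/* sketch of the lost-wakeup avoidance loop used by the cancellation code */
	for (;;) {
		inflight = read_inflight();		/* first read */
		if (!inflight)
			break;
		cancel_outstanding_requests();

		prepare_to_wait(&tctx->wait, &wait, TASK_UNINTERRUPTIBLE);
		/*
		 * Re-read after queueing on the waitqueue: if a completion
		 * raced in and changed the count, skip schedule() and loop
		 * again instead of sleeping on a wakeup that already fired.
		 */
		if (inflight == read_inflight())	/* second read */
			schedule();
		finish_wait(&tctx->wait, &wait);
	}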
* Re: [PATCH 0/2] fix hangs with shared sqpoll 2021-04-16 13:58 ` Jens Axboe @ 2021-04-16 14:09 ` Pavel Begunkov 2021-04-16 14:42 ` Pavel Begunkov [not found] ` <[email protected]> 0 siblings, 2 replies; 17+ messages in thread From: Pavel Begunkov @ 2021-04-16 14:09 UTC (permalink / raw) To: Jens Axboe, io-uring Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila On 16/04/2021 14:58, Jens Axboe wrote: > On 4/16/21 7:12 AM, Pavel Begunkov wrote: >> On 16/04/2021 14:04, Jens Axboe wrote: >>> On 4/15/21 6:26 PM, Pavel Begunkov wrote: >>>> On 16/04/2021 01:22, Pavel Begunkov wrote: >>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. >>>> >>>> 1/2 is basically a rip off of one of old Jens' patches, but can't >>>> find it anywhere. If you still have it, especially if it was >>>> reviewed/etc., may make sense to go with it instead >>> >>> I wonder if we can do something like the below instead - we don't >>> care about a particularly stable count in terms of wakeup >>> reliance, and it'd save a nasty sync atomic switch. >> >> But we care about it being monotonous. There are nuances with it. > > Do we, though? We care about it changing when something has happened, > but not about it being monotonic. We may find inflight == get_inflight(), when it's not really so, and so get to schedule() awhile there are pending requests that are not going to be cancelled by itself. And those pending requests may have been non-discoverable and so non-cancellable, e.g. because were a part of a ling/hardlink. >> I think, non sync'ed summing may put it to eternal sleep. > > That's what the two reads are about, that's the same as before. The > numbers are racy in both cases, but that's why we compare after having > added ourselves to the wait queue. > >> Are you looking to save on switching? It's almost always is already >> dying with prior ref_kill > > Yep, always looking to avoid a sync switch if at all possible. For 99% > of the cases it's fine, it's the last case in busy prod that wreaks > havoc. Limited to sqpoll, so I wouldn't worry. Also considering that sqpoll doesn't have many file notes (as it was called before). We can completely avoid it and even make faster if happens from sq_thread() on it getting to exit, but do we want it for 5.12? -- Pavel Begunkov ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/2] fix hangs with shared sqpoll 2021-04-16 14:09 ` Pavel Begunkov @ 2021-04-16 14:42 ` Pavel Begunkov [not found] ` <[email protected]> 1 sibling, 0 replies; 17+ messages in thread From: Pavel Begunkov @ 2021-04-16 14:42 UTC (permalink / raw) To: Jens Axboe, io-uring Cc: Dennis Zhou, Tejun Heo, Christoph Lameter, Joakim Hassila On 16/04/2021 15:09, Pavel Begunkov wrote: > On 16/04/2021 14:58, Jens Axboe wrote: >> On 4/16/21 7:12 AM, Pavel Begunkov wrote: >>> On 16/04/2021 14:04, Jens Axboe wrote: >>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote: >>>>> On 16/04/2021 01:22, Pavel Begunkov wrote: >>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. >>>>> >>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't >>>>> find it anywhere. If you still have it, especially if it was >>>>> reviewed/etc., may make sense to go with it instead >>>> >>>> I wonder if we can do something like the below instead - we don't >>>> care about a particularly stable count in terms of wakeup >>>> reliance, and it'd save a nasty sync atomic switch. >>> >>> But we care about it being monotonous. There are nuances with it. >> >> Do we, though? We care about it changing when something has happened, >> but not about it being monotonic. > > We may find inflight == get_inflight(), when it's not really so, > and so get to schedule() awhile there are pending requests that > are not going to be cancelled by itself. And those pending requests > may have been non-discoverable and so non-cancellable, e.g. because > were a part of a ling/hardlink. Anyway, there might be other problems because of how wake_up()'s and ctx->refs putting is ordered. Needs to be remade, probably without ctx->refs in the first place. >>> I think, non sync'ed summing may put it to eternal sleep. >> >> That's what the two reads are about, that's the same as before. The >> numbers are racy in both cases, but that's why we compare after having >> added ourselves to the wait queue. >> >>> Are you looking to save on switching? It's almost always is already >>> dying with prior ref_kill >> >> Yep, always looking to avoid a sync switch if at all possible. For 99% >> of the cases it's fine, it's the last case in busy prod that wreaks >> havoc. > > Limited to sqpoll, so I wouldn't worry. Also considering that sqpoll > doesn't have many file notes (as it was called before). We can > completely avoid it and even make faster if happens from sq_thread() > on it getting to exit, but do we want it for 5.12? > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 17+ messages in thread
[parent not found: <[email protected]>]
* Re: [PATCH 0/2] fix hangs with shared sqpoll [not found] ` <[email protected]> @ 2021-04-18 13:56 ` Pavel Begunkov 0 siblings, 0 replies; 17+ messages in thread From: Pavel Begunkov @ 2021-04-18 13:56 UTC (permalink / raw) To: Hillf Danton; +Cc: Jens Axboe, io-uring On 4/17/21 2:31 AM, Hillf Danton wrote: > On Fri, 16 Apr 2021 15:42:07 Pavel Begunkov wrote: >> On 16/04/2021 15:09, Pavel Begunkov wrote: >>> On 16/04/2021 14:58, Jens Axboe wrote: >>>> On 4/16/21 7:12 AM, Pavel Begunkov wrote: >>>>> On 16/04/2021 14:04, Jens Axboe wrote: >>>>>> On 4/15/21 6:26 PM, Pavel Begunkov wrote: >>>>>>> On 16/04/2021 01:22, Pavel Begunkov wrote: >>>>>>>> Late catched 5.12 bug with nasty hangs. Thanks Jens for a reproducer. >>>>>>> >>>>>>> 1/2 is basically a rip off of one of old Jens' patches, but can't >>>>>>> find it anywhere. If you still have it, especially if it was >>>>>>> reviewed/etc., may make sense to go with it instead >>>>>> >>>>>> I wonder if we can do something like the below instead - we don't >>>>>> care about a particularly stable count in terms of wakeup >>>>>> reliance, and it'd save a nasty sync atomic switch. >>>>> >>>>> But we care about it being monotonous. There are nuances with it. >>>> >>>> Do we, though? We care about it changing when something has happened, >>>> but not about it being monotonic. >>> >>> We may find inflight == get_inflight(), when it's not really so, >>> and so get to schedule() awhile there are pending requests that >>> are not going to be cancelled by itself. And those pending requests >>> may have been non-discoverable and so non-cancellable, e.g. because >>> were a part of a ling/hardlink. >> >> Anyway, there might be other problems because of how wake_up()'s >> and ctx->refs putting is ordered. Needs to be remade, probably >> without ctx->refs in the first place. >> > Given the test rounds in the current tree, next tree and his tree the Whose "his" tree? > percpu count had survived, one of the quick questions is how it fell apart > last night? What "percpu count had survived"? Do you mean the percpu-related patch from the series? What fell apart? -- Pavel Begunkov ^ permalink raw reply [flat|nested] 17+ messages in thread
Thread overview: 17+ messages
2021-04-16 0:22 [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
2021-04-16 0:22 ` [PATCH 1/2] percpu_ref: add percpu_ref_atomic_count() Pavel Begunkov
2021-04-16 4:45 ` Dennis Zhou
2021-04-16 13:16 ` Pavel Begunkov
2021-04-16 14:10 ` Ming Lei
2021-04-16 14:37 ` Dennis Zhou
2021-04-19 2:03 ` Ming Lei
2021-04-16 15:31 ` Bart Van Assche
2021-04-16 15:34 ` Jens Axboe
2021-04-16 0:22 ` [PATCH 2/2] io_uring: fix shared sqpoll cancellation hangs Pavel Begunkov
2021-04-16 0:26 ` [PATCH 0/2] fix hangs with shared sqpoll Pavel Begunkov
2021-04-16 13:04 ` Jens Axboe
2021-04-16 13:12 ` Pavel Begunkov
2021-04-16 13:58 ` Jens Axboe
2021-04-16 14:09 ` Pavel Begunkov
2021-04-16 14:42 ` Pavel Begunkov
[not found] ` <[email protected]>
2021-04-18 13:56 ` Pavel Begunkov