public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH 0/2] Further async re-queue tightening/fix
@ 2021-07-27 16:58 Jens Axboe
  2021-07-27 16:58 ` [PATCH 1/2] io_uring: always reissue from task_work context Jens Axboe
  2021-07-27 16:58 ` [PATCH 2/2] io_uring: don't block level reissue off completion path Jens Axboe
  0 siblings, 2 replies; 6+ messages in thread
From: Jens Axboe @ 2021-07-27 16:58 UTC (permalink / raw)
  To: io-uring; +Cc: f.ebner

Hi,

Two small patches to ensure we always requeue off task_work, and a
sanity correction on when it is OK to do so.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] io_uring: always reissue from task_work context
  2021-07-27 16:58 [PATCH 0/2] Further async re-queue tightening/fix Jens Axboe
@ 2021-07-27 16:58 ` Jens Axboe
  2021-07-27 16:58 ` [PATCH 2/2] io_uring: don't block level reissue off completion path Jens Axboe
  1 sibling, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2021-07-27 16:58 UTC (permalink / raw)
  To: io-uring; +Cc: f.ebner, Jens Axboe

As a safeguard, if we're going to queue async work, do it from task_work
from the original task. This ensures that we can always sanely create
threads, regards of what the reissue context may be.

Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index a4331deb0427..6ba101cd4661 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2060,6 +2060,12 @@ static void io_req_task_queue(struct io_kiocb *req)
 	io_req_task_work_add(req);
 }
 
+static void io_req_task_queue_reissue(struct io_kiocb *req)
+{
+	req->io_task_work.func = io_queue_async_work;
+	io_req_task_work_add(req);
+}
+
 static inline void io_queue_next(struct io_kiocb *req)
 {
 	struct io_kiocb *nxt = io_req_find_next(req);
@@ -2248,7 +2254,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
 		    !(req->flags & REQ_F_DONT_REISSUE)) {
 			req->iopoll_completed = 0;
 			req_ref_get(req);
-			io_queue_async_work(req);
+			io_req_task_queue_reissue(req);
 			continue;
 		}
 
@@ -2771,7 +2777,7 @@ static void kiocb_done(struct kiocb *kiocb, ssize_t ret,
 		req->flags &= ~REQ_F_REISSUE;
 		if (io_resubmit_prep(req)) {
 			req_ref_get(req);
-			io_queue_async_work(req);
+			io_req_task_queue_reissue(req);
 		} else {
 			int cflags = 0;
 
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] io_uring: don't block level reissue off completion path
  2021-07-27 16:58 [PATCH 0/2] Further async re-queue tightening/fix Jens Axboe
  2021-07-27 16:58 ` [PATCH 1/2] io_uring: always reissue from task_work context Jens Axboe
@ 2021-07-27 16:58 ` Jens Axboe
  2021-07-28  9:26   ` Fabian Ebner
  1 sibling, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2021-07-27 16:58 UTC (permalink / raw)
  To: io-uring; +Cc: f.ebner, Jens Axboe, stable

Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
completion path. Normally we expect this to happen inline as part
of submission, but apparently SCSI has a weird corner case where it
can happen as part of normal completions.

This should be solved by having the -EAGAIN bubble back up the stack
as part of submission, but previous attempts at this failed and we're
not just quite there yet. Instead we currently use REQ_F_REISSUE to
handle this case.

For now, catch it in io_rw_should_reissue() and prevent a reissue
from a bogus path.

Cc: [email protected]
Reported-by: Fabian Ebner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
---
 fs/io_uring.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 6ba101cd4661..83f67d33bf67 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2447,6 +2447,12 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
 	 */
 	if (percpu_ref_is_dying(&ctx->refs))
 		return false;
+	/*
+	 * Play it safe and assume not safe to re-import and reissue if we're
+	 * not in the original thread group (or in task context).
+	 */
+	if (!same_thread_group(req->task, current) || !in_task())
+		return false;
 	return true;
 }
 #else
-- 
2.32.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] io_uring: don't block level reissue off completion path
  2021-07-27 16:58 ` [PATCH 2/2] io_uring: don't block level reissue off completion path Jens Axboe
@ 2021-07-28  9:26   ` Fabian Ebner
  2021-07-28 13:23     ` Jens Axboe
  0 siblings, 1 reply; 6+ messages in thread
From: Fabian Ebner @ 2021-07-28  9:26 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: stable

Am 27.07.21 um 18:58 schrieb Jens Axboe:
> Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
> completion path. Normally we expect this to happen inline as part
> of submission, but apparently SCSI has a weird corner case where it
> can happen as part of normal completions.
> 
> This should be solved by having the -EAGAIN bubble back up the stack
> as part of submission, but previous attempts at this failed and we're
> not just quite there yet. Instead we currently use REQ_F_REISSUE to
> handle this case.
> 
> For now, catch it in io_rw_should_reissue() and prevent a reissue
> from a bogus path.
> 
> Cc: [email protected]
> Reported-by: Fabian Ebner <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>   fs/io_uring.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 6ba101cd4661..83f67d33bf67 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -2447,6 +2447,12 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
>   	 */
>   	if (percpu_ref_is_dying(&ctx->refs))
>   		return false;
> +	/*
> +	 * Play it safe and assume not safe to re-import and reissue if we're
> +	 * not in the original thread group (or in task context).
> +	 */
> +	if (!same_thread_group(req->task, current) || !in_task())
> +		return false;
>   	return true;
>   }
>   #else
> 

Hi,

thank you for the fix! This does indeed prevent the panic (with 5.11.22) 
and hang (with 5.13.3) with my problematic workload.

Best Regards,
Fabian


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] io_uring: don't block level reissue off completion path
  2021-07-28  9:26   ` Fabian Ebner
@ 2021-07-28 13:23     ` Jens Axboe
  2021-07-29  6:50       ` Fabian Ebner
  0 siblings, 1 reply; 6+ messages in thread
From: Jens Axboe @ 2021-07-28 13:23 UTC (permalink / raw)
  To: Fabian Ebner, io-uring; +Cc: stable

On 7/28/21 3:26 AM, Fabian Ebner wrote:
> Am 27.07.21 um 18:58 schrieb Jens Axboe:
>> Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
>> completion path. Normally we expect this to happen inline as part
>> of submission, but apparently SCSI has a weird corner case where it
>> can happen as part of normal completions.
>>
>> This should be solved by having the -EAGAIN bubble back up the stack
>> as part of submission, but previous attempts at this failed and we're
>> not just quite there yet. Instead we currently use REQ_F_REISSUE to
>> handle this case.
>>
>> For now, catch it in io_rw_should_reissue() and prevent a reissue
>> from a bogus path.
>>
>> Cc: [email protected]
>> Reported-by: Fabian Ebner <[email protected]>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>   fs/io_uring.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 6ba101cd4661..83f67d33bf67 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -2447,6 +2447,12 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
>>   	 */
>>   	if (percpu_ref_is_dying(&ctx->refs))
>>   		return false;
>> +	/*
>> +	 * Play it safe and assume not safe to re-import and reissue if we're
>> +	 * not in the original thread group (or in task context).
>> +	 */
>> +	if (!same_thread_group(req->task, current) || !in_task())
>> +		return false;
>>   	return true;
>>   }
>>   #else
>>
> 
> Hi,
> 
> thank you for the fix! This does indeed prevent the panic (with 5.11.22) 
> and hang (with 5.13.3) with my problematic workload.

Perfect, thanks for re-testing! Can I add your Tested-by to the patch?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] io_uring: don't block level reissue off completion path
  2021-07-28 13:23     ` Jens Axboe
@ 2021-07-29  6:50       ` Fabian Ebner
  0 siblings, 0 replies; 6+ messages in thread
From: Fabian Ebner @ 2021-07-29  6:50 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: stable

Am 28.07.21 um 15:23 schrieb Jens Axboe:
> On 7/28/21 3:26 AM, Fabian Ebner wrote:
>> Am 27.07.21 um 18:58 schrieb Jens Axboe:
>>> Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
>>> completion path. Normally we expect this to happen inline as part
>>> of submission, but apparently SCSI has a weird corner case where it
>>> can happen as part of normal completions.
>>>
>>> This should be solved by having the -EAGAIN bubble back up the stack
>>> as part of submission, but previous attempts at this failed and we're
>>> not just quite there yet. Instead we currently use REQ_F_REISSUE to
>>> handle this case.
>>>
>>> For now, catch it in io_rw_should_reissue() and prevent a reissue
>>> from a bogus path.
>>>
>>> Cc: [email protected]
>>> Reported-by: Fabian Ebner <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> ---
>>>    fs/io_uring.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 6ba101cd4661..83f67d33bf67 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -2447,6 +2447,12 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
>>>    	 */
>>>    	if (percpu_ref_is_dying(&ctx->refs))
>>>    		return false;
>>> +	/*
>>> +	 * Play it safe and assume not safe to re-import and reissue if we're
>>> +	 * not in the original thread group (or in task context).
>>> +	 */
>>> +	if (!same_thread_group(req->task, current) || !in_task())
>>> +		return false;
>>>    	return true;
>>>    }
>>>    #else
>>>
>>
>> Hi,
>>
>> thank you for the fix! This does indeed prevent the panic (with 5.11.22)
>> and hang (with 5.13.3) with my problematic workload.
> 
> Perfect, thanks for re-testing! Can I add your Tested-by to the patch?
> 

Sure, feel free to do so.

Best Regards,
Fabian


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-07-29  6:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-07-27 16:58 [PATCH 0/2] Further async re-queue tightening/fix Jens Axboe
2021-07-27 16:58 ` [PATCH 1/2] io_uring: always reissue from task_work context Jens Axboe
2021-07-27 16:58 ` [PATCH 2/2] io_uring: don't block level reissue off completion path Jens Axboe
2021-07-28  9:26   ` Fabian Ebner
2021-07-28 13:23     ` Jens Axboe
2021-07-29  6:50       ` Fabian Ebner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox