[PATCH v4] io_uring: reduce latency by reissueing the operation

public inbox for [email protected]

* [PATCH v4] io_uring: reduce latency by reissueing the operation
@ 2021-06-22 12:17 Olivier Langlois
  2021-06-22 17:54 ` Pavel Begunkov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Olivier Langlois @ 2021-06-22 12:17 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring, linux-kernel; +Cc: Olivier Langlois

When an operation fails with EAGAIN, it is quite frequent that the
data becomes available between that failure and the vfs_poll() call
made by io_arm_poll_handler().

Detecting this situation and reissuing the operation is much faster
than going ahead and pushing the operation to io-wq.
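The race described above has a familiar userspace analogue: a non-blocking read() that fails with EAGAIN just before data arrives. The sketch below is not io_uring code. read_or_retry() and arrive_now() are hypothetical names, and the zero-timeout poll() only stands in for the vfs_poll() check made while arming the poll handler. If the fd is still not ready, the sketch returns -EAGAIN and leaves deferral to the caller, roughly as the patch falls back to the armed poll or io-wq.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Try a non-blocking read; on EAGAIN, check readiness once with a
 * zero-timeout poll() and reissue the read right away if data showed
 * up in the meantime. race_hook (may be NULL) runs inside that window
 * so a test can make data arrive there. Returns bytes read, -EAGAIN if
 * the caller should defer (arm a poll or hand off to a worker), or
 * another -errno value.
 */
static ssize_t read_or_retry(int fd, void *buf, size_t len,
			     void (*race_hook)(void *), void *arg,
			     int *retried)
{
	ssize_t n = read(fd, buf, len);

	*retried = 0;
	if (n >= 0)
		return n;
	if (errno != EAGAIN)
		return -errno;

	if (race_hook)
		race_hook(arg);		/* data may arrive right here */

	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	if (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN)) {
		*retried = 1;		/* ready after all: reissue inline */
		n = read(fd, buf, len);
		return n >= 0 ? n : -errno;
	}
	return -EAGAIN;			/* still not ready: defer */
}

/* Hypothetical test helper: writes one byte to the fd pointed to by arg. */
static void arrive_now(void *arg)
{
	ssize_t w = write(*(int *)arg, "x", 1);
	(void)w;
}
```

For example, with the read end of a pipe set to O_NONBLOCK, calling read_or_retry() on the empty pipe with no hook returns -EAGAIN, while passing arrive_now and the write end as the hook argument returns 1 with *retried set.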

Performance testing was done with a single thread, one TCP connection
receiving a 5 Mbps stream, and no sqpoll.

4 measurements have been taken:
1. The time it takes to process a read request when data is already available
2. The time it takes to process a read request by calling io_issue_sqe() a second time after vfs_poll() indicated that data was available
3. The time it takes to execute io_queue_async_work()
4. The time it takes to complete a read request asynchronously

2.25% of all the read operations used the new path.

Results (all times in ns):

                       avg                 min     max  stddev
ready data (baseline)  3657.94182918628    580   20098  1213.15975908162
reissue completion     7882.67567567568   2316   28811  1982.79172973284
insert io-wq time      8983.82276995305   3324   87816  2551.60056552038
async time completion  24670.4758861127  10758  102612  3483.92416873804

Conclusion:
On average, reissuing the sqe with the patch is 1.1 usec faster than
placing the request on io-wq, and 59 usec faster in the worst case.

On average, completion by reissuing the sqe is 16.79 usec faster than
async completion, and 73.8 usec faster in the worst case.
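The conclusion's figures are differences between the "reissue completion" row and the other two rows, divided by 1000; that the arithmetic only matches when the raw values are nanoseconds is also why the table is labelled in ns. A small check, with a hypothetical struct and helper name:

```c
/* Latency figures from the results table, in nanoseconds. */
struct lat { double avg, max; };

static const struct lat reissue_completion = {  7882.67567567568,  28811 };
static const struct lat iowq_insert        = {  8983.82276995305,  87816 };
static const struct lat async_completion   = { 24670.4758861127,  102612 };

/* How much faster a is than b, converted from ns to microseconds. */
static double faster_us(double a_ns, double b_ns)
{
	return (b_ns - a_ns) / 1000.0;
}
```

faster_us(reissue_completion.avg, iowq_insert.avg) gives about 1.10, the maxima give 59.005, and against async_completion the results are about 16.79 (avg) and 73.8 (max). Note that the first comparison sets a full reissue-and-complete against only the cost of queuing to io-wq, and the worst-case figures compare maxima of separately measured samples rather than paired ones.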

Signed-off-by: Olivier Langlois <[email protected]>
---
 fs/io_uring.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index fc8637f591a6..5efa67c2f974 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5152,7 +5152,13 @@ static __poll_t __io_arm_poll_handler(struct io_kiocb *req,
 	return mask;
 }
 
-static bool io_arm_poll_handler(struct io_kiocb *req)
+enum {
+	IO_APOLL_OK,
+	IO_APOLL_ABORTED,
+	IO_APOLL_READY
+};
+
+static int io_arm_poll_handler(struct io_kiocb *req)
 {
 	const struct io_op_def *def = &io_op_defs[req->opcode];
 	struct io_ring_ctx *ctx = req->ctx;
@@ -5162,22 +5168,22 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 	int rw;
 
 	if (!req->file || !file_can_poll(req->file))
-		return false;
+		return IO_APOLL_ABORTED;
 	if (req->flags & REQ_F_POLLED)
-		return false;
+		return IO_APOLL_ABORTED;
 	if (def->pollin)
 		rw = READ;
 	else if (def->pollout)
 		rw = WRITE;
 	else
-		return false;
+		return IO_APOLL_ABORTED;
 	/* if we can't nonblock try, then no point in arming a poll handler */
 	if (!io_file_supports_async(req, rw))
-		return false;
+		return IO_APOLL_ABORTED;
 
 	apoll = kmalloc(sizeof(*apoll), GFP_ATOMIC);
 	if (unlikely(!apoll))
-		return false;
+		return IO_APOLL_ABORTED;
 	apoll->double_poll = NULL;
 
 	req->flags |= REQ_F_POLLED;
@@ -5203,12 +5209,14 @@ static bool io_arm_poll_handler(struct io_kiocb *req)
 	if (ret || ipt.error) {
 		io_poll_remove_double(req);
 		spin_unlock_irq(&ctx->completion_lock);
-		return false;
+		if (ret)
+			return IO_APOLL_READY;
+		return IO_APOLL_ABORTED;
 	}
 	spin_unlock_irq(&ctx->completion_lock);
 	trace_io_uring_poll_arm(ctx, req, req->opcode, req->user_data,
 				mask, apoll->poll.events);
-	return true;
+	return IO_APOLL_OK;
 }
 
 static bool __io_poll_remove_one(struct io_kiocb *req,
@@ -6437,6 +6445,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
 	struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
 	int ret;
 
+issue_sqe:
 	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
 
 	/*
@@ -6456,12 +6465,16 @@ static void __io_queue_sqe(struct io_kiocb *req)
 			io_put_req(req);
 		}
 	} else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
-		if (!io_arm_poll_handler(req)) {
+		switch (io_arm_poll_handler(req)) {
+		case IO_APOLL_READY:
+			goto issue_sqe;
+		case IO_APOLL_ABORTED:
 			/*
 			 * Queued up for async execution, worker will release
 			 * submit reference when the iocb is actually submitted.
 			 */
 			io_queue_async_work(req);
+			break;
 		}
 	} else {
 		io_req_complete_failed(req, ret);
-- 
2.32.0
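The control flow the diff introduces can be modeled outside the kernel. In the sketch below the IO_APOLL_* names are copied from the patch, while struct toy_req, toy_arm_poll() and toy_queue_sqe() are hypothetical stand-ins with scripted results instead of real I/O. The model adds an explicit IO_APOLL_OK case, which the patch leaves implicit. It also reflects the REQ_F_POLLED check in the diff: that flag is set on the first arming, so a second EAGAIN after a reissue gets IO_APOLL_ABORTED and is punted rather than looping.

```c
#include <errno.h>

/* Result codes copied from the patch. */
enum { IO_APOLL_OK, IO_APOLL_ABORTED, IO_APOLL_READY };

/* How the patched io_arm_poll_handler() classifies arming: a non-zero
 * vfs_poll() mask means "already ready", an arming error means abort. */
static int apoll_classify(unsigned int mask, int error)
{
	if (mask)
		return IO_APOLL_READY;
	if (error)
		return IO_APOLL_ABORTED;
	return IO_APOLL_OK;
}

/* Hypothetical, scripted stand-in for struct io_kiocb. */
struct toy_req {
	const int *issue_ret;		/* per attempt: 0 or -EAGAIN */
	const unsigned int *poll_mask;	/* per arming: mask vfs_poll() sees */
	int polled;			/* models REQ_F_POLLED */
	int issues, punted, armed, done;
};

static int toy_arm_poll(struct toy_req *r, int arm_idx)
{
	if (r->polled)			/* already armed once: give up */
		return IO_APOLL_ABORTED;
	r->polled = 1;
	return apoll_classify(r->poll_mask[arm_idx], 0);
}

/* Control flow of the patched __io_queue_sqe(), minus real I/O. */
static void toy_queue_sqe(struct toy_req *r)
{
	int arms = 0, ret;

issue_sqe:
	ret = r->issue_ret[r->issues++];
	if (ret == 0) {
		r->done = 1;
	} else if (ret == -EAGAIN) {
		switch (toy_arm_poll(r, arms++)) {
		case IO_APOLL_READY:
			goto issue_sqe;	/* data raced in: reissue inline */
		case IO_APOLL_ABORTED:
			r->punted = 1;	/* stands in for io_queue_async_work() */
			break;
		case IO_APOLL_OK:
			r->armed = 1;	/* completes later on poll wakeup */
			break;
		}
	}
}
```

A request scripted as { -EAGAIN, 0 } with mask 1 completes after two issues without being punted; one scripted as { -EAGAIN } with mask 0 just arms the poll, which is the unchanged pre-patch behaviour.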



* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-22 12:17 [PATCH v4] io_uring: reduce latency by reissueing the operation Olivier Langlois
@ 2021-06-22 17:54 ` Pavel Begunkov
  2021-06-22 18:01   ` Pavel Begunkov
  2021-06-22 20:52 ` Pavel Begunkov
  2021-06-25  0:45 ` Jens Axboe
  2 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-06-22 17:54 UTC
  To: Olivier Langlois, Jens Axboe, io-uring, linux-kernel

On 6/22/21 1:17 PM, Olivier Langlois wrote:
[...]

>  static bool __io_poll_remove_one(struct io_kiocb *req,
> @@ -6437,6 +6445,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>  	struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
>  	int ret;
>  
> +issue_sqe:
>  	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
>  
>  	/*
> @@ -6456,12 +6465,16 @@ static void __io_queue_sqe(struct io_kiocb *req)
>  			io_put_req(req);
>  		}
>  	} else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
> -		if (!io_arm_poll_handler(req)) {
> +		switch (io_arm_poll_handler(req)) {
> +		case IO_APOLL_READY:
> +			goto issue_sqe;
> +		case IO_APOLL_ABORTED:
>  			/*
>  			 * Queued up for async execution, worker will release
>  			 * submit reference when the iocb is actually submitted.
>  			 */
>  			io_queue_async_work(req);
> +			break;

Hmm, why is there a new break here? It will miscount @linked_timeout
if you do that. Every io_prep_linked_timeout() should be matched with
io_queue_linked_timeout().


>  		}
>  	} else {
>  		io_req_complete_failed(req, ret);
> 

-- 
Pavel Begunkov
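The linked-timeout concern, and why the break turns out to be harmless, can be modeled with counters. This sketch assumes, as Pavel's follow-up in this thread suggests, that __io_queue_sqe() queues the linked timeout after the if/else chain rather than returning early; struct lt_counts, enum toy_path and toy_queue_with_timeout() are hypothetical. The prep happens once above the issue_sqe label, so the goto never repeats it, and break only leaves the switch, so every path reaches the common tail.

```c
/* Hypothetical counters standing in for io_prep_linked_timeout()
 * and io_queue_linked_timeout(). */
struct lt_counts { int prepped, queued; };

enum toy_path { TOY_DONE, TOY_READY_THEN_DONE, TOY_PUNT };

static void toy_queue_with_timeout(struct lt_counts *c, enum toy_path p)
{
	int attempts = 0;

	c->prepped++;		/* prep happens once, above the label */
issue_sqe:
	attempts++;
	switch (p) {
	case TOY_READY_THEN_DONE:
		if (attempts == 1)
			goto issue_sqe;	/* IO_APOLL_READY: retry, no re-prep */
		break;
	case TOY_PUNT:
		/* io_queue_async_work(req) would go here */
		break;		/* leaves only the switch */
	case TOY_DONE:
		break;
	}
	c->queued++;		/* common tail, reached on every path */
}
```

Running all three paths leaves prepped and queued equal, which is the pairing Pavel asked about.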


* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-22 17:54 ` Pavel Begunkov
@ 2021-06-22 18:01   ` Pavel Begunkov
  2021-06-22 19:05     ` Olivier Langlois
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2021-06-22 18:01 UTC
  To: Olivier Langlois, Jens Axboe, io-uring, linux-kernel

On 6/22/21 6:54 PM, Pavel Begunkov wrote:
> On 6/22/21 1:17 PM, Olivier Langlois wrote:
> [...]
> 
>>  static bool __io_poll_remove_one(struct io_kiocb *req,
>> @@ -6437,6 +6445,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>  	struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
>>  	int ret;
>>  
>> +issue_sqe:
>>  	ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
>>  
>>  	/*
>> @@ -6456,12 +6465,16 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>  			io_put_req(req);
>>  		}
>>  	} else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
>> -		if (!io_arm_poll_handler(req)) {
>> +		switch (io_arm_poll_handler(req)) {
>> +		case IO_APOLL_READY:
>> +			goto issue_sqe;
>> +		case IO_APOLL_ABORTED:
>>  			/*
>>  			 * Queued up for async execution, worker will release
>>  			 * submit reference when the iocb is actually submitted.
>>  			 */
>>  			io_queue_async_work(req);
>> +			break;
> 
> Hmm, why is there a new break here? It will miscount @linked_timeout
> if you do that. Every io_prep_linked_timeout() should be matched with
> io_queue_linked_timeout().

Never mind, I said some nonsense and apparently need some coffee


>>  		}
>>  	} else {
>>  		io_req_complete_failed(req, ret);
>>
> 

-- 
Pavel Begunkov


* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-22 18:01   ` Pavel Begunkov
@ 2021-06-22 19:05     ` Olivier Langlois
  2021-06-22 20:51       ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Olivier Langlois @ 2021-06-22 19:05 UTC
  To: Pavel Begunkov, Jens Axboe, io-uring, linux-kernel

On Tue, 2021-06-22 at 19:01 +0100, Pavel Begunkov wrote:
> On 6/22/21 6:54 PM, Pavel Begunkov wrote:
> > On 6/22/21 1:17 PM, Olivier Langlois wrote:
> > > 
> > 
> > >  static bool __io_poll_remove_one(struct io_kiocb *req,
> > > @@ -6437,6 +6445,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
> > >         struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
> > >         int ret;
> > >  
> > > +issue_sqe:
> > >         ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
> > >  
> > >         /*
> > > @@ -6456,12 +6465,16 @@ static void __io_queue_sqe(struct io_kiocb *req)
> > >                         io_put_req(req);
> > >                 }
> > >         } else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
> > > -               if (!io_arm_poll_handler(req)) {
> > > +               switch (io_arm_poll_handler(req)) {
> > > +               case IO_APOLL_READY:
> > > +                       goto issue_sqe;
> > > +               case IO_APOLL_ABORTED:
> > >                         /*
> > >                          * Queued up for async execution, worker will release
> > >                          * submit reference when the iocb is actually submitted.
> > >                          */
> > >                         io_queue_async_work(req);
> > > +                       break;
> > 
> > Hmm, why is there a new break here? It will miscount @linked_timeout
> > if you do that. Every io_prep_linked_timeout() should be matched with
> > io_queue_linked_timeout().
> 
> Never mind, I said some nonsense and apparently need some coffee

but this is a pertinent question, imho. I guess you could get away
without it since it is the last case of the switch statement... I am
not sure what the kernel coding style says about that.

However, there was also a break statement at the end of the
IO_APOLL_READY case, and checkpatch.pl complained that it was useless
because it followed a goto statement, so I removed that one.

checkpatch.pl did not complain about the remaining break, which is why
I left it there.

Greetings,




* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-22 19:05     ` Olivier Langlois
@ 2021-06-22 20:51       ` Pavel Begunkov
  0 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-06-22 20:51 UTC
  To: Olivier Langlois, Jens Axboe, io-uring, linux-kernel

On 6/22/21 8:05 PM, Olivier Langlois wrote:
> On Tue, 2021-06-22 at 19:01 +0100, Pavel Begunkov wrote:
>> On 6/22/21 6:54 PM, Pavel Begunkov wrote:
>>> On 6/22/21 1:17 PM, Olivier Langlois wrote:
>>>>
>>>
>>>>  static bool __io_poll_remove_one(struct io_kiocb *req,
>>>> @@ -6437,6 +6445,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>>>         struct io_kiocb *linked_timeout = io_prep_linked_timeout(req);
>>>>         int ret;
>>>>  
>>>> +issue_sqe:
>>>>         ret = io_issue_sqe(req, IO_URING_F_NONBLOCK|IO_URING_F_COMPLETE_DEFER);
>>>>  
>>>>         /*
>>>> @@ -6456,12 +6465,16 @@ static void __io_queue_sqe(struct io_kiocb *req)
>>>>                         io_put_req(req);
>>>>                 }
>>>>         } else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
>>>> -               if (!io_arm_poll_handler(req)) {
>>>> +               switch (io_arm_poll_handler(req)) {
>>>> +               case IO_APOLL_READY:
>>>> +                       goto issue_sqe;
>>>> +               case IO_APOLL_ABORTED:
>>>>                         /*
>>>>                          * Queued up for async execution, worker will release
>>>>                          * submit reference when the iocb is actually submitted.
>>>>                          */
>>>>                         io_queue_async_work(req);
>>>> +                       break;
>>>
>>> Hmm, why is there a new break here? It will miscount @linked_timeout
>>> if you do that. Every io_prep_linked_timeout() should be matched with
>>> io_queue_linked_timeout().
>>
>> Never mind, I said some nonsense and apparently need some coffee
> 
> but this is a pertinent question, imho. I guess that you could get away

I thought that path returned early instead of continuing down to the
end of the function; that was the nonsense part.

> without it since it is the last case of the switch statement... I am
> not sure what kernel coding standard says about that.

Breaks are preferable, and a fall-through should be explicitly
marked with fallthrough;
 
> However, I can tell you that there was also a break statement at the
> end of the case for IO_APOLL_READY and checkpatch.pl did complain about
> it saying that it was useless since it was following a goto statement.
> Therefore, I did remove that one.
> 
> checkpatch.pl did remain silent about the other remaining break. Hence
> this is why I left it there.

-- 
Pavel Begunkov
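The style point can be illustrated with a userspace sketch. The kernel defines its own fallthrough pseudo-keyword on top of the compiler's fallthrough attribute; the macro below is only an approximation of that approach for userspace, and undo_stages() is a hypothetical example of a deliberate fall-through, unrelated to the patch's switch.

```c
/* Userspace approximation of the kernel's fallthrough pseudo-keyword. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Undo 'stage' setup steps in reverse order, a classic deliberate
 * fall-through. Returns how many undo steps ran. */
static int undo_stages(int stage)
{
	int undone = 0;

	switch (stage) {
	case 2:
		undone++;	/* undo step 2, then continue on purpose */
		fallthrough;
	case 1:
		undone++;	/* undo step 1 */
		break;		/* explicit break, as preferred */
	default:
		break;
	}
	return undone;
}
```

With -Wimplicit-fallthrough enabled, removing the fallthrough; line makes the compiler warn, while the explicit break on the last real case keeps the intent obvious even though it is not strictly needed.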


* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-22 12:17 [PATCH v4] io_uring: reduce latency by reissueing the operation Olivier Langlois
  2021-06-22 17:54 ` Pavel Begunkov
@ 2021-06-22 20:52 ` Pavel Begunkov
  2021-06-25  0:45 ` Jens Axboe
  2 siblings, 0 replies; 9+ messages in thread
From: Pavel Begunkov @ 2021-06-22 20:52 UTC
  To: Olivier Langlois, Jens Axboe, io-uring, linux-kernel

On 6/22/21 1:17 PM, Olivier Langlois wrote:
> It is quite frequent that when an operation fails and returns EAGAIN,
> the data becomes available between that failure and the call to
> vfs_poll() done by io_arm_poll_handler().

Looks good

Reviewed-by: Pavel Begunkov <[email protected]>

> [...]

-- 
Pavel Begunkov


* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-22 12:17 [PATCH v4] io_uring: reduce latency by reissueing the operation Olivier Langlois
  2021-06-22 17:54 ` Pavel Begunkov
  2021-06-22 20:52 ` Pavel Begunkov
@ 2021-06-25  0:45 ` Jens Axboe
  2021-06-25  8:15   ` David Laight
  2 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2021-06-25  0:45 UTC
  To: Olivier Langlois, Pavel Begunkov, io-uring, linux-kernel

On 6/22/21 6:17 AM, Olivier Langlois wrote:
> It is quite frequent that when an operation fails and returns EAGAIN,
> the data becomes available between that failure and the call to
> vfs_poll() done by io_arm_poll_handler().
> 
> [...]

Thanks for respinning with a (much) better commit message. Applied.

-- 
Jens Axboe



* RE: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-25  0:45 ` Jens Axboe
@ 2021-06-25  8:15   ` David Laight
  2021-06-28  6:42     ` Olivier Langlois
  0 siblings, 1 reply; 9+ messages in thread
From: David Laight @ 2021-06-25  8:15 UTC
  To: 'Jens Axboe', Olivier Langlois, Pavel Begunkov,
	[email protected], [email protected]

From: Jens Axboe
> Sent: 25 June 2021 01:45
> 
> On 6/22/21 6:17 AM, Olivier Langlois wrote:
> > It is quite frequent that when an operation fails and returns EAGAIN,
> > the data becomes available between that failure and the call to
> > vfs_poll() done by io_arm_poll_handler().
> >
> > Detecting the situation and reissuing the operation is much faster
> > than going ahead and push the operation to the io-wq.
> >
> > Performance improvement testing has been performed with:
> > Single thread, 1 TCP connection receiving a 5 Mbps stream, no sqpoll.
> >
> > 4 measurements have been taken:
> > 1. The time it takes to process a read request when data is already available
> > 2. The time it takes to process by calling twice io_issue_sqe() after vfs_poll() indicated that data was available
> > 3. The time it takes to execute io_queue_async_work()
> > 4. The time it takes to complete a read request asynchronously
> >
> > 2.25% of all the read operations did use the new path.

How much slower is it when the data to complete the read isn't
available?

I suspect there are different workflows where that is almost
always true.

	David



* Re: [PATCH v4] io_uring: reduce latency by reissueing the operation
  2021-06-25  8:15   ` David Laight
@ 2021-06-28  6:42     ` Olivier Langlois
  0 siblings, 0 replies; 9+ messages in thread
From: Olivier Langlois @ 2021-06-28  6:42 UTC
  To: David Laight, 'Jens Axboe', Pavel Begunkov,
	[email protected], [email protected]

On Fri, 2021-06-25 at 08:15 +0000, David Laight wrote:
> From: Jens Axboe
> > Sent: 25 June 2021 01:45
> > 
> > On 6/22/21 6:17 AM, Olivier Langlois wrote:
> > > [...]
> > > 2.25% of all the read operations did use the new path.
> 
> How much slower is it when the data to complete the read isn't
> available?
> 
> I suspect there are different workflows where that is almost
> always true.
> 
David,

if the data needed to complete the read isn't available, the request
is processed exactly as it was before the patch.

Ideally, it goes through io_uring's fast poll feature. If that is not
possible because arming the poll was aborted, the request is punted to
io-wq.

Greetings,



