Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL

public inbox for [email protected]

From: Avi Kivity <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Subject: Re: IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL
Date: Wed, 15 Jun 2022 13:12:23 +0300
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>


On 19/04/2022 20.14, Jens Axboe wrote:
> On 4/19/22 9:21 AM, Jens Axboe wrote:
>> On 4/19/22 6:31 AM, Jens Axboe wrote:
>>> On 4/19/22 6:21 AM, Avi Kivity wrote:
>>>> On 19/04/2022 15.04, Jens Axboe wrote:
>>>>> On 4/19/22 5:57 AM, Avi Kivity wrote:
>>>>>> On 19/04/2022 14.38, Jens Axboe wrote:
>>>>>>> On 4/19/22 5:07 AM, Avi Kivity wrote:
>>>>>>>> A simple webserver shows about 5% loss compared to linux-aio.
>>>>>>>>
>>>>>>>>
>>>>>>>> I expect the loss is due to an optimization that io_uring lacks -
>>>>>>>> inline completion vs workqueue completion:
>>>>>>> I don't think that's it, io_uring never punts to a workqueue for
>>>>>>> completions.
>>>>>> I measured this:
>>>>>>
>>>>>>
>>>>>>
>>>>>>    Performance counter stats for 'system wide':
>>>>>>
>>>>>>            1,273,756 io_uring:io_uring_task_add
>>>>>>
>>>>>>         12.288597765 seconds time elapsed
>>>>>>
>>>>>> This exactly matches the number of requests sent. If that's the wrong
>>>>>> counter to measure, I'm happy to try again with the correct counter.
>>>>> io_uring_task_add() isn't a workqueue, it's task_work. So that is
>>>>> expected.
>> It might actually be implicated. Not because it's an async worker, but
>> because I think we might be losing some affinity in this case. Looking
>> at traces, we're definitely bouncing between the poll completion side
>> and then executing the completion.
>>
>> Can you try this hack? It's against -git + for-5.19/io_uring. If you let
>> me know what base you prefer, I can do a version against that. With
>> this, I see about a 3% win for io_uring over linux-aio; before, it was
>> slower than linux-aio, as you saw as well.
> Another thing to try - get rid of the IPI for TWA_SIGNAL, which I
> believe may be the underlying cause of it.
>

Resurrecting an old thread. I have a question about the timeliness of 
completions. Let's assume a request has completed. From the patch, it 
appears that io_uring will only guarantee that a completion appears on 
the completion ring if the thread has entered kernel mode since the 
completion happened. So user-space polling of the completion ring can 
cause unbounded delays.


If this is correct (it's not unreasonable, but should be documented), 
then there should also be a simple way to force a kernel entry. But how 
to do this using liburing? IIUC, if all of the following apply:


  1. I have no pending sqes

  2. There are pending completions

  3. There is a completed event for which a completion has not been 
appended to the completion queue ring


Then io_uring_wait_cqe() will elide io_uring_enter() and the 
completed-but-not-reported event will be delayed.


> diff --git a/fs/io-wq.c b/fs/io-wq.c
> index 32aeb2c581c5..59987dd212d8 100644
> --- a/fs/io-wq.c
> +++ b/fs/io-wq.c
> @@ -871,7 +871,7 @@ static bool io_wq_for_each_worker(struct io_wqe *wqe,
>   
>   static bool io_wq_worker_wake(struct io_worker *worker, void *data)
>   {
> -	set_notify_signal(worker->task);
> +	set_notify_signal(worker->task, true);
>   	wake_up_process(worker->task);
>   	return false;
>   }
> @@ -991,7 +991,7 @@ static bool __io_wq_worker_cancel(struct io_worker *worker,
>   {
>   	if (work && match->fn(work, match->data)) {
>   		work->flags |= IO_WQ_WORK_CANCEL;
> -		set_notify_signal(worker->task);
> +		set_notify_signal(worker->task, true);
>   		return true;
>   	}
>   
> diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
> index 3c8b34876744..ac1f14973e09 100644
> --- a/include/linux/sched/signal.h
> +++ b/include/linux/sched/signal.h
> @@ -359,10 +359,10 @@ static inline void clear_notify_signal(void)
>    * Called to break out of interruptible wait loops, and enter the
>    * exit_to_user_mode_loop().
>    */
> -static inline void set_notify_signal(struct task_struct *task)
> +static inline void set_notify_signal(struct task_struct *task, bool need_ipi)
>   {
>   	if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) &&
> -	    !wake_up_state(task, TASK_INTERRUPTIBLE))
> +	    !wake_up_state(task, TASK_INTERRUPTIBLE) && need_ipi)
>   		kick_process(task);
>   }
>   
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 5d03a2ad1066..bff53f539933 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -367,7 +367,7 @@ static void klp_send_signals(void)
>   			 * Send fake signal to all non-kthread tasks which are
>   			 * still not migrated.
>   			 */
> -			set_notify_signal(task);
> +			set_notify_signal(task, true);
>   		}
>   	}
>   	read_unlock(&tasklist_lock);
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index c59e1a49bc40..47d7024dc499 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -51,7 +51,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>   		set_notify_resume(task);
>   		break;
>   	case TWA_SIGNAL:
> -		set_notify_signal(task);
> +		set_notify_signal(task, false);
>   		break;
>   	default:
>   		WARN_ON_ONCE(1);
>


Thread overview: 23+ messages
2022-04-19 11:07 IORING_OP_POLL_ADD slower than linux-aio IOCB_CMD_POLL Avi Kivity
2022-04-19 11:38 ` Jens Axboe
2022-04-19 11:57   ` Avi Kivity
2022-04-19 12:04     ` Jens Axboe
2022-04-19 12:21       ` Avi Kivity
2022-04-19 12:31         ` Jens Axboe
2022-04-19 15:21           ` Jens Axboe
2022-04-19 15:51             ` Avi Kivity
2022-04-19 17:14             ` Jens Axboe
2022-04-19 19:41               ` Avi Kivity
2022-04-19 19:58                 ` Jens Axboe
2022-04-20 11:55                   ` Avi Kivity
2022-04-20 12:09                     ` Jens Axboe
2022-04-21  9:05                       ` Avi Kivity
2022-06-15 10:12               ` Avi Kivity [this message]
2022-06-15 10:48                 ` Pavel Begunkov
2022-06-15 11:04                   ` Avi Kivity
2022-06-15 11:07                     ` Avi Kivity
2022-06-15 11:38                       ` Pavel Begunkov
2022-06-15 12:21                         ` Jens Axboe
2022-06-15 13:43                           ` Avi Kivity
2022-06-15 11:30                     ` Pavel Begunkov
2022-06-15 11:36                       ` Avi Kivity
