From: Pavel Begunkov <[email protected]>
To: Xiaoguang Wang <[email protected]>,
[email protected]
Cc: [email protected]
Subject: Re: [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request
Date: Fri, 29 Oct 2021 14:47:25 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 10/29/21 14:37, Xiaoguang Wang wrote:
> hi,
>
>> On 10/29/21 03:57, Xiaoguang Wang wrote:
>>> hi,
>>>
>>>> On 10/25/21 06:38, Xiaoguang Wang wrote:
>>>>> Run echo_server to evaluate io_uring's multi-shot poll performance, perf
>>>>> shows that add_wait_queue() has obvious overhead. Intruduce a new state
>>>>> 'active' in io_poll_iocb to indicate whether io_poll_wake() should queue
>>>>> a task_work. This new state will be set to true initially, be set to false
>>>>> when starting to queue a task work, and be set to true again when a poll
>>>>> cqe has been committed. One concern is that this method may lost waken-up
>>>>> event, but seems it's ok.
>>>>>
>>>>> io_poll_wake io_poll_task_func
>>>>> t1 |
>>>>> t2 | WRITE_ONCE(req->poll.active, true);
>>>>> t3 |
>>>>> t4 | io_commit_cqring(ctx);
>>>>> t5 |
>>>>> t6 |
>>>>>
>>>>> If waken-up events happens before or at t4, it's ok, user app will always
>>>>> see a cqe. If waken-up events happens after t4 and IIUC, io_poll_wake()
>>>>> will see the new req->poll.active value by using READ_ONCE().
>>>>>
>>>>> Echo_server codes can be cloned from:
>>>>> https://codeup.openanolis.cn/codeup/storage/io_uring-echo-server.git,
>>>>> branch is xiaoguangwang/io_uring_multishot.
>>>>>
>>>>> Without this patch, the tps in our test environment is 284116, with
>>>>> this patch, the tps is 287832, about 1.3% reqs improvement, which
>>>>> is indeed in accord with the saved add_wait_queue() cost.
>>>>>
>>>>> Signed-off-by: Xiaoguang Wang <[email protected]>
>>>>> ---
>>>>> fs/io_uring.c | 57 +++++++++++++++++++++++++++++++++------------------------
>>>>> 1 file changed, 33 insertions(+), 24 deletions(-)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 18af9bb9a4bc..e4c779dac953 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -481,6 +481,7 @@ struct io_poll_iocb {
>>>>> __poll_t events;
>>>>> bool done;
>>>>> bool canceled;
>>>>> + bool active;
>>>>> struct wait_queue_entry wait;
>>>>> };
>>>>> @@ -5233,8 +5234,6 @@ static inline int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *pol
>>>>> {
>>>>> trace_io_uring_task_add(req->ctx, req->opcode, req->user_data, mask);
>>>>> - list_del_init(&poll->wait.entry);
>>>>> -
>>>>
>>>> As I mentioned to Hao some time ago, we can't allow this function or in
>>>> particular io_req_task_work_add() to happen twice before the first
>>>> task work got executed, they use the same field in io_kiocb and those
>>>> will corrupt the tw list.
>>>>
>>>> Looks that's what can happen here.
>>> If I have understood scenario your described correctly, I think it won't happen :)
>>> With this patch, if the first io_req_task_work_add() is issued, poll.active
>>> will be set to false, then no new io_req_task_work_add() will be issued.
>>> Only the first task_work installed by the first io_req_task_work_add() has
>>> completed, poll.active will be set to true again.
>>
>> Ah, I see now, the active dance is in io_poll_wake(). That won't work
>> with READ_ONCE/WRITE_ONCE though, you would need real atomics
>>
>> The easiest race to explain is:
>>
>> CPU1 | CPU2
>> io_poll_wake | io_poll_wake
>> if (p->active) return 0; | if (p->active) return 0;
> it's "if (!p->active) return 0;" in my patch :)
>> // p->active is false in both cases, continue
>> p->active = false; | p->active = false;
>> task_work_add() | task_work_add()
> io_poll_wake() is called with poll->head->lock, so there will no concurrent
> io_poll_wake() calls.
Double poll, and also there are no guarantees that CPUs don't
see inconsistent values.
--
Pavel Begunkov
next prev parent reply other threads:[~2021-10-29 13:48 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-25 5:38 [PATCH v3 0/3] improvements for multi-shot poll requests Xiaoguang Wang
2021-10-25 5:38 ` [PATCH v3 1/3] io_uring: refactor event check out of __io_async_wake() Xiaoguang Wang
2021-10-25 9:35 ` Praveen Kumar
2021-10-25 5:38 ` [PATCH v3 2/3] io_uring: reduce frequent add_wait_queue() overhead for multi-shot poll request Xiaoguang Wang
2021-10-28 19:21 ` Pavel Begunkov
2021-10-29 2:57 ` Xiaoguang Wang
2021-10-29 10:02 ` Pavel Begunkov
2021-10-29 13:37 ` Xiaoguang Wang
2021-10-29 13:47 ` Pavel Begunkov [this message]
2021-10-29 14:12 ` Pavel Begunkov
2021-10-29 14:34 ` Xiaoguang Wang
2021-10-25 5:38 ` [PATCH v3 3/3] io_uring: don't get completion_lock in io_poll_rewait() Xiaoguang Wang
2021-10-28 19:26 ` Pavel Begunkov
2021-10-29 5:59 ` Xiaoguang Wang
2021-10-28 18:19 ` [PATCH v3 0/3] improvements for multi-shot poll requests Jens Axboe
2021-10-29 18:29 ` Jens Axboe
2021-10-28 18:19 ` Jens Axboe
2021-10-28 19:01 ` Pavel Begunkov
2021-10-28 19:04 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox