From: Hao Xu <[email protected]>
To: Pavel Begunkov <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH RFC 5.13 2/2] io_uring: submit sqes in the original context when waking up sqthread
Date: Wed, 5 May 2021 21:10:07 +0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 2021/4/30 6:10 AM, Pavel Begunkov wrote:
> On 4/29/21 9:44 AM, Hao Xu wrote:
>> On 2021/4/28 10:34 PM, Pavel Begunkov wrote:
>>> On 4/28/21 2:32 PM, Hao Xu wrote:
>>>> sqes are submitted by sqthread when it is in use, which means there
>>>> is IO latency when waking up sqthread. To eliminate it, submit a limited
>>>> number of sqes in the original task context.
>>>> Test results below:
>>>
>>> Frankly, it can be a nest of corner cases if not now then in the future,
>>> leading to a high maintenance burden. Hence, if we consider the change,
>>> I'd rather want to limit the userspace exposure, so it can be removed
>>> if needed.
>>>
>>> A noticeable change of behaviour here, as Hao recently asked, is that
>>> the ring can be passed to a task from a completely different thread group,
>>> and so the feature would execute from that context, not from the
>>> original/sqpoll one.
>>>
>>> Not sure IORING_ENTER_SQ_DEPUTY knob is needed, but at least can be
>>> ignored if the previous point is addressed.
>>>
>>>>
>>>> 99th latency:
>>>> iops\idle  10us  60us  110us  160us  210us  260us  310us  360us  410us  460us   510us
>>>> with this patch:
>>>> 2k         13    13    12     13     13     12     12     11     11     10.304  11.84
>>>> without this patch:
>>>> 2k         15    14    15     15     15     14     15     14     14     13      11.84
>>>
>>> Not sure the second nine describes it well enough, please can you
>>> add more data? Mean latency, 50%, 90%, 99%, 99.9%, t-put.
>>>
>>> Btw, how did it happen that only some of the numbers have a fractional
>>> part? Can't believe all but 3 were close enough to integer values.
>>>
>>>> fio config:
>>>> ./run_fio.sh
>>>> fio \
>>>> --ioengine=io_uring --sqthread_poll=1 --hipri=1 --thread=1 --bs=4k \
>>>> --direct=1 --rw=randread --time_based=1 --runtime=300 \
>>>> --group_reporting=1 --filename=/dev/nvme1n1 --sqthread_poll_cpu=30 \
>>>> --randrepeat=0 --cpus_allowed=35 --iodepth=128 --rate_iops=${1} \
>>>> --io_sq_thread_idle=${2}
>>>>
>>>> Signed-off-by: Hao Xu <[email protected]>
>>>> ---
>>>> fs/io_uring.c | 29 +++++++++++++++++++++++------
>>>> include/uapi/linux/io_uring.h | 1 +
>>>> 2 files changed, 24 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 1871fad48412..f0a01232671e 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1252,7 +1252,12 @@ static void io_queue_async_work(struct io_kiocb *req)
>>>>  {
>>>>  	struct io_ring_ctx *ctx = req->ctx;
>>>>  	struct io_kiocb *link = io_prep_linked_timeout(req);
>>>> -	struct io_uring_task *tctx = req->task->io_uring;
>>>> +	struct io_uring_task *tctx = NULL;
>>>> +
>>>> +	if (ctx->sq_data && ctx->sq_data->thread)
>>>> +		tctx = ctx->sq_data->thread->io_uring;
>>>
>>> Without parking it's racy: sq_data->thread may become NULL and be
>>> removed, as well as its ->io_uring.
>> I now think that it's OK to queue async work to req->task->io_uring. I
>> looked through all the opcodes, and it seems only async_cancel needs
>> special care:
>>
>> io_async_cancel(req) {
>>     cancel from req->task->io_uring;
>>     cancel from ctx->tctx_list;
>> }
>>
>> Given that req->task is the 'original context', the req to be cancelled
>> may be in ctx->sq_data->thread->io_uring's io-wq. So searching for the req
>> in sqthread->io_uring is needed here. This avoids overhead in the main
>> code path.
>> Did I miss something else?
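
The two-step cancel lookup described above can be modelled in userspace. This is only a sketch under stated assumptions, not the kernel code: struct task_ctx, struct ring_ctx, queue_work(), cancel_in() and async_cancel() are hypothetical names, and a per-task io-wq is reduced to a small array of pending request ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 8

/* Hypothetical model of a per-task io-wq: a small set of pending request ids. */
struct task_ctx {
	int pending[MAX_PENDING];
	size_t count;
};

/* Hypothetical model of ctx->tctx_list: every task context attached to the ring. */
struct ring_ctx {
	struct task_ctx **tctx_list;
	size_t nr_tctx;
};

/* Queue a request id on one task context; fails when the model is full. */
static bool queue_work(struct task_ctx *t, int id)
{
	if (t->count == MAX_PENDING)
		return false;
	t->pending[t->count++] = id;
	return true;
}

/* Remove one matching id from a single task context, if present. */
static bool cancel_in(struct task_ctx *t, int id)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->pending[i] == id) {
			/* Order does not matter here: move the last entry into the hole. */
			t->pending[i] = t->pending[--t->count];
			return true;
		}
	}
	return false;
}

/*
 * Step 1: look in the canceller's own context (models req->task->io_uring).
 * Step 2: fall back to every other context on the ring (models ctx->tctx_list),
 * which is where a request queued by the sqthread would be found.
 */
static bool async_cancel(struct ring_ctx *ctx, struct task_ctx *self, int id)
{
	assert(ctx && self);

	if (cancel_in(self, id))
		return true;
	for (size_t i = 0; i < ctx->nr_tctx; i++) {
		if (ctx->tctx_list[i] != self && cancel_in(ctx->tctx_list[i], id))
			return true;
	}
	return false;
}
```

The model ignores locking and the lifetime of the sqthread's context, both of which matter in the real code.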
>
> It must be req->task->io_uring, otherwise cancellations will
> be broken. And using it should be fine in theory because io-wq/etc.
> should be set up by io_uring_add_task_file()
>
>
> One more problem for the pile is io_req_task_work_add() and the notify
> mode it chooses. Look for IORING_SETUP_SQPOLL in the function.
How about:

	notify = TWA_SIGNAL;
	if ((ctx->flags & IORING_SETUP_SQPOLL) &&
	    (!sqd->thread || sqd->thread == req->task))
		notify = TWA_NONE;
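
To make the proposed condition concrete, here is a minimal userspace sketch of it as a pure function. It is not the kernel implementation: struct task, struct sq_data, struct ring_ctx, struct request and pick_notify() are hypothetical stand-ins, the sqpoll flag models IORING_SETUP_SQPOLL, and NOTIFY_NONE/NOTIFY_SIGNAL model TWA_NONE/TWA_SIGNAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct task { int id; };
struct sq_data { struct task *thread; };        /* models ctx->sq_data */
struct ring_ctx {
	bool sqpoll;                            /* models IORING_SETUP_SQPOLL */
	struct sq_data *sq_data;
};
struct request {
	struct ring_ctx *ctx;
	struct task *task;                      /* task the request runs for */
};

enum notify_mode { NOTIFY_NONE, NOTIFY_SIGNAL }; /* model TWA_NONE / TWA_SIGNAL */

/*
 * Default to a signalling notification; skip it only in SQPOLL mode when
 * there is no sqthread or the request already belongs to the sqthread.
 * A missing sq_data is treated like a missing thread (an assumption of
 * this model).
 */
static enum notify_mode pick_notify(const struct request *req)
{
	const struct ring_ctx *ctx;
	const struct task *sqthread;

	assert(req && req->ctx);
	ctx = req->ctx;
	sqthread = ctx->sq_data ? ctx->sq_data->thread : NULL;

	if (ctx->sqpoll && (sqthread == NULL || sqthread == req->task))
		return NOTIFY_NONE;
	return NOTIFY_SIGNAL;
}
```

With this rule, a request submitted from the original task context on an SQPOLL ring gets the signalling mode while the sqthread is alive. The sketch reads sq_data->thread without any synchronization, so the race noted earlier for the unparked case is not addressed by it.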
>
> Also, with IOPOLL+SQPOLL, io_uring_try_cancel_requests() looks like it
> may fail (I didn't double-check it). Look again for IORING_SETUP_SQPOLL.
>
I've excluded IOPOLL. This change will only affect SQPOLL mode.
> I'd recommend going over all uses of IORING_SETUP_SQPOLL
> and checking whether each one is flawed.
I'm working on this. (No obvious problems from reading the code; I will
run more tests on the change.)
>
>>
>>
>>>
>>>> +	else
>>>> +		tctx = req->task->io_uring;
>>>>  	BUG_ON(!tctx);
>>>>  	BUG_ON(!tctx->io_wq);
>>>
>>> [snip]
>>>
>>
>