From: Hao Xu <[email protected]>
To: Pavel Begunkov <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected], Joseph Qi <[email protected]>
Subject: Re: [PATCH v6 0/6] task work optimization
Date: Mon, 6 Dec 2021 17:48:37 +0800
Message-ID: <7394c99d-413c-d9fd-ddc4-ebdc8db3f675@linux.alibaba.com>
In-Reply-To: <[email protected]>
On 2021/12/6 4:35 PM, Hao Xu wrote:
> On 2021/12/5 11:42 PM, Pavel Begunkov wrote:
>> On 12/5/21 15:02, Hao Xu wrote:
>>> On 2021/12/3 10:21 PM, Pavel Begunkov wrote:
>>>> On 12/3/21 07:30, Hao Xu wrote:
>>>>> On 2021/12/3 10:01 AM, Pavel Begunkov wrote:
>>>>>> On 12/3/21 01:39, Pavel Begunkov wrote:
>>>>>>> On 11/26/21 10:07, Hao Xu wrote:
>>>>>>>> v4->v5
>>>>>>>> - change the implementation of merge_wq_list
>>>>>>>>
>>>> [...]
>>>>>> But testing with liburing tests I'm getting the stuff below,
>>>>>> e.g. cq-overflow hits it every time. Double checked that
>>>>>> I took [RESEND] version of 6/6.
>>>>>>
>>>>>> [ 30.360370] BUG: scheduling while atomic: cq-overflow/2082/0x00000000
>>>>>> [ 30.360520] Call Trace:
>>>>>> [ 30.360523] <TASK>
>>>>>> [ 30.360527] dump_stack_lvl+0x4c/0x63
>>>>>> [ 30.360536] dump_stack+0x10/0x12
>>>>>> [ 30.360540] __schedule_bug.cold+0x50/0x5e
>>>>>> [ 30.360545] __schedule+0x754/0x900
>>>>>> [ 30.360551] ? __io_cqring_overflow_flush+0xb6/0x200
>>>>>> [ 30.360558] schedule+0x55/0xd0
>>>>>> [ 30.360563] schedule_timeout+0xf8/0x140
>>>>>> [ 30.360567] ? prepare_to_wait_exclusive+0x58/0xa0
>>>>>> [ 30.360573] __x64_sys_io_uring_enter+0x69c/0x8e0
>>>>>> [ 30.360578] ? io_rsrc_buf_put+0x30/0x30
>>>>>> [ 30.360582] do_syscall_64+0x3b/0x80
>>>>>> [ 30.360588] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>> [ 30.360592] RIP: 0033:0x7f9f9680118d
>>>>>> [ 30.360618] </TASK>
>>>>>> [ 30.362295] BUG: scheduling while atomic: cq-overflow/2082/0x7ffffffe
>>>>>> [ 30.362396] Call Trace:
>>>>>> [ 30.362397] <TASK>
>>>>>> [ 30.362399] dump_stack_lvl+0x4c/0x63
>>>>>> [ 30.362406] dump_stack+0x10/0x12
>>>>>> [ 30.362409] __schedule_bug.cold+0x50/0x5e
>>>>>> [ 30.362413] __schedule+0x754/0x900
>>>>>> [ 30.362419] schedule+0x55/0xd0
>>>>>> [ 30.362423] schedule_timeout+0xf8/0x140
>>>>>> [ 30.362427] ? prepare_to_wait_exclusive+0x58/0xa0
>>>>>> [ 30.362431] __x64_sys_io_uring_enter+0x69c/0x8e0
>>>>>> [ 30.362437] ? io_rsrc_buf_put+0x30/0x30
>>>>>> [ 30.362440] do_syscall_64+0x3b/0x80
>>>>>> [ 30.362445] entry_SYSCALL_64_after_hwframe+0x44/0xae
>>>>>> [ 30.362449] RIP: 0033:0x7f9f9680118d
>>>>>> [ 30.362470] </TASK>
>>>>>> <repeated>
>>>>>>
>>>>> I cannot reproduce this; all the liburing tests pass on my side.
>>>>
>>>> One problem: when, on the first iteration, tctx_task_work doesn't
>>>> have anything in prior_task_list, it goes to handle_tw_list(),
>>>> which sets up @ctx but leaves @locked=false (say there is
>>>> contention). Then on the second iteration it goes to
>>>> handle_prior_tw_list() with non-NULL @ctx and @locked=false,
>>>> and tries to unlock a spinlock that isn't locked.
>>>>
>>>> Not sure that's exactly the problem from the traces, but at
>>>> least a quick hack resetting the ctx at the beginning of
>>>> handle_prior_tw_list() heals it.
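To make the sequence above easier to follow, here is a small userspace model of it. This is only a sketch and not the code from the series: every toy_* name is hypothetical, and the locking rules (the handle_tw_list() side never takes the spinlock, the handle_prior_tw_list() side takes it only when it switches ctx and the trylock fails) are assumptions inferred from the description, not copied from fs/io_uring.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct io_ring_ctx; every name here is hypothetical. */
struct toy_ctx {
	bool mutex_held;	/* models uring_lock */
	bool spin_held;		/* models completion_lock */
	int bad_unlocks;	/* unlocks of a spinlock that was not held */
};

static void toy_spin_unlock(struct toy_ctx *c)
{
	if (!c->spin_held)
		c->bad_unlocks++;
	c->spin_held = false;
}

/* Models ctx_flush_and_put(): drop the mutex if we hold it. */
static void toy_flush_and_put(struct toy_ctx *c, bool *locked)
{
	if (c && *locked) {
		c->mutex_held = false;
		*locked = false;
	}
}

/* Models mutex_trylock(): fails when the lock is contended. */
static bool toy_trylock(struct toy_ctx *c, bool contended)
{
	if (contended)
		return false;
	c->mutex_held = true;
	return true;
}

/* Models handle_tw_list() for one request: never takes the spinlock. */
static void toy_handle_tw_list(struct toy_ctx *req_ctx, struct toy_ctx **ctx,
			       bool *locked, bool contended)
{
	if (req_ctx != *ctx) {
		toy_flush_and_put(*ctx, locked);
		*ctx = req_ctx;
		*locked = toy_trylock(*ctx, contended);
	}
}

/* Models handle_prior_tw_list() for one request. */
static void toy_handle_prior_tw_list(struct toy_ctx *req_ctx,
				     struct toy_ctx **ctx, bool *locked,
				     bool contended, bool reset_first)
{
	if (reset_first) {	/* the quick hack: forget the inherited ctx */
		toy_flush_and_put(*ctx, locked);
		*ctx = NULL;
	}
	if (req_ctx != *ctx) {
		if (!*locked && *ctx)
			toy_spin_unlock(*ctx);
		toy_flush_and_put(*ctx, locked);
		*ctx = req_ctx;
		*locked = toy_trylock(*ctx, contended);
		if (!*locked)
			(*ctx)->spin_held = true;	/* spin_lock() */
	}
	/* Tail of the handler: assumes !locked means the spinlock is held. */
	if (!*locked)
		toy_spin_unlock(*ctx);
}

/* Two loop iterations under contention; returns the bad unlock count. */
int toy_run(bool reset_first)
{
	struct toy_ctx c = { false, false, 0 };
	struct toy_ctx *ctx = NULL;
	bool locked = false;

	toy_handle_tw_list(&c, &ctx, &locked, true);
	toy_handle_prior_tw_list(&c, &ctx, &locked, true, reset_first);
	assert(!c.spin_held);
	return c.bad_unlocks;
}
```

In this model toy_run(false) reports one unlock of a spinlock that was never taken, because the second handler inherits a non-NULL ctx with locked == false and skips its own lock setup. toy_run(true), which resets the ctx first, reports none.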
>>> Good catch, thanks.
>>>>
>>>> note: apart from the quick fix the diff below includes
>>>> a couple of lines to force it to go through the new path.
>>>>
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 66d119ac4424..3868123eef87 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -2272,6 +2272,9 @@ static inline void ctx_commit_and_unlock(struct io_ring_ctx *ctx)
>>>>  static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ctx **ctx, bool *locked)
>>>>  {
>>>> +        ctx_flush_and_put(*ctx, locked);
>>>> +        *ctx = NULL;
>>>> +
>>>>          do {
>>>>                  struct io_wq_work_node *next = node->next;
>>>>                  struct io_kiocb *req = container_of(node, struct io_kiocb,
>>>> @@ -2283,7 +2286,8 @@ static void handle_prior_tw_list(struct io_wq_work_node *node, struct io_ring_ct
>>>>                          ctx_flush_and_put(*ctx, locked);
>>>>                          *ctx = req->ctx;
>>>>                          /* if not contended, grab and improve batching */
>>>> -                        *locked = mutex_trylock(&(*ctx)->uring_lock);
>>>> +                        *locked = false;
>>>> +                        // *locked = mutex_trylock(&(*ctx)->uring_lock);
>>> I believe this one is your debug code which I shouldn't take, should I?
>>
>> Right, just for debug, helped to catch the issue. FWIW, it doesn't seem
>> ctx_flush_and_put() is a good solution but was good enough to verify
>> my assumptions.
> How about adding a new compl_lock variable to track whether
> completion_lock is held? That would make the complete_post() batching
> as large as possible.
>
I forgot to add the compl_lock handling in handle_tw_list(). In any case,
I now think it may not be a good idea to hold completion_lock across
handle_prior_tw_list() and handle_tw_list(): it enlarges the batch, but
it may also delay committing the completions.
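To put rough numbers on that trade-off, here is a toy userspace sketch. It is not kernel code: toy_stats, toy_commit() and toy_process() are hypothetical names, and it assumes that every work item posts exactly one completion and that a commit publishes everything posted since the previous commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of one pass over both task work lists. */
struct toy_stats {
	int commits;		/* number of commit points */
	int max_batch;		/* most completions published by one commit */
	int first_commit_after;	/* items processed before the first commit */
};

/* Publish everything posted since the last commit. */
static void toy_commit(struct toy_stats *s, int *pending, int processed)
{
	if (*pending == 0)
		return;
	if (s->commits == 0)
		s->first_commit_after = processed;
	if (*pending > s->max_batch)
		s->max_batch = *pending;
	s->commits++;
	*pending = 0;
}

/*
 * n_prior:     items in the priority list
 * n_normal:    items in the normal list
 * hold_across: keep the lock through both lists and commit once at the end
 */
struct toy_stats toy_process(int n_prior, int n_normal, bool hold_across)
{
	struct toy_stats s = { 0, 0, 0 };
	int pending = 0, processed = 0;

	assert(n_prior >= 0 && n_normal >= 0);

	for (int i = 0; i < n_prior; i++) {
		pending++;	/* one completion posted, not yet committed */
		processed++;
	}
	if (!hold_across)
		toy_commit(&s, &pending, processed);

	for (int i = 0; i < n_normal; i++) {
		pending++;
		processed++;
	}
	toy_commit(&s, &pending, processed);
	return s;
}
```

With 4 priority items and 6 normal items, committing between the lists gives two commits, the first after 4 items. Holding the lock across both lists gives a single batch of 10, but nothing is published until all 10 items have run.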
Thread overview: 22+ messages
2021-11-26 10:07 [PATCH v6 0/6] task work optimization Hao Xu
2021-11-26 10:07 ` [PATCH 1/6] io-wq: add helper to merge two wq_lists Hao Xu
2021-11-26 10:07 ` [PATCH 2/6] io_uring: add a priority tw list for irq completion work Hao Xu
2021-11-26 10:07 ` [PATCH 3/6] io_uring: add helper for task work execution code Hao Xu
2021-11-26 10:07 ` [PATCH 4/6] io_uring: split io_req_complete_post() and add a helper Hao Xu
2021-11-26 10:07 ` [PATCH 5/6] io_uring: move up io_put_kbuf() and io_put_rw_kbuf() Hao Xu
2021-11-26 10:07 ` [PATCH 6/6] io_uring: batch completion in prior_task_list Hao Xu
2021-11-26 12:56 ` Hao Xu
2021-11-26 13:37 ` [PATCH RESEND " Hao Xu
2021-11-27 15:24 ` [PATCH v7] " Hao Xu
2021-11-28 15:28 ` Pavel Begunkov
2021-12-03 1:39 ` [PATCH v6 0/6] task work optimization Pavel Begunkov
2021-12-03 2:01 ` Pavel Begunkov
2021-12-03 7:30 ` Hao Xu
2021-12-03 14:21 ` Pavel Begunkov
2021-12-05 15:02 ` Hao Xu
2021-12-05 15:42 ` Pavel Begunkov
2021-12-06 8:35 ` Hao Xu
2021-12-06 9:48 ` Hao Xu [this message]
2021-12-03 3:24 ` Hao Xu
2021-12-04 20:58 ` Pavel Begunkov
2021-12-05 15:11 ` Hao Xu