From: Jens Axboe <[email protected]>
To: Oleg Nesterov <[email protected]>
Cc: [email protected], [email protected],
Peter Zijlstra <[email protected]>
Subject: Re: [PATCH 4/4] io_uring: flush task work before waiting for ring exit
Date: Wed, 8 Apr 2020 12:06:24 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 4/8/20 11:48 AM, Jens Axboe wrote:
> On 4/8/20 11:40 AM, Oleg Nesterov wrote:
>> Jens, I am sorry. I tried to understand your explanations but I can't :/
>> Just in case, I know nothing about io_uring.
>>
>> However, I strongly believe that
>>
>> - the "task_work_exited" check in 4/4 can't help, the kernel
>> will crash anyway if a task-work callback runs with
>> current->task_works == &task_work_exited.
>>
>> - this check is not needed with the patch I sent.
>> UNLESS io_ring_ctx_wait_and_kill() can be called by the exiting
>> task AFTER it passes exit_task_work(), but I don't see how this
>> is possible.
>>
>> Let's forget this problem; let's assume that task_work_run() is always safe.
>>
>> I still cannot understand why io_ring_ctx_wait_and_kill() needs to call
>> task_work_run().
>>
>> On 04/07, Jens Axboe wrote:
>>>
>>> io_uring exit removes the pending poll requests, but what if, for a
>>> non-exit invocation, we get poll requests completing before they are torn
>>> down. Now we have task_work queued up that won't get run,
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> this must not be possible. If task_work is queued it will run, or we
>> have another bug.
>>
>>> because we
>>> are in the task_work handler for the __fput().
>>
>> this doesn't matter...
>>
>>> For this case, we
>>> need to run the task work.
>>
>> This is what I fail to understand :/
>
> Actually debugging this just now to attempt to get to the bottom of it.
> I'm running with Peter's "put fput work at the end at task_work_run
> time" patch (with a head == NULL check that was missing). I get a hang
> on the wait_for_completion() on io_uring exit, and if I dump the
> task_work, this is what I get:
>
> dump_work: dump cb
> cb=ffff88bff25589b8, func=ffffffff812f7310 <- io_poll_task_func()
> cb=ffff88bfdd164600, func=ffffffff812925e0 <- some __fput()
> cb=ffff88bfece13cb8, func=ffffffff812f7310 <- io_poll_task_func()
> cb=ffff88bff78393b8, func=ffffffff812b2c40
>
> and we hang because io_poll_task_func() got queued twice on this task
> _after_ we yanked the current list of work.
>
> I'm adding some more debug items to figure out why this is, just wanted
> to let you know that I'm currently looking into this and will provide
> more data when I have it.
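
To recap why the two io_poll_task_func() callbacks that were queued after the
yank are invisible to the flush already in progress: task_work_run() detaches
the whole pending list atomically and then only walks that detached copy.
Here's a simplified sketch, based on kernel/task_work.c in this tree, with the
task_work_cancel() synchronization left out:

void task_work_run(void)
{
	struct task_struct *task = current;
	struct callback_head *work, *head, *next;

	for (;;) {
		/*
		 * Detach the whole pending list.  Only install the
		 * work_exited sentinel if the list is empty and the task
		 * is exiting, so callbacks running below can still queue
		 * more work on a live task.
		 */
		do {
			work = READ_ONCE(task->task_works);
			head = !work && (task->flags & PF_EXITING) ?
				&work_exited : NULL;
		} while (cmpxchg(&task->task_works, work, head) != work);

		if (!work)
			break;

		/* Only the callbacks detached above run in this pass. */
		do {
			next = work->next;
			work->func(work);
			work = next;
			cond_resched();
		} while (work);
	}
}

So anything task_work_add() queues after that cmpxchg() is only seen by a
later flush, and once the work_exited sentinel has been installed at exit
time nothing can be queued at all.
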
Here's some more data. I added a WARN_ON_ONCE() for task->flags &
PF_EXITING on task_work_add() success, and it triggers with the
following backtrace:
[ 628.291061] RIP: 0010:__io_async_wake+0x14a/0x160
[ 628.300452] Code: 8b b8 c8 00 00 00 e8 75 df 00 00 ba 01 00 00 00 48 89 ee 48 89 c7 49 89 c6 e8 82 dd de ff e9 59 ff ff ff 0f 0b e9 52 ff ff ff <0f> 0b e9 40 ff ff ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00
[ 628.337930] RSP: 0018:ffffc9000c830a40 EFLAGS: 00010002
[ 628.348365] RAX: 0000000000000000 RBX: ffff88bfe85fc200 RCX: ffff88bff58491b8
[ 628.362610] RDX: 0000000000000001 RSI: ffff88bfe85fc2b8 RDI: ffff889fc929f000
[ 628.376855] RBP: ffff88bfe85fc2b8 R08: 00000000000000c3 R09: ffffc9000c830ad0
[ 628.391087] R10: 0000000000000000 R11: ffff889ff01000a0 R12: ffffc9000c830ad0
[ 628.405317] R13: ffff889fb405fc00 R14: ffff889fc929f000 R15: ffff88bfe85fc200
[ 628.419549] FS: 0000000000000000(0000) GS:ffff889fff5c0000(0000) knlGS:0000000000000000
[ 628.435706] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 628.447178] CR2: 00007f40a3c4b8f0 CR3: 0000000002409002 CR4: 00000000003606e0
[ 628.461427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 628.475675] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 628.489924] Call Trace:
[ 628.494810] <IRQ>
[ 628.498835] ? __io_queue_sqe.part.97+0x750/0x750
[ 628.508229] ? tcp_ack_update_rtt.isra.55+0x113/0x430
[ 628.518320] __wake_up_common+0x71/0x140
[ 628.526152] __wake_up_common_lock+0x7c/0xc0
[ 628.534681] sock_def_readable+0x3c/0x70
[ 628.542516] tcp_data_queue+0x2d9/0xb50
[ 628.550175] tcp_rcv_established+0x1ce/0x620
[ 628.558703] ? sk_filter_trim_cap+0x4f/0x200
[ 628.567232] tcp_v6_do_rcv+0xbe/0x3b0
[ 628.574545] tcp_v6_rcv+0xa8d/0xb20
[ 628.581516] ip6_protocol_deliver_rcu+0xb4/0x450
[ 628.590736] ip6_input_finish+0x11/0x20
[ 628.598396] ip6_input+0xa0/0xb0
[ 628.604845] ? tcp_v6_early_demux+0x90/0x140
[ 628.613371] ? tcp_v6_early_demux+0xdb/0x140
[ 628.621902] ? ip6_rcv_finish_core.isra.21+0x66/0x90
[ 628.631816] ipv6_rcv+0xc0/0xd0
[ 628.638092] __netif_receive_skb_one_core+0x50/0x70
[ 628.647833] netif_receive_skb_internal+0x2f/0xa0
[ 628.657226] napi_gro_receive+0xe7/0x150
[ 628.665068] mlx5e_handle_rx_cqe+0x8c/0x100
[ 628.673416] mlx5e_poll_rx_cq+0xef/0x95b
[ 628.681252] mlx5e_napi_poll+0xe2/0x610
[ 628.688913] net_rx_action+0x132/0x360
[ 628.696403] __do_softirq+0xd3/0x2e6
[ 628.703545] irq_exit+0xa5/0xb0
[ 628.709816] do_IRQ+0x79/0xd0
[ 628.715744] common_interrupt+0xf/0xf
[ 628.723057] </IRQ>
[ 628.727256] RIP: 0010:cpuidle_enter_state+0xac/0x410
which means that we've successfully added the task_work while the
process is exiting. Maybe I can work around this by checking myself
instead of relying on task_work_add() finding work_exited on the list.
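
Something like the below is what I have in mind; just a sketch of the idea,
not a tested patch. The helper name and the -ESRCH return are made up for
illustration, and the bool notify argument matches task_work_add() as it
stands in this tree:

#include <linux/sched.h>
#include <linux/task_work.h>

/*
 * Illustrative only: refuse to queue io_uring task_work on a task that is
 * already exiting, instead of relying on task_work_add() failing once
 * work_exited has been installed.  The caller would have to fall back to
 * canceling or completing the request some other way.
 */
static int io_task_work_add_safe(struct task_struct *tsk,
				 struct callback_head *cb)
{
	if (tsk->flags & PF_EXITING)
		return -ESRCH;

	return task_work_add(tsk, cb, true);
}

A bare flag test like that is of course still racy against exit_signals()
setting PF_EXITING right after the check, so it narrows the window rather
than closing it, but it would at least catch the case in the trace above
where the wakeup arrives well after the task has started exiting.
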
--
Jens Axboe