From: Jens Axboe <[email protected]>
To: Oleg Nesterov <[email protected]>
Cc: [email protected], [email protected],
	Peter Zijlstra <[email protected]>
Subject: Re: [PATCH 4/4] io_uring: flush task work before waiting for ring exit
Date: Wed, 8 Apr 2020 12:06:24 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 4/8/20 11:48 AM, Jens Axboe wrote:
> On 4/8/20 11:40 AM, Oleg Nesterov wrote:
>> Jens, I am sorry. I tried to understand your explanations but I can't :/
>> Just in case, I know nothing about io_uring.
>>
>> However, I strongly believe that
>>
>> 	- the "task_work_exited" check in 4/4 can't help, the kernel
>> 	  will crash anyway if a task-work callback runs with
>> 	  current->task_works == &task_work_exited.
>>
>> 	- this check is not needed with the patch I sent.
>> 	  UNLESS io_ring_ctx_wait_and_kill() can be called by the exiting
>> 	  task AFTER it passes exit_task_work(), but I don't see how this
>> 	  is possible.
>>
>> Let's forget this problem; let's assume that task_work_run() is always safe.
>>
>> I still can not understand why io_ring_ctx_wait_and_kill() needs to call
>> task_work_run().
>>
>> On 04/07, Jens Axboe wrote:
>>>
>>> io_uring exit removes the pending poll requests, but what if (for non
>>> exit invocation), we get poll requests completing before they are torn
>>> down. Now we have task_work queued up that won't get run,
>>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> this must not be possible. If task_work is queued it will run, or we
>> have another bug.
>>
>>> because we
>>> are in the task_work handler for the __fput().
>>
>> this doesn't matter...
>>
>>> For this case, we
>>> need to run the task work.
>>
>> This is what I fail to understand :/
> 
> Actually debugging this just now to attempt to get to the bottom of it.
> I'm running with Peter's "put fput work at the end at task_work_run
> time" patch (with a head == NULL check that was missing). I get a hang
> on the wait_for_completion() on io_uring exit, and if I dump the
> task_work, this is what I get:
> 
> dump_work: dump cb
> cb=ffff88bff25589b8, func=ffffffff812f7310	<- io_poll_task_func()
> cb=ffff88bfdd164600, func=ffffffff812925e0	<- some __fput()
> cb=ffff88bfece13cb8, func=ffffffff812f7310	<- io_poll_task_func()
> cb=ffff88bff78393b8, func=ffffffff812b2c40
> 
> and we hang because io_poll_task_func() got queued twice on this task
> _after_ we yanked the current list of work.
> 
> I'm adding some more debug items to figure out why this is, just wanted
> to let you know that I'm currently looking into this and will provide
> more data when I have it.
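
(dump_work() is just a throwaway debug helper, not part of the posted
series; it walks the task's pending callback list, roughly like this
sketch:)

static void dump_work(struct task_struct *tsk)
{
	struct callback_head *cb;

	pr_info("dump_work: dump cb\n");
	/* walk the singly linked list of pending task_work callbacks */
	for (cb = READ_ONCE(tsk->task_works); cb; cb = READ_ONCE(cb->next))
		pr_info("cb=%px, func=%px\n", cb, (void *)cb->func);
}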

Here's some more data. I added a WARN_ON_ONCE() for task->flags &
PF_EXITING on task_work_add() success (the check is sketched below), and
it triggers with the following backtrace:
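
The check itself is roughly this, dropped in right after the
task_work_add() call in __io_async_wake(); the call site and the
variable names here are a sketch, not the real diff:

	ret = task_work_add(tsk, &req->task_work, true);
	if (!ret)
		/* queueing succeeded, but the target task is already exiting */
		WARN_ON_ONCE(tsk->flags & PF_EXITING);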

[  628.291061] RIP: 0010:__io_async_wake+0x14a/0x160
[  628.300452] Code: 8b b8 c8 00 00 00 e8 75 df 00 00 ba 01 00 00 00 48 89 ee 48 89 c7 49 89 c6 e8 82 dd de ff e9 59 ff ff ff 0f 0b e9 52 ff ff ff <0f> 0b e9 40 ff ff ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00
[  628.337930] RSP: 0018:ffffc9000c830a40 EFLAGS: 00010002
[  628.348365] RAX: 0000000000000000 RBX: ffff88bfe85fc200 RCX: ffff88bff58491b8
[  628.362610] RDX: 0000000000000001 RSI: ffff88bfe85fc2b8 RDI: ffff889fc929f000
[  628.376855] RBP: ffff88bfe85fc2b8 R08: 00000000000000c3 R09: ffffc9000c830ad0
[  628.391087] R10: 0000000000000000 R11: ffff889ff01000a0 R12: ffffc9000c830ad0
[  628.405317] R13: ffff889fb405fc00 R14: ffff889fc929f000 R15: ffff88bfe85fc200
[  628.419549] FS:  0000000000000000(0000) GS:ffff889fff5c0000(0000) knlGS:0000000000000000
[  628.435706] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  628.447178] CR2: 00007f40a3c4b8f0 CR3: 0000000002409002 CR4: 00000000003606e0
[  628.461427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  628.475675] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  628.489924] Call Trace:
[  628.494810]  <IRQ>
[  628.498835]  ? __io_queue_sqe.part.97+0x750/0x750
[  628.508229]  ? tcp_ack_update_rtt.isra.55+0x113/0x430
[  628.518320]  __wake_up_common+0x71/0x140
[  628.526152]  __wake_up_common_lock+0x7c/0xc0
[  628.534681]  sock_def_readable+0x3c/0x70
[  628.542516]  tcp_data_queue+0x2d9/0xb50
[  628.550175]  tcp_rcv_established+0x1ce/0x620
[  628.558703]  ? sk_filter_trim_cap+0x4f/0x200
[  628.567232]  tcp_v6_do_rcv+0xbe/0x3b0
[  628.574545]  tcp_v6_rcv+0xa8d/0xb20
[  628.581516]  ip6_protocol_deliver_rcu+0xb4/0x450
[  628.590736]  ip6_input_finish+0x11/0x20
[  628.598396]  ip6_input+0xa0/0xb0
[  628.604845]  ? tcp_v6_early_demux+0x90/0x140
[  628.613371]  ? tcp_v6_early_demux+0xdb/0x140
[  628.621902]  ? ip6_rcv_finish_core.isra.21+0x66/0x90
[  628.631816]  ipv6_rcv+0xc0/0xd0
[  628.638092]  __netif_receive_skb_one_core+0x50/0x70
[  628.647833]  netif_receive_skb_internal+0x2f/0xa0
[  628.657226]  napi_gro_receive+0xe7/0x150
[  628.665068]  mlx5e_handle_rx_cqe+0x8c/0x100
[  628.673416]  mlx5e_poll_rx_cq+0xef/0x95b
[  628.681252]  mlx5e_napi_poll+0xe2/0x610
[  628.688913]  net_rx_action+0x132/0x360
[  628.696403]  __do_softirq+0xd3/0x2e6
[  628.703545]  irq_exit+0xa5/0xb0
[  628.709816]  do_IRQ+0x79/0xd0
[  628.715744]  common_interrupt+0xf/0xf
[  628.723057]  </IRQ>
[  628.727256] RIP: 0010:cpuidle_enter_state+0xac/0x410

which means that we've successfully added the task_work while the
process is exiting. Maybe I can work around this by checking for
PF_EXITING myself, instead of relying on task_work_add() finding
work_exited on the list.
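
Purely as a sketch of that idea (not a real patch, same hypothetical
variable names as above, and inherently racy since PF_EXITING can get
set right after it's read):

	if (unlikely(tsk->flags & PF_EXITING)) {
		/* task is exiting, queueing task_work to it is pointless */
		ret = -ESRCH;
	} else {
		ret = task_work_add(tsk, &req->task_work, true);
	}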

-- 
Jens Axboe


Thread overview: 19+ messages
2020-04-07 16:02 [PATCHSET v2] io_uring and task_work interactions Jens Axboe
2020-04-07 16:02 ` [PATCH 1/4] task_work: add task_work_pending() helper Jens Axboe
2020-04-07 17:52   ` Jann Horn
2020-04-07 16:02 ` [PATCH 2/4] task_work: kill current->task_works checking in callers Jens Axboe
2020-04-07 16:02 ` [PATCH 3/4] task_work: make exit_work externally visible Jens Axboe
2020-04-07 16:02 ` [PATCH 4/4] io_uring: flush task work before waiting for ring exit Jens Axboe
2020-04-07 16:24   ` Oleg Nesterov
2020-04-07 16:38     ` Oleg Nesterov
2020-04-07 20:30       ` Jens Axboe
2020-04-07 20:39         ` Jens Axboe
2020-04-08 18:40         ` Oleg Nesterov
2020-04-08 18:48           ` Jens Axboe
2020-04-08 19:06             ` Jens Axboe [this message]
2020-04-08 20:17               ` Oleg Nesterov
2020-04-08 20:25                 ` Jens Axboe
2020-04-08 21:19                   ` Jens Axboe
2020-04-09 18:50                   ` Oleg Nesterov
2020-04-10  0:29                     ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2020-04-06 19:48 [PATCHSET 0/4] io_uring and task_work interactions Jens Axboe
2020-04-06 19:48 ` [PATCH 4/4] io_uring: flush task work before waiting for ring exit Jens Axboe
