From: Pavel Begunkov <[email protected]>
To: Paul Moore <[email protected]>,
Dan Clash <[email protected]>
Cc: [email protected], [email protected],
Jens Axboe <[email protected]>
Subject: Re: io_uring: worker thread NULL dereference during openat op
Date: Tue, 16 Apr 2024 14:45:29 +0100
Message-ID: <[email protected]>
In-Reply-To: <CAHC9VhTWbFu8vbapWG5ndt=r-y5SkSSe=AA3YEufreYtjPMrUw@mail.gmail.com>

On 4/16/24 04:29, Paul Moore wrote:
> On Mon, Apr 15, 2024 at 7:26 PM Dan Clash <[email protected]> wrote:
>>
>> Below is a test program that causes multiple io_uring worker threads to
>> hit a NULL dereference while executing openat ops.
>>
>> The test program hangs forever in a D state. The test program can be
>> run again after the NULL dereferences. However, there are long delays
>> at reboot time because the io_uring_cancel() during do_exit() attempts
>> to wake the worker threads.
>>
>> The test program is single threaded but it queues multiple openat and
>> close ops with IOSQE_ASYNC set before waiting for completions.
>>
>> I collected trace with /sys/kernel/tracing/events/io_uring/enable
>> enabled if that is helpful.
>>
>> The test program reproduces similar problems in the following releases.
>>
>> mainline v6.9-rc3
>> stable 6.8.5
>> Ubuntu 6.5.0-1018-azure
>>
>> The test program does not reproduce the problem in Ubuntu
>> 5.15.0-1052-azure, which does not have the io_uring audit changes.
>>
>> The following is the first io_uring worker thread backtrace in the repro
>> against v6.9-rc3.
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 0 P4D 0
>> Oops: 0000 [#1] SMP PTI
>> CPU: 0 PID: 4628 Comm: iou-wrk-4605 Not tainted 6.9.0-rc3 #2
>> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine,
>> BIOS Hyper-V UEFI Release v4.1 11/28/2023
>> RIP: 0010:strlen (lib/string.c:402)
>> Call Trace:
>> <TASK>
>> ? show_regs (arch/x86/kernel/dumpstack.c:479)
>> ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
>> ? page_fault_oops (arch/x86/mm/fault.c:713)
>> ? do_user_addr_fault (arch/x86/mm/fault.c:1261)
>> ? exc_page_fault (./arch/x86/include/asm/irqflags.h:37
>> ./arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1513
>> arch/x86/mm/fault.c:1563)
>> ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
>> ? __pfx_strlen (lib/string.c:402)
>> ? parent_len (kernel/auditfilter.c:1284).
>> __audit_inode (kernel/auditsc.c:2381 (discriminator 4))
>
> Thanks for the well documented bug report!
>
> That's interesting, it looks like audit_inode() is potentially being
> passed a filename struct with a NULL name field (filename::name ==
> NULL). Given the IOSQE_ASYNC and what looks like io_uring calling
> getname() from within the __io_openat_prep() function, I suspect the
> issue is that we aren't associating the filename information we
> collect in getname() with the proper audit_context(). In other words,
> we do the getname() in one context, and then the actual open operation
> in another, and the audit filename info is lost in the switch.
>
> I think this is related to another issue that Jens and I have been
> discussing relating to connect() and sockaddrs. We had been hoping
> that the issue we were seeing with sockaddrs was just a special case
> with connect, but it doesn't look like that is the case.
>
> I'm going to be a bit busy this week with conferences, but given the
> previous discussions with Jens as well as this new issue, I suspect
> that we are going to need to do some work to support creation of a
> thin, or lazily setup, audit_context that we can initialize in the
> io_uring prep routines for use by getname(), move_addr_to_kernel(),
> etc., store in the io_kiocb struct, and then fully setup in
> audit_uring_entry().

I'd prefer not to leak that much into io_uring's hot path. I
don't understand the specifics of the problem, but let me throw
out some ideas:

1) Each io_uring request has a reference to the task it was
submitted from, i.e. req->task. Can you use the context from
the submitter task? E.g.

audit_ctx = req->task->audit_context

io_uring explicitly lists all tasks using it, and you can easily
hook in there and initialise audit so that req->ctx->audit_context
is always set.

2) Can we initialise audit for each io-wq task when we create
them? We can also try to share the audit ctx between io-wq tasks
and the task they were created for.

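To make the two ideas concrete, here is a rough userspace model, not
kernel code. The struct layouts and the helper names req_audit_ctx()
and worker_init_audit() are made up for illustration; only the
req->task and task->audit_context fields come from the discussion
above. It ignores lifetime, refcounting and locking, which any real
patch would have to sort out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, NOT the real layouts. */
struct audit_context {
	int id;
};

struct task {
	const char *comm;
	struct audit_context *audit_context;
};

struct request {
	struct task *task;	/* submitter task, models req->task */
};

/*
 * Idea 1: resolve the audit context through the submitter task
 * recorded in the request, not through whichever io-wq worker
 * happens to execute it. Returns NULL if there is none.
 */
struct audit_context *req_audit_ctx(const struct request *req)
{
	if (!req || !req->task)
		return NULL;
	return req->task->audit_context;
}

/*
 * Idea 2: when a worker is created on behalf of a task, point the
 * worker at that task's audit context so both see the same state.
 * Plain pointer sharing is an assumption made for this sketch.
 */
void worker_init_audit(struct task *worker, const struct task *submitter)
{
	assert(worker != NULL && submitter != NULL);
	worker->audit_context = submitter->audit_context;
}
```

In this model a request queued by the submitter resolves to the same
audit_context no matter which worker runs it, and a freshly created
worker starts out pointing at the submitter's context instead of
having none.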
>> ? link_path_walk.part.0.constprop.0 (fs/namei.c:2324)
>> path_openat (fs/namei.c:3550 fs/namei.c:3796)
>> do_filp_open (fs/namei.c:3826)
>> ? alloc_fd (./arch/x86/include/asm/paravirt.h:589 (discriminator 10)
>> ./arch/x86/include/asm/qspinlock.h:57 (discriminator 10)
>> ./include/linux/spinlock.h:204 (discriminator 10)
>> ./include/linux/spinlock_api_smp.h:142 (discriminator 10)
>> ./include/linux/spinlock.h:391 (discriminator 10) fs/file.c:553
>> (discriminator 10))
>> io_openat2 (io_uring/openclose.c:140)
>> io_openat (io_uring/openclose.c:178)
>> io_issue_sqe (io_uring/io_uring.c:1897)
>> io_wq_submit_work (io_uring/io_uring.c:2006)
>> io_worker_handle_work (io_uring/io-wq.c:540 io_uring/io-wq.c:597)
>> io_wq_worker (io_uring/io-wq.c:258 io_uring/io-wq.c:648)
>> ? __pfx_io_wq_worker (io_uring/io-wq.c:627)
>> ? raw_spin_rq_unlock (./arch/x86/include/asm/paravirt.h:589
>> ./arch/x86/include/asm/qspinlock.h:57 ./include/linux/spinlock.h:204
>> ./include/linux/spinlock_api_smp.h:142 kernel/sched/core.c:603)
>> ? finish_task_switch.isra.0 (./arch/x86/include/asm/irqflags.h:42
>> ./arch/x86/include/asm/irqflags.h:77 kernel/sched/sched.h:1397
>> kernel/sched/core.c:5163 kernel/sched/core.c:5281)
>> ? __pfx_io_wq_worker (io_uring/io-wq.c:627)
>> ret_from_fork (arch/x86/kernel/process.c:156)
>> ? __pfx_io_wq_worker (io_uring/io-wq.c:627)
>> ret_from_fork_asm (arch/x86/entry/entry_64.S:256)
>
--
Pavel Begunkov
Thread overview: 4+ messages
2024-04-15 23:26 io_uring: worker thread NULL dereference during openat op Dan Clash
2024-04-16 3:29 ` Paul Moore
2024-04-16 13:45 ` Pavel Begunkov [this message]
2024-04-16 17:15 ` Paul Moore