* io_uring: incorrect assumption about mutex behavior on unlock?
From: Jann Horn @ 2023-12-01 16:41 UTC
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: kernel list, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long

mutex_unlock() has a different API contract compared to spin_unlock().
spin_unlock() can be used to release ownership of an object, so that
as soon as the spinlock is unlocked, another task is allowed to free
the object containing the spinlock.
mutex_unlock() does not support this kind of usage: The caller of
mutex_unlock() must ensure that the mutex stays alive until
mutex_unlock() has returned.
(See the thread
<https://lore.kernel.org/all/[email protected]/>
which discusses adding documentation about this.)
(POSIX userspace mutexes are different from kernel mutexes, in
userspace this pattern is allowed.)
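
To illustrate concretely, here is a minimal sketch with a hypothetical
refcounted object (struct foo, foo_last_user() and foo_teardown() are
made up for illustration); it mirrors the shape of the io_uring code
discussed below:

#include <linux/atomic.h>
#include <linux/mutex.h>
#include <linux/sched.h>
#include <linux/slab.h>

struct foo {
        struct mutex lock;
        atomic_t refs;
};

/* Straggler: does some final work under the lock, then drops its ref. */
static void foo_last_user(struct foo *f)
{
        mutex_lock(&f->lock);
        /* ... access f ... */
        atomic_dec(&f->refs);   /* may drop the last reference */
        mutex_unlock(&f->lock); /* f may be freed while this call
                                 * is still executing */
}

/* Teardown: wait for the refcount to drain, then take the lock to
 * "wait out" stragglers before freeing. This would be safe if f->lock
 * were a spinlock; it is unsafe for a mutex. */
static void foo_teardown(struct foo *f)
{
        while (atomic_read(&f->refs))
                cond_resched(); /* simplified wait loop */
        mutex_lock(&f->lock);   /* a straggler's unlock lets us in... */
        mutex_unlock(&f->lock);
        kfree(f);               /* ...while its mutex_unlock() may not
                                 * have returned yet */
}
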
io_ring_exit_work() has a comment that seems to assume that the
uring_lock (which is a mutex) can be used as if the spinlock-style API
contract applied:

        /*
         * Some may use context even when all refs and requests have been put,
         * and they are free to do so while still holding uring_lock or
         * completion_lock, see io_req_task_submit(). Apart from other work,
         * this lock/unlock section also waits them to finish.
         */
        mutex_lock(&ctx->uring_lock);

I couldn't find any way in which io_req_task_submit() actually still
relies on this. I think io_fallback_req_func() now relies on it,
though I'm not sure whether that's intentional. ctx->fallback_work is
flushed in io_ring_ctx_wait_and_kill(), but I think it can probably be
restarted later on via:

io_ring_exit_work -> io_move_task_work_from_local ->
io_req_normal_work_add -> io_fallback_tw(sync=false) ->
schedule_delayed_work

I think it is probably guaranteed that ctx->refs is non-zero when we
enter io_fallback_req_func, since I think we can't enter
io_fallback_req_func with an empty ctx->fallback_llist, and the
requests queued up on ctx->fallback_llist have to hold refcounted
references to the ctx. But by the time we reach the mutex_unlock(), I
think we're not guaranteed to hold any references on the ctx anymore,
and so the ctx could theoretically be freed in the middle of the
mutex_unlock() call?
I think that to make this code properly correct, it might be necessary
to either add another flush_delayed_work() call after ctx->refs has
dropped to zero and we know that the fallback work can't be restarted
anymore, or create an extra ctx->refs reference that is dropped in
io_fallback_req_func() after the mutex_unlock(). (Though I guess it's
probably unlikely that this goes wrong in practice.)
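
For the second option, roughly (just a sketch, assuming ctx->refs is
the usual percpu_ref and leaving the actual task-work processing as it
is today; untested):

static void io_fallback_req_func(struct work_struct *work)
{
        struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
                                               fallback_work.work);

        /* Pin the ctx so the final reference cannot be dropped before
         * the mutex_unlock() below has returned. */
        percpu_ref_get(&ctx->refs);
        mutex_lock(&ctx->uring_lock);
        /* ... run the queued fallback task work as before; requests
         * may put their ctx references in here ... */
        mutex_unlock(&ctx->uring_lock);
        percpu_ref_put(&ctx->refs);     /* only now may the ctx go away */
}
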
* RE: io_uring: incorrect assumption about mutex behavior on unlock?
From: David Laight @ 2023-12-01 18:30 UTC
To: 'Jann Horn', Jens Axboe, Pavel Begunkov, io-uring
Cc: kernel list, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long

From: Jann Horn
> Sent: 01 December 2023 16:41
>
> mutex_unlock() has a different API contract compared to spin_unlock().
> spin_unlock() can be used to release ownership of an object, so that
> as soon as the spinlock is unlocked, another task is allowed to free
> the object containing the spinlock.
> mutex_unlock() does not support this kind of usage: The caller of
> mutex_unlock() must ensure that the mutex stays alive until
> mutex_unlock() has returned.

The problem sequence might be:

    Thread A                                Thread B
    mutex_lock()
    code to stop mutex being requested
    ...
                                            mutex_lock() - sleeps
    mutex_unlock()...
      Waiters woken...
      isr and/or pre-empted
                                            - wakes up
                                            mutex_unlock()
                                            free()
    ... more kernel code access the mutex
    BOOOM

What happens in a PREEMPT_RT kernel, where most spin_unlock() calls
get replaced by mutex_unlock()?
Seems like they can potentially access a freed mutex?

David

* mutex/spinlock semantics [was: Re: io_uring: incorrect assumption about mutex behavior on unlock?]
From: Jann Horn @ 2023-12-01 18:40 UTC
To: David Laight
Cc: Jens Axboe, Pavel Begunkov, io-uring, kernel list, Peter Zijlstra,
    Ingo Molnar, Will Deacon, Waiman Long, Thomas Gleixner

On Fri, Dec 1, 2023 at 7:30 PM David Laight <[email protected]> wrote:
>
> From: Jann Horn
> > Sent: 01 December 2023 16:41
> >
> > mutex_unlock() has a different API contract compared to spin_unlock().
> > spin_unlock() can be used to release ownership of an object, so that
> > as soon as the spinlock is unlocked, another task is allowed to free
> > the object containing the spinlock.
> > mutex_unlock() does not support this kind of usage: The caller of
> > mutex_unlock() must ensure that the mutex stays alive until
> > mutex_unlock() has returned.
>
> The problem sequence might be:
>
>     Thread A                                Thread B
>     mutex_lock()
>     code to stop mutex being requested
>     ...
>                                             mutex_lock() - sleeps
>     mutex_unlock()...
>       Waiters woken...
>       isr and/or pre-empted
>                                             - wakes up
>                                             mutex_unlock()
>                                             free()
>     ... more kernel code access the mutex
>     BOOOM
>
> What happens in a PREEMPT_RT kernel, where most spin_unlock() calls
> get replaced by mutex_unlock()?
> Seems like they can potentially access a freed mutex?

RT spinlocks don't use mutexes; they use rtmutexes, and I think those
explicitly support this use case. See the call path:

spin_unlock -> rt_spin_unlock -> rt_mutex_slowunlock

rt_mutex_slowunlock() has a comment, added in commit 27e35715df54
("rtmutex: Plug slow unlock race"):
 * We must be careful here if the fast path is enabled. If we
 * have no waiters queued we cannot set owner to NULL here
 * because of:
 *
 * foo->lock->owner = NULL;
 *                      rtmutex_lock(foo->lock); <- fast path
 *                      free = atomic_dec_and_test(foo->refcnt);
 *                      rtmutex_unlock(foo->lock); <- fast path
 *                      if (free)
 *                              kfree(foo);
 * raw_spin_unlock(foo->lock->wait_lock);

That commit also explicitly refers to wanting to support this pattern
with spin_unlock() in the commit message.

* Re: io_uring: incorrect assumption about mutex behavior on unlock?
From: Pavel Begunkov @ 2023-12-01 18:52 UTC
To: Jann Horn, Jens Axboe, io-uring
Cc: kernel list, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long

On 12/1/23 16:41, Jann Horn wrote:
> mutex_unlock() has a different API contract compared to spin_unlock().
> spin_unlock() can be used to release ownership of an object, so that
> as soon as the spinlock is unlocked, another task is allowed to free
> the object containing the spinlock.
> mutex_unlock() does not support this kind of usage: The caller of
> mutex_unlock() must ensure that the mutex stays alive until
> mutex_unlock() has returned.
> (See the thread
> <https://lore.kernel.org/all/[email protected]/>
> which discusses adding documentation about this.)
> (POSIX userspace mutexes are different from kernel mutexes, in
> userspace this pattern is allowed.)
>
> io_ring_exit_work() has a comment that seems to assume that the
> uring_lock (which is a mutex) can be used as if the spinlock-style API
> contract applied:
>
>         /*
>          * Some may use context even when all refs and requests have been put,
>          * and they are free to do so while still holding uring_lock or
>          * completion_lock, see io_req_task_submit(). Apart from other work,
>          * this lock/unlock section also waits them to finish.
>          */
>         mutex_lock(&ctx->uring_lock);
>
Oh crap. I'll check if there are more suspects and patch it up, thanks.

> I couldn't find any way in which io_req_task_submit() actually still
> relies on this. I think io_fallback_req_func() now relies on it,
> though I'm not sure whether that's intentional. ctx->fallback_work is
> flushed in io_ring_ctx_wait_and_kill(), but I think it can probably be
> restarted later on via:

Yes, io_fallback_req_func() relies on it, and it can be spun up
asynchronously from different places, e.g. an in-IRQ block request
completion.

> io_ring_exit_work -> io_move_task_work_from_local ->
> io_req_normal_work_add -> io_fallback_tw(sync=false) ->
> schedule_delayed_work
>
> I think it is probably guaranteed that ctx->refs is non-zero when we
> enter io_fallback_req_func, since I think we can't enter
> io_fallback_req_func with an empty ctx->fallback_llist, and the
> requests queued up on ctx->fallback_llist have to hold refcounted
> references to the ctx. But by the time we reach the mutex_unlock(), I
> think we're not guaranteed to hold any references on the ctx anymore,
> and so the ctx could theoretically be freed in the middle of the
> mutex_unlock() call?

Right, it comes with refs but loses them in between lock()/unlock().
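
To sketch where that happens (abridged; the task-work callback
signature and tw-state handling are simplified here):

static void io_fallback_req_func(struct work_struct *work)
{
        struct io_ring_ctx *ctx = container_of(work, struct io_ring_ctx,
                                               fallback_work.work);
        struct llist_node *node = llist_del_all(&ctx->fallback_llist);
        struct io_kiocb *req, *tmp;

        mutex_lock(&ctx->uring_lock);   /* entered with refs held by reqs */
        llist_for_each_entry_safe(req, tmp, node, io_task_work.node)
                req->io_task_work.func(req);    /* may put the last ctx refs */
        mutex_unlock(&ctx->uring_lock); /* ctx may already be unreferenced */
}
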
> I think that to make this code properly correct, it might be necessary
> to either add another flush_delayed_work() call after ctx->refs has
> dropped to zero and we know that the fallback work can't be restarted
> anymore, or create an extra ctx->refs reference that is dropped in
> io_fallback_req_func() after the mutex_unlock(). (Though I guess it's
> probably unlikely that this goes wrong in practice.)
--
Pavel Begunkov