public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: "Jann Horn" <[email protected]>,
	"Thomas Gleixner" <[email protected]>,
	"Ingo Molnar" <[email protected]>,
	"Peter Zijlstra" <[email protected]>,
	"Darren Hart" <[email protected]>,
	"Davidlohr Bueso" <[email protected]>,
	"André Almeida" <[email protected]>,
	"kernel list" <[email protected]>,
	"Pavel Begunkov" <[email protected]>,
	io-uring <[email protected]>
Subject: Re: futex+io_uring: futex_q::task can maybe be dangling (but is not actually accessed, so it's fine)
Date: Fri, 10 Jan 2025 20:33:34 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAG48ez2HHU+vSCcurs5TsFXiEfUhLSXbEzcugBSTBZgBWkzpuA@mail.gmail.com>

On 1/10/25 3:26 PM, Jann Horn wrote:
> Hi!
> 
> I think there is some brittle interaction between futex and io_uring;
> but to be clear, I don't think that there is actually a bug here.
> 
> In io_uring, when a IORING_OP_FUTEX_WAIT SQE is submitted with
> IOSQE_ASYNC, an io_uring worker thread can queue up futex waiters via
> the following path:
> 
> ret_from_fork -> io_wq_worker -> io_worker_handle_work ->
> io_wq_submit_work[called as ->do_work] -> io_issue_sqe ->
> io_futex_wait[called as .issue] -> futex_queue -> __futex_queue
> 
> futex_q instances normally describe synchronously waiting tasks, and
> __futex_queue() records the identity of the calling task (which is
> normally the waiter) in futex_q::task. But io_uring waits on futexes
> asynchronously instead; from io_uring's perspective, a pending futex
> wait is not tied to the task that called into futex_queue(), it is
> just tied to the userspace task on behalf of which the io_uring worker
> is acting (I think). So when a futex wait operation is started by an
> io_uring worker task, I think that worker task could go away while the
> futex_q is still queued up on the futex, and so I think we can end up
> with a futex_q whose "task" member points to a freed task_struct.
> 
> The good part is that (from what I understand) that "task" member is
> only used for two purposes:
> 
> 1. futexes that are either created through the normal futex syscalls
> use futex_wake_mark as their .wake callback, which needs the task
> pointer to know which task should be woken.
> 2. PI futexes use it for priority inheritance magic (and AFAICS there
> is no way for io_uring to interface with PI futexes)
> 
> I'm not sure what is the best thing to do is here - maybe the current
> situation is fine, and I should just send a patch that adds a comment
> describing this to the definition of the "task" member? Or maybe it
> would be better for robustness to ensure that the "task" member is
> NULLed out in those cases, though that would probably make the
> generated machine code a little bit more ugly? (Or maybe I totally
> misunderstand what's going on and there isn't actually a dangling
> pointer...)

Good find. And yes the io-wq task could go away, if there's more of
them.

While it isn't an issue, dangling pointers make me nervous. We could do
something like the (totally untested) below patch, where io_uring just
uses __futex_queue() and make __futex_queue() take a task_struct as
well. Other callers pass in 'current'.

It's quite possible that it'd be safe to just use __futex_queue() and
clear ->task after the queue, but if we pass in NULL from the get-go,
then there's definitely not a risk of anything being dangling.


diff --git a/io_uring/futex.c b/io_uring/futex.c
index 30139cc150f2..985ad10cc6d5 100644
--- a/io_uring/futex.c
+++ b/io_uring/futex.c
@@ -338,7 +338,12 @@ int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags)
 		hlist_add_head(&req->hash_node, &ctx->futex_list);
 		io_ring_submit_unlock(ctx, issue_flags);
 
-		futex_queue(&ifd->q, hb);
+		/*
+		 * Don't pass in current as the task associated with the
+		 * futex queue, that only makes sense for sync syscalls.
+		 */
+		__futex_queue(&ifd->q, hb, NULL);
+		spin_unlock(&hb->lock);
 		return IOU_ISSUE_SKIP_COMPLETE;
 	}
 
diff --git a/kernel/futex/core.c b/kernel/futex/core.c
index ebdd76b4ecbb..3db8567f5a44 100644
--- a/kernel/futex/core.c
+++ b/kernel/futex/core.c
@@ -532,7 +532,8 @@ void futex_q_unlock(struct futex_hash_bucket *hb)
 	futex_hb_waiters_dec(hb);
 }
 
-void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
+void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb,
+		   struct task_struct *task)
 {
 	int prio;
 
@@ -548,7 +549,7 @@ void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
 
 	plist_node_init(&q->list, prio);
 	plist_add(&q->list, &hb->chain);
-	q->task = current;
+	q->task = task;
 }
 
 /**
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index 99b32e728c4a..2d3f1d53c854 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -285,7 +285,8 @@ static inline int futex_get_value_locked(u32 *dest, u32 __user *from)
 }
 
 extern void __futex_unqueue(struct futex_q *q);
-extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb);
+extern void __futex_queue(struct futex_q *q, struct futex_hash_bucket *hb,
+				struct task_struct *task);
 extern int futex_unqueue(struct futex_q *q);
 
 /**
@@ -303,7 +304,7 @@ extern int futex_unqueue(struct futex_q *q);
 static inline void futex_queue(struct futex_q *q, struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
-	__futex_queue(q, hb);
+	__futex_queue(q, hb, current);
 	spin_unlock(&hb->lock);
 }
 
diff --git a/kernel/futex/pi.c b/kernel/futex/pi.c
index d62cca5ed8f4..635c7d5d4222 100644
--- a/kernel/futex/pi.c
+++ b/kernel/futex/pi.c
@@ -982,7 +982,7 @@ int futex_lock_pi(u32 __user *uaddr, unsigned int flags, ktime_t *time, int tryl
 	/*
 	 * Only actually queue now that the atomic ops are done:
 	 */
-	__futex_queue(&q, hb);
+	__futex_queue(&q, hb, current);
 
 	if (trylock) {
 		ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);

-- 
Jens Axboe

  reply	other threads:[~2025-01-11  3:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-10 22:26 futex+io_uring: futex_q::task can maybe be dangling (but is not actually accessed, so it's fine) Jann Horn
2025-01-11  3:33 ` Jens Axboe [this message]
2025-01-13 13:53   ` Jann Horn
2025-01-13 14:38   ` Peter Zijlstra
2025-01-13 14:41     ` Jens Axboe
2025-01-15 10:20     ` Thomas Gleixner
2025-01-15 15:23       ` Jens Axboe
2025-01-15 15:32         ` Jens Axboe
2025-01-15 17:05           ` Thomas Gleixner
2025-01-15 17:07             ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox