public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH] io_uring: fix potential deadlock in io_poll_wake()
@ 2019-11-12 15:18 Jens Axboe
  2019-11-12 16:43 ` Jens Axboe
  0 siblings, 1 reply; 3+ messages in thread
From: Jens Axboe @ 2019-11-12 15:18 UTC (permalink / raw)
  To: io-uring

We attempt to run the poll completion inline, but we're using trylock to
do so. This avoids a deadlock since we're grabbing the locks in reverse
order at this point, we already hold the poll wq lock and we're trying
to grab the completion lock, while the normal rules are the reverse of
that order.

IO completion for a timeout link will need to grab the completion lock,
so it's not safe to use from this particular context. If the poll
request has a timeout link, don't attempt complete inline.

Signed-off-by: Jens Axboe <[email protected]>

---

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3c573f0578a8..e15c2f8e0840 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2064,7 +2064,14 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 
 	list_del_init(&poll->wait.entry);
 
-	if (mask && spin_trylock_irqsave(&ctx->completion_lock, flags)) {
+	/*
+	 * Run completion inline if we can. We're using trylock here because
+	 * we are violating the completion_lock -> poll wq lock ordering.
+	 * If we have a link timeout we're going to need the completion_lock
+	 * for finalizing the request, don't allow inline completion for that.
+	 */
+	if (!(req->flags & REQ_F_LINK_TIMEOUT) &&
+	    mask && spin_trylock_irqsave(&ctx->completion_lock, flags)) {
 		list_del(&req->list);
 		io_poll_complete(req, mask);
 		spin_unlock_irqrestore(&ctx->completion_lock, flags);

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] io_uring: fix potential deadlock in io_poll_wake()
  2019-11-12 15:18 Jens Axboe
@ 2019-11-12 16:43 ` Jens Axboe
  0 siblings, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2019-11-12 16:43 UTC (permalink / raw)
  To: io-uring

On 11/12/19 7:18 AM, Jens Axboe wrote:
> We attempt to run the poll completion inline, but we're using trylock to
> do so. This avoids a deadlock since we're grabbing the locks in reverse
> order at this point, we already hold the poll wq lock and we're trying
> to grab the completion lock, while the normal rules are the reverse of
> that order.
> 
> IO completion for a timeout link will need to grab the completion lock,
> so it's not safe to use from this particular context. If the poll
> request has a timeout link, don't attempt complete inline.

It's a shame to lose the inline run with a linked timer, I'm going to
send a v2 that retains this.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] io_uring: fix potential deadlock in io_poll_wake()
@ 2019-11-12 16:47 Jens Axboe
  0 siblings, 0 replies; 3+ messages in thread
From: Jens Axboe @ 2019-11-12 16:47 UTC (permalink / raw)
  To: io-uring

We attempt to run the poll completion inline, but we're using trylock to
do so. This avoids a deadlock since we're grabbing the locks in reverse
order at this point, we already hold the poll wq lock and we're trying
to grab the completion lock, while the normal rules are the reverse of
that order.

IO completion for a timeout link will need to grab the completion lock,
but that's not safe from this context. Put the completion under the
completion_lock om io_poll_wake(), and mark the request as entering
the completion with completion_lock already held.

Signed-off-by: Jens Axboe <[email protected]>

---

Changes since v1:

- Retain the inline running, use REQ_F_COMP_LOCKED instead to tell
  io_free_req_find_next() that we are already holding the completion
  lock when entered.
- Correct protection -> protect in comment

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3c573f0578a8..247e5e1137a3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -341,6 +341,7 @@ struct io_kiocb {
 #define REQ_F_ISREG		2048	/* regular file */
 #define REQ_F_MUST_PUNT		4096	/* must be punted even for NONBLOCK */
 #define REQ_F_INFLIGHT		8192	/* on inflight list */
+#define REQ_F_COMP_LOCKED	16384	/* completion under lock */
 	u64			user_data;
 	u32			result;
 	u32			sequence;
@@ -931,14 +932,15 @@ static void io_free_req_find_next(struct io_kiocb *req, struct io_kiocb **nxt)
 	 */
 	if (req->flags & REQ_F_FAIL_LINK) {
 		io_fail_links(req);
-	} else if (req->flags & REQ_F_LINK_TIMEOUT) {
+	} else if ((req->flags & (REQ_F_LINK_TIMEOUT | REQ_F_COMP_LOCKED)) ==
+			REQ_F_LINK_TIMEOUT) {
 		struct io_ring_ctx *ctx = req->ctx;
 		unsigned long flags;
 
 		/*
 		 * If this is a timeout link, we could be racing with the
 		 * timeout timer. Grab the completion lock for this case to
-		 * protection against that.
+		 * protect against that.
 		 */
 		spin_lock_irqsave(&ctx->completion_lock, flags);
 		io_req_link_next(req, nxt);
@@ -2064,13 +2066,20 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 
 	list_del_init(&poll->wait.entry);
 
+	/*
+	 * Run completion inline if we can. We're using trylock here because
+	 * we are violating the completion_lock -> poll wq lock ordering.
+	 * If we have a link timeout we're going to need the completion_lock
+	 * for finalizing the request, mark us as having grabbed that already.
+	 */
 	if (mask && spin_trylock_irqsave(&ctx->completion_lock, flags)) {
 		list_del(&req->list);
 		io_poll_complete(req, mask);
+		req->flags |= REQ_F_COMP_LOCKED;
+		io_put_req(req);
 		spin_unlock_irqrestore(&ctx->completion_lock, flags);
 
 		io_cqring_ev_posted(ctx);
-		io_put_req(req);
 	} else {
 		io_queue_async_work(req);
 	}
-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2019-11-12 16:48 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2019-11-12 16:47 [PATCH] io_uring: fix potential deadlock in io_poll_wake() Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2019-11-12 15:18 Jens Axboe
2019-11-12 16:43 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox