* [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
From: Jens Axboe @ 2026-03-17 2:17 UTC (permalink / raw)
To: io-uring; +Cc: Pavel Begunkov, Francis Brosseau
When a socket send and shutdown() happen back-to-back, both fire
wake-ups before the receiver's task_work has a chance to run. The first
wake gets poll ownership (poll_refs=1), and the second bumps it to 2.
When io_poll_check_events() runs, it calls io_poll_issue() which does a
recv that reads the data and returns IOU_RETRY. The loop then drains all
accumulated refs (atomic_sub_return(2) -> 0) and exits, even though only
the first event was consumed. Since the shutdown is a persistent state
change, no further wakeups will happen, and the multishot recv can hang
forever.
Check specifically for HUP in the poll loop, and ensure that another
loop is done to check for status if more than a single poll activation
is pending. This ensures we don't lose the shutdown event.
Cc: stable@vger.kernel.org
Fixes: dbc2564cfe0f ("io_uring: let fast poll support multishot")
Reported-by: Francis Brosseau <francis@malagauche.com>
Link: https://github.com/axboe/liburing/issues/1549
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
V3: split mshot and !mshot cases, and simply use the number of refs
gotten in the beginning for gating retry. if one is dropped when
we want to retry, we'll loop again as we'd still have remaining
refs.
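To make the accounting failure concrete, here is a small user-space model of the refcounting described above. This is a hypothetical, simplified sketch using C11 atomics, not the kernel code; the `IO_POLL_REF_MASK` value and the helper names are assumptions for illustration only:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define IO_POLL_REF_MASK 0xffff	/* assumed width, illustration only */

/* A wakeup takes ownership of task_work if it was first (refs were 0). */
static bool wake_get_ownership(atomic_int *poll_refs)
{
	return (atomic_fetch_add(poll_refs, 1) & IO_POLL_REF_MASK) == 0;
}

/* Buggy drain: subtract every ref observed at loop start, even though
 * only one event was actually consumed. Returns the new refcount; 0
 * means the loop exits and the second (HUP) event is lost. */
static int drain_refs_buggy(atomic_int *poll_refs, int v)
{
	v &= IO_POLL_REF_MASK;
	return atomic_fetch_sub(poll_refs, v) - v;
}

/* Drain mirroring the io_mshot_check_retry() idea: with multiple refs
 * and a pending HUP, drop one ref fewer so the loop iterates again. */
static int drain_refs_fixed(atomic_int *poll_refs, int v, bool saw_hup)
{
	v &= IO_POLL_REF_MASK;
	if (saw_hup && v != 1)
		v--;
	return atomic_fetch_sub(poll_refs, v) - v;
}
```

With two back-to-back wakeups (send, then shutdown), the buggy drain subtracts both refs and exits; the fixed variant leaves one ref behind, forcing another pass that can observe the HUP.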
diff --git a/io_uring/poll.c b/io_uring/poll.c
index aac4b3b881fb..a264d73a8cbd 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -228,6 +228,19 @@ static inline void io_poll_execute(struct io_kiocb *req, int res)
__io_poll_execute(req, res);
}
+static inline void io_mshot_check_retry(struct io_kiocb *req, int *v)
+{
+ /*
+ * Release all references, retry if someone tried to restart
+ * task_work while we were executing it.
+ */
+ *v &= IO_POLL_REF_MASK;
+
+ /* multiple refs and HUP, ensure we loop once more */
+ if ((req->cqe.res & (POLLHUP | POLLRDHUP)) && *v != 1)
+ (*v)--;
+}
+
/*
* All poll tw should go through this. Checks for poll events, manages
* references, does rewait, etc.
@@ -303,6 +316,7 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
io_req_set_res(req, mask, 0);
return IOU_POLL_REMOVE_POLL_USE_RES;
}
+ v &= IO_POLL_REF_MASK;
} else {
int ret = io_poll_issue(req, tw);
@@ -312,16 +326,11 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
return IOU_POLL_REQUEUE;
if (ret != IOU_RETRY && ret < 0)
return ret;
+ io_mshot_check_retry(req, &v);
}
/* force the next iteration to vfs_poll() */
req->cqe.res = 0;
-
- /*
- * Release all references, retry if someone tried to restart
- * task_work while we were executing it.
- */
- v &= IO_POLL_REF_MASK;
} while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK);
io_napi_add(req);
--
Jens Axboe
* Re: [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
From: Pavel Begunkov @ 2026-03-17 12:27 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Francis Brosseau
On 3/17/26 02:17, Jens Axboe wrote:
> When a socket send and shutdown() happen back-to-back, both fire
> wake-ups before the receiver's task_work has a chance to run. The first
> wake gets poll ownership (poll_refs=1), and the second bumps it to 2.
> When io_poll_check_events() runs, it calls io_poll_issue() which does a
> recv that reads the data and returns IOU_RETRY. The loop then drains all
> accumulated refs (atomic_sub_return(2) -> 0) and exits, even though only
> the first event was consumed. Since the shutdown is a persistent state
> change, no further wakeups will happen, and the multishot recv can hang
> forever.
>
> Check specifically for HUP in the poll loop, and ensure that another
> loop is done to check for status if more than a single poll activation
> is pending. This ensures we don't lose the shutdown event.
Sounds fine with comments below.
Btw, did you look into whether it's an INQ issue? Polling expects
multishots to handle all those conditions, which usually takes the
form of:
while (1) {
    ret = do_IO();
    if (ret == -EAGAIN)
        goto continue_poll;
    if (ret < 0)
        goto fail;
    if (ret == 0)
        goto terminate_req;
    ...
    // partial progress, try again
}
and recv was following this pattern before, but maybe it's something
like recv() returning some bytes, inq rightly saying that there are
no more bytes left, but nothing checking for terminators like
shutdown.
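The suspected hole in that pattern can be distilled into a tiny user-space predicate. The names and the shutdown-flag plumbing below are hypothetical, not the actual io_uring/net code; it only illustrates why inq == 0 alone is insufficient to stop retrying:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical distillation of the multishot recv retry decision:
 * inq == 0 by itself is not enough to rearm poll, because a pending
 * shutdown/EOF terminator still has to be delivered to the app. */
struct recv_state {
	int inq;	/* bytes known queued; < 0 means unknown */
	bool shutdown;	/* peer did shutdown(); persistent condition */
};

/* true -> issue another recv now; false -> rearm poll and wait */
static bool mshot_should_retry(const struct recv_state *s)
{
	if (s->inq != 0)	/* data pending, or state unknown */
		return true;
	return s->shutdown;	/* drained, but EOF not yet reported */
}
```

This is the shape of the net.c hunk further down the thread: the retry condition grows a terminator check in addition to the non-empty/unknown-inq cases.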
> Cc: stable@vger.kernel.org
> Fixes: dbc2564cfe0f ("io_uring: let fast poll support multishot")
> Reported-by: Francis Brosseau <francis@malagauche.com>
> Link: https://github.com/axboe/liburing/issues/1549
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>
> ---
>
> V3: split mshot and !mshot cases, and simply use the number of refs
> gotten in the beginning for gating retry. if one is dropped when
> we want to retry, we'll loop again as we'd still have remaining
> refs.
>
> diff --git a/io_uring/poll.c b/io_uring/poll.c
> index aac4b3b881fb..a264d73a8cbd 100644
> --- a/io_uring/poll.c
> +++ b/io_uring/poll.c
> @@ -228,6 +228,19 @@ static inline void io_poll_execute(struct io_kiocb *req, int res)
> __io_poll_execute(req, res);
> }
>
> +static inline void io_mshot_check_retry(struct io_kiocb *req, int *v)
> +{
> + /*
> + * Release all references, retry if someone tried to restart
> + * task_work while we were executing it.
> + */
This comment belongs to the atomic sub, not masking.
> + *v &= IO_POLL_REF_MASK;
nit: seems like you can just do that inside the
"if (unlikely(v != 1)) { ... }" block.
> +
> + /* multiple refs and HUP, ensure we loop once more */
> + if ((req->cqe.res & (POLLHUP | POLLRDHUP)) && *v != 1)
> + (*v)--;
> +}
> +
> /*
> * All poll tw should go through this. Checks for poll events, manages
> * references, does rewait, etc.
> @@ -303,6 +316,7 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
> io_req_set_res(req, mask, 0);
> return IOU_POLL_REMOVE_POLL_USE_RES;
> }
> + v &= IO_POLL_REF_MASK;
> } else {
> int ret = io_poll_issue(req, tw);
>
> @@ -312,16 +326,11 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
> return IOU_POLL_REQUEUE;
> if (ret != IOU_RETRY && ret < 0)
> return ret;
> + io_mshot_check_retry(req, &v);
Should go before io_poll_issue(), req->cqe.res might already be
invalid.
--
Pavel Begunkov
* Re: [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
From: Jens Axboe @ 2026-03-17 13:07 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Francis Brosseau
On 3/17/26 6:27 AM, Pavel Begunkov wrote:
> On 3/17/26 02:17, Jens Axboe wrote:
>> When a socket send and shutdown() happen back-to-back, both fire
>> wake-ups before the receiver's task_work has a chance to run. The first
>> wake gets poll ownership (poll_refs=1), and the second bumps it to 2.
>> When io_poll_check_events() runs, it calls io_poll_issue() which does a
>> recv that reads the data and returns IOU_RETRY. The loop then drains all
>> accumulated refs (atomic_sub_return(2) -> 0) and exits, even though only
>> the first event was consumed. Since the shutdown is a persistent state
>> change, no further wakeups will happen, and the multishot recv can hang
>> forever.
>>
>> Check specifically for HUP in the poll loop, and ensure that another
>> loop is done to check for status if more than a single poll activation
>> is pending. This ensures we don't lose the shutdown event.
>
> Sounds fine with comments below.
Thanks
> Btw, did you look into whether it's an INQ issue? Polling expects
> multishots to handle all those conditions, which usually takes the
> form of:
>
> while (1) {
>     ret = do_IO();
>     if (ret == -EAGAIN)
>         goto continue_poll;
>     if (ret < 0)
>         goto fail;
>     if (ret == 0)
>         goto terminate_req;
>     ...
>     // partial progress, try again
> }
>
> and recv was following this pattern before, but maybe it's something
> like recv() returning some bytes, inq rightly saying that there are
> no more bytes left, but nothing checking for terminators like
> shutdown.
Right, as per my earlier emails, this is what introduced the issue for
AF_UNIX, when the INQ support was added. We read the whole thing, and
INQ is correctly returned as having 0 bytes left. Hence no retry
happens, and the EOF is missed. We could do something along the lines
of the below, entirely untested, which would ensure we retry for that
condition.
I don't love the poll HUP hack, but I also don't really like how the
poll event handling will coalesce the events effectively. Since this
particular issue will need to go back to 6.17+ stable, I'm also open to
doing the HUP hack and just doing something cleaner on top.
diff --git a/io_uring/net.c b/io_uring/net.c
index 3f9d08b78c21..c10d4c9bd88b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -903,10 +903,13 @@ static inline bool io_recv_finish(struct io_kiocb *req,
*/
if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
io_req_post_cqe(req, sel->val, cflags | IORING_CQE_F_MORE)) {
+ struct socket *sock = sock_from_file(req->file);
+
sel->val = IOU_RETRY;
io_mshot_prep_retry(req, kmsg);
/* Known not-empty or unknown state, retry */
- if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
+ if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0 ||
+ READ_ONCE(sock->sk->sk_shutdown) & SHUTDOWN_MASK) {
if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY &&
!(sr->flags & IORING_RECV_MSHOT_CAP)) {
return false;
>> diff --git a/io_uring/poll.c b/io_uring/poll.c
>> index aac4b3b881fb..a264d73a8cbd 100644
>> --- a/io_uring/poll.c
>> +++ b/io_uring/poll.c
>> @@ -228,6 +228,19 @@ static inline void io_poll_execute(struct io_kiocb *req, int res)
>> __io_poll_execute(req, res);
>> }
>> +static inline void io_mshot_check_retry(struct io_kiocb *req, int *v)
>> +{
>> + /*
>> + * Release all references, retry if someone tried to restart
>> + * task_work while we were executing it.
>> + */
>
> This comment belongs to the atomic sub, not masking.
True, should've left that there.
>> + *v &= IO_POLL_REF_MASK;
>
> nit: seems like you can just do that inside the
> "if (unlikely(v != 1)) { ... }" block.
That could work, then we don't need it in both the other branches.
>> + /* multiple refs and HUP, ensure we loop once more */
>> + if ((req->cqe.res & (POLLHUP | POLLRDHUP)) && *v != 1)
>> + (*v)--;
>> +}
>> +
>> /*
>> * All poll tw should go through this. Checks for poll events, manages
>> * references, does rewait, etc.
>> @@ -303,6 +316,7 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>> io_req_set_res(req, mask, 0);
>> return IOU_POLL_REMOVE_POLL_USE_RES;
>> }
>> + v &= IO_POLL_REF_MASK;
>> } else {
>> int ret = io_poll_issue(req, tw);
>> @@ -312,16 +326,11 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>> return IOU_POLL_REQUEUE;
>> if (ret != IOU_RETRY && ret < 0)
>> return ret;
>> + io_mshot_check_retry(req, &v);
>
> Should go before io_poll_issue(), req->cqe.res might already be
> invalid.
Yeah good point, it was above it before. Too much late night
consolidation...
--
Jens Axboe
* Re: [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
From: Pavel Begunkov @ 2026-03-17 18:37 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Francis Brosseau
On 3/17/26 13:07, Jens Axboe wrote:
...
> Right, as per my earlier emails, this is what introduced the issue for
> AF_UNIX, when the INQ support was added. We read the whole thing, and
> INQ is correctly returned as having 0 bytes left. Hence no retry
> happens, and the EOF is missed. We could do something along the lines
> of the below, entirely untested, which would ensure we retry for that
> condition.
static int tcp_inq_hint(struct sock *sk)
{
    ...
    if (inq == 0 && sock_flag(sk, SOCK_DONE))
        inq = 1;
    return inq;
}
Assuming TCP doesn't work either, I guess I was curious whether it
gets shutdown but the sock is !SOCK_DONE, or whether inq=1 is correct.
Just thinking out loud, maybe I will check later.
--
Pavel Begunkov
* Re: [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
From: Jens Axboe @ 2026-03-17 18:42 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Francis Brosseau
On 3/17/26 12:37 PM, Pavel Begunkov wrote:
> On 3/17/26 13:07, Jens Axboe wrote:
> ...
>> Right, as per my earlier emails, this is what introduced the issue for
>> AF_UNIX, when the INQ support was added. We read the whole thing, and
>> INQ is correctly returned as having 0 bytes left. Hence no retry
>> happens, and the EOF is missed. We could do something along the lines
>> of the below, entirely untested, which would ensure we retry for that
>> condition.
>
> static int tcp_inq_hint(struct sock *sk)
> {
>     ...
>     if (inq == 0 && sock_flag(sk, SOCK_DONE))
>         inq = 1;
>     return inq;
> }
>
> Assuming TCP doesn't work either, I guess I was curious whether it
> gets shutdown but the sock is !SOCK_DONE, or whether inq=1 is correct.
> Just thinking out loud, maybe I will check later.
Ah indeed, yes that's a good find. Let me test that real quick...
I feel like the AF_UNIX inq addition was somewhat half-baked.
--
Jens Axboe