public inbox for io-uring@vger.kernel.org
From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	io-uring <io-uring@vger.kernel.org>
Cc: Francis Brosseau <francis@malagauche.com>
Subject: Re: [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
Date: Tue, 17 Mar 2026 07:07:48 -0600
Message-ID: <edcd0d75-6877-409d-8350-915349395a7c@kernel.dk>
In-Reply-To: <06a8b8a6-2cf0-4d1f-835f-06f4070402d9@gmail.com>

On 3/17/26 6:27 AM, Pavel Begunkov wrote:
> On 3/17/26 02:17, Jens Axboe wrote:
>> When a socket send and shutdown() happen back-to-back, both fire
>> wake-ups before the receiver's task_work has a chance to run. The first
>> wake gets poll ownership (poll_refs=1), and the second bumps it to 2.
>> When io_poll_check_events() runs, it calls io_poll_issue() which does a
>> recv that reads the data and returns IOU_RETRY. The loop then drains all
>> accumulated refs (atomic_sub_return(2) -> 0) and exits, even though only
>> the first event was consumed. Since the shutdown is a persistent state
>> change, no further wakeups will happen, and the multishot recv can hang
>> forever.
>>
>> Check specifically for HUP in the poll loop, and ensure that another
>> loop is done to check for status if more than a single poll activation
>> is pending. This ensures we don't lose the shutdown event.
> 
> Sounds fine with comments below.

Thanks
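
For anyone following along, the ref accounting in the commit message can be modeled in user space. This is only a toy sketch, not kernel code: struct poll_model and the model_*() helpers are made-up names, the single-threaded loop stands in for io_poll_check_events(), and model_issue() assumes one issue consumes exactly one event (data first, then EOF).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the request state, for illustration only */
struct poll_model {
	atomic_int refs;	/* plays the role of poll_refs */
	bool data_queued;	/* sender's data not yet received */
	bool hup;		/* peer shut down, a persistent state */
	bool eof_seen;		/* receiver observed the EOF */
};

/* Each wakeup bumps the ref count; the first one takes ownership */
static void model_wake(struct poll_model *m)
{
	atomic_fetch_add(&m->refs, 1);
}

/* One issue handles one event: the data first, then the EOF */
static void model_issue(struct poll_model *m)
{
	if (m->data_queued)
		m->data_queued = false;
	else if (m->hup)
		m->eof_seen = true;
}

/* Returns true if the EOF was observed before the loop exited */
static bool model_run(bool hup_check)
{
	struct poll_model m = { .data_queued = true, .hup = true };
	int v;

	atomic_init(&m.refs, 0);
	/* send + shutdown: two wakeups before the handler runs */
	model_wake(&m);
	model_wake(&m);

	do {
		v = atomic_load(&m.refs);
		assert(v > 0);
		/* multiple refs and HUP: keep one ref so we loop once more */
		if (hup_check && m.hup && v != 1)
			v--;
		model_issue(&m);
		/* atomic_fetch_sub() returns the old value, so subtract v again */
	} while (atomic_fetch_sub(&m.refs, v) - v != 0);

	return m.eof_seen;
}
```

In this model, model_run(false) drops both refs after a single issue and never sees the EOF, while model_run(true) holds one ref back and picks it up on the second pass.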

> Btw, did you look into whether it's a INQ issue? Polling expects
> multishots to handle all those conditions, which usually goes in a
> form of:
> 
> while (1) {
>     ret = do_IO();
>     if (ret == -EAGAIN)
>         goto continue_poll;
>     if (ret < 0)
>         goto fail;
>     if (ret == 0)
>         goto terminate_req;
>     ...
>     // partial progress, try again
> }
> 
> and recv was following this pattern before, but maybe it's sth
> like recv() returning some bytes, inq rightfully saying that there
> are no more bytes left but forgets to check for terminators like
> shutdown.

Right, as per my earlier emails, this is what introduced the issue for
AF_UNIX when the INQ support was added. We read the whole thing, and
INQ correctly reports 0 bytes left. Hence no retry happens, and the
EOF is missed. We could do something like the below, entirely untested,
which would ensure we retry for that condition.

I don't love the poll HUP hack, but I also don't really like how the
poll event handling effectively coalesces the events. Since this
particular issue will need to go back to 6.17+ stable, I'm also open to
doing the HUP hack and just doing something cleaner on top.

diff --git a/io_uring/net.c b/io_uring/net.c
index 3f9d08b78c21..c10d4c9bd88b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -903,10 +903,13 @@ static inline bool io_recv_finish(struct io_kiocb *req,
 	 */
 	if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
 	    io_req_post_cqe(req, sel->val, cflags | IORING_CQE_F_MORE)) {
+		struct socket *sock = sock_from_file(req->file);
+
 		sel->val = IOU_RETRY;
 		io_mshot_prep_retry(req, kmsg);
 		/* Known not-empty or unknown state, retry */
-		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
+		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0 ||
+		    READ_ONCE(sock->sk->sk_shutdown) & SHUTDOWN_MASK) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY &&
 			    !(sr->flags & IORING_RECV_MSHOT_CAP)) {
 				return false;
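
The socket-level behavior behind this can be seen from user space with plain sockets, no io_uring involved. A minimal sketch, assuming Linux and an AF_UNIX stream socketpair; struct drain_result and send_shutdown_then_drain() are made-up names for illustration, and FIONREAD is used here as the user-space view of the queued byte count:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result holder, for illustration only */
struct drain_result {
	ssize_t first_recv;	/* bytes returned by the first recv() */
	int inq_after;		/* queued bytes reported once data is drained */
	ssize_t second_recv;	/* follow-up recv() result, 0 means EOF */
};

static int send_shutdown_then_drain(struct drain_result *res)
{
	char buf[64];
	int sv[2];
	int ret = -1;

	assert(res != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Sender side: data and shutdown back-to-back */
	if (send(sv[0], "hello", 5, 0) != 5)
		goto done;
	if (shutdown(sv[0], SHUT_WR) < 0)
		goto done;

	/* Receiver side: one recv drains all of the data */
	res->first_recv = recv(sv[1], buf, sizeof(buf), 0);

	/* The queue is now empty, so the byte count says nothing is left */
	if (ioctl(sv[1], FIONREAD, &res->inq_after) < 0)
		goto done;

	/* ... yet another recv is still needed to observe the EOF */
	res->second_recv = recv(sv[1], buf, sizeof(buf), MSG_DONTWAIT);
	ret = 0;
done:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

The point of the sketch is that a zero byte count after the first recv() does not mean there is nothing left to report: the shutdown is only visible through one more recv(), or by checking the shutdown state directly as the diff above does.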

>> diff --git a/io_uring/poll.c b/io_uring/poll.c
>> index aac4b3b881fb..a264d73a8cbd 100644
>> --- a/io_uring/poll.c
>> +++ b/io_uring/poll.c
>> @@ -228,6 +228,19 @@ static inline void io_poll_execute(struct io_kiocb *req, int res)
>>           __io_poll_execute(req, res);
>>   }
>> +static inline void io_mshot_check_retry(struct io_kiocb *req, int *v)
>> +{
>> +    /*
>> +     * Release all references, retry if someone tried to restart
>> +     * task_work while we were executing it.
>> +     */
> 
> This comment belongs to the atomic sub, not masking.

True, should've left that there.

>> +    *v &= IO_POLL_REF_MASK;
> 
> nit: seems like you can just do that inside the
> "if (unlikely(v != 1)) { ... }" block.

That could work, then we don't need it in both the other branches.

>> +    /* multiple refs and HUP, ensure we loop once more */
>> +    if ((req->cqe.res & (POLLHUP | POLLRDHUP)) && *v != 1)
>> +        (*v)--;
>> +}
>> +
>>   /*
>>    * All poll tw should go through this. Checks for poll events, manages
>>    * references, does rewait, etc.
>> @@ -303,6 +316,7 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>>                   io_req_set_res(req, mask, 0);
>>                   return IOU_POLL_REMOVE_POLL_USE_RES;
>>               }
>> +            v &= IO_POLL_REF_MASK;
>>           } else {
>>               int ret = io_poll_issue(req, tw);
>> @@ -312,16 +326,11 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>>                   return IOU_POLL_REQUEUE;
>>               if (ret != IOU_RETRY && ret < 0)
>>                   return ret;
>> +            io_mshot_check_retry(req, &v);
> 
> Should go before io_poll_issue(), req->cqe.res might already be
> invalid.

Yeah good point, it was above it before. Too much late night
consolidation...


-- 
Jens Axboe


Thread overview: 5+ messages
2026-03-17  2:17 [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race Jens Axboe
2026-03-17 12:27 ` Pavel Begunkov
2026-03-17 13:07   ` Jens Axboe [this message]
2026-03-17 18:37     ` Pavel Begunkov
2026-03-17 18:42       ` Jens Axboe
