From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
io-uring <io-uring@vger.kernel.org>
Cc: Francis Brosseau <francis@malagauche.com>
Subject: Re: [PATCH v3] io_uring/poll: fix multishot recv missing EOF on wakeup race
Date: Tue, 17 Mar 2026 07:07:48 -0600
Message-ID: <edcd0d75-6877-409d-8350-915349395a7c@kernel.dk>
In-Reply-To: <06a8b8a6-2cf0-4d1f-835f-06f4070402d9@gmail.com>
On 3/17/26 6:27 AM, Pavel Begunkov wrote:
> On 3/17/26 02:17, Jens Axboe wrote:
>> When a socket send and shutdown() happen back-to-back, both fire
>> wake-ups before the receiver's task_work has a chance to run. The first
>> wake gets poll ownership (poll_refs=1), and the second bumps it to 2.
>> When io_poll_check_events() runs, it calls io_poll_issue() which does a
>> recv that reads the data and returns IOU_RETRY. The loop then drains all
>> accumulated refs (atomic_sub_return(2) -> 0) and exits, even though only
>> the first event was consumed. Since the shutdown is a persistent state
>> change, no further wakeups will happen, and the multishot recv can hang
>> forever.
>>
>> Check specifically for HUP in the poll loop, and ensure that another
>> loop is done to check for status if more than a single poll activation
>> is pending. This ensures we don't lose the shutdown event.
>
> Sounds fine with comments below.
Thanks
> Btw, did you look into whether it's an INQ issue? Polling expects
> multishots to handle all those conditions, which usually goes in a
> form of:
>
> while (1) {
> 	ret = do_IO();
> 	if (ret == -EAGAIN)
> 		goto continue_poll;
> 	if (ret < 0)
> 		goto fail;
> 	if (ret == 0)
> 		goto terminate_req;
> 	...
> 	// partial progress, try again
> }
>
> and recv was following this pattern before, but maybe it's something
> like recv() returning some bytes, inq rightfully saying that there
> are no more bytes left, but forgetting to check for terminators like
> shutdown.
Right, as per my earlier emails, this is what introduced the issue for
AF_UNIX when the INQ support was added. We read the whole thing, and
INQ correctly reports 0 bytes left. Hence no retry happens, and the EOF
is missed. We could do something à la the below, entirely untested,
which would ensure we retry for that condition.

I don't love the poll HUP hack, but I also don't really like how the
poll event handling effectively coalesces the events. Since the fix for
this particular issue will need to go back to 6.17+ stable, I'm also
open to doing the HUP hack and just doing something cleaner on top.
diff --git a/io_uring/net.c b/io_uring/net.c
index 3f9d08b78c21..c10d4c9bd88b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -903,10 +903,13 @@ static inline bool io_recv_finish(struct io_kiocb *req,
 	 */
 	if ((req->flags & REQ_F_APOLL_MULTISHOT) && !mshot_finished &&
 	    io_req_post_cqe(req, sel->val, cflags | IORING_CQE_F_MORE)) {
+		struct socket *sock = sock_from_file(req->file);
+
 		sel->val = IOU_RETRY;
 		io_mshot_prep_retry(req, kmsg);
 		/* Known not-empty or unknown state, retry */
-		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0) {
+		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || kmsg->msg.msg_inq < 0 ||
+		    READ_ONCE(sock->sk->sk_shutdown) & SHUTDOWN_MASK) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY &&
 			    !(sr->flags & IORING_RECV_MSHOT_CAP)) {
 				return false;
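
To make the INQ failure mode concrete, here is a small user-space model
of the retry decision in the net.c hunk. It is a sketch only, not kernel
code: struct model_sock, model_recv(), model_should_retry(),
model_multishot() and the MODEL_* constants are all hypothetical
stand-ins for the socket state, the recv itself, and the flag checks in
io_recv_finish(). The assumption is that a single recv drains the whole
queue and reports 0 bytes left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel definitions. */
#define MODEL_SOCK_NONEMPTY	0x1	/* models IORING_CQE_F_SOCK_NONEMPTY */
#define MODEL_SHUTDOWN_MASK	0x3	/* models SHUTDOWN_MASK */
#define MODEL_MAX_RETRY		32	/* models MULTISHOT_MAX_RETRY */

struct model_sock {
	int queued;	/* bytes sitting in the receive queue */
	int shutdown;	/* shutdown bits, set once and never cleared */
};

/*
 * Drain the queue in one go. Returns bytes read, 0 for EOF, or -1 when
 * the caller should go back to waiting on poll. *inq is the number of
 * bytes left, which is always 0 in this model.
 */
static int model_recv(struct model_sock *sk, int *inq)
{
	int n;

	assert(sk != NULL && inq != NULL);
	*inq = 0;
	n = sk->queued;
	sk->queued = 0;
	if (n)
		return n;
	return (sk->shutdown & MODEL_SHUTDOWN_MASK) ? 0 : -1;
}

/* Retry on known not-empty, unknown state, or (optionally) shutdown. */
static bool model_should_retry(unsigned int cflags, int inq,
			       const struct model_sock *sk,
			       bool check_shutdown)
{
	if ((cflags & MODEL_SOCK_NONEMPTY) || inq < 0)
		return true;
	return check_shutdown && (sk->shutdown & MODEL_SHUTDOWN_MASK) != 0;
}

/* Returns true if EOF was seen before going back to poll. */
static bool model_multishot(struct model_sock *sk, bool check_shutdown)
{
	int loops;

	for (loops = 0; loops < MODEL_MAX_RETRY; loops++) {
		unsigned int cflags;
		int inq, ret;

		ret = model_recv(sk, &inq);
		if (ret < 0)
			return false;	/* nothing to read, wait on poll */
		if (ret == 0)
			return true;	/* EOF observed */
		cflags = inq > 0 ? MODEL_SOCK_NONEMPTY : 0;
		if (!model_should_retry(cflags, inq, sk, check_shutdown))
			return false;	/* INQ says empty, wait on poll */
	}
	return false;
}
```

With 64 bytes queued and the shutdown bits already set,
model_multishot(&sk, false) goes back to poll without seeing the EOF,
while model_multishot(&sk, true) loops once more and sees it.
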
>> diff --git a/io_uring/poll.c b/io_uring/poll.c
>> index aac4b3b881fb..a264d73a8cbd 100644
>> --- a/io_uring/poll.c
>> +++ b/io_uring/poll.c
>> @@ -228,6 +228,19 @@ static inline void io_poll_execute(struct io_kiocb *req, int res)
>> __io_poll_execute(req, res);
>> }
>> +static inline void io_mshot_check_retry(struct io_kiocb *req, int *v)
>> +{
>> + /*
>> + * Release all references, retry if someone tried to restart
>> + * task_work while we were executing it.
>> + */
>
> This comment belongs to the atomic sub, not masking.
True, should've left that there.
>> + *v &= IO_POLL_REF_MASK;
>
> nit: seems like you can just do that inside the
> "if (unlikely(v != 1)) { ... }" block.
That could work, then we don't need it in both the other branches.
>> + /* multiple refs and HUP, ensure we loop once more */
>> + if ((req->cqe.res & (POLLHUP | POLLRDHUP)) && *v != 1)
>> + (*v)--;
>> +}
>> +
>> /*
>> * All poll tw should go through this. Checks for poll events, manages
>> * references, does rewait, etc.
>> @@ -303,6 +316,7 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>> io_req_set_res(req, mask, 0);
>> return IOU_POLL_REMOVE_POLL_USE_RES;
>> }
>> + v &= IO_POLL_REF_MASK;
>> } else {
>> int ret = io_poll_issue(req, tw);
>> @@ -312,16 +326,11 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>> return IOU_POLL_REQUEUE;
>> if (ret != IOU_RETRY && ret < 0)
>> return ret;
>> + io_mshot_check_retry(req, &v);
>
> Should go before io_poll_issue(), req->cqe.res might already be
> invalid.
Yeah good point, it was above it before. Too much late night
consolidation...
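
For reference, the ref draining from the commit message can be modelled
in user space as below. This is a sketch under stated assumptions, not
the kernel implementation: struct model_req, model_req_init(),
model_issue(), model_check_events() and MODEL_REF_MASK are hypothetical
names, only POLLHUP is checked (the patch also checks POLLRDHUP), and
the HUP state is sampled before the issue, as suggested above.

```c
#include <assert.h>
#include <poll.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_REF_MASK	0xfffff	/* hypothetical, models IO_POLL_REF_MASK */

struct model_req {
	atomic_int poll_refs;	/* one ref per wakeup that fired */
	int events;		/* poll mask left behind by the wakeups */
	int queued;		/* bytes waiting to be received */
	bool shutdown;		/* peer has called shutdown() */
	bool saw_eof;		/* set once a recv returns 0 */
};

static void model_req_init(struct model_req *req, int refs, int events,
			   int queued, bool shutdown)
{
	assert(req != NULL);
	atomic_init(&req->poll_refs, refs);
	req->events = events;
	req->queued = queued;
	req->shutdown = shutdown;
	req->saw_eof = false;
}

/* Models io_poll_issue(): >0 data read, 0 EOF, -1 nothing to do. */
static int model_issue(struct model_req *req)
{
	if (req->queued) {
		req->queued = 0;
		return 1;
	}
	if (req->shutdown) {
		req->saw_eof = true;
		return 0;
	}
	return -1;
}

/* Models the loop in io_poll_check_events(), with or without the fix. */
static void model_check_events(struct model_req *req, bool hup_fix)
{
	int v;

	do {
		bool hup;

		v = atomic_load(&req->poll_refs) & MODEL_REF_MASK;
		/* sample HUP before issuing, the result may be stale after */
		hup = (req->events & POLLHUP) != 0;

		if (model_issue(req) == 0)
			return;		/* request terminated by EOF */

		/* multiple refs and HUP, ensure we loop once more */
		if (hup_fix && hup && v != 1)
			v--;
		/* release refs, retry if any remain (atomic_sub_return()) */
	} while ((atomic_fetch_sub(&req->poll_refs, v) - v) & MODEL_REF_MASK);
}
```

With two refs, POLLIN | POLLHUP pending, data queued and the peer shut
down, the unfixed loop drops both refs after a single recv and never
sets saw_eof. With hup_fix set it only drops one ref, loops again, and
the second recv returns 0.
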
--
Jens Axboe