Message-ID: <2cd52ebc-57d2-4407-a3d5-a46f57b90600@gmail.com>
Date: Mon, 16 Mar 2026 23:08:28 +0000
X-Mailing-List: io-uring@vger.kernel.org
Subject: Re: [PATCH] io_uring/poll: fix multishot recv missing EOF on wakeup race
To: Jens Axboe, io-uring
References: <8688cc4e-8619-4392-8d5c-93c554d70c34@kernel.dk>
 <2e2d6e81-bf95-47bf-9c70-1b2f8b63cfbc@gmail.com>
 <876c9e94-0782-4561-8ae3-0cfed18ee375@kernel.dk>
 <3b6769f8-4b44-47ee-a308-6f7e23304c8a@gmail.com>
 <6c0f631e-5015-4578-954a-07a1ca726b34@kernel.dk>
 <0fce925b-9148-4f83-92cb-19d164a7ea32@kernel.dk>
From: Pavel Begunkov
Content-Type: text/plain; charset=UTF-8; format=flowed

On 3/16/26 22:31, Jens Axboe wrote:
> On 3/16/26 4:24 PM, Pavel Begunkov wrote:
>> On 3/16/26 18:40, Jens Axboe wrote:
>>> On 3/16/26 9:16 AM, Jens Axboe wrote:
>>>> On 3/16/26 8:44 AM, Pavel Begunkov wrote:
>>>>> On 3/16/26 14:40, Pavel Begunkov wrote:
>>>>>> On 3/16/26 14:28, Jens Axboe wrote:
>>>>>>> On 3/16/26 8:17 AM, Pavel Begunkov wrote:
>>>>>>>> On 3/15/26 16:19, Jens Axboe wrote:
>>>>>>>>> When a socket send and shutdown() happen back-to-back, both fire
>>>>>>>>> wake-ups before the receiver's task_work has a chance to run. The
>>>>>>>>> first wake gets poll ownership (poll_refs=1), and the second bumps
>>>>>>>>> it to 2. When io_poll_check_events() runs, it calls io_poll_issue()
>>>>>>>>> which does a recv that reads the data and returns IOU_RETRY. The
>>>>>>>>> loop then drains all accumulated refs (atomic_sub_return(2) -> 0)
>>>>>>>>> and exits, even though only the first event was consumed. Since the
>>>>>>>>> shutdown is a persistent state change, no further wakeups will
>>>>>>>>> happen, and the multishot recv can hang forever.
>>>>>>>>>
>>>>>>>>> Fix this by only draining a single poll ref after io_poll_issue()
>>>>>>>>> returns IOU_RETRY for the APOLL_MULTISHOT path. If additional wakes
>>>>>>>>> raced in (poll_refs was > 1), the loop iterates again, and
>>>>>>>>> vfs_poll() discovers the remaining state.
>>>>>>>>
>>>>>>>> How often will it iterate with no effect for normal execution (i.e.
>>>>>>>> no shutdown)? And how costly will it be? Why not handle HUP instead?
>>>>>>>
>>>>>>> That is my worry too. I spent a bit of time on it this morning to
>>>>>>> figure out why this is a new issue, and traced it down to 6.16..6.17,
>>>>>>> and this commit in particular:
>>>>>>>
>>>>>>> commit df30285b3670bf52e1e5512e4d4482bec5e93c16
>>>>>>> Author: Kuniyuki Iwashima
>>>>>>> Date:   Wed Jul 2 22:35:18 2025 +0000
>>>>>>>
>>>>>>>     af_unix: Introduce SO_INQ.
>>>>>>>
>>>>>>> which is then not the first time I've had to fix fallout from that
>>>>>>> commit. Need to dig a bit deeper. That said, I do also worry a bit
>>>>>>> about missing events. Yes, if both poll triggers are of the same
>>>>>>> type, e.g. POLLIN, then we don't need to iterate again. IN + HUP is
>>>>>>> problematic, as would be anything else where you'd need separate
>>>>>>> handling for the trigger.
>>>>>>
>>>>>> Thinking more, I don't think the patch is correct either. Seems you
>>>>>> expect the last recv to return 0, but let's say you have 2 refs and
>>>>>> 8K in the rx queue. The first recv call gets 4K b/c some allocation
>>>>>> fails. The 2nd recv call returns another 4K, and now you're in the
>>>>>> same situation as before.
>>>>>>
>>>>>> You're trying to rely on a too-specific behaviour. HUP handling
>>>>>> should be better.
>>>>>
>>>>> Some variation on: if HUP'ed, it spins until the opcode gives up.
>>>>
>>>> Took a quick look, and we don't even get a HUP, the hangup side
>>>> ends up with a 0 mask. Which is less than useful... I'll keep
>>>> digging.
>>>
>>> How about something like this? Will only retry if hup was seen, and
>>> there are multiple refs. Avoids re-iterating for eg multiple POLLIN
>>> wakes, which should be the common hot path, if v & IO_POLL_REF_MASK != 1.
>>> Keeps it local too.
>>
>> HUP handling is just a hack, it'd be best to avoid complicating
>
> It is, that's why I wasn't a huge fan of gating on that in the first
> place!
>
>> the poll loop logic for that (and those continues do).
>>
>> io_poll_loop_retry() {
>>     ...
>>     atomic_or(IO_POLL_RETRY_FLAG, &req->poll_refs);
>> }
>>
>> if (req->cqe.res & (POLLHUP | POLLRDHUP))
>>     io_poll_loop_retry();
>>
>> Can we isolate it like this? Nobody should care about extra
>> atomics for this case.
>
> It's not the extra atomic, I agree it doesn't matter for this non-hot
> case. It's more that setting the retry flag isn't enough, you need to
> have it do another loop at that point. And if you just set that, then
> it'll drop all refs and happily return and wait for the next poll
> trigger that won't happen past HUP.

It was actually supposed to force it into another iteration, but I see
what you're saying. I'll take a look at this part tomorrow.

> It's not impossible we can make this work, but I don't see a great way
> of doing it without messing with the loop like I did :/. Or checking
> ->poll_refs again, and at that point the separate variable is better
> imho.

-- 
Pavel Begunkov