Message-ID: <38972bbc-fb6f-42c5-bd17-b19db134dfad@kernel.dk>
Date: Fri, 12 Dec 2025 01:58:57 -0700
Subject: Re: [RFC PATCH 2/2] io_uring: fix io may accumulation in poll mode
From: Jens Axboe
To: Fengnan Chang, asml.silence@gmail.com, io-uring@vger.kernel.org
Cc: Fengnan Chang, Diangang Li
References: <20251210085501.84261-1-changfengnan@bytedance.com>
 <20251210085501.84261-3-changfengnan@bytedance.com>
 <69f81ed8-2b4a-461f-90b8-0b9752140f8d@kernel.dk>
 <0661763c-4f56-4895-afd2-7346bb2452e4@gmail.com>
 <0654d130-665a-4b1a-b99b-bb80ca06353a@kernel.dk>
 <1acb251a-4c4a-479c-a51e-a8db9a6e0fa3@kernel.dk>
 <5ce7c227-3a03-4586-baa8-5bd6579500c7@gmail.com>
 <1d8a4c67-0c30-449e-a4e3-24363de0fcfa@kernel.dk>

On 12/11/25 10:11 PM, Jens Axboe wrote:
> On 12/11/25 7:12 PM, Fengnan Chang wrote:
>>
>> On 2025/12/12 09:53, Jens Axboe wrote:
>>> On 12/11/25 6:41 PM, Fengnan Chang wrote:
>>>> Oh, we can't add the nr_events == iob.nr_reqs check: if
>>>> blk_mq_add_to_batch() fails, the completed IO is not added to iob
>>>> and iob.nr_reqs will be 0, which may cause an IO hang.
>>>
>>> Indeed, won't work as-is.
>>>
>>> I do think we're probably making a bigger deal out of the full loop
>>> than necessary. At least I'd be perfectly happy with just the current
>>> patch; performance should be better there than what we currently
>>> have. Ideally we'd have just one loop for polling and catching the
>>> completed items, but that's a bit tricky with the batch completions.
>>
>> Yes, ideally one loop would be enough, but given that there are also
>> multi-queue ctxs, that doesn't seem to be possible.
>
> It doesn't remove the double loop, but the below could help _only_
> iterate completed requests at the end. Rather than move items around
> on the current list in the completion callback, have a separate list
> just for completed requests. Then we can simply iterate that, knowing
> all of them have completed. It gets rid of ->iopoll_completed as well,
> and then we can move the poll_refs. Not really related at all;
> obviously this patch should be split into multiple pieces.
>
> This uses a lockless list. But since the producer and consumer are
> generally the same task, that should not add any real overhead. On top
> of the previous one I sent. What do you think?
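
To make the shape of that concrete, here is a minimal, illustrative
sketch of the lockless-list scheme, not the actual patch. All of the
type, field, and helper names below (pollable_req, poll_ctx, done_list,
post_cqe(), ...) are made up; only the llist_add()/llist_del_all()
primitives are the real kernel llist API.

#include <linux/kernel.h>
#include <linux/llist.h>

/* Sketch only: these are not the real io_uring types. */
struct pollable_req {
        struct llist_node done_node;
        /* ... per-request state ... */
};

struct poll_ctx {
        struct llist_head done_list;    /* completed requests only */
        /* ... */
};

/* Hypothetical stand-in for posting the CQE for @req. */
static void post_cqe(struct pollable_req *req)
{
        (void)req;      /* real code would fill in a CQE here */
}

/*
 * Producer side: called when the device signals that a request is done.
 * Instead of setting an ->iopoll_completed style flag and leaving the
 * request on the poll list, push it onto a lockless list. llist_add()
 * is lock-free and safe to call from IRQ context.
 */
static void req_complete(struct poll_ctx *ctx, struct pollable_req *req)
{
        llist_add(&req->done_node, &ctx->done_list);
}

/*
 * Consumer side: after poking the device(s), walk only the requests
 * that are already known to be complete. llist_del_all() detaches the
 * whole list in one go; no locking, no scan of still-pending requests.
 */
static int reap_completions(struct poll_ctx *ctx)
{
        struct llist_node *node = llist_del_all(&ctx->done_list);
        int nr = 0;

        while (node) {
                struct pollable_req *req =
                        container_of(node, struct pollable_req, done_node);

                node = node->next;
                post_cqe(req);
                nr++;
        }
        return nr;
}

Note that llist_del_all() hands back the entries newest-first; if
completion ordering mattered, the reap side would need to reverse the
detached list before posting CQEs.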

Ran some quick testing, as one interesting case is mixing slower and
faster devices. Let's take this basic example:

t/io_uring -p1 -d128 -b512 -s32 -c32 -F1 -B1 -R1 -X1 -n1 -P1 -t1 -n2 /dev/nvme32n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

where nvme32n1 does about 1.27M IOPS, the other 3 do about 3.3M IOPS,
and we poll 2 devices with each IO thread. With the current kernel, we
get:

IOPS=5.18M, BW=2.53GiB/s, IOS/call=31/32
IOPS=5.19M, BW=2.53GiB/s, IOS/call=31/31
IOPS=5.17M, BW=2.53GiB/s, IOS/call=31/31

and with the two patches we get:

IOPS=6.54M, BW=3.19GiB/s, IOS/call=31/31
IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31
IOPS=6.52M, BW=3.18GiB/s, IOS/call=31/31

or about a 25% improvement. This is mostly due to the issue you
highlighted, where later completions (that are already done) get stuck
waiting behind a slower completion.

Note: obviously 1 thread driving multiple devices for polling could
still be improved, and in fact it does improve if we simply change -c32
to something lower. The more important case is the one you identified,
where different completion times on the same device will hold
completions up. Multi-device polling is just an interesting way to
emulate that, to an extent.

This effect is (naturally) also apparent in the completion latencies,
particularly in the higher percentiles. Ran peak testing too, and it's
better all around than before.

-- 
Jens Axboe