From: Pavel Begunkov
To: David Wei, Jens Axboe, io-uring@vger.kernel.org
Subject: Re: [PATCH next v1 2/2] io_uring: limit local tw done
Date: Fri, 22 Nov 2024 15:57:52 +0000
Message-ID: <677f07d2-213f-4c63-9379-385d282aa4f3@gmail.com>
In-Reply-To: <8dfe2a9b-52f8-4206-a670-ecede76ab637@davidwei.uk>
References: <20241120221452.3762588-1-dw@davidwei.uk>
 <20241120221452.3762588-3-dw@davidwei.uk>
 <95470d11-c791-4b00-be95-45c2573c6b86@kernel.dk>
 <614ce5a4-d289-4c3a-be5b-236769566557@gmail.com>
 <66fa0bfd-13aa-4608-a390-17ea5f333940@gmail.com>
 <8dfe2a9b-52f8-4206-a670-ecede76ab637@davidwei.uk>

On 11/21/24 17:53, David Wei wrote:
> On 2024-11-21 07:07, Pavel Begunkov wrote:
>> On 11/21/24 14:31, Jens Axboe wrote:
>>> On 11/21/24 7:25 AM, Pavel Begunkov wrote:
>>>> On 11/21/24 01:12, Jens Axboe wrote:
>>>>> On 11/20/24 4:56 PM, Pavel Begunkov wrote:
>>>>>> On 11/20/24 22:14, David Wei wrote:
>> ...
>>>>> I think that can only work if we change work_llist to be a regular list
>>>>> with regular locking. Otherwise it's a bit of a mess with the list being
>>>>
>>>> Dylan once measured the overhead of locks vs atomics in this
>>>> path for some artificial case, we can pull the numbers up.
>>>
>>> I did it more recently if you'll remember, actually posted a patch I
>>> think a few months ago changing it to that. But even that approach adds
>>
>> Right, and it'd be a separate topic from this set.
>>
>>> extra overhead, if you want to add it to the same list as now you need
>>
>> Extra overhead to the retry path, which is not the hot path,
>> and coldness of it is uncertain.
>>
>>> to re-grab (and re-disable interrupts) the lock to add it back. My gut
>>> says that would be _worse_ than the current approach. And if you keep a
>>> separate list instead, well then you're back to identical overhead in
>>> terms of now needing to check both when needing to know if anything is
>>> pending, and checking both when running it.
>>>
>>>>> reordered, and then you're spending extra cycles on potentially
>>>>> reordering all the entries again.
>>>>
>>>> That sucks, I agree, but then it's the same question of how often
>>>> it happens.
>>>
>>> At least for now, there's a real issue reported and we should fix it. I
>>> think the current patches are fine in that regard. That doesn't mean we
>>> can't potentially make it better, we should certainly investigate that.
>>> But I don't see the current patches as being suboptimal really, they are
>>> definitely good enough as-is for solving the issue.
>>
>> That's fair enough, but I still would love to know how frequent
>> it is. There is no purpose in optimising it as hot/slow path if
>> it triggers every fifth run or such. David, how easy is it to
>> get some stats? We can hack up some bpftrace script.
>>
>
> Here is a sample distribution of how many task work items are done per
> __io_run_local_work() call:
>
> @work_done:
> [1]         15385954 |@                                                   |
> [2, 4)      33424809 |@@@@                                                |
> [4, 8)     196055270 |@@@@@@@@@@@@@@@@@@@@@@@@                            |
> [8, 16)    419060191 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16, 32)    48395043 |@@@@@@                                              |
> [32, 64)     1573469 |                                                    |
> [64, 128)      98151 |                                                    |
> [128, 256)     14288 |                                                    |
> [256, 512)      2035 |                                                    |
> [512, 1K)        268 |                                                    |
> [1K, 2K)          13 |                                                    |

Nice

> This workload had wait_nr set to 20 and the timeout set to 500 µs.
>
> Empirically, I know that any run doing > 50 task work items will
> violate the latency limit for this workload. In those cases, all the
> requests must be dropped. So even if excessive task work happens only
> a small percentage of the time, the impact is far larger than that
> percentage suggests.

So you've got a long tail, which spikes your nines; that makes sense.
On the other hand, it's perhaps 5-10% of the total, though that is hard
to judge because the [16, 32) bucket is split by the constant 20. My
guess is that a small optimisation for the normal case may well be
worth it even if it adds a bit more to the requeue path, but that
depends on how sharp the skew within that bucket is.

-- 
Pavel Begunkov
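
A distribution like the @work_done histogram above can be gathered with
a short bpftrace script along the following lines. This is only a
sketch: it assumes __io_run_local_work() is visible to kprobes (i.e.
not inlined away in the running kernel build) and that its return value
is the number of local task work items processed per invocation; the
probe name and semantics may differ on other kernel versions.

    # Sketch: build a log2 histogram of the return value of
    # __io_run_local_work(), i.e. task work items run per call.
    bpftrace -e 'kretprobe:__io_run_local_work { @work_done = hist(retval); }'

    # Optionally print and reset the histogram every 10 seconds instead
    # of only on exit:
    bpftrace -e 'kretprobe:__io_run_local_work { @work_done = hist(retval); }
                 interval:s:10 { print(@work_done); clear(@work_done); }'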