From: Pavel Begunkov <[email protected]>
To: David Wei <[email protected]>, Jens Axboe <[email protected]>,
[email protected]
Subject: Re: [PATCH next v1 2/2] io_uring: limit local tw done
Date: Fri, 22 Nov 2024 15:57:52 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 11/21/24 17:53, David Wei wrote:
> On 2024-11-21 07:07, Pavel Begunkov wrote:
>> On 11/21/24 14:31, Jens Axboe wrote:
>>> On 11/21/24 7:25 AM, Pavel Begunkov wrote:
>>>> On 11/21/24 01:12, Jens Axboe wrote:
>>>>> On 11/20/24 4:56 PM, Pavel Begunkov wrote:
>>>>>> On 11/20/24 22:14, David Wei wrote:
>> ...
>>>>> I think that can only work if we change work_llist to be a regular list
>>>>> with regular locking. Otherwise it's a bit of a mess with the list being
>>>>
>>>> Dylan once measured the overhead of locks vs atomics in this
>>>> path for some artificial case; we can pull the numbers up.
>>>
>>> I did it more recently if you'll remember, actually posted a patch I
>>> think a few months ago changing it to that. But even that approach adds
>>
>> Right, and it'd be a separate topic from this set.
>>
>>> extra overhead: if you want to add it to the same list as now, you need
>>
>> Extra overhead on the retry path, which is not the hot path,
>> and how cold it really is remains uncertain.
>>
>>> to re-grab (and re-disable interrupts) the lock to add it back. My gut
>>> says that would be _worse_ than the current approach. And if you keep a
>>> separate list instead, well then you're back to identical overhead in
>>> terms of now needing to check both when needing to know if anything is
>>> pending, and checking both when running it.
>>>
>>>>> reordered, and then you're spending extra cycles on potentially
>>>>> reordering all the entries again.
>>>>
>>>> That sucks, I agree, but then it's the same question of how often
>>>> it happens.
>>>
>>> At least for now, there's a real issue reported and we should fix it. I
>>> think the current patches are fine in that regard. That doesn't mean we
>>> can't potentially make it better, we should certainly investigate that.
>>> But I don't see the current patches as being suboptimal really, they are
>>> definitely good enough as-is for solving the issue.
>>
>> That's fair enough, but I still would love to know how frequent
>> it is. There is no point in optimising it as a hot/slow path split
>> if it triggers as often as every fifth run. David, how easy is it
>> to get some stats? We could hack up some bpftrace script.
>>
>
> Here is a sample distribution of how many task work items are run per
> __io_run_local_work():
>
> @work_done:
> [1] 15385954 |@ |
> [2, 4) 33424809 |@@@@ |
> [4, 8) 196055270 |@@@@@@@@@@@@@@@@@@@@@@@@ |
> [8, 16) 419060191 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [16, 32) 48395043 |@@@@@@ |
> [32, 64) 1573469 | |
> [64, 128) 98151 | |
> [128, 256) 14288 | |
> [256, 512) 2035 | |
> [512, 1K) 268 | |
> [1K, 2K) 13 | |
Nice
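
For reference, a histogram like the one above can be gathered with a
one-liner along these lines. This is only a sketch, assuming
__io_run_local_work() hasn't been inlined away and that its return value
is the number of task work items it ran; the probe actually used to
collect the numbers above may have differed:

  # sketch: power-of-2 histogram of task work items run per
  # __io_run_local_work() invocation, keyed as @work_done like above
  bpftrace -e 'kretprobe:__io_run_local_work { @work_done = hist(retval); }'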
> This workload had wait_nr set to 20 and the timeout set to 500 µs.
>
> Empirically, I know that running more than 50 task work items will
> violate the latency limit for this workload. In those cases, all the
> requests must be dropped. So even if excessive task work happens only
> a small % of the time, the impact is far larger than that percentage
> suggests.
So you've got a long tail, which spikes your nines; that makes sense.
On the other hand it's perhaps 5-10% of the total (the [16,32) and
higher buckets sum to roughly 50M out of ~714M samples, about 7%),
though it's hard to judge precisely since the [16,32) bucket is split
by the constant 20. My guess would be that a small optimisation for
the normal case may well be worth it even if it adds a bit to the
requeue path, but that depends on how sharp the skew within that
bucket is.
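
If it helps to pin down that skew, a linear histogram around the
wait_nr boundary should show directly how much of the [16,32) bucket
actually exceeds 20. Same caveats as the sketch above: this assumes
__io_run_local_work() is not inlined and that retval is the number of
items run.

  # sketch: per-value buckets from 0 to 64 instead of power-of-2 ones
  bpftrace -e 'kretprobe:__io_run_local_work { @work_done = lhist(retval, 0, 64, 1); }'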
--
Pavel Begunkov