From: Jens Axboe <[email protected]>
To: Jann Horn <[email protected]>
Cc: io-uring <[email protected]>
Subject: Re: [PATCH RFC] io_uring/rsrc: add last-lookup cache hit to io_rsrc_node_lookup()
Date: Wed, 30 Oct 2024 14:52:46 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 10/30/24 2:25 PM, Jens Axboe wrote:
> On 10/30/24 11:20 AM, Jann Horn wrote:
>> On Wed, Oct 30, 2024 at 5:58 PM Jens Axboe <[email protected]> wrote:
>>> This avoids array_index_nospec() for repeated lookups on the same node,
>>> which can be quite common (and costly). If a cached node is removed from
>>
>> You're saying array_index_nospec() can be quite costly - which
>> architecture is this on? Is this the cost of the compare+subtract+and
>> making the critical path longer?
> 
> Tested this on arm64, in a vm to be specific. Let me try and generate
> some numbers/profiles on x86-64 as well. It's noticeable there as well,
> though not quite as bad as the below example. For arm64, with the patch,
> we get roughly 8.7% of the time spent getting a resource - without it,
> it's 66% of the time. This is just a microbenchmark, but it clearly
> shows that anything following the barrier on arm64 is very costly:
> 
>   0.98 │       ldr   x21, [x0, #96]
>        │     ↓ tbnz  w2, #1, b8
>   1.04 │       ldr   w1, [x21, #144]
>        │       cmp   w1, w19
>        │     ↓ b.ls  a0
>        │ 30:   mov   w1, w1
>        │       sxtw  x0, w19
>        │       cmp   x0, x1
>        │       ngc   x0, xzr
>        │       csdb
>        │       ldr   x1, [x21, #160]
>        │       and   w19, w19, w0
>  93.98 │       ldr   x19, [x1, w19, sxtw #3]
> 
> and accounts for most of that 66% of the total cost of the micro bench,
> even though it's doing a ton more stuff than simply getting this node
> via a lookup.
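
For reference, the cmp/ngc/csdb/and sequence in the profile above is the
arm64 form of array_index_nospec(): build an all-ones or all-zero mask from
the bounds check, then AND it into the index before the dependent load. Below
is a minimal userspace sketch of that masking, modeled on the kernel's
generic array_index_mask_nospec() fallback. The function names are made up
for illustration, there is no speculation barrier here (so it shows the
arithmetic, not the mitigation), and it assumes an arithmetic right shift of
negative values, which gcc and clang provide.

```c
#include <assert.h>
#include <limits.h>

/*
 * Illustrative model of the generic array_index_mask_nospec() formula.
 * Returns ~0UL when index < size and 0 otherwise, without a branch.
 * Like the kernel version, it is only valid for values <= LONG_MAX.
 */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	assert(index <= (unsigned long)LONG_MAX);
	assert(size <= (unsigned long)LONG_MAX);

	/*
	 * If index >= size, (size - 1 - index) wraps and sets the top bit.
	 * Inverting and shifting the sign bit across the word then yields
	 * 0 for out-of-bounds and all ones for in-bounds indexes.
	 */
	return ~(long)(index | (size - 1UL - index)) >>
		(sizeof(long) * CHAR_BIT - 1);
}

/* Hypothetical helper: clamp an index to 0 when it is out of bounds. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}
```

The load that uses the clamped index depends on the mask, which is why the
ldr right after the csdb is where the profile attributes the cost.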

I ran some x86-64 testing, and there's no such effect there, so the cache
is mostly useful on archs with a more expensive array_index_nospec().
There's obviously still a cost associated with the nospec indexing on
x86-64, but it's more of an even trade-off between the extra branch and
the nospec indexing. At that point you may as well not add the extra
cache: this particular test always hits it, so it's a best-case scenario.
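
To make the trade-off concrete, here is a rough userspace sketch of a
last-lookup cache in front of a masked table lookup. This is not the code
from the patch: the struct and function names are hypothetical, the mask
helper is a plain stand-in for array_index_nospec() with no speculation
protection, and clearing the cache on removal is an assumption about how
invalidation would work.

```c
#include <assert.h>
#include <stddef.h>

struct node {			/* hypothetical stand-in for a resource node */
	int id;
};

struct node_table {		/* hypothetical table with a one-entry cache */
	struct node **nodes;
	unsigned int nr;
	unsigned int last_index;	/* index of the cached node */
	struct node *last_node;		/* NULL when the cache is empty */
};

/* Stand-in for array_index_nospec(): all ones if in bounds, else 0. */
static unsigned int index_mask(unsigned int index, unsigned int size)
{
	return 0u - (unsigned int)(index < size);
}

static struct node *table_lookup(struct node_table *t, unsigned int index)
{
	struct node *n;

	assert(t != NULL);

	/* Fast path: one extra compare and branch, no masked load. */
	if (t->last_node && t->last_index == index)
		return t->last_node;

	if (index >= t->nr)
		return NULL;

	/* Slow path: clamp the index, then do the dependent load. */
	index &= index_mask(index, t->nr);
	n = t->nodes[index];
	if (n) {
		t->last_index = index;
		t->last_node = n;
	}
	return n;
}

/* Assumed invalidation rule: drop the cache if it points at the slot. */
static void table_remove(struct node_table *t, unsigned int index)
{
	if (index >= t->nr)
		return;
	if (t->last_node == t->nodes[index])
		t->last_node = NULL;
	t->nodes[index] = NULL;
}
```

A benchmark that looks up the same index in a loop only ever takes the fast
path, which is why it's a best-case test for the cache.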

-- 
Jens Axboe
