From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>,
David Wei <[email protected]>,
[email protected]
Subject: Re: [PATCH next v1 2/2] io_uring: limit local tw done
Date: Fri, 22 Nov 2024 10:08:59 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 11/22/24 10:01 AM, Pavel Begunkov wrote:
> On 11/21/24 17:05, Jens Axboe wrote:
>> On 11/21/24 9:57 AM, Jens Axboe wrote:
>>> I did run a basic IRQ storage test as-is, and will compare that with the
>>> llist stuff we have now. Just in terms of overhead. It's not quite a
>>> networking test, but you do get the IRQ side and some burstiness in
>>> terms of completions that way too, at high rates. So should be roughly
>>> comparable.
>>
>> Perf looks comparable, it's about 60M IOPS. Some fluctuation with IRQ
>
> 60M with iopoll? That one normally shouldn't use task_work

Maybe that wasn't clear, but it's IRQ driven IO. Otherwise indeed
there'd be no task_work in use.
>> driven, so won't render an opinion on whether one is faster than the
>> other. What is visible though is that adding and running local task_work
>> drops from 2.39% to 2.02% using spinlock + io_wq_work_list over llist,
>
> Did you sum it up with io_req_local_work_add()? It just sounds a bit
> weird since it's usually run off [soft]irq. I have doubts that part
> became faster. Running could be, especially with a high QD and the
> consistency of an SSD. Btw, what QD was it? 32?

It may just trigger more frequently in the profiles now that the list
reversal is gone. Profiling isn't 100% exact.
>> and we entirely drop 2.2% of list reversing in the process.
>
> We actually discussed it before but in some different patchset,
> perf is not helpful much here, the overhead and cache loading
> moves around a lot between functions.
>
> I don't think we have a solid proof here, especially for networking
> workloads, which tend to hammer it more from more CPUs. Can we run
> some net benchmarks? Even better to do a good prod experiment.

Already in motion. I ran some here and they didn't show any differences
at all, but the task_work load was also fairly light. David is running
the networking side and we'll see what that says.

I don't particularly love list + lock for this, but at the end of the
day, the only real downside is the irq disabling nature of it.
Everything else is both simpler, and avoids the really annoying LIFO
nature of llist. I'd expect, all things being equal, that list + lock is
going to be ever so slightly slower. Both will bounce the list
cacheline, no difference in cost on that side. But when you add list
reversal to the mix, that's going to push it to being an overall win.
--
Jens Axboe