From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, Andres Freund <[email protected]>
Cc: [email protected]
Subject: Re: Buffered IO async context overhead
Date: Mon, 9 Mar 2020 23:03:52 +0300 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 24/02/2020 18:22, Jens Axboe wrote:
> On 2/24/20 2:35 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2020-02-14 13:49:31 -0700, Jens Axboe wrote:
>>> [description of buffered write workloads being slower via io_uring
>>> than plain writes]
>>> Because I'm working on other items, I didn't read carefully enough. Yes
>>> this won't change the situation for writes. I'll take a look at this when
>>> I get time, maybe there's something we can do to improve the situation.
>>
>> I looked a bit into this.
>>
>> I think one issue is the spinning the workers do:
>>
>> static int io_wqe_worker(void *data)
>> {
>>
>> while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
>> set_current_state(TASK_INTERRUPTIBLE);
>> loop:
>> if (did_work)
>> io_worker_spin_for_work(wqe);
>> spin_lock_irq(&wqe->lock);
>> if (io_wqe_run_queue(wqe)) {
>>
>> static inline void io_worker_spin_for_work(struct io_wqe *wqe)
>> {
>> int i = 0;
>>
>> while (++i < 1000) {
>> if (io_wqe_run_queue(wqe))
>> break;
>> if (need_resched())
>> break;
>> cpu_relax();
>> }
>> }
>>
>> even with the cpu_relax(), that causes quite a lot of cross socket
>> traffic, slowing down the submission side. Which after all frequently
>> needs to take the wqe->lock, just to be able to submit a queue
>> entry.
>>
>> lock, work_list, flags all reside in one cacheline, so it's pretty
>> likely that a single io_wqe_enqueue would get the cacheline "stolen"
>> several times during one enqueue - without allowing any progress in the
>> worker, of course.
>
> Since it's provably harmful for this case, and the gain was small (but
> noticeable) on single issue cases, I think we should just kill it. With
> the poll retry stuff for 5.7, there'll be even less of a need for it.
>
> Care to send a patch for 5.6 to kill it?
>
>> I also wonder if we can't avoid dequeuing entries one-by-one within the
>> worker, at least for the IO_WQ_WORK_HASHED case. Especially when writes
>> are just hitting the page cache, they're pretty fast, making it
>> plausible to cause pretty bad contention on the spinlock (even without
>> the spining above). Whereas the submission side is at least somewhat
>> likely to be able to submit several queue entries while the worker is
>> processing one job, that's pretty unlikely for workers.
>>
>> In the hashed case there shouldn't be another worker processing entries
>> for the same hash. So it seems quite possible for the wqe to drain a few
>> of the entries for that hash within one spinlock acquisition, and then
>> process them one-by-one?
>
> Yeah, I think that'd be a good optimization for hashed work. Work N+1
> can't make any progress before work N is done anyway, so might as well
> grab a batch at the time.
>
A problem here is that we actually have a 2D array of works because of linked
requests.
We can io_wqe_enqueue() dependant works, if have hashed requests, so delegating
it to other threads. But if the work->list is not per-core, it will hurt
locality. Either re-enqueue hashed ones if there is a dependant work. Need to
think how to do better.
--
Pavel Begunkov
next prev parent reply other threads:[~2020-03-09 20:04 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-02-14 19:50 Buffered IO async context overhead Andres Freund
2020-02-14 20:13 ` Jens Axboe
2020-02-14 20:31 ` Andres Freund
2020-02-14 20:49 ` Jens Axboe
2020-02-24 9:35 ` Andres Freund
2020-02-24 15:22 ` Jens Axboe
2020-03-09 20:03 ` Pavel Begunkov [this message]
2020-03-09 20:41 ` Jens Axboe
2020-03-09 21:02 ` Pavel Begunkov
2020-03-09 21:29 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox