From: Andres Freund <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: Buffered IO async context overhead
Date: Mon, 24 Feb 2020 01:35:44 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
Hi,
On 2020-02-14 13:49:31 -0700, Jens Axboe wrote:
> [description of buffered write workloads being slower via io_uring
> than plain writes]
> Because I'm working on other items, I didn't read carefully enough. Yes,
> this won't change the situation for writes. I'll take a look at this
> when I get time; maybe there's something we can do to improve the
> situation.
I looked a bit into this.
I think one issue is the spinning the workers do:
static int io_wqe_worker(void *data)
{
	...
	while (!test_bit(IO_WQ_BIT_EXIT, &wq->state)) {
		set_current_state(TASK_INTERRUPTIBLE);
loop:
		if (did_work)
			io_worker_spin_for_work(wqe);
		spin_lock_irq(&wqe->lock);
		if (io_wqe_run_queue(wqe)) {
		...

static inline void io_worker_spin_for_work(struct io_wqe *wqe)
{
	int i = 0;

	while (++i < 1000) {
		if (io_wqe_run_queue(wqe))
			break;
		if (need_resched())
			break;
		cpu_relax();
	}
}
Even with the cpu_relax(), that spinning causes quite a lot of
cross-socket traffic, slowing down the submission side - which, after
all, frequently needs to take wqe->lock just to be able to submit a
queue entry.
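If the spinning is worth keeping at all, one way to reduce that traffic
might be to peek at the queue less often while spinning. A hypothetical
variant, not something I've benchmarked:

static inline void io_worker_spin_for_work(struct io_wqe *wqe)
{
	int i = 0;

	while (++i < 1000) {
		/*
		 * Only peek at the queue every 64 spins, so we pull the
		 * shared cacheline away from the submitting side less
		 * often.
		 */
		if (!(i & 63) && io_wqe_run_queue(wqe))
			break;
		if (need_resched())
			break;
		cpu_relax();
	}
}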
lock, work_list and flags all reside in one cacheline, so it's pretty
likely that a single io_wqe_enqueue() would get the cacheline "stolen"
several times during one enqueue - without allowing the worker to make
any progress, of course.
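For reference, the relevant part of struct io_wqe looks roughly like
this (paraphrased from memory of the current fs/io-wq.c, so the details
may be off):

struct io_wqe {
	struct {
		spinlock_t lock;
		struct io_wq_work_list work_list;
		unsigned long hash_map;
		unsigned flags;
	} ____cacheline_aligned_in_smp;	/* all on one cacheline */

	/* ... per-node worker bookkeeping, wq backpointer, etc. ... */
};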
I also wonder if we can't avoid dequeuing entries one-by-one within the
worker, at least for the IO_WQ_WORK_HASHED case. Especially when writes
just hit the page cache they're pretty fast, so it's plausible for them
to cause pretty bad contention on the spinlock (even without the
spinning above). And whereas the submission side is at least somewhat
likely to be able to submit several queue entries per lock acquisition
while a worker is processing a job, that kind of amortization is pretty
unlikely for the workers themselves.
In the hashed case there shouldn't be another worker processing entries
for the same hash. So it seems quite possible for the worker to drain a
few of the entries for that hash within one spinlock acquisition, and
then process them one-by-one?
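Something like the sketch below is what I have in mind. It's purely
illustrative - the doubly-linked list and the work_hash() /
io_run_work() helpers are invented here and don't match the actual
io-wq internals:

/*
 * Drain all currently queued entries for one hash under a single
 * wqe->lock acquisition, then run them without re-taking the lock
 * between jobs.
 */
static void io_worker_handle_hashed(struct io_wqe *wqe, unsigned int hash)
{
	struct io_wq_work *work, *tmp;
	LIST_HEAD(batch);

	spin_lock_irq(&wqe->lock);
	list_for_each_entry_safe(work, tmp, &wqe->work_list, list) {
		if (work_hash(work) == hash)	/* invented helper */
			list_move_tail(&work->list, &batch);
	}
	spin_unlock_irq(&wqe->lock);

	/*
	 * No other worker processes this hash, so running the batch in
	 * list order preserves the required ordering.
	 */
	list_for_each_entry_safe(work, tmp, &batch, list) {
		list_del(&work->list);
		io_run_work(work);	/* stand-in for the real dispatch */
	}
}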
Greetings,
Andres Freund