public inbox for [email protected]
From: Pavel Begunkov <[email protected]>
To: Andres Freund <[email protected]>
Cc: [email protected], Jens Axboe <[email protected]>,
	Thomas Gleixner <[email protected]>,
	Ingo Molnar <[email protected]>,
	Peter Zijlstra <[email protected]>,
	Darren Hart <[email protected]>,
	Davidlohr Bueso <[email protected]>,
	[email protected]
Subject: Re: [RFC 0/4] futex request support
Date: Fri, 4 Jun 2021 16:26:21 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 6/3/21 7:59 PM, Andres Freund wrote:
> Hi,
> 
> On 2021-06-01 15:58:25 +0100, Pavel Begunkov wrote:
>> Should be interesting for a bunch of people, so we should first
>> outline API and capabilities it should give. As I almost never
>> had to deal with futexes myself, would especially love to hear
>> use case, what might be lacking and other blind spots.
> 
> I did chat with Jens about how useful futex support would be in io_uring, so I
> should outline our / my needs. I'm off work this week though, so I don't think
> I'll have much time to experiment.
> 
> For postgres's AIO support (which I am working on) there are two, largely
> independent, use-cases for desiring futex support in io_uring.
> 
> The first is the ability to wait for locks (queued r/w locks, blocking
> implemented via futexes) and IO at the same time, within one task. Quickly and
> efficiently processing IO completions can improve whole-system latency and
> throughput substantially in some cases (journalling, indexes and other
> high-contention areas - which often have a low queue depth). This is true
> *especially* when there also is lock contention, which tends to make efficient
> IO scheduling harder.

Can you point me to where futexes are used in the postgres code?
I can't find them in master, but I'd like to see which futex
operations are used and how.

> The second use case is the ability to efficiently wait in several tasks for
> one IO to be processed. The prototypical example here is group commit/journal
> flush, where each task can only continue once the journal flush has
> completed. Typically one of the waiters has to do a small amount of work with the
> completion (updating a few shared memory variables) before the other waiters
> can be released. It is hard to implement this efficiently and race-free with
> io_uring right now without adding locking around *waiting* on the completion
> side (instead of just consumption of completions). One cannot just wait on the
> io_uring, because of a) the obvious race that another process could reap all
> completions between check and wait b) there is no good way to wake up other
> waiters once the userspace portion of IO completion is through.

IIRC, the idea is to have a link "I/O -> fut_wake(master_task or nr=1)",
and then after getting some work done the woken task does wake(nr=all),
presumably via sys_futex or io_uring. Is that right?
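
A minimal sketch of the "woken task does wake(nr=all)" half of that
chain, written against the existing futex(2) syscall rather than the
io_uring requests from this RFC. The generation counter and the helper
names (futex_op, flush_wait, flush_publish) are hypothetical and only
for illustration; the initial single-waiter wake is assumed to have
already reached the leader by the time it calls flush_publish().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around futex(2) for process-private WAIT/WAKE. */
static long futex_op(_Atomic uint32_t *addr, int op, uint32_t val)
{
    return syscall(SYS_futex, addr, op | FUTEX_PRIVATE_FLAG,
                   (unsigned long)val, NULL, NULL, 0UL);
}

/*
 * Follower: block until the flush generation moves past `seen`.
 * FUTEX_WAIT only sleeps if *gen still equals `seen`, so a publish
 * that lands between the load and the syscall is not lost; EAGAIN
 * and EINTR simply loop back to the re-check.
 */
static uint32_t flush_wait(_Atomic uint32_t *gen, uint32_t seen)
{
    uint32_t cur;

    while ((cur = atomic_load(gen)) == seen)
        futex_op(gen, FUTEX_WAIT, seen);
    return cur;
}

/*
 * Leader: call after the userspace part of the completion is done
 * (the "few shared memory variables" are updated). Bumps the
 * generation and wakes every waiter. Returns the number woken.
 */
static long flush_publish(_Atomic uint32_t *gen)
{
    atomic_fetch_add(gen, 1);
    return futex_op(gen, FUTEX_WAKE, INT_MAX);
}
```

Each follower would record the generation before queueing behind the
flush and then call flush_wait() with that value; the counter is what
makes the check-then-sleep step race-free.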

Since userspace can't modify the memory on which the futex sits with
this option, the wake in the patchset can do an atomic add, similar
to FUTEX_WAKE_OP. However, I still have general concerns about
whether that's flexible enough.
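
For comparison, this is roughly what a wake combined with an atomic add
looks like with the existing FUTEX_WAKE_OP syscall operation. It is only
a sketch of the syscall side, not of the io_uring request in the
patchset; the helper name futex_wake_and_add and its parameters are made
up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * In one syscall: atomically add `delta` to *counter, then wake up to
 * `nr_wake` waiters sleeping on `wake_addr`. `delta` must fit in the
 * 12-bit signed operand field of the FUTEX_OP() encoding.
 *
 * FUTEX_WAKE_OP can also wake up to val2 waiters on `counter` when
 * the comparison against the old value holds; val2 is 0 here, so that
 * second wake never wakes anyone and the comparison is irrelevant.
 */
static long futex_wake_and_add(uint32_t *wake_addr, int nr_wake,
                               uint32_t *counter, int delta)
{
    int op = FUTEX_OP(FUTEX_OP_ADD, delta, FUTEX_OP_CMP_EQ, 0);

    /* val2 travels in the slot normally used for the timeout. */
    return syscall(SYS_futex, wake_addr,
                   FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG,
                   (unsigned long)nr_wake, 0UL /* val2 */,
                   counter, (unsigned long)op);
}
```

The return value is the number of tasks woken, so with nobody sleeping
on wake_addr the call returns 0 and still applies the add.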

Once io_uring-BPF is added, this can probably be offloaded to BPF
programs, together with "updating a few shared memory variables",
but these are just thoughts for the future.

> All answers for postgres:
> 
>> 1) Do we need PI?
> 
> Not right now.
> 
> Not related to io_uring: I do wish there were a lower overhead (and lower
> guarantees) version of PI futexes. Not for correctness reasons, but
> performance. Granting the waiter's timeslice to the lock holder would improve
> common contention scenarios with more runnable tasks than cores.
> 
> 
>> 2) Do we need requeue? Anything else?
> 
> I can see requeue being useful, but I haven't thought it through fully.
> 
> Do the wake/wait ops as you have them right now support bitsets?

No, but that's trivial to add.
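
For reference, the bitset variants as they exist in futex(2) today look
roughly like this. This is a sketch of the syscall operations only, not
of how the io_uring requests would expose them; the wrapper names are
hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Sleep on `addr` while it still holds `expected`, tagging the waiter
 * with `mask`. A NULL timeout means wait indefinitely. Fails with
 * EAGAIN if *addr != expected, and with EINVAL if mask is 0.
 */
static long futex_wait_bitset(uint32_t *addr, uint32_t expected,
                              uint32_t mask)
{
    return syscall(SYS_futex, addr,
                   FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG,
                   (unsigned long)expected, NULL, NULL,
                   (unsigned long)mask);
}

/*
 * Wake up to `nr` waiters on `addr` whose wait mask shares at least
 * one bit with `mask`. Returns the number of waiters woken.
 */
static long futex_wake_bitset(uint32_t *addr, int nr, uint32_t mask)
{
    return syscall(SYS_futex, addr,
                   FUTEX_WAKE_BITSET | FUTEX_PRIVATE_FLAG,
                   (unsigned long)nr, NULL, NULL,
                   (unsigned long)mask);
}
```

Readers and writers of a queued r/w lock could, for example, wait with
different bits so that a wake can target one class of waiter on the
same futex word.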

>> 3) How hot are the waits? They may be done fully async, avoiding io-wq,
>> but that apparently requires more changes in the futex code.
> 
> The waits can be quite hot, most prominently on low latency storage, but not
> just.

Thanks Andres, that clears it up. The next step would be to verify
that FUTEX_WAKE_OP-style waking is enough.

-- 
Pavel Begunkov
