public inbox for [email protected]
From: Jens Axboe <[email protected]>
To: [email protected], [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: [PATCHSET v6] Add io_uring futex/futexv support
Date: Thu, 28 Sep 2023 11:25:09 -0600
Message-ID: <[email protected]>

Hi,

This patchset first adds support for futex wake and wait, and then for
futexv.

For wait, wake and waitv alike, we support the bitset variant, as the
"normal" variants can easily be implemented on top of that.

PI and requeue are not supported through io_uring, only the operations
mentioned above. This may change in the future, but in the spirit
of keeping this small (and based on what people have been asking for),
this is what we currently have.

When I wrote these patches, I forgot that Pavel had previously posted a
futex variant for io_uring. The main thing that had held me back when
people asked about futexes and io_uring was that I wanted to do this
what I consider the right way: no use of io-wq or thread offload, but
an actually async implementation that is efficient to use and doesn't
rely on a blocking thread for futex wait/waitv. This is what this
patchset attempts to do, while being minimally invasive on the futex
side. I believe the diffstat reflects that.

As far as I can recall, the first request for futex support with
io_uring came from Andres Freund, working on postgres. His aio rework
of postgres was one of the early adopters of io_uring, and futex
support was a natural extension for that. This is relevant from a
usability point of view as well as for efficiency and performance.
In Andres's words, on the former:

"Futex wait support in io_uring makes it a lot easier to avoid deadlocks
in concurrent programs that have their own buffer pool: Obviously pages in
the application buffer pool have to be locked during IO. If the initiator
of IO A needs to wait for a held lock B, the holder of lock B might wait
for the IO A to complete.  The ability to wait for a lock and IO
completions at the same time provides an efficient way to avoid such
deadlocks."

and in terms of efficiency, even without unlocking the full potential yet,
Andres says:

"Futex wake support in io_uring is useful because it allows for more
efficient directed wakeups.  For some "locks" postgres has queues
implemented in userspace, with wakeup logic that cannot easily be
implemented with FUTEX_WAKE_BITSET on a single "futex word" (imagine
waiting for journal flushes to have completed up to a certain point). Thus
a "lock release" sometimes needs to wake up many processes in a row.  A
quick-and-dirty conversion to doing these wakeups via io_uring led to a
3% throughput increase, with 12% fewer context switches, albeit in a
fairly extreme workload."

Some basic io_uring futex support and test cases are available in the
liburing 'futex' branch:

https://git.kernel.dk/cgit/liburing/log/?h=futex

The test cases exercise all of the variants. I originally wrote this code
about a month ago, Andres has been using it with postgres, and I'm not
aware of any bugs in it. That's not to say it's perfect, obviously,
and I welcome feedback so we can move this forward and hash out
any potential issues.

In terms of testing, there's a functionality and beat-up test case
in liburing, and I've run all the ltp futex test cases as well to
ensure we didn't inadvertently break anything. It's also been in
linux-next for a long time and I haven't heard any complaints.

 include/linux/io_uring_types.h |   5 +
 include/uapi/linux/io_uring.h  |   4 +
 io_uring/Makefile              |   1 +
 io_uring/cancel.c              |   5 +
 io_uring/cancel.h              |   4 +
 io_uring/futex.c               | 386 +++++++++++++++++++++++++++++++++
 io_uring/futex.h               |  36 +++
 io_uring/io_uring.c            |   7 +
 io_uring/opdef.c               |  34 +++
 kernel/futex/futex.h           |  20 ++
 kernel/futex/requeue.c         |   3 +-
 kernel/futex/syscalls.c        |  18 +-
 kernel/futex/waitwake.c        |  49 +++--
 13 files changed, 545 insertions(+), 27 deletions(-)

You can also find the code here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-futex

V6:
- Expand the two main commit messages describing command layout
- Fix double conversion of FUTEX2_* flags (Peter)
- Use FLAGS_STRICT for IORING_OP_FUTEX_WAKE, so we return 0 wakes
  when the caller asked for 0 (Peter)
- Fix issue with IORING_OP_FUTEX_WAITV and futex_wait_multiple_setup()
  doing futex_unqueue_multiple() if we had a wakeup while setting up.
- Clean up IORING_OP_FUTEX_WAITV issue path
- Don't use sqe->futex_flags for the FUTEX2_* flags, reserve it for
  future internal use.
- Add more liburing test cases.
- Rebase on current tree (for-6.7/io_uring + tip locking/core)

-- 
Jens Axboe



Thread overview: 11+ messages
2023-09-28 17:25 Jens Axboe [this message]
2023-09-28 17:25 ` [PATCH 1/8] futex: move FUTEX2_VALID_MASK to futex.h Jens Axboe
2023-09-28 17:25 ` [PATCH 2/8] futex: factor out the futex wake handling Jens Axboe
2023-09-28 17:25 ` [PATCH 3/8] futex: abstract out a __futex_wake_mark() helper Jens Axboe
2023-09-28 17:25 ` [PATCH 4/8] io_uring: add support for futex wake and wait Jens Axboe
2023-09-28 17:25 ` [PATCH 5/8] futex: add wake_data to struct futex_q Jens Axboe
2023-09-28 17:25 ` [PATCH 6/8] futex: make futex_parse_waitv() available as a helper Jens Axboe
2023-09-28 17:25 ` [PATCH 7/8] futex: make the vectored futex operations available Jens Axboe
2023-09-28 17:25 ` [PATCH 8/8] io_uring: add support for vectored futex waits Jens Axboe
2023-09-29  7:53 ` [PATCHSET v6] Add io_uring futex/futexv support Peter Zijlstra
2023-09-29  9:11   ` Jens Axboe
