[LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF

From: Pavel Begunkov <[email protected]>
To: [email protected]
Cc: Jens Axboe <[email protected]>, io-uring <[email protected]>,
	[email protected], [email protected]
Subject: [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF
Date: Fri, 24 Jan 2020 17:18:51 +0300
Message-ID: <[email protected]>

Apart from concurrent IO execution, io_uring allows issuing a sequence
of operations, a.k.a. links, where requests are executed sequentially, one
after another. If an "error" happens, the rest of the link is
cancelled.

The problem is what to consider an "error". For example, if we
read fewer bytes than were asked for, the link will be cancelled.
It's necessary to play it safe here, but this implies a lot of overhead if
that isn't the desired behaviour. The user would need to reap all
cancelled requests, analyse the state, resubmit them and suffer from
context switches and all the in-kernel preparation work. And there are
dozens of possibly desirable patterns, so it's just not viable to
hard-code them into the kernel.
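
To make the short-read case concrete, here is a toy userspace model in
plain C. It makes no io_uring calls, and every name in it (toy_req,
toy_run_link) is made up for illustration. It only mimics the rule
described above: the first request that fails or comes up short breaks
the link, and the remaining requests complete with -ECANCELED.

```c
#include <errno.h>
#include <stddef.h>

/* Toy model of one linked request; NOT a real io_uring structure. */
struct toy_req {
	int wanted;	/* bytes asked for */
	int result;	/* what the simulated device returns: bytes or -errno */
	int cqe_res;	/* what userspace would see in the completion */
};

/*
 * Run the requests in order. The first request that fails or returns
 * fewer bytes than asked for breaks the link; every request after it
 * completes with -ECANCELED without being executed.
 * Returns the number of requests that were actually executed.
 */
static size_t toy_run_link(struct toy_req *reqs, size_t n)
{
	size_t i, executed = 0;
	int broken = 0;

	for (i = 0; i < n; i++) {
		if (broken) {
			reqs[i].cqe_res = -ECANCELED;
			continue;
		}
		reqs[i].cqe_res = reqs[i].result;
		executed++;
		if (reqs[i].result < 0 || reqs[i].result < reqs[i].wanted)
			broken = 1;
	}
	return executed;
}
```

With three 4096-byte reads where the second one returns only 1024
bytes, two requests execute and the third is cancelled, so userspace
has to reap it and resubmit.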

The other problem is to keep it running even when a request depends on
the result of the previous one. It could be as simple as passing a return
code, or something fancier, like reading from userspace.

And that's where BPF will be extremely useful. It will control the flow
and do steering.

The concept is to be able to run a BPF program after a request's
completion; it takes the request's state and does some of the following:
1. drop a link/request
2. issue new requests
3. link/unlink requests
4. do fast calculations / accumulate data
5. emit information to userspace (e.g. via the ring's CQ)
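
As a rough illustration of the steering idea, the sketch below models a
per-completion program as an ordinary C callback. It is not BPF and not
a kernel interface; the verdict enum, the context struct and all toy_*
names are assumptions invented for this example. The policy shown treats
a short read as "reissue for the remainder" instead of an error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical verdicts a completion program could return. */
enum toy_verdict {
	TOY_CONTINUE,	/* request is done, move on in the link */
	TOY_REISSUE,	/* issue the request again for the remainder */
	TOY_DROP_LINK,	/* cancel the rest of the link */
};

/* Hypothetical view of a completed request handed to the program. */
struct toy_ctx {
	int wanted;	/* bytes outstanding before this completion */
	int res;	/* completion result: bytes done or -errno */
};

typedef enum toy_verdict (*toy_prog_t)(const struct toy_ctx *ctx);

/* Example policy: a short read is not an error, just read the rest. */
static enum toy_verdict toy_retry_short(const struct toy_ctx *ctx)
{
	if (ctx->res <= 0)		/* real error or EOF */
		return TOY_DROP_LINK;
	if (ctx->res < ctx->wanted)	/* short read */
		return TOY_REISSUE;
	return TOY_CONTINUE;
}

/*
 * Simulated device that returns at most `chunk` bytes per request.
 * Drives one read of `wanted` bytes under the given policy, counting
 * reissues in *reissues. Returns total bytes read, or -ECANCELED if
 * the program dropped the link.
 */
static int toy_drive(int wanted, int chunk, toy_prog_t prog, int *reissues)
{
	int done = 0;

	for (;;) {
		struct toy_ctx ctx = { wanted - done, 0 };

		ctx.res = ctx.wanted < chunk ? ctx.wanted : chunk;
		if (ctx.res > 0)
			done += ctx.res;

		switch (prog(&ctx)) {
		case TOY_CONTINUE:
			return done;
		case TOY_REISSUE:
			(*reissues)++;
			break;
		case TOY_DROP_LINK:
		default:
			return -ECANCELED;
		}
	}
}
```

For a 4096-byte read against a device that returns 1024 bytes at a
time, toy_drive(4096, 1024, toy_retry_short, &n) finishes the read with
three reissues, none of which would need a trip back to userspace in
the proposed scheme.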

With that, it will be possible to have almost context-switch-less IO,
and that's really tempting considering how fast current devices are.

What to discuss:
1. use cases
2. control flow for non-privileged users (e.g. allowing some popular
   pre-registered patterns)
3. what input the program needs (e.g. last request's
   io_uring_cqe) and how to pass it.
4. whether we need notification via CQ for each cancelled/requested
   request, because sometimes they only add noise
5. BPF access to user data (e.g. allow reading only registered buffers)
6. implementation details. E.g.
   - how to ask to run BPF (e.g. with a new opcode)
   - having a global BPF program, one bound to an io_uring instance, or a mix
   - program state and how to register
   - rework notion of draining and sequencing
   - live-lock avoidance (e.g. double check io_uring shut-down code)

-- 
Pavel Begunkov
