Re: [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>,
	[email protected]
Cc: io-uring <[email protected]>,
	[email protected], [email protected]
Subject: Re: [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF
Date: Fri, 31 Jan 2020 14:30:06 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 1/24/20 7:18 AM, Pavel Begunkov wrote:
> Apart from concurrent IO execution, io_uring allows to issue a sequence
> of operations, a.k.a links, where requests are executed sequentially one
> after another. If an "error" happened, the rest of the link will be
> cancelled.
> 
> The problem is what to consider an "error". For example, if we
> read less bytes than have been asked for, the link will be cancelled.
> It's necessary to play safe here, but this implies a lot of overhead if
> that isn't the desired behaviour. The user would need to reap all
> cancelled requests, analyse the state, resubmit them and suffer from
> context switches and all in-kernel preparation work. And there are
> dozens of possibly desirable patterns, so it's just not viable to
> hard-code them into the kernel.
> 
> The other problem is to keep in running even when a request depends on
> a result of the previous one. It could be simple passing return code or
> something more fancy, like reading from the userspace.
> 
> And that's where BPF will be extremely useful. It will control the flow
> and do steering.
> 
> The concept is to be able run a BPF program after a request's
> completion, taking the request's state, and doing some of the following:
> 1. drop a link/request
> 2. issue new requests
> 3. link/unlink requests
> 4. do fast calculations / accumulate data
> 5. emit information to the userspace (e.g. via ring's CQ)
> 
> With that, it will be possible to have almost context-switch-less IO,
> and that's really tempting considering how fast current devices are.
> 
> What to discuss:
> 1. use cases
> 2. control flow for non-privileged users (e.g. allowing some popular
>    pre-registered patterns)
> 3. what input the program needs (e.g. last request's
>    io_uring_cqe) and how to pass it.
> 4. whether we need notification via CQ for each cancelled/requested
>    request, because sometimes they only add noise
> 5. BPF access to user data (e.g. allow to read only registered buffers)
> 6. implementation details. E.g.
>    - how to ask to run BPF (e.g. with a new opcode)
>    - having global BPF, bound to an io_uring instance or mixed
>    - program state and how to register
>    - rework notion of draining and sequencing
>    - live-lock avoidance (e.g. double check io_uring shut-down code)

I think this is a key topic that we should absolutely discuss at LSFMM.

-- 
Jens Axboe

     prev parent reply	other threads:[~2020-01-31 21:32 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-24 14:18 [LSF/MM/BPF TOPIC] programmable IO control flow with io_uring and BPF Pavel Begunkov
2020-01-31 21:30 ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox