Re: [RFC] Programming model for io_uring + eBPF

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Christian Dietrich <[email protected]>
To: Pavel Begunkov <[email protected]>,
	io-uring <[email protected]>
Cc: Horst Schirmeier <[email protected]>,
	"Franz-B. Tuneke" <[email protected]>
Subject: Re: [RFC] Programming model for io_uring + eBPF
Date: Wed, 05 May 2021 14:57:48 +0200	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

Pavel Begunkov <[email protected]> [01. May 2021]:

> No need to register maps, it's passed as an fd. Looks I have never
> posted any example, but you can even compile fds in in bpf programs
> at runtime.
>
> fd = ...;
> bpf_prog[] = { ..., READ_MAP(fd), ...};
> compile_bpf(bpf_prog);
>
> I was thinking about the opposite, to allow to optionally register
> them to save on atomic ops. Not in the next patch iteration though

That was exactly my idea:  make it possible to register them
optionally.

I also think that registering the eBPF program FD could be made
optionally, since this would be similar to other FD references that are
submitted to io_uring.

> Yes, at least that's how I envision it. But that's a good question
> what else we want to pass. E.g. some arbitrary ids (u64), that's
> forwarded to the program, it can be a pointer to somewhere or just
> a piece of current state.

So, we will have the regular user_data attribute there. So why add
another potential argument to the program, besides passing an eBPF map?

>> One thought that came to my mind: Why do we have to register the eBPF
>> programs and maps? We could also just pass the FDs for those objects in
>> the SQE. As long as there is no other state, it could be the userspaces
>> choice to either attach it or pass it every time. For other FDs we
>> already support both modes, right?
>
> It doesn't register maps (see above). For not registering/attaching in
> advance -- interesting idea, I need it double check with bpf guys.

My feeling would be the following: In any case, at submission we have to
resolve the SQE references to a 'bpf_prog *', if this is done via a
registered program in io_ring_ctx or via calling 'bpf_prog_get_type' is
only a matter of the frontend.

The only difference would be the reference couting. For registered
bpf_progs, we can increment the refcounter only once. For unregistered,
we would have to increment it for every SQE. Don't know if the added
flexibility is worth the effort.

>> If we make it possible to pass some FD to an synchronization object
>> (e.g. semaphore), this might do the trick to support both modes at the interface.

I was thinking further about this. And, really I could not come up with
a proper user-space visible 'object' that we can grab with an FD to
attach this semantic to as futexes come in form of a piece of memory.

So perhaps, we would do something like

    // alloc 3 groups
    io_uring_register(fd, REGISTER_SYNCHRONIZATION_GROUPS, 3);

    // submit a synchronized SQE
    sqe->flags |= IOSQE_SYNCHRONIZE;
    sqe->synchronize_group = 1;     // could probably be restricted to uint8_t.

When looking at this, this could generally be a nice feature to have
with SQEs, or? Hereby, the user could insert all of his SQEs and they
would run sequentially. In contrast to SQE linking, the order of SQEs
would not be determined, which might be beneficial at some point.

>> - FD to synchronization object during the execution
>
> can be in the context, in theory. We need to check available
> synchronisation means

In the context you mean: there could be a pointer to user-memory and the
bpf program calls futex_wait?

> The idea is to add several CQs to a single ring, regardless of BPF,
> and for any request of any type use specify sqe->cq_idx, and CQE
> goes there. And then BPF can benefit from it, sending its requests
> to the right CQ, not necessarily to a single one.
> It will be the responsibility of the userspace to make use of CQs
> right, e.g. allocating CQ per BPF program and so avoiding any sync,
> or do something more complex cross posting to several CQs.

Would this be done at creation time or by registering them? I think that
creation time would be sufficient and it would ease the housekeeping. In
that case, we could get away with making all CQs equal in size and only
read out the number of the additional CQs from the io_uring_params.

When adding a sqe->cq_idx: Do we have to introduce an IOSQE_SELET_OTHER
flag to not break userspace for backwards compatibility or is it
sufficient to assume that the user has zeroed the SQE beforehand?

However, I like the idea of making it a generic field of SQEs.

>> When taking a step back, this is nearly a io_uring_enter(minwait=N)-SQE
>
> Think of it as a separate request type (won't be a separate, but still...)
> waiting on CQ, similarly to io_uring_enter(minwait=N) but doesn't block.
>
>> with an attached eBPF callback, or? At that point, we are nearly full
>> circle.
>
> What do you mean?

I was just joking: With a minwait=N-like field in the eBPF-SQE, we can
mimic the behavior of io_uring_enter(minwait=N) but with an SQE. On the
result of that eBPF-SQE we can now actually wait with io_uring_enter(..)

chris
-- 
Prof. Dr.-Ing. Christian Dietrich
Operating System Group (E-EXK4)
Technische Universität Hamburg
Am Schwarzenberg-Campus 3 (E)
21073 Hamburg

eMail:  [email protected]
Tel:    +49 40 42878 2188
WWW:    https://osg.tuhh.de/

next prev parent reply	other threads:[~2021-05-05 12:57 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <[email protected]>
     [not found] ` <[email protected]>
     [not found]   ` <[email protected]>
2021-04-16 15:49     ` [RFC] Programming model for io_uring + eBPF Pavel Begunkov
2021-04-20 16:35       ` Christian Dietrich
2021-04-23 15:34         ` Pavel Begunkov
2021-04-29 13:27           ` Christian Dietrich
2021-05-01  9:49             ` Pavel Begunkov
2021-05-05 12:57               ` Christian Dietrich [this message]
2021-05-05 16:13                 ` Christian Dietrich
2021-05-07 15:13                   ` Pavel Begunkov
2021-05-12 11:20                     ` Christian Dietrich
2021-05-18 14:39                       ` Pavel Begunkov
2021-05-19 16:55                         ` Christian Dietrich
2021-05-20 11:14                           ` Pavel Begunkov
2021-05-20 15:01                             ` Christian Dietrich
2021-05-21 10:27                               ` Pavel Begunkov
2021-05-27 11:12                                 ` Christian Dietrich
2021-06-02 10:47                                   ` Pavel Begunkov
2021-05-07 15:10                 ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox