Re: [RFC] Programming model for io_uring + eBPF

From: Pavel Begunkov <[email protected]>
To: Christian Dietrich <[email protected]>,
	io-uring <[email protected]>
Cc: Horst Schirmeier <[email protected]>,
	"Franz-B. Tuneke" <[email protected]>
Subject: Re: [RFC] Programming model for io_uring + eBPF
Date: Tue, 18 May 2021 15:39:31 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 5/12/21 12:20 PM, Christian Dietrich wrote:
> Pavel Begunkov <[email protected]> [07. May 2021]:
> 
>>> The following SQE would become: Append this SQE to the SQE-link chain
>>> with the name '1'. If the link chain has completed, start a new one.
>>> Thereby, the user could add an SQE to an existing link chain, even if other
>>> SQEs are already submitted.
>>>
>>>>     sqe->flags |= IOSQE_SYNCHRONIZE;
>>>>     sqe->synchronize_group = 1;     // could probably be restricted to uint8_t.
>>>
>>> Implementation wise, we would hold a pointer to the last element of the
>>> implicitly generated link chain.
>>
>> It will be in the common path hurting performance for those not using
>> it, and with no clear benefit that can't be implemented in userspace.
>> And io_uring is thin enough for all those extra ifs to affect end
>> performance.
>>
>> Let's consider if we run out of userspace options.
> 
> So, to summarize my proposal: I want io_uring to support implicit
> synchronization by sequentialization at submit time. Doing this would
> avoid the overheads of locking (and potentially sleeping).
> 
> So the problem that I see with a userspace solution is the following:
> If I want to sequentialize an SQE with another SQE that was submitted
> waaaaaay earlier, the usual IOSQE_IO_LINK cannot be used as I cannot set
> the link flag of that already submitted SQE. Therefore, I would have to
> wait in userspace for the CQE and submit my second SQE later on.
> 
> Especially if the goal is to remain in kernel space as long as possible
> via eBPF-SQEs, this is not optimal.
> 
>> Such things go really horribly with performant APIs such as io_uring,
>> even if not used. Just see IOSQE_IO_DRAIN: it may be almost never used,
>> but it is still in the hot path.
> 
> If we extend the semantics of IOSQE_IO_LINK instead of introducing a new
> flag, we should be able to limit the problem, right?
> 
> - With synchronize_group=0, the usual link-the-next-SQE semantics could
>   remain.
> - While synchronize_group!=0 could expose the described synchronization
>   semantics.
> 
> Thereby, the overhead is at least hidden behind the existing check for
> IOSQE_IO_LINK, which is there anyway. Do you consider IOSQE_IO_LINK=1
> part of the hot path?

Let's clarify in case I misunderstood you. In the snippet below, should
it serialise execution of sqe1 and sqe2, so they don't run
concurrently? Once a request is submitted we don't keep an explicit
reference to it, and trying to find it again is hard and unreliable, so
it would not really happen at "submission" time, but would require
additional locking:

1) either a request looks up its group on completion, but then
submission has to take one more spinlock to maintain e.g. a list for
each group.
2) or try to find a running request and append to its linked list,
but that won't work.
3) or do some other magic, but all options would be far from free.

If it shouldn't serialise in this case, then I don't see much
difference from IOSQE_IO_LINK.

prep_sqe1(group=1);
submit();
prep_sqe2(group=1);
submit();
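
For illustration only, here is a rough sketch of how the per-group
bookkeeping from option 1 might look if it were kept entirely in
userspace. It is not taken from any patch in this thread. The names
submit_grouped(), complete_grouped(), struct req and issue() are
hypothetical, and issue() merely records the order instead of filling
and submitting a real SQE. The sketch is single-threaded, so it needs no
lock; a multi-threaded submitter would have to protect each group, which
is the extra cost discussed above.

```c
#include <assert.h>

#define MAX_GROUPS  4
#define MAX_PENDING 8   /* power of two, so the ring indices wrap cleanly */
#define MAX_ISSUED  16

/* Hypothetical request descriptor; real code would carry the SQE data. */
struct req {
	int id;
	int group;
};

struct group_state {
	int busy;                        /* a request of this group is in flight */
	struct req pending[MAX_PENDING]; /* FIFO of deferred requests */
	unsigned int head, tail;
};

static struct group_state groups[MAX_GROUPS];
static int issued[MAX_ISSUED];   /* order in which requests were issued */
static int n_issued;

/* Stand-in for preparing and submitting an SQE; only records the order. */
static void issue(struct req r)
{
	assert(n_issued < MAX_ISSUED);
	issued[n_issued++] = r.id;
}

/*
 * Issue r now if its group is idle, otherwise defer it.
 * Returns 1 if issued, 0 if deferred, -1 on a bad group or a full queue.
 */
static int submit_grouped(struct req r)
{
	struct group_state *g;

	if (r.group < 0 || r.group >= MAX_GROUPS)
		return -1;
	g = &groups[r.group];
	if (!g->busy) {
		g->busy = 1;
		issue(r);
		return 1;
	}
	if (g->tail - g->head == MAX_PENDING)
		return -1;
	g->pending[g->tail++ % MAX_PENDING] = r;
	return 0;
}

/* Call after reaping the CQE of a request that belongs to `group`. */
static void complete_grouped(int group)
{
	struct group_state *g = &groups[group];

	if (g->head != g->tail)
		issue(g->pending[g->head++ % MAX_PENDING]);
	else
		g->busy = 0;
}
```

With this, the second request of a group is only issued once the
completion of the first one has been reaped, which is exactly the
round trip through userspace that the proposal wants to avoid.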

-- 
Pavel Begunkov
