public inbox for io-uring@vger.kernel.org
From: Pavel Begunkov <asml.silence@gmail.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org,
	Caleb Sander Mateos <csander@purestorage.com>,
	Akilesh Kailash <akailash@google.com>,
	bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring
Date: Tue, 11 Nov 2025 14:07:47 +0000	[thread overview]
Message-ID: <9b59b165-1f57-4cb6-ae62-403d922ad4da@gmail.com> (raw)
In-Reply-To: <aQ4WTLX9ieL5J7ot@fedora>

On 11/7/25 15:54, Ming Lei wrote:
> On Thu, Nov 06, 2025 at 04:03:29PM +0000, Pavel Begunkov wrote:
>> On 11/5/25 15:57, Ming Lei wrote:
>>> On Wed, Nov 05, 2025 at 12:47:58PM +0000, Pavel Begunkov wrote:
>>>> On 11/4/25 16:21, Ming Lei wrote:
>>>>> Hello,
>>>>>
>>>>> Add IORING_OP_BPF for extending io_uring operations, follows typical cases:
>>>>
>>>> BPF requests were tried long time ago and it wasn't great. Performance
>>>
>>> Care to share the link so I can learn from the lesson? Maybe things have
>>> changed now...
>>
>> https://lore.kernel.org/io-uring/a83f147b-ea9d-e693-a2e9-c6ce16659749@gmail.com/T/#m31d0a2ac6e2213f912a200f5e8d88bd74f81406b
>>
>> There were some extra features and testing from folks, but I don't
>> think it was ever posted to the list.
> 
> Thanks for sharing the link:
> 
> ```
> The main problem solved is feeding completion information of other
> requests in a form of CQEs back into BPF. I decided to wire up support
> for multiple completion queues (aka CQs) and give BPF programs access to
> them, so leaving userspace in control over synchronisation that should
> be much more flexible that the link-based approach.
> ```

FWIW, those extensions were a sign that the approach wasn't
flexible enough.

> Looks like it is totally different from my patch in motivation and policy.
> 
> I do _not_ want to move application logic into the kernel by building SQEs
> from a kernel prog. With IORING_OP_BPF, the whole io_uring application is
> built & maintained completely in userspace, so I needn't do cumbersome
> kernel/user communication just to set up one SQE in the prog, not to mention
> maintaining the SQE's relation with the userspace side.

It's built and maintained in userspace in either case, and in
both cases you have bpf implementing some logic that was previously
done in userspace. To emphasize, you can do the desired parts of
handling in BPF, and I'm not suggesting moving the entirety of
request processing in there.

>>>> for short BPF programs is not great because of io_uring request handling
>>>> overhead. And flexibility was severely lacking, so even simple use cases
>>>
>>> What is the overhead? In this patch, OP's prep() and issue() are defined in
>>
>> The overhead of creating, freeing and executing a request. If you use
>> it with links, it's also overhead of that. That prototype could also
>> optionally wait for completions, and it wasn't free either.
> 
> IORING_OP_BPF is the same as an existing normal io_uring request and link,
> wrt all of the above you mentioned.

It is, but it's an extra request, and in previous testing the
overhead of that extra request affected total performance; that's
why whether or not to link is also important.

> IORING_OP_BPF's motivation is to supplement or extend io_uring's
> functionality, not to improve performance.
> 
>>
>>> bpf prog, but in a typical use case, the code size is pretty small, and the
>>> bpf prog code is supposed to run in the fast path.
>>>> were looking pretty ugly, internally, and for BPF writers as well.
>>>
>>> I am not sure what `simple use cases` you are talking about.
>>
>> As an example, creating a loop reading a file:
>> read N bytes; wait for completion; repeat
> 
> IORING_OP_BPF isn't supposed to implement FS operations in the bpf prog.
> 
> It doesn't mean IORING_OP_BPF can't support async issuing:
> 
> - issue_wait() can be added for offload in io-wq context
> 
> OR
> 
> - for typical FS AIO, in theory it can be supported too; the struct_ops just
> needs to define one completion callback, and the callback can be called from
> ->ki_complete().

There is more to IO than read/write, and I'm afraid each new type of
operation would need some extra kfunc glue. And even then, io_uring
does a lot more handling for rw requests than just calling the
callback. It's nicer to be able to reuse all of io_uring's request
handling, which wouldn't even need extra kfuncs.

...
>>> and it can't be used in my case.
>> Hmm, how so? Let's say ublk registers a buffer and posts a
>> completion. Then BPF runs, it sees the completion and does the
>> necessary processing, probably using some kfuncs like the ones
> 
> It is easy to say, but how can the BPF prog know the next completion is
> exactly the one it is waiting for? You have to rely on a bpf map to communicate with userspace

By taking a peek at and maybe dereferencing cqe->user_data.
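To make that concrete, here's a rough sketch of the dispatch idea.
Everything in it is hypothetical (the struct io_uring_cqe stand-in, the
tag encoding and dispatch_cqe() are made up for illustration, not the
actual kernel or struct_ops API):

```c
#include <stdint.h>

/* Stand-in for the kernel's struct io_uring_cqe, for illustration only. */
struct io_uring_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

/* Hypothetical tagging scheme: userspace encodes a request type in the
 * low bits of user_data and an index/cookie in the remaining bits. */
enum req_tag { TAG_NONE = 0, TAG_UBLK_IO = 1, TAG_STORAGE_WRITE = 2 };

static inline enum req_tag cqe_tag(const struct io_uring_cqe *cqe)
{
	return (enum req_tag)(cqe->user_data & 0x3);
}

static inline uint64_t cqe_cookie(const struct io_uring_cqe *cqe)
{
	return cqe->user_data >> 2;
}

/* What a BPF completion handler could do: look at user_data to decide
 * what this completion is and what processing / follow-up SQEs it needs. */
static int dispatch_cqe(const struct io_uring_cqe *cqe)
{
	switch (cqe_tag(cqe)) {
	case TAG_UBLK_IO:
		/* e.g. transform the registered buffer, queue a write SQE */
		return 0;
	case TAG_STORAGE_WRITE:
		/* e.g. post the final completion back to userspace */
		return 0;
	default:
		return -1;	/* unknown completion, punt to userspace */
	}
}
```

Since userspace picks user_data at submission time, it stays in control
of what the BPF side reacts to, without a map lookup per completion.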

> to understand which completion you are interested in, and you also need
> all the information from userspace to prepare the SQE for submission
> from the bpf prog. Tons of userspace/kernel communication.

You can set up a BPF arena, and all that communication becomes
working with a block of shared memory. Or the same via an io_uring
parameter region. That sounds pretty simple.
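As a rough illustration of what I mean: both sides would just agree on a
struct in the shared block, and "communication" is plain loads and stores.
The layout and the publish_request() helper below are made up for the
example, not an existing API:

```c
#include <stdint.h>

/* Hypothetical layout for a block of memory shared between the
 * application and the BPF prog (a BPF arena or an io_uring parameter
 * region). Both sides agree on the struct instead of going through
 * map lookups. */
struct bpf_shared_region {
	uint32_t nr_pending;		/* requests userspace wants issued */
	uint32_t buf_index[64];		/* registered-buffer index per request */
	uint64_t target_offset[64];	/* file/device offset per request */
};

/* Userspace side: publish one unit of work for the BPF prog to pick up.
 * Real code would need a release barrier before bumping nr_pending. */
static void publish_request(struct bpf_shared_region *r,
			    uint32_t buf_idx, uint64_t off)
{
	uint32_t slot = r->nr_pending;

	r->buf_index[slot] = buf_idx;
	r->target_offset[slot] = off;
	r->nr_pending = slot + 1;
}
```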

>> you introduced. After it can optionally queue up requests
>> writing it to the storage or anything else.
> 
> Again, I do not want to move userspace logic into the bpf prog (kernel);
> what IORING_OP_BPF provides is a way to define one operation, then
> userspace can use it just like in-kernel operations.

Right, but that's rather limited. I want to cover all those
use cases with one implementation instead of fragmenting users,
if that can be achieved.

> Then an existing application can apply IORING_OP_BPF with just a small
> change. If SQEs are submitted from the bpf prog, the ublk application needs
> a rewrite to support registered-buffer based zero copy.
> 
>> The reason I'm asking is because it's supposed to be able to
>> do anything the userspace can already achieve (and more). So,
>> if it can't be used for these use cases, there should be some
>> problem in my design.
> 
> BPF prog programming is definitely much more limited compared with
> userspace application programming because it is safe kernel programming.

-- 
Pavel Begunkov



Thread overview: 26+ messages
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
2025-11-04 16:21 ` [PATCH 1/5] io_uring: prepare for extending io_uring with bpf Ming Lei
2025-11-04 16:21 ` [PATCH 2/5] io_uring: bpf: add io_uring_ctx setup for BPF into one list Ming Lei
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
2025-11-07 19:02   ` kernel test robot
2025-11-08  6:53   ` kernel test robot
2025-11-13 10:32   ` Stefan Metzmacher
2025-11-13 10:59     ` Ming Lei
2025-11-13 11:19       ` Stefan Metzmacher
2025-11-14  3:00         ` Ming Lei
2025-11-19 14:39   ` Jonathan Corbet
2025-11-20  1:46     ` Ming Lei
2025-11-20  1:51       ` Ming Lei
2025-11-04 16:21 ` [PATCH 4/5] io_uring: bpf: add buffer support for IORING_OP_BPF Ming Lei
2025-11-13 10:42   ` Stefan Metzmacher
2025-11-13 11:04     ` Ming Lei
2025-11-13 11:25       ` Stefan Metzmacher
2025-11-04 16:21 ` [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc Ming Lei
2025-11-07 18:51   ` kernel test robot
2025-11-05 12:47 ` [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Pavel Begunkov
2025-11-05 15:57   ` Ming Lei
2025-11-06 16:03     ` Pavel Begunkov
2025-11-07 15:54       ` Ming Lei
2025-11-11 14:07         ` Pavel Begunkov [this message]
2025-11-13  4:18           ` Ming Lei
2025-11-19 19:00             ` Pavel Begunkov
