From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, axboe@kernel.dk,
Martin KaFai Lau <martin.lau@linux.dev>,
bpf@vger.kernel.org,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Andrii Nakryiko <andrii@kernel.org>,
ming.lei@redhat.com
Subject: [PATCH v3 00/10] BPF controlled io_uring
Date: Thu, 13 Nov 2025 11:59:37 +0000
Message-ID: <cover.1763031077.git.asml.silence@gmail.com>

This patch set adds a hook into the io_uring waiting loop and
allows a BPF program to be attached to it, implemented as an
io_uring specific struct_ops. The program can process events,
submit requests and tune waiting.

It's designed to cover a number of use cases collected over time:
- Syscall avoidance. Instead of returning to userspace for
CQE processing, a part of the logic can be moved into BPF to
avoid an excessive number of syscalls.
- Smarter request ordering and linking. Request links are pretty
limited and inflexible as they can't pass information from one
request to another. With BPF we can peek at CQEs and memory and
compile a subsequent request.
- Eventual deprecation of links. Linked request kernel logic is
a large liability. It introduces a lot of complexity to core
io_uring and also leaks into other areas, e.g. affecting
decisions on what is initialised and when. It'll be a big win
if it can be moved into the new hook as a kernel function or,
even better, BPF.
- Access to in-kernel io_uring resources. For example, there are
registered buffers that can't be directly accessed by userspace,
however we can give BPF the ability to peek at them. It can be used
to look at in-buffer app level headers to decide what to do
with the data next and to issue IO using it.
- Finer control over waiting algorithms. io_uring has min-wait support,
however it's still limited by uapi, and elaborate parametrisation
for more complex algorithms won't be feasible.
- Optimised waiting. On the same note, mixing requests of different
types and combining submissions with waiting into a single syscall
proved to be troublesome because of different ways requests are
executed. BPF and CQE parsing will help with that.
- Smarter polling. NAPI polling is performed only once per syscall
and then it switches to waiting. We can do better and intermix
polling with waiting using the hook.
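
To make the request ordering case above more concrete, here is a minimal userspace model in C. It is not code from this series and uses no kernel or BPF API; every name in it (model_ring, chain_hook, model_wait, CHAIN_TARGET and so on) is hypothetical. The hook peeks at each completion, builds the next request from its result, and only lets the wait return once the chain is finished, which is the kind of data passing that links can't express.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names come from the series. */
#define RING_SIZE    8	/* power of two, so indices can be masked */
#define CHAIN_TARGET 5	/* the chain ends once a result reaches this value */

struct model_sqe { uint64_t user_data; int32_t arg; };
struct model_cqe { uint64_t user_data; int32_t res; };

struct model_ring {
	struct model_sqe sq[RING_SIZE];
	struct model_cqe cq[RING_SIZE];
	unsigned int sq_head, sq_tail;
	unsigned int cq_head, cq_tail;
};

enum hook_verdict { HOOK_CONTINUE, HOOK_STOP };
typedef enum hook_verdict (*wait_hook_t)(struct model_ring *ring);

/* Queue one request; the caller must not overfill the SQ. */
void queue_sqe(struct model_ring *r, uint64_t user_data, int32_t arg)
{
	assert(r->sq_tail - r->sq_head < RING_SIZE);
	r->sq[r->sq_tail++ & (RING_SIZE - 1)] =
		(struct model_sqe){ user_data, arg };
}

/* "Execute" all queued requests: each completes with res = arg + 1. */
void run_requests(struct model_ring *r)
{
	while (r->sq_head != r->sq_tail) {
		struct model_sqe *sqe = &r->sq[r->sq_head++ & (RING_SIZE - 1)];

		r->cq[r->cq_tail++ & (RING_SIZE - 1)] =
			(struct model_cqe){ sqe->user_data, sqe->arg + 1 };
	}
}

/* Hook: feed each result into the next request until the target is hit. */
enum hook_verdict chain_hook(struct model_ring *r)
{
	while (r->cq_head != r->cq_tail) {
		struct model_cqe *cqe = &r->cq[r->cq_head & (RING_SIZE - 1)];

		if (cqe->res >= CHAIN_TARGET)
			return HOOK_STOP;	/* leave the final CQE for the caller */
		queue_sqe(r, cqe->user_data, cqe->res);
		r->cq_head++;			/* intermediate CQE consumed here */
	}
	return HOOK_CONTINUE;
}

/* Model of a single wait call; returns the number of loop iterations. */
unsigned int model_wait(struct model_ring *r, wait_hook_t hook)
{
	unsigned int iterations = 0;

	for (;;) {
		run_requests(r);
		iterations++;
		/* Stop when the hook asks to, or when nothing is left to run. */
		if (hook(r) == HOOK_STOP || r->sq_head == r->sq_tail)
			return iterations;
	}
}
```

Queueing one request with arg 0 and calling model_wait(&ring, chain_hook) runs the whole five-request chain inside a single modelled wait call and leaves only the final CQE for the caller.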

It might need more specialised kfuncs in the future, but the core
functionality is implemented with just two simple functions. One
returns region memory, which gives BPF access to CQ/SQ/etc., and
the second submits requests. The BPF program is also given a structure
as an argument, which is used to pass waiting information.
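
The shape of that interface can be sketched with a small userspace model in C. This is an illustration under assumptions, not the kfuncs or the struct_ops callback from the series: model_get_region, model_submit, model_hook, struct model_wait_state and MODEL_OP_NOP are all made-up names, and the real signatures may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not the kfuncs from the series. */
#define MODEL_OP_NOP 1u

enum model_region_id { MODEL_REGION_SQ, MODEL_REGION_CQ, MODEL_REGION_NR };

struct model_region {
	void *mem;
	size_t size;
};

struct model_ctx {
	struct model_region regions[MODEL_REGION_NR];
	unsigned int submitted;		/* requests handed over so far */
};

/* Waiting information shared between the wait loop and the hook. */
struct model_wait_state {
	unsigned int target_cqes;	/* completions to wait for */
};

/* Helper 1: return region memory if it is at least `need` bytes long. */
void *model_get_region(struct model_ctx *ctx, unsigned int id, size_t need)
{
	if (id >= MODEL_REGION_NR || !ctx->regions[id].mem ||
	    need > ctx->regions[id].size)
		return NULL;
	return ctx->regions[id].mem;
}

/* Helper 2: submit `nr` requests already written into the SQ region. */
int model_submit(struct model_ctx *ctx, unsigned int nr)
{
	ctx->submitted += nr;
	return (int)nr;
}

/* Hook: write one request, submit it, then wait for its completion only. */
int model_hook(struct model_ctx *ctx, struct model_wait_state *ws)
{
	uint32_t *sq;

	assert(ctx && ws);
	sq = model_get_region(ctx, MODEL_REGION_SQ, sizeof(*sq));
	if (!sq)
		return -1;	/* region missing or too small */

	sq[0] = MODEL_OP_NOP;
	if (model_submit(ctx, 1) != 1)
		return -1;

	ws->target_cqes = 1;	/* tune waiting through the passed structure */
	return 0;
}
```

The size argument to the region helper is a design choice of this sketch: it lets the lookup fail cleanly when the caller asks for more memory than the region holds, instead of handing out an unchecked pointer.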

Previously, a test sequentially executing N nop requests in BPF
showed a 50% performance edge over 2-nop links, and 80% compared
with no linking at all on a mitigated kernel.

since v2: https://lore.kernel.org/io-uring/cover.1749214572.git.asml.silence@gmail.com/
- Removed most of the utility kfuncs and replaced them with a single
helper returning the ring memory.
- Added KF_TRUSTED_ARGS to kfuncs
- Fixed ifdef guarding
- Added a selftest
- Adjusted the waiting loop
- Reused the bpf lock section for task_work execution

Pavel Begunkov (10):
  io_uring: rename the wait queue entry field
  io_uring: simplify io_cqring_wait_schedule results
  io_uring: export __io_run_local_work
  io_uring: extract waiting parameters into a struct
  io_uring/bpf: add stubs for bpf struct_ops
  io_uring/bpf: add handle events callback
  io_uring/bpf: implement struct_ops registration
  io_uring/bpf: add basic kfunc helpers
  selftests/io_uring: update mini liburing
  selftests/io_uring: add bpf io_uring selftests

 include/linux/io_uring_types.h               |   6 +
 io_uring/Kconfig                             |   5 +
 io_uring/Makefile                            |   1 +
 io_uring/bpf.c                               | 277 +++++++++++++++++++
 io_uring/bpf.h                               |  47 ++++
 io_uring/io_uring.c                          |  63 +++--
 io_uring/io_uring.h                          |  18 +-
 io_uring/napi.c                              |   4 +-
 tools/include/io_uring/mini_liburing.h       |  57 +++-
 tools/testing/selftests/Makefile             |   3 +-
 tools/testing/selftests/io_uring/Makefile    | 164 +++++++++++
 tools/testing/selftests/io_uring/basic.bpf.c |  81 ++++++
 tools/testing/selftests/io_uring/common.h    |   2 +
 tools/testing/selftests/io_uring/runner.c    |  80 ++++++
 tools/testing/selftests/io_uring/types.bpf.h | 136 +++++++++
 15 files changed, 900 insertions(+), 44 deletions(-)
 create mode 100644 io_uring/bpf.c
 create mode 100644 io_uring/bpf.h
 create mode 100644 tools/testing/selftests/io_uring/Makefile
 create mode 100644 tools/testing/selftests/io_uring/basic.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/common.h
 create mode 100644 tools/testing/selftests/io_uring/runner.c
 create mode 100644 tools/testing/selftests/io_uring/types.bpf.h

--
2.49.0

Thread overview: 22+ messages
2025-11-13 11:59 Pavel Begunkov [this message]
2025-11-13 11:59 ` [PATCH v3 01/10] io_uring: rename the wait queue entry field Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 02/10] io_uring: simplify io_cqring_wait_schedule results Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 03/10] io_uring: export __io_run_local_work Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 04/10] io_uring: extract waiting parameters into a struct Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 05/10] io_uring/bpf: add stubs for bpf struct_ops Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 06/10] io_uring/bpf: add handle events callback Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 07/10] io_uring/bpf: implement struct_ops registration Pavel Begunkov
2025-11-24 3:44 ` Ming Lei
2025-11-24 13:12 ` Pavel Begunkov
2025-11-24 14:29 ` Ming Lei
2025-11-25 12:46 ` Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 08/10] io_uring/bpf: add basic kfunc helpers Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 09/10] selftests/io_uring: update mini liburing Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 10/10] selftests/io_uring: add bpf io_uring selftests Pavel Begunkov
2025-11-14 13:08 ` Ming Lei
2025-11-19 19:00 ` Pavel Begunkov
2025-11-20 1:41 ` Ming Lei
2025-11-21 16:12 ` Pavel Begunkov
2025-11-22 0:19 ` Ming Lei
2025-11-24 11:57 ` Pavel Begunkov
2025-11-24 13:28 ` Ming Lei