From: Pavel Begunkov <asml.silence@gmail.com>
To: io-uring@vger.kernel.org
Cc: asml.silence@gmail.com, axboe@kernel.dk,
	Martin KaFai Lau <martin.lau@linux.dev>,
	bpf@vger.kernel.org,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	ming.lei@redhat.com
Subject: [PATCH v3 00/10] BPF controlled io_uring
Date: Thu, 13 Nov 2025 11:59:37 +0000
Message-ID: <cover.1763031077.git.asml.silence@gmail.com>

This patch set adds a hook into the io_uring waiting loop and
allows attaching a BPF program to it, implemented as an
io_uring-specific struct_ops. The program can process events,
submit requests, and tune waiting.

It is designed to cover a number of use cases collected over time:

- Syscall avoidance. Instead of returning to userspace for
  CQE processing, part of the logic can be moved into BPF to
  avoid an excessive number of syscalls.

- Smarter request ordering and linking. Request links are pretty
  limited and inflexible as they can't pass information from one
  request to another. With BPF we can peek at CQEs and memory and
  compile a subsequent request.

- Eventual deprecation of links. Linked request kernel logic is
  a large liability. It introduces a lot of complexity to core
  io_uring and also leaks into other areas, e.g. affecting
  decisions on what is initialised and when. It'll be a big win
  if it can be moved into the new hook as a kernel function or,
  even better, as BPF.

- Access to in-kernel io_uring resources. For example, there are
  registered buffers that can't be directly accessed by userspace,
  however we can give BPF the ability to peek at them. It can be used
  to look at in-buffer app-level headers to decide what to do
  with the data next and to issue IO using it.

- Finer control over waiting algorithms. io_uring has min-wait support,
  however it's still limited by uapi, and elaborate parametrisation
  for more complex algorithms won't be feasible.

- Optimised waiting. On the same note, mixing requests of different
  types and combining submissions with waiting into a single syscall
  proved to be troublesome because of the different ways requests are
  executed. BPF and CQE parsing will help with that.

- Smarter polling. NAPI polling is performed only once per syscall
  and then it switches to waiting. We can do better and intermix
  polling with waiting using the hook.
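
To make the "smarter request ordering" case concrete, here is a minimal
userspace model of it. It is not the patch's API: every type and function
below (chain_ring, chain_handle_events, and so on) is hypothetical and
exists only for illustration. It shows a handler that peeks at a CQE and
compiles the follow-up request from its result, here passing the fd
returned by an accept to a recv.

```c
/* Userspace model only: all names are hypothetical, not io_uring API. */
#include <assert.h>
#include <stdint.h>

#define CHAIN_ENTRIES 8u

enum { CHAIN_REQ_ACCEPT = 1, CHAIN_REQ_RECV = 2 };

struct chain_cqe { uint64_t user_data; int32_t res; };
struct chain_sqe { uint64_t user_data; int32_t fd; };

struct chain_ring {
	struct chain_cqe cq[CHAIN_ENTRIES];
	struct chain_sqe sq[CHAIN_ENTRIES];
	uint32_t cq_head, cq_tail;
	uint32_t sq_tail;	/* the model never consumes SQEs */
};

/* Post a completion, as the kernel would once a request finishes. */
static void chain_post_cqe(struct chain_ring *r, uint64_t user_data, int32_t res)
{
	assert(r->cq_tail - r->cq_head < CHAIN_ENTRIES);
	r->cq[r->cq_tail % CHAIN_ENTRIES] =
		(struct chain_cqe){ .user_data = user_data, .res = res };
	r->cq_tail++;
}

/*
 * Stand-in for the BPF handler: drain the CQ and, for every successful
 * accept, queue a recv on the fd taken from the completion result.
 * Returns the number of follow-up requests queued.
 */
static unsigned chain_handle_events(struct chain_ring *r)
{
	unsigned queued = 0;

	while (r->cq_head != r->cq_tail) {
		struct chain_cqe *cqe = &r->cq[r->cq_head % CHAIN_ENTRIES];

		r->cq_head++;
		if (cqe->user_data != CHAIN_REQ_ACCEPT || cqe->res < 0)
			continue;	/* not an accept, or it failed */

		assert(r->sq_tail < CHAIN_ENTRIES);
		r->sq[r->sq_tail++] = (struct chain_sqe){
			.user_data = CHAIN_REQ_RECV,
			.fd = cqe->res,	/* information passed between requests */
		};
		queued++;
	}
	return queued;
}

/* Demo: one successful and one failed accept; returns the fd chained on. */
static int chain_demo(void)
{
	struct chain_ring r = { 0 };

	chain_post_cqe(&r, CHAIN_REQ_ACCEPT, 7);	/* new fd is 7 */
	chain_post_cqe(&r, CHAIN_REQ_ACCEPT, -11);	/* failed, nothing to chain */

	if (chain_handle_events(&r) != 1)
		return -1;
	return r.sq[0].fd;
}
```

In this model chain_demo() queues exactly one recv and returns the fd it
was built for. A plain link has no way to carry that value from the first
request to the second.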

It might need more specialised kfuncs in the future, but the core
functionality is implemented with just two simple functions. One
returns region memory, which gives BPF access to CQ/SQ/etc. The
other submits requests. The BPF program is also given a structure
as an argument, which is used to pass waiting information.
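
A rough userspace model of that shape follows. It assumes a hook that is
called from the waiting loop with a context and a wait-parameter
structure, plus two helpers: one returning ring memory and one
submitting requests. All names (loop_get_region, loop_submit,
loop_wait_params, and so on) are made up for this sketch and are not the
kfuncs or types from the series. The hook issues four requests one at a
time, similar in spirit to the sequential nop test.

```c
/* Userspace model only: all names are hypothetical, not the series' API. */
#include <assert.h>
#include <stdint.h>

#define LOOP_TOTAL 4u

enum { LOOP_STOP = 0, LOOP_WAIT = 1 };

struct loop_rings { uint32_t cq_head, cq_tail, sq_head, sq_tail; };

struct loop_ctx {
	struct loop_rings rings;	/* stands in for the ring memory region */
	uint32_t inflight;		/* submitted but not yet completed */
	uint32_t reaped;		/* completions consumed by the hook */
};

/* Waiting information passed to the hook on every iteration. */
struct loop_wait_params { uint32_t cq_wait_nr; };

typedef int (*loop_hook_fn)(struct loop_ctx *, struct loop_wait_params *);

/* Helper 1: hand out the ring memory. */
static struct loop_rings *loop_get_region(struct loop_ctx *ctx)
{
	return &ctx->rings;
}

/* Helper 2: submit up to nr queued requests, return how many were taken. */
static uint32_t loop_submit(struct loop_ctx *ctx, uint32_t nr)
{
	uint32_t avail = ctx->rings.sq_tail - ctx->rings.sq_head;

	if (nr > avail)
		nr = avail;
	ctx->rings.sq_head += nr;
	ctx->inflight += nr;
	return nr;
}

/* Example hook: reap completions, then issue the next request, if any. */
static int loop_hook(struct loop_ctx *ctx, struct loop_wait_params *wp)
{
	struct loop_rings *r = loop_get_region(ctx);

	ctx->reaped += r->cq_tail - r->cq_head;
	r->cq_head = r->cq_tail;
	if (ctx->reaped == LOOP_TOTAL)
		return LOOP_STOP;

	r->sq_tail++;			/* queue one request */
	loop_submit(ctx, 1);
	wp->cq_wait_nr = 1;		/* wake up after one completion */
	return LOOP_WAIT;
}

/* Model of the waiting loop: call the hook, then "wait" as it asked. */
static unsigned loop_wait(struct loop_ctx *ctx, loop_hook_fn hook)
{
	struct loop_wait_params wp = { 0 };
	unsigned iterations = 0;

	for (;;) {
		uint32_t done;

		iterations++;
		if (hook(ctx, &wp) == LOOP_STOP)
			break;
		done = wp.cq_wait_nr < ctx->inflight ? wp.cq_wait_nr : ctx->inflight;
		if (!done)
			break;		/* nothing to wait for, don't spin */
		ctx->inflight -= done;	/* pretend the requests completed */
		ctx->rings.cq_tail += done;
	}
	return iterations;
}

/* Demo: returns how many times the hook ran. */
static unsigned loop_demo(void)
{
	struct loop_ctx ctx = { 0 };
	unsigned iterations = loop_wait(&ctx, loop_hook);

	assert(ctx.reaped == LOOP_TOTAL && ctx.inflight == 0);
	return iterations;
}
```

In the model, all four requests are issued and reaped without leaving the
loop, and the hook runs five times: four submissions plus the final call
that sees everything completed and asks to stop.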

Previously, a test sequentially executing N nop requests in BPF
showed a 50% performance edge over 2-nop links, and an 80% edge
over no linking at all, on a mitigated kernel.

since v2: https://lore.kernel.org/io-uring/cover.1749214572.git.asml.silence@gmail.com/
  - Removed most of the utility kfuncs and replaced them with a single
    helper returning the ring memory.
  - Added KF_TRUSTED_ARGS to kfuncs
  - Fixed ifdef guarding
  - Added a selftest
  - Adjusted the waiting loop
  - Reused the bpf lock section for task_work execution

Pavel Begunkov (10):
  io_uring: rename the wait queue entry field
  io_uring: simplify io_cqring_wait_schedule results
  io_uring: export __io_run_local_work
  io_uring: extract waiting parameters into a struct
  io_uring/bpf: add stubs for bpf struct_ops
  io_uring/bpf: add handle events callback
  io_uring/bpf: implement struct_ops registration
  io_uring/bpf: add basic kfunc helpers
  selftests/io_uring: update mini liburing
  selftests/io_uring: add bpf io_uring selftests

 include/linux/io_uring_types.h               |   6 +
 io_uring/Kconfig                             |   5 +
 io_uring/Makefile                            |   1 +
 io_uring/bpf.c                               | 277 +++++++++++++++++++
 io_uring/bpf.h                               |  47 ++++
 io_uring/io_uring.c                          |  63 +++--
 io_uring/io_uring.h                          |  18 +-
 io_uring/napi.c                              |   4 +-
 tools/include/io_uring/mini_liburing.h       |  57 +++-
 tools/testing/selftests/Makefile             |   3 +-
 tools/testing/selftests/io_uring/Makefile    | 164 +++++++++++
 tools/testing/selftests/io_uring/basic.bpf.c |  81 ++++++
 tools/testing/selftests/io_uring/common.h    |   2 +
 tools/testing/selftests/io_uring/runner.c    |  80 ++++++
 tools/testing/selftests/io_uring/types.bpf.h | 136 +++++++++
 15 files changed, 900 insertions(+), 44 deletions(-)
 create mode 100644 io_uring/bpf.c
 create mode 100644 io_uring/bpf.h
 create mode 100644 tools/testing/selftests/io_uring/Makefile
 create mode 100644 tools/testing/selftests/io_uring/basic.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/common.h
 create mode 100644 tools/testing/selftests/io_uring/runner.c
 create mode 100644 tools/testing/selftests/io_uring/types.bpf.h

-- 
2.49.0


Thread overview: 22+ messages
2025-11-13 11:59 Pavel Begunkov [this message]
2025-11-13 11:59 ` [PATCH v3 01/10] io_uring: rename the wait queue entry field Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 02/10] io_uring: simplify io_cqring_wait_schedule results Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 03/10] io_uring: export __io_run_local_work Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 04/10] io_uring: extract waiting parameters into a struct Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 05/10] io_uring/bpf: add stubs for bpf struct_ops Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 06/10] io_uring/bpf: add handle events callback Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 07/10] io_uring/bpf: implement struct_ops registration Pavel Begunkov
2025-11-24  3:44   ` Ming Lei
2025-11-24 13:12     ` Pavel Begunkov
2025-11-24 14:29       ` Ming Lei
2025-11-25 12:46         ` Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 08/10] io_uring/bpf: add basic kfunc helpers Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 09/10] selftests/io_uring: update mini liburing Pavel Begunkov
2025-11-13 11:59 ` [PATCH v3 10/10] selftests/io_uring: add bpf io_uring selftests Pavel Begunkov
2025-11-14 13:08   ` Ming Lei
2025-11-19 19:00     ` Pavel Begunkov
2025-11-20  1:41       ` Ming Lei
2025-11-21 16:12         ` Pavel Begunkov
2025-11-22  0:19           ` Ming Lei
2025-11-24 11:57             ` Pavel Begunkov
2025-11-24 13:28               ` Ming Lei
