public inbox for io-uring@vger.kernel.org
* [PATCH v4 0/6] BPF controlled io_uring
@ 2026-01-27 10:14 Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 1/6] io_uring: introduce callback driven main loop Pavel Begunkov
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

Note: I'll be targeting 7.1, as we're already at rc7 and the series
can use some time to settle down.

This series introduces a way to override the standard io_uring_enter
syscall execution with an extensible event loop, which can be
controlled by BPF via a new io_uring struct_ops or from within the
kernel.

There are multiple use cases I want to cover with this:

- Syscall avoidance. Instead of returning to userspace for CQE
  processing, part of the logic can be moved into BPF to avoid an
  excessive number of syscalls.

- Access to in-kernel io_uring resources. For example, registered
  buffers can't be directly accessed from userspace, but we can give
  BPF the ability to peek at them. That can be used to inspect
  app-level headers inside a buffer to decide what to do with the data
  next and to issue IO against it.

- Smarter request ordering and linking. Request links are pretty
  limited and inflexible, as they can't pass information from one
  request to another. With BPF we can peek at CQEs and memory and
  construct a subsequent request.

- Feature semi-deprecation. It can be used to simplify the handling of
  deprecated features by moving it into the callback and out of core
  io_uring. For example, it should be trivial to simulate
  IOSQE_IO_DRAIN. Another target could be the request linking logic.

- It can serve as a base for custom algorithms and fine tuning. Often,
  it'd be impractical to introduce a generic feature because it's
  either niche or requires a lot of configuration. For example, there
  is min-wait support, but BPF can fine tune it further by waiting in
  multiple steps with different numbers of CQEs / timeouts. Another
  feature people have been asking for is allowing SQEs to be
  over-queued while the kernel maintains a given QD.

- Smarter polling. NAPI polling is performed only once per syscall,
  after which it switches to waiting. With the hook we can be smarter
  and intermix polling with waiting.

More specialised kfuncs might be needed in the future, but the core
functionality is implemented with just two simple functions. One
returns a region's memory, which gives BPF access to the CQ/SQ/etc.,
and the second submits requests. The callback is also given a structure
as an argument, which is used to pass waiting parameters.
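
For a rough idea of the shape, a minimal loop_step built on those two
kfuncs could look like the sketch below. It's untested and simplified:
RING_REGION_SIZE and the read_cq_head() header-parsing helper are
hypothetical stand-ins, see the selftest in patch 6/6 for a complete
program.

	extern int bpf_io_uring_submit_sqes(struct io_ring_ctx *ctx,
					    unsigned int nr) __ksym;
	extern __u8 *bpf_io_uring_get_region(struct io_ring_ctx *ctx,
					     __u32 region_id,
					     const __u64 rdwr_buf_size) __ksym;

	SEC("struct_ops.s/loop")
	int BPF_PROG(loop, struct io_ring_ctx *ring, struct iou_loop_params *lp)
	{
		/* the CQ region also contains the SQ/CQ headers */
		void *rings = bpf_io_uring_get_region(ring, IOU_REGION_CQ,
						      RING_REGION_SIZE);

		if (!rings)
			return IOU_LOOP_STOP;

		/* reap CQEs, fill new SQEs into the SQ region, submit them */
		if (bpf_io_uring_submit_sqes(ring, 1) != 1)
			return IOU_LOOP_STOP;

		/* a hint to the kernel: wake us once one more CQE arrives */
		lp->cq_wait_idx = read_cq_head(rings) + 1;
		return IOU_LOOP_CONTINUE;
	}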

It showed good numbers in a test that sequentially executes N nop
requests, where BPF was more than twice as fast as a 2-nop request
link implementation.

Pavel Begunkov (6):
  io_uring: introduce callback driven main loop
  io_uring/bpf-ops: add basic bpf struct_ops boilerplate
  io_uring/bpf-ops: add loop_step struct_ops callback
  io_uring/bpf-ops: add kfunc helpers
  io_uring/bpf-ops: add bpf struct ops registration
  selftests/io_uring: add a bpf io_uring selftest

 include/linux/io_uring_types.h               |  10 +
 io_uring/Kconfig                             |   5 +
 io_uring/Makefile                            |   3 +-
 io_uring/bpf-ops.c                           | 265 +++++++++++++++++++
 io_uring/bpf-ops.h                           |  28 ++
 io_uring/io_uring.c                          |   8 +
 io_uring/loop.c                              |  88 ++++++
 io_uring/loop.h                              |  27 ++
 tools/testing/selftests/Makefile             |   3 +-
 tools/testing/selftests/io_uring/Makefile    | 143 ++++++++++
 tools/testing/selftests/io_uring/basic.bpf.c | 116 ++++++++
 tools/testing/selftests/io_uring/common.h    |   6 +
 tools/testing/selftests/io_uring/runner.c    | 107 ++++++++
 tools/testing/selftests/io_uring/types.bpf.h | 131 +++++++++
 14 files changed, 938 insertions(+), 2 deletions(-)
 create mode 100644 io_uring/bpf-ops.c
 create mode 100644 io_uring/bpf-ops.h
 create mode 100644 io_uring/loop.c
 create mode 100644 io_uring/loop.h
 create mode 100644 tools/testing/selftests/io_uring/Makefile
 create mode 100644 tools/testing/selftests/io_uring/basic.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/common.h
 create mode 100644 tools/testing/selftests/io_uring/runner.c
 create mode 100644 tools/testing/selftests/io_uring/types.bpf.h

-- 
2.52.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v4 1/6] io_uring: introduce callback driven main loop
  2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
@ 2026-01-27 10:14 ` Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 2/6] io_uring/bpf-ops: add basic bpf struct_ops boilerplate Pavel Begunkov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

io_uring_enter() has a fixed order of execution: it submits requests,
waits for completions, and returns to the user. Allow optionally
replacing it with a custom loop driven by a callback called loop_step.
The basic requirement for the callback is that it should be able to
submit requests, wait for completions, parse them and repeat. Most of
the communication, including parameter passing, can be implemented via
shared memory.

The callback should return IOU_LOOP_CONTINUE to continue execution or
IOU_LOOP_STOP to return to userspace. Note that the kernel may decide
to prematurely terminate the loop as well, e.g. in case the process
was signalled or killed.

The hook takes a structure with parameters. It can be used to ask the
kernel to wait for CQEs by setting cq_wait_idx to the CQE index the
callback wants to wait for. Spurious wake ups are possible and even
likely; the callback is expected to handle them. More parameters, such
as timeouts, will be added in the future.
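
For illustration, a hypothetical callback that asks to be woken once
one more CQE is posted could look like the sketch below (not part of
this patch, and should_stop() is a made up termination check):

	static int example_loop_step(struct io_ring_ctx *ctx,
				     struct iou_loop_params *lp)
	{
		if (should_stop(ctx))
			return IOU_LOOP_STOP;

		/* only a hint, spurious invocations must be tolerated */
		lp->cq_wait_idx = READ_ONCE(ctx->rings->cq.tail) + 1;
		return IOU_LOOP_CONTINUE;
	}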

It can be used with kernel callbacks, for example, as a slow path
deprecation mechanism overwriting SQEs and emulating the wanted
behaviour, but it's more useful together with the BPF programs
implemented in the following patches.

Note that keeping it separate from the normal io_uring wait loop makes
things much simpler and cleaner. It keeps the logic in one place
instead of spreading checks across different paths, including having
to disable the submission path. It holds the lock by default, which is
a better fit for BPF synchronisation and the loop execution model. It
nicely avoids existing quirks like forced wake ups on timeout request
completion. And it should make new features easier to implement.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring_types.h |  5 ++
 io_uring/Makefile              |  2 +-
 io_uring/io_uring.c            |  6 +++
 io_uring/loop.c                | 88 ++++++++++++++++++++++++++++++++++
 io_uring/loop.h                | 27 +++++++++++
 5 files changed, 127 insertions(+), 1 deletion(-)
 create mode 100644 io_uring/loop.c
 create mode 100644 io_uring/loop.h

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index dc6bd6940a0d..9990df98790d 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -41,6 +41,8 @@ enum io_uring_cmd_flags {
 	IO_URING_F_COMPAT		= (1 << 12),
 };
 
+struct iou_loop_params;
+
 struct io_wq_work_node {
 	struct io_wq_work_node *next;
 };
@@ -342,6 +344,9 @@ struct io_ring_ctx {
 		struct io_alloc_cache	rw_cache;
 		struct io_alloc_cache	cmd_cache;
 
+		int (*loop_step)(struct io_ring_ctx *ctx,
+				 struct iou_loop_params *);
+
 		/*
 		 * Any cancelable uring_cmd is added to this list in
 		 * ->uring_cmd() by io_uring_cmd_insert_cancelable()
diff --git a/io_uring/Makefile b/io_uring/Makefile
index bf9eff88427a..d4dbc16a58a5 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -14,7 +14,7 @@ obj-$(CONFIG_IO_URING)		+= io_uring.o opdef.o kbuf.o rsrc.o notif.o \
 					advise.o openclose.o statx.o timeout.o \
 					cancel.o waitid.o register.o \
 					truncate.o memmap.o alloc_cache.o \
-					query.o
+					query.o loop.o
 
 obj-$(CONFIG_IO_URING_ZCRX)	+= zcrx.o
 obj-$(CONFIG_IO_WQ)		+= io-wq.o
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5c503a3f6ecc..aea27e3538bb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -94,6 +94,7 @@
 #include "alloc_cache.h"
 #include "eventfd.h"
 #include "wait.h"
+#include "loop.h"
 
 #define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
 			  IOSQE_IO_HARDLINK | IOSQE_ASYNC)
@@ -2557,6 +2558,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED))
 		goto out;
 
+	if (io_has_loop_ops(ctx)) {
+		ret = io_run_loop(ctx);
+		goto out;
+	}
+
 	/*
 	 * For SQ polling, the thread will do all submissions and completions.
 	 * Just return the requested submit count, and wake the thread if
diff --git a/io_uring/loop.c b/io_uring/loop.c
new file mode 100644
index 000000000000..bf38f20f0537
--- /dev/null
+++ b/io_uring/loop.c
@@ -0,0 +1,88 @@
+#include "io_uring.h"
+#include "napi.h"
+#include "wait.h"
+#include "loop.h"
+
+struct iou_loop_state {
+	struct iou_loop_params		p;
+	struct io_ring_ctx		*ctx;
+};
+
+static inline int io_loop_nr_cqes(const struct io_ring_ctx *ctx,
+				  const struct iou_loop_state *ls)
+{
+	return ls->p.cq_wait_idx - READ_ONCE(ctx->rings->cq.tail);
+}
+
+static inline void io_loop_wait_finish(struct io_ring_ctx *ctx)
+{
+	__set_current_state(TASK_RUNNING);
+	atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
+}
+
+static void io_loop_wait(struct io_ring_ctx *ctx, struct iou_loop_state *ls,
+			 unsigned nr_wait)
+{
+	atomic_set(&ctx->cq_wait_nr, nr_wait);
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	if (unlikely(io_local_work_pending(ctx) ||
+		     io_loop_nr_cqes(ctx, ls) <= 0 ||
+		     READ_ONCE(ctx->check_cq))) {
+		io_loop_wait_finish(ctx);
+		return;
+	}
+
+	mutex_unlock(&ctx->uring_lock);
+	schedule();
+	io_loop_wait_finish(ctx);
+	mutex_lock(&ctx->uring_lock);
+}
+
+int io_run_loop(struct io_ring_ctx *ctx)
+{
+	struct iou_loop_state ls = {};
+	int ret = -EINVAL;
+
+	if (!io_allowed_run_tw(ctx))
+		return -EEXIST;
+	mutex_lock(&ctx->uring_lock);
+
+	while (true) {
+		int nr_wait;
+		int step_res;
+
+		if (unlikely(!ctx->loop_step)) {
+			ret = -EFAULT;
+			goto out_unlock;
+		}
+		step_res = ctx->loop_step(ctx, &ls.p);
+		if (step_res == IOU_LOOP_STOP)
+			break;
+
+		nr_wait = io_loop_nr_cqes(ctx, &ls);
+		if (nr_wait > 0)
+			io_loop_wait(ctx, &ls, nr_wait);
+
+		if (task_work_pending(current)) {
+			mutex_unlock(&ctx->uring_lock);
+			io_run_task_work();
+			mutex_lock(&ctx->uring_lock);
+		}
+		if (task_sigpending(current)) {
+			ret = -EINTR;
+			goto out_unlock;
+		}
+
+		nr_wait = max(nr_wait, 0);
+		io_run_local_work_locked(ctx, nr_wait);
+
+		if (READ_ONCE(ctx->check_cq) & BIT(IO_CHECK_CQ_OVERFLOW_BIT))
+			io_cqring_do_overflow_flush(ctx);
+	}
+
+	ret = 0;
+out_unlock:
+	mutex_unlock(&ctx->uring_lock);
+	return ret;
+}
diff --git a/io_uring/loop.h b/io_uring/loop.h
new file mode 100644
index 000000000000..d7718b9ce61e
--- /dev/null
+++ b/io_uring/loop.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef IOU_LOOP_H
+#define IOU_LOOP_H
+
+#include <linux/io_uring_types.h>
+
+struct iou_loop_params {
+	/*
+	 * The CQE index to wait for. Only serves as a hint and can still be
+	 * woken up earlier.
+	 */
+	__u32			cq_wait_idx;
+};
+
+enum {
+	IOU_LOOP_CONTINUE = 0,
+	IOU_LOOP_STOP,
+};
+
+static inline bool io_has_loop_ops(struct io_ring_ctx *ctx)
+{
+	return data_race(ctx->loop_step);
+}
+
+int io_run_loop(struct io_ring_ctx *ctx);
+
+#endif
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 2/6] io_uring/bpf-ops: add basic bpf struct_ops boilerplate
  2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 1/6] io_uring: introduce callback driven main loop Pavel Begunkov
@ 2026-01-27 10:14 ` Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 3/6] io_uring/bpf-ops: add loop_step struct_ops callback Pavel Begunkov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

Add boilerplate code with a basic bpf_struct_ops implementation and
related files and definitions. It'll be used in the following patches
to add support for io_uring bpf ops.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/Kconfig    |  5 +++
 io_uring/Makefile   |  1 +
 io_uring/bpf-ops.c  | 91 +++++++++++++++++++++++++++++++++++++++++++++
 io_uring/bpf-ops.h  | 10 +++++
 io_uring/io_uring.c |  1 +
 5 files changed, 108 insertions(+)
 create mode 100644 io_uring/bpf-ops.c
 create mode 100644 io_uring/bpf-ops.h

diff --git a/io_uring/Kconfig b/io_uring/Kconfig
index 4b949c42c0bf..b4dad9b74544 100644
--- a/io_uring/Kconfig
+++ b/io_uring/Kconfig
@@ -9,3 +9,8 @@ config IO_URING_ZCRX
 	depends on PAGE_POOL
 	depends on INET
 	depends on NET_RX_BUSY_POLL
+
+config IO_URING_BPF
+	def_bool y
+	depends on IO_URING
+	depends on BPF_SYSCALL && BPF_JIT && DEBUG_INFO_BTF
diff --git a/io_uring/Makefile b/io_uring/Makefile
index d4dbc16a58a5..675865ddb906 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -24,3 +24,4 @@ obj-$(CONFIG_NET_RX_BUSY_POLL)	+= napi.o
 obj-$(CONFIG_NET) += net.o cmd_net.o
 obj-$(CONFIG_PROC_FS) += fdinfo.o
 obj-$(CONFIG_IO_URING_MOCK_FILE) += mock_file.o
+obj-$(CONFIG_IO_URING_BPF)	+= bpf-ops.o
diff --git a/io_uring/bpf-ops.c b/io_uring/bpf-ops.c
new file mode 100644
index 000000000000..a89d1dea60c7
--- /dev/null
+++ b/io_uring/bpf-ops.c
@@ -0,0 +1,91 @@
+#include <linux/mutex.h>
+#include <linux/bpf.h>
+
+#include "io_uring.h"
+#include "register.h"
+#include "bpf-ops.h"
+
+static struct io_uring_bpf_ops io_bpf_ops_stubs = {
+};
+
+static bool bpf_io_is_valid_access(int off, int size,
+				    enum bpf_access_type type,
+				    const struct bpf_prog *prog,
+				    struct bpf_insn_access_aux *info)
+{
+	if (type != BPF_READ)
+		return false;
+	if (off < 0 || off >= sizeof(__u64) * MAX_BPF_FUNC_ARGS)
+		return false;
+	if (off % size != 0)
+		return false;
+
+	return btf_ctx_access(off, size, type, prog, info);
+}
+
+static int bpf_io_btf_struct_access(struct bpf_verifier_log *log,
+				    const struct bpf_reg_state *reg, int off,
+				    int size)
+{
+	return -EACCES;
+}
+
+static const struct bpf_verifier_ops bpf_io_verifier_ops = {
+	.get_func_proto = bpf_base_func_proto,
+	.is_valid_access = bpf_io_is_valid_access,
+	.btf_struct_access = bpf_io_btf_struct_access,
+};
+
+static int bpf_io_init(struct btf *btf)
+{
+	return 0;
+}
+
+static int bpf_io_check_member(const struct btf_type *t,
+				const struct btf_member *member,
+				const struct bpf_prog *prog)
+{
+	return 0;
+}
+
+static int bpf_io_init_member(const struct btf_type *t,
+			       const struct btf_member *member,
+			       void *kdata, const void *udata)
+{
+	return 0;
+}
+
+static int bpf_io_reg(void *kdata, struct bpf_link *link)
+{
+	return -EOPNOTSUPP;
+}
+
+static void bpf_io_unreg(void *kdata, struct bpf_link *link)
+{
+}
+
+static struct bpf_struct_ops bpf_ring_ops = {
+	.verifier_ops = &bpf_io_verifier_ops,
+	.reg = bpf_io_reg,
+	.unreg = bpf_io_unreg,
+	.check_member = bpf_io_check_member,
+	.init_member = bpf_io_init_member,
+	.init = bpf_io_init,
+	.cfi_stubs = &io_bpf_ops_stubs,
+	.name = "io_uring_bpf_ops",
+	.owner = THIS_MODULE,
+};
+
+static int __init io_uring_bpf_init(void)
+{
+	int ret;
+
+	ret = register_bpf_struct_ops(&bpf_ring_ops, io_uring_bpf_ops);
+	if (ret) {
+		pr_err("io_uring: Failed to register struct_ops (%d)\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+__initcall(io_uring_bpf_init);
diff --git a/io_uring/bpf-ops.h b/io_uring/bpf-ops.h
new file mode 100644
index 000000000000..a6756b391387
--- /dev/null
+++ b/io_uring/bpf-ops.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef IOU_BPF_OPS_H
+#define IOU_BPF_OPS_H
+
+#include <linux/io_uring_types.h>
+
+struct io_uring_bpf_ops {
+};
+
+#endif /* IOU_BPF_OPS_H */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index aea27e3538bb..09920e56c9c9 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -87,6 +87,7 @@
 #include "msg_ring.h"
 #include "memmap.h"
 #include "zcrx.h"
+#include "bpf-ops.h"
 
 #include "timeout.h"
 #include "poll.h"
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 3/6] io_uring/bpf-ops: add loop_step struct_ops callback
  2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 1/6] io_uring: introduce callback driven main loop Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 2/6] io_uring/bpf-ops: add basic bpf struct_ops boilerplate Pavel Begunkov
@ 2026-01-27 10:14 ` Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 4/6] io_uring/bpf-ops: add kfunc helpers Pavel Begunkov
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

Allow BPF to implement the loop_step callback to override the main
loop logic. As described in the patch introducing the callback, it
receives iou_loop_params as an argument, which BPF can use directly to
control the loop.
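
From the BPF side, the op is declared like any other struct_ops map. A
minimal sketch (registration is only wired up later in the series):

	SEC("struct_ops.s/my_loop")
	int BPF_PROG(my_loop, struct io_ring_ctx *ring,
		     struct iou_loop_params *lp)
	{
		/* writes to cq_wait_idx are permitted via btf_struct_access */
		lp->cq_wait_idx = 0;
		return IOU_LOOP_STOP;
	}

	SEC(".struct_ops")
	struct io_uring_bpf_ops my_ops = {
		.loop_step = (void *)my_loop,
	};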

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/bpf-ops.c | 36 ++++++++++++++++++++++++++++++++++++
 io_uring/bpf-ops.h |  4 ++++
 2 files changed, 40 insertions(+)

diff --git a/io_uring/bpf-ops.c b/io_uring/bpf-ops.c
index a89d1dea60c7..7db07eda5a48 100644
--- a/io_uring/bpf-ops.c
+++ b/io_uring/bpf-ops.c
@@ -1,11 +1,22 @@
 #include <linux/mutex.h>
 #include <linux/bpf.h>
+#include <linux/bpf_verifier.h>
 
 #include "io_uring.h"
 #include "register.h"
 #include "bpf-ops.h"
+#include "loop.h"
+
+static const struct btf_type *loop_params_type;
+
+static int io_bpf_ops__loop_step(struct io_ring_ctx *ctx,
+				 struct iou_loop_params *lp)
+{
+	return IOU_LOOP_STOP;
+}
 
 static struct io_uring_bpf_ops io_bpf_ops_stubs = {
+	.loop_step = io_bpf_ops__loop_step,
 };
 
 static bool bpf_io_is_valid_access(int off, int size,
@@ -27,6 +38,14 @@ static int bpf_io_btf_struct_access(struct bpf_verifier_log *log,
 				    const struct bpf_reg_state *reg, int off,
 				    int size)
 {
+	const struct btf_type *t = btf_type_by_id(reg->btf, reg->btf_id);
+
+	if (t == loop_params_type) {
+		if (off >= offsetof(struct iou_loop_params, cq_wait_idx) &&
+		    off + size <= offsetofend(struct iou_loop_params, cq_wait_idx))
+			return SCALAR_VALUE;
+	}
+
 	return -EACCES;
 }
 
@@ -36,8 +55,25 @@ static const struct bpf_verifier_ops bpf_io_verifier_ops = {
 	.btf_struct_access = bpf_io_btf_struct_access,
 };
 
+static const struct btf_type *
+io_lookup_struct_type(struct btf *btf, const char *name)
+{
+	s32 type_id;
+
+	type_id = btf_find_by_name_kind(btf, name, BTF_KIND_STRUCT);
+	if (type_id < 0)
+		return NULL;
+	return btf_type_by_id(btf, type_id);
+}
+
 static int bpf_io_init(struct btf *btf)
 {
+	loop_params_type = io_lookup_struct_type(btf, "iou_loop_params");
+	if (!loop_params_type) {
+		pr_err("io_uring: Failed to locate iou_loop_params\n");
+		return -EINVAL;
+	}
+
 	return 0;
 }
 
diff --git a/io_uring/bpf-ops.h b/io_uring/bpf-ops.h
index a6756b391387..e8a08ae2df0a 100644
--- a/io_uring/bpf-ops.h
+++ b/io_uring/bpf-ops.h
@@ -5,6 +5,10 @@
 #include <linux/io_uring_types.h>
 
 struct io_uring_bpf_ops {
+	int (*loop_step)(struct io_ring_ctx *ctx, struct iou_loop_params *lp);
+
+	__u32 ring_fd;
+	void *priv;
 };
 
 #endif /* IOU_BPF_OPS_H */
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 4/6] io_uring/bpf-ops: add kfunc helpers
  2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
                   ` (2 preceding siblings ...)
  2026-01-27 10:14 ` [PATCH v4 3/6] io_uring/bpf-ops: add loop_step struct_ops callback Pavel Begunkov
@ 2026-01-27 10:14 ` Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 5/6] io_uring/bpf-ops: add bpf struct ops registration Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest Pavel Begunkov
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

Add two kfuncs that should cover most of the needs:

1. bpf_io_uring_submit_sqes(), which allows BPF to submit io_uring
   requests. It mirrors the normal user space submission path and
   follows all related io_uring_enter(2) rules, i.e. SQEs are taken
   from the SQ according to the head/tail values. In case of
   IORING_SETUP_SQ_REWIND, it'll submit the first N entries.

2. bpf_io_uring_get_region() returns a pointer to the specified region,
   where io_uring regions are kernel-userspace shared chunks of memory.
   It takes the size as an argument, which should be a load time
   constant. There are 3 types of regions:
   - IOU_REGION_SQ returns the submission queue.
   - IOU_REGION_CQ stores the CQ, the SQ/CQ headers and the sqarray.
     In other words, it gives the same memory that would normally be
     mmap'ed at IORING_OFF_SQ_RING with IORING_FEAT_SINGLE_MMAP
     enabled.
   - IOU_REGION_MEM represents the memory / parameter region. It can
     be used to store indirect request parameters and for kernel-user
     communication.

It intentionally provides a thin but flexible API and expects BPF
programs to implement CQ/SQ header parsing, CQ walking, etc. That
mirrors how normal user space works with the rings and should help
minimise kernel / kfunc helper changes while introducing new generic
io_uring features.
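
For example, a program can locate the ring headers inside the CQ
region by reusing the sq_off/cq_off values its loader read from struct
io_uring_params. A sketch mirroring the selftest in patch 6/6, with
the offsets assumed to be filled in at load time:

	/* filled in by the loader from struct io_uring_params */
	const volatile unsigned cq_head_off;	/* params.cq_off.head */
	const volatile unsigned cq_tail_off;	/* params.cq_off.tail */
	const volatile unsigned cqes_off;	/* params.cq_off.cqes */

	static struct io_uring_cqe *next_cqe(void *rings, unsigned cq_mask)
	{
		__u32 *head = rings + cq_head_off;
		__u32 *tail = rings + cq_tail_off;
		struct io_uring_cqe *cqes = rings + cqes_off;

		if (*head == *tail)
			return NULL;
		return &cqes[(*head)++ & cq_mask];
	}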

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/bpf-ops.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++
 io_uring/bpf-ops.h |  6 ++++++
 2 files changed, 59 insertions(+)

diff --git a/io_uring/bpf-ops.c b/io_uring/bpf-ops.c
index 7db07eda5a48..ad4e3dc889ba 100644
--- a/io_uring/bpf-ops.c
+++ b/io_uring/bpf-ops.c
@@ -4,11 +4,56 @@
 
 #include "io_uring.h"
 #include "register.h"
+#include "memmap.h"
 #include "bpf-ops.h"
 #include "loop.h"
 
 static const struct btf_type *loop_params_type;
 
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc int bpf_io_uring_submit_sqes(struct io_ring_ctx *ctx, u32 nr)
+{
+	return io_submit_sqes(ctx, nr);
+}
+
+__bpf_kfunc
+__u8 *bpf_io_uring_get_region(struct io_ring_ctx *ctx, __u32 region_id,
+			      const size_t rdwr_buf_size)
+{
+	struct io_mapped_region *r;
+
+	switch (region_id) {
+	case IOU_REGION_MEM:
+		r = &ctx->param_region;
+		break;
+	case IOU_REGION_CQ:
+		r = &ctx->ring_region;
+		break;
+	case IOU_REGION_SQ:
+		r = &ctx->sq_region;
+		break;
+	default:
+		return NULL;
+	}
+
+	if (unlikely(rdwr_buf_size > io_region_size(r)))
+		return NULL;
+	return io_region_get_ptr(r);
+}
+
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(io_uring_kfunc_set)
+BTF_ID_FLAGS(func, bpf_io_uring_submit_sqes, KF_SLEEPABLE | KF_TRUSTED_ARGS);
+BTF_ID_FLAGS(func, bpf_io_uring_get_region, KF_RET_NULL | KF_TRUSTED_ARGS);
+BTF_KFUNCS_END(io_uring_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_io_uring_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &io_uring_kfunc_set,
+};
+
 static int io_bpf_ops__loop_step(struct io_ring_ctx *ctx,
 				 struct iou_loop_params *lp)
 {
@@ -68,12 +113,20 @@ io_lookup_struct_type(struct btf *btf, const char *name)
 
 static int bpf_io_init(struct btf *btf)
 {
+	int ret;
+
 	loop_params_type = io_lookup_struct_type(btf, "iou_loop_params");
 	if (!loop_params_type) {
 		pr_err("io_uring: Failed to locate iou_loop_params\n");
 		return -EINVAL;
 	}
 
+	ret = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS,
+					&bpf_io_uring_kfunc_set);
+	if (ret) {
+		pr_err("io_uring: Failed to register kfuncs (%d)\n", ret);
+		return ret;
+	}
 	return 0;
 }
 
diff --git a/io_uring/bpf-ops.h b/io_uring/bpf-ops.h
index e8a08ae2df0a..b9e589ad519a 100644
--- a/io_uring/bpf-ops.h
+++ b/io_uring/bpf-ops.h
@@ -4,6 +4,12 @@
 
 #include <linux/io_uring_types.h>
 
+enum {
+	IOU_REGION_MEM,
+	IOU_REGION_CQ,
+	IOU_REGION_SQ,
+};
+
 struct io_uring_bpf_ops {
 	int (*loop_step)(struct io_ring_ctx *ctx, struct iou_loop_params *lp);
 
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 5/6] io_uring/bpf-ops: add bpf struct ops registration
  2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
                   ` (3 preceding siblings ...)
  2026-01-27 10:14 ` [PATCH v4 4/6] io_uring/bpf-ops: add kfunc helpers Pavel Begunkov
@ 2026-01-27 10:14 ` Pavel Begunkov
  2026-01-27 10:14 ` [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest Pavel Begunkov
  5 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

Implement BPF struct ops registration. The ops is registered from the
BPF path and can be removed by BPF as well as by io_uring, which is
why it's protected by a global lock, io_bpf_ctrl_mutex.
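
On the userspace side it's the usual libbpf struct_ops flow, with
ring_fd telling the kernel which ring to attach to. A sketch using a
skeleton named "basic", as in the selftest added later in the series:

	struct basic *skel;
	struct bpf_link *link;

	skel = basic__open();
	skel->struct_ops.basic_ops->ring_fd = ring.ring_fd;
	if (basic__load(skel))
		exit(1);
	link = bpf_map__attach_struct_ops(skel->maps.basic_ops);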

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring_types.h |  5 ++
 io_uring/bpf-ops.c             | 87 +++++++++++++++++++++++++++++++++-
 io_uring/bpf-ops.h             |  8 ++++
 io_uring/io_uring.c            |  1 +
 4 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 9990df98790d..5dfe3608dbb9 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -8,6 +8,9 @@
 #include <linux/llist.h>
 #include <uapi/linux/io_uring.h>
 
+struct iou_loop_params;
+struct io_uring_bpf_ops;
+
 enum {
 	/*
 	 * A hint to not wake right away but delay until there are enough of
@@ -462,6 +465,8 @@ struct io_ring_ctx {
 	DECLARE_HASHTABLE(napi_ht, 4);
 #endif
 
+	struct io_uring_bpf_ops		*bpf_ops;
+
 	/*
 	 * Protection for resize vs mmap races - both the mmap and resize
 	 * side will need to grab this lock, to prevent either side from
diff --git a/io_uring/bpf-ops.c b/io_uring/bpf-ops.c
index ad4e3dc889ba..26955ff06ecf 100644
--- a/io_uring/bpf-ops.c
+++ b/io_uring/bpf-ops.c
@@ -4,10 +4,12 @@
 
 #include "io_uring.h"
 #include "register.h"
+#include "loop.h"
 #include "memmap.h"
 #include "bpf-ops.h"
 #include "loop.h"
 
+static DEFINE_MUTEX(io_bpf_ctrl_mutex);
 static const struct btf_type *loop_params_type;
 
 __bpf_kfunc_start_defs();
@@ -141,16 +143,99 @@ static int bpf_io_init_member(const struct btf_type *t,
 			       const struct btf_member *member,
 			       void *kdata, const void *udata)
 {
+	u32 moff = __btf_member_bit_offset(t, member) / 8;
+	const struct io_uring_bpf_ops *uops = udata;
+	struct io_uring_bpf_ops *ops = kdata;
+
+	switch (moff) {
+	case offsetof(struct io_uring_bpf_ops, ring_fd):
+		ops->ring_fd = uops->ring_fd;
+		return 1;
+	}
+	return 0;
+}
+
+static int io_install_bpf(struct io_ring_ctx *ctx, struct io_uring_bpf_ops *ops)
+{
+	if (ctx->flags & (IORING_SETUP_SQPOLL | IORING_SETUP_IOPOLL))
+		return -EOPNOTSUPP;
+	if (!(ctx->flags & IORING_SETUP_DEFER_TASKRUN))
+		return -EOPNOTSUPP;
+
+	if (ctx->bpf_ops)
+		return -EBUSY;
+	if (WARN_ON_ONCE(!ops->loop_step))
+		return -EINVAL;
+
+	ops->priv = ctx;
+	ctx->bpf_ops = ops;
+	ctx->loop_step = ops->loop_step;
 	return 0;
 }
 
 static int bpf_io_reg(void *kdata, struct bpf_link *link)
 {
-	return -EOPNOTSUPP;
+	struct io_uring_bpf_ops *ops = kdata;
+	struct io_ring_ctx *ctx;
+	struct file *file;
+	int ret = -EBUSY;
+
+	file = io_uring_register_get_file(ops->ring_fd, false);
+	if (IS_ERR(file))
+		return PTR_ERR(file);
+	ctx = file->private_data;
+
+	scoped_guard(mutex, &io_bpf_ctrl_mutex) {
+		guard(mutex)(&ctx->uring_lock);
+		ret = io_install_bpf(ctx, ops);
+	}
+
+	fput(file);
+	return ret;
+}
+
+static void io_eject_bpf(struct io_ring_ctx *ctx)
+{
+	struct io_uring_bpf_ops *ops = ctx->bpf_ops;
+
+	if (WARN_ON_ONCE(!ops))
+		return;
+	if (WARN_ON_ONCE(ops->priv != ctx))
+		return;
+
+	ops->priv = NULL;
+	ctx->bpf_ops = NULL;
+	ctx->loop_step = NULL;
 }
 
 static void bpf_io_unreg(void *kdata, struct bpf_link *link)
 {
+	struct io_uring_bpf_ops *ops = kdata;
+	struct io_ring_ctx *ctx;
+
+	guard(mutex)(&io_bpf_ctrl_mutex);
+	ctx = ops->priv;
+	if (ctx) {
+		guard(mutex)(&ctx->uring_lock);
+		if (WARN_ON_ONCE(ctx->bpf_ops != ops))
+			return;
+
+		io_eject_bpf(ctx);
+	}
+}
+
+void io_unregister_bpf_ops(struct io_ring_ctx *ctx)
+{
+	/* check it first to avoid taking io_bpf_ctrl_mutex */
+	scoped_guard(mutex, &ctx->uring_lock) {
+		if (!ctx->bpf_ops)
+			return;
+	}
+
+	guard(mutex)(&io_bpf_ctrl_mutex);
+	guard(mutex)(&ctx->uring_lock);
+	if (ctx->bpf_ops)
+		io_eject_bpf(ctx);
 }
 
 static struct bpf_struct_ops bpf_ring_ops = {
diff --git a/io_uring/bpf-ops.h b/io_uring/bpf-ops.h
index b9e589ad519a..bf4d5b9bb8c9 100644
--- a/io_uring/bpf-ops.h
+++ b/io_uring/bpf-ops.h
@@ -17,4 +17,12 @@ struct io_uring_bpf_ops {
 	void *priv;
 };
 
+#ifdef CONFIG_IO_URING_BPF
+void io_unregister_bpf_ops(struct io_ring_ctx *ctx);
+#else
+static inline void io_unregister_bpf_ops(struct io_ring_ctx *ctx)
+{
+}
+#endif
+
 #endif /* IOU_BPF_OPS_H */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 09920e56c9c9..9d6eef7ccf22 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2141,6 +2141,7 @@ static __cold void io_req_caches_free(struct io_ring_ctx *ctx)
 
 static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 {
+	io_unregister_bpf_ops(ctx);
 	io_sq_thread_finish(ctx);
 
 	mutex_lock(&ctx->uring_lock);
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest
  2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
                   ` (4 preceding siblings ...)
  2026-01-27 10:14 ` [PATCH v4 5/6] io_uring/bpf-ops: add bpf struct ops registration Pavel Begunkov
@ 2026-01-27 10:14 ` Pavel Begunkov
  2026-01-27 17:32   ` Alexei Starovoitov
  5 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 10:14 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, bpf

Add a simple io_uring BPF selftest, where the BPF program implemented
in basic.bpf.c executes a given number of NOP requests at QD=1, writes
out some stats and returns. The Makefile is borrowed from the
sched_ext tests.
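
Note that the target is skipped by default together with bpf and
sched_ext, so it needs to be requested explicitly, e.g. with something
like:

	make -C tools/testing/selftests TARGETS=io_uring SKIP_TARGETS=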

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 tools/testing/selftests/Makefile             |   3 +-
 tools/testing/selftests/io_uring/Makefile    | 143 +++++++++++++++++++
 tools/testing/selftests/io_uring/basic.bpf.c | 116 +++++++++++++++
 tools/testing/selftests/io_uring/common.h    |   6 +
 tools/testing/selftests/io_uring/runner.c    | 107 ++++++++++++++
 tools/testing/selftests/io_uring/types.bpf.h | 131 +++++++++++++++++
 6 files changed, 505 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/io_uring/Makefile
 create mode 100644 tools/testing/selftests/io_uring/basic.bpf.c
 create mode 100644 tools/testing/selftests/io_uring/common.h
 create mode 100644 tools/testing/selftests/io_uring/runner.c
 create mode 100644 tools/testing/selftests/io_uring/types.bpf.h

diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 56e44a98d6a5..5e965ba3697c 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -130,6 +130,7 @@ TARGETS += vfio
 TARGETS += x86
 TARGETS += x86/bugs
 TARGETS += zram
+TARGETS += io_uring
 #Please keep the TARGETS list alphabetically sorted
 # Run "make quicktest=1 run_tests" or
 # "make quicktest=1 kselftest" from top level Makefile
@@ -147,7 +148,7 @@ endif
 # User can optionally provide a TARGETS skiplist. By default we skip
 # targets using BPF since it has cutting edge build time dependencies
 # which require more effort to install.
-SKIP_TARGETS ?= bpf sched_ext
+SKIP_TARGETS ?= bpf sched_ext io_uring
 ifneq ($(SKIP_TARGETS),)
 	TMP := $(filter-out $(SKIP_TARGETS), $(TARGETS))
 	override TARGETS := $(TMP)
diff --git a/tools/testing/selftests/io_uring/Makefile b/tools/testing/selftests/io_uring/Makefile
new file mode 100644
index 000000000000..edc8c83d4273
--- /dev/null
+++ b/tools/testing/selftests/io_uring/Makefile
@@ -0,0 +1,143 @@
+# SPDX-License-Identifier: GPL-2.0
+include ../../../build/Build.include
+include ../../../scripts/Makefile.arch
+include ../../../scripts/Makefile.include
+
+TEST_GEN_PROGS := runner
+
+# override lib.mk's default rules
+OVERRIDE_TARGETS := 1
+include ../lib.mk
+
+CURDIR := $(abspath .)
+REPOROOT := $(abspath ../../../..)
+TOOLSDIR := $(REPOROOT)/tools
+LIBDIR := $(TOOLSDIR)/lib
+BPFDIR := $(LIBDIR)/bpf
+TOOLSINCDIR := $(TOOLSDIR)/include
+BPFTOOLDIR := $(TOOLSDIR)/bpf/bpftool
+APIDIR := $(TOOLSINCDIR)/uapi
+GENDIR := $(REPOROOT)/include/generated
+GENHDR := $(GENDIR)/autoconf.h
+
+OUTPUT_DIR := $(OUTPUT)/build
+OBJ_DIR := $(OUTPUT_DIR)/obj
+INCLUDE_DIR := $(OUTPUT_DIR)/include
+BPFOBJ_DIR := $(OBJ_DIR)/libbpf
+IOUOBJ_DIR := $(OBJ_DIR)/io_uring
+LIBBPF_OUTPUT := $(OBJ_DIR)/libbpf/libbpf.a
+BPFOBJ := $(BPFOBJ_DIR)/libbpf.a
+
+DEFAULT_BPFTOOL := $(OUTPUT_DIR)/host/sbin/bpftool
+HOST_OBJ_DIR := $(OBJ_DIR)/host/bpftool
+HOST_LIBBPF_OUTPUT := $(OBJ_DIR)/host/libbpf/
+HOST_LIBBPF_DESTDIR := $(OUTPUT_DIR)/host/
+HOST_DESTDIR := $(OUTPUT_DIR)/host/
+
+BPFTOOL ?= $(DEFAULT_BPFTOOL)
+
+ifneq ($(wildcard $(GENHDR)),)
+  GENFLAGS := -DHAVE_GENHDR
+endif
+
+CFLAGS += -g -O2 -rdynamic -pthread -Wall -Werror $(GENFLAGS)			\
+	  -I$(INCLUDE_DIR) -I$(GENDIR) -I$(LIBDIR)				\
+	  -I$(TOOLSINCDIR) -I$(APIDIR) -I$(CURDIR)/include
+
+# Silence some warnings when compiled with clang
+ifneq ($(LLVM),)
+CFLAGS += -Wno-unused-command-line-argument
+endif
+
+LDFLAGS = -lelf -lz -lpthread -lzstd
+
+IS_LITTLE_ENDIAN = $(shell $(CC) -dM -E - </dev/null |				\
+			grep 'define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__')
+
+# Get Clang's default includes on this system, as opposed to those seen by
+# '-target bpf'. This fixes "missing" files on some architectures/distros,
+# such as asm/byteorder.h, asm/socket.h, asm/sockios.h, sys/cdefs.h etc.
+#
+# Use '-idirafter': Don't interfere with include mechanics except where the
+# build would have failed anyways.
+define get_sys_includes
+$(shell $(1) $(2) -v -E - </dev/null 2>&1 \
+	| sed -n '/<...> search starts here:/,/End of search list./{ s| \(/.*\)|-idirafter \1|p }') \
+$(shell $(1) $(2) -dM -E - </dev/null | grep '__riscv_xlen ' | awk '{printf("-D__riscv_xlen=%d -D__BITS_PER_LONG=%d", $$3, $$3)}')
+endef
+
+ifneq ($(CROSS_COMPILE),)
+CLANG_TARGET_ARCH = --target=$(notdir $(CROSS_COMPILE:%-=%))
+endif
+
+CLANG_SYS_INCLUDES = $(call get_sys_includes,$(CLANG),$(CLANG_TARGET_ARCH))
+
+BPF_CFLAGS = -g -D__TARGET_ARCH_$(SRCARCH)					\
+	     $(if $(IS_LITTLE_ENDIAN),-mlittle-endian,-mbig-endian)		\
+	     -I$(CURDIR)/include -I$(CURDIR)/include/bpf-compat			\
+	     -I$(INCLUDE_DIR) -I$(APIDIR) 	\
+	     -I$(REPOROOT)/include						\
+	     $(CLANG_SYS_INCLUDES) 						\
+	     -Wall -Wno-compare-distinct-pointer-types				\
+	     -Wno-incompatible-function-pointer-types				\
+	     -O2 -mcpu=v3
+
+# sort removes libbpf duplicates when not cross-building
+MAKE_DIRS := $(sort $(OBJ_DIR)/libbpf $(OBJ_DIR)/libbpf				\
+	       $(OBJ_DIR)/bpftool $(OBJ_DIR)/resolve_btfids			\
+	       $(HOST_OBJ_DIR) $(INCLUDE_DIR) $(IOUOBJ_DIR))
+
+$(MAKE_DIRS):
+	$(call msg,MKDIR,,$@)
+	$(Q)mkdir -p $@
+
+$(BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile)			\
+	   $(APIDIR)/linux/bpf.h						\
+	   | $(OBJ_DIR)/libbpf
+	$(Q)$(MAKE) $(submake_extras) -C $(BPFDIR) OUTPUT=$(OBJ_DIR)/libbpf/	\
+		    ARCH=$(ARCH) CC="$(CC)" CROSS_COMPILE=$(CROSS_COMPILE)	\
+		    EXTRA_CFLAGS='-g -O0 -fPIC'					\
+		    DESTDIR=$(OUTPUT_DIR) prefix= all install_headers
+
+$(DEFAULT_BPFTOOL): $(wildcard $(BPFTOOLDIR)/*.[ch] $(BPFTOOLDIR)/Makefile)	\
+		    $(LIBBPF_OUTPUT) | $(HOST_OBJ_DIR)
+	$(Q)$(MAKE) $(submake_extras)  -C $(BPFTOOLDIR)				\
+		    ARCH= CROSS_COMPILE= CC=$(HOSTCC) LD=$(HOSTLD)		\
+		    EXTRA_CFLAGS='-g -O0'					\
+		    OUTPUT=$(HOST_OBJ_DIR)/					\
+		    LIBBPF_OUTPUT=$(HOST_LIBBPF_OUTPUT)				\
+		    LIBBPF_DESTDIR=$(HOST_LIBBPF_DESTDIR)			\
+		    prefix= DESTDIR=$(HOST_DESTDIR) install-bin
+
+$(IOUOBJ_DIR)/%.bpf.o: %.bpf.c | $(BPFOBJ) $(IOUOBJ_DIR)
+	$(call msg,CLNG-BPF,,$(notdir $@))
+	$(Q)$(CLANG) $(BPF_CFLAGS) -target bpf -c $< -o $@
+
+$(INCLUDE_DIR)/%.bpf.skel.h: $(IOUOBJ_DIR)/%.bpf.o $(BPFTOOL) | $(INCLUDE_DIR)
+	$(eval sched=$(notdir $@))
+	$(call msg,GEN-SKEL,,$(sched))
+	$(Q)$(BPFTOOL) gen object $(<:.o=.linked1.o) $<
+	$(Q)$(BPFTOOL) gen object $(<:.o=.linked2.o) $(<:.o=.linked1.o)
+	$(Q)$(BPFTOOL) gen object $(<:.o=.linked3.o) $(<:.o=.linked2.o)
+	$(Q)diff $(<:.o=.linked2.o) $(<:.o=.linked3.o)
+	$(Q)$(BPFTOOL) gen skeleton $(<:.o=.linked3.o) name $(subst .bpf.skel.h,,$(sched)) > $@
+	$(Q)$(BPFTOOL) gen subskeleton $(<:.o=.linked3.o) name $(subst .bpf.skel.h,,$(sched)) > $(@:.skel.h=.subskel.h)
+
+override define CLEAN
+	rm -rf $(OUTPUT_DIR)
+	rm -f $(TEST_GEN_PROGS)
+endef
+
+all_test_bpfprogs := $(foreach prog,$(wildcard *.bpf.c),$(INCLUDE_DIR)/$(patsubst %.c,%.skel.h,$(prog)))
+
+$(IOUOBJ_DIR)/runner.o: runner.c $(all_test_bpfprogs) | $(IOUOBJ_DIR) $(BPFOBJ)
+	$(CC) $(CFLAGS) -c $< -o $@
+
+$(OUTPUT)/runner: $(IOUOBJ_DIR)/runner.o $(BPFOBJ)
+	$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
+
+.DEFAULT_GOAL := all
+
+.DELETE_ON_ERROR:
+
+.SECONDARY:
diff --git a/tools/testing/selftests/io_uring/basic.bpf.c b/tools/testing/selftests/io_uring/basic.bpf.c
new file mode 100644
index 000000000000..b2f6f3279090
--- /dev/null
+++ b/tools/testing/selftests/io_uring/basic.bpf.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#include <linux/types.h>
+#include <linux/stddef.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include "types.bpf.h"
+#include "common.h"
+
+extern int bpf_io_uring_submit_sqes(struct io_ring_ctx *ctx,
+				    unsigned int nr) __ksym;
+extern __u8 *bpf_io_uring_get_region(struct io_ring_ctx *ctx, __u32 region_id,
+				    const __u64 rdwr_buf_size) __ksym;
+
+static inline void io_update_cq_wait(struct iou_loop_params *lp,
+				     struct io_ring_hdr *cq_hdr,
+				     unsigned to_wait)
+{
+	lp->cq_wait_idx = cq_hdr->head + to_wait;
+}
+
+char LICENSE[] SEC("license") = "Dual BSD/GPL";
+int reqs_to_run;
+const volatile unsigned cq_hdr_offset;
+const volatile unsigned sq_hdr_offset;
+const volatile unsigned cqes_offset;
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 3);
+	__type(key, u32);
+	__type(value, s64);
+} res_map SEC(".maps");
+
+static inline void write_result(int res)
+{
+	u32 key = SLOT_RES;
+	u64 *val;
+
+	val = bpf_map_lookup_elem(&res_map, &key);
+	if (val)
+		*val = res;
+}
+
+static inline void inc_slot(int idx)
+{
+	u32 key = idx;
+	u64 *val;
+
+	val = bpf_map_lookup_elem(&res_map, &key);
+	if (val)
+		*val += 1;
+}
+
+SEC("struct_ops.s/link_loop")
+int BPF_PROG(link_loop, struct io_ring_ctx *ring, struct iou_loop_params *ls)
+{
+	struct io_ring_hdr *sq_hdr, *cq_hdr;
+	struct io_uring_cqe *cqes;
+	struct io_uring_sqe *sqes, *sqe;
+	void *rings;
+	int ret;
+
+	sqes = (void *)bpf_io_uring_get_region(ring, IOU_REGION_SQ,
+				SQ_ENTRIES * sizeof(struct io_uring_sqe));
+	rings = (void *)bpf_io_uring_get_region(ring, IOU_REGION_CQ,
+				cqes_offset + CQ_ENTRIES * sizeof(struct io_uring_cqe));
+	if (!rings || !sqes) {
+		write_result(-1);
+		return IOU_LOOP_STOP;
+	}
+
+	sq_hdr = rings + (sq_hdr_offset & 63);
+	cq_hdr = rings + (cq_hdr_offset & 63);
+	cqes = rings + cqes_offset;
+
+	if (cq_hdr->tail != cq_hdr->head) {
+		unsigned cq_mask = CQ_ENTRIES - 1;
+		struct io_uring_cqe *cqe = &cqes[cq_hdr->head++ & cq_mask];
+
+		if (cqe->user_data != reqs_to_run) {
+			write_result(-3);
+			return IOU_LOOP_STOP;
+		}
+
+		--reqs_to_run;
+		inc_slot(SLOT_NR_CQES);
+
+		if (reqs_to_run <= 0) {
+			write_result(1);
+			return IOU_LOOP_STOP;
+		}
+	}
+
+	sqe = &sqes[sq_hdr->tail & (SQ_ENTRIES - 1)];
+	*sqe = (struct io_uring_sqe){};
+	sqe->opcode = IORING_OP_NOP;
+	sqe->user_data = reqs_to_run;
+	sq_hdr->tail++;
+
+	ret = bpf_io_uring_submit_sqes(ring, 1);
+	if (ret != 1) {
+		write_result(-2);
+		return IOU_LOOP_STOP;
+	}
+
+	inc_slot(SLOT_NR_SQES);
+	io_update_cq_wait(ls, cq_hdr, 1);
+	return IOU_LOOP_CONTINUE;
+}
+
+SEC(".struct_ops")
+struct io_uring_bpf_ops basic_ops = {
+	.loop_step = (void *)link_loop,
+};
diff --git a/tools/testing/selftests/io_uring/common.h b/tools/testing/selftests/io_uring/common.h
new file mode 100644
index 000000000000..40e3182b8e5a
--- /dev/null
+++ b/tools/testing/selftests/io_uring/common.h
@@ -0,0 +1,6 @@
+#define CQ_ENTRIES 8
+#define SQ_ENTRIES 8
+
+#define SLOT_RES	0
+#define SLOT_NR_CQES	1
+#define SLOT_NR_SQES	2
diff --git a/tools/testing/selftests/io_uring/runner.c b/tools/testing/selftests/io_uring/runner.c
new file mode 100644
index 000000000000..5fc25ddc20e8
--- /dev/null
+++ b/tools/testing/selftests/io_uring/runner.c
@@ -0,0 +1,107 @@
+#include <linux/stddef.h>
+#include <errno.h>
+#include <signal.h>
+#include <stdlib.h>
+
+#include <bpf/libbpf.h>
+#include <io_uring/mini_liburing.h>
+
+#include "basic.bpf.skel.h"
+#include "common.h"
+
+static struct io_uring_params params;
+static struct basic *skel;
+static struct bpf_link *basic_link;
+
+#define NR_ITERS 10
+
+static void setup_ring(struct io_uring *ring)
+{
+	int ret;
+
+	memset(&params, 0, sizeof(params));
+	params.cq_entries = CQ_ENTRIES;
+	params.flags = IORING_SETUP_SINGLE_ISSUER |
+			IORING_SETUP_DEFER_TASKRUN |
+			IORING_SETUP_NO_SQARRAY |
+			IORING_SETUP_CQSIZE;
+
+	ret = io_uring_queue_init_params(SQ_ENTRIES, ring, &params);
+	if (ret) {
+		fprintf(stderr, "ring init failed\n");
+		exit(1);
+	}
+}
+
+static void setup_bpf_ops(struct io_uring *ring)
+{
+	int ret;
+
+	skel = basic__open();
+	if (!skel) {
+		fprintf(stderr, "can't generate skeleton\n");
+		exit(1);
+	}
+
+	skel->struct_ops.basic_ops->ring_fd = ring->ring_fd;
+	skel->bss->reqs_to_run = NR_ITERS;
+	skel->rodata->sq_hdr_offset = params.sq_off.head;
+	skel->rodata->cq_hdr_offset = params.cq_off.head;
+	skel->rodata->cqes_offset = params.cq_off.cqes;
+
+	ret = basic__load(skel);
+	if (ret) {
+		fprintf(stderr, "failed to load skeleton\n");
+		exit(1);
+	}
+
+	basic_link = bpf_map__attach_struct_ops(skel->maps.basic_ops);
+	if (!basic_link) {
+		fprintf(stderr, "failed to attach ops\n");
+		exit(1);
+	}
+}
+
+static void run_ring(struct io_uring *ring)
+{
+	__s64 res[3] = {};
+	int i, ret;
+
+	ret = io_uring_enter(ring->ring_fd, 0, 0, IORING_ENTER_GETEVENTS, NULL);
+	if (ret) {
+		fprintf(stderr, "run failed\n");
+		exit(1);
+	}
+
+	for (i = 0; i < 3; i++) {
+		__u32 key = i;
+
+		ret = bpf_map__lookup_elem(skel->maps.res_map,
+					&key, sizeof(key),
+					&res[i], sizeof(res[i]), 0);
+		if (ret)
+			fprintf(stderr, "can't read map idx %i: %i\n", i, ret);
+	}
+
+	if (res[SLOT_RES] != 1)
+		fprintf(stderr, "run failed: %i\n", (int)res[SLOT_RES]);
+	if (res[SLOT_NR_CQES] != NR_ITERS)
+		fprintf(stderr, "unexpected number of CQEs: %i\n",
+			(int)res[SLOT_NR_CQES]);
+	if (res[SLOT_NR_SQES] != NR_ITERS)
+		fprintf(stderr, "unexpected submitted number: %i\n",
+			(int)res[SLOT_NR_SQES]);
+}
+
+int main(void)
+{
+	struct io_uring ring;
+
+	setup_ring(&ring);
+	setup_bpf_ops(&ring);
+
+	run_ring(&ring);
+
+	bpf_link__destroy(basic_link);
+	basic__destroy(skel);
+	return 0;
+}
diff --git a/tools/testing/selftests/io_uring/types.bpf.h b/tools/testing/selftests/io_uring/types.bpf.h
new file mode 100644
index 000000000000..7a170cb2f388
--- /dev/null
+++ b/tools/testing/selftests/io_uring/types.bpf.h
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+#include <linux/types.h>
+#include <bpf/bpf_helpers.h>
+
+struct io_ring_ctx {
+};
+
+struct io_uring_sqe {
+	__u8	opcode;		/* type of operation for this sqe */
+	__u8	flags;		/* IOSQE_ flags */
+	__u16	ioprio;		/* ioprio for the request */
+	__s32	fd;		/* file descriptor to do IO on */
+	union {
+		__u64	off;	/* offset into file */
+		__u64	addr2;
+		struct {
+			__u32	cmd_op;
+			__u32	__pad1;
+		};
+	};
+	union {
+		__u64	addr;	/* pointer to buffer or iovecs */
+		__u64	splice_off_in;
+		struct {
+			__u32	level;
+			__u32	optname;
+		};
+	};
+	__u32	len;		/* buffer size or number of iovecs */
+	union {
+		__u32		fsync_flags;
+		__u16		poll_events;	/* compatibility */
+		__u32		poll32_events;	/* word-reversed for BE */
+		__u32		sync_range_flags;
+		__u32		msg_flags;
+		__u32		timeout_flags;
+		__u32		accept_flags;
+		__u32		cancel_flags;
+		__u32		open_flags;
+		__u32		statx_flags;
+		__u32		fadvise_advice;
+		__u32		splice_flags;
+		__u32		rename_flags;
+		__u32		unlink_flags;
+		__u32		hardlink_flags;
+		__u32		xattr_flags;
+		__u32		msg_ring_flags;
+		__u32		uring_cmd_flags;
+		__u32		waitid_flags;
+		__u32		futex_flags;
+		__u32		install_fd_flags;
+		__u32		nop_flags;
+		__u32		pipe_flags;
+	};
+	__u64	user_data;	/* data to be passed back at completion time */
+	/* pack this to avoid bogus arm OABI complaints */
+	union {
+		/* index into fixed buffers, if used */
+		__u16	buf_index;
+		/* for grouped buffer selection */
+		__u16	buf_group;
+	} __attribute__((packed));
+	/* personality to use, if used */
+	__u16	personality;
+	union {
+		__s32	splice_fd_in;
+		__u32	file_index;
+		__u32	zcrx_ifq_idx;
+		__u32	optlen;
+		struct {
+			__u16	addr_len;
+			__u16	__pad3[1];
+		};
+	};
+	union {
+		struct {
+			__u64	addr3;
+			__u64	__pad2[1];
+		};
+		struct {
+			__u64	attr_ptr; /* pointer to attribute information */
+			__u64	attr_type_mask; /* bit mask of attributes */
+		};
+		__u64	optval;
+		/*
+		 * If the ring is initialized with IORING_SETUP_SQE128, then
+		 * this field is used for 80 bytes of arbitrary command data
+		 */
+		__u8	cmd[0];
+	};
+};
+
+struct io_uring_cqe {
+	__u64	user_data;
+	__s32	res;
+	__u32	flags;
+};
+
+struct iou_loop_params {
+	__u32			cq_wait_idx;
+};
+
+enum {
+	IOU_LOOP_CONTINUE = 0,
+	IOU_LOOP_STOP,
+};
+
+enum {
+	IOU_REGION_MEM,
+	IOU_REGION_CQ,
+	IOU_REGION_SQ,
+};
+
+struct io_uring_bpf_ops {
+	int (*loop_step)(struct io_ring_ctx *ctx, struct iou_loop_params *lp);
+
+	__u32 ring_fd;
+};
+
+struct io_ring_hdr {
+	u32 head;
+	u32 tail;
+};
+
+enum io_uring_op {
+	IORING_OP_NOP,
+
+	/* this goes last, obviously */
+	IORING_OP_LAST,
+};
-- 
2.52.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest
  2026-01-27 10:14 ` [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest Pavel Begunkov
@ 2026-01-27 17:32   ` Alexei Starovoitov
  2026-01-27 18:42     ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2026-01-27 17:32 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, bpf

On Tue, Jan 27, 2026 at 2:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> index 000000000000..7a170cb2f388
> --- /dev/null
> +++ b/tools/testing/selftests/io_uring/types.bpf.h
> @@ -0,0 +1,131 @@
> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> +#include <linux/types.h>
> +#include <bpf/bpf_helpers.h>
> +
> +struct io_ring_ctx {
> +};
> +
> +struct io_uring_sqe {
> +       __u8    opcode;         /* type of operation for this sqe */
> +       __u8    flags;          /* IOSQE_ flags */
> +       __u16   ioprio;         /* ioprio for the request */
> +       __s32   fd;             /* file descriptor to do IO on */

1.
No need to copy paste. Just include vmlinux.h. It's there.

2.
drop KF_TRUSTED_ARGS from kfunc. It's a default now and this flag
was removed.

3.
add runtime logic to check that the return value is either IOU_LOOP_CONTINUE
or IOU_LOOP_STOP, or instruct the verifier to do it statically.
Otherwise it will be less convenient to extend to other commands,
since the way I read it IOU_LOOP_CONTINUE == 0 aliases to any value > 1.
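
Something along these lines in io_run_loop, for example (untested):

	step_res = ctx->loop_step(ctx, &ls.p);
	if (step_res == IOU_LOOP_STOP)
		break;
	if (step_res != IOU_LOOP_CONTINUE) {
		ret = -EINVAL;
		goto out_unlock;
	}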

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest
  2026-01-27 17:32   ` Alexei Starovoitov
@ 2026-01-27 18:42     ` Pavel Begunkov
  2026-01-27 18:53       ` Alexei Starovoitov
  0 siblings, 1 reply; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 18:42 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: io-uring, bpf

On 1/27/26 17:32, Alexei Starovoitov wrote:
> On Tue, Jan 27, 2026 at 2:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> index 000000000000..7a170cb2f388
>> --- /dev/null
>> +++ b/tools/testing/selftests/io_uring/types.bpf.h
>> @@ -0,0 +1,131 @@
>> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>> +#include <linux/types.h>
>> +#include <bpf/bpf_helpers.h>
>> +
>> +struct io_ring_ctx {
>> +};
>> +
>> +struct io_uring_sqe {
>> +       __u8    opcode;         /* type of operation for this sqe */
>> +       __u8    flags;          /* IOSQE_ flags */
>> +       __u16   ioprio;         /* ioprio for the request */
>> +       __s32   fd;             /* file descriptor to do IO on */
> 
> 1.
> No need to copy paste. Just include vmlinux.h. It's there.
> 
> 2.
> drop KF_TRUSTED_ARGS from kfunc. It's a default now and this flag
> was removed.

Got it, will change both, thanks

> 3.
> add runtime logic to check that the return value is either IOU_LOOP_CONTINUE
> or IOU_LOOP_STOP, or instruct the verifier to do it statically.
> Otherwise it will be less convenient to extend to other commands,
> since the way I read it IOU_LOOP_CONTINUE == 0 aliases to any value > 1.

Is there a struct_ops hook that can help with that? check_return_code()
has some hard-coded checks, but I can't find anything customizable for
struct_ops.

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest
  2026-01-27 18:42     ` Pavel Begunkov
@ 2026-01-27 18:53       ` Alexei Starovoitov
  2026-01-27 19:20         ` Pavel Begunkov
  0 siblings, 1 reply; 11+ messages in thread
From: Alexei Starovoitov @ 2026-01-27 18:53 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: io-uring, bpf

On Tue, Jan 27, 2026 at 10:42 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 1/27/26 17:32, Alexei Starovoitov wrote:
> > On Tue, Jan 27, 2026 at 2:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>
> >> index 000000000000..7a170cb2f388
> >> --- /dev/null
> >> +++ b/tools/testing/selftests/io_uring/types.bpf.h
> >> @@ -0,0 +1,131 @@
> >> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
> >> +#include <linux/types.h>
> >> +#include <bpf/bpf_helpers.h>
> >> +
> >> +struct io_ring_ctx {
> >> +};
> >> +
> >> +struct io_uring_sqe {
> >> +       __u8    opcode;         /* type of operation for this sqe */
> >> +       __u8    flags;          /* IOSQE_ flags */
> >> +       __u16   ioprio;         /* ioprio for the request */
> >> +       __s32   fd;             /* file descriptor to do IO on */
> >
> > 1.
> > No need to copy paste. Just include vmlinux.h. It's there.
> >
> > 2.
> > drop KF_TRUSTED_ARGS from kfunc. It's a default now and this flag
> > was removed.
>
> Got it, will change both, thanks
>
> > 3.
> > add runtime logic to check that the return value is either IOU_LOOP_CONTINUE
> > or IOU_LOOP_STOP, or instruct the verifier to do it statically.
> > Otherwise it will be less convenient to extend to other commands,
> > since the way I read it IOU_LOOP_CONTINUE == 0 aliases to any value > 1.
>
> Is there a struct_ops hook that can help with that? check_return_code()
> has some hard-coded checks, but I can't find anything customizable for
> struct_ops.

yep. check_return_code() is the one and you're correct. It's not
customizable atm. Much simpler to do a runtime check for now and
improve through the verifier support later.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest
  2026-01-27 18:53       ` Alexei Starovoitov
@ 2026-01-27 19:20         ` Pavel Begunkov
  0 siblings, 0 replies; 11+ messages in thread
From: Pavel Begunkov @ 2026-01-27 19:20 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: io-uring, bpf

On 1/27/26 18:53, Alexei Starovoitov wrote:
> On Tue, Jan 27, 2026 at 10:42 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 1/27/26 17:32, Alexei Starovoitov wrote:
>>> On Tue, Jan 27, 2026 at 2:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>
>>>> index 000000000000..7a170cb2f388
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/io_uring/types.bpf.h
>>>> @@ -0,0 +1,131 @@
>>>> +// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
>>>> +#include <linux/types.h>
>>>> +#include <bpf/bpf_helpers.h>
>>>> +
>>>> +struct io_ring_ctx {
>>>> +};
>>>> +
>>>> +struct io_uring_sqe {
>>>> +       __u8    opcode;         /* type of operation for this sqe */
>>>> +       __u8    flags;          /* IOSQE_ flags */
>>>> +       __u16   ioprio;         /* ioprio for the request */
>>>> +       __s32   fd;             /* file descriptor to do IO on */
>>>
>>> 1.
>>> No need to copy paste. Just include vmlinux.h. It's there.
>>>
>>> 2.
>>> drop KF_TRUSTED_ARGS from kfunc. It's a default now and this flag
>>> was removed.
>>
>> Got it, will change both, thanks
>>
>>> 3.
>>> add runtime logic to check that the return value is either IOU_LOOP_CONTINUE
>>> or IOU_LOOP_STOP, or instruct the verifier to do it statically.
>>> Otherwise it will be less convenient to extend to other commands,
>>> since the way I read it IOU_LOOP_CONTINUE == 0 aliases to any value > 1.
>>
>> Is there a struct_ops hook that can help with that? check_return_code()
>> has some hard-coded checks, but I can't find anything customizable for
>> struct_ops.
> 
> yep. check_return_code() is the one and you're correct. It's not
> customizable atm. Much simpler to do a runtime check for now and
> improve through the verifier support later.

Makes sense, thanks!

-- 
Pavel Begunkov


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-01-27 19:20 UTC | newest]

Thread overview: 11+ messages
-- links below jump to the message on this page --
2026-01-27 10:14 [PATCH v4 0/6] BPF controlled io_uring Pavel Begunkov
2026-01-27 10:14 ` [PATCH v4 1/6] io_uring: introduce callback driven main loop Pavel Begunkov
2026-01-27 10:14 ` [PATCH v4 2/6] io_uring/bpf-ops: add basic bpf struct_ops boilerplate Pavel Begunkov
2026-01-27 10:14 ` [PATCH v4 3/6] io_uring/bpf-ops: add loop_step struct_ops callback Pavel Begunkov
2026-01-27 10:14 ` [PATCH v4 4/6] io_uring/bpf-ops: add kfunc helpers Pavel Begunkov
2026-01-27 10:14 ` [PATCH v4 5/6] io_uring/bpf-ops: add bpf struct ops registration Pavel Begunkov
2026-01-27 10:14 ` [PATCH v4 6/6] selftests/io_uring: add a bpf io_uring selftest Pavel Begunkov
2026-01-27 17:32   ` Alexei Starovoitov
2026-01-27 18:42     ` Pavel Begunkov
2026-01-27 18:53       ` Alexei Starovoitov
2026-01-27 19:20         ` Pavel Begunkov
