* [PATCH 1/5] io_uring: prepare for extending io_uring with bpf
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
@ 2025-11-04 16:21 ` Ming Lei
2025-11-04 16:21 ` [PATCH 2/5] io_uring: bpf: add io_uring_ctx setup for BPF into one list Ming Lei
` (4 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-11-04 16:21 UTC (permalink / raw)
To: Jens Axboe, io-uring
Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Alexei Starovoitov,
Ming Lei
Add a new bpf operation and its supporting framework, preparing for
extending io_uring with bpf.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
include/uapi/linux/io_uring.h | 1 +
init/Kconfig | 7 +++++++
io_uring/Makefile | 1 +
io_uring/bpf.c | 26 ++++++++++++++++++++++++++
io_uring/opdef.c | 10 ++++++++++
io_uring/uring_bpf.h | 26 ++++++++++++++++++++++++++
6 files changed, 71 insertions(+)
create mode 100644 io_uring/bpf.c
create mode 100644 io_uring/uring_bpf.h
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 04797a9b76bc..b167c1d4ce6e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -303,6 +303,7 @@ enum io_uring_op {
IORING_OP_PIPE,
IORING_OP_NOP128,
IORING_OP_URING_CMD128,
+ IORING_OP_BPF,
/* this goes last, obviously */
IORING_OP_LAST,
diff --git a/init/Kconfig b/init/Kconfig
index cab3ad28ca49..14d566516643 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1843,6 +1843,13 @@ config IO_URING
applications to submit and complete IO through submission and
completion rings that are shared between the kernel and application.
+config IO_URING_BPF
+ bool "Enable io_uring bpf extension" if EXPERT
+ depends on IO_URING && BPF
+ help
+ This option enables the bpf extension for the io_uring interface, so
+ applications can define their own io_uring operations via bpf programs.
+
config GCOV_PROFILE_URING
bool "Enable GCOV profiling on the io_uring subsystem"
depends on IO_URING && GCOV_KERNEL
diff --git a/io_uring/Makefile b/io_uring/Makefile
index bc4e4a3fa0a5..35eeeaf64489 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
obj-$(CONFIG_NET) += net.o cmd_net.o
obj-$(CONFIG_PROC_FS) += fdinfo.o
obj-$(CONFIG_IO_URING_MOCK_FILE) += mock_file.o
+obj-$(CONFIG_IO_URING_BPF) += bpf.o
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
new file mode 100644
index 000000000000..8c47df13c7b5
--- /dev/null
+++ b/io_uring/bpf.c
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024 Red Hat */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <uapi/linux/io_uring.h>
+#include "io_uring.h"
+#include "uring_bpf.h"
+
+int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+{
+ return -ECANCELED;
+}
+
+int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ return -EOPNOTSUPP;
+}
+
+void io_uring_bpf_fail(struct io_kiocb *req)
+{
+}
+
+void io_uring_bpf_cleanup(struct io_kiocb *req)
+{
+}
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index df52d760240e..d93ee3d577d4 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -38,6 +38,7 @@
#include "futex.h"
#include "truncate.h"
#include "zcrx.h"
+#include "uring_bpf.h"
static int io_no_issue(struct io_kiocb *req, unsigned int issue_flags)
{
@@ -593,6 +594,10 @@ const struct io_issue_def io_issue_defs[] = {
.prep = io_uring_cmd_prep,
.issue = io_uring_cmd,
},
+ [IORING_OP_BPF] = {
+ .prep = io_uring_bpf_prep,
+ .issue = io_uring_bpf_issue,
+ },
};
const struct io_cold_def io_cold_defs[] = {
@@ -851,6 +856,11 @@ const struct io_cold_def io_cold_defs[] = {
.sqe_copy = io_uring_cmd_sqe_copy,
.cleanup = io_uring_cmd_cleanup,
},
+ [IORING_OP_BPF] = {
+ .name = "BPF",
+ .cleanup = io_uring_bpf_cleanup,
+ .fail = io_uring_bpf_fail,
+ },
};
const char *io_uring_get_opcode(u8 opcode)
diff --git a/io_uring/uring_bpf.h b/io_uring/uring_bpf.h
new file mode 100644
index 000000000000..bde774ce6ac0
--- /dev/null
+++ b/io_uring/uring_bpf.h
@@ -0,0 +1,26 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef IOU_BPF_H
+#define IOU_BPF_H
+
+#ifdef CONFIG_IO_URING_BPF
+int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
+int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+void io_uring_bpf_fail(struct io_kiocb *req);
+void io_uring_bpf_cleanup(struct io_kiocb *req);
+#else
+static inline int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+{
+ return -ECANCELED;
+}
+static inline int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+ return -EOPNOTSUPP;
+}
+static inline void io_uring_bpf_fail(struct io_kiocb *req)
+{
+}
+static inline void io_uring_bpf_cleanup(struct io_kiocb *req)
+{
+}
+#endif
+#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread

* [PATCH 2/5] io_uring: bpf: add io_uring_ctx setup for BPF into one list
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
2025-11-04 16:21 ` [PATCH 1/5] io_uring: prepare for extending io_uring with bpf Ming Lei
@ 2025-11-04 16:21 ` Ming Lei
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-11-04 16:21 UTC (permalink / raw)
To: Jens Axboe, io-uring
Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Alexei Starovoitov,
Ming Lei
Add each io_uring_ctx that is set up for BPF to a global list, preparing
for synchronizing bpf struct_ops registration and unregistration.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
include/linux/io_uring_types.h | 5 +++++
include/uapi/linux/io_uring.h | 5 +++++
io_uring/bpf.c | 15 +++++++++++++++
io_uring/io_uring.c | 7 +++++++
io_uring/io_uring.h | 3 ++-
io_uring/uring_bpf.h | 11 +++++++++++
6 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 92780764d5fa..d2e098c3fd2c 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -465,6 +465,11 @@ struct io_ring_ctx {
struct io_mapped_region ring_region;
/* used for optimised request parameter and wait argument passing */
struct io_mapped_region param_region;
+
+#ifdef CONFIG_IO_URING_BPF
+ /* added to uring_bpf_ctx_list */
+ struct list_head bpf_node;
+#endif
};
/*
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index b167c1d4ce6e..b8c49813b4e5 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -237,6 +237,11 @@ enum io_uring_sqe_flags_bit {
*/
#define IORING_SETUP_SQE_MIXED (1U << 19)
+/*
+ * Allow submitting bpf IO
+ */
+#define IORING_SETUP_BPF (1U << 20)
+
enum io_uring_op {
IORING_OP_NOP,
IORING_OP_READV,
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
index 8c47df13c7b5..bb1e37d1e804 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -7,6 +7,9 @@
#include "io_uring.h"
#include "uring_bpf.h"
+static DEFINE_MUTEX(uring_bpf_ctx_lock);
+static LIST_HEAD(uring_bpf_ctx_list);
+
int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
{
return -ECANCELED;
@@ -24,3 +27,15 @@ void io_uring_bpf_fail(struct io_kiocb *req)
void io_uring_bpf_cleanup(struct io_kiocb *req)
{
}
+
+void uring_bpf_add_ctx(struct io_ring_ctx *ctx)
+{
+ guard(mutex)(&uring_bpf_ctx_lock);
+ list_add(&ctx->bpf_node, &uring_bpf_ctx_list);
+}
+
+void uring_bpf_del_ctx(struct io_ring_ctx *ctx)
+{
+ guard(mutex)(&uring_bpf_ctx_lock);
+ list_del(&ctx->bpf_node);
+}
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3f0489261d11..38f03f6c28cb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -105,6 +105,7 @@
#include "rw.h"
#include "alloc_cache.h"
#include "eventfd.h"
+#include "uring_bpf.h"
#define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
IOSQE_IO_HARDLINK | IOSQE_ASYNC)
@@ -352,6 +353,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
io_napi_init(ctx);
mutex_init(&ctx->mmap_lock);
+ if (ctx->flags & IORING_SETUP_BPF)
+ uring_bpf_add_ctx(ctx);
+
return ctx;
free_ref:
@@ -2855,6 +2859,9 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
static_branch_dec(&io_key_has_sqarray);
+ if (ctx->flags & IORING_SETUP_BPF)
+ uring_bpf_del_ctx(ctx);
+
percpu_ref_exit(&ctx->refs);
free_uid(ctx->user);
io_req_caches_free(ctx);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 23c268ab1c8f..4baf21a9e1ee 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -55,7 +55,8 @@
IORING_SETUP_NO_SQARRAY |\
IORING_SETUP_HYBRID_IOPOLL |\
IORING_SETUP_CQE_MIXED |\
- IORING_SETUP_SQE_MIXED)
+ IORING_SETUP_SQE_MIXED |\
+ IORING_SETUP_BPF)
#define IORING_ENTER_FLAGS (IORING_ENTER_GETEVENTS |\
IORING_ENTER_SQ_WAKEUP |\
diff --git a/io_uring/uring_bpf.h b/io_uring/uring_bpf.h
index bde774ce6ac0..b6cda6df99b1 100644
--- a/io_uring/uring_bpf.h
+++ b/io_uring/uring_bpf.h
@@ -7,6 +7,10 @@ int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
void io_uring_bpf_fail(struct io_kiocb *req);
void io_uring_bpf_cleanup(struct io_kiocb *req);
+
+void uring_bpf_add_ctx(struct io_ring_ctx *ctx);
+void uring_bpf_del_ctx(struct io_ring_ctx *ctx);
+
#else
static inline int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
{
@@ -22,5 +26,12 @@ static inline void io_uring_bpf_fail(struct io_kiocb *req)
static inline void io_uring_bpf_cleanup(struct io_kiocb *req)
{
}
+
+static inline void uring_bpf_add_ctx(struct io_ring_ctx *ctx)
+{
+}
+static inline void uring_bpf_del_ctx(struct io_ring_ctx *ctx)
+{
+}
#endif
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread

* [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
2025-11-04 16:21 ` [PATCH 1/5] io_uring: prepare for extending io_uring with bpf Ming Lei
2025-11-04 16:21 ` [PATCH 2/5] io_uring: bpf: add io_uring_ctx setup for BPF into one list Ming Lei
@ 2025-11-04 16:21 ` Ming Lei
2025-11-07 19:02 ` kernel test robot
2025-11-08 6:53 ` kernel test robot
2025-11-04 16:21 ` [PATCH 4/5] io_uring: bpf: add buffer support for IORING_OP_BPF Ming Lei
` (2 subsequent siblings)
5 siblings, 2 replies; 14+ messages in thread
From: Ming Lei @ 2025-11-04 16:21 UTC (permalink / raw)
To: Jens Axboe, io-uring
Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Alexei Starovoitov,
Ming Lei
io_uring can be extended with bpf struct_ops in the following ways:
1) add new io_uring operations from the application
- one typical use case is operating on device zero-copy buffers, which
belong to the kernel and are either not visible to userspace or too
expensive to export; examples include copying data from such a buffer to
userspace, decompressing data into the zero-copy buffer in the Android
case[1][2], or checksumming/decrypting it.
[1] https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf
2) extend the 64-byte SQE, since a bpf map can conveniently store extra
IO data
3) communicate within an IO chain, since a bpf map can be shared among
IOs: when one bpf IO completes, its data can be written to a chain-wide
bpf map, and the following bpf IO can retrieve it from that map; this is
more flexible than io_uring's built-in buffers
4) inject errors conveniently for test purposes
bpf struct_ops is a handy way to attach bpf programs to the kernel, and
this patch simply wires the existing io_uring operation callbacks to the
new uring bpf struct_ops, so an application can define its own uring bpf
operations.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
include/uapi/linux/io_uring.h | 9 ++
io_uring/bpf.c | 271 +++++++++++++++++++++++++++++++++-
io_uring/io_uring.c | 1 +
io_uring/io_uring.h | 3 +-
io_uring/uring_bpf.h | 30 ++++
5 files changed, 311 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index b8c49813b4e5..94d2050131ac 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -74,6 +74,7 @@ struct io_uring_sqe {
__u32 install_fd_flags;
__u32 nop_flags;
__u32 pipe_flags;
+ __u32 bpf_op_flags;
};
__u64 user_data; /* data to be passed back at completion time */
/* pack this to avoid bogus arm OABI complaints */
@@ -427,6 +428,13 @@ enum io_uring_op {
#define IORING_RECVSEND_BUNDLE (1U << 4)
#define IORING_SEND_VECTORIZED (1U << 5)
+/*
+ * The top 8 bits of sqe->bpf_op_flags store the bpf op;
+ * the other 24 bits are passed through to the bpf prog
+ */
+#define IORING_BPF_OP_BITS (8)
+#define IORING_BPF_OP_SHIFT (24)
+
/*
* cqe.res for IORING_CQE_F_NOTIF if
* IORING_SEND_ZC_REPORT_USAGE was requested
@@ -631,6 +639,7 @@ struct io_uring_params {
#define IORING_FEAT_MIN_TIMEOUT (1U << 15)
#define IORING_FEAT_RW_ATTR (1U << 16)
#define IORING_FEAT_NO_IOWAIT (1U << 17)
+#define IORING_FEAT_BPF (1U << 18)
/*
* io_uring_register(2) opcodes and arguments
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
index bb1e37d1e804..8227be6d5a10 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -4,28 +4,95 @@
#include <linux/kernel.h>
#include <linux/errno.h>
#include <uapi/linux/io_uring.h>
+#include <linux/init.h>
+#include <linux/types.h>
+#include <linux/bpf_verifier.h>
+#include <linux/bpf.h>
+#include <linux/btf.h>
+#include <linux/btf_ids.h>
+#include <linux/filter.h>
#include "io_uring.h"
#include "uring_bpf.h"
+#define MAX_BPF_OPS_COUNT (1 << IORING_BPF_OP_BITS)
+
static DEFINE_MUTEX(uring_bpf_ctx_lock);
static LIST_HEAD(uring_bpf_ctx_list);
+DEFINE_STATIC_SRCU(uring_bpf_srcu);
+static struct uring_bpf_ops bpf_ops[MAX_BPF_OPS_COUNT];
-int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+static inline unsigned char uring_bpf_get_op(unsigned int op_flags)
{
- return -ECANCELED;
+ return (unsigned char)(op_flags >> IORING_BPF_OP_SHIFT);
+}
+
+static inline unsigned int uring_bpf_get_flags(unsigned int op_flags)
+{
+ return op_flags & ((1U << IORING_BPF_OP_SHIFT) - 1);
+}
+
+static inline struct uring_bpf_ops *uring_bpf_get_ops(struct uring_bpf_data *data)
+{
+ return &bpf_ops[uring_bpf_get_op(data->opf)];
}
int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
+ struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+ unsigned int op_flags = READ_ONCE(sqe->bpf_op_flags);
+ struct uring_bpf_ops *ops;
+
+ if (!(req->ctx->flags & IORING_SETUP_BPF))
+ return -EACCES;
+
+ data->opf = op_flags;
+ ops = &bpf_ops[uring_bpf_get_op(data->opf)];
+
+ if (ops->prep_fn)
+ return ops->prep_fn(data, sqe);
return -EOPNOTSUPP;
}
+static int __io_uring_bpf_issue(struct io_kiocb *req)
+{
+ struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+ struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
+
+ if (ops->issue_fn)
+ return ops->issue_fn(data);
+ return -ECANCELED;
+}
+
+int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
+{
+ if (issue_flags & IO_URING_F_UNLOCKED) {
+ int idx, ret;
+
+ idx = srcu_read_lock(&uring_bpf_srcu);
+ ret = __io_uring_bpf_issue(req);
+ srcu_read_unlock(&uring_bpf_srcu, idx);
+
+ return ret;
+ }
+ return __io_uring_bpf_issue(req);
+}
+
void io_uring_bpf_fail(struct io_kiocb *req)
{
+ struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+ struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
+
+ if (ops->fail_fn)
+ ops->fail_fn(data);
}
void io_uring_bpf_cleanup(struct io_kiocb *req)
{
+ struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+ struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
+
+ if (ops->cleanup_fn)
+ ops->cleanup_fn(data);
}
void uring_bpf_add_ctx(struct io_ring_ctx *ctx)
@@ -39,3 +106,203 @@ void uring_bpf_del_ctx(struct io_ring_ctx *ctx)
guard(mutex)(&uring_bpf_ctx_lock);
list_del(&ctx->bpf_node);
}
+
+static const struct btf_type *uring_bpf_data_type;
+
+static bool uring_bpf_ops_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
+}
+
+static int uring_bpf_ops_btf_struct_access(struct bpf_verifier_log *log,
+ const struct bpf_reg_state *reg,
+ int off, int size)
+{
+ const struct btf_type *t;
+
+ t = btf_type_by_id(reg->btf, reg->btf_id);
+ if (t != uring_bpf_data_type) {
+ bpf_log(log, "only read is supported\n");
+ return -EACCES;
+ }
+
+ if (off < offsetof(struct uring_bpf_data, pdu) ||
+ off + size > sizeof(struct uring_bpf_data))
+ return -EACCES;
+
+ return NOT_INIT;
+}
+
+static const struct bpf_verifier_ops io_bpf_verifier_ops = {
+ .get_func_proto = bpf_base_func_proto,
+ .is_valid_access = uring_bpf_ops_is_valid_access,
+ .btf_struct_access = uring_bpf_ops_btf_struct_access,
+};
+
+static int uring_bpf_ops_init(struct btf *btf)
+{
+ s32 type_id;
+
+ type_id = btf_find_by_name_kind(btf, "uring_bpf_data", BTF_KIND_STRUCT);
+ if (type_id < 0)
+ return -EINVAL;
+ uring_bpf_data_type = btf_type_by_id(btf, type_id);
+ return 0;
+}
+
+static int uring_bpf_ops_check_member(const struct btf_type *t,
+ const struct btf_member *member,
+ const struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static int uring_bpf_ops_init_member(const struct btf_type *t,
+ const struct btf_member *member,
+ void *kdata, const void *udata)
+{
+ const struct uring_bpf_ops *uuring_bpf_ops;
+ struct uring_bpf_ops *kuring_bpf_ops;
+ u32 moff;
+
+ uuring_bpf_ops = (const struct uring_bpf_ops *)udata;
+ kuring_bpf_ops = (struct uring_bpf_ops *)kdata;
+
+ moff = __btf_member_bit_offset(t, member) / 8;
+
+ switch (moff) {
+ case offsetof(struct uring_bpf_ops, id):
+ /* For 'id', this function has to copy it and return 1 to
+ * indicate that the data has been handled by the struct_ops
+ * type, or the verifier will reject the map if the value of
+ * this field is not zero.
+ */
+ kuring_bpf_ops->id = uuring_bpf_ops->id;
+ return 1;
+ }
+ return 0;
+}
+
+static int io_bpf_reg_unreg(struct uring_bpf_ops *ops, bool reg)
+{
+ struct io_ring_ctx *ctx;
+ int ret = 0;
+
+ guard(mutex)(&uring_bpf_ctx_lock);
+ list_for_each_entry(ctx, &uring_bpf_ctx_list, bpf_node)
+ mutex_lock(&ctx->uring_lock);
+
+ if (reg) {
+ if (bpf_ops[ops->id].issue_fn)
+ ret = -EBUSY;
+ else
+ bpf_ops[ops->id] = *ops;
+ } else {
+ bpf_ops[ops->id] = (struct uring_bpf_ops) {0};
+ }
+
+ synchronize_srcu(&uring_bpf_srcu);
+
+ list_for_each_entry(ctx, &uring_bpf_ctx_list, bpf_node)
+ mutex_unlock(&ctx->uring_lock);
+
+ return ret;
+}
+
+static int io_bpf_reg(void *kdata, struct bpf_link *link)
+{
+ struct uring_bpf_ops *ops = kdata;
+
+ return io_bpf_reg_unreg(ops, true);
+}
+
+static void io_bpf_unreg(void *kdata, struct bpf_link *link)
+{
+ struct uring_bpf_ops *ops = kdata;
+
+ io_bpf_reg_unreg(ops, false);
+}
+
+static int io_bpf_prep_io(struct uring_bpf_data *data, const struct io_uring_sqe *sqe)
+{
+ return -EOPNOTSUPP;
+}
+
+static int io_bpf_issue_io(struct uring_bpf_data *data)
+{
+ return -ECANCELED;
+}
+
+static void io_bpf_fail_io(struct uring_bpf_data *data)
+{
+}
+
+static void io_bpf_cleanup_io(struct uring_bpf_data *data)
+{
+}
+
+static struct uring_bpf_ops __bpf_uring_bpf_ops = {
+ .prep_fn = io_bpf_prep_io,
+ .issue_fn = io_bpf_issue_io,
+ .fail_fn = io_bpf_fail_io,
+ .cleanup_fn = io_bpf_cleanup_io,
+};
+
+static struct bpf_struct_ops bpf_uring_bpf_ops = {
+ .verifier_ops = &io_bpf_verifier_ops,
+ .init = uring_bpf_ops_init,
+ .check_member = uring_bpf_ops_check_member,
+ .init_member = uring_bpf_ops_init_member,
+ .reg = io_bpf_reg,
+ .unreg = io_bpf_unreg,
+ .name = "uring_bpf_ops",
+ .cfi_stubs = &__bpf_uring_bpf_ops,
+ .owner = THIS_MODULE,
+};
+
+__bpf_kfunc_start_defs();
+__bpf_kfunc void uring_bpf_set_result(struct uring_bpf_data *data, int res)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(data);
+
+ if (res < 0)
+ req_set_fail(req);
+ io_req_set_res(req, res, 0);
+}
+
+/* io_kiocb layout might be changed */
+__bpf_kfunc struct io_kiocb *uring_bpf_data_to_req(struct uring_bpf_data *data)
+{
+ return cmd_to_io_kiocb(data);
+}
+__bpf_kfunc_end_defs();
+
+BTF_KFUNCS_START(uring_bpf_kfuncs)
+BTF_ID_FLAGS(func, uring_bpf_set_result)
+BTF_ID_FLAGS(func, uring_bpf_data_to_req)
+BTF_KFUNCS_END(uring_bpf_kfuncs)
+
+static const struct btf_kfunc_id_set uring_kfunc_set = {
+ .owner = THIS_MODULE,
+ .set = &uring_bpf_kfuncs,
+};
+
+int __init io_bpf_init(void)
+{
+ int err;
+
+ err = register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS, &uring_kfunc_set);
+ if (err) {
+ pr_warn("error while registering io_uring bpf kfuncs: %d\n", err);
+ return err;
+ }
+
+ err = register_bpf_struct_ops(&bpf_uring_bpf_ops, uring_bpf_ops);
+ if (err)
+ pr_warn("error while registering io_uring bpf struct ops: %d\n", err);
+
+ return 0;
+}
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 38f03f6c28cb..d2517e09407a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3851,6 +3851,7 @@ static int __init io_uring_init(void)
register_sysctl_init("kernel", kernel_io_uring_disabled_table);
#endif
+ io_bpf_init();
return 0;
};
__initcall(io_uring_init);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 4baf21a9e1ee..3f19bb079bcc 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -34,7 +34,8 @@
IORING_FEAT_RECVSEND_BUNDLE |\
IORING_FEAT_MIN_TIMEOUT |\
IORING_FEAT_RW_ATTR |\
- IORING_FEAT_NO_IOWAIT)
+ IORING_FEAT_NO_IOWAIT |\
+ IORING_FEAT_BPF)
#define IORING_SETUP_FLAGS (IORING_SETUP_IOPOLL |\
IORING_SETUP_SQPOLL |\
diff --git a/io_uring/uring_bpf.h b/io_uring/uring_bpf.h
index b6cda6df99b1..c76eba887d22 100644
--- a/io_uring/uring_bpf.h
+++ b/io_uring/uring_bpf.h
@@ -2,6 +2,29 @@
#ifndef IOU_BPF_H
#define IOU_BPF_H
+struct uring_bpf_data {
+ /* readonly for bpf prog */
+ struct file *file;
+ u32 opf;
+
+ /* writeable for bpf prog */
+ u8 pdu[64 - sizeof(struct file *) - sizeof(u32)];
+};
+
+typedef int (*uring_io_prep_t)(struct uring_bpf_data *data,
+ const struct io_uring_sqe *sqe);
+typedef int (*uring_io_issue_t)(struct uring_bpf_data *data);
+typedef void (*uring_io_fail_t)(struct uring_bpf_data *data);
+typedef void (*uring_io_cleanup_t)(struct uring_bpf_data *data);
+
+struct uring_bpf_ops {
+ unsigned short id;
+ uring_io_prep_t prep_fn;
+ uring_io_issue_t issue_fn;
+ uring_io_fail_t fail_fn;
+ uring_io_cleanup_t cleanup_fn;
+};
+
#ifdef CONFIG_IO_URING_BPF
int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags);
int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
@@ -11,6 +34,8 @@ void io_uring_bpf_cleanup(struct io_kiocb *req);
void uring_bpf_add_ctx(struct io_ring_ctx *ctx);
void uring_bpf_del_ctx(struct io_ring_ctx *ctx);
+int __init io_bpf_init(void);
+
#else
static inline int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
{
@@ -33,5 +58,10 @@ static inline void uring_bpf_add_ctx(struct io_ring_ctx *ctx)
static inline void uring_bpf_del_ctx(struct io_ring_ctx *ctx)
{
}
+
+static inline int __init io_bpf_init(void)
+{
+ return 0;
+}
#endif
#endif
--
2.47.0
^ permalink raw reply related [flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
@ 2025-11-07 19:02 ` kernel test robot
2025-11-08 6:53 ` kernel test robot
1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-11-07 19:02 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, io-uring
Cc: oe-kbuild-all, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov, Ming Lei
Hi Ming,
kernel test robot noticed the following build errors:
[auto build test ERROR on next-20251104]
[cannot apply to bpf-next/net bpf-next/master bpf/master linus/master v6.18-rc4 v6.18-rc3 v6.18-rc2 v6.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/io_uring-prepare-for-extending-io_uring-with-bpf/20251105-002757
base: next-20251104
patch link: https://lore.kernel.org/r/20251104162123.1086035-4-ming.lei%40redhat.com
patch subject: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
config: parisc-randconfig-r121-20251107 (https://download.01.org/0day-ci/archive/20251108/202511080257.ZRnK6f2W-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251108/202511080257.ZRnK6f2W-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511080257.ZRnK6f2W-lkp@intel.com/
All errors (new ones prefixed by >>):
io_uring/bpf.c: In function 'uring_bpf_ops_is_valid_access':
>> io_uring/bpf.c:117:9: error: implicit declaration of function 'bpf_tracing_btf_ctx_access'; did you mean 'bpf_sock_convert_ctx_access'? [-Werror=implicit-function-declaration]
return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
^~~~~~~~~~~~~~~~~~~~~~~~~~
bpf_sock_convert_ctx_access
In file included from include/linux/bpf_verifier.h:7,
from io_uring/bpf.c:9:
io_uring/bpf.c: In function 'io_bpf_init':
include/linux/bpf.h:2044:50: warning: statement with no effect [-Wunused-value]
#define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; })
^~~~~~~~~~~~~~~~
io_uring/bpf.c:303:8: note: in expansion of macro 'register_bpf_struct_ops'
err = register_bpf_struct_ops(&bpf_uring_bpf_ops, uring_bpf_ops);
^~~~~~~~~~~~~~~~~~~~~~~
In file included from <command-line>:
In function 'io_kiocb_cmd_sz_check',
inlined from 'io_uring_bpf_prep' at io_uring/bpf.c:41:32:
include/linux/compiler_types.h:603:38: error: call to '__compiletime_assert_513' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:584:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^~~~~~
include/linux/compiler_types.h:603:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
include/linux/io_uring_types.h:655:2: note: in expansion of macro 'BUILD_BUG_ON'
BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
^~~~~~~~~~~~
In function 'io_kiocb_cmd_sz_check',
inlined from '__io_uring_bpf_issue.isra.4' at io_uring/bpf.c:58:32,
inlined from 'io_uring_bpf_issue' at io_uring/bpf.c:72:9:
include/linux/compiler_types.h:603:38: error: call to '__compiletime_assert_513' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:584:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^~~~~~
include/linux/compiler_types.h:603:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
include/linux/io_uring_types.h:655:2: note: in expansion of macro 'BUILD_BUG_ON'
BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
^~~~~~~~~~~~
In function 'io_kiocb_cmd_sz_check',
inlined from '__io_uring_bpf_issue.isra.4' at io_uring/bpf.c:58:32,
inlined from 'io_uring_bpf_issue' at io_uring/bpf.c:77:9:
include/linux/compiler_types.h:603:38: error: call to '__compiletime_assert_513' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:584:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^~~~~~
include/linux/compiler_types.h:603:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
include/linux/io_uring_types.h:655:2: note: in expansion of macro 'BUILD_BUG_ON'
BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
^~~~~~~~~~~~
In function 'io_kiocb_cmd_sz_check',
inlined from 'io_uring_bpf_fail' at io_uring/bpf.c:82:32:
include/linux/compiler_types.h:603:38: error: call to '__compiletime_assert_513' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:584:4: note: in definition of macro '__compiletime_assert'
prefix ## suffix(); \
^~~~~~
include/linux/compiler_types.h:603:2: note: in expansion of macro '_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:2: note: in expansion of macro 'BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
^~~~~~~~~~~~~~~~
include/linux/io_uring_types.h:655:2: note: in expansion of macro 'BUILD_BUG_ON'
BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
^~~~~~~~~~~~
include/linux/io_uring_types.h: In function 'io_uring_bpf_cleanup':
include/linux/compiler_types.h:603:38: error: call to '__compiletime_assert_513' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:584:4: note: in definition of macro '__compiletime_assert'
vim +117 io_uring/bpf.c
111
112 static bool uring_bpf_ops_is_valid_access(int off, int size,
113 enum bpf_access_type type,
114 const struct bpf_prog *prog,
115 struct bpf_insn_access_aux *info)
116 {
> 117 return bpf_tracing_btf_ctx_access(off, size, type, prog, info);
118 }
119
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread

* Re: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
2025-11-07 19:02 ` kernel test robot
@ 2025-11-08 6:53 ` kernel test robot
1 sibling, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-11-08 6:53 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, io-uring
Cc: llvm, oe-kbuild-all, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov, Ming Lei
Hi Ming,
kernel test robot noticed the following build errors:
[auto build test ERROR on next-20251104]
[cannot apply to bpf-next/net bpf-next/master bpf/master linus/master v6.18-rc4 v6.18-rc3 v6.18-rc2 v6.18-rc4]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/io_uring-prepare-for-extending-io_uring-with-bpf/20251105-002757
base: next-20251104
patch link: https://lore.kernel.org/r/20251104162123.1086035-4-ming.lei%40redhat.com
patch subject: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
:::::: branch date: 3 days ago
:::::: commit date: 3 days ago
config: hexagon-allmodconfig (https://download.01.org/0day-ci/archive/20251108/202511080004.DkwIEtwd-lkp@intel.com/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251108/202511080004.DkwIEtwd-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/r/202511080004.DkwIEtwd-lkp@intel.com/
All errors (new ones prefixed by >>):
io_uring/bpf.c:29:28: warning: unused function 'uring_bpf_get_flags' [-Wunused-function]
29 | static inline unsigned int uring_bpf_get_flags(unsigned int op_flags)
| ^~~~~~~~~~~~~~~~~~~
In file included from io_uring/bpf.c:14:
In file included from io_uring/io_uring.h:9:
>> include/linux/io_uring_types.h:655:2: error: call to '__compiletime_assert_586' declared with 'error' attribute: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
655 | BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
| ^
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:603:2: note: expanded from macro 'compiletime_assert'
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:591:2: note: expanded from macro '_compiletime_assert'
591 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:584:4: note: expanded from macro '__compiletime_assert'
584 | prefix ## suffix(); \
| ^
<scratch space>:174:1: note: expanded from here
174 | __compiletime_assert_586
| ^
In file included from io_uring/bpf.c:14:
In file included from io_uring/io_uring.h:9:
>> include/linux/io_uring_types.h:655:2: error: call to '__compiletime_assert_586' declared with 'error' attribute: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:603:2: note: expanded from macro 'compiletime_assert'
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:591:2: note: expanded from macro '_compiletime_assert'
591 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:584:4: note: expanded from macro '__compiletime_assert'
584 | prefix ## suffix(); \
| ^
<scratch space>:174:1: note: expanded from here
174 | __compiletime_assert_586
| ^
In file included from io_uring/bpf.c:14:
In file included from io_uring/io_uring.h:9:
>> include/linux/io_uring_types.h:655:2: error: call to '__compiletime_assert_586' declared with 'error' attribute: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:603:2: note: expanded from macro 'compiletime_assert'
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:591:2: note: expanded from macro '_compiletime_assert'
591 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:584:4: note: expanded from macro '__compiletime_assert'
584 | prefix ## suffix(); \
| ^
<scratch space>:174:1: note: expanded from here
174 | __compiletime_assert_586
| ^
In file included from io_uring/bpf.c:14:
In file included from io_uring/io_uring.h:9:
>> include/linux/io_uring_types.h:655:2: error: call to '__compiletime_assert_586' declared with 'error' attribute: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:603:2: note: expanded from macro 'compiletime_assert'
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:591:2: note: expanded from macro '_compiletime_assert'
591 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:584:4: note: expanded from macro '__compiletime_assert'
584 | prefix ## suffix(); \
| ^
<scratch space>:174:1: note: expanded from here
174 | __compiletime_assert_586
| ^
In file included from io_uring/bpf.c:14:
In file included from io_uring/io_uring.h:9:
>> include/linux/io_uring_types.h:655:2: error: call to '__compiletime_assert_586' declared with 'error' attribute: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
include/linux/build_bug.h:50:2: note: expanded from macro 'BUILD_BUG_ON'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:603:2: note: expanded from macro 'compiletime_assert'
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:591:2: note: expanded from macro '_compiletime_assert'
591 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:584:4: note: expanded from macro '__compiletime_assert'
584 | prefix ## suffix(); \
| ^
<scratch space>:174:1: note: expanded from here
174 | __compiletime_assert_586
| ^
1 warning and 5 errors generated.
vim +655 include/linux/io_uring_types.h
e27f928ee1cb06 io_uring/io_uring_types.h Jens Axboe 2022-05-24 652
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 653 static inline void io_kiocb_cmd_sz_check(size_t cmd_sz)
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 654 {
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 @655 BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 656 }
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 657 #define io_kiocb_to_cmd(req, cmd_type) ( \
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 658 io_kiocb_cmd_sz_check(sizeof(cmd_type)) , \
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 659 ((cmd_type *)&(req)->cmd) \
f2ccb5aed7bce1 include/linux/io_uring_types.h Stefan Metzmacher 2022-08-11 660 )
09fdd35162c289 include/linux/io_uring_types.h Caleb Sander Mateos 2025-02-28 661
* [PATCH 4/5] io_uring: bpf: add buffer support for IORING_OP_BPF
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
` (2 preceding siblings ...)
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
@ 2025-11-04 16:21 ` Ming Lei
2025-11-04 16:21 ` [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc Ming Lei
2025-11-05 12:47 ` [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Pavel Begunkov
5 siblings, 0 replies; 14+ messages in thread
From: Ming Lei @ 2025-11-04 16:21 UTC (permalink / raw)
To: Jens Axboe, io-uring
Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Alexei Starovoitov,
Ming Lei
Add support for passing 0-2 buffers to BPF operations through
IORING_OP_BPF. Buffer types are encoded in sqe->bpf_op_flags
using a dedicated 3-bit field for each buffer.
Buffer 1 can be:
- None (no buffer)
- Plain user buffer (addr=sqe->addr, len=sqe->len)
- Fixed/registered buffer (index=sqe->buf_index, offset=sqe->addr,
len=sqe->len)
Buffer 2 can be:
- None (no buffer)
- Plain user buffer (addr=sqe->addr3, len=sqe->optlen)
The sqe->bpf_op_flags layout (32 bits):
Bits 31-24: BPF operation ID (8 bits)
Bits 23-21: Buffer 1 type (3 bits)
Bits 20-18: Buffer 2 type (3 bits)
Bits 17-0: Custom BPF flags (18 bits)
Using 3-bit encoding for each buffer type allows for future
expansion to 8 types (0-7). Currently types 0-2 are defined
(none/plain/fixed) and 3-7 are reserved for future use.
Buffer 2 currently only supports none/plain types because the
io_uring framework can only handle one fixed buffer per request
(via req->buf_index). The 3-bit encoding provides room for
future enhancements.
Buffer metadata (addresses, lengths) is stored in the extended
uring_bpf_data structure and is accessible read-only to BPF
programs. Buffer types can be extracted from the opf field using
the IORING_BPF_BUF1_TYPE() and IORING_BPF_BUF2_TYPE() macros.
Valid buffer combinations:
- 0 buffers
- 1 plain buffer
- 1 fixed buffer
- 2 plain buffers
- 1 fixed buffer + 1 plain buffer
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
include/uapi/linux/io_uring.h | 45 +++++++++++++++++++++++--
io_uring/bpf.c | 63 ++++++++++++++++++++++++++++++++++-
io_uring/uring_bpf.h | 12 ++++++-
3 files changed, 116 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 94d2050131ac..950f4cfbbf86 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -429,12 +429,53 @@ enum io_uring_op {
#define IORING_SEND_VECTORIZED (1U << 5)
/*
- * sqe->bpf_op_flags top 8bits is for storing bpf op
- * The other 24bits are used for bpf prog
+ * sqe->bpf_op_flags layout (32 bits total):
+ * Bits 31-24: BPF operation ID (8 bits, 256 possible operations)
+ * Bits 23-21: Buffer 1 type (3 bits: none/plain/fixed/reserved)
+ * Bits 20-18: Buffer 2 type (3 bits: none/plain/reserved)
+ * Bits 17-0: Custom BPF flags (18 bits, available for BPF programs)
+ *
+ * For IORING_OP_BPF, buffers are specified as follows:
+ * Buffer 1 (plain): addr=sqe->addr, len=sqe->len
+ * Buffer 1 (fixed): index=sqe->buf_index, offset=sqe->addr, len=sqe->len
+ * Buffer 2 (plain): addr=sqe->addr3, len=sqe->optlen
+ *
+ * Note: Buffer 1 can be none/plain/fixed. Buffer 2 can only be none/plain.
+ * 3-bit encoding for each buffer allows for future expansion to 8 types (0-7).
+ * Currently only one fixed buffer per request is supported (buffer 1).
+ * Valid combinations: 0 buffers, 1 plain, 1 fixed, 2 plain, 1 fixed + 1 plain.
*/
#define IORING_BPF_OP_BITS (8)
#define IORING_BPF_OP_SHIFT (24)
+/* Buffer type encoding in sqe->bpf_op_flags */
+#define IORING_BPF_BUF1_TYPE_SHIFT (21)
+#define IORING_BPF_BUF2_TYPE_SHIFT (18)
+#define IORING_BPF_BUF_TYPE_NONE (0) /* No buffer */
+#define IORING_BPF_BUF_TYPE_PLAIN (1) /* Plain user buffer */
+#define IORING_BPF_BUF_TYPE_FIXED (2) /* Fixed/registered buffer */
+#define IORING_BPF_BUF_TYPE_MASK (7) /* 3-bit mask */
+
+/* Helper macros to encode/decode buffer types */
+#define IORING_BPF_BUF1_TYPE(flags) \
+ (((flags) >> IORING_BPF_BUF1_TYPE_SHIFT) & IORING_BPF_BUF_TYPE_MASK)
+#define IORING_BPF_BUF2_TYPE(flags) \
+ (((flags) >> IORING_BPF_BUF2_TYPE_SHIFT) & IORING_BPF_BUF_TYPE_MASK)
+#define IORING_BPF_SET_BUF1_TYPE(type) \
+ (((type) & IORING_BPF_BUF_TYPE_MASK) << IORING_BPF_BUF1_TYPE_SHIFT)
+#define IORING_BPF_SET_BUF2_TYPE(type) \
+ (((type) & IORING_BPF_BUF_TYPE_MASK) << IORING_BPF_BUF2_TYPE_SHIFT)
+
+/* Custom BPF flags mask (18 bits available, bits 17-0) */
+#define IORING_BPF_CUSTOM_FLAGS_MASK ((1U << 18) - 1)
+
+/* Encode all components into sqe->bpf_op_flags */
+#define IORING_BPF_OP_FLAGS(op, buf1_type, buf2_type, flags) \
+ (((op) << IORING_BPF_OP_SHIFT) | \
+ IORING_BPF_SET_BUF1_TYPE(buf1_type) | \
+ IORING_BPF_SET_BUF2_TYPE(buf2_type) | \
+ ((flags) & IORING_BPF_CUSTOM_FLAGS_MASK))
+
/*
* cqe.res for IORING_CQE_F_NOTIF if
* IORING_SEND_ZC_REPORT_USAGE was requested
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
index 8227be6d5a10..e837c3d57b96 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -11,8 +11,10 @@
#include <linux/btf.h>
#include <linux/btf_ids.h>
#include <linux/filter.h>
+#include <linux/uio.h>
#include "io_uring.h"
#include "uring_bpf.h"
+#include "rsrc.h"
#define MAX_BPF_OPS_COUNT (1 << IORING_BPF_OP_BITS)
@@ -28,7 +30,7 @@ static inline unsigned char uring_bpf_get_op(unsigned int op_flags)
static inline unsigned int uring_bpf_get_flags(unsigned int op_flags)
{
- return op_flags & ((1U << IORING_BPF_OP_SHIFT) - 1);
+ return op_flags & IORING_BPF_CUSTOM_FLAGS_MASK;
}
static inline struct uring_bpf_ops *uring_bpf_get_ops(struct uring_bpf_data *data)
@@ -36,18 +38,77 @@ static inline struct uring_bpf_ops *uring_bpf_get_ops(struct uring_bpf_data *dat
return &bpf_ops[uring_bpf_get_op(data->opf)];
}
+static int io_bpf_prep_buffers(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe,
+ struct uring_bpf_data *data,
+ unsigned int op_flags)
+{
+ u8 buf1_type, buf2_type;
+
+ /* Extract buffer configuration from bpf_op_flags */
+ buf1_type = IORING_BPF_BUF1_TYPE(op_flags);
+ buf2_type = IORING_BPF_BUF2_TYPE(op_flags);
+
+ /* Prepare buffer 1 */
+ if (buf1_type == IORING_BPF_BUF_TYPE_PLAIN) {
+ /* Plain user buffer: addr=sqe->addr, len=sqe->len */
+ data->buf1_addr = READ_ONCE(sqe->addr);
+ data->buf1_len = READ_ONCE(sqe->len);
+ } else if (buf1_type == IORING_BPF_BUF_TYPE_FIXED) {
+ /* Fixed buffer: index=sqe->buf_index, offset=sqe->addr, len=sqe->len */
+ req->buf_index = READ_ONCE(sqe->buf_index);
+ data->buf1_addr = READ_ONCE(sqe->addr); /* offset within fixed buffer */
+ data->buf1_len = READ_ONCE(sqe->len);
+
+ /* Validate buffer index */
+ if (unlikely(!req->ctx->buf_table.nr))
+ return -EFAULT;
+ if (unlikely(req->buf_index >= req->ctx->buf_table.nr))
+ return -EINVAL;
+ } else if (buf1_type == IORING_BPF_BUF_TYPE_NONE) {
+ data->buf1_addr = 0;
+ data->buf1_len = 0;
+ } else {
+ return -EINVAL;
+ }
+
+ /* Prepare buffer 2 (plain only - io_uring only supports one fixed buffer) */
+ if (buf2_type == IORING_BPF_BUF_TYPE_PLAIN) {
+ /* Plain user buffer: addr=sqe->addr3, len=sqe->optlen */
+ data->buf2_addr = READ_ONCE(sqe->addr3);
+ data->buf2_len = READ_ONCE(sqe->optlen);
+ } else if (buf2_type == IORING_BPF_BUF_TYPE_NONE) {
+ data->buf2_addr = 0;
+ data->buf2_len = 0;
+ } else {
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+
int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
unsigned int op_flags = READ_ONCE(sqe->bpf_op_flags);
struct uring_bpf_ops *ops;
+ int ret;
if (!(req->ctx->flags & IORING_SETUP_BPF))
return -EACCES;
+ if (uring_bpf_get_flags(op_flags))
+ return -EINVAL;
+
data->opf = op_flags;
ops = &bpf_ops[uring_bpf_get_op(data->opf)];
+ /* Prepare buffers based on buffer type flags */
+ ret = io_bpf_prep_buffers(req, sqe, data, op_flags);
+ if (ret)
+ return ret;
+
if (ops->prep_fn)
return ops->prep_fn(data, sqe);
return -EOPNOTSUPP;
diff --git a/io_uring/uring_bpf.h b/io_uring/uring_bpf.h
index c76eba887d22..c919931cb4b0 100644
--- a/io_uring/uring_bpf.h
+++ b/io_uring/uring_bpf.h
@@ -7,8 +7,18 @@ struct uring_bpf_data {
struct file *file;
u32 opf;
+ /* Buffer 1 metadata - readable for bpf prog */
+ u32 buf1_len; /* buffer 1 length, bytes 12-15 */
+ u64 buf1_addr; /* buffer 1 address or offset, bytes 16-23 */
+
+ /* Buffer 2 metadata - readable for bpf prog (plain only) */
+ u64 buf2_addr; /* buffer 2 address, bytes 24-31 */
+ u32 buf2_len; /* buffer 2 length, bytes 32-35 */
+ u32 __pad; /* padding, bytes 36-39 */
+
/* writeable for bpf prog */
- u8 pdu[64 - sizeof(struct file *) - sizeof(u32)];
+ u8 pdu[64 - sizeof(struct file *) - 4 * sizeof(u32) -
+ 2 * sizeof(u64)];
};
typedef int (*uring_io_prep_t)(struct uring_bpf_data *data,
--
2.47.0
* [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
` (3 preceding siblings ...)
2025-11-04 16:21 ` [PATCH 4/5] io_uring: bpf: add buffer support for IORING_OP_BPF Ming Lei
@ 2025-11-04 16:21 ` Ming Lei
2025-11-07 18:51 ` kernel test robot
2025-11-05 12:47 ` [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Pavel Begunkov
5 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-11-04 16:21 UTC (permalink / raw)
To: Jens Axboe, io-uring
Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Alexei Starovoitov,
Ming Lei
Add io_uring_bpf_req_memcpy() kfunc to enable BPF programs to copy
data between buffers associated with IORING_OP_BPF requests.
The kfunc supports copying between:
- Plain user buffers (using import_ubuf())
- Fixed/registered buffers (using io_import_reg_buf())
- Mixed combinations (plain-to-fixed, fixed-to-plain)
This enables BPF programs to implement data transformation and
processing operations directly within io_uring's request context,
avoiding additional userspace copies.
Implementation details:
1. Add issue_flags tracking in struct uring_bpf_data:
- Replace __pad field with issue_flags (bytes 36-39)
- Initialized to 0 before ops->prep_fn()
- Saved from issue_flags parameter before ops->issue_fn()
- Required by io_import_reg_buf() for proper async handling
2. Add buffer preparation infrastructure:
- io_bpf_prep_buffers() extracts buffer metadata from SQE
- Buffer 1: plain (addr/len) or fixed (buf_index/addr/len)
- Buffer 2: plain only (addr3/optlen)
- Buffer types encoded in sqe->bpf_op_flags bits 23-18
3. io_uring_bpf_req_memcpy() implementation:
- Validates buffer IDs (1 or 2) and prevents same-buffer copies
- Extracts buffer metadata based on buffer ID
- Sets up iov_iters using import_ubuf() or io_import_reg_buf()
- Performs page-sized chunked copying via temporary buffer
- Returns bytes copied or negative error code
Buffer encoding in sqe->bpf_op_flags (32 bits):
Bits 31-24: BPF operation ID (8 bits)
Bits 23-21: Buffer 1 type (0=none, 1=plain, 2=fixed)
Bits 20-18: Buffer 2 type (0=none, 1=plain)
Bits 17-0: Custom BPF flags (18 bits)
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
io_uring/bpf.c | 187 +++++++++++++++++++++++++++++++++++++++++++
io_uring/uring_bpf.h | 11 ++-
2 files changed, 197 insertions(+), 1 deletion(-)
diff --git a/io_uring/bpf.c b/io_uring/bpf.c
index e837c3d57b96..ee4c617e3904 100644
--- a/io_uring/bpf.c
+++ b/io_uring/bpf.c
@@ -109,6 +109,8 @@ int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (ret)
return ret;
+ /* ctx->uring_lock is held */
+ data->issue_flags = 0;
if (ops->prep_fn)
return ops->prep_fn(data, sqe);
return -EOPNOTSUPP;
@@ -126,6 +128,9 @@ static int __io_uring_bpf_issue(struct io_kiocb *req)
int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
{
+ struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
+
+ data->issue_flags = issue_flags;
if (issue_flags & IO_URING_F_UNLOCKED) {
int idx, ret;
@@ -143,6 +148,8 @@ void io_uring_bpf_fail(struct io_kiocb *req)
struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
+ /* ctx->uring_lock is held */
+ data->issue_flags = 0;
if (ops->fail_fn)
ops->fail_fn(data);
}
@@ -152,6 +159,8 @@ void io_uring_bpf_cleanup(struct io_kiocb *req)
struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
+ /* ctx->uring_lock is held */
+ data->issue_flags = 0;
if (ops->cleanup_fn)
ops->cleanup_fn(data);
}
@@ -324,6 +333,104 @@ static struct bpf_struct_ops bpf_uring_bpf_ops = {
.owner = THIS_MODULE,
};
+/*
+ * Helper to copy data between two iov_iters using page extraction.
+ * Extracts pages from source iterator and copies them to destination.
+ * Returns number of bytes copied or negative error code.
+ */
+static ssize_t io_bpf_copy_iters(struct iov_iter *src, struct iov_iter *dst,
+ size_t len)
+{
+#define MAX_PAGES_PER_LOOP 32
+ struct page *pages[MAX_PAGES_PER_LOOP];
+ size_t total_copied = 0;
+ bool need_unpin;
+
+ /* Determine if we'll need to unpin pages later */
+ need_unpin = user_backed_iter(src);
+
+ /* Process pages in chunks */
+ while (len > 0) {
+ struct page **page_array = pages;
+ size_t offset, copied = 0;
+ ssize_t extracted;
+ unsigned int nr_pages;
+ size_t chunk_len;
+ int i;
+
+ /* Extract up to MAX_PAGES_PER_LOOP pages */
+ chunk_len = min_t(size_t, len, MAX_PAGES_PER_LOOP * PAGE_SIZE);
+ extracted = iov_iter_extract_pages(src, &page_array, chunk_len,
+ MAX_PAGES_PER_LOOP, 0, &offset);
+ if (extracted <= 0) {
+ if (total_copied > 0)
+ break;
+ return extracted < 0 ? extracted : -EFAULT;
+ }
+
+ nr_pages = DIV_ROUND_UP(offset + extracted, PAGE_SIZE);
+
+ /* Copy pages to destination iterator */
+ for (i = 0; i < nr_pages && copied < extracted; i++) {
+ size_t page_offset = (i == 0) ? offset : 0;
+ size_t page_len = min_t(size_t, extracted - copied,
+ PAGE_SIZE - page_offset);
+ size_t n;
+
+ n = copy_page_to_iter(pages[i], page_offset, page_len, dst);
+ copied += n;
+ if (n < page_len)
+ break;
+ }
+
+ /* Clean up extracted pages */
+ if (need_unpin)
+ unpin_user_pages(pages, nr_pages);
+
+ total_copied += copied;
+ len -= copied;
+
+ /* Stop if we didn't copy all extracted data */
+ if (copied < extracted)
+ break;
+ }
+
+ return total_copied;
+#undef MAX_PAGES_PER_LOOP
+}
+
+/*
+ * Helper to import a buffer into an iov_iter for BPF memcpy operations.
+ * Handles both plain user buffers and fixed/registered buffers.
+ *
+ * @req: io_kiocb request
+ * @iter: output iterator
+ * @buf_type: buffer type (plain or fixed)
+ * @addr: buffer address
+ * @offset: offset into buffer
+ * @len: length from offset
+ * @direction: ITER_SOURCE for source buffer, ITER_DEST for destination
+ * @issue_flags: io_uring issue flags
+ *
+ * Returns 0 on success, negative error code on failure.
+ */
+static int io_bpf_import_buffer(struct io_kiocb *req, struct iov_iter *iter,
+ u8 buf_type, u64 addr, unsigned int offset,
+ u32 len, int direction, unsigned int issue_flags)
+{
+ if (buf_type == IORING_BPF_BUF_TYPE_PLAIN) {
+ /* Plain user buffer */
+ return import_ubuf(direction, (void __user *)(addr + offset),
+ len - offset, iter);
+ } else if (buf_type == IORING_BPF_BUF_TYPE_FIXED) {
+ /* Fixed buffer */
+ return io_import_reg_buf(req, iter, addr + offset,
+ len - offset, direction, issue_flags);
+ }
+
+ return -EINVAL;
+}
+
__bpf_kfunc_start_defs();
__bpf_kfunc void uring_bpf_set_result(struct uring_bpf_data *data, int res)
{
@@ -339,11 +446,91 @@ __bpf_kfunc struct io_kiocb *uring_bpf_data_to_req(struct uring_bpf_data *data)
{
return cmd_to_io_kiocb(data);
}
+
+/**
+ * io_uring_bpf_req_memcpy - Copy data between io_uring BPF request buffers
+ * @data: BPF request data containing buffer metadata
+ * @dest: Destination buffer descriptor (with buf_id and offset)
+ * @src: Source buffer descriptor (with buf_id and offset)
+ * @len: Number of bytes to copy
+ *
+ * Copies data between two different io_uring BPF request buffers (buf_id 1 and 2).
+ * Supports: plain-to-plain, fixed-to-plain, and plain-to-fixed.
+ * Does not support copying within the same buffer (src and dest must be different).
+ *
+ * Returns: Number of bytes copied on success, negative error code on failure
+ */
+__bpf_kfunc int io_uring_bpf_req_memcpy(struct uring_bpf_data *data,
+ struct bpf_req_mem_desc *dest,
+ struct bpf_req_mem_desc *src,
+ unsigned int len)
+{
+ struct io_kiocb *req = cmd_to_io_kiocb(data);
+ struct iov_iter dst_iter, src_iter;
+ u8 dst_type, src_type;
+ u64 dst_addr, src_addr;
+ u32 dst_len, src_len;
+ int ret;
+
+ /* Validate buffer IDs */
+ if (dest->buf_id < 1 || dest->buf_id > 2 ||
+ src->buf_id < 1 || src->buf_id > 2)
+ return -EINVAL;
+
+ /* Don't allow copying within the same buffer */
+ if (src->buf_id == dest->buf_id)
+ return -EINVAL;
+
+ /* Extract source buffer metadata */
+ if (src->buf_id == 1) {
+ src_type = IORING_BPF_BUF1_TYPE(data->opf);
+ src_addr = data->buf1_addr;
+ src_len = data->buf1_len;
+ } else {
+ src_type = IORING_BPF_BUF2_TYPE(data->opf);
+ src_addr = data->buf2_addr;
+ src_len = data->buf2_len;
+ }
+
+ /* Extract destination buffer metadata */
+ if (dest->buf_id == 1) {
+ dst_type = IORING_BPF_BUF1_TYPE(data->opf);
+ dst_addr = data->buf1_addr;
+ dst_len = data->buf1_len;
+ } else {
+ dst_type = IORING_BPF_BUF2_TYPE(data->opf);
+ dst_addr = data->buf2_addr;
+ dst_len = data->buf2_len;
+ }
+
+ /* Validate offsets and lengths */
+ if (src->offset + len > src_len || dest->offset + len > dst_len)
+ return -EINVAL;
+
+ /* Initialize source iterator */
+ ret = io_bpf_import_buffer(req, &src_iter, src_type,
+ src_addr, src->offset, src_len,
+ ITER_SOURCE, data->issue_flags);
+ if (ret)
+ return ret;
+
+ /* Initialize destination iterator */
+ ret = io_bpf_import_buffer(req, &dst_iter, dst_type,
+ dst_addr, dest->offset, dst_len,
+ ITER_DEST, data->issue_flags);
+ if (ret)
+ return ret;
+
+ /* Extract pages from source iterator and copy to destination */
+ return io_bpf_copy_iters(&src_iter, &dst_iter, len);
+}
+
__bpf_kfunc_end_defs();
BTF_KFUNCS_START(uring_bpf_kfuncs)
BTF_ID_FLAGS(func, uring_bpf_set_result)
BTF_ID_FLAGS(func, uring_bpf_data_to_req)
+BTF_ID_FLAGS(func, io_uring_bpf_req_memcpy)
BTF_KFUNCS_END(uring_bpf_kfuncs)
static const struct btf_kfunc_id_set uring_kfunc_set = {
diff --git a/io_uring/uring_bpf.h b/io_uring/uring_bpf.h
index c919931cb4b0..d6e0d6dff82e 100644
--- a/io_uring/uring_bpf.h
+++ b/io_uring/uring_bpf.h
@@ -14,13 +14,22 @@ struct uring_bpf_data {
/* Buffer 2 metadata - readable for bpf prog (plain only) */
u64 buf2_addr; /* buffer 2 address, bytes 24-31 */
u32 buf2_len; /* buffer 2 length, bytes 32-35 */
- u32 __pad; /* padding, bytes 36-39 */
+ u32 issue_flags; /* issue_flags from io_uring, bytes 36-39 */
/* writeable for bpf prog */
u8 pdu[64 - sizeof(struct file *) - 4 * sizeof(u32) -
2 * sizeof(u64)];
};
+/*
+ * Descriptor for io_uring BPF request buffer.
+ * Used by io_uring_bpf_req_memcpy() to identify which buffer to copy from/to.
+ */
+struct bpf_req_mem_desc {
+ u8 buf_id; /* Buffer ID: 1 or 2 */
+ unsigned int offset; /* Offset into buffer */
+};
+
typedef int (*uring_io_prep_t)(struct uring_bpf_data *data,
const struct io_uring_sqe *sqe);
typedef int (*uring_io_issue_t)(struct uring_bpf_data *data);
--
2.47.0
* Re: [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc
2025-11-04 16:21 ` [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc Ming Lei
@ 2025-11-07 18:51 ` kernel test robot
0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-11-07 18:51 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, io-uring
Cc: oe-kbuild-all, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov, Ming Lei
Hi Ming,
kernel test robot noticed the following build warnings:
[auto build test WARNING on next-20251104]
[cannot apply to bpf-next/net bpf-next/master bpf/master linus/master v6.18-rc4 v6.18-rc3 v6.18-rc2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/io_uring-prepare-for-extending-io_uring-with-bpf/20251105-002757
base: next-20251104
patch link: https://lore.kernel.org/r/20251104162123.1086035-6-ming.lei%40redhat.com
patch subject: [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc
config: openrisc-allyesconfig (https://download.01.org/0day-ci/archive/20251108/202511080255.v8F8GrXF-lkp@intel.com/config)
compiler: or1k-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251108/202511080255.v8F8GrXF-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511080255.v8F8GrXF-lkp@intel.com/
All warnings (new ones prefixed by >>):
io_uring/bpf.c: In function 'io_bpf_import_buffer':
>> io_uring/bpf.c:423:47: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
423 | return import_ubuf(direction, (void __user *)(addr + offset),
| ^
In file included from include/linux/bpf_verifier.h:7,
from io_uring/bpf.c:9:
io_uring/bpf.c: In function 'io_bpf_init':
include/linux/bpf.h:2044:50: warning: statement with no effect [-Wunused-value]
2044 | #define register_bpf_struct_ops(st_ops, type) ({ (void *)(st_ops); 0; })
| ^~~~~~~~~~~~~~~~
io_uring/bpf.c:551:15: note: in expansion of macro 'register_bpf_struct_ops'
551 | err = register_bpf_struct_ops(&bpf_uring_bpf_ops, uring_bpf_ops);
| ^~~~~~~~~~~~~~~~~~~~~~~
In file included from <command-line>:
In function 'io_kiocb_cmd_sz_check',
inlined from 'io_uring_bpf_prep' at io_uring/bpf.c:93:32:
include/linux/compiler_types.h:603:45: error: call to '__compiletime_assert_598' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:584:25: note: in definition of macro '__compiletime_assert'
584 | prefix ## suffix(); \
| ^~~~~~
include/linux/compiler_types.h:603:9: note: in expansion of macro '_compiletime_assert'
603 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
| ^~~~~~~~~~~~~~~~
include/linux/io_uring_types.h:655:9: note: in expansion of macro 'BUILD_BUG_ON'
655 | BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data));
| ^~~~~~~~~~~~
In function 'io_kiocb_cmd_sz_check',
inlined from 'io_uring_bpf_issue' at io_uring/bpf.c:131:32:
include/linux/compiler_types.h:603:45: error: call to '__compiletime_assert_598' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
In function 'io_kiocb_cmd_sz_check',
inlined from 'io_uring_bpf_fail' at io_uring/bpf.c:148:32:
include/linux/compiler_types.h:603:45: error: call to '__compiletime_assert_598' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
In function 'io_kiocb_cmd_sz_check',
inlined from 'io_uring_bpf_cleanup' at io_uring/bpf.c:159:32:
include/linux/compiler_types.h:603:45: error: call to '__compiletime_assert_598' declared with attribute error: BUILD_BUG_ON failed: cmd_sz > sizeof(struct io_cmd_data)
vim +423 io_uring/bpf.c
401
402 /*
403 * Helper to import a buffer into an iov_iter for BPF memcpy operations.
404 * Handles both plain user buffers and fixed/registered buffers.
405 *
406 * @req: io_kiocb request
407 * @iter: output iterator
408 * @buf_type: buffer type (plain or fixed)
409 * @addr: buffer address
410 * @offset: offset into buffer
411 * @len: length from offset
412 * @direction: ITER_SOURCE for source buffer, ITER_DEST for destination
413 * @issue_flags: io_uring issue flags
414 *
415 * Returns 0 on success, negative error code on failure.
416 */
417 static int io_bpf_import_buffer(struct io_kiocb *req, struct iov_iter *iter,
418 u8 buf_type, u64 addr, unsigned int offset,
419 u32 len, int direction, unsigned int issue_flags)
420 {
421 if (buf_type == IORING_BPF_BUF_TYPE_PLAIN) {
422 /* Plain user buffer */
> 423 return import_ubuf(direction, (void __user *)(addr + offset),
424 len - offset, iter);
425 } else if (buf_type == IORING_BPF_BUF_TYPE_FIXED) {
426 /* Fixed buffer */
427 return io_import_reg_buf(req, iter, addr + offset,
428 len - offset, direction, issue_flags);
429 }
430
431 return -EINVAL;
432 }
433
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
` (4 preceding siblings ...)
2025-11-04 16:21 ` [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc Ming Lei
@ 2025-11-05 12:47 ` Pavel Begunkov
2025-11-05 15:57 ` Ming Lei
5 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2025-11-05 12:47 UTC (permalink / raw)
To: Ming Lei, Jens Axboe, io-uring
Cc: Caleb Sander Mateos, Akilesh Kailash, bpf, Alexei Starovoitov
On 11/4/25 16:21, Ming Lei wrote:
> Hello,
>
> Add IORING_OP_BPF for extending io_uring operations, follows typical cases:
BPF requests were tried a long time ago and it wasn't great. Performance
for short BPF programs is poor because of io_uring request handling
overhead. And flexibility was severely lacking, so even simple use cases
looked pretty ugly, both internally and for BPF writers.
I'm not sure about your criteria, but my requirement was to at least
be able to reuse all io_uring IO handling, i.e. submitting requests
and waiting for / processing completions, otherwise a lot of opportunities
are wasted. My approach from a few months back [1], controlling requests
from the outside, looked much better. At least it covered a bunch of needs
without extra changes. I was just wiring up the io_uring changes I wanted
to make BPF writers' lives easier. Let me resend the bpf series with it.
It makes me wonder if they are complementary, but I'm not sure what
your use cases are and what capabilities it might need.
[1] https://lore.kernel.org/io-uring/cover.1749214572.git.asml.silence@gmail.com/
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring
2025-11-05 12:47 ` [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Pavel Begunkov
@ 2025-11-05 15:57 ` Ming Lei
2025-11-06 16:03 ` Pavel Begunkov
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-11-05 15:57 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Jens Axboe, io-uring, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov
On Wed, Nov 05, 2025 at 12:47:58PM +0000, Pavel Begunkov wrote:
> On 11/4/25 16:21, Ming Lei wrote:
> > Hello,
> >
> > Add IORING_OP_BPF for extending io_uring operations, follows typical cases:
>
> BPF requests were tried long time ago and it wasn't great. Performance
Care to share the link so I can learn from the lesson? Maybe things have
changed now...
> for short BPF programs is not great because of io_uring request handling
> overhead. And flexibility was severely lacking, so even simple use cases
What is the overhead? In this patch, the OP's prep() and issue() are defined
in a bpf prog, but in the typical use case the code size is pretty small, and
the bpf prog code is supposed to run in the fast path.
> were looking pretty ugly, internally, and for BPF writers as well.
I am not sure what `simple use cases` you are talking about.
>
> I'm not so sure about your criteria, but my requirement was to at least
> being able to reuse all io_uring IO handling, i.e. submitting requests,
> and to wait/process completions, otherwise a lot of opportunities are
> wasted. My approach from a few months back [1] controlling requests from
Please read the patchset.
This patchset defines a new IORING_OP_BPF opcode, whose ->prep(), ->issue(), ...
are hooked up with a struct_ops prog, so all the io_uring core code is reused;
only the exact IORING_OP_BPF behavior is defined by the struct_ops prog.
> the outside was looking much better. At least it covered a bunch of needs
> without extra changes. I was just wiring up io_uring changes I wanted
> to make BPF writer lifes easier. Let me resend the bpf series with it.
>
> It makes me wonder if they are complementary, but I'm not sure what
I think the two are orthogonal in function, and they can co-exist.
> your use cases are and what capabilities it might need.
The main use cases are described in the cover letter and the 3rd patch;
please find the details there.
So far the main case is to access the registered (kernel) buffer
from the issue() callback of the struct_ops, because the buffer doesn't
have a userspace mapping. The last two patches add support for providing
two buffers (fixed, plain) to IORING_OP_BPF, and in the future vectored
buffers will be added too, so IORING_OP_BPF can handle buffers flexibly,
such as:
- use an exported compression kfunc to compress data from the kernel buffer
into another buffer or in place, then the following linked SQE can be submitted
to write the compressed data to storage
- in a RAID use case, calculate IO data parity from the kernel buffer, and store
the parity data in another plain user buffer, then the following linked SQE
can be submitted to write the parity data to storage
Even for a userspace buffer, IORING_OP_BPF can support similar handling,
saving one extra io_uring_enter() syscall.
>
> [1] https://lore.kernel.org/io-uring/cover.1749214572.git.asml.silence@gmail.com/
I looked at your patches, in which SQEs are generated in the bpf prog
(kernel side), and that can't be used in my case.
Thanks,
Ming
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring
2025-11-05 15:57 ` Ming Lei
@ 2025-11-06 16:03 ` Pavel Begunkov
2025-11-07 15:54 ` Ming Lei
0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2025-11-06 16:03 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, io-uring, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov
On 11/5/25 15:57, Ming Lei wrote:
> On Wed, Nov 05, 2025 at 12:47:58PM +0000, Pavel Begunkov wrote:
>> On 11/4/25 16:21, Ming Lei wrote:
>>> Hello,
>>>
>>> Add IORING_OP_BPF for extending io_uring operations, follows typical cases:
>>
>> BPF requests were tried long time ago and it wasn't great. Performance
>
> Care to share the link so I can learn from the lesson? Maybe things have
> changed now...
https://lore.kernel.org/io-uring/a83f147b-ea9d-e693-a2e9-c6ce16659749@gmail.com/T/#m31d0a2ac6e2213f912a200f5e8d88bd74f81406b
There were some extra features and testing from folks, but I don't
think it was ever posted to the list.
>> for short BPF programs is not great because of io_uring request handling
>> overhead. And flexibility was severely lacking, so even simple use cases
>
> What is the overhead? In this patch, OP's prep() and issue() are defined in
The overhead of creating, freeing and executing a request. If you use
it with links, it's also overhead of that. That prototype could also
optionally wait for completions, and it wasn't free either.
> bpf prog, but in typical use case, the code size is pretty small, and bpf
> prog code is supposed to run in fast path.
>> were looking pretty ugly, internally, and for BPF writers as well.
>
> I am not sure what `simple use cases` you are talking about.
As an example, creating a loop reading a file:
read N bytes; wait for completion; repeat
>> I'm not so sure about your criteria, but my requirement was to at least
>> being able to reuse all io_uring IO handling, i.e. submitting requests,
>> and to wait/process completions, otherwise a lot of opportunities are
>> wasted. My approach from a few months back [1] controlling requests from
>
> Please read the patchset.
>
> This patchset defines new IORING_BPF_OP code, which's ->prep(), ->issue(), ...,
> are hooked with struct_ops prog, so all io_uring core code is used, just the
> exact IORING_BPF_OP behavior is defined by struct_ops prog.
Right, but I'm talking about what the io_uring BPF program is capable
of doing.
>> the outside was looking much better. At least it covered a bunch of needs
>> without extra changes. I was just wiring up io_uring changes I wanted
>> to make BPF writer lifes easier. Let me resend the bpf series with it.
>>
>> It makes me wonder if they are complementary, but I'm not sure what
>
> I think the two are orthogonal in function, and they can co-exist.
>
>> your use cases are and what capabilities it might need.
>
> The main use cases are described in cover letter and the 3rd patch, please
> find the details there.
>
> So far the main case is to access the registered (kernel)buffer
> from issue() callback of struct_ops, because the buffer doesn't have
> userspace mapping. The last two patches adds support to provide two
> buffers(fixed, plain) for IORING_BPF_OP, and in future vectored buffer
> will be added too, so IORING_BPF_OP can handle buffer flexibly, such as:
>
> - use exported compress kfunc to compress data from kernel buffer
> into another buffer or inplace, then the following linked SQE can be submitted
> to write the built compressed data into storage
>
> - in raid use case, calculate IO data parity from kernel buffer, and store
> the parity data to another plain user buffer, then the following linked SQE
> can be submitted to write the built parity data to storage
>
> Even for userspace buffer, the BPF_OP can support similar handling for saving
> one extra io_uring_enter() syscall.
Sure, registered buffer handling was one of the use cases for
those recent re-iterations as well, and David Wei had some thoughts
on it too. Though it was not exactly about copying.
>> [1] https://lore.kernel.org/io-uring/cover.1749214572.git.asml.silence@gmail.com/
>
> I looked at your patches, in which SQE is generated in bpf prog(kernel),
Quick note: userspace and BPF are both allowed to submit
requests / generate SQEs.
> and it can't be used in my case.
Hmm, how so? Let's say ublk registers a buffer and posts a
completion. Then BPF runs, it sees the completion and does the
necessary processing, probably using some kfuncs like the ones
you introduced. After it can optionally queue up requests
writing it to the storage or anything else.
The reason I'm asking is that it's supposed to be able to
do anything userspace can already achieve (and more). So,
if it can't be used for these use cases, there must be some
problem in my design.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring
2025-11-06 16:03 ` Pavel Begunkov
@ 2025-11-07 15:54 ` Ming Lei
2025-11-11 14:07 ` Pavel Begunkov
0 siblings, 1 reply; 14+ messages in thread
From: Ming Lei @ 2025-11-07 15:54 UTC (permalink / raw)
To: Pavel Begunkov
Cc: Jens Axboe, io-uring, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov
On Thu, Nov 06, 2025 at 04:03:29PM +0000, Pavel Begunkov wrote:
> On 11/5/25 15:57, Ming Lei wrote:
> > On Wed, Nov 05, 2025 at 12:47:58PM +0000, Pavel Begunkov wrote:
> > > On 11/4/25 16:21, Ming Lei wrote:
> > > > Hello,
> > > >
> > > > Add IORING_OP_BPF for extending io_uring operations, follows typical cases:
> > >
> > > BPF requests were tried long time ago and it wasn't great. Performance
> >
> > Care to share the link so I can learn from the lesson? Maybe things have
> > changed now...
>
> https://lore.kernel.org/io-uring/a83f147b-ea9d-e693-a2e9-c6ce16659749@gmail.com/T/#m31d0a2ac6e2213f912a200f5e8d88bd74f81406b
>
> There were some extra features and testing from folks, but I don't
> think it was ever posted to the list.
Thanks for sharing the link:
```
The main problem solved is feeding completion information of other
requests in a form of CQEs back into BPF. I decided to wire up support
for multiple completion queues (aka CQs) and give BPF programs access to
them, so leaving userspace in control over synchronisation that should
be much more flexible that the link-based approach.
```
It looks totally different from my patch in motivation and policy.
I do _not_ want to move application logic into the kernel by building SQEs
from a kernel prog. With IORING_OP_BPF, the whole io_uring application is
built & maintained completely in userspace, so I don't need to do cumbersome
kernel/user communication just for setting up one SQE in the prog, not to
mention maintaining the SQEs' relation with the userspace side.
>
> > > for short BPF programs is not great because of io_uring request handling
> > > overhead. And flexibility was severely lacking, so even simple use cases
> >
> > What is the overhead? In this patch, OP's prep() and issue() are defined in
>
> The overhead of creating, freeing and executing a request. If you use
> it with links, it's also overhead of that. That prototype could also
> optionally wait for completions, and it wasn't free either.
IORING_OP_BPF is the same as existing normal io_uring requests and links,
wrt everything you mentioned above.
IORING_OP_BPF's motivation is to be a functional supplement or extension
to io_uring, not to improve performance.
>
> > bpf prog, but in typical use case, the code size is pretty small, and bpf
> > prog code is supposed to run in fast path.
> > > were looking pretty ugly, internally, and for BPF writers as well.
> >
> > I am not sure what `simple use cases` you are talking about.
>
> As an example, creating a loop reading a file:
> read N bytes; wait for completion; repeat
IORING_OP_BPF isn't supposed to implement FS operations in a bpf prog.
That doesn't mean IORING_OP_BPF can't support async issuing:
- issue_wait() can be added for offload in the io-wq context
OR
- for typical FS AIO, in theory it can be supported too; the struct_ops just
needs to define a completion callback, which can be called from
->ki_complete().
>
> > > I'm not so sure about your criteria, but my requirement was to at least
> > > being able to reuse all io_uring IO handling, i.e. submitting requests,
> > > and to wait/process completions, otherwise a lot of opportunities are
> > > wasted. My approach from a few months back [1] controlling requests from
> >
> > Please read the patchset.
> >
> > This patchset defines new IORING_BPF_OP code, which's ->prep(), ->issue(), ...,
> > are hooked with struct_ops prog, so all io_uring core code is used, just the
> > exact IORING_BPF_OP behavior is defined by struct_ops prog.
>
> Right, but I'm talking about what the io_uring BPF program is capable
> of doing.
There can be many types of io_uring BPF progs from a functional viewpoint;
we are not talking about the same type.
>
> > > the outside was looking much better. At least it covered a bunch of needs
> > > without extra changes. I was just wiring up io_uring changes I wanted
> > > to make BPF writer lifes easier. Let me resend the bpf series with it.
> > >
> > > It makes me wonder if they are complementary, but I'm not sure what
> >
> > I think the two are orthogonal in function, and they can co-exist.
> >
> > > your use cases are and what capabilities it might need.
> >
> > The main use cases are described in cover letter and the 3rd patch, please
> > find the details there.
> >
> > So far the main case is to access the registered (kernel)buffer
> > from issue() callback of struct_ops, because the buffer doesn't have
> > userspace mapping. The last two patches adds support to provide two
> > buffers(fixed, plain) for IORING_BPF_OP, and in future vectored buffer
> > will be added too, so IORING_BPF_OP can handle buffer flexibly, such as:
> >
> > - use exported compress kfunc to compress data from kernel buffer
> > into another buffer or inplace, then the following linked SQE can be submitted
> > to write the built compressed data into storage
> >
> > - in raid use case, calculate IO data parity from kernel buffer, and store
> > the parity data to another plain user buffer, then the following linked SQE
> > can be submitted to write the built parity data to storage
> >
> > Even for userspace buffer, the BPF_OP can support similar handling for saving
> > one extra io_uring_enter() syscall.
>
> Sure, registered buffer handling was one of the use cases for
> that recent re-itarations as well, and David Wei had some thoughts
> for it as well. Though, it was not exactly about copying.
>
> > > [1] https://lore.kernel.org/io-uring/cover.1749214572.git.asml.silence@gmail.com/
> >
> > I looked at your patches, in which SQE is generated in bpf prog(kernel),
>
> Quick note: userspace and BPF are both allowed to submit
> requests / generate SQEs.
>
> > and it can't be used in my case.
> Hmm, how so? Let's say ublk registers a buffer and posts a
> completion. Then BPF runs, it sees the completion and does the
> necessary processing, probably using some kfuncs like the ones
That is easy to say, but how can the BPF prog know the next completion is
exactly the one it is waiting for? You have to rely on a bpf map to communicate
with userspace to understand which completion you are interested in, and you
also need all the information from userspace for preparing the SQE for
submission from the bpf prog. Tons of userspace/kernel communication.
> you introduced. After it can optionally queue up requests
> writing it to the storage or anything else.
Again, I do not want to move userspace logic into a bpf prog (kernel); what
IORING_OP_BPF provides is a way to define one operation, which userspace
can then use just like in-kernel operations.
An existing application can then adopt IORING_OP_BPF with just a small
change. If SQEs are submitted from the bpf prog, the ublk application needs
a rewrite to support registered-buffer based zero copy.
> The reason I'm asking is because it's supposed to be able to
> do anything the userspace can already achieve (and more). So,
> if it can't be used for this use cases, there should be some
> problem in my design.
BPF prog programming is definitely much more limited compared with
userspace application programming, because it is safe kernel programming.
Thanks,
Ming
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring
2025-11-07 15:54 ` Ming Lei
@ 2025-11-11 14:07 ` Pavel Begunkov
0 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2025-11-11 14:07 UTC (permalink / raw)
To: Ming Lei
Cc: Jens Axboe, io-uring, Caleb Sander Mateos, Akilesh Kailash, bpf,
Alexei Starovoitov
On 11/7/25 15:54, Ming Lei wrote:
> On Thu, Nov 06, 2025 at 04:03:29PM +0000, Pavel Begunkov wrote:
>> On 11/5/25 15:57, Ming Lei wrote:
>>> On Wed, Nov 05, 2025 at 12:47:58PM +0000, Pavel Begunkov wrote:
>>>> On 11/4/25 16:21, Ming Lei wrote:
>>>>> Hello,
>>>>>
>>>>> Add IORING_OP_BPF for extending io_uring operations, follows typical cases:
>>>>
>>>> BPF requests were tried long time ago and it wasn't great. Performance
>>>
>>> Care to share the link so I can learn from the lesson? Maybe things have
>>> changed now...
>>
>> https://lore.kernel.org/io-uring/a83f147b-ea9d-e693-a2e9-c6ce16659749@gmail.com/T/#m31d0a2ac6e2213f912a200f5e8d88bd74f81406b
>>
>> There were some extra features and testing from folks, but I don't
>> think it was ever posted to the list.
>
> Thanks for sharing the link:
>
> ```
> The main problem solved is feeding completion information of other
> requests in a form of CQEs back into BPF. I decided to wire up support
> for multiple completion queues (aka CQs) and give BPF programs access to
> them, so leaving userspace in control over synchronisation that should
> be much more flexible that the link-based approach.
> ```
FWIW, those extensions were a sign that the approach
wasn't flexible enough.
> Looks it is totally different with my patch in motivation and policy.
>
> I do _not_ want to move application logic into kernel by building SQE from
> kernel prog. With IORING_OP_BPF, the whole io_uring application is
> built & maintained completely in userspace, so I needn't to do cumbersome
> kernel/user communication just for setting up one SQE in prog, not mention
> maintaining SQE's relation with userspace side's.
It's built and maintained in userspace in either case, and in
both cases you have bpf implementing some logic that was previously
done in userspace. To emphasize, you can do the desired parts of
handling in BPF, and I'm not suggesting moving the entirety of
request processing in there.
>>>> for short BPF programs is not great because of io_uring request handling
>>>> overhead. And flexibility was severely lacking, so even simple use cases
>>>
>>> What is the overhead? In this patch, OP's prep() and issue() are defined in
>>
>> The overhead of creating, freeing and executing a request. If you use
>> it with links, it's also overhead of that. That prototype could also
>> optionally wait for completions, and it wasn't free either.
>
> IORING_OP_BPF is same with existing normal io_uring request and link, wrt
> all above you mentioned.
It is, but it's an extra request, and in previous testing the overhead
of that extra request affected total performance; that's why whether
it's linked or not also matters.
> IORING_OP_BPF's motivation is for being io_uring's supplementary or extention
> in function, not for improving performance.
>
>>
>>> bpf prog, but in typical use case, the code size is pretty small, and bpf
>>> prog code is supposed to run in fast path.
>>>> were looking pretty ugly, internally, and for BPF writers as well.
>>>
>>> I am not sure what `simple use cases` you are talking about.
>>
>> As an example, creating a loop reading a file:
>> read N bytes; wait for completion; repeat
>
> IORING_OP_BPF isn't supposed to implement FS operation in bpf prog.
>
> It doesn't mean IORING_OP_BPF can't support async issuing:
>
> - issue_wait() can be added for offload in io-wq context
>
> OR
>
> - for typical FS AIO, in theory it can be supported too, just the struct_ops need
> to define one completion callback, and the callback can be called from
> ->ki_complete().
There is more to IO than read/write, and I'm afraid each new type of
operation would need some extra kfunc glue. And even then there is
more handling for rw requests in io_uring than just calling the
callback. It's nicer to be able to reuse all the io_uring request
handling, which wouldn't even need extra kfuncs.
...
>>> and it can't be used in my case.
>> Hmm, how so? Let's say ublk registers a buffer and posts a
>> completion. Then BPF runs, it sees the completion and does the
>> necessary processing, probably using some kfuncs like the ones
>
> It is easy to say, how can the BPF prog know the next completion is
> exactly waiting for? You have to rely on bpf map to communicate with userspace
By taking a peek at and maybe dereferencing cqe->user_data.
> to understanding what completion is what you are interested in, also
> need all information from userpace for preparing the SQE for submission
> from bpf prog. Tons of userspace and kernel communication.
You can set up a BPF arena, and all that communication will work through
a block of shared memory. Or the same via an io_uring parameter region.
That sounds pretty simple.
>> you introduced. After it can optionally queue up requests
>> writing it to the storage or anything else.
>
> Again, I do not want to move userspace logic into bpf prog(kernel), what
> IORING_BPF_OP provides is to define one operation, then userspace
> can use it just like in-kernel operations.
Right, but that's rather limited. I want to cover all those
use cases with one implementation instead of fragmenting users,
if that can be achieved.
> Then existing application can apply IORING_BPF_OP just with little small
> change. If submitting SQE from bpf prog, ublk application need re-write
> for supporting register buffer based zero copy.
>
>> The reason I'm asking is because it's supposed to be able to
>> do anything the userspace can already achieve (and more). So,
>> if it can't be used for this use cases, there should be some
>> problem in my design.
>
> BPF prog programming is definitely much more limited compared with
> userspace application because it is safe kernel programming.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 14+ messages in thread