* [PATCHSET RFC v3] Inherited restrictions and BPF filtering
@ 2026-01-15 16:36 Jens Axboe
2026-01-15 16:36 ` [PATCH 1/3] io_uring: move ctx->restrictions to be dynamically allocated Jens Axboe
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Jens Axboe @ 2026-01-15 16:36 UTC (permalink / raw)
To: io-uring
Hi,
Followup to v2 here:
https://lore.kernel.org/io-uring/20260109185155.88150-1-axboe@kernel.dk/
While this is a followup, it takes a different approach to the problem.
What remains is task inheritance - if a set of restrictions is
registered with a task, any children will inherit it too.
What's new is basic support for BPF filters, so that anything can be
filtered. Filters can be added per opcode, and multiple filters can be
attached to the same opcode. As the filtering is done after the prep
phase, it's even possible to filter based on user structs that are
copied into the kernel. For now, only IORING_OP_SOCKET is wired up, as
an example, and it allows filtering on domain/type/protocol. A sample
filter for that could look like:
SEC("io_uring_filter")
int socket_filter(struct io_uring_bpf_ctx *ctx)
{
/* Only allow AF_INET and AF_INET6 */
if (ctx->socket.family != AF_INET && ctx->socket.family != AF_INET6)
return 0; /* Reject */
/* Only allow SOCK_STREAM (TCP) */
if (ctx->socket.type != SOCK_STREAM)
return 0; /* Reject */
/* Only allow IPPROTO_TCP or default (0) */
if (ctx->socket.protocol != IPPROTO_TCP && ctx->socket.protocol != 0)
return 0; /* Reject */
return 1; /* Accept */
}
to restrict certain families, types, or protocols.
For now this kind of filtering only covers SQE opcodes, but it's easily
extendable to cover REGISTER opcodes as well, including their arguments.
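For reference, attaching such a filter to a ring means loading the
program with bpf(2) and then registering the program fd with the new
IORING_REGISTER_BPF_FILTER opcode added in patch 2. A rough, untested
sketch against the uapi additions in this series (the helper name and
error handling are made up for illustration, and whether the ring also
has to be created with IORING_SETUP_R_DISABLED isn't assumed either
way):

#include <linux/io_uring.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Attach an already-loaded BPF program (prog_fd) as a filter for
 * IORING_OP_SOCKET on the ring identified by ring_fd. Returns 0 on
 * success, -1 with errno set on failure.
 */
static int register_socket_filter(int ring_fd, int prog_fd)
{
        struct io_uring_bpf reg;

        memset(&reg, 0, sizeof(reg));
        reg.cmd_type = IO_URING_BPF_CMD_FILTER;
        reg.filter.opcode = IORING_OP_SOCKET;
        reg.filter.prog_fd = prog_fd;

        /* nr_args must be 1 for IORING_REGISTER_BPF_FILTER */
        return syscall(__NR_io_uring_register, ring_fd,
                       IORING_REGISTER_BPF_FILTER, &reg, 1);
}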
Sending this out as an RFC for comments. I think this provides most of
the functionality needed to filter basically anything. There are still
some rough edges here, notably in the BPF support, as I really don't
know what I'm doing there. But it works for testing, at least... I don't have
a liburing branch for this just yet, let me know if you want some
test/sample code and I'll be happy to toss it over the wall. I'll add a
liburing branch over the weekend for easier experimentation.
Sample based on the above filter:
axboe@m2max-kvm ~> ./io_uring_bpf_loader io_uring_bpf_filter.c.bpf.o
io_uring BPF Socket Filter Test (C-based)
==========================================
io_uring initialized
BPF program loaded successfully from io_uring_bpf_filter.c.bpf.o, fd=4
BPF filter registered for opcode 45
Running tests...
Testing AF_INET TCP (explicit): PASSED (fd=5)
Testing AF_INET TCP (default): PASSED (fd=5)
Testing AF_INET6 TCP (explicit): PASSED (fd=5)
Testing AF_INET6 TCP (default): PASSED (fd=5)
Testing AF_INET UDP: PASSED (correctly rejected)
Testing AF_INET RAW: PASSED (correctly rejected)
Testing AF_UNIX: PASSED (correctly rejected)
Testing AF_INET TCP socket with UDP proto: PASSED (correctly rejected)
or running t/io_uring with IORING_OP_NOP and a filter set. This filter
just allows the opcode, but it's still run on each NOP issued:
axboe@m2max-kvm ~/g/fio (master)> sudo taskset -c 0 t/io_uring -N1 -n1 -E ~/noop_filter.bpf.c.o -B0 -F0 trim.json
submitter=0, tid=2287, file=trim.json, nfiles=1, node=-1
BPF program loaded successfully from /home/axboe/noop_filter.bpf.c.o, fd=5
BPF filter registered for opcode 0
polled=1, fixedbufs=0, register_files=0, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=13.89M, IOS/call=32/32
IOPS=13.90M, IOS/call=32/32
[...]
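The NOP filter itself isn't included in this posting, but an
allow-everything filter along those lines could be as simple as the
following (purely illustrative):

SEC("io_uring_filter")
int noop_filter(struct io_uring_bpf_ctx *ctx)
{
        /* Allow every request that reaches this filter */
        return 1;
}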
Comments welcome! Kernel branch can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux.git/log/?h=io_uring-bpf-restrictions
and sits on top of for-7.0/io_uring
include/linux/bpf.h | 1 +
include/linux/bpf_types.h | 4 +
include/linux/io_uring.h | 2 +-
include/linux/io_uring_types.h | 20 +++-
include/linux/sched.h | 1 +
include/uapi/linux/bpf.h | 1 +
include/uapi/linux/io_uring.h | 46 +++++++
io_uring/Makefile | 1 +
io_uring/bpf_filter.c | 212 +++++++++++++++++++++++++++++++++
io_uring/bpf_filter.h | 41 +++++++
io_uring/io_uring.c | 33 ++++-
io_uring/net.c | 9 ++
io_uring/net.h | 5 +
io_uring/register.c | 133 +++++++++++++++++++--
io_uring/register.h | 2 +
io_uring/tctx.c | 26 ++--
kernel/bpf/syscall.c | 9 ++
kernel/fork.c | 4 +
18 files changed, 527 insertions(+), 23 deletions(-)
--
Jens Axboe
* [PATCH 1/3] io_uring: move ctx->restrictions to be dynamically allocated
2026-01-15 16:36 [PATCHSET RFC v3] Inherited restrictions and BPF filtering Jens Axboe
@ 2026-01-15 16:36 ` Jens Axboe
2026-01-15 16:36 ` [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions Jens Axboe
2026-01-15 16:36 ` [PATCH 3/3] io_uring: allow registration of per-task restrictions Jens Axboe
2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-01-15 16:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
In preparation for being able to share restrictions, move them to
separately allocated data rather than embedding them in the ring. This
also makes it more obvious that they can have a lifetime different from
the ring itself.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/io_uring_types.h | 4 +++-
io_uring/io_uring.c | 12 ++++++-----
io_uring/register.c | 37 +++++++++++++++++++++++++++-------
io_uring/register.h | 2 ++
4 files changed, 42 insertions(+), 13 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 211686ad89fd..c664c84247f1 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -220,6 +220,7 @@ struct io_rings {
};
struct io_restriction {
+ refcount_t refs;
DECLARE_BITMAP(register_op, IORING_REGISTER_LAST);
DECLARE_BITMAP(sqe_op, IORING_OP_LAST);
u8 sqe_flags_allowed;
@@ -342,6 +343,8 @@ struct io_ring_ctx {
struct io_alloc_cache rw_cache;
struct io_alloc_cache cmd_cache;
+ struct io_restriction *restrictions;
+
/*
* Any cancelable uring_cmd is added to this list in
* ->uring_cmd() by io_uring_cmd_insert_cancelable()
@@ -413,7 +416,6 @@ struct io_ring_ctx {
/* Keep this last, we don't need it for the fast path */
struct wait_queue_head poll_wq;
- struct io_restriction restrictions;
/* Stores zcrx object pointers of type struct io_zcrx_ifq */
struct xarray zcrx_ctxs;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2cde22af78a3..eec8da38a596 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2058,15 +2058,15 @@ static inline bool io_check_restriction(struct io_ring_ctx *ctx,
{
if (!ctx->op_restricted)
return true;
- if (!test_bit(req->opcode, ctx->restrictions.sqe_op))
+ if (!test_bit(req->opcode, ctx->restrictions->sqe_op))
return false;
- if ((sqe_flags & ctx->restrictions.sqe_flags_required) !=
- ctx->restrictions.sqe_flags_required)
+ if ((sqe_flags & ctx->restrictions->sqe_flags_required) !=
+ ctx->restrictions->sqe_flags_required)
return false;
- if (sqe_flags & ~(ctx->restrictions.sqe_flags_allowed |
- ctx->restrictions.sqe_flags_required))
+ if (sqe_flags & ~(ctx->restrictions->sqe_flags_allowed |
+ ctx->restrictions->sqe_flags_required))
return false;
return true;
@@ -2850,6 +2850,8 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
percpu_ref_exit(&ctx->refs);
free_uid(ctx->user);
io_req_caches_free(ctx);
+ if (ctx->restrictions)
+ io_put_restrictions(ctx->restrictions);
WARN_ON_ONCE(ctx->nr_req_allocated);
diff --git a/io_uring/register.c b/io_uring/register.c
index 8551f13920dc..6c99b441d886 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -163,9 +163,28 @@ static __cold int io_parse_restrictions(void __user *arg, unsigned int nr_args,
return ret;
}
+void io_put_restrictions(struct io_restriction *res)
+{
+ if (refcount_dec_and_test(&res->refs))
+ kfree(res);
+}
+
+static struct io_restriction *io_alloc_restrictions(void)
+{
+ struct io_restriction *res;
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ if (!res)
+ return ERR_PTR(-ENOMEM);
+
+ refcount_set(&res->refs, 1);
+ return res;
+}
+
static __cold int io_register_restrictions(struct io_ring_ctx *ctx,
void __user *arg, unsigned int nr_args)
{
+ struct io_restriction *res;
int ret;
/* Restrictions allowed only if rings started disabled */
@@ -173,19 +192,23 @@ static __cold int io_register_restrictions(struct io_ring_ctx *ctx,
return -EBADFD;
/* We allow only a single restrictions registration */
- if (ctx->restrictions.op_registered || ctx->restrictions.reg_registered)
+ if (ctx->restrictions)
return -EBUSY;
- ret = io_parse_restrictions(arg, nr_args, &ctx->restrictions);
- /* Reset all restrictions if an error happened */
+ res = io_alloc_restrictions();
+ if (IS_ERR(res))
+ return PTR_ERR(res);
+
+ ret = io_parse_restrictions(arg, nr_args, res);
if (ret < 0) {
- memset(&ctx->restrictions, 0, sizeof(ctx->restrictions));
+ io_put_restrictions(res);
return ret;
}
- if (ctx->restrictions.op_registered)
+ if (res->op_registered)
ctx->op_restricted = 1;
- if (ctx->restrictions.reg_registered)
+ if (res->reg_registered)
ctx->reg_restricted = 1;
+ ctx->restrictions = res;
return 0;
}
@@ -637,7 +660,7 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
if (ctx->reg_restricted && !(ctx->flags & IORING_SETUP_R_DISABLED)) {
opcode = array_index_nospec(opcode, IORING_REGISTER_LAST);
- if (!test_bit(opcode, ctx->restrictions.register_op))
+ if (!test_bit(opcode, ctx->restrictions->register_op))
return -EACCES;
}
diff --git a/io_uring/register.h b/io_uring/register.h
index a5f39d5ef9e0..99c59894d163 100644
--- a/io_uring/register.h
+++ b/io_uring/register.h
@@ -6,4 +6,6 @@ int io_eventfd_unregister(struct io_ring_ctx *ctx);
int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id);
struct file *io_uring_register_get_file(unsigned int fd, bool registered);
+void io_put_restrictions(struct io_restriction *res);
+
#endif
--
2.51.0
* [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions
2026-01-15 16:36 [PATCHSET RFC v3] Inherited restrictions and BPF filtering Jens Axboe
2026-01-15 16:36 ` [PATCH 1/3] io_uring: move ctx->restrictions to be dynamically allocated Jens Axboe
@ 2026-01-15 16:36 ` Jens Axboe
2026-01-15 20:11 ` Jonathan Corbet
2026-01-15 16:36 ` [PATCH 3/3] io_uring: allow registration of per-task restrictions Jens Axboe
2 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-01-15 16:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
This adds support for loading BPF programs with io_uring, which can
restrict the opcodes performed. Unlike IORING_REGISTER_RESTRICTIONS,
using BPF programs allows fine-grained control over both the opcode
in question and other data associated with the request.
Initially, only IORING_OP_SOCKET is supported.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/bpf.h | 1 +
include/linux/bpf_types.h | 4 +
include/linux/io_uring_types.h | 16 +++
include/uapi/linux/bpf.h | 1 +
include/uapi/linux/io_uring.h | 37 ++++++
io_uring/Makefile | 1 +
io_uring/bpf_filter.c | 212 +++++++++++++++++++++++++++++++++
io_uring/bpf_filter.h | 41 +++++++
io_uring/io_uring.c | 7 ++
io_uring/net.c | 9 ++
io_uring/net.h | 5 +
io_uring/register.c | 33 ++++-
kernel/bpf/syscall.c | 9 ++
13 files changed, 375 insertions(+), 1 deletion(-)
create mode 100644 io_uring/bpf_filter.c
create mode 100644 io_uring/bpf_filter.h
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index e5be698256d1..9b4435452458 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -31,6 +31,7 @@
#include <linux/static_call.h>
#include <linux/memcontrol.h>
#include <linux/cfi.h>
+#include <linux/io_uring_types.h>
#include <asm/rqspinlock.h>
struct bpf_verifier_env;
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index b13de31e163f..c5d58806a1cf 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -83,6 +83,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
BPF_PROG_TYPE(BPF_PROG_TYPE_NETFILTER, netfilter,
struct bpf_nf_ctx, struct bpf_nf_ctx)
#endif
+#ifdef CONFIG_IO_URING
+BPF_PROG_TYPE(BPF_PROG_TYPE_IO_URING, io_uring_filter,
+ struct io_uring_bpf_ctx, struct io_uring_bpf_ctx)
+#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c664c84247f1..4b18dfc63764 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -219,6 +219,17 @@ struct io_rings {
struct io_uring_cqe cqes[] ____cacheline_aligned_in_smp;
};
+#ifdef CONFIG_BPF
+extern const struct bpf_prog_ops io_uring_filter_prog_ops;
+extern const struct bpf_verifier_ops io_uring_filter_verifier_ops;
+#endif
+
+struct io_bpf_filter;
+struct io_bpf_filters {
+ spinlock_t lock;
+ struct io_bpf_filter __rcu **bpf_filters;
+};
+
struct io_restriction {
refcount_t refs;
DECLARE_BITMAP(register_op, IORING_REGISTER_LAST);
@@ -229,6 +240,10 @@ struct io_restriction {
bool op_registered;
/* IORING_REGISTER_* restrictions exist */
bool reg_registered;
+ /* BPF filter restrictions exist */
+ bool bpf_registered;
+ struct io_bpf_filters filters;
+ struct rcu_head rcu_head;
};
struct io_submit_link {
@@ -265,6 +280,7 @@ struct io_ring_ctx {
unsigned int drain_next: 1;
unsigned int op_restricted: 1;
unsigned int reg_restricted: 1;
+ unsigned int bpf_restricted: 1;
unsigned int off_timeout_used: 1;
unsigned int drain_active: 1;
unsigned int has_evfd: 1;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f8d8513eda27..4d43ec003887 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1072,6 +1072,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SK_LOOKUP,
BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
BPF_PROG_TYPE_NETFILTER,
+ BPF_PROG_TYPE_IO_URING,
__MAX_BPF_PROG_TYPE
};
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index b5b23c0d5283..0e1b0871fe5e 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -700,6 +700,9 @@ enum io_uring_register_op {
/* auxiliary zcrx configuration, see enum zcrx_ctrl_op */
IORING_REGISTER_ZCRX_CTRL = 36,
+ /* register bpf filtering programs */
+ IORING_REGISTER_BPF_FILTER = 37,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -1113,6 +1116,40 @@ struct zcrx_ctrl {
};
};
+struct io_uring_bpf_ctx {
+ __u8 opcode;
+ __u8 sqe_flags;
+ __u8 pad[6];
+ __u64 user_data;
+ union {
+ struct {
+ __u32 family;
+ __u32 type;
+ __u32 protocol;
+ } socket;
+ };
+};
+
+struct io_uring_bpf_filter {
+ __u32 opcode; /* io_uring opcode to filter */
+ __u32 flags;
+ __s32 prog_fd; /* BPF program fd */
+ __u32 reserved[3];
+};
+
+enum {
+ IO_URING_BPF_CMD_FILTER = 1,
+};
+
+struct io_uring_bpf {
+ __u16 cmd_type; /* IO_URING_BPF_* values */
+ __u16 cmd_flags; /* none so far */
+ __u32 resv;
+ union {
+ struct io_uring_bpf_filter filter;
+ };
+};
+
#ifdef __cplusplus
}
#endif
diff --git a/io_uring/Makefile b/io_uring/Makefile
index bc4e4a3fa0a5..d89bd0cf6363 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -22,3 +22,4 @@ obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
obj-$(CONFIG_NET) += net.o cmd_net.o
obj-$(CONFIG_PROC_FS) += fdinfo.o
obj-$(CONFIG_IO_URING_MOCK_FILE) += mock_file.o
+obj-$(CONFIG_BPF) += bpf_filter.o
diff --git a/io_uring/bpf_filter.c b/io_uring/bpf_filter.c
new file mode 100644
index 000000000000..d31bff1984b7
--- /dev/null
+++ b/io_uring/bpf_filter.c
@@ -0,0 +1,212 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * BPF filter support for io_uring. Supports SQE opcodes for now.
+ */
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/io_uring.h>
+#include <linux/filter.h>
+#include <linux/bpf.h>
+#include <uapi/linux/io_uring.h>
+
+#include "io_uring.h"
+#include "bpf_filter.h"
+#include "net.h"
+
+struct io_bpf_filter {
+ struct bpf_prog *prog;
+ struct io_bpf_filter *next;
+};
+
+static bool io_uring_filter_is_valid_access(int off, int size,
+ enum bpf_access_type type,
+ const struct bpf_prog *prog,
+ struct bpf_insn_access_aux *info)
+{
+ if (type != BPF_READ)
+ return false;
+ if (off < 0 || off >= sizeof(struct io_uring_bpf_ctx))
+ return false;
+ if (off % size != 0)
+ return false;
+
+ return true;
+}
+
+/* Convert context field access if needed */
+static u32 io_uring_filter_convert_ctx_access(enum bpf_access_type type,
+ const struct bpf_insn *si,
+ struct bpf_insn *insn_buf,
+ struct bpf_prog *prog,
+ u32 *target_size)
+{
+ struct bpf_insn *insn = insn_buf;
+
+ /* Direct access is fine - context is read-only and passed directly */
+ switch (si->off) {
+ case offsetof(struct io_uring_bpf_ctx, opcode):
+ case offsetof(struct io_uring_bpf_ctx, sqe_flags):
+ case offsetof(struct io_uring_bpf_ctx, user_data):
+ *insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg,
+ si->src_reg, si->off);
+ break;
+ default:
+ /* Union fields - also direct access */
+ *insn++ = BPF_LDX_MEM(BPF_SIZE(si->code), si->dst_reg,
+ si->src_reg, si->off);
+ break;
+ }
+
+ return insn - insn_buf;
+}
+
+/* BTF ID for the context type */
+BTF_ID_LIST_SINGLE(io_uring_filter_btf_ids, struct, io_uring_bpf_ctx)
+
+/* Program operations */
+const struct bpf_prog_ops io_uring_filter_prog_ops = { };
+
+/* Verifier operations */
+const struct bpf_verifier_ops io_uring_filter_verifier_ops = {
+ .get_func_proto = bpf_base_func_proto,
+ .is_valid_access = io_uring_filter_is_valid_access,
+ .convert_ctx_access = io_uring_filter_convert_ctx_access,
+};
+
+/* Populate BPF context from SQE */
+static void io_uring_populate_bpf_ctx(struct io_uring_bpf_ctx *bctx,
+ struct io_kiocb *req)
+{
+ memset(bctx, 0, sizeof(*bctx));
+ bctx->opcode = req->opcode;
+ bctx->sqe_flags = req->flags & SQE_VALID_FLAGS;
+ bctx->user_data = req->cqe.user_data;
+
+ switch (req->opcode) {
+ case IORING_OP_SOCKET:
+ io_socket_bpf_populate(bctx, req);
+ break;
+ }
+}
+
+/*
+ * Run registered filters for a given opcode. Return of 0 means that the
+ * request should be allowed.
+ */
+int __io_uring_run_bpf_filters(struct io_restriction *res, struct io_kiocb *req)
+{
+ struct io_bpf_filter *filter;
+ struct io_uring_bpf_ctx bpf_ctx;
+ int ret;
+
+ rcu_read_lock();
+ filter = rcu_dereference(res->filters.bpf_filters[req->opcode]);
+ if (!filter || !filter->prog) {
+ rcu_read_unlock();
+ return 0;
+ }
+
+ io_uring_populate_bpf_ctx(&bpf_ctx, req);
+
+ do {
+ ret = bpf_prog_run(filter->prog, &bpf_ctx);
+ if (!ret)
+ break;
+ filter = filter->next;
+ } while (filter);
+
+ rcu_read_unlock();
+ return ret ? 0 : -EACCES;
+}
+
+int io_register_bpf_filter(struct io_restriction *res,
+ struct io_uring_bpf_filter __user *arg)
+{
+ struct io_bpf_filter *filter, *old_filter;
+ struct io_bpf_filter **filters;
+ struct io_uring_bpf reg;
+ struct bpf_prog *prog;
+
+ if (copy_from_user(&reg, arg, sizeof(reg)))
+ return -EFAULT;
+ if (reg.cmd_type != IO_URING_BPF_CMD_FILTER)
+ return -EINVAL;
+ if (reg.cmd_flags || reg.resv)
+ return -EINVAL;
+
+ if (reg.filter.opcode >= IORING_OP_LAST)
+ return -EINVAL;
+ if (reg.filter.flags ||
+ !mem_is_zero(reg.filter.reserved, sizeof(reg.filter.reserved)))
+ return -EINVAL;
+ if (reg.filter.prog_fd < 0)
+ return -EBADF;
+
+ /*
+ * No existing filters, allocate set.
+ */
+ filters = res->filters.bpf_filters;
+ if (!filters) {
+ filters = kcalloc(IORING_OP_LAST, sizeof(struct io_bpf_filter *), GFP_KERNEL_ACCOUNT);
+ if (!filters)
+ return -ENOMEM;
+ }
+
+ prog = bpf_prog_get_type(reg.filter.prog_fd, BPF_PROG_TYPE_IO_URING);
+ if (IS_ERR(prog)) {
+ if (filters != res->filters.bpf_filters)
+ kfree(filters);
+ return PTR_ERR(prog);
+ }
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL_ACCOUNT);
+ if (!filter) {
+ if (filters != res->filters.bpf_filters)
+ kfree(filters);
+ bpf_prog_put(prog);
+ return -ENOMEM;
+ }
+ filter->prog = prog;
+ res->filters.bpf_filters = filters;
+
+ /*
+ * Insert filter - if the current opcode already has a filter
+ * attached, add to the set.
+ */
+ spin_lock(&res->filters.lock);
+ old_filter = rcu_dereference(filters[reg.filter.opcode]);
+ if (old_filter)
+ filter->next = old_filter;
+ rcu_assign_pointer(filters[reg.filter.opcode], filter);
+ spin_unlock(&res->filters.lock);
+ res->bpf_registered = 1;
+ return 0;
+}
+
+void io_uring_put_bpf_filters(struct io_restriction *res)
+{
+ struct io_bpf_filters *filters = &res->filters;
+ int i;
+
+ if (!filters->bpf_filters)
+ return;
+ if (!res->bpf_registered)
+ return;
+
+ res->bpf_registered = 0;
+ for (i = 0; i < IORING_OP_LAST; i++) {
+ struct io_bpf_filter *filter;
+
+ filter = rcu_dereference(filters->bpf_filters[i]);
+ while (filter) {
+ struct io_bpf_filter *next = filter->next;
+
+ if (filter->prog)
+ bpf_prog_put(filter->prog);
+ kfree(filter);
+ filter = next;
+ }
+ }
+ kfree(filters->bpf_filters);
+ filters->bpf_filters = NULL;
+}
diff --git a/io_uring/bpf_filter.h b/io_uring/bpf_filter.h
new file mode 100644
index 000000000000..3cc53e0a3789
--- /dev/null
+++ b/io_uring/bpf_filter.h
@@ -0,0 +1,41 @@
+#ifndef IO_URING_BPF_FILTER_H
+#define IO_URING_BPF_FILTER_H
+
+#ifdef CONFIG_BPF
+
+void io_uring_put_bpf_filters(struct io_restriction *res);
+
+int __io_uring_run_bpf_filters(struct io_restriction *res, struct io_kiocb *req);
+
+int io_register_bpf_filter(struct io_restriction *res,
+ struct io_uring_bpf_filter __user *arg);
+
+static inline int io_uring_run_bpf_filters(struct io_ring_ctx *ctx,
+ struct io_kiocb *req)
+{
+ struct io_restriction *res = ctx->restrictions;
+
+ if (res && res->filters.bpf_filters)
+ return __io_uring_run_bpf_filters(res, req);
+
+ return 0;
+}
+
+#else
+
+static inline int io_register_bpf_filter(struct io_restriction *res,
+ struct io_uring_bpf_filter __user *arg)
+{
+ return -EINVAL;
+}
+static inline int io_uring_run_bpf_filters(struct io_ring_ctx *ctx,
+ struct io_kiocb *req)
+{
+ return 0;
+}
+static inline void io_uring_put_bpf_filters(struct io_restriction *res)
+{
+}
+#endif /* CONFIG_BPF */
+
+#endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index eec8da38a596..80aeb498ec8a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -93,6 +93,7 @@
#include "rw.h"
#include "alloc_cache.h"
#include "eventfd.h"
+#include "bpf_filter.h"
#define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
IOSQE_IO_HARDLINK | IOSQE_ASYNC)
@@ -2261,6 +2262,12 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
if (unlikely(ret))
return io_submit_fail_init(sqe, req, ret);
+ if (unlikely(ctx->bpf_restricted)) {
+ ret = io_uring_run_bpf_filters(ctx, req);
+ if (ret)
+ return io_submit_fail_init(sqe, req, ret);
+ }
+
trace_io_uring_submit_req(req);
/*
diff --git a/io_uring/net.c b/io_uring/net.c
index 519ea055b761..4fcba36bd0bb 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1699,6 +1699,15 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
return IOU_COMPLETE;
}
+void io_socket_bpf_populate(struct io_uring_bpf_ctx *bctx, struct io_kiocb *req)
+{
+ struct io_socket *sock = io_kiocb_to_cmd(req, struct io_socket);
+
+ bctx->socket.family = sock->domain;
+ bctx->socket.type = sock->type;
+ bctx->socket.protocol = sock->protocol;
+}
+
int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_socket *sock = io_kiocb_to_cmd(req, struct io_socket);
diff --git a/io_uring/net.h b/io_uring/net.h
index 43e5ce5416b7..eef6b4272d01 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -44,6 +44,7 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags);
int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_socket(struct io_kiocb *req, unsigned int issue_flags);
+void io_socket_bpf_populate(struct io_uring_bpf_ctx *bctx, struct io_kiocb *req);
int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
int io_connect(struct io_kiocb *req, unsigned int issue_flags);
@@ -64,4 +65,8 @@ void io_netmsg_cache_free(const void *entry);
static inline void io_netmsg_cache_free(const void *entry)
{
}
+static inline void io_socket_bpf_populate(struct io_uring_bpf_ctx *bctx,
+ struct io_kiocb *req)
+{
+}
#endif
diff --git a/io_uring/register.c b/io_uring/register.c
index 6c99b441d886..cb006d53a146 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -33,6 +33,7 @@
#include "memmap.h"
#include "zcrx.h"
#include "query.h"
+#include "bpf_filter.h"
#define IORING_MAX_RESTRICTIONS (IORING_RESTRICTION_LAST + \
IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -163,10 +164,19 @@ static __cold int io_parse_restrictions(void __user *arg, unsigned int nr_args,
return ret;
}
+static void io_free_restrictions(struct rcu_head *head)
+{
+ struct io_restriction *res;
+
+ res = container_of(head, struct io_restriction, rcu_head);
+ io_uring_put_bpf_filters(res);
+ kfree(res);
+}
+
void io_put_restrictions(struct io_restriction *res)
{
if (refcount_dec_and_test(&res->refs))
- kfree(res);
+ call_rcu(&res->rcu_head, io_free_restrictions);
}
static struct io_restriction *io_alloc_restrictions(void)
@@ -178,6 +188,7 @@ static struct io_restriction *io_alloc_restrictions(void)
return ERR_PTR(-ENOMEM);
refcount_set(&res->refs, 1);
+ spin_lock_init(&res->filters.lock);
return res;
}
@@ -853,6 +864,26 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
case IORING_REGISTER_ZCRX_CTRL:
ret = io_zcrx_ctrl(ctx, arg, nr_args);
break;
+ case IORING_REGISTER_BPF_FILTER:
+ ret = -EINVAL;
+ if (nr_args != 1)
+ break;
+#ifdef CONFIG_BPF
+ if (!ctx->restrictions) {
+ struct io_restriction *res;
+
+ res = io_alloc_restrictions();
+ if (IS_ERR(res)) {
+ ret = PTR_ERR(res);
+ break;
+ }
+ ctx->restrictions = res;
+ }
+ ret = io_register_bpf_filter(ctx->restrictions, arg);
+ if (ctx->restrictions->bpf_registered)
+ ctx->bpf_restricted = 1;
+#endif
+ break;
default:
ret = -EINVAL;
break;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4ff82144f885..d12537d918f7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2752,6 +2752,10 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
if (expected_attach_type == BPF_NETFILTER)
return 0;
return -EINVAL;
+ case BPF_PROG_TYPE_IO_URING:
+ if (expected_attach_type)
+ return -EINVAL;
+ return 0;
case BPF_PROG_TYPE_SYSCALL:
case BPF_PROG_TYPE_EXT:
if (expected_attach_type)
@@ -2934,6 +2938,7 @@ static int bpf_prog_load(union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
}
if (type != BPF_PROG_TYPE_SOCKET_FILTER &&
type != BPF_PROG_TYPE_CGROUP_SKB &&
+ type != BPF_PROG_TYPE_IO_URING &&
!bpf_cap)
goto put_token;
@@ -4403,6 +4408,10 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
if (attach_type != BPF_NETFILTER)
return -EINVAL;
return 0;
+ case BPF_PROG_TYPE_IO_URING:
+ if (attach_type != 0)
+ return -EINVAL;
+ return 0;
case BPF_PROG_TYPE_PERF_EVENT:
case BPF_PROG_TYPE_TRACEPOINT:
if (attach_type != BPF_PERF_EVENT)
--
2.51.0
* [PATCH 3/3] io_uring: allow registration of per-task restrictions
2026-01-15 16:36 [PATCHSET RFC v3] Inherited restrictions and BPF filtering Jens Axboe
2026-01-15 16:36 ` [PATCH 1/3] io_uring: move ctx->restrictions to be dynamically allocated Jens Axboe
2026-01-15 16:36 ` [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions Jens Axboe
@ 2026-01-15 16:36 ` Jens Axboe
2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-01-15 16:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe
Currently io_uring supports restricting operations on a per-ring basis.
To use those, the ring must be set up in a disabled state by setting
IORING_SETUP_R_DISABLED. Restrictions can then be set for the ring, and
the ring enabled afterwards.
This commit adds support for IORING_REGISTER_RESTRICTIONS_TASK, which
allows registering the same kind of restrictions, but with the task
itself rather than with a specific ring. Once done, any ring the task
creates will inherit these restrictions.
If a restriction filter is registered with a task, it is inherited on
fork by its children. Children may only further restrict operations,
not extend them.
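As a rough sketch, a launcher could apply such a task restriction before
spawning children along these lines (illustrative only; it assumes the
per-task op is issued as a "blind" register call with an fd of -1, uses
the struct io_uring_task_restriction layout added below, and the helper
name is made up):

#include <linux/io_uring.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Restrict the current task (and any children forked afterwards) to
 * NOP, READ and WRITE. Returns 0 on success, -1 with errno set on
 * failure.
 */
static int restrict_task_to_rw(void)
{
        static const __u8 ops[] = { IORING_OP_NOP, IORING_OP_READ, IORING_OP_WRITE };
        struct io_uring_task_restriction *tres;
        size_t len;
        int i, ret;

        len = sizeof(*tres) + 3 * sizeof(struct io_uring_restriction);
        tres = calloc(1, len);
        if (!tres)
                return -1;

        tres->nr_res = 3;
        for (i = 0; i < 3; i++) {
                tres->restrictions[i].opcode = IORING_RESTRICTION_SQE_OP;
                tres->restrictions[i].sqe_op = ops[i];
        }

        ret = syscall(__NR_io_uring_register, -1,
                      IORING_REGISTER_RESTRICTIONS_TASK, tres, 1);
        free(tres);
        return ret;
}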
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
include/linux/io_uring.h | 2 +-
include/linux/sched.h | 1 +
include/uapi/linux/io_uring.h | 9 +++++
io_uring/io_uring.c | 14 ++++++++
io_uring/register.c | 65 +++++++++++++++++++++++++++++++++++
io_uring/tctx.c | 26 +++++++++-----
kernel/fork.c | 4 +++
7 files changed, 111 insertions(+), 10 deletions(-)
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 85fe4e6b275c..cfd2f4c667ee 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -25,7 +25,7 @@ static inline void io_uring_task_cancel(void)
}
static inline void io_uring_free(struct task_struct *tsk)
{
- if (tsk->io_uring)
+ if (tsk->io_uring || tsk->io_uring_restrict)
__io_uring_free(tsk);
}
#else
diff --git a/include/linux/sched.h b/include/linux/sched.h
index d395f2810fac..9abbd11bb87c 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1190,6 +1190,7 @@ struct task_struct {
#ifdef CONFIG_IO_URING
struct io_uring_task *io_uring;
+ struct io_restriction *io_uring_restrict;
#endif
/* Namespaces: */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0e1b0871fe5e..dcf70e064e45 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -703,6 +703,8 @@ enum io_uring_register_op {
/* register bpf filtering programs */
IORING_REGISTER_BPF_FILTER = 37,
+ IORING_REGISTER_RESTRICTIONS_TASK = 38,
+
/* this goes last */
IORING_REGISTER_LAST,
@@ -808,6 +810,13 @@ struct io_uring_restriction {
__u32 resv2[3];
};
+struct io_uring_task_restriction {
+ __u16 flags;
+ __u16 nr_res;
+ __u32 resv[3];
+ __DECLARE_FLEX_ARRAY(struct io_uring_restriction, restrictions);
+};
+
struct io_uring_clock_register {
__u32 clockid;
__u32 __resv[3];
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 80aeb498ec8a..f1625e4c6c7b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3623,6 +3623,20 @@ static __cold int io_uring_create(struct io_ctx_config *config)
else
ctx->notify_method = TWA_SIGNAL;
+ /*
+ * If the current task has restrictions enabled, then copy them to
+ * our newly created ring and mark it as registered.
+ */
+ if (current->io_uring_restrict) {
+ struct io_restriction *res = current->io_uring_restrict;
+
+ refcount_inc(&res->refs);
+ ctx->restrictions = res;
+ ctx->op_restricted = res->op_registered;
+ ctx->reg_restricted = res->reg_registered;
+ ctx->bpf_restricted = res->bpf_registered;
+ }
+
/*
* This is just grabbed for accounting purposes. When a process exits,
* the mm is exited and dropped before the files, hence we need to hang
diff --git a/io_uring/register.c b/io_uring/register.c
index cb006d53a146..00b9508b18f9 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -223,6 +223,67 @@ static __cold int io_register_restrictions(struct io_ring_ctx *ctx,
return 0;
}
+static int io_register_restrictions_task(void __user *arg, unsigned int nr_args)
+{
+ struct io_uring_task_restriction __user *ures = arg;
+ struct io_uring_task_restriction tres;
+ struct io_restriction *res;
+ int ret;
+
+ /* Disallow if task already has registered restrictions */
+ if (current->io_uring_restrict)
+ return -EPERM;
+ if (nr_args != 1)
+ return -EINVAL;
+
+ if (copy_from_user(&tres, arg, sizeof(tres)))
+ return -EFAULT;
+
+ if (tres.flags)
+ return -EINVAL;
+ if (!mem_is_zero(tres.resv, sizeof(tres.resv)))
+ return -EINVAL;
+
+ res = io_alloc_restrictions();
+ if (IS_ERR(res))
+ return PTR_ERR(res);
+
+ ret = io_parse_restrictions(ures->restrictions, tres.nr_res, res);
+ if (ret < 0) {
+ kfree(res);
+ return ret;
+ }
+ current->io_uring_restrict = res;
+ return 0;
+}
+
+static int io_register_bpf_filter_task(void __user *arg, unsigned int nr_args)
+{
+ struct io_restriction *res;
+ int ret;
+
+ if (nr_args != 1)
+ return -EINVAL;
+
+ /* If no task restrictions exist, setup a new set */
+ res = current->io_uring_restrict;
+ if (!res) {
+ res = io_alloc_restrictions();
+ if (IS_ERR(res))
+ return PTR_ERR(res);
+ }
+
+ ret = io_register_bpf_filter(res, arg);
+ if (ret) {
+ if (res != current->io_uring_restrict)
+ io_put_restrictions(res);
+ return ret;
+ }
+ if (!current->io_uring_restrict)
+ current->io_uring_restrict = res;
+ return 0;
+}
+
static int io_register_enable_rings(struct io_ring_ctx *ctx)
{
if (!(ctx->flags & IORING_SETUP_R_DISABLED))
@@ -955,6 +1016,10 @@ static int io_uring_register_blind(unsigned int opcode, void __user *arg,
return io_uring_register_send_msg_ring(arg, nr_args);
case IORING_REGISTER_QUERY:
return io_query(arg, nr_args);
+ case IORING_REGISTER_RESTRICTIONS_TASK:
+ return io_register_restrictions_task(arg, nr_args);
+ case IORING_REGISTER_BPF_FILTER:
+ return io_register_bpf_filter_task(arg, nr_args);
}
return -EINVAL;
}
diff --git a/io_uring/tctx.c b/io_uring/tctx.c
index 5b66755579c0..d45785dcd2e3 100644
--- a/io_uring/tctx.c
+++ b/io_uring/tctx.c
@@ -11,6 +11,8 @@
#include "io_uring.h"
#include "tctx.h"
+#include "register.h"
+#include "bpf_filter.h"
static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx,
struct task_struct *task)
@@ -54,16 +56,22 @@ void __io_uring_free(struct task_struct *tsk)
* node is stored in the xarray. Until that gets sorted out, attempt
* an iteration here and warn if any entries are found.
*/
- xa_for_each(&tctx->xa, index, node) {
- WARN_ON_ONCE(1);
- break;
- }
- WARN_ON_ONCE(tctx->io_wq);
- WARN_ON_ONCE(tctx->cached_refs);
+ if (tctx) {
+ xa_for_each(&tctx->xa, index, node) {
+ WARN_ON_ONCE(1);
+ break;
+ }
+ WARN_ON_ONCE(tctx->io_wq);
+ WARN_ON_ONCE(tctx->cached_refs);
- percpu_counter_destroy(&tctx->inflight);
- kfree(tctx);
- tsk->io_uring = NULL;
+ percpu_counter_destroy(&tctx->inflight);
+ kfree(tctx);
+ tsk->io_uring = NULL;
+ }
+ if (tsk->io_uring_restrict) {
+ io_put_restrictions(tsk->io_uring_restrict);
+ tsk->io_uring_restrict = NULL;
+ }
}
__cold int io_uring_alloc_task_context(struct task_struct *task,
diff --git a/kernel/fork.c b/kernel/fork.c
index b1f3915d5f8e..da8fd6fd384c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -97,6 +97,7 @@
#include <linux/kasan.h>
#include <linux/scs.h>
#include <linux/io_uring.h>
+#include <linux/io_uring_types.h>
#include <linux/bpf.h>
#include <linux/stackprotector.h>
#include <linux/user_events.h>
@@ -2129,6 +2130,8 @@ __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_IO_URING
p->io_uring = NULL;
+ if (p->io_uring_restrict)
+ refcount_inc(&p->io_uring_restrict->refs);
#endif
p->default_timer_slack_ns = current->timer_slack_ns;
@@ -2525,6 +2528,7 @@ __latent_entropy struct task_struct *copy_process(
mpol_put(p->mempolicy);
#endif
bad_fork_cleanup_delayacct:
+ io_uring_free(p);
delayacct_tsk_free(p);
bad_fork_cleanup_count:
dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
--
2.51.0
* Re: [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions
2026-01-15 16:36 ` [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions Jens Axboe
@ 2026-01-15 20:11 ` Jonathan Corbet
2026-01-15 21:02 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Corbet @ 2026-01-15 20:11 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Jens Axboe
Jens Axboe <axboe@kernel.dk> writes:
> This adds support for loading BPF programs with io_uring, which can
> restrict the opcodes performed. Unlike IORING_REGISTER_RESTRICTIONS,
> using BPF programs allow fine grained control over both the opcode
> in question, as well as other data associated with the request.
> Initially only IORING_OP_SOCKET is supported.
A minor nit...
[...]
> +/*
> + * Run registered filters for a given opcode. Return of 0 means that the
> + * request should be allowed.
> + */
> +int __io_uring_run_bpf_filters(struct io_restriction *res, struct io_kiocb *req)
> +{
That comment seems to contradict the actual logic in this function, as
well as the example BPF program in the cover letter. So
s/allowed/blocked/?
Thanks,
jon
* Re: [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions
2026-01-15 20:11 ` Jonathan Corbet
@ 2026-01-15 21:02 ` Jens Axboe
2026-01-15 21:05 ` Jonathan Corbet
0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2026-01-15 21:02 UTC (permalink / raw)
To: Jonathan Corbet, io-uring
On 1/15/26 1:11 PM, Jonathan Corbet wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> This adds support for loading BPF programs with io_uring, which can
>> restrict the opcodes performed. Unlike IORING_REGISTER_RESTRICTIONS,
>> using BPF programs allow fine grained control over both the opcode
>> in question, as well as other data associated with the request.
>> Initially only IORING_OP_SOCKET is supported.
>
> A minor nit...
>
> [...]
>
>> +/*
>> + * Run registered filters for a given opcode. Return of 0 means that the
>> + * request should be allowed.
>> + */
>> +int __io_uring_run_bpf_filters(struct io_restriction *res, struct io_kiocb *req)
>> +{
>
> That comment seems to contradict the actual logic in this function, as
> well as the example BPF program in the cover letter. So
> s/allowed/blocked/?
Are you talking about __io_uring_run_bpf_filters() or the filters
themselves? For the former, 0 does indeed mean "yep let it rip", for the
filters it's 0/1 where 0 is deny and 1 is allow. I should probably make
the comment more explicit on that front...
--
Jens Axboe
* Re: [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions
2026-01-15 21:02 ` Jens Axboe
@ 2026-01-15 21:05 ` Jonathan Corbet
2026-01-15 21:08 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Corbet @ 2026-01-15 21:05 UTC (permalink / raw)
To: Jens Axboe, io-uring
Jens Axboe <axboe@kernel.dk> writes:
> On 1/15/26 1:11 PM, Jonathan Corbet wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>>
>>> This adds support for loading BPF programs with io_uring, which can
>>> restrict the opcodes performed. Unlike IORING_REGISTER_RESTRICTIONS,
>>> using BPF programs allow fine grained control over both the opcode
>>> in question, as well as other data associated with the request.
>>> Initially only IORING_OP_SOCKET is supported.
>>
>> A minor nit...
>>
>> [...]
>>
>>> +/*
>>> + * Run registered filters for a given opcode. Return of 0 means that the
>>> + * request should be allowed.
>>> + */
>>> +int __io_uring_run_bpf_filters(struct io_restriction *res, struct io_kiocb *req)
>>> +{
>>
>> That comment seems to contradict the actual logic in this function, as
>> well as the example BPF program in the cover letter. So
>> s/allowed/blocked/?
>
> Are you talking about __io_uring_run_bpf_filters() or the filters
> themselves? For the former, 0 does indeed mean "yep let it rip", for the
> filters it's 0/1 where 0 is deny and 1 is allow. I should probably make
> the comment more explicit on that front...
Ah, yes, I got confused between the two, sorry for the noise.
jon
* Re: [PATCH 2/3] io_uring: add support for BPF filtering for opcode restrictions
2026-01-15 21:05 ` Jonathan Corbet
@ 2026-01-15 21:08 ` Jens Axboe
0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2026-01-15 21:08 UTC (permalink / raw)
To: Jonathan Corbet, io-uring
On 1/15/26 2:05 PM, Jonathan Corbet wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> On 1/15/26 1:11 PM, Jonathan Corbet wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> This adds support for loading BPF programs with io_uring, which can
>>>> restrict the opcodes performed. Unlike IORING_REGISTER_RESTRICTIONS,
>>>> using BPF programs allow fine grained control over both the opcode
>>>> in question, as well as other data associated with the request.
>>>> Initially only IORING_OP_SOCKET is supported.
>>>
>>> A minor nit...
>>>
>>> [...]
>>>
>>>> +/*
>>>> + * Run registered filters for a given opcode. Return of 0 means that the
>>>> + * request should be allowed.
>>>> + */
>>>> +int __io_uring_run_bpf_filters(struct io_restriction *res, struct io_kiocb *req)
>>>> +{
>>>
>>> That comment seems to contradict the actual logic in this function, as
>>> well as the example BPF program in the cover letter. So
>>> s/allowed/blocked/?
>>
>> Are you talking about __io_uring_run_bpf_filters() or the filters
>> themselves? For the former, 0 does indeed mean "yep let it rip", for the
>> filters it's 0/1 where 0 is deny and 1 is allow. I should probably make
>> the comment more explicit on that front...
>
> Ah, yes, I got confused between the two, sorry for the noise.
It's useful, I expanded the comment now:
/*
 * Run registered filters for a given opcode. For filters, a return of 0
 * denies execution of the request and a return of 1 allows it. If any
 * filter for an opcode returns 0, filter processing is stopped and the
 * request is denied.
 *
 * __io_uring_run_bpf_filters() returns 0 on success, allowing the request
 * to run, and -EACCES when the request is denied.
 */
I am making one more change in this patch though - to deny by default.
If restrictions are registered with BPF, we should probably have the
same logic as the classic restrictions, where if an opcode isn't
explicitly enabled, it is denied. That makes it easier to future-proof a
filter set. I do want to do that without needing a dummy BPF program
attached to each one, however...
--
Jens Axboe