From: Stefan Metzmacher <metze@samba.org>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org,
Caleb Sander Mateos <csander@purestorage.com>,
Akilesh Kailash <akailash@google.com>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
Date: Thu, 13 Nov 2025 12:19:33 +0100
Message-ID: <05a37623-c78c-4a86-a9f3-c78ce133fa66@samba.org>
In-Reply-To: <aRW6LfJi63X7wbPm@fedora>
On 13.11.25 at 11:59, Ming Lei wrote:
> On Thu, Nov 13, 2025 at 11:32:56AM +0100, Stefan Metzmacher wrote:
>> Hi Ming,
>>
>>> io_uring can be extended with bpf struct_ops in the following ways:
>>>
>>> 1) add new io_uring operation from application
>>> - one typical use case is for operating device zero-copy buffer, which
>>> belongs to kernel, and not visible or too expensive to export to
>>> userspace, such as supporting copy data from this buffer to userspace,
>>> decompressing data to zero-copy buffer in Android case[1][2], or
>>> checksum/decrypting.
>>>
>>> [1] https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf
>>>
>>> 2) extend 64 byte SQE, since bpf map can be used to store IO data
>>> conveniently
>>>
>>> 3) communicate in IO chain, since bpf map can be shared among IOs,
>>> when one bpf IO is completed, data can be written to IO chain wide
>>> bpf map, then the following bpf IO can retrieve the data from this bpf
>>> map, this way is more flexible than io_uring built-in buffer
>>>
>>> 4) pretty handy to inject error for test purpose
>>>
>>> bpf struct_ops is a very handy way to attach a bpf prog to the kernel, and
>>> this patch simply wires the existing io_uring operation callbacks to the
>>> added uring bpf struct_ops, so an application can define its own uring bpf
>>> operations.
>>
>> This sounds useful to me.
>>
>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>> ---
>>> include/uapi/linux/io_uring.h | 9 ++
>>> io_uring/bpf.c | 271 +++++++++++++++++++++++++++++++++-
>>> io_uring/io_uring.c | 1 +
>>> io_uring/io_uring.h | 3 +-
>>> io_uring/uring_bpf.h | 30 ++++
>>> 5 files changed, 311 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>> index b8c49813b4e5..94d2050131ac 100644
>>> --- a/include/uapi/linux/io_uring.h
>>> +++ b/include/uapi/linux/io_uring.h
>>> @@ -74,6 +74,7 @@ struct io_uring_sqe {
>>> __u32 install_fd_flags;
>>> __u32 nop_flags;
>>> __u32 pipe_flags;
>>> + __u32 bpf_op_flags;
>>> };
>>> __u64 user_data; /* data to be passed back at completion time */
>>> /* pack this to avoid bogus arm OABI complaints */
>>> @@ -427,6 +428,13 @@ enum io_uring_op {
>>> #define IORING_RECVSEND_BUNDLE (1U << 4)
>>> #define IORING_SEND_VECTORIZED (1U << 5)
>>> +/*
>>> + * sqe->bpf_op_flags top 8bits is for storing bpf op
>>> + * The other 24bits are used for bpf prog
>>> + */
>>> +#define IORING_BPF_OP_BITS (8)
>>> +#define IORING_BPF_OP_SHIFT (24)
>>> +
>>> /*
>>> * cqe.res for IORING_CQE_F_NOTIF if
>>> * IORING_SEND_ZC_REPORT_USAGE was requested
>>> @@ -631,6 +639,7 @@ struct io_uring_params {
>>> #define IORING_FEAT_MIN_TIMEOUT (1U << 15)
>>> #define IORING_FEAT_RW_ATTR (1U << 16)
>>> #define IORING_FEAT_NO_IOWAIT (1U << 17)
>>> +#define IORING_FEAT_BPF (1U << 18)
>>> /*
>>> * io_uring_register(2) opcodes and arguments
>>> diff --git a/io_uring/bpf.c b/io_uring/bpf.c
>>> index bb1e37d1e804..8227be6d5a10 100644
>>> --- a/io_uring/bpf.c
>>> +++ b/io_uring/bpf.c
>>> @@ -4,28 +4,95 @@
>>> #include <linux/kernel.h>
>>> #include <linux/errno.h>
>>> #include <uapi/linux/io_uring.h>
>>> +#include <linux/init.h>
>>> +#include <linux/types.h>
>>> +#include <linux/bpf_verifier.h>
>>> +#include <linux/bpf.h>
>>> +#include <linux/btf.h>
>>> +#include <linux/btf_ids.h>
>>> +#include <linux/filter.h>
>>> #include "io_uring.h"
>>> #include "uring_bpf.h"
>>> +#define MAX_BPF_OPS_COUNT (1 << IORING_BPF_OP_BITS)
>>> +
>>> static DEFINE_MUTEX(uring_bpf_ctx_lock);
>>> static LIST_HEAD(uring_bpf_ctx_list);
>>> +DEFINE_STATIC_SRCU(uring_bpf_srcu);
>>> +static struct uring_bpf_ops bpf_ops[MAX_BPF_OPS_COUNT];
>>
>> This indicates to me that the whole system with all applications in all namespaces
>> need to coordinate in order to use these 256 ops?
>
> So far there are only 62 in-tree io_uring operations defined, so I feel 256
> should be enough.
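For context on the 256 figure: it follows from the quoted uapi hunk, where the top 8 bits of sqe->bpf_op_flags select the bpf op and the remaining 24 bits are left for the bpf prog. A minimal sketch of that split, assuming nothing beyond the two macros in the patch (the mask and the helper names below are hypothetical and not part of the series):

```c
#include <stdint.h>

/* Values copied from the quoted include/uapi/linux/io_uring.h hunk. */
#define IORING_BPF_OP_BITS	(8)
#define IORING_BPF_OP_SHIFT	(24)

/* Same definition as in the quoted io_uring/bpf.c hunk: 1 << 8 == 256. */
#define MAX_BPF_OPS_COUNT	(1 << IORING_BPF_OP_BITS)

/* Hypothetical mask for the low 24 bits; not defined by the patch. */
#define BPF_OP_LOW_MASK		((1U << IORING_BPF_OP_SHIFT) - 1)

/* Hypothetical helper: combine an op id (0..255) with 24 bits of prog data. */
static inline uint32_t bpf_op_flags_pack(uint8_t op, uint32_t prog_data)
{
	return ((uint32_t)op << IORING_BPF_OP_SHIFT) |
	       (prog_data & BPF_OP_LOW_MASK);
}

/* Hypothetical helper: extract the op id from the top 8 bits. */
static inline uint8_t bpf_op_flags_op(uint32_t v)
{
	return (uint8_t)(v >> IORING_BPF_OP_SHIFT);
}

/* Hypothetical helper: extract the low 24 bits meant for the bpf prog. */
static inline uint32_t bpf_op_flags_data(uint32_t v)
{
	return v & BPF_OP_LOW_MASK;
}
```

With a single global bpf_ops[] table, the 8-bit op id is an index that every application on the system shares, which is where the coordination concern comes from.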
>
>> I think in order to have something useful, this should be per
>> struct io_ring_ctx and each application should be able to load
>> its own bpf programs.
>
> per-ctx requirement looks reasonable, and it shouldn't be hard to
> support.
>
>>
>> Something that uses bpf_prog_get_type() based on a bpf_fd
>> like SIOCKCMATTACH in net/kcm/kcmsock.c.
>
> I considered per-ctx prog before, one drawback is the prog can't be shared
> among io_ring_ctx, which could waste memory. In my ublk case, there can be
> lots of devices sharing same bpf prog.
Can't the ublk instances coordinate and use the same bpf_fd?
New instances could request it via a unix socket and SCM_RIGHTS
from a long-running loader process. On the other hand, do they
really want to share?
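To make the SCM_RIGHTS idea concrete, here is a rough sketch of passing one file descriptor over a connected AF_UNIX socket. It only shows the generic fd-passing mechanism from unix(7) and cmsg(3); it does not load a bpf program, and the helper names send_fd() and recv_fd() are made up for illustration. In the scenario above, the descriptor being passed would be the bpf_fd held by the loader process.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;	/* one byte of real data to carry the ancillary data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {		/* union guarantees cmsghdr alignment for the buffer */
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {0};
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = {0};
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	/* Accept only a single SCM_RIGHTS control message holding one fd. */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file description as the sender's, so every process that receives the fd this way would end up referring to the same loaded object.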
I don't know much about bpf in detail, so I'm wondering about your
example from
https://github.com/ming1/liburing/commit/625b69ddde15ad80e078c684ba166f49c1174fa4
Would memory_map be global to the whole system, or would
each loaded instance of the program have its own instance of memory_map?
Thanks!
metze