public inbox for io-uring@vger.kernel.org
From: Caleb Sander Mateos <csander@purestorage.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Stefan Metzmacher <metze@samba.org>, Jens Axboe <axboe@kernel.dk>,
	io-uring@vger.kernel.org,  Akilesh Kailash <akailash@google.com>,
	bpf@vger.kernel.org,  Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops
Date: Mon, 8 Dec 2025 14:45:35 -0800	[thread overview]
Message-ID: <CADUfDZqpTSihuYnTqUbtctrX4OGT7Szr-_wWb4xLgg11RcwYkA@mail.gmail.com>
In-Reply-To: <aRabTk29_v6p92mY@fedora>

On Thu, Nov 13, 2025 at 7:00 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Thu, Nov 13, 2025 at 12:19:33PM +0100, Stefan Metzmacher wrote:
> > Am 13.11.25 um 11:59 schrieb Ming Lei:
> > > On Thu, Nov 13, 2025 at 11:32:56AM +0100, Stefan Metzmacher wrote:
> > > > Hi Ming,
> > > >
> > > > > io_uring can be extended with bpf struct_ops in the following ways:
> > > > >
> > > > > 1) add new io_uring operations from the application
> > > > > - one typical use case is operating on a device zero-copy buffer,
> > > > > which belongs to the kernel and is not visible, or is too expensive,
> > > > > to export to userspace: for example, copying data from this buffer to
> > > > > userspace, decompressing data into the zero-copy buffer in the Android
> > > > > case[1][2], or checksumming/decrypting.
> > > > >
> > > > > [1] https://lpc.events/event/18/contributions/1710/attachments/1440/3070/LPC2024_ublk_zero_copy.pdf
> > > > >
> > > > > 2) extend the 64-byte SQE, since a bpf map can conveniently be used
> > > > > to store IO data
> > > > >
> > > > > 3) communicate within an IO chain: since a bpf map can be shared
> > > > > among IOs, when one bpf IO completes, its data can be written to a
> > > > > chain-wide bpf map, and the following bpf IO can retrieve the data
> > > > > from that map; this is more flexible than io_uring's built-in buffer
> > > > >
> > > > > 4) pretty handy for injecting errors for testing purposes
> > > > >
> > > > > bpf struct_ops is a very handy way to attach a bpf prog to the
> > > > > kernel, and this patch simply wires the existing io_uring operation
> > > > > callbacks to the added uring bpf struct_ops, so an application can
> > > > > define its own uring bpf operations.
> > > >
> > > > This sounds useful to me.
> > > >
> > > > > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > > > > ---
> > > > >    include/uapi/linux/io_uring.h |   9 ++
> > > > >    io_uring/bpf.c                | 271 +++++++++++++++++++++++++++++++++-
> > > > >    io_uring/io_uring.c           |   1 +
> > > > >    io_uring/io_uring.h           |   3 +-
> > > > >    io_uring/uring_bpf.h          |  30 ++++
> > > > >    5 files changed, 311 insertions(+), 3 deletions(-)
> > > > >
> > > > > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > > > > index b8c49813b4e5..94d2050131ac 100644
> > > > > --- a/include/uapi/linux/io_uring.h
> > > > > +++ b/include/uapi/linux/io_uring.h
> > > > > @@ -74,6 +74,7 @@ struct io_uring_sqe {
> > > > >                 __u32           install_fd_flags;
> > > > >                 __u32           nop_flags;
> > > > >                 __u32           pipe_flags;
> > > > > +               __u32           bpf_op_flags;
> > > > >         };
> > > > >         __u64   user_data;      /* data to be passed back at completion time */
> > > > >         /* pack this to avoid bogus arm OABI complaints */
> > > > > @@ -427,6 +428,13 @@ enum io_uring_op {
> > > > >    #define IORING_RECVSEND_BUNDLE               (1U << 4)
> > > > >    #define IORING_SEND_VECTORIZED               (1U << 5)
> > > > > +/*
> > > > > + * sqe->bpf_op_flags           top 8 bits store the bpf op
> > > > > + *                             the other 24 bits are used by the bpf prog
> > > > > + */
> > > > > +#define IORING_BPF_OP_BITS     (8)
> > > > > +#define IORING_BPF_OP_SHIFT    (24)
> > > > > +
> > > > >    /*
> > > > >     * cqe.res for IORING_CQE_F_NOTIF if
> > > > >     * IORING_SEND_ZC_REPORT_USAGE was requested
> > > > > @@ -631,6 +639,7 @@ struct io_uring_params {
> > > > >    #define IORING_FEAT_MIN_TIMEOUT              (1U << 15)
> > > > >    #define IORING_FEAT_RW_ATTR          (1U << 16)
> > > > >    #define IORING_FEAT_NO_IOWAIT                (1U << 17)
> > > > > +#define IORING_FEAT_BPF                        (1U << 18)
> > > > >    /*
> > > > >     * io_uring_register(2) opcodes and arguments
> > > > > diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> > > > > index bb1e37d1e804..8227be6d5a10 100644
> > > > > --- a/io_uring/bpf.c
> > > > > +++ b/io_uring/bpf.c
> > > > > @@ -4,28 +4,95 @@
> > > > >    #include <linux/kernel.h>
> > > > >    #include <linux/errno.h>
> > > > >    #include <uapi/linux/io_uring.h>
> > > > > +#include <linux/init.h>
> > > > > +#include <linux/types.h>
> > > > > +#include <linux/bpf_verifier.h>
> > > > > +#include <linux/bpf.h>
> > > > > +#include <linux/btf.h>
> > > > > +#include <linux/btf_ids.h>
> > > > > +#include <linux/filter.h>
> > > > >    #include "io_uring.h"
> > > > >    #include "uring_bpf.h"
> > > > > +#define MAX_BPF_OPS_COUNT      (1 << IORING_BPF_OP_BITS)
> > > > > +
> > > > >    static DEFINE_MUTEX(uring_bpf_ctx_lock);
> > > > >    static LIST_HEAD(uring_bpf_ctx_list);
> > > > > +DEFINE_STATIC_SRCU(uring_bpf_srcu);
> > > > > +static struct uring_bpf_ops bpf_ops[MAX_BPF_OPS_COUNT];
> > > >
> > > > This indicates to me that the whole system, with all applications in
> > > > all namespaces, needs to coordinate in order to use these 256 ops?
> > >
> > > So far there are only 62 in-tree io_uring operations defined, so I feel
> > > 256 should be enough.
> > >
> > > > I think in order to have something useful, this should be per
> > > > struct io_ring_ctx and each application should be able to load
> > > > its own bpf programs.
> > >
> > > The per-ctx requirement looks reasonable, and it shouldn't be hard to
> > > support.
> > >
> > > >
> > > > Something that uses bpf_prog_get_type() based on a bpf_fd
> > > > like SIOCKCMATTACH in net/kcm/kcmsock.c.
> > >
> > > I considered a per-ctx prog before; one drawback is that the prog can't be
> > > shared among io_ring_ctx instances, which could waste memory. In my ublk
> > > case, there can be lots of devices sharing the same bpf prog.
> >
> > Can't the ublk instances coordinate and use the same bpf_fd?
> > New instances could request it via a unix socket and SCM_RIGHTS
> > from a long-running loader process. On the other hand, do they
> > really want to share?
>
> struct_ops is typically registered once and used everywhere, as in
> sched_ext and the socket example.
>
> This patch follows that usage, so every io_uring application can access it
> like the in-kernel operations.
>
> I can understand the requirement for per-io-ring-ctx struct_ops, which
> won't cause conflict among different applications.
>
> For example, with ublk/raid5 there may be 100 such devices; each device is
> created in a dedicated process and uses its own io_uring, so 100 copies of
> the same struct_ops prog are registered in memory. If the struct_ops prog
> is registered per-io-ring-ctx, it may not be shareable via `bpf_fd`, IMO.

I agree with Stefan that a global IORING_OP_BPF op-to-BPF-program
mapping will be difficult to coordinate between processes. For
example, consider two different ublk server programs that each want to
use a different BPF program. Ideally, each should be an independent
program and not need to know the op ids used by the other.
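
For reference, the op id here is the one the patch packs into the top
8 bits of sqe->bpf_op_flags. A minimal sketch of that decoding, with
helper names invented purely for illustration:

    static inline __u8 io_bpf_op(__u32 bpf_op_flags)
    {
            /* the top IORING_BPF_OP_BITS bits select the bpf op */
            return bpf_op_flags >> IORING_BPF_OP_SHIFT;
    }

    static inline __u32 io_bpf_prog_flags(__u32 bpf_op_flags)
    {
            /* the low 24 bits are passed through to the bpf prog */
            return bpf_op_flags & ((1U << IORING_BPF_OP_SHIFT) - 1);
    }

All 256 op slots index the single global bpf_ops[] table in this
patch, which is why the coordination problem spans the whole system.
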
On the other hand, a multithreaded process may have multiple
io_ring_ctxs and want to use the same IORING_OP_BPF ops with all of
them. So a process-level mapping seems to make the most sense. And
that's exactly the mapping level that we would get from using the BPF
program file descriptor to specify the IORING_OP_BPF op. Additionally,
as Stefan points out, the IORING_OP_BPF program could be shared with
another process by sending the file descriptor using SCM_RIGHTS. And
the file descriptor lookup overhead could be avoided in the I/O path
using io_uring's existing support for registered files.
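
A rough sketch of what fd-based registration could look like, loosely
following the SIOCKCMATTACH pattern in net/kcm/kcmsock.c (the uapi
struct, the ctx->bpf_progs array, and IO_BPF_MAX_OPS below are all
hypothetical, not from this patch):

    struct io_uring_bpf_reg {
            __u32   bpf_fd;         /* fd of a loaded BPF program */
            __u32   op_slot;        /* per-ctx op id chosen by the app */
    };

    static int io_register_bpf_op(struct io_ring_ctx *ctx,
                                  const struct io_uring_bpf_reg *reg)
    {
            struct bpf_prog *prog;

            if (reg->op_slot >= IO_BPF_MAX_OPS)
                    return -EINVAL;

            /* takes a reference on the prog; it must be dropped with
             * bpf_prog_put() when the ctx dies or the slot is cleared */
            prog = bpf_prog_get(reg->bpf_fd);
            if (IS_ERR(prog))
                    return PTR_ERR(prog);

            if (cmpxchg(&ctx->bpf_progs[reg->op_slot], NULL, prog)) {
                    bpf_prog_put(prog);
                    return -EBUSY;
            }
            return 0;
    }

Because each ctx only holds a reference, rings in the same process (or
in different processes, via SCM_RIGHTS) could register the same fd
without duplicating the program in memory.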

Best,
Caleb

>
> >
> > I don't know much about bpf in detail, so I'm wondering about your
> > example from
> > https://github.com/ming1/liburing/commit/625b69ddde15ad80e078c684ba166f49c1174fa4
> >
> > Would memory_map be global in the whole system, or would
> > each loaded instance of the program have its own instance of memory_map?
>
> bpf map is global.
>
> By default, each loaded prog owns the map, but it may be exported to
> others by pinning the map.
>
> It is easy to verify by writing test code in tools/testing/selftests/
>
> But I am not a bpf expert...
>
> Thanks,
> Ming
>
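
P.S. On the memory_map question: pinning is what makes the sharing
work. A minimal userspace sketch with libbpf (the pin path and the map
variable are made up for illustration):

    /* the first loader pins the map once */
    bpf_map__pin(map, "/sys/fs/bpf/uring_memory_map");

    /* later processes reuse it by path instead of creating their own */
    int map_fd = bpf_obj_get("/sys/fs/bpf/uring_memory_map");

Without pinning (or passing an fd), each loaded instance of the prog
gets its own private copy of the map.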


Thread overview: 29+ messages
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
2025-11-04 16:21 ` [PATCH 1/5] io_uring: prepare for extending io_uring with bpf Ming Lei
2025-11-04 16:21 ` [PATCH 2/5] io_uring: bpf: add io_uring_ctx setup for BPF into one list Ming Lei
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
2025-11-07 19:02   ` kernel test robot
2025-11-08  6:53   ` kernel test robot
2025-11-13 10:32   ` Stefan Metzmacher
2025-11-13 10:59     ` Ming Lei
2025-11-13 11:19       ` Stefan Metzmacher
2025-11-14  3:00         ` Ming Lei
2025-12-08 22:45           ` Caleb Sander Mateos [this message]
2025-12-09  3:08             ` Ming Lei
2025-12-10 16:11               ` Caleb Sander Mateos
2025-11-19 14:39   ` Jonathan Corbet
2025-11-20  1:46     ` Ming Lei
2025-11-20  1:51       ` Ming Lei
2025-11-04 16:21 ` [PATCH 4/5] io_uring: bpf: add buffer support for IORING_OP_BPF Ming Lei
2025-11-13 10:42   ` Stefan Metzmacher
2025-11-13 11:04     ` Ming Lei
2025-11-13 11:25       ` Stefan Metzmacher
2025-11-04 16:21 ` [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc Ming Lei
2025-11-07 18:51   ` kernel test robot
2025-11-05 12:47 ` [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Pavel Begunkov
2025-11-05 15:57   ` Ming Lei
2025-11-06 16:03     ` Pavel Begunkov
2025-11-07 15:54       ` Ming Lei
2025-11-11 14:07         ` Pavel Begunkov
2025-11-13  4:18           ` Ming Lei
2025-11-19 19:00             ` Pavel Begunkov
