public inbox for io-uring@vger.kernel.org
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: io-uring@vger.kernel.org, Martin KaFai Lau <martin.lau@linux.dev>,
	 bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC v2 5/5] io_uring/bpf: add basic kfunc helpers
Date: Wed, 11 Jun 2025 19:47:50 -0700	[thread overview]
Message-ID: <CAADnVQJgxnQEL+rtVkp7TB_qQ1JKHiXe=p48tB_-N6F+oaDLyQ@mail.gmail.com> (raw)
In-Reply-To: <c4de7ed6e165f54e2166e84bc88632887d87cfdf.1749214572.git.asml.silence@gmail.com>

On Fri, Jun 6, 2025 at 6:58 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> A handle_events program should be able to parse the CQ and submit new
> requests, add kfuncs to cover that. The only essential kfunc here is
> bpf_io_uring_submit_sqes, and the rest are likely to be removed in a
> non-RFC version in favour of a more general approach.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>  io_uring/bpf.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 86 insertions(+)
>
> diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> index f86b12f280e8..9494e4289605 100644
> --- a/io_uring/bpf.c
> +++ b/io_uring/bpf.c
> @@ -1,12 +1,92 @@
>  #include <linux/mutex.h>
>  #include <linux/bpf_verifier.h>
>
> +#include "io_uring.h"
>  #include "bpf.h"
>  #include "register.h"
>
>  static const struct btf_type *loop_state_type;
>  DEFINE_MUTEX(io_bpf_ctrl_mutex);
>
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc int bpf_io_uring_submit_sqes(struct io_ring_ctx *ctx,
> +                                        unsigned nr)
> +{
> +       return io_submit_sqes(ctx, nr);
> +}
> +
> +__bpf_kfunc int bpf_io_uring_post_cqe(struct io_ring_ctx *ctx,
> +                                     u64 data, u32 res, u32 cflags)
> +{
> +       bool posted;
> +
> +       posted = io_post_aux_cqe(ctx, data, res, cflags);
> +       return posted ? 0 : -ENOMEM;
> +}
> +
> +__bpf_kfunc int bpf_io_uring_queue_sqe(struct io_ring_ctx *ctx,
> +                                       void *bpf_sqe, int mem__sz)
> +{
> +       unsigned tail = ctx->rings->sq.tail;
> +       struct io_uring_sqe *sqe;
> +
> +       if (mem__sz != sizeof(*sqe))
> +               return -EINVAL;
> +
> +       ctx->rings->sq.tail++;
> +       tail &= (ctx->sq_entries - 1);
> +       /* double index for 128-byte SQEs, twice as long */
> +       if (ctx->flags & IORING_SETUP_SQE128)
> +               tail <<= 1;
> +       sqe = &ctx->sq_sqes[tail];
> +       memcpy(sqe, bpf_sqe, sizeof(*sqe));
> +       return 0;
> +}
> +
> +__bpf_kfunc
> +struct io_uring_cqe *bpf_io_uring_get_cqe(struct io_ring_ctx *ctx, u32 idx)
> +{
> +       unsigned max_entries = ctx->cq_entries;
> +       struct io_uring_cqe *cqe_array = ctx->rings->cqes;
> +
> +       if (ctx->flags & IORING_SETUP_CQE32)
> +               max_entries *= 2;
> +       return &cqe_array[idx & (max_entries - 1)];
> +}
> +
> +__bpf_kfunc
> +struct io_uring_cqe *bpf_io_uring_extract_next_cqe(struct io_ring_ctx *ctx)
> +{
> +       struct io_rings *rings = ctx->rings;
> +       unsigned int mask = ctx->cq_entries - 1;
> +       unsigned head = rings->cq.head;
> +       struct io_uring_cqe *cqe;
> +
> +       /* TODO CQE32 */
> +       if (head == rings->cq.tail)
> +               return NULL;
> +
> +       cqe = &rings->cqes[head & mask];
> +       rings->cq.head++;
> +       return cqe;
> +}
> +
> +__bpf_kfunc_end_defs();
> +
> +BTF_KFUNCS_START(io_uring_kfunc_set)
> +BTF_ID_FLAGS(func, bpf_io_uring_submit_sqes, KF_SLEEPABLE);
> +BTF_ID_FLAGS(func, bpf_io_uring_post_cqe, KF_SLEEPABLE);
> +BTF_ID_FLAGS(func, bpf_io_uring_queue_sqe, KF_SLEEPABLE);
> +BTF_ID_FLAGS(func, bpf_io_uring_get_cqe, 0);
> +BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL);
> +BTF_KFUNCS_END(io_uring_kfunc_set)

This is not safe in general.
The verifier doesn't enforce argument safety here.
At a minimum you need to add the KF_TRUSTED_ARGS flag to all of these kfuncs.
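For illustration only (rough and untested), the registration block
above with that flag added would look like:

  BTF_KFUNCS_START(io_uring_kfunc_set)
  BTF_ID_FLAGS(func, bpf_io_uring_submit_sqes, KF_SLEEPABLE | KF_TRUSTED_ARGS);
  BTF_ID_FLAGS(func, bpf_io_uring_post_cqe, KF_SLEEPABLE | KF_TRUSTED_ARGS);
  BTF_ID_FLAGS(func, bpf_io_uring_queue_sqe, KF_SLEEPABLE | KF_TRUSTED_ARGS);
  BTF_ID_FLAGS(func, bpf_io_uring_get_cqe, KF_TRUSTED_ARGS);
  BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_RET_NULL | KF_TRUSTED_ARGS);
  BTF_KFUNCS_END(io_uring_kfunc_set)

With that the verifier only accepts trusted pointers for arguments
like ctx, instead of anything with a matching type.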
And once you do that you'll see that the verifier
doesn't recognize the cqe returned from bpf_io_uring_get_cqe*()
as trusted.
Looking at your example:
https://github.com/axboe/liburing/commit/706237127f03e15b4cc9c7c31c16d34dbff37cdc
it doesn't care about the contents of the cqe and doesn't pass it further.
So it's sort-of ok-ish right now,
but if you need to pass the cqe to another kfunc
you would need to add an open-coded iterator for cqes
with the appropriate KF_ITER* flags
or maybe add acquire/release semantics for cqe.
Like, get_cqe would be KF_ACQUIRE, and you'd need
a matching KF_RELEASE kfunc,
so that the 'cqe' is not lost.
Then the 'cqe' will be trusted and you can pass it as an actual 'cqe'
into another kfunc.
Without KF_ACQUIRE the verifier sees that the get_cqe*() kfuncs
return 'struct io_uring_cqe *', which is ok for tracing
or for passing into kfuncs like bpf_io_uring_queue_sqe()
that don't care about a particular type,
but not ok for full tracking of objects.
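To make the acquire/release option concrete, here is a rough sketch
(bpf_io_uring_put_cqe() is a made-up name, it's not in your patch):

  /* sketch only: release kfunc so the verifier can track the cqe reference */
  __bpf_kfunc void bpf_io_uring_put_cqe(struct io_uring_cqe *cqe)
  {
          /* nothing to undo; it only ends the reference */
  }

  BTF_ID_FLAGS(func, bpf_io_uring_extract_next_cqe, KF_ACQUIRE | KF_RET_NULL);
  BTF_ID_FLAGS(func, bpf_io_uring_put_cqe, KF_RELEASE);

Then the bpf prog would do something like:

  struct io_uring_cqe *cqe;

  cqe = bpf_io_uring_extract_next_cqe(ctx);
  if (cqe) {
          /* cqe is trusted here and can be passed to other kfuncs */
          bpf_io_uring_put_cqe(cqe);
  }

and the verifier will complain if any acquired cqe is not released
before the program returns.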

For the next revision please post all selftests, examples,
and bpf progs on the list,
so people don't need to search github.


Thread overview: 22+ messages
2025-06-06 13:57 [RFC v2 0/5] BPF controlled io_uring Pavel Begunkov
2025-06-06 13:57 ` [RFC v2 1/5] io_uring: add struct for state controlling cqwait Pavel Begunkov
2025-06-06 13:57 ` [RFC v2 2/5] io_uring/bpf: add stubs for bpf struct_ops Pavel Begunkov
2025-06-06 14:25   ` Jens Axboe
2025-06-06 14:28     ` Jens Axboe
2025-06-06 13:58 ` [RFC v2 3/5] io_uring/bpf: implement struct_ops registration Pavel Begunkov
2025-06-06 14:57   ` Jens Axboe
2025-06-06 20:00     ` Pavel Begunkov
2025-06-06 21:07       ` Jens Axboe
2025-06-06 13:58 ` [RFC v2 4/5] io_uring/bpf: add handle events callback Pavel Begunkov
2025-06-12  2:28   ` Alexei Starovoitov
2025-06-12  9:33     ` Pavel Begunkov
2025-06-12 14:07     ` Jens Axboe
2025-06-06 13:58 ` [RFC v2 5/5] io_uring/bpf: add basic kfunc helpers Pavel Begunkov
2025-06-12  2:47   ` Alexei Starovoitov [this message]
2025-06-12 13:26     ` Pavel Begunkov
2025-06-12 14:06       ` Jens Axboe
2025-06-13  0:25       ` Alexei Starovoitov
2025-06-13 16:12         ` Pavel Begunkov
2025-06-13 19:51           ` Alexei Starovoitov
2025-06-16 20:34             ` Pavel Begunkov
2025-06-06 14:38 ` [RFC v2 0/5] BPF controlled io_uring Jens Axboe
