From: Caleb Sander Mateos <csander@purestorage.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
io-uring@vger.kernel.org, Akilesh Kailash <akailash@google.com>,
bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc
Date: Tue, 30 Dec 2025 20:42:35 -0500 [thread overview]
Message-ID: <CADUfDZqsqiZkSZrkDacviO3sJ61KHmgZ5-BNm2+s6=Sb4yVV7Q@mail.gmail.com> (raw)
In-Reply-To: <20251104162123.1086035-6-ming.lei@redhat.com>
On Tue, Nov 4, 2025 at 8:22 AM Ming Lei <ming.lei@redhat.com> wrote:
>
> Add io_uring_bpf_req_memcpy() kfunc to enable BPF programs to copy
> data between buffers associated with IORING_OP_BPF requests.
>
> The kfunc supports copying between:
> - Plain user buffers (using import_ubuf())
> - Fixed/registered buffers (using io_import_reg_buf())
> - Mixed combinations (plain-to-fixed, fixed-to-plain)
>
> This enables BPF programs to implement data transformation and
> processing operations directly within io_uring's request context,
> avoiding additional userspace copies.
>
> Implementation details:
>
> 1. Add issue_flags tracking in struct uring_bpf_data:
> - Replace __pad field with issue_flags (bytes 36-39)
> - Initialized to 0 before ops->prep_fn()
> - Saved from issue_flags parameter before ops->issue_fn()
> - Required by io_import_reg_buf() for proper async handling
>
> 2. Add buffer preparation infrastructure:
> - io_bpf_prep_buffers() extracts buffer metadata from SQE
> - Buffer 1: plain (addr/len) or fixed (buf_index/addr/len)
> - Buffer 2: plain only (addr3/optlen)
> - Buffer types encoded in sqe->bpf_op_flags bits 23-18
>
> 3. io_uring_bpf_req_memcpy() implementation:
> - Validates buffer IDs (1 or 2) and prevents same-buffer copies
> - Extracts buffer metadata based on buffer ID
> - Sets up iov_iters using import_ubuf() or io_import_reg_buf()
> - Performs page-sized chunked copying via temporary buffer
> - Returns bytes copied or negative error code
>
> Buffer encoding in sqe->bpf_op_flags (32 bits):
> Bits 31-24: BPF operation ID (8 bits)
> Bits 23-21: Buffer 1 type (0=none, 1=plain, 2=fixed)
> Bits 20-18: Buffer 2 type (0=none, 1=plain)
> Bits 17-0: Custom BPF flags (18 bits)
>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> io_uring/bpf.c | 187 +++++++++++++++++++++++++++++++++++++++++++
> io_uring/uring_bpf.h | 11 ++-
> 2 files changed, 197 insertions(+), 1 deletion(-)
>
> diff --git a/io_uring/bpf.c b/io_uring/bpf.c
> index e837c3d57b96..ee4c617e3904 100644
> --- a/io_uring/bpf.c
> +++ b/io_uring/bpf.c
> @@ -109,6 +109,8 @@ int io_uring_bpf_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> if (ret)
> return ret;
>
> + /* ctx->uring_lock is held */
> + data->issue_flags = 0;
> if (ops->prep_fn)
> return ops->prep_fn(data, sqe);
> return -EOPNOTSUPP;
> @@ -126,6 +128,9 @@ static int __io_uring_bpf_issue(struct io_kiocb *req)
>
> int io_uring_bpf_issue(struct io_kiocb *req, unsigned int issue_flags)
> {
> + struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
> +
> + data->issue_flags = issue_flags;
> if (issue_flags & IO_URING_F_UNLOCKED) {
> int idx, ret;
>
> @@ -143,6 +148,8 @@ void io_uring_bpf_fail(struct io_kiocb *req)
> struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
> struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
>
> + /* ctx->uring_lock is held */
> + data->issue_flags = 0;
> if (ops->fail_fn)
> ops->fail_fn(data);
> }
> @@ -152,6 +159,8 @@ void io_uring_bpf_cleanup(struct io_kiocb *req)
> struct uring_bpf_data *data = io_kiocb_to_cmd(req, struct uring_bpf_data);
> struct uring_bpf_ops *ops = uring_bpf_get_ops(data);
>
> + /* ctx->uring_lock is held */
> + data->issue_flags = 0;
> if (ops->cleanup_fn)
> ops->cleanup_fn(data);
> }
> @@ -324,6 +333,104 @@ static struct bpf_struct_ops bpf_uring_bpf_ops = {
> .owner = THIS_MODULE,
> };
>
> +/*
> + * Helper to copy data between two iov_iters using page extraction.
> + * Extracts pages from source iterator and copies them to destination.
> + * Returns number of bytes copied or negative error code.
> + */
> +static ssize_t io_bpf_copy_iters(struct iov_iter *src, struct iov_iter *dst,
> + size_t len)
> +{
> +#define MAX_PAGES_PER_LOOP 32
> + struct page *pages[MAX_PAGES_PER_LOOP];
> + size_t total_copied = 0;
> + bool need_unpin;
> +
> + /* Determine if we'll need to unpin pages later */
> + need_unpin = user_backed_iter(src);
Use iov_iter_extract_will_pin() for clarity?
> +
> + /* Process pages in chunks */
> + while (len > 0) {
> + struct page **page_array = pages;
> + size_t offset, copied = 0;
> + ssize_t extracted;
> + unsigned int nr_pages;
> + size_t chunk_len;
> + int i;
> +
> + /* Extract up to MAX_PAGES_PER_LOOP pages */
> + chunk_len = min_t(size_t, len, MAX_PAGES_PER_LOOP * PAGE_SIZE);
> + extracted = iov_iter_extract_pages(src, &page_array, chunk_len,
> + MAX_PAGES_PER_LOOP, 0, &offset);
> + if (extracted <= 0) {
> + if (total_copied > 0)
> + break;
> + return extracted < 0 ? extracted : -EFAULT;
> + }
> +
> + nr_pages = DIV_ROUND_UP(offset + extracted, PAGE_SIZE);
> +
> + /* Copy pages to destination iterator */
> + for (i = 0; i < nr_pages && copied < extracted; i++) {
> + size_t page_offset = (i == 0) ? offset : 0;
> + size_t page_len = min_t(size_t, extracted - copied,
> + PAGE_SIZE - page_offset);
> + size_t n;
> +
> + n = copy_page_to_iter(pages[i], page_offset, page_len, dst);
> + copied += n;
> + if (n < page_len)
> + break;
> + }
> +
> + /* Clean up extracted pages */
> + if (need_unpin)
> + unpin_user_pages(pages, nr_pages);
Could avoid the page pinning and unpinning cost when copying from a
user iov_iter to a bvec iov_iter by using iov_iter_extract_pages(dst)
and copy_page_from_iter(src). But this optimization could be
implemented later.
> +
> + total_copied += copied;
> + len -= copied;
> +
> + /* Stop if we didn't copy all extracted data */
> + if (copied < extracted)
> + break;
> + }
> +
> + return total_copied;
> +#undef MAX_PAGES_PER_LOOP
> +}
> +
> +/*
> + * Helper to import a buffer into an iov_iter for BPF memcpy operations.
> + * Handles both plain user buffers and fixed/registered buffers.
> + *
> + * @req: io_kiocb request
> + * @iter: output iterator
> + * @buf_type: buffer type (plain or fixed)
> + * @addr: buffer address
> + * @offset: offset into buffer
> + * @len: length from offset
> + * @direction: ITER_SOURCE for source buffer, ITER_DEST for destination
> + * @issue_flags: io_uring issue flags
> + *
> + * Returns 0 on success, negative error code on failure.
> + */
> +static int io_bpf_import_buffer(struct io_kiocb *req, struct iov_iter *iter,
> + u8 buf_type, u64 addr, unsigned int offset,
> + u32 len, int direction, unsigned int issue_flags)
> +{
> + if (buf_type == IORING_BPF_BUF_TYPE_PLAIN) {
> + /* Plain user buffer */
> + return import_ubuf(direction, (void __user *)(addr + offset),
> + len - offset, iter);
> + } else if (buf_type == IORING_BPF_BUF_TYPE_FIXED) {
> + /* Fixed buffer */
> + return io_import_reg_buf(req, iter, addr + offset,
> + len - offset, direction, issue_flags);
> + }
> +
> + return -EINVAL;
> +}
> +
> __bpf_kfunc_start_defs();
> __bpf_kfunc void uring_bpf_set_result(struct uring_bpf_data *data, int res)
> {
> @@ -339,11 +446,91 @@ __bpf_kfunc struct io_kiocb *uring_bpf_data_to_req(struct uring_bpf_data *data)
> {
> return cmd_to_io_kiocb(data);
> }
> +
> +/**
> + * io_uring_bpf_req_memcpy - Copy data between io_uring BPF request buffers
> + * @data: BPF request data containing buffer metadata
> + * @dest: Destination buffer descriptor (with buf_id and offset)
> + * @src: Source buffer descriptor (with buf_id and offset)
> + * @len: Number of bytes to copy
> + *
> + * Copies data between two different io_uring BPF request buffers (buf_id 1 and 2).
> + * Supports: plain-to-plain, fixed-to-plain, and plain-to-fixed.
> + * Does not support copying within the same buffer (src and dest must be different).
> + *
> + * Returns: Number of bytes copied on success, negative error code on failure
> + */
> +__bpf_kfunc int io_uring_bpf_req_memcpy(struct uring_bpf_data *data,
> + struct bpf_req_mem_desc *dest,
> + struct bpf_req_mem_desc *src,
Curious, does struct bpf_req_mem_desc need to be registered anywhere
for use as a kfunc argument? Or is any type automatically allowed to
be used with a kfunc since BTF is generated for it?
Best,
Caleb
> + unsigned int len)
> +{
> + struct io_kiocb *req = cmd_to_io_kiocb(data);
> + struct iov_iter dst_iter, src_iter;
> + u8 dst_type, src_type;
> + u64 dst_addr, src_addr;
> + u32 dst_len, src_len;
> + int ret;
> +
> + /* Validate buffer IDs */
> + if (dest->buf_id < 1 || dest->buf_id > 2 ||
> + src->buf_id < 1 || src->buf_id > 2)
> + return -EINVAL;
> +
> + /* Don't allow copying within the same buffer */
> + if (src->buf_id == dest->buf_id)
> + return -EINVAL;
> +
> + /* Extract source buffer metadata */
> + if (src->buf_id == 1) {
> + src_type = IORING_BPF_BUF1_TYPE(data->opf);
> + src_addr = data->buf1_addr;
> + src_len = data->buf1_len;
> + } else {
> + src_type = IORING_BPF_BUF2_TYPE(data->opf);
> + src_addr = data->buf2_addr;
> + src_len = data->buf2_len;
> + }
> +
> + /* Extract destination buffer metadata */
> + if (dest->buf_id == 1) {
> + dst_type = IORING_BPF_BUF1_TYPE(data->opf);
> + dst_addr = data->buf1_addr;
> + dst_len = data->buf1_len;
> + } else {
> + dst_type = IORING_BPF_BUF2_TYPE(data->opf);
> + dst_addr = data->buf2_addr;
> + dst_len = data->buf2_len;
> + }
> +
> + /* Validate offsets and lengths */
> + if (src->offset + len > src_len || dest->offset + len > dst_len)
> + return -EINVAL;
> +
> + /* Initialize source iterator */
> + ret = io_bpf_import_buffer(req, &src_iter, src_type,
> + src_addr, src->offset, src_len,
> + ITER_SOURCE, data->issue_flags);
> + if (ret)
> + return ret;
> +
> + /* Initialize destination iterator */
> + ret = io_bpf_import_buffer(req, &dst_iter, dst_type,
> + dst_addr, dest->offset, dst_len,
> + ITER_DEST, data->issue_flags);
> + if (ret)
> + return ret;
> +
> + /* Extract pages from source iterator and copy to destination */
> + return io_bpf_copy_iters(&src_iter, &dst_iter, len);
> +}
> +
> __bpf_kfunc_end_defs();
>
> BTF_KFUNCS_START(uring_bpf_kfuncs)
> BTF_ID_FLAGS(func, uring_bpf_set_result)
> BTF_ID_FLAGS(func, uring_bpf_data_to_req)
> +BTF_ID_FLAGS(func, io_uring_bpf_req_memcpy)
> BTF_KFUNCS_END(uring_bpf_kfuncs)
>
> static const struct btf_kfunc_id_set uring_kfunc_set = {
> diff --git a/io_uring/uring_bpf.h b/io_uring/uring_bpf.h
> index c919931cb4b0..d6e0d6dff82e 100644
> --- a/io_uring/uring_bpf.h
> +++ b/io_uring/uring_bpf.h
> @@ -14,13 +14,22 @@ struct uring_bpf_data {
> /* Buffer 2 metadata - readable for bpf prog (plain only) */
> u64 buf2_addr; /* buffer 2 address, bytes 24-31 */
> u32 buf2_len; /* buffer 2 length, bytes 32-35 */
> - u32 __pad; /* padding, bytes 36-39 */
> + u32 issue_flags; /* issue_flags from io_uring, bytes 36-39 */
>
> /* writeable for bpf prog */
> u8 pdu[64 - sizeof(struct file *) - 4 * sizeof(u32) -
> 2 * sizeof(u64)];
> };
>
> +/*
> + * Descriptor for io_uring BPF request buffer.
> + * Used by io_uring_bpf_req_memcpy() to identify which buffer to copy from/to.
> + */
> +struct bpf_req_mem_desc {
> + u8 buf_id; /* Buffer ID: 1 or 2 */
> + unsigned int offset; /* Offset into buffer */
> +};
> +
> typedef int (*uring_io_prep_t)(struct uring_bpf_data *data,
> const struct io_uring_sqe *sqe);
> typedef int (*uring_io_issue_t)(struct uring_bpf_data *data);
> --
> 2.47.0
>
next prev parent reply other threads:[~2025-12-31 1:42 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-04 16:21 [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Ming Lei
2025-11-04 16:21 ` [PATCH 1/5] io_uring: prepare for extending io_uring with bpf Ming Lei
2025-12-31 1:13 ` Caleb Sander Mateos
2025-12-31 9:33 ` Ming Lei
2025-11-04 16:21 ` [PATCH 2/5] io_uring: bpf: add io_uring_ctx setup for BPF into one list Ming Lei
2025-12-31 1:13 ` Caleb Sander Mateos
2025-12-31 9:49 ` Ming Lei
2025-12-31 16:19 ` Caleb Sander Mateos
2025-11-04 16:21 ` [PATCH 3/5] io_uring: bpf: extend io_uring with bpf struct_ops Ming Lei
2025-11-07 19:02 ` kernel test robot
2025-11-08 6:53 ` kernel test robot
2025-11-13 10:32 ` Stefan Metzmacher
2025-11-13 10:59 ` Ming Lei
2025-11-13 11:19 ` Stefan Metzmacher
2025-11-14 3:00 ` Ming Lei
2025-12-08 22:45 ` Caleb Sander Mateos
2025-12-09 3:08 ` Ming Lei
2025-12-10 16:11 ` Caleb Sander Mateos
2025-11-19 14:39 ` Jonathan Corbet
2025-11-20 1:46 ` Ming Lei
2025-11-20 1:51 ` Ming Lei
2025-12-31 1:19 ` Caleb Sander Mateos
2025-12-31 10:32 ` Ming Lei
2025-12-31 16:48 ` Caleb Sander Mateos
2025-11-04 16:21 ` [PATCH 4/5] io_uring: bpf: add buffer support for IORING_OP_BPF Ming Lei
2025-11-13 10:42 ` Stefan Metzmacher
2025-11-13 11:04 ` Ming Lei
2025-11-13 11:25 ` Stefan Metzmacher
2025-12-31 1:42 ` Caleb Sander Mateos
2025-12-31 11:02 ` Ming Lei
2025-12-31 17:02 ` Caleb Sander Mateos
2025-11-04 16:21 ` [PATCH 5/5] io_uring: bpf: add io_uring_bpf_req_memcpy() kfunc Ming Lei
2025-11-07 18:51 ` kernel test robot
2025-12-31 1:42 ` Caleb Sander Mateos [this message]
2025-11-05 12:47 ` [PATCH 0/5] io_uring: add IORING_OP_BPF for extending io_uring Pavel Begunkov
2025-11-05 15:57 ` Ming Lei
2025-11-06 16:03 ` Pavel Begunkov
2025-11-07 15:54 ` Ming Lei
2025-11-11 14:07 ` Pavel Begunkov
2025-11-13 4:18 ` Ming Lei
2025-11-19 19:00 ` Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CADUfDZqsqiZkSZrkDacviO3sJ61KHmgZ5-BNm2+s6=Sb4yVV7Q@mail.gmail.com' \
--to=csander@purestorage.com \
--cc=akailash@google.com \
--cc=ast@kernel.org \
--cc=axboe@kernel.dk \
--cc=bpf@vger.kernel.org \
--cc=io-uring@vger.kernel.org \
--cc=ming.lei@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox