From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: io-uring@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Jens Axboe <axboe@kernel.dk>
Subject: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 15:46:16 +0200
Message-ID: <2026042115-body-attention-d15b@gregkh>

Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.

The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page. Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.
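
A userspace toy model of the CONFIG_MMU ordering above. It is only an illustration, not kernel code: `struct toy_page`, `toy_get_page()`, `toy_put_page()` and `toy_mmu_page_outlives_unregister()` are hypothetical names, and "freed" just means the count reached zero (no real allocator is involved).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page refcount; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool freed;	/* set when the last reference is dropped */
};

static void toy_get_page(struct toy_page *p)
{
	assert(!p->freed);	/* taking a ref on a freed page would be a bug */
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;	/* models the page returning to the allocator */
}

/*
 * Models the CONFIG_MMU ordering. Returns true if the page was still live
 * after unregister and was freed only once munmap ran.
 */
static bool toy_mmu_page_outlives_unregister(void)
{
	struct toy_page page = { .refcount = 1 };	/* io_uring-side allocation ref */
	bool live_after_unregister;

	toy_get_page(&page);	/* vm_insert_pages(): VMA-side ref */
	toy_put_page(&page);	/* io_free_region() -> release_pages() */
	live_after_unregister = !page.freed;
	toy_put_page(&page);	/* munmap: zap_pte_range -> put_page */

	return live_after_unregister && page.freed;
}
```

The scenario returns true because the VMA-side reference holds the count at one between unregister and munmap.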

Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them. The user can then write into whatever
the allocator hands out next.
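
The same kind of toy model (hypothetical names, userspace only, a single page assumed) for the unpatched NOMMU ordering: with no VMA-side reference, the release_pages() step drops the count to zero while the VMA still points at the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page refcount; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool freed;
};

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* Hypothetical NOMMU VMA: vm_start is simply the kernel address of the page. */
struct toy_vma {
	struct toy_page *vm_start;
};

/*
 * Models the unpatched path: io_uring_mmap() returns 0 without get_page().
 * Returns true if the VMA is left pointing at a freed page.
 */
static bool toy_nommu_unpatched_leaves_dangling_vma(void)
{
	struct toy_page page = { .refcount = 1 };	/* the only ref: io_uring's */
	struct toy_vma vma = { .vm_start = &page };	/* VMA aliases the allocation */

	/* io_unregister_pbuf_ring -> io_put_bl -> io_free_region -> release_pages */
	toy_put_page(&page);

	return vma.vm_start->freed;	/* user writes would now land in freed memory */
}
```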

Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close. NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap. If the region was unregistered between
mmap and munmap (region->pages is NULL after io_free_region's memset),
walk the VMA address range instead -- the pages are still live (our refs
kept them) and virt_to_page recovers them.
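
A toy model of the proposed fix, again with hypothetical userspace names: `toy_mmap()`, `toy_free_region()` and `toy_vm_close()` stand in for io_uring_mmap(), io_free_region() and the new ->close handler. Setting `pages` to NULL stands in for the memset in io_free_region(), and stepping a page pointer stands in for `a += PAGE_SIZE` plus virt_to_page(). It only exercises the unregister-before-munmap ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4

/* Hypothetical stand-in for a page refcount; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool freed;
};

static void toy_get_page(struct toy_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* pages == NULL models the memset() at the end of io_free_region(). */
struct toy_region {
	struct toy_page *pages;
	size_t nr_pages;
};

/* [vm_start, vm_end) as page pointers; one step models PAGE_SIZE. */
struct toy_vma {
	struct toy_page *vm_start, *vm_end;
};

static void toy_mmap(struct toy_region *r, struct toy_vma *vma)
{
	for (size_t i = 0; i < r->nr_pages; i++)
		toy_get_page(&r->pages[i]);	/* per-mmap refs, as in the patch */
	vma->vm_start = r->pages;
	vma->vm_end = r->pages + r->nr_pages;
}

static void toy_free_region(struct toy_region *r)
{
	for (size_t i = 0; i < r->nr_pages; i++)
		toy_put_page(&r->pages[i]);	/* release_pages() */
	r->pages = NULL;
	r->nr_pages = 0;
}

static void toy_vm_close(const struct toy_region *r, const struct toy_vma *vma)
{
	if (r->pages) {
		for (size_t i = 0; i < r->nr_pages; i++)
			toy_put_page(&r->pages[i]);
	} else {
		/* Region cleared: walk the VMA range (virt_to_page() in the patch). */
		for (struct toy_page *p = vma->vm_start; p < vma->vm_end; p++)
			toy_put_page(p);
	}
}

/* Unregister between mmap and munmap: pages must outlive the region. */
static bool toy_nommu_patched_lifetime(void)
{
	struct toy_page pages[TOY_NR_PAGES];
	struct toy_region region = { pages, TOY_NR_PAGES };
	struct toy_vma vma;
	bool live = true, freed = true;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		pages[i] = (struct toy_page){ .refcount = 1 };

	toy_mmap(&region, &vma);
	toy_free_region(&region);
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		live = live && !pages[i].freed;
	toy_vm_close(&region, &vma);	/* takes the VMA-walk branch */
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		freed = freed && pages[i].freed;

	return live && freed;
}
```

If toy_free_region() is skipped, toy_vm_close() takes the region->pages branch instead and drops the same per-mmap references, leaving only the allocation reference.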

This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed. The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.
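
A single-page toy model of the duplicate-mmap case (hypothetical names, userspace only). It assumes, for illustration, that the ring region is freed before either VMA is unmapped, and it does not model the mm/nommu.c rb-tree BUG_ON, which the text above treats as a separate issue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page refcount; not the kernel's struct page. */
struct toy_page {
	int refcount;
	bool freed;
};

static void toy_get_page(struct toy_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/*
 * Two mmaps resolving to the same region page (the SQ_RING/CQ_RING case),
 * each taking its own reference. Returns true if the page survives the
 * region being freed and the first close, and is freed only by the second.
 */
static bool toy_two_mmaps_share_page(void)
{
	struct toy_page page = { .refcount = 1 };	/* ring region allocation */
	bool live_after_first_close;

	toy_get_page(&page);	/* first mmap */
	toy_get_page(&page);	/* second mmap of the same region */
	toy_put_page(&page);	/* region freed: io_free_region() */
	toy_put_page(&page);	/* first munmap -> ->close */
	live_after_first_close = !page.freed;
	toy_put_page(&page);	/* second munmap -> ->close */

	return live_after_first_close && page.freed;
}
```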

Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
Note: I have no way of testing this. I'm only forwarding it on because
I got the bug report and was able to generate something that "seems"
correct, but it might be a total load of crap; my knowledge of the
vm layer is very low, so take this for where it is coming from (i.e. a
non-deterministic pattern-matching system).

I do have another patch that just disables io_uring for !MMU systems;
do you want that instead? Or is this feature something that !MMU devices
actually care about?

 io_uring/memmap.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 68 insertions(+), 1 deletion(-)
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index e6958968975a..6818e9abf3b3 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -366,9 +366,76 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
 
 #else /* !CONFIG_MMU */
 
+/*
+ * Under NOMMU, get_unmapped_area returns the kernel virtual address of
+ * the io_mapped_region's backing pages directly -- the user's VMA
+ * aliases the kernel allocation rather than holding its own copy or
+ * page-table entries. The CONFIG_MMU path's vm_insert_pages() takes
+ * page references that survive until munmap; this path takes none, so
+ * io_unregister_pbuf_ring() -> io_free_region() -> release_pages()
+ * frees the pages while the user's VMA still maps them. The user can
+ * then write into whatever the buddy allocator hands out next.
+ *
+ * Mirror the MMU lifetime by taking page references in io_uring_mmap()
+ * and releasing them in vm_ops->close. We re-derive the region from
+ * vm_pgoff (same lookup get_unmapped_area used) so we know which pages
+ * to grab.
+ */
+
+static void io_uring_nommu_vm_close(struct vm_area_struct *vma)
+{
+	struct io_ring_ctx *ctx = vma->vm_file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	/*
+	 * The region may have been unregistered (memset to zero in
+	 * io_free_region()) between mmap and munmap. The page refs we
+	 * took in io_uring_mmap() are what kept the pages alive; release
+	 * them via the VMA range since the region->pages array is gone.
+	 */
+	if (region && region->pages) {
+		for (i = 0; i < region->nr_pages; i++)
+			put_page(region->pages[i]);
+	} else {
+		/* Region cleared; walk the VMA range. */
+		unsigned long a;
+
+		for (a = vma->vm_start; a < vma->vm_end; a += PAGE_SIZE)
+			put_page(virt_to_page((void *)a));
+	}
+}
+
+static const struct vm_operations_struct io_uring_nommu_vm_ops = {
+	.close = io_uring_nommu_vm_close,
+};
+
 int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+	struct io_ring_ctx *ctx = file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	if (!is_nommu_shared_mapping(vma->vm_flags))
+		return -EINVAL;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	if (!region || !io_region_is_set(region))
+		return -EINVAL;
+
+	/*
+	 * Pin the pages so io_free_region()'s release_pages() does not
+	 * drop the last reference while this VMA exists. delete_vma()
+	 * in mm/nommu.c calls vma_close() which runs ->close above.
+	 */
+	for (i = 0; i < region->nr_pages; i++)
+		get_page(region->pages[i]);
+
+	vma->vm_ops = &io_uring_nommu_vm_ops;
+	return 0;
 }
 
 unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
--
2.53.0