public inbox for io-uring@vger.kernel.org
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH AUTOSEL 7.0-6.12] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 28 Apr 2026 06:41:24 -0400
Message-ID: <20260428104133.2858589-73-sashal@kernel.org>
In-Reply-To: <20260428104133.2858589-1-sashal@kernel.org>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit d0be8884f56b0b800cd8966e37ce23417cd5044e ]

Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.

The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page.  Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.

Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them.  The user can then write into whatever
the allocator hands out next.

Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close.  NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap.

This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed.  The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.

Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
[axboe: get rid of region lookup, just iterate pages in vma]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

# Analysis: io_uring NOMMU pbuf_ring page UAF fix

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1 - Subject line:**
Record: subsystem `io_uring`, action verb `take`, summary: "take page
references for NOMMU pbuf_ring mmaps". The subject describes a fix for a
page lifetime / use-after-free issue under `!CONFIG_MMU`.

**Step 1.2 - Tags:**
Record:
- `Cc: Jens Axboe <axboe@kernel.dk>`
- `Reported-by: Anthropic` (AI bug report)
- `Assisted-by: gkh_clanker_t1000` (an unusual tag — verified this is
  identical to upstream commit `d0be8884f56b0`, not pipeline-injected)
- `Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>`
  (author SOB, kernel veteran)
- `Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh`
- `[axboe: get rid of region lookup, just iterate pages in vma]`
  (maintainer-folded change)
- `Signed-off-by: Jens Axboe <axboe@kernel.dk>` (subsystem maintainer)
No `Cc: stable` or `Fixes:` — expected.

**Step 1.3 - Body text:**
Record: Author explains a use-after-free root cause precisely:
- NOMMU `io_uring_get_unmapped_area()` returns a kernel virtual address;
  user VMA aliases the kernel pages.
- `io_uring_mmap()` returns 0 without taking page references.
- `io_unregister_pbuf_ring -> io_put_bl -> io_free_region ->
  release_pages` drops the only reference; pages return to the buddy
  allocator while the user's VMA still maps them.
- "The user can then write into whatever the allocator hands out next."
  — this is a write-after-free.
- Fix mirrors MMU lifetime by `get_page` per page in `mmap()` and
  `put_page` via `vm_ops->close`.
- Also addresses the duplicate-vm_start case for SQ/CQ.

**Step 1.4 - Hidden bug fix?**
Record: Not hidden — the commit body explicitly describes a use-after-
free / write-after-free of pages handed to userspace, which is a serious
memory-safety / security bug.

## PHASE 2: DIFF ANALYSIS

**Step 2.1 - Inventory:**
Record: 1 file changed (`io_uring/memmap.c`); +45 lines, -1 line
(matching the diffstat). Adds `io_uring_nommu_vm_close()` and
`io_uring_nommu_vm_ops`, and expands the `!CONFIG_MMU` branch of
`io_uring_mmap()`. Single-file, surgical NOMMU-only change.

**Step 2.2 - Code flow change:**
Before: `io_uring_mmap()` for NOMMU only validated flags; returned 0
with no page references taken.
After: validates flags, looks up the region under `ctx->mmap_lock`,
checks that the region is set and that the VMA length equals
`region->nr_pages << PAGE_SHIFT`, takes a `get_page()` reference on each
backing page, and installs `vm_ops->close` to drop those references at
unmap.

**Step 2.3 - Bug mechanism:**
Record: Use-after-free / write-after-free of kernel pages still mapped
in userspace. Category: memory safety + reference counting (missing
`get_page` on the mmap path that aliases kernel allocations). The fix
balances the lifetime by adding `get_page()` on map and `put_page()` on
close.
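
The reference-count imbalance described above can be modelled in plain
userspace C. This is only a sketch under stated assumptions: `struct
toy_page` and every `toy_*` function are hypothetical stand-ins for
`struct page`, `get_page()`, `put_page()` and the io_uring paths, not
kernel code. It also covers the case where the same pages are mapped
more than once.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a refcount and a "freed" flag. */
struct toy_page {
	int refcount;
	bool freed;
};

static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	if (--p->refcount == 0)
		p->freed = true;	/* page returns to the allocator */
}

/* Region allocation: io_uring holds the initial reference. */
static void toy_region_alloc(struct toy_page *p)
{
	p->refcount = 1;
	p->freed = false;
}

/* mmap(): only the fixed behaviour takes a reference for the mapping. */
static void toy_mmap(struct toy_page *p, bool fixed)
{
	if (fixed)
		toy_get_page(p);
}

/* vm_ops->close: only the fixed behaviour has a reference to drop. */
static void toy_vm_close(struct toy_page *p, bool fixed)
{
	if (fixed)
		toy_put_page(p);
}

/* Models io_free_region() -> release_pages(): drop io_uring's reference. */
static void toy_unregister(struct toy_page *p)
{
	toy_put_page(p);
}

/* True if the page is still alive after unregister with mappings open. */
bool toy_page_survives_unregister(bool fixed, int nr_mmaps)
{
	struct toy_page page;
	int i;

	toy_region_alloc(&page);
	for (i = 0; i < nr_mmaps; i++)
		toy_mmap(&page, fixed);
	toy_unregister(&page);
	return !page.freed;
}

/* Fixed behaviour: true if the page is freed only once every mapping closed. */
bool toy_page_freed_only_after_last_close(int nr_mmaps)
{
	struct toy_page page;
	int i;

	toy_region_alloc(&page);
	for (i = 0; i < nr_mmaps; i++)
		toy_mmap(&page, true);
	toy_unregister(&page);
	for (i = 0; i < nr_mmaps; i++) {
		if (page.freed)
			return false;	/* freed while a mapping was still open */
		toy_vm_close(&page, true);
	}
	return page.freed;
}
```

In this model `toy_page_survives_unregister(false, 1)` is false (the
unfixed case, where the mapping is left dangling), while the fixed case
keeps the page alive until the last close.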

**Step 2.4 - Fix quality:**
Record: Small, contained. Logic is straightforward: per-page `get_page`
on map, mirrored `put_page` on close. The validation that `vma->vm_end -
vma->vm_start == region->nr_pages << PAGE_SHIFT` guards the close-time
`virt_to_page` walk over the VMA address range. Risk that
`vma->vm_start` no longer points to those pages is addressed by holding
the page references — the kernel virtual address remains valid as long
as the page is alive. Fix is obviously correct for the NOMMU case
described.
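
The relationship between the size check and the close-time walk can be
illustrated with a small arithmetic sketch. The `toy_*` names and the
4 KiB page size (`TOY_PAGE_SHIFT` of 12) are assumptions made for
illustration; the kernel uses `PAGE_SHIFT` and `PAGE_SIZE` for the
configured architecture.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed 4 KiB pages; the real value is architecture dependent. */
#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1UL << TOY_PAGE_SHIFT)

/* Mirrors the patch's check: the VMA must span exactly nr_pages pages. */
bool toy_vma_size_ok(unsigned long vm_start, unsigned long vm_end,
		     unsigned int nr_pages)
{
	return (vm_end - vm_start) ==
	       ((unsigned long)nr_pages << TOY_PAGE_SHIFT);
}

/* Counts the put_page() calls a page-sized walk over the VMA would make. */
unsigned long toy_close_walk_count(unsigned long vm_start,
				   unsigned long vm_end)
{
	unsigned long addr, puts = 0;

	for (addr = vm_start; addr < vm_end; addr += TOY_PAGE_SIZE)
		puts++;
	return puts;
}
```

When the size check passes, the walk visits exactly `nr_pages`
addresses, so the number of puts at close matches the number of gets
taken at mmap time.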

## PHASE 3: GIT HISTORY

**Step 3.1 - Blame:**
Record: The vulnerable line `return
is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;` has been present
in NOMMU `io_uring_mmap()` since `f15ed8b4d0ce2 io_uring: move
mapping/allocation helpers to a separate file` (v6.10) and earlier in
`io_uring/io_uring.c` going back to v6.0 era when io_uring moved into
its own subdirectory (`ed29b0b4fd835`, v6.0).

**Step 3.2 - Fixes: tag:**
Record: No Fixes: tag. The specific UAF via the `pbuf_ring`
`release_pages` path requires the region API on the pbuf side, which
arrived with `ef62de3c4ad58 io_uring/kbuf: use region api for pbuf
rings` and the sibling memmap commits, all in v6.14-rc1.

**Step 3.3 - Related changes:**
Record: Relevant series: `7cd7b9575270e io_uring/memmap: unify io_uring
mmap'ing code`, `ef62de3c4ad58 io_uring/kbuf: use region api for pbuf
rings`, `90175f3f50321 io_uring/kbuf: remove pbuf ring refcounting` (all
v6.14-rc1). These restructured pbuf_ring mmap to share the region
machinery — the same machinery whose `release_pages` now drops the only
reference under NOMMU.

**Step 3.4 - Author:**
Record: Author is Greg Kroah-Hartman (LTS maintainer). Folded by Jens
Axboe (io_uring maintainer). Both highly authoritative.

**Step 3.5 - Dependencies:**
Record: The fix uses `io_mmap_get_region()`, `io_region_is_set()`,
`region->pages`, `region->nr_pages`, `ctx->mmap_lock` — all introduced
in v6.14. For v6.14+ stable trees, this should apply standalone. For
older trees (≤v6.12), the patch will not apply as-is.

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1 - Original submission:**
Record: `b4 dig -c d0be8884f56b0` returned thread
`https://lore.kernel.org/all/2026042115-body-attention-d15b@gregkh/`.
Only v1 was posted; when applying, Jens folded in a simplification
("get rid of region lookup, just iterate pages in vma") together with
size validation.

**Step 4.2 - Reviewers:**
Record: To `io-uring@vger.kernel.org`, Cc: Jens Axboe (subsystem
maintainer). Maintainer folded changes and pushed.

**Step 4.3 - Bug report:**
Record: Greg's email confirms this was an AI-generated report.
**However**, Greg explicitly built a PoC (poc.c + run-poc.sh attached to
the thread) which:
- Builds a riscv64 NOMMU kernel and boots in QEMU with `init_on_free=1`
- As init, registers a pbuf_ring with `IOU_PBUF_RING_MMAP`, mmaps a
  page, writes a 0x55 canary, unregisters the pbuf_ring, then re-reads
- On unfixed: canary becomes 0x00 (page freed and zeroed), then re-
  registering reuses the same page demonstrating write-after-free
- On fixed: canary is intact
- Greg replied `Tested-by: Greg Kroah-Hartman
  <gregkh@linuxfoundation.org>` after Jens's folded version

The CVE-style identifiers `ANT-2026-02884` (the UAF) and
`ANT-2026-02650` (related duplicate vm_start) are referenced in the PoC.
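
The canary behaviour reported for the PoC in Step 4.3 can be modelled
without a NOMMU kernel. The sketch below is not the PoC from the thread;
`struct toy_ring_page` and the `toy_*` functions are hypothetical, and
the zeroing on the last put only imitates what `init_on_free=1` is
described as doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_BYTES 4096

/* Hypothetical model of one ring page shared by kernel and user. */
struct toy_ring_page {
	unsigned char data[TOY_PAGE_BYTES];
	int refcount;
};

/* Drop a reference; on the last one, zero the page as init_on_free=1 would. */
static void toy_put(struct toy_ring_page *p)
{
	if (--p->refcount == 0)
		memset(p->data, 0, sizeof(p->data));
}

/*
 * Replays the PoC steps in the model and returns the byte the user reads
 * back through the mapping after the pbuf_ring has been unregistered.
 */
unsigned char toy_canary_after_unregister(bool fixed)
{
	struct toy_ring_page page;

	memset(&page, 0, sizeof(page));
	page.refcount = 1;		/* register: io_uring-side reference */
	if (fixed)
		page.refcount++;	/* mmap: fixed behaviour pins the page */
	page.data[0] = 0x55;		/* user writes the canary */
	toy_put(&page);			/* unregister: release_pages() */
	return page.data[0];		/* user re-reads through the mapping */
}
```

In the model the unfixed path reads back 0x00 and the fixed path reads
back 0x55, which matches the outcomes listed for the PoC.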

**Step 4.4 - Series context:**
Record: Single patch (no series). Greg also has an alternative patch
that disables io_uring on `!MMU` entirely, which Jens did not accept in
favor of this fix.

**Step 4.5 - Stable discussion:**
Record: No explicit `Cc: stable` mention in the thread, and no
`stable@vger.kernel.org` in the discussion. However, this is a confirmed
UAF reachable from unprivileged userspace with a working exploit
reproducer — clearly stable material.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1 - Modified functions:**
Record: `io_uring_mmap()` (NOMMU branch), new
`io_uring_nommu_vm_close()`, new `io_uring_nommu_vm_ops`.

**Step 5.2 - Callers:**
Record: `io_uring_mmap` is the file_operations `.mmap` for the io_uring
fd; reachable from any userspace `mmap()` on an io_uring fd.
`io_uring_nommu_vm_close` is invoked by `delete_vma()` in `mm/nommu.c`
on `munmap`/exit. The bug path: `io_unregister_pbuf_ring()` →
`io_put_bl()` (`io_uring/kbuf.c:445`) → `io_free_region()`
(`io_uring/memmap.c:91`) → `release_pages()` — confirmed by `git grep`.

**Step 5.3 - Callees:**
Record: `get_page()`, `put_page()`, `is_nommu_shared_mapping()`,
`io_mmap_get_region()`, `io_region_is_set()`, `virt_to_page()`. All
standard kernel APIs.

**Step 5.4 - Reachability:**
Record: io_uring `register`/`unregister` and `mmap` are unprivileged
syscalls (no `CAP_SYS_ADMIN` for these paths — verified by grep across
`io_uring/`). The PoC demonstrates a full unprivileged trigger.

**Step 5.5 - Similar patterns:**
Record: The MMU path uses `vm_insert_pages()` (which does its own
`get_page` per inserted page, released on VMA teardown via
`zap_pte_range -> put_page`). The fix gives NOMMU equivalent symmetry.
A search for other `is_nommu_shared_mapping` users (`fc4f4be9b5271`)
shows io_uring is the only file_ops user adding such page lifetime
semantics manually.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1 - Bug presence in stable:**
Record: Verified `git show v6.18:io_uring/memmap.c` and `git show
v7.0:io_uring/memmap.c` — both contain the unfixed `return
is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;`. The pbuf_ring
region API (the trigger surface for this exact UAF) exists from v6.14
onward. Affected trees with this exact bug: v6.14, v6.15, v6.16, v6.17,
**v6.18 LTS**, v6.19, **v7.0** (this branch).

**Step 6.2 - Backport complications:**
Record: For v6.14 → v7.0, all helpers (`io_mmap_get_region`,
`io_region_is_set`, `ctx->mmap_lock`, `region->pages/nr_pages`,
`guard(mutex)`) exist; the patch should apply cleanly or with trivial
adjustment. For v6.12 LTS and older, `io_mmap_get_region()` does not
exist (region API absent in pbuf path) — the same conceptual UAF may
exist via different code, but the fix as-presented does not apply. v6.6
LTS — same story.

**Step 6.3 - Related fixes already in stable:**
Record: No prior fix found. This is a new, recently-discovered class of
bug.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1 - Subsystem and criticality:**
Record: `io_uring` — IMPORTANT (heavily used subsystem; security-
relevant; reachable from unprivileged userspace). Criticality of this
specific config (NOMMU): PERIPHERAL (only `!CONFIG_MMU` builds, mostly
RISC-V/embedded). Net assessment: IMPORTANT-but-PERIPHERAL —
unprivileged UAF in a security-sensitive subsystem, on a small but real
config.

**Step 7.2 - Subsystem activity:**
Record: io_uring is one of the most actively developed kernel
subsystems; the affected code (region API) is recent (v6.14) and well
maintained.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1 - Affected users:**
Record: Users of `!CONFIG_MMU` kernels (RISC-V nommu, ARM nommu,
Blackfin successors, some MicroBlaze configs, embedded NOMMU systems
with io_uring enabled). Small population, but real and the bug is
unconditional on those builds when pbuf_ring mmap is used.

**Step 8.2 - Trigger:**
Record: Trivial — unprivileged process calls `io_uring_setup`,
`io_uring_register(IORING_REGISTER_PBUF_RING, ..., IOU_PBUF_RING_MMAP)`,
`mmap(IORING_OFF_PBUF_RING)`, then
`io_uring_register(IORING_UNREGISTER_PBUF_RING, ...)`. PoC demonstrates
this path. Same pattern for SQ/CQ rings.

**Step 8.3 - Failure mode:**
Record: Use-after-free → write-after-free of kernel pages from
userspace. With the page returned to the buddy allocator and reused
(kernel-side allocation hands the same page back), the user can
read/write whatever the kernel later places there — heap-spray-friendly,
security-CRITICAL. PoC ends with sysrq-c kernel panic for proof.

**Step 8.4 - Risk-benefit:**
Record:
- Benefit: prevents an unprivileged user-triggered UAF / write-after-
  free on NOMMU systems — exactly the stable mandate.
- Risk: minimal. The change is confined to the `!CONFIG_MMU` branch of
  `io_uring/memmap.c` (45 added lines), so it cannot affect any MMU
  build.
  Even on NOMMU, the fix only adds `get_page`/`put_page` symmetry to
  mirror the MMU path. Tested-by Greg Kroah-Hartman with explicit PoC +
  boot test.
Ratio: very high benefit / very low risk.

## PHASE 9: FINAL SYNTHESIS

**Step 9.1 - Evidence:**
- FOR: confirmed unprivileged-reachable UAF/WAF on NOMMU; PoC exists and
  panics unfixed kernels; small, surgical, single-file fix; tested by
  Greg KH; written by LTS maintainer; folded by io_uring maintainer;
  merged upstream in `d0be8884f56b0`; only touches NOMMU branch; depends
  on helpers all present in v6.14+.
- AGAINST: NOMMU is uncommon; the bug was AI-discovered and Greg
  initially expressed low confidence in the fix details; pre-v6.14
  stable trees would need a different patch.

**Step 9.2 - Stable rules:**
1. Obviously correct and tested? YES — PoC + Tested-by from LTS
   maintainer.
2. Fixes a real bug affecting users? YES — UAF reachable by any
   unprivileged process on NOMMU.
3. Important issue? YES — security-critical (write-after-free of
   arbitrary kernel pages).
4. Small and contained? YES — 1 file, ~44 lines, NOMMU-only branch.
5. No new features/APIs? YES — internal fix only.
6. Applies to stable? YES for v6.14+ (cleanly to v6.18, v6.19, v7.0);
   needs adaptation for older trees.

**Step 9.3 - Exceptions:** Not applicable — this is a normal bug fix,
not a quirk/device-id/DT update. Stands on its merits.

**Step 9.4 - Decision:** Backport. This is a confirmed unprivileged-
reachable use-after-free with a working PoC. The fix is small, surgical,
and isolated to the NOMMU code path so it cannot regress MMU builds. It
applies cleanly to v6.14+ stable trees including the v7.0.y autosel
target (HEAD here).

## Verification

- [Phase 1] Read full upstream commit `d0be8884f56b0` via `git show`;
  confirmed `Reported-by: Anthropic` and `Assisted-by:
  gkh_clanker_t1000` are part of the upstream commit, not pipeline-
  injected.
- [Phase 2] Diff inspection confirms +45/-1 lines in `io_uring/memmap.c`
  only, all in the `!CONFIG_MMU` branch.
- [Phase 3] `git log --oneline -- io_uring/memmap.c` and `git describe
  --contains` confirm region API arrived in v6.14-rc1 (`ef62de3c4ad58`,
  `7cd7b9575270e`); pre-v6.14 NOMMU mmap was already vulnerable in
  spirit but used different (refcounted) pbuf paths.
- [Phase 3] `git show v6.6:io_uring/io_uring.c`,
  `v6.12:io_uring/memmap.c`, `v6.18:io_uring/memmap.c`,
  `v7.0:io_uring/memmap.c` confirm the unfixed `return
  is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;` is present from
  v6.6 through v7.0.
- [Phase 4] `b4 dig -c d0be8884f56b0` returned thread
  `https://lore.kernel.org/all/2026042115-body-attention-d15b@gregkh/`.
- [Phase 4] `b4 dig -c d0be8884f56b0 -a` showed v1 only; Jens folded an
  inline simplification when applying.
- [Phase 4] `b4 dig -c d0be8884f56b0 -m /tmp/io_uring_thread.mbox` saved
  the thread; read confirms PoC (poc.c, run-poc.sh) tests vulnerable vs.
  fixed kernels with `init_on_free=1`, and `Tested-by: Greg Kroah-
  Hartman` on Jens's folded version.
- [Phase 4] PoC references CVE-style identifiers `ANT-2026-02884` (this
  UAF) and `ANT-2026-02650` (related duplicate vm_start case).
- [Phase 5] `grep` in `io_uring/kbuf.c` confirmed
  `io_unregister_pbuf_ring -> io_put_bl -> io_free_region` call chain at
  lines 445, 698, 719.
- [Phase 5] `grep` for `capable\|CAP_` in `io_uring/io_uring.c, kbuf.c,
  register.c` confirms IORING_REGISTER_PBUF_RING and io_uring_mmap are
  unprivileged.
- [Phase 6] Verified `io_mmap_get_region` and `io_region_is_set` exist
  in v6.18, v7.0; do not exist in v6.12.
- [Phase 8] Failure mode: confirmed UAF + WAF + observable from
  userspace via PoC. Severity: CRITICAL (security).
- UNVERIFIED: Did not attempt to actually run the PoC under QEMU in this
  session; relied on Greg KH's `Tested-by` and PoC source code
  inspection.
- UNVERIFIED: Did not check whether stable maintainers (separate from
  the discussion thread) have already queued or rejected this for
  stable.

The fix addresses a confirmed unprivileged-reachable use-after-free /
write-after-free in io_uring under `!CONFIG_MMU`, is small and contained
to the NOMMU branch only, was tested-by the LTS maintainer with a
working PoC, and applies cleanly to v6.14+ stable trees (including the
v7.0.y target this branch represents). It meets every stable rule.

**YES**

 io_uring/memmap.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index e6958968975a8..4f9b439319c46 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -366,9 +366,53 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
 
 #else /* !CONFIG_MMU */
 
+/*
+ * Drop the pages that were initially referenced and added in
+ * io_uring_mmap(). We cannot have had a mremap() as that isn't supported,
+ * hence the vma should be identical to the one we initially referenced and
+ * mapped, and partial unmaps and splitting isn't possible on a file backed
+ * mapping.
+ */
+static void io_uring_nommu_vm_close(struct vm_area_struct *vma)
+{
+	unsigned long index;
+
+	for (index = vma->vm_start; index < vma->vm_end; index += PAGE_SIZE)
+		put_page(virt_to_page((void *) index));
+}
+
+static const struct vm_operations_struct io_uring_nommu_vm_ops = {
+	.close = io_uring_nommu_vm_close,
+};
+
 int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+	struct io_ring_ctx *ctx = file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	if (!is_nommu_shared_mapping(vma->vm_flags))
+		return -EINVAL;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	if (!region || !io_region_is_set(region))
+		return -EINVAL;
+
+	if ((vma->vm_end - vma->vm_start) !=
+	    (unsigned long) region->nr_pages << PAGE_SHIFT)
+		return -EINVAL;
+
+	/*
+	 * Pin the pages so io_free_region()'s release_pages() does not
+	 * drop the last reference while this VMA exists. delete_vma()
+	 * in mm/nommu.c calls vma_close() which runs ->close above.
+	 */
+	for (i = 0; i < region->nr_pages; i++)
+		get_page(region->pages[i]);
+
+	vma->vm_ops = &io_uring_nommu_vm_ops;
+	return 0;
 }
 
 unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
-- 
2.53.0

