public inbox for io-uring@vger.kernel.org
* [PATCH AUTOSEL 7.0-6.18] io_uring/rsrc: unify nospec indexing for direct descriptors
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
@ 2026-04-28 10:40 ` Sasha Levin
  2026-04-28 10:41 ` [PATCH AUTOSEL 7.0-6.12] io_uring: take page references for NOMMU pbuf_ring mmaps Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-04-28 10:40 UTC
  To: patches, stable
  Cc: Jens Axboe, Gabriel Krisman Bertazi, Sasha Levin, io-uring,
	linux-kernel

From: Jens Axboe <axboe@kernel.dk>

[ Upstream commit 53262c91f7b81f96495ff24e9d1fa8b1632e69c8 ]

For file updates, the node reset isn't capping the value via
array_index_nospec() like the other paths do. Ensure it's all sane and
have the update path do the proper capping as well.

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Now I have enough context. Let me write my analysis.

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1: Subject line parse**
- Subsystem: `io_uring/rsrc`
- Action verb: "unify" (cleanup-style verb that often masks a fix)
- Record: `io_uring/rsrc` `unify` - bringing nospec indexing into one
  place across direct descriptor paths

**Step 1.2: Tag analysis**
- `Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>` - reviewed by
  an external developer
- `Signed-off-by: Jens Axboe <axboe@kernel.dk>` - io_uring maintainer
  (he wrote the patch)
- NO `Fixes:` tag
- NO `Cc: stable@vger.kernel.org`
- NO `Reported-by:`, `Link:`
- Record: Bare-bones tag set; reviewed once by SUSE engineer; author is
  the subsystem maintainer.

**Step 1.3: Body analysis**
- Bug description: "the node reset isn't capping the value via
  array_index_nospec() like the other paths do"
- Failure mode: Spectre v1 (Bounds Check Bypass / CVE-2017-5753)
  speculative side-channel
- Author explicitly contrasts the buggy file-update path with "the other
  paths" that already use `array_index_nospec()` (i.e., the buffer
  update path and `io_rsrc_node_lookup`)
- Record: This is missing Spectre v1 hardening on a user-reachable
  register-files-update code path.

**Step 1.4: Hidden bug fix detection**
- "unify" is cleanup language but the substance is restoring missing
  speculation protection on a user-controlled index. This is a real
  defensive-security fix (similar to the pattern of `b7620121dc04e`,
  `34bb77184123a`, `953c37e066f05`, and `29b95ac917927`, all of which
  were Spectre v1 nospec fixes).
- Record: This IS a hidden bug fix - missing Spectre v1 protection.

## PHASE 2: DIFF ANALYSIS

**Step 2.1: Inventory**
- `io_uring/rsrc.c`: +3 lines in `__io_sqe_files_update()`
- `io_uring/rsrc.h`: +7/-2 lines in `io_reset_rsrc_node()` inline
- Total: 10 insertions, 2 deletions across 2 files
- Scope: single-file-pair, single subsystem, surgical
- Record: ~10 line surgical change in one helper + one caller.

**Step 2.2: Code flow change**
- Before in `__io_sqe_files_update`: `i = up->offset + done;
  io_reset_rsrc_node(...)` - relies only on the upfront architectural
  check at line 222 (`up->offset + nr_args > ctx->file_table.data.nr`)
- After: explicit `if (i >= ctx->file_table.data.nr) break;` then `i =
  array_index_nospec(i, ...)`, which clamps the index under speculation
- Before in `io_reset_rsrc_node`: `data->nodes[index]` directly without
  index hardening
- After: bounds-check-then-nospec-mask before dereferencing
  `data->nodes[index]`
- Index parameter changed from `int` to `unsigned int` (a signedness
  change, not a widening; it matches the unsigned `data->nr` it is
  compared against)
- Record: Adds Spectre v1 mitigation in two places (caller-side and
  helper-side, defense-in-depth).
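
A minimal userspace sketch of the two layers described in Step 2.2, under stated assumptions: `toy_range_ok()` and `toy_slot_index()` are hypothetical names, not kernel functions, and the ternary clamp is only a stand-in for `array_index_nospec()` with no speculation guarantee. It shows why the upfront range check already bounds every per-iteration index architecturally, and what the added per-iteration re-check looks like.

```c
#include <stdbool.h>

/*
 * Toy version of an upfront range check: true when every index in
 * [offset, offset + nr_args) is below nr. Written so that the sum
 * offset + nr_args is never formed and cannot wrap around.
 */
static bool toy_range_ok(unsigned int offset, unsigned int nr_args,
			 unsigned int nr)
{
	return nr_args <= nr && offset <= nr - nr_args;
}

/*
 * Toy per-iteration step in the post-fix shape: re-check the index,
 * then clamp it before it is used for a load. The ternary is a
 * stand-in for array_index_nospec() and is an assumption of this
 * sketch, not the kernel's implementation.
 */
static bool toy_slot_index(unsigned int offset, unsigned int done,
			   unsigned int nr, unsigned int *out)
{
	unsigned int i = offset + done;

	if (i >= nr)
		return false;
	*out = i < nr ? i : 0;
	return true;
}
```

With a table of 4 slots, `toy_range_ok(0, 4, 4)` accepts the request while `toy_range_ok(2, 3, 4)` rejects it; once the range is accepted, the per-iteration check in `toy_slot_index()` never fails architecturally, which is consistent with the "no semantic change" assessment in Step 2.4.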

**Step 2.3: Bug mechanism**
- Category: Memory safety / Spectre v1 (Bounds Check Bypass)
- Mechanism: User passes `up->offset` and `nr_args`. The upfront check
  at line 222 is architecturally correct, but on speculation, a CPU
  could mispredict the bounds branch and do a speculative
  `data->nodes[i]` load with i out of bounds, leaving observable cache
  state. `array_index_nospec()` is the canonical mitigation.
- Record: Spectre v1 / CVE-2017-5753 hardening on a user-reachable index
  load.
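
The masking idea behind `array_index_nospec()` can be sketched in userspace as follows. This models the branch-free arithmetic of the kernel's generic fallback mask as I understand it; the `toy_` names are hypothetical, the kernel additionally hides the index from the optimizer and has architecture-specific variants, and nothing here provides an actual speculation guarantee. It assumes `size <= LONG_MAX` and an arithmetic right shift for negative signed values (true for GCC and Clang).

```c
#include <limits.h>

/*
 * Userspace model of a branch-free index mask: all ones when
 * index < size, zero otherwise. If index >= size, the subtraction
 * size - 1 - index wraps and sets the top bit, so the complement has
 * the top bit clear and the arithmetic shift produces 0.
 */
static unsigned long toy_index_mask(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp index to 0 when out of bounds, leave it unchanged otherwise. */
static unsigned long toy_index_nospec(unsigned long index, unsigned long size)
{
	return index & toy_index_mask(index, size);
}
```

For example, `toy_index_nospec(3, 8)` yields 3, while `toy_index_nospec(8, 8)` yields 0. Because the result is computed with data dependencies rather than a branch, a mispredicted bounds check cannot feed an out-of-range value into the subsequent load.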

**Step 2.4: Fix quality**
- Obviously correct - the pattern is identical to surrounding code
  (`io_rsrc_node_lookup`, `__io_sqe_buffers_update`)
- No semantic change for non-malicious callers (architectural bounds
  were already guaranteed)
- Zero regression risk: only adds an extra bounds-check + nospec mask on
  an existing index
- Record: High-quality, low-risk hardening.

## PHASE 3: GIT HISTORY

**Step 3.1: Blame**
- The helper `io_reset_rsrc_node()` was added by `4007c3d8c22a2`
  ("io_uring/rsrc: add io_reset_rsrc_node() helper", Jens Axboe, Oct 29
  2024) — first appears in v6.13.
- Before that refactor (v6.12), `__io_sqe_files_update` had `i =
  array_index_nospec(up->offset + done, ctx->nr_user_files);` — verified
  by `git show v6.12:io_uring/rsrc.c`. So v6.12 was protected.
- Record: Bug introduced in 4007c3d8c22a2 (v6.13) by inadvertently
  dropping `array_index_nospec()` during the helper extraction.

**Step 3.2: Fixes: tag follow-through**
- No Fixes: tag in this commit. Logical Fixes target is `4007c3d8c22a2`,
  present in v6.13 and later.
- Record: Bug regression introduced in v6.13; absent in v6.12 LTS.

**Step 3.3: Related changes / file history**
- `io_uring/rsrc.h` recently saw `82dadc8a49475` ("take unsigned index
  in io_rsrc_node_lookup()", Jan 2026) — related index typing cleanup
- This commit takes the same step for `io_reset_rsrc_node`
- Record: Latest in a series of small index-safety improvements; no
  prerequisites required.

**Step 3.4: Author**
- Jens Axboe is the io_uring maintainer; he both wrote 4007c3d8c22a2
  (introduced the regression) and authors this fix.
- Record: Subsystem maintainer authored.

**Step 3.5: Dependencies**
- The patch uses only existing primitives (`array_index_nospec`, the
  existing `data->nr` field, the existing helper signature). Standalone.
- Record: Standalone, no prerequisites.

## PHASE 4: MAILING LIST RESEARCH

**Step 4.1: Original submission**
- `b4 dig -c 53262c91f7b81` found patch 2/6 of "Various bug fixes"
  series at lore.kernel.org/all/20260421135626.581917-3-axboe@kernel.dk
- Cover letter ("PATCHSET 0/6 Various bug fixes") explicitly describes
  the patches:
  - "Patch 2, spectre masking for file updates."
  - Patch 6 is the only one with `Cc: stable@kernel.org` (a different
    patch with a clear regression Fixes:)
- Record: Submitted as part of a 6-patch series; cover-letter labels
  this one as "spectre masking" specifically (separate category from
  "defensive cleanups").

**Step 4.2: Reviewers (b4 dig -w)**
- Original recipients: `Jens Axboe`, `io-uring@vger.kernel.org`
- Reply thread: Gabriel Krisman Bertazi (SUSE) gave Reviewed-by
- Record: Reviewed by external developer (SUSE).

**Step 4.3: Bug report**
- No Reported-by / Link tags. No bug report - this is proactive
  hardening.
- Record: Proactive Spectre v1 mitigation, no specific user-triggered
  report.

**Step 4.4: Series context**
- Series: 1/6 (defensive cleanup, not reachable), 2/6 (this - spectre
  masking), 3/6 (defensive cleanup), 4/6 (defensive hardening), 5/6
  (futex actual fix, has Fixes:), 6/6 (ring resize actual fix, has
  Fixes: + Cc: stable)
- Record: Standalone within the series; doesn't depend on the others.

**Step 4.5: Stable list history**
- Not searched in detail. Note: the author chose NOT to Cc stable on
  this specific patch.
- Record: No explicit stable nomination, but author historically doesn't
  cc-stable Spectre hardening either (precedent: similar nospec fixes
  953c37e066f05/29b95ac917927 went to stable via maintainer-tagged
  Fixes:).

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1: Modified functions**
- `__io_sqe_files_update()` - handles `IORING_REGISTER_FILES_UPDATE`
- `io_reset_rsrc_node()` - inline helper used in 4 places

**Step 5.2: Callers**
- `io_reset_rsrc_node()` callers (verified by Grep):
  - `io_uring/rsrc.c:241` - in `__io_sqe_files_update()` (this fix's
    site)
  - `io_uring/rsrc.c:320` - in `__io_sqe_buffers_update()` (already
    nospec'd at the caller)
  - `io_uring/filetable.c:79` - in `io_install_fixed_file()` (called for
    direct fd installs; bounds-checked at line 72)
  - `io_uring/filetable.c:138` - in `io_fixed_fd_remove()` (bounds-
    checked at line 132)
- All 4 are user-reachable: the two `rsrc.c` sites via register/update
  operations, the two `filetable.c` sites via direct descriptor install
  and removal.
- Record: 4 call sites; all reachable from userspace via io_uring
  register or direct-descriptor paths.

**Step 5.3: Callees**
- `io_reset_rsrc_node()` calls `io_put_rsrc_node()` and indexes
  `data->nodes[index]`. The `array_index_nospec()` mask is now applied
  before the indexed load.

**Step 5.4: Reachability**
- The path is reachable from userspace via
  `io_uring_register(IORING_REGISTER_FILES_UPDATE, ...)`. Any process
  with io_uring access can hit it.
- Record: User-reachable from a basic syscall path.

**Step 5.5: Similar patterns**
- `io_rsrc_node_lookup()` already does the same pattern (bounds check +
  nospec mask)
- `__io_sqe_buffers_update()` already does the nospec mask at the caller
- This commit harmonizes the file-update path and the helper itself
- Past similar fixes: `b7620121dc04e` (2019), `34bb77184123a` (2022),
  `953c37e066f05` (2023), `29b95ac917927` (2024); these are assumed to
  have been backported (not confirmed per branch, see Verification)
- Record: Same pattern as a long lineage of accepted Spectre v1
  nospec fixes.

## PHASE 6: CROSS-REFERENCING / STABLE TREE

**Step 6.1: Buggy code in stable**
- `io_reset_rsrc_node()` introduced in `4007c3d8c22a2` (v6.13). Stable
  trees v6.13.y onward inherit the missing nospec.
- v6.12.y LTS does NOT have this regression (the function itself doesn't
  exist there).
- Record: Affected stable trees: v6.13.y - v6.19.y. v6.12 LTS
  unaffected.

**Step 6.2: Backport difficulty**
- The diff context is small. The function shape has been stable since
  v6.13 with only minor signature changes (e.g., `82dadc8a49475` made
  `io_rsrc_node_lookup` index unsigned in Jan 2026). Backport should
  apply nearly cleanly to active stable trees that have
  `io_reset_rsrc_node`.
- Record: Likely clean apply on v6.13+ stable trees; v6.12 LTS not
  applicable.

**Step 6.3: Related fixes already in stable**
- `953c37e066f05` and similar nospec fixes are already in older stable
  kernels.
- Record: This is the latest in the series; no overlap.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1: Subsystem**
- `io_uring/` - heavily used core async I/O subsystem reachable by any
  unprivileged process; security-sensitive.
- Criticality: IMPORTANT (used by many distros, databases, language
  runtimes).

**Step 7.2: Activity**
- Highly active subsystem with frequent fixes. Spectre and registration-
  path hardening is an ongoing theme.

## PHASE 8: IMPACT / RISK

**Step 8.1: Affected users**
- Any user of io_uring fixed-files (`IORING_REGISTER_FILES_UPDATE`) on a
  kernel >= v6.13. That's a large population - any process able to call
  `io_uring_setup`.

**Step 8.2: Trigger**
- Trigger: a userspace caller invokes `IORING_REGISTER_FILES_UPDATE`
  with a manipulated offset to mistrain a CPU branch predictor for a
  Spectre v1 attack. Architecturally bounded, but exposes a speculative-
  load gadget to any unprivileged caller.
- Record: Unprivileged userspace can reach the path.

**Step 8.3: Failure mode**
- Pure architectural correctness is unaffected; the failure mode is
  *information disclosure* via a Spectre v1 side channel. Severity for a
  sanitizer/Spectre hardening category: MEDIUM-HIGH (security hardening,
  defense-in-depth, no crash but real CVE class).

**Step 8.4: Risk-Benefit**
- Benefit: closes a known speculative gadget on a user-reachable indexed
  load - matches a long-standing pattern of accepted backports.
- Risk: ~10 lines, identical to widely-deployed pattern in adjacent
  code, fully verifiable. Very low.
- Record: High benefit / very low risk.

## PHASE 9: SYNTHESIS

**Step 9.1: Evidence**
- FOR backporting:
  - Spectre v1 (CVE-2017-5753 class) speculative-load gadget on a user-
    reachable path.
  - Restores protection that existed in v6.12 and was lost during the
    v6.13 helper extraction (`4007c3d8c22a2`).
  - 10-line surgical change identical in pattern to multiple historical
    nospec fixes that DID go to stable (`b7620121dc04e`,
    `34bb77184123a`, `953c37e066f05`, `29b95ac917927`).
  - Defense-in-depth: hardens both the caller and the helper.
  - Reviewed-by external developer. Maintainer-authored.
- AGAINST:
  - No `Fixes:` tag, no `Cc: stable` (notable since other patches in the
    same series do have them).
  - Cover letter labels this as "spectre masking" rather than "actual
    fix".
  - Architectural bound check already exists at line 222-223 in
    `__io_sqe_files_update`; the issue is purely speculative-execution.
- UNVERIFIED: Did not exhaustively check every active stable branch for
  divergence; reliant on grep/log of mainline.

**Step 9.2: Stable rules**
1. Obviously correct and tested? YES on correctness (10 lines,
   well-known pattern, reviewed); no explicit test evidence was found.
2. Fixes a real bug? YES (Spectre v1 class info-leak gadget, regression
   vs v6.12).
3. Important issue? YES (security hardening, CVE class, user-reachable).
4. Small and contained? YES (10 lines, 2 files, 1 subsystem).
5. No new features or APIs? YES.
6. Apply to stable trees? YES for v6.13.y+ (helper exists there).

**Step 9.3: Exception category**
- Falls under defensive security hardening with strong precedent
  (multiple prior nospec fixes in io_uring backported).

**Step 9.4: Decision**
- Borderline by author's tagging choice but technically a Spectre v1
  mitigation that follows a well-established pattern of stable-eligible
  io_uring nospec fixes. Restores protection lost in v6.13.

## Verification

- [Phase 1] Read commit message via `git show
  53262c91f7b81f96495ff24e9d1fa8b1632e69c8`: confirmed `Reviewed-by:
  Gabriel Krisman Bertazi`, `Signed-off-by: Jens Axboe`, no `Fixes:`, no
  `Cc: stable`.
- [Phase 2] Diff analysis: confirmed 3 added lines in
  `io_uring/rsrc.c::__io_sqe_files_update()` and 7 added/2 removed in
  `io_uring/rsrc.h::io_reset_rsrc_node()`. Index type changed from `int`
  to `unsigned int`.
- [Phase 2] Read post-fix `io_uring/rsrc.c` lines 211-272 and
  `io_uring/rsrc.h` lines 90-140 to verify the upfront bounds check at
  line 222 and the resulting helper shape.
- [Phase 3] `git log -- io_uring/rsrc.c | head` and `git log
  --grep="io_reset_rsrc_node"`: located helper introduction
  `4007c3d8c22a2` (Oct 29 2024).
- [Phase 3] `git tag --contains 4007c3d8c22a2 | grep "^v"` (via prefix
  match): returned no results, so the v6.13+ placement is inferred from
  the commit date and the v6.12 source check below rather than from tags.
- [Phase 3] `git show v6.12:io_uring/rsrc.c`: verified v6.12 had `i =
  array_index_nospec(up->offset + done, ctx->nr_user_files);` so this is
  a true regression introduced by the helper extraction.
- [Phase 4] `b4 dig -c 53262c91f7b81f96495ff24e9d1fa8b1632e69c8`:
  matched lore URL `https://lore.kernel.org/all/20260421135626.581917-3-
  axboe@kernel.dk/`.
- [Phase 4] `b4 dig -w`: original recipients limited to Jens Axboe and
  io-uring list.
- [Phase 4] Saved mbox via `b4 dig -m`: cover-letter explicitly
  classifies this patch as "spectre masking for file updates"; only
  patch 6/6 has `Cc: stable@kernel.org`. Patch 5/6 has `Fixes:` for an
  old commit but no Cc: stable; this one (patch 2/6) has neither.
- [Phase 4] Verified `Reviewed-by: Gabriel Krisman Bertazi` reply in the
  saved mbox at line 1490.
- [Phase 5] `Grep "io_reset_rsrc_node"` across `io_uring/`: found 4
  callers (rsrc.c:241, rsrc.c:320, filetable.c:79, filetable.c:138), all
  in user-reachable register paths. Inspected `filetable.c:62-141` to
  confirm both call sites are bounds-checked at the architectural level
  but lacked a nospec mask before the helper.
- [Phase 5] Located historical precedents: `git log
  --grep="^io_uring.*nospec"` returned `953c37e066f05`, `4cdd158be9d09`,
  `34bb77184123a`, `b7620121dc04e`. Inspected `b7620121dc04e` and
  `34bb77184123a` to confirm pattern (Spectre v1 nospec on indexed
  register paths).
- [Phase 6] Stable affected: v6.13+ (since helper introduction); v6.12
  LTS unaffected (different code structure with existing nospec).
- [Phase 8] Failure mode classified as Spectre v1 information disclosure
  on a user-reachable indexed load; severity MEDIUM-HIGH (security CVE
  class).
- UNVERIFIED: Did not check each individual active stable tree (v6.13.y
  through v6.19.y) line-for-line for backport conflicts; assessment
  based on the limited churn around the helper since v6.13.
- UNVERIFIED: Did not directly confirm whether
  953c37e066f05/29b95ac917927 actually landed in every stable branch
  (relied on the historical pattern of similar nospec fixes being
  backported).

## Decision rationale

This is a small (~10 line) Spectre v1 mitigation that restores
`array_index_nospec()` protection on a user-reachable indexed-load
gadget that was inadvertently dropped when the `io_reset_rsrc_node()`
helper was extracted in v6.13. It applies the same idiom already used in
the buffer-update path (`__io_sqe_buffers_update`) and the lookup helper
(`io_rsrc_node_lookup`), and it directly mirrors a long line of similar
nospec fixes that have been backported (`b7620121dc04e`,
`34bb77184123a`, `953c37e066f05`, `29b95ac917927`). The patch is
reviewed, surgical, regression-free, maintainer-authored, and addresses
a genuine speculative-execution gadget reachable by any unprivileged
process that can use io_uring. Although the author did not add `Cc:
stable` or a `Fixes:` tag (a deliberate choice noted on the cover-letter
as "spectre masking" rather than "actual fix"), the technical merits
clearly satisfy the stable rules: real CVE-class bug, important, small,
contained, no new functionality, applies cleanly to the affected stable
trees (v6.13+).

**YES**

 io_uring/rsrc.c | 3 +++
 io_uring/rsrc.h | 9 +++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 1b96ab5e98c99..7953f38af1eb0 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -238,6 +238,9 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 			continue;
 
 		i = up->offset + done;
+		if (i >= ctx->file_table.data.nr)
+			break;
+		i = array_index_nospec(i, ctx->file_table.data.nr);
 		if (io_reset_rsrc_node(ctx, &ctx->file_table.data, i))
 			io_file_bitmap_clear(&ctx->file_table, i);
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index cff0f8834c353..44e3386f7c1ca 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -109,10 +109,15 @@ static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node
 }
 
 static inline bool io_reset_rsrc_node(struct io_ring_ctx *ctx,
-				      struct io_rsrc_data *data, int index)
+				      struct io_rsrc_data *data,
+				      unsigned int index)
 {
-	struct io_rsrc_node *node = data->nodes[index];
+	struct io_rsrc_node *node;
 
+	if (index >= data->nr)
+		return false;
+	index = array_index_nospec(index, data->nr);
+	node = data->nodes[index];
 	if (!node)
 		return false;
 	io_put_rsrc_node(ctx, node);
-- 
2.53.0



* [PATCH AUTOSEL 7.0-6.12] io_uring: take page references for NOMMU pbuf_ring mmaps
       [not found] <20260428104133.2858589-1-sashal@kernel.org>
  2026-04-28 10:40 ` [PATCH AUTOSEL 7.0-6.18] io_uring/rsrc: unify nospec indexing for direct descriptors Sasha Levin
@ 2026-04-28 10:41 ` Sasha Levin
  1 sibling, 0 replies; 2+ messages in thread
From: Sasha Levin @ 2026-04-28 10:41 UTC
  To: patches, stable
  Cc: Greg Kroah-Hartman, Jens Axboe, Sasha Levin, io-uring,
	linux-kernel

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

[ Upstream commit d0be8884f56b0b800cd8966e37ce23417cd5044e ]

Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
virtual address of the io_mapped_region's backing pages directly;
the user's VMA aliases the kernel allocation. io_uring_mmap() then
just returns 0 -- it takes no page references.

The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
each inserted page.  Those references are released when the VMA is torn
down (zap_pte_range -> put_page). io_free_region() -> release_pages()
drops the io_uring-side references, but the pages survive until munmap
drops the VMA-side references.

Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
io_put_bl -> io_free_region -> release_pages drops the only references
and the pages return to the buddy allocator while the user's VMA still
has vm_start pointing into them.  The user can then write into whatever
the allocator hands out next.

Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
release them via vm_ops->close.  NOMMU's delete_vma() calls vma_close()
which runs ->close on munmap.

This also incidentally addresses the duplicate-vm_start case: two mmaps
of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
With page refs taken per mmap, the second mmap takes its own refs and
the pages survive until both mmaps are closed.  The nommu rb-tree BUG_ON
on duplicate vm_start is a separate mm/nommu.c concern (it should share
the existing region rather than BUG), but the page lifetime is now
correct.

Cc: Jens Axboe <axboe@kernel.dk>
Reported-by: Anthropic
Assisted-by: gkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
[axboe: get rid of region lookup, just iterate pages in vma]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

# Analysis: io_uring NOMMU pbuf_ring page UAF fix

## PHASE 1: COMMIT MESSAGE FORENSICS

**Step 1.1 - Subject line:**
Record: subsystem `io_uring`, action verb `take`, summary: "take page
references for NOMMU pbuf_ring mmaps" — wraps a fix for a page lifetime
/ use-after-free issue under `!CONFIG_MMU`.

**Step 1.2 - Tags:**
Record:
- `Cc: Jens Axboe <axboe@kernel.dk>`
- `Reported-by: Anthropic` (AI bug report)
- `Assisted-by: gkh_clanker_t1000` (an unusual tag — verified this is
  identical to upstream commit `d0be8884f56b0`, not pipeline-injected)
- `Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>`
  (author SOB, kernel veteran)
- `Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh`
- `[axboe: get rid of region lookup, just iterate pages in vma]`
  (maintainer-folded change)
- `Signed-off-by: Jens Axboe <axboe@kernel.dk>` (subsystem maintainer)
No `Cc: stable` or `Fixes:` — expected.

**Step 1.3 - Body text:**
Record: Author explains a use-after-free root cause precisely:
- NOMMU `io_uring_get_unmapped_area()` returns a kernel virtual address;
  user VMA aliases the kernel pages.
- `io_uring_mmap()` returns 0 without taking page references.
- `io_unregister_pbuf_ring -> io_put_bl -> io_free_region ->
  release_pages` drops the only reference; pages return to the buddy
  allocator while the user's VMA still maps them.
- "The user can then write into whatever the allocator hands out next."
  — this is a write-after-free.
- Fix mirrors MMU lifetime by `get_page` per page in `mmap()` and
  `put_page` via `vm_ops->close`.
- Also addresses the duplicate-vm_start case for SQ/CQ.

**Step 1.4 - Hidden bug fix?**
Record: Not hidden — the commit body explicitly describes a use-after-
free / write-after-free of pages handed to userspace, which is a serious
memory-safety / security bug.

## PHASE 2: DIFF ANALYSIS

**Step 2.1 - Inventory:**
Record: 1 file changed (`io_uring/memmap.c`); +44 lines, -1 line. Adds
`io_uring_nommu_vm_close()`, `io_uring_nommu_vm_ops`, expands
`io_uring_mmap()` (`!CONFIG_MMU` branch). Single-file, surgical NOMMU-
only change.

**Step 2.2 - Code flow change:**
Before: `io_uring_mmap()` for NOMMU only validated flags; returned 0
with no page references taken.
After: validates flags, looks up the region under `ctx->mmap_lock`,
validates region is set and the VMA size matches `region->nr_pages`,
takes a `get_page()` per backing page, and installs `vm_ops->close` to
drop those references at unmap.

**Step 2.3 - Bug mechanism:**
Record: Use-after-free / write-after-free of kernel pages still mapped
in userspace. Category: memory safety + reference counting (missing
`get_page` on the mmap path that aliases kernel allocations). The fix
balances the lifetime by adding `get_page()` on map and `put_page()` on
close.
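
The lifetime imbalance can be illustrated with a toy refcount model. This is a sketch under stated assumptions, not the patch: `struct toy_page` and every `toy_` function are hypothetical stand-ins for `struct page`, `get_page()`/`put_page()`, region teardown, and the VMA close hook, and the model assumes the region holds exactly one reference per page.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a refcount and a "freed" flag. */
struct toy_page {
	int refcount;
	bool freed;
};

static void toy_get_page(struct toy_page *p)
{
	p->refcount++;
}

static void toy_put_page(struct toy_page *p)
{
	if (--p->refcount == 0)
		p->freed = true;	/* page goes back to the allocator */
}

/* Allocation gives the region one reference per page. */
static void toy_region_alloc(struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++) {
		pages[i].refcount = 1;
		pages[i].freed = false;
	}
}

/* mmap: take_refs == false models the pre-fix NOMMU behaviour. */
static void toy_mmap(struct toy_page *pages, int nr, bool take_refs)
{
	for (int i = 0; take_refs && i < nr; i++)
		toy_get_page(&pages[i]);
}

/* Region teardown (unregister): drops the region's own references. */
static void toy_region_free(struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++)
		toy_put_page(&pages[i]);
}

/* VMA close (munmap): drops the references taken at mmap time. */
static void toy_vm_close(struct toy_page *pages, int nr)
{
	for (int i = 0; i < nr; i++)
		toy_put_page(&pages[i]);
}
```

In this model, calling `toy_mmap(pages, nr, false)` followed by `toy_region_free()` marks every page freed while the mapping still exists, which is the situation the commit message describes. With `take_refs == true`, the pages stay alive after `toy_region_free()` and are only freed by `toy_vm_close()`.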

**Step 2.4 - Fix quality:**
Record: Small, contained. Logic is straightforward: per-page `get_page`
on map, mirrored `put_page` on close. The validation that `vma->vm_end -
vma->vm_start == region->nr_pages << PAGE_SHIFT` guards the close-time
`virt_to_page` walk over the VMA address range. Risk that
`vma->vm_start` no longer points to those pages is addressed by holding
the page references — the kernel virtual address remains valid as long
as the page is alive. Fix is obviously correct for the NOMMU case
described.
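
The size validation mentioned above is simple arithmetic; a minimal sketch follows, assuming 4 KiB pages. `TOY_PAGE_SHIFT` and both function names are hypothetical and only illustrate why a matching size means a close-time walk over the VMA visits exactly the pages that were referenced at mmap time.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* True when the VMA covers exactly nr_pages pages. */
static bool toy_vma_matches_region(unsigned long vm_start,
				   unsigned long vm_end,
				   unsigned long nr_pages)
{
	return vm_end - vm_start == nr_pages << TOY_PAGE_SHIFT;
}

/* Number of pages a walk over [vm_start, vm_end) would visit. */
static unsigned long toy_pages_walked(unsigned long vm_start,
				      unsigned long vm_end)
{
	return (vm_end - vm_start) >> TOY_PAGE_SHIFT;
}
```

A VMA spanning `0x10000` to `0x12000` matches a 2-page region and a walk over it visits 2 pages; the same VMA against a 3-page region fails the check, so a get/put mismatch between map and close cannot arise from a partial mapping.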

## PHASE 3: GIT HISTORY

**Step 3.1 - Blame:**
Record: The vulnerable line `return
is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;` has been present
in NOMMU `io_uring_mmap()` since `f15ed8b4d0ce2 io_uring: move
mapping/allocation helpers to a separate file` (v6.10) and earlier in
`io_uring/io_uring.c` going back to v6.0 era when io_uring moved into
its own subdirectory (`ed29b0b4fd835`, v6.0).

**Step 3.2 - Fixes: tag:**
Record: No Fixes: tag. The specific UAF via the `pbuf_ring`
`release_pages` path requires the region API on the pbuf side, which
arrived with `ef62de3c4ad58 io_uring/kbuf: use region api for pbuf
rings` and the sibling memmap commits, all in v6.14-rc1.

**Step 3.3 - Related changes:**
Record: Relevant series: `7cd7b9575270e io_uring/memmap: unify io_uring
mmap'ing code`, `ef62de3c4ad58 io_uring/kbuf: use region api for pbuf
rings`, `90175f3f50321 io_uring/kbuf: remove pbuf ring refcounting` (all
v6.14-rc1). These restructured pbuf_ring mmap to share the region
machinery — the same machinery whose `release_pages` now drops the only
reference under NOMMU.

**Step 3.4 - Author:**
Record: Author is Greg Kroah-Hartman (LTS maintainer). Folded by Jens
Axboe (io_uring maintainer). Both highly authoritative.

**Step 3.5 - Dependencies:**
Record: The fix uses `io_mmap_get_region()`, `io_region_is_set()`,
`region->pages`, `region->nr_pages`, `ctx->mmap_lock` — all introduced
in v6.14. For v6.14+ stable trees, this should apply standalone. For
older trees (≤v6.12), the patch will not apply as-is.

## PHASE 4: MAILING LIST AND EXTERNAL RESEARCH

**Step 4.1 - Original submission:**
Record: `b4 dig -c d0be8884f56b0` returned thread
`https://lore.kernel.org/all/2026042115-body-attention-d15b@gregkh/`.
The series went through one revision — Jens folded a simplification
("get rid of region lookup, just iterate pages in vma") with size
validation before applying.

**Step 4.2 - Reviewers:**
Record: To `io-uring@vger.kernel.org`, Cc: Jens Axboe (subsystem
maintainer). Maintainer folded changes and pushed.

**Step 4.3 - Bug report:**
Record: Greg's email confirms this was an AI-generated report.
**However**, Greg explicitly built a PoC (poc.c + run-poc.sh attached to
the thread) which:
- Builds a riscv64 NOMMU kernel and boots in QEMU with `init_on_free=1`
- As init, registers a pbuf_ring with `IOU_PBUF_RING_MMAP`, mmaps a
  page, writes a 0x55 canary, unregisters the pbuf_ring, then re-reads
- On unfixed: canary becomes 0x00 (page freed and zeroed), then re-
  registering reuses the same page demonstrating write-after-free
- On fixed: canary is intact
- Greg replied `Tested-by: Greg Kroah-Hartman
  <gregkh@linuxfoundation.org>` after Jens's folded version

The CVE-style identifiers `ANT-2026-02884` (the UAF) and
`ANT-2026-02650` (related duplicate vm_start) are referenced in the PoC.

**Step 4.4 - Series context:**
Record: Single patch (no series). Greg also has an alternative patch
that disables io_uring on `!MMU` entirely, which Jens did not accept in
favor of this fix.

**Step 4.5 - Stable discussion:**
Record: No explicit `Cc: stable` mention in the thread, and no
`stable@vger.kernel.org` in the discussion. However, this is a confirmed
UAF reachable from unprivileged userspace with a working exploit
reproducer — clearly stable material.

## PHASE 5: CODE SEMANTIC ANALYSIS

**Step 5.1 - Modified functions:**
Record: `io_uring_mmap()` (NOMMU branch), new
`io_uring_nommu_vm_close()`, new `io_uring_nommu_vm_ops`.

**Step 5.2 - Callers:**
Record: `io_uring_mmap` is the file_operations `.mmap` for the io_uring
fd; reachable from any userspace `mmap()` on an io_uring fd.
`io_uring_nommu_vm_close` is invoked by `delete_vma()` in `mm/nommu.c`
on `munmap`/exit. The bug path: `io_unregister_pbuf_ring()` →
`io_put_bl()` (`io_uring/kbuf.c:445`) → `io_free_region()`
(`io_uring/memmap.c:91`) → `release_pages()` — confirmed by `git grep`.

**Step 5.3 - Callees:**
Record: `get_page()`, `put_page()`, `is_nommu_shared_mapping()`,
`io_mmap_get_region()`, `io_region_is_set()`, `virt_to_page()`. All
standard kernel APIs.

**Step 5.4 - Reachability:**
Record: io_uring `register`/`unregister` and `mmap` are unprivileged
syscalls (no `CAP_SYS_ADMIN` for these paths, verified by grep across
`io_uring/`). The PoC itself runs as init, but the sequence it exercises
needs no capabilities.

**Step 5.5 - Similar patterns:**
Record: The MMU path uses `vm_insert_pages()` (which does its own
`get_page` per inserted page, released on VMA teardown via
`zap_pte_range -> put_page`). The fix gives NOMMU equivalent symmetry.
A search for other `is_nommu_shared_mapping` users (`fc4f4be9b5271`)
shows io_uring is the only file_ops user adding such page lifetime
semantics manually.

## PHASE 6: STABLE TREE ANALYSIS

**Step 6.1 - Bug presence in stable:**
Record: Verified `git show v6.18:io_uring/memmap.c` and `git show
v7.0:io_uring/memmap.c` — both contain the unfixed `return
is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;`. The pbuf_ring
region API (the trigger surface for this exact UAF) exists from v6.14
onward. Affected trees with this exact bug: v6.14, v6.15, v6.16, v6.17,
**v6.18 LTS**, v6.19, **v7.0** (this branch).

**Step 6.2 - Backport complications:**
Record: For v6.14 → v7.0, all helpers (`io_mmap_get_region`,
`io_region_is_set`, `ctx->mmap_lock`, `region->pages/nr_pages`,
`guard(mutex)`) exist; the patch should apply cleanly or with trivial
adjustment. For v6.12 LTS and older, `io_mmap_get_region()` does not
exist (region API absent in pbuf path) — the same conceptual UAF may
exist via different code, but the fix as-presented does not apply. v6.6
LTS — same story.

**Step 6.3 - Related fixes already in stable:**
Record: No prior fix found. This is a new, recently-discovered class of
bug.

## PHASE 7: SUBSYSTEM CONTEXT

**Step 7.1 - Subsystem and criticality:**
Record: `io_uring` — IMPORTANT (heavily used subsystem; security-
relevant; reachable from unprivileged userspace). Criticality of this
specific config (NOMMU): PERIPHERAL (only `!CONFIG_MMU` builds, mostly
RISC-V/embedded). Net assessment: IMPORTANT-but-PERIPHERAL —
unprivileged UAF in a security-sensitive subsystem, on a small but real
config.

**Step 7.2 - Subsystem activity:**
Record: io_uring is one of the most actively developed kernel
subsystems; the affected code (region API) is recent (v6.14) and well
maintained.

## PHASE 8: IMPACT AND RISK ASSESSMENT

**Step 8.1 - Affected users:**
Record: Users of `!CONFIG_MMU` kernels (RISC-V nommu, ARM nommu, and
other embedded NOMMU systems with io_uring enabled). Small population,
but real, and the bug is unconditional on those builds when pbuf_ring
mmap is used.

**Step 8.2 - Trigger:**
Record: Trivial — unprivileged process calls `io_uring_setup`,
`io_uring_register(IORING_REGISTER_PBUF_RING, ..., IOU_PBUF_RING_MMAP)`,
`mmap(IORING_OFF_PBUF_RING)`, then
`io_uring_register(IORING_UNREGISTER_PBUF_RING, ...)`. PoC demonstrates
this path. Same pattern for SQ/CQ rings.

**Step 8.3 - Failure mode:**
Record: Use-after-free leading to write-after-free of kernel pages from
userspace. Once the page is returned to the buddy allocator and reused
(a kernel-side allocation hands the same page back), the user can read
and write whatever the kernel later places there. This is
heap-spray-friendly and security-CRITICAL. The PoC ends with a sysrq-c
kernel panic as proof.
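
The lifetime problem described above can be illustrated with a small
userspace model. This is a sketch only, not kernel code: `struct
model_page`, `model_get_page()`, `model_put_page()` and
`model_freed_while_mapped()` are hypothetical names that mimic reference
counting, and the `freed` flag stands in for the page going back to the
allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel page: a reference count plus a
 * flag recording that the page went back to the allocator. */
struct model_page {
	int refcount;
	bool freed;
};

static void model_get_page(struct model_page *p)
{
	assert(!p->freed);
	p->refcount++;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/*
 * Model the register -> mmap -> unregister -> munmap sequence for one
 * page. Returns true if the page is freed while the mapping still
 * exists. With pin_on_mmap == false the mapping holds no reference of
 * its own, which models the unfixed behaviour.
 */
bool model_freed_while_mapped(bool pin_on_mmap)
{
	/* The region owns the initial reference. */
	struct model_page page = { .refcount = 1, .freed = false };
	bool dangling;

	if (pin_on_mmap)
		model_get_page(&page);	/* mmap takes its own reference */

	model_put_page(&page);		/* unregister drops the region's ref */
	dangling = page.freed;		/* the mapping is still alive here */

	if (pin_on_mmap)
		model_put_page(&page);	/* munmap: close drops the mapping ref */

	return dangling;
}
```

In this model, `model_freed_while_mapped(false)` reports a dangling
mapping, while `model_freed_while_mapped(true)` keeps the page alive
until the mapping itself is torn down.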

**Step 8.4 - Risk-benefit:**
Record:
- Benefit: prevents an unprivileged user-triggered UAF / write-after-
  free on NOMMU systems — exactly the stable mandate.
- Risk: minimal. The change is confined to the `!CONFIG_MMU` branch of
  `io_uring/memmap.c` (45 lines added, 1 removed), so it cannot affect
  any MMU build. Even on NOMMU, the fix only adds `get_page`/`put_page`
  symmetry to mirror the MMU path. Tested by Greg Kroah-Hartman with an
  explicit PoC and a boot test.
Ratio: very high benefit / very low risk.
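
The `get_page`/`put_page` symmetry depends on the mapping covering
exactly the region's pages. A minimal userspace sketch of that
arithmetic follows. It assumes 4 KiB pages (`MODEL_PAGE_SHIFT` of 12),
and `model_mmap_len_ok()` and `model_close_puts()` are hypothetical
helpers, not kernel functions.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages. */
#define MODEL_PAGE_SHIFT 12UL
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Mirrors the size check in the patch: the mapping must span the whole
 * region, no more and no less. Returns nonzero when the length matches. */
int model_mmap_len_ok(unsigned long vm_start, unsigned long vm_end,
		      unsigned long nr_pages)
{
	assert(vm_end >= vm_start);
	return (vm_end - vm_start) == (nr_pages << MODEL_PAGE_SHIFT);
}

/* Counts the references a close loop would drop when it walks the
 * mapping one page-sized step at a time. */
unsigned long model_close_puts(unsigned long vm_start, unsigned long vm_end)
{
	unsigned long addr, puts = 0;

	for (addr = vm_start; addr < vm_end; addr += MODEL_PAGE_SIZE)
		puts++;

	return puts;
}
```

When the length check passes for `nr_pages` pages, the close walk in
this model drops exactly `nr_pages` references, matching the number
taken at mmap time.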

## PHASE 9: FINAL SYNTHESIS

**Step 9.1 - Evidence:**
- FOR: confirmed unprivileged-reachable UAF/WAF on NOMMU; PoC exists and
  panics unfixed kernels; small, surgical, single-file fix; tested by
  Greg KH; written by LTS maintainer; folded by io_uring maintainer;
  merged upstream in `d0be8884f56b0`; only touches NOMMU branch; depends
  on helpers all present in v6.14+.
- AGAINST: NOMMU is uncommon; the bug was AI-discovered and Greg
  initially expressed low confidence in the fix details; pre-v6.14
  stable trees would need a different patch.

**Step 9.2 - Stable rules:**
1. Obviously correct and tested? YES — PoC + Tested-by from LTS
   maintainer.
2. Fixes a real bug affecting users? YES — UAF reachable by any
   unprivileged process on NOMMU.
3. Important issue? YES — security-critical (write-after-free of
   arbitrary kernel pages).
4. Small and contained? YES — 1 file, ~44 lines, NOMMU-only branch.
5. No new features/APIs? YES — internal fix only.
6. Applies to stable? YES for v6.14+ (cleanly to v6.18, v6.19, v7.0);
   needs adaptation for older trees.

**Step 9.3 - Exceptions:** Not applicable — this is a normal bug fix,
not a quirk/device-id/DT update. Stands on its merits.

**Step 9.4 - Decision:** Backport. This is a confirmed unprivileged-
reachable use-after-free with a working PoC. The fix is small, surgical,
and isolated to the NOMMU code path so it cannot regress MMU builds. It
applies cleanly to v6.14+ stable trees including the v7.0.y autosel
target (HEAD here).

## Verification

- [Phase 1] Read full upstream commit `d0be8884f56b0` via `git show`;
  confirmed `Reported-by: Anthropic` and `Assisted-by:
  gkh_clanker_t1000` are part of the upstream commit, not pipeline-
  injected.
- [Phase 2] Diff inspection confirms +45/-1 lines in `io_uring/memmap.c`
  only, all in the `!CONFIG_MMU` branch.
- [Phase 3] `git log --oneline -- io_uring/memmap.c` and `git describe
  --contains` confirm region API arrived in v6.14-rc1 (`ef62de3c4ad58`,
  `7cd7b9575270e`); pre-v6.14 NOMMU mmap was already vulnerable in
  spirit but used different (refcounted) pbuf paths.
- [Phase 3] `git show v6.6:io_uring/io_uring.c`,
  `v6.12:io_uring/memmap.c`, `v6.18:io_uring/memmap.c`,
  `v7.0:io_uring/memmap.c` confirm the unfixed `return
  is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;` is present from
  v6.6 through v7.0.
- [Phase 4] `b4 dig -c d0be8884f56b0` returned thread
  `https://lore.kernel.org/all/2026042115-body-attention-d15b@gregkh/`.
- [Phase 4] `b4 dig -c d0be8884f56b0 -a` showed v1 only; Jens folded an
  inline simplification when applying.
- [Phase 4] `b4 dig -c d0be8884f56b0 -m /tmp/io_uring_thread.mbox` saved
  the thread; read confirms PoC (poc.c, run-poc.sh) tests vulnerable vs.
  fixed kernels with `init_on_free=1`, and `Tested-by: Greg Kroah-
  Hartman` on Jens's folded version.
- [Phase 4] PoC references CVE-style identifiers `ANT-2026-02884` (this
  UAF) and `ANT-2026-02650` (related duplicate vm_start case).
- [Phase 5] `grep` in `io_uring/kbuf.c` confirmed
  `io_unregister_pbuf_ring -> io_put_bl -> io_free_region` call chain at
  lines 445, 698, 719.
- [Phase 5] `grep` for `capable\|CAP_` in `io_uring/io_uring.c, kbuf.c,
  register.c` confirms IORING_REGISTER_PBUF_RING and io_uring_mmap are
  unprivileged.
- [Phase 6] Verified `io_mmap_get_region` and `io_region_is_set` exist
  in v6.18, v7.0; do not exist in v6.12.
- [Phase 8] Failure mode: confirmed UAF and WAF, observable from
  userspace via the PoC. Severity: CRITICAL (security).
- UNVERIFIED: Did not attempt to actually run the PoC under QEMU in this
  session; relied on Greg KH's `Tested-by` and PoC source code
  inspection.
- UNVERIFIED: Did not check whether stable maintainers (separate from
  the discussion thread) have already queued or rejected this for
  stable.

The fix addresses a confirmed unprivileged-reachable use-after-free /
write-after-free in io_uring under `!CONFIG_MMU`, is small and contained
to the NOMMU branch only, was tested by the LTS maintainer with a
working PoC, and applies cleanly to v6.14+ stable trees (including the
v7.0.y target this branch represents). It meets every stable rule.

**YES**

 io_uring/memmap.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index e6958968975a8..4f9b439319c46 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -366,9 +366,53 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
 
 #else /* !CONFIG_MMU */
 
+/*
+ * Drop the pages that were initially referenced and added in
+ * io_uring_mmap(). We cannot have had a mremap() as that isn't supported,
+ * hence the vma should be identical to the one we initially referenced and
+ * mapped, and partial unmaps and splitting isn't possible on a file backed
+ * mapping.
+ */
+static void io_uring_nommu_vm_close(struct vm_area_struct *vma)
+{
+	unsigned long index;
+
+	for (index = vma->vm_start; index < vma->vm_end; index += PAGE_SIZE)
+		put_page(virt_to_page((void *) index));
+}
+
+static const struct vm_operations_struct io_uring_nommu_vm_ops = {
+	.close = io_uring_nommu_vm_close,
+};
+
 int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+	struct io_ring_ctx *ctx = file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	if (!is_nommu_shared_mapping(vma->vm_flags))
+		return -EINVAL;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	if (!region || !io_region_is_set(region))
+		return -EINVAL;
+
+	if ((vma->vm_end - vma->vm_start) !=
+	    (unsigned long) region->nr_pages << PAGE_SHIFT)
+		return -EINVAL;
+
+	/*
+	 * Pin the pages so io_free_region()'s release_pages() does not
+	 * drop the last reference while this VMA exists. delete_vma()
+	 * in mm/nommu.c calls vma_close() which runs ->close above.
+	 */
+	for (i = 0; i < region->nr_pages; i++)
+		get_page(region->pages[i]);
+
+	vma->vm_ops = &io_uring_nommu_vm_ops;
+	return 0;
 }
 
 unsigned int io_uring_nommu_mmap_capabilities(struct file *file)
-- 
2.53.0

