Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps

From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [PATCH] io_uring: take page references for NOMMU pbuf_ring mmaps
Date: Tue, 21 Apr 2026 20:26:08 -0600
Message-ID: <9c20876f-1cdb-429a-abb3-5ddbcd8cac00@kernel.dk>
In-Reply-To: <f1b43e56-4724-4635-b18b-bae2add37936@kernel.dk>

On 4/21/26 7:56 PM, Jens Axboe wrote:
> On 4/21/26 7:17 PM, Jens Axboe wrote:
>> On 4/21/26 11:39 AM, Jens Axboe wrote:
>>>
>>> On Tue, 21 Apr 2026 15:46:16 +0200, Greg Kroah-Hartman wrote:
>>>> Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
>>>> virtual address of the io_mapped_region's backing pages directly;
>>>> the user's VMA aliases the kernel allocation. io_uring_mmap() then
>>>> just returns 0 -- it takes no page references.
>>>>
>>>> The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
>>>> each inserted page.  Those references are released when the VMA is torn
>>>> down (zap_pte_range -> put_page). io_free_region() -> release_pages()
>>>> drops the io_uring-side references, but the pages survive until munmap
>>>> drops the VMA-side references.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>>
>>> [1/1] io_uring: take page references for NOMMU pbuf_ring mmaps
>>>       commit: d9b7b3d9c5286a786c7fe8220c55a6e012088c2e
>>
>> Actually, I take that back - what guarantees that the
>> io_mmap_get_region() in the newly added io_uring_nommu_vm_close()
>> returns the same region that we initially referenced the pages from
>> in the nommu variant of io_uring_mmap()?
> 
> I think we can get rid of that and simplify the code at the same
> time. Rather than looking up the buffer list again, we can just
> iterate the pages mapped in the vma. Since this is a file backed
> mapping and io_uring doesn't allow remaps, those should always be
> the same pages.
> 
> Greg, can you test this? I will fold this in.

Here's the full patch - the incremental was missing a ')'. And
for good measure, it now also checks that the vma size matches the
number of pages in the region.

commit d0be8884f56b0b800cd8966e37ce23417cd5044e
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Tue Apr 21 15:46:16 2026 +0200

    io_uring: take page references for NOMMU pbuf_ring mmaps
    
    Under !CONFIG_MMU, io_uring_get_unmapped_area() returns the kernel
    virtual address of the io_mapped_region's backing pages directly;
    the user's VMA aliases the kernel allocation. io_uring_mmap() then
    just returns 0 -- it takes no page references.
    
    The CONFIG_MMU path uses vm_insert_pages(), which takes a reference on
    each inserted page.  Those references are released when the VMA is torn
    down (zap_pte_range -> put_page). io_free_region() -> release_pages()
    drops the io_uring-side references, but the pages survive until munmap
    drops the VMA-side references.
    
    Under NOMMU there are no VMA-side references. io_unregister_pbuf_ring ->
    io_put_bl -> io_free_region -> release_pages drops the only references
    and the pages return to the buddy allocator while the user's VMA still
    has vm_start pointing into them.  The user can then write into whatever
    the allocator hands out next.
    
    Mirror the MMU lifetime: take get_page references in io_uring_mmap() and
    release them via vm_ops->close.  NOMMU's delete_vma() calls vma_close()
    which runs ->close on munmap.
    
    This also incidentally addresses the duplicate-vm_start case: two mmaps
    of SQ_RING and CQ_RING resolve to the same ctx->ring_region pointer.
    With page refs taken per mmap, the second mmap takes its own refs and
    the pages survive until both mmaps are closed.  The nommu rb-tree BUG_ON
    on duplicate vm_start is a separate mm/nommu.c concern (it should share
    the existing region rather than BUG), but the page lifetime is now
    correct.
    
    Cc: Jens Axboe <axboe@kernel.dk>
    Reported-by: Anthropic
    Assisted-by: gkh_clanker_t1000
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Link: https://patch.msgid.link/2026042115-body-attention-d15b@gregkh
    [axboe: get rid of region lookup, just iterate pages in vma]
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index e6958968975a..4f9b439319c4 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -366,9 +366,53 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
 
 #else /* !CONFIG_MMU */
 
+/*
+ * Drop the pages that were initially referenced and added in
+ * io_uring_mmap(). We cannot have had a mremap() as that isn't supported,
+ * hence the vma should be identical to the one we initially referenced and
+ * mapped, and partial unmaps and splitting aren't possible on a file backed
+ * mapping.
+ */
+static void io_uring_nommu_vm_close(struct vm_area_struct *vma)
+{
+	unsigned long index;
+
+	for (index = vma->vm_start; index < vma->vm_end; index += PAGE_SIZE)
+		put_page(virt_to_page((void *) index));
+}
+
+static const struct vm_operations_struct io_uring_nommu_vm_ops = {
+	.close = io_uring_nommu_vm_close,
+};
+
 int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 {
-	return is_nommu_shared_mapping(vma->vm_flags) ? 0 : -EINVAL;
+	struct io_ring_ctx *ctx = file->private_data;
+	struct io_mapped_region *region;
+	unsigned long i;
+
+	if (!is_nommu_shared_mapping(vma->vm_flags))
+		return -EINVAL;
+
+	guard(mutex)(&ctx->mmap_lock);
+	region = io_mmap_get_region(ctx, vma->vm_pgoff);
+	if (!region || !io_region_is_set(region))
+		return -EINVAL;
+
+	if ((vma->vm_end - vma->vm_start) !=
+	    (unsigned long) region->nr_pages << PAGE_SHIFT)
+		return -EINVAL;
+
+	/*
+	 * Pin the pages so io_free_region()'s release_pages() does not
+	 * drop the last reference while this VMA exists. delete_vma()
+	 * in mm/nommu.c calls vma_close() which runs ->close above.
+	 */
+	for (i = 0; i < region->nr_pages; i++)
+		get_page(region->pages[i]);
+
+	vma->vm_ops = &io_uring_nommu_vm_ops;
+	return 0;
 }
 
 unsigned int io_uring_nommu_mmap_capabilities(struct file *file)

-- 
Jens Axboe
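
The lifetime rule the patch establishes can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not kernel
code: struct toy_page, struct toy_region, TOY_NR_PAGES and every toy_*
function are hypothetical stand-ins for struct page, io_mapped_region,
get_page()/put_page(), io_uring_mmap(), io_free_region() and the ->close
handler. Only reference counts are modelled, and "freed" simply means the
count reached zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4	/* hypothetical region size, in pages */

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
	int refcount;	/* 0 means the page went back to the allocator */
};

/* Toy stand-in for io_mapped_region. */
struct toy_region {
	struct toy_page *pages[TOY_NR_PAGES];
	size_t nr_pages;
};

static void toy_get_page(struct toy_page *page)
{
	page->refcount++;
}

static void toy_put_page(struct toy_page *page)
{
	assert(page->refcount > 0);	/* catch unbalanced puts */
	page->refcount--;
}

/* Region setup: the io_uring side holds one reference per page. */
void toy_region_init(struct toy_region *region, struct toy_page *backing)
{
	region->nr_pages = TOY_NR_PAGES;
	for (size_t i = 0; i < region->nr_pages; i++) {
		backing[i].refcount = 1;
		region->pages[i] = &backing[i];
	}
}

/* mmap: refuse a size mismatch, then take one extra reference per page. */
bool toy_mmap(struct toy_region *region, size_t vma_pages)
{
	if (vma_pages != region->nr_pages)
		return false;
	for (size_t i = 0; i < region->nr_pages; i++)
		toy_get_page(region->pages[i]);
	return true;
}

/* Unregister: drop only the io_uring-side references. */
void toy_free_region(struct toy_region *region)
{
	for (size_t i = 0; i < region->nr_pages; i++)
		toy_put_page(region->pages[i]);
	region->nr_pages = 0;
}

/* ->close: walk the mapped range itself, no region lookup needed. */
void toy_vm_close(struct toy_page *backing, size_t vma_pages)
{
	for (size_t i = 0; i < vma_pages; i++)
		toy_put_page(&backing[i]);
}

/* Count pages that still have at least one reference. */
size_t toy_pages_alive(const struct toy_page *backing)
{
	size_t alive = 0;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		alive += backing[i].refcount > 0;
	return alive;
}
```

In this model, calling toy_region_init(), then toy_mmap() twice, then
toy_free_region() leaves all pages alive; they only reach a zero count after
toy_vm_close() has run once per mapping. The size check in toy_mmap() is what
lets toy_vm_close() put exactly the pages that were pinned, which is the
reason the real patch compares the vma length against region->nr_pages.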
