From: Jens Axboe <[email protected]>
To: Johannes Weiner <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH 02/10] io_uring: get rid of remap_pfn_range() for mapping rings/sqes
Date: Thu, 28 Mar 2024 08:49:11 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 3/28/24 8:08 AM, Johannes Weiner wrote:
> On Wed, Mar 27, 2024 at 01:13:37PM -0600, Jens Axboe wrote:
>> Rather than use remap_pfn_range() for this and manually free later,
>> switch to using vm_insert_pages() and have it Just Work.
>>
>> If possible, allocate a single compound page that covers the range that
>> is needed. If that works, then we can just use page_address() on that
>> page. If we fail to get a compound page, allocate single pages and use
>> vmap() to map them into the kernel virtual address space.
>>
>> This just covers the rings/sqes; the other remaining user of
>> remap_pfn_range() for mmap will be converted separately. Once that is
>> done, we can kill the old alloc/free code.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>
> Overall this looks good to me.
>
> Two comments below:
>
>> @@ -2601,6 +2601,27 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
>> return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0;
>> }
>>
>> +static void io_pages_unmap(void *ptr, struct page ***pages,
>> + unsigned short *npages)
>> +{
>> + bool do_vunmap = false;
>> +
>> + if (*npages) {
>> + struct page **to_free = *pages;
>> + int i;
>> +
>> + /* only did vmap for non-compound and multiple pages */
>> + do_vunmap = !PageCompound(to_free[0]) && *npages > 1;
>> + for (i = 0; i < *npages; i++)
>> + put_page(to_free[i]);
>> + }
>> + if (do_vunmap)
>> + vunmap(ptr);
>> + kvfree(*pages);
>> + *pages = NULL;
>> + *npages = 0;
>> +}
>> +
>> void io_mem_free(void *ptr)
>> {
>> if (!ptr)
>> @@ -2701,8 +2722,8 @@ static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
>> static void io_rings_free(struct io_ring_ctx *ctx)
>> {
>> if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
>> - io_mem_free(ctx->rings);
>> - io_mem_free(ctx->sq_sqes);
>> + io_pages_unmap(ctx->rings, &ctx->ring_pages, &ctx->n_ring_pages);
>> + io_pages_unmap(ctx->sq_sqes, &ctx->sqe_pages, &ctx->n_sqe_pages);
>> } else {
>> io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
>> ctx->n_ring_pages = 0;
>> @@ -2714,6 +2735,84 @@ static void io_rings_free(struct io_ring_ctx *ctx)
>> ctx->sq_sqes = NULL;
>> }
>>
>> +static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
>> + size_t size, gfp_t gfp)
>> +{
>> + struct page *page;
>> + int i, order;
>> +
>> + order = get_order(size);
>> + if (order > MAX_PAGE_ORDER)
>> + return NULL;
>> + else if (order)
>> + gfp |= __GFP_COMP;
>> +
>> + page = alloc_pages(gfp, order);
>> + if (!page)
>> + return NULL;
>> +
>> + /* add pages, grab a ref to tail pages */
>> + for (i = 0; i < nr_pages; i++) {
>> + pages[i] = page + i;
>> + if (i)
>> + get_page(pages[i]);
>> + }
>
> You don't need those extra refs.
>
> __GFP_COMP makes a super page that acts like a single entity. The ref
> returned by alloc_pages() keeps the whole thing alive; you can then do
> a single put in io_pages_unmap() for the compound case as well.
>
> [ vm_insert_pages() and munmap() still do gets and puts on the tail
> pages as they are individually mapped and unmapped, but those calls
> get implicitly redirected to the compound refcount maintained in the
> head page. IOW, an munmap() of an individual tail page won't free
> that tail as long as you hold the base ref from the alloc_pages(). ]
OK then, so I can just do something like this:
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index bf1527055679..d168752c206f 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -29,12 +29,8 @@ static void *io_mem_alloc_compound(struct page **pages, int nr_pages,
if (!page)
return NULL;
- /* add pages, grab a ref to tail pages */
- for (i = 0; i < nr_pages; i++) {
+ for (i = 0; i < nr_pages; i++)
pages[i] = page + i;
- if (i)
- get_page(pages[i]);
- }
return page_address(page);
}
@@ -100,8 +96,14 @@ void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages,
struct page **to_free = *pages;
int i;
- /* only did vmap for non-compound and multiple pages */
- do_vunmap = !PageCompound(to_free[0]) && *npages > 1;
+ /*
+ * Only did vmap for the non-compound multiple page case.
+ * For the compound page, we just need to put the head.
+ */
+ if (PageCompound(to_free[0]))
+ *npages = 1;
+ else if (*npages > 1)
+ do_vunmap = true;
for (i = 0; i < *npages; i++)
put_page(to_free[i]);
}
and not need any extra refs. I wish the compound page were a bit more
integrated, e.g. so I could just do vm_insert_page() on a single compound
page and have it Just Work. But I have to treat it as separate pages
there.
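For reference, the mmap side ends up looking roughly like this (a
sketch with a hypothetical helper name, not the exact code in this
series):

	/*
	 * Even when the backing store is a single compound page,
	 * pages[] holds one entry per PAGE_SIZE chunk, because
	 * vm_insert_pages() maps individual pages. The per-page refs
	 * it takes are redirected to the head page's compound
	 * refcount, per the explanation above.
	 */
	static int io_uring_mmap_pages(struct vm_area_struct *vma,
				       struct page **pages,
				       unsigned int npages)
	{
		unsigned long nr = npages;

		return vm_insert_pages(vma, vma->vm_start, pages, &nr);
	}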
Thanks!
>> +static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size,
>> + gfp_t gfp)
>> +{
>> + void *ret;
>> + int i;
>> +
>> + for (i = 0; i < nr_pages; i++) {
>> + pages[i] = alloc_page(gfp);
>> + if (!pages[i])
>> + goto err;
>> + }
>> +
>> + ret = vmap(pages, nr_pages, VM_MAP | VM_ALLOW_HUGE_VMAP, PAGE_KERNEL);
>
> You can kill the VM_ALLOW_HUGE_VMAP.
>
> It's a no-op in vmap(), since you're passing an array of order-0
> pages, which cannot be mapped by anything larger than PTEs.
Noted, will kill the VM_ALLOW_HUGE_VMAP.
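So that call simply becomes:

	ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);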
--
Jens Axboe