From: Matthew Wilcox <[email protected]>
To: Vlastimil Babka <[email protected]>
Cc: Andrew Morton <[email protected]>,
Mike Kravetz <[email protected]>,
Luis Chamberlain <[email protected]>,
Oscar Salvador <[email protected]>, Jens Axboe <[email protected]>,
[email protected], [email protected]
Subject: Re: [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR
Date: Fri, 8 Dec 2023 18:31:54 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On Fri, Dec 08, 2023 at 06:54:19PM +0100, Vlastimil Babka wrote:
> On 8/16/23 17:11, Matthew Wilcox (Oracle) wrote:
> > We can use a bit in page[1].flags to indicate that this folio belongs
> > to hugetlb instead of using a value in page[1].dtors. That lets
> > folio_test_hugetlb() become an inline function like it should be.
> > We can also get rid of NULL_COMPOUND_DTOR.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
>
> I think this (as commit 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")) is
> causing the bug reported by Luis here:
> https://bugzilla.kernel.org/show_bug.cgi?id=218227
Luis, please stop using bugzilla. If you'd sent email like a normal
kernel developer, I'd've seen this bug earlier.
> > page:000000009006bf10 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x3f8a0 pfn:0x1035c0
> > flags: 0x17fffc000000000(node=0|zone=2|lastcpupid=0x1ffff)
> > page_type: 0xffffff7f(buddy)
> > raw: 017fffc000000000 ffffe704c422f808 ffffe704c41ac008 0000000000000000
> > raw: 000000000003f8a0 0000000000000005 00000000ffffff7f 0000000000000000
> > page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((vo>
> > ------------[ cut here ]------------
> > kernel BUG at include/linux/page-flags.h:314!
> > invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 6 PID: 2435641 Comm: md5sum Not tainted 6.6.0-rc5 #2
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > RIP: 0010:folio_flags+0x65/0x70
> > Code: a8 40 74 de 48 8b 47 48 a8 01 74 d6 48 83 e8 01 48 39 c7 75 bd eb cb 48 8b 07 a8 40 75 c8 48 c7 c6 d8 a7 c3 89 e8 3b c7 fa ff <0f> 0b 66 0f >
> > RSP: 0018:ffffad51c0bfb7a8 EFLAGS: 00010246
> > RAX: 000000000000015f RBX: ffffe704c40d7000 RCX: 0000000000000000
> > RDX: 0000000000000000 RSI: ffffffff89be8040 RDI: 00000000ffffffff
> > RBP: 0000000000103600 R08: 0000000000000000 R09: ffffad51c0bfb658
> > R10: 0000000000000003 R11: ffffffff89eacb08 R12: 0000000000000035
> > R13: ffffe704c40d7000 R14: 0000000000000000 R15: ffffad51c0bfb930
> > FS: 00007f350c51b740(0000) GS:ffff9b62fbd80000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000555860919508 CR3: 00000001217fe002 CR4: 0000000000770ee0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > PKRU: 55555554
> > Call Trace:
> > <TASK>
> > ? die+0x32/0x80
> > ? do_trap+0xd6/0x100
> > ? folio_flags+0x65/0x70
> > ? do_error_trap+0x6a/0x90
> > ? folio_flags+0x65/0x70
> > ? exc_invalid_op+0x4c/0x60
> > ? folio_flags+0x65/0x70
> > ? asm_exc_invalid_op+0x16/0x20
> > ? folio_flags+0x65/0x70
> > ? folio_flags+0x65/0x70
> > PageHuge+0x67/0x80
> > isolate_migratepages_block+0x1c5/0x13b0
> > compact_zone+0x746/0xfc0
> > compact_zone_order+0xbb/0x100
> > try_to_compact_pages+0xf0/0x2f0
> > __alloc_pages_direct_compact+0x78/0x210
> > __alloc_pages_slowpath.constprop.0+0xac1/0xdb0
> > __alloc_pages+0x218/0x240
> > folio_alloc+0x17/0x50
>
> It's because PageHuge() now does folio_test_hugetlb() which is documented to
> assume caller holds a reference, but the test in
> isolate_migratepages_block() doesn't. The code is there from 2021 by Oscar,
> perhaps it could be changed to take a reference (and e.g. only test
> PageHead() before), but it will be a bit involved as the
> isolate_or_dissolve_huge_page() it calls has some logic based on the
> refcount being zero/non-zero as well. Oscar, what do you think?
> Also I wonder if any of the the other existing PageHuge() callers are also
> affected because they might be doing so without a reference.
I don't think the warning is actually wrong! We're living very
dangerously here as PageHuge() could have returned a false positive
before this change [1]. Then we assume that compound_nr() returns a
consistent result (and compound_order() might, for example, return a
value larger than 63, leading to UB).
I think there's a place for a folio_test_hugetlb_unsafe(), but that
would only remove the warning, and do nothing to fix all the unsafe
usage. The hugetlb handling code in isolate_migratepages_block()
doesn't seem to have any understanding that it's working with pages
that can change under it. I can have a go at fixing it up, but maybe
Oscar would prefer to do it?
[1] terribly unlikely, but consider what happens if the page starts out
as a non-hugetlb compound allocation. Then it is freed and reallocated;
the new owner of the second page could have put enough information
into it that tricked PageHuge() into returning true. Then we call
isolate_or_dissolve_huge_page() which takes a lock, but doesn't take
a refcount. Now it gets a bogus hstate and things go downhill from
there ...)
next prev parent reply other threads:[~2023-12-08 18:32 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-16 15:11 [PATCH v2 00/13] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 01/13] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 02/13] mm: Call free_huge_page() directly Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 03/13] mm: Convert free_huge_page() to free_huge_folio() Matthew Wilcox (Oracle)
2023-08-16 20:14 ` Sidhartha Kumar
2023-08-17 3:31 ` Yanteng Si
2023-08-16 15:11 ` [PATCH v2 04/13] mm: Convert free_transhuge_folio() to folio_undo_large_rmappable() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 05/13] mm; Convert prep_transhuge_page() to folio_prep_large_rmappable() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 06/13] mm: Remove free_compound_page() and the compound_page_dtors array Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
2023-08-16 21:45 ` Sidhartha Kumar
2023-08-22 3:13 ` Mike Kravetz
2023-08-22 3:32 ` Matthew Wilcox
2023-08-22 17:19 ` Mike Kravetz
2023-12-08 17:54 ` Vlastimil Babka
2023-12-08 18:31 ` Matthew Wilcox [this message]
2023-12-13 12:23 ` David Hildenbrand
2023-08-16 15:11 ` [PATCH v2 08/13] mm: Add large_rmappable page flag Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 09/13] mm: Rearrange page flags Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 10/13] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
2023-08-22 23:17 ` Mike Kravetz
2023-08-23 0:29 ` Matthew Wilcox
2023-08-16 15:11 ` [PATCH v2 11/13] mm: Remove folio_test_transhuge() Matthew Wilcox (Oracle)
2023-08-16 15:12 ` [PATCH v2 12/13] mm: Add tail private fields to struct folio Matthew Wilcox (Oracle)
2023-08-16 15:21 ` David Hildenbrand
2023-08-16 15:12 ` [PATCH v2 13/13] mm: Convert split_huge_pages_pid() to use a folio Matthew Wilcox (Oracle)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox