From: Vlastimil Babka <[email protected]>
To: "Matthew Wilcox (Oracle)" <[email protected]>,
Andrew Morton <[email protected]>,
Mike Kravetz <[email protected]>,
Luis Chamberlain <[email protected]>,
Oscar Salvador <[email protected]>
Cc: Jens Axboe <[email protected]>,
[email protected], [email protected]
Subject: Re: [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR
Date: Fri, 8 Dec 2023 18:54:19 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 8/16/23 17:11, Matthew Wilcox (Oracle) wrote:
> We can use a bit in page[1].flags to indicate that this folio belongs
> to hugetlb instead of using a value in page[1].dtors. That lets
> folio_test_hugetlb() become an inline function like it should be.
> We can also get rid of NULL_COMPOUND_DTOR.
>
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
I think this (as commit 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")) is
causing the bug reported by Luis here:
https://bugzilla.kernel.org/show_bug.cgi?id=218227
> page:000000009006bf10 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x3f8a0 pfn:0x1035c0
> flags: 0x17fffc000000000(node=0|zone=2|lastcpupid=0x1ffff)
> page_type: 0xffffff7f(buddy)
> raw: 017fffc000000000 ffffe704c422f808 ffffe704c41ac008 0000000000000000
> raw: 000000000003f8a0 0000000000000005 00000000ffffff7f 0000000000000000
> page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((vo>
> ------------[ cut here ]------------
> kernel BUG at include/linux/page-flags.h:314!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 6 PID: 2435641 Comm: md5sum Not tainted 6.6.0-rc5 #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:folio_flags+0x65/0x70
> Code: a8 40 74 de 48 8b 47 48 a8 01 74 d6 48 83 e8 01 48 39 c7 75 bd eb cb 48 8b 07 a8 40 75 c8 48 c7 c6 d8 a7 c3 89 e8 3b c7 fa ff <0f> 0b 66 0f >
> RSP: 0018:ffffad51c0bfb7a8 EFLAGS: 00010246
> RAX: 000000000000015f RBX: ffffe704c40d7000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff89be8040 RDI: 00000000ffffffff
> RBP: 0000000000103600 R08: 0000000000000000 R09: ffffad51c0bfb658
> R10: 0000000000000003 R11: ffffffff89eacb08 R12: 0000000000000035
> R13: ffffe704c40d7000 R14: 0000000000000000 R15: ffffad51c0bfb930
> FS: 00007f350c51b740(0000) GS:ffff9b62fbd80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555860919508 CR3: 00000001217fe002 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <TASK>
> ? die+0x32/0x80
> ? do_trap+0xd6/0x100
> ? folio_flags+0x65/0x70
> ? do_error_trap+0x6a/0x90
> ? folio_flags+0x65/0x70
> ? exc_invalid_op+0x4c/0x60
> ? folio_flags+0x65/0x70
> ? asm_exc_invalid_op+0x16/0x20
> ? folio_flags+0x65/0x70
> ? folio_flags+0x65/0x70
> PageHuge+0x67/0x80
> isolate_migratepages_block+0x1c5/0x13b0
> ? __pv_queued_spin_lock_slowpath+0x16c/0x370
> compact_zone+0x746/0xfc0
> compact_zone_order+0xbb/0x100
> try_to_compact_pages+0xf0/0x2f0
> __alloc_pages_direct_compact+0x78/0x210
> __alloc_pages_slowpath.constprop.0+0xac1/0xdb0
> ? prepare_alloc_pages.constprop.0+0xff/0x1b0
> __alloc_pages+0x218/0x240
> folio_alloc+0x17/0x50
> page_cache_ra_order+0x15a/0x340
> filemap_get_pages+0x136/0x6c0
> ? update_load_avg+0x7e/0x780
> ? current_time+0x2b/0xd0
> filemap_read+0xce/0x340
> ? do_sched_setscheduler+0x111/0x1b0
> ? nohz_balance_exit_idle+0x16/0xc0
> ? trigger_load_balance+0x302/0x370
> ? preempt_count_add+0x47/0xa0
> xfs_file_buffered_read+0x52/0xd0 [xfs]
> xfs_file_read_iter+0x73/0xe0 [xfs]
> vfs_read+0x1b1/0x300
> ksys_read+0x63/0xe0
> do_syscall_64+0x38/0x90
> entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f350c615a5d
> Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d a6 60 0a 00 e8 99 08 02 00 66 0f 1f 84 00 00 00 00 00 80 3d 81 3b 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 >
> RSP: 002b:00007ffca3ef5108 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 000055ff140712f0 RCX: 00007f350c615a5d
> RDX: 0000000000008000 RSI: 000055ff140714d0 RDI: 0000000000000003
> RBP: 00007f350c6ee600 R08: 0000000000000900 R09: 000000000b9bc05b
> R10: 000055ff140794d0 R11: 0000000000000246 R12: 000055ff140714d0
> R13: 0000000000008000 R14: 0000000000000a68 R15: 00007f350c6edd00
> </TASK>
> Modules linked in: dm_zero dm_thin_pool dm_persistent_data dm_bio_prison sd_mod sg scsi_mod scsi_common dm_snapshot dm_bufio dm_flakey xfs sunrpc >
> ---[ end trace 0000000000000000 ]---
It's because PageHuge() now does folio_test_hugetlb() which is documented to
assume caller holds a reference, but the test in
isolate_migratepages_block() doesn't. The code is there from 2021 by Oscar,
perhaps it could be changed to take a reference (and e.g. only test
PageHead() before), but it will be a bit involved as the
isolate_or_dissolve_huge_page() it calls has some logic based on the
refcount being zero/non-zero as well. Oscar, what do you think?
Also I wonder if any of the the other existing PageHuge() callers are also
affected because they might be doing so without a reference.
(keeping the rest of patch for new Cc's)
> ---
> .../admin-guide/kdump/vmcoreinfo.rst | 10 +---
> include/linux/mm.h | 4 --
> include/linux/page-flags.h | 43 ++++++++++++----
> kernel/crash_core.c | 2 +-
> mm/hugetlb.c | 49 +++----------------
> mm/page_alloc.c | 2 +-
> 6 files changed, 43 insertions(+), 67 deletions(-)
>
> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> index c18d94fa6470..baa1c355741d 100644
> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> @@ -325,8 +325,8 @@ NR_FREE_PAGES
> On linux-2.6.21 or later, the number of free pages is in
> vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
>
> -PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
> -------------------------------------------------------------------------------
> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb
> +-----------------------------------------------------------------------------------------
>
> Page attributes. These flags are used to filter various unnecessary for
> dumping pages.
> @@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for
> dumping pages.
>
>
> -HUGETLB_PAGE_DTOR
> ------------------
> -
> -The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
> -excludes these pages.
> -
> x86_64
> ======
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7b800d1298dc..642f5fe5860e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1268,11 +1268,7 @@ void folio_copy(struct folio *dst, struct folio *src);
> unsigned long nr_free_buffer_pages(void);
>
> enum compound_dtor_id {
> - NULL_COMPOUND_DTOR,
> COMPOUND_PAGE_DTOR,
> -#ifdef CONFIG_HUGETLB_PAGE
> - HUGETLB_PAGE_DTOR,
> -#endif
> TRANSHUGE_PAGE_DTOR,
> NR_COMPOUND_DTORS,
> };
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 92a2063a0a23..aeecf0cf1456 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -171,15 +171,6 @@ enum pageflags {
> /* Remapped by swiotlb-xen. */
> PG_xen_remapped = PG_owner_priv_1,
>
> -#ifdef CONFIG_MEMORY_FAILURE
> - /*
> - * Compound pages. Stored in first tail page's flags.
> - * Indicates that at least one subpage is hwpoisoned in the
> - * THP.
> - */
> - PG_has_hwpoisoned = PG_error,
> -#endif
> -
> /* non-lru isolated movable page */
> PG_isolated = PG_reclaim,
>
> @@ -190,6 +181,15 @@ enum pageflags {
> /* For self-hosted memmap pages */
> PG_vmemmap_self_hosted = PG_owner_priv_1,
> #endif
> +
> + /*
> + * Flags only valid for compound pages. Stored in first tail page's
> + * flags word.
> + */
> +
> + /* At least one page in this folio has the hwpoison flag set */
> + PG_has_hwpoisoned = PG_error,
> + PG_hugetlb = PG_active,
> };
>
> #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1)
> @@ -812,7 +812,23 @@ static inline void ClearPageCompound(struct page *page)
>
> #ifdef CONFIG_HUGETLB_PAGE
> int PageHuge(struct page *page);
> -bool folio_test_hugetlb(struct folio *folio);
> +SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> +CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> +
> +/**
> + * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
> + * @folio: The folio to test.
> + *
> + * Context: Any context. Caller should have a reference on the folio to
> + * prevent it from being turned into a tail page.
> + * Return: True for hugetlbfs folios, false for anon folios or folios
> + * belonging to other filesystems.
> + */
> +static inline bool folio_test_hugetlb(struct folio *folio)
> +{
> + return folio_test_large(folio) &&
> + test_bit(PG_hugetlb, folio_flags(folio, 1));
> +}
> #else
> TESTPAGEFLAG_FALSE(Huge, hugetlb)
> #endif
> @@ -1040,6 +1056,13 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
> #define PAGE_FLAGS_CHECK_AT_PREP \
> ((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK)
>
> +/*
> + * Flags stored in the second page of a compound page. They may overlap
> + * the CHECK_AT_FREE flags above, so need to be cleared.
> + */
> +#define PAGE_FLAGS_SECOND \
> + (1UL << PG_has_hwpoisoned | 1UL << PG_hugetlb)
> +
> #define PAGE_FLAGS_PRIVATE \
> (1UL << PG_private | 1UL << PG_private_2)
> /**
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 90ce1dfd591c..dd5f87047d06 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -490,7 +490,7 @@ static int __init crash_save_vmcoreinfo_init(void)
> #define PAGE_BUDDY_MAPCOUNT_VALUE (~PG_buddy)
> VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
> #ifdef CONFIG_HUGETLB_PAGE
> - VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
> + VMCOREINFO_NUMBER(PG_hugetlb);
> #define PAGE_OFFLINE_MAPCOUNT_VALUE (~PG_offline)
> VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
> #endif
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 086eb51bf845..389490f100b0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1585,25 +1585,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
> {
> lockdep_assert_held(&hugetlb_lock);
>
> - /*
> - * Very subtle
> - *
> - * For non-gigantic pages set the destructor to the normal compound
> - * page dtor. This is needed in case someone takes an additional
> - * temporary ref to the page, and freeing is delayed until they drop
> - * their reference.
> - *
> - * For gigantic pages set the destructor to the null dtor. This
> - * destructor will never be called. Before freeing the gigantic
> - * page destroy_compound_gigantic_folio will turn the folio into a
> - * simple group of pages. After this the destructor does not
> - * apply.
> - *
> - */
> - if (hstate_is_gigantic(h))
> - folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
> - else
> - folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
> + folio_clear_hugetlb(folio);
> }
>
> /*
> @@ -1690,7 +1672,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
> h->surplus_huge_pages_node[nid]++;
> }
>
> - folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
> + folio_set_hugetlb(folio);
> folio_change_private(folio, NULL);
> /*
> * We have to set hugetlb_vmemmap_optimized again as above
> @@ -1814,9 +1796,8 @@ static void free_hpage_workfn(struct work_struct *work)
> /*
> * The VM_BUG_ON_FOLIO(!folio_test_hugetlb(folio), folio) in
> * folio_hstate() is going to trigger because a previous call to
> - * remove_hugetlb_folio() will call folio_set_compound_dtor
> - * (folio, NULL_COMPOUND_DTOR), so do not use folio_hstate()
> - * directly.
> + * remove_hugetlb_folio() will clear the hugetlb bit, so do
> + * not use folio_hstate() directly.
> */
> h = size_to_hstate(page_size(page));
>
> @@ -1955,7 +1936,7 @@ static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
> {
> hugetlb_vmemmap_optimize(h, &folio->page);
> INIT_LIST_HEAD(&folio->lru);
> - folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
> + folio_set_hugetlb(folio);
> hugetlb_set_folio_subpool(folio, NULL);
> set_hugetlb_cgroup(folio, NULL);
> set_hugetlb_cgroup_rsvd(folio, NULL);
> @@ -2070,28 +2051,10 @@ int PageHuge(struct page *page)
> if (!PageCompound(page))
> return 0;
> folio = page_folio(page);
> - return folio->_folio_dtor == HUGETLB_PAGE_DTOR;
> + return folio_test_hugetlb(folio);
> }
> EXPORT_SYMBOL_GPL(PageHuge);
>
> -/**
> - * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
> - * @folio: The folio to test.
> - *
> - * Context: Any context. Caller should have a reference on the folio to
> - * prevent it from being turned into a tail page.
> - * Return: True for hugetlbfs folios, false for anon folios or folios
> - * belonging to other filesystems.
> - */
> -bool folio_test_hugetlb(struct folio *folio)
> -{
> - if (!folio_test_large(folio))
> - return false;
> -
> - return folio->_folio_dtor == HUGETLB_PAGE_DTOR;
> -}
> -EXPORT_SYMBOL_GPL(folio_test_hugetlb);
> -
> /*
> * Find and lock address space (mapping) in write mode.
> *
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9638fdddf065..f8e276de4fd5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1122,7 +1122,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
> VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
>
> if (compound)
> - ClearPageHasHWPoisoned(page);
> + page[1].flags &= ~PAGE_FLAGS_SECOND;
> for (i = 1; i < (1 << order); i++) {
> if (compound)
> bad += free_tail_page_prepare(page, page + i);
next prev parent reply other threads:[~2023-12-08 17:54 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-16 15:11 [PATCH v2 00/13] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 01/13] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 02/13] mm: Call free_huge_page() directly Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 03/13] mm: Convert free_huge_page() to free_huge_folio() Matthew Wilcox (Oracle)
2023-08-16 20:14 ` Sidhartha Kumar
2023-08-17 3:31 ` Yanteng Si
2023-08-16 15:11 ` [PATCH v2 04/13] mm: Convert free_transhuge_folio() to folio_undo_large_rmappable() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 05/13] mm; Convert prep_transhuge_page() to folio_prep_large_rmappable() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 06/13] mm: Remove free_compound_page() and the compound_page_dtors array Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
2023-08-16 21:45 ` Sidhartha Kumar
2023-08-22 3:13 ` Mike Kravetz
2023-08-22 3:32 ` Matthew Wilcox
2023-08-22 17:19 ` Mike Kravetz
2023-12-08 17:54 ` Vlastimil Babka [this message]
2023-12-08 18:31 ` Matthew Wilcox
2023-12-13 12:23 ` David Hildenbrand
2023-08-16 15:11 ` [PATCH v2 08/13] mm: Add large_rmappable page flag Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 09/13] mm: Rearrange page flags Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 10/13] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
2023-08-22 23:17 ` Mike Kravetz
2023-08-23 0:29 ` Matthew Wilcox
2023-08-16 15:11 ` [PATCH v2 11/13] mm: Remove folio_test_transhuge() Matthew Wilcox (Oracle)
2023-08-16 15:12 ` [PATCH v2 12/13] mm: Add tail private fields to struct folio Matthew Wilcox (Oracle)
2023-08-16 15:21 ` David Hildenbrand
2023-08-16 15:12 ` [PATCH v2 13/13] mm: Convert split_huge_pages_pid() to use a folio Matthew Wilcox (Oracle)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox