From: Vlastimil Babka <[email protected]>
To: "Matthew Wilcox (Oracle)" <[email protected]>,
	Andrew Morton <[email protected]>,
	Mike Kravetz <[email protected]>,
	Luis Chamberlain <[email protected]>,
	Oscar Salvador <[email protected]>
Cc: Jens Axboe <[email protected]>,
	[email protected], [email protected]
Subject: Re: [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR
Date: Fri, 8 Dec 2023 18:54:19 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 8/16/23 17:11, Matthew Wilcox (Oracle) wrote:
> We can use a bit in page[1].flags to indicate that this folio belongs
> to hugetlb instead of using a value in page[1].dtors.  That lets
> folio_test_hugetlb() become an inline function like it should be.
> We can also get rid of NULL_COMPOUND_DTOR.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>

I think this (as commit 9c5ccf2db04b ("mm: remove HUGETLB_PAGE_DTOR")) is
causing the bug reported by Luis here:
https://bugzilla.kernel.org/show_bug.cgi?id=218227

> page:000000009006bf10 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x3f8a0 pfn:0x1035c0
> flags: 0x17fffc000000000(node=0|zone=2|lastcpupid=0x1ffff)
> page_type: 0xffffff7f(buddy)
> raw: 017fffc000000000 ffffe704c422f808 ffffe704c41ac008 0000000000000000
> raw: 000000000003f8a0 0000000000000005 00000000ffffff7f 0000000000000000
> page dumped because: VM_BUG_ON_PAGE(n > 0 && !((__builtin_constant_p(PG_head) && __builtin_constant_p((uintptr_t)(&page->flags) != (uintptr_t)((vo>
> ------------[ cut here ]------------
> kernel BUG at include/linux/page-flags.h:314!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 6 PID: 2435641 Comm: md5sum Not tainted 6.6.0-rc5 #2
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:folio_flags+0x65/0x70
> Code: a8 40 74 de 48 8b 47 48 a8 01 74 d6 48 83 e8 01 48 39 c7 75 bd eb cb 48 8b 07 a8 40 75 c8 48 c7 c6 d8 a7 c3 89 e8 3b c7 fa ff <0f> 0b 66 0f >
> RSP: 0018:ffffad51c0bfb7a8 EFLAGS: 00010246
> RAX: 000000000000015f RBX: ffffe704c40d7000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: ffffffff89be8040 RDI: 00000000ffffffff
> RBP: 0000000000103600 R08: 0000000000000000 R09: ffffad51c0bfb658
> R10: 0000000000000003 R11: ffffffff89eacb08 R12: 0000000000000035
> R13: ffffe704c40d7000 R14: 0000000000000000 R15: ffffad51c0bfb930
> FS:  00007f350c51b740(0000) GS:ffff9b62fbd80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555860919508 CR3: 00000001217fe002 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
>  <TASK>
>  ? die+0x32/0x80
>  ? do_trap+0xd6/0x100
>  ? folio_flags+0x65/0x70
>  ? do_error_trap+0x6a/0x90
>  ? folio_flags+0x65/0x70
>  ? exc_invalid_op+0x4c/0x60
>  ? folio_flags+0x65/0x70
>  ? asm_exc_invalid_op+0x16/0x20
>  ? folio_flags+0x65/0x70
>  ? folio_flags+0x65/0x70
>  PageHuge+0x67/0x80
>  isolate_migratepages_block+0x1c5/0x13b0
>  ? __pv_queued_spin_lock_slowpath+0x16c/0x370
>  compact_zone+0x746/0xfc0
>  compact_zone_order+0xbb/0x100
>  try_to_compact_pages+0xf0/0x2f0
>  __alloc_pages_direct_compact+0x78/0x210
>  __alloc_pages_slowpath.constprop.0+0xac1/0xdb0
>  ? prepare_alloc_pages.constprop.0+0xff/0x1b0
>  __alloc_pages+0x218/0x240
>  folio_alloc+0x17/0x50
>  page_cache_ra_order+0x15a/0x340
>  filemap_get_pages+0x136/0x6c0
>  ? update_load_avg+0x7e/0x780
>  ? current_time+0x2b/0xd0
>  filemap_read+0xce/0x340
>  ? do_sched_setscheduler+0x111/0x1b0
>  ? nohz_balance_exit_idle+0x16/0xc0
>  ? trigger_load_balance+0x302/0x370
>  ? preempt_count_add+0x47/0xa0
>  xfs_file_buffered_read+0x52/0xd0 [xfs]
>  xfs_file_read_iter+0x73/0xe0 [xfs]
>  vfs_read+0x1b1/0x300
>  ksys_read+0x63/0xe0
>  do_syscall_64+0x38/0x90
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7f350c615a5d
> Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d a6 60 0a 00 e8 99 08 02 00 66 0f 1f 84 00 00 00 00 00 80 3d 81 3b 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 >
> RSP: 002b:00007ffca3ef5108 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 000055ff140712f0 RCX: 00007f350c615a5d
> RDX: 0000000000008000 RSI: 000055ff140714d0 RDI: 0000000000000003
> RBP: 00007f350c6ee600 R08: 0000000000000900 R09: 000000000b9bc05b
> R10: 000055ff140794d0 R11: 0000000000000246 R12: 000055ff140714d0
> R13: 0000000000008000 R14: 0000000000000a68 R15: 00007f350c6edd00
>  </TASK>
> Modules linked in: dm_zero dm_thin_pool dm_persistent_data dm_bio_prison sd_mod sg scsi_mod scsi_common dm_snapshot dm_bufio dm_flakey xfs sunrpc >
> ---[ end trace 0000000000000000 ]---

It's because PageHuge() now does folio_test_hugetlb(), which is documented
to assume the caller holds a reference, but the check in
isolate_migratepages_block() is done without one. That code has been there
since 2021, added by Oscar; perhaps it could be changed to take a reference
first (and e.g. only test PageHead() before that), but it will be a bit
involved, as the isolate_or_dissolve_huge_page() it calls also has logic
based on the refcount being zero vs non-zero. Oscar, what do you think?
Also I wonder whether any of the other existing PageHuge() callers are
affected as well, because they might be calling it without holding a
reference.
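
To illustrate the direction I mean, something like the sketch below.
This is completely untested, the helper name is made up, and it does
not actually solve the hard part - free hugetlb folios sit in the pool
with a zero refcount, so a speculative reference fails for exactly the
folios isolate_or_dissolve_huge_page() wants to dissolve:

/*
 * Untested sketch: only look at the hugetlb flag in the first tail
 * page's flags while holding a reference, so the folio cannot be
 * freed and reused under us. Helper name made up for illustration.
 */
static bool page_maybe_hugetlb_isolatable(struct page *page)
{
	struct folio *folio;
	bool hugetlb;

	/* Cheap check that is fine to do without a reference. */
	if (!PageHead(page))
		return false;

	folio = page_folio(page);

	/* Speculative reference; fails if the folio is being freed. */
	if (!folio_try_get(folio))
		return false;

	/* The folio can no longer be freed under us at this point. */
	hugetlb = folio_test_hugetlb(folio);
	folio_put(folio);

	return hugetlb;
}

So the refcount handling in isolate_or_dissolve_huge_page() and the
check itself would really have to be reworked together, not just the
check alone.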

(keeping the rest of the patch quoted below for the new Cc's)

> ---
>  .../admin-guide/kdump/vmcoreinfo.rst          | 10 +---
>  include/linux/mm.h                            |  4 --
>  include/linux/page-flags.h                    | 43 ++++++++++++----
>  kernel/crash_core.c                           |  2 +-
>  mm/hugetlb.c                                  | 49 +++----------------
>  mm/page_alloc.c                               |  2 +-
>  6 files changed, 43 insertions(+), 67 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> index c18d94fa6470..baa1c355741d 100644
> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
> +++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> @@ -325,8 +325,8 @@ NR_FREE_PAGES
>  On linux-2.6.21 or later, the number of free pages is in
>  vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
>  
> -PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
> -------------------------------------------------------------------------------
> +PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb
> +-----------------------------------------------------------------------------------------
>  
>  Page attributes. These flags are used to filter various unnecessary for
>  dumping pages.
> @@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for
>  dumping pages.
>  
>  
> -HUGETLB_PAGE_DTOR
> ------------------
> -
> -The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
> -excludes these pages.
> -
>  x86_64
>  ======
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 7b800d1298dc..642f5fe5860e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1268,11 +1268,7 @@ void folio_copy(struct folio *dst, struct folio *src);
>  unsigned long nr_free_buffer_pages(void);
>  
>  enum compound_dtor_id {
> -	NULL_COMPOUND_DTOR,
>  	COMPOUND_PAGE_DTOR,
> -#ifdef CONFIG_HUGETLB_PAGE
> -	HUGETLB_PAGE_DTOR,
> -#endif
>  	TRANSHUGE_PAGE_DTOR,
>  	NR_COMPOUND_DTORS,
>  };
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 92a2063a0a23..aeecf0cf1456 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -171,15 +171,6 @@ enum pageflags {
>  	/* Remapped by swiotlb-xen. */
>  	PG_xen_remapped = PG_owner_priv_1,
>  
> -#ifdef CONFIG_MEMORY_FAILURE
> -	/*
> -	 * Compound pages. Stored in first tail page's flags.
> -	 * Indicates that at least one subpage is hwpoisoned in the
> -	 * THP.
> -	 */
> -	PG_has_hwpoisoned = PG_error,
> -#endif
> -
>  	/* non-lru isolated movable page */
>  	PG_isolated = PG_reclaim,
>  
> @@ -190,6 +181,15 @@ enum pageflags {
>  	/* For self-hosted memmap pages */
>  	PG_vmemmap_self_hosted = PG_owner_priv_1,
>  #endif
> +
> +	/*
> +	 * Flags only valid for compound pages.  Stored in first tail page's
> +	 * flags word.
> +	 */
> +
> +	/* At least one page in this folio has the hwpoison flag set */
> +	PG_has_hwpoisoned = PG_error,
> +	PG_hugetlb = PG_active,
>  };
>  
>  #define PAGEFLAGS_MASK		((1UL << NR_PAGEFLAGS) - 1)
> @@ -812,7 +812,23 @@ static inline void ClearPageCompound(struct page *page)
>  
>  #ifdef CONFIG_HUGETLB_PAGE
>  int PageHuge(struct page *page);
> -bool folio_test_hugetlb(struct folio *folio);
> +SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> +CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
> +
> +/**
> + * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
> + * @folio: The folio to test.
> + *
> + * Context: Any context.  Caller should have a reference on the folio to
> + * prevent it from being turned into a tail page.
> + * Return: True for hugetlbfs folios, false for anon folios or folios
> + * belonging to other filesystems.
> + */
> +static inline bool folio_test_hugetlb(struct folio *folio)
> +{
> +	return folio_test_large(folio) &&
> +		test_bit(PG_hugetlb, folio_flags(folio, 1));
> +}
>  #else
>  TESTPAGEFLAG_FALSE(Huge, hugetlb)
>  #endif
> @@ -1040,6 +1056,13 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>  #define PAGE_FLAGS_CHECK_AT_PREP	\
>  	((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK)
>  
> +/*
> + * Flags stored in the second page of a compound page.  They may overlap
> + * the CHECK_AT_FREE flags above, so need to be cleared.
> + */
> +#define PAGE_FLAGS_SECOND						\
> +	(1UL << PG_has_hwpoisoned	| 1UL << PG_hugetlb)
> +
>  #define PAGE_FLAGS_PRIVATE				\
>  	(1UL << PG_private | 1UL << PG_private_2)
>  /**
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 90ce1dfd591c..dd5f87047d06 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -490,7 +490,7 @@ static int __init crash_save_vmcoreinfo_init(void)
>  #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
>  	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
>  #ifdef CONFIG_HUGETLB_PAGE
> -	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
> +	VMCOREINFO_NUMBER(PG_hugetlb);
>  #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
>  	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
>  #endif
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 086eb51bf845..389490f100b0 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1585,25 +1585,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
>  {
>  	lockdep_assert_held(&hugetlb_lock);
>  
> -	/*
> -	 * Very subtle
> -	 *
> -	 * For non-gigantic pages set the destructor to the normal compound
> -	 * page dtor.  This is needed in case someone takes an additional
> -	 * temporary ref to the page, and freeing is delayed until they drop
> -	 * their reference.
> -	 *
> -	 * For gigantic pages set the destructor to the null dtor.  This
> -	 * destructor will never be called.  Before freeing the gigantic
> -	 * page destroy_compound_gigantic_folio will turn the folio into a
> -	 * simple group of pages.  After this the destructor does not
> -	 * apply.
> -	 *
> -	 */
> -	if (hstate_is_gigantic(h))
> -		folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
> -	else
> -		folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
> +	folio_clear_hugetlb(folio);
>  }
>  
>  /*
> @@ -1690,7 +1672,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
>  		h->surplus_huge_pages_node[nid]++;
>  	}
>  
> -	folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
> +	folio_set_hugetlb(folio);
>  	folio_change_private(folio, NULL);
>  	/*
>  	 * We have to set hugetlb_vmemmap_optimized again as above
> @@ -1814,9 +1796,8 @@ static void free_hpage_workfn(struct work_struct *work)
>  		/*
>  		 * The VM_BUG_ON_FOLIO(!folio_test_hugetlb(folio), folio) in
>  		 * folio_hstate() is going to trigger because a previous call to
> -		 * remove_hugetlb_folio() will call folio_set_compound_dtor
> -		 * (folio, NULL_COMPOUND_DTOR), so do not use folio_hstate()
> -		 * directly.
> +		 * remove_hugetlb_folio() will clear the hugetlb bit, so do
> +		 * not use folio_hstate() directly.
>  		 */
>  		h = size_to_hstate(page_size(page));
>  
> @@ -1955,7 +1936,7 @@ static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
>  {
>  	hugetlb_vmemmap_optimize(h, &folio->page);
>  	INIT_LIST_HEAD(&folio->lru);
> -	folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
> +	folio_set_hugetlb(folio);
>  	hugetlb_set_folio_subpool(folio, NULL);
>  	set_hugetlb_cgroup(folio, NULL);
>  	set_hugetlb_cgroup_rsvd(folio, NULL);
> @@ -2070,28 +2051,10 @@ int PageHuge(struct page *page)
>  	if (!PageCompound(page))
>  		return 0;
>  	folio = page_folio(page);
> -	return folio->_folio_dtor == HUGETLB_PAGE_DTOR;
> +	return folio_test_hugetlb(folio);
>  }
>  EXPORT_SYMBOL_GPL(PageHuge);
>  
> -/**
> - * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
> - * @folio: The folio to test.
> - *
> - * Context: Any context.  Caller should have a reference on the folio to
> - * prevent it from being turned into a tail page.
> - * Return: True for hugetlbfs folios, false for anon folios or folios
> - * belonging to other filesystems.
> - */
> -bool folio_test_hugetlb(struct folio *folio)
> -{
> -	if (!folio_test_large(folio))
> -		return false;
> -
> -	return folio->_folio_dtor == HUGETLB_PAGE_DTOR;
> -}
> -EXPORT_SYMBOL_GPL(folio_test_hugetlb);
> -
>  /*
>   * Find and lock address space (mapping) in write mode.
>   *
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9638fdddf065..f8e276de4fd5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1122,7 +1122,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
>  		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
>  
>  		if (compound)
> -			ClearPageHasHWPoisoned(page);
> +			page[1].flags &= ~PAGE_FLAGS_SECOND;
>  		for (i = 1; i < (1 << order); i++) {
>  			if (compound)
>  				bad += free_tail_page_prepare(page, page + i);



Thread overview: 26+ messages
2023-08-16 15:11 [PATCH v2 00/13] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 01/13] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 02/13] mm: Call free_huge_page() directly Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 03/13] mm: Convert free_huge_page() to free_huge_folio() Matthew Wilcox (Oracle)
2023-08-16 20:14   ` Sidhartha Kumar
2023-08-17  3:31   ` Yanteng Si
2023-08-16 15:11 ` [PATCH v2 04/13] mm: Convert free_transhuge_folio() to folio_undo_large_rmappable() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 05/13] mm; Convert prep_transhuge_page() to folio_prep_large_rmappable() Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 06/13] mm: Remove free_compound_page() and the compound_page_dtors array Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 07/13] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
2023-08-16 21:45   ` Sidhartha Kumar
2023-08-22  3:13   ` Mike Kravetz
2023-08-22  3:32     ` Matthew Wilcox
2023-08-22 17:19       ` Mike Kravetz
2023-12-08 17:54   ` Vlastimil Babka [this message]
2023-12-08 18:31     ` Matthew Wilcox
2023-12-13 12:23       ` David Hildenbrand
2023-08-16 15:11 ` [PATCH v2 08/13] mm: Add large_rmappable page flag Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 09/13] mm: Rearrange page flags Matthew Wilcox (Oracle)
2023-08-16 15:11 ` [PATCH v2 10/13] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
2023-08-22 23:17   ` Mike Kravetz
2023-08-23  0:29     ` Matthew Wilcox
2023-08-16 15:11 ` [PATCH v2 11/13] mm: Remove folio_test_transhuge() Matthew Wilcox (Oracle)
2023-08-16 15:12 ` [PATCH v2 12/13] mm: Add tail private fields to struct folio Matthew Wilcox (Oracle)
2023-08-16 15:21   ` David Hildenbrand
2023-08-16 15:12 ` [PATCH v2 13/13] mm: Convert split_huge_pages_pid() to use a folio Matthew Wilcox (Oracle)
