public inbox for [email protected]
* [PATCH 0/9] Remove _folio_dtor and _folio_order
@ 2023-08-15  3:26 Matthew Wilcox (Oracle)
  2023-08-15  3:26 ` [PATCH 1/9] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
                   ` (8 more replies)
  0 siblings, 9 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

This started out as "I should remove the indirect function call from
freeing compound pages" and turned into "I can free up some space in
struct folio".  So ... what am I bid for the new contents of this word
in the first tail page?

Matthew Wilcox (Oracle) (9):
  io_uring: Stop calling free_compound_page()
  mm: Call the hugetlb destructor directly
  mm: Call free_transhuge_folio() directly from destroy_large_folio()
  mm: Make free_compound_page() static
  mm: Remove free_compound_page()
  mm: Remove HUGETLB_PAGE_DTOR
  mm: Add deferred_list page flag
  mm: Rearrange page flags
  mm: Free up a word in the first tail page

 .../admin-guide/kdump/vmcoreinfo.rst          | 14 +---
 include/linux/huge_mm.h                       |  2 +-
 include/linux/hugetlb.h                       |  3 +-
 include/linux/mm.h                            | 39 ++--------
 include/linux/mm_types.h                      |  5 +-
 include/linux/page-flags.h                    | 41 +++++++---
 io_uring/io_uring.c                           |  6 +-
 io_uring/kbuf.c                               |  6 +-
 kernel/crash_core.c                           |  4 +-
 mm/huge_memory.c                              |  8 +-
 mm/hugetlb.c                                  | 75 +++++--------------
 mm/internal.h                                 |  3 +-
 mm/page_alloc.c                               | 35 +++------
 13 files changed, 81 insertions(+), 160 deletions(-)

-- 
2.40.1



* [PATCH 1/9] io_uring: Stop calling free_compound_page()
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:33   ` David Hildenbrand
  2023-08-15 15:00   ` Jens Axboe
  2023-08-15  3:26 ` [PATCH 2/9] mm: Call the hugetlb destructor directly Matthew Wilcox (Oracle)
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

folio_put() is the standard way to write this, and it's not
appreciably slower.  This is an enabling patch for removing
free_compound_page() entirely.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 io_uring/io_uring.c | 6 +-----
 io_uring/kbuf.c     | 6 +-----
 2 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index dcf5fc7d2820..a5b9b5de7aff 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2664,14 +2664,10 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
 
 static void io_mem_free(void *ptr)
 {
-	struct page *page;
-
 	if (!ptr)
 		return;
 
-	page = virt_to_head_page(ptr);
-	if (put_page_testzero(page))
-		free_compound_page(page);
+	folio_put(virt_to_folio(ptr));
 }
 
 static void io_pages_free(struct page ***pages, int npages)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 2f0181521c98..556f4df25b0f 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -218,11 +218,7 @@ static int __io_remove_buffers(struct io_ring_ctx *ctx,
 	if (bl->is_mapped) {
 		i = bl->buf_ring->tail - bl->head;
 		if (bl->is_mmap) {
-			struct page *page;
-
-			page = virt_to_head_page(bl->buf_ring);
-			if (put_page_testzero(page))
-				free_compound_page(page);
+			folio_put(virt_to_folio(bl->buf_ring));
 			bl->buf_ring = NULL;
 			bl->is_mmap = 0;
 		} else if (bl->buf_nr_pages) {
-- 
2.40.1


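For context, folio_put() already wraps exactly the put-and-free sequence that
io_mem_free() open-coded.  A simplified sketch of its definition (trimmed of
debug annotations; see include/linux/mm.h):

	static inline void folio_put(struct folio *folio)
	{
		if (folio_put_testzero(folio))
			__folio_put(folio);	/* handles compound and order-0 pages alike */
	}

So the conversion drops the explicit free_compound_page() call without
changing behaviour.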

* [PATCH 2/9] mm: Call the hugetlb destructor directly
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
  2023-08-15  3:26 ` [PATCH 1/9] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:36   ` David Hildenbrand
  2023-08-15  3:26 ` [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio() Matthew Wilcox (Oracle)
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

Indirect calls are expensive, thanks to Spectre.  Convert this one to
a direct call, and pass a folio instead of the head page to save a few
more instructions.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 include/linux/hugetlb.h |  3 ++-
 include/linux/mm.h      |  6 +-----
 mm/hugetlb.c            | 26 ++++++++++++--------------
 mm/page_alloc.c         |  8 +++++---
 4 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 0a393bc02f25..9555859537a3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -26,6 +26,8 @@ typedef struct { unsigned long pd; } hugepd_t;
 #define __hugepd(x) ((hugepd_t) { (x) })
 #endif
 
+void free_huge_page(struct folio *folio);
+
 #ifdef CONFIG_HUGETLB_PAGE
 
 #include <linux/mempolicy.h>
@@ -165,7 +167,6 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
 				bool *migratable_cleared);
 void folio_putback_active_hugetlb(struct folio *folio);
 void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
-void free_huge_page(struct page *page);
 void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 19493d6a2bb8..7fb529dbff31 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1278,13 +1278,9 @@ typedef void compound_page_dtor(struct page *);
 enum compound_dtor_id {
 	NULL_COMPOUND_DTOR,
 	COMPOUND_PAGE_DTOR,
-#ifdef CONFIG_HUGETLB_PAGE
 	HUGETLB_PAGE_DTOR,
-#endif
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	TRANSHUGE_PAGE_DTOR,
-#endif
-	NR_COMPOUND_DTORS,
+	NR_COMPOUND_DTORS
 };
 
 static inline void folio_set_compound_dtor(struct folio *folio,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e327a5a7602c..bc340f5dbbd4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1875,13 +1875,12 @@ struct hstate *size_to_hstate(unsigned long size)
 	return NULL;
 }
 
-void free_huge_page(struct page *page)
+void free_huge_page(struct folio *folio)
 {
 	/*
 	 * Can't pass hstate in here because it is called from the
 	 * compound page destructor.
 	 */
-	struct folio *folio = page_folio(page);
 	struct hstate *h = folio_hstate(folio);
 	int nid = folio_nid(folio);
 	struct hugepage_subpool *spool = hugetlb_folio_subpool(folio);
@@ -1936,7 +1935,7 @@ void free_huge_page(struct page *page)
 		spin_unlock_irqrestore(&hugetlb_lock, flags);
 		update_and_free_hugetlb_folio(h, folio, true);
 	} else {
-		arch_clear_hugepage_flags(page);
+		arch_clear_hugepage_flags(&folio->page);
 		enqueue_hugetlb_folio(h, folio);
 		spin_unlock_irqrestore(&hugetlb_lock, flags);
 	}
@@ -2246,7 +2245,7 @@ static int alloc_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 		folio = alloc_fresh_hugetlb_folio(h, gfp_mask, node,
 					nodes_allowed, node_alloc_noretry);
 		if (folio) {
-			free_huge_page(&folio->page); /* free it into the hugepage allocator */
+			free_huge_page(folio); /* free it into the hugepage allocator */
 			return 1;
 		}
 	}
@@ -2435,7 +2434,7 @@ static struct folio *alloc_surplus_hugetlb_folio(struct hstate *h,
 	if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
 		folio_set_hugetlb_temporary(folio);
 		spin_unlock_irq(&hugetlb_lock);
-		free_huge_page(&folio->page);
+		free_huge_page(folio);
 		return NULL;
 	}
 
@@ -2547,8 +2546,7 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	__must_hold(&hugetlb_lock)
 {
 	LIST_HEAD(surplus_list);
-	struct folio *folio;
-	struct page *page, *tmp;
+	struct folio *folio, *tmp;
 	int ret;
 	long i;
 	long needed, allocated;
@@ -2608,11 +2606,11 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	ret = 0;
 
 	/* Free the needed pages to the hugetlb pool */
-	list_for_each_entry_safe(page, tmp, &surplus_list, lru) {
+	list_for_each_entry_safe(folio, tmp, &surplus_list, lru) {
 		if ((--needed) < 0)
 			break;
 		/* Add the page to the hugetlb allocator */
-		enqueue_hugetlb_folio(h, page_folio(page));
+		enqueue_hugetlb_folio(h, folio);
 	}
 free:
 	spin_unlock_irq(&hugetlb_lock);
@@ -2621,8 +2619,8 @@ static int gather_surplus_pages(struct hstate *h, long delta)
 	 * Free unnecessary surplus pages to the buddy allocator.
 	 * Pages have no ref count, call free_huge_page directly.
 	 */
-	list_for_each_entry_safe(page, tmp, &surplus_list, lru)
-		free_huge_page(page);
+	list_for_each_entry_safe(folio, tmp, &surplus_list, lru)
+		free_huge_page(folio);
 	spin_lock_irq(&hugetlb_lock);
 
 	return ret;
@@ -3232,7 +3230,7 @@ static void __init gather_bootmem_prealloc(void)
 		if (prep_compound_gigantic_folio(folio, huge_page_order(h))) {
 			WARN_ON(folio_test_reserved(folio));
 			prep_new_hugetlb_folio(h, folio, folio_nid(folio));
-			free_huge_page(page); /* add to the hugepage allocator */
+			free_huge_page(folio); /* add to the hugepage allocator */
 		} else {
 			/* VERY unlikely inflated ref count on a tail page */
 			free_gigantic_folio(folio, huge_page_order(h));
@@ -3264,7 +3262,7 @@ static void __init hugetlb_hstate_alloc_pages_onenode(struct hstate *h, int nid)
 					&node_states[N_MEMORY], NULL);
 			if (!folio)
 				break;
-			free_huge_page(&folio->page); /* free it into the hugepage allocator */
+			free_huge_page(folio); /* free it into the hugepage allocator */
 		}
 		cond_resched();
 	}
@@ -3658,7 +3656,7 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
 			prep_compound_page(subpage, target_hstate->order);
 		folio_change_private(inner_folio, NULL);
 		prep_new_hugetlb_folio(target_hstate, inner_folio, nid);
-		free_huge_page(subpage);
+		free_huge_page(inner_folio);
 	}
 	mutex_unlock(&target_hstate->resize_lock);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fe9ff917850..1f67d4968590 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -287,9 +287,6 @@ const char * const migratetype_names[MIGRATE_TYPES] = {
 static compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
 	[NULL_COMPOUND_DTOR] = NULL,
 	[COMPOUND_PAGE_DTOR] = free_compound_page,
-#ifdef CONFIG_HUGETLB_PAGE
-	[HUGETLB_PAGE_DTOR] = free_huge_page,
-#endif
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	[TRANSHUGE_PAGE_DTOR] = free_transhuge_page,
 #endif
@@ -622,6 +619,11 @@ void destroy_large_folio(struct folio *folio)
 {
 	enum compound_dtor_id dtor = folio->_folio_dtor;
 
+	if (folio_test_hugetlb(folio)) {
+		free_huge_page(folio);
+		return;
+	}
+
 	VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
 	compound_page_dtors[dtor](&folio->page);
 }
-- 
2.40.1


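A note on the conversion idiom used throughout this diff: a folio overlays the
head page of a compound page, so converting between the two is essentially
free (a simplified sketch of the real macros):

	struct folio *folio = page_folio(page);	/* any page -> its folio */
	struct page *head = &folio->page;	/* folio -> head page */

This is also why gather_surplus_pages() can walk struct folio directly over
the same lru linkage the old struct page code used.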

* [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
  2023-08-15  3:26 ` [PATCH 1/9] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
  2023-08-15  3:26 ` [PATCH 2/9] mm: Call the hugetlb destructor directly Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  6:13   ` kernel test robot
                     ` (2 more replies)
  2023-08-15  3:26 ` [PATCH 4/9] mm: Make free_compound_page() static Matthew Wilcox (Oracle)
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

Indirect calls are expensive, thanks to Spectre.  Convert this one to
a direct call, and pass a folio instead of the head page for type safety.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 include/linux/huge_mm.h | 2 +-
 mm/huge_memory.c        | 5 ++---
 mm/page_alloc.c         | 8 +++++---
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 20284387b841..24aee49a581a 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -144,7 +144,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
 		unsigned long len, unsigned long pgoff, unsigned long flags);
 
 void prep_transhuge_page(struct page *page);
-void free_transhuge_page(struct page *page);
+void free_transhuge_folio(struct folio *folio);
 
 bool can_split_folio(struct folio *folio, int *pextra_pins);
 int split_huge_page_to_list(struct page *page, struct list_head *list);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8480728fa220..516fe3c26ef3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2779,9 +2779,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 	return ret;
 }
 
-void free_transhuge_page(struct page *page)
+void free_transhuge_folio(struct folio *folio)
 {
-	struct folio *folio = (struct folio *)page;
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 	unsigned long flags;
 
@@ -2798,7 +2797,7 @@ void free_transhuge_page(struct page *page)
 		}
 		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	}
-	free_compound_page(page);
+	free_compound_page(&folio->page);
 }
 
 void deferred_split_folio(struct folio *folio)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1f67d4968590..feb2e95cf021 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -287,9 +287,6 @@ const char * const migratetype_names[MIGRATE_TYPES] = {
 static compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
 	[NULL_COMPOUND_DTOR] = NULL,
 	[COMPOUND_PAGE_DTOR] = free_compound_page,
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	[TRANSHUGE_PAGE_DTOR] = free_transhuge_page,
-#endif
 };
 
 int min_free_kbytes = 1024;
@@ -624,6 +621,11 @@ void destroy_large_folio(struct folio *folio)
 		return;
 	}
 
+	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR) {
+		free_transhuge_folio(folio);
+		return;
+	}
+
 	VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
 	compound_page_dtors[dtor](&folio->page);
 }
-- 
2.40.1


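Why destroy_large_folio() has to check both conditions here: at this point in
the series, folio_test_transhuge() reduces to roughly the following (a
simplified sketch of the page-flags.h definition):

	static inline bool folio_test_transhuge(struct folio *folio)
	{
		return folio_test_head(folio);	/* any large folio passes */
	}

so the dtor comparison is what actually distinguishes a THP that set up a
deferred split queue entry.  David Hildenbrand's reply further down the
thread picks up on exactly this point.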

* [PATCH 4/9] mm: Make free_compound_page() static
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
                   ` (2 preceding siblings ...)
  2023-08-15  3:26 ` [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio() Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:47   ` David Hildenbrand
  2023-08-15  3:26 ` [PATCH 5/9] mm: Remove free_compound_page() Matthew Wilcox (Oracle)
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

free_compound_page() is the only remaining dynamic destructor.
Call it unconditionally from destroy_large_folio() and convert it
to take a folio.  It used to be the last thing called from
free_transhuge_folio(), and the call from destroy_large_folio()
will take care of that case too.

This was the last entry in the compound_page_dtors array, so delete it
and reword the comments that referred to it.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 include/linux/mm.h | 11 +----------
 mm/huge_memory.c   |  1 -
 mm/page_alloc.c    | 23 +++++++----------------
 3 files changed, 8 insertions(+), 27 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 7fb529dbff31..cf6707a7069e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1267,14 +1267,7 @@ void folio_copy(struct folio *dst, struct folio *src);
 
 unsigned long nr_free_buffer_pages(void);
 
-/*
- * Compound pages have a destructor function.  Provide a
- * prototype for that function and accessor functions.
- * These are _only_ valid on the head of a compound page.
- */
-typedef void compound_page_dtor(struct page *);
-
-/* Keep the enum in sync with compound_page_dtors array in mm/page_alloc.c */
+/* Compound pages may have a special destructor */
 enum compound_dtor_id {
 	NULL_COMPOUND_DTOR,
 	COMPOUND_PAGE_DTOR,
@@ -1325,8 +1318,6 @@ static inline unsigned long thp_size(struct page *page)
 	return PAGE_SIZE << thp_order(page);
 }
 
-void free_compound_page(struct page *page);
-
 #ifdef CONFIG_MMU
 /*
  * Do pte_mkwrite, but only if the vma says VM_WRITE.  We do this when
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 516fe3c26ef3..99e36ad540c4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2797,7 +2797,6 @@ void free_transhuge_folio(struct folio *folio)
 		}
 		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 	}
-	free_compound_page(&folio->page);
 }
 
 void deferred_split_folio(struct folio *folio)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index feb2e95cf021..804982faba4e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -284,11 +284,6 @@ const char * const migratetype_names[MIGRATE_TYPES] = {
 #endif
 };
 
-static compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
-	[NULL_COMPOUND_DTOR] = NULL,
-	[COMPOUND_PAGE_DTOR] = free_compound_page,
-};
-
 int min_free_kbytes = 1024;
 int user_min_free_kbytes = -1;
 static int watermark_boost_factor __read_mostly = 15000;
@@ -587,17 +582,17 @@ static inline void free_the_page(struct page *page, unsigned int order)
  * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
  * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
  *
- * The first tail page's ->compound_dtor holds the offset in array of compound
- * page destructors. See compound_page_dtors.
+ * The first tail page's ->compound_dtor describes how to destroy the
+ * compound page.
  *
  * The first tail page's ->compound_order holds the order of allocation.
  * This usage means that zero-order pages may not be compound.
  */
 
-void free_compound_page(struct page *page)
+static void free_compound_page(struct folio *folio)
 {
-	mem_cgroup_uncharge(page_folio(page));
-	free_the_page(page, compound_order(page));
+	mem_cgroup_uncharge(folio);
+	free_the_page(&folio->page, folio_order(folio));
 }
 
 void prep_compound_page(struct page *page, unsigned int order)
@@ -621,13 +616,9 @@ void destroy_large_folio(struct folio *folio)
 		return;
 	}
 
-	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR) {
+	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
 		free_transhuge_folio(folio);
-		return;
-	}
-
-	VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
-	compound_page_dtors[dtor](&folio->page);
+	free_compound_page(folio);
 }
 
 static inline void set_buddy_order(struct page *page, unsigned int order)
-- 
2.40.1



* [PATCH 5/9] mm: Remove free_compound_page()
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
                   ` (3 preceding siblings ...)
  2023-08-15  3:26 ` [PATCH 4/9] mm: Make free_compound_page() static Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:48   ` David Hildenbrand
  2023-08-15  3:26 ` [PATCH 6/9] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

Inline it into its one caller.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 mm/page_alloc.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 804982faba4e..21af71aea6eb 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -589,12 +589,6 @@ static inline void free_the_page(struct page *page, unsigned int order)
  * This usage means that zero-order pages may not be compound.
  */
 
-static void free_compound_page(struct folio *folio)
-{
-	mem_cgroup_uncharge(folio);
-	free_the_page(&folio->page, folio_order(folio));
-}
-
 void prep_compound_page(struct page *page, unsigned int order)
 {
 	int i;
@@ -618,7 +612,8 @@ void destroy_large_folio(struct folio *folio)
 
 	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
 		free_transhuge_folio(folio);
-	free_compound_page(folio);
+	mem_cgroup_uncharge(folio);
+	free_the_page(&folio->page, folio_order(folio));
 }
 
 static inline void set_buddy_order(struct page *page, unsigned int order)
-- 
2.40.1



* [PATCH 6/9] mm: Remove HUGETLB_PAGE_DTOR
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
                   ` (4 preceding siblings ...)
  2023-08-15  3:26 ` [PATCH 5/9] mm: Remove free_compound_page() Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:50   ` David Hildenbrand
  2023-08-15  3:26 ` [PATCH 7/9] mm: Add deferred_list page flag Matthew Wilcox (Oracle)
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

We can use a bit in page[1].flags to indicate that this folio belongs
to hugetlb instead of using a value in page[1].compound_dtor.  That lets
folio_test_hugetlb() become an inline function like it should be.
We can also get rid of NULL_COMPOUND_DTOR.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 .../admin-guide/kdump/vmcoreinfo.rst          | 10 +---
 include/linux/mm.h                            |  2 -
 include/linux/page-flags.h                    | 21 +++++++-
 kernel/crash_core.c                           |  2 +-
 mm/hugetlb.c                                  | 49 +++----------------
 5 files changed, 29 insertions(+), 55 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index c18d94fa6470..baa1c355741d 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -325,8 +325,8 @@ NR_FREE_PAGES
 On linux-2.6.21 or later, the number of free pages is in
 vm_stat[NR_FREE_PAGES]. Used to get the number of free pages.
 
-PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask
-------------------------------------------------------------------------------
+PG_lru|PG_private|PG_swapcache|PG_swapbacked|PG_slab|PG_hwpoision|PG_head_mask|PG_hugetlb
+-----------------------------------------------------------------------------------------
 
 Page attributes. These flags are used to filter various unnecessary for
 dumping pages.
@@ -338,12 +338,6 @@ More page attributes. These flags are used to filter various unnecessary for
 dumping pages.
 
 
-HUGETLB_PAGE_DTOR
------------------
-
-The HUGETLB_PAGE_DTOR flag denotes hugetlbfs pages. Makedumpfile
-excludes these pages.
-
 x86_64
 ======
 
diff --git a/include/linux/mm.h b/include/linux/mm.h
index cf6707a7069e..c8c8b1fd64d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1269,9 +1269,7 @@ unsigned long nr_free_buffer_pages(void);
 
 /* Compound pages may have a special destructor */
 enum compound_dtor_id {
-	NULL_COMPOUND_DTOR,
 	COMPOUND_PAGE_DTOR,
-	HUGETLB_PAGE_DTOR,
 	TRANSHUGE_PAGE_DTOR,
 	NR_COMPOUND_DTORS
 };
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 92a2063a0a23..01d98695e79f 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -180,6 +180,9 @@ enum pageflags {
 	PG_has_hwpoisoned = PG_error,
 #endif
 
+	/* Is a hugetlb page.  Stored in first tail page. */
+	PG_hugetlb = PG_writeback,
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
@@ -812,7 +815,23 @@ static inline void ClearPageCompound(struct page *page)
 
 #ifdef CONFIG_HUGETLB_PAGE
 int PageHuge(struct page *page);
-bool folio_test_hugetlb(struct folio *folio);
+SETPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
+CLEARPAGEFLAG(HugeTLB, hugetlb, PF_SECOND)
+
+/**
+ * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
+ * @folio: The folio to test.
+ *
+ * Context: Any context.  Caller should have a reference on the folio to
+ * prevent it from being turned into a tail page.
+ * Return: True for hugetlbfs folios, false for anon folios or folios
+ * belonging to other filesystems.
+ */
+static inline bool folio_test_hugetlb(struct folio *folio)
+{
+	return folio_test_large(folio) &&
+		test_bit(PG_hugetlb, folio_flags(folio, 1));
+}
 #else
 TESTPAGEFLAG_FALSE(Huge, hugetlb)
 #endif
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 90ce1dfd591c..dd5f87047d06 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -490,7 +490,7 @@ static int __init crash_save_vmcoreinfo_init(void)
 #define PAGE_BUDDY_MAPCOUNT_VALUE	(~PG_buddy)
 	VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
 #ifdef CONFIG_HUGETLB_PAGE
-	VMCOREINFO_NUMBER(HUGETLB_PAGE_DTOR);
+	VMCOREINFO_NUMBER(PG_hugetlb);
 #define PAGE_OFFLINE_MAPCOUNT_VALUE	(~PG_offline)
 	VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
 #endif
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc340f5dbbd4..a1cebcee6503 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1585,25 +1585,7 @@ static inline void __clear_hugetlb_destructor(struct hstate *h,
 {
 	lockdep_assert_held(&hugetlb_lock);
 
-	/*
-	 * Very subtle
-	 *
-	 * For non-gigantic pages set the destructor to the normal compound
-	 * page dtor.  This is needed in case someone takes an additional
-	 * temporary ref to the page, and freeing is delayed until they drop
-	 * their reference.
-	 *
-	 * For gigantic pages set the destructor to the null dtor.  This
-	 * destructor will never be called.  Before freeing the gigantic
-	 * page destroy_compound_gigantic_folio will turn the folio into a
-	 * simple group of pages.  After this the destructor does not
-	 * apply.
-	 *
-	 */
-	if (hstate_is_gigantic(h))
-		folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
-	else
-		folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
+	folio_clear_hugetlb(folio);
 }
 
 /*
@@ -1690,7 +1672,7 @@ static void add_hugetlb_folio(struct hstate *h, struct folio *folio,
 		h->surplus_huge_pages_node[nid]++;
 	}
 
-	folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
+	folio_set_hugetlb(folio);
 	folio_change_private(folio, NULL);
 	/*
 	 * We have to set hugetlb_vmemmap_optimized again as above
@@ -1814,9 +1796,8 @@ static void free_hpage_workfn(struct work_struct *work)
 		/*
 		 * The VM_BUG_ON_FOLIO(!folio_test_hugetlb(folio), folio) in
 		 * folio_hstate() is going to trigger because a previous call to
-		 * remove_hugetlb_folio() will call folio_set_compound_dtor
-		 * (folio, NULL_COMPOUND_DTOR), so do not use folio_hstate()
-		 * directly.
+		 * remove_hugetlb_folio() will clear the hugetlb bit, so do
+		 * not use folio_hstate() directly.
 		 */
 		h = size_to_hstate(page_size(page));
 
@@ -1955,7 +1936,7 @@ static void __prep_new_hugetlb_folio(struct hstate *h, struct folio *folio)
 {
 	hugetlb_vmemmap_optimize(h, &folio->page);
 	INIT_LIST_HEAD(&folio->lru);
-	folio_set_compound_dtor(folio, HUGETLB_PAGE_DTOR);
+	folio_set_hugetlb(folio);
 	hugetlb_set_folio_subpool(folio, NULL);
 	set_hugetlb_cgroup(folio, NULL);
 	set_hugetlb_cgroup_rsvd(folio, NULL);
@@ -2070,28 +2051,10 @@ int PageHuge(struct page *page)
 	if (!PageCompound(page))
 		return 0;
 	folio = page_folio(page);
-	return folio->_folio_dtor == HUGETLB_PAGE_DTOR;
+	return folio_test_hugetlb(folio);
 }
 EXPORT_SYMBOL_GPL(PageHuge);
 
-/**
- * folio_test_hugetlb - Determine if the folio belongs to hugetlbfs
- * @folio: The folio to test.
- *
- * Context: Any context.  Caller should have a reference on the folio to
- * prevent it from being turned into a tail page.
- * Return: True for hugetlbfs folios, false for anon folios or folios
- * belonging to other filesystems.
- */
-bool folio_test_hugetlb(struct folio *folio)
-{
-	if (!folio_test_large(folio))
-		return false;
-
-	return folio->_folio_dtor == HUGETLB_PAGE_DTOR;
-}
-EXPORT_SYMBOL_GPL(folio_test_hugetlb);
-
 /*
  * Find and lock address space (mapping) in write mode.
  *
-- 
2.40.1


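One subtlety worth spelling out: folio_flags(folio, 1) reaches into page[1]
of the memmap, so for an order-0 folio it would read the flags of an
unrelated neighbouring struct page, where the aliased bit (PG_hugetlb shares
its value with PG_writeback) may well be set.  The folio_test_large() guard
in the new inline therefore has to come first (illustrative comment, not
extra kernel code):

	/* Guard before touching page[1]: an order-0 folio has no tail
	 * pages, and a neighbouring page's PG_writeback bit would read
	 * as a false PG_hugetlb. */
	return folio_test_large(folio) &&
		test_bit(PG_hugetlb, folio_flags(folio, 1));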

* [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
                   ` (5 preceding siblings ...)
  2023-08-15  3:26 ` [PATCH 6/9] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:54   ` David Hildenbrand
  2023-08-15  3:26 ` [PATCH 8/9] mm: Rearrange page flags Matthew Wilcox (Oracle)
  2023-08-15  3:26 ` [PATCH 9/9] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
  8 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

Stored in the first tail page's flags, this flag replaces the destructor.
That removes the last of the destructors, so remove all references to
folio_dtor and compound_dtor.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 Documentation/admin-guide/kdump/vmcoreinfo.rst |  4 ++--
 include/linux/mm.h                             | 14 --------------
 include/linux/mm_types.h                       |  2 --
 include/linux/page-flags.h                     |  6 ++++++
 kernel/crash_core.c                            |  1 -
 mm/huge_memory.c                               |  4 ++--
 mm/internal.h                                  |  1 -
 mm/page_alloc.c                                |  7 +------
 8 files changed, 11 insertions(+), 28 deletions(-)

diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
index baa1c355741d..3bd38ac0e7de 100644
--- a/Documentation/admin-guide/kdump/vmcoreinfo.rst
+++ b/Documentation/admin-guide/kdump/vmcoreinfo.rst
@@ -141,8 +141,8 @@ nodemask_t
 The size of a nodemask_t type. Used to compute the number of online
 nodes.
 
-(page, flags|_refcount|mapping|lru|_mapcount|private|compound_dtor|compound_order|compound_head)
--------------------------------------------------------------------------------------------------
+(page, flags|_refcount|mapping|lru|_mapcount|private|compound_order|compound_head)
+----------------------------------------------------------------------------------
 
 User-space tools compute their values based on the offset of these
 variables. The variables are used when excluding unnecessary pages.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c8c8b1fd64d3..cf0ae8c51d7f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1267,20 +1267,6 @@ void folio_copy(struct folio *dst, struct folio *src);
 
 unsigned long nr_free_buffer_pages(void);
 
-/* Compound pages may have a special destructor */
-enum compound_dtor_id {
-	COMPOUND_PAGE_DTOR,
-	TRANSHUGE_PAGE_DTOR,
-	NR_COMPOUND_DTORS
-};
-
-static inline void folio_set_compound_dtor(struct folio *folio,
-		enum compound_dtor_id compound_dtor)
-{
-	VM_BUG_ON_FOLIO(compound_dtor >= NR_COMPOUND_DTORS, folio);
-	folio->_folio_dtor = compound_dtor;
-}
-
 void destroy_large_folio(struct folio *folio);
 
 /* Returns the number of bytes in this potentially compound page. */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index da538ff68953..d45a2b8041e0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -282,7 +282,6 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
  * @_refcount: Do not access this member directly.  Use folio_ref_count()
  *    to find how many references there are to this folio.
  * @memcg_data: Memory Control Group data.
- * @_folio_dtor: Which destructor to use for this folio.
  * @_folio_order: Do not use directly, call folio_order().
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
  * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
@@ -336,7 +335,6 @@ struct folio {
 			unsigned long _flags_1;
 			unsigned long _head_1;
 	/* public: */
-			unsigned char _folio_dtor;
 			unsigned char _folio_order;
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 01d98695e79f..aabf50dc71a3 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -183,6 +183,9 @@ enum pageflags {
 	/* Is a hugetlb page.  Stored in first tail page. */
 	PG_hugetlb = PG_writeback,
 
+	/* Has a deferred list (may be empty).  First tail page. */
+	PG_deferred_list = PG_reclaim,
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
@@ -809,6 +812,9 @@ static inline void ClearPageCompound(struct page *page)
 	BUG_ON(!PageHead(page));
 	ClearPageHead(page);
 }
+PAGEFLAG(DeferredList, deferred_list, PF_SECOND)
+#else
+TESTPAGEFLAG_FALSE(DeferredList, deferred_list)
 #endif
 
 #define PG_head_mask ((1UL << PG_head))
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index dd5f87047d06..934dd86e19f5 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -455,7 +455,6 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(page, lru);
 	VMCOREINFO_OFFSET(page, _mapcount);
 	VMCOREINFO_OFFSET(page, private);
-	VMCOREINFO_OFFSET(folio, _folio_dtor);
 	VMCOREINFO_OFFSET(folio, _folio_order);
 	VMCOREINFO_OFFSET(page, compound_head);
 	VMCOREINFO_OFFSET(pglist_data, node_zones);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 99e36ad540c4..3b5db99eb7ac 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -583,7 +583,7 @@ void prep_transhuge_page(struct page *page)
 
 	VM_BUG_ON_FOLIO(folio_order(folio) < 2, folio);
 	INIT_LIST_HEAD(&folio->_deferred_list);
-	folio_set_compound_dtor(folio, TRANSHUGE_PAGE_DTOR);
+	folio_set_deferred_list(folio);
 }
 
 static inline bool is_transparent_hugepage(struct page *page)
@@ -595,7 +595,7 @@ static inline bool is_transparent_hugepage(struct page *page)
 
 	folio = page_folio(page);
 	return is_huge_zero_page(&folio->page) ||
-	       folio->_folio_dtor == TRANSHUGE_PAGE_DTOR;
+		folio_test_deferred_list(folio);
 }
 
 static unsigned long __thp_get_unmapped_area(struct file *filp,
diff --git a/mm/internal.h b/mm/internal.h
index 5a03bc4782a2..e3d11119b04e 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -417,7 +417,6 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 {
 	struct folio *folio = (struct folio *)page;
 
-	folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
 	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 21af71aea6eb..9fe9209605a5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -582,9 +582,6 @@ static inline void free_the_page(struct page *page, unsigned int order)
  * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
  * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
  *
- * The first tail page's ->compound_dtor describes how to destroy the
- * compound page.
- *
  * The first tail page's ->compound_order holds the order of allocation.
  * This usage means that zero-order pages may not be compound.
  */
@@ -603,14 +600,12 @@ void prep_compound_page(struct page *page, unsigned int order)
 
 void destroy_large_folio(struct folio *folio)
 {
-	enum compound_dtor_id dtor = folio->_folio_dtor;
-
 	if (folio_test_hugetlb(folio)) {
 		free_huge_page(folio);
 		return;
 	}
 
-	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
+	if (folio_test_deferred_list(folio))
 		free_transhuge_folio(folio);
 	mem_cgroup_uncharge(folio);
 	free_the_page(&folio->page, folio_order(folio));
-- 
2.40.1


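Note that PG_deferred_list reuses PG_reclaim, which PG_isolated also aliases.
The two uses cannot collide because they are interpreted on different pages
of the memmap (a simplified sketch of the access paths):

	folio_set_deferred_list(folio);	/* PF_SECOND: bit lives in page[1].flags */
	SetPageIsolated(page);		/* non-LRU movable page: bit in its own flags */

PG_reclaim proper marks a page being written back for reclaim, a state a
first tail page is never in, so the meanings stay disjoint.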

* [PATCH 8/9] mm: Rearrange page flags
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
                   ` (6 preceding siblings ...)
  2023-08-15  3:26 ` [PATCH 7/9] mm: Add deferred_list page flag Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  4:30   ` Yosry Ahmed
  2023-08-15 19:24   ` Peter Xu
  2023-08-15  3:26 ` [PATCH 9/9] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

Move PG_writeback into bottom byte so that it can use PG_waiters in a
later patch.  Move PG_head into bottom byte as well to match with where
'order' is moving next.  PG_active and PG_workingset move into the second
byte to make room for them.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 include/linux/page-flags.h | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index aabf50dc71a3..6a0dd94b2460 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -99,13 +99,15 @@
  */
 enum pageflags {
 	PG_locked,		/* Page is locked. Don't touch. */
+	PG_writeback,		/* Page is under writeback */
 	PG_referenced,
 	PG_uptodate,
 	PG_dirty,
 	PG_lru,
+	PG_head,		/* Must be in bit 6 */
+	PG_waiters,		/* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
 	PG_active,
 	PG_workingset,
-	PG_waiters,		/* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
 	PG_error,
 	PG_slab,
 	PG_owner_priv_1,	/* Owner use. If pagecache, fs may use*/
@@ -113,8 +115,6 @@ enum pageflags {
 	PG_reserved,
 	PG_private,		/* If pagecache, has fs-private data */
 	PG_private_2,		/* If pagecache, has fs aux data */
-	PG_writeback,		/* Page is under writeback */
-	PG_head,		/* A head page */
 	PG_mappedtodisk,	/* Has blocks allocated on-disk */
 	PG_reclaim,		/* To be reclaimed asap */
 	PG_swapbacked,		/* Page is backed by RAM/swap */
@@ -171,21 +171,19 @@ enum pageflags {
 	/* Remapped by swiotlb-xen. */
 	PG_xen_remapped = PG_owner_priv_1,
 
-#ifdef CONFIG_MEMORY_FAILURE
 	/*
-	 * Compound pages. Stored in first tail page's flags.
-	 * Indicates that at least one subpage is hwpoisoned in the
-	 * THP.
+	 * Flags only valid for compound pages.  Stored in first tail page's
+	 * flags word.
 	 */
-	PG_has_hwpoisoned = PG_error,
-#endif
-
-	/* Is a hugetlb page.  Stored in first tail page. */
-	PG_hugetlb = PG_writeback,
 
-	/* Has a deferred list (may be empty).  First tail page. */
+	/* At least one page is hwpoisoned in the folio.  */
+	PG_has_hwpoisoned = PG_error,
+	/* Belongs to hugetlb */
+	PG_hugetlb = PG_active,
+	/* Has a deferred list (does not indicate whether it is active) */
 	PG_deferred_list = PG_reclaim,
 
+
 	/* non-lru isolated movable page */
 	PG_isolated = PG_reclaim,
 
-- 
2.40.1


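For reference, the bottom byte after this rearrangement works out to (derived
from the enum in the diff above):

	bit 0: PG_locked
	bit 1: PG_writeback
	bit 2: PG_referenced
	bit 3: PG_uptodate
	bit 4: PG_dirty
	bit 5: PG_lru
	bit 6: PG_head
	bit 7: PG_waiters

Every flag in this byte is only meaningful on a head (or order-0) page, which
is what lets the next patch reuse the same byte of the first tail page's
flags word to store the order.  Placing PG_head and PG_waiters at bits 6 and
7 presumably also means a stored order, which is always far below 64, can
never make a first tail page appear to have either bit set.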

* [PATCH 9/9] mm: Free up a word in the first tail page
  2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
                   ` (7 preceding siblings ...)
  2023-08-15  3:26 ` [PATCH 8/9] mm: Rearrange page flags Matthew Wilcox (Oracle)
@ 2023-08-15  3:26 ` Matthew Wilcox (Oracle)
  2023-08-15  7:59   ` David Hildenbrand
  2023-08-15 19:21   ` Peter Xu
  8 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-08-15  3:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), Jens Axboe, io-uring, linux-mm

Store the folio order in the low byte of the flags word in the first
tail page.  This frees up the word that was being used to store the
order and dtor bytes previously.

Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
---
 include/linux/mm.h       | 10 +++++-----
 include/linux/mm_types.h |  3 +--
 kernel/crash_core.c      |  1 -
 mm/internal.h            |  2 +-
 mm/page_alloc.c          |  4 +++-
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cf0ae8c51d7f..85568e2b2556 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1028,7 +1028,7 @@ struct inode;
  * compound_order() can be called without holding a reference, which means
  * that niceties like page_folio() don't work.  These callers should be
  * prepared to handle wild return values.  For example, PG_head may be
- * set before _folio_order is initialised, or this may be a tail page.
+ * set before the order is initialised, or this may be a tail page.
  * See compaction.c for some good examples.
  */
 static inline unsigned int compound_order(struct page *page)
@@ -1037,7 +1037,7 @@ static inline unsigned int compound_order(struct page *page)
 
 	if (!test_bit(PG_head, &folio->flags))
 		return 0;
-	return folio->_folio_order;
+	return folio->_flags_1 & 0xff;
 }
 
 /**
@@ -1053,7 +1053,7 @@ static inline unsigned int folio_order(struct folio *folio)
 {
 	if (!folio_test_large(folio))
 		return 0;
-	return folio->_folio_order;
+	return folio->_flags_1 & 0xff;
 }
 
 #include <linux/huge_mm.h>
@@ -2025,7 +2025,7 @@ static inline long folio_nr_pages(struct folio *folio)
 #ifdef CONFIG_64BIT
 	return folio->_folio_nr_pages;
 #else
-	return 1L << folio->_folio_order;
+	return 1L << (folio->_flags_1 & 0xff);
 #endif
 }
 
@@ -2043,7 +2043,7 @@ static inline unsigned long compound_nr(struct page *page)
 #ifdef CONFIG_64BIT
 	return folio->_folio_nr_pages;
 #else
-	return 1L << folio->_folio_order;
+	return 1L << (folio->_flags_1 & 0xff);
 #endif
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d45a2b8041e0..659c7b84726c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -282,7 +282,6 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
  * @_refcount: Do not access this member directly.  Use folio_ref_count()
  *    to find how many references there are to this folio.
  * @memcg_data: Memory Control Group data.
- * @_folio_order: Do not use directly, call folio_order().
  * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
  * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
  * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
@@ -334,8 +333,8 @@ struct folio {
 		struct {
 			unsigned long _flags_1;
 			unsigned long _head_1;
+			unsigned long _folio_avail;
 	/* public: */
-			unsigned char _folio_order;
 			atomic_t _entire_mapcount;
 			atomic_t _nr_pages_mapped;
 			atomic_t _pincount;
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 934dd86e19f5..693445e1f7f6 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -455,7 +455,6 @@ static int __init crash_save_vmcoreinfo_init(void)
 	VMCOREINFO_OFFSET(page, lru);
 	VMCOREINFO_OFFSET(page, _mapcount);
 	VMCOREINFO_OFFSET(page, private);
-	VMCOREINFO_OFFSET(folio, _folio_order);
 	VMCOREINFO_OFFSET(page, compound_head);
 	VMCOREINFO_OFFSET(pglist_data, node_zones);
 	VMCOREINFO_OFFSET(pglist_data, nr_zones);
diff --git a/mm/internal.h b/mm/internal.h
index e3d11119b04e..c415260c1f06 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -407,7 +407,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
 	if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
 		return;
 
-	folio->_folio_order = order;
+	folio->_flags_1 = (folio->_flags_1 & ~0xffUL) | order;
 #ifdef CONFIG_64BIT
 	folio->_folio_nr_pages = 1U << order;
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9fe9209605a5..0e0e0d18a81b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1115,8 +1115,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
 
 		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
 
-		if (compound)
+		if (compound) {
 			ClearPageHasHWPoisoned(page);
+			page[1].flags &= ~0xffUL;
+		}
 		for (i = 1; i < (1 << order); i++) {
 			if (compound)
 				bad += free_tail_page_prepare(page, page + i);
-- 
2.40.1


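A worked example of the new encoding (a sketch using this patch's accessors):
for a PMD-sized THP of order 9 on a 64-bit kernel,

	folio_set_order(folio, 9);
	/* folio->_flags_1 & 0xff  == 9    (order in the low byte) */
	/* folio->_folio_nr_pages  == 512  (1 << 9)                */
	/* folio_order(folio) == 9, folio_nr_pages(folio) == 512   */

and the new "page[1].flags &= ~0xffUL" in free_pages_prepare() zeroes that
byte again when the compound page is freed, so stale order bits cannot be
mistaken for page flags later.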

* Re: [PATCH 8/9] mm: Rearrange page flags
  2023-08-15  3:26 ` [PATCH 8/9] mm: Rearrange page flags Matthew Wilcox (Oracle)
@ 2023-08-15  4:30   ` Yosry Ahmed
  2023-08-15 19:24   ` Peter Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Yosry Ahmed @ 2023-08-15  4:30 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Mon, Aug 14, 2023 at 8:27 PM Matthew Wilcox (Oracle)
<[email protected]> wrote:
>
> Move PG_writeback into bottom byte so that it can use PG_waiters in a
> later patch.  Move PG_head into bottom byte as well to match with where
> 'order' is moving next.

Sorry if that's obvious, but why do we want to move PG_head to where
'order' is moving? I thought we would want to avoid an overlap to
avoid mistaking the first tail page of a high-order compound page as a
head page?


* Re: [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
  2023-08-15  3:26 ` [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio() Matthew Wilcox (Oracle)
@ 2023-08-15  6:13   ` kernel test robot
  2023-08-15  7:40   ` David Hildenbrand
  2023-08-15  8:09   ` kernel test robot
  2 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2023-08-15  6:13 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: oe-kbuild-all, Linux Memory Management List,
	Matthew Wilcox (Oracle), Jens Axboe, io-uring

Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc6 next-20230809]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/io_uring-Stop-calling-free_compound_page/20230815-112847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230815032645.1393700-4-willy%40infradead.org
patch subject: [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
config: m68k-randconfig-r006-20230815 (https://download.01.org/0day-ci/archive/20230815/[email protected]/config)
compiler: m68k-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230815/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

   mm/page_alloc.c: In function 'destroy_large_folio':
>> mm/page_alloc.c:615:17: error: implicit declaration of function 'free_transhuge_folio' [-Werror=implicit-function-declaration]
     615 |                 free_transhuge_folio(folio);
         |                 ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/free_transhuge_folio +615 mm/page_alloc.c

   604	
   605	void destroy_large_folio(struct folio *folio)
   606	{
   607		enum compound_dtor_id dtor = folio->_folio_dtor;
   608	
   609		if (folio_test_hugetlb(folio)) {
   610			free_huge_page(folio);
   611			return;
   612		}
   613	
   614		if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR) {
 > 615			free_transhuge_folio(folio);
   616			return;
   617		}
   618	
   619		VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
   620		compound_page_dtors[dtor](&folio->page);
   621	}
   622	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

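The failing configuration lacks CONFIG_TRANSPARENT_HUGEPAGE: huge_mm.h only
declares free_transhuge_folio() inside that ifdef, while the new call in
destroy_large_folio() is compiled unconditionally.  One conventional fix (an
illustrative assumption, not a patch taken from this thread) would be a
no-op stub in huge_mm.h:

	#ifndef CONFIG_TRANSPARENT_HUGEPAGE
	static inline void free_transhuge_folio(struct folio *folio)
	{
	}
	#endif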

* Re: [PATCH 1/9] io_uring: Stop calling free_compound_page()
  2023-08-15  3:26 ` [PATCH 1/9] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
@ 2023-08-15  7:33   ` David Hildenbrand
  2023-08-15 15:00   ` Jens Axboe
  1 sibling, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:33 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> folio_put() is the standard way to write this, and it's not
> appreciably slower.  This is an enabling patch for removing
> free_compound_page() entirely.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---


Reviewed-by: David Hildenbrand <[email protected]>

-- 
Cheers,

David / dhildenb



* Re: [PATCH 2/9] mm: Call the hugetlb destructor directly
  2023-08-15  3:26 ` [PATCH 2/9] mm: Call the hugetlb destructor directly Matthew Wilcox (Oracle)
@ 2023-08-15  7:36   ` David Hildenbrand
  0 siblings, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:36 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> Indirect calls are expensive, thanks to Spectre.  Convert this one to
> a direct call, and pass a folio instead of the head page to save a few
> more instructions.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
>   include/linux/hugetlb.h |  3 ++-
>   include/linux/mm.h      |  6 +-----
>   mm/hugetlb.c            | 26 ++++++++++++--------------
>   mm/page_alloc.c         |  8 +++++---
>   4 files changed, 20 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 0a393bc02f25..9555859537a3 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -26,6 +26,8 @@ typedef struct { unsigned long pd; } hugepd_t;
>   #define __hugepd(x) ((hugepd_t) { (x) })
>   #endif
>   
> +void free_huge_page(struct folio *folio);
> +
>   #ifdef CONFIG_HUGETLB_PAGE
>   
>   #include <linux/mempolicy.h>
> @@ -165,7 +167,6 @@ int get_huge_page_for_hwpoison(unsigned long pfn, int flags,
>   				bool *migratable_cleared);
>   void folio_putback_active_hugetlb(struct folio *folio);
>   void move_hugetlb_state(struct folio *old_folio, struct folio *new_folio, int reason);
> -void free_huge_page(struct page *page);
>   void hugetlb_fix_reserve_counts(struct inode *inode);
>   extern struct mutex *hugetlb_fault_mutex_table;
>   u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 19493d6a2bb8..7fb529dbff31 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1278,13 +1278,9 @@ typedef void compound_page_dtor(struct page *);
>   enum compound_dtor_id {
>   	NULL_COMPOUND_DTOR,
>   	COMPOUND_PAGE_DTOR,
> -#ifdef CONFIG_HUGETLB_PAGE
>   	HUGETLB_PAGE_DTOR,
> -#endif
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>   	TRANSHUGE_PAGE_DTOR,
> -#endif
> -	NR_COMPOUND_DTORS,
> +	NR_COMPOUND_DTORS
>   };
>   
>   static inline void folio_set_compound_dtor(struct folio *folio,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index e327a5a7602c..bc340f5dbbd4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1875,13 +1875,12 @@ struct hstate *size_to_hstate(unsigned long size)
>   	return NULL;
>   }
>   
> -void free_huge_page(struct page *page)
> +void free_huge_page(struct folio *folio)

free_huge_page" but passing a folio, hm. Maybe something like 
"free_hugetlb_folio" would be better.


Apart from that LGTM.

-- 
Cheers,

David / dhildenb



* Re: [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
  2023-08-15  3:26 ` [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio() Matthew Wilcox (Oracle)
  2023-08-15  6:13   ` kernel test robot
@ 2023-08-15  7:40   ` David Hildenbrand
  2023-08-15 14:06     ` Matthew Wilcox
  2023-08-15  8:09   ` kernel test robot
  2 siblings, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:40 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> Indirect calls are expensive, thanks to Spectre.  Convert this one to
> a direct call, and pass a folio instead of the head page for type safety.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
>   include/linux/huge_mm.h | 2 +-
>   mm/huge_memory.c        | 5 ++---
>   mm/page_alloc.c         | 8 +++++---
>   3 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 20284387b841..24aee49a581a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -144,7 +144,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
>   		unsigned long len, unsigned long pgoff, unsigned long flags);
>   
>   void prep_transhuge_page(struct page *page);
> -void free_transhuge_page(struct page *page);
> +void free_transhuge_folio(struct folio *folio);
>   
>   bool can_split_folio(struct folio *folio, int *pextra_pins);
>   int split_huge_page_to_list(struct page *page, struct list_head *list);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 8480728fa220..516fe3c26ef3 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2779,9 +2779,8 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
>   	return ret;
>   }
>   
> -void free_transhuge_page(struct page *page)
> +void free_transhuge_folio(struct folio *folio)
>   {
> -	struct folio *folio = (struct folio *)page;
>   	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
>   	unsigned long flags;
>   
> @@ -2798,7 +2797,7 @@ void free_transhuge_page(struct page *page)
>   		}
>   		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
>   	}
> -	free_compound_page(page);
> +	free_compound_page(&folio->page);
>   }
>   
>   void deferred_split_folio(struct folio *folio)
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 1f67d4968590..feb2e95cf021 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -287,9 +287,6 @@ const char * const migratetype_names[MIGRATE_TYPES] = {
>   static compound_page_dtor * const compound_page_dtors[NR_COMPOUND_DTORS] = {
>   	[NULL_COMPOUND_DTOR] = NULL,
>   	[COMPOUND_PAGE_DTOR] = free_compound_page,
> -#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	[TRANSHUGE_PAGE_DTOR] = free_transhuge_page,
> -#endif
>   };
>   
>   int min_free_kbytes = 1024;
> @@ -624,6 +621,11 @@ void destroy_large_folio(struct folio *folio)
>   		return;
>   	}
>   
> +	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR) {
> +		free_transhuge_folio(folio);

I really wonder if folio_test_transhuge() should be written similarly to
folio_test_hugetlb() instead, such that the dtor check is implicit.

Any good reasons not to do that?

-- 
Cheers,

David / dhildenb


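What David suggests would look roughly like this (a hypothetical sketch
mirroring the folio_test_hugetlb() shape from patch 6/9, not code from this
thread):

	static inline bool folio_test_transhuge(struct folio *folio)
	{
		return folio_test_large(folio) &&
			folio->_folio_dtor == TRANSHUGE_PAGE_DTOR;
	}

which would let destroy_large_folio() drop the explicit dtor comparison.
Patches 7-9 resolve the question differently by replacing the dtor with the
PG_deferred_list flag.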

* Re: [PATCH 4/9] mm: Make free_compound_page() static
  2023-08-15  3:26 ` [PATCH 4/9] mm: Make free_compound_page() static Matthew Wilcox (Oracle)
@ 2023-08-15  7:47   ` David Hildenbrand
  2023-08-15  7:48     ` David Hildenbrand
  0 siblings, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:47 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> free_compound_page() is the only remaining dynamic destructor.
> Call it unconditionally from destroy_large_folio() and convert it
> to take a folio.  It used to be the last thing called from
> free_transhuge_folio(), and the call from destroy_large_folio()
> will take care of that case too.
> 
> This was the last entry in the compound_page_dtors array, so delete it
> and reword the comments that referred to it.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---

[...]

>   int min_free_kbytes = 1024;
>   int user_min_free_kbytes = -1;
>   static int watermark_boost_factor __read_mostly = 15000;
> @@ -587,17 +582,17 @@ static inline void free_the_page(struct page *page, unsigned int order)
>    * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
>    * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
>    *
> - * The first tail page's ->compound_dtor holds the offset in array of compound
> - * page destructors. See compound_page_dtors.
> + * The first tail page's ->compound_dtor describes how to destroy the
> + * compound page.
>    *
>    * The first tail page's ->compound_order holds the order of allocation.
>    * This usage means that zero-order pages may not be compound.
>    */
>   
> -void free_compound_page(struct page *page)
> +static void free_compound_page(struct folio *folio)

"free_folio", maybe? Doesn't seem to be taken yet.

Acked-by: David Hildenbrand <[email protected]>

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4/9] mm: Make free_compound_page() static
  2023-08-15  7:47   ` David Hildenbrand
@ 2023-08-15  7:48     ` David Hildenbrand
  0 siblings, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 09:47, David Hildenbrand wrote:
> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
>> free_compound_page() is the only remaining dynamic destructor.
>> Call it unconditionally from destroy_large_folio() and convert it
>> to take a folio.  It used to be the last thing called from
>> free_transhuge_folio(), and the call from destroy_large_folio()
>> will take care of that case too.
>>
>> This was the last entry in the compound_page_dtors array, so delete it
>> and reword the comments that referred to it.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
>> ---
> 
> [...]
> 
>>    int min_free_kbytes = 1024;
>>    int user_min_free_kbytes = -1;
>>    static int watermark_boost_factor __read_mostly = 15000;
>> @@ -587,17 +582,17 @@ static inline void free_the_page(struct page *page, unsigned int order)
>>     * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
>>     * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
>>     *
>> - * The first tail page's ->compound_dtor holds the offset in array of compound
>> - * page destructors. See compound_page_dtors.
>> + * The first tail page's ->compound_dtor describes how to destroy the
>> + * compound page.
>>     *
>>     * The first tail page's ->compound_order holds the order of allocation.
>>     * This usage means that zero-order pages may not be compound.
>>     */
>>    
>> -void free_compound_page(struct page *page)
>> +static void free_compound_page(struct folio *folio)
> 
> "free_folio", maybe? Doesn't seem to be taken yet.

Ah, I see it gets removed in the following patch.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5/9] mm: Remove free_compound_page()
  2023-08-15  3:26 ` [PATCH 5/9] mm: Remove free_compound_page() Matthew Wilcox (Oracle)
@ 2023-08-15  7:48   ` David Hildenbrand
  0 siblings, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:48 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> Inline it into its one caller.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
>   mm/page_alloc.c | 9 ++-------
>   1 file changed, 2 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 804982faba4e..21af71aea6eb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -589,12 +589,6 @@ static inline void free_the_page(struct page *page, unsigned int order)
>    * This usage means that zero-order pages may not be compound.
>    */
>   
> -static void free_compound_page(struct folio *folio)
> -{
> -	mem_cgroup_uncharge(folio);
> -	free_the_page(&folio->page, folio_order(folio));
> -}
> -
>   void prep_compound_page(struct page *page, unsigned int order)
>   {
>   	int i;
> @@ -618,7 +612,8 @@ void destroy_large_folio(struct folio *folio)
>   
>   	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
>   		free_transhuge_folio(folio);
> -	free_compound_page(folio);
> +	mem_cgroup_uncharge(folio);
> +	free_the_page(&folio->page, folio_order(folio));
>   }
>   
>   static inline void set_buddy_order(struct page *page, unsigned int order)

I'd squash that into the previous commit.

Acked-by: David Hildenbrand <[email protected]>

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 6/9] mm: Remove HUGETLB_PAGE_DTOR
  2023-08-15  3:26 ` [PATCH 6/9] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
@ 2023-08-15  7:50   ` David Hildenbrand
  0 siblings, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:50 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> We can use a bit in page[1].flags to indicate that this folio belongs
> to hugetlb instead of using a value in page[1].dtors.  That lets
> folio_test_hugetlb() become an inline function like it should be.
> We can also get rid of NULL_COMPOUND_DTOR.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
>   .../admin-guide/kdump/vmcoreinfo.rst          | 10 +---
>   include/linux/mm.h                            |  2 -
>   include/linux/page-flags.h                    | 21 +++++++-
>   kernel/crash_core.c                           |  2 +-
>   mm/hugetlb.c                                  | 49 +++----------------
>   5 files changed, 29 insertions(+), 55 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kdump/vmcoreinfo.rst b/Documentation/admin-guide/kdump/vmcoreinfo.rst
> index c18d94fa6470..baa1c355741d 100644
> --- a/Documentation/admin-guide/kdump/vmcoreinfo.rst


Acked-by: David Hildenbrand <[email protected]>

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15  3:26 ` [PATCH 7/9] mm: Add deferred_list page flag Matthew Wilcox (Oracle)
@ 2023-08-15  7:54   ` David Hildenbrand
  2023-08-15 15:32     ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:54 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> Stored in the first tail page's flags, this flag replaces the destructor.
> That removes the last of the destructors, so remove all references to
> folio_dtor and compound_dtor.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---

[...]

>   
> +	/* Has a deferred list (may be empty).  First tail page. */
> +	PG_deferred_list = PG_reclaim,
> +

If PG_deferred_list implies thp (and replaces the thp dtor), should we 
rather name this PG_thp or something along those lines?

[...]

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 21af71aea6eb..9fe9209605a5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -582,9 +582,6 @@ static inline void free_the_page(struct page *page, unsigned int order)
>    * The remaining PAGE_SIZE pages are called "tail pages". PageTail() is encoded
>    * in bit 0 of page->compound_head. The rest of bits is pointer to head page.
>    *
> - * The first tail page's ->compound_dtor describes how to destroy the
> - * compound page.
> - *
>    * The first tail page's ->compound_order holds the order of allocation.
>    * This usage means that zero-order pages may not be compound.
>    */
> @@ -603,14 +600,12 @@ void prep_compound_page(struct page *page, unsigned int order)
>   
>   void destroy_large_folio(struct folio *folio)
>   {
> -	enum compound_dtor_id dtor = folio->_folio_dtor;
> -
>   	if (folio_test_hugetlb(folio)) {
>   		free_huge_page(folio);
>   		return;
>   	}
>   
> -	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR)
> +	if (folio_test_deferred_list(folio))

Similar question as before: why not have folio_test_transhuge() perform 
this check internally?

The sequence of

	if (folio_test_deferred_list(folio))
		free_transhuge_folio(folio);

Looks less clear than what we had before.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] mm: Free up a word in the first tail page
  2023-08-15  3:26 ` [PATCH 9/9] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
@ 2023-08-15  7:59   ` David Hildenbrand
  2023-08-15 11:39     ` Matthew Wilcox
  2023-08-15 19:21   ` Peter Xu
  1 sibling, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15  7:59 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: Jens Axboe, io-uring, linux-mm

On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> Store the folio order in the low byte of the flags word in the first
> tail page.  This frees up the word that was being used to store the
> order and dtor bytes previously.
> 

Is there still a free flag in page[1] after this change? I need one, at
least for a prototype I'm working on. (could fall back to page[2]
eventually, though)

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
  2023-08-15  3:26 ` [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio() Matthew Wilcox (Oracle)
  2023-08-15  6:13   ` kernel test robot
  2023-08-15  7:40   ` David Hildenbrand
@ 2023-08-15  8:09   ` kernel test robot
  2 siblings, 0 replies; 42+ messages in thread
From: kernel test robot @ 2023-08-15  8:09 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton
  Cc: llvm, oe-kbuild-all, Linux Memory Management List,
	Matthew Wilcox (Oracle), Jens Axboe, io-uring

Hi Matthew,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc6 next-20230809]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/io_uring-Stop-calling-free_compound_page/20230815-112847
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20230815032645.1393700-4-willy%40infradead.org
patch subject: [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
config: arm-randconfig-r046-20230815 (https://download.01.org/0day-ci/archive/20230815/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230815/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> mm/page_alloc.c:615:3: error: call to undeclared function 'free_transhuge_folio'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     615 |                 free_transhuge_folio(folio);
         |                 ^
   1 error generated.


vim +/free_transhuge_folio +615 mm/page_alloc.c

   604	
   605	void destroy_large_folio(struct folio *folio)
   606	{
   607		enum compound_dtor_id dtor = folio->_folio_dtor;
   608	
   609		if (folio_test_hugetlb(folio)) {
   610			free_huge_page(folio);
   611			return;
   612		}
   613	
   614		if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR) {
 > 615			free_transhuge_folio(folio);
   616			return;
   617		}
   618	
   619		VM_BUG_ON_FOLIO(dtor >= NR_COMPOUND_DTORS, folio);
   620		compound_page_dtors[dtor](&folio->page);
   621	}
   622	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] mm: Free up a word in the first tail page
  2023-08-15  7:59   ` David Hildenbrand
@ 2023-08-15 11:39     ` Matthew Wilcox
  0 siblings, 0 replies; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 11:39 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 09:59:08AM +0200, David Hildenbrand wrote:
> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> > Store the folio order in the low byte of the flags word in the first
> > tail page.  This frees up the word that was being used to store the
> > order and dtor bytes previously.
> > 
> 
> Is there still a free flag in page[1] after this change? I need one, at
> least for a prototype I'm working on. (could fall back to page[2]
> eventually, though)

There are only ~13 flags used in page[1] at this point.  Plenty of
space.
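
(Claiming one would be the same aliasing trick as PG_deferred_list --
hypothetical sketch, with PG_my_feature as a placeholder name:

	/* Owned by my feature.  First tail page only. */
	PG_my_feature = PG_workingset,

i.e. alias any primary flag that carries no meaning on tail pages.)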

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio()
  2023-08-15  7:40   ` David Hildenbrand
@ 2023-08-15 14:06     ` Matthew Wilcox
  0 siblings, 0 replies; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 14:06 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 09:40:58AM +0200, David Hildenbrand wrote:
> > @@ -624,6 +621,11 @@ void destroy_large_folio(struct folio *folio)
> >   		return;
> >   	}
> > +	if (folio_test_transhuge(folio) && dtor == TRANSHUGE_PAGE_DTOR) {
> > +		free_transhuge_folio(folio);
> 
> I really wonder if folio_test_transhuge() should be written similarly
> to folio_test_hugetlb() instead, such that the dtor check is implicit.
> 
> Any good reasons not to do that?

Actually, we should probably delete folio_test_transhuge().  I'll tack
a patch onto the end of this series to do that.  I want to avoid a
reference to free_transhuge_folio() if !CONFIG_TRANSPARENT_HUGEPAGE and
folio_test_transhuge() accomplishes that by way of TESTPAGEFLAG_FALSE
in page-flags.  But so does folio_test_deferred_list() which is what
we're getting to by the end of this series.

So I'll leave this patch alone for now (other than fixing up the
buildbot errors).
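
(For the record, the build fix only needs the prototype to be visible
unconditionally -- a sketch, assuming the declaration simply moves out
of huge_mm.h's CONFIG_TRANSPARENT_HUGEPAGE section:

	void free_transhuge_folio(struct folio *folio);

The compiler has to see a declaration even for dead code, but since
folio_test_transhuge() is a constant false without THP, the guarded
call is eliminated and no reference to the symbol is ever emitted.)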

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] io_uring: Stop calling free_compound_page()
  2023-08-15  3:26 ` [PATCH 1/9] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
  2023-08-15  7:33   ` David Hildenbrand
@ 2023-08-15 15:00   ` Jens Axboe
  2023-08-15 15:36     ` Matthew Wilcox
  1 sibling, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2023-08-15 15:00 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: io-uring, linux-mm

On 8/14/23 9:26 PM, Matthew Wilcox (Oracle) wrote:
> folio_put() is the standard way to write this, and it's not
> appreciably slower.  This is an enabling patch for removing
> free_compound_page() entirely.

Looks fine to me, nice cleanup too. Please use 72-74 char line breaks
though in the commit message, this looks like ~60? With that fixed:

Reviewed-by: Jens Axboe <[email protected]>

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15  7:54   ` David Hildenbrand
@ 2023-08-15 15:32     ` Matthew Wilcox
  2023-08-15 16:40       ` David Hildenbrand
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 15:32 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> > Stored in the first tail page's flags, this flag replaces the destructor.
> > That removes the last of the destructors, so remove all references to
> > folio_dtor and compound_dtor.
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > ---
> 
> [...]
> 
> > +	/* Has a deferred list (may be empty).  First tail page. */
> > +	PG_deferred_list = PG_reclaim,
> > +
> 
> If PG_deferred_list implies thp (and replaces the thp dtor), should we
> rather name this PG_thp or something along those lines?

We're trying to use 'thp' to mean 'a folio which is pmd mappable',
so I'd rather not call it that.

> The sequence of
> 
> 	if (folio_test_deferred_list(folio))
> 		free_transhuge_folio(folio);
> 
> Looks less clear than what we had before.

I can rename that.  How about

	if (folio_test_deferred_list(folio))
		folio_remove_deferred(folio);


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 1/9] io_uring: Stop calling free_compound_page()
  2023-08-15 15:00   ` Jens Axboe
@ 2023-08-15 15:36     ` Matthew Wilcox
  0 siblings, 0 replies; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 15:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, io-uring, linux-mm

On Tue, Aug 15, 2023 at 09:00:24AM -0600, Jens Axboe wrote:
> On 8/14/23 9:26 PM, Matthew Wilcox (Oracle) wrote:
> > folio_put() is the standard way to write this, and it's not
> > appreciably slower.  This is an enabling patch for removing
> > free_compound_page() entirely.
> 
> Looks fine to me, nice cleanup too. Please use 72-74 char line breaks
> though in the commit message, this looks like ~60? With that fixed:

Eh, I just pipe it through fmt, and that's what it did.  Probably
it has rules about widows/orphans?

> Reviewed-by: Jens Axboe <[email protected]>
> 
> -- 
> Jens Axboe
> 

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15 15:32     ` Matthew Wilcox
@ 2023-08-15 16:40       ` David Hildenbrand
  2023-08-15 17:06         ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15 16:40 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On 15.08.23 17:32, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
>> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
>>> Stored in the first tail page's flags, this flag replaces the destructor.
>>> That removes the last of the destructors, so remove all references to
>>> folio_dtor and compound_dtor.
>>>
>>> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
>>> ---
>>
>> [...]
>>
>>> +	/* Has a deferred list (may be empty).  First tail page. */
>>> +	PG_deferred_list = PG_reclaim,
>>> +
>>
>> If PG_deferred_list implies thp (and replaces the thp dtor), should we
>> rather name this PG_thp or something along those lines?
> 
> We're trying to use 'thp' to mean 'a folio which is pmd mappable',
> so I'd rather not call it that.

There is no conclusion on that.

And I am not sure if inventing new terminology will help anybody (both
users and developers). Just call the old thing "PMD-sized THP".

After all, the deferred split queue is just an implementation detail,
and it happens to live in tail page 2, no?

Once we end up initializing something else in
prep_transhuge_page(), it would turn out pretty confusing if that is
called folio_remove_deferred() ...

In the end, I don't care as long as it doesn't add confusion; this did. 
We most probably won't reach a conclusion here and that shouldn't block 
this patch set.

So at least prep_transhuge_page() should not be renamed to 
folio_remove_deferred() imho ...

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15 16:40       ` David Hildenbrand
@ 2023-08-15 17:06         ` Matthew Wilcox
  2023-08-15 17:27           ` David Hildenbrand
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 17:06 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 06:40:55PM +0200, David Hildenbrand wrote:
> On 15.08.23 17:32, Matthew Wilcox wrote:
> > On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
> > > On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
> > > > Stored in the first tail page's flags, this flag replaces the destructor.
> > > > That removes the last of the destructors, so remove all references to
> > > > folio_dtor and compound_dtor.
> > > > 
> > > > Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> > > > ---
> > > 
> > > [...]
> > > 
> > > > +	/* Has a deferred list (may be empty).  First tail page. */
> > > > +	PG_deferred_list = PG_reclaim,
> > > > +
> > > 
> > > If PG_deferred_list implies thp (and replaces the thp dtor), should we
> > > rather name this PG_thp or something along those lines?
> > 
> > We're trying to use 'thp' to mean 'a folio which is pmd mappable',
> > so I'd rather not call it that.
> 
> There is no conclusion on that.

There are a lot of counters called THP and TransHuge and other variants
which are exposed to userspace, and the (user) assumption is that this counts
PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
you'll find them.  If we have folio_test_thp(), people will write:

	if (folio_test_thp(folio))
		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);

instead of using folio_test_pmd_mappable().

> After all, the deferred split queue is just an implementation detail, and it
> happens to live in tail page 2, no?
> 
> Once we end up initializing something else in prep_transhuge_page(),
> it would turn out pretty confusing if that is called folio_remove_deferred()
> ...

Perhaps the key difference between normal compound pages and file/anon
compound pages is that the latter are splittable?  So we can name all
of this:

	folio_init_splittable()
	folio_test_splittable()
	folio_fini_splittable()

Maybe that's still too close to an implementation detail, but it's at
least talking about _a_ characteristic of the folio, even if it's not
the _only_ characteristic of the folio.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15 17:06         ` Matthew Wilcox
@ 2023-08-15 17:27           ` David Hildenbrand
  2023-08-15 19:58             ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-15 17:27 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On 15.08.23 19:06, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 06:40:55PM +0200, David Hildenbrand wrote:
>> On 15.08.23 17:32, Matthew Wilcox wrote:
>>> On Tue, Aug 15, 2023 at 09:54:36AM +0200, David Hildenbrand wrote:
>>>> On 15.08.23 05:26, Matthew Wilcox (Oracle) wrote:
>>>>> Stored in the first tail page's flags, this flag replaces the destructor.
>>>>> That removes the last of the destructors, so remove all references to
>>>>> folio_dtor and compound_dtor.
>>>>>
>>>>> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
>>>>> ---
>>>>
>>>> [...]
>>>>
>>>>> +	/* Has a deferred list (may be empty).  First tail page. */
>>>>> +	PG_deferred_list = PG_reclaim,
>>>>> +
>>>>
>>>> If PG_deferred_list implies thp (and replaces the thp dtor), should we
>>>> rather name this PG_thp or something along those lines?
>>>
>>> We're trying to use 'thp' to mean 'a folio which is pmd mappable',
>>> so I'd rather not call it that.
>>
>> There is no conclusion on that.
> 
> There are a lot of counters called THP and TransHuge and other variants
> which are exposed to userspace, and the (user) assumption is that this counts
> PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
> you'll find them.  If we have folio_test_thp(), people will write:
> 
> 	if (folio_test_thp(folio))
> 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
> 
> instead of using folio_test_pmd_mappable().
> 


So if we *really* don't want to use THP to express that we have a page, 
then let's see what these pages are:
* can be mapped to user space
* are transparent to most MM-related systemcalls by (un) mapping
   them in system page size (PTEs)

That we can split these pages (not PTE-map, but convert from large folio 
to small folios) is one characteristic, but IMHO not the main one (and 
maybe not even required at all!).

Maybe we can come up with a better term for "THP, but not necessarily 
PMD-sized".

"Large folio" is IMHO bad. A hugetlb page is a large folio and not all 
large folios can be mapped to user space.

"Transparent large folios" ? Better IMHO.


>> After all, the deferred split queue is just an implementation detail, and it
>> happens to live in tail page 2, no?
>>
>> Once we end up initializing something else in prep_transhuge_page(),
>> it would turn out pretty confusing if that is called folio_remove_deferred()
>> ...
> 
> Perhaps the key difference between normal compound pages and file/anon
> compound pages is that the latter are splittable?  So we can name all
> of this:
> 
> 	folio_init_splittable()
> 	folio_test_splittable()
> 	folio_fini_splittable()
> 
> Maybe that's still too close to an implementation detail, but it's at
> least talking about _a_ characteristic of the folio, even if it's not
> the _only_ characteristic of the folio.

Maybe folio_init_transparent() ... avoiding the "huge" part of it.

Very open for alternatives. As expressed in other context, we really 
should figure this out soon.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 9/9] mm: Free up a word in the first tail page
  2023-08-15  3:26 ` [PATCH 9/9] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
  2023-08-15  7:59   ` David Hildenbrand
@ 2023-08-15 19:21   ` Peter Xu
  1 sibling, 0 replies; 42+ messages in thread
From: Peter Xu @ 2023-08-15 19:21 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 04:26:45AM +0100, Matthew Wilcox (Oracle) wrote:
> Store the folio order in the low byte of the flags word in the first
> tail page.  This frees up the word that was being used to store the
> order and dtor bytes previously.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
>  include/linux/mm.h       | 10 +++++-----
>  include/linux/mm_types.h |  3 +--
>  kernel/crash_core.c      |  1 -
>  mm/internal.h            |  2 +-
>  mm/page_alloc.c          |  4 +++-
>  5 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cf0ae8c51d7f..85568e2b2556 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1028,7 +1028,7 @@ struct inode;
>   * compound_order() can be called without holding a reference, which means
>   * that niceties like page_folio() don't work.  These callers should be
>   * prepared to handle wild return values.  For example, PG_head may be
> - * set before _folio_order is initialised, or this may be a tail page.
> + * set before the order is initialised, or this may be a tail page.
>   * See compaction.c for some good examples.
>   */
>  static inline unsigned int compound_order(struct page *page)
> @@ -1037,7 +1037,7 @@ static inline unsigned int compound_order(struct page *page)
>  
>  	if (!test_bit(PG_head, &folio->flags))
>  		return 0;
> -	return folio->_folio_order;
> +	return folio->_flags_1 & 0xff;
>  }
>  
>  /**
> @@ -1053,7 +1053,7 @@ static inline unsigned int folio_order(struct folio *folio)
>  {
>  	if (!folio_test_large(folio))
>  		return 0;
> -	return folio->_folio_order;
> +	return folio->_flags_1 & 0xff;
>  }
>  
>  #include <linux/huge_mm.h>
> @@ -2025,7 +2025,7 @@ static inline long folio_nr_pages(struct folio *folio)
>  #ifdef CONFIG_64BIT
>  	return folio->_folio_nr_pages;
>  #else
> -	return 1L << folio->_folio_order;
> +	return 1L << (folio->_flags_1 & 0xff);
>  #endif
>  }
>  
> @@ -2043,7 +2043,7 @@ static inline unsigned long compound_nr(struct page *page)
>  #ifdef CONFIG_64BIT
>  	return folio->_folio_nr_pages;
>  #else
> -	return 1L << folio->_folio_order;
> +	return 1L << (folio->_flags_1 & 0xff);
>  #endif
>  }
>  
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index d45a2b8041e0..659c7b84726c 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -282,7 +282,6 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
>   * @_refcount: Do not access this member directly.  Use folio_ref_count()
>   *    to find how many references there are to this folio.
>   * @memcg_data: Memory Control Group data.
> - * @_folio_order: Do not use directly, call folio_order().
>   * @_entire_mapcount: Do not use directly, call folio_entire_mapcount().
>   * @_nr_pages_mapped: Do not use directly, call folio_mapcount().
>   * @_pincount: Do not use directly, call folio_maybe_dma_pinned().
> @@ -334,8 +333,8 @@ struct folio {
>  		struct {
>  			unsigned long _flags_1;
>  			unsigned long _head_1;
> +			unsigned long _folio_avail;

This can just be dropped?  Having this single field as "avail" is weird,
without mentioning the rest, IMHO.

We can have a separate patch to resolve what's available: either you can
leave that to my series, or, if you dislike that, you can propose what
you replied to my cover letter but add all the available bits.

>  	/* public: */
> -			unsigned char _folio_order;
>  			atomic_t _entire_mapcount;
>  			atomic_t _nr_pages_mapped;
>  			atomic_t _pincount;
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index 934dd86e19f5..693445e1f7f6 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -455,7 +455,6 @@ static int __init crash_save_vmcoreinfo_init(void)
>  	VMCOREINFO_OFFSET(page, lru);
>  	VMCOREINFO_OFFSET(page, _mapcount);
>  	VMCOREINFO_OFFSET(page, private);
> -	VMCOREINFO_OFFSET(folio, _folio_order);
>  	VMCOREINFO_OFFSET(page, compound_head);
>  	VMCOREINFO_OFFSET(pglist_data, node_zones);
>  	VMCOREINFO_OFFSET(pglist_data, nr_zones);
> diff --git a/mm/internal.h b/mm/internal.h
> index e3d11119b04e..c415260c1f06 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -407,7 +407,7 @@ static inline void folio_set_order(struct folio *folio, unsigned int order)
>  	if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
>  		return;
>  
> -	folio->_folio_order = order;
> +	folio->_flags_1 = (folio->_flags_1 & ~0xffUL) | order;
>  #ifdef CONFIG_64BIT
>  	folio->_folio_nr_pages = 1U << order;
>  #endif
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9fe9209605a5..0e0e0d18a81b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1115,8 +1115,10 @@ static __always_inline bool free_pages_prepare(struct page *page,
>  
>  		VM_BUG_ON_PAGE(compound && compound_order(page) != order, page);
>  
> -		if (compound)
> +		if (compound) {
>  			ClearPageHasHWPoisoned(page);
> +			page[1].flags &= ~0xffUL;

Could we hide the hard-coded 0xff in some way?

One easy way would be using a macro with a bunch of helpers, like
folio_set|get|clear_order().

The other way is that maybe we can also define _flags_1 as an enum,
where we just move the compound_order field over to offset 0?  But I'm
not sure what that would look like in the end.
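
E.g. something like this (rough sketch; the mask name is made up):

	#define FOLIO_ORDER_MASK	0xffUL

	static inline unsigned int folio_get_order(struct folio *folio)
	{
		return folio->_flags_1 & FOLIO_ORDER_MASK;
	}

	static inline void folio_clear_order(struct folio *folio)
	{
		folio->_flags_1 &= ~FOLIO_ORDER_MASK;
	}

so that the 0xff literal only lives in one place.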

Thanks,

> +		}
>  		for (i = 1; i < (1 << order); i++) {
>  			if (compound)
>  				bad += free_tail_page_prepare(page, page + i);
> -- 
> 2.40.1
> 
> 

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/9] mm: Rearrange page flags
  2023-08-15  3:26 ` [PATCH 8/9] mm: Rearrange page flags Matthew Wilcox (Oracle)
  2023-08-15  4:30   ` Yosry Ahmed
@ 2023-08-15 19:24   ` Peter Xu
  2023-08-15 20:07     ` Matthew Wilcox
  1 sibling, 1 reply; 42+ messages in thread
From: Peter Xu @ 2023-08-15 19:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 04:26:44AM +0100, Matthew Wilcox (Oracle) wrote:
> Move PG_writeback into bottom byte so that it can use PG_waiters in a
> later patch.  Move PG_head into bottom byte as well to match with where
> 'order' is moving next.  PG_active and PG_workingset move into the second
> byte to make room for them.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
> ---
>  include/linux/page-flags.h | 24 +++++++++++-------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index aabf50dc71a3..6a0dd94b2460 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -99,13 +99,15 @@
>   */
>  enum pageflags {
>  	PG_locked,		/* Page is locked. Don't touch. */
> +	PG_writeback,		/* Page is under writeback */
>  	PG_referenced,
>  	PG_uptodate,
>  	PG_dirty,
>  	PG_lru,
> +	PG_head,		/* Must be in bit 6 */

Could there be some explanation on "must be in bit 6" here?

Thanks,

-- 
Peter Xu


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15 17:27           ` David Hildenbrand
@ 2023-08-15 19:58             ` Matthew Wilcox
  2023-08-16  3:14               ` Matthew Wilcox
  2023-08-16  9:55               ` David Hildenbrand
  0 siblings, 2 replies; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 19:58 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
> On 15.08.23 19:06, Matthew Wilcox wrote:
> > There are a lot of counters called THP and TransHuge and other variants
> > which are exposed to userspace, and the (user) assumption is that this counts
> > PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
> > you'll find them.  If we have folio_test_thp(), people will write:
> > 
> > 	if (folio_test_thp(folio))
> > 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
> > 
> > instead of using folio_test_pmd_mappable().
> 
> So if we *really* don't want to use THP to express that we have a page, then
> let's see what these pages are:
> * can be mapped to user space
> * are transparent to most MM-related systemcalls by (un) mapping
>   them in system page size (PTEs)

 * Are managed on the LRU
 * Can be dirtied, written back

> That we can split these pages (not PTE-map, but convert from large folio to
> small folios) is one characteristic, but IMHO not the main one (and maybe
> not even required at all!).

It's the one which distinguishes them from, say, compound pages used for
slab.  Or used by device drivers.  Or net pagepool, or vmalloc.  There's
a lot of compound allocations out there, and the only ones which need
special treatment here are the ones which are splittable.

> Maybe we can come up with a better term for "THP, but not necessarily
> PMD-sized".
> 
> "Large folio" is IMHO bad. A hugetlb page is a large folio and not all large
> folios can be mapped to user space.
> 
> "Transparent large folios" ? Better IMHO.

I think this goes back to Johannes' point many months ago that we need
separate names for some things.  He wants to split anon & file memory
apart (who gets to keep the name "folio" in the divorce?  what do we
name the type that encompasses both folios and the other one?  or do
they both get different names?)

> > Perhaps the key difference between normal compound pages and file/anon
> > compound pages is that the latter are splittable?  So we can name all
> > of this:
> > 
> > 	folio_init_splittable()
> > 	folio_test_splittable()
> > 	folio_fini_splittable()
> > 
> > Maybe that's still too close to an implementation detail, but it's at
> > least talking about _a_ characteristic of the folio, even if it's not
> > the _only_ characteristic of the folio.
> 
> Maybe folio_init_transparent() ... avoiding the "huge" part of it.
> 
> Very open for alternatives. As expressed in other context, we really should
> figure this out soon.

Yeah, I'm open to better naming too.  At this point in the flow we're
trying to distinguish between compound pages used for slab and compound
pages used for anon/file, but that's not always going to be the case
elsewhere.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/9] mm: Rearrange page flags
  2023-08-15 19:24   ` Peter Xu
@ 2023-08-15 20:07     ` Matthew Wilcox
  2023-08-15 22:31       ` Yosry Ahmed
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 20:07 UTC (permalink / raw)
  To: Peter Xu; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 03:24:16PM -0400, Peter Xu wrote:
> On Tue, Aug 15, 2023 at 04:26:44AM +0100, Matthew Wilcox (Oracle) wrote:
> > +++ b/include/linux/page-flags.h
> > @@ -99,13 +99,15 @@
> >   */
> >  enum pageflags {
> >  	PG_locked,		/* Page is locked. Don't touch. */
> > +	PG_writeback,		/* Page is under writeback */
> >  	PG_referenced,
> >  	PG_uptodate,
> >  	PG_dirty,
> >  	PG_lru,
> > +	PG_head,		/* Must be in bit 6 */
> 
> Could there be some explanation on "must be in bit 6" here?

Not on this line, no.  You get 40-50 characters to say something useful.
Happy to elaborate further in some other comment or in the commit log,
but not on this line.

The idea behind all of this is that _folio_order moves into the bottom
byte of _flags_1.  Because the order can never be greater than 63 (and
in practice I think the largest we're going to see is about 30 -- a 16GB
hugetlb page on some architectures), we know that PG_head and PG_waiters
will be clear, so we can write (folio->_flags_1 & 0xff) and the compiler
will just load a byte; it won't actually load the word and mask it.

We can't move PG_head any lower, or we'll risk having a tail page with
PG_head set (which can happen with the vmemmap optimisation, but eugh).
And we don't want to move it any higher because then we'll have a flag
that cannot be reused in a tail page.  Bit 6 is the perfect spot for it.
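
For reference, this is what the accessor ends up looking like at the
end of the series (same code as the patch 9 hunk):

	static inline unsigned int folio_order(struct folio *folio)
	{
		if (!folio_test_large(folio))
			return 0;
		return folio->_flags_1 & 0xff;
	}

and the & 0xff compiles down to a single byte load.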

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/9] mm: Rearrange page flags
  2023-08-15 20:07     ` Matthew Wilcox
@ 2023-08-15 22:31       ` Yosry Ahmed
  2023-08-15 23:01         ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: Yosry Ahmed @ 2023-08-15 22:31 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Peter Xu, Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 1:07 PM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Aug 15, 2023 at 03:24:16PM -0400, Peter Xu wrote:
> > On Tue, Aug 15, 2023 at 04:26:44AM +0100, Matthew Wilcox (Oracle) wrote:
> > > +++ b/include/linux/page-flags.h
> > > @@ -99,13 +99,15 @@
> > >   */
> > >  enum pageflags {
> > >     PG_locked,              /* Page is locked. Don't touch. */
> > > +   PG_writeback,           /* Page is under writeback */
> > >     PG_referenced,
> > >     PG_uptodate,
> > >     PG_dirty,
> > >     PG_lru,
> > > +   PG_head,                /* Must be in bit 6 */
> >
> > Could there be some explanation on "must be in bit 6" here?
>
> Not on this line, no.  You get 40-50 characters to say something useful.
> Happy to elaborate further in some other comment or in the commit log,
> but not on this line.
>
> The idea behind all of this is that _folio_order moves into the bottom
> byte of _flags_1.  Because the order can never be greater than 63 (and
> in practice I think the largest we're going to see is about 30 -- a 16GB
> hugetlb page on some architectures), we know that PG_head and PG_waiters
> will be clear, so we can write (folio->_flags_1 & 0xff) and the compiler
> will just load a byte; it won't actually load the word and mask it.
>
> We can't move PG_head any lower, or we'll risk having a tail page with
> PG_head set (which can happen with the vmemmap optimisation, but eugh).
> And we don't want to move it any higher because then we'll have a flag
> that cannot be reused in a tail page.  Bit 6 is the perfect spot for it.

Is there some compile time assertion that the order cannot overflow into bit 6?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/9] mm: Rearrange page flags
  2023-08-15 22:31       ` Yosry Ahmed
@ 2023-08-15 23:01         ` Matthew Wilcox
  2023-08-15 23:33           ` Yosry Ahmed
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-15 23:01 UTC (permalink / raw)
  To: Yosry Ahmed; +Cc: Peter Xu, Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 03:31:42PM -0700, Yosry Ahmed wrote:
> On Tue, Aug 15, 2023 at 1:07 PM Matthew Wilcox <[email protected]> wrote:
> >
> > On Tue, Aug 15, 2023 at 03:24:16PM -0400, Peter Xu wrote:
> > > On Tue, Aug 15, 2023 at 04:26:44AM +0100, Matthew Wilcox (Oracle) wrote:
> > > > +++ b/include/linux/page-flags.h
> > > > @@ -99,13 +99,15 @@
> > > >   */
> > > >  enum pageflags {
> > > >     PG_locked,              /* Page is locked. Don't touch. */
> > > > +   PG_writeback,           /* Page is under writeback */
> > > >     PG_referenced,
> > > >     PG_uptodate,
> > > >     PG_dirty,
> > > >     PG_lru,
> > > > +   PG_head,                /* Must be in bit 6 */
> > >
> > > Could there be some explanation on "must be in bit 6" here?
> >
> > Not on this line, no.  You get 40-50 characters to say something useful.
> > Happy to elaborate further in some other comment or in the commit log,
> > but not on this line.
> >
> > The idea behind all of this is that _folio_order moves into the bottom
> > byte of _flags_1.  Because the order can never be greater than 63 (and
> > in practice I think the largest we're going to see is about 30 -- a 16GB
> > hugetlb page on some architectures), we know that PG_head and PG_waiters
> > will be clear, so we can write (folio->_flags_1 & 0xff) and the compiler
> > will just load a byte; it won't actually load the word and mask it.
> >
> > We can't move PG_head any lower, or we'll risk having a tail page with
> > PG_head set (which can happen with the vmemmap optimisation, but eugh).
> > And we don't want to move it any higher because then we'll have a flag
> > that cannot be reused in a tail page.  Bit 6 is the perfect spot for it.
> 
> Is there some compile time assertion that the order cannot overflow into bit 6?

An order-64 folio on a machine with a 4kB page size would be 64
zettabytes.  I intend to retire before we're managing memory in chunks
that large.
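
(If we ever wanted a belt-and-braces check anyway, it'd be a one-liner
-- hypothetical sketch, with MAX_FOLIO_ORDER standing in for whatever
bound we'd pick:

	static_assert(MAX_FOLIO_ORDER < (1 << 6),
		      "folio order must not overflow into PG_head");

but I don't think it buys us anything today.)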

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8/9] mm: Rearrange page flags
  2023-08-15 23:01         ` Matthew Wilcox
@ 2023-08-15 23:33           ` Yosry Ahmed
  0 siblings, 0 replies; 42+ messages in thread
From: Yosry Ahmed @ 2023-08-15 23:33 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Peter Xu, Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 4:01 PM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Aug 15, 2023 at 03:31:42PM -0700, Yosry Ahmed wrote:
> > On Tue, Aug 15, 2023 at 1:07 PM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Tue, Aug 15, 2023 at 03:24:16PM -0400, Peter Xu wrote:
> > > > On Tue, Aug 15, 2023 at 04:26:44AM +0100, Matthew Wilcox (Oracle) wrote:
> > > > > +++ b/include/linux/page-flags.h
> > > > > @@ -99,13 +99,15 @@
> > > > >   */
> > > > >  enum pageflags {
> > > > >     PG_locked,              /* Page is locked. Don't touch. */
> > > > > +   PG_writeback,           /* Page is under writeback */
> > > > >     PG_referenced,
> > > > >     PG_uptodate,
> > > > >     PG_dirty,
> > > > >     PG_lru,
> > > > > +   PG_head,                /* Must be in bit 6 */
> > > >
> > > > Could there be some explanation on "must be in bit 6" here?
> > >
> > > Not on this line, no.  You get 40-50 characters to say something useful.
> > > Happy to elaborate further in some other comment or in the commit log,
> > > but not on this line.
> > >
> > > The idea behind all of this is that _folio_order moves into the bottom
> > > byte of _flags_1.  Because the order can never be greater than 63 (and
> > > in practice I think the largest we're going to see is about 30 -- a 16GB
> > > hugetlb page on some architectures), we know that PG_head and PG_waiters
> > > will be clear, so we can write (folio->_flags_1 & 0xff) and the compiler
> > > will just load a byte; it won't actually load the word and mask it.
> > >
> > > We can't move PG_head any lower, or we'll risk having a tail page with
> > > PG_head set (which can happen with the vmemmap optimisation, but eugh).
> > > And we don't want to move it any higher because then we'll have a flag
> > > that cannot be reused in a tail page.  Bit 6 is the perfect spot for it.
> >
> > Is there some compile time assertion that the order cannot overflow into bit 6?
>
> An order-64 folio on a machine with a 4kB page size would be 64
> zettabytes.  I intend to retire before we're managing memory in chunks
> that large.

I should have done the math :)
Never say never though :)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15 19:58             ` Matthew Wilcox
@ 2023-08-16  3:14               ` Matthew Wilcox
  2023-08-16 10:12                 ` David Hildenbrand
  2023-08-16  9:55               ` David Hildenbrand
  1 sibling, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-16  3:14 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Tue, Aug 15, 2023 at 08:58:24PM +0100, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
> > On 15.08.23 19:06, Matthew Wilcox wrote:
> > > There are a lot of counters called THP and TransHuge and other variants
> > > which are exposed to userspace, and the (user) assumption is that this counts
> > > PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
> > > you'll find them.  If we have folio_test_thp(), people will write:
> > > 
> > > 	if (folio_test_thp(folio))
> > > 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
> > > 
> > > instead of using folio_test_pmd_mappable().
> > 
> > So if we *really* don't want to use THP to express that we have a page, then
> > let's see what these pages are:
> > * can be mapped to user space
> > * are transparent to most MM-related systemcalls by (un) mapping
> >   them in system page size (PTEs)
> 
>  * Are managed on the LRU

I think this is the best one to go with.  Either that or "managed by
rmap".  That excludes compoud pages which are allocated from vmalloc()
(which can be mmaped), page tables, slab, etc.  It includes both file
and anon folios.

I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes

Unfortunately, folio_test_lru() already exists and means something
different ("Is this folio on an LRU list").  I fear folio_test_rmap()
would have a similar confusion -- "Is this folio currently findable by
rmap", or some such. folio_test_rmappable()?


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-15 19:58             ` Matthew Wilcox
  2023-08-16  3:14               ` Matthew Wilcox
@ 2023-08-16  9:55               ` David Hildenbrand
  1 sibling, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-16  9:55 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On 15.08.23 21:58, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
>> On 15.08.23 19:06, Matthew Wilcox wrote:
>>> There are a lot of counters called THP and TransHuge and other variants
>>> which are exposed to userspace, and the (user) assumption is that this counts
>>> PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
>>> you'll find them.  If we have folio_test_thp(), people will write:
>>>
>>> 	if (folio_test_thp(folio))
>>> 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
>>>
>>> instead of using folio_test_pmd_mappable().
>>
>> So if we *really* don't want to use THP to express that we have a page, then
>> let's see what these pages are:
>> * can be mapped to user space
>> * are transparent to most MM-related systemcalls by (un) mapping
>>    them in system page size (PTEs)
> 
>   * Are managed on the LRU
>   * Can be dirtied, written back

Right, but at least hugetlb *could* be extended to do that as well (and 
even implement swapping). I think the biggest difference is the 
transparency/PTE-mapping/unmapping/ ....

> 
>> That we can split these pages (not PTE-map, but convert from large folio to
>> small folios) is one characteristic, but IMHO not the main one (and maybe
>> not even required at all!).
> 
> It's the one which distinguishes them from, say, compound pages used for
> slab.  Or used by device drivers.  Or net pagepool, or vmalloc.  There's
> a lot of compound allocations out there, and the only ones which need
> special treatment here are the ones which are splittable.

And my point is that that is an implementation detail, I'm afraid.
Instead of splitting the folio into order-0 folios, you could also 
migrate off all data to order-0 folios and just free the large folio.

Because splitting only succeeds if there are no other references on the 
folio, just like migration.

But let's not get distracted :)

> 
>> Maybe we can come up with a better term for "THP, but not necessarily
>> PMD-sized".
>>
>> "Large folio" is IMHO bad. A hugetlb page is a large folio and not all large
>> folios can be mapped to user space.
>>
>> "Transparent large folios" ? Better IMHO.
> 
> I think this goes back to Johannes' point many months ago that we need
> separate names for some things.  He wants to split anon & file memory
> apart (who gets to keep the name "folio" in the divorce?  what do we
> name the type that encompasses both folios and the other one?  or do
> they both get different names?)

Good question. I remember discussing a type hierarchy back when you 
upstreamed folios.

Maybe we would have "file folios" and "anon folios".

> 
>>> Perhaps the key difference between normal compound pages and file/anon
>>> compound pages is that the latter are splittable?  So we can name all
>>> of this:
>>>
>>> 	folio_init_splittable()
>>> 	folio_test_splittable()
>>> 	folio_fini_splittable()
>>>
>>> Maybe that's still too close to an implementation detail, but it's at
>>> least talking about _a_ characteristic of the folio, even if it's not
>>> the _only_ characteristic of the folio.
>>
>> Maybe folio_init_transparent() ... avoiding the "huge" part of it.
>>
>> Very open for alternatives. As expressed in other context, we really should
>> figure this out soon.
> 
> Yeah, I'm open to better naming too.  At this point in the flow we're
> trying to distinguish between compound pages used for slab and compound
> pages used for anon/file, but that's not always going to be the case
> elsewhere.


Yes. Let me reply to your other mail.

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-16  3:14               ` Matthew Wilcox
@ 2023-08-16 10:12                 ` David Hildenbrand
  2023-08-16 12:05                   ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: David Hildenbrand @ 2023-08-16 10:12 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On 16.08.23 05:14, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 08:58:24PM +0100, Matthew Wilcox wrote:
>> On Tue, Aug 15, 2023 at 07:27:26PM +0200, David Hildenbrand wrote:
>>> On 15.08.23 19:06, Matthew Wilcox wrote:
>>>> There are a lot of counters called THP and TransHuge and other variants
>>>> which are exposed to userspace, and the (user) assumption is that this counts
>>>> PMD-sized folios.  If you grep around for folio_test_pmd_mappable(),
>>>> you'll find them.  If we have folio_test_thp(), people will write:
>>>>
>>>> 	if (folio_test_thp(folio))
>>>> 		__mod_lruvec_state(lruvec, NR_SHMEM_THPS, nr);
>>>>
>>>> instead of using folio_test_pmd_mappable().
>>>
>>> So if we *really* don't want to use THP to express that we have a page, then
>>> let's see what these pages are:
>>> * can be mapped to user space
>>> * are transparent to most MM-related systemcalls by (un) mapping
>>>    them in system page size (PTEs)
>>
>>   * Are managed on the LRU
> 
> I think this is the best one to go with.  Either that or "managed by
> rmap".  That excludes compoud pages which are allocated from vmalloc()
> (which can be mmaped), page tables, slab, etc.  It includes both file
> and anon folios.
> 
> I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes
> 
> Unfortunately, folio_test_lru() already exists and means something
> different ("Is this folio on an LRU list").  I fear folio_test_rmap()
> would have a similar confusion -- "Is this folio currently findable by
> rmap", or some such. folio_test_rmappable()?

But what about hugetlb, they are also remappable? We could have
folio_test_rmappable(), but that would then also better include hugetlb ...

(in theory, one could envision hugetlb also using an lru mechanism, 
although I doubt/hope it will ever happen)

Starting at the link you provided, I guess "vmalloc" and "net pool" 
would not fall under that category, or would they? (I'm assuming they 
don't get mapped using the rmap, so they are "different", and they are 
not managed by lru).

So I assume we only care about anon+file (lru-managed). Only these are 
rmappable (besides hugetlb), correct?

folio_test_lru_managed()

Might be cleanest to describe anon+file that are managed by the lru,
just might not be on an lru list right now (in contrast to folio_test_lru()).


I've been also thinking about

"folio_test_normal"

But it only makes sense when "all others (including hugetlb) are the odd 
one".

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-16 10:12                 ` David Hildenbrand
@ 2023-08-16 12:05                   ` Matthew Wilcox
  2023-08-16 12:34                     ` David Hildenbrand
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2023-08-16 12:05 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On Wed, Aug 16, 2023 at 12:12:44PM +0200, David Hildenbrand wrote:
> On 16.08.23 05:14, Matthew Wilcox wrote:
> > >   * Are managed on the LRU
> > 
> > I think this is the best one to go with.  Either that or "managed by
> > rmap".  That excludes compoud pages which are allocated from vmalloc()
> > (which can be mmaped), page tables, slab, etc.  It includes both file
> > and anon folios.
> > 
> > I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes
> > 
> > Unfortunately, folio_test_lru() already exists and means something
> > different ("Is this folio on an LRU list").  I fear folio_test_rmap()
> > would have a similar confusion -- "Is this folio currently findable by
> > rmap", or some such. folio_test_rmappable()?
> But what about hugetlb, they are also remappable? We could have
> folio_test_rmappable(), but that would then also better include hugetlb ...

We could do that!  Have both hugetlb & huge_memory.c set the rmappable
flag.  We'd still know which destructor to call because hugetlb also sets
the hugetlb flag.
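
destroy_large_folio() would then just dispatch on the two flags -- a
sketch, with the rmappable flag being this subthread's proposal rather
than anything in the tree:

	if (folio_test_hugetlb(folio)) {
		free_huge_page(folio);
		return;
	}
	if (folio_test_rmappable(folio))
		free_transhuge_folio(folio);

checking hugetlb first, since hugetlb folios would have both flags set.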

> Starting at the link you provided, I guess "vmalloc" and "net pool" would
> not fall under that category, or would they? (I'm assuming they don't get
> mapped using the rmap, so they are "different", and they are not managed by
> lru).

Right, neither type of page ends up on the LRU, and neither is added to
rmap.

> So I assume we only care about anon+file (lru-managed). Only these are
> rmappable (besides hugetlb), correct?
> 
> folio_test_lru_managed()
> 
> Might be cleanest to describe anon+file that are managed by the lru, just
> might not be on an lru list right now (in contrast to folio_test_lru()).

Something I didn't think about last night is that this flag only
_exists_ for large folios.  folio_test_lru_managed() (and
folio_test_rmappable()) both sound like they might work if you call them
on single-page folios, but we BUG if you do (see folio_flags()).
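
For reference, folio_flags() currently reads roughly like this; it's 
the second assertion that fires when you ask for tail-page flags on a 
single-page folio:

	static unsigned long *folio_flags(struct folio *folio, unsigned n)
	{
		struct page *page = &folio->page;

		VM_BUG_ON_PGFLAGS(PageTail(page), page);
		VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags),
				  page);
		return &page[n].flags;
	}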

> I've also been thinking about
> 
> "folio_test_normal"
> 
> But it only makes sense when "all the others (including hugetlb) are
> the odd ones out".

Who's to say slab is abnormal?  ;-)  But this one also fails to
communicate "only call this on large folios".  folio_test_splittable()
does at least communicate that this is related to large folios, although
one might simply expect it to return false for single-page folios rather
than BUG.

folio_test_large_rmappable()?
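
Something like this, perhaps (a sketch only -- the flag name and the 
guard are not final):

	/* flag stored in the flags word of the first tail page */
	static inline bool folio_test_large_rmappable(struct folio *folio)
	{
		return folio_test_large(folio) &&
			test_bit(PG_large_rmappable, folio_flags(folio, 1));
	}

The folio_test_large() guard would make it return false for 
single-page folios instead of tripping the assertion in folio_flags().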

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 7/9] mm: Add deferred_list page flag
  2023-08-16 12:05                   ` Matthew Wilcox
@ 2023-08-16 12:34                     ` David Hildenbrand
  0 siblings, 0 replies; 42+ messages in thread
From: David Hildenbrand @ 2023-08-16 12:34 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Andrew Morton, Jens Axboe, io-uring, linux-mm

On 16.08.23 14:05, Matthew Wilcox wrote:
> On Wed, Aug 16, 2023 at 12:12:44PM +0200, David Hildenbrand wrote:
>> On 16.08.23 05:14, Matthew Wilcox wrote:
>>>>    * Are managed on the LRU
>>>
>>> I think this is the best one to go with.  Either that or "managed by
>>> rmap".  That excludes compoud pages which are allocated from vmalloc()
>>> (which can be mmaped), page tables, slab, etc.  It includes both file
>>> and anon folios.
>>>
>>> I have a handy taxonomy here: https://kernelnewbies.org/MemoryTypes
>>>
>>> Unfortunately, folio_test_lru() already exists and means something
>>> different ("Is this folio on an LRU list").  I fear folio_test_rmap()
>>> would have a similar confusion -- "Is this folio currently findable by
>>> rmap", or some such. folio_test_rmappable()?
>> But what about hugetlb? Those folios are rmappable too. We could have
>> folio_test_rmappable(), but then it had better include hugetlb as well ...
> 
> We could do that!  Have both hugetlb & huge_memory.c set the rmappable
> flag.  We'd still know which destructor to call because hugetlb also sets
> the hugetlb flag.
> 
>> Starting from the link you provided, I guess "vmalloc" and "net pool"
>> would not fall under that category, or would they? (I'm assuming they
>> don't get mapped using rmap, so they are "different", and they are not
>> managed by the LRU.)
> 
> Right, neither type of page ends up on the LRU, and neither is added to
> rmap.
> 
>> So I assume we only care about anon+file (LRU-managed). Only these are
>> rmappable (besides hugetlb), correct?
>>
>> folio_test_lru_managed()
>>
>> Might be the cleanest way to describe anon+file folios that are managed
>> by the LRU but might not be on an LRU list right now (the difference
>> from folio_test_lru()).
> 
> Something I didn't think about last night is that this flag only
> _exists_ for large folios.  folio_test_lru_managed() (and
> folio_test_rmappable()) both sound like they might work if you call them
> on single-page folios, but we BUG if you do (see folio_flags()).
> 
>> I've also been thinking about
>>
>> "folio_test_normal"
>>
>> But it only makes sense when "all the others (including hugetlb) are
>> the odd ones out".
> 
> Who's to say slab is abnormal?  ;-)  But this one also fails to
> communicate "only call this on large folios".  folio_test_splittable()
> does at least communicate that this is related to large folios, although
> one might simply expect it to return false for single-page folios rather
> than BUG.
> 
> folio_test_large_rmappable()?

Sounds good to me. We can then further test whether it's hugetlb to 
rule that one out.
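
I.e. something like this at the callsites (sketch):

	if (folio_test_large_rmappable(folio) &&
	    !folio_test_hugetlb(folio)) {
		/* a THP-style folio: has a deferred list, may be split */
	}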

-- 
Cheers,

David / dhildenb


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2023-08-16 12:36 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-15  3:26 [PATCH 0/9] Remove _folio_dtor and _folio_order Matthew Wilcox (Oracle)
2023-08-15  3:26 ` [PATCH 1/9] io_uring: Stop calling free_compound_page() Matthew Wilcox (Oracle)
2023-08-15  7:33   ` David Hildenbrand
2023-08-15 15:00   ` Jens Axboe
2023-08-15 15:36     ` Matthew Wilcox
2023-08-15  3:26 ` [PATCH 2/9] mm: Call the hugetlb destructor directly Matthew Wilcox (Oracle)
2023-08-15  7:36   ` David Hildenbrand
2023-08-15  3:26 ` [PATCH 3/9] mm: Call free_transhuge_folio() directly from destroy_large_folio() Matthew Wilcox (Oracle)
2023-08-15  6:13   ` kernel test robot
2023-08-15  7:40   ` David Hildenbrand
2023-08-15 14:06     ` Matthew Wilcox
2023-08-15  8:09   ` kernel test robot
2023-08-15  3:26 ` [PATCH 4/9] mm: Make free_compound_page() static Matthew Wilcox (Oracle)
2023-08-15  7:47   ` David Hildenbrand
2023-08-15  7:48     ` David Hildenbrand
2023-08-15  3:26 ` [PATCH 5/9] mm: Remove free_compound_page() Matthew Wilcox (Oracle)
2023-08-15  7:48   ` David Hildenbrand
2023-08-15  3:26 ` [PATCH 6/9] mm: Remove HUGETLB_PAGE_DTOR Matthew Wilcox (Oracle)
2023-08-15  7:50   ` David Hildenbrand
2023-08-15  3:26 ` [PATCH 7/9] mm: Add deferred_list page flag Matthew Wilcox (Oracle)
2023-08-15  7:54   ` David Hildenbrand
2023-08-15 15:32     ` Matthew Wilcox
2023-08-15 16:40       ` David Hildenbrand
2023-08-15 17:06         ` Matthew Wilcox
2023-08-15 17:27           ` David Hildenbrand
2023-08-15 19:58             ` Matthew Wilcox
2023-08-16  3:14               ` Matthew Wilcox
2023-08-16 10:12                 ` David Hildenbrand
2023-08-16 12:05                   ` Matthew Wilcox
2023-08-16 12:34                     ` David Hildenbrand
2023-08-16  9:55               ` David Hildenbrand
2023-08-15  3:26 ` [PATCH 8/9] mm: Rearrange page flags Matthew Wilcox (Oracle)
2023-08-15  4:30   ` Yosry Ahmed
2023-08-15 19:24   ` Peter Xu
2023-08-15 20:07     ` Matthew Wilcox
2023-08-15 22:31       ` Yosry Ahmed
2023-08-15 23:01         ` Matthew Wilcox
2023-08-15 23:33           ` Yosry Ahmed
2023-08-15  3:26 ` [PATCH 9/9] mm: Free up a word in the first tail page Matthew Wilcox (Oracle)
2023-08-15  7:59   ` David Hildenbrand
2023-08-15 11:39     ` Matthew Wilcox
2023-08-15 19:21   ` Peter Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox