* [PATCH v5 0/3] io_uring/rsrc: coalescing multi-hugepage registered buffers
       [not found] <CGME20240628084418epcas5p14c304761ca375a6afba3aa199c27f9e3@epcas5p1.samsung.com>
@ 2024-06-28  8:44 ` Chenliang Li
       [not found]   ` <CGME20240628084420epcas5p32f49e7c977695d20bcef7734eb2e38b4@epcas5p3.samsung.com>
                     ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Chenliang Li @ 2024-06-28  8:44 UTC (permalink / raw)
  To: axboe, asml.silence
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev,
	Chenliang Li

Registered buffers are stored and processed as a bvec array. Each bvec
element typically points to a PAGE_SIZE page, but bvecs can also work
with hugepages: a buffer consisting of a single hugepage is coalesced
into one hugepage-sized bvec entry during registration. This coalescing
saves both space and DMA-mapping time.

However, the coalescing currently doesn't work for multi-hugepage
buffers. A buffer made up of several 2M hugepages is still split into
thousands of 4K page bvec entries when, in fact, a handful of hugepage
bvecs would suffice.

This patch series enables coalescing of registered buffers that span
more than one hugepage, reducing DMA-mapping time and saving memory
for such buffers.
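
As a rough illustration (assuming 4K base pages and 2M hugepages), an
8M registered buffer looks like this before and after the series:

  before: nr_bvecs = 2048  (one bvec entry per 4K page)
  after:  nr_bvecs = 4     (one bvec entry per 2M folio)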

Testing:

Hugepage fixed-buffer I/O can be tested with an unmodified fio. The
fio command used in the following tests is given in [1], and there is
also a liburing testcase in [2]. The system should have enough
hugepages available before testing.
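
For reference, here is a minimal userspace sketch of the registration
path being exercised (not one of the testcases in [2]; error handling
is trimmed and the queue depth is arbitrary — the liburing calls are
the standard registration API):

#include <liburing.h>
#include <sys/mman.h>

int main(void)
{
        struct io_uring ring;
        struct iovec iov;
        size_t len = 8UL * 1024 * 1024;  /* 4 x 2M hugepages */
        void *buf;

        /* Hugepage-backed allocation, as with fio's -mem=shmhuge */
        buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (buf == MAP_FAILED)
                return 1;
        if (io_uring_queue_init(8, &ring, 0) < 0)
                return 1;

        iov.iov_base = buf;
        iov.iov_len = len;
        /* With this series, this pins 4 hugepage bvecs, not 2048 */
        if (io_uring_register_buffers(&ring, &iov, 1) < 0)
                return 1;

        io_uring_unregister_buffers(&ring);
        io_uring_queue_exit(&ring);
        munmap(buf, len);
        return 0;
}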

Perf diff of 8M (4 x 2M hugepages) fio randread test:

Before      After       Symbol
.....................................................
5.88%                   [k] __blk_rq_map_sg
3.98%       -3.95%      [k] dma_direct_map_sg
2.47%                   [k] dma_pool_alloc
1.37%       -1.36%      [k] sg_next
            +0.28%      [k] dma_map_page_attrs

Perf diff of 8M fio randwrite test:

Before      After       Symbol
.....................................................
2.80%                   [k] __blk_rq_map_sg
1.74%                   [k] dma_direct_map_sg
1.61%                   [k] dma_pool_alloc
0.67%                   [k] sg_next
            +0.04%      [k] dma_map_page_attrs

The first two patches prepare for adding multi-hugepage coalescing to
buffer registration; the third patch enables the feature.

-----------------
Changes since v4:

- Use a new compacted array of pages instead of the original one
  if the buffer can be coalesced.
- Remove loops made unnecessary by the new page array.
- Remove the accounting and init helpers for the coalesced imu; use
  the original path instead.
- Remove the unnecessary nr_folios field from the io_imu_folio_data
  struct.
- Rearrange the helper functions.

v4 : https://lore.kernel.org/io-uring/[email protected]/T/#t

Changes since v3:

- Drop unnecessary commit message text
- Update test command and test results

v3 : https://lore.kernel.org/io-uring/[email protected]/T/#t

Changes since v2:

- Modify the loop iterator increment to make the code cleaner
- Minor fix to the return path in coalesced buffer accounting
- Correct commit messages
- Add test cases in liburing

v2 : https://lore.kernel.org/io-uring/[email protected]/T/#t

Changes since v1:

- Split into 4 patches
- Fix code style issues
- Rearrange the code changes for a cleaner look
- Add a specialized pinned-page accounting procedure for coalesced
  buffers
- Reorder the newly added fields in the imu struct for tighter packing

v1 : https://lore.kernel.org/io-uring/[email protected]/T/#u

[1]
fio -iodepth=64 -rw=randread(-rw=randwrite) -direct=1 -ioengine=io_uring \
-bs=8M -numjobs=1 -group_reporting -mem=shmhuge -fixedbufs -hugepage-size=2M \
-filename=/dev/nvme0n1 -runtime=10s -name=test1

[2]
https://lore.kernel.org/io-uring/[email protected]/T/#u

Chenliang Li (3):
  io_uring/rsrc: add hugepage fixed buffer coalesce helpers
  io_uring/rsrc: store folio shift and mask into imu
  io_uring/rsrc: enable multi-hugepage buffer coalescing

 io_uring/rsrc.c | 149 +++++++++++++++++++++++++++++++++++-------------
 io_uring/rsrc.h |  11 ++++
 2 files changed, 120 insertions(+), 40 deletions(-)


base-commit: 50cf5f3842af3135b88b041890e7e12a74425fcb
-- 
2.34.1



* [PATCH v5 1/3] io_uring/rsrc: add hugepage fixed buffer coalesce helpers
       [not found]   ` <CGME20240628084420epcas5p32f49e7c977695d20bcef7734eb2e38b4@epcas5p3.samsung.com>
@ 2024-06-28  8:44     ` Chenliang Li
  2024-07-09 13:09       ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Chenliang Li @ 2024-06-28  8:44 UTC (permalink / raw)
  To: axboe, asml.silence
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev,
	Chenliang Li

Introduce helper functions to check and coalesce hugepage-backed fixed
buffers. Coalescing reduces both the time and the space consumed by
mapping and storing multi-hugepage fixed buffers. Currently we only
coalesce single-hugepage buffers, so add support for multi-hugepage
fixed buffer coalescing.

A coalescable multi-hugepage buffer must fully cover its folios
(except potentially the first and last one), and all of its folios
must have the same size. These requirements make later processing
easier; we also need same-sized chunks in io_import_fixed for fast
iov_iter adjustment.
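
As an example of these layout rules (hypothetical numbers: 2M folios,
4K base pages): a buffer that starts 1M into its first folio and spans
three folios coalesces with nr_pages_head = 256 and nr_pages_mid = 512;
a buffer that only partially covers a middle folio, or whose folios
differ in size, is left uncoalesced.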

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
 io_uring/rsrc.h |  9 +++++
 2 files changed, 96 insertions(+)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 60c00144471a..c88ce8c38515 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -849,6 +849,93 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
 	return ret;
 }
 
+static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
+				struct io_imu_folio_data *data, int nr_folios)
+{
+	struct page **page_array = *pages, **new_array = NULL;
+	int nr_pages_left = *nr_pages, i, j;
+
+	/* Store head pages only */
+	new_array = kvmalloc_array(nr_folios, sizeof(struct page *),
+					GFP_KERNEL);
+	if (!new_array)
+		return false;
+
+	new_array[0] = page_array[0];
+	/*
+	 * The pages are bound to the folio, it doesn't
+	 * actually unpin them but drops all but one reference,
+	 * which is usually put down by io_buffer_unmap().
+	 * Note, needs a better helper.
+	 */
+	if (data->nr_pages_head > 1)
+		unpin_user_pages(&page_array[1], data->nr_pages_head - 1);
+
+	j = data->nr_pages_head;
+	nr_pages_left -= data->nr_pages_head;
+	for (i = 1; i < nr_folios; i++) {
+		unsigned int nr_unpin;
+
+		new_array[i] = page_array[j];
+		nr_unpin = min_t(unsigned int, nr_pages_left - 1,
+					data->nr_pages_mid - 1);
+		if (nr_unpin)
+			unpin_user_pages(&page_array[j+1], nr_unpin);
+		j += data->nr_pages_mid;
+		nr_pages_left -= data->nr_pages_mid;
+	}
+	kvfree(page_array);
+	*pages = new_array;
+	*nr_pages = nr_folios;
+	return true;
+}
+
+static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
+					 struct io_imu_folio_data *data)
+{
+	struct page **page_array = *pages;
+	struct folio *folio = page_folio(page_array[0]);
+	unsigned int count = 1, nr_folios = 1;
+	int i;
+
+	if (*nr_pages <= 1)
+		return false;
+
+	data->nr_pages_mid = folio_nr_pages(folio);
+	if (data->nr_pages_mid == 1)
+		return false;
+
+	data->folio_shift = folio_shift(folio);
+	data->folio_size = folio_size(folio);
+	/*
+	 * Check if pages are contiguous inside a folio, and all folios have
+	 * the same page count except for the head and tail.
+	 */
+	for (i = 1; i < *nr_pages; i++) {
+		if (page_folio(page_array[i]) == folio &&
+			page_array[i] == page_array[i-1] + 1) {
+			count++;
+			continue;
+		}
+
+		if (nr_folios == 1)
+			data->nr_pages_head = count;
+		else if (count != data->nr_pages_mid)
+			return false;
+
+		folio = page_folio(page_array[i]);
+		if (folio_size(folio) != data->folio_size)
+			return false;
+
+		count = 1;
+		nr_folios++;
+	}
+	if (nr_folios == 1)
+		data->nr_pages_head = count;
+
+	return io_do_coalesce_buffer(pages, nr_pages, data, nr_folios);
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 				  struct io_mapped_ubuf **pimu,
 				  struct page **last_hpage)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index c032ca3436ca..cc66323535f6 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -50,6 +50,15 @@ struct io_mapped_ubuf {
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };
 
+struct io_imu_folio_data {
+	/* Head folio can be partially included in the fixed buf */
+	unsigned int	nr_pages_head;
+	/* For non-head/tail folios, has to be fully included */
+	unsigned int	nr_pages_mid;
+	unsigned int	folio_shift;
+	size_t		folio_size;
+};
+
 void io_rsrc_node_ref_zero(struct io_rsrc_node *node);
 void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *ref_node);
 struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx);
-- 
2.34.1



* [PATCH v5 2/3] io_uring/rsrc: store folio shift and mask into imu
       [not found]   ` <CGME20240628084422epcas5p3b5d4c93e5fa30069c703bcead1fa0033@epcas5p3.samsung.com>
@ 2024-06-28  8:44     ` Chenliang Li
  0 siblings, 0 replies; 9+ messages in thread
From: Chenliang Li @ 2024-06-28  8:44 UTC (permalink / raw)
  To: axboe, asml.silence
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev,
	Chenliang Li

Store the folio shift and folio mask in the imu struct and use them in
the iov_iter adjustment, as we will have non-PAGE_SIZE'd chunks if a
multi-hugepage buffer gets coalesced.
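
To illustrate the adjustment (hypothetical numbers: 2M folios, so
folio_shift = 21, with a 1M first bvec): for offset = 5M we skip the
first bvec, leaving offset = 4M, so seg_skip = 1 + (4M >> 21) = 3 and
iov_offset = 4M & ~folio_mask = 0.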

Signed-off-by: Chenliang Li <[email protected]>
Reviewed-by: Anuj Gupta <[email protected]>
---
 io_uring/rsrc.c | 15 ++++++---------
 io_uring/rsrc.h |  2 ++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index c88ce8c38515..3198cf854db1 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1002,6 +1002,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	imu->ubuf = (unsigned long) iov->iov_base;
 	imu->ubuf_end = imu->ubuf + iov->iov_len;
 	imu->nr_bvecs = nr_pages;
+	imu->folio_shift = PAGE_SHIFT;
+	imu->folio_mask = PAGE_MASK;
 	*pimu = imu;
 	ret = 0;
 
@@ -1118,23 +1120,18 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 		 * we know that:
 		 *
 		 * 1) it's a BVEC iter, we set it up
-		 * 2) all bvecs are PAGE_SIZE in size, except potentially the
+		 * 2) all bvecs are the same in size, except potentially the
 		 *    first and last bvec
 		 *
 		 * So just find our index, and adjust the iterator afterwards.
 		 * If the offset is within the first bvec (or the whole first
 		 * bvec, just use iov_iter_advance(). This makes it easier
 		 * since we can just skip the first segment, which may not
-		 * be PAGE_SIZE aligned.
+		 * be folio_size aligned.
 		 */
 		const struct bio_vec *bvec = imu->bvec;
 
 		if (offset < bvec->bv_len) {
-			/*
-			 * Note, huge pages buffers consists of one large
-			 * bvec entry and should always go this way. The other
-			 * branch doesn't expect non PAGE_SIZE'd chunks.
-			 */
 			iter->bvec = bvec;
 			iter->nr_segs = bvec->bv_len;
 			iter->count -= offset;
@@ -1144,12 +1141,12 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 
 			/* skip first vec */
 			offset -= bvec->bv_len;
-			seg_skip = 1 + (offset >> PAGE_SHIFT);
+			seg_skip = 1 + (offset >> imu->folio_shift);
 
 			iter->bvec = bvec + seg_skip;
 			iter->nr_segs -= seg_skip;
 			iter->count -= bvec->bv_len + offset;
-			iter->iov_offset = offset & ~PAGE_MASK;
+			iter->iov_offset = offset & ~imu->folio_mask;
 		}
 	}
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index cc66323535f6..b02fc0ef59fe 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -46,7 +46,9 @@ struct io_mapped_ubuf {
 	u64		ubuf;
 	u64		ubuf_end;
 	unsigned int	nr_bvecs;
+	unsigned int	folio_shift;
 	unsigned long	acct_pages;
+	unsigned long	folio_mask;
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };
 
-- 
2.34.1



* [PATCH v5 3/3] io_uring/rsrc: enable multi-hugepage buffer coalescing
       [not found]   ` <CGME20240628084424epcas5p3c34ec2fb8fb45752ef6a11447812ae0d@epcas5p3.samsung.com>
@ 2024-06-28  8:44     ` Chenliang Li
  2024-07-09 13:17       ` Pavel Begunkov
  0 siblings, 1 reply; 9+ messages in thread
From: Chenliang Li @ 2024-06-28  8:44 UTC (permalink / raw)
  To: axboe, asml.silence
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev,
	Chenliang Li

Modify io_sqe_buffer_register to enable coalescing of multi-hugepage
fixed buffers.
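
For the coalesced case (illustrative numbers, 2M folios): folio_shift
= 21 gives folio_mask = ~((1UL << 21) - 1) = 0xffffffffffe00000, so
off & ~folio_mask recovers the intra-folio offset and each non-head
bvec covers folio_size = 2M.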

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 47 ++++++++++++++++-------------------------------
 1 file changed, 16 insertions(+), 31 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 3198cf854db1..790ed3c1bcc8 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -945,7 +945,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	unsigned long off;
 	size_t size;
 	int ret, nr_pages, i;
-	struct folio *folio = NULL;
+	struct io_imu_folio_data data;
+	bool coalesced;
 
 	*pimu = (struct io_mapped_ubuf *)&dummy_ubuf;
 	if (!iov->iov_base)
@@ -960,31 +961,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		goto done;
 	}
 
-	/* If it's a huge page, try to coalesce them into a single bvec entry */
-	if (nr_pages > 1) {
-		folio = page_folio(pages[0]);
-		for (i = 1; i < nr_pages; i++) {
-			/*
-			 * Pages must be consecutive and on the same folio for
-			 * this to work
-			 */
-			if (page_folio(pages[i]) != folio ||
-			    pages[i] != pages[i - 1] + 1) {
-				folio = NULL;
-				break;
-			}
-		}
-		if (folio) {
-			/*
-			 * The pages are bound to the folio, it doesn't
-			 * actually unpin them but drops all but one reference,
-			 * which is usually put down by io_buffer_unmap().
-			 * Note, needs a better helper.
-			 */
-			unpin_user_pages(&pages[1], nr_pages - 1);
-			nr_pages = 1;
-		}
-	}
+	/* If it's huge page(s), try to coalesce them into fewer bvec entries */
+	coalesced = io_try_coalesce_buffer(&pages, &nr_pages, &data);
 
 	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
 	if (!imu)
@@ -1004,17 +982,24 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	imu->nr_bvecs = nr_pages;
 	imu->folio_shift = PAGE_SHIFT;
 	imu->folio_mask = PAGE_MASK;
+	if (coalesced) {
+		imu->folio_shift = data.folio_shift;
+		imu->folio_mask = ~((1UL << data.folio_shift) - 1);
+	}
 	*pimu = imu;
 	ret = 0;
 
-	if (folio) {
-		bvec_set_page(&imu->bvec[0], pages[0], size, off);
-		goto done;
-	}
 	for (i = 0; i < nr_pages; i++) {
 		size_t vec_len;
 
-		vec_len = min_t(size_t, size, PAGE_SIZE - off);
+		if (coalesced) {
+			size_t seg_size = i ? data.folio_size :
+				PAGE_SIZE * data.nr_pages_head;
+
+			vec_len = min_t(size_t, size, seg_size - off);
+		} else {
+			vec_len = min_t(size_t, size, PAGE_SIZE - off);
+		}
 		bvec_set_page(&imu->bvec[i], pages[i], vec_len, off);
 		off = 0;
 		size -= vec_len;
-- 
2.34.1



* Re: [PATCH v5 0/3] io_uring/rsrc: coalescing multi-hugepage registered buffers
       [not found]   ` <CGME20240708021432epcas5p4e7e74d81a42a559f2b059e94e7022740@epcas5p4.samsung.com>
@ 2024-07-08  2:14     ` Chenliang Li
  0 siblings, 0 replies; 9+ messages in thread
From: Chenliang Li @ 2024-07-08  2:14 UTC (permalink / raw)
  To: axboe, asml.silence; +Cc: io-uring

Hi, a gentle ping here. Any comments on the v5 patchset?

Thanks,
Chenliang Li


* Re: [PATCH v5 1/3] io_uring/rsrc: add hugepage fixed buffer coalesce helpers
  2024-06-28  8:44     ` [PATCH v5 1/3] io_uring/rsrc: add hugepage fixed buffer coalesce helpers Chenliang Li
@ 2024-07-09 13:09       ` Pavel Begunkov
       [not found]         ` <CGME20240710022336epcas5p2685a44c8e04962830f4e7f8ffee8168f@epcas5p2.samsung.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2024-07-09 13:09 UTC (permalink / raw)
  To: Chenliang Li, axboe
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On 6/28/24 09:44, Chenliang Li wrote:
> Introduce helper functions to check and coalesce hugepage-backed fixed
> buffers. Coalescing reduces both the time and the space consumed by
> mapping and storing multi-hugepage fixed buffers. Currently we only
> coalesce single-hugepage buffers, so add support for multi-hugepage
> fixed buffer coalescing.
> 
> A coalescable multi-hugepage buffer must fully cover its folios
> (except potentially the first and last one), and all of its folios
> must have the same size. These requirements make later processing
> easier; we also need same-sized chunks in io_import_fixed for fast
> iov_iter adjustment.
> 
> Signed-off-by: Chenliang Li <[email protected]>
> ---
>   io_uring/rsrc.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++
>   io_uring/rsrc.h |  9 +++++
>   2 files changed, 96 insertions(+)
> 
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 60c00144471a..c88ce8c38515 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -849,6 +849,93 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
>   	return ret;
>   }
>   
> +static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
> +				struct io_imu_folio_data *data, int nr_folios)
> +{
> +	struct page **page_array = *pages, **new_array = NULL;
> +	int nr_pages_left = *nr_pages, i, j;
> +
> +	/* Store head pages only */
> +	new_array = kvmalloc_array(nr_folios, sizeof(struct page *),
> +					GFP_KERNEL);
> +	if (!new_array)
> +		return false;
> +
> +	new_array[0] = page_array[0];
> +	/*
> +	 * The pages are bound to the folio, it doesn't
> +	 * actually unpin them but drops all but one reference,
> +	 * which is usually put down by io_buffer_unmap().
> +	 * Note, needs a better helper.
> +	 */
> +	if (data->nr_pages_head > 1)
> +		unpin_user_pages(&page_array[1], data->nr_pages_head - 1);
> +
> +	j = data->nr_pages_head;
> +	nr_pages_left -= data->nr_pages_head;
> +	for (i = 1; i < nr_folios; i++) {
> +		unsigned int nr_unpin;
> +
> +		new_array[i] = page_array[j];
> +		nr_unpin = min_t(unsigned int, nr_pages_left - 1,
> +					data->nr_pages_mid - 1);
> +		if (nr_unpin)
> +			unpin_user_pages(&page_array[j+1], nr_unpin);
> +		j += data->nr_pages_mid;
> +		nr_pages_left -= data->nr_pages_mid;
> +	}
> +	kvfree(page_array);
> +	*pages = new_array;
> +	*nr_pages = nr_folios;
> +	return true;
> +}
> +
> +static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
> +					 struct io_imu_folio_data *data)

I believe an unused static function will trigger a warning, and we
don't want that, especially since erroring on warnings is a thing.

You can either reshuffle the patches or at least add a
__maybe_unused attribute.


> +{
> +	struct page **page_array = *pages;
> +	struct folio *folio = page_folio(page_array[0]);
> +	unsigned int count = 1, nr_folios = 1;
> +	int i;
> +
> +	if (*nr_pages <= 1)
> +		return false;
> +
> +	data->nr_pages_mid = folio_nr_pages(folio);
> +	if (data->nr_pages_mid == 1)
> +		return false;
> +
> +	data->folio_shift = folio_shift(folio);
> +	data->folio_size = folio_size(folio);
> +	/*
> +	 * Check if pages are contiguous inside a folio, and all folios have
> +	 * the same page count except for the head and tail.
> +	 */
> +	for (i = 1; i < *nr_pages; i++) {
> +		if (page_folio(page_array[i]) == folio &&
> +			page_array[i] == page_array[i-1] + 1) {
> +			count++;
> +			continue;
> +		}

Seems like the first and last folios may not be border-aligned, i.e.
the first one should end at a folio_size boundary and the last one
should start at the beginning of its folio.

Not really a bug, but we might run into problems optimising
calculations down the road if we don't restrict this.

> +
> +		if (nr_folios == 1)
> +			data->nr_pages_head = count;
> +		else if (count != data->nr_pages_mid)
> +			return false;
> +
> +		folio = page_folio(page_array[i]);
> +		if (folio_size(folio) != data->folio_size)
> +			return false;
> +
> +		count = 1;
> +		nr_folios++;
> +	}
> +	if (nr_folios == 1)
> +		data->nr_pages_head = count;
> +
> +	return io_do_coalesce_buffer(pages, nr_pages, data, nr_folios);
> +}
> +
>   static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
>   				  struct io_mapped_ubuf **pimu,
>   				  struct page **last_hpage)
> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
> index c032ca3436ca..cc66323535f6 100644
> --- a/io_uring/rsrc.h
> +++ b/io_uring/rsrc.h
> @@ -50,6 +50,15 @@ struct io_mapped_ubuf {
>   	struct bio_vec	bvec[] __counted_by(nr_bvecs);
>   };
>   
> +struct io_imu_folio_data {
> +	/* Head folio can be partially included in the fixed buf */
> +	unsigned int	nr_pages_head;
> +	/* For non-head/tail folios, has to be fully included */
> +	unsigned int	nr_pages_mid;
> +	unsigned int	folio_shift;
> +	size_t		folio_size;
> +};
> +
>   void io_rsrc_node_ref_zero(struct io_rsrc_node *node);
>   void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *ref_node);
>   struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx);

-- 
Pavel Begunkov


* Re: [PATCH v5 3/3] io_uring/rsrc: enable multi-hugepage buffer coalescing
  2024-06-28  8:44     ` [PATCH v5 3/3] io_uring/rsrc: enable multi-hugepage buffer coalescing Chenliang Li
@ 2024-07-09 13:17       ` Pavel Begunkov
       [not found]         ` <CGME20240710022900epcas5p368c4ebc44f3ace1ca0804116bd913512@epcas5p3.samsung.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Begunkov @ 2024-07-09 13:17 UTC (permalink / raw)
  To: Chenliang Li, axboe
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On 6/28/24 09:44, Chenliang Li wrote:
> Modify io_sqe_buffer_register to enable coalescing of multi-hugepage
> fixed buffers.
> 
> Signed-off-by: Chenliang Li <[email protected]>
> ---
>   io_uring/rsrc.c | 47 ++++++++++++++++-------------------------------
>   1 file changed, 16 insertions(+), 31 deletions(-)
> 
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 3198cf854db1..790ed3c1bcc8 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -945,7 +945,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
>   	unsigned long off;
>   	size_t size;
>   	int ret, nr_pages, i;
> -	struct folio *folio = NULL;
> +	struct io_imu_folio_data data;
> +	bool coalesced;
>   
>   	*pimu = (struct io_mapped_ubuf *)&dummy_ubuf;
>   	if (!iov->iov_base)
> @@ -960,31 +961,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
>   		goto done;
>   	}
>   
> -	/* If it's a huge page, try to coalesce them into a single bvec entry */
> -	if (nr_pages > 1) {
> -		folio = page_folio(pages[0]);
> -		for (i = 1; i < nr_pages; i++) {
> -			/*
> -			 * Pages must be consecutive and on the same folio for
> -			 * this to work
> -			 */
> -			if (page_folio(pages[i]) != folio ||
> -			    pages[i] != pages[i - 1] + 1) {
> -				folio = NULL;
> -				break;
> -			}
> -		}
> -		if (folio) {
> -			/*
> -			 * The pages are bound to the folio, it doesn't
> -			 * actually unpin them but drops all but one reference,
> -			 * which is usually put down by io_buffer_unmap().
> -			 * Note, needs a better helper.
> -			 */
> -			unpin_user_pages(&pages[1], nr_pages - 1);
> -			nr_pages = 1;
> -		}
> -	}
> +	/* If it's huge page(s), try to coalesce them into fewer bvec entries */
> +	coalesced = io_try_coalesce_buffer(&pages, &nr_pages, &data);
>   
>   	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
>   	if (!imu)
> @@ -1004,17 +982,24 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
>   	imu->nr_bvecs = nr_pages;
>   	imu->folio_shift = PAGE_SHIFT;
>   	imu->folio_mask = PAGE_MASK;
> +	if (coalesced) {
> +		imu->folio_shift = data.folio_shift;
> +		imu->folio_mask = ~((1UL << data.folio_shift) - 1);
> +	}
>   	*pimu = imu;
>   	ret = 0;
>   
> -	if (folio) {
> -		bvec_set_page(&imu->bvec[0], pages[0], size, off);
> -		goto done;
> -	}
>   	for (i = 0; i < nr_pages; i++) {
>   		size_t vec_len;
>   
> -		vec_len = min_t(size_t, size, PAGE_SIZE - off);
> +		if (coalesced) {
> +			size_t seg_size = i ? data.folio_size :
> +				PAGE_SIZE * data.nr_pages_head;

When you're compacting the page array, instead of taking a middle
page for the first folio, you can set it to the first page in the
folio and fix up the offset. Kind of:

new_array[0] = compound_head(old_array[0]);
off += folio_page_idx(folio, old_array[0]) << PAGE_SHIFT;


With that change you should be able to treat it in a uniform way
without branching.

off = (unsigned long) iov->iov_base & ~folio_mask;
vec_len = min_t(size_t, size, folio_size - off);


> +
> +			vec_len = min_t(size_t, size, seg_size - off);
> +		} else {
> +			vec_len = min_t(size_t, size, PAGE_SIZE - off);
> +		}
>   		bvec_set_page(&imu->bvec[i], pages[i], vec_len, off);
>   		off = 0;
>   		size -= vec_len;

-- 
Pavel Begunkov


* Re: [PATCH v5 1/3] io_uring/rsrc: add hugepage fixed buffer coalesce helpers
       [not found]         ` <CGME20240710022336epcas5p2685a44c8e04962830f4e7f8ffee8168f@epcas5p2.samsung.com>
@ 2024-07-10  2:23           ` Chenliang Li
  0 siblings, 0 replies; 9+ messages in thread
From: Chenliang Li @ 2024-07-10  2:23 UTC (permalink / raw)
  To: asml.silence
  Cc: anuj20.g, axboe, cliang01.li, gost.dev, io-uring, joshi.k,
	kundan.kumar, peiwei.li

On 2024-07-09 13:09 UTC, Pavel Begunkov wrote:
> On 6/28/24 09:44, Chenliang Li wrote:
>> +static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
>> +					 struct io_imu_folio_data *data)
>
> I believe an unused static function will trigger a warning, and we
> don't want that, especially since erroring on warnings is a thing.
>
> You can either reshuffle the patches or at least add a
> __maybe_unused attribute.

OK, will reshuffle the patchset.

>> +	/*
>> +	 * Check if pages are contiguous inside a folio, and all folios have
>> +	 * the same page count except for the head and tail.
>> +	 */
>> +	for (i = 1; i < *nr_pages; i++) {
>> +		if (page_folio(page_array[i]) == folio &&
>> +			page_array[i] == page_array[i-1] + 1) {
>> +			count++;
>> +			continue;
>> +		}
>
> Seems like the first and last folios may not be border-aligned, i.e.
> the first one should end at a folio_size boundary and the last one
> should start at the beginning of its folio.
>
> Not really a bug, but we might run into problems optimising
> calculations down the road if we don't restrict this.

Will add restrictions for that.

Thanks,
Chenliang Li


* Re: [PATCH v5 3/3] io_uring/rsrc: enable multi-hugepage buffer coalescing
       [not found]         ` <CGME20240710022900epcas5p368c4ebc44f3ace1ca0804116bd913512@epcas5p3.samsung.com>
@ 2024-07-10  2:28           ` Chenliang Li
  0 siblings, 0 replies; 9+ messages in thread
From: Chenliang Li @ 2024-07-10  2:28 UTC (permalink / raw)
  To: asml.silence
  Cc: anuj20.g, axboe, cliang01.li, gost.dev, io-uring, joshi.k,
	kundan.kumar, peiwei.li

On 2024-07-09 13:17 UTC, Pavel Begunkov wrote:
> On 6/28/24 09:44, Chenliang Li wrote:
>> -	if (folio) {
>> -		bvec_set_page(&imu->bvec[0], pages[0], size, off);
>> -		goto done;
>> -	}
>>   	for (i = 0; i < nr_pages; i++) {
>>   		size_t vec_len;
>>   
>> -		vec_len = min_t(size_t, size, PAGE_SIZE - off);
>> +		if (coalesced) {
>> +			size_t seg_size = i ? data.folio_size :
>> +				PAGE_SIZE * data.nr_pages_head;
>
> When you're compacting the page array, instead of taking a middle
> page for the first folio, you can set it to the first page in the
> folio and fix up the offset. Kind of:
>
> new_array[0] = compound_head(old_array[0]);
> off += folio_page_idx(folio, old_array[0]) << PAGE_SHIFT;
>
>
> With that change you should be able to treat it in a uniform way
> without branching.
> 
> off = (unsigned long) iov->iov_base & ~folio_mask;
> vec_len = min_t(size_t, size, folio_size - off);

That's brilliant. Will change it this way.

Thanks,
Chenliang Li

