public inbox for [email protected]
 help / color / mirror / Atom feed
* Re: [PATCH v4 3/4] io_uring/rsrc: add init and account functions for coalesced imus
@ 2024-06-16 17:43 Pavel Begunkov
       [not found] ` <CGME20240617031611epcas5p26e5c5f65a182af069427b1609f01d1d0@epcas5p2.samsung.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-16 17:43 UTC (permalink / raw)
  To: Chenliang Li, axboe
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On 5/14/24 08:54, Chenliang Li wrote:
> Introduce two functions to separate the coalesced imu alloc and
> accounting path from the original one. This helps to keep the original
> code path clean.
> 
> Signed-off-by: Chenliang Li <[email protected]>
> ---
>   io_uring/rsrc.c | 89 +++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 89 insertions(+)
> 
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 578d382ca9bc..53fac5f27bbf 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -871,6 +871,45 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
>   	return ret;
>   }
>   
> +static int io_coalesced_buffer_account_pin(struct io_ring_ctx *ctx,
> +					   struct page **pages,
> +					   struct io_mapped_ubuf *imu,
> +					   struct page **last_hpage,
> +					   struct io_imu_folio_data *data)
> +{
> +	int i, j, ret;
> +
> +	imu->acct_pages = 0;
> +	j = 0;
> +	for (i = 0; i < data->nr_folios; i++) {
> +		struct page *hpage = pages[j];
> +
> +		if (hpage == *last_hpage)
> +			continue;
> +		*last_hpage = hpage;
> +		/*
> +		 * Already checked the page array in try coalesce,
> +		 * so pass in nr_pages=0 here to waive that.
> +		 */
> +		if (headpage_already_acct(ctx, pages, 0, hpage))
> +			continue;
> +		imu->acct_pages += data->nr_pages_mid;
> +		if (i)
> +			j += data->nr_pages_mid;
> +		else
> +			j = data->nr_pages_head;

You should account an entire folio here, i.e. ->nr_pages_mid
in either case. Let's say the first page in the registration
is the last page of a huge page, you'd account 4K while it
actually pins the entire huge page size.
It seems like you can just call io_buffer_account_pin()
instead.

On that note, you shouldn't duplicate code in either case,
just treat the normal discontig pages case as folios of
shift=PAGE_SHIFT.

Either just plain reuse or adjust io_buffer_account_pin()
instead of io_coalesced_buffer_account_pin().
io_coalesced_imu_alloc() should also go away.

io_sqe_buffer_register() {
	struct io_imu_folio_data data;

	if (!io_sqe_buffer_try_coalesce(pages, folio_data)) {
		folio_data.shift = PAGE_SHIFT;
		...
	}
	
	io_buffer_account_pin(pages, &data);
	imu->data = uaddr;
	...
}

> +	}
> +
> +	if (!imu->acct_pages)
> +		return 0;
> +
> +	ret = io_account_mem(ctx, imu->acct_pages);
> +	if (!ret)
> +		return 0;
> +	imu->acct_pages = 0;
> +	return ret;
> +}
> +
>   static bool __io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
>   					 struct io_imu_folio_data *data)
>   {
> @@ -949,6 +988,56 @@ static bool io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
>   	return true;
>   }
>   
> +static int io_coalesced_imu_alloc(struct io_ring_ctx *ctx, struct iovec *iov,
> +				  struct io_mapped_ubuf **pimu,
> +				  struct page **last_hpage, struct page **pages,
> +				  struct io_imu_folio_data *data)
> +{
> +	struct io_mapped_ubuf *imu = NULL;
> +	unsigned long off;
> +	size_t size, vec_len;
> +	int ret, i, j;
> +
> +	ret = -ENOMEM;
> +	imu = kvmalloc(struct_size(imu, bvec, data->nr_folios), GFP_KERNEL);
> +	if (!imu)
> +		return ret;
> +
> +	ret = io_coalesced_buffer_account_pin(ctx, pages, imu, last_hpage,
> +						data);
> +	if (ret) {
> +		unpin_user_page(pages[0]);
> +		j = data->nr_pages_head;
> +		for (i = 1; i < data->nr_folios; i++) {
> +			unpin_user_page(pages[j]);
> +			j += data->nr_pages_mid;
> +		}
> +		return ret;
> +	}
> +	off = (unsigned long) iov->iov_base & ~PAGE_MASK;
> +	size = iov->iov_len;
> +	/* store original address for later verification */
> +	imu->ubuf = (unsigned long) iov->iov_base;
> +	imu->ubuf_end = imu->ubuf + iov->iov_len;
> +	imu->nr_bvecs = data->nr_folios;
> +	imu->folio_shift = data->folio_shift;
> +	imu->folio_mask = ~((1UL << data->folio_shift) - 1);
> +	*pimu = imu;
> +	ret = 0;
> +
> +	vec_len = min_t(size_t, size, PAGE_SIZE * data->nr_pages_head - off);
> +	bvec_set_page(&imu->bvec[0], pages[0], vec_len, off);
> +	size -= vec_len;
> +	j = data->nr_pages_head;
> +	for (i = 1; i < data->nr_folios; i++) {
> +		vec_len = min_t(size_t, size, data->folio_size);
> +		bvec_set_page(&imu->bvec[i], pages[j], vec_len, 0);
> +		size -= vec_len;
> +		j += data->nr_pages_mid;
> +	}
> +	return ret;
> +}
> +
>   static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
>   				  struct io_mapped_ubuf **pimu,
>   				  struct page **last_hpage)

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: [PATCH v4 1/4] io_uring/rsrc: add hugepage buffer coalesce helpers
@ 2024-06-16 18:04 Pavel Begunkov
       [not found] ` <CGME20240617031218epcas5p4f706f53094ed8650a2b59b2006120956@epcas5p4.samsung.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Pavel Begunkov @ 2024-06-16 18:04 UTC (permalink / raw)
  To: Chenliang Li, axboe
  Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On 5/14/24 08:54, Chenliang Li wrote:
> Introduce helper functions to check whether a buffer can
> be coalesced or not, and gather folio data for later use.
> 
> The coalescing optimizes time and space consumption caused
> by mapping and storing multi-hugepage fixed buffers.
> 
> A coalescable multi-hugepage buffer should fully cover its folios
> (except potentially the first and last one), and these folios should
> have the same size. These requirements are for easier later process,
> also we need same size'd chunks in io_import_fixed for fast iov_iter
> adjust.
> 
> Signed-off-by: Chenliang Li <[email protected]>
> ---
>   io_uring/rsrc.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
>   io_uring/rsrc.h | 10 +++++++
>   2 files changed, 88 insertions(+)
> 
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 65417c9553b1..d08224c0c5b0 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -871,6 +871,84 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
>   	return ret;
>   }
>   
> +static bool __io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
> +					 struct io_imu_folio_data *data)

io_can_coalesce_buffer(), you're not actually trying to
do it here.

> +{
> +	struct folio *folio = page_folio(pages[0]);
> +	unsigned int count = 1;
> +	int i;
> +
> +	data->nr_pages_mid = folio_nr_pages(folio);
> +	if (data->nr_pages_mid == 1)
> +		return false;
> +
> +	data->folio_shift = folio_shift(folio);
> +	data->folio_size = folio_size(folio);
> +	data->nr_folios = 1;
> +	/*
> +	 * Check if pages are contiguous inside a folio, and all folios have
> +	 * the same page count except for the head and tail.
> +	 */
> +	for (i = 1; i < nr_pages; i++) {
> +		if (page_folio(pages[i]) == folio &&
> +			pages[i] == pages[i-1] + 1) {
> +			count++;
> +			continue;
> +		}
> +
> +		if (data->nr_folios == 1)
> +			data->nr_pages_head = count;
> +		else if (count != data->nr_pages_mid)
> +			return false;
> +
> +		folio = page_folio(pages[i]);
> +		if (folio_size(folio) != data->folio_size)
> +			return false;
> +
> +		count = 1;
> +		data->nr_folios++;
> +	}
> +	if (data->nr_folios == 1)
> +		data->nr_pages_head = count;
> +
> +	return true;
> +}
> +
> +static bool io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
> +				       struct io_imu_folio_data *data)
> +{
> +	int i, j;
> +
> +	if (nr_pages <= 1 ||
> +		!__io_sqe_buffer_try_coalesce(pages, nr_pages, data))
> +		return false;
> +
> +	/*
> +	 * The pages are bound to the folio, it doesn't
> +	 * actually unpin them but drops all but one reference,
> +	 * which is usually put down by io_buffer_unmap().
> +	 * Note, needs a better helper.
> +	 */
> +	if (data->nr_pages_head > 1)
> +		unpin_user_pages(&pages[1], data->nr_pages_head - 1);

Should be pages[0]. page[1] can be in another folio, and even
though data->nr_pages_head > 1 protects against touching it,
it's still flimsy.

> +
> +	j = data->nr_pages_head;
> +	nr_pages -= data->nr_pages_head;
> +	for (i = 1; i < data->nr_folios; i++) {
> +		unsigned int nr_unpin;
> +
> +		nr_unpin = min_t(unsigned int, nr_pages - 1,
> +					data->nr_pages_mid - 1);
> +		if (nr_unpin == 0)
> +			break;
> +		unpin_user_pages(&pages[j+1], nr_unpin);

same

> +		j += data->nr_pages_mid;

And instead of duplicating this voodoo iteration later,
please just assemble a new compacted ->nr_folios sized
page array.

> +		nr_pages -= data->nr_pages_mid;
> +	}
> +
> +	return true;
> +}
> +
>   static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
>   				  struct io_mapped_ubuf **pimu,
>   				  struct page **last_hpage)
> diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
> index c032ca3436ca..b2a9d66b76dd 100644
> --- a/io_uring/rsrc.h
> +++ b/io_uring/rsrc.h
> @@ -50,6 +50,16 @@ struct io_mapped_ubuf {
>   	struct bio_vec	bvec[] __counted_by(nr_bvecs);
>   };
>   
> +struct io_imu_folio_data {
> +	/* Head folio can be partially included in the fixed buf */
> +	unsigned int	nr_pages_head;
> +	/* For non-head/tail folios, has to be fully included */
> +	unsigned int	nr_pages_mid;
> +	unsigned int	nr_folios;
> +	unsigned int	folio_shift;
> +	size_t		folio_size;
> +};
> +
>   void io_rsrc_node_ref_zero(struct io_rsrc_node *node);
>   void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *ref_node);
>   struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx);

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2024-06-17 12:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CGME20240511055242epcas5p46612dde17997c140232207540e789a2e@epcas5p4.samsung.com>
2024-05-11  5:52 ` [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers Chenliang Li
     [not found]   ` <CGME20240511055243epcas5p291fc5f72baf211a79475ec36682e170d@epcas5p2.samsung.com>
2024-05-11  5:52     ` [PATCH v2 1/4] io_uring/rsrc: add hugepage buffer coalesce helpers Chenliang Li
2024-05-11 16:43       ` Jens Axboe
     [not found]   ` <CGME20240511055245epcas5p407cdbc005fb5f0fe2d9bbb8da423ff28@epcas5p4.samsung.com>
2024-05-11  5:52     ` [PATCH v2 2/4] io_uring/rsrc: store folio shift and mask into imu Chenliang Li
     [not found]   ` <CGME20240511055247epcas5p2a54e23b6dddd11dda962733d259a10af@epcas5p2.samsung.com>
2024-05-11  5:52     ` [PATCH v2 3/4] io_uring/rsrc: add init and account functions for coalesced imus Chenliang Li
2024-05-11 16:48       ` Jens Axboe
     [not found]         ` <CGME20240513021656epcas5p2367b442e02b07e6405b857f98a4eff44@epcas5p2.samsung.com>
2024-05-13  2:16           ` Chenliang Li
     [not found]   ` <CGME20240511055248epcas5p287b7dfdab3162033744badc71fd084e1@epcas5p2.samsung.com>
2024-05-11  5:52     ` [PATCH v2 4/4] io_uring/rsrc: enable multi-hugepage buffer coalescing Chenliang Li
2024-05-11 16:49       ` Jens Axboe
2024-05-11 16:43   ` [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers Jens Axboe
     [not found]     ` <CGME20240513020155epcas5p23699782b97749bfcce0511ce5378df3c@epcas5p2.samsung.com>
2024-05-13  2:01       ` Chenliang Li
2024-06-16 17:43 [PATCH v4 3/4] io_uring/rsrc: add init and account functions for coalesced imus Pavel Begunkov
     [not found] ` <CGME20240617031611epcas5p26e5c5f65a182af069427b1609f01d1d0@epcas5p2.samsung.com>
2024-06-17  3:16   ` [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers Chenliang Li
2024-06-17 12:22     ` Pavel Begunkov
  -- strict thread matches above, loose matches on Subject: below --
2024-06-16 18:04 [PATCH v4 1/4] io_uring/rsrc: add hugepage buffer coalesce helpers Pavel Begunkov
     [not found] ` <CGME20240617031218epcas5p4f706f53094ed8650a2b59b2006120956@epcas5p4.samsung.com>
2024-06-17  3:12   ` [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers Chenliang Li
2024-06-17 12:38     ` Pavel Begunkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox