* [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers

From: Chenliang Li @ 2024-05-11  5:52 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev, Chenliang Li

Registered buffers are stored and processed in the form of a bvec array;
each bvec element typically points to a PAGE_SIZE page, but can also work
with hugepages. Specifically, a buffer consisting of a hugepage is
coalesced to use only one hugepage bvec entry during registration. This
coalescing feature helps to save both space and DMA-mapping time.

However, the coalescing feature currently doesn't work for multi-hugepage
buffers. For a buffer with several 2M hugepages, we still split it into
thousands of 4K page bvec entries, while in fact we could just use a
handful of hugepage bvecs.

This patch series enables coalescing registered buffers that span more
than one hugepage. It optimizes the DMA-mapping time and saves memory for
these kinds of buffers.

Perf diff of an 8M (4 * 2M) hugepage fixed-buffer fio test:

  fio/t/io_uring -d64 -s32 -c32 -b8388608 -p0 -B1 -F0 -n1 -O1 -r10 \
      -R1 /dev/nvme0n1

  Before    After     Symbol
  5.90%               [k] __blk_rq_map_sg
  3.70%               [k] dma_direct_map_sg
  3.07%               [k] dma_pool_alloc
  1.12%               [k] sg_next
            +0.44%    [k] dma_map_page_attrs

The first three patches prepare for adding multi-hugepage coalescing to
buffer registration; the 4th patch enables the feature.
-----------------
Changes since v1:
- Split into 4 patches
- Fix code style issues
- Rearrange the code changes for a cleaner look
- Add a specialized pinned-page accounting procedure for coalesced buffers
- Reorder the newly added fields in the imu struct for better compaction

v1: https://lore.kernel.org/io-uring/[email protected]/T/#u

Chenliang Li (4):
  io_uring/rsrc: add hugepage buffer coalesce helpers
  io_uring/rsrc: store folio shift and mask into imu
  io_uring/rsrc: add init and account functions for coalesced imus
  io_uring/rsrc: enable multi-hugepage buffer coalescing

 io_uring/rsrc.c | 214 +++++++++++++++++++++++++++++++++++++++---------
 io_uring/rsrc.h |  12 +++
 2 files changed, 188 insertions(+), 38 deletions(-)

base-commit: 59b28a6e37e650c0d601ed87875b6217140cda5d
--
2.34.1

^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/4] io_uring/rsrc: add hugepage buffer coalesce helpers
2024-05-11  5:52 ` Chenliang Li
2024-05-11 16:43   ` Jens Axboe

From: Chenliang Li @ 2024-05-11  5:52 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev, Chenliang Li

This patch introduces helper functions to check whether a buffer can
be coalesced or not, and gather folio data for later use. The
coalescing reduces the time and space consumed by mapping and storing
multi-hugepage fixed buffers.

A coalescable multi-hugepage buffer should fully cover its folios
(except potentially the first and last one), and those folios should
all have the same size. These requirements make later processing
easier, and we also need same-sized chunks in io_import_fixed for fast
iov_iter adjustment.

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++
 io_uring/rsrc.h | 10 +++++++
 2 files changed, 88 insertions(+)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 65417c9553b1..d08224c0c5b0 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -871,6 +871,84 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
 	return ret;
 }

+static bool __io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
+					 struct io_imu_folio_data *data)
+{
+	struct folio *folio = page_folio(pages[0]);
+	unsigned int count = 1;
+	int i;
+
+	data->nr_pages_mid = folio_nr_pages(folio);
+	if (data->nr_pages_mid == 1)
+		return false;
+
+	data->folio_shift = folio_shift(folio);
+	data->folio_size = folio_size(folio);
+	data->nr_folios = 1;
+	/*
+	 * Check if pages are contiguous inside a folio, and all folios have
+	 * the same page count except for the head and tail.
+	 */
+	for (i = 1; i < nr_pages; i++) {
+		if (page_folio(pages[i]) == folio &&
+		    pages[i] == pages[i - 1] + 1) {
+			count++;
+			continue;
+		}
+
+		if (data->nr_folios == 1)
+			data->nr_pages_head = count;
+		else if (count != data->nr_pages_mid)
+			return false;
+
+		folio = page_folio(pages[i]);
+		if (folio_size(folio) != data->folio_size)
+			return false;
+
+		count = 1;
+		data->nr_folios++;
+	}
+	if (data->nr_folios == 1)
+		data->nr_pages_head = count;
+
+	return true;
+}
+
+static bool io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
+				       struct io_imu_folio_data *data)
+{
+	int i, j;
+
+	if (nr_pages <= 1 ||
+	    !__io_sqe_buffer_try_coalesce(pages, nr_pages, data))
+		return false;
+
+	/*
+	 * The pages are bound to the folio, it doesn't
+	 * actually unpin them but drops all but one reference,
+	 * which is usually put down by io_buffer_unmap().
+	 * Note, needs a better helper.
+	 */
+	if (data->nr_pages_head > 1)
+		unpin_user_pages(&pages[1], data->nr_pages_head - 1);
+
+	j = data->nr_pages_head;
+	nr_pages -= data->nr_pages_head;
+	for (i = 1; i < data->nr_folios; i++) {
+		unsigned int nr_unpin;
+
+		nr_unpin = min_t(unsigned int, nr_pages - 1,
+				 data->nr_pages_mid - 1);
+		if (nr_unpin == 0)
+			break;
+		unpin_user_pages(&pages[j + 1], nr_unpin);
+		j += data->nr_pages_mid;
+		nr_pages -= data->nr_pages_mid;
+	}
+
+	return true;
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 				  struct io_mapped_ubuf **pimu,
 				  struct page **last_hpage)

diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index c032ca3436ca..b2a9d66b76dd 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -50,6 +50,16 @@ struct io_mapped_ubuf {
 	struct bio_vec bvec[] __counted_by(nr_bvecs);
 };

+struct io_imu_folio_data {
+	/* Head folio can be partially included in the fixed buf */
+	unsigned int	nr_pages_head;
+	/* For non-head/tail folios, has to be fully included */
+	unsigned int	nr_pages_mid;
+	unsigned int	nr_folios;
+	unsigned int	folio_shift;
+	size_t		folio_size;
+};
+
 void
io_rsrc_node_ref_zero(struct io_rsrc_node *node);
void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *ref_node);
struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx);
--
2.34.1
* Re: [PATCH v2 1/4] io_uring/rsrc: add hugepage buffer coalesce helpers
2024-05-11 16:43 ` Jens Axboe

From: Jens Axboe @ 2024-05-11 16:43 UTC
To: Chenliang Li, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev

On 5/10/24 11:52 PM, Chenliang Li wrote:
> This patch introduces helper functions to check whether a buffer can
> be coalesced or not, and gather folio data for later use.

Introduce helper functions to check whether a buffer can be coalesced
or not, and gather folio data for later use.

--
Jens Axboe
* [PATCH v2 2/4] io_uring/rsrc: store folio shift and mask into imu
2024-05-11  5:52 ` Chenliang Li

From: Chenliang Li @ 2024-05-11  5:52 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev, Chenliang Li

Store the folio shift and folio mask in the imu struct and use them in
the iov_iter adjustment, as we will have non-PAGE_SIZE'd chunks if a
multi-hugepage buffer gets coalesced.

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 6 ++++--
 io_uring/rsrc.h | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d08224c0c5b0..578d382ca9bc 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1015,6 +1015,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	imu->ubuf = (unsigned long) iov->iov_base;
 	imu->ubuf_end = imu->ubuf + iov->iov_len;
 	imu->nr_bvecs = nr_pages;
+	imu->folio_shift = PAGE_SHIFT;
+	imu->folio_mask = PAGE_MASK;
 	*pimu = imu;
 	ret = 0;
@@ -1153,12 +1155,12 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 		/* skip first vec */
 		offset -= bvec->bv_len;
-		seg_skip = 1 + (offset >> PAGE_SHIFT);
+		seg_skip = 1 + (offset >> imu->folio_shift);

 		iter->bvec = bvec + seg_skip;
 		iter->nr_segs -= seg_skip;
 		iter->count -= bvec->bv_len + offset;
-		iter->iov_offset = offset & ~PAGE_MASK;
+		iter->iov_offset = offset & ~imu->folio_mask;
 	}
 }

diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index b2a9d66b76dd..93da02e652bc 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -46,7 +46,9 @@ struct io_mapped_ubuf {
 	u64		ubuf;
 	u64		ubuf_end;
 	unsigned int	nr_bvecs;
+	unsigned int	folio_shift;
 	unsigned long	acct_pages;
+	unsigned long	folio_mask;
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };
--
2.34.1
* [PATCH v2 3/4] io_uring/rsrc: add init and account functions for coalesced imus
2024-05-11  5:52 ` Chenliang Li
2024-05-11 16:48   ` Jens Axboe

From: Chenliang Li @ 2024-05-11  5:52 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev, Chenliang Li

This patch depends on patch 1 and 2. Introduces two functions to
separate the coalesced imu alloc and accounting path from the original
one. This helps to keep the original code path clean.

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 86 insertions(+)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 578d382ca9bc..7f95eba72f1c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -871,6 +871,42 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
 	return ret;
 }

+static int io_coalesced_buffer_account_pin(struct io_ring_ctx *ctx,
+					   struct page **pages,
+					   struct io_mapped_ubuf *imu,
+					   struct page **last_hpage,
+					   struct io_imu_folio_data *data)
+{
+	int i, j, ret;
+
+	imu->acct_pages = 0;
+	j = 0;
+	for (i = 0; i < data->nr_folios; i++) {
+		struct page *hpage = pages[j];
+
+		if (hpage == *last_hpage)
+			continue;
+		*last_hpage = hpage;
+		/*
+		 * Already checked the page array in try coalesce,
+		 * so pass in nr_pages=0 here to waive that.
+		 */
+		if (headpage_already_acct(ctx, pages, 0, hpage))
+			continue;
+		imu->acct_pages += data->nr_pages_mid;
+		j += (i == 0) ?
+			data->nr_pages_head : data->nr_pages_mid;
+	}
+
+	if (!imu->acct_pages)
+		return 0;
+
+	ret = io_account_mem(ctx, imu->acct_pages);
+	if (ret)
+		imu->acct_pages = 0;
+	return ret;
+}
+
 static bool __io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
 					 struct io_imu_folio_data *data)
 {
@@ -949,6 +985,56 @@ static bool io_sqe_buffer_try_coalesce(struct page **pages, int nr_pages,
 	return true;
 }

+static int io_coalesced_imu_alloc(struct io_ring_ctx *ctx, struct iovec *iov,
+				  struct io_mapped_ubuf **pimu,
+				  struct page **last_hpage, struct page **pages,
+				  struct io_imu_folio_data *data)
+{
+	struct io_mapped_ubuf *imu = NULL;
+	unsigned long off;
+	size_t size, vec_len;
+	int ret, i, j;
+
+	ret = -ENOMEM;
+	imu = kvmalloc(struct_size(imu, bvec, data->nr_folios), GFP_KERNEL);
+	if (!imu)
+		return ret;
+
+	ret = io_coalesced_buffer_account_pin(ctx, pages, imu, last_hpage,
+					      data);
+	if (ret) {
+		j = 0;
+		for (i = 0; i < data->nr_folios; i++) {
+			unpin_user_page(pages[j]);
+			j += (i == 0) ?
+				data->nr_pages_head : data->nr_pages_mid;
+		}
+		return ret;
+	}
+	off = (unsigned long) iov->iov_base & ~PAGE_MASK;
+	size = iov->iov_len;
+	/* store original address for later verification */
+	imu->ubuf = (unsigned long) iov->iov_base;
+	imu->ubuf_end = imu->ubuf + iov->iov_len;
+	imu->nr_bvecs = data->nr_folios;
+	imu->folio_shift = data->folio_shift;
+	imu->folio_mask = ~((1UL << data->folio_shift) - 1);
+	*pimu = imu;
+	ret = 0;
+
+	vec_len = min_t(size_t, size, PAGE_SIZE * data->nr_pages_head - off);
+	bvec_set_page(&imu->bvec[0], pages[0], vec_len, off);
+	size -= vec_len;
+	j = data->nr_pages_head;
+	for (i = 1; i < data->nr_folios; i++) {
+		vec_len = min_t(size_t, size, data->folio_size);
+		bvec_set_page(&imu->bvec[i], pages[j], vec_len, 0);
+		size -= vec_len;
+		j += data->nr_pages_mid;
+	}
+	return ret;
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 				  struct io_mapped_ubuf **pimu,
 				  struct page **last_hpage)
--
2.34.1
* Re: [PATCH v2 3/4] io_uring/rsrc: add init and account functions for coalesced imus
2024-05-11 16:48 ` Jens Axboe

From: Jens Axboe @ 2024-05-11 16:48 UTC
To: Chenliang Li, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev

On 5/10/24 11:52 PM, Chenliang Li wrote:
> This patch depends on patch 1 and 2.

What does "patch 1 and 2" mean here, once it's in the git log? It
doesn't really mean anything. It's quite natural for patches in a series
to have dependencies on each other, eg patch 3 requires 1 and 2.
Highlighting it doesn't really add anything, so just get rid of that.

> Introduces two functions to separate the coalesced imu alloc and
> accounting path from the original one. This helps to keep the original
> code path clean.
>
> Signed-off-by: Chenliang Li <[email protected]>
> ---
>  io_uring/rsrc.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 86 insertions(+)
>
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 578d382ca9bc..7f95eba72f1c 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -871,6 +871,42 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
>  	return ret;
>  }
>
> +static int io_coalesced_buffer_account_pin(struct io_ring_ctx *ctx,
> +					   struct page **pages,
> +					   struct io_mapped_ubuf *imu,
> +					   struct page **last_hpage,
> +					   struct io_imu_folio_data *data)
> +{
> +	int i, j, ret;
> +
> +	imu->acct_pages = 0;
> +	j = 0;
> +	for (i = 0; i < data->nr_folios; i++) {
> +		struct page *hpage = pages[j];
> +
> +		if (hpage == *last_hpage)
> +			continue;
> +		*last_hpage = hpage;
> +		/*
> +		 * Already checked the page array in try coalesce,
> +		 * so pass in nr_pages=0 here to waive that.
> +		 */
> +		if (headpage_already_acct(ctx, pages, 0, hpage))
> +			continue;
> +		imu->acct_pages += data->nr_pages_mid;
> +		j += (i == 0) ?
> +			data->nr_pages_head : data->nr_pages_mid;

Can we just initialize 'j' to data->nr_pages_head and change this to be:

	if (i)
		j += data->nr_pages_mid;

That would be a lot cleaner.

> +	if (!imu->acct_pages)
> +		return 0;
> +
> +	ret = io_account_mem(ctx, imu->acct_pages);
> +	if (ret)
> +		imu->acct_pages = 0;
> +	return ret;
> +}

	ret = io_account_mem(ctx, imu->acct_pages);
	if (!ret)
		return 0;
	imu->acct_pages = 0;
	return ret;

> +				  struct io_mapped_ubuf **pimu,
> +				  struct page **last_hpage, struct page **pages,
> +				  struct io_imu_folio_data *data)
> +{
> +	struct io_mapped_ubuf *imu = NULL;
> +	unsigned long off;
> +	size_t size, vec_len;
> +	int ret, i, j;
> +
> +	ret = -ENOMEM;
> +	imu = kvmalloc(struct_size(imu, bvec, data->nr_folios), GFP_KERNEL);
> +	if (!imu)
> +		return ret;
> +
> +	ret = io_coalesced_buffer_account_pin(ctx, pages, imu, last_hpage,
> +					      data);
> +	if (ret) {
> +		j = 0;
> +		for (i = 0; i < data->nr_folios; i++) {
> +			unpin_user_page(pages[j]);
> +			j += (i == 0) ?
> +				data->nr_pages_head : data->nr_pages_mid;
> +		}
> +		return ret;

Same comment here.

--
Jens Axboe
* Re: [PATCH v2 3/4] io_uring/rsrc: add init and account functions for coalesced imus
2024-05-13  2:16 ` Chenliang Li

From: Chenliang Li @ 2024-05-13  2:16 UTC
To: axboe
Cc: asml.silence, cliang01.li, gost.dev, io-uring, joshi.k, kundan.kumar, peiwei.li

On Sat, 11 May 2024 10:48:18 -0600, Jens Axboe wrote:
> On 5/10/24 11:52 PM, Chenliang Li wrote:
>> This patch depends on patch 1 and 2.
> What does "patch 1 and 2" mean here, once it's in the git log? It
> doesn't really mean anything. It's quite natural for patches in a series
> to have dependencies on each other, eg patch 3 requires 1 and 2.
> Highlighting it doesn't really add anything, so just get rid of that.

Will delete that in V3.

>> +static int io_coalesced_buffer_account_pin(struct io_ring_ctx *ctx,
>> +					   struct page **pages,
>> +					   struct io_mapped_ubuf *imu,
>> +					   struct page **last_hpage,
>> +					   struct io_imu_folio_data *data)
>> +{
>> +	int i, j, ret;
>> +
>> +	imu->acct_pages = 0;
>> +	j = 0;
>> +	for (i = 0; i < data->nr_folios; i++) {
>> +		struct page *hpage = pages[j];
>> +
>> +		if (hpage == *last_hpage)
>> +			continue;
>> +		*last_hpage = hpage;
>> +		/*
>> +		 * Already checked the page array in try coalesce,
>> +		 * so pass in nr_pages=0 here to waive that.
>> +		 */
>> +		if (headpage_already_acct(ctx, pages, 0, hpage))
>> +			continue;
>> +		imu->acct_pages += data->nr_pages_mid;
>> +		j += (i == 0) ?
>> +			data->nr_pages_head : data->nr_pages_mid;
>
> Can we just initialize 'j' to data->nr_pages_head and change this to be:
>
> 	if (i)
> 		j += data->nr_pages_mid;

Yes, will change it in V3.

>> +	if (!imu->acct_pages)
>> +		return 0;
>> +
>> +	ret = io_account_mem(ctx, imu->acct_pages);
>> +	if (ret)
>> +		imu->acct_pages = 0;
>> +	return ret;
>> +}
>
> 	ret = io_account_mem(ctx, imu->acct_pages);
> 	if (!ret)
> 		return 0;
> 	imu->acct_pages = 0;
> 	return ret;

Will change it.

>> +	if (ret) {
>> +		j = 0;
>> +		for (i = 0; i < data->nr_folios; i++) {
>> +			unpin_user_page(pages[j]);
>> +			j += (i == 0) ?
>> +				data->nr_pages_head : data->nr_pages_mid;
>> +		}
>> +		return ret;
>
> Same comment here.

Will change it.
* [PATCH v2 4/4] io_uring/rsrc: enable multi-hugepage buffer coalescing
2024-05-11  5:52 ` Chenliang Li
2024-05-11 16:49   ` Jens Axboe

From: Chenliang Li @ 2024-05-11  5:52 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev, Chenliang Li

This patch depends on patch 1, 2, 3. It modifies the original buffer
registration path to expand the one-hugepage coalescing feature to
work with multi-hugepage buffers. Separated from previous patches to
make it more easily reviewed.

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 44 ++++++++------------------------------------
 1 file changed, 8 insertions(+), 36 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 7f95eba72f1c..70acc76ff27c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1044,7 +1044,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	unsigned long off;
 	size_t size;
 	int ret, nr_pages, i;
-	struct folio *folio = NULL;
+	struct io_imu_folio_data data;

 	*pimu = (struct io_mapped_ubuf *)&dummy_ubuf;
 	if (!iov->iov_base)
@@ -1059,30 +1059,11 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		goto done;
 	}

-	/* If it's a huge page, try to coalesce them into a single bvec entry */
-	if (nr_pages > 1) {
-		folio = page_folio(pages[0]);
-		for (i = 1; i < nr_pages; i++) {
-			/*
-			 * Pages must be consecutive and on the same folio for
-			 * this to work
-			 */
-			if (page_folio(pages[i]) != folio ||
-			    pages[i] != pages[i - 1] + 1) {
-				folio = NULL;
-				break;
-			}
-		}
-		if (folio) {
-			/*
-			 * The pages are bound to the folio, it doesn't
-			 * actually unpin them but drops all but one reference,
-			 * which is usually put down by io_buffer_unmap().
-			 * Note, needs a better helper.
-			 */
-			unpin_user_pages(&pages[1], nr_pages - 1);
-			nr_pages = 1;
-		}
+	/* If it's huge page(s), try to coalesce them into fewer bvec entries */
+	if (io_sqe_buffer_try_coalesce(pages, nr_pages, &data)) {
+		ret = io_coalesced_imu_alloc(ctx, iov, pimu, last_hpage,
+					     pages, &data);
+		goto done;
 	}

 	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
@@ -1106,10 +1087,6 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	*pimu = imu;
 	ret = 0;

-	if (folio) {
-		bvec_set_page(&imu->bvec[0], pages[0], size, off);
-		goto done;
-	}
 	for (i = 0; i < nr_pages; i++) {
 		size_t vec_len;

@@ -1215,23 +1192,18 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 	 * we know that:
 	 *
 	 * 1) it's a BVEC iter, we set it up
-	 * 2) all bvecs are PAGE_SIZE in size, except potentially the
+	 * 2) all bvecs are the same in size, except potentially the
 	 *    first and last bvec
 	 *
 	 * So just find our index, and adjust the iterator afterwards.
 	 * If the offset is within the first bvec (or the whole first
 	 * bvec, just use iov_iter_advance(). This makes it easier
 	 * since we can just skip the first segment, which may not
-	 * be PAGE_SIZE aligned.
+	 * be folio_size aligned.
 	 */
 	const struct bio_vec *bvec = imu->bvec;

 	if (offset < bvec->bv_len) {
-		/*
-		 * Note, huge pages buffers consists of one large
-		 * bvec entry and should always go this way. The other
-		 * branch doesn't expect non PAGE_SIZE'd chunks.
-		 */
 		iter->bvec = bvec;
 		iter->nr_segs = bvec->bv_len;
 		iter->count -= offset;
--
2.34.1
* Re: [PATCH v2 4/4] io_uring/rsrc: enable multi-hugepage buffer coalescing
2024-05-11 16:49 ` Jens Axboe

From: Jens Axboe @ 2024-05-11 16:49 UTC
To: Chenliang Li, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev

On 5/10/24 11:52 PM, Chenliang Li wrote:
> This patch depends on patch 1, 2, 3.

Same comment.

> It modifies the original buffer

Modify the original buffer

--
Jens Axboe
* Re: [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers
2024-05-11 16:43 ` Jens Axboe

From: Jens Axboe @ 2024-05-11 16:43 UTC
To: Chenliang Li, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, gost.dev

On 5/10/24 11:52 PM, Chenliang Li wrote:
> Registered buffers are stored and processed in the form of bvec array,
> each bvec element typically points to a PAGE_SIZE page but can also work
> with hugepages. Specifically, a buffer consisting of a hugepage is
> coalesced to use only one hugepage bvec entry during registration.
> This coalescing feature helps to save both the space and DMA-mapping time.
>
> However, currently the coalescing feature doesn't work for multi-hugepage
> buffers. For a buffer with several 2M hugepages, we still split it into
> thousands of 4K page bvec entries while in fact, we can just use a
> handful of hugepage bvecs.
>
> This patch series enables coalescing registered buffers with more than
> one hugepages. It optimizes the DMA-mapping time and saves memory for
> these kind of buffers.

This series looks much better. Do you have a stand-alone test case for
this? We should have that in liburing. Then we can also augment it with
edge cases to ensure this is all safe and sound.

--
Jens Axboe
* Re: [PATCH v2 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers
2024-05-13  2:01 ` Chenliang Li

From: Chenliang Li @ 2024-05-13  2:01 UTC
To: axboe
Cc: asml.silence, cliang01.li, gost.dev, io-uring, joshi.k, kundan.kumar, peiwei.li

On 2024-05-11 16:43, Jens Axboe wrote:
> On 5/10/24 11:52 PM, Chenliang Li wrote:
>> Registered buffers are stored and processed in the form of bvec array,
>> each bvec element typically points to a PAGE_SIZE page but can also work
>> with hugepages. Specifically, a buffer consisting of a hugepage is
>> coalesced to use only one hugepage bvec entry during registration.
>> This coalescing feature helps to save both the space and DMA-mapping time.
>>
>> However, currently the coalescing feature doesn't work for multi-hugepage
>> buffers. For a buffer with several 2M hugepages, we still split it into
>> thousands of 4K page bvec entries while in fact, we can just use a
>> handful of hugepage bvecs.
>>
>> This patch series enables coalescing registered buffers with more than
>> one hugepages. It optimizes the DMA-mapping time and saves memory for
>> these kind of buffers.
> This series looks much better. Do you have a stand-alone test case
> for this? We should have that in liburing. Then we can also augment it
> with edge cases to ensure this is all safe and sound.

Thanks! Yes, I have a liburing test case, will send it as a patch in V3.