* [PATCH v8 0/2] io_uring/rsrc: coalescing multi-hugepage registered buffers
From: Chenliang Li @ 2024-07-31 9:01 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev, Chenliang Li

Registered buffers are stored and processed in the form of bvec array,
each bvec element typically points to a PAGE_SIZE page but can also work
with hugepages. Specifically, a buffer consisting of a hugepage is
coalesced to use only one hugepage bvec entry during registration.
This coalescing feature helps to save both the space and DMA-mapping time.

However, currently the coalescing feature doesn't work for multi-hugepage
buffers. For a buffer with several 2M hugepages, we still split it into
thousands of 4K page bvec entries while in fact, we can just use a
handful of hugepage bvecs.

This patch series enables coalescing registered buffers with more than
one hugepage. It optimizes the DMA-mapping time and saves memory for
these kinds of buffers.

Testing:

The hugepage fixed buffer I/O can be tested using fio without
modification. The fio command used in the following test is given
in [1]. There's also a liburing testcase in [2]. Also, the system
should have enough hugepages available before testing.

Perf diff of 8M(4 * 2M hugepages) fio randread test:

Before          After           Symbol
.....................................................
5.88%                           [k] __blk_rq_map_sg
3.98%           -3.95%          [k] dma_direct_map_sg
2.47%                           [k] dma_pool_alloc
1.37%           -1.36%          [k] sg_next
                +0.28%          [k] dma_map_page_attrs

Perf diff of 8M fio randwrite test:

Before          After           Symbol
......................................................
2.80%                           [k] __blk_rq_map_sg
1.74%                           [k] dma_direct_map_sg
1.61%                           [k] dma_pool_alloc
0.67%                           [k] sg_next
                +0.04%          [k] dma_map_page_attrs

The first patch prepares for adding the multi-hugepage coalescing
by storing folio_shift and folio_mask into imu, the 2nd patch
enables the feature.

---

Changes since v7:
- Rebase to io_uring-6.11

v7 : https://lore.kernel.org/io-uring/[email protected]/T/#t

Changes since v6:
- Remove the restriction on non-border-aligned single hugepage.
- Code style issue.

v6 : https://lore.kernel.org/io-uring/[email protected]/T/#t

[1]
fio -iodepth=64 -rw=randread -direct=1 -ioengine=io_uring \
-bs=8M -numjobs=1 -group_reporting -mem=shmhuge -fixedbufs -hugepage-size=2M \
-filename=/dev/nvme0n1 -runtime=10s -name=test1

[2]
https://lore.kernel.org/io-uring/[email protected]/T/#u

Chenliang Li (2):
  io_uring/rsrc: store folio shift and mask into imu
  io_uring/rsrc: enable multi-hugepage buffer coalescing

 io_uring/rsrc.c | 149 +++++++++++++++++++++++++++++++++++-------------
 io_uring/rsrc.h |  10 ++++
 2 files changed, 118 insertions(+), 41 deletions(-)

base-commit: c3fca4fb83f7c84cd1e1aa9fe3a0e220ce8f30fb
--
2.34.1
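The space saving described in the cover letter can be checked with a little arithmetic. The sketch below is a standalone userspace illustration, not code from the series: the helper name nr_bvecs_needed() and its parameters are hypothetical. It counts how many fixed-size chunks (and therefore bvec entries) are needed to describe a buffer, assuming every chunk is (1 << shift) bytes and the buffer starts `off` bytes into its first chunk.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only (not part of the patches).
 *
 * Returns the number of chunks of size (1 << shift) touched by a buffer
 * of `len` bytes that starts `off` bytes into its first chunk. One bvec
 * entry is needed per chunk.
 */
static size_t nr_bvecs_needed(size_t off, size_t len, unsigned int shift)
{
	size_t chunk = (size_t)1 << shift;

	/* The start offset must lie inside the first chunk. */
	assert(off < chunk);

	if (len == 0)
		return 0;

	/* Round the end of the buffer up to the next chunk boundary. */
	return (off + len + chunk - 1) >> shift;
}
```

For the 8M buffer used in the perf test, nr_bvecs_needed(0, 8UL << 20, 12) gives 2048 entries with 4K pages, while nr_bvecs_needed(0, 8UL << 20, 21) gives 4 entries with 2M hugepages, which matches the "thousands" versus "a handful" wording above.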
* [PATCH v8 1/2] io_uring/rsrc: store folio shift and mask into imu
From: Chenliang Li @ 2024-07-31 9:01 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev, Chenliang Li

Store the folio shift and folio mask into imu struct and use it in
iov_iter adjust, as we will have non PAGE_SIZE'd chunks if a
multi-hugepage buffer get coalesced.

Signed-off-by: Chenliang Li <[email protected]>
Reviewed-by: Anuj Gupta <[email protected]>
---
 io_uring/rsrc.c | 15 ++++++---------
 io_uring/rsrc.h |  2 ++
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index a860516bf448..64152dc6f293 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -915,6 +915,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	imu->ubuf = (unsigned long) iov->iov_base;
 	imu->ubuf_end = imu->ubuf + iov->iov_len;
 	imu->nr_bvecs = nr_pages;
+	imu->folio_shift = PAGE_SHIFT;
+	imu->folio_mask = PAGE_MASK;
 	*pimu = imu;
 	ret = 0;

@@ -1031,23 +1033,18 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 		 * we know that:
 		 *
 		 * 1) it's a BVEC iter, we set it up
-		 * 2) all bvecs are PAGE_SIZE in size, except potentially the
+		 * 2) all bvecs are the same in size, except potentially the
 		 *    first and last bvec
 		 *
 		 * So just find our index, and adjust the iterator afterwards.
 		 * If the offset is within the first bvec (or the whole first
 		 * bvec, just use iov_iter_advance(). This makes it easier
 		 * since we can just skip the first segment, which may not
-		 * be PAGE_SIZE aligned.
+		 * be folio_size aligned.
 		 */
 		const struct bio_vec *bvec = imu->bvec;

 		if (offset < bvec->bv_len) {
-			/*
-			 * Note, huge pages buffers consists of one large
-			 * bvec entry and should always go this way. The other
-			 * branch doesn't expect non PAGE_SIZE'd chunks.
-			 */
 			iter->bvec = bvec;
 			iter->count -= offset;
 			iter->iov_offset = offset;
@@ -1056,12 +1053,12 @@ int io_import_fixed(int ddir, struct iov_iter *iter,

 			/* skip first vec */
 			offset -= bvec->bv_len;
-			seg_skip = 1 + (offset >> PAGE_SHIFT);
+			seg_skip = 1 + (offset >> imu->folio_shift);

 			iter->bvec = bvec + seg_skip;
 			iter->nr_segs -= seg_skip;
 			iter->count -= bvec->bv_len + offset;
-			iter->iov_offset = offset & ~PAGE_MASK;
+			iter->iov_offset = offset & ~imu->folio_mask;
 		}
 	}

diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index c032ca3436ca..ee77e53328bf 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -46,7 +46,9 @@ struct io_mapped_ubuf {
 	u64		ubuf;
 	u64		ubuf_end;
 	unsigned int	nr_bvecs;
+	unsigned int	folio_shift;
 	unsigned long	acct_pages;
+	unsigned long	folio_mask;
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };

--
2.34.1
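The index arithmetic that this patch generalizes in io_import_fixed() can be tried out in userspace. The following is a minimal sketch, not kernel code: struct seg_pos and locate_offset() are hypothetical names, and the only assumption carried over from the patch is that every bvec after the first is exactly (1 << folio_shift) bytes long. Given the length of the first bvec and a byte offset into the buffer, it returns the bvec index and the offset inside that bvec, mirroring the seg_skip and iov_offset computation in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type, for illustration only. */
struct seg_pos {
	size_t seg;	/* index of the bvec that contains the offset */
	size_t seg_off;	/* byte offset inside that bvec */
};

/*
 * Hypothetical userspace model of the offset lookup. `first_len` is the
 * length of the first bvec, which may be shorter than a full folio when
 * the buffer does not start on a folio boundary.
 */
static struct seg_pos locate_offset(size_t first_len, size_t offset,
				    unsigned int folio_shift)
{
	size_t folio_size = (size_t)1 << folio_shift;
	size_t folio_mask = ~(folio_size - 1);
	struct seg_pos pos;

	/* The first bvec can never be longer than one folio. */
	assert(first_len <= folio_size);

	if (offset < first_len) {
		/* Offset falls inside the first (possibly partial) bvec. */
		pos.seg = 0;
		pos.seg_off = offset;
		return pos;
	}

	/* Skip the first bvec, then index by whole folios. */
	offset -= first_len;
	pos.seg = 1 + (offset >> folio_shift);
	pos.seg_off = offset & ~folio_mask;
	return pos;
}
```

As an example, take 2M folios (folio_shift of 21) and a buffer that starts 1M into its first hugepage, so first_len is 1M. An offset of 3M + 4096 lands in bvec 2 at offset 4096. Shifting by 12 instead, as the old PAGE_SHIFT code did, would yield index 514, which is why the shift and mask have to follow the folio size once buffers are coalesced.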
* Re: [PATCH v8 1/2] io_uring/rsrc: store folio shift and mask into imu
From: Pavel Begunkov @ 2024-07-31 23:09 UTC
To: Chenliang Li, axboe
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On 7/31/24 10:01, Chenliang Li wrote:
> Store the folio shift and folio mask into imu struct and use it in
> iov_iter adjust, as we will have non PAGE_SIZE'd chunks if a
> multi-hugepage buffer get coalesced.

Reviewed-by: Pavel Begunkov <[email protected]>

--
Pavel Begunkov
* [PATCH v8 2/2] io_uring/rsrc: enable multi-hugepage buffer coalescing
From: Chenliang Li @ 2024-07-31 9:01 UTC
To: axboe, asml.silence
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev, Chenliang Li

Add support for checking and coalescing multi-hugepage-backed fixed
buffers. The coalescing optimizes both time and space consumption caused
by mapping and storing multi-hugepage fixed buffers.

A coalescable multi-hugepage buffer should fully cover its folios
(except potentially the first and last one), and these folios should
have the same size. These requirements are for easier processing later,
also we need same size'd chunks in io_import_fixed for fast iov_iter
adjust.

Signed-off-by: Chenliang Li <[email protected]>
---
 io_uring/rsrc.c | 134 ++++++++++++++++++++++++++++++++++++------------
 io_uring/rsrc.h |   8 +++
 2 files changed, 110 insertions(+), 32 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 64152dc6f293..7d639a996f28 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -849,6 +849,98 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
 	return ret;
 }

+static bool io_do_coalesce_buffer(struct page ***pages, int *nr_pages,
+					 struct io_imu_folio_data *data, int nr_folios)
+{
+	struct page **page_array = *pages, **new_array = NULL;
+	int nr_pages_left = *nr_pages, i, j;
+
+	/* Store head pages only*/
+	new_array = kvmalloc_array(nr_folios, sizeof(struct page *),
+					GFP_KERNEL);
+	if (!new_array)
+		return false;
+
+	new_array[0] = compound_head(page_array[0]);
+	/*
+	 * The pages are bound to the folio, it doesn't
+	 * actually unpin them but drops all but one reference,
+	 * which is usually put down by io_buffer_unmap().
+	 * Note, needs a better helper.
+	 */
+	if (data->nr_pages_head > 1)
+		unpin_user_pages(&page_array[1], data->nr_pages_head - 1);
+
+	j = data->nr_pages_head;
+	nr_pages_left -= data->nr_pages_head;
+	for (i = 1; i < nr_folios; i++) {
+		unsigned int nr_unpin;
+
+		new_array[i] = page_array[j];
+		nr_unpin = min_t(unsigned int, nr_pages_left - 1,
+					data->nr_pages_mid - 1);
+		if (nr_unpin)
+			unpin_user_pages(&page_array[j+1], nr_unpin);
+		j += data->nr_pages_mid;
+		nr_pages_left -= data->nr_pages_mid;
+	}
+	kvfree(page_array);
+	*pages = new_array;
+	*nr_pages = nr_folios;
+	return true;
+}
+
+static bool io_try_coalesce_buffer(struct page ***pages, int *nr_pages,
+					 struct io_imu_folio_data *data)
+{
+	struct page **page_array = *pages;
+	struct folio *folio = page_folio(page_array[0]);
+	unsigned int count = 1, nr_folios = 1;
+	int i;
+
+	if (*nr_pages <= 1)
+		return false;
+
+	data->nr_pages_mid = folio_nr_pages(folio);
+	if (data->nr_pages_mid == 1)
+		return false;
+
+	data->folio_shift = folio_shift(folio);
+	/*
+	 * Check if pages are contiguous inside a folio, and all folios have
+	 * the same page count except for the head and tail.
+	 */
+	for (i = 1; i < *nr_pages; i++) {
+		if (page_folio(page_array[i]) == folio &&
+			page_array[i] == page_array[i-1] + 1) {
+			count++;
+			continue;
+		}
+
+		if (nr_folios == 1) {
+			if (folio_page_idx(folio, page_array[i-1]) !=
+				data->nr_pages_mid - 1)
+				return false;
+
+			data->nr_pages_head = count;
+		} else if (count != data->nr_pages_mid) {
+			return false;
+		}
+
+		folio = page_folio(page_array[i]);
+		if (folio_size(folio) != (1UL << data->folio_shift) ||
+			folio_page_idx(folio, page_array[i]) != 0)
+			return false;
+
+		count = 1;
+		nr_folios++;
+	}
+	if (nr_folios == 1)
+		data->nr_pages_head = count;
+
+	return io_do_coalesce_buffer(pages, nr_pages, data, nr_folios);
+}
+
 static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 				  struct io_mapped_ubuf **pimu,
 				  struct page **last_hpage)
@@ -858,7 +950,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	unsigned long off;
 	size_t size;
 	int ret, nr_pages, i;
-	struct folio *folio = NULL;
+	struct io_imu_folio_data data;
+	bool coalesced;

 	*pimu = (struct io_mapped_ubuf *)&dummy_ubuf;
 	if (!iov->iov_base)
@@ -873,31 +966,8 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		goto done;
 	}

-	/* If it's a huge page, try to coalesce them into a single bvec entry */
-	if (nr_pages > 1) {
-		folio = page_folio(pages[0]);
-		for (i = 1; i < nr_pages; i++) {
-			/*
-			 * Pages must be consecutive and on the same folio for
-			 * this to work
-			 */
-			if (page_folio(pages[i]) != folio ||
-			    pages[i] != pages[i - 1] + 1) {
-				folio = NULL;
-				break;
-			}
-		}
-		if (folio) {
-			/*
-			 * The pages are bound to the folio, it doesn't
-			 * actually unpin them but drops all but one reference,
-			 * which is usually put down by io_buffer_unmap().
-			 * Note, needs a better helper.
-			 */
-			unpin_user_pages(&pages[1], nr_pages - 1);
-			nr_pages = 1;
-		}
-	}
+	/* If it's huge page(s), try to coalesce them into fewer bvec entries */
+	coalesced = io_try_coalesce_buffer(&pages, &nr_pages, &data);

 	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
 	if (!imu)
@@ -909,7 +979,6 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 		goto done;
 	}

-	off = (unsigned long) iov->iov_base & ~PAGE_MASK;
 	size = iov->iov_len;
 	/* store original address for later verification */
 	imu->ubuf = (unsigned long) iov->iov_base;
@@ -917,17 +986,18 @@ static int io_sqe_buffer_register(struct io_ring_ctx *ctx, struct iovec *iov,
 	imu->nr_bvecs = nr_pages;
 	imu->folio_shift = PAGE_SHIFT;
 	imu->folio_mask = PAGE_MASK;
+	if (coalesced) {
+		imu->folio_shift = data.folio_shift;
+		imu->folio_mask = ~((1UL << data.folio_shift) - 1);
+	}
+	off = (unsigned long) iov->iov_base & ~imu->folio_mask;
 	*pimu = imu;
 	ret = 0;

-	if (folio) {
-		bvec_set_page(&imu->bvec[0], pages[0], size, off);
-		goto done;
-	}
 	for (i = 0; i < nr_pages; i++) {
 		size_t vec_len;

-		vec_len = min_t(size_t, size, PAGE_SIZE - off);
+		vec_len = min_t(size_t, size, (1UL << imu->folio_shift) - off);
 		bvec_set_page(&imu->bvec[i], pages[i], vec_len, off);
 		off = 0;
 		size -= vec_len;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index ee77e53328bf..18242b2e9da4 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -52,6 +52,14 @@ struct io_mapped_ubuf {
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };

+struct io_imu_folio_data {
+	/* Head folio can be partially included in the fixed buf */
+	unsigned int	nr_pages_head;
+	/* For non-head/tail folios, has to be fully included */
+	unsigned int	nr_pages_mid;
+	unsigned int	folio_shift;
+};
+
 void io_rsrc_node_ref_zero(struct io_rsrc_node *node);
 void io_rsrc_node_destroy(struct io_ring_ctx *ctx, struct io_rsrc_node *ref_node);
 struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx);
--
2.34.1
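The coalescing rules from the commit message (pages contiguous inside each folio, the first folio covered through its last page, middle folios fully covered, every later folio entered at its first page) can be modelled outside the kernel. The sketch below is a simplified userspace illustration and not the patch's implementation: it replaces struct page pointers with page frame numbers, and it assumes that all folios are naturally aligned and share one order, so the folio of a page is simply pfn >> order. The function name count_coalesced_folios() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the coalescing check.
 *
 * pfn[]    page frame numbers of the pinned pages, in buffer order
 * order    log2 of pages per folio (assumed equal for all folios)
 *
 * Returns the number of folio-sized chunks the buffer coalesces into,
 * or 0 if the buffer does not meet the requirements.
 */
static size_t count_coalesced_folios(const unsigned long *pfn,
				     size_t nr_pages, unsigned int order)
{
	unsigned long per_folio, idx_mask;
	size_t count = 1, nr_folios = 1, i;

	assert(order < 8 * sizeof(unsigned long));
	per_folio = 1UL << order;
	idx_mask = per_folio - 1;

	/* Single pages and order-0 folios gain nothing from coalescing. */
	if (nr_pages <= 1 || per_folio == 1)
		return 0;

	for (i = 1; i < nr_pages; i++) {
		int same_folio = (pfn[i] >> order) == (pfn[i - 1] >> order);

		if (same_folio && pfn[i] == pfn[i - 1] + 1) {
			count++;
			continue;
		}

		if (nr_folios == 1) {
			/* The head folio must be covered up to its last page. */
			if ((pfn[i - 1] & idx_mask) != idx_mask)
				return 0;
		} else if (count != per_folio) {
			/* Middle folios must be fully covered. */
			return 0;
		}

		/* Every following folio must be entered at its first page. */
		if ((pfn[i] & idx_mask) != 0)
			return 0;

		count = 1;
		nr_folios++;
	}

	/* The tail folio may be partially covered, so it is not checked. */
	return nr_folios;
}
```

With order 9 (512 4K pages per 2M folio), a run of 2048 consecutive, folio-aligned page frame numbers yields 4, and a 1024-page window that starts halfway into a folio yields 3 (partial head, one full middle folio, partial tail). Unlike the patch, this model does not compare folio sizes or drop page references; those steps depend on kernel helpers such as folio_size() and unpin_user_pages().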
* Re: [PATCH v8 2/2] io_uring/rsrc: enable multi-hugepage buffer coalescing
From: Pavel Begunkov @ 2024-07-31 23:09 UTC
To: Chenliang Li, axboe
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On 7/31/24 10:01, Chenliang Li wrote:
> Add support for checking and coalescing multi-hugepage-backed fixed
> buffers. The coalescing optimizes both time and space consumption caused
> by mapping and storing multi-hugepage fixed buffers.
>
> A coalescable multi-hugepage buffer should fully cover its folios
> (except potentially the first and last one), and these folios should
> have the same size. These requirements are for easier processing later,
> also we need same size'd chunks in io_import_fixed for fast iov_iter
> adjust.

Reviewed-by: Pavel Begunkov <[email protected]>

--
Pavel Begunkov
* Re: [PATCH v8 0/2] io_uring/rsrc: coalescing multi-hugepage registered buffers
From: Jens Axboe @ 2024-08-02 13:11 UTC
To: asml.silence, Chenliang Li
Cc: io-uring, peiwei.li, joshi.k, kundan.kumar, anuj20.g, gost.dev

On Wed, 31 Jul 2024 17:01:31 +0800, Chenliang Li wrote:
> Registered buffers are stored and processed in the form of bvec array,
> each bvec element typically points to a PAGE_SIZE page but can also work
> with hugepages. Specifically, a buffer consisting of a hugepage is
> coalesced to use only one hugepage bvec entry during registration.
> This coalescing feature helps to save both the space and DMA-mapping time.
>
> However, currently the coalescing feature doesn't work for multi-hugepage
> buffers. For a buffer with several 2M hugepages, we still split it into
> thousands of 4K page bvec entries while in fact, we can just use a
> handful of hugepage bvecs.
>
> [...]

Applied, thanks!

[1/2] io_uring/rsrc: store folio shift and mask into imu
      commit: cbca98cb933728bb5eee39ba6bfe184932931e3d
[2/2] io_uring/rsrc: enable multi-hugepage buffer coalescing
      commit: 04eedfc93ea1121bb6b00f27b14c58973f7de1c9

Best regards,
--
Jens Axboe