From: Jens Axboe <[email protected]>
To: Anuj gupta <[email protected]>,
	Chenliang Li <[email protected]>
Cc: [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v4 0/4] io_uring/rsrc: coalescing multi-hugepage registered buffers
Date: Thu, 16 May 2024 08:58:03 -0600
Message-ID: <[email protected]>
In-Reply-To: <CACzX3AvTUJqmtD+qDhLimGde2WZUuSVa=sY+jYJ8-OB43TkoWw@mail.gmail.com>

On 5/16/24 8:01 AM, Anuj gupta wrote:
> On Tue, May 14, 2024 at 1:25 PM Chenliang Li <[email protected]> wrote:
>>
>> Registered buffers are stored and processed in the form of bvec array,
>> each bvec element typically points to a PAGE_SIZE page but can also work
>> with hugepages. Specifically, a buffer consisting of a hugepage is
>> coalesced to use only one hugepage bvec entry during registration.
>> This coalescing feature helps to save both the space and DMA-mapping time.
>>
>> However, currently the coalescing feature doesn't work for multi-hugepage
>> buffers. For a buffer with several 2M hugepages, we still split it into
>> thousands of 4K page bvec entries while in fact, we can just use a
>> handful of hugepage bvecs.
>>
>> This patch series enables coalescing registered buffers with more than
>> one hugepage. It optimizes the DMA-mapping time and saves memory for
>> this kind of buffer.
>>
>> Testing:
>>
>> The hugepage fixed buffer I/O can be tested using fio without
>> modification. The fio command used in the following test is given
>> in [1]. There's also a liburing testcase in [2]. Also, the system
>> should have enough hugepages available before testing.
>>
>> Perf diff of 8M(4 * 2M hugepages) fio randread test:
>>
>> Before          After           Symbol
>> .....................................................
>> 4.68%                           [k] __blk_rq_map_sg
>> 3.31%                           [k] dma_direct_map_sg
>> 2.64%                           [k] dma_pool_alloc
>> 1.09%                           [k] sg_next
>>                 +0.49%          [k] dma_map_page_attrs
>>
>> Perf diff of 8M fio randwrite test:
>>
>> Before          After           Symbol
>> ......................................................
>> 2.82%                           [k] __blk_rq_map_sg
>> 2.05%                           [k] dma_direct_map_sg
>> 1.75%                           [k] dma_pool_alloc
>> 0.68%                           [k] sg_next
>>                 +0.08%          [k] dma_map_page_attrs
>>
>> The first three patches prepare for adding multi-hugepage coalescing
>> to buffer registration; the fourth patch enables the feature.
>>
>> -----------------
>> Changes since v3:
>>
>> - Delete unnecessary commit message
>> - Update test command and test results
>>
>> v3 : https://lore.kernel.org/io-uring/[email protected]/T/#t
>>
>> Changes since v2:
>>
>> - Modify the loop iterator increment to make code cleaner
>> - Minor fix to the return procedure in coalesced buffer account
>> - Correct commit messages
>> - Add test cases in liburing
>>
>> v2 : https://lore.kernel.org/io-uring/[email protected]/T/#t
>>
>> Changes since v1:
>>
>> - Split into 4 patches
>> - Fix code style issues
>> - Rearrange the change of code for cleaner look
>> - Add specialized pinned page accounting procedure for coalesced
>>   buffers
>> - Reordered the newly added fields in imu struct for better compaction
>>
>> v1 : https://lore.kernel.org/io-uring/[email protected]/T/#u
>>
>> [1]
>> fio -iodepth=64 -rw=randread(-rw=randwrite) -direct=1 -ioengine=io_uring \
>> -bs=8M -numjobs=1 -group_reporting -mem=shmhuge -fixedbufs -hugepage-size=2M \
>> -filename=/dev/nvme0n1 -runtime=10s -name=test1
>>
>> [2]
>> https://lore.kernel.org/io-uring/[email protected]/T/#u
>>
>> Chenliang Li (4):
>>   io_uring/rsrc: add hugepage buffer coalesce helpers
>>   io_uring/rsrc: store folio shift and mask into imu
>>   io_uring/rsrc: add init and account functions for coalesced imus
>>   io_uring/rsrc: enable multi-hugepage buffer coalescing
>>
>>  io_uring/rsrc.c | 217 +++++++++++++++++++++++++++++++++++++++---------
>>  io_uring/rsrc.h |  12 +++
>>  2 files changed, 191 insertions(+), 38 deletions(-)
>>
>>
>> base-commit: 59b28a6e37e650c0d601ed87875b6217140cda5d
>> --
>> 2.34.1
>>
>>
> 
> I tested this series by registering multi-hugepage buffers. The coalescing helps
> save DMA-mapping time. This is the gain observed on my setup while running
> the fio workload shared here.
> 
> RandomRead:
> Baseline        DeltaAbs        Symbol
> .....................................................
> 3.89%            -3.62%            [k] blk_rq_map_sg
> 3.58%            -3.23%            [k] dma_direct_map_sg
> 2.25%            -2.23%            [k] sg_next
> 
> RandomWrite:
> Baseline        DeltaAbs        Symbol
> .....................................................
> 2.46%            -2.31%            [k] dma_direct_map_sg
> 2.06%            -2.05%            [k] sg_next
> 2.08%            -1.80%            [k] blk_rq_map_sg
> 
> The liburing test case shared works fine too on my setup.
> 
> Feel free to add:
> Tested-by: Anuj Gupta <[email protected]>

It's even more dramatic here, excerpt from profiles:

    32.16%    -25.46%  [kernel.kallsyms]  [k] bio_split_rw
     8.92%     -8.38%  [kernel.kallsyms]  [k] iov_iter_is_aligned
     6.85%     -4.31%  [nvme]             [k] nvme_prep_rq.part.0
    14.71%             [kernel.kallsyms]  [k] __blk_rq_map_sg
     9.49%             [kernel.kallsyms]  [k] dma_direct_map_sg
     8.50%             [kernel.kallsyms]  [k] sg_next

Some of it just shifted, but definitely a huge win. This is just using
a single drive, doing about 7GB/sec.

The change looks pretty reasonable to me. I'd love for the test cases to
try and hit corner cases, as it's really more of a functionality test
right now. We should include things like one-off huge pages, ensure we
don't coalesce where we should not, etc.
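
As a rough illustration of the kind of corner cases meant here, below is a
small user-space model. It is not the code from this series: the struct
model_page type, the model_coalesced_entries() helper, and the rules it
applies are all assumptions made for the sketch. The model treats a buffer
as coalescible only if every page belongs to a folio of the same
multi-page size, pages are consecutive inside each folio, and a folio
boundary is crossed only from the last page of one folio to the first page
of the next. The buffer may start and end in the middle of a hugepage.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for a pinned page: the folio it belongs to,
 * its index inside that folio, and the folio size in pages.
 */
struct model_page {
	unsigned long folio_id;
	unsigned int idx;
	unsigned int folio_pages;
};

/*
 * Return the number of per-folio entries the buffer needs if it can be
 * coalesced under the model's rules, or 0 if it must stay per-page.
 */
static size_t model_coalesced_entries(const struct model_page *p, size_t n)
{
	size_t i, entries = 1;

	/* Empty buffers and buffers starting on a normal page never coalesce. */
	if (n == 0 || p[0].folio_pages <= 1 || p[0].idx >= p[0].folio_pages)
		return 0;

	for (i = 1; i < n; i++) {
		/* Mixed folio sizes (e.g. one 4K page among hugepages). */
		if (p[i].folio_pages != p[0].folio_pages ||
		    p[i].idx >= p[i].folio_pages)
			return 0;

		if (p[i].folio_id == p[i - 1].folio_id) {
			/* Gap or reordering inside a folio. */
			if (p[i].idx != p[i - 1].idx + 1)
				return 0;
			continue;
		}

		/* Crossing folios: previous must end, next must start at 0. */
		if (p[i - 1].idx != p[0].folio_pages - 1 || p[i].idx != 0)
			return 0;
		entries++;
	}
	return entries;
}
```

With 512 pages per 2M hugepage, a fully populated 8M buffer (2048 4K pages)
maps to 4 entries in this model, while a single stray 4K page anywhere in
the range makes the helper return 0, i.e. fall back to per-page entries.
Negative cases like that are what a corner-case test would want to cover.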

This is obviously too late for the 6.10 merge window, so there's plenty
of time to get this 100% sorted before the next kernel release.

-- 
Jens Axboe

