From: Jens Axboe <axboe@kernel.dk>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCHSET RFC v2 0/4] uring_cmd copy avoidance
Date: Fri, 6 Jun 2025 11:32:57 -0600
Message-ID: <453b9f15-f047-464f-9602-c3fd99df89c3@kernel.dk>
In-Reply-To: <CADUfDZrSAUYtd2988vSUryNt2voSUbngXtBcAU3Cb+JqYuuxTg@mail.gmail.com>
On 6/6/25 11:29 AM, Caleb Sander Mateos wrote:
> On Thu, Jun 5, 2025 at 12:47 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> Hi,
>>
>> Currently uring_cmd unconditionally copies the SQE at prep time, as it
>> has no other choice - the SQE data must remain stable after submit.
>> This can waste a lot of memory bandwidth on that copy, as passthrough
>> will often use 128b SQEs, and it raises efficiency concerns as well,
>> since those copies can also burn quite a lot of CPU cycles.
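To make the constraint concrete, here is a minimal userspace model, not kernel code. The names struct model_sqe, struct model_req and model_prep_copy_always() are hypothetical. The idea it illustrates: the application may reuse an SQ ring slot once submission returns, so any request that might be issued later has to read its SQE from a private copy, and doing that copy at prep time means every request pays for it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-byte SQE slot in the submission ring. */
struct model_sqe {
	uint8_t bytes[128];
};

/* Hypothetical per-request state; it can outlive the ring slot. */
struct model_req {
	struct model_sqe sqe_copy;   /* private, stable copy of the SQE */
	const struct model_sqe *sqe; /* what a later issue attempt reads */
};

/*
 * Pre-patchset behavior, modeled: copy every SQE at prep time, because
 * the application may reuse the ring slot as soon as submission returns.
 * This memcpy() runs for every request, even ones that complete inline.
 */
static void model_prep_copy_always(struct model_req *req,
				   const struct model_sqe *ring_slot)
{
	assert(req != NULL && ring_slot != NULL);
	memcpy(&req->sqe_copy, ring_slot, sizeof(req->sqe_copy));
	req->sqe = &req->sqe_copy;
}
```

In this model, overwriting the ring slot after model_prep_copy_always() leaves req->sqe intact; the price is one 128-byte memcpy() per request, whether or not it was needed.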
>>
>> As a quick test, when running the current -git kernel on a box with
>> 23 NVMe drives doing passthrough IO, memcpy() is the highest cycle
>> user at 9.05%, all of it coming from the uring_cmd prep path. The test
>> case is a 512b random read, which runs at 91-92M IOPS.
>>
>> With these patches, memcpy() is gone from the profiles, and it runs
>> at 98-99M IOPS, or about 7-8% faster.
>>
>> Before:
>>
>> IOPS=91.12M, BW=44.49GiB/s, IOS/call=32/32
>> IOPS=91.16M, BW=44.51GiB/s, IOS/call=32/32
>> IOPS=91.18M, BW=44.52GiB/s, IOS/call=31/32
>> IOPS=91.92M, BW=44.88GiB/s, IOS/call=32/32
>> IOPS=91.88M, BW=44.86GiB/s, IOS/call=32/32
>> IOPS=91.82M, BW=44.83GiB/s, IOS/call=32/31
>> IOPS=91.52M, BW=44.69GiB/s, IOS/call=32/32
>>
>> with the top entry of perf report -g --no-children being:
>>
>> + 9.07% io_uring [kernel.kallsyms] [k] memcpy
>>
>> and after:
>>
>> # bash run-peak-pass.sh
>> [...]
>> IOPS=99.30M, BW=48.49GiB/s, IOS/call=32/32
>> IOPS=99.27M, BW=48.47GiB/s, IOS/call=31/32
>> IOPS=99.60M, BW=48.63GiB/s, IOS/call=32/32
>> IOPS=99.68M, BW=48.67GiB/s, IOS/call=32/31
>> IOPS=99.80M, BW=48.73GiB/s, IOS/call=31/32
>> IOPS=99.84M, BW=48.75GiB/s, IOS/call=32/32
>>
>> with memcpy not even showing up in the profiles. If you do the actual
>> math of 100M requests per second and 128b of copying per IO, that's
>> 12.8GB/sec (almost 12GiB/sec) of memory bandwidth saved.
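Spelled out, assuming exactly one 128-byte SQE copy per request at 100M requests per second:

```latex
100 \times 10^{6}\,\tfrac{\text{req}}{\text{s}} \times 128\,\tfrac{\text{B}}{\text{req}}
  = 1.28 \times 10^{10}\,\tfrac{\text{B}}{\text{s}}
  \approx 12.8\ \text{GB/s} \approx 11.9\ \text{GiB/s}
```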
>>
>> Even in lower-IOPS production testing, Caleb reports memcpy()
>> overhead of around 1.1% of CPU time.
>>
>> v2 of this patchset takes a different approach than v1 did - rather
>> than have the core mark a request as being potentially issued
>> out-of-line, this one adds an io_cold_def ->sqe_copy() helper, and
>> puts the onus on io_uring core to call it appropriately. In addition,
>> it adds an IO_URING_F_INLINE flag so that the copy helper _knows_
>> whether it may sanely copy the SQE, or whether there's a bug in the
>> core and the request should just be ended with -EFAULT. Where
>> possible, the actual SQE is also passed in.
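The following is a userspace model of the v2 idea, not the kernel implementation. Every identifier (MODEL_F_INLINE, struct model_lazy_req, model_sqe_copy(), model_issue_inline()) is hypothetical, the real ->sqe_copy() signature may differ, and the would_block parameter is a simplified stand-in for the cases where the core punts a request out of line. It shows three behaviors: a request that completes inline is never copied, a request about to be deferred is copied while the ring slot is still valid, and a copy attempted without the inline flag is refused with -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits for this model only, not the kernel's values. */
#define MODEL_F_NONBLOCK (1u << 0)
#define MODEL_F_INLINE   (1u << 1) /* set only on the submitting task's path */

struct model_sqe {
	uint8_t bytes[128];
};

struct model_lazy_req {
	const struct model_sqe *sqe; /* points into the ring until copied */
	struct model_sqe sqe_copy;   /* filled only if the request goes async */
	bool copied;
};

/*
 * Model of an ->sqe_copy()-style hook. The core is assumed to call it only
 * when a request will be issued out of line. Copying is safe only while
 * still inline (ring slot untouched); otherwise it is a core bug: -EFAULT.
 */
static int model_sqe_copy(struct model_lazy_req *req, unsigned int issue_flags)
{
	assert(req != NULL && req->sqe != NULL);
	if (req->copied)
		return 0;                       /* already stable */
	if (!(issue_flags & MODEL_F_INLINE))
		return -EFAULT;                 /* slot may already be reused */
	memcpy(&req->sqe_copy, req->sqe, sizeof(req->sqe_copy));
	req->sqe = &req->sqe_copy;
	req->copied = true;
	return 0;
}

/*
 * Inline issue. MODEL_F_INLINE is part of a constant mask, so setting it
 * adds no branch. 'would_block' stands in for the issue attempt needing
 * to be retried later (e.g. from a worker thread).
 */
static int model_issue_inline(struct model_lazy_req *req, bool would_block)
{
	const unsigned int issue_flags = MODEL_F_NONBLOCK | MODEL_F_INLINE;

	if (!would_block)
		return 0;                            /* done: no copy needed */
	return model_sqe_copy(req, issue_flags); /* deferring: copy now */
}
```

The design choice this models is that the common inline path pays nothing, and the copy cost moves to the path where a request is actually deferred.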
>
> I like the ->sqe_copy() approach. I'm not totally convinced the
> complexity of computing and checking IO_URING_F_INLINE is worth it for
> what's effectively an assertion, but I'm not strongly opposed to it
> either.

It adds no extra overhead on the normal issue side, as the mask isn't
conditional. For now it's just belt and suspenders; down the line we can
relax (and remove) some of this on the uring_cmd side.
--
Jens Axboe
Thread overview: 18+ messages
2025-06-05 19:40 [PATCHSET RFC v2 0/4] uring_cmd copy avoidance Jens Axboe
2025-06-05 19:40 ` [PATCH 1/4] io_uring: add IO_URING_F_INLINE issue flag Jens Axboe
2025-06-06 17:31 ` Caleb Sander Mateos
2025-06-06 21:02 ` Jens Axboe
2025-06-05 19:40 ` [PATCH 2/4] io_uring: add struct io_cold_def->sqe_copy() method Jens Axboe
2025-06-05 20:05 ` Jens Axboe
2025-06-06 17:36 ` Caleb Sander Mateos
2025-06-06 21:01 ` Jens Axboe
2025-06-05 19:40 ` [PATCH 3/4] io_uring/uring_cmd: get rid of io_uring_cmd_prep_setup() Jens Axboe
2025-06-06 17:37 ` Caleb Sander Mateos
2025-06-05 19:40 ` [PATCH 4/4] io_uring/uring_cmd: implement ->sqe_copy() to avoid unnecessary copies Jens Axboe
2025-06-06 17:39 ` Caleb Sander Mateos
2025-06-06 21:05 ` Jens Axboe
2025-06-06 22:08 ` Jens Axboe
2025-06-06 22:09 ` Caleb Sander Mateos
2025-06-06 23:53 ` Jens Axboe
2025-06-06 17:29 ` [PATCHSET RFC v2 0/4] uring_cmd copy avoidance Caleb Sander Mateos
2025-06-06 17:32 ` Jens Axboe [this message]