From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCHSET RFC v2 0/4] uring_cmd copy avoidance
Date: Fri, 6 Jun 2025 10:29:34 -0700
Message-ID: <CADUfDZrSAUYtd2988vSUryNt2voSUbngXtBcAU3Cb+JqYuuxTg@mail.gmail.com>
In-Reply-To: <20250605194728.145287-1-axboe@kernel.dk>

On Thu, Jun 5, 2025 at 12:47 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Hi,
>
> Currently uring_cmd unconditionally copies the SQE at prep time, as it
> has no other choice - the SQE data must remain stable after submit.
> This can lead to excessive memory bandwidth being used for that copy,
> as passthrough will often use 128b SQEs, and efficiency concerns as
> those copies will potentially use quite a lot of CPU cycles as well.
>
> As a quick test, running the current -git kernel on a box with 23
> NVMe drives doing passthrough IO, memcpy() is the highest cycle user
> at 9.05%, which is all off the uring_cmd prep path. The test case is
> a 512b random read, which runs at 91-92M IOPS.
>
> With these patches, memcpy() is gone from the profiles, and it runs
> at 98-99M IOPS, or about 7-8% faster.
>
> Before:
>
> IOPS=91.12M, BW=44.49GiB/s, IOS/call=32/32
> IOPS=91.16M, BW=44.51GiB/s, IOS/call=32/32
> IOPS=91.18M, BW=44.52GiB/s, IOS/call=31/32
> IOPS=91.92M, BW=44.88GiB/s, IOS/call=32/32
> IOPS=91.88M, BW=44.86GiB/s, IOS/call=32/32
> IOPS=91.82M, BW=44.83GiB/s, IOS/call=32/31
> IOPS=91.52M, BW=44.69GiB/s, IOS/call=32/32
>
> with the top perf report -g --no-children being:
>
> + 9.07% io_uring [kernel.kallsyms] [k] memcpy
>
> and after:
>
> # bash run-peak-pass.sh
> [...]
> IOPS=99.30M, BW=48.49GiB/s, IOS/call=32/32
> IOPS=99.27M, BW=48.47GiB/s, IOS/call=31/32
> IOPS=99.60M, BW=48.63GiB/s, IOS/call=32/32
> IOPS=99.68M, BW=48.67GiB/s, IOS/call=32/31
> IOPS=99.80M, BW=48.73GiB/s, IOS/call=31/32
> IOPS=99.84M, BW=48.75GiB/s, IOS/call=32/32
>
> with memcpy not even in profiles. If you do the actual math of 100M
> requests per second, and 128b of copying per IOP, then it's almost
> 12GB/sec of reduced memory bandwidth.
>
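
For reference, the bandwidth figure above can be reproduced with a few
lines of C. This is only a sketch of the arithmetic, assuming exactly
128 bytes copied per request; the function names are made up for
illustration and do not come from the patchset.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes copied per second if every request copies sqe_bytes at prep time. */
uint64_t sqe_copy_bytes_per_sec(uint64_t iops, uint64_t sqe_bytes)
{
	/* Guard against overflow of the 64-bit product. */
	assert(sqe_bytes == 0 || iops <= UINT64_MAX / sqe_bytes);
	return iops * sqe_bytes;
}

/* Convert bytes/sec to decimal gigabytes/sec (1 GB = 10^9 bytes). */
double to_gb_per_sec(uint64_t bytes_per_sec)
{
	return (double)bytes_per_sec / 1e9;
}

/* Convert bytes/sec to binary gibibytes/sec (1 GiB = 2^30 bytes). */
double to_gib_per_sec(uint64_t bytes_per_sec)
{
	return (double)bytes_per_sec / (double)(1ULL << 30);
}
```

With iops = 100,000,000 and sqe_bytes = 128 this gives 12.8 GB/sec in
decimal units, or roughly 11.9 GiB/sec in binary units, which lines up
with the "almost 12GB/sec" figure if that is read as GiB.
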
> Even for lower IOPS production testing, Caleb reports that memcpy()
> overhead is in the realm of 1.1% of CPU time.
>
> v2 of this patchset takes a different approach than v1 did - rather
> than have the core mark a request as being potentially issued
> out-of-line, this one adds an io_cold_def ->sqe_copy() helper, and
> puts the onus on io_uring core to call it appropriately. Outside of
> that, it also adds an IO_URING_F_INLINE flag so that the copy helper
> _knows_ if it may sanely copy the SQE, or whether there's a bug in
> the core and it should just be ended with -EFAULT. Where possible,
> the actual SQE is also passed in.

I like the ->sqe_copy() approach. I'm not totally convinced the
complexity of computing and checking IO_URING_F_INLINE is worth it for
what's effectively an assertion, but I'm not strongly opposed to it
either.
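
As a rough illustration of the trade-off, here is a small userspace
model of the scheme as described in the cover letter: prep only records
a pointer to the SQE, and a copy helper makes it stable later, refusing
with -EFAULT when it is not called from the inline submit path. This is
not the kernel code. Every name below (model_req, model_prep,
model_sqe_copy, MODEL_F_INLINE) is hypothetical, and the real
->sqe_copy() signature and flag handling may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_SQE_BYTES	128		/* assumed size of a "big" SQE */
#define MODEL_F_INLINE	(1u << 0)	/* stand-in for IO_URING_F_INLINE */

/* Hypothetical request: points at the ring SQE until a copy is forced. */
struct model_req {
	const unsigned char *sqe;		/* currently valid SQE data */
	unsigned char copy[MODEL_SQE_BYTES];	/* private, stable storage */
	bool copied;
};

/* Prep: remember where the SQE lives, with no memcpy() on the fast path. */
void model_prep(struct model_req *req, const unsigned char *ring_sqe)
{
	assert(req != NULL && ring_sqe != NULL);
	req->sqe = ring_sqe;
	req->copied = false;
}

/*
 * Stand-in for an ->sqe_copy() style helper. The core would call this
 * only when the request is about to outlive the submit path. The flag
 * check is the "assertion": copying is only sane while the ring SQE is
 * still valid, i.e. during inline submission.
 */
int model_sqe_copy(struct model_req *req, unsigned int issue_flags)
{
	if (req->copied)
		return 0;		/* already stable, nothing to do */
	if (!(issue_flags & MODEL_F_INLINE))
		return -EFAULT;		/* core bug: SQE may be stale */
	memcpy(req->copy, req->sqe, MODEL_SQE_BYTES);
	req->sqe = req->copy;
	req->copied = true;
	return 0;
}
```

In this model the flag check costs one branch in the helper, but the
caller has to compute and pass the flag correctly on every path, which
is the complexity in question.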

Thanks,
Caleb

>
> I think this approach is saner, and in fact it can be extended to
> reduce over-eager copies in other spots. For now I just did uring_cmd,
> and verified that the memcpy's are still gone from my test.
>
> Can also be found here:
>
> https://git.kernel.dk/cgit/linux/log/?h=uring_cmd.2
>
> include/linux/io_uring_types.h | 2 ++
> io_uring/io_uring.c | 35 +++++++++++++++------
> io_uring/opdef.c | 1 +
> io_uring/opdef.h | 1 +
> io_uring/uring_cmd.c | 57 ++++++++++++++++++----------------
> io_uring/uring_cmd.h | 2 ++
> 6 files changed, 63 insertions(+), 35 deletions(-)
>
> --
> Jens Axboe
>