public inbox for io-uring@vger.kernel.org
From: Caleb Sander Mateos <csander@purestorage.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: io-uring@vger.kernel.org
Subject: Re: [PATCHSET RFC v2 0/4] uring_cmd copy avoidance
Date: Fri, 6 Jun 2025 10:29:34 -0700
Message-ID: <CADUfDZrSAUYtd2988vSUryNt2voSUbngXtBcAU3Cb+JqYuuxTg@mail.gmail.com>
In-Reply-To: <20250605194728.145287-1-axboe@kernel.dk>

On Thu, Jun 5, 2025 at 12:47 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> Hi,
>
> Currently uring_cmd unconditionally copies the SQE at prep time, as it
> has no other choice - the SQE data must remain stable after submit.
> This can lead to excessive memory bandwidth being used for that copy,
> as passthrough will often use 128-byte SQEs, and efficiency concerns as
> those copies will potentially use quite a lot of CPU cycles as well.
>
> As a quick test, running the current -git kernel on a box with 23
> NVMe drives doing passthrough IO, memcpy() is the highest cycle user
> at 9.05%, which is all off the uring_cmd prep path. The test case is
> a 512-byte random read, which runs at 91-92M IOPS.
>
> With these patches, memcpy() is gone from the profiles, and it runs
> at 98-99M IOPS, or about 7-8% faster.
>
> Before:
>
> IOPS=91.12M, BW=44.49GiB/s, IOS/call=32/32
> IOPS=91.16M, BW=44.51GiB/s, IOS/call=32/32
> IOPS=91.18M, BW=44.52GiB/s, IOS/call=31/32
> IOPS=91.92M, BW=44.88GiB/s, IOS/call=32/32
> IOPS=91.88M, BW=44.86GiB/s, IOS/call=32/32
> IOPS=91.82M, BW=44.83GiB/s, IOS/call=32/31
> IOPS=91.52M, BW=44.69GiB/s, IOS/call=32/32
>
> with the top perf report -g --no-children being:
>
> +    9.07%  io_uring  [kernel.kallsyms]  [k] memcpy
>
> and after:
>
> # bash run-peak-pass.sh
> [...]
> IOPS=99.30M, BW=48.49GiB/s, IOS/call=32/32
> IOPS=99.27M, BW=48.47GiB/s, IOS/call=31/32
> IOPS=99.60M, BW=48.63GiB/s, IOS/call=32/32
> IOPS=99.68M, BW=48.67GiB/s, IOS/call=32/31
> IOPS=99.80M, BW=48.73GiB/s, IOS/call=31/32
> IOPS=99.84M, BW=48.75GiB/s, IOS/call=32/32
>
> with memcpy not even in profiles. If you do the actual math of 100M
> requests per second, and 128 bytes of copying per IO, then it's almost
> 12GB/sec of reduced memory bandwidth.
>
> Even for lower IOPS production testing, Caleb reports that memcpy()
> overhead is in the realm of 1.1% of CPU time.
>
> v2 of this patchset takes a different approach than v1 did - rather
> than have the core mark a request as being potentially issued
> out-of-line, this one adds an io_cold_def ->sqe_copy() helper, and
> puts the onus on io_uring core to call it appropriately. Outside of
> that, it also adds an IO_URING_F_INLINE flag so that the copy helper
> _knows_ if it may sanely copy the SQE, or whether there's a bug in
> the core and the request should just be failed with -EFAULT. Where
> possible, the actual SQE is also passed in.

I like the ->sqe_copy() approach. I'm not totally convinced the
complexity of computing and checking IO_URING_F_INLINE is worth it for
what's effectively an assertion, but I'm not strongly opposed to it
either.

Thanks,
Caleb


>
> I think this approach is saner, and in fact it can be extended to
> reduce over-eager copies in other spots. For now I just did uring_cmd,
> and verified that the memcpy's are still gone from my test.
>
> Can also be found here:
>
> https://git.kernel.dk/cgit/linux/log/?h=uring_cmd.2
>
>  include/linux/io_uring_types.h |  2 ++
>  io_uring/io_uring.c            | 35 +++++++++++++++------
>  io_uring/opdef.c               |  1 +
>  io_uring/opdef.h               |  1 +
>  io_uring/uring_cmd.c           | 57 ++++++++++++++++++----------------
>  io_uring/uring_cmd.h           |  2 ++
>  6 files changed, 63 insertions(+), 35 deletions(-)
>
> --
> Jens Axboe
>


Thread overview: 18+ messages
2025-06-05 19:40 [PATCHSET RFC v2 0/4] uring_cmd copy avoidance Jens Axboe
2025-06-05 19:40 ` [PATCH 1/4] io_uring: add IO_URING_F_INLINE issue flag Jens Axboe
2025-06-06 17:31   ` Caleb Sander Mateos
2025-06-06 21:02     ` Jens Axboe
2025-06-05 19:40 ` [PATCH 2/4] io_uring: add struct io_cold_def->sqe_copy() method Jens Axboe
2025-06-05 20:05   ` Jens Axboe
2025-06-06 17:36   ` Caleb Sander Mateos
2025-06-06 21:01     ` Jens Axboe
2025-06-05 19:40 ` [PATCH 3/4] io_uring/uring_cmd: get rid of io_uring_cmd_prep_setup() Jens Axboe
2025-06-06 17:37   ` Caleb Sander Mateos
2025-06-05 19:40 ` [PATCH 4/4] io_uring/uring_cmd: implement ->sqe_copy() to avoid unnecessary copies Jens Axboe
2025-06-06 17:39   ` Caleb Sander Mateos
2025-06-06 21:05     ` Jens Axboe
2025-06-06 22:08       ` Jens Axboe
2025-06-06 22:09         ` Caleb Sander Mateos
2025-06-06 23:53           ` Jens Axboe
2025-06-06 17:29 ` Caleb Sander Mateos [this message]
2025-06-06 17:32   ` [PATCHSET RFC v2 0/4] uring_cmd copy avoidance Jens Axboe
