From: Jens Axboe <[email protected]>
To: Keith Busch <[email protected]>,
[email protected], [email protected],
[email protected]
Cc: Keith Busch <[email protected]>
Subject: Re: [PATCHv3] io_uring: set plug tags for same file
Date: Fri, 11 Aug 2023 13:24:17 -0600 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 7/31/23 2:39 PM, Keith Busch wrote:
> From: Keith Busch <[email protected]>
>
> io_uring tries to optimize allocating tags by hinting to the plug how
> many it expects to need for a batch instead of allocating each tag
> individually. But io_uring submission queues may have a mix of many
> devices for IO, so the number of IOs counted may be overestimated. This
> can lead to allocating too many tags, which adds overhead to finding
> that many contiguous tags, freeing up the ones we didn't use, and may
> starve out other users that can actually use them.
>
> When starting a new batch of uring commands, count only commands that
> match the file descriptor of the first one seen for this optimization. This
> avoids having to call the unlikely "blk_mq_free_plug_rqs()" at the end of
> a submission when multiple devices are used in a batch.
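For anyone skimming the thread, the core of the idea is roughly the
following (hand-wavy sketch only, not the actual diff: count_same_fd()
is a made-up helper, and the real patch hooks into the submission state
rather than taking an SQE array like this):

#include <uapi/linux/io_uring.h>	/* struct io_uring_sqe */
#include <linux/blkdev.h>		/* blk_start_plug_nr_ios() */

/*
 * Count only the SQEs that target the same fd as the first one in the
 * batch, so the plug's tag hint reflects what a single device will
 * actually see rather than the whole mixed-device batch.
 */
static unsigned int count_same_fd(const struct io_uring_sqe *sqes,
				  unsigned int nr)
{
	unsigned int i, count = 0;

	for (i = 0; i < nr; i++) {
		if (sqes[i].fd == sqes[0].fd)
			count++;
	}
	return count;
}

/* then, instead of hinting the full batch size to the plug: */
/*	blk_start_plug_nr_ios(&plug, count_same_fd(sqes, nr)); */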
Wanted to run this through both the peak IOPS and networking testing, and
started with the former. Here's a peak run with -git + pending 6.5
changes + pending 6.6 changes:
IOPS=125.88M, BW=61.46GiB/s, IOS/call=32/31
IOPS=125.39M, BW=61.23GiB/s, IOS/call=31/31
IOPS=124.97M, BW=61.02GiB/s, IOS/call=32/32
IOPS=124.60M, BW=60.84GiB/s, IOS/call=32/32
IOPS=124.27M, BW=60.68GiB/s, IOS/call=31/31
IOPS=124.00M, BW=60.54GiB/s, IOS/call=32/31
and here's one with the patch:
IOPS=121.69M, BW=59.42GiB/s, IOS/call=31/32
IOPS=121.26M, BW=59.21GiB/s, IOS/call=32/32
IOPS=120.87M, BW=59.02GiB/s, IOS/call=31/31
IOPS=120.87M, BW=59.02GiB/s, IOS/call=32/32
IOPS=121.02M, BW=59.09GiB/s, IOS/call=32/32
IOPS=121.63M, BW=59.39GiB/s, IOS/call=31/32
IOPS=121.48M, BW=59.32GiB/s, IOS/call=31/31
Running a quick profile, here's the top diff:
# Baseline Delta Abs Shared Object Symbol
# ........ ......... ................ ...........................................
#
6.69% -3.03% [kernel.vmlinux] [k] io_issue_sqe
9.65% +2.30% [nvme] [k] nvme_poll_cq
4.86% -1.55% [kernel.vmlinux] [k] io_submit_sqes
4.61% +1.40% [kernel.vmlinux] [k] blk_mq_submit_bio
4.79% -0.98% [kernel.vmlinux] [k] io_read
0.56% +0.97% [kernel.vmlinux] [k] blkdev_dio_unaligned.isra.0
3.61% +0.52% [kernel.vmlinux] [k] dma_unmap_page_attrs
2.04% -0.45% [kernel.vmlinux] [k] blk_add_rq_to_plug
Note that this is perf.data.old being the kernel with your patch, and
perf.data being the "stock" kernel mentioned above. The main difference
looks like we're spending more time in io_issue_sqe() and io_submit_sqes(),
and conversely less time polling. Usually when profiling a
polled workload, having more time in the polling function is good, as it
shows us we're spending less time everywhere else.
This is what I'm using:
sudo t/io_uring -p1 -d128 -b512 -s32 -c32 -F1 -B1 -R0 -X1 -n24 -P1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1 /dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1 /dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1 /dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1
which is submitting 32 requests at a time. Obviously we don't expect a
win in this case, as each thread is handling just a single NVMe device,
so every request in a batch targets the same queue and the stock kernel
will not over-allocate tags either.
If we change that to -n12 instead, meaning each thread will drive two
devices, here's what the stock kernel gets:
IOPS=60.95M, BW=29.76GiB/s, IOS/call=31/32
IOPS=60.99M, BW=29.78GiB/s, IOS/call=31/32
IOPS=60.96M, BW=29.77GiB/s, IOS/call=31/31
IOPS=60.95M, BW=29.76GiB/s, IOS/call=31/31
IOPS=60.91M, BW=29.74GiB/s, IOS/call=32/32
and with the patch:
IOPS=59.64M, BW=29.12GiB/s, IOS/call=32/31
IOPS=59.63M, BW=29.12GiB/s, IOS/call=31/32
IOPS=59.57M, BW=29.09GiB/s, IOS/call=31/31
IOPS=59.57M, BW=29.09GiB/s, IOS/call=32/32
IOPS=59.65M, BW=29.12GiB/s, IOS/call=31/31
Now these are both obviously lower, but I haven't done anything to
ensure that the two-drives-per-poller workload is optimized. For all I
know, the NUMA layout is now messed up too. Just noting that as a caveat;
the two runs are still comparable to each other.
Perf diff again looks similar; note that this time it's perf.data.old
that's the stock kernel and perf.data that's the one with your patch:
3.51% +2.84% [kernel.vmlinux] [k] io_issue_sqe
3.24% +1.35% [kernel.vmlinux] [k] io_submit_sqes
On the kernel without your patch, I also looked for tag flush overhead
but didn't find much:
0.02% io_uring [kernel.vmlinux] [k] blk_mq_free_plug_rqs
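For context, that path only triggers when the plug still has cached
(pre-allocated but unused) requests left over at unplug time - roughly
this bit of __blk_flush_plug(), quoting from memory so possibly not
verbatim:

	/* leftover pre-allocated requests are freed when the plug is flushed */
	if (unlikely(!rq_list_empty(plug->cached_rq)))
		blk_mq_free_plug_rqs(plug);

which is what your patch is trying to avoid having to do in the first
place.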
Outside of the peak performance concern with the patch, do you have a
workload that we should test this on?
--
Jens Axboe