From: Gabriel Krisman Bertazi <krisman@suse.de>
To: axboe@kernel.dk
Cc: io-uring@vger.kernel.org, Gabriel Krisman Bertazi <krisman@suse.de>
Subject: [PATCH] io_uring/sqpoll: Increase task_work submission batch size
Date: Thu,  3 Apr 2025 15:56:05 -0400
Message-ID: <20250403195605.1221203-1-krisman@suse.de>

Our QA team reported a 10%-23% throughput reduction on an io_uring
sqpoll testcase that I traced back to a reduction of the device
submission queue depth when doing I/O through sqpoll. After commit
af5d68f8892f ("io_uring/sqpoll: manage task_work privately"), we cap
the number of task_work entries that can be executed in a single
sqpoll spin to only 8 entries, before the sqpoll thread loops around
and tries to sleep. My understanding is that this starves the device,
as seen in the lower device utilization, mostly because it reduces the
opportunity for plugging in the block layer.
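
For intuition, consider the reproducer below: with iodepth=128 and a
cap of 8, draining the inflight requests takes 16 sqpoll spins, and
each spin only feeds a correspondingly small resubmission batch into
the block layer's plug. A toy user-space model of that arithmetic
(just an illustration, not kernel code; spins_to_drain() is a made-up
helper):

#include <stdio.h>

/* How many capped sqpoll spins it takes to drain a full queue. */
static unsigned int spins_to_drain(unsigned int inflight, unsigned int cap)
{
	return (inflight + cap - 1) / cap;	/* ceiling division */
}

int main(void)
{
	unsigned int qd = 128;	/* iodepth used in the fio reproducer */

	/* cap=8: 16 spins, so resubmissions trickle in batches of <=8 */
	printf("cap=8:    %u spins\n", spins_to_drain(qd, 8));
	/* cap=1024: one spin drains it all; one large plugged batch */
	printf("cap=1024: %u spin\n", spins_to_drain(qd, 1024));
	return 0;
}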

A simple use case that showcases the issue is running sqpoll against
a null_blk device:

fio --ioengine=io_uring --direct=1 --iodepth=128 --runtime=300 --bs=4k \
    --invalidate=1 --time_based  --ramp_time=10 --group_reporting=1 \
    --filename=/dev/nullb0 --name=RandomReads-direct-nullb-sqpoll-4k-1 \
    --rw=randread --numjobs=1 --sqthread_poll
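
(For reference, /dev/nullb0 is provided by the null_blk module; a
plain "modprobe null_blk" creates a single nullb device by default.)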

With the above command, one QA test machine yielded:

SLE kernel predating af5d68f8892f:
 READ: bw=9839MiB/s (10.3GB/s), 9839MiB/s-9839MiB/s (10.3GB/s-10.3GB/s), io=2883GiB (3095GB), run=300001-300001msec

SLE kernel after af5d68f8892f:
 READ: bw=8288MiB/s (8691MB/s), 8288MiB/s-8288MiB/s (8691MB/s-8691MB/s), io=2428GiB (2607GB), run=300001-300001msec

Ideally, the task_work cap would at least be deep enough to fill the
device queue (assuming all uring commands are against only one
device), but we can't predict that behavior and thus can't guess the
right batch size. We also don't want to let the task_work run
unbounded, though I'm not sure that is really a problem. Instead,
let's just give it a more sensible value that allows for more
efficient batching.

With this patch, my test machine (not the same as above) yielded a
consistent 10% throughput increase when doing random reads against
null_blk. Our QA team also reported that it solved the regression on
all machines they tested.

Fixes: af5d68f8892f ("io_uring/sqpoll: manage task_work privately")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
---
 io_uring/sqpoll.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index d037cc68e9d3..e58e4d2b3bde 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -20,7 +20,7 @@
 #include "sqpoll.h"
 
 #define IORING_SQPOLL_CAP_ENTRIES_VALUE 8
-#define IORING_TW_CAP_ENTRIES_VALUE	8
+#define IORING_TW_CAP_ENTRIES_VALUE	1024
 
 enum {
 	IO_SQ_THREAD_SHOULD_STOP = 0,
-- 
2.49.0

