From: Andres Freund <[email protected]>
To: Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: Deduplicate io_*_prep calls?
Date: Sun, 23 Feb 2020 23:12:11 -0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
[-- Attachment #1: Type: text/plain, Size: 3729 bytes --]
Hi,
On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
> The fast case is not being deferred, that's by far the common (and hot)
> case, which means io_issue() is called with sqe != NULL. My worry is
> that by moving it into a prep helper, the compiler isn't smart enough to
> not make that basically two switches.
I'm not sure the benefit of a single switch isn't offset by the lower
code density due to the additional per-opcode branches. Not inlining
the prepare function results in:
$ size fs/io_uring.o fs/io_uring.before.o
text data bss dec hex filename
75383 8237 8 83628 146ac fs/io_uring.o
76959 8237 8 85204 14cd4 fs/io_uring.before.o
symbol size
-io_close_prep 0000000000000066
-io_connect_prep 0000000000000051
-io_epoll_ctl_prep 0000000000000051
-io_issue_sqe 0000000000001101
+io_issue_sqe 0000000000000de9
-io_openat2_prep 00000000000000ed
-io_openat_prep 0000000000000089
-io_poll_add_prep 0000000000000056
-io_prep_fsync 0000000000000053
-io_prep_sfr 000000000000004e
-io_read_prep 00000000000000ca
-io_recvmsg_prep 0000000000000079
-io_req_defer_prep 000000000000058e
+io_req_defer_prep 0000000000000160
+io_req_prep 0000000000000d26
-io_sendmsg_prep 000000000000006b
-io_statx_prep 00000000000000ed
-io_write_prep 00000000000000cd
> Feel free to prove me wrong, I'd love to reduce it ;-)
With a bit of handholding the compiler can deduplicate the switches. It
can't recognize on its own that req->opcode can't change between the
prep and issue switches; that can be solved by moving the opcode into a
temporary variable. It also needs an inline for io_req_prep (not
surprising, it's a bit large).
That results in somewhat bigger code, partially because of more
inlining:
text data bss dec hex filename
78291 8237 8 86536 15208 fs/io_uring.o
76959 8237 8 85204 14cd4 fs/io_uring.before.o
symbol size
+get_order 0000000000000015
-io_close_prep 0000000000000066
-io_connect_prep 0000000000000051
-io_epoll_ctl_prep 0000000000000051
-io_issue_sqe 0000000000001101
+io_issue_sqe 00000000000018fa
-io_openat2_prep 00000000000000ed
-io_openat_prep 0000000000000089
-io_poll_add_prep 0000000000000056
-io_prep_fsync 0000000000000053
-io_prep_sfr 000000000000004e
-io_read_prep 00000000000000ca
-io_recvmsg_prep 0000000000000079
-io_req_defer_prep 000000000000058e
+io_req_defer_prep 0000000000000f12
-io_sendmsg_prep 000000000000006b
-io_statx_prep 00000000000000ed
-io_write_prep 00000000000000cd
There's still some unnecessary branching on force_nonblock. The
second patch just separates out the cases needing force_nonblock.
Probably not quite the right structure yet.
Oddly enough, gcc decides that io_queue_async_work() should no longer
be inlined after that. I'm quite doubtful it's a good inlining
candidate anyway: it seems mighty complex, and not likely to win much.
The result is a noticeable win:
text data bss dec hex filename
72857 8141 8 81006 13c6e fs/io_uring.o
76959 8237 8 85204 14cd4 fs/io_uring.before.o
--- /tmp/before.txt 2020-02-23 21:00:16.316753022 -0800
+++ /tmp/after.txt 2020-02-23 23:10:44.979496728 -0800
-io_commit_cqring 00000000000003ef
+io_commit_cqring 000000000000012c
+io_free_req 000000000000005e
-io_free_req 00000000000002ed
-io_issue_sqe 0000000000001101
+io_issue_sqe 0000000000000e86
-io_poll_remove_one 0000000000000308
+io_poll_remove_one 0000000000000074
-io_poll_wake 0000000000000498
+io_poll_wake 000000000000021c
+io_queue_async_work 00000000000002a0
-io_queue_sqe 00000000000008cc
+io_queue_sqe 0000000000000391
Not quite sure what the policy is on attaching POC patches - should
they instead be sent as separate emails?
Greetings,
Andres Freund
[-- Attachment #2: v1-0001-WIP-io_uring-Deduplicate-request-prep.patch --]
[-- Type: text/x-diff, Size: 7794 bytes --]
From edb629fc246ef146ad4e25bc51fd3f5db797b2be Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Sun, 23 Feb 2020 22:22:33 -0800
Subject: [PATCH v1 1/2] WIP: io_uring: Deduplicate request prep.
Signed-off-by: Andres Freund <[email protected]>
---
fs/io_uring.c | 192 +++++++++++++-------------------------------------
1 file changed, 49 insertions(+), 143 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index de650df9ac53..9a8fda8b28c9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4116,31 +4116,24 @@ static int io_files_update(struct io_kiocb *req, bool force_nonblock)
return 0;
}
-static int io_req_defer_prep(struct io_kiocb *req,
- const struct io_uring_sqe *sqe)
+static inline int io_req_prep(u8 opcode, struct io_kiocb *req,
+ const struct io_uring_sqe *sqe,
+ bool force_nonblock)
{
ssize_t ret = 0;
- if (io_op_defs[req->opcode].file_table) {
- ret = io_grab_files(req);
- if (unlikely(ret))
- return ret;
- }
-
- io_req_work_grab_env(req, &io_op_defs[req->opcode]);
-
- switch (req->opcode) {
+ switch (opcode) {
case IORING_OP_NOP:
break;
case IORING_OP_READV:
case IORING_OP_READ_FIXED:
case IORING_OP_READ:
- ret = io_read_prep(req, sqe, true);
+ ret = io_read_prep(req, sqe, force_nonblock);
break;
case IORING_OP_WRITEV:
case IORING_OP_WRITE_FIXED:
case IORING_OP_WRITE:
- ret = io_write_prep(req, sqe, true);
+ ret = io_write_prep(req, sqe, force_nonblock);
break;
case IORING_OP_POLL_ADD:
ret = io_poll_add_prep(req, sqe);
@@ -4162,23 +4155,23 @@ static int io_req_defer_prep(struct io_kiocb *req,
case IORING_OP_RECV:
ret = io_recvmsg_prep(req, sqe);
break;
- case IORING_OP_CONNECT:
- ret = io_connect_prep(req, sqe);
- break;
case IORING_OP_TIMEOUT:
ret = io_timeout_prep(req, sqe, false);
break;
case IORING_OP_TIMEOUT_REMOVE:
ret = io_timeout_remove_prep(req, sqe);
break;
+ case IORING_OP_ACCEPT:
+ ret = io_accept_prep(req, sqe);
+ break;
case IORING_OP_ASYNC_CANCEL:
ret = io_async_cancel_prep(req, sqe);
break;
case IORING_OP_LINK_TIMEOUT:
ret = io_timeout_prep(req, sqe, true);
break;
- case IORING_OP_ACCEPT:
- ret = io_accept_prep(req, sqe);
+ case IORING_OP_CONNECT:
+ ret = io_connect_prep(req, sqe);
break;
case IORING_OP_FALLOCATE:
ret = io_fallocate_prep(req, sqe);
@@ -4217,6 +4210,23 @@ static int io_req_defer_prep(struct io_kiocb *req,
return ret;
}
+static int io_req_defer_prep(struct io_kiocb *req,
+ const struct io_uring_sqe *sqe)
+{
+ ssize_t ret = 0;
+ u8 opcode = req->opcode;
+
+ if (io_op_defs[opcode].file_table) {
+ ret = io_grab_files(req);
+ if (unlikely(ret))
+ return ret;
+ }
+
+ io_req_work_grab_env(req, &io_op_defs[opcode]);
+
+ return io_req_prep(opcode, req, sqe, true);
+}
+
static int io_req_defer(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_ring_ctx *ctx = req->ctx;
@@ -4278,198 +4288,94 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
struct io_kiocb **nxt, bool force_nonblock)
{
struct io_ring_ctx *ctx = req->ctx;
+ /* allow compiler to infer opcode doesn't change */
+ u8 opcode = req->opcode;
int ret;
- switch (req->opcode) {
+ if (sqe) {
+ ret = io_req_prep(opcode, req, sqe, force_nonblock);
+ if (ret)
+ return ret;
+ }
+
+ switch (opcode) {
case IORING_OP_NOP:
ret = io_nop(req);
break;
case IORING_OP_READV:
case IORING_OP_READ_FIXED:
case IORING_OP_READ:
- if (sqe) {
- ret = io_read_prep(req, sqe, force_nonblock);
- if (ret < 0)
- break;
- }
ret = io_read(req, nxt, force_nonblock);
break;
case IORING_OP_WRITEV:
case IORING_OP_WRITE_FIXED:
case IORING_OP_WRITE:
- if (sqe) {
- ret = io_write_prep(req, sqe, force_nonblock);
- if (ret < 0)
- break;
- }
ret = io_write(req, nxt, force_nonblock);
break;
- case IORING_OP_FSYNC:
- if (sqe) {
- ret = io_prep_fsync(req, sqe);
- if (ret < 0)
- break;
- }
- ret = io_fsync(req, nxt, force_nonblock);
- break;
case IORING_OP_POLL_ADD:
- if (sqe) {
- ret = io_poll_add_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_poll_add(req, nxt);
break;
case IORING_OP_POLL_REMOVE:
- if (sqe) {
- ret = io_poll_remove_prep(req, sqe);
- if (ret < 0)
- break;
- }
ret = io_poll_remove(req);
break;
+ case IORING_OP_FSYNC:
+ ret = io_fsync(req, nxt, force_nonblock);
+ break;
case IORING_OP_SYNC_FILE_RANGE:
- if (sqe) {
- ret = io_prep_sfr(req, sqe);
- if (ret < 0)
- break;
- }
ret = io_sync_file_range(req, nxt, force_nonblock);
break;
case IORING_OP_SENDMSG:
+ ret = io_sendmsg(req, nxt, force_nonblock);
+ break;
case IORING_OP_SEND:
- if (sqe) {
- ret = io_sendmsg_prep(req, sqe);
- if (ret < 0)
- break;
- }
- if (req->opcode == IORING_OP_SENDMSG)
- ret = io_sendmsg(req, nxt, force_nonblock);
- else
- ret = io_send(req, nxt, force_nonblock);
+ ret = io_send(req, nxt, force_nonblock);
break;
case IORING_OP_RECVMSG:
+ ret = io_recvmsg(req, nxt, force_nonblock);
+ break;
case IORING_OP_RECV:
- if (sqe) {
- ret = io_recvmsg_prep(req, sqe);
- if (ret)
- break;
- }
- if (req->opcode == IORING_OP_RECVMSG)
- ret = io_recvmsg(req, nxt, force_nonblock);
- else
- ret = io_recv(req, nxt, force_nonblock);
+ ret = io_recv(req, nxt, force_nonblock);
break;
case IORING_OP_TIMEOUT:
- if (sqe) {
- ret = io_timeout_prep(req, sqe, false);
- if (ret)
- break;
- }
ret = io_timeout(req);
break;
case IORING_OP_TIMEOUT_REMOVE:
- if (sqe) {
- ret = io_timeout_remove_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_timeout_remove(req);
break;
case IORING_OP_ACCEPT:
- if (sqe) {
- ret = io_accept_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_accept(req, nxt, force_nonblock);
break;
- case IORING_OP_CONNECT:
- if (sqe) {
- ret = io_connect_prep(req, sqe);
- if (ret)
- break;
- }
- ret = io_connect(req, nxt, force_nonblock);
- break;
case IORING_OP_ASYNC_CANCEL:
- if (sqe) {
- ret = io_async_cancel_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_async_cancel(req, nxt);
break;
+ case IORING_OP_CONNECT:
+ ret = io_connect(req, nxt, force_nonblock);
+ break;
case IORING_OP_FALLOCATE:
- if (sqe) {
- ret = io_fallocate_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_fallocate(req, nxt, force_nonblock);
break;
case IORING_OP_OPENAT:
- if (sqe) {
- ret = io_openat_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_openat(req, nxt, force_nonblock);
break;
case IORING_OP_CLOSE:
- if (sqe) {
- ret = io_close_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_close(req, nxt, force_nonblock);
break;
case IORING_OP_FILES_UPDATE:
- if (sqe) {
- ret = io_files_update_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_files_update(req, force_nonblock);
break;
case IORING_OP_STATX:
- if (sqe) {
- ret = io_statx_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_statx(req, nxt, force_nonblock);
break;
case IORING_OP_FADVISE:
- if (sqe) {
- ret = io_fadvise_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_fadvise(req, nxt, force_nonblock);
break;
case IORING_OP_MADVISE:
- if (sqe) {
- ret = io_madvise_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_madvise(req, nxt, force_nonblock);
break;
case IORING_OP_OPENAT2:
- if (sqe) {
- ret = io_openat2_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_openat2(req, nxt, force_nonblock);
break;
case IORING_OP_EPOLL_CTL:
- if (sqe) {
- ret = io_epoll_ctl_prep(req, sqe);
- if (ret)
- break;
- }
ret = io_epoll_ctl(req, nxt, force_nonblock);
break;
default:
--
2.25.0.114.g5b0ca878e0
[-- Attachment #3: v1-0002-WIP-io_uring-Separate-blocking-nonblocking-io_iss.patch --]
[-- Type: text/x-diff, Size: 2588 bytes --]
From 4efd092e07207d18b2f0fdbc6e68e93d5e7c93b0 Mon Sep 17 00:00:00 2001
From: Andres Freund <[email protected]>
Date: Sun, 23 Feb 2020 23:06:58 -0800
Subject: [PATCH v1 2/2] WIP: io_uring: Separate blocking/nonblocking
io_issue_sqe cases.
Signed-off-by: Andres Freund <[email protected]>
---
fs/io_uring.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 9a8fda8b28c9..b149ab57c5b4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4284,20 +4284,12 @@ static void io_cleanup_req(struct io_kiocb *req)
req->flags &= ~REQ_F_NEED_CLEANUP;
}
-static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+static inline int __io_issue_sqe(u8 opcode, struct io_kiocb *req, const struct io_uring_sqe *sqe,
struct io_kiocb **nxt, bool force_nonblock)
{
struct io_ring_ctx *ctx = req->ctx;
- /* allow compiler to infer opcode doesn't change */
- u8 opcode = req->opcode;
int ret;
- if (sqe) {
- ret = io_req_prep(opcode, req, sqe, force_nonblock);
- if (ret)
- return ret;
- }
-
switch (opcode) {
case IORING_OP_NOP:
ret = io_nop(req);
@@ -4405,6 +4397,25 @@ static int io_issue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe,
return 0;
}
+static int io_prep_issue_sqe_nonblock(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+ struct io_kiocb **nxt)
+{
+ /* allow compiler to infer opcode doesn't change */
+ u8 opcode = req->opcode;
+ int ret;
+
+ ret = io_req_prep(opcode, req, sqe, true);
+ if (ret)
+ return ret;
+
+ return __io_issue_sqe(opcode, req, NULL, nxt, true);
+}
+
+static int io_issue_sqe_block(struct io_kiocb *req, struct io_kiocb **nxt)
+{
+ return __io_issue_sqe(req->opcode, req, NULL, nxt, false);
+}
+
static void io_wq_submit_work(struct io_wq_work **workptr)
{
struct io_wq_work *work = *workptr;
@@ -4421,7 +4432,7 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
if (!ret) {
req->in_async = true;
do {
- ret = io_issue_sqe(req, NULL, &nxt, false);
+ ret = io_issue_sqe_block(req, &nxt);
/*
* We can get EAGAIN for polled IO even though we're
* forcing a sync submission from here, since we can't
@@ -4616,7 +4627,7 @@ static void __io_queue_sqe(struct io_kiocb *req, const struct io_uring_sqe *sqe)
again:
linked_timeout = io_prep_linked_timeout(req);
- ret = io_issue_sqe(req, sqe, &nxt, true);
+ ret = io_prep_issue_sqe_nonblock(req, sqe, &nxt);
/*
* We async punt it if the file wasn't marked NOWAIT, or if the file
--
2.25.0.114.g5b0ca878e0