Hi, On 2020-02-23 20:52:26 -0700, Jens Axboe wrote: > The fast case is not being deferred, that's by far the common (and hot) > case, which means io_issue() is called with sqe != NULL. My worry is > that by moving it into a prep helper, the compiler isn't smart enough to > not make that basically two switches. I'm not sure that benefit of a single switch isn't offset by the lower code density due to the additional per-opcode branches. Not inlining the prepare function results in: $ size fs/io_uring.o fs/io_uring.before.o text data bss dec hex filename 75383 8237 8 83628 146ac fs/io_uring.o 76959 8237 8 85204 14cd4 fs/io_uring.before.o symbol size -io_close_prep 0000000000000066 -io_connect_prep 0000000000000051 -io_epoll_ctl_prep 0000000000000051 -io_issue_sqe 0000000000001101 +io_issue_sqe 0000000000000de9 -io_openat2_prep 00000000000000ed -io_openat_prep 0000000000000089 -io_poll_add_prep 0000000000000056 -io_prep_fsync 0000000000000053 -io_prep_sfr 000000000000004e -io_read_prep 00000000000000ca -io_recvmsg_prep 0000000000000079 -io_req_defer_prep 000000000000058e +io_req_defer_prep 0000000000000160 +io_req_prep 0000000000000d26 -io_sendmsg_prep 000000000000006b -io_statx_prep 00000000000000ed -io_write_prep 00000000000000cd > Feel free to prove me wrong, I'd love to reduce it ;-) With a bit of handholding the compiler can deduplicate the switches. It can't recognize on its own that req->opcode can't change between the switch for prep and issue. Can be solved by moving the opcode into a temporary variable. Also needs an inline for io_req_prep (not surpring, it's a bit large). That results in a bit bigger code. That's partially because of more inlining: text data bss dec hex filename 78291 8237 8 86536 15208 fs/io_uring.o 76959 8237 8 85204 14cd4 fs/io_uring.before.o symbol size +get_order 0000000000000015 -io_close_prep 0000000000000066 -io_connect_prep 0000000000000051 -io_epoll_ctl_prep 0000000000000051 -io_issue_sqe 0000000000001101 +io_issue_sqe 00000000000018fa -io_openat2_prep 00000000000000ed -io_openat_prep 0000000000000089 -io_poll_add_prep 0000000000000056 -io_prep_fsync 0000000000000053 -io_prep_sfr 000000000000004e -io_read_prep 00000000000000ca -io_recvmsg_prep 0000000000000079 -io_req_defer_prep 000000000000058e +io_req_defer_prep 0000000000000f12 -io_sendmsg_prep 000000000000006b -io_statx_prep 00000000000000ed -io_write_prep 00000000000000cd There's still some unnecessary branching on force_nonblocking. The second patch just separates the cases needing force_nonblocking out. Probably not quite the right structure. Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined anymore after that. I'm quite doubtful it's a good candidate anyway? Seems mighty complex, and not likely to win much. That's a noticable win: text data bss dec hex filename 72857 8141 8 81006 13c6e fs/io_uring.o 76959 8237 8 85204 14cd4 fs/io_uring.before.o --- /tmp/before.txt 2020-02-23 21:00:16.316753022 -0800 +++ /tmp/after.txt 2020-02-23 23:10:44.979496728 -0800 -io_commit_cqring 00000000000003ef +io_commit_cqring 000000000000012c +io_free_req 000000000000005e -io_free_req 00000000000002ed -io_issue_sqe 0000000000001101 +io_issue_sqe 0000000000000e86 -io_poll_remove_one 0000000000000308 +io_poll_remove_one 0000000000000074 -io_poll_wake 0000000000000498 +io_poll_wake 000000000000021c +io_queue_async_work 00000000000002a0 -io_queue_sqe 00000000000008cc +io_queue_sqe 0000000000000391 Not quite sure what the policy is with attaching POC patches? Also send as separate emails? Greetings, Andres Freund