Hi,

On 2020-02-23 20:52:26 -0700, Jens Axboe wrote:
> The fast case is not being deferred, that's by far the common (and hot)
> case, which means io_issue() is called with sqe != NULL. My worry is
> that by moving it into a prep helper, the compiler isn't smart enough to
> not make that basically two switches.

I'm not sure that benefit of a single switch isn't offset by the lower
code density due to the additional per-opcode branches.  Not inlining
the prepare function results in:

$ size fs/io_uring.o fs/io_uring.before.o
   text	   data	    bss	    dec	    hex	filename
  75383	   8237	      8	  83628	  146ac	fs/io_uring.o
  76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o

symbol size
-io_close_prep 0000000000000066
-io_connect_prep 0000000000000051
-io_epoll_ctl_prep 0000000000000051
-io_issue_sqe 0000000000001101
+io_issue_sqe 0000000000000de9
-io_openat2_prep 00000000000000ed
-io_openat_prep 0000000000000089
-io_poll_add_prep 0000000000000056
-io_prep_fsync 0000000000000053
-io_prep_sfr 000000000000004e
-io_read_prep 00000000000000ca
-io_recvmsg_prep 0000000000000079
-io_req_defer_prep 000000000000058e
+io_req_defer_prep 0000000000000160
+io_req_prep 0000000000000d26
-io_sendmsg_prep 000000000000006b
-io_statx_prep 00000000000000ed
-io_write_prep 00000000000000cd


> Feel free to prove me wrong, I'd love to reduce it ;-)

With a bit of handholding the compiler can deduplicate the switches. It
can't recognize on its own that req->opcode can't change between the
switch for prep and issue. Can be solved by moving the opcode into a
temporary variable. Also needs an inline for io_req_prep (not surpring,
it's a bit large).

That results in a bit bigger code. That's partially because of more
inlining:
   text	   data	    bss	    dec	    hex	filename
  78291	   8237	      8	  86536	  15208	fs/io_uring.o
  76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o

symbol size
+get_order 0000000000000015
-io_close_prep 0000000000000066
-io_connect_prep 0000000000000051
-io_epoll_ctl_prep 0000000000000051
-io_issue_sqe 0000000000001101
+io_issue_sqe 00000000000018fa
-io_openat2_prep 00000000000000ed
-io_openat_prep 0000000000000089
-io_poll_add_prep 0000000000000056
-io_prep_fsync 0000000000000053
-io_prep_sfr 000000000000004e
-io_read_prep 00000000000000ca
-io_recvmsg_prep 0000000000000079
-io_req_defer_prep 000000000000058e
+io_req_defer_prep 0000000000000f12
-io_sendmsg_prep 000000000000006b
-io_statx_prep 00000000000000ed
-io_write_prep 00000000000000cd


There's still some unnecessary branching on force_nonblocking. The
second patch just separates the cases needing force_nonblocking
out. Probably not quite the right structure.


Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined
anymore after that. I'm quite doubtful it's a good candidate anyway?
Seems mighty complex, and not likely to win much. That's a noticable
win:
   text	   data	    bss	    dec	    hex	filename
  72857	   8141	      8	  81006	  13c6e	fs/io_uring.o
  76959	   8237	      8	  85204	  14cd4	fs/io_uring.before.o
--- /tmp/before.txt	2020-02-23 21:00:16.316753022 -0800
+++ /tmp/after.txt	2020-02-23 23:10:44.979496728 -0800
-io_commit_cqring 00000000000003ef
+io_commit_cqring 000000000000012c
+io_free_req 000000000000005e
-io_free_req 00000000000002ed
-io_issue_sqe 0000000000001101
+io_issue_sqe 0000000000000e86
-io_poll_remove_one 0000000000000308
+io_poll_remove_one 0000000000000074
-io_poll_wake 0000000000000498
+io_poll_wake 000000000000021c
+io_queue_async_work 00000000000002a0
-io_queue_sqe 00000000000008cc
+io_queue_sqe 0000000000000391


Not quite sure what the policy is with attaching POC patches? Also send
as separate emails?

Greetings,

Andres Freund