On 24/02/2020 18:40, Jens Axboe wrote: > On 2/24/20 12:12 AM, Andres Freund wrote: >> Hi, >> >> On 2020-02-23 20:52:26 -0700, Jens Axboe wrote: >>> The fast case is not being deferred, that's by far the common (and hot) >>> case, which means io_issue() is called with sqe != NULL. My worry is >>> that by moving it into a prep helper, the compiler isn't smart enough to >>> not make that basically two switches. >> >> I'm not sure that benefit of a single switch isn't offset by the lower >> code density due to the additional per-opcode branches. Not inlining >> the prepare function results in: >> >> $ size fs/io_uring.o fs/io_uring.before.o >> text data bss dec hex filename >> 75383 8237 8 83628 146ac fs/io_uring.o >> 76959 8237 8 85204 14cd4 fs/io_uring.before.o >> >> symbol size >> -io_close_prep 0000000000000066 >> -io_connect_prep 0000000000000051 >> -io_epoll_ctl_prep 0000000000000051 >> -io_issue_sqe 0000000000001101 >> +io_issue_sqe 0000000000000de9 >> -io_openat2_prep 00000000000000ed >> -io_openat_prep 0000000000000089 >> -io_poll_add_prep 0000000000000056 >> -io_prep_fsync 0000000000000053 >> -io_prep_sfr 000000000000004e >> -io_read_prep 00000000000000ca >> -io_recvmsg_prep 0000000000000079 >> -io_req_defer_prep 000000000000058e >> +io_req_defer_prep 0000000000000160 >> +io_req_prep 0000000000000d26 >> -io_sendmsg_prep 000000000000006b >> -io_statx_prep 00000000000000ed >> -io_write_prep 00000000000000cd >> >> >> >>> Feel free to prove me wrong, I'd love to reduce it ;-) >> >> With a bit of handholding the compiler can deduplicate the switches. It >> can't recognize on its own that req->opcode can't change between the >> switch for prep and issue. Can be solved by moving the opcode into a >> temporary variable. Also needs an inline for io_req_prep (not surpring, >> it's a bit large). >> >> That results in a bit bigger code. That's partially because of more >> inlining: >> text data bss dec hex filename >> 78291 8237 8 86536 15208 fs/io_uring.o >> 76959 8237 8 85204 14cd4 fs/io_uring.before.o >> >> symbol size >> +get_order 0000000000000015 >> -io_close_prep 0000000000000066 >> -io_connect_prep 0000000000000051 >> -io_epoll_ctl_prep 0000000000000051 >> -io_issue_sqe 0000000000001101 >> +io_issue_sqe 00000000000018fa >> -io_openat2_prep 00000000000000ed >> -io_openat_prep 0000000000000089 >> -io_poll_add_prep 0000000000000056 >> -io_prep_fsync 0000000000000053 >> -io_prep_sfr 000000000000004e >> -io_read_prep 00000000000000ca >> -io_recvmsg_prep 0000000000000079 >> -io_req_defer_prep 000000000000058e >> +io_req_defer_prep 0000000000000f12 >> -io_sendmsg_prep 000000000000006b >> -io_statx_prep 00000000000000ed >> -io_write_prep 00000000000000cd >> >> >> There's still some unnecessary branching on force_nonblocking. The >> second patch just separates the cases needing force_nonblocking >> out. Probably not quite the right structure. >> >> >> Oddly enough gcc decides that io_queue_async_work() wouldn't be inlined >> anymore after that. I'm quite doubtful it's a good candidate anyway? >> Seems mighty complex, and not likely to win much. That's a noticable >> win: >> text data bss dec hex filename >> 72857 8141 8 81006 13c6e fs/io_uring.o >> 76959 8237 8 85204 14cd4 fs/io_uring.before.o >> --- /tmp/before.txt 2020-02-23 21:00:16.316753022 -0800 >> +++ /tmp/after.txt 2020-02-23 23:10:44.979496728 -0800 >> -io_commit_cqring 00000000000003ef >> +io_commit_cqring 000000000000012c >> +io_free_req 000000000000005e >> -io_free_req 00000000000002ed >> -io_issue_sqe 0000000000001101 >> +io_issue_sqe 0000000000000e86 >> -io_poll_remove_one 0000000000000308 >> +io_poll_remove_one 0000000000000074 >> -io_poll_wake 0000000000000498 >> +io_poll_wake 000000000000021c >> +io_queue_async_work 00000000000002a0 >> -io_queue_sqe 00000000000008cc >> +io_queue_sqe 0000000000000391 > > That's OK, it's slow path, I'd prefer it not to be inlined. > >> Not quite sure what the policy is with attaching POC patches? Also send >> as separate emails? > > Fine like this, though easier if you inline the patches so it's easier > to comment on them. > > Agree that the first patch looks fine, though I don't quite see why > you want to pass in opcode as a separate argument as it's always > req->opcode. Seeing it separate makes me a bit nervous, thinking that > someone is reading it again from the sqe, or maybe not passing in > the right opcode for the given request. So that seems fragile and it > should go away. I suppose it's to hint a compiler, that opcode haven't been changed inside the first switch. And any compiler I used breaks analysis there pretty easy. Optimising C is such a pain... -- Pavel Begunkov