* [PATCH liburing v1 0/2] __hot and __cold
From: Ammar Faizi @ 2022-07-03 11:59 UTC
To: Jens Axboe
Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf, Hao Xu,
    Pavel Begunkov, io-uring Mailing List, GNU/Weeb Mailing List

From: Ammar Faizi <[email protected]>

Hi Jens,

This series adds __hot and __cold macros. Currently, the __hot macro
is not used. The __cold annotation hints the compiler to optimize for
code size. This is good for the slow-path in the setup.c file.

Here is the result compiling with Ubuntu clang
15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510

Without this patchset:

  $ wc -c src/liburing.so.2.3
  71288 src/liburing.so.2.3

With this patchset:

  $ wc -c src/liburing.so.2.3
  69448 src/liburing.so.2.3

Taking one slow-path function as an example, using __cold avoids
aggressive inlining.

Without this patchset:

  00000000000024f0 <io_uring_queue_init>:
    24f0: pushq   %r14
    24f2: pushq   %rbx
    24f3: subq    $0x78,%rsp
    24f7: movq    %rsi,%r14
    24fa: xorps   %xmm0,%xmm0
    24fd: movaps  %xmm0,(%rsp)
    2501: movaps  %xmm0,0x60(%rsp)
    2506: movaps  %xmm0,0x50(%rsp)
    250b: movaps  %xmm0,0x40(%rsp)
    2510: movaps  %xmm0,0x30(%rsp)
    2515: movaps  %xmm0,0x20(%rsp)
    251a: movaps  %xmm0,0x10(%rsp)
    251f: movq    $0x0,0x70(%rsp)
    2528: movl    %edx,0x8(%rsp)
    252c: movq    %rsp,%rsi
    252f: movl    $0x1a9,%eax
    2534: syscall
    2536: movq    %rax,%rbx
    2539: testl   %ebx,%ebx
    253b: js      256a <io_uring_queue_init+0x7a>
    253d: movq    %rsp,%rsi
    2540: movl    %ebx,%edi
    2542: movq    %r14,%rdx
    2545: callq   2080 <io_uring_queue_mmap@plt>
    254a: testl   %eax,%eax
    254c: je      255d <io_uring_queue_init+0x6d>
    254e: movl    %eax,%edx
    2550: movl    $0x3,%eax
    2555: movl    %ebx,%edi
    2557: syscall
    2559: movl    %edx,%ebx
    255b: jmp     256a <io_uring_queue_init+0x7a>
    255d: movl    0x14(%rsp),%eax
    2561: movl    %eax,0xc8(%r14)
    2568: xorl    %ebx,%ebx
    256a: movl    %ebx,%eax
    256c: addq    $0x78,%rsp
    2570: popq    %rbx
    2571: popq    %r14
    2573: retq

With this patchset:

  000000000000240c <io_uring_queue_init>:
    240c: subq    $0x78,%rsp
    2410: xorps   %xmm0,%xmm0
    2413: movq    %rsp,%rax
    2416: movaps  %xmm0,(%rax)
    2419: movaps  %xmm0,0x60(%rax)
    241d: movaps  %xmm0,0x50(%rax)
    2421: movaps  %xmm0,0x40(%rax)
    2425: movaps  %xmm0,0x30(%rax)
    2429: movaps  %xmm0,0x20(%rax)
    242d: movaps  %xmm0,0x10(%rax)
    2431: movq    $0x0,0x70(%rax)
    2439: movl    %edx,0x8(%rax)
    243c: movq    %rax,%rdx
    243f: callq   2090 <io_uring_queue_init_params@plt>
    2444: addq    $0x78,%rsp
    2448: retq

Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (2):
  lib: Add __hot and __cold macros
  setup: Mark the exported functions as __cold

 src/lib.h   |  2 ++
 src/setup.c | 25 ++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)

base-commit: 98c14a04e2c0dcdfbb71372a1a209ed889fb3e4d
-- 
Ammar Faizi

* [PATCH liburing v1 1/2] lib: Add __hot and __cold macros
From: Ammar Faizi @ 2022-07-03 11:59 UTC
To: Jens Axboe
Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf, Hao Xu,
    Pavel Begunkov, io-uring Mailing List, GNU/Weeb Mailing List

From: Ammar Faizi <[email protected]>

A prep patch. These macros will be used to annotate hot and cold
functions. Currently, the __hot macro is not used; we will only use
the __cold macro at the moment.

Signed-off-by: Ammar Faizi <[email protected]>
---
 src/lib.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/lib.h b/src/lib.h
index 5844cd2..89a40f2 100644
--- a/src/lib.h
+++ b/src/lib.h
@@ -34,6 +34,8 @@
 #endif
 
 #define __maybe_unused		__attribute__((__unused__))
+#define __hot			__attribute__((__hot__))
+#define __cold			__attribute__((__cold__))
 
 void *__uring_malloc(size_t len);
 void __uring_free(void *p);
-- 
Ammar Faizi

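For illustration, a minimal sketch of how macros like these can be
defined and applied outside liburing. This is not code from the patch:
the names demo_hot, demo_cold, table_init and table_sum are
hypothetical, and the __has_attribute fallback is an assumption that
the patch above does not include (it defines the macros
unconditionally).

```c
#include <stddef.h>

/*
 * Hypothetical macro names, chosen so they cannot clash with the
 * __hot/__cold definitions in src/lib.h or in system headers.
 * Falling back to an empty definition is an assumption for compilers
 * that do not know these attributes.
 */
#if defined(__has_attribute)
#  if __has_attribute(__cold__)
#    define demo_cold __attribute__((__cold__))
#  endif
#  if __has_attribute(__hot__)
#    define demo_hot __attribute__((__hot__))
#  endif
#endif
#ifndef demo_cold
#  define demo_cold
#endif
#ifndef demo_hot
#  define demo_hot
#endif

/* Slow path: expected to run once at startup. Returns 0 or -1. */
demo_cold int table_init(int *table, size_t n)
{
	size_t i;

	if (!table)
		return -1;
	for (i = 0; i < n; i++)
		table[i] = (int)i;
	return 0;
}

/* Fast path: expected to be called frequently. */
demo_hot long table_sum(const int *table, size_t n)
{
	long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += table[i];
	return sum;
}
```

The attributes are only hints, so the functions behave the same with
or without them; what may change is how the compiler optimizes and
places the generated code.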
* Re: [PATCH liburing v1 1/2] lib: Add __hot and __cold macros
From: Alviro Iskandar Setiawan @ 2022-07-03 12:20 UTC
To: Ammar Faizi
Cc: Jens Axboe, Fernanda Ma'rouf, Hao Xu, Pavel Begunkov,
    io-uring Mailing List, GNU/Weeb Mailing List

On Sun, Jul 3, 2022 at 6:59 PM Ammar Faizi wrote:
>
> From: Ammar Faizi <[email protected]>
>
> A prep patch. These macros will be used to annotate hot and cold
> functions. Currently, the __hot macro is not used; we will only use
> the __cold macro at the moment.
>
> Signed-off-by: Ammar Faizi <[email protected]>

Reviewed-by: Alviro Iskandar Setiawan <[email protected]>

tq

-- 
Viro

* [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold
From: Ammar Faizi @ 2022-07-03 11:59 UTC
To: Jens Axboe
Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf, Hao Xu,
    Pavel Begunkov, io-uring Mailing List, GNU/Weeb Mailing List

From: Ammar Faizi <[email protected]>

These functions are called at initialization, which is a slow path.
Mark them as __cold so that the compiler will optimize for code size.

Here is the result compiling with Ubuntu clang
15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510

Without this patch:

  $ wc -c src/liburing.so.2.3
  71288 src/liburing.so.2.3

With this patch:

  $ wc -c src/liburing.so.2.3
  69448 src/liburing.so.2.3

Taking one slow-path function as an example, using __cold avoids
aggressive inlining.

Without this patch:

  00000000000024f0 <io_uring_queue_init>:
    24f0: pushq   %r14
    24f2: pushq   %rbx
    24f3: subq    $0x78,%rsp
    24f7: movq    %rsi,%r14
    24fa: xorps   %xmm0,%xmm0
    24fd: movaps  %xmm0,(%rsp)
    2501: movaps  %xmm0,0x60(%rsp)
    2506: movaps  %xmm0,0x50(%rsp)
    250b: movaps  %xmm0,0x40(%rsp)
    2510: movaps  %xmm0,0x30(%rsp)
    2515: movaps  %xmm0,0x20(%rsp)
    251a: movaps  %xmm0,0x10(%rsp)
    251f: movq    $0x0,0x70(%rsp)
    2528: movl    %edx,0x8(%rsp)
    252c: movq    %rsp,%rsi
    252f: movl    $0x1a9,%eax
    2534: syscall
    2536: movq    %rax,%rbx
    2539: testl   %ebx,%ebx
    253b: js      256a <io_uring_queue_init+0x7a>
    253d: movq    %rsp,%rsi
    2540: movl    %ebx,%edi
    2542: movq    %r14,%rdx
    2545: callq   2080 <io_uring_queue_mmap@plt>
    254a: testl   %eax,%eax
    254c: je      255d <io_uring_queue_init+0x6d>
    254e: movl    %eax,%edx
    2550: movl    $0x3,%eax
    2555: movl    %ebx,%edi
    2557: syscall
    2559: movl    %edx,%ebx
    255b: jmp     256a <io_uring_queue_init+0x7a>
    255d: movl    0x14(%rsp),%eax
    2561: movl    %eax,0xc8(%r14)
    2568: xorl    %ebx,%ebx
    256a: movl    %ebx,%eax
    256c: addq    $0x78,%rsp
    2570: popq    %rbx
    2571: popq    %r14
    2573: retq

With this patch:

  000000000000240c <io_uring_queue_init>:
    240c: subq    $0x78,%rsp
    2410: xorps   %xmm0,%xmm0
    2413: movq    %rsp,%rax
    2416: movaps  %xmm0,(%rax)
    2419: movaps  %xmm0,0x60(%rax)
    241d: movaps  %xmm0,0x50(%rax)
    2421: movaps  %xmm0,0x40(%rax)
    2425: movaps  %xmm0,0x30(%rax)
    2429: movaps  %xmm0,0x20(%rax)
    242d: movaps  %xmm0,0x10(%rax)
    2431: movq    $0x0,0x70(%rax)
    2439: movl    %edx,0x8(%rax)
    243c: movq    %rax,%rdx
    243f: callq   2090 <io_uring_queue_init_params@plt>
    2444: addq    $0x78,%rsp
    2448: retq

Signed-off-by: Ammar Faizi <[email protected]>
---
 src/setup.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/setup.c b/src/setup.c
index d2adc7f..2badcc1 100644
--- a/src/setup.c
+++ b/src/setup.c
@@ -89,7 +89,8 @@ err:
  * Returns -errno on error, or zero on success.  On success, 'ring'
  * contains the necessary information to read/write to the rings.
  */
-int io_uring_queue_mmap(int fd, struct io_uring_params *p, struct io_uring *ring)
+__cold int io_uring_queue_mmap(int fd, struct io_uring_params *p,
+			       struct io_uring *ring)
 {
 	int ret;
 
@@ -107,7 +108,7 @@ int io_uring_queue_mmap(int fd, struct io_uring_params *p, struct io_uring *ring
  * Ensure that the mmap'ed rings aren't available to a child after a fork(2).
  * This uses madvise(..., MADV_DONTFORK) on the mmap'ed ranges.
  */
-int io_uring_ring_dontfork(struct io_uring *ring)
+__cold int io_uring_ring_dontfork(struct io_uring *ring)
 {
 	size_t len;
 	int ret;
@@ -138,8 +139,8 @@ int io_uring_ring_dontfork(struct io_uring *ring)
 	return 0;
 }
 
-int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
-			       struct io_uring_params *p)
+__cold int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
+				      struct io_uring_params *p)
 {
 	int fd, ret;
 
@@ -161,7 +162,8 @@ int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
  * Returns -errno on error, or zero on success.  On success, 'ring'
  * contains the necessary information to read/write to the rings.
  */
-int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags)
+__cold int io_uring_queue_init(unsigned entries, struct io_uring *ring,
+			       unsigned flags)
 {
 	struct io_uring_params p;
 
@@ -171,7 +173,7 @@ int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags)
 	return io_uring_queue_init_params(entries, ring, &p);
 }
 
-void io_uring_queue_exit(struct io_uring *ring)
+__cold void io_uring_queue_exit(struct io_uring *ring)
 {
 	struct io_uring_sq *sq = &ring->sq;
 	struct io_uring_cq *cq = &ring->cq;
@@ -191,7 +193,7 @@ void io_uring_queue_exit(struct io_uring *ring)
 	__sys_close(ring->ring_fd);
 }
 
-struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
+__cold struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
 {
 	struct io_uring_probe *probe;
 	size_t len;
@@ -211,7 +213,7 @@ struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
 	return NULL;
 }
 
-struct io_uring_probe *io_uring_get_probe(void)
+__cold struct io_uring_probe *io_uring_get_probe(void)
 {
 	struct io_uring ring;
 	struct io_uring_probe *probe;
@@ -226,7 +228,7 @@ struct io_uring_probe *io_uring_get_probe(void)
 	return probe;
 }
 
-void io_uring_free_probe(struct io_uring_probe *probe)
+__cold void io_uring_free_probe(struct io_uring_probe *probe)
 {
 	uring_free(probe);
 }
@@ -284,7 +286,8 @@ static size_t rings_size(struct io_uring_params *p, unsigned entries,
  * return the required memory so that the caller can ensure that enough space
  * is available before setting up a ring with the specified parameters.
  */
-ssize_t io_uring_mlock_size_params(unsigned entries, struct io_uring_params *p)
+__cold ssize_t io_uring_mlock_size_params(unsigned entries,
+					  struct io_uring_params *p)
 {
 	struct io_uring_params lp = { };
 	struct io_uring ring;
@@ -343,7 +346,7 @@ ssize_t io_uring_mlock_size_params(unsigned entries, struct io_uring_params *p)
  * Return required ulimit -l memory space for a given ring setup. See
  * @io_uring_mlock_size_params().
  */
-ssize_t io_uring_mlock_size(unsigned entries, unsigned flags)
+__cold ssize_t io_uring_mlock_size(unsigned entries, unsigned flags)
 {
 	struct io_uring_params p = { .flags = flags, };
 
-- 
Ammar Faizi

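The wrapper shape seen in the "with this patch" disassembly (zero the
params, store the flags, call the _params variant) can be sketched in
isolation as follows. This is a hypothetical stand-in, not liburing
code: demo_cold, struct demo_params, struct demo_ring and both
functions are made-up names, and the validation logic is invented for
the example. Whether a given compiler actually keeps the call out of
line depends on the compiler, its version, and the optimization flags.

```c
#include <errno.h>
#include <string.h>

/* Same attribute the patch uses, under a hypothetical name. */
#define demo_cold __attribute__((__cold__))

/* Hypothetical stand-ins for struct io_uring_params / struct io_uring. */
struct demo_params {
	unsigned entries;
	unsigned flags;
};

struct demo_ring {
	unsigned entries;
	unsigned flags;
};

/* Does the real setup work. Returns 0 on success, -EINVAL on bad input. */
demo_cold int demo_queue_init_params(unsigned entries, struct demo_ring *ring,
				     struct demo_params *p)
{
	if (!ring || !p || entries == 0)
		return -EINVAL;

	p->entries = entries;
	ring->entries = p->entries;
	ring->flags = p->flags;
	return 0;
}

/* Thin wrapper: builds a zeroed params struct and forwards the call. */
demo_cold int demo_queue_init(unsigned entries, struct demo_ring *ring,
			      unsigned flags)
{
	struct demo_params p;

	memset(&p, 0, sizeof(p));
	p.flags = flags;
	return demo_queue_init_params(entries, ring, &p);
}
```

Comparing `objdump -d` output for demo_queue_init with and without the
attribute is one way to check what a particular toolchain does with
the hint.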
* Re: [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold
From: Alviro Iskandar Setiawan @ 2022-07-03 12:24 UTC
To: Ammar Faizi
Cc: Jens Axboe, Fernanda Ma'rouf, Hao Xu, Pavel Begunkov,
    io-uring Mailing List, GNU/Weeb Mailing List

On Sun, Jul 3, 2022 at 6:59 PM Ammar Faizi wrote:
>
> From: Ammar Faizi <[email protected]>
>
> These functions are called at initialization, which is a slow path.
> Mark them as __cold so that the compiler will optimize for code size.
>
> Here is the result compiling with Ubuntu clang
> 15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510
>
> Without this patch:
>
>   $ wc -c src/liburing.so.2.3
>   71288 src/liburing.so.2.3
>
> With this patch:
>
>   $ wc -c src/liburing.so.2.3
>   69448 src/liburing.so.2.3
>
> Taking one slow-path function as an example, using __cold avoids
> aggressive inlining.
>
> Without this patch:
>
>   00000000000024f0 <io_uring_queue_init>:
>     24f0: pushq   %r14
>     24f2: pushq   %rbx
>     24f3: subq    $0x78,%rsp
>     24f7: movq    %rsi,%r14
>     24fa: xorps   %xmm0,%xmm0
>     24fd: movaps  %xmm0,(%rsp)
>     2501: movaps  %xmm0,0x60(%rsp)
>     2506: movaps  %xmm0,0x50(%rsp)
>     250b: movaps  %xmm0,0x40(%rsp)
>     2510: movaps  %xmm0,0x30(%rsp)
>     2515: movaps  %xmm0,0x20(%rsp)
>     251a: movaps  %xmm0,0x10(%rsp)
>     251f: movq    $0x0,0x70(%rsp)
>     2528: movl    %edx,0x8(%rsp)
>     252c: movq    %rsp,%rsi
>     252f: movl    $0x1a9,%eax
>     2534: syscall
>     2536: movq    %rax,%rbx
>     2539: testl   %ebx,%ebx
>     253b: js      256a <io_uring_queue_init+0x7a>
>     253d: movq    %rsp,%rsi
>     2540: movl    %ebx,%edi
>     2542: movq    %r14,%rdx
>     2545: callq   2080 <io_uring_queue_mmap@plt>
>     254a: testl   %eax,%eax
>     254c: je      255d <io_uring_queue_init+0x6d>
>     254e: movl    %eax,%edx
>     2550: movl    $0x3,%eax
>     2555: movl    %ebx,%edi
>     2557: syscall
>     2559: movl    %edx,%ebx
>     255b: jmp     256a <io_uring_queue_init+0x7a>
>     255d: movl    0x14(%rsp),%eax
>     2561: movl    %eax,0xc8(%r14)
>     2568: xorl    %ebx,%ebx
>     256a: movl    %ebx,%eax
>     256c: addq    $0x78,%rsp
>     2570: popq    %rbx
>     2571: popq    %r14
>     2573: retq
>
> With this patch:
>
>   000000000000240c <io_uring_queue_init>:
>     240c: subq    $0x78,%rsp
>     2410: xorps   %xmm0,%xmm0
>     2413: movq    %rsp,%rax
>     2416: movaps  %xmm0,(%rax)
>     2419: movaps  %xmm0,0x60(%rax)
>     241d: movaps  %xmm0,0x50(%rax)
>     2421: movaps  %xmm0,0x40(%rax)
>     2425: movaps  %xmm0,0x30(%rax)
>     2429: movaps  %xmm0,0x20(%rax)
>     242d: movaps  %xmm0,0x10(%rax)
>     2431: movq    $0x0,0x70(%rax)
>     2439: movl    %edx,0x8(%rax)
>     243c: movq    %rax,%rdx
>     243f: callq   2090 <io_uring_queue_init_params@plt>
>     2444: addq    $0x78,%rsp
>     2448: retq
>
> Signed-off-by: Ammar Faizi <[email protected]>

Reviewed-by: Alviro Iskandar Setiawan <[email protected]>

tq

-- 
Viro

* Re: [PATCH liburing v1 0/2] __hot and __cold
From: Jens Axboe @ 2022-07-03 13:00 UTC
To: ammarfaizi2
Cc: alviro.iskandar, asml.silence, io-uring, howeyxu, fernandafmr12, gwml

On Sun, 3 Jul 2022 18:59:10 +0700, Ammar Faizi wrote:
> From: Ammar Faizi <[email protected]>
>
> Hi Jens,
>
> This series adds __hot and __cold macros. Currently, the __hot macro
> is not used. The __cold annotation hints the compiler to optimize for
> code size. This is good for the slow-path in the setup.c file.
>
> [...]

Applied, thanks!

[1/2] lib: Add __hot and __cold macros
      commit: ee459df3c83ab86b84e1acaaa23c340efb5bab35
[2/2] setup: Mark the exported functions as __cold
      commit: 907c171fa4aac773fee9421bc38fcf9581e54f61

Best regards,
-- 
Jens Axboe