* [PATCH liburing v1 0/2] __hot and __cold
From: Ammar Faizi @ 2022-07-03 11:59 UTC
To: Jens Axboe
Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf,
Hao Xu, Pavel Begunkov, io-uring Mailing List,
GNU/Weeb Mailing List
From: Ammar Faizi <[email protected]>
Hi Jens,
This series adds the __hot and __cold macros. The __hot macro is not
used yet. The __cold annotation tells the compiler that a function is
rarely executed, so it is optimized for code size rather than speed.
This suits the slow-path functions in the setup.c file.
Here is the result of compiling with Ubuntu clang
15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510
Without this patchset:
$ wc -c src/liburing.so.2.3
71288 src/liburing.so.2.3
With this patchset:
$ wc -c src/liburing.so.2.3
69448 src/liburing.so.2.3
Take one slow-path function as an example: with __cold, the compiler
avoids aggressive inlining.
Without this patchset:
00000000000024f0 <io_uring_queue_init>:
24f0: pushq %r14
24f2: pushq %rbx
24f3: subq $0x78,%rsp
24f7: movq %rsi,%r14
24fa: xorps %xmm0,%xmm0
24fd: movaps %xmm0,(%rsp)
2501: movaps %xmm0,0x60(%rsp)
2506: movaps %xmm0,0x50(%rsp)
250b: movaps %xmm0,0x40(%rsp)
2510: movaps %xmm0,0x30(%rsp)
2515: movaps %xmm0,0x20(%rsp)
251a: movaps %xmm0,0x10(%rsp)
251f: movq $0x0,0x70(%rsp)
2528: movl %edx,0x8(%rsp)
252c: movq %rsp,%rsi
252f: movl $0x1a9,%eax
2534: syscall
2536: movq %rax,%rbx
2539: testl %ebx,%ebx
253b: js 256a <io_uring_queue_init+0x7a>
253d: movq %rsp,%rsi
2540: movl %ebx,%edi
2542: movq %r14,%rdx
2545: callq 2080 <io_uring_queue_mmap@plt>
254a: testl %eax,%eax
254c: je 255d <io_uring_queue_init+0x6d>
254e: movl %eax,%edx
2550: movl $0x3,%eax
2555: movl %ebx,%edi
2557: syscall
2559: movl %edx,%ebx
255b: jmp 256a <io_uring_queue_init+0x7a>
255d: movl 0x14(%rsp),%eax
2561: movl %eax,0xc8(%r14)
2568: xorl %ebx,%ebx
256a: movl %ebx,%eax
256c: addq $0x78,%rsp
2570: popq %rbx
2571: popq %r14
2573: retq
With this patchset:
000000000000240c <io_uring_queue_init>:
240c: subq $0x78,%rsp
2410: xorps %xmm0,%xmm0
2413: movq %rsp,%rax
2416: movaps %xmm0,(%rax)
2419: movaps %xmm0,0x60(%rax)
241d: movaps %xmm0,0x50(%rax)
2421: movaps %xmm0,0x40(%rax)
2425: movaps %xmm0,0x30(%rax)
2429: movaps %xmm0,0x20(%rax)
242d: movaps %xmm0,0x10(%rax)
2431: movq $0x0,0x70(%rax)
2439: movl %edx,0x8(%rax)
243c: movq %rax,%rdx
243f: callq 2090 <io_uring_queue_init_params@plt>
2444: addq $0x78,%rsp
2448: retq
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (2):
lib: Add __hot and __cold macros
setup: Mark the exported functions as __cold
src/lib.h | 2 ++
src/setup.c | 25 ++++++++++++++-----------
2 files changed, 16 insertions(+), 11 deletions(-)
base-commit: 98c14a04e2c0dcdfbb71372a1a209ed889fb3e4d
--
Ammar Faizi
* [PATCH liburing v1 1/2] lib: Add __hot and __cold macros
From: Ammar Faizi @ 2022-07-03 11:59 UTC
To: Jens Axboe
Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf,
Hao Xu, Pavel Begunkov, io-uring Mailing List,
GNU/Weeb Mailing List
From: Ammar Faizi <[email protected]>
A prep patch. These macros will be used to annotate hot and cold
functions. The __hot macro is not used yet; only the __cold macro is
used for now.
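As a rough illustration of how such annotations are applied, here is a
minimal sketch. The macro bodies match the ones this patch adds; the
#ifndef guards and the functions demo_setup() and demo_sum() are
hypothetical and are not part of liburing. It assumes a compiler that
understands GNU-style attributes (GCC or clang).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Same attribute spellings as the patch adds to src/lib.h. The #ifndef
 * guards are an addition for this sketch only, in case a system header
 * already defines these names.
 */
#ifndef __hot
#define __hot	__attribute__((__hot__))
#endif
#ifndef __cold
#define __cold	__attribute__((__cold__))
#endif

/* Hypothetical slow path: runs once at startup, so code size matters more. */
__cold int demo_setup(int *table, size_t n)
{
	size_t i;

	if (!table || n == 0)
		return -1;

	for (i = 0; i < n; i++)
		table[i] = (int)i;

	return 0;
}

/* Hypothetical fast path: called often, so speed matters more. */
__hot long demo_sum(const int *table, size_t n)
{
	long sum = 0;
	size_t i;

	assert(table != NULL);
	for (i = 0; i < n; i++)
		sum += table[i];

	return sum;
}
```

The attributes are only hints. They do not change what the functions
compute, only how the compiler may choose to optimize and place them.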
Signed-off-by: Ammar Faizi <[email protected]>
---
src/lib.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/src/lib.h b/src/lib.h
index 5844cd2..89a40f2 100644
--- a/src/lib.h
+++ b/src/lib.h
@@ -34,6 +34,8 @@
#endif
#define __maybe_unused __attribute__((__unused__))
+#define __hot __attribute__((__hot__))
+#define __cold __attribute__((__cold__))
void *__uring_malloc(size_t len);
void __uring_free(void *p);
--
Ammar Faizi
* [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold
From: Ammar Faizi @ 2022-07-03 11:59 UTC
To: Jens Axboe
Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf,
Hao Xu, Pavel Begunkov, io-uring Mailing List,
GNU/Weeb Mailing List
From: Ammar Faizi <[email protected]>
These functions are called at initialization, which is a slow path.
Mark them as __cold so that the compiler optimizes them for code size.
Here is the result of compiling with Ubuntu clang
15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510
Without this patch:
$ wc -c src/liburing.so.2.3
71288 src/liburing.so.2.3
With this patch:
$ wc -c src/liburing.so.2.3
69448 src/liburing.so.2.3
Take one slow-path function as an example: with __cold, the compiler
avoids aggressive inlining.
Without this patch:
00000000000024f0 <io_uring_queue_init>:
24f0: pushq %r14
24f2: pushq %rbx
24f3: subq $0x78,%rsp
24f7: movq %rsi,%r14
24fa: xorps %xmm0,%xmm0
24fd: movaps %xmm0,(%rsp)
2501: movaps %xmm0,0x60(%rsp)
2506: movaps %xmm0,0x50(%rsp)
250b: movaps %xmm0,0x40(%rsp)
2510: movaps %xmm0,0x30(%rsp)
2515: movaps %xmm0,0x20(%rsp)
251a: movaps %xmm0,0x10(%rsp)
251f: movq $0x0,0x70(%rsp)
2528: movl %edx,0x8(%rsp)
252c: movq %rsp,%rsi
252f: movl $0x1a9,%eax
2534: syscall
2536: movq %rax,%rbx
2539: testl %ebx,%ebx
253b: js 256a <io_uring_queue_init+0x7a>
253d: movq %rsp,%rsi
2540: movl %ebx,%edi
2542: movq %r14,%rdx
2545: callq 2080 <io_uring_queue_mmap@plt>
254a: testl %eax,%eax
254c: je 255d <io_uring_queue_init+0x6d>
254e: movl %eax,%edx
2550: movl $0x3,%eax
2555: movl %ebx,%edi
2557: syscall
2559: movl %edx,%ebx
255b: jmp 256a <io_uring_queue_init+0x7a>
255d: movl 0x14(%rsp),%eax
2561: movl %eax,0xc8(%r14)
2568: xorl %ebx,%ebx
256a: movl %ebx,%eax
256c: addq $0x78,%rsp
2570: popq %rbx
2571: popq %r14
2573: retq
With this patch:
000000000000240c <io_uring_queue_init>:
240c: subq $0x78,%rsp
2410: xorps %xmm0,%xmm0
2413: movq %rsp,%rax
2416: movaps %xmm0,(%rax)
2419: movaps %xmm0,0x60(%rax)
241d: movaps %xmm0,0x50(%rax)
2421: movaps %xmm0,0x40(%rax)
2425: movaps %xmm0,0x30(%rax)
2429: movaps %xmm0,0x20(%rax)
242d: movaps %xmm0,0x10(%rax)
2431: movq $0x0,0x70(%rax)
2439: movl %edx,0x8(%rax)
243c: movq %rax,%rdx
243f: callq 2090 <io_uring_queue_init_params@plt>
2444: addq $0x78,%rsp
2448: retq
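For readers mapping the shorter listing back to C, here is a minimal
sketch of the wrapper pattern it appears to correspond to: zero a
parameter block, store the flags, and make a real call instead of
inlining the callee. Everything named demo_* is hypothetical. The struct
is only a stand-in sized to match the 0x78-byte stack area and the 0x8
store offset visible in the disassembly; the zeroing and the flags store
are inferred from the listing, not copied from src/setup.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifndef __cold
#define __cold	__attribute__((__cold__))
#endif

/* Hypothetical stand-in for the parameter block: 0x78 bytes, flags at 0x8. */
struct demo_params {
	unsigned	sq_entries;
	unsigned	cq_entries;
	unsigned	flags;
	unsigned char	rest[0x78 - 3 * sizeof(unsigned)];
};

_Static_assert(sizeof(struct demo_params) == 0x78, "size as in the listing");
_Static_assert(offsetof(struct demo_params, flags) == 0x8, "flags offset");

/* Hypothetical stand-in for the larger setup function that does the work. */
__cold int demo_queue_init_params(unsigned entries, unsigned *ring,
				  struct demo_params *p)
{
	assert(ring != NULL && p != NULL);
	if (entries == 0)
		return -EINVAL;

	*ring = entries + p->flags;	/* placeholder for the real setup */
	return 0;
}

/* Thin wrapper: with __cold the compiler has less reason to inline the callee. */
__cold int demo_queue_init(unsigned entries, unsigned *ring, unsigned flags)
{
	struct demo_params p;

	memset(&p, 0, sizeof(p));	/* the xorps/movaps/movq run */
	p.flags = flags;		/* the store to offset 0x8 */
	return demo_queue_init_params(entries, ring, &p);	/* the callq */
}
```

Whether a given compiler emits exactly the code shown above depends on
its version and flags, so treat the comments as a reading aid only.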
Signed-off-by: Ammar Faizi <[email protected]>
---
src/setup.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/src/setup.c b/src/setup.c
index d2adc7f..2badcc1 100644
--- a/src/setup.c
+++ b/src/setup.c
@@ -89,7 +89,8 @@ err:
* Returns -errno on error, or zero on success. On success, 'ring'
* contains the necessary information to read/write to the rings.
*/
-int io_uring_queue_mmap(int fd, struct io_uring_params *p, struct io_uring *ring)
+__cold int io_uring_queue_mmap(int fd, struct io_uring_params *p,
+ struct io_uring *ring)
{
int ret;
@@ -107,7 +108,7 @@ int io_uring_queue_mmap(int fd, struct io_uring_params *p, struct io_uring *ring
* Ensure that the mmap'ed rings aren't available to a child after a fork(2).
* This uses madvise(..., MADV_DONTFORK) on the mmap'ed ranges.
*/
-int io_uring_ring_dontfork(struct io_uring *ring)
+__cold int io_uring_ring_dontfork(struct io_uring *ring)
{
size_t len;
int ret;
@@ -138,8 +139,8 @@ int io_uring_ring_dontfork(struct io_uring *ring)
return 0;
}
-int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
- struct io_uring_params *p)
+__cold int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
+ struct io_uring_params *p)
{
int fd, ret;
@@ -161,7 +162,8 @@ int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
* Returns -errno on error, or zero on success. On success, 'ring'
* contains the necessary information to read/write to the rings.
*/
-int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags)
+__cold int io_uring_queue_init(unsigned entries, struct io_uring *ring,
+ unsigned flags)
{
struct io_uring_params p;
@@ -171,7 +173,7 @@ int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags)
return io_uring_queue_init_params(entries, ring, &p);
}
-void io_uring_queue_exit(struct io_uring *ring)
+__cold void io_uring_queue_exit(struct io_uring *ring)
{
struct io_uring_sq *sq = &ring->sq;
struct io_uring_cq *cq = &ring->cq;
@@ -191,7 +193,7 @@ void io_uring_queue_exit(struct io_uring *ring)
__sys_close(ring->ring_fd);
}
-struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
+__cold struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
{
struct io_uring_probe *probe;
size_t len;
@@ -211,7 +213,7 @@ struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
return NULL;
}
-struct io_uring_probe *io_uring_get_probe(void)
+__cold struct io_uring_probe *io_uring_get_probe(void)
{
struct io_uring ring;
struct io_uring_probe *probe;
@@ -226,7 +228,7 @@ struct io_uring_probe *io_uring_get_probe(void)
return probe;
}
-void io_uring_free_probe(struct io_uring_probe *probe)
+__cold void io_uring_free_probe(struct io_uring_probe *probe)
{
uring_free(probe);
}
@@ -284,7 +286,8 @@ static size_t rings_size(struct io_uring_params *p, unsigned entries,
* return the required memory so that the caller can ensure that enough space
* is available before setting up a ring with the specified parameters.
*/
-ssize_t io_uring_mlock_size_params(unsigned entries, struct io_uring_params *p)
+__cold ssize_t io_uring_mlock_size_params(unsigned entries,
+ struct io_uring_params *p)
{
struct io_uring_params lp = { };
struct io_uring ring;
@@ -343,7 +346,7 @@ ssize_t io_uring_mlock_size_params(unsigned entries, struct io_uring_params *p)
* Return required ulimit -l memory space for a given ring setup. See
* @io_uring_mlock_size_params().
*/
-ssize_t io_uring_mlock_size(unsigned entries, unsigned flags)
+__cold ssize_t io_uring_mlock_size(unsigned entries, unsigned flags)
{
struct io_uring_params p = { .flags = flags, };
--
Ammar Faizi
* Re: [PATCH liburing v1 1/2] lib: Add __hot and __cold macros
From: Alviro Iskandar Setiawan @ 2022-07-03 12:20 UTC
To: Ammar Faizi
Cc: Jens Axboe, Fernanda Ma'rouf, Hao Xu, Pavel Begunkov,
io-uring Mailing List, GNU/Weeb Mailing List
On Sun, Jul 3, 2022 at 6:59 PM Ammar Faizi wrote:
>
> From: Ammar Faizi <[email protected]>
>
> A prep patch. These macros will be used to annotate hot and cold
> functions. The __hot macro is not used yet; only the __cold macro is
> used for now.
>
> Signed-off-by: Ammar Faizi <[email protected]>
Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
tq
-- Viro
* Re: [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold
From: Alviro Iskandar Setiawan @ 2022-07-03 12:24 UTC
To: Ammar Faizi
Cc: Jens Axboe, Fernanda Ma'rouf, Hao Xu, Pavel Begunkov,
io-uring Mailing List, GNU/Weeb Mailing List
On Sun, Jul 3, 2022 at 6:59 PM Ammar Faizi wrote:
>
> From: Ammar Faizi <[email protected]>
>
> These functions are called at initialization, which is a slow path.
> Mark them as __cold so that the compiler optimizes them for code size.
>
> Here is the result of compiling with Ubuntu clang
> 15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510
>
> Without this patch:
>
> $ wc -c src/liburing.so.2.3
> 71288 src/liburing.so.2.3
>
> With this patch:
>
> $ wc -c src/liburing.so.2.3
> 69448 src/liburing.so.2.3
>
> Take one slow-path function as an example: with __cold, the compiler
> avoids aggressive inlining.
>
> Without this patch:
>
> 00000000000024f0 <io_uring_queue_init>:
> 24f0: pushq %r14
> 24f2: pushq %rbx
> 24f3: subq $0x78,%rsp
> 24f7: movq %rsi,%r14
> 24fa: xorps %xmm0,%xmm0
> 24fd: movaps %xmm0,(%rsp)
> 2501: movaps %xmm0,0x60(%rsp)
> 2506: movaps %xmm0,0x50(%rsp)
> 250b: movaps %xmm0,0x40(%rsp)
> 2510: movaps %xmm0,0x30(%rsp)
> 2515: movaps %xmm0,0x20(%rsp)
> 251a: movaps %xmm0,0x10(%rsp)
> 251f: movq $0x0,0x70(%rsp)
> 2528: movl %edx,0x8(%rsp)
> 252c: movq %rsp,%rsi
> 252f: movl $0x1a9,%eax
> 2534: syscall
> 2536: movq %rax,%rbx
> 2539: testl %ebx,%ebx
> 253b: js 256a <io_uring_queue_init+0x7a>
> 253d: movq %rsp,%rsi
> 2540: movl %ebx,%edi
> 2542: movq %r14,%rdx
> 2545: callq 2080 <io_uring_queue_mmap@plt>
> 254a: testl %eax,%eax
> 254c: je 255d <io_uring_queue_init+0x6d>
> 254e: movl %eax,%edx
> 2550: movl $0x3,%eax
> 2555: movl %ebx,%edi
> 2557: syscall
> 2559: movl %edx,%ebx
> 255b: jmp 256a <io_uring_queue_init+0x7a>
> 255d: movl 0x14(%rsp),%eax
> 2561: movl %eax,0xc8(%r14)
> 2568: xorl %ebx,%ebx
> 256a: movl %ebx,%eax
> 256c: addq $0x78,%rsp
> 2570: popq %rbx
> 2571: popq %r14
> 2573: retq
>
> With this patch:
>
> 000000000000240c <io_uring_queue_init>:
> 240c: subq $0x78,%rsp
> 2410: xorps %xmm0,%xmm0
> 2413: movq %rsp,%rax
> 2416: movaps %xmm0,(%rax)
> 2419: movaps %xmm0,0x60(%rax)
> 241d: movaps %xmm0,0x50(%rax)
> 2421: movaps %xmm0,0x40(%rax)
> 2425: movaps %xmm0,0x30(%rax)
> 2429: movaps %xmm0,0x20(%rax)
> 242d: movaps %xmm0,0x10(%rax)
> 2431: movq $0x0,0x70(%rax)
> 2439: movl %edx,0x8(%rax)
> 243c: movq %rax,%rdx
> 243f: callq 2090 <io_uring_queue_init_params@plt>
> 2444: addq $0x78,%rsp
> 2448: retq
>
> Signed-off-by: Ammar Faizi <[email protected]>
Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
tq
-- Viro
* Re: [PATCH liburing v1 0/2] __hot and __cold
From: Jens Axboe @ 2022-07-03 13:00 UTC
To: ammarfaizi2
Cc: alviro.iskandar, asml.silence, io-uring, howeyxu, fernandafmr12,
gwml
On Sun, 3 Jul 2022 18:59:10 +0700, Ammar Faizi wrote:
> From: Ammar Faizi <[email protected]>
>
> Hi Jens,
>
> This series adds the __hot and __cold macros. The __hot macro is not
> used yet. The __cold annotation tells the compiler that a function is
> rarely executed, so it is optimized for code size rather than speed.
> This suits the slow-path functions in the setup.c file.
>
> [...]
Applied, thanks!
[1/2] lib: Add __hot and __cold macros
commit: ee459df3c83ab86b84e1acaaa23c340efb5bab35
[2/2] setup: Mark the exported functions as __cold
commit: 907c171fa4aac773fee9421bc38fcf9581e54f61
Best regards,
--
Jens Axboe