public inbox for [email protected]
* [PATCH liburing v1 0/2] __hot and __cold
@ 2022-07-03 11:59 Ammar Faizi
  2022-07-03 11:59 ` [PATCH liburing v1 1/2] lib: Add __hot and __cold macros Ammar Faizi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Ammar Faizi @ 2022-07-03 11:59 UTC
  To: Jens Axboe
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf,
	Hao Xu, Pavel Begunkov, io-uring Mailing List,
	GNU/Weeb Mailing List

From: Ammar Faizi <[email protected]>

Hi Jens,

This series adds __hot and __cold macros. Currently, the __hot macro
is not used. The __cold annotation tells the compiler that a function
is unlikely to be executed, so it optimizes that function for code size
rather than speed. This suits the slow paths in the setup.c file.
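
As a rough illustration of how the two annotations are meant to be
applied (not part of this series; the demo_* names and the ring type
below are hypothetical and only mirror the slow-path/fast-path split):

```c
#include <stddef.h>

/* Same attribute spellings as in the patch; guarded in case a system
 * header already defines these names. */
#ifndef __hot
#define __hot	__attribute__((__hot__))
#endif
#ifndef __cold
#define __cold	__attribute__((__cold__))
#endif

/* Hypothetical ring type, for illustration only (not liburing's). */
struct demo_ring {
	unsigned int head;
	unsigned int tail;
	unsigned int mask;
};

/* Slow path: runs once per ring, so small code matters more than speed. */
__cold int demo_ring_init(struct demo_ring *r, unsigned int entries)
{
	/* Require a non-zero power of two so that 'mask' works. */
	if (r == NULL || entries == 0 || (entries & (entries - 1)))
		return -1;

	r->head = 0;
	r->tail = 0;
	r->mask = entries - 1;
	return 0;
}

/* Fast path: called for every submission, so speed matters more. */
__hot unsigned int demo_ring_next_slot(struct demo_ring *r)
{
	return r->tail++ & r->mask;
}
```

Both attributes are only hints; the generated code depends on the
compiler and the optimization level.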

Here is the result of compiling with Ubuntu clang
15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510

Without this patchset:

  $ wc -c src/liburing.so.2.3
  71288 src/liburing.so.2.3

With this patchset:

  $ wc -c src/liburing.so.2.3
  69448 src/liburing.so.2.3
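
For scale, the two byte counts above differ by 1840 bytes, roughly
2.58% of the original file. A small sketch of that arithmetic (the
helper names are made up for illustration):

```c
/* Bytes saved going from old_size to new_size (0 if the file grew). */
unsigned long size_saved(unsigned long old_size, unsigned long new_size)
{
	return old_size > new_size ? old_size - new_size : 0;
}

/* Saving in basis points (1/100 of a percent), rounded down. */
unsigned long size_saved_bp(unsigned long old_size, unsigned long new_size)
{
	if (old_size == 0)
		return 0;
	return size_saved(old_size, new_size) * 10000UL / old_size;
}
```

With the sizes reported here, size_saved(71288, 69448) is 1840 and
size_saved_bp(71288, 69448) is 258.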

Take one slow-path function as an example: using __cold avoids
aggressive inlining.

Without this patchset:

  00000000000024f0 <io_uring_queue_init>:
    24f0: pushq  %r14
    24f2: pushq  %rbx
    24f3: subq   $0x78,%rsp
    24f7: movq   %rsi,%r14
    24fa: xorps  %xmm0,%xmm0
    24fd: movaps %xmm0,(%rsp)
    2501: movaps %xmm0,0x60(%rsp)
    2506: movaps %xmm0,0x50(%rsp)
    250b: movaps %xmm0,0x40(%rsp)
    2510: movaps %xmm0,0x30(%rsp)
    2515: movaps %xmm0,0x20(%rsp)
    251a: movaps %xmm0,0x10(%rsp)
    251f: movq   $0x0,0x70(%rsp)
    2528: movl   %edx,0x8(%rsp)
    252c: movq   %rsp,%rsi
    252f: movl   $0x1a9,%eax
    2534: syscall
    2536: movq   %rax,%rbx
    2539: testl  %ebx,%ebx
    253b: js     256a <io_uring_queue_init+0x7a>
    253d: movq   %rsp,%rsi
    2540: movl   %ebx,%edi
    2542: movq   %r14,%rdx
    2545: callq  2080 <io_uring_queue_mmap@plt>
    254a: testl  %eax,%eax
    254c: je     255d <io_uring_queue_init+0x6d>
    254e: movl   %eax,%edx
    2550: movl   $0x3,%eax
    2555: movl   %ebx,%edi
    2557: syscall
    2559: movl   %edx,%ebx
    255b: jmp    256a <io_uring_queue_init+0x7a>
    255d: movl   0x14(%rsp),%eax
    2561: movl   %eax,0xc8(%r14)
    2568: xorl   %ebx,%ebx
    256a: movl   %ebx,%eax
    256c: addq   $0x78,%rsp
    2570: popq   %rbx
    2571: popq   %r14
    2573: retq

With this patchset:

  000000000000240c <io_uring_queue_init>:
    240c: subq   $0x78,%rsp
    2410: xorps  %xmm0,%xmm0
    2413: movq   %rsp,%rax
    2416: movaps %xmm0,(%rax)
    2419: movaps %xmm0,0x60(%rax)
    241d: movaps %xmm0,0x50(%rax)
    2421: movaps %xmm0,0x40(%rax)
    2425: movaps %xmm0,0x30(%rax)
    2429: movaps %xmm0,0x20(%rax)
    242d: movaps %xmm0,0x10(%rax)
    2431: movq   $0x0,0x70(%rax)
    2439: movl   %edx,0x8(%rax)
    243c: movq   %rax,%rdx
    243f: callq  2090 <io_uring_queue_init_params@plt>
    2444: addq   $0x78,%rsp
    2448: retq
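
The shape being compiled is roughly a thin wrapper that zeroes a
parameter struct, stores the flags, and forwards to a second setup
function. A simplified sketch of that pattern follows; the demo_* names
and the struct layout are hypothetical, not liburing's actual
definitions:

```c
#include <string.h>

#ifndef __cold
#define __cold	__attribute__((__cold__))
#endif

/* Hypothetical parameter block, for illustration only. */
struct demo_params {
	unsigned int sq_entries;
	unsigned int cq_entries;
	unsigned int flags;
};

/* The "real" setup work lives here. */
__cold int demo_queue_init_params(unsigned int entries,
				  struct demo_params *p)
{
	if (entries == 0)
		return -1;

	p->sq_entries = entries;
	p->cq_entries = 2 * entries;
	return 0;
}

/* Thin wrapper: zero the params, set the flags, forward the call. */
__cold int demo_queue_init(unsigned int entries, unsigned int flags)
{
	struct demo_params p;

	memset(&p, 0, sizeof(p));
	p.flags = flags;
	return demo_queue_init_params(entries, &p);
}
```

With the functions marked __cold, a compiler is typically less willing
to inline the callee into the wrapper, which matches the plain call
seen in the second listing. Whether a given compiler actually does so
is not guaranteed.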

Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (2):
  lib: Add __hot and __cold macros
  setup: Mark the exported functions as __cold

 src/lib.h   |  2 ++
 src/setup.c | 25 ++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)


base-commit: 98c14a04e2c0dcdfbb71372a1a209ed889fb3e4d
-- 
Ammar Faizi



* [PATCH liburing v1 1/2] lib: Add __hot and __cold macros
  2022-07-03 11:59 [PATCH liburing v1 0/2] __hot and __cold Ammar Faizi
@ 2022-07-03 11:59 ` Ammar Faizi
  2022-07-03 12:20   ` Alviro Iskandar Setiawan
  2022-07-03 11:59 ` [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold Ammar Faizi
  2022-07-03 13:00 ` [PATCH liburing v1 0/2] __hot and __cold Jens Axboe
  2 siblings, 1 reply; 6+ messages in thread
From: Ammar Faizi @ 2022-07-03 11:59 UTC
  To: Jens Axboe
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf,
	Hao Xu, Pavel Begunkov, io-uring Mailing List,
	GNU/Weeb Mailing List

From: Ammar Faizi <[email protected]>

A prep patch. These macros will be used to annotate hot and cold
functions. The __hot macro is not used yet; for now, only the __cold
macro is used.

Signed-off-by: Ammar Faizi <[email protected]>
---
 src/lib.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/lib.h b/src/lib.h
index 5844cd2..89a40f2 100644
--- a/src/lib.h
+++ b/src/lib.h
@@ -34,6 +34,8 @@
 #endif
 
 #define __maybe_unused		__attribute__((__unused__))
+#define __hot			__attribute__((__hot__))
+#define __cold			__attribute__((__cold__))
 
 void *__uring_malloc(size_t len);
 void __uring_free(void *p);
-- 
Ammar Faizi



* [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold
  2022-07-03 11:59 [PATCH liburing v1 0/2] __hot and __cold Ammar Faizi
  2022-07-03 11:59 ` [PATCH liburing v1 1/2] lib: Add __hot and __cold macros Ammar Faizi
@ 2022-07-03 11:59 ` Ammar Faizi
  2022-07-03 12:24   ` Alviro Iskandar Setiawan
  2022-07-03 13:00 ` [PATCH liburing v1 0/2] __hot and __cold Jens Axboe
  2 siblings, 1 reply; 6+ messages in thread
From: Ammar Faizi @ 2022-07-03 11:59 UTC
  To: Jens Axboe
  Cc: Ammar Faizi, Alviro Iskandar Setiawan, Fernanda Ma'rouf,
	Hao Xu, Pavel Begunkov, io-uring Mailing List,
	GNU/Weeb Mailing List

From: Ammar Faizi <[email protected]>

These functions are called at initialization, which is a slow path.
Mark them as __cold so that the compiler will optimize them for code
size.

Here is the result of compiling with Ubuntu clang
15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510

Without this patch:

  $ wc -c src/liburing.so.2.3
  71288 src/liburing.so.2.3

With this patch:

  $ wc -c src/liburing.so.2.3
  69448 src/liburing.so.2.3

Take one slow-path function as an example: using __cold avoids
aggressive inlining.

Without this patch:

  00000000000024f0 <io_uring_queue_init>:
    24f0: pushq  %r14
    24f2: pushq  %rbx
    24f3: subq   $0x78,%rsp
    24f7: movq   %rsi,%r14
    24fa: xorps  %xmm0,%xmm0
    24fd: movaps %xmm0,(%rsp)
    2501: movaps %xmm0,0x60(%rsp)
    2506: movaps %xmm0,0x50(%rsp)
    250b: movaps %xmm0,0x40(%rsp)
    2510: movaps %xmm0,0x30(%rsp)
    2515: movaps %xmm0,0x20(%rsp)
    251a: movaps %xmm0,0x10(%rsp)
    251f: movq   $0x0,0x70(%rsp)
    2528: movl   %edx,0x8(%rsp)
    252c: movq   %rsp,%rsi
    252f: movl   $0x1a9,%eax
    2534: syscall
    2536: movq   %rax,%rbx
    2539: testl  %ebx,%ebx
    253b: js     256a <io_uring_queue_init+0x7a>
    253d: movq   %rsp,%rsi
    2540: movl   %ebx,%edi
    2542: movq   %r14,%rdx
    2545: callq  2080 <io_uring_queue_mmap@plt>
    254a: testl  %eax,%eax
    254c: je     255d <io_uring_queue_init+0x6d>
    254e: movl   %eax,%edx
    2550: movl   $0x3,%eax
    2555: movl   %ebx,%edi
    2557: syscall
    2559: movl   %edx,%ebx
    255b: jmp    256a <io_uring_queue_init+0x7a>
    255d: movl   0x14(%rsp),%eax
    2561: movl   %eax,0xc8(%r14)
    2568: xorl   %ebx,%ebx
    256a: movl   %ebx,%eax
    256c: addq   $0x78,%rsp
    2570: popq   %rbx
    2571: popq   %r14
    2573: retq

With this patch:

  000000000000240c <io_uring_queue_init>:
    240c: subq   $0x78,%rsp
    2410: xorps  %xmm0,%xmm0
    2413: movq   %rsp,%rax
    2416: movaps %xmm0,(%rax)
    2419: movaps %xmm0,0x60(%rax)
    241d: movaps %xmm0,0x50(%rax)
    2421: movaps %xmm0,0x40(%rax)
    2425: movaps %xmm0,0x30(%rax)
    2429: movaps %xmm0,0x20(%rax)
    242d: movaps %xmm0,0x10(%rax)
    2431: movq   $0x0,0x70(%rax)
    2439: movl   %edx,0x8(%rax)
    243c: movq   %rax,%rdx
    243f: callq  2090 <io_uring_queue_init_params@plt>
    2444: addq   $0x78,%rsp
    2448: retq

Signed-off-by: Ammar Faizi <[email protected]>
---
 src/setup.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/src/setup.c b/src/setup.c
index d2adc7f..2badcc1 100644
--- a/src/setup.c
+++ b/src/setup.c
@@ -89,7 +89,8 @@ err:
  * Returns -errno on error, or zero on success.  On success, 'ring'
  * contains the necessary information to read/write to the rings.
  */
-int io_uring_queue_mmap(int fd, struct io_uring_params *p, struct io_uring *ring)
+__cold int io_uring_queue_mmap(int fd, struct io_uring_params *p,
+			       struct io_uring *ring)
 {
 	int ret;
 
@@ -107,7 +108,7 @@ int io_uring_queue_mmap(int fd, struct io_uring_params *p, struct io_uring *ring
  * Ensure that the mmap'ed rings aren't available to a child after a fork(2).
  * This uses madvise(..., MADV_DONTFORK) on the mmap'ed ranges.
  */
-int io_uring_ring_dontfork(struct io_uring *ring)
+__cold int io_uring_ring_dontfork(struct io_uring *ring)
 {
 	size_t len;
 	int ret;
@@ -138,8 +139,8 @@ int io_uring_ring_dontfork(struct io_uring *ring)
 	return 0;
 }
 
-int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
-			       struct io_uring_params *p)
+__cold int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
+				      struct io_uring_params *p)
 {
 	int fd, ret;
 
@@ -161,7 +162,8 @@ int io_uring_queue_init_params(unsigned entries, struct io_uring *ring,
  * Returns -errno on error, or zero on success. On success, 'ring'
  * contains the necessary information to read/write to the rings.
  */
-int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags)
+__cold int io_uring_queue_init(unsigned entries, struct io_uring *ring,
+			       unsigned flags)
 {
 	struct io_uring_params p;
 
@@ -171,7 +173,7 @@ int io_uring_queue_init(unsigned entries, struct io_uring *ring, unsigned flags)
 	return io_uring_queue_init_params(entries, ring, &p);
 }
 
-void io_uring_queue_exit(struct io_uring *ring)
+__cold void io_uring_queue_exit(struct io_uring *ring)
 {
 	struct io_uring_sq *sq = &ring->sq;
 	struct io_uring_cq *cq = &ring->cq;
@@ -191,7 +193,7 @@ void io_uring_queue_exit(struct io_uring *ring)
 	__sys_close(ring->ring_fd);
 }
 
-struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
+__cold struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
 {
 	struct io_uring_probe *probe;
 	size_t len;
@@ -211,7 +213,7 @@ struct io_uring_probe *io_uring_get_probe_ring(struct io_uring *ring)
 	return NULL;
 }
 
-struct io_uring_probe *io_uring_get_probe(void)
+__cold struct io_uring_probe *io_uring_get_probe(void)
 {
 	struct io_uring ring;
 	struct io_uring_probe *probe;
@@ -226,7 +228,7 @@ struct io_uring_probe *io_uring_get_probe(void)
 	return probe;
 }
 
-void io_uring_free_probe(struct io_uring_probe *probe)
+__cold void io_uring_free_probe(struct io_uring_probe *probe)
 {
 	uring_free(probe);
 }
@@ -284,7 +286,8 @@ static size_t rings_size(struct io_uring_params *p, unsigned entries,
  * return the required memory so that the caller can ensure that enough space
  * is available before setting up a ring with the specified parameters.
  */
-ssize_t io_uring_mlock_size_params(unsigned entries, struct io_uring_params *p)
+__cold ssize_t io_uring_mlock_size_params(unsigned entries,
+					  struct io_uring_params *p)
 {
 	struct io_uring_params lp = { };
 	struct io_uring ring;
@@ -343,7 +346,7 @@ ssize_t io_uring_mlock_size_params(unsigned entries, struct io_uring_params *p)
  * Return required ulimit -l memory space for a given ring setup. See
  * @io_uring_mlock_size_params().
  */
-ssize_t io_uring_mlock_size(unsigned entries, unsigned flags)
+__cold ssize_t io_uring_mlock_size(unsigned entries, unsigned flags)
 {
 	struct io_uring_params p = { .flags = flags, };
 
-- 
Ammar Faizi



* Re: [PATCH liburing v1 1/2] lib: Add __hot and __cold macros
  2022-07-03 11:59 ` [PATCH liburing v1 1/2] lib: Add __hot and __cold macros Ammar Faizi
@ 2022-07-03 12:20   ` Alviro Iskandar Setiawan
  0 siblings, 0 replies; 6+ messages in thread
From: Alviro Iskandar Setiawan @ 2022-07-03 12:20 UTC
  To: Ammar Faizi
  Cc: Jens Axboe, Fernanda Ma'rouf, Hao Xu, Pavel Begunkov,
	io-uring Mailing List, GNU/Weeb Mailing List

On Sun, Jul 3, 2022 at 6:59 PM Ammar Faizi wrote:
>
> From: Ammar Faizi <[email protected]>
>
> A prep patch. These macros will be used to annotate hot and cold
> functions. Currently, the __hot macro is not used, we will only use
> the __cold macro at the moment.
>
> Signed-off-by: Ammar Faizi <[email protected]>

Reviewed-by: Alviro Iskandar Setiawan <[email protected]>

tq

-- Viro


* Re: [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold
  2022-07-03 11:59 ` [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold Ammar Faizi
@ 2022-07-03 12:24   ` Alviro Iskandar Setiawan
  0 siblings, 0 replies; 6+ messages in thread
From: Alviro Iskandar Setiawan @ 2022-07-03 12:24 UTC
  To: Ammar Faizi
  Cc: Jens Axboe, Fernanda Ma'rouf, Hao Xu, Pavel Begunkov,
	io-uring Mailing List, GNU/Weeb Mailing List

On Sun, Jul 3, 2022 at 6:59 PM Ammar Faizi wrote:
>
> From: Ammar Faizi <[email protected]>
>
> These functions are called at initialization, which are slow-paths.
> Mark them as __cold so that the compiler will optimize for code size.
>
> Here is the result compiling with Ubuntu clang
> 15.0.0-++20220601012204+ec2711b35411-1~exp1~20220601012300.510
>
> Without this patch:
>
>   $ wc -c src/liburing.so.2.3
>   71288 src/liburing.so.2.3
>
> With this patch:
>
>   $ wc -c src/liburing.so.2.3
>   69448 src/liburing.so.2.3
>
> Take one slow-path function example, using __cold avoids aggressive
> inlining.
>
> Without this patch:
>
>   00000000000024f0 <io_uring_queue_init>:
>     24f0: pushq  %r14
>     24f2: pushq  %rbx
>     24f3: subq   $0x78,%rsp
>     24f7: movq   %rsi,%r14
>     24fa: xorps  %xmm0,%xmm0
>     24fd: movaps %xmm0,(%rsp)
>     2501: movaps %xmm0,0x60(%rsp)
>     2506: movaps %xmm0,0x50(%rsp)
>     250b: movaps %xmm0,0x40(%rsp)
>     2510: movaps %xmm0,0x30(%rsp)
>     2515: movaps %xmm0,0x20(%rsp)
>     251a: movaps %xmm0,0x10(%rsp)
>     251f: movq   $0x0,0x70(%rsp)
>     2528: movl   %edx,0x8(%rsp)
>     252c: movq   %rsp,%rsi
>     252f: movl   $0x1a9,%eax
>     2534: syscall
>     2536: movq   %rax,%rbx
>     2539: testl  %ebx,%ebx
>     253b: js     256a <io_uring_queue_init+0x7a>
>     253d: movq   %rsp,%rsi
>     2540: movl   %ebx,%edi
>     2542: movq   %r14,%rdx
>     2545: callq  2080 <io_uring_queue_mmap@plt>
>     254a: testl  %eax,%eax
>     254c: je     255d <io_uring_queue_init+0x6d>
>     254e: movl   %eax,%edx
>     2550: movl   $0x3,%eax
>     2555: movl   %ebx,%edi
>     2557: syscall
>     2559: movl   %edx,%ebx
>     255b: jmp    256a <io_uring_queue_init+0x7a>
>     255d: movl   0x14(%rsp),%eax
>     2561: movl   %eax,0xc8(%r14)
>     2568: xorl   %ebx,%ebx
>     256a: movl   %ebx,%eax
>     256c: addq   $0x78,%rsp
>     2570: popq   %rbx
>     2571: popq   %r14
>     2573: retq
>
> With this patch:
>
>   000000000000240c <io_uring_queue_init>:
>     240c: subq   $0x78,%rsp
>     2410: xorps  %xmm0,%xmm0
>     2413: movq   %rsp,%rax
>     2416: movaps %xmm0,(%rax)
>     2419: movaps %xmm0,0x60(%rax)
>     241d: movaps %xmm0,0x50(%rax)
>     2421: movaps %xmm0,0x40(%rax)
>     2425: movaps %xmm0,0x30(%rax)
>     2429: movaps %xmm0,0x20(%rax)
>     242d: movaps %xmm0,0x10(%rax)
>     2431: movq   $0x0,0x70(%rax)
>     2439: movl   %edx,0x8(%rax)
>     243c: movq   %rax,%rdx
>     243f: callq  2090 <io_uring_queue_init_params@plt>
>     2444: addq   $0x78,%rsp
>     2448: retq
>
> Signed-off-by: Ammar Faizi <[email protected]>

Reviewed-by: Alviro Iskandar Setiawan <[email protected]>

tq

-- Viro


* Re: [PATCH liburing v1 0/2] __hot and __cold
  2022-07-03 11:59 [PATCH liburing v1 0/2] __hot and __cold Ammar Faizi
  2022-07-03 11:59 ` [PATCH liburing v1 1/2] lib: Add __hot and __cold macros Ammar Faizi
  2022-07-03 11:59 ` [PATCH liburing v1 2/2] setup: Mark the exported functions as __cold Ammar Faizi
@ 2022-07-03 13:00 ` Jens Axboe
  2 siblings, 0 replies; 6+ messages in thread
From: Jens Axboe @ 2022-07-03 13:00 UTC
  To: ammarfaizi2
  Cc: alviro.iskandar, asml.silence, io-uring, howeyxu, fernandafmr12,
	gwml

On Sun, 3 Jul 2022 18:59:10 +0700, Ammar Faizi wrote:
> From: Ammar Faizi <[email protected]>
> 
> Hi Jens,
> 
> This series adds __hot and __cold macros. Currently, the __hot macro
> is not used. The __cold annotation hints the compiler to optimize for
> code size. This is good for the slow-path in the setup.c file.
> 
> [...]

Applied, thanks!

[1/2] lib: Add __hot and __cold macros
      commit: ee459df3c83ab86b84e1acaaa23c340efb5bab35
[2/2] setup: Mark the exported functions as __cold
      commit: 907c171fa4aac773fee9421bc38fcf9581e54f61

Best regards,
-- 
Jens Axboe


