From: Alviro Iskandar Setiawan <[email protected]>
To: Ammar Faizi <[email protected]>
Cc: Jens Axboe <[email protected]>,
Pavel Begunkov <[email protected]>,
Gilang Fachrezy <[email protected]>,
VNLX Kernel Department <[email protected]>,
"GNU/Weeb Mailing List" <[email protected]>,
io-uring Mailing List <[email protected]>
Subject: Re: [PATCH liburing v1 1/2] nolibc: Fix bloated memset due to unexpected vectorization
Date: Fri, 6 Jan 2023 22:56:48 +0700
Message-ID: <CAOG64qOo6Co0Z8i48MYyNbmA+arMbhktAGgpTrzBzJa3bqrORw@mail.gmail.com>
In-Reply-To: <[email protected]>

On Fri, Jan 6, 2023 at 10:43 PM Ammar Faizi wrote:
> Clang and GCC generate an insane vectorized memset() in nolibc.c.
> liburing doesn't need such a powerful memset(). Add an empty inline ASM
> to prevent the compilers from over-optimizing the memset().
>
> Just for comparison, see the following Assembly code (generated by
> Clang).
>
> Before this patch:
>
> ```
> 0000000000003a00 <__uring_memset>:
> 3a00: mov %rdi,%rax
> 3a03: test %rdx,%rdx
> 3a06: je 3b2c <__uring_memset+0x12c>
> 3a0c: cmp $0x8,%rdx
> 3a10: jae 3a19 <__uring_memset+0x19>
> 3a12: xor %ecx,%ecx
> 3a14: jmp 3b20 <__uring_memset+0x120>
> 3a19: movzbl %sil,%r8d
> 3a1d: cmp $0x20,%rdx
> 3a21: jae 3a2a <__uring_memset+0x2a>
> 3a23: xor %ecx,%ecx
> 3a25: jmp 3ae0 <__uring_memset+0xe0>
> 3a2a: mov %rdx,%rcx
> 3a2d: and $0xffffffffffffffe0,%rcx
> 3a31: movd %r8d,%xmm0
> 3a36: punpcklbw %xmm0,%xmm0
> 3a3a: pshuflw $0x0,%xmm0,%xmm0
> 3a3f: pshufd $0x0,%xmm0,%xmm0
> 3a44: lea -0x20(%rcx),%rdi
> 3a48: mov %rdi,%r10
> 3a4b: shr $0x5,%r10
> 3a4f: inc %r10
> 3a52: mov %r10d,%r9d
> 3a55: and $0x3,%r9d
> 3a59: cmp $0x60,%rdi
> 3a5d: jae 3a63 <__uring_memset+0x63>
> 3a5f: xor %edi,%edi
> 3a61: jmp 3aa9 <__uring_memset+0xa9>
> 3a63: and $0xfffffffffffffffc,%r10
> 3a67: xor %edi,%edi
> 3a69: nopl 0x0(%rax)
> 3a70: movdqu %xmm0,(%rax,%rdi,1)
> 3a75: movdqu %xmm0,0x10(%rax,%rdi,1)
> 3a7b: movdqu %xmm0,0x20(%rax,%rdi,1)
> 3a81: movdqu %xmm0,0x30(%rax,%rdi,1)
> 3a87: movdqu %xmm0,0x40(%rax,%rdi,1)
> 3a8d: movdqu %xmm0,0x50(%rax,%rdi,1)
> 3a93: movdqu %xmm0,0x60(%rax,%rdi,1)
> 3a99: movdqu %xmm0,0x70(%rax,%rdi,1)
> 3a9f: sub $0xffffffffffffff80,%rdi
> 3aa3: add $0xfffffffffffffffc,%r10
> 3aa7: jne 3a70 <__uring_memset+0x70>
> 3aa9: test %r9,%r9
> 3aac: je 3ad6 <__uring_memset+0xd6>
> 3aae: lea (%rdi,%rax,1),%r10
> 3ab2: add $0x10,%r10
> 3ab6: shl $0x5,%r9
> 3aba: xor %edi,%edi
> 3abc: nopl 0x0(%rax)
> 3ac0: movdqu %xmm0,-0x10(%r10,%rdi,1)
> 3ac7: movdqu %xmm0,(%r10,%rdi,1)
> 3acd: add $0x20,%rdi
> 3ad1: cmp %rdi,%r9
> 3ad4: jne 3ac0 <__uring_memset+0xc0>
> 3ad6: cmp %rdx,%rcx
> 3ad9: je 3b2c <__uring_memset+0x12c>
> 3adb: test $0x18,%dl
> 3ade: je 3b20 <__uring_memset+0x120>
> 3ae0: mov %rcx,%rdi
> 3ae3: mov %rdx,%rcx
> 3ae6: and $0xfffffffffffffff8,%rcx
> 3aea: movd %r8d,%xmm0
> 3aef: punpcklbw %xmm0,%xmm0
> 3af3: pshuflw $0x0,%xmm0,%xmm0
> 3af8: nopl 0x0(%rax,%rax,1)
> 3b00: movq %xmm0,(%rax,%rdi,1)
> 3b05: add $0x8,%rdi
> 3b09: cmp %rdi,%rcx
> 3b0c: jne 3b00 <__uring_memset+0x100>
> 3b0e: cmp %rdx,%rcx
> 3b11: je 3b2c <__uring_memset+0x12c>
> 3b13: data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
> 3b20: mov %sil,(%rax,%rcx,1)
> 3b24: inc %rcx
> 3b27: cmp %rcx,%rdx
> 3b2a: jne 3b20 <__uring_memset+0x120>
> 3b2c: ret
> 3b2d: nopl (%rax)
> ```
>
> After this patch:
>
> ```
> 0000000000003424 <__uring_memset>:
> 3424: mov %rdi,%rax
> 3427: test %rdx,%rdx
> 342a: je 343a <__uring_memset+0x16>
> 342c: xor %ecx,%ecx
> 342e: mov %sil,(%rax,%rcx,1)
> 3432: inc %rcx
> 3435: cmp %rcx,%rdx
> 3438: jne 342e <__uring_memset+0xa>
> 343a: ret
> ```
>
> Signed-off-by: Ammar Faizi <[email protected]>
Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
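
For readers following the thread, the technique described in the quoted
commit message (an empty inline ASM statement inside a byte-wise store
loop) can be sketched roughly as below. This is only a sketch under
stated assumptions, not the patch itself: the name sketch_memset is
hypothetical, and it assumes a GCC- or Clang-compatible compiler that
accepts the __asm__ volatile extension.

```c
#include <stddef.h>

/*
 * Hypothetical illustration; the name is not taken from liburing.
 * Fills n bytes at s with the byte value c and returns s, like memset().
 */
void *sketch_memset(void *s, int c, size_t n)
{
	unsigned char *p = s;
	size_t i;

	for (i = 0; i < n; i++) {
		p[i] = (unsigned char) c;

		/*
		 * Empty volatile asm statement: it emits no instructions,
		 * but the compiler must keep it in every iteration, which
		 * in practice stops the loop from being auto-vectorized
		 * or replaced with a library memset() call.
		 */
		__asm__ volatile ("");
	}

	return s;
}
```

The trade-off is deliberate: one store per iteration is slower on large
buffers, but the quoted disassembly suggests the function shrinks from
about 0x12d bytes to 0x17 bytes, which matches the stated goal of
keeping nolibc small.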

Thread overview: 6+ messages
2023-01-06 15:42 [PATCH liburing v1 0/2] liburing micro-optimzation Ammar Faizi
2023-01-06 15:42 ` [PATCH liburing v1 1/2] nolibc: Fix bloated memset due to unexpected vectorization Ammar Faizi
2023-01-06 15:56 ` Alviro Iskandar Setiawan [this message]
2023-01-06 15:42 ` [PATCH liburing v1 2/2] register: Simplify `io_uring_register_file_alloc_range()` function Ammar Faizi
2023-01-06 15:59 ` Alviro Iskandar Setiawan
2023-01-06 17:08 ` [PATCH liburing v1 0/2] liburing micro-optimzation Jens Axboe