From: Alviro Iskandar Setiawan <[email protected]>
To: Ammar Faizi <[email protected]>
Cc: Jens Axboe <[email protected]>,
	Pavel Begunkov <[email protected]>,
	Gilang Fachrezy <[email protected]>,
	VNLX Kernel Department <[email protected]>,
	"GNU/Weeb Mailing List" <[email protected]>,
	io-uring Mailing List <[email protected]>
Subject: Re: [PATCH liburing v1 1/2] nolibc: Fix bloated memset due to unexpected vectorization
Date: Fri, 6 Jan 2023 22:56:48 +0700
Message-ID: <CAOG64qOo6Co0Z8i48MYyNbmA+arMbhktAGgpTrzBzJa3bqrORw@mail.gmail.com>
In-Reply-To: <[email protected]>

On Fri, Jan 6, 2023 at 10:43 PM Ammar Faizi wrote:
> Clang and GCC generate an insane vectorized memset() in nolibc.c.
> liburing doesn't need such a powerful memset(). Add an empty inline ASM
> to prevent the compilers from over-optimizing the memset().
>
> For comparison, see the following assembly code (generated by
> Clang).
>
> Before this patch:
>
> ```
>   0000000000003a00 <__uring_memset>:
>     3a00:  mov    %rdi,%rax
>     3a03:  test   %rdx,%rdx
>     3a06:  je     3b2c <__uring_memset+0x12c>
>     3a0c:  cmp    $0x8,%rdx
>     3a10:  jae    3a19 <__uring_memset+0x19>
>     3a12:  xor    %ecx,%ecx
>     3a14:  jmp    3b20 <__uring_memset+0x120>
>     3a19:  movzbl %sil,%r8d
>     3a1d:  cmp    $0x20,%rdx
>     3a21:  jae    3a2a <__uring_memset+0x2a>
>     3a23:  xor    %ecx,%ecx
>     3a25:  jmp    3ae0 <__uring_memset+0xe0>
>     3a2a:  mov    %rdx,%rcx
>     3a2d:  and    $0xffffffffffffffe0,%rcx
>     3a31:  movd   %r8d,%xmm0
>     3a36:  punpcklbw %xmm0,%xmm0
>     3a3a:  pshuflw $0x0,%xmm0,%xmm0
>     3a3f:  pshufd $0x0,%xmm0,%xmm0
>     3a44:  lea    -0x20(%rcx),%rdi
>     3a48:  mov    %rdi,%r10
>     3a4b:  shr    $0x5,%r10
>     3a4f:  inc    %r10
>     3a52:  mov    %r10d,%r9d
>     3a55:  and    $0x3,%r9d
>     3a59:  cmp    $0x60,%rdi
>     3a5d:  jae    3a63 <__uring_memset+0x63>
>     3a5f:  xor    %edi,%edi
>     3a61:  jmp    3aa9 <__uring_memset+0xa9>
>     3a63:  and    $0xfffffffffffffffc,%r10
>     3a67:  xor    %edi,%edi
>     3a69:  nopl   0x0(%rax)
>     3a70:  movdqu %xmm0,(%rax,%rdi,1)
>     3a75:  movdqu %xmm0,0x10(%rax,%rdi,1)
>     3a7b:  movdqu %xmm0,0x20(%rax,%rdi,1)
>     3a81:  movdqu %xmm0,0x30(%rax,%rdi,1)
>     3a87:  movdqu %xmm0,0x40(%rax,%rdi,1)
>     3a8d:  movdqu %xmm0,0x50(%rax,%rdi,1)
>     3a93:  movdqu %xmm0,0x60(%rax,%rdi,1)
>     3a99:  movdqu %xmm0,0x70(%rax,%rdi,1)
>     3a9f:  sub    $0xffffffffffffff80,%rdi
>     3aa3:  add    $0xfffffffffffffffc,%r10
>     3aa7:  jne    3a70 <__uring_memset+0x70>
>     3aa9:  test   %r9,%r9
>     3aac:  je     3ad6 <__uring_memset+0xd6>
>     3aae:  lea    (%rdi,%rax,1),%r10
>     3ab2:  add    $0x10,%r10
>     3ab6:  shl    $0x5,%r9
>     3aba:  xor    %edi,%edi
>     3abc:  nopl   0x0(%rax)
>     3ac0:  movdqu %xmm0,-0x10(%r10,%rdi,1)
>     3ac7:  movdqu %xmm0,(%r10,%rdi,1)
>     3acd:  add    $0x20,%rdi
>     3ad1:  cmp    %rdi,%r9
>     3ad4:  jne    3ac0 <__uring_memset+0xc0>
>     3ad6:  cmp    %rdx,%rcx
>     3ad9:  je     3b2c <__uring_memset+0x12c>
>     3adb:  test   $0x18,%dl
>     3ade:  je     3b20 <__uring_memset+0x120>
>     3ae0:  mov    %rcx,%rdi
>     3ae3:  mov    %rdx,%rcx
>     3ae6:  and    $0xfffffffffffffff8,%rcx
>     3aea:  movd   %r8d,%xmm0
>     3aef:  punpcklbw %xmm0,%xmm0
>     3af3:  pshuflw $0x0,%xmm0,%xmm0
>     3af8:  nopl   0x0(%rax,%rax,1)
>     3b00:  movq   %xmm0,(%rax,%rdi,1)
>     3b05:  add    $0x8,%rdi
>     3b09:  cmp    %rdi,%rcx
>     3b0c:  jne    3b00 <__uring_memset+0x100>
>     3b0e:  cmp    %rdx,%rcx
>     3b11:  je     3b2c <__uring_memset+0x12c>
>     3b13:  data16 data16 data16 cs nopw 0x0(%rax,%rax,1)
>     3b20:  mov    %sil,(%rax,%rcx,1)
>     3b24:  inc    %rcx
>     3b27:  cmp    %rcx,%rdx
>     3b2a:  jne    3b20 <__uring_memset+0x120>
>     3b2c:  ret
>     3b2d:  nopl   (%rax)
> ```
>
> After this patch:
>
> ```
>   0000000000003424 <__uring_memset>:
>     3424:  mov    %rdi,%rax
>     3427:  test   %rdx,%rdx
>     342a:  je     343a <__uring_memset+0x16>
>     342c:  xor    %ecx,%ecx
>     342e:  mov    %sil,(%rax,%rcx,1)
>     3432:  inc    %rcx
>     3435:  cmp    %rcx,%rdx
>     3438:  jne    342e <__uring_memset+0xa>
>     343a:  ret
> ```
>
> Signed-off-by: Ammar Faizi <[email protected]>

Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
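
For readers of the archive, the technique described in the quoted commit
message can be sketched roughly as below. This is only an illustration,
not the patch itself: the names simple_memset() and check_simple_memset()
are hypothetical, and it assumes a compiler that accepts GNU-style inline
asm (GCC or Clang). The empty asm statement emits no instructions, but the
compiler has to keep it as an opaque step in every iteration, which is
what appears to keep the loop in its plain byte-by-byte form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the nolibc memset discussed above. */
static void *simple_memset(void *s, int c, size_t n)
{
	unsigned char *p = s;
	size_t i;

	for (i = 0; i < n; i++) {
		p[i] = (unsigned char)c;
		/*
		 * Empty inline asm: generates no code by itself, but it
		 * is opaque to the optimizer, which discourages turning
		 * this loop into a large vectorized sequence.
		 */
		__asm__ volatile ("");
	}
	return s;
}

/* Compare the result with libc memset() on a small buffer. */
static void check_simple_memset(void)
{
	unsigned char a[37], b[37];

	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));

	/* Fill the middle 35 bytes; the first and last byte stay 0. */
	assert(simple_memset(a + 1, 0xab, 35) == a + 1);
	memset(b + 1, 0xab, 35);
	assert(memcmp(a, b, sizeof(a)) == 0);
}
```

Whether the generated code is as small as the "After this patch" listing
depends on the compiler version and flags, so it is worth checking with
objdump -d on the actual build.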
