Date: Wed, 30 Aug 2023 17:23:22 +0200
From: Willy Tarreau
To: Ammar Faizi
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, Nicholas Rosenberg,
    Michael William Jonathan, GNU/Weeb Mailing List,
    Linux Kernel Mailing List
Subject: Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
References: <20230830135726.1939997-1-ammarfaizi2@gnuweeb.org>
    <20230830135726.1939997-3-ammarfaizi2@gnuweeb.org>

On Wed, Aug 30, 2023 at 10:09:51PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 09:24:45PM +0700, Alviro Iskandar Setiawan wrote:
> > Just a small idea to shrink this more, "mov %rdi, %rdx" and "mov %rdx,
> > %rax" can be replaced with "push %rdi" and "pop %rax" (they are just a
> > byte). So we can save 4 bytes more.
> >
> > 0000000000001500 :
> >     1500:  48 89 f0    mov    %rsi,%rax
> >     1503:  48 89 d1    mov    %rdx,%rcx
> >     1506:  57          push   %rdi
> >     1507:  f3 aa       rep stos %al,%es:(%rdi)
> >     1509:  58          pop    %rax
> >     150a:  c3          ret
> >
> > But I know you don't like it because it costs extra memory access.
>
> Yes, that's an extra memory access. But I believe it doesn't hurt
> someone targeting -Os. In many cases, the compilers use push/pop to
> align the stack before a 'call' instruction. If they want to avoid extra
> memory access, they could have used "subq $8, %rsp" and "addq $8, %rsp".

Then "xchg %esi, %eax" is just one byte with no memory access ;-)

Willy
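
To make the xchg suggestion concrete, here is a rough sketch of how the
routine might look with it applied. This is an illustration under stated
assumptions, not the code from the patch: the name sketch_memset is
hypothetical, the register-to-register moves around "rep stosb" are the ones
quoted earlier in the thread, and the plain C fallback exists only so the
example builds on non-x86-64 or non-ELF targets. It relies on the System V
AMD64 calling convention (dst in %rdi, c in %esi, len in %rdx) and on
"rep stosb" reading only %al as the fill byte.

```c
#include <stddef.h>

/* Hypothetical name, for illustration only. */
void *sketch_memset(void *dst, int c, size_t len);

#if defined(__x86_64__) && defined(__ELF__)
/*
 * Top-level (basic) asm, so register names take a single '%'.
 * Assumes the System V AMD64 ABI: dst=%rdi, c=%esi, len=%rdx,
 * and the direction flag is clear on function entry.
 */
__asm__ (
".pushsection .text\n"
".globl sketch_memset\n"
".type sketch_memset, @function\n"
"sketch_memset:\n"
"	xchgl	%esi, %eax\n"	/* fill byte into %al; old %eax lands in %esi */
"	movq	%rdx, %rcx\n"	/* byte count for the rep prefix */
"	movq	%rdi, %rdx\n"	/* keep dst, since stosb advances %rdi */
"	rep stosb\n"		/* store %al to (%rdi), %rcx times */
"	movq	%rdx, %rax\n"	/* return the original dst */
"	retq\n"
".size sketch_memset, .-sketch_memset\n"
".popsection\n"
);
#else
/* Portable fallback so the sketch still compiles elsewhere. */
void *sketch_memset(void *dst, int c, size_t len)
{
	unsigned char *p = dst;

	while (len--)
		*p++ = (unsigned char)c;
	return dst;
}
#endif
```

The swap leaves the previous %eax value in %esi, which should be harmless
because %rsi is a caller-clobbered argument register. Against the 3-byte
"mov %rsi,%rax" in the quoted disassembly, the one-byte xchg would save two
bytes without touching the stack.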