Date: Wed, 30 Aug 2023 17:51:52 +0200
From: Willy Tarreau
To: Ammar Faizi
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, Nicholas Rosenberg,
    Michael William Jonathan, GNU/Weeb Mailing List,
    Linux Kernel Mailing List
Subject: Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
References: <20230830135726.1939997-1-ammarfaizi2@gnuweeb.org>
    <20230830135726.1939997-3-ammarfaizi2@gnuweeb.org>

On Wed, Aug 30, 2023 at 10:44:53PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 05:23:22PM +0200, Willy Tarreau wrote:
> > Then "xchg %esi, %eax" is just one byte with no memory access ;-)
>
> Perfect!
>
> Now I got this, shorter than "movl %esi, %eax":
> ```
> 0000000000001500 :
>     1500: 96                      xchg   %eax,%esi
>     1501: 48 89 d1                mov    %rdx,%rcx
>     1504: 57                      push   %rdi
>     1505: f3 aa                   rep stos %al,%es:(%rdi)
>     1507: 58                      pop    %rax
>     1508: c3                      ret
> ```
>
> Unfortunately, the xchg trick doesn't yield smaller machine code for
> %rdx, %rcx. Lol.

Normal, that's because historically "xchg ax, regX" was a single-byte 0x9X
on 8086, then it turned to 32-bit keeping the same encoding, like many
instructions (note that NOP is encoded as xchg ax,ax). It remains short
when you can sacrifice the other register, or restore it later using yet
another xchg. For rcx/rdx a push/pop could do it as they should also be
a single-byte 0x5X even in long mode unless I'm mistaken. Thus if you
absolutely want to squeeze that 9th byte to end up with an 8-byte function
you could probably do:

    xchg %eax, %esi     1
    push %rdx           1
    pop %rcx            1
    push %rdi           1
    rep stosb           2
    pop %rax            1
    ret                 1
    -------------
    Total:              8 bytes :-)

Willy
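
For anyone who wants to try the sequence outside of nolibc, here is a minimal sketch, not the actual patch. It assumes a GNU-compatible toolchain on an x86-64 ELF target and the SysV calling convention (dst in %rdi, c in %esi, len in %rdx, direction flag clear on entry). The names tiny_memset() and check_tiny_memset() are made up for illustration, and the string instruction is written as "rep stosb" since this is memset(). Other targets get a plain C fallback so the file still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name, so it does not clash with the libc memset(). */
void *tiny_memset(void *dst, int c, size_t len);

#if defined(__x86_64__) && defined(__ELF__)
/*
 * On entry (SysV ABI): %rdi = dst, %esi = c, %rdx = len.
 * The numbers in the comments are the per-instruction byte counts
 * from the mail, 8 bytes in total.
 */
__asm__ (
".pushsection .text\n"
".globl tiny_memset\n"
".type tiny_memset, @function\n"
"tiny_memset:\n"
"	xchg %eax, %esi\n"	/* 1: fill byte ends up in %al */
"	push %rdx\n"		/* 1: with the pop below, */
"	pop  %rcx\n"		/* 1: copies len into %rcx */
"	push %rdi\n"		/* 1: save dst, the return value */
"	rep stosb\n"		/* 2: store %al to (%rdi), %rcx times */
"	pop  %rax\n"		/* 1: return the original dst */
"	ret\n"			/* 1 */
".size tiny_memset, .-tiny_memset\n"
".popsection\n"
);
#else
/* Portable fallback so the sketch still builds on other targets. */
void *tiny_memset(void *dst, int c, size_t len)
{
	unsigned char *p = dst;

	while (len--)
		*p++ = (unsigned char)c;
	return dst;
}
#endif

/*
 * Compare against the libc memset() for lengths 0..32, with guard
 * bytes on both sides. Returns the number of cases checked.
 */
int check_tiny_memset(void)
{
	unsigned char got[64], want[64];
	size_t len;
	int cases = 0;

	for (len = 0; len <= 32; len++) {
		memset(got, 0x11, sizeof(got));
		memset(want, 0x11, sizeof(want));
		assert(tiny_memset(got + 8, 0xa5, len) == got + 8);
		memset(want + 8, 0xa5, len);
		assert(memcmp(got, want, sizeof(got)) == 0);
		cases++;
	}
	return cases;
}
```

Calling check_tiny_memset() from a small main() and running objdump -d on the object file should be enough to confirm both the behaviour and the size on a given toolchain.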