From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on gnuweeb.org X-Spam-Level: X-Spam-Status: No, score=-0.2 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,LOTS_OF_MONEY,MONEY_NOHTML, URIBL_BLOCKED,URIBL_DBL_BLOCKED_OPENDNS autolearn=no autolearn_force=no version=3.4.6 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=gnuweeb.org; s=default; t=1693553259; bh=ksi0N4bkY4vYdPJRy7584Sb6QNv7wTjPKtpgW3gRMDI=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=h6Zh8YGHc9wyCl7XeR5kfndRlwlkWmvBho3inO3tFxWmIWGaELZFCfKpXP/+pnve4 wDGRj8NEcD4y9gPNiy9N8MfQmp6ih+SbDBTTK3SFXJXN2hGzLEYzLdcXHU8toRP1L4 m4PzIvQZMk6RPhYCdb2RTraXXQrLWsMKy9Lb+aJpNLMy4Qgu231GJaPuAp4VTQSlbd BkGH3OPSXtyimXKDqOSQidJfWrlidoFSVQFVv42x0aaznSu9KqyFWOiz94p2cGn0GC 7noDMvDolP8/TRls/WR57aPYhlaSkm9qqVGgZ1LfkR9ASVC673VE5vMxJCC3xaicAu ZHnBLDbvFnneQ== Received: from biznet-home.integral.gnuweeb.org (unknown [182.253.126.208]) by gnuweeb.org (Postfix) with ESMTPSA id A86D424B367; Fri, 1 Sep 2023 14:27:36 +0700 (WIB) Date: Fri, 1 Sep 2023 14:27:28 +0700 From: Ammar Faizi To: Willy Tarreau Cc: Thomas =?iso-8859-1?Q?Wei=DFschuh?= , Nicholas Rosenberg , Alviro Iskandar Setiawan , Michael William Jonathan , GNU/Weeb Mailing List , Linux Kernel Mailing List Subject: Re: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()` Message-ID: References: <20230830135726.1939997-1-ammarfaizi2@gnuweeb.org> <20230830135726.1939997-4-ammarfaizi2@gnuweeb.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Bpl: hUx9VaHkTWcLO7S8CQCslj6OzqBx2hfLChRz45nPESx5VSB/xuJQVOKOB1zSXE3yc9ntP27bV1M1 List-Id: On Fri, Sep 01, 2023 at 05:35:08AM +0200, Willy Tarreau wrote: > On Fri, Sep 01, 2023 at 10:24:42AM +0700, Ammar Faizi wrote: > > After thinking about this more, I think I'll drop the memcmp() patch > > because it will prevent optimization when comparing a small value. > > > > For example, without __asm__: > > > > memcmp(var, "abcd", 4); > > > > may compile to: > > > > cmpl $0x64636261, %reg > > ...something... > > > > But with __asm__, the compiler can't do that. Thus, it's not worth > > optimizing the memcmp() in this case. > > Ah you're totally right! So, it turns out that such assumption is wrong. The compiler cannot optimize the current memcmp() into that. I just posted a question on SO: https://stackoverflow.com/questions/77020562/what-prevents-the-compiler-from-optimizing-a-hand-written-memcmp Given: ``` bool test_data(void *data) { return memcmp(data, "abcd", 4) == 0; } ``` The result when using default the memcmp (good): ``` test_data: cmpl $1684234849, (%rdi) sete %al ret ``` The result when using nolibc memcmp() (bad): ``` test_data: cmpb $97, (%rdi) jne .L5 cmpb $98, 1(%rdi) jne .L5 cmpb $99, 2(%rdi) jne .L5 cmpb $100, 3(%rdi) sete %al ret .L5: xorl %eax, %eax ret ``` Link: https://godbolt.org/z/TT94r3bvf This is because apart from the input length, the current nolibc `memcmp()` must stop comparing the next byte if it finds a non-match byte. Imagine what happens if we call: ``` char xstr[] = {'a', 'b', 'x'}; test_data(x); ``` In that case, the compiler may read past xstr if it uses a dword cmp, it can also lead to segfault in particular circumstances using a dword cmp. What the current nolibc memcmp() does from the C language view: 1) Compare one byte at a time. 2) Must stop comparing the next byte if it finds a non-match byte. Because point (2) comes in, the compiler is not allowed to optimize nolibc memcmp() into a wider load; otherwise, it may hit a segfault. That also means it cannot vectorize the memcmp() loop. On the other hand, memcpy() and memset() don't have such a restriction so they can vectorize. The real memcmp() assumes that both sources are at least `n` length in size, allowing for a wider load. The current nolibc memcmp() implementation doesn't reflect that assumption in the C code. IOW, the real built-in memcmp() is undefined behavior for this code: ``` char x = 'q'; return memcmp(&x, "abcd", 4); ``` but the current nolibc memcmp() is well-defined behavior (well, must be, as what the C code reflects). We can improve nolibc memcmp() by casting the sources to a wider type like (ulong, uint, ushort). But that's another story for another RFC patchset. -- Ammar Faizi