From: David Laight
To: 'Willy Tarreau', Ammar Faizi
CC: Thomas Weißschuh, "Nicholas Rosenberg", Alviro Iskandar Setiawan, Michael William Jonathan, GNU/Weeb Mailing List, Linux Kernel Mailing List
Subject: RE: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
Date: Mon, 4 Sep 2023 08:26:49 +0000
References: <20230830135726.1939997-1-ammarfaizi2@gnuweeb.org> <20230830135726.1939997-4-ammarfaizi2@gnuweeb.org>

From: Willy Tarreau
> Sent: 30 August 2023 22:27
>
> On Wed, Aug 30, 2023 at 08:57:24PM +0700, Ammar Faizi wrote:
> > Simplify memcmp() on the x86-64 arch.
> >
> > The x86-64 arch has a 'rep cmpsb' instruction, which can be used to
> > implement the memcmp() function.
> >
> >     %rdi = source 1
> >     %rsi = source 2
> >     %rcx = length
> >
> > Signed-off-by: Ammar Faizi
> > ---
> >  tools/include/nolibc/arch-x86_64.h | 19 +++++++++++++++++++
> >  tools/include/nolibc/string.h      |  2 ++
> >  2 files changed, 21 insertions(+)
> >
> > diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
> > index 42f2674ad1ecdd64..6c1b54ba9f774e7b 100644
> > --- a/tools/include/nolibc/arch-x86_64.h
> > +++ b/tools/include/nolibc/arch-x86_64.h
> > @@ -214,4 +214,23 @@ __asm__ (
> >  	"retq\n"
> >  );
> >
> > +#define NOLIBC_ARCH_HAS_MEMCMP
> > +static int memcmp(const void *s1, const void *s2, size_t n)
> > +{
> > +	const unsigned char *p1 = s1;
> > +	const unsigned char *p2 = s2;
> > +
> > +	if (!n)
> > +		return 0;
> > +
> > +	__asm__ volatile (
> > +		"rep cmpsb"
> > +		: "+D"(p2), "+S"(p1), "+c"(n)
> > +		: "m"(*(const unsigned char (*)[n])s1),
> > +		  "m"(*(const unsigned char (*)[n])s2)
> > +	);
> > +
> > +	return p1[-1] - p2[-1];
> > +}
>
> Out of curiosity, given that you implemented the 3 other ones directly
> in an asm statement, is there a particular reason this one mixes a bit
> of C and asm ?
> It would probably be something around this, in the same
> vein:
>
>   memcmp:
>     xchg  %esi,%eax   // source1

Aren't the arguments in %rdi, %rsi and %rdx so you only
need to move the count (below) ?
(Looks like you copied memchr())

	David

>     mov   %rdx,%rcx   // count
>     rep   cmpsb       // source2 in rdi; sets ZF on equal, CF if src1<src2
>     seta  %al         // 0 if src2 <= src1, 1 if src2 > src1
>     sbb   $0, %al     // 0 if src2 == src1, -1 if src2 < src1, 1 if src2 > src1
>     movsx %al, %eax   // sign extend to %eax
>     ret
>
> Note that the output logic could have to be revisited, I'm not certain but
> at first glance it looks valid.
>
> Regards,
> Willy

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
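
For reference, a minimal sketch of deriving the memcmp() return value from the flags left by the string compare, which is what the thread is circling around. It is not the code from the patch or from either reply: the name memcmp_cmpsb() is hypothetical, it uses a seta/setb pair rather than the seta/sbb sequence quoted above, and it assumes GCC or Clang extended asm on x86-64 (a plain C loop is used elsewhere). The n == 0 check stays in C because with a zero count the instruction never executes and the flags would be stale.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only (not the nolibc memcmp()).
 * Returns <0, 0 or >0 like memcmp(), comparing bytes as unsigned char.
 */
static int memcmp_cmpsb(const void *s1, const void *s2, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	unsigned char above, below;

	/* With a zero count cmpsb never runs, so the flags mean nothing. */
	if (!n)
		return 0;

	__asm__ volatile (
		"repe cmpsb\n\t"	/* compare (%%rsi) with (%%rdi), stop at first mismatch */
		"seta %0\n\t"		/* 1 if the last s1 byte was above the s2 byte */
		"setb %1"		/* 1 if the last s1 byte was below the s2 byte */
		: "=&q"(above), "=&q"(below), "+S"(s1), "+D"(s2), "+c"(n)
		:
		: "cc", "memory"
	);

	/* Both are 0 when every byte matched (ZF set, CF clear). */
	return (int)above - (int)below;
#else
	/* Portable fallback so the sketch builds on other targets. */
	const unsigned char *p1 = s1;
	const unsigned char *p2 = s2;

	while (n--) {
		if (*p1 != *p2)
			return *p1 - *p2;
		p1++;
		p2++;
	}
	return 0;
#endif
}
```

Here s1 goes in %rsi and s2 in %rdi, so CF after the compare means "s1 byte below s2 byte". The "memory" clobber is a blunt stand-in for the two array-typed "m" inputs used in the patch.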