public inbox for [email protected]
* [RFC PATCH v1 0/5] nolibc x86-64 string functions
@ 2023-08-30 13:57 Ammar Faizi
  2023-08-30 13:57 ` [RFC PATCH v1 1/5] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 13:57 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

Hi Willy,

This is an RFC patchset for nolibc x86-64 string functions. There are 5
patches in this series.

## Patches 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
    - memcpy() and memmove()
    - memset()
    - memcmp()
respectively. These simplify the generated ASM code.

Patches 4 and 5 are unrelated, just small cleanups.

## Patch 4: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today memmove() has its own
implementation.

## Patch 5: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(); there is no real reason to
have this wrapper. Delete this function and move the code into memcpy()
directly.

Before this series:
```
  0000000000001479 <memmove>:
    1479: f3 0f 1e fa           endbr64 
    147d: 48 39 f7              cmp    %rsi,%rdi
    1480: 48 c7 c1 ff ff ff ff  mov    $0xffffffffffffffff,%rcx
    1487: 48 89 f8              mov    %rdi,%rax
    148a: 48 0f 43 ca           cmovae %rdx,%rcx
    148e: 48 19 ff              sbb    %rdi,%rdi
    1491: 83 e7 02              and    $0x2,%edi
    1494: 48 ff cf              dec    %rdi
    1497: 48 85 d2              test   %rdx,%rdx
    149a: 74 10                 je     14ac <memmove+0x33>
    149c: 48 01 f9              add    %rdi,%rcx
    149f: 48 ff ca              dec    %rdx
    14a2: 44 8a 04 0e           mov    (%rsi,%rcx,1),%r8b
    14a6: 44 88 04 08           mov    %r8b,(%rax,%rcx,1)
    14aa: eb eb                 jmp    1497 <memmove+0x1e>
    14ac: c3                    ret

  00000000000014ad <memcpy>:
    14ad: f3 0f 1e fa           endbr64 
    14b1: 48 89 f8              mov    %rdi,%rax
    14b4: 31 c9                 xor    %ecx,%ecx
    14b6: 48 39 ca              cmp    %rcx,%rdx
    14b9: 74 0d                 je     14c8 <memcpy+0x1b>
    14bb: 40 8a 3c 0e           mov    (%rsi,%rcx,1),%dil
    14bf: 40 88 3c 08           mov    %dil,(%rax,%rcx,1)
    14c3: 48 ff c1              inc    %rcx
    14c6: eb ee                 jmp    14b6 <memcpy+0x9>
    14c8: c3                    ret

  00000000000014c9 <memset>:
    14c9: f3 0f 1e fa           endbr64 
    14cd: 48 89 f8              mov    %rdi,%rax
    14d0: 31 c9                 xor    %ecx,%ecx
    14d2: 48 39 ca              cmp    %rcx,%rdx
    14d5: 74 09                 je     14e0 <memset+0x17>
    14d7: 40 88 34 08           mov    %sil,(%rax,%rcx,1)
    14db: 48 ff c1              inc    %rcx
    14de: eb f2                 jmp    14d2 <memset+0x9>
    14e0: c3                    ret
```

After this series:
```
  000000000000140a <memmove>:
    140a: 48 89 f8              mov    %rdi,%rax
    140d: 48 89 d1              mov    %rdx,%rcx
    1410: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
    1415: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
    141a: fd                    std    
    141b: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    141d: fc                    cld
    141e: c3                    ret

  000000000000141f <memcpy>:
    141f: 48 89 f8              mov    %rdi,%rax
    1422: 48 89 d1              mov    %rdx,%rcx
    1425: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    1427: c3                    ret

  0000000000001428 <memset>:
    1428: 48 89 f0              mov    %rsi,%rax
    142b: 48 89 d1              mov    %rdx,%rcx
    142e: 48 89 fa              mov    %rdi,%rdx
    1431: f3 aa                 rep stos %al,%es:(%rdi)
    1433: 48 89 d0              mov    %rdx,%rax
    1436: c3                    ret
```

Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (5):
  tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function

 tools/include/nolibc/arch-x86_64.h | 60 ++++++++++++++++++++++++++++++
 tools/include/nolibc/string.h      | 38 ++++++++-----------
 2 files changed, 75 insertions(+), 23 deletions(-)


base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
-- 
Ammar Faizi



* [RFC PATCH v1 1/5] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
@ 2023-08-30 13:57 ` Ammar Faizi
  2023-08-30 13:57 ` [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 13:57 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

Simplify memcpy() and memmove() on the x86-64 arch.

The x86-64 arch has a 'rep movsb' instruction, which can perform
memcpy() using only a single instruction, given:

    %rdi = destination
    %rsi = source
    %rcx = length

It can also handle the overlapping case by setting DF=1 (backward
copy), which can be used as the memmove() implementation.
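
For illustration, a minimal C model of the register contract above (my
sketch, not part of the patch; `rep_movsb_model` and `df` are made-up
names, `df` mirroring the direction flag):
```
static void rep_movsb_model(unsigned char *rdi, const unsigned char *rsi,
                            unsigned long rcx, int df)
{
	/* Copy one byte per iteration; DF picks the direction. */
	while (rcx--) {
		*rdi = *rsi;
		rdi += df ? -1 : 1;
		rsi += df ? -1 : 1;
	}
}
```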

Before this patch:
```
  00000000000010ab <memmove>:
    10ab: 48 89 f8              mov    %rdi,%rax
    10ae: 31 c9                 xor    %ecx,%ecx
    10b0: 48 39 f7              cmp    %rsi,%rdi
    10b3: 48 83 d1 ff           adc    $0xffffffffffffffff,%rcx
    10b7: 48 85 d2              test   %rdx,%rdx
    10ba: 74 25                 je     10e1 <memmove+0x36>
    10bc: 48 83 c9 01           or     $0x1,%rcx
    10c0: 48 39 f0              cmp    %rsi,%rax
    10c3: 48 c7 c7 ff ff ff ff  mov    $0xffffffffffffffff,%rdi
    10ca: 48 0f 43 fa           cmovae %rdx,%rdi
    10ce: 48 01 cf              add    %rcx,%rdi
    10d1: 44 8a 04 3e           mov    (%rsi,%rdi,1),%r8b
    10d5: 44 88 04 38           mov    %r8b,(%rax,%rdi,1)
    10d9: 48 01 cf              add    %rcx,%rdi
    10dc: 48 ff ca              dec    %rdx
    10df: 75 f0                 jne    10d1 <memmove+0x26>
    10e1: c3                    ret

  00000000000010e2 <memcpy>:
    10e2: 48 89 f8              mov    %rdi,%rax
    10e5: 48 85 d2              test   %rdx,%rdx
    10e8: 74 12                 je     10fc <memcpy+0x1a>
    10ea: 31 c9                 xor    %ecx,%ecx
    10ec: 40 8a 3c 0e           mov    (%rsi,%rcx,1),%dil
    10f0: 40 88 3c 08           mov    %dil,(%rax,%rcx,1)
    10f4: 48 ff c1              inc    %rcx
    10f7: 48 39 ca              cmp    %rcx,%rdx
    10fa: 75 f0                 jne    10ec <memcpy+0xa>
    10fc: c3                    ret
```

After this patch:
```
  0000000000001093 <memmove>:
    1093: 48 89 f8              mov    %rdi,%rax
    1096: 48 89 d1              mov    %rdx,%rcx
    1099: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
    109e: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
    10a3: fd                    std
    10a4: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    10a6: fc                    cld
    10a7: c3                    ret

  00000000000010a8 <memcpy>:
    10a8: 48 89 f8              mov    %rdi,%rax
    10ab: 48 89 d1              mov    %rdx,%rcx
    10ae: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    10b0: c3                    ret
```

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/arch-x86_64.h | 28 ++++++++++++++++++++++++++++
 tools/include/nolibc/string.h      |  4 ++++
 2 files changed, 32 insertions(+)

diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index e5ccb926c90306b6..c5162170a2ccdff1 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -173,4 +173,32 @@ void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_
 	__builtin_unreachable();
 }
 
+#define NOLIBC_ARCH_HAS_MEMMOVE
+void *memmove(void *dst, const void *src, size_t len);
+
+#define NOLIBC_ARCH_HAS_MEMCPY
+void *memcpy(void *dst, const void *src, size_t len);
+
+__asm__ (
+".section .text.nolibc_memmove\n"
+".weak memmove\n"
+"memmove:\n"
+	"movq %rdi, %rax\n"
+	"movq %rdx, %rcx\n"
+	"leaq -1(%rdi, %rcx), %rdi\n"
+	"leaq -1(%rsi, %rcx), %rsi\n"
+	"std\n"
+	"rep movsb\n"
+	"cld\n"
+	"retq\n"
+
+".section .text.nolibc_memcpy\n"
+".weak memcpy\n"
+"memcpy:\n"
+	"movq %rdi, %rax\n"
+	"movq %rdx, %rcx\n"
+	"rep movsb\n"
+	"retq\n"
+);
+
 #endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 0c2e06c7c4772bc6..6eca267ec6fa7177 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -49,6 +49,7 @@ void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
 	return dst;
 }
 
+#ifndef NOLIBC_ARCH_HAS_MEMMOVE
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
@@ -72,13 +73,16 @@ void *memmove(void *dst, const void *src, size_t len)
 	}
 	return dst;
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
 
+#ifndef NOLIBC_ARCH_HAS_MEMCPY
 /* must be exported, as it's used by libgcc on ARM */
 __attribute__((weak,unused,section(".text.nolibc_memcpy")))
 void *memcpy(void *dst, const void *src, size_t len)
 {
 	return _nolibc_memcpy_up(dst, src, len);
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
 
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
-- 
Ammar Faizi



* [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
  2023-08-30 13:57 ` [RFC PATCH v1 1/5] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
@ 2023-08-30 13:57 ` Ammar Faizi
  2023-08-30 14:08   ` Alviro Iskandar Setiawan
  2023-08-30 14:24   ` Alviro Iskandar Setiawan
  2023-08-30 13:57 ` [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()` Ammar Faizi
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 13:57 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

Simplify memset() on the x86-64 arch.

The x86-64 arch has a 'rep stosb' instruction, which can perform
memset() using only a single instruction, given:

    %al  = value (just like the second argument of memset())
    %rdi = destination
    %rcx = length
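
For illustration, a minimal C model of that contract (my sketch, not
part of the patch; `rep_stosb_model` is a made-up name):
```
static void rep_stosb_model(unsigned char *rdi, unsigned char al,
                            unsigned long rcx)
{
	/* Store %al into (%rdi), advance %rdi, count %rcx down (DF=0). */
	while (rcx--)
		*rdi++ = al;
}
```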

Before this patch:
```
  00000000000010c9 <memset>:
    10c9: 48 89 f8              mov    %rdi,%rax
    10cc: 48 85 d2              test   %rdx,%rdx
    10cf: 74 0e                 je     10df <memset+0x16>
    10d1: 31 c9                 xor    %ecx,%ecx
    10d3: 40 88 34 08           mov    %sil,(%rax,%rcx,1)
    10d7: 48 ff c1              inc    %rcx
    10da: 48 39 ca              cmp    %rcx,%rdx
    10dd: 75 f4                 jne    10d3 <memset+0xa>
    10df: c3                    ret
```

After this patch:
```
  00000000000010b1 <memset>:
    10b1: 48 89 f0              mov    %rsi,%rax
    10b4: 48 89 d1              mov    %rdx,%rcx
    10b7: 48 89 fa              mov    %rdi,%rdx
    10ba: f3 aa                 rep stos %al,%es:(%rdi)
    10bc: 48 89 d0              mov    %rdx,%rax
    10bf: c3                    ret
```

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/arch-x86_64.h | 13 +++++++++++++
 tools/include/nolibc/string.h      |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index c5162170a2ccdff1..42f2674ad1ecdd64 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -179,6 +179,9 @@ void *memmove(void *dst, const void *src, size_t len);
 #define NOLIBC_ARCH_HAS_MEMCPY
 void *memcpy(void *dst, const void *src, size_t len);
 
+#define NOLIBC_ARCH_HAS_MEMSET
+void *memset(void *dst, int c, size_t len);
+
 __asm__ (
 ".section .text.nolibc_memmove\n"
 ".weak memmove\n"
@@ -199,6 +202,16 @@ __asm__ (
 	"movq %rdx, %rcx\n"
 	"rep movsb\n"
 	"retq\n"
+
+".section .text.nolibc_memset\n"
+".weak memset\n"
+"memset:\n"
+	"movq %rsi, %rax\n"
+	"movq %rdx, %rcx\n"
+	"movq %rdi, %rdx\n"
+	"rep stosb\n"
+	"movq %rdx, %rax\n"
+	"retq\n"
 );
 
 #endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 6eca267ec6fa7177..1bad6121ef8c4ab5 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -84,6 +84,7 @@ void *memcpy(void *dst, const void *src, size_t len)
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
 
+#ifndef NOLIBC_ARCH_HAS_MEMSET
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
@@ -99,6 +100,7 @@ void *memset(void *dst, int b, size_t len)
 	}
 	return dst;
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
 
 static __attribute__((unused))
 char *strchr(const char *s, int c)
-- 
Ammar Faizi



* [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
  2023-08-30 13:57 ` [RFC PATCH v1 1/5] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
  2023-08-30 13:57 ` [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
@ 2023-08-30 13:57 ` Ammar Faizi
  2023-08-30 21:26   ` Willy Tarreau
  2023-08-30 13:57 ` [RFC PATCH v1 4/5] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 13:57 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

Simplify memcmp() on the x86-64 arch.

The x86-64 arch has a 'rep cmpsb' instruction, which can be used to
implement the memcmp() function.

    %rdi = source 1
    %rsi = source 2
    %rcx = length
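
As a sanity check of the return value used in the patch below:
`rep cmpsb` leaves %rsi/%rdi one byte past either the first mismatch or
the last byte compared, so `p1[-1] - p2[-1]` is the difference of the
deciding byte pair (or 0 on a full match). A tiny hypothetical test,
which must hold for any conforming memcmp(), including this one:
```
#include <assert.h>
#include <string.h>

int main(void)
{
	assert(memcmp("abcd", "abcd", 4) == 0);
	assert(memcmp("abcx", "abcd", 4) > 0);	/* 'x' - 'd' > 0 */
	assert(memcmp("abad", "abcd", 4) < 0);	/* 'a' - 'c' < 0 */
	return 0;
}
```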

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/arch-x86_64.h | 19 +++++++++++++++++++
 tools/include/nolibc/string.h      |  2 ++
 2 files changed, 21 insertions(+)

diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index 42f2674ad1ecdd64..6c1b54ba9f774e7b 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -214,4 +214,23 @@ __asm__ (
 	"retq\n"
 );
 
+#define NOLIBC_ARCH_HAS_MEMCMP
+static int memcmp(const void *s1, const void *s2, size_t n)
+{
+	const unsigned char *p1 = s1;
+	const unsigned char *p2 = s2;
+
+	if (!n)
+		return 0;
+
+	__asm__ volatile (
+		"rep cmpsb"
+		: "+D"(p2), "+S"(p1), "+c"(n)
+		: "m"(*(const unsigned char (*)[n])s1),
+		  "m"(*(const unsigned char (*)[n])s2)
+	);
+
+	return p1[-1] - p2[-1];
+}
+
 #endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 1bad6121ef8c4ab5..3c941289d5dd0985 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -15,6 +15,7 @@ static void *malloc(size_t len);
  * As much as possible, please keep functions alphabetically sorted.
  */
 
+#ifndef NOLIBC_ARCH_HAS_MEMCMP
 static __attribute__((unused))
 int memcmp(const void *s1, const void *s2, size_t n)
 {
@@ -26,6 +27,7 @@ int memcmp(const void *s1, const void *s2, size_t n)
 	}
 	return c1;
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCMP */
 
 static __attribute__((unused))
 void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
-- 
Ammar Faizi



* [RFC PATCH v1 4/5] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
                   ` (2 preceding siblings ...)
  2023-08-30 13:57 ` [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()` Ammar Faizi
@ 2023-08-30 13:57 ` Ammar Faizi
  2023-08-30 21:27   ` Willy Tarreau
  2023-08-30 13:57 ` [RFC PATCH v1 5/5] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 13:57 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today memmove() has its own
implementation.

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/string.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 3c941289d5dd0985..b0d7d8e143f61741 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -41,16 +41,6 @@ void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
 	return dst;
 }
 
-static __attribute__((unused))
-void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
-{
-	while (len) {
-		len--;
-		((char *)dst)[len] = ((const char *)src)[len];
-	}
-	return dst;
-}
-
 #ifndef NOLIBC_ARCH_HAS_MEMMOVE
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
-- 
Ammar Faizi



* [RFC PATCH v1 5/5] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
                   ` (3 preceding siblings ...)
  2023-08-30 13:57 ` [RFC PATCH v1 4/5] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
@ 2023-08-30 13:57 ` Ammar Faizi
  2023-08-30 21:32 ` [RFC PATCH v1 0/5] nolibc x86-64 string functions Willy Tarreau
  2023-09-01 11:34 ` David Laight
  6 siblings, 0 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 13:57 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

This function is only called by memcpy(); there is no real reason to
have this wrapper. Delete this function and move the code into memcpy()
directly.

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/string.h | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index b0d7d8e143f61741..920a0a6c922dbc42 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -29,18 +29,6 @@ int memcmp(const void *s1, const void *s2, size_t n)
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMCMP */
 
-static __attribute__((unused))
-void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
-{
-	size_t pos = 0;
-
-	while (pos < len) {
-		((char *)dst)[pos] = ((const char *)src)[pos];
-		pos++;
-	}
-	return dst;
-}
-
 #ifndef NOLIBC_ARCH_HAS_MEMMOVE
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
@@ -72,7 +60,13 @@ void *memmove(void *dst, const void *src, size_t len)
 __attribute__((weak,unused,section(".text.nolibc_memcpy")))
 void *memcpy(void *dst, const void *src, size_t len)
 {
-	return _nolibc_memcpy_up(dst, src, len);
+	size_t pos = 0;
+
+	while (pos < len) {
+		((char *)dst)[pos] = ((const char *)src)[pos];
+		pos++;
+	}
+	return dst;
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
 
-- 
Ammar Faizi



* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 13:57 ` [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
@ 2023-08-30 14:08   ` Alviro Iskandar Setiawan
  2023-08-30 14:13     ` Ammar Faizi
  2023-08-30 14:24   ` Alviro Iskandar Setiawan
  1 sibling, 1 reply; 29+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-08-30 14:08 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 8:57 PM Ammar Faizi wrote:
> +".section .text.nolibc_memset\n"
> +".weak memset\n"
> +"memset:\n"
> +       "movq %rsi, %rax\n"
> +       "movq %rdx, %rcx\n"
> +       "movq %rdi, %rdx\n"
> +       "rep stosb\n"
> +       "movq %rdx, %rax\n"
> +       "retq\n"

The first instruction could be:

   movl %esi, %eax

That's smaller. Also, the second argument of memset() is an int
anyway, so there is no need to have a full 64-bit copy of %rsi in
%rax.

-- Viro


* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 14:08   ` Alviro Iskandar Setiawan
@ 2023-08-30 14:13     ` Ammar Faizi
  0 siblings, 0 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 14:13 UTC (permalink / raw)
  To: Alviro Iskandar Setiawan
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 09:08:05PM +0700, Alviro Iskandar Setiawan wrote:
> The first instruction could be:
> 
>    movl %esi, %eax
> 
> That's smaller. Also, the second argument of memset() is an int
> anyway, so there is no need to have a full 64-bit copy of %rsi in
> %rax.

Agree, noted.

-- 
Ammar Faizi



* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 13:57 ` [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
  2023-08-30 14:08   ` Alviro Iskandar Setiawan
@ 2023-08-30 14:24   ` Alviro Iskandar Setiawan
  2023-08-30 15:09     ` Ammar Faizi
  1 sibling, 1 reply; 29+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-08-30 14:24 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 8:57 PM Ammar Faizi wrote:
>   00000000000010b1 <memset>:
>     10b1: 48 89 f0              mov    %rsi,%rax
>     10b4: 48 89 d1              mov    %rdx,%rcx
>     10b7: 48 89 fa              mov    %rdi,%rdx
>     10ba: f3 aa                 rep stos %al,%es:(%rdi)
>     10bc: 48 89 d0              mov    %rdx,%rax
>     10bf: c3                    ret

Just a small idea to shrink this more: "mov %rdi, %rdx" and "mov %rdx,
%rax" can be replaced with "push %rdi" and "pop %rax" (they are just
one byte each). So we can save 4 more bytes.

0000000000001500 <memset>:
    1500: 48 89 f0     mov    %rsi,%rax
    1503: 48 89 d1     mov    %rdx,%rcx
    1506: 57           push   %rdi
    1507: f3 aa        rep stos %al,%es:(%rdi)
    1509: 58           pop    %rax
    150a: c3           ret

But I know you don't like it because it costs extra memory access.

-- Viro


* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 14:24   ` Alviro Iskandar Setiawan
@ 2023-08-30 15:09     ` Ammar Faizi
  2023-08-30 15:23       ` Willy Tarreau
  0 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 15:09 UTC (permalink / raw)
  To: Alviro Iskandar Setiawan
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 09:24:45PM +0700, Alviro Iskandar Setiawan wrote:
> Just a small idea to shrink this more: "mov %rdi, %rdx" and "mov %rdx,
> %rax" can be replaced with "push %rdi" and "pop %rax" (they are just
> one byte each). So we can save 4 more bytes.
> 
> 0000000000001500 <memset>:
>     1500: 48 89 f0     mov    %rsi,%rax
>     1503: 48 89 d1     mov    %rdx,%rcx
>     1506: 57           push   %rdi
>     1507: f3 aa        rep stos %al,%es:(%rdi)
>     1509: 58           pop    %rax
>     150a: c3           ret
> 
> But I know you don't like it because it costs extra memory access.

Yes, that's an extra memory access. But I believe it doesn't hurt
someone targeting -Os. In many cases, compilers use push/pop to
align the stack before a 'call' instruction. If they wanted to avoid
the extra memory access, they could have used "subq $8, %rsp" and
"addq $8, %rsp".

For example: https://godbolt.org/z/Tzc1xWGEn

C code:
```
int fx(int b);
int fy(int a)
{
    return 1 + fx(a);
}
```

Targeting -Os, both clang and gcc compile it to:
```
fy:
    pushq   %rax
    call    fx
    popq    %rdx
    incl    %eax
    ret
```

Targeting -O2:
```
fy:
    subq    $8, %rsp
    call    fx
    addq    $8, %rsp
    addl    $1, %eax
    ret
```

That pushq/popq pair doesn't actually preserve anything; it's just
there to align %rsp to 16 bytes at the 'call'. IOW, sometimes taking
an extra memory access to get a smaller code size is acceptable.

-- 
Ammar Faizi



* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 15:09     ` Ammar Faizi
@ 2023-08-30 15:23       ` Willy Tarreau
  2023-08-30 15:44         ` Ammar Faizi
  0 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2023-08-30 15:23 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Alviro Iskandar Setiawan, Thomas Weißschuh,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 10:09:51PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 09:24:45PM +0700, Alviro Iskandar Setiawan wrote:
> > Just a small idea to shrink this more: "mov %rdi, %rdx" and "mov %rdx,
> > %rax" can be replaced with "push %rdi" and "pop %rax" (they are just
> > one byte each). So we can save 4 more bytes.
> > 
> > 0000000000001500 <memset>:
> >     1500: 48 89 f0     mov    %rsi,%rax
> >     1503: 48 89 d1     mov    %rdx,%rcx
> >     1506: 57           push   %rdi
> >     1507: f3 aa        rep stos %al,%es:(%rdi)
> >     1509: 58           pop    %rax
> >     150a: c3           ret
> > 
> > But I know you don't like it because it costs extra memory access.
> 
> Yes, that's an extra memory access. But I believe it doesn't hurt
> someone targeting -Os. In many cases, compilers use push/pop to
> align the stack before a 'call' instruction. If they wanted to avoid
> the extra memory access, they could have used "subq $8, %rsp" and
> "addq $8, %rsp".

Then "xchg %esi, %eax" is just one byte with no memory access ;-)

Willy


* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 15:23       ` Willy Tarreau
@ 2023-08-30 15:44         ` Ammar Faizi
  2023-08-30 15:51           ` Willy Tarreau
  0 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 15:44 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Alviro Iskandar Setiawan, Thomas Weißschuh,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 05:23:22PM +0200, Willy Tarreau wrote:
> Then "xchg %esi, %eax" is just one byte with no memory access ;-)

Perfect!

Now I got this, shorter than "movl %esi, %eax":
```
0000000000001500 <memset>:
    1500: 96          xchg   %eax,%esi
    1501: 48 89 d1    mov    %rdx,%rcx
    1504: 57          push   %rdi
    1505: f3 aa       rep stos %al,%es:(%rdi)
    1507: 58          pop    %rax
    1508: c3          ret
```

Unfortunately, the xchg trick doesn't yield smaller machine code for
%rdx, %rcx. Lol.

-- 
Ammar Faizi



* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 15:44         ` Ammar Faizi
@ 2023-08-30 15:51           ` Willy Tarreau
  2023-08-30 16:08             ` Ammar Faizi
  0 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2023-08-30 15:51 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Alviro Iskandar Setiawan, Thomas Weißschuh,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 10:44:53PM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 05:23:22PM +0200, Willy Tarreau wrote:
> > Then "xchg %esi, %eax" is just one byte with no memory access ;-)
> 
> Perfect!
> 
> Now I got this, shorter than "movl %esi, %eax":
> ```
> 0000000000001500 <memset>:
>     1500: 96          xchg   %eax,%esi
>     1501: 48 89 d1    mov    %rdx,%rcx
>     1504: 57          push   %rdi
>     1505: f3 aa       rep stos %al,%es:(%rdi)
>     1507: 58          pop    %rax
>     1508: c3          ret
> ```
> 
> Unfortunately, the xchg trick doesn't yield smaller machine code for
> %rdx, %rcx. Lol.

Normal, that's because historically "xchg ax, regX" was a single-byte 0x9X
on 8086, then it turned to 32-bit keeping the same encoding, like many
instructions (note that NOP is encoded as xchg ax,ax). It remains short
when you can sacrifice the other register, or restore it later using yet
another xchg. For rcx/rdx a push/pop could do it as they should also be
a single-byte 0x5X even in long mode, unless I'm mistaken. Thus if you
absolutely want to squeeze that 9th byte to end up with an 8-byte function,
you could probably do:

    xchg %eax, %esi      1
    push %rdx            1
    pop %rcx             1
    push %rdi            1
    rep stosb            2
    pop %rax             1
    ret                  1
    ------------- Total: 8 bytes :-)

Willy


* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 15:51           ` Willy Tarreau
@ 2023-08-30 16:08             ` Ammar Faizi
  2023-08-30 16:11               ` Alviro Iskandar Setiawan
  0 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-08-30 16:08 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Alviro Iskandar Setiawan, Thomas Weißschuh,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 05:51:52PM +0200, Willy Tarreau wrote:
> Normal, that's because historically "xchg ax, regX" was a single-byte 0x9X
> on 8086, then it turned to 32-bit keeping the same encoding, like many
> instructions (note that NOP is encoded as xchg ax,ax). It remains short
> when you can sacrifice the other register, or restore it later using yet
> another xchg. For rcx/rdx a push/pop could do it as they should also be
> a single-byte 0x5X even in long mode, unless I'm mistaken. Thus if you
> absolutely want to squeeze that 9th byte to end up with an 8-byte function,
> you could probably do:
> 
>     xchg %eax, %esi      1
>     push %rdx            1
>     pop %rcx             1
>     push %rdi            1
>     rep stosb            2
>     pop %rax             1
>     ret                  1
>     ------------- Total: 8 bytes :-)

Fun!

We're not playing code golf here, though. So I think I will leave the
"mov %rdx, %rcx" as is. Otherwise, I would be tempted to do that all
over the place.

-- 
Ammar Faizi



* Re: [RFC PATCH v1 2/5] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-08-30 16:08             ` Ammar Faizi
@ 2023-08-30 16:11               ` Alviro Iskandar Setiawan
  0 siblings, 0 replies; 29+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-08-30 16:11 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 11:08 PM Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 05:51:52PM +0200, Willy Tarreau wrote:
> >     xchg %eax, %esi      1
> >     push %rdx            1
> >     pop %rcx             1
> >     push %rdi            1
> >     rep stosb            2
> >     pop %rax             1
> >     ret                  1
> >     ------------- Total: 8 bytes :-)

That's beautiful!

-- Viro


* Re: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  2023-08-30 13:57 ` [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()` Ammar Faizi
@ 2023-08-30 21:26   ` Willy Tarreau
  2023-09-01  3:24     ` Ammar Faizi
  2023-09-04  8:26     ` David Laight
  0 siblings, 2 replies; 29+ messages in thread
From: Willy Tarreau @ 2023-08-30 21:26 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 08:57:24PM +0700, Ammar Faizi wrote:
> Simplify memcmp() on the x86-64 arch.
> 
> The x86-64 arch has a 'rep cmpsb' instruction, which can be used to
> implement the memcmp() function.
> 
>     %rdi = source 1
>     %rsi = source 2
>     %rcx = length
> 
> Signed-off-by: Ammar Faizi <[email protected]>
> ---
>  tools/include/nolibc/arch-x86_64.h | 19 +++++++++++++++++++
>  tools/include/nolibc/string.h      |  2 ++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
> index 42f2674ad1ecdd64..6c1b54ba9f774e7b 100644
> --- a/tools/include/nolibc/arch-x86_64.h
> +++ b/tools/include/nolibc/arch-x86_64.h
> @@ -214,4 +214,23 @@ __asm__ (
>  	"retq\n"
>  );
>  
> +#define NOLIBC_ARCH_HAS_MEMCMP
> +static int memcmp(const void *s1, const void *s2, size_t n)
> +{
> +	const unsigned char *p1 = s1;
> +	const unsigned char *p2 = s2;
> +
> +	if (!n)
> +		return 0;
> +
> +	__asm__ volatile (
> +		"rep cmpsb"
> +		: "+D"(p2), "+S"(p1), "+c"(n)
> +		: "m"(*(const unsigned char (*)[n])s1),
> +		  "m"(*(const unsigned char (*)[n])s2)
> +	);
> +
> +	return p1[-1] - p2[-1];
> +}

Out of curiosity, given that you implemented the 3 other ones directly
in an asm statement, is there a particular reason this one mixes a bit
of C and asm ? It would probably be something around this, in the same
vein:

  memcmp:
    xchg  %esi,%eax   // source1
    mov   %rdx,%rcx   // count
    rep   cmpsb       // source2 in rdi; sets ZF on equal, CF if src1<src2
    seta  %al         // 0 if src2 <= src1, 1 if src2 > src1
    sbb   $0, %al     // 0 if src2 == src1, -1 if src2 < src1, 1 if src2 > src1
    movsx %al, %eax   // sign extend to %eax
    ret

Note that the output logic could have to be revisited, I'm not certain but
at first glance it looks valid.
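
To double-check that logic: 'rep cmpsb' computes (%rsi) - (%rdi) for
the deciding byte pair, so ZF means equal and CF means (%rsi) below
(%rdi). A minimal C model of just the seta/sbb mapping (my sketch,
hypothetical name):

  static int flags_to_result(int zf, int cf)
  {
      int al = (!cf && !zf);  /* seta: 1 only when strictly above */
      al -= cf;               /* sbb $0,%al: -1 when below        */
      return al;              /* +1, 0 or -1                      */
  }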

Regards,
Willy


* Re: [RFC PATCH v1 4/5] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  2023-08-30 13:57 ` [RFC PATCH v1 4/5] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
@ 2023-08-30 21:27   ` Willy Tarreau
  0 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2023-08-30 21:27 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 08:57:25PM +0700, Ammar Faizi wrote:
> This nolibc internal function is not used. Delete it. It was probably
> supposed to handle memmove(), but today memmove() has its own
> implementation.

Yeah good catch for these two, I'm pretty sure as well they were used
for memmove().

Thanks!
Willy


* Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
                   ` (4 preceding siblings ...)
  2023-08-30 13:57 ` [RFC PATCH v1 5/5] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
@ 2023-08-30 21:32 ` Willy Tarreau
  2023-09-01 11:34 ` David Laight
  6 siblings, 0 replies; 29+ messages in thread
From: Willy Tarreau @ 2023-08-30 21:32 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

Hi Ammar,

On Wed, Aug 30, 2023 at 08:57:21PM +0700, Ammar Faizi wrote:
> Hi Willy,
> 
> This is an RFC patchset for nolibc x86-64 string functions. There are 5
> patches in this series.
> 
> ## Patches 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
>     - memcpy() and memmove()
>     - memset()
>     - memcmp()
> respectively. These simplify the generated ASM code.
> 
> Patches 4 and 5 are unrelated, just small cleanups.

So overall I'm fine with this; I think it's reasonable. As you said,
we're not trying to chase the very last byte, but for such functions
it's also nice if they can remain small. Some of them might even
benefit from being inlined, by the way (in that case they'd rather
move to C functions with an asm() statement), because the call
instruction and the register moves or spilling code will generally
be larger than the functions themselves. That might be worth checking.
Ah no, we cannot, because some of them are called from libgcc and friends.
Or we may need to mark them inline and weak without static; I'm not sure
how well that works.

Please just let me know if you intend to change a few things based on
previous comments, and also this memcmp() stuff that's both C and asm.

Thanks!
Willy


* Re: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  2023-08-30 21:26   ` Willy Tarreau
@ 2023-09-01  3:24     ` Ammar Faizi
  2023-09-01  3:35       ` Willy Tarreau
  2023-09-04  8:26     ` David Laight
  1 sibling, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-09-01  3:24 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Wed, Aug 30, 2023 at 11:26:57PM +0200, Willy Tarreau wrote:
> Out of curiosity, given that you implemented the 3 other ones directly
> in an asm statement, is there a particular reason this one mixes a bit
> of C and asm ?

Because this one may be unused. The others are explicitly exported.

> It would probably be something around this, in the same vein:
> 
>   memcmp:
>     xchg  %esi,%eax   // source1
>     mov   %rdx,%rcx   // count
>     rep   cmpsb       // source2 in rdi; sets ZF on equal, CF if src1<src2
>     seta  %al         // 0 if src2 <= src1, 1 if src2 > src1
>     sbb   $0, %al     // 0 if src2 == src1, -1 if src2 < src1, 1 if src2 > src1
>     movsx %al, %eax   // sign extend to %eax
>     ret
> 
> Note that the output logic could have to be revisited, I'm not certain but
> at first glance it looks valid.

After thinking about this more, I think I'll drop the memcmp() patch
because it will prevent optimization when comparing a small value.

For example, without __asm__:

    memcmp(var, "abcd", 4);

may compile to:

    cmpl $0x64636261, %reg
    ...something...

But with __asm__, the compiler can't do that. Thus, it's not worth
optimizing the memcmp() in this case.

-- 
Ammar Faizi



* Re: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  2023-09-01  3:24     ` Ammar Faizi
@ 2023-09-01  3:35       ` Willy Tarreau
  2023-09-01  7:27         ` Ammar Faizi
  0 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2023-09-01  3:35 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Fri, Sep 01, 2023 at 10:24:42AM +0700, Ammar Faizi wrote:
> On Wed, Aug 30, 2023 at 11:26:57PM +0200, Willy Tarreau wrote:
> > Out of curiosity, given that you implemented the 3 other ones directly
> > in an asm statement, is there a particular reason this one mixes a bit
> > of C and asm ?
> 
> Because this one may be unused. The others are explicitly exported.

Makes sense, indeed.

> > It would probably be something around this, in the same vein:
> > 
> >   memcmp:
> >     xchg  %esi,%eax   // source1
> >     mov   %rdx,%rcx   // count
> >     rep   cmpsb       // source2 in rdi; sets ZF on equal, CF if src1<src2
> >     seta  %al         // 0 if src2 <= src1, 1 if src2 > src1
> >     sbb   $0, %al     // 0 if src2 == src1, -1 if src2 < src1, 1 if src2 > src1
> >     movsx %al, %eax   // sign extend to %eax
> >     ret
> > 
> > Note that the output logic could have to be revisited, I'm not certain but
> > at first glance it looks valid.
> 
> After thinking about this more, I think I'll drop the memcmp() patch
> because it will prevent optimization when comparing a small value.
> 
> For example, without __asm__:
> 
>     memcmp(var, "abcd", 4);
> 
> may compile to:
> 
>     cmpl $0x64636261, %reg
>     ...something...
> 
> But with __asm__, the compiler can't do that. Thus, it's not worth
> optimizing the memcmp() in this case.

Ah you're totally right!

Willy


* Re: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  2023-09-01  3:35       ` Willy Tarreau
@ 2023-09-01  7:27         ` Ammar Faizi
  0 siblings, 0 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-09-01  7:27 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Fri, Sep 01, 2023 at 05:35:08AM +0200, Willy Tarreau wrote:
> On Fri, Sep 01, 2023 at 10:24:42AM +0700, Ammar Faizi wrote:
> > After thinking about this more, I think I'll drop the memcmp() patch
> > because it will prevent optimization when comparing a small value.
> > 
> > For example, without __asm__:
> > 
> >     memcmp(var, "abcd", 4);
> > 
> > may compile to:
> > 
> >     cmpl $0x64636261, %reg
> >     ...something...
> > 
> > But with __asm__, the compiler can't do that. Thus, it's not worth
> > optimizing the memcmp() in this case.
> 
> Ah you're totally right!

So, it turns out that assumption is wrong. The compiler cannot
optimize the current memcmp() into that. I just posted a question on SO:

   https://stackoverflow.com/questions/77020562/what-prevents-the-compiler-from-optimizing-a-hand-written-memcmp

Given:
```
  #include <stdbool.h>
  #include <string.h>

  bool test_data(void *data)
  {
          return memcmp(data, "abcd", 4) == 0;
  }
```

The result when using the default <string.h> memcmp() (good;
1684234849 is 0x64636261, i.e. "abcd" as a little-endian dword):
```
  test_data:
      cmpl    $1684234849, (%rdi)
      sete    %al
      ret
```

The result when using nolibc memcmp() (bad):
```
  test_data:
      cmpb    $97, (%rdi)
      jne     .L5
      cmpb    $98, 1(%rdi)
      jne     .L5
      cmpb    $99, 2(%rdi)
      jne     .L5
      cmpb    $100, 3(%rdi)
      sete    %al
      ret
  .L5:
      xorl    %eax, %eax
      ret
```

Link: https://godbolt.org/z/TT94r3bvf

This is because, apart from respecting the input length, the current
nolibc `memcmp()` must stop comparing as soon as it finds a
non-matching byte. Imagine what happens if we call:

```
  char xstr[] = {'a', 'b', 'x'};
  test_data(xstr);
```

In that case, a dword cmp may read past the end of xstr; in particular
circumstances, it can even hit a segfault.

What the current nolibc memcmp() does from the C language view:

  1) Compare one byte at a time.
  2) Stop comparing as soon as a non-matching byte is found.

Because of point (2), the compiler is not allowed to optimize the
nolibc memcmp() into wider loads; otherwise, it may hit a segfault.
That also means it cannot vectorize the memcmp() loop.

On the other hand, memcpy() and memset() don't have such a restriction,
so they can be vectorized.

The real memcmp() assumes that both sources are at least `n` bytes in
size, allowing for wider loads. The current nolibc memcmp()
implementation doesn't reflect that assumption in the C code.

IOW, the real built-in memcmp() is undefined behavior for this code:
```
    char x = 'q';
    return memcmp(&x, "abcd", 4);
```
but the current nolibc memcmp() has well-defined behavior (well, it
must, given what the C code expresses).

We can improve the nolibc memcmp() by casting the sources to a wider
type (ulong, uint, ushort). But that's another story for another RFC
patchset.
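
A rough sketch of that idea (mine, untested; `memcmp_wide` is a
hypothetical name; like the real memcmp(), it assumes both buffers are
at least `n` bytes):
```
#include <stdint.h>
#include <string.h>

static int memcmp_wide(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p1 = s1, *p2 = s2;

	/* Compare a dword at a time; memcpy() makes the unaligned
	 * loads well-defined and compiles to a plain mov on x86-64.
	 */
	while (n >= 4) {
		uint32_t a, b;

		memcpy(&a, p1, 4);
		memcpy(&b, p2, 4);
		if (a != b)
			break;	/* locate the deciding byte below */
		p1 += 4; p2 += 4; n -= 4;
	}
	while (n--) {
		if (*p1 != *p2)
			return *p1 - *p2;
		p1++; p2++;
	}
	return 0;
}
```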

-- 
Ammar Faizi



* RE: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-08-30 13:57 [RFC PATCH v1 0/5] nolibc x86-64 string functions Ammar Faizi
                   ` (5 preceding siblings ...)
  2023-08-30 21:32 ` [RFC PATCH v1 0/5] nolibc x86-64 string functions Willy Tarreau
@ 2023-09-01 11:34 ` David Laight
  2023-09-01 11:46   ` Willy Tarreau
  6 siblings, 1 reply; 29+ messages in thread
From: David Laight @ 2023-09-01 11:34 UTC (permalink / raw)
  To: 'Ammar Faizi', Willy Tarreau, Thomas Weißschuh
  Cc: Nicholas Rosenberg, Alviro Iskandar Setiawan,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi <[email protected]>
> Sent: 30 August 2023 14:57
> 
> This is an RFC patchset for nolibc x86-64 string functions. There are 5
> patches in this series.
> 
> ## Patches 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
>     - memcpy() and memmove()
>     - memset()
>     - memcmp()
> respectively. These simplify the generated ASM code.
> 
...
> After this series:
> ```
>   000000000000140a <memmove>:
>     140a: 48 89 f8              mov    %rdi,%rax
>     140d: 48 89 d1              mov    %rdx,%rcx
>     1410: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
>     1415: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
>     141a: fd                    std
>     141b: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
>     141d: fc                    cld
>     141e: c3                    ret

Isn't that completely broken?

You need to select between forwards and backwards moves.
Since forwards moves are preferred, it is best to do
	if (dst - src < len)
		backwards_copy()
	else
		forwards_copy()

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-09-01 11:34 ` David Laight
@ 2023-09-01 11:46   ` Willy Tarreau
  2023-09-01 13:06     ` Ammar Faizi
  0 siblings, 1 reply; 29+ messages in thread
From: Willy Tarreau @ 2023-09-01 11:46 UTC (permalink / raw)
  To: David Laight
  Cc: 'Ammar Faizi', Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Fri, Sep 01, 2023 at 11:34:18AM +0000, David Laight wrote:
> From: Ammar Faizi <[email protected]>
> > Sent: 30 August 2023 14:57
> > 
> > This is an RFC patchset for nolibc x86-64 string functions. There are 5
> > patches in this series.
> > 
> > ## Patches 1-3: Use `rep movsb`, `rep stosb`, and `rep cmpsb` for:
> >     - memcpy() and memmove()
> >     - memset()
> >     - memcmp()
> > respectively. These simplify the generated ASM code.
> > 
> ...
> > After this series:
> > ```
> >   000000000000140a <memmove>:
> >     140a: 48 89 f8              mov    %rdi,%rax
> >     140d: 48 89 d1              mov    %rdx,%rcx
> >     1410: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
> >     1415: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
> >     141a: fd                    std
> >     141b: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
> >     141d: fc                    cld
> >     141e: c3                    ret
> 
> Isn't that completely broken?
> 
> You need to select between forwards and backwards moves.
> Since forwards moves are preferred, it is best to do
> 	if (dst - src < len)
> 		backwards_copy()
> 	else
> 		forwards_copy()
> 
> 	David

You're completely right indeed, reminds me about the copy_up/copy_down
that were not used anymore :-)

Willy


* Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-09-01 11:46   ` Willy Tarreau
@ 2023-09-01 13:06     ` Ammar Faizi
  2023-09-01 14:23       ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-09-01 13:06 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: David Laight, Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Fri, Sep 01, 2023 at 01:46:44PM +0200, Willy Tarreau wrote:
> On Fri, Sep 01, 2023 at 11:34:18AM +0000, David Laight wrote:
> > Isn't that completely broken?
> > 
> > You need to select between forwards and backwards moves.
> > Since forwards moves are preferred, it is best to do
> > 	if (dst - src < len)
> > 		backwards_copy()
> > 	else
> > 		forwards_copy()
> > 
> > 	David
> 
> You're completely right indeed, reminds me about the copy_up/copy_down
> that were not used anymore :-)

I'm an idiot, will fix that. Another attempt as suggested below:

__asm__ (
".section .text.nolibc_memmove\n"
".weak memmove\n"
"memmove:\n"
"    movq    %rdx, %rcx\n"
"    movq    %rdi, %rdx\n"
"    movq    %rdi, %rax\n"
"    subq    %rsi, %rdx\n"
"    cmpq    %rcx, %rdx\n"
"    jnb     .Lforward_copy\n"
"    leaq    -1(%rdi, %rcx, 1), %rdi\n"
"    leaq    -1(%rsi, %rcx, 1), %rsi\n"
"    std\n"
".Lforward_copy:\n"
"    rep movsb\n"
"    cld\n"
"    ret\n"
);
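
The single comparison works because the subtraction is unsigned. A C
model of the test (illustration only; `need_backward_copy` is a
made-up helper):
```
#include <stdint.h>

/* True only when dst lands inside [src, src + len): a forward copy
 * would then clobber source bytes before they are read. When
 * dst < src, the subtraction wraps to a huge value and the test is
 * false, so we copy forward.
 */
static int need_backward_copy(const void *dst, const void *src,
                              uintptr_t len)
{
	return (uintptr_t)dst - (uintptr_t)src < len;
}
```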

-- 
Ammar Faizi



* RE: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-09-01 13:06     ` Ammar Faizi
@ 2023-09-01 14:23       ` David Laight
  2023-09-01 14:41         ` Ammar Faizi
  0 siblings, 1 reply; 29+ messages in thread
From: David Laight @ 2023-09-01 14:23 UTC (permalink / raw)
  To: 'Ammar Faizi', Willy Tarreau
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

From: Ammar Faizi
> Sent: 01 September 2023 14:06
...
> > You're completely right indeed, reminds me about the copy_up/copy_down
> > that were not used anymore :-)
> 
> I'm an idiot, will fix that. Another attempt as suggested below:
> 
> __asm__ (
> ".section .text.nolibc_memmove\n"
> ".weak memmove\n"
> "memmove:\n"
> "    movq    %rdx, %rcx\n"
> "    movq    %rdi, %rdx\n"
> "    movq    %rdi, %rax\n"

You seem to have confused yourself about whether you are using %eax or %edx.

> "    subq    %rsi, %rdx\n"
> "    cmpq    %rcx, %rdx\n"
> "    jnb     .Lforward_copy\n"

I think I'd fall through to the forwards copy
and not worry about replicating the 'rep movsb' and 'ret'.
IIRC 'cld' can be slow as well.

> "    leaq    -1(%rdi, %rcx, 1), %rdi\n"
> "    leaq    -1(%rsi, %rcx, 1), %rsi\n"
> "    std\n"
> ".Lforward_copy:\n"
> "    rep movsb\n"
> "    cld\n"
> "    ret\n"
> );
> 
> --
> Ammar Faizi

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-09-01 14:23       ` David Laight
@ 2023-09-01 14:41         ` Ammar Faizi
  2023-09-01 14:54           ` David Laight
  0 siblings, 1 reply; 29+ messages in thread
From: Ammar Faizi @ 2023-09-01 14:41 UTC (permalink / raw)
  To: David Laight
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Fri, Sep 01, 2023 at 02:23:28PM +0000, David Laight wrote:
> From: Ammar Faizi
> > Sent: 01 September 2023 14:06
...
> > __asm__ (
> > ".section .text.nolibc_memmove\n"
> > ".weak memmove\n"
> > "memmove:\n"
> > "    movq    %rdx, %rcx\n"
> > "    movq    %rdi, %rdx\n"
> > "    movq    %rdi, %rax\n"
> 
> You seem to have confused yourself about whether you are using %eax or %edx.

What do you mean? They're all 64-bit pointers.

What I know is that %rdx will be clobbered by "subq %rsi, %rdx"
below and %rax should hold the return value. That's why I copy %rdi
twice. memmove() returns the dst pointer. Did I miss something?

> > "    subq    %rsi, %rdx\n"
> > "    cmpq    %rcx, %rdx\n"
> > "    jnb     .Lforward_copy\n"
> 
> I think I'd fall through to the forwards copy
> and not worry about replicating the 'rep movsb' and 'ret'.
> IIRC 'cld' can be slow as well.

Alright, I will avoid cld for the forward copy.

> > "    leaq    -1(%rdi, %rcx, 1), %rdi\n"
> > "    leaq    -1(%rsi, %rcx, 1), %rsi\n"
> > "    std\n"
> > ".Lforward_copy:\n"
> > "    rep movsb\n"
> > "    cld\n"
> > "    ret\n"
> > );
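
For v2, the fall-through version might look like this (sketch only,
untested):
```
__asm__ (
".section .text.nolibc_memmove\n"
".weak memmove\n"
"memmove:\n"
"	movq	%rdx, %rcx\n"
"	movq	%rdi, %rax\n"
"	movq	%rdi, %rdx\n"
"	subq	%rsi, %rdx\n"
"	cmpq	%rcx, %rdx\n"
"	jb	.Lbackward_copy\n"
"	rep movsb\n"			/* forward path: DF is already 0 */
"	retq\n"
".Lbackward_copy:\n"
"	leaq	-1(%rdi, %rcx, 1), %rdi\n"
"	leaq	-1(%rsi, %rcx, 1), %rsi\n"
"	std\n"
"	rep movsb\n"
"	cld\n"
"	retq\n"
);
```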

-- 
Ammar Faizi



* RE: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-09-01 14:41         ` Ammar Faizi
@ 2023-09-01 14:54           ` David Laight
  2023-09-01 15:20             ` Ammar Faizi
  0 siblings, 1 reply; 29+ messages in thread
From: David Laight @ 2023-09-01 14:54 UTC (permalink / raw)
  To: 'Ammar Faizi'
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

From: Ammar Faizi
> Sent: 01 September 2023 15:42
...
> > > "    movq    %rdx, %rcx\n"
> > > "    movq    %rdi, %rdx\n"
> > > "    movq    %rdi, %rax\n"
> >
> > You seem to have confused yourself about whether you are using %eax or %edx.
> 
> What do you mean? They're all 64-bit pointers.

%ax, %eax, %rax - what is the difference :-)

> What I know is that the %rdx will be clobbered by "subq %rsi, %rdx"
> below and the %rax should be return value. That's why I copy the %rdi
> twice. memmove() returns the dst pointer. Did I miss something?

I'd forgotten about the (stupid) return value.

I'm pretty sure it is an accident from the original pdp-11
implementation from the days before C had an explicit 'return'
statement.
(The pdp-11 I used ran RSX/11M - so had a Fortran compiler
not a C one.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [RFC PATCH v1 0/5] nolibc x86-64 string functions
  2023-09-01 14:54           ` David Laight
@ 2023-09-01 15:20             ` Ammar Faizi
  0 siblings, 0 replies; 29+ messages in thread
From: Ammar Faizi @ 2023-09-01 15:20 UTC (permalink / raw)
  To: David Laight
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Fri, Sep 01, 2023 at 02:54:56PM +0000, David Laight wrote:
> I'd forgotten about the (stupid) return value.
> 
> I'm pretty sure it is an accident from the original pdp-11
> implementation from the days before C had an explicit 'return'
> statement.
> (The pdp-11 I used ran RSX/11M - so had a Fortran compiler
> not a C one.)

You're old. I did not exist in that era. And my parents were still young :-)

-- 
Ammar Faizi



* RE: [RFC PATCH v1 3/5] tools/nolibc: x86-64: Use `rep cmpsb` for `memcmp()`
  2023-08-30 21:26   ` Willy Tarreau
  2023-09-01  3:24     ` Ammar Faizi
@ 2023-09-04  8:26     ` David Laight
  1 sibling, 0 replies; 29+ messages in thread
From: David Laight @ 2023-09-04  8:26 UTC (permalink / raw)
  To: 'Willy Tarreau', Ammar Faizi
  Cc: Thomas Weißschuh, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

From: Willy Tarreau
> Sent: 30 August 2023 22:27
> 
> On Wed, Aug 30, 2023 at 08:57:24PM +0700, Ammar Faizi wrote:
> > Simplify memcmp() on the x86-64 arch.
> >
> > The x86-64 arch has a 'rep cmpsb' instruction, which can be used to
> > implement the memcmp() function.
> >
> >     %rdi = source 1
> >     %rsi = source 2
> >     %rcx = length
> >
> > Signed-off-by: Ammar Faizi <[email protected]>
> > ---
> >  tools/include/nolibc/arch-x86_64.h | 19 +++++++++++++++++++
> >  tools/include/nolibc/string.h      |  2 ++
> >  2 files changed, 21 insertions(+)
> >
> > diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
> > index 42f2674ad1ecdd64..6c1b54ba9f774e7b 100644
> > --- a/tools/include/nolibc/arch-x86_64.h
> > +++ b/tools/include/nolibc/arch-x86_64.h
> > @@ -214,4 +214,23 @@ __asm__ (
> >  	"retq\n"
> >  );
> >
> > +#define NOLIBC_ARCH_HAS_MEMCMP
> > +static int memcmp(const void *s1, const void *s2, size_t n)
> > +{
> > +	const unsigned char *p1 = s1;
> > +	const unsigned char *p2 = s2;
> > +
> > +	if (!n)
> > +		return 0;
> > +
> > +	__asm__ volatile (
> > +		"rep cmpsb"
> > +		: "+D"(p2), "+S"(p1), "+c"(n)
> > +		: "m"(*(const unsigned char (*)[n])s1),
> > +		  "m"(*(const unsigned char (*)[n])s2)
> > +	);
> > +
> > +	return p1[-1] - p2[-1];
> > +}
> 
> Out of curiosity, given that you implemented the 3 other ones directly
> in an asm statement, is there a particular reason this one mixes a bit
> of C and asm ? It would probably be something around this, in the same
> vein:
> 
>   memcmp:
>     xchg  %esi,%eax   // source1

Aren't the arguments in %rdi, %rsi and %rdx so you only
need to move the count (below)?
(Looks like you copied memchr())

	David

>     mov   %rdx,%rcx   // count
>     rep   cmpsb       // source2 in rdi; sets ZF on equal, CF if src1<src2
>     seta  %al         // 0 if src2 <= src1, 1 if src2 > src1
>     sbb   $0, %al     // 0 if src2 == src1, -1 if src2 < src1, 1 if src2 > src1
>     movsx %al, %eax   // sign extend to %eax
>     ret
> 
> Note that the output logic could have to be revisited, I'm not certain but
> at first glance it looks valid.
> 
> Regards,
> Willy

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


