GNU/Weeb Mailing List <[email protected]>
* [RFC PATCH v2 0/4] nolibc x86-64 string functions
@ 2023-09-02  5:50 Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  5:50 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

Hi Willy,

This is an RFC patchset v2 for nolibc x86-64 string functions.

Changes in v2:
  - Shrink the memset code size:
      - Use pushq %rdi / popq %rax (Alviro).
      - Use xchg %eax, %esi (Willy).
  - Drop the memcmp patch (need more pondering).
  - Fix the broken memmove implementation (David).

There are 4 patches in this series:

## Patches 1-2: Use `rep movsb`, `rep stosb` for:
    - memcpy() and memmove()
    - memset()
respectively. They simplify the generated ASM code.

Patches 3 and 4 are not related to the above; they are just small cleanups.

## Patch 3: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today memmove() has its own
implementation.

## Patch 4: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(); there is no real reason to
keep this wrapper. Delete it and move its code into memcpy() directly.

Before this series:
```
  0000000000401058 <memmove>:
    401058: 48 89 f8              movq   %rdi,%rax
    40105b: 31 c9                 xorl   %ecx,%ecx
    40105d: 48 39 f7              cmpq   %rsi,%rdi
    401060: 48 83 d1 ff           adcq   $0xffffffffffffffff,%rcx
    401064: 48 85 d2              testq  %rdx,%rdx
    401067: 74 25                 je     40108e <memmove+0x36>
    401069: 48 83 c9 01           orq    $0x1,%rcx
    40106d: 48 39 f0              cmpq   %rsi,%rax
    401070: 48 c7 c7 ff ff ff ff  movq   $0xffffffffffffffff,%rdi
    401077: 48 0f 43 fa           cmovaeq %rdx,%rdi
    40107b: 48 01 cf              addq   %rcx,%rdi
    40107e: 44 8a 04 3e           movb   (%rsi,%rdi,1),%r8b
    401082: 44 88 04 38           movb   %r8b,(%rax,%rdi,1)
    401086: 48 01 cf              addq   %rcx,%rdi
    401089: 48 ff ca              decq   %rdx
    40108c: 75 f0                 jne    40107e <memmove+0x26>
    40108e: c3                    retq

  000000000040108f <memcpy>:
    40108f: 48 89 f8              movq   %rdi,%rax
    401092: 48 85 d2              testq  %rdx,%rdx
    401095: 74 12                 je     4010a9 <memcpy+0x1a>
    401097: 31 c9                 xorl   %ecx,%ecx
    401099: 40 8a 3c 0e           movb   (%rsi,%rcx,1),%dil
    40109d: 40 88 3c 08           movb   %dil,(%rax,%rcx,1)
    4010a1: 48 ff c1              incq   %rcx
    4010a4: 48 39 ca              cmpq   %rcx,%rdx
    4010a7: 75 f0                 jne    401099 <memcpy+0xa>
    4010a9: c3                    retq

  00000000004010aa <memset>:
    4010aa: 48 89 f8              movq   %rdi,%rax
    4010ad: 48 85 d2              testq  %rdx,%rdx
    4010b0: 74 0e                 je     4010c0 <memset+0x16>
    4010b2: 31 c9                 xorl   %ecx,%ecx
    4010b4: 40 88 34 08           movb   %sil,(%rax,%rcx,1)
    4010b8: 48 ff c1              incq   %rcx
    4010bb: 48 39 ca              cmpq   %rcx,%rdx
    4010be: 75 f4                 jne    4010b4 <memset+0xa>
    4010c0: c3                    retq
```

After this series:
```
  0000000000401040 <memmove>:
    401040: 48 89 d1              movq   %rdx,%rcx
    401043: 48 89 fa              movq   %rdi,%rdx
    401046: 48 89 f8              movq   %rdi,%rax
    401049: 48 29 f2              subq   %rsi,%rdx
    40104c: 48 39 ca              cmpq   %rcx,%rdx
    40104f: 73 0f                 jae    401060 <memmove+0x20>
    401051: 48 8d 7c 0f ff        leaq   -0x1(%rdi,%rcx,1),%rdi
    401056: 48 8d 74 0e ff        leaq   -0x1(%rsi,%rcx,1),%rsi
    40105b: fd                    std
    40105c: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    40105e: fc                    cld
    40105f: c3                    retq
    401060: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    401062: c3                    retq

  0000000000401063 <memcpy>:
    401063: 48 89 f8              movq   %rdi,%rax
    401066: 48 89 d1              movq   %rdx,%rcx
    401069: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    40106b: c3                    retq

  000000000040106c <memset>:
    40106c: 96                    xchgl  %eax,%esi
    40106d: 48 89 d1              movq   %rdx,%rcx
    401070: 57                    pushq  %rdi
    401071: f3 aa                 rep stosb %al,%es:(%rdi)
    401073: 58                    popq   %rax
    401074: c3                    retq
```

Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (4):
  tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function

 tools/include/nolibc/arch-x86_64.h | 48 ++++++++++++++++++++++++++++++
 tools/include/nolibc/string.h      | 36 ++++++++--------------
 2 files changed, 61 insertions(+), 23 deletions(-)


base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
-- 
Ammar Faizi



* [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02  5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
@ 2023-09-02  5:50 ` Ammar Faizi
  2023-09-02  6:07   ` Alviro Iskandar Setiawan
  2023-09-02  5:50 ` [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  5:50 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

Simplify memcpy() and memmove() on the x86-64 arch.

The x86-64 arch has a 'rep movsb' instruction, which can perform the
whole memcpy() operation on its own, given:

    %rdi = destination
    %rsi = source
    %rcx = length

It can also handle the overlapping-copy case by setting DF=1
(backward copy), which makes it usable as the memmove() implementation.
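
In C terms, the overlap check used below boils down to the following
sketch (illustrative only; the function name is made up, and the patch
itself uses the asm shown in the diff):

```
#include <stddef.h>
#include <stdint.h>

static void *memmove_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	if ((uintptr_t)d - (uintptr_t)s >= len) {
		/*
		 * No destructive overlap (this also covers dst < src,
		 * because the unsigned subtraction wraps to a huge value).
		 * Copy forward; this is the memcpy() case, i.e.
		 * 'rep movsb' with DF=0.
		 */
		for (i = 0; i < len; i++)
			d[i] = s[i];
	} else {
		/*
		 * dst overlaps the tail of src: copy backward from the
		 * last byte, i.e. 'rep movsb' with DF=1.
		 */
		while (len--)
			d[len] = s[len];
	}
	return dst;
}
```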

Before this patch:
```
  00000000000010ab <memmove>:
    10ab: 48 89 f8              mov    %rdi,%rax
    10ae: 31 c9                 xor    %ecx,%ecx
    10b0: 48 39 f7              cmp    %rsi,%rdi
    10b3: 48 83 d1 ff           adc    $0xffffffffffffffff,%rcx
    10b7: 48 85 d2              test   %rdx,%rdx
    10ba: 74 25                 je     10e1 <memmove+0x36>
    10bc: 48 83 c9 01           or     $0x1,%rcx
    10c0: 48 39 f0              cmp    %rsi,%rax
    10c3: 48 c7 c7 ff ff ff ff  mov    $0xffffffffffffffff,%rdi
    10ca: 48 0f 43 fa           cmovae %rdx,%rdi
    10ce: 48 01 cf              add    %rcx,%rdi
    10d1: 44 8a 04 3e           mov    (%rsi,%rdi,1),%r8b
    10d5: 44 88 04 38           mov    %r8b,(%rax,%rdi,1)
    10d9: 48 01 cf              add    %rcx,%rdi
    10dc: 48 ff ca              dec    %rdx
    10df: 75 f0                 jne    10d1 <memmove+0x26>
    10e1: c3                    ret

  00000000000010e2 <memcpy>:
    10e2: 48 89 f8              mov    %rdi,%rax
    10e5: 48 85 d2              test   %rdx,%rdx
    10e8: 74 12                 je     10fc <memcpy+0x1a>
    10ea: 31 c9                 xor    %ecx,%ecx
    10ec: 40 8a 3c 0e           mov    (%rsi,%rcx,1),%dil
    10f0: 40 88 3c 08           mov    %dil,(%rax,%rcx,1)
    10f4: 48 ff c1              inc    %rcx
    10f7: 48 39 ca              cmp    %rcx,%rdx
    10fa: 75 f0                 jne    10ec <memcpy+0xa>
    10fc: c3                    ret
```

After this patch:
```
  0000000000401040 <memmove>:
    401040: 48 89 d1              mov    %rdx,%rcx
    401043: 48 89 fa              mov    %rdi,%rdx
    401046: 48 89 f8              mov    %rdi,%rax
    401049: 48 29 f2              sub    %rsi,%rdx
    40104c: 48 39 ca              cmp    %rcx,%rdx
    40104f: 73 0f                 jae    401060 <memmove+0x20>
    401051: 48 8d 7c 0f ff        lea    -0x1(%rdi,%rcx,1),%rdi
    401056: 48 8d 74 0e ff        lea    -0x1(%rsi,%rcx,1),%rsi
    40105b: fd                    std
    40105c: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    40105e: fc                    cld
    40105f: c3                    ret
    401060: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    401062: c3                    ret

  0000000000401063 <memcpy>:
    401063: 48 89 f8              mov    %rdi,%rax
    401066: 48 89 d1              mov    %rdx,%rcx
    401069: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    40106b: c3                    ret
```

Link: https://lore.kernel.org/lkml/[email protected]
Suggested-by: David Laight <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/arch-x86_64.h | 35 ++++++++++++++++++++++++++++++
 tools/include/nolibc/string.h      |  4 ++++
 2 files changed, 39 insertions(+)

diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index e5ccb926c90306b6..297ffa364b2312eb 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -156,21 +156,56 @@
 
 /* startup code */
 /*
  * x86-64 System V ABI mandates:
  * 1) %rsp must be 16-byte aligned right before the function call.
  * 2) The deepest stack frame should be zero (the %rbp).
  *
  */
 void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
 {
 	__asm__ volatile (
 		"xor  %ebp, %ebp\n"       /* zero the stack frame                            */
 		"mov  %rsp, %rdi\n"       /* save stack pointer to %rdi, as arg1 of _start_c */
 		"and  $-16, %rsp\n"       /* %rsp must be 16-byte aligned before call        */
 		"call _start_c\n"         /* transfer to c runtime                           */
 		"hlt\n"                   /* ensure it does not return                       */
 	);
 	__builtin_unreachable();
 }
 
+#define NOLIBC_ARCH_HAS_MEMMOVE
+void *memmove(void *dst, const void *src, size_t len);
+
+#define NOLIBC_ARCH_HAS_MEMCPY
+void *memcpy(void *dst, const void *src, size_t len);
+
+__asm__ (
+".section .text.nolibc_memmove\n"
+".weak memmove\n"
+"memmove:\n"
+	"movq %rdx, %rcx\n"
+	"movq %rdi, %rdx\n"
+	"movq %rdi, %rax\n"
+	"subq %rsi, %rdx\n"
+	"cmpq %rcx, %rdx\n"
+	"jnb  .Lforward_copy\n"
+	"leaq -1(%rdi, %rcx, 1), %rdi\n"
+	"leaq -1(%rsi, %rcx, 1), %rsi\n"
+	"std\n"
+	"rep movsb\n"
+	"cld\n"
+	"retq\n"
+".Lforward_copy:\n"
+	"rep movsb\n"
+	"retq\n"
+
+".section .text.nolibc_memcpy\n"
+".weak memcpy\n"
+"memcpy:\n"
+	"movq %rdi, %rax\n"
+	"movq %rdx, %rcx\n"
+	"rep movsb\n"
+	"retq\n"
+);
+
 #endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 0c2e06c7c4772bc6..6eca267ec6fa7177 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -32,70 +32,74 @@ void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
 {
 	size_t pos = 0;
 
 	while (pos < len) {
 		((char *)dst)[pos] = ((const char *)src)[pos];
 		pos++;
 	}
 	return dst;
 }
 
 static __attribute__((unused))
 void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
 {
 	while (len) {
 		len--;
 		((char *)dst)[len] = ((const char *)src)[len];
 	}
 	return dst;
 }
 
+#ifndef NOLIBC_ARCH_HAS_MEMMOVE
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
 __attribute__((weak,unused,section(".text.nolibc_memmove")))
 void *memmove(void *dst, const void *src, size_t len)
 {
 	size_t dir, pos;
 
 	pos = len;
 	dir = -1;
 
 	if (dst < src) {
 		pos = -1;
 		dir = 1;
 	}
 
 	while (len) {
 		pos += dir;
 		((char *)dst)[pos] = ((const char *)src)[pos];
 		len--;
 	}
 	return dst;
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
 
+#ifndef NOLIBC_ARCH_HAS_MEMCPY
 /* must be exported, as it's used by libgcc on ARM */
 __attribute__((weak,unused,section(".text.nolibc_memcpy")))
 void *memcpy(void *dst, const void *src, size_t len)
 {
 	return _nolibc_memcpy_up(dst, src, len);
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
 
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
 __attribute__((weak,unused,section(".text.nolibc_memset")))
 void *memset(void *dst, int b, size_t len)
 {
 	char *p = dst;
 
 	while (len--) {
 		/* prevent gcc from recognizing memset() here */
 		__asm__ volatile("");
 		*(p++) = b;
 	}
 	return dst;
 }
 
 static __attribute__((unused))
 char *strchr(const char *s, int c)
 {
-- 
Ammar Faizi



* [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  2023-09-02  5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
@ 2023-09-02  5:50 ` Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
  3 siblings, 0 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  5:50 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

Simplify memset() on the x86-64 arch.

The x86-64 arch has a 'rep stosb' instruction, which can perform the
whole memset() operation on its own, given:

    %al  = value (just like the second argument of memset())
    %rdi = destination
    %rcx = length
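
For readability, the same sequence with per-instruction annotations
(the comments are editorial, not part of the patch):

```
  memset:
    xchg  %eax, %esi    # %al = fill byte (2nd arg); old %eax is discarded
    movq  %rdx, %rcx    # %rcx = length (3rd arg), the count for 'rep'
    pushq %rdi          # save dst (1st arg); memset() must return it
    rep stosb           # store %al into (%rdi), %rcx times
    popq  %rax          # return value = the saved dst
    retq
```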

Before this patch:
```
  00000000000010c9 <memset>:
    10c9: 48 89 f8              mov    %rdi,%rax
    10cc: 48 85 d2              test   %rdx,%rdx
    10cf: 74 0e                 je     10df <memset+0x16>
    10d1: 31 c9                 xor    %ecx,%ecx
    10d3: 40 88 34 08           mov    %sil,(%rax,%rcx,1)
    10d7: 48 ff c1              inc    %rcx
    10da: 48 39 ca              cmp    %rcx,%rdx
    10dd: 75 f4                 jne    10d3 <memset+0xa>
    10df: c3                    ret
```

After this patch:
```
  0000000000001511 <memset>:
    1511: 96                    xchg   %eax,%esi
    1512: 48 89 d1              mov    %rdx,%rcx
    1515: 57                    push   %rdi
    1516: f3 aa                 rep stos %al,%es:(%rdi)
    1518: 58                    pop    %rax
    1519: c3                    ret
```

Changes in v2 to shrink the code size:
  - Use pushq %rdi / popq %rax (Alviro).
  - Use xchg %eax, %esi (Willy).

Link: https://lore.kernel.org/lkml/[email protected]
Suggested-by: Alviro Iskandar Setiawan <[email protected]>
Suggested-by: Willy Tarreau <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/arch-x86_64.h | 13 +++++++++++++
 tools/include/nolibc/string.h      |  2 ++
 2 files changed, 15 insertions(+)

diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index 297ffa364b2312eb..ed01b130e25c6740 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -162,50 +162,63 @@
  *
  */
 void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
 {
 	__asm__ volatile (
 		"xor  %ebp, %ebp\n"       /* zero the stack frame                            */
 		"mov  %rsp, %rdi\n"       /* save stack pointer to %rdi, as arg1 of _start_c */
 		"and  $-16, %rsp\n"       /* %rsp must be 16-byte aligned before call        */
 		"call _start_c\n"         /* transfer to c runtime                           */
 		"hlt\n"                   /* ensure it does not return                       */
 	);
 	__builtin_unreachable();
 }
 
 #define NOLIBC_ARCH_HAS_MEMMOVE
 void *memmove(void *dst, const void *src, size_t len);
 
 #define NOLIBC_ARCH_HAS_MEMCPY
 void *memcpy(void *dst, const void *src, size_t len);
 
+#define NOLIBC_ARCH_HAS_MEMSET
+void *memset(void *dst, int c, size_t len);
+
 __asm__ (
 ".section .text.nolibc_memmove\n"
 ".weak memmove\n"
 "memmove:\n"
 	"movq %rdx, %rcx\n"
 	"movq %rdi, %rdx\n"
 	"movq %rdi, %rax\n"
 	"subq %rsi, %rdx\n"
 	"cmpq %rcx, %rdx\n"
 	"jnb  .Lforward_copy\n"
 	"leaq -1(%rdi, %rcx, 1), %rdi\n"
 	"leaq -1(%rsi, %rcx, 1), %rsi\n"
 	"std\n"
 	"rep movsb\n"
 	"cld\n"
 	"retq\n"
 ".Lforward_copy:\n"
 	"rep movsb\n"
 	"retq\n"
 
 ".section .text.nolibc_memcpy\n"
 ".weak memcpy\n"
 "memcpy:\n"
 	"movq %rdi, %rax\n"
 	"movq %rdx, %rcx\n"
 	"rep movsb\n"
 	"retq\n"
+
+".section .text.nolibc_memset\n"
+".weak memset\n"
+"memset:\n"
+	"xchg  %eax, %esi\n"
+	"movq  %rdx, %rcx\n"
+	"pushq %rdi\n"
+	"rep stosb\n"
+	"popq  %rax\n"
+	"retq\n"
 );
 
 #endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 6eca267ec6fa7177..1bad6121ef8c4ab5 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -67,55 +67,57 @@ void *memmove(void *dst, const void *src, size_t len)
 	}
 
 	while (len) {
 		pos += dir;
 		((char *)dst)[pos] = ((const char *)src)[pos];
 		len--;
 	}
 	return dst;
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
 
 #ifndef NOLIBC_ARCH_HAS_MEMCPY
 /* must be exported, as it's used by libgcc on ARM */
 __attribute__((weak,unused,section(".text.nolibc_memcpy")))
 void *memcpy(void *dst, const void *src, size_t len)
 {
 	return _nolibc_memcpy_up(dst, src, len);
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
 
+#ifndef NOLIBC_ARCH_HAS_MEMSET
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
 __attribute__((weak,unused,section(".text.nolibc_memset")))
 void *memset(void *dst, int b, size_t len)
 {
 	char *p = dst;
 
 	while (len--) {
 		/* prevent gcc from recognizing memset() here */
 		__asm__ volatile("");
 		*(p++) = b;
 	}
 	return dst;
 }
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
 
 static __attribute__((unused))
 char *strchr(const char *s, int c)
 {
 	while (*s) {
 		if (*s == (char)c)
 			return (char *)s;
 		s++;
 	}
 	return NULL;
 }
 
 static __attribute__((unused))
 int strcmp(const char *a, const char *b)
 {
 	unsigned int c;
 	int diff;
 
 	while (!(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c)
 		;
-- 
Ammar Faizi



* [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  2023-09-02  5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
@ 2023-09-02  5:50 ` Ammar Faizi
  2023-09-02  5:50 ` [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
  3 siblings, 0 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  5:50 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today memmove() has its own
implementation.

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/string.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 1bad6121ef8c4ab5..22dcb3f566baeefe 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -22,50 +22,40 @@ int memcmp(const void *s1, const void *s2, size_t n)
 	int c1 = 0;
 
 	while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
 		ofs++;
 	}
 	return c1;
 }
 
 static __attribute__((unused))
 void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
 {
 	size_t pos = 0;
 
 	while (pos < len) {
 		((char *)dst)[pos] = ((const char *)src)[pos];
 		pos++;
 	}
 	return dst;
 }
 
-static __attribute__((unused))
-void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
-{
-	while (len) {
-		len--;
-		((char *)dst)[len] = ((const char *)src)[len];
-	}
-	return dst;
-}
-
 #ifndef NOLIBC_ARCH_HAS_MEMMOVE
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
 __attribute__((weak,unused,section(".text.nolibc_memmove")))
 void *memmove(void *dst, const void *src, size_t len)
 {
 	size_t dir, pos;
 
 	pos = len;
 	dir = -1;
 
 	if (dst < src) {
 		pos = -1;
 		dir = 1;
 	}
 
 	while (len) {
 		pos += dir;
 		((char *)dst)[pos] = ((const char *)src)[pos];
-- 
Ammar Faizi



* [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
  2023-09-02  5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
                   ` (2 preceding siblings ...)
  2023-09-02  5:50 ` [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
@ 2023-09-02  5:50 ` Ammar Faizi
  3 siblings, 0 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  5:50 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

This function is only called by memcpy(); there is no real reason to
keep this wrapper. Delete it and move its code into memcpy() directly.

Signed-off-by: Ammar Faizi <[email protected]>
---
 tools/include/nolibc/string.h | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 22dcb3f566baeefe..a01c69dd495f550c 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -10,84 +10,78 @@
 #include "std.h"
 
 static void *malloc(size_t len);
 
 /*
  * As much as possible, please keep functions alphabetically sorted.
  */
 
 static __attribute__((unused))
 int memcmp(const void *s1, const void *s2, size_t n)
 {
 	size_t ofs = 0;
 	int c1 = 0;
 
 	while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
 		ofs++;
 	}
 	return c1;
 }
 
-static __attribute__((unused))
-void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
-{
-	size_t pos = 0;
-
-	while (pos < len) {
-		((char *)dst)[pos] = ((const char *)src)[pos];
-		pos++;
-	}
-	return dst;
-}
-
 #ifndef NOLIBC_ARCH_HAS_MEMMOVE
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
 __attribute__((weak,unused,section(".text.nolibc_memmove")))
 void *memmove(void *dst, const void *src, size_t len)
 {
 	size_t dir, pos;
 
 	pos = len;
 	dir = -1;
 
 	if (dst < src) {
 		pos = -1;
 		dir = 1;
 	}
 
 	while (len) {
 		pos += dir;
 		((char *)dst)[pos] = ((const char *)src)[pos];
 		len--;
 	}
 	return dst;
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
 
 #ifndef NOLIBC_ARCH_HAS_MEMCPY
 /* must be exported, as it's used by libgcc on ARM */
 __attribute__((weak,unused,section(".text.nolibc_memcpy")))
 void *memcpy(void *dst, const void *src, size_t len)
 {
-	return _nolibc_memcpy_up(dst, src, len);
+	size_t pos = 0;
+
+	while (pos < len) {
+		((char *)dst)[pos] = ((const char *)src)[pos];
+		pos++;
+	}
+	return dst;
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
 
 #ifndef NOLIBC_ARCH_HAS_MEMSET
 /* might be ignored by the compiler without -ffreestanding, then found as
  * missing.
  */
 __attribute__((weak,unused,section(".text.nolibc_memset")))
 void *memset(void *dst, int b, size_t len)
 {
 	char *p = dst;
 
 	while (len--) {
 		/* prevent gcc from recognizing memset() here */
 		__asm__ volatile("");
 		*(p++) = b;
 	}
 	return dst;
 }
 #endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
-- 
Ammar Faizi



* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02  5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
@ 2023-09-02  6:07   ` Alviro Iskandar Setiawan
  2023-09-02  6:11     ` Ammar Faizi
  0 siblings, 1 reply; 12+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02  6:07 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Sat, Sep 2, 2023 at 12:51 PM Ammar Faizi wrote:
> +__asm__ (
> +".section .text.nolibc_memmove\n"
> +".weak memmove\n"
> +"memmove:\n"
> +       "movq %rdx, %rcx\n"
> +       "movq %rdi, %rdx\n"
> +       "movq %rdi, %rax\n"
> +       "subq %rsi, %rdx\n"
> +       "cmpq %rcx, %rdx\n"
> +       "jnb  .Lforward_copy\n"
> +       "leaq -1(%rdi, %rcx, 1), %rdi\n"
> +       "leaq -1(%rsi, %rcx, 1), %rsi\n"
> +       "std\n"
> +       "rep movsb\n"
> +       "cld\n"
> +       "retq\n"
> +".Lforward_copy:\n"
> +       "rep movsb\n"
> +       "retq\n"
> +
> +".section .text.nolibc_memcpy\n"
> +".weak memcpy\n"
> +"memcpy:\n"
> +       "movq %rdi, %rax\n"
> +       "movq %rdx, %rcx\n"
> +       "rep movsb\n"
> +       "retq\n"
> +);

Btw, sir, this can be simplified more by merging the forward copy
path, only using two "rep movsb" for both memmove() and memcpy()
should be enough?
```
__asm__ (
".section .text.nolibc_memmove_memcpy\n"
".weak memmove\n"
".weak memcpy\n"
"memmove:\n"
        "movq %rdx, %rcx\n"
        "movq %rdi, %rdx\n"
        "movq %rdi, %rax\n"
        "subq %rsi, %rdx\n"
        "cmpq %rcx, %rdx\n"
        "jnb __nolibc_forward_copy\n"
        "leaq -1(%rdi, %rcx, 1), %rdi\n"
        "leaq -1(%rsi, %rcx, 1), %rsi\n"
        "std\n"
        "rep movsb\n"
        "cld\n"
        "retq\n"

"memcpy:\n"
        "movq %rdi, %rax\n"
        "movq %rdx, %rcx\n"
"__nolibc_forward_copy:\n"
        "rep movsb\n"
        "retq\n"
);
```
Thought?

-- Viro


* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02  6:07   ` Alviro Iskandar Setiawan
@ 2023-09-02  6:11     ` Ammar Faizi
  2023-09-02  6:22       ` Willy Tarreau
  0 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  6:11 UTC (permalink / raw)
  To: Alviro Iskandar Setiawan
  Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Sat, Sep 02, 2023 at 01:07:50PM +0700, Alviro Iskandar Setiawan wrote:
> Btw, sir, this can be simplified more by merging the forward copy
> path, only using two "rep movsb" for both memmove() and memcpy()
> should be enough?
> ```
> __asm__ (
> ".section .text.nolibc_memmove_memcpy\n"
> ".weak memmove\n"
> ".weak memcpy\n"
> "memmove:\n"
>         "movq %rdx, %rcx\n"
>         "movq %rdi, %rdx\n"
>         "movq %rdi, %rax\n"
>         "subq %rsi, %rdx\n"
>         "cmpq %rcx, %rdx\n"
>         "jnb __nolibc_forward_copy\n"
>         "leaq -1(%rdi, %rcx, 1), %rdi\n"
>         "leaq -1(%rsi, %rcx, 1), %rsi\n"
>         "std\n"
>         "rep movsb\n"
>         "cld\n"
>         "retq\n"
> 
> "memcpy:\n"
>         "movq %rdi, %rax\n"
>         "movq %rdx, %rcx\n"
> "__nolibc_forward_copy:\n"
>         "rep movsb\n"
>         "retq\n"
> );
> ```

Looks good. I'll apply that change.

-- 
Ammar Faizi



* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02  6:11     ` Ammar Faizi
@ 2023-09-02  6:22       ` Willy Tarreau
  2023-09-02  6:37         ` Ammar Faizi
  0 siblings, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2023-09-02  6:22 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Sat, Sep 02, 2023 at 01:11:06PM +0700, Ammar Faizi wrote:
> On Sat, Sep 02, 2023 at 01:07:50PM +0700, Alviro Iskandar Setiawan wrote:
> > Btw, sir, this can be simplified more by merging the forward copy
> > path, only using two "rep movsb" for both memmove() and memcpy()
> > should be enough?
> > ```
> > __asm__ (
> > ".section .text.nolibc_memmove_memcpy\n"
> > ".weak memmove\n"
> > ".weak memcpy\n"
> > "memmove:\n"
> >         "movq %rdx, %rcx\n"
> >         "movq %rdi, %rdx\n"
> >         "movq %rdi, %rax\n"
> >         "subq %rsi, %rdx\n"
> >         "cmpq %rcx, %rdx\n"
> >         "jnb __nolibc_forward_copy\n"
> >         "leaq -1(%rdi, %rcx, 1), %rdi\n"
> >         "leaq -1(%rsi, %rcx, 1), %rsi\n"
> >         "std\n"
> >         "rep movsb\n"
> >         "cld\n"
> >         "retq\n"
> > 
> > "memcpy:\n"
> >         "movq %rdi, %rax\n"
> >         "movq %rdx, %rcx\n"
> > "__nolibc_forward_copy:\n"
> >         "rep movsb\n"
> >         "retq\n"
> > );
> > ```
> 
> Looks good. I'll apply that change.

Note that in this case we simply don't need a special
version of memcpy(), memmove() is always OK for this,
so you can simplify this further by starting with:

  memcpy:
  memmove:
        ...
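
A sketch of what the fully merged version could look like, with both
labels on the same body (illustrative only, not necessarily the exact
code of a later revision):

```
__asm__ (
".section .text.nolibc_memmove_memcpy\n"
".weak memmove\n"
".weak memcpy\n"
"memcpy:\n"
"memmove:\n"
	"movq %rdx, %rcx\n"
	"movq %rdi, %rdx\n"
	"movq %rdi, %rax\n"
	"subq %rsi, %rdx\n"
	"cmpq %rcx, %rdx\n"
	"jnb  .Lforward_copy\n"
	"leaq -1(%rdi, %rcx, 1), %rdi\n"
	"leaq -1(%rsi, %rcx, 1), %rsi\n"
	"std\n"
	"rep movsb\n"
	"cld\n"
	"retq\n"
".Lforward_copy:\n"
	"rep movsb\n"
	"retq\n"
);
```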

Willy


* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02  6:22       ` Willy Tarreau
@ 2023-09-02  6:37         ` Ammar Faizi
  2023-09-02 12:29           ` Alviro Iskandar Setiawan
  0 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02  6:37 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Sat, Sep 02, 2023 at 08:22:37AM +0200, Willy Tarreau wrote:
> Note that in this case we simply don't need a special
> version of memcpy(), memmove() is always OK for this,
> so you can simplify this further by starting with:
> 
>   memcpy:
>   memmove:
>        ...

Ok, I'll do that.

-- 
Ammar Faizi



* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02  6:37         ` Ammar Faizi
@ 2023-09-02 12:29           ` Alviro Iskandar Setiawan
  2023-09-02 12:36             ` Ammar Faizi
  0 siblings, 1 reply; 12+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 12:29 UTC (permalink / raw)
  To: Ammar Faizi
  Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On Sat, Sep 2, 2023 at 1:38 PM Ammar Faizi wrote:
> Ok, I'll do that.

Another micro-optimization. Since the likely case is the forward copy,
make it the case that doesn't take the jump.

Pseudo C:
if (unlikely(dst - src < n))
        backward_copy();
else
        forward_copy();
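
A sketch of the asm with that ordering, so the common forward copy is
the fall-through path (hypothetical, for illustration only):

```
__asm__ (
".section .text.nolibc_memmove_memcpy\n"
".weak memmove\n"
".weak memcpy\n"
"memcpy:\n"
"memmove:\n"
	"movq %rdx, %rcx\n"
	"movq %rdi, %rdx\n"
	"movq %rdi, %rax\n"
	"subq %rsi, %rdx\n"
	"cmpq %rcx, %rdx\n"
	"jb   .Lbackward_copy\n"       /* unlikely: overlapping buffers */
	"rep movsb\n"                  /* likely: forward copy, no jump taken */
	"retq\n"
".Lbackward_copy:\n"
	"leaq -1(%rdi, %rcx, 1), %rdi\n"
	"leaq -1(%rsi, %rcx, 1), %rsi\n"
	"std\n"
	"rep movsb\n"
	"cld\n"
	"retq\n"
);
```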

-- Viro


* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02 12:29           ` Alviro Iskandar Setiawan
@ 2023-09-02 12:36             ` Ammar Faizi
  2023-09-03 20:35               ` David Laight
  0 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 12:36 UTC (permalink / raw)
  To: Alviro Iskandar Setiawan
  Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
	Nicholas Rosenberg, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

On 2023/09/02 7:29 PM, Alviro Iskandar Setiawan wrote:
> On Sat, Sep 2, 2023 at 1:38 PM Ammar Faizi wrote:
>> Ok, I'll do that.
> 
> Another micro-optimization. Since the likely case is the forward copy,
> make it the case that doesn't take the jump.
> 
> Pseudo C:
> if (unlikely(dst - src < n))
>          backward_copy();
> else
>          forward_copy();

Point taken.

-- 
Ammar Faizi



* RE: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  2023-09-02 12:36             ` Ammar Faizi
@ 2023-09-03 20:35               ` David Laight
  0 siblings, 0 replies; 12+ messages in thread
From: David Laight @ 2023-09-03 20:35 UTC (permalink / raw)
  To: 'Ammar Faizi', Alviro Iskandar Setiawan
  Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
	Michael William Jonathan, GNU/Weeb Mailing List,
	Linux Kernel Mailing List

From: Ammar Faizi
> Sent: 02 September 2023 13:37
> On 2023/09/02 午後7:29, Alviro Iskandar Setiawan wrote:
> > On Sat, Sep 2, 2023 at 1:38 PM Ammar Faizi wrote:
> >> Ok, I'll do that.
> >
> > Another micro-optimization. Since the likely case is the forward copy,
> > make it the case that doesn't take the jump.
> >
> > Pseudo C:
> > if (unlikely(dst - src < n))
> >          backward_copy();
> > else
> >          forward_copy();
> 
> Point taken.

I believe it makes almost no difference.
I'm sure I read that modern (Intel at least) CPUs never do
static branch prediction.
So with a 'cold cache' there is a 50% chance of the branch being taken.
Nothing the compiler can do will affect it.
OTOH having
	if (likely(dst - src >= n))
		forwards();
	else
		backwards();
(and in that order) probably makes the code a bit easier to read.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


Thread overview: 12+ messages
2023-09-02  5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02  5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
2023-09-02  6:07   ` Alviro Iskandar Setiawan
2023-09-02  6:11     ` Ammar Faizi
2023-09-02  6:22       ` Willy Tarreau
2023-09-02  6:37         ` Ammar Faizi
2023-09-02 12:29           ` Alviro Iskandar Setiawan
2023-09-02 12:36             ` Ammar Faizi
2023-09-03 20:35               ` David Laight
2023-09-02  5:50 ` [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
2023-09-02  5:50 ` [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
2023-09-02  5:50 ` [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
