* [RFC PATCH v2 0/4] nolibc x86-64 string functions
@ 2023-09-02 5:50 Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 5:50 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
Hi Willy,
This is an RFC patchset v2 for nolibc x86-64 string functions.
Changes in v2:
- Shrink the memset code size:
- Use pushq %rdi / popq %rax (Alviro).
- Use xchg %eax, %esi (Willy).
- Drop the memcmp patch (need more pondering).
- Fix the broken memmove implementation (David).
There are 4 patches in this series:
## Patch 1-2: Use `rep movsb`, `rep stosb` for:
- memcpy() and memmove()
- memset()
respectively. They can simplify the generated ASM code.
Patch 3 and 4 are not related, just a small cleanup.
## Patch 3: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today the memmove() has its own
implementation.
## Patch 4: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(), there is no real reason to
have this wrapper. Delete this function and move the code to memcpy()
directly.
Before this series:
```
0000000000401058 <memmove>:
401058: 48 89 f8 movq %rdi,%rax
40105b: 31 c9 xorl %ecx,%ecx
40105d: 48 39 f7 cmpq %rsi,%rdi
401060: 48 83 d1 ff adcq $0xffffffffffffffff,%rcx
401064: 48 85 d2 testq %rdx,%rdx
401067: 74 25 je 40108e <memmove+0x36>
401069: 48 83 c9 01 orq $0x1,%rcx
40106d: 48 39 f0 cmpq %rsi,%rax
401070: 48 c7 c7 ff ff ff ff movq $0xffffffffffffffff,%rdi
401077: 48 0f 43 fa cmovaeq %rdx,%rdi
40107b: 48 01 cf addq %rcx,%rdi
40107e: 44 8a 04 3e movb (%rsi,%rdi,1),%r8b
401082: 44 88 04 38 movb %r8b,(%rax,%rdi,1)
401086: 48 01 cf addq %rcx,%rdi
401089: 48 ff ca decq %rdx
40108c: 75 f0 jne 40107e <memmove+0x26>
40108e: c3 retq
000000000040108f <memcpy>:
40108f: 48 89 f8 movq %rdi,%rax
401092: 48 85 d2 testq %rdx,%rdx
401095: 74 12 je 4010a9 <memcpy+0x1a>
401097: 31 c9 xorl %ecx,%ecx
401099: 40 8a 3c 0e movb (%rsi,%rcx,1),%dil
40109d: 40 88 3c 08 movb %dil,(%rax,%rcx,1)
4010a1: 48 ff c1 incq %rcx
4010a4: 48 39 ca cmpq %rcx,%rdx
4010a7: 75 f0 jne 401099 <memcpy+0xa>
4010a9: c3 retq
00000000004010aa <memset>:
4010aa: 48 89 f8 movq %rdi,%rax
4010ad: 48 85 d2 testq %rdx,%rdx
4010b0: 74 0e je 4010c0 <memset+0x16>
4010b2: 31 c9 xorl %ecx,%ecx
4010b4: 40 88 34 08 movb %sil,(%rax,%rcx,1)
4010b8: 48 ff c1 incq %rcx
4010bb: 48 39 ca cmpq %rcx,%rdx
4010be: 75 f4 jne 4010b4 <memset+0xa>
4010c0: c3 retq
```
After this series:
```
0000000000401040 <memmove>:
401040: 48 89 d1 movq %rdx,%rcx
401043: 48 89 fa movq %rdi,%rdx
401046: 48 89 f8 movq %rdi,%rax
401049: 48 29 f2 subq %rsi,%rdx
40104c: 48 39 ca cmpq %rcx,%rdx
40104f: 73 0f jae 401060 <memmove+0x20>
401051: 48 8d 7c 0f ff leaq -0x1(%rdi,%rcx,1),%rdi
401056: 48 8d 74 0e ff leaq -0x1(%rsi,%rcx,1),%rsi
40105b: fd std
40105c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40105e: fc cld
40105f: c3 retq
401060: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
401062: c3 retq
0000000000401063 <memcpy>:
401063: 48 89 f8 movq %rdi,%rax
401066: 48 89 d1 movq %rdx,%rcx
401069: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40106b: c3 retq
000000000040106c <memset>:
40106c: 96 xchgl %eax,%esi
40106d: 48 89 d1 movq %rdx,%rcx
401070: 57 pushq %rdi
401071: f3 aa rep stosb %al,%es:(%rdi)
401073: 58 popq %rax
401074: c3 retq
```
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (4):
tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
tools/nolibc: x86-64: Use `rep stosb` for `memset()`
tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
tools/include/nolibc/arch-x86_64.h | 48 ++++++++++++++++++++++++++++++
tools/include/nolibc/string.h | 36 ++++++++--------------
2 files changed, 61 insertions(+), 23 deletions(-)
base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
--
Ammar Faizi
^ permalink raw reply [flat|nested] 12+ messages in thread
* [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
@ 2023-09-02 5:50 ` Ammar Faizi
2023-09-02 6:07 ` Alviro Iskandar Setiawan
2023-09-02 5:50 ` [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 5:50 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
Simplify memcpy() and memmove() on the x86-64 arch.
The x86-64 arch has a 'rep movsb' instruction, which can perform
memcpy() using only a single instruction, given:
%rdi = destination
%rsi = source
%rcx = length
Additionally, it can also handle the overlapping case by setting DF=1
(backward copy), which can be used as the memmove() implementation.
Before this patch:
```
00000000000010ab <memmove>:
10ab: 48 89 f8 mov %rdi,%rax
10ae: 31 c9 xor %ecx,%ecx
10b0: 48 39 f7 cmp %rsi,%rdi
10b3: 48 83 d1 ff adc $0xffffffffffffffff,%rcx
10b7: 48 85 d2 test %rdx,%rdx
10ba: 74 25 je 10e1 <memmove+0x36>
10bc: 48 83 c9 01 or $0x1,%rcx
10c0: 48 39 f0 cmp %rsi,%rax
10c3: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi
10ca: 48 0f 43 fa cmovae %rdx,%rdi
10ce: 48 01 cf add %rcx,%rdi
10d1: 44 8a 04 3e mov (%rsi,%rdi,1),%r8b
10d5: 44 88 04 38 mov %r8b,(%rax,%rdi,1)
10d9: 48 01 cf add %rcx,%rdi
10dc: 48 ff ca dec %rdx
10df: 75 f0 jne 10d1 <memmove+0x26>
10e1: c3 ret
00000000000010e2 <memcpy>:
10e2: 48 89 f8 mov %rdi,%rax
10e5: 48 85 d2 test %rdx,%rdx
10e8: 74 12 je 10fc <memcpy+0x1a>
10ea: 31 c9 xor %ecx,%ecx
10ec: 40 8a 3c 0e mov (%rsi,%rcx,1),%dil
10f0: 40 88 3c 08 mov %dil,(%rax,%rcx,1)
10f4: 48 ff c1 inc %rcx
10f7: 48 39 ca cmp %rcx,%rdx
10fa: 75 f0 jne 10ec <memcpy+0xa>
10fc: c3 ret
```
After this patch:
```
0000000000401040 <memmove>:
401040: 48 89 d1 mov %rdx,%rcx
401043: 48 89 fa mov %rdi,%rdx
401046: 48 89 f8 mov %rdi,%rax
401049: 48 29 f2 sub %rsi,%rdx
40104c: 48 39 ca cmp %rcx,%rdx
40104f: 73 0f jae 401060 <memmove+0x20>
401051: 48 8d 7c 0f ff lea -0x1(%rdi,%rcx,1),%rdi
401056: 48 8d 74 0e ff lea -0x1(%rsi,%rcx,1),%rsi
40105b: fd std
40105c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40105e: fc cld
40105f: c3 ret
401060: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
401062: c3 ret
0000000000401063 <memcpy>:
401063: 48 89 f8 mov %rdi,%rax
401066: 48 89 d1 mov %rdx,%rcx
401069: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40106b: c3 ret
```
Link: https://lore.kernel.org/lkml/[email protected]
Suggested-by: David Laight <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/arch-x86_64.h | 35 ++++++++++++++++++++++++++++++
tools/include/nolibc/string.h | 4 ++++
2 files changed, 39 insertions(+)
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index e5ccb926c90306b6..297ffa364b2312eb 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -156,21 +156,56 @@
/* startup code */
/*
* x86-64 System V ABI mandates:
* 1) %rsp must be 16-byte aligned right before the function call.
* 2) The deepest stack frame should be zero (the %rbp).
*
*/
void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
{
__asm__ volatile (
"xor %ebp, %ebp\n" /* zero the stack frame */
"mov %rsp, %rdi\n" /* save stack pointer to %rdi, as arg1 of _start_c */
"and $-16, %rsp\n" /* %rsp must be 16-byte aligned before call */
"call _start_c\n" /* transfer to c runtime */
"hlt\n" /* ensure it does not return */
);
__builtin_unreachable();
}
+#define NOLIBC_ARCH_HAS_MEMMOVE
+void *memmove(void *dst, const void *src, size_t len);
+
+#define NOLIBC_ARCH_HAS_MEMCPY
+void *memcpy(void *dst, const void *src, size_t len);
+
+__asm__ (
+".section .text.nolibc_memmove\n"
+".weak memmove\n"
+"memmove:\n"
+ "movq %rdx, %rcx\n"
+ "movq %rdi, %rdx\n"
+ "movq %rdi, %rax\n"
+ "subq %rsi, %rdx\n"
+ "cmpq %rcx, %rdx\n"
+ "jnb .Lforward_copy\n"
+ "leaq -1(%rdi, %rcx, 1), %rdi\n"
+ "leaq -1(%rsi, %rcx, 1), %rsi\n"
+ "std\n"
+ "rep movsb\n"
+ "cld\n"
+ "retq\n"
+".Lforward_copy:\n"
+ "rep movsb\n"
+ "retq\n"
+
+".section .text.nolibc_memcpy\n"
+".weak memcpy\n"
+"memcpy:\n"
+ "movq %rdi, %rax\n"
+ "movq %rdx, %rcx\n"
+ "rep movsb\n"
+ "retq\n"
+);
+
#endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 0c2e06c7c4772bc6..6eca267ec6fa7177 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -32,70 +32,74 @@ void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
{
size_t pos = 0;
while (pos < len) {
((char *)dst)[pos] = ((const char *)src)[pos];
pos++;
}
return dst;
}
static __attribute__((unused))
void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
{
while (len) {
len--;
((char *)dst)[len] = ((const char *)src)[len];
}
return dst;
}
+#ifndef NOLIBC_ARCH_HAS_MEMMOVE
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
+#ifndef NOLIBC_ARCH_HAS_MEMCPY
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
return _nolibc_memcpy_up(dst, src, len);
}
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--) {
/* prevent gcc from recognizing memset() here */
__asm__ volatile("");
*(p++) = b;
}
return dst;
}
static __attribute__((unused))
char *strchr(const char *s, int c)
{
--
Ammar Faizi
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
@ 2023-09-02 5:50 ` Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
3 siblings, 0 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 5:50 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
Simplify memset() on the x86-64 arch.
The x86-64 arch has a 'rep stosb' instruction, which can perform
memset() using only a single instruction, given:
%al = value (just like the second argument of memset())
%rdi = destination
%rcx = length
Before this patch:
```
00000000000010c9 <memset>:
10c9: 48 89 f8 mov %rdi,%rax
10cc: 48 85 d2 test %rdx,%rdx
10cf: 74 0e je 10df <memset+0x16>
10d1: 31 c9 xor %ecx,%ecx
10d3: 40 88 34 08 mov %sil,(%rax,%rcx,1)
10d7: 48 ff c1 inc %rcx
10da: 48 39 ca cmp %rcx,%rdx
10dd: 75 f4 jne 10d3 <memset+0xa>
10df: c3 ret
```
After this patch:
```
0000000000001511 <memset>:
1511: 96 xchg %eax,%esi
1512: 48 89 d1 mov %rdx,%rcx
1515: 57 push %rdi
1516: f3 aa rep stos %al,%es:(%rdi)
1518: 58 pop %rax
1519: c3 ret
```
v2 code size shrink:
- Use pushq %rdi / popq %rax (Alviro).
- Use xchg %eax, %esi (Willy).
Link: https://lore.kernel.org/lkml/[email protected]
Suggested-by: Alviro Iskandar Setiawan <[email protected]>
Suggested-by: Willy Tarreau <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/arch-x86_64.h | 13 +++++++++++++
tools/include/nolibc/string.h | 2 ++
2 files changed, 15 insertions(+)
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index 297ffa364b2312eb..ed01b130e25c6740 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -162,50 +162,63 @@
*
*/
void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
{
__asm__ volatile (
"xor %ebp, %ebp\n" /* zero the stack frame */
"mov %rsp, %rdi\n" /* save stack pointer to %rdi, as arg1 of _start_c */
"and $-16, %rsp\n" /* %rsp must be 16-byte aligned before call */
"call _start_c\n" /* transfer to c runtime */
"hlt\n" /* ensure it does not return */
);
__builtin_unreachable();
}
#define NOLIBC_ARCH_HAS_MEMMOVE
void *memmove(void *dst, const void *src, size_t len);
#define NOLIBC_ARCH_HAS_MEMCPY
void *memcpy(void *dst, const void *src, size_t len);
+#define NOLIBC_ARCH_HAS_MEMSET
+void *memset(void *dst, int c, size_t len);
+
__asm__ (
".section .text.nolibc_memmove\n"
".weak memmove\n"
"memmove:\n"
"movq %rdx, %rcx\n"
"movq %rdi, %rdx\n"
"movq %rdi, %rax\n"
"subq %rsi, %rdx\n"
"cmpq %rcx, %rdx\n"
"jnb .Lforward_copy\n"
"leaq -1(%rdi, %rcx, 1), %rdi\n"
"leaq -1(%rsi, %rcx, 1), %rsi\n"
"std\n"
"rep movsb\n"
"cld\n"
"retq\n"
".Lforward_copy:\n"
"rep movsb\n"
"retq\n"
".section .text.nolibc_memcpy\n"
".weak memcpy\n"
"memcpy:\n"
"movq %rdi, %rax\n"
"movq %rdx, %rcx\n"
"rep movsb\n"
"retq\n"
+
+".section .text.nolibc_memset\n"
+".weak memset\n"
+"memset:\n"
+ "xchg %eax, %esi\n"
+ "movq %rdx, %rcx\n"
+ "pushq %rdi\n"
+ "rep stosb\n"
+ "popq %rax\n"
+ "retq\n"
);
#endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 6eca267ec6fa7177..1bad6121ef8c4ab5 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -67,55 +67,57 @@ void *memmove(void *dst, const void *src, size_t len)
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
#ifndef NOLIBC_ARCH_HAS_MEMCPY
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
return _nolibc_memcpy_up(dst, src, len);
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
+#ifndef NOLIBC_ARCH_HAS_MEMSET
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--) {
/* prevent gcc from recognizing memset() here */
__asm__ volatile("");
*(p++) = b;
}
return dst;
}
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
static __attribute__((unused))
char *strchr(const char *s, int c)
{
while (*s) {
if (*s == (char)c)
return (char *)s;
s++;
}
return NULL;
}
static __attribute__((unused))
int strcmp(const char *a, const char *b)
{
unsigned int c;
int diff;
while (!(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c)
;
--
Ammar Faizi
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
2023-09-02 5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
@ 2023-09-02 5:50 ` Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
3 siblings, 0 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 5:50 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
This nolibc internal function is not used. Delete it. It was probably
supposed to handle memmove(), but today the memmove() has its own
implementation.
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/string.h | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 1bad6121ef8c4ab5..22dcb3f566baeefe 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -22,50 +22,40 @@ int memcmp(const void *s1, const void *s2, size_t n)
int c1 = 0;
while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
ofs++;
}
return c1;
}
static __attribute__((unused))
void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
{
size_t pos = 0;
while (pos < len) {
((char *)dst)[pos] = ((const char *)src)[pos];
pos++;
}
return dst;
}
-static __attribute__((unused))
-void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
-{
- while (len) {
- len--;
- ((char *)dst)[len] = ((const char *)src)[len];
- }
- return dst;
-}
-
#ifndef NOLIBC_ARCH_HAS_MEMMOVE
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
--
Ammar Faizi
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
2023-09-02 5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
` (2 preceding siblings ...)
2023-09-02 5:50 ` [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
@ 2023-09-02 5:50 ` Ammar Faizi
3 siblings, 0 replies; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 5:50 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
This function is only called by memcpy(), there is no real reason to
have this wrapper. Delete this function and move the code to memcpy()
directly.
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/string.h | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 22dcb3f566baeefe..a01c69dd495f550c 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -10,84 +10,78 @@
#include "std.h"
static void *malloc(size_t len);
/*
* As much as possible, please keep functions alphabetically sorted.
*/
static __attribute__((unused))
int memcmp(const void *s1, const void *s2, size_t n)
{
size_t ofs = 0;
int c1 = 0;
while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
ofs++;
}
return c1;
}
-static __attribute__((unused))
-void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
-{
- size_t pos = 0;
-
- while (pos < len) {
- ((char *)dst)[pos] = ((const char *)src)[pos];
- pos++;
- }
- return dst;
-}
-
#ifndef NOLIBC_ARCH_HAS_MEMMOVE
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
#ifndef NOLIBC_ARCH_HAS_MEMCPY
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
- return _nolibc_memcpy_up(dst, src, len);
+ size_t pos = 0;
+
+ while (pos < len) {
+ ((char *)dst)[pos] = ((const char *)src)[pos];
+ pos++;
+ }
+ return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
#ifndef NOLIBC_ARCH_HAS_MEMSET
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--) {
/* prevent gcc from recognizing memset() here */
__asm__ volatile("");
*(p++) = b;
}
return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
--
Ammar Faizi
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
@ 2023-09-02 6:07 ` Alviro Iskandar Setiawan
2023-09-02 6:11 ` Ammar Faizi
0 siblings, 1 reply; 12+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 6:07 UTC (permalink / raw)
To: Ammar Faizi
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 2, 2023 at 12:51 PM Ammar Faizi wrote:
> +__asm__ (
> +".section .text.nolibc_memmove\n"
> +".weak memmove\n"
> +"memmove:\n"
> + "movq %rdx, %rcx\n"
> + "movq %rdi, %rdx\n"
> + "movq %rdi, %rax\n"
> + "subq %rsi, %rdx\n"
> + "cmpq %rcx, %rdx\n"
> + "jnb .Lforward_copy\n"
> + "leaq -1(%rdi, %rcx, 1), %rdi\n"
> + "leaq -1(%rsi, %rcx, 1), %rsi\n"
> + "std\n"
> + "rep movsb\n"
> + "cld\n"
> + "retq\n"
> +".Lforward_copy:\n"
> + "rep movsb\n"
> + "retq\n"
> +
> +".section .text.nolibc_memcpy\n"
> +".weak memcpy\n"
> +"memcpy:\n"
> + "movq %rdi, %rax\n"
> + "movq %rdx, %rcx\n"
> + "rep movsb\n"
> + "retq\n"
> +);
Btw, sir, this can be simplified more by merging the forward copy
path, only using two "rep movsb" for both memmove() and memcpy()
should be enough?
```
__asm__ (
".section .text.nolibc_memmove_memcpy\n"
".weak memmove\n"
".weak memcpy\n"
"memmove:\n"
"movq %rdx, %rcx\n"
"movq %rdi, %rdx\n"
"movq %rdi, %rax\n"
"subq %rsi, %rdx\n"
"cmpq %rcx, %rdx\n"
"jnb __nolibc_forward_copy\n"
"leaq -1(%rdi, %rcx, 1), %rdi\n"
"leaq -1(%rsi, %rcx, 1), %rsi\n"
"std\n"
"rep movsb\n"
"cld\n"
"retq\n"
"memcpy:\n"
"movq %rdi, %rax\n"
"movq %rdx, %rcx\n"
"__nolibc_forward_copy:\n"
"rep movsb\n"
"retq\n"
);
```
Thought?
-- Viro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 6:07 ` Alviro Iskandar Setiawan
@ 2023-09-02 6:11 ` Ammar Faizi
2023-09-02 6:22 ` Willy Tarreau
0 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 6:11 UTC (permalink / raw)
To: Alviro Iskandar Setiawan
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 02, 2023 at 01:07:50PM +0700, Alviro Iskandar Setiawan wrote:
> Btw, sir, this can be simplified more by merging the forward copy
> path, only using two "rep movsb" for both memmove() and memcpy()
> should be enough?
> ```
> __asm__ (
> ".section .text.nolibc_memmove_memcpy\n"
> ".weak memmove\n"
> ".weak memcpy\n"
> "memmove:\n"
> "movq %rdx, %rcx\n"
> "movq %rdi, %rdx\n"
> "movq %rdi, %rax\n"
> "subq %rsi, %rdx\n"
> "cmpq %rcx, %rdx\n"
> "jnb __nolibc_forward_copy\n"
> "leaq -1(%rdi, %rcx, 1), %rdi\n"
> "leaq -1(%rsi, %rcx, 1), %rsi\n"
> "std\n"
> "rep movsb\n"
> "cld\n"
> "retq\n"
>
> "memcpy:\n"
> "movq %rdi, %rax\n"
> "movq %rdx, %rcx\n"
> "__nolibc_forward_copy:\n"
> "rep movsb\n"
> "retq\n"
> );
> ```
Looks good. I'll apply that change.
--
Ammar Faizi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 6:11 ` Ammar Faizi
@ 2023-09-02 6:22 ` Willy Tarreau
2023-09-02 6:37 ` Ammar Faizi
0 siblings, 1 reply; 12+ messages in thread
From: Willy Tarreau @ 2023-09-02 6:22 UTC (permalink / raw)
To: Ammar Faizi
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 02, 2023 at 01:11:06PM +0700, Ammar Faizi wrote:
> On Sat, Sep 02, 2023 at 01:07:50PM +0700, Alviro Iskandar Setiawan wrote:
> > Btw, sir, this can be simplified more by merging the forward copy
> > path, only using two "rep movsb" for both memmove() and memcpy()
> > should be enough?
> > ```
> > __asm__ (
> > ".section .text.nolibc_memmove_memcpy\n"
> > ".weak memmove\n"
> > ".weak memcpy\n"
> > "memmove:\n"
> > "movq %rdx, %rcx\n"
> > "movq %rdi, %rdx\n"
> > "movq %rdi, %rax\n"
> > "subq %rsi, %rdx\n"
> > "cmpq %rcx, %rdx\n"
> > "jnb __nolibc_forward_copy\n"
> > "leaq -1(%rdi, %rcx, 1), %rdi\n"
> > "leaq -1(%rsi, %rcx, 1), %rsi\n"
> > "std\n"
> > "rep movsb\n"
> > "cld\n"
> > "retq\n"
> >
> > "memcpy:\n"
> > "movq %rdi, %rax\n"
> > "movq %rdx, %rcx\n"
> > "__nolibc_forward_copy:\n"
> > "rep movsb\n"
> > "retq\n"
> > );
> > ```
>
> Looks good. I'll apply that change.
Note that in this case we simply don't need a special
version of memcpy(), memmove() is always OK for this,
so you can simplify this further by starting with:
memcpy:
memmove:
...
Willy
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 6:22 ` Willy Tarreau
@ 2023-09-02 6:37 ` Ammar Faizi
2023-09-02 12:29 ` Alviro Iskandar Setiawan
0 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 6:37 UTC (permalink / raw)
To: Willy Tarreau
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 02, 2023 at 08:22:37AM +0200, Willy Tarreau wrote:
> Note that in this case we simply don't need a special
> version of memcpy(), memmove() is always OK for this,
> so you can simplify this further by starting with:
>
> memcpy:
> memmove:
> ...
Ok, I'll do that.
--
Ammar Faizi
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 6:37 ` Ammar Faizi
@ 2023-09-02 12:29 ` Alviro Iskandar Setiawan
2023-09-02 12:36 ` Ammar Faizi
0 siblings, 1 reply; 12+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 12:29 UTC (permalink / raw)
To: Ammar Faizi
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 2, 2023 at 1:38 PM Ammar Faizi wrote:
> Ok, I'll do that.
Another micro-optimization. Since the likely case is the forward copy,
make it the case that doesn't take the jump.
Pseudo C:
if (unlikely(dst - src < n))
backward_copy();
else
forward_copy();
-- Viro
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 12:29 ` Alviro Iskandar Setiawan
@ 2023-09-02 12:36 ` Ammar Faizi
2023-09-03 20:35 ` David Laight
0 siblings, 1 reply; 12+ messages in thread
From: Ammar Faizi @ 2023-09-02 12:36 UTC (permalink / raw)
To: Alviro Iskandar Setiawan
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On 2023/09/02 午後7:29, Alviro Iskandar Setiawan wrote:
> On Sat, Sep 2, 2023 at 1:38 PM Ammar Faizi wrote:
>> Ok, I'll do that.
>
> Another micro-optimization. Since the likely case is the forward copy,
> make it the case that doesn't take the jump.
>
> Pseudo C:
> if (unlikely(dst - src < n))
> backward_copy();
> else
> forward_copy();
Point taken.
--
Ammar Faizi
^ permalink raw reply [flat|nested] 12+ messages in thread
* RE: [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 12:36 ` Ammar Faizi
@ 2023-09-03 20:35 ` David Laight
0 siblings, 0 replies; 12+ messages in thread
From: David Laight @ 2023-09-03 20:35 UTC (permalink / raw)
To: 'Ammar Faizi', Alviro Iskandar Setiawan
Cc: Willy Tarreau, Thomas Weißschuh, Nicholas Rosenberg,
Michael William Jonathan, GNU/Weeb Mailing List,
Linux Kernel Mailing List
From: Ammar Faizi
> Sent: 02 September 2023 13:37
> On 2023/09/02 午後7:29, Alviro Iskandar Setiawan wrote:
> > On Sat, Sep 2, 2023 at 1:38 PM Ammar Faizi wrote:
> >> Ok, I'll do that.
> >
> > Another micro-optimization. Since the likely case is the forward copy,
> > make it the case that doesn't take the jump.
> >
> > Pseudo C:
> > if (unlikely(dst - src < n))
> > backward_copy();
> > else
> > forward_copy();
>
> Point taken.
I believe it makes almost no difference.
I'm sure I read that modern (Intel at least) cpu never do
static branch prediction.
So 'cold cache' there is 50% chance of the branch being taken.
Nothing the compiler can do will affect it.
OTOH having
if (likely(dst - src >= n))
forwards();
else
backwards();
(and in that order) probably makes the code a bit easier to read.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-09-03 20:35 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-02 5:50 [RFC PATCH v2 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
2023-09-02 6:07 ` Alviro Iskandar Setiawan
2023-09-02 6:11 ` Ammar Faizi
2023-09-02 6:22 ` Willy Tarreau
2023-09-02 6:37 ` Ammar Faizi
2023-09-02 12:29 ` Alviro Iskandar Setiawan
2023-09-02 12:36 ` Ammar Faizi
2023-09-03 20:35 ` David Laight
2023-09-02 5:50 ` [RFC PATCH v2 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
2023-09-02 5:50 ` [RFC PATCH v2 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox