* [RFC PATCH v3 0/4] nolibc x86-64 string functions
@ 2023-09-02 13:35 Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 13:35 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
Hi Willy,
This is an RFC patchset v3 for nolibc x86-64 string functions.
There are 4 patches in this series:
## Patches 1-2: Use `rep movsb` and `rep stosb`
- Patch 1: `rep movsb` for memcpy() and memmove().
- Patch 2: `rep stosb` for memset().
Both simplify the generated ASM code.
Patches 3 and 4 are unrelated small cleanups.
## Patch 3: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is unused, so delete it. It was probably
meant to handle memmove(), but memmove() now has its own
implementation.
## Patch 4: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(), so there is no real reason to
keep the wrapper. Delete it and move its code into memcpy()
directly.
Before this series:
```
00000000004013aa <memmove>:
4013aa: f3 0f 1e fa endbr64
4013ae: 48 39 f7 cmpq %rsi,%rdi
4013b1: 48 c7 c1 ff ff ff ff movq $0xffffffffffffffff,%rcx
4013b8: 48 89 f8 movq %rdi,%rax
4013bb: 48 0f 43 ca cmovaeq %rdx,%rcx
4013bf: 48 19 ff sbbq %rdi,%rdi
4013c2: 83 e7 02 andl $0x2,%edi
4013c5: 48 ff cf decq %rdi
4013c8: 48 85 d2 testq %rdx,%rdx
4013cb: 74 10 je 4013dd <memmove+0x33>
4013cd: 48 01 f9 addq %rdi,%rcx
4013d0: 48 ff ca decq %rdx
4013d3: 44 8a 04 0e movb (%rsi,%rcx,1),%r8b
4013d7: 44 88 04 08 movb %r8b,(%rax,%rcx,1)
4013db: eb eb jmp 4013c8 <memmove+0x1e>
4013dd: c3 retq
00000000004013de <memcpy>:
4013de: f3 0f 1e fa endbr64
4013e2: 48 89 f8 movq %rdi,%rax
4013e5: 31 c9 xorl %ecx,%ecx
4013e7: 48 39 ca cmpq %rcx,%rdx
4013ea: 74 0d je 4013f9 <memcpy+0x1b>
4013ec: 40 8a 3c 0e movb (%rsi,%rcx,1),%dil
4013f0: 40 88 3c 08 movb %dil,(%rax,%rcx,1)
4013f4: 48 ff c1 incq %rcx
4013f7: eb ee jmp 4013e7 <memcpy+0x9>
4013f9: c3 retq
00000000004013fa <memset>:
4013fa: f3 0f 1e fa endbr64
4013fe: 48 89 f8 movq %rdi,%rax
401401: 31 c9 xorl %ecx,%ecx
401403: 48 39 ca cmpq %rcx,%rdx
401406: 74 09 je 401411 <memset+0x17>
401408: 40 88 34 08 movb %sil,(%rax,%rcx,1)
40140c: 48 ff c1 incq %rcx
40140f: eb f2 jmp 401403 <memset+0x9>
401411: c3 retq
```
After this series:
```
// `memmove` is an alias for `memcpy`
000000000040149c <memcpy>:
40149c: 48 89 d1 movq %rdx,%rcx
40149f: 48 89 f8 movq %rdi,%rax
4014a2: 48 89 fa movq %rdi,%rdx
4014a5: 48 29 f2 subq %rsi,%rdx
4014a8: 48 39 ca cmpq %rcx,%rdx
4014ab: 72 03 jb 4014b0 <memcpy+0x14>
4014ad: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
4014af: c3 retq
4014b0: 48 8d 7c 0f ff leaq -0x1(%rdi,%rcx,1),%rdi
4014b5: 48 8d 74 0e ff leaq -0x1(%rsi,%rcx,1),%rsi
4014ba: fd std
4014bb: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
4014bd: fc cld
4014be: c3 retq
00000000004014bf <memset>:
4014bf: 96 xchgl %eax,%esi
4014c0: 48 89 d1 movq %rdx,%rcx
4014c3: 57 pushq %rdi
4014c4: f3 aa rep stosb %al,%es:(%rdi)
4014c6: 58 popq %rax
4014c7: c3 retq
```
## Changelog
Changes in v3:
- Make memmove an alias for memcpy (Willy).
- Make the forward copy the likely case (Alviro).
Changes in v2:
- Shrink the memset code size:
- Use pushq %rdi / popq %rax (Alviro).
- Use xchg %eax, %esi (Willy).
- Drop the memcmp patch (needs more thought).
- Fix the broken memmove implementation (David).
Signed-off-by: Ammar Faizi <[email protected]>
---
Ammar Faizi (4):
tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
tools/nolibc: x86-64: Use `rep stosb` for `memset()`
tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
tools/include/nolibc/arch-x86_64.h | 42 ++++++++++++++++++++++++++++++
tools/include/nolibc/string.h | 36 +++++++++----------------
2 files changed, 55 insertions(+), 23 deletions(-)
base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
--
Ammar Faizi
^ permalink raw reply [flat|nested] 17+ messages in thread
* [RFC PATCH v3 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
2023-09-02 13:35 [RFC PATCH v3 0/4] nolibc x86-64 string functions Ammar Faizi
@ 2023-09-02 13:35 ` Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
` (3 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 13:35 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
Simplify memcpy() and memmove() on the x86-64 arch.
The x86-64 arch has a 'rep movsb' instruction, which can perform
memcpy() using only a single instruction, given:
%rdi = destination
%rsi = source
%rcx = length
It can also handle the overlapping case by setting DF=1 (backward copy),
so the same routine serves as the memmove() implementation.
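The direction choice described above can be modelled in portable C. This is only a sketch of the logic, not the code the patch adds: `model_memmove` is a hypothetical name, and the byte loops stand in for what `rep movsb` does with DF=0 and DF=1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable model of the forward/backward decision. */
static void *model_memmove(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/*
	 * Unsigned wrap-around trick: (dst - src) < len holds only when
	 * dst lies inside [src, src + len). In that case a forward copy
	 * would overwrite source bytes before they are read.
	 */
	if ((uintptr_t)d - (uintptr_t)s < len) {
		/* backward copy, the role of `std; rep movsb; cld` */
		while (len--)
			d[len] = s[len];
	} else {
		/* forward copy, the role of a plain `rep movsb` */
		size_t i;

		for (i = 0; i < len; i++)
			d[i] = s[i];
	}
	return dst;
}
```

A single unsigned comparison covers both "dst is below src" and "dst is far above src", which is why the assembly needs only one `sub`, one `cmp` and one `jb`.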
Before this patch:
```
00000000000010ab <memmove>:
10ab: 48 89 f8 mov %rdi,%rax
10ae: 31 c9 xor %ecx,%ecx
10b0: 48 39 f7 cmp %rsi,%rdi
10b3: 48 83 d1 ff adc $0xffffffffffffffff,%rcx
10b7: 48 85 d2 test %rdx,%rdx
10ba: 74 25 je 10e1 <memmove+0x36>
10bc: 48 83 c9 01 or $0x1,%rcx
10c0: 48 39 f0 cmp %rsi,%rax
10c3: 48 c7 c7 ff ff ff ff mov $0xffffffffffffffff,%rdi
10ca: 48 0f 43 fa cmovae %rdx,%rdi
10ce: 48 01 cf add %rcx,%rdi
10d1: 44 8a 04 3e mov (%rsi,%rdi,1),%r8b
10d5: 44 88 04 38 mov %r8b,(%rax,%rdi,1)
10d9: 48 01 cf add %rcx,%rdi
10dc: 48 ff ca dec %rdx
10df: 75 f0 jne 10d1 <memmove+0x26>
10e1: c3 ret
00000000000010e2 <memcpy>:
10e2: 48 89 f8 mov %rdi,%rax
10e5: 48 85 d2 test %rdx,%rdx
10e8: 74 12 je 10fc <memcpy+0x1a>
10ea: 31 c9 xor %ecx,%ecx
10ec: 40 8a 3c 0e mov (%rsi,%rcx,1),%dil
10f0: 40 88 3c 08 mov %dil,(%rax,%rcx,1)
10f4: 48 ff c1 inc %rcx
10f7: 48 39 ca cmp %rcx,%rdx
10fa: 75 f0 jne 10ec <memcpy+0xa>
10fc: c3 ret
```
After this patch:
```
// memmove is an alias for memcpy
000000000040133b <memcpy>:
40133b: 48 89 d1 mov %rdx,%rcx
40133e: 48 89 f8 mov %rdi,%rax
401341: 48 89 fa mov %rdi,%rdx
401344: 48 29 f2 sub %rsi,%rdx
401347: 48 39 ca cmp %rcx,%rdx
40134a: 72 03 jb 40134f <memcpy+0x14>
40134c: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40134e: c3 ret
40134f: 48 8d 7c 0f ff lea -0x1(%rdi,%rcx,1),%rdi
401354: 48 8d 74 0e ff lea -0x1(%rsi,%rcx,1),%rsi
401359: fd std
40135a: f3 a4 rep movsb %ds:(%rsi),%es:(%rdi)
40135c: fc cld
40135d: c3 ret
```
v3:
- Make memmove an alias for memcpy (Willy).
- Make the forward copy the likely case (Alviro).
v2:
- Fix the broken memmove implementation (David).
Link: https://lore.kernel.org/lkml/[email protected]
Link: https://lore.kernel.org/lkml/[email protected]
Suggested-by: David Laight <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/arch-x86_64.h | 29 +++++++++++++++++++++++++++++
tools/include/nolibc/string.h | 4 ++++
2 files changed, 33 insertions(+)
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index e5ccb926c90306b6..aece7d8951535a36 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -156,21 +156,50 @@
/* startup code */
/*
* x86-64 System V ABI mandates:
* 1) %rsp must be 16-byte aligned right before the function call.
* 2) The deepest stack frame should be zero (the %rbp).
*
*/
void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
{
__asm__ volatile (
"xor %ebp, %ebp\n" /* zero the stack frame */
"mov %rsp, %rdi\n" /* save stack pointer to %rdi, as arg1 of _start_c */
"and $-16, %rsp\n" /* %rsp must be 16-byte aligned before call */
"call _start_c\n" /* transfer to c runtime */
"hlt\n" /* ensure it does not return */
);
__builtin_unreachable();
}
+#define NOLIBC_ARCH_HAS_MEMMOVE
+void *memmove(void *dst, const void *src, size_t len);
+
+#define NOLIBC_ARCH_HAS_MEMCPY
+void *memcpy(void *dst, const void *src, size_t len);
+
+__asm__ (
+".section .text.nolibc_memmove_memcpy\n"
+".weak memmove\n"
+".weak memcpy\n"
+"memmove:\n"
+"memcpy:\n"
+ "movq %rdx, %rcx\n\t"
+ "movq %rdi, %rax\n\t"
+ "movq %rdi, %rdx\n\t"
+ "subq %rsi, %rdx\n\t"
+ "cmpq %rcx, %rdx\n\t"
+ "jb .Lbackward_copy\n\t"
+ "rep movsb\n\t"
+ "retq\n"
+".Lbackward_copy:"
+ "leaq -1(%rdi, %rcx, 1), %rdi\n\t"
+ "leaq -1(%rsi, %rcx, 1), %rsi\n\t"
+ "std\n\t"
+ "rep movsb\n\t"
+ "cld\n\t"
+ "retq\n"
+);
+
#endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 0c2e06c7c4772bc6..6eca267ec6fa7177 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -32,70 +32,74 @@ void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
{
size_t pos = 0;
while (pos < len) {
((char *)dst)[pos] = ((const char *)src)[pos];
pos++;
}
return dst;
}
static __attribute__((unused))
void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
{
while (len) {
len--;
((char *)dst)[len] = ((const char *)src)[len];
}
return dst;
}
+#ifndef NOLIBC_ARCH_HAS_MEMMOVE
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
+#ifndef NOLIBC_ARCH_HAS_MEMCPY
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
return _nolibc_memcpy_up(dst, src, len);
}
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--) {
/* prevent gcc from recognizing memset() here */
__asm__ volatile("");
*(p++) = b;
}
return dst;
}
static __attribute__((unused))
char *strchr(const char *s, int c)
{
--
Ammar Faizi
* [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 13:35 [RFC PATCH v3 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
@ 2023-09-02 13:35 ` Ammar Faizi
2023-09-02 19:28 ` Alviro Iskandar Setiawan
2023-09-02 13:35 ` [RFC PATCH v3 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
` (2 subsequent siblings)
4 siblings, 1 reply; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 13:35 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
Simplify memset() on the x86-64 arch.
The x86-64 arch has a 'rep stosb' instruction, which can perform
memset() using only a single instruction, given:
%al = value (just like the second argument of memset())
%rdi = destination
%rcx = length
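As a rough illustration of the register setup above, here is a sketch using GCC/Clang extended inline asm instead of the top-level asm the patch uses. `stosb_memset` is a hypothetical name, and the byte-loop fallback exists only so the sketch still compiles on targets other than x86-64.

```c
#include <stddef.h>

/* Hypothetical wrapper; assumes GCC or Clang extended asm syntax. */
static void *stosb_memset(void *dst, int c, size_t len)
{
#if defined(__x86_64__)
	void *p = dst;

	/*
	 * "+D" pins p to %rdi, "+c" pins len to %rcx, "a" puts c in
	 * %eax so that %al holds its low byte. rep stosb updates both
	 * %rdi and %rcx, hence the read-write constraints.
	 */
	__asm__ volatile (
		"rep stosb"
		: "+D"(p), "+c"(len)
		: "a"(c)
		: "memory"
	);
#else
	/* Fallback byte loop for non-x86-64 builds (illustration only). */
	unsigned char *p = dst;

	while (len--)
		*p++ = (unsigned char)c;
#endif
	return dst;
}
```

The sketch relies on DF being clear on entry, which the x86-64 System V ABI requires at function boundaries. The compiler keeps `dst` for the return value, which is the job `pushq %rdi` / `popq %rax` does in the hand-written version.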
Before this patch:
```
00000000000010c9 <memset>:
10c9: 48 89 f8 mov %rdi,%rax
10cc: 48 85 d2 test %rdx,%rdx
10cf: 74 0e je 10df <memset+0x16>
10d1: 31 c9 xor %ecx,%ecx
10d3: 40 88 34 08 mov %sil,(%rax,%rcx,1)
10d7: 48 ff c1 inc %rcx
10da: 48 39 ca cmp %rcx,%rdx
10dd: 75 f4 jne 10d3 <memset+0xa>
10df: c3 ret
```
After this patch:
```
0000000000001511 <memset>:
1511: 96 xchg %eax,%esi
1512: 48 89 d1 mov %rdx,%rcx
1515: 57 push %rdi
1516: f3 aa rep stos %al,%es:(%rdi)
1518: 58 pop %rax
1519: c3 ret
```
v2:
- Use pushq %rdi / popq %rax (Alviro).
- Use xchg %eax, %esi (Willy).
Link: https://lore.kernel.org/lkml/[email protected]
Suggested-by: Alviro Iskandar Setiawan <[email protected]>
Suggested-by: Willy Tarreau <[email protected]>
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/arch-x86_64.h | 13 +++++++++++++
tools/include/nolibc/string.h | 2 ++
2 files changed, 15 insertions(+)
diff --git a/tools/include/nolibc/arch-x86_64.h b/tools/include/nolibc/arch-x86_64.h
index aece7d8951535a36..1502db5c58fc0c87 100644
--- a/tools/include/nolibc/arch-x86_64.h
+++ b/tools/include/nolibc/arch-x86_64.h
@@ -162,44 +162,57 @@
*
*/
void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
{
__asm__ volatile (
"xor %ebp, %ebp\n" /* zero the stack frame */
"mov %rsp, %rdi\n" /* save stack pointer to %rdi, as arg1 of _start_c */
"and $-16, %rsp\n" /* %rsp must be 16-byte aligned before call */
"call _start_c\n" /* transfer to c runtime */
"hlt\n" /* ensure it does not return */
);
__builtin_unreachable();
}
#define NOLIBC_ARCH_HAS_MEMMOVE
void *memmove(void *dst, const void *src, size_t len);
#define NOLIBC_ARCH_HAS_MEMCPY
void *memcpy(void *dst, const void *src, size_t len);
+#define NOLIBC_ARCH_HAS_MEMSET
+void *memset(void *dst, int c, size_t len);
+
__asm__ (
".section .text.nolibc_memmove_memcpy\n"
".weak memmove\n"
".weak memcpy\n"
"memmove:\n"
"memcpy:\n"
"movq %rdx, %rcx\n\t"
"movq %rdi, %rax\n\t"
"movq %rdi, %rdx\n\t"
"subq %rsi, %rdx\n\t"
"cmpq %rcx, %rdx\n\t"
"jb .Lbackward_copy\n\t"
"rep movsb\n\t"
"retq\n"
".Lbackward_copy:"
"leaq -1(%rdi, %rcx, 1), %rdi\n\t"
"leaq -1(%rsi, %rcx, 1), %rsi\n\t"
"std\n\t"
"rep movsb\n\t"
"cld\n\t"
"retq\n"
+
+".section .text.nolibc_memset\n"
+".weak memset\n"
+"memset:\n"
+ "xchgl %eax, %esi\n"
+ "movq %rdx, %rcx\n"
+ "pushq %rdi\n"
+ "rep stosb\n"
+ "popq %rax\n"
+ "retq\n"
);
#endif /* _NOLIBC_ARCH_X86_64_H */
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 6eca267ec6fa7177..1bad6121ef8c4ab5 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -67,55 +67,57 @@ void *memmove(void *dst, const void *src, size_t len)
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
#ifndef NOLIBC_ARCH_HAS_MEMCPY
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
return _nolibc_memcpy_up(dst, src, len);
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
+#ifndef NOLIBC_ARCH_HAS_MEMSET
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--) {
/* prevent gcc from recognizing memset() here */
__asm__ volatile("");
*(p++) = b;
}
return dst;
}
+#endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
static __attribute__((unused))
char *strchr(const char *s, int c)
{
while (*s) {
if (*s == (char)c)
return (char *)s;
s++;
}
return NULL;
}
static __attribute__((unused))
int strcmp(const char *a, const char *b)
{
unsigned int c;
int diff;
while (!(diff = (unsigned char)*a++ - (c = (unsigned char)*b++)) && c)
;
--
Ammar Faizi
* [RFC PATCH v3 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
2023-09-02 13:35 [RFC PATCH v3 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
@ 2023-09-02 13:35 ` Ammar Faizi
2023-09-02 19:24 ` Alviro Iskandar Setiawan
2023-09-02 13:35 ` [RFC PATCH v3 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
2023-09-03 20:38 ` [RFC PATCH v3 0/4] nolibc x86-64 string functions David Laight
4 siblings, 1 reply; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 13:35 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
This nolibc internal function is unused, so delete it. It was probably
meant to handle memmove(), but memmove() now has its own
implementation.
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/string.h | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 1bad6121ef8c4ab5..22dcb3f566baeefe 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -22,50 +22,40 @@ int memcmp(const void *s1, const void *s2, size_t n)
int c1 = 0;
while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
ofs++;
}
return c1;
}
static __attribute__((unused))
void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
{
size_t pos = 0;
while (pos < len) {
((char *)dst)[pos] = ((const char *)src)[pos];
pos++;
}
return dst;
}
-static __attribute__((unused))
-void *_nolibc_memcpy_down(void *dst, const void *src, size_t len)
-{
- while (len) {
- len--;
- ((char *)dst)[len] = ((const char *)src)[len];
- }
- return dst;
-}
-
#ifndef NOLIBC_ARCH_HAS_MEMMOVE
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
--
Ammar Faizi
* [RFC PATCH v3 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
2023-09-02 13:35 [RFC PATCH v3 0/4] nolibc x86-64 string functions Ammar Faizi
` (2 preceding siblings ...)
2023-09-02 13:35 ` [RFC PATCH v3 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
@ 2023-09-02 13:35 ` Ammar Faizi
2023-09-02 19:26 ` Alviro Iskandar Setiawan
2023-09-03 20:38 ` [RFC PATCH v3 0/4] nolibc x86-64 string functions David Laight
4 siblings, 1 reply; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 13:35 UTC (permalink / raw)
To: Willy Tarreau, Thomas Weißschuh
Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
This function is only called by memcpy(), so there is no real reason to
keep the wrapper. Delete it and move its code into memcpy()
directly.
Signed-off-by: Ammar Faizi <[email protected]>
---
tools/include/nolibc/string.h | 20 +++++++-------------
1 file changed, 7 insertions(+), 13 deletions(-)
diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index 22dcb3f566baeefe..a01c69dd495f550c 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -10,84 +10,78 @@
#include "std.h"
static void *malloc(size_t len);
/*
* As much as possible, please keep functions alphabetically sorted.
*/
static __attribute__((unused))
int memcmp(const void *s1, const void *s2, size_t n)
{
size_t ofs = 0;
int c1 = 0;
while (ofs < n && !(c1 = ((unsigned char *)s1)[ofs] - ((unsigned char *)s2)[ofs])) {
ofs++;
}
return c1;
}
-static __attribute__((unused))
-void *_nolibc_memcpy_up(void *dst, const void *src, size_t len)
-{
- size_t pos = 0;
-
- while (pos < len) {
- ((char *)dst)[pos] = ((const char *)src)[pos];
- pos++;
- }
- return dst;
-}
-
#ifndef NOLIBC_ARCH_HAS_MEMMOVE
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memmove")))
void *memmove(void *dst, const void *src, size_t len)
{
size_t dir, pos;
pos = len;
dir = -1;
if (dst < src) {
pos = -1;
dir = 1;
}
while (len) {
pos += dir;
((char *)dst)[pos] = ((const char *)src)[pos];
len--;
}
return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMMOVE */
#ifndef NOLIBC_ARCH_HAS_MEMCPY
/* must be exported, as it's used by libgcc on ARM */
__attribute__((weak,unused,section(".text.nolibc_memcpy")))
void *memcpy(void *dst, const void *src, size_t len)
{
- return _nolibc_memcpy_up(dst, src, len);
+ size_t pos = 0;
+
+ while (pos < len) {
+ ((char *)dst)[pos] = ((const char *)src)[pos];
+ pos++;
+ }
+ return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMCPY */
#ifndef NOLIBC_ARCH_HAS_MEMSET
/* might be ignored by the compiler without -ffreestanding, then found as
* missing.
*/
__attribute__((weak,unused,section(".text.nolibc_memset")))
void *memset(void *dst, int b, size_t len)
{
char *p = dst;
while (len--) {
/* prevent gcc from recognizing memset() here */
__asm__ volatile("");
*(p++) = b;
}
return dst;
}
#endif /* #ifndef NOLIBC_ARCH_HAS_MEMSET */
--
Ammar Faizi
* Re: [RFC PATCH v3 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
2023-09-02 13:35 ` [RFC PATCH v3 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
@ 2023-09-02 19:24 ` Alviro Iskandar Setiawan
0 siblings, 0 replies; 17+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 19:24 UTC (permalink / raw)
To: Ammar Faizi
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 2, 2023 at 8:35 PM Ammar Faizi <[email protected]> wrote:
> This nolibc internal function is not used. Delete it. It was probably
> supposed to handle memmove(), but today the memmove() has its own
> implementation.
>
> Signed-off-by: Ammar Faizi <[email protected]>
Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
-- Viro
* Re: [RFC PATCH v3 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function
2023-09-02 13:35 ` [RFC PATCH v3 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
@ 2023-09-02 19:26 ` Alviro Iskandar Setiawan
0 siblings, 0 replies; 17+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 19:26 UTC (permalink / raw)
To: Ammar Faizi
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 2, 2023 at 8:35 PM Ammar Faizi wrote:
> This function is only called by memcpy(), there is no real reason to
> have this wrapper. Delete this function and move the code to memcpy()
> directly.
>
> Signed-off-by: Ammar Faizi <[email protected]>
Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
-- Viro
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 13:35 ` [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
@ 2023-09-02 19:28 ` Alviro Iskandar Setiawan
2023-09-02 19:34 ` Ammar Faizi
0 siblings, 1 reply; 17+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 19:28 UTC (permalink / raw)
To: Ammar Faizi
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sat, Sep 2, 2023 at 8:35 PM Ammar Faizi wrote:
> __asm__ (
> ".section .text.nolibc_memmove_memcpy\n"
> ".weak memmove\n"
> ".weak memcpy\n"
> "memmove:\n"
> "memcpy:\n"
> "movq %rdx, %rcx\n\t"
> "movq %rdi, %rax\n\t"
> "movq %rdi, %rdx\n\t"
> "subq %rsi, %rdx\n\t"
> "cmpq %rcx, %rdx\n\t"
> "jb .Lbackward_copy\n\t"
> "rep movsb\n\t"
> "retq\n"
> ".Lbackward_copy:"
> "leaq -1(%rdi, %rcx, 1), %rdi\n\t"
> "leaq -1(%rsi, %rcx, 1), %rsi\n\t"
> "std\n\t"
> "rep movsb\n\t"
> "cld\n\t"
> "retq\n"
> +
> +".section .text.nolibc_memset\n"
> +".weak memset\n"
> +"memset:\n"
> + "xchgl %eax, %esi\n"
> + "movq %rdx, %rcx\n"
> + "pushq %rdi\n"
> + "rep stosb\n"
> + "popq %rax\n"
> + "retq\n"
> );
nit: Be consistent. Use \n\t for the memset too.
Apart from that:
Reviewed-by: Alviro Iskandar Setiawan <[email protected]>
-- Viro
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 19:28 ` Alviro Iskandar Setiawan
@ 2023-09-02 19:34 ` Ammar Faizi
2023-09-02 19:38 ` Alviro Iskandar Setiawan
2023-09-03 8:17 ` Willy Tarreau
0 siblings, 2 replies; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 19:34 UTC (permalink / raw)
To: Alviro Iskandar Setiawan
Cc: Willy Tarreau, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sun, Sep 03, 2023 at 02:28:18AM +0700, Alviro Iskandar Setiawan wrote:
> nit: Be consistent. Use \n\t for the memset too.
Good catch, I'll fix that in the v4 revision.
--
Ammar Faizi
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 19:34 ` Ammar Faizi
@ 2023-09-02 19:38 ` Alviro Iskandar Setiawan
2023-09-02 19:39 ` Ammar Faizi
2023-09-03 8:17 ` Willy Tarreau
1 sibling, 1 reply; 17+ messages in thread
From: Alviro Iskandar Setiawan @ 2023-09-02 19:38 UTC (permalink / raw)
To: Ammar Faizi; +Cc: Michael William Jonathan, GNU/Weeb Mailing List
[ Strip kernel participants from the CC (GNU/Weeb only) ]
On Sun, Sep 3, 2023 at 2:34 AM Ammar Faizi wrote:
> Good catch, I'll fix that in v4 revision.
Whoa, aren't you going to sleep?
-- Viro
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 19:38 ` Alviro Iskandar Setiawan
@ 2023-09-02 19:39 ` Ammar Faizi
0 siblings, 0 replies; 17+ messages in thread
From: Ammar Faizi @ 2023-09-02 19:39 UTC (permalink / raw)
To: Alviro Iskandar Setiawan; +Cc: Michael William Jonathan, GNU/Weeb Mailing List
On Sun, Sep 03, 2023 at 02:38:38AM +0700, Alviro Iskandar Setiawan wrote:
> Whoa, aren't you going to sleep?
Probably later, closer to midday, haha.
--
Ammar Faizi
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-02 19:34 ` Ammar Faizi
2023-09-02 19:38 ` Alviro Iskandar Setiawan
@ 2023-09-03 8:17 ` Willy Tarreau
2023-09-03 8:34 ` Ammar Nofan Faizi
2023-09-03 8:39 ` Ammar Faizi
1 sibling, 2 replies; 17+ messages in thread
From: Willy Tarreau @ 2023-09-03 8:17 UTC (permalink / raw)
To: Ammar Faizi
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sun, Sep 03, 2023 at 02:34:22AM +0700, Ammar Faizi wrote:
> On Sun, Sep 03, 2023 at 02:28:18AM +0700, Alviro Iskandar Setiawan wrote:
> > nit: Be consistent. Use \n\t for the memset too.
>
> Good catch, I'll fix that in v4 revision.
Ammar, I'm overall fine with your series. I can as well add the missing \t
to your patch while merging it, or wait for your v4, just let me know.
Thanks,
Willy
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-03 8:17 ` Willy Tarreau
@ 2023-09-03 8:34 ` Ammar Nofan Faizi
2023-09-03 8:39 ` Ammar Faizi
1 sibling, 0 replies; 17+ messages in thread
From: Ammar Nofan Faizi @ 2023-09-03 8:34 UTC (permalink / raw)
To: Willy Tarreau
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On 2023/09/03 3:17 PM, Willy Tarreau wrote:
> On Sun, Sep 03, 2023 at 02:34:22AM +0700, Ammar Faizi wrote:
>> On Sun, Sep 03, 2023 at 02:28:18AM +0700, Alviro Iskandar Setiawan wrote:
>>> nit: Be consistent. Use \n\t for the memset too.
>>
>> Good catch, I'll fix that in v4 revision.
>
> Ammar, I'm overall fine with your series. I can as well add the missing \t
> to your patch while merging it, or wait for your v4, just let me know.
I'm now traveling and will be available in Jakarta on Monday. Thus, I actually planned to send the v4 revision on Monday.
However, since you don't have further objections to this series, I'll leave the trivial missing bit to you. Please merge this series and I will not send a v4 revision.
Thanks,
--
Ammar Faizi
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-03 8:17 ` Willy Tarreau
2023-09-03 8:34 ` Ammar Nofan Faizi
@ 2023-09-03 8:39 ` Ammar Faizi
2023-09-03 9:55 ` Willy Tarreau
1 sibling, 1 reply; 17+ messages in thread
From: Ammar Faizi @ 2023-09-03 8:39 UTC (permalink / raw)
To: Willy Tarreau
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
[ Resend, I sent it using the wrong From address. ]
On 2023/09/03 3:17 PM, Willy Tarreau wrote:
> On Sun, Sep 03, 2023 at 02:34:22AM +0700, Ammar Faizi wrote:
>> On Sun, Sep 03, 2023 at 02:28:18AM +0700, Alviro Iskandar Setiawan wrote:
>>> nit: Be consistent. Use \n\t for the memset too.
>>
>> Good catch, I'll fix that in v4 revision.
>
> Ammar, I'm overall fine with your series. I can as well add the missing \t
> to your patch while merging it, or wait for your v4, just let me know.
I'm now traveling and will be available in Jakarta on Monday. Thus, I
actually planned to send the v4 revision on Monday.
However, since you don't have further objections to this series, I'll
leave the trivial missing bit to you. Please merge this series and I
will not send a v4 revision.
Thanks,
--
Ammar Faizi
* Re: [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()`
2023-09-03 8:39 ` Ammar Faizi
@ 2023-09-03 9:55 ` Willy Tarreau
0 siblings, 0 replies; 17+ messages in thread
From: Willy Tarreau @ 2023-09-03 9:55 UTC (permalink / raw)
To: Ammar Faizi
Cc: Alviro Iskandar Setiawan, Thomas Weißschuh, David Laight,
Nicholas Rosenberg, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sun, Sep 03, 2023 at 03:39:33PM +0700, Ammar Faizi wrote:
> [ Resend, I sent it using the wrong From address. ]
>
> On 2023/09/03 3:17 PM, Willy Tarreau wrote:
> > On Sun, Sep 03, 2023 at 02:34:22AM +0700, Ammar Faizi wrote:
> > > On Sun, Sep 03, 2023 at 02:28:18AM +0700, Alviro Iskandar Setiawan wrote:
> > > > nit: Be consistent. Use \n\t for the memset too.
> > >
> > > Good catch, I'll fix that in v4 revision.
> >
> > Ammar, I'm overall fine with your series. I can as well add the missing \t
> > to your patch while merging it, or wait for your v4, just let me know.
>
> I'm now traveling and will be available in Jakarta on Monday. Thus, I
> actually planned to send the v4 revision on Monday.
>
> However, since you don't have further objections to this series, I'll
> leave the trivial missing bit to you. Please merge this series and I
> will not send a v4 revision.
OK now merged with the \t appended. No need to spend your time on a v4
anymore.
Thanks!
Willy
* RE: [RFC PATCH v3 0/4] nolibc x86-64 string functions
2023-09-02 13:35 [RFC PATCH v3 0/4] nolibc x86-64 string functions Ammar Faizi
` (3 preceding siblings ...)
2023-09-02 13:35 ` [RFC PATCH v3 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
@ 2023-09-03 20:38 ` David Laight
2023-09-03 21:19 ` Willy Tarreau
4 siblings, 1 reply; 17+ messages in thread
From: David Laight @ 2023-09-03 20:38 UTC (permalink / raw)
To: 'Ammar Faizi', Willy Tarreau, Thomas Weißschuh
Cc: Nicholas Rosenberg, Alviro Iskandar Setiawan,
Michael William Jonathan, GNU/Weeb Mailing List,
Linux Kernel Mailing List
From: Ammar Faizi
> Sent: 02 September 2023 14:35
>
> This is an RFC patchset v3 for nolibc x86-64 string functions.
>
> There are 4 patches in this series:
>
> ## Patch 1-2: Use `rep movsb`, `rep stosb` for:
> - memcpy() and memmove()
> - memset()
> respectively. They can simplify the generated ASM code.
It is worth pointing out that while the code size for 'rep xxxb'
is smaller, the performance is terrible.
The only time it is ever good is for the optimised forward
copies on CPUs that support them.
Reverse copies, stos and scas are always horrid.
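The "optimised forward copies" remark presumably refers to CPUs that advertise ERMS (Enhanced REP MOVSB/STOSB) through CPUID; that reading is an assumption, since the message does not name the feature. A minimal sketch of detecting it, assuming GCC or Clang's `<cpuid.h>`; `cpu_has_erms` is a hypothetical helper and not something nolibc provides.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: 1 if CPUID reports ERMS, 0 otherwise. */
static int cpu_has_erms(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 7, subleaf 0; returns 0 if the leaf is unsupported. */
	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	/* CPUID.(EAX=7,ECX=0):EBX bit 9 is the ERMS flag. */
	return (ebx >> 9) & 1;
#else
	/* Not an x86 build: the feature does not apply. */
	return 0;
#endif
}
```

nolibc would not need such a check, since the series picks `rep movsb` for size rather than speed; the sketch only shows where the "CPUs that support them" distinction comes from.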
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
* Re: [RFC PATCH v3 0/4] nolibc x86-64 string functions
2023-09-03 20:38 ` [RFC PATCH v3 0/4] nolibc x86-64 string functions David Laight
@ 2023-09-03 21:19 ` Willy Tarreau
0 siblings, 0 replies; 17+ messages in thread
From: Willy Tarreau @ 2023-09-03 21:19 UTC (permalink / raw)
To: David Laight
Cc: 'Ammar Faizi', Thomas Weißschuh, Nicholas Rosenberg,
Alviro Iskandar Setiawan, Michael William Jonathan,
GNU/Weeb Mailing List, Linux Kernel Mailing List
On Sun, Sep 03, 2023 at 08:38:22PM +0000, David Laight wrote:
> From: Ammar Faizi
> > Sent: 02 September 2023 14:35
> >
> > This is an RFC patchset v3 for nolibc x86-64 string functions.
> >
> > There are 4 patches in this series:
> >
> > ## Patch 1-2: Use `rep movsb`, `rep stosb` for:
> > - memcpy() and memmove()
> > - memset()
> > respectively. They can simplify the generated ASM code.
>
> It is worth pointing out that while the code size for 'rep xxxb'
> is smaller, the performance is terrible.
> The only time it is ever good is for the optimised forwards
> copies on cpu that support it.
>
> reverse, stos and scas are always horrid.
It's terrible compared to other approaches but not *that* bad. Also we
absolutely don't care about performance here, rather about correctness
and compact size.
Regards,
Willy