public inbox for [email protected]
* [RFC PATCH v3 0/4] nolibc x86-64 string functions
@ 2023-09-02 13:35 Ammar Faizi
From: Ammar Faizi @ 2023-09-02 13:35 UTC (permalink / raw)
  To: Willy Tarreau, Thomas Weißschuh
  Cc: Ammar Faizi, David Laight, Nicholas Rosenberg,
	Alviro Iskandar Setiawan, Michael William Jonathan,
	GNU/Weeb Mailing List, Linux Kernel Mailing List

Hi Willy,

This is an RFC patchset v3 for nolibc x86-64 string functions.

There are 4 patches in this series:

## Patch 1-2: Use `rep movsb` and `rep stosb`
    - `rep movsb` for memcpy() and memmove()
    - `rep stosb` for memset()
These instructions simplify and shrink the generated ASM code (see the
before/after listings below).
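
For readers unfamiliar with `rep stosb`, here is a minimal sketch of a
memset() built on it. It is not the code from the patch: the name
`sketch_memset` is hypothetical, GCC/Clang extended asm is assumed, and the
direction flag is assumed to be clear on entry, as the x86-64 System V ABI
requires. A plain byte loop is used on other targets.

```c
#include <stddef.h>

/* Hypothetical name for illustration; not the nolibc symbol. */
void *sketch_memset(void *dst, int c, size_t n)
{
	void *ret = dst;	/* memset() returns the original pointer */

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	/*
	 * rep stosb stores %al to (%rdi), advancing %rdi and decrementing
	 * %rcx until %rcx reaches zero. Both registers are modified, hence
	 * the "+" constraints.
	 */
	__asm__ volatile ("rep stosb"
			  : "+D"(dst), "+c"(n)
			  : "a"(c)
			  : "memory");
#else
	/* Portable fallback: the byte loop that rep stosb replaces. */
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
#endif
	return ret;
}
```

The whole fill is a single instruction, so the function needs no loop and
no branch.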

Patches 3 and 4 are unrelated to the above; they are just small cleanups.

## Patch 3: Remove the `_nolibc_memcpy_down()` function
This nolibc internal function is unused, so delete it. It was probably
meant to handle memmove(), but memmove() now has its own
implementation.

## Patch 4: Remove the `_nolibc_memcpy_up()` function
This function is only called by memcpy(), so there is no real reason to
keep it as a wrapper. Delete it and move its code directly into
memcpy().

Before this series:
```
  00000000004013aa <memmove>:
    4013aa: f3 0f 1e fa           endbr64
    4013ae: 48 39 f7              cmpq   %rsi,%rdi
    4013b1: 48 c7 c1 ff ff ff ff  movq   $0xffffffffffffffff,%rcx
    4013b8: 48 89 f8              movq   %rdi,%rax
    4013bb: 48 0f 43 ca           cmovaeq %rdx,%rcx
    4013bf: 48 19 ff              sbbq   %rdi,%rdi
    4013c2: 83 e7 02              andl   $0x2,%edi
    4013c5: 48 ff cf              decq   %rdi
    4013c8: 48 85 d2              testq  %rdx,%rdx
    4013cb: 74 10                 je     4013dd <memmove+0x33>
    4013cd: 48 01 f9              addq   %rdi,%rcx
    4013d0: 48 ff ca              decq   %rdx
    4013d3: 44 8a 04 0e           movb   (%rsi,%rcx,1),%r8b
    4013d7: 44 88 04 08           movb   %r8b,(%rax,%rcx,1)
    4013db: eb eb                 jmp    4013c8 <memmove+0x1e>
    4013dd: c3                    retq

  00000000004013de <memcpy>:
    4013de: f3 0f 1e fa           endbr64
    4013e2: 48 89 f8              movq   %rdi,%rax
    4013e5: 31 c9                 xorl   %ecx,%ecx
    4013e7: 48 39 ca              cmpq   %rcx,%rdx
    4013ea: 74 0d                 je     4013f9 <memcpy+0x1b>
    4013ec: 40 8a 3c 0e           movb   (%rsi,%rcx,1),%dil
    4013f0: 40 88 3c 08           movb   %dil,(%rax,%rcx,1)
    4013f4: 48 ff c1              incq   %rcx
    4013f7: eb ee                 jmp    4013e7 <memcpy+0x9>
    4013f9: c3                    retq

  00000000004013fa <memset>:
    4013fa: f3 0f 1e fa           endbr64
    4013fe: 48 89 f8              movq   %rdi,%rax
    401401: 31 c9                 xorl   %ecx,%ecx
    401403: 48 39 ca              cmpq   %rcx,%rdx
    401406: 74 09                 je     401411 <memset+0x17>
    401408: 40 88 34 08           movb   %sil,(%rax,%rcx,1)
    40140c: 48 ff c1              incq   %rcx
    40140f: eb f2                 jmp    401403 <memset+0x9>
    401411: c3                    retq
```

After this series:
```
  // `memmove` is an alias for `memcpy`
  000000000040149c <memcpy>:
    40149c: 48 89 d1              movq   %rdx,%rcx
    40149f: 48 89 f8              movq   %rdi,%rax
    4014a2: 48 89 fa              movq   %rdi,%rdx
    4014a5: 48 29 f2              subq   %rsi,%rdx
    4014a8: 48 39 ca              cmpq   %rcx,%rdx
    4014ab: 72 03                 jb     4014b0 <memcpy+0x14>
    4014ad: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    4014af: c3                    retq
    4014b0: 48 8d 7c 0f ff        leaq   -0x1(%rdi,%rcx,1),%rdi
    4014b5: 48 8d 74 0e ff        leaq   -0x1(%rsi,%rcx,1),%rsi
    4014ba: fd                    std
    4014bb: f3 a4                 rep movsb %ds:(%rsi),%es:(%rdi)
    4014bd: fc                    cld
    4014be: c3                    retq

  00000000004014bf <memset>:
    4014bf: 96                    xchgl  %eax,%esi
    4014c0: 48 89 d1              movq   %rdx,%rcx
    4014c3: 57                    pushq  %rdi
    4014c4: f3 aa                 rep stosb %al,%es:(%rdi)
    4014c6: 58                    popq   %rax
    4014c7: c3                    retq
```
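
The direction choice in the new memcpy/memmove listing above (`subq`,
`cmpq`, `jb`) can be modelled in portable C. This is only a sketch for
readers: `sketch_memmove` is a hypothetical name, and plain byte loops
stand in for the two `rep movsb` paths.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for illustration; not the nolibc symbol. */
void *sketch_memmove(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t i;

	/*
	 * Unsigned (dst - src) < n holds only when dst lies inside
	 * [src, src + n), where a forward copy would overwrite source
	 * bytes before they are read. When dst is below src the
	 * subtraction wraps to a huge value, so the forward path is taken.
	 */
	if ((uintptr_t)d - (uintptr_t)s < n) {
		/* Backward copy: models std; rep movsb; cld. */
		while (n--)
			d[n] = s[n];
	} else {
		/* Forward copy: models the plain rep movsb. */
		for (i = 0; i < n; i++)
			d[i] = s[i];
	}
	return dst;
}
```

A single unsigned comparison covers both "dst below src" and "no overlap",
which is why the forward path needs only one conditional jump.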

## Changelog

Changes in v3:
  - Make memmove an alias for memcpy (Willy).
  - Make the forward copy the likely case (Alviro).

Changes in v2:
  - Shrink the memset code size:
      - Use pushq %rdi / popq %rax (Alviro).
      - Use xchg %eax, %esi (Willy).
  - Drop the memcmp patch (needs more pondering).
  - Fix the broken memmove implementation (David).

Signed-off-by: Ammar Faizi <[email protected]>
---

Ammar Faizi (4):
  tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()`
  tools/nolibc: x86-64: Use `rep stosb` for `memset()`
  tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function
  tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function

 tools/include/nolibc/arch-x86_64.h | 42 ++++++++++++++++++++++++++++++
 tools/include/nolibc/string.h      | 36 +++++++++----------------
 2 files changed, 55 insertions(+), 23 deletions(-)


base-commit: 3c9b7c4a228bf8cca2f92abb65575cdd54065302
-- 
Ammar Faizi



Thread overview: 17+ messages
2023-09-02 13:35 [RFC PATCH v3 0/4] nolibc x86-64 string functions Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 1/4] tools/nolibc: x86-64: Use `rep movsb` for `memcpy()` and `memmove()` Ammar Faizi
2023-09-02 13:35 ` [RFC PATCH v3 2/4] tools/nolibc: x86-64: Use `rep stosb` for `memset()` Ammar Faizi
2023-09-02 19:28   ` Alviro Iskandar Setiawan
2023-09-02 19:34     ` Ammar Faizi
2023-09-02 19:38       ` Alviro Iskandar Setiawan
2023-09-02 19:39         ` Ammar Faizi
2023-09-03  8:17       ` Willy Tarreau
2023-09-03  8:34         ` Ammar Nofan Faizi
2023-09-03  8:39         ` Ammar Faizi
2023-09-03  9:55           ` Willy Tarreau
2023-09-02 13:35 ` [RFC PATCH v3 3/4] tools/nolibc: string: Remove the `_nolibc_memcpy_down()` function Ammar Faizi
2023-09-02 19:24   ` Alviro Iskandar Setiawan
2023-09-02 13:35 ` [RFC PATCH v3 4/4] tools/nolibc: string: Remove the `_nolibc_memcpy_up()` function Ammar Faizi
2023-09-02 19:26   ` Alviro Iskandar Setiawan
2023-09-03 20:38 ` [RFC PATCH v3 0/4] nolibc x86-64 string functions David Laight
2023-09-03 21:19   ` Willy Tarreau
