* [PATCHSET v1 RFC liburing 0/6] Implement the kernel style return value
@ 2021-09-29 10:16 Ammar Faizi
From: Ammar Faizi @ 2021-09-29 10:16 UTC
  To: Jens Axboe, Pavel Begunkov
  Cc: io-uring Mailing List, Louvian Lyndal, Ammar Faizi

Hi Jens,
Hi Pavel,

This is v1 of an RFC to implement the kernel style return value.

Motivation:
Currently liburing depends on libc. We want to make it possible to
build liburing without libc.

This idea was first posted as an issue on the liburing GitHub
repository here: https://github.com/axboe/liburing/issues/443

The subject of the issue is: "An option to use liburing without libc?".

On Mon, Sep 27, 2021 at 4:18 PM Mahdi Rakhshandehroo <[email protected]> wrote:
> There are a couple of issues with liburing's libc dependency:
> 
>  1) libc implementations of errno, malloc, pthread etc. tend to
>     pollute the binary with unwanted global/thread-local state.
>     This makes reentrancy impossible and relocations expensive.
>  2) libc doesn't play nice with non-POSIX threading models, like
>     green threads with small stack sizes, or direct use of the
>     clone() system call. This makes interop with other
>     languages/runtimes difficult.
> 
> One could use the raw syscall interface to io_uring to address these
> concerns, but that would be somewhat painful, so it would be nice
> for liburing to support this use case out of the box. Perhaps
> something like a NOLIBC macro could be added which, if defined,
> would patch out libc constructs and replace them with non-libc
> wrappers where applicable. A few API changes might be necessary for
> the non-libc case (e.g. io_uring_get_probe/io_uring_free_probe), but
> it shouldn't break existing applications as long as it's opt-in.

----------------------------------------------------------------

### 1) Introduction

We want to make the changes incrementally, starting with making it
possible to remove the dependency on the `errno` variable.

So this RFC aims to make it possible to remove the `errno` variable
dependency from the liburing sources by implementing the kernel style
return value.

What we mean by "kernel style return value" is that we wrap the
syscall API so that it returns a negative error code when an error
happens, as we usually do in kernel space code. So the caller doesn't
have to check the `errno` variable.

If we can land this "kernel style return value" in liburing, we will
start working on a series to support building without libc. These
changes will not break userland, and no functional changes will be
visible to users (they only affect liburing's internal sources).


### 2) How to deal with __sys_io_uring_{register,setup,enter2,enter}

Currently we expose these functions (**AAA**) to userland:
**AAA**:
  1) `__sys_io_uring_register`
  2) `__sys_io_uring_setup`
  3) `__sys_io_uring_enter2`
  4) `__sys_io_uring_enter`

These functions are used by several tests. As userland needs to
check the `errno` value to use them properly, those functions always
depend on libc. So we cannot change their behavior.

As such, only for the **no libc** environment case, we remove those
functions (**AAA**).

Then we introduce new functions (**BBB**) with the same names, plus
two extra leading underscores (4 underscores in total). These
functions do not require the caller to use the `errno` variable (they
use the kernel style return value) and always exist regardless of
whether libc is present.

**BBB**:
  1) `____sys_io_uring_register`
  2) `____sys_io_uring_setup`
  3) `____sys_io_uring_enter2`
  4) `____sys_io_uring_enter`
    
Summary
  1) **AAA** will only exist for the libc environment.

  2) **BBB** always exists.

  3) Do not use **AAA** inside liburing itself (it exists only for
     userland backward compatibility).

  4) For the libc environment, **BBB** may use `syscall(2)` and the
     `errno` variable, only to emulate the kernel style return value.

  5) For the no libc environment, **BBB** will use an Assembly
     interface to perform the syscall (arch dependent).

  6) Tests should not be affected, because (1) and (4) keep
     compatibility.
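
To make points (1) and (4) concrete, here is a minimal sketch of how a
**BBB** function could look in the libc environment, with an **AAA**
style shim layered on top of it. The `sketch_*` names, the `void *`
parameter (used only to avoid pulling in `<linux/io_uring.h>`) and the
layering itself are assumptions for illustration, not necessarily what
the patches do:

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_io_uring_setup
/* Assumption: 425 is the syscall number on most architectures. */
#define __NR_io_uring_setup 425
#endif

/*
 * BBB style (hypothetical name): call syscall(2), then fold errno into
 * the return value so the caller never has to read errno.
 */
static int sketch_sys_io_uring_setup(unsigned int entries, void *params)
{
	long ret;

	ret = syscall(__NR_io_uring_setup, entries, params);
	return (ret < 0) ? -errno : (int) ret;
}

/*
 * AAA style (hypothetical name): restore the libc convention of
 * returning -1 and setting errno, for userland backward compatibility.
 */
static int sketch_compat_io_uring_setup(unsigned int entries, void *params)
{
	int ret;

	ret = sketch_sys_io_uring_setup(entries, params);
	if (ret < 0) {
		errno = -ret;
		return -1;
	}
	return ret;
}
```

With this layering, liburing internals would only ever see the negative
error code, while existing users of the old symbols keep seeing `-1`
plus `errno`.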


### 3) How to deal with syscalls

We have 3 patches in this series that wrap the syscalls:
  - Add `liburing_mmap()` and `liburing_munmap()`
  - Add `liburing_madvise()`
  - Add `liburing_getrlimit()` and `liburing_setrlimit()`

`liburing_{munmap,madvise,getrlimit,setrlimit}` return the negative
error code on error. They just return an int, so there is nothing
special to worry about.
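
As a rough sketch of the int-returning case in the libc environment
(the name `sketch_getrlimit` is made up for this example; the actual
wrapper in the series may differ):

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Hypothetical wrapper: same arguments as getrlimit(2), but returns
 * 0 on success or the negative error code on failure.
 */
static int sketch_getrlimit(int resource, struct rlimit *rlim)
{
	int ret;

	ret = getrlimit(resource, rlim);
	return (ret < 0) ? -errno : ret;
}
```

A caller can then write `ret = sketch_getrlimit(...); if (ret < 0)
return ret;` and simply propagate the error code upwards.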

The special case is a pointer return value, as in `liburing_mmap()`.
In this case we take the `include/linux/err.h` file from the Linux
kernel source tree and use `IS_ERR()`, `PTR_ERR()`, `ERR_PTR()` to
deal with it.

It is implemented in patch:
  - Add kernel error header `src/kernel_err.h`
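
For readers not familiar with that header, the idea can be sketched in
a few lines. This is a simplified userspace stand-in modelled on the
kernel helpers, not the contents of `src/kernel_err.h`; it relies on
the assumption that the last 4095 values of the address space are
never valid pointers, so they can carry a negative error code:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest error number that is encoded in a pointer. */
#define MAX_ERRNO 4095

/* Encode a negative error code as a pointer value. */
static inline void *ERR_PTR(intptr_t error)
{
	return (void *) error;
}

/* Decode the error code from an error pointer. */
static inline intptr_t PTR_ERR(const void *ptr)
{
	return (intptr_t) ptr;
}

/* True if ptr lies in the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t) ptr >= (uintptr_t) -MAX_ERRNO;
}
```

So `ERR_PTR(-ENOMEM)` yields a pointer for which `IS_ERR()` is true and
`PTR_ERR()` gives back `-ENOMEM`, while NULL and ordinary pointers are
not treated as errors.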


### 4) How can this help to support no libc environment?

When this kernel style return value gets adopted in liburing, we will
start working on raw syscalls written directly in Assembly (arch
dependent).

I (Ammar Faizi) will start kicking the tires on the x86-64 arch.
Hopefully we will get support for other architectures as well.
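
To give a rough idea of what such a raw syscall could look like on
x86-64, here is a one-argument sketch using GCC/Clang inline Assembly.
The name `sketch_raw_syscall1` is hypothetical and the headers are
included only for the `__NR_*` and `E*` constants; this is not the
code that will be submitted:

```c
#include <errno.h>       /* only for the E* constants used by callers */
#include <sys/syscall.h> /* only for the __NR_* syscall numbers */

#if defined(__x86_64__) && defined(__linux__)
/*
 * x86-64 Linux syscall convention: the syscall number goes in rax,
 * the first argument in rdi, and the result comes back in rax.
 * The syscall instruction clobbers rcx and r11.
 *
 * On error the kernel returns the negative error code in rax, which
 * is exactly the kernel style return value, so no errno is involved.
 */
static inline long sketch_raw_syscall1(long nr, long a1)
{
	long ret;

	__asm__ volatile (
		"syscall"
		: "=a"(ret)
		: "a"(nr), "D"(a1)
		: "rcx", "r11", "memory"
	);
	return ret;
}
#endif
```

For example, `sketch_raw_syscall1(__NR_close, -1)` returns `-EBADF`
directly, without touching `errno`.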

An example liburing syscall wrapper may look like this:

```c
void *liburing_mmap(void *addr, size_t length, int prot, int flags,
		    int fd, off_t offset)
{
#ifdef LIBURING_NOLIBC
	/*
	 * This is when we build without libc.
	 *
	 * Assume __raw_mmap is the syscall written in ASM.
	 *
	 * The return value is directly taken from the syscall
	 * return value.
	 */
	return __raw_mmap(addr, length, prot, flags, fd, offset);
#else
	/*
	 * This is when libc exists.
	 */
	void *ret;

	ret = mmap(addr, length, prot, flags, fd, offset);
	if (ret == MAP_FAILED)
		ret = ERR_PTR(-errno);

	return ret;
#endif
}
```

----------------------------------------------------------------
The following changes since commit ce10538688b93dafd257ebfed7faf18844e0052d:

  test: Fix endianess issue on `bind()` and `connect()` (2021-09-27 07:45:03 -0600)

based on:

  git://git.kernel.dk/liburing.git master

are available as 6 patches in this series; all will be posted as
responses to this one.

If you want to take the git tag, it is available in the Git repository at:

  git://github.com/ammarfaizi2/liburing.git tags/nolibc-support-rfc-v1

Please review!

----------------------------------------------------------------
Ammar Faizi (6):
      src/syscall: Implement the kernel style return value
      Add kernel error header `src/kernel_err.h`
      Add `liburing_mmap()` and `liburing_munmap()`
      Add `liburing_madvise()`
      Add `liburing_getrlimit()` and `liburing_setrlimit()`
      src/{queue,register,setup}: Remove `#include <errno.h>`

 src/kernel_err.h |  75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 src/queue.c      |  28 +++++++++----------------
 src/register.c   | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------------------------------------------------------------------------
 src/setup.c      |  60 ++++++++++++++++++++++++++++-------------------------
 src/syscall.c    |  92 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/syscall.h    |  18 ++++++++++++++++
 6 files changed, 284 insertions(+), 178 deletions(-)
 create mode 100644 src/kernel_err.h

--
Ammar Faizi


