* [PATCH v3 1/9] bpf: Leverage sockptr_t in BPF getsockopt hook
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 14:55 ` [PATCH v3 2/9] bpf: Leverage sockptr_t in BPF setsockopt hook Breno Leitao
` (7 subsequent siblings)
8 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: bpf, linux-kernel, netdev, io-uring, krisman
Leverage sockptr_t structure to have an argument that is either an
userspace pointer, or, a kernel pointer.
This makes this function flexible, so, we can mix and match user and
kernel space pointers. The main motivation for this change is to use it
in the io_uring {g,s}etsockopt(), which will use a userspace pointer for
*optval, but, a kernel value for optlen.
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 5 +++--
kernel/bpf/cgroup.c | 20 +++++++++++---------
net/socket.c | 5 +++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 57e9e109257e..d16cb99fd4f1 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -139,9 +139,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
int *optname, char __user *optval,
int *optlen, char **kernel_optval);
+
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
- int optname, char __user *optval,
- int __user *optlen, int max_optlen,
+ int optname, sockptr_t optval,
+ sockptr_t optlen, int max_optlen,
int retval);
int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5b2741aa0d9b..ebc8c58f7e46 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1875,8 +1875,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
}
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
- int optname, char __user *optval,
- int __user *optlen, int max_optlen,
+ int optname, sockptr_t optval,
+ sockptr_t optlen, int max_optlen,
int retval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1903,8 +1903,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
* one that kernel returned as well to let
* BPF programs inspect the value.
*/
-
- if (get_user(ctx.optlen, optlen)) {
+ if (copy_from_sockptr(&ctx.optlen, optlen,
+ sizeof(ctx.optlen))) {
ret = -EFAULT;
goto out;
}
@@ -1915,8 +1915,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
}
orig_optlen = ctx.optlen;
- if (copy_from_user(ctx.optval, optval,
- min(ctx.optlen, max_optlen)) != 0) {
+ if (copy_from_sockptr(ctx.optval, optval,
+ min(ctx.optlen, max_optlen))) {
ret = -EFAULT;
goto out;
}
@@ -1930,7 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
if (ret < 0)
goto out;
- if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
+ if (!sockptr_is_null(optval) &&
+ (ctx.optlen > max_optlen || ctx.optlen < 0)) {
if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
ctx.optlen, max_optlen);
@@ -1942,11 +1943,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
}
if (ctx.optlen != 0) {
- if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
+ if (!sockptr_is_null(optval) &&
+ copy_to_sockptr(optval, ctx.optval, ctx.optlen)) {
ret = -EFAULT;
goto out;
}
- if (put_user(ctx.optlen, optlen)) {
+ if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) {
ret = -EFAULT;
goto out;
}
diff --git a/net/socket.c b/net/socket.c
index 1dc23f5298ba..33ea5eb91ade 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2311,8 +2311,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
if (!in_compat_syscall())
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
- optval, optlen, max_optlen,
- err);
+ USER_SOCKPTR(optval),
+ USER_SOCKPTR(optlen),
+ max_optlen, err);
out_put:
fput_light(sock->file, fput_needed);
return err;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 2/9] bpf: Leverage sockptr_t in BPF setsockopt hook
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-08-17 14:55 ` [PATCH v3 1/9] bpf: Leverage sockptr_t in BPF getsockopt hook Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 14:55 ` [PATCH v3 3/9] net/socket: Break down __sys_setsockopt Breno Leitao
` (6 subsequent siblings)
8 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: bpf, linux-kernel, netdev, io-uring, krisman
Change BPF setsockopt hook (__cgroup_bpf_run_filter_setsockopt()) to use
sockptr instead of user pointers. This brings flexibility to the
function, since it could be called with userspace or kernel pointers.
This change will allow the creation of a core sock_setsockopt, called
do_sock_setsockopt(), which will be called from both the system call path
and by io_uring command.
This also aligns with the getsockopt() counterpart, which is now using
sockptr_t types.
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 2 +-
kernel/bpf/cgroup.c | 5 +++--
net/socket.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index d16cb99fd4f1..5e3419eb267a 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -137,7 +137,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
enum cgroup_bpf_attach_type atype);
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
- int *optname, char __user *optval,
+ int *optname, sockptr_t optval,
int *optlen, char **kernel_optval);
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ebc8c58f7e46..f0dedd4f7f2e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1785,7 +1785,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
}
int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
- int *optname, char __user *optval,
+ int *optname, sockptr_t optval,
int *optlen, char **kernel_optval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1808,7 +1808,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
ctx.optlen = *optlen;
- if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
+ if (copy_from_sockptr(ctx.optval, optval,
+ min(*optlen, max_optlen))) {
ret = -EFAULT;
goto out;
}
diff --git a/net/socket.c b/net/socket.c
index 33ea5eb91ade..2c3ef8862b4f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2246,7 +2246,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
if (!in_compat_syscall())
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
- user_optval, &optlen,
+ optval, &optlen,
&kernel_optval);
if (err < 0)
goto out_put;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 3/9] net/socket: Break down __sys_setsockopt
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-08-17 14:55 ` [PATCH v3 1/9] bpf: Leverage sockptr_t in BPF getsockopt hook Breno Leitao
2023-08-17 14:55 ` [PATCH v3 2/9] bpf: Leverage sockptr_t in BPF setsockopt hook Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-19 14:35 ` Willem de Bruijn
2023-08-17 14:55 ` [PATCH v3 4/9] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
` (5 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: bpf, linux-kernel, netdev, io-uring, krisman
Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.
Signed-off-by: Breno Leitao <[email protected]>
---
include/net/sock.h | 2 ++
net/socket.c | 39 +++++++++++++++++++++++++--------------
2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 2eb916d1ff64..2a0324275347 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1853,6 +1853,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, unsigned int optlen);
int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, int optlen);
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
diff --git a/net/socket.c b/net/socket.c
index 2c3ef8862b4f..b5e4398a6b4d 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2221,30 +2221,20 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
}
-/*
- * Set a socket option. Because we don't know the option lengths we have
- * to pass the user mode parameter for the protocols to sort out.
- */
-int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
- int optlen)
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, int optlen)
{
- sockptr_t optval = USER_SOCKPTR(user_optval);
char *kernel_optval = NULL;
- int err, fput_needed;
- struct socket *sock;
+ int err;
if (optlen < 0)
return -EINVAL;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
-
err = security_socket_setsockopt(sock, level, optname);
if (err)
goto out_put;
- if (!in_compat_syscall())
+ if (!compat)
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
optval, &optlen,
&kernel_optval);
@@ -2266,6 +2256,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
optlen);
kfree(kernel_optval);
out_put:
+ return err;
+}
+EXPORT_SYMBOL(do_sock_setsockopt);
+
+/* Set a socket option. Because we don't know the option lengths we have
+ * to pass the user mode parameter for the protocols to sort out.
+ */
+int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
+ int optlen)
+{
+ sockptr_t optval = USER_SOCKPTR(user_optval);
+ bool compat = in_compat_syscall();
+ int err, fput_needed;
+ struct socket *sock;
+
+ sock = sockfd_lookup_light(fd, &err, &fput_needed);
+ if (!sock)
+ return err;
+
+ err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+
fput_light(sock->file, fput_needed);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 3/9] net/socket: Break down __sys_setsockopt
2023-08-17 14:55 ` [PATCH v3 3/9] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-08-19 14:35 ` Willem de Bruijn
0 siblings, 0 replies; 24+ messages in thread
From: Willem de Bruijn @ 2023-08-19 14:35 UTC (permalink / raw)
To: Breno Leitao, sdf, axboe, asml.silence, willemdebruijn.kernel,
martin.lau, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: bpf, linux-kernel, netdev, io-uring, krisman
Breno Leitao wrote:
> Split __sys_setsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_setsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
>
> do_sock_setsockopt() will be called by io_uring setsockopt() command
> operation in the following patch.
>
> Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
> ---
> include/net/sock.h | 2 ++
> net/socket.c | 39 +++++++++++++++++++++++++--------------
> 2 files changed, 27 insertions(+), 14 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 2eb916d1ff64..2a0324275347 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1853,6 +1853,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
> sockptr_t optval, unsigned int optlen);
> int sock_setsockopt(struct socket *sock, int level, int op,
> sockptr_t optval, unsigned int optlen);
> +int do_sock_setsockopt(struct socket *sock, bool compat, int level,
> + int optname, sockptr_t optval, int optlen);
Somewhat surprising that optlen type differs between __sys_setsockopt
and sock_setsockopt. But agreed that this code should follow
__sys_setsockopt.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 4/9] io_uring/cmd: Pass compat mode in issue_flags
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (2 preceding siblings ...)
2023-08-17 14:55 ` [PATCH v3 3/9] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 14:55 ` [PATCH v3 5/9] selftests/net: Extract uring helpers to be reusable Breno Leitao
` (4 subsequent siblings)
8 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau
Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni, krisman
Create a new flag to track if the operation is running compat mode.
This basically check the context->compat and pass it to the issue_flags,
so, it could be queried later in the callbacks.
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/io_uring.h | 1 +
io_uring/uring_cmd.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 106cdc55ff3b..bc53b35966ed 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,6 +20,7 @@ enum io_uring_cmd_flags {
IO_URING_F_SQE128 = (1 << 8),
IO_URING_F_CQE32 = (1 << 9),
IO_URING_F_IOPOLL = (1 << 10),
+ IO_URING_F_COMPAT = (1 << 11),
};
struct io_uring_cmd {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8e7a03c1b20e..5f32083bd0a5 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -129,6 +129,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
issue_flags |= IO_URING_F_SQE128;
if (ctx->flags & IORING_SETUP_CQE32)
issue_flags |= IO_URING_F_CQE32;
+ if (ctx->compat)
+ issue_flags |= IO_URING_F_COMPAT;
if (ctx->flags & IORING_SETUP_IOPOLL) {
if (!file->f_op->uring_cmd_iopoll)
return -EOPNOTSUPP;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 5/9] selftests/net: Extract uring helpers to be reusable
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (3 preceding siblings ...)
2023-08-17 14:55 ` [PATCH v3 4/9] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 14:55 ` [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
` (3 subsequent siblings)
8 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Shuah Khan
Cc: bpf, linux-kernel, netdev, io-uring, krisman,
open list:KERNEL SELFTEST FRAMEWORK
Instead of defining basic io_uring functions in the test case, move them
to a common directory, so, other tests can use them.
This simplify the test code and reuse the common liburing
infrastructure. This is basically a copy of what we have in
io_uring_zerocopy_tx with some minor improvements to make checkpatch
happy.
A follow-up test will use the same helpers in a BPF sockopt test.
Signed-off-by: Breno Leitao <[email protected]>
---
tools/include/io_uring/mini_liburing.h | 282 ++++++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 268 +----------------
3 files changed, 285 insertions(+), 266 deletions(-)
create mode 100644 tools/include/io_uring/mini_liburing.h
diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h
new file mode 100644
index 000000000000..9ccb16074eb5
--- /dev/null
+++ b/tools/include/io_uring/mini_liburing.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: MIT */
+
+#include <linux/io_uring.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+struct io_sq_ring {
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ unsigned int *flags;
+ unsigned int *array;
+};
+
+struct io_cq_ring {
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+ unsigned int *khead;
+ unsigned int *ktail;
+ unsigned int *kring_mask;
+ unsigned int *kring_entries;
+ unsigned int *kflags;
+ unsigned int *kdropped;
+ unsigned int *array;
+ struct io_uring_sqe *sqes;
+
+ unsigned int sqe_head;
+ unsigned int sqe_tail;
+
+ size_t ring_sz;
+};
+
+struct io_uring_cq {
+ unsigned int *khead;
+ unsigned int *ktail;
+ unsigned int *kring_mask;
+ unsigned int *kring_entries;
+ unsigned int *koverflow;
+ struct io_uring_cqe *cqes;
+
+ size_t ring_sz;
+};
+
+struct io_uring {
+ struct io_uring_sq sq;
+ struct io_uring_cq cq;
+ int ring_fd;
+};
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier() __asm__ __volatile__("":::"memory")
+#define write_barrier() __asm__ __volatile__("":::"memory")
+#else
+#define read_barrier() __sync_synchronize()
+#define write_barrier() __sync_synchronize()
+#endif
+
+static inline int io_uring_mmap(int fd, struct io_uring_params *p,
+ struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+ size_t size;
+ void *ptr;
+ int ret;
+
+ sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int);
+ ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+ if (ptr == MAP_FAILED)
+ return -errno;
+ sq->khead = ptr + p->sq_off.head;
+ sq->ktail = ptr + p->sq_off.tail;
+ sq->kring_mask = ptr + p->sq_off.ring_mask;
+ sq->kring_entries = ptr + p->sq_off.ring_entries;
+ sq->kflags = ptr + p->sq_off.flags;
+ sq->kdropped = ptr + p->sq_off.dropped;
+ sq->array = ptr + p->sq_off.array;
+
+ size = p->sq_entries * sizeof(struct io_uring_sqe);
+ sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+ if (sq->sqes == MAP_FAILED) {
+ ret = -errno;
+err:
+ munmap(sq->khead, sq->ring_sz);
+ return ret;
+ }
+
+ cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+ ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+ if (ptr == MAP_FAILED) {
+ ret = -errno;
+ munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+ goto err;
+ }
+ cq->khead = ptr + p->cq_off.head;
+ cq->ktail = ptr + p->cq_off.tail;
+ cq->kring_mask = ptr + p->cq_off.ring_mask;
+ cq->kring_entries = ptr + p->cq_off.ring_entries;
+ cq->koverflow = ptr + p->cq_off.overflow;
+ cq->cqes = ptr + p->cq_off.cqes;
+ return 0;
+}
+
+static inline int io_uring_setup(unsigned int entries,
+ struct io_uring_params *p)
+{
+ return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static inline int io_uring_enter(int fd, unsigned int to_submit,
+ unsigned int min_complete,
+ unsigned int flags, sigset_t *sig)
+{
+ return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+ flags, sig, _NSIG / 8);
+}
+
+static inline int io_uring_queue_init(unsigned int entries,
+ struct io_uring *ring,
+ unsigned int flags)
+{
+ struct io_uring_params p;
+ int fd, ret;
+
+ memset(ring, 0, sizeof(*ring));
+ memset(&p, 0, sizeof(p));
+ p.flags = flags;
+
+ fd = io_uring_setup(entries, &p);
+ if (fd < 0)
+ return fd;
+ ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+ if (!ret)
+ ring->ring_fd = fd;
+ else
+ close(fd);
+ return ret;
+}
+
+/* Get a sqe */
+static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+
+ if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+ return NULL;
+ return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static inline int io_uring_wait_cqe(struct io_uring *ring,
+ struct io_uring_cqe **cqe_ptr)
+{
+ struct io_uring_cq *cq = &ring->cq;
+ const unsigned int mask = *cq->kring_mask;
+ unsigned int head = *cq->khead;
+ int ret;
+
+ *cqe_ptr = NULL;
+ do {
+ read_barrier();
+ if (head != *cq->ktail) {
+ *cqe_ptr = &cq->cqes[head & mask];
+ break;
+ }
+ ret = io_uring_enter(ring->ring_fd, 0, 1,
+ IORING_ENTER_GETEVENTS, NULL);
+ if (ret < 0)
+ return -errno;
+ } while (1);
+
+ return 0;
+}
+
+static inline int io_uring_submit(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+ const unsigned int mask = *sq->kring_mask;
+ unsigned int ktail, submitted, to_submit;
+ int ret;
+
+ read_barrier();
+ if (*sq->khead != *sq->ktail) {
+ submitted = *sq->kring_entries;
+ goto submit;
+ }
+ if (sq->sqe_head == sq->sqe_tail)
+ return 0;
+
+ ktail = *sq->ktail;
+ to_submit = sq->sqe_tail - sq->sqe_head;
+ for (submitted = 0; submitted < to_submit; submitted++) {
+ read_barrier();
+ sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+ }
+ if (!submitted)
+ return 0;
+
+ if (*sq->ktail != ktail) {
+ write_barrier();
+ *sq->ktail = ktail;
+ write_barrier();
+ }
+submit:
+ ret = io_uring_enter(ring->ring_fd, submitted, 0,
+ IORING_ENTER_GETEVENTS, NULL);
+ return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_queue_exit(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+
+ munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe));
+ munmap(sq->khead, sq->ring_sz);
+ close(ring->ring_fd);
+}
+
+/* Prepare and send the SQE */
+static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op,
+ int sockfd,
+ int level, int optname,
+ const void *optval,
+ int optlen)
+{
+ memset(sqe, 0, sizeof(*sqe));
+ sqe->opcode = (__u8)IORING_OP_URING_CMD;
+ sqe->fd = sockfd;
+ sqe->cmd_op = op;
+
+ sqe->level = level;
+ sqe->optname = optname;
+ sqe->optval = (unsigned long long)optval;
+ sqe->optlen = optlen;
+}
+
+static inline int io_uring_register_buffers(struct io_uring *ring,
+ const struct iovec *iovecs,
+ unsigned int nr_iovecs)
+{
+ int ret;
+
+ ret = syscall(__NR_io_uring_register, ring->ring_fd,
+ IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+ return (ret < 0) ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+ const void *buf, size_t len, int flags)
+{
+ memset(sqe, 0, sizeof(*sqe));
+ sqe->opcode = (__u8)IORING_OP_SEND;
+ sqe->fd = sockfd;
+ sqe->addr = (unsigned long)buf;
+ sqe->len = len;
+ sqe->msg_flags = (__u32)flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+ const void *buf, size_t len, int flags,
+ unsigned int zc_flags)
+{
+ io_uring_prep_send(sqe, sockfd, buf, len, flags);
+ sqe->opcode = (__u8)IORING_OP_SEND_ZC;
+ sqe->ioprio = zc_flags;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+ *(&ring->cq)->khead += 1;
+ write_barrier();
+}
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 7f3ab2a93ed6..70f4a5982b01 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -94,6 +94,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
$(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
$(OUTPUT)/tcp_inq: LDLIBS += -lpthread
$(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
# Rules to generate bpf obj nat6to4.o
CLANG ?= clang
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
index 154287740172..76e604e4810e 100644
--- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -36,6 +36,8 @@
#include <sys/un.h>
#include <sys/wait.h>
+#include <io_uring/mini_liburing.h>
+
#define NOTIF_TAG 0xfffffffULL
#define NONZC_TAG 0
#define ZC_TAG 1
@@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr;
static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
-struct io_sq_ring {
- unsigned *head;
- unsigned *tail;
- unsigned *ring_mask;
- unsigned *ring_entries;
- unsigned *flags;
- unsigned *array;
-};
-
-struct io_cq_ring {
- unsigned *head;
- unsigned *tail;
- unsigned *ring_mask;
- unsigned *ring_entries;
- struct io_uring_cqe *cqes;
-};
-
-struct io_uring_sq {
- unsigned *khead;
- unsigned *ktail;
- unsigned *kring_mask;
- unsigned *kring_entries;
- unsigned *kflags;
- unsigned *kdropped;
- unsigned *array;
- struct io_uring_sqe *sqes;
-
- unsigned sqe_head;
- unsigned sqe_tail;
-
- size_t ring_sz;
-};
-
-struct io_uring_cq {
- unsigned *khead;
- unsigned *ktail;
- unsigned *kring_mask;
- unsigned *kring_entries;
- unsigned *koverflow;
- struct io_uring_cqe *cqes;
-
- size_t ring_sz;
-};
-
-struct io_uring {
- struct io_uring_sq sq;
- struct io_uring_cq cq;
- int ring_fd;
-};
-
-#ifdef __alpha__
-# ifndef __NR_io_uring_setup
-# define __NR_io_uring_setup 535
-# endif
-# ifndef __NR_io_uring_enter
-# define __NR_io_uring_enter 536
-# endif
-# ifndef __NR_io_uring_register
-# define __NR_io_uring_register 537
-# endif
-#else /* !__alpha__ */
-# ifndef __NR_io_uring_setup
-# define __NR_io_uring_setup 425
-# endif
-# ifndef __NR_io_uring_enter
-# define __NR_io_uring_enter 426
-# endif
-# ifndef __NR_io_uring_register
-# define __NR_io_uring_register 427
-# endif
-#endif
-
-#if defined(__x86_64) || defined(__i386__)
-#define read_barrier() __asm__ __volatile__("":::"memory")
-#define write_barrier() __asm__ __volatile__("":::"memory")
-#else
-
-#define read_barrier() __sync_synchronize()
-#define write_barrier() __sync_synchronize()
-#endif
-
-static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
-{
- return syscall(__NR_io_uring_setup, entries, p);
-}
-
-static int io_uring_enter(int fd, unsigned int to_submit,
- unsigned int min_complete,
- unsigned int flags, sigset_t *sig)
-{
- return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
- flags, sig, _NSIG / 8);
-}
-
-static int io_uring_register_buffers(struct io_uring *ring,
- const struct iovec *iovecs,
- unsigned nr_iovecs)
-{
- int ret;
-
- ret = syscall(__NR_io_uring_register, ring->ring_fd,
- IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
- return (ret < 0) ? -errno : ret;
-}
-
-static int io_uring_mmap(int fd, struct io_uring_params *p,
- struct io_uring_sq *sq, struct io_uring_cq *cq)
-{
- size_t size;
- void *ptr;
- int ret;
-
- sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
- ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
- if (ptr == MAP_FAILED)
- return -errno;
- sq->khead = ptr + p->sq_off.head;
- sq->ktail = ptr + p->sq_off.tail;
- sq->kring_mask = ptr + p->sq_off.ring_mask;
- sq->kring_entries = ptr + p->sq_off.ring_entries;
- sq->kflags = ptr + p->sq_off.flags;
- sq->kdropped = ptr + p->sq_off.dropped;
- sq->array = ptr + p->sq_off.array;
-
- size = p->sq_entries * sizeof(struct io_uring_sqe);
- sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
- if (sq->sqes == MAP_FAILED) {
- ret = -errno;
-err:
- munmap(sq->khead, sq->ring_sz);
- return ret;
- }
-
- cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
- ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
- if (ptr == MAP_FAILED) {
- ret = -errno;
- munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
- goto err;
- }
- cq->khead = ptr + p->cq_off.head;
- cq->ktail = ptr + p->cq_off.tail;
- cq->kring_mask = ptr + p->cq_off.ring_mask;
- cq->kring_entries = ptr + p->cq_off.ring_entries;
- cq->koverflow = ptr + p->cq_off.overflow;
- cq->cqes = ptr + p->cq_off.cqes;
- return 0;
-}
-
-static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
- unsigned flags)
-{
- struct io_uring_params p;
- int fd, ret;
-
- memset(ring, 0, sizeof(*ring));
- memset(&p, 0, sizeof(p));
- p.flags = flags;
-
- fd = io_uring_setup(entries, &p);
- if (fd < 0)
- return fd;
- ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
- if (!ret)
- ring->ring_fd = fd;
- else
- close(fd);
- return ret;
-}
-
-static int io_uring_submit(struct io_uring *ring)
-{
- struct io_uring_sq *sq = &ring->sq;
- const unsigned mask = *sq->kring_mask;
- unsigned ktail, submitted, to_submit;
- int ret;
-
- read_barrier();
- if (*sq->khead != *sq->ktail) {
- submitted = *sq->kring_entries;
- goto submit;
- }
- if (sq->sqe_head == sq->sqe_tail)
- return 0;
-
- ktail = *sq->ktail;
- to_submit = sq->sqe_tail - sq->sqe_head;
- for (submitted = 0; submitted < to_submit; submitted++) {
- read_barrier();
- sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
- }
- if (!submitted)
- return 0;
-
- if (*sq->ktail != ktail) {
- write_barrier();
- *sq->ktail = ktail;
- write_barrier();
- }
-submit:
- ret = io_uring_enter(ring->ring_fd, submitted, 0,
- IORING_ENTER_GETEVENTS, NULL);
- return ret < 0 ? -errno : ret;
-}
-
-static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
- const void *buf, size_t len, int flags)
-{
- memset(sqe, 0, sizeof(*sqe));
- sqe->opcode = (__u8) IORING_OP_SEND;
- sqe->fd = sockfd;
- sqe->addr = (unsigned long) buf;
- sqe->len = len;
- sqe->msg_flags = (__u32) flags;
-}
-
-static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
- const void *buf, size_t len, int flags,
- unsigned zc_flags)
-{
- io_uring_prep_send(sqe, sockfd, buf, len, flags);
- sqe->opcode = (__u8) IORING_OP_SEND_ZC;
- sqe->ioprio = zc_flags;
-}
-
-static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
-{
- struct io_uring_sq *sq = &ring->sq;
-
- if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
- return NULL;
- return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
-}
-
-static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
-{
- struct io_uring_cq *cq = &ring->cq;
- const unsigned mask = *cq->kring_mask;
- unsigned head = *cq->khead;
- int ret;
-
- *cqe_ptr = NULL;
- do {
- read_barrier();
- if (head != *cq->ktail) {
- *cqe_ptr = &cq->cqes[head & mask];
- break;
- }
- ret = io_uring_enter(ring->ring_fd, 0, 1,
- IORING_ENTER_GETEVENTS, NULL);
- if (ret < 0)
- return -errno;
- } while (1);
-
- return 0;
-}
-
-static inline void io_uring_cqe_seen(struct io_uring *ring)
-{
- *(&ring->cq)->khead += 1;
- write_barrier();
-}
-
static unsigned long gettimeofday_ms(void)
{
struct timeval tv;
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (4 preceding siblings ...)
2023-08-17 14:55 ` [PATCH v3 5/9] selftests/net: Extract uring helpers to be reusable Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 18:38 ` Gabriel Krisman Bertazi
2023-08-17 14:55 ` [PATCH v3 7/9] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
` (2 subsequent siblings)
8 siblings, 1 reply; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau
Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni, krisman
Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
where a sockptr_t is either userspace or kernel space, and handled as
such.
Function io_uring_cmd_getsockopt() is inspired by __sys_getsockopt().
Differently from the getsockopt(2), the optlen field is not a userspace
pointers. In getsockopt(2), userspace provides optlen pointer, which is
overwritten by the kernel. In this implementation, userspace passes a
u32, and the new value is returned in cqe->res. I.e., optlen is not a
pointer.
Important to say that userspace needs to keep the pointer alive until
the CQE is completed.
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 7 +++++++
io_uring/uring_cmd.c | 38 +++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9fc7195f25df..8152151080db 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
union {
__u64 addr; /* pointer to buffer or iovecs */
__u64 splice_off_in;
+ struct {
+ __u32 level;
+ __u32 optname;
+ };
};
__u32 len; /* buffer size or number of iovecs */
union {
@@ -79,6 +83,7 @@ struct io_uring_sqe {
union {
__s32 splice_fd_in;
__u32 file_index;
+ __u32 optlen;
struct {
__u16 addr_len;
__u16 __pad3[1];
@@ -89,6 +94,7 @@ struct io_uring_sqe {
__u64 addr3;
__u64 __pad2[1];
};
+ __u64 optval;
/*
* If the ring is initialized with IORING_SETUP_SQE128, then
* this field is used for 80 bytes of arbitrary command data
@@ -729,6 +735,7 @@ struct io_uring_recvmsg_out {
enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
+ SOCKET_URING_OP_GETSOCKOPT,
};
#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 5f32083bd0a5..7b768f1bf4df 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -168,6 +168,36 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
}
EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
+INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
+ int optname));
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ int optname = READ_ONCE(cmd->sqe->optname);
+ int optlen = READ_ONCE(cmd->sqe->optlen);
+ int level = READ_ONCE(cmd->sqe->level);
+ int err;
+
+ err = security_socket_getsockopt(sock, level, optname);
+ if (err)
+ return err;
+
+ if (level == SOL_SOCKET) {
+ err = sk_getsockopt(sock->sk, level, optname,
+ USER_SOCKPTR(optval),
+ KERNEL_SOCKPTR(&optlen));
+ if (err)
+ return err;
+
+ return optlen;
+ }
+
+ return -EOPNOTSUPP;
+}
+
+#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct socket *sock = cmd->file->private_data;
@@ -189,8 +219,16 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
if (ret)
return ret;
return arg;
+ case SOCKET_URING_OP_GETSOCKOPT:
+ return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
}
+#else
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+ return -EOPNOTSUPP;
+}
+#endif
EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
2023-08-17 14:55 ` [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-08-17 18:38 ` Gabriel Krisman Bertazi
2023-08-21 9:09 ` Breno Leitao
0 siblings, 1 reply; 24+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-08-17 18:38 UTC (permalink / raw)
To: Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
Breno Leitao <[email protected]> writes:
> +#if defined(CONFIG_NET)
> int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> {
> struct socket *sock = cmd->file->private_data;
> @@ -189,8 +219,16 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> if (ret)
> return ret;
> return arg;
> + case SOCKET_URING_OP_GETSOCKOPT:
> + return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
> default:
> return -EOPNOTSUPP;
> }
> }
> +#else
> +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> +{
> + return -EOPNOTSUPP;
> +}
> +#endif
> EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
The CONFIG_NET guards are unrelated and need to go in a separate commit.
Specially because it is not only gating getsockopt, but also the already
merged SOCKET_URING_OP_SIOCINQ/_OP_SIOCOUTQ.
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
2023-08-17 18:38 ` Gabriel Krisman Bertazi
@ 2023-08-21 9:09 ` Breno Leitao
2023-08-21 14:52 ` Gabriel Krisman Bertazi
0 siblings, 1 reply; 24+ messages in thread
From: Breno Leitao @ 2023-08-21 9:09 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
Hello Gabriel,
On Thu, Aug 17, 2023 at 02:38:46PM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
>
> > +#if defined(CONFIG_NET)
> > int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > {
> > struct socket *sock = cmd->file->private_data;
> > @@ -189,8 +219,16 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > if (ret)
> > return ret;
> > return arg;
> > + case SOCKET_URING_OP_GETSOCKOPT:
> > + return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
> > default:
> > return -EOPNOTSUPP;
> > }
> > }
> > +#else
> > +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > +{
> > + return -EOPNOTSUPP;
> > +}
> > +#endif
> > EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
>
> The CONFIG_NET guards are unrelated and need to go in a separate commit.
> Specially because it is not only gating getsockopt, but also the already
> merged SOCKET_URING_OP_SIOCINQ/_OP_SIOCOUTQ.
Well, so far, if CONFIG_NET is disable, and you call
io_uring_cmd_getsockopt, the callbacks will be called and returned
-EOPNOTSUPP.
With this new patch, it will eventually call sk_getsockopt which does
not exist in the CONFIG_NET=n world. That is why I have this
protection now. I.e, this `#if defined(CONFIG_NET)` is only necessary now,
since it is the first time this function (io_uring_cmd_sock) will call a
function that does not exist if CONFIG_NET is disabled.
I can split it in a different patch, if you think it makes a difference.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
2023-08-21 9:09 ` Breno Leitao
@ 2023-08-21 14:52 ` Gabriel Krisman Bertazi
0 siblings, 0 replies; 24+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-08-21 14:52 UTC (permalink / raw)
To: Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
Breno Leitao <[email protected]> writes:
> Hello Gabriel,
>
> On Thu, Aug 17, 2023 at 02:38:46PM -0400, Gabriel Krisman Bertazi wrote:
>> Breno Leitao <[email protected]> writes:
>>
>> > +#if defined(CONFIG_NET)
>> > int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> > {
>> > struct socket *sock = cmd->file->private_data;
>> > @@ -189,8 +219,16 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> > if (ret)
>> > return ret;
>> > return arg;
>> > + case SOCKET_URING_OP_GETSOCKOPT:
>> > + return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
>> > default:
>> > return -EOPNOTSUPP;
>> > }
>> > }
>> > +#else
>> > +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>> > +{
>> > + return -EOPNOTSUPP;
>> > +}
>> > +#endif
>> > EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
>>
>> The CONFIG_NET guards are unrelated and need to go in a separate commit.
>> Specially because it is not only gating getsockopt, but also the already
>> merged SOCKET_URING_OP_SIOCINQ/_OP_SIOCOUTQ.
>
> Well, so far, if CONFIG_NET is disable, and you call
> io_uring_cmd_getsockopt, the callbacks will be called and returned
> -EOPNOTSUPP.
I'm not talking about io_uring_cmd_getsockopt; that would obviously
return -EOPNOTSUPP before as well. But you are changing
io_uring_cmd_sock, which does more things:
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
[...]
if (!prot || !prot->ioctl)
return -EOPNOTSUPP;
switch (cmd->sqe->cmd_op) {
case SOCKET_URING_OP_SIOCINQ:
ret = prot->ioctl(sk, SIOCINQ, &arg);
if (ret)
return ret;
return arg;
case SOCKET_URING_OP_SIOCOUTQ:
ret = prot->ioctl(sk, SIOCOUTQ, &arg);
if (ret)
return ret;
With your patch, these two cmd_op, are now being gated by CONFIG_NET. I
think it makes sense for them to be gated by it, But it should
be in a separated change, or at least explained in your commit message.
> With this new patch, it will eventually call sk_getsockopt which does
> not exist in the CONFIG_NET=n world. That is why I have this
> protection now. I.e, this `#if defined(CONFIG_NET)` is only necessary now,
> since it is the first time this function (io_uring_cmd_sock) will call a
> function that does not exist if CONFIG_NET is disabled.
>
> I can split it in a different patch, if you think it makes a difference.
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 7/9] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (5 preceding siblings ...)
2023-08-17 14:55 ` [PATCH v3 6/9] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 14:55 ` [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd Breno Leitao
2023-08-17 14:55 ` [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support Breno Leitao
8 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau
Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni, krisman
Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt. This implementation leverages the function
do_sock_setsockopt(), which is shared with the setsockopt() system call
path.
Important to say that userspace needs to keep the pointer's memory alive
until the operation is completed. I.e, the memory could not be
deallocated before the CQE is returned to userspace.
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 1 +
io_uring/uring_cmd.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8152151080db..3fe82df06abf 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -736,6 +736,7 @@ enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
SOCKET_URING_OP_GETSOCKOPT,
+ SOCKET_URING_OP_SETSOCKOPT,
};
#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 7b768f1bf4df..a567dd32df00 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -197,6 +197,20 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
return -EOPNOTSUPP;
}
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optname = READ_ONCE(cmd->sqe->optname);
+ sockptr_t optval_s = USER_SOCKPTR(optval);
+ int optlen = READ_ONCE(cmd->sqe->optlen);
+ int level = READ_ONCE(cmd->sqe->level);
+
+ return do_sock_setsockopt(sock, compat, level, optname, optval_s, optlen);
+}
+
#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
@@ -221,6 +235,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return arg;
case SOCKET_URING_OP_GETSOCKOPT:
return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_SETSOCKOPT:
+ return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (6 preceding siblings ...)
2023-08-17 14:55 ` [PATCH v3 7/9] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-17 19:08 ` Gabriel Krisman Bertazi
2023-08-17 14:55 ` [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support Breno Leitao
8 siblings, 1 reply; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau
Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni, krisman
Add BPF hook support for getsockopts io_uring command. So, BPF cgroups
programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed
through io_uring.
This implementation follows a similar approach to what
__sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of
kernel pointer.
Signed-off-by: Breno Leitao <[email protected]>
---
io_uring/uring_cmd.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index a567dd32df00..9e08a14760c3 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -5,6 +5,8 @@
#include <linux/io_uring.h>
#include <linux/security.h>
#include <linux/nospec.h>
+#include <linux/compat.h>
+#include <linux/bpf-cgroup.h>
#include <uapi/linux/io_uring.h>
#include <uapi/asm-generic/ioctls.h>
@@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
if (err)
return err;
- if (level == SOL_SOCKET) {
+ err = -EOPNOTSUPP;
+ if (level == SOL_SOCKET)
err = sk_getsockopt(sock->sk, level, optname,
USER_SOCKPTR(optval),
KERNEL_SOCKPTR(&optlen));
- if (err)
- return err;
+ if (!(issue_flags & IO_URING_F_COMPAT))
+ err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level,
+ optname,
+ USER_SOCKPTR(optval),
+ KERNEL_SOCKPTR(&optlen),
+ optlen, err);
+
+ if (!err)
return optlen;
- }
- return -EOPNOTSUPP;
+ return err;
}
static inline int io_uring_cmd_setsockopt(struct socket *sock,
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-17 14:55 ` [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd Breno Leitao
@ 2023-08-17 19:08 ` Gabriel Krisman Bertazi
2023-08-21 9:14 ` Breno Leitao
2023-08-21 20:25 ` Martin KaFai Lau
0 siblings, 2 replies; 24+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-08-17 19:08 UTC (permalink / raw)
To: Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
Breno Leitao <[email protected]> writes:
> Add BPF hook support for getsockopts io_uring command. So, BPF cgroups
> programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed
> through io_uring.
>
> This implementation follows a similar approach to what
> __sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of
> kernel pointer.
>
> Signed-off-by: Breno Leitao <[email protected]>
> ---
> io_uring/uring_cmd.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index a567dd32df00..9e08a14760c3 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -5,6 +5,8 @@
> #include <linux/io_uring.h>
> #include <linux/security.h>
> #include <linux/nospec.h>
> +#include <linux/compat.h>
> +#include <linux/bpf-cgroup.h>
>
> #include <uapi/linux/io_uring.h>
> #include <uapi/asm-generic/ioctls.h>
> @@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
> if (err)
> return err;
>
> - if (level == SOL_SOCKET) {
> + err = -EOPNOTSUPP;
> + if (level == SOL_SOCKET)
> err = sk_getsockopt(sock->sk, level, optname,
> USER_SOCKPTR(optval),
> KERNEL_SOCKPTR(&optlen));
> - if (err)
> - return err;
>
> + if (!(issue_flags & IO_URING_F_COMPAT))
> + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level,
> + optname,
> + USER_SOCKPTR(optval),
> + KERNEL_SOCKPTR(&optlen),
> + optlen, err);
> +
> + if (!err)
> return optlen;
> - }
Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
running the hook? Before this patch, it would bail out with EOPNOTSUPP,
but now the bpf hook gets called even for level!=SOL_SOCKET, which
doesn't fit __sys_getsockopt. Am I misreading the code?
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-17 19:08 ` Gabriel Krisman Bertazi
@ 2023-08-21 9:14 ` Breno Leitao
2023-08-21 17:03 ` Gabriel Krisman Bertazi
2023-08-22 13:50 ` David Laight
2023-08-21 20:25 ` Martin KaFai Lau
1 sibling, 2 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-21 9:14 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
On Thu, Aug 17, 2023 at 03:08:47PM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
>
> > Add BPF hook support for getsockopts io_uring command. So, BPF cgroups
> > programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed
> > through io_uring.
> >
> > This implementation follows a similar approach to what
> > __sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of
> > kernel pointer.
> >
> > Signed-off-by: Breno Leitao <[email protected]>
> > ---
> > io_uring/uring_cmd.c | 18 +++++++++++++-----
> > 1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > index a567dd32df00..9e08a14760c3 100644
> > --- a/io_uring/uring_cmd.c
> > +++ b/io_uring/uring_cmd.c
> > @@ -5,6 +5,8 @@
> > #include <linux/io_uring.h>
> > #include <linux/security.h>
> > #include <linux/nospec.h>
> > +#include <linux/compat.h>
> > +#include <linux/bpf-cgroup.h>
> >
> > #include <uapi/linux/io_uring.h>
> > #include <uapi/asm-generic/ioctls.h>
> > @@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
> > if (err)
> > return err;
> >
> > - if (level == SOL_SOCKET) {
> > + err = -EOPNOTSUPP;
> > + if (level == SOL_SOCKET)
> > err = sk_getsockopt(sock->sk, level, optname,
> > USER_SOCKPTR(optval),
> > KERNEL_SOCKPTR(&optlen));
> > - if (err)
> > - return err;
> >
> > + if (!(issue_flags & IO_URING_F_COMPAT))
> > + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level,
> > + optname,
> > + USER_SOCKPTR(optval),
> > + KERNEL_SOCKPTR(&optlen),
> > + optlen, err);
> > +
> > + if (!err)
> > return optlen;
> > - }
>
> Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
> running the hook?
> Before this patch, it would bail out with EOPNOTSUPP,
> but now the bpf hook gets called even for level!=SOL_SOCKET, which
> doesn't fit __sys_getsockopt. Am I misreading the code?
Not really, sock->ops->getsockopt() does not suport sockptr_t, but
__user addresses, differently from setsockopt()
int (*setsockopt)(struct socket *sock, int level,
int optname, sockptr_t optval,
unsigned int optlen);
int (*getsockopt)(struct socket *sock, int level,
int optname, char __user *optval, int __user *optlen);
In order to be able to call sock->ops->getsockopt(), the callback
function will need to accepted sockptr.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-21 9:14 ` Breno Leitao
@ 2023-08-21 17:03 ` Gabriel Krisman Bertazi
2023-08-23 13:48 ` Breno Leitao
2023-08-22 13:50 ` David Laight
1 sibling, 1 reply; 24+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-08-21 17:03 UTC (permalink / raw)
To: Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
Breno Leitao <[email protected]> writes:
> On Thu, Aug 17, 2023 at 03:08:47PM -0400, Gabriel Krisman Bertazi wrote:
>> Breno Leitao <[email protected]> writes:
>>
>> > Add BPF hook support for getsockopts io_uring command. So, BPF cgroups
>> > programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed
>> > through io_uring.
>> >
>> > This implementation follows a similar approach to what
>> > __sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of
>> > kernel pointer.
>> >
>> > Signed-off-by: Breno Leitao <[email protected]>
>> > ---
>> > io_uring/uring_cmd.c | 18 +++++++++++++-----
>> > 1 file changed, 13 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> > index a567dd32df00..9e08a14760c3 100644
>> > --- a/io_uring/uring_cmd.c
>> > +++ b/io_uring/uring_cmd.c
>> > @@ -5,6 +5,8 @@
>> > #include <linux/io_uring.h>
>> > #include <linux/security.h>
>> > #include <linux/nospec.h>
>> > +#include <linux/compat.h>
>> > +#include <linux/bpf-cgroup.h>
>> >
>> > #include <uapi/linux/io_uring.h>
>> > #include <uapi/asm-generic/ioctls.h>
>> > @@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
>> > if (err)
>> > return err;
>> >
>> > - if (level == SOL_SOCKET) {
>> > + err = -EOPNOTSUPP;
>> > + if (level == SOL_SOCKET)
>> > err = sk_getsockopt(sock->sk, level, optname,
>> > USER_SOCKPTR(optval),
>> > KERNEL_SOCKPTR(&optlen));
>> > - if (err)
>> > - return err;
>> >
>> > + if (!(issue_flags & IO_URING_F_COMPAT))
>> > + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level,
>> > + optname,
>> > + USER_SOCKPTR(optval),
>> > + KERNEL_SOCKPTR(&optlen),
>> > + optlen, err);
>> > +
>> > + if (!err)
>> > return optlen;
>> > - }
>>
>> Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
>> running the hook?
>> Before this patch, it would bail out with EOPNOTSUPP,
>> but now the bpf hook gets called even for level!=SOL_SOCKET, which
>> doesn't fit __sys_getsockopt. Am I misreading the code?
>
> Not really, sock->ops->getsockopt() does not suport sockptr_t, but
> __user addresses, differently from setsockopt()
>
> int (*setsockopt)(struct socket *sock, int level,
> int optname, sockptr_t optval,
> unsigned int optlen);
> int (*getsockopt)(struct socket *sock, int level,
> int optname, char __user *optval, int __user *optlen);
>
> In order to be able to call sock->ops->getsockopt(), the callback
> function will need to accepted sockptr.
So, it seems you won't support !SOL_SOCKETs here. Then, I think you
shouldn't call the hook for those sockets. My main concern is that we
remain compatible to __sys_getsockopt when invoking the hook.
I think you should just have the following as the very first thing in
the function (but after the security_ check).
if (level != SOL_SOCKET)
return -EOPNOTSUPP;
--
Gabriel Krisman Bertazi
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-21 17:03 ` Gabriel Krisman Bertazi
@ 2023-08-23 13:48 ` Breno Leitao
0 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-23 13:48 UTC (permalink / raw)
To: Gabriel Krisman Bertazi
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
On Mon, Aug 21, 2023 at 01:03:08PM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
>
> > On Thu, Aug 17, 2023 at 03:08:47PM -0400, Gabriel Krisman Bertazi wrote:
> >> Breno Leitao <[email protected]> writes:
> >>
> >> > Add BPF hook support for getsockopts io_uring command. So, BPF cgroups
> >> > programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed
> >> > through io_uring.
> >> >
> >> > This implementation follows a similar approach to what
> >> > __sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of
> >> > kernel pointer.
> >> >
> >> > Signed-off-by: Breno Leitao <[email protected]>
> >> > ---
> >> > io_uring/uring_cmd.c | 18 +++++++++++++-----
> >> > 1 file changed, 13 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> >> > index a567dd32df00..9e08a14760c3 100644
> >> > --- a/io_uring/uring_cmd.c
> >> > +++ b/io_uring/uring_cmd.c
> >> > @@ -5,6 +5,8 @@
> >> > #include <linux/io_uring.h>
> >> > #include <linux/security.h>
> >> > #include <linux/nospec.h>
> >> > +#include <linux/compat.h>
> >> > +#include <linux/bpf-cgroup.h>
> >> >
> >> > #include <uapi/linux/io_uring.h>
> >> > #include <uapi/asm-generic/ioctls.h>
> >> > @@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
> >> > if (err)
> >> > return err;
> >> >
> >> > - if (level == SOL_SOCKET) {
> >> > + err = -EOPNOTSUPP;
> >> > + if (level == SOL_SOCKET)
> >> > err = sk_getsockopt(sock->sk, level, optname,
> >> > USER_SOCKPTR(optval),
> >> > KERNEL_SOCKPTR(&optlen));
> >> > - if (err)
> >> > - return err;
> >> >
> >> > + if (!(issue_flags & IO_URING_F_COMPAT))
> >> > + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level,
> >> > + optname,
> >> > + USER_SOCKPTR(optval),
> >> > + KERNEL_SOCKPTR(&optlen),
> >> > + optlen, err);
> >> > +
> >> > + if (!err)
> >> > return optlen;
> >> > - }
> >>
> >> Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
> >> running the hook?
> >> Before this patch, it would bail out with EOPNOTSUPP,
> >> but now the bpf hook gets called even for level!=SOL_SOCKET, which
> >> doesn't fit __sys_getsockopt. Am I misreading the code?
> >
> > Not really, sock->ops->getsockopt() does not suport sockptr_t, but
> > __user addresses, differently from setsockopt()
> >
> > int (*setsockopt)(struct socket *sock, int level,
> > int optname, sockptr_t optval,
> > unsigned int optlen);
> > int (*getsockopt)(struct socket *sock, int level,
> > int optname, char __user *optval, int __user *optlen);
> >
> > In order to be able to call sock->ops->getsockopt(), the callback
> > function will need to accepted sockptr.
>
> So, it seems you won't support !SOL_SOCKETs here. Then, I think you
> shouldn't call the hook for those sockets. My main concern is that we
> remain compatible to __sys_getsockopt when invoking the hook.
>
> I think you should just have the following as the very first thing in
> the function (but after the security_ check).
>
> if (level != SOL_SOCKET)
> return -EOPNOTSUPP;
Gotcha. I will update. Thanks!
^ permalink raw reply [flat|nested] 24+ messages in thread
* RE: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-21 9:14 ` Breno Leitao
2023-08-21 17:03 ` Gabriel Krisman Bertazi
@ 2023-08-22 13:50 ` David Laight
1 sibling, 0 replies; 24+ messages in thread
From: David Laight @ 2023-08-22 13:50 UTC (permalink / raw)
To: 'Breno Leitao', Gabriel Krisman Bertazi
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected]
...
> Not really, sock->ops->getsockopt() does not suport sockptr_t, but
> __user addresses, differently from setsockopt()
>
...
> int (*getsockopt)(struct socket *sock, int level,
> int optname, char __user *optval, int __user *optlen);
>
> In order to be able to call sock->ops->getsockopt(), the callback
> function will need to accepted sockptr.
It is also worth looking at whether 'optlen' can be passed in
as a numeric parameter and then returned on success.
That would move the user copy into the syscall wrapper.
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-17 19:08 ` Gabriel Krisman Bertazi
2023-08-21 9:14 ` Breno Leitao
@ 2023-08-21 20:25 ` Martin KaFai Lau
2023-08-25 16:53 ` Breno Leitao
1 sibling, 1 reply; 24+ messages in thread
From: Martin KaFai Lau @ 2023-08-21 20:25 UTC (permalink / raw)
To: Gabriel Krisman Bertazi, Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, bpf,
linux-kernel, netdev, io-uring, kuba, pabeni
On 8/17/23 12:08 PM, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
>
>> Add BPF hook support for getsockopts io_uring command. So, BPF cgroups
>> programs can run when SOCKET_URING_OP_GETSOCKOPT command is executed
>> through io_uring.
>>
>> This implementation follows a similar approach to what
>> __sys_getsockopt() does, but, using USER_SOCKPTR() for optval instead of
>> kernel pointer.
>>
>> Signed-off-by: Breno Leitao <[email protected]>
>> ---
>> io_uring/uring_cmd.c | 18 +++++++++++++-----
>> 1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index a567dd32df00..9e08a14760c3 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -5,6 +5,8 @@
>> #include <linux/io_uring.h>
>> #include <linux/security.h>
>> #include <linux/nospec.h>
>> +#include <linux/compat.h>
>> +#include <linux/bpf-cgroup.h>
>>
>> #include <uapi/linux/io_uring.h>
>> #include <uapi/asm-generic/ioctls.h>
>> @@ -184,17 +186,23 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
>> if (err)
>> return err;
>>
>> - if (level == SOL_SOCKET) {
>> + err = -EOPNOTSUPP;
>> + if (level == SOL_SOCKET)
>> err = sk_getsockopt(sock->sk, level, optname,
>> USER_SOCKPTR(optval),
>> KERNEL_SOCKPTR(&optlen));
>> - if (err)
>> - return err;
>>
>> + if (!(issue_flags & IO_URING_F_COMPAT))
>> + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level,
>> + optname,
>> + USER_SOCKPTR(optval),
>> + KERNEL_SOCKPTR(&optlen),
>> + optlen, err);
>> +
>> + if (!err)
>> return optlen;
>> - }
>
> Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
> running the hook? Before this patch, it would bail out with EOPNOTSUPP,
> but now the bpf hook gets called even for level!=SOL_SOCKET, which
> doesn't fit __sys_getsockopt. Am I misreading the code?
I agree it should not call into bpf if the io_uring cannot support non
SOL_SOCKET optnames. Otherwise, the bpf prog will get different optval and
optlen when running in _sys_getsockopt vs io_uring getsockopt (e.g. in regular
_sys_getsockopt(SOL_TCP), bpf expects the optval returned from tcp_getsockopt).
I think __sys_getsockopt can also be refactored similar to __sys_setsockopt in
patch 3. Yes, for non SOL_SOCKET it only supports __user *optval and __user
*optlen but may be a WARN_ON_ONCE/BUG_ON(sockpt_is_kernel(optval)) can be added
before calling ops->getsockopt()? Then this details can be hidden away from the
io_uring.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-21 20:25 ` Martin KaFai Lau
@ 2023-08-25 16:53 ` Breno Leitao
2023-08-26 0:45 ` Martin KaFai Lau
0 siblings, 1 reply; 24+ messages in thread
From: Breno Leitao @ 2023-08-25 16:53 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Gabriel Krisman Bertazi, sdf, axboe, asml.silence,
willemdebruijn.kernel, bpf, linux-kernel, netdev, io-uring, kuba,
pabeni
On Mon, Aug 21, 2023 at 01:25:25PM -0700, Martin KaFai Lau wrote:
> On 8/17/23 12:08 PM, Gabriel Krisman Bertazi wrote:
> > Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
> > running the hook? Before this patch, it would bail out with EOPNOTSUPP,
> > but now the bpf hook gets called even for level!=SOL_SOCKET, which
> > doesn't fit __sys_getsockopt. Am I misreading the code?
> I agree it should not call into bpf if the io_uring cannot support non
> SOL_SOCKET optnames. Otherwise, the bpf prog will get different optval and
> optlen when running in _sys_getsockopt vs io_uring getsockopt (e.g. in
> regular _sys_getsockopt(SOL_TCP), bpf expects the optval returned from
> tcp_getsockopt).
>
> I think __sys_getsockopt can also be refactored similar to __sys_setsockopt
> in patch 3. Yes, for non SOL_SOCKET it only supports __user *optval and
> __user *optlen but may be a WARN_ON_ONCE/BUG_ON(sockpt_is_kernel(optval))
> can be added before calling ops->getsockopt()? Then this details can be
> hidden away from the io_uring.
Right, I've spent some time thinking about it, and this could be done.
This is a draft I have. Is it what you had in mind?
--
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 5e3419eb267a..e39743f4ce5e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -378,7 +378,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
({ \
int __ret = 0; \
if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT)) \
- get_user(__ret, optlen); \
+ copy_from_sockptr(&__ret, optlen, sizeof(int)); \
__ret; \
})
diff --git a/include/net/sock.h b/include/net/sock.h
index 2a0324275347..24ea1719fd02 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1855,6 +1855,8 @@ int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
int do_sock_setsockopt(struct socket *sock, bool compat, int level,
int optname, sockptr_t optval, int optlen);
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, sockptr_t optlen);
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
diff --git a/net/core/sock.c b/net/core/sock.c
index 9370fd50aa2c..2a5f30f14f5c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1997,14 +1997,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
return 0;
}
-int sock_getsockopt(struct socket *sock, int level, int optname,
- char __user *optval, int __user *optlen)
-{
- return sk_getsockopt(sock->sk, level, optname,
- USER_SOCKPTR(optval),
- USER_SOCKPTR(optlen));
-}
-
/*
* Initialize an sk_lock.
*
diff --git a/net/socket.c b/net/socket.c
index b5e4398a6b4d..f0d6b6b1f75e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2290,6 +2290,40 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int optname));
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, sockptr_t optlen)
+{
+ int max_optlen __maybe_unused;
+ int err;
+
+ err = security_socket_getsockopt(sock, level, optname);
+ if (err)
+ return err;
+
+ if (level == SOL_SOCKET) {
+ err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
+ } else if (unlikely(!sock->ops->getsockopt)) {
+ err = -EOPNOTSUPP;
+ } else {
+ if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
+ "Invalid argument type"))
+ return -EOPNOTSUPP;
+
+ err = sock->ops->getsockopt(sock, level, optname, optval.user,
+ optlen.user);
+ }
+
+ if (!compat) {
+ max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
+ optval, optlen, max_optlen,
+ err);
+ }
+
+ return err;
+}
+EXPORT_SYMBOL(do_sock_getsockopt);
+
/*
* Get a socket option. Because we don't know the option lengths we have
* to pass a user mode parameter for the protocols to sort out.
@@ -2297,35 +2331,17 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
int __user *optlen)
{
- int max_optlen __maybe_unused;
int err, fput_needed;
+ bool compat = in_compat_syscall();
struct socket *sock;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (!sock)
return err;
- err = security_socket_getsockopt(sock, level, optname);
- if (err)
- goto out_put;
-
- if (!in_compat_syscall())
- max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
-
- if (level == SOL_SOCKET)
- err = sock_getsockopt(sock, level, optname, optval, optlen);
- else if (unlikely(!sock->ops->getsockopt))
- err = -EOPNOTSUPP;
- else
- err = sock->ops->getsockopt(sock, level, optname, optval,
- optlen);
+ err = do_sock_getsockopt(sock, compat, level, optname,
+ USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
- if (!in_compat_syscall())
- err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
- USER_SOCKPTR(optval),
- USER_SOCKPTR(optlen),
- max_optlen, err);
-out_put:
fput_light(sock->file, fput_needed);
return err;
}
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd
2023-08-25 16:53 ` Breno Leitao
@ 2023-08-26 0:45 ` Martin KaFai Lau
0 siblings, 0 replies; 24+ messages in thread
From: Martin KaFai Lau @ 2023-08-26 0:45 UTC (permalink / raw)
To: Breno Leitao
Cc: Gabriel Krisman Bertazi, sdf, axboe, asml.silence,
willemdebruijn.kernel, bpf, linux-kernel, netdev, io-uring, kuba,
pabeni
On 8/25/23 9:53 AM, Breno Leitao wrote:
> On Mon, Aug 21, 2023 at 01:25:25PM -0700, Martin KaFai Lau wrote:
>> On 8/17/23 12:08 PM, Gabriel Krisman Bertazi wrote:
>>> Shouldn't you call sock->ops->getsockopt for level!=SOL_SOCKET prior to
>>> running the hook? Before this patch, it would bail out with EOPNOTSUPP,
>>> but now the bpf hook gets called even for level!=SOL_SOCKET, which
>>> doesn't fit __sys_getsockopt. Am I misreading the code?
>> I agree it should not call into bpf if the io_uring cannot support non
>> SOL_SOCKET optnames. Otherwise, the bpf prog will get different optval and
>> optlen when running in _sys_getsockopt vs io_uring getsockopt (e.g. in
>> regular _sys_getsockopt(SOL_TCP), bpf expects the optval returned from
>> tcp_getsockopt).
>>
>> I think __sys_getsockopt can also be refactored similar to __sys_setsockopt
>> in patch 3. Yes, for non SOL_SOCKET it only supports __user *optval and
>> __user *optlen but may be a WARN_ON_ONCE/BUG_ON(sockpt_is_kernel(optval))
>> can be added before calling ops->getsockopt()? Then this details can be
>> hidden away from the io_uring.
>
>
> Right, I've spent some time thinking about it, and this could be done.
> This is a draft I have. Is it what you had in mind?
Yes. lgtm. Thanks.
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support
2023-08-17 14:55 [PATCH v3 0/9] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (7 preceding siblings ...)
2023-08-17 14:55 ` [PATCH v3 8/9] io_uring/cmd: BPF hook for getsockopt cmd Breno Leitao
@ 2023-08-17 14:55 ` Breno Leitao
2023-08-21 20:59 ` Martin KaFai Lau
8 siblings, 1 reply; 24+ messages in thread
From: Breno Leitao @ 2023-08-17 14:55 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
Andrii Nakryiko, Mykola Lysenko, Alexei Starovoitov,
Daniel Borkmann, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Hao Luo, Jiri Olsa, Shuah Khan
Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni, krisman,
Wang Yufen, Daniel Müller,
open list:KERNEL SELFTEST FRAMEWORK
Expand the sockopt test to use also check for io_uring {g,s}etsockopt
commands operations.
This patch starts by marking marks each test if they support io_uring
support or not. Right now, io_uring cmd getsockopt() has a limitation of
only accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP.
Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
liburing is supported, execute the same test (again), but calling
liburing {g,s}setsockopt commands.
For that, leverage the mini liburing to do basic io_uring operations.
Signed-off-by: Breno Leitao <[email protected]>
---
tools/testing/selftests/bpf/Makefile | 1 +
.../selftests/bpf/prog_tests/sockopt.c | 129 +++++++++++++++++-
2 files changed, 124 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 538df8fb8c42..4da04242b848 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -362,6 +362,7 @@ CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
$(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
+$(OUTPUT)/test_progs.o: CFLAGS += -I../../../include/
$(OUTPUT)/flow_dissector_load.o: flow_dissector_load.h
$(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c
index 9e6a5e3ed4de..4693ad8bfe8f 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
+#include <io_uring/mini_liburing.h>
#include "cgroup_helpers.h"
static char bpf_log_buf[4096];
@@ -38,6 +39,7 @@ static struct sockopt_test {
socklen_t get_optlen_ret;
enum sockopt_test_error error;
+ bool io_uring_support;
} tests[] = {
/* ==================== getsockopt ==================== */
@@ -53,6 +55,7 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_GETSOCKOPT,
.expected_attach_type = 0,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: wrong expected_attach_type",
@@ -65,6 +68,7 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_GETSOCKOPT,
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.error = DENY_ATTACH,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: bypass bpf hook",
@@ -121,6 +125,7 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_GETSOCKOPT,
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: read ctx->level",
@@ -164,6 +169,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: read ctx->optname",
@@ -225,6 +231,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: read ctx->optlen",
@@ -252,6 +259,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.get_optlen = 64,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: deny bigger ctx->optlen",
@@ -276,6 +284,7 @@ static struct sockopt_test {
.get_optlen = 64,
.error = EFAULT_GETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: ignore >PAGE_SIZE optlen",
@@ -318,6 +327,7 @@ static struct sockopt_test {
.get_optval = {}, /* the changes are ignored */
.get_optlen = PAGE_SIZE + 1,
.error = EOPNOTSUPP_GETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: support smaller ctx->optlen",
@@ -339,6 +349,7 @@ static struct sockopt_test {
.get_optlen = 64,
.get_optlen_ret = 32,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: deny writing to ctx->optval",
@@ -353,6 +364,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: deny writing to ctx->optval_end",
@@ -367,6 +379,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: rewrite value",
@@ -421,6 +434,7 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_SETSOCKOPT,
.expected_attach_type = 0,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: wrong expected_attach_type",
@@ -433,6 +447,7 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_SETSOCKOPT,
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
.error = DENY_ATTACH,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: bypass bpf hook",
@@ -518,6 +533,7 @@ static struct sockopt_test {
.set_level = 123,
.set_optlen = 1,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: allow changing ctx->level",
@@ -572,6 +588,7 @@ static struct sockopt_test {
.set_optname = 123,
.set_optlen = 1,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: allow changing ctx->optname",
@@ -624,6 +641,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.set_optlen = 64,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: ctx->optlen == -1 is ok",
@@ -640,6 +658,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.set_optlen = 64,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny ctx->optlen < 0 (except -1)",
@@ -658,6 +677,7 @@ static struct sockopt_test {
.set_optlen = 4,
.error = EFAULT_SETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny ctx->optlen > input optlen",
@@ -675,6 +695,7 @@ static struct sockopt_test {
.set_optlen = 64,
.error = EFAULT_SETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: ignore >PAGE_SIZE optlen",
@@ -775,6 +796,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny read ctx->retval",
@@ -791,6 +813,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny writing to ctx->optval",
@@ -805,6 +828,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny writing to ctx->optval_end",
@@ -819,6 +843,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.error = DENY_LOAD,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: allow IP_TOS <= 128",
@@ -940,7 +965,94 @@ static int load_prog(const struct bpf_insn *insns,
return fd;
}
-static int run_test(int cgroup_fd, struct sockopt_test *test)
+/* Core function that handles io_uring ring initialization,
+ * sending SQE with sockopt command and waiting for the CQE.
+ */
+static int uring_sockopt(int op, int fd, int level, int optname,
+ const void *optval, socklen_t optlen)
+{
+ struct io_uring_cqe *cqe;
+ struct io_uring_sqe *sqe;
+ struct io_uring ring;
+ int err;
+
+ err = io_uring_queue_init(1, &ring, 0);
+ if (err) {
+ log_err("Failed to initialize io_uring ring");
+ return err;
+ }
+
+ sqe = io_uring_get_sqe(&ring);
+ if (!sqe) {
+ log_err("Failed to get an SQE");
+ return -1;
+ }
+
+ io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen);
+
+ err = io_uring_submit(&ring);
+ if (err != 1) {
+ log_err("Failed to submit SQE");
+ return err;
+ }
+
+ err = io_uring_wait_cqe(&ring, &cqe);
+ if (err) {
+ log_err("Failed to wait for CQE");
+ return err;
+ }
+
+ err = cqe->res;
+
+ io_uring_queue_exit(&ring);
+
+ return err;
+}
+
+static int uring_setsockopt(int fd, int level, int optname, const void *optval,
+ socklen_t optlen)
+{
+ return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname,
+ optval, optlen);
+}
+
+static int uring_getsockopt(int fd, int level, int optname, void *optval,
+ socklen_t *optlen)
+{
+ int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname,
+ optval, *optlen);
+ if (ret < 0)
+ return ret;
+
+ /* Populate optlen back to be compatible with systemcall interface,
+ * and simplify the test.
+ */
+ *optlen = ret;
+
+ return 0;
+}
+
+/* Execute the setsocktopt operation */
+static int call_setsockopt(bool use_io_uring, int fd, int level, int optname,
+ const void *optval, socklen_t optlen)
+{
+ if (use_io_uring)
+ return uring_setsockopt(fd, level, optname, optval, optlen);
+
+ return setsockopt(fd, level, optname, optval, optlen);
+}
+
+/* Execute the getsocktopt operation */
+static int call_getsockopt(bool use_io_uring, int fd, int level, int optname,
+ void *optval, socklen_t *optlen)
+{
+ if (use_io_uring)
+ return uring_getsockopt(fd, level, optname, optval, optlen);
+
+ return getsockopt(fd, level, optname, optval, optlen);
+}
+
+static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring)
{
int sock_fd, err, prog_fd;
void *optval = NULL;
@@ -980,8 +1092,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder;
}
- err = setsockopt(sock_fd, test->set_level, test->set_optname,
- test->set_optval, test->set_optlen);
+ err = call_setsockopt(use_io_uring, sock_fd, test->set_level,
+ test->set_optname, test->set_optval,
+ test->set_optlen);
if (err) {
if (errno == EPERM && test->error == EPERM_SETSOCKOPT)
goto close_sock_fd;
@@ -1008,8 +1121,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
socklen_t expected_get_optlen = test->get_optlen_ret ?:
test->get_optlen;
- err = getsockopt(sock_fd, test->get_level, test->get_optname,
- optval, &optlen);
+ err = call_getsockopt(use_io_uring, sock_fd, test->get_level,
+ test->get_optname, optval, &optlen);
if (err) {
if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT)
goto free_optval;
@@ -1063,7 +1176,11 @@ void test_sockopt(void)
if (!test__start_subtest(tests[i].descr))
continue;
- ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr);
+ ASSERT_OK(run_test(cgroup_fd, &tests[i], false),
+ tests[i].descr);
+ if (tests[i].io_uring_support)
+ ASSERT_OK(run_test(cgroup_fd, &tests[i], true),
+ tests[i].descr);
}
close(cgroup_fd);
--
2.34.1
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support
2023-08-17 14:55 ` [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support Breno Leitao
@ 2023-08-21 20:59 ` Martin KaFai Lau
2023-08-25 14:15 ` Breno Leitao
0 siblings, 1 reply; 24+ messages in thread
From: Martin KaFai Lau @ 2023-08-21 20:59 UTC (permalink / raw)
To: Breno Leitao, Alexei Starovoitov, Daniel Borkmann
Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni, krisman,
Wang Yufen, Daniel Müller,
open list:KERNEL SELFTEST FRAMEWORK, sdf, axboe, asml.silence,
willemdebruijn.kernel, Andrii Nakryiko, Mykola Lysenko,
Alexei Starovoitov, Daniel Borkmann, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan
On 8/17/23 7:55 AM, Breno Leitao wrote:
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 538df8fb8c42..4da04242b848 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -362,6 +362,7 @@ CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
>
> $(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
> $(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
> +$(OUTPUT)/test_progs.o: CFLAGS += -I../../../include/
This is the tools/include? Is it really needed? iirc, some of the prog_tests/*.c
has already been using files from tools/include.
>
> $(OUTPUT)/flow_dissector_load.o: flow_dissector_load.h
> $(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c
> index 9e6a5e3ed4de..4693ad8bfe8f 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockopt.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c
> @@ -1,5 +1,6 @@
> // SPDX-License-Identifier: GPL-2.0
> #include <test_progs.h>
> +#include <io_uring/mini_liburing.h>
> #include "cgroup_helpers.h"
>
> static char bpf_log_buf[4096];
> @@ -38,6 +39,7 @@ static struct sockopt_test {
> socklen_t get_optlen_ret;
>
> enum sockopt_test_error error;
> + bool io_uring_support;
> } tests[] = {
>
> /* ==================== getsockopt ==================== */
> @@ -53,6 +55,7 @@ static struct sockopt_test {
> .attach_type = BPF_CGROUP_GETSOCKOPT,
> .expected_attach_type = 0,
> .error = DENY_LOAD,
> + .io_uring_support = true,
DENY_LOAD probably won't be an intersting test. The set/getsockopt won't be called.
The existing test does not seem to have SOL_SOCKET for getsockopt also.
> -static int run_test(int cgroup_fd, struct sockopt_test *test)
> +/* Core function that handles io_uring ring initialization,
> + * sending SQE with sockopt command and waiting for the CQE.
> + */
> +static int uring_sockopt(int op, int fd, int level, int optname,
> + const void *optval, socklen_t optlen)
> +{
> + struct io_uring_cqe *cqe;
> + struct io_uring_sqe *sqe;
> + struct io_uring ring;
> + int err;
> +
> + err = io_uring_queue_init(1, &ring, 0);
> + if (err) {
> + log_err("Failed to initialize io_uring ring");
> + return err;
> + }
> +
> + sqe = io_uring_get_sqe(&ring);
> + if (!sqe) {
> + log_err("Failed to get an SQE");
> + return -1;
No need to io_uring_queue_exit() on the error path?
> + }
> +
> + io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen);
> +
> + err = io_uring_submit(&ring);
> + if (err != 1) {
> + log_err("Failed to submit SQE");
Use ASSERT_* instead.
Regarding how to land this set,
it will be useful to have the selftest running in the bpf CI. While there is
iouring changes, some of the changes is in bpf and/or netdev also. eg. Patch 3
already has a conflict with the net-next and bpf-next tree because of a recent
commit in socket.c on Aug 9.
May be Alexi and Daniel can advise how was similar patch managed before ?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH v3 9/9] selftests/bpf/sockopt: Add io_uring support
2023-08-21 20:59 ` Martin KaFai Lau
@ 2023-08-25 14:15 ` Breno Leitao
0 siblings, 0 replies; 24+ messages in thread
From: Breno Leitao @ 2023-08-25 14:15 UTC (permalink / raw)
To: Martin KaFai Lau
Cc: Alexei Starovoitov, Daniel Borkmann, bpf, linux-kernel, netdev,
io-uring, kuba, pabeni, krisman, Wang Yufen, Daniel Müller,
open list:KERNEL SELFTEST FRAMEWORK, sdf, axboe, asml.silence,
willemdebruijn.kernel, Andrii Nakryiko, Mykola Lysenko, Song Liu,
Yonghong Song, John Fastabend, KP Singh, Hao Luo, Jiri Olsa,
Shuah Khan
On Mon, Aug 21, 2023 at 01:59:12PM -0700, Martin KaFai Lau wrote:
> On 8/17/23 7:55 AM, Breno Leitao wrote:
> > diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> > index 538df8fb8c42..4da04242b848 100644
> > --- a/tools/testing/selftests/bpf/Makefile
> > +++ b/tools/testing/selftests/bpf/Makefile
> > @@ -362,6 +362,7 @@ CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
> > $(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
> > $(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
> > +$(OUTPUT)/test_progs.o: CFLAGS += -I../../../include/
>
> This is the tools/include? Is it really needed? iirc, some of the
> prog_tests/*.c has already been using files from tools/include.
You are right, we don't need it.
> > $(OUTPUT)/flow_dissector_load.o: flow_dissector_load.h
> > $(OUTPUT)/cgroup_getset_retval_hooks.o: cgroup_getset_retval_hooks.h
> > diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c
> > index 9e6a5e3ed4de..4693ad8bfe8f 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/sockopt.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c
> > @@ -1,5 +1,6 @@
> > // SPDX-License-Identifier: GPL-2.0
> > #include <test_progs.h>
> > +#include <io_uring/mini_liburing.h>
> > #include "cgroup_helpers.h"
> > static char bpf_log_buf[4096];
> > @@ -38,6 +39,7 @@ static struct sockopt_test {
> > socklen_t get_optlen_ret;
> > enum sockopt_test_error error;
> > + bool io_uring_support;
> > } tests[] = {
> > /* ==================== getsockopt ==================== */
> > @@ -53,6 +55,7 @@ static struct sockopt_test {
> > .attach_type = BPF_CGROUP_GETSOCKOPT,
> > .expected_attach_type = 0,
> > .error = DENY_LOAD,
> > + .io_uring_support = true,
>
> DENY_LOAD probably won't be an intersting test. The set/getsockopt won't be called.
Yea, I will remove all the DENY_LOAD and DENY_ATTACH tests.
> The existing test does not seem to have SOL_SOCKET for getsockopt also.
I am planning to move two tests to use SOL_SOCKET so we can also
exercise the io_uring tests. This is what I have in mind right now:
* getsockopt: read ctx->optlen
* getsockopt: support smaller ctx->optlen
> > -static int run_test(int cgroup_fd, struct sockopt_test *test)
> > +/* Core function that handles io_uring ring initialization,
> > + * sending SQE with sockopt command and waiting for the CQE.
> > + */
> > +static int uring_sockopt(int op, int fd, int level, int optname,
> > + const void *optval, socklen_t optlen)
> > +{
> > + struct io_uring_cqe *cqe;
> > + struct io_uring_sqe *sqe;
> > + struct io_uring ring;
> > + int err;
> > +
> > + err = io_uring_queue_init(1, &ring, 0);
> > + if (err) {
> > + log_err("Failed to initialize io_uring ring");
> > + return err;
> > + }
> > +
> > + sqe = io_uring_get_sqe(&ring);
> > + if (!sqe) {
> > + log_err("Failed to get an SQE");
> > + return -1;
>
> No need to io_uring_queue_exit() on the error path?
Good idea. updating it.
> > + }
> > +
> > + io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen);
> > +
> > + err = io_uring_submit(&ring);
> > + if (err != 1) {
> > + log_err("Failed to submit SQE");
>
> Use ASSERT_* instead.
>
> Regarding how to land this set,
> it will be useful to have the selftest running in the bpf CI. While there is
> iouring changes, some of the changes is in bpf and/or netdev also. eg. Patch
> 3 already has a conflict with the net-next and bpf-next tree because of a
> recent commit in socket.c on Aug 9.
>
> May be Alexi and Daniel can advise how was similar patch managed before ?
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread