* [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
@ 2023-10-16 13:47 Breno Leitao
2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
` (12 more replies)
0 siblings, 13 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring
This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
SOL_SOCKET level, which seems to be the most common level parameter for
get/setsockopt(2).
In order to keep the implementation (and tests) simple, some refactors
were done prior to the changes, as follows:
Patches 1-2: Make BPF cgroup filters sockptr aware
Patches 3-4: Remove the core {s,g}etsockopt() core function from
__sys_{g,s}etsockopt, so, the code could be reused by other callers, such as
io_uring.
Patch 5: Pass compat mode to the file/socket callbacks
Patch 6-7: Move io_uring helpers from io_uring_zerocopy_tx to a generic
io_uring headers. This simplify the test case (last patch). Also copy the
io_uring UAPI to the tests directory.
Patch 8: Protect io_uring_cmd_sock() to not be called if CONFIG_NET is
disabled.
These changes were tested with a new test[1] in liburing, LTP sockopt*
tests, as also with bpf/progs/sockopt test case, which is now adapted to
run using both system calls and io_uring commands.
[1] Link: https://github.com/leitao/liburing/blob/getsock/test/socket-getsetsock-cmd.c
RFC -> V1:
* Copy user memory at io_uring subsystem, and call proto_ops
callbacks using kernel memory
* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT
V1 -> V2
* Implemented the BPF part
* Using user pointers from optval to avoid kmalloc in io_uring part.
V2 -> V3:
* Break down __sys_setsockopt and reuse the core code, avoiding
duplicated code. This removed the requirement to expose
sock_use_custom_sol_socket().
* Added io_uring test to selftests/bpf/sockopt.
* Fixed compat argument, by passing it to the issue_flags.
V3 -> V4:
* Rebase on top of commit 1ded5e5a5931b ("net: annotate data-races around sock->ops")
* Also broke down __sys_setsockopt() to reuse the core function
from io_uring.
* Create a new patch to return -EOPNOTSUPP if CONFIG_NET is
disabled.
* Added two SOL_SOCKET tests in bpf/prog_tests/sockopt.
V4 -> V5:
* Do not use sockptr anymore, by changing the optlen getsock argument
to be a user pointer (instead of a kernel pointer). This change also drop
the limitation on getsockopt from previous versions, and now all
levels are supported.
* Simplified the BPF sockopt test, since there is no more limitation on
the io_uring commands.
* No more changes in the BPF subsystem.
* Moved the optlen field in the SQE struct. It is now a pointer instead
of u32.
V5 -> V6:
* Removed the need for #ifdef CONFIG_NET as suggested by Gabriel
Krisman.
* Changed the variable declaration order to respect the reverse
xmas declaration as suggested by Paolo Abeni.
V6 -> V7:
* Changed the optlen back to a value in the SQE instead of
user-pointer. This is similar to version 4.
[https://lore.kernel.org/all/[email protected]/]
* Imported the io_uring.h into tools/include/uapi/linux to be able to
run the tests in machines without liburing.
[https://lore.kernel.org/all/[email protected]/]
Breno Leitao (11):
bpf: Leverage sockptr_t in BPF getsockopt hook
bpf: Leverage sockptr_t in BPF setsockopt hook
net/socket: Break down __sys_setsockopt
net/socket: Break down __sys_getsockopt
io_uring/cmd: Pass compat mode in issue_flags
tools headers: Grab copy of io_uring.h
selftests/net: Extract uring helpers to be reusable
io_uring/cmd: return -EOPNOTSUPP if net is disabled
io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
selftests/bpf/sockopt: Add io_uring support
Breno Leitao (11):
bpf: Add sockptr support for getsockopt
bpf: Add sockptr support for setsockopt
net/socket: Break down __sys_setsockopt
net/socket: Break down __sys_getsockopt
io_uring/cmd: Pass compat mode in issue_flags
tools headers: Grab copy of io_uring.h
selftests/net: Extract uring helpers to be reusable
io_uring/cmd: return -EOPNOTSUPP if net is disabled
io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
selftests/bpf/sockopt: Add io_uring support
include/linux/bpf-cgroup.h | 9 +-
include/linux/io_uring.h | 1 +
include/net/sock.h | 6 +-
include/uapi/linux/io_uring.h | 8 +
io_uring/uring_cmd.c | 53 ++
kernel/bpf/cgroup.c | 25 +-
net/core/sock.c | 8 -
net/socket.c | 103 ++-
tools/include/io_uring/mini_liburing.h | 282 +++++++
tools/include/uapi/linux/io_uring.h | 757 ++++++++++++++++++
.../selftests/bpf/prog_tests/sockopt.c | 113 ++-
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 268 +------
13 files changed, 1300 insertions(+), 334 deletions(-)
create mode 100644 tools/include/io_uring/mini_liburing.h
create mode 100644 tools/include/uapi/linux/io_uring.h
--
2.34.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v7 01/11] bpf: Add sockptr support for getsockopt
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-19 19:24 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
` (11 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet
Cc: bpf, linux-kernel, netdev, io-uring
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in getsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 5 +++--
kernel/bpf/cgroup.c | 20 +++++++++++---------
net/socket.c | 5 +++--
3 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 98b8cea904fe..7b55844f6ba7 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -145,9 +145,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
int *optname, char __user *optval,
int *optlen, char **kernel_optval);
+
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
- int optname, char __user *optval,
- int __user *optlen, int max_optlen,
+ int optname, sockptr_t optval,
+ sockptr_t optlen, int max_optlen,
int retval);
int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 74ad2215e1ba..97745f67ac15 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1890,8 +1890,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
}
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
- int optname, char __user *optval,
- int __user *optlen, int max_optlen,
+ int optname, sockptr_t optval,
+ sockptr_t optlen, int max_optlen,
int retval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1918,8 +1918,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
* one that kernel returned as well to let
* BPF programs inspect the value.
*/
-
- if (get_user(ctx.optlen, optlen)) {
+ if (copy_from_sockptr(&ctx.optlen, optlen,
+ sizeof(ctx.optlen))) {
ret = -EFAULT;
goto out;
}
@@ -1930,8 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
}
orig_optlen = ctx.optlen;
- if (copy_from_user(ctx.optval, optval,
- min(ctx.optlen, max_optlen)) != 0) {
+ if (copy_from_sockptr(ctx.optval, optval,
+ min(ctx.optlen, max_optlen))) {
ret = -EFAULT;
goto out;
}
@@ -1945,7 +1945,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
if (ret < 0)
goto out;
- if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
+ if (!sockptr_is_null(optval) &&
+ (ctx.optlen > max_optlen || ctx.optlen < 0)) {
if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
ctx.optlen, max_optlen);
@@ -1957,11 +1958,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
}
if (ctx.optlen != 0) {
- if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
+ if (!sockptr_is_null(optval) &&
+ copy_to_sockptr(optval, ctx.optval, ctx.optlen)) {
ret = -EFAULT;
goto out;
}
- if (put_user(ctx.optlen, optlen)) {
+ if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) {
ret = -EFAULT;
goto out;
}
diff --git a/net/socket.c b/net/socket.c
index 5740475e084c..6b47dd499218 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2373,8 +2373,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
if (!in_compat_syscall())
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
- optval, optlen, max_optlen,
- err);
+ USER_SOCKPTR(optval),
+ USER_SOCKPTR(optlen),
+ max_optlen, err);
out_put:
fput_light(sock->file, fput_needed);
return err;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 02/11] bpf: Add sockptr support for setsockopt
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-19 19:32 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
` (10 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet
Cc: bpf, linux-kernel, netdev, io-uring
The whole network stack uses sockptr, and while it doesn't move to
something more modern, let's use sockptr in setsockptr BPF hooks, so, it
could be used by other callers.
The main motivation for this change is to use it in the io_uring
{g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
kernel value for optlen.
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 2 +-
kernel/bpf/cgroup.c | 5 +++--
net/socket.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 7b55844f6ba7..2912dce9144e 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -143,7 +143,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
enum cgroup_bpf_attach_type atype);
int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
- int *optname, char __user *optval,
+ int *optname, sockptr_t optval,
int *optlen, char **kernel_optval);
int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 97745f67ac15..491d20038cbe 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1800,7 +1800,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
}
int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
- int *optname, char __user *optval,
+ int *optname, sockptr_t optval,
int *optlen, char **kernel_optval)
{
struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1823,7 +1823,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
ctx.optlen = *optlen;
- if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
+ if (copy_from_sockptr(ctx.optval, optval,
+ min(*optlen, max_optlen))) {
ret = -EFAULT;
goto out;
}
diff --git a/net/socket.c b/net/socket.c
index 6b47dd499218..28d3eb339514 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2305,7 +2305,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
if (!in_compat_syscall())
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
- user_optval, &optlen,
+ optval, &optlen,
&kernel_optval);
if (err < 0)
goto out_put;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 03/11] net/socket: Break down __sys_setsockopt
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-19 0:20 ` Jakub Kicinski
2023-10-19 19:38 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
` (9 subsequent siblings)
12 siblings, 2 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, David S. Miller, Eric Dumazet
Cc: bpf, linux-kernel, netdev, io-uring, Willem de Bruijn
Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.
Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
include/net/sock.h | 2 ++
net/socket.c | 39 +++++++++++++++++++++++++--------------
2 files changed, 27 insertions(+), 14 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 242590308d64..00103e3143c4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1864,6 +1864,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, unsigned int optlen);
int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, int optlen);
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
diff --git a/net/socket.c b/net/socket.c
index 28d3eb339514..0087f8c071e7 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2279,31 +2279,21 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
}
-/*
- * Set a socket option. Because we don't know the option lengths we have
- * to pass the user mode parameter for the protocols to sort out.
- */
-int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
- int optlen)
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, int optlen)
{
- sockptr_t optval = USER_SOCKPTR(user_optval);
const struct proto_ops *ops;
char *kernel_optval = NULL;
- int err, fput_needed;
- struct socket *sock;
+ int err;
if (optlen < 0)
return -EINVAL;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
-
err = security_socket_setsockopt(sock, level, optname);
if (err)
goto out_put;
- if (!in_compat_syscall())
+ if (!compat)
err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
optval, &optlen,
&kernel_optval);
@@ -2326,6 +2316,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
optlen);
kfree(kernel_optval);
out_put:
+ return err;
+}
+EXPORT_SYMBOL(do_sock_setsockopt);
+
+/* Set a socket option. Because we don't know the option lengths we have
+ * to pass the user mode parameter for the protocols to sort out.
+ */
+int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
+ int optlen)
+{
+ sockptr_t optval = USER_SOCKPTR(user_optval);
+ bool compat = in_compat_syscall();
+ int err, fput_needed;
+ struct socket *sock;
+
+ sock = sockfd_lookup_light(fd, &err, &fput_needed);
+ if (!sock)
+ return err;
+
+ err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+
fput_light(sock->file, fput_needed);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (2 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-19 0:20 ` Jakub Kicinski
2023-10-19 19:12 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
` (8 subsequent siblings)
12 siblings, 2 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Song Liu, Yonghong Song,
KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet
Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
Alexander Mikhalitsyn, David Howells
Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.
do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.
The same was done for the setsockopt pair.
Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/bpf-cgroup.h | 2 +-
include/net/sock.h | 4 +--
net/core/sock.c | 8 -----
net/socket.c | 63 ++++++++++++++++++++++++--------------
4 files changed, 43 insertions(+), 34 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2912dce9144e..a789266feac3 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -393,7 +393,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
({ \
int __ret = 0; \
if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT)) \
- get_user(__ret, optlen); \
+ copy_from_sockptr(&__ret, optlen, sizeof(int)); \
__ret; \
})
diff --git a/include/net/sock.h b/include/net/sock.h
index 00103e3143c4..1d6931caf0c3 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1866,11 +1866,11 @@ int sock_setsockopt(struct socket *sock, int level, int op,
sockptr_t optval, unsigned int optlen);
int do_sock_setsockopt(struct socket *sock, bool compat, int level,
int optname, sockptr_t optval, int optlen);
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, sockptr_t optlen);
int sk_getsockopt(struct sock *sk, int level, int optname,
sockptr_t optval, sockptr_t optlen);
-int sock_getsockopt(struct socket *sock, int level, int op,
- char __user *optval, int __user *optlen);
int sock_gettstamp(struct socket *sock, void __user *userstamp,
bool timeval, bool time32);
struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len,
diff --git a/net/core/sock.c b/net/core/sock.c
index 290165954379..d4cb8d6e75b7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2003,14 +2003,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
return 0;
}
-int sock_getsockopt(struct socket *sock, int level, int optname,
- char __user *optval, int __user *optlen)
-{
- return sk_getsockopt(sock->sk, level, optname,
- USER_SOCKPTR(optval),
- USER_SOCKPTR(optlen));
-}
-
/*
* Initialize an sk_lock.
*
diff --git a/net/socket.c b/net/socket.c
index 0087f8c071e7..f4c156a1987e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int optname));
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+ int optname, sockptr_t optval, sockptr_t optlen)
+{
+ int max_optlen __maybe_unused;
+ const struct proto_ops *ops;
+ int err;
+
+ err = security_socket_getsockopt(sock, level, optname);
+ if (err)
+ return err;
+
+ ops = READ_ONCE(sock->ops);
+ if (level == SOL_SOCKET) {
+ err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
+ } else if (unlikely(!ops->getsockopt)) {
+ err = -EOPNOTSUPP;
+ } else {
+ if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
+ "Invalid argument type"))
+ return -EOPNOTSUPP;
+
+ err = ops->getsockopt(sock, level, optname, optval.user,
+ optlen.user);
+ }
+
+ if (!compat) {
+ max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
+ optval, optlen, max_optlen,
+ err);
+ }
+
+ return err;
+}
+EXPORT_SYMBOL(do_sock_getsockopt);
+
/*
* Get a socket option. Because we don't know the option lengths we have
* to pass a user mode parameter for the protocols to sort out.
@@ -2357,37 +2393,18 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
int __user *optlen)
{
- int max_optlen __maybe_unused;
- const struct proto_ops *ops;
int err, fput_needed;
struct socket *sock;
+ bool compat;
sock = sockfd_lookup_light(fd, &err, &fput_needed);
if (!sock)
return err;
- err = security_socket_getsockopt(sock, level, optname);
- if (err)
- goto out_put;
-
- if (!in_compat_syscall())
- max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ compat = in_compat_syscall();
+ err = do_sock_getsockopt(sock, compat, level, optname,
+ USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
- ops = READ_ONCE(sock->ops);
- if (level == SOL_SOCKET)
- err = sock_getsockopt(sock, level, optname, optval, optlen);
- else if (unlikely(!ops->getsockopt))
- err = -EOPNOTSUPP;
- else
- err = ops->getsockopt(sock, level, optname, optval,
- optlen);
-
- if (!in_compat_syscall())
- err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
- USER_SOCKPTR(optval),
- USER_SOCKPTR(optlen),
- max_optlen, err);
-out_put:
fput_light(sock->file, fput_needed);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (3 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
` (7 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring
Create a new flag to track if the operation is running compat mode.
This basically check the context->compat and pass it to the issue_flags,
so, it could be queried later in the callbacks.
Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Gabriel Krisman Bertazi <[email protected]>
---
include/linux/io_uring.h | 1 +
io_uring/uring_cmd.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index b4391e0a9bc8..aefb73eeeebf 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -23,6 +23,7 @@ enum io_uring_cmd_flags {
/* set when uring wants to cancel a previously issued command */
IO_URING_F_CANCEL = (1 << 11),
+ IO_URING_F_COMPAT = (1 << 12),
};
/* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 00a5e5621a28..4bedd633c08c 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -175,6 +175,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
issue_flags |= IO_URING_F_SQE128;
if (ctx->flags & IORING_SETUP_CQE32)
issue_flags |= IO_URING_F_CQE32;
+ if (ctx->compat)
+ issue_flags |= IO_URING_F_COMPAT;
if (ctx->flags & IORING_SETUP_IOPOLL) {
if (!file->f_op->uring_cmd_iopoll)
return -EOPNOTSUPP;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 06/11] tools headers: Grab copy of io_uring.h
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (4 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
[not found] ` <[email protected]>
2023-10-16 13:47 ` [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Breno Leitao
` (6 subsequent siblings)
12 siblings, 1 reply; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring, Peter Zijlstra (Intel),
Stefan Metzmacher, Josh Triplett
This file will be used by mini_uring.h and allow tests to run without
the need of installing liburing to run the tests.
This is needed to run io_uring tests in BPF, such as
(tools/testing/selftests/bpf/prog_tests/sockopt.c).
Signed-off-by: Breno Leitao <[email protected]>
---
tools/include/uapi/linux/io_uring.h | 757 ++++++++++++++++++++++++++++
1 file changed, 757 insertions(+)
create mode 100644 tools/include/uapi/linux/io_uring.h
diff --git a/tools/include/uapi/linux/io_uring.h b/tools/include/uapi/linux/io_uring.h
new file mode 100644
index 000000000000..f1c16f817742
--- /dev/null
+++ b/tools/include/uapi/linux/io_uring.h
@@ -0,0 +1,757 @@
+/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */
+/*
+ * Header file for the io_uring interface.
+ *
+ * Copyright (C) 2019 Jens Axboe
+ * Copyright (C) 2019 Christoph Hellwig
+ */
+#ifndef LINUX_IO_URING_H
+#define LINUX_IO_URING_H
+
+#include <linux/fs.h>
+#include <linux/types.h>
+/*
+ * this file is shared with liburing and that has to autodetect
+ * if linux/time_types.h is available or not, it can
+ * define UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H
+ * if linux/time_types.h is not available
+ */
+#ifndef UAPI_LINUX_IO_URING_H_SKIP_LINUX_TIME_TYPES_H
+#include <linux/time_types.h>
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/*
+ * IO submission data structure (Submission Queue Entry)
+ */
+struct io_uring_sqe {
+ __u8 opcode; /* type of operation for this sqe */
+ __u8 flags; /* IOSQE_ flags */
+ __u16 ioprio; /* ioprio for the request */
+ __s32 fd; /* file descriptor to do IO on */
+ union {
+ __u64 off; /* offset into file */
+ __u64 addr2;
+ struct {
+ __u32 cmd_op;
+ __u32 __pad1;
+ };
+ };
+ union {
+ __u64 addr; /* pointer to buffer or iovecs */
+ __u64 splice_off_in;
+ struct {
+ __u32 level;
+ __u32 optname;
+ };
+ };
+ __u32 len; /* buffer size or number of iovecs */
+ union {
+ __kernel_rwf_t rw_flags;
+ __u32 fsync_flags;
+ __u16 poll_events; /* compatibility */
+ __u32 poll32_events; /* word-reversed for BE */
+ __u32 sync_range_flags;
+ __u32 msg_flags;
+ __u32 timeout_flags;
+ __u32 accept_flags;
+ __u32 cancel_flags;
+ __u32 open_flags;
+ __u32 statx_flags;
+ __u32 fadvise_advice;
+ __u32 splice_flags;
+ __u32 rename_flags;
+ __u32 unlink_flags;
+ __u32 hardlink_flags;
+ __u32 xattr_flags;
+ __u32 msg_ring_flags;
+ __u32 uring_cmd_flags;
+ __u32 waitid_flags;
+ __u32 futex_flags;
+ };
+ __u64 user_data; /* data to be passed back at completion time */
+ /* pack this to avoid bogus arm OABI complaints */
+ union {
+ /* index into fixed buffers, if used */
+ __u16 buf_index;
+ /* for grouped buffer selection */
+ __u16 buf_group;
+ } __attribute__((packed));
+ /* personality to use, if used */
+ __u16 personality;
+ union {
+ __s32 splice_fd_in;
+ __u32 file_index;
+ __u32 optlen;
+ struct {
+ __u16 addr_len;
+ __u16 __pad3[1];
+ };
+ };
+ union {
+ struct {
+ __u64 addr3;
+ __u64 __pad2[1];
+ };
+ __u64 optval;
+ /*
+ * If the ring is initialized with IORING_SETUP_SQE128, then
+ * this field is used for 80 bytes of arbitrary command data
+ */
+ __u8 cmd[0];
+ };
+};
+
+/*
+ * If sqe->file_index is set to this for opcodes that instantiate a new
+ * direct descriptor (like openat/openat2/accept), then io_uring will allocate
+ * an available direct descriptor instead of having the application pass one
+ * in. The picked direct descriptor will be returned in cqe->res, or -ENFILE
+ * if the space is full.
+ */
+#define IORING_FILE_INDEX_ALLOC (~0U)
+
+enum {
+ IOSQE_FIXED_FILE_BIT,
+ IOSQE_IO_DRAIN_BIT,
+ IOSQE_IO_LINK_BIT,
+ IOSQE_IO_HARDLINK_BIT,
+ IOSQE_ASYNC_BIT,
+ IOSQE_BUFFER_SELECT_BIT,
+ IOSQE_CQE_SKIP_SUCCESS_BIT,
+};
+
+/*
+ * sqe->flags
+ */
+/* use fixed fileset */
+#define IOSQE_FIXED_FILE (1U << IOSQE_FIXED_FILE_BIT)
+/* issue after inflight IO */
+#define IOSQE_IO_DRAIN (1U << IOSQE_IO_DRAIN_BIT)
+/* links next sqe */
+#define IOSQE_IO_LINK (1U << IOSQE_IO_LINK_BIT)
+/* like LINK, but stronger */
+#define IOSQE_IO_HARDLINK (1U << IOSQE_IO_HARDLINK_BIT)
+/* always go async */
+#define IOSQE_ASYNC (1U << IOSQE_ASYNC_BIT)
+/* select buffer from sqe->buf_group */
+#define IOSQE_BUFFER_SELECT (1U << IOSQE_BUFFER_SELECT_BIT)
+/* don't post CQE if request succeeded */
+#define IOSQE_CQE_SKIP_SUCCESS (1U << IOSQE_CQE_SKIP_SUCCESS_BIT)
+
+/*
+ * io_uring_setup() flags
+ */
+#define IORING_SETUP_IOPOLL (1U << 0) /* io_context is polled */
+#define IORING_SETUP_SQPOLL (1U << 1) /* SQ poll thread */
+#define IORING_SETUP_SQ_AFF (1U << 2) /* sq_thread_cpu is valid */
+#define IORING_SETUP_CQSIZE (1U << 3) /* app defines CQ size */
+#define IORING_SETUP_CLAMP (1U << 4) /* clamp SQ/CQ ring sizes */
+#define IORING_SETUP_ATTACH_WQ (1U << 5) /* attach to existing wq */
+#define IORING_SETUP_R_DISABLED (1U << 6) /* start with ring disabled */
+#define IORING_SETUP_SUBMIT_ALL (1U << 7) /* continue submit on error */
+/*
+ * Cooperative task running. When requests complete, they often require
+ * forcing the submitter to transition to the kernel to complete. If this
+ * flag is set, work will be done when the task transitions anyway, rather
+ * than force an inter-processor interrupt reschedule. This avoids interrupting
+ * a task running in userspace, and saves an IPI.
+ */
+#define IORING_SETUP_COOP_TASKRUN (1U << 8)
+/*
+ * If COOP_TASKRUN is set, get notified if task work is available for
+ * running and a kernel transition would be needed to run it. This sets
+ * IORING_SQ_TASKRUN in the sq ring flags. Not valid with COOP_TASKRUN.
+ */
+#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
+#define IORING_SETUP_SQE128 (1U << 10) /* SQEs are 128 byte */
+#define IORING_SETUP_CQE32 (1U << 11) /* CQEs are 32 byte */
+/*
+ * Only one task is allowed to submit requests
+ */
+#define IORING_SETUP_SINGLE_ISSUER (1U << 12)
+
+/*
+ * Defer running task work to get events.
+ * Rather than running bits of task work whenever the task transitions
+ * try to do it just before it is needed.
+ */
+#define IORING_SETUP_DEFER_TASKRUN (1U << 13)
+
+/*
+ * Application provides the memory for the rings
+ */
+#define IORING_SETUP_NO_MMAP (1U << 14)
+
+/*
+ * Register the ring fd in itself for use with
+ * IORING_REGISTER_USE_REGISTERED_RING; return a registered fd index rather
+ * than an fd.
+ */
+#define IORING_SETUP_REGISTERED_FD_ONLY (1U << 15)
+
+/*
+ * Removes indirection through the SQ index array.
+ */
+#define IORING_SETUP_NO_SQARRAY (1U << 16)
+
+enum io_uring_op {
+ IORING_OP_NOP,
+ IORING_OP_READV,
+ IORING_OP_WRITEV,
+ IORING_OP_FSYNC,
+ IORING_OP_READ_FIXED,
+ IORING_OP_WRITE_FIXED,
+ IORING_OP_POLL_ADD,
+ IORING_OP_POLL_REMOVE,
+ IORING_OP_SYNC_FILE_RANGE,
+ IORING_OP_SENDMSG,
+ IORING_OP_RECVMSG,
+ IORING_OP_TIMEOUT,
+ IORING_OP_TIMEOUT_REMOVE,
+ IORING_OP_ACCEPT,
+ IORING_OP_ASYNC_CANCEL,
+ IORING_OP_LINK_TIMEOUT,
+ IORING_OP_CONNECT,
+ IORING_OP_FALLOCATE,
+ IORING_OP_OPENAT,
+ IORING_OP_CLOSE,
+ IORING_OP_FILES_UPDATE,
+ IORING_OP_STATX,
+ IORING_OP_READ,
+ IORING_OP_WRITE,
+ IORING_OP_FADVISE,
+ IORING_OP_MADVISE,
+ IORING_OP_SEND,
+ IORING_OP_RECV,
+ IORING_OP_OPENAT2,
+ IORING_OP_EPOLL_CTL,
+ IORING_OP_SPLICE,
+ IORING_OP_PROVIDE_BUFFERS,
+ IORING_OP_REMOVE_BUFFERS,
+ IORING_OP_TEE,
+ IORING_OP_SHUTDOWN,
+ IORING_OP_RENAMEAT,
+ IORING_OP_UNLINKAT,
+ IORING_OP_MKDIRAT,
+ IORING_OP_SYMLINKAT,
+ IORING_OP_LINKAT,
+ IORING_OP_MSG_RING,
+ IORING_OP_FSETXATTR,
+ IORING_OP_SETXATTR,
+ IORING_OP_FGETXATTR,
+ IORING_OP_GETXATTR,
+ IORING_OP_SOCKET,
+ IORING_OP_URING_CMD,
+ IORING_OP_SEND_ZC,
+ IORING_OP_SENDMSG_ZC,
+ IORING_OP_READ_MULTISHOT,
+ IORING_OP_WAITID,
+ IORING_OP_FUTEX_WAIT,
+ IORING_OP_FUTEX_WAKE,
+ IORING_OP_FUTEX_WAITV,
+
+ /* this goes last, obviously */
+ IORING_OP_LAST,
+};
+
+/*
+ * sqe->uring_cmd_flags top 8bits aren't available for userspace
+ * IORING_URING_CMD_FIXED use registered buffer; pass this flag
+ * along with setting sqe->buf_index.
+ */
+#define IORING_URING_CMD_FIXED (1U << 0)
+#define IORING_URING_CMD_MASK IORING_URING_CMD_FIXED
+
+
+/*
+ * sqe->fsync_flags
+ */
+#define IORING_FSYNC_DATASYNC (1U << 0)
+
+/*
+ * sqe->timeout_flags
+ */
+#define IORING_TIMEOUT_ABS (1U << 0)
+#define IORING_TIMEOUT_UPDATE (1U << 1)
+#define IORING_TIMEOUT_BOOTTIME (1U << 2)
+#define IORING_TIMEOUT_REALTIME (1U << 3)
+#define IORING_LINK_TIMEOUT_UPDATE (1U << 4)
+#define IORING_TIMEOUT_ETIME_SUCCESS (1U << 5)
+#define IORING_TIMEOUT_MULTISHOT (1U << 6)
+#define IORING_TIMEOUT_CLOCK_MASK (IORING_TIMEOUT_BOOTTIME | IORING_TIMEOUT_REALTIME)
+#define IORING_TIMEOUT_UPDATE_MASK (IORING_TIMEOUT_UPDATE | IORING_LINK_TIMEOUT_UPDATE)
+/*
+ * sqe->splice_flags
+ * extends splice(2) flags
+ */
+#define SPLICE_F_FD_IN_FIXED (1U << 31) /* the last bit of __u32 */
+
+/*
+ * POLL_ADD flags. Note that since sqe->poll_events is the flag space, the
+ * command flags for POLL_ADD are stored in sqe->len.
+ *
+ * IORING_POLL_ADD_MULTI Multishot poll. Sets IORING_CQE_F_MORE if
+ * the poll handler will continue to report
+ * CQEs on behalf of the same SQE.
+ *
+ * IORING_POLL_UPDATE Update existing poll request, matching
+ * sqe->addr as the old user_data field.
+ *
+ * IORING_POLL_LEVEL Level triggered poll.
+ */
+#define IORING_POLL_ADD_MULTI (1U << 0)
+#define IORING_POLL_UPDATE_EVENTS (1U << 1)
+#define IORING_POLL_UPDATE_USER_DATA (1U << 2)
+#define IORING_POLL_ADD_LEVEL (1U << 3)
+
+/*
+ * ASYNC_CANCEL flags.
+ *
+ * IORING_ASYNC_CANCEL_ALL Cancel all requests that match the given key
+ * IORING_ASYNC_CANCEL_FD Key off 'fd' for cancelation rather than the
+ * request 'user_data'
+ * IORING_ASYNC_CANCEL_ANY Match any request
+ * IORING_ASYNC_CANCEL_FD_FIXED 'fd' passed in is a fixed descriptor
+ * IORING_ASYNC_CANCEL_USERDATA Match on user_data, default for no other key
+ * IORING_ASYNC_CANCEL_OP Match request based on opcode
+ */
+#define IORING_ASYNC_CANCEL_ALL (1U << 0)
+#define IORING_ASYNC_CANCEL_FD (1U << 1)
+#define IORING_ASYNC_CANCEL_ANY (1U << 2)
+#define IORING_ASYNC_CANCEL_FD_FIXED (1U << 3)
+#define IORING_ASYNC_CANCEL_USERDATA (1U << 4)
+#define IORING_ASYNC_CANCEL_OP (1U << 5)
+
+/*
+ * send/sendmsg and recv/recvmsg flags (sqe->ioprio)
+ *
+ * IORING_RECVSEND_POLL_FIRST If set, instead of first attempting to send
+ * or receive and arm poll if that yields an
+ * -EAGAIN result, arm poll upfront and skip
+ * the initial transfer attempt.
+ *
+ * IORING_RECV_MULTISHOT Multishot recv. Sets IORING_CQE_F_MORE if
+ * the handler will continue to report
+ * CQEs on behalf of the same SQE.
+ *
+ * IORING_RECVSEND_FIXED_BUF Use registered buffers, the index is stored in
+ * the buf_index field.
+ *
+ * IORING_SEND_ZC_REPORT_USAGE
+ * If set, SEND[MSG]_ZC should report
+ * the zerocopy usage in cqe.res
+ * for the IORING_CQE_F_NOTIF cqe.
+ * 0 is reported if zerocopy was actually possible.
+ * IORING_NOTIF_USAGE_ZC_COPIED if data was copied
+ * (at least partially).
+ */
+#define IORING_RECVSEND_POLL_FIRST (1U << 0)
+#define IORING_RECV_MULTISHOT (1U << 1)
+#define IORING_RECVSEND_FIXED_BUF (1U << 2)
+#define IORING_SEND_ZC_REPORT_USAGE (1U << 3)
+
+/*
+ * cqe.res for IORING_CQE_F_NOTIF if
+ * IORING_SEND_ZC_REPORT_USAGE was requested
+ *
+ * It should be treated as a flag, all other
+ * bits of cqe.res should be treated as reserved!
+ */
+#define IORING_NOTIF_USAGE_ZC_COPIED (1U << 31)
+
+/*
+ * accept flags stored in sqe->ioprio
+ */
+#define IORING_ACCEPT_MULTISHOT (1U << 0)
+
+/*
+ * IORING_OP_MSG_RING command types, stored in sqe->addr
+ */
+enum {
+ IORING_MSG_DATA, /* pass sqe->len as 'res' and off as user_data */
+ IORING_MSG_SEND_FD, /* send a registered fd to another ring */
+};
+
+/*
+ * IORING_OP_MSG_RING flags (sqe->msg_ring_flags)
+ *
+ * IORING_MSG_RING_CQE_SKIP Don't post a CQE to the target ring. Not
+ * applicable for IORING_MSG_DATA, obviously.
+ */
+#define IORING_MSG_RING_CQE_SKIP (1U << 0)
+/* Pass through the flags from sqe->file_index to cqe->flags */
+#define IORING_MSG_RING_FLAGS_PASS (1U << 1)
+
+/*
+ * IO completion data structure (Completion Queue Entry)
+ */
+struct io_uring_cqe {
+ __u64 user_data; /* sqe->data submission passed back */
+ __s32 res; /* result code for this event */
+ __u32 flags;
+
+ /*
+ * If the ring is initialized with IORING_SETUP_CQE32, then this field
+ * contains 16-bytes of padding, doubling the size of the CQE.
+ */
+ __u64 big_cqe[];
+};
+
+/*
+ * cqe->flags
+ *
+ * IORING_CQE_F_BUFFER If set, the upper 16 bits are the buffer ID
+ * IORING_CQE_F_MORE If set, parent SQE will generate more CQE entries
+ * IORING_CQE_F_SOCK_NONEMPTY If set, more data to read after socket recv
+ * IORING_CQE_F_NOTIF Set for notification CQEs. Can be used to distinct
+ * them from sends.
+ */
+#define IORING_CQE_F_BUFFER (1U << 0)
+#define IORING_CQE_F_MORE (1U << 1)
+#define IORING_CQE_F_SOCK_NONEMPTY (1U << 2)
+#define IORING_CQE_F_NOTIF (1U << 3)
+
+enum {
+ IORING_CQE_BUFFER_SHIFT = 16,
+};
+
+/*
+ * Magic offsets for the application to mmap the data it needs
+ */
+#define IORING_OFF_SQ_RING 0ULL
+#define IORING_OFF_CQ_RING 0x8000000ULL
+#define IORING_OFF_SQES 0x10000000ULL
+#define IORING_OFF_PBUF_RING 0x80000000ULL
+#define IORING_OFF_PBUF_SHIFT 16
+#define IORING_OFF_MMAP_MASK 0xf8000000ULL
+
+/*
+ * Filled with the offset for mmap(2)
+ */
+struct io_sqring_offsets {
+ __u32 head;
+ __u32 tail;
+ __u32 ring_mask;
+ __u32 ring_entries;
+ __u32 flags;
+ __u32 dropped;
+ __u32 array;
+ __u32 resv1;
+ __u64 user_addr;
+};
+
+/*
+ * sq_ring->flags
+ */
+#define IORING_SQ_NEED_WAKEUP (1U << 0) /* needs io_uring_enter wakeup */
+#define IORING_SQ_CQ_OVERFLOW (1U << 1) /* CQ ring is overflown */
+#define IORING_SQ_TASKRUN (1U << 2) /* task should enter the kernel */
+
+struct io_cqring_offsets {
+ __u32 head;
+ __u32 tail;
+ __u32 ring_mask;
+ __u32 ring_entries;
+ __u32 overflow;
+ __u32 cqes;
+ __u32 flags;
+ __u32 resv1;
+ __u64 user_addr;
+};
+
+/*
+ * cq_ring->flags
+ */
+
+/* disable eventfd notifications */
+#define IORING_CQ_EVENTFD_DISABLED (1U << 0)
+
+/*
+ * io_uring_enter(2) flags
+ */
+#define IORING_ENTER_GETEVENTS (1U << 0)
+#define IORING_ENTER_SQ_WAKEUP (1U << 1)
+#define IORING_ENTER_SQ_WAIT (1U << 2)
+#define IORING_ENTER_EXT_ARG (1U << 3)
+#define IORING_ENTER_REGISTERED_RING (1U << 4)
+
+/*
+ * Passed in for io_uring_setup(2). Copied back with updated info on success
+ */
+struct io_uring_params {
+ __u32 sq_entries;
+ __u32 cq_entries;
+ __u32 flags;
+ __u32 sq_thread_cpu;
+ __u32 sq_thread_idle;
+ __u32 features;
+ __u32 wq_fd;
+ __u32 resv[3];
+ struct io_sqring_offsets sq_off;
+ struct io_cqring_offsets cq_off;
+};
+
+/*
+ * io_uring_params->features flags
+ */
+#define IORING_FEAT_SINGLE_MMAP (1U << 0)
+#define IORING_FEAT_NODROP (1U << 1)
+#define IORING_FEAT_SUBMIT_STABLE (1U << 2)
+#define IORING_FEAT_RW_CUR_POS (1U << 3)
+#define IORING_FEAT_CUR_PERSONALITY (1U << 4)
+#define IORING_FEAT_FAST_POLL (1U << 5)
+#define IORING_FEAT_POLL_32BITS (1U << 6)
+#define IORING_FEAT_SQPOLL_NONFIXED (1U << 7)
+#define IORING_FEAT_EXT_ARG (1U << 8)
+#define IORING_FEAT_NATIVE_WORKERS (1U << 9)
+#define IORING_FEAT_RSRC_TAGS (1U << 10)
+#define IORING_FEAT_CQE_SKIP (1U << 11)
+#define IORING_FEAT_LINKED_FILE (1U << 12)
+#define IORING_FEAT_REG_REG_RING (1U << 13)
+
+/*
+ * io_uring_register(2) opcodes and arguments
+ */
+enum {
+ IORING_REGISTER_BUFFERS = 0,
+ IORING_UNREGISTER_BUFFERS = 1,
+ IORING_REGISTER_FILES = 2,
+ IORING_UNREGISTER_FILES = 3,
+ IORING_REGISTER_EVENTFD = 4,
+ IORING_UNREGISTER_EVENTFD = 5,
+ IORING_REGISTER_FILES_UPDATE = 6,
+ IORING_REGISTER_EVENTFD_ASYNC = 7,
+ IORING_REGISTER_PROBE = 8,
+ IORING_REGISTER_PERSONALITY = 9,
+ IORING_UNREGISTER_PERSONALITY = 10,
+ IORING_REGISTER_RESTRICTIONS = 11,
+ IORING_REGISTER_ENABLE_RINGS = 12,
+
+ /* extended with tagging */
+ IORING_REGISTER_FILES2 = 13,
+ IORING_REGISTER_FILES_UPDATE2 = 14,
+ IORING_REGISTER_BUFFERS2 = 15,
+ IORING_REGISTER_BUFFERS_UPDATE = 16,
+
+ /* set/clear io-wq thread affinities */
+ IORING_REGISTER_IOWQ_AFF = 17,
+ IORING_UNREGISTER_IOWQ_AFF = 18,
+
+ /* set/get max number of io-wq workers */
+ IORING_REGISTER_IOWQ_MAX_WORKERS = 19,
+
+ /* register/unregister io_uring fd with the ring */
+ IORING_REGISTER_RING_FDS = 20,
+ IORING_UNREGISTER_RING_FDS = 21,
+
+ /* register ring based provide buffer group */
+ IORING_REGISTER_PBUF_RING = 22,
+ IORING_UNREGISTER_PBUF_RING = 23,
+
+ /* sync cancelation API */
+ IORING_REGISTER_SYNC_CANCEL = 24,
+
+ /* register a range of fixed file slots for automatic slot allocation */
+ IORING_REGISTER_FILE_ALLOC_RANGE = 25,
+
+ /* this goes last */
+ IORING_REGISTER_LAST,
+
+ /* flag added to the opcode to use a registered ring fd */
+ IORING_REGISTER_USE_REGISTERED_RING = 1U << 31
+};
+
+/* io-wq worker categories */
+enum {
+ IO_WQ_BOUND,
+ IO_WQ_UNBOUND,
+};
+
+/* deprecated, see struct io_uring_rsrc_update */
+struct io_uring_files_update {
+ __u32 offset;
+ __u32 resv;
+ __aligned_u64 /* __s32 * */ fds;
+};
+
+/*
+ * Register a fully sparse file space, rather than pass in an array of all
+ * -1 file descriptors.
+ */
+#define IORING_RSRC_REGISTER_SPARSE (1U << 0)
+
+struct io_uring_rsrc_register {
+ __u32 nr;
+ __u32 flags;
+ __u64 resv2;
+ __aligned_u64 data;
+ __aligned_u64 tags;
+};
+
+struct io_uring_rsrc_update {
+ __u32 offset;
+ __u32 resv;
+ __aligned_u64 data;
+};
+
+struct io_uring_rsrc_update2 {
+ __u32 offset;
+ __u32 resv;
+ __aligned_u64 data;
+ __aligned_u64 tags;
+ __u32 nr;
+ __u32 resv2;
+};
+
+/* Skip updating fd indexes set to this value in the fd table */
+#define IORING_REGISTER_FILES_SKIP (-2)
+
+#define IO_URING_OP_SUPPORTED (1U << 0)
+
+struct io_uring_probe_op {
+ __u8 op;
+ __u8 resv;
+ __u16 flags; /* IO_URING_OP_* flags */
+ __u32 resv2;
+};
+
+struct io_uring_probe {
+ __u8 last_op; /* last opcode supported */
+ __u8 ops_len; /* length of ops[] array below */
+ __u16 resv;
+ __u32 resv2[3];
+ struct io_uring_probe_op ops[];
+};
+
+struct io_uring_restriction {
+ __u16 opcode;
+ union {
+ __u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */
+ __u8 sqe_op; /* IORING_RESTRICTION_SQE_OP */
+ __u8 sqe_flags; /* IORING_RESTRICTION_SQE_FLAGS_* */
+ };
+ __u8 resv;
+ __u32 resv2[3];
+};
+
+struct io_uring_buf {
+ __u64 addr;
+ __u32 len;
+ __u16 bid;
+ __u16 resv;
+};
+
+struct io_uring_buf_ring {
+ union {
+ /*
+ * To avoid spilling into more pages than we need to, the
+ * ring tail is overlaid with the io_uring_buf->resv field.
+ */
+ struct {
+ __u64 resv1;
+ __u32 resv2;
+ __u16 resv3;
+ __u16 tail;
+ };
+ __DECLARE_FLEX_ARRAY(struct io_uring_buf, bufs);
+ };
+};
+
+/*
+ * Flags for IORING_REGISTER_PBUF_RING.
+ *
+ * IOU_PBUF_RING_MMAP: If set, kernel will allocate the memory for the ring.
+ * The application must not set a ring_addr in struct
+ * io_uring_buf_reg, instead it must subsequently call
+ * mmap(2) with the offset set as:
+ * IORING_OFF_PBUF_RING | (bgid << IORING_OFF_PBUF_SHIFT)
+ * to get a virtual mapping for the ring.
+ */
+enum {
+ IOU_PBUF_RING_MMAP = 1,
+};
+
+/* argument for IORING_(UN)REGISTER_PBUF_RING */
+struct io_uring_buf_reg {
+ __u64 ring_addr;
+ __u32 ring_entries;
+ __u16 bgid;
+ __u16 flags;
+ __u64 resv[3];
+};
+
+/*
+ * io_uring_restriction->opcode values
+ */
+enum {
+ /* Allow an io_uring_register(2) opcode */
+ IORING_RESTRICTION_REGISTER_OP = 0,
+
+ /* Allow an sqe opcode */
+ IORING_RESTRICTION_SQE_OP = 1,
+
+ /* Allow sqe flags */
+ IORING_RESTRICTION_SQE_FLAGS_ALLOWED = 2,
+
+ /* Require sqe flags (these flags must be set on each submission) */
+ IORING_RESTRICTION_SQE_FLAGS_REQUIRED = 3,
+
+ IORING_RESTRICTION_LAST
+};
+
+struct io_uring_getevents_arg {
+ __u64 sigmask;
+ __u32 sigmask_sz;
+ __u32 pad;
+ __u64 ts;
+};
+
+/*
+ * Argument for IORING_REGISTER_SYNC_CANCEL
+ */
+struct io_uring_sync_cancel_reg {
+ __u64 addr;
+ __s32 fd;
+ __u32 flags;
+ struct __kernel_timespec timeout;
+ __u8 opcode;
+ __u8 pad[7];
+ __u64 pad2[3];
+};
+
+/*
+ * Argument for IORING_REGISTER_FILE_ALLOC_RANGE
+ * The range is specified as [off, off + len)
+ */
+struct io_uring_file_index_range {
+ __u32 off;
+ __u32 len;
+ __u64 resv;
+};
+
+struct io_uring_recvmsg_out {
+ __u32 namelen;
+ __u32 controllen;
+ __u32 payloadlen;
+ __u32 flags;
+};
+
+/*
+ * Argument for IORING_OP_URING_CMD when file is a socket
+ */
+enum {
+ SOCKET_URING_OP_SIOCINQ = 0,
+ SOCKET_URING_OP_SIOCOUTQ,
+ SOCKET_URING_OP_GETSOCKOPT,
+ SOCKET_URING_OP_SETSOCKOPT,
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (5 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
` (5 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, David S. Miller, Eric Dumazet, Shuah Khan
Cc: bpf, linux-kernel, netdev, io-uring,
open list:KERNEL SELFTEST FRAMEWORK
Instead of defining basic io_uring functions in the test case, move them
to a common directory, so, other tests can use them.
This simplify the test code and reuse the common liburing
infrastructure. This is basically a copy of what we have in
io_uring_zerocopy_tx with some minor improvements to make checkpatch
happy.
A follow-up test will use the same helpers in a BPF sockopt test.
Signed-off-by: Breno Leitao <[email protected]>
---
tools/include/io_uring/mini_liburing.h | 282 ++++++++++++++++++
tools/testing/selftests/net/Makefile | 1 +
.../selftests/net/io_uring_zerocopy_tx.c | 268 +----------------
3 files changed, 285 insertions(+), 266 deletions(-)
create mode 100644 tools/include/io_uring/mini_liburing.h
diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h
new file mode 100644
index 000000000000..9ccb16074eb5
--- /dev/null
+++ b/tools/include/io_uring/mini_liburing.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: MIT */
+
+#include <linux/io_uring.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+struct io_sq_ring {
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ unsigned int *flags;
+ unsigned int *array;
+};
+
+struct io_cq_ring {
+ unsigned int *head;
+ unsigned int *tail;
+ unsigned int *ring_mask;
+ unsigned int *ring_entries;
+ struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+ unsigned int *khead;
+ unsigned int *ktail;
+ unsigned int *kring_mask;
+ unsigned int *kring_entries;
+ unsigned int *kflags;
+ unsigned int *kdropped;
+ unsigned int *array;
+ struct io_uring_sqe *sqes;
+
+ unsigned int sqe_head;
+ unsigned int sqe_tail;
+
+ size_t ring_sz;
+};
+
+struct io_uring_cq {
+ unsigned int *khead;
+ unsigned int *ktail;
+ unsigned int *kring_mask;
+ unsigned int *kring_entries;
+ unsigned int *koverflow;
+ struct io_uring_cqe *cqes;
+
+ size_t ring_sz;
+};
+
+struct io_uring {
+ struct io_uring_sq sq;
+ struct io_uring_cq cq;
+ int ring_fd;
+};
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier() __asm__ __volatile__("":::"memory")
+#define write_barrier() __asm__ __volatile__("":::"memory")
+#else
+#define read_barrier() __sync_synchronize()
+#define write_barrier() __sync_synchronize()
+#endif
+
+static inline int io_uring_mmap(int fd, struct io_uring_params *p,
+ struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+ size_t size;
+ void *ptr;
+ int ret;
+
+ sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int);
+ ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+ if (ptr == MAP_FAILED)
+ return -errno;
+ sq->khead = ptr + p->sq_off.head;
+ sq->ktail = ptr + p->sq_off.tail;
+ sq->kring_mask = ptr + p->sq_off.ring_mask;
+ sq->kring_entries = ptr + p->sq_off.ring_entries;
+ sq->kflags = ptr + p->sq_off.flags;
+ sq->kdropped = ptr + p->sq_off.dropped;
+ sq->array = ptr + p->sq_off.array;
+
+ size = p->sq_entries * sizeof(struct io_uring_sqe);
+ sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+ if (sq->sqes == MAP_FAILED) {
+ ret = -errno;
+err:
+ munmap(sq->khead, sq->ring_sz);
+ return ret;
+ }
+
+ cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+ ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+ MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+ if (ptr == MAP_FAILED) {
+ ret = -errno;
+ munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+ goto err;
+ }
+ cq->khead = ptr + p->cq_off.head;
+ cq->ktail = ptr + p->cq_off.tail;
+ cq->kring_mask = ptr + p->cq_off.ring_mask;
+ cq->kring_entries = ptr + p->cq_off.ring_entries;
+ cq->koverflow = ptr + p->cq_off.overflow;
+ cq->cqes = ptr + p->cq_off.cqes;
+ return 0;
+}
+
+static inline int io_uring_setup(unsigned int entries,
+ struct io_uring_params *p)
+{
+ return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static inline int io_uring_enter(int fd, unsigned int to_submit,
+ unsigned int min_complete,
+ unsigned int flags, sigset_t *sig)
+{
+ return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+ flags, sig, _NSIG / 8);
+}
+
+static inline int io_uring_queue_init(unsigned int entries,
+ struct io_uring *ring,
+ unsigned int flags)
+{
+ struct io_uring_params p;
+ int fd, ret;
+
+ memset(ring, 0, sizeof(*ring));
+ memset(&p, 0, sizeof(p));
+ p.flags = flags;
+
+ fd = io_uring_setup(entries, &p);
+ if (fd < 0)
+ return fd;
+ ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+ if (!ret)
+ ring->ring_fd = fd;
+ else
+ close(fd);
+ return ret;
+}
+
+/* Get a sqe */
+static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+
+ if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+ return NULL;
+ return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static inline int io_uring_wait_cqe(struct io_uring *ring,
+ struct io_uring_cqe **cqe_ptr)
+{
+ struct io_uring_cq *cq = &ring->cq;
+ const unsigned int mask = *cq->kring_mask;
+ unsigned int head = *cq->khead;
+ int ret;
+
+ *cqe_ptr = NULL;
+ do {
+ read_barrier();
+ if (head != *cq->ktail) {
+ *cqe_ptr = &cq->cqes[head & mask];
+ break;
+ }
+ ret = io_uring_enter(ring->ring_fd, 0, 1,
+ IORING_ENTER_GETEVENTS, NULL);
+ if (ret < 0)
+ return -errno;
+ } while (1);
+
+ return 0;
+}
+
+static inline int io_uring_submit(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+ const unsigned int mask = *sq->kring_mask;
+ unsigned int ktail, submitted, to_submit;
+ int ret;
+
+ read_barrier();
+ if (*sq->khead != *sq->ktail) {
+ submitted = *sq->kring_entries;
+ goto submit;
+ }
+ if (sq->sqe_head == sq->sqe_tail)
+ return 0;
+
+ ktail = *sq->ktail;
+ to_submit = sq->sqe_tail - sq->sqe_head;
+ for (submitted = 0; submitted < to_submit; submitted++) {
+ read_barrier();
+ sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+ }
+ if (!submitted)
+ return 0;
+
+ if (*sq->ktail != ktail) {
+ write_barrier();
+ *sq->ktail = ktail;
+ write_barrier();
+ }
+submit:
+ ret = io_uring_enter(ring->ring_fd, submitted, 0,
+ IORING_ENTER_GETEVENTS, NULL);
+ return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_queue_exit(struct io_uring *ring)
+{
+ struct io_uring_sq *sq = &ring->sq;
+
+ munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe));
+ munmap(sq->khead, sq->ring_sz);
+ close(ring->ring_fd);
+}
+
+/* Prepare and send the SQE */
+static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op,
+ int sockfd,
+ int level, int optname,
+ const void *optval,
+ int optlen)
+{
+ memset(sqe, 0, sizeof(*sqe));
+ sqe->opcode = (__u8)IORING_OP_URING_CMD;
+ sqe->fd = sockfd;
+ sqe->cmd_op = op;
+
+ sqe->level = level;
+ sqe->optname = optname;
+ sqe->optval = (unsigned long long)optval;
+ sqe->optlen = optlen;
+}
+
+static inline int io_uring_register_buffers(struct io_uring *ring,
+ const struct iovec *iovecs,
+ unsigned int nr_iovecs)
+{
+ int ret;
+
+ ret = syscall(__NR_io_uring_register, ring->ring_fd,
+ IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+ return (ret < 0) ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+ const void *buf, size_t len, int flags)
+{
+ memset(sqe, 0, sizeof(*sqe));
+ sqe->opcode = (__u8)IORING_OP_SEND;
+ sqe->fd = sockfd;
+ sqe->addr = (unsigned long)buf;
+ sqe->len = len;
+ sqe->msg_flags = (__u32)flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+ const void *buf, size_t len, int flags,
+ unsigned int zc_flags)
+{
+ io_uring_prep_send(sqe, sockfd, buf, len, flags);
+ sqe->opcode = (__u8)IORING_OP_SEND_ZC;
+ sqe->ioprio = zc_flags;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+ *(&ring->cq)->khead += 1;
+ write_barrier();
+}
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 61939a695f95..4adfe2186f39 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -99,6 +99,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
$(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
$(OUTPUT)/tcp_inq: LDLIBS += -lpthread
$(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
# Rules to generate bpf obj nat6to4.o
CLANG ?= clang
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
index 154287740172..76e604e4810e 100644
--- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -36,6 +36,8 @@
#include <sys/un.h>
#include <sys/wait.h>
+#include <io_uring/mini_liburing.h>
+
#define NOTIF_TAG 0xfffffffULL
#define NONZC_TAG 0
#define ZC_TAG 1
@@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr;
static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
-struct io_sq_ring {
- unsigned *head;
- unsigned *tail;
- unsigned *ring_mask;
- unsigned *ring_entries;
- unsigned *flags;
- unsigned *array;
-};
-
-struct io_cq_ring {
- unsigned *head;
- unsigned *tail;
- unsigned *ring_mask;
- unsigned *ring_entries;
- struct io_uring_cqe *cqes;
-};
-
-struct io_uring_sq {
- unsigned *khead;
- unsigned *ktail;
- unsigned *kring_mask;
- unsigned *kring_entries;
- unsigned *kflags;
- unsigned *kdropped;
- unsigned *array;
- struct io_uring_sqe *sqes;
-
- unsigned sqe_head;
- unsigned sqe_tail;
-
- size_t ring_sz;
-};
-
-struct io_uring_cq {
- unsigned *khead;
- unsigned *ktail;
- unsigned *kring_mask;
- unsigned *kring_entries;
- unsigned *koverflow;
- struct io_uring_cqe *cqes;
-
- size_t ring_sz;
-};
-
-struct io_uring {
- struct io_uring_sq sq;
- struct io_uring_cq cq;
- int ring_fd;
-};
-
-#ifdef __alpha__
-# ifndef __NR_io_uring_setup
-# define __NR_io_uring_setup 535
-# endif
-# ifndef __NR_io_uring_enter
-# define __NR_io_uring_enter 536
-# endif
-# ifndef __NR_io_uring_register
-# define __NR_io_uring_register 537
-# endif
-#else /* !__alpha__ */
-# ifndef __NR_io_uring_setup
-# define __NR_io_uring_setup 425
-# endif
-# ifndef __NR_io_uring_enter
-# define __NR_io_uring_enter 426
-# endif
-# ifndef __NR_io_uring_register
-# define __NR_io_uring_register 427
-# endif
-#endif
-
-#if defined(__x86_64) || defined(__i386__)
-#define read_barrier() __asm__ __volatile__("":::"memory")
-#define write_barrier() __asm__ __volatile__("":::"memory")
-#else
-
-#define read_barrier() __sync_synchronize()
-#define write_barrier() __sync_synchronize()
-#endif
-
-static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
-{
- return syscall(__NR_io_uring_setup, entries, p);
-}
-
-static int io_uring_enter(int fd, unsigned int to_submit,
- unsigned int min_complete,
- unsigned int flags, sigset_t *sig)
-{
- return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
- flags, sig, _NSIG / 8);
-}
-
-static int io_uring_register_buffers(struct io_uring *ring,
- const struct iovec *iovecs,
- unsigned nr_iovecs)
-{
- int ret;
-
- ret = syscall(__NR_io_uring_register, ring->ring_fd,
- IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
- return (ret < 0) ? -errno : ret;
-}
-
-static int io_uring_mmap(int fd, struct io_uring_params *p,
- struct io_uring_sq *sq, struct io_uring_cq *cq)
-{
- size_t size;
- void *ptr;
- int ret;
-
- sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
- ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
- if (ptr == MAP_FAILED)
- return -errno;
- sq->khead = ptr + p->sq_off.head;
- sq->ktail = ptr + p->sq_off.tail;
- sq->kring_mask = ptr + p->sq_off.ring_mask;
- sq->kring_entries = ptr + p->sq_off.ring_entries;
- sq->kflags = ptr + p->sq_off.flags;
- sq->kdropped = ptr + p->sq_off.dropped;
- sq->array = ptr + p->sq_off.array;
-
- size = p->sq_entries * sizeof(struct io_uring_sqe);
- sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
- if (sq->sqes == MAP_FAILED) {
- ret = -errno;
-err:
- munmap(sq->khead, sq->ring_sz);
- return ret;
- }
-
- cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
- ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
- MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
- if (ptr == MAP_FAILED) {
- ret = -errno;
- munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
- goto err;
- }
- cq->khead = ptr + p->cq_off.head;
- cq->ktail = ptr + p->cq_off.tail;
- cq->kring_mask = ptr + p->cq_off.ring_mask;
- cq->kring_entries = ptr + p->cq_off.ring_entries;
- cq->koverflow = ptr + p->cq_off.overflow;
- cq->cqes = ptr + p->cq_off.cqes;
- return 0;
-}
-
-static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
- unsigned flags)
-{
- struct io_uring_params p;
- int fd, ret;
-
- memset(ring, 0, sizeof(*ring));
- memset(&p, 0, sizeof(p));
- p.flags = flags;
-
- fd = io_uring_setup(entries, &p);
- if (fd < 0)
- return fd;
- ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
- if (!ret)
- ring->ring_fd = fd;
- else
- close(fd);
- return ret;
-}
-
-static int io_uring_submit(struct io_uring *ring)
-{
- struct io_uring_sq *sq = &ring->sq;
- const unsigned mask = *sq->kring_mask;
- unsigned ktail, submitted, to_submit;
- int ret;
-
- read_barrier();
- if (*sq->khead != *sq->ktail) {
- submitted = *sq->kring_entries;
- goto submit;
- }
- if (sq->sqe_head == sq->sqe_tail)
- return 0;
-
- ktail = *sq->ktail;
- to_submit = sq->sqe_tail - sq->sqe_head;
- for (submitted = 0; submitted < to_submit; submitted++) {
- read_barrier();
- sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
- }
- if (!submitted)
- return 0;
-
- if (*sq->ktail != ktail) {
- write_barrier();
- *sq->ktail = ktail;
- write_barrier();
- }
-submit:
- ret = io_uring_enter(ring->ring_fd, submitted, 0,
- IORING_ENTER_GETEVENTS, NULL);
- return ret < 0 ? -errno : ret;
-}
-
-static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
- const void *buf, size_t len, int flags)
-{
- memset(sqe, 0, sizeof(*sqe));
- sqe->opcode = (__u8) IORING_OP_SEND;
- sqe->fd = sockfd;
- sqe->addr = (unsigned long) buf;
- sqe->len = len;
- sqe->msg_flags = (__u32) flags;
-}
-
-static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
- const void *buf, size_t len, int flags,
- unsigned zc_flags)
-{
- io_uring_prep_send(sqe, sockfd, buf, len, flags);
- sqe->opcode = (__u8) IORING_OP_SEND_ZC;
- sqe->ioprio = zc_flags;
-}
-
-static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
-{
- struct io_uring_sq *sq = &ring->sq;
-
- if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
- return NULL;
- return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
-}
-
-static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
-{
- struct io_uring_cq *cq = &ring->cq;
- const unsigned mask = *cq->kring_mask;
- unsigned head = *cq->khead;
- int ret;
-
- *cqe_ptr = NULL;
- do {
- read_barrier();
- if (head != *cq->ktail) {
- *cqe_ptr = &cq->cqes[head & mask];
- break;
- }
- ret = io_uring_enter(ring->ring_fd, 0, 1,
- IORING_ENTER_GETEVENTS, NULL);
- if (ret < 0)
- return -errno;
- } while (1);
-
- return 0;
-}
-
-static inline void io_uring_cqe_seen(struct io_uring *ring)
-{
- *(&ring->cq)->khead += 1;
- write_barrier();
-}
-
static unsigned long gettimeofday_ms(void)
{
struct timeval tv;
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (6 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
` (4 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring
Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If
network is not enabled, but io_uring is, then we want to return
-EOPNOTSUPP for any possible socket operation.
This is helpful because io_uring_cmd_sock() can now call functions that
only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET
inside the function itself.
Signed-off-by: Breno Leitao <[email protected]>
---
io_uring/uring_cmd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 4bedd633c08c..42694c07d8fd 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -214,6 +214,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
}
EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
+#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
struct socket *sock = cmd->file->private_data;
@@ -240,3 +241,4 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
}
}
EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (7 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
` (3 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring
Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
where a sockptr_t is either userspace or kernel space, and handled as
such.
Differently from the getsockopt(2), the optlen field is not a userspace
pointers. In getsockopt(2), userspace provides optlen pointer, which is
overwritten by the kernel. In this implementation, userspace passes a
u32, and the new value is returned in cqe->res. I.e., optlen is not a
pointer.
Important to say that userspace needs to keep the pointer alive until
the CQE is completed.
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 7 +++++++
io_uring/uring_cmd.c | 28 ++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 92be89a871fc..9628d4f5daba 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
union {
__u64 addr; /* pointer to buffer or iovecs */
__u64 splice_off_in;
+ struct {
+ __u32 level;
+ __u32 optname;
+ };
};
__u32 len; /* buffer size or number of iovecs */
union {
@@ -81,6 +85,7 @@ struct io_uring_sqe {
union {
__s32 splice_fd_in;
__u32 file_index;
+ __u32 optlen;
struct {
__u16 addr_len;
__u16 __pad3[1];
@@ -91,6 +96,7 @@ struct io_uring_sqe {
__u64 addr3;
__u64 __pad2[1];
};
+ __u64 optval;
/*
* If the ring is initialized with IORING_SETUP_SQE128, then
* this field is used for 80 bytes of arbitrary command data
@@ -740,6 +746,7 @@ struct io_uring_recvmsg_out {
enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
+ SOCKET_URING_OP_GETSOCKOPT,
};
#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 42694c07d8fd..8b045830b0d9 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -214,6 +214,32 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
}
EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optlen, optname, level, err;
+ void __user *optval;
+
+ level = READ_ONCE(cmd->sqe->level);
+ if (level != SOL_SOCKET)
+ return -EOPNOTSUPP;
+
+ optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ optname = READ_ONCE(cmd->sqe->optname);
+ optlen = READ_ONCE(cmd->sqe->optlen);
+
+ err = do_sock_getsockopt(sock, compat, level, optname,
+ USER_SOCKPTR(optval),
+ KERNEL_SOCKPTR(&optlen));
+ if (err)
+ return err;
+
+ /* On success, return optlen */
+ return optlen;
+}
+
#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
@@ -236,6 +262,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
if (ret)
return ret;
return arg;
+ case SOCKET_URING_OP_GETSOCKOPT:
+ return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (8 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
` (2 subsequent siblings)
12 siblings, 0 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring
Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt. This implementation leverages the function
do_sock_setsockopt(), which is shared with the setsockopt() system call
path.
Important to say that userspace needs to keep the pointer's memory alive
until the operation is completed. I.e, the memory could not be
deallocated before the CQE is returned to userspace.
Signed-off-by: Breno Leitao <[email protected]>
---
include/uapi/linux/io_uring.h | 1 +
io_uring/uring_cmd.c | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 9628d4f5daba..f1c16f817742 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -747,6 +747,7 @@ enum {
SOCKET_URING_OP_SIOCINQ = 0,
SOCKET_URING_OP_SIOCOUTQ,
SOCKET_URING_OP_GETSOCKOPT,
+ SOCKET_URING_OP_SETSOCKOPT,
};
#ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8b045830b0d9..acbc2924ecd2 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -240,6 +240,25 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
return optlen;
}
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+ struct io_uring_cmd *cmd,
+ unsigned int issue_flags)
+{
+ bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+ int optname, optlen, level;
+ void __user *optval;
+ sockptr_t optval_s;
+
+ optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+ optname = READ_ONCE(cmd->sqe->optname);
+ optlen = READ_ONCE(cmd->sqe->optlen);
+ level = READ_ONCE(cmd->sqe->level);
+ optval_s = USER_SOCKPTR(optval);
+
+ return do_sock_setsockopt(sock, compat, level, optname, optval_s,
+ optlen);
+}
+
#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
@@ -264,6 +283,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
return arg;
case SOCKET_URING_OP_GETSOCKOPT:
return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
+ case SOCKET_URING_OP_SETSOCKOPT:
+ return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
default:
return -EOPNOTSUPP;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (9 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-10-16 13:47 ` Breno Leitao
2023-10-19 19:51 ` Martin KaFai Lau
2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
2023-10-19 15:41 ` Jens Axboe
12 siblings, 1 reply; 27+ messages in thread
From: Breno Leitao @ 2023-10-16 13:47 UTC (permalink / raw)
To: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend,
KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko, Shuah Khan
Cc: bpf, linux-kernel, netdev, io-uring, Daniel Müller,
open list:KERNEL SELFTEST FRAMEWORK
Expand the sockopt test to use also check for io_uring {g,s}etsockopt
commands operations.
This patch starts by marking each test if they support io_uring support
or not.
Right now, io_uring cmd getsockopt() has a limitation of only
accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Since
there aren't any test exercising getsockopt(level == SOL_SOCKET), this
patch changes two tests to use level == SOL_SOCKET, they are
"getsockopt: support smaller ctx->optlen" and "getsockopt: read
ctx->optlen".
There is no limitation for the setsockopt() part.
Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
liburing is supported, execute the same test (again), but calling
liburing {g,s}setsockopt commands.
This patch also changes the level of two tests to use SOL_SOCKET for the
following two tests. This is going to help to exercise the io_uring
subsystem:
* getsockopt: read ctx->optlen
* getsockopt: support smaller ctx->optlen
Signed-off-by: Breno Leitao <[email protected]>
---
.../selftests/bpf/prog_tests/sockopt.c | 113 +++++++++++++++++-
1 file changed, 107 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c
index 9e6a5e3ed4de..5a4491d4edfe 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>
+#include <io_uring/mini_liburing.h>
#include "cgroup_helpers.h"
static char bpf_log_buf[4096];
@@ -38,6 +39,7 @@ static struct sockopt_test {
socklen_t get_optlen_ret;
enum sockopt_test_error error;
+ bool io_uring_support;
} tests[] = {
/* ==================== getsockopt ==================== */
@@ -251,7 +253,9 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_GETSOCKOPT,
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
+ .get_level = SOL_SOCKET,
.get_optlen = 64,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: deny bigger ctx->optlen",
@@ -276,6 +280,7 @@ static struct sockopt_test {
.get_optlen = 64,
.error = EFAULT_GETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: ignore >PAGE_SIZE optlen",
@@ -318,6 +323,7 @@ static struct sockopt_test {
.get_optval = {}, /* the changes are ignored */
.get_optlen = PAGE_SIZE + 1,
.error = EOPNOTSUPP_GETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: support smaller ctx->optlen",
@@ -337,8 +343,10 @@ static struct sockopt_test {
.attach_type = BPF_CGROUP_GETSOCKOPT,
.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
+ .get_level = SOL_SOCKET,
.get_optlen = 64,
.get_optlen_ret = 32,
+ .io_uring_support = true,
},
{
.descr = "getsockopt: deny writing to ctx->optval",
@@ -518,6 +526,7 @@ static struct sockopt_test {
.set_level = 123,
.set_optlen = 1,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: allow changing ctx->level",
@@ -572,6 +581,7 @@ static struct sockopt_test {
.set_optname = 123,
.set_optlen = 1,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: allow changing ctx->optname",
@@ -624,6 +634,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.set_optlen = 64,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: ctx->optlen == -1 is ok",
@@ -640,6 +651,7 @@ static struct sockopt_test {
.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
.set_optlen = 64,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny ctx->optlen < 0 (except -1)",
@@ -658,6 +670,7 @@ static struct sockopt_test {
.set_optlen = 4,
.error = EFAULT_SETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: deny ctx->optlen > input optlen",
@@ -675,6 +688,7 @@ static struct sockopt_test {
.set_optlen = 64,
.error = EFAULT_SETSOCKOPT,
+ .io_uring_support = true,
},
{
.descr = "setsockopt: ignore >PAGE_SIZE optlen",
@@ -940,7 +954,89 @@ static int load_prog(const struct bpf_insn *insns,
return fd;
}
-static int run_test(int cgroup_fd, struct sockopt_test *test)
+/* Core function that handles io_uring ring initialization,
+ * sending SQE with sockopt command and waiting for the CQE.
+ */
+static int uring_sockopt(int op, int fd, int level, int optname,
+ const void *optval, socklen_t optlen)
+{
+ struct io_uring_cqe *cqe;
+ struct io_uring_sqe *sqe;
+ struct io_uring ring;
+ int err;
+
+ err = io_uring_queue_init(1, &ring, 0);
+ if (!ASSERT_OK(err, "io_uring initialization"))
+ return err;
+
+ sqe = io_uring_get_sqe(&ring);
+ if (!ASSERT_NEQ(sqe, NULL, "Get an SQE")) {
+ err = -1;
+ goto fail;
+ }
+
+ io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen);
+
+ err = io_uring_submit(&ring);
+ if (!ASSERT_EQ(err, 1, "Submit SQE"))
+ goto fail;
+
+ err = io_uring_wait_cqe(&ring, &cqe);
+ if (!ASSERT_OK(err, "Wait for CQE"))
+ goto fail;
+
+ err = cqe->res;
+
+fail:
+ io_uring_queue_exit(&ring);
+
+ return err;
+}
+
+static int uring_setsockopt(int fd, int level, int optname, const void *optval,
+ socklen_t optlen)
+{
+ return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname,
+ optval, optlen);
+}
+
+static int uring_getsockopt(int fd, int level, int optname, void *optval,
+ socklen_t *optlen)
+{
+ int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname,
+ optval, *optlen);
+ if (ret < 0)
+ return ret;
+
+ /* Populate optlen back to be compatible with systemcall interface,
+ * and simplify the test.
+ */
+ *optlen = ret;
+
+ return 0;
+}
+
+/* Execute the setsocktopt operation */
+static int call_setsockopt(bool use_io_uring, int fd, int level, int optname,
+ const void *optval, socklen_t optlen)
+{
+ if (use_io_uring)
+ return uring_setsockopt(fd, level, optname, optval, optlen);
+
+ return setsockopt(fd, level, optname, optval, optlen);
+}
+
+/* Execute the getsocktopt operation */
+static int call_getsockopt(bool use_io_uring, int fd, int level, int optname,
+ void *optval, socklen_t *optlen)
+{
+ if (use_io_uring)
+ return uring_getsockopt(fd, level, optname, optval, optlen);
+
+ return getsockopt(fd, level, optname, optval, optlen);
+}
+
+static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring)
{
int sock_fd, err, prog_fd;
void *optval = NULL;
@@ -980,8 +1076,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder;
}
- err = setsockopt(sock_fd, test->set_level, test->set_optname,
- test->set_optval, test->set_optlen);
+ err = call_setsockopt(use_io_uring, sock_fd, test->set_level,
+ test->set_optname, test->set_optval,
+ test->set_optlen);
if (err) {
if (errno == EPERM && test->error == EPERM_SETSOCKOPT)
goto close_sock_fd;
@@ -1008,8 +1105,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
socklen_t expected_get_optlen = test->get_optlen_ret ?:
test->get_optlen;
- err = getsockopt(sock_fd, test->get_level, test->get_optname,
- optval, &optlen);
+ err = call_getsockopt(use_io_uring, sock_fd, test->get_level,
+ test->get_optname, optval, &optlen);
if (err) {
if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT)
goto free_optval;
@@ -1063,7 +1160,11 @@ void test_sockopt(void)
if (!test__start_subtest(tests[i].descr))
continue;
- ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr);
+ ASSERT_OK(run_test(cgroup_fd, &tests[i], false),
+ tests[i].descr);
+ if (tests[i].io_uring_support)
+ ASSERT_OK(run_test(cgroup_fd, &tests[i], true),
+ tests[i].descr);
}
close(cgroup_fd);
--
2.34.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v7 06/11] tools headers: Grab copy of io_uring.h
[not found] ` <[email protected]>
@ 2023-10-17 12:34 ` Breno Leitao
0 siblings, 0 replies; 27+ messages in thread
From: Breno Leitao @ 2023-10-17 12:34 UTC (permalink / raw)
To: Gabriel Krisman Bertazi, martin.lau, sdf
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, bpf, linux-kernel, netdev, io-uring,
Peter Zijlstra (Intel), Stefan Metzmacher, Josh Triplett
Hello Gabriel,
On Mon, Oct 16, 2023 at 02:56:55PM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
>
> > This file will be used by mini_uring.h and allow tests to run without
> > the need of installing liburing to run the tests.
> >
> > This is needed to run io_uring tests in BPF, such as
> > (tools/testing/selftests/bpf/prog_tests/sockopt.c).
> >
> > Signed-off-by: Breno Leitao <[email protected]>
>
> Can't mini_uring rely on the kernel header like
> selftests/net/io_uring_zerocopy_tx.c does?
Before this patch, io_uring_zerocopy_tx was not relying on "make
headers" headers, as far as I know. I think it was not a problem because
there was no CI running the test, and whoever was running the test was
relying on local io_uring headers.
My patch is, in fact, adding the following flag, which relies on the
headers now on:
+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
> I ask because this will be the third copy of these
> definitions that we're gonna need to keep in sync (kernel, liburing and
> here). Given this is only used for selftests, we better avoid the
> duplication.
Right, I don't know why this was the suggested way, but, that is how
people are using it.
I can definitely get rid of the copy and do the same mechanism as
io_uring_zerocopy_tx. This is what I've tested, and it worked fine.
---
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4225f975fce3..9f79a392acc1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -383,6 +383,8 @@ BPF_CFLAGS = -g -Wall -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
-Wno-compare-distinct-pointer-types
+HEADER_CFLAGS = -I$(abspath $(OUTPUT)/../../../../usr/include)
+
$(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
@@ -551,7 +553,7 @@ $(TRUNNER_TEST_OBJS): $(TRUNNER_OUTPUT)/%.test.o: \
$(TRUNNER_BPF_SKELS_LINKED) \
$$(BPFOBJ) | $(TRUNNER_OUTPUT)
$$(call msg,TEST-OBJ,$(TRUNNER_BINARY),$$@)
- $(Q)cd $$(@D) && $$(CC) -I. $$(CFLAGS) -c $(CURDIR)/$$< $$(LDLIBS) -o $$(@F)
+ $(Q)cd $$(@D) && $$(CC) -I. $$(CFLAGS) $$(HEADER_CFLAGS) -c $(CURDIR)/$$< $$(LDLIBS) -o $$(@F)
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v7 03/11] net/socket: Break down __sys_setsockopt
2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-10-19 0:20 ` Jakub Kicinski
2023-10-19 19:38 ` Martin KaFai Lau
1 sibling, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2023-10-19 0:20 UTC (permalink / raw)
To: Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, pabeni,
martin.lau, krisman, David S. Miller, Eric Dumazet, bpf,
linux-kernel, netdev, io-uring, Willem de Bruijn
On Mon, 16 Oct 2023 06:47:41 -0700 Breno Leitao wrote:
> Split __sys_setsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_setsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
>
> do_sock_setsockopt() will be called by io_uring setsockopt() command
> operation in the following patch.
Acked-by: Jakub Kicinski <[email protected]>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
@ 2023-10-19 0:20 ` Jakub Kicinski
2023-10-19 19:12 ` Martin KaFai Lau
1 sibling, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2023-10-19 0:20 UTC (permalink / raw)
To: Breno Leitao
Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, pabeni,
martin.lau, krisman, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Song Liu, Yonghong Song,
KP Singh, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, bpf,
linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
Alexander Mikhalitsyn, David Howells
On Mon, 16 Oct 2023 06:47:42 -0700 Breno Leitao wrote:
> Split __sys_getsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_getsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
>
> do_sock_getsockopt() will be called by io_uring getsockopt() command
> operation in the following patch.
>
> The same was done for the setsockopt pair.
Acked-by: Jakub Kicinski <[email protected]>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (10 preceding siblings ...)
2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
@ 2023-10-19 14:58 ` Jens Axboe
2023-10-19 15:33 ` Jakub Kicinski
2023-10-19 15:41 ` Jens Axboe
12 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2023-10-19 14:58 UTC (permalink / raw)
To: Breno Leitao, sdf, asml.silence, willemdebruijn.kernel, kuba,
pabeni, martin.lau, krisman
Cc: bpf, linux-kernel, netdev, io-uring
On 10/16/23 7:47 AM, Breno Leitao wrote:
> This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
> and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
> SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
> and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
> SOL_SOCKET level, which seems to be the most common level parameter for
> get/setsockopt(2).
>
> In order to keep the implementation (and tests) simple, some refactors
> were done prior to the changes, as follows:
Looks like folks are mostly happy with this now, so the next question is
how to stage it?
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
@ 2023-10-19 15:33 ` Jakub Kicinski
2023-10-19 15:40 ` Jens Axboe
0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2023-10-19 15:33 UTC (permalink / raw)
To: Jens Axboe
Cc: Breno Leitao, sdf, asml.silence, willemdebruijn.kernel, pabeni,
martin.lau, krisman, bpf, linux-kernel, netdev, io-uring
On Thu, 19 Oct 2023 08:58:59 -0600 Jens Axboe wrote:
> On 10/16/23 7:47 AM, Breno Leitao wrote:
> > This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
> > and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
> > SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
> > and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
> > SOL_SOCKET level, which seems to be the most common level parameter for
> > get/setsockopt(2).
> >
> > In order to keep the implementation (and tests) simple, some refactors
> > were done prior to the changes, as follows:
>
> Looks like folks are mostly happy with this now, so the next question is
> how to stage it?
Would be good to get acks from BPF folks but AFAICT first four patches
apply cleanly for us now. If they apply cleanly for you I reckon you
can take them directly with io-uring. It's -rc7 time, with a bit of
luck we'll get to the merge window without a conflict.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
2023-10-19 15:33 ` Jakub Kicinski
@ 2023-10-19 15:40 ` Jens Axboe
0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-10-19 15:40 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Breno Leitao, sdf, asml.silence, willemdebruijn.kernel, pabeni,
martin.lau, krisman, bpf, linux-kernel, netdev, io-uring
On 10/19/23 9:33 AM, Jakub Kicinski wrote:
> On Thu, 19 Oct 2023 08:58:59 -0600 Jens Axboe wrote:
>> On 10/16/23 7:47 AM, Breno Leitao wrote:
>>> This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
>>> and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
>>> SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
>>> and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
>>> SOL_SOCKET level, which seems to be the most common level parameter for
>>> get/setsockopt(2).
>>>
>>> In order to keep the implementation (and tests) simple, some refactors
>>> were done prior to the changes, as follows:
>>
>> Looks like folks are mostly happy with this now, so the next question is
>> how to stage it?
>
> Would be good to get acks from BPF folks but AFAICT first four patches
Agree, those are still missing. BPF folks, do patches 1-2 look OK to
you?
> apply cleanly for us now. If they apply cleanly for you I reckon you
> can take them directly with io-uring. It's -rc7 time, with a bit of
> luck we'll get to the merge window without a conflict.
I'll tentatively setup a branch for this just to see if we run into
anything on the merge front. Depending on how the BPF side goes, I can
rebase/collect reviews/whatever as we go.
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
` (11 preceding siblings ...)
2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
@ 2023-10-19 15:41 ` Jens Axboe
12 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-10-19 15:41 UTC (permalink / raw)
To: sdf, asml.silence, willemdebruijn.kernel, kuba, pabeni,
martin.lau, krisman, Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring
On Mon, 16 Oct 2023 06:47:38 -0700, Breno Leitao wrote:
> This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
> and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
> SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
> and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
> SOL_SOCKET level, which seems to be the most common level parameter for
> get/setsockopt(2).
>
> [...]
Applied, thanks!
[01/11] bpf: Add sockptr support for getsockopt
commit: 7cb15cc7e081730df3392f136a8789f3d2c3fd66
[02/11] bpf: Add sockptr support for setsockopt
commit: c028f6e54aa180747e384796760eee3bd78e0891
[03/11] net/socket: Break down __sys_setsockopt
commit: e70464dcdcddb5128fe7956bf809683824c64de5
[04/11] net/socket: Break down __sys_getsockopt
commit: 25f82732c8352bd0bec33c5a9989fd46cac5789f
[05/11] io_uring/cmd: Pass compat mode in issue_flags
commit: 66c87d5639f2f80421b3a01f12dcb7718f996093
[06/11] tools headers: Grab copy of io_uring.h
commit: c36507ed1a2c2cb05c4a2aad9acb39ca5d7c12fe
[07/11] selftests/net: Extract uring helpers to be reusable
commit: 11336afdd4141bbbd144b118a8a559b1993dc5d2
[08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled
commit: d807234143872e460cdf851f1b2bbda2b427f95d
[09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
commit: c3199f61b896cdef3664dc12729a2beadf322783
[10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
commit: 43ad652250d24e9496f4cd6a0d670417807ac9a0
[11/11] selftests/bpf/sockopt: Add io_uring support
commit: d9710f1d12a99738ff168e252ab8e9ffdeb90ed5
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
2023-10-19 0:20 ` Jakub Kicinski
@ 2023-10-19 19:12 ` Martin KaFai Lau
2023-10-19 20:04 ` Jens Axboe
1 sibling, 1 reply; 27+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:12 UTC (permalink / raw)
To: Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
Alexander Mikhalitsyn, David Howells, sdf, axboe, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
Eric Dumazet
On 10/16/23 6:47 AM, Breno Leitao wrote:
> diff --git a/net/socket.c b/net/socket.c
> index 0087f8c071e7..f4c156a1987e 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
> INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
> int optname));
>
> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> + int optname, sockptr_t optval, sockptr_t optlen)
> +{
> + int max_optlen __maybe_unused;
> + const struct proto_ops *ops;
> + int err;
> +
> + err = security_socket_getsockopt(sock, level, optname);
> + if (err)
> + return err;
> +
> + ops = READ_ONCE(sock->ops);
> + if (level == SOL_SOCKET) {
> + err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
> + } else if (unlikely(!ops->getsockopt)) {
> + err = -EOPNOTSUPP;
> + } else {
> + if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
> + "Invalid argument type"))
> + return -EOPNOTSUPP;
> +
> + err = ops->getsockopt(sock, level, optname, optval.user,
> + optlen.user);
> + }
> +
> + if (!compat) {
> + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch
it because it cannot apply patch 5 cleanly. I ran the following out of the
linux-block tree:
$> ./test_progs -t sockopt_sk
test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
run_test:PASS:skel_load 0 nsec
run_test:PASS:setsockopt_link 0 nsec
run_test:PASS:getsockopt_link 0 nsec
(/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111:
errno: Operation not permitted) Failed to call getsockopt, ret=-1
run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
#217 sockopt_sk:FAIL
> + err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
> + optval, optlen, max_optlen,
> + err);
> + }
> +
> + return err;
> +}
> +EXPORT_SYMBOL(do_sock_getsockopt);
> +
> /*
> * Get a socket option. Because we don't know the option lengths we have
> * to pass a user mode parameter for the protocols to sort out.
> @@ -2357,37 +2393,18 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
> int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
> int __user *optlen)
> {
> - int max_optlen __maybe_unused;
> - const struct proto_ops *ops;
> int err, fput_needed;
> struct socket *sock;
> + bool compat;
>
> sock = sockfd_lookup_light(fd, &err, &fput_needed);
> if (!sock)
> return err;
>
> - err = security_socket_getsockopt(sock, level, optname);
> - if (err)
> - goto out_put;
> -
> - if (!in_compat_syscall())
> - max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
The old max_optlen was done here.
> + compat = in_compat_syscall();
> + err = do_sock_getsockopt(sock, compat, level, optname,
> + USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
>
> - ops = READ_ONCE(sock->ops);
> - if (level == SOL_SOCKET)
> - err = sock_getsockopt(sock, level, optname, optval, optlen);
> - else if (unlikely(!ops->getsockopt))
> - err = -EOPNOTSUPP;
> - else
> - err = ops->getsockopt(sock, level, optname, optval,
> - optlen);
> -
> - if (!in_compat_syscall())
> - err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
> - USER_SOCKPTR(optval),
> - USER_SOCKPTR(optlen),
> - max_optlen, err);
> -out_put:
> fput_light(sock->file, fput_needed);
> return err;
> }
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 01/11] bpf: Add sockptr support for getsockopt
2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
@ 2023-10-19 19:24 ` Martin KaFai Lau
0 siblings, 0 replies; 27+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:24 UTC (permalink / raw)
To: Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, sdf, axboe, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
Eric Dumazet
On 10/16/23 6:47 AM, Breno Leitao wrote:
> The whole network stack uses sockptr, and while it doesn't move to
> something more modern, let's use sockptr in getsockptr BPF hooks, so, it
> could be used by other callers.
>
> The main motivation for this change is to use it in the io_uring
> {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
> kernel value for optlen.
Acked-by: Martin KaFai Lau <[email protected]>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 02/11] bpf: Add sockptr support for setsockopt
2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
@ 2023-10-19 19:32 ` Martin KaFai Lau
0 siblings, 0 replies; 27+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:32 UTC (permalink / raw)
To: Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, sdf, axboe, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
Eric Dumazet
On 10/16/23 6:47 AM, Breno Leitao wrote:
> The whole network stack uses sockptr, and while it doesn't move to
> something more modern, let's use sockptr in setsockptr BPF hooks, so, it
> could be used by other callers.
>
> The main motivation for this change is to use it in the io_uring
> {g,s}etsockopt(), which will use a userspace pointer for *optval, but, a
> kernel value for optlen.
Acked-by: Martin KaFai Lau <[email protected]>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 03/11] net/socket: Break down __sys_setsockopt
2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
2023-10-19 0:20 ` Jakub Kicinski
@ 2023-10-19 19:38 ` Martin KaFai Lau
1 sibling, 0 replies; 27+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:38 UTC (permalink / raw)
To: Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, Willem de Bruijn, sdf, axboe,
asml.silence, willemdebruijn.kernel, kuba, pabeni, krisman,
David S. Miller, Eric Dumazet
On 10/16/23 6:47 AM, Breno Leitao wrote:
> Split __sys_setsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_setsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
>
> do_sock_setsockopt() will be called by io_uring setsockopt() command
> operation in the following patch.
Acked-by: Martin KaFai Lau <[email protected]>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support
2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
@ 2023-10-19 19:51 ` Martin KaFai Lau
0 siblings, 0 replies; 27+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 19:51 UTC (permalink / raw)
To: Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, Daniel Müller,
open list:KERNEL SELFTEST FRAMEWORK, sdf, axboe, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
John Fastabend, KP Singh, Hao Luo, Jiri Olsa, Mykola Lysenko,
Shuah Khan
On 10/16/23 6:47 AM, Breno Leitao wrote:
> Expand the sockopt test to use also check for io_uring {g,s}etsockopt
> commands operations.
>
> This patch starts by marking each test if they support io_uring support
> or not.
>
> Right now, io_uring cmd getsockopt() has a limitation of only
> accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Since
> there aren't any test exercising getsockopt(level == SOL_SOCKET), this
> patch changes two tests to use level == SOL_SOCKET, they are
> "getsockopt: support smaller ctx->optlen" and "getsockopt: read
> ctx->optlen".
> There is no limitation for the setsockopt() part.
>
> Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
> liburing is supported, execute the same test (again), but calling
> liburing {g,s}setsockopt commands.
>
> This patch also changes the level of two tests to use SOL_SOCKET for the
> following two tests. This is going to help to exercise the io_uring
> subsystem:
> * getsockopt: read ctx->optlen
> * getsockopt: support smaller ctx->optlen
Acked-by: Martin KaFai Lau <[email protected]>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
2023-10-19 19:12 ` Martin KaFai Lau
@ 2023-10-19 20:04 ` Jens Axboe
2023-10-19 20:37 ` Martin KaFai Lau
0 siblings, 1 reply; 27+ messages in thread
From: Jens Axboe @ 2023-10-19 20:04 UTC (permalink / raw)
To: Martin KaFai Lau, Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
Alexander Mikhalitsyn, David Howells, sdf, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
Eric Dumazet
On 10/19/23 1:12 PM, Martin KaFai Lau wrote:
> On 10/16/23 6:47?AM, Breno Leitao wrote:
>> diff --git a/net/socket.c b/net/socket.c
>> index 0087f8c071e7..f4c156a1987e 100644
>> --- a/net/socket.c
>> +++ b/net/socket.c
>> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>> INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>> int optname));
>> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>> + int optname, sockptr_t optval, sockptr_t optlen)
>> +{
>> + int max_optlen __maybe_unused;
>> + const struct proto_ops *ops;
>> + int err;
>> +
>> + err = security_socket_getsockopt(sock, level, optname);
>> + if (err)
>> + return err;
>> +
>> + ops = READ_ONCE(sock->ops);
>> + if (level == SOL_SOCKET) {
>> + err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
>> + } else if (unlikely(!ops->getsockopt)) {
>> + err = -EOPNOTSUPP;
>> + } else {
>> + if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
>> + "Invalid argument type"))
>> + return -EOPNOTSUPP;
>> +
>> + err = ops->getsockopt(sock, level, optname, optval.user,
>> + optlen.user);
>> + }
>> +
>> + if (!compat) {
>> + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>
> The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch it because it cannot apply patch 5 cleanly. I ran the following out of the linux-block tree:
>
> $> ./test_progs -t sockopt_sk
> test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
> run_test:PASS:skel_load 0 nsec
> run_test:PASS:setsockopt_link 0 nsec
> run_test:PASS:getsockopt_link 0 nsec
> (/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: errno: Operation not permitted) Failed to call getsockopt, ret=-1
> run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
> #217 sockopt_sk:FAIL
Does it work with this incremental? I can fold that in, will rebase
anyway to collect acks.
diff --git a/net/socket.c b/net/socket.c
index bccd257e13fe..eb6960958026 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2344,6 +2344,9 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
if (err)
return err;
+ if (!compat)
+ max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+
ops = READ_ONCE(sock->ops);
if (level == SOL_SOCKET) {
err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
@@ -2358,12 +2361,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
optlen.user);
}
- if (!compat) {
- max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+ if (!compat)
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
optval, optlen, max_optlen,
err);
- }
return err;
}
--
Jens Axboe
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
2023-10-19 20:04 ` Jens Axboe
@ 2023-10-19 20:37 ` Martin KaFai Lau
2023-10-19 22:32 ` Jens Axboe
0 siblings, 1 reply; 27+ messages in thread
From: Martin KaFai Lau @ 2023-10-19 20:37 UTC (permalink / raw)
To: Jens Axboe, Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
Alexander Mikhalitsyn, David Howells, sdf, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
Eric Dumazet
On 10/19/23 1:04 PM, Jens Axboe wrote:
> On 10/19/23 1:12 PM, Martin KaFai Lau wrote:
>> On 10/16/23 6:47?AM, Breno Leitao wrote:
>>> diff --git a/net/socket.c b/net/socket.c
>>> index 0087f8c071e7..f4c156a1987e 100644
>>> --- a/net/socket.c
>>> +++ b/net/socket.c
>>> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>>> INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>>> int optname));
>>> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>>> + int optname, sockptr_t optval, sockptr_t optlen)
>>> +{
>>> + int max_optlen __maybe_unused;
>>> + const struct proto_ops *ops;
>>> + int err;
>>> +
>>> + err = security_socket_getsockopt(sock, level, optname);
>>> + if (err)
>>> + return err;
>>> +
>>> + ops = READ_ONCE(sock->ops);
>>> + if (level == SOL_SOCKET) {
>>> + err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
>>> + } else if (unlikely(!ops->getsockopt)) {
>>> + err = -EOPNOTSUPP;
>>> + } else {
>>> + if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
>>> + "Invalid argument type"))
>>> + return -EOPNOTSUPP;
>>> +
>>> + err = ops->getsockopt(sock, level, optname, optval.user,
>>> + optlen.user);
>>> + }
>>> +
>>> + if (!compat) {
>>> + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>>
>> The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch it because it cannot apply patch 5 cleanly. I ran the following out of the linux-block tree:
>>
>> $> ./test_progs -t sockopt_sk
>> test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
>> run_test:PASS:skel_load 0 nsec
>> run_test:PASS:setsockopt_link 0 nsec
>> run_test:PASS:getsockopt_link 0 nsec
>> (/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: errno: Operation not permitted) Failed to call getsockopt, ret=-1
>> run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
>> #217 sockopt_sk:FAIL
>
> Does it work with this incremental? I can fold that in, will rebase
> anyway to collect acks.
Yes, that should work.
Acked-by: Martin KaFai Lau <[email protected]>
>
>
> diff --git a/net/socket.c b/net/socket.c
> index bccd257e13fe..eb6960958026 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -2344,6 +2344,9 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> if (err)
> return err;
>
> + if (!compat)
> + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> +
> ops = READ_ONCE(sock->ops);
> if (level == SOL_SOCKET) {
> err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
> @@ -2358,12 +2361,10 @@ int do_sock_getsockopt(struct socket *sock, bool compat, int level,
> optlen.user);
> }
>
> - if (!compat) {
> - max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
> + if (!compat)
> err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
> optval, optlen, max_optlen,
> err);
> - }
>
> return err;
> }
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v7 04/11] net/socket: Break down __sys_getsockopt
2023-10-19 20:37 ` Martin KaFai Lau
@ 2023-10-19 22:32 ` Jens Axboe
0 siblings, 0 replies; 27+ messages in thread
From: Jens Axboe @ 2023-10-19 22:32 UTC (permalink / raw)
To: Martin KaFai Lau, Breno Leitao
Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
Alexander Mikhalitsyn, David Howells, sdf, asml.silence,
willemdebruijn.kernel, kuba, pabeni, krisman, Alexei Starovoitov,
Daniel Borkmann, John Fastabend, Andrii Nakryiko, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
Eric Dumazet
On 10/19/23 2:37 PM, Martin KaFai Lau wrote:
> On 10/19/23 1:04?PM, Jens Axboe wrote:
>> On 10/19/23 1:12 PM, Martin KaFai Lau wrote:
>>> On 10/16/23 6:47?AM, Breno Leitao wrote:
>>>> diff --git a/net/socket.c b/net/socket.c
>>>> index 0087f8c071e7..f4c156a1987e 100644
>>>> --- a/net/socket.c
>>>> +++ b/net/socket.c
>>>> @@ -2350,6 +2350,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
>>>> INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
>>>> int optname));
>>>> +int do_sock_getsockopt(struct socket *sock, bool compat, int level,
>>>> + int optname, sockptr_t optval, sockptr_t optlen)
>>>> +{
>>>> + int max_optlen __maybe_unused;
>>>> + const struct proto_ops *ops;
>>>> + int err;
>>>> +
>>>> + err = security_socket_getsockopt(sock, level, optname);
>>>> + if (err)
>>>> + return err;
>>>> +
>>>> + ops = READ_ONCE(sock->ops);
>>>> + if (level == SOL_SOCKET) {
>>>> + err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
>>>> + } else if (unlikely(!ops->getsockopt)) {
>>>> + err = -EOPNOTSUPP;
>>>> + } else {
>>>> + if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
>>>> + "Invalid argument type"))
>>>> + return -EOPNOTSUPP;
>>>> +
>>>> + err = ops->getsockopt(sock, level, optname, optval.user,
>>>> + optlen.user);
>>>> + }
>>>> +
>>>> + if (!compat) {
>>>> + max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
>>>
>>> The max_optlen was done before the above sk_getsockopt. The bpf CI cannot catch it because it cannot apply patch 5 cleanly. I ran the following out of the linux-block tree:
>>>
>>> $> ./test_progs -t sockopt_sk
>>> test_sockopt_sk:PASS:join_cgroup /sockopt_sk 0 nsec
>>> run_test:PASS:skel_load 0 nsec
>>> run_test:PASS:setsockopt_link 0 nsec
>>> run_test:PASS:getsockopt_link 0 nsec
>>> (/data/users/kafai/fb-kernel/linux/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c:111: errno: Operation not permitted) Failed to call getsockopt, ret=-1
>>> run_test:FAIL:getsetsockopt unexpected error: -1 (errno 1)
>>> #217 sockopt_sk:FAIL
>>
>> Does it work with this incremental? I can fold that in, will rebase
>> anyway to collect acks.
>
> Yes, that should work.
>
> Acked-by: Martin KaFai Lau <[email protected]>
Thanks Martin, I'll add that too.
--
Jens Axboe
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2023-10-19 22:32 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-16 13:47 [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-10-16 13:47 ` [PATCH v7 01/11] bpf: Add sockptr support for getsockopt Breno Leitao
2023-10-19 19:24 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 02/11] bpf: Add sockptr support for setsockopt Breno Leitao
2023-10-19 19:32 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 03/11] net/socket: Break down __sys_setsockopt Breno Leitao
2023-10-19 0:20 ` Jakub Kicinski
2023-10-19 19:38 ` Martin KaFai Lau
2023-10-16 13:47 ` [PATCH v7 04/11] net/socket: Break down __sys_getsockopt Breno Leitao
2023-10-19 0:20 ` Jakub Kicinski
2023-10-19 19:12 ` Martin KaFai Lau
2023-10-19 20:04 ` Jens Axboe
2023-10-19 20:37 ` Martin KaFai Lau
2023-10-19 22:32 ` Jens Axboe
2023-10-16 13:47 ` [PATCH v7 05/11] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
2023-10-16 13:47 ` [PATCH v7 06/11] tools headers: Grab copy of io_uring.h Breno Leitao
[not found] ` <[email protected]>
2023-10-17 12:34 ` Breno Leitao
2023-10-16 13:47 ` [PATCH v7 07/11] selftests/net: Extract uring helpers to be reusable Breno Leitao
2023-10-16 13:47 ` [PATCH v7 08/11] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
2023-10-16 13:47 ` [PATCH v7 09/11] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
2023-10-16 13:47 ` [PATCH v7 10/11] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
2023-10-16 13:47 ` [PATCH v7 11/11] selftests/bpf/sockopt: Add io_uring support Breno Leitao
2023-10-19 19:51 ` Martin KaFai Lau
2023-10-19 14:58 ` [PATCH v7 00/11] io_uring: Initial support for {s,g}etsockopt commands Jens Axboe
2023-10-19 15:33 ` Jakub Kicinski
2023-10-19 15:40 ` Jens Axboe
2023-10-19 15:41 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox