[PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands

public inbox for [email protected]
 help / color / mirror / Atom feed

* [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
@ 2023-09-04 16:24 Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 01/10] bpf: Leverage sockptr_t in BPF getsockopt hook Breno Leitao
                   ` (10 more replies)
  0 siblings, 11 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman
  Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni

This patchset adds support for getsockopt (SOCKET_URING_OP_GETSOCKOPT)
and setsockopt (SOCKET_URING_OP_SETSOCKOPT) in io_uring commands.
SOCKET_URING_OP_SETSOCKOPT implements generic case, covering all levels
and optnames. SOCKET_URING_OP_GETSOCKOPT is limited, for now, to
SOL_SOCKET level, which seems to be the most common level parameter for
get/setsockopt(2).

In order to keep the implementation (and tests) simple, some refactors
were done prior to the changes, as follows:

Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
become flexible enough to accept user or kernel pointers for optval/optlen.

Patch 3-4:  Remove the core {s,g}etsockopt() core function from
__sys_{g,s}etsockopt, so, the code could be reused by other callers,
such as io_uring.

Patch 5: Pass compat mode to the file/socket callbacks

Patch 6: Move io_uring helpers from io_uring_zerocopy_tx to a generic
io_uring headers. This simplify the test case (last patch)

Patch 7: Protect io_uring_cmd_sock() to not be called if CONFIG_NET is
disabled.

PS1: For getsockopt command, the optlen field is not a userspace
pointers, but an absolute value, so this is slightly different from
getsockopt(2) behaviour. The new optlen value is returned in cqe->res.

PS2: The userspace pointers need to be alive until the operation is
completed.

These changes were tested with a new test[1] in liburing, LTP sockopt*
tests, as also with bpf/progs/sockopt test case, which is now adapted to
run using both system calls and io_uring commands.

[1] Link: https://github.com/leitao/liburing/blob/getsockopt/test/socket-getsetsock-cmd.c

RFC -> V1:
	* Copy user memory at io_uring subsystem, and call proto_ops
	  callbacks using kernel memory
	* Implement all the cases for SOCKET_URING_OP_SETSOCKOPT

V1 -> V2
	* Implemented the BPF part
	* Using user pointers from optval to avoid kmalloc in io_uring part.

V2 -> V3:
	* Break down __sys_setsockopt and reuse the core code, avoiding
	  duplicated code. This removed the requirement to expose
	  sock_use_custom_sol_socket().
	* Added io_uring test to selftests/bpf/sockopt.
	* Fixed compat argument, by passing it to the issue_flags.

V3 -> V4:
	* Rebase on top of commit 1ded5e5a5931b ("net: annotate data-races around sock->ops")
	* Also broke down __sys_setsockopt() to reuse the core function
	  from io_uring.
	* Create a new patch to return -EOPNOTSUPP if CONFIG_NET is
	  disabled
	* Added two SOL_SOCKET tests in bpf/prog_tests/sockopt.c

Breno Leitao (10):
  bpf: Leverage sockptr_t in BPF getsockopt hook
  bpf: Leverage sockptr_t in BPF setsockopt hook
  net/socket: Break down __sys_setsockopt
  net/socket: Break down __sys_getsockopt
  io_uring/cmd: Pass compat mode in issue_flags
  selftests/net: Extract uring helpers to be reusable
  io_uring/cmd: return -EOPNOTSUPP if net is disabled
  io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  selftests/bpf/sockopt: Add io_uring support

 include/linux/bpf-cgroup.h                    |   9 +-
 include/linux/io_uring.h                      |   1 +
 include/net/sock.h                            |   4 +
 include/uapi/linux/io_uring.h                 |   8 +
 io_uring/uring_cmd.c                          |  55 ++++
 kernel/bpf/cgroup.c                           |  25 +-
 net/core/sock.c                               |   8 -
 net/socket.c                                  | 102 ++++---
 tools/include/io_uring/mini_liburing.h        | 282 ++++++++++++++++++
 .../selftests/bpf/prog_tests/sockopt.c        | 113 ++++++-
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 268 +----------------
 12 files changed, 544 insertions(+), 332 deletions(-)
 create mode 100644 tools/include/io_uring/mini_liburing.h

-- 
2.34.1

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 01/10] bpf: Leverage sockptr_t in BPF getsockopt hook
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
@ 2023-09-04 16:24 ` Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 02/10] bpf: Leverage sockptr_t in BPF setsockopt hook Breno Leitao
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: bpf, linux-kernel, netdev, io-uring

Leverage sockptr_t structure to pass universal pointer as argument, that
holds either a userspace pointer, or, a kernel pointer.

This makes this function flexible, so, we can mix and match user and
kernel space pointers. The main motivation for this change is to use it
in the io_uring {g,s}etsockopt(), which will use a userspace pointer for
*optval, but, a kernel value for optlen.

Signed-off-by: Breno Leitao <[email protected]>
---
 include/linux/bpf-cgroup.h |  5 +++--
 kernel/bpf/cgroup.c        | 20 +++++++++++---------
 net/socket.c               |  5 +++--
 3 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 8506690dbb9c..f5b4fb6ed8c6 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -139,9 +139,10 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
 				       int *optname, char __user *optval,
 				       int *optlen, char **kernel_optval);
+
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
-				       int optname, char __user *optval,
-				       int __user *optlen, int max_optlen,
+				       int optname, sockptr_t optval,
+				       sockptr_t optlen, int max_optlen,
 				       int retval);
 
 int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 5b2741aa0d9b..ebc8c58f7e46 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1875,8 +1875,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 }
 
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
-				       int optname, char __user *optval,
-				       int __user *optlen, int max_optlen,
+				       int optname, sockptr_t optval,
+				       sockptr_t optlen, int max_optlen,
 				       int retval)
 {
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1903,8 +1903,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		 * one that kernel returned as well to let
 		 * BPF programs inspect the value.
 		 */
-
-		if (get_user(ctx.optlen, optlen)) {
+		if (copy_from_sockptr(&ctx.optlen, optlen,
+				      sizeof(ctx.optlen))) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1915,8 +1915,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 		}
 		orig_optlen = ctx.optlen;
 
-		if (copy_from_user(ctx.optval, optval,
-				   min(ctx.optlen, max_optlen)) != 0) {
+		if (copy_from_sockptr(ctx.optval, optval,
+				      min(ctx.optlen, max_optlen))) {
 			ret = -EFAULT;
 			goto out;
 		}
@@ -1930,7 +1930,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	if (ret < 0)
 		goto out;
 
-	if (optval && (ctx.optlen > max_optlen || ctx.optlen < 0)) {
+	if (!sockptr_is_null(optval) &&
+	    (ctx.optlen > max_optlen || ctx.optlen < 0)) {
 		if (orig_optlen > PAGE_SIZE && ctx.optlen >= 0) {
 			pr_info_once("bpf getsockopt: ignoring program buffer with optlen=%d (max_optlen=%d)\n",
 				     ctx.optlen, max_optlen);
@@ -1942,11 +1943,12 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
 	}
 
 	if (ctx.optlen != 0) {
-		if (optval && copy_to_user(optval, ctx.optval, ctx.optlen)) {
+		if (!sockptr_is_null(optval) &&
+		    copy_to_sockptr(optval, ctx.optval, ctx.optlen)) {
 			ret = -EFAULT;
 			goto out;
 		}
-		if (put_user(ctx.optlen, optlen)) {
+		if (copy_to_sockptr(optlen, &ctx.optlen, sizeof(ctx.optlen))) {
 			ret = -EFAULT;
 			goto out;
 		}
diff --git a/net/socket.c b/net/socket.c
index 77f28328e387..6fda5d011521 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2355,8 +2355,9 @@ int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 
 	if (!in_compat_syscall())
 		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
-						     optval, optlen, max_optlen,
-						     err);
+						     USER_SOCKPTR(optval),
+						     USER_SOCKPTR(optlen),
+						     max_optlen, err);
 out_put:
 	fput_light(sock->file, fput_needed);
 	return err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 02/10] bpf: Leverage sockptr_t in BPF setsockopt hook
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 01/10] bpf: Leverage sockptr_t in BPF getsockopt hook Breno Leitao
@ 2023-09-04 16:24 ` Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 03/10] net/socket: Break down __sys_setsockopt Breno Leitao
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Andrii Nakryiko, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: bpf, linux-kernel, netdev, io-uring

Change BPF setsockopt hook (__cgroup_bpf_run_filter_setsockopt()) to use
sockptr instead of user pointers. This brings flexibility to the
function, since it could be called with userspace or kernel pointers.

This change will allow the creation of a core sock_setsockopt, called
do_sock_setsockopt(), which will be called from both the system call path
and by io_uring command.

This also aligns with the getsockopt() counterpart, which is now using
sockptr_t universal pointer.

Signed-off-by: Breno Leitao <[email protected]>
---
 include/linux/bpf-cgroup.h | 2 +-
 kernel/bpf/cgroup.c        | 5 +++--
 net/socket.c               | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index f5b4fb6ed8c6..cecfe8c99f28 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -137,7 +137,7 @@ int __cgroup_bpf_run_filter_sysctl(struct ctl_table_header *head,
 				   enum cgroup_bpf_attach_type atype);
 
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sock, int *level,
-				       int *optname, char __user *optval,
+				       int *optname, sockptr_t optval,
 				       int *optlen, char **kernel_optval);
 
 int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ebc8c58f7e46..f0dedd4f7f2e 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1785,7 +1785,7 @@ static bool sockopt_buf_allocated(struct bpf_sockopt_kern *ctx,
 }
 
 int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
-				       int *optname, char __user *optval,
+				       int *optname, sockptr_t optval,
 				       int *optlen, char **kernel_optval)
 {
 	struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
@@ -1808,7 +1808,8 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
 
 	ctx.optlen = *optlen;
 
-	if (copy_from_user(ctx.optval, optval, min(*optlen, max_optlen)) != 0) {
+	if (copy_from_sockptr(ctx.optval, optval,
+			      min(*optlen, max_optlen))) {
 		ret = -EFAULT;
 		goto out;
 	}
diff --git a/net/socket.c b/net/socket.c
index 6fda5d011521..9ec9a8a07c0e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2287,7 +2287,7 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 
 	if (!in_compat_syscall())
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
-						     user_optval, &optlen,
+						     optval, &optlen,
 						     &kernel_optval);
 	if (err < 0)
 		goto out_put;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 03/10] net/socket: Break down __sys_setsockopt
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 01/10] bpf: Leverage sockptr_t in BPF getsockopt hook Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 02/10] bpf: Leverage sockptr_t in BPF setsockopt hook Breno Leitao
@ 2023-09-04 16:24 ` Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 04/10] net/socket: Break down __sys_getsockopt Breno Leitao
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: bpf, linux-kernel, netdev, io-uring, Willem de Bruijn

Split __sys_setsockopt() into two functions by removing the core
logic into a sub-function (do_sock_setsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.

do_sock_setsockopt() will be called by io_uring setsockopt() command
operation in the following patch.

Signed-off-by: Breno Leitao <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
---
 include/net/sock.h |  2 ++
 net/socket.c       | 39 +++++++++++++++++++++++++--------------
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 11d503417591..b059f9272303 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1861,6 +1861,8 @@ int sk_setsockopt(struct sock *sk, int level, int optname,
 		  sockptr_t optval, unsigned int optlen);
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, int optlen);
 
 int sk_getsockopt(struct sock *sk, int level, int optname,
 		  sockptr_t optval, sockptr_t optlen);
diff --git a/net/socket.c b/net/socket.c
index 9ec9a8a07c0e..3bf29a27653f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2261,31 +2261,21 @@ static bool sock_use_custom_sol_socket(const struct socket *sock)
 	return test_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
 }
 
-/*
- *	Set a socket option. Because we don't know the option lengths we have
- *	to pass the user mode parameter for the protocols to sort out.
- */
-int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
-		int optlen)
+int do_sock_setsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, int optlen)
 {
-	sockptr_t optval = USER_SOCKPTR(user_optval);
 	const struct proto_ops *ops;
 	char *kernel_optval = NULL;
-	int err, fput_needed;
-	struct socket *sock;
+	int err;
 
 	if (optlen < 0)
 		return -EINVAL;
 
-	sock = sockfd_lookup_light(fd, &err, &fput_needed);
-	if (!sock)
-		return err;
-
 	err = security_socket_setsockopt(sock, level, optname);
 	if (err)
 		goto out_put;
 
-	if (!in_compat_syscall())
+	if (!compat)
 		err = BPF_CGROUP_RUN_PROG_SETSOCKOPT(sock->sk, &level, &optname,
 						     optval, &optlen,
 						     &kernel_optval);
@@ -2308,6 +2298,27 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
 					    optlen);
 	kfree(kernel_optval);
 out_put:
+	return err;
+}
+EXPORT_SYMBOL(do_sock_setsockopt);
+
+/* Set a socket option. Because we don't know the option lengths we have
+ * to pass the user mode parameter for the protocols to sort out.
+ */
+int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
+		     int optlen)
+{
+	sockptr_t optval = USER_SOCKPTR(user_optval);
+	bool compat = in_compat_syscall();
+	int err, fput_needed;
+	struct socket *sock;
+
+	sock = sockfd_lookup_light(fd, &err, &fput_needed);
+	if (!sock)
+		return err;
+
+	err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+
 	fput_light(sock->file, fput_needed);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 04/10] net/socket: Break down __sys_getsockopt
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (2 preceding siblings ...)
  2023-09-04 16:24 ` [PATCH v4 03/10] net/socket: Break down __sys_setsockopt Breno Leitao
@ 2023-09-04 16:24 ` Breno Leitao
  2023-09-05  9:36   ` David Laight
  2023-09-05 10:29   ` Andy Shevchenko
  2023-09-04 16:24 ` [PATCH v4 05/10] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Song Liu, Yonghong Song, John Fastabend, KP Singh, Hao Luo,
	Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: bpf, linux-kernel, netdev, io-uring, Kuniyuki Iwashima,
	Alexander Mikhalitsyn, Xin Long, David Howells, Jason Xing,
	Andy Shevchenko

Split __sys_getsockopt() into two functions by removing the core
logic into a sub-function (do_sock_getsockopt()). This will avoid
code duplication when doing the same operation in other callers, for
instance.

do_sock_getsockopt() will be called by io_uring getsockopt() command
operation in the following patch.

The same was done for the setsockopt pair.

Suggested-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
 include/linux/bpf-cgroup.h |  2 +-
 include/net/sock.h         |  2 ++
 net/core/sock.c            |  8 -----
 net/socket.c               | 62 ++++++++++++++++++++++++--------------
 4 files changed, 42 insertions(+), 32 deletions(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index cecfe8c99f28..ffaca1ab5e8d 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -378,7 +378,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
 ({									       \
 	int __ret = 0;							       \
 	if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT))			       \
-		get_user(__ret, optlen);				       \
+		copy_from_sockptr(&__ret, optlen, sizeof(int));		       \
 	__ret;								       \
 })
 
diff --git a/include/net/sock.h b/include/net/sock.h
index b059f9272303..c0185121efe4 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1863,6 +1863,8 @@ int sock_setsockopt(struct socket *sock, int level, int op,
 		    sockptr_t optval, unsigned int optlen);
 int do_sock_setsockopt(struct socket *sock, bool compat, int level,
 		       int optname, sockptr_t optval, int optlen);
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, sockptr_t optlen);
 
 int sk_getsockopt(struct sock *sk, int level, int optname,
 		  sockptr_t optval, sockptr_t optlen);
diff --git a/net/core/sock.c b/net/core/sock.c
index 666a17cab4f5..cf15394ed664 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2009,14 +2009,6 @@ int sk_getsockopt(struct sock *sk, int level, int optname,
 	return 0;
 }
 
-int sock_getsockopt(struct socket *sock, int level, int optname,
-		    char __user *optval, int __user *optlen)
-{
-	return sk_getsockopt(sock->sk, level, optname,
-			     USER_SOCKPTR(optval),
-			     USER_SOCKPTR(optlen));
-}
-
 /*
  * Initialize an sk_lock.
  *
diff --git a/net/socket.c b/net/socket.c
index 3bf29a27653f..c79d2b2b902e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2332,6 +2332,42 @@ SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
 INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
 							 int optname));
 
+int do_sock_getsockopt(struct socket *sock, bool compat, int level,
+		       int optname, sockptr_t optval, sockptr_t optlen)
+{
+	int max_optlen __maybe_unused;
+	const struct proto_ops *ops;
+	int err;
+
+	err = security_socket_getsockopt(sock, level, optname);
+	if (err)
+		return err;
+
+	ops = READ_ONCE(sock->ops);
+	if (level == SOL_SOCKET) {
+		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
+	} else if (unlikely(!ops->getsockopt)) {
+		err = -EOPNOTSUPP;
+	} else {
+		if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
+			      "Invalid argument type"))
+			return -EOPNOTSUPP;
+
+		err = ops->getsockopt(sock, level, optname, optval.user,
+				      optlen.user);
+	}
+
+	if (!compat) {
+		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
+						     optval, optlen, max_optlen,
+						     err);
+	}
+
+	return err;
+}
+EXPORT_SYMBOL(do_sock_getsockopt);
+
 /*
  *	Get a socket option. Because we don't know the option lengths we have
  *	to pass a user mode parameter for the protocols to sort out.
@@ -2339,37 +2375,17 @@ INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
 int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
 		int __user *optlen)
 {
-	int max_optlen __maybe_unused;
-	const struct proto_ops *ops;
 	int err, fput_needed;
+	bool compat = in_compat_syscall();
 	struct socket *sock;
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (!sock)
 		return err;
 
-	err = security_socket_getsockopt(sock, level, optname);
-	if (err)
-		goto out_put;
-
-	if (!in_compat_syscall())
-		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);
+	err = do_sock_getsockopt(sock, compat, level, optname,
+				 USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
 
-	ops = READ_ONCE(sock->ops);
-	if (level == SOL_SOCKET)
-		err = sock_getsockopt(sock, level, optname, optval, optlen);
-	else if (unlikely(!ops->getsockopt))
-		err = -EOPNOTSUPP;
-	else
-		err = ops->getsockopt(sock, level, optname, optval,
-					    optlen);
-
-	if (!in_compat_syscall())
-		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
-						     USER_SOCKPTR(optval),
-						     USER_SOCKPTR(optlen),
-						     max_optlen, err);
-out_put:
 	fput_light(sock->file, fput_needed);
 	return err;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* RE: [PATCH v4 04/10] net/socket: Break down __sys_getsockopt
  2023-09-04 16:24 ` [PATCH v4 04/10] net/socket: Break down __sys_getsockopt Breno Leitao
@ 2023-09-05  9:36   ` David Laight
  2023-09-05 10:29   ` Andy Shevchenko
  1 sibling, 0 replies; 23+ messages in thread
From: David Laight @ 2023-09-05  9:36 UTC (permalink / raw)
  To: 'Breno Leitao', [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: [email protected], [email protected],
	[email protected], [email protected],
	Kuniyuki Iwashima, Alexander Mikhalitsyn, Xin Long, David Howells,
	Jason Xing, Andy Shevchenko

From: Breno Leitao
> Sent: 04 September 2023 17:25
> 
> Split __sys_getsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_getsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.

Although a lot more work, I think (others may disagree) that
the internal getsockopt() functions should be changed to take
the length as a parameter and return the positive value
to write back.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 04/10] net/socket: Break down __sys_getsockopt
  2023-09-04 16:24 ` [PATCH v4 04/10] net/socket: Break down __sys_getsockopt Breno Leitao
  2023-09-05  9:36   ` David Laight
@ 2023-09-05 10:29   ` Andy Shevchenko
  1 sibling, 0 replies; 23+ messages in thread
From: Andy Shevchenko @ 2023-09-05 10:29 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Song Liu, Yonghong Song, John Fastabend, KP Singh, Hao Luo,
	Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, bpf, linux-kernel, netdev, io-uring,
	Kuniyuki Iwashima, Alexander Mikhalitsyn, Xin Long, David Howells,
	Jason Xing

On Mon, Sep 04, 2023 at 09:24:57AM -0700, Breno Leitao wrote:
> Split __sys_getsockopt() into two functions by removing the core
> logic into a sub-function (do_sock_getsockopt()). This will avoid
> code duplication when doing the same operation in other callers, for
> instance.
> 
> do_sock_getsockopt() will be called by io_uring getsockopt() command
> operation in the following patch.
> 
> The same was done for the setsockopt pair.

...

> +	ops = READ_ONCE(sock->ops);
> +	if (level == SOL_SOCKET) {
> +		err = sk_getsockopt(sock->sk, level, optname, optval, optlen);
> +	} else if (unlikely(!ops->getsockopt)) {
> +		err = -EOPNOTSUPP;
> +	} else {
> +		if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
> +			      "Invalid argument type"))
> +			return -EOPNOTSUPP;
> +
> +		err = ops->getsockopt(sock, level, optname, optval.user,
> +				      optlen.user);
> +	}

Can be written as

	} else if (WARN_ONCE(optval.is_kernel || optlen.is_kernel,
			     "Invalid argument type"))
		return -EOPNOTSUPP;
	} else {
		err = ops->getsockopt(sock, level, optname, optval.user,
				      optlen.user);
	}

With that done, the {} are not needed anymore.

> +	if (!compat) {

	if (compat)
		return err;

> +		max_optlen = BPF_CGROUP_GETSOCKOPT_MAX_OPTLEN(optlen);

> +		err = BPF_CGROUP_RUN_PROG_GETSOCKOPT(sock->sk, level, optname,
> +						     optval, optlen, max_optlen,
> +						     err);

	return ... ?

> +	}
> +
> +	return err;
> +}

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 05/10] io_uring/cmd: Pass compat mode in issue_flags
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (3 preceding siblings ...)
  2023-09-04 16:24 ` [PATCH v4 04/10] net/socket: Break down __sys_getsockopt Breno Leitao
@ 2023-09-04 16:24 ` Breno Leitao
  2023-09-04 16:24 ` [PATCH v4 06/10] selftests/net: Extract uring helpers to be reusable Breno Leitao
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman
  Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni

Create a new flag to track if the operation is running compat mode.
This basically check the context->compat and pass it to the issue_flags,
so, it could be queried later in the callbacks.

Signed-off-by: Breno Leitao <[email protected]>
---
 include/linux/io_uring.h | 1 +
 io_uring/uring_cmd.c     | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 106cdc55ff3b..bc53b35966ed 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -20,6 +20,7 @@ enum io_uring_cmd_flags {
 	IO_URING_F_SQE128		= (1 << 8),
 	IO_URING_F_CQE32		= (1 << 9),
 	IO_URING_F_IOPOLL		= (1 << 10),
+	IO_URING_F_COMPAT		= (1 << 11),
 };
 
 struct io_uring_cmd {
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 537795fddc87..60f843a357e0 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -128,6 +128,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 		issue_flags |= IO_URING_F_SQE128;
 	if (ctx->flags & IORING_SETUP_CQE32)
 		issue_flags |= IO_URING_F_CQE32;
+	if (ctx->compat)
+		issue_flags |= IO_URING_F_COMPAT;
 	if (ctx->flags & IORING_SETUP_IOPOLL) {
 		if (!file->f_op->uring_cmd_iopoll)
 			return -EOPNOTSUPP;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 06/10] selftests/net: Extract uring helpers to be reusable
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (4 preceding siblings ...)
  2023-09-04 16:24 ` [PATCH v4 05/10] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
@ 2023-09-04 16:24 ` Breno Leitao
  2023-09-04 16:25 ` [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:24 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Shuah Khan
  Cc: bpf, linux-kernel, netdev, io-uring,
	open list:KERNEL SELFTEST FRAMEWORK

Instead of defining basic io_uring functions in the test case, move them
to a common directory, so, other tests can use them.

This simplify the test code and reuse the common liburing
infrastructure. This is basically a copy of what we have in
io_uring_zerocopy_tx with some minor improvements to make checkpatch
happy.

A follow-up test will use the same helpers in a BPF sockopt test.

Signed-off-by: Breno Leitao <[email protected]>
---
 tools/include/io_uring/mini_liburing.h        | 282 ++++++++++++++++++
 tools/testing/selftests/net/Makefile          |   1 +
 .../selftests/net/io_uring_zerocopy_tx.c      | 268 +----------------
 3 files changed, 285 insertions(+), 266 deletions(-)
 create mode 100644 tools/include/io_uring/mini_liburing.h

diff --git a/tools/include/io_uring/mini_liburing.h b/tools/include/io_uring/mini_liburing.h
new file mode 100644
index 000000000000..9ccb16074eb5
--- /dev/null
+++ b/tools/include/io_uring/mini_liburing.h
@@ -0,0 +1,282 @@
+/* SPDX-License-Identifier: MIT */
+
+#include <linux/io_uring.h>
+#include <sys/mman.h>
+#include <sys/syscall.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+
+struct io_sq_ring {
+	unsigned int *head;
+	unsigned int *tail;
+	unsigned int *ring_mask;
+	unsigned int *ring_entries;
+	unsigned int *flags;
+	unsigned int *array;
+};
+
+struct io_cq_ring {
+	unsigned int *head;
+	unsigned int *tail;
+	unsigned int *ring_mask;
+	unsigned int *ring_entries;
+	struct io_uring_cqe *cqes;
+};
+
+struct io_uring_sq {
+	unsigned int *khead;
+	unsigned int *ktail;
+	unsigned int *kring_mask;
+	unsigned int *kring_entries;
+	unsigned int *kflags;
+	unsigned int *kdropped;
+	unsigned int *array;
+	struct io_uring_sqe *sqes;
+
+	unsigned int sqe_head;
+	unsigned int sqe_tail;
+
+	size_t ring_sz;
+};
+
+struct io_uring_cq {
+	unsigned int *khead;
+	unsigned int *ktail;
+	unsigned int *kring_mask;
+	unsigned int *kring_entries;
+	unsigned int *koverflow;
+	struct io_uring_cqe *cqes;
+
+	size_t ring_sz;
+};
+
+struct io_uring {
+	struct io_uring_sq sq;
+	struct io_uring_cq cq;
+	int ring_fd;
+};
+
+#if defined(__x86_64) || defined(__i386__)
+#define read_barrier()	__asm__ __volatile__("":::"memory")
+#define write_barrier()	__asm__ __volatile__("":::"memory")
+#else
+#define read_barrier()	__sync_synchronize()
+#define write_barrier()	__sync_synchronize()
+#endif
+
+static inline int io_uring_mmap(int fd, struct io_uring_params *p,
+				struct io_uring_sq *sq, struct io_uring_cq *cq)
+{
+	size_t size;
+	void *ptr;
+	int ret;
+
+	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned int);
+	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
+	if (ptr == MAP_FAILED)
+		return -errno;
+	sq->khead = ptr + p->sq_off.head;
+	sq->ktail = ptr + p->sq_off.tail;
+	sq->kring_mask = ptr + p->sq_off.ring_mask;
+	sq->kring_entries = ptr + p->sq_off.ring_entries;
+	sq->kflags = ptr + p->sq_off.flags;
+	sq->kdropped = ptr + p->sq_off.dropped;
+	sq->array = ptr + p->sq_off.array;
+
+	size = p->sq_entries * sizeof(struct io_uring_sqe);
+	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
+			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
+	if (sq->sqes == MAP_FAILED) {
+		ret = -errno;
+err:
+		munmap(sq->khead, sq->ring_sz);
+		return ret;
+	}
+
+	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
+	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
+		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
+	if (ptr == MAP_FAILED) {
+		ret = -errno;
+		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
+		goto err;
+	}
+	cq->khead = ptr + p->cq_off.head;
+	cq->ktail = ptr + p->cq_off.tail;
+	cq->kring_mask = ptr + p->cq_off.ring_mask;
+	cq->kring_entries = ptr + p->cq_off.ring_entries;
+	cq->koverflow = ptr + p->cq_off.overflow;
+	cq->cqes = ptr + p->cq_off.cqes;
+	return 0;
+}
+
+static inline int io_uring_setup(unsigned int entries,
+				 struct io_uring_params *p)
+{
+	return syscall(__NR_io_uring_setup, entries, p);
+}
+
+static inline int io_uring_enter(int fd, unsigned int to_submit,
+				 unsigned int min_complete,
+				 unsigned int flags, sigset_t *sig)
+{
+	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
+		       flags, sig, _NSIG / 8);
+}
+
+static inline int io_uring_queue_init(unsigned int entries,
+				      struct io_uring *ring,
+				      unsigned int flags)
+{
+	struct io_uring_params p;
+	int fd, ret;
+
+	memset(ring, 0, sizeof(*ring));
+	memset(&p, 0, sizeof(p));
+	p.flags = flags;
+
+	fd = io_uring_setup(entries, &p);
+	if (fd < 0)
+		return fd;
+	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
+	if (!ret)
+		ring->ring_fd = fd;
+	else
+		close(fd);
+	return ret;
+}
+
+/* Get a sqe */
+static inline struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
+		return NULL;
+	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
+}
+
+static inline int io_uring_wait_cqe(struct io_uring *ring,
+				    struct io_uring_cqe **cqe_ptr)
+{
+	struct io_uring_cq *cq = &ring->cq;
+	const unsigned int mask = *cq->kring_mask;
+	unsigned int head = *cq->khead;
+	int ret;
+
+	*cqe_ptr = NULL;
+	do {
+		read_barrier();
+		if (head != *cq->ktail) {
+			*cqe_ptr = &cq->cqes[head & mask];
+			break;
+		}
+		ret = io_uring_enter(ring->ring_fd, 0, 1,
+				     IORING_ENTER_GETEVENTS, NULL);
+		if (ret < 0)
+			return -errno;
+	} while (1);
+
+	return 0;
+}
+
+static inline int io_uring_submit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+	const unsigned int mask = *sq->kring_mask;
+	unsigned int ktail, submitted, to_submit;
+	int ret;
+
+	read_barrier();
+	if (*sq->khead != *sq->ktail) {
+		submitted = *sq->kring_entries;
+		goto submit;
+	}
+	if (sq->sqe_head == sq->sqe_tail)
+		return 0;
+
+	ktail = *sq->ktail;
+	to_submit = sq->sqe_tail - sq->sqe_head;
+	for (submitted = 0; submitted < to_submit; submitted++) {
+		read_barrier();
+		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
+	}
+	if (!submitted)
+		return 0;
+
+	if (*sq->ktail != ktail) {
+		write_barrier();
+		*sq->ktail = ktail;
+		write_barrier();
+	}
+submit:
+	ret = io_uring_enter(ring->ring_fd, submitted, 0,
+			     IORING_ENTER_GETEVENTS, NULL);
+	return ret < 0 ? -errno : ret;
+}
+
+static inline void io_uring_queue_exit(struct io_uring *ring)
+{
+	struct io_uring_sq *sq = &ring->sq;
+
+	munmap(sq->sqes, *sq->kring_entries * sizeof(struct io_uring_sqe));
+	munmap(sq->khead, sq->ring_sz);
+	close(ring->ring_fd);
+}
+
+/* Prepare and send the SQE */
+static inline void io_uring_prep_cmd(struct io_uring_sqe *sqe, int op,
+				     int sockfd,
+				     int level, int optname,
+				     const void *optval,
+				     int optlen)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8)IORING_OP_URING_CMD;
+	sqe->fd = sockfd;
+	sqe->cmd_op = op;
+
+	sqe->level = level;
+	sqe->optname = optname;
+	sqe->optval = (unsigned long long)optval;
+	sqe->optlen = optlen;
+}
+
+static inline int io_uring_register_buffers(struct io_uring *ring,
+					    const struct iovec *iovecs,
+					    unsigned int nr_iovecs)
+{
+	int ret;
+
+	ret = syscall(__NR_io_uring_register, ring->ring_fd,
+		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
+	return (ret < 0) ? -errno : ret;
+}
+
+static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
+				      const void *buf, size_t len, int flags)
+{
+	memset(sqe, 0, sizeof(*sqe));
+	sqe->opcode = (__u8)IORING_OP_SEND;
+	sqe->fd = sockfd;
+	sqe->addr = (unsigned long)buf;
+	sqe->len = len;
+	sqe->msg_flags = (__u32)flags;
+}
+
+static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
+					const void *buf, size_t len, int flags,
+					unsigned int zc_flags)
+{
+	io_uring_prep_send(sqe, sockfd, buf, len, flags);
+	sqe->opcode = (__u8)IORING_OP_SEND_ZC;
+	sqe->ioprio = zc_flags;
+}
+
+static inline void io_uring_cqe_seen(struct io_uring *ring)
+{
+	*(&ring->cq)->khead += 1;
+	write_barrier();
+}
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 8b017070960d..f8d99837b9dc 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -98,6 +98,7 @@ $(OUTPUT)/reuseport_bpf_numa: LDLIBS += -lnuma
 $(OUTPUT)/tcp_mmap: LDLIBS += -lpthread -lcrypto
 $(OUTPUT)/tcp_inq: LDLIBS += -lpthread
 $(OUTPUT)/bind_bhash: LDLIBS += -lpthread
+$(OUTPUT)/io_uring_zerocopy_tx: CFLAGS += -I../../../include/
 
 # Rules to generate bpf obj nat6to4.o
 CLANG ?= clang
diff --git a/tools/testing/selftests/net/io_uring_zerocopy_tx.c b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
index 154287740172..76e604e4810e 100644
--- a/tools/testing/selftests/net/io_uring_zerocopy_tx.c
+++ b/tools/testing/selftests/net/io_uring_zerocopy_tx.c
@@ -36,6 +36,8 @@
 #include <sys/un.h>
 #include <sys/wait.h>
 
+#include <io_uring/mini_liburing.h>
+
 #define NOTIF_TAG 0xfffffffULL
 #define NONZC_TAG 0
 #define ZC_TAG 1
@@ -60,272 +62,6 @@ static struct sockaddr_storage cfg_dst_addr;
 
 static char payload[IP_MAXPACKET] __attribute__((aligned(4096)));
 
-struct io_sq_ring {
-	unsigned *head;
-	unsigned *tail;
-	unsigned *ring_mask;
-	unsigned *ring_entries;
-	unsigned *flags;
-	unsigned *array;
-};
-
-struct io_cq_ring {
-	unsigned *head;
-	unsigned *tail;
-	unsigned *ring_mask;
-	unsigned *ring_entries;
-	struct io_uring_cqe *cqes;
-};
-
-struct io_uring_sq {
-	unsigned *khead;
-	unsigned *ktail;
-	unsigned *kring_mask;
-	unsigned *kring_entries;
-	unsigned *kflags;
-	unsigned *kdropped;
-	unsigned *array;
-	struct io_uring_sqe *sqes;
-
-	unsigned sqe_head;
-	unsigned sqe_tail;
-
-	size_t ring_sz;
-};
-
-struct io_uring_cq {
-	unsigned *khead;
-	unsigned *ktail;
-	unsigned *kring_mask;
-	unsigned *kring_entries;
-	unsigned *koverflow;
-	struct io_uring_cqe *cqes;
-
-	size_t ring_sz;
-};
-
-struct io_uring {
-	struct io_uring_sq sq;
-	struct io_uring_cq cq;
-	int ring_fd;
-};
-
-#ifdef __alpha__
-# ifndef __NR_io_uring_setup
-#  define __NR_io_uring_setup		535
-# endif
-# ifndef __NR_io_uring_enter
-#  define __NR_io_uring_enter		536
-# endif
-# ifndef __NR_io_uring_register
-#  define __NR_io_uring_register	537
-# endif
-#else /* !__alpha__ */
-# ifndef __NR_io_uring_setup
-#  define __NR_io_uring_setup		425
-# endif
-# ifndef __NR_io_uring_enter
-#  define __NR_io_uring_enter		426
-# endif
-# ifndef __NR_io_uring_register
-#  define __NR_io_uring_register	427
-# endif
-#endif
-
-#if defined(__x86_64) || defined(__i386__)
-#define read_barrier()	__asm__ __volatile__("":::"memory")
-#define write_barrier()	__asm__ __volatile__("":::"memory")
-#else
-
-#define read_barrier()	__sync_synchronize()
-#define write_barrier()	__sync_synchronize()
-#endif
-
-static int io_uring_setup(unsigned int entries, struct io_uring_params *p)
-{
-	return syscall(__NR_io_uring_setup, entries, p);
-}
-
-static int io_uring_enter(int fd, unsigned int to_submit,
-			  unsigned int min_complete,
-			  unsigned int flags, sigset_t *sig)
-{
-	return syscall(__NR_io_uring_enter, fd, to_submit, min_complete,
-			flags, sig, _NSIG / 8);
-}
-
-static int io_uring_register_buffers(struct io_uring *ring,
-				     const struct iovec *iovecs,
-				     unsigned nr_iovecs)
-{
-	int ret;
-
-	ret = syscall(__NR_io_uring_register, ring->ring_fd,
-		      IORING_REGISTER_BUFFERS, iovecs, nr_iovecs);
-	return (ret < 0) ? -errno : ret;
-}
-
-static int io_uring_mmap(int fd, struct io_uring_params *p,
-			 struct io_uring_sq *sq, struct io_uring_cq *cq)
-{
-	size_t size;
-	void *ptr;
-	int ret;
-
-	sq->ring_sz = p->sq_off.array + p->sq_entries * sizeof(unsigned);
-	ptr = mmap(0, sq->ring_sz, PROT_READ | PROT_WRITE,
-		   MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQ_RING);
-	if (ptr == MAP_FAILED)
-		return -errno;
-	sq->khead = ptr + p->sq_off.head;
-	sq->ktail = ptr + p->sq_off.tail;
-	sq->kring_mask = ptr + p->sq_off.ring_mask;
-	sq->kring_entries = ptr + p->sq_off.ring_entries;
-	sq->kflags = ptr + p->sq_off.flags;
-	sq->kdropped = ptr + p->sq_off.dropped;
-	sq->array = ptr + p->sq_off.array;
-
-	size = p->sq_entries * sizeof(struct io_uring_sqe);
-	sq->sqes = mmap(0, size, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_SQES);
-	if (sq->sqes == MAP_FAILED) {
-		ret = -errno;
-err:
-		munmap(sq->khead, sq->ring_sz);
-		return ret;
-	}
-
-	cq->ring_sz = p->cq_off.cqes + p->cq_entries * sizeof(struct io_uring_cqe);
-	ptr = mmap(0, cq->ring_sz, PROT_READ | PROT_WRITE,
-			MAP_SHARED | MAP_POPULATE, fd, IORING_OFF_CQ_RING);
-	if (ptr == MAP_FAILED) {
-		ret = -errno;
-		munmap(sq->sqes, p->sq_entries * sizeof(struct io_uring_sqe));
-		goto err;
-	}
-	cq->khead = ptr + p->cq_off.head;
-	cq->ktail = ptr + p->cq_off.tail;
-	cq->kring_mask = ptr + p->cq_off.ring_mask;
-	cq->kring_entries = ptr + p->cq_off.ring_entries;
-	cq->koverflow = ptr + p->cq_off.overflow;
-	cq->cqes = ptr + p->cq_off.cqes;
-	return 0;
-}
-
-static int io_uring_queue_init(unsigned entries, struct io_uring *ring,
-			       unsigned flags)
-{
-	struct io_uring_params p;
-	int fd, ret;
-
-	memset(ring, 0, sizeof(*ring));
-	memset(&p, 0, sizeof(p));
-	p.flags = flags;
-
-	fd = io_uring_setup(entries, &p);
-	if (fd < 0)
-		return fd;
-	ret = io_uring_mmap(fd, &p, &ring->sq, &ring->cq);
-	if (!ret)
-		ring->ring_fd = fd;
-	else
-		close(fd);
-	return ret;
-}
-
-static int io_uring_submit(struct io_uring *ring)
-{
-	struct io_uring_sq *sq = &ring->sq;
-	const unsigned mask = *sq->kring_mask;
-	unsigned ktail, submitted, to_submit;
-	int ret;
-
-	read_barrier();
-	if (*sq->khead != *sq->ktail) {
-		submitted = *sq->kring_entries;
-		goto submit;
-	}
-	if (sq->sqe_head == sq->sqe_tail)
-		return 0;
-
-	ktail = *sq->ktail;
-	to_submit = sq->sqe_tail - sq->sqe_head;
-	for (submitted = 0; submitted < to_submit; submitted++) {
-		read_barrier();
-		sq->array[ktail++ & mask] = sq->sqe_head++ & mask;
-	}
-	if (!submitted)
-		return 0;
-
-	if (*sq->ktail != ktail) {
-		write_barrier();
-		*sq->ktail = ktail;
-		write_barrier();
-	}
-submit:
-	ret = io_uring_enter(ring->ring_fd, submitted, 0,
-				IORING_ENTER_GETEVENTS, NULL);
-	return ret < 0 ? -errno : ret;
-}
-
-static inline void io_uring_prep_send(struct io_uring_sqe *sqe, int sockfd,
-				      const void *buf, size_t len, int flags)
-{
-	memset(sqe, 0, sizeof(*sqe));
-	sqe->opcode = (__u8) IORING_OP_SEND;
-	sqe->fd = sockfd;
-	sqe->addr = (unsigned long) buf;
-	sqe->len = len;
-	sqe->msg_flags = (__u32) flags;
-}
-
-static inline void io_uring_prep_sendzc(struct io_uring_sqe *sqe, int sockfd,
-				        const void *buf, size_t len, int flags,
-				        unsigned zc_flags)
-{
-	io_uring_prep_send(sqe, sockfd, buf, len, flags);
-	sqe->opcode = (__u8) IORING_OP_SEND_ZC;
-	sqe->ioprio = zc_flags;
-}
-
-static struct io_uring_sqe *io_uring_get_sqe(struct io_uring *ring)
-{
-	struct io_uring_sq *sq = &ring->sq;
-
-	if (sq->sqe_tail + 1 - sq->sqe_head > *sq->kring_entries)
-		return NULL;
-	return &sq->sqes[sq->sqe_tail++ & *sq->kring_mask];
-}
-
-static int io_uring_wait_cqe(struct io_uring *ring, struct io_uring_cqe **cqe_ptr)
-{
-	struct io_uring_cq *cq = &ring->cq;
-	const unsigned mask = *cq->kring_mask;
-	unsigned head = *cq->khead;
-	int ret;
-
-	*cqe_ptr = NULL;
-	do {
-		read_barrier();
-		if (head != *cq->ktail) {
-			*cqe_ptr = &cq->cqes[head & mask];
-			break;
-		}
-		ret = io_uring_enter(ring->ring_fd, 0, 1,
-					IORING_ENTER_GETEVENTS, NULL);
-		if (ret < 0)
-			return -errno;
-	} while (1);
-
-	return 0;
-}
-
-static inline void io_uring_cqe_seen(struct io_uring *ring)
-{
-	*(&ring->cq)->khead += 1;
-	write_barrier();
-}
-
 static unsigned long gettimeofday_ms(void)
 {
 	struct timeval tv;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (5 preceding siblings ...)
  2023-09-04 16:24 ` [PATCH v4 06/10] selftests/net: Extract uring helpers to be reusable Breno Leitao
@ 2023-09-04 16:25 ` Breno Leitao
  2023-09-05 12:32   ` Gabriel Krisman Bertazi
  2023-09-04 16:25 ` [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:25 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman
  Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni

Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If
network is not enabled, but io_uring is, then we want to return
-EOPNOTSUPP for any possible socket operation.

This is helpful because io_uring_cmd_sock() can now call functions that
only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET
inside the function itself.

Signed-off-by: Breno Leitao <[email protected]>
---
 io_uring/uring_cmd.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 60f843a357e0..6a91e1af7d05 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -167,6 +167,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
+#if defined(CONFIG_NET)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
 	struct socket *sock = cmd->file->private_data;
@@ -192,4 +193,11 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		return -EOPNOTSUPP;
 	}
 }
+#else
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
 EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled
  2023-09-04 16:25 ` [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
@ 2023-09-05 12:32   ` Gabriel Krisman Bertazi
  2023-09-08 17:04     ` Breno Leitao
  0 siblings, 1 reply; 23+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-09-05 12:32 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
	linux-kernel, netdev, io-uring, kuba, pabeni

Breno Leitao <[email protected]> writes:

> Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If
> network is not enabled, but io_uring is, then we want to return
> -EOPNOTSUPP for any possible socket operation.
>
> This is helpful because io_uring_cmd_sock() can now call functions that
> only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET
> inside the function itself.
>
> Signed-off-by: Breno Leitao <[email protected]>
> ---
>  io_uring/uring_cmd.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> index 60f843a357e0..6a91e1af7d05 100644
> --- a/io_uring/uring_cmd.c
> +++ b/io_uring/uring_cmd.c
> @@ -167,6 +167,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>  }
>  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
>  
> +#if defined(CONFIG_NET)
>  int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  {
>  	struct socket *sock = cmd->file->private_data;
> @@ -192,4 +193,11 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
>  		return -EOPNOTSUPP;
>  	}
>  }
> +#else
> +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> +{
> +	return -EOPNOTSUPP;
> +}
> +#endif
> +
>  EXPORT_SYMBOL_GPL(io_uring_cmd_sock);

It doesn't make much sense to export the symbol on the !CONFIG_NET case.
Usually, you'd make it a 'static inline' in the header file (even though
it won't be ever inlined in this case):

in include/linux/io_uring.h:

#if defined(CONFIG_NET)
int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
#else
static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
{
	return -EOPNOTSUPP;
}
#endif

But this is a minor detail. I'd say to consider doing it if you end up doing
another spin of the patchset.  Other than that, looks good to me.

Thanks,

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled
  2023-09-05 12:32   ` Gabriel Krisman Bertazi
@ 2023-09-08 17:04     ` Breno Leitao
  0 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-08 17:04 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
	linux-kernel, netdev, io-uring, kuba, pabeni

On Tue, Sep 05, 2023 at 08:32:15AM -0400, Gabriel Krisman Bertazi wrote:
> Breno Leitao <[email protected]> writes:
> 
> > Protect io_uring_cmd_sock() to be called if CONFIG_NET is not set. If
> > network is not enabled, but io_uring is, then we want to return
> > -EOPNOTSUPP for any possible socket operation.
> >
> > This is helpful because io_uring_cmd_sock() can now call functions that
> > only exits if CONFIG_NET is enabled without having #ifdef CONFIG_NET
> > inside the function itself.
> >
> > Signed-off-by: Breno Leitao <[email protected]>
> > ---
> >  io_uring/uring_cmd.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
> > index 60f843a357e0..6a91e1af7d05 100644
> > --- a/io_uring/uring_cmd.c
> > +++ b/io_uring/uring_cmd.c
> > @@ -167,6 +167,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> >  }
> >  EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
> >  
> > +#if defined(CONFIG_NET)
> >  int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> >  {
> >  	struct socket *sock = cmd->file->private_data;
> > @@ -192,4 +193,11 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> >  		return -EOPNOTSUPP;
> >  	}
> >  }
> > +#else
> > +int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> > +{
> > +	return -EOPNOTSUPP;
> > +}
> > +#endif
> > +
> >  EXPORT_SYMBOL_GPL(io_uring_cmd_sock);
> 
> It doesn't make much sense to export the symbol on the !CONFIG_NET case.
> Usually, you'd make it a 'static inline' in the header file (even though
> it won't be ever inlined in this case):
> 
> in include/linux/io_uring.h:
> 
> #if defined(CONFIG_NET)
> int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
> #else
> static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
> {
> 	return -EOPNOTSUPP;
> }
> #endif
> 
> But this is a minor detail. I'd say to consider doing it if you end up doing
> another spin of the patchset.  Other than that, looks good to me.

This makes sense, and I will add the symbol export inside the
"if defined(CONFIG_NET)" block, since I need to respin this patchset to
address the sockptr_t concern.

Thanks for the good reviews!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (6 preceding siblings ...)
  2023-09-04 16:25 ` [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
@ 2023-09-04 16:25 ` Breno Leitao
  2023-09-05 12:24   ` Gabriel Krisman Bertazi
  2023-09-04 16:25 ` [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:25 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman
  Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni

Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
where a sockptr_t is either userspace or kernel space, and handled as
such.

Differently from the getsockopt(2), the optlen field is not a userspace
pointers. In getsockopt(2), userspace provides optlen pointer, which is
overwritten by the kernel.  In this implementation, userspace passes a
u32, and the new value is returned in cqe->res. I.e., optlen is not a
pointer.

Important to say that userspace needs to keep the pointer alive until
the CQE is completed.

Signed-off-by: Breno Leitao <[email protected]>
---
 include/uapi/linux/io_uring.h |  7 +++++++
 io_uring/uring_cmd.c          | 28 ++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 8e61f8b7c2ce..29efa02a4dcb 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -43,6 +43,10 @@ struct io_uring_sqe {
 	union {
 		__u64	addr;	/* pointer to buffer or iovecs */
 		__u64	splice_off_in;
+		struct {
+			__u32	level;
+			__u32	optname;
+		};
 	};
 	__u32	len;		/* buffer size or number of iovecs */
 	union {
@@ -79,6 +83,7 @@ struct io_uring_sqe {
 	union {
 		__s32	splice_fd_in;
 		__u32	file_index;
+		__u32	optlen;
 		struct {
 			__u16	addr_len;
 			__u16	__pad3[1];
@@ -89,6 +94,7 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
 		 * this field is used for 80 bytes of arbitrary command data
@@ -734,6 +740,7 @@ struct io_uring_recvmsg_out {
 enum {
 	SOCKET_URING_OP_SIOCINQ		= 0,
 	SOCKET_URING_OP_SIOCOUTQ,
+	SOCKET_URING_OP_GETSOCKOPT,
 };
 
 #ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 6a91e1af7d05..c373e05ba9ce 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -167,6 +167,32 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);
 
+INDIRECT_CALLABLE_DECLARE(bool tcp_bpf_bypass_getsockopt(int level,
+							 int optname));
+static inline int io_uring_cmd_getsockopt(struct socket *sock,
+					  struct io_uring_cmd *cmd,
+					  unsigned int issue_flags)
+{
+	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+	bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+	int optlen = READ_ONCE(cmd->sqe->optlen);
+	int optname = READ_ONCE(cmd->sqe->optname);
+	int level = READ_ONCE(cmd->sqe->level);
+	int err;
+
+	if (level != SOL_SOCKET)
+		return -EOPNOTSUPP;
+
+	err = do_sock_getsockopt(sock, compat, level, optname,
+				 USER_SOCKPTR(optval),
+				 KERNEL_SOCKPTR(&optlen));
+	if (err)
+		return err;
+
+	/* On success, return optlen */
+	return optlen;
+}
+
 #if defined(CONFIG_NET)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
@@ -189,6 +215,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		if (ret)
 			return ret;
 		return arg;
+	case SOCKET_URING_OP_GETSOCKOPT:
+		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT
  2023-09-04 16:25 ` [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-09-05 12:24   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 23+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-09-05 12:24 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
	linux-kernel, netdev, io-uring, kuba, pabeni

Breno Leitao <[email protected]> writes:

> Add support for getsockopt command (SOCKET_URING_OP_GETSOCKOPT), where
> level is SOL_SOCKET. This is leveraging the sockptr_t infrastructure,
> where a sockptr_t is either userspace or kernel space, and handled as
> such.
>
> Differently from the getsockopt(2), the optlen field is not a userspace
> pointers. In getsockopt(2), userspace provides optlen pointer, which is
> overwritten by the kernel.  In this implementation, userspace passes a
> u32, and the new value is returned in cqe->res. I.e., optlen is not a
> pointer.
>
> Important to say that userspace needs to keep the pointer alive until
> the CQE is completed.
>
> Signed-off-by: Breno Leitao <[email protected]>

IMO, this looks much cleaner with most of the bpf and socket logic under
net/.

Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

Thanks!


-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (7 preceding siblings ...)
  2023-09-04 16:25 ` [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
@ 2023-09-04 16:25 ` Breno Leitao
  2023-09-05 12:24   ` Gabriel Krisman Bertazi
  2023-09-04 16:25 ` [PATCH v4 10/10] selftests/bpf/sockopt: Add io_uring support Breno Leitao
  2023-09-05 22:49 ` [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Jakub Kicinski
  10 siblings, 1 reply; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:25 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman
  Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni

Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
similar to setsockopt. This implementation leverages the function
do_sock_setsockopt(), which is shared with the setsockopt() system call
path.

Important to say that userspace needs to keep the pointer's memory alive
until the operation is completed. I.e, the memory could not be
deallocated before the CQE is returned to userspace.

Signed-off-by: Breno Leitao <[email protected]>
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/uring_cmd.c          | 17 +++++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 29efa02a4dcb..3b443da353ba 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -741,6 +741,7 @@ enum {
 	SOCKET_URING_OP_SIOCINQ		= 0,
 	SOCKET_URING_OP_SIOCOUTQ,
 	SOCKET_URING_OP_GETSOCKOPT,
+	SOCKET_URING_OP_SETSOCKOPT,
 };
 
 #ifdef __cplusplus
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index c373e05ba9ce..bec4730fb208 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -193,6 +193,21 @@ static inline int io_uring_cmd_getsockopt(struct socket *sock,
 	return optlen;
 }
 
+static inline int io_uring_cmd_setsockopt(struct socket *sock,
+					  struct io_uring_cmd *cmd,
+					  unsigned int issue_flags)
+{
+	void __user *optval = u64_to_user_ptr(READ_ONCE(cmd->sqe->optval));
+	bool compat = !!(issue_flags & IO_URING_F_COMPAT);
+	int optname = READ_ONCE(cmd->sqe->optname);
+	sockptr_t optval_s = USER_SOCKPTR(optval);
+	int optlen = READ_ONCE(cmd->sqe->optlen);
+	int level = READ_ONCE(cmd->sqe->level);
+
+	return do_sock_setsockopt(sock, compat, level, optname, optval_s,
+				  optlen);
+}
+
 #if defined(CONFIG_NET)
 int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 {
@@ -217,6 +232,8 @@ int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags)
 		return arg;
 	case SOCKET_URING_OP_GETSOCKOPT:
 		return io_uring_cmd_getsockopt(sock, cmd, issue_flags);
+	case SOCKET_URING_OP_SETSOCKOPT:
+		return io_uring_cmd_setsockopt(sock, cmd, issue_flags);
 	default:
 		return -EOPNOTSUPP;
 	}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT
  2023-09-04 16:25 ` [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-09-05 12:24   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 23+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-09-05 12:24 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau, bpf,
	linux-kernel, netdev, io-uring, kuba, pabeni

Breno Leitao <[email protected]> writes:

> Add initial support for SOCKET_URING_OP_SETSOCKOPT. This new command is
> similar to setsockopt. This implementation leverages the function
> do_sock_setsockopt(), which is shared with the setsockopt() system call
> path.
>
> Important to say that userspace needs to keep the pointer's memory alive
> until the operation is completed. I.e, the memory could not be
> deallocated before the CQE is returned to userspace.
>
> Signed-off-by: Breno Leitao <[email protected]>

Likewise, much cleaner!  Feel free to add:

Reviewed-by: Gabriel Krisman Bertazi <[email protected]>

Thanks,

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v4 10/10] selftests/bpf/sockopt: Add io_uring support
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (8 preceding siblings ...)
  2023-09-04 16:25 ` [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
@ 2023-09-04 16:25 ` Breno Leitao
  2023-09-05 22:49 ` [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Jakub Kicinski
  10 siblings, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-04 16:25 UTC (permalink / raw)
  To: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, Andrii Nakryiko, Mykola Lysenko, Alexei Starovoitov,
	Daniel Borkmann, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Hao Luo, Jiri Olsa, Shuah Khan
  Cc: bpf, linux-kernel, netdev, io-uring, kuba, pabeni,
	Daniel Müller, Wang Yufen,
	open list:KERNEL SELFTEST FRAMEWORK

Expand the sockopt test to use also check for io_uring {g,s}etsockopt
commands operations.

This patch starts by marking each test if they support io_uring support
or not.

Right now, io_uring cmd getsockopt() has a limitation of only
accepting level == SOL_SOCKET, otherwise it returns -EOPNOTSUPP. Since
there aren't any test exercising getsockopt(level == SOL_SOCKET), this
patch changes two tests to use level == SOL_SOCKET, they are
"getsockopt: support smaller ctx->optlen" and "getsockopt: read
ctx->optlen".
There is no limitation for the setsockopt() part.

Later, each test runs using regular {g,s}etsockopt systemcalls, and, if
liburing is supported, execute the same test (again), but calling
liburing {g,s}setsockopt commands.

This patch also changes the level of two tests to use SOL_SOCKET for the
following two tests. This is going to help to exercise the io_uring
subsystem:
 * getsockopt: read ctx->optlen
 * getsockopt: support smaller ctx->optlen

Signed-off-by: Breno Leitao <[email protected]>
---
 .../selftests/bpf/prog_tests/sockopt.c        | 113 +++++++++++++++++-
 1 file changed, 107 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt.c b/tools/testing/selftests/bpf/prog_tests/sockopt.c
index 9e6a5e3ed4de..9b1f7f0b1f7e 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockopt.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockopt.c
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
+#include <io_uring/mini_liburing.h>
 #include "cgroup_helpers.h"
 
 static char bpf_log_buf[4096];
@@ -38,6 +39,7 @@ static struct sockopt_test {
 	socklen_t			get_optlen_ret;
 
 	enum sockopt_test_error		error;
+	bool				io_uring_support;
 } tests[] = {
 
 	/* ==================== getsockopt ====================  */
@@ -251,7 +253,9 @@ static struct sockopt_test {
 		.attach_type = BPF_CGROUP_GETSOCKOPT,
 		.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
 
+		.get_level = SOL_SOCKET,
 		.get_optlen = 64,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: deny bigger ctx->optlen",
@@ -276,6 +280,7 @@ static struct sockopt_test {
 		.get_optlen = 64,
 
 		.error = EFAULT_GETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: ignore >PAGE_SIZE optlen",
@@ -318,6 +323,7 @@ static struct sockopt_test {
 		.get_optval = {}, /* the changes are ignored */
 		.get_optlen = PAGE_SIZE + 1,
 		.error = EOPNOTSUPP_GETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: support smaller ctx->optlen",
@@ -337,8 +343,10 @@ static struct sockopt_test {
 		.attach_type = BPF_CGROUP_GETSOCKOPT,
 		.expected_attach_type = BPF_CGROUP_GETSOCKOPT,
 
+		.get_level = SOL_SOCKET,
 		.get_optlen = 64,
 		.get_optlen_ret = 32,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "getsockopt: deny writing to ctx->optval",
@@ -518,6 +526,7 @@ static struct sockopt_test {
 		.set_level = 123,
 
 		.set_optlen = 1,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: allow changing ctx->level",
@@ -572,6 +581,7 @@ static struct sockopt_test {
 		.set_optname = 123,
 
 		.set_optlen = 1,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: allow changing ctx->optname",
@@ -624,6 +634,7 @@ static struct sockopt_test {
 		.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
 
 		.set_optlen = 64,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: ctx->optlen == -1 is ok",
@@ -640,6 +651,7 @@ static struct sockopt_test {
 		.expected_attach_type = BPF_CGROUP_SETSOCKOPT,
 
 		.set_optlen = 64,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: deny ctx->optlen < 0 (except -1)",
@@ -658,6 +670,7 @@ static struct sockopt_test {
 		.set_optlen = 4,
 
 		.error = EFAULT_SETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: deny ctx->optlen > input optlen",
@@ -675,6 +688,7 @@ static struct sockopt_test {
 		.set_optlen = 64,
 
 		.error = EFAULT_SETSOCKOPT,
+		.io_uring_support = true,
 	},
 	{
 		.descr = "setsockopt: ignore >PAGE_SIZE optlen",
@@ -940,7 +954,89 @@ static int load_prog(const struct bpf_insn *insns,
 	return fd;
 }
 
-static int run_test(int cgroup_fd, struct sockopt_test *test)
+/* Core function that handles io_uring ring initialization,
+ * sending SQE with sockopt command and waiting for the CQE.
+ */
+static int uring_sockopt(int op, int fd, int level, int optname,
+			 const void *optval, socklen_t optlen)
+{
+	struct io_uring_cqe *cqe;
+	struct io_uring_sqe *sqe;
+	struct io_uring ring;
+	int err;
+
+	err = io_uring_queue_init(1, &ring, 0);
+	if (!ASSERT_OK(err, "Failed to initialize io_uring ring"))
+		return err;
+
+	sqe = io_uring_get_sqe(&ring);
+	if (!ASSERT_NEQ(sqe, NULL, "Failed to get an SQE")) {
+		err = -1;
+		goto fail;
+	}
+
+	io_uring_prep_cmd(sqe, op, fd, level, optname, optval, optlen);
+
+	err = io_uring_submit(&ring);
+	if (!ASSERT_EQ(err, 1, "Failed to submit SQE"))
+		goto fail;
+
+	err = io_uring_wait_cqe(&ring, &cqe);
+	if (!ASSERT_OK(err, "Failed to wait for CQE"))
+		goto fail;
+
+	err = cqe->res;
+
+fail:
+	io_uring_queue_exit(&ring);
+
+	return err;
+}
+
+static int uring_setsockopt(int fd, int level, int optname, const void *optval,
+			    socklen_t optlen)
+{
+	return uring_sockopt(SOCKET_URING_OP_SETSOCKOPT, fd, level, optname,
+			     optval, optlen);
+}
+
+static int uring_getsockopt(int fd, int level, int optname, void *optval,
+			    socklen_t *optlen)
+{
+	int ret = uring_sockopt(SOCKET_URING_OP_GETSOCKOPT, fd, level, optname,
+				optval, *optlen);
+	if (ret < 0)
+		return ret;
+
+	/* Populate optlen back to be compatible with systemcall interface,
+	 * and simplify the test.
+	 */
+	*optlen = ret;
+
+	return 0;
+}
+
+/* Execute the setsocktopt operation */
+static int call_setsockopt(bool use_io_uring, int fd, int level, int optname,
+			   const void *optval, socklen_t optlen)
+{
+	if (use_io_uring)
+		return uring_setsockopt(fd, level, optname, optval, optlen);
+
+	return setsockopt(fd, level, optname, optval, optlen);
+}
+
+/* Execute the getsocktopt operation */
+static int call_getsockopt(bool use_io_uring, int fd, int level, int optname,
+			   void *optval, socklen_t *optlen)
+{
+	if (use_io_uring)
+		return uring_getsockopt(fd, level, optname, optval, optlen);
+
+	return getsockopt(fd, level, optname, optval, optlen);
+}
+
+static int run_test(int cgroup_fd, struct sockopt_test *test, bool use_io_uring)
 {
 	int sock_fd, err, prog_fd;
 	void *optval = NULL;
@@ -980,8 +1076,9 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
 			test->set_optlen = num_pages * sysconf(_SC_PAGESIZE) + remainder;
 		}
 
-		err = setsockopt(sock_fd, test->set_level, test->set_optname,
-				 test->set_optval, test->set_optlen);
+		err = call_setsockopt(use_io_uring, sock_fd, test->set_level,
+				      test->set_optname, test->set_optval,
+				      test->set_optlen);
 		if (err) {
 			if (errno == EPERM && test->error == EPERM_SETSOCKOPT)
 				goto close_sock_fd;
@@ -1008,8 +1105,8 @@ static int run_test(int cgroup_fd, struct sockopt_test *test)
 		socklen_t expected_get_optlen = test->get_optlen_ret ?:
 			test->get_optlen;
 
-		err = getsockopt(sock_fd, test->get_level, test->get_optname,
-				 optval, &optlen);
+		err = call_getsockopt(use_io_uring, sock_fd, test->get_level,
+				      test->get_optname, optval, &optlen);
 		if (err) {
 			if (errno == EOPNOTSUPP && test->error == EOPNOTSUPP_GETSOCKOPT)
 				goto free_optval;
@@ -1063,7 +1160,11 @@ void test_sockopt(void)
 		if (!test__start_subtest(tests[i].descr))
 			continue;
 
-		ASSERT_OK(run_test(cgroup_fd, &tests[i]), tests[i].descr);
+		ASSERT_OK(run_test(cgroup_fd, &tests[i], false),
+			  tests[i].descr);
+		if (tests[i].io_uring_support)
+			ASSERT_OK(run_test(cgroup_fd, &tests[i], true),
+				  tests[i].descr);
 	}
 
 	close(cgroup_fd);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
  2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
                   ` (9 preceding siblings ...)
  2023-09-04 16:25 ` [PATCH v4 10/10] selftests/bpf/sockopt: Add io_uring support Breno Leitao
@ 2023-09-05 22:49 ` Jakub Kicinski
  2023-09-08 16:55   ` Breno Leitao
  2023-10-06 15:45   ` Breno Leitao
  10 siblings, 2 replies; 23+ messages in thread
From: Jakub Kicinski @ 2023-09-05 22:49 UTC (permalink / raw)
  To: Breno Leitao
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, bpf, linux-kernel, netdev, io-uring, pabeni

On Mon,  4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> become flexible enough to accept user or kernel pointers for optval/optlen.

Have you seen:

https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/

? I wasn't aware that Linus felt this way, now I wonder if having
sockptr_t spread will raise any red flags as this code flows back
to him.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
  2023-09-05 22:49 ` [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Jakub Kicinski
@ 2023-09-08 16:55   ` Breno Leitao
  2023-10-06 15:45   ` Breno Leitao
  1 sibling, 0 replies; 23+ messages in thread
From: Breno Leitao @ 2023-09-08 16:55 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, bpf, linux-kernel, netdev, io-uring, pabeni

On Tue, Sep 05, 2023 at 03:49:51PM -0700, Jakub Kicinski wrote:
> On Mon,  4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> > Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> > become flexible enough to accept user or kernel pointers for optval/optlen.
> 
> Have you seen:
> 
> https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/

I haven't but I think it will not affect *much* this patchset.

> ? I wasn't aware that Linus felt this way, now I wonder if having
> sockptr_t spread will raise any red flags as this code flows back
> to him.

I can change the io_uring API in a way that we can avoid these
sockptr_t changes completely.

My plan is to mimic what getsockopt(2) is doing in io_uring cmd path, in
regard to optlen being an userpointer, instead of a value - which is
then translated to a KERNEL_SOCKPTR.

In this way, this change don't need to touch any sockptr field.

Thanks for the heads-up

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
  2023-09-05 22:49 ` [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Jakub Kicinski
  2023-09-08 16:55   ` Breno Leitao
@ 2023-10-06 15:45   ` Breno Leitao
  2023-10-09 10:11     ` Willem de Bruijn
  1 sibling, 1 reply; 23+ messages in thread
From: Breno Leitao @ 2023-10-06 15:45 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: sdf, axboe, asml.silence, willemdebruijn.kernel, martin.lau,
	krisman, bpf, linux-kernel, netdev, io-uring, pabeni

Hello Jakub,

On Tue, Sep 05, 2023 at 03:49:51PM -0700, Jakub Kicinski wrote:
> On Mon,  4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> > Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> > become flexible enough to accept user or kernel pointers for optval/optlen.
> 
> Have you seen:
> 
> https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/
> 
> ? I wasn't aware that Linus felt this way, now I wonder if having
> sockptr_t spread will raise any red flags as this code flows back
> to him.

Thanks for the heads-up. I've been thinking about it for a while and I'd
like to hear what are the next steps here.

Let me first back up and state where we are, and what is the current
situation:

1) __sys_getsockopt() uses __user pointers for both optval and optlen
2) For io_uring command, Jens[1] suggested we get optlen from the io_uring
sqe, which is a kernel pointer/value.

Thus, we need to make the common code (callbacks) able to handle __user
and kernel pointers (for optlen, at least).

From a proto_ops callback perspective, ->setsockopt() uses sockptr.

          int             (*setsockopt)(struct socket *sock, int level,
                                        int optname, sockptr_t optval,
                                        unsigned int optlen);

Getsockopt() uses sockptr() for level=SOL_SOCKET:

	int sk_getsockopt(struct sock *sk, int level, int optname,
                    sockptr_t optval, sockptr_t optlen)

But not for the other levels:

	int             (*getsockopt)(struct socket *sock, int level,
				      int optname, char __user *optval, int __user *optlen);

That said, if this patchset shouldn't use sockptr anymore, what is the
recommendation?

If we move this patchset to use iov_iter instead of sockptr, then I
understand we want to move *all* these callbacks to use iov_vec. Is this
the right direction?

Thanks for the guidance!

[1] https://lore.kernel.org/all/[email protected]/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-06 15:45   ` Breno Leitao
@ 2023-10-09 10:11     ` Willem de Bruijn
  2023-10-09 13:28       ` Breno Leitao
  0 siblings, 1 reply; 23+ messages in thread
From: Willem de Bruijn @ 2023-10-09 10:11 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Jakub Kicinski, sdf, axboe, asml.silence, martin.lau, krisman,
	bpf, linux-kernel, netdev, io-uring, pabeni

On Fri, Oct 6, 2023 at 10:45 AM Breno Leitao <[email protected]> wrote:
>
> Hello Jakub,
>
> On Tue, Sep 05, 2023 at 03:49:51PM -0700, Jakub Kicinski wrote:
> > On Mon,  4 Sep 2023 09:24:53 -0700 Breno Leitao wrote:
> > > Patches 1-2: Modify the BPF hooks to support sockptr_t, so, these functions
> > > become flexible enough to accept user or kernel pointers for optval/optlen.
> >
> > Have you seen:
> >
> > https://lore.kernel.org/all/CAHk-=wgGV61xrG=gO0=dXH64o2TDWWrXn1mx-CX885JZ7h84Og@mail.gmail.com/
> >
> > ? I wasn't aware that Linus felt this way, now I wonder if having
> > sockptr_t spread will raise any red flags as this code flows back
> > to him.
>
> Thanks for the heads-up. I've been thinking about it for a while and I'd
> like to hear what are the next steps here.
>
> Let me first back up and state where we are, and what is the current
> situation:
>
> 1) __sys_getsockopt() uses __user pointers for both optval and optlen
> 2) For io_uring command, Jens[1] suggested we get optlen from the io_uring
> sqe, which is a kernel pointer/value.
>
> Thus, we need to make the common code (callbacks) able to handle __user
> and kernel pointers (for optlen, at least).
>
> From a proto_ops callback perspective, ->setsockopt() uses sockptr.
>
>           int             (*setsockopt)(struct socket *sock, int level,
>                                         int optname, sockptr_t optval,
>                                         unsigned int optlen);
>
> Getsockopt() uses sockptr() for level=SOL_SOCKET:
>
>         int sk_getsockopt(struct sock *sk, int level, int optname,
>                     sockptr_t optval, sockptr_t optlen)
>
> But not for the other levels:
>
>         int             (*getsockopt)(struct socket *sock, int level,
>                                       int optname, char __user *optval, int __user *optlen);
>
>
> That said, if this patchset shouldn't use sockptr anymore, what is the
> recommendation?
>
> If we move this patchset to use iov_iter instead of sockptr, then I
> understand we want to move *all* these callbacks to use iov_vec. Is this
> the right direction?
>
> Thanks for the guidance!
>
> [1] https://lore.kernel.org/all/[email protected]/

Since sockptr_t is already used by __sys_setsockopt and
__sys_setsockopt, patches 1 and 2 don't introduce any new sockptr code
paths.

setsockopt callbacks also already use sockptr as of commit
a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt").

getsockopt callbacks do take user pointers, just not sockptr.

Is the only issue right now the optlen kernel pointer?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-09 10:11     ` Willem de Bruijn
@ 2023-10-09 13:28       ` Breno Leitao
  2023-10-09 16:55         ` Jakub Kicinski
  0 siblings, 1 reply; 23+ messages in thread
From: Breno Leitao @ 2023-10-09 13:28 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Jakub Kicinski, sdf, axboe, asml.silence, martin.lau, krisman,
	bpf, linux-kernel, netdev, io-uring, pabeni

On Mon, Oct 09, 2023 at 03:11:05AM -0700, Willem de Bruijn wrote:
> On Fri, Oct 6, 2023 at 10:45 AM Breno Leitao <[email protected]> wrote:
> > Let me first back up and state where we are, and what is the current
> > situation:
> >
> > 1) __sys_getsockopt() uses __user pointers for both optval and optlen
> > 2) For io_uring command, Jens[1] suggested we get optlen from the io_uring
> > sqe, which is a kernel pointer/value.
> >
> > Thus, we need to make the common code (callbacks) able to handle __user
> > and kernel pointers (for optlen, at least).
> >
> > From a proto_ops callback perspective, ->setsockopt() uses sockptr.
> >
> >           int             (*setsockopt)(struct socket *sock, int level,
> >                                         int optname, sockptr_t optval,
> >                                         unsigned int optlen);
> >
> > Getsockopt() uses sockptr() for level=SOL_SOCKET:
> >
> >         int sk_getsockopt(struct sock *sk, int level, int optname,
> >                     sockptr_t optval, sockptr_t optlen)
> >
> > But not for the other levels:
> >
> >         int             (*getsockopt)(struct socket *sock, int level,
> >                                       int optname, char __user *optval, int __user *optlen);
> >
> >
> > That said, if this patchset shouldn't use sockptr anymore, what is the
> > recommendation?
> >
> > If we move this patchset to use iov_iter instead of sockptr, then I
> > understand we want to move *all* these callbacks to use iov_vec. Is this
> > the right direction?
> >
> > Thanks for the guidance!
> >
> > [1] https://lore.kernel.org/all/[email protected]/
> 
> Since sockptr_t is already used by __sys_setsockopt and
> __sys_setsockopt, patches 1 and 2 don't introduce any new sockptr code
> paths.
> 
> setsockopt callbacks also already use sockptr as of commit
> a7b75c5a8c41 ("net: pass a sockptr_t into ->setsockopt").
> 
> getsockopt callbacks do take user pointers, just not sockptr.
> 
> Is the only issue right now the optlen kernel pointer?

Correct. The current discussion is only related to optlen in the
getsockopt() callbacks (invoked when level != SOL_SOCKET). Everything
else (getsockopt(level=SOL_SOCKET..) and setsockopt) is using sockptr.

Is it bad if we review/merge this code as is (using sockptr), and start
the iov_iter/getsockopt() refactor in a follow-up thread?

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands
  2023-10-09 13:28       ` Breno Leitao
@ 2023-10-09 16:55         ` Jakub Kicinski
  0 siblings, 0 replies; 23+ messages in thread
From: Jakub Kicinski @ 2023-10-09 16:55 UTC (permalink / raw)
  To: Breno Leitao
  Cc: Willem de Bruijn, sdf, axboe, asml.silence, martin.lau, krisman,
	bpf, linux-kernel, netdev, io-uring, pabeni

On Mon, 9 Oct 2023 06:28:00 -0700 Breno Leitao wrote:
> Correct. The current discussion is only related to optlen in the
> getsockopt() callbacks (invoked when level != SOL_SOCKET). Everything
> else (getsockopt(level=SOL_SOCKET..) and setsockopt) is using sockptr.
> 
> Is it bad if we review/merge this code as is (using sockptr), and start
> the iov_iter/getsockopt() refactor in a follow-up thread?

Sorry for the delay, I only looked at the code now :S
Agreed, that there's no need to worry about the sockptr spread
in this series. It looks good to go in.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-10-09 16:55 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-04 16:24 [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Breno Leitao
2023-09-04 16:24 ` [PATCH v4 01/10] bpf: Leverage sockptr_t in BPF getsockopt hook Breno Leitao
2023-09-04 16:24 ` [PATCH v4 02/10] bpf: Leverage sockptr_t in BPF setsockopt hook Breno Leitao
2023-09-04 16:24 ` [PATCH v4 03/10] net/socket: Break down __sys_setsockopt Breno Leitao
2023-09-04 16:24 ` [PATCH v4 04/10] net/socket: Break down __sys_getsockopt Breno Leitao
2023-09-05  9:36   ` David Laight
2023-09-05 10:29   ` Andy Shevchenko
2023-09-04 16:24 ` [PATCH v4 05/10] io_uring/cmd: Pass compat mode in issue_flags Breno Leitao
2023-09-04 16:24 ` [PATCH v4 06/10] selftests/net: Extract uring helpers to be reusable Breno Leitao
2023-09-04 16:25 ` [PATCH v4 07/10] io_uring/cmd: return -EOPNOTSUPP if net is disabled Breno Leitao
2023-09-05 12:32   ` Gabriel Krisman Bertazi
2023-09-08 17:04     ` Breno Leitao
2023-09-04 16:25 ` [PATCH v4 08/10] io_uring/cmd: Introduce SOCKET_URING_OP_GETSOCKOPT Breno Leitao
2023-09-05 12:24   ` Gabriel Krisman Bertazi
2023-09-04 16:25 ` [PATCH v4 09/10] io_uring/cmd: Introduce SOCKET_URING_OP_SETSOCKOPT Breno Leitao
2023-09-05 12:24   ` Gabriel Krisman Bertazi
2023-09-04 16:25 ` [PATCH v4 10/10] selftests/bpf/sockopt: Add io_uring support Breno Leitao
2023-09-05 22:49 ` [PATCH v4 00/10] io_uring: Initial support for {s,g}etsockopt commands Jakub Kicinski
2023-09-08 16:55   ` Breno Leitao
2023-10-06 15:45   ` Breno Leitao
2023-10-09 10:11     ` Willem de Bruijn
2023-10-09 13:28       ` Breno Leitao
2023-10-09 16:55         ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox