public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH v3 for-next 0/3] io_uring: multishot recvmsg
@ 2022-07-14 11:02 Dylan Yudaken
  2022-07-14 11:02 ` [PATCH v3 for-next 1/3] net: copy from user before calling __copy_msghdr Dylan Yudaken
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Dylan Yudaken @ 2022-07-14 11:02 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, davem, edumazet, kuba, pabeni,
	io-uring
  Cc: netdev, Kernel-team, Dylan Yudaken

This series adds multishot support to recvmsg in io_uring.

The idea is that you submit a single multishot recvmsg and then receive
completions as and when data arrives. For recvmsg each completion also has
control data, and this is necessarily included in the same buffer as the
payload.

In order to do this a new structure is used: io_uring_recvmsg_out. This
specifies the length written of the name, control and payload. As well as
including the flags.
The layout of the buffer is <header><name><control><payload> where the
lengths are those specified in the original msghdr used to issue the recvmsg.

I suspect this API will be the most contentious part of this series and would
appreciate any comments on it.

For completeness I considered having the original struct msghdr as the header,
but size wise it is much bigger (72 bytes including an iovec vs 16 bytes here).
Testing also showed a 1% slowdown in terms of QPS.

Using a mini network tester [1] shows 14% QPS improvment using this API, however
this is likely to go down to ~8% with the latest allocation cache added by Jens.

[1]: https://github.com/DylanZA/netbench/tree/main

Patches 1,2 change the copy_msghdr code to take a user_msghdr as input
Patch 3 is the multishot feature

v3:
 * apply formatting comments
 * refactor io_recvmsg_prep_multishot to reduce casts
 * move some overflow logic into recvmsg_prep as only need to call it once

v2:
 * Rebase without netbuf recycling provided by io_uring
 * Fix payload field output with MSG_TRUNC set to match recvmsg(2)

Dylan Yudaken (3):
  net: copy from user before calling __copy_msghdr
  net: copy from user before calling __get_compat_msghdr
  io_uring: support multishot in recvmsg

 include/linux/socket.h        |   7 +-
 include/net/compat.h          |   5 +-
 include/uapi/linux/io_uring.h |   7 ++
 io_uring/net.c                | 216 ++++++++++++++++++++++++++++------
 io_uring/net.h                |   6 +
 net/compat.c                  |  39 +++---
 net/socket.c                  |  37 +++---
 7 files changed, 232 insertions(+), 85 deletions(-)


base-commit: 20898aeac6b82d8eb6247754494584b2f6cafd53
-- 
2.30.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v3 for-next 1/3] net: copy from user before calling __copy_msghdr
  2022-07-14 11:02 [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Dylan Yudaken
@ 2022-07-14 11:02 ` Dylan Yudaken
  2022-07-14 11:02 ` [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr Dylan Yudaken
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Dylan Yudaken @ 2022-07-14 11:02 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, davem, edumazet, kuba, pabeni,
	io-uring
  Cc: netdev, Kernel-team, Dylan Yudaken

this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Dylan Yudaken <[email protected]>
---
 include/linux/socket.h |  7 +++----
 io_uring/net.c         | 17 +++++++++--------
 net/socket.c           | 37 ++++++++++++++++---------------------
 3 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 17311ad9f9af..be24f1c8568a 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -416,10 +416,9 @@ extern int recvmsg_copy_msghdr(struct msghdr *msg,
 			       struct user_msghdr __user *umsg, unsigned flags,
 			       struct sockaddr __user **uaddr,
 			       struct iovec **iov);
-extern int __copy_msghdr_from_user(struct msghdr *kmsg,
-				   struct user_msghdr __user *umsg,
-				   struct sockaddr __user **save_addr,
-				   struct iovec __user **uiov, size_t *nsegs);
+extern int __copy_msghdr(struct msghdr *kmsg,
+			 struct user_msghdr *umsg,
+			 struct sockaddr __user **save_addr);
 
 /* helpers which do the actual work for syscalls */
 extern int __sys_recvfrom(int fd, void __user *ubuf, size_t size,
diff --git a/io_uring/net.c b/io_uring/net.c
index dc9190eafbe7..da7667ed3610 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -329,31 +329,32 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 				 struct io_async_msghdr *iomsg)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
-	struct iovec __user *uiov;
-	size_t iov_len;
+	struct user_msghdr msg;
 	int ret;
 
-	ret = __copy_msghdr_from_user(&iomsg->msg, sr->umsg,
-					&iomsg->uaddr, &uiov, &iov_len);
+	if (copy_from_user(&msg, sr->umsg, sizeof(*sr->umsg)))
+		return -EFAULT;
+
+	ret = __copy_msghdr(&iomsg->msg, &msg, &iomsg->uaddr);
 	if (ret)
 		return ret;
 
 	if (req->flags & REQ_F_BUFFER_SELECT) {
-		if (iov_len == 0) {
+		if (msg.msg_iovlen == 0) {
 			sr->len = iomsg->fast_iov[0].iov_len = 0;
 			iomsg->fast_iov[0].iov_base = NULL;
 			iomsg->free_iov = NULL;
-		} else if (iov_len > 1) {
+		} else if (msg.msg_iovlen > 1) {
 			return -EINVAL;
 		} else {
-			if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
+			if (copy_from_user(iomsg->fast_iov, msg.msg_iov, sizeof(*msg.msg_iov)))
 				return -EFAULT;
 			sr->len = iomsg->fast_iov[0].iov_len;
 			iomsg->free_iov = NULL;
 		}
 	} else {
 		iomsg->free_iov = iomsg->fast_iov;
-		ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
+		ret = __import_iovec(READ, msg.msg_iov, msg.msg_iovlen, UIO_FASTIOV,
 				     &iomsg->free_iov, &iomsg->msg.msg_iter,
 				     false);
 		if (ret > 0)
diff --git a/net/socket.c b/net/socket.c
index 96300cdc0625..843545c21ec2 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2358,25 +2358,20 @@ struct used_address {
 	unsigned int name_len;
 };
 
-int __copy_msghdr_from_user(struct msghdr *kmsg,
-			    struct user_msghdr __user *umsg,
-			    struct sockaddr __user **save_addr,
-			    struct iovec __user **uiov, size_t *nsegs)
+int __copy_msghdr(struct msghdr *kmsg,
+		  struct user_msghdr *msg,
+		  struct sockaddr __user **save_addr)
 {
-	struct user_msghdr msg;
 	ssize_t err;
 
-	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
-		return -EFAULT;
-
 	kmsg->msg_control_is_user = true;
 	kmsg->msg_get_inq = 0;
-	kmsg->msg_control_user = msg.msg_control;
-	kmsg->msg_controllen = msg.msg_controllen;
-	kmsg->msg_flags = msg.msg_flags;
+	kmsg->msg_control_user = msg->msg_control;
+	kmsg->msg_controllen = msg->msg_controllen;
+	kmsg->msg_flags = msg->msg_flags;
 
-	kmsg->msg_namelen = msg.msg_namelen;
-	if (!msg.msg_name)
+	kmsg->msg_namelen = msg->msg_namelen;
+	if (!msg->msg_name)
 		kmsg->msg_namelen = 0;
 
 	if (kmsg->msg_namelen < 0)
@@ -2386,11 +2381,11 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
 		kmsg->msg_namelen = sizeof(struct sockaddr_storage);
 
 	if (save_addr)
-		*save_addr = msg.msg_name;
+		*save_addr = msg->msg_name;
 
-	if (msg.msg_name && kmsg->msg_namelen) {
+	if (msg->msg_name && kmsg->msg_namelen) {
 		if (!save_addr) {
-			err = move_addr_to_kernel(msg.msg_name,
+			err = move_addr_to_kernel(msg->msg_name,
 						  kmsg->msg_namelen,
 						  kmsg->msg_name);
 			if (err < 0)
@@ -2401,12 +2396,10 @@ int __copy_msghdr_from_user(struct msghdr *kmsg,
 		kmsg->msg_namelen = 0;
 	}
 
-	if (msg.msg_iovlen > UIO_MAXIOV)
+	if (msg->msg_iovlen > UIO_MAXIOV)
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
-	*uiov = msg.msg_iov;
-	*nsegs = msg.msg_iovlen;
 	return 0;
 }
 
@@ -2418,8 +2411,10 @@ static int copy_msghdr_from_user(struct msghdr *kmsg,
 	struct user_msghdr msg;
 	ssize_t err;
 
-	err = __copy_msghdr_from_user(kmsg, umsg, save_addr, &msg.msg_iov,
-					&msg.msg_iovlen);
+	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
+		return -EFAULT;
+
+	err = __copy_msghdr(kmsg, &msg, save_addr);
 	if (err)
 		return err;
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr
  2022-07-14 11:02 [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Dylan Yudaken
  2022-07-14 11:02 ` [PATCH v3 for-next 1/3] net: copy from user before calling __copy_msghdr Dylan Yudaken
@ 2022-07-14 11:02 ` Dylan Yudaken
       [not found]   ` <CGME20220715202859eucas1p1a336fd34a883adb96bde608ba2ca3a12@eucas1p1.samsung.com>
  2022-07-14 11:02 ` [PATCH v3 for-next 3/3] io_uring: support multishot in recvmsg Dylan Yudaken
  2022-07-14 16:28 ` [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Jens Axboe
  3 siblings, 1 reply; 10+ messages in thread
From: Dylan Yudaken @ 2022-07-14 11:02 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, davem, edumazet, kuba, pabeni,
	io-uring
  Cc: netdev, Kernel-team, Dylan Yudaken

this is in preparation for multishot receive from io_uring, where it needs
to have access to the original struct user_msghdr.

functionally this should be a no-op.

Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: Dylan Yudaken <[email protected]>
---
 include/net/compat.h |  5 ++---
 io_uring/net.c       | 17 +++++++++--------
 net/compat.c         | 39 +++++++++++++++++----------------------
 3 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/include/net/compat.h b/include/net/compat.h
index 595fee069b82..84c163f40f38 100644
--- a/include/net/compat.h
+++ b/include/net/compat.h
@@ -46,9 +46,8 @@ struct compat_rtentry {
 	unsigned short  rt_irtt;        /* Initial RTT                  */
 };
 
-int __get_compat_msghdr(struct msghdr *kmsg, struct compat_msghdr __user *umsg,
-			struct sockaddr __user **save_addr, compat_uptr_t *ptr,
-			compat_size_t *len);
+int __get_compat_msghdr(struct msghdr *kmsg, struct compat_msghdr *msg,
+			struct sockaddr __user **save_addr);
 int get_compat_msghdr(struct msghdr *, struct compat_msghdr __user *,
 		      struct sockaddr __user **, struct iovec **);
 int put_cmsg_compat(struct msghdr*, int, int, int, void *);
diff --git a/io_uring/net.c b/io_uring/net.c
index da7667ed3610..5bc3440a8290 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -369,24 +369,25 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 					struct io_async_msghdr *iomsg)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
+	struct compat_msghdr msg;
 	struct compat_iovec __user *uiov;
-	compat_uptr_t ptr;
-	compat_size_t len;
 	int ret;
 
-	ret = __get_compat_msghdr(&iomsg->msg, sr->umsg_compat, &iomsg->uaddr,
-				  &ptr, &len);
+	if (copy_from_user(&msg, sr->umsg_compat, sizeof(msg)))
+		return -EFAULT;
+
+	ret = __get_compat_msghdr(&iomsg->msg, sr->umsg_compat, &iomsg->uaddr);
 	if (ret)
 		return ret;
 
-	uiov = compat_ptr(ptr);
+	uiov = compat_ptr(msg.msg_iov);
 	if (req->flags & REQ_F_BUFFER_SELECT) {
 		compat_ssize_t clen;
 
-		if (len == 0) {
+		if (msg.msg_iovlen == 0) {
 			sr->len = 0;
 			iomsg->free_iov = NULL;
-		} else if (len > 1) {
+		} else if (msg.msg_iovlen > 1) {
 			return -EINVAL;
 		} else {
 			if (!access_ok(uiov, sizeof(*uiov)))
@@ -400,7 +401,7 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 		}
 	} else {
 		iomsg->free_iov = iomsg->fast_iov;
-		ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
+		ret = __import_iovec(READ, (struct iovec __user *)uiov, msg.msg_iovlen,
 				   UIO_FASTIOV, &iomsg->free_iov,
 				   &iomsg->msg.msg_iter, true);
 		if (ret < 0)
diff --git a/net/compat.c b/net/compat.c
index 210fc3b4d0d8..513aa9a3fc64 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -34,20 +34,15 @@
 #include <net/compat.h>
 
 int __get_compat_msghdr(struct msghdr *kmsg,
-			struct compat_msghdr __user *umsg,
-			struct sockaddr __user **save_addr,
-			compat_uptr_t *ptr, compat_size_t *len)
+			struct compat_msghdr *msg,
+			struct sockaddr __user **save_addr)
 {
-	struct compat_msghdr msg;
 	ssize_t err;
 
-	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
-		return -EFAULT;
-
-	kmsg->msg_flags = msg.msg_flags;
-	kmsg->msg_namelen = msg.msg_namelen;
+	kmsg->msg_flags = msg->msg_flags;
+	kmsg->msg_namelen = msg->msg_namelen;
 
-	if (!msg.msg_name)
+	if (!msg->msg_name)
 		kmsg->msg_namelen = 0;
 
 	if (kmsg->msg_namelen < 0)
@@ -57,15 +52,15 @@ int __get_compat_msghdr(struct msghdr *kmsg,
 		kmsg->msg_namelen = sizeof(struct sockaddr_storage);
 
 	kmsg->msg_control_is_user = true;
-	kmsg->msg_control_user = compat_ptr(msg.msg_control);
-	kmsg->msg_controllen = msg.msg_controllen;
+	kmsg->msg_control_user = compat_ptr(msg->msg_control);
+	kmsg->msg_controllen = msg->msg_controllen;
 
 	if (save_addr)
-		*save_addr = compat_ptr(msg.msg_name);
+		*save_addr = compat_ptr(msg->msg_name);
 
-	if (msg.msg_name && kmsg->msg_namelen) {
+	if (msg->msg_name && kmsg->msg_namelen) {
 		if (!save_addr) {
-			err = move_addr_to_kernel(compat_ptr(msg.msg_name),
+			err = move_addr_to_kernel(compat_ptr(msg->msg_name),
 						  kmsg->msg_namelen,
 						  kmsg->msg_name);
 			if (err < 0)
@@ -76,12 +71,10 @@ int __get_compat_msghdr(struct msghdr *kmsg,
 		kmsg->msg_namelen = 0;
 	}
 
-	if (msg.msg_iovlen > UIO_MAXIOV)
+	if (msg->msg_iovlen > UIO_MAXIOV)
 		return -EMSGSIZE;
 
 	kmsg->msg_iocb = NULL;
-	*ptr = msg.msg_iov;
-	*len = msg.msg_iovlen;
 	return 0;
 }
 
@@ -90,15 +83,17 @@ int get_compat_msghdr(struct msghdr *kmsg,
 		      struct sockaddr __user **save_addr,
 		      struct iovec **iov)
 {
-	compat_uptr_t ptr;
-	compat_size_t len;
+	struct compat_msghdr msg;
 	ssize_t err;
 
-	err = __get_compat_msghdr(kmsg, umsg, save_addr, &ptr, &len);
+	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
+		return -EFAULT;
+
+	err = __get_compat_msghdr(kmsg, umsg, save_addr);
 	if (err)
 		return err;
 
-	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr), len,
+	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(msg.msg_iov), msg.msg_iovlen,
 			   UIO_FASTIOV, iov, &kmsg->msg_iter);
 	return err < 0 ? err : 0;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v3 for-next 3/3] io_uring: support multishot in recvmsg
  2022-07-14 11:02 [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Dylan Yudaken
  2022-07-14 11:02 ` [PATCH v3 for-next 1/3] net: copy from user before calling __copy_msghdr Dylan Yudaken
  2022-07-14 11:02 ` [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr Dylan Yudaken
@ 2022-07-14 11:02 ` Dylan Yudaken
  2022-07-14 13:36   ` Jens Axboe
  2022-07-14 16:28 ` [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Jens Axboe
  3 siblings, 1 reply; 10+ messages in thread
From: Dylan Yudaken @ 2022-07-14 11:02 UTC (permalink / raw)
  To: Jens Axboe, Pavel Begunkov, davem, edumazet, kuba, pabeni,
	io-uring
  Cc: netdev, Kernel-team, Dylan Yudaken

Similar to multishot recv, this will require provided buffers to be
used. However recvmsg is much more complex than recv as it has multiple
outputs. Specifically flags, name, and control messages.

Support this by introducing a new struct io_uring_recvmsg_out with 4
fields. namelen, controllen and flags match the similar out fields in
msghdr from standard recvmsg(2), payloadlen is the length of the payload
following the header.
This struct is placed at the start of the returned buffer. Based on what
the user specifies in struct msghdr, the next bytes of the buffer will be
name (the next msg_namelen bytes), and then control (the next
msg_controllen bytes). The payload will come at the end. The return value
in the CQE is the total used size of the provided buffer.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 include/uapi/linux/io_uring.h |   7 ++
 io_uring/net.c                | 182 ++++++++++++++++++++++++++++++----
 io_uring/net.h                |   6 ++
 3 files changed, 176 insertions(+), 19 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 499679134961..4c9b11e2e991 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -613,4 +613,11 @@ struct io_uring_file_index_range {
 	__u64	resv;
 };
 
+struct io_uring_recvmsg_out {
+	__u32 namelen;
+	__u32 controllen;
+	__u32 payloadlen;
+	__u32 flags;
+};
+
 #endif
diff --git a/io_uring/net.c b/io_uring/net.c
index 5bc3440a8290..7a4707cbf863 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -325,6 +325,21 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
+static int io_recvmsg_multishot_overflow(struct io_async_msghdr *iomsg)
+{
+	unsigned long hdr;
+
+	if (check_add_overflow(sizeof(struct io_uring_recvmsg_out),
+			       (unsigned long)iomsg->namelen, &hdr))
+		return -EOVERFLOW;
+	if (check_add_overflow(hdr, iomsg->controllen, &hdr))
+		return -EOVERFLOW;
+	if (hdr > INT_MAX)
+		return -EOVERFLOW;
+
+	return 0;
+}
+
 static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 				 struct io_async_msghdr *iomsg)
 {
@@ -352,6 +367,13 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 			sr->len = iomsg->fast_iov[0].iov_len;
 			iomsg->free_iov = NULL;
 		}
+
+		if (req->flags & REQ_F_APOLL_MULTISHOT) {
+			iomsg->namelen = msg.msg_namelen;
+			iomsg->controllen = msg.msg_controllen;
+			if (io_recvmsg_multishot_overflow(iomsg))
+				return -EOVERFLOW;
+		}
 	} else {
 		iomsg->free_iov = iomsg->fast_iov;
 		ret = __import_iovec(READ, msg.msg_iov, msg.msg_iovlen, UIO_FASTIOV,
@@ -399,6 +421,13 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
 			sr->len = clen;
 			iomsg->free_iov = NULL;
 		}
+
+		if (req->flags & REQ_F_APOLL_MULTISHOT) {
+			iomsg->namelen = msg.msg_namelen;
+			iomsg->controllen = msg.msg_controllen;
+			if (io_recvmsg_multishot_overflow(iomsg))
+				return -EOVERFLOW;
+		}
 	} else {
 		iomsg->free_iov = iomsg->fast_iov;
 		ret = __import_iovec(READ, (struct iovec __user *)uiov, msg.msg_iovlen,
@@ -455,8 +484,6 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (sr->msg_flags & MSG_ERRQUEUE)
 		req->flags |= REQ_F_CLEAR_POLLIN;
 	if (sr->flags & IORING_RECV_MULTISHOT) {
-		if (req->opcode == IORING_OP_RECVMSG)
-			return -EINVAL;
 		if (!(req->flags & REQ_F_BUFFER_SELECT))
 			return -EINVAL;
 		if (sr->msg_flags & MSG_WAITALL)
@@ -483,12 +510,13 @@ static inline void io_recv_prep_retry(struct io_kiocb *req)
 }
 
 /*
- * Finishes io_recv
+ * Finishes io_recv and io_recvmsg.
  *
  * Returns true if it is actually finished, or false if it should run
  * again (for multishot).
  */
-static inline bool io_recv_finish(struct io_kiocb *req, int *ret, unsigned int cflags)
+static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
+				  unsigned int cflags, bool mshot_finished)
 {
 	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
 		io_req_set_res(req, *ret, cflags);
@@ -496,7 +524,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, unsigned int c
 		return true;
 	}
 
-	if (*ret > 0) {
+	if (!mshot_finished) {
 		if (io_post_aux_cqe(req->ctx, req->cqe.user_data, *ret,
 				    cflags | IORING_CQE_F_MORE, false)) {
 			io_recv_prep_retry(req);
@@ -518,6 +546,91 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret, unsigned int c
 	return true;
 }
 
+static int io_recvmsg_prep_multishot(struct io_async_msghdr *kmsg, struct io_sr_msg *sr,
+				     void __user **buf, size_t *len)
+{
+	unsigned long ubuf = (unsigned long)*buf;
+	unsigned long hdr;
+
+	hdr = sizeof(struct io_uring_recvmsg_out) + kmsg->namelen + kmsg->controllen;
+	if (*len < hdr)
+		return -EFAULT;
+
+	if (kmsg->controllen) {
+		unsigned long control = ubuf + hdr - kmsg->controllen;
+
+		kmsg->msg.msg_control_user = (void *)control;
+		kmsg->msg.msg_controllen = kmsg->controllen;
+	}
+
+	sr->buf = *buf; /* stash for later copy */
+	*buf = (void *)(ubuf + hdr);
+	kmsg->payloadlen = *len = *len - hdr;
+	return 0;
+}
+
+struct io_recvmsg_multishot_hdr {
+	struct io_uring_recvmsg_out msg;
+	struct sockaddr_storage addr;
+};
+
+static int io_recvmsg_multishot(
+	struct socket *sock,
+	struct io_sr_msg *io,
+	struct io_async_msghdr *kmsg,
+	unsigned int flags,
+	bool *finished)
+{
+	int err;
+	int copy_len;
+	struct io_recvmsg_multishot_hdr hdr;
+
+	if (kmsg->namelen)
+		kmsg->msg.msg_name = &hdr.addr;
+	kmsg->msg.msg_flags = flags & (MSG_CMSG_CLOEXEC|MSG_CMSG_COMPAT);
+	kmsg->msg.msg_namelen = 0;
+
+	if (sock->file->f_flags & O_NONBLOCK)
+		flags |= MSG_DONTWAIT;
+
+	err = sock_recvmsg(sock, &kmsg->msg, flags);
+	*finished = err <= 0;
+	if (err < 0)
+		return err;
+
+	hdr.msg = (struct io_uring_recvmsg_out) {
+		.controllen = kmsg->controllen - kmsg->msg.msg_controllen,
+		.flags = kmsg->msg.msg_flags & ~MSG_CMSG_COMPAT
+	};
+
+	hdr.msg.payloadlen = err;
+	if (err > kmsg->payloadlen)
+		err = kmsg->payloadlen;
+
+	copy_len = sizeof(struct io_uring_recvmsg_out);
+	if (kmsg->msg.msg_namelen > kmsg->namelen)
+		copy_len += kmsg->namelen;
+	else
+		copy_len += kmsg->msg.msg_namelen;
+
+	/*
+	 *      "fromlen shall refer to the value before truncation.."
+	 *                      1003.1g
+	 */
+	hdr.msg.namelen = kmsg->msg.msg_namelen;
+
+	/* ensure that there is no gap between hdr and sockaddr_storage */
+	BUILD_BUG_ON(offsetof(struct io_recvmsg_multishot_hdr, addr) !=
+		     sizeof(struct io_uring_recvmsg_out));
+	if (copy_to_user(io->buf, &hdr, copy_len)) {
+		*finished = true;
+		return -EFAULT;
+	}
+
+	return sizeof(struct io_uring_recvmsg_out) + kmsg->namelen +
+			kmsg->controllen + err;
+}
+
 int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
@@ -527,6 +640,7 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	unsigned flags;
 	int ret, min_ret = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+	bool mshot_finished = true;
 
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
@@ -545,16 +659,29 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_msg(req, kmsg, issue_flags);
 
+retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
+		size_t len = sr->len;
 
-		buf = io_buffer_select(req, &sr->len, issue_flags);
+		buf = io_buffer_select(req, &len, issue_flags);
 		if (!buf)
 			return -ENOBUFS;
+
+		if (req->flags & REQ_F_APOLL_MULTISHOT) {
+			ret = io_recvmsg_prep_multishot(kmsg, sr,
+							&buf, &len);
+
+			if (ret) {
+				io_kbuf_recycle(req, issue_flags);
+				return ret;
+			}
+		}
+
 		kmsg->fast_iov[0].iov_base = buf;
-		kmsg->fast_iov[0].iov_len = sr->len;
+		kmsg->fast_iov[0].iov_len = len;
 		iov_iter_init(&kmsg->msg.msg_iter, READ, kmsg->fast_iov, 1,
-				sr->len);
+				len);
 	}
 
 	flags = sr->msg_flags;
@@ -564,10 +691,22 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
 
 	kmsg->msg.msg_get_inq = 1;
-	ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg, kmsg->uaddr, flags);
+	if (req->flags & REQ_F_APOLL_MULTISHOT)
+		ret = io_recvmsg_multishot(sock, sr, kmsg, flags,
+					   &mshot_finished);
+	else
+		ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg, kmsg->uaddr, flags);
+
 	if (ret < min_ret) {
-		if (ret == -EAGAIN && force_nonblock)
-			return io_setup_async_msg(req, kmsg, issue_flags);
+		if (ret == -EAGAIN && force_nonblock) {
+			ret = io_setup_async_msg(req, kmsg, issue_flags);
+			if (ret == -EAGAIN && (req->flags & IO_APOLL_MULTI_POLLED) ==
+					       IO_APOLL_MULTI_POLLED) {
+				io_kbuf_recycle(req, issue_flags);
+				return IOU_ISSUE_SKIP_COMPLETE;
+			}
+			return ret;
+		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		if (ret > 0 && io_net_retry(sock, flags)) {
@@ -580,11 +719,6 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 		req_set_fail(req);
 	}
 
-	/* fast path, check for non-NULL to avoid function call */
-	if (kmsg->free_iov)
-		kfree(kmsg->free_iov);
-	io_netmsg_recycle(req, issue_flags);
-	req->flags &= ~REQ_F_NEED_CLEANUP;
 	if (ret > 0)
 		ret += sr->done_io;
 	else if (sr->done_io)
@@ -596,8 +730,18 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->msg.msg_inq)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
 
-	io_req_set_res(req, ret, cflags);
-	return IOU_OK;
+	if (!io_recv_finish(req, &ret, cflags, mshot_finished))
+		goto retry_multishot;
+
+	if (mshot_finished) {
+		io_netmsg_recycle(req, issue_flags);
+		/* fast path, check for non-NULL to avoid function call */
+		if (kmsg->free_iov)
+			kfree(kmsg->free_iov);
+		req->flags &= ~REQ_F_NEED_CLEANUP;
+	}
+
+	return ret;
 }
 
 int io_recv(struct io_kiocb *req, unsigned int issue_flags)
@@ -684,7 +828,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	if (msg.msg_inq)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
 
-	if (!io_recv_finish(req, &ret, cflags))
+	if (!io_recv_finish(req, &ret, cflags, ret <= 0))
 		goto retry_multishot;
 
 	return ret;
diff --git a/io_uring/net.h b/io_uring/net.h
index 178a6d8b76e0..db20ce9d6546 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -9,6 +9,12 @@
 struct io_async_msghdr {
 	union {
 		struct iovec		fast_iov[UIO_FASTIOV];
+		struct {
+			struct iovec	fast_iov_one;
+			__kernel_size_t	controllen;
+			int		namelen;
+			__kernel_size_t	payloadlen;
+		};
 		struct io_cache_entry	cache;
 	};
 	/* points to an allocated iov, if NULL we use fast_iov instead */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 for-next 3/3] io_uring: support multishot in recvmsg
  2022-07-14 11:02 ` [PATCH v3 for-next 3/3] io_uring: support multishot in recvmsg Dylan Yudaken
@ 2022-07-14 13:36   ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2022-07-14 13:36 UTC (permalink / raw)
  To: Dylan Yudaken, Pavel Begunkov, davem, edumazet, kuba, pabeni,
	io-uring
  Cc: netdev, Kernel-team

On 7/14/22 5:02 AM, Dylan Yudaken wrote:
> Similar to multishot recv, this will require provided buffers to be
> used. However recvmsg is much more complex than recv as it has multiple
> outputs. Specifically flags, name, and control messages.
> 
> Support this by introducing a new struct io_uring_recvmsg_out with 4
> fields. namelen, controllen and flags match the similar out fields in
> msghdr from standard recvmsg(2), payloadlen is the length of the payload
> following the header.
> This struct is placed at the start of the returned buffer. Based on what
> the user specifies in struct msghdr, the next bytes of the buffer will be
> name (the next msg_namelen bytes), and then control (the next
> msg_controllen bytes). The payload will come at the end. The return value
> in the CQE is the total used size of the provided buffer.

Just a few minor nits (some repeat ones too), otherwise looks fine to
me. I can either fold these in while applying, or you can spin a v4. Let
me know!

> +static int io_recvmsg_multishot_overflow(struct io_async_msghdr *iomsg)
> +{
> +	unsigned long hdr;
> +
> +	if (check_add_overflow(sizeof(struct io_uring_recvmsg_out),
> +			       (unsigned long)iomsg->namelen, &hdr))
> +		return -EOVERFLOW;
> +	if (check_add_overflow(hdr, iomsg->controllen, &hdr))
> +		return -EOVERFLOW;
> +	if (hdr > INT_MAX)
> +		return -EOVERFLOW;
> +
> +	return 0;
> +}

Nobody checks the specific value of this helper, so we should either
actually do that, or just make this one return a true/false instead. The
latter makes the most sense to me.

> +static int io_recvmsg_prep_multishot(struct io_async_msghdr *kmsg, struct io_sr_msg *sr,
> +				     void __user **buf, size_t *len)
> +{

The line breaks here are odd, should be at 80 unless there's a good
reason for it to exceed it.

Function reads better now though with the cast.

> +static int io_recvmsg_multishot(
> +	struct socket *sock,
> +	struct io_sr_msg *io,
> +	struct io_async_msghdr *kmsg,
> +	unsigned int flags,
> +	bool *finished)
> +{

This is still formatted badly.

> +		if (req->flags & REQ_F_APOLL_MULTISHOT) {
> +			ret = io_recvmsg_prep_multishot(kmsg, sr,
> +							&buf, &len);

			ret = io_recvmsg_prep_multishot(kmsg, sr, &buf, &len);

this still would fit nicely on an unbroken line.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 for-next 0/3] io_uring: multishot recvmsg
  2022-07-14 11:02 [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Dylan Yudaken
                   ` (2 preceding siblings ...)
  2022-07-14 11:02 ` [PATCH v3 for-next 3/3] io_uring: support multishot in recvmsg Dylan Yudaken
@ 2022-07-14 16:28 ` Jens Axboe
  3 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2022-07-14 16:28 UTC (permalink / raw)
  To: dylany, pabeni, kuba, edumazet, davem, io-uring, asml.silence
  Cc: Kernel-team, netdev

On Thu, 14 Jul 2022 04:02:55 -0700, Dylan Yudaken wrote:
> This series adds multishot support to recvmsg in io_uring.
> 
> The idea is that you submit a single multishot recvmsg and then receive
> completions as and when data arrives. For recvmsg each completion also has
> control data, and this is necessarily included in the same buffer as the
> payload.
> 
> [...]

Applied, thanks!

[1/3] net: copy from user before calling __copy_msghdr
      commit: 03a3f428042c7752afa5dc5ffe3cf0b8f0e2acbf
[2/3] net: copy from user before calling __get_compat_msghdr
      commit: 1a3e4e94a1b95edb37cf0f66d7dfb32beb61df7f
[3/3] io_uring: support multishot in recvmsg
      commit: a8b38c4ce7240d869c820d457bcd51e452149510

Best regards,
-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr
       [not found]   ` <CGME20220715202859eucas1p1a336fd34a883adb96bde608ba2ca3a12@eucas1p1.samsung.com>
@ 2022-07-15 20:28     ` Marek Szyprowski
  2022-07-15 20:37       ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2022-07-15 20:28 UTC (permalink / raw)
  To: Dylan Yudaken, Jens Axboe, Pavel Begunkov, davem, edumazet, kuba,
	pabeni, io-uring
  Cc: netdev, Kernel-team

Hi,

On 14.07.2022 13:02, Dylan Yudaken wrote:
> this is in preparation for multishot receive from io_uring, where it needs
> to have access to the original struct user_msghdr.
>
> functionally this should be a no-op.
>
> Acked-by: Paolo Abeni <[email protected]>
> Signed-off-by: Dylan Yudaken <[email protected]>

This patch landed in linux next-20220715 as commit 1a3e4e94a1b9 ("net: 
copy from user before calling __get_compat_msghdr"). Unfortunately it 
causes a serious regression on the ARM64 based Khadas VIM3l board:

Unable to handle kernel access to user memory outside uaccess routines 
at virtual address 00000000ffc4a5c8
Mem abort info:
   ESR = 0x000000009600000f
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x0f: level 3 permission fault
Data abort info:
   ISV = 0, ISS = 0x0000000f
   CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=0000000001909000
[00000000ffc4a5c8] pgd=0800000001a7b003, p4d=0800000001a7b003, 
pud=0800000001a0e003, pmd=0800000001913003, pte=00e800000b9baf43
Internal error: Oops: 9600000f [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 247 Comm: systemd-udevd Not tainted 5.19.0-rc6+ #12437
Hardware name: Khadas VIM3L (DT)
pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : get_compat_msghdr+0xd0/0x1b0
lr : get_compat_msghdr+0xcc/0x1b0
...
Call trace:
  get_compat_msghdr+0xd0/0x1b0
  ___sys_sendmsg+0xd0/0xe0
  __sys_sendmsg+0x68/0xc4
  __arm64_compat_sys_sendmsg+0x28/0x3c
  invoke_syscall+0x48/0x114
  el0_svc_common.constprop.0+0x60/0x11c
  do_el0_svc_compat+0x1c/0x50
  el0_svc_compat+0x58/0x100
  el0t_32_sync_handler+0x90/0x140
  el0t_32_sync+0x190/0x194
Code: d2800382 9100f3e0 97d9be02 b5fffd60 (b9401a60)
---[ end trace 0000000000000000 ]---

This happens only on the mentioned board, other my ARM64 test boards 
boot fine with next-20220715. Reverting this commit, together with 
2b0b67d55f13 ("fix up for "io_uring: support multishot in recvmsg"") and 
a8b38c4ce724 ("io_uring: support multishot in recvmsg") due to compile 
dependencies on top of next-20220715 fixes the issue.

Let me know how I can help fixing this issue.

> ---
>   include/net/compat.h |  5 ++---
>   io_uring/net.c       | 17 +++++++++--------
>   net/compat.c         | 39 +++++++++++++++++----------------------
>   3 files changed, 28 insertions(+), 33 deletions(-)
>
> diff --git a/include/net/compat.h b/include/net/compat.h
> index 595fee069b82..84c163f40f38 100644
> --- a/include/net/compat.h
> +++ b/include/net/compat.h
> @@ -46,9 +46,8 @@ struct compat_rtentry {
>   	unsigned short  rt_irtt;        /* Initial RTT                  */
>   };
>   
> -int __get_compat_msghdr(struct msghdr *kmsg, struct compat_msghdr __user *umsg,
> -			struct sockaddr __user **save_addr, compat_uptr_t *ptr,
> -			compat_size_t *len);
> +int __get_compat_msghdr(struct msghdr *kmsg, struct compat_msghdr *msg,
> +			struct sockaddr __user **save_addr);
>   int get_compat_msghdr(struct msghdr *, struct compat_msghdr __user *,
>   		      struct sockaddr __user **, struct iovec **);
>   int put_cmsg_compat(struct msghdr*, int, int, int, void *);
> diff --git a/io_uring/net.c b/io_uring/net.c
> index da7667ed3610..5bc3440a8290 100644
> --- a/io_uring/net.c
> +++ b/io_uring/net.c
> @@ -369,24 +369,25 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
>   					struct io_async_msghdr *iomsg)
>   {
>   	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
> +	struct compat_msghdr msg;
>   	struct compat_iovec __user *uiov;
> -	compat_uptr_t ptr;
> -	compat_size_t len;
>   	int ret;
>   
> -	ret = __get_compat_msghdr(&iomsg->msg, sr->umsg_compat, &iomsg->uaddr,
> -				  &ptr, &len);
> +	if (copy_from_user(&msg, sr->umsg_compat, sizeof(msg)))
> +		return -EFAULT;
> +
> +	ret = __get_compat_msghdr(&iomsg->msg, sr->umsg_compat, &iomsg->uaddr);
>   	if (ret)
>   		return ret;
>   
> -	uiov = compat_ptr(ptr);
> +	uiov = compat_ptr(msg.msg_iov);
>   	if (req->flags & REQ_F_BUFFER_SELECT) {
>   		compat_ssize_t clen;
>   
> -		if (len == 0) {
> +		if (msg.msg_iovlen == 0) {
>   			sr->len = 0;
>   			iomsg->free_iov = NULL;
> -		} else if (len > 1) {
> +		} else if (msg.msg_iovlen > 1) {
>   			return -EINVAL;
>   		} else {
>   			if (!access_ok(uiov, sizeof(*uiov)))
> @@ -400,7 +401,7 @@ static int __io_compat_recvmsg_copy_hdr(struct io_kiocb *req,
>   		}
>   	} else {
>   		iomsg->free_iov = iomsg->fast_iov;
> -		ret = __import_iovec(READ, (struct iovec __user *)uiov, len,
> +		ret = __import_iovec(READ, (struct iovec __user *)uiov, msg.msg_iovlen,
>   				   UIO_FASTIOV, &iomsg->free_iov,
>   				   &iomsg->msg.msg_iter, true);
>   		if (ret < 0)
> diff --git a/net/compat.c b/net/compat.c
> index 210fc3b4d0d8..513aa9a3fc64 100644
> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -34,20 +34,15 @@
>   #include <net/compat.h>
>   
>   int __get_compat_msghdr(struct msghdr *kmsg,
> -			struct compat_msghdr __user *umsg,
> -			struct sockaddr __user **save_addr,
> -			compat_uptr_t *ptr, compat_size_t *len)
> +			struct compat_msghdr *msg,
> +			struct sockaddr __user **save_addr)
>   {
> -	struct compat_msghdr msg;
>   	ssize_t err;
>   
> -	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
> -		return -EFAULT;
> -
> -	kmsg->msg_flags = msg.msg_flags;
> -	kmsg->msg_namelen = msg.msg_namelen;
> +	kmsg->msg_flags = msg->msg_flags;
> +	kmsg->msg_namelen = msg->msg_namelen;
>   
> -	if (!msg.msg_name)
> +	if (!msg->msg_name)
>   		kmsg->msg_namelen = 0;
>   
>   	if (kmsg->msg_namelen < 0)
> @@ -57,15 +52,15 @@ int __get_compat_msghdr(struct msghdr *kmsg,
>   		kmsg->msg_namelen = sizeof(struct sockaddr_storage);
>   
>   	kmsg->msg_control_is_user = true;
> -	kmsg->msg_control_user = compat_ptr(msg.msg_control);
> -	kmsg->msg_controllen = msg.msg_controllen;
> +	kmsg->msg_control_user = compat_ptr(msg->msg_control);
> +	kmsg->msg_controllen = msg->msg_controllen;
>   
>   	if (save_addr)
> -		*save_addr = compat_ptr(msg.msg_name);
> +		*save_addr = compat_ptr(msg->msg_name);
>   
> -	if (msg.msg_name && kmsg->msg_namelen) {
> +	if (msg->msg_name && kmsg->msg_namelen) {
>   		if (!save_addr) {
> -			err = move_addr_to_kernel(compat_ptr(msg.msg_name),
> +			err = move_addr_to_kernel(compat_ptr(msg->msg_name),
>   						  kmsg->msg_namelen,
>   						  kmsg->msg_name);
>   			if (err < 0)
> @@ -76,12 +71,10 @@ int __get_compat_msghdr(struct msghdr *kmsg,
>   		kmsg->msg_namelen = 0;
>   	}
>   
> -	if (msg.msg_iovlen > UIO_MAXIOV)
> +	if (msg->msg_iovlen > UIO_MAXIOV)
>   		return -EMSGSIZE;
>   
>   	kmsg->msg_iocb = NULL;
> -	*ptr = msg.msg_iov;
> -	*len = msg.msg_iovlen;
>   	return 0;
>   }
>   
> @@ -90,15 +83,17 @@ int get_compat_msghdr(struct msghdr *kmsg,
>   		      struct sockaddr __user **save_addr,
>   		      struct iovec **iov)
>   {
> -	compat_uptr_t ptr;
> -	compat_size_t len;
> +	struct compat_msghdr msg;
>   	ssize_t err;
>   
> -	err = __get_compat_msghdr(kmsg, umsg, save_addr, &ptr, &len);
> +	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
> +		return -EFAULT;
> +
> +	err = __get_compat_msghdr(kmsg, umsg, save_addr);
>   	if (err)
>   		return err;
>   
> -	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(ptr), len,
> +	err = import_iovec(save_addr ? READ : WRITE, compat_ptr(msg.msg_iov), msg.msg_iovlen,
>   			   UIO_FASTIOV, iov, &kmsg->msg_iter);
>   	return err < 0 ? err : 0;
>   }

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr
  2022-07-15 20:28     ` Marek Szyprowski
@ 2022-07-15 20:37       ` Jens Axboe
  2022-07-15 20:58         ` Marek Szyprowski
  0 siblings, 1 reply; 10+ messages in thread
From: Jens Axboe @ 2022-07-15 20:37 UTC (permalink / raw)
  To: Marek Szyprowski, Dylan Yudaken, Pavel Begunkov, davem, edumazet,
	kuba, pabeni, io-uring
  Cc: netdev, Kernel-team

On 7/15/22 2:28 PM, Marek Szyprowski wrote:
> Hi,
> 
> On 14.07.2022 13:02, Dylan Yudaken wrote:
>> this is in preparation for multishot receive from io_uring, where it needs
>> to have access to the original struct user_msghdr.
>>
>> functionally this should be a no-op.
>>
>> Acked-by: Paolo Abeni <[email protected]>
>> Signed-off-by: Dylan Yudaken <[email protected]>
> 
> This patch landed in linux next-20220715 as commit 1a3e4e94a1b9 ("net: 
> copy from user before calling __get_compat_msghdr"). Unfortunately it 
> causes a serious regression on the ARM64 based Khadas VIM3l board:
> 
> Unable to handle kernel access to user memory outside uaccess routines 
> at virtual address 00000000ffc4a5c8
> Mem abort info:
>    ESR = 0x000000009600000f
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x0f: level 3 permission fault
> Data abort info:
>    ISV = 0, ISS = 0x0000000f
>    CM = 0, WnR = 0
> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000001909000
> [00000000ffc4a5c8] pgd=0800000001a7b003, p4d=0800000001a7b003, 
> pud=0800000001a0e003, pmd=0800000001913003, pte=00e800000b9baf43
> Internal error: Oops: 9600000f [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 247 Comm: systemd-udevd Not tainted 5.19.0-rc6+ #12437
> Hardware name: Khadas VIM3L (DT)
> pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : get_compat_msghdr+0xd0/0x1b0
> lr : get_compat_msghdr+0xcc/0x1b0
> ...
> Call trace:
>   get_compat_msghdr+0xd0/0x1b0
>   ___sys_sendmsg+0xd0/0xe0
>   __sys_sendmsg+0x68/0xc4
>   __arm64_compat_sys_sendmsg+0x28/0x3c
>   invoke_syscall+0x48/0x114
>   el0_svc_common.constprop.0+0x60/0x11c
>   do_el0_svc_compat+0x1c/0x50
>   el0_svc_compat+0x58/0x100
>   el0t_32_sync_handler+0x90/0x140
>   el0t_32_sync+0x190/0x194
> Code: d2800382 9100f3e0 97d9be02 b5fffd60 (b9401a60)
> ---[ end trace 0000000000000000 ]---
> 
> This happens only on the mentioned board, other my ARM64 test boards 
> boot fine with next-20220715. Reverting this commit, together with 
> 2b0b67d55f13 ("fix up for "io_uring: support multishot in recvmsg"") and 
> a8b38c4ce724 ("io_uring: support multishot in recvmsg") due to compile 
> dependencies on top of next-20220715 fixes the issue.
> 
> Let me know how I can help fixing this issue.

How are you reproducing this?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr
  2022-07-15 20:37       ` Jens Axboe
@ 2022-07-15 20:58         ` Marek Szyprowski
  2022-07-15 21:25           ` Jens Axboe
  0 siblings, 1 reply; 10+ messages in thread
From: Marek Szyprowski @ 2022-07-15 20:58 UTC (permalink / raw)
  To: Jens Axboe, Dylan Yudaken, Pavel Begunkov, davem, edumazet, kuba,
	pabeni, io-uring
  Cc: netdev, Kernel-team

Hi,

On 15.07.2022 22:37, Jens Axboe wrote:
> On 7/15/22 2:28 PM, Marek Szyprowski wrote:
>> On 14.07.2022 13:02, Dylan Yudaken wrote:
>>> this is in preparation for multishot receive from io_uring, where it needs
>>> to have access to the original struct user_msghdr.
>>>
>>> functionally this should be a no-op.
>>>
>>> Acked-by: Paolo Abeni <[email protected]>
>>> Signed-off-by: Dylan Yudaken <[email protected]>
>> This patch landed in linux next-20220715 as commit 1a3e4e94a1b9 ("net:
>> copy from user before calling __get_compat_msghdr"). Unfortunately it
>> causes a serious regression on the ARM64 based Khadas VIM3l board:
>>
>> Unable to handle kernel access to user memory outside uaccess routines
>> at virtual address 00000000ffc4a5c8
>> Mem abort info:
>>     ESR = 0x000000009600000f
>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>     SET = 0, FnV = 0
>>     EA = 0, S1PTW = 0
>>     FSC = 0x0f: level 3 permission fault
>> Data abort info:
>>     ISV = 0, ISS = 0x0000000f
>>     CM = 0, WnR = 0
>> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000001909000
>> [00000000ffc4a5c8] pgd=0800000001a7b003, p4d=0800000001a7b003,
>> pud=0800000001a0e003, pmd=0800000001913003, pte=00e800000b9baf43
>> Internal error: Oops: 9600000f [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 0 PID: 247 Comm: systemd-udevd Not tainted 5.19.0-rc6+ #12437
>> Hardware name: Khadas VIM3L (DT)
>> pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : get_compat_msghdr+0xd0/0x1b0
>> lr : get_compat_msghdr+0xcc/0x1b0
>> ...
>> Call trace:
>>    get_compat_msghdr+0xd0/0x1b0
>>    ___sys_sendmsg+0xd0/0xe0
>>    __sys_sendmsg+0x68/0xc4
>>    __arm64_compat_sys_sendmsg+0x28/0x3c
>>    invoke_syscall+0x48/0x114
>>    el0_svc_common.constprop.0+0x60/0x11c
>>    do_el0_svc_compat+0x1c/0x50
>>    el0_svc_compat+0x58/0x100
>>    el0t_32_sync_handler+0x90/0x140
>>    el0t_32_sync+0x190/0x194
>> Code: d2800382 9100f3e0 97d9be02 b5fffd60 (b9401a60)
>> ---[ end trace 0000000000000000 ]---
>>
>> This happens only on the mentioned board, other my ARM64 test boards
>> boot fine with next-20220715. Reverting this commit, together with
>> 2b0b67d55f13 ("fix up for "io_uring: support multishot in recvmsg"") and
>> a8b38c4ce724 ("io_uring: support multishot in recvmsg") due to compile
>> dependencies on top of next-20220715 fixes the issue.
>>
>> Let me know how I can help fixing this issue.
> How are you reproducing this?

This happens always during system boot on the mentioned board, when udev 
starts discovering devices. The complete boot log is here:

https://pastebin.com/i8WzFzcx

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr
  2022-07-15 20:58         ` Marek Szyprowski
@ 2022-07-15 21:25           ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2022-07-15 21:25 UTC (permalink / raw)
  To: Marek Szyprowski, Dylan Yudaken, Pavel Begunkov, davem, edumazet,
	kuba, pabeni, io-uring
  Cc: netdev, Kernel-team

On 7/15/22 2:58 PM, Marek Szyprowski wrote:
> Hi,
> 
> On 15.07.2022 22:37, Jens Axboe wrote:
>> On 7/15/22 2:28 PM, Marek Szyprowski wrote:
>>> On 14.07.2022 13:02, Dylan Yudaken wrote:
>>>> this is in preparation for multishot receive from io_uring, where it needs
>>>> to have access to the original struct user_msghdr.
>>>>
>>>> functionally this should be a no-op.
>>>>
>>>> Acked-by: Paolo Abeni <[email protected]>
>>>> Signed-off-by: Dylan Yudaken <[email protected]>
>>> This patch landed in linux next-20220715 as commit 1a3e4e94a1b9 ("net:
>>> copy from user before calling __get_compat_msghdr"). Unfortunately it
>>> causes a serious regression on the ARM64 based Khadas VIM3l board:
>>>
>>> Unable to handle kernel access to user memory outside uaccess routines
>>> at virtual address 00000000ffc4a5c8
>>> Mem abort info:
>>>     ESR = 0x000000009600000f
>>>     EC = 0x25: DABT (current EL), IL = 32 bits
>>>     SET = 0, FnV = 0
>>>     EA = 0, S1PTW = 0
>>>     FSC = 0x0f: level 3 permission fault
>>> Data abort info:
>>>     ISV = 0, ISS = 0x0000000f
>>>     CM = 0, WnR = 0
>>> user pgtable: 4k pages, 48-bit VAs, pgdp=0000000001909000
>>> [00000000ffc4a5c8] pgd=0800000001a7b003, p4d=0800000001a7b003,
>>> pud=0800000001a0e003, pmd=0800000001913003, pte=00e800000b9baf43
>>> Internal error: Oops: 9600000f [#1] PREEMPT SMP
>>> Modules linked in:
>>> CPU: 0 PID: 247 Comm: systemd-udevd Not tainted 5.19.0-rc6+ #12437
>>> Hardware name: Khadas VIM3L (DT)
>>> pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>> pc : get_compat_msghdr+0xd0/0x1b0
>>> lr : get_compat_msghdr+0xcc/0x1b0
>>> ...
>>> Call trace:
>>>    get_compat_msghdr+0xd0/0x1b0
>>>    ___sys_sendmsg+0xd0/0xe0
>>>    __sys_sendmsg+0x68/0xc4
>>>    __arm64_compat_sys_sendmsg+0x28/0x3c
>>>    invoke_syscall+0x48/0x114
>>>    el0_svc_common.constprop.0+0x60/0x11c
>>>    do_el0_svc_compat+0x1c/0x50
>>>    el0_svc_compat+0x58/0x100
>>>    el0t_32_sync_handler+0x90/0x140
>>>    el0t_32_sync+0x190/0x194
>>> Code: d2800382 9100f3e0 97d9be02 b5fffd60 (b9401a60)
>>> ---[ end trace 0000000000000000 ]---
>>>
>>> This happens only on the mentioned board, other my ARM64 test boards
>>> boot fine with next-20220715. Reverting this commit, together with
>>> 2b0b67d55f13 ("fix up for "io_uring: support multishot in recvmsg"") and
>>> a8b38c4ce724 ("io_uring: support multishot in recvmsg") due to compile
>>> dependencies on top of next-20220715 fixes the issue.
>>>
>>> Let me know how I can help fixing this issue.
>> How are you reproducing this?
> 
> This happens always during system boot on the mentioned board, when udev 
> starts discovering devices. The complete boot log is here:
> 
> https://pastebin.com/i8WzFzcx

Does this help?


diff --git a/net/compat.c b/net/compat.c
index 513aa9a3fc64..ed880729d159 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -89,7 +89,7 @@ int get_compat_msghdr(struct msghdr *kmsg,
 	if (copy_from_user(&msg, umsg, sizeof(*umsg)))
 		return -EFAULT;
 
-	err = __get_compat_msghdr(kmsg, umsg, save_addr);
+	err = __get_compat_msghdr(kmsg, &msg, save_addr);
 	if (err)
 		return err;
 

-- 
Jens Axboe


^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-07-15 21:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-07-14 11:02 [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Dylan Yudaken
2022-07-14 11:02 ` [PATCH v3 for-next 1/3] net: copy from user before calling __copy_msghdr Dylan Yudaken
2022-07-14 11:02 ` [PATCH v3 for-next 2/3] net: copy from user before calling __get_compat_msghdr Dylan Yudaken
     [not found]   ` <CGME20220715202859eucas1p1a336fd34a883adb96bde608ba2ca3a12@eucas1p1.samsung.com>
2022-07-15 20:28     ` Marek Szyprowski
2022-07-15 20:37       ` Jens Axboe
2022-07-15 20:58         ` Marek Szyprowski
2022-07-15 21:25           ` Jens Axboe
2022-07-14 11:02 ` [PATCH v3 for-next 3/3] io_uring: support multishot in recvmsg Dylan Yudaken
2022-07-14 13:36   ` Jens Axboe
2022-07-14 16:28 ` [PATCH v3 for-next 0/3] io_uring: multishot recvmsg Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox