public inbox for [email protected]
* [PATCH 0/5] io_uring: support IORING_OP_BIND and IORING_OP_LISTEN
@ 2024-05-31 21:12 Gabriel Krisman Bertazi
  2024-05-31 21:12 ` [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails Gabriel Krisman Bertazi
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 21:12 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, netdev, Gabriel Krisman Bertazi

Following a discussion at LSFMM, this patchset introduces two new
io_uring operations for bind(2) and listen(2).

The goal is to provide functional parity of registered files and direct
file descriptors with regular fds for io_uring network operations.  The
cool outcome is that we can kickstart a network server solely with
io_uring operations.
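
For reference, the synchronous sequence that these operations are meant
to mirror can be sketched in plain C.  This only illustrates the existing
socket(2), bind(2) and listen(2) calls, not the new opcodes; the helper
name start_listener_sync is hypothetical and not part of this series.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: classic synchronous setup of a listening TCP
 * socket.  With this series, the bind and listen steps could be
 * submitted as io_uring requests instead of being issued as syscalls.
 * Returns the listening fd, or -1 on error.
 */
static int start_listener_sync(int backlog)
{
	struct sockaddr_in addr;
	int fd;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_ANY);
	addr.sin_port = htons(0);	/* let the kernel pick a free port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(fd, backlog) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would use the returned fd with accept(2) or an io_uring accept
request, and close it when done.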

This feature has been requested several times in the past, including
at:

  https://github.com/axboe/liburing/issues/941

Regarding parameter organization within the SQE, specifically for
bind(2), I'm following the implementation of IORING_OP_CONNECT.  So, even
though addr_len is expected to be an integer in the original syscall, I
pass it through addr2, to match IORING_OP_CONNECT.  Other than that, the
implementation is quite straightforward.
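
As a rough illustration of that layout, the sketch below fills a mock
SQE the way the bind prep handler in this series reads it: the sockaddr
pointer goes in addr, the length goes in addr2, and len stays zero.  The
struct mock_sqe and mock_prep_bind names are hypothetical stand-ins, not
the real struct io_uring_sqe or any liburing helper.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for the few SQE fields the bind operation
 * looks at.  This is NOT the real struct io_uring_sqe layout.
 */
struct mock_sqe {
	int32_t  fd;
	uint64_t addr;	/* user pointer to the sockaddr */
	uint64_t addr2;	/* address length, same slot connect uses */
	uint32_t len;	/* must remain zero for bind */
};

/* Hypothetical helper: fill the mock SQE for a bind request. */
static void mock_prep_bind(struct mock_sqe *sqe, int fd,
			   const struct sockaddr *sa, socklen_t salen)
{
	memset(sqe, 0, sizeof(*sqe));
	sqe->fd = fd;
	sqe->addr = (uint64_t)(uintptr_t)sa;
	sqe->addr2 = salen;
}
```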

Patch 1 fixes a memleak in IORING_OP_CONNECT that you might want to
apply ahead of the rest of the patchset; patches 2 and 3 adapt the net/
side in preparation to support invocations from io_uring; patches 4 and 5
add the io_uring boilerplate.

I wrote liburing support, including tests. I'll follow with those
patches shortly.

Gabriel Krisman Bertazi (5):
  io_uring: Fix leak of async data when connect prep fails
  net: Split a __sys_bind helper for io_uring
  net: Split a __sys_listen helper for io_uring
  io_uring: Introduce IORING_OP_BIND
  io_uring: Introduce IORING_OP_LISTEN

 include/linux/socket.h        |  3 ++
 include/uapi/linux/io_uring.h |  2 +
 io_uring/net.c                | 78 ++++++++++++++++++++++++++++++++++-
 io_uring/net.h                |  6 +++
 io_uring/opdef.c              | 26 ++++++++++++
 net/socket.c                  | 48 +++++++++++++--------
 6 files changed, 144 insertions(+), 19 deletions(-)

-- 
2.44.0



* [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails
  2024-05-31 21:12 [PATCH 0/5] io_uring: support IORING_OP_BIND and IORING_OP_LISTEN Gabriel Krisman Bertazi
@ 2024-05-31 21:12 ` Gabriel Krisman Bertazi
  2024-05-31 21:30   ` Jens Axboe
  2024-05-31 21:12 ` [PATCH 2/5] net: Split a __sys_bind helper for io_uring Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 21:12 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, netdev, Gabriel Krisman Bertazi

move_addr_to_kernel() can fail, for example if the user provides a bad
sockaddr pointer. When the failure happens in ->prep(), we don't have a
chance to clean up the request later, so handle it here.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 io_uring/net.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 0a48596429d9..c3377e70aeeb 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1657,6 +1657,7 @@ int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_connect *conn = io_kiocb_to_cmd(req, struct io_connect);
 	struct io_async_msghdr *io;
+	int ret;
 
 	if (sqe->len || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in)
 		return -EINVAL;
@@ -1669,7 +1670,10 @@ int io_connect_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	if (unlikely(!io))
 		return -ENOMEM;
 
-	return move_addr_to_kernel(conn->addr, conn->addr_len, &io->addr);
+	ret = move_addr_to_kernel(conn->addr, conn->addr_len, &io->addr);
+	if (ret)
+		io_netmsg_recycle(req, 0);
+	return ret;
 }
 
 int io_connect(struct io_kiocb *req, unsigned int issue_flags)
-- 
2.44.0



* [PATCH 2/5] net: Split a __sys_bind helper for io_uring
  2024-05-31 21:12 [PATCH 0/5] io_uring: support IORING_OP_BIND and IORING_OP_LISTEN Gabriel Krisman Bertazi
  2024-05-31 21:12 ` [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails Gabriel Krisman Bertazi
@ 2024-05-31 21:12 ` Gabriel Krisman Bertazi
  2024-05-31 22:38   ` Jens Axboe
  2024-05-31 21:12 ` [PATCH 3/5] net: Split a __sys_listen " Gabriel Krisman Bertazi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 21:12 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, netdev, Gabriel Krisman Bertazi

io_uring holds a reference to the file and maintains a
sockaddr_storage address.  Similarly to what was done to
__sys_connect_file, split an internal helper for __sys_bind in
preparation for supporting an io_uring bind command.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 include/linux/socket.h |  2 ++
 net/socket.c           | 25 ++++++++++++++++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 89d16b90370b..b3000f49e9f5 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -442,6 +442,8 @@ extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 extern int __sys_socket(int family, int type, int protocol);
 extern struct file *__sys_socket_file(int family, int type, int protocol);
 extern int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen);
+extern int __sys_bind_socket(struct socket *sock, struct sockaddr_storage *address,
+			     int addrlen);
 extern int __sys_connect_file(struct file *file, struct sockaddr_storage *addr,
 			      int addrlen, int file_flags);
 extern int __sys_connect(int fd, struct sockaddr __user *uservaddr,
diff --git a/net/socket.c b/net/socket.c
index e416920e9399..fd0714e10ced 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1822,6 +1822,20 @@ SYSCALL_DEFINE4(socketpair, int, family, int, type, int, protocol,
 	return __sys_socketpair(family, type, protocol, usockvec);
 }
 
+int __sys_bind_socket(struct socket *sock, struct sockaddr_storage *address,
+		      int addrlen)
+{
+	int err;
+
+	err = security_socket_bind(sock, (struct sockaddr *)address,
+				   addrlen);
+	if (!err)
+		err = READ_ONCE(sock->ops)->bind(sock,
+						 (struct sockaddr *)address,
+						 addrlen);
+	return err;
+}
+
 /*
  *	Bind a name to a socket. Nothing much to do here since it's
  *	the protocol's responsibility to handle the local address.
@@ -1839,15 +1853,8 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (sock) {
 		err = move_addr_to_kernel(umyaddr, addrlen, &address);
-		if (!err) {
-			err = security_socket_bind(sock,
-						   (struct sockaddr *)&address,
-						   addrlen);
-			if (!err)
-				err = READ_ONCE(sock->ops)->bind(sock,
-						      (struct sockaddr *)
-						      &address, addrlen);
-		}
+		if (!err)
+			err = __sys_bind_socket(sock, &address, addrlen);
 		fput_light(sock->file, fput_needed);
 	}
 	return err;
-- 
2.44.0



* [PATCH 3/5] net: Split a __sys_listen helper for io_uring
  2024-05-31 21:12 [PATCH 0/5] io_uring: support IORING_OP_BIND and IORING_OP_LISTEN Gabriel Krisman Bertazi
  2024-05-31 21:12 ` [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails Gabriel Krisman Bertazi
  2024-05-31 21:12 ` [PATCH 2/5] net: Split a __sys_bind helper for io_uring Gabriel Krisman Bertazi
@ 2024-05-31 21:12 ` Gabriel Krisman Bertazi
  2024-05-31 22:38   ` Jens Axboe
  2024-05-31 21:12 ` [PATCH 4/5] io_uring: Introduce IORING_OP_BIND Gabriel Krisman Bertazi
  2024-05-31 21:12 ` [PATCH 5/5] io_uring: Introduce IORING_OP_LISTEN Gabriel Krisman Bertazi
  4 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 21:12 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, netdev, Gabriel Krisman Bertazi

io_uring holds a reference to the file and maintains a sockaddr_storage
address.  Similarly to what was done to __sys_connect_file, split an
internal helper for __sys_listen in preparation for supporting an
io_uring listen command.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 include/linux/socket.h |  1 +
 net/socket.c           | 23 ++++++++++++++---------
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index b3000f49e9f5..c1f16cdab677 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -449,6 +449,7 @@ extern int __sys_connect_file(struct file *file, struct sockaddr_storage *addr,
 extern int __sys_connect(int fd, struct sockaddr __user *uservaddr,
 			 int addrlen);
 extern int __sys_listen(int fd, int backlog);
+extern int __sys_listen_socket(struct socket *sock, int backlog);
 extern int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
 			     int __user *usockaddr_len);
 extern int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
diff --git a/net/socket.c b/net/socket.c
index fd0714e10ced..fcbdd5bc47ac 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1870,23 +1870,28 @@ SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
  *	necessary for a listen, and if that works, we mark the socket as
  *	ready for listening.
  */
+int __sys_listen_socket(struct socket *sock, int backlog)
+{
+	int somaxconn, err;
+
+	somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
+	if ((unsigned int)backlog > somaxconn)
+		backlog = somaxconn;
+
+	err = security_socket_listen(sock, backlog);
+	if (!err)
+		err = READ_ONCE(sock->ops)->listen(sock, backlog);
+	return err;
+}
 
 int __sys_listen(int fd, int backlog)
 {
 	struct socket *sock;
 	int err, fput_needed;
-	int somaxconn;
 
 	sock = sockfd_lookup_light(fd, &err, &fput_needed);
 	if (sock) {
-		somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn);
-		if ((unsigned int)backlog > somaxconn)
-			backlog = somaxconn;
-
-		err = security_socket_listen(sock, backlog);
-		if (!err)
-			err = READ_ONCE(sock->ops)->listen(sock, backlog);
-
+		err = __sys_listen_socket(sock, backlog);
 		fput_light(sock->file, fput_needed);
 	}
 	return err;
-- 
2.44.0
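
The helper split out above keeps the backlog clamp from __sys_listen.  A
userspace sketch of that clamp follows; here somaxconn is passed in as a
parameter, which is an assumption made for illustration (the kernel reads
it from the per-netns sysctl), and clamp_backlog is a hypothetical name.

```c
/*
 * Userspace sketch of the backlog clamp performed by the listen
 * helper.  The somaxconn limit is a plain parameter here.
 */
static int clamp_backlog(int backlog, int somaxconn)
{
	/*
	 * The unsigned comparison also catches negative backlogs:
	 * they convert to very large unsigned values and end up
	 * clamped to somaxconn.
	 */
	if ((unsigned int)backlog > (unsigned int)somaxconn)
		backlog = somaxconn;
	return backlog;
}
```

So a backlog within the limit passes through unchanged, while an
oversized or negative one is replaced by the limit.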



* [PATCH 4/5] io_uring: Introduce IORING_OP_BIND
  2024-05-31 21:12 [PATCH 0/5] io_uring: support IORING_OP_BIND and IORING_OP_LISTEN Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2024-05-31 21:12 ` [PATCH 3/5] net: Split a __sys_listen " Gabriel Krisman Bertazi
@ 2024-05-31 21:12 ` Gabriel Krisman Bertazi
  2024-05-31 22:30   ` Jens Axboe
  2024-05-31 21:12 ` [PATCH 5/5] io_uring: Introduce IORING_OP_LISTEN Gabriel Krisman Bertazi
  4 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 21:12 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, netdev, Gabriel Krisman Bertazi

IORING_OP_BIND provides the semantics of bind(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully with io_uring registered and
descriptorless files.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/net.c                | 42 +++++++++++++++++++++++++++++++++++
 io_uring/net.h                |  3 +++
 io_uring/opdef.c              | 13 +++++++++++
 4 files changed, 59 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 994bf7af0efe..4ef153d95c87 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -257,6 +257,7 @@ enum io_uring_op {
 	IORING_OP_FUTEX_WAITV,
 	IORING_OP_FIXED_FD_INSTALL,
 	IORING_OP_FTRUNCATE,
+	IORING_OP_BIND,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/net.c b/io_uring/net.c
index c3377e70aeeb..1ac193f92ff6 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -51,6 +51,11 @@ struct io_connect {
 	bool				seen_econnaborted;
 };
 
+struct io_bind {
+	struct file			*file;
+	int				addr_len;
+};
+
 struct io_sr_msg {
 	struct file			*file;
 	union {
@@ -1719,6 +1724,43 @@ int io_connect(struct io_kiocb *req, unsigned int issue_flags)
 	return IOU_OK;
 }
 
+int io_bind_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_bind *bind = io_kiocb_to_cmd(req, struct io_bind);
+	struct sockaddr __user *uaddr;
+	struct io_async_msghdr *io;
+	int ret;
+
+	if (sqe->len || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in)
+		return -EINVAL;
+
+	uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr));
+	bind->addr_len =  READ_ONCE(sqe->addr2);
+
+	io = io_msg_alloc_async(req);
+	if (unlikely(!io))
+		return -ENOMEM;
+
+	ret = move_addr_to_kernel(uaddr, bind->addr_len, &io->addr);
+	if (ret)
+		io_req_msg_cleanup(req, 0);
+	return ret;
+}
+
+int io_bind(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_bind *bind = io_kiocb_to_cmd(req, struct io_bind);
+	struct io_async_msghdr *io = req->async_data;
+	int ret;
+
+	ret = __sys_bind_socket(sock_from_file(req->file),  &io->addr, bind->addr_len);
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_set_res(req, ret, 0);
+
+	return 0;
+}
+
 void io_netmsg_cache_free(const void *entry)
 {
 	struct io_async_msghdr *kmsg = (struct io_async_msghdr *) entry;
diff --git a/io_uring/net.h b/io_uring/net.h
index 0eb1c1920fc9..49f9a7bc1113 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -49,6 +49,9 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags);
 int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 void io_send_zc_cleanup(struct io_kiocb *req);
 
+int io_bind_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_bind(struct io_kiocb *req, unsigned int issue_flags);
+
 void io_netmsg_cache_free(const void *entry);
 #else
 static inline void io_netmsg_cache_free(const void *entry)
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 2de5cca9504e..19ee9445f024 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -495,6 +495,16 @@ const struct io_issue_def io_issue_defs[] = {
 		.prep			= io_ftruncate_prep,
 		.issue			= io_ftruncate,
 	},
+	[IORING_OP_BIND] = {
+#if defined(CONFIG_NET)
+		.needs_file		= 1,
+		.prep			= io_bind_prep,
+		.issue			= io_bind,
+		.async_size		= sizeof(struct io_async_msghdr),
+#else
+		.prep			= io_eopnotsupp_prep,
+#endif
+	},
 };
 
 const struct io_cold_def io_cold_defs[] = {
@@ -711,6 +721,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_FTRUNCATE] = {
 		.name			= "FTRUNCATE",
 	},
+	[IORING_OP_BIND] = {
+		.name			= "BIND",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
-- 
2.44.0



* [PATCH 5/5] io_uring: Introduce IORING_OP_LISTEN
  2024-05-31 21:12 [PATCH 0/5] io_uring: support IORING_OP_BIND and IORING_OP_LISTEN Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2024-05-31 21:12 ` [PATCH 4/5] io_uring: Introduce IORING_OP_BIND Gabriel Krisman Bertazi
@ 2024-05-31 21:12 ` Gabriel Krisman Bertazi
  2024-05-31 22:31   ` Jens Axboe
  4 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 21:12 UTC (permalink / raw)
  To: axboe; +Cc: io-uring, netdev, Gabriel Krisman Bertazi

IORING_OP_LISTEN provides the semantics of listen(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully with io_uring registered and
descriptorless files.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/net.c                | 30 ++++++++++++++++++++++++++++++
 io_uring/net.h                |  3 +++
 io_uring/opdef.c              | 13 +++++++++++++
 4 files changed, 47 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 4ef153d95c87..2aaf7ee256ac 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -258,6 +258,7 @@ enum io_uring_op {
 	IORING_OP_FIXED_FD_INSTALL,
 	IORING_OP_FTRUNCATE,
 	IORING_OP_BIND,
+	IORING_OP_LISTEN,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/net.c b/io_uring/net.c
index 1ac193f92ff6..e39754b5278f 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -56,6 +56,11 @@ struct io_bind {
 	int				addr_len;
 };
 
+struct io_listen {
+	struct file			*file;
+	int				backlog;
+};
+
 struct io_sr_msg {
 	struct file			*file;
 	union {
@@ -1761,6 +1766,31 @@ int io_bind(struct io_kiocb *req, unsigned int issue_flags)
 	return 0;
 }
 
+int io_listen_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_listen *listen = io_kiocb_to_cmd(req, struct io_listen);
+
+	if (sqe->addr || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in || sqe->addr2)
+		return -EINVAL;
+
+	listen->backlog = READ_ONCE(sqe->len);
+
+	return 0;
+}
+
+int io_listen(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_listen *listen = io_kiocb_to_cmd(req, struct io_listen);
+	int ret;
+
+	ret = __sys_listen_socket(sock_from_file(req->file), listen->backlog);
+	if (ret < 0)
+		req_set_fail(req);
+	io_req_set_res(req, ret, 0);
+
+	return 0;
+}
+
 void io_netmsg_cache_free(const void *entry)
 {
 	struct io_async_msghdr *kmsg = (struct io_async_msghdr *) entry;
diff --git a/io_uring/net.h b/io_uring/net.h
index 49f9a7bc1113..52bfee05f06a 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -52,6 +52,9 @@ void io_send_zc_cleanup(struct io_kiocb *req);
 int io_bind_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_bind(struct io_kiocb *req, unsigned int issue_flags);
 
+int io_listen_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_listen(struct io_kiocb *req, unsigned int issue_flags);
+
 void io_netmsg_cache_free(const void *entry);
 #else
 static inline void io_netmsg_cache_free(const void *entry)
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 19ee9445f024..7d5c51fb8e6e 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -503,6 +503,16 @@ const struct io_issue_def io_issue_defs[] = {
 		.async_size		= sizeof(struct io_async_msghdr),
 #else
 		.prep			= io_eopnotsupp_prep,
+#endif
+	},
+	[IORING_OP_LISTEN] = {
+#if defined(CONFIG_NET)
+		.needs_file		= 1,
+		.prep			= io_listen_prep,
+		.issue			= io_listen,
+		.async_size		= sizeof(struct io_async_msghdr),
+#else
+		.prep			= io_eopnotsupp_prep,
 #endif
 	},
 };
@@ -724,6 +734,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_BIND] = {
 		.name			= "BIND",
 	},
+	[IORING_OP_LISTEN] = {
+		.name			= "LISTEN",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)
-- 
2.44.0
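
The field checks done by the listen prep handler above can be modelled in
userspace roughly as follows: the backlog travels in len, and the other
fields must be zero or the request is rejected with -EINVAL.  The struct
mock_listen_sqe and mock_listen_prep names are hypothetical and only
mirror the fields the handler inspects, not the real SQE layout.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in holding only the fields the listen prep
 * handler checks.  This is NOT the real struct io_uring_sqe.
 */
struct mock_listen_sqe {
	uint64_t addr;
	uint64_t addr2;
	uint32_t len;		/* carries the backlog */
	uint32_t rw_flags;
	uint16_t buf_index;
	int32_t  splice_fd_in;
};

/* Validate the mock SQE and extract the backlog, as the prep step does. */
static int mock_listen_prep(const struct mock_listen_sqe *sqe, int *backlog)
{
	if (sqe->addr || sqe->buf_index || sqe->rw_flags ||
	    sqe->splice_fd_in || sqe->addr2)
		return -EINVAL;

	*backlog = (int)sqe->len;
	return 0;
}
```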



* Re: [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails
  2024-05-31 21:12 ` [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails Gabriel Krisman Bertazi
@ 2024-05-31 21:30   ` Jens Axboe
  2024-05-31 23:01     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2024-05-31 21:30 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, netdev

On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
> move_addr_to_kernel can fail, like if the user provides a bad sockaddr
> pointer. In this case where the failure happens on ->prep() we don't
> have a chance to clean the request later, so handle it here.

Hmm, that should still get freed in the cleanup path? It'll eventually
go on the compl_reqs list, and it has REQ_F_ASYNC_DATA set. Yes it'll
be slower than recycling it, but that should not matter as it's
an erred request.
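
A toy userspace model of that argument, for illustration only: prep
allocates the async data and sets a flag, then fails without cleaning
up, and a generic completion step frees whatever the flag says is still
attached.  Everything here (struct toy_req, TOY_F_ASYNC_DATA, the toy_*
functions) is hypothetical and greatly simplified; it is not the
io_uring code.

```c
#include <errno.h>
#include <stdlib.h>

#define TOY_F_ASYNC_DATA 0x1u	/* stand-in for REQ_F_ASYNC_DATA */

/* Hypothetical, heavily simplified request. */
struct toy_req {
	unsigned int flags;
	void *async_data;
};

/* Prep allocates async data, then fails as a bad sockaddr would. */
static int toy_prep_fail(struct toy_req *req)
{
	req->async_data = malloc(64);
	if (!req->async_data)
		return -ENOMEM;
	req->flags |= TOY_F_ASYNC_DATA;
	return -EFAULT;		/* no cleanup done here on purpose */
}

/* Generic completion-side cleanup keyed off the flag. */
static void toy_complete(struct toy_req *req)
{
	if (req->flags & TOY_F_ASYNC_DATA) {
		free(req->async_data);
		req->async_data = NULL;
		req->flags &= ~TOY_F_ASYNC_DATA;
	}
}
```

In this model nothing is leaked as long as every failed request still
reaches the completion step.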

-- 
Jens Axboe




* Re: [PATCH 4/5] io_uring: Introduce IORING_OP_BIND
  2024-05-31 21:12 ` [PATCH 4/5] io_uring: Introduce IORING_OP_BIND Gabriel Krisman Bertazi
@ 2024-05-31 22:30   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-05-31 22:30 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, netdev

On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
> +int io_bind_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> +	struct io_bind *bind = io_kiocb_to_cmd(req, struct io_bind);
> +	struct sockaddr __user *uaddr;
> +	struct io_async_msghdr *io;
> +	int ret;
> +
> +	if (sqe->len || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in)
> +		return -EINVAL;
> +
> +	uaddr = u64_to_user_ptr(READ_ONCE(sqe->addr));
> +	bind->addr_len =  READ_ONCE(sqe->addr2);
> +
> +	io = io_msg_alloc_async(req);
> +	if (unlikely(!io))
> +		return -ENOMEM;
> +
> +	ret = move_addr_to_kernel(uaddr, bind->addr_len, &io->addr);
> +	if (ret)
> +		io_req_msg_cleanup(req, 0);
> +	return ret;
> +}

As mentioned in the other patch, I think this can just be:

	return move_addr_to_kernel(uaddr, bind->addr_len, &io->addr);
}

and have normal cleanup take care of it.

> +int io_bind(struct io_kiocb *req, unsigned int issue_flags)
> +{
> +	struct io_bind *bind = io_kiocb_to_cmd(req, struct io_bind);
> +	struct io_async_msghdr *io = req->async_data;
> +	int ret;
> +
> +	ret = __sys_bind_socket(sock_from_file(req->file),  &io->addr, bind->addr_len);
> +	if (ret < 0)
> +		req_set_fail(req);
> +	io_req_set_res(req, ret, 0);
> +
> +	return 0;
> +}

Kill the empty line before return.

Outside of those minor nits, patch looks good!

-- 
Jens Axboe



* Re: [PATCH 5/5] io_uring: Introduce IORING_OP_LISTEN
  2024-05-31 21:12 ` [PATCH 5/5] io_uring: Introduce IORING_OP_LISTEN Gabriel Krisman Bertazi
@ 2024-05-31 22:31   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-05-31 22:31 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, netdev

On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
> @@ -1761,6 +1766,31 @@ int io_bind(struct io_kiocb *req, unsigned int issue_flags)
>  	return 0;
>  }
>  
> +int io_listen_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
> +{
> +	struct io_listen *listen = io_kiocb_to_cmd(req, struct io_listen);
> +
> +	if (sqe->addr || sqe->buf_index || sqe->rw_flags || sqe->splice_fd_in || sqe->addr2)
> +		return -EINVAL;
> +
> +	listen->backlog = READ_ONCE(sqe->len);
> +
> +	return 0;
> +}

Extra empty line.

> +
> +int io_listen(struct io_kiocb *req, unsigned int issue_flags)
> +{
> +	struct io_listen *listen = io_kiocb_to_cmd(req, struct io_listen);
> +	int ret;
> +
> +	ret = __sys_listen_socket(sock_from_file(req->file), listen->backlog);
> +	if (ret < 0)
> +		req_set_fail(req);
> +	io_req_set_res(req, ret, 0);
> +
> +	return 0;
> +}

Extra empty line.

Outside of that, looks good!

-- 
Jens Axboe



* Re: [PATCH 3/5] net: Split a __sys_listen helper for io_uring
  2024-05-31 21:12 ` [PATCH 3/5] net: Split a __sys_listen " Gabriel Krisman Bertazi
@ 2024-05-31 22:38   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-05-31 22:38 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, netdev

On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
> io_uring holds a reference to the file and maintains a sockaddr_storage
> address.  Similarly to what was done to __sys_connect_file, split an
> internal helper for __sys_listen in preparation to support an
> io_uring listen command.

Reviewed-by: Jens Axboe <[email protected]>

-- 
Jens Axboe



* Re: [PATCH 2/5] net: Split a __sys_bind helper for io_uring
  2024-05-31 21:12 ` [PATCH 2/5] net: Split a __sys_bind helper for io_uring Gabriel Krisman Bertazi
@ 2024-05-31 22:38   ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-05-31 22:38 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, netdev

On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
> io_uring holds a reference to the file and maintains a
> sockaddr_storage address.  Similarly to what was done to
> __sys_connect_file, split an internal helper for __sys_bind in
> preparation to supporting an io_uring bind command.

Reviewed-by: Jens Axboe <[email protected]>

-- 
Jens Axboe




* Re: [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails
  2024-05-31 21:30   ` Jens Axboe
@ 2024-05-31 23:01     ` Gabriel Krisman Bertazi
  2024-05-31 23:07       ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-05-31 23:01 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, netdev

Jens Axboe <[email protected]> writes:

> On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
>> move_addr_to_kernel can fail, like if the user provides a bad sockaddr
>> pointer. In this case where the failure happens on ->prep() we don't
>> have a chance to clean the request later, so handle it here.
>
> Hmm, that should still get freed in the cleanup path? It'll eventually
> go on the compl_reqs list, and it has REQ_F_ASYNC_DATA set. Yes it'll
> be slower than the recycling it, but that should not matter as it's
> an erred request.

Hm right.  I actually managed to reproduce some kind of memory
exhaustion yesterday that I thought was fixed by this patch.  But I see
your point and I'm failing to trigger it today.

Please disregard this patch. I'll look further to figure out what I did
there.


-- 
Gabriel Krisman Bertazi


* Re: [PATCH 1/5] io_uring: Fix leak of async data when connect prep fails
  2024-05-31 23:01     ` Gabriel Krisman Bertazi
@ 2024-05-31 23:07       ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2024-05-31 23:07 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, netdev

On 5/31/24 5:01 PM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <[email protected]> writes:
> 
>> On 5/31/24 3:12 PM, Gabriel Krisman Bertazi wrote:
>>> move_addr_to_kernel can fail, like if the user provides a bad sockaddr
>>> pointer. In this case where the failure happens on ->prep() we don't
>>> have a chance to clean the request later, so handle it here.
>>
>> Hmm, that should still get freed in the cleanup path? It'll eventually
>> go on the compl_reqs list, and it has REQ_F_ASYNC_DATA set. Yes it'll
>> be slower than the recycling it, but that should not matter as it's
>> an erred request.
> 
> Hm right.  I actually managed to reproduce some kind of memory
> exhaustion yesterday that I thought was fixed by this patch.  But I see
> your point and I'm failing to trigger it today.
> 
> Please disregard this patch. I'll look further to figure out what I did
> there.

Maybe enable KMEMLEAK? It's pretty handy for testing. If there is a leak
there, you should be able to reliably get info by doing:

# ./reproducer (should be easy, just bogus addr)
# echo scan > /sys/kernel/debug/kmemleak
# sleep 5
# echo scan > /sys/kernel/debug/kmemleak
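
The "bogus addr" failure itself can be observed with the synchronous
bind(2), which goes through the same move_addr_to_kernel() step.  The
sketch below is not the io_uring reproducer, only a way to see the error
a bad sockaddr pointer produces; on Linux it is expected to report
EFAULT.  The function name bind_with_bogus_addr is hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Returns the errno from bind(2) when it is handed an unmapped
 * sockaddr pointer, 0 if bind unexpectedly succeeds, or -1 if the
 * socket could not be created.
 */
static int bind_with_bogus_addr(void)
{
	int fd, err = 0;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Address 1 is not a valid user mapping, so the copy-in fails. */
	if (bind(fd, (const struct sockaddr *)(uintptr_t)1,
		 sizeof(struct sockaddr_un)) < 0)
		err = errno;

	close(fd);
	return err;
}
```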

-- 
Jens Axboe

