public inbox for [email protected]
* [PATCHSET v4 0/9] Support for provided buffers for send
@ 2024-02-26 19:21 Jens Axboe
  2024-02-26 19:21 ` [PATCH 1/9] net: remove {revc,send}msg_copy_msghdr() from exports Jens Axboe
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken

Hi,

We never supported provided buffers for sends, because it didn't seem
to make a lot of sense. But it actually does make a lot of sense! If
an app is receiving data, doing something with it, and then sending
either the same or another buffer out based on that, then if we use
provided buffers for sends we can guarantee that the sends are
serialized. This is because provided buffer rings are FIFO ordered,
as it's a ring buffer, and hence it doesn't really matter if you
have more than one send inflight.

This provides a nice efficiency win, but more importantly, it reduces
the complexity in the application as it no longer needs to track a
potential backlog of sends. The app just sets up a send based buffer
ring, exactly like it does for incoming data. And that's it, no more
dealing with serialized sends.

In some testing with proxy [1], in basic shuffling of packets I see a
68% improvement with this over manually dealing with serializing sends.
That's a pretty big win on top of making the app simpler. Using
multishot brings a further improvement of about 10% on top of that.

You can also find the patches here:

https://git.kernel.dk/cgit/linux/log/?h=io_uring-send-queue

[1] https://git.kernel.dk/cgit/liburing/tree/examples/proxy.c

Changes since v3:

- Drop MSG_MORE patch, separate thing anyway. Moved the flags hunk into
  the respective send/sendmsg patches, where they actually belonged.
- Rename IORING_FEAT_SEND_BUFS to IORING_FEAT_SEND_BUF_SELECT
- Enable MSG_WAITALL for send multishot. If set, then we retry via
  poll, if not set, we terminate the multishot sequence. This also
  fixes send multishot with short send in general.
- Add other networking related patch for recv/recvmsg multishot,
  managing IORING_CQE_F_MORE better rather than always needing to hit
  -ENOBUFS to terminate.

-- 
Jens Axboe



* [PATCH 1/9] net: remove {revc,send}msg_copy_msghdr() from exports
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 2/9] io_uring/net: add provided buffer support for IORING_OP_SEND Jens Axboe
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

The only user of these was io_uring, and it's not using them anymore.
Make them static and remove them from the socket header file.

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/socket.h                         |  7 -------
 net/socket.c                                   | 14 +++++++-------
 tools/perf/trace/beauty/include/linux/socket.h |  7 -------
 3 files changed, 7 insertions(+), 21 deletions(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index cfcb7e2c3813..139c330ccf2c 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -422,13 +422,6 @@ extern long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			       struct user_msghdr __user *umsg,
 			       struct sockaddr __user *uaddr,
 			       unsigned int flags);
-extern int sendmsg_copy_msghdr(struct msghdr *msg,
-			       struct user_msghdr __user *umsg, unsigned flags,
-			       struct iovec **iov);
-extern int recvmsg_copy_msghdr(struct msghdr *msg,
-			       struct user_msghdr __user *umsg, unsigned flags,
-			       struct sockaddr __user **uaddr,
-			       struct iovec **iov);
 extern int __copy_msghdr(struct msghdr *kmsg,
 			 struct user_msghdr *umsg,
 			 struct sockaddr __user **save_addr);
diff --git a/net/socket.c b/net/socket.c
index ed3df2f749bf..0f5d5079fd91 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2600,9 +2600,9 @@ static int ____sys_sendmsg(struct socket *sock, struct msghdr *msg_sys,
 	return err;
 }
 
-int sendmsg_copy_msghdr(struct msghdr *msg,
-			struct user_msghdr __user *umsg, unsigned flags,
-			struct iovec **iov)
+static int sendmsg_copy_msghdr(struct msghdr *msg,
+			       struct user_msghdr __user *umsg, unsigned flags,
+			       struct iovec **iov)
 {
 	int err;
 
@@ -2753,10 +2753,10 @@ SYSCALL_DEFINE4(sendmmsg, int, fd, struct mmsghdr __user *, mmsg,
 	return __sys_sendmmsg(fd, mmsg, vlen, flags, true);
 }
 
-int recvmsg_copy_msghdr(struct msghdr *msg,
-			struct user_msghdr __user *umsg, unsigned flags,
-			struct sockaddr __user **uaddr,
-			struct iovec **iov)
+static int recvmsg_copy_msghdr(struct msghdr *msg,
+			       struct user_msghdr __user *umsg, unsigned flags,
+			       struct sockaddr __user **uaddr,
+			       struct iovec **iov)
 {
 	ssize_t err;
 
diff --git a/tools/perf/trace/beauty/include/linux/socket.h b/tools/perf/trace/beauty/include/linux/socket.h
index cfcb7e2c3813..139c330ccf2c 100644
--- a/tools/perf/trace/beauty/include/linux/socket.h
+++ b/tools/perf/trace/beauty/include/linux/socket.h
@@ -422,13 +422,6 @@ extern long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
 			       struct user_msghdr __user *umsg,
 			       struct sockaddr __user *uaddr,
 			       unsigned int flags);
-extern int sendmsg_copy_msghdr(struct msghdr *msg,
-			       struct user_msghdr __user *umsg, unsigned flags,
-			       struct iovec **iov);
-extern int recvmsg_copy_msghdr(struct msghdr *msg,
-			       struct user_msghdr __user *umsg, unsigned flags,
-			       struct sockaddr __user **uaddr,
-			       struct iovec **iov);
 extern int __copy_msghdr(struct msghdr *kmsg,
 			 struct user_msghdr *umsg,
 			 struct sockaddr __user **save_addr);
-- 
2.43.0



* [PATCH 2/9] io_uring/net: add provided buffer support for IORING_OP_SEND
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
  2024-02-26 19:21 ` [PATCH 1/9] net: remove {revc,send}msg_copy_msghdr() from exports Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 3/9] io_uring/net: add provided buffer support for IORING_OP_SENDMSG Jens Axboe
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

It's pretty trivial to wire up provided buffer support for the send
side, just like we do on the receive side. This enables setting up
a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB, queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.
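
The two scenarios above can be modelled in plain userspace C without any
kernel involvement. This is only a toy sketch: struct buf_ring, ring_add(),
ring_pick() and wire_send() are made-up names for illustration, not
io_uring or liburing APIs. The point it shows is that when the payload is
picked from a FIFO ring at execution time, the order in which the send
requests happen to run no longer decides the order of the data.

```c
#include <assert.h>
#include <string.h>

#define RING_ENTRIES 8	/* power of two, like a provided buffer ring */

/* Toy model: the app appends at tail, a send consumes from head
 * at the time it actually executes. */
struct buf_ring {
	const char *bufs[RING_ENTRIES];
	unsigned head, tail;
};

static void ring_add(struct buf_ring *r, const char *data)
{
	assert(r->tail - r->head < RING_ENTRIES);	/* ring not full */
	r->bufs[r->tail++ & (RING_ENTRIES - 1)] = data;
}

static const char *ring_pick(struct buf_ring *r)
{
	if (r->head == r->tail)
		return NULL;	/* the kernel side would return -ENOBUFS */
	return r->bufs[r->head++ & (RING_ENTRIES - 1)];
}

/* "wire" records payloads in the order they reach the socket.
 * The caller must make it large enough for everything sent. */
static void wire_send(char *wire, const char *data)
{
	assert(data);
	strcat(wire, data);
}

/* Each send carries its own buffer; sendB executes before sendA. */
static void run_without_ring(char *wire)
{
	const char *send_a = "A", *send_b = "B";

	wire[0] = '\0';
	wire_send(wire, send_b);	/* sendB completes inline */
	wire_send(wire, send_a);	/* sendA is retried later */
}

/* Same execution order, but the payload comes from the ring. */
static void run_with_ring(char *wire)
{
	struct buf_ring r = { .head = 0, .tail = 0 };

	wire[0] = '\0';
	ring_add(&r, "A");
	ring_add(&r, "B");
	wire_send(wire, ring_pick(&r));	/* sendB runs first, picks dataA */
	wire_send(wire, ring_pick(&r));	/* sendA retry picks dataB */
}
```

Calling run_without_ring() leaves the payloads reordered in the wire
buffer, while run_with_ring() keeps them in the order the app queued them.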

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Adds IORING_FEAT_SEND_BUF_SELECT as a feature flag.
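
An application would probe for this before relying on it. A minimal sketch,
assuming the flag keeps the name and bit value from this patch (it is
defined locally below because installed headers may not have it), and
assuming 'features' is the io_uring_params.features value filled in at ring
setup. has_feature() and send_buf_select_supported() are hypothetical
helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value as defined by this patch; an assumption for any kernel or
 * header set that did not pick it up under this name and bit. */
#ifndef IORING_FEAT_SEND_BUF_SELECT
#define IORING_FEAT_SEND_BUF_SELECT	(1U << 14)
#endif

static bool has_feature(uint32_t features, uint32_t flag)
{
	/* a feature flag is expected to be exactly one bit */
	assert(flag != 0 && (flag & (flag - 1)) == 0);
	return (features & flag) != 0;
}

/* 'features' is io_uring_params.features after ring setup */
static bool send_buf_select_supported(uint32_t features)
{
	return has_feature(features, IORING_FEAT_SEND_BUF_SELECT);
}
```

If the bit is not set, the app would fall back to tracking and serializing
its own send backlog.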

Signed-off-by: Jens Axboe <[email protected]>
---
 include/uapi/linux/io_uring.h |  1 +
 io_uring/io_uring.c           |  3 ++-
 io_uring/net.c                | 19 ++++++++++++++++---
 io_uring/opdef.c              |  1 +
 4 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7bd10201a02b..061147cdfbb0 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -522,6 +522,7 @@ struct io_uring_params {
 #define IORING_FEAT_CQE_SKIP		(1U << 11)
 #define IORING_FEAT_LINKED_FILE		(1U << 12)
 #define IORING_FEAT_REG_REG_RING	(1U << 13)
+#define IORING_FEAT_SEND_BUF_SELECT	(1U << 14)
 
 /*
  * io_uring_register(2) opcodes and arguments
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cf2f514b7cc0..edd7cded1a80 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3962,7 +3962,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 			IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
 			IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
 			IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
-			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;
+			IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING |
+			IORING_FEAT_SEND_BUF_SELECT;
 
 	if (copy_to_user(params, p, sizeof(*p))) {
 		ret = -EFAULT;
diff --git a/io_uring/net.c b/io_uring/net.c
index fcbaeb7cc045..c6a24973352e 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -436,6 +436,7 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
+	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -489,7 +490,8 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
-	io_req_set_res(req, ret, 0);
+	cflags = io_put_kbuf(req, issue_flags);
+	io_req_set_res(req, ret, cflags);
 	return IOU_OK;
 }
 
@@ -497,8 +499,9 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct sockaddr_storage __address;
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-	struct msghdr msg;
+	size_t len = sr->len;
 	struct socket *sock;
+	struct msghdr msg;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -531,7 +534,17 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
-	ret = import_ubuf(ITER_SOURCE, sr->buf, sr->len, &msg.msg_iter);
+	if (io_do_buffer_select(req)) {
+		void __user *buf;
+
+		buf = io_buffer_select(req, &len, issue_flags);
+		if (!buf)
+			return -ENOBUFS;
+		sr->buf = buf;
+		sr->len = len;
+	}
+
+	ret = import_ubuf(ITER_SOURCE, sr->buf, len, &msg.msg_iter);
 	if (unlikely(ret))
 		return ret;
 
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9c080aadc5a6..88fbe5cfd379 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -273,6 +273,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.audit_skip		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.buffer_select		= 1,
 #if defined(CONFIG_NET)
 		.prep			= io_sendmsg_prep,
 		.issue			= io_send,
-- 
2.43.0



* [PATCH 3/9] io_uring/net: add provided buffer support for IORING_OP_SENDMSG
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
  2024-02-26 19:21 ` [PATCH 1/9] net: remove {revc,send}msg_copy_msghdr() from exports Jens Axboe
  2024-02-26 19:21 ` [PATCH 2/9] io_uring/net: add provided buffer support for IORING_OP_SEND Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 4/9] io_uring/kbuf: flag request if buffer pool is empty after buffer pick Jens Axboe
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

Adds provided buffer support for sendmsg as well, see the previous commit
that added it to IORING_OP_SEND for a longer explanation of why this
makes sense.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/net.c   | 15 ++++++++++++++-
 io_uring/opdef.c |  1 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index c6a24973352e..679eefcd11c5 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -459,6 +459,17 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_msg(req, kmsg, issue_flags);
 
+	if (io_do_buffer_select(req)) {
+		void __user *buf;
+		size_t len = sr->len;
+
+		buf = io_buffer_select(req, &len, issue_flags);
+		if (!buf)
+			return -ENOBUFS;
+
+		iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_SOURCE, buf, len);
+	}
+
 	flags = sr->msg_flags;
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		flags |= MSG_DONTWAIT;
@@ -502,6 +513,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	size_t len = sr->len;
 	struct socket *sock;
 	struct msghdr msg;
+	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -576,7 +588,8 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
-	io_req_set_res(req, ret, 0);
+	cflags = io_put_kbuf(req, issue_flags);
+	io_req_set_res(req, ret, cflags);
 	return IOU_OK;
 }
 
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 88fbe5cfd379..1f6b09e61ef8 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -139,6 +139,7 @@ const struct io_issue_def io_issue_defs[] = {
 		.pollout		= 1,
 		.ioprio			= 1,
 		.manual_alloc		= 1,
+		.buffer_select		= 1,
 #if defined(CONFIG_NET)
 		.prep			= io_sendmsg_prep,
 		.issue			= io_sendmsg,
-- 
2.43.0



* [PATCH 4/9] io_uring/kbuf: flag request if buffer pool is empty after buffer pick
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
                   ` (2 preceding siblings ...)
  2024-02-26 19:21 ` [PATCH 3/9] io_uring/net: add provided buffer support for IORING_OP_SENDMSG Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 5/9] io_uring/net: avoid redundant -ENOBUFS on recv multishot retry Jens Axboe
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

Normally we do an extra roundtrip for retries even if the buffer pool is
depleted, as we don't check that upfront. Rather than add this check, have
the buffer selection methods mark the request with REQ_F_BL_EMPTY if the
used buffer group is out of buffers after this selection. This is very
cheap to do once we're all the way inside there anyway, and it gives the
caller a chance to make better decisions on how to proceed.

For example, recv/recvmsg multishot could check this flag when it
decides whether to keep receiving or not.
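
The "empty after this pick" idea for the ring case can be sketched in
userspace as follows. struct toy_ring and toy_ring_pick() are stand-ins
made up for illustration, not the kernel's buffer list code. The sketch
casts the incremented head back to 16 bits so the comparison also holds
when head wraps around; that cast is a choice made here and is not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the ring state, not struct io_buffer_list. */
struct toy_ring {
	uint16_t head;	/* next entry to consume */
	uint16_t tail;	/* one past the last entry the app published */
};

/*
 * Consume one entry. Returns false if the ring is empty. Otherwise
 * sets *now_empty when the entry just taken was the last one.
 */
static bool toy_ring_pick(struct toy_ring *r, bool *now_empty)
{
	assert(now_empty);
	if (r->tail == r->head)
		return false;
	/* keep the comparison in 16 bits so a wrap at 0xffff works */
	*now_empty = (uint16_t)(r->head + 1) == r->tail;
	r->head++;
	return true;
}
```

A caller that sees now_empty set knows a retry would find no buffer, and
can skip it.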

Signed-off-by: Jens Axboe <[email protected]>
---
 include/linux/io_uring_types.h |  3 +++
 io_uring/kbuf.c                | 10 ++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index bd7071aeec5d..d8111d64812b 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -480,6 +480,7 @@ enum {
 	REQ_F_POLL_NO_LAZY_BIT,
 	REQ_F_CANCEL_SEQ_BIT,
 	REQ_F_CAN_POLL_BIT,
+	REQ_F_BL_EMPTY_BIT,
 
 	/* not a real bit, just to check we're not overflowing the space */
 	__REQ_F_LAST_BIT,
@@ -556,6 +557,8 @@ enum {
 	REQ_F_CANCEL_SEQ	= IO_REQ_FLAG(REQ_F_CANCEL_SEQ_BIT),
 	/* file is pollable */
 	REQ_F_CAN_POLL		= IO_REQ_FLAG(REQ_F_CAN_POLL_BIT),
+	/* buffer list was empty after selection of buffer */
+	REQ_F_BL_EMPTY		= IO_REQ_FLAG(REQ_F_BL_EMPTY_BIT),
 };
 
 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index ee866d646997..3d257ed9031b 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -139,6 +139,8 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 		list_del(&kbuf->list);
 		if (*len == 0 || *len > kbuf->len)
 			*len = kbuf->len;
+		if (list_empty(&bl->buf_list))
+			req->flags |= REQ_F_BL_EMPTY;
 		req->flags |= REQ_F_BUFFER_SELECTED;
 		req->kbuf = kbuf;
 		req->buf_index = kbuf->bid;
@@ -152,12 +154,16 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 					  unsigned int issue_flags)
 {
 	struct io_uring_buf_ring *br = bl->buf_ring;
+	__u16 tail, head = bl->head;
 	struct io_uring_buf *buf;
-	__u16 head = bl->head;
 
-	if (unlikely(smp_load_acquire(&br->tail) == head))
+	tail = smp_load_acquire(&br->tail);
+	if (unlikely(tail == head))
 		return NULL;
 
+	if (head + 1 == tail)
+		req->flags |= REQ_F_BL_EMPTY;
+
 	head &= bl->mask;
 	/* mmaped buffers are always contig */
 	if (bl->is_mmap || head < IO_BUFFER_LIST_BUF_PER_PAGE) {
-- 
2.43.0



* [PATCH 5/9] io_uring/net: avoid redundant -ENOBUFS on recv multishot retry
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
                   ` (3 preceding siblings ...)
  2024-02-26 19:21 ` [PATCH 4/9] io_uring/kbuf: flag request if buffer pool is empty after buffer pick Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 6/9] io_uring/net: move recv/recvmsg flags out of retry loop Jens Axboe
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

Now that we know if the buffer list is empty upfront, there's no point
doing a retry for that case. This can help avoid a redundant -ENOBUFS
which would terminate the multishot receive, requiring the app to
re-arm it.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/net.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/io_uring/net.c b/io_uring/net.c
index 679eefcd11c5..aaab4f121b7f 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -698,6 +698,11 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 		int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
 
 		io_recv_prep_retry(req);
+
+		/* buffer list now empty, no point trying again */
+		if (req->flags & REQ_F_BL_EMPTY)
+			goto enobufs;
+
 		/* Known not-empty or unknown state, retry */
 		if (cflags & IORING_CQE_F_SOCK_NONEMPTY || msg->msg_inq == -1) {
 			if (sr->nr_multishot_loops++ < MULTISHOT_MAX_RETRY)
@@ -706,6 +711,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 			sr->nr_multishot_loops = 0;
 			mshot_retry_ret = IOU_REQUEUE;
 		}
+enobufs:
 		if (issue_flags & IO_URING_F_MULTISHOT)
 			*ret = mshot_retry_ret;
 		else
-- 
2.43.0



* [PATCH 6/9] io_uring/net: move recv/recvmsg flags out of retry loop
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
                   ` (4 preceding siblings ...)
  2024-02-26 19:21 ` [PATCH 5/9] io_uring/net: avoid redundant -ENOBUFS on recv multishot retry Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 7/9] io_uring/net: add generic multishot retry helper Jens Axboe
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

The flags don't change, just initialize them once rather than every loop
for multishot.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/net.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index aaab4f121b7f..c73e4cd246ab 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -843,6 +843,10 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (!io_check_multishot(req, issue_flags))
 		return io_setup_async_msg(req, kmsg, issue_flags);
 
+	flags = sr->msg_flags;
+	if (force_nonblock)
+		flags |= MSG_DONTWAIT;
+
 retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
@@ -863,10 +867,6 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 		iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_DEST, buf, len);
 	}
 
-	flags = sr->msg_flags;
-	if (force_nonblock)
-		flags |= MSG_DONTWAIT;
-
 	kmsg->msg.msg_get_inq = 1;
 	kmsg->msg.msg_inq = -1;
 	if (req->flags & REQ_F_APOLL_MULTISHOT) {
@@ -952,6 +952,10 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_iocb = NULL;
 	msg.msg_ubuf = NULL;
 
+	flags = sr->msg_flags;
+	if (force_nonblock)
+		flags |= MSG_DONTWAIT;
+
 retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
@@ -970,9 +974,6 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	msg.msg_inq = -1;
 	msg.msg_flags = 0;
 
-	flags = sr->msg_flags;
-	if (force_nonblock)
-		flags |= MSG_DONTWAIT;
 	if (flags & MSG_WAITALL)
 		min_ret = iov_iter_count(&msg.msg_iter);
 
-- 
2.43.0



* [PATCH 7/9] io_uring/net: add generic multishot retry helper
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
                   ` (5 preceding siblings ...)
  2024-02-26 19:21 ` [PATCH 6/9] io_uring/net: move recv/recvmsg flags out of retry loop Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 8/9] io_uring/net: support multishot for send Jens Axboe
  2024-02-26 19:21 ` [PATCH 9/9] io_uring/net: support multishot for sendmsg Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

This is just moving io_recv_prep_retry() higher up so we can use it
for sends as well, and renaming it to be generically useful for both
sends and receives.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/net.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index c73e4cd246ab..9ef11883a34a 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -204,6 +204,16 @@ static int io_setup_async_msg(struct io_kiocb *req,
 	return -EAGAIN;
 }
 
+static inline void io_mshot_prep_retry(struct io_kiocb *req)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+
+	req->flags &= ~REQ_F_BL_EMPTY;
+	sr->done_io = 0;
+	sr->len = 0; /* get from the provided buffer */
+	req->buf_index = sr->buf_group;
+}
+
 static bool io_recvmsg_multishot_overflow(struct io_async_msghdr *iomsg)
 {
 	int hdr;
@@ -654,15 +664,6 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
-static inline void io_recv_prep_retry(struct io_kiocb *req)
-{
-	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
-
-	sr->done_io = 0;
-	sr->len = 0; /* get from the provided buffer */
-	req->buf_index = sr->buf_group;
-}
-
 /*
  * Finishes io_recv and io_recvmsg.
  *
@@ -697,7 +698,7 @@ static inline bool io_recv_finish(struct io_kiocb *req, int *ret,
 		struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 		int mshot_retry_ret = IOU_ISSUE_SKIP_COMPLETE;
 
-		io_recv_prep_retry(req);
+		io_mshot_prep_retry(req);
 
 		/* buffer list now empty, no point trying again */
 		if (req->flags & REQ_F_BL_EMPTY)
-- 
2.43.0



* [PATCH 8/9] io_uring/net: support multishot for send
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
                   ` (6 preceding siblings ...)
  2024-02-26 19:21 ` [PATCH 7/9] io_uring/net: add generic multishot retry helper Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  2024-02-26 19:21 ` [PATCH 9/9] io_uring/net: support multishot for sendmsg Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

This works very much like the receive side, except for sends. The idea
is that an application can fill outgoing buffers in a provided buffer
group, and then arm a single send that will service them all. For now
this variant just terminates when we are out of buffers to send, and
hence the application needs to re-arm it if IORING_CQE_F_MORE isn't
set, as per usual for multishot requests.
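
On the application side, handling a completion for such a send might look
roughly like the sketch below. The CQE flag bits and the buffer ID shift
mirror the existing uapi definitions (IORING_CQE_F_BUFFER,
IORING_CQE_F_MORE, IORING_CQE_BUFFER_SHIFT) and are redefined locally so
the snippet stands alone. struct toy_cqe, struct send_cqe_info and
decode_send_cqe() are hypothetical names, and the sketch assumes the send
CQE reports the consumed buffer the same way the receive side does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the uapi CQE flag layout */
#define CQE_F_BUFFER		(1U << 0)	/* IORING_CQE_F_BUFFER */
#define CQE_F_MORE		(1U << 1)	/* IORING_CQE_F_MORE */
#define CQE_BUFFER_SHIFT	16		/* IORING_CQE_BUFFER_SHIFT */

/* Only the two CQE fields this sketch looks at */
struct toy_cqe {
	int32_t res;
	uint32_t flags;
};

struct send_cqe_info {
	bool has_buffer;	/* a provided buffer was consumed */
	uint16_t bid;		/* buffer ID, valid if has_buffer */
	bool need_rearm;	/* multishot ended, queue a new send */
};

static struct send_cqe_info decode_send_cqe(const struct toy_cqe *cqe)
{
	struct send_cqe_info info = { 0 };

	assert(cqe);
	info.has_buffer = (cqe->flags & CQE_F_BUFFER) != 0;
	if (info.has_buffer)
		info.bid = (uint16_t)(cqe->flags >> CQE_BUFFER_SHIFT);
	/* without F_MORE no further CQEs will come from this SQE */
	info.need_rearm = !(cqe->flags & CQE_F_MORE);
	return info;
}
```

The buffer ID tells the app which outgoing buffer can be reused, and
need_rearm tells it to submit a fresh multishot send once it has more data
queued in the ring.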

This only enables it for IORING_OP_SEND, IORING_OP_SENDMSG is coming
in a separate patch. However, this patch does do a lot of the prep
work that makes wiring up the sendmsg variant pretty trivial. They
share the prep side.

Enabling multishot for sends is, again, identical to the receive side.
The app sets IORING_SEND_MULTISHOT in sqe->ioprio. This flag is also
the same as IORING_RECV_MULTISHOT.
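
The flag rules enforced at prep time can be summarized with a small
userspace model. This is a sketch only: struct toy_sqe and
check_send_flags() are hypothetical names, and the constants are local
copies of the uapi values plus the IORING_SEND_MULTISHOT alias added by
this patch. It is not the kernel prep function itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local copies of uapi values, plus the alias from this patch */
#define RECVSEND_POLL_FIRST	(1U << 0)	/* IORING_RECVSEND_POLL_FIRST */
#define SEND_MULTISHOT		(1U << 1)	/* IORING_SEND_MULTISHOT */
#define SENDMSG_FLAGS		(RECVSEND_POLL_FIRST | SEND_MULTISHOT)
#define SQE_BUFFER_SELECT	(1U << 5)	/* IOSQE_BUFFER_SELECT */

/* Only the two SQE fields this sketch looks at */
struct toy_sqe {
	uint8_t flags;		/* IOSQE_* flags */
	uint16_t ioprio;	/* carries the send/recv flags */
};

/* Returns 0 if the request would pass the flag checks, else -EINVAL */
static int check_send_flags(const struct toy_sqe *sqe)
{
	assert(sqe);
	/* any flag outside the supported set is rejected */
	if (sqe->ioprio & ~SENDMSG_FLAGS)
		return -EINVAL;
	/* multishot only makes sense with provided buffers */
	if ((sqe->ioprio & SEND_MULTISHOT) &&
	    !(sqe->flags & SQE_BUFFER_SELECT))
		return -EINVAL;
	return 0;
}
```

In other words, an app wanting multishot send sets both the buffer select
flag in sqe->flags and IORING_SEND_MULTISHOT in sqe->ioprio.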

Signed-off-by: Jens Axboe <[email protected]>
---
 include/uapi/linux/io_uring.h |  8 ++++
 io_uring/net.c                | 90 ++++++++++++++++++++++++++++++-----
 2 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 061147cdfbb0..feacc64c90a0 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -351,9 +351,17 @@ enum io_uring_op {
  *				0 is reported if zerocopy was actually possible.
  *				IORING_NOTIF_USAGE_ZC_COPIED if data was copied
  *				(at least partially).
+ *
+ * IORING_SEND_MULTISHOT	Multishot send. Like the recv equivalent, must
+ *				be used with provided buffers. Keeps sending
+ *				from the given buffer group ID until it is
+ *				empty. Sets IORING_CQE_F_MORE if more
+ *				completions should be expected on behalf of
+ *				the same SQE.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
 #define IORING_RECV_MULTISHOT		(1U << 1)
+#define IORING_SEND_MULTISHOT		IORING_RECV_MULTISHOT
 #define IORING_RECVSEND_FIXED_BUF	(1U << 2)
 #define IORING_SEND_ZC_REPORT_USAGE	(1U << 3)
 
diff --git a/io_uring/net.c b/io_uring/net.c
index 9ef11883a34a..0b990df04ac7 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -411,6 +411,8 @@ void io_sendmsg_recvmsg_cleanup(struct io_kiocb *req)
 	kfree(io->free_iov);
 }
 
+#define SENDMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_SEND_MULTISHOT)
+
 int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -427,11 +429,17 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	sr->len = READ_ONCE(sqe->len);
 	sr->flags = READ_ONCE(sqe->ioprio);
-	if (sr->flags & ~IORING_RECVSEND_POLL_FIRST)
+	if (sr->flags & ~SENDMSG_FLAGS)
 		return -EINVAL;
 	sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
 	if (sr->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
+	if (sr->flags & IORING_SEND_MULTISHOT) {
+		if (!(req->flags & REQ_F_BUFFER_SELECT))
+			return -EINVAL;
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+		sr->buf_group = req->buf_index;
+	}
 
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
@@ -441,6 +449,44 @@ int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+static inline bool io_send_finish(struct io_kiocb *req, int *ret,
+				  struct msghdr *msg, unsigned issue_flags)
+{
+	bool mshot_finished = *ret <= 0;
+	unsigned int cflags;
+
+	cflags = io_put_kbuf(req, issue_flags);
+
+	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
+		io_req_set_res(req, *ret, cflags);
+		*ret = IOU_OK;
+		return true;
+	}
+
+	if (mshot_finished || req->flags & REQ_F_BL_EMPTY)
+		goto finish;
+
+	/*
+	 * Fill CQE for this send and see if we should keep trying to
+	 * send on this socket.
+	 */
+	if (io_fill_cqe_req_aux(req, issue_flags & IO_URING_F_COMPLETE_DEFER,
+				*ret, cflags | IORING_CQE_F_MORE)) {
+		io_mshot_prep_retry(req);
+		*ret = IOU_ISSUE_SKIP_COMPLETE;
+		return false;
+	}
+
+	/* Otherwise stop multishot but use the current result. */
+finish:
+	io_req_set_res(req, *ret, cflags);
+	if (issue_flags & IO_URING_F_MULTISHOT)
+		*ret = IOU_STOP_MULTISHOT;
+	else
+		*ret = IOU_OK;
+	return true;
+}
+
 int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
@@ -523,7 +569,6 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	size_t len = sr->len;
 	struct socket *sock;
 	struct msghdr msg;
-	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -552,10 +597,18 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_addr(req, &__address, issue_flags);
 
+	if (!io_check_multishot(req, issue_flags))
+		return -EAGAIN;
+
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
+	flags = sr->msg_flags;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		flags |= MSG_DONTWAIT;
+
+retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
 
@@ -570,19 +623,28 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		return ret;
 
-	flags = sr->msg_flags;
-	if (issue_flags & IO_URING_F_NONBLOCK)
-		flags |= MSG_DONTWAIT;
-	if (flags & MSG_WAITALL)
+	/*
+	 * If MSG_WAITALL is set, or this is a multishot send, then we need
+	 * the full amount. If only multishot is set, a short send completes
+	 * the multishot sequence rather than continuing on.
+	 */
+	if (flags & MSG_WAITALL || req->flags & REQ_F_APOLL_MULTISHOT)
 		min_ret = iov_iter_count(&msg.msg_iter);
 
 	flags &= ~MSG_INTERNAL_SENDMSG_FLAGS;
 	msg.msg_flags = flags;
 	ret = sock_sendmsg(sock, &msg);
 	if (ret < min_ret) {
-		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
-			return io_setup_async_addr(req, &__address, issue_flags);
-
+		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK)) {
+			ret = io_setup_async_addr(req, &__address, issue_flags);
+			if (ret != -EAGAIN)
+				return ret;
+			if (issue_flags & IO_URING_F_MULTISHOT) {
+				io_kbuf_recycle(req, issue_flags);
+				return IOU_ISSUE_SKIP_COMPLETE;
+			}
+			return -EAGAIN;
+		}
 		if (ret > 0 && io_net_retry(sock, flags)) {
 			sr->len -= ret;
 			sr->buf += ret;
@@ -598,9 +660,13 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
-	cflags = io_put_kbuf(req, issue_flags);
-	io_req_set_res(req, ret, cflags);
-	return IOU_OK;
+	else
+		io_kbuf_recycle(req, issue_flags);
+
+	if (!io_send_finish(req, &ret, &msg, issue_flags))
+		goto retry_multishot;
+
+	return ret;
 }
 
 int io_recvmsg_prep_async(struct io_kiocb *req)
-- 
2.43.0



* [PATCH 9/9] io_uring/net: support multishot for sendmsg
  2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
                   ` (7 preceding siblings ...)
  2024-02-26 19:21 ` [PATCH 8/9] io_uring/net: support multishot for send Jens Axboe
@ 2024-02-26 19:21 ` Jens Axboe
  8 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2024-02-26 19:21 UTC (permalink / raw)
  To: io-uring; +Cc: asml.silence, dyudaken, Jens Axboe

Add multishot support for IORING_OP_SENDMSG, mirroring the IORING_OP_SEND
multishot mode. This needs further work, but it is functional and can be
tested.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/net.c | 47 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 33 insertions(+), 14 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 0b990df04ac7..c9d9dc611087 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -492,7 +492,6 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 	struct io_async_msghdr iomsg, *kmsg;
 	struct socket *sock;
-	unsigned int cflags;
 	unsigned flags;
 	int min_ret = 0;
 	int ret;
@@ -515,6 +514,14 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_msg(req, kmsg, issue_flags);
 
+	if (!io_check_multishot(req, issue_flags))
+		return io_setup_async_msg(req, kmsg, issue_flags);
+
+	flags = sr->msg_flags;
+	if (issue_flags & IO_URING_F_NONBLOCK)
+		flags |= MSG_DONTWAIT;
+
+retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
 		size_t len = sr->len;
@@ -526,17 +533,25 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 		iov_iter_ubuf(&kmsg->msg.msg_iter, ITER_SOURCE, buf, len);
 	}
 
-	flags = sr->msg_flags;
-	if (issue_flags & IO_URING_F_NONBLOCK)
-		flags |= MSG_DONTWAIT;
-	if (flags & MSG_WAITALL)
+	/*
+	 * If MSG_WAITALL is set, or this is a multishot send, then we need
+	 * the full amount. If only multishot is set, a short send completes
+	 * the multishot sequence rather than continuing on.
+	 */
+	if (flags & MSG_WAITALL || req->flags & REQ_F_APOLL_MULTISHOT)
 		min_ret = iov_iter_count(&kmsg->msg.msg_iter);
 
 	ret = __sys_sendmsg_sock(sock, &kmsg->msg, flags);
 
 	if (ret < min_ret) {
-		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
-			return io_setup_async_msg(req, kmsg, issue_flags);
+		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK)) {
+			ret = io_setup_async_msg(req, kmsg, issue_flags);
+			if (ret == -EAGAIN && (issue_flags & IO_URING_F_MULTISHOT)) {
+				io_kbuf_recycle(req, issue_flags);
+				return IOU_ISSUE_SKIP_COMPLETE;
+			}
+			return ret;
+		}
 		if (ret > 0 && io_net_retry(sock, flags)) {
 			kmsg->msg.msg_controllen = 0;
 			kmsg->msg.msg_control = NULL;
@@ -548,18 +563,22 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 			ret = -EINTR;
 		req_set_fail(req);
 	}
+	if (ret >= 0)
+		ret += sr->done_io;
+	else if (sr->done_io)
+		ret = sr->done_io;
+	else
+		io_kbuf_recycle(req, issue_flags);
+
+	if (!io_send_finish(req, &ret, &kmsg->msg, issue_flags))
+		goto retry_multishot;
+
 	/* fast path, check for non-NULL to avoid function call */
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
 	io_netmsg_recycle(req, issue_flags);
-	if (ret >= 0)
-		ret += sr->done_io;
-	else if (sr->done_io)
-		ret = sr->done_io;
-	cflags = io_put_kbuf(req, issue_flags);
-	io_req_set_res(req, ret, cflags);
-	return IOU_OK;
+	return ret;
 }
 
 int io_send(struct io_kiocb *req, unsigned int issue_flags)
-- 
2.43.0



Thread overview:
2024-02-26 19:21 [PATCHSET v4 0/9] Support for provided buffers for send Jens Axboe
2024-02-26 19:21 ` [PATCH 1/9] net: remove {revc,send}msg_copy_msghdr() from exports Jens Axboe
2024-02-26 19:21 ` [PATCH 2/9] io_uring/net: add provided buffer support for IORING_OP_SEND Jens Axboe
2024-02-26 19:21 ` [PATCH 3/9] io_uring/net: add provided buffer support for IORING_OP_SENDMSG Jens Axboe
2024-02-26 19:21 ` [PATCH 4/9] io_uring/kbuf: flag request if buffer pool is empty after buffer pick Jens Axboe
2024-02-26 19:21 ` [PATCH 5/9] io_uring/net: avoid redundant -ENOBUFS on recv multishot retry Jens Axboe
2024-02-26 19:21 ` [PATCH 6/9] io_uring/net: move recv/recvmsg flags out of retry loop Jens Axboe
2024-02-26 19:21 ` [PATCH 7/9] io_uring/net: add generic multishot retry helper Jens Axboe
2024-02-26 19:21 ` [PATCH 8/9] io_uring/net: support multishot for send Jens Axboe
2024-02-26 19:21 ` [PATCH 9/9] io_uring/net: support multishot for sendmsg Jens Axboe
