[PATCH for-next 0/8] io_uring: multishot recv

public inbox for [email protected]

* [PATCH for-next 0/8] io_uring: multishot recv
@ 2022-06-28 15:02 Dylan Yudaken
  2022-06-28 15:02 ` [PATCH for-next 1/8] io_uring: allow 0 length for buffer select Dylan Yudaken
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

This series adds support for multishot recv/recvmsg to io_uring.

The idea is that socket applications generally enqueue a new recv() each
time the previous one completes. This can be improved on by allowing the
application to queue a single multishot receive, which posts a completion
whenever data is available. It uses the provided buffers feature to
receive new data into a pool supplied by the application.

This is more performant in a few ways:
* Subsequent receives are queued up straight away, without requiring the
  application to finish a processing loop.
* If there is more data in the socket (say the provided buffer size is
  smaller than the socket buffer), the data is returned immediately,
  improving batching.
* Poll is armed only once and then reused, saving CPU cycles.

Running a small network benchmark [1] shows improved QPS of ~6-8% over a range of loads.

[1]: https://github.com/DylanZA/netbench/tree/multishot_recv
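From the application's side, one multishot SQE produces a stream of CQEs. The sketch below shows how a consumer might decode each one. It assumes the uapi values IORING_CQE_F_BUFFER, IORING_CQE_F_MORE and IORING_CQE_BUFFER_SHIFT, copied locally so it builds without kernel headers; `struct cqe`, `struct recv_event` and `handle_recv_cqe` are hypothetical names, not liburing API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of uapi values from linux/io_uring.h, so the sketch
 * builds without kernel headers. */
#define CQE_F_BUFFER     (1U << 0)  /* IORING_CQE_F_BUFFER */
#define CQE_F_MORE       (1U << 1)  /* IORING_CQE_F_MORE */
#define CQE_BUFFER_SHIFT 16         /* IORING_CQE_BUFFER_SHIFT */

/* Minimal stand-in for struct io_uring_cqe. */
struct cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

/* What the application learns from one multishot recv CQE. */
struct recv_event {
	int bytes;   /* > 0: bytes received, 0: EOF, < 0: -errno */
	int buf_id;  /* provided-buffer ID, or -1 if no buffer attached */
	bool last;   /* no IORING_CQE_F_MORE: this SQE posts no more CQEs */
};

static struct recv_event handle_recv_cqe(const struct cqe *cqe)
{
	struct recv_event ev = { .bytes = cqe->res, .buf_id = -1, .last = false };

	/* The buffer ID lives in the upper 16 bits of flags. */
	if (cqe->flags & CQE_F_BUFFER)
		ev.buf_id = (int)(cqe->flags >> CQE_BUFFER_SHIFT);
	/* Without F_MORE the multishot request has terminated; the
	 * application must submit a new SQE to keep receiving. */
	ev.last = !(cqe->flags & CQE_F_MORE);
	return ev;
}
```

After consuming the data in `buf_id`, the application would return that buffer to the pool; only when `last` is set does it need to queue a fresh receive.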

Dylan Yudaken (8):
  io_uring: allow 0 length for buffer select
  io_uring: restore bgid in io_put_kbuf
  io_uring: allow iov_len = 0 for recvmsg and buffer select
  io_uring: recycle buffers on error
  io_uring: clean up io_poll_check_events return values
  io_uring: add IOU_STOP_MULTISHOT return code
  io_uring: add IORING_RECV_MULTISHOT flag
  io_uring: multishot recv

 include/uapi/linux/io_uring.h |   5 ++
 io_uring/io_uring.h           |   7 ++
 io_uring/kbuf.c               |   4 +-
 io_uring/kbuf.h               |   8 ++-
 io_uring/net.c                | 119 ++++++++++++++++++++++++++++------
 io_uring/poll.c               |  30 ++++++---
 6 files changed, 140 insertions(+), 33 deletions(-)

base-commit: 755441b9029317d981269da0256e0a7e5a7fe2cc
-- 
2.30.2


* [PATCH for-next 1/8] io_uring: allow 0 length for buffer select
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

If the user gives 0 for the length, set it from the size of the selected buffer.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/kbuf.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 8e4f1e8aaf4a..4ed5102461bf 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -115,7 +115,7 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
 
 		kbuf = list_first_entry(&bl->buf_list, struct io_buffer, list);
 		list_del(&kbuf->list);
-		if (*len > kbuf->len)
+		if (*len == 0 || *len > kbuf->len)
 			*len = kbuf->len;
 		req->flags |= REQ_F_BUFFER_SELECTED;
 		req->kbuf = kbuf;
@@ -145,7 +145,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
 		buf = page_address(bl->buf_pages[index]);
 		buf += off;
 	}
-	if (*len > buf->len)
+	if (*len == 0 || *len > buf->len)
 		*len = buf->len;
 	req->flags |= REQ_F_BUFFER_RING;
 	req->buf_list = bl;
-- 
2.30.2
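The length rule introduced by patch 1 can be modeled in isolation. This is a sketch of the clamp applied in both io_provided_buffer_select() and io_ring_buffer_select(); `buffer_select_len` is a hypothetical helper name, not a kernel function.

```c
#include <stddef.h>

/* Model of the length clamp after patch 1: a requested length of 0
 * means "use the whole selected buffer"; otherwise never exceed it. */
static size_t buffer_select_len(size_t requested, size_t buf_len)
{
	if (requested == 0 || requested > buf_len)
		return buf_len;
	return requested;
}
```

This is what lets multishot receive reset `sr->len` to 0 between iterations and pick up each new buffer's full size.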




* [PATCH for-next 2/8] io_uring: restore bgid in io_put_kbuf
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

Attempt to restore the bgid in req->buf_index when a buffer is put. This
is needed when recycling unused buffers, as the next selection will need
the correct bgid.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/kbuf.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 3d48f1ab5439..c64f02ea1c30 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -96,16 +96,20 @@ static inline void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
 static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
 					      struct list_head *list)
 {
+	unsigned int ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
 	if (req->flags & REQ_F_BUFFER_RING) {
-		if (req->buf_list)
+		if (req->buf_list) {
+			req->buf_index = req->buf_list->bgid;
 			req->buf_list->head++;
+		}
 		req->flags &= ~REQ_F_BUFFER_RING;
 	} else {
+		req->buf_index = req->kbuf->bgid;
 		list_add(&req->kbuf->list, list);
 		req->flags &= ~REQ_F_BUFFER_SELECTED;
 	}
 
-	return IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+	return ret;
 }
 
 static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
-- 
2.30.2
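The reason patch 2 captures the CQE flags before touching `buf_index` is that the field is overloaded: it holds the group ID before selection and the buffer ID while a buffer is held. A simplified model, assuming that layout; the kernel reads the group back from `buf_list->bgid` or `kbuf->bgid`, whereas this sketch just remembers it in a hypothetical `held_bgid` field.

```c
#include <stdint.h>

#define CQE_F_BUFFER     (1U << 0)  /* mirrors IORING_CQE_F_BUFFER */
#define CQE_BUFFER_SHIFT 16         /* mirrors IORING_CQE_BUFFER_SHIFT */

/* Simplified request: buf_index is the bgid before selection and the
 * bid while a buffer is held (assumed layout, see lead-in). */
struct model_req {
	uint16_t buf_index;
	uint16_t held_bgid;  /* stand-in for buf_list->bgid / kbuf->bgid */
	int has_buffer;
};

static void model_select(struct model_req *req, uint16_t bid)
{
	req->held_bgid = req->buf_index;  /* remember the group */
	req->buf_index = bid;
	req->has_buffer = 1;
}

static unsigned int model_put_kbuf(struct model_req *req)
{
	/* Encode the bid for the CQE before buf_index is overwritten. */
	unsigned int cflags = CQE_F_BUFFER |
			      ((unsigned int)req->buf_index << CQE_BUFFER_SHIFT);

	/* Restore the group so the next selection uses the right bgid. */
	req->buf_index = req->held_bgid;
	req->has_buffer = 0;
	return cflags;
}
```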




* Re: [PATCH for-next 2/8] io_uring: restore bgid in io_put_kbuf
From: Jens Axboe @ 2022-06-28 15:12 UTC
  To: Dylan Yudaken, Pavel Begunkov, io-uring; +Cc: Kernel-team, linux-kernel

On 6/28/22 9:02 AM, Dylan Yudaken wrote:
> Attempt to restore bgid. This is needed when recycling unused buffers as
> the next time around it will want the correct bgid.
> 
> Signed-off-by: Dylan Yudaken <[email protected]>
> ---
>  io_uring/kbuf.h | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
> index 3d48f1ab5439..c64f02ea1c30 100644
> --- a/io_uring/kbuf.h
> +++ b/io_uring/kbuf.h
> @@ -96,16 +96,20 @@ static inline void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
>  static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
>  					      struct list_head *list)
>  {
> +	unsigned int ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
>  	if (req->flags & REQ_F_BUFFER_RING) {

Should have a newline here after the 'ret' variable declaration.

-- 
Jens Axboe




* [PATCH for-next 3/8] io_uring: allow iov_len = 0 for recvmsg and buffer select
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

When using BUFFER_SELECT there is no technical requirement that the user
actually provide an iov, and skipping it removes one copy_from_user() call.

So allow iov_len to be 0.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/net.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 19a805c3814c..5e84f7ab92a3 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -300,12 +300,18 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
 		return ret;
 
 	if (req->flags & REQ_F_BUFFER_SELECT) {
-		if (iov_len > 1)
+		if (iov_len == 0) {
+			sr->len = iomsg->fast_iov[0].iov_len = 0;
+			iomsg->fast_iov[0].iov_base = NULL;
+			iomsg->free_iov = NULL;
+		} else if (iov_len > 1) {
 			return -EINVAL;
-		if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
-			return -EFAULT;
-		sr->len = iomsg->fast_iov[0].iov_len;
-		iomsg->free_iov = NULL;
+		} else {
+			if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
+				return -EFAULT;
+			sr->len = iomsg->fast_iov[0].iov_len;
+			iomsg->free_iov = NULL;
+		}
 	} else {
 		iomsg->free_iov = iomsg->fast_iov;
 		ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
-- 
2.30.2
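The three-way branch patch 3 adds to `__io_recvmsg_copy_hdr()` can be modeled as a standalone function. This is a sketch only: `copy_hdr_buffer_select` is a hypothetical name, and `memcpy()` stands in for `copy_from_user()`.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Model of the REQ_F_BUFFER_SELECT branch after patch 3. */
static int copy_hdr_buffer_select(const struct iovec *uiov, size_t iov_len,
				  struct iovec *fast_iov, size_t *sr_len)
{
	if (iov_len == 0) {
		/* No iovec supplied: length 0 later means "size of the
		 * selected buffer", and nothing is copied from userspace. */
		fast_iov->iov_base = NULL;
		fast_iov->iov_len = 0;
		*sr_len = 0;
		return 0;
	}
	if (iov_len > 1)
		return -EINVAL;  /* buffer select supports a single iovec */
	/* Stand-in for copy_from_user() of that single iovec. */
	memcpy(fast_iov, uiov, sizeof(*uiov));
	*sr_len = fast_iov->iov_len;
	return 0;
}
```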




* [PATCH for-next 4/8] io_uring: recycle buffers on error
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

Rather than passing an error back to the user with a buffer attached,
recycle the buffer immediately.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/net.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 5e84f7ab92a3..0268c4603f5d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -481,10 +481,13 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	if (kmsg->free_iov)
 		kfree(kmsg->free_iov);
 	req->flags &= ~REQ_F_NEED_CLEANUP;
-	if (ret >= 0)
+	if (ret > 0)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
+	else
+		io_kbuf_recycle(req, issue_flags);
+
 	cflags = io_put_kbuf(req, issue_flags);
 	if (kmsg->msg.msg_inq)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
@@ -557,10 +560,13 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 		req_set_fail(req);
 	}
 
-	if (ret >= 0)
+	if (ret > 0)
 		ret += sr->done_io;
 	else if (sr->done_io)
 		ret = sr->done_io;
+	else
+		io_kbuf_recycle(req, issue_flags);
+
 	cflags = io_put_kbuf(req, issue_flags);
 	if (msg.msg_inq)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
-- 
2.30.2
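Note that patch 4 also changes `ret >= 0` to `ret > 0`, so a zero-byte result (e.g. EOF) with no earlier partial data now recycles the buffer too, and that CQE presumably carries no buffer. A minimal model of the resulting fold, with the hypothetical name `fold_recv_result`:

```c
#include <stdbool.h>

struct recv_result {
	int res;            /* value reported in the CQE */
	bool buf_recycled;  /* buffer went back to the pool (io_kbuf_recycle) */
};

/* Model of the tail of io_recv()/io_recvmsg() after patch 4.
 * ret: result of this attempt; done_io: bytes from earlier partial attempts. */
static struct recv_result fold_recv_result(int ret, int done_io)
{
	struct recv_result r = { .res = ret, .buf_recycled = false };

	if (ret > 0)
		r.res = ret + done_io;
	else if (done_io)
		r.res = done_io;        /* report partial progress, keep buffer */
	else
		r.buf_recycled = true;  /* error or 0 bytes: recycle the buffer */
	return r;
}
```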




* [PATCH for-next 5/8] io_uring: clean up io_poll_check_events return values
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

The return values are a bit confusing, as 0 and 1 carry implied meanings,
so add named definitions for them.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/poll.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index fa25b88a7b93..fc3a5789d303 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -192,6 +192,11 @@ static void io_poll_remove_entries(struct io_kiocb *req)
 	rcu_read_unlock();
 }
 
+enum {
+	IOU_POLL_DONE = 0,
+	IOU_POLL_NO_ACTION = 1,
+};
+
 /*
  * All poll tw should go through this. Checks for poll events, manages
  * references, does rewait, etc.
@@ -214,10 +219,11 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 
 		/* tw handler should be the owner, and so have some references */
 		if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
-			return 0;
+			return IOU_POLL_DONE;
 		if (v & IO_POLL_CANCEL_FLAG)
 			return -ECANCELED;
 
+		/* the mask was stashed in __io_poll_execute */
 		if (!req->cqe.res) {
 			struct poll_table_struct pt = { ._key = req->apoll_events };
 			req->cqe.res = vfs_poll(req->file, &pt) & req->apoll_events;
@@ -226,7 +232,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 		if ((unlikely(!req->cqe.res)))
 			continue;
 		if (req->apoll_events & EPOLLONESHOT)
-			return 0;
+			return IOU_POLL_DONE;
 
 		/* multishot, just fill a CQE and proceed */
 		if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
@@ -238,7 +244,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 				return -ECANCELED;
 		} else {
 			ret = io_poll_issue(req, locked);
-			if (ret)
+			if (ret < 0)
 				return ret;
 		}
 
@@ -248,7 +254,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 		 */
 	} while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
 
-	return 1;
+	return IOU_POLL_NO_ACTION;
 }
 
 static void io_poll_task_func(struct io_kiocb *req, bool *locked)
@@ -256,12 +262,11 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
-	if (ret > 0)
+	if (ret == IOU_POLL_NO_ACTION)
 		return;
 
-	if (!ret) {
+	if (ret == IOU_POLL_DONE) {
 		struct io_poll *poll = io_kiocb_to_cmd(req);
-
 		req->cqe.res = mangle_poll(req->cqe.res & poll->events);
 	} else {
 		req->cqe.res = ret;
@@ -280,7 +285,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 	int ret;
 
 	ret = io_poll_check_events(req, locked);
-	if (ret > 0)
+	if (ret == IOU_POLL_NO_ACTION)
 		return;
 
 	io_poll_remove_entries(req);
-- 
2.30.2
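With the named values from patch 5, the dispatch in `io_poll_task_func()` reads as a three-way decision. A standalone model of that decision; `enum poll_action` and `poll_task_action` are hypothetical names used only for illustration.

```c
#include <errno.h>

enum {
	IOU_POLL_DONE = 0,
	IOU_POLL_NO_ACTION = 1,
};

enum poll_action { ACT_RETURN, ACT_COMPLETE_WITH_MASK, ACT_FAIL };

/* Model of how io_poll_task_func() acts on io_poll_check_events(). */
static enum poll_action poll_task_action(int ret)
{
	if (ret == IOU_POLL_NO_ACTION)
		return ACT_RETURN;              /* nothing to complete yet */
	if (ret == IOU_POLL_DONE)
		return ACT_COMPLETE_WITH_MASK;  /* cqe.res = mangle_poll(mask) */
	return ACT_FAIL;                    /* negative errno, e.g. -ECANCELED */
}
```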




* [PATCH for-next 6/8] io_uring: add IOU_STOP_MULTISHOT return code
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

For multishot we want a way to signal to the caller that multishot has
ended, even when the result is not an error.

For example, sockets return 0 when closed, which should end a multishot
recv but still post a CQE with result 0.

Introduce IOU_STOP_MULTISHOT, which does this and indicates that the
result is stored in req->cqe.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/io_uring.h | 7 +++++++
 io_uring/poll.c     | 9 +++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index f77e4a5403e4..e8da70781fa3 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -15,6 +15,13 @@
 enum {
 	IOU_OK			= 0,
 	IOU_ISSUE_SKIP_COMPLETE	= -EIOCBQUEUED,
+
+	/*
+	 * Intended only when both REQ_F_POLLED and REQ_F_APOLL_MULTISHOT
+	 * are set to indicate to the poll runner that multishot should be
+	 * removed and the result is set on req->cqe.res.
+	 */
+	IOU_STOP_MULTISHOT	= -ECANCELED,
 };
 
 struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index fc3a5789d303..2054df9af291 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -195,6 +195,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
 enum {
 	IOU_POLL_DONE = 0,
 	IOU_POLL_NO_ACTION = 1,
+	IOU_POLL_REMOVE_POLL_USE_RES = 2,
 };
 
 /*
@@ -244,6 +245,8 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
 				return -ECANCELED;
 		} else {
 			ret = io_poll_issue(req, locked);
+			if (ret == IOU_STOP_MULTISHOT)
+				return IOU_POLL_REMOVE_POLL_USE_RES;
 			if (ret < 0)
 				return ret;
 		}
@@ -268,7 +271,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 	if (ret == IOU_POLL_DONE) {
 		struct io_poll *poll = io_kiocb_to_cmd(req);
 		req->cqe.res = mangle_poll(req->cqe.res & poll->events);
-	} else {
+	} else if (ret != IOU_POLL_REMOVE_POLL_USE_RES) {
 		req->cqe.res = ret;
 		req_set_fail(req);
 	}
@@ -291,7 +294,9 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
 	io_poll_remove_entries(req);
 	io_poll_tw_hash_eject(req, locked);
 
-	if (!ret)
+	if (ret == IOU_POLL_REMOVE_POLL_USE_RES)
+		io_req_complete_post(req);
+	else if (ret == IOU_POLL_DONE)
 		io_req_task_submit(req, locked);
 	else
 		io_req_complete_failed(req, ret);
-- 
2.30.2
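Because IOU_STOP_MULTISHOT is defined as -ECANCELED, the check added to `io_poll_check_events()` has to come before the generic `ret < 0` test, and as written a plain -ECANCELED from `io_poll_issue()` is treated the same way. A standalone model of both the mapping and the new `io_apoll_task_func()` dispatch; `map_issue_result`, `apoll_action` and the `APOLL_*` names are hypothetical.

```c
#include <errno.h>

#define IOU_STOP_MULTISHOT (-ECANCELED)

enum {
	IOU_POLL_DONE = 0,
	IOU_POLL_NO_ACTION = 1,
	IOU_POLL_REMOVE_POLL_USE_RES = 2,
};

/* Model of the multishot branch of io_poll_check_events(): maps the
 * io_poll_issue() result; IOU_POLL_NO_ACTION stands in for "keep looping". */
static int map_issue_result(int ret)
{
	/* Must precede the ret < 0 check: the two codes alias. */
	if (ret == IOU_STOP_MULTISHOT)
		return IOU_POLL_REMOVE_POLL_USE_RES;
	if (ret < 0)
		return ret;
	return IOU_POLL_NO_ACTION;
}

enum apoll_action { APOLL_RETURN, APOLL_POST_CQE_RES, APOLL_RESUBMIT, APOLL_FAIL };

/* Model of io_apoll_task_func() dispatch after patch 6. */
static enum apoll_action apoll_action(int check_ret)
{
	if (check_ret == IOU_POLL_NO_ACTION)
		return APOLL_RETURN;
	if (check_ret == IOU_POLL_REMOVE_POLL_USE_RES)
		return APOLL_POST_CQE_RES;  /* io_req_complete_post(), res preset */
	if (check_ret == IOU_POLL_DONE)
		return APOLL_RESUBMIT;      /* io_req_task_submit() */
	return APOLL_FAIL;              /* io_req_complete_failed() */
}
```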




* [PATCH for-next 7/8] io_uring: add IORING_RECV_MULTISHOT flag
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

Introduce the multishot recv flag, which will be used for multishot
recv/recvmsg.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 include/uapi/linux/io_uring.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 09e7c3b13d2d..1e5bdb323184 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -259,8 +259,13 @@ enum io_uring_op {
  *				or receive and arm poll if that yields an
  *				-EAGAIN result, arm poll upfront and skip
  *				the initial transfer attempt.
+ *
+ * IORING_RECV_MULTISHOT	Multishot recv. Sets IORING_CQE_F_MORE if
+ *				the handler will continue to report
+ *				CQEs on behalf of the same SQE.
  */
 #define IORING_RECVSEND_POLL_FIRST	(1U << 0)
+#define IORING_RECV_MULTISHOT	(1U << 1)
 
 /*
  * accept flags stored in sqe->ioprio
-- 
2.30.2




* Re: [PATCH for-next 7/8] io_uring: add IORING_RECV_MULTISHOT flag
From: Jens Axboe @ 2022-06-28 15:12 UTC
  To: Dylan Yudaken, Pavel Begunkov, io-uring; +Cc: Kernel-team, linux-kernel

On 6/28/22 9:02 AM, Dylan Yudaken wrote:
> Introduce multishot recv flag which will be used for multishot
> recv/recvmsg

I'd fold this with #8.

-- 
Jens Axboe




* [PATCH for-next 8/8] io_uring: multishot recv
From: Dylan Yudaken @ 2022-06-28 15:02 UTC
  To: Jens Axboe, Pavel Begunkov, io-uring
  Cc: Kernel-team, linux-kernel, Dylan Yudaken

Support multishot receive for io_uring.
Typical server applications will run a loop where for each recv CQE it
requeues another recv/recvmsg.

This can be simplified by using the existing multishot functionality
combined with io_uring's provided buffers.
The API is to add the IORING_RECV_MULTISHOT flag to the SQE. CQEs will
then be posted (with IORING_CQE_F_MORE flag set) when data is available
and is read. Once an error occurs or the socket ends, the multishot will
be removed and a completion without IORING_CQE_F_MORE will be posted.

The benefit is that recv becomes more performant:
 * Subsequent receives are queued up straight away, without requiring the
   application to finish a processing loop.
 * If there is more data in the socket (say the provided buffer size is
   smaller than the socket buffer), the data is returned immediately,
   improving batching.
 * Poll is armed only once and then reused, saving CPU cycles.

Signed-off-by: Dylan Yudaken <[email protected]>
---
 io_uring/net.c | 93 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 81 insertions(+), 12 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 0268c4603f5d..9bf8c6c0b549 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -389,6 +389,8 @@ int io_recvmsg_prep_async(struct io_kiocb *req)
 	return ret;
 }
 
+#define RECVMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT)
+
 int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
@@ -399,13 +401,22 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
 	sr->len = READ_ONCE(sqe->len);
 	sr->flags = READ_ONCE(sqe->addr2);
-	if (sr->flags & ~IORING_RECVSEND_POLL_FIRST)
+	if (sr->flags & ~(RECVMSG_FLAGS))
 		return -EINVAL;
 	sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
 	if (sr->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
 	if (sr->msg_flags & MSG_ERRQUEUE)
 		req->flags |= REQ_F_CLEAR_POLLIN;
+	if (sr->flags & IORING_RECV_MULTISHOT) {
+		if (!(req->flags & REQ_F_BUFFER_SELECT))
+			return -EINVAL;
+		if (sr->msg_flags & MSG_WAITALL)
+			return -EINVAL;
+		if (req->opcode == IORING_OP_RECV && sr->len)
+			return -EINVAL;
+		req->flags |= REQ_F_APOLL_MULTISHOT;
+	}
 
 #ifdef CONFIG_COMPAT
 	if (req->ctx->compat)
@@ -415,6 +426,14 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return 0;
 }
 
+static inline void io_recv_prep_retry(struct io_kiocb *req)
+{
+	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
+
+	sr->done_io = 0;
+	sr->len = 0; /* get from the provided buffer */
+}
+
 int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
@@ -424,6 +443,7 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	unsigned flags;
 	int ret, min_ret = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+	size_t len = sr->len;
 
 	sock = sock_from_file(req->file);
 	if (unlikely(!sock))
@@ -442,16 +462,17 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
 		return io_setup_async_msg(req, kmsg);
 
+retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
 
-		buf = io_buffer_select(req, &sr->len, issue_flags);
+		buf = io_buffer_select(req, &len, issue_flags);
 		if (!buf)
 			return -ENOBUFS;
 		kmsg->fast_iov[0].iov_base = buf;
-		kmsg->fast_iov[0].iov_len = sr->len;
+		kmsg->fast_iov[0].iov_len = len;
 		iov_iter_init(&kmsg->msg.msg_iter, READ, kmsg->fast_iov, 1,
-				sr->len);
+				len);
 	}
 
 	flags = sr->msg_flags;
@@ -463,8 +484,15 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	kmsg->msg.msg_get_inq = 1;
 	ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg, kmsg->uaddr, flags);
 	if (ret < min_ret) {
-		if (ret == -EAGAIN && force_nonblock)
-			return io_setup_async_msg(req, kmsg);
+		if (ret == -EAGAIN && force_nonblock) {
+			ret = io_setup_async_msg(req, kmsg);
+			if (ret == -EAGAIN && (req->flags & IO_APOLL_MULTI_POLLED) ==
+					       IO_APOLL_MULTI_POLLED) {
+				io_kbuf_recycle(req, issue_flags);
+				ret = IOU_ISSUE_SKIP_COMPLETE;
+			}
+			return ret;
+		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		if (ret > 0 && io_net_retry(sock, flags)) {
@@ -491,8 +519,24 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 	cflags = io_put_kbuf(req, issue_flags);
 	if (kmsg->msg.msg_inq)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
+
+	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
+		io_req_set_res(req, ret, cflags);
+		return IOU_OK;
+	}
+
+	if (ret > 0) {
+		if (io_post_aux_cqe(req->ctx, req->cqe.user_data, ret,
+				    cflags | IORING_CQE_F_MORE)) {
+			io_recv_prep_retry(req);
+			goto retry_multishot;
+		} else {
+			ret = -ECANCELED;
+		}
+	}
+
 	io_req_set_res(req, ret, cflags);
-	return IOU_OK;
+	return req->flags & REQ_F_POLLED ? IOU_STOP_MULTISHOT : ret;
 }
 
 int io_recv(struct io_kiocb *req, unsigned int issue_flags)
@@ -505,6 +549,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	unsigned flags;
 	int ret, min_ret = 0;
 	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+	size_t len = sr->len;
 
 	if (!(req->flags & REQ_F_POLLED) &&
 	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
@@ -514,16 +559,17 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(!sock))
 		return -ENOTSOCK;
 
+retry_multishot:
 	if (io_do_buffer_select(req)) {
 		void __user *buf;
 
-		buf = io_buffer_select(req, &sr->len, issue_flags);
+		buf = io_buffer_select(req, &len, issue_flags);
 		if (!buf)
 			return -ENOBUFS;
 		sr->buf = buf;
 	}
 
-	ret = import_single_range(READ, sr->buf, sr->len, &iov, &msg.msg_iter);
+	ret = import_single_range(READ, sr->buf, len, &iov, &msg.msg_iter);
 	if (unlikely(ret))
 		goto out_free;
 
@@ -543,8 +589,14 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 
 	ret = sock_recvmsg(sock, &msg, flags);
 	if (ret < min_ret) {
-		if (ret == -EAGAIN && force_nonblock)
-			return -EAGAIN;
+		if (ret == -EAGAIN && force_nonblock) {
+			if ((req->flags & IO_APOLL_MULTI_POLLED) == IO_APOLL_MULTI_POLLED) {
+				io_kbuf_recycle(req, issue_flags);
+				ret = IOU_ISSUE_SKIP_COMPLETE;
+			}
+
+			return ret;
+		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
 		if (ret > 0 && io_net_retry(sock, flags)) {
@@ -570,8 +622,25 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 	cflags = io_put_kbuf(req, issue_flags);
 	if (msg.msg_inq)
 		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
+
+
+	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
+		io_req_set_res(req, ret, cflags);
+		return IOU_OK;
+	}
+
+	if (ret > 0) {
+		if (io_post_aux_cqe(req->ctx, req->cqe.user_data, ret,
+				    cflags | IORING_CQE_F_MORE)) {
+			io_recv_prep_retry(req);
+			goto retry_multishot;
+		} else {
+			ret = -ECANCELED;
+		}
+	}
+
 	io_req_set_res(req, ret, cflags);
-	return IOU_OK;
+	return req->flags & REQ_F_POLLED ? IOU_STOP_MULTISHOT : ret;
 }
 
 int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-- 
2.30.2
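The prep-time rules patch 8 adds to `io_recvmsg_prep()` (which serves both IORING_OP_RECV and IORING_OP_RECVMSG) can be summarized as a pure validation function. This is a sketch under stated assumptions: the request state is flattened into a hypothetical `struct recv_prep`, MSG_WAITALL is reduced to a boolean, and the flag values mirror the uapi definitions from patch 7.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define RECVSEND_POLL_FIRST (1U << 0)  /* IORING_RECVSEND_POLL_FIRST */
#define RECV_MULTISHOT      (1U << 1)  /* IORING_RECV_MULTISHOT */
#define RECVMSG_FLAGS       (RECVSEND_POLL_FIRST | RECV_MULTISHOT)

/* Flattened view of the request state the checks look at. */
struct recv_prep {
	unsigned int flags;  /* sr->flags */
	bool buffer_select;  /* REQ_F_BUFFER_SELECT is set */
	bool waitall;        /* MSG_WAITALL present in msg_flags */
	bool is_plain_recv;  /* IORING_OP_RECV rather than RECVMSG */
	size_t len;          /* sqe->len */
};

/* Returns 0 or -EINVAL; *multishot mirrors REQ_F_APOLL_MULTISHOT. */
static int validate_recv_prep(const struct recv_prep *p, bool *multishot)
{
	*multishot = false;
	if (p->flags & ~RECVMSG_FLAGS)
		return -EINVAL;          /* unknown flag bit */
	if (p->flags & RECV_MULTISHOT) {
		if (!p->buffer_select)
			return -EINVAL;  /* provided buffers are required */
		if (p->waitall)
			return -EINVAL;  /* MSG_WAITALL is rejected */
		if (p->is_plain_recv && p->len)
			return -EINVAL;  /* length comes from each buffer */
		*multishot = true;
	}
	return 0;
}
```

The `buffer_select` requirement is the restriction questioned in the review below.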




* Re: [PATCH for-next 8/8] io_uring: multishot recv
From: Jens Axboe @ 2022-06-28 15:17 UTC
  To: Dylan Yudaken, Pavel Begunkov, io-uring; +Cc: Kernel-team, linux-kernel

On 6/28/22 9:02 AM, Dylan Yudaken wrote:
> @@ -399,13 +401,22 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
>  	sr->len = READ_ONCE(sqe->len);
>  	sr->flags = READ_ONCE(sqe->addr2);
> -	if (sr->flags & ~IORING_RECVSEND_POLL_FIRST)
> +	if (sr->flags & ~(RECVMSG_FLAGS))
>  		return -EINVAL;
>  	sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
>  	if (sr->msg_flags & MSG_DONTWAIT)
>  		req->flags |= REQ_F_NOWAIT;
>  	if (sr->msg_flags & MSG_ERRQUEUE)
>  		req->flags |= REQ_F_CLEAR_POLLIN;
> +	if (sr->flags & IORING_RECV_MULTISHOT) {
> +		if (!(req->flags & REQ_F_BUFFER_SELECT))
> +			return -EINVAL;
> +		if (sr->msg_flags & MSG_WAITALL)
> +			return -EINVAL;
> +		if (req->opcode == IORING_OP_RECV && sr->len)
> +			return -EINVAL;
> +		req->flags |= REQ_F_APOLL_MULTISHOT;
> +	}

Do we want to forbid not using provided buffers? If you have a ping-pong
type setup, eg you know you'll have to send something before you receive
anything again, seems like it'd be feasible to use this with a normal
buffer?

I strongly suspect that most use cases will use provided buffers for
this, just wondering if there are any particular reasons for forbidding
it explicitly.

>  
>  #ifdef CONFIG_COMPAT
>  	if (req->ctx->compat)
> @@ -415,6 +426,14 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
>  	return 0;
>  }
>  
> +static inline void io_recv_prep_retry(struct io_kiocb *req)
> +{
> +	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
> +
> +	sr->done_io = 0;
> +	sr->len = 0; /* get from the provided buffer */
> +}
> +
>  int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
>  {
>  	struct io_sr_msg *sr = io_kiocb_to_cmd(req);
> @@ -424,6 +443,7 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
>  	unsigned flags;
>  	int ret, min_ret = 0;
>  	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
> +	size_t len = sr->len;
>  
>  	sock = sock_from_file(req->file);
>  	if (unlikely(!sock))
> @@ -442,16 +462,17 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
>  	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
>  		return io_setup_async_msg(req, kmsg);
>  
> +retry_multishot:
>  	if (io_do_buffer_select(req)) {
>  		void __user *buf;
>  
> -		buf = io_buffer_select(req, &sr->len, issue_flags);
> +		buf = io_buffer_select(req, &len, issue_flags);
>  		if (!buf)
>  			return -ENOBUFS;
>  		kmsg->fast_iov[0].iov_base = buf;
> -		kmsg->fast_iov[0].iov_len = sr->len;
> +		kmsg->fast_iov[0].iov_len = len;
>  		iov_iter_init(&kmsg->msg.msg_iter, READ, kmsg->fast_iov, 1,
> -				sr->len);
> +				len);
>  	}
>  
>  	flags = sr->msg_flags;
> @@ -463,8 +484,15 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
>  	kmsg->msg.msg_get_inq = 1;
>  	ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg, kmsg->uaddr, flags);
>  	if (ret < min_ret) {
> -		if (ret == -EAGAIN && force_nonblock)
> -			return io_setup_async_msg(req, kmsg);
> +		if (ret == -EAGAIN && force_nonblock) {
> +			ret = io_setup_async_msg(req, kmsg);
> +			if (ret == -EAGAIN && (req->flags & IO_APOLL_MULTI_POLLED) ==
> +					       IO_APOLL_MULTI_POLLED) {
> +				io_kbuf_recycle(req, issue_flags);
> +				ret = IOU_ISSUE_SKIP_COMPLETE;
> +			}
> +			return ret;
> +		}
>  		if (ret == -ERESTARTSYS)
>  			ret = -EINTR;
>  		if (ret > 0 && io_net_retry(sock, flags)) {
> @@ -491,8 +519,24 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
>  	cflags = io_put_kbuf(req, issue_flags);
>  	if (kmsg->msg.msg_inq)
>  		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
> +
> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
> +		io_req_set_res(req, ret, cflags);
> +		return IOU_OK;
> +	}
> +
> +	if (ret > 0) {
> +		if (io_post_aux_cqe(req->ctx, req->cqe.user_data, ret,
> +				    cflags | IORING_CQE_F_MORE)) {
> +			io_recv_prep_retry(req);
> +			goto retry_multishot;
> +		} else {
> +			ret = -ECANCELED;
> +		}
> +	}
> +
>  	io_req_set_res(req, ret, cflags);
> -	return IOU_OK;
> +	return req->flags & REQ_F_POLLED ? IOU_STOP_MULTISHOT : ret;
>  }

Minor style, but I prefer avoiding ternaries if possible. This is much
easier to read for me:

	if (req->flags & REQ_F_POLLED)
		return IOU_STOP_MULTISHOT;
	return ret;

> @@ -505,6 +549,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  	unsigned flags;
>  	int ret, min_ret = 0;
>  	bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
> +	size_t len = sr->len;
>  
>  	if (!(req->flags & REQ_F_POLLED) &&
>  	    (sr->flags & IORING_RECVSEND_POLL_FIRST))
> @@ -514,16 +559,17 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  	if (unlikely(!sock))
>  		return -ENOTSOCK;
>  
> +retry_multishot:
>  	if (io_do_buffer_select(req)) {
>  		void __user *buf;
>  
> -		buf = io_buffer_select(req, &sr->len, issue_flags);
> +		buf = io_buffer_select(req, &len, issue_flags);
>  		if (!buf)
>  			return -ENOBUFS;
>  		sr->buf = buf;
>  	}
>  
> -	ret = import_single_range(READ, sr->buf, sr->len, &iov, &msg.msg_iter);
> +	ret = import_single_range(READ, sr->buf, len, &iov, &msg.msg_iter);
>  	if (unlikely(ret))
>  		goto out_free;
>  
> @@ -543,8 +589,14 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  
>  	ret = sock_recvmsg(sock, &msg, flags);
>  	if (ret < min_ret) {
> -		if (ret == -EAGAIN && force_nonblock)
> -			return -EAGAIN;
> +		if (ret == -EAGAIN && force_nonblock) {
> +			if ((req->flags & IO_APOLL_MULTI_POLLED) == IO_APOLL_MULTI_POLLED) {
> +				io_kbuf_recycle(req, issue_flags);
> +				ret = IOU_ISSUE_SKIP_COMPLETE;
> +			}
> +
> +			return ret;
> +		}

Maybe:
		if ((req->flags & IO_APOLL_MULTI_POLLED) == IO_APOLL_MULTI_POLLED) {
			io_kbuf_recycle(req, issue_flags);
			return IOU_ISSUE_SKIP_COMPLETE;
		}

		return ret;

> @@ -570,8 +622,25 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
>  	cflags = io_put_kbuf(req, issue_flags);
>  	if (msg.msg_inq)
>  		cflags |= IORING_CQE_F_SOCK_NONEMPTY;
> +
> +
> +	if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
> +		io_req_set_res(req, ret, cflags);
> +		return IOU_OK;
> +	}
> +
> +	if (ret > 0) {
> +		if (io_post_aux_cqe(req->ctx, req->cqe.user_data, ret,
> +				    cflags | IORING_CQE_F_MORE)) {
> +			io_recv_prep_retry(req);
> +			goto retry_multishot;
> +		} else {
> +			ret = -ECANCELED;
> +		}
> +	}
> +
>  	io_req_set_res(req, ret, cflags);
> -	return IOU_OK;
> +	return req->flags & REQ_F_POLLED ? IOU_STOP_MULTISHOT : ret;
>  }

Same here, and maybe this needs to be a helper so you could just do

	return io_recv_finish(req, ret, cflags);

or something like that? It's non-trivial duplicated code.

-- 
Jens Axboe
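The duplicated tail Jens points at boils down to one decision per iteration: post an auxiliary CQE with IORING_CQE_F_MORE and go around again, or stop and let the request's own CQE carry the final result (in the patch, the function then returns IOU_STOP_MULTISHOT when REQ_F_POLLED is set). A userspace model of that loop follows. `recv_finish`, `struct cq_model` and `run_multishot` are hypothetical; they only illustrate the shape of the suggested helper, not its eventual kernel form.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CQE_F_MORE (1U << 1)  /* mirrors IORING_CQE_F_MORE */
#define MAX_CQES 16

/* Tiny CQ model; cq_post() stands in for io_post_aux_cqe(). */
struct cq_model {
	int res[MAX_CQES];
	unsigned int flags[MAX_CQES];
	size_t count;
	size_t capacity;  /* must be <= MAX_CQES; a full CQ makes posting fail */
	int final_res;    /* the request's own terminating CQE (no F_MORE) */
};

static bool cq_post(struct cq_model *cq, int res, unsigned int flags)
{
	if (cq->count >= cq->capacity)
		return false;
	cq->res[cq->count] = res;
	cq->flags[cq->count] = flags;
	cq->count++;
	return true;
}

/* Returns true if the caller should retry (multishot continues);
 * otherwise *ret holds the final result. */
static bool recv_finish(struct cq_model *cq, bool multishot, int *ret)
{
	if (!multishot)
		return false;           /* single-shot: normal completion */
	if (*ret > 0) {
		if (cq_post(cq, *ret, CQE_F_MORE))
			return true;    /* posted; receive again */
		*ret = -ECANCELED;      /* could not post: end multishot */
	}
	return false;
}

/* Drive the loop over a fake socket returning the given chunk sizes,
 * then 0 to model the peer closing the connection. */
static void run_multishot(struct cq_model *cq, const int *chunks, size_t n)
{
	size_t i = 0;
	int ret;

	do {
		ret = i < n ? chunks[i++] : 0;
	} while (recv_finish(cq, true, &ret));
	cq->final_res = ret;
}
```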


