* [PATCH v2 for-next 00/12] io_uring: multishot recv
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
This series adds support for multishot recv/recvmsg to io_uring.
The idea is that socket applications generally run a loop that enqueues a
new recv() as soon as the previous one completes. This can be
improved on by allowing the application to queue a multishot receive,
which will post completions as and when data is available. It uses the
provided buffers feature to receive new data into a pool provided by
the application.
This is more performant in a few ways:
* Subsequent receives are queued up straight away without requiring the
application to finish a processing loop.
* If there is more data in the socket (say the provided buffer
size is smaller than the socket buffer) then the data is immediately
returned, improving batching.
* Poll is only armed once and reused, saving CPU cycles
Running a small network benchmark [1] shows improved QPS of ~6-8% over a range of loads.
[1]: https://github.com/DylanZA/netbench/tree/multishot_recv
While building this I noticed a problem in multishot poll that is minor for
its existing users but a serious problem for receive. If CQEs overflow, they
will be returned to the user out of order. This is annoying for the existing
use cases of poll and accept but doesn't totally break the functionality,
since neither returns results whose ordering matters beyond the
IORING_CQE_F_MORE flag. For receive, ordering is a hard requirement, as
otherwise data would be received out of order by the application.
To fix this, multishot is removed when a multishot CQE hits overflow. The
application should then drain CQEs until it sees that final CQE and, noticing
that IORING_CQE_F_MORE is not set, can re-issue the multishot request.
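As a rough illustration of the intended userspace flow (not part of this
series), the sketch below arms one multishot recv against a provided buffer
group and re-arms it whenever a CQE arrives without IORING_CQE_F_MORE, e.g.
after the kernel dropped multishot because the CQ overflowed. It assumes
liburing for the setup helpers, defines IORING_RECV_MULTISHOT locally in
case the installed headers predate this series, and places the flag in
sqe->addr2, which is where the recv prep code in this series reads it from.
Ring/socket setup and error handling are omitted.

/* Sketch: one multishot recv fed from a provided-buffer pool, re-armed
 * whenever a recv CQE arrives without IORING_CQE_F_MORE. */
#include <liburing.h>

#ifndef IORING_RECV_MULTISHOT
#define IORING_RECV_MULTISHOT	(1U << 1)	/* value added by this series */
#endif

enum { TAG_BUFS = 0, TAG_RECV = 1 };
#define BGID	1
#define NBUFS	64
#define BUFSZ	4096
static char bufs[NBUFS][BUFSZ];

static void arm_recv(struct io_uring *ring, int fd)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	/* len == 0: the size comes from the selected buffer (patch 1) */
	io_uring_prep_recv(sqe, fd, NULL, 0, 0);
	io_uring_sqe_set_flags(sqe, IOSQE_BUFFER_SELECT);
	sqe->buf_group = BGID;
	sqe->addr2 = IORING_RECV_MULTISHOT;	/* read as sr->flags (patch 10) */
	sqe->user_data = TAG_RECV;
}

static void recv_loop(struct io_uring *ring, int fd)
{
	struct io_uring_cqe *cqe;
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	/* hand the whole buffer pool to the kernel once */
	io_uring_prep_provide_buffers(sqe, bufs, BUFSZ, NBUFS, BGID, 0);
	sqe->user_data = TAG_BUFS;
	arm_recv(ring, fd);
	io_uring_submit(ring);

	while (io_uring_wait_cqe(ring, &cqe) == 0) {
		if (cqe->user_data == TAG_RECV) {
			if (cqe->res > 0 && (cqe->flags & IORING_CQE_F_BUFFER)) {
				unsigned int bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;

				/* ... consume cqe->res bytes from bufs[bid] ... */
				sqe = io_uring_get_sqe(ring);
				io_uring_prep_provide_buffers(sqe, bufs[bid], BUFSZ,
							      1, BGID, bid);
				sqe->user_data = TAG_BUFS;
			}
			if (!(cqe->flags & IORING_CQE_F_MORE)) {
				/* ended: EOF, error (e.g. -ENOBUFS), or
				 * dropped after a CQ overflow */
				if (cqe->res <= 0) {
					io_uring_cqe_seen(ring, cqe);
					break;
				}
				arm_recv(ring, fd);	/* re-issue multishot */
			}
			io_uring_submit(ring);
		}
		io_uring_cqe_seen(ring, cqe);
	}
}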
Patches:
1-3: relax restrictions around provided buffers to allow 0 lengths
4: recycles more buffers on the kernel side in error conditions
5-6: clean up the multishot poll API a bit, allowing it to end with a
successful (non-error) result
7-9: fix existing ordering problems with multishot poll and accept on overflow
10: the multishot receive patch
11-12: small fixes to tracing of CQEs
v2:
* Added patches 6,7,8 (fixing multishot poll bugs)
* Added patches 10,11 (trace cleanups)
* Added io_recv_finish to reduce duplicate logic
Dylan Yudaken (12):
io_uring: allow 0 length for buffer select
io_uring: restore bgid in io_put_kbuf
io_uring: allow iov_len = 0 for recvmsg and buffer select
io_uring: recycle buffers on error
io_uring: clean up io_poll_check_events return values
io_uring: add IOU_STOP_MULTISHOT return code
io_uring: add allow_overflow to io_post_aux_cqe
io_uring: fix multishot poll on overflow
io_uring: fix multishot accept ordering
io_uring: multishot recv
io_uring: fix io_uring_cqe_overflow trace format
io_uring: only trace one of complete or overflow
include/trace/events/io_uring.h | 2 +-
include/uapi/linux/io_uring.h | 5 ++
io_uring/io_uring.c | 17 ++--
io_uring/io_uring.h | 20 +++--
io_uring/kbuf.c | 4 +-
io_uring/kbuf.h | 9 ++-
io_uring/msg_ring.c | 4 +-
io_uring/net.c | 139 ++++++++++++++++++++++++++------
io_uring/poll.c | 44 ++++++----
io_uring/rsrc.c | 4 +-
10 files changed, 190 insertions(+), 58 deletions(-)
base-commit: 864a15ca4f196184e3f44d72efc1782a7017cbbd
--
2.30.2
* [PATCH v2 for-next 01/12] io_uring: allow 0 length for buffer select
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
If the user gives 0 for the length, we can set it from the available buffer size.
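As a hedged illustration (not part of this patch), a userspace submission
relying on this could look like the sketch below; the socket fd and the
provided buffer group id are assumed to be set up elsewhere:

/* Sketch: queue a recv whose length is taken from the selected buffer. */
#include <liburing.h>

static int queue_recv_any_size(struct io_uring *ring, int sockfd, int bgid)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -1;
	/* len == 0: the kernel clamps it to the chosen buffer's size */
	io_uring_prep_recv(sqe, sockfd, NULL, 0, 0);
	io_uring_sqe_set_flags(sqe, IOSQE_BUFFER_SELECT);
	sqe->buf_group = bgid;
	return io_uring_submit(ring);
}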
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/kbuf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 8e4f1e8aaf4a..4ed5102461bf 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -115,7 +115,7 @@ static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t *len,
kbuf = list_first_entry(&bl->buf_list, struct io_buffer, list);
list_del(&kbuf->list);
- if (*len > kbuf->len)
+ if (*len == 0 || *len > kbuf->len)
*len = kbuf->len;
req->flags |= REQ_F_BUFFER_SELECTED;
req->kbuf = kbuf;
@@ -145,7 +145,7 @@ static void __user *io_ring_buffer_select(struct io_kiocb *req, size_t *len,
buf = page_address(bl->buf_pages[index]);
buf += off;
}
- if (*len > buf->len)
+ if (*len == 0 || *len > buf->len)
*len = buf->len;
req->flags |= REQ_F_BUFFER_RING;
req->buf_list = bl;
--
2.30.2
* [PATCH v2 for-next 02/12] io_uring: restore bgid in io_put_kbuf
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
Attempt to restore the bgid. This is needed when recycling unused buffers,
as the next time around the request will want the correct bgid.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/kbuf.h | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index 3d48f1ab5439..d9adedac5e9d 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -96,16 +96,21 @@ static inline void io_kbuf_recycle(struct io_kiocb *req, unsigned issue_flags)
static inline unsigned int __io_put_kbuf_list(struct io_kiocb *req,
struct list_head *list)
{
+ unsigned int ret = IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+
if (req->flags & REQ_F_BUFFER_RING) {
- if (req->buf_list)
+ if (req->buf_list) {
+ req->buf_index = req->buf_list->bgid;
req->buf_list->head++;
+ }
req->flags &= ~REQ_F_BUFFER_RING;
} else {
+ req->buf_index = req->kbuf->bgid;
list_add(&req->kbuf->list, list);
req->flags &= ~REQ_F_BUFFER_SELECTED;
}
- return IORING_CQE_F_BUFFER | (req->buf_index << IORING_CQE_BUFFER_SHIFT);
+ return ret;
}
static inline unsigned int io_put_kbuf_comp(struct io_kiocb *req)
--
2.30.2
* [PATCH v2 for-next 03/12] io_uring: allow iov_len = 0 for recvmsg and buffer select
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
When using BUFFER_SELECT there is no technical requirement that the user
actually provides an iovec, and skipping it removes one copy_from_user call.
So allow iov_len to be 0.
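A hedged userspace sketch (not from this patch) of a recvmsg that supplies
no iovec at all and relies purely on buffer select; the msghdr must stay
valid until the request completes:

/* Sketch: recvmsg with buffer select and an empty iovec from userspace. */
#include <liburing.h>
#include <string.h>
#include <sys/socket.h>

static int queue_recvmsg_no_iov(struct io_uring *ring, int sockfd, int bgid,
				struct msghdr *msg)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -1;
	memset(msg, 0, sizeof(*msg));	/* msg_iov == NULL, msg_iovlen == 0 */
	io_uring_prep_recvmsg(sqe, sockfd, msg, 0);
	io_uring_sqe_set_flags(sqe, IOSQE_BUFFER_SELECT);
	sqe->buf_group = bgid;		/* data lands in a provided buffer */
	return io_uring_submit(ring);
}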
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/net.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 19a805c3814c..5e84f7ab92a3 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -300,12 +300,18 @@ static int __io_recvmsg_copy_hdr(struct io_kiocb *req,
return ret;
if (req->flags & REQ_F_BUFFER_SELECT) {
- if (iov_len > 1)
+ if (iov_len == 0) {
+ sr->len = iomsg->fast_iov[0].iov_len = 0;
+ iomsg->fast_iov[0].iov_base = NULL;
+ iomsg->free_iov = NULL;
+ } else if (iov_len > 1) {
return -EINVAL;
- if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
- return -EFAULT;
- sr->len = iomsg->fast_iov[0].iov_len;
- iomsg->free_iov = NULL;
+ } else {
+ if (copy_from_user(iomsg->fast_iov, uiov, sizeof(*uiov)))
+ return -EFAULT;
+ sr->len = iomsg->fast_iov[0].iov_len;
+ iomsg->free_iov = NULL;
+ }
} else {
iomsg->free_iov = iomsg->fast_iov;
ret = __import_iovec(READ, uiov, iov_len, UIO_FASTIOV,
--
2.30.2
* [PATCH v2 for-next 04/12] io_uring: recycle buffers on error
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
Rather than passing an error back to the user with a buffer attached,
recycle the buffer immediately.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/net.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 5e84f7ab92a3..0268c4603f5d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -481,10 +481,13 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
if (kmsg->free_iov)
kfree(kmsg->free_iov);
req->flags &= ~REQ_F_NEED_CLEANUP;
- if (ret >= 0)
+ if (ret > 0)
ret += sr->done_io;
else if (sr->done_io)
ret = sr->done_io;
+ else
+ io_kbuf_recycle(req, issue_flags);
+
cflags = io_put_kbuf(req, issue_flags);
if (kmsg->msg.msg_inq)
cflags |= IORING_CQE_F_SOCK_NONEMPTY;
@@ -557,10 +560,13 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
req_set_fail(req);
}
- if (ret >= 0)
+ if (ret > 0)
ret += sr->done_io;
else if (sr->done_io)
ret = sr->done_io;
+ else
+ io_kbuf_recycle(req, issue_flags);
+
cflags = io_put_kbuf(req, issue_flags);
if (msg.msg_inq)
cflags |= IORING_CQE_F_SOCK_NONEMPTY;
--
2.30.2
* [PATCH v2 for-next 05/12] io_uring: clean up io_poll_check_events return values
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
The values returned are a bit confusing: 0 and 1 have implied meanings,
so add named definitions for them.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/poll.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index fa25b88a7b93..922a3d1b2e31 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -192,13 +192,18 @@ static void io_poll_remove_entries(struct io_kiocb *req)
rcu_read_unlock();
}
+enum {
+ IOU_POLL_DONE = 0,
+ IOU_POLL_NO_ACTION = 1,
+};
+
/*
* All poll tw should go through this. Checks for poll events, manages
* references, does rewait, etc.
*
- * Returns a negative error on failure. >0 when no action require, which is
- * either spurious wakeup or multishot CQE is served. 0 when it's done with
- * the request, then the mask is stored in req->cqe.res.
+ * Returns a negative error on failure. IOU_POLL_NO_ACTION when no action require,
+ * which is either spurious wakeup or multishot CQE is served.
+ * IOU_POLL_DONE when it's done with the request, then the mask is stored in req->cqe.res.
*/
static int io_poll_check_events(struct io_kiocb *req, bool *locked)
{
@@ -214,10 +219,11 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
/* tw handler should be the owner, and so have some references */
if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
- return 0;
+ return IOU_POLL_DONE;
if (v & IO_POLL_CANCEL_FLAG)
return -ECANCELED;
+ /* the mask was stashed in __io_poll_execute */
if (!req->cqe.res) {
struct poll_table_struct pt = { ._key = req->apoll_events };
req->cqe.res = vfs_poll(req->file, &pt) & req->apoll_events;
@@ -226,7 +232,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
if ((unlikely(!req->cqe.res)))
continue;
if (req->apoll_events & EPOLLONESHOT)
- return 0;
+ return IOU_POLL_DONE;
/* multishot, just fill a CQE and proceed */
if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
@@ -238,7 +244,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
return -ECANCELED;
} else {
ret = io_poll_issue(req, locked);
- if (ret)
+ if (ret < 0)
return ret;
}
@@ -248,7 +254,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
*/
} while (atomic_sub_return(v & IO_POLL_REF_MASK, &req->poll_refs));
- return 1;
+ return IOU_POLL_NO_ACTION;
}
static void io_poll_task_func(struct io_kiocb *req, bool *locked)
@@ -256,12 +262,11 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
int ret;
ret = io_poll_check_events(req, locked);
- if (ret > 0)
+ if (ret == IOU_POLL_NO_ACTION)
return;
- if (!ret) {
+ if (ret == IOU_POLL_DONE) {
struct io_poll *poll = io_kiocb_to_cmd(req);
-
req->cqe.res = mangle_poll(req->cqe.res & poll->events);
} else {
req->cqe.res = ret;
@@ -280,7 +285,7 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
int ret;
ret = io_poll_check_events(req, locked);
- if (ret > 0)
+ if (ret == IOU_POLL_NO_ACTION)
return;
io_poll_remove_entries(req);
--
2.30.2
* [PATCH v2 for-next 06/12] io_uring: add IOU_STOP_MULTISHOT return code
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
For multishot we want a way to signal the caller that multishot has ended,
even when the final return is not an error.
For example, sockets return 0 when closed, which should end a multishot
recv but still post a CQE with result 0.
Introduce IOU_STOP_MULTISHOT, which does this and indicates that the return
code is stored inside req->cqe.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/io_uring.h | 7 +++++++
io_uring/poll.c | 11 +++++++++--
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index f77e4a5403e4..e8da70781fa3 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -15,6 +15,13 @@
enum {
IOU_OK = 0,
IOU_ISSUE_SKIP_COMPLETE = -EIOCBQUEUED,
+
+ /*
+ * Intended only when both REQ_F_POLLED and REQ_F_APOLL_MULTISHOT
+ * are set to indicate to the poll runner that multishot should be
+ * removed and the result is set on req->cqe.res.
+ */
+ IOU_STOP_MULTISHOT = -ECANCELED,
};
struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx);
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 922a3d1b2e31..64d426d696ab 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -195,6 +195,7 @@ static void io_poll_remove_entries(struct io_kiocb *req)
enum {
IOU_POLL_DONE = 0,
IOU_POLL_NO_ACTION = 1,
+ IOU_POLL_REMOVE_POLL_USE_RES = 2,
};
/*
@@ -204,6 +205,8 @@ enum {
* Returns a negative error on failure. IOU_POLL_NO_ACTION when no action require,
* which is either spurious wakeup or multishot CQE is served.
* IOU_POLL_DONE when it's done with the request, then the mask is stored in req->cqe.res.
+ * IOU_POLL_REMOVE_POLL_USE_RES indicates to remove multishot poll and that the result
+ * is stored in req->cqe.
*/
static int io_poll_check_events(struct io_kiocb *req, bool *locked)
{
@@ -244,6 +247,8 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
return -ECANCELED;
} else {
ret = io_poll_issue(req, locked);
+ if (ret == IOU_STOP_MULTISHOT)
+ return IOU_POLL_REMOVE_POLL_USE_RES;
if (ret < 0)
return ret;
}
@@ -268,7 +273,7 @@ static void io_poll_task_func(struct io_kiocb *req, bool *locked)
if (ret == IOU_POLL_DONE) {
struct io_poll *poll = io_kiocb_to_cmd(req);
req->cqe.res = mangle_poll(req->cqe.res & poll->events);
- } else {
+ } else if (ret != IOU_POLL_REMOVE_POLL_USE_RES) {
req->cqe.res = ret;
req_set_fail(req);
}
@@ -291,7 +296,9 @@ static void io_apoll_task_func(struct io_kiocb *req, bool *locked)
io_poll_remove_entries(req);
io_poll_tw_hash_eject(req, locked);
- if (!ret)
+ if (ret == IOU_POLL_REMOVE_POLL_USE_RES)
+ io_req_complete_post(req);
+ else if (ret == IOU_POLL_DONE)
io_req_task_submit(req, locked);
else
io_req_complete_failed(req, ret);
--
2.30.2
* [PATCH v2 for-next 07/12] io_uring: add allow_overflow to io_post_aux_cqe
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
Some use cases of io_post_aux_cqe would not want to overflow as-is, but
might want to change the flags/result. For example, multishot receive
requires in-order CQEs, and so if there is an overflow it would need to
stop receiving until the overflow is taken care of.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/io_uring.c | 14 ++++++++++----
io_uring/io_uring.h | 3 ++-
io_uring/msg_ring.c | 4 ++--
io_uring/net.c | 2 +-
io_uring/poll.c | 2 +-
io_uring/rsrc.c | 4 ++--
6 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 745264938a48..523b6ebad15a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -736,7 +736,8 @@ struct io_uring_cqe *__io_get_cqe(struct io_ring_ctx *ctx)
}
static bool io_fill_cqe_aux(struct io_ring_ctx *ctx,
- u64 user_data, s32 res, u32 cflags)
+ u64 user_data, s32 res, u32 cflags,
+ bool allow_overflow)
{
struct io_uring_cqe *cqe;
@@ -760,16 +761,21 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx,
}
return true;
}
- return io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
+
+ if (allow_overflow)
+ return io_cqring_event_overflow(ctx, user_data, res, cflags, 0, 0);
+
+ return false;
}
bool io_post_aux_cqe(struct io_ring_ctx *ctx,
- u64 user_data, s32 res, u32 cflags)
+ u64 user_data, s32 res, u32 cflags,
+ bool allow_overflow)
{
bool filled;
io_cq_lock(ctx);
- filled = io_fill_cqe_aux(ctx, user_data, res, cflags);
+ filled = io_fill_cqe_aux(ctx, user_data, res, cflags, allow_overflow);
io_cq_unlock_post(ctx);
return filled;
}
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e8da70781fa3..e022d71c177a 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -31,7 +31,8 @@ void io_req_complete_failed(struct io_kiocb *req, s32 res);
void __io_req_complete(struct io_kiocb *req, unsigned issue_flags);
void io_req_complete_post(struct io_kiocb *req);
void __io_req_complete_post(struct io_kiocb *req);
-bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags);
+bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
+ bool allow_overflow);
void __io_commit_cqring_flush(struct io_ring_ctx *ctx);
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages);
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 939205b30c8b..753d16734319 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -31,7 +31,7 @@ static int io_msg_ring_data(struct io_kiocb *req)
if (msg->src_fd || msg->dst_fd || msg->flags)
return -EINVAL;
- if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
+ if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0, true))
return 0;
return -EOVERFLOW;
@@ -113,7 +113,7 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)
* completes with -EOVERFLOW, then the sender must ensure that a
* later IORING_OP_MSG_RING delivers the message.
*/
- if (!io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0))
+ if (!io_post_aux_cqe(target_ctx, msg->user_data, msg->len, 0, true))
ret = -EOVERFLOW;
out_unlock:
io_double_unlock_ctx(ctx, target_ctx, issue_flags);
diff --git a/io_uring/net.c b/io_uring/net.c
index 0268c4603f5d..c3600814b308 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -658,7 +658,7 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
if (ret < 0)
return ret;
- if (io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE))
+ if (io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, true))
goto retry;
return -ECANCELED;
}
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 64d426d696ab..e8f922a4f6d7 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -243,7 +243,7 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
req->apoll_events);
if (!io_post_aux_cqe(ctx, req->cqe.user_data,
- mask, IORING_CQE_F_MORE))
+ mask, IORING_CQE_F_MORE, true))
return -ECANCELED;
} else {
ret = io_poll_issue(req, locked);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 6a75d2f57e8e..1182cf0ea1fc 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -175,10 +175,10 @@ static void __io_rsrc_put_work(struct io_rsrc_node *ref_node)
if (prsrc->tag) {
if (ctx->flags & IORING_SETUP_IOPOLL) {
mutex_lock(&ctx->uring_lock);
- io_post_aux_cqe(ctx, prsrc->tag, 0, 0);
+ io_post_aux_cqe(ctx, prsrc->tag, 0, 0, true);
mutex_unlock(&ctx->uring_lock);
} else {
- io_post_aux_cqe(ctx, prsrc->tag, 0, 0);
+ io_post_aux_cqe(ctx, prsrc->tag, 0, 0, true);
}
}
--
2.30.2
* [PATCH v2 for-next 08/12] io_uring: fix multishot poll on overflow
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
On overflow, multishot poll can still complete with the IORING_CQE_F_MORE
flag set.
If in the meantime the user clears a CQE and the poll was cancelled, then
the poll will post a CQE without IORING_CQE_F_MORE (and likely with result
-ECANCELED).
However when processing, the application will encounter the non-overflow
CQE first, which indicates that there will be no more events posted. Typical
userspace applications would free memory associated with the poll in this
case.
It will then subsequently receive the earlier CQE which had overflowed,
which breaks the contract given by the IORING_CQE_F_MORE flag.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/poll.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index e8f922a4f6d7..57747d92bba4 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -243,8 +243,10 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
req->apoll_events);
if (!io_post_aux_cqe(ctx, req->cqe.user_data,
- mask, IORING_CQE_F_MORE, true))
- return -ECANCELED;
+ mask, IORING_CQE_F_MORE, false)) {
+ io_req_set_res(req, mask, 0);
+ return IOU_POLL_REMOVE_POLL_USE_RES;
+ }
} else {
ret = io_poll_issue(req, locked);
if (ret == IOU_STOP_MULTISHOT)
--
2.30.2
* [PATCH v2 for-next 09/12] io_uring: fix multishot accept ordering
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
Similar to multishot poll, drop multishot accept when CQE overflow occurs.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/net.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index c3600814b308..75761f48c959 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -656,11 +656,14 @@ int io_accept(struct io_kiocb *req, unsigned int issue_flags)
return IOU_OK;
}
- if (ret < 0)
- return ret;
- if (io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, true))
+ if (ret >= 0 &&
+ io_post_aux_cqe(ctx, req->cqe.user_data, ret, IORING_CQE_F_MORE, false))
goto retry;
- return -ECANCELED;
+
+ io_req_set_res(req, ret, 0);
+ if (req->flags & REQ_F_POLLED)
+ return IOU_STOP_MULTISHOT;
+ return IOU_OK;
}
int io_socket_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
--
2.30.2
* [PATCH v2 for-next 10/12] io_uring: multishot recv
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
Support multishot receive for io_uring.
Typical server applications run a loop that requeues another recv/recvmsg
for each recv CQE received.
This can be simplified by using the existing multishot functionality
combined with io_uring's provided buffers.
The API is to add the IORING_RECV_MULTISHOT flag to the SQE. CQEs will
then be posted (with the IORING_CQE_F_MORE flag set) when data is available
and has been read. Once an error occurs or the connection is closed, the
multishot will be removed and a completion without IORING_CQE_F_MORE will
be posted.
The benefit of this is that the recv is much more performant.
* Subsequent receives are queued up straight away without requiring the
application to finish a processing loop.
* If there is more data in the socket (say the provided buffer size is
smaller than the socket buffer) then the data is immediately
returned, improving batching.
* Poll is only armed once and reused, saving CPU cycles
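As a hedged userspace sketch (not part of the patch): at this point in the
series the flag is read from sqe->addr2, buffer select is mandatory,
MSG_WAITALL is rejected, and for IORING_OP_RECV the length must be 0. A
multishot recvmsg could then be armed roughly as follows, with the buffer
group registration and the msghdr lifetime managed by the caller:

/* Sketch: arm a multishot recvmsg; one SQE produces many CQEs, each with
 * IORING_CQE_F_MORE set until the multishot is terminated. */
#include <liburing.h>
#include <string.h>
#include <sys/socket.h>

#ifndef IORING_RECV_MULTISHOT
#define IORING_RECV_MULTISHOT	(1U << 1)	/* value introduced by this patch */
#endif

static int arm_multishot_recvmsg(struct io_uring *ring, int sockfd, int bgid,
				 struct msghdr *msg)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -1;
	memset(msg, 0, sizeof(*msg));		/* iov_len == 0, see patch 3 */
	io_uring_prep_recvmsg(sqe, sockfd, msg, 0);	/* no MSG_WAITALL */
	io_uring_sqe_set_flags(sqe, IOSQE_BUFFER_SELECT);	/* mandatory */
	sqe->buf_group = bgid;
	sqe->addr2 = IORING_RECV_MULTISHOT;	/* becomes sr->flags above */
	return io_uring_submit(ring);
}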
Signed-off-by: Dylan Yudaken <[email protected]>
---
include/uapi/linux/io_uring.h | 5 ++
io_uring/net.c | 102 +++++++++++++++++++++++++++++-----
2 files changed, 94 insertions(+), 13 deletions(-)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 05addde13df8..405bb5a67d47 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -260,8 +260,13 @@ enum io_uring_op {
* or receive and arm poll if that yields an
* -EAGAIN result, arm poll upfront and skip
* the initial transfer attempt.
+ *
+ * IORING_RECV_MULTISHOT Multishot recv. Sets IORING_CQE_F_MORE if
+ * the handler will continue to report
+ * CQEs on behalf of the same SQE.
*/
#define IORING_RECVSEND_POLL_FIRST (1U << 0)
+#define IORING_RECV_MULTISHOT (1U << 1)
/*
* accept flags stored in sqe->ioprio
diff --git a/io_uring/net.c b/io_uring/net.c
index 75761f48c959..3394825f74fd 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -389,6 +389,8 @@ int io_recvmsg_prep_async(struct io_kiocb *req)
return ret;
}
+#define RECVMSG_FLAGS (IORING_RECVSEND_POLL_FIRST | IORING_RECV_MULTISHOT)
+
int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req);
@@ -399,13 +401,22 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
sr->umsg = u64_to_user_ptr(READ_ONCE(sqe->addr));
sr->len = READ_ONCE(sqe->len);
sr->flags = READ_ONCE(sqe->addr2);
- if (sr->flags & ~IORING_RECVSEND_POLL_FIRST)
+ if (sr->flags & ~(RECVMSG_FLAGS))
return -EINVAL;
sr->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL;
if (sr->msg_flags & MSG_DONTWAIT)
req->flags |= REQ_F_NOWAIT;
if (sr->msg_flags & MSG_ERRQUEUE)
req->flags |= REQ_F_CLEAR_POLLIN;
+ if (sr->flags & IORING_RECV_MULTISHOT) {
+ if (!(req->flags & REQ_F_BUFFER_SELECT))
+ return -EINVAL;
+ if (sr->msg_flags & MSG_WAITALL)
+ return -EINVAL;
+ if (req->opcode == IORING_OP_RECV && sr->len)
+ return -EINVAL;
+ req->flags |= REQ_F_APOLL_MULTISHOT;
+ }
#ifdef CONFIG_COMPAT
if (req->ctx->compat)
@@ -415,6 +426,48 @@ int io_recvmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return 0;
}
+static inline void io_recv_prep_retry(struct io_kiocb *req)
+{
+ struct io_sr_msg *sr = io_kiocb_to_cmd(req);
+
+ sr->done_io = 0;
+ sr->len = 0; /* get from the provided buffer */
+}
+
+/*
+ * Finishes io_recv and io_recvmsg.
+ *
+ * Returns true if it is actually finished, or false if it should run
+ * again (for multishot).
+ */
+static inline bool io_recv_finish(struct io_kiocb *req, int *ret, unsigned int cflags)
+{
+ if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
+ io_req_set_res(req, *ret, cflags);
+ *ret = IOU_OK;
+ return true;
+ }
+
+ if (*ret > 0) {
+ if (io_post_aux_cqe(req->ctx, req->cqe.user_data, *ret,
+ cflags | IORING_CQE_F_MORE, false)) {
+ io_recv_prep_retry(req);
+ return false;
+ }
+ /*
+ * Otherwise stop multishot but use the current result.
+ * Probably will end up going into overflow, but this means
+ * we cannot trust the ordering anymore
+ */
+ }
+
+ io_req_set_res(req, *ret, cflags);
+
+ if (req->flags & REQ_F_POLLED)
+ *ret = IOU_STOP_MULTISHOT;
+ return true;
+}
+
int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_sr_msg *sr = io_kiocb_to_cmd(req);
@@ -424,6 +477,7 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
unsigned flags;
int ret, min_ret = 0;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ size_t len = sr->len;
sock = sock_from_file(req->file);
if (unlikely(!sock))
@@ -442,16 +496,17 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
(sr->flags & IORING_RECVSEND_POLL_FIRST))
return io_setup_async_msg(req, kmsg);
+retry_multishot:
if (io_do_buffer_select(req)) {
void __user *buf;
- buf = io_buffer_select(req, &sr->len, issue_flags);
+ buf = io_buffer_select(req, &len, issue_flags);
if (!buf)
return -ENOBUFS;
kmsg->fast_iov[0].iov_base = buf;
- kmsg->fast_iov[0].iov_len = sr->len;
+ kmsg->fast_iov[0].iov_len = len;
iov_iter_init(&kmsg->msg.msg_iter, READ, kmsg->fast_iov, 1,
- sr->len);
+ len);
}
flags = sr->msg_flags;
@@ -463,8 +518,15 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
kmsg->msg.msg_get_inq = 1;
ret = __sys_recvmsg_sock(sock, &kmsg->msg, sr->umsg, kmsg->uaddr, flags);
if (ret < min_ret) {
- if (ret == -EAGAIN && force_nonblock)
- return io_setup_async_msg(req, kmsg);
+ if (ret == -EAGAIN && force_nonblock) {
+ ret = io_setup_async_msg(req, kmsg);
+ if (ret == -EAGAIN && (req->flags & IO_APOLL_MULTI_POLLED) ==
+ IO_APOLL_MULTI_POLLED) {
+ io_kbuf_recycle(req, issue_flags);
+ return IOU_ISSUE_SKIP_COMPLETE;
+ }
+ return ret;
+ }
if (ret == -ERESTARTSYS)
ret = -EINTR;
if (ret > 0 && io_net_retry(sock, flags)) {
@@ -491,8 +553,11 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
cflags = io_put_kbuf(req, issue_flags);
if (kmsg->msg.msg_inq)
cflags |= IORING_CQE_F_SOCK_NONEMPTY;
- io_req_set_res(req, ret, cflags);
- return IOU_OK;
+
+ if (!io_recv_finish(req, &ret, cflags))
+ goto retry_multishot;
+
+ return ret;
}
int io_recv(struct io_kiocb *req, unsigned int issue_flags)
@@ -505,6 +570,7 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
unsigned flags;
int ret, min_ret = 0;
bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK;
+ size_t len = sr->len;
if (!(req->flags & REQ_F_POLLED) &&
(sr->flags & IORING_RECVSEND_POLL_FIRST))
@@ -514,16 +580,17 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(!sock))
return -ENOTSOCK;
+retry_multishot:
if (io_do_buffer_select(req)) {
void __user *buf;
- buf = io_buffer_select(req, &sr->len, issue_flags);
+ buf = io_buffer_select(req, &len, issue_flags);
if (!buf)
return -ENOBUFS;
sr->buf = buf;
}
- ret = import_single_range(READ, sr->buf, sr->len, &iov, &msg.msg_iter);
+ ret = import_single_range(READ, sr->buf, len, &iov, &msg.msg_iter);
if (unlikely(ret))
goto out_free;
@@ -543,8 +610,14 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
ret = sock_recvmsg(sock, &msg, flags);
if (ret < min_ret) {
- if (ret == -EAGAIN && force_nonblock)
+ if (ret == -EAGAIN && force_nonblock) {
+ if ((req->flags & IO_APOLL_MULTI_POLLED) == IO_APOLL_MULTI_POLLED) {
+ io_kbuf_recycle(req, issue_flags);
+ return IOU_ISSUE_SKIP_COMPLETE;
+ }
+
return -EAGAIN;
+ }
if (ret == -ERESTARTSYS)
ret = -EINTR;
if (ret > 0 && io_net_retry(sock, flags)) {
@@ -570,8 +643,11 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
cflags = io_put_kbuf(req, issue_flags);
if (msg.msg_inq)
cflags |= IORING_CQE_F_SOCK_NONEMPTY;
- io_req_set_res(req, ret, cflags);
- return IOU_OK;
+
+ if (!io_recv_finish(req, &ret, cflags))
+ goto retry_multishot;
+
+ return ret;
}
int io_accept_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
--
2.30.2
* [PATCH v2 for-next 11/12] io_uring: fix io_uring_cqe_overflow trace format
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
Make the trace format consistent with io_uring_complete for cflags
Signed-off-by: Dylan Yudaken <[email protected]>
---
include/trace/events/io_uring.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/trace/events/io_uring.h b/include/trace/events/io_uring.h
index 918e3a43e4b2..95a8cfaad15a 100644
--- a/include/trace/events/io_uring.h
+++ b/include/trace/events/io_uring.h
@@ -594,7 +594,7 @@ TRACE_EVENT(io_uring_cqe_overflow,
__entry->ocqe = ocqe;
),
- TP_printk("ring %p, user_data 0x%llx, res %d, flags %x, "
+ TP_printk("ring %p, user_data 0x%llx, res %d, cflags 0x%x, "
"overflow_cqe %p",
__entry->ctx, __entry->user_data, __entry->res,
__entry->cflags, __entry->ocqe)
--
2.30.2
* [PATCH v2 for-next 12/12] io_uring: only trace one of complete or overflow
From: Dylan Yudaken @ 2022-06-30 9:12 UTC (permalink / raw)
To: Jens Axboe, Pavel Begunkov, io-uring
Cc: Kernel-team, linux-kernel, Dylan Yudaken
On overflow we see a duplicate line in the trace, and in some cases 3
lines (if the initial io_post_aux_cqe fails).
Instead, just trace once for each CQE.
Signed-off-by: Dylan Yudaken <[email protected]>
---
io_uring/io_uring.c | 3 ++-
io_uring/io_uring.h | 10 ++++++----
2 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 523b6ebad15a..caf979cd4327 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -742,7 +742,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx,
struct io_uring_cqe *cqe;
ctx->cq_extra++;
- trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);
/*
* If we can't get a cq entry, userspace overflowed the
@@ -751,6 +750,8 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx,
*/
cqe = io_get_cqe(ctx);
if (likely(cqe)) {
+ trace_io_uring_complete(ctx, NULL, user_data, res, cflags, 0, 0);
+
WRITE_ONCE(cqe->user_data, user_data);
WRITE_ONCE(cqe->res, res);
WRITE_ONCE(cqe->flags, cflags);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index e022d71c177a..868f45d55543 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -101,10 +101,6 @@ static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
{
struct io_uring_cqe *cqe;
- trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
- req->cqe.res, req->cqe.flags,
- (req->flags & REQ_F_CQE32_INIT) ? req->extra1 : 0,
- (req->flags & REQ_F_CQE32_INIT) ? req->extra2 : 0);
/*
* If we can't get a cq entry, userspace overflowed the
* submission (by quite a lot). Increment the overflow count in
@@ -113,6 +109,12 @@ static inline bool __io_fill_cqe_req(struct io_ring_ctx *ctx,
cqe = io_get_cqe(ctx);
if (unlikely(!cqe))
return io_req_cqe_overflow(req);
+
+ trace_io_uring_complete(req->ctx, req, req->cqe.user_data,
+ req->cqe.res, req->cqe.flags,
+ (req->flags & REQ_F_CQE32_INIT) ? req->extra1 : 0,
+ (req->flags & REQ_F_CQE32_INIT) ? req->extra2 : 0);
+
memcpy(cqe, &req->cqe, sizeof(*cqe));
if (ctx->flags & IORING_SETUP_CQE32) {
--
2.30.2