public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCHSET 0/3] Cleanup and improve MSG_RING performance
@ 2024-03-28 18:52 Jens Axboe
  2024-03-28 18:52 ` [PATCH 1/3] io_uring: add remote task_work execution helper Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Jens Axboe @ 2024-03-28 18:52 UTC (permalink / raw)
  To: io-uring

Hi,

MSG_RING rolls its own task_work handling, which there's really no need
for. Rather, it should use the generic io_uring infrastructure for this.

Add a helper for remote execution, and switch over MSG_RING to use it.
This both cleans up the code, and improves performance of this opcode
considerably.

 io_uring/io_uring.c | 27 +++++++++++++++++++--------
 io_uring/io_uring.h |  2 ++
 io_uring/msg_ring.c | 37 ++++++++++++++-----------------------
 3 files changed, 35 insertions(+), 31 deletions(-)

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/3] io_uring: add remote task_work execution helper
  2024-03-28 18:52 [PATCHSET 0/3] Cleanup and improve MSG_RING performance Jens Axboe
@ 2024-03-28 18:52 ` Jens Axboe
  2024-03-29 12:51   ` Pavel Begunkov
  2024-03-28 18:52 ` [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring Jens Axboe
  2024-03-28 18:52 ` [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting Jens Axboe
  2 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2024-03-28 18:52 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

All our task_work handling is targeted at the state in the io_kiocb
itself, which is what it is being used for. However, MSG_RING rolls its
own task_work handling, ignoring how that is usually done.

In preparation for switching MSG_RING to be able to use the normal
task_work handling, add io_req_task_work_add_remote() which allows the
caller to pass in the target io_ring_ctx and task.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/io_uring.c | 27 +++++++++++++++++++--------
 io_uring/io_uring.h |  2 ++
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9978dbe00027..609ff9ea5930 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1241,9 +1241,10 @@ void tctx_task_work(struct callback_head *cb)
 	WARN_ON_ONCE(ret);
 }
 
-static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags)
+static inline void io_req_local_work_add(struct io_kiocb *req,
+					 struct io_ring_ctx *ctx,
+					 unsigned tw_flags)
 {
-	struct io_ring_ctx *ctx = req->ctx;
 	unsigned nr_wait, nr_tw, nr_tw_prev;
 	unsigned long flags;
 
@@ -1291,9 +1292,10 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags
 	wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
 }
 
-static void io_req_normal_work_add(struct io_kiocb *req)
+static void io_req_normal_work_add(struct io_kiocb *req,
+				   struct task_struct *task)
 {
-	struct io_uring_task *tctx = req->task->io_uring;
+	struct io_uring_task *tctx = task->io_uring;
 	struct io_ring_ctx *ctx = req->ctx;
 	unsigned long flags;
 	bool was_empty;
@@ -1319,7 +1321,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 		return;
 	}
 
-	if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
+	if (likely(!task_work_add(task, &tctx->task_work, ctx->notify_method)))
 		return;
 
 	io_fallback_tw(tctx, false);
@@ -1328,9 +1330,18 @@ static void io_req_normal_work_add(struct io_kiocb *req)
 void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
 {
 	if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
-		io_req_local_work_add(req, flags);
+		io_req_local_work_add(req, req->ctx, flags);
+	else
+		io_req_normal_work_add(req, req->task);
+}
+
+void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
+				 struct io_ring_ctx *ctx, unsigned flags)
+{
+	if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
+		io_req_local_work_add(req, ctx, flags);
 	else
-		io_req_normal_work_add(req);
+		io_req_normal_work_add(req, task);
 }
 
 static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
@@ -1349,7 +1360,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
 						    io_task_work.node);
 
 		node = node->next;
-		io_req_normal_work_add(req);
+		io_req_normal_work_add(req, req->task);
 	}
 }
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index bde463642c71..a6dec5321ec4 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -74,6 +74,8 @@ struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
 			       unsigned issue_flags);
 
 void __io_req_task_work_add(struct io_kiocb *req, unsigned flags);
+void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
+				 struct io_ring_ctx *ctx, unsigned flags);
 bool io_alloc_async_data(struct io_kiocb *req);
 void io_req_task_queue(struct io_kiocb *req);
 void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
  2024-03-28 18:52 [PATCHSET 0/3] Cleanup and improve MSG_RING performance Jens Axboe
  2024-03-28 18:52 ` [PATCH 1/3] io_uring: add remote task_work execution helper Jens Axboe
@ 2024-03-28 18:52 ` Jens Axboe
  2024-03-29 15:57   ` Pavel Begunkov
  2024-03-28 18:52 ` [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting Jens Axboe
  2 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2024-03-28 18:52 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

Move the posting outside the checking and locking, it's cleaner that
way.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/msg_ring.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index cd6dcf634ba3..d1f66a40b4b4 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -147,13 +147,11 @@ static int io_msg_ring_data(struct io_kiocb *req, unsigned int issue_flags)
 	if (target_ctx->flags & IORING_SETUP_IOPOLL) {
 		if (unlikely(io_double_lock_ctx(target_ctx, issue_flags)))
 			return -EAGAIN;
-		if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
-			ret = 0;
-		io_double_unlock_ctx(target_ctx);
-	} else {
-		if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
-			ret = 0;
 	}
+	if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
+		ret = 0;
+	if (target_ctx->flags & IORING_SETUP_IOPOLL)
+		io_double_unlock_ctx(target_ctx);
 	return ret;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting
  2024-03-28 18:52 [PATCHSET 0/3] Cleanup and improve MSG_RING performance Jens Axboe
  2024-03-28 18:52 ` [PATCH 1/3] io_uring: add remote task_work execution helper Jens Axboe
  2024-03-28 18:52 ` [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring Jens Axboe
@ 2024-03-28 18:52 ` Jens Axboe
  2024-03-29 12:54   ` Pavel Begunkov
  2 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2024-03-28 18:52 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

Use the exported helper for queueing task_work, rather than rolling our
own.

This improves peak performance of message passing by about 5x in some
basic testing, with 2 threads just sending messages to each other.
Before this change, it was capped at around 700K/sec, with the change
it's at over 4M/sec.

Signed-off-by: Jens Axboe <[email protected]>
---
 io_uring/msg_ring.c | 27 ++++++++++-----------------
 1 file changed, 10 insertions(+), 17 deletions(-)

diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index d1f66a40b4b4..e12a9e8a910a 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -11,9 +11,9 @@
 #include "io_uring.h"
 #include "rsrc.h"
 #include "filetable.h"
+#include "refs.h"
 #include "msg_ring.h"
 
-
 /* All valid masks for MSG_RING */
 #define IORING_MSG_RING_MASK		(IORING_MSG_RING_CQE_SKIP | \
 					IORING_MSG_RING_FLAGS_PASS)
@@ -21,7 +21,6 @@
 struct io_msg {
 	struct file			*file;
 	struct file			*src_file;
-	struct callback_head		tw;
 	u64 user_data;
 	u32 len;
 	u32 cmd;
@@ -73,26 +72,20 @@ static inline bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
 	return current != target_ctx->submitter_task;
 }
 
-static int io_msg_exec_remote(struct io_kiocb *req, task_work_func_t func)
+static int io_msg_exec_remote(struct io_kiocb *req, io_req_tw_func_t func)
 {
 	struct io_ring_ctx *ctx = req->file->private_data;
-	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
 	struct task_struct *task = READ_ONCE(ctx->submitter_task);
 
-	if (unlikely(!task))
-		return -EOWNERDEAD;
-
-	init_task_work(&msg->tw, func);
-	if (task_work_add(ctx->submitter_task, &msg->tw, TWA_SIGNAL))
-		return -EOWNERDEAD;
-
+	__io_req_set_refcount(req, 2);
+	req->io_task_work.func = func;
+	io_req_task_work_add_remote(req, task, ctx, IOU_F_TWQ_LAZY_WAKE);
 	return IOU_ISSUE_SKIP_COMPLETE;
 }
 
-static void io_msg_tw_complete(struct callback_head *head)
+static void io_msg_tw_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
-	struct io_msg *msg = container_of(head, struct io_msg, tw);
-	struct io_kiocb *req = cmd_to_io_kiocb(msg);
+	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
 	struct io_ring_ctx *target_ctx = req->file->private_data;
 	int ret = 0;
 
@@ -120,6 +113,7 @@ static void io_msg_tw_complete(struct callback_head *head)
 
 	if (ret < 0)
 		req_set_fail(req);
+	req_ref_put_and_test(req);
 	io_req_queue_tw_complete(req, ret);
 }
 
@@ -205,16 +199,15 @@ static int io_msg_install_complete(struct io_kiocb *req, unsigned int issue_flag
 	return ret;
 }
 
-static void io_msg_tw_fd_complete(struct callback_head *head)
+static void io_msg_tw_fd_complete(struct io_kiocb *req, struct io_tw_state *ts)
 {
-	struct io_msg *msg = container_of(head, struct io_msg, tw);
-	struct io_kiocb *req = cmd_to_io_kiocb(msg);
 	int ret = -EOWNERDEAD;
 
 	if (!(current->flags & PF_EXITING))
 		ret = io_msg_install_complete(req, IO_URING_F_UNLOCKED);
 	if (ret < 0)
 		req_set_fail(req);
+	req_ref_put_and_test(req);
 	io_req_queue_tw_complete(req, ret);
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] io_uring: add remote task_work execution helper
  2024-03-28 18:52 ` [PATCH 1/3] io_uring: add remote task_work execution helper Jens Axboe
@ 2024-03-29 12:51   ` Pavel Begunkov
  2024-03-29 13:31     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2024-03-29 12:51 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 3/28/24 18:52, Jens Axboe wrote:
> All our task_work handling is targeted at the state in the io_kiocb
> itself, which is what it is being used for. However, MSG_RING rolls its
> own task_work handling, ignoring how that is usually done.
> 
> In preparation for switching MSG_RING to be able to use the normal
> task_work handling, add io_req_task_work_add_remote() which allows the
> caller to pass in the target io_ring_ctx and task.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>   io_uring/io_uring.c | 27 +++++++++++++++++++--------
>   io_uring/io_uring.h |  2 ++
>   2 files changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 9978dbe00027..609ff9ea5930 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1241,9 +1241,10 @@ void tctx_task_work(struct callback_head *cb)
>   	WARN_ON_ONCE(ret);
>   }
>   
> -static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags)
> +static inline void io_req_local_work_add(struct io_kiocb *req,
> +					 struct io_ring_ctx *ctx,
> +					 unsigned tw_flags)
>   {
> -	struct io_ring_ctx *ctx = req->ctx;
>   	unsigned nr_wait, nr_tw, nr_tw_prev;
>   	unsigned long flags;
>   
> @@ -1291,9 +1292,10 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags
>   	wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
>   }
>   
> -static void io_req_normal_work_add(struct io_kiocb *req)
> +static void io_req_normal_work_add(struct io_kiocb *req,
> +				   struct task_struct *task)
>   {
> -	struct io_uring_task *tctx = req->task->io_uring;
> +	struct io_uring_task *tctx = task->io_uring;
>   	struct io_ring_ctx *ctx = req->ctx;
>   	unsigned long flags;
>   	bool was_empty;
> @@ -1319,7 +1321,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>   		return;
>   	}
>   
> -	if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
> +	if (likely(!task_work_add(task, &tctx->task_work, ctx->notify_method)))
>   		return;
>   
>   	io_fallback_tw(tctx, false);
> @@ -1328,9 +1330,18 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>   void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
>   {
>   	if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
> -		io_req_local_work_add(req, flags);
> +		io_req_local_work_add(req, req->ctx, flags);
> +	else
> +		io_req_normal_work_add(req, req->task);
> +}
> +
> +void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
> +				 struct io_ring_ctx *ctx, unsigned flags)

Urgh, even the declration screams that there is something wrong
considering it _either_ targets @ctx or @task.

Just pass @ctx, so it either use ctx->submitter_task or
req->task, hmm?

A side note, it's a dangerous game, I told it before. At least
it would've been nice to abuse lockdep in a form of:

io_req_task_complete(req, tw, ctx) {
	lockdep_assert(req->ctx == ctx);
	...
}

but we don't have @ctx there, maybe we'll add it to tw later.


> +{
> +	if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
> +		io_req_local_work_add(req, ctx, flags);
>   	else
> -		io_req_normal_work_add(req);
> +		io_req_normal_work_add(req, task);
>   }
>   
>   static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
> @@ -1349,7 +1360,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
>   						    io_task_work.node);
>   
>   		node = node->next;
> -		io_req_normal_work_add(req);
> +		io_req_normal_work_add(req, req->task);
>   	}
>   }
>   
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index bde463642c71..a6dec5321ec4 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -74,6 +74,8 @@ struct file *io_file_get_fixed(struct io_kiocb *req, int fd,
>   			       unsigned issue_flags);
>   
>   void __io_req_task_work_add(struct io_kiocb *req, unsigned flags);
> +void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
> +				 struct io_ring_ctx *ctx, unsigned flags);
>   bool io_alloc_async_data(struct io_kiocb *req);
>   void io_req_task_queue(struct io_kiocb *req);
>   void io_req_task_complete(struct io_kiocb *req, struct io_tw_state *ts);

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting
  2024-03-28 18:52 ` [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting Jens Axboe
@ 2024-03-29 12:54   ` Pavel Begunkov
  2024-03-29 13:32     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2024-03-29 12:54 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 3/28/24 18:52, Jens Axboe wrote:
> Use the exported helper for queueing task_work, rather than rolling our
> own.
> 
> This improves peak performance of message passing by about 5x in some
> basic testing, with 2 threads just sending messages to each other.
> Before this change, it was capped at around 700K/sec, with the change
> it's at over 4M/sec.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>   io_uring/msg_ring.c | 27 ++++++++++-----------------
>   1 file changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
> index d1f66a40b4b4..e12a9e8a910a 100644
> --- a/io_uring/msg_ring.c
> +++ b/io_uring/msg_ring.c
> @@ -11,9 +11,9 @@
>   #include "io_uring.h"
>   #include "rsrc.h"
>   #include "filetable.h"
> +#include "refs.h"
>   #include "msg_ring.h"
>   
> -
>   /* All valid masks for MSG_RING */
>   #define IORING_MSG_RING_MASK		(IORING_MSG_RING_CQE_SKIP | \
>   					IORING_MSG_RING_FLAGS_PASS)
> @@ -21,7 +21,6 @@
>   struct io_msg {
>   	struct file			*file;
>   	struct file			*src_file;
> -	struct callback_head		tw;
>   	u64 user_data;
>   	u32 len;
>   	u32 cmd;
> @@ -73,26 +72,20 @@ static inline bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
>   	return current != target_ctx->submitter_task;
>   }
>   
> -static int io_msg_exec_remote(struct io_kiocb *req, task_work_func_t func)
> +static int io_msg_exec_remote(struct io_kiocb *req, io_req_tw_func_t func)
>   {
>   	struct io_ring_ctx *ctx = req->file->private_data;
> -	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
>   	struct task_struct *task = READ_ONCE(ctx->submitter_task);
>   
> -	if (unlikely(!task))
> -		return -EOWNERDEAD;
> -
> -	init_task_work(&msg->tw, func);
> -	if (task_work_add(ctx->submitter_task, &msg->tw, TWA_SIGNAL))
> -		return -EOWNERDEAD;
> -
> +	__io_req_set_refcount(req, 2);

I'd argue it's better avoid any more req refcount users, I'd be more
happy it it dies out completely at some point.

Why it's even needed here? You pass it via tw to post a CQE/etc and
then pass it back via another tw hop to complete IIRC, the ownership
is clear. At least it worth a comment.


> +	req->io_task_work.func = func;
> +	io_req_task_work_add_remote(req, task, ctx, IOU_F_TWQ_LAZY_WAKE);
>   	return IOU_ISSUE_SKIP_COMPLETE;
>   }
>   
> -static void io_msg_tw_complete(struct callback_head *head)
> +static void io_msg_tw_complete(struct io_kiocb *req, struct io_tw_state *ts)
>   {
> -	struct io_msg *msg = container_of(head, struct io_msg, tw);
> -	struct io_kiocb *req = cmd_to_io_kiocb(msg);
> +	struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
>   	struct io_ring_ctx *target_ctx = req->file->private_data;
>   	int ret = 0;
>   
> @@ -120,6 +113,7 @@ static void io_msg_tw_complete(struct callback_head *head)
>   
>   	if (ret < 0)
>   		req_set_fail(req);
> +	req_ref_put_and_test(req);
>   	io_req_queue_tw_complete(req, ret);
>   }
>   
> @@ -205,16 +199,15 @@ static int io_msg_install_complete(struct io_kiocb *req, unsigned int issue_flag
>   	return ret;
>   }
>   
> -static void io_msg_tw_fd_complete(struct callback_head *head)
> +static void io_msg_tw_fd_complete(struct io_kiocb *req, struct io_tw_state *ts)
>   {
> -	struct io_msg *msg = container_of(head, struct io_msg, tw);
> -	struct io_kiocb *req = cmd_to_io_kiocb(msg);
>   	int ret = -EOWNERDEAD;
>   
>   	if (!(current->flags & PF_EXITING))
>   		ret = io_msg_install_complete(req, IO_URING_F_UNLOCKED);
>   	if (ret < 0)
>   		req_set_fail(req);
> +	req_ref_put_and_test(req);
>   	io_req_queue_tw_complete(req, ret);
>   }
>   

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] io_uring: add remote task_work execution helper
  2024-03-29 12:51   ` Pavel Begunkov
@ 2024-03-29 13:31     ` Jens Axboe
  2024-03-29 15:50       ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2024-03-29 13:31 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/29/24 6:51 AM, Pavel Begunkov wrote:
> On 3/28/24 18:52, Jens Axboe wrote:
>> All our task_work handling is targeted at the state in the io_kiocb
>> itself, which is what it is being used for. However, MSG_RING rolls its
>> own task_work handling, ignoring how that is usually done.
>>
>> In preparation for switching MSG_RING to be able to use the normal
>> task_work handling, add io_req_task_work_add_remote() which allows the
>> caller to pass in the target io_ring_ctx and task.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>   io_uring/io_uring.c | 27 +++++++++++++++++++--------
>>   io_uring/io_uring.h |  2 ++
>>   2 files changed, 21 insertions(+), 8 deletions(-)
>>
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 9978dbe00027..609ff9ea5930 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -1241,9 +1241,10 @@ void tctx_task_work(struct callback_head *cb)
>>       WARN_ON_ONCE(ret);
>>   }
>>   -static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags)
>> +static inline void io_req_local_work_add(struct io_kiocb *req,
>> +                     struct io_ring_ctx *ctx,
>> +                     unsigned tw_flags)
>>   {
>> -    struct io_ring_ctx *ctx = req->ctx;
>>       unsigned nr_wait, nr_tw, nr_tw_prev;
>>       unsigned long flags;
>>   @@ -1291,9 +1292,10 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags
>>       wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
>>   }
>>   -static void io_req_normal_work_add(struct io_kiocb *req)
>> +static void io_req_normal_work_add(struct io_kiocb *req,
>> +                   struct task_struct *task)
>>   {
>> -    struct io_uring_task *tctx = req->task->io_uring;
>> +    struct io_uring_task *tctx = task->io_uring;
>>       struct io_ring_ctx *ctx = req->ctx;
>>       unsigned long flags;
>>       bool was_empty;
>> @@ -1319,7 +1321,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>>           return;
>>       }
>>   -    if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
>> +    if (likely(!task_work_add(task, &tctx->task_work, ctx->notify_method)))
>>           return;
>>         io_fallback_tw(tctx, false);
>> @@ -1328,9 +1330,18 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>>   void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
>>   {
>>       if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
>> -        io_req_local_work_add(req, flags);
>> +        io_req_local_work_add(req, req->ctx, flags);
>> +    else
>> +        io_req_normal_work_add(req, req->task);
>> +}
>> +
>> +void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
>> +                 struct io_ring_ctx *ctx, unsigned flags)
> 
> Urgh, even the declration screams that there is something wrong
> considering it _either_ targets @ctx or @task.
> 
> Just pass @ctx, so it either use ctx->submitter_task or
> req->task, hmm?

I actually since changed the above to use a common helper, so was
scratching my head a bit over your comment as it can't really work in
that setup without needing to check for whether ->submitter_task is set
or not. But I do agree this would be nicer, so I'll just return to using
the separate helpers for this and it should fall out nicely. The only
odd caller is the MSG_RING side, so makes sense to have it a bit more
separate rather than try and fold it in with the regular side of using
task_work.

> A side note, it's a dangerous game, I told it before. At least
> it would've been nice to abuse lockdep in a form of:
> 
> io_req_task_complete(req, tw, ctx) {
>     lockdep_assert(req->ctx == ctx);
>     ...
> }
> 
> but we don't have @ctx there, maybe we'll add it to tw later.

Agree, but a separate thing imho.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting
  2024-03-29 12:54   ` Pavel Begunkov
@ 2024-03-29 13:32     ` Jens Axboe
  2024-03-29 15:46       ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2024-03-29 13:32 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/29/24 6:54 AM, Pavel Begunkov wrote:
> On 3/28/24 18:52, Jens Axboe wrote:
>> Use the exported helper for queueing task_work, rather than rolling our
>> own.
>>
>> This improves peak performance of message passing by about 5x in some
>> basic testing, with 2 threads just sending messages to each other.
>> Before this change, it was capped at around 700K/sec, with the change
>> it's at over 4M/sec.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>   io_uring/msg_ring.c | 27 ++++++++++-----------------
>>   1 file changed, 10 insertions(+), 17 deletions(-)
>>
>> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
>> index d1f66a40b4b4..e12a9e8a910a 100644
>> --- a/io_uring/msg_ring.c
>> +++ b/io_uring/msg_ring.c
>> @@ -11,9 +11,9 @@
>>   #include "io_uring.h"
>>   #include "rsrc.h"
>>   #include "filetable.h"
>> +#include "refs.h"
>>   #include "msg_ring.h"
>>   -
>>   /* All valid masks for MSG_RING */
>>   #define IORING_MSG_RING_MASK        (IORING_MSG_RING_CQE_SKIP | \
>>                       IORING_MSG_RING_FLAGS_PASS)
>> @@ -21,7 +21,6 @@
>>   struct io_msg {
>>       struct file            *file;
>>       struct file            *src_file;
>> -    struct callback_head        tw;
>>       u64 user_data;
>>       u32 len;
>>       u32 cmd;
>> @@ -73,26 +72,20 @@ static inline bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
>>       return current != target_ctx->submitter_task;
>>   }
>>   -static int io_msg_exec_remote(struct io_kiocb *req, task_work_func_t func)
>> +static int io_msg_exec_remote(struct io_kiocb *req, io_req_tw_func_t func)
>>   {
>>       struct io_ring_ctx *ctx = req->file->private_data;
>> -    struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
>>       struct task_struct *task = READ_ONCE(ctx->submitter_task);
>>   -    if (unlikely(!task))
>> -        return -EOWNERDEAD;
>> -
>> -    init_task_work(&msg->tw, func);
>> -    if (task_work_add(ctx->submitter_task, &msg->tw, TWA_SIGNAL))
>> -        return -EOWNERDEAD;
>> -
>> +    __io_req_set_refcount(req, 2);
> 
> I'd argue it's better avoid any more req refcount users, I'd be more
> happy it it dies out completely at some point.
> 
> Why it's even needed here? You pass it via tw to post a CQE/etc and
> then pass it back via another tw hop to complete IIRC, the ownership
> is clear. At least it worth a comment.

It's not, it was more documentation than anything else. But I agree that
we should just avoid it, I'll kill it.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting
  2024-03-29 13:32     ` Jens Axboe
@ 2024-03-29 15:46       ` Pavel Begunkov
  2024-03-29 15:47         ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2024-03-29 15:46 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 3/29/24 13:32, Jens Axboe wrote:
> On 3/29/24 6:54 AM, Pavel Begunkov wrote:
>> On 3/28/24 18:52, Jens Axboe wrote:
>>> Use the exported helper for queueing task_work, rather than rolling our
>>> own.
>>>
>>> This improves peak performance of message passing by about 5x in some
>>> basic testing, with 2 threads just sending messages to each other.
>>> Before this change, it was capped at around 700K/sec, with the change
>>> it's at over 4M/sec.
>>>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> ---
>>>    io_uring/msg_ring.c | 27 ++++++++++-----------------
>>>    1 file changed, 10 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
>>> index d1f66a40b4b4..e12a9e8a910a 100644
>>> --- a/io_uring/msg_ring.c
>>> +++ b/io_uring/msg_ring.c
>>> @@ -11,9 +11,9 @@
>>>    #include "io_uring.h"
>>>    #include "rsrc.h"
>>>    #include "filetable.h"
>>> +#include "refs.h"
>>>    #include "msg_ring.h"
>>>    -
>>>    /* All valid masks for MSG_RING */
>>>    #define IORING_MSG_RING_MASK        (IORING_MSG_RING_CQE_SKIP | \
>>>                        IORING_MSG_RING_FLAGS_PASS)
>>> @@ -21,7 +21,6 @@
>>>    struct io_msg {
>>>        struct file            *file;
>>>        struct file            *src_file;
>>> -    struct callback_head        tw;
>>>        u64 user_data;
>>>        u32 len;
>>>        u32 cmd;
>>> @@ -73,26 +72,20 @@ static inline bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
>>>        return current != target_ctx->submitter_task;
>>>    }
>>>    -static int io_msg_exec_remote(struct io_kiocb *req, task_work_func_t func)
>>> +static int io_msg_exec_remote(struct io_kiocb *req, io_req_tw_func_t func)
>>>    {
>>>        struct io_ring_ctx *ctx = req->file->private_data;
>>> -    struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
>>>        struct task_struct *task = READ_ONCE(ctx->submitter_task);
>>>    -    if (unlikely(!task))
>>> -        return -EOWNERDEAD;
>>> -
>>> -    init_task_work(&msg->tw, func);
>>> -    if (task_work_add(ctx->submitter_task, &msg->tw, TWA_SIGNAL))
>>> -        return -EOWNERDEAD;
>>> -
>>> +    __io_req_set_refcount(req, 2);
>>
>> I'd argue it's better avoid any more req refcount users, I'd be more
>> happy it it dies out completely at some point.
>>
>> Why it's even needed here? You pass it via tw to post a CQE/etc and
>> then pass it back via another tw hop to complete IIRC, the ownership
>> is clear. At least it worth a comment.
> 
> It's not, it was more documentation than anything else. But I agree that
> we should just avoid it, I'll kill it.

Great, it was confusing and I don't think it's even correct. In case
it comes with refcounting enabled you'd get only 1 ref instead of
desired 2. See how io_wq_submit_work() does it. Probably it's better
to kill the "__" set refs helper.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting
  2024-03-29 15:46       ` Pavel Begunkov
@ 2024-03-29 15:47         ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2024-03-29 15:47 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/29/24 9:46 AM, Pavel Begunkov wrote:
> On 3/29/24 13:32, Jens Axboe wrote:
>> On 3/29/24 6:54 AM, Pavel Begunkov wrote:
>>> On 3/28/24 18:52, Jens Axboe wrote:
>>>> Use the exported helper for queueing task_work, rather than rolling our
>>>> own.
>>>>
>>>> This improves peak performance of message passing by about 5x in some
>>>> basic testing, with 2 threads just sending messages to each other.
>>>> Before this change, it was capped at around 700K/sec, with the change
>>>> it's at over 4M/sec.
>>>>
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>>    io_uring/msg_ring.c | 27 ++++++++++-----------------
>>>>    1 file changed, 10 insertions(+), 17 deletions(-)
>>>>
>>>> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
>>>> index d1f66a40b4b4..e12a9e8a910a 100644
>>>> --- a/io_uring/msg_ring.c
>>>> +++ b/io_uring/msg_ring.c
>>>> @@ -11,9 +11,9 @@
>>>>    #include "io_uring.h"
>>>>    #include "rsrc.h"
>>>>    #include "filetable.h"
>>>> +#include "refs.h"
>>>>    #include "msg_ring.h"
>>>>    -
>>>>    /* All valid masks for MSG_RING */
>>>>    #define IORING_MSG_RING_MASK        (IORING_MSG_RING_CQE_SKIP | \
>>>>                        IORING_MSG_RING_FLAGS_PASS)
>>>> @@ -21,7 +21,6 @@
>>>>    struct io_msg {
>>>>        struct file            *file;
>>>>        struct file            *src_file;
>>>> -    struct callback_head        tw;
>>>>        u64 user_data;
>>>>        u32 len;
>>>>        u32 cmd;
>>>> @@ -73,26 +72,20 @@ static inline bool io_msg_need_remote(struct io_ring_ctx *target_ctx)
>>>>        return current != target_ctx->submitter_task;
>>>>    }
>>>>    -static int io_msg_exec_remote(struct io_kiocb *req, task_work_func_t func)
>>>> +static int io_msg_exec_remote(struct io_kiocb *req, io_req_tw_func_t func)
>>>>    {
>>>>        struct io_ring_ctx *ctx = req->file->private_data;
>>>> -    struct io_msg *msg = io_kiocb_to_cmd(req, struct io_msg);
>>>>        struct task_struct *task = READ_ONCE(ctx->submitter_task);
>>>>    -    if (unlikely(!task))
>>>> -        return -EOWNERDEAD;
>>>> -
>>>> -    init_task_work(&msg->tw, func);
>>>> -    if (task_work_add(ctx->submitter_task, &msg->tw, TWA_SIGNAL))
>>>> -        return -EOWNERDEAD;
>>>> -
>>>> +    __io_req_set_refcount(req, 2);
>>>
>>> I'd argue it's better avoid any more req refcount users, I'd be more
>>> happy it it dies out completely at some point.
>>>
>>> Why it's even needed here? You pass it via tw to post a CQE/etc and
>>> then pass it back via another tw hop to complete IIRC, the ownership
>>> is clear. At least it worth a comment.
>>
>> It's not, it was more documentation than anything else. But I agree that
>> we should just avoid it, I'll kill it.
> 
> Great, it was confusing and I don't think it's even correct. In case
> it comes with refcounting enabled you'd get only 1 ref instead of
> desired 2. See how io_wq_submit_work() does it. Probably it's better
> to kill the "__" set refs helper.

Yeah, I think there's a bit of room for cleanups on the refs side. But
thankfully it's not very prevalent in the code base.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] io_uring: add remote task_work execution helper
  2024-03-29 13:31     ` Jens Axboe
@ 2024-03-29 15:50       ` Pavel Begunkov
  2024-03-29 16:10         ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2024-03-29 15:50 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 3/29/24 13:31, Jens Axboe wrote:
> On 3/29/24 6:51 AM, Pavel Begunkov wrote:
>> On 3/28/24 18:52, Jens Axboe wrote:
>>> All our task_work handling is targeted at the state in the io_kiocb
>>> itself, which is what it is being used for. However, MSG_RING rolls its
>>> own task_work handling, ignoring how that is usually done.
>>>
>>> In preparation for switching MSG_RING to be able to use the normal
>>> task_work handling, add io_req_task_work_add_remote() which allows the
>>> caller to pass in the target io_ring_ctx and task.
>>>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>> ---
>>>    io_uring/io_uring.c | 27 +++++++++++++++++++--------
>>>    io_uring/io_uring.h |  2 ++
>>>    2 files changed, 21 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>> index 9978dbe00027..609ff9ea5930 100644
>>> --- a/io_uring/io_uring.c
>>> +++ b/io_uring/io_uring.c
>>> @@ -1241,9 +1241,10 @@ void tctx_task_work(struct callback_head *cb)
>>>        WARN_ON_ONCE(ret);
>>>    }
>>>    -static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags)
>>> +static inline void io_req_local_work_add(struct io_kiocb *req,
>>> +                     struct io_ring_ctx *ctx,
>>> +                     unsigned tw_flags)
>>>    {
>>> -    struct io_ring_ctx *ctx = req->ctx;
>>>        unsigned nr_wait, nr_tw, nr_tw_prev;
>>>        unsigned long flags;
>>>    @@ -1291,9 +1292,10 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags
>>>        wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
>>>    }
>>>    -static void io_req_normal_work_add(struct io_kiocb *req)
>>> +static void io_req_normal_work_add(struct io_kiocb *req,
>>> +                   struct task_struct *task)
>>>    {
>>> -    struct io_uring_task *tctx = req->task->io_uring;
>>> +    struct io_uring_task *tctx = task->io_uring;
>>>        struct io_ring_ctx *ctx = req->ctx;
>>>        unsigned long flags;
>>>        bool was_empty;
>>> @@ -1319,7 +1321,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>>>            return;
>>>        }
>>>    -    if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
>>> +    if (likely(!task_work_add(task, &tctx->task_work, ctx->notify_method)))
>>>            return;
>>>          io_fallback_tw(tctx, false);
>>> @@ -1328,9 +1330,18 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>>>    void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
>>>    {
>>>        if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
>>> -        io_req_local_work_add(req, flags);
>>> +        io_req_local_work_add(req, req->ctx, flags);
>>> +    else
>>> +        io_req_normal_work_add(req, req->task);
>>> +}
>>> +
>>> +void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
>>> +                 struct io_ring_ctx *ctx, unsigned flags)
>>
>> Urgh, even the declration screams that there is something wrong
>> considering it _either_ targets @ctx or @task.
>>
>> Just pass @ctx, so it either use ctx->submitter_task or
>> req->task, hmm?
> 
> I actually since changed the above to use a common helper, so was
> scratching my head a bit over your comment as it can't really work in
> that setup without needing to check for whether ->submitter_task is set
> or not. But I do agree this would be nicer, so I'll just return to using
> the separate helpers for this and it should fall out nicely. The only
> odd caller is the MSG_RING side, so makes sense to have it a bit more
> separate rather than try and fold it in with the regular side of using
> task_work.
> 
>> A side note, it's a dangerous game, I told it before. At least
>> it would've been nice to abuse lockdep in a form of:
>>
>> io_req_task_complete(req, tw, ctx) {
>>      lockdep_assert(req->ctx == ctx);
>>      ...
>> }
>>
>> but we don't have @ctx there, maybe we'll add it to tw later.
> 
> Agree, but a separate thing imho.

It's not in a sense that condition couldn't have happened
before and the patch opening all possibilities.

We actually have @ctx via struct io_tctx_node, so considering
fallback it would probably be:

lockdep_assert(!current->io_uring ||
                current->io_uring->ctx == req->ctx);

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
  2024-03-28 18:52 ` [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring Jens Axboe
@ 2024-03-29 15:57   ` Pavel Begunkov
  2024-03-29 16:09     ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2024-03-29 15:57 UTC (permalink / raw)
  To: Jens Axboe, io-uring

On 3/28/24 18:52, Jens Axboe wrote:
> Move the posting outside the checking and locking, it's cleaner that
> way.
> 
> Signed-off-by: Jens Axboe <[email protected]>
> ---
>   io_uring/msg_ring.c | 10 ++++------
>   1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
> index cd6dcf634ba3..d1f66a40b4b4 100644
> --- a/io_uring/msg_ring.c
> +++ b/io_uring/msg_ring.c
> @@ -147,13 +147,11 @@ static int io_msg_ring_data(struct io_kiocb *req, unsigned int issue_flags)
>   	if (target_ctx->flags & IORING_SETUP_IOPOLL) {
>   		if (unlikely(io_double_lock_ctx(target_ctx, issue_flags)))
>   			return -EAGAIN;
> -		if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
> -			ret = 0;
> -		io_double_unlock_ctx(target_ctx);
> -	} else {
> -		if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
> -			ret = 0;
>   	}

A side note, maybe we should just get rid of double locking, it's always
horrible, and always do the job via tw. With DEFER_TASKRUN it only benefits
when rings and bound to the same task => never for any sane use case, so it's
only about !DEFER_TASKRUN. Simpler but also more predictable for general
latency and so on since you need to wait/grab two locks.


> +	if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
> +		ret = 0;
> +	if (target_ctx->flags & IORING_SETUP_IOPOLL)
> +		io_double_unlock_ctx(target_ctx);
>   	return ret;
>   }
>   

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
  2024-03-29 15:57   ` Pavel Begunkov
@ 2024-03-29 16:09     ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2024-03-29 16:09 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/29/24 9:57 AM, Pavel Begunkov wrote:
> On 3/28/24 18:52, Jens Axboe wrote:
>> Move the posting outside the checking and locking, it's cleaner that
>> way.
>>
>> Signed-off-by: Jens Axboe <[email protected]>
>> ---
>>   io_uring/msg_ring.c | 10 ++++------
>>   1 file changed, 4 insertions(+), 6 deletions(-)
>>
>> diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
>> index cd6dcf634ba3..d1f66a40b4b4 100644
>> --- a/io_uring/msg_ring.c
>> +++ b/io_uring/msg_ring.c
>> @@ -147,13 +147,11 @@ static int io_msg_ring_data(struct io_kiocb *req, unsigned int issue_flags)
>>       if (target_ctx->flags & IORING_SETUP_IOPOLL) {
>>           if (unlikely(io_double_lock_ctx(target_ctx, issue_flags)))
>>               return -EAGAIN;
>> -        if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
>> -            ret = 0;
>> -        io_double_unlock_ctx(target_ctx);
>> -    } else {
>> -        if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags))
>> -            ret = 0;
>>       }
> 
> A side note, maybe we should just get rid of double locking, it's always
> horrible, and always do the job via tw. With DEFER_TASKRUN it only benefits
> when rings and bound to the same task => never for any sane use case, so it's
> only about !DEFER_TASKRUN. Simpler but also more predictable for general
> latency and so on since you need to wait/grab two locks.

It's not the prettiest, but at least for !DEFER_TASKRUN it's a LOT more
efficient than punting through task_work... This is more of a case of
DEFER_TASKRUN not being able to do this well, as we have strict
requirements on CQE posting.

The function is a bit misnamed imho, as it's not double locking, it's
just grabbing the target ctx lock. Should be io_lock_target_ctx() or
something like that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/3] io_uring: add remote task_work execution helper
  2024-03-29 15:50       ` Pavel Begunkov
@ 2024-03-29 16:10         ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2024-03-29 16:10 UTC (permalink / raw)
  To: Pavel Begunkov, io-uring

On 3/29/24 9:50 AM, Pavel Begunkov wrote:
> On 3/29/24 13:31, Jens Axboe wrote:
>> On 3/29/24 6:51 AM, Pavel Begunkov wrote:
>>> On 3/28/24 18:52, Jens Axboe wrote:
>>>> All our task_work handling is targeted at the state in the io_kiocb
>>>> itself, which is what it is being used for. However, MSG_RING rolls its
>>>> own task_work handling, ignoring how that is usually done.
>>>>
>>>> In preparation for switching MSG_RING to be able to use the normal
>>>> task_work handling, add io_req_task_work_add_remote() which allows the
>>>> caller to pass in the target io_ring_ctx and task.
>>>>
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>>    io_uring/io_uring.c | 27 +++++++++++++++++++--------
>>>>    io_uring/io_uring.h |  2 ++
>>>>    2 files changed, 21 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index 9978dbe00027..609ff9ea5930 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -1241,9 +1241,10 @@ void tctx_task_work(struct callback_head *cb)
>>>>        WARN_ON_ONCE(ret);
>>>>    }
>>>>    -static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags)
>>>> +static inline void io_req_local_work_add(struct io_kiocb *req,
>>>> +                     struct io_ring_ctx *ctx,
>>>> +                     unsigned tw_flags)
>>>>    {
>>>> -    struct io_ring_ctx *ctx = req->ctx;
>>>>        unsigned nr_wait, nr_tw, nr_tw_prev;
>>>>        unsigned long flags;
>>>>    @@ -1291,9 +1292,10 @@ static inline void io_req_local_work_add(struct io_kiocb *req, unsigned tw_flags
>>>>        wake_up_state(ctx->submitter_task, TASK_INTERRUPTIBLE);
>>>>    }
>>>>    -static void io_req_normal_work_add(struct io_kiocb *req)
>>>> +static void io_req_normal_work_add(struct io_kiocb *req,
>>>> +                   struct task_struct *task)
>>>>    {
>>>> -    struct io_uring_task *tctx = req->task->io_uring;
>>>> +    struct io_uring_task *tctx = task->io_uring;
>>>>        struct io_ring_ctx *ctx = req->ctx;
>>>>        unsigned long flags;
>>>>        bool was_empty;
>>>> @@ -1319,7 +1321,7 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>>>>            return;
>>>>        }
>>>>    -    if (likely(!task_work_add(req->task, &tctx->task_work, ctx->notify_method)))
>>>> +    if (likely(!task_work_add(task, &tctx->task_work, ctx->notify_method)))
>>>>            return;
>>>>          io_fallback_tw(tctx, false);
>>>> @@ -1328,9 +1330,18 @@ static void io_req_normal_work_add(struct io_kiocb *req)
>>>>    void __io_req_task_work_add(struct io_kiocb *req, unsigned flags)
>>>>    {
>>>>        if (req->ctx->flags & IORING_SETUP_DEFER_TASKRUN)
>>>> -        io_req_local_work_add(req, flags);
>>>> +        io_req_local_work_add(req, req->ctx, flags);
>>>> +    else
>>>> +        io_req_normal_work_add(req, req->task);
>>>> +}
>>>> +
>>>> +void io_req_task_work_add_remote(struct io_kiocb *req, struct task_struct *task,
>>>> +                 struct io_ring_ctx *ctx, unsigned flags)
>>>
>>> Urgh, even the declration screams that there is something wrong
>>> considering it _either_ targets @ctx or @task.
>>>
>>> Just pass @ctx, so it either use ctx->submitter_task or
>>> req->task, hmm?
>>
>> I actually since changed the above to use a common helper, so was
>> scratching my head a bit over your comment as it can't really work in
>> that setup without needing to check for whether ->submitter_task is set
>> or not. But I do agree this would be nicer, so I'll just return to using
>> the separate helpers for this and it should fall out nicely. The only
>> odd caller is the MSG_RING side, so makes sense to have it a bit more
>> separate rather than try and fold it in with the regular side of using
>> task_work.
>>
>>> A side note, it's a dangerous game, I told it before. At least
>>> it would've been nice to abuse lockdep in a form of:
>>>
>>> io_req_task_complete(req, tw, ctx) {
>>>      lockdep_assert(req->ctx == ctx);
>>>      ...
>>> }
>>>
>>> but we don't have @ctx there, maybe we'll add it to tw later.
>>
>> Agree, but a separate thing imho.
> 
> It's not in a sense that condition couldn't have happened
> before and the patch opening all possibilities.
> 
> We actually have @ctx via struct io_tctx_node, so considering
> fallback it would probably be:
> 
> lockdep_assert(!current->io_uring ||
>                current->io_uring->ctx == req->ctx);

That's not a bad idea. I did run all the testing verifying the ctx, and
it all appears fine. But adding the check is a good idea in general.
Want to send it?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-03-29 16:10 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-28 18:52 [PATCHSET 0/3] Cleanup and improve MSG_RING performance Jens Axboe
2024-03-28 18:52 ` [PATCH 1/3] io_uring: add remote task_work execution helper Jens Axboe
2024-03-29 12:51   ` Pavel Begunkov
2024-03-29 13:31     ` Jens Axboe
2024-03-29 15:50       ` Pavel Begunkov
2024-03-29 16:10         ` Jens Axboe
2024-03-28 18:52 ` [PATCH 2/3] io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring Jens Axboe
2024-03-29 15:57   ` Pavel Begunkov
2024-03-29 16:09     ` Jens Axboe
2024-03-28 18:52 ` [PATCH 3/3] io_uring/msg_ring: improve handling of target CQE posting Jens Axboe
2024-03-29 12:54   ` Pavel Begunkov
2024-03-29 13:32     ` Jens Axboe
2024-03-29 15:46       ` Pavel Begunkov
2024-03-29 15:47         ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox