public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough
@ 2023-05-15 12:54 Pavel Begunkov
  2023-05-15 12:54 ` [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper Pavel Begunkov
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-15 12:54 UTC (permalink / raw)
  To: linux-nvme, linux-block, io-uring, axboe
  Cc: kbusch, hch, sagi, joshi.k, Pavel Begunkov

Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.

The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1]
patchset, but for a quick test I took fio/t/io_uring with 4 threads each
reading their own drive and all pinned to the same CPU to make it CPU
bound and got +10% throughput improvement.

[1] https://lore.kernel.org/all/[email protected]/

Pavel Begunkov (2):
  io_uring/cmd: add cmd lazy tw wake helper
  nvme: optimise io_uring passthrough completion

 drivers/nvme/host/ioctl.c |  4 ++--
 include/linux/io_uring.h  | 18 ++++++++++++++++--
 io_uring/uring_cmd.c      | 16 ++++++++++++----
 3 files changed, 30 insertions(+), 8 deletions(-)


base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
-- 
2.40.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper
  2023-05-15 12:54 [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Pavel Begunkov
@ 2023-05-15 12:54 ` Pavel Begunkov
  2023-05-16 10:00   ` Kanchan Joshi
  2023-05-15 12:54 ` [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion Pavel Begunkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-15 12:54 UTC (permalink / raw)
  To: linux-nvme, linux-block, io-uring, axboe
  Cc: kbusch, hch, sagi, joshi.k, Pavel Begunkov

We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
cmd tw helper accepting TWQ flags, and then add
io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
delaying execution of this tw should not prevent forward progress.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 include/linux/io_uring.h | 18 ++++++++++++++++--
 io_uring/uring_cmd.c     | 16 ++++++++++++----
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 7fe31b2cd02f..bb9c666bd584 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
 			unsigned issue_flags);
-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-			void (*task_work_cb)(struct io_uring_cmd *, unsigned));
 struct sock *io_uring_get_socket(struct file *file);
 void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+			    void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+			    unsigned flags);
+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned));
+
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}
 
 static inline void io_uring_files_cancel(void)
 {
@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
 {
 }
+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+}
 static inline struct sock *io_uring_get_socket(struct file *file)
 {
 	return NULL;
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 5e32db48696d..476c7877ce58 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
 	ioucmd->task_work_cb(ioucmd, issue_flags);
 }
 
-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+			unsigned flags)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 
 	ioucmd->task_work_cb = task_work_cb;
 	req->io_task_work.func = io_uring_cmd_work;
-	io_req_task_work_add(req);
+	__io_req_task_work_add(req, flags);
+}
+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
+
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
 }
-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
 
 static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
 					  u64 extra1, u64 extra2)
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-15 12:54 [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Pavel Begunkov
  2023-05-15 12:54 ` [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper Pavel Begunkov
@ 2023-05-15 12:54 ` Pavel Begunkov
  2023-05-17  7:23   ` Christoph Hellwig
  2023-05-16 11:42 ` [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Anuj gupta
  2023-05-25 14:54 ` Jens Axboe
  3 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-15 12:54 UTC (permalink / raw)
  To: linux-nvme, linux-block, io-uring, axboe
  Cc: kbusch, hch, sagi, joshi.k, Pavel Begunkov

Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
commands completion. It further delays the execution of task_work for
DEFER_TASKRUN until there are enough of task_work items queued to meet
the waiting criteria, which reduces the number of wake ups we issue.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 drivers/nvme/host/ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 81c5c9e38477..52ed1094ccbb 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -521,7 +521,7 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	if (cookie != NULL && blk_rq_is_poll(req))
 		nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
 	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
 
 	return RQ_END_IO_FREE;
 }
@@ -543,7 +543,7 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
 	if (cookie != NULL && blk_rq_is_poll(req))
 		nvme_uring_task_meta_cb(ioucmd, IO_URING_F_UNLOCKED);
 	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_meta_cb);
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_meta_cb);
 
 	return RQ_END_IO_NONE;
 }
-- 
2.40.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper
  2023-05-15 12:54 ` [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper Pavel Begunkov
@ 2023-05-16 10:00   ` Kanchan Joshi
  2023-05-16 18:52     ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Kanchan Joshi @ 2023-05-16 10:00 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi

[-- Attachment #1: Type: text/plain, Size: 4023 bytes --]

On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>cmd tw helper accepting TWQ flags, and then add
>io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>delaying execution of this tw should not prevent forward progress.
>
>Signed-off-by: Pavel Begunkov <[email protected]>
>---
> include/linux/io_uring.h | 18 ++++++++++++++++--
> io_uring/uring_cmd.c     | 16 ++++++++++++----
> 2 files changed, 28 insertions(+), 6 deletions(-)
>
>diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>index 7fe31b2cd02f..bb9c666bd584 100644
>--- a/include/linux/io_uring.h
>+++ b/include/linux/io_uring.h
>@@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> 			      struct iov_iter *iter, void *ioucmd);
> void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
> 			unsigned issue_flags);
>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>-			void (*task_work_cb)(struct io_uring_cmd *, unsigned));
> struct sock *io_uring_get_socket(struct file *file);
> void __io_uring_cancel(bool cancel_all);
> void __io_uring_free(struct task_struct *tsk);
> void io_uring_unreg_ringfd(void);
> const char *io_uring_get_opcode(u8 opcode);
>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>+			    void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>+			    unsigned flags);
>+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */

Should this also translate to some warn_on anywhere?
>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>+			void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>+
>+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>+}
>
> static inline void io_uring_files_cancel(void)
> {
>@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
> 			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
> {
> }
>+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+}
> static inline struct sock *io_uring_get_socket(struct file *file)
> {
> 	return NULL;
>diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>index 5e32db48696d..476c7877ce58 100644
>--- a/io_uring/uring_cmd.c
>+++ b/io_uring/uring_cmd.c
>@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
> 	ioucmd->task_work_cb(ioucmd, issue_flags);
> }
>
>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>-			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>+			void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>+			unsigned flags)
> {
> 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>
> 	ioucmd->task_work_cb = task_work_cb;
> 	req->io_task_work.func = io_uring_cmd_work;
>-	io_req_task_work_add(req);
>+	__io_req_task_work_add(req, flags);
>+}
>+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);

Any reason to export this? No one is using this at the moment.
>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
> }
>-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);

Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
therefore this helper.
And if you did not want callers to know about this flag (internal
details of io_uring), it would be better to have two exported helpers
io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
Both will use the internal helper __io_uring_cmd_do_in_task with
different flag.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough
  2023-05-15 12:54 [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Pavel Begunkov
  2023-05-15 12:54 ` [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper Pavel Begunkov
  2023-05-15 12:54 ` [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion Pavel Begunkov
@ 2023-05-16 11:42 ` Anuj gupta
  2023-05-16 18:38   ` Pavel Begunkov
  2023-05-25 14:54 ` Jens Axboe
  3 siblings, 1 reply; 19+ messages in thread
From: Anuj gupta @ 2023-05-16 11:42 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi, joshi.k

On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <[email protected]> wrote:
>
> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>
> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1]
> patchset, but for a quick test I took fio/t/io_uring with 4 threads each
> reading their own drive and all pinned to the same CPU to make it CPU
> bound and got +10% throughput improvement.
>
> [1] https://lore.kernel.org/all/[email protected]/
>
> Pavel Begunkov (2):
>   io_uring/cmd: add cmd lazy tw wake helper
>   nvme: optimise io_uring passthrough completion
>
>  drivers/nvme/host/ioctl.c |  4 ++--
>  include/linux/io_uring.h  | 18 ++++++++++++++++--
>  io_uring/uring_cmd.c      | 16 ++++++++++++----
>  3 files changed, 30 insertions(+), 8 deletions(-)
>
>
> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
> --
> 2.40.0
>

I tried to run a few workloads on my setup with your patches applied. However, I
couldn't see any difference in io passthrough performance. I might have missed
something. Can you share the workload that you ran which gave you the perf
improvement. Here is the workload that I ran -

Without your patches applied -

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n1 /dev/ng0n1
submitter=0, tid=2049, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1382MiB/s, IOS/call=16/15
IOPS=2.82M, BW=1379MiB/s, IOS/call=16/16
IOPS=2.84M, BW=1388MiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=2.84M

# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0
-O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=0, tid=2046, file=/dev/ng0n1, node=-1
submitter=1, tid=2047, file=/dev/ng1n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.72M, BW=2.79GiB/s, IOS/call=16/15
IOPS=5.71M, BW=2.79GiB/s, IOS/call=16/16
IOPS=5.70M, BW=2.78GiB/s, IOS/call=16/15
Exiting on timeout Maximum IOPS=5.72M

With your patches applied -

# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n1 /dev/ng0n1
submitter=0, tid=2032, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1381MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1379MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1383MiB/s, IOS/call=15/15
Exiting on timeout Maximum IOPS=2.83M

# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0
-O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=1, tid=2037, file=/dev/ng1n1, node=-1
submitter=0, tid=2036, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.64M, BW=2.75GiB/s, IOS/call=15/15
IOPS=5.62M, BW=2.75GiB/s, IOS/call=16/16
IOPS=5.62M, BW=2.74GiB/s, IOS/call=16/16
Exiting on timeout Maximum IOPS=5.64M

--
Anuj Gupta

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough
  2023-05-16 11:42 ` [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Anuj gupta
@ 2023-05-16 18:38   ` Pavel Begunkov
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-16 18:38 UTC (permalink / raw)
  To: Anuj gupta
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi, joshi.k

On 5/16/23 12:42, Anuj gupta wrote:
> On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <[email protected]> wrote:
>>
>> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>>
>> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1]
>> patchset, but for a quick test I took fio/t/io_uring with 4 threads each
>> reading their own drive and all pinned to the same CPU to make it CPU
>> bound and got +10% throughput improvement.
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>>
>> Pavel Begunkov (2):
>>    io_uring/cmd: add cmd lazy tw wake helper
>>    nvme: optimise io_uring passthrough completion
>>
>>   drivers/nvme/host/ioctl.c |  4 ++--
>>   include/linux/io_uring.h  | 18 ++++++++++++++++--
>>   io_uring/uring_cmd.c      | 16 ++++++++++++----
>>   3 files changed, 30 insertions(+), 8 deletions(-)
>>
>>
>> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
>> --
>> 2.40.0
>>
> 
> I tried to run a few workloads on my setup with your patches applied. However, I
> couldn't see any difference in io passthrough performance. I might have missed
> something. Can you share the workload that you ran which gave you the perf
> improvement. Here is the workload that I ran -

The patch is way to make completion batching more consistent. If you're so
lucky that all IO complete before task_work runs, it'll be perfect batching
and there is nothing to improve. That often happens with high throughput
benchmarks because of how consistent they are: no writes, same size,
everything is issued at the same time and so on. In reality it depends
on your use pattern, timings, nvme coalescing, will also change if you
introduce a second drive, and so on.

With the patch t/io_uring should run task_work once for exactly the
number of cqes the user is waiting for, i.e. -c<N>, regardless of
circumstances.

Just tried it out to confirm,

taskset -c 0 nice -n -20 /t/io_uring -p0 -d4 -b8192 -s4 -c4 -F1 -B1 -R0 -X1 -u1 -O0 /dev/ng0n1

Without:
12:11:10 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:11:20 PM    0    2.03    0.00   25.95    0.00    0.00    0.00    0.00    0.00    0.00   72.03
With:
12:12:00 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
12:12:10 PM    0    2.22    0.00   17.39    0.00    0.00    0.00    0.00    0.00    0.00   80.40


Double checking it works:

echo 1 > /sys/kernel/debug/tracing/events/io_uring/io_uring_local_work_run/enable
cat /sys/kernel/debug/tracing/trace_pipe

Without I see

io_uring-4108    [000] .....   653.820369: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820371: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820382: io_uring_local_work_run: ring 00000000b843f57f, count 2, loops 1
io_uring-4108    [000] .....   653.820383: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820386: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108    [000] .....   653.820398: io_uring_local_work_run: ring 00000000b843f57f, count 2, loops 1
io_uring-4108    [000] .....   653.820398: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1

And with patches it's strictly count=4.

Another way would be to add more SSDs to the picture and hope they don't
conspire to complete at the same time


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper
  2023-05-16 10:00   ` Kanchan Joshi
@ 2023-05-16 18:52     ` Pavel Begunkov
  2023-05-17 10:33       ` Kanchan Joshi
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-16 18:52 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi

On 5/16/23 11:00, Kanchan Joshi wrote:
> On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>> We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>> cmd tw helper accepting TWQ flags, and then add
>> io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>> imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>> delaying execution of this tw should not prevent forward progress.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> include/linux/io_uring.h | 18 ++++++++++++++++--
>> io_uring/uring_cmd.c     | 16 ++++++++++++----
>> 2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>> index 7fe31b2cd02f..bb9c666bd584 100644
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>> @@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>>                   struct iov_iter *iter, void *ioucmd);
>> void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
>>             unsigned issue_flags);
>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> -            void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>> struct sock *io_uring_get_socket(struct file *file);
>> void __io_uring_cancel(bool cancel_all);
>> void __io_uring_free(struct task_struct *tsk);
>> void io_uring_unreg_ringfd(void);
>> const char *io_uring_get_opcode(u8 opcode);
>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>> +                void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>> +                unsigned flags);
>> +/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
> 
> Should this also translate to some warn_on anywhere?

Would love to but don't see how. We can only check it doesn't
produce more than 1 CQE, but that would need

nr_cqes_before = cqes_ready();
tw_item->run();
WARN_ON(cqes_ready() >= nr_cqes_before + 1);

but that's just too ugly


>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>> +
>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>> +}
>>
>> static inline void io_uring_files_cancel(void)
>> {
>> @@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>             void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> {
>> }
>> +static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +}
>> static inline struct sock *io_uring_get_socket(struct file *file)
>> {
>>     return NULL;
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index 5e32db48696d..476c7877ce58 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>     ioucmd->task_work_cb(ioucmd, issue_flags);
>> }
>>
>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> -            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>> +            unsigned flags)
>> {
>>     struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>
>>     ioucmd->task_work_cb = task_work_cb;
>>     req->io_task_work.func = io_uring_cmd_work;
>> -    io_req_task_work_add(req);
>> +    __io_req_task_work_add(req, flags);
>> +}
>> +EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);

--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h

+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}

That should fail for nvme unless exported.

> Any reason to export this? No one is using this at the moment.
>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>> }
>> -EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>> +EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
> 
> Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
> therefore this helper.

Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
to let it use whatever flags there might be in the future.

Initially I wanted to just make io_uring_cmd_complete_in_task and
io_uring_cmd_do_in_task_lazy static inline, but that would need
some code shuffling to make it clean.

> And if you did not want callers to know about this flag (internal
> details of io_uring), it would be better to have two exported helpers
> io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
> Both will use the internal helper __io_uring_cmd_do_in_task with
> different flag.

That's how it should be in this patch

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-15 12:54 ` [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion Pavel Begunkov
@ 2023-05-17  7:23   ` Christoph Hellwig
  2023-05-17 12:32     ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2023-05-17  7:23 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi, joshi.k

On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
> commands completion. It further delays the execution of task_work for
> DEFER_TASKRUN until there are enough of task_work items queued to meet
> the waiting criteria, which reduces the number of wake ups we issue.

Why wouldn't you just do that unconditionally for
io_uring_cmd_complete_in_task?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper
  2023-05-16 18:52     ` Pavel Begunkov
@ 2023-05-17 10:33       ` Kanchan Joshi
  2023-05-17 12:00         ` Pavel Begunkov
  2023-05-19 15:00         ` Pavel Begunkov
  0 siblings, 2 replies; 19+ messages in thread
From: Kanchan Joshi @ 2023-05-17 10:33 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi

[-- Attachment #1: Type: text/plain, Size: 5770 bytes --]

On Tue, May 16, 2023 at 07:52:23PM +0100, Pavel Begunkov wrote:
>On 5/16/23 11:00, Kanchan Joshi wrote:
>>On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>>>We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>>>cmd tw helper accepting TWQ flags, and then add
>>>io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>>>imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>>>delaying execution of this tw should not prevent forward progress.
>>>
>>>Signed-off-by: Pavel Begunkov <[email protected]>
>>>---
>>>include/linux/io_uring.h | 18 ++++++++++++++++--
>>>io_uring/uring_cmd.c     | 16 ++++++++++++----
>>>2 files changed, 28 insertions(+), 6 deletions(-)
>>>
>>>diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>>>index 7fe31b2cd02f..bb9c666bd584 100644
>>>--- a/include/linux/io_uring.h
>>>+++ b/include/linux/io_uring.h
>>>@@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>>>                  struct iov_iter *iter, void *ioucmd);
>>>void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
>>>            unsigned issue_flags);
>>>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>-            void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>struct sock *io_uring_get_socket(struct file *file);
>>>void __io_uring_cancel(bool cancel_all);
>>>void __io_uring_free(struct task_struct *tsk);
>>>void io_uring_unreg_ringfd(void);
>>>const char *io_uring_get_opcode(u8 opcode);
>>>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>+                void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>+                unsigned flags);
>>>+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
>>
>>Should this also translate to some warn_on anywhere?
>
>Would love to but don't see how. We can only check it doesn't
>produce more than 1 CQE, but that would need
>
>nr_cqes_before = cqes_ready();
>tw_item->run();
>WARN_ON(cqes_ready() >= nr_cqes_before + 1);
>
>but that's just too ugly
>
>
>>>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>+            void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>+
>>>+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>+            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+{
>>>+    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>>>+}
>>>
>>>static inline void io_uring_files_cancel(void)
>>>{
>>>@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>{
>>>}
>>>+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>+            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+{
>>>+}
>>>static inline struct sock *io_uring_get_socket(struct file *file)
>>>{
>>>    return NULL;
>>>diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>index 5e32db48696d..476c7877ce58 100644
>>>--- a/io_uring/uring_cmd.c
>>>+++ b/io_uring/uring_cmd.c
>>>@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>    ioucmd->task_work_cb(ioucmd, issue_flags);
>>>}
>>>
>>>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>-            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>+            void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>+            unsigned flags)
>>>{
>>>    struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>>
>>>    ioucmd->task_work_cb = task_work_cb;
>>>    req->io_task_work.func = io_uring_cmd_work;
>>>-    io_req_task_work_add(req);
>>>+    __io_req_task_work_add(req, flags);
>>>+}
>>>+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
>
>--- a/include/linux/io_uring.h
>+++ b/include/linux/io_uring.h
>
>+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>+}
>
>That should fail for nvme unless exported.

But it does not. Give it a try.

>>Any reason to export this? No one is using this at the moment.
>>>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>+            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+{
>>>+    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>>>}
>>>-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>>>+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>>
>>Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
>>therefore this helper.
>
>Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
>to let it use whatever flags there might be in the future.
>
>Initially I wanted to just make io_uring_cmd_complete_in_task and
>io_uring_cmd_do_in_task_lazy static inline, but that would need
>some code shuffling to make it clean.
>
>>And if you did not want callers to know about this flag (internal
>>details of io_uring), it would be better to have two exported helpers
>>io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
>>Both will use the internal helper __io_uring_cmd_do_in_task with
>>different flag.
>
>That's how it should be in this patch

Nah, in this patch __io_uring_cmd_do_in_task is exported helper. And
io_uring_cmd_complete_in_task has been changed too (explicit export to
header based one). Seems like bit more shuffling than what is necessary.

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper
  2023-05-17 10:33       ` Kanchan Joshi
@ 2023-05-17 12:00         ` Pavel Begunkov
  2023-05-19 15:00         ` Pavel Begunkov
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-17 12:00 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi

On 5/17/23 11:33, Kanchan Joshi wrote:
> On Tue, May 16, 2023 at 07:52:23PM +0100, Pavel Begunkov wrote:
>> On 5/16/23 11:00, Kanchan Joshi wrote:
>>> On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>>>> We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>>>> cmd tw helper accepting TWQ flags, and then add
>>>> io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>>>> imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>>>> delaying execution of this tw should not prevent forward progress.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>>> include/linux/io_uring.h | 18 ++++++++++++++++--
>>>> io_uring/uring_cmd.c     | 16 ++++++++++++----
>>>> 2 files changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>>>> index 7fe31b2cd02f..bb9c666bd584 100644
>>>> --- a/include/linux/io_uring.h
>>>> +++ b/include/linux/io_uring.h
>>>> @@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>>>>                   struct iov_iter *iter, void *ioucmd);
>>>> void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
>>>>             unsigned issue_flags);
>>>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> -            void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>> struct sock *io_uring_get_socket(struct file *file);
>>>> void __io_uring_cancel(bool cancel_all);
>>>> void __io_uring_free(struct task_struct *tsk);
>>>> void io_uring_unreg_ringfd(void);
>>>> const char *io_uring_get_opcode(u8 opcode);
>>>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>> +                void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>> +                unsigned flags);
>>>> +/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
>>>
>>> Should this also translate to some warn_on anywhere?
>>
>> Would love to but don't see how. We can only check it doesn't
>> produce more than 1 CQE, but that would need
>>
>> nr_cqes_before = cqes_ready();
>> tw_item->run();
>> WARN_ON(cqes_ready() >= nr_cqes_before + 1);
>>
>> but that's just too ugly
>>
>>
>>>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>> +
>>>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>>>> +}
>>>>
>>>> static inline void io_uring_files_cancel(void)
>>>> {
>>>> @@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>>             void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> {
>>>> }
>>>> +static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> +}
>>>> static inline struct sock *io_uring_get_socket(struct file *file)
>>>> {
>>>>     return NULL;
>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>> index 5e32db48696d..476c7877ce58 100644
>>>> --- a/io_uring/uring_cmd.c
>>>> +++ b/io_uring/uring_cmd.c
>>>> @@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>     ioucmd->task_work_cb(ioucmd, issue_flags);
>>>> }
>>>>
>>>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> -            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>> +            unsigned flags)
>>>> {
>>>>     struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>>>
>>>>     ioucmd->task_work_cb = task_work_cb;
>>>>     req->io_task_work.func = io_uring_cmd_work;
>>>> -    io_req_task_work_add(req);
>>>> +    __io_req_task_work_add(req, flags);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
>>
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>>
>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>> +}
>>
>> That should fail for nvme unless exported.
> 
> But it does not. Give it a try.
> 
>>> Any reason to export this? No one is using this at the moment.
>>>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>>>> }
>>>> -EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>>>> +EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>>>
>>> Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
>>> therefore this helper.
>>
>> Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
>> to let it use whatever flags there might be in the future.
>>
>> Initially I wanted to just make io_uring_cmd_complete_in_task and
>> io_uring_cmd_do_in_task_lazy static inline, but that would need
>> some code shuffling to make it clean.
>>
>>> And if you did not want callers to know about this flag (internal
>>> details of io_uring), it would be better to have two exported helpers
>>> io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
>>> Both will use the internal helper __io_uring_cmd_do_in_task with
>>> different flag.
>>
>> That's how it should be in this patch
> 
> Nah, in this patch __io_uring_cmd_do_in_task is exported helper. And
> io_uring_cmd_complete_in_task has been changed too (explicit export to
> header based one). Seems like bit more shuffling than what is necessary

Ok, I'll check, thanks


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17  7:23   ` Christoph Hellwig
@ 2023-05-17 12:32     ` Pavel Begunkov
  2023-05-17 12:39       ` Christoph Hellwig
  2023-05-17 19:31       ` Jens Axboe
  0 siblings, 2 replies; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-17 12:32 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, sagi, joshi.k

On 5/17/23 08:23, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
>> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
>> commands completion. It further delays the execution of task_work for
>> DEFER_TASKRUN until there are enough of task_work items queued to meet
>> the waiting criteria, which reduces the number of wake ups we issue.
> 
> Why wouldn't you just do that unconditionally for
> io_uring_cmd_complete_in_task?

1) ublk does secondary batching and so may produce multiple cqes,
that's not supported. I believe Ming sent patches removing it,
but I'd rather not deal with conflicts for now.

2) Some users may have dependencies b/w requests, i.e. a request
will only complete when another request's task_work is executed.

3) There might be use cases when you don't wont it to be delayed,
IO retries would be a good example. I wouldn't also use it for
control paths like ublk_ctrl_uring_cmd.

Let's better keep io_uring_cmd_complete_in_task() as the default
option for the interface.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17 12:32     ` Pavel Begunkov
@ 2023-05-17 12:39       ` Christoph Hellwig
  2023-05-17 13:30         ` Pavel Begunkov
  2023-05-17 19:31       ` Jens Axboe
  1 sibling, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2023-05-17 12:39 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Christoph Hellwig, linux-nvme, linux-block, io-uring, axboe,
	kbusch, sagi, joshi.k

On Wed, May 17, 2023 at 01:32:53PM +0100, Pavel Begunkov wrote:
> 1) ublk does secondary batching and so may produce multiple cqes,
> that's not supported. I believe Ming sent patches removing it,
> but I'd rather not deal with conflicts for now.
>
> 2) Some users may have dependencies b/w requests, i.e. a request
> will only complete when another request's task_work is executed.
>
> 3) There might be use cases when you don't wont it to be delayed,
> IO retries would be a good example. I wouldn't also use it for
> control paths like ublk_ctrl_uring_cmd.

You speak a lot of some users and some cases when the only users
are ublk and nvme, both of which would obviously benefit.

If you don't want conflicts wait for Ming to finish his work
and then we can do this cleanly and without leaving dead code
around.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17 12:39       ` Christoph Hellwig
@ 2023-05-17 13:30         ` Pavel Begunkov
  2023-05-17 13:53           ` Christoph Hellwig
  0 siblings, 1 reply; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-17 13:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, sagi, joshi.k

On 5/17/23 13:39, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 01:32:53PM +0100, Pavel Begunkov wrote:
>> 1) ublk does secondary batching and so may produce multiple cqes,
>> that's not supported. I believe Ming sent patches removing it,
>> but I'd rather not deal with conflicts for now.
>>
>> 2) Some users may have dependencies b/w requests, i.e. a request
>> will only complete when another request's task_work is executed.
>>
>> 3) There might be use cases when you don't wont it to be delayed,
>> IO retries would be a good example. I wouldn't also use it for
>> control paths like ublk_ctrl_uring_cmd.
> 
> You speak a lot of some users and some cases when the only users
> are ublk and nvme, both of which would obviously benefit.
> 
> If you don't want conflicts wait for Ming to finish his work
> and then we can do this cleanly and without leaving dead code
> around.

Aside that you decided to ignore the third point, that's a
generic interface, not nvme specific, there are patches for
net cmds, someone even tried to use it for drm. How do you
think new users are supposed to appear if the only helper
doing the job can hang the userspace for their use case?
Well, then maybe it'll remain nvme/ublk specific with such
an approach.

It is clean, and it's not dead code, and we should not
remove the simpler and more straightforward helper.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17 13:30         ` Pavel Begunkov
@ 2023-05-17 13:53           ` Christoph Hellwig
  2023-05-17 20:11             ` Pavel Begunkov
  0 siblings, 1 reply; 19+ messages in thread
From: Christoph Hellwig @ 2023-05-17 13:53 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Christoph Hellwig, linux-nvme, linux-block, io-uring, axboe,
	kbusch, sagi, joshi.k

On Wed, May 17, 2023 at 02:30:47PM +0100, Pavel Begunkov wrote:
> Aside that you decided to ignore the third point, that's a
> generic interface, not nvme specific, there are patches for
> net cmds, someone even tried to use it for drm. How do you
> think new users are supposed to appear if the only helper
> doing the job can hang the userspace for their use case?
> Well, then maybe it'll remain nvme/ublk specific with such
> an approach.

New users can add new code when it's actualy needed.  We don't
bloat the kernel for maybe in the future crap as a policy.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17 12:32     ` Pavel Begunkov
  2023-05-17 12:39       ` Christoph Hellwig
@ 2023-05-17 19:31       ` Jens Axboe
  2023-05-18  2:15         ` Ming Lei
  1 sibling, 1 reply; 19+ messages in thread
From: Jens Axboe @ 2023-05-17 19:31 UTC (permalink / raw)
  To: Pavel Begunkov, Christoph Hellwig
  Cc: linux-nvme, linux-block, io-uring, kbusch, sagi, joshi.k

On 5/17/23 6:32 AM, Pavel Begunkov wrote:
> On 5/17/23 08:23, Christoph Hellwig wrote:
>> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
>>> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
>>> commands completion. It further delays the execution of task_work for
>>> DEFER_TASKRUN until there are enough of task_work items queued to meet
>>> the waiting criteria, which reduces the number of wake ups we issue.
>>
>> Why wouldn't you just do that unconditionally for
>> io_uring_cmd_complete_in_task?
> 
> 1) ublk does secondary batching and so may produce multiple cqes,
> that's not supported. I believe Ming sent patches removing it,
> but I'd rather not deal with conflicts for now.

Ming, what's the status of those patches? Looks like we'll end up
with a dependency regardless of the ordering of these. Since these
patches are here now, sanest approach seems to move forward with
this series and defer the conflict resolving to the ublk side.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17 13:53           ` Christoph Hellwig
@ 2023-05-17 20:11             ` Pavel Begunkov
  0 siblings, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-17 20:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, sagi, joshi.k

On 5/17/23 14:53, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 02:30:47PM +0100, Pavel Begunkov wrote:
>> Aside that you decided to ignore the third point, that's a
>> generic interface, not nvme specific, there are patches for
>> net cmds, someone even tried to use it for drm. How do you
>> think new users are supposed to appear if the only helper
>> doing the job can hang the userspace for their use case?
>> Well, then maybe it'll remain nvme/ublk specific with such
>> an approach.
> 
> New users can add new code when it's actualy needed.  We don't
> bloat the kernel for maybe in the future crap as a policy.

Let me put it for you this way, it's an absolutely horrendous
idea to leave the old innocently looking name, i.e.
io_uring_cmd_complete_in_task(), and add there a bunch of
restrictions no new user would care about, that's called
shooting yourself in the leg.

So, we need to rename the function, which, again, for absolutely
no reason adds dependency on ublk. Why doing that instead of
waiting until ublk is converted? That's a big mystery.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion
  2023-05-17 19:31       ` Jens Axboe
@ 2023-05-18  2:15         ` Ming Lei
  0 siblings, 0 replies; 19+ messages in thread
From: Ming Lei @ 2023-05-18  2:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pavel Begunkov, Christoph Hellwig, linux-nvme, linux-block,
	io-uring, kbusch, sagi, joshi.k, ming.lei

On Wed, May 17, 2023 at 01:31:00PM -0600, Jens Axboe wrote:
> On 5/17/23 6:32 AM, Pavel Begunkov wrote:
> > On 5/17/23 08:23, Christoph Hellwig wrote:
> >> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
> >>> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
> >>> commands completion. It further delays the execution of task_work for
> >>> DEFER_TASKRUN until there are enough of task_work items queued to meet
> >>> the waiting criteria, which reduces the number of wake ups we issue.
> >>
> >> Why wouldn't you just do that unconditionally for
> >> io_uring_cmd_complete_in_task?
> > 
> > 1) ublk does secondary batching and so may produce multiple cqes,
> > that's not supported. I believe Ming sent patches removing it,
> > but I'd rather not deal with conflicts for now.
> 
> Ming, what's the status of those patches? Looks like we'll end up
> with a dependency regardless of the ordering of these. Since these
> patches are here now, sanest approach seems to move forward with
> this series and defer the conflict resolving to the ublk side.

I didn't send patch to remove the batch in ublk, such as, the following
line code:

ublk_queue_cmd():
	...
	if (!llist_add(&data->node, &ubq->io_cmds))
		return;
	...

But I did want to kill it given __io_req_task_work_add() can do batching
process, but we have to re-order request in this list, so can't remove it
now simply, see commit:

	7d4a93176e01 ("ublk_drv: don't forward io commands in reserve order")

Pavel must have misunderstood the following one as the batch removal:

https://lore.kernel.org/linux-block/[email protected]/

	

thanks,
Ming


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper
  2023-05-17 10:33       ` Kanchan Joshi
  2023-05-17 12:00         ` Pavel Begunkov
@ 2023-05-19 15:00         ` Pavel Begunkov
  1 sibling, 0 replies; 19+ messages in thread
From: Pavel Begunkov @ 2023-05-19 15:00 UTC (permalink / raw)
  To: Kanchan Joshi; +Cc: linux-nvme, linux-block, io-uring, axboe, kbusch, hch, sagi

On 5/17/23 11:33, Kanchan Joshi wrote:
> On Tue, May 16, 2023 at 07:52:23PM +0100, Pavel Begunkov wrote:
>> On 5/16/23 11:00, Kanchan Joshi wrote:
>>> On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>>>> We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>>>> cmd tw helper accepting TWQ flags, and then add
>>>> io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>>>> imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>>>> delaying execution of this tw should not prevent forward progress.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
[...]
>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>> index 5e32db48696d..476c7877ce58 100644
>>>> --- a/io_uring/uring_cmd.c
>>>> +++ b/io_uring/uring_cmd.c
>>>> @@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>     ioucmd->task_work_cb(ioucmd, issue_flags);
>>>> }
>>>>
>>>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> -            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>> +            unsigned flags)
>>>> {
>>>>     struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>>>
>>>>     ioucmd->task_work_cb = task_work_cb;
>>>>     req->io_task_work.func = io_uring_cmd_work;
>>>> -    io_req_task_work_add(req);
>>>> +    __io_req_task_work_add(req, flags);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
>>
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>>
>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>> +}
>>
>> That should fail for nvme unless exported.
> 
> But it does not. Give it a try.

Took the first patch, killed the export and compiled ublk
and nvme as modules:

ERROR: modpost: "__io_uring_cmd_do_in_task" [drivers/block/ublk_drv.ko] undefined!
ERROR: modpost: "__io_uring_cmd_do_in_task" [drivers/nvme/host/nvme-core.ko] undefined!


diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 476c7877ce58..3bb43122a683 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -30,7 +30,6 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
  	req->io_task_work.func = io_uring_cmd_work;
  	__io_req_task_work_add(req, flags);
  }
-EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
  

>>> Any reason to export this? No one is using this at the moment.
>>>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> +            void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> +    __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>>>> }
>>>> -EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>>>> +EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>>>
>>> Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
>>> therefore this helper.
>>
>> Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
>> to let it use whatever flags there might be in the future.
>>
>> Initially I wanted to just make io_uring_cmd_complete_in_task and
>> io_uring_cmd_do_in_task_lazy static inline, but that would need
>> some code shuffling to make it clean.
>>
>>> And if you did not want callers to know about this flag (internal
>>> details of io_uring), it would be better to have two exported helpers
>>> io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
>>> Both will use the internal helper __io_uring_cmd_do_in_task with
>>> different flag.
>>
>> That's how it should be in this patch
> 
> Nah, in this patch __io_uring_cmd_do_in_task is exported helper. And
> io_uring_cmd_complete_in_task has been changed too (explicit export to
> header based one). Seems like bit more shuffling than what is necessary.

With the end goal to turn them into inline helpers later on
I think it's fine.

-- 
Pavel Begunkov

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough
  2023-05-15 12:54 [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Pavel Begunkov
                   ` (2 preceding siblings ...)
  2023-05-16 11:42 ` [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Anuj gupta
@ 2023-05-25 14:54 ` Jens Axboe
  3 siblings, 0 replies; 19+ messages in thread
From: Jens Axboe @ 2023-05-25 14:54 UTC (permalink / raw)
  To: linux-nvme, linux-block, io-uring, Pavel Begunkov
  Cc: kbusch, hch, sagi, joshi.k


On Mon, 15 May 2023 13:54:41 +0100, Pavel Begunkov wrote:
> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
> 
> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1]
> patchset, but for a quick test I took fio/t/io_uring with 4 threads each
> reading their own drive and all pinned to the same CPU to make it CPU
> bound and got +10% throughput improvement.
> 
> [...]

Applied, thanks!

[1/2] io_uring/cmd: add cmd lazy tw wake helper
      commit: 5f3139fc46993b2d653a7aa5cdfe66a91881fd06
[2/2] nvme: optimise io_uring passthrough completion
      commit: f026be0e1e881e3395c3d5418ffc8c2a2203c3f3

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-05-25 14:54 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-15 12:54 [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Pavel Begunkov
2023-05-15 12:54 ` [PATCH for-next 1/2] io_uring/cmd: add cmd lazy tw wake helper Pavel Begunkov
2023-05-16 10:00   ` Kanchan Joshi
2023-05-16 18:52     ` Pavel Begunkov
2023-05-17 10:33       ` Kanchan Joshi
2023-05-17 12:00         ` Pavel Begunkov
2023-05-19 15:00         ` Pavel Begunkov
2023-05-15 12:54 ` [PATCH for-next 2/2] nvme: optimise io_uring passthrough completion Pavel Begunkov
2023-05-17  7:23   ` Christoph Hellwig
2023-05-17 12:32     ` Pavel Begunkov
2023-05-17 12:39       ` Christoph Hellwig
2023-05-17 13:30         ` Pavel Begunkov
2023-05-17 13:53           ` Christoph Hellwig
2023-05-17 20:11             ` Pavel Begunkov
2023-05-17 19:31       ` Jens Axboe
2023-05-18  2:15         ` Ming Lei
2023-05-16 11:42 ` [PATCH for-next 0/2] Enable IOU_F_TWQ_LAZY_WAKE for passthrough Anuj gupta
2023-05-16 18:38   ` Pavel Begunkov
2023-05-25 14:54 ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox