Let cmds use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.

The result should be the same as in the tests of the original
IOU_F_TWQ_LAZY_WAKE [1] patchset, but for a quick test I took
fio/t/io_uring with 4 threads each reading their own drive and all
pinned to the same CPU to make it CPU bound, and got a +10% throughput
improvement.

[1] https://lore.kernel.org/all/cover.1680782016.git.asml.silence@gmail.com/

Pavel Begunkov (2):
  io_uring/cmd: add cmd lazy tw wake helper
  nvme: optimise io_uring passthrough completion

 drivers/nvme/host/ioctl.c |  4 ++--
 include/linux/io_uring.h  | 18 ++++++++++++++++--
 io_uring/uring_cmd.c      | 16 ++++++++++++----
 3 files changed, 30 insertions(+), 8 deletions(-)

base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
--
2.40.0
We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
cmd tw helper accepting TWQ flags, and then add
io_uring_cmd_do_in_task_lazy() that will pass IOU_F_TWQ_LAZY_WAKE and
imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
delaying execution of this tw should not prevent forward progress.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/io_uring.h | 18 ++++++++++++++++--
 io_uring/uring_cmd.c     | 16 ++++++++++++----
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 7fe31b2cd02f..bb9c666bd584 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
			      struct iov_iter *iter, void *ioucmd);
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
			unsigned issue_flags);
-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-			void (*task_work_cb)(struct io_uring_cmd *, unsigned));
 struct sock *io_uring_get_socket(struct file *file);
 void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+			unsigned flags);
+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned));
+
+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}

 static inline void io_uring_files_cancel(void)
 {
@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
 {
 }
+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+}
 static inline struct sock *io_uring_get_socket(struct file *file)
 {
	return NULL;
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 5e32db48696d..476c7877ce58 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
	ioucmd->task_work_cb(ioucmd, issue_flags);
 }

-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
-			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned),
+			unsigned flags)
 {
	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);

	ioucmd->task_work_cb = task_work_cb;
	req->io_task_work.func = io_uring_cmd_work;
-	io_req_task_work_add(req);
+	__io_req_task_work_add(req, flags);
+}
+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
+
+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
+			void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
 }
-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);

 static inline void io_req_set_cqe32_extra(struct io_kiocb *req,
					  u64 extra1, u64 extra2)
--
2.40.0
Use IOU_F_TWQ_LAZY_WAKE via io_uring_cmd_do_in_task_lazy() for
passthrough command completion. It further delays the execution of
task_work for DEFER_TASKRUN until there are enough task_work items
queued to meet the waiting criteria, which reduces the number of
wake-ups we issue.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/nvme/host/ioctl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 81c5c9e38477..52ed1094ccbb 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -521,7 +521,7 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
	if (cookie != NULL && blk_rq_is_poll(req))
		nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_cb);
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);

	return RQ_END_IO_FREE;
 }
@@ -543,7 +543,7 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io_meta(struct request *req,
	if (cookie != NULL && blk_rq_is_poll(req))
		nvme_uring_task_meta_cb(ioucmd, IO_URING_F_UNLOCKED);
	else
-		io_uring_cmd_complete_in_task(ioucmd, nvme_uring_task_meta_cb);
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_meta_cb);

	return RQ_END_IO_NONE;
 }
--
2.40.0
On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>cmd tw helper accepting TWQ flags, and then add
>io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>delaying execution of this tw should not prevent forward progress.
>
>Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>---
> include/linux/io_uring.h | 18 ++++++++++++++++--
> io_uring/uring_cmd.c | 16 ++++++++++++----
> 2 files changed, 28 insertions(+), 6 deletions(-)
>
>diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>index 7fe31b2cd02f..bb9c666bd584 100644
>--- a/include/linux/io_uring.h
>+++ b/include/linux/io_uring.h
>@@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
> struct iov_iter *iter, void *ioucmd);
> void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
> unsigned issue_flags);
>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>- void (*task_work_cb)(struct io_uring_cmd *, unsigned));
> struct sock *io_uring_get_socket(struct file *file);
> void __io_uring_cancel(bool cancel_all);
> void __io_uring_free(struct task_struct *tsk);
> void io_uring_unreg_ringfd(void);
> const char *io_uring_get_opcode(u8 opcode);
>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>+ unsigned flags);
>+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */

Should this also translate to some warn_on anywhere?

>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>+
>+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+ __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>+}
>
> static inline void io_uring_files_cancel(void)
> {
>@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
> void (*task_work_cb)(struct io_uring_cmd *, unsigned))
> {
> }
>+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+}
> static inline struct sock *io_uring_get_socket(struct file *file)
> {
> return NULL;
>diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>index 5e32db48696d..476c7877ce58 100644
>--- a/io_uring/uring_cmd.c
>+++ b/io_uring/uring_cmd.c
>@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
> ioucmd->task_work_cb(ioucmd, issue_flags);
> }
>
>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>- void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>+ unsigned flags)
> {
> struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>
> ioucmd->task_work_cb = task_work_cb;
> req->io_task_work.func = io_uring_cmd_work;
>- io_req_task_work_add(req);
>+ __io_req_task_work_add(req, flags);
>+}
>+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);

Any reason to export this? No one is using this at the moment.

>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+ __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
> }
>-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);

Seems you did not want callers to pass the new flag (LAZY_WAKE) and
therefore this helper. And if you did not want callers to know about
this flag (internal details of io_uring), it would be better to have
two exported helpers io_uring_cmd_do_in_task_lazy() and
io_uring_cmd_complete_in_task(). Both will use the internal helper
__io_uring_cmd_do_in_task with different flag.
On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>
> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1]
> patchset, but for a quick test I took fio/t/io_uring with 4 threads each
> reading their own drive and all pinned to the same CPU to make it CPU
> bound and got +10% throughput improvement.
>
> [1] https://lore.kernel.org/all/cover.1680782016.git.asml.silence@gmail.com/
>
> Pavel Begunkov (2):
> io_uring/cmd: add cmd lazy tw wake helper
> nvme: optimise io_uring passthrough completion
>
> drivers/nvme/host/ioctl.c | 4 ++--
> include/linux/io_uring.h | 18 ++++++++++++++++--
> io_uring/uring_cmd.c | 16 ++++++++++++----
> 3 files changed, 30 insertions(+), 8 deletions(-)
>
>
> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
> --
> 2.40.0
>
I tried to run a few workloads on my setup with your patches applied. However, I
couldn't see any difference in io passthrough performance. I might have missed
something. Can you share the workload that you ran which gave you the perf
improvement? Here is the workload that I ran -
Without your patches applied -
# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n1 /dev/ng0n1
submitter=0, tid=2049, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1382MiB/s, IOS/call=16/15
IOPS=2.82M, BW=1379MiB/s, IOS/call=16/16
IOPS=2.84M, BW=1388MiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=2.84M
# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0
-O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=0, tid=2046, file=/dev/ng0n1, node=-1
submitter=1, tid=2047, file=/dev/ng1n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.72M, BW=2.79GiB/s, IOS/call=16/15
IOPS=5.71M, BW=2.79GiB/s, IOS/call=16/16
IOPS=5.70M, BW=2.78GiB/s, IOS/call=16/15
Exiting on timeout
Maximum IOPS=5.72M
With your patches applied -
# taskset -c 0 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0 -O0
-u1 -n1 /dev/ng0n1
submitter=0, tid=2032, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=2.83M, BW=1381MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1379MiB/s, IOS/call=16/15
IOPS=2.83M, BW=1383MiB/s, IOS/call=15/15
Exiting on timeout
Maximum IOPS=2.83M
# taskset -c 0,3 t/io_uring -r4 -b512 -d64 -c16 -s16 -p0 -F1 -B1 -P0
-O0 -u1 -n2 /dev/ng0n1 /dev/ng1n1
submitter=1, tid=2037, file=/dev/ng1n1, node=-1
submitter=0, tid=2036, file=/dev/ng0n1, node=-1
polled=0, fixedbufs=1/0, register_files=1, buffered=1, QD=64
Engine=io_uring, sq_ring=64, cq_ring=64
IOPS=5.64M, BW=2.75GiB/s, IOS/call=15/15
IOPS=5.62M, BW=2.75GiB/s, IOS/call=16/16
IOPS=5.62M, BW=2.74GiB/s, IOS/call=16/16
Exiting on timeout
Maximum IOPS=5.64M
--
Anuj Gupta
On 5/16/23 12:42, Anuj gupta wrote:
> On Mon, May 15, 2023 at 6:29 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>>
>> The result should be same as in test to the original IOU_F_TWQ_LAZY_WAKE [1]
>> patchset, but for a quick test I took fio/t/io_uring with 4 threads each
>> reading their own drive and all pinned to the same CPU to make it CPU
>> bound and got +10% throughput improvement.
>>
>> [1] https://lore.kernel.org/all/cover.1680782016.git.asml.silence@gmail.com/
>>
>> Pavel Begunkov (2):
>> io_uring/cmd: add cmd lazy tw wake helper
>> nvme: optimise io_uring passthrough completion
>>
>> drivers/nvme/host/ioctl.c | 4 ++--
>> include/linux/io_uring.h | 18 ++++++++++++++++--
>> io_uring/uring_cmd.c | 16 ++++++++++++----
>> 3 files changed, 30 insertions(+), 8 deletions(-)
>>
>>
>> base-commit: 9a48d604672220545d209e9996c2a1edbb5637f6
>> --
>> 2.40.0
>>
>
> I tried to run a few workloads on my setup with your patches applied. However, I
> couldn't see any difference in io passthrough performance. I might have missed
> something. Can you share the workload that you ran which gave you the perf
> improvement. Here is the workload that I ran -
The patch is a way to make completion batching more consistent. If you're so
lucky that all IO complete before task_work runs, it'll be perfect batching
and there is nothing to improve. That often happens with high throughput
benchmarks because of how consistent they are: no writes, same size,
everything is issued at the same time and so on. In reality it depends
on your use pattern, timings, nvme coalescing, will also change if you
introduce a second drive, and so on.
With the patch t/io_uring should run task_work once for exactly the
number of cqes the user is waiting for, i.e. -c<N>, regardless of
circumstances.
Just tried it out to confirm,
taskset -c 0 nice -n -20 /t/io_uring -p0 -d4 -b8192 -s4 -c4 -F1 -B1 -R0 -X1 -u1 -O0 /dev/ng0n1
Without:
12:11:10 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
12:11:20 PM 0 2.03 0.00 25.95 0.00 0.00 0.00 0.00 0.00 0.00 72.03
With:
12:12:00 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
12:12:10 PM 0 2.22 0.00 17.39 0.00 0.00 0.00 0.00 0.00 0.00 80.40
Double checking it works:
echo 1 > /sys/kernel/debug/tracing/events/io_uring/io_uring_local_work_run/enable
cat /sys/kernel/debug/tracing/trace_pipe
Without I see
io_uring-4108 [000] ..... 653.820369: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108 [000] ..... 653.820371: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108 [000] ..... 653.820382: io_uring_local_work_run: ring 00000000b843f57f, count 2, loops 1
io_uring-4108 [000] ..... 653.820383: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108 [000] ..... 653.820386: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
io_uring-4108 [000] ..... 653.820398: io_uring_local_work_run: ring 00000000b843f57f, count 2, loops 1
io_uring-4108 [000] ..... 653.820398: io_uring_local_work_run: ring 00000000b843f57f, count 1, loops 1
And with patches it's strictly count=4.
Another way would be to add more SSDs to the picture and hope they don't
conspire to complete at the same time.
--
Pavel Begunkov
On 5/16/23 11:00, Kanchan Joshi wrote:
> On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>> We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>> cmd tw helper accepting TWQ flags, and then add
>> io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>> imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>> delaying execution of this tw should not prevent forward progress.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>> include/linux/io_uring.h | 18 ++++++++++++++++--
>> io_uring/uring_cmd.c | 16 ++++++++++++----
>> 2 files changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>> index 7fe31b2cd02f..bb9c666bd584 100644
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>> @@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>> struct iov_iter *iter, void *ioucmd);
>> void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
>> unsigned issue_flags);
>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> - void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>> struct sock *io_uring_get_socket(struct file *file);
>> void __io_uring_cancel(bool cancel_all);
>> void __io_uring_free(struct task_struct *tsk);
>> void io_uring_unreg_ringfd(void);
>> const char *io_uring_get_opcode(u8 opcode);
>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>> + unsigned flags);
>> +/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
>
> Should this also translate to some warn_on anywhere?

Would love to but don't see how. We can only check it doesn't
produce more than 1 CQE, but that would need

nr_cqes_before = cqes_ready();
tw_item->run();
WARN_ON(cqes_ready() >= nr_cqes_before + 1);

but that's just too ugly

>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>> +
>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> + __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>> +}
>>
>> static inline void io_uring_files_cancel(void)
>> {
>> @@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> {
>> }
>> +static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +}
>> static inline struct sock *io_uring_get_socket(struct file *file)
>> {
>> return NULL;
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index 5e32db48696d..476c7877ce58 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>> ioucmd->task_work_cb(ioucmd, issue_flags);
>> }
>>
>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> - void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>> + unsigned flags)
>> {
>> struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>
>> ioucmd->task_work_cb = task_work_cb;
>> req->io_task_work.func = io_uring_cmd_work;
>> - io_req_task_work_add(req);
>> + __io_req_task_work_add(req, flags);
>> +}
>> +EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);

--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h

+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
+{
+ __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
+}

That should fail for nvme unless exported.

> Any reason to export this? No one is using this at the moment.
>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> + __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>> }
>> -EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>> +EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>
> Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
> therefore this helper.

Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
to let it use whatever flags there might be in the future.

Initially I wanted to just make io_uring_cmd_complete_in_task and
io_uring_cmd_do_in_task_lazy static inline, but that would need
some code shuffling to make it clean.

> And if you did not want callers to know about this flag (internal
> details of io_uring), it would be better to have two exported helpers
> io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
> Both will use the internal helper __io_uring_cmd_do_in_task with
> different flag.

That's how it should be in this patch

--
Pavel Begunkov
On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
> Use IOU_F_TWQ_LAZY_WAKE via iou_cmd_exec_in_task_lazy() for passthrough
> commands completion. It further delays the execution of task_work for
> DEFER_TASKRUN until there are enough of task_work items queued to meet
> the waiting criteria, which reduces the number of wake ups we issue.
Why wouldn't you just do that unconditionally for
io_uring_cmd_complete_in_task?
On Tue, May 16, 2023 at 07:52:23PM +0100, Pavel Begunkov wrote:
>On 5/16/23 11:00, Kanchan Joshi wrote:
>>On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>>>We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>>>cmd tw helper accepting TWQ flags, and then add
>>>io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>>>imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>>>delaying execution of this tw should not prevent forward progress.
>>>
>>>Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>---
>>>include/linux/io_uring.h | 18 ++++++++++++++++--
>>>io_uring/uring_cmd.c | 16 ++++++++++++----
>>>2 files changed, 28 insertions(+), 6 deletions(-)
>>>
>>>diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>>>index 7fe31b2cd02f..bb9c666bd584 100644
>>>--- a/include/linux/io_uring.h
>>>+++ b/include/linux/io_uring.h
>>>@@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>>> struct iov_iter *iter, void *ioucmd);
>>>void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
>>> unsigned issue_flags);
>>>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>- void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>struct sock *io_uring_get_socket(struct file *file);
>>>void __io_uring_cancel(bool cancel_all);
>>>void __io_uring_free(struct task_struct *tsk);
>>>void io_uring_unreg_ringfd(void);
>>>const char *io_uring_get_opcode(u8 opcode);
>>>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>+ unsigned flags);
>>>+/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
>>
>>Should this also translate to some warn_on anywhere?
>
>Would love to but don't see how. We can only check it doesn't
>produce more than 1 CQE, but that would need
>
>nr_cqes_before = cqes_ready();
>tw_item->run();
>WARN_ON(cqes_ready() >= nr_cqes_before + 1);
>
>but that's just too ugly
>
>
>>>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>+
>>>+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+{
>>>+ __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>>>+}
>>>
>>>static inline void io_uring_files_cancel(void)
>>>{
>>>@@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>> void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>{
>>>}
>>>+static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+{
>>>+}
>>>static inline struct sock *io_uring_get_socket(struct file *file)
>>>{
>>> return NULL;
>>>diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>index 5e32db48696d..476c7877ce58 100644
>>>--- a/io_uring/uring_cmd.c
>>>+++ b/io_uring/uring_cmd.c
>>>@@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>}
>>>
>>>-void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>- void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>+ unsigned flags)
>>>{
>>> struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>>
>>> ioucmd->task_work_cb = task_work_cb;
>>> req->io_task_work.func = io_uring_cmd_work;
>>>- io_req_task_work_add(req);
>>>+ __io_req_task_work_add(req, flags);
>>>+}
>>>+EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
>
>--- a/include/linux/io_uring.h
>+++ b/include/linux/io_uring.h
>
>+static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>+{
>+ __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>+}
>
>That should fail for nvme unless exported.

But it does not. Give it a try.

>>Any reason to export this? No one is using this at the moment.
>>>+void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>+ void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>+{
>>>+ __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>>>}
>>>-EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>>>+EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>>
>>Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
>>therefore this helper.
>
>Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
>to let it use whatever flags there might be in the future.
>
>Initially I wanted to just make io_uring_cmd_complete_in_task and
>io_uring_cmd_do_in_task_lazy static inline, but that would need
>some code shuffling to make it clean.
>
>>And if you did not want callers to know about this flag (internal
>>details of io_uring), it would be better to have two exported helpers
>>io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
>>Both will use the internal helper __io_uring_cmd_do_in_task with
>>different flag.
>
>That's how it should be in this patch

Nah, in this patch __io_uring_cmd_do_in_task is exported helper. And
io_uring_cmd_complete_in_task has been changed too (explicit export to
header based one). Seems like bit more shuffling than what is necessary.
On 5/17/23 11:33, Kanchan Joshi wrote:
> On Tue, May 16, 2023 at 07:52:23PM +0100, Pavel Begunkov wrote:
>> On 5/16/23 11:00, Kanchan Joshi wrote:
>>> On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>>>> We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>>>> cmd tw helper accepting TWQ flags, and then add
>>>> io_uring_cmd_do_in_task_laz() that will pass IOU_F_TWQ_LAZY_WAKE and
>>>> imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>>>> delaying execution of this tw should not prevent forward progress.
>>>>
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
>>>> include/linux/io_uring.h | 18 ++++++++++++++++--
>>>> io_uring/uring_cmd.c | 16 ++++++++++++----
>>>> 2 files changed, 28 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
>>>> index 7fe31b2cd02f..bb9c666bd584 100644
>>>> --- a/include/linux/io_uring.h
>>>> +++ b/include/linux/io_uring.h
>>>> @@ -46,13 +46,23 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
>>>> struct iov_iter *iter, void *ioucmd);
>>>> void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
>>>> unsigned issue_flags);
>>>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> - void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>> struct sock *io_uring_get_socket(struct file *file);
>>>> void __io_uring_cancel(bool cancel_all);
>>>> void __io_uring_free(struct task_struct *tsk);
>>>> void io_uring_unreg_ringfd(void);
>>>> const char *io_uring_get_opcode(u8 opcode);
>>>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>> + unsigned flags);
>>>> +/* users should follow semantics of IOU_F_TWQ_LAZY_WAKE */
>>>
>>> Should this also translate to some warn_on anywhere?
>>
>> Would love to but don't see how. We can only check it doesn't
>> produce more than 1 CQE, but that would need
>>
>> nr_cqes_before = cqes_ready();
>> tw_item->run();
>> WARN_ON(cqes_ready() >= nr_cqes_before + 1);
>>
>> but that's just too ugly
>>
>>
>>>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned));
>>>> +
>>>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> + __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>>>> +}
>>>>
>>>> static inline void io_uring_files_cancel(void)
>>>> {
>>>> @@ -85,6 +95,10 @@ static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> {
>>>> }
>>>> +static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> +}
>>>> static inline struct sock *io_uring_get_socket(struct file *file)
>>>> {
>>>> return NULL;
>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>> index 5e32db48696d..476c7877ce58 100644
>>>> --- a/io_uring/uring_cmd.c
>>>> +++ b/io_uring/uring_cmd.c
>>>> @@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>> ioucmd->task_work_cb(ioucmd, issue_flags);
>>>> }
>>>>
>>>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> - void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>> + unsigned flags)
>>>> {
>>>> struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>>>
>>>> ioucmd->task_work_cb = task_work_cb;
>>>> req->io_task_work.func = io_uring_cmd_work;
>>>> - io_req_task_work_add(req);
>>>> + __io_req_task_work_add(req, flags);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
>>
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>>
>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> + __io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>> +}
>>
>> That should fail for nvme unless exported.
>
> But it does not. Give it a try.
>
>>> Any reason to export this? No one is using this at the moment.
>>>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> + void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> + __io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>>>> }
>>>> -EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>>>> +EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>>>
>>> Seems you did not want callers to pass the the new flag (LAZY_WAKE) and
>>> therefore this helper.
>>
>> Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
>> to let it use whatever flags there might be in the future.
>>
>> Initially I wanted to just make io_uring_cmd_complete_in_task and
>> io_uring_cmd_do_in_task_lazy static inline, but that would need
>> some code shuffling to make it clean.
>>
>>> And if you did not want callers to know about this flag (internal
>>> details of io_uring), it would be better to have two exported helpers
>>> io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
>>> Both will use the internal helper __io_uring_cmd_do_in_task with
>>> different flag.
>>
>> That's how it should be in this patch
>
> Nah, in this patch __io_uring_cmd_do_in_task is an exported helper. And
> io_uring_cmd_complete_in_task has been changed too (explicit export to
> header based one). Seems like a bit more shuffling than what is necessary.
Ok, I'll check, thanks
--
Pavel Begunkov
On 5/17/23 08:23, Christoph Hellwig wrote:
> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
>> Use IOU_F_TWQ_LAZY_WAKE via io_uring_cmd_do_in_task_lazy() for passthrough
>> command completion. It further delays the execution of task_work for
>> DEFER_TASKRUN until enough task_work items are queued to meet the
>> waiting criteria, which reduces the number of wakeups we issue.
>
> Why wouldn't you just do that unconditionally for
> io_uring_cmd_complete_in_task?
1) ublk does secondary batching and so may produce multiple cqes,
that's not supported. I believe Ming sent patches removing it,
but I'd rather not deal with conflicts for now.
2) Some users may have dependencies b/w requests, i.e. a request
will only complete when another request's task_work is executed.
3) There might be use cases where you don't want it to be delayed;
IO retries would be a good example. I also wouldn't use it for
control paths like ublk_ctrl_uring_cmd.

Let's keep io_uring_cmd_complete_in_task() as the default
option for the interface.
--
Pavel Begunkov
On Wed, May 17, 2023 at 01:32:53PM +0100, Pavel Begunkov wrote:
> 1) ublk does secondary batching and so may produce multiple cqes,
> that's not supported. I believe Ming sent patches removing it,
> but I'd rather not deal with conflicts for now.
>
> 2) Some users may have dependencies b/w requests, i.e. a request
> will only complete when another request's task_work is executed.
>
> 3) There might be use cases where you don't want it to be delayed;
> IO retries would be a good example. I also wouldn't use it for
> control paths like ublk_ctrl_uring_cmd.
You speak a lot of some users and some cases when the only users
are ublk and nvme, both of which would obviously benefit.
If you don't want conflicts wait for Ming to finish his work
and then we can do this cleanly and without leaving dead code
around.
On 5/17/23 13:39, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 01:32:53PM +0100, Pavel Begunkov wrote:
>> 1) ublk does secondary batching and so may produce multiple cqes,
>> that's not supported. I believe Ming sent patches removing it,
>> but I'd rather not deal with conflicts for now.
>>
>> 2) Some users may have dependencies b/w requests, i.e. a request
>> will only complete when another request's task_work is executed.
>>
>> 3) There might be use cases where you don't want it to be delayed;
>> IO retries would be a good example. I also wouldn't use it for
>> control paths like ublk_ctrl_uring_cmd.
>
> You speak a lot of some users and some cases when the only users
> are ublk and nvme, both of which would obviously benefit.
>
> If you don't want conflicts wait for Ming to finish his work
> and then we can do this cleanly and without leaving dead code
> around.
Aside from the fact that you decided to ignore the third point:
that's a generic interface, not nvme specific. There are patches for
net cmds, and someone even tried to use it for drm. How do you
expect new users to appear if the only helper doing the job can
hang userspace for their use case?
Well, then maybe it'll remain nvme/ublk specific with such
an approach.
It is clean, and it's not dead code, and we should not
remove the simpler and more straightforward helper.
--
Pavel Begunkov
On Wed, May 17, 2023 at 02:30:47PM +0100, Pavel Begunkov wrote:
> Aside from the fact that you decided to ignore the third point:
> that's a generic interface, not nvme specific. There are patches for
> net cmds, and someone even tried to use it for drm. How do you
> expect new users to appear if the only helper doing the job can
> hang userspace for their use case?
> Well, then maybe it'll remain nvme/ublk specific with such
> an approach.
New users can add new code when it's actually needed. We don't
bloat the kernel for maybe-in-the-future crap as a policy.
On 5/17/23 6:32 AM, Pavel Begunkov wrote:
> On 5/17/23 08:23, Christoph Hellwig wrote:
>> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
>>> Use IOU_F_TWQ_LAZY_WAKE via io_uring_cmd_do_in_task_lazy() for passthrough
>>> command completion. It further delays the execution of task_work for
>>> DEFER_TASKRUN until enough task_work items are queued to meet the
>>> waiting criteria, which reduces the number of wakeups we issue.
>>
>> Why wouldn't you just do that unconditionally for
>> io_uring_cmd_complete_in_task?
>
> 1) ublk does secondary batching and so may produce multiple cqes,
> that's not supported. I believe Ming sent patches removing it,
> but I'd rather not deal with conflicts for now.
Ming, what's the status of those patches? Looks like we'll end up
with a dependency regardless of the ordering of these. Since these
patches are here now, sanest approach seems to move forward with
this series and defer the conflict resolving to the ublk side.
--
Jens Axboe
On 5/17/23 14:53, Christoph Hellwig wrote:
> On Wed, May 17, 2023 at 02:30:47PM +0100, Pavel Begunkov wrote:
>> Aside from the fact that you decided to ignore the third point:
>> that's a generic interface, not nvme specific. There are patches for
>> net cmds, and someone even tried to use it for drm. How do you
>> expect new users to appear if the only helper doing the job can
>> hang userspace for their use case?
>> Well, then maybe it'll remain nvme/ublk specific with such
>> an approach.
>
> New users can add new code when it's actually needed. We don't
> bloat the kernel for maybe-in-the-future crap as a policy.
Let me put it this way: it's an absolutely horrendous idea to keep
the old, innocent-looking name, i.e.
io_uring_cmd_complete_in_task(), and attach to it a bunch of
restrictions that no new user would know to care about. That's
shooting yourself in the foot.

So we'd need to rename the function, which, again, for absolutely
no reason adds a dependency on ublk. Why do that instead of
waiting until ublk is converted? That's a big mystery.
--
Pavel Begunkov
On Wed, May 17, 2023 at 01:31:00PM -0600, Jens Axboe wrote:
> On 5/17/23 6:32 AM, Pavel Begunkov wrote:
> > On 5/17/23 08:23, Christoph Hellwig wrote:
> >> On Mon, May 15, 2023 at 01:54:43PM +0100, Pavel Begunkov wrote:
> >>> Use IOU_F_TWQ_LAZY_WAKE via io_uring_cmd_do_in_task_lazy() for passthrough
> >>> command completion. It further delays the execution of task_work for
> >>> DEFER_TASKRUN until enough task_work items are queued to meet the
> >>> waiting criteria, which reduces the number of wakeups we issue.
> >>
> >> Why wouldn't you just do that unconditionally for
> >> io_uring_cmd_complete_in_task?
> >
> > 1) ublk does secondary batching and so may produce multiple cqes,
> > that's not supported. I believe Ming sent patches removing it,
> > but I'd rather not deal with conflicts for now.
>
> Ming, what's the status of those patches? Looks like we'll end up
> with a dependency regardless of the ordering of these. Since these
> patches are here now, sanest approach seems to move forward with
> this series and defer the conflict resolving to the ublk side.

I didn't send a patch to remove the batching in ublk, i.e. the
following code:

ublk_queue_cmd():
	...
	if (!llist_add(&data->node, &ubq->io_cmds))
		return;
	...

But I did want to kill it, given that __io_req_task_work_add() can do
the batched processing. However, we have to re-order requests in this
list, so it can't simply be removed now, see commit 7d4a93176e01
("ublk_drv: don't forward io commands in reserve order").

Pavel must have misunderstood the following one as the batch removal:

https://lore.kernel.org/linux-block/20230427124414.319945-2-ming.lei@redhat.com/

thanks,
Ming
On 5/17/23 11:33, Kanchan Joshi wrote:
> On Tue, May 16, 2023 at 07:52:23PM +0100, Pavel Begunkov wrote:
>> On 5/16/23 11:00, Kanchan Joshi wrote:
>>> On Mon, May 15, 2023 at 01:54:42PM +0100, Pavel Begunkov wrote:
>>>> We want to use IOU_F_TWQ_LAZY_WAKE in commands. First, introduce a new
>>>> cmd tw helper accepting TWQ flags, and then add
>>>> io_uring_cmd_do_in_task_lazy() that will pass IOU_F_TWQ_LAZY_WAKE and
>>>> imply the "lazy" semantics, i.e. it posts no more than 1 CQE and
>>>> delaying execution of this tw should not prevent forward progress.
>>>>
>>>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>>>> ---
[...]
>>>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>>>> index 5e32db48696d..476c7877ce58 100644
>>>> --- a/io_uring/uring_cmd.c
>>>> +++ b/io_uring/uring_cmd.c
>>>> @@ -20,16 +20,24 @@ static void io_uring_cmd_work(struct io_kiocb *req, struct io_tw_state *ts)
>>>>  	ioucmd->task_work_cb(ioucmd, issue_flags);
>>>>  }
>>>>
>>>> -void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>>>> -		void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
>>>> +		void (*task_work_cb)(struct io_uring_cmd *, unsigned),
>>>> +		unsigned flags)
>>>>  {
>>>>  	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
>>>>
>>>>  	ioucmd->task_work_cb = task_work_cb;
>>>>  	req->io_task_work.func = io_uring_cmd_work;
>>>> -	io_req_task_work_add(req);
>>>> +	__io_req_task_work_add(req, flags);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);
>>
>> --- a/include/linux/io_uring.h
>> +++ b/include/linux/io_uring.h
>>
>> +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
>> +		void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>> +{
>> +	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, 0);
>> +}
>>
>> That should fail for nvme unless exported.
>
> But it does not. Give it a try.
Took the first patch, killed the export and compiled ublk and nvme as
modules:

ERROR: modpost: "__io_uring_cmd_do_in_task" [drivers/block/ublk_drv.ko] undefined!
ERROR: modpost: "__io_uring_cmd_do_in_task" [drivers/nvme/host/nvme-core.ko] undefined!

diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 476c7877ce58..3bb43122a683 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -30,7 +30,6 @@ void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 	req->io_task_work.func = io_uring_cmd_work;
 	__io_req_task_work_add(req, flags);
 }
-EXPORT_SYMBOL_GPL(__io_uring_cmd_do_in_task);

>>> Any reason to export this? No one is using this at the moment.

>>>> +void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
>>>> +		void (*task_work_cb)(struct io_uring_cmd *, unsigned))
>>>> +{
>>>> +	__io_uring_cmd_do_in_task(ioucmd, task_work_cb, IOU_F_TWQ_LAZY_WAKE);
>>>>  }
>>>> -EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>>>> +EXPORT_SYMBOL_GPL(io_uring_cmd_do_in_task_lazy);
>>>
>>> Seems you did not want callers to pass the new flag (LAZY_WAKE) and
>>> therefore this helper.
>>
>> Yep, I wouldn't mind exposing just *LAZY_WAKE but don't want
>> to let it use whatever flags there might be in the future.
>>
>> Initially I wanted to just make io_uring_cmd_complete_in_task and
>> io_uring_cmd_do_in_task_lazy static inline, but that would need
>> some code shuffling to make it clean.
>>
>>> And if you did not want callers to know about this flag (internal
>>> details of io_uring), it would be better to have two exported helpers
>>> io_uring_cmd_do_in_task_lazy() and io_uring_cmd_complete_in_task().
>>> Both will use the internal helper __io_uring_cmd_do_in_task with
>>> different flags.
>>
>> That's how it should be in this patch
>
> Nah, in this patch __io_uring_cmd_do_in_task is an exported helper. And
> io_uring_cmd_complete_in_task has been changed too (explicit export to
> header based one). Seems like a bit more shuffling than what is
> necessary.
With the end goal to turn them into inline helpers later on, I think
it's fine.

--
Pavel Begunkov
On Mon, 15 May 2023 13:54:41 +0100, Pavel Begunkov wrote:
> Let cmds to use IOU_F_TWQ_LAZY_WAKE and enable it for nvme passthrough.
>
> The result should be the same as in the tests for the original
> IOU_F_TWQ_LAZY_WAKE [1] patchset, but for a quick test I took
> fio/t/io_uring with 4 threads, each reading their own drive and all
> pinned to the same CPU to make it CPU bound, and got a +10%
> throughput improvement.
>
> [...]
Applied, thanks!
[1/2] io_uring/cmd: add cmd lazy tw wake helper
commit: 5f3139fc46993b2d653a7aa5cdfe66a91881fd06
[2/2] nvme: optimise io_uring passthrough completion
commit: f026be0e1e881e3395c3d5418ffc8c2a2203c3f3
Best regards,
--
Jens Axboe