* [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf
@ 2024-08-08 16:24 Ming Lei
2024-09-25 3:32 ` Akilesh Kailash
From: Ming Lei @ 2024-08-08 16:24 UTC
To: Jens Axboe, io-uring, linux-block, Pavel Begunkov; +Cc: Ming Lei
Hello,
The first 3 patches are cleanups that prepare for adding sqe group.
The 4th patch supports generic sqe group, which is like a link chain but
allows each sqe in the group to be issued in parallel. The group shares
the same IO_LINK & IO_DRAIN boundary, so N:M dependency can be supported
by using sqe group & io link together. sqe group changes nothing on
IOSQE_IO_LINK.
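To make the "like a link chain" comparison concrete, here is a minimal userspace sketch of how a run of SQE flag bytes could be split into groups. It is only an illustration, not code from the series. The IO_DRAIN and IO_LINK bit positions match the upstream uapi header; bit 7 for the group flag is an assumption taken from the "use the last bit" note in the V2 changelog; and the delimiting rule (a group runs while the flag is set and is closed by the first sqe without it, as with link chains) is likewise assumed. The `EX_`/`ex_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits 1 and 2 match the upstream uapi header. Bit 7 for the group flag
 * is an ASSUMPTION based on "use the last bit" in the V2 changelog. */
#define EX_IOSQE_IO_DRAIN  (1U << 1)
#define EX_IOSQE_IO_LINK   (1U << 2)
#define EX_IOSQE_SQE_GROUP (1U << 7)

/*
 * Hypothetical helper: return how many consecutive SQEs belong to the
 * group that starts at index 'start'.
 *
 * ASSUMED rule (mirrors link chains): the group continues while the
 * group flag is set, and the first SQE without the flag closes it.
 * An SQE that does not carry the flag therefore counts as length 1.
 */
static size_t ex_group_len(const uint8_t *flags, size_t n, size_t start)
{
	size_t i = start;

	assert(flags != NULL);

	/* walk over every SQE that keeps the group open */
	while (i < n && (flags[i] & EX_IOSQE_SQE_GROUP))
		i++;

	/* include the closing SQE, if the array did not end first */
	if (i < n)
		i++;

	return i - start;
}
```

Under these assumptions, flags of { GROUP, GROUP, 0, 0 } describe a three-sqe group (a leader plus two members) followed by one ordinary sqe.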
The 5th patch supports one variant of sqe group: members are allowed to
depend on the group leader, so that kernel resource lifetime can be aligned
with the group leader or the group. Any kernel resource can then be shared
within this sqe group, and this can be used for generic device zero copy.
The 6th & 7th patches support providing sqe group buffer via the sqe
group variant.
The 8th patch supports ublk zero copy based on io_uring providing sqe
group buffer.
Tests:
1) pass liburing test
- make runtests
2) write/pass two sqe group test cases:
https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
- covers related sqe flags combinations and linking groups, with both nop
and one multi-destination file copy.
- covers failure handling: fail leader IO or member IO in both single
group and linked groups, which is done in each sqe flags combination
test
3) ublksrv zero copy:
ublksrv userspace implements zero copy by sqe group & provide group
kbuf:
git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
make test T=loop/009:nbd/061 #ublk zc tests
When running the 64KB/512KB block size tests on ublk-loop ('ublk add -t loop --buffered_io -f $backing'),
performance is observed to double.
Any comments are welcome!
V5:
- follow Pavel's suggestion to minimize changes on the io_uring fast code
path: sqe group code is called via a single 'if (unlikely())' from
both the issue & completion code paths
- simplify & re-write group request completion
  - avoid touching io-wq code by completing the group leader via tw
    directly, just like ->task_complete
  - re-write group member & leader completion handling; one
    simplification is to always free the leader via the last member
  - simplify queueing group members; issuing leader and members in
    parallel is not supported
- fail the whole group if IO_*LINK or IO_DRAIN is set on group
members, and add test code to cover this change
- misc cleanup
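The "always free the leader via the last member" rule can be pictured with a small reference-count model. This is a toy userspace sketch and not the kernel implementation: the struct, its fields and the `ex_` helpers are hypothetical, and the accounting (one reference for the leader's own completion plus one per member) is an assumption loosely modeled on the `.grp_refs` field mentioned in the V4 notes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a request; all names here are illustrative only. */
struct ex_req {
	struct ex_req *leader;	/* NULL when this request is the leader */
	int refs;		/* leader only: own completion + members */
	bool freed;
};

/* Set up one leader with 'nr' members (ASSUMED accounting: 1 + nr refs). */
static void ex_group_init(struct ex_req *leader, struct ex_req *members, int nr)
{
	leader->leader = NULL;
	leader->refs = 1 + nr;
	leader->freed = false;

	for (int i = 0; i < nr; i++) {
		members[i].leader = leader;
		members[i].refs = 0;
		members[i].freed = false;
	}
}

/* Drop one reference on the leader; whoever drops the last one frees it. */
static void ex_put_leader(struct ex_req *leader)
{
	assert(leader->refs > 0);
	if (--leader->refs == 0)
		leader->freed = true;
}

/* Complete a request: a member is released at once, the leader only
 * after its own completion and every member completion have arrived. */
static void ex_complete(struct ex_req *req)
{
	struct ex_req *leader = req->leader ? req->leader : req;

	if (req->leader)
		req->freed = true;
	ex_put_leader(leader);
}
```

In this model the leader completes first, since leader and members are not issued in parallel, but it stays allocated until the final member completes. That ordering is what lets members keep using a resource owned by the leader.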
V4:
- address most comments from Pavel
- fix request double free
- don't use io_req_commit_cqe() in io_req_complete_defer()
- make members' REQ_F_INFLIGHT discoverable
- use common assembling check in submission code path
- drop patch 3 and don't move REQ_F_CQE_SKIP out of io_free_req()
- don't set .accept_group_kbuf for net send zc, in which members
need to be queued after the buffer notification is received; this can
be enabled in future
- add .grp_leader field via union, and share storage with .grp_link
- move .grp_refs into one hole of io_kiocb, so that one extra
cacheline isn't needed for io_kiocb
- cleanup & documentation improvements
V3:
- add IORING_FEAT_SQE_GROUP
- simplify group completion, and minimize change on io_req_complete_defer()
- simplify & cleanup io_queue_group_members()
- fix many failure handling issues
- cover failure handling code in added liburing tests
- remove RFC
V2:
- add generic sqe group, suggested by Kevin Wolf
- add REQ_F_SQE_GROUP_DEP which is based on IOSQE_SQE_GROUP, for sharing
kernel resources group-wide, suggested by Kevin Wolf
- remove sqe ext flag, and use the last bit for IOSQE_SQE_GROUP(Pavel),
in future we still can extend sqe flags with one uring context flag
- initialize group requests via submit state pattern, suggested by Pavel
- all kinds of cleanup & bug fixes
Ming Lei (8):
io_uring: add io_link_req() helper
io_uring: add io_submit_fail_link() helper
io_uring: add helper of io_req_commit_cqe()
io_uring: support SQE group
io_uring: support sqe group with members depending on leader
io_uring: support providing sqe group buffer
io_uring/uring_cmd: support provide group kernel buffer
ublk: support provide io buffer
drivers/block/ublk_drv.c | 160 ++++++++++++++-
include/linux/io_uring/cmd.h | 7 +
include/linux/io_uring_types.h | 54 +++++
include/uapi/linux/io_uring.h | 11 +-
include/uapi/linux/ublk_cmd.h | 7 +-
io_uring/io_uring.c | 359 ++++++++++++++++++++++++++++++---
io_uring/io_uring.h | 16 ++
io_uring/kbuf.c | 60 ++++++
io_uring/kbuf.h | 13 ++
io_uring/net.c | 23 ++-
io_uring/opdef.c | 4 +
io_uring/opdef.h | 2 +
io_uring/rw.c | 20 +-
io_uring/timeout.c | 2 +
io_uring/uring_cmd.c | 28 +++
15 files changed, 720 insertions(+), 46 deletions(-)
--
2.42.0
* Re: [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf
2024-08-08 16:24 Ming Lei
@ 2024-08-17 4:16 ` Ming Lei
2024-08-17 19:48 ` Pavel Begunkov
From: Ming Lei @ 2024-08-17 4:16 UTC
To: Jens Axboe, io-uring, linux-block, Pavel Begunkov
On Fri, Aug 9, 2024 at 12:25 AM Ming Lei <[email protected]> wrote:
>
> Hello,
>
> The 1st 3 patches are cleanup, and prepare for adding sqe group.
>
> The 4th patch supports generic sqe group which is like link chain, but
> allows each sqe in group to be issued in parallel and the group shares
> same IO_LINK & IO_DRAIN boundary, so N:M dependency can be supported with
> sqe group & io link together. sqe group changes nothing on
> IOSQE_IO_LINK.
>
> The 5th patch supports one variant of sqe group: allow members to depend
> on group leader, so that kernel resource lifetime can be aligned with
> group leader or group, then any kernel resource can be shared in this
> sqe group, and can be used in generic device zero copy.
>
> The 6th & 7th patches supports providing sqe group buffer via the sqe
> group variant.
>
> The 8th patch supports ublk zero copy based on io_uring providing sqe
> group buffer.
>
> Tests:
>
> 1) pass liburing test
> - make runtests
>
> 2) write/pass two sqe group test cases:
>
> https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
>
> - covers related sqe flags combination and linking groups, both nop and
> one multi-destination file copy.
>
> - cover failure handling test: fail leader IO or member IO in both single
> group and linked groups, which is done in each sqe flags combination
> test
>
> 3) ublksrv zero copy:
>
> ublksrv userspace implements zero copy by sqe group & provide group
> kbuf:
>
> git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
> make test T=loop/009:nbd/061 #ublk zc tests
>
> When running 64KB/512KB block size test on ublk-loop('ublk add -t loop --buffered_io -f $backing'),
> it is observed that perf is doubled.
>
> Any comments are welcome!
>
> V5:
> - follow Pavel's suggestion to minimize change on io_uring fast code
> path: sqe group code is called in by single 'if (unlikely())' from
> both issue & completion code path
>
> - simplify & re-write group request completion
> avoid to touch io-wq code by completing group leader via tw
> directly, just like ->task_complete
>
> re-write group member & leader completion handling, one
> simplification is always to free leader via the last member
>
> simplify queueing group members, not support issuing leader
> and members in parallel
>
> - fail the whole group if IO_*LINK & IO_DRAIN is set on group
> members, and test code to cover this change
>
> - misc cleanup
Hi Pavel,
V5 should address all your comments on V4, so care to take a look?
Thanks,
* Re: [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf
2024-08-17 4:16 ` Ming Lei
@ 2024-08-17 19:48 ` Pavel Begunkov
From: Pavel Begunkov @ 2024-08-17 19:48 UTC
To: Ming Lei, Jens Axboe, io-uring, linux-block
On 8/17/24 05:16, Ming Lei wrote:
> On Fri, Aug 9, 2024 at 12:25 AM Ming Lei <[email protected]> wrote:
>>
>> Hello,
>>
>> The 1st 3 patches are cleanup, and prepare for adding sqe group.
>>
>> The 4th patch supports generic sqe group which is like link chain, but
>> allows each sqe in group to be issued in parallel and the group shares
>> same IO_LINK & IO_DRAIN boundary, so N:M dependency can be supported with
>> sqe group & io link together. sqe group changes nothing on
>> IOSQE_IO_LINK.
>>
>> The 5th patch supports one variant of sqe group: allow members to depend
>> on group leader, so that kernel resource lifetime can be aligned with
>> group leader or group, then any kernel resource can be shared in this
>> sqe group, and can be used in generic device zero copy.
>>
>> The 6th & 7th patches supports providing sqe group buffer via the sqe
>> group variant.
>>
>> The 8th patch supports ublk zero copy based on io_uring providing sqe
>> group buffer.
>>
>> Tests:
>>
>> 1) pass liburing test
>> - make runtests
>>
>> 2) write/pass two sqe group test cases:
>>
>> https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
>>
>> - covers related sqe flags combination and linking groups, both nop and
>> one multi-destination file copy.
>>
>> - cover failure handling test: fail leader IO or member IO in both single
>> group and linked groups, which is done in each sqe flags combination
>> test
>>
>> 3) ublksrv zero copy:
>>
>> ublksrv userspace implements zero copy by sqe group & provide group
>> kbuf:
>>
>> git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
>> make test T=loop/009:nbd/061 #ublk zc tests
>>
>> When running 64KB/512KB block size test on ublk-loop('ublk add -t loop --buffered_io -f $backing'),
>> it is observed that perf is doubled.
>>
>> Any comments are welcome!
>>
>> V5:
>> - follow Pavel's suggestion to minimize change on io_uring fast code
>> path: sqe group code is called in by single 'if (unlikely())' from
>> both issue & completion code path
>>
>> - simplify & re-write group request completion
>> avoid to touch io-wq code by completing group leader via tw
>> directly, just like ->task_complete
>>
>> re-write group member & leader completion handling, one
>> simplification is always to free leader via the last member
>>
>> simplify queueing group members, not support issuing leader
>> and members in parallel
>>
>> - fail the whole group if IO_*LINK & IO_DRAIN is set on group
>> members, and test code to cover this change
>>
>> - misc cleanup
>
> Hi Pavel,
>
> V5 should address all your comments on V4, so care to take a look?
I will, didn't forget about it, but thanks for the reminder.
--
Pavel Begunkov
* Re: [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf
2024-08-08 16:24 [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf Ming Lei
@ 2024-09-25 3:32 ` Akilesh Kailash
From: Akilesh Kailash @ 2024-09-25 3:32 UTC
To: Ming Lei; +Cc: Jens Axboe, io-uring, linux-block, Pavel Begunkov
On Thu, Aug 8, 2024 at 9:24 AM Ming Lei <[email protected]> wrote:
>
> Hello,
>
> The 1st 3 patches are cleanup, and prepare for adding sqe group.
>
> The 4th patch supports generic sqe group which is like link chain, but
> allows each sqe in group to be issued in parallel and the group shares
> same IO_LINK & IO_DRAIN boundary, so N:M dependency can be supported with
> sqe group & io link together. sqe group changes nothing on
> IOSQE_IO_LINK.
>
> The 5th patch supports one variant of sqe group: allow members to depend
> on group leader, so that kernel resource lifetime can be aligned with
> group leader or group, then any kernel resource can be shared in this
> sqe group, and can be used in generic device zero copy.
>
> The 6th & 7th patches supports providing sqe group buffer via the sqe
> group variant.
>
> The 8th patch supports ublk zero copy based on io_uring providing sqe
> group buffer.
>
Hi Ming,
Thanks for working on this feature. I have tested this entire v5 series
for the Android OTA path to evaluate ublk zero copy.
Tested-by: Akilesh Kailash <[email protected]>
> Tests:
>
> 1) pass liburing test
> - make runtests
>
> 2) write/pass two sqe group test cases:
>
> https://github.com/axboe/liburing/compare/master...ming1:liburing:sqe_group_v2
>
> - covers related sqe flags combination and linking groups, both nop and
> one multi-destination file copy.
>
> - cover failure handling test: fail leader IO or member IO in both single
> group and linked groups, which is done in each sqe flags combination
> test
>
> 3) ublksrv zero copy:
>
> ublksrv userspace implements zero copy by sqe group & provide group
> kbuf:
>
> git clone https://github.com/ublk-org/ublksrv.git -b group-provide-buf_v2
> make test T=loop/009:nbd/061 #ublk zc tests
>
> When running 64KB/512KB block size test on ublk-loop('ublk add -t loop --buffered_io -f $backing'),
> it is observed that perf is doubled.
>
> Any comments are welcome!
>
> V5:
> - follow Pavel's suggestion to minimize change on io_uring fast code
> path: sqe group code is called in by single 'if (unlikely())' from
> both issue & completion code path
>
> - simplify & re-write group request completion
> avoid to touch io-wq code by completing group leader via tw
> directly, just like ->task_complete
>
> re-write group member & leader completion handling, one
> simplification is always to free leader via the last member
>
> simplify queueing group members, not support issuing leader
> and members in parallel
>
> - fail the whole group if IO_*LINK & IO_DRAIN is set on group
> members, and test code to cover this change
>
> - misc cleanup
>
> V4:
> - address most comments from Pavel
> - fix request double free
> - don't use io_req_commit_cqe() in io_req_complete_defer()
> - make members' REQ_F_INFLIGHT discoverable
> - use common assembling check in submission code path
> - drop patch 3 and don't move REQ_F_CQE_SKIP out of io_free_req()
> - don't set .accept_group_kbuf for net send zc, in which members
> need to be queued after buffer notification is got, and can be
> enabled in future
> - add .grp_leader field via union, and share storage with .grp_link
> - move .grp_refs into one hole of io_kiocb, so that one extra
> cacheline isn't needed for io_kiocb
> - cleanup & document improvement
>
> V3:
> - add IORING_FEAT_SQE_GROUP
> - simplify group completion, and minimize change on io_req_complete_defer()
> - simplify & cleanup io_queue_group_members()
> - fix many failure handling issues
> - cover failure handling code in added liburing tests
> - remove RFC
>
> V2:
> - add generic sqe group, suggested by Kevin Wolf
> - add REQ_F_SQE_GROUP_DEP which is based on IOSQE_SQE_GROUP, for sharing
> kernel resource in group wide, suggested by Kevin Wolf
> - remove sqe ext flag, and use the last bit for IOSQE_SQE_GROUP(Pavel),
> in future we still can extend sqe flags with one uring context flag
> - initialize group requests via submit state pattern, suggested by Pavel
> - all kinds of cleanup & bug fixes
>
> Ming Lei (8):
> io_uring: add io_link_req() helper
> io_uring: add io_submit_fail_link() helper
> io_uring: add helper of io_req_commit_cqe()
> io_uring: support SQE group
> io_uring: support sqe group with members depending on leader
> io_uring: support providing sqe group buffer
> io_uring/uring_cmd: support provide group kernel buffer
> ublk: support provide io buffer
>
> drivers/block/ublk_drv.c | 160 ++++++++++++++-
> include/linux/io_uring/cmd.h | 7 +
> include/linux/io_uring_types.h | 54 +++++
> include/uapi/linux/io_uring.h | 11 +-
> include/uapi/linux/ublk_cmd.h | 7 +-
> io_uring/io_uring.c | 359 ++++++++++++++++++++++++++++++---
> io_uring/io_uring.h | 16 ++
> io_uring/kbuf.c | 60 ++++++
> io_uring/kbuf.h | 13 ++
> io_uring/net.c | 23 ++-
> io_uring/opdef.c | 4 +
> io_uring/opdef.h | 2 +
> io_uring/rw.c | 20 +-
> io_uring/timeout.c | 2 +
> io_uring/uring_cmd.c | 28 +++
> 15 files changed, 720 insertions(+), 46 deletions(-)
>
> --
> 2.42.0
>
>
Thread overview: 5+ messages
2024-08-08 16:24 [PATCH V5 0/8] io_uring: support sqe group and provide group kbuf Ming Lei
2024-09-25 3:32 ` Akilesh Kailash
-- strict thread matches above, loose matches on Subject: below --
2024-08-08 16:24 Ming Lei
2024-08-17 4:16 ` Ming Lei
2024-08-17 19:48 ` Pavel Begunkov