* [PATCH v9 0/1] io_uring: releasing CPU resources when polling
From: hexue @ 2024-11-01  9:19 UTC
To: axboe, asml.silence; +Cc: io-uring, linux-kernel, hexue

This patch adds a new hybrid poll at the io_uring level. It also exposes a
setup flag, "IORING_SETUP_HYBRID_IOPOLL", to give applications an interface
for enabling hybrid polling.

Hybrid poll may be appropriate for workloads whose performance bottleneck
comes from CPU resource constraints, such as some database applications.
Under high concurrency, polling takes up a lot of CPU time, and operations
such as calculation and processing also have to compete for CPU time.

The MultiRead interface of RocksDB has been adapted to io_uring. db_bench
was used to construct a situation with high CPU pressure and to compare the
performance. The test configuration is as follows:

-------------------------------------------------------------------
CPU Model       Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz
CPU Cores       8
Memory          16G
SSD             Samsung PM9A3
-------------------------------------------------------------------

Test case:
./db_bench --benchmarks=multireadrandom,stats
--duration=60
--threads=4/8/16
--use_direct_reads=true
--db=/mnt/rocks/test_db
--wal_dir=/mnt/rocks/test_db
--key_size=4
--value_size=4096
-cache_size=0
-use_existing_db=1
-batch_size=256
-multiread_batched=true
-multiread_stride=0

---------------------------------------------------------------
Test result:
                National        Optimization
threads         ops/sec         ops/sec         CPU Utilization
16              121953          160233          100%*8
8               120198          116087          90%*8
4               61302           59105           90%*8
---------------------------------------------------------------

Version 9 of the patch makes the following changes:

1. Change some member and function names.

2. Avoid the expansion of the io_kiocb structure. After checking, the
   hash_node structure is used in asynchronous poll, while iopoll only
   supports direct IO to disk. These two paths are different and are never
   used simultaneously, which I also confirmed in the code. So I shared
   this space with iopoll_start.

	union {
		/*
		 * for polled requests, i.e. IORING_OP_POLL_ADD and async armed
		 * poll
		 */
		struct hlist_node	hash_node;
		/* For IOPOLL setup queues, with hybrid polling */
		u64                     iopoll_start;
	};

3. Avoid the expansion of the io_ring_ctx structure. Although there is an
   8-byte hole in the first sub-structure, it holds mostly constants and
   read-only hot data that do not change, so its cache line rarely needs
   to be written back. hybrid_poll_time, which records the run time, may
   be modified many times. So I put it in the second sub-structure
   (submission data), which still has 24 bytes of free space and whose own
   members are also modified.

4. Add the poll_state identity to the flags of req.

	/* every req only blocks once in hybrid poll */
	REQ_F_IOPOLL_STATE = IO_REQ_FLAG(REQ_F_HYBRID_IOPOLL_STATE_BIT)

--
changes since v7:
- rebase code on for-6.12/io_uring
- remove unused variables

changes since v6:
- Modified IO path, distinct iopoll and uring_cmd_iopoll
- update test results

changes since v5:
- Remove cstime recorder
- Use minimize sleep time in different drivers
- Use the half of whole runtime to do schedule
- Consider as a suboptimal solution between regular poll and IRQ

changes since v4:
- Rewrote the commit
- Update the test results
- Reorganized the code based on 6.11

changes since v3:
- Simplified the commit
- Add some comments on code

changes since v2:
- Modified some formatting errors
- Move judgement to poll path

changes since v1:
- Extend hybrid poll to async polled io

hexue (1):
  io_uring: releasing CPU resources when polling

 include/linux/io_uring_types.h | 19 ++++++-
 include/uapi/linux/io_uring.h  |  3 ++
 io_uring/io_uring.c            |  8 ++-
 io_uring/rw.c                  | 92 ++++++++++++++++++++++++++++++----
 4 files changed, 108 insertions(+), 14 deletions(-)

-- 
2.40.1
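
The space-sharing argument in point 2 of the cover letter can be checked in
user space. The following is only a sketch under stated assumptions:
`hlist_node_model`, `req_before`, `req_after`, `req_growth` and
`check_layout` are hypothetical stand-ins written for illustration, not
kernel definitions. The model relies only on the fact that the kernel's
struct hlist_node consists of two pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct hlist_node: two pointers. */
struct hlist_node_model {
	struct hlist_node_model *next;
	struct hlist_node_model **pprev;
};

/* Hypothetical "before" layout: the hash node on its own. */
struct req_before {
	struct hlist_node_model hash_node;
};

/* Hypothetical "after" layout: the timestamp shares the node's storage. */
struct req_after {
	union {
		struct hlist_node_model hash_node;
		uint64_t iopoll_start;
	};
};

/* Number of bytes the union adds compared with the plain member. */
static size_t req_growth(void)
{
	return sizeof(struct req_after) - sizeof(struct req_before);
}

/* Aborts if sharing the storage made the structure larger. */
static void check_layout(void)
{
	assert(req_growth() == 0);
}
```

Because a union is as large as its largest member, the 8-byte timestamp
fits inside the storage already reserved for the two pointers, so in this
model the structure does not grow.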
* [PATCH v9 1/1] io_uring: releasing CPU resources when polling
From: hexue @ 2024-11-01  9:19 UTC
To: axboe, asml.silence; +Cc: io-uring, linux-kernel, hexue

A new hybrid poll is implemented at the io_uring layer. Once an IO is
issued, it does not poll immediately, but blocks first and wakes up before
the IO completes, then polls to reap the IO. This poll function could be a
suboptimal solution when running on a single thread: it offers performance
lower than regular polling but higher than IRQ, and CPU utilization is also
lower than polling.

Signed-off-by: hexue <[email protected]>
---
 include/linux/io_uring_types.h | 19 ++++++-
 include/uapi/linux/io_uring.h  |  3 ++
 io_uring/io_uring.c            |  8 ++-
 io_uring/rw.c                  | 92 ++++++++++++++++++++++++++++++----
 4 files changed, 108 insertions(+), 14 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4b9ba523978d..4a85a823b888 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -302,6 +302,11 @@ struct io_ring_ctx {
 		 * ->uring_cmd() by io_uring_cmd_insert_cancelable()
 		 */
 		struct hlist_head	cancelable_uring_cmd;
+		/*
+		 * For Hybrid IOPOLL, runtime in hybrid polling, without
+		 * scheduling time
+		 */
+		u64			hybrid_poll_time;
 	} ____cacheline_aligned_in_smp;
 
 	struct {
@@ -447,6 +452,7 @@ enum {
 	REQ_F_LINK_TIMEOUT_BIT,
 	REQ_F_NEED_CLEANUP_BIT,
 	REQ_F_POLLED_BIT,
+	REQ_F_HYBRID_IOPOLL_STATE_BIT,
 	REQ_F_BUFFER_SELECTED_BIT,
 	REQ_F_BUFFER_RING_BIT,
 	REQ_F_REISSUE_BIT,
@@ -506,6 +512,8 @@ enum {
 	REQ_F_NEED_CLEANUP	= IO_REQ_FLAG(REQ_F_NEED_CLEANUP_BIT),
 	/* already went through poll handler */
 	REQ_F_POLLED		= IO_REQ_FLAG(REQ_F_POLLED_BIT),
+	/* every req only blocks once in hybrid poll */
+	REQ_F_IOPOLL_STATE	= IO_REQ_FLAG(REQ_F_HYBRID_IOPOLL_STATE_BIT),
 	/* buffer already selected */
 	REQ_F_BUFFER_SELECTED	= IO_REQ_FLAG(REQ_F_BUFFER_SELECTED_BIT),
 	/* buffer selected from ring, needs commit */
@@ -643,8 +651,15 @@ struct io_kiocb {
 	atomic_t			refs;
 	bool				cancel_seq_set;
 	struct io_task_work		io_task_work;
-	/* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */
-	struct hlist_node		hash_node;
+	union {
+		/*
+		 * for polled requests, i.e. IORING_OP_POLL_ADD and async armed
+		 * poll
+		 */
+		struct hlist_node	hash_node;
+		/* For IOPOLL setup queues, with hybrid polling */
+		u64			iopoll_start;
+	};
 	/* internal polling, see IORING_FEAT_FAST_POLL */
 	struct async_poll		*apoll;
 	/* opcode allocated if it needs to store data for async defer */
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 1fe79e750470..ddd6e42b134d 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -200,6 +200,9 @@ enum io_uring_sqe_flags_bit {
  */
 #define IORING_SETUP_NO_SQARRAY		(1U << 16)
 
+/* Use hybrid poll in iopoll process */
+#define IORING_SETUP_HYBRID_IOPOLL	(1U << 17)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4199fbe6ce13..ed131fc824a0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -301,6 +301,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 		goto err;
 
 	ctx->flags = p->flags;
+	ctx->hybrid_poll_time = LLONG_MAX;
 	atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
 	init_waitqueue_head(&ctx->sqo_sq_wait);
 	INIT_LIST_HEAD(&ctx->sqd_list);
@@ -3545,6 +3546,11 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	ctx->clockid = CLOCK_MONOTONIC;
 	ctx->clock_offset = 0;
 
+	/* HYBRID_IOPOLL only valid with IOPOLL */
+	if ((ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_HYBRID_IOPOLL)) ==
+			IORING_SETUP_HYBRID_IOPOLL)
+		return -EINVAL;
+
 	if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) &&
 	    !(ctx->flags & IORING_SETUP_IOPOLL) &&
 	    !(ctx->flags & IORING_SETUP_SQPOLL))
@@ -3724,7 +3730,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
 			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
 			IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY |
-			IORING_SETUP_NO_SQARRAY))
+			IORING_SETUP_NO_SQARRAY | IORING_SETUP_HYBRID_IOPOLL))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index f023ff49c688..340dc4b7b84f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -808,6 +808,11 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
 		req->iopoll_completed = 0;
+		if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL) {
+			/* make sure every req only blocks once*/
+			req->flags &= ~REQ_F_IOPOLL_STATE;
+			req->iopoll_start = ktime_get_ns();
+		}
 	} else {
 		if (kiocb->ki_flags & IOCB_HIPRI)
 			return -EINVAL;
@@ -1112,6 +1117,78 @@ void io_rw_fail(struct io_kiocb *req)
 	io_req_set_res(req, res, req->cqe.flags);
 }
 
+static int io_uring_classic_poll(struct io_kiocb *req, struct io_comp_batch *iob,
+				unsigned int poll_flags)
+{
+	struct file *file = req->file;
+
+	if (req->opcode == IORING_OP_URING_CMD) {
+		struct io_uring_cmd *ioucmd;
+
+		ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
+		return file->f_op->uring_cmd_iopoll(ioucmd, iob, poll_flags);
+	} else {
+		struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+		return file->f_op->iopoll(&rw->kiocb, iob, poll_flags);
+	}
+}
+
+static u64 io_hybrid_iopoll_delay(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+	struct hrtimer_sleeper timer;
+	enum hrtimer_mode mode;
+	ktime_t kt;
+	u64 sleep_time;
+
+	if (req->flags & REQ_F_IOPOLL_STATE)
+		return 0;
+
+	if (ctx->hybrid_poll_time == LLONG_MAX)
+		return 0;
+
+	/* Using half the running time to do schedule */
+	sleep_time = ctx->hybrid_poll_time / 2;
+
+	kt = ktime_set(0, sleep_time);
+	req->flags |= REQ_F_IOPOLL_STATE;
+
+	mode = HRTIMER_MODE_REL;
+	hrtimer_init_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode);
+	hrtimer_set_expires(&timer.timer, kt);
+	set_current_state(TASK_INTERRUPTIBLE);
+	hrtimer_sleeper_start_expires(&timer, mode);
+
+	if (timer.task)
+		io_schedule();
+
+	hrtimer_cancel(&timer.timer);
+	__set_current_state(TASK_RUNNING);
+	destroy_hrtimer_on_stack(&timer.timer);
+	return sleep_time;
+}
+
+static int io_uring_hybrid_poll(struct io_kiocb *req,
+				struct io_comp_batch *iob, unsigned int poll_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	u64 runtime, sleep_time;
+	int ret;
+
+	sleep_time = io_hybrid_iopoll_delay(ctx, req);
+	ret = io_uring_classic_poll(req, iob, poll_flags);
+	runtime = ktime_get_ns() - req->iopoll_start - sleep_time;
+
+	/*
+	 * Use minimum sleep time if we're polling devices with different
+	 * latencies. We could get more completions from the faster ones.
+	 */
+	if (ctx->hybrid_poll_time > runtime)
+		ctx->hybrid_poll_time = runtime;
+
+	return ret;
+}
+
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 {
 	struct io_wq_work_node *pos, *start, *prev;
@@ -1128,7 +1205,6 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 
 	wq_list_for_each(pos, start, &ctx->iopoll_list) {
 		struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
-		struct file *file = req->file;
 		int ret;
 
 		/*
@@ -1139,17 +1215,11 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
-		if (req->opcode == IORING_OP_URING_CMD) {
-			struct io_uring_cmd *ioucmd;
-
-			ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-			ret = file->f_op->uring_cmd_iopoll(ioucmd, &iob,
-								poll_flags);
-		} else {
-			struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+		if (ctx->flags & IORING_SETUP_HYBRID_IOPOLL)
+			ret = io_uring_hybrid_poll(req, &iob, poll_flags);
+		else
+			ret = io_uring_classic_poll(req, &iob, poll_flags);
 
-			ret = file->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
-		}
 		if (unlikely(ret < 0))
 			return ret;
 		else if (ret)
-- 
2.40.1
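
The interplay between the one-time sleep and the tracked minimum runtime in
the patch above can be followed with a small user-space model. This is a
sketch only: `hybrid_ctx_model`, `hybrid_req_model`, `model_delay` and
`model_account` are hypothetical names, timestamps are passed in by the
caller instead of being read with ktime_get_ns(), and no real sleeping or
polling takes place. It mirrors the arithmetic of io_hybrid_iopoll_delay()
and io_uring_hybrid_poll(), nothing more.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the LLONG_MAX initial value: no runtime has been observed yet. */
#define POLL_TIME_UNSET ((uint64_t)INT64_MAX)

struct hybrid_ctx_model {
	uint64_t hybrid_poll_time;	/* smallest observed runtime, in ns */
};

struct hybrid_req_model {
	uint64_t iopoll_start;		/* issue timestamp, in ns */
	int slept;			/* models REQ_F_IOPOLL_STATE */
};

/* Time the request would sleep before polling; 0 means poll right away. */
static uint64_t model_delay(const struct hybrid_ctx_model *ctx,
			    struct hybrid_req_model *req)
{
	if (req->slept)
		return 0;		/* every request sleeps at most once */
	if (ctx->hybrid_poll_time == POLL_TIME_UNSET)
		return 0;		/* nothing measured yet */

	req->slept = 1;
	return ctx->hybrid_poll_time / 2;	/* half the known runtime */
}

/* After polling returned at time `now`, keep the minimum runtime seen. */
static void model_account(struct hybrid_ctx_model *ctx,
			  const struct hybrid_req_model *req,
			  uint64_t now, uint64_t sleep_time)
{
	uint64_t runtime;

	/* Caller must pass a timestamp taken after the sleep finished. */
	assert(now >= req->iopoll_start + sleep_time);
	runtime = now - req->iopoll_start - sleep_time;

	if (ctx->hybrid_poll_time > runtime)
		ctx->hybrid_poll_time = runtime;
}
```

In this model the first request never sleeps, because no runtime is known
yet. Later requests sleep for half of the smallest runtime recorded so far,
and the sleep itself is subtracted before the runtime is compared, which
matches the "runtime in hybrid polling, without scheduling time" comment in
the patch.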
* Re: [PATCH v9 1/1] io_uring: releasing CPU resources when polling
From: Jens Axboe @ 2024-11-01 14:06 UTC
To: hexue, asml.silence; +Cc: io-uring, linux-kernel

On 11/1/24 3:19 AM, hexue wrote:
> A new hybrid poll is implemented at the io_uring layer. Once an IO is
> issued, it does not poll immediately, but blocks first and wakes up before
> the IO completes, then polls to reap the IO. This poll function could be a
> suboptimal solution when running on a single thread: it offers performance
> lower than regular polling but higher than IRQ, and CPU utilization is also
> lower than polling.

This looks much better now. Do you have a patch for liburing to enable
testing of hybrid polling as well? Don't care about perf numbers for
that, but it should get exercised.

-- 
Jens Axboe
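
Any test that exercises hybrid polling would presumably have to respect the
setup-flag rule added in io_uring_create(): the hybrid flag is rejected
unless IORING_SETUP_IOPOLL is set as well. The sketch below models only
that check in user space. `MODEL_SETUP_*`, `model_check_setup_flags` and
`model_flags_selfcheck` are hypothetical names; the bit values are copied
from include/uapi/linux/io_uring.h as shown in the patch, and no io_uring
system call is made.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit values as in include/uapi/linux/io_uring.h. */
#define MODEL_SETUP_IOPOLL		(1U << 0)
#define MODEL_SETUP_HYBRID_IOPOLL	(1U << 17)

/* Returns 0 if the combination is accepted, -EINVAL otherwise. */
static int model_check_setup_flags(uint32_t flags)
{
	/* HYBRID_IOPOLL is only valid together with IOPOLL. */
	if ((flags & (MODEL_SETUP_IOPOLL | MODEL_SETUP_HYBRID_IOPOLL)) ==
	    MODEL_SETUP_HYBRID_IOPOLL)
		return -EINVAL;
	return 0;
}

/* Walks the four combinations of the two flags. */
static void model_flags_selfcheck(void)
{
	assert(model_check_setup_flags(0) == 0);
	assert(model_check_setup_flags(MODEL_SETUP_IOPOLL) == 0);
	assert(model_check_setup_flags(MODEL_SETUP_HYBRID_IOPOLL) == -EINVAL);
	assert(model_check_setup_flags(MODEL_SETUP_IOPOLL |
				       MODEL_SETUP_HYBRID_IOPOLL) == 0);
}
```

Masking with both bits and comparing against the hybrid bit alone singles
out the one invalid case, hybrid without IOPOLL, in a single comparison.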
* Re: [PATCH v9 0/1] io_uring: releasing CPU resources when polling
From: Jens Axboe @ 2024-11-01 14:21 UTC
To: hexue, asml.silence; +Cc: io-uring, linux-kernel

On 11/1/24 3:19 AM, hexue wrote:
> changes since v7:
> - rebase code on for-6.12/io_uring

Though not sure why you'd base it on a branch that's long dead,
for-6.13/io_uring is the appropriate branch. for-6.12/io_uring was
things queued up for 6.12, it went extinct as soon as the merge window
opened for 6.12 and it got queued up. Not a big deal as it can get hand
applied, but new features should always get based on the branch for the
next kernel, not the previous one.

-- 
Jens Axboe
* Re: [PATCH v9 0/1] io_uring: releasing CPU resources when polling
From: Jens Axboe @ 2024-11-01 15:18 UTC
To: asml.silence, hexue; +Cc: io-uring, linux-kernel

On Fri, 01 Nov 2024 17:19:56 +0800, hexue wrote:
> This patch adds a new hybrid poll at the io_uring level. It also exposes a
> setup flag, "IORING_SETUP_HYBRID_IOPOLL", to give applications an interface
> for enabling hybrid polling.
>
> Hybrid poll may be appropriate for workloads whose performance bottleneck
> comes from CPU resource constraints, such as some database applications.
> Under high concurrency, polling takes up a lot of CPU time, and operations
> such as calculation and processing also have to compete for CPU time.
>
> [...]

Applied, thanks!

[1/1] io_uring: releasing CPU resources when polling
      commit: 71b51c2fb200c502626e433ac7e22bcb8a3ae00c

Best regards,
-- 
Jens Axboe