* [PATCH] io_uring: releasing CPU resources when polling
@ 2024-03-18  9:00 ` Xue

From: Xue
To: axboe
Cc: asml.silence, linux-kernel, io-uring, peiwei.li, joshi.k,
    kundan.kumar, anuj20.g, wenwen.chen, ruyi.zhang, xiaobing.li,
    cliang01.li, xue01.he

From: hexue <[email protected]>

This patch is intended to reduce the CPU usage of io_uring in polling
mode. When io_uring is configured to poll but the underlying device has
no poll queues, the IOs it polls for will actually complete via
interrupts; this patch optimizes CPU usage in that case.

The patch is implemented as follows:

- Get the poll queue information of the underlying device in io_uring.
- If the device has no poll queues, IO completions arrive as
  interrupts, so io_uring's polling loop keeps spinning on nothing and
  wastes CPU. Instead, let the task release the CPU until the IO
  completes.
- Record the running time and context-switch time of each IO, and use
  these to decide whether the task should schedule out. For consecutive
  IOs, each IO is judged against the times recorded for the previous
  one, and the task only schedules out when the expected idle time
  exceeds 1us, so performance is not compromised.
- Adapt to different devices. Because the timing is recorded at
  runtime and each device processes IO at a different speed, the CPU
  savings vary per device, but performance will not degrade.
- Provide an interface so the application can choose whether to enable
  this feature. A setup flag is added; when the application sets it,
  the optimization is applied only if the conditions above hold, i.e.
  io_uring is polling and the underlying device has no poll queues,
  meaning the user prefers the low-CPU-cost approach for IO. If the
  application does not set the flag, behavior is identical to native
  io_uring even when the conditions are met.

The CPU optimization of this patch was tested as follows: at the peak
of the disk's spec, performance is not significantly lower than the
native kernel, while per-CPU usage drops by about 50% for sequential
read/write and by ~80% for random write. The optimization does not
affect IO performance in other cases; e.g. when both the underlying
device and io_uring allow polling, performance is unchanged by this
patch.
- test tool: Fio 3.35, 8 core VM
- test method: run performance tests on bare disks and use the htop
  tool to observe CPU utilization
- Fio peak workload command:

  [global]
  ioengine=io_uring
  norandommap=1
  randrepeat=0
  refill_buffers
  group_reporting
  ramp_time=30s
  time_based
  runtime=1m
  filename=/dev/nvme0n1
  hipri=1
  direct=1
  iodepth=64

  [disk0]               | [disk0]
  bs=4k                 | bs=128k
  numjobs=16            | numjobs=1
  rw=randread/randwrite | rw=read/write

- Detailed test results

  |--------------|-------|---------------------|
  | rw=read      | BW/s  | per CPU utilization |
  | Native       | 10.9G | 100%                |
  | Optimization | 10.8G | 44%                 |
  |--------------|-------|---------------------|
  | rw=write     | BW/s  | per CPU utilization |
  | Native       | 6175  | 100%                |
  | Optimization | 6150  | 32%                 |
  |--------------|-------|---------------------|
  | rw=randread  | KIOPS | per CPU utilization |
  | Native       | 1680  | 100%                |
  | Optimization | 1608  | 100%                |
  |--------------|-------|---------------------|
  | rw=randwrite | KIOPS | per CPU utilization |
  | Native       | 225   | 100%                |
  | Optimization | 225   | 15%                 |
  |--------------|-------|---------------------|

Signed-off-by: hexue <[email protected]>
---
 include/linux/io_uring_types.h | 12 ++++++++
 include/uapi/linux/io_uring.h  |  1 +
 io_uring/io_uring.c            | 47 ++++++++++++++++++++++++++++-
 io_uring/io_uring.h            |  2 ++
 io_uring/rw.c                  | 55 ++++++++++++++++++++++++++++++++++
 5 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 854ad67a5f70..55d22f6e1eb0 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -224,6 +224,12 @@ struct io_alloc_cache {
 	size_t elem_size;
 };
 
+struct iopoll_info {
+	bool poll_state;
+	long last_runtime;
+	long last_irqtime;
+};
+
 struct io_ring_ctx {
 	/* const or read-mostly hot data */
 	struct {
@@ -421,6 +427,7 @@ struct io_ring_ctx {
 	unsigned short n_sqe_pages;
 	struct page **ring_pages;
 	struct page **sqe_pages;
+	struct xarray poll_array;
 };
 
 struct io_tw_state {
@@ -641,6 +648,11 @@ struct io_kiocb {
 		u64 extra1;
 		u64 extra2;
 	} big_cqe;
+	/* for adaptive iopoll */
+	int poll_flag;
+	bool poll_state;
+	struct timespec64 iopoll_start;
+	struct timespec64 iopoll_end;
 };
 
 struct io_overflow_cqe {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 7a673b52827b..cd11a7786c51 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -198,6 +198,7 @@ enum {
  * Removes indirection through the SQ index array.
 */
 #define IORING_SETUP_NO_SQARRAY		(1U << 16)
+#define IORING_SETUP_NO_POLLQUEUE	(1U << 17)
 
 enum io_uring_op {
 	IORING_OP_NOP,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index cd9a137ad6ce..9609acc60868 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -79,6 +79,8 @@
 
 #include <uapi/linux/io_uring.h>
 
+#include <linux/time.h>
+#include <linux/timekeeping.h>
 #include "io-wq.h"
 #include "io_uring.h"
@@ -122,6 +124,9 @@
 #define IO_COMPL_BATCH			32
 #define IO_REQ_ALLOC_BATCH		8
 
+#define IO_POLL_QUEUE		1
+#define IO_NO_POLL_QUEUE	0
+
 enum {
 	IO_CHECK_CQ_OVERFLOW_BIT,
 	IO_CHECK_CQ_DROPPED_BIT,
@@ -311,6 +316,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 		goto err;
 
 	ctx->flags = p->flags;
+	xa_init(&ctx->poll_array);
 	atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
 	init_waitqueue_head(&ctx->sqo_sq_wait);
 	INIT_LIST_HEAD(&ctx->sqd_list);
@@ -1875,11 +1881,32 @@ static bool io_assign_file(struct io_kiocb *req, const struct io_issue_def *def,
 	return !!req->file;
 }
 
+/* Get poll queue information of the device */
+int get_poll_queue_state(struct io_kiocb *req)
+{
+	struct block_device *bdev;
+	struct request_queue *q;
+	struct inode *inode;
+
+	inode = req->file->f_inode;
+	if (!inode->i_rdev)
+		return 1;
+	bdev = blkdev_get_no_open(inode->i_rdev);
+	q = bdev->bd_queue;
+	if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) {
+		return IO_NO_POLL_QUEUE;
+	} else {
+		return IO_POLL_QUEUE;
+	}
+}
+
 static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 {
 	const struct io_issue_def *def = &io_issue_defs[req->opcode];
 	const struct cred *creds = NULL;
+	struct io_ring_ctx *ctx = req->ctx;
 	int ret;
+	u32 index;
 
 	if (unlikely(!io_assign_file(req, def, issue_flags)))
 		return -EBADF;
@@ -1890,6 +1917,21 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 	if (!def->audit_skip)
 		audit_uring_entry(req->opcode);
 
+	if (ctx->flags & IORING_SETUP_NO_POLLQUEUE) {
+		index = req->file->f_inode->i_rdev;
+		struct iopoll_info *entry = xa_load(&ctx->poll_array, index);
+
+		if (!entry) {
+			entry = kmalloc(sizeof(struct iopoll_info), GFP_KERNEL);
+			entry->poll_state = get_poll_queue_state(req);
+			entry->last_runtime = 0;
+			entry->last_irqtime = 0;
+			xa_store(&ctx->poll_array, index, entry, GFP_KERNEL);
+		}
+		req->poll_state = entry->poll_state;
+		ktime_get_ts64(&req->iopoll_start);
+	}
+
 	ret = def->issue(req, issue_flags);
 
 	if (!def->audit_skip)
@@ -2176,6 +2218,8 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
 	req->file = NULL;
 	req->rsrc_node = NULL;
 	req->task = current;
+	req->poll_flag = 0;
+	req->poll_state = 1;
 
 	if (unlikely(opcode >= IORING_OP_LAST)) {
 		req->opcode = 0;
@@ -2921,6 +2965,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	kfree(ctx->cancel_table_locked.hbs);
 	kfree(ctx->io_bl);
 	xa_destroy(&ctx->io_bl_xa);
+	xa_destroy(&ctx->poll_array);
 	kfree(ctx);
 }
 
@@ -4050,7 +4095,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
 			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
 			IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY |
-			IORING_SETUP_NO_SQARRAY))
+			IORING_SETUP_NO_SQARRAY | IORING_SETUP_NO_POLLQUEUE))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index d5495710c178..4281a0bb7ed9 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -125,6 +125,8 @@ static inline void io_req_task_work_add(struct io_kiocb *req)
 	__io_req_task_work_add(req, 0);
 }
 
+#define LEFT_TIME	3000
+
 #define io_for_each_link(pos, head) \
 	for (pos = (head); pos; pos = pos->link)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index d5e79d9bdc71..db589a2cf659 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -1118,6 +1118,44 @@ void io_rw_fail(struct io_kiocb *req)
 	io_req_set_res(req, res, req->cqe.flags);
 }
 
+void io_delay(struct io_kiocb *req, struct iopoll_info *entry)
+{
+	struct hrtimer_sleeper timer;
+	ktime_t kt;
+	struct timespec64 tc, oldtc;
+	enum hrtimer_mode mode;
+	long sleep_ti;
+
+	if (req->poll_flag == 1)
+		return;
+
+	if (entry->last_runtime <= entry->last_irqtime ||
+	    (entry->last_runtime - entry->last_irqtime) < LEFT_TIME)
+		return;
+
+	req->poll_flag = 1;
+	ktime_get_ts64(&oldtc);
+	sleep_ti = (entry->last_runtime - entry->last_irqtime) / 2;
+	kt = ktime_set(0, sleep_ti);
+
+	mode = HRTIMER_MODE_REL;
+	hrtimer_init_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode);
+	hrtimer_set_expires(&timer.timer, kt);
+
+	set_current_state(TASK_UNINTERRUPTIBLE);
+	hrtimer_sleeper_start_expires(&timer, mode);
+	if (timer.task) {
+		io_schedule();
+	}
+	hrtimer_cancel(&timer.timer);
+	mode = HRTIMER_MODE_ABS;
+
+	__set_current_state(TASK_RUNNING);
+	destroy_hrtimer_on_stack(&timer.timer);
+
+	ktime_get_ts64(&tc);
+	entry->last_irqtime = tc.tv_nsec - oldtc.tv_nsec - sleep_ti;
+}
+
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 {
 	struct io_wq_work_node *pos, *start, *prev;
@@ -1136,12 +1174,28 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
 		struct file *file = req->file;
 		int ret;
+		u32 index = file->f_inode->i_rdev;
 
 		/*
 		 * Move completed and retryable entries to our local lists.
 		 * If we find a request that requires polling, break out
 		 * and complete those lists first, if we have entries there.
 		 */
+
+		if ((ctx->flags & IORING_SETUP_NO_POLLQUEUE) && !req->poll_state) {
+			struct iopoll_info *entry = xa_load(&ctx->poll_array, index);
+
+			do {
+				if (READ_ONCE(req->iopoll_completed)) {
+					ktime_get_ts64(&req->iopoll_end);
+					entry->last_runtime = req->iopoll_end.tv_nsec - req->iopoll_start.tv_nsec;
+					break;
+				}
+				io_delay(req, entry);
+			} while (1);
+			goto complete;
+		}
+
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
@@ -1172,6 +1226,7 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 	else if (!pos)
 		return 0;
 
+complete:
 	prev = start;
 	wq_list_for_each_resume(pos, prev) {
 		struct io_kiocb *req = container_of(pos, struct io_kiocb, comp_list);
-- 
2.34.1
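[Aside for readers of this archive: a minimal userspace sketch of how an
application could opt in to the behavior proposed above, assuming
liburing. IORING_SETUP_NO_POLLQUEUE exists only in this patch, so it is
defined by hand here; everything else is standard liburing.]

#include <liburing.h>
#include <stdio.h>

/* Assumed from this patch; not in released uapi headers. */
#ifndef IORING_SETUP_NO_POLLQUEUE
#define IORING_SETUP_NO_POLLQUEUE	(1U << 17)
#endif

int main(void)
{
	struct io_uring ring;
	int ret;

	/* IOPOLL as usual, plus the opt-in flag: on devices without
	 * poll queues the kernel may sleep instead of spinning. */
	ret = io_uring_queue_init(64, &ring,
				  IORING_SETUP_IOPOLL |
				  IORING_SETUP_NO_POLLQUEUE);
	if (ret < 0) {
		fprintf(stderr, "queue_init: %d\n", ret);
		return 1;
	}
	/* ... submit O_DIRECT reads/writes and reap completions ... */
	io_uring_queue_exit(&ring);
	return 0;
}

On an unpatched kernel io_uring_queue_init() fails here with -EINVAL,
since io_uring_setup() rejects unknown setup flags.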
* Re: io_uring: releasing CPU resources when polling
@ 2024-03-26  3:23 ` Xue

From: Xue
To: axboe
Cc: asml.silence, linux-kernel, io-uring, peiwei.li, joshi.k,
    kundan.kumar, anuj20.g, wenwen.chen, ruyi.zhang, xiaobing.li,
    cliang01.li, xue01.he

Hi,

I hope this message finds you well.

I'm writing to follow up on the patch I submitted on March 18, titled
"io_uring: releasing CPU resources when polling".

I haven't received feedback yet and was wondering if you have had a
chance to look at it. Any guidance or suggestions you could provide
would be greatly appreciated.

Thanks,
Xue He
* Re: io_uring: releasing CPU resources when polling
@ 2024-03-26  3:39 ` Jens Axboe

From: Jens Axboe
To: Xue
Cc: asml.silence, linux-kernel, io-uring, peiwei.li, joshi.k,
    kundan.kumar, anuj20.g, wenwen.chen, ruyi.zhang, xiaobing.li,
    cliang01.li

On 3/25/24 9:23 PM, Xue wrote:
> Hi,
>
> I hope this message finds you well.
>
> I'm writing to follow up on the patch I submitted on March 18, titled
> "io_uring: releasing CPU resources when polling".
>
> I haven't received feedback yet and was wondering if you have had a
> chance to look at it. Any guidance or suggestions you could provide
> would be greatly appreciated.

I did take a look at it, and I have to be honest - I don't like it at
all. It's a lot of expensive code in the fast path, for a problem that
should not really exist. The system is misconfigured if you're doing
polled IO for devices that don't have a poll queue. At some point the
block layer returned -EOPNOTSUPP for that, and honestly I think that's
a MUCH better solution than adding expensive code in the fast path for
something that is really a badly configured setup.

-- 
Jens Axboe
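[Aside: the misconfiguration described above can be checked from
userspace, since QUEUE_FLAG_POLL is exported through sysfs as the
queue's io_poll attribute; on NVMe, poll queues are typically allocated
with the nvme module parameter poll_queues=N. A small sketch of such a
check, where the device name nvme0n1 is only an example:]

#include <stdio.h>

/* Read /sys/block/<dev>/queue/io_poll: "1" means the queue has
 * QUEUE_FLAG_POLL set, i.e. polled IO is actually supported. */
static int queue_has_poll(const char *dev)
{
	char path[256], buf[8];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/io_poll", dev);
	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such device or attribute */
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return buf[0] == '1';
}

int main(void)
{
	printf("io_poll: %d\n", queue_has_poll("nvme0n1"));
	return 0;
}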
* Re: io_uring: releasing CPU resources when polling
@ 2024-04-18  6:07 ` hexue

From: hexue
To: axboe
Cc: anuj20.g, asml.silence, cliang01.li, io-uring, joshi.k,
    kundan.kumar, linux-kernel, peiwei.li, ruyi.zhang, wenwen.chen,
    xiaobing.li, xue01.he

On 3/26/24 3:39, Jens Axboe wrote:
> I did take a look at it, and I have to be honest - I don't like it at
> all. It's a lot of expensive code in the fast path, for a problem that
> should not really exist. The system is misconfigured if you're doing
> polled IO for devices that don't have a poll queue. At some point the
> block layer returned -EOPNOTSUPP for that, and honestly I think that's
> a MUCH better solution than adding expensive code in the fast path for
> something that is really a badly configured setup.

Sorry for my late reply. If doing polled IO on devices without a poll
queue is just a misconfiguration that doesn't warrant changes, then I'm
inclined to extend this approach to all devices. I think it's an
effective way to release CPU resources; I verified this and found it
brings a very good benefit. I have also reduced the code in the fast
path. I will send a v2 with my test results; please reconsider the
feasibility of this solution.

--
Xue
* [PATCH v6] io_uring: releasing CPU resources when polling
@ 2024-07-09  8:16 hexue

From: hexue
To: axboe
Cc: asml.silence, io-uring, linux-kernel, hexue

Using io_uring in polling mode can improve IO performance, but polling
consumes 100% of a CPU. This adds a setup flag, IORING_SETUP_HY_POLL,
that gives applications an interface to enable a new hybrid polling
mode at the io_uring level.

A new hybrid poll is implemented at the io_uring layer. Once an IO is
issued, the task does not poll immediately; it blocks first and wakes
up shortly before the IO completes, then polls to reap it. On a single
thread this is a deliberate middle ground: performance is lower than
regular polling but higher than IRQ mode, and CPU utilization is also
lower than regular polling.

Test Result
fio-3.35, Gen 4 device

-------------------------------------------------------------------------------------
Performance
-------------------------------------------------------------------------------------
               write          read           randwrite   randread
regular poll   BW=3939MiB/s   BW=6596MiB/s   IOPS=190K   IOPS=526K
IRQ            BW=3927MiB/s   BW=6567MiB/s   IOPS=181K   IOPS=216K
hybrid poll    BW=3933MiB/s   BW=6600MiB/s   IOPS=190K   IOPS=390K (suboptimal)
-------------------------------------------------------------------------------------

CPU Utilization
------------------------------------------------------------------
               write   read   randwrite   randread
regular poll   100%    100%   100%        100%
IRQ            38%     53%    100%        100%
hybrid poll    76%     32%    70%         85%
------------------------------------------------------------------

--
changes since v5:
- Remove cstime recorder
- Use minimum sleep time across different drivers
- Use half of the whole runtime to schedule
- Position it as a suboptimal solution between regular poll and IRQ

changes since v4:
- Rewrote the commit
- Updated the test results
- Reorganized the code based on 6.11

changes since v3:
- Simplified the commit
- Added some comments on the code

changes since v2:
- Fixed some formatting errors
- Moved the judgement to the poll path

changes since v1:
- Extended hybrid poll to async polled IO

Signed-off-by: hexue <[email protected]>
---
 include/linux/io_uring_types.h |  6 +++
 include/uapi/linux/io_uring.h  |  1 +
 io_uring/io_uring.c            |  3 +-
 io_uring/rw.c                  | 74 +++++++++++++++++++++++++++++++++-
 4 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 91224bbcfa73..0897126fb2d7 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -428,6 +428,8 @@ struct io_ring_ctx {
 	unsigned short n_sqe_pages;
 	struct page **ring_pages;
 	struct page **sqe_pages;
+	/* for hybrid poll */
+	u64 available_time;
 };
 
 struct io_tw_state {
@@ -665,6 +667,10 @@ struct io_kiocb {
 		u64 extra1;
 		u64 extra2;
 	} big_cqe;
+	/* for hybrid iopoll */
+	bool poll_state;
+	u64 iopoll_start;
+	u64 iopoll_end;
 };
 
 struct io_overflow_cqe {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 994bf7af0efe..ef32ec319d1f 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -199,6 +199,7 @@ enum io_uring_sqe_flags_bit {
  * Removes indirection through the SQ index array.
 */
 #define IORING_SETUP_NO_SQARRAY		(1U << 16)
+#define IORING_SETUP_HY_POLL		(1U << 17)
 
 enum io_uring_op {
 	IORING_OP_NOP,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 816e93e7f949..b38f8af118c5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -299,6 +299,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 		goto err;
 
 	ctx->flags = p->flags;
+	ctx->available_time = LLONG_MAX;
 	atomic_set(&ctx->cq_wait_nr, IO_CQ_WAKE_INIT);
 	init_waitqueue_head(&ctx->sqo_sq_wait);
 	INIT_LIST_HEAD(&ctx->sqd_list);
@@ -3637,7 +3638,7 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
 			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
 			IORING_SETUP_NO_MMAP | IORING_SETUP_REGISTERED_FD_ONLY |
-			IORING_SETUP_NO_SQARRAY))
+			IORING_SETUP_NO_SQARRAY | IORING_SETUP_HY_POLL))
 		return -EINVAL;
 
 	return io_uring_create(entries, &p, params);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 1a2128459cb4..5505f4292ce5 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -772,6 +772,13 @@ static bool need_complete_io(struct io_kiocb *req)
 		S_ISBLK(file_inode(req->file)->i_mode);
 }
 
+static void init_hybrid_poll(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+	/* make sure every req only blocks once */
+	req->poll_state = false;
+	req->iopoll_start = ktime_get_ns();
+}
+
 static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
{
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
@@ -809,6 +816,8 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode)
 		kiocb->ki_flags |= IOCB_HIPRI;
 		kiocb->ki_complete = io_complete_rw_iopoll;
 		req->iopoll_completed = 0;
+		if (ctx->flags & IORING_SETUP_HY_POLL)
+			init_hybrid_poll(ctx, req);
 	} else {
 		if (kiocb->ki_flags & IOCB_HIPRI)
 			return -EINVAL;
@@ -1106,6 +1115,67 @@ void io_rw_fail(struct io_kiocb *req)
 	io_req_set_res(req, res, req->cqe.flags);
 }
 
+static u64 io_delay(struct io_ring_ctx *ctx, struct io_kiocb *req)
+{
+	struct hrtimer_sleeper timer;
+	enum hrtimer_mode mode;
+	ktime_t kt;
+	u64 sleep_time;
+
+	if (req->poll_state)
+		return 0;
+
+	if (ctx->available_time == LLONG_MAX)
+		return 0;
+
+	/* use half of the recorded runtime to schedule */
+	sleep_time = ctx->available_time / 2;
+
+	kt = ktime_set(0, sleep_time);
+	req->poll_state = true;
+
+	mode = HRTIMER_MODE_REL;
+	hrtimer_init_sleeper_on_stack(&timer, CLOCK_MONOTONIC, mode);
+	hrtimer_set_expires(&timer.timer, kt);
+	set_current_state(TASK_INTERRUPTIBLE);
+	hrtimer_sleeper_start_expires(&timer, mode);
+
+	if (timer.task)
+		io_schedule();
+
+	hrtimer_cancel(&timer.timer);
+	__set_current_state(TASK_RUNNING);
+	destroy_hrtimer_on_stack(&timer.timer);
+
+	return sleep_time;
+}
+
+static int io_uring_hybrid_poll(struct io_kiocb *req,
+				struct io_comp_batch *iob, unsigned int poll_flags)
+{
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret;
+	u64 runtime, sleep_time;
+
+	sleep_time = io_delay(ctx, req);
+
+	/* not implemented for io_uring passthrough yet */
+	ret = req->file->f_op->iopoll(&rw->kiocb, iob, poll_flags);
+
+	req->iopoll_end = ktime_get_ns();
+	runtime = req->iopoll_end - req->iopoll_start - sleep_time;
+	if (runtime < 0)
+		return 0;
+
+	/*
+	 * use the minimum sleep time if there are drivers of different
+	 * speeds, so more completions are reaped from the fast one
+	 */
+	if (ctx->available_time > runtime)
+		ctx->available_time = runtime;
+	return ret;
+}
+
 int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 {
 	struct io_wq_work_node *pos, *start, *prev;
@@ -1133,7 +1203,9 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 		if (READ_ONCE(req->iopoll_completed))
 			break;
 
-		if (req->opcode == IORING_OP_URING_CMD) {
+		if (ctx->flags & IORING_SETUP_HY_POLL) {
+			ret = io_uring_hybrid_poll(req, &iob, poll_flags);
+		} else if (req->opcode == IORING_OP_URING_CMD) {
 			struct io_uring_cmd *ioucmd;
 
 			ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
-- 
2.40.1
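[Aside, to make the v6 heuristic concrete: the first IO on a ring is
polled from start to finish and its runtime recorded; each later IO
first sleeps for half of the smallest runtime seen so far, then polls
for the remainder. A userspace toy model of that bookkeeping follows;
the latencies are made up for illustration.]

#include <stdio.h>
#include <stdint.h>

#define UNSET UINT64_MAX	/* stands in for LLONG_MAX in the patch */

static uint64_t available_time = UNSET;

/* Model one IO: sleep available_time/2 up front, poll the rest,
 * then shrink available_time toward the fastest IO seen so far. */
static void complete_io(uint64_t total_ns)
{
	uint64_t sleep_ns = 0;

	if (available_time != UNSET)
		sleep_ns = available_time / 2;	/* blocked, not spinning */

	uint64_t runtime = total_ns - sleep_ns;	/* time spent polling */
	if (runtime < available_time)
		available_time = runtime;

	printf("slept %6llu ns, polled %6llu ns\n",
	       (unsigned long long)sleep_ns, (unsigned long long)runtime);
}

int main(void)
{
	/* hypothetical per-IO completion latencies in nanoseconds */
	uint64_t lat[] = { 8000, 8200, 7900, 8100 };

	for (int i = 0; i < 4; i++)
		complete_io(lat[i]);
	return 0;
}

Because available_time only ever shrinks, the sleep window converges on
the fastest completions seen; with devices of different speeds on one
ring, the shared minimum means the fast device is never over-slept,
which is the trade-off the in-patch comment describes.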
* Re: io_uring: releasing CPU resources when polling
@ 2024-07-09  9:24 ` hexue

From: hexue
To: xue01.he
Cc: asml.silence, axboe, io-uring, linux-kernel

Sorry, please ignore this patch, I will resend one later. Apologies
for the improper operation.

--
hexue