* [RFC 0/2] io_task_work optimization
From: Hao Xu @ 2021-08-23 18:36 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

Running task_work may not be a big bottleneck now, but it never hurts
to move it forward a little bit. I'm trying to construct tests that
prove it is better in the cases where it should be, theoretically.
Currently I can only show it is no worse by running fio tests
(sometimes a little bit better), so I'm putting it here for comments
and suggestions.

Hao Xu (2):
  io_uring: run task_work when sqthread is woken up
  io_uring: add irq completion work to the head of task_list

 fs/io-wq.h    |  9 +++++++++
 fs/io_uring.c | 23 ++++++++++++++---------
 2 files changed, 23 insertions(+), 9 deletions(-)

-- 
2.24.4

* [PATCH 1/2] io_uring: run task_work when sqthread is woken up
From: Hao Xu @ 2021-08-23 18:36 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

The sqthread may be woken up because task_work was added. Since we are
now using task_work heavily, let's run it as soon as possible.

Signed-off-by: Hao Xu <[email protected]>
---
 fs/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index c3bd2b3fc46b..8172f5f893ad 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6962,6 +6962,8 @@ static int io_sq_thread(void *data)
 		}
 
 		finish_wait(&sqd->wait, &wait);
+		if (current->task_works)
+			io_run_task_work();
 		timeout = jiffies + sqd->sq_thread_idle;
 	}
-- 
2.24.4

* [PATCH 2/2] io_uring: add irq completion work to the head of task_list
From: Hao Xu @ 2021-08-23 18:36 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

We now have a lot of task_work users, and some of them just complete a
req and generate a cqe. Let's put that kind of work at the head of the
task_list so that it can be handled quickly, reducing the average req
latency. An explanatory case:

origin timeline:
	submit_sqe-->irq-->add completion task_work
	-->run heavy work0~n-->run completion task_work
now timeline:
	submit_sqe-->irq-->add completion task_work
	-->run completion task_work-->run heavy work0~n

Signed-off-by: Hao Xu <[email protected]>
---
 fs/io-wq.h    |  9 +++++++++
 fs/io_uring.c | 21 ++++++++++++---------
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/fs/io-wq.h b/fs/io-wq.h
index 308af3928424..51b4408fd177 100644
--- a/fs/io-wq.h
+++ b/fs/io-wq.h
@@ -41,6 +41,15 @@ static inline void wq_list_add_after(struct io_wq_work_node *node,
 	list->last = node;
 }
 
+static inline void wq_list_add_head(struct io_wq_work_node *node,
+				    struct io_wq_work_list *list)
+{
+	node->next = list->first;
+	list->first = node;
+	if (!node->next)
+		list->last = node;
+}
+
 static inline void wq_list_add_tail(struct io_wq_work_node *node,
 				    struct io_wq_work_list *list)
 {
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 8172f5f893ad..954cd8583945 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2050,7 +2050,7 @@ static void tctx_task_work(struct callback_head *cb)
 	ctx_flush_and_put(ctx);
 }
 
-static void io_req_task_work_add(struct io_kiocb *req)
+static void io_req_task_work_add(struct io_kiocb *req, bool emergency)
 {
 	struct task_struct *tsk = req->task;
 	struct io_uring_task *tctx = tsk->io_uring;
@@ -2062,7 +2062,10 @@ static void io_req_task_work_add(struct io_kiocb *req)
 	WARN_ON_ONCE(!tctx);
 
 	spin_lock_irqsave(&tctx->task_lock, flags);
-	wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
+	if (emergency)
+		wq_list_add_head(&req->io_task_work.node, &tctx->task_list);
+	else
+		wq_list_add_tail(&req->io_task_work.node, &tctx->task_list);
 	running = tctx->task_running;
 	if (!running)
 		tctx->task_running = true;
@@ -2122,19 +2125,19 @@ static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
 {
 	req->result = ret;
 	req->io_task_work.func = io_req_task_cancel;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, true);
 }
 
 static void io_req_task_queue(struct io_kiocb *req)
 {
 	req->io_task_work.func = io_req_task_submit;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 }
 
 static void io_req_task_queue_reissue(struct io_kiocb *req)
 {
 	req->io_task_work.func = io_queue_async_work;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 }
 
 static inline void io_queue_next(struct io_kiocb *req)
@@ -2249,7 +2252,7 @@ static inline void io_put_req_deferred(struct io_kiocb *req)
 {
 	if (req_ref_put_and_test(req)) {
 		req->io_task_work.func = io_free_req;
-		io_req_task_work_add(req);
+		io_req_task_work_add(req, false);
 	}
 }
 
@@ -2564,7 +2567,7 @@ static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
 		return;
 	req->result = res;
 	req->io_task_work.func = io_req_task_complete;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, true);
 }
 
 static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2)
@@ -4881,7 +4884,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll,
 	 * of executing it. We can't safely execute it anyway, as we may not
 	 * have the needed state needed for it anyway.
 	 */
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 	return 1;
 }
 
@@ -6430,7 +6433,7 @@ static enum hrtimer_restart io_link_timeout_fn(struct hrtimer *timer)
 	spin_unlock_irqrestore(&ctx->timeout_lock, flags);
 
 	req->io_task_work.func = io_req_task_link_timeout;
-	io_req_task_work_add(req);
+	io_req_task_work_add(req, false);
 	return HRTIMER_NORESTART;
 }
 
-- 
2.24.4

* Re: [PATCH 2/2] io_uring: add irq completion work to the head of task_list
From: Hao Xu @ 2021-08-23 18:41 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 2021/8/24 2:36 AM, Hao Xu wrote:
> We now have a lot of task_work users, and some of them just complete a
> req and generate a cqe. Let's put that kind of work at the head of the
> task_list so that it can be handled quickly, reducing the average req
> latency. An explanatory case:
>
> origin timeline:
> 	submit_sqe-->irq-->add completion task_work
> 	-->run heavy work0~n-->run completion task_work
> now timeline:
> 	submit_sqe-->irq-->add completion task_work
> 	-->run completion task_work-->run heavy work0~n
>
> Signed-off-by: Hao Xu <[email protected]>

Sorry, I sent this old version by mistake; I will send the latest
version later. The latest one adds high-priority task_work to the front
of the task_list in time order, not in reverse order.

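As an illustration of that description (priority entries go to the head
of the list but stay FIFO among themselves), here is a minimal
user-space sketch. It is not the follow-up patch; the prio_tail
bookkeeping, which would also need resetting whenever the consumer
splices the list out, is just one assumed way to do it.

/* Standalone model: priority work is inserted at the front of the list
 * but kept FIFO among priority entries by remembering where the
 * priority sublist currently ends. Names are illustrative only. */
#include <stdio.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct work_list {
	struct node *first;
	struct node *last;
	struct node *prio_tail;	/* last priority node, NULL if none */
};

static void add_tail(struct work_list *l, struct node *n)
{
	n->next = NULL;
	if (l->last)
		l->last->next = n;
	else
		l->first = n;
	l->last = n;
}

static void add_prio(struct work_list *l, struct node *n)
{
	if (!l->prio_tail) {
		/* no priority entries yet: plain head insert */
		n->next = l->first;
		l->first = n;
		if (!n->next)
			l->last = n;
	} else {
		/* append right after the last priority entry */
		n->next = l->prio_tail->next;
		l->prio_tail->next = n;
		if (!n->next)
			l->last = n;
	}
	l->prio_tail = n;
}

int main(void)
{
	struct work_list l = { 0 };
	struct node tw1 = { .id = 1 }, tw2 = { .id = 2 };
	struct node irq1 = { .id = 101 }, irq2 = { .id = 102 };

	/* arrival order: tw1 tw2 irq1 irq2 */
	add_tail(&l, &tw1);
	add_tail(&l, &tw2);
	add_prio(&l, &irq1);
	add_prio(&l, &irq2);

	/* prints 101 102 1 2: priority first, FIFO within each class */
	for (struct node *n = l.first; n; n = n->next)
		printf("%d ", n->id);
	printf("\n");
	return 0;
}
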
* Re: [PATCH 2/2] io_uring: add irq completion work to the head of task_list
From: Pavel Begunkov @ 2021-08-24 12:57 UTC
To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 8/23/21 7:36 PM, Hao Xu wrote:
> We now have a lot of task_work users, and some of them just complete a
> req and generate a cqe. Let's put that kind of work at the head of the
> task_list so that it can be handled quickly, reducing the average req
> latency. An explanatory case:
>
> origin timeline:
> 	submit_sqe-->irq-->add completion task_work
> 	-->run heavy work0~n-->run completion task_work
> now timeline:
> 	submit_sqe-->irq-->add completion task_work
> 	-->run completion task_work-->run heavy work0~n

Might be good. There are not so many hot tw users: poll, queuing linked
requests, and the new IRQ. Could be BPF in the future.

So, for the test case I'd think about some heavy-ish submissions linked
to your IRQ req. For instance, keeping a large QD of

  read(IRQ-based) -> linked read_pipe(PAGE_SIZE);

and running it for a while, so they get completely out of sync and the
tw works really mix up. It reads from pipes of size <= PAGE_SIZE, so it
completes inline, but the copy takes enough time.

One thing is that Jens specifically wanted tw's to be in FIFO order,
where IRQ based will be in LIFO. I don't think it's a real problem
though, the completion handler should be brief enough.

> Signed-off-by: Hao Xu <[email protected]>
> ---
> [...]

-- 
Pavel Begunkov

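A minimal sketch of the workload Pavel suggests, assuming liburing; the
file path, queue depth, loop count, and the pipe pre-fill are
illustrative choices, not something specified in the thread (the test
file is assumed to be at least 512 bytes and opened with O_DIRECT so
that its completion is IRQ-driven).

/* O_DIRECT read (IRQ-driven completion) linked to a PAGE_SIZE pipe read
 * that completes inline but spends time copying. Build with -luring. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>

#define QD   8
#define PAGE 4096

int main(void)
{
	struct io_uring ring;
	char *dbuf, pbuf[QD][PAGE], fill[PAGE];
	int fd, pfd[2], i, j;

	/* assumed test file on the fast device */
	fd = open("/mnt/fast/testfile", O_RDONLY | O_DIRECT);
	if (fd < 0 || pipe(pfd) < 0)
		return 1;
	if (posix_memalign((void **)&dbuf, 512, 512))
		return 1;
	memset(fill, 0, sizeof(fill));
	io_uring_queue_init(2 * QD, &ring, 0);

	for (j = 0; j < 10000; j++) {
		/* make sure every linked pipe read has data to copy */
		for (i = 0; i < QD; i++)
			write(pfd[1], fill, PAGE);

		for (i = 0; i < QD; i++) {
			struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);

			io_uring_prep_read(sqe, fd, dbuf, 512, 0);
			sqe->flags |= IOSQE_IO_LINK;	/* next SQE runs after this one */

			sqe = io_uring_get_sqe(&ring);
			io_uring_prep_read(sqe, pfd[0], pbuf[i], PAGE, 0);
		}
		io_uring_submit(&ring);

		for (i = 0; i < 2 * QD; i++) {
			struct io_uring_cqe *cqe;

			io_uring_wait_cqe(&ring, &cqe);
			io_uring_cqe_seen(&ring, cqe);
		}
	}
	io_uring_queue_exit(&ring);
	return 0;
}
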
* Re: [PATCH 2/2] io_uring: add irq completion work to the head of task_list
From: Hao Xu @ 2021-08-25 3:19 UTC
To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/8/24 8:57 PM, Pavel Begunkov wrote:
> On 8/23/21 7:36 PM, Hao Xu wrote:
>> [...]
>
> Might be good. There are not so many hot tw users: poll, queuing linked
> requests, and the new IRQ. Could be BPF in the future.

async buffered reads as well, since buffered reads are a hot operation.

> So, for the test case I'd think about some heavy-ish submissions linked
> to your IRQ req. For instance, keeping a large QD of
>
>   read(IRQ-based) -> linked read_pipe(PAGE_SIZE);
>
> and running it for a while, so they get completely out of sync and the
> tw works really mix up. It reads from pipes of size <= PAGE_SIZE, so it
> completes inline, but the copy takes enough time.

Thanks Pavel, previously I tried direct read --> buffered read (async
buffered read) and didn't see much difference. I'll try the above case
you offered.

> One thing is that Jens specifically wanted tw's to be in FIFO order,
> where IRQ based will be in LIFO. I don't think it's a real problem
> though, the completion handler should be brief enough.

In my latest code, the IRQ based tw are also FIFO, only LIFO between
IRQ based tw and other tw:
	timeline:  tw1 tw2 irq1 irq2
	task_list: irq1 irq2 tw1 tw2

>> Signed-off-by: Hao Xu <[email protected]>
>> ---
>> [...]

* Re: [PATCH 2/2] io_uring: add irq completion work to the head of task_list
From: Pavel Begunkov @ 2021-08-25 11:18 UTC
To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 8/25/21 4:19 AM, Hao Xu wrote:
> On 2021/8/24 8:57 PM, Pavel Begunkov wrote:
>> On 8/23/21 7:36 PM, Hao Xu wrote:
>>> [...]
>>
>> Might be good. There are not so many hot tw users: poll, queuing
>> linked requests, and the new IRQ. Could be BPF in the future.
>
> async buffered reads as well, since buffered reads are a hot operation.

Good case as well, forgot about it. It should not be so hot though, as
it's only hit when reads are served out of the buffer cache.

>> So, for the test case I'd think about some heavy-ish submissions
>> linked to your IRQ req. For instance, keeping a large QD of
>>
>>   read(IRQ-based) -> linked read_pipe(PAGE_SIZE);
>>
>> and running it for a while, so they get completely out of sync and the
>> tw works really mix up. It reads from pipes of size <= PAGE_SIZE, so
>> it completes inline, but the copy takes enough time.
>
> Thanks Pavel, previously I tried direct read --> buffered read (async
> buffered read) and didn't see much difference. I'll try the above case
> you offered.

Hmm, considering that pipes have to be refilled, buffered reads may be a
better option. I'd make them all read the same page, + registered buffer
+ registered file. And then it'd probably depend on how fast your main
SSD is.

mem = malloc_align(4096);
io_uring_register_buffer(mem, 4096);
// preferably another disk/SSD from the fast one
fd2 = open("./file");

// loop
read(fast_ssd, DIRECT, 512) -> read(fd2, fixed_buf, 4096)

It will be interesting to see what it yields. With buffered reads it's
probably worth experimenting with 2 * PAGE_SIZE or even slightly more,
to increase the heavy part.

Btw, I'd look at the latency distribution (90%, 99%) as well; it may
take the worst hit.

>> One thing is that Jens specifically wanted tw's to be in FIFO order,
>> where IRQ based will be in LIFO. I don't think it's a real problem
>> though, the completion handler should be brief enough.
>
> In my latest code, the IRQ based tw are also FIFO, only LIFO between
> IRQ based tw and other tw:
> 	timeline:  tw1 tw2 irq1 irq2
> 	task_list: irq1 irq2 tw1 tw2

-- 
Pavel Begunkov

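A rough fleshing-out of the outline above, again assuming liburing. The
file paths and loop count are placeholders, the queue depth of one pair
per iteration is a simplification (a real test would keep a large QD as
suggested earlier), and registering the file is omitted for brevity; the
second file is assumed to be at least 4096 bytes so the buffered read is
served from the page cache after the first pass.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	void *fixed, *dbuf;
	struct iovec iov;
	int fast_fd, fd2, i;

	fast_fd = open("/mnt/fast/testfile", O_RDONLY | O_DIRECT);
	fd2 = open("./file", O_RDONLY);	/* preferably another disk */
	if (fast_fd < 0 || fd2 < 0)
		return 1;
	if (posix_memalign(&fixed, 4096, 4096) || posix_memalign(&dbuf, 512, 512))
		return 1;

	io_uring_queue_init(64, &ring, 0);
	iov.iov_base = fixed;
	iov.iov_len = 4096;
	io_uring_register_buffers(&ring, &iov, 1);

	for (i = 0; i < 100000; i++) {
		/* 512B O_DIRECT read, IRQ completion ... */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read(sqe, fast_fd, dbuf, 512, 0);
		sqe->flags |= IOSQE_IO_LINK;

		/* ... linked to a buffered read into the registered buffer */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read_fixed(sqe, fd2, fixed, 4096, 0, 0);

		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}
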
* Re: [RFC 0/2] io_task_work optimization
From: Jens Axboe @ 2021-08-25 15:58 UTC
To: Hao Xu; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 8/23/21 12:36 PM, Hao Xu wrote:
> Running task_work may not be a big bottleneck now, but it never hurts
> to move it forward a little bit. I'm trying to construct tests that
> prove it is better in the cases where it should be, theoretically.
> Currently I can only show it is no worse by running fio tests
> (sometimes a little bit better), so I'm putting it here for comments
> and suggestions.

I think this is interesting, particularly for areas where we have a mix
of task_work uses, because obviously it won't really matter if the
task_work being run is homogeneous.

That said, it would be nice to have some numbers associated with it. We
have a few classes of task_work:

1) Work completes really fast, we want to just do those first
2) Work is pretty fast, like the async buffered read copy
3) Work is more expensive, might require a full retry of the operation

Might make sense to make this classification explicit. The problem is,
with any kind of scheduling like that, you risk introducing latency
bubbles because the prio1 list grows really fast, for example.

-- 
Jens Axboe

* Re: [RFC 0/2] io_task_work optimization
From: Hao Xu @ 2021-08-25 16:39 UTC
To: Jens Axboe; +Cc: io-uring, Pavel Begunkov, Joseph Qi

On 2021/8/25 11:58 PM, Jens Axboe wrote:
> On 8/23/21 12:36 PM, Hao Xu wrote:
>> [...]
>
> I think this is interesting, particularly for areas where we have a mix
> of task_work uses, because obviously it won't really matter if the
> task_work being run is homogeneous.
>
> That said, it would be nice to have some numbers associated with it. We
> have a few classes of task_work:
>
> 1) Work completes really fast, we want to just do those first
> 2) Work is pretty fast, like the async buffered read copy
> 3) Work is more expensive, might require a full retry of the operation
>
> Might make sense to make this classification explicit. The problem is,
> with any kind of scheduling like that, you risk introducing latency
> bubbles because the prio1 list grows really fast, for example.

Yes, this may introduce latency if an overwhelming number of 1) come in
within a short time. I'll try more tests to see if the problem exists
and whether there is a better way, like putting only a limited number of
1) at the front. Anyway, I'll update this thread when I get some data.

* Re: [RFC 0/2] io_task_work optimization
From: Pavel Begunkov @ 2021-08-25 16:46 UTC
To: Hao Xu, Jens Axboe; +Cc: io-uring, Joseph Qi

On 8/25/21 5:39 PM, Hao Xu wrote:
> On 2021/8/25 11:58 PM, Jens Axboe wrote:
>> [...]
>>
>> Might make sense to make this classification explicit. The problem is,
>> with any kind of scheduling like that, you risk introducing latency
>> bubbles because the prio1 list grows really fast, for example.
>
> Yes, this may introduce latency if an overwhelming number of 1) come in
> within a short time. I'll try more tests to see if the problem exists
> and whether there is a better way, like putting only a limited number
> of 1) at the front. Anyway, I'll update this thread when I get some
> data.

Not sure, but it looks like IRQ completion batching is coming in 5.15.
With that you may also want to flush completions after the IRQ sublist
is exhausted.

May be worth considering having 2 lists in the future.

-- 
Pavel Begunkov

* Re: [RFC 0/2] io_task_work optimization
From: Hao Xu @ 2021-08-25 17:26 UTC
To: Pavel Begunkov, Jens Axboe; +Cc: io-uring, Joseph Qi

On 2021/8/26 12:46 AM, Pavel Begunkov wrote:
> On 8/25/21 5:39 PM, Hao Xu wrote:
>> [...]
>>
>> Yes, this may introduce latency if an overwhelming number of 1) come
>> in within a short time. I'll try more tests to see if the problem
>> exists and whether there is a better way, like putting only a limited
>> number of 1) at the front. Anyway, I'll update this thread when I get
>> some data.
>
> Not sure, but it looks like IRQ completion batching is coming in 5.15.
> With that you may also want to flush completions after the IRQ sublist
> is exhausted.
>
> May be worth considering having 2 lists in the future.

I'll think about that; there may also be a way to reduce the lock cost
if there are multiple lists.

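To make the two-list idea concrete, here is a rough user-space sketch of
one possible shape: completion (priority) work and normal work go on
separate lists, and the runner drains the priority list first but only
up to a per-round budget, which is one way to address the latency-bubble
concern Jens raised. The names and the budget value are assumptions, not
code from the thread.

#include <stdio.h>
#include <stddef.h>

struct work {
	struct work *next;
	void (*func)(struct work *);
	int id;
};

struct wlist {
	struct work *first, *last;
};

static void wlist_add_tail(struct wlist *l, struct work *w)
{
	w->next = NULL;
	if (l->last)
		l->last->next = w;
	else
		l->first = w;
	l->last = w;
}

static struct work *wlist_pop(struct wlist *l)
{
	struct work *w = l->first;

	if (w) {
		l->first = w->next;
		if (!l->first)
			l->last = NULL;
	}
	return w;
}

#define PRIO_BUDGET 4	/* assumed cap on priority items per round */

/* Each round runs at most PRIO_BUDGET priority items, then one normal
 * item, so a flood of priority work cannot starve the rest. */
static void run_task_work(struct wlist *prio, struct wlist *normal)
{
	while (prio->first || normal->first) {
		struct work *w;
		int n = 0;

		while (n++ < PRIO_BUDGET && (w = wlist_pop(prio)))
			w->func(w);
		if ((w = wlist_pop(normal)))
			w->func(w);
	}
}

static void show(struct work *w)
{
	printf("running work %d\n", w->id);
}

int main(void)
{
	struct wlist prio = { 0 }, normal = { 0 };
	struct work w[8];

	for (int i = 0; i < 8; i++) {
		w[i] = (struct work){ .func = show, .id = i };
		/* even ids: cheap completion work; odd ids: heavier work */
		wlist_add_tail(i % 2 ? &normal : &prio, &w[i]);
	}
	run_task_work(&prio, &normal);
	return 0;
}
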