* [PATCH 0/3] tw mutex & IRQ rw completion batching

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

In essence, it's about two features. The first one is implemented by
1-2 and saves ->uring_lock lock/unlock in a single call of
tctx_task_work(). Should be useful for links, apolls and BPF requests
at some moment.

The second feature (3/3) is batching freeing and completing of
IRQ-based read/write requests.

Haven't got numbers yet, but just throwing it for public discussion.

P.S. The new horrendous part of io_req_task_complete(), placing requests
into state->compl_reqs and flushing them, is temporary and will be
killed by other planned patches.

Pavel Begunkov (3):
  io_uring: flush completions for fallbacks
  io_uring: batch task work locking
  io_uring: IRQ rw completion batching

 fs/io_uring.c | 103 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 34 deletions(-)

--
2.32.0
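For orientation, below is a compilable userspace sketch of the lazy-locking idea
from patches 1-2. It is only an analogue: the type and function names here are
invented for illustration, the real code is in the patches that follow. Handlers
request the per-ctx lock through a shared "locked" flag, and the dispatch loop
takes the mutex at most once per batch, dropping it and flushing completions
only when the ctx changes or the work list ends. Build with -pthread.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct ctx {
    pthread_mutex_t uring_lock;
    int compl_nr;
};

struct req {
    struct ctx *ctx;
    void (*func)(struct req *req, bool *locked);
    struct req *next;
};

/* grab the ctx lock lazily; a handler may call this any number of times */
static void tw_lock(struct ctx *ctx, bool *locked)
{
    if (!*locked) {
        pthread_mutex_lock(&ctx->uring_lock);
        *locked = true;
    }
}

/* flush batched completions and drop the lock, but only if we hold it */
static void ctx_flush_and_unlock(struct ctx *ctx, bool *locked)
{
    if (!ctx || !*locked)
        return;
    if (ctx->compl_nr)
        printf("flushing %d completions\n", ctx->compl_nr);
    ctx->compl_nr = 0;
    pthread_mutex_unlock(&ctx->uring_lock);
    *locked = false;
}

/* a handler: queue an inline completion under the (batched) lock */
static void complete_req(struct req *req, bool *locked)
{
    tw_lock(req->ctx, locked);
    req->ctx->compl_nr++;
}

/* analogue of tctx_task_work(): one lock/unlock per ctx run, not per request */
static void run_task_work(struct req *node)
{
    struct ctx *ctx = NULL;
    bool locked = false;

    for (; node; node = node->next) {
        if (node->ctx != ctx) {
            ctx_flush_and_unlock(ctx, &locked);
            ctx = node->ctx;
        }
        node->func(node, &locked);
    }
    ctx_flush_and_unlock(ctx, &locked);
}

int main(void)
{
    struct ctx c = { .uring_lock = PTHREAD_MUTEX_INITIALIZER };
    struct req r2 = { .ctx = &c, .func = complete_req };
    struct req r1 = { .ctx = &c, .func = complete_req, .next = &r2 };

    run_task_work(&r1);     /* takes the mutex once, flushes 2 completions */
    return 0;
}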
* [PATCH 1/3] io_uring: flush completions for fallbacks

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

io_fallback_req_func() doesn't expect anyone creating inline
completions, and no one currently does that. Teach the function to flush
completions preparing for further changes.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3da9f1374612..ba087f395507 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
     percpu_ref_get(&ctx->refs);
     llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
         req->io_task_work.func(req);
+
+    mutex_lock(&ctx->uring_lock);
+    if (ctx->submit_state.compl_nr)
+        io_submit_flush_completions(ctx);
+    mutex_unlock(&ctx->uring_lock);
     percpu_ref_put(&ctx->refs);
 }
 
--
2.32.0
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 9:21 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
> io_fallback_req_func() doesn't expect anyone creating inline
> completions, and no one currently does that. Teach the function to flush
> completions preparing for further changes.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
>   fs/io_uring.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 3da9f1374612..ba087f395507 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>       percpu_ref_get(&ctx->refs);
>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>           req->io_task_work.func(req);
> +
> +    mutex_lock(&ctx->uring_lock);
> +    if (ctx->submit_state.compl_nr)
> +        io_submit_flush_completions(ctx);
> +    mutex_unlock(&ctx->uring_lock);
why do we protect io_submit_flush_completions() with uring_lock,
regarding that it is called in original context. Btw, why not
use ctx_flush_and_put()
>       percpu_ref_put(&ctx->refs);
>   }
>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 9:32 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/20 5:21 PM, Hao Xu wrote:
> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>> io_fallback_req_func() doesn't expect anyone creating inline
>> completions, and no one currently does that. Teach the function to flush
>> completions preparing for further changes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>   fs/io_uring.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3da9f1374612..ba087f395507 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct
>> work_struct *work)
>>       percpu_ref_get(&ctx->refs);
>>       llist_for_each_entry_safe(req, tmp, node,
>> io_task_work.fallback_node)
>>           req->io_task_work.func(req);
>> +
>> +    mutex_lock(&ctx->uring_lock);
>> +    if (ctx->submit_state.compl_nr)
>> +        io_submit_flush_completions(ctx);
>> +    mutex_unlock(&ctx->uring_lock);
> why do we protect io_submit_flush_completions() with uring_lock,
> regarding that it is called in original context. Btw, why not
I mean it is in the original context before this patch..
> use ctx_flush_and_put()
>>       percpu_ref_put(&ctx->refs);
>>   }
>>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Pavel Begunkov @ 2021-08-20 9:49 UTC
To: Hao Xu, Jens Axboe, io-uring

On 8/20/21 10:21 AM, Hao Xu wrote:
> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>> io_fallback_req_func() doesn't expect anyone creating inline
>> completions, and no one currently does that. Teach the function to flush
>> completions preparing for further changes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>   fs/io_uring.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3da9f1374612..ba087f395507 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>       percpu_ref_get(&ctx->refs);
>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>           req->io_task_work.func(req);
>> +
>> +    mutex_lock(&ctx->uring_lock);
>> +    if (ctx->submit_state.compl_nr)
>> +        io_submit_flush_completions(ctx);
>> +    mutex_unlock(&ctx->uring_lock);
> why do we protect io_submit_flush_completions() with uring_lock,
> regarding that it is called in original context. Btw, why not
> use ctx_flush_and_put()

The fallback thing is called from a workqueue, not the submitter task
context. See delayed_work and so on.

Regarding locking, it touches struct io_submit_state, and it's protected by
->uring_lock. In particular we're interested in ->reqs and ->free_list.
FWIW, there is refurbishment going on around submit state, so if it proves
useful the locking may change in coming months.

ctx_flush_and_put() could have been used, but simpler to hand code it
and avoid the (always messy) conditional ref grabbing and locking.

--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 10:16 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
> On 8/20/21 10:21 AM, Hao Xu wrote:
>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>> io_fallback_req_func() doesn't expect anyone creating inline
>>> completions, and no one currently does that. Teach the function to flush
>>> completions preparing for further changes.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>>   fs/io_uring.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 3da9f1374612..ba087f395507 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>       percpu_ref_get(&ctx->refs);
>>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>           req->io_task_work.func(req);
>>> +
>>> +    mutex_lock(&ctx->uring_lock);
>>> +    if (ctx->submit_state.compl_nr)
>>> +        io_submit_flush_completions(ctx);
>>> +    mutex_unlock(&ctx->uring_lock);
>> why do we protect io_submit_flush_completions() with uring_lock,
>> regarding that it is called in original context. Btw, why not
>> use ctx_flush_and_put()
>
> The fallback thing is called from a workqueue, not the submitter task
> context. See delayed_work and so on.
>
> Regarding locking, it touches struct io_submit_state, and it's protected by
> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
> FWIW, there is refurbishment going on around submit state, so if it proves
> useful the locking may change in coming months.
>
> ctx_flush_and_put() could have been used, but simpler to hand code it
> and avoid the (always messy) conditional ref grabbing and locking.
I didn't get it. What do you mean by 'avoid the (always messy) conditional
ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
same.. though I think in ctx_flush_and_put() there is a problem: compl_nr
should also be protected.
>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Pavel Begunkov @ 2021-08-20 12:26 UTC
To: Hao Xu, Jens Axboe, io-uring

On 8/20/21 11:16 AM, Hao Xu wrote:
> On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
>> On 8/20/21 10:21 AM, Hao Xu wrote:
>>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>>> io_fallback_req_func() doesn't expect anyone creating inline
>>>> completions, and no one currently does that. Teach the function to flush
>>>> completions preparing for further changes.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>>>   fs/io_uring.c | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 3da9f1374612..ba087f395507 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>       percpu_ref_get(&ctx->refs);
>>>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>>           req->io_task_work.func(req);
>>>> +
>>>> +    mutex_lock(&ctx->uring_lock);
>>>> +    if (ctx->submit_state.compl_nr)
>>>> +        io_submit_flush_completions(ctx);
>>>> +    mutex_unlock(&ctx->uring_lock);
>>> why do we protect io_submit_flush_completions() with uring_lock,
>>> regarding that it is called in original context. Btw, why not
>>> use ctx_flush_and_put()
>>
>> The fallback thing is called from a workqueue, not the submitter task
>> context. See delayed_work and so on.
>>
>> Regarding locking, it touches struct io_submit_state, and it's protected by
>> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
>> FWIW, there is refurbishment going on around submit state, so if it proves
>> useful the locking may change in coming months.
>>
>> ctx_flush_and_put() could have been used, but simpler to hand code it
>> and avoid the (always messy) conditional ref grabbing and locking.

> I didn't get it. What do you mean by 'avoid the (always messy) conditional
> ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
> same.. though I think in ctx_flush_and_put() there is a problem: compl_nr
> should also be protected.

Ok, the long story. First, notice a ctx check at the beginning of
ctx_flush_and_put(); that one is conditional. Even though we know
it's not NULL, it's more confusing and might be a problem for
static and human analysis.

Also, locking is never easy, and so IMHO it's preferable to keep
lock() and a matching unlock (or get/put) in the same function if
possible, much easier to read. Compare

ref_get();
do_something();
ref_put();

and

ref_get();
do_something();
flush_ctx();

I believe the first one is of less mental overhead.

--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 18:41 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/20 8:26 PM, Pavel Begunkov wrote:
> On 8/20/21 11:16 AM, Hao Xu wrote:
>> On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
>>> On 8/20/21 10:21 AM, Hao Xu wrote:
>>>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>>>> io_fallback_req_func() doesn't expect anyone creating inline
>>>>> completions, and no one currently does that. Teach the function to flush
>>>>> completions preparing for further changes.
>>>>>
>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>> ---
>>>>>   fs/io_uring.c | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 3da9f1374612..ba087f395507 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>>       percpu_ref_get(&ctx->refs);
>>>>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>>>           req->io_task_work.func(req);
>>>>> +
>>>>> +    mutex_lock(&ctx->uring_lock);
>>>>> +    if (ctx->submit_state.compl_nr)
>>>>> +        io_submit_flush_completions(ctx);
>>>>> +    mutex_unlock(&ctx->uring_lock);
>>>> why do we protect io_submit_flush_completions() with uring_lock,
>>>> regarding that it is called in original context. Btw, why not
>>>> use ctx_flush_and_put()
>>>
>>> The fallback thing is called from a workqueue, not the submitter task
>>> context. See delayed_work and so on.
>>>
>>> Regarding locking, it touches struct io_submit_state, and it's protected by
>>> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
>>> FWIW, there is refurbishment going on around submit state, so if it proves
>>> useful the locking may change in coming months.
>>>
>>> ctx_flush_and_put() could have been used, but simpler to hand code it
>>> and avoid the (always messy) conditional ref grabbing and locking.

>> I didn't get it. What do you mean by 'avoid the (always messy) conditional
>> ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
>> same.. though I think in ctx_flush_and_put() there is a problem: compl_nr
>> should also be protected.
>
> Ok, the long story. First, notice a ctx check at the beginning of
> ctx_flush_and_put(); that one is conditional. Even though we know
> it's not NULL, it's more confusing and might be a problem for
> static and human analysis.
>
> Also, locking is never easy, and so IMHO it's preferable to keep
> lock() and a matching unlock (or get/put) in the same function if
> possible, much easier to read. Compare
>
> ref_get();
> do_something();
> ref_put();
>
> and
>
> ref_get();
> do_something();
> flush_ctx();
>
> I believe the first one is of less mental overhead.
Thanks, got it.
>
* [PATCH 2/3] io_uring: batch task work locking

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

Many task_work handlers either grab ->uring_lock, or may benefit from
having it. Move locking logic out of individual handlers to a lazy
approach controlled by tctx_task_work(), so we don't keep doing tons of
mutex lock/unlock.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 89 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 37 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ba087f395507..54c4d8326944 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -775,7 +775,7 @@ struct async_poll {
     struct io_poll_iocb *double_poll;
 };
 
-typedef void (*io_req_tw_func_t)(struct io_kiocb *req);
+typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
 
 struct io_task_work {
     union {
@@ -1080,6 +1080,14 @@ struct sock *io_uring_get_socket(struct file *file)
 }
 EXPORT_SYMBOL(io_uring_get_socket);
 
+static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
+{
+    if (!*locked) {
+        mutex_lock(&ctx->uring_lock);
+        *locked = true;
+    }
+}
+
 #define io_for_each_link(pos, head) \
     for (pos = (head); pos; pos = pos->link)
 
@@ -1193,16 +1201,19 @@ static void io_fallback_req_func(struct work_struct *work)
                         fallback_work.work);
     struct llist_node *node = llist_del_all(&ctx->fallback_llist);
     struct io_kiocb *req, *tmp;
+    bool locked = false;
 
     percpu_ref_get(&ctx->refs);
     llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
-        req->io_task_work.func(req);
+        req->io_task_work.func(req, &locked);
 
-    mutex_lock(&ctx->uring_lock);
-    if (ctx->submit_state.compl_nr)
-        io_submit_flush_completions(ctx);
-    mutex_unlock(&ctx->uring_lock);
+    if (locked) {
+        if (ctx->submit_state.compl_nr)
+            io_submit_flush_completions(ctx);
+        mutex_unlock(&ctx->uring_lock);
+    }
     percpu_ref_put(&ctx->refs);
+
 }
 
 static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
@@ -1386,12 +1397,15 @@ static void io_prep_async_link(struct io_kiocb *req)
     }
 }
 
-static void io_queue_async_work(struct io_kiocb *req)
+static void io_queue_async_work(struct io_kiocb *req, bool *locked)
 {
     struct io_ring_ctx *ctx = req->ctx;
     struct io_kiocb *link = io_prep_linked_timeout(req);
     struct io_uring_task *tctx = req->task->io_uring;
 
+    /* must not take the lock, NULL it as a precaution */
+    locked = NULL;
+
     BUG_ON(!tctx);
     BUG_ON(!tctx->io_wq);
@@ -2002,14 +2016,15 @@ static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
     return __io_req_find_next(req);
 }
 
-static void ctx_flush_and_put(struct io_ring_ctx *ctx)
+static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
 {
     if (!ctx)
         return;
-    if (ctx->submit_state.compl_nr) {
-        mutex_lock(&ctx->uring_lock);
-        io_submit_flush_completions(ctx);
+    if (*locked) {
+        if (ctx->submit_state.compl_nr)
+            io_submit_flush_completions(ctx);
         mutex_unlock(&ctx->uring_lock);
+        *locked = false;
     }
 }
@@ -2021,6 +2036,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
  */
 static void tctx_task_work(struct callback_head *cb)
 {
+    bool locked = false;
     struct io_ring_ctx *ctx = NULL;
     struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
                           task_work);
@@ -2043,17 +2059,17 @@ static void tctx_task_work(struct callback_head *cb)
                                 io_task_work.node);
 
             if (req->ctx != ctx) {
-                ctx_flush_and_put(ctx);
+                ctx_flush_and_put(ctx, &locked);
                 ctx = req->ctx;
             }
-            req->io_task_work.func(req);
+            req->io_task_work.func(req, &locked);
             node = next;
         } while (node);
 
         cond_resched();
     }
 
-    ctx_flush_and_put(ctx);
+    ctx_flush_and_put(ctx, &locked);
 }
 
 static void io_req_task_work_add(struct io_kiocb *req)
@@ -2105,27 +2121,21 @@ static void io_req_task_work_add(struct io_kiocb *req)
-static void io_req_task_cancel(struct io_kiocb *req)
+static void io_req_task_cancel(struct io_kiocb *req, bool *locked)
 {
-    struct io_ring_ctx *ctx = req->ctx;
-
-    /* ctx is guaranteed to stay alive while we hold uring_lock */
-    mutex_lock(&ctx->uring_lock);
+    io_tw_lock(req->ctx, locked);
     io_req_complete_failed(req, req->result);
-    mutex_unlock(&ctx->uring_lock);
 }
 
-static void io_req_task_submit(struct io_kiocb *req)
+static void io_req_task_submit(struct io_kiocb *req, bool *locked)
 {
-    struct io_ring_ctx *ctx = req->ctx;
-
-    /* ctx stays valid until unlock, even if we drop all ours ctx->refs */
-    mutex_lock(&ctx->uring_lock);
+    io_tw_lock(req->ctx, locked);
+
     if (likely(!(req->task->flags & PF_EXITING)))
         __io_queue_sqe(req);
     else
         io_req_complete_failed(req, -EFAULT);
-    mutex_unlock(&ctx->uring_lock);
 }
 
 static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
@@ -2161,6 +2171,11 @@ static void io_free_req(struct io_kiocb *req)
     __io_free_req(req);
 }
 
+static void io_free_req_work(struct io_kiocb *req, bool *locked)
+{
+    io_free_req(req);
+}
+
 struct req_batch {
     struct task_struct *task;
     int task_refs;
@@ -2260,7 +2275,7 @@ static inline void io_put_req(struct io_kiocb *req)
 static inline void io_put_req_deferred(struct io_kiocb *req)
 {
     if (req_ref_put_and_test(req)) {
-        req->io_task_work.func = io_free_req;
+        req->io_task_work.func = io_free_req_work;
         io_req_task_work_add(req);
     }
 }
@@ -2555,7 +2570,7 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
     return false;
 }
 
-static void io_req_task_complete(struct io_kiocb *req)
+static void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
     __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
 }
@@ -2565,7 +2580,7 @@ static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
 {
     if (__io_complete_rw_common(req, res))
         return;
-    io_req_task_complete(req);
+    __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
 }
 
 static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
@@ -4980,7 +4995,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask)
     return !(flags & IORING_CQE_F_MORE);
 }
 
-static void io_poll_task_func(struct io_kiocb *req)
+static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 {
     struct io_ring_ctx *ctx = req->ctx;
     struct io_kiocb *nxt;
@@ -5004,7 +5019,7 @@ static void io_poll_task_func(struct io_kiocb *req)
         if (done) {
             nxt = io_put_req_find_next(req);
             if (nxt)
-                io_req_task_submit(nxt);
+                io_req_task_submit(nxt, locked);
         }
     }
 }
@@ -5116,7 +5131,7 @@ static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
     __io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll);
 }
 
-static void io_async_task_func(struct io_kiocb *req)
+static void io_async_task_func(struct io_kiocb *req, bool *locked)
 {
     struct async_poll *apoll = req->apoll;
     struct io_ring_ctx *ctx = req->ctx;
@@ -5133,7 +5148,7 @@ static void io_async_task_func(struct io_kiocb *req)
     spin_unlock(&ctx->completion_lock);
 
     if (!READ_ONCE(apoll->poll.canceled))
-        io_req_task_submit(req);
+        io_req_task_submit(req, locked);
     else
         io_req_complete_failed(req, -ECANCELED);
 }
@@ -5532,7 +5547,7 @@ static int io_poll_update(struct io_kiocb *req, unsigned int issue_flags)
     return 0;
 }
 
-static void io_req_task_timeout(struct io_kiocb *req)
+static void io_req_task_timeout(struct io_kiocb *req, bool *locked)
 {
     req_set_fail(req);
     io_req_complete_post(req, -ETIME, 0);
@@ -6087,7 +6102,7 @@ static bool io_drain_req(struct io_kiocb *req)
     if (!req_need_defer(req, seq) && list_empty(&ctx->defer_list)) {
         spin_unlock(&ctx->completion_lock);
         kfree(de);
-        io_queue_async_work(req);
+        io_queue_async_work(req, NULL);
         return true;
     }
 
@@ -6409,7 +6424,7 @@ static inline struct file *io_file_get(struct io_ring_ctx *ctx,
         return io_file_get_normal(ctx, req, fd);
 }
 
-static void io_req_task_link_timeout(struct io_kiocb *req)
+static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
 {
     struct io_kiocb *prev = req->timeout.prev;
     int ret;
@@ -6513,7 +6528,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
          * Queued up for async execution, worker will release
          * submit reference when the iocb is actually submitted.
          */
-        io_queue_async_work(req);
+        io_queue_async_work(req, NULL);
         break;
     }
 
@@ -6538,7 +6553,7 @@ static inline void io_queue_sqe(struct io_kiocb *req)
         if (unlikely(ret))
             io_req_complete_failed(req, ret);
         else
-            io_queue_async_work(req);
+            io_queue_async_work(req, NULL);
     }
 }
 
--
2.32.0
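To illustrate the new convention for handlers, here is a hypothetical example
(io_req_task_example() is not part of the patch; io_tw_lock() and
io_req_complete_failed() are): a handler that needs ->uring_lock now asks for
it lazily and does not unlock, leaving that to tctx_task_work() or
io_fallback_req_func(), which also flush any inline completions accumulated
under the lock.

static void io_req_task_example(struct io_kiocb *req, bool *locked)
{
    /* no-op if the task_work loop already holds ->uring_lock for this ctx */
    io_tw_lock(req->ctx, locked);
    io_req_complete_failed(req, -ECANCELED);
    /* no mutex_unlock() here: the task_work loop drops the lock later */
}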
* [PATCH 3/3] io_uring: IRQ rw completion batching

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

Employ inline completion logic for read/write completions done via
io_req_task_complete(). If ->uring_lock is contended, just do normal
request completion, but if not, make tctx_task_work() grab the lock
and do batched inline completions in io_req_task_complete().

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54c4d8326944..7179e34df8e9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2061,6 +2061,8 @@ static void tctx_task_work(struct callback_head *cb)
             if (req->ctx != ctx) {
                 ctx_flush_and_put(ctx, &locked);
                 ctx = req->ctx;
+                /* if not contended, grab and improve batching */
+                locked = mutex_trylock(&ctx->uring_lock);
             }
             req->io_task_work.func(req, &locked);
             node = next;
@@ -2572,7 +2574,20 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
 
 static void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
-    __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
+    unsigned int cflags = io_put_rw_kbuf(req);
+    long res = req->result;
+
+    if (*locked) {
+        struct io_ring_ctx *ctx = req->ctx;
+        struct io_submit_state *state = &ctx->submit_state;
+
+        io_req_complete_state(req, res, cflags);
+        state->compl_reqs[state->compl_nr++] = req;
+        if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
+            io_submit_flush_completions(ctx);
+    } else {
+        io_req_complete_post(req, res, cflags);
+    }
 }
 
 static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
--
2.32.0
* Re: [PATCH 0/3] tw mutex & IRQ rw completion batching

From: Jens Axboe @ 2021-08-19 15:53 UTC
To: Pavel Begunkov, io-uring

On 8/18/21 5:42 AM, Pavel Begunkov wrote:
> In essence, it's about two features. The first one is implemented by
> 1-2 and saves ->uring_lock lock/unlock in a single call of
> tctx_task_work(). Should be useful for links, apolls and BPF requests
> at some moment.
>
> The second feature (3/3) is batching freeing and completing of
> IRQ-based read/write requests.
>
> Haven't got numbers yet, but just throwing it for public discussion.

I ran some numbers and it looks good to me, it's a nice boost for the
IRQ completions. It's funny how the initial move to task_work for IRQ
completions took a small hit, but there's so many optimizations that it
unlocks that it's already better than before.

I'd like to apply 1/3 for now, but it depends on both master and
for-5.15/io_uring. Hence I think it'd be better to defer that one until
after the initial batch has gone in.

For the batched locking, the principle is sound and measures out to be a
nice win. But I have a hard time getting over the passed lock state, I
do wonder if there's a cleaner way to accomplish this...

--
Jens Axboe
* Re: [PATCH 0/3] tw mutex & IRQ rw completion batching

From: Pavel Begunkov @ 2021-08-19 16:35 UTC
To: Jens Axboe, io-uring

On 8/19/21 4:53 PM, Jens Axboe wrote:
> On 8/18/21 5:42 AM, Pavel Begunkov wrote:
>> In essence, it's about two features. The first one is implemented by
>> 1-2 and saves ->uring_lock lock/unlock in a single call of
>> tctx_task_work(). Should be useful for links, apolls and BPF requests
>> at some moment.
>>
>> The second feature (3/3) is batching freeing and completing of
>> IRQ-based read/write requests.
>>
>> Haven't got numbers yet, but just throwing it for public discussion.
>
> I ran some numbers and it looks good to me, it's a nice boost for the
> IRQ completions. It's funny how the initial move to task_work for IRQ
> completions took a small hit, but there's so many optimizations that it
> unlocks that it's already better than before.
>
> I'd like to apply 1/3 for now, but it depends on both master and
> for-5.15/io_uring. Hence I think it'd be better to defer that one until
> after the initial batch has gone in.
>
> For the batched locking, the principle is sound and measures out to be a
> nice win. But I have a hard time getting over the passed lock state, I
> do wonder if there's a cleaner way to accomplish this...

The initial idea was to have a request flag telling whether a task_work
function may need a lock, but setting/clearing it would be more subtle.
Then there is io_poll_task_func -> io_req_task_submit -> lock, and
reads/writes use trylock, because otherwise I'd be afraid of it hurting
latency. This version looks good enough, apart from conditional locking
always being a pain.

We can hide the bool in a struct, and with a bunch of helpers leave no
visibility into it. Though I don't think it helps anything.

--
Pavel Begunkov
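One possible shape of the "hide the bool in a struct" idea mentioned above.
This is purely hypothetical and not something from the series (the io_tw_state
names are invented); the point is only that a small wrapper plus helpers could
keep the conditional locking out of sight of individual handlers.

struct io_tw_state {
    bool locked;
};

static inline void io_tw_state_lock(struct io_ring_ctx *ctx,
                    struct io_tw_state *ts)
{
    if (!ts->locked) {
        mutex_lock(&ctx->uring_lock);
        ts->locked = true;
    }
}

static inline void io_tw_state_unlock(struct io_ring_ctx *ctx,
                      struct io_tw_state *ts)
{
    if (ts->locked) {
        mutex_unlock(&ctx->uring_lock);
        ts->locked = false;
    }
}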