* [PATCH 0/3] tw mutex & IRQ rw completion batching

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

In essence, it's about two features. The first one is implemented by
1-2 and saves ->uring_lock lock/unlock in a single call of
tctx_task_work(). Should be useful for links, apolls and BPF requests
at some moment.

The second feature (3/3) is batching freeing and completing of
IRQ-based read/write requests.

Haven't got numbers yet, but just throwing it for public discussion.

P.S. The new horrendous part of io_req_task_complete(), placing requests
into state->compl_reqs and flushing them, is temporary and will be
killed by other planned patches.

Pavel Begunkov (3):
  io_uring: flush completions for fallbacks
  io_uring: batch task work locking
  io_uring: IRQ rw completion batching

 fs/io_uring.c | 103 +++++++++++++++++++++++++++++++++-----------------
 1 file changed, 69 insertions(+), 34 deletions(-)

--
2.32.0
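For orientation, below is a compilable userspace sketch of the lazy-locking idea
from patches 1-2. It is only an analogue: the type and function names here are
invented for illustration, the real code is in the patches that follow. Handlers
request the per-ctx lock through a shared "locked" flag, and the dispatch loop
takes the mutex at most once per batch, dropping it and flushing completions
only when the ctx changes or the work list ends. Build with -pthread.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct ctx {
    pthread_mutex_t uring_lock;
    int compl_nr;
};

struct req {
    struct ctx *ctx;
    void (*func)(struct req *req, bool *locked);
    struct req *next;
};

/* grab the ctx lock lazily; a handler may call this any number of times */
static void tw_lock(struct ctx *ctx, bool *locked)
{
    if (!*locked) {
        pthread_mutex_lock(&ctx->uring_lock);
        *locked = true;
    }
}

/* flush batched completions and drop the lock, but only if we hold it */
static void ctx_flush_and_unlock(struct ctx *ctx, bool *locked)
{
    if (!ctx || !*locked)
        return;
    if (ctx->compl_nr)
        printf("flushing %d completions\n", ctx->compl_nr);
    ctx->compl_nr = 0;
    pthread_mutex_unlock(&ctx->uring_lock);
    *locked = false;
}

/* a handler: queue an inline completion under the (batched) lock */
static void complete_req(struct req *req, bool *locked)
{
    tw_lock(req->ctx, locked);
    req->ctx->compl_nr++;
}

/* analogue of tctx_task_work(): one lock/unlock per ctx run, not per request */
static void run_task_work(struct req *node)
{
    struct ctx *ctx = NULL;
    bool locked = false;

    for (; node; node = node->next) {
        if (node->ctx != ctx) {
            ctx_flush_and_unlock(ctx, &locked);
            ctx = node->ctx;
        }
        node->func(node, &locked);
    }
    ctx_flush_and_unlock(ctx, &locked);
}

int main(void)
{
    struct ctx c = { .uring_lock = PTHREAD_MUTEX_INITIALIZER };
    struct req r2 = { .ctx = &c, .func = complete_req };
    struct req r1 = { .ctx = &c, .func = complete_req, .next = &r2 };

    run_task_work(&r1);     /* takes the mutex once, flushes 2 completions */
    return 0;
}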
* [PATCH 1/3] io_uring: flush completions for fallbacks

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

io_fallback_req_func() doesn't expect anyone creating inline
completions, and no one currently does that. Teach the function to flush
completions preparing for further changes.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3da9f1374612..ba087f395507 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
     percpu_ref_get(&ctx->refs);
     llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
         req->io_task_work.func(req);
+
+    mutex_lock(&ctx->uring_lock);
+    if (ctx->submit_state.compl_nr)
+        io_submit_flush_completions(ctx);
+    mutex_unlock(&ctx->uring_lock);
     percpu_ref_put(&ctx->refs);
 }
 
--
2.32.0
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 9:21 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
> io_fallback_req_func() doesn't expect anyone creating inline
> completions, and no one currently does that. Teach the function to flush
> completions preparing for further changes.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
>   fs/io_uring.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 3da9f1374612..ba087f395507 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>       percpu_ref_get(&ctx->refs);
>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>           req->io_task_work.func(req);
> +
> +    mutex_lock(&ctx->uring_lock);
> +    if (ctx->submit_state.compl_nr)
> +        io_submit_flush_completions(ctx);
> +    mutex_unlock(&ctx->uring_lock);
why do we protect io_submit_flush_completions() with uring_lock,
regarding that it is called in original context. Btw, why not
use ctx_flush_and_put()
>       percpu_ref_put(&ctx->refs);
>   }
>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 9:32 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/20 5:21 PM, Hao Xu wrote:
> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>> io_fallback_req_func() doesn't expect anyone creating inline
>> completions, and no one currently does that. Teach the function to flush
>> completions preparing for further changes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>   fs/io_uring.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3da9f1374612..ba087f395507 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct
>> work_struct *work)
>>       percpu_ref_get(&ctx->refs);
>>       llist_for_each_entry_safe(req, tmp, node,
>> io_task_work.fallback_node)
>>           req->io_task_work.func(req);
>> +
>> +    mutex_lock(&ctx->uring_lock);
>> +    if (ctx->submit_state.compl_nr)
>> +        io_submit_flush_completions(ctx);
>> +    mutex_unlock(&ctx->uring_lock);
> why do we protect io_submit_flush_completions() with uring_lock,
> regarding that it is called in original context. Btw, why not
I mean it is in the original context before this patch..
> use ctx_flush_and_put()
>>       percpu_ref_put(&ctx->refs);
>>   }
>>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Pavel Begunkov @ 2021-08-20 9:49 UTC
To: Hao Xu, Jens Axboe, io-uring

On 8/20/21 10:21 AM, Hao Xu wrote:
> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>> io_fallback_req_func() doesn't expect anyone creating inline
>> completions, and no one currently does that. Teach the function to flush
>> completions preparing for further changes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>>   fs/io_uring.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3da9f1374612..ba087f395507 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>       percpu_ref_get(&ctx->refs);
>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>           req->io_task_work.func(req);
>> +
>> +    mutex_lock(&ctx->uring_lock);
>> +    if (ctx->submit_state.compl_nr)
>> +        io_submit_flush_completions(ctx);
>> +    mutex_unlock(&ctx->uring_lock);
> why do we protect io_submit_flush_completions() with uring_lock,
> regarding that it is called in original context. Btw, why not
> use ctx_flush_and_put()

The fallback thing is called from a workqueue, not the submitter task
context. See delayed_work and so on.

Regarding locking, it touches struct io_submit_state, and it's protected by
->uring_lock. In particular we're interested in ->reqs and ->free_list.
FWIW, there is refurbishment going on around submit state, so if it proves
useful the locking may change in coming months.

ctx_flush_and_put() could have been used, but simpler to hand code it
and avoid the (always messy) conditional ref grabbing and locking.

--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 10:16 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
> On 8/20/21 10:21 AM, Hao Xu wrote:
>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>> io_fallback_req_func() doesn't expect anyone creating inline
>>> completions, and no one currently does that. Teach the function to flush
>>> completions preparing for further changes.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>>   fs/io_uring.c | 5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 3da9f1374612..ba087f395507 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>       percpu_ref_get(&ctx->refs);
>>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>           req->io_task_work.func(req);
>>> +
>>> +    mutex_lock(&ctx->uring_lock);
>>> +    if (ctx->submit_state.compl_nr)
>>> +        io_submit_flush_completions(ctx);
>>> +    mutex_unlock(&ctx->uring_lock);
>> why do we protect io_submit_flush_completions() with uring_lock,
>> regarding that it is called in original context. Btw, why not
>> use ctx_flush_and_put()
>
> The fallback thing is called from a workqueue, not the submitter task
> context. See delayed_work and so on.
>
> Regarding locking, it touches struct io_submit_state, and it's protected by
> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
> FWIW, there is refurbishment going on around submit state, so if it proves
> useful the locking may change in coming months.
>
> ctx_flush_and_put() could have been used, but simpler to hand code it
> and avoid the (always messy) conditional ref grabbing and locking.
I didn't get it. What do you mean by 'avoid the (always messy) conditional
ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
same.. though I think in ctx_flush_and_put() there is a problem: compl_nr
should also be protected.
>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Pavel Begunkov @ 2021-08-20 12:26 UTC
To: Hao Xu, Jens Axboe, io-uring

On 8/20/21 11:16 AM, Hao Xu wrote:
> On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
>> On 8/20/21 10:21 AM, Hao Xu wrote:
>>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>>> io_fallback_req_func() doesn't expect anyone creating inline
>>>> completions, and no one currently does that. Teach the function to flush
>>>> completions preparing for further changes.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>>>   fs/io_uring.c | 5 +++++
>>>>   1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 3da9f1374612..ba087f395507 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>       percpu_ref_get(&ctx->refs);
>>>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>>           req->io_task_work.func(req);
>>>> +
>>>> +    mutex_lock(&ctx->uring_lock);
>>>> +    if (ctx->submit_state.compl_nr)
>>>> +        io_submit_flush_completions(ctx);
>>>> +    mutex_unlock(&ctx->uring_lock);
>>> why do we protect io_submit_flush_completions() with uring_lock,
>>> regarding that it is called in original context. Btw, why not
>>> use ctx_flush_and_put()
>>
>> The fallback thing is called from a workqueue, not the submitter task
>> context. See delayed_work and so on.
>>
>> Regarding locking, it touches struct io_submit_state, and it's protected by
>> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
>> FWIW, there is refurbishment going on around submit state, so if it proves
>> useful the locking may change in coming months.
>>
>> ctx_flush_and_put() could have been used, but simpler to hand code it
>> and avoid the (always messy) conditional ref grabbing and locking.

> I didn't get it. What do you mean by 'avoid the (always messy) conditional
> ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
> same.. though I think in ctx_flush_and_put() there is a problem: compl_nr
> should also be protected.

Ok, the long story. First, notice a ctx check at the beginning of
ctx_flush_and_put(); that one is conditional. Even though we know
it's not NULL, it's more confusing and might be a problem for
static and human analysis.

Also, locking is never easy, and so IMHO it's preferable to keep
lock() and a matching unlock (or get/put) in the same function if
possible, much easier to read. Compare

ref_get();
do_something();
ref_put();

and

ref_get();
do_something();
flush_ctx();

I believe the first one is of less mental overhead.

--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks

From: Hao Xu @ 2021-08-20 18:41 UTC
To: Pavel Begunkov, Jens Axboe, io-uring

On 2021/8/20 8:26 PM, Pavel Begunkov wrote:
> On 8/20/21 11:16 AM, Hao Xu wrote:
>> On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
>>> On 8/20/21 10:21 AM, Hao Xu wrote:
>>>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>>>> io_fallback_req_func() doesn't expect anyone creating inline
>>>>> completions, and no one currently does that. Teach the function to flush
>>>>> completions preparing for further changes.
>>>>>
>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>> ---
>>>>>   fs/io_uring.c | 5 +++++
>>>>>   1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 3da9f1374612..ba087f395507 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>>       percpu_ref_get(&ctx->refs);
>>>>>       llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>>>           req->io_task_work.func(req);
>>>>> +
>>>>> +    mutex_lock(&ctx->uring_lock);
>>>>> +    if (ctx->submit_state.compl_nr)
>>>>> +        io_submit_flush_completions(ctx);
>>>>> +    mutex_unlock(&ctx->uring_lock);
>>>> why do we protect io_submit_flush_completions() with uring_lock,
>>>> regarding that it is called in original context. Btw, why not
>>>> use ctx_flush_and_put()
>>>
>>> The fallback thing is called from a workqueue, not the submitter task
>>> context. See delayed_work and so on.
>>>
>>> Regarding locking, it touches struct io_submit_state, and it's protected by
>>> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
>>> FWIW, there is refurbishment going on around submit state, so if it proves
>>> useful the locking may change in coming months.
>>>
>>> ctx_flush_and_put() could have been used, but simpler to hand code it
>>> and avoid the (always messy) conditional ref grabbing and locking.

>> I didn't get it. What do you mean by 'avoid the (always messy) conditional
>> ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
>> same.. though I think in ctx_flush_and_put() there is a problem: compl_nr
>> should also be protected.
>
> Ok, the long story. First, notice a ctx check at the beginning of
> ctx_flush_and_put(); that one is conditional. Even though we know
> it's not NULL, it's more confusing and might be a problem for
> static and human analysis.
>
> Also, locking is never easy, and so IMHO it's preferable to keep
> lock() and a matching unlock (or get/put) in the same function if
> possible, much easier to read. Compare
>
> ref_get();
> do_something();
> ref_put();
>
> and
>
> ref_get();
> do_something();
> flush_ctx();
>
> I believe the first one is of less mental overhead.
Thanks, got it.
>
* [PATCH 2/3] io_uring: batch task work locking

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

Many task_work handlers either grab ->uring_lock, or may benefit from
having it. Move locking logic out of individual handlers to a lazy
approach controlled by tctx_task_work(), so we don't keep doing tons of
mutex lock/unlock.

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 89 ++++++++++++++++++++++++++++++---------------------
 1 file changed, 52 insertions(+), 37 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index ba087f395507..54c4d8326944 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -775,7 +775,7 @@ struct async_poll {
     struct io_poll_iocb *double_poll;
 };
 
-typedef void (*io_req_tw_func_t)(struct io_kiocb *req);
+typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
 
 struct io_task_work {
     union {
@@ -1080,6 +1080,14 @@ struct sock *io_uring_get_socket(struct file *file)
 }
 EXPORT_SYMBOL(io_uring_get_socket);
 
+static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
+{
+    if (!*locked) {
+        mutex_lock(&ctx->uring_lock);
+        *locked = true;
+    }
+}
+
 #define io_for_each_link(pos, head) \
     for (pos = (head); pos; pos = pos->link)
 
@@ -1193,16 +1201,19 @@ static void io_fallback_req_func(struct work_struct *work)
                         fallback_work.work);
     struct llist_node *node = llist_del_all(&ctx->fallback_llist);
     struct io_kiocb *req, *tmp;
+    bool locked = false;
 
     percpu_ref_get(&ctx->refs);
     llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
-        req->io_task_work.func(req);
+        req->io_task_work.func(req, &locked);
 
-    mutex_lock(&ctx->uring_lock);
-    if (ctx->submit_state.compl_nr)
-        io_submit_flush_completions(ctx);
-    mutex_unlock(&ctx->uring_lock);
+    if (locked) {
+        if (ctx->submit_state.compl_nr)
+            io_submit_flush_completions(ctx);
+        mutex_unlock(&ctx->uring_lock);
+    }
     percpu_ref_put(&ctx->refs);
+
 }
 
 static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
@@ -1386,12 +1397,15 @@ static void io_prep_async_link(struct io_kiocb *req)
     }
 }
 
-static void io_queue_async_work(struct io_kiocb *req)
+static void io_queue_async_work(struct io_kiocb *req, bool *locked)
 {
     struct io_ring_ctx *ctx = req->ctx;
     struct io_kiocb *link = io_prep_linked_timeout(req);
     struct io_uring_task *tctx = req->task->io_uring;
 
+    /* must not take the lock, NULL it as a precaution */
+    locked = NULL;
+
     BUG_ON(!tctx);
     BUG_ON(!tctx->io_wq);
@@ -2002,14 +2016,15 @@ static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
     return __io_req_find_next(req);
 }
 
-static void ctx_flush_and_put(struct io_ring_ctx *ctx)
+static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
 {
     if (!ctx)
         return;
-    if (ctx->submit_state.compl_nr) {
-        mutex_lock(&ctx->uring_lock);
-        io_submit_flush_completions(ctx);
+    if (*locked) {
+        if (ctx->submit_state.compl_nr)
+            io_submit_flush_completions(ctx);
         mutex_unlock(&ctx->uring_lock);
+        *locked = false;
     }
 }
@@ -2021,6 +2036,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
  */
 static void tctx_task_work(struct callback_head *cb)
 {
+    bool locked = false;
     struct io_ring_ctx *ctx = NULL;
     struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
                           task_work);
@@ -2043,17 +2059,17 @@ static void tctx_task_work(struct callback_head *cb)
                                 io_task_work.node);
 
             if (req->ctx != ctx) {
-                ctx_flush_and_put(ctx);
+                ctx_flush_and_put(ctx, &locked);
                 ctx = req->ctx;
             }
-            req->io_task_work.func(req);
+            req->io_task_work.func(req, &locked);
             node = next;
         } while (node);
 
         cond_resched();
     }
 
-    ctx_flush_and_put(ctx);
+    ctx_flush_and_put(ctx, &locked);
 }
 
 static void io_req_task_work_add(struct io_kiocb *req)
@@ -2105,27 +2121,21 @@ static void io_req_task_work_add(struct io_kiocb *req)
-static void io_req_task_cancel(struct io_kiocb *req)
+static void io_req_task_cancel(struct io_kiocb *req, bool *locked)
 {
-    struct io_ring_ctx *ctx = req->ctx;
-
-    /* ctx is guaranteed to stay alive while we hold uring_lock */
-    mutex_lock(&ctx->uring_lock);
+    io_tw_lock(req->ctx, locked);
     io_req_complete_failed(req, req->result);
-    mutex_unlock(&ctx->uring_lock);
 }
 
-static void io_req_task_submit(struct io_kiocb *req)
+static void io_req_task_submit(struct io_kiocb *req, bool *locked)
 {
-    struct io_ring_ctx *ctx = req->ctx;
-
-    /* ctx stays valid until unlock, even if we drop all ours ctx->refs */
-    mutex_lock(&ctx->uring_lock);
+    io_tw_lock(req->ctx, locked);
+
     if (likely(!(req->task->flags & PF_EXITING)))
         __io_queue_sqe(req);
     else
         io_req_complete_failed(req, -EFAULT);
-    mutex_unlock(&ctx->uring_lock);
 }
 
 static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
@@ -2161,6 +2171,11 @@ static void io_free_req(struct io_kiocb *req)
     __io_free_req(req);
 }
 
+static void io_free_req_work(struct io_kiocb *req, bool *locked)
+{
+    io_free_req(req);
+}
+
 struct req_batch {
     struct task_struct *task;
     int task_refs;
@@ -2260,7 +2275,7 @@ static inline void io_put_req(struct io_kiocb *req)
 static inline void io_put_req_deferred(struct io_kiocb *req)
 {
     if (req_ref_put_and_test(req)) {
-        req->io_task_work.func = io_free_req;
+        req->io_task_work.func = io_free_req_work;
         io_req_task_work_add(req);
     }
 }
@@ -2555,7 +2570,7 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
     return false;
 }
 
-static void io_req_task_complete(struct io_kiocb *req)
+static void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
     __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
 }
@@ -2565,7 +2580,7 @@ static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
 {
     if (__io_complete_rw_common(req, res))
         return;
-    io_req_task_complete(req);
+    __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
 }
 
 static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
@@ -4980,7 +4995,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask)
     return !(flags & IORING_CQE_F_MORE);
 }
 
-static void io_poll_task_func(struct io_kiocb *req)
+static void io_poll_task_func(struct io_kiocb *req, bool *locked)
 {
     struct io_ring_ctx *ctx = req->ctx;
     struct io_kiocb *nxt;
@@ -5004,7 +5019,7 @@ static void io_poll_task_func(struct io_kiocb *req)
         if (done) {
             nxt = io_put_req_find_next(req);
             if (nxt)
-                io_req_task_submit(nxt);
+                io_req_task_submit(nxt, locked);
         }
     }
 }
@@ -5116,7 +5131,7 @@ static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
     __io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll);
 }
 
-static void io_async_task_func(struct io_kiocb *req)
+static void io_async_task_func(struct io_kiocb *req, bool *locked)
 {
     struct async_poll *apoll = req->apoll;
     struct io_ring_ctx *ctx = req->ctx;
@@ -5133,7 +5148,7 @@ static void io_async_task_func(struct io_kiocb *req)
     spin_unlock(&ctx->completion_lock);
 
     if (!READ_ONCE(apoll->poll.canceled))
-        io_req_task_submit(req);
+        io_req_task_submit(req, locked);
     else
         io_req_complete_failed(req, -ECANCELED);
 }
@@ -5532,7 +5547,7 @@ static int io_poll_update(struct io_kiocb *req, unsigned int issue_flags)
     return 0;
 }
 
-static void io_req_task_timeout(struct io_kiocb *req)
+static void io_req_task_timeout(struct io_kiocb *req, bool *locked)
 {
     req_set_fail(req);
     io_req_complete_post(req, -ETIME, 0);
@@ -6087,7 +6102,7 @@ static bool io_drain_req(struct io_kiocb *req)
     if (!req_need_defer(req, seq) && list_empty(&ctx->defer_list)) {
         spin_unlock(&ctx->completion_lock);
         kfree(de);
-        io_queue_async_work(req);
+        io_queue_async_work(req, NULL);
         return true;
     }
 
@@ -6409,7 +6424,7 @@ static inline struct file *io_file_get(struct io_ring_ctx *ctx,
         return io_file_get_normal(ctx, req, fd);
 }
 
-static void io_req_task_link_timeout(struct io_kiocb *req)
+static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
 {
     struct io_kiocb *prev = req->timeout.prev;
     int ret;
@@ -6513,7 +6528,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
          * Queued up for async execution, worker will release
          * submit reference when the iocb is actually submitted.
          */
-        io_queue_async_work(req);
+        io_queue_async_work(req, NULL);
         break;
     }
 
@@ -6538,7 +6553,7 @@ static inline void io_queue_sqe(struct io_kiocb *req)
         if (unlikely(ret))
             io_req_complete_failed(req, ret);
         else
-            io_queue_async_work(req);
+            io_queue_async_work(req, NULL);
     }
 }
 
--
2.32.0
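To illustrate the new convention for handlers, here is a hypothetical example
(io_req_task_example() is not part of the patch; io_tw_lock() and
io_req_complete_failed() are): a handler that needs ->uring_lock now asks for
it lazily and does not unlock, leaving that to tctx_task_work() or
io_fallback_req_func(), which also flush any inline completions accumulated
under the lock.

static void io_req_task_example(struct io_kiocb *req, bool *locked)
{
    /* no-op if the task_work loop already holds ->uring_lock for this ctx */
    io_tw_lock(req->ctx, locked);
    io_req_complete_failed(req, -ECANCELED);
    /* no mutex_unlock() here: the task_work loop drops the lock later */
}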
* [PATCH 3/3] io_uring: IRQ rw completion batching

From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring

Employ inline completion logic for read/write completions done via
io_req_task_complete(). If ->uring_lock is contended, just do normal
request completion, but if not, make tctx_task_work() grab the lock
and do batched inline completions in io_req_task_complete().

Signed-off-by: Pavel Begunkov <[email protected]>
---
 fs/io_uring.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54c4d8326944..7179e34df8e9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2061,6 +2061,8 @@ static void tctx_task_work(struct callback_head *cb)
             if (req->ctx != ctx) {
                 ctx_flush_and_put(ctx, &locked);
                 ctx = req->ctx;
+                /* if not contended, grab and improve batching */
+                locked = mutex_trylock(&ctx->uring_lock);
             }
             req->io_task_work.func(req, &locked);
             node = next;
@@ -2572,7 +2574,20 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
 
 static void io_req_task_complete(struct io_kiocb *req, bool *locked)
 {
-    __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
+    unsigned int cflags = io_put_rw_kbuf(req);
+    long res = req->result;
+
+    if (*locked) {
+        struct io_ring_ctx *ctx = req->ctx;
+        struct io_submit_state *state = &ctx->submit_state;
+
+        io_req_complete_state(req, res, cflags);
+        state->compl_reqs[state->compl_nr++] = req;
+        if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
+            io_submit_flush_completions(ctx);
+    } else {
+        io_req_complete_post(req, res, cflags);
+    }
 }
 
 static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
--
2.32.0
* Re: [PATCH 0/3] tw mutex & IRQ rw completion batching

From: Jens Axboe @ 2021-08-19 15:53 UTC
To: Pavel Begunkov, io-uring

On 8/18/21 5:42 AM, Pavel Begunkov wrote:
> In essence, it's about two features. The first one is implemented by
> 1-2 and saves ->uring_lock lock/unlock in a single call of
> tctx_task_work(). Should be useful for links, apolls and BPF requests
> at some moment.
>
> The second feature (3/3) is batching freeing and completing of
> IRQ-based read/write requests.
>
> Haven't got numbers yet, but just throwing it for public discussion.

I ran some numbers and it looks good to me, it's a nice boost for the
IRQ completions. It's funny how the initial move to task_work for IRQ
completions took a small hit, but there's so many optimizations that it
unlocks that it's already better than before.

I'd like to apply 1/3 for now, but it depends on both master and
for-5.15/io_uring. Hence I think it'd be better to defer that one until
after the initial batch has gone in.

For the batched locking, the principle is sound and measures out to be a
nice win. But I have a hard time getting over the passed lock state, I
do wonder if there's a cleaner way to accomplish this...

--
Jens Axboe
* Re: [PATCH 0/3] tw mutex & IRQ rw completion batching

From: Pavel Begunkov @ 2021-08-19 16:35 UTC
To: Jens Axboe, io-uring

On 8/19/21 4:53 PM, Jens Axboe wrote:
> On 8/18/21 5:42 AM, Pavel Begunkov wrote:
>> In essence, it's about two features. The first one is implemented by
>> 1-2 and saves ->uring_lock lock/unlock in a single call of
>> tctx_task_work(). Should be useful for links, apolls and BPF requests
>> at some moment.
>>
>> The second feature (3/3) is batching freeing and completing of
>> IRQ-based read/write requests.
>>
>> Haven't got numbers yet, but just throwing it for public discussion.
>
> I ran some numbers and it looks good to me, it's a nice boost for the
> IRQ completions. It's funny how the initial move to task_work for IRQ
> completions took a small hit, but there's so many optimizations that it
> unlocks that it's already better than before.
>
> I'd like to apply 1/3 for now, but it depends on both master and
> for-5.15/io_uring. Hence I think it'd be better to defer that one until
> after the initial batch has gone in.
>
> For the batched locking, the principle is sound and measures out to be a
> nice win. But I have a hard time getting over the passed lock state, I
> do wonder if there's a cleaner way to accomplish this...

The initial idea was to have a request flag telling whether a task_work
function may need a lock, but setting/clearing it would be more subtle.
Then there is io_poll_task_func -> io_req_task_submit -> lock, and
reads/writes use trylock, because otherwise I'd be afraid of it hurting
latency. This version looks good enough, apart from conditional locking
always being a pain.

We can hide the bool in a struct, and with a bunch of helpers leave no
visibility into it. Though I don't think it helps anything.

--
Pavel Begunkov
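One possible shape of the "hide the bool in a struct" idea mentioned above.
This is purely hypothetical and not something from the series (the io_tw_state
names are invented); the point is only that a small wrapper plus helpers could
keep the conditional locking out of sight of individual handlers.

struct io_tw_state {
    bool locked;
};

static inline void io_tw_state_lock(struct io_ring_ctx *ctx,
                    struct io_tw_state *ts)
{
    if (!ts->locked) {
        mutex_lock(&ctx->uring_lock);
        ts->locked = true;
    }
}

static inline void io_tw_state_unlock(struct io_ring_ctx *ctx,
                      struct io_tw_state *ts)
{
    if (ts->locked) {
        mutex_unlock(&ctx->uring_lock);
        ts->locked = false;
    }
}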