* [PATCH 0/3] tw mutex & IRQ rw completion batching
From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring
In essence, it's about two features. The first one is implemented by
1-2 and saves ->uring_lock lock/unlock in a single call of
tctx_task_work(). Should be useful for links, apolls and BPF requests
at some moment.
The second feature (3/3) is batching freeing and completing of
IRQ-based read/write requests.
Haven't got numbers yet, but just throwing it for public discussion.
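For illustration, the core of 1-2 boils down to roughly the following (simplified sketch; io_tw_lock() is taken from patch 2, the handler below is just a stand-in, not a real function in the series):

	static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
	{
		if (!*locked) {
			mutex_lock(&ctx->uring_lock);
			*locked = true;
		}
	}

	/*
	 * A task_work handler only takes ->uring_lock lazily and never drops
	 * it itself; tctx_task_work() flushes completions and unlocks once
	 * per batch instead of once per request.
	 */
	static void example_tw_handler(struct io_kiocb *req, bool *locked)
	{
		io_tw_lock(req->ctx, locked);
		/* ... do work that needs ->uring_lock ... */
	}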
P.S. the new horrendous part of io_req_task_complete(), placing requests into
state->compl_reqs and flushing them, is temporary and will be killed by other
planned patches.
Pavel Begunkov (3):
io_uring: flush completions for fallbacks
io_uring: batch task work locking
io_uring: IRQ rw completion batching
fs/io_uring.c | 103 +++++++++++++++++++++++++++++++++-----------------
1 file changed, 69 insertions(+), 34 deletions(-)
--
2.32.0
* [PATCH 1/3] io_uring: flush completions for fallbacks
From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring
io_fallback_req_func() doesn't expect anyone creating inline
completions, and no one currently does that. Teach the function to flush
completions preparing for further changes.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 3da9f1374612..ba087f395507 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
percpu_ref_get(&ctx->refs);
llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
req->io_task_work.func(req);
+
+ mutex_lock(&ctx->uring_lock);
+ if (ctx->submit_state.compl_nr)
+ io_submit_flush_completions(ctx);
+ mutex_unlock(&ctx->uring_lock);
percpu_ref_put(&ctx->refs);
}
--
2.32.0
* [PATCH 2/3] io_uring: batch task work locking
From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring
Many task_work handlers either grab ->uring_lock or may benefit from
having it. Move the locking logic out of the individual handlers into a
lazy approach controlled by tctx_task_work(), so we don't keep doing
tons of mutex lock/unlock cycles.
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 89 ++++++++++++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 37 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ba087f395507..54c4d8326944 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -775,7 +775,7 @@ struct async_poll {
struct io_poll_iocb *double_poll;
};
-typedef void (*io_req_tw_func_t)(struct io_kiocb *req);
+typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked);
struct io_task_work {
union {
@@ -1080,6 +1080,14 @@ struct sock *io_uring_get_socket(struct file *file)
}
EXPORT_SYMBOL(io_uring_get_socket);
+static inline void io_tw_lock(struct io_ring_ctx *ctx, bool *locked)
+{
+ if (!*locked) {
+ mutex_lock(&ctx->uring_lock);
+ *locked = true;
+ }
+}
+
#define io_for_each_link(pos, head) \
for (pos = (head); pos; pos = pos->link)
@@ -1193,16 +1201,19 @@ static void io_fallback_req_func(struct work_struct *work)
fallback_work.work);
struct llist_node *node = llist_del_all(&ctx->fallback_llist);
struct io_kiocb *req, *tmp;
+ bool locked = false;
percpu_ref_get(&ctx->refs);
llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
- req->io_task_work.func(req);
+ req->io_task_work.func(req, &locked);
- mutex_lock(&ctx->uring_lock);
- if (ctx->submit_state.compl_nr)
- io_submit_flush_completions(ctx);
- mutex_unlock(&ctx->uring_lock);
+ if (locked) {
+ if (ctx->submit_state.compl_nr)
+ io_submit_flush_completions(ctx);
+ mutex_unlock(&ctx->uring_lock);
+ }
percpu_ref_put(&ctx->refs);
+
}
static struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
@@ -1386,12 +1397,15 @@ static void io_prep_async_link(struct io_kiocb *req)
}
}
-static void io_queue_async_work(struct io_kiocb *req)
+static void io_queue_async_work(struct io_kiocb *req, bool *locked)
{
struct io_ring_ctx *ctx = req->ctx;
struct io_kiocb *link = io_prep_linked_timeout(req);
struct io_uring_task *tctx = req->task->io_uring;
+ /* must not take the lock, NULL it as a precaution */
+ locked = NULL;
+
BUG_ON(!tctx);
BUG_ON(!tctx->io_wq);
@@ -2002,14 +2016,15 @@ static inline struct io_kiocb *io_req_find_next(struct io_kiocb *req)
return __io_req_find_next(req);
}
-static void ctx_flush_and_put(struct io_ring_ctx *ctx)
+static void ctx_flush_and_put(struct io_ring_ctx *ctx, bool *locked)
{
if (!ctx)
return;
- if (ctx->submit_state.compl_nr) {
- mutex_lock(&ctx->uring_lock);
- io_submit_flush_completions(ctx);
+ if (*locked) {
+ if (ctx->submit_state.compl_nr)
+ io_submit_flush_completions(ctx);
mutex_unlock(&ctx->uring_lock);
+ *locked = false;
}
}
@@ -2021,6 +2036,7 @@ static void ctx_flush_and_put(struct io_ring_ctx *ctx)
*/
static void tctx_task_work(struct callback_head *cb)
{
+ bool locked = false;
struct io_ring_ctx *ctx = NULL;
struct io_uring_task *tctx = container_of(cb, struct io_uring_task,
task_work);
@@ -2043,17 +2059,17 @@ static void tctx_task_work(struct callback_head *cb)
io_task_work.node);
if (req->ctx != ctx) {
- ctx_flush_and_put(ctx);
+ ctx_flush_and_put(ctx, &locked);
ctx = req->ctx;
}
- req->io_task_work.func(req);
+ req->io_task_work.func(req, &locked);
node = next;
} while (node);
cond_resched();
}
- ctx_flush_and_put(ctx);
+ ctx_flush_and_put(ctx, &locked);
}
static void io_req_task_work_add(struct io_kiocb *req)
@@ -2105,27 +2121,21 @@ static void io_req_task_work_add(struct io_kiocb *req)
}
}
-static void io_req_task_cancel(struct io_kiocb *req)
+static void io_req_task_cancel(struct io_kiocb *req, bool *locked)
{
- struct io_ring_ctx *ctx = req->ctx;
-
- /* ctx is guaranteed to stay alive while we hold uring_lock */
- mutex_lock(&ctx->uring_lock);
+ io_tw_lock(req->ctx, locked);
io_req_complete_failed(req, req->result);
- mutex_unlock(&ctx->uring_lock);
}
-static void io_req_task_submit(struct io_kiocb *req)
+static void io_req_task_submit(struct io_kiocb *req, bool *locked)
{
- struct io_ring_ctx *ctx = req->ctx;
-
/* ctx stays valid until unlock, even if we drop all ours ctx->refs */
- mutex_lock(&ctx->uring_lock);
+ io_tw_lock(req->ctx, locked);
+
if (likely(!(req->task->flags & PF_EXITING)))
__io_queue_sqe(req);
else
io_req_complete_failed(req, -EFAULT);
- mutex_unlock(&ctx->uring_lock);
}
static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
@@ -2161,6 +2171,11 @@ static void io_free_req(struct io_kiocb *req)
__io_free_req(req);
}
+static void io_free_req_work(struct io_kiocb *req, bool *locked)
+{
+ io_free_req(req);
+}
+
struct req_batch {
struct task_struct *task;
int task_refs;
@@ -2260,7 +2275,7 @@ static inline void io_put_req(struct io_kiocb *req)
static inline void io_put_req_deferred(struct io_kiocb *req)
{
if (req_ref_put_and_test(req)) {
- req->io_task_work.func = io_free_req;
+ req->io_task_work.func = io_free_req_work;
io_req_task_work_add(req);
}
}
@@ -2555,7 +2570,7 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
return false;
}
-static void io_req_task_complete(struct io_kiocb *req)
+static void io_req_task_complete(struct io_kiocb *req, bool *locked)
{
__io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
}
@@ -2565,7 +2580,7 @@ static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
{
if (__io_complete_rw_common(req, res))
return;
- io_req_task_complete(req);
+ __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
}
static void io_complete_rw(struct kiocb *kiocb, long res, long res2)
@@ -4980,7 +4995,7 @@ static bool io_poll_complete(struct io_kiocb *req, __poll_t mask)
return !(flags & IORING_CQE_F_MORE);
}
-static void io_poll_task_func(struct io_kiocb *req)
+static void io_poll_task_func(struct io_kiocb *req, bool *locked)
{
struct io_ring_ctx *ctx = req->ctx;
struct io_kiocb *nxt;
@@ -5004,7 +5019,7 @@ static void io_poll_task_func(struct io_kiocb *req)
if (done) {
nxt = io_put_req_find_next(req);
if (nxt)
- io_req_task_submit(nxt);
+ io_req_task_submit(nxt, locked);
}
}
}
@@ -5116,7 +5131,7 @@ static void io_async_queue_proc(struct file *file, struct wait_queue_head *head,
__io_queue_proc(&apoll->poll, pt, head, &apoll->double_poll);
}
-static void io_async_task_func(struct io_kiocb *req)
+static void io_async_task_func(struct io_kiocb *req, bool *locked)
{
struct async_poll *apoll = req->apoll;
struct io_ring_ctx *ctx = req->ctx;
@@ -5133,7 +5148,7 @@ static void io_async_task_func(struct io_kiocb *req)
spin_unlock(&ctx->completion_lock);
if (!READ_ONCE(apoll->poll.canceled))
- io_req_task_submit(req);
+ io_req_task_submit(req, locked);
else
io_req_complete_failed(req, -ECANCELED);
}
@@ -5532,7 +5547,7 @@ static int io_poll_update(struct io_kiocb *req, unsigned int issue_flags)
return 0;
}
-static void io_req_task_timeout(struct io_kiocb *req)
+static void io_req_task_timeout(struct io_kiocb *req, bool *locked)
{
req_set_fail(req);
io_req_complete_post(req, -ETIME, 0);
@@ -6087,7 +6102,7 @@ static bool io_drain_req(struct io_kiocb *req)
if (!req_need_defer(req, seq) && list_empty(&ctx->defer_list)) {
spin_unlock(&ctx->completion_lock);
kfree(de);
- io_queue_async_work(req);
+ io_queue_async_work(req, NULL);
return true;
}
@@ -6409,7 +6424,7 @@ static inline struct file *io_file_get(struct io_ring_ctx *ctx,
return io_file_get_normal(ctx, req, fd);
}
-static void io_req_task_link_timeout(struct io_kiocb *req)
+static void io_req_task_link_timeout(struct io_kiocb *req, bool *locked)
{
struct io_kiocb *prev = req->timeout.prev;
int ret;
@@ -6513,7 +6528,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
* Queued up for async execution, worker will release
* submit reference when the iocb is actually submitted.
*/
- io_queue_async_work(req);
+ io_queue_async_work(req, NULL);
break;
}
@@ -6538,7 +6553,7 @@ static inline void io_queue_sqe(struct io_kiocb *req)
if (unlikely(ret))
io_req_complete_failed(req, ret);
else
- io_queue_async_work(req);
+ io_queue_async_work(req, NULL);
}
}
--
2.32.0
* [PATCH 3/3] io_uring: IRQ rw completion batching
From: Pavel Begunkov @ 2021-08-18 11:42 UTC
To: Jens Axboe, io-uring
Employ inline completion logic for read/write completions done via
io_req_task_complete(). If ->uring_lock is contended, just do normal
request completion, but if it's not, have tctx_task_work() grab the lock
and do batched inline completions in io_req_task_complete().
Signed-off-by: Pavel Begunkov <[email protected]>
---
fs/io_uring.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 54c4d8326944..7179e34df8e9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2061,6 +2061,8 @@ static void tctx_task_work(struct callback_head *cb)
if (req->ctx != ctx) {
ctx_flush_and_put(ctx, &locked);
ctx = req->ctx;
+ /* if not contended, grab and improve batching */
+ locked = mutex_trylock(&ctx->uring_lock);
}
req->io_task_work.func(req, &locked);
node = next;
@@ -2572,7 +2574,20 @@ static bool __io_complete_rw_common(struct io_kiocb *req, long res)
static void io_req_task_complete(struct io_kiocb *req, bool *locked)
{
- __io_req_complete(req, 0, req->result, io_put_rw_kbuf(req));
+ unsigned int cflags = io_put_rw_kbuf(req);
+ long res = req->result;
+
+ if (*locked) {
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_submit_state *state = &ctx->submit_state;
+
+ io_req_complete_state(req, res, cflags);
+ state->compl_reqs[state->compl_nr++] = req;
+ if (state->compl_nr == ARRAY_SIZE(state->compl_reqs))
+ io_submit_flush_completions(ctx);
+ } else {
+ io_req_complete_post(req, res, cflags);
+ }
}
static void __io_complete_rw(struct io_kiocb *req, long res, long res2,
--
2.32.0
* Re: [PATCH 0/3] tw mutex & IRQ rw completion batching
From: Jens Axboe @ 2021-08-19 15:53 UTC
To: Pavel Begunkov, io-uring
On 8/18/21 5:42 AM, Pavel Begunkov wrote:
> In essence, it's about two features. The first one is implemented by
> 1-2 and saves ->uring_lock lock/unlock in a single call of
> tctx_task_work(). Should be useful for links, apolls and BPF requests
> at some moment.
>
> The second feature (3/3) is batching freeing and completing of
> IRQ-based read/write requests.
>
> Haven't got numbers yet, but just throwing it for public discussion.
I ran some numbers and it looks good to me, it's a nice boost for the
IRQ completions. It's funny how the initial move to task_work for IRQ
completions took a small hit, but there's so many optimizations that it
unlocks that it's already better than before.
I'd like to apply 1/3 for now, but it depends on both master and
for-5.15/io_uring. Hence I think it'd be better to defer that one until
after the initial batch has gone in.
For the batched locking, the principle is sound and measures out to be a
nice win. But I have a hard time getting over the passed lock state, I
do wonder if there's a cleaner way to accomplish this...
--
Jens Axboe
* Re: [PATCH 0/3] tw mutex & IRQ rw completion batching
From: Pavel Begunkov @ 2021-08-19 16:35 UTC
To: Jens Axboe, io-uring
On 8/19/21 4:53 PM, Jens Axboe wrote:
> On 8/18/21 5:42 AM, Pavel Begunkov wrote:
>> In essence, it's about two features. The first one is implemented by
>> 1-2 and saves ->uring_lock lock/unlock in a single call of
>> tctx_task_work(). Should be useful for links, apolls and BPF requests
>> at some moment.
>>
>> The second feature (3/3) is batching freeing and completing of
>> IRQ-based read/write requests.
>>
>> Haven't got numbers yet, but just throwing it for public discussion.
>
> I ran some numbers and it looks good to me, it's a nice boost for the
> IRQ completions. It's funny how the initial move to task_work for IRQ
> completions took a small hit, but there's so many optimizations that it
> unlocks that it's already better than before.
>
> I'd like to apply 1/3 for now, but it depends on both master and
> for-5.15/io_uring. Hence I think it'd be better to defer that one until
> after the initial batch has gone in.
>
> For the batched locking, the principle is sound and measures out to be a
> nice win. But I have a hard time getting over the passed lock state, I
> do wonder if there's a cleaner way to accomplish this...
The initial idea was to have a request flag telling whether a task_work
function may need the lock, but setting/clearing it would be more subtle.
Then there is io_poll_task_func -> io_req_task_submit -> lock, and the
read/write based completions using trylock, because otherwise I'd be
afraid of it hurting latency.
This version looks good enough, apart from conditional locking always
being a pain. We can hide the bool in a struct and, with a bunch of
helpers, leave no visibility into it. Though I don't think that helps
anything.
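E.g., a purely hypothetical sketch of that direction (the struct and the unlock helper are made up here, they are not in the patches):

	/* hypothetical: wrap the flag so handlers never see the raw bool */
	struct io_tw_state {
		bool locked;
	};

	static inline void io_tw_lock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
	{
		if (!ts->locked) {
			mutex_lock(&ctx->uring_lock);
			ts->locked = true;
		}
	}

	static inline void io_tw_unlock(struct io_ring_ctx *ctx, struct io_tw_state *ts)
	{
		if (ts->locked) {
			mutex_unlock(&ctx->uring_lock);
			ts->locked = false;
		}
	}

	/* handlers would then take a struct io_tw_state * instead of a bool * */
	typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);

It wouldn't change the semantics, just hide the flag behind helpers.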
--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks
From: Hao Xu @ 2021-08-20 9:21 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
> io_fallback_req_func() doesn't expect anyone creating inline
> completions, and no one currently does that. Teach the function to flush
> completions preparing for further changes.
>
> Signed-off-by: Pavel Begunkov <[email protected]>
> ---
> fs/io_uring.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 3da9f1374612..ba087f395507 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
> percpu_ref_get(&ctx->refs);
> llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
> req->io_task_work.func(req);
> +
> + mutex_lock(&ctx->uring_lock);
> + if (ctx->submit_state.compl_nr)
> + io_submit_flush_completions(ctx);
> + mutex_unlock(&ctx->uring_lock);
why do we protect io_submit_flush_completions() with uring_lock,
given that it is called in the original context? Btw, why not
use ctx_flush_and_put()?
> percpu_ref_put(&ctx->refs);
> }
>
>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks
From: Hao Xu @ 2021-08-20 9:32 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/8/20 5:21 PM, Hao Xu wrote:
> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>> io_fallback_req_func() doesn't expect anyone creating inline
>> completions, and no one currently does that. Teach the function to flush
>> completions preparing for further changes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> fs/io_uring.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3da9f1374612..ba087f395507 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct
>> work_struct *work)
>> percpu_ref_get(&ctx->refs);
>> llist_for_each_entry_safe(req, tmp, node,
>> io_task_work.fallback_node)
>> req->io_task_work.func(req);
>> +
>> + mutex_lock(&ctx->uring_lock);
>> + if (ctx->submit_state.compl_nr)
>> + io_submit_flush_completions(ctx);
>> + mutex_unlock(&ctx->uring_lock);
> why do we protect io_submit_flush_completions() with uring_lock,
> regarding that it is called in original context. Btw, why not
I mean it was in the original context before this patch.
> use ctx_flush_and_put()
>> percpu_ref_put(&ctx->refs);
>> }
>>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks
From: Pavel Begunkov @ 2021-08-20 9:49 UTC
To: Hao Xu, Jens Axboe, io-uring
On 8/20/21 10:21 AM, Hao Xu wrote:
> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>> io_fallback_req_func() doesn't expect anyone creating inline
>> completions, and no one currently does that. Teach the function to flush
>> completions preparing for further changes.
>>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> ---
>> fs/io_uring.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 3da9f1374612..ba087f395507 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>> percpu_ref_get(&ctx->refs);
>> llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>> req->io_task_work.func(req);
>> +
>> + mutex_lock(&ctx->uring_lock);
>> + if (ctx->submit_state.compl_nr)
>> + io_submit_flush_completions(ctx);
>> + mutex_unlock(&ctx->uring_lock);
> why do we protect io_submit_flush_completions() with uring_lock,
> regarding that it is called in original context. Btw, why not
> use ctx_flush_and_put()
The fallback thing is called from a workqueue, not the submitter task
context. See delayed_work and so on.
Regarding locking, it touches struct io_submit_state, which is protected by
->uring_lock. In particular we're interested in ->reqs and ->free_list.
FWIW, there is refurbishment going on around the submit state, so if this
proves useful the locking may change in the coming months.
ctx_flush_and_put() could have been used, but it's simpler to hand-code it
and avoid the (always messy) conditional ref grabbing and locking.
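For reference, a rough sketch of the relevant part of that state (trimmed, with sizes and field ordering approximate; see the full struct io_submit_state in fs/io_uring.c):

	/* rough sketch: everything here is serialised by ->uring_lock */
	struct io_submit_state {
		/* ... */
		struct io_kiocb		*compl_reqs[32];	/* inline completion batch */
		unsigned int		compl_nr;		/* entries queued in compl_reqs[] */
		struct list_head	free_list;		/* cached free requests */
		/* ..., plus the ->reqs[] alloc cache mentioned above */
	};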
--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks
From: Hao Xu @ 2021-08-20 10:16 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
> On 8/20/21 10:21 AM, Hao Xu wrote:
>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>> io_fallback_req_func() doesn't expect anyone creating inline
>>> completions, and no one currently does that. Teach the function to flush
>>> completions preparing for further changes.
>>>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> ---
>>> fs/io_uring.c | 5 +++++
>>> 1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 3da9f1374612..ba087f395507 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>> percpu_ref_get(&ctx->refs);
>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>> req->io_task_work.func(req);
>>> +
>>> + mutex_lock(&ctx->uring_lock);
>>> + if (ctx->submit_state.compl_nr)
>>> + io_submit_flush_completions(ctx);
>>> + mutex_unlock(&ctx->uring_lock);
>> why do we protect io_submit_flush_completions() with uring_lock,
>> regarding that it is called in original context. Btw, why not
>> use ctx_flush_and_put()
>
> The fallback thing is called from a workqueue not the submitter task
> context. See delayed_work and so.
>
> Regarding locking, it touches struct io_submit_state, and it's protected by
> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
> FWIW, there is refurbishment going on around submit state, so if proves
> useful the locking may change in coming months.
>
> ctx_flush_and_put() could have been used, but simpler to hand code it
> and avoid the (always messy) conditional ref grabbing and locking.
I didn't get it; what do you mean by 'avoid the (always messy) conditional
ref grabbing and locking'? The code here and in ctx_flush_and_put() is the
same... though I think in ctx_flush_and_put() there is a problem:
compl_nr should also be protected.
>
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks
From: Pavel Begunkov @ 2021-08-20 12:26 UTC
To: Hao Xu, Jens Axboe, io-uring
On 8/20/21 11:16 AM, Hao Xu wrote:
> On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
>> On 8/20/21 10:21 AM, Hao Xu wrote:
>>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>>> io_fallback_req_func() doesn't expect anyone creating inline
>>>> completions, and no one currently does that. Teach the function to flush
>>>> completions preparing for further changes.
>>>>
>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>> ---
>>>> fs/io_uring.c | 5 +++++
>>>> 1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>> index 3da9f1374612..ba087f395507 100644
>>>> --- a/fs/io_uring.c
>>>> +++ b/fs/io_uring.c
>>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>> percpu_ref_get(&ctx->refs);
>>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>> req->io_task_work.func(req);
>>>> +
>>>> + mutex_lock(&ctx->uring_lock);
>>>> + if (ctx->submit_state.compl_nr)
>>>> + io_submit_flush_completions(ctx);
>>>> + mutex_unlock(&ctx->uring_lock);
>>> why do we protect io_submit_flush_completions() with uring_lock,
>>> regarding that it is called in original context. Btw, why not
>>> use ctx_flush_and_put()
>>
>> The fallback thing is called from a workqueue not the submitter task
>> context. See delayed_work and so.
>>
>> Regarding locking, it touches struct io_submit_state, and it's protected by
>> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
>> FWIW, there is refurbishment going on around submit state, so if proves
>> useful the locking may change in coming months.
>>
>> ctx_flush_and_put() could have been used, but simpler to hand code it
>> and avoid the (always messy) conditional ref grabbing and locking.
> I didn't get it, what do you mean 'avoid the (always messy) conditional
> ref grabbing and locking'? the code here and in ctx_flush_and_put() are
> same..though I think in ctx_flush_and_put(), there is a problem that
> compl_nr should also be protected.
Ok, the long story. First, notice the ctx check at the beginning of
ctx_flush_and_put(); that one is conditional. Even though we know
it's not NULL here, it's more confusing and might be a problem for
both static analysis and human readers.
Also, locking is never easy, so IMHO it's preferable to keep a lock()
and its matching unlock (or get/put) in the same function when
possible; it's much easier to read. Compare
ref_get();
do_something();
ref_put();
and
ref_get();
do_something();
flush_ctx();
I believe the first one carries less mental overhead.
--
Pavel Begunkov
* Re: [PATCH 1/3] io_uring: flush completions for fallbacks
From: Hao Xu @ 2021-08-20 18:41 UTC
To: Pavel Begunkov, Jens Axboe, io-uring
On 2021/8/20 8:26 PM, Pavel Begunkov wrote:
> On 8/20/21 11:16 AM, Hao Xu wrote:
>> On 2021/8/20 5:49 PM, Pavel Begunkov wrote:
>>> On 8/20/21 10:21 AM, Hao Xu wrote:
>>>> On 2021/8/18 7:42 PM, Pavel Begunkov wrote:
>>>>> io_fallback_req_func() doesn't expect anyone creating inline
>>>>> completions, and no one currently does that. Teach the function to flush
>>>>> completions preparing for further changes.
>>>>>
>>>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>>>> ---
>>>>> fs/io_uring.c | 5 +++++
>>>>> 1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>>>> index 3da9f1374612..ba087f395507 100644
>>>>> --- a/fs/io_uring.c
>>>>> +++ b/fs/io_uring.c
>>>>> @@ -1197,6 +1197,11 @@ static void io_fallback_req_func(struct work_struct *work)
>>>>> percpu_ref_get(&ctx->refs);
>>>>> llist_for_each_entry_safe(req, tmp, node, io_task_work.fallback_node)
>>>>> req->io_task_work.func(req);
>>>>> +
>>>>> + mutex_lock(&ctx->uring_lock);
>>>>> + if (ctx->submit_state.compl_nr)
>>>>> + io_submit_flush_completions(ctx);
>>>>> + mutex_unlock(&ctx->uring_lock);
>>>> why do we protect io_submit_flush_completions() with uring_lock,
>>>> regarding that it is called in original context. Btw, why not
>>>> use ctx_flush_and_put()
>>>
>>> The fallback thing is called from a workqueue not the submitter task
>>> context. See delayed_work and so.
>>>
>>> Regarding locking, it touches struct io_submit_state, and it's protected by
>>> ->uring_lock. In particular we're interested in ->reqs and ->free_list.
>>> FWIW, there is refurbishment going on around submit state, so if proves
>>> useful the locking may change in coming months.
>>>
>>> ctx_flush_and_put() could have been used, but simpler to hand code it
>>> and avoid the (always messy) conditional ref grabbing and locking.
>
>> I didn't get it, what do you mean 'avoid the (always messy) conditional
>> ref grabbing and locking'? the code here and in ctx_flush_and_put() are
>> same..though I think in ctx_flush_and_put(), there is a problem that
>> compl_nr should also be protected.
>
> Ok, the long story. First, notice a ctx check at the beginning of
> ctx_flush_and_put(), that one is conditional. Even though we know
> it's not NULL, it's more confusing and might be a problem for
> static and human analysis.
>
> Also, locking is never easy, and so IMHO it's preferable to keep
> lock() and a matching unlock (or get/put) in the same function if
> possible, much easier to read. Compare
>
> ref_get();
> do_something();
> ref_put();
>
> and
>
> ref_get();
> do_something();
> flush_ctx();
>
> I believe, the first one is of less mental overhead.
Thanks, got it.
>