* [PATCH] io_uring: get rid of intermediate aux cqe caches
@ 2024-02-29 16:36 Pavel Begunkov
2024-02-29 18:03 ` David Wei
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Pavel Begunkov @ 2024-02-29 16:36 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence, dw
With defer taskrun we store aux cqes into a cache array and then flush
into the CQ, and we also maintain the ordering so aux cqes are flushed
before request completions. Why do we need the cache instead of pushing
them directly? We acutally don't, so let's kill it.
One nuance is synchronisation -- the path we touch here is only for
DEFER_TASKRUN and guaranteed to be executed in the task context, and
all cqe posting is serialised by that. We also don't need locks because
of that, see __io_cq_lock().
Signed-off-by: Pavel Begunkov <[email protected]>
---
include/linux/io_uring_types.h | 4 +--
io_uring/io_uring.c | 55 ++++------------------------------
2 files changed, 8 insertions(+), 51 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 854ad67a5f70..35c2945cb1bf 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -204,7 +204,9 @@ struct io_submit_state {
bool plug_started;
bool need_plug;
+ bool flush_cqes;
unsigned short submit_nr;
+
unsigned int cqes_count;
struct blk_plug plug;
};
@@ -338,8 +340,6 @@ struct io_ring_ctx {
unsigned cq_last_tm_flush;
} ____cacheline_aligned_in_smp;
- struct io_uring_cqe completion_cqes[16];
-
spinlock_t completion_lock;
/* IRQ completion list, under ->completion_lock */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 86761ec623f9..07f683368855 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -173,7 +173,7 @@ static struct ctl_table kernel_io_uring_disabled_table[] = {
static inline void io_submit_flush_completions(struct io_ring_ctx *ctx)
{
if (!wq_list_empty(&ctx->submit_state.compl_reqs) ||
- ctx->submit_state.cqes_count)
+ ctx->submit_state.flush_cqes)
__io_submit_flush_completions(ctx);
}
@@ -886,31 +886,6 @@ static bool io_fill_cqe_aux(struct io_ring_ctx *ctx, u64 user_data, s32 res,
return false;
}
-static void __io_flush_post_cqes(struct io_ring_ctx *ctx)
- __must_hold(&ctx->uring_lock)
-{
- struct io_submit_state *state = &ctx->submit_state;
- unsigned int i;
-
- lockdep_assert_held(&ctx->uring_lock);
- for (i = 0; i < state->cqes_count; i++) {
- struct io_uring_cqe *cqe = &ctx->completion_cqes[i];
-
- if (!io_fill_cqe_aux(ctx, cqe->user_data, cqe->res, cqe->flags)) {
- if (ctx->lockless_cq) {
- spin_lock(&ctx->completion_lock);
- io_cqring_event_overflow(ctx, cqe->user_data,
- cqe->res, cqe->flags, 0, 0);
- spin_unlock(&ctx->completion_lock);
- } else {
- io_cqring_event_overflow(ctx, cqe->user_data,
- cqe->res, cqe->flags, 0, 0);
- }
- }
- }
- state->cqes_count = 0;
-}
-
static bool __io_post_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 cflags,
bool allow_overflow)
{
@@ -938,32 +913,15 @@ bool io_fill_cqe_req_aux(struct io_kiocb *req, bool defer, s32 res, u32 cflags)
{
struct io_ring_ctx *ctx = req->ctx;
u64 user_data = req->cqe.user_data;
- struct io_uring_cqe *cqe;
if (!defer)
return __io_post_aux_cqe(ctx, user_data, res, cflags, false);
lockdep_assert_held(&ctx->uring_lock);
+ io_lockdep_assert_cq_locked(ctx);
- if (ctx->submit_state.cqes_count == ARRAY_SIZE(ctx->completion_cqes)) {
- __io_cq_lock(ctx);
- __io_flush_post_cqes(ctx);
- /* no need to flush - flush is deferred */
- __io_cq_unlock_post(ctx);
- }
-
- /* For defered completions this is not as strict as it is otherwise,
- * however it's main job is to prevent unbounded posted completions,
- * and in that it works just as well.
- */
- if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq))
- return false;
-
- cqe = &ctx->completion_cqes[ctx->submit_state.cqes_count++];
- cqe->user_data = user_data;
- cqe->res = res;
- cqe->flags = cflags;
- return true;
+ ctx->submit_state.flush_cqes = true;
+ return io_fill_cqe_aux(ctx, user_data, res, cflags);
}
static void __io_req_complete_post(struct io_kiocb *req, unsigned issue_flags)
@@ -1546,9 +1504,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
struct io_wq_work_node *node;
__io_cq_lock(ctx);
- /* must come first to preserve CQE ordering in failure cases */
- if (state->cqes_count)
- __io_flush_post_cqes(ctx);
+
__wq_list_for_each(node, &state->compl_reqs) {
struct io_kiocb *req = container_of(node, struct io_kiocb,
comp_list);
@@ -1570,6 +1526,7 @@ void __io_submit_flush_completions(struct io_ring_ctx *ctx)
io_free_batch_list(ctx, state->compl_reqs.first);
INIT_WQ_LIST(&state->compl_reqs);
}
+ ctx->submit_state.flush_cqes = false;
}
static unsigned io_cqring_events(struct io_ring_ctx *ctx)
--
2.43.0
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-02-29 16:36 [PATCH] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
@ 2024-02-29 18:03 ` David Wei
2024-03-01 1:57 ` Pavel Begunkov
2024-03-01 3:45 ` Jens Axboe
2024-03-01 3:45 ` Jens Axboe
2 siblings, 1 reply; 8+ messages in thread
From: David Wei @ 2024-02-29 18:03 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: Jens Axboe
On 2024-02-29 16:36, Pavel Begunkov wrote:
> With defer taskrun we store aux cqes into a cache array and then flush
> into the CQ, and we also maintain the ordering so aux cqes are flushed
> before request completions. Why do we need the cache instead of pushing
> them directly? We acutally don't, so let's kill it.
>
> One nuance is synchronisation -- the path we touch here is only for
> DEFER_TASKRUN and guaranteed to be executed in the task context, and
> all cqe posting is serialised by that. We also don't need locks because
> of that, see __io_cq_lock().
Overall the change looks good to me. Are there any tests that check
corner cases for multishot recv() CQE ordering?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-02-29 18:03 ` David Wei
@ 2024-03-01 1:57 ` Pavel Begunkov
0 siblings, 0 replies; 8+ messages in thread
From: Pavel Begunkov @ 2024-03-01 1:57 UTC (permalink / raw)
To: David Wei, io-uring; +Cc: Jens Axboe
On 2/29/24 18:03, David Wei wrote:
> On 2024-02-29 16:36, Pavel Begunkov wrote:
>> With defer taskrun we store aux cqes into a cache array and then flush
>> into the CQ, and we also maintain the ordering so aux cqes are flushed
>> before request completions. Why do we need the cache instead of pushing
>> them directly? We acutally don't, so let's kill it.
>>
>> One nuance is synchronisation -- the path we touch here is only for
>> DEFER_TASKRUN and guaranteed to be executed in the task context, and
>> all cqe posting is serialised by that. We also don't need locks because
>> of that, see __io_cq_lock().
>
> Overall the change looks good to me. Are there any tests that check
> corner cases for multishot recv() CQE ordering?
In short, recv-multishot.c:test_enobuf() should do
And it's pretty safe, we want aux cqes first -- we get them
first. They're posted immediately, and it's hard to imagine
it to be any earlier, i.e. posting before anyone even asked
to post them.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-02-29 16:36 [PATCH] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
2024-02-29 18:03 ` David Wei
@ 2024-03-01 3:45 ` Jens Axboe
2024-03-04 22:07 ` Pavel Begunkov
2024-03-01 3:45 ` Jens Axboe
2 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2024-03-01 3:45 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: dw
On 2/29/24 9:36 AM, Pavel Begunkov wrote:
> With defer taskrun we store aux cqes into a cache array and then flush
> into the CQ, and we also maintain the ordering so aux cqes are flushed
> before request completions. Why do we need the cache instead of pushing
> them directly? We acutally don't, so let's kill it.
>
> One nuance is synchronisation -- the path we touch here is only for
> DEFER_TASKRUN and guaranteed to be executed in the task context, and
> all cqe posting is serialised by that. We also don't need locks because
> of that, see __io_cq_lock().
Nicely spotted! Looks good to me.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-03-01 3:45 ` Jens Axboe
@ 2024-03-04 22:07 ` Pavel Begunkov
2024-03-04 22:18 ` Pavel Begunkov
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2024-03-04 22:07 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: dw
On 3/1/24 03:45, Jens Axboe wrote:
> On 2/29/24 9:36 AM, Pavel Begunkov wrote:
>> With defer taskrun we store aux cqes into a cache array and then flush
>> into the CQ, and we also maintain the ordering so aux cqes are flushed
>> before request completions. Why do we need the cache instead of pushing
>> them directly? We acutally don't, so let's kill it.
>>
>> One nuance is synchronisation -- the path we touch here is only for
>> DEFER_TASKRUN and guaranteed to be executed in the task context, and
>> all cqe posting is serialised by that. We also don't need locks because
>> of that, see __io_cq_lock().
>
> Nicely spotted! Looks good to me.
Apparently I'm wrong as "defer" in that function is not about
defer taskrun, but rather IO_URING_F_COMPLETE_DEFER. Jens, can
you drop it for now?
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-03-04 22:07 ` Pavel Begunkov
@ 2024-03-04 22:18 ` Pavel Begunkov
2024-03-04 23:36 ` Jens Axboe
0 siblings, 1 reply; 8+ messages in thread
From: Pavel Begunkov @ 2024-03-04 22:18 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: dw
On 3/4/24 22:07, Pavel Begunkov wrote:
> On 3/1/24 03:45, Jens Axboe wrote:
>> On 2/29/24 9:36 AM, Pavel Begunkov wrote:
>>> With defer taskrun we store aux cqes into a cache array and then flush
>>> into the CQ, and we also maintain the ordering so aux cqes are flushed
>>> before request completions. Why do we need the cache instead of pushing
>>> them directly? We acutally don't, so let's kill it.
>>>
>>> One nuance is synchronisation -- the path we touch here is only for
>>> DEFER_TASKRUN and guaranteed to be executed in the task context, and
>>> all cqe posting is serialised by that. We also don't need locks because
>>> of that, see __io_cq_lock().
>>
>> Nicely spotted! Looks good to me.
>
> Apparently I'm wrong as "defer" in that function is not about
> defer taskrun, but rather IO_URING_F_COMPLETE_DEFER. Jens, can
> you drop it for now?
One way forward would be to limit the aux fast path to defer
taskrun, I don't really care about other cases, and aux is
mostly useful for multishots hence net, which should preferably
be DEFER_TASKRUN.
In any case the code is a bit unhandy if not smelly, can use
some refactoring.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-03-04 22:18 ` Pavel Begunkov
@ 2024-03-04 23:36 ` Jens Axboe
0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-03-04 23:36 UTC (permalink / raw)
To: Pavel Begunkov, io-uring; +Cc: dw
On 3/4/24 3:18 PM, Pavel Begunkov wrote:
> On 3/4/24 22:07, Pavel Begunkov wrote:
>> On 3/1/24 03:45, Jens Axboe wrote:
>>> On 2/29/24 9:36 AM, Pavel Begunkov wrote:
>>>> With defer taskrun we store aux cqes into a cache array and then flush
>>>> into the CQ, and we also maintain the ordering so aux cqes are flushed
>>>> before request completions. Why do we need the cache instead of pushing
>>>> them directly? We acutally don't, so let's kill it.
>>>>
>>>> One nuance is synchronisation -- the path we touch here is only for
>>>> DEFER_TASKRUN and guaranteed to be executed in the task context, and
>>>> all cqe posting is serialised by that. We also don't need locks because
>>>> of that, see __io_cq_lock().
>>>
>>> Nicely spotted! Looks good to me.
>>
>> Apparently I'm wrong as "defer" in that function is not about
>> defer taskrun, but rather IO_URING_F_COMPLETE_DEFER. Jens, can
>> you drop it for now?
Dropped
> One way forward would be to limit the aux fast path to defer
> taskrun, I don't really care about other cases, and aux is
> mostly useful for multishots hence net, which should preferably
> be DEFER_TASKRUN.
I think that'd be fine.
> In any case the code is a bit unhandy if not smelly, can use
> some refactoring.
For sure.
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] io_uring: get rid of intermediate aux cqe caches
2024-02-29 16:36 [PATCH] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
2024-02-29 18:03 ` David Wei
2024-03-01 3:45 ` Jens Axboe
@ 2024-03-01 3:45 ` Jens Axboe
2 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2024-03-01 3:45 UTC (permalink / raw)
To: io-uring, Pavel Begunkov; +Cc: dw
On Thu, 29 Feb 2024 16:36:37 +0000, Pavel Begunkov wrote:
> With defer taskrun we store aux cqes into a cache array and then flush
> into the CQ, and we also maintain the ordering so aux cqes are flushed
> before request completions. Why do we need the cache instead of pushing
> them directly? We acutally don't, so let's kill it.
>
> One nuance is synchronisation -- the path we touch here is only for
> DEFER_TASKRUN and guaranteed to be executed in the task context, and
> all cqe posting is serialised by that. We also don't need locks because
> of that, see __io_cq_lock().
>
> [...]
Applied, thanks!
[1/1] io_uring: get rid of intermediate aux cqe caches
commit: a7fa76c8565aa6cbeb5030a6597d85643de0a306
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2024-03-04 23:36 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-29 16:36 [PATCH] io_uring: get rid of intermediate aux cqe caches Pavel Begunkov
2024-02-29 18:03 ` David Wei
2024-03-01 1:57 ` Pavel Begunkov
2024-03-01 3:45 ` Jens Axboe
2024-03-04 22:07 ` Pavel Begunkov
2024-03-04 22:18 ` Pavel Begunkov
2024-03-04 23:36 ` Jens Axboe
2024-03-01 3:45 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox