From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH] io_uring: ensure deferred completions are flushed for multishot
Date: Wed, 7 May 2025 09:26:58 -0600 [thread overview]
Message-ID: <611393de-4c50-4597-9290-82059e412a4b@kernel.dk> (raw)
In-Reply-To: <f6ac1b25-3185-4d3a-8e8e-d6d2771f2b3c@gmail.com>
On 5/7/25 9:11 AM, Pavel Begunkov wrote:
> On 5/7/25 15:58, Jens Axboe wrote:
>> On 5/7/25 8:36 AM, Pavel Begunkov wrote:
>>> On 5/7/25 14:56, Jens Axboe wrote:
>>>> Multishot normally uses io_req_post_cqe() to post completions, but when
>>>> stopping it, it may finish up with a deferred completion. This is fine,
>>>> except if another multishot event triggers before the deferred completions
>>>> get flushed. If this occurs, then CQEs may get reordered in the CQ ring,
>>>> as new multishot completions get posted before the deferred ones are
>>>> flushed. This can cause confusion on the application side, if strict
>>>> ordering is required for the use case.
>>>>
>>>> When multishot posting via io_req_post_cqe(), flush any pending deferred
>>>> completions first, if any.
>>>>
>>>> Cc: stable@vger.kernel.org # 6.1+
>>>> Reported-by: Norman Maurer <norman_maurer@apple.com>
>>>> Reported-by: Christian Mazakas <christian.mazakas@gmail.com>
>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>
>>>> ---
>>>>
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index 769814d71153..541e65a1eebf 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -848,6 +848,14 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u32 cflags)
>>>> struct io_ring_ctx *ctx = req->ctx;
>>>> bool posted;
>>>> + /*
>>>> + * If multishot has already posted deferred completions, ensure that
>>>> + * those are flushed first before posting this one. If not, CQEs
>>>> + * could get reordered.
>>>> + */
>>>> + if (!wq_list_empty(&ctx->submit_state.compl_reqs))
>>>> + __io_submit_flush_completions(ctx);
>>>
>>> A request is already dead if it's in compl_reqs, there should be no
>>> way io_req_post_cqe() is called with it. Is it reordering of CQEs
>>> belonging to the same request? And what do you mean by "deferred"
>>> completions?
>>
>> It's not the same req, it's different requests using the same
>> provided buffer ring where it can be problematic.
>
> Ok, and there has never been any ordering guarantees between them.
> Is there any report describing the problem? Why it's ok if
> io_req_post_cqe() produced CQEs of two multishot requests get
> reordered, but it's not when one of them is finishing a request?
> What's the problem with provided buffers?
There better be ordering between posting of the CQEs - I'm not talking
about issue ordering in general, but if you have R1 (request 1) and R2
each consuming from the same buffer ring, then completion posting of
those absolutely need to be ordered. If not, you can have:
R1 peek X buffers, BID Y, BGID Z. Doesn't use io_req_post_cqe() because
it's stopping, end up posting via io_req_task_complete ->
io_req_complete_defer -> add to deferred list.
Other task_work in that run has R2, grabbing buffers from BGID Z,
doesn't terminate, posts CQE2 via io_req_post_cqe().
tw run done, deferred completions flushed, R1 posts CQE1.
Then we have CQE2 ahead of CQE1 in the CQ ring.
> It's a pretty bad spot to do such kinds of things, it disables any
> mixed mshot with non-mshot batching, and nesting flushing under
> an opcode handler is pretty nasty, it should rather stay top
> level as it is. From what I hear it's a provided buffer specific
> problem and should be fixed there.
I agree it's not the prettiest, but I would prefer if we do it this way
just in terms of stable backporting. This should be a relatively rare
occurence, so I'm not too concerned about missed batching here.
--
Jens Axboe
next prev parent reply other threads:[~2025-05-07 15:27 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-07 13:56 [PATCH] io_uring: ensure deferred completions are flushed for multishot Jens Axboe
2025-05-07 14:36 ` Pavel Begunkov
2025-05-07 14:58 ` Jens Axboe
2025-05-07 15:11 ` Pavel Begunkov
2025-05-07 15:26 ` Jens Axboe [this message]
2025-05-07 15:46 ` Pavel Begunkov
2025-05-07 15:49 ` Jens Axboe
2025-05-07 16:18 ` Pavel Begunkov
2025-05-07 17:51 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=611393de-4c50-4597-9290-82059e412a4b@kernel.dk \
--to=axboe@kernel.dk \
--cc=asml.silence@gmail.com \
--cc=io-uring@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox