From: Pavel Begunkov <asml.silence@gmail.com>
To: Jens Axboe <axboe@kernel.dk>, io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH] io_uring/poll: use regular CQE posting for multishot termination
Date: Thu, 8 May 2025 09:11:27 +0100 [thread overview]
Message-ID: <c0f72ca7-76f9-41b3-9bcd-18deba603ab3@gmail.com> (raw)
In-Reply-To: <d0c88f28-3915-4860-93d7-3a383aff8061@kernel.dk>
On 5/8/25 00:11, Jens Axboe wrote:
...>>> This removes the io_req_post_cqe() flushing, and instead puts the honus
>>> on the poll side to provide the ordering. I've verified that this also
>>> fixes the reported issue. The previous patch can be easily backported to
>>> stable, so makes sense to keep that one.
>>
>> It still gives a bad feeling tbh, it's not a polling problem,
>> we're working around shortcomings of the incremental / bundled
>> uapi and/or design. Patching it in semi unrelated places will
>> defitely bite back.
>
> I don't think that's fair, we should always strive to have as close to
> ordered completions as we can. The fact that multishot ends up using a
As a nice thing to have / perf / qos? Sure, but this makes it a
correctness requirement, which always takes precedence over any
future potential optimisation / cleanup that might lead to a rare
reordering.
> mix of both methods to fill CQEs is problematic.
It's a nuisance, I agree, but the problem here is introducing
implicit inter request ordering. I assume for incremental you're
returning {bid, len} and the user has to have a state to track
the current offset based on CQE ordering? In which case the
problem is really not returning the offset explicitly.
>> Can it be fixed in relevant opcodes? So it stays close to
>> those who actually use it. And let me ask since I'm lost in
>> new features, can the uapi be fixed so that it doesn't
>> depend on request ordering?
>
> The API absolutely relies on ordering within a buffer group ID.
>
> It can certainly be fixed at the opcode sites, but there'd be 3 spots in
> net and one in rw.c, and for each spot it'd be more involved to fix it.
But that's the spots that use provided buffers, while polling shouldn't
have a reason to know about provided buffers. For example, is it safe
to mix such mshot and oneshot requests? Let's say during initial
submission oneshot request succeeds and queues a deferred completion,
i.e. io_issue_sqe() -> io_req_complete_defer(). Next it submits an
mshot request, which posts some aux cqes, and only then we file a
CQE for the first request. Or it can be just an mshot executed not
from the polling loop but inline.
>
...>>> +{
>>> + bool filled;
>>> +
>>> + filled = io_req_post_cqe(req, req->cqe.res, req->cqe.flags);
>>
>> posting and overflow must be under the same CQ critical section,
>> like io_cq_lock(). Just copy io_post_aux_cqe() and add
>> ctx->cq_extra--? Hopefully we'll remove the cq_extra ugliness
>> later and combine them after.
>
> Would be great to combine those, as it stands there's a mix of them, and
> io_add_aux_cqe() for example does split locking. I'll update the
> locking.
It would be, but that's buggy and can cause reordering. io_add_aux_cqe()
doesn't split the _CQ critical section_, it relies / requires
DEFER_TASKRUN, and ->completion_lock is not equivalent to that.
>>> diff --git a/io_uring/poll.c b/io_uring/poll.c
>>> index 8eb744eb9f4c..af8e3d4f6f1f 100644
>>> --- a/io_uring/poll.c
>>> +++ b/io_uring/poll.c
>>> @@ -312,6 +312,13 @@ static int io_poll_check_events(struct io_kiocb *req, io_tw_token_t tw)
>>> return IOU_POLL_NO_ACTION;
>>> }
>>> +static void io_poll_req_complete(struct io_kiocb *req, io_tw_token_t tw)
>>> +{
>>> + if (io_req_post_cqe_overflow(req))
>>> + req->flags |= REQ_F_CQE_SKIP;
>>
>> Unconditional would be better. It'd still end up in attempting
>> to post, likely failing and reattemptng allocation just one
>> extra time, not like it gives any reliability. And if I'd be
>> choosing b/w dropping a completion or potentially getting a
>> botched completion as per the problem you tried, I say the
>> former is better.
>
> Not sure I follow, unconditional what? SKIP? Yes that probably makes
Yes
io_req_post_cqe_overflow(req);
req->flags |= REQ_F_CQE_SKIP;
> sense, if we don't overflow post, it'll get logged as such anyway.
--
Pavel Begunkov
prev parent reply other threads:[~2025-05-08 8:10 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-07 18:08 [PATCH] io_uring/poll: use regular CQE posting for multishot termination Jens Axboe
2025-05-07 20:53 ` Pavel Begunkov
2025-05-07 23:11 ` Jens Axboe
2025-05-08 8:11 ` Pavel Begunkov [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c0f72ca7-76f9-41b3-9bcd-18deba603ab3@gmail.com \
--to=asml.silence@gmail.com \
--cc=axboe@kernel.dk \
--cc=io-uring@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox