From: Pavel Begunkov <[email protected]>
To: Oliver Crumrine <[email protected]>, [email protected]
Cc: [email protected], [email protected]
Subject: Re: [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for io_uring zerocopy
Date: Mon, 8 Apr 2024 00:46:44 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAK1VsR1b-dbAa8pMqGvfcAAcVP3ZkTYZdyqcaK4wJdbuAZtJsA@mail.gmail.com>
On 4/7/24 20:14, Oliver Crumrine wrote:
> Oliver Crumrine wrote:
>> Pavel Begunkov wrote:
>>> On 4/5/24 21:04, Oliver Crumrine wrote:
>>>> Pavel Begunkov wrote:
>>>>> On 4/4/24 23:17, Oliver Crumrine wrote:
>>>>>> In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
>>>>>> specifically disabled REQ_F_CQE_SKIP, as (at least from my
>>>>>> understanding) the userspace program wouldn't receive the
>>>>>> IORING_CQE_F_MORE flag in the result value.
>>>>>
>>>>> No. IORING_CQE_F_MORE means there will be another CQE from this
>>>>> request, so a single CQE without IORING_CQE_F_MORE is trivially
>>>>> fine.
>>>>>
>>>>> The problem is the semantics, because by suppressing the first
>>>>> CQE you're loosing the result value. You might rely on WAITALL
>>>> That's already happening with io_send.
>>>
>>> Right, and it's still annoying and hard to use
>> Another solution might be something where there is a counter that stores
>> how many CQEs with REQ_F_CQE_SKIP have been processed. Before exiting,
>> userspace could call a function like: io_wait_completions(int completions)
>> which would wait until everything is done, and then userspace could peek
>> the completion ring.
>>>
>>>>> as other sends and "fail" (in terms of io_uring) the request
>>>>> in case of a partial send posting 2 CQEs, but that's not a great
>>>>> way and it's getting userspace complicated pretty easily.
>>>>>
>>>>> In short, it was left out for later because there is a
>>>>> better way to implement it, but it should be done carefully
>>>> Maybe we could put the return values in the notifs? That would be a
>>>> discrepancy between io_send and io_send_zc, though.
>>>
>>> Yes. And yes, having a custom flavour is not good. It'd only
>>> be well usable if apart from returning the actual result
>>> it also guarantees there will be one and only one CQE, then
>>> the userspace doesn't have to do the dancing with counting
>>> and checking F_MORE. In fact, I outlined before how a generic
>>> solution may looks like:
>>>
>>> https://github.com/axboe/liburing/issues/824
>>>
>>> The only interesting part, IMHO, is to be able to merge the
>>> main completion with its notification. Below is an old stash
>>> rebased onto for-6.10. The only thing missing is relinking,
>>> but maybe we don't even care about it. I need to cover it
>>> well with tests.
>> The patch looks pretty good. The only potential issue is that you store
>> the res of the normal CQE into the notif CQE. This overwrites the
>> IORING_CQE_F_NOTIF with IORING_CQE_F_MORE. This means that the notif would
>> indicate to userspace that there will be another CQE, of which there
>> won't.
> I was wrong here; Mixed up flags and result value.
Right, it's fine. And it's synchronised by the ubuf refcounting,
though it might get more complicated if I'd try out some counting
optimisations.
FWIW, it shouldn't give any performance wins. The heavy stuff is
notifications waking the task, which is still there. I can even
imagine that having separate CQEs might be more flexible and would
allow more efficient CQ batching.
--
Pavel Begunkov
next prev parent reply other threads:[~2024-04-07 23:46 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-04 22:16 [PATCH 0/3] Add REQ_F_CQE_SKIP support to io_uring zerocopy Oliver Crumrine
2024-04-04 22:17 ` [PATCH 1/3] io_uring: Add REQ_F_CQE_SKIP support for " Oliver Crumrine
2024-04-05 13:01 ` Pavel Begunkov
2024-04-05 20:04 ` Oliver Crumrine
2024-04-06 21:23 ` Pavel Begunkov
2024-04-07 13:13 ` Oliver Crumrine
2024-04-07 19:14 ` Oliver Crumrine
2024-04-07 23:46 ` Pavel Begunkov [this message]
2024-04-09 1:33 ` Oliver Crumrine
2024-04-10 12:05 ` Pavel Begunkov
2024-04-11 0:52 ` Oliver Crumrine
2024-04-12 13:20 ` Pavel Begunkov
2024-04-15 23:51 ` Oliver Crumrine
2024-04-04 22:19 ` [PATCH 2/3] io_uring: Add io_uring_peek_cqe to mini_liburing Oliver Crumrine
2024-04-04 22:19 ` [PATCH 3/3] io_uring: Support IOSQE_CQE_SKIP_SUCCESS in io_uring zerocopy test Oliver Crumrine
2024-04-06 20:33 ` Muhammad Usama Anjum
2024-04-05 12:06 ` [PATCH 0/3] Add REQ_F_CQE_SKIP support to io_uring zerocopy Pavel Begunkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox