From: Hao Xu <[email protected]>
To: Dylan Yudaken <[email protected]>,
"[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [RFC] support memory recycle for ring-mapped provided buffer
Date: Tue, 14 Jun 2022 17:52:41 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
Hi Dylan,
On 6/14/22 16:38, Dylan Yudaken wrote:
> On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
>> On 6/12/22 15:30, Hao Xu wrote:
>>> On 6/10/22 13:55, Hao Xu wrote:
>>>> Hi all,
>>>>
>>>> I've actually written most of the code for this, but I think it's
>>>> necessary to first ask the community for comments on the design.
>>>> What I do is: when consuming a buffer, don't increment the head,
>>>> but check the length actually used, then update the buffer info:
>>>>     buff->addr += len; buff->len -= len;
>>>> (of course, if a req consumes the whole buffer, just increment the
>>>> head). And since we have now changed the addr of the buffer, a
>>>> plain buffer id is useless for userspace to locate the data. We
>>>> have to deliver the original addr back to userspace through
>>>> cqe->extra1, which means this feature needs CQE32 to be on.
>>>> This way a provided buffer may be split into many pieces, and
>>>> userspace should track each piece; when all the pieces are spare
>>>> again, it can re-provide the buffer. (It can certainly re-provide
>>>> each piece separately, but that causes more and more memory
>>>> fragmentation. Anyway, it's the user's choice.)
>>>>
>>>> What do you think of this? Actually, I'm not a fan of big CQEs;
>>>> the limitation of requiring CQE32 isn't ideal, but is there any
>>>> other option?
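
To make the partial-consume step above concrete, here is a rough
userspace model of it (untested; struct pbuf and partial_consume are
stand-ins for illustration, not the kernel's actual types):

#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's ring-mapped buffer entry (io_uring_buf). */
struct pbuf {
        uint64_t addr;  /* current start of the unused part */
        uint32_t len;   /* bytes still available */
        uint16_t bid;
};

/*
 * Model of the proposed consume step: shrink the buffer in place when
 * a request uses only part of it; return true when the whole buffer is
 * gone and the ring head should be incremented.  orig_addr is what the
 * kernel would report back in cqe->extra1 (hence the CQE32 dependency).
 */
static bool partial_consume(struct pbuf *buf, uint32_t used,
                            uint64_t *orig_addr)
{
        *orig_addr = buf->addr;
        if (used >= buf->len)
                return true;
        buf->addr += used;
        buf->len -= used;
        return false;
}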
>>
>> Another way is two rings, just like the sqring and cqring: users
>> provide buffers to the sqring, and the kernel fetches them and, when
>> data is there, puts them on the cqring for users to read. The
>> downside is that we need to copy the buffer metadata, and there is a
>> limit on how many times we can split the buffer since the cqring has
>> a fixed length.
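
Roughly, the metadata flow would look like this (both structs and
names here are invented for illustration; nothing like this exists in
the kernel yet):

#include <stdint.h>

/* Hypothetical entry userspace posts on the provide (sq-like) ring. */
struct pbuf_sqe {
        uint64_t addr;
        uint32_t len;
        uint16_t bid;
};

/*
 * Hypothetical entry the kernel posts on the consumed (cq-like) ring;
 * copying the metadata here is the cost mentioned above, and the ring
 * length caps how many slices can be outstanding at once.
 */
struct pbuf_cqe {
        uint64_t addr;  /* start of the filled slice */
        uint32_t len;   /* bytes of data in the slice */
        uint16_t bid;   /* buffer the slice was cut from */
};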
>>
>>>>
>>>> Thanks,
>>>> Hao
>>>
>>> To implement this, CQE32 has to be introduced almost everywhere.
>>> For example, in io_issue_sqe:
>>>
>>>     def->issue();
>>>     if (unlikely(CQE32))
>>>             __io_req_complete32();
>>>     else
>>>             __io_req_complete();
>>>
>>> which will certainly add some overhead to the main path. Any
>>> comments?
For this downside, I think there is a way to limit it to only the
read/recv path, roughly as sketched below.
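
A minimal model of that gating, assuming only buffer-selecting opcodes
would ever need the big CQE (the opcode names are stand-ins for
IORING_OP_READ/RECV/RECVMSG):

#include <stdbool.h>

enum op { OP_READ, OP_RECV, OP_RECVMSG, OP_OTHER };

/*
 * Only requests that selected a ring-provided buffer can split one, so
 * only they would need the 32-byte completion; everything else keeps
 * taking the normal 16-byte path, avoiding the branch cost elsewhere.
 */
static bool needs_cqe32(enum op opcode, bool used_ring_buf)
{
        if (!used_ring_buf)
                return false;
        return opcode == OP_READ || opcode == OP_RECV ||
               opcode == OP_RECVMSG;
}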
>>>
>>> Regards,
>>> Hao
>>>
>>
>
> I find the idea interesting, but is it definitely worth doing?
>
> Other downsides I see with this approach:
> * userspace would have to keep track of when a buffer is finished. This
> might get complicated.
This one is fine, I think, since users can choose not to enable this
feature, and if they do use it, they can choose not to track the
buffer but to re-provide each piece immediately.
(When a user registers the pbuf ring, they can pass a flag to enable
this feature; a sketch follows below.)
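
For instance, something along these lines on the registration side;
IOU_PBUF_RING_PARTIAL is a made-up flag name, the rest is the existing
liburing buf-ring registration:

#include <liburing.h>
#include <stdlib.h>

/* Made-up opt-in flag; nothing like it exists in the UAPI today. */
#define IOU_PBUF_RING_PARTIAL   (1U << 0)

static int setup_pbuf_ring(struct io_uring *ring, unsigned int entries,
                           unsigned short bgid)
{
        struct io_uring_buf_reg reg = { 0 };
        void *mem = NULL;

        if (posix_memalign(&mem, 4096,
                           entries * sizeof(struct io_uring_buf)))
                return -1;
        reg.ring_addr = (unsigned long)mem;
        reg.ring_entries = entries;
        reg.bgid = bgid;
        /* The flags argument is where the opt-in could be carried. */
        return io_uring_register_buf_ring(ring, &reg,
                                          IOU_PBUF_RING_PARTIAL);
}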
> * there is a problem of tiny writes - would we want to support a
> minimum buffer size?
Sorry, I'm not following here; why would we need a minimum buffer
size?
>
> I think in general it can be achieved using the existing buffer ring,
> leaving the management to userspace. For example, if a user prepares
> a ring with N large buffers, then on each completion the user is free
> to requeue that buffer without the recently completed chunk.
[1]
I see, I was not aware of this... something like the sketch below, I
assume.
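
Untested sketch of what I understand the suggestion to be; br,
buf_base, buf_size and entries are assumed to come from the
application's own buf-ring setup:

#include <liburing.h>

/*
 * After a completion consumed `used` bytes of buffer `bid`, hand the
 * untouched tail of that buffer straight back to the kernel.
 */
static void requeue_tail(struct io_uring_buf_ring *br, char *buf_base,
                         unsigned int buf_size, unsigned int entries,
                         unsigned short bid, unsigned int used)
{
        if (used >= buf_size)
                return; /* fully consumed; re-provide once free again */
        io_uring_buf_ring_add(br, buf_base + used, buf_size - used, bid,
                              io_uring_buf_ring_mask(entries), 0);
        io_uring_buf_ring_advance(br, 1); /* publish the new tail */
}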
>
> The downsides here I see are:
> * there is a delay in requeuing the buffer. This might cause more
> ENOBUFS, though I 'feel' this will not be a big problem in
> practice
> * there is an additional atomic increment on the ring
>
> Do you feel the wins are worth the extra complexity?
Personally speaking, the only downside of my first approach is the
overhead of CQE32 on the iopoll completion path and the
read/recv/recvmsg path. But [1] looks fine... TBH I'm not sure which
one is better.
Thanks,
Hao
>
>