From: Hao Xu <[email protected]>
To: Dylan Yudaken <[email protected]>,
"[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Subject: Re: [RFC] support memory recycle for ring-mapped provided buffer
Date: Tue, 14 Jun 2022 17:52:41 +0800 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
Hi Dylan,
On 6/14/22 16:38, Dylan Yudaken wrote:
> On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
>> On 6/12/22 15:30, Hao Xu wrote:
>>> On 6/10/22 13:55, Hao Xu wrote:
>>>> Hi all,
>>>>
>>>> I've actually written most of the code for this, but I think it's
>>>> necessary to first ask the community for comments on the design.
>>>> What I do is: when consuming a buffer, don't increment the head,
>>>> but check the length actually used, then update the buffer info:
>>>>     buff->addr += len; buff->len -= len;
>>>> (of course, if a req consumes the whole buffer, just increment the
>>>> head). And since we have now changed the addr of the buffer, a
>>>> plain buffer id is useless for userspace to locate the data. We
>>>> have to deliver the original addr back to userspace through
>>>> cqe->extra1, which means this feature needs CQE32 to be on.
>>>> This way a provided buffer may be split into many pieces, and
>>>> userspace should track each piece; when all the pieces are spare
>>>> again, it can re-provide the buffer. (It can certainly re-provide
>>>> each piece separately, but that causes more and more memory
>>>> fragmentation. Anyway, it's the user's choice.)
>>>>
>>>> What do you think of this? Actually, I'm not a fan of big CQEs;
>>>> the limitation of requiring CQE32 isn't ideal, but is there any
>>>> other option?
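
To make the partial-consume step above concrete, here is a rough
userspace model of it (untested; struct pbuf and partial_consume are
stand-ins for illustration, not the kernel's actual types):

#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's ring-mapped buffer entry (io_uring_buf). */
struct pbuf {
        uint64_t addr;  /* current start of the unused part */
        uint32_t len;   /* bytes still available */
        uint16_t bid;
};

/*
 * Model of the proposed consume step: shrink the buffer in place when
 * a request uses only part of it; return true when the whole buffer is
 * gone and the ring head should be incremented.  orig_addr is what the
 * kernel would report back in cqe->extra1 (hence the CQE32 dependency).
 */
static bool partial_consume(struct pbuf *buf, uint32_t used,
                            uint64_t *orig_addr)
{
        *orig_addr = buf->addr;
        if (used >= buf->len)
                return true;
        buf->addr += used;
        buf->len -= used;
        return false;
}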
>>
>> Another way is two rings, just like the sqring and cqring: users
>> provide buffers to the sqring, and the kernel fetches them and, when
>> data is there, puts them on the cqring for users to read. The
>> downside is that we need to copy the buffer metadata, and there is a
>> limit on how many times we can split the buffer since the cqring has
>> a fixed length.
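
Roughly, the metadata flow would look like this (both structs and
names here are invented for illustration; nothing like this exists in
the kernel yet):

#include <stdint.h>

/* Hypothetical entry userspace posts on the provide (sq-like) ring. */
struct pbuf_sqe {
        uint64_t addr;
        uint32_t len;
        uint16_t bid;
};

/*
 * Hypothetical entry the kernel posts on the consumed (cq-like) ring;
 * copying the metadata here is the cost mentioned above, and the ring
 * length caps how many slices can be outstanding at once.
 */
struct pbuf_cqe {
        uint64_t addr;  /* start of the filled slice */
        uint32_t len;   /* bytes of data in the slice */
        uint16_t bid;   /* buffer the slice was cut from */
};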
>>
>>>>
>>>> Thanks,
>>>> Hao
>>>
>>> To implement this, CQE32 has to be introduced almost everywhere.
>>> For example, in io_issue_sqe:
>>>
>>>     def->issue();
>>>     if (unlikely(CQE32))
>>>             __io_req_complete32();
>>>     else
>>>             __io_req_complete();
>>>
>>> which will certainly add some overhead to the main path. Any
>>> comments?
For this downside, I think there is a way to limit it to only the
read/recv path, roughly as sketched below.
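
A minimal model of that gating, assuming only buffer-selecting opcodes
would ever need the big CQE (the opcode names are stand-ins for
IORING_OP_READ/RECV/RECVMSG):

#include <stdbool.h>

enum op { OP_READ, OP_RECV, OP_RECVMSG, OP_OTHER };

/*
 * Only requests that selected a ring-provided buffer can split one, so
 * only they would need the 32-byte completion; everything else keeps
 * taking the normal 16-byte path, avoiding the branch cost elsewhere.
 */
static bool needs_cqe32(enum op opcode, bool used_ring_buf)
{
        if (!used_ring_buf)
                return false;
        return opcode == OP_READ || opcode == OP_RECV ||
               opcode == OP_RECVMSG;
}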
>>>
>>> Regards,
>>> Hao
>>>
>>
>
> I find the idea interesting, but is it definitely worth doing?
>
> Other downsides I see with this approach:
> * userspace would have to keep track of when a buffer is finished. This
> might get complicated.
This one is fine, I think, since users can choose not to enable this
feature, and if they do use it, they can choose not to track the
buffer but to re-provide each piece immediately.
(When a user registers the pbuf ring, they can pass a flag to enable
this feature; a sketch follows below.)
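
For instance, something along these lines on the registration side;
IOU_PBUF_RING_PARTIAL is a made-up flag name, the rest is the existing
liburing buf-ring registration:

#include <liburing.h>
#include <stdlib.h>

/* Made-up opt-in flag; nothing like it exists in the UAPI today. */
#define IOU_PBUF_RING_PARTIAL   (1U << 0)

static int setup_pbuf_ring(struct io_uring *ring, unsigned int entries,
                           unsigned short bgid)
{
        struct io_uring_buf_reg reg = { 0 };
        void *mem = NULL;

        if (posix_memalign(&mem, 4096,
                           entries * sizeof(struct io_uring_buf)))
                return -1;
        reg.ring_addr = (unsigned long)mem;
        reg.ring_entries = entries;
        reg.bgid = bgid;
        /* The flags argument is where the opt-in could be carried. */
        return io_uring_register_buf_ring(ring, &reg,
                                          IOU_PBUF_RING_PARTIAL);
}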
> * there is a problem of tiny writes - would we want to support a
> minimum buffer size?
Sorry, I'm not following here; why would we need a minimum buffer
size?
>
> I think in general it can be achieved using the existing buffer ring,
> leaving the management to userspace. For example, if a user prepares
> a ring with N large buffers, then on each completion the user is free
> to requeue that buffer without the recently completed chunk.
[1]
I see, I was not aware of this... something like the sketch below, I
assume.
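
Untested sketch of what I understand the suggestion to be; br,
buf_base, buf_size and entries are assumed to come from the
application's own buf-ring setup:

#include <liburing.h>

/*
 * After a completion consumed `used` bytes of buffer `bid`, hand the
 * untouched tail of that buffer straight back to the kernel.
 */
static void requeue_tail(struct io_uring_buf_ring *br, char *buf_base,
                         unsigned int buf_size, unsigned int entries,
                         unsigned short bid, unsigned int used)
{
        if (used >= buf_size)
                return; /* fully consumed; re-provide once free again */
        io_uring_buf_ring_add(br, buf_base + used, buf_size - used, bid,
                              io_uring_buf_ring_mask(entries), 0);
        io_uring_buf_ring_advance(br, 1); /* publish the new tail */
}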
>
> The downsides here I see are:
> * there is a delay in requeuing the buffer. This might cause more
> ENOBUFS, though I 'feel' this will not be a big problem in
> practice
> * there is an additional atomic increment on the ring
>
> Do you feel the wins are worth the extra complexity?
Personally speaking, the only downside of my first approach is the
overhead of CQE32 on the iopoll completion path and the
read/recv/recvmsg path. But [1] looks fine... TBH I'm not sure which
one is better.
Thanks,
Hao
>
>