* [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-10  5:55 UTC
  To: io-uring; Cc: Jens Axboe, Pavel Begunkov

Hi all,

I've actually written most of the code for this, but I think it's necessary
to first ask the community for comments on the design. What I do is: when
consuming a buffer, don't increment the head, but check the length actually
used, then update the buffer info like

    buff->addr += len;
    buff->len -= len;

(of course, if a request consumes the whole buffer, just increment the head).
Since we have now changed the addr of the buffer, a plain buffer id is no
longer enough for userspace to find the data. We have to deliver the original
addr back to userspace through cqe->extra1, which means this feature needs
CQE32 to be on.

This way a provided buffer may be split into many pieces, and userspace
should track each piece; when all the pieces are free again, it can
re-provide the buffer. (It can of course re-provide each piece separately,
but that causes more and more memory fragmentation; anyway, that's the
user's choice.)

What do you think of this? Actually I'm not a fan of big CQEs; it's not
ideal to require CQE32 to be on, but there seems to be no other option?

Thanks,
Hao
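[Editorial note: a minimal sketch of the partial-consumption idea described
above. struct io_uring_buf and its addr/len fields are the real uapi
definitions from <linux/io_uring.h>; the helper name pbuf_consume and the
plain head pointer are hypothetical simplifications, not the kernel's actual
code.]

    #include <linux/io_uring.h>

    /*
     * Illustrative only: when a request uses `len` bytes of the selected
     * ring-mapped buffer, keep the buffer at the ring head and shrink it
     * instead of consuming it whole. The original address is remembered so
     * it can be reported back via the big CQE ("extra1").
     */
    static void pbuf_consume(struct io_uring_buf *buf, unsigned int *head,
                             unsigned int len, __u64 *orig_addr)
    {
        *orig_addr = buf->addr;       /* delivered to userspace in cqe->extra1 */

        if (len >= buf->len) {
            (*head)++;                /* whole buffer used: advance the head */
        } else {
            buf->addr += len;         /* leave the unused tail provided */
            buf->len  -= len;
        }
    }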
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-12  7:30 UTC
  To: io-uring; Cc: Jens Axboe, Pavel Begunkov

On 6/10/22 13:55, Hao Xu wrote:
> Hi all,
>
> I've actually written most of the code for this, but I think it's
> necessary to first ask the community for comments on the design. What I
> do is: when consuming a buffer, don't increment the head, but check the
> length actually used, then update the buffer info like
>
>     buff->addr += len;
>     buff->len -= len;
>
> (of course, if a request consumes the whole buffer, just increment the
> head). Since we have now changed the addr of the buffer, a plain buffer
> id is no longer enough for userspace to find the data. We have to
> deliver the original addr back to userspace through cqe->extra1, which
> means this feature needs CQE32 to be on.
>
> This way a provided buffer may be split into many pieces, and userspace
> should track each piece; when all the pieces are free again, it can
> re-provide the buffer. (It can of course re-provide each piece
> separately, but that causes more and more memory fragmentation; anyway,
> that's the user's choice.)
>
> What do you think of this? Actually I'm not a fan of big CQEs; it's not
> ideal to require CQE32 to be on, but there seems to be no other option?
>
> Thanks,
> Hao

To implement this, CQE32 has to be introduced almost everywhere. For
example, in io_issue_sqe:

    def->issue();
    if (unlikely(CQE32))
        __io_req_complete32();
    else
        __io_req_complete();

which will certainly add some overhead to the main path. Any comments?

Regards,
Hao
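[Editorial note: for context, the userspace side of such a big CQE would be
reaped roughly as sketched below, assuming liburing and a ring created with
IORING_SETUP_CQE32, where the two extra 64-bit words are exposed as
cqe->big_cqe[]. Treating big_cqe[0] as the proposed "extra1" original
address is this proposal's convention, not an existing kernel ABI.]

    #include <liburing.h>

    /* Sketch: reap one completion and recover the original address of the
     * provided-buffer chunk from the first extra CQE word. */
    static int reap_one(struct io_uring *ring)
    {
        struct io_uring_cqe *cqe;
        int ret = io_uring_wait_cqe(ring, &cqe);

        if (ret < 0)
            return ret;

        if (cqe->flags & IORING_CQE_F_BUFFER) {
            unsigned bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
            __u64 orig_addr = cqe->big_cqe[0];  /* "extra1" in the proposal */

            /* ... hand (bid, orig_addr, cqe->res) to the buffer tracker ... */
            (void)bid; (void)orig_addr;
        }

        io_uring_cqe_seen(ring, cqe);
        return 0;
    }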
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-14  6:26 UTC
  To: io-uring; Cc: Jens Axboe, Pavel Begunkov

On 6/12/22 15:30, Hao Xu wrote:
> On 6/10/22 13:55, Hao Xu wrote:
>> Hi all,
>>
>> I've actually written most of the code for this, but I think it's
>> necessary to first ask the community for comments on the design. What
>> I do is: when consuming a buffer, don't increment the head, but check
>> the length actually used, then update the buffer info like
>>
>>     buff->addr += len;
>>     buff->len -= len;
>>
>> (of course, if a request consumes the whole buffer, just increment the
>> head). Since we have now changed the addr of the buffer, a plain
>> buffer id is no longer enough for userspace to find the data. We have
>> to deliver the original addr back to userspace through cqe->extra1,
>> which means this feature needs CQE32 to be on.
>>
>> This way a provided buffer may be split into many pieces, and
>> userspace should track each piece; when all the pieces are free again,
>> it can re-provide the buffer. (It can of course re-provide each piece
>> separately, but that causes more and more memory fragmentation;
>> anyway, that's the user's choice.)
>>
>> What do you think of this? Actually I'm not a fan of big CQEs; it's
>> not ideal to require CQE32 to be on, but there seems to be no other
>> option?

Another way is two rings, just like the sqring and cqring: users provide
buffers to the "sq"-like ring, the kernel fetches them, and when data is
ready it puts the buffer on the "cq"-like ring for users to read. The
downside is that we need to copy the buffer metadata, and there is a limit
on how many times we can split a buffer, since that ring has a fixed length.

>>
>> Thanks,
>> Hao
>
> To implement this, CQE32 has to be introduced almost everywhere. For
> example, in io_issue_sqe:
>
>     def->issue();
>     if (unlikely(CQE32))
>         __io_req_complete32();
>     else
>         __io_req_complete();
>
> which will certainly add some overhead to the main path. Any comments?
>
> Regards,
> Hao
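[Editorial note: a rough sketch of the "two rings" alternative described
above. All structure and field names are hypothetical; nothing like this
exists in the io_uring uapi.]

    /* One ring that userspace fills with free buffers (like the SQ ring)
     * and one that the kernel fills with completed chunks (like the CQ
     * ring). The fixed size of the completion side bounds how many chunks
     * a buffer can be split into before entries must be reaped. */
    struct pbuf_desc {
        __u64   addr;   /* start of the (sub)buffer */
        __u32   len;    /* usable length */
        __u16   bid;    /* id of the originally provided buffer */
        __u16   resv;
    };

    struct pbuf_ring_pair {
        /* userspace produces here, kernel consumes */
        struct pbuf_desc *provide;
        __u32             provide_head, provide_tail, provide_mask;

        /* kernel produces completed chunks here, userspace consumes */
        struct pbuf_desc *complete;
        __u32             complete_head, complete_tail, complete_mask;
    };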
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Dylan Yudaken @ 2022-06-14  8:38 UTC
  To: [email protected], [email protected]
  Cc: [email protected], [email protected]

On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
> On 6/12/22 15:30, Hao Xu wrote:
> > On 6/10/22 13:55, Hao Xu wrote:
> > > Hi all,
> > >
> > > I've actually written most of the code for this, but I think it's
> > > necessary to first ask the community for comments on the design.
> > > What I do is: when consuming a buffer, don't increment the head,
> > > but check the length actually used, then update the buffer info
> > > like
> > >
> > >     buff->addr += len;
> > >     buff->len -= len;
> > >
> > > (of course, if a request consumes the whole buffer, just increment
> > > the head). Since we have now changed the addr of the buffer, a
> > > plain buffer id is no longer enough for userspace to find the
> > > data. We have to deliver the original addr back to userspace
> > > through cqe->extra1, which means this feature needs CQE32 to be
> > > on.
> > >
> > > This way a provided buffer may be split into many pieces, and
> > > userspace should track each piece; when all the pieces are free
> > > again, it can re-provide the buffer. (It can of course re-provide
> > > each piece separately, but that causes more and more memory
> > > fragmentation; anyway, that's the user's choice.)
> > >
> > > What do you think of this? Actually I'm not a fan of big CQEs;
> > > it's not ideal to require CQE32 to be on, but there seems to be no
> > > other option?
>
> Another way is two rings, just like the sqring and cqring: users
> provide buffers to the "sq"-like ring, the kernel fetches them, and
> when data is ready it puts the buffer on the "cq"-like ring for users
> to read. The downside is that we need to copy the buffer metadata, and
> there is a limit on how many times we can split a buffer, since that
> ring has a fixed length.
>
> > >
> > > Thanks,
> > > Hao
> >
> > To implement this, CQE32 has to be introduced almost everywhere. For
> > example, in io_issue_sqe:
> >
> >     def->issue();
> >     if (unlikely(CQE32))
> >         __io_req_complete32();
> >     else
> >         __io_req_complete();
> >
> > which will certainly add some overhead to the main path. Any
> > comments?
> >
> > Regards,
> > Hao
>

I find the idea interesting, but is it definitely worth doing?

Other downsides I see with this approach:
* userspace would have to keep track of when a buffer is finished. This
  might get complicated.
* there is a problem of tiny writes - would we want to support a
  minimum buffer size?

I think in general it can be achieved using the existing buffer ring,
leaving the management to userspace. For example, if a user prepares a
ring with N large buffers, on each completion the user is free to
requeue that buffer without the recently completed chunk.

The downsides here I see are:
* there is a delay in requeuing the buffer. This might cause more
  ENOBUFS; practically, I 'feel' this will not be a big problem.
* there is an additional atomic increment on the ring.

Do you feel the wins are worth the extra complexity?
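[Editorial note: a sketch of the userspace-managed recycling Dylan describes,
using liburing's buffer-ring helpers (io_uring_buf_ring_add,
io_uring_buf_ring_advance, io_uring_buf_ring_mask). It assumes the buffer
ring was already registered (e.g. via io_uring_register_buf_ring()) and that
the caller tracks each buffer's base address and size; the helper name and
bookkeeping are illustrative only.]

    #include <liburing.h>

    /* After a partial read/recv completion, immediately re-provide the
     * unused tail of the same buffer under the same bid, so the recently
     * completed chunk stays owned by the application. */
    static void requeue_tail(struct io_uring_buf_ring *br, unsigned entries,
                             unsigned short bid, void *buf_base,
                             unsigned int buf_size, unsigned int used)
    {
        int mask = io_uring_buf_ring_mask(entries);

        if (used >= buf_size)
            return;                 /* fully consumed; recycle it later */

        /* hand the remaining bytes straight back to the kernel */
        io_uring_buf_ring_add(br, (char *)buf_base + used, buf_size - used,
                              bid, mask, 0);
        /* publish the new tail: the extra ring update Dylan mentions */
        io_uring_buf_ring_advance(br, 1);
    }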
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-14  9:52 UTC
  To: Dylan Yudaken, [email protected]
  Cc: [email protected], [email protected]

Hi Dylan,

On 6/14/22 16:38, Dylan Yudaken wrote:
> On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
>> On 6/12/22 15:30, Hao Xu wrote:
>>> On 6/10/22 13:55, Hao Xu wrote:
>>>> Hi all,
>>>>
>>>> I've actually written most of the code for this, but I think it's
>>>> necessary to first ask the community for comments on the design.
>>>> What I do is: when consuming a buffer, don't increment the head,
>>>> but check the length actually used, then update the buffer info
>>>> like
>>>>
>>>>     buff->addr += len;
>>>>     buff->len -= len;
>>>>
>>>> (of course, if a request consumes the whole buffer, just increment
>>>> the head). Since we have now changed the addr of the buffer, a
>>>> plain buffer id is no longer enough for userspace to find the
>>>> data. We have to deliver the original addr back to userspace
>>>> through cqe->extra1, which means this feature needs CQE32 to be
>>>> on.
>>>>
>>>> This way a provided buffer may be split into many pieces, and
>>>> userspace should track each piece; when all the pieces are free
>>>> again, it can re-provide the buffer. (It can of course re-provide
>>>> each piece separately, but that causes more and more memory
>>>> fragmentation; anyway, that's the user's choice.)
>>>>
>>>> What do you think of this? Actually I'm not a fan of big CQEs;
>>>> it's not ideal to require CQE32 to be on, but there seems to be no
>>>> other option?
>>
>> Another way is two rings, just like the sqring and cqring: users
>> provide buffers to the "sq"-like ring, the kernel fetches them, and
>> when data is ready it puts the buffer on the "cq"-like ring for users
>> to read. The downside is that we need to copy the buffer metadata,
>> and there is a limit on how many times we can split a buffer, since
>> that ring has a fixed length.
>>
>>>>
>>>> Thanks,
>>>> Hao
>>>
>>> To implement this, CQE32 has to be introduced almost everywhere.
>>> For example, in io_issue_sqe:
>>>
>>>     def->issue();
>>>     if (unlikely(CQE32))
>>>         __io_req_complete32();
>>>     else
>>>         __io_req_complete();
>>>
>>> which will certainly add some overhead to the main path. Any
>>> comments?

For this downside, I think there is a way to limit it to the read/recv
path only.

>>>
>>> Regards,
>>> Hao
>>>
>>
>
> I find the idea interesting, but is it definitely worth doing?
>
> Other downsides I see with this approach:
> * userspace would have to keep track of when a buffer is finished.
>   This might get complicated.

This one is fine, I think, since users can choose not to enable this
feature, and if they do use it, they can choose not to track the buffer
but to re-provide each piece immediately. (When a user registers the
pbuf ring, they can pass a flag to enable this feature.)

> * there is a problem of tiny writes - would we want to support a
>   minimum buffer size?

Sorry, I'm not following here: why would we need a minimum buffer size?

>
> I think in general it can be achieved using the existing buffer ring,
> leaving the management to userspace. For example, if a user prepares a
> ring with N large buffers, on each completion the user is free to
> requeue that buffer without the recently completed chunk. [1]

I see, I was not aware of this...

>
> The downsides here I see are:
> * there is a delay in requeuing the buffer. This might cause more
>   ENOBUFS; practically, I 'feel' this will not be a big problem.
> * there is an additional atomic increment on the ring.
>
> Do you feel the wins are worth the extra complexity?

Personally speaking, the only downside of my first approach is the CQE32
overhead on the iopoll completion path and the read/recv/recvmsg path.
But [1] looks fine... TBH, I'm not sure which one is better.

Thanks,
Hao
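[Editorial note: for completeness, a sketch of the userspace bookkeeping the
original cqe->extra1 proposal would need: track the chunks of a split buffer
and re-provide it whole once every chunk is back. Every name here is made up
for illustration, and the "last chunk ends at base + size" check is an
assumption about the proposed semantics, not established behaviour.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-buffer tracker, keyed by the original address
     * reported in the big CQE. */
    struct pbuf_track {
        uint64_t base;           /* addr originally provided ("extra1") */
        uint32_t size;           /* length originally provided */
        uint32_t in_use;         /* chunks delivered to the app, not yet freed */
        bool     fully_consumed; /* kernel has moved past this buffer */
    };

    /* A completion arrived for `len` bytes at `addr` within this buffer. */
    static void pbuf_on_cqe(struct pbuf_track *t, uint64_t addr, uint32_t len)
    {
        t->in_use++;
        if (addr + len == t->base + t->size)
            t->fully_consumed = true;   /* this was the final chunk */
    }

    /* The app is done with one chunk; returns true when the whole buffer
     * is idle again and can be re-provided in a single piece. */
    static bool pbuf_on_release(struct pbuf_track *t)
    {
        return --t->in_use == 0 && t->fully_consumed;
    }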