* [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-10  5:55 UTC
  To: io-uring; Cc: Jens Axboe, Pavel Begunkov

Hi all,

I've actually written most of the code for this, but I think it's necessary
to first ask the community for comments on the design. What I do is: when
consuming a buffer, don't increment the head, but check the length actually
used, then update the buffer info like

    buff->addr += len;
    buff->len -= len;

(of course, if a request consumes the whole buffer, just increment the head).
Since we have now changed the addr of the buffer, a plain buffer id is no
longer enough for userspace to find the data. We have to deliver the original
addr back to userspace through cqe->extra1, which means this feature needs
CQE32 to be on.

This way a provided buffer may be split into many pieces, and userspace
should track each piece; when all the pieces are free again, it can
re-provide the buffer. (It can of course re-provide each piece separately,
but that causes more and more memory fragmentation; anyway, that's the
user's choice.)

What do you think of this? Actually I'm not a fan of big CQEs; it's not
ideal to require CQE32 to be on, but there seems to be no other option?

Thanks,
Hao
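[Editorial note: a minimal sketch of the partial-consumption idea described
above. struct io_uring_buf and its addr/len fields are the real uapi
definitions from <linux/io_uring.h>; the helper name pbuf_consume and the
plain head pointer are hypothetical simplifications, not the kernel's actual
code.]

    #include <linux/io_uring.h>

    /*
     * Illustrative only: when a request uses `len` bytes of the selected
     * ring-mapped buffer, keep the buffer at the ring head and shrink it
     * instead of consuming it whole. The original address is remembered so
     * it can be reported back via the big CQE ("extra1").
     */
    static void pbuf_consume(struct io_uring_buf *buf, unsigned int *head,
                             unsigned int len, __u64 *orig_addr)
    {
        *orig_addr = buf->addr;       /* delivered to userspace in cqe->extra1 */

        if (len >= buf->len) {
            (*head)++;                /* whole buffer used: advance the head */
        } else {
            buf->addr += len;         /* leave the unused tail provided */
            buf->len  -= len;
        }
    }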
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-12  7:30 UTC
  To: io-uring; Cc: Jens Axboe, Pavel Begunkov

On 6/10/22 13:55, Hao Xu wrote:
> Hi all,
>
> I've actually written most of the code for this, but I think it's
> necessary to first ask the community for comments on the design. What I
> do is: when consuming a buffer, don't increment the head, but check the
> length actually used, then update the buffer info like
>
>     buff->addr += len;
>     buff->len -= len;
>
> (of course, if a request consumes the whole buffer, just increment the
> head). Since we have now changed the addr of the buffer, a plain buffer
> id is no longer enough for userspace to find the data. We have to
> deliver the original addr back to userspace through cqe->extra1, which
> means this feature needs CQE32 to be on.
>
> This way a provided buffer may be split into many pieces, and userspace
> should track each piece; when all the pieces are free again, it can
> re-provide the buffer. (It can of course re-provide each piece
> separately, but that causes more and more memory fragmentation; anyway,
> that's the user's choice.)
>
> What do you think of this? Actually I'm not a fan of big CQEs; it's not
> ideal to require CQE32 to be on, but there seems to be no other option?
>
> Thanks,
> Hao

To implement this, CQE32 has to be introduced almost everywhere. For
example, in io_issue_sqe:

    def->issue();
    if (unlikely(CQE32))
        __io_req_complete32();
    else
        __io_req_complete();

which will certainly add some overhead to the main path. Any comments?

Regards,
Hao
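[Editorial note: for context, the userspace side of such a big CQE would be
reaped roughly as sketched below, assuming liburing and a ring created with
IORING_SETUP_CQE32, where the two extra 64-bit words are exposed as
cqe->big_cqe[]. Treating big_cqe[0] as the proposed "extra1" original
address is this proposal's convention, not an existing kernel ABI.]

    #include <liburing.h>

    /* Sketch: reap one completion and recover the original address of the
     * provided-buffer chunk from the first extra CQE word. */
    static int reap_one(struct io_uring *ring)
    {
        struct io_uring_cqe *cqe;
        int ret = io_uring_wait_cqe(ring, &cqe);

        if (ret < 0)
            return ret;

        if (cqe->flags & IORING_CQE_F_BUFFER) {
            unsigned bid = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
            __u64 orig_addr = cqe->big_cqe[0];  /* "extra1" in the proposal */

            /* ... hand (bid, orig_addr, cqe->res) to the buffer tracker ... */
            (void)bid; (void)orig_addr;
        }

        io_uring_cqe_seen(ring, cqe);
        return 0;
    }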
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-14  6:26 UTC
  To: io-uring; Cc: Jens Axboe, Pavel Begunkov

On 6/12/22 15:30, Hao Xu wrote:
> On 6/10/22 13:55, Hao Xu wrote:
>> Hi all,
>>
>> I've actually written most of the code for this, but I think it's
>> necessary to first ask the community for comments on the design. What
>> I do is: when consuming a buffer, don't increment the head, but check
>> the length actually used, then update the buffer info like
>>
>>     buff->addr += len;
>>     buff->len -= len;
>>
>> (of course, if a request consumes the whole buffer, just increment the
>> head). Since we have now changed the addr of the buffer, a plain
>> buffer id is no longer enough for userspace to find the data. We have
>> to deliver the original addr back to userspace through cqe->extra1,
>> which means this feature needs CQE32 to be on.
>>
>> This way a provided buffer may be split into many pieces, and
>> userspace should track each piece; when all the pieces are free again,
>> it can re-provide the buffer. (It can of course re-provide each piece
>> separately, but that causes more and more memory fragmentation;
>> anyway, that's the user's choice.)
>>
>> What do you think of this? Actually I'm not a fan of big CQEs; it's
>> not ideal to require CQE32 to be on, but there seems to be no other
>> option?

Another way is two rings, just like the sqring and cqring: users provide
buffers to the "sq"-like ring, the kernel fetches them, and when data is
ready it puts the buffer on the "cq"-like ring for users to read. The
downside is that we need to copy the buffer metadata, and there is a limit
on how many times we can split a buffer, since that ring has a fixed length.

>>
>> Thanks,
>> Hao
>
> To implement this, CQE32 has to be introduced almost everywhere. For
> example, in io_issue_sqe:
>
>     def->issue();
>     if (unlikely(CQE32))
>         __io_req_complete32();
>     else
>         __io_req_complete();
>
> which will certainly add some overhead to the main path. Any comments?
>
> Regards,
> Hao
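[Editorial note: a rough sketch of the "two rings" alternative described
above. All structure and field names are hypothetical; nothing like this
exists in the io_uring uapi.]

    /* One ring that userspace fills with free buffers (like the SQ ring)
     * and one that the kernel fills with completed chunks (like the CQ
     * ring). The fixed size of the completion side bounds how many chunks
     * a buffer can be split into before entries must be reaped. */
    struct pbuf_desc {
        __u64   addr;   /* start of the (sub)buffer */
        __u32   len;    /* usable length */
        __u16   bid;    /* id of the originally provided buffer */
        __u16   resv;
    };

    struct pbuf_ring_pair {
        /* userspace produces here, kernel consumes */
        struct pbuf_desc *provide;
        __u32             provide_head, provide_tail, provide_mask;

        /* kernel produces completed chunks here, userspace consumes */
        struct pbuf_desc *complete;
        __u32             complete_head, complete_tail, complete_mask;
    };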
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Dylan Yudaken @ 2022-06-14  8:38 UTC
  To: [email protected], [email protected]
  Cc: [email protected], [email protected]

On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
> On 6/12/22 15:30, Hao Xu wrote:
> > On 6/10/22 13:55, Hao Xu wrote:
> > > Hi all,
> > >
> > > I've actually written most of the code for this, but I think it's
> > > necessary to first ask the community for comments on the design.
> > > What I do is: when consuming a buffer, don't increment the head,
> > > but check the length actually used, then update the buffer info
> > > like
> > >
> > >     buff->addr += len;
> > >     buff->len -= len;
> > >
> > > (of course, if a request consumes the whole buffer, just increment
> > > the head). Since we have now changed the addr of the buffer, a
> > > plain buffer id is no longer enough for userspace to find the
> > > data. We have to deliver the original addr back to userspace
> > > through cqe->extra1, which means this feature needs CQE32 to be
> > > on.
> > >
> > > This way a provided buffer may be split into many pieces, and
> > > userspace should track each piece; when all the pieces are free
> > > again, it can re-provide the buffer. (It can of course re-provide
> > > each piece separately, but that causes more and more memory
> > > fragmentation; anyway, that's the user's choice.)
> > >
> > > What do you think of this? Actually I'm not a fan of big CQEs;
> > > it's not ideal to require CQE32 to be on, but there seems to be no
> > > other option?
>
> Another way is two rings, just like the sqring and cqring: users
> provide buffers to the "sq"-like ring, the kernel fetches them, and
> when data is ready it puts the buffer on the "cq"-like ring for users
> to read. The downside is that we need to copy the buffer metadata, and
> there is a limit on how many times we can split a buffer, since that
> ring has a fixed length.
>
> > >
> > > Thanks,
> > > Hao
> >
> > To implement this, CQE32 has to be introduced almost everywhere. For
> > example, in io_issue_sqe:
> >
> >     def->issue();
> >     if (unlikely(CQE32))
> >         __io_req_complete32();
> >     else
> >         __io_req_complete();
> >
> > which will certainly add some overhead to the main path. Any
> > comments?
> >
> > Regards,
> > Hao
>

I find the idea interesting, but is it definitely worth doing?

Other downsides I see with this approach:
* userspace would have to keep track of when a buffer is finished. This
  might get complicated.
* there is a problem of tiny writes - would we want to support a
  minimum buffer size?

I think in general it can be achieved using the existing buffer ring,
leaving the management to userspace. For example, if a user prepares a
ring with N large buffers, on each completion the user is free to
requeue that buffer without the recently completed chunk.

The downsides here I see are:
* there is a delay in requeuing the buffer. This might cause more
  ENOBUFS; practically, I 'feel' this will not be a big problem.
* there is an additional atomic increment on the ring.

Do you feel the wins are worth the extra complexity?
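[Editorial note: a sketch of the userspace-managed recycling Dylan describes,
using liburing's buffer-ring helpers (io_uring_buf_ring_add,
io_uring_buf_ring_advance, io_uring_buf_ring_mask). It assumes the buffer
ring was already registered (e.g. via io_uring_register_buf_ring()) and that
the caller tracks each buffer's base address and size; the helper name and
bookkeeping are illustrative only.]

    #include <liburing.h>

    /* After a partial read/recv completion, immediately re-provide the
     * unused tail of the same buffer under the same bid, so the recently
     * completed chunk stays owned by the application. */
    static void requeue_tail(struct io_uring_buf_ring *br, unsigned entries,
                             unsigned short bid, void *buf_base,
                             unsigned int buf_size, unsigned int used)
    {
        int mask = io_uring_buf_ring_mask(entries);

        if (used >= buf_size)
            return;                 /* fully consumed; recycle it later */

        /* hand the remaining bytes straight back to the kernel */
        io_uring_buf_ring_add(br, (char *)buf_base + used, buf_size - used,
                              bid, mask, 0);
        /* publish the new tail: the extra ring update Dylan mentions */
        io_uring_buf_ring_advance(br, 1);
    }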
* Re: [RFC] support memory recycle for ring-mapped provided buffer
  From: Hao Xu @ 2022-06-14  9:52 UTC
  To: Dylan Yudaken, [email protected]
  Cc: [email protected], [email protected]

Hi Dylan,

On 6/14/22 16:38, Dylan Yudaken wrote:
> On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
>> On 6/12/22 15:30, Hao Xu wrote:
>>> On 6/10/22 13:55, Hao Xu wrote:
>>>> Hi all,
>>>>
>>>> I've actually written most of the code for this, but I think it's
>>>> necessary to first ask the community for comments on the design.
>>>> What I do is: when consuming a buffer, don't increment the head,
>>>> but check the length actually used, then update the buffer info
>>>> like
>>>>
>>>>     buff->addr += len;
>>>>     buff->len -= len;
>>>>
>>>> (of course, if a request consumes the whole buffer, just increment
>>>> the head). Since we have now changed the addr of the buffer, a
>>>> plain buffer id is no longer enough for userspace to find the
>>>> data. We have to deliver the original addr back to userspace
>>>> through cqe->extra1, which means this feature needs CQE32 to be
>>>> on.
>>>>
>>>> This way a provided buffer may be split into many pieces, and
>>>> userspace should track each piece; when all the pieces are free
>>>> again, it can re-provide the buffer. (It can of course re-provide
>>>> each piece separately, but that causes more and more memory
>>>> fragmentation; anyway, that's the user's choice.)
>>>>
>>>> What do you think of this? Actually I'm not a fan of big CQEs;
>>>> it's not ideal to require CQE32 to be on, but there seems to be no
>>>> other option?
>>
>> Another way is two rings, just like the sqring and cqring: users
>> provide buffers to the "sq"-like ring, the kernel fetches them, and
>> when data is ready it puts the buffer on the "cq"-like ring for users
>> to read. The downside is that we need to copy the buffer metadata,
>> and there is a limit on how many times we can split a buffer, since
>> that ring has a fixed length.
>>
>>>>
>>>> Thanks,
>>>> Hao
>>>
>>> To implement this, CQE32 has to be introduced almost everywhere.
>>> For example, in io_issue_sqe:
>>>
>>>     def->issue();
>>>     if (unlikely(CQE32))
>>>         __io_req_complete32();
>>>     else
>>>         __io_req_complete();
>>>
>>> which will certainly add some overhead to the main path. Any
>>> comments?

For this downside, I think there is a way to limit it to the read/recv
path only.

>>>
>>> Regards,
>>> Hao
>>>
>>
>
> I find the idea interesting, but is it definitely worth doing?
>
> Other downsides I see with this approach:
> * userspace would have to keep track of when a buffer is finished.
>   This might get complicated.

This one is fine, I think, since users can choose not to enable this
feature, and if they do use it, they can choose not to track the buffer
but to re-provide each piece immediately. (When a user registers the
pbuf ring, they can pass a flag to enable this feature.)

> * there is a problem of tiny writes - would we want to support a
>   minimum buffer size?

Sorry, I'm not following here: why would we need a minimum buffer size?

>
> I think in general it can be achieved using the existing buffer ring,
> leaving the management to userspace. For example, if a user prepares a
> ring with N large buffers, on each completion the user is free to
> requeue that buffer without the recently completed chunk. [1]

I see, I was not aware of this...

>
> The downsides here I see are:
> * there is a delay in requeuing the buffer. This might cause more
>   ENOBUFS; practically, I 'feel' this will not be a big problem.
> * there is an additional atomic increment on the ring.
>
> Do you feel the wins are worth the extra complexity?

Personally speaking, the only downside of my first approach is the CQE32
overhead on the iopoll completion path and the read/recv/recvmsg path.
But [1] looks fine... TBH, I'm not sure which one is better.

Thanks,
Hao
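[Editorial note: for completeness, a sketch of the userspace bookkeeping the
original cqe->extra1 proposal would need: track the chunks of a split buffer
and re-provide it whole once every chunk is back. Every name here is made up
for illustration, and the "last chunk ends at base + size" check is an
assumption about the proposed semantics, not established behaviour.]

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-buffer tracker, keyed by the original address
     * reported in the big CQE. */
    struct pbuf_track {
        uint64_t base;           /* addr originally provided ("extra1") */
        uint32_t size;           /* length originally provided */
        uint32_t in_use;         /* chunks delivered to the app, not yet freed */
        bool     fully_consumed; /* kernel has moved past this buffer */
    };

    /* A completion arrived for `len` bytes at `addr` within this buffer. */
    static void pbuf_on_cqe(struct pbuf_track *t, uint64_t addr, uint32_t len)
    {
        t->in_use++;
        if (addr + len == t->base + t->size)
            t->fully_consumed = true;   /* this was the final chunk */
    }

    /* The app is done with one chunk; returns true when the whole buffer
     * is idle again and can be re-provided in a single piece. */
    static bool pbuf_on_release(struct pbuf_track *t)
    {
        return --t->in_use == 0 && t->fully_consumed;
    }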