* [PATCHSET 0/2] kbuf fixes @ 2026-04-28 15:44 Jens Axboe 2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe 2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe 0 siblings, 2 replies; 7+ messages in thread From: Jens Axboe @ 2026-04-28 15:44 UTC (permalink / raw) To: io-uring Hi, First patch just kills an unused member in struct io_buffer_list, and then patch 2 fixes up an issue with incrementally consumed buffers where the application cannot inform the kernel of how much space should be left before moving on to the next buffer. This can cause spurious -EFAULT with multishot recvmsg, when the current buffer falls below the limit needed for the headers. -- Jens Axboe ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member 2026-04-28 15:44 [PATCHSET 0/2] kbuf fixes Jens Axboe @ 2026-04-28 15:44 ` Jens Axboe 2026-04-28 17:54 ` Gabriel Krisman Bertazi 2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe 1 sibling, 1 reply; 7+ messages in thread From: Jens Axboe @ 2026-04-28 15:44 UTC (permalink / raw) To: io-uring; +Cc: Jens Axboe This is only ever assigned, never used. The only used part is the calculated mask, which is used for indexing. Kill 'nr_entries'. Signed-off-by: Jens Axboe <axboe@kernel.dk> --- io_uring/kbuf.c | 1 - io_uring/kbuf.h | 1 - 2 files changed, 2 deletions(-) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 8da2ff798170..43e4f8615fe8 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -680,7 +680,6 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) } #endif - bl->nr_entries = reg.ring_entries; bl->mask = reg.ring_entries - 1; bl->flags |= IOBL_BUF_RING; bl->buf_ring = br; diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index bf15e26520d3..abf7052b556e 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -27,7 +27,6 @@ struct io_buffer_list { __u16 bgid; /* below is for ring provided buffers */ - __u16 nr_entries; __u16 head; __u16 mask; -- 2.53.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member 2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe @ 2026-04-28 17:54 ` Gabriel Krisman Bertazi 0 siblings, 0 replies; 7+ messages in thread From: Gabriel Krisman Bertazi @ 2026-04-28 17:54 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring Jens Axboe <axboe@kernel.dk> writes: > This is only ever assigned, never used. The only used part is the > calculated mask, which is used for indexing. Kill 'nr_entries'. > > Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> > --- > io_uring/kbuf.c | 1 - > io_uring/kbuf.h | 1 - > 2 files changed, 2 deletions(-) > > diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c > index 8da2ff798170..43e4f8615fe8 100644 > --- a/io_uring/kbuf.c > +++ b/io_uring/kbuf.c > @@ -680,7 +680,6 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) > } > #endif > > - bl->nr_entries = reg.ring_entries; > bl->mask = reg.ring_entries - 1; > bl->flags |= IOBL_BUF_RING; > bl->buf_ring = br; > diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h > index bf15e26520d3..abf7052b556e 100644 > --- a/io_uring/kbuf.h > +++ b/io_uring/kbuf.h > @@ -27,7 +27,6 @@ struct io_buffer_list { > __u16 bgid; > > /* below is for ring provided buffers */ > - __u16 nr_entries; > __u16 head; > __u16 mask; -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers 2026-04-28 15:44 [PATCHSET 0/2] kbuf fixes Jens Axboe 2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe @ 2026-04-28 15:44 ` Jens Axboe 2026-04-28 17:53 ` Gabriel Krisman Bertazi 1 sibling, 1 reply; 7+ messages in thread From: Jens Axboe @ 2026-04-28 15:44 UTC (permalink / raw) To: io-uring; +Cc: Martin Michaelis, stable, Jens Axboe From: Martin Michaelis <code@mgjm.de> Incrementally consumed buffer rings are generally fully consumed, but it's quite possible that the application has a minimum size it needs to meet to avoid truncation. Currently that minimum limit is 1 byte, but this should be a setting that is the hands of the application. For recvmsg multishot, a prime use case for incrementally consumed buffers, the application may get spurious -EFAULT returned at the end of an incrementally consumed buffer, as less space is available than the headers need. Grab a u32 field in struct io_uring_buf_reg, which the application can use to inform the kernel of the minimum size that should be available in an incrementally consumed buffer. If less than that is available, the current buffer is fully processed and the next one will be picked. Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Link: https://github.com/axboe/liburing/issues/1433 Signed-off-by: Martin Michaelis <code@mgjm.de> [axboe: write commit message, change io_buffer_list member name] Signed-off-by: Jens Axboe <axboe@kernel.dk> --- include/uapi/linux/io_uring.h | 3 ++- io_uring/kbuf.c | 8 +++++++- io_uring/kbuf.h | 7 +++++++ 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 17ac1b785440..909fb7aea638 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -905,7 +905,8 @@ struct io_uring_buf_reg { __u32 ring_entries; __u16 bgid; __u16 flags; - __u64 resv[3]; + __u32 min_left; + __u32 resv[5]; }; /* argument for IORING_REGISTER_PBUF_STATUS */ diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 43e4f8615fe8..63061aa1cab9 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -47,7 +47,7 @@ static bool io_kbuf_inc_commit(struct io_buffer_list *bl, int len) this_len = min_t(u32, len, buf_len); buf_len -= this_len; /* Stop looping for invalid buffer length of 0 */ - if (buf_len || !this_len) { + if (buf_len > bl->min_left_sub_one || !this_len) { WRITE_ONCE(buf->addr, READ_ONCE(buf->addr) + this_len); WRITE_ONCE(buf->len, buf_len); return false; @@ -637,6 +637,10 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) if (reg.ring_entries >= 65536) return -EINVAL; + /* minimum left byte count is a property of incremental buffers */ + if (!(reg.flags & IOU_PBUF_RING_INC) && reg.min_left) + return -EINVAL; + bl = io_buffer_get_list(ctx, reg.bgid); if (bl) { /* if mapped buffer ring OR classic exists, don't allow */ @@ -683,6 +687,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) bl->mask = reg.ring_entries - 1; bl->flags |= IOBL_BUF_RING; bl->buf_ring = br; + if (reg.min_left) + bl->min_left_sub_one = reg.min_left - 1; if (reg.flags & IOU_PBUF_RING_INC) bl->flags |= IOBL_INC; ret = io_buffer_add_list(ctx, bl, reg.bgid); diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h index abf7052b556e..401773e1ef80 100644 --- a/io_uring/kbuf.h +++ b/io_uring/kbuf.h @@ -32,6 +32,13 @@ struct io_buffer_list { __u16 flags; + /* + * minimum required amount to be left to reuse an incrementally + * consumed buffer. If less than this is left at consumption time, + * buffer is done and head is incremented to the next buffer. + */ + __u32 min_left_sub_one; + struct io_mapped_region region; }; -- 2.53.0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers 2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe @ 2026-04-28 17:53 ` Gabriel Krisman Bertazi 2026-04-28 18:02 ` Jens Axboe 0 siblings, 1 reply; 7+ messages in thread From: Gabriel Krisman Bertazi @ 2026-04-28 17:53 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Martin Michaelis, stable Jens Axboe <axboe@kernel.dk> writes: > From: Martin Michaelis <code@mgjm.de> > > Incrementally consumed buffer rings are generally fully consumed, but > it's quite possible that the application has a minimum size it needs to > meet to avoid truncation. Currently that minimum limit is 1 byte, but > this should be a setting that is the hands of the application. For > recvmsg multishot, a prime use case for incrementally consumed buffers, > the application may get spurious -EFAULT returned at the end of an > incrementally consumed buffer, as less space is available than the > headers need. > > Grab a u32 field in struct io_uring_buf_reg, which the application can > use to inform the kernel of the minimum size that should be available > in an incrementally consumed buffer. If less than that is available, > the current buffer is fully processed and the next one will be picked. > > Cc: stable@vger.kernel.org > Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") > Link: https://github.com/axboe/liburing/issues/1433 > Signed-off-by: Martin Michaelis <code@mgjm.de> > [axboe: write commit message, change io_buffer_list member name] > Signed-off-by: Jens Axboe <axboe@kernel.dk> > --- > include/uapi/linux/io_uring.h | 3 ++- > io_uring/kbuf.c | 8 +++++++- > io_uring/kbuf.h | 7 +++++++ > 3 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h > index 17ac1b785440..909fb7aea638 100644 > --- a/include/uapi/linux/io_uring.h > +++ b/include/uapi/linux/io_uring.h > @@ -905,7 +905,8 @@ struct io_uring_buf_reg { > __u32 ring_entries; > __u16 bgid; > __u16 flags; > - __u64 resv[3]; > + __u32 min_left; > + __u32 resv[5]; Honest question, isn't this a property of the specific operation and/or fd being operated, instead of the buffer_reg? > }; > > /* argument for IORING_REGISTER_PBUF_STATUS */ > diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c > index 43e4f8615fe8..63061aa1cab9 100644 > --- a/io_uring/kbuf.c > +++ b/io_uring/kbuf.c > @@ -47,7 +47,7 @@ static bool io_kbuf_inc_commit(struct io_buffer_list *bl, int len) > this_len = min_t(u32, len, buf_len); > buf_len -= this_len; > /* Stop looping for invalid buffer length of 0 */ > - if (buf_len || !this_len) { > + if (buf_len > bl->min_left_sub_one || !this_len) { Cosmetic, but perhaps store min_left_sub_one instead of min_left itself? the buf_len must be >= min_left, and that is easier to read. (buf_len && buf_len >= min_left || !this_len) > WRITE_ONCE(buf->addr, READ_ONCE(buf->addr) + this_len); > WRITE_ONCE(buf->len, buf_len); > return false; > @@ -637,6 +637,10 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) > if (reg.ring_entries >= 65536) > return -EINVAL; > > + /* minimum left byte count is a property of incremental buffers */ > + if (!(reg.flags & IOU_PBUF_RING_INC) && reg.min_left) > + return -EINVAL; > + > bl = io_buffer_get_list(ctx, reg.bgid); > if (bl) { > /* if mapped buffer ring OR classic exists, don't allow */ > @@ -683,6 +687,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) > bl->mask = reg.ring_entries - 1; > bl->flags |= IOBL_BUF_RING; > bl->buf_ring = br; > + if (reg.min_left) > + bl->min_left_sub_one = reg.min_left - 1; > if (reg.flags & IOU_PBUF_RING_INC) > bl->flags |= IOBL_INC; > ret = io_buffer_add_list(ctx, bl, reg.bgid); > diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h > index abf7052b556e..401773e1ef80 100644 > --- a/io_uring/kbuf.h > +++ b/io_uring/kbuf.h > @@ -32,6 +32,13 @@ struct io_buffer_list { > > __u16 flags; > > + /* > + * minimum required amount to be left to reuse an incrementally > + * consumed buffer. If less than this is left at consumption time, > + * buffer is done and head is incremented to the next buffer. > + */ > + __u32 min_left_sub_one; > + > struct io_mapped_region region; > }; -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers 2026-04-28 17:53 ` Gabriel Krisman Bertazi @ 2026-04-28 18:02 ` Jens Axboe 2026-04-28 19:08 ` Gabriel Krisman Bertazi 0 siblings, 1 reply; 7+ messages in thread From: Jens Axboe @ 2026-04-28 18:02 UTC (permalink / raw) To: Gabriel Krisman Bertazi; +Cc: io-uring, Martin Michaelis, stable On 4/28/26 11:53 AM, Gabriel Krisman Bertazi wrote: > Jens Axboe <axboe@kernel.dk> writes: > >> From: Martin Michaelis <code@mgjm.de> >> >> Incrementally consumed buffer rings are generally fully consumed, but >> it's quite possible that the application has a minimum size it needs to >> meet to avoid truncation. Currently that minimum limit is 1 byte, but >> this should be a setting that is the hands of the application. For >> recvmsg multishot, a prime use case for incrementally consumed buffers, >> the application may get spurious -EFAULT returned at the end of an >> incrementally consumed buffer, as less space is available than the >> headers need. >> >> Grab a u32 field in struct io_uring_buf_reg, which the application can >> use to inform the kernel of the minimum size that should be available >> in an incrementally consumed buffer. If less than that is available, >> the current buffer is fully processed and the next one will be picked. >> >> Cc: stable@vger.kernel.org >> Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") >> Link: https://github.com/axboe/liburing/issues/1433 >> Signed-off-by: Martin Michaelis <code@mgjm.de> >> [axboe: write commit message, change io_buffer_list member name] >> Signed-off-by: Jens Axboe <axboe@kernel.dk> >> --- >> include/uapi/linux/io_uring.h | 3 ++- >> io_uring/kbuf.c | 8 +++++++- >> io_uring/kbuf.h | 7 +++++++ >> 3 files changed, 16 insertions(+), 2 deletions(-) >> >> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h >> index 17ac1b785440..909fb7aea638 100644 >> --- a/include/uapi/linux/io_uring.h >> +++ b/include/uapi/linux/io_uring.h >> @@ -905,7 +905,8 @@ struct io_uring_buf_reg { >> __u32 ring_entries; >> __u16 bgid; >> __u16 flags; >> - __u64 resv[3]; >> + __u32 min_left; >> + __u32 resv[5]; > > Honest question, isn't this a property of the specific operation and/or > fd being operated, instead of the buffer_reg? It kind of is, in that some users may not care. But it's not currently possible to pass this in on a per-op basis, and while I did hack that up initially, it's almost impossible as you end up with layering violations. In practice, this is really mostly a recvmsg multishot issue, because we need to store the headers. Hence the solution to stuff it in the io_uring_buf_reg instead, and make it a fixed property of the buffer group. In practice, you may even want a larger min_left than what the recvmsg requires, as you don't want a tiny truncated transfer at the end, regardless of what type of recv or read operation this is. Hence it works generically as well. Also see the linked GH issue, that's where most of the discussion around this have happened already. >> /* argument for IORING_REGISTER_PBUF_STATUS */ >> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c >> index 43e4f8615fe8..63061aa1cab9 100644 >> --- a/io_uring/kbuf.c >> +++ b/io_uring/kbuf.c >> @@ -47,7 +47,7 @@ static bool io_kbuf_inc_commit(struct io_buffer_list *bl, int len) >> this_len = min_t(u32, len, buf_len); >> buf_len -= this_len; >> /* Stop looping for invalid buffer length of 0 */ >> - if (buf_len || !this_len) { >> + if (buf_len > bl->min_left_sub_one || !this_len) { > > Cosmetic, but perhaps store min_left_sub_one instead of min_left itself? the > buf_len must be >= min_left, and that is easier to read. (buf_len && > buf_len >= min_left || !this_len) Also see GH issue. -- Jens Axboe ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers 2026-04-28 18:02 ` Jens Axboe @ 2026-04-28 19:08 ` Gabriel Krisman Bertazi 0 siblings, 0 replies; 7+ messages in thread From: Gabriel Krisman Bertazi @ 2026-04-28 19:08 UTC (permalink / raw) To: Jens Axboe; +Cc: io-uring, Martin Michaelis, stable Jens Axboe <axboe@kernel.dk> writes: >> Honest question, isn't this a property of the specific operation and/or >> fd being operated, instead of the buffer_reg? > > It kind of is, in that some users may not care. But it's not currently > possible to pass this in on a per-op basis, and while I did hack that > up initially, it's almost impossible as you end up with layering > violations. In practice, this is really mostly a recvmsg multishot > issue, because we need to store the headers. Hence the solution to > stuff it in the io_uring_buf_reg instead, and make it a fixed property > of the buffer group. In practice, you may even want a larger min_left > than what the recvmsg requires, as you don't want a tiny truncated > transfer at the end, regardless of what type of recv or read operation > this is. Hence it works generically as well. > > Also see the linked GH issue, that's where most of the discussion > around this have happened already. > >>> - if (buf_len || !this_len) { >>> + if (buf_len > bl->min_left_sub_one || !this_len) { >> >> Cosmetic, but perhaps store min_left_sub_one instead of min_left itself? the >> buf_len must be >= min_left, and that is easier to read. (buf_len && >> buf_len >= min_left || !this_len) > > Also see GH issue. Ack. Thanks. Feel free to add: Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de> -- Gabriel Krisman Bertazi ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-04-28 19:08 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-28 15:44 [PATCHSET 0/2] kbuf fixes Jens Axboe 2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe 2026-04-28 17:54 ` Gabriel Krisman Bertazi 2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe 2026-04-28 17:53 ` Gabriel Krisman Bertazi 2026-04-28 18:02 ` Jens Axboe 2026-04-28 19:08 ` Gabriel Krisman Bertazi
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox