public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/2] kbuf fixes
@ 2026-04-28 15:44 Jens Axboe
  2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe
  2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe
  0 siblings, 2 replies; 7+ messages in thread
From: Jens Axboe @ 2026-04-28 15:44 UTC (permalink / raw)
  To: io-uring

Hi,

First patch just kills an unused member in struct io_buffer_list,
and then patch 2 fixes up an issue with incrementally consumed
buffers where the application cannot inform the kernel of how much
space should be left before moving on to the next buffer. This can
cause spurious -EFAULT with multishot recvmsg, when the current
buffer falls below the limit needed for the headers.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member
  2026-04-28 15:44 [PATCHSET 0/2] kbuf fixes Jens Axboe
@ 2026-04-28 15:44 ` Jens Axboe
  2026-04-28 17:54   ` Gabriel Krisman Bertazi
  2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe
  1 sibling, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2026-04-28 15:44 UTC (permalink / raw)
  To: io-uring; +Cc: Jens Axboe

This is only ever assigned, never used. The only used part is the
calculated mask, which is used for indexing. Kill 'nr_entries'.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/kbuf.c | 1 -
 io_uring/kbuf.h | 1 -
 2 files changed, 2 deletions(-)

diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 8da2ff798170..43e4f8615fe8 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -680,7 +680,6 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	}
 #endif
 
-	bl->nr_entries = reg.ring_entries;
 	bl->mask = reg.ring_entries - 1;
 	bl->flags |= IOBL_BUF_RING;
 	bl->buf_ring = br;
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index bf15e26520d3..abf7052b556e 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -27,7 +27,6 @@ struct io_buffer_list {
 	__u16 bgid;
 
 	/* below is for ring provided buffers */
-	__u16 nr_entries;
 	__u16 head;
 	__u16 mask;
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers
  2026-04-28 15:44 [PATCHSET 0/2] kbuf fixes Jens Axboe
  2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe
@ 2026-04-28 15:44 ` Jens Axboe
  2026-04-28 17:53   ` Gabriel Krisman Bertazi
  1 sibling, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2026-04-28 15:44 UTC (permalink / raw)
  To: io-uring; +Cc: Martin Michaelis, stable, Jens Axboe

From: Martin Michaelis <code@mgjm.de>

Incrementally consumed buffer rings are generally fully consumed, but
it's quite possible that the application has a minimum size it needs to
meet to avoid truncation. Currently that minimum limit is 1 byte, but
this should be a setting that is the hands of the application. For
recvmsg multishot, a prime use case for incrementally consumed buffers,
the application may get spurious -EFAULT returned at the end of an
incrementally consumed buffer, as less space is available than the
headers need.

Grab a u32 field in struct io_uring_buf_reg, which the application can
use to inform the kernel of the minimum size that should be available
in an incrementally consumed buffer. If less than that is available,
the current buffer is fully processed and the next one will be picked.

Cc: stable@vger.kernel.org
Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
Link: https://github.com/axboe/liburing/issues/1433
Signed-off-by: Martin Michaelis <code@mgjm.de>
[axboe: write commit message, change io_buffer_list member name]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/uapi/linux/io_uring.h | 3 ++-
 io_uring/kbuf.c               | 8 +++++++-
 io_uring/kbuf.h               | 7 +++++++
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 17ac1b785440..909fb7aea638 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -905,7 +905,8 @@ struct io_uring_buf_reg {
 	__u32	ring_entries;
 	__u16	bgid;
 	__u16	flags;
-	__u64	resv[3];
+	__u32	min_left;
+	__u32	resv[5];
 };
 
 /* argument for IORING_REGISTER_PBUF_STATUS */
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
index 43e4f8615fe8..63061aa1cab9 100644
--- a/io_uring/kbuf.c
+++ b/io_uring/kbuf.c
@@ -47,7 +47,7 @@ static bool io_kbuf_inc_commit(struct io_buffer_list *bl, int len)
 		this_len = min_t(u32, len, buf_len);
 		buf_len -= this_len;
 		/* Stop looping for invalid buffer length of 0 */
-		if (buf_len || !this_len) {
+		if (buf_len > bl->min_left_sub_one || !this_len) {
 			WRITE_ONCE(buf->addr, READ_ONCE(buf->addr) + this_len);
 			WRITE_ONCE(buf->len, buf_len);
 			return false;
@@ -637,6 +637,10 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	if (reg.ring_entries >= 65536)
 		return -EINVAL;
 
+	/* minimum left byte count is a property of incremental buffers */
+	if (!(reg.flags & IOU_PBUF_RING_INC) && reg.min_left)
+		return -EINVAL;
+
 	bl = io_buffer_get_list(ctx, reg.bgid);
 	if (bl) {
 		/* if mapped buffer ring OR classic exists, don't allow */
@@ -683,6 +687,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
 	bl->mask = reg.ring_entries - 1;
 	bl->flags |= IOBL_BUF_RING;
 	bl->buf_ring = br;
+	if (reg.min_left)
+		bl->min_left_sub_one = reg.min_left - 1;
 	if (reg.flags & IOU_PBUF_RING_INC)
 		bl->flags |= IOBL_INC;
 	ret = io_buffer_add_list(ctx, bl, reg.bgid);
diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
index abf7052b556e..401773e1ef80 100644
--- a/io_uring/kbuf.h
+++ b/io_uring/kbuf.h
@@ -32,6 +32,13 @@ struct io_buffer_list {
 
 	__u16 flags;
 
+	/*
+	 * minimum required amount to be left to reuse an incrementally
+	 * consumed buffer. If less than this is left at consumption time,
+	 * buffer is done and head is incremented to the next buffer.
+	 */
+	__u32 min_left_sub_one;
+
 	struct io_mapped_region region;
 };
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers
  2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe
@ 2026-04-28 17:53   ` Gabriel Krisman Bertazi
  2026-04-28 18:02     ` Jens Axboe
  0 siblings, 1 reply; 7+ messages in thread
From: Gabriel Krisman Bertazi @ 2026-04-28 17:53 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Martin Michaelis, stable

Jens Axboe <axboe@kernel.dk> writes:

> From: Martin Michaelis <code@mgjm.de>
>
> Incrementally consumed buffer rings are generally fully consumed, but
> it's quite possible that the application has a minimum size it needs to
> meet to avoid truncation. Currently that minimum limit is 1 byte, but
> this should be a setting that is the hands of the application. For
> recvmsg multishot, a prime use case for incrementally consumed buffers,
> the application may get spurious -EFAULT returned at the end of an
> incrementally consumed buffer, as less space is available than the
> headers need.
>
> Grab a u32 field in struct io_uring_buf_reg, which the application can
> use to inform the kernel of the minimum size that should be available
> in an incrementally consumed buffer. If less than that is available,
> the current buffer is fully processed and the next one will be picked.
>
> Cc: stable@vger.kernel.org
> Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
> Link: https://github.com/axboe/liburing/issues/1433
> Signed-off-by: Martin Michaelis <code@mgjm.de>
> [axboe: write commit message, change io_buffer_list member name]
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/uapi/linux/io_uring.h | 3 ++-
>  io_uring/kbuf.c               | 8 +++++++-
>  io_uring/kbuf.h               | 7 +++++++
>  3 files changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 17ac1b785440..909fb7aea638 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -905,7 +905,8 @@ struct io_uring_buf_reg {
>  	__u32	ring_entries;
>  	__u16	bgid;
>  	__u16	flags;
> -	__u64	resv[3];
> +	__u32	min_left;
> +	__u32	resv[5];

Honest question, isn't this a property of the specific operation and/or
fd being operated, instead of the buffer_reg?

>  };
>  
>  /* argument for IORING_REGISTER_PBUF_STATUS */
> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> index 43e4f8615fe8..63061aa1cab9 100644
> --- a/io_uring/kbuf.c
> +++ b/io_uring/kbuf.c
> @@ -47,7 +47,7 @@ static bool io_kbuf_inc_commit(struct io_buffer_list *bl, int len)
>  		this_len = min_t(u32, len, buf_len);
>  		buf_len -= this_len;
>  		/* Stop looping for invalid buffer length of 0 */
> -		if (buf_len || !this_len) {
> +		if (buf_len > bl->min_left_sub_one || !this_len) {

Cosmetic, but perhaps store min_left_sub_one instead of min_left itself? the
buf_len must be >= min_left, and that is easier to read.  (buf_len &&
buf_len >= min_left || !this_len)

>  			WRITE_ONCE(buf->addr, READ_ONCE(buf->addr) + this_len);
>  			WRITE_ONCE(buf->len, buf_len);
>  			return false;
> @@ -637,6 +637,10 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
>  	if (reg.ring_entries >= 65536)
>  		return -EINVAL;
>  
> +	/* minimum left byte count is a property of incremental buffers */
> +	if (!(reg.flags & IOU_PBUF_RING_INC) && reg.min_left)
> +		return -EINVAL;
> +
>  	bl = io_buffer_get_list(ctx, reg.bgid);
>  	if (bl) {
>  		/* if mapped buffer ring OR classic exists, don't allow */
> @@ -683,6 +687,8 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
>  	bl->mask = reg.ring_entries - 1;
>  	bl->flags |= IOBL_BUF_RING;
>  	bl->buf_ring = br;
> +	if (reg.min_left)
> +		bl->min_left_sub_one = reg.min_left - 1;
>  	if (reg.flags & IOU_PBUF_RING_INC)
>  		bl->flags |= IOBL_INC;
>  	ret = io_buffer_add_list(ctx, bl, reg.bgid);
> diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
> index abf7052b556e..401773e1ef80 100644
> --- a/io_uring/kbuf.h
> +++ b/io_uring/kbuf.h
> @@ -32,6 +32,13 @@ struct io_buffer_list {
>  
>  	__u16 flags;
>  
> +	/*
> +	 * minimum required amount to be left to reuse an incrementally
> +	 * consumed buffer. If less than this is left at consumption time,
> +	 * buffer is done and head is incremented to the next buffer.
> +	 */
> +	__u32 min_left_sub_one;
> +
>  	struct io_mapped_region region;
>  };

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member
  2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe
@ 2026-04-28 17:54   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 7+ messages in thread
From: Gabriel Krisman Bertazi @ 2026-04-28 17:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring

Jens Axboe <axboe@kernel.dk> writes:

> This is only ever assigned, never used. The only used part is the
> calculated mask, which is used for indexing. Kill 'nr_entries'.
>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>

> ---
>  io_uring/kbuf.c | 1 -
>  io_uring/kbuf.h | 1 -
>  2 files changed, 2 deletions(-)
>
> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
> index 8da2ff798170..43e4f8615fe8 100644
> --- a/io_uring/kbuf.c
> +++ b/io_uring/kbuf.c
> @@ -680,7 +680,6 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg)
>  	}
>  #endif
>  
> -	bl->nr_entries = reg.ring_entries;
>  	bl->mask = reg.ring_entries - 1;
>  	bl->flags |= IOBL_BUF_RING;
>  	bl->buf_ring = br;
> diff --git a/io_uring/kbuf.h b/io_uring/kbuf.h
> index bf15e26520d3..abf7052b556e 100644
> --- a/io_uring/kbuf.h
> +++ b/io_uring/kbuf.h
> @@ -27,7 +27,6 @@ struct io_buffer_list {
>  	__u16 bgid;
>  
>  	/* below is for ring provided buffers */
> -	__u16 nr_entries;
>  	__u16 head;
>  	__u16 mask;

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers
  2026-04-28 17:53   ` Gabriel Krisman Bertazi
@ 2026-04-28 18:02     ` Jens Axboe
  2026-04-28 19:08       ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 7+ messages in thread
From: Jens Axboe @ 2026-04-28 18:02 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: io-uring, Martin Michaelis, stable

On 4/28/26 11:53 AM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> From: Martin Michaelis <code@mgjm.de>
>>
>> Incrementally consumed buffer rings are generally fully consumed, but
>> it's quite possible that the application has a minimum size it needs to
>> meet to avoid truncation. Currently that minimum limit is 1 byte, but
>> this should be a setting that is the hands of the application. For
>> recvmsg multishot, a prime use case for incrementally consumed buffers,
>> the application may get spurious -EFAULT returned at the end of an
>> incrementally consumed buffer, as less space is available than the
>> headers need.
>>
>> Grab a u32 field in struct io_uring_buf_reg, which the application can
>> use to inform the kernel of the minimum size that should be available
>> in an incrementally consumed buffer. If less than that is available,
>> the current buffer is fully processed and the next one will be picked.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption")
>> Link: https://github.com/axboe/liburing/issues/1433
>> Signed-off-by: Martin Michaelis <code@mgjm.de>
>> [axboe: write commit message, change io_buffer_list member name]
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  include/uapi/linux/io_uring.h | 3 ++-
>>  io_uring/kbuf.c               | 8 +++++++-
>>  io_uring/kbuf.h               | 7 +++++++
>>  3 files changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>> index 17ac1b785440..909fb7aea638 100644
>> --- a/include/uapi/linux/io_uring.h
>> +++ b/include/uapi/linux/io_uring.h
>> @@ -905,7 +905,8 @@ struct io_uring_buf_reg {
>>  	__u32	ring_entries;
>>  	__u16	bgid;
>>  	__u16	flags;
>> -	__u64	resv[3];
>> +	__u32	min_left;
>> +	__u32	resv[5];
> 
> Honest question, isn't this a property of the specific operation and/or
> fd being operated, instead of the buffer_reg?

It kind of is, in that some users may not care. But it's not currently
possible to pass this in on a per-op basis, and while I did hack that
up initially, it's almost impossible as you end up with layering
violations. In practice, this is really mostly a recvmsg multishot
issue, because we need to store the headers. Hence the solution to
stuff it in the io_uring_buf_reg instead, and make it a fixed property
of the buffer group. In practice, you may even want a larger min_left
than what the recvmsg requires, as you don't want a tiny truncated
transfer at the end, regardless of what type of recv or read operation
this is. Hence it works generically as well.

Also see the linked GH issue, that's where most of the discussion
around this have happened already.

>>  /* argument for IORING_REGISTER_PBUF_STATUS */
>> diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c
>> index 43e4f8615fe8..63061aa1cab9 100644
>> --- a/io_uring/kbuf.c
>> +++ b/io_uring/kbuf.c
>> @@ -47,7 +47,7 @@ static bool io_kbuf_inc_commit(struct io_buffer_list *bl, int len)
>>  		this_len = min_t(u32, len, buf_len);
>>  		buf_len -= this_len;
>>  		/* Stop looping for invalid buffer length of 0 */
>> -		if (buf_len || !this_len) {
>> +		if (buf_len > bl->min_left_sub_one || !this_len) {
> 
> Cosmetic, but perhaps store min_left_sub_one instead of min_left itself? the
> buf_len must be >= min_left, and that is easier to read.  (buf_len &&
> buf_len >= min_left || !this_len)

Also see GH issue.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers
  2026-04-28 18:02     ` Jens Axboe
@ 2026-04-28 19:08       ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 7+ messages in thread
From: Gabriel Krisman Bertazi @ 2026-04-28 19:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: io-uring, Martin Michaelis, stable

Jens Axboe <axboe@kernel.dk> writes:

>> Honest question, isn't this a property of the specific operation and/or
>> fd being operated, instead of the buffer_reg?
>
> It kind of is, in that some users may not care. But it's not currently
> possible to pass this in on a per-op basis, and while I did hack that
> up initially, it's almost impossible as you end up with layering
> violations. In practice, this is really mostly a recvmsg multishot
> issue, because we need to store the headers. Hence the solution to
> stuff it in the io_uring_buf_reg instead, and make it a fixed property
> of the buffer group. In practice, you may even want a larger min_left
> than what the recvmsg requires, as you don't want a tiny truncated
> transfer at the end, regardless of what type of recv or read operation
> this is. Hence it works generically as well.
>
> Also see the linked GH issue, that's where most of the discussion
> around this have happened already.
>
>>> -		if (buf_len || !this_len) {
>>> +		if (buf_len > bl->min_left_sub_one || !this_len) {
>> 
>> Cosmetic, but perhaps store min_left_sub_one instead of min_left itself? the
>> buf_len must be >= min_left, and that is easier to read.  (buf_len &&
>> buf_len >= min_left || !this_len)
>
> Also see GH issue.

Ack. Thanks.  Feel free to add:

Reviewed-by: Gabriel Krisman Bertazi <krisman@suse.de>


-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-04-28 19:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 15:44 [PATCHSET 0/2] kbuf fixes Jens Axboe
2026-04-28 15:44 ` [PATCH 1/2] io_uring/kbuf: kill dead struct io_buffer_list 'nr_entries' member Jens Axboe
2026-04-28 17:54   ` Gabriel Krisman Bertazi
2026-04-28 15:44 ` [PATCH 2/2] io_uring/kbuf: support min length left for incremental buffers Jens Axboe
2026-04-28 17:53   ` Gabriel Krisman Bertazi
2026-04-28 18:02     ` Jens Axboe
2026-04-28 19:08       ` Gabriel Krisman Bertazi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox