* [PATCH] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
@ 2025-11-11 19:15 Caleb Sander Mateos
2025-11-11 19:19 ` Chaitanya Kulkarni
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Caleb Sander Mateos @ 2025-11-11 19:15 UTC (permalink / raw)
To: Ming Lei, Keith Busch, Chaitanya Kulkarni, Jens Axboe
Cc: linux-block, Caleb Sander Mateos, io-uring, linux-kernel
io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
the number of bvecs in the request. However, bvecs may be split into
multiple segments depending on the queue limits. Thus, the number of
segments may overestimate the number of bvecs. For ublk devices, the
only current users of io_buffer_register_bvec(), virt_boundary_mask,
seg_boundary_mask, max_segments, and max_segment_size can all be set
arbitrarily by the ublk server process.
Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
loop actually yields. However, continue using blk_rq_nr_phys_segments()
as an upper bound on the number of bvecs when allocating imu to avoid
needing to iterate the bvecs a second time.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
---
io_uring/rsrc.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d787c16dc1c3..301c6899d240 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -941,12 +941,12 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
 	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
 	struct io_rsrc_data *data = &ctx->buf_table;
 	struct req_iterator rq_iter;
 	struct io_mapped_ubuf *imu;
 	struct io_rsrc_node *node;
-	struct bio_vec bv, *bvec;
-	u16 nr_bvecs;
+	struct bio_vec bv;
+	unsigned int nr_bvecs = 0;
 	int ret = 0;
 
 	io_ring_submit_lock(ctx, issue_flags);
 	if (index >= data->nr) {
 		ret = -EINVAL;
@@ -963,32 +963,34 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
 	if (!node) {
 		ret = -ENOMEM;
 		goto unlock;
 	}
 
-	nr_bvecs = blk_rq_nr_phys_segments(rq);
-	imu = io_alloc_imu(ctx, nr_bvecs);
+	/*
+	 * blk_rq_nr_phys_segments() may overestimate the number of bvecs
+	 * but avoids needing to iterate over the bvecs
+	 */
+	imu = io_alloc_imu(ctx, blk_rq_nr_phys_segments(rq));
 	if (!imu) {
 		kfree(node);
 		ret = -ENOMEM;
 		goto unlock;
 	}
 
 	imu->ubuf = 0;
 	imu->len = blk_rq_bytes(rq);
 	imu->acct_pages = 0;
 	imu->folio_shift = PAGE_SHIFT;
-	imu->nr_bvecs = nr_bvecs;
 	refcount_set(&imu->refs, 1);
 	imu->release = release;
 	imu->priv = rq;
 	imu->is_kbuf = true;
 	imu->dir = 1 << rq_data_dir(rq);
 
-	bvec = imu->bvec;
 	rq_for_each_bvec(bv, rq, rq_iter)
-		*bvec++ = bv;
+		imu->bvec[nr_bvecs++] = bv;
+	imu->nr_bvecs = nr_bvecs;
 
 	node->buf = imu;
 	data->nodes[index] = node;
 unlock:
 	io_ring_submit_unlock(ctx, issue_flags);
--
2.45.2
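The pattern the patch adopts (allocate for an upper bound, then record how many entries were actually filled) can be sketched as a small userspace C model. This is not kernel code: struct seg, struct mapped_buf, count_segments(), alloc_mapped_buf() and fill_from_request() are hypothetical stand-ins for struct bio_vec, struct io_mapped_ubuf, blk_rq_nr_phys_segments(), io_alloc_imu() and the rq_for_each_bvec() loop. count_segments() only models splitting at a maximum segment size, whereas the block layer applies several queue limits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec. */
struct seg {
	unsigned long addr;
	unsigned int len;
};

/* Hypothetical stand-in for struct io_mapped_ubuf: the entry array is a
 * flexible array member sized when the struct is allocated. */
struct mapped_buf {
	unsigned int capacity;	/* entries allocated (upper bound) */
	unsigned int nr_segs;	/* entries actually filled */
	struct seg segs[];
};

/* Simplified segment count: each entry is split into chunks of at most
 * max_seg_size bytes, so for non-empty entries the result is >= n. */
static unsigned int count_segments(const struct seg *src, unsigned int n,
				   unsigned int max_seg_size)
{
	unsigned int total = 0;

	for (unsigned int i = 0; i < n; i++)
		total += (src[i].len + max_seg_size - 1) / max_seg_size;
	return total;
}

/* Allocate room for at most max_segs entries (cf. io_alloc_imu()). */
static struct mapped_buf *alloc_mapped_buf(unsigned int max_segs)
{
	struct mapped_buf *buf;

	buf = malloc(sizeof(*buf) + (size_t)max_segs * sizeof(buf->segs[0]));
	if (!buf)
		return NULL;
	buf->capacity = max_segs;
	buf->nr_segs = 0;
	return buf;
}

/* Copy the entries the iterator really yields and record that count,
 * instead of reusing the upper bound passed to the allocator. */
static void fill_from_request(struct mapped_buf *buf,
			      const struct seg *src, unsigned int nr_src)
{
	unsigned int n = 0;

	for (unsigned int i = 0; i < nr_src; i++) {
		assert(n < buf->capacity);	/* the upper bound must hold */
		buf->segs[n++] = src[i];
	}
	buf->nr_segs = n;
}
```

With a maximum segment size of 2048 bytes, two bvecs of 4096 and 8192 bytes count as 6 segments, so the buffer is allocated with capacity 6 while nr_segs ends up as 2. Before the fix, the equivalent of nr_segs would have been 6, leaving 4 unfilled entries inside the advertised range.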
* Re: [PATCH] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
From: Chaitanya Kulkarni @ 2025-11-11 19:19 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Keith Busch, linux-block@vger.kernel.org,
io-uring@vger.kernel.org, Jens Axboe, Chaitanya Kulkarni,
linux-kernel@vger.kernel.org, Ming Lei
On 11/11/25 11:15, Caleb Sander Mateos wrote:
> io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> the number of bvecs in the request. However, bvecs may be split into
> multiple segments depending on the queue limits. Thus, the number of
> segments may overestimate the number of bvecs. For ublk devices, the
> only current users of io_buffer_register_bvec(), virt_boundary_mask,
> seg_boundary_mask, max_segments, and max_segment_size can all be set
> arbitrarily by the ublk server process.
> Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> loop actually yields. However, continue using blk_rq_nr_phys_segments()
> as an upper bound on the number of bvecs when allocating imu to avoid
> needing to iterate the bvecs a second time.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
* Re: [PATCH] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
From: Ming Lei @ 2025-11-12 1:01 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Keith Busch, Chaitanya Kulkarni, Jens Axboe, linux-block,
io-uring, linux-kernel
On Tue, Nov 11, 2025 at 12:15:29PM -0700, Caleb Sander Mateos wrote:
> io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> the number of bvecs in the request. However, bvecs may be split into
> multiple segments depending on the queue limits. Thus, the number of
> segments may overestimate the number of bvecs. For ublk devices, the
> only current users of io_buffer_register_bvec(), virt_boundary_mask,
> seg_boundary_mask, max_segments, and max_segment_size can all be set
> arbitrarily by the ublk server process.
> Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> loop actually yields. However, continue using blk_rq_nr_phys_segments()
> as an upper bound on the number of bvecs when allocating imu to avoid
> needing to iterate the bvecs a second time.
>
> Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
BTW, this issue may not be a problem because ->nr_bvecs is only used in
iov_iter_bvec(), in which 'offset' and 'len' can control how far the
iterator can reach, so the uninitialized bvecs won't be touched basically.
Otherwise, the issue should have been triggered somewhere.
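Ming's argument relies on the iterator's byte count stopping the walk before the unused tail of the array is reached. A minimal userspace model of that, assuming an iterator that carries both an entry count and a byte count (struct seg_iter and walk_segs() are hypothetical and only loosely modeled on a bvec-backed struct iov_iter, not on lib/iov_iter.c):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec. */
struct seg {
	unsigned long addr;
	unsigned int len;
};

/* Hypothetical iterator carrying both an entry count and a byte count. */
struct seg_iter {
	const struct seg *segs;
	unsigned long nr_segs;	/* may overestimate the filled entries */
	size_t count;		/* bytes that are actually valid */
};

/* Walk the entries, stopping when either the byte count or the entry
 * count is exhausted. Returns the number of entries read. */
static unsigned long walk_segs(const struct seg_iter *it)
{
	size_t left = it->count;
	unsigned long touched = 0;

	for (unsigned long i = 0; i < it->nr_segs && left; i++) {
		size_t take = it->segs[i].len < left ? it->segs[i].len : left;

		left -= take;
		touched++;
	}
	return touched;
}
```

With two valid 4096-byte entries, nr_segs overestimated as 4 and count set to 8192, the walk reads only the first two entries; the byte count, not nr_segs, is what stops it.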
Also the bvec allocation may be avoided in case of single-bio request,
which can be one future optimization.
Thanks,
Ming
* Re: [PATCH] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
From: Caleb Sander Mateos @ 2025-11-12 1:44 UTC (permalink / raw)
To: Ming Lei
Cc: Keith Busch, Chaitanya Kulkarni, Jens Axboe, linux-block,
io-uring, linux-kernel
On Tue, Nov 11, 2025 at 5:01 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Tue, Nov 11, 2025 at 12:15:29PM -0700, Caleb Sander Mateos wrote:
> > io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> > the number of bvecs in the request. However, bvecs may be split into
> > multiple segments depending on the queue limits. Thus, the number of
> > segments may overestimate the number of bvecs. For ublk devices, the
> > only current users of io_buffer_register_bvec(), virt_boundary_mask,
> > seg_boundary_mask, max_segments, and max_segment_size can all be set
> > arbitrarily by the ublk server process.
> > Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> > loop actually yields. However, continue using blk_rq_nr_phys_segments()
> > as an upper bound on the number of bvecs when allocating imu to avoid
> > needing to iterate the bvecs a second time.
> >
> > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
>
> Reviewed-by: Ming Lei <ming.lei@redhat.com>
>
> BTW, this issue may not be a problem because ->nr_bvecs is only used in
> iov_iter_bvec(), in which 'offset' and 'len' can control how far the
> iterator can reach, so the uninitialized bvecs won't be touched basically.
I see your point, but what about iov_iter_extract_bvec_pages()? That
looks like it only uses i->nr_segs to bound the iteration, not
i->count. Hopefully there aren't any other helpers relying on nr_segs.
If you really don't think it's a problem, I'm fine deferring the patch
to 6.19. We haven't encountered any problems caused by this bug, but
we haven't tested with any non-default virt_boundary_mask,
seg_boundary_mask, max_segments, or max_segment_size on the ublk
device.
>
> Otherwise, the issue should have been triggered somewhere.
>
> Also the bvec allocation may be avoided in case of single-bio request,
> which can be one future optimization.
I'm not sure what you're suggesting. The bio_vec array is a flexible
array member of io_mapped_ubuf, so unless we add another pointer
indirection, I don't see how to reuse the bio's bi_io_vec array.
io_mapped_ubuf is also used for user registered buffers, where this
optimization isn't possible, so it may not be a clear win.
Best,
Caleb
* Re: [PATCH] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
From: Ming Lei @ 2025-11-12 1:59 UTC (permalink / raw)
To: Caleb Sander Mateos
Cc: Keith Busch, Chaitanya Kulkarni, Jens Axboe, linux-block,
io-uring, linux-kernel
On Tue, Nov 11, 2025 at 05:44:18PM -0800, Caleb Sander Mateos wrote:
> On Tue, Nov 11, 2025 at 5:01 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Tue, Nov 11, 2025 at 12:15:29PM -0700, Caleb Sander Mateos wrote:
> > > io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> > > the number of bvecs in the request. However, bvecs may be split into
> > > multiple segments depending on the queue limits. Thus, the number of
> > > segments may overestimate the number of bvecs. For ublk devices, the
> > > only current users of io_buffer_register_bvec(), virt_boundary_mask,
> > > seg_boundary_mask, max_segments, and max_segment_size can all be set
> > > arbitrarily by the ublk server process.
> > > Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> > > loop actually yields. However, continue using blk_rq_nr_phys_segments()
> > > as an upper bound on the number of bvecs when allocating imu to avoid
> > > needing to iterate the bvecs a second time.
> > >
> > > Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
> > > Fixes: 27cb27b6d5ea ("io_uring: add support for kernel registered bvecs")
> >
> > Reviewed-by: Ming Lei <ming.lei@redhat.com>
> >
> > BTW, this issue may not be a problem because ->nr_bvecs is only used in
> > iov_iter_bvec(), in which 'offset' and 'len' can control how far the
> > iterator can reach, so the uninitialized bvecs won't be touched basically.
>
> I see your point, but what about iov_iter_extract_bvec_pages()? That
> looks like it only uses i->nr_segs to bound the iteration, not
> i->count. Hopefully there aren't any other helpers relying on nr_segs.
iov_iter_extract_bvec_pages() is only called from iov_iter_extract_pages(),
in which 'maxsize' is capped by i->count.
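Caleb's concern and Ming's answer can be contrasted in the same kind of model: an inner helper bounded only by nr_segs and a caller-supplied maxsize, and an outer entry point that first clamps maxsize to the iterator's count. extract_inner() and extract() are hypothetical simplifications, not the real iov_iter_extract_bvec_pages() and iov_iter_extract_pages().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec. */
struct seg {
	unsigned long addr;
	unsigned int len;
};

/* Hypothetical iterator with an entry count and a byte count. */
struct seg_iter {
	const struct seg *segs;
	unsigned long nr_segs;
	size_t count;
};

/* Inner helper: bounded by nr_segs and by the maxsize the caller passes,
 * but it never looks at it->count itself. Returns entries read. */
static unsigned long extract_inner(const struct seg_iter *it, size_t maxsize)
{
	unsigned long touched = 0;

	for (unsigned long i = 0; i < it->nr_segs && maxsize; i++) {
		size_t take = it->segs[i].len < maxsize ? it->segs[i].len : maxsize;

		maxsize -= take;
		touched++;
	}
	return touched;
}

/* Outer entry point: clamps maxsize to the iterator's byte count before
 * calling the helper, so the helper cannot run past the valid bytes. */
static unsigned long extract(const struct seg_iter *it, size_t maxsize)
{
	if (maxsize > it->count)
		maxsize = it->count;
	return extract_inner(it, maxsize);
}
```

Called through extract() with an effectively unbounded maxsize, only the two valid entries are read; calling extract_inner() directly with the same maxsize walks all four entries, including the stale tail that the patch stops counting in nr_bvecs.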
> If you really don't think it's a problem, I'm fine deferring the patch
> to 6.19. We haven't encountered any problems caused by this bug, but
> we haven't tested with any non-default virt_boundary_mask,
> seg_boundary_mask, max_segments, or max_segment_size on the ublk
> device.
IMO it belongs in v6.18: your fix not only makes the code more robust, but
is also the correct thing to do.
I am just wondering why the issue wasn't triggered, given that we have lots
of test cases (rw verify, mkfs & mount, ...).
>
> >
> > Otherwise, the issue should have been triggered somewhere.
> >
> > Also the bvec allocation may be avoided in case of single-bio request,
> > which can be one future optimization.
>
> I'm not sure what you're suggesting. The bio_vec array is a flexible
> array member of io_mapped_ubuf, so unless we add another pointer
> indirection, I don't see how to reuse the bio's bi_io_vec array.
> io_mapped_ubuf is also used for user registered buffers, where this
> optimization isn't possible, so it may not be a clear win.
io_mapped_ubuf->acct_pages is one field that could be reused for the
indirect pointer; please see lo_rw_aio() for how to reuse the bvec array.
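One way to picture the suggested optimization is a descriptor that holds a pointer to its entries instead of a flexible array member, borrowing the bio's array when the request has a single bio and otherwise allocating and copying, roughly as lo_rw_aio() does in the loop driver. Everything below (struct fake_bio, struct buf_desc, buf_desc_init(), buf_desc_release()) is hypothetical; it does not model the acct_pages reuse, bvec offsets within a bio, or io_uring's reference counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio_vec. */
struct seg {
	unsigned long addr;
	unsigned int len;
};

/* Hypothetical single bio: it owns its own entry array. */
struct fake_bio {
	struct seg *vec;
	unsigned int nr;
};

/* Hypothetical descriptor that points at its entries rather than
 * embedding them, so it can borrow another object's array. */
struct buf_desc {
	const struct seg *segs;	/* borrowed or owned */
	unsigned int nr_segs;
	struct seg *owned;	/* non-NULL only when a copy was allocated */
};

/* Borrow the bio's array when there is exactly one bio; otherwise
 * allocate one array and copy every bio's entries into it.
 * Returns false on allocation failure. */
static bool buf_desc_init(struct buf_desc *d, const struct fake_bio *bios,
			  unsigned int nr_bios)
{
	unsigned int total = 0, n = 0;

	d->owned = NULL;
	if (nr_bios == 1) {
		d->segs = bios[0].vec;
		d->nr_segs = bios[0].nr;
		return true;
	}
	for (unsigned int b = 0; b < nr_bios; b++)
		total += bios[b].nr;
	/* allocate at least one entry so malloc(0) never comes into play */
	d->owned = malloc((total ? total : 1) * sizeof(*d->owned));
	if (!d->owned)
		return false;
	for (unsigned int b = 0; b < nr_bios; b++)
		for (unsigned int i = 0; i < bios[b].nr; i++)
			d->owned[n++] = bios[b].vec[i];
	d->segs = d->owned;
	d->nr_segs = n;
	return true;
}

static void buf_desc_release(struct buf_desc *d)
{
	free(d->owned);		/* free(NULL) is a no-op when borrowing */
	d->owned = NULL;
}
```

The trade-off Caleb raises shows up here as the extra pointer dereference on every access through d->segs, which a flexible array member avoids, and as the extra field the descriptor needs.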
Thanks,
Ming
* Re: [PATCH] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
From: Jens Axboe @ 2025-11-12 15:26 UTC (permalink / raw)
To: Ming Lei, Keith Busch, Chaitanya Kulkarni, Caleb Sander Mateos
Cc: linux-block, io-uring, linux-kernel
On Tue, 11 Nov 2025 12:15:29 -0700, Caleb Sander Mateos wrote:
> io_buffer_register_bvec() currently uses blk_rq_nr_phys_segments() as
> the number of bvecs in the request. However, bvecs may be split into
> multiple segments depending on the queue limits. Thus, the number of
> segments may overestimate the number of bvecs. For ublk devices, the
> only current users of io_buffer_register_bvec(), virt_boundary_mask,
> seg_boundary_mask, max_segments, and max_segment_size can all be set
> arbitrarily by the ublk server process.
> Set imu->nr_bvecs based on the number of bvecs the rq_for_each_bvec()
> loop actually yields. However, continue using blk_rq_nr_phys_segments()
> as an upper bound on the number of bvecs when allocating imu to avoid
> needing to iterate the bvecs a second time.
>
> [...]
Applied, thanks!
[1/1] io_uring/rsrc: don't use blk_rq_nr_phys_segments() as number of bvecs
commit: 2d0e88f3fd1dcb37072d499c36162baf5b009d41
Best regards,
--
Jens Axboe