* [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
@ 2025-12-17 12:31 Ming Lei
2025-12-17 15:16 ` huang-jl
0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2025-12-17 12:31 UTC (permalink / raw)
To: Jens Axboe, io-uring; +Cc: Ming Lei, huang-jl, Caleb Sander Mateos
Replace the iov_iter_advance() call and the separate loop for
calculating nr_segs with a single unified loop that iterates over
bvecs to determine the starting position, iov_offset, and nr_segs.
This simplifies the logic and avoids the overhead of iov_iter_advance().
Cc: huang-jl <huang-jl@deepseek.com>
Cc: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
io_uring/rsrc.c | 38 ++++++++++++++++++++++++++++----------
1 file changed, 28 insertions(+), 10 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index a63474b331bf..f7ee146d5330 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1051,20 +1051,38 @@ static int validate_fixed_range(u64 buf_addr, size_t len,
static int io_import_kbuf(int ddir, struct iov_iter *iter,
struct io_mapped_ubuf *imu, size_t len, size_t offset)
{
- size_t count = len + offset;
-
- iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, count);
- iov_iter_advance(iter, offset);
+ const struct bio_vec *bvec = imu->bvec;
+ size_t iov_offset = 0;
+ unsigned int nr_segs = 0;
+ size_t remaining = len;
+ unsigned int i;
- if (count < imu->len) {
- const struct bio_vec *bvec = iter->bvec;
+ /* Single loop to calculate starting bvec, iov_offset, and nr_segs */
+ for (i = 0; i < imu->nr_bvecs; i++, bvec++) {
+ size_t bvec_avail;
- while (len > bvec->bv_len) {
- len -= bvec->bv_len;
- bvec++;
+ /* Skip offset bytes to find starting position */
+ if (offset > 0) {
+ if (offset >= bvec->bv_len) {
+ offset -= bvec->bv_len;
+ continue;
+ }
+ iov_offset = offset;
+ bvec_avail = bvec->bv_len - offset;
+ offset = 0;
+ } else {
+ bvec_avail = bvec->bv_len;
}
- iter->nr_segs = 1 + bvec - iter->bvec;
+
+ /* Count bvecs for len bytes */
+ nr_segs++;
+ if (remaining <= bvec_avail)
+ break;
+ remaining -= bvec_avail;
}
+
+ iov_iter_bvec(iter, ddir, bvec - nr_segs + 1, nr_segs, len);
+ iter->iov_offset = iov_offset;
return 0;
}
--
2.47.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-17 12:31 [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop Ming Lei
@ 2025-12-17 15:16 ` huang-jl
2025-12-18 1:42 ` Ming Lei
0 siblings, 1 reply; 9+ messages in thread
From: huang-jl @ 2025-12-17 15:16 UTC (permalink / raw)
To: ming.lei; +Cc: axboe, csander, huang-jl, io-uring
The code looks correct to me.
> This simplifies the logic
I'm not an expert in Linux development, but from my perspective, the
original version seems simpler and more readable. The semantics of
iov_iter_advance() are clear and well-understood.
That said, I understand the appeal of merging them into a single loop.
> and avoids the overhead of iov_iter_advance()
Could you clarify what overhead you mean? If it's the function call
overhead, I think the compiler would inline it anyway. The actual
iteration work seems equivalent between both approaches.
Thanks,
huang-jl
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-17 15:16 ` huang-jl
@ 2025-12-18 1:42 ` Ming Lei
2025-12-18 2:27 ` Ming Lei
0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2025-12-18 1:42 UTC (permalink / raw)
To: huang-jl; +Cc: axboe, csander, io-uring, nj.shetty
On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> The code looks correct to me.
>
> > This simplifies the logic
>
> I'm not an expert in Linux development, but from my perspective, the
> original version seems simpler and more readable. The semantics of
> iov_iter_advance() are clear and well-understood.
>
> That said, I understand the appeal of merging them into a single loop.
>
> > and avoids the overhead of iov_iter_advance()
>
> Could you clarify what overhead you mean? If it's the function call
> overhead, I think the compiler would inline it anyway. The actual
> iteration work seems equivalent between both approaches.
iov_iter_advance() is global function, and it can't be inline.
Also single loop is more readable, cause ->iov_offset can be ignored easily.
In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
potential split, however not get idea how it is triggered. Nitesh didn't
mention the exact reason:
https://lkml.org/lkml/2025/4/16/351
I will look at the reason and see if it can be avoided.
Thanks,
Ming
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-18 1:42 ` Ming Lei
@ 2025-12-18 2:27 ` Ming Lei
2025-12-18 3:19 ` huang-jl
2025-12-22 19:56 ` Caleb Sander Mateos
0 siblings, 2 replies; 9+ messages in thread
From: Ming Lei @ 2025-12-18 2:27 UTC (permalink / raw)
To: huang-jl; +Cc: axboe, csander, io-uring, nj.shetty
On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > The code looks correct to me.
> >
> > > This simplifies the logic
> >
> > I'm not an expert in Linux development, but from my perspective, the
> > original version seems simpler and more readable. The semantics of
> > iov_iter_advance() are clear and well-understood.
> >
> > That said, I understand the appeal of merging them into a single loop.
> >
> > > and avoids the overhead of iov_iter_advance()
> >
> > Could you clarify what overhead you mean? If it's the function call
> > overhead, I think the compiler would inline it anyway. The actual
> > iteration work seems equivalent between both approaches.
>
> iov_iter_advance() is global function, and it can't be inline.
>
> Also single loop is more readable, cause ->iov_offset can be ignored easily.
>
> In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> potential split, however not get idea how it is triggered. Nitesh didn't
> mention the exact reason:
>
> https://lkml.org/lkml/2025/4/16/351
>
> I will look at the reason and see if it can be avoided.
The reason is in both bio_iov_bvec_set() and bio_may_need_split().
->bi_vcnt doesn't make sense for cloned bio, and shouldn't be used as multiple
segment hint.
However, it also shows bio_split_rw() is too heavy.
Thanks,
Ming
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-18 2:27 ` Ming Lei
@ 2025-12-18 3:19 ` huang-jl
2025-12-22 19:56 ` Caleb Sander Mateos
1 sibling, 0 replies; 9+ messages in thread
From: huang-jl @ 2025-12-18 3:19 UTC (permalink / raw)
To: ming.lei; +Cc: axboe, csander, io-uring, nj.shetty
> ->bi_vcnt doesn't make sense for cloned bio, and shouldn't be used as multiple
> segment hint.
>
> However, it also shows bio_split_rw() is too heavy.
Sorry, I'm not familiar with this area of the codebase. Perhaps need
someone else to provide more valuable suggestions.
--
Thanks,
huang-jl
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-18 2:27 ` Ming Lei
2025-12-18 3:19 ` huang-jl
@ 2025-12-22 19:56 ` Caleb Sander Mateos
2025-12-23 2:30 ` Ming Lei
1 sibling, 1 reply; 9+ messages in thread
From: Caleb Sander Mateos @ 2025-12-22 19:56 UTC (permalink / raw)
To: Ming Lei; +Cc: huang-jl, axboe, io-uring, nj.shetty
On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > The code looks correct to me.
> > >
> > > > This simplifies the logic
> > >
> > > I'm not an expert in Linux development, but from my perspective, the
> > > original version seems simpler and more readable. The semantics of
> > > iov_iter_advance() are clear and well-understood.
> > >
> > > That said, I understand the appeal of merging them into a single loop.
> > >
> > > > and avoids the overhead of iov_iter_advance()
> > >
> > > Could you clarify what overhead you mean? If it's the function call
> > > overhead, I think the compiler would inline it anyway. The actual
> > > iteration work seems equivalent between both approaches.
> >
> > iov_iter_advance() is global function, and it can't be inline.
> >
> > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> >
> > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > potential split, however not get idea how it is triggered. Nitesh didn't
> > mention the exact reason:
> >
> > https://lkml.org/lkml/2025/4/16/351
> >
> > I will look at the reason and see if it can be avoided.
>
> The reason is in both bio_iov_bvec_set() and bio_may_need_split().
nr_segs is not just a performance optimization, it's part of the
struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
huang-jl pointed out. I don't think it's a good idea to assume that
nr_segs isn't going to be used and doesn't need to be calculated
correctly.
I think this patch is a definite improvement as it reduces the number
of assumptions about the internal structure of a bvec iov_iter. The
remaining assignment to iter->iov_offset is unfortunate, but I don't
see a great way around it.
My only suggestion would be to separate out a "find first bvec" loop
and a "find last bvec" loop to make the loop body less branchy.
Something like this:
const struct bio_vec *bvec = imu->bvec, bvec_start;
size_t remaining;
unsigned bvec_avail;
while (offset >= bvec->bv_len) {
offset -= bvec->bv_len;
bvec++;
}
bvec_avail = bvec->bv_len - offset;
bvec_start = bvec;
remaining = len;
while (remaining > bvec_avail) {
remaining -= bvec_avail;
bvec++;
bvec_avail = bvec->bv_len;
}
iov_iter_bvec(iter, ddir, start_bvec, bvec - start_bvec + 1, len);
iter->iov_offset = offset;
Best,
Caleb
>
> ->bi_vcnt doesn't make sense for cloned bio, and shouldn't be used as multiple
> segment hint.
>
> However, it also shows bio_split_rw() is too heavy.
>
>
> Thanks,
> Ming
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-22 19:56 ` Caleb Sander Mateos
@ 2025-12-23 2:30 ` Ming Lei
2025-12-23 19:56 ` Caleb Sander Mateos
0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2025-12-23 2:30 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: huang-jl, axboe, io-uring, nj.shetty
On Mon, Dec 22, 2025 at 02:56:02PM -0500, Caleb Sander Mateos wrote:
> On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > > The code looks correct to me.
> > > >
> > > > > This simplifies the logic
> > > >
> > > > I'm not an expert in Linux development, but from my perspective, the
> > > > original version seems simpler and more readable. The semantics of
> > > > iov_iter_advance() are clear and well-understood.
> > > >
> > > > That said, I understand the appeal of merging them into a single loop.
> > > >
> > > > > and avoids the overhead of iov_iter_advance()
> > > >
> > > > Could you clarify what overhead you mean? If it's the function call
> > > > overhead, I think the compiler would inline it anyway. The actual
> > > > iteration work seems equivalent between both approaches.
> > >
> > > iov_iter_advance() is global function, and it can't be inline.
> > >
> > > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> > >
> > > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > > potential split, however not get idea how it is triggered. Nitesh didn't
> > > mention the exact reason:
> > >
> > > https://lkml.org/lkml/2025/4/16/351
> > >
> > > I will look at the reason and see if it can be avoided.
> >
> > The reason is in both bio_iov_bvec_set() and bio_may_need_split().
>
> nr_segs is not just a performance optimization, it's part of the
> struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
> huang-jl pointed out. I don't think it's a good idea to assume that
> nr_segs isn't going to be used and doesn't need to be calculated
> correctly.
It doesn't have to be exact if the bytes covered by `count` won't cross
`nr_segs`.
The `nr_segs` re-calculation is added only for fixing performance regression
in the following link:
https://lkml.org/lkml/2025/4/16/351
because bio_iov_bvec_set() takes iter->nr_segs for setting bio->bi_vcnt.
>
> I think this patch is a definite improvement as it reduces the number
> of assumptions about the internal structure of a bvec iov_iter. The
> remaining assignment to iter->iov_offset is unfortunate, but I don't
> see a great way around it.
>
The re-calculation can be removed, please see the following patches:
https://lore.kernel.org/linux-block/20251218093146.1218279-1-ming.lei@redhat.com/
Nitesh has verified that it won't cause perf regression by replacing
bio->bi_vcnt with __bvec_iter_bvec() in bio_may_need_split().
Thanks,
Ming
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-23 2:30 ` Ming Lei
@ 2025-12-23 19:56 ` Caleb Sander Mateos
2025-12-23 22:45 ` Ming Lei
0 siblings, 1 reply; 9+ messages in thread
From: Caleb Sander Mateos @ 2025-12-23 19:56 UTC (permalink / raw)
To: Ming Lei; +Cc: huang-jl, axboe, io-uring, nj.shetty
On Mon, Dec 22, 2025 at 9:30 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Mon, Dec 22, 2025 at 02:56:02PM -0500, Caleb Sander Mateos wrote:
> > On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > > > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > > > The code looks correct to me.
> > > > >
> > > > > > This simplifies the logic
> > > > >
> > > > > I'm not an expert in Linux development, but from my perspective, the
> > > > > original version seems simpler and more readable. The semantics of
> > > > > iov_iter_advance() are clear and well-understood.
> > > > >
> > > > > That said, I understand the appeal of merging them into a single loop.
> > > > >
> > > > > > and avoids the overhead of iov_iter_advance()
> > > > >
> > > > > Could you clarify what overhead you mean? If it's the function call
> > > > > overhead, I think the compiler would inline it anyway. The actual
> > > > > iteration work seems equivalent between both approaches.
> > > >
> > > > iov_iter_advance() is global function, and it can't be inline.
> > > >
> > > > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> > > >
> > > > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > > > potential split, however not get idea how it is triggered. Nitesh didn't
> > > > mention the exact reason:
> > > >
> > > > https://lkml.org/lkml/2025/4/16/351
> > > >
> > > > I will look at the reason and see if it can be avoided.
> > >
> > > The reason is in both bio_iov_bvec_set() and bio_may_need_split().
> >
> > nr_segs is not just a performance optimization, it's part of the
> > struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
> > huang-jl pointed out. I don't think it's a good idea to assume that
> > nr_segs isn't going to be used and doesn't need to be calculated
> > correctly.
>
> It doesn't have to be exact if the bytes covered by `count` won't cross
> `nr_segs`.
>
> The `nr_segs` re-calculation is added only for fixing performance regression
> in the following link:
>
> https://lkml.org/lkml/2025/4/16/351
>
> because bio_iov_bvec_set() takes iter->nr_segs for setting bio->bi_vcnt.
But iov_iter_extract_bvec_pages() appears to only use iter->nr_segs
and not iter->count. I don't understand how it can get away with an
overestimated iter->nr_segs.
Best,
Caleb
>
> >
> > I think this patch is a definite improvement as it reduces the number
> > of assumptions about the internal structure of a bvec iov_iter. The
> > remaining assignment to iter->iov_offset is unfortunate, but I don't
> > see a great way around it.
> >
>
> The re-calculation can be removed, please see the following patches:
>
> https://lore.kernel.org/linux-block/20251218093146.1218279-1-ming.lei@redhat.com/
>
> Nitesh has verified that it won't cause perf regression by replacing
> bio->bi_vcnt with __bvec_iter_bvec() in bio_may_need_split().
>
>
> Thanks,
> Ming
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
2025-12-23 19:56 ` Caleb Sander Mateos
@ 2025-12-23 22:45 ` Ming Lei
0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2025-12-23 22:45 UTC (permalink / raw)
To: Caleb Sander Mateos; +Cc: huang-jl, axboe, io-uring, nj.shetty
On Tue, Dec 23, 2025 at 02:56:45PM -0500, Caleb Sander Mateos wrote:
> On Mon, Dec 22, 2025 at 9:30 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Dec 22, 2025 at 02:56:02PM -0500, Caleb Sander Mateos wrote:
> > > On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > > > > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > > > > The code looks correct to me.
> > > > > >
> > > > > > > This simplifies the logic
> > > > > >
> > > > > > I'm not an expert in Linux development, but from my perspective, the
> > > > > > original version seems simpler and more readable. The semantics of
> > > > > > iov_iter_advance() are clear and well-understood.
> > > > > >
> > > > > > That said, I understand the appeal of merging them into a single loop.
> > > > > >
> > > > > > > and avoids the overhead of iov_iter_advance()
> > > > > >
> > > > > > Could you clarify what overhead you mean? If it's the function call
> > > > > > overhead, I think the compiler would inline it anyway. The actual
> > > > > > iteration work seems equivalent between both approaches.
> > > > >
> > > > > iov_iter_advance() is global function, and it can't be inline.
> > > > >
> > > > > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> > > > >
> > > > > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > > > > potential split, however not get idea how it is triggered. Nitesh didn't
> > > > > mention the exact reason:
> > > > >
> > > > > https://lkml.org/lkml/2025/4/16/351
> > > > >
> > > > > I will look at the reason and see if it can be avoided.
> > > >
> > > > The reason is in both bio_iov_bvec_set() and bio_may_need_split().
> > >
> > > nr_segs is not just a performance optimization, it's part of the
> > > struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
> > > huang-jl pointed out. I don't think it's a good idea to assume that
> > > nr_segs isn't going to be used and doesn't need to be calculated
> > > correctly.
> >
> > It doesn't have to be exact if the bytes covered by `count` won't cross
> > `nr_segs`.
> >
> > The `nr_segs` re-calculation is added only for fixing performance regression
> > in the following link:
> >
> > https://lkml.org/lkml/2025/4/16/351
> >
> > because bio_iov_bvec_set() takes iter->nr_segs for setting bio->bi_vcnt.
>
> But iov_iter_extract_bvec_pages() appears to only use iter->nr_segs
> and not iter->count. I don't understand how it can get away with an
> overestimated iter->nr_segs.
iov_iter_extract_bvec_pages() does pass `max_size` for using iter->counter,
please see the only caller of iov_iter_extract_pages().
iov_iter has to respect both two, otherwise it is a iov_iter bug.
Thanks,
Ming
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2025-12-23 22:45 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-17 12:31 [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop Ming Lei
2025-12-17 15:16 ` huang-jl
2025-12-18 1:42 ` Ming Lei
2025-12-18 2:27 ` Ming Lei
2025-12-18 3:19 ` huang-jl
2025-12-22 19:56 ` Caleb Sander Mateos
2025-12-23 2:30 ` Ming Lei
2025-12-23 19:56 ` Caleb Sander Mateos
2025-12-23 22:45 ` Ming Lei
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox