public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
@ 2025-12-17 12:31 Ming Lei
  2025-12-17 15:16 ` huang-jl
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2025-12-17 12:31 UTC (permalink / raw)
  To: Jens Axboe, io-uring; +Cc: Ming Lei, huang-jl, Caleb Sander Mateos

Replace the iov_iter_advance() call and the separate loop for
calculating nr_segs with a single unified loop that iterates over
bvecs to determine the starting position, iov_offset, and nr_segs.

This simplifies the logic and avoids the overhead of iov_iter_advance().

Cc: huang-jl <huang-jl@deepseek.com>
Cc: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 io_uring/rsrc.c | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index a63474b331bf..f7ee146d5330 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1051,20 +1051,38 @@ static int validate_fixed_range(u64 buf_addr, size_t len,
 static int io_import_kbuf(int ddir, struct iov_iter *iter,
 			  struct io_mapped_ubuf *imu, size_t len, size_t offset)
 {
-	size_t count = len + offset;
-
-	iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, count);
-	iov_iter_advance(iter, offset);
+	const struct bio_vec *bvec = imu->bvec;
+	size_t iov_offset = 0;
+	unsigned int nr_segs = 0;
+	size_t remaining = len;
+	unsigned int i;
 
-	if (count < imu->len) {
-		const struct bio_vec *bvec = iter->bvec;
+	/* Single loop to calculate starting bvec, iov_offset, and nr_segs */
+	for (i = 0; i < imu->nr_bvecs; i++, bvec++) {
+		size_t bvec_avail;
 
-		while (len > bvec->bv_len) {
-			len -= bvec->bv_len;
-			bvec++;
+		/* Skip offset bytes to find starting position */
+		if (offset > 0) {
+			if (offset >= bvec->bv_len) {
+				offset -= bvec->bv_len;
+				continue;
+			}
+			iov_offset = offset;
+			bvec_avail = bvec->bv_len - offset;
+			offset = 0;
+		} else {
+			bvec_avail = bvec->bv_len;
 		}
-		iter->nr_segs = 1 + bvec - iter->bvec;
+
+		/* Count bvecs for len bytes */
+		nr_segs++;
+		if (remaining <= bvec_avail)
+			break;
+		remaining -= bvec_avail;
 	}
+
+	iov_iter_bvec(iter, ddir, bvec - nr_segs + 1, nr_segs, len);
+	iter->iov_offset = iov_offset;
 	return 0;
 }
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-17 12:31 [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop Ming Lei
@ 2025-12-17 15:16 ` huang-jl
  2025-12-18  1:42   ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: huang-jl @ 2025-12-17 15:16 UTC (permalink / raw)
  To: ming.lei; +Cc: axboe, csander, huang-jl, io-uring

The code looks correct to me.

> This simplifies the logic

I'm not an expert in Linux development, but from my perspective, the
original version seems simpler and more readable. The semantics of
iov_iter_advance() are clear and well-understood.

That said, I understand the appeal of merging them into a single loop.

> and avoids the overhead of iov_iter_advance()

Could you clarify what overhead you mean? If it's the function call
overhead, I think the compiler would inline it anyway. The actual
iteration work seems equivalent between both approaches.

Thanks,
huang-jl

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-17 15:16 ` huang-jl
@ 2025-12-18  1:42   ` Ming Lei
  2025-12-18  2:27     ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2025-12-18  1:42 UTC (permalink / raw)
  To: huang-jl; +Cc: axboe, csander, io-uring, nj.shetty

On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> The code looks correct to me.
> 
> > This simplifies the logic
> 
> I'm not an expert in Linux development, but from my perspective, the
> original version seems simpler and more readable. The semantics of
> iov_iter_advance() are clear and well-understood.
> 
> That said, I understand the appeal of merging them into a single loop.
> 
> > and avoids the overhead of iov_iter_advance()
> 
> Could you clarify what overhead you mean? If it's the function call
> overhead, I think the compiler would inline it anyway. The actual
> iteration work seems equivalent between both approaches.

iov_iter_advance() is global function, and it can't be inline.

Also single loop is more readable, cause ->iov_offset can be ignored easily.

In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
potential split, however not get idea how it is triggered. Nitesh didn't
mention the exact reason:

https://lkml.org/lkml/2025/4/16/351

I will look at the reason and see if it can be avoided.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-18  1:42   ` Ming Lei
@ 2025-12-18  2:27     ` Ming Lei
  2025-12-18  3:19       ` huang-jl
  2025-12-22 19:56       ` Caleb Sander Mateos
  0 siblings, 2 replies; 9+ messages in thread
From: Ming Lei @ 2025-12-18  2:27 UTC (permalink / raw)
  To: huang-jl; +Cc: axboe, csander, io-uring, nj.shetty

On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > The code looks correct to me.
> > 
> > > This simplifies the logic
> > 
> > I'm not an expert in Linux development, but from my perspective, the
> > original version seems simpler and more readable. The semantics of
> > iov_iter_advance() are clear and well-understood.
> > 
> > That said, I understand the appeal of merging them into a single loop.
> > 
> > > and avoids the overhead of iov_iter_advance()
> > 
> > Could you clarify what overhead you mean? If it's the function call
> > overhead, I think the compiler would inline it anyway. The actual
> > iteration work seems equivalent between both approaches.
> 
> iov_iter_advance() is global function, and it can't be inline.
> 
> Also single loop is more readable, cause ->iov_offset can be ignored easily.
> 
> In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> potential split, however not get idea how it is triggered. Nitesh didn't
> mention the exact reason:
> 
> https://lkml.org/lkml/2025/4/16/351
> 
> I will look at the reason and see if it can be avoided.

The reason is in both bio_iov_bvec_set() and bio_may_need_split().

->bi_vcnt doesn't make sense for cloned bio, and shouldn't be used as multiple
segment hint.

However, it also shows bio_split_rw() is too heavy.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-18  2:27     ` Ming Lei
@ 2025-12-18  3:19       ` huang-jl
  2025-12-22 19:56       ` Caleb Sander Mateos
  1 sibling, 0 replies; 9+ messages in thread
From: huang-jl @ 2025-12-18  3:19 UTC (permalink / raw)
  To: ming.lei; +Cc: axboe, csander, io-uring, nj.shetty

> ->bi_vcnt doesn't make sense for cloned bio, and shouldn't be used as multiple
> segment hint.
>
> However, it also shows bio_split_rw() is too heavy.

Sorry, I'm not familiar with this area of the codebase. Perhaps need
someone else to provide more valuable suggestions.

--
Thanks,
huang-jl

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-18  2:27     ` Ming Lei
  2025-12-18  3:19       ` huang-jl
@ 2025-12-22 19:56       ` Caleb Sander Mateos
  2025-12-23  2:30         ` Ming Lei
  1 sibling, 1 reply; 9+ messages in thread
From: Caleb Sander Mateos @ 2025-12-22 19:56 UTC (permalink / raw)
  To: Ming Lei; +Cc: huang-jl, axboe, io-uring, nj.shetty

On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > The code looks correct to me.
> > >
> > > > This simplifies the logic
> > >
> > > I'm not an expert in Linux development, but from my perspective, the
> > > original version seems simpler and more readable. The semantics of
> > > iov_iter_advance() are clear and well-understood.
> > >
> > > That said, I understand the appeal of merging them into a single loop.
> > >
> > > > and avoids the overhead of iov_iter_advance()
> > >
> > > Could you clarify what overhead you mean? If it's the function call
> > > overhead, I think the compiler would inline it anyway. The actual
> > > iteration work seems equivalent between both approaches.
> >
> > iov_iter_advance() is global function, and it can't be inline.
> >
> > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> >
> > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > potential split, however not get idea how it is triggered. Nitesh didn't
> > mention the exact reason:
> >
> > https://lkml.org/lkml/2025/4/16/351
> >
> > I will look at the reason and see if it can be avoided.
>
> The reason is in both bio_iov_bvec_set() and bio_may_need_split().

nr_segs is not just a performance optimization, it's part of the
struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
huang-jl pointed out. I don't think it's a good idea to assume that
nr_segs isn't going to be used and doesn't need to be calculated
correctly.

I think this patch is a definite improvement as it reduces the number
of assumptions about the internal structure of a bvec iov_iter. The
remaining assignment to iter->iov_offset is unfortunate, but I don't
see a great way around it.

My only suggestion would be to separate out a "find first bvec" loop
and a "find last bvec" loop to make the loop body less branchy.
Something like this:
const struct bio_vec *bvec = imu->bvec, bvec_start;
size_t remaining;
unsigned bvec_avail;

while (offset >= bvec->bv_len) {
        offset -= bvec->bv_len;
        bvec++;
}

bvec_avail = bvec->bv_len - offset;
bvec_start = bvec;
remaining = len;
while (remaining > bvec_avail) {
        remaining -= bvec_avail;
        bvec++;
        bvec_avail = bvec->bv_len;
}

iov_iter_bvec(iter, ddir, start_bvec, bvec - start_bvec + 1, len);
iter->iov_offset = offset;

Best,
Caleb

>
> ->bi_vcnt doesn't make sense for cloned bio, and shouldn't be used as multiple
> segment hint.
>
> However, it also shows bio_split_rw() is too heavy.
>
>
> Thanks,
> Ming
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-22 19:56       ` Caleb Sander Mateos
@ 2025-12-23  2:30         ` Ming Lei
  2025-12-23 19:56           ` Caleb Sander Mateos
  0 siblings, 1 reply; 9+ messages in thread
From: Ming Lei @ 2025-12-23  2:30 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: huang-jl, axboe, io-uring, nj.shetty

On Mon, Dec 22, 2025 at 02:56:02PM -0500, Caleb Sander Mateos wrote:
> On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > > The code looks correct to me.
> > > >
> > > > > This simplifies the logic
> > > >
> > > > I'm not an expert in Linux development, but from my perspective, the
> > > > original version seems simpler and more readable. The semantics of
> > > > iov_iter_advance() are clear and well-understood.
> > > >
> > > > That said, I understand the appeal of merging them into a single loop.
> > > >
> > > > > and avoids the overhead of iov_iter_advance()
> > > >
> > > > Could you clarify what overhead you mean? If it's the function call
> > > > overhead, I think the compiler would inline it anyway. The actual
> > > > iteration work seems equivalent between both approaches.
> > >
> > > iov_iter_advance() is global function, and it can't be inline.
> > >
> > > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> > >
> > > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > > potential split, however not get idea how it is triggered. Nitesh didn't
> > > mention the exact reason:
> > >
> > > https://lkml.org/lkml/2025/4/16/351
> > >
> > > I will look at the reason and see if it can be avoided.
> >
> > The reason is in both bio_iov_bvec_set() and bio_may_need_split().
> 
> nr_segs is not just a performance optimization, it's part of the
> struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
> huang-jl pointed out. I don't think it's a good idea to assume that
> nr_segs isn't going to be used and doesn't need to be calculated
> correctly.

It doesn't have to be exact if the bytes covered by `count` won't cross
`nr_segs`.

The `nr_segs` re-calculation is added only for fixing performance regression
in the following link:

https://lkml.org/lkml/2025/4/16/351

because bio_iov_bvec_set() takes iter->nr_segs for setting bio->bi_vcnt.

> 
> I think this patch is a definite improvement as it reduces the number
> of assumptions about the internal structure of a bvec iov_iter. The
> remaining assignment to iter->iov_offset is unfortunate, but I don't
> see a great way around it.
> 

The re-calculation can be removed, please see the following patches:

https://lore.kernel.org/linux-block/20251218093146.1218279-1-ming.lei@redhat.com/

Nitesh has verified that it won't cause perf regression by replacing
bio->bi_vcnt with __bvec_iter_bvec() in bio_may_need_split().


Thanks,
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-23  2:30         ` Ming Lei
@ 2025-12-23 19:56           ` Caleb Sander Mateos
  2025-12-23 22:45             ` Ming Lei
  0 siblings, 1 reply; 9+ messages in thread
From: Caleb Sander Mateos @ 2025-12-23 19:56 UTC (permalink / raw)
  To: Ming Lei; +Cc: huang-jl, axboe, io-uring, nj.shetty

On Mon, Dec 22, 2025 at 9:30 PM Ming Lei <ming.lei@redhat.com> wrote:
>
> On Mon, Dec 22, 2025 at 02:56:02PM -0500, Caleb Sander Mateos wrote:
> > On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
> > >
> > > On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > > > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > > > The code looks correct to me.
> > > > >
> > > > > > This simplifies the logic
> > > > >
> > > > > I'm not an expert in Linux development, but from my perspective, the
> > > > > original version seems simpler and more readable. The semantics of
> > > > > iov_iter_advance() are clear and well-understood.
> > > > >
> > > > > That said, I understand the appeal of merging them into a single loop.
> > > > >
> > > > > > and avoids the overhead of iov_iter_advance()
> > > > >
> > > > > Could you clarify what overhead you mean? If it's the function call
> > > > > overhead, I think the compiler would inline it anyway. The actual
> > > > > iteration work seems equivalent between both approaches.
> > > >
> > > > iov_iter_advance() is global function, and it can't be inline.
> > > >
> > > > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> > > >
> > > > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > > > potential split, however not get idea how it is triggered. Nitesh didn't
> > > > mention the exact reason:
> > > >
> > > > https://lkml.org/lkml/2025/4/16/351
> > > >
> > > > I will look at the reason and see if it can be avoided.
> > >
> > > The reason is in both bio_iov_bvec_set() and bio_may_need_split().
> >
> > nr_segs is not just a performance optimization, it's part of the
> > struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
> > huang-jl pointed out. I don't think it's a good idea to assume that
> > nr_segs isn't going to be used and doesn't need to be calculated
> > correctly.
>
> It doesn't have to be exact if the bytes covered by `count` won't cross
> `nr_segs`.
>
> The `nr_segs` re-calculation is added only for fixing performance regression
> in the following link:
>
> https://lkml.org/lkml/2025/4/16/351
>
> because bio_iov_bvec_set() takes iter->nr_segs for setting bio->bi_vcnt.

But iov_iter_extract_bvec_pages() appears to only use iter->nr_segs
and not iter->count. I don't understand how it can get away with an
overestimated iter->nr_segs.

Best,
Caleb

>
> >
> > I think this patch is a definite improvement as it reduces the number
> > of assumptions about the internal structure of a bvec iov_iter. The
> > remaining assignment to iter->iov_offset is unfortunate, but I don't
> > see a great way around it.
> >
>
> The re-calculation can be removed, please see the following patches:
>
> https://lore.kernel.org/linux-block/20251218093146.1218279-1-ming.lei@redhat.com/
>
> Nitesh has verified that it won't cause perf regression by replacing
> bio->bi_vcnt with __bvec_iter_bvec() in bio_may_need_split().
>
>
> Thanks,
> Ming
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop
  2025-12-23 19:56           ` Caleb Sander Mateos
@ 2025-12-23 22:45             ` Ming Lei
  0 siblings, 0 replies; 9+ messages in thread
From: Ming Lei @ 2025-12-23 22:45 UTC (permalink / raw)
  To: Caleb Sander Mateos; +Cc: huang-jl, axboe, io-uring, nj.shetty

On Tue, Dec 23, 2025 at 02:56:45PM -0500, Caleb Sander Mateos wrote:
> On Mon, Dec 22, 2025 at 9:30 PM Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Mon, Dec 22, 2025 at 02:56:02PM -0500, Caleb Sander Mateos wrote:
> > > On Wed, Dec 17, 2025 at 9:28 PM Ming Lei <ming.lei@redhat.com> wrote:
> > > >
> > > > On Thu, Dec 18, 2025 at 09:42:43AM +0800, Ming Lei wrote:
> > > > > On Wed, Dec 17, 2025 at 11:16:47PM +0800, huang-jl wrote:
> > > > > > The code looks correct to me.
> > > > > >
> > > > > > > This simplifies the logic
> > > > > >
> > > > > > I'm not an expert in Linux development, but from my perspective, the
> > > > > > original version seems simpler and more readable. The semantics of
> > > > > > iov_iter_advance() are clear and well-understood.
> > > > > >
> > > > > > That said, I understand the appeal of merging them into a single loop.
> > > > > >
> > > > > > > and avoids the overhead of iov_iter_advance()
> > > > > >
> > > > > > Could you clarify what overhead you mean? If it's the function call
> > > > > > overhead, I think the compiler would inline it anyway. The actual
> > > > > > iteration work seems equivalent between both approaches.
> > > > >
> > > > > iov_iter_advance() is global function, and it can't be inline.
> > > > >
> > > > > Also single loop is more readable, cause ->iov_offset can be ignored easily.
> > > > >
> > > > > In theory, re-calculating nr_segs isn't necessary, it is just for avoiding
> > > > > potential split, however not get idea how it is triggered. Nitesh didn't
> > > > > mention the exact reason:
> > > > >
> > > > > https://lkml.org/lkml/2025/4/16/351
> > > > >
> > > > > I will look at the reason and see if it can be avoided.
> > > >
> > > > The reason is in both bio_iov_bvec_set() and bio_may_need_split().
> > >
> > > nr_segs is not just a performance optimization, it's part of the
> > > struct iov_iter API and used by iov_iter_extract_bvec_pages(), as
> > > huang-jl pointed out. I don't think it's a good idea to assume that
> > > nr_segs isn't going to be used and doesn't need to be calculated
> > > correctly.
> >
> > It doesn't have to be exact if the bytes covered by `count` won't cross
> > `nr_segs`.
> >
> > The `nr_segs` re-calculation is added only for fixing performance regression
> > in the following link:
> >
> > https://lkml.org/lkml/2025/4/16/351
> >
> > because bio_iov_bvec_set() takes iter->nr_segs for setting bio->bi_vcnt.
> 
> But iov_iter_extract_bvec_pages() appears to only use iter->nr_segs
> and not iter->count. I don't understand how it can get away with an
> overestimated iter->nr_segs.

iov_iter_extract_bvec_pages() does pass `max_size` for using iter->counter,
please see the only caller of iov_iter_extract_pages().

iov_iter has to respect both two, otherwise it is a iov_iter bug.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-12-23 22:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-17 12:31 [PATCH v6.20] io_uring/rsrc: refactor io_import_kbuf() to use single loop Ming Lei
2025-12-17 15:16 ` huang-jl
2025-12-18  1:42   ` Ming Lei
2025-12-18  2:27     ` Ming Lei
2025-12-18  3:19       ` huang-jl
2025-12-22 19:56       ` Caleb Sander Mateos
2025-12-23  2:30         ` Ming Lei
2025-12-23 19:56           ` Caleb Sander Mateos
2025-12-23 22:45             ` Ming Lei

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox