public inbox for io-uring@vger.kernel.org
From: Jens Axboe <axboe@kernel.dk>
To: Pavel Begunkov <asml.silence@gmail.com>,
	Nitesh Shetty <nj.shetty@samsung.com>
Cc: gost.dev@samsung.com, nitheshshetty@gmail.com,
	io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer
Date: Wed, 16 Apr 2025 12:25:49 -0600
Message-ID: <40a0bbd6-10c7-45bd-9129-51c1ea99a063@kernel.dk>
In-Reply-To: <37c982b5-92e1-4253-b8ac-d446a9a7d932@kernel.dk>

On 4/16/25 9:07 AM, Jens Axboe wrote:
> On 4/16/25 9:03 AM, Pavel Begunkov wrote:
>> On 4/16/25 06:44, Nitesh Shetty wrote:
>>> Sending the exact nr_segs avoids the bio split check and processing in
>>> the block layer, which takes around 5% [1] of overall CPU utilization.
>>>
>>> In our setup, we see an overall IOPS improvement from 7.15M to 7.65M [2]
>>> and 5% lower CPU utilization.
>>>
>>> [1]
>>>       3.52%  io_uring         [kernel.kallsyms]     [k] bio_split_rw_at
>>>       1.42%  io_uring         [kernel.kallsyms]     [k] bio_split_rw
>>>       0.62%  io_uring         [kernel.kallsyms]     [k] bio_submit_split
>>>
>>> [2]
>>> sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
>>> -r4 /dev/nvme0n1 /dev/nvme1n1
>>>
>>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>>> ---
>>>   io_uring/rsrc.c | 3 +++
>>>   1 file changed, 3 insertions(+)
>>>
>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>> index b36c8825550e..6fd3a4a85a9c 100644
>>> --- a/io_uring/rsrc.c
>>> +++ b/io_uring/rsrc.c
>>> @@ -1096,6 +1096,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>>>               iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
>>>           }
>>>       }
>>> +    iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
>>> +        iter->count + ((1UL << imu->folio_shift) - 1)) /
>>> +        (1UL << imu->folio_shift);
>>
>> That's not going to work with ->is_kbuf as the segments are not uniform in
>> size.
> 
> Oops yes good point.

How about something like this? It trims superfluous end segments, if
they exist. The 'offset' section has already trimmed the front parts.
For !is_kbuf that should be simple math, like in Nitesh's patch. For
is_kbuf, iterate over them.
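
As a rough illustration of the "simple math" case only, the round-up can
be sketched in standalone C. This is not kernel code: uniform_nr_segs()
and its parameters are hypothetical names. first_off stands in for
bvec->bv_offset + iter->iov_offset and len for iter->count, and every
segment is assumed to be exactly (1 << folio_shift) bytes.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel function: number of folio-sized
 * segments touched by `len` bytes that start `first_off` bytes into the
 * first segment. Assumes every segment is (1 << folio_shift) bytes.
 */
static size_t uniform_nr_segs(size_t first_off, size_t len,
			      unsigned int folio_shift)
{
	size_t folio_mask = ((size_t)1 << folio_shift) - 1;

	/* Round the end of the byte range up to the next folio boundary. */
	return (first_off + len + folio_mask) >> folio_shift;
}
```

With 4K folios (folio_shift = 12), a 1024-byte transfer that starts 3584
bytes into the first folio crosses one boundary and so needs two
segments, while the same transfer starting at offset 0 needs one.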

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index bef66e733a77..e482ea1e22a9 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1036,6 +1036,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len)
 {
+	const struct bio_vec *bvec;
 	unsigned int folio_shift;
 	size_t offset;
 	int ret;
@@ -1052,9 +1053,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 	 * Might not be a start of buffer, set size appropriately
 	 * and advance us to the beginning.
 	 */
+	bvec = imu->bvec;
 	offset = buf_addr - imu->ubuf;
 	folio_shift = imu->folio_shift;
-	iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len);
+	iov_iter_bvec(iter, ddir, bvec, imu->nr_bvecs, offset + len);
 
 	if (offset) {
 		/*
@@ -1073,7 +1075,6 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 		 * since we can just skip the first segment, which may not
 		 * be folio_size aligned.
 		 */
-		const struct bio_vec *bvec = imu->bvec;
 
 		/*
 		 * Kernel buffer bvecs, on the other hand, don't necessarily
@@ -1099,6 +1100,27 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 		}
 	}
 
+	/*
+	 * Offset trimmed front segments too, if any, now trim the tail.
+	 * For is_kbuf we'll iterate them as they may be different sizes,
+	 * otherwise we can just do straight up math.
+	 */
+	if (len + offset < imu->len) {
+		bvec = iter->bvec;
+		if (imu->is_kbuf) {
+			while (len > bvec->bv_len) {
+				len -= bvec->bv_len;
+				bvec++;
+			}
+			iter->nr_segs = bvec - iter->bvec + 1;
+		} else {
+			size_t vec_len;
+
+			vec_len = bvec->bv_offset + iter->iov_offset +
+					iter->count + ((1UL << folio_shift) - 1);
+			iter->nr_segs = vec_len >> folio_shift;
+		}
+	}
 	return 0;
 }
 

-- 
Jens Axboe
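
For the non-uniform case, where segment sizes differ and the count has
to be found by walking the segments, a minimal standalone sketch could
look like the following. It is not the kernel implementation: struct seg
and walk_nr_segs() are hypothetical stand-ins for struct bio_vec and the
loop in io_import_fixed(). It assumes len > 0, that first_off is smaller
than the first segment, and that the segments hold at least
first_off + len bytes. Adding first_off to the remaining length is also
an assumption about how the iterator is positioned inside its first
segment.

```c
#include <stddef.h>

/* Hypothetical stand-in for the bv_len field of struct bio_vec. */
struct seg {
	size_t len;
};

/*
 * Count the segments needed to cover `len` bytes that start `first_off`
 * bytes into segs[0]. Segment sizes may differ, so walk them one by one.
 */
static size_t walk_nr_segs(const struct seg *segs, size_t first_off,
			   size_t len)
{
	const struct seg *s = segs;
	size_t left = first_off + len;

	/* Skip every segment that is consumed completely. */
	while (left > s->len) {
		left -= s->len;
		s++;
	}
	/* s is the last segment touched, so the count is its index + 1. */
	return (size_t)(s - segs) + 1;
}
```

A transfer that fits inside the first segment yields 1 here, and a
transfer that ends exactly on a segment boundary does not count the
following segment.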

Thread overview: 16+ messages
     [not found] <CGME20250416055250epcas5p25fa8223a1bfeea5583ad8ba88c881a05@epcas5p2.samsung.com>
2025-04-16  5:44 ` [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer Nitesh Shetty
2025-04-16 14:19   ` Jens Axboe
2025-04-16 14:43     ` Jens Axboe
2025-04-16 14:49       ` Jens Axboe
2025-04-16 15:03   ` Pavel Begunkov
2025-04-16 15:07     ` Jens Axboe
2025-04-16 18:25       ` Jens Axboe [this message]
2025-04-16 19:57         ` Nitesh Shetty
2025-04-16 20:01           ` Jens Axboe
2025-04-16 20:29             ` Pavel Begunkov
2025-04-16 20:30               ` Jens Axboe
2025-04-16 21:03                 ` Pavel Begunkov
2025-04-16 22:23                   ` Jens Axboe
2025-04-16 22:42                     ` Jens Axboe
2025-04-17  9:12                     ` Pavel Begunkov
2025-04-16 20:03           ` Keith Busch
