From: Nitesh Shetty <nitheshshetty@gmail.com>
To: Jens Axboe <axboe@kernel.dk>
Cc: Pavel Begunkov <asml.silence@gmail.com>,
Nitesh Shetty <nj.shetty@samsung.com>,
gost.dev@samsung.com, io-uring@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer
Date: Thu, 17 Apr 2025 01:27:55 +0530
Message-ID: <CAOSviJ3MNDOYJzJFjQDCjc04pGsktQ5vjQvDotqYoRwC2Wf=HQ@mail.gmail.com>
In-Reply-To: <40a0bbd6-10c7-45bd-9129-51c1ea99a063@kernel.dk>
On Wed, Apr 16, 2025 at 11:55 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 4/16/25 9:07 AM, Jens Axboe wrote:
> > On 4/16/25 9:03 AM, Pavel Begunkov wrote:
> >> On 4/16/25 06:44, Nitesh Shetty wrote:
> >>> Sending exact nr_segs, avoids bio split check and processing in
> >>> block layer, which takes around 5%[1] of overall CPU utilization.
> >>>
> >>> In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2]
> >>> and 5% less CPU utilization.
> >>>
> >>> [1]
> >>> 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at
> >>> 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw
> >>> 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split
> >>>
> >>> [2]
> >>> sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
> >>> -r4 /dev/nvme0n1 /dev/nvme1n1
> >>>
> >>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
> >>> ---
> >>> io_uring/rsrc.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> >>> index b36c8825550e..6fd3a4a85a9c 100644
> >>> --- a/io_uring/rsrc.c
> >>> +++ b/io_uring/rsrc.c
> >>> @@ -1096,6 +1096,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
> >>> iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
> >>> }
> >>> }
> >>> + iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
> >>> + iter->count + ((1UL << imu->folio_shift) - 1)) /
> >>> + (1UL << imu->folio_shift);
> >>
> >> That's not going to work with ->is_kbuf as the segments are not uniform in
> >> size.
> >
> > Oops yes good point.
>
> How about something like this? Trims superfluous end segments, if they
> exist. The 'offset' section already trimmed the front parts. For
> !is_kbuf that should be simple math, like in Nitesh's patch. For
> is_kbuf, iterate them.
>
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index bef66e733a77..e482ea1e22a9 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -1036,6 +1036,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
> struct io_mapped_ubuf *imu,
> u64 buf_addr, size_t len)
> {
> + const struct bio_vec *bvec;
> unsigned int folio_shift;
> size_t offset;
> int ret;
> @@ -1052,9 +1053,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
> * Might not be a start of buffer, set size appropriately
> * and advance us to the beginning.
> */
> + bvec = imu->bvec;
> offset = buf_addr - imu->ubuf;
> folio_shift = imu->folio_shift;
> - iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len);
> + iov_iter_bvec(iter, ddir, bvec, imu->nr_bvecs, offset + len);
>
> if (offset) {
> /*
> @@ -1073,7 +1075,6 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
> * since we can just skip the first segment, which may not
> * be folio_size aligned.
> */
> - const struct bio_vec *bvec = imu->bvec;
>
> /*
> * Kernel buffer bvecs, on the other hand, don't necessarily
> @@ -1099,6 +1100,27 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
> }
> }
>
> + /*
> + * Offset trimmed front segments too, if any, now trim the tail.
> + * For is_kbuf we'll iterate them as they may be different sizes,
> + * otherwise we can just do straight up math.
> + */
> + if (len + offset < imu->len) {
> + bvec = iter->bvec;
> + if (imu->is_kbuf) {
> + while (len > bvec->bv_len) {
> + len -= bvec->bv_len;
> + bvec++;
> + }
> + iter->nr_segs = bvec - iter->bvec;
> + } else {
> + size_t vec_len;
> +
> + vec_len = bvec->bv_offset + iter->iov_offset +
> + iter->count + ((1UL << folio_shift) - 1);
> + iter->nr_segs = vec_len >> folio_shift;
> + }
> + }
> return 0;
> }
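To make the is_kbuf concern concrete: when segments differ in size, the number of segments a request touches cannot be derived with a shift and has to be found by walking them. The following is only a userspace sketch of that idea, not the kernel code; `struct seg` and `segs_needed()` are hypothetical names standing in for `struct bio_vec` and the trimming loop, and the sketch assumes the starting offset lies inside the first segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct seg {
	size_t len;
};

/*
 * Count how many segments are needed to cover `len` bytes that start
 * `first_off` bytes into segs[0]. The segments may all have different
 * lengths, so the count is found by walking them one at a time.
 */
static size_t segs_needed(const struct seg *segs, size_t nsegs,
			  size_t first_off, size_t len)
{
	size_t remaining = first_off + len;
	size_t i = 0;

	/* Assumption of this sketch: the start lies inside the first segment. */
	assert(nsegs > 0 && first_off < segs[0].len);

	while (i < nsegs && remaining > segs[i].len) {
		remaining -= segs[i].len;
		i++;
	}

	/* segs[i] holds the last byte, so i + 1 segments are in use. */
	return i < nsegs ? i + 1 : nsegs;
}
```

With segments of 4096, 512, 8192 and 4096 bytes, a 4000-byte request starting at offset 100 crosses into the second segment, so the walk reports two segments. Note the `i + 1`: the loop stops on the segment that contains the last byte, and that segment still has to be counted.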
This might not be needed for is_kbuf, since iov_iter_advance() already
updates nr_segs in that case.
How about something like this instead?
- if (offset < bvec->bv_len) {
- iter->count -= offset;
- iter->iov_offset = offset;
- } else if (imu->is_kbuf) {
+ if (!imu->is_kbuf) {
+ size_t vec_len;
+
+ if (offset < bvec->bv_len) {
+ iter->count -= offset;
+ iter->iov_offset = offset;
+ } else {
+ unsigned long seg_skip;
+
+ /* skip first vec */
+ offset -= bvec->bv_len;
+ seg_skip = 1 + (offset >> folio_shift);
+
+ iter->bvec += seg_skip;
+ iter->count -= bvec->bv_len + offset;
+ iter->iov_offset = offset & ((1UL << folio_shift) - 1);
+ }
+ vec_len = ALIGN(iter->bvec->bv_offset + iter->iov_offset +
+ iter->count, 1UL << folio_shift);
+ iter->nr_segs = vec_len >> folio_shift;
+ } else
iov_iter_advance(iter, offset);
- } else {
- unsigned long seg_skip;
-
- /* skip first vec */
- offset -= bvec->bv_len;
- seg_skip = 1 + (offset >> folio_shift);
-
- iter->bvec += seg_skip;
- iter->count -= bvec->bv_len + offset;
- iter->iov_offset = offset & ((1UL << folio_shift) - 1);
- }
}
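To spell out the arithmetic for the !is_kbuf case, where every segment after the first is assumed to be exactly one folio in size: round the byte span up to a folio boundary, then shift. This is a userspace sketch only; `ALIGN_UP()` and `uniform_segs_needed()` are hypothetical names, with `ALIGN_UP()` standing in for the kernel's `ALIGN()`, which takes the alignment in bytes rather than the shift.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's ALIGN(); `a` must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/*
 * Number of folio-sized segments touched by `count` bytes that begin
 * `iov_offset` bytes into a segment whose data starts at `bv_offset`
 * within its folio.
 */
static size_t uniform_segs_needed(size_t bv_offset, size_t iov_offset,
				  size_t count, unsigned int folio_shift)
{
	size_t folio_size;
	size_t span = bv_offset + iov_offset + count;

	assert(folio_shift < sizeof(size_t) * 8);
	folio_size = (size_t)1 << folio_shift;

	/* Round up to a whole number of folios, then convert to a count. */
	return ALIGN_UP(span, folio_size) >> folio_shift;
}
```

For example, with folio_shift = 12, a 4096-byte request at offset 0 needs one segment, while 4097 bytes, or 2 bytes starting at offset 4095, need two.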
Regards,
Nitesh
Thread overview: 16+ messages
[not found] <CGME20250416055250epcas5p25fa8223a1bfeea5583ad8ba88c881a05@epcas5p2.samsung.com>
2025-04-16 5:44 ` [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer Nitesh Shetty
2025-04-16 14:19 ` Jens Axboe
2025-04-16 14:43 ` Jens Axboe
2025-04-16 14:49 ` Jens Axboe
2025-04-16 15:03 ` Pavel Begunkov
2025-04-16 15:07 ` Jens Axboe
2025-04-16 18:25 ` Jens Axboe
2025-04-16 19:57 ` Nitesh Shetty [this message]
2025-04-16 20:01 ` Jens Axboe
2025-04-16 20:29 ` Pavel Begunkov
2025-04-16 20:30 ` Jens Axboe
2025-04-16 21:03 ` Pavel Begunkov
2025-04-16 22:23 ` Jens Axboe
2025-04-16 22:42 ` Jens Axboe
2025-04-17 9:12 ` Pavel Begunkov
2025-04-16 20:03 ` Keith Busch