From: Jens Axboe <axboe@kernel.dk>
To: Nitesh Shetty <nitheshshetty@gmail.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>,
Nitesh Shetty <nj.shetty@samsung.com>,
gost.dev@samsung.com, io-uring@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer
Date: Wed, 16 Apr 2025 14:01:47 -0600 [thread overview]
Message-ID: <c9838a68-7443-40d8-a1b7-492a12e6f9dc@kernel.dk> (raw)
In-Reply-To: <CAOSviJ3MNDOYJzJFjQDCjc04pGsktQ5vjQvDotqYoRwC2Wf=HQ@mail.gmail.com>
On 4/16/25 1:57 PM, Nitesh Shetty wrote:
> On Wed, Apr 16, 2025 at 11:55?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 4/16/25 9:07 AM, Jens Axboe wrote:
>>> On 4/16/25 9:03 AM, Pavel Begunkov wrote:
>>>> On 4/16/25 06:44, Nitesh Shetty wrote:
>>>>> Sending exact nr_segs, avoids bio split check and processing in
>>>>> block layer, which takes around 5%[1] of overall CPU utilization.
>>>>>
>>>>> In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2]
>>>>> and 5% less CPU utilization.
>>>>>
>>>>> [1]
>>>>> 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at
>>>>> 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw
>>>>> 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split
>>>>>
>>>>> [2]
>>>>> sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2
>>>>> -r4 /dev/nvme0n1 /dev/nvme1n1
>>>>>
>>>>> Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
>>>>> ---
>>>>> io_uring/rsrc.c | 3 +++
>>>>> 1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>>>>> index b36c8825550e..6fd3a4a85a9c 100644
>>>>> --- a/io_uring/rsrc.c
>>>>> +++ b/io_uring/rsrc.c
>>>>> @@ -1096,6 +1096,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>>>>> iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1);
>>>>> }
>>>>> }
>>>>> + iter->nr_segs = (iter->bvec->bv_offset + iter->iov_offset +
>>>>> + iter->count + ((1UL << imu->folio_shift) - 1)) /
>>>>> + (1UL << imu->folio_shift);
>>>>
>>>> That's not going to work with ->is_kbuf as the segments are not uniform in
>>>> size.
>>>
>>> Oops yes good point.
>>
>> How about something like this? Trims superflous end segments, if they
>> exist. The 'offset' section already trimmed the front parts. For
>> !is_kbuf that should be simple math, like in Nitesh's patch. For
>> is_kbuf, iterate them.
>>
>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>> index bef66e733a77..e482ea1e22a9 100644
>> --- a/io_uring/rsrc.c
>> +++ b/io_uring/rsrc.c
>> @@ -1036,6 +1036,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> struct io_mapped_ubuf *imu,
>> u64 buf_addr, size_t len)
>> {
>> + const struct bio_vec *bvec;
>> unsigned int folio_shift;
>> size_t offset;
>> int ret;
>> @@ -1052,9 +1053,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> * Might not be a start of buffer, set size appropriately
>> * and advance us to the beginning.
>> */
>> + bvec = imu->bvec;
>> offset = buf_addr - imu->ubuf;
>> folio_shift = imu->folio_shift;
>> - iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len);
>> + iov_iter_bvec(iter, ddir, bvec, imu->nr_bvecs, offset + len);
>>
>> if (offset) {
>> /*
>> @@ -1073,7 +1075,6 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> * since we can just skip the first segment, which may not
>> * be folio_size aligned.
>> */
>> - const struct bio_vec *bvec = imu->bvec;
>>
>> /*
>> * Kernel buffer bvecs, on the other hand, don't necessarily
>> @@ -1099,6 +1100,27 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
>> }
>> }
>>
>> + /*
>> + * Offset trimmed front segments too, if any, now trim the tail.
>> + * For is_kbuf we'll iterate them as they may be different sizes,
>> + * otherwise we can just do straight up math.
>> + */
>> + if (len + offset < imu->len) {
>> + bvec = iter->bvec;
>> + if (imu->is_kbuf) {
>> + while (len > bvec->bv_len) {
>> + len -= bvec->bv_len;
>> + bvec++;
>> + }
>> + iter->nr_segs = bvec - iter->bvec;
>> + } else {
>> + size_t vec_len;
>> +
>> + vec_len = bvec->bv_offset + iter->iov_offset +
>> + iter->count + ((1UL << folio_shift) - 1);
>> + iter->nr_segs = vec_len >> folio_shift;
>> + }
>> + }
>> return 0;
>> }
> This might not be needed for is_kbuf , as they already update nr_seg
> inside iov_iter_advance.
How so? If 'offset' is true, then yes it'd skip the front, but it
doesn't skip the end part. And if 'offset' is 0, then no advancing is
done in the first place - which does make sense, as it's just advancing
from the front.
> How about changing something like this ?
You can't hide this in the if (offset) section...
--
Jens Axboe
next prev parent reply other threads:[~2025-04-16 20:01 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20250416055250epcas5p25fa8223a1bfeea5583ad8ba88c881a05@epcas5p2.samsung.com>
2025-04-16 5:44 ` [PATCH] io_uring/rsrc: send exact nr_segs for fixed buffer Nitesh Shetty
2025-04-16 14:19 ` Jens Axboe
2025-04-16 14:43 ` Jens Axboe
2025-04-16 14:49 ` Jens Axboe
2025-04-16 15:03 ` Pavel Begunkov
2025-04-16 15:07 ` Jens Axboe
2025-04-16 18:25 ` Jens Axboe
2025-04-16 19:57 ` Nitesh Shetty
2025-04-16 20:01 ` Jens Axboe [this message]
2025-04-16 20:29 ` Pavel Begunkov
2025-04-16 20:30 ` Jens Axboe
2025-04-16 21:03 ` Pavel Begunkov
2025-04-16 22:23 ` Jens Axboe
2025-04-16 22:42 ` Jens Axboe
2025-04-17 9:12 ` Pavel Begunkov
2025-04-16 20:03 ` Keith Busch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c9838a68-7443-40d8-a1b7-492a12e6f9dc@kernel.dk \
--to=axboe@kernel.dk \
--cc=asml.silence@gmail.com \
--cc=gost.dev@samsung.com \
--cc=io-uring@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nitheshshetty@gmail.com \
--cc=nj.shetty@samsung.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox