* [PATCH 0/4] io_import_fixed cleanups and optimisation @ 2025-04-17 9:32 Pavel Begunkov 2025-04-17 9:32 ` [PATCH 1/4] io_uring/rsrc: don't skip offset calculation Pavel Begunkov ` (5 more replies) 0 siblings, 6 replies; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 9:32 UTC (permalink / raw) To: io-uring; +Cc: asml.silence, Nitesh Shetty io_import_fixed() cleanups topped with the nr segs optimisation patch from Nitesh. Doesn't have the kbuf part, assuming Jens will add it on top. Based on io_uring-6.15 Nitesh Shetty (1): io_uring/rsrc: send exact nr_segs for fixed buffer Pavel Begunkov (3): io_uring/rsrc: don't skip offset calculation io_uring/rsrc: separate kbuf offset adjustments io_uring/rsrc: refactor io_import_fixed io_uring/rsrc.c | 74 ++++++++++++++++++++----------------------------- 1 file changed, 30 insertions(+), 44 deletions(-) -- 2.48.1 ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/4] io_uring/rsrc: don't skip offset calculation 2025-04-17 9:32 [PATCH 0/4] io_import_fixed cleanups and optimisation Pavel Begunkov @ 2025-04-17 9:32 ` Pavel Begunkov 2025-04-17 9:32 ` [PATCH 2/4] io_uring/rsrc: separate kbuf offset adjustments Pavel Begunkov ` (4 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 9:32 UTC (permalink / raw) To: io-uring; +Cc: asml.silence, Nitesh Shetty Don't optimise for requests with offset=0. Large registered buffers are the preference and hence the user is likely to pass an offset, and the adjustments are not expensive and will be made even cheaper in following patches. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- io_uring/rsrc.c | 75 ++++++++++++++++++++++++------------------------- 1 file changed, 37 insertions(+), 38 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index b36c8825550e..4d62897d1c89 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1036,6 +1036,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, struct io_mapped_ubuf *imu, u64 buf_addr, size_t len) { + const struct bio_vec *bvec; size_t offset; int ret; @@ -1054,47 +1055,45 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, offset = buf_addr - imu->ubuf; iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len); - if (offset) { - /* - * Don't use iov_iter_advance() here, as it's really slow for - * using the latter parts of a big fixed buffer - it iterates - * over each segment manually. We can cheat a bit here for user - * registered nodes, because we know that: - * - * 1) it's a BVEC iter, we set it up - * 2) all bvecs are the same in size, except potentially the - * first and last bvec - * - * So just find our index, and adjust the iterator afterwards. - * If the offset is within the first bvec (or the whole first - * bvec, just use iov_iter_advance(). 
This makes it easier - * since we can just skip the first segment, which may not - * be folio_size aligned. - */ - const struct bio_vec *bvec = imu->bvec; + /* + * Don't use iov_iter_advance() here, as it's really slow for + * using the latter parts of a big fixed buffer - it iterates + * over each segment manually. We can cheat a bit here for user + * registered nodes, because we know that: + * + * 1) it's a BVEC iter, we set it up + * 2) all bvecs are the same in size, except potentially the + * first and last bvec + * + * So just find our index, and adjust the iterator afterwards. + * If the offset is within the first bvec (or the whole first + * bvec, just use iov_iter_advance(). This makes it easier + * since we can just skip the first segment, which may not + * be folio_size aligned. + */ + bvec = imu->bvec; - /* - * Kernel buffer bvecs, on the other hand, don't necessarily - * have the size property of user registered ones, so we have - * to use the slow iter advance. - */ - if (offset < bvec->bv_len) { - iter->count -= offset; - iter->iov_offset = offset; - } else if (imu->is_kbuf) { - iov_iter_advance(iter, offset); - } else { - unsigned long seg_skip; + /* + * Kernel buffer bvecs, on the other hand, don't necessarily + * have the size property of user registered ones, so we have + * to use the slow iter advance. 
+ */ + if (offset < bvec->bv_len) { + iter->count -= offset; + iter->iov_offset = offset; + } else if (imu->is_kbuf) { + iov_iter_advance(iter, offset); + } else { + unsigned long seg_skip; - /* skip first vec */ - offset -= bvec->bv_len; - seg_skip = 1 + (offset >> imu->folio_shift); + /* skip first vec */ + offset -= bvec->bv_len; + seg_skip = 1 + (offset >> imu->folio_shift); - iter->bvec += seg_skip; - iter->nr_segs -= seg_skip; - iter->count -= bvec->bv_len + offset; - iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1); - } + iter->bvec += seg_skip; + iter->nr_segs -= seg_skip; + iter->count -= bvec->bv_len + offset; + iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1); } return 0; -- 2.48.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/4] io_uring/rsrc: separate kbuf offset adjustments 2025-04-17 9:32 [PATCH 0/4] io_import_fixed cleanups and optimisation Pavel Begunkov 2025-04-17 9:32 ` [PATCH 1/4] io_uring/rsrc: don't skip offset calculation Pavel Begunkov @ 2025-04-17 9:32 ` Pavel Begunkov 2025-04-17 9:32 ` [PATCH 3/4] io_uring/rsrc: refactor io_import_fixed Pavel Begunkov ` (3 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 9:32 UTC (permalink / raw) To: io-uring; +Cc: asml.silence, Nitesh Shetty Kernel registered buffers are special because segments are not uniform in size, and we have a bunch of optimisations based on that uniformity for normal buffers. Handle kbuf separately, it'll be cleaner this way. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- io_uring/rsrc.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 4d62897d1c89..fddde8ffe81e 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1048,11 +1048,14 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, if (!(imu->dir & (1 << ddir))) return -EFAULT; - /* - * Might not be a start of buffer, set size appropriately - * and advance us to the beginning. - */ offset = buf_addr - imu->ubuf; + + if (imu->is_kbuf) { + iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len); + iov_iter_advance(iter, offset); + return 0; + } + iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len); /* @@ -1072,17 +1075,9 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, * be folio_size aligned. */ bvec = imu->bvec; - - /* - * Kernel buffer bvecs, on the other hand, don't necessarily - * have the size property of user registered ones, so we have - * to use the slow iter advance. 
- */ if (offset < bvec->bv_len) { iter->count -= offset; iter->iov_offset = offset; - } else if (imu->is_kbuf) { - iov_iter_advance(iter, offset); } else { unsigned long seg_skip; -- 2.48.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] io_uring/rsrc: refactor io_import_fixed 2025-04-17 9:32 [PATCH 0/4] io_import_fixed cleanups and optimisation Pavel Begunkov 2025-04-17 9:32 ` [PATCH 1/4] io_uring/rsrc: don't skip offset calculation Pavel Begunkov 2025-04-17 9:32 ` [PATCH 2/4] io_uring/rsrc: separate kbuf offset adjustments Pavel Begunkov @ 2025-04-17 9:32 ` Pavel Begunkov 2025-04-17 9:32 ` [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer Pavel Begunkov ` (2 subsequent siblings) 5 siblings, 0 replies; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 9:32 UTC (permalink / raw) To: io-uring; +Cc: asml.silence, Nitesh Shetty io_import_fixed is a mess. Even though we know the final len of the iterator, we still assign offset + len and do some magic after to correct for that. Do offset calculation first and finalise it with iov_iter_bvec at the end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- io_uring/rsrc.c | 24 +++++++----------------- 1 file changed, 7 insertions(+), 17 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index fddde8ffe81e..5cf854318b1d 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, u64 buf_addr, size_t len) { const struct bio_vec *bvec; + unsigned nr_segs; size_t offset; int ret; @@ -1056,8 +1057,6 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, return 0; } - iov_iter_bvec(iter, ddir, imu->bvec, imu->nr_bvecs, offset + len); - /* * Don't use iov_iter_advance() here, as it's really slow for * using the latter parts of a big fixed buffer - it iterates @@ -1067,30 +1066,21 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, * 1) it's a BVEC iter, we set it up * 2) all bvecs are the same in size, except potentially the * first and last bvec - * - * So just find our index, and adjust the iterator afterwards. - * If the offset is within the first bvec (or the whole first - * bvec, just use iov_iter_advance(). 
This makes it easier - * since we can just skip the first segment, which may not - * be folio_size aligned. */ bvec = imu->bvec; - if (offset < bvec->bv_len) { - iter->count -= offset; - iter->iov_offset = offset; - } else { + if (offset >= bvec->bv_len) { unsigned long seg_skip; /* skip first vec */ offset -= bvec->bv_len; seg_skip = 1 + (offset >> imu->folio_shift); - - iter->bvec += seg_skip; - iter->nr_segs -= seg_skip; - iter->count -= bvec->bv_len + offset; - iter->iov_offset = offset & ((1UL << imu->folio_shift) - 1); + bvec += seg_skip; + offset &= (1UL << imu->folio_shift) - 1; } + nr_segs = imu->nr_bvecs - (bvec - imu->bvec); + iov_iter_bvec(iter, ddir, bvec, nr_segs, len); + iter->iov_offset = offset; return 0; } -- 2.48.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 9:32 [PATCH 0/4] io_import_fixed cleanups and optimisation Pavel Begunkov ` (2 preceding siblings ...) 2025-04-17 9:32 ` [PATCH 3/4] io_uring/rsrc: refactor io_import_fixed Pavel Begunkov @ 2025-04-17 9:32 ` Pavel Begunkov 2025-04-17 9:34 ` Pavel Begunkov 2025-04-17 12:31 ` [PATCH 0/4] io_import_fixed cleanups and optimisation Jens Axboe [not found] ` <CGME20250417142334epcas5p36df55874d21c896115d92e505f9793fd@epcas5p3.samsung.com> 5 siblings, 1 reply; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 9:32 UTC (permalink / raw) To: io-uring; +Cc: asml.silence, Nitesh Shetty From: Nitesh Shetty <nj.shetty@samsung.com> Sending exact nr_segs, avoids bio split check and processing in block layer, which takes around 5%[1] of overall CPU utilization. In our setup, we see overall improvement of IOPS from 7.15M to 7.65M [2] and 5% less CPU utilization. [1] 3.52% io_uring [kernel.kallsyms] [k] bio_split_rw_at 1.42% io_uring [kernel.kallsyms] [k] bio_split_rw 0.62% io_uring [kernel.kallsyms] [k] bio_submit_split [2] sudo taskset -c 0,1 ./t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1 Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com> [Pavel: fixed for kbuf, rebased and reworked on top of cleanups] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> --- io_uring/rsrc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 5cf854318b1d..4099b8225670 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, u64 buf_addr, size_t len) { const struct bio_vec *bvec; + size_t folio_mask; unsigned nr_segs; size_t offset; int ret; @@ -1067,6 +1068,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, * 2) all bvecs are the same in size, except potentially the * first and last bvec */ + folio_mask = (1UL << imu->folio_shift) - 
1; bvec = imu->bvec; if (offset >= bvec->bv_len) { unsigned long seg_skip; @@ -1075,10 +1077,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, offset -= bvec->bv_len; seg_skip = 1 + (offset >> imu->folio_shift); bvec += seg_skip; - offset &= (1UL << imu->folio_shift) - 1; + offset &= folio_mask; } - nr_segs = imu->nr_bvecs - (bvec - imu->bvec); + nr_segs = (offset + len + folio_mask) >> imu->folio_shift; iov_iter_bvec(iter, ddir, bvec, nr_segs, len); iter->iov_offset = offset; return 0; -- 2.48.1 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 9:32 ` [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer Pavel Begunkov @ 2025-04-17 9:34 ` Pavel Begunkov [not found] ` <CGME20250417103133epcas5p32c1e004e7f8a5135c4c7e3662b087470@epcas5p3.samsung.com> 0 siblings, 1 reply; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 9:34 UTC (permalink / raw) To: io-uring, Nitesh Shetty On 4/17/25 10:32, Pavel Begunkov wrote: > From: Nitesh Shetty <nj.shetty@samsung.com> ... > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > index 5cf854318b1d..4099b8225670 100644 > --- a/io_uring/rsrc.c > +++ b/io_uring/rsrc.c > @@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, > u64 buf_addr, size_t len) > { > const struct bio_vec *bvec; > + size_t folio_mask; > unsigned nr_segs; > size_t offset; > int ret; > @@ -1067,6 +1068,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, > * 2) all bvecs are the same in size, except potentially the > * first and last bvec > */ > + folio_mask = (1UL << imu->folio_shift) - 1; > bvec = imu->bvec; > if (offset >= bvec->bv_len) { > unsigned long seg_skip; > @@ -1075,10 +1077,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, > offset -= bvec->bv_len; > seg_skip = 1 + (offset >> imu->folio_shift); > bvec += seg_skip; > - offset &= (1UL << imu->folio_shift) - 1; > + offset &= folio_mask; > } > > - nr_segs = imu->nr_bvecs - (bvec - imu->bvec); > + nr_segs = (offset + len + folio_mask) >> imu->folio_shift; Nitesh, let me know if you're happy with this version. > iov_iter_bvec(iter, ddir, bvec, nr_segs, len); > iter->iov_offset = offset; > return 0; -- Pavel Begunkov ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer [not found] ` <CGME20250417103133epcas5p32c1e004e7f8a5135c4c7e3662b087470@epcas5p3.samsung.com> @ 2025-04-17 10:23 ` Nitesh Shetty 2025-04-17 11:50 ` Nitesh Shetty 0 siblings, 1 reply; 15+ messages in thread From: Nitesh Shetty @ 2025-04-17 10:23 UTC (permalink / raw) To: Pavel Begunkov; +Cc: io-uring [-- Attachment #1: Type: text/plain, Size: 1408 bytes --] On 17/04/25 10:34AM, Pavel Begunkov wrote: >On 4/17/25 10:32, Pavel Begunkov wrote: >>From: Nitesh Shetty <nj.shetty@samsung.com> >... >>diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c >>index 5cf854318b1d..4099b8225670 100644 >>--- a/io_uring/rsrc.c >>+++ b/io_uring/rsrc.c >>@@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >> u64 buf_addr, size_t len) >> { >> const struct bio_vec *bvec; >>+ size_t folio_mask; >> unsigned nr_segs; >> size_t offset; >> int ret; >>@@ -1067,6 +1068,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >> * 2) all bvecs are the same in size, except potentially the >> * first and last bvec >> */ >>+ folio_mask = (1UL << imu->folio_shift) - 1; >> bvec = imu->bvec; >> if (offset >= bvec->bv_len) { >> unsigned long seg_skip; >>@@ -1075,10 +1077,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >> offset -= bvec->bv_len; >> seg_skip = 1 + (offset >> imu->folio_shift); >> bvec += seg_skip; >>- offset &= (1UL << imu->folio_shift) - 1; >>+ offset &= folio_mask; >> } >>- nr_segs = imu->nr_bvecs - (bvec - imu->bvec); >>+ nr_segs = (offset + len + folio_mask) >> imu->folio_shift; > >Nitesh, let me know if you're happy with this version. > This looks great to me, I tested this series and see the improvement in IOPS from 7.15 to 7.65M here. Thanks, Nitesh Shetty [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 10:23 ` Nitesh Shetty @ 2025-04-17 11:50 ` Nitesh Shetty 2025-04-17 12:56 ` Pavel Begunkov 0 siblings, 1 reply; 15+ messages in thread From: Nitesh Shetty @ 2025-04-17 11:50 UTC (permalink / raw) To: Pavel Begunkov; +Cc: io-uring [-- Attachment #1: Type: text/plain, Size: 2224 bytes --] On 17/04/25 03:53PM, Nitesh Shetty wrote: >On 17/04/25 10:34AM, Pavel Begunkov wrote: >>On 4/17/25 10:32, Pavel Begunkov wrote: >>>From: Nitesh Shetty <nj.shetty@samsung.com> >>... >>>diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c >>>index 5cf854318b1d..4099b8225670 100644 >>>--- a/io_uring/rsrc.c >>>+++ b/io_uring/rsrc.c >>>@@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>> u64 buf_addr, size_t len) >>> { >>> const struct bio_vec *bvec; >>>+ size_t folio_mask; >>> unsigned nr_segs; >>> size_t offset; >>> int ret; >>>@@ -1067,6 +1068,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>> * 2) all bvecs are the same in size, except potentially the >>> * first and last bvec >>> */ >>>+ folio_mask = (1UL << imu->folio_shift) - 1; >>> bvec = imu->bvec; >>> if (offset >= bvec->bv_len) { >>> unsigned long seg_skip; >>>@@ -1075,10 +1077,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>> offset -= bvec->bv_len; >>> seg_skip = 1 + (offset >> imu->folio_shift); >>> bvec += seg_skip; >>>- offset &= (1UL << imu->folio_shift) - 1; >>>+ offset &= folio_mask; >>> } >>>- nr_segs = imu->nr_bvecs - (bvec - imu->bvec); >>>+ nr_segs = (offset + len + folio_mask) >> imu->folio_shift; >> >>Nitesh, let me know if you're happy with this version. >> >This looks great to me, I tested this series and see the >improvement in IOPS from 7.15 to 7.65M here. > There is corner case where this might not work, This happens when there is a first bvec has non zero offset. 
Let's say bv_offset = 256, len = 512, iov_offset = 3584 (512*7, i.e. the 8th IO); here we expect the IO to have 2 segments with the present codebase, but this patch set produces 1 segment. So having a fix like this solves the issue, + nr_segs = (offset + len + bvec->bv_offset + folio_mask) >> imu->folio_shift; Note: I am investigating whether this is a valid case or not, because having a 512 byte IO with 256 byte alignment feels odd. So I have sent one patch for that as well[1]. If that patch[1] is upstreamed then the above case is taken care of, so we can use this series as it is. Thanks, Nitesh [1] https://lore.kernel.org/all/20250415181419.16305-1-nj.shetty@samsung.com/ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 11:50 ` Nitesh Shetty @ 2025-04-17 12:56 ` Pavel Begunkov 2025-04-17 13:41 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 12:56 UTC (permalink / raw) To: Nitesh Shetty; +Cc: io-uring On 4/17/25 12:50, Nitesh Shetty wrote: > On 17/04/25 03:53PM, Nitesh Shetty wrote: >> On 17/04/25 10:34AM, Pavel Begunkov wrote: >>> On 4/17/25 10:32, Pavel Begunkov wrote: >>>> From: Nitesh Shetty <nj.shetty@samsung.com> >>> ... >>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c >>>> index 5cf854318b1d..4099b8225670 100644 >>>> --- a/io_uring/rsrc.c >>>> +++ b/io_uring/rsrc.c >>>> @@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>>> u64 buf_addr, size_t len) >>>> { >>>> const struct bio_vec *bvec; >>>> + size_t folio_mask; >>>> unsigned nr_segs; >>>> size_t offset; >>>> int ret; >>>> @@ -1067,6 +1068,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>>> * 2) all bvecs are the same in size, except potentially the >>>> * first and last bvec >>>> */ >>>> + folio_mask = (1UL << imu->folio_shift) - 1; >>>> bvec = imu->bvec; >>>> if (offset >= bvec->bv_len) { >>>> unsigned long seg_skip; >>>> @@ -1075,10 +1077,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>>> offset -= bvec->bv_len; >>>> seg_skip = 1 + (offset >> imu->folio_shift); >>>> bvec += seg_skip; >>>> - offset &= (1UL << imu->folio_shift) - 1; >>>> + offset &= folio_mask; >>>> } >>>> - nr_segs = imu->nr_bvecs - (bvec - imu->bvec); >>>> + nr_segs = (offset + len + folio_mask) >> imu->folio_shift; >>> >>> Nitesh, let me know if you're happy with this version. >>> >> This looks great to me, I tested this series and see the >> improvement in IOPS from 7.15 to 7.65M here. >> > > There is corner case where this might not work, > This happens when there is a first bvec has non zero offset. 
> Let's say bv_offset = 256, len = 512, iov_offset = 3584 (512*7, 8th IO), > here we expect IO to have 2 segments with present codebase, but this > patch set produces 1 segment. > > So having a fix like this solves the issue, > + nr_segs = (offset + len + bvec->bv_offset + folio_mask) >> imu->folio_shift; Ah yes, looks like the right fix up. We can make it nicer, but that's for later. It'd also be great to have a test for it. > Note: > I am investigating whether this is a valid case or not, because having a > 512 byte IO with 256 byte alignment feel odd. So have sent one patch for Block might filter it out, but for example net/ doesn't care, fs as well. IIUC what you mean, either way we definitely should correct that. > that as well[1]. If that patch[1] is upstreamed then above case is taken > care of, so we can use this series as it is. > > Thanks, > Nitesh > > [1] > https://lore.kernel.org/all/20250415181419.16305-1-nj.shetty@samsung.com/ > > -- Pavel Begunkov ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 12:56 ` Pavel Begunkov @ 2025-04-17 13:41 ` Jens Axboe 2025-04-17 13:57 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2025-04-17 13:41 UTC (permalink / raw) To: Pavel Begunkov, Nitesh Shetty; +Cc: io-uring On 4/17/25 6:56 AM, Pavel Begunkov wrote: > On 4/17/25 12:50, Nitesh Shetty wrote: >> On 17/04/25 03:53PM, Nitesh Shetty wrote: >>> On 17/04/25 10:34AM, Pavel Begunkov wrote: >>>> On 4/17/25 10:32, Pavel Begunkov wrote: >>>>> From: Nitesh Shetty <nj.shetty@samsung.com> >>>> ... >>>>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c >>>>> index 5cf854318b1d..4099b8225670 100644 >>>>> --- a/io_uring/rsrc.c >>>>> +++ b/io_uring/rsrc.c >>>>> @@ -1037,6 +1037,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>>>> u64 buf_addr, size_t len) >>>>> { >>>>> const struct bio_vec *bvec; >>>>> + size_t folio_mask; >>>>> unsigned nr_segs; >>>>> size_t offset; >>>>> int ret; >>>>> @@ -1067,6 +1068,7 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>>>> * 2) all bvecs are the same in size, except potentially the >>>>> * first and last bvec >>>>> */ >>>>> + folio_mask = (1UL << imu->folio_shift) - 1; >>>>> bvec = imu->bvec; >>>>> if (offset >= bvec->bv_len) { >>>>> unsigned long seg_skip; >>>>> @@ -1075,10 +1077,10 @@ static int io_import_fixed(int ddir, struct iov_iter *iter, >>>>> offset -= bvec->bv_len; >>>>> seg_skip = 1 + (offset >> imu->folio_shift); >>>>> bvec += seg_skip; >>>>> - offset &= (1UL << imu->folio_shift) - 1; >>>>> + offset &= folio_mask; >>>>> } >>>>> - nr_segs = imu->nr_bvecs - (bvec - imu->bvec); >>>>> + nr_segs = (offset + len + folio_mask) >> imu->folio_shift; >>>> >>>> Nitesh, let me know if you're happy with this version. >>>> >>> This looks great to me, I tested this series and see the >>> improvement in IOPS from 7.15 to 7.65M here. 
>>> >> >> There is corner case where this might not work, >> This happens when there is a first bvec has non zero offset. >> Let's say bv_offset = 256, len = 512, iov_offset = 3584 (512*7, 8th IO), >> here we expect IO to have 2 segments with present codebase, but this >> patch set produces 1 segment. >> >> So having a fix like this solves the issue, >> + nr_segs = (offset + len + bvec->bv_offset + folio_mask) >> imu->folio_shift; > > Ah yes, looks like the right fix up. We can make it nicer, but > that's for later. It'd also be great to have a test for it. > > >> Note: >> I am investigating whether this is a valid case or not, because having a >> 512 byte IO with 256 byte alignment feel odd. So have sent one patch for > > Block might filter it out, but for example net/ doesn't care, > fs as well. IIUC what you mean, either way we definitely should > correct that. I just tested it, and yes it certainly blows up... Can also confirm that the corrected nr_segs calculation does the right thing, doesn't end up underestimating the segment count by 1 in that case. I'll turn the test case into something we can add to liburing, and fold in that change. -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 13:41 ` Jens Axboe @ 2025-04-17 13:57 ` Jens Axboe 2025-04-17 14:02 ` Pavel Begunkov 0 siblings, 1 reply; 15+ messages in thread From: Jens Axboe @ 2025-04-17 13:57 UTC (permalink / raw) To: Pavel Begunkov, Nitesh Shetty; +Cc: io-uring On 4/17/25 7:41 AM, Jens Axboe wrote: > I'll turn the test case into something we can add to liburing, and fold > in that change. Here's what I tested, fwiw, and it reliably blows up pre the fixup. I'll turn it into a normal test case, and then folks can add more invariants to this one if they wish. #include <stdio.h> #include <unistd.h> #include <fcntl.h> #include <stdlib.h> #include <string.h> #include <sys/time.h> #include <liburing.h> static struct iovec vec; static int read_it(struct io_uring *ring, int fd, int len, int off) { struct io_uring_sqe *sqe; struct io_uring_cqe *cqe; int ret; sqe = io_uring_get_sqe(ring); io_uring_prep_read_fixed(sqe, fd, vec.iov_base + off, len, 0, 0); sqe->user_data = 1; io_uring_submit(ring); ret = io_uring_wait_cqe(ring, &cqe); if (ret) { fprintf(stderr, "wait %d\n", ret); return 1; } if (cqe->res < 0) { fprintf(stderr, "cqe res %s\n", strerror(-cqe->res)); return 1; } if (cqe->res != len) { fprintf(stderr, "Bad read amount: %d\n", cqe->res); return 1; } io_uring_cqe_seen(ring, cqe); return 0; } static int test(struct io_uring *ring, int fd, int vec_off) { struct iovec v = vec; int ret; v.iov_base += vec_off; v.iov_len -= vec_off; ret = io_uring_register_buffers(ring, &v, 1); if (ret) { fprintf(stderr, "Vec register: %d\n", ret); return 1; } ret = read_it(ring, fd, 4096, vec_off); if (ret) { fprintf(stderr, "4096 0 failed\n"); return 1; } ret = read_it(ring, fd, 8192, 4096); if (ret) { fprintf(stderr, "8192 4096 failed\n"); return 1; } ret = read_it(ring, fd, 4096, 4096); if (ret) { fprintf(stderr, "4096 4096 failed\n"); return 1; } io_uring_unregister_buffers(ring); return 0; } int main(int argc, char *argv[]) { 
struct io_uring ring; int fd, ret; if (argc == 1) { printf("%s: <file>\n", argv[0]); return 1; } fd = open(argv[1], O_RDONLY | O_DIRECT); if (fd < 0) { perror("open"); return 1; } posix_memalign(&vec.iov_base, 4096, 512*1024); vec.iov_len = 512*1024; io_uring_queue_init(32, &ring, 0); ret = test(&ring, fd, 0); if (ret) { fprintf(stderr, "test 0 failed\n"); return 1; } ret = test(&ring, fd, 512); if (ret) { fprintf(stderr, "test 512 failed\n"); return 1; } close(fd); io_uring_queue_exit(&ring); return 0; } -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 13:57 ` Jens Axboe @ 2025-04-17 14:02 ` Pavel Begunkov 2025-04-17 14:05 ` Jens Axboe 0 siblings, 1 reply; 15+ messages in thread From: Pavel Begunkov @ 2025-04-17 14:02 UTC (permalink / raw) To: Jens Axboe, Nitesh Shetty; +Cc: io-uring On 4/17/25 14:57, Jens Axboe wrote: > On 4/17/25 7:41 AM, Jens Axboe wrote: >> I'll turn the test case into something we can add to liburing, and fold >> in that change. > > Here's what I tested, fwiw, and it reliably blows up pre the fixup. I'll > turn it into a normal test case, and then folks can add more invariants > to this one if they wish. Awesome, thanks! -- Pavel Begunkov ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer 2025-04-17 14:02 ` Pavel Begunkov @ 2025-04-17 14:05 ` Jens Axboe 0 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2025-04-17 14:05 UTC (permalink / raw) To: Pavel Begunkov, Nitesh Shetty; +Cc: io-uring On 4/17/25 8:02 AM, Pavel Begunkov wrote: > On 4/17/25 14:57, Jens Axboe wrote: >> On 4/17/25 7:41 AM, Jens Axboe wrote: >>> I'll turn the test case into something we can add to liburing, and fold >>> in that change. >> >> Here's what I tested, fwiw, and it reliably blows up pre the fixup. I'll >> turn it into a normal test case, and then folks can add more invariants >> to this one if they wish. > > Awesome, thanks! Done: https://git.kernel.dk/cgit/liburing/commit/?id=d700f83631a780182c3a3f1315e926d5547acedb -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] io_import_fixed cleanups and optimisation 2025-04-17 9:32 [PATCH 0/4] io_import_fixed cleanups and optimisation Pavel Begunkov ` (3 preceding siblings ...) 2025-04-17 9:32 ` [PATCH 4/4] io_uring/rsrc: send exact nr_segs for fixed buffer Pavel Begunkov @ 2025-04-17 12:31 ` Jens Axboe [not found] ` <CGME20250417142334epcas5p36df55874d21c896115d92e505f9793fd@epcas5p3.samsung.com> 5 siblings, 0 replies; 15+ messages in thread From: Jens Axboe @ 2025-04-17 12:31 UTC (permalink / raw) To: io-uring, Pavel Begunkov; +Cc: Nitesh Shetty On Thu, 17 Apr 2025 10:32:30 +0100, Pavel Begunkov wrote: > io_import_fixed() cleanups topped with the nr segs optimisation > patch from Nitesh. Doesn't have the kbuf part, assuming Jens will > add it on top. > > Based on io_uring-6.15 > > Nitesh Shetty (1): > io_uring/rsrc: send exact nr_segs for fixed buffer > > [...] Applied, thanks! [1/4] io_uring/rsrc: don't skip offset calculation commit: 1ac571288822253db32196c49f240739148417e3 [2/4] io_uring/rsrc: separate kbuf offset adjustments commit: 50169d07548441e3033b9bbaa06e573e7224f140 [3/4] io_uring/rsrc: refactor io_import_fixed commit: 59852ebad954c8a3ac8b746930c2ea60febe797a [4/4] io_uring/rsrc: send exact nr_segs for fixed buffer commit: c5a173d9864a597288c6acd091315cfc3b32ca1c Best regards, -- Jens Axboe ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 0/4] io_import_fixed cleanups and optimisation [not found] ` <CGME20250417142334epcas5p36df55874d21c896115d92e505f9793fd@epcas5p3.samsung.com> @ 2025-04-17 14:15 ` Nitesh Shetty 0 siblings, 0 replies; 15+ messages in thread From: Nitesh Shetty @ 2025-04-17 14:15 UTC (permalink / raw) To: Pavel Begunkov; +Cc: io-uring [-- Attachment #1: Type: text/plain, Size: 664 bytes --] On 17/04/25 10:32AM, Pavel Begunkov wrote: >io_import_fixed() cleanups topped with the nr segs optimisation >patch from Nitesh. Doesn't have the kbuf part, assuming Jens will >add it on top. > >Based on io_uring-6.15 > >Nitesh Shetty (1): > io_uring/rsrc: send exact nr_segs for fixed buffer > >Pavel Begunkov (3): > io_uring/rsrc: don't skip offset calculation > io_uring/rsrc: separate kbuf offset adjustments > io_uring/rsrc: refactor io_import_fixed > > io_uring/rsrc.c | 74 ++++++++++++++++++++----------------------------- > 1 file changed, 30 insertions(+), 44 deletions(-) > >-- >2.48.1 > Reviewed and Tested by: Nitesh Shetty <nj.shetty@samsung.com> [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread