On 18/12/25 05:45PM, Ming Lei wrote:
>On Thu, Dec 18, 2025 at 01:37:37AM -0800, Christoph Hellwig wrote:
>> On Thu, Dec 18, 2025 at 05:31:42PM +0800, Ming Lei wrote:
>> > ->bi_vcnt doesn't make sense for cloned bio, which is perfectly fine
>> > passed to bio_may_need_split().
>> >
>> > So fix bio_may_need_split() by not taking ->bi_vcnt directly, instead
>> > checking with help from bio size and bvec->len.
>> >
>> > Meantime retrieving the 1st bvec via __bvec_iter_bvec().
>>
>> That totally misses the point. The ->bi_vcnt is a fast and loose
>> check to see if we need the fairly expensive iterators to do the
>> real check.
>
>It is just __bvec_iter_bvec(), whatever it should be in cache sooner or
>later.
>

Functionality-wise the overall patch looks fine to me, but as Christoph
stated there is a slight performance (IOPS) penalty. Here are my
benchmarking numbers [1]; I suspect Jens' setup might show more
regression.

Regards,
Nitesh

[1]
===============================
a. two optane nvme device setup:

----------
base case:
----------
sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
    -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1

submitter=0, tid=206586, file=/dev/nvme0n1, nfiles=1, node=-1
submitter=1, tid=206587, file=/dev/nvme1n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=6.45M, BW=3.15GiB/s, IOS/call=32/31
IOPS=6.47M, BW=3.16GiB/s, IOS/call=32/32
IOPS=6.47M, BW=3.16GiB/s, IOS/call=32/32
Exiting on timeout
Maximum IOPS=6.47M

----------------
with this patch:
----------------
sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
    -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nvme0n1 /dev/nvme1n1

submitter=0, tid=6352, file=/dev/nvme0n1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
submitter=1, tid=6353, file=/dev/nvme1n1, nfiles=1, node=-1
IOPS=6.30M, BW=3.08GiB/s, IOS/call=32/31
IOPS=6.35M, BW=3.10GiB/s, IOS/call=32/31
IOPS=6.37M, BW=3.11GiB/s, IOS/call=32/32
Exiting on timeout
Maximum IOPS=6.37M

=============================
b. two null-blk device setup:

------------------
null device setup:
------------------
sudo modprobe null_blk queue_mode=2 gb=10 bs=512 nr_devices=2 irqmode=2 \
    completion_nsec=1000000 hw_queue_depth=256 memory_backed=0 discard=0 \
    use_per_node_hctx=1

----------
base case:
----------
sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
    -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nullb0 /dev/nullb1

submitter=0, tid=6743, file=/dev/nullb0, nfiles=1, node=-1
submitter=1, tid=6744, file=/dev/nullb1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=7.89M, BW=3.85GiB/s, IOS/call=32/31
IOPS=7.96M, BW=3.89GiB/s, IOS/call=32/32
IOPS=7.99M, BW=3.90GiB/s, IOS/call=32/32
Exiting on timeout
Maximum IOPS=7.99M

-------------------
with this patchset:
-------------------
sudo taskset -c 0,1 /home/nitesh/src/private/fio/t/io_uring -b512 \
    -d128 -c32 -s32 -p1 -F1 -B1 -n2 -r4 /dev/nullb0 /dev/nullb1

submitter=0, tid=35633, file=/dev/nullb0, nfiles=1, node=-1
submitter=1, tid=35634, file=/dev/nullb1, nfiles=1, node=-1
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
polled=1, fixedbufs=1, register_files=1, buffered=0, QD=128
Engine=io_uring, sq_ring=128, cq_ring=128
IOPS=7.79M, BW=3.80GiB/s, IOS/call=32/31
IOPS=7.86M, BW=3.84GiB/s, IOS/call=32/32
IOPS=7.89M, BW=3.85GiB/s, IOS/call=32/32
Exiting on timeout
Maximum IOPS=7.89M