On 9/4/22 11:52 PM, Kanchan Joshi wrote:
> On Sun, Sep 04, 2022 at 02:17:33PM -0600, Jens Axboe wrote:
>> On 9/4/22 11:01 AM, Kanchan Joshi wrote:
>>> On Sat, Sep 03, 2022 at 11:00:43AM -0600, Jens Axboe wrote:
>>>> On 9/2/22 3:25 PM, Jens Axboe wrote:
>>>>> On 9/2/22 1:32 PM, Jens Axboe wrote:
>>>>>> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>>>>>>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>>>>>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>>>>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Currently uring-cmd lacks the ability to leverage the pre-registered
>>>>>>>>>> buffers. This series adds the support in uring-cmd, and plumbs
>>>>>>>>>> nvme passthrough to work with it.
>>>>>>>>>>
>>>>>>>>>> Using registered buffers showed a peak-perf hike from 1.85M to 2.17M
>>>>>>>>>> IOPS in my setup.
>>>>>>>>>>
>>>>>>>>>> Without fixedbufs
>>>>>>>>>> *****************
>>>>>>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>>>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>>>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>>>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>>>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>>>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>>>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>>>>>>> ^CExiting on signal
>>>>>>>>>> Maximum IOPS=1.85M
>>>>>>>>>
>>>>>>>>> With the poll support queued up, I ran this one as well. tldr is:
>>>>>>>>>
>>>>>>>>> bdev (non pt)    122M IOPS
>>>>>>>>> irq driven       51-52M IOPS
>>>>>>>>> polled           71M IOPS
>>>>>>>>> polled+fixed     78M IOPS
>>>>
>>>> Followup on this, since t/io_uring didn't correctly detect NUMA nodes
>>>> for passthrough.
>>>>
>>>> With the current tree and the patchset I just sent for iopoll and the
>>>> caching fix that's in the block tree, here's the final score:
>>>>
>>>> polled+fixed passthrough    105M IOPS
>>>>
>>>> which is getting pretty close to the bdev polled fixed path as well.
>>>> I think that is starting to look pretty good!
>>>
>>> Great! In my setup (single disk/numa-node), the current kernel shows:
>>>
>>> Block MIOPS
>>> ***********
>>> command: t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -P1 -n1 /dev/nvme0n1
>>> plain:          1.52
>>> plain+fb:       1.77
>>> plain+poll:     2.23
>>> plain+fb+poll:  2.61
>>>
>>> Passthru MIOPS
>>> **************
>>> command: t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -P1 -u1 -n1 /dev/ng0n1
>>> plain:          1.78
>>> plain+fb:       2.08
>>> plain+poll:     2.21
>>> plain+fb+poll:  2.69
>>
>> Interesting, here's what I have:
>>
>> Block MIOPS
>> ===========
>> plain:          2.90
>> plain+fb:       3.0
>> plain+poll:     4.04
>> plain+fb+poll:  5.09
>>
>> Passthru MIOPS
>> =============
>> plain:          2.37
>> plain+fb:       2.84
>> plain+poll:     3.65
>> plain+fb+poll:  4.93
>>
>> This is a gen2 optane
>
> Same. Do you see the same 'FW Rev' as below?
>
> # nvme list
> Node          SN                   Model                 Namespace Usage                   Format        FW Rev
> ------------- -------------------- --------------------- --------- ----------------------- ------------- --------
> /dev/nvme0n1  PHAL11730018400AGN   INTEL SSDPF21Q400GB   1         400.09 GB / 400.09 GB   512 B + 0 B   L0310200
>
>> , it maxes out at right around 5.1M IOPS. Note that
>> I have disabled iostats and merges generally in my runs:
>>
>> echo 0 > /sys/block/nvme0n1/queue/iostats
>> echo 2 > /sys/block/nvme0n1/queue/nomerges
>>
>> which will impact block more than passthru obviously, particularly
>> the nomerges. iostats should have a similar impact on both of them (but
>> I haven't tested either of those without those disabled).
>
> A bit of improvement after disabling, but for all entries.
>
> block
> =====
> plain:          1.6
> plain+FB:       1.91
> plain+poll:     2.36
> plain+FB+poll:  2.85
>
> passthru
> ========
> plain:          1.9
> plain+FB:       2.2
> plain+poll:     2.4
> plain+FB+poll:  2.9
>
> Maybe there is something about my kernel config that prevents it from
> reaching the expected peak (i.e. 5.1M). Will check more.

Here's the config I use for this kind of testing, in case it's useful.

--
Jens Axboe
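[Editorial note: the plain / +fb / +poll / +fb+poll rows in the tables above
differ only in two t/io_uring flags: -B (pre-registered "fixed" buffers) and
-p (polled completions); the other flags are held constant. A minimal sketch
that generates the 2x2 passthrough command matrix as a dry run, reusing the
flag values and the /dev/ng0n1 char device from the runs quoted above
(adjust the device for your system):]

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the four t/io_uring passthrough
# invocations compared in the thread. -B toggles fixed/registered buffers,
# -p toggles polled IO; everything else matches the quoted command lines.
DEV=/dev/ng0n1   # NVMe generic (passthrough) node from the runs above

for p in 0 1; do          # -p0 = irq driven, -p1 = polled
    for fb in 0 1; do     # -B0 = plain, -B1 = fixedbufs
        echo "taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p$p -F1 -B$fb -O0 -n1 -u1 $DEV"
    done
done
```

[Note that actually running the -p1 variants also requires the nvme driver
to have poll queues configured (the nvme poll_queues module parameter),
which the thread assumes but does not show.]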