On Fri, Sep 02, 2022 at 03:25:33PM -0600, Jens Axboe wrote:
>On 9/2/22 1:32 PM, Jens Axboe wrote:
>> On 9/2/22 12:46 PM, Kanchan Joshi wrote:
>>> On Fri, Sep 02, 2022 at 10:32:16AM -0600, Jens Axboe wrote:
>>>> On 9/2/22 10:06 AM, Jens Axboe wrote:
>>>>> On 9/2/22 9:16 AM, Kanchan Joshi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Currently uring-cmd lacks the ability to leverage pre-registered
>>>>>> buffers. This series adds that support in uring-cmd, and plumbs
>>>>>> nvme passthrough to work with it.
>>>>>>
>>>>>> Using registered buffers showed a peak-perf hike from 1.85M to 2.17M IOPS
>>>>>> in my setup.
>>>>>>
>>>>>> Without fixedbufs
>>>>>> *****************
>>>>>> # taskset -c 0 t/io_uring -b512 -d128 -c32 -s32 -p0 -F1 -B0 -O0 -n1 -u1 /dev/ng0n1
>>>>>> submitter=0, tid=5256, file=/dev/ng0n1, node=-1
>>>>>> polled=0, fixedbufs=0/0, register_files=1, buffered=1, QD=128
>>>>>> Engine=io_uring, sq_ring=128, cq_ring=128
>>>>>> IOPS=1.85M, BW=904MiB/s, IOS/call=32/31
>>>>>> IOPS=1.85M, BW=903MiB/s, IOS/call=32/32
>>>>>> IOPS=1.85M, BW=902MiB/s, IOS/call=32/32
>>>>>> ^CExiting on signal
>>>>>> Maximum IOPS=1.85M
>>>>>
>>>>> With the poll support queued up, I ran this one as well. tldr is:
>>>>>
>>>>> bdev (non pt)    122M IOPS
>>>>> irq driven       51-52M IOPS
>>>>> polled           71M IOPS
>>>>> polled+fixed     78M IOPS
>>>
>>> Except the first one, the remaining three entries are for passthru?
>>> Somehow I didn't see that big of a gap. I will try to align my setup
>>> in the coming days.
>>
>> Right, sorry, it was badly labeled. The first one is bdev with polling,
>> registered buffers, etc. The others are all the passthrough mode. polled
>> goes to 74M with the caching fix, so it's about a 74M -> 82M bump using
>> registered buffers with passthrough and polling.
>>
>>>> polled+fixed     82M
>>>>
>>>> I suspect the remainder is due to the lack of batching on the request
>>>> freeing side, at least some of it. Haven't really looked deeper yet.
>>>>
>>>> One issue I saw - try to use passthrough polling without having any
>>>> poll queues defined and it'll stall, just spinning on completions. You
>>>> need to ensure that these are processed as well - look at how the
>>>> non-passthrough io_uring poll path handles it.
>>>
>>> I had tested this earlier, and it used to run fine. It does not now.
>>> I see that IOs are getting completed: the irq completion arrives in nvme
>>> and it triggers task-work based completion (by calling
>>> io_uring_cmd_complete_in_task). But the task-work never got called and
>>> therefore no completion happened.
>>>
>>> io_uring_cmd_complete_in_task -> io_req_task_work_add -> __io_req_task_work_add
>>>
>>> It seems the task work did not get added. Something about the newly added
>>> IORING_SETUP_DEFER_TASKRUN changes the scenario.
>>>
>>> static inline void __io_req_task_work_add(struct io_kiocb *req, bool allow_local)
>>> {
>>>         struct io_uring_task *tctx = req->task->io_uring;
>>>         struct io_ring_ctx *ctx = req->ctx;
>>>         struct llist_node *node;
>>>
>>>         if (allow_local && ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
>>>                 io_req_local_work_add(req);
>>>                 return;
>>>         }
>>>         ....
>>>
>>> To confirm, I commented that out in t/io_uring and it runs fine.
>>> Please see if that changes anything for you? I will try to find the
>>> actual fix tomorrow.
>>
>> Ah gotcha, yes that actually makes a lot of sense. I wonder if regular
>> polling is then also broken without poll queues if
>> IORING_SETUP_DEFER_TASKRUN is set.
>> It should be, I'll check into io_iopoll_check().
>
>A mix of fixes and just cleanups, here's what I got.

Thanks, this looks much better. Just something to discuss on the fix,
though; I will use the other thread for that.
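
For reference, here is a small self-contained userspace model of the branch
quoted above. This is illustration only, not the kernel code: every struct,
flag and helper name below is a made-up stand-in. It just shows why the spin
happens: with DEFER_TASKRUN set, the completion work is parked on a per-ring
"local" list rather than the task's normal task-work, so a poll loop that only
spins on device completions and never drains that local list never sees the
CQE.

/*
 * Userspace model of the dispatch above -- illustration only, not the
 * kernel implementation.  All names here are stand-ins.
 */
#include <stdbool.h>
#include <stdio.h>

#define SETUP_DEFER_TASKRUN (1u << 0)   /* stand-in for IORING_SETUP_DEFER_TASKRUN */

struct ring_ctx {
        unsigned int flags;
        int local_work;         /* models the per-ring deferred work list */
};

struct task_ctx {
        int task_work;          /* models the per-task work list */
};

struct request {
        struct ring_ctx *ctx;
        struct task_ctx *task;
};

/* Mirrors the branch shown in __io_req_task_work_add() above. */
static void req_task_work_add(struct request *req, bool allow_local)
{
        if (allow_local && (req->ctx->flags & SETUP_DEFER_TASKRUN)) {
                req->ctx->local_work++;  /* parked on the ring, not the task */
                return;
        }
        req->task->task_work++;          /* normal task-work path */
}

int main(void)
{
        struct ring_ctx ctx = { .flags = SETUP_DEFER_TASKRUN };
        struct task_ctx task = { 0 };
        struct request req = { .ctx = &ctx, .task = &task };

        req_task_work_add(&req, true);

        /*
         * A poll loop that only waits for task_work (or only spins on the
         * device) never sees this completion; it sits in ctx.local_work
         * until the ring's local work is explicitly run.
         */
        printf("task_work=%d local_work=%d\n", task.task_work, ctx.local_work);
        return 0;
}

Running it prints "task_work=0 local_work=1", i.e. the completion ends up on
the ring's local list, which matches the stall described above when nothing
in the poll path runs that local work.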