From: Luis Chamberlain <[email protected]>
To: Hannes Reinecke <[email protected]>
Cc: Kanchan Joshi <[email protected]>,
[email protected],
[email protected], [email protected],
[email protected], Doug Gilbert <[email protected]>,
[email protected], [email protected], [email protected],
[email protected], [email protected], [email protected]
Subject: Re: [LSF/MM/BPF Topic] Towards more useful nvme-passthrough
Date: Wed, 2 Mar 2022 16:45:37 -0800 [thread overview]
Message-ID: <20220303004537.yceop3zwrwzg3wni@garbanzo> (raw)
In-Reply-To: <[email protected]>
On Thu, Jun 24, 2021 at 11:24:27AM +0200, Hannes Reinecke wrote:
> On 6/9/21 12:50 PM, Kanchan Joshi wrote:
> > Background & objectives:
> > ------------------------
> >
> > The NVMe passthrough interface
> >
> > Good part: allows new device features to be usable (at least in raw
> > form) without having to build block-generic commands, in-kernel users,
> > emulations and file-generic user interfaces - all of which take time to
> > evolve.
> >
> > Bad part: the passthrough interface has remained tied to the synchronous
> > ioctl, which is a blocker for performance-centric usage scenarios.
> > User-space can take the pain of implementing async-over-sync on its own,
> > but that makes little sense in a world that already has io_uring.
> >
> > Passthrough is lean in the sense that it cuts through layers of
> > abstraction and reaches NVMe fast. One objective here is to build a
> > scalable passthrough that can be readily used to play with new/emerging
> > NVMe features. Another is to match or surpass existing raw/direct block
> > I/O performance with this new in-kernel path.
> >
> > Recent developments:
> > --------------------
> > - NVMe now has a per-namespace char interface that remains available/usable
> > even for unsupported features and for new command-sets [1].
> >
> > - Jens has proposed an async-ioctl-like facility, 'uring_cmd', in io_uring.
> > This introduces new possibilities (beyond storage); async passthrough is
> > one of them. The last posted version is V4 [2].
> >
> > - I have posted work on async nvme passthrough over the block device [3].
> > The posted work is at V4 (in sync with the infrastructure of [2]).
> >
> > Early performance numbers:
> > --------------------------
> > fio, randread, 4k bs, 1 job
> > Kiops, with varying QD:
> >
> > QD Sync-PT io_uring Async-PT
> > 1 10.8 10.6 10.6
> > 2 10.9 24.5 24
> > 4 10.6 45 46
> > 8 10.9 90 89
> > 16 11.0 169 170
> > 32 10.6 308 307
> > 64 10.8 503 506
> > 128 10.9 592 596
> >
> > Further steps/discussion points:
> > --------------------------------
> > 1. Async passthrough over the nvme char-dev:
> > It is in shape to receive feedback, but I am not sure whether the
> > community would like to look at it before settling on the uring-cmd infra.
> >
> > 2. Once the above gets in shape, bring other perf-centric features of
> > io_uring to this path -
> > A. SQPoll and register-file: already functional.
> > B. Passthrough polling: this can be enabled for block and looks feasible
> > for the char interface as well. Keith recently posted enabling polling
> > for user passthrough [4].
> > C. Pre-mapped buffers: the early thought is to let the buffers be
> > registered via io_uring, and add a new passthrough ioctl/uring_cmd in
> > the driver which does everything that passthrough does except
> > pinning/unpinning the pages.
> >
> > 3. Are there more things in the "io_uring->nvme->[block-layer]->nvme"
> > path which can be optimized?
> >
> > Ideally I'd like to cover a good deal of ground before Dec, but there
> > seem to be plenty of possibilities on this path. Discussion would help
> > decide how best to move forward, and cement the ideas.
> >
> > [1] https://lore.kernel.org/linux-nvme/[email protected]/
> > [2] https://lore.kernel.org/linux-nvme/[email protected]/
> > [3] https://lore.kernel.org/linux-nvme/[email protected]/
> > [4] https://lore.kernel.org/linux-block/[email protected]/#t
> >
> I do like the idea.
>
> What I would like to see is the uring_cmd infrastructure made generally
> available, so that we can port the SCSI sg asynchronous interface over
> to it.
What prevents you from doing this already? I think we just need more
patch reviews for the generic io_uring cmd patches, no?
Luis