From: Kanchan Joshi <[email protected]>
To: [email protected],
[email protected], [email protected],
[email protected]
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected],
Kanchan Joshi <[email protected]>
Subject: [LSF/MM/BPF Topic] Towards more useful nvme-passthrough
Date: Wed, 9 Jun 2021 16:20:50 +0530
Message-ID: <[email protected]>
In-Reply-To: CGME20210609105347epcas5p42ab916655fca311157a38d54f79f95e7@epcas5p4.samsung.com
Background & objectives:
------------------------
The NVMe passthrough interface:
Good part: it allows new device features to be used (at least in raw
form) without having to build block-generic cmds, in-kernel users,
emulations and file-generic user interfaces, all of which take time to
evolve.
Bad part: the passthrough interface has remained tied to synchronous ioctl,
which is a blocker for performance-centric usage scenarios. User-space
can take the pain of implementing async-over-sync on its own, but that does
not make much sense in a world that already has io_uring.
Passthrough is lean in the sense that it cuts through layers of abstraction
and reaches NVMe fast. One objective here is to build a scalable
passthrough that can be readily used to play with new/emerging
NVMe features. Another is to match or surpass existing raw/direct block
I/O performance with this new in-kernel path.
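For reference, the synchronous path described above looks roughly like the
sketch below: one NVMe Read issued through the existing NVME_IOCTL_IO_CMD
ioctl. This is an illustration, not code from the posted patches. The helper
names build_read_cmd() and sync_pt_read() are hypothetical, and the device
path, namespace id and block count are left to the caller (8 blocks would
match a 4k read only on a namespace formatted with 512-byte LBAs).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>

#define NVME_CMD_READ 0x02	/* NVM command set: Read opcode */

/*
 * Fill a passthrough command that reads nlb blocks starting at slba.
 * Hypothetical helper; the caller must pass nlb >= 1.
 */
void build_read_cmd(struct nvme_passthru_cmd *cmd, uint32_t nsid,
		    uint64_t slba, uint16_t nlb, void *buf, uint32_t len)
{
	assert(nlb >= 1);
	memset(cmd, 0, sizeof(*cmd));
	cmd->opcode = NVME_CMD_READ;
	cmd->nsid = nsid;
	cmd->addr = (uint64_t)(uintptr_t)buf;	/* user data buffer */
	cmd->data_len = len;
	cmd->cdw10 = (uint32_t)(slba & 0xffffffffu);	/* SLBA, low 32 bits */
	cmd->cdw11 = (uint32_t)(slba >> 32);		/* SLBA, high 32 bits */
	cmd->cdw12 = (uint32_t)(uint16_t)(nlb - 1);	/* NLB is zero-based */
}

/*
 * Issue one blocking read. Returns 0 on success, a positive NVMe status
 * code if the device failed the command, or -1 on a system-call error.
 */
int sync_pt_read(const char *dev, uint32_t nsid, uint64_t slba,
		 uint16_t nlb, void *buf, uint32_t len)
{
	struct nvme_passthru_cmd cmd;
	int fd, ret;

	fd = open(dev, O_RDONLY);
	if (fd < 0)
		return -1;
	build_read_cmd(&cmd, nsid, slba, nlb, buf, len);
	/* The caller is blocked here until the device completes the command. */
	ret = ioctl(fd, NVME_IOCTL_IO_CMD, &cmd);
	close(fd);
	return ret;
}
```

Because the ioctl returns only after completion, a single thread never has
more than one command in flight, whatever queue depth the application asks
for.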
Recent developments:
--------------------
- NVMe now has a per-namespace char interface that remains available/usable
even for unsupported features and for new command-sets [1].
- Jens has proposed an async-ioctl-like facility, 'uring_cmd', in io_uring. This
  introduces new possibilities (beyond storage); async passthrough is one of
  those. The last posted version is V4 [2].
- I have posted work on async nvme passthrough over block-dev [3]. The posted
  work is at V4 (in sync with the infra of [2]).
Early performance numbers:
--------------------------
fio, randread, 4k bs, 1 job
KIOPS, with varying QD:

QD     Sync-PT    io_uring    Async-PT
1      10.8       10.6        10.6
2      10.9       24.5        24
4      10.6       45          46
8      10.9       90          89
16     11.0       169         170
32     10.6       308         307
64     10.8       503         506
128    10.9       592         596
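One way to read these numbers is through Little's law: if the queue stays
full, the mean per-I/O latency is roughly QD / IOPS. The sketch below applies
that to the io_uring and Async-PT columns. It is an interpretation of the
table, not a measurement from the original runs, and implied_latency_us() and
print_implied_latency() are hypothetical helper names.

```c
#include <stdio.h>

/*
 * Mean per-I/O latency in microseconds implied by Little's law,
 * assuming the queue is kept full: latency = QD / IOPS.
 */
double implied_latency_us(unsigned int qd, double kiops)
{
	return (double)qd / (kiops * 1000.0) * 1e6;
}

struct result_row {
	unsigned int qd;
	double io_uring_kiops;
	double async_pt_kiops;
};

/* io_uring and Async-PT columns copied from the table above */
static const struct result_row results[] = {
	{   1,  10.6,  10.6 },
	{   2,  24.5,  24.0 },
	{   4,  45.0,  46.0 },
	{   8,  90.0,  89.0 },
	{  16, 169.0, 170.0 },
	{  32, 308.0, 307.0 },
	{  64, 503.0, 506.0 },
	{ 128, 592.0, 596.0 },
};

void print_implied_latency(void)
{
	size_t i;

	printf("%4s %14s %14s\n", "QD", "io_uring (us)", "Async-PT (us)");
	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%4u %14.1f %14.1f\n", results[i].qd,
		       implied_latency_us(results[i].qd, results[i].io_uring_kiops),
		       implied_latency_us(results[i].qd, results[i].async_pt_kiops));
}
```

Under that assumption the implied latency is about 94 us at QD 1 and about
215 us at QD 128 for Async-PT, with io_uring tracking it closely at every
depth. Sync-PT stays near 10.8 KIOPS because the blocking ioctl keeps a
single job at an effective depth of one.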
Further steps/discussion points:
--------------------------------
1. Async passthrough over nvme char-dev
   It is in a shape to receive feedback, but I am not sure whether the
   community would like to take a look at it before settling on the
   uring-cmd infra.
2. Once the above gets in shape, bring other perf-centric features of
   io_uring to this path:
   A. SQPoll and register-file: already functional.
   B. Passthrough polling: this can be enabled for block and looks feasible
      for the char interface as well. Keith recently posted work enabling
      polling for user passthrough [4].
   C. Pre-mapped buffers: the early thought is to let the buffers be
      registered by io_uring, and to add a new passthrough ioctl/uring_cmd
      in the driver which does everything that passthrough does except
      pinning/unpinning the pages.
3. Are there more things in the "io_uring->nvme->[block-layer]->nvme" path
   which can be optimized?
Ideally I'd like to cover a good deal of ground before December, but there
seem to be plenty of possibilities on this path. Discussion would help in
deciding how best to move forward and in cementing the ideas.
[1] https://lore.kernel.org/linux-nvme/[email protected]/
[2] https://lore.kernel.org/linux-nvme/[email protected]/
[3] https://lore.kernel.org/linux-nvme/[email protected]/
[4] https://lore.kernel.org/linux-block/[email protected]/#t
--
2.25.1