* [PATCH v16 00/11] Block write streams with nvme fdp
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Kanchan Joshi
The series enables FDP support for block IO.
The patches:
- Add ki_write_stream in the kiocb (patch 1) and bi_write_stream in the bio
  (patch 2).
- Introduce two new queue limits, max_write_streams and
  write_stream_granularity (patches 3, 4).
- Pass the write stream (either from the kiocb or from inode write hints)
  for block device IO (patch 5).
- Add a per-I/O write stream interface in io_uring (patch 6; a userspace
  sketch follows this list).
- Register nvme fdp via the write stream queue limits (patches 10, 11).
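As a quick illustration of the userspace side, here is a minimal sketch of a
direct write that carries a per-I/O stream. It is not part of the series and
assumes a liburing build whose struct io_uring_sqe already exposes the
write_stream field from patch 6, plus a hypothetical FDP-capable /dev/nvme0n1:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <liburing.h>

int main(void)
{
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        struct io_uring_cqe *cqe;
        void *buf;
        int fd = open("/dev/nvme0n1", O_WRONLY | O_DIRECT);

        if (fd < 0 || io_uring_queue_init(8, &ring, 0))
                return 1;
        if (posix_memalign(&buf, 4096, 4096))
                return 1;

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, 4096, 0);
        /* valid streams are 1..max_write_streams; 0 means no stream */
        sqe->write_stream = 1;

        io_uring_submit(&ring);
        io_uring_wait_cqe(&ring, &cqe);
        /* cqe->res: bytes written, or -EINVAL for an out-of-range stream */
        io_uring_cqe_seen(&ring, cqe);
        io_uring_queue_exit(&ring);
        return 0;
}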
Changes since v15:
- Rebased to the latest for-next (Jens)
Previous discussions:
v15: https://lore.kernel.org/linux-nvme/20250203184129.1829324-1-kbusch@meta.com/T/#u
v14: https://lore.kernel.org/linux-nvme/20241211183514.64070-1-kbusch@meta.com/T/#u
v13: https://lore.kernel.org/linux-nvme/20241210194722.1905732-1-kbusch@meta.com/T/#u
v12: https://lore.kernel.org/linux-nvme/20241206221801.790690-1-kbusch@meta.com/T/#u
v11: https://lore.kernel.org/linux-nvme/20241206015308.3342386-1-kbusch@meta.com/T/#u
v10: https://lore.kernel.org/linux-nvme/20241029151922.459139-1-kbusch@meta.com/T/#u
v9: https://lore.kernel.org/linux-nvme/20241025213645.3464331-1-kbusch@meta.com/T/#u
v8: https://lore.kernel.org/linux-nvme/20241017160937.2283225-1-kbusch@meta.com/T/#u
v7: https://lore.kernel.org/linux-nvme/20240930181305.17286-1-joshi.k@samsung.com/T/#u
v6: https://lore.kernel.org/linux-nvme/20240924092457.7846-1-joshi.k@samsung.com/T/#u
v5: https://lore.kernel.org/linux-nvme/20240910150200.6589-1-joshi.k@samsung.com/T/#u
v4: https://lore.kernel.org/linux-nvme/20240826170606.255718-1-joshi.k@samsung.com/T/#u
v3: https://lore.kernel.org/linux-nvme/20240702102619.164170-1-joshi.k@samsung.com/T/#u
v2: https://lore.kernel.org/linux-nvme/20240528150233.55562-1-joshi.k@samsung.com/T/#u
v1: https://lore.kernel.org/linux-nvme/20240510134015.29717-1-joshi.k@samsung.com/T/#u
Christoph Hellwig (7):
fs: add a write stream field to the kiocb
block: add a bi_write_stream field
block: introduce a write_stream_granularity queue limit
block: expose write streams for block device nodes
nvme: add a nvme_get_log_lsi helper
nvme: pass a void pointer to nvme_get/set_features for the result
nvme: add FDP definitions
Keith Busch (4):
block: introduce max_write_streams queue limit
io_uring: enable per-io write streams
nvme: register fdp parameters with the block layer
nvme: use fdp streams if write stream is provided
Documentation/ABI/stable/sysfs-block | 15 +++
block/bio.c | 2 +
block/blk-crypto-fallback.c | 1 +
block/blk-merge.c | 4 +
block/blk-sysfs.c | 6 +
block/fops.c | 23 ++++
drivers/nvme/host/core.c | 191 ++++++++++++++++++++++++++-
drivers/nvme/host/nvme.h | 7 +-
include/linux/blk_types.h | 1 +
include/linux/blkdev.h | 10 ++
include/linux/fs.h | 1 +
include/linux/nvme.h | 77 +++++++++++
include/uapi/linux/io_uring.h | 4 +
io_uring/io_uring.c | 2 +
io_uring/rw.c | 1 +
15 files changed, 339 insertions(+), 6 deletions(-)
base-commit: e6d9dcfdc0c53b87cfe86163bfbd14f6457ef2b7
--
2.25.1
* [PATCH v16 01/11] fs: add a write stream field to the kiocb
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
Prepare for io_uring passthrough of write streams. The write stream
field in the kiocb structure fits into an existing 2-byte hole, so the
size of the struct is not changed.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
include/linux/fs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 016b0fe1536e..d5988867fe31 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -408,6 +408,7 @@ struct kiocb {
void *private;
int ki_flags;
u16 ki_ioprio; /* See linux/ioprio.h */
+ u8 ki_write_stream;
union {
/*
* Only used for async buffered reads, where it denotes the
--
2.25.1
* [PATCH v16 02/11] block: add a bi_write_stream field
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
Add the ability to pass a write stream for placement control in the bio.
The new field fits in an existing hole, so it does not change the size of
the struct.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
block/bio.c | 2 ++
block/blk-crypto-fallback.c | 1 +
block/blk-merge.c | 4 ++++
include/linux/blk_types.h | 1 +
4 files changed, 8 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index 4e6c85a33d74..1e42aefc7377 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -251,6 +251,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
bio->bi_flags = 0;
bio->bi_ioprio = 0;
bio->bi_write_hint = 0;
+ bio->bi_write_stream = 0;
bio->bi_status = 0;
bio->bi_iter.bi_sector = 0;
bio->bi_iter.bi_size = 0;
@@ -827,6 +828,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
bio_set_flag(bio, BIO_CLONED);
bio->bi_ioprio = bio_src->bi_ioprio;
bio->bi_write_hint = bio_src->bi_write_hint;
+ bio->bi_write_stream = bio_src->bi_write_stream;
bio->bi_iter = bio_src->bi_iter;
if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index f154be0b575a..005c9157ffb3 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -173,6 +173,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
bio_set_flag(bio, BIO_REMAPPED);
bio->bi_ioprio = bio_src->bi_ioprio;
bio->bi_write_hint = bio_src->bi_write_hint;
+ bio->bi_write_stream = bio_src->bi_write_stream;
bio->bi_iter.bi_sector = bio_src->bi_iter.bi_sector;
bio->bi_iter.bi_size = bio_src->bi_iter.bi_size;
diff --git a/block/blk-merge.c b/block/blk-merge.c
index fdd4efb54c6c..782308b73b53 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -832,6 +832,8 @@ static struct request *attempt_merge(struct request_queue *q,
if (req->bio->bi_write_hint != next->bio->bi_write_hint)
return NULL;
+ if (req->bio->bi_write_stream != next->bio->bi_write_stream)
+ return NULL;
if (req->bio->bi_ioprio != next->bio->bi_ioprio)
return NULL;
if (!blk_atomic_write_mergeable_rqs(req, next))
@@ -953,6 +955,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
return false;
if (rq->bio->bi_write_hint != bio->bi_write_hint)
return false;
+ if (rq->bio->bi_write_stream != bio->bi_write_stream)
+ return false;
if (rq->bio->bi_ioprio != bio->bi_ioprio)
return false;
if (blk_atomic_write_mergeable_rq_bio(rq, bio) == false)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 5a46067e85b1..f38425338c3f 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -220,6 +220,7 @@ struct bio {
unsigned short bi_flags; /* BIO_* below */
unsigned short bi_ioprio;
enum rw_hint bi_write_hint;
+ u8 bi_write_stream;
blk_status_t bi_status;
atomic_t __bi_remaining;
--
2.25.1
* [PATCH v16 03/11] block: introduce max_write_streams queue limit
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Keith Busch <kbusch@kernel.org>
Drivers whose hardware supports write streams need a way to export how
many are available, so that applications can query this generically.
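For example, an application could query the limit generically with a sketch
like the one below (illustrative only; "nvme0n1" is a placeholder name):

#include <stdio.h>

static unsigned int max_write_streams(const char *disk)
{
        char path[128];
        unsigned int val = 0;
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/max_write_streams", disk);
        f = fopen(path, "r");
        if (!f)
                return 0;               /* attribute absent */
        if (fscanf(f, "%u", &val) != 1)
                val = 0;                /* 0: write streams not supported */
        fclose(f);
        return val;
}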
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
[hch: renamed hints to streams, removed stacking]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
Documentation/ABI/stable/sysfs-block | 7 +++++++
block/blk-sysfs.c | 3 +++
include/linux/blkdev.h | 9 +++++++++
3 files changed, 19 insertions(+)
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 11545c9e2e93..8bbe1eca28df 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -547,6 +547,13 @@ Description:
[RO] Maximum size in bytes of a single element in a DMA
scatter/gather list.
+What: /sys/block/<disk>/queue/max_write_streams
+Date: November 2024
+Contact: linux-block@vger.kernel.org
+Description:
+ [RO] Maximum number of write streams supported, 0 if not
+ supported. If supported, valid values are 1 through
+ max_write_streams, inclusive.
What: /sys/block/<disk>/queue/max_segments
Date: March 2010
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 1f9b45b0b9ee..986cdba4f550 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -134,6 +134,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_segments)
QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_streams)
QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -488,6 +489,7 @@ QUEUE_LIM_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
QUEUE_LIM_RO_ENTRY(queue_max_segments, "max_segments");
QUEUE_LIM_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
QUEUE_LIM_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_LIM_RO_ENTRY(queue_max_write_streams, "max_write_streams");
QUEUE_RW_ENTRY(elv_iosched, "scheduler");
QUEUE_LIM_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -642,6 +644,7 @@ static struct attribute *queue_attrs[] = {
&queue_max_discard_segments_entry.attr,
&queue_max_integrity_segments_entry.attr,
&queue_max_segment_size_entry.attr,
+ &queue_max_write_streams_entry.attr,
&queue_hw_sector_size_entry.attr,
&queue_logical_block_size_entry.attr,
&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a9bd945e87b9..3747fbbd65fa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -402,6 +402,8 @@ struct queue_limits {
unsigned short max_integrity_segments;
unsigned short max_discard_segments;
+ unsigned short max_write_streams;
+
unsigned int max_open_zones;
unsigned int max_active_zones;
@@ -1285,6 +1287,13 @@ static inline unsigned int bdev_max_segments(struct block_device *bdev)
return queue_max_segments(bdev_get_queue(bdev));
}
+static inline unsigned short bdev_max_write_streams(struct block_device *bdev)
+{
+ if (bdev_is_partition(bdev))
+ return 0;
+ return bdev_limits(bdev)->max_write_streams;
+}
+
static inline unsigned queue_logical_block_size(const struct request_queue *q)
{
return q->limits.logical_block_size;
--
2.25.1
* [PATCH v16 04/11] block: introduce a write_stream_granularity queue limit
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
Export the granularity at which write streams should be discarded, as it
is essential for making good use of them.
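As a hedged illustration of how an application might consume the limit, the
sketch below rounds the size of data that is written and later discarded
together up to the advertised granularity; the helper name is made up:

#include <stdint.h>

/* Invalidate whole reclaim units at once to avoid device-side write
 * amplification. */
static uint64_t stream_extent_bytes(uint64_t len, uint64_t granularity)
{
        if (!granularity)
                return len;     /* no granularity advertised */
        return (len + granularity - 1) / granularity * granularity;
}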
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
Documentation/ABI/stable/sysfs-block | 8 ++++++++
block/blk-sysfs.c | 3 +++
include/linux/blkdev.h | 1 +
3 files changed, 12 insertions(+)
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 8bbe1eca28df..4ba771b56b3b 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -555,6 +555,14 @@ Description:
supported. If supported, valid values are 1 through
max_write_streams, inclusive.
+What: /sys/block/<disk>/queue/write_stream_granularity
+Date: November 2024
+Contact: linux-block@vger.kernel.org
+Description:
+ [RO] Granularity of a write stream in bytes. The granularity
+ of a write stream is the size that should be discarded or
+ overwritten together to avoid write amplification in the device.
+
What: /sys/block/<disk>/queue/max_segments
Date: March 2010
Contact: linux-block@vger.kernel.org
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 986cdba4f550..ed00dedfb9ce 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -135,6 +135,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
QUEUE_SYSFS_LIMIT_SHOW(max_write_streams)
+QUEUE_SYSFS_LIMIT_SHOW(write_stream_granularity)
QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -490,6 +491,7 @@ QUEUE_LIM_RO_ENTRY(queue_max_segments, "max_segments");
QUEUE_LIM_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
QUEUE_LIM_RO_ENTRY(queue_max_segment_size, "max_segment_size");
QUEUE_LIM_RO_ENTRY(queue_max_write_streams, "max_write_streams");
+QUEUE_LIM_RO_ENTRY(queue_write_stream_granularity, "write_stream_granularity");
QUEUE_RW_ENTRY(elv_iosched, "scheduler");
QUEUE_LIM_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -645,6 +647,7 @@ static struct attribute *queue_attrs[] = {
&queue_max_integrity_segments_entry.attr,
&queue_max_segment_size_entry.attr,
&queue_max_write_streams_entry.attr,
+ &queue_write_stream_granularity_entry.attr,
&queue_hw_sector_size_entry.attr,
&queue_logical_block_size_entry.attr,
&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3747fbbd65fa..886009b6c3e5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -403,6 +403,7 @@ struct queue_limits {
unsigned short max_discard_segments;
unsigned short max_write_streams;
+ unsigned int write_stream_granularity;
unsigned int max_open_zones;
unsigned int max_active_zones;
--
2.25.1
* [PATCH v16 05/11] block: expose write streams for block device nodes
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
Use the per-kiocb write stream if provided, or map temperature hints to
write streams (which is a bit questionable, but this shows how it is
done).
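Concretely, an application that never sets a per-I/O stream can still steer
its block device writes through the existing temperature-hint fcntl; a
sketch, with fallback defines in case the libc headers predate the
constants:

#include <fcntl.h>
#include <stdint.h>

#ifndef F_SET_RW_HINT
#define F_SET_RW_HINT           1036    /* F_LINUX_SPECIFIC_BASE + 12 */
#endif
#ifndef RWH_WRITE_LIFE_MEDIUM
#define RWH_WRITE_LIFE_MEDIUM   3
#endif

static int set_write_hint(int bdev_fd)
{
        uint64_t hint = RWH_WRITE_LIFE_MEDIUM;

        /* with this patch, direct writes on the block device use the
         * hint value as the write stream (if <= max_write_streams) */
        return fcntl(bdev_fd, F_SET_RW_HINT, &hint);
}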
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[kbusch: removed statx reporting]
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
block/fops.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/block/fops.c b/block/fops.c
index b6d7cdd96b54..1309861d4c2c 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,6 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
}
bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+ bio.bi_write_stream = iocb->ki_write_stream;
bio.bi_ioprio = iocb->ki_ioprio;
if (iocb->ki_flags & IOCB_ATOMIC)
bio.bi_opf |= REQ_ATOMIC;
@@ -206,6 +207,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
for (;;) {
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+ bio->bi_write_stream = iocb->ki_write_stream;
bio->bi_private = dio;
bio->bi_end_io = blkdev_bio_end_io;
bio->bi_ioprio = iocb->ki_ioprio;
@@ -333,6 +335,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
dio->iocb = iocb;
bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+ bio->bi_write_stream = iocb->ki_write_stream;
bio->bi_end_io = blkdev_bio_end_io_async;
bio->bi_ioprio = iocb->ki_ioprio;
@@ -398,6 +401,26 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
if (blkdev_dio_invalid(bdev, iocb, iter))
return -EINVAL;
+ if (iov_iter_rw(iter) == WRITE) {
+ u16 max_write_streams = bdev_max_write_streams(bdev);
+
+ if (iocb->ki_write_stream) {
+ if (iocb->ki_write_stream > max_write_streams)
+ return -EINVAL;
+ } else if (max_write_streams) {
+ enum rw_hint write_hint =
+ file_inode(iocb->ki_filp)->i_write_hint;
+
+ /*
+ * Just use the write hint as write stream for block
+ * device writes. This assumes no file system is
+ * mounted that would use the streams differently.
+ */
+ if (write_hint <= max_write_streams)
+ iocb->ki_write_stream = write_hint;
+ }
+ }
+
nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
if (likely(nr_pages <= BIO_MAX_VECS)) {
if (is_sync_kiocb(iocb))
--
2.25.1
* [PATCH v16 06/11] io_uring: enable per-io write streams
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Keith Busch <kbusch@kernel.org>
Allow userspace to pass a per-I/O write stream in the SQE:
__u8 write_stream;
The __u8 type matches the size that the filesystems and block layer
support. Applications can query the supported values from the block
device's max_write_streams sysfs attribute. Unsupported values are
ignored by file operations that do not support write streams, or rejected
with an error by those that do.
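For reference, a sketch of a raw (non-liburing) submitter filling the new
field, which sits at byte offset 44 of the SQE as the BUILD_BUG_SQE_ELEM
check below enforces; the helper is illustrative only:

#include <string.h>
#include <linux/io_uring.h>

static void prep_stream_write(struct io_uring_sqe *sqe, int fd,
                              const void *buf, unsigned int len,
                              __u64 off, __u8 stream)
{
        memset(sqe, 0, sizeof(*sqe));
        sqe->opcode = IORING_OP_WRITE;
        sqe->fd = fd;
        sqe->addr = (unsigned long)buf;
        sqe->len = len;
        sqe->off = off;
        sqe->write_stream = stream;     /* 0 keeps the default */
}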
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
include/uapi/linux/io_uring.h | 4 ++++
io_uring/io_uring.c | 2 ++
io_uring/rw.c | 1 +
3 files changed, 7 insertions(+)
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 5ce096090b0c..cfd17e382082 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -94,6 +94,10 @@ struct io_uring_sqe {
__u16 addr_len;
__u16 __pad3[1];
};
+ struct {
+ __u8 write_stream;
+ __u8 __pad4[3];
+ };
};
union {
struct {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 703251f6f4d8..36c689a50126 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3916,6 +3916,8 @@ static int __init io_uring_init(void)
BUILD_BUG_SQE_ELEM(44, __s32, splice_fd_in);
BUILD_BUG_SQE_ELEM(44, __u32, file_index);
BUILD_BUG_SQE_ELEM(44, __u16, addr_len);
+ BUILD_BUG_SQE_ELEM(44, __u8, write_stream);
+ BUILD_BUG_SQE_ELEM(45, __u8, __pad4[0]);
BUILD_BUG_SQE_ELEM(46, __u16, __pad3[0]);
BUILD_BUG_SQE_ELEM(48, __u64, addr3);
BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 17a12a1cf3a6..303fdded3758 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -279,6 +279,7 @@ static int __io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
}
rw->kiocb.dio_complete = NULL;
rw->kiocb.ki_flags = 0;
+ rw->kiocb.ki_write_stream = READ_ONCE(sqe->write_stream);
if (req->ctx->flags & IORING_SETUP_IOPOLL)
rw->kiocb.ki_complete = io_complete_rw_iopoll;
--
2.25.1
* [PATCH v16 07/11] nvme: add a nvme_get_log_lsi helper
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
Add a helper for log pages that need to pass in an LSI value, while at the
same time not touching all the existing nvme_get_log callers.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
drivers/nvme/host/core.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index eb6ea8acb3cc..0d834ca606d9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -150,6 +150,8 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
unsigned nsid);
static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
struct nvme_command *cmd);
+static int nvme_get_log_lsi(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page,
+ u8 lsp, u8 csi, void *log, size_t size, u64 offset, u16 lsi);
void nvme_queue_scan(struct nvme_ctrl *ctrl)
{
@@ -3084,8 +3086,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
return ret;
}
-int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
- void *log, size_t size, u64 offset)
+static int nvme_get_log_lsi(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page,
+ u8 lsp, u8 csi, void *log, size_t size, u64 offset, u16 lsi)
{
struct nvme_command c = { };
u32 dwlen = nvme_bytes_to_numd(size);
@@ -3099,10 +3101,18 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
c.get_log_page.lpol = cpu_to_le32(lower_32_bits(offset));
c.get_log_page.lpou = cpu_to_le32(upper_32_bits(offset));
c.get_log_page.csi = csi;
+ c.get_log_page.lsi = cpu_to_le16(lsi);
return nvme_submit_sync_cmd(ctrl->admin_q, &c, log, size);
}
+int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
+ void *log, size_t size, u64 offset)
+{
+ return nvme_get_log_lsi(ctrl, nsid, log_page, lsp, csi, log, size,
+ offset, 0);
+}
+
static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
struct nvme_effects_log **log)
{
--
2.25.1
* [PATCH v16 08/11] nvme: pass a void pointer to nvme_get/set_features for the result
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
This allows passing in structures instead of the u32 result, and thus
reduces the amount of bit shifting and masking required to parse the
result.
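A sketch of the resulting caller pattern, using the FDP feature from the
later patches as the example (struct nvme_fdp_config and FDPCFG_FDPE arrive
in patch 9; this mirrors what patch 10 does):

static bool nvme_fdp_enabled(struct nvme_ctrl *ctrl, u32 fdp_dw11)
{
        struct nvme_fdp_config fdp;     /* typed 4-byte result */

        if (nvme_get_features(ctrl, NVME_FEAT_FDP, fdp_dw11, NULL, 0, &fdp))
                return false;
        /* parsed by name instead of shifting bits out of a u32 */
        return fdp.flags & FDPCFG_FDPE;
}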
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
drivers/nvme/host/core.c | 4 ++--
drivers/nvme/host/nvme.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0d834ca606d9..dd71b4c2b7b7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1676,7 +1676,7 @@ static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
- u32 *result)
+ void *result)
{
return nvme_features(dev, nvme_admin_set_features, fid, dword11, buffer,
buflen, result);
@@ -1685,7 +1685,7 @@ EXPORT_SYMBOL_GPL(nvme_set_features);
int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
- u32 *result)
+ void *result)
{
return nvme_features(dev, nvme_admin_get_features, fid, dword11, buffer,
buflen, result);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 51e078642127..aedb734283b8 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -896,10 +896,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
int qid, nvme_submit_flags_t flags);
int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
- u32 *result);
+ void *result);
int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
unsigned int dword11, void *buffer, size_t buflen,
- u32 *result);
+ void *result);
int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
--
2.25.1
* [PATCH v16 09/11] nvme: add FDP definitions
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Christoph Hellwig <hch@lst.de>
Add the config feature result, config log page, and I/O management receive
command definitions needed for FDP.
Partially based on a patch from Kanchan Joshi <joshi.k@samsung.com>.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
include/linux/nvme.h | 77 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 77 insertions(+)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 2479ed10f53e..51308f65b72f 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -303,6 +303,7 @@ enum nvme_ctrl_attr {
NVME_CTRL_ATTR_TBKAS = (1 << 6),
NVME_CTRL_ATTR_ELBAS = (1 << 15),
NVME_CTRL_ATTR_RHII = (1 << 18),
+ NVME_CTRL_ATTR_FDPS = (1 << 19),
};
struct nvme_id_ctrl {
@@ -689,6 +690,44 @@ struct nvme_rotational_media_log {
__u8 rsvd24[488];
};
+struct nvme_fdp_config {
+ __u8 flags;
+#define FDPCFG_FDPE (1U << 0)
+ __u8 fdpcidx;
+ __le16 reserved;
+};
+
+struct nvme_fdp_ruh_desc {
+ __u8 ruht;
+ __u8 reserved[3];
+};
+
+struct nvme_fdp_config_desc {
+ __le16 dsze;
+ __u8 fdpa;
+ __u8 vss;
+ __le32 nrg;
+ __le16 nruh;
+ __le16 maxpids;
+ __le32 nns;
+ __le64 runs;
+ __le32 erutl;
+ __u8 rsvd28[36];
+ struct nvme_fdp_ruh_desc ruhs[];
+};
+
+struct nvme_fdp_config_log {
+ __le16 numfdpc;
+ __u8 ver;
+ __u8 rsvd3;
+ __le32 sze;
+ __u8 rsvd8[8];
+ /*
+ * This is followed by variable number of nvme_fdp_config_desc
+ * structures, but sparse doesn't like nested variable sized arrays.
+ */
+};
+
struct nvme_smart_log {
__u8 critical_warning;
__u8 temperature[2];
@@ -915,6 +954,7 @@ enum nvme_opcode {
nvme_cmd_resv_register = 0x0d,
nvme_cmd_resv_report = 0x0e,
nvme_cmd_resv_acquire = 0x11,
+ nvme_cmd_io_mgmt_recv = 0x12,
nvme_cmd_resv_release = 0x15,
nvme_cmd_zone_mgmt_send = 0x79,
nvme_cmd_zone_mgmt_recv = 0x7a,
@@ -936,6 +976,7 @@ enum nvme_opcode {
nvme_opcode_name(nvme_cmd_resv_register), \
nvme_opcode_name(nvme_cmd_resv_report), \
nvme_opcode_name(nvme_cmd_resv_acquire), \
+ nvme_opcode_name(nvme_cmd_io_mgmt_recv), \
nvme_opcode_name(nvme_cmd_resv_release), \
nvme_opcode_name(nvme_cmd_zone_mgmt_send), \
nvme_opcode_name(nvme_cmd_zone_mgmt_recv), \
@@ -1087,6 +1128,7 @@ enum {
NVME_RW_PRINFO_PRCHK_GUARD = 1 << 12,
NVME_RW_PRINFO_PRACT = 1 << 13,
NVME_RW_DTYPE_STREAMS = 1 << 4,
+ NVME_RW_DTYPE_DPLCMT = 2 << 4,
NVME_WZ_DEAC = 1 << 9,
};
@@ -1174,6 +1216,38 @@ struct nvme_zone_mgmt_recv_cmd {
__le32 cdw14[2];
};
+struct nvme_io_mgmt_recv_cmd {
+ __u8 opcode;
+ __u8 flags;
+ __u16 command_id;
+ __le32 nsid;
+ __le64 rsvd2[2];
+ union nvme_data_ptr dptr;
+ __u8 mo;
+ __u8 rsvd11;
+ __u16 mos;
+ __le32 numd;
+ __le32 cdw12[4];
+};
+
+enum {
+ NVME_IO_MGMT_RECV_MO_RUHS = 1,
+};
+
+struct nvme_fdp_ruh_status_desc {
+ __le16 pid;
+ __le16 ruhid;
+ __le32 earutr;
+ __le64 ruamw;
+ __u8 reserved[16];
+};
+
+struct nvme_fdp_ruh_status {
+ __u8 rsvd0[14];
+ __le16 nruhsd;
+ struct nvme_fdp_ruh_status_desc ruhsd[];
+};
+
enum {
NVME_ZRA_ZONE_REPORT = 0,
NVME_ZRASF_ZONE_REPORT_ALL = 0,
@@ -1309,6 +1383,7 @@ enum {
NVME_FEAT_PLM_WINDOW = 0x14,
NVME_FEAT_HOST_BEHAVIOR = 0x16,
NVME_FEAT_SANITIZE = 0x17,
+ NVME_FEAT_FDP = 0x1d,
NVME_FEAT_SW_PROGRESS = 0x80,
NVME_FEAT_HOST_ID = 0x81,
NVME_FEAT_RESV_MASK = 0x82,
@@ -1329,6 +1404,7 @@ enum {
NVME_LOG_ANA = 0x0c,
NVME_LOG_FEATURES = 0x12,
NVME_LOG_RMI = 0x16,
+ NVME_LOG_FDP_CONFIGS = 0x20,
NVME_LOG_DISC = 0x70,
NVME_LOG_RESERVATION = 0x80,
NVME_FWACT_REPL = (0 << 3),
@@ -1923,6 +1999,7 @@ struct nvme_command {
struct nvmf_auth_receive_command auth_receive;
struct nvme_dbbuf dbbuf;
struct nvme_directive_cmd directive;
+ struct nvme_io_mgmt_recv_cmd imr;
};
};
--
2.25.1
* [PATCH v16 10/11] nvme: register fdp parameters with the block layer
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Keith Busch <kbusch@kernel.org>
Register the device data placement limits if supported. This just
registers the limits with the block layer; nothing beyond reporting these
attributes happens in this patch.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
drivers/nvme/host/core.c | 144 +++++++++++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 2 +
2 files changed, 146 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index dd71b4c2b7b7..f25e03ff03df 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -38,6 +38,8 @@ struct nvme_ns_info {
u32 nsid;
__le32 anagrpid;
u8 pi_offset;
+ u16 endgid;
+ u64 runs;
bool is_shared;
bool is_readonly;
bool is_ready;
@@ -1611,6 +1613,7 @@ static int nvme_ns_info_from_identify(struct nvme_ctrl *ctrl,
info->is_shared = id->nmic & NVME_NS_NMIC_SHARED;
info->is_readonly = id->nsattr & NVME_NS_ATTR_RO;
info->is_ready = true;
+ info->endgid = le16_to_cpu(id->endgid);
if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) {
dev_info(ctrl->device,
"Ignoring bogus Namespace Identifiers\n");
@@ -1651,6 +1654,7 @@ static int nvme_ns_info_from_id_cs_indep(struct nvme_ctrl *ctrl,
info->is_ready = id->nstat & NVME_NSTAT_NRDY;
info->is_rotational = id->nsfeat & NVME_NS_ROTATIONAL;
info->no_vwc = id->nsfeat & NVME_NS_VWC_NOT_PRESENT;
+ info->endgid = le16_to_cpu(id->endgid);
}
kfree(id);
return ret;
@@ -2155,6 +2159,132 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
return ret;
}
+static int nvme_query_fdp_granularity(struct nvme_ctrl *ctrl,
+ struct nvme_ns_info *info, u8 fdp_idx)
+{
+ struct nvme_fdp_config_log hdr, *h;
+ struct nvme_fdp_config_desc *desc;
+ size_t size = sizeof(hdr);
+ void *log, *end;
+ int i, n, ret;
+
+ ret = nvme_get_log_lsi(ctrl, 0, NVME_LOG_FDP_CONFIGS, 0,
+ NVME_CSI_NVM, &hdr, size, 0, info->endgid);
+ if (ret) {
+ dev_warn(ctrl->device,
+ "FDP configs log header status:0x%x endgid:%d\n", ret,
+ info->endgid);
+ return ret;
+ }
+
+ size = le32_to_cpu(hdr.sze);
+ if (size > PAGE_SIZE * MAX_ORDER_NR_PAGES) {
+ dev_warn(ctrl->device, "FDP config size too large:%zu\n",
+ size);
+ return 0;
+ }
+
+ h = kvmalloc(size, GFP_KERNEL);
+ if (!h)
+ return -ENOMEM;
+
+ ret = nvme_get_log_lsi(ctrl, 0, NVME_LOG_FDP_CONFIGS, 0,
+ NVME_CSI_NVM, h, size, 0, info->endgid);
+ if (ret) {
+ dev_warn(ctrl->device,
+ "FDP configs log status:0x%x endgid:%d\n", ret,
+ info->endgid);
+ goto out;
+ }
+
+ n = le16_to_cpu(h->numfdpc) + 1;
+ if (fdp_idx > n) {
+ dev_warn(ctrl->device, "FDP index:%d out of range:%d\n",
+ fdp_idx, n);
+ /* Proceed without registering FDP streams */
+ ret = 0;
+ goto out;
+ }
+
+ log = h + 1;
+ desc = log;
+ end = log + size - sizeof(*h);
+ for (i = 0; i < fdp_idx; i++) {
+ log += le16_to_cpu(desc->dsze);
+ desc = log;
+ if (log >= end) {
+ dev_warn(ctrl->device,
+ "FDP invalid config descriptor list\n");
+ ret = 0;
+ goto out;
+ }
+ }
+
+ if (le32_to_cpu(desc->nrg) > 1) {
+ dev_warn(ctrl->device, "FDP NRG > 1 not supported\n");
+ ret = 0;
+ goto out;
+ }
+
+ info->runs = le64_to_cpu(desc->runs);
+out:
+ kvfree(h);
+ return ret;
+}
+
+static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
+{
+ struct nvme_ns_head *head = ns->head;
+ struct nvme_ctrl *ctrl = ns->ctrl;
+ struct nvme_fdp_ruh_status *ruhs;
+ struct nvme_fdp_config fdp;
+ struct nvme_command c = {};
+ size_t size;
+ int ret;
+
+ /*
+ * The FDP configuration is static for the lifetime of the namespace,
+ * so return immediately if we've already registered this namespace's
+ * streams.
+ */
+ if (head->nr_plids)
+ return 0;
+
+ ret = nvme_get_features(ctrl, NVME_FEAT_FDP, info->endgid, NULL, 0,
+ &fdp);
+ if (ret) {
+ dev_warn(ctrl->device, "FDP get feature status:0x%x\n", ret);
+ return ret;
+ }
+
+ if (!(fdp.flags & FDPCFG_FDPE))
+ return 0;
+
+ ret = nvme_query_fdp_granularity(ctrl, info, fdp.fdpcidx);
+ if (!info->runs)
+ return ret;
+
+ size = struct_size(ruhs, ruhsd, S8_MAX - 1);
+ ruhs = kzalloc(size, GFP_KERNEL);
+ if (!ruhs)
+ return -ENOMEM;
+
+ c.imr.opcode = nvme_cmd_io_mgmt_recv;
+ c.imr.nsid = cpu_to_le32(head->ns_id);
+ c.imr.mo = NVME_IO_MGMT_RECV_MO_RUHS;
+ c.imr.numd = cpu_to_le32(nvme_bytes_to_numd(size));
+ ret = nvme_submit_sync_cmd(ns->queue, &c, ruhs, size);
+ if (ret) {
+ dev_warn(ctrl->device, "FDP io-mgmt status:0x%x\n", ret);
+ goto free;
+ }
+
+ head->nr_plids = le16_to_cpu(ruhs->nruhsd);
+free:
+ kfree(ruhs);
+ return ret;
+}
+
static int nvme_update_ns_info_block(struct nvme_ns *ns,
struct nvme_ns_info *info)
{
@@ -2192,6 +2322,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
goto out;
}
+ if (ns->ctrl->ctratt & NVME_CTRL_ATTR_FDPS) {
+ ret = nvme_query_fdp_info(ns, info);
+ if (ret < 0)
+ goto out;
+ }
+
lim = queue_limits_start_update(ns->disk->queue);
memflags = blk_mq_freeze_queue(ns->disk->queue);
@@ -2225,6 +2361,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
if (!nvme_init_integrity(ns->head, &lim, info))
capacity = 0;
+ lim.max_write_streams = ns->head->nr_plids;
+ if (lim.max_write_streams)
+ lim.write_stream_granularity = max(info->runs, U32_MAX);
+ else
+ lim.write_stream_granularity = 0;
+
ret = queue_limits_commit_update(ns->disk->queue, &lim);
if (ret) {
blk_mq_unfreeze_queue(ns->disk->queue, memflags);
@@ -2328,6 +2470,8 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
ns->head->disk->flags |= GENHD_FL_HIDDEN;
else
nvme_init_integrity(ns->head, &lim, info);
+ lim.max_write_streams = ns_lim->max_write_streams;
+ lim.write_stream_granularity = ns_lim->write_stream_granularity;
ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index aedb734283b8..3e14daa4ed3e 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -496,6 +496,8 @@ struct nvme_ns_head {
struct device cdev_device;
struct gendisk *disk;
+
+ u16 nr_plids;
#ifdef CONFIG_NVME_MULTIPATH
struct bio_list requeue_list;
spinlock_t requeue_lock;
--
2.25.1
* [PATCH v16 11/11] nvme: use fdp streams if write stream is provided
From: Kanchan Joshi @ 2025-05-06 12:17 UTC
To: axboe, kbusch, hch, asml.silence
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty, Kanchan Joshi
From: Keith Busch <kbusch@kernel.org>
Map a user-requested write stream to an FDP placement ID if possible.
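In NVMe terms, a hedged restatement of what the hunk below does (the helper
name is made up; DSPEC is the upper 16 bits of dsmgmt/CDW13):

static void nvme_set_fdp_dspec(struct nvme_ns_head *head,
                               struct nvme_rw_command *rw, u16 stream)
{
        if (!stream || stream > head->nr_plids)
                return;         /* 0: no stream requested */
        /* write stream N (1-based) selects placement ID plids[N - 1] */
        rw->control |= cpu_to_le16(NVME_RW_DTYPE_DPLCMT);
        rw->dsmgmt |= cpu_to_le32((u32)head->plids[stream - 1] << 16);
}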
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
---
drivers/nvme/host/core.c | 31 ++++++++++++++++++++++++++++++-
drivers/nvme/host/nvme.h | 1 +
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f25e03ff03df..52331a14bce1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -672,6 +672,7 @@ static void nvme_free_ns_head(struct kref *ref)
ida_free(&head->subsys->ns_ida, head->instance);
cleanup_srcu_struct(&head->srcu);
nvme_put_subsystem(head->subsys);
+ kfree(head->plids);
kfree(head);
}
@@ -995,6 +996,18 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
if (req->cmd_flags & REQ_RAHEAD)
dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
+ if (op == nvme_cmd_write && ns->head->nr_plids) {
+ u16 write_stream = req->bio->bi_write_stream;
+
+ if (WARN_ON_ONCE(write_stream > ns->head->nr_plids))
+ return BLK_STS_INVAL;
+
+ if (write_stream) {
+ dsmgmt |= ns->head->plids[write_stream - 1] << 16;
+ control |= NVME_RW_DTYPE_DPLCMT;
+ }
+ }
+
if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
return BLK_STS_INVAL;
@@ -2240,7 +2253,7 @@ static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
struct nvme_fdp_config fdp;
struct nvme_command c = {};
size_t size;
- int ret;
+ int i, ret;
/*
* The FDP configuration is static for the lifetime of the namespace,
@@ -2280,6 +2293,22 @@ static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
}
head->nr_plids = le16_to_cpu(ruhs->nruhsd);
+ if (!head->nr_plids)
+ goto free;
+
+ head->plids = kcalloc(head->nr_plids, sizeof(head->plids),
+ GFP_KERNEL);
+ if (!head->plids) {
+ dev_warn(ctrl->device,
+ "failed to allocate %u FDP placement IDs\n",
+ head->nr_plids);
+ head->nr_plids = 0;
+ ret = -ENOMEM;
+ goto free;
+ }
+
+ for (i = 0; i < head->nr_plids; i++)
+ head->plids[i] = le16_to_cpu(ruhs->ruhsd[i].pid);
free:
kfree(ruhs);
return ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 3e14daa4ed3e..7aad581271c7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -498,6 +498,7 @@ struct nvme_ns_head {
struct gendisk *disk;
u16 nr_plids;
+ u16 *plids;
#ifdef CONFIG_NVME_MULTIPATH
struct bio_list requeue_list;
spinlock_t requeue_lock;
--
2.25.1
* Re: [PATCH v16 00/11] Block write streams with nvme fdp
From: Jens Axboe @ 2025-05-06 13:48 UTC
To: kbusch, hch, asml.silence, Kanchan Joshi
Cc: io-uring, linux-block, linux-fsdevel, linux-nvme
On Tue, 06 May 2025 17:47:21 +0530, Kanchan Joshi wrote:
> The series enables FDP support for block IO.
>
> [...]
Applied, thanks!
[01/11] fs: add a write stream field to the kiocb
commit: 732f25a2895a8c1c54fb56544f0b1e23770ef4d7
[02/11] block: add a bi_write_stream field
commit: 5006f85ea23ea0bda9a8e31fdda126f4fca48f20
[03/11] block: introduce max_write_streams queue limit
commit: d2f526ba27d29c442542f7c5df0a86ef0b576716
[04/11] block: introduce a write_stream_granularity queue limit
commit: c23acfac10786ac5062a0615e23e68b913ac8da0
[05/11] block: expose write streams for block device nodes
commit: c27683da6406031d47a65b344d04a40736490d95
[06/11] io_uring: enable per-io write streams
commit: 02040353f4fedb823f011f27962325f328d0689f
[07/11] nvme: add a nvme_get_log_lsi helper
commit: d4f8359eaecf0f8b0a9f631e6652b60ae61f3016
[08/11] nvme: pass a void pointer to nvme_get/set_features for the result
commit: 7a044d34b1e21fc4e04d4e48dae1dc3795621570
[09/11] nvme: add FDP definitions
commit: ee203d3d86113559b77b1723e0d10909ebbd66ad
[10/11] nvme: register fdp parameters with the block layer
commit: 30b5f20bb2ddab013035399e5c7e6577da49320a
[11/11] nvme: use fdp streams if write stream is provided
commit: 38e8397dde6338c76593ddb17ccf3118fc3f5203
Best regards,
--
Jens Axboe
* Re: [PATCH v16 10/11] nvme: register fdp parameters with the block layer
From: Caleb Sander Mateos @ 2025-05-06 16:13 UTC
To: Kanchan Joshi
Cc: axboe, kbusch, hch, asml.silence, io-uring, linux-block,
linux-fsdevel, linux-nvme, Hannes Reinecke, Nitesh Shetty
On Tue, May 6, 2025 at 5:31 AM Kanchan Joshi <joshi.k@samsung.com> wrote:
> [...]
> @@ -2225,6 +2361,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> if (!nvme_init_integrity(ns->head, &lim, info))
> capacity = 0;
>
> + lim.max_write_streams = ns->head->nr_plids;
> + if (lim.max_write_streams)
> + lim.write_stream_granularity = max(info->runs, U32_MAX);
What is the purpose of this max(..., U32_MAX)? Should it be min() instead?
Best,
Caleb
* Re: [PATCH v16 11/11] nvme: use fdp streams if write stream is provided
From: Caleb Sander Mateos @ 2025-05-06 16:14 UTC
To: Kanchan Joshi
Cc: axboe, kbusch, hch, asml.silence, io-uring, linux-block,
linux-fsdevel, linux-nvme, Hannes Reinecke, Nitesh Shetty
On Tue, May 6, 2025 at 5:31 AM Kanchan Joshi <joshi.k@samsung.com> wrote:
> [...]
> @@ -2280,6 +2293,22 @@ static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
> }
>
> head->nr_plids = le16_to_cpu(ruhs->nruhsd);
> + if (!head->nr_plids)
> + goto free;
> +
> + head->plids = kcalloc(head->nr_plids, sizeof(head->plids),
> + GFP_KERNEL);
Should this be sizeof(*head->plids)?
Best,
Caleb
* Re: [PATCH v16 10/11] nvme: register fdp parameters with the block layer
From: Keith Busch @ 2025-05-06 16:26 UTC
To: Caleb Sander Mateos
Cc: Kanchan Joshi, axboe, hch, asml.silence, io-uring, linux-block,
linux-fsdevel, linux-nvme, Hannes Reinecke, Nitesh Shetty
On Tue, May 06, 2025 at 09:13:33AM -0700, Caleb Sander Mateos wrote:
> On Tue, May 6, 2025 at 5:31 AM Kanchan Joshi <joshi.k@samsung.com> wrote:
> > @@ -2225,6 +2361,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> > if (!nvme_init_integrity(ns->head, &lim, info))
> > capacity = 0;
> >
> > + lim.max_write_streams = ns->head->nr_plids;
> > + if (lim.max_write_streams)
> > + lim.write_stream_granularity = max(info->runs, U32_MAX);
>
> What is the purpose of this max(..., U32_MAX)? Should it be min() instead?
You're right, it should have been min(). "runs" is a u64 and the
queue_limit is a u32, so U32_MAX is the upper limit, but the value is not
supposed to exceed "runs".
* Re: [PATCH v16 11/11] nvme: use fdp streams if write stream is provided
From: Keith Busch @ 2025-05-06 16:28 UTC
To: Caleb Sander Mateos
Cc: Kanchan Joshi, axboe, hch, asml.silence, io-uring, linux-block,
linux-fsdevel, linux-nvme, Hannes Reinecke, Nitesh Shetty
On Tue, May 06, 2025 at 09:14:19AM -0700, Caleb Sander Mateos wrote:
> > + head->plids = kcalloc(head->nr_plids, sizeof(head->plids),
> > + GFP_KERNEL);
>
> Should this be sizeof(*head->plids)?
Indeed it should. As-is, this overallocates the array, so the bug wouldn't
easily have been found at runtime.
* Re: [PATCH v16 10/11] nvme: register fdp parameters with the block layer
From: Kanchan Joshi @ 2025-05-06 18:14 UTC
To: Keith Busch
Cc: Caleb Sander Mateos, Kanchan Joshi, axboe, hch, asml.silence,
io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty
On Tue, May 6, 2025 at 9:56 PM Keith Busch <kbusch@kernel.org> wrote:
>
> On Tue, May 06, 2025 at 09:13:33AM -0700, Caleb Sander Mateos wrote:
> > On Tue, May 6, 2025 at 5:31 AM Kanchan Joshi <joshi.k@samsung.com> wrote:
> > > @@ -2225,6 +2361,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> > > if (!nvme_init_integrity(ns->head, &lim, info))
> > > capacity = 0;
> > >
> > > + lim.max_write_streams = ns->head->nr_plids;
> > > + if (lim.max_write_streams)
> > > + lim.write_stream_granularity = max(info->runs, U32_MAX);
> >
> > What is the purpose of this max(..., U32_MAX)? Should it be min() instead?
>
> You're right, should have been min. Because "runs" is a u64 and the
> queue_limit is a u32, so U32_MAX is the upper limit, but it's not
> supposed to exceed "runs".
Would it be better to change write_stream_granularity to "long unsigned
int" so that it matches what is possible in NVMe?
* Re: [PATCH v16 10/11] nvme: register fdp parameters with the block layer
From: Keith Busch @ 2025-05-06 19:03 UTC
To: Kanchan Joshi
Cc: Caleb Sander Mateos, Kanchan Joshi, axboe, hch, asml.silence,
io-uring, linux-block, linux-fsdevel, linux-nvme, Hannes Reinecke,
Nitesh Shetty
On Tue, May 06, 2025 at 11:44:27PM +0530, Kanchan Joshi wrote:
> On Tue, May 6, 2025 at 9:56 PM Keith Busch <kbusch@kernel.org> wrote:
> >
> > You're right, should have been min. Because "runs" is a u64 and the
> > queue_limit is a u32, so U32_MAX is the upper limit, but it's not
> > supposed to exceed "runs".
>
> Would it be better to change write_stream_granularity to "long
> unsigned int" so that it matches with what is possible in nvme?
That type is still 4 bytes on many 32-bit archs, but I know what you
mean (unsigned long long). I didn't think we'd see reclaim units
approach 4GB, but if you think it's possible, may as well have the
queue_limit type be large enough to report it.