public inbox for [email protected]
 help / color / mirror / Atom feed
* [PATCHv2 00/11] block write streams with nvme fdp
@ 2025-02-03 18:41 Keith Busch
  2025-02-03 18:41 ` [PATCHv2 01/11] fs: add a write stream field to the kiocb Keith Busch
                   ` (10 more replies)
  0 siblings, 11 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Keith Busch

From: Keith Busch <[email protected]>

Changes since v14:

  Merged to latest nvme, block, and io_uring trees

  Replaced vmalloc with kvmalloc (Christoph)

Christoph Hellwig (7):
  fs: add a write stream field to the kiocb
  block: add a bi_write_stream field
  block: introduce a write_stream_granularity queue limit
  block: expose write streams for block device nodes
  nvme: add a nvme_get_log_lsi helper
  nvme: pass a void pointer to nvme_get/set_features for the result
  nvme: add FDP definitions

Keith Busch (4):
  block: introduce max_write_streams queue limit
  io_uring: enable per-io write streams
  nvme: register fdp parameters with the block layer
  nvme: use fdp streams if write stream is provided

 Documentation/ABI/stable/sysfs-block |  15 +++
 block/bio.c                          |   2 +
 block/blk-crypto-fallback.c          |   1 +
 block/blk-merge.c                    |   4 +
 block/blk-sysfs.c                    |   6 +
 block/bounce.c                       |   1 +
 block/fops.c                         |  23 ++++
 drivers/nvme/host/core.c             | 191 ++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h             |   7 +-
 include/linux/blk_types.h            |   1 +
 include/linux/blkdev.h               |  10 ++
 include/linux/fs.h                   |   1 +
 include/linux/nvme.h                 |  77 +++++++++++
 include/uapi/linux/io_uring.h        |   4 +
 io_uring/io_uring.c                  |   2 +
 io_uring/rw.c                        |   1 +
 16 files changed, 340 insertions(+), 6 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCHv2 01/11] fs: add a write stream field to the kiocb
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 02/11] block: add a bi_write_stream field Keith Busch
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

Prepare for io_uring passthrough of write streams. The write stream
field in the kiocb structure fits into an existing 2-byte hole, so its
size is not changed.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 include/linux/fs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index be3ad155ec9f7..e9ec7afa5d5f8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -408,6 +408,7 @@ struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
+	u8			ki_write_stream;
 	union {
 		/*
 		 * Only used for async buffered reads, where it denotes the
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 02/11] block: add a bi_write_stream field
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
  2025-02-03 18:41 ` [PATCHv2 01/11] fs: add a write stream field to the kiocb Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 03/11] block: introduce max_write_streams queue limit Keith Busch
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

Add the ability to pass a write stream for placement control in the bio.
The new field fits in an existing hole, so does not change the size of
the struct.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 block/bio.c                 | 2 ++
 block/blk-crypto-fallback.c | 1 +
 block/blk-merge.c           | 4 ++++
 block/bounce.c              | 1 +
 include/linux/blk_types.h   | 1 +
 5 files changed, 9 insertions(+)

diff --git a/block/bio.c b/block/bio.c
index f0c416e5931d9..f7c2b3b1a55dd 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -251,6 +251,7 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
 	bio->bi_flags = 0;
 	bio->bi_ioprio = 0;
 	bio->bi_write_hint = 0;
+	bio->bi_write_stream = 0;
 	bio->bi_status = 0;
 	bio->bi_iter.bi_sector = 0;
 	bio->bi_iter.bi_size = 0;
@@ -827,6 +828,7 @@ static int __bio_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp)
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_write_hint = bio_src->bi_write_hint;
+	bio->bi_write_stream = bio_src->bi_write_stream;
 	bio->bi_iter = bio_src->bi_iter;
 
 	if (bio->bi_bdev) {
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 29a205482617c..66762243a886b 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -173,6 +173,7 @@ static struct bio *blk_crypto_fallback_clone_bio(struct bio *bio_src)
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
 	bio->bi_write_hint	= bio_src->bi_write_hint;
+	bio->bi_write_stream	= bio_src->bi_write_stream;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 15cd231d560cb..85642ead0d805 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -829,6 +829,8 @@ static struct request *attempt_merge(struct request_queue *q,
 
 	if (req->bio->bi_write_hint != next->bio->bi_write_hint)
 		return NULL;
+	if (req->bio->bi_write_stream != next->bio->bi_write_stream)
+		return NULL;
 	if (req->bio->bi_ioprio != next->bio->bi_ioprio)
 		return NULL;
 	if (!blk_atomic_write_mergeable_rqs(req, next))
@@ -950,6 +952,8 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 		return false;
 	if (rq->bio->bi_write_hint != bio->bi_write_hint)
 		return false;
+	if (rq->bio->bi_write_stream != bio->bi_write_stream)
+		return false;
 	if (rq->bio->bi_ioprio != bio->bi_ioprio)
 		return false;
 	if (blk_atomic_write_mergeable_rq_bio(rq, bio) == false)
diff --git a/block/bounce.c b/block/bounce.c
index 0d898cd5ec497..fb8f60f114d7d 100644
--- a/block/bounce.c
+++ b/block/bounce.c
@@ -170,6 +170,7 @@ static struct bio *bounce_clone_bio(struct bio *bio_src)
 		bio_set_flag(bio, BIO_REMAPPED);
 	bio->bi_ioprio		= bio_src->bi_ioprio;
 	bio->bi_write_hint	= bio_src->bi_write_hint;
+	bio->bi_write_stream	= bio_src->bi_write_stream;
 	bio->bi_iter.bi_sector	= bio_src->bi_iter.bi_sector;
 	bio->bi_iter.bi_size	= bio_src->bi_iter.bi_size;
 
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index dce7615c35e7e..4ca3449ce9c95 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -220,6 +220,7 @@ struct bio {
 	unsigned short		bi_flags;	/* BIO_* below */
 	unsigned short		bi_ioprio;
 	enum rw_hint		bi_write_hint;
+	u8			bi_write_stream;
 	blk_status_t		bi_status;
 	atomic_t		__bi_remaining;
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 03/11] block: introduce max_write_streams queue limit
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
  2025-02-03 18:41 ` [PATCHv2 01/11] fs: add a write stream field to the kiocb Keith Busch
  2025-02-03 18:41 ` [PATCHv2 02/11] block: add a bi_write_stream field Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 04/11] block: introduce a write_stream_granularity " Keith Busch
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Keith Busch,
	Hannes Reinecke, Nitesh Shetty

From: Keith Busch <[email protected]>

Drivers with hardware that support write streams need a way to export how
many are available so applications can generically query this.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
[hch: renamed hints to streams, removed stacking]
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 Documentation/ABI/stable/sysfs-block | 7 +++++++
 block/blk-sysfs.c                    | 3 +++
 include/linux/blkdev.h               | 9 +++++++++
 3 files changed, 19 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 0cceb2badc836..f67139b8b8eff 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -506,6 +506,13 @@ Description:
 		[RO] Maximum size in bytes of a single element in a DMA
 		scatter/gather list.
 
+What:		/sys/block/<disk>/queue/max_write_streams
+Date:		November 2024
+Contact:	[email protected]
+Description:
+		[RO] Maximum number of write streams supported, 0 if not
+		supported. If supported, valid values are 1 through
+		max_write_streams, inclusive.
 
 What:		/sys/block/<disk>/queue/max_segments
 Date:		March 2010
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index e09b455874bfd..8b8f2b0f0c048 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -106,6 +106,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
+QUEUE_SYSFS_LIMIT_SHOW(max_write_streams)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -434,6 +435,7 @@ QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
 QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
+QUEUE_RO_ENTRY(queue_max_write_streams, "max_write_streams");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -568,6 +570,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_discard_segments_entry.attr,
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
+	&queue_max_write_streams_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 76f0a4e7c2e5d..f92db4b35f4b6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -397,6 +397,8 @@ struct queue_limits {
 	unsigned short		max_integrity_segments;
 	unsigned short		max_discard_segments;
 
+	unsigned short		max_write_streams;
+
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;
 
@@ -1245,6 +1247,13 @@ static inline unsigned int bdev_max_segments(struct block_device *bdev)
 	return queue_max_segments(bdev_get_queue(bdev));
 }
 
+static inline unsigned short bdev_max_write_streams(struct block_device *bdev)
+{
+	if (bdev_is_partition(bdev))
+		return 0;
+	return bdev_limits(bdev)->max_write_streams;
+}
+
 static inline unsigned queue_logical_block_size(const struct request_queue *q)
 {
 	return q->limits.logical_block_size;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 04/11] block: introduce a write_stream_granularity queue limit
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (2 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 03/11] block: introduce max_write_streams queue limit Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 05/11] block: expose write streams for block device nodes Keith Busch
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

Export the granularity that write streams should be discarded with,
as it is essential for making good use of them.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 Documentation/ABI/stable/sysfs-block | 8 ++++++++
 block/blk-sysfs.c                    | 3 +++
 include/linux/blkdev.h               | 1 +
 3 files changed, 12 insertions(+)

diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index f67139b8b8eff..c454c68b68fe6 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -514,6 +514,14 @@ Description:
 		supported. If supported, valid values are 1 through
 		max_write_streams, inclusive.
 
+What:		/sys/block/<disk>/queue/write_stream_granularity
+Date:		November 2024
+Contact:	[email protected]
+Description:
+		[RO] Granularity of a write stream in bytes.  The granularity
+		of a write stream is the size that should be discarded or
+		overwritten together to avoid write amplification in the device.
+
 What:		/sys/block/<disk>/queue/max_segments
 Date:		March 2010
 Contact:	[email protected]
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 8b8f2b0f0c048..a4badc1f8d904 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -107,6 +107,7 @@ QUEUE_SYSFS_LIMIT_SHOW(max_discard_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_integrity_segments)
 QUEUE_SYSFS_LIMIT_SHOW(max_segment_size)
 QUEUE_SYSFS_LIMIT_SHOW(max_write_streams)
+QUEUE_SYSFS_LIMIT_SHOW(write_stream_granularity)
 QUEUE_SYSFS_LIMIT_SHOW(logical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(physical_block_size)
 QUEUE_SYSFS_LIMIT_SHOW(chunk_sectors)
@@ -436,6 +437,7 @@ QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
 QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
 QUEUE_RO_ENTRY(queue_max_segment_size, "max_segment_size");
 QUEUE_RO_ENTRY(queue_max_write_streams, "max_write_streams");
+QUEUE_RO_ENTRY(queue_write_stream_granularity, "write_stream_granularity");
 QUEUE_RW_LOAD_MODULE_ENTRY(elv_iosched, "scheduler");
 
 QUEUE_RO_ENTRY(queue_logical_block_size, "logical_block_size");
@@ -571,6 +573,7 @@ static struct attribute *queue_attrs[] = {
 	&queue_max_integrity_segments_entry.attr,
 	&queue_max_segment_size_entry.attr,
 	&queue_max_write_streams_entry.attr,
+	&queue_write_stream_granularity_entry.attr,
 	&queue_hw_sector_size_entry.attr,
 	&queue_logical_block_size_entry.attr,
 	&queue_physical_block_size_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f92db4b35f4b6..927d5531930e3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -398,6 +398,7 @@ struct queue_limits {
 	unsigned short		max_discard_segments;
 
 	unsigned short		max_write_streams;
+	unsigned int		write_stream_granularity;
 
 	unsigned int		max_open_zones;
 	unsigned int		max_active_zones;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 05/11] block: expose write streams for block device nodes
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (3 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 04/11] block: introduce a write_stream_granularity " Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 06/11] io_uring: enable per-io write streams Keith Busch
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

Use the per-kiocb write stream if provided, or map temperature hints to
write streams (which is a bit questionable, but this shows how it is
done).

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
[kbusch: removed statx reporting]
Signed-off-by: Keith Busch <[email protected]>
---
 block/fops.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/block/fops.c b/block/fops.c
index be9f1dbea9ce0..11902b7120af2 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -73,6 +73,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	}
 	bio.bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 	bio.bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio.bi_write_stream = iocb->ki_write_stream;
 	bio.bi_ioprio = iocb->ki_ioprio;
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio.bi_opf |= REQ_ATOMIC;
@@ -206,6 +207,7 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 	for (;;) {
 		bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 		bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+		bio->bi_write_stream = iocb->ki_write_stream;
 		bio->bi_private = dio;
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
@@ -333,6 +335,7 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 	dio->iocb = iocb;
 	bio->bi_iter.bi_sector = pos >> SECTOR_SHIFT;
 	bio->bi_write_hint = file_inode(iocb->ki_filp)->i_write_hint;
+	bio->bi_write_stream = iocb->ki_write_stream;
 	bio->bi_end_io = blkdev_bio_end_io_async;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
@@ -398,6 +401,26 @@ static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 	if (blkdev_dio_invalid(bdev, iocb, iter))
 		return -EINVAL;
 
+	if (iov_iter_rw(iter) == WRITE) {
+		u16 max_write_streams = bdev_max_write_streams(bdev);
+
+		if (iocb->ki_write_stream) {
+			if (iocb->ki_write_stream > max_write_streams)
+				return -EINVAL;
+		} else if (max_write_streams) {
+			enum rw_hint write_hint =
+				file_inode(iocb->ki_filp)->i_write_hint;
+
+			/*
+			 * Just use the write hint as write stream for block
+			 * device writes.  This assumes no file system is
+			 * mounted that would use the streams differently.
+			 */
+			if (write_hint <= max_write_streams)
+				iocb->ki_write_stream = write_hint;
+		}
+	}
+
 	nr_pages = bio_iov_vecs_to_alloc(iter, BIO_MAX_VECS + 1);
 	if (likely(nr_pages <= BIO_MAX_VECS)) {
 		if (is_sync_kiocb(iocb))
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 06/11] io_uring: enable per-io write streams
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (4 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 05/11] block: expose write streams for block device nodes Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-04  6:00   ` Christoph Hellwig
  2025-02-03 18:41 ` [PATCHv2 07/11] nvme: add a nvme_get_log_lsi helper Keith Busch
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Keith Busch,
	Hannes Reinecke, Nitesh Shetty

From: Keith Busch <[email protected]>

Allow userspace to pass a per-I/O write stream in the SQE:

      __u8 write_stream;

The __u8 type matches the size the filesystems and block layer support.

Application can query the supported values from the block devices
max_write_streams sysfs attribute. Unsupported values are ignored by
file operations that do not support write streams or rejected with an
error by those that support them.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 include/uapi/linux/io_uring.h | 4 ++++
 io_uring/io_uring.c           | 2 ++
 io_uring/rw.c                 | 1 +
 3 files changed, 7 insertions(+)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e11c826385277..df191ccbcd7bf 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -92,6 +92,10 @@ struct io_uring_sqe {
 			__u16	addr_len;
 			__u16	__pad3[1];
 		};
+		struct {
+			__u8	write_stream;
+			__u8	__pad4[3];
+		};
 	};
 	union {
 		struct {
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 263e504be4a8b..944ce66d4f1e2 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3875,6 +3875,8 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(44, __s32,  splice_fd_in);
 	BUILD_BUG_SQE_ELEM(44, __u32,  file_index);
 	BUILD_BUG_SQE_ELEM(44, __u16,  addr_len);
+	BUILD_BUG_SQE_ELEM(44, __u8,   write_stream);
+	BUILD_BUG_SQE_ELEM(45, __u8,   __pad4[0]);
 	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
 	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 991ecfbea88e3..641d484d2a9e8 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -298,6 +298,7 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 	}
 	rw->kiocb.dio_complete = NULL;
 	rw->kiocb.ki_flags = 0;
+	rw->kiocb.ki_write_stream = READ_ONCE(sqe->write_stream);
 
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 07/11] nvme: add a nvme_get_log_lsi helper
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (5 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 06/11] io_uring: enable per-io write streams Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 08/11] nvme: pass a void pointer to nvme_get/set_features for the result Keith Busch
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

For log pages that need to pass in a LSI value, while at the same time
not touching all the existing nvme_get_log callers.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 drivers/nvme/host/core.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 653aa6544a90e..8c43823cc37e1 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -150,6 +150,8 @@ static void nvme_remove_invalid_namespaces(struct nvme_ctrl *ctrl,
 					   unsigned nsid);
 static void nvme_update_keep_alive(struct nvme_ctrl *ctrl,
 				   struct nvme_command *cmd);
+static int nvme_get_log_lsi(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page,
+		u8 lsp, u8 csi, void *log, size_t size, u64 offset, u16 lsi);
 
 void nvme_queue_scan(struct nvme_ctrl *ctrl)
 {
@@ -3083,8 +3085,8 @@ static int nvme_init_subsystem(struct nvme_ctrl *ctrl, struct nvme_id_ctrl *id)
 	return ret;
 }
 
-int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
-		void *log, size_t size, u64 offset)
+static int nvme_get_log_lsi(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page,
+		u8 lsp, u8 csi, void *log, size_t size, u64 offset, u16 lsi)
 {
 	struct nvme_command c = { };
 	u32 dwlen = nvme_bytes_to_numd(size);
@@ -3098,10 +3100,18 @@ int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
 	c.get_log_page.lpol = cpu_to_le32(lower_32_bits(offset));
 	c.get_log_page.lpou = cpu_to_le32(upper_32_bits(offset));
 	c.get_log_page.csi = csi;
+	c.get_log_page.lsi = cpu_to_le16(lsi);
 
 	return nvme_submit_sync_cmd(ctrl->admin_q, &c, log, size);
 }
 
+int nvme_get_log(struct nvme_ctrl *ctrl, u32 nsid, u8 log_page, u8 lsp, u8 csi,
+		void *log, size_t size, u64 offset)
+{
+	return nvme_get_log_lsi(ctrl, nsid, log_page, lsp, csi, log, size,
+			offset, 0);
+}
+
 static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,
 				struct nvme_effects_log **log)
 {
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 08/11] nvme: pass a void pointer to nvme_get/set_features for the result
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (6 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 07/11] nvme: add a nvme_get_log_lsi helper Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 09/11] nvme: add FDP definitions Keith Busch
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

That allows passing in structures instead of the u32 result, and thus
reduce the amount of bit shifting and masking required to parse the
result.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 drivers/nvme/host/core.c | 4 ++--
 drivers/nvme/host/nvme.h | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8c43823cc37e1..324b31ba270a6 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1678,7 +1678,7 @@ static int nvme_features(struct nvme_ctrl *dev, u8 op, unsigned int fid,
 
 int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
 		      unsigned int dword11, void *buffer, size_t buflen,
-		      u32 *result)
+		      void *result)
 {
 	return nvme_features(dev, nvme_admin_set_features, fid, dword11, buffer,
 			     buflen, result);
@@ -1687,7 +1687,7 @@ EXPORT_SYMBOL_GPL(nvme_set_features);
 
 int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
 		      unsigned int dword11, void *buffer, size_t buflen,
-		      u32 *result)
+		      void *result)
 {
 	return nvme_features(dev, nvme_admin_get_features, fid, dword11, buffer,
 			     buflen, result);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 7be92d07430e9..9c94c1085869b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -895,10 +895,10 @@ int __nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,
 		int qid, nvme_submit_flags_t flags);
 int nvme_set_features(struct nvme_ctrl *dev, unsigned int fid,
 		      unsigned int dword11, void *buffer, size_t buflen,
-		      u32 *result);
+		      void *result);
 int nvme_get_features(struct nvme_ctrl *dev, unsigned int fid,
 		      unsigned int dword11, void *buffer, size_t buflen,
-		      u32 *result);
+		      void *result);
 int nvme_set_queue_count(struct nvme_ctrl *ctrl, int *count);
 void nvme_stop_keep_alive(struct nvme_ctrl *ctrl);
 int nvme_reset_ctrl(struct nvme_ctrl *ctrl);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 09/11] nvme: add FDP definitions
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (7 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 08/11] nvme: pass a void pointer to nvme_get/set_features for the result Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-03 18:41 ` [PATCHv2 10/11] nvme: register fdp parameters with the block layer Keith Busch
  2025-02-03 18:41 ` [PATCHv2 11/11] nvme: use fdp streams if write stream is provided Keith Busch
  10 siblings, 0 replies; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Hannes Reinecke,
	Nitesh Shetty, Keith Busch

From: Christoph Hellwig <[email protected]>

Add the config feature result, config log page, and management receive
commands needed for FDP.

Partially based on a patch from Kanchan Joshi <[email protected]>.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 include/linux/nvme.h | 77 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index fe3b60818fdcf..96962c95b7d12 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -277,6 +277,7 @@ enum nvme_ctrl_attr {
 	NVME_CTRL_ATTR_TBKAS		= (1 << 6),
 	NVME_CTRL_ATTR_ELBAS		= (1 << 15),
 	NVME_CTRL_ATTR_RHII		= (1 << 18),
+	NVME_CTRL_ATTR_FDPS		= (1 << 19),
 };
 
 struct nvme_id_ctrl {
@@ -663,6 +664,44 @@ struct nvme_rotational_media_log {
 	__u8	rsvd24[488];
 };
 
+struct nvme_fdp_config {
+	__u8			flags;
+#define FDPCFG_FDPE	(1U << 0)
+	__u8			fdpcidx;
+	__le16			reserved;
+};
+
+struct nvme_fdp_ruh_desc {
+	__u8			ruht;
+	__u8			reserved[3];
+};
+
+struct nvme_fdp_config_desc {
+	__le16			dsze;
+	__u8			fdpa;
+	__u8			vss;
+	__le32			nrg;
+	__le16			nruh;
+	__le16			maxpids;
+	__le32			nns;
+	__le64			runs;
+	__le32			erutl;
+	__u8			rsvd28[36];
+	struct nvme_fdp_ruh_desc ruhs[];
+};
+
+struct nvme_fdp_config_log {
+	__le16			numfdpc;
+	__u8			ver;
+	__u8			rsvd3;
+	__le32			sze;
+	__u8			rsvd8[8];
+	/*
+	 * This is followed by variable number of nvme_fdp_config_desc
+	 * structures, but sparse doesn't like nested variable sized arrays.
+	 */
+};
+
 struct nvme_smart_log {
 	__u8			critical_warning;
 	__u8			temperature[2];
@@ -889,6 +928,7 @@ enum nvme_opcode {
 	nvme_cmd_resv_register	= 0x0d,
 	nvme_cmd_resv_report	= 0x0e,
 	nvme_cmd_resv_acquire	= 0x11,
+	nvme_cmd_io_mgmt_recv	= 0x12,
 	nvme_cmd_resv_release	= 0x15,
 	nvme_cmd_zone_mgmt_send	= 0x79,
 	nvme_cmd_zone_mgmt_recv	= 0x7a,
@@ -910,6 +950,7 @@ enum nvme_opcode {
 		nvme_opcode_name(nvme_cmd_resv_register),	\
 		nvme_opcode_name(nvme_cmd_resv_report),		\
 		nvme_opcode_name(nvme_cmd_resv_acquire),	\
+		nvme_opcode_name(nvme_cmd_io_mgmt_recv),	\
 		nvme_opcode_name(nvme_cmd_resv_release),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_send),	\
 		nvme_opcode_name(nvme_cmd_zone_mgmt_recv),	\
@@ -1061,6 +1102,7 @@ enum {
 	NVME_RW_PRINFO_PRCHK_GUARD	= 1 << 12,
 	NVME_RW_PRINFO_PRACT		= 1 << 13,
 	NVME_RW_DTYPE_STREAMS		= 1 << 4,
+	NVME_RW_DTYPE_DPLCMT		= 2 << 4,
 	NVME_WZ_DEAC			= 1 << 9,
 };
 
@@ -1148,6 +1190,38 @@ struct nvme_zone_mgmt_recv_cmd {
 	__le32			cdw14[2];
 };
 
+struct nvme_io_mgmt_recv_cmd {
+	__u8			opcode;
+	__u8			flags;
+	__u16			command_id;
+	__le32			nsid;
+	__le64			rsvd2[2];
+	union nvme_data_ptr	dptr;
+	__u8			mo;
+	__u8			rsvd11;
+	__u16			mos;
+	__le32			numd;
+	__le32			cdw12[4];
+};
+
+enum {
+	NVME_IO_MGMT_RECV_MO_RUHS	= 1,
+};
+
+struct nvme_fdp_ruh_status_desc {
+	__le16			pid;
+	__le16			ruhid;
+	__le32			earutr;
+	__le64			ruamw;
+	__u8			reserved[16];
+};
+
+struct nvme_fdp_ruh_status {
+	__u8			rsvd0[14];
+	__le16			nruhsd;
+	struct nvme_fdp_ruh_status_desc ruhsd[];
+};
+
 enum {
 	NVME_ZRA_ZONE_REPORT		= 0,
 	NVME_ZRASF_ZONE_REPORT_ALL	= 0,
@@ -1283,6 +1357,7 @@ enum {
 	NVME_FEAT_PLM_WINDOW	= 0x14,
 	NVME_FEAT_HOST_BEHAVIOR	= 0x16,
 	NVME_FEAT_SANITIZE	= 0x17,
+	NVME_FEAT_FDP		= 0x1d,
 	NVME_FEAT_SW_PROGRESS	= 0x80,
 	NVME_FEAT_HOST_ID	= 0x81,
 	NVME_FEAT_RESV_MASK	= 0x82,
@@ -1303,6 +1378,7 @@ enum {
 	NVME_LOG_ANA		= 0x0c,
 	NVME_LOG_FEATURES	= 0x12,
 	NVME_LOG_RMI		= 0x16,
+	NVME_LOG_FDP_CONFIGS	= 0x20,
 	NVME_LOG_DISC		= 0x70,
 	NVME_LOG_RESERVATION	= 0x80,
 	NVME_FWACT_REPL		= (0 << 3),
@@ -1890,6 +1966,7 @@ struct nvme_command {
 		struct nvmf_auth_receive_command auth_receive;
 		struct nvme_dbbuf dbbuf;
 		struct nvme_directive_cmd directive;
+		struct nvme_io_mgmt_recv_cmd imr;
 	};
 };
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 10/11] nvme: register fdp parameters with the block layer
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (8 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 09/11] nvme: add FDP definitions Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-04  6:01   ` Christoph Hellwig
  2025-02-03 18:41 ` [PATCHv2 11/11] nvme: use fdp streams if write stream is provided Keith Busch
  10 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Keith Busch,
	Hannes Reinecke, Nitesh Shetty

From: Keith Busch <[email protected]>

Register the device data placement limits if supported. This is just
registering the limits with the block layer. Nothing beyond reporting
these attributes is happening in this patch.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 drivers/nvme/host/core.c | 144 +++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h |   2 +
 2 files changed, 146 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 324b31ba270a6..c8bc58b8ee3aa 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -38,6 +38,8 @@ struct nvme_ns_info {
 	u32 nsid;
 	__le32 anagrpid;
 	u8 pi_offset;
+	u16 endgid;
+	u64 runs;
 	bool is_shared;
 	bool is_readonly;
 	bool is_ready;
@@ -1613,6 +1615,7 @@ static int nvme_ns_info_from_identify(struct nvme_ctrl *ctrl,
 	info->is_shared = id->nmic & NVME_NS_NMIC_SHARED;
 	info->is_readonly = id->nsattr & NVME_NS_ATTR_RO;
 	info->is_ready = true;
+	info->endgid = le16_to_cpu(id->endgid);
 	if (ctrl->quirks & NVME_QUIRK_BOGUS_NID) {
 		dev_info(ctrl->device,
 			 "Ignoring bogus Namespace Identifiers\n");
@@ -1653,6 +1656,7 @@ static int nvme_ns_info_from_id_cs_indep(struct nvme_ctrl *ctrl,
 		info->is_ready = id->nstat & NVME_NSTAT_NRDY;
 		info->is_rotational = id->nsfeat & NVME_NS_ROTATIONAL;
 		info->no_vwc = id->nsfeat & NVME_NS_VWC_NOT_PRESENT;
+		info->endgid = le16_to_cpu(id->endgid);
 	}
 	kfree(id);
 	return ret;
@@ -2156,6 +2160,132 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
 	return ret;
 }
 
+static int nvme_query_fdp_granularity(struct nvme_ctrl *ctrl,
+				      struct nvme_ns_info *info, u8 fdp_idx)
+{
+	struct nvme_fdp_config_log hdr, *h;
+	struct nvme_fdp_config_desc *desc;
+	size_t size = sizeof(hdr);
+	void *log, *end;
+	int i, n, ret;
+
+	ret = nvme_get_log_lsi(ctrl, 0, NVME_LOG_FDP_CONFIGS, 0,
+			       NVME_CSI_NVM, &hdr, size, 0, info->endgid);
+	if (ret) {
+		dev_warn(ctrl->device,
+			 "FDP configs log header status:0x%x endgid:%d\n", ret,
+			 info->endgid);
+		return ret;
+	}
+
+	size = le32_to_cpu(hdr.sze);
+	if (size > PAGE_SIZE * MAX_ORDER_NR_PAGES) {
+		dev_warn(ctrl->device, "FDP config size too large:%zu\n",
+			 size);
+		return 0;
+	}
+
+	h = kvmalloc(size, GFP_KERNEL);
+	if (!h)
+		return -ENOMEM;
+
+	ret = nvme_get_log_lsi(ctrl, 0, NVME_LOG_FDP_CONFIGS, 0,
+			       NVME_CSI_NVM, h, size, 0, info->endgid);
+	if (ret) {
+		dev_warn(ctrl->device,
+			 "FDP configs log status:0x%x endgid:%d\n", ret,
+			 info->endgid);
+		goto out;
+	}
+
+	n = le16_to_cpu(h->numfdpc) + 1;
+	if (fdp_idx > n) {
+		dev_warn(ctrl->device, "FDP index:%d out of range:%d\n",
+			 fdp_idx, n);
+		/* Proceed without registering FDP streams */
+		ret = 0;
+		goto out;
+	}
+
+	log = h + 1;
+	desc = log;
+	end = log + size - sizeof(*h);
+	for (i = 0; i < fdp_idx; i++) {
+		log += le16_to_cpu(desc->dsze);
+		desc = log;
+		if (log >= end) {
+			dev_warn(ctrl->device,
+				 "FDP invalid config descriptor list\n");
+			ret = 0;
+			goto out;
+		}
+	}
+
+	if (le32_to_cpu(desc->nrg) > 1) {
+		dev_warn(ctrl->device, "FDP NRG > 1 not supported\n");
+		ret = 0;
+		goto out;
+	}
+
+	info->runs = le64_to_cpu(desc->runs);
+out:
+	kvfree(h);
+	return ret;
+}
+
+static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
+{
+	struct nvme_ns_head *head = ns->head;
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_fdp_ruh_status *ruhs;
+	struct nvme_fdp_config fdp;
+	struct nvme_command c = {};
+	size_t size;
+	int ret;
+
+	/*
+	 * The FDP configuration is static for the lifetime of the namespace,
+	 * so return immediately if we've already registered this namespace's
+	 * streams.
+	 */
+	if (head->nr_plids)
+		return 0;
+
+	ret = nvme_get_features(ctrl, NVME_FEAT_FDP, info->endgid, NULL, 0,
+				&fdp);
+	if (ret) {
+		dev_warn(ctrl->device, "FDP get feature status:0x%x\n", ret);
+		return ret;
+	}
+
+	if (!(fdp.flags & FDPCFG_FDPE))
+		return 0;
+
+	ret = nvme_query_fdp_granularity(ctrl, info, fdp.fdpcidx);
+	if (!info->runs)
+		return ret;
+
+	size = struct_size(ruhs, ruhsd, S8_MAX - 1);
+	ruhs = kzalloc(size, GFP_KERNEL);
+	if (!ruhs)
+		return -ENOMEM;
+
+	c.imr.opcode = nvme_cmd_io_mgmt_recv;
+	c.imr.nsid = cpu_to_le32(head->ns_id);
+	c.imr.mo = NVME_IO_MGMT_RECV_MO_RUHS;
+	c.imr.numd = cpu_to_le32(nvme_bytes_to_numd(size));
+	ret = nvme_submit_sync_cmd(ns->queue, &c, ruhs, size);
+	if (ret) {
+		dev_warn(ctrl->device, "FDP io-mgmt status:0x%x\n", ret);
+		goto free;
+	}
+
+	head->nr_plids = le16_to_cpu(ruhs->nruhsd);
+free:
+	kfree(ruhs);
+	return ret;
+}
+
 static int nvme_update_ns_info_block(struct nvme_ns *ns,
 		struct nvme_ns_info *info)
 {
@@ -2194,6 +2324,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 
 	lim = queue_limits_start_update(ns->disk->queue);
 
+	if (ns->ctrl->ctratt & NVME_CTRL_ATTR_FDPS) {
+		ret = nvme_query_fdp_info(ns, info);
+		if (ret < 0)
+			goto out;
+	}
+
 	blk_mq_freeze_queue(ns->disk->queue);
 	ns->head->lba_shift = id->lbaf[lbaf].ds;
 	ns->head->nuse = le64_to_cpu(id->nuse);
@@ -2225,6 +2361,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
 	if (!nvme_init_integrity(ns->head, &lim, info))
 		capacity = 0;
 
+	lim.max_write_streams = ns->head->nr_plids;
+	if (lim.max_write_streams)
+		lim.write_stream_granularity = max(info->runs, U32_MAX);
+	else
+		lim.write_stream_granularity = 0;
+
 	ret = queue_limits_commit_update(ns->disk->queue, &lim);
 	if (ret) {
 		blk_mq_unfreeze_queue(ns->disk->queue);
@@ -2327,6 +2469,8 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 			ns->head->disk->flags |= GENHD_FL_HIDDEN;
 		else
 			nvme_init_integrity(ns->head, &lim, info);
+		lim.max_write_streams = ns_lim->max_write_streams;
+		lim.write_stream_granularity = ns_lim->write_stream_granularity;
 		ret = queue_limits_commit_update(ns->head->disk->queue, &lim);
 
 		set_capacity_and_notify(ns->head->disk, get_capacity(ns->disk));
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9c94c1085869b..b63164cf6b274 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -496,6 +496,8 @@ struct nvme_ns_head {
 	struct device		cdev_device;
 
 	struct gendisk		*disk;
+
+	u16			nr_plids;
 #ifdef CONFIG_NVME_MULTIPATH
 	struct bio_list		requeue_list;
 	spinlock_t		requeue_lock;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCHv2 11/11] nvme: use fdp streams if write stream is provided
  2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
                   ` (9 preceding siblings ...)
  2025-02-03 18:41 ` [PATCHv2 10/11] nvme: register fdp parameters with the block layer Keith Busch
@ 2025-02-03 18:41 ` Keith Busch
  2025-02-04  6:01   ` Christoph Hellwig
  10 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2025-02-03 18:41 UTC (permalink / raw)
  To: linux-nvme, io-uring, linux-block
  Cc: linux-fsdevel, asml.silence, axboe, hch, sagi, Keith Busch,
	Hannes Reinecke, Nitesh Shetty

From: Keith Busch <[email protected]>

Maps a user requested write stream to an FDP placement ID if possible.

Reviewed-by: Hannes Reinecke <[email protected]>
Reviewed-by: Nitesh Shetty <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
---
 drivers/nvme/host/core.c | 31 ++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c8bc58b8ee3aa..d14c23916671d 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -674,6 +674,7 @@ static void nvme_free_ns_head(struct kref *ref)
 	ida_free(&head->subsys->ns_ida, head->instance);
 	cleanup_srcu_struct(&head->srcu);
 	nvme_put_subsystem(head->subsys);
+	kfree(head->plids);
 	kfree(head);
 }
 
@@ -997,6 +998,18 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	if (req->cmd_flags & REQ_RAHEAD)
 		dsmgmt |= NVME_RW_DSM_FREQ_PREFETCH;
 
+	if (op == nvme_cmd_write && ns->head->nr_plids) {
+		u16 write_stream = req->bio->bi_write_stream;
+
+		if (WARN_ON_ONCE(write_stream > ns->head->nr_plids))
+			return BLK_STS_INVAL;
+
+		if (write_stream) {
+			dsmgmt |= ns->head->plids[write_stream - 1] << 16;
+			control |= NVME_RW_DTYPE_DPLCMT;
+		}
+	}
+
 	if (req->cmd_flags & REQ_ATOMIC && !nvme_valid_atomic_write(req))
 		return BLK_STS_INVAL;
 
@@ -2241,7 +2254,7 @@ static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 	struct nvme_fdp_config fdp;
 	struct nvme_command c = {};
 	size_t size;
-	int ret;
+	int i, ret;
 
 	/*
 	 * The FDP configuration is static for the lifetime of the namespace,
@@ -2281,6 +2294,22 @@ static int nvme_query_fdp_info(struct nvme_ns *ns, struct nvme_ns_info *info)
 	}
 
 	head->nr_plids = le16_to_cpu(ruhs->nruhsd);
+	if (!head->nr_plids)
+		goto free;
+
+	head->plids = kcalloc(head->nr_plids, sizeof(head->plids),
+			      GFP_KERNEL);
+	if (!head->plids) {
+		dev_warn(ctrl->device,
+			 "failed to allocate %u FDP placement IDs\n",
+			 head->nr_plids);
+		head->nr_plids = 0;
+		ret = -ENOMEM;
+		goto free;
+	}
+
+	for (i = 0; i < head->nr_plids; i++)
+		head->plids[i] = le16_to_cpu(ruhs->ruhsd[i].pid);
 free:
 	kfree(ruhs);
 	return ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index b63164cf6b274..59ba31c48cf66 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -498,6 +498,7 @@ struct nvme_ns_head {
 	struct gendisk		*disk;
 
 	u16			nr_plids;
+	u16			*plids;
 #ifdef CONFIG_NVME_MULTIPATH
 	struct bio_list		requeue_list;
 	spinlock_t		requeue_lock;
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 06/11] io_uring: enable per-io write streams
  2025-02-03 18:41 ` [PATCHv2 06/11] io_uring: enable per-io write streams Keith Busch
@ 2025-02-04  6:00   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-02-04  6:00 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, io-uring, linux-block, linux-fsdevel, asml.silence,
	axboe, hch, sagi, Keith Busch, Hannes Reinecke, Nitesh Shetty

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 10/11] nvme: register fdp parameters with the block layer
  2025-02-03 18:41 ` [PATCHv2 10/11] nvme: register fdp parameters with the block layer Keith Busch
@ 2025-02-04  6:01   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-02-04  6:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, io-uring, linux-block, linux-fsdevel, asml.silence,
	axboe, hch, sagi, Keith Busch, Hannes Reinecke, Nitesh Shetty

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCHv2 11/11] nvme: use fdp streams if write stream is provided
  2025-02-03 18:41 ` [PATCHv2 11/11] nvme: use fdp streams if write stream is provided Keith Busch
@ 2025-02-04  6:01   ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2025-02-04  6:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, io-uring, linux-block, linux-fsdevel, asml.silence,
	axboe, hch, sagi, Keith Busch, Hannes Reinecke, Nitesh Shetty

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2025-02-04  6:01 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-03 18:41 [PATCHv2 00/11] block write streams with nvme fdp Keith Busch
2025-02-03 18:41 ` [PATCHv2 01/11] fs: add a write stream field to the kiocb Keith Busch
2025-02-03 18:41 ` [PATCHv2 02/11] block: add a bi_write_stream field Keith Busch
2025-02-03 18:41 ` [PATCHv2 03/11] block: introduce max_write_streams queue limit Keith Busch
2025-02-03 18:41 ` [PATCHv2 04/11] block: introduce a write_stream_granularity " Keith Busch
2025-02-03 18:41 ` [PATCHv2 05/11] block: expose write streams for block device nodes Keith Busch
2025-02-03 18:41 ` [PATCHv2 06/11] io_uring: enable per-io write streams Keith Busch
2025-02-04  6:00   ` Christoph Hellwig
2025-02-03 18:41 ` [PATCHv2 07/11] nvme: add a nvme_get_log_lsi helper Keith Busch
2025-02-03 18:41 ` [PATCHv2 08/11] nvme: pass a void pointer to nvme_get/set_features for the result Keith Busch
2025-02-03 18:41 ` [PATCHv2 09/11] nvme: add FDP definitions Keith Busch
2025-02-03 18:41 ` [PATCHv2 10/11] nvme: register fdp parameters with the block layer Keith Busch
2025-02-04  6:01   ` Christoph Hellwig
2025-02-03 18:41 ` [PATCHv2 11/11] nvme: use fdp streams if write stream is provided Keith Busch
2025-02-04  6:01   ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox