public inbox for [email protected]
* [PATCH v2 00/10] Read/Write with meta/integrity
@ 2024-06-26 10:06 ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta

This adds a new io_uring interface to exchange meta along with read/write.

Interface:
Meta information is represented using a newly introduced 'struct io_uring_meta'.
The application sets up an SQE128 ring and prepares io_uring_meta within the
unused portion of the SQE, populating the 'struct io_uring_meta' fields as below:

* meta_type: describes the type of meta that is passed. Currently one type,
"Integrity", is supported.
* meta_flags: meta-type-specific flags. Three flags are exposed for the
integrity type, namely INTEGRITY_CHK_GUARD/APPTAG/REFTAG.
* meta_len: length of the meta buffer
* meta_addr: address of the meta buffer
* apptag: optional application-specific 16-bit value; this goes along with the
INTEGRITY_CHK_APPTAG flag.
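
For quick reference, a minimal sketch of preparing one such SQE (condensed
from the full example program in [2]; setup and error handling omitted):

    io_uring_queue_init(8, &ring, IORING_SETUP_SQE128);
    sqe = io_uring_get_sqe(&ring);
    io_uring_prep_write(sqe, fd, data_buf, data_len, offset);
    md = (struct io_uring_meta *)sqe->cmd;
    md->meta_type = META_TYPE_INTEGRITY;
    md->meta_addr = (__u64)meta_buf;
    md->meta_len = meta_len;
    md->meta_flags = INTEGRITY_CHK_APPTAG;
    md->apptag = 0x1234;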

The block path (direct IO) and the NVMe driver are modified to support
this.

The first patch is borrowed from Mikulas' series [1] to make the metadata split
work correctly.
The next six patches (2-7) are enhancements in the block/nvme layers so that
the user meta buffer is handled correctly (mostly when it gets split).

Patch 8 adds the io_uring support.
Patch 9 adds the support for block direct IO, and patch 10 for NVMe.

An example program showing how to use the interface is appended below [2].

Tree:
https://github.com/SamsungDS/linux/tree/feat/pi_us_v2
Testing:
Done by modifying fio to use this interface:
https://github.com/samsungds/fio/commits/feat/test-meta-v3

Changes since v1:
https://lore.kernel.org/linux-block/[email protected]/
- Do not use a new opcode for meta, and add the provision to introduce new
meta types beyond integrity (Pavel)
- Stuff IOCB_HAS_META check in need_complete_io (Jens)
- Split meta handling in NVMe into a separate handler (Keith)
- Add meta handling for __blkdev_direct_IO too (Keith)
- Don't inherit BIP_COPY_USER flag for cloned bio's (Christoph)
- Better commit descriptions (Christoph)

Changes since RFC:
- modify io_uring plumbing based on recent async handling state changes
- fixes/enhancements to correctly handle the split for meta buffer
- add flags to specify guard/reftag/apptag checks
- add support to send apptag

[1] https://lore.kernel.org/linux-block/[email protected]/

[2]

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/io_uring.h>
#include <linux/types.h>
#include "liburing.h"

/* write data/meta. read both. compare. send apptag too.
* prerequisite:
* unprotected xfer: format namespace with 4KB + 8b, pi_type = 0
* protected xfer: format namespace with 4KB + 8b, pi_type = 1
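 * (with nvme-cli, something like "nvme format /dev/nvmeXnY --lbaf=N --pi=1",
 *  where N is the device-specific LBA format giving 4KB data + 8b meta)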
*/

#define DATA_LEN 4096
#define META_LEN 8

struct t10_pi_tuple {
        __be16  guard;
        __be16  apptag;
        __be32  reftag;
};

int main(int argc, char *argv[])
{
         struct io_uring ring;
         struct io_uring_sqe *sqe = NULL;
         struct io_uring_cqe *cqe = NULL;
         void *wdb, *rdb;
         char wmb[META_LEN], rmb[META_LEN];
         char *data_str = "data buffer";
         char *meta_str = "meta";
         int fd, ret, blksize;
         struct stat st;
         unsigned long long offset = DATA_LEN;
         struct t10_pi_tuple *pi;
         struct io_uring_meta *md;

         if (argc != 2) {
                 fprintf(stderr, "Usage: %s <block-device>\n", argv[0]);
                 return 1;
         }

         if (stat(argv[1], &st) == 0) {
                 blksize = (int)st.st_blksize;
         } else {
                 perror("stat");
                 return 1;
         }

         if (posix_memalign(&wdb, blksize, DATA_LEN)) {
                 fprintf(stderr, "posix_memalign failed\n");
                 return 1;
         }
         if (posix_memalign(&rdb, blksize, DATA_LEN)) {
                 fprintf(stderr, "posix_memalign failed\n");
                 return 1;
         }

         strcpy(wdb, data_str);
         memset(wmb, 0, META_LEN);
         strcpy(wmb, meta_str);

         fd = open(argv[1], O_RDWR | O_DIRECT);
         if (fd < 0) {
                 perror("open");
                 return 1;
         }

         ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQE128);
         if (ret) {
                 fprintf(stderr, "ring setup failed: %d\n", ret);
                 return 1;
         }

         /* write data + meta-buffer to device */
         sqe = io_uring_get_sqe(&ring);
         if (!sqe) {
                 fprintf(stderr, "get sqe failed\n");
                 return 1;
         }

         io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset);

         md = (struct io_uring_meta *) sqe->cmd;
         md->meta_type = META_TYPE_INTEGRITY;
         md->meta_addr = (__u64)wmb;
         md->meta_len = META_LEN;
         /* flags to ask for guard/reftag/apptag checks */
         md->meta_flags = INTEGRITY_CHK_APPTAG;
         md->apptag = 0x1234;

         /* apptag is stored big-endian on disk: 0x3412 in host order is
          * 0x1234 big-endian on a little-endian machine
          */
         pi = (struct t10_pi_tuple *)wmb;
         pi->apptag = 0x3412;

         ret = io_uring_submit(&ring);
         if (ret <= 0) {
                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                 return 1;
         }

         ret = io_uring_wait_cqe(&ring, &cqe);
         if (!cqe) {
                 fprintf(stderr, "cqe is NULL :%d\n", ret);
                 return 1;
         }
         if (cqe->res < 0) {
                 fprintf(stderr, "write cqe failure: %d", cqe->res);
                 return 1;
         }

         io_uring_cqe_seen(&ring, cqe);

         /* read data + meta-buffer back from device */
         sqe = io_uring_get_sqe(&ring);
         if (!sqe) {
                 fprintf(stderr, "get sqe failed\n");
                 return 1;
         }

         io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset);

         md = (struct io_uring_meta *) sqe->cmd;
         md->meta_type = META_TYPE_INTEGRITY;
         md->meta_addr = (__u64)rmb;
         md->meta_len = META_LEN;
         md->meta_flags = INTEGRITY_CHK_APPTAG;
         md->apptag = 0x1234;

         ret = io_uring_submit(&ring);
         if (ret <= 0) {
                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                 return 1;
         }

         ret = io_uring_wait_cqe(&ring, &cqe);
         if (!cqe) {
                 fprintf(stderr, "cqe is NULL :%d\n", ret);
                 return 1;
         }

         if (cqe->res < 0) {
                 fprintf(stderr, "read cqe failure: %d", cqe->res);
                 return 1;
         }
         io_uring_cqe_seen(&ring, cqe);

         if (strncmp(wmb, rmb, META_LEN))
                 printf("Failure: meta mismatch! wmb=%s, rmb=%s\n", wmb, rmb);

         if (strncmp(wdb, rdb, DATA_LEN))
                 printf("Failure: data mismatch!\n");

         io_uring_queue_exit(&ring);
         close(fd);
         free(rdb);
         free(wdb);
         return 0;
}

Anuj Gupta (5):
  block: set bip_vcnt correctly
  block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
  block: Handle meta bounce buffer correctly in case of split
  block: modify bio_integrity_map_user to accept iov_iter as argument
  io_uring/rw: add support to send meta along with read/write

Kanchan Joshi (4):
  block: introduce BIP_CLONED flag
  block: define meta io descriptor
  block: add support to pass user meta buffer
  nvme: add handling for user integrity buffer

Mikulas Patocka (1):
  block: change rq_integrity_vec to respect the iterator

 block/bio-integrity.c         | 75 ++++++++++++++++++++++++++-----
 block/fops.c                  | 28 +++++++++++-
 block/t10-pi.c                |  6 +++
 drivers/nvme/host/core.c      | 85 ++++++++++++++++++++++++-----------
 drivers/nvme/host/ioctl.c     | 11 ++++-
 drivers/nvme/host/pci.c       |  6 +--
 include/linux/bio.h           | 25 +++++++++--
 include/linux/blk-integrity.h | 16 +++----
 include/linux/fs.h            |  1 +
 include/uapi/linux/io_uring.h | 30 ++++++++++++-
 io_uring/io_uring.c           |  7 +++
 io_uring/rw.c                 | 68 ++++++++++++++++++++++++++--
 io_uring/rw.h                 |  9 +++-
 13 files changed, 308 insertions(+), 59 deletions(-)

-- 
2.25.1



* [PATCH v2 01/10] block: change rq_integrity_vec to respect the iterator
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi, Anuj Gupta

From: Mikulas Patocka <[email protected]>

If we allocate a bio that is larger than the NVMe maximum request size,
attach integrity metadata to it, and send it to the NVMe subsystem, the
integrity metadata will be corrupted.

Splitting the bio works correctly. The function bio_split will clone the
bio, trim the iterator of the first bio and advance the iterator of the
second bio.

However, the function rq_integrity_vec has a bug - it returns the first
vector of the bio's metadata and completely disregards the metadata
iterator that was advanced when the bio was split. Thus, the second bio
uses the same metadata as the first bio and this leads to metadata
corruption.

This commit changes rq_integrity_vec so that it calls mp_bvec_iter_bvec
instead of returning the first vector. mp_bvec_iter_bvec reads the
iterator and uses it to build a bvec for the current position in the
iterator.

The "queue_max_integrity_segments(rq->q) > 1" check was removed, because
the updated rq_integrity_vec function works correctly with multiple
segments.

Signed-off-by: Mikulas Patocka <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
Reviewed-by: Anuj Gupta <[email protected]>
---
 drivers/nvme/host/pci.c       |  6 +++---
 include/linux/blk-integrity.h | 16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 102a9fb0c65f..5d8035218de9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -826,9 +826,9 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct bio_vec bv = rq_integrity_vec(req);
 
-	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
-			rq_dma_dir(req), 0);
+	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
 		return BLK_STS_IOERR;
 	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
@@ -967,7 +967,7 @@ static __always_inline void nvme_pci_unmap_rq(struct request *req)
 	        struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
 		dma_unmap_page(dev->dev, iod->meta_dma,
-			       rq_integrity_vec(req)->bv_len, rq_dma_dir(req));
+			       rq_integrity_vec(req).bv_len, rq_dma_dir(req));
 	}
 
 	if (blk_rq_nr_phys_segments(req))
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index d201140d77a3..c58634924782 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -90,14 +90,13 @@ static inline bool blk_integrity_rq(struct request *rq)
 }
 
 /*
- * Return the first bvec that contains integrity data.  Only drivers that are
- * limited to a single integrity segment should use this helper.
+ * Return the current bvec that contains the integrity data. bip_iter may be
+ * advanced to iterate over the integrity data.
  */
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1))
-		return NULL;
-	return rq->bio->bi_integrity->bip_vec;
+	return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec,
+				 rq->bio->bi_integrity->bip_iter);
 }
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 static inline int blk_rq_count_integrity_sg(struct request_queue *q,
@@ -146,9 +145,10 @@ static inline int blk_integrity_rq(struct request *rq)
 	return 0;
 }
 
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	return NULL;
+	/* the optimizer will remove all calls to this function */
+	return (struct bio_vec){ };
 }
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
-- 
2.25.1



* [PATCH v2 02/10] block: set bip_vcnt correctly
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi

Set the bip_vcnt correctly in bio_integrity_init_user and
bio_integrity_copy_user. If the bio gets split at a later point,
this value is required to set the right bip_vcnt in the cloned bio.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
 block/bio-integrity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index dab70370b2c7..af79d9fbf413 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -276,6 +276,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 
 	bip->bip_flags |= BIP_INTEGRITY_USER | BIP_COPY_USER;
 	bip->bip_iter.bi_sector = seed;
+	bip->bip_vcnt = nr_vecs;
 	return 0;
 free_bip:
 	bio_integrity_free(bio);
@@ -297,6 +298,7 @@ static int bio_integrity_init_user(struct bio *bio, struct bio_vec *bvec,
 	bip->bip_flags |= BIP_INTEGRITY_USER;
 	bip->bip_iter.bi_sector = seed;
 	bip->bip_iter.bi_size = len;
+	bip->bip_vcnt = nr_vecs;
 	return 0;
 }
 
-- 
2.25.1



* [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi

If bio_integrity_copy_user is used to process the meta buffer, bip_max_vcnt
is one greater than bip_vcnt. In this case bip_max_vcnt vecs need to be
copied to the cloned bip.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index af79d9fbf413..5e7596b74ef1 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -647,12 +647,12 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 
 	BUG_ON(bip_src == NULL);
 
-	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
+	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_max_vcnt);
 	if (IS_ERR(bip))
 		return PTR_ERR(bip);
 
 	memcpy(bip->bip_vec, bip_src->bip_vec,
-	       bip_src->bip_vcnt * sizeof(struct bio_vec));
+	       bip_src->bip_max_vcnt * sizeof(struct bio_vec));
 
 	bip->bip_vcnt = bip_src->bip_vcnt;
 	bip->bip_iter = bip_src->bip_iter;
-- 
2.25.1



* [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta

Do not inherit BIP_COPY_USER for a cloned bio.
Also make sure that the bounce buffer is copied back to user space in its
entirety when the parent bio completes.

Signed-off-by: Anuj Gupta <[email protected]>
---
 block/bio-integrity.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 5e7596b74ef1..845d4038afb1 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -105,9 +105,12 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
 
 static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
 {
+	struct bio *bio = bip->bip_bio;
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
 	unsigned short nr_vecs = bip->bip_max_vcnt - 1;
 	struct bio_vec *copy = &bip->bip_vec[1];
-	size_t bytes = bip->bip_iter.bi_size;
+	size_t bytes = bio_integrity_bytes(bi,
+					   bvec_iter_sectors(bip->bio_iter));
 	struct iov_iter iter;
 	int ret;
 
@@ -277,6 +280,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 	bip->bip_flags |= BIP_INTEGRITY_USER | BIP_COPY_USER;
 	bip->bip_iter.bi_sector = seed;
 	bip->bip_vcnt = nr_vecs;
+	bip->bio_iter = bio->bi_iter;
 	return 0;
 free_bip:
 	bio_integrity_free(bio);
@@ -656,8 +660,8 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 
 	bip->bip_vcnt = bip_src->bip_vcnt;
 	bip->bip_iter = bip_src->bip_iter;
-	bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;
-
+	bip->bip_flags = bip_src->bip_flags & ~(BIP_BLOCK_INTEGRITY |
+						BIP_COPY_USER);
 	return 0;
 }
 
-- 
2.25.1



* [PATCH v2 05/10] block: introduce BIP_CLONED flag
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi

From: Kanchan Joshi <[email protected]>

Set the BIP_CLONED flag when a bip is cloned.
Use that flag to ensure that integrity is freed for a cloned user bip.

Note that a bio may have the BIO_CLONED flag set while still not sharing
the integrity vecs.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c | 5 ++++-
 include/linux/bio.h   | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 845d4038afb1..8f07c4d0fada 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -147,7 +147,8 @@ void bio_integrity_free(struct bio *bio)
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 	struct bio_set *bs = bio->bi_pool;
 
-	if (bip->bip_flags & BIP_INTEGRITY_USER)
+	if (bip->bip_flags & BIP_INTEGRITY_USER &&
+	    !(bip->bip_flags & BIP_CLONED))
 		return;
 	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
 		kfree(bvec_virt(bip->bip_vec));
@@ -662,6 +663,8 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 	bip->bip_iter = bip_src->bip_iter;
 	bip->bip_flags = bip_src->bip_flags & ~(BIP_BLOCK_INTEGRITY |
 						BIP_COPY_USER);
+	bip->bip_flags |= BIP_CLONED;
+
 	return 0;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 818e93612947..44226fcb4d00 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -329,6 +329,7 @@ enum bip_flags {
 	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
 	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
 	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
+	BIP_CLONED		= 1 << 7, /* Indicates that bip is cloned */
 };
 
 /*
-- 
2.25.1



* [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi

Refactor bio_integrity_map_user to accept an iov_iter as argument.
This is a prep patch.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c     | 12 +++++-------
 drivers/nvme/host/ioctl.c | 11 +++++++++--
 include/linux/bio.h       |  8 +++++---
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 8f07c4d0fada..38418be07139 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -337,17 +337,16 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages,
 	return nr_bvecs;
 }
 
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
+int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
 			   u32 seed)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	unsigned int align = q->dma_pad_mask | queue_dma_alignment(q);
 	struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages;
 	struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec;
+	size_t offset, bytes = iter->count;
 	unsigned int direction, nr_bvecs;
-	struct iov_iter iter;
 	int ret, nr_vecs;
-	size_t offset;
 	bool copy;
 
 	if (bio_integrity(bio))
@@ -360,8 +359,7 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
 	else
 		direction = ITER_SOURCE;
 
-	iov_iter_ubuf(&iter, direction, ubuf, bytes);
-	nr_vecs = iov_iter_npages(&iter, BIO_MAX_VECS + 1);
+	nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS + 1);
 	if (nr_vecs > BIO_MAX_VECS)
 		return -E2BIG;
 	if (nr_vecs > UIO_FASTIOV) {
@@ -371,8 +369,8 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
 		pages = NULL;
 	}
 
-	copy = !iov_iter_is_aligned(&iter, align, align);
-	ret = iov_iter_extract_pages(&iter, &pages, bytes, nr_vecs, 0, &offset);
+	copy = !iov_iter_is_aligned(iter, align, align);
+	ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset);
 	if (unlikely(ret < 0))
 		goto free_bvec;
 
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 8b69427a4476..e77ea1e7be03 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -152,8 +152,15 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 	if (bdev) {
 		bio_set_dev(bio, bdev);
 		if (meta_buffer && meta_len) {
-			ret = bio_integrity_map_user(bio, meta_buffer, meta_len,
-						     meta_seed);
+			struct iov_iter iter;
+			unsigned int direction;
+
+			if (bio_data_dir(bio) == READ)
+				direction = ITER_DEST;
+			else
+				direction = ITER_SOURCE;
+			iov_iter_ubuf(&iter, direction, meta_buffer, meta_len);
+			ret = bio_integrity_map_user(bio, &iter, meta_seed);
 			if (ret)
 				goto out_unmap;
 			req->cmd_flags |= REQ_INTEGRITY;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 44226fcb4d00..c90168274188 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -731,7 +731,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 	for_each_bio(_bio)						\
 		bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len, u32 seed);
+int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed);
 void bio_integrity_unmap_free_user(struct bio *bio);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
@@ -804,11 +804,13 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page,
 	return 0;
 }
 
-static inline int bio_integrity_map_user(struct bio *bio, void __user *ubuf,
-					 ssize_t len, u32 seed)
+static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
+					u32 seed)
+
 {
 	return -EINVAL;
 }
+
 static inline void bio_integrity_unmap_free_user(struct bio *bio)
 {
 }
-- 
2.25.1



* [PATCH v2 07/10] block: define meta io descriptor
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi

From: Kanchan Joshi <[email protected]>

Introduce a new 'uio_meta' structure that the upper layer can use to
pass the meta/integrity information. This is a prep patch.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/bio.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c90168274188..966e22a04996 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -332,6 +332,12 @@ enum bip_flags {
 	BIP_CLONED		= 1 << 7, /* Indicates that bip is cloned */
 };
 
+struct uio_meta {
+	u16 flags;
+	u16 apptag;
+	struct iov_iter iter;
+};
+
 /*
  * bio integrity payload
  */
-- 
2.25.1



* [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi

This patch adds the capability of sending meta along with read/write.
This meta is represented by a newly introduced 'struct io_uring_meta'
which specifies information such as meta type/flags/buffer/length and
apptag.
Application sets up a SQE128 ring, prepares io_uring_meta within the SQE
at offset pointed by sqe->cmd.
The patch processes the user-passed information to prepare uio_meta
descriptor and passes it down using kiocb->private.

Meta exchange is supported only for direct IO.
Also vectored read/write operations with meta are not supported
currently.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/fs.h            |  1 +
 include/uapi/linux/io_uring.h | 30 +++++++++++++++-
 io_uring/io_uring.c           |  7 ++++
 io_uring/rw.c                 | 68 +++++++++++++++++++++++++++++++++--
 io_uring/rw.h                 |  9 ++++-
 5 files changed, 110 insertions(+), 5 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index db26b4a70c62..0132565288c2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -330,6 +330,7 @@ struct readahead_control;
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+#define IOCB_HAS_META		(1 << 22)
 /*
  * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the
  * iocb completion can be passed back to the owner for execution from a safe
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2aaf7ee256ac..9140c66b315b 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -101,12 +101,40 @@ struct io_uring_sqe {
 		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
-		 * this field is used for 80 bytes of arbitrary command data
+		 * this field is starting offset for 80 bytes of data.
+		 * This data is opaque for uring command op. And for meta io,
+		 * this contains 'struct io_uring_meta'.
 		 */
 		__u8	cmd[0];
 	};
 };
 
+enum io_uring_sqe_meta_type_bits {
+	META_TYPE_INTEGRITY_BIT,
+	/* not a real meta type; just to make sure that we don't overflow */
+	META_TYPE_LAST_BIT,
+};
+
+/* meta type flags */
+#define META_TYPE_INTEGRITY	(1U << META_TYPE_INTEGRITY_BIT)
+
+struct io_uring_meta {
+	__u16	meta_type;
+	__u16	meta_flags;
+	__u32	meta_len;
+	__u64	meta_addr;
+	/* the next 64 bytes goes to SQE128 */
+	__u16	apptag;
+	__u8	pad[62];
+};
+
+/*
+ * flags for integrity meta
+ */
+#define INTEGRITY_CHK_GUARD	(1U << 0)	/* enforce guard check */
+#define INTEGRITY_CHK_APPTAG	(1U << 1)	/* enforce app tag check */
+#define INTEGRITY_CHK_REFTAG	(1U << 2)	/* enforce ref tag check */
+
 /*
  * If sqe->file_index is set to this for opcodes that instantiate a new
  * direct descriptor (like openat/openat2/accept), then io_uring will allocate
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7ed1e009aaec..0d26ee1193ca 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3704,6 +3704,13 @@ static int __init io_uring_init(void)
 	/* top 8bits are for internal use */
 	BUILD_BUG_ON((IORING_URING_CMD_MASK & 0xff000000) != 0);
 
+	BUILD_BUG_ON(sizeof(struct io_uring_meta) >
+		     2 * sizeof(struct io_uring_sqe) -
+		     offsetof(struct io_uring_sqe, cmd));
+
+	BUILD_BUG_ON(META_TYPE_LAST_BIT >
+		     8 * sizeof_field(struct io_uring_meta, meta_type));
+
 	io_uring_optable_init();
 
 	/*
diff --git a/io_uring/rw.c b/io_uring/rw.c
index c004d21e2f12..e8f5b5af4d2f 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -23,6 +23,8 @@
 #include "poll.h"
 #include "rw.h"
 
+#define	INTEGRITY_VALID_FLAGS (INTEGRITY_CHK_GUARD | INTEGRITY_CHK_APPTAG | \
+			       INTEGRITY_CHK_REFTAG)
 struct io_rw {
 	/* NOTE: kiocb has the file as the first member, so don't do it here */
 	struct kiocb			kiocb;
@@ -247,6 +249,42 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
 	return 0;
 }
 
+static int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe,
+			   struct io_rw *rw, int ddir)
+{
+	const struct io_uring_meta *md = (struct io_uring_meta *)sqe->cmd;
+	u16 meta_type = READ_ONCE(md->meta_type);
+	const struct io_issue_def *def;
+	struct io_async_rw *io;
+	int ret;
+
+	if (!meta_type)
+		return 0;
+	if (!(meta_type & META_TYPE_INTEGRITY))
+		return -EINVAL;
+
+	/* should fit into two bytes */
+	BUILD_BUG_ON(INTEGRITY_VALID_FLAGS >= (1 << 16));
+
+	def = &io_issue_defs[req->opcode];
+	if (def->vectored)
+		return -EOPNOTSUPP;
+
+	io = req->async_data;
+	io->meta.flags = READ_ONCE(md->meta_flags);
+	if (io->meta.flags & ~INTEGRITY_VALID_FLAGS)
+		return -EINVAL;
+
+	io->meta.apptag = READ_ONCE(md->apptag);
+	ret = import_ubuf(ddir, u64_to_user_ptr(READ_ONCE(md->meta_addr)),
+			  READ_ONCE(md->meta_len), &io->meta.iter);
+	if (unlikely(ret < 0))
+		return ret;
+	rw->kiocb.ki_flags |= IOCB_HAS_META;
+	iov_iter_save_state(&io->meta.iter, &io->iter_meta_state);
+	return ret;
+}
+
 static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		      int ddir, bool do_import)
 {
@@ -269,11 +307,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
+	rw->kiocb.ki_flags = 0;
 
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);
-	return io_prep_rw_setup(req, ddir, do_import);
+	ret = io_prep_rw_setup(req, ddir, do_import);
+
+	if (unlikely(req->ctx->flags & IORING_SETUP_SQE128 && !ret))
+		ret = io_prep_rw_meta(req, sqe, rw, ddir);
+	return ret;
 }
 
 int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -400,7 +443,10 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
 static void io_resubmit_prep(struct io_kiocb *req)
 {
 	struct io_async_rw *io = req->async_data;
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 
+	if (unlikely(rw->kiocb.ki_flags & IOCB_HAS_META))
+		iov_iter_restore(&io->meta.iter, &io->iter_meta_state);
 	iov_iter_restore(&io->iter, &io->iter_state);
 }
 
@@ -768,8 +814,12 @@ static inline int io_iter_do_read(struct io_rw *rw, struct iov_iter *iter)
 
 static bool need_complete_io(struct io_kiocb *req)
 {
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+
+	/* Exclude meta IO as we don't support partial completion for that */
 	return req->flags & REQ_F_ISREG ||
-		S_ISBLK(file_inode(req->file)->i_mode);
+		S_ISBLK(file_inode(req->file)->i_mode) ||
+		!(rw->kiocb.ki_flags & IOCB_HAS_META);
 }
 
 static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
@@ -786,7 +836,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 	if (!(req->flags & REQ_F_FIXED_FILE))
 		req->flags |= io_file_get_flags(file);
 
-	kiocb->ki_flags = file->f_iocb_flags;
+	kiocb->ki_flags |= file->f_iocb_flags;
 	ret = kiocb_set_rw_flags(kiocb, rw->flags, rw_type);
 	if (unlikely(ret))
 		return ret;
@@ -815,6 +865,14 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 		kiocb->ki_complete = io_complete_rw;
 	}
 
+	if (unlikely(kiocb->ki_flags & IOCB_HAS_META)) {
+		struct io_async_rw *io = req->async_data;
+
+		if (!(req->file->f_flags & O_DIRECT))
+			return -EOPNOTSUPP;
+		kiocb->private = &io->meta;
+	}
+
 	return 0;
 }
 
@@ -881,6 +939,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	 * manually if we need to.
 	 */
 	iov_iter_restore(&io->iter, &io->iter_state);
+	if (unlikely(kiocb->ki_flags & IOCB_HAS_META))
+		iov_iter_restore(&io->meta.iter, &io->iter_meta_state);
 
 	do {
 		/*
@@ -1091,6 +1151,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	} else {
 ret_eagain:
 		iov_iter_restore(&io->iter, &io->iter_state);
+		if (unlikely(kiocb->ki_flags & IOCB_HAS_META))
+			iov_iter_restore(&io->meta.iter, &io->iter_meta_state);
 		if (kiocb->ki_flags & IOCB_WRITE)
 			io_req_end_write(req);
 		return -EAGAIN;
diff --git a/io_uring/rw.h b/io_uring/rw.h
index 3f432dc75441..49944b539c51 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -9,7 +9,14 @@ struct io_async_rw {
 	struct iovec			fast_iov;
 	struct iovec			*free_iovec;
 	int				free_iov_nr;
-	struct wait_page_queue		wpq;
+	/* wpq is for buffered io, while meta fields are used with direct io */
+	union {
+		struct wait_page_queue		wpq;
+		struct {
+			struct uio_meta			meta;
+			struct iov_iter_state		iter_meta_state;
+		};
+	};
 };
 
 int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-- 
2.25.1



* [PATCH v2 09/10] block: add support to pass user meta buffer
@ 2024-06-26 10:06     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:06 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi, Anuj Gupta

From: Kanchan Joshi <[email protected]>

If the iocb contains the meta, extract it and prepare the bip.
Extend the bip so that it can carry three new integrity-check flags
and the application tag.

Make sure that ->prepare_fn and ->complete_fn are skipped for a
user-owned meta buffer.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c | 44 +++++++++++++++++++++++++++++++++++++++++++
 block/fops.c          | 28 ++++++++++++++++++++++++++-
 block/t10-pi.c        |  6 ++++++
 include/linux/bio.h   | 10 ++++++++++
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 38418be07139..599f39999174 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -12,6 +12,7 @@
 #include <linux/bio.h>
 #include <linux/workqueue.h>
 #include <linux/slab.h>
+#include <uapi/linux/io_uring.h>
 #include "blk.h"
 
 static struct kmem_cache *bip_slab;
@@ -337,6 +338,49 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages,
 	return nr_bvecs;
 }
 
+static void bio_uio_meta_to_bip(struct bio *bio, struct uio_meta *meta)
+{
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	u16 bip_flags = 0;
+
+	if (meta->flags & INTEGRITY_CHK_GUARD)
+		bip_flags |= BIP_USER_CHK_GUARD;
+	if (meta->flags & INTEGRITY_CHK_APPTAG)
+		bip_flags |= BIP_USER_CHK_APPTAG;
+	if (meta->flags & INTEGRITY_CHK_REFTAG)
+		bip_flags |= BIP_USER_CHK_REFTAG;
+
+	bip->bip_flags |= bip_flags;
+	bip->apptag = meta->apptag;
+}
+
+int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta)
+{
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+	unsigned int integrity_bytes;
+	int ret;
+	struct iov_iter it;
+
+	if (!bi)
+		return -EINVAL;
+	/*
+	 * original meta iterator can be bigger.
+	 * process integrity info corresponding to current data buffer only.
+	 */
+	it = meta->iter;
+	integrity_bytes = bio_integrity_bytes(bi, bio_sectors(bio));
+	if (it.count < integrity_bytes)
+		return -EINVAL;
+
+	it.count = integrity_bytes;
+	ret = bio_integrity_map_user(bio, &it, 0);
+	if (!ret) {
+		bio_uio_meta_to_bip(bio, meta);
+		iov_iter_advance(&meta->iter, integrity_bytes);
+	}
+	return ret;
+}
+
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
 			   u32 seed)
 {
diff --git a/block/fops.c b/block/fops.c
index be36c9fbd500..6477424b4ebc 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -126,12 +126,13 @@ static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
 	bool should_dirty = dio->flags & DIO_SHOULD_DIRTY;
+	bool is_async = !(dio->flags & DIO_IS_SYNC);
 
 	if (bio->bi_status && !dio->bio.bi_status)
 		dio->bio.bi_status = bio->bi_status;
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (!(dio->flags & DIO_IS_SYNC)) {
+		if (is_async) {
 			struct kiocb *iocb = dio->iocb;
 			ssize_t ret;
 
@@ -154,6 +155,9 @@ static void blkdev_bio_end_io(struct bio *bio)
 		}
 	}
 
+	if (is_async && (dio->iocb->ki_flags & IOCB_HAS_META))
+		bio_integrity_unmap_free_user(bio);
+
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {
@@ -231,6 +235,16 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			}
 			bio->bi_opf |= REQ_NOWAIT;
 		}
+		if (!is_sync && unlikely(iocb->ki_flags & IOCB_HAS_META)) {
+			ret = bio_integrity_map_iter(bio, iocb->private);
+			if (unlikely(ret)) {
+				bio_release_pages(bio, false);
+				bio_clear_flag(bio, BIO_REFFED);
+				bio_put(bio);
+				blk_finish_plug(&plug);
+				return ret;
+			}
+		}
 
 		if (is_read) {
 			if (dio->flags & DIO_SHOULD_DIRTY)
@@ -288,6 +302,9 @@ static void blkdev_bio_end_io_async(struct bio *bio)
 		ret = blk_status_to_errno(bio->bi_status);
 	}
 
+	if (unlikely(iocb->ki_flags & IOCB_HAS_META))
+		bio_integrity_unmap_free_user(bio);
+
 	iocb->ki_complete(iocb, ret);
 
 	if (dio->flags & DIO_SHOULD_DIRTY) {
@@ -348,6 +365,15 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		task_io_account_write(bio->bi_iter.bi_size);
 	}
 
+	if (unlikely(iocb->ki_flags & IOCB_HAS_META)) {
+		ret = bio_integrity_map_iter(bio, iocb->private);
+		WRITE_ONCE(iocb->private, NULL);
+		if (unlikely(ret)) {
+			bio_put(bio);
+			return ret;
+		}
+	}
+
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio->bi_opf |= REQ_ATOMIC;
 
diff --git a/block/t10-pi.c b/block/t10-pi.c
index cd7fa60d63ff..38c3da245b11 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -131,6 +131,8 @@ static void t10_pi_type1_prepare(struct request *rq)
 		/* Already remapped? */
 		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
 			break;
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
@@ -180,6 +182,8 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 		struct bio_vec iv;
 		struct bvec_iter iter;
 
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
 			void *p;
@@ -305,6 +309,8 @@ static void ext_pi_type1_prepare(struct request *rq)
 		/* Already remapped? */
 		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
 			break;
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 966e22a04996..ff22b627906d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -330,6 +330,9 @@ enum bip_flags {
 	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
 	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
 	BIP_CLONED		= 1 << 7, /* Indicates that bip is cloned */
+	BIP_USER_CHK_GUARD	= 1 << 8,
+	BIP_USER_CHK_APPTAG	= 1 << 9,
+	BIP_USER_CHK_REFTAG	= 1 << 10,
 };
 
 struct uio_meta {
@@ -349,6 +352,7 @@ struct bio_integrity_payload {
 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
 	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
 	unsigned short		bip_flags;	/* control flags */
+	u16			apptag;		/* apptag */
 
 	struct bvec_iter	bio_iter;	/* for rewinding parent bio */
 
@@ -738,6 +742,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 		bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed);
+int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta);
 void bio_integrity_unmap_free_user(struct bio *bio);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
@@ -817,6 +822,11 @@ static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
 	return -EINVAL;
 }
 
+static inline int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta)
+{
+	return -EINVAL;
+}
+
 static inline void bio_integrity_unmap_free_user(struct bio *bio)
 {
 }
-- 
2.25.1



* [PATCH v2 10/10] nvme: add handling for user integrity buffer
@ 2024-06-26 10:07     ` Anuj Gupta
From: Anuj Gupta @ 2024-06-26 10:07 UTC
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi

From: Kanchan Joshi <[email protected]>

Create a new helper that contains the handling for both kernel- and
user-generated integrity buffers.
For a user-provided integrity buffer, convert the bip flags
(guard/reftag/apptag checks) to protocol-specific flags. Also pass the
apptag and reftag down.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 drivers/nvme/host/core.c | 85 ++++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ab0429644fe3..d17428a2b1dd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -870,6 +870,13 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	return BLK_STS_OK;
 }
 
+static void nvme_set_app_tag(struct nvme_command *cmnd, u16 apptag)
+{
+	cmnd->rw.apptag = cpu_to_le16(apptag);
+	/* use 0xffff as mask so that apptag is used in entirety */
+	cmnd->rw.appmask = cpu_to_le16(0xffff);
+}
+
 static void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
 			      struct request *req)
 {
@@ -927,6 +934,55 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 	return BLK_STS_OK;
 }
 
+static blk_status_t nvme_setup_rw_meta(struct nvme_ns *ns, struct request *req,
+				      struct nvme_command *cmnd, u16 *control,
+				      enum nvme_opcode op)
+{
+	struct bio_integrity_payload *bip = bio_integrity(req->bio);
+
+	if (!bip || !(bip->bip_flags & BIP_INTEGRITY_USER)) {
+		/*
+		 * If formated with metadata, the block layer always provides a
+		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled.  Else
+		 * we enable the PRACT bit for protection information or set the
+		 * namespace capacity to zero to prevent any I/O.
+		 */
+		if (!blk_integrity_rq(req)) {
+			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
+				return BLK_STS_NOTSUPP;
+			*control |= NVME_RW_PRINFO_PRACT;
+		}
+
+		switch (ns->head->pi_type) {
+		case NVME_NS_DPS_PI_TYPE3:
+			*control |= NVME_RW_PRINFO_PRCHK_GUARD;
+			break;
+		case NVME_NS_DPS_PI_TYPE1:
+		case NVME_NS_DPS_PI_TYPE2:
+			*control |= NVME_RW_PRINFO_PRCHK_GUARD |
+					NVME_RW_PRINFO_PRCHK_REF;
+			if (op == nvme_cmd_zone_append)
+				*control |= NVME_RW_APPEND_PIREMAP;
+			nvme_set_ref_tag(ns, cmnd, req);
+			break;
+		}
+	} else {
+		unsigned short bip_flags = bip->bip_flags;
+
+		if (bip_flags & BIP_USER_CHK_GUARD)
+			*control |= NVME_RW_PRINFO_PRCHK_GUARD;
+		if (bip_flags & BIP_USER_CHK_REFTAG) {
+			*control |= NVME_RW_PRINFO_PRCHK_REF;
+			nvme_set_ref_tag(ns, cmnd, req);
+		}
+		if (bip_flags & BIP_USER_CHK_APPTAG) {
+			*control |= NVME_RW_PRINFO_PRCHK_APP;
+			nvme_set_app_tag(cmnd, bip->apptag);
+		}
+	}
+	return 0;
+}
+
 /*
  * NVMe does not support a dedicated command to issue an atomic write. A write
  * which does adhere to the device atomic limits will silently be executed
@@ -963,6 +1019,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 {
 	u16 control = 0;
 	u32 dsmgmt = 0;
+	blk_status_t ret;
 
 	if (req->cmd_flags & REQ_FUA)
 		control |= NVME_RW_FUA;
@@ -990,31 +1047,9 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	cmnd->rw.appmask = 0;
 
 	if (ns->head->ms) {
-		/*
-		 * If formated with metadata, the block layer always provides a
-		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled.  Else
-		 * we enable the PRACT bit for protection information or set the
-		 * namespace capacity to zero to prevent any I/O.
-		 */
-		if (!blk_integrity_rq(req)) {
-			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
-				return BLK_STS_NOTSUPP;
-			control |= NVME_RW_PRINFO_PRACT;
-		}
-
-		switch (ns->head->pi_type) {
-		case NVME_NS_DPS_PI_TYPE3:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD;
-			break;
-		case NVME_NS_DPS_PI_TYPE1:
-		case NVME_NS_DPS_PI_TYPE2:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD |
-					NVME_RW_PRINFO_PRCHK_REF;
-			if (op == nvme_cmd_zone_append)
-				control |= NVME_RW_APPEND_PIREMAP;
-			nvme_set_ref_tag(ns, cmnd, req);
-			break;
-		}
+		ret = nvme_setup_rw_meta(ns, req, cmnd, &control, op);
+		if (unlikely(ret))
+			return ret;
 	}
 
 	cmnd->rw.control = cpu_to_le16(control);
-- 
2.25.1



* Re: [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write
  2024-06-26 10:06     ` [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write Anuj Gupta
@ 2024-06-26 17:17       ` Gabriel Krisman Bertazi
From: Gabriel Krisman Bertazi @ 2024-06-26 17:17 UTC
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

Anuj Gupta <[email protected]> writes:

> This patch adds the capability of sending meta along with read/write.
> This meta is represented by a newly introduced 'struct io_uring_meta'
> which specifies information such as meta type/flags/buffer/length and
> apptag.
> Application sets up a SQE128 ring, prepares io_uring_meta within the SQE
> at offset pointed by sqe->cmd.
> The patch processes the user-passed information to prepare uio_meta
> descriptor and passes it down using kiocb->private.
>
> Meta exchange is supported only for direct IO.
> Also vectored read/write operations with meta are not supported
> currently.
>
> Signed-off-by: Anuj Gupta <[email protected]>
> Signed-off-by: Kanchan Joshi <[email protected]>
> ---
>  include/linux/fs.h            |  1 +
>  include/uapi/linux/io_uring.h | 30 +++++++++++++++-
>  io_uring/io_uring.c           |  7 ++++
>  io_uring/rw.c                 | 68 +++++++++++++++++++++++++++++++++--
>  io_uring/rw.h                 |  9 ++++-
>  5 files changed, 110 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index db26b4a70c62..0132565288c2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -330,6 +330,7 @@ struct readahead_control;
>  #define IOCB_NOIO		(1 << 20)
>  /* can use bio alloc cache */
>  #define IOCB_ALLOC_CACHE	(1 << 21)
> +#define IOCB_HAS_META		(1 << 22)
>  /*
>   * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the
>   * iocb completion can be passed back to the owner for execution from a safe
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index 2aaf7ee256ac..9140c66b315b 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -101,12 +101,40 @@ struct io_uring_sqe {
>  		__u64	optval;
>  		/*
>  		 * If the ring is initialized with IORING_SETUP_SQE128, then
> -		 * this field is used for 80 bytes of arbitrary command data
> +		 * this field is starting offset for 80 bytes of data.
> +		 * This data is opaque for uring command op. And for meta io,
> +		 * this contains 'struct io_uring_meta'.
>  		 */
>  		__u8	cmd[0];
>  	};
>  };
>  
> +enum io_uring_sqe_meta_type_bits {
> +	META_TYPE_INTEGRITY_BIT,
> +	/* not a real meta type; just to make sure that we don't overflow */
> +	META_TYPE_LAST_BIT,
> +};
> +
> +/* meta type flags */
> +#define META_TYPE_INTEGRITY	(1U << META_TYPE_INTEGRITY_BIT)
> +
> +struct io_uring_meta {
> +	__u16	meta_type;
> +	__u16	meta_flags;
> +	__u32	meta_len;
> +	__u64	meta_addr;
> +	/* the next 64 bytes goes to SQE128 */
> +	__u16	apptag;
> +	__u8	pad[62];
> +};
> +
> +/*
> + * flags for integrity meta
> + */
> +#define INTEGRITY_CHK_GUARD	(1U << 0)	/* enforce guard check */
> +#define INTEGRITY_CHK_APPTAG	(1U << 1)	/* enforce app tag check */
> +#define INTEGRITY_CHK_REFTAG	(1U << 2)	/* enforce ref tag check */
> +
>  /*
>   * If sqe->file_index is set to this for opcodes that instantiate a new
>   * direct descriptor (like openat/openat2/accept), then io_uring will allocate
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 7ed1e009aaec..0d26ee1193ca 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3704,6 +3704,13 @@ static int __init io_uring_init(void)
>  	/* top 8bits are for internal use */
>  	BUILD_BUG_ON((IORING_URING_CMD_MASK & 0xff000000) != 0);
>  
> +	BUILD_BUG_ON(sizeof(struct io_uring_meta) >
> +		     2 * sizeof(struct io_uring_sqe) -
> +		     offsetof(struct io_uring_sqe, cmd));
> +
> +	BUILD_BUG_ON(META_TYPE_LAST_BIT >
> +		     8 * sizeof_field(struct io_uring_meta, meta_type));
> +
>  	io_uring_optable_init();
>  
>  	/*
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index c004d21e2f12..e8f5b5af4d2f 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -23,6 +23,8 @@
>  #include "poll.h"
>  #include "rw.h"
>  
> +#define	INTEGRITY_VALID_FLAGS (INTEGRITY_CHK_GUARD | INTEGRITY_CHK_APPTAG | \
> +			       INTEGRITY_CHK_REFTAG)
>  struct io_rw {
>  	/* NOTE: kiocb has the file as the first member, so don't do it here */
>  	struct kiocb			kiocb;
> @@ -247,6 +249,42 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
>  	return 0;
>  }
>  
> +static int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> +			   struct io_rw *rw, int ddir)
> +{
> +	const struct io_uring_meta *md = (struct io_uring_meta *)sqe->cmd;
> +	u16 meta_type = READ_ONCE(md->meta_type);
> +	const struct io_issue_def *def;
> +	struct io_async_rw *io;
> +	int ret;
> +
> +	if (!meta_type)
> +		return 0;
> +	if (!(meta_type & META_TYPE_INTEGRITY))
> +		return -EINVAL;
> +
> +	/* should fit into two bytes */
> +	BUILD_BUG_ON(INTEGRITY_VALID_FLAGS >= (1 << 16));
> +
> +	def = &io_issue_defs[req->opcode];
> +	if (def->vectored)
> +		return -EOPNOTSUPP;
> +
> +	io = req->async_data;
> +	io->meta.flags = READ_ONCE(md->meta_flags);
> +	if (io->meta.flags & ~INTEGRITY_VALID_FLAGS)
> +		return -EINVAL;
> +
> +	io->meta.apptag = READ_ONCE(md->apptag);
> +	ret = import_ubuf(ddir, u64_to_user_ptr(READ_ONCE(md->meta_addr)),
> +			  READ_ONCE(md->meta_len), &io->meta.iter);
> +	if (unlikely(ret < 0))
> +		return ret;
> +	rw->kiocb.ki_flags |= IOCB_HAS_META;
> +	iov_iter_save_state(&io->meta.iter, &io->iter_meta_state);
> +	return ret;
> +}
> +
>  static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  		      int ddir, bool do_import)
>  {
> @@ -269,11 +307,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  		rw->kiocb.ki_ioprio = get_current_ioprio();
>  	}
>  	rw->kiocb.dio_complete = NULL;
> +	rw->kiocb.ki_flags = 0;
>  
>  	rw->addr = READ_ONCE(sqe->addr);
>  	rw->len = READ_ONCE(sqe->len);
>  	rw->flags = READ_ONCE(sqe->rw_flags);
> -	return io_prep_rw_setup(req, ddir, do_import);
> +	ret = io_prep_rw_setup(req, ddir, do_import);
> +
> +	if (unlikely(req->ctx->flags & IORING_SETUP_SQE128 && !ret))
> +		ret = io_prep_rw_meta(req, sqe, rw, ddir);
> +	return ret;

Would it be useful to have a flag to differentiate a malformed SQE from
an SQE with io_uring_meta, instead of assuming sqe->cmd has it? We don't
check for addr3 at the moment and, unlike uring_cmd, userspace will be
mixing write commands with and without metadata to different files, so
it would be useful to catch that.

Also, just styling, but can you turn that !ret into a separate if leg?
We are bound to add more code here eventually, and the next patch to
touch it will end up doing it anyway. I mean:

ret = io_prep_rw_setup(req, ddir, do_import);
if (unlikely(ret))
	return ret;
if (unlikely(req->ctx->flags & IORING_SETUP_SQE128))
	ret = io_prep_rw_meta(req, sqe, rw, ddir);
return ret;

-- 
Gabriel Krisman Bertazi


* Re: [PATCH v2 00/10] Read/Write with meta/integrity
  2024-06-26 10:06 ` [PATCH v2 00/10] Read/Write with meta/integrity Anuj Gupta
@ 2024-06-27  6:05   ` Christoph Hellwig
From: Christoph Hellwig @ 2024-06-27  6:05 UTC
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block

What tree does this apply to?  There are quite a few rejects vs
for-6.11/block.



* Re: [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
  2024-06-26 10:06     ` [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone Anuj Gupta
@ 2024-06-27  6:14       ` Christoph Hellwig
From: Christoph Hellwig @ 2024-06-27  6:14 UTC
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 03:36:53PM +0530, Anuj Gupta wrote:
> If bio_integrity_copy_user is used to process the meta buffer, bip_max_vcnt
> is one greater than bip_vcnt.

Why?

> @@ -647,12 +647,12 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
>  
>  	BUG_ON(bip_src == NULL);
>  
> -	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
> +	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_max_vcnt);
>  	if (IS_ERR(bip))
>  		return PTR_ERR(bip);
>  
>  	memcpy(bip->bip_vec, bip_src->bip_vec,
> -	       bip_src->bip_vcnt * sizeof(struct bio_vec));
> +	       bip_src->bip_max_vcnt * sizeof(struct bio_vec));
>  
>  	bip->bip_vcnt = bip_src->bip_vcnt;
>  	bip->bip_iter = bip_src->bip_iter;

Comparing this to how the bio data path is cloned, it seems
like bio_integrity_clone is still copying the bvec array.  With the
concept of the immutable bvecs we've had for years this should not
be required.  That is, reuse the actual bio_vecs and just copy
the iter, similar to bio_alloc_clone/__bio_clone.
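
Something like this untested sketch, which glosses over how bip_vec
ownership would then interact with bio_integrity_free:

static int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
			       gfp_t gfp_mask)
{
	struct bio_integrity_payload *bip_src = bio_integrity(bio_src);
	struct bio_integrity_payload *bip;

	bip = bio_integrity_alloc(bio, gfp_mask, 0);
	if (IS_ERR(bip))
		return PTR_ERR(bip);

	/* share the source bvec array, clone only the iterator state */
	bip->bip_vec = bip_src->bip_vec;
	bip->bip_iter = bip_src->bip_iter;
	bip->bip_flags = bip_src->bip_flags;
	return 0;
}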

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split
  2024-06-26 10:06     ` [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split Anuj Gupta
@ 2024-06-27  6:16       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:16 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block

On Wed, Jun 26, 2024 at 03:36:54PM +0530, Anuj Gupta wrote:
> @@ -105,9 +105,12 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
>  
>  static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
>  {
> +	struct bio *bio = bip->bip_bio;
> +	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
>  	unsigned short nr_vecs = bip->bip_max_vcnt - 1;
>  	struct bio_vec *copy = &bip->bip_vec[1];
> -	size_t bytes = bip->bip_iter.bi_size;
> +	size_t bytes = bio_integrity_bytes(bi,
> +					   bvec_iter_sectors(bip->bio_iter));

Maybe add a well-documented helper that calculates the metadata bytes
based on the iter, given that this is probably going to become more
common now that we're doing proper cloning?
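
Something along these lines, with an invented name:

/*
 * Integrity bytes covered by the data range that @iter currently
 * describes.
 */
static inline size_t bio_integrity_iter_bytes(struct blk_integrity *bi,
					      const struct bvec_iter *iter)
{
	return bio_integrity_bytes(bi, bvec_iter_sectors(*iter));
}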

> -	bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;
> -
> +	bip->bip_flags = bip_src->bip_flags & ~(BIP_BLOCK_INTEGRITY |
> +						BIP_COPY_USER);

We're probably better off saying which flags should be cloned rather
than which ones should not, preferably with a new #define in bio.h.
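
E.g. something like the below; BIP_CLONE_FLAGS is a made-up name and
the exact set is up for discussion:

/* bip flags that stay meaningful in a cloned payload */
#define BIP_CLONE_FLAGS	(BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \
			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM)

with the clone side then doing:

	bip->bip_flags = bip_src->bip_flags & BIP_CLONE_FLAGS;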


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 05/10] block: introduce BIP_CLONED flag
  2024-06-26 10:06     ` [PATCH v2 05/10] block: introduce BIP_CLONED flag Anuj Gupta
@ 2024-06-27  6:21       ` Christoph Hellwig
  2024-06-27 12:09         ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:21 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 03:36:55PM +0530, Anuj Gupta wrote:
> From: Kanchan Joshi <[email protected]>
> 
> Set the BIP_CLONED flag when bip is cloned.
> Use that flag to ensure that integrity is freed for cloned user bip.
> 
> Note that a bio may have BIO_CLONED flag set but it may still not be
> sharing the integrity vecs.

The design principle of the immutable bio_vecs for the data path
is that BIO_CLONED is just a debug aid and no code should check it.
I'd much prefer to keep that invariant for metadata.

> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 845d4038afb1..8f07c4d0fada 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -147,7 +147,8 @@ void bio_integrity_free(struct bio *bio)
>  	struct bio_integrity_payload *bip = bio_integrity(bio);
>  	struct bio_set *bs = bio->bi_pool;
>  
> -	if (bip->bip_flags & BIP_INTEGRITY_USER)
> +	if (bip->bip_flags & BIP_INTEGRITY_USER &&
> +	    !(bip->bip_flags & BIP_CLONED))
>  		return;
>  	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
>  		kfree(bvec_virt(bip->bip_vec));

... and the right way to approach this is to clean up the mess
that we have in bio_integrity_free, which probably needs to be split up
to deal with the different cases:

 - for block layer auto-generated bip_vecs we need it called where it is
   right now, but that side can now unconditionally free the data
   pointed to by the bip_vec
 - for callers that supply PI data themselves, including from user space,
   the caller needs to call __bio_integrity_free and clear
   bi_integrity and REQ_INTEGRITY

this is probably best done by moving the bip_flags checks out of
bio_integrity_free, having bio_integrity_free just do the
unconditional freeing, and adding a new helper for
__bio_integrity_endio / bio_integrity_verify_fn that also
frees the payload.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 07/10] block: define meta io descriptor
  2024-06-26 10:06     ` [PATCH v2 07/10] block: define meta io descriptor Anuj Gupta
@ 2024-06-27  6:22       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:22 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

> +struct uio_meta {
> +	u16 flags;
> +	u16 apptag;
> +	struct iov_iter iter;
> +};

Everything else in the kernel uses app_tag instead of apptag; maybe
follow that here.

What flags go into flags?  Should this be a __bitwise type?
Also bio.h is used by every file system and all block drivers.
Should this be in bio-integrity.h instead?
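
For the __bitwise variant, e.g. (names made up here, following the
blk_opf_t pattern):

typedef u16 __bitwise uio_meta_flags_t;

#define UIO_META_CHK_GUARD	((__force uio_meta_flags_t)(1 << 0))
#define UIO_META_CHK_APPTAG	((__force uio_meta_flags_t)(1 << 1))
#define UIO_META_CHK_REFTAG	((__force uio_meta_flags_t)(1 << 2))

struct uio_meta {
	uio_meta_flags_t flags;
	u16 app_tag;
	struct iov_iter iter;
};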


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument
  2024-06-26 10:06     ` [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
@ 2024-06-27  6:23       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:23 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 10/10] nvme: add handling for user integrity buffer
  2024-06-26 10:07     ` [PATCH v2 10/10] nvme: add handling for user integrity buffer Anuj Gupta
@ 2024-06-27  6:29       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:29 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 03:37:00PM +0530, Anuj Gupta wrote:
> From: Kanchan Joshi <[email protected]>
> 
> Create a new helper that contains the handling for both kernel and user
> generated integrity buffer.
> For user provided integrity buffer, convert bip flags
> (guard/reftag/apptag checks) to protocol specific flags. Also pass
> apptag and reftag down.

The driver should not have to know about the source.

> +static void nvme_set_app_tag(struct nvme_command *cmnd, u16 apptag)
> +{
> +	cmnd->rw.apptag = cpu_to_le16(apptag);
> +	/* use 0xfff as mask so that apptag is used in entirety*/

Missing space before the closing comment.  But I think this also makes
sense as:

	/* use the entire application tag */

> +	if (!bip || !(bip->bip_flags & BIP_INTEGRITY_USER)) {
> +		/*
> +		 * If formated with metadata, the block layer always provides a
> +		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled.  Else
> +		 * we enable the PRACT bit for protection information or set the
> +		 * namespace capacity to zero to prevent any I/O.
> +		 */
> +		if (!blk_integrity_rq(req)) {
> +			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
> +				return BLK_STS_NOTSUPP;
> +			*control |= NVME_RW_PRINFO_PRACT;
> +		}

This feels like the wrong level of conditionals.  !bip implies
!blk_integrity_rq(req) already.
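
Assuming nothing else needs to happen in that branch, it could flatten
to something like:

	if (!blk_integrity_rq(req)) {
		/* no metadata attached: PRACT, or no I/O at all */
		if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
			return BLK_STS_NOTSUPP;
		*control |= NVME_RW_PRINFO_PRACT;
		return BLK_STS_OK;
	}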

> +	} else {
> +		unsigned short bip_flags = bip->bip_flags;
> +
> +		if (bip_flags & BIP_USER_CHK_GUARD)
> +			*control |= NVME_RW_PRINFO_PRCHK_GUARD;
> +		if (bip_flags & BIP_USER_CHK_REFTAG) {
> +			*control |= NVME_RW_PRINFO_PRCHK_REF;
> +			nvme_set_ref_tag(ns, cmnd, req);
> +		}
> +		if (bip_flags & BIP_USER_CHK_APPTAG) {
> +			*control |= NVME_RW_PRINFO_PRCHK_APP;
> +			nvme_set_app_tag(cmnd, bip->apptag);
> +		}

But except for that, the driver should always rely on the actual
flags passed by the block layer instead of having to check whether the
data comes from user passthrough.  Also, it seems like this series fails
to update the SCSI code to account for these new flags.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 05/10] block: introduce BIP_CLONED flag
  2024-06-27  6:21       ` Christoph Hellwig
@ 2024-06-27 12:09         ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27 12:09 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

On Thu, Jun 27, 2024 at 08:21:42AM +0200, Christoph Hellwig wrote:
> this is probably best done by moving the bip_flags checks out of
> bio_integrity_free and have bio_integrity_free just do the
> unconditional freeing, and have a new helper for
> __bio_integrity_endio / bio_integrity_verify_fn to also
> free the payload.

Something like the patch below, against my "integrity cleanups" series
from yesterday.  Lightly tested.

---
From f312de8ae5e329569c4810ddf977195e997e03ec Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Thu, 27 Jun 2024 13:25:28 +0200
Subject: block: don't free submitter owned integrity payload on I/O completion

Currently __bio_integrity_endio unconditionally frees the integrity
payload.  While this works really well for block-layer generated
integrity payloads, it is a bad idea for those passed in by the
submitter, as it can't access the integrity data from the I/O completion
handler.

Change bio_integrity_endio to only call __bio_integrity_endio for
block layer generated integrity data, and leave freeing of submitter
allocated integrity data to bio_uninit which also gets called from
the final bio_put.  This requires that unmapping user mapped or copied
integrity data is done by the caller now.

Signed-off-by: Christoph Hellwig <[email protected]>
---
 block/bio-integrity.c | 68 ++++++++++++++++++++-----------------------
 block/blk-map.c       |  3 ++
 block/blk.h           |  4 ++-
 include/linux/bio.h   |  4 +--
 4 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 20bbfd5730dadd..d21ad6624f0062 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -22,9 +22,17 @@ void blk_flush_integrity(void)
 	flush_workqueue(kintegrityd_wq);
 }
 
-static void __bio_integrity_free(struct bio_set *bs,
-				 struct bio_integrity_payload *bip)
+/**
+ * bio_integrity_free - Free bio integrity payload
+ * @bio:	bio containing bip to be freed
+ *
+ * Description: Free the integrity portion of a bio.
+ */
+void bio_integrity_free(struct bio *bio)
 {
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	struct bio_set *bs = bio->bi_pool;
+
 	if (bs && mempool_initialized(&bs->bio_integrity_pool)) {
 		if (bip->bip_vec)
 			bvec_free(&bs->bvec_integrity_pool, bip->bip_vec,
@@ -33,6 +41,8 @@ static void __bio_integrity_free(struct bio_set *bs,
 	} else {
 		kfree(bip);
 	}
+	bio->bi_integrity = NULL;
+	bio->bi_opf &= ~REQ_INTEGRITY;
 }
 
 /**
@@ -49,19 +59,21 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 						  gfp_t gfp_mask,
 						  unsigned int nr_vecs)
 {
-	struct bio_integrity_payload *bip;
 	struct bio_set *bs = bio->bi_pool;
+	bool has_mempool = bs && mempool_initialized(&bs->bio_integrity_pool);
+	struct bio_integrity_payload *bip;
 	unsigned inline_vecs;
 
 	if (WARN_ON_ONCE(bio_has_crypt_ctx(bio)))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	if (!bs || !mempool_initialized(&bs->bio_integrity_pool)) {
-		bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs), gfp_mask);
-		inline_vecs = nr_vecs;
-	} else {
+	if (has_mempool) {
 		bip = mempool_alloc(&bs->bio_integrity_pool, gfp_mask);
 		inline_vecs = BIO_INLINE_VECS;
+	} else {
+		bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs),
+				gfp_mask);
+		inline_vecs = nr_vecs;
 	}
 
 	if (unlikely(!bip))
@@ -86,7 +98,10 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 
 	return bip;
 err:
-	__bio_integrity_free(bs, bip);
+	if (has_mempool)
+		mempool_free(bip, &bs->bio_integrity_pool);
+	else
+		kfree(bip);
 	return ERR_PTR(-ENOMEM);
 }
 EXPORT_SYMBOL(bio_integrity_alloc);
@@ -118,9 +133,10 @@ static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
 	bio_integrity_unpin_bvec(copy, nr_vecs, true);
 }
 
-static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
+void bio_integrity_unmap_user(struct bio *bio)
 {
-	bool dirty = bio_data_dir(bip->bip_bio) == READ;
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	bool dirty = bio_data_dir(bio) == READ;
 
 	if (bip->bip_flags & BIP_COPY_USER) {
 		if (dirty)
@@ -131,28 +147,7 @@ static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
 
 	bio_integrity_unpin_bvec(bip->bip_vec, bip->bip_max_vcnt, dirty);
 }
-
-/**
- * bio_integrity_free - Free bio integrity payload
- * @bio:	bio containing bip to be freed
- *
- * Description: Used to free the integrity portion of a bio. Usually
- * called from bio_free().
- */
-void bio_integrity_free(struct bio *bio)
-{
-	struct bio_integrity_payload *bip = bio_integrity(bio);
-	struct bio_set *bs = bio->bi_pool;
-
-	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
-		kfree(bvec_virt(bip->bip_vec));
-	else if (bip->bip_flags & BIP_INTEGRITY_USER)
-		bio_integrity_unmap_user(bip);
-
-	__bio_integrity_free(bs, bip);
-	bio->bi_integrity = NULL;
-	bio->bi_opf &= ~REQ_INTEGRITY;
-}
+EXPORT_SYMBOL_GPL(bio_integrity_unmap_user);
 
 /**
  * bio_integrity_add_page - Attach integrity metadata
@@ -252,7 +247,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 		goto free_bip;
 	}
 
-	bip->bip_flags |= BIP_INTEGRITY_USER | BIP_COPY_USER;
+	bip->bip_flags |= BIP_COPY_USER;
 	bip->bip_iter.bi_sector = seed;
 	return 0;
 free_bip:
@@ -272,7 +267,6 @@ static int bio_integrity_init_user(struct bio *bio, struct bio_vec *bvec,
 		return PTR_ERR(bip);
 
 	memcpy(bip->bip_vec, bvec, nr_vecs * sizeof(*bvec));
-	bip->bip_flags |= BIP_INTEGRITY_USER;
 	bip->bip_iter.bi_sector = seed;
 	bip->bip_iter.bi_size = len;
 	return 0;
@@ -479,6 +473,8 @@ static void bio_integrity_verify_fn(struct work_struct *work)
 	struct bio *bio = bip->bip_bio;
 
 	blk_integrity_verify(bio);
+
+	kfree(bvec_virt(bip->bip_vec));
 	bio_integrity_free(bio);
 	bio_endio(bio);
 }
@@ -499,13 +495,13 @@ bool __bio_integrity_endio(struct bio *bio)
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 
-	if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
-	    (bip->bip_flags & BIP_BLOCK_INTEGRITY) && bi->csum_type) {
+	if (bio_op(bio) == REQ_OP_READ && !bio->bi_status && bi->csum_type) {
 		INIT_WORK(&bip->bip_work, bio_integrity_verify_fn);
 		queue_work(kintegrityd_wq, &bip->bip_work);
 		return false;
 	}
 
+	kfree(bvec_virt(bip->bip_vec));
 	bio_integrity_free(bio);
 	return true;
 }
diff --git a/block/blk-map.c b/block/blk-map.c
index 71210cdb34426d..e7fc1f13f6b8d4 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -757,6 +757,9 @@ int blk_rq_unmap_user(struct bio *bio)
 			bio_release_pages(bio, bio_data_dir(bio) == READ);
 		}
 
+		if (bio_integrity(bio))
+			bio_integrity_unmap_user(bio);
+
 		next_bio = bio;
 		bio = bio->bi_next;
 		blk_mq_map_bio_put(next_bio);
diff --git a/block/blk.h b/block/blk.h
index 7917f86cca0ebd..5de7ce11f149b5 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -205,7 +205,9 @@ bool __bio_integrity_endio(struct bio *);
 void bio_integrity_free(struct bio *bio);
 static inline bool bio_integrity_endio(struct bio *bio)
 {
-	if (bio_integrity(bio))
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+
+	if (bip && (bip->bip_flags & BIP_BLOCK_INTEGRITY))
 		return __bio_integrity_endio(bio);
 	return true;
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d5379548d684e1..622cf9c36c7e87 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -327,8 +327,7 @@ enum bip_flags {
 	BIP_CTRL_NOCHECK	= 1 << 2, /* disable HBA integrity checking */
 	BIP_DISK_NOCHECK	= 1 << 3, /* disable disk integrity checking */
 	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
-	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
-	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
+	BIP_COPY_USER		= 1 << 5, /* Kernel bounce buffer in use */
 };
 
 /*
@@ -731,6 +730,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 		bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
 int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len, u32 seed);
+void bio_integrity_unmap_user(struct bio *bio);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern bool bio_integrity_prep(struct bio *);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 00/10] Read/Write with meta/integrity
  2024-06-27  6:05   ` [PATCH v2 00/10] Read/Write with meta/integrity Christoph Hellwig
@ 2024-06-27 19:12     ` Kanchan Joshi
  0 siblings, 0 replies; 25+ messages in thread
From: Kanchan Joshi @ 2024-06-27 19:12 UTC (permalink / raw)
  To: Christoph Hellwig, Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, kbusch, martin.petersen, io-uring,
	linux-nvme, linux-block

On 6/27/2024 11:35 AM, Christoph Hellwig wrote:
> What tree does this apply to?  There's quite a few rejects vs
> for-6.11/block.
> 

Jens for-next. On top of:

commit f078c063b954085cfa185aea2be6a836529d04fc
Author: Christoph Hellwig <[email protected]>
Date:   Mon Jun 24 19:38:35 2024 +0200

     block: fix the blk_queue_nonrot polarity

And as mentioned in the cover letter, a tree is available here:
https://github.com/SamsungDS/linux/commits/feat/pi_us_v2/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/10] block: set bip_vcnt correctly
  2024-06-26 10:06     ` [PATCH v2 02/10] block: set bip_vcnt correctly Anuj Gupta
@ 2024-06-28  6:04       ` Christoph Hellwig
  2024-06-28 20:35         ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-28  6:04 UTC (permalink / raw)
  To: axboe
  Cc: Anuj Gupta, asml.silence, mpatocka, axboe, hch, kbusch,
	martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

Jens, can you pick this one up for 6.11?  We can't really trigger
it as-is, but it would be good to get it out of the way and avoid
someone else triggering the issue (e.g. Mikulas with his dm-integrity
work).


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 02/10] block: set bip_vcnt correctly
  2024-06-28  6:04       ` Christoph Hellwig
@ 2024-06-28 20:35         ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2024-06-28 20:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Anuj Gupta, asml.silence, mpatocka, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

On 6/28/24 12:04 AM, Christoph Hellwig wrote:
> Jens, can you pick this one up for 6.11?  We can't really trigger
> it as-is, but it would be good to get it out of the way and avoid
> someone else triggering the issue (e.g. Mikulas with his dm-integrity
> work).

Yeah done - worth noting that patch 1 in this series is already in
my tree as well.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: (subset) [PATCH v2 00/10] Read/Write with meta/integrity
  2024-06-26 10:06 ` [PATCH v2 00/10] Read/Write with meta/integrity Anuj Gupta
                     ` (10 preceding siblings ...)
  2024-06-27  6:05   ` [PATCH v2 00/10] Read/Write with meta/integrity Christoph Hellwig
@ 2024-06-28 20:36   ` Jens Axboe
  11 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2024-06-28 20:36 UTC (permalink / raw)
  To: asml.silence, mpatocka, hch, kbusch, martin.petersen, Anuj Gupta
  Cc: io-uring, linux-nvme, linux-block


On Wed, 26 Jun 2024 15:36:50 +0530, Anuj Gupta wrote:
> This adds a new io_uring interface to exchange meta along with read/write.
> 
> Interface:
> Meta information is represented using a newly introduced 'struct io_uring_meta'.
> Application sets up a SQE128 ring, and prepares io_uring_meta within unused
> portion of SQE. Application populates 'struct io_uring_meta' fields as below:
> 
> [...]

Applied, thanks!

[02/10] block: set bip_vcnt correctly
        commit: 3991657ae7074c3c497bf095093178bed37ea1b4

Best regards,
-- 
Jens Axboe




^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write
  2024-06-26 17:17       ` Gabriel Krisman Bertazi
@ 2024-07-01 14:09         ` Anuj gupta
  0 siblings, 0 replies; 25+ messages in thread
From: Anuj gupta @ 2024-07-01 14:09 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Anuj Gupta, asml.silence, mpatocka, axboe, hch, kbusch,
	martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 10:47 PM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Anuj Gupta <[email protected]> writes:
>
> >  static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> >                     int ddir, bool do_import)
> >  {
> > @@ -269,11 +307,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> >               rw->kiocb.ki_ioprio = get_current_ioprio();
> >       }
> >       rw->kiocb.dio_complete = NULL;
> > +     rw->kiocb.ki_flags = 0;
> >
> >       rw->addr = READ_ONCE(sqe->addr);
> >       rw->len = READ_ONCE(sqe->len);
> >       rw->flags = READ_ONCE(sqe->rw_flags);
> > -     return io_prep_rw_setup(req, ddir, do_import);
> > +     ret = io_prep_rw_setup(req, ddir, do_import);
> > +
> > +     if (unlikely(req->ctx->flags & IORING_SETUP_SQE128 && !ret))
> > +             ret = io_prep_rw_meta(req, sqe, rw, ddir);
> > +     return ret;
>
> Would it be useful to have a flag to differentiate a malformed SQE from
> a SQE with io_uring_meta, instead of assuming sqe->cmd has it? We don't
> check for addr3 at the moment and, unlike uring_cmd, userspace will be
> mixing write commands with and without metadata to different files, so
> it would be useful to catch that.
>
Yes, but I couldn't find a good place to keep that flag. sqe->rw_flags are RWF
flags and are meant for generic read/write. sqe->flags are generic io_uring
flags and are not opcode specific. Do you see a place where this flag
could fit in?

> --
> Gabriel Krisman Bertazi
>
--
Anuj Gupta

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
     [not found] <CGME20240626101415epcas5p3b06a963aa0b0196d6599fb86c90bc38c@epcas5p3.samsung.com>
2024-06-26 10:06 ` [PATCH v2 00/10] Read/Write with meta/integrity Anuj Gupta
     [not found]   ` <CGME20240626101511epcas5p31c95b67da58d408c371c49f8719140fc@epcas5p3.samsung.com>
2024-06-26 10:06     ` [PATCH v2 01/10] block: change rq_integrity_vec to respect the iterator Anuj Gupta
     [not found]   ` <CGME20240626101513epcas5p10b3f8470148abb10ce3edfb90814cd94@epcas5p1.samsung.com>
2024-06-26 10:06     ` [PATCH v2 02/10] block: set bip_vcnt correctly Anuj Gupta
2024-06-28  6:04       ` Christoph Hellwig
2024-06-28 20:35         ` Jens Axboe
     [not found]   ` <CGME20240626101516epcas5p19fb40e8231d1832cab3d031672f0109e@epcas5p1.samsung.com>
2024-06-26 10:06     ` [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone Anuj Gupta
2024-06-27  6:14       ` Christoph Hellwig
     [not found]   ` <CGME20240626101518epcas5p17e046bca77b218fc6914ddeb182eb42e@epcas5p1.samsung.com>
2024-06-26 10:06     ` [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split Anuj Gupta
2024-06-27  6:16       ` Christoph Hellwig
     [not found]   ` <CGME20240626101519epcas5p163b0735c1604a228196f0e8c14773005@epcas5p1.samsung.com>
2024-06-26 10:06     ` [PATCH v2 05/10] block: introduce BIP_CLONED flag Anuj Gupta
2024-06-27  6:21       ` Christoph Hellwig
2024-06-27 12:09         ` Christoph Hellwig
     [not found]   ` <CGME20240626101521epcas5p42b0c1c0e123996b199e058bae9a69123@epcas5p4.samsung.com>
2024-06-26 10:06     ` [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
2024-06-27  6:23       ` Christoph Hellwig
     [not found]   ` <CGME20240626101523epcas5p2616cf568575685bd251d28fc1398d4cd@epcas5p2.samsung.com>
2024-06-26 10:06     ` [PATCH v2 07/10] block: define meta io descriptor Anuj Gupta
2024-06-27  6:22       ` Christoph Hellwig
     [not found]   ` <CGME20240626101525epcas5p4dbcef84714e4e9214b951fe2ff649521@epcas5p4.samsung.com>
2024-06-26 10:06     ` [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write Anuj Gupta
2024-06-26 17:17       ` Gabriel Krisman Bertazi
2024-07-01 14:09         ` Anuj gupta
     [not found]   ` <CGME20240626101527epcas5p23e10a6701f552d16bd6a999418009ba0@epcas5p2.samsung.com>
2024-06-26 10:06     ` [PATCH v2 09/10] block: add support to pass user meta buffer Anuj Gupta
     [not found]   ` <CGME20240626101529epcas5p49976c46701337830c400cefd8f074b40@epcas5p4.samsung.com>
2024-06-26 10:07     ` [PATCH v2 10/10] nvme: add handling for user integrity buffer Anuj Gupta
2024-06-27  6:29       ` Christoph Hellwig
2024-06-27  6:05   ` [PATCH v2 00/10] Read/Write with meta/integrity Christoph Hellwig
2024-06-27 19:12     ` Kanchan Joshi
2024-06-28 20:36   ` (subset) " Jens Axboe
