* [PATCH v2 00/10] Read/Write with meta/integrity
[not found] <CGME20240626101415epcas5p3b06a963aa0b0196d6599fb86c90bc38c@epcas5p3.samsung.com>
@ 2024-06-26 10:06 ` Anuj Gupta
  [not found] ` <CGME20240626101511epcas5p31c95b67da58d408c371c49f8719140fc@epcas5p3.samsung.com>
  ` (11 more replies)
  0 siblings, 12 replies; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta

This adds a new io_uring interface to exchange metadata along with read/write.

Interface:
Meta information is represented using the newly introduced 'struct io_uring_meta'. The application sets up an SQE128 ring and prepares 'struct io_uring_meta' within the unused portion of the SQE, populating its fields as below:

* meta_type: describes the type of meta that is passed. Currently one type, "Integrity", is supported.
* meta_flags: meta-type-specific flags. Three flags are exposed for the integrity type, namely INTEGRITY_CHK_GUARD/APPTAG/REFTAG.
* meta_len: length of the meta buffer
* meta_addr: address of the meta buffer
* apptag: optional application-specific 16-bit value; this goes along with the INTEGRITY_CHK_APPTAG flag.

The block path (direct IO) and the NVMe driver are modified to support this.

The first patch is borrowed from Mikulas' series[1] to make the metadata split work correctly. The next 5 patches are enhancements in block/nvme so that the user meta buffer is handled correctly (mostly when it gets split). Patch 8 adds the io_uring support. Patch 9 adds support for block direct IO, and patch 10 for NVMe.

An example program showing how to use the interface is appended below [2].

Tree: https://github.com/SamsungDS/linux/tree/feat/pi_us_v2

Testing has been done by modifying fio to use this interface:
https://github.com/samsungds/fio/commits/feat/test-meta-v3

Changes since v1:
https://lore.kernel.org/linux-block/[email protected]/
- Do not use a new opcode for meta, and also add the provision to introduce new meta types beyond integrity (Pavel)
- Stuff the IOCB_HAS_META check in need_complete_io (Jens)
- Split meta handling in NVMe into a separate handler (Keith)
- Add meta handling for __blkdev_direct_IO too (Keith)
- Don't inherit the BIP_COPY_USER flag for cloned bios (Christoph)
- Better commit descriptions (Christoph)

Changes since RFC:
- modify io_uring plumbing based on recent async handling state changes
- fixes/enhancements to correctly handle the split for the meta buffer
- add flags to specify guard/reftag/apptag checks
- add support to send apptag

[1] https://lore.kernel.org/linux-block/[email protected]/

[2]

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <linux/io_uring.h>
#include <linux/types.h>
#include "liburing.h"

/*
 * write data/meta. read both. compare. send apptag too.
 * prerequisite:
 * unprotected xfer: format namespace with 4KB + 8b, pi_type = 0
 * protected xfer: format namespace with 4KB + 8b, pi_type = 1
 */

#define DATA_LEN 4096
#define META_LEN 8

struct t10_pi_tuple {
	__be16 guard;
	__be16 apptag;
	__be32 reftag;
};

int main(int argc, char *argv[])
{
	struct io_uring ring;
	struct io_uring_sqe *sqe = NULL;
	struct io_uring_cqe *cqe = NULL;
	void *wdb, *rdb;
	char wmb[META_LEN], rmb[META_LEN];
	char *data_str = "data buffer";
	char *meta_str = "meta";
	int fd, ret, blksize;
	struct stat fstat;
	unsigned long long offset = DATA_LEN;
	struct t10_pi_tuple *pi;
	struct io_uring_meta *md;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <block-device>\n", argv[0]);
		return 1;
	}

	if (stat(argv[1], &fstat) == 0) {
		blksize = (int)fstat.st_blksize;
	} else {
		perror("stat");
		return 1;
	}

	if (posix_memalign(&wdb, blksize, DATA_LEN)) {
		perror("posix_memalign failed");
		return 1;
	}
	if (posix_memalign(&rdb, blksize, DATA_LEN)) {
		perror("posix_memalign failed");
		return 1;
	}

	strcpy(wdb, data_str);
	strcpy(wmb, meta_str);

	fd = open(argv[1], O_RDWR | O_DIRECT);
	if (fd < 0) {
		fprintf(stderr, "Error in opening device\n");
		return 1;
	}

	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQE128);
	if (ret) {
		fprintf(stderr, "ring setup failed: %d\n", ret);
		return 1;
	}

	/* write data + meta-buffer to device */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		fprintf(stderr, "get sqe failed\n");
		return 1;
	}

	io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset);
	md = (struct io_uring_meta *)sqe->cmd;
	md->meta_type = META_TYPE_INTEGRITY;
	md->meta_addr = (__u64)wmb;
	md->meta_len = META_LEN;
	/* flags to ask for guard/reftag/apptag */
	md->meta_flags = INTEGRITY_CHK_APPTAG;
	md->apptag = 0x1234;
	pi = (struct t10_pi_tuple *)wmb;
	pi->apptag = 0x3412;

	ret = io_uring_submit(&ring);
	if (ret <= 0) {
		fprintf(stderr, "sqe submit failed: %d\n", ret);
		return 1;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!cqe) {
		fprintf(stderr, "cqe is NULL :%d\n", ret);
		return 1;
	}
	if (cqe->res < 0) {
		fprintf(stderr, "write cqe failure: %d", cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	/* read data + meta-buffer back from device */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		fprintf(stderr, "get sqe failed\n");
		return 1;
	}

	io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset);
	md = (struct io_uring_meta *)sqe->cmd;
	md->meta_type = META_TYPE_INTEGRITY;
	md->meta_addr = (__u64)rmb;
	md->meta_len = META_LEN;
	md->meta_flags = INTEGRITY_CHK_APPTAG;
	md->apptag = 0x1234;

	ret = io_uring_submit(&ring);
	if (ret <= 0) {
		fprintf(stderr, "sqe submit failed: %d\n", ret);
		return 1;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (!cqe) {
		fprintf(stderr, "cqe is NULL :%d\n", ret);
		return 1;
	}
	if (cqe->res < 0) {
		fprintf(stderr, "read cqe failure: %d", cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	if (strncmp(wmb, rmb, META_LEN))
		printf("Failure: meta mismatch!, wmb=%s, rmb=%s\n", wmb, rmb);
	if (strncmp(wdb, rdb, DATA_LEN))
		printf("Failure: data mismatch!\n");

	io_uring_queue_exit(&ring);
	free(rdb);
	free(wdb);
	return 0;
}

Anuj Gupta (5):
  block: set bip_vcnt correctly
  block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
  block: Handle meta bounce buffer correctly in case of split
  block: modify bio_integrity_map_user to accept iov_iter as argument
  io_uring/rw: add support to send meta along with read/write

Kanchan Joshi (4):
  block: introduce BIP_CLONED flag
  block: define meta io descriptor
  block: add support to pass user meta buffer
  nvme: add handling for user integrity buffer

Mikulas Patocka (1):
  block: change rq_integrity_vec to respect the iterator

 block/bio-integrity.c         | 75 ++++++++++++++++++++++++++-----
 block/fops.c                  | 28 +++++++++++-
 block/t10-pi.c                |  6 +++
 drivers/nvme/host/core.c      | 85 ++++++++++++++++++++++++-----------
 drivers/nvme/host/ioctl.c     | 11 ++++-
 drivers/nvme/host/pci.c       |  6 +--
 include/linux/bio.h           | 25 +++++++++--
 include/linux/blk-integrity.h | 16 +++----
 include/linux/fs.h            |  1 +
 include/uapi/linux/io_uring.h | 30 ++++++++++++-
 io_uring/io_uring.c           |  7 +++
 io_uring/rw.c                 | 68 ++++++++++++++++++++++++++--
 io_uring/rw.h                 |  9 +++-
 13 files changed, 308 insertions(+), 59 deletions(-)

-- 
2.25.1

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <CGME20240626101511epcas5p31c95b67da58d408c371c49f8719140fc@epcas5p3.samsung.com>]
* [PATCH v2 01/10] block: change rq_integrity_vec to respect the iterator
[not found] ` <CGME20240626101511epcas5p31c95b67da58d408c371c49f8719140fc@epcas5p3.samsung.com>
@ 2024-06-26 10:06 ` Anuj Gupta
  0 siblings, 0 replies; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi, Anuj Gupta

From: Mikulas Patocka <[email protected]>

If we allocate a bio that is larger than the NVMe maximum request size, attach integrity metadata to it and send it to the NVMe subsystem, the integrity metadata will be corrupted.

Splitting the bio works correctly: the function bio_split will clone the bio, trim the iterator of the first bio and advance the iterator of the second bio.

However, the function rq_integrity_vec has a bug - it returns the first vector of the bio's metadata and completely disregards the metadata iterator that was advanced when the bio was split. Thus, the second bio uses the same metadata as the first bio, and this leads to metadata corruption.

This commit changes rq_integrity_vec so that it calls mp_bvec_iter_bvec instead of returning the first vector. mp_bvec_iter_bvec reads the iterator and uses it to build a bvec for the current position in the iterator.

The "queue_max_integrity_segments(rq->q) > 1" check was removed, because the updated rq_integrity_vec function works correctly with multiple segments.
Signed-off-by: Mikulas Patocka <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Kanchan Joshi <[email protected]>
Reviewed-by: Anuj Gupta <[email protected]>
---
 drivers/nvme/host/pci.c       |  6 +++---
 include/linux/blk-integrity.h | 16 ++++++++--------
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 102a9fb0c65f..5d8035218de9 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -826,9 +826,9 @@ static blk_status_t nvme_map_metadata(struct nvme_dev *dev, struct request *req,
 		struct nvme_command *cmnd)
 {
 	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+	struct bio_vec bv = rq_integrity_vec(req);
 
-	iod->meta_dma = dma_map_bvec(dev->dev, rq_integrity_vec(req),
-			rq_dma_dir(req), 0);
+	iod->meta_dma = dma_map_bvec(dev->dev, &bv, rq_dma_dir(req), 0);
 	if (dma_mapping_error(dev->dev, iod->meta_dma))
 		return BLK_STS_IOERR;
 	cmnd->rw.metadata = cpu_to_le64(iod->meta_dma);
@@ -967,7 +967,7 @@ static __always_inline void nvme_pci_unmap_rq(struct request *req)
 		struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
 
 		dma_unmap_page(dev->dev, iod->meta_dma,
-			rq_integrity_vec(req)->bv_len, rq_dma_dir(req));
+			rq_integrity_vec(req).bv_len, rq_dma_dir(req));
 	}
 
 	if (blk_rq_nr_phys_segments(req))
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index d201140d77a3..c58634924782 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -90,14 +90,13 @@ static inline bool blk_integrity_rq(struct request *rq)
 }
 
 /*
- * Return the first bvec that contains integrity data. Only drivers that are
- * limited to a single integrity segment should use this helper.
+ * Return the current bvec that contains the integrity data. bip_iter may be
+ * advanced to iterate over the integrity data.
 */
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	if (WARN_ON_ONCE(queue_max_integrity_segments(rq->q) > 1))
-		return NULL;
-	return rq->bio->bi_integrity->bip_vec;
+	return mp_bvec_iter_bvec(rq->bio->bi_integrity->bip_vec,
+				 rq->bio->bi_integrity->bip_iter);
 }
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 static inline int blk_rq_count_integrity_sg(struct request_queue *q,
@@ -146,9 +145,10 @@ static inline int blk_integrity_rq(struct request *rq)
 	return 0;
 }
 
-static inline struct bio_vec *rq_integrity_vec(struct request *rq)
+static inline struct bio_vec rq_integrity_vec(struct request *rq)
 {
-	return NULL;
+	/* the optimizer will remove all calls to this function */
+	return (struct bio_vec){ };
}
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
[parent not found: <CGME20240626101513epcas5p10b3f8470148abb10ce3edfb90814cd94@epcas5p1.samsung.com>]
* [PATCH v2 02/10] block: set bip_vcnt correctly
[not found] ` <CGME20240626101513epcas5p10b3f8470148abb10ce3edfb90814cd94@epcas5p1.samsung.com>
@ 2024-06-26 10:06 ` Anuj Gupta
  2024-06-28  6:04 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi

Set the bip_vcnt correctly in bio_integrity_init_user and bio_integrity_copy_user. If the bio gets split at a later point, this value is required to set the right bip_vcnt in the cloned bio.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
 block/bio-integrity.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index dab70370b2c7..af79d9fbf413 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -276,6 +276,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 
 	bip->bip_flags |= BIP_INTEGRITY_USER | BIP_COPY_USER;
 	bip->bip_iter.bi_sector = seed;
+	bip->bip_vcnt = nr_vecs;
 	return 0;
 free_bip:
 	bio_integrity_free(bio);
@@ -297,6 +298,7 @@ static int bio_integrity_init_user(struct bio *bio, struct bio_vec *bvec,
 	bip->bip_flags |= BIP_INTEGRITY_USER;
 	bip->bip_iter.bi_sector = seed;
 	bip->bip_iter.bi_size = len;
+	bip->bip_vcnt = nr_vecs;
 	return 0;
 }
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 02/10] block: set bip_vcnt correctly
  2024-06-26 10:06 ` [PATCH v2 02/10] block: set bip_vcnt correctly Anuj Gupta
@ 2024-06-28  6:04 ` Christoph Hellwig
  2024-06-28 20:35 ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-28 6:04 UTC (permalink / raw)
  To: axboe
  Cc: Anuj Gupta, asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

Jens, can you pick this one up for 6.11? We can't really trigger it as-is, but it would be good to get it out of the way and avoid someone else triggering the issue (e.g. Mikulas with his dm-integrity work).

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 02/10] block: set bip_vcnt correctly
  2024-06-28  6:04 ` Christoph Hellwig
@ 2024-06-28 20:35 ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2024-06-28 20:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Anuj Gupta, asml.silence, mpatocka, kbusch, martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

On 6/28/24 12:04 AM, Christoph Hellwig wrote:
> Jens, can you pick this one up for 6.11? We can't really trigger
> it as-is, but it would be good to get it out of the way and avoid
> someone else triggering the issue (e.g. Mikulas with his dm-integrity
> work).

Yeah done - worth noting that patch 1 in this series is already in my tree as well.

-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <CGME20240626101516epcas5p19fb40e8231d1832cab3d031672f0109e@epcas5p1.samsung.com>]
* [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
[not found] ` <CGME20240626101516epcas5p19fb40e8231d1832cab3d031672f0109e@epcas5p1.samsung.com>
@ 2024-06-26 10:06 ` Anuj Gupta
  2024-06-27  6:14 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi

If bio_integrity_copy_user is used to process the meta buffer, bip_max_vcnt is one greater than bip_vcnt. In this case bip_max_vcnt vecs need to be copied to the cloned bip.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index af79d9fbf413..5e7596b74ef1 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -647,12 +647,12 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 
 	BUG_ON(bip_src == NULL);
 
-	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
+	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_max_vcnt);
 	if (IS_ERR(bip))
 		return PTR_ERR(bip);
 
 	memcpy(bip->bip_vec, bip_src->bip_vec,
-	       bip_src->bip_vcnt * sizeof(struct bio_vec));
+	       bip_src->bip_max_vcnt * sizeof(struct bio_vec));
 
 	bip->bip_vcnt = bip_src->bip_vcnt;
 	bip->bip_iter = bip_src->bip_iter;
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone
  2024-06-26 10:06 ` [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone Anuj Gupta
@ 2024-06-27  6:14 ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27 6:14 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 03:36:53PM +0530, Anuj Gupta wrote:
> If bio_integrity_copy_user is used to process the meta buffer, bip_max_vcnt
> is one greater than bip_vcnt.

Why?

> @@ -647,12 +647,12 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
> 
>  	BUG_ON(bip_src == NULL);
> 
> -	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_vcnt);
> +	bip = bio_integrity_alloc(bio, gfp_mask, bip_src->bip_max_vcnt);
>  	if (IS_ERR(bip))
>  		return PTR_ERR(bip);
> 
>  	memcpy(bip->bip_vec, bip_src->bip_vec,
> -	       bip_src->bip_vcnt * sizeof(struct bio_vec));
> +	       bip_src->bip_max_vcnt * sizeof(struct bio_vec));
> 
>  	bip->bip_vcnt = bip_src->bip_vcnt;
>  	bip->bip_iter = bip_src->bip_iter;

So trying to compare this to how the bio data path is cloned, it seems like bio_integrity_clone is still copying the bvec array. With the concept of the immutable bvecs we've had for years this should not be required. That is, reuse the actual bio_vecs and just copy the iter, similar to bio_alloc_clone/__bio_clone.

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <CGME20240626101518epcas5p17e046bca77b218fc6914ddeb182eb42e@epcas5p1.samsung.com>]
* [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split
[not found] ` <CGME20240626101518epcas5p17e046bca77b218fc6914ddeb182eb42e@epcas5p1.samsung.com>
@ 2024-06-26 10:06 ` Anuj Gupta
  2024-06-27  6:16 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Anuj Gupta

Do not inherit BIP_COPY_USER for a cloned bio. Also make sure that the bounce buffer is copied back to user-space in its entirety when the parent bio completes.

Signed-off-by: Anuj Gupta <[email protected]>
---
 block/bio-integrity.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 5e7596b74ef1..845d4038afb1 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -105,9 +105,12 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
 
 static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
 {
+	struct bio *bio = bip->bip_bio;
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
 	unsigned short nr_vecs = bip->bip_max_vcnt - 1;
 	struct bio_vec *copy = &bip->bip_vec[1];
-	size_t bytes = bip->bip_iter.bi_size;
+	size_t bytes = bio_integrity_bytes(bi,
+					   bvec_iter_sectors(bip->bio_iter));
 	struct iov_iter iter;
 	int ret;
 
@@ -277,6 +280,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 	bip->bip_flags |= BIP_INTEGRITY_USER | BIP_COPY_USER;
 	bip->bip_iter.bi_sector = seed;
 	bip->bip_vcnt = nr_vecs;
+	bip->bio_iter = bio->bi_iter;
 	return 0;
 free_bip:
 	bio_integrity_free(bio);
@@ -656,8 +660,8 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 
 	bip->bip_vcnt = bip_src->bip_vcnt;
 	bip->bip_iter = bip_src->bip_iter;
-	bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;
-
+	bip->bip_flags = bip_src->bip_flags & ~(BIP_BLOCK_INTEGRITY |
+						BIP_COPY_USER);
 	return 0;
 }
 
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split
  2024-06-26 10:06 ` [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split Anuj Gupta
@ 2024-06-27  6:16 ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27 6:16 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen, io-uring, linux-nvme, linux-block

On Wed, Jun 26, 2024 at 03:36:54PM +0530, Anuj Gupta wrote:
> @@ -105,9 +105,12 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
> 
>  static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
>  {
> +	struct bio *bio = bip->bip_bio;
> +	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
>  	unsigned short nr_vecs = bip->bip_max_vcnt - 1;
>  	struct bio_vec *copy = &bip->bip_vec[1];
> -	size_t bytes = bip->bip_iter.bi_size;
> +	size_t bytes = bio_integrity_bytes(bi,
> +					   bvec_iter_sectors(bip->bio_iter));

Maybe add a well documented helper that calculates the metadata bytes based on the iter, given that this is probably going to become more common now that we're doing proper cloning?

> -	bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;
> -
> +	bip->bip_flags = bip_src->bip_flags & ~(BIP_BLOCK_INTEGRITY |
> +						BIP_COPY_USER);

We're probably better off saying which flags should be cloned rather than which ones should not, preferably with a new #define in bio.h.

^ permalink raw reply	[flat|nested] 25+ messages in thread
[parent not found: <CGME20240626101519epcas5p163b0735c1604a228196f0e8c14773005@epcas5p1.samsung.com>]
* [PATCH v2 05/10] block: introduce BIP_CLONED flag
[not found] ` <CGME20240626101519epcas5p163b0735c1604a228196f0e8c14773005@epcas5p1.samsung.com>
@ 2024-06-26 10:06 ` Anuj Gupta
  2024-06-27  6:21 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi

From: Kanchan Joshi <[email protected]>

Set the BIP_CLONED flag when a bip is cloned. Use that flag to ensure that integrity is freed for a cloned user bip.

Note that a bio may have the BIO_CLONED flag set but still not be sharing the integrity vecs.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 block/bio-integrity.c | 5 ++++-
 include/linux/bio.h   | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 845d4038afb1..8f07c4d0fada 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -147,7 +147,8 @@ void bio_integrity_free(struct bio *bio)
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 	struct bio_set *bs = bio->bi_pool;
 
-	if (bip->bip_flags & BIP_INTEGRITY_USER)
+	if (bip->bip_flags & BIP_INTEGRITY_USER &&
+	    !(bip->bip_flags & BIP_CLONED))
 		return;
 	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
 		kfree(bvec_virt(bip->bip_vec));
@@ -662,6 +663,8 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 	bip->bip_iter = bip_src->bip_iter;
 	bip->bip_flags = bip_src->bip_flags & ~(BIP_BLOCK_INTEGRITY |
						BIP_COPY_USER);
+	bip->bip_flags |= BIP_CLONED;
+
 	return 0;
 }
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 818e93612947..44226fcb4d00 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -329,6 +329,7 @@ enum bip_flags {
 	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
 	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
 	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
+	BIP_CLONED		= 1 << 7, /* Indicates that bip is cloned */
 };
 
 /*
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 05/10] block: introduce BIP_CLONED flag
  2024-06-26 10:06 ` [PATCH v2 05/10] block: introduce BIP_CLONED flag Anuj Gupta
@ 2024-06-27  6:21 ` Christoph Hellwig
  2024-06-27 12:09 ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27 6:21 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 03:36:55PM +0530, Anuj Gupta wrote:
> From: Kanchan Joshi <[email protected]>
> 
> Set the BIP_CLONED flag when bip is cloned.
> Use that flag to ensure that integrity is freed for cloned user bip.
> 
> Note that a bio may have BIO_CLONED flag set but it may still not be
> sharing the integrity vecs.

The design principle of the immutable bio_vecs for the data path is that BIO_CLONED is just a debug aid and no code should check it. I'd much prefer to keep that invariant for metadata.

> diff --git a/block/bio-integrity.c b/block/bio-integrity.c
> index 845d4038afb1..8f07c4d0fada 100644
> --- a/block/bio-integrity.c
> +++ b/block/bio-integrity.c
> @@ -147,7 +147,8 @@ void bio_integrity_free(struct bio *bio)
>  	struct bio_integrity_payload *bip = bio_integrity(bio);
>  	struct bio_set *bs = bio->bi_pool;
> 
> -	if (bip->bip_flags & BIP_INTEGRITY_USER)
> +	if (bip->bip_flags & BIP_INTEGRITY_USER &&
> +	    !(bip->bip_flags & BIP_CLONED))
>  		return;
>  	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
>  		kfree(bvec_virt(bip->bip_vec));

... and the right way to approach this is to clean up the mess that we have in bio_integrity_free, which probably needs a split up to deal with the different cases:

 - for block layer auto-generated bip_vecs we need it called where it is
   right now, but that side can now unconditionally free the data pointed
   to by the bip_vec
 - for callers that supply PI data themselves, including from user space,
   the caller needs to call __bio_integrity_free and clear bi_integrity
   and REQ_INTEGRITY

This is probably best done by moving the bip_flags checks out of bio_integrity_free, having bio_integrity_free just do the unconditional freeing, and adding a new helper for __bio_integrity_endio / bio_integrity_verify_fn to also free the payload.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 05/10] block: introduce BIP_CLONED flag
  2024-06-27  6:21 ` Christoph Hellwig
@ 2024-06-27 12:09 ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27 12:09 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

On Thu, Jun 27, 2024 at 08:21:42AM +0200, Christoph Hellwig wrote:
> this is probably best done by moving the bip_flags checks out of
> bio_integrity_free and have bio_integrity_free just do the
> unconditional freeing, and have a new helper for
> __bio_integrity_endio / bio_integrity_verify_fn to also
> free the payload.

Something like the patch below, against my "integrity cleanups" series from yesterday. Lightly tested.

---
From f312de8ae5e329569c4810ddf977195e997e03ec Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <[email protected]>
Date: Thu, 27 Jun 2024 13:25:28 +0200
Subject: block: don't free submitter owned integrity payload on I/O completion

Currently __bio_integrity_endio unconditionally frees the integrity payload. While this works really well for block-layer generated integrity payloads, it is a bad idea for those passed in by the submitter, as it can't access the integrity data from the I/O completion handler.

Change bio_integrity_endio to only call __bio_integrity_endio for block layer generated integrity data, and leave freeing of submitter allocated integrity data to bio_uninit, which also gets called from the final bio_put. This requires that unmapping user mapped or copied integrity data is done by the caller now.

Signed-off-by: Christoph Hellwig <[email protected]>
---
 block/bio-integrity.c | 68 ++++++++++++++++++++-----------------------
 block/blk-map.c       |  3 ++
 block/blk.h           |  4 ++-
 include/linux/bio.h   |  4 +--
 4 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 20bbfd5730dadd..d21ad6624f0062 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -22,9 +22,17 @@ void blk_flush_integrity(void)
 	flush_workqueue(kintegrityd_wq);
 }
 
-static void __bio_integrity_free(struct bio_set *bs,
-		struct bio_integrity_payload *bip)
+/**
+ * bio_integrity_free - Free bio integrity payload
+ * @bio:	bio containing bip to be freed
+ *
+ * Description: Free the integrity portion of a bio.
+ */
+void bio_integrity_free(struct bio *bio)
 {
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	struct bio_set *bs = bio->bi_pool;
+
 	if (bs && mempool_initialized(&bs->bio_integrity_pool)) {
 		if (bip->bip_vec)
 			bvec_free(&bs->bvec_integrity_pool, bip->bip_vec,
@@ -33,6 +41,8 @@ static void __bio_integrity_free(struct bio_set *bs,
 	} else {
 		kfree(bip);
 	}
+	bio->bi_integrity = NULL;
+	bio->bi_opf &= ~REQ_INTEGRITY;
 }
 
 /**
@@ -49,19 +59,21 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 						  gfp_t gfp_mask,
 						  unsigned int nr_vecs)
 {
-	struct bio_integrity_payload *bip;
 	struct bio_set *bs = bio->bi_pool;
+	bool has_mempool = bs && mempool_initialized(&bs->bio_integrity_pool);
+	struct bio_integrity_payload *bip;
 	unsigned inline_vecs;
 
 	if (WARN_ON_ONCE(bio_has_crypt_ctx(bio)))
 		return ERR_PTR(-EOPNOTSUPP);
 
-	if (!bs || !mempool_initialized(&bs->bio_integrity_pool)) {
-		bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs), gfp_mask);
-		inline_vecs = nr_vecs;
-	} else {
+	if (has_mempool) {
 		bip = mempool_alloc(&bs->bio_integrity_pool, gfp_mask);
 		inline_vecs = BIO_INLINE_VECS;
+	} else {
+		bip = kmalloc(struct_size(bip, bip_inline_vecs, nr_vecs),
+				gfp_mask);
+		inline_vecs = nr_vecs;
 	}
 
 	if (unlikely(!bip))
@@ -86,7 +98,10 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio,
 	return bip;
 err:
-	__bio_integrity_free(bs, bip);
+	if (has_mempool)
+		mempool_free(bip, &bs->bio_integrity_pool);
+	else
+		kfree(bip);
 	return ERR_PTR(-ENOMEM);
 }
 EXPORT_SYMBOL(bio_integrity_alloc);
@@ -118,9 +133,10 @@ static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
 	bio_integrity_unpin_bvec(copy, nr_vecs, true);
 }
 
-static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
+void bio_integrity_unmap_user(struct bio *bio)
 {
-	bool dirty = bio_data_dir(bip->bip_bio) == READ;
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+	bool dirty = bio_data_dir(bio) == READ;
 
 	if (bip->bip_flags & BIP_COPY_USER) {
 		if (dirty)
@@ -131,28 +147,7 @@ static void bio_integrity_unmap_user(struct bio_integrity_payload *bip)
 
 	bio_integrity_unpin_bvec(bip->bip_vec, bip->bip_max_vcnt, dirty);
 }
-
-/**
- * bio_integrity_free - Free bio integrity payload
- * @bio:	bio containing bip to be freed
- *
- * Description: Used to free the integrity portion of a bio. Usually
- * called from bio_free().
- */
-void bio_integrity_free(struct bio *bio)
-{
-	struct bio_integrity_payload *bip = bio_integrity(bio);
-	struct bio_set *bs = bio->bi_pool;
-
-	if (bip->bip_flags & BIP_BLOCK_INTEGRITY)
-		kfree(bvec_virt(bip->bip_vec));
-	else if (bip->bip_flags & BIP_INTEGRITY_USER)
-		bio_integrity_unmap_user(bip);
-
-	__bio_integrity_free(bs, bip);
-	bio->bi_integrity = NULL;
-	bio->bi_opf &= ~REQ_INTEGRITY;
-}
+EXPORT_SYMBOL_GPL(bio_integrity_unmap_user);
 
 /**
  * bio_integrity_add_page - Attach integrity metadata
@@ -252,7 +247,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 		goto free_bip;
 	}
 
-	bip->bip_flags |= BIP_INTEGRITY_USER | BIP_COPY_USER;
+	bip->bip_flags |= BIP_COPY_USER;
 	bip->bip_iter.bi_sector = seed;
 	return 0;
 free_bip:
@@ -272,7 +267,6 @@ static int bio_integrity_init_user(struct bio *bio, struct bio_vec *bvec,
 		return PTR_ERR(bip);
 
 	memcpy(bip->bip_vec, bvec, nr_vecs * sizeof(*bvec));
-	bip->bip_flags |= BIP_INTEGRITY_USER;
 	bip->bip_iter.bi_sector = seed;
 	bip->bip_iter.bi_size = len;
 	return 0;
@@ -479,6 +473,8 @@ static void bio_integrity_verify_fn(struct work_struct *work)
 	struct bio *bio = bip->bip_bio;
 
 	blk_integrity_verify(bio);
+
+	kfree(bvec_virt(bip->bip_vec));
 	bio_integrity_free(bio);
 	bio_endio(bio);
 }
@@ -499,13 +495,13 @@ bool __bio_integrity_endio(struct bio *bio)
 	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
 	struct bio_integrity_payload *bip = bio_integrity(bio);
 
-	if (bio_op(bio) == REQ_OP_READ && !bio->bi_status &&
-	    (bip->bip_flags & BIP_BLOCK_INTEGRITY) && bi->csum_type) {
+	if (bio_op(bio) == REQ_OP_READ && !bio->bi_status && bi->csum_type) {
 		INIT_WORK(&bip->bip_work, bio_integrity_verify_fn);
 		queue_work(kintegrityd_wq, &bip->bip_work);
 		return false;
 	}
 
+	kfree(bvec_virt(bip->bip_vec));
 	bio_integrity_free(bio);
 	return true;
 }
diff --git a/block/blk-map.c b/block/blk-map.c
index 71210cdb34426d..e7fc1f13f6b8d4 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -757,6 +757,9 @@ int blk_rq_unmap_user(struct bio *bio)
 			bio_release_pages(bio, bio_data_dir(bio) == READ);
 	}
 
+	if (bio_integrity(bio))
+		bio_integrity_unmap_user(bio);
+
 	next_bio = bio;
 	bio = bio->bi_next;
 	blk_mq_map_bio_put(next_bio);
diff --git a/block/blk.h b/block/blk.h
index 7917f86cca0ebd..5de7ce11f149b5 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -205,7 +205,9 @@ bool __bio_integrity_endio(struct bio *);
 void bio_integrity_free(struct bio *bio);
 static inline bool bio_integrity_endio(struct bio *bio)
 {
-	if (bio_integrity(bio))
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+
+	if (bip && (bip->bip_flags & BIP_BLOCK_INTEGRITY))
 		return __bio_integrity_endio(bio);
 	return true;
 }
diff --git a/include/linux/bio.h b/include/linux/bio.h
index d5379548d684e1..622cf9c36c7e87 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -327,8 +327,7 @@ enum bip_flags {
 	BIP_CTRL_NOCHECK	= 1 << 2, /* disable HBA integrity checking */
 	BIP_DISK_NOCHECK	= 1 << 3, /* disable disk integrity checking */
 	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
-	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
-	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
+	BIP_COPY_USER		= 1 << 5, /* Kernel bounce buffer in use */
 };
 
 /*
@@ -731,6 +730,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 	bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
 int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len,
			   u32 seed);
+void bio_integrity_unmap_user(struct bio *bio);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
 extern bool bio_integrity_prep(struct bio *);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument [not found] ` <CGME20240626101521epcas5p42b0c1c0e123996b199e058bae9a69123@epcas5p4.samsung.com> @ 2024-06-26 10:06 ` Anuj Gupta 2024-06-27 6:23 ` Christoph Hellwig 0 siblings, 1 reply; 25+ messages in thread From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw) To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi This patch refactors bio_integrity_map_user to accept iov_iter as argument. This is a prep patch. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- block/bio-integrity.c | 12 +++++------- drivers/nvme/host/ioctl.c | 11 +++++++++-- include/linux/bio.h | 8 +++++--- 3 files changed, 19 insertions(+), 12 deletions(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 8f07c4d0fada..38418be07139 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -337,17 +337,16 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages, return nr_bvecs; } -int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes, +int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed) { struct request_queue *q = bdev_get_queue(bio->bi_bdev); unsigned int align = q->dma_pad_mask | queue_dma_alignment(q); struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages; struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec; + size_t offset, bytes = iter->count; unsigned int direction, nr_bvecs; - struct iov_iter iter; int ret, nr_vecs; - size_t offset; bool copy; if (bio_integrity(bio)) @@ -360,8 +359,7 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes, else direction = ITER_SOURCE; - iov_iter_ubuf(&iter, direction, ubuf, bytes); - nr_vecs = iov_iter_npages(&iter, BIO_MAX_VECS + 1); + nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS + 1); if (nr_vecs > BIO_MAX_VECS) 
return -E2BIG; if (nr_vecs > UIO_FASTIOV) { @@ -371,8 +369,8 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes, pages = NULL; } - copy = !iov_iter_is_aligned(&iter, align, align); - ret = iov_iter_extract_pages(&iter, &pages, bytes, nr_vecs, 0, &offset); + copy = !iov_iter_is_aligned(iter, align, align); + ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset); if (unlikely(ret < 0)) goto free_bvec; diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c index 8b69427a4476..e77ea1e7be03 100644 --- a/drivers/nvme/host/ioctl.c +++ b/drivers/nvme/host/ioctl.c @@ -152,8 +152,15 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer, if (bdev) { bio_set_dev(bio, bdev); if (meta_buffer && meta_len) { - ret = bio_integrity_map_user(bio, meta_buffer, meta_len, - meta_seed); + struct iov_iter iter; + unsigned int direction; + + if (bio_data_dir(bio) == READ) + direction = ITER_DEST; + else + direction = ITER_SOURCE; + iov_iter_ubuf(&iter, direction, meta_buffer, meta_len); + ret = bio_integrity_map_user(bio, &iter, meta_seed); if (ret) goto out_unmap; req->cmd_flags |= REQ_INTEGRITY; diff --git a/include/linux/bio.h b/include/linux/bio.h index 44226fcb4d00..c90168274188 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -731,7 +731,7 @@ static inline bool bioset_initialized(struct bio_set *bs) for_each_bio(_bio) \ bip_for_each_vec(_bvl, _bio->bi_integrity, _iter) -int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len, u32 seed); +int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed); void bio_integrity_unmap_free_user(struct bio *bio); extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int); extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int); @@ -804,11 +804,13 @@ static inline int bio_integrity_add_page(struct bio *bio, struct page *page, return 0; } -static inline int 
bio_integrity_map_user(struct bio *bio, void __user *ubuf, - ssize_t len, u32 seed) +static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, + u32 seed) + { return -EINVAL; } + static inline void bio_integrity_unmap_free_user(struct bio *bio) { } -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument
  2024-06-26 10:06   ` [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
@ 2024-06-27  6:23     ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:23 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH v2 07/10] block: define meta io descriptor
       [not found] ` <CGME20240626101523epcas5p2616cf568575685bd251d28fc1398d4cd@epcas5p2.samsung.com>
@ 2024-06-26 10:06   ` Anuj Gupta
  2024-06-27  6:22     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi

From: Kanchan Joshi <[email protected]>

Introduce a new 'uio_meta' structure that upper layers can use to pass
meta/integrity information. This is a prep patch.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/linux/bio.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/bio.h b/include/linux/bio.h
index c90168274188..966e22a04996 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -332,6 +332,12 @@ enum bip_flags {
 	BIP_CLONED = 1 << 7, /* Indicates that bip is cloned */
 };
 
+struct uio_meta {
+	u16 flags;
+	u16 apptag;
+	struct iov_iter iter;
+};
+
 /*
  * bio integrity payload
  */
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 07/10] block: define meta io descriptor
  2024-06-26 10:06   ` [PATCH v2 07/10] block: define meta io descriptor Anuj Gupta
@ 2024-06-27  6:22     ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:22 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

> +struct uio_meta {
> +	u16 flags;
> +	u16 apptag;
> +	struct iov_iter iter;
> +};

Everything else in the kernel uses app_tag instead of apptag, maybe
follow that here.

What flags go into flags?  Should this be a __bitwise type?

Also bio.h is used by every file system and all block drivers.  Should
this be in bio-integrity.h instead?

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write [not found] ` <CGME20240626101525epcas5p4dbcef84714e4e9214b951fe2ff649521@epcas5p4.samsung.com> @ 2024-06-26 10:06 ` Anuj Gupta 2024-06-26 17:17 ` Gabriel Krisman Bertazi 0 siblings, 1 reply; 25+ messages in thread From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw) To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen Cc: io-uring, linux-nvme, linux-block, Anuj Gupta, Kanchan Joshi This patch adds the capability of sending meta along with read/write. This meta is represented by a newly introduced 'struct io_uring_meta' which specifies information such as meta type/flags/buffer/length and apptag. Application sets up a SQE128 ring, prepares io_uring_meta within the SQE at offset pointed by sqe->cmd. The patch processes the user-passed information to prepare uio_meta descriptor and passes it down using kiocb->private. Meta exchange is supported only for direct IO. Also vectored read/write operations with meta are not supported currently. 
Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- include/linux/fs.h | 1 + include/uapi/linux/io_uring.h | 30 +++++++++++++++- io_uring/io_uring.c | 7 ++++ io_uring/rw.c | 68 +++++++++++++++++++++++++++++++++-- io_uring/rw.h | 9 ++++- 5 files changed, 110 insertions(+), 5 deletions(-) diff --git a/include/linux/fs.h b/include/linux/fs.h index db26b4a70c62..0132565288c2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -330,6 +330,7 @@ struct readahead_control; #define IOCB_NOIO (1 << 20) /* can use bio alloc cache */ #define IOCB_ALLOC_CACHE (1 << 21) +#define IOCB_HAS_META (1 << 22) /* * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the * iocb completion can be passed back to the owner for execution from a safe diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 2aaf7ee256ac..9140c66b315b 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -101,12 +101,40 @@ struct io_uring_sqe { __u64 optval; /* * If the ring is initialized with IORING_SETUP_SQE128, then - * this field is used for 80 bytes of arbitrary command data + * this field is starting offset for 80 bytes of data. + * This data is opaque for uring command op. And for meta io, + * this contains 'struct io_uring_meta'. 
*/ __u8 cmd[0]; }; }; +enum io_uring_sqe_meta_type_bits { + META_TYPE_INTEGRITY_BIT, + /* not a real meta type; just to make sure that we don't overflow */ + META_TYPE_LAST_BIT, +}; + +/* meta type flags */ +#define META_TYPE_INTEGRITY (1U << META_TYPE_INTEGRITY_BIT) + +struct io_uring_meta { + __u16 meta_type; + __u16 meta_flags; + __u32 meta_len; + __u64 meta_addr; + /* the next 64 bytes goes to SQE128 */ + __u16 apptag; + __u8 pad[62]; +}; + +/* + * flags for integrity meta + */ +#define INTEGRITY_CHK_GUARD (1U << 0) /* enforce guard check */ +#define INTEGRITY_CHK_APPTAG (1U << 1) /* enforce app tag check */ +#define INTEGRITY_CHK_REFTAG (1U << 2) /* enforce ref tag check */ + /* * If sqe->file_index is set to this for opcodes that instantiate a new * direct descriptor (like openat/openat2/accept), then io_uring will allocate diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 7ed1e009aaec..0d26ee1193ca 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3704,6 +3704,13 @@ static int __init io_uring_init(void) /* top 8bits are for internal use */ BUILD_BUG_ON((IORING_URING_CMD_MASK & 0xff000000) != 0); + BUILD_BUG_ON(sizeof(struct io_uring_meta) > + 2 * sizeof(struct io_uring_sqe) - + offsetof(struct io_uring_sqe, cmd)); + + BUILD_BUG_ON(META_TYPE_LAST_BIT > + 8 * sizeof_field(struct io_uring_meta, meta_type)); + io_uring_optable_init(); /* diff --git a/io_uring/rw.c b/io_uring/rw.c index c004d21e2f12..e8f5b5af4d2f 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -23,6 +23,8 @@ #include "poll.h" #include "rw.h" +#define INTEGRITY_VALID_FLAGS (INTEGRITY_CHK_GUARD | INTEGRITY_CHK_APPTAG | \ + INTEGRITY_CHK_REFTAG) struct io_rw { /* NOTE: kiocb has the file as the first member, so don't do it here */ struct kiocb kiocb; @@ -247,6 +249,42 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import) return 0; } +static int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe, + struct io_rw *rw, int ddir) +{ + 
const struct io_uring_meta *md = (struct io_uring_meta *)sqe->cmd; + u16 meta_type = READ_ONCE(md->meta_type); + const struct io_issue_def *def; + struct io_async_rw *io; + int ret; + + if (!meta_type) + return 0; + if (!(meta_type & META_TYPE_INTEGRITY)) + return -EINVAL; + + /* should fit into two bytes */ + BUILD_BUG_ON(INTEGRITY_VALID_FLAGS >= (1 << 16)); + + def = &io_issue_defs[req->opcode]; + if (def->vectored) + return -EOPNOTSUPP; + + io = req->async_data; + io->meta.flags = READ_ONCE(md->meta_flags); + if (io->meta.flags & ~INTEGRITY_VALID_FLAGS) + return -EINVAL; + + io->meta.apptag = READ_ONCE(md->apptag); + ret = import_ubuf(ddir, u64_to_user_ptr(READ_ONCE(md->meta_addr)), + READ_ONCE(md->meta_len), &io->meta.iter); + if (unlikely(ret < 0)) + return ret; + rw->kiocb.ki_flags |= IOCB_HAS_META; + iov_iter_save_state(&io->meta.iter, &io->iter_meta_state); + return ret; +} + static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, int ddir, bool do_import) { @@ -269,11 +307,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, rw->kiocb.ki_ioprio = get_current_ioprio(); } rw->kiocb.dio_complete = NULL; + rw->kiocb.ki_flags = 0; rw->addr = READ_ONCE(sqe->addr); rw->len = READ_ONCE(sqe->len); rw->flags = READ_ONCE(sqe->rw_flags); - return io_prep_rw_setup(req, ddir, do_import); + ret = io_prep_rw_setup(req, ddir, do_import); + + if (unlikely(req->ctx->flags & IORING_SETUP_SQE128 && !ret)) + ret = io_prep_rw_meta(req, sqe, rw, ddir); + return ret; } int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe) @@ -400,7 +443,10 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req) static void io_resubmit_prep(struct io_kiocb *req) { struct io_async_rw *io = req->async_data; + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); + if (unlikely(rw->kiocb.ki_flags & IOCB_HAS_META)) + iov_iter_restore(&io->meta.iter, &io->iter_meta_state); iov_iter_restore(&io->iter, &io->iter_state); } @@ 
-768,8 +814,12 @@ static inline int io_iter_do_read(struct io_rw *rw, struct iov_iter *iter) static bool need_complete_io(struct io_kiocb *req) { + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); + + /* Exclude meta IO as we don't support partial completion for that */ return req->flags & REQ_F_ISREG || - S_ISBLK(file_inode(req->file)->i_mode); + S_ISBLK(file_inode(req->file)->i_mode) || + !(rw->kiocb.ki_flags & IOCB_HAS_META); } static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type) @@ -786,7 +836,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type) if (!(req->flags & REQ_F_FIXED_FILE)) req->flags |= io_file_get_flags(file); - kiocb->ki_flags = file->f_iocb_flags; + kiocb->ki_flags |= file->f_iocb_flags; ret = kiocb_set_rw_flags(kiocb, rw->flags, rw_type); if (unlikely(ret)) return ret; @@ -815,6 +865,14 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type) kiocb->ki_complete = io_complete_rw; } + if (unlikely(kiocb->ki_flags & IOCB_HAS_META)) { + struct io_async_rw *io = req->async_data; + + if (!(req->file->f_flags & O_DIRECT)) + return -EOPNOTSUPP; + kiocb->private = &io->meta; + } + return 0; } @@ -881,6 +939,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags) * manually if we need to. 
*/ iov_iter_restore(&io->iter, &io->iter_state); + if (unlikely(kiocb->ki_flags & IOCB_HAS_META)) + iov_iter_restore(&io->meta.iter, &io->iter_meta_state); do { /* @@ -1091,6 +1151,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags) } else { ret_eagain: iov_iter_restore(&io->iter, &io->iter_state); + if (unlikely(kiocb->ki_flags & IOCB_HAS_META)) + iov_iter_restore(&io->meta.iter, &io->iter_meta_state); if (kiocb->ki_flags & IOCB_WRITE) io_req_end_write(req); return -EAGAIN; diff --git a/io_uring/rw.h b/io_uring/rw.h index 3f432dc75441..49944b539c51 100644 --- a/io_uring/rw.h +++ b/io_uring/rw.h @@ -9,7 +9,14 @@ struct io_async_rw { struct iovec fast_iov; struct iovec *free_iovec; int free_iov_nr; - struct wait_page_queue wpq; + /* wpq is for buffered io, while meta fields are used with direct io*/ + union { + struct wait_page_queue wpq; + struct { + struct uio_meta meta; + struct iov_iter_state iter_meta_state; + }; + }; }; int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe); -- 2.25.1 ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write
  2024-06-26 10:06   ` [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write Anuj Gupta
@ 2024-06-26 17:17     ` Gabriel Krisman Bertazi
  2024-07-01 14:09       ` Anuj gupta
  0 siblings, 1 reply; 25+ messages in thread
From: Gabriel Krisman Bertazi @ 2024-06-26 17:17 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

Anuj Gupta <[email protected]> writes:

> This patch adds the capability of sending meta along with read/write.
> This meta is represented by a newly introduced 'struct io_uring_meta'
> which specifies information such as meta type/flags/buffer/length and
> apptag.
> Application sets up a SQE128 ring, prepares io_uring_meta within the SQE
> at offset pointed by sqe->cmd.
> The patch processes the user-passed information to prepare uio_meta
> descriptor and passes it down using kiocb->private.
>
> Meta exchange is supported only for direct IO.
> Also vectored read/write operations with meta are not supported
> currently.
> > Signed-off-by: Anuj Gupta <[email protected]> > Signed-off-by: Kanchan Joshi <[email protected]> > --- > include/linux/fs.h | 1 + > include/uapi/linux/io_uring.h | 30 +++++++++++++++- > io_uring/io_uring.c | 7 ++++ > io_uring/rw.c | 68 +++++++++++++++++++++++++++++++++-- > io_uring/rw.h | 9 ++++- > 5 files changed, 110 insertions(+), 5 deletions(-) > > diff --git a/include/linux/fs.h b/include/linux/fs.h > index db26b4a70c62..0132565288c2 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -330,6 +330,7 @@ struct readahead_control; > #define IOCB_NOIO (1 << 20) > /* can use bio alloc cache */ > #define IOCB_ALLOC_CACHE (1 << 21) > +#define IOCB_HAS_META (1 << 22) > /* > * IOCB_DIO_CALLER_COMP can be set by the iocb owner, to indicate that the > * iocb completion can be passed back to the owner for execution from a safe > diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h > index 2aaf7ee256ac..9140c66b315b 100644 > --- a/include/uapi/linux/io_uring.h > +++ b/include/uapi/linux/io_uring.h > @@ -101,12 +101,40 @@ struct io_uring_sqe { > __u64 optval; > /* > * If the ring is initialized with IORING_SETUP_SQE128, then > - * this field is used for 80 bytes of arbitrary command data > + * this field is starting offset for 80 bytes of data. > + * This data is opaque for uring command op. And for meta io, > + * this contains 'struct io_uring_meta'. 
> */ > __u8 cmd[0]; > }; > }; > > +enum io_uring_sqe_meta_type_bits { > + META_TYPE_INTEGRITY_BIT, > + /* not a real meta type; just to make sure that we don't overflow */ > + META_TYPE_LAST_BIT, > +}; > + > +/* meta type flags */ > +#define META_TYPE_INTEGRITY (1U << META_TYPE_INTEGRITY_BIT) > + > +struct io_uring_meta { > + __u16 meta_type; > + __u16 meta_flags; > + __u32 meta_len; > + __u64 meta_addr; > + /* the next 64 bytes goes to SQE128 */ > + __u16 apptag; > + __u8 pad[62]; > +}; > + > +/* > + * flags for integrity meta > + */ > +#define INTEGRITY_CHK_GUARD (1U << 0) /* enforce guard check */ > +#define INTEGRITY_CHK_APPTAG (1U << 1) /* enforce app tag check */ > +#define INTEGRITY_CHK_REFTAG (1U << 2) /* enforce ref tag check */ > + > /* > * If sqe->file_index is set to this for opcodes that instantiate a new > * direct descriptor (like openat/openat2/accept), then io_uring will allocate > diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c > index 7ed1e009aaec..0d26ee1193ca 100644 > --- a/io_uring/io_uring.c > +++ b/io_uring/io_uring.c > @@ -3704,6 +3704,13 @@ static int __init io_uring_init(void) > /* top 8bits are for internal use */ > BUILD_BUG_ON((IORING_URING_CMD_MASK & 0xff000000) != 0); > > + BUILD_BUG_ON(sizeof(struct io_uring_meta) > > + 2 * sizeof(struct io_uring_sqe) - > + offsetof(struct io_uring_sqe, cmd)); > + > + BUILD_BUG_ON(META_TYPE_LAST_BIT > > + 8 * sizeof_field(struct io_uring_meta, meta_type)); > + > io_uring_optable_init(); > > /* > diff --git a/io_uring/rw.c b/io_uring/rw.c > index c004d21e2f12..e8f5b5af4d2f 100644 > --- a/io_uring/rw.c > +++ b/io_uring/rw.c > @@ -23,6 +23,8 @@ > #include "poll.h" > #include "rw.h" > > +#define INTEGRITY_VALID_FLAGS (INTEGRITY_CHK_GUARD | INTEGRITY_CHK_APPTAG | \ > + INTEGRITY_CHK_REFTAG) > struct io_rw { > /* NOTE: kiocb has the file as the first member, so don't do it here */ > struct kiocb kiocb; > @@ -247,6 +249,42 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool 
do_import) > return 0; > } > > +static int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe, > + struct io_rw *rw, int ddir) > +{ > + const struct io_uring_meta *md = (struct io_uring_meta *)sqe->cmd; > + u16 meta_type = READ_ONCE(md->meta_type); > + const struct io_issue_def *def; > + struct io_async_rw *io; > + int ret; > + > + if (!meta_type) > + return 0; > + if (!(meta_type & META_TYPE_INTEGRITY)) > + return -EINVAL; > + > + /* should fit into two bytes */ > + BUILD_BUG_ON(INTEGRITY_VALID_FLAGS >= (1 << 16)); > + > + def = &io_issue_defs[req->opcode]; > + if (def->vectored) > + return -EOPNOTSUPP; > + > + io = req->async_data; > + io->meta.flags = READ_ONCE(md->meta_flags); > + if (io->meta.flags & ~INTEGRITY_VALID_FLAGS) > + return -EINVAL; > + > + io->meta.apptag = READ_ONCE(md->apptag); > + ret = import_ubuf(ddir, u64_to_user_ptr(READ_ONCE(md->meta_addr)), > + READ_ONCE(md->meta_len), &io->meta.iter); > + if (unlikely(ret < 0)) > + return ret; > + rw->kiocb.ki_flags |= IOCB_HAS_META; > + iov_iter_save_state(&io->meta.iter, &io->iter_meta_state); > + return ret; > +} > + > static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, > int ddir, bool do_import) > { > @@ -269,11 +307,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, > rw->kiocb.ki_ioprio = get_current_ioprio(); > } > rw->kiocb.dio_complete = NULL; > + rw->kiocb.ki_flags = 0; > > rw->addr = READ_ONCE(sqe->addr); > rw->len = READ_ONCE(sqe->len); > rw->flags = READ_ONCE(sqe->rw_flags); > - return io_prep_rw_setup(req, ddir, do_import); > + ret = io_prep_rw_setup(req, ddir, do_import); > + > + if (unlikely(req->ctx->flags & IORING_SETUP_SQE128 && !ret)) > + ret = io_prep_rw_meta(req, sqe, rw, ddir); > + return ret; Would it be useful to have a flag to differentiate a malformed SQE from a SQE with io_uring_meta, instead of assuming sqe->cmd has it? 
We don't check for addr3 at the moment and, differently from uring_cmd,
userspace will be mixing write commands with and without metadata to
different files, so it would be useful to catch that.

Also, just styling, but can you turn that !ret into a separate if leg?
We are bound to add more code here eventually, and the next patch to
touch it will end up doing it anyway.  I mean:

	ret = io_prep_rw_setup(req, ddir, do_import);
	if (unlikely(ret))
		return ret;
	if (unlikely(req->ctx->flags & IORING_SETUP_SQE128))
		ret = io_prep_rw_meta(req, sqe, rw, ddir);
	return ret;

-- 
Gabriel Krisman Bertazi

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write
  2024-06-26 17:17     ` Gabriel Krisman Bertazi
@ 2024-07-01 14:09       ` Anuj gupta
  0 siblings, 0 replies; 25+ messages in thread
From: Anuj gupta @ 2024-07-01 14:09 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi
  Cc: Anuj Gupta, asml.silence, mpatocka, axboe, hch, kbusch,
	martin.petersen, io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 10:47 PM Gabriel Krisman Bertazi
<[email protected]> wrote:
>
> Anuj Gupta <[email protected]> writes:
>
> >  static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> >  		      int ddir, bool do_import)
> >  {
> > @@ -269,11 +307,16 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
> >  		rw->kiocb.ki_ioprio = get_current_ioprio();
> >  	}
> >  	rw->kiocb.dio_complete = NULL;
> > +	rw->kiocb.ki_flags = 0;
> >
> >  	rw->addr = READ_ONCE(sqe->addr);
> >  	rw->len = READ_ONCE(sqe->len);
> >  	rw->flags = READ_ONCE(sqe->rw_flags);
> > -	return io_prep_rw_setup(req, ddir, do_import);
> > +	ret = io_prep_rw_setup(req, ddir, do_import);
> > +
> > +	if (unlikely(req->ctx->flags & IORING_SETUP_SQE128 && !ret))
> > +		ret = io_prep_rw_meta(req, sqe, rw, ddir);
> > +	return ret;
>
> Would it be useful to have a flag to differentiate a malformed SQE from
> a SQE with io_uring_meta, instead of assuming sqe->cmd has it?  We don't
> check for addr3 at the moment and, differently from uring_cmd, userspace
> will be mixing write commands with and without metadata to different
> files, so it would be useful to catch that.
>
Yes, but I couldn't find a good place to keep that flag.
sqe->rw_flags are RWF flags and are meant for generic read/write.
sqe->flags are generic io_uring flags and are not opcode specific.
Do you see a place where this flag could fit in?

> --
> Gabriel Krisman Bertazi
>
--
Anuj Gupta

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH v2 09/10] block: add support to pass user meta buffer [not found] ` <CGME20240626101527epcas5p23e10a6701f552d16bd6a999418009ba0@epcas5p2.samsung.com> @ 2024-06-26 10:06 ` Anuj Gupta 0 siblings, 0 replies; 25+ messages in thread From: Anuj Gupta @ 2024-06-26 10:06 UTC (permalink / raw) To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi, Anuj Gupta From: Kanchan Joshi <[email protected]> If iocb contains the meta, extract that and prepare the bip. Extend bip so that can it can carry three new integrity-check flags and application tag. Make sure that ->prepare_fn and ->complete_fn are skipped for user-owned meta buffer. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- block/bio-integrity.c | 44 +++++++++++++++++++++++++++++++++++++++++++ block/fops.c | 28 ++++++++++++++++++++++++++- block/t10-pi.c | 6 ++++++ include/linux/bio.h | 10 ++++++++++ 4 files changed, 87 insertions(+), 1 deletion(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 38418be07139..599f39999174 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -12,6 +12,7 @@ #include <linux/bio.h> #include <linux/workqueue.h> #include <linux/slab.h> +#include <uapi/linux/io_uring.h> #include "blk.h" static struct kmem_cache *bip_slab; @@ -337,6 +338,49 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages, return nr_bvecs; } +static void bio_uio_meta_to_bip(struct bio *bio, struct uio_meta *meta) +{ + struct bio_integrity_payload *bip = bio_integrity(bio); + u16 bip_flags = 0; + + if (meta->flags & INTEGRITY_CHK_GUARD) + bip_flags |= BIP_USER_CHK_GUARD; + if (meta->flags & INTEGRITY_CHK_APPTAG) + bip_flags |= BIP_USER_CHK_APPTAG; + if (meta->flags & INTEGRITY_CHK_REFTAG) + bip_flags |= BIP_USER_CHK_REFTAG; + + bip->bip_flags |= bip_flags; + bip->apptag = meta->apptag; +} + +int bio_integrity_map_iter(struct bio *bio, struct uio_meta 
 *meta)
+{
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+	unsigned int integrity_bytes;
+	int ret;
+	struct iov_iter it;
+
+	if (!bi)
+		return -EINVAL;
+	/*
+	 * original meta iterator can be bigger.
+	 * process integrity info corresponding to current data buffer only.
+	 */
+	it = meta->iter;
+	integrity_bytes = bio_integrity_bytes(bi, bio_sectors(bio));
+	if (it.count < integrity_bytes)
+		return -EINVAL;
+
+	it.count = integrity_bytes;
+	ret = bio_integrity_map_user(bio, &it, 0);
+	if (!ret) {
+		bio_uio_meta_to_bip(bio, meta);
+		iov_iter_advance(&meta->iter, integrity_bytes);
+	}
+	return ret;
+}
+
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed)
 {
diff --git a/block/fops.c b/block/fops.c
index be36c9fbd500..6477424b4ebc 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -126,12 +126,13 @@ static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
 	bool should_dirty = dio->flags & DIO_SHOULD_DIRTY;
+	bool is_async = !(dio->flags & DIO_IS_SYNC);
 
 	if (bio->bi_status && !dio->bio.bi_status)
 		dio->bio.bi_status = bio->bi_status;
 
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (!(dio->flags & DIO_IS_SYNC)) {
+		if (is_async) {
 			struct kiocb *iocb = dio->iocb;
 			ssize_t ret;
@@ -154,6 +155,9 @@ static void blkdev_bio_end_io(struct bio *bio)
 		}
 	}
 
+	if (is_async && (dio->iocb->ki_flags & IOCB_HAS_META))
+		bio_integrity_unmap_free_user(bio);
+
 	if (should_dirty) {
 		bio_check_pages_dirty(bio);
 	} else {
@@ -231,6 +235,16 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 		}
 		bio->bi_opf |= REQ_NOWAIT;
 	}
+	if (!is_sync && unlikely(iocb->ki_flags & IOCB_HAS_META)) {
+		ret = bio_integrity_map_iter(bio, iocb->private);
+		if (unlikely(ret)) {
+			bio_release_pages(bio, false);
+			bio_clear_flag(bio, BIO_REFFED);
+			bio_put(bio);
+			blk_finish_plug(&plug);
+			return ret;
+		}
+	}
 
 	if (is_read) {
 		if (dio->flags & DIO_SHOULD_DIRTY)
@@ -288,6 +302,9 @@ static void blkdev_bio_end_io_async(struct bio *bio)
 		ret = blk_status_to_errno(bio->bi_status);
 	}
 
+	if (unlikely(iocb->ki_flags & IOCB_HAS_META))
+		bio_integrity_unmap_free_user(bio);
+
 	iocb->ki_complete(iocb, ret);
 
 	if (dio->flags & DIO_SHOULD_DIRTY) {
@@ -348,6 +365,15 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		task_io_account_write(bio->bi_iter.bi_size);
 	}
 
+	if (unlikely(iocb->ki_flags & IOCB_HAS_META)) {
+		ret = bio_integrity_map_iter(bio, iocb->private);
+		WRITE_ONCE(iocb->private, NULL);
+		if (unlikely(ret)) {
+			bio_put(bio);
+			return ret;
+		}
+	}
+
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio->bi_opf |= REQ_ATOMIC;
diff --git a/block/t10-pi.c b/block/t10-pi.c
index cd7fa60d63ff..38c3da245b11 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -131,6 +131,8 @@ static void t10_pi_type1_prepare(struct request *rq)
 			/* Already remapped? */
 			if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
 				break;
+			if (bip->bip_flags & BIP_INTEGRITY_USER)
+				break;
 
 			bip_for_each_vec(iv, bip, iter) {
 				unsigned int j;
@@ -180,6 +182,8 @@ static void t10_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
 		struct bio_vec iv;
 		struct bvec_iter iter;
 
+		if (bip->bip_flags & BIP_INTEGRITY_USER)
+			break;
 		bip_for_each_vec(iv, bip, iter) {
 			unsigned int j;
 			void *p;
@@ -305,6 +309,8 @@ static void ext_pi_type1_prepare(struct request *rq)
 			/* Already remapped? */
 			if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
 				break;
+			if (bip->bip_flags & BIP_INTEGRITY_USER)
+				break;
 
 			bip_for_each_vec(iv, bip, iter) {
 				unsigned int j;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 966e22a04996..ff22b627906d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -330,6 +330,9 @@ enum bip_flags {
 	BIP_INTEGRITY_USER	= 1 << 5, /* Integrity payload is user address */
 	BIP_COPY_USER		= 1 << 6, /* Kernel bounce buffer in use */
 	BIP_CLONED		= 1 << 7, /* Indicates that bip is cloned */
+	BIP_USER_CHK_GUARD	= 1 << 8,
+	BIP_USER_CHK_APPTAG	= 1 << 9,
+	BIP_USER_CHK_REFTAG	= 1 << 10,
 };
 
 struct uio_meta {
@@ -349,6 +352,7 @@ struct bio_integrity_payload {
 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
 	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
 	unsigned short		bip_flags;	/* control flags */
+	u16			apptag;		/* apptag */
 
 	struct bvec_iter	bio_iter;	/* for rewinding parent bio */
@@ -738,6 +742,7 @@ static inline bool bioset_initialized(struct bio_set *bs)
 	bip_for_each_vec(_bvl, _bio->bi_integrity, _iter)
 
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed);
+int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta);
 void bio_integrity_unmap_free_user(struct bio *bio);
 extern struct bio_integrity_payload *bio_integrity_alloc(struct bio *, gfp_t, unsigned int);
 extern int bio_integrity_add_page(struct bio *, struct page *, unsigned int, unsigned int);
@@ -817,6 +822,11 @@ static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter,
 	return -EINVAL;
 }
 
+static inline int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta)
+{
+	return -EINVAL;
+}
+
 static inline void bio_integrity_unmap_free_user(struct bio *bio)
 {
 }
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
[parent not found: <CGME20240626101529epcas5p49976c46701337830c400cefd8f074b40@epcas5p4.samsung.com>]
* [PATCH v2 10/10] nvme: add handling for user integrity buffer
  [not found] ` <CGME20240626101529epcas5p49976c46701337830c400cefd8f074b40@epcas5p4.samsung.com>
@ 2024-06-26 10:07   ` Anuj Gupta
  2024-06-27  6:29     ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Anuj Gupta @ 2024-06-26 10:07 UTC (permalink / raw)
  To: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen
  Cc: io-uring, linux-nvme, linux-block, Kanchan Joshi

From: Kanchan Joshi <[email protected]>

Create a new helper that contains the handling for both kernel and user
generated integrity buffer.
For user provided integrity buffer, convert bip flags
(guard/reftag/apptag checks) to protocol specific flags. Also pass
apptag and reftag down.

Signed-off-by: Kanchan Joshi <[email protected]>
---
 drivers/nvme/host/core.c | 85 ++++++++++++++++++++++++++++------------
 1 file changed, 60 insertions(+), 25 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ab0429644fe3..d17428a2b1dd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -870,6 +870,13 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	return BLK_STS_OK;
 }
 
+static void nvme_set_app_tag(struct nvme_command *cmnd, u16 apptag)
+{
+	cmnd->rw.apptag = cpu_to_le16(apptag);
+	/* use 0xfff as mask so that apptag is used in entirety*/
+	cmnd->rw.appmask = cpu_to_le16(0xffff);
+}
+
 static void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
			     struct request *req)
 {
@@ -927,6 +934,55 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 	return BLK_STS_OK;
 }
 
+static blk_status_t nvme_setup_rw_meta(struct nvme_ns *ns, struct request *req,
+				       struct nvme_command *cmnd, u16 *control,
+				       enum nvme_opcode op)
+{
+	struct bio_integrity_payload *bip = bio_integrity(req->bio);
+
+	if (!bip || !(bip->bip_flags & BIP_INTEGRITY_USER)) {
+		/*
+		 * If formated with metadata, the block layer always provides a
+		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled. Else
+		 * we enable the PRACT bit for protection information or set the
+		 * namespace capacity to zero to prevent any I/O.
+		 */
+		if (!blk_integrity_rq(req)) {
+			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
+				return BLK_STS_NOTSUPP;
+			*control |= NVME_RW_PRINFO_PRACT;
+		}
+
+		switch (ns->head->pi_type) {
+		case NVME_NS_DPS_PI_TYPE3:
+			*control |= NVME_RW_PRINFO_PRCHK_GUARD;
+			break;
+		case NVME_NS_DPS_PI_TYPE1:
+		case NVME_NS_DPS_PI_TYPE2:
+			*control |= NVME_RW_PRINFO_PRCHK_GUARD |
+					NVME_RW_PRINFO_PRCHK_REF;
+			if (op == nvme_cmd_zone_append)
+				*control |= NVME_RW_APPEND_PIREMAP;
+			nvme_set_ref_tag(ns, cmnd, req);
+			break;
+		}
+	} else {
+		unsigned short bip_flags = bip->bip_flags;
+
+		if (bip_flags & BIP_USER_CHK_GUARD)
+			*control |= NVME_RW_PRINFO_PRCHK_GUARD;
+		if (bip_flags & BIP_USER_CHK_REFTAG) {
+			*control |= NVME_RW_PRINFO_PRCHK_REF;
+			nvme_set_ref_tag(ns, cmnd, req);
+		}
+		if (bip_flags & BIP_USER_CHK_APPTAG) {
+			*control |= NVME_RW_PRINFO_PRCHK_APP;
+			nvme_set_app_tag(cmnd, bip->apptag);
+		}
+	}
+	return 0;
+}
+
 /*
  * NVMe does not support a dedicated command to issue an atomic write. A write
  * which does adhere to the device atomic limits will silently be executed
@@ -963,6 +1019,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 {
 	u16 control = 0;
 	u32 dsmgmt = 0;
+	blk_status_t ret;
 
 	if (req->cmd_flags & REQ_FUA)
 		control |= NVME_RW_FUA;
@@ -990,31 +1047,9 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	cmnd->rw.appmask = 0;
 
 	if (ns->head->ms) {
-		/*
-		 * If formated with metadata, the block layer always provides a
-		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled. Else
-		 * we enable the PRACT bit for protection information or set the
-		 * namespace capacity to zero to prevent any I/O.
-		 */
-		if (!blk_integrity_rq(req)) {
-			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
-				return BLK_STS_NOTSUPP;
-			control |= NVME_RW_PRINFO_PRACT;
-		}
-
-		switch (ns->head->pi_type) {
-		case NVME_NS_DPS_PI_TYPE3:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD;
-			break;
-		case NVME_NS_DPS_PI_TYPE1:
-		case NVME_NS_DPS_PI_TYPE2:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD |
-					NVME_RW_PRINFO_PRCHK_REF;
-			if (op == nvme_cmd_zone_append)
-				control |= NVME_RW_APPEND_PIREMAP;
-			nvme_set_ref_tag(ns, cmnd, req);
-			break;
-		}
+		ret = nvme_setup_rw_meta(ns, req, cmnd, &control, op);
+		if (unlikely(ret))
+			return ret;
 	}
 
 	cmnd->rw.control = cpu_to_le16(control);
-- 
2.25.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 10/10] nvme: add handling for user integrity buffer
  2024-06-26 10:07   ` [PATCH v2 10/10] nvme: add handling for user integrity buffer Anuj Gupta
@ 2024-06-27  6:29     ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:29 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block, Kanchan Joshi

On Wed, Jun 26, 2024 at 03:37:00PM +0530, Anuj Gupta wrote:
> From: Kanchan Joshi <[email protected]>
> 
> Create a new helper that contains the handling for both kernel and user
> generated integrity buffer.
> For user provided integrity buffer, convert bip flags
> (guard/reftag/apptag checks) to protocol specific flags. Also pass
> apptag and reftag down.

The driver should not have to know about the source.

> +static void nvme_set_app_tag(struct nvme_command *cmnd, u16 apptag)
> +{
> +	cmnd->rw.apptag = cpu_to_le16(apptag);
> +	/* use 0xfff as mask so that apptag is used in entirety*/

missing space before the closing comment.  But I think this also
makes sense as:

	/* use the entire application tag */

> +	if (!bip || !(bip->bip_flags & BIP_INTEGRITY_USER)) {
> +		/*
> +		 * If formated with metadata, the block layer always provides a
> +		 * metadata buffer if CONFIG_BLK_DEV_INTEGRITY is enabled. Else
> +		 * we enable the PRACT bit for protection information or set the
> +		 * namespace capacity to zero to prevent any I/O.
> +		 */
> +		if (!blk_integrity_rq(req)) {
> +			if (WARN_ON_ONCE(!nvme_ns_has_pi(ns->head)))
> +				return BLK_STS_NOTSUPP;
> +			*control |= NVME_RW_PRINFO_PRACT;
> +		}

This feels like the wrong level of conditionals.  !bip implies
!blk_integrity_rq(req) already.

> +	} else {
> +		unsigned short bip_flags = bip->bip_flags;
> +
> +		if (bip_flags & BIP_USER_CHK_GUARD)
> +			*control |= NVME_RW_PRINFO_PRCHK_GUARD;
> +		if (bip_flags & BIP_USER_CHK_REFTAG) {
> +			*control |= NVME_RW_PRINFO_PRCHK_REF;
> +			nvme_set_ref_tag(ns, cmnd, req);
> +		}
> +		if (bip_flags & BIP_USER_CHK_APPTAG) {
> +			*control |= NVME_RW_PRINFO_PRCHK_APP;
> +			nvme_set_app_tag(cmnd, bip->apptag);
> +		}

But except for that, the driver should always rely on the actual flags
passed by the block layer instead of having to see if it is user
passthrough data.

Also it seems like this series fails to update the SCSI code to account
for these new flags.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 00/10] Read/Write with meta/integrity
  2024-06-26 10:06 ` [PATCH v2 00/10] Read/Write with meta/integrity Anuj Gupta
                     ` (9 preceding siblings ...)
  [not found] ` <CGME20240626101529epcas5p49976c46701337830c400cefd8f074b40@epcas5p4.samsung.com>
@ 2024-06-27  6:05 ` Christoph Hellwig
  2024-06-27 19:12   ` Kanchan Joshi
  11 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2024-06-27  6:05 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, hch, kbusch, martin.petersen,
	io-uring, linux-nvme, linux-block

What tree does this apply to?  There's quite a few rejects vs
for-6.11/block.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 00/10] Read/Write with meta/integrity
  2024-06-27  6:05 ` [PATCH v2 00/10] Read/Write with meta/integrity Christoph Hellwig
@ 2024-06-27 19:12   ` Kanchan Joshi
  0 siblings, 0 replies; 25+ messages in thread
From: Kanchan Joshi @ 2024-06-27 19:12 UTC (permalink / raw)
  To: Christoph Hellwig, Anuj Gupta
  Cc: asml.silence, mpatocka, axboe, kbusch, martin.petersen, io-uring,
	linux-nvme, linux-block

On 6/27/2024 11:35 AM, Christoph Hellwig wrote:
> What tree does this apply to?  There's quite a few rejects vs
> for-6.11/block.

Jens for-next. On top of:

commit f078c063b954085cfa185aea2be6a836529d04fc
Author: Christoph Hellwig <[email protected]>
Date:   Mon Jun 24 19:38:35 2024 +0200

    block: fix the blk_queue_nonrot polarity

And as mentioned in cover letter, a tree is available here:
https://github.com/SamsungDS/linux/commits/feat/pi_us_v2/

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: (subset) [PATCH v2 00/10] Read/Write with meta/integrity
  2024-06-26 10:06 ` [PATCH v2 00/10] Read/Write with meta/integrity Anuj Gupta
                     ` (10 preceding siblings ...)
  2024-06-27  6:05 ` [PATCH v2 00/10] Read/Write with meta/integrity Christoph Hellwig
@ 2024-06-28 20:36 ` Jens Axboe
  11 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2024-06-28 20:36 UTC (permalink / raw)
  To: asml.silence, mpatocka, hch, kbusch, martin.petersen, Anuj Gupta
  Cc: io-uring, linux-nvme, linux-block

On Wed, 26 Jun 2024 15:36:50 +0530, Anuj Gupta wrote:
> This adds a new io_uring interface to exchange meta along with read/write.
> 
> Interface:
> Meta information is represented using a newly introduced 'struct io_uring_meta'.
> Application sets up a SQE128 ring, and prepares io_uring_meta within unused
> portion of SQE. Application populates 'struct io_uring_meta' fields as below:
> 
> [...]

Applied, thanks!

[02/10] block: set bip_vcnt correctly
        commit: 3991657ae7074c3c497bf095093178bed37ea1b4

Best regards,
-- 
Jens Axboe

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2024-07-01 14:09 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <CGME20240626101415epcas5p3b06a963aa0b0196d6599fb86c90bc38c@epcas5p3.samsung.com>
2024-06-26 10:06 ` [PATCH v2 00/10] Read/Write with meta/integrity Anuj Gupta
     [not found] ` <CGME20240626101511epcas5p31c95b67da58d408c371c49f8719140fc@epcas5p3.samsung.com>
2024-06-26 10:06   ` [PATCH v2 01/10] block: change rq_integrity_vec to respect the iterator Anuj Gupta
     [not found] ` <CGME20240626101513epcas5p10b3f8470148abb10ce3edfb90814cd94@epcas5p1.samsung.com>
2024-06-26 10:06   ` [PATCH v2 02/10] block: set bip_vcnt correctly Anuj Gupta
2024-06-28  6:04     ` Christoph Hellwig
2024-06-28 20:35     ` Jens Axboe
     [not found] ` <CGME20240626101516epcas5p19fb40e8231d1832cab3d031672f0109e@epcas5p1.samsung.com>
2024-06-26 10:06   ` [PATCH v2 03/10] block: copy bip_max_vcnt vecs instead of bip_vcnt during clone Anuj Gupta
2024-06-27  6:14     ` Christoph Hellwig
     [not found] ` <CGME20240626101518epcas5p17e046bca77b218fc6914ddeb182eb42e@epcas5p1.samsung.com>
2024-06-26 10:06   ` [PATCH v2 04/10] block: Handle meta bounce buffer correctly in case of split Anuj Gupta
2024-06-27  6:16     ` Christoph Hellwig
     [not found] ` <CGME20240626101519epcas5p163b0735c1604a228196f0e8c14773005@epcas5p1.samsung.com>
2024-06-26 10:06   ` [PATCH v2 05/10] block: introduce BIP_CLONED flag Anuj Gupta
2024-06-27  6:21     ` Christoph Hellwig
2024-06-27 12:09       ` Christoph Hellwig
     [not found] ` <CGME20240626101521epcas5p42b0c1c0e123996b199e058bae9a69123@epcas5p4.samsung.com>
2024-06-26 10:06   ` [PATCH v2 06/10] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
2024-06-27  6:23     ` Christoph Hellwig
     [not found] ` <CGME20240626101523epcas5p2616cf568575685bd251d28fc1398d4cd@epcas5p2.samsung.com>
2024-06-26 10:06   ` [PATCH v2 07/10] block: define meta io descriptor Anuj Gupta
2024-06-27  6:22     ` Christoph Hellwig
     [not found] ` <CGME20240626101525epcas5p4dbcef84714e4e9214b951fe2ff649521@epcas5p4.samsung.com>
2024-06-26 10:06   ` [PATCH v2 08/10] io_uring/rw: add support to send meta along with read/write Anuj Gupta
2024-06-26 17:17     ` Gabriel Krisman Bertazi
2024-07-01 14:09       ` Anuj gupta
     [not found] ` <CGME20240626101527epcas5p23e10a6701f552d16bd6a999418009ba0@epcas5p2.samsung.com>
2024-06-26 10:06   ` [PATCH v2 09/10] block: add support to pass user meta buffer Anuj Gupta
     [not found] ` <CGME20240626101529epcas5p49976c46701337830c400cefd8f074b40@epcas5p4.samsung.com>
2024-06-26 10:07   ` [PATCH v2 10/10] nvme: add handling for user integrity buffer Anuj Gupta
2024-06-27  6:29     ` Christoph Hellwig
2024-06-27  6:05 ` [PATCH v2 00/10] Read/Write with meta/integrity Christoph Hellwig
2024-06-27 19:12   ` Kanchan Joshi
2024-06-28 20:36 ` (subset) " Jens Axboe