* [PATCH v4 00/11] Read/Write with meta/integrity [not found] <CGME20241016113705epcas5p1edc284b347de99bc34802b0b0c5e1b27@epcas5p1.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta [not found] ` <CGME20241016113724epcas5p191ccce0f473274fc95934956662fc769@epcas5p1.samsung.com> ` (11 more replies) 0 siblings, 12 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta This adds a new io_uring interface to exchange meta along with read/write. Interface: Meta information is represented using a newly introduced 'struct io_uring_meta'. Application sets up a SQE128 ring, and prepares io_uring_meta within second SQE. Application populates 'struct io_uring_meta' fields as below: * meta_type: describes type of meta that is passed. Currently one type "Integrity" is supported. * meta_flags: these are meta-type specific flags. Three flags are exposed for integrity type, namely BLK_INTEGRITY_CHK_GUARD/APPTAG/REFTAG. * meta_len: length of the meta buffer * meta_addr: address of the meta buffer * seed: seed value for ref tag remapping * app_tag: optional application-specific 16b value; this goes along with INTEGRITY_CHK_APPTAG flag. Block path (direct IO) , NVMe and SCSI driver are modified to support this. Patch 1 is an enhancement patch. Patch 2 is required to make the user metadata split work correctly. Patch 3 to 6 are prep patches. Patch 7 adds the io_uring support. Patch 8 gives us unified interface for user and kernel generated integrity. Patch 9 adds the support for block direct IO, patch 10 for NVMe, and patch 11 for SCSI. In patch 11 for scsi, we added a check to prevent scenarios where refcheck is specified without appcheck and vice-versa, as it is not possible in scsi. However block layer generated integrity doesn't specify appcheck. For drives formatted with type1/type2 PI, block layer would specify refcheck but not appcheck. Hence, these I/O's would fail. Any suggestions how this could be handled? Some of the design choices came from this discussion [1]. Example program on how to use the interface is appended below [2] (It also tests whether reftag remapping happens correctly or not) Tree: https://github.com/SamsungDS/linux/tree/feat/pi_us_v4 Testing: has been done by modifying fio to use this interface. https://github.com/SamsungDS/fio/tree/priv/feat/pi-test-v5 Changes since v3: https://lore.kernel.org/linux-block/[email protected]/ - add reftag seed support (Martin) - fix incorrect formatting in uio_meta (hch) - s/IOCB_HAS_META/IOCB_HAS_METADATA (hch) - move integrity check flags to block layer header (hch) - add comments for BIP_CHECK_GUARD/REFTAG/APPTAG flags (hch) - remove bio_integrity check during completion if IOCB_HAS_METADATA is set (hch) - use goto label to get rid of duplicate error handling (hch) - add warn_on if trying to do sync io with iocb_has_metadata flag (hch) - remove check for disabling reftag remapping (hch) - remove BIP_INTEGRITY_USER flag (hch) - add comment for app_tag field introduced in bio_integrity_payload (hch) - pass request to nvme_set_app_tag function (hch) - right indentation at a place in scsi patch (hch) - move IOCB_HAS_METADATA to a separate fs patch (hch) Changes since v2: https://lore.kernel.org/linux-block/[email protected]/ - io_uring error handling styling (Gabriel) - add documented helper to get metadata bytes from data iter (hch) - during clone specify "what flags to clone" rather than "what not to clone" (hch) - Move uio_meta defination to bio-integrity.h (hch) - Rename apptag field to app_tag (hch) - Change datatype of flags field in uio_meta to bitwise (hch) - Don't introduce BIP_USER_CHK_FOO flags (hch, martin) - Driver should rely on block layer flags instead of seeing if it is user-passthrough (hch) - update the scsi code for handling user-meta (hch, martin) Changes since v1: https://lore.kernel.org/linux-block/[email protected]/ - Do not use new opcode for meta, and also add the provision to introduce new meta types beyond integrity (Pavel) - Stuff IOCB_HAS_META check in need_complete_io (Jens) - Split meta handling in NVMe into a separate handler (Keith) - Add meta handling for __blkdev_direct_IO too (Keith) - Don't inherit BIP_COPY_USER flag for cloned bio's (Christoph) - Better commit descriptions (Christoph) Changes since RFC: - modify io_uring plumbing based on recent async handling state changes - fixes/enhancements to correctly handle the split for meta buffer - add flags to specify guard/reftag/apptag checks - add support to send apptag [1] https://lore.kernel.org/linux-block/[email protected]/ [2] #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <string.h> #include <linux/blkdev.h> #include <linux/io_uring.h> #include <linux/types.h> #include "liburing.h" /* write data/meta. read both. compare. send apptag too. * prerequisite: * protected xfer: format namespace with 4KB + 8b, pi_type = 1 * For testing reftag remapping on device-mapper, create a * device-mapper and run this program. Device mapper creation: * # echo 0 80 linear /dev/nvme0n1 0 > /tmp/table * # echo 80 160 linear /dev/nvme0n1 200 >> /tmp/table * # dmsetup create two /tmp/table * # ./a.out /dev/dm-0 */ #define DATA_LEN 4096 #define META_LEN 8 struct t10_pi_tuple { __be16 guard; __be16 apptag; __be32 reftag; }; int main(int argc, char *argv[]) { struct io_uring ring; struct io_uring_sqe *sqe = NULL; struct io_uring_cqe *cqe = NULL; void *wdb,*rdb; char wmb[META_LEN], rmb[META_LEN]; char *data_str = "data buffer"; int fd, ret, blksize; struct stat fstat; unsigned long long offset = DATA_LEN * 10; struct t10_pi_tuple *pi; struct io_uring_meta *md; if (argc != 2) { fprintf(stderr, "Usage: %s <block-device>", argv[0]); return 1; }; if (stat(argv[1], &fstat) == 0) { blksize = (int)fstat.st_blksize; } else { perror("stat"); return 1; } if (posix_memalign(&wdb, blksize, DATA_LEN)) { perror("posix_memalign failed"); return 1; } if (posix_memalign(&rdb, blksize, DATA_LEN)) { perror("posix_memalign failed"); return 1; } memset(wdb, 0, DATA_LEN); fd = open(argv[1], O_RDWR | O_DIRECT); if (fd < 0) { printf("Error in opening device\n"); return 0; } ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQE128); if (ret) { fprintf(stderr, "ring setup failed: %d\n", ret); return 1; } /* write data + meta-buffer to device */ sqe = io_uring_get_sqe(&ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); return 1; } io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset); md = (struct io_uring_meta *) sqe->big_sqe_cmd; md->meta_type = META_TYPE_INTEGRITY; md->meta_addr = (__u64)wmb; md->meta_len = META_LEN; /* flags to ask for guard/reftag/apptag*/ md->meta_flags = BLK_INTEGRITY_CHK_GUARD | BLK_INTEGRITY_CHK_REFTAG | BLK_INTEGRITY_CHK_APPTAG; md->app_tag = 0x1234; md->seed = 10; pi = (struct t10_pi_tuple *)wmb; pi->guard = 0; pi->reftag = 0x0A000000; pi->apptag = 0x3412; ret = io_uring_submit(&ring); if (ret <= 0) { fprintf(stderr, "sqe submit failed: %d\n", ret); return 1; } ret = io_uring_wait_cqe(&ring, &cqe); if (!cqe) { fprintf(stderr, "cqe is NULL :%d\n", ret); return 1; } if (cqe->res < 0) { fprintf(stderr, "write cqe failure: %d", cqe->res); return 1; } io_uring_cqe_seen(&ring, cqe); /* read data + meta-buffer back from device */ sqe = io_uring_get_sqe(&ring); if (!sqe) { fprintf(stderr, "get sqe failed\n"); return 1; } io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset); md = (struct io_uring_meta *) sqe->big_sqe_cmd; md->meta_type = META_TYPE_INTEGRITY; md->meta_addr = (__u64)rmb; md->meta_len = META_LEN; md->meta_flags = BLK_INTEGRITY_CHK_GUARD | BLK_INTEGRITY_CHK_REFTAG | BLK_INTEGRITY_CHK_APPTAG; md->app_tag = 0x1234; md->seed = 10; ret = io_uring_submit(&ring); if (ret <= 0) { fprintf(stderr, "sqe submit failed: %d\n", ret); return 1; } ret = io_uring_wait_cqe(&ring, &cqe); if (!cqe) { fprintf(stderr, "cqe is NULL :%d\n", ret); return 1; } if (cqe->res < 0) { fprintf(stderr, "read cqe failure: %d", cqe->res); return 1; } pi = (struct t10_pi_tuple *)rmb; if (pi->apptag != 0x3412) printf("Failure: apptag mismatch!\n"); if (pi->reftag != 0x0A000000) printf("Failure: reftag mismatch!\n"); io_uring_cqe_seen(&ring, cqe); pi = (struct t10_pi_tuple *)rmb; if (strncmp(wmb, rmb, META_LEN)) printf("Failure: meta mismatch!, wmb=%s, rmb=%s\n", wmb, rmb); if (strncmp(wdb, rdb, DATA_LEN)) printf("Failure: data mismatch!\n"); io_uring_queue_exit(&ring); free(rdb); free(wdb); return 0; } Anuj Gupta (8): block: define set of integrity flags to be inherited by cloned bip block: copy back bounce buffer to user-space correctly in case of split block: modify bio_integrity_map_user to accept iov_iter as argument fs: introduce IOCB_HAS_METADATA for metadata block: add flags for integrity meta io_uring/rw: add support to send meta along with read/write block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags scsi: add support for user-meta interface Kanchan Joshi (3): block: define meta io descriptor block: add support to pass user meta buffer nvme: add support for passing on the application tag block/bio-integrity.c | 73 +++++++++++++++++++++++++++++----- block/blk-integrity.c | 10 ++++- block/fops.c | 44 +++++++++++++++----- drivers/nvme/host/core.c | 21 ++++++---- drivers/scsi/sd.c | 25 +++++++++++- include/linux/bio-integrity.h | 30 ++++++++++++-- include/linux/fs.h | 1 + include/uapi/linux/blkdev.h | 11 +++++ include/uapi/linux/io_uring.h | 26 ++++++++++++ io_uring/io_uring.c | 6 +++ io_uring/rw.c | 75 +++++++++++++++++++++++++++++++++-- io_uring/rw.h | 15 ++++++- 12 files changed, 300 insertions(+), 37 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113724epcas5p191ccce0f473274fc95934956662fc769@epcas5p1.samsung.com>]
* [PATCH v4 01/11] block: define set of integrity flags to be inherited by cloned bip [not found] ` <CGME20241016113724epcas5p191ccce0f473274fc95934956662fc769@epcas5p1.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-16 18:03 ` Keith Busch 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta Introduce BIP_CLONE_FLAGS describing integrity flags that should be inherited in the cloned bip from the parent. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: Anuj Gupta <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> --- block/bio-integrity.c | 2 +- include/linux/bio-integrity.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 88e3ad73c385..8c41a380f2bd 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -562,7 +562,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src, bip->bip_vec = bip_src->bip_vec; bip->bip_iter = bip_src->bip_iter; - bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY; + bip->bip_flags = bip_src->bip_flags & BIP_CLONE_FLAGS; return 0; } diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h index dd831c269e99..485d8a43017a 100644 --- a/include/linux/bio-integrity.h +++ b/include/linux/bio-integrity.h @@ -30,6 +30,9 @@ struct bio_integrity_payload { struct bio_vec bip_inline_vecs[];/* embedded bvec array */ }; +#define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \ + BIP_DISK_NOCHECK | BIP_IP_CHECKSUM) + #ifdef CONFIG_BLK_DEV_INTEGRITY #define bip_for_each_vec(bvl, bip, iter) \ -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 01/11] block: define set of integrity flags to be inherited by cloned bip 2024-10-16 11:29 ` [PATCH v4 01/11] block: define set of integrity flags to be inherited by cloned bip Anuj Gupta @ 2024-10-16 18:03 ` Keith Busch 0 siblings, 0 replies; 48+ messages in thread From: Keith Busch @ 2024-10-16 18:03 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Wed, Oct 16, 2024 at 04:59:02PM +0530, Anuj Gupta wrote: > Introduce BIP_CLONE_FLAGS describing integrity flags that should be > inherited in the cloned bip from the parent. Looks good. Reviewed-by: Keith Busch <[email protected]> ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113736epcas5p3a03665bf0674e68a8f95bbd5f3607357@epcas5p3.samsung.com>]
* [PATCH v4 02/11] block: copy back bounce buffer to user-space correctly in case of split [not found] ` <CGME20241016113736epcas5p3a03665bf0674e68a8f95bbd5f3607357@epcas5p3.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-16 18:04 ` Keith Busch 2024-10-17 7:53 ` Christoph Hellwig 0 siblings, 2 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta Copy back the bounce buffer to user-space in entirety when the parent bio completes. Signed-off-by: Anuj Gupta <[email protected]> --- block/bio-integrity.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 8c41a380f2bd..8948e635432d 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -119,8 +119,8 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs, static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip) { unsigned short nr_vecs = bip->bip_max_vcnt - 1; - struct bio_vec *copy = &bip->bip_vec[1]; - size_t bytes = bip->bip_iter.bi_size; + struct bio_vec *copy = &bip->bip_vec[1], *bvec = &bip->bip_vec[0]; + size_t bytes = bvec->bv_len; struct iov_iter iter; int ret; -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 02/11] block: copy back bounce buffer to user-space correctly in case of split 2024-10-16 11:29 ` [PATCH v4 02/11] block: copy back bounce buffer to user-space correctly in case of split Anuj Gupta @ 2024-10-16 18:04 ` Keith Busch 2024-10-17 7:53 ` Christoph Hellwig 1 sibling, 0 replies; 48+ messages in thread From: Keith Busch @ 2024-10-16 18:04 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Wed, Oct 16, 2024 at 04:59:03PM +0530, Anuj Gupta wrote: > Copy back the bounce buffer to user-space in entirety when the parent > bio completes. Looks good. Reviewed-by: Keith Busch <[email protected]> ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 02/11] block: copy back bounce buffer to user-space correctly in case of split 2024-10-16 11:29 ` [PATCH v4 02/11] block: copy back bounce buffer to user-space correctly in case of split Anuj Gupta 2024-10-16 18:04 ` Keith Busch @ 2024-10-17 7:53 ` Christoph Hellwig 1 sibling, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 7:53 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Wed, Oct 16, 2024 at 04:59:03PM +0530, Anuj Gupta wrote: > Copy back the bounce buffer to user-space in entirety when the parent > bio completes. This commit message is a bit sparse.. > --- a/block/bio-integrity.c > +++ b/block/bio-integrity.c > @@ -119,8 +119,8 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs, > static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip) > { > unsigned short nr_vecs = bip->bip_max_vcnt - 1; > - struct bio_vec *copy = &bip->bip_vec[1]; > - size_t bytes = bip->bip_iter.bi_size; > + struct bio_vec *copy = &bip->bip_vec[1], *bvec = &bip->bip_vec[0]; > + size_t bytes = bvec->bv_len; > struct iov_iter iter; > int ret; And while trying to understand what this code does I keep getting confused by what bio_integrity_uncopy_user actually does. And I think a bit part of that is the "creative" abuse of bip_vec by the copy-based integrity code, where bip_vec[0] is the vector passed to the block layer, and the rest contains the user buffer. Maybe it's just me, but not stashing the user pages in the bvecs but just stasing away the actual iov_iter would be much preferable for this code? Either way, back to the code. The existing code uses bip_iter.bi_size for sizing the copy, and that can be modified when the bio is cloned (or at least by the rules for the bio data) be modified by the driver. So yes, switching away from that is good and the change looks correct. That being said, even if we aren't going to fix up the logic as mentioned above instantly, can we pick better names for the variables? static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip) { unsigned short orig_nr_vecs = bip->bip_max_vcnt - 1; struct bio_vec *orig_bvecs = &bip->bip_vec[1]; struct bio_vec *bounce_bvec = &bip->bip_vec[0]; size_t bytes = boune_bvec->bv_len; struct iov_iter orig_iter; int ret; iov_iter_bvec(&orig_iter, ITER_DEST, orig_bvecs, orig_nr_vecs, bytes); ret = copy_to_iter(bvec_virt(bounce_bvec), bytes, &orig_iter); WARN_ON_ONCE(ret != bytes); bio_integrity_unpin_bvec(orig_bvecs, orig_nr_vecs, true); } ? Also please add a Fixes tag. ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113738epcas5p2d217b8d6efbf855f61c133f64deaa486@epcas5p2.samsung.com>]
* [PATCH v4 03/11] block: modify bio_integrity_map_user to accept iov_iter as argument [not found] ` <CGME20241016113738epcas5p2d217b8d6efbf855f61c133f64deaa486@epcas5p2.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 0 siblings, 0 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta, Kanchan Joshi This patch refactors bio_integrity_map_user to accept iov_iter as argument. This is a prep patch. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> --- block/bio-integrity.c | 12 +++++------- block/blk-integrity.c | 10 +++++++++- include/linux/bio-integrity.h | 6 +++--- 3 files changed, 17 insertions(+), 11 deletions(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 8948e635432d..341a0382befd 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -303,17 +303,16 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages, return nr_bvecs; } -int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes, +int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed) { struct request_queue *q = bdev_get_queue(bio->bi_bdev); unsigned int align = blk_lim_dma_alignment_and_pad(&q->limits); struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages; struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec; + size_t offset, bytes = iter->count; unsigned int direction, nr_bvecs; - struct iov_iter iter; int ret, nr_vecs; - size_t offset; bool copy; if (bio_integrity(bio)) @@ -326,8 +325,7 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes, else direction = ITER_SOURCE; - iov_iter_ubuf(&iter, direction, ubuf, bytes); - nr_vecs = iov_iter_npages(&iter, BIO_MAX_VECS + 1); + nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS + 1); if (nr_vecs > BIO_MAX_VECS) return -E2BIG; if (nr_vecs > UIO_FASTIOV) { @@ -337,8 +335,8 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes, pages = NULL; } - copy = !iov_iter_is_aligned(&iter, align, align); - ret = iov_iter_extract_pages(&iter, &pages, bytes, nr_vecs, 0, &offset); + copy = !iov_iter_is_aligned(iter, align, align); + ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset); if (unlikely(ret < 0)) goto free_bvec; diff --git a/block/blk-integrity.c b/block/blk-integrity.c index 83b696ba0cac..cb8f7260bdcf 100644 --- a/block/blk-integrity.c +++ b/block/blk-integrity.c @@ -115,8 +115,16 @@ EXPORT_SYMBOL(blk_rq_map_integrity_sg); int blk_rq_integrity_map_user(struct request *rq, void __user *ubuf, ssize_t bytes, u32 seed) { - int ret = bio_integrity_map_user(rq->bio, ubuf, bytes, seed); + int ret; + struct iov_iter iter; + unsigned int direction; + if (op_is_write(req_op(rq))) + direction = ITER_DEST; + else + direction = ITER_SOURCE; + iov_iter_ubuf(&iter, direction, ubuf, bytes); + ret = bio_integrity_map_user(rq->bio, &iter, seed); if (ret) return ret; diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h index 485d8a43017a..90aab50a3e14 100644 --- a/include/linux/bio-integrity.h +++ b/include/linux/bio-integrity.h @@ -75,7 +75,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio, gfp_t gfp, unsigned int nr); int bio_integrity_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset); -int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len, u32 seed); +int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed); void bio_integrity_unmap_user(struct bio *bio); bool bio_integrity_prep(struct bio *bio); void bio_integrity_advance(struct bio *bio, unsigned int bytes_done); @@ -101,8 +101,8 @@ static inline void bioset_integrity_free(struct bio_set *bs) { } -static inline int bio_integrity_map_user(struct bio *bio, void __user *ubuf, - ssize_t len, u32 seed) +static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, + u32 seed) { return -EINVAL; } -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113741epcas5p3b90adb3b43b6b443ffd00df29d63d289@epcas5p3.samsung.com>]
* [PATCH v4 04/11] block: define meta io descriptor [not found] ` <CGME20241016113741epcas5p3b90adb3b43b6b443ffd00df29d63d289@epcas5p3.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-16 19:35 ` Keith Busch ` (2 more replies) 0 siblings, 3 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi, Anuj Gupta From: Kanchan Joshi <[email protected]> Introduces a new 'uio_meta' structure that upper layer can use to pass the meta/integrity information. Signed-off-by: Kanchan Joshi <[email protected]> Signed-off-by: Anuj Gupta <[email protected]> --- include/linux/bio-integrity.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h index 90aab50a3e14..529ec7a8df20 100644 --- a/include/linux/bio-integrity.h +++ b/include/linux/bio-integrity.h @@ -30,6 +30,16 @@ struct bio_integrity_payload { struct bio_vec bip_inline_vecs[];/* embedded bvec array */ }; +/* flags for integrity meta */ +typedef __u16 __bitwise meta_flags_t; + +struct uio_meta { + meta_flags_t flags; + u16 app_tag; + u32 seed; + struct iov_iter iter; +}; + #define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \ BIP_DISK_NOCHECK | BIP_IP_CHECKSUM) -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-16 11:29 ` [PATCH v4 04/11] block: define meta io descriptor Anuj Gupta @ 2024-10-16 19:35 ` Keith Busch 2024-10-17 5:49 ` Anuj Gupta 2024-10-17 7:57 ` Christoph Hellwig 2024-10-22 2:11 ` Martin K. Petersen 2 siblings, 1 reply; 48+ messages in thread From: Keith Busch @ 2024-10-16 19:35 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Wed, Oct 16, 2024 at 04:59:05PM +0530, Anuj Gupta wrote: > +struct uio_meta { > + meta_flags_t flags; > + u16 app_tag; > + u32 seed; > + struct iov_iter iter; > +}; Is the seed used for anything other than the kernel's t10 generation and verification? It looks like that's all it is for today, and that part is skipped for userspace metadata, so I'm not sure we need it. I know it's been used for passthrough commands since nvme started supporitng it, but I don't see why the driver ever bothered. I think it wasn't necessary and we've been carrying it forward ever since. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-16 19:35 ` Keith Busch @ 2024-10-17 5:49 ` Anuj Gupta 0 siblings, 0 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-17 5:49 UTC (permalink / raw) To: Keith Busch Cc: axboe, hch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi [-- Attachment #1: Type: text/plain, Size: 1249 bytes --] On 16/10/24 01:35PM, Keith Busch wrote: >On Wed, Oct 16, 2024 at 04:59:05PM +0530, Anuj Gupta wrote: >> +struct uio_meta { >> + meta_flags_t flags; >> + u16 app_tag; >> + u32 seed; >> + struct iov_iter iter; >> +}; > >Is the seed used for anything other than the kernel's t10 generation and >verification? It looks like that's all it is for today, and that part is >skipped for userspace metadata, so I'm not sure we need it. > >I know it's been used for passthrough commands since nvme started >supporitng it, but I don't see why the driver ever bothered. I think it >wasn't necessary and we've been carrying it forward ever since. Not for generation/verfication, but seed is used to remap the ref tag when submitting metadata on a partition (see blk_integrity_prepare/complete). For cases like partitioning, MD/DM cloning, where virtual start sector is different from physical sector remapping is required. It is skipped for passthrough, but we require it for this series where I/O can be done on partition too. Christoph [1], Martin [2] also expressed the need for it in the previous version. [1] https://lore.kernel.org/linux-block/[email protected]/ [2] https://lore.kernel.org/linux-block/[email protected]/ [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-16 11:29 ` [PATCH v4 04/11] block: define meta io descriptor Anuj Gupta 2024-10-16 19:35 ` Keith Busch @ 2024-10-17 7:57 ` Christoph Hellwig 2024-10-22 2:11 ` Martin K. Petersen 2 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 7:57 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Wed, Oct 16, 2024 at 04:59:05PM +0530, Anuj Gupta wrote: > +/* flags for integrity meta */ > +typedef __u16 __bitwise meta_flags_t; > + > +struct uio_meta { > + meta_flags_t flags; Please either add the actual flags here, or if there is a good reason to do that later add the meta_flags_t type and the member when adding it. Also maybe the type name wants a prefix, maybe uio? Also from looking at the reset of the series uio_meta is in no way block specific and referenced from io_uring. So this probably should go into uio.h? ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-16 11:29 ` [PATCH v4 04/11] block: define meta io descriptor Anuj Gupta 2024-10-16 19:35 ` Keith Busch 2024-10-17 7:57 ` Christoph Hellwig @ 2024-10-22 2:11 ` Martin K. Petersen 2024-10-22 6:04 ` Christoph Hellwig 2024-10-28 3:46 ` Anuj Gupta 2 siblings, 2 replies; 48+ messages in thread From: Martin K. Petersen @ 2024-10-22 2:11 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi Anuj, > +struct uio_meta { > + meta_flags_t flags; > + u16 app_tag; > + u32 seed; > + struct iov_iter iter; > +}; Glad to see the seed added. In NVMe it can be a 64-bit value so we should bump the size of that field in the user API. Not sure what to do about the storage tag. For Linux that would probably be owned by the filesystem (as opposed to the application). But I guess one could envision a userland application acting as a storage target and in that case the tag would need to be passed to the kernel. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-22 2:11 ` Martin K. Petersen @ 2024-10-22 6:04 ` Christoph Hellwig 2024-10-23 1:20 ` Martin K. Petersen 2024-10-28 3:46 ` Anuj Gupta 1 sibling, 1 reply; 48+ messages in thread From: Christoph Hellwig @ 2024-10-22 6:04 UTC (permalink / raw) To: Martin K. Petersen Cc: Anuj Gupta, axboe, hch, kbusch, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Mon, Oct 21, 2024 at 10:11:35PM -0400, Martin K. Petersen wrote: > Glad to see the seed added. In NVMe it can be a 64-bit value so we > should bump the size of that field in the user API. > > Not sure what to do about the storage tag. For Linux that would probably > be owned by the filesystem (as opposed to the application). But I guess > one could envision a userland application acting as a storage target and > in that case the tag would need to be passed to the kernel. Right now the series only supports PI on the block device. In that case the userspace application can clearly make use of it. If we want to support PI on file systems (which I'd really like to do for XFS) both ownership models might make sense depending on the file system, although I think just giving it to the application is going to be the only thing we'll see for a while. Maybe we need a way to query if the application can use the app tag? ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-22 6:04 ` Christoph Hellwig @ 2024-10-23 1:20 ` Martin K. Petersen 0 siblings, 0 replies; 48+ messages in thread From: Martin K. Petersen @ 2024-10-23 1:20 UTC (permalink / raw) To: Christoph Hellwig Cc: Martin K. Petersen, Anuj Gupta, axboe, kbusch, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi Christoph, >> Not sure what to do about the storage tag. For Linux that would >> probably be owned by the filesystem (as opposed to the application). >> But I guess one could envision a userland application acting as a >> storage target and in that case the tag would need to be passed to >> the kernel. > > Right now the series only supports PI on the block device. In that > case the userspace application can clearly make use of it. If we > want to support PI on file systems (which I'd really like to do for > XFS) both ownership models might make sense depending on the file > system, although I think just giving it to the application is going > to be the only thing we'll see for a while. Maybe we need a way > to query if the application can use the app tag? tag_size currently indicates the size of the app tag available to the application. I.e. if ATO=1, tag_size is 2. And if ATO=0, tag_size is 0. To properly support the NVMe extensions I guess we'll have to have fields indicating the actual size of the reference and storage tags. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 04/11] block: define meta io descriptor 2024-10-22 2:11 ` Martin K. Petersen 2024-10-22 6:04 ` Christoph Hellwig @ 2024-10-28 3:46 ` Anuj Gupta 1 sibling, 0 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-28 3:46 UTC (permalink / raw) To: Martin K. Petersen Cc: axboe, hch, kbusch, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi [-- Attachment #1: Type: text/plain, Size: 484 bytes --] > Not sure what to do about the storage tag. For Linux that would probably > be owned by the filesystem (as opposed to the application). But I guess > one could envision a userland application acting as a storage target and > in that case the tag would need to be passed to the kernel. I will reserve space for storage tag in the user interface for now. That way, we can introduce and use it later when it is actually used. > > -- > Martin K. Petersen Oracle Linux Engineering > [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113743epcas5p3b4092c938d8795cea666ab4e6baf4aa9@epcas5p3.samsung.com>]
* [PATCH v4 05/11] fs: introduce IOCB_HAS_METADATA for metadata [not found] ` <CGME20241016113743epcas5p3b4092c938d8795cea666ab4e6baf4aa9@epcas5p3.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 7:58 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta Introduce an IOCB_HAS_METADATA flag for the kiocb struct, for handling requests containing meta payload. Signed-off-by: Anuj Gupta <[email protected]> --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index 3559446279c1..f451942db74b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -346,6 +346,7 @@ struct readahead_control; #define IOCB_DIO_CALLER_COMP (1 << 22) /* kiocb is a read or write operation submitted by fs/aio.c. */ #define IOCB_AIO_RW (1 << 23) +#define IOCB_HAS_METADATA (1 << 24) /* for use in trace events */ #define TRACE_IOCB_STRINGS \ -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 05/11] fs: introduce IOCB_HAS_METADATA for metadata 2024-10-16 11:29 ` [PATCH v4 05/11] fs: introduce IOCB_HAS_METADATA for metadata Anuj Gupta @ 2024-10-17 7:58 ` Christoph Hellwig 0 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 7:58 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g Looks good: Reviewed-by: Christoph Hellwig <[email protected]> ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113745epcas5p1723d91b979fd0e597495fef377ad0f62@epcas5p1.samsung.com>]
* [PATCH v4 06/11] block: add flags for integrity meta [not found] ` <CGME20241016113745epcas5p1723d91b979fd0e597495fef377ad0f62@epcas5p1.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 8:00 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta Add flags to describe checks for integrity meta buffer. These flags are specified by application as io_uring meta_flags, added in the next patch. Signed-off-by: Anuj Gupta <[email protected]> --- include/uapi/linux/blkdev.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/uapi/linux/blkdev.h b/include/uapi/linux/blkdev.h index 66373cd1a83a..d606f8b9c0a0 100644 --- a/include/uapi/linux/blkdev.h +++ b/include/uapi/linux/blkdev.h @@ -11,4 +11,15 @@ */ #define BLOCK_URING_CMD_DISCARD _IO(0x12, 0) +/* + * flags for integrity meta + */ +#define BLK_INTEGRITY_CHK_GUARD (1U << 0) /* enforce guard check */ +#define BLK_INTEGRITY_CHK_APPTAG (1U << 1) /* enforce app tag check */ +#define BLK_INTEGRITY_CHK_REFTAG (1U << 2) /* enforce ref tag check */ + +#define BLK_INTEGRITY_VALID_FLAGS (BLK_INTEGRITY_CHK_GUARD |\ + BLK_INTEGRITY_CHK_APPTAG |\ + BLK_INTEGRITY_CHK_REFTAG) + #endif -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/11] block: add flags for integrity meta 2024-10-16 11:29 ` [PATCH v4 06/11] block: add flags for integrity meta Anuj Gupta @ 2024-10-17 8:00 ` Christoph Hellwig 2024-10-17 10:45 ` Anuj Gupta 0 siblings, 1 reply; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:00 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Wed, Oct 16, 2024 at 04:59:07PM +0530, Anuj Gupta wrote: > Add flags to describe checks for integrity meta buffer. These flags are > specified by application as io_uring meta_flags, added in the next patch. These are now blkdev uapis, but io_uring ones even if currently only the block file operations implement them. I do plan to support these through file systems eventually. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/11] block: add flags for integrity meta 2024-10-17 8:00 ` Christoph Hellwig @ 2024-10-17 10:45 ` Anuj Gupta 2024-10-17 12:01 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-17 10:45 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g [-- Attachment #1: Type: text/plain, Size: 562 bytes --] On Thu, Oct 17, 2024 at 10:00:15AM +0200, Christoph Hellwig wrote: > On Wed, Oct 16, 2024 at 04:59:07PM +0530, Anuj Gupta wrote: > > Add flags to describe checks for integrity meta buffer. These flags are > > specified by application as io_uring meta_flags, added in the next patch. > > These are now blkdev uapis, but io_uring ones even if currently only > the block file operations implement them. I do plan to support these > through file systems eventually. Are these flags placed correctly here or you see that they should be moved somewhere else? > > [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/11] block: add flags for integrity meta 2024-10-17 10:45 ` Anuj Gupta @ 2024-10-17 12:01 ` Christoph Hellwig 2024-10-17 12:59 ` Anuj gupta 0 siblings, 1 reply; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 12:01 UTC (permalink / raw) To: Anuj Gupta Cc: Christoph Hellwig, axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Thu, Oct 17, 2024 at 04:15:02PM +0530, Anuj Gupta wrote: > On Thu, Oct 17, 2024 at 10:00:15AM +0200, Christoph Hellwig wrote: > > On Wed, Oct 16, 2024 at 04:59:07PM +0530, Anuj Gupta wrote: > > > Add flags to describe checks for integrity meta buffer. These flags are > > > specified by application as io_uring meta_flags, added in the next patch. > > > > These are now blkdev uapis, but io_uring ones even if currently only > > the block file operations implement them. I do plan to support these > > through file systems eventually. > > Are these flags placed correctly here or you see that they should be > moved somewhere else? They are not as they aren't blkdev apis. They are generic for io_uring, or maybe even VFS-wide. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/11] block: add flags for integrity meta 2024-10-17 12:01 ` Christoph Hellwig @ 2024-10-17 12:59 ` Anuj gupta 2024-10-17 14:34 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj gupta @ 2024-10-17 12:59 UTC (permalink / raw) To: Christoph Hellwig Cc: Anuj Gupta, axboe, kbusch, martin.petersen, asml.silence, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Thu, Oct 17, 2024 at 5:31 PM Christoph Hellwig <[email protected]> wrote: > > On Thu, Oct 17, 2024 at 04:15:02PM +0530, Anuj Gupta wrote: > > On Thu, Oct 17, 2024 at 10:00:15AM +0200, Christoph Hellwig wrote: > > > On Wed, Oct 16, 2024 at 04:59:07PM +0530, Anuj Gupta wrote: > > > > Add flags to describe checks for integrity meta buffer. These flags are > > > > specified by application as io_uring meta_flags, added in the next patch. > > > > > > These are now blkdev uapis, but io_uring ones even if currently only > > > the block file operations implement them. I do plan to support these > > > through file systems eventually. > > > > Are these flags placed correctly here or you see that they should be > > moved somewhere else? > > They are not as they aren't blkdev apis. They are generic for io_uring, > or maybe even VFS-wide. > The last iteration of this series added these flags as io-uring flags [1]. Based on feedback received [2], I moved it here in this version. Should I move them back to io-uring? [1] https://lore.kernel.org/linux-block/[email protected]/ [2] https://lore.kernel.org/linux-block/[email protected]/ ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 06/11] block: add flags for integrity meta 2024-10-17 12:59 ` Anuj gupta @ 2024-10-17 14:34 ` Christoph Hellwig 0 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 14:34 UTC (permalink / raw) To: Anuj gupta Cc: Christoph Hellwig, Anuj Gupta, axboe, kbusch, martin.petersen, asml.silence, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Thu, Oct 17, 2024 at 06:29:32PM +0530, Anuj gupta wrote: > The last iteration of this series added these flags as io-uring flags [1]. > Based on feedback received [2], I moved it here in this version. > Should I move them back to io-uring? Maybe I misread the patch back then, but IIRC at that point the flag was also used on the bio and not just in the uapi? As of this series it is used in the uapi, and in the block layer. Based on that uapi/linux/fs.h might be the best place. ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113747epcas5p4e276eb0da2695ba032ce1d2a3b83fff4@epcas5p4.samsung.com>]
* [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write [not found] ` <CGME20241016113747epcas5p4e276eb0da2695ba032ce1d2a3b83fff4@epcas5p4.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 8:10 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta, Kanchan Joshi This patch adds the capability of sending meta along with read/write. This meta is represented by a newly introduced 'struct io_uring_meta' which specifies information such as meta type,flags,buffer,length,seed and apptag. Application sets up a SQE128 ring, prepares io_uring_meta within the second SQE. The patch processes the user-passed information to prepare uio_meta descriptor and passes it down using kiocb->private. Meta exchange is supported only for direct IO. Also vectored read/write operations with meta are not supported currently. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- include/uapi/linux/io_uring.h | 26 ++++++++++++ io_uring/io_uring.c | 6 +++ io_uring/rw.c | 75 +++++++++++++++++++++++++++++++++-- io_uring/rw.h | 15 ++++++- 4 files changed, 118 insertions(+), 4 deletions(-) diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index 86cb385fe0b5..1cd165720fcc 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -105,6 +105,32 @@ struct io_uring_sqe { */ __u8 cmd[0]; }; + /* + * If the ring is initialized with IORING_SETUP_SQE128, then + * this field is starting offset for 64 bytes of data. For meta io + * this contains 'struct io_uring_meta' + */ + __u8 big_sqe_cmd[0]; +}; + +enum io_uring_sqe_meta_type_bits { + META_TYPE_INTEGRITY_BIT, + /* not a real meta type; just to make sure that we don't overflow */ + META_TYPE_LAST_BIT, +}; + +/* meta type flags */ +#define META_TYPE_INTEGRITY (1U << META_TYPE_INTEGRITY_BIT) + +/* this goes to SQE128 */ +struct io_uring_meta { + __u16 meta_type; + __u16 meta_flags; + __u32 meta_len; + __u64 meta_addr; + __u32 seed; + __u16 app_tag; + __u8 pad[42]; }; /* diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index d7ad4ea5f40b..e5551e2e7bde 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -3857,6 +3857,12 @@ static int __init io_uring_init(void) /* top 8bits are for internal use */ BUILD_BUG_ON((IORING_URING_CMD_MASK & 0xff000000) != 0); + BUILD_BUG_ON(sizeof(struct io_uring_meta) > + sizeof(struct io_uring_sqe)); + + BUILD_BUG_ON(META_TYPE_LAST_BIT > + 8 * sizeof_field(struct io_uring_meta, meta_type)); + io_uring_optable_init(); /* diff --git a/io_uring/rw.c b/io_uring/rw.c index 80ae3c2ebb70..b727e5ef19fc 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -257,6 +257,49 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import) return 0; } +static inline void io_meta_save_state(struct io_async_rw *io) +{ + io->meta_state.seed = io->meta.seed; + iov_iter_save_state(&io->meta.iter, &io->meta_state.iter_meta); +} + +static inline void io_meta_restore(struct io_async_rw *io) +{ + io->meta.seed = io->meta_state.seed; + iov_iter_restore(&io->meta.iter, &io->meta_state.iter_meta); +} + +static int io_prep_rw_meta(struct io_kiocb *req, const struct io_uring_sqe *sqe, + struct io_rw *rw, int ddir) +{ + const struct io_uring_meta *md = (struct io_uring_meta *)sqe->big_sqe_cmd; + u16 meta_type = READ_ONCE(md->meta_type); + const struct io_issue_def *def; + struct io_async_rw *io; + int ret; + + if (!meta_type) + return 0; + if (!(meta_type & META_TYPE_INTEGRITY)) + return -EINVAL; + + def = &io_issue_defs[req->opcode]; + if (def->vectored) + return -EOPNOTSUPP; + + io = req->async_data; + io->meta.flags = READ_ONCE(md->meta_flags); + io->meta.app_tag = READ_ONCE(md->app_tag); + io->meta.seed = READ_ONCE(md->seed); + ret = import_ubuf(ddir, u64_to_user_ptr(READ_ONCE(md->meta_addr)), + READ_ONCE(md->meta_len), &io->meta.iter); + if (unlikely(ret < 0)) + return ret; + rw->kiocb.ki_flags |= IOCB_HAS_METADATA; + io_meta_save_state(io); + return ret; +} + static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, int ddir, bool do_import) { @@ -279,11 +322,18 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe, rw->kiocb.ki_ioprio = get_current_ioprio(); } rw->kiocb.dio_complete = NULL; + rw->kiocb.ki_flags = 0; rw->addr = READ_ONCE(sqe->addr); rw->len = READ_ONCE(sqe->len); rw->flags = READ_ONCE(sqe->rw_flags); - return io_prep_rw_setup(req, ddir, do_import); + ret = io_prep_rw_setup(req, ddir, do_import); + + if (unlikely(ret)) + return ret; + if (unlikely(req->ctx->flags & IORING_SETUP_SQE128)) + ret = io_prep_rw_meta(req, sqe, rw, ddir); + return ret; } int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe) @@ -410,7 +460,10 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req) static void io_resubmit_prep(struct io_kiocb *req) { struct io_async_rw *io = req->async_data; + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); + if (unlikely(rw->kiocb.ki_flags & IOCB_HAS_METADATA)) + io_meta_restore(io); iov_iter_restore(&io->iter, &io->iter_state); } @@ -777,8 +830,12 @@ static inline int io_iter_do_read(struct io_rw *rw, struct iov_iter *iter) static bool need_complete_io(struct io_kiocb *req) { + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); + + /* Exclude meta IO as we don't support partial completion for that */ return req->flags & REQ_F_ISREG || - S_ISBLK(file_inode(req->file)->i_mode); + S_ISBLK(file_inode(req->file)->i_mode) || + !(rw->kiocb.ki_flags & IOCB_HAS_METADATA); } static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type) @@ -795,7 +852,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type) if (!(req->flags & REQ_F_FIXED_FILE)) req->flags |= io_file_get_flags(file); - kiocb->ki_flags = file->f_iocb_flags; + kiocb->ki_flags |= file->f_iocb_flags; ret = kiocb_set_rw_flags(kiocb, rw->flags, rw_type); if (unlikely(ret)) return ret; @@ -824,6 +881,14 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type) kiocb->ki_complete = io_complete_rw; } + if (unlikely(kiocb->ki_flags & IOCB_HAS_METADATA)) { + struct io_async_rw *io = req->async_data; + + if (!(req->file->f_flags & O_DIRECT)) + return -EOPNOTSUPP; + kiocb->private = &io->meta; + } + return 0; } @@ -898,6 +963,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags) * manually if we need to. */ iov_iter_restore(&io->iter, &io->iter_state); + if (unlikely(kiocb->ki_flags & IOCB_HAS_METADATA)) + io_meta_restore(io); do { /* @@ -1102,6 +1169,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags) } else { ret_eagain: iov_iter_restore(&io->iter, &io->iter_state); + if (unlikely(kiocb->ki_flags & IOCB_HAS_METADATA)) + io_meta_restore(io); if (kiocb->ki_flags & IOCB_WRITE) io_req_end_write(req); return -EAGAIN; diff --git a/io_uring/rw.h b/io_uring/rw.h index 3f432dc75441..af9e338b2bd8 100644 --- a/io_uring/rw.h +++ b/io_uring/rw.h @@ -1,6 +1,12 @@ // SPDX-License-Identifier: GPL-2.0 #include <linux/pagemap.h> +#include <linux/bio-integrity.h> + +struct io_meta_state { + u32 seed; + struct iov_iter_state iter_meta; +}; struct io_async_rw { size_t bytes_done; @@ -9,7 +15,14 @@ struct io_async_rw { struct iovec fast_iov; struct iovec *free_iovec; int free_iov_nr; - struct wait_page_queue wpq; + /* wpq is for buffered io, while meta fields are used with direct io*/ + union { + struct wait_page_queue wpq; + struct { + struct uio_meta meta; + struct io_meta_state meta_state; + }; + }; }; int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe); -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write 2024-10-16 11:29 ` [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write Anuj Gupta @ 2024-10-17 8:10 ` Christoph Hellwig 2024-10-17 22:51 ` Jens Axboe ` (2 more replies) 0 siblings, 3 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:10 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi s/meta/metadata/ in the subject. > + const struct io_uring_meta *md = (struct io_uring_meta *)sqe->big_sqe_cmd; Overly long line. > + if (!meta_type) > + return 0; > + if (!(meta_type & META_TYPE_INTEGRITY)) > + return -EINVAL; What is the meta_type for? To distintinguish PI from non-PI metadata? Why doesn't this support non-PI metadata? Also PI or TO_PI might be a better name than the rather generic integrity. (but I'll defer to Martin if he has any good arguments for naming here). > static bool need_complete_io(struct io_kiocb *req) > { > + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); > + > + /* Exclude meta IO as we don't support partial completion for that */ > return req->flags & REQ_F_ISREG || > - S_ISBLK(file_inode(req->file)->i_mode); > + S_ISBLK(file_inode(req->file)->i_mode) || > + !(rw->kiocb.ki_flags & IOCB_HAS_METADATA); > } What partial ocmpletions aren't supported? Note that this would trigger easily as right now metadata is only added for block devices anyway. > + if (unlikely(kiocb->ki_flags & IOCB_HAS_METADATA)) { For a workload using metadata this is everything but unlikely. Is there a specific reason you're trying to override the existing branch predictor here (although on at least x86_64 gcc these kinds of unlikely calls tend to be no-ops anyway). > + struct io_async_rw *io = req->async_data; > + > + if (!(req->file->f_flags & O_DIRECT)) > + return -EOPNOTSUPP; I guess you are forcing this here rather in the file oerations instance because the union of the field in struct io_async_rw. Please add comment about that. > + /* wpq is for buffered io, while meta fields are used with direct io*/ missing whitespace before the closing of the comment. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write 2024-10-17 8:10 ` Christoph Hellwig @ 2024-10-17 22:51 ` Jens Axboe 2024-10-21 5:31 ` Anuj Gupta 2024-10-22 1:50 ` Martin K. Petersen 2 siblings, 0 replies; 48+ messages in thread From: Jens Axboe @ 2024-10-17 22:51 UTC (permalink / raw) To: Christoph Hellwig, Anuj Gupta Cc: kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On 10/17/24 2:10 AM, Christoph Hellwig wrote: > s/meta/metadata/ in the subject. > >> + const struct io_uring_meta *md = (struct io_uring_meta *)sqe->big_sqe_cmd; > > Overly long line. That's fine in io_uring, I prefer slightly longer lines rather than needlessly breaking them up. -- Jens Axboe ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write 2024-10-17 8:10 ` Christoph Hellwig 2024-10-17 22:51 ` Jens Axboe @ 2024-10-21 5:31 ` Anuj Gupta 2024-10-22 6:02 ` Christoph Hellwig 2024-10-22 1:50 ` Martin K. Petersen 2 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-21 5:31 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi [-- Attachment #1: Type: text/plain, Size: 1670 bytes --] > What is the meta_type for? To distintinguish PI from non-PI metadata? meta_type field is kept so that meta_types beyond integrity can also be supported in future. Pavel suggested this to Kanchan when this was discussed in LSF/MM. > Why doesn't this support non-PI metadata? It supports that. We have tested that (pi_type = 0 case). > Also PI or TO_PI might be > a better name than the rather generic integrity. (but I'll defer to > Martin if he has any good arguments for naming here). Open to a different/better name. > > > static bool need_complete_io(struct io_kiocb *req) > > { > > + struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); > > + > > + /* Exclude meta IO as we don't support partial completion for that */ > > return req->flags & REQ_F_ISREG || > > - S_ISBLK(file_inode(req->file)->i_mode); > > + S_ISBLK(file_inode(req->file)->i_mode) || > > + !(rw->kiocb.ki_flags & IOCB_HAS_METADATA); > > } > > What partial ocmpletions aren't supported? Note that this would > trigger easily as right now metadata is only added for block devices > anyway. It seems that this scenario is less likely to happen. The plumbing seemed a bit non trivial. I have the plan to look at it, once the initial version of this series goes in. > > > + if (unlikely(kiocb->ki_flags & IOCB_HAS_METADATA)) { > > For a workload using metadata this is everything but unlikely. Is > there a specific reason you're trying to override the existing > branch predictor here (although on at least x86_64 gcc these kinds > of unlikely calls tend to be no-ops anyway). The branch predictions were added to make it a bit friendly for non-metadata read/write case. [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write 2024-10-21 5:31 ` Anuj Gupta @ 2024-10-22 6:02 ` Christoph Hellwig 0 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-22 6:02 UTC (permalink / raw) To: Anuj Gupta Cc: Christoph Hellwig, axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Mon, Oct 21, 2024 at 11:01:10AM +0530, Anuj Gupta wrote: > > What is the meta_type for? To distintinguish PI from non-PI metadata? > > meta_type field is kept so that meta_types beyond integrity can also > be supported in future. Pavel suggested this to Kanchan when this was > discussed in LSF/MM. > > > Why doesn't this support non-PI metadata? > > It supports that. We have tested that (pi_type = 0 case). What other metadata except for PI and plain non-integrity data do you plan to support? This seems like a weird field. In doubt just leave reserved space that is checked for 0 instead of adding an encondig that doesn't make much sense. If we actually do end up with a metadata scheme we can't encode into the pi_type we can still use that reserved space. > > > Also PI or TO_PI might be > > a better name than the rather generic integrity. (but I'll defer to > > Martin if he has any good arguments for naming here). > > Open to a different/better name. metadata? > > > + /* Exclude meta IO as we don't support partial completion for that */ > > > return req->flags & REQ_F_ISREG || > > > - S_ISBLK(file_inode(req->file)->i_mode); > > > + S_ISBLK(file_inode(req->file)->i_mode) || > > > + !(rw->kiocb.ki_flags & IOCB_HAS_METADATA); > > > } > > > > What partial ocmpletions aren't supported? Note that this would > > trigger easily as right now metadata is only added for block devices > > anyway. > > It seems that this scenario is less likely to happen. The plumbing > seemed a bit non trivial. I have the plan to look at it, once the > initial version of this series goes in. I still don't understand what this is trying to do, especially as it is dead code with the checks above. > > > > > + if (unlikely(kiocb->ki_flags & IOCB_HAS_METADATA)) { > > > > For a workload using metadata this is everything but unlikely. Is > > there a specific reason you're trying to override the existing > > branch predictor here (although on at least x86_64 gcc these kinds > > of unlikely calls tend to be no-ops anyway). > > The branch predictions were added to make it a bit friendly for > non-metadata read/write case. Throwing in these hints unless you have numbers justifying them is not a good idea. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write 2024-10-17 8:10 ` Christoph Hellwig 2024-10-17 22:51 ` Jens Axboe 2024-10-21 5:31 ` Anuj Gupta @ 2024-10-22 1:50 ` Martin K. Petersen 2 siblings, 0 replies; 48+ messages in thread From: Martin K. Petersen @ 2024-10-22 1:50 UTC (permalink / raw) To: Christoph Hellwig Cc: Anuj Gupta, axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi Christoph, >> + if (!meta_type) >> + return 0; >> + if (!(meta_type & META_TYPE_INTEGRITY)) >> + return -EINVAL; > > What is the meta_type for? To distintinguish PI from non-PI metadata? > Why doesn't this support non-PI metadata? Also PI or TO_PI might be > a better name than the rather generic integrity. (but I'll defer to > Martin if he has any good arguments for naming here). It should probably be "META_TYPE_T10_PI". "Integrity" was meant as a protocol-agnostic name since there were other proposed protection information schemes being discussed in the standards at the time. I didn't want to limit the block layer changes to one particular storage protocol. NVMe implements features that are not defined by T10 PI such as the storage tag and the larger CRCs. But despite those, NVMe still follows the defined T10 PI model. So I think "META_TYPE_T10_PI" is fairly accurate. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113750epcas5p2089e395ca764de023be64519da9b0982@epcas5p2.samsung.com>]
* [PATCH v4 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags [not found] ` <CGME20241016113750epcas5p2089e395ca764de023be64519da9b0982@epcas5p2.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 8:12 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta, Kanchan Joshi This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which indicate how the hardware should check the integrity payload. The driver can now just rely on block layer flags, and doesn't need to know the integrity source. Submitter of PI decides which tags to check. This would also give us a unified interface for user and kernel generated integrity. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- block/bio-integrity.c | 5 +++++ drivers/nvme/host/core.c | 11 +++-------- include/linux/bio-integrity.h | 6 +++++- 3 files changed, 13 insertions(+), 9 deletions(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 341a0382befd..d3c8b56d3fe6 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -436,6 +436,11 @@ bool bio_integrity_prep(struct bio *bio) if (bi->csum_type == BLK_INTEGRITY_CSUM_IP) bip->bip_flags |= BIP_IP_CHECKSUM; + /* describe what tags to check in payload */ + if (bi->csum_type) + bip->bip_flags |= BIP_CHECK_GUARD; + if (bi->flags & BLK_INTEGRITY_REF_TAG) + bip->bip_flags |= BIP_CHECK_REFTAG; if (bio_integrity_add_page(bio, virt_to_page(buf), len, offset_in_page(buf)) < len) { printk(KERN_ERR "could not attach integrity payload\n"); diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 43d73d31c66f..211f44cc02a3 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1004,18 +1004,13 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns, control |= NVME_RW_PRINFO_PRACT; } - switch (ns->head->pi_type) { - case NVME_NS_DPS_PI_TYPE3: + if (bio_integrity_flagged(req->bio, BIP_CHECK_GUARD)) control |= NVME_RW_PRINFO_PRCHK_GUARD; - break; - case NVME_NS_DPS_PI_TYPE1: - case NVME_NS_DPS_PI_TYPE2: - control |= NVME_RW_PRINFO_PRCHK_GUARD | - NVME_RW_PRINFO_PRCHK_REF; + if (bio_integrity_flagged(req->bio, BIP_CHECK_REFTAG)) { + control |= NVME_RW_PRINFO_PRCHK_REF; if (op == nvme_cmd_zone_append) control |= NVME_RW_APPEND_PIREMAP; nvme_set_ref_tag(ns, cmnd, req); - break; } } diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h index 529ec7a8df20..a9dd0594dfc8 100644 --- a/include/linux/bio-integrity.h +++ b/include/linux/bio-integrity.h @@ -11,6 +11,9 @@ enum bip_flags { BIP_DISK_NOCHECK = 1 << 3, /* disable disk integrity checking */ BIP_IP_CHECKSUM = 1 << 4, /* IP checksum */ BIP_COPY_USER = 1 << 5, /* Kernel bounce buffer in use */ + BIP_CHECK_GUARD = 1 << 6, /* guard check */ + BIP_CHECK_REFTAG = 1 << 7, /* reftag check */ + BIP_CHECK_APPTAG = 1 << 8, /* apptag check */ }; struct bio_integrity_payload { @@ -41,7 +44,8 @@ struct uio_meta { }; #define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \ - BIP_DISK_NOCHECK | BIP_IP_CHECKSUM) + BIP_DISK_NOCHECK | BIP_IP_CHECKSUM | \ + BIP_CHECK_GUARD | BIP_CHECK_REFTAG | BIP_CHECK_APPTAG) #ifdef CONFIG_BLK_DEV_INTEGRITY -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags 2024-10-16 11:29 ` [PATCH v4 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags Anuj Gupta @ 2024-10-17 8:12 ` Christoph Hellwig 2024-10-17 10:46 ` Anuj Gupta 0 siblings, 1 reply; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:12 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Wed, Oct 16, 2024 at 04:59:09PM +0530, Anuj Gupta wrote: > This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which > indicate how the hardware should check the integrity payload. The > driver can now just rely on block layer flags, and doesn't need to > know the integrity source. Submitter of PI decides which tags to check. > This would also give us a unified interface for user and kernel > generated integrity. The conversion of the existing logic looks good, but the BIP_CHECK_APPTAG flag is completely unreferenced. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags 2024-10-17 8:12 ` Christoph Hellwig @ 2024-10-17 10:46 ` Anuj Gupta 2024-10-17 14:37 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-17 10:46 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi [-- Attachment #1: Type: text/plain, Size: 772 bytes --] On Thu, Oct 17, 2024 at 10:12:23AM +0200, Christoph Hellwig wrote: > On Wed, Oct 16, 2024 at 04:59:09PM +0530, Anuj Gupta wrote: > > This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which > > indicate how the hardware should check the integrity payload. The > > driver can now just rely on block layer flags, and doesn't need to > > know the integrity source. Submitter of PI decides which tags to check. > > This would also give us a unified interface for user and kernel > > generated integrity. > > The conversion of the existing logic looks good, but the BIP_CHECK_APPTAG > flag is completely unreferenced. It's being used by the nvme and scsi patch later. Should I introduce this flag later in either nvme or scsi patch where we actually use it. > > [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags 2024-10-17 10:46 ` Anuj Gupta @ 2024-10-17 14:37 ` Christoph Hellwig 0 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 14:37 UTC (permalink / raw) To: Anuj Gupta Cc: Christoph Hellwig, axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Thu, Oct 17, 2024 at 04:16:13PM +0530, Anuj Gupta wrote: > On Thu, Oct 17, 2024 at 10:12:23AM +0200, Christoph Hellwig wrote: > > On Wed, Oct 16, 2024 at 04:59:09PM +0530, Anuj Gupta wrote: > > > This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which > > > indicate how the hardware should check the integrity payload. The > > > driver can now just rely on block layer flags, and doesn't need to > > > know the integrity source. Submitter of PI decides which tags to check. > > > This would also give us a unified interface for user and kernel > > > generated integrity. > > > > The conversion of the existing logic looks good, but the BIP_CHECK_APPTAG > > flag is completely unreferenced. > > It's being used by the nvme and scsi patch later. Should I introduce this > flag later in either nvme or scsi patch where we actually use it. Maybe a separate one? Or at least very clearly state that the other two are conversion of the existing semantics, while this one is new. ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113752epcas5p4c365819fce1e5d498fd781ae2b309341@epcas5p4.samsung.com>]
* [PATCH v4 09/11] block: add support to pass user meta buffer [not found] ` <CGME20241016113752epcas5p4c365819fce1e5d498fd781ae2b309341@epcas5p4.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig 2024-10-17 8:24 ` Christoph Hellwig 0 siblings, 2 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi, Anuj Gupta From: Kanchan Joshi <[email protected]> If an iocb contains metadata, extract that and prepare the bip. Based on flags specified by the user, set corresponding guard/app/ref tags to be checked in bip. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- block/bio-integrity.c | 51 +++++++++++++++++++++++++++++++++++ block/fops.c | 44 +++++++++++++++++++++++------- include/linux/bio-integrity.h | 7 +++++ 3 files changed, 93 insertions(+), 9 deletions(-) diff --git a/block/bio-integrity.c b/block/bio-integrity.c index d3c8b56d3fe6..24fad9b6f3ec 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -12,6 +12,7 @@ #include <linux/bio.h> #include <linux/workqueue.h> #include <linux/slab.h> +#include <uapi/linux/blkdev.h> #include "blk.h" static struct kmem_cache *bip_slab; @@ -303,6 +304,55 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages, return nr_bvecs; } +static void bio_uio_meta_to_bip(struct bio *bio, struct uio_meta *meta) +{ + struct bio_integrity_payload *bip = bio_integrity(bio); + + if (meta->flags & BLK_INTEGRITY_CHK_GUARD) + bip->bip_flags |= BIP_CHECK_GUARD; + if (meta->flags & BLK_INTEGRITY_CHK_APPTAG) + bip->bip_flags |= BIP_CHECK_APPTAG; + if (meta->flags & BLK_INTEGRITY_CHK_REFTAG) + bip->bip_flags |= BIP_CHECK_REFTAG; + + bip->app_tag = meta->app_tag; +} + +int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta) +{ + struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk); + unsigned int integrity_bytes; + int ret; + struct iov_iter it; + + if (!bi) + return -EINVAL; + + /* should fit into two bytes */ + BUILD_BUG_ON(BLK_INTEGRITY_VALID_FLAGS >= (1 << 16)); + + if (meta->flags && (meta->flags & ~BLK_INTEGRITY_VALID_FLAGS)) + return -EINVAL; + + /* + * original meta iterator can be bigger. + * process integrity info corresponding to current data buffer only. + */ + it = meta->iter; + integrity_bytes = bio_integrity_bytes(bi, bio_sectors(bio)); + if (it.count < integrity_bytes) + return -EINVAL; + + it.count = integrity_bytes; + ret = bio_integrity_map_user(bio, &it, meta->seed); + if (!ret) { + bio_uio_meta_to_bip(bio, meta); + iov_iter_advance(&meta->iter, integrity_bytes); + meta->seed += bio_integrity_intervals(bi, bio_sectors(bio)); + } + return ret; +} + int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed) { @@ -566,6 +616,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src, bip->bip_vec = bip_src->bip_vec; bip->bip_iter = bip_src->bip_iter; bip->bip_flags = bip_src->bip_flags & BIP_CLONE_FLAGS; + bip->app_tag = bip_src->app_tag; return 0; } diff --git a/block/fops.c b/block/fops.c index e696ae53bf1e..59257b209bfb 100644 --- a/block/fops.c +++ b/block/fops.c @@ -57,6 +57,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb, struct bio bio; ssize_t ret; + WARN_ON_ONCE(iocb->ki_flags & IOCB_HAS_METADATA); if (nr_pages <= DIO_INLINE_BIO_VECS) vecs = inline_vecs; else { @@ -131,6 +132,9 @@ static void blkdev_bio_end_io(struct bio *bio) if (bio->bi_status && !dio->bio.bi_status) dio->bio.bi_status = bio->bi_status; + if (dio->iocb->ki_flags & IOCB_HAS_METADATA) + bio_integrity_unmap_user(bio); + if (atomic_dec_and_test(&dio->ref)) { if (!(dio->flags & DIO_IS_SYNC)) { struct kiocb *iocb = dio->iocb; @@ -224,14 +228,16 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, * a retry of this from blocking context. */ if (unlikely(iov_iter_count(iter))) { - bio_release_pages(bio, false); - bio_clear_flag(bio, BIO_REFFED); - bio_put(bio); - blk_finish_plug(&plug); - return -EAGAIN; + ret = -EAGAIN; + goto fail; } bio->bi_opf |= REQ_NOWAIT; } + if (!is_sync && (iocb->ki_flags & IOCB_HAS_METADATA)) { + ret = bio_integrity_map_iter(bio, iocb->private); + if (unlikely(ret)) + goto fail; + } if (is_read) { if (dio->flags & DIO_SHOULD_DIRTY) @@ -272,6 +278,14 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, bio_put(&dio->bio); return ret; + +fail: + bio_release_pages(bio, false); + bio_clear_flag(bio, BIO_REFFED); + bio_put(bio); + blk_finish_plug(&plug); + return ret; + } static void blkdev_bio_end_io_async(struct bio *bio) @@ -289,6 +303,9 @@ static void blkdev_bio_end_io_async(struct bio *bio) ret = blk_status_to_errno(bio->bi_status); } + if (iocb->ki_flags & IOCB_HAS_METADATA) + bio_integrity_unmap_user(bio); + iocb->ki_complete(iocb, ret); if (dio->flags & DIO_SHOULD_DIRTY) { @@ -333,10 +350,8 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, bio_iov_bvec_set(bio, iter); } else { ret = bio_iov_iter_get_pages(bio, iter); - if (unlikely(ret)) { - bio_put(bio); - return ret; - } + if (unlikely(ret)) + goto out_bio_put; } dio->size = bio->bi_iter.bi_size; @@ -349,6 +364,13 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, task_io_account_write(bio->bi_iter.bi_size); } + if (iocb->ki_flags & IOCB_HAS_METADATA) { + ret = bio_integrity_map_iter(bio, iocb->private); + WRITE_ONCE(iocb->private, NULL); + if (unlikely(ret)) + goto out_bio_put; + } + if (iocb->ki_flags & IOCB_ATOMIC) bio->bi_opf |= REQ_ATOMIC; @@ -363,6 +385,10 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb, submit_bio(bio); } return -EIOCBQUEUED; + +out_bio_put: + bio_put(bio); + return ret; } static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter) diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h index a9dd0594dfc8..ec792873f2d5 100644 --- a/include/linux/bio-integrity.h +++ b/include/linux/bio-integrity.h @@ -24,6 +24,7 @@ struct bio_integrity_payload { unsigned short bip_vcnt; /* # of integrity bio_vecs */ unsigned short bip_max_vcnt; /* integrity bio_vec slots */ unsigned short bip_flags; /* control flags */ + u16 app_tag; /* application tag value */ struct bvec_iter bio_iter; /* for rewinding parent bio */ @@ -90,6 +91,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio, gfp_t gfp, int bio_integrity_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset); int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, u32 seed); +int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta); void bio_integrity_unmap_user(struct bio *bio); bool bio_integrity_prep(struct bio *bio); void bio_integrity_advance(struct bio *bio, unsigned int bytes_done); @@ -121,6 +123,11 @@ static inline int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter, return -EINVAL; } +static inline int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta) +{ + return -EINVAL; +} + static inline void bio_integrity_unmap_user(struct bio *bio) { } -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/11] block: add support to pass user meta buffer 2024-10-16 11:29 ` [PATCH v4 09/11] block: add support to pass user meta buffer Anuj Gupta @ 2024-10-17 8:15 ` Christoph Hellwig 2024-10-17 8:24 ` Christoph Hellwig 1 sibling, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:15 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi Looks good: Reviewed-by: Christoph Hellwig <[email protected]> ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 09/11] block: add support to pass user meta buffer 2024-10-16 11:29 ` [PATCH v4 09/11] block: add support to pass user meta buffer Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig @ 2024-10-17 8:24 ` Christoph Hellwig 1 sibling, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:24 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi On Wed, Oct 16, 2024 at 04:59:10PM +0530, Anuj Gupta wrote: > From: Kanchan Joshi <[email protected]> > > If an iocb contains metadata, extract that and prepare the bip. > Based on flags specified by the user, set corresponding guard/app/ref > tags to be checked in bip. > > Signed-off-by: Anuj Gupta <[email protected]> > Signed-off-by: Kanchan Joshi <[email protected]> > --- > block/bio-integrity.c | 51 +++++++++++++++++++++++++++++++++++ > block/fops.c | 44 +++++++++++++++++++++++------- > include/linux/bio-integrity.h | 7 +++++ > 3 files changed, 93 insertions(+), 9 deletions(-) > > diff --git a/block/bio-integrity.c b/block/bio-integrity.c > index d3c8b56d3fe6..24fad9b6f3ec 100644 > --- a/block/bio-integrity.c > +++ b/block/bio-integrity.c > @@ -12,6 +12,7 @@ > #include <linux/bio.h> > #include <linux/workqueue.h> > #include <linux/slab.h> > +#include <uapi/linux/blkdev.h> > #include "blk.h" > > static struct kmem_cache *bip_slab; > @@ -303,6 +304,55 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages, > return nr_bvecs; > } > > +static void bio_uio_meta_to_bip(struct bio *bio, struct uio_meta *meta) > +{ > + struct bio_integrity_payload *bip = bio_integrity(bio); > + > + if (meta->flags & BLK_INTEGRITY_CHK_GUARD) > + bip->bip_flags |= BIP_CHECK_GUARD; > + if (meta->flags & BLK_INTEGRITY_CHK_APPTAG) > + bip->bip_flags |= BIP_CHECK_APPTAG; > + if (meta->flags & BLK_INTEGRITY_CHK_REFTAG) > + bip->bip_flags |= BIP_CHECK_REFTAG; > + > + bip->app_tag = meta->app_tag; > +} > + > +int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta) Just noticed this when looking at the seed situation: Can you please move bio_integrity_map_iter below bio_integrity_map_user as it is a relatively thing wrapper for it? ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113755epcas5p2d563b183a9f4e19f5c02d73255282342@epcas5p2.samsung.com>]
* [PATCH v4 10/11] nvme: add support for passing on the application tag [not found] ` <CGME20241016113755epcas5p2d563b183a9f4e19f5c02d73255282342@epcas5p2.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 8:14 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi, Anuj Gupta From: Kanchan Joshi <[email protected]> With user integrity buffer, there is a way to specify the app_tag. Set the corresponding protocol specific flags and send the app_tag down. Signed-off-by: Anuj Gupta <[email protected]> Signed-off-by: Kanchan Joshi <[email protected]> --- drivers/nvme/host/core.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 211f44cc02a3..c7ba264504b5 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -872,6 +872,12 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req, return BLK_STS_OK; } +static void nvme_set_app_tag(struct request *req, struct nvme_command *cmnd) +{ + cmnd->rw.lbat = cpu_to_le16(bio_integrity(req->bio)->app_tag); + cmnd->rw.lbatm = cpu_to_le16(0xffff); +} + static void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd, struct request *req) { @@ -1012,6 +1018,10 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns, control |= NVME_RW_APPEND_PIREMAP; nvme_set_ref_tag(ns, cmnd, req); } + if (bio_integrity_flagged(req->bio, BIP_CHECK_APPTAG)) { + control |= NVME_RW_PRINFO_PRCHK_APP; + nvme_set_app_tag(req, cmnd); + } } cmnd->rw.control = cpu_to_le16(control); -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 10/11] nvme: add support for passing on the application tag 2024-10-16 11:29 ` [PATCH v4 10/11] nvme: add support for passing on the application tag Anuj Gupta @ 2024-10-17 8:14 ` Christoph Hellwig 0 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:14 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Kanchan Joshi Looks good: Reviewed-by: Christoph Hellwig <[email protected]> But I think this should be moved to before the user of this is added. ^ permalink raw reply [flat|nested] 48+ messages in thread
[parent not found: <CGME20241016113757epcas5p42b95123c857e5d92d9cdec55e190ce4e@epcas5p4.samsung.com>]
* [PATCH v4 11/11] scsi: add support for user-meta interface [not found] ` <CGME20241016113757epcas5p42b95123c857e5d92d9cdec55e190ce4e@epcas5p4.samsung.com> @ 2024-10-16 11:29 ` Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig ` (2 more replies) 0 siblings, 3 replies; 48+ messages in thread From: Anuj Gupta @ 2024-10-16 11:29 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g, Anuj Gupta Add support for sending user-meta buffer. Set tags to be checked using flags specified by user/block-layer user and underlying DIF/DIX configuration. Signed-off-by: Anuj Gupta <[email protected]> --- drivers/scsi/sd.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index ca4bc0ac76ad..87ae19c9b29c 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -802,6 +802,23 @@ static unsigned int sd_prot_flag_mask(unsigned int prot_op) return flag_mask[prot_op]; } +/* + * Can't check reftag alone or apptag alone + */ +static bool sd_prot_flags_valid(struct scsi_cmnd *scmd) +{ + struct request *rq = scsi_cmd_to_rq(scmd); + struct bio *bio = rq->bio; + + if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && + !bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) + return false; + if (!bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && + bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) + return false; + return true; +} + static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd, unsigned int dix, unsigned int dif) { @@ -814,14 +831,16 @@ static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd, if (bio_integrity_flagged(bio, BIP_IP_CHECKSUM)) scmd->prot_flags |= SCSI_PROT_IP_CHECKSUM; - if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false) + if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false && + (bio_integrity_flagged(bio, BIP_CHECK_GUARD))) scmd->prot_flags |= SCSI_PROT_GUARD_CHECK; } if (dif != T10_PI_TYPE3_PROTECTION) { /* DIX/DIF Type 0, 1, 2 */ scmd->prot_flags |= SCSI_PROT_REF_INCREMENT; - if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false) + if ((bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false) && + (!dix || bio_integrity_flagged(bio, BIP_CHECK_REFTAG))) scmd->prot_flags |= SCSI_PROT_REF_CHECK; } @@ -1373,6 +1392,8 @@ static blk_status_t sd_setup_read_write_cmnd(struct scsi_cmnd *cmd) dif = scsi_host_dif_capable(cmd->device->host, sdkp->protection_type); dld = sd_cdl_dld(sdkp, cmd); + if (!sd_prot_flags_valid(cmd)) + goto fail; if (dif || dix) protect = sd_setup_protect_cmnd(cmd, dix, dif); else -- 2.25.1 ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-16 11:29 ` [PATCH v4 11/11] scsi: add support for user-meta interface Anuj Gupta @ 2024-10-17 8:15 ` Christoph Hellwig 2024-10-17 11:39 ` Anuj Gupta 2024-10-22 1:58 ` Martin K. Petersen 2 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 8:15 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Wed, Oct 16, 2024 at 04:59:12PM +0530, Anuj Gupta wrote: > Add support for sending user-meta buffer. Set tags to be checked > using flags specified by user/block-layer user and underlying DIF/DIX > configuration. The patch itself looks good: Reviewed-by: Christoph Hellwig <[email protected]> again this should move to before the user is added. But we still seem to lack an interface to tell the userspace application what flags are actually supported. Just failing the I/O down in the sd driver still feels suboptimal. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-16 11:29 ` [PATCH v4 11/11] scsi: add support for user-meta interface Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig @ 2024-10-17 11:39 ` Anuj Gupta 2024-10-17 14:39 ` Christoph Hellwig 2024-10-22 1:58 ` Martin K. Petersen 2 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-17 11:39 UTC (permalink / raw) To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g [-- Attachment #1: Type: text/plain, Size: 997 bytes --] > +/* > + * Can't check reftag alone or apptag alone > + */ > +static bool sd_prot_flags_valid(struct scsi_cmnd *scmd) > +{ > + struct request *rq = scsi_cmd_to_rq(scmd); > + struct bio *bio = rq->bio; > + > + if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && > + !bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) > + return false; > + if (!bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && > + bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) > + return false; > + return true; > +} > + Martin, Christoph, and all, This snippet prevents a scenario where a apptag check is specified without a reftag check and vice-versa, which is not possible for scsi[1]. But for block layer generated integrity apptag check (BIP_CHECK_APPTAG) is not specified. When scsi drive is formatted with type1/2 PI, block layer would specify refcheck but not appcheck. Hence, these I/O's would fail. Do you see how we can handle this? [1] https://lore.kernel.org/linux-block/[email protected]/ [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-17 11:39 ` Anuj Gupta @ 2024-10-17 14:39 ` Christoph Hellwig 2024-10-18 8:26 ` Anuj Gupta 0 siblings, 1 reply; 48+ messages in thread From: Christoph Hellwig @ 2024-10-17 14:39 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Thu, Oct 17, 2024 at 05:09:23PM +0530, Anuj Gupta wrote: > This snippet prevents a scenario where a apptag check is specified without > a reftag check and vice-versa, which is not possible for scsi[1]. > But for > block layer generated integrity apptag check (BIP_CHECK_APPTAG) is not > specified. When scsi drive is formatted with type1/2 PI, block layer would > specify refcheck but not appcheck. Hence, these I/O's would fail. Do you > see how we can handle this? Well, this is also related to difference in capability checking. Just curious, do you have any user of the more fine grained checking in NVMe? If not we could support the SCSI semantics only and emulate them using the fine grained NVMe semantics and have no portability problems. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-17 14:39 ` Christoph Hellwig @ 2024-10-18 8:26 ` Anuj Gupta 2024-10-18 9:02 ` Christoph Hellwig 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-18 8:26 UTC (permalink / raw) To: Christoph Hellwig Cc: axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g [-- Attachment #1: Type: text/plain, Size: 3041 bytes --] On Thu, Oct 17, 2024 at 04:39:18PM +0200, Christoph Hellwig wrote: > On Thu, Oct 17, 2024 at 05:09:23PM +0530, Anuj Gupta wrote: > > This snippet prevents a scenario where a apptag check is specified without > > a reftag check and vice-versa, which is not possible for scsi[1]. > > But for > > block layer generated integrity apptag check (BIP_CHECK_APPTAG) is not > > specified. When scsi drive is formatted with type1/2 PI, block layer would > > specify refcheck but not appcheck. Hence, these I/O's would fail. Do you > > see how we can handle this? > > Well, this is also related to difference in capability checking. Right. > Just curious, do you have any user of the more fine grained checking > in NVMe? If not we could support the SCSI semantics only and emulate > them using the fine grained NVMe semantics and have no portability > problems. We can choose to support scsi semantics only and expose only the valid scsi combinations to userspace i.e. 1. no check 2. guard check only 3. ref + app check only 4. guard + ref + app check Something like this [*] on top of this series, untested though. Does this align with what you have in mind? [*] diff --git a/block/bio-integrity.c b/block/bio-integrity.c index 24fad9b6f3ec..2ca27910770b 100644 --- a/block/bio-integrity.c +++ b/block/bio-integrity.c @@ -308,12 +308,10 @@ static void bio_uio_meta_to_bip(struct bio *bio, struct uio_meta *meta) { struct bio_integrity_payload *bip = bio_integrity(bio); - if (meta->flags & BLK_INTEGRITY_CHK_GUARD) + if (meta->flags & IO_INTEGRITY_CHK_GUARD) bip->bip_flags |= BIP_CHECK_GUARD; - if (meta->flags & BLK_INTEGRITY_CHK_APPTAG) - bip->bip_flags |= BIP_CHECK_APPTAG; - if (meta->flags & BLK_INTEGRITY_CHK_REFTAG) - bip->bip_flags |= BIP_CHECK_REFTAG; + if (meta->flags & IO_INTEGRITY_CHK_REF_APP) + bip->bip_flags |= BIP_CHECK_REFTAG | BIP_CHECK_APPTAG; bip->app_tag = meta->app_tag; } @@ -329,9 +327,9 @@ int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta) return -EINVAL; /* should fit into two bytes */ - BUILD_BUG_ON(BLK_INTEGRITY_VALID_FLAGS >= (1 << 16)); + BUILD_BUG_ON(IO_INTEGRITY_VALID_FLAGS >= (1 << 16)); - if (meta->flags && (meta->flags & ~BLK_INTEGRITY_VALID_FLAGS)) + if (meta->flags && (meta->flags & ~IO_INTEGRITY_VALID_FLAGS)) return -EINVAL; /* diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h index 753971770733..714700f9826e 100644 --- a/include/uapi/linux/fs.h +++ b/include/uapi/linux/fs.h @@ -40,6 +40,15 @@ #define BLOCK_SIZE_BITS 10 #define BLOCK_SIZE (1<<BLOCK_SIZE_BITS) +/* + * flags for integrity meta + */ +#define IO_INTEGRITY_CHK_GUARD (1U << 0) /* enforce guard check */ +#define IO_INTEGRITY_CHK_REF_APP (1U << 1) /* enforce ref and app check */ + +#define IO_INTEGRITY_VALID_FLAGS (IO_INTEGRITY_CHK_GUARD | \ + IO_INTEGRITY_CHK_REF_APP) + #define SEEK_SET 0 /* seek relative to beginning of file */ #define SEEK_CUR 1 /* seek relative to current file position */ #define SEEK_END 2 /* seek relative to end of file */ [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-18 8:26 ` Anuj Gupta @ 2024-10-18 9:02 ` Christoph Hellwig 0 siblings, 0 replies; 48+ messages in thread From: Christoph Hellwig @ 2024-10-18 9:02 UTC (permalink / raw) To: Anuj Gupta Cc: Christoph Hellwig, axboe, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g On Fri, Oct 18, 2024 at 01:56:48PM +0530, Anuj Gupta wrote: > > Just curious, do you have any user of the more fine grained checking > > in NVMe? If not we could support the SCSI semantics only and emulate > > them using the fine grained NVMe semantics and have no portability > > problems. > > We can choose to support scsi semantics only and expose only the valid > scsi combinations to userspace i.e. > > 1. no check > 2. guard check only > 3. ref + app check only > 4. guard + ref + app check > > Something like this [*] on top of this series, untested though. Does > this align with what you have in mind? That is indeed what I had in mind. But I'd really like to hear from Martin on this as well. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-16 11:29 ` [PATCH v4 11/11] scsi: add support for user-meta interface Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig 2024-10-17 11:39 ` Anuj Gupta @ 2024-10-22 1:58 ` Martin K. Petersen 2024-10-28 7:36 ` Anuj Gupta 2 siblings, 1 reply; 48+ messages in thread From: Martin K. Petersen @ 2024-10-22 1:58 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g Anuj, > +/* > + * Can't check reftag alone or apptag alone > + */ > +static bool sd_prot_flags_valid(struct scsi_cmnd *scmd) > +{ > + struct request *rq = scsi_cmd_to_rq(scmd); > + struct bio *bio = rq->bio; > + > + if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && > + !bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) > + return false; > + if (!bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && > + bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) > + return false; > + return true; > +} This breaks reading the partition table. The BIP_CHECK_* flags should really only control DIX in the SCSI case. Filling out *PROTECT is left as an exercise for the SCSI disk driver. It's the only way we can sanely deal with this. Especially given ATO, GRD_CHK, REF_CHK, and APP_CHK. It just gets too complicated. You should just drop sd_prot_flags_valid() and things work fine. And then with BIP_CHECK_* introduced we can drop BIP_CTRL_NOCHECK. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-22 1:58 ` Martin K. Petersen @ 2024-10-28 7:36 ` Anuj Gupta 2024-10-29 2:24 ` Martin K. Petersen 0 siblings, 1 reply; 48+ messages in thread From: Anuj Gupta @ 2024-10-28 7:36 UTC (permalink / raw) To: Martin K. Petersen Cc: axboe, hch, kbusch, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g [-- Attachment #1: Type: text/plain, Size: 2102 bytes --] On Mon, Oct 21, 2024 at 09:58:57PM -0400, Martin K. Petersen wrote: > > Anuj, > > > +/* > > + * Can't check reftag alone or apptag alone > > + */ > > +static bool sd_prot_flags_valid(struct scsi_cmnd *scmd) > > +{ > > + struct request *rq = scsi_cmd_to_rq(scmd); > > + struct bio *bio = rq->bio; > > + > > + if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && > > + !bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) > > + return false; > > + if (!bio_integrity_flagged(bio, BIP_CHECK_REFTAG) && > > + bio_integrity_flagged(bio, BIP_CHECK_APPTAG)) > > + return false; > > + return true; > > +} > > This breaks reading the partition table. > > The BIP_CHECK_* flags should really only control DIX in the SCSI case. > Filling out *PROTECT is left as an exercise for the SCSI disk driver. > It's the only way we can sanely deal with this. Especially given ATO, > GRD_CHK, REF_CHK, and APP_CHK. It just gets too complicated. > > You should just drop sd_prot_flags_valid() and things work fine. And > then with BIP_CHECK_* introduced we can drop BIP_CTRL_NOCHECK. So I will keep the fine grained userspace/bip flags (which we have in this version). And drop the sd_prot_flags_valid() and BIP_CTRL_NOCHECK like below [1]. Hope that looks fine [1] diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index ca4bc0ac76ad..0913bd43f48a 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -814,14 +814,14 @@ static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd, if (bio_integrity_flagged(bio, BIP_IP_CHECKSUM)) scmd->prot_flags |= SCSI_PROT_IP_CHECKSUM; - if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false) + if (bio_integrity_flagged(bio, BIP_CHECK_GUARD) scmd->prot_flags |= SCSI_PROT_GUARD_CHECK; } if (dif != T10_PI_TYPE3_PROTECTION) { /* DIX/DIF Type 0, 1, 2 */ scmd->prot_flags |= SCSI_PROT_REF_INCREMENT; - if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false) + if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG)) scmd->prot_flags |= SCSI_PROT_REF_CHECK; } -- 2.25.1 > > -- > Martin K. Petersen Oracle Linux Engineering > [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply related [flat|nested] 48+ messages in thread
* Re: [PATCH v4 11/11] scsi: add support for user-meta interface 2024-10-28 7:36 ` Anuj Gupta @ 2024-10-29 2:24 ` Martin K. Petersen 0 siblings, 0 replies; 48+ messages in thread From: Martin K. Petersen @ 2024-10-29 2:24 UTC (permalink / raw) To: Anuj Gupta Cc: Martin K. Petersen, axboe, hch, kbusch, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g Anuj, > So I will keep the fine grained userspace/bip flags (which we have in > this version). And drop the sd_prot_flags_valid() and BIP_CTRL_NOCHECK > like below [1]. Hope that looks fine Yep, that's fine. Thanks! -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: [PATCH v4 00/11] Read/Write with meta/integrity 2024-10-16 11:29 ` [PATCH v4 00/11] Read/Write with meta/integrity Anuj Gupta ` (10 preceding siblings ...) [not found] ` <CGME20241016113757epcas5p42b95123c857e5d92d9cdec55e190ce4e@epcas5p4.samsung.com> @ 2024-10-22 2:04 ` Martin K. Petersen 11 siblings, 0 replies; 48+ messages in thread From: Martin K. Petersen @ 2024-10-22 2:04 UTC (permalink / raw) To: Anuj Gupta Cc: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538, krisman, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g Anuj, > * meta_flags: these are meta-type specific flags. Three flags are > exposed for integrity type, namely > BLK_INTEGRITY_CHK_GUARD/APPTAG/REFTAG. It's a bit weird that these are BLK_INTEGRITY_FLAGS since they are exposed via io_uring. I have reads limping along in my test tool on top of this series (with the sd_prot_flags_valid check back out). Will work on writes tomorrow. -- Martin K. Petersen Oracle Linux Engineering ^ permalink raw reply [flat|nested] 48+ messages in thread
end of thread, other threads:[~2024-10-29 2:24 UTC | newest] Thread overview: 48+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <CGME20241016113705epcas5p1edc284b347de99bc34802b0b0c5e1b27@epcas5p1.samsung.com> 2024-10-16 11:29 ` [PATCH v4 00/11] Read/Write with meta/integrity Anuj Gupta [not found] ` <CGME20241016113724epcas5p191ccce0f473274fc95934956662fc769@epcas5p1.samsung.com> 2024-10-16 11:29 ` [PATCH v4 01/11] block: define set of integrity flags to be inherited by cloned bip Anuj Gupta 2024-10-16 18:03 ` Keith Busch [not found] ` <CGME20241016113736epcas5p3a03665bf0674e68a8f95bbd5f3607357@epcas5p3.samsung.com> 2024-10-16 11:29 ` [PATCH v4 02/11] block: copy back bounce buffer to user-space correctly in case of split Anuj Gupta 2024-10-16 18:04 ` Keith Busch 2024-10-17 7:53 ` Christoph Hellwig [not found] ` <CGME20241016113738epcas5p2d217b8d6efbf855f61c133f64deaa486@epcas5p2.samsung.com> 2024-10-16 11:29 ` [PATCH v4 03/11] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta [not found] ` <CGME20241016113741epcas5p3b90adb3b43b6b443ffd00df29d63d289@epcas5p3.samsung.com> 2024-10-16 11:29 ` [PATCH v4 04/11] block: define meta io descriptor Anuj Gupta 2024-10-16 19:35 ` Keith Busch 2024-10-17 5:49 ` Anuj Gupta 2024-10-17 7:57 ` Christoph Hellwig 2024-10-22 2:11 ` Martin K. Petersen 2024-10-22 6:04 ` Christoph Hellwig 2024-10-23 1:20 ` Martin K. Petersen 2024-10-28 3:46 ` Anuj Gupta [not found] ` <CGME20241016113743epcas5p3b4092c938d8795cea666ab4e6baf4aa9@epcas5p3.samsung.com> 2024-10-16 11:29 ` [PATCH v4 05/11] fs: introduce IOCB_HAS_METADATA for metadata Anuj Gupta 2024-10-17 7:58 ` Christoph Hellwig [not found] ` <CGME20241016113745epcas5p1723d91b979fd0e597495fef377ad0f62@epcas5p1.samsung.com> 2024-10-16 11:29 ` [PATCH v4 06/11] block: add flags for integrity meta Anuj Gupta 2024-10-17 8:00 ` Christoph Hellwig 2024-10-17 10:45 ` Anuj Gupta 2024-10-17 12:01 ` Christoph Hellwig 2024-10-17 12:59 ` Anuj gupta 2024-10-17 14:34 ` Christoph Hellwig [not found] ` <CGME20241016113747epcas5p4e276eb0da2695ba032ce1d2a3b83fff4@epcas5p4.samsung.com> 2024-10-16 11:29 ` [PATCH v4 07/11] io_uring/rw: add support to send meta along with read/write Anuj Gupta 2024-10-17 8:10 ` Christoph Hellwig 2024-10-17 22:51 ` Jens Axboe 2024-10-21 5:31 ` Anuj Gupta 2024-10-22 6:02 ` Christoph Hellwig 2024-10-22 1:50 ` Martin K. Petersen [not found] ` <CGME20241016113750epcas5p2089e395ca764de023be64519da9b0982@epcas5p2.samsung.com> 2024-10-16 11:29 ` [PATCH v4 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags Anuj Gupta 2024-10-17 8:12 ` Christoph Hellwig 2024-10-17 10:46 ` Anuj Gupta 2024-10-17 14:37 ` Christoph Hellwig [not found] ` <CGME20241016113752epcas5p4c365819fce1e5d498fd781ae2b309341@epcas5p4.samsung.com> 2024-10-16 11:29 ` [PATCH v4 09/11] block: add support to pass user meta buffer Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig 2024-10-17 8:24 ` Christoph Hellwig [not found] ` <CGME20241016113755epcas5p2d563b183a9f4e19f5c02d73255282342@epcas5p2.samsung.com> 2024-10-16 11:29 ` [PATCH v4 10/11] nvme: add support for passing on the application tag Anuj Gupta 2024-10-17 8:14 ` Christoph Hellwig [not found] ` <CGME20241016113757epcas5p42b95123c857e5d92d9cdec55e190ce4e@epcas5p4.samsung.com> 2024-10-16 11:29 ` [PATCH v4 11/11] scsi: add support for user-meta interface Anuj Gupta 2024-10-17 8:15 ` Christoph Hellwig 2024-10-17 11:39 ` Anuj Gupta 2024-10-17 14:39 ` Christoph Hellwig 2024-10-18 8:26 ` Anuj Gupta 2024-10-18 9:02 ` Christoph Hellwig 2024-10-22 1:58 ` Martin K. Petersen 2024-10-28 7:36 ` Anuj Gupta 2024-10-29 2:24 ` Martin K. Petersen 2024-10-22 2:04 ` [PATCH v4 00/11] Read/Write with meta/integrity Martin K. Petersen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox