public inbox for [email protected]
* [PATCH v10 00/10] Read/Write with meta/integrity
       [not found] <CGME20241125071431epcas5p3a3d9633606d2f0b46de2c144bb7f3711@epcas5p3.samsung.com>
@ 2024-11-25  7:06 ` Anuj Gupta
       [not found]   ` <CGME20241125071449epcas5p1f1d44ee61d1af7c847920680767637e7@epcas5p1.samsung.com>
                     ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta

This adds a new io_uring interface to exchange additional integrity/PI
metadata with read/write.

An example program for using the interface is appended below [1].

The patchset is on top of block/for-next.

The block path (direct IO), NVMe and SCSI drivers are modified to support
this.

Patch 1 is an enhancement patch.
Patch 2 is required to make the bounce buffer copy back work correctly.
Patches 3 to 5 are prep patches.
Patch 6 adds the io_uring support.
Patch 7 gives us a unified interface for user and kernel generated
integrity.
Patch 8 adds support in SCSI and patch 9 in NVMe.
Patch 10 adds the support for block direct IO.

Changes since v9:
https://lore.kernel.org/linux-block/[email protected]/

- pass PI attribute information via pointer (Pavel)
- fix kernel bot warnings

Changes since v8:
https://lore.kernel.org/io-uring/[email protected]/

- add the option to pass the PI information from user space via a
  pointer (Pavel)

Changes since v7:
https://lore.kernel.org/io-uring/[email protected]/

- change the sign-off order (hch)
- add a check for doing metadata completion handling only for async-io
- change meta_type name to something more meaningful (hch, keith)
- add detailed description in io-uring patch (hch)

Changes since v6:
https://lore.kernel.org/linux-block/[email protected]/

- io_uring changes (bring back meta_type, move PI to the end of SQE128)
- Fix robot warnings

Changes since v5:
https://lore.kernel.org/linux-block/[email protected]/

- remove meta_type field from SQE (hch, keith)
- remove __bitwise annotation (hch)
- remove BIP_CTRL_NOCHECK from scsi (hch)

Changes since v4:
https://lore.kernel.org/linux-block/[email protected]/

- better variable names to describe bounce buffer copy back (hch)
- move definition of flags in the same patch introducing uio_meta (hch)
- move uio_meta definition to include/linux/uio.h (hch)
- bump seed size in uio_meta to 8 bytes (martin)
- move flags definition to include/uapi/linux/fs.h (hch)
- s/meta/metadata in commit description of io-uring (hch)
- rearrange the meta fields in sqe for cleaner layout
- partial submission case is not applicable, as we are only plumbing for the async case
- s/META_TYPE_INTEGRITY/META_TYPE_PI (hch, martin)
- remove unlikely branching (hch)
- Better formatting, misc cleanups, better commit descriptions, reordering commits(hch)

Changes since v3:
https://lore.kernel.org/linux-block/[email protected]/

- add reftag seed support (Martin)
- fix incorrect formatting in uio_meta (hch)
- s/IOCB_HAS_META/IOCB_HAS_METADATA (hch)
- move integrity check flags to block layer header (hch)
- add comments for BIP_CHECK_GUARD/REFTAG/APPTAG flags (hch)
- remove bio_integrity check during completion if IOCB_HAS_METADATA is set (hch)
- use goto label to get rid of duplicate error handling (hch)
- add warn_on if trying to do sync io with iocb_has_metadata flag (hch)
- remove check for disabling reftag remapping (hch)
- remove BIP_INTEGRITY_USER flag (hch)
- add comment for app_tag field introduced in bio_integrity_payload (hch)
- pass request to nvme_set_app_tag function (hch)
- right indentation at a place in scsi patch (hch)
- move IOCB_HAS_METADATA to a separate fs patch (hch)

Changes since v2:
https://lore.kernel.org/linux-block/[email protected]/
- io_uring error handling styling (Gabriel)
- add documented helper to get metadata bytes from data iter (hch)
- during clone specify "what flags to clone" rather than
"what not to clone" (hch)
- Move uio_meta definition to bio-integrity.h (hch)
- Rename apptag field to app_tag (hch)
- Change datatype of flags field in uio_meta to bitwise (hch)
- Don't introduce BIP_USER_CHK_FOO flags (hch, martin)
- Driver should rely on block layer flags instead of seeing if it is
user-passthrough (hch)
- update the scsi code for handling user-meta (hch, martin)

Changes since v1:
https://lore.kernel.org/linux-block/[email protected]/
- Do not use new opcode for meta, and also add the provision to introduce new
meta types beyond integrity (Pavel)
- Stuff IOCB_HAS_META check in need_complete_io (Jens)
- Split meta handling in NVMe into a separate handler (Keith)
- Add meta handling for __blkdev_direct_IO too (Keith)
- Don't inherit BIP_COPY_USER flag for cloned bio's (Christoph)
- Better commit descriptions (Christoph)

Changes since RFC:
- modify io_uring plumbing based on recent async handling state changes
- fixes/enhancements to correctly handle the split for meta buffer
- add flags to specify guard/reftag/apptag checks
- add support to send apptag

[1]

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <linux/fs.h>
#include <linux/io_uring.h>
#include <linux/types.h>
#include "liburing.h"

/*
 * write data/meta. read both. compare. send apptag too.
 * prerequisite:
 * protected xfer: format namespace with 4KB + 8b, pi_type = 1
 * For testing reftag remapping on device-mapper, create a
 * device-mapper and run this program. Device mapper creation:
 * # echo 0 80 linear /dev/nvme0n1 0 > /tmp/table
 * # echo 80 160 linear /dev/nvme0n1 200 >> /tmp/table
 * # dmsetup create two /tmp/table
 * # ./a.out /dev/dm-0
 */

#define DATA_LEN 4096
#define META_LEN 8

struct t10_pi_tuple {
        __be16  guard;
        __be16  apptag;
        __be32  reftag;
};

int main(int argc, char *argv[])
{
         struct io_uring ring;
         struct io_uring_sqe *sqe = NULL;
         struct io_uring_cqe *cqe = NULL;
         void *wdb, *rdb;
         char wmb[META_LEN], rmb[META_LEN];
         int fd, ret, blksize;
         struct stat fstat;
         unsigned long long offset = DATA_LEN * 10;
         struct t10_pi_tuple *pi;
         struct io_uring_attr w_pi, r_pi;

         if (argc != 2) {
                 fprintf(stderr, "Usage: %s <block-device>\n", argv[0]);
                 return 1;
         }

         if (stat(argv[1], &fstat) == 0) {
                 blksize = (int)fstat.st_blksize;
         } else {
                 perror("stat");
                 return 1;
         }

         if (posix_memalign(&wdb, blksize, DATA_LEN)) {
                 perror("posix_memalign failed");
                 return 1;
         }
         if (posix_memalign(&rdb, blksize, DATA_LEN)) {
                 perror("posix_memalign failed");
                 return 1;
         }

         memset(wdb, 0, DATA_LEN);

         fd = open(argv[1], O_RDWR | O_DIRECT);
         if (fd < 0) {
                 perror("open");
                 return 1;
         }

         ret = io_uring_queue_init(8, &ring, 0);
         if (ret) {
                 fprintf(stderr, "ring setup failed: %d\n", ret);
                 return 1;
         }

         /* write data + meta-buffer to device */
         sqe = io_uring_get_sqe(&ring);
         if (!sqe) {
                 fprintf(stderr, "get sqe failed\n");
                 return 1;
         }

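         io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset);

         /* the SQE carries a mask of attributes and a pointer to them */
         sqe->attr_type_mask = ATTR_FLAG_PI;
         w_pi.attr_type = ATTR_TYPE_PI;
         w_pi.pi.addr = (__u64)wmb;
         w_pi.pi.len = META_LEN;
         /* flags to ask for guard/reftag/apptag checks */
         w_pi.pi.flags = IO_INTEGRITY_CHK_GUARD | IO_INTEGRITY_CHK_REFTAG | IO_INTEGRITY_CHK_APPTAG;
         w_pi.pi.app_tag = 0x1234;
         w_pi.pi.seed = 10;
         w_pi.pi.rsvd = 0;
         sqe->attr_ptr = (__u64)&w_pi;

         /*
          * PI tuple fields are big-endian. The constants below are the
          * byte-swapped forms of reftag 10 and apptag 0x1234, so they are
          * only correct on a little-endian host.
          */
         pi = (struct t10_pi_tuple *)wmb;
         pi->guard = 0;
         pi->reftag = 0x0A000000;
         pi->apptag = 0x3412;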

         ret = io_uring_submit(&ring);
         if (ret <= 0) {
                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                 return 1;
         }

         ret = io_uring_wait_cqe(&ring, &cqe);
         if (ret < 0 || !cqe) {
                 fprintf(stderr, "wait cqe failed: %d\n", ret);
                 return 1;
         }
         if (cqe->res < 0) {
                 fprintf(stderr, "write cqe failure: %d\n", cqe->res);
                 return 1;
         }

         io_uring_cqe_seen(&ring, cqe);

         /* read data + meta-buffer back from device */
         sqe = io_uring_get_sqe(&ring);
         if (!sqe) {
                 fprintf(stderr, "get sqe failed\n");
                 return 1;
         }

         io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset);

         sqe->attr_type_mask = ATTR_FLAG_PI;
         r_pi.attr_type = ATTR_TYPE_PI;
         r_pi.pi.addr = (__u64)rmb;
         r_pi.pi.len = META_LEN;
         r_pi.pi.flags = IO_INTEGRITY_CHK_GUARD | IO_INTEGRITY_CHK_REFTAG | IO_INTEGRITY_CHK_APPTAG;
         r_pi.pi.app_tag = 0x1234;
         r_pi.pi.seed = 10;
         r_pi.pi.rsvd = 0;
         sqe->attr_ptr = (__u64)&r_pi;

         ret = io_uring_submit(&ring);
         if (ret <= 0) {
                 fprintf(stderr, "sqe submit failed: %d\n", ret);
                 return 1;
         }

         ret = io_uring_wait_cqe(&ring, &cqe);
         if (ret < 0 || !cqe) {
                 fprintf(stderr, "wait cqe failed: %d\n", ret);
                 return 1;
         }

         if (cqe->res < 0) {
                 fprintf(stderr, "read cqe failure: %d\n", cqe->res);
                 return 1;
         }

         pi = (struct t10_pi_tuple *)rmb;
         if (pi->apptag != 0x3412)
                 printf("Failure: apptag mismatch!\n");
         if (pi->reftag != 0x0A000000)
                 printf("Failure: reftag mismatch!\n");

         io_uring_cqe_seen(&ring, cqe);

         /* buffers hold binary data, so compare all bytes with memcmp */
         if (memcmp(wmb, rmb, META_LEN))
                 printf("Failure: meta mismatch!\n");

         if (memcmp(wdb, rdb, DATA_LEN))
                 printf("Failure: data mismatch!\n");

         io_uring_queue_exit(&ring);
         free(rdb);
         free(wdb);
         return 0;
}
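
The hard-coded tuple values in the program above (guard 0, apptag 0x3412,
reftag 0x0A000000) only hold for an all-zero data block on a little-endian
host. A minimal sketch of building the same tuple portably follows, assuming
the guard is the 16-bit T10-DIF CRC. The names crc_t10dif_sw() and
fill_pi_tuple() and the local struct pi_tuple are hypothetical helpers for
illustration, not part of this patchset or of liburing.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the 8-byte T10 PI tuple; every field is big-endian. */
struct pi_tuple {
	uint16_t guard;
	uint16_t apptag;
	uint32_t reftag;
};

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
static uint16_t crc_t10dif_sw(const unsigned char *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* Hypothetical helper: fill one tuple for one data block in host-independent form. */
static void fill_pi_tuple(struct pi_tuple *pi, const void *data, size_t len,
			  uint16_t app_tag, uint32_t ref_tag)
{
	assert(pi && data);
	pi->guard = htobe16(crc_t10dif_sw(data, len));
	pi->apptag = htobe16(app_tag);
	pi->reftag = htobe32(ref_tag);
}
```

With an all-zero 4096-byte block, app_tag 0x1234 and ref_tag 10, this yields
the same bytes the example program writes by hand. A zero guard falls out
because the CRC starts at 0 and never changes over zero input.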

Anuj Gupta (7):
  block: define set of integrity flags to be inherited by cloned bip
  block: modify bio_integrity_map_user to accept iov_iter as argument
  fs, iov_iter: define meta io descriptor
  fs: introduce IOCB_HAS_METADATA for metadata
  io_uring: introduce attributes for read/write and PI support
  block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags
  scsi: add support for user-meta interface

Christoph Hellwig (1):
  block: copy back bounce buffer to user-space correctly in case of
    split

Kanchan Joshi (2):
  nvme: add support for passing on the application tag
  block: add support to pass user meta buffer

 block/bio-integrity.c         | 84 ++++++++++++++++++++++++++++-------
 block/blk-integrity.c         | 10 ++++-
 block/fops.c                  | 45 ++++++++++++++-----
 drivers/nvme/host/core.c      | 21 +++++----
 drivers/scsi/sd.c             |  4 +-
 include/linux/bio-integrity.h | 25 ++++++++---
 include/linux/fs.h            |  1 +
 include/linux/uio.h           |  9 ++++
 include/uapi/linux/fs.h       |  9 ++++
 include/uapi/linux/io_uring.h | 31 +++++++++++++
 io_uring/io_uring.c           |  2 +
 io_uring/rw.c                 | 82 +++++++++++++++++++++++++++++++++-
 io_uring/rw.h                 | 14 +++++-
 13 files changed, 291 insertions(+), 46 deletions(-)

-- 
2.25.1



* [PATCH v10 01/10] block: define set of integrity flags to be inherited by cloned bip
       [not found]   ` <CGME20241125071449epcas5p1f1d44ee61d1af7c847920680767637e7@epcas5p1.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta

Introduce BIP_CLONE_FLAGS describing integrity flags that should be
inherited in the cloned bip from the parent.
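
As a rough illustration of the difference between masking out one flag and
inheriting only an explicit set, consider the sketch below. The bit values
are illustrative only and may not match the kernel's enum bip_flags, and the
two helper functions are hypothetical names for the old and new expressions
in bio_integrity_clone().

```c
#include <assert.h>

/* Illustrative bit values only; the kernel's enum bip_flags may differ. */
enum {
	BIP_BLOCK_INTEGRITY  = 1 << 0,
	BIP_MAPPED_INTEGRITY = 1 << 1,
	BIP_CTRL_NOCHECK     = 1 << 2,
	BIP_DISK_NOCHECK     = 1 << 3,
	BIP_IP_CHECKSUM      = 1 << 4,
	BIP_COPY_USER        = 1 << 5,
};

#define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \
			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM)

/* Old expression: inherit everything except one named flag. */
static unsigned int clone_flags_denylist(unsigned int src)
{
	return src & ~BIP_BLOCK_INTEGRITY;
}

/* New expression: inherit only the flags listed in BIP_CLONE_FLAGS. */
static unsigned int clone_flags_allowlist(unsigned int src)
{
	return src & BIP_CLONE_FLAGS;
}
```

With the allowlist form, a flag such as BIP_COPY_USER is dropped from the
clone without having to be named in the clone path.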

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Martin K. Petersen <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
---
 block/bio-integrity.c         | 2 +-
 include/linux/bio-integrity.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 2a4bd6611692..a448a25d13de 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -559,7 +559,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 
 	bip->bip_vec = bip_src->bip_vec;
 	bip->bip_iter = bip_src->bip_iter;
-	bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;
+	bip->bip_flags = bip_src->bip_flags & BIP_CLONE_FLAGS;
 
 	return 0;
 }
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index dbf0f74c1529..0f0cf10222e8 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -30,6 +30,9 @@ struct bio_integrity_payload {
 	struct bio_vec		bip_inline_vecs[];/* embedded bvec array */
 };
 
+#define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \
+			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM)
+
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 
 #define bip_for_each_vec(bvl, bip, iter)				\
-- 
2.25.1



* [PATCH v10 02/10] block: copy back bounce buffer to user-space correctly in case of split
       [not found]   ` <CGME20241125071451epcas5p2e50329d88842569e5a2a07b918406d28@epcas5p2.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta

From: Christoph Hellwig <[email protected]>

Copy back the bounce buffer to user-space in its entirety when the parent
bio completes. The existing code uses bip_iter.bi_size for sizing the
copy, which can be modified (for example, when the bio is split). So move
away from that and fetch the size from the vector passed to the block
layer. While at it, switch to using better variable names.
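
A user-space sketch of why the copy length should come from the bounce
vector rather than from an iterator that may have been advanced. All types
and function names here (toy_bvec, toy_iter, toy_payload, toy_advance,
copy_back_by_iter, copy_back_by_bvec) are hypothetical stand-ins and not
kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a bounce bvec and the payload iterator. */
struct toy_bvec { unsigned char *base; size_t len; };
struct toy_iter { size_t remaining; };

struct toy_payload {
	struct toy_bvec bounce;	/* kernel-side bounce buffer */
	struct toy_iter iter;	/* shrinks as the I/O is split or advanced */
};

/* Mimics a split/advance consuming part of the iterator. */
static void toy_advance(struct toy_payload *p, size_t bytes)
{
	assert(bytes <= p->iter.remaining);
	p->iter.remaining -= bytes;
}

/* Old sizing: trust the (possibly advanced) iterator. */
static size_t copy_back_by_iter(const struct toy_payload *p, unsigned char *user)
{
	memcpy(user, p->bounce.base, p->iter.remaining);
	return p->iter.remaining;
}

/* New sizing: use the length recorded in the bounce vector itself. */
static size_t copy_back_by_bvec(const struct toy_payload *p, unsigned char *user)
{
	memcpy(user, p->bounce.base, p->bounce.len);
	return p->bounce.len;
}
```

After toy_advance() has consumed part of the iterator, the iterator-sized
copy returns fewer bytes than the bounce buffer holds, while the
vector-sized copy still returns all of them.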

Fixes: 492c5d455969f ("block: bio-integrity: directly map user buffers")
Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
---
 block/bio-integrity.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index a448a25d13de..4341b0d4efa1 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -118,17 +118,18 @@ static void bio_integrity_unpin_bvec(struct bio_vec *bv, int nr_vecs,
 
 static void bio_integrity_uncopy_user(struct bio_integrity_payload *bip)
 {
-	unsigned short nr_vecs = bip->bip_max_vcnt - 1;
-	struct bio_vec *copy = &bip->bip_vec[1];
-	size_t bytes = bip->bip_iter.bi_size;
-	struct iov_iter iter;
+	unsigned short orig_nr_vecs = bip->bip_max_vcnt - 1;
+	struct bio_vec *orig_bvecs = &bip->bip_vec[1];
+	struct bio_vec *bounce_bvec = &bip->bip_vec[0];
+	size_t bytes = bounce_bvec->bv_len;
+	struct iov_iter orig_iter;
 	int ret;
 
-	iov_iter_bvec(&iter, ITER_DEST, copy, nr_vecs, bytes);
-	ret = copy_to_iter(bvec_virt(bip->bip_vec), bytes, &iter);
+	iov_iter_bvec(&orig_iter, ITER_DEST, orig_bvecs, orig_nr_vecs, bytes);
+	ret = copy_to_iter(bvec_virt(bounce_bvec), bytes, &orig_iter);
 	WARN_ON_ONCE(ret != bytes);
 
-	bio_integrity_unpin_bvec(copy, nr_vecs, true);
+	bio_integrity_unpin_bvec(orig_bvecs, orig_nr_vecs, true);
 }
 
 /**
-- 
2.25.1



* [PATCH v10 03/10] block: modify bio_integrity_map_user to accept iov_iter as argument
       [not found]   ` <CGME20241125071454epcas5p449a4b9a80f6bfe2ffa1181e3af6c2ac6@epcas5p4.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta, Kanchan Joshi

This patch refactors bio_integrity_map_user to accept an iov_iter as its
argument. This is a prep patch.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
---
 block/bio-integrity.c         | 12 +++++-------
 block/blk-integrity.c         | 10 +++++++++-
 include/linux/bio-integrity.h |  5 ++---
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 4341b0d4efa1..f56d01cec689 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -302,16 +302,15 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages,
 	return nr_bvecs;
 }
 
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes)
+int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	unsigned int align = blk_lim_dma_alignment_and_pad(&q->limits);
 	struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages;
 	struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec;
+	size_t offset, bytes = iter->count;
 	unsigned int direction, nr_bvecs;
-	struct iov_iter iter;
 	int ret, nr_vecs;
-	size_t offset;
 	bool copy;
 
 	if (bio_integrity(bio))
@@ -324,8 +323,7 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes)
 	else
 		direction = ITER_SOURCE;
 
-	iov_iter_ubuf(&iter, direction, ubuf, bytes);
-	nr_vecs = iov_iter_npages(&iter, BIO_MAX_VECS + 1);
+	nr_vecs = iov_iter_npages(iter, BIO_MAX_VECS + 1);
 	if (nr_vecs > BIO_MAX_VECS)
 		return -E2BIG;
 	if (nr_vecs > UIO_FASTIOV) {
@@ -335,8 +333,8 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes)
 		pages = NULL;
 	}
 
-	copy = !iov_iter_is_aligned(&iter, align, align);
-	ret = iov_iter_extract_pages(&iter, &pages, bytes, nr_vecs, 0, &offset);
+	copy = !iov_iter_is_aligned(iter, align, align);
+	ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset);
 	if (unlikely(ret < 0))
 		goto free_bvec;
 
diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index b180cac61a9d..4a29754f1bc2 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -115,8 +115,16 @@ EXPORT_SYMBOL(blk_rq_map_integrity_sg);
 int blk_rq_integrity_map_user(struct request *rq, void __user *ubuf,
 			      ssize_t bytes)
 {
-	int ret = bio_integrity_map_user(rq->bio, ubuf, bytes);
+	int ret;
+	struct iov_iter iter;
+	unsigned int direction;
 
+	if (op_is_write(req_op(rq)))
+		direction = ITER_DEST;
+	else
+		direction = ITER_SOURCE;
+	iov_iter_ubuf(&iter, direction, ubuf, bytes);
+	ret = bio_integrity_map_user(rq->bio, &iter);
 	if (ret)
 		return ret;
 
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index 0f0cf10222e8..58ff9988433a 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -75,7 +75,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio, gfp_t gfp,
 		unsigned int nr);
 int bio_integrity_add_page(struct bio *bio, struct page *page, unsigned int len,
 		unsigned int offset);
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t len);
+int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter);
 void bio_integrity_unmap_user(struct bio *bio);
 bool bio_integrity_prep(struct bio *bio);
 void bio_integrity_advance(struct bio *bio, unsigned int bytes_done);
@@ -101,8 +101,7 @@ static inline void bioset_integrity_free(struct bio_set *bs)
 {
 }
 
-static inline int bio_integrity_map_user(struct bio *bio, void __user *ubuf,
-					 ssize_t len)
+static int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
 {
 	return -EINVAL;
 }
-- 
2.25.1



* [PATCH v10 04/10] fs, iov_iter: define meta io descriptor
       [not found]   ` <CGME20241125071457epcas5p498c0641542bed9057e23cfff9cfc5ff0@epcas5p4.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta, Kanchan Joshi

Add flags to describe checks for the integrity meta buffer. Also, introduce
a new 'uio_meta' structure that upper layers can use to pass the
meta/integrity information.
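
A small sketch of how a caller might pre-validate its check flags against
the valid set before submitting. The flag values are copied from the diff in
this patch. The helper io_integrity_flags_valid() is a hypothetical name and
is not added by the series.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined by this patch in include/uapi/linux/fs.h. */
#define IO_INTEGRITY_CHK_GUARD		(1U << 0) /* enforce guard check */
#define IO_INTEGRITY_CHK_REFTAG		(1U << 1) /* enforce ref check */
#define IO_INTEGRITY_CHK_APPTAG		(1U << 2) /* enforce app check */

#define IO_INTEGRITY_VALID_FLAGS (IO_INTEGRITY_CHK_GUARD | \
				  IO_INTEGRITY_CHK_REFTAG | \
				  IO_INTEGRITY_CHK_APPTAG)

/* Hypothetical helper: returns 1 if only known check flags are set. */
static int io_integrity_flags_valid(uint16_t flags)
{
	return (flags & ~IO_INTEGRITY_VALID_FLAGS) == 0;
}
```

An empty flag set passes this check, meaning no tag checks are requested.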

Signed-off-by: Kanchan Joshi <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
 include/linux/uio.h     | 9 +++++++++
 include/uapi/linux/fs.h | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 853f9de5aa05..8ada84e85447 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -82,6 +82,15 @@ struct iov_iter {
 	};
 };
 
+typedef __u16 uio_meta_flags_t;
+
+struct uio_meta {
+	uio_meta_flags_t	flags;
+	u16			app_tag;
+	u64			seed;
+	struct iov_iter		iter;
+};
+
 static inline const struct iovec *iter_iov(const struct iov_iter *iter)
 {
 	if (iter->iter_type == ITER_UBUF)
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..9070ef19f0a3 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -40,6 +40,15 @@
 #define BLOCK_SIZE_BITS 10
 #define BLOCK_SIZE (1<<BLOCK_SIZE_BITS)
 
+/* flags for integrity meta */
+#define IO_INTEGRITY_CHK_GUARD		(1U << 0) /* enforce guard check */
+#define IO_INTEGRITY_CHK_REFTAG		(1U << 1) /* enforce ref check */
+#define IO_INTEGRITY_CHK_APPTAG		(1U << 2) /* enforce app check */
+
+#define IO_INTEGRITY_VALID_FLAGS (IO_INTEGRITY_CHK_GUARD | \
+				  IO_INTEGRITY_CHK_REFTAG | \
+				  IO_INTEGRITY_CHK_APPTAG)
+
 #define SEEK_SET	0	/* seek relative to beginning of file */
 #define SEEK_CUR	1	/* seek relative to current file position */
 #define SEEK_END	2	/* seek relative to end of file */
-- 
2.25.1



* [PATCH v10 05/10] fs: introduce IOCB_HAS_METADATA for metadata
       [not found]   ` <CGME20241125071459epcas5p3f603d511a03c790476cce37505e61a0b@epcas5p3.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta

Introduce an IOCB_HAS_METADATA flag for the kiocb struct, for handling
requests containing a meta payload.

Signed-off-by: Anuj Gupta <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
 include/linux/fs.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7e29433c5ecc..2cc3d45da7b0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -348,6 +348,7 @@ struct readahead_control;
 #define IOCB_DIO_CALLER_COMP	(1 << 22)
 /* kiocb is a read or write operation submitted by fs/aio.c. */
 #define IOCB_AIO_RW		(1 << 23)
+#define IOCB_HAS_METADATA	(1 << 24)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
-- 
2.25.1



* [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
       [not found]   ` <CGME20241125071502epcas5p46c373574219a958b565f20732797893f@epcas5p4.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  2024-11-25 14:58       ` Pavel Begunkov
  2024-11-26 13:01       ` Pavel Begunkov
  0 siblings, 2 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta, Kanchan Joshi

Add the ability to pass additional attributes along with read/write.
An application can populate the attribute type and attribute-specific
information in 'struct io_uring_attr' and pass its address using the SQE
field:
	__u64	attr_ptr;

It must also set a mask indicating which attributes are being passed:
	__u64	attr_type_mask;

Up to 64 attributes are allowed; currently only one attribute,
'ATTR_TYPE_PI', is supported.

With the PI attribute, userspace can pass the following information:
- flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG}
- len: length of PI/metadata buffer
- addr: address of metadata buffer
- seed: seed value for reftag remapping
- app_tag: application defined 16b value

Process this information to prepare a uio_meta descriptor and pass it down
using kiocb->private.

The PI attribute is supported only for direct IO.
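
A minimal user-space sketch of filling the attribute for one request. The
two attribute structs are local mirrors of the definitions in this patch,
written with fixed-width types. struct sqe_attr_fields and prep_pi_attr()
are hypothetical names that stand in for the two new SQE fields and for
application code, so the sketch does not depend on updated uapi headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the definitions proposed in this patch. */
enum io_uring_attr_type { ATTR_TYPE_PI, ATTR_TYPE_LAST = 64 };
#define ATTR_FLAG_PI	(1U << ATTR_TYPE_PI)

struct io_uring_attr_pi {
	uint16_t flags;
	uint16_t app_tag;
	uint32_t len;
	uint64_t addr;
	uint64_t seed;
	uint64_t rsvd;
};

struct io_uring_attr {
	enum io_uring_attr_type attr_type;
	struct io_uring_attr_pi pi;
};

/* Hypothetical stand-in for the two SQE fields added by the patch. */
struct sqe_attr_fields {
	uint64_t attr_ptr;
	uint64_t attr_type_mask;
};

/* Hypothetical helper: describe one PI buffer and point the SQE at it. */
static void prep_pi_attr(struct sqe_attr_fields *sqe, struct io_uring_attr *attr,
			 void *meta_buf, uint32_t meta_len, uint16_t flags,
			 uint16_t app_tag, uint64_t seed)
{
	assert(sqe && attr && meta_buf);
	/* Zero everything first: a non-zero rsvd is rejected with -EINVAL. */
	memset(attr, 0, sizeof(*attr));
	attr->attr_type = ATTR_TYPE_PI;
	attr->pi.flags = flags;
	attr->pi.app_tag = app_tag;
	attr->pi.len = meta_len;
	attr->pi.addr = (uint64_t)(uintptr_t)meta_buf;
	attr->pi.seed = seed;
	sqe->attr_ptr = (uint64_t)(uintptr_t)attr;
	sqe->attr_type_mask = ATTR_FLAG_PI;
}
```

Since the diff below copies the attribute from user memory in the prep
path, the struct passed via attr_ptr presumably has to stay valid at least
until the submission call returns.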

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
---
 include/uapi/linux/io_uring.h | 31 +++++++++++++
 io_uring/io_uring.c           |  2 +
 io_uring/rw.c                 | 82 ++++++++++++++++++++++++++++++++++-
 io_uring/rw.h                 | 14 +++++-
 4 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index aac9a4f8fa9a..bf28d49583ad 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -98,6 +98,10 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		struct {
+			__u64	attr_ptr; /* pointer to attribute information */
+			__u64	attr_type_mask; /* bit mask of attributes */
+		};
 		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
@@ -107,6 +111,33 @@ struct io_uring_sqe {
 	};
 };
 
+
+/* Attributes to be passed with read/write */
+enum io_uring_attr_type {
+	ATTR_TYPE_PI,
+	/* max supported attributes */
+	ATTR_TYPE_LAST = 64,
+};
+
+/* sqe->attr_type_mask flags */
+#define ATTR_FLAG_PI	(1U << ATTR_TYPE_PI)
+/* PI attribute information */
+struct io_uring_attr_pi {
+		__u16	flags;
+		__u16	app_tag;
+		__u32	len;
+		__u64	addr;
+		__u64	seed;
+		__u64	rsvd;
+};
+
+/* attribute information along with type */
+struct io_uring_attr {
+	enum io_uring_attr_type	attr_type;
+	/* type specific struct here */
+	struct io_uring_attr_pi	pi;
+};
+
 /*
  * If sqe->file_index is set to this for opcodes that instantiate a new
  * direct descriptor (like openat/openat2/accept), then io_uring will allocate
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c3a7d0197636..02291ea679fb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3889,6 +3889,8 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
 	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
+	BUILD_BUG_SQE_ELEM(48, __u64, attr_ptr);
+	BUILD_BUG_SQE_ELEM(56, __u64, attr_type_mask);
 	BUILD_BUG_SQE_ELEM(56, __u64,  __pad2);
 
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 0bcb83e4ce3c..71bfb74fef96 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -257,11 +257,54 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
 	return 0;
 }
 
+static inline void io_meta_save_state(struct io_async_rw *io)
+{
+	io->meta_state.seed = io->meta.seed;
+	iov_iter_save_state(&io->meta.iter, &io->meta_state.iter_meta);
+}
+
+static inline void io_meta_restore(struct io_async_rw *io)
+{
+	io->meta.seed = io->meta_state.seed;
+	iov_iter_restore(&io->meta.iter, &io->meta_state.iter_meta);
+}
+
+static int io_prep_rw_pi(struct io_kiocb *req, struct io_rw *rw, int ddir,
+			 u64 attr_ptr, u64 attr_type_mask)
+{
+	struct io_uring_attr pi_attr;
+	struct io_async_rw *io;
+	int ret;
+
+	if (copy_from_user(&pi_attr, u64_to_user_ptr(attr_ptr),
+	    sizeof(pi_attr)))
+		return -EFAULT;
+
+	if (pi_attr.attr_type != ATTR_TYPE_PI)
+		return -EINVAL;
+
+	if (pi_attr.pi.rsvd)
+		return -EINVAL;
+
+	io = req->async_data;
+	io->meta.flags = pi_attr.pi.flags;
+	io->meta.app_tag = pi_attr.pi.app_tag;
+	io->meta.seed = READ_ONCE(pi_attr.pi.seed);
+	ret = import_ubuf(ddir, u64_to_user_ptr(pi_attr.pi.addr),
+			  pi_attr.pi.len, &io->meta.iter);
+	if (unlikely(ret < 0))
+		return ret;
+	rw->kiocb.ki_flags |= IOCB_HAS_METADATA;
+	io_meta_save_state(io);
+	return ret;
+}
+
 static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		      int ddir, bool do_import)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 	unsigned ioprio;
+	u64 attr_type_mask;
 	int ret;
 
 	rw->kiocb.ki_pos = READ_ONCE(sqe->off);
@@ -279,11 +322,27 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
+	rw->kiocb.ki_flags = 0;
 
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);
-	return io_prep_rw_setup(req, ddir, do_import);
+	ret = io_prep_rw_setup(req, ddir, do_import);
+
+	if (unlikely(ret))
+		return ret;
+
+	attr_type_mask = READ_ONCE(sqe->attr_type_mask);
+	if (attr_type_mask) {
+		u64 attr_ptr;
+
+		if (attr_type_mask != ATTR_FLAG_PI)
+			return -EINVAL;
+
+		attr_ptr = READ_ONCE(sqe->attr_ptr);
+		ret = io_prep_rw_pi(req, rw, ddir, attr_ptr, attr_type_mask);
+	}
+	return ret;
 }
 
 int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -409,7 +468,10 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
 static void io_resubmit_prep(struct io_kiocb *req)
 {
 	struct io_async_rw *io = req->async_data;
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 
+	if (rw->kiocb.ki_flags & IOCB_HAS_METADATA)
+		io_meta_restore(io);
 	iov_iter_restore(&io->iter, &io->iter_state);
 }
 
@@ -794,7 +856,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 	if (!(req->flags & REQ_F_FIXED_FILE))
 		req->flags |= io_file_get_flags(file);
 
-	kiocb->ki_flags = file->f_iocb_flags;
+	kiocb->ki_flags |= file->f_iocb_flags;
 	ret = kiocb_set_rw_flags(kiocb, rw->flags, rw_type);
 	if (unlikely(ret))
 		return ret;
@@ -828,6 +890,18 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 		kiocb->ki_complete = io_complete_rw;
 	}
 
+	if (kiocb->ki_flags & IOCB_HAS_METADATA) {
+		struct io_async_rw *io = req->async_data;
+
+		/*
+		 * We have a union of meta fields with wpq used for buffered-io
+		 * in io_async_rw, so fail it here.
+		 */
+		if (!(req->file->f_flags & O_DIRECT))
+			return -EOPNOTSUPP;
+		kiocb->private = &io->meta;
+	}
+
 	return 0;
 }
 
@@ -902,6 +976,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	 * manually if we need to.
 	 */
 	iov_iter_restore(&io->iter, &io->iter_state);
+	if (kiocb->ki_flags & IOCB_HAS_METADATA)
+		io_meta_restore(io);
 
 	do {
 		/*
@@ -1125,6 +1201,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	} else {
 ret_eagain:
 		iov_iter_restore(&io->iter, &io->iter_state);
+		if (kiocb->ki_flags & IOCB_HAS_METADATA)
+			io_meta_restore(io);
 		if (kiocb->ki_flags & IOCB_WRITE)
 			io_req_end_write(req);
 		return -EAGAIN;
diff --git a/io_uring/rw.h b/io_uring/rw.h
index 3f432dc75441..2d7656bd268d 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -2,6 +2,11 @@
 
 #include <linux/pagemap.h>
 
+struct io_meta_state {
+	u32			seed;
+	struct iov_iter_state	iter_meta;
+};
+
 struct io_async_rw {
 	size_t				bytes_done;
 	struct iov_iter			iter;
@@ -9,7 +14,14 @@ struct io_async_rw {
 	struct iovec			fast_iov;
 	struct iovec			*free_iovec;
 	int				free_iov_nr;
-	struct wait_page_queue		wpq;
+	/* wpq is for buffered io, while meta fields are used with direct io */
+	union {
+		struct wait_page_queue		wpq;
+		struct {
+			struct uio_meta			meta;
+			struct io_meta_state		meta_state;
+		};
+	};
 };
 
 int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v10 07/10] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags
       [not found]   ` <CGME20241125071505epcas5p34469830c74b82603c57cb4122d0850f7@epcas5p3.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC (permalink / raw)
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta, Kanchan Joshi

This patch introduces BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags which
indicate how the hardware should check the integrity payload.
BIP_CHECK_GUARD/REFTAG are conversions of existing semantics, while
BIP_CHECK_APPTAG is a new flag. The driver can now just rely on block
layer flags, and doesn't need to know the integrity source. The submitter
of PI decides which tags to check. This would also give us a unified
interface for user and kernel generated integrity.

Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
---
 block/bio-integrity.c         |  5 +++++
 drivers/nvme/host/core.c      | 11 +++--------
 include/linux/bio-integrity.h |  6 +++++-
 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index f56d01cec689..3bee43b87001 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -434,6 +434,11 @@ bool bio_integrity_prep(struct bio *bio)
 	if (bi->csum_type == BLK_INTEGRITY_CSUM_IP)
 		bip->bip_flags |= BIP_IP_CHECKSUM;
 
+	/* describe what tags to check in payload */
+	if (bi->csum_type)
+		bip->bip_flags |= BIP_CHECK_GUARD;
+	if (bi->flags & BLK_INTEGRITY_REF_TAG)
+		bip->bip_flags |= BIP_CHECK_REFTAG;
 	if (bio_integrity_add_page(bio, virt_to_page(buf), len,
 			offset_in_page(buf)) < len) {
 		printk(KERN_ERR "could not attach integrity payload\n");
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 40e7be3b0339..e4e3653c27fb 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1017,18 +1017,13 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 			control |= NVME_RW_PRINFO_PRACT;
 		}
 
-		switch (ns->head->pi_type) {
-		case NVME_NS_DPS_PI_TYPE3:
+		if (bio_integrity_flagged(req->bio, BIP_CHECK_GUARD))
 			control |= NVME_RW_PRINFO_PRCHK_GUARD;
-			break;
-		case NVME_NS_DPS_PI_TYPE1:
-		case NVME_NS_DPS_PI_TYPE2:
-			control |= NVME_RW_PRINFO_PRCHK_GUARD |
-					NVME_RW_PRINFO_PRCHK_REF;
+		if (bio_integrity_flagged(req->bio, BIP_CHECK_REFTAG)) {
+			control |= NVME_RW_PRINFO_PRCHK_REF;
 			if (op == nvme_cmd_zone_append)
 				control |= NVME_RW_APPEND_PIREMAP;
 			nvme_set_ref_tag(ns, cmnd, req);
-			break;
 		}
 	}
 
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index 58ff9988433a..fe2bfe122db2 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -11,6 +11,9 @@ enum bip_flags {
 	BIP_DISK_NOCHECK	= 1 << 3, /* disable disk integrity checking */
 	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
 	BIP_COPY_USER		= 1 << 5, /* Kernel bounce buffer in use */
+	BIP_CHECK_GUARD		= 1 << 6, /* guard check */
+	BIP_CHECK_REFTAG	= 1 << 7, /* reftag check */
+	BIP_CHECK_APPTAG	= 1 << 8, /* apptag check */
 };
 
 struct bio_integrity_payload {
@@ -31,7 +34,8 @@ struct bio_integrity_payload {
 };
 
 #define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \
-			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM)
+			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM | \
+			 BIP_CHECK_GUARD | BIP_CHECK_REFTAG | BIP_CHECK_APPTAG)
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v10 08/10] nvme: add support for passing on the application tag
       [not found]   ` <CGME20241125071507epcas5p3b898d0960fb411cd176aea29029d820a@epcas5p3.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC (permalink / raw)
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Kanchan Joshi, Anuj Gupta

From: Kanchan Joshi <[email protected]>

With a user integrity buffer, there is a way to specify the app_tag.
Set the corresponding protocol specific flags and send the app_tag down.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
---
 drivers/nvme/host/core.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e4e3653c27fb..571d4106d256 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -885,6 +885,12 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	return BLK_STS_OK;
 }
 
+static void nvme_set_app_tag(struct request *req, struct nvme_command *cmnd)
+{
+	cmnd->rw.lbat = cpu_to_le16(bio_integrity(req->bio)->app_tag);
+	cmnd->rw.lbatm = cpu_to_le16(0xffff);
+}
+
 static void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
 			      struct request *req)
 {
@@ -1025,6 +1031,10 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 				control |= NVME_RW_APPEND_PIREMAP;
 			nvme_set_ref_tag(ns, cmnd, req);
 		}
+		if (bio_integrity_flagged(req->bio, BIP_CHECK_APPTAG)) {
+			control |= NVME_RW_PRINFO_PRCHK_APP;
+			nvme_set_app_tag(req, cmnd);
+		}
 	}
 
 	cmnd->rw.control = cpu_to_le16(control);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v10 09/10] scsi: add support for user-meta interface
       [not found]   ` <CGME20241125071510epcas5p47a424c419577f1e5c09375ce39a880c3@epcas5p4.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC (permalink / raw)
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Anuj Gupta

Add support for sending a user-meta buffer. Set tags to be checked
using flags specified by user/block-layer.
With this change, BIP_CTRL_NOCHECK becomes unused. Remove it.

Signed-off-by: Anuj Gupta <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
 drivers/scsi/sd.c             |  4 ++--
 include/linux/bio-integrity.h | 16 +++++++---------
 2 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8947dab132d7..cb7ac8736b91 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -814,14 +814,14 @@ static unsigned char sd_setup_protect_cmnd(struct scsi_cmnd *scmd,
 		if (bio_integrity_flagged(bio, BIP_IP_CHECKSUM))
 			scmd->prot_flags |= SCSI_PROT_IP_CHECKSUM;
 
-		if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false)
+		if (bio_integrity_flagged(bio, BIP_CHECK_GUARD))
 			scmd->prot_flags |= SCSI_PROT_GUARD_CHECK;
 	}
 
 	if (dif != T10_PI_TYPE3_PROTECTION) {	/* DIX/DIF Type 0, 1, 2 */
 		scmd->prot_flags |= SCSI_PROT_REF_INCREMENT;
 
-		if (bio_integrity_flagged(bio, BIP_CTRL_NOCHECK) == false)
+		if (bio_integrity_flagged(bio, BIP_CHECK_REFTAG))
 			scmd->prot_flags |= SCSI_PROT_REF_CHECK;
 	}
 
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index fe2bfe122db2..2195bc06dcde 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -7,13 +7,12 @@
 enum bip_flags {
 	BIP_BLOCK_INTEGRITY	= 1 << 0, /* block layer owns integrity data */
 	BIP_MAPPED_INTEGRITY	= 1 << 1, /* ref tag has been remapped */
-	BIP_CTRL_NOCHECK	= 1 << 2, /* disable HBA integrity checking */
-	BIP_DISK_NOCHECK	= 1 << 3, /* disable disk integrity checking */
-	BIP_IP_CHECKSUM		= 1 << 4, /* IP checksum */
-	BIP_COPY_USER		= 1 << 5, /* Kernel bounce buffer in use */
-	BIP_CHECK_GUARD		= 1 << 6, /* guard check */
-	BIP_CHECK_REFTAG	= 1 << 7, /* reftag check */
-	BIP_CHECK_APPTAG	= 1 << 8, /* apptag check */
+	BIP_DISK_NOCHECK	= 1 << 2, /* disable disk integrity checking */
+	BIP_IP_CHECKSUM		= 1 << 3, /* IP checksum */
+	BIP_COPY_USER		= 1 << 4, /* Kernel bounce buffer in use */
+	BIP_CHECK_GUARD		= 1 << 5, /* guard check */
+	BIP_CHECK_REFTAG	= 1 << 6, /* reftag check */
+	BIP_CHECK_APPTAG	= 1 << 7, /* apptag check */
 };
 
 struct bio_integrity_payload {
@@ -33,8 +32,7 @@ struct bio_integrity_payload {
 	struct bio_vec		bip_inline_vecs[];/* embedded bvec array */
 };
 
-#define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_CTRL_NOCHECK | \
-			 BIP_DISK_NOCHECK | BIP_IP_CHECKSUM | \
+#define BIP_CLONE_FLAGS (BIP_MAPPED_INTEGRITY | BIP_IP_CHECKSUM | \
 			 BIP_CHECK_GUARD | BIP_CHECK_REFTAG | BIP_CHECK_APPTAG)
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v10 10/10] block: add support to pass user meta buffer
       [not found]   ` <CGME20241125071513epcas5p28b1c27bc43262eb575d576e32f8e3d7b@epcas5p2.samsung.com>
@ 2024-11-25  7:06     ` Anuj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Anuj Gupta @ 2024-11-25  7:06 UTC (permalink / raw)
  To: axboe, hch, kbusch, martin.petersen, asml.silence, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Kanchan Joshi, Anuj Gupta

From: Kanchan Joshi <[email protected]>

If an iocb contains metadata, extract that and prepare the bip.
Based on flags specified by the user, set corresponding guard/app/ref
tags to be checked in bip.

Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Anuj Gupta <[email protected]>
Signed-off-by: Kanchan Joshi <[email protected]>
Reviewed-by: Keith Busch <[email protected]>
---
 block/bio-integrity.c         | 50 +++++++++++++++++++++++++++++++++++
 block/fops.c                  | 45 ++++++++++++++++++++++++-------
 include/linux/bio-integrity.h |  7 +++++
 3 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 3bee43b87001..5d81ad9a3d20 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -364,6 +364,55 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
 	return ret;
 }
 
+static void bio_uio_meta_to_bip(struct bio *bio, struct uio_meta *meta)
+{
+	struct bio_integrity_payload *bip = bio_integrity(bio);
+
+	if (meta->flags & IO_INTEGRITY_CHK_GUARD)
+		bip->bip_flags |= BIP_CHECK_GUARD;
+	if (meta->flags & IO_INTEGRITY_CHK_APPTAG)
+		bip->bip_flags |= BIP_CHECK_APPTAG;
+	if (meta->flags & IO_INTEGRITY_CHK_REFTAG)
+		bip->bip_flags |= BIP_CHECK_REFTAG;
+
+	bip->app_tag = meta->app_tag;
+}
+
+int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta)
+{
+	struct blk_integrity *bi = blk_get_integrity(bio->bi_bdev->bd_disk);
+	unsigned int integrity_bytes;
+	int ret;
+	struct iov_iter it;
+
+	if (!bi)
+		return -EINVAL;
+	/*
+	 * original meta iterator can be bigger.
+	 * process integrity info corresponding to current data buffer only.
+	 */
+	it = meta->iter;
+	integrity_bytes = bio_integrity_bytes(bi, bio_sectors(bio));
+	if (it.count < integrity_bytes)
+		return -EINVAL;
+
+	/* should fit into two bytes */
+	BUILD_BUG_ON(IO_INTEGRITY_VALID_FLAGS >= (1 << 16));
+
+	if (meta->flags && (meta->flags & ~IO_INTEGRITY_VALID_FLAGS))
+		return -EINVAL;
+
+	it.count = integrity_bytes;
+	ret = bio_integrity_map_user(bio, &it);
+	if (!ret) {
+		bio_uio_meta_to_bip(bio, meta);
+		bip_set_seed(bio_integrity(bio), meta->seed);
+		iov_iter_advance(&meta->iter, integrity_bytes);
+		meta->seed += bio_integrity_intervals(bi, bio_sectors(bio));
+	}
+	return ret;
+}
+
 /**
  * bio_integrity_prep - Prepare bio for integrity I/O
  * @bio:	bio to prepare
@@ -564,6 +613,7 @@ int bio_integrity_clone(struct bio *bio, struct bio *bio_src,
 	bip->bip_vec = bip_src->bip_vec;
 	bip->bip_iter = bip_src->bip_iter;
 	bip->bip_flags = bip_src->bip_flags & BIP_CLONE_FLAGS;
+	bip->app_tag = bip_src->app_tag;
 
 	return 0;
 }
diff --git a/block/fops.c b/block/fops.c
index 2d01c9007681..412ae74032ad 100644
--- a/block/fops.c
+++ b/block/fops.c
@@ -54,6 +54,7 @@ static ssize_t __blkdev_direct_IO_simple(struct kiocb *iocb,
 	struct bio bio;
 	ssize_t ret;
 
+	WARN_ON_ONCE(iocb->ki_flags & IOCB_HAS_METADATA);
 	if (nr_pages <= DIO_INLINE_BIO_VECS)
 		vecs = inline_vecs;
 	else {
@@ -124,12 +125,16 @@ static void blkdev_bio_end_io(struct bio *bio)
 {
 	struct blkdev_dio *dio = bio->bi_private;
 	bool should_dirty = dio->flags & DIO_SHOULD_DIRTY;
+	bool is_sync = dio->flags & DIO_IS_SYNC;
 
 	if (bio->bi_status && !dio->bio.bi_status)
 		dio->bio.bi_status = bio->bi_status;
 
+	if (!is_sync && (dio->iocb->ki_flags & IOCB_HAS_METADATA))
+		bio_integrity_unmap_user(bio);
+
 	if (atomic_dec_and_test(&dio->ref)) {
-		if (!(dio->flags & DIO_IS_SYNC)) {
+		if (!is_sync) {
 			struct kiocb *iocb = dio->iocb;
 			ssize_t ret;
 
@@ -221,14 +226,16 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 			 * a retry of this from blocking context.
 			 */
 			if (unlikely(iov_iter_count(iter))) {
-				bio_release_pages(bio, false);
-				bio_clear_flag(bio, BIO_REFFED);
-				bio_put(bio);
-				blk_finish_plug(&plug);
-				return -EAGAIN;
+				ret = -EAGAIN;
+				goto fail;
 			}
 			bio->bi_opf |= REQ_NOWAIT;
 		}
+		if (!is_sync && (iocb->ki_flags & IOCB_HAS_METADATA)) {
+			ret = bio_integrity_map_iter(bio, iocb->private);
+			if (unlikely(ret))
+				goto fail;
+		}
 
 		if (is_read) {
 			if (dio->flags & DIO_SHOULD_DIRTY)
@@ -269,6 +276,12 @@ static ssize_t __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter,
 
 	bio_put(&dio->bio);
 	return ret;
+fail:
+	bio_release_pages(bio, false);
+	bio_clear_flag(bio, BIO_REFFED);
+	bio_put(bio);
+	blk_finish_plug(&plug);
+	return ret;
 }
 
 static void blkdev_bio_end_io_async(struct bio *bio)
@@ -286,6 +299,9 @@ static void blkdev_bio_end_io_async(struct bio *bio)
 		ret = blk_status_to_errno(bio->bi_status);
 	}
 
+	if (iocb->ki_flags & IOCB_HAS_METADATA)
+		bio_integrity_unmap_user(bio);
+
 	iocb->ki_complete(iocb, ret);
 
 	if (dio->flags & DIO_SHOULD_DIRTY) {
@@ -330,10 +346,8 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		bio_iov_bvec_set(bio, iter);
 	} else {
 		ret = bio_iov_iter_get_pages(bio, iter);
-		if (unlikely(ret)) {
-			bio_put(bio);
-			return ret;
-		}
+		if (unlikely(ret))
+			goto out_bio_put;
 	}
 	dio->size = bio->bi_iter.bi_size;
 
@@ -346,6 +360,13 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		task_io_account_write(bio->bi_iter.bi_size);
 	}
 
+	if (iocb->ki_flags & IOCB_HAS_METADATA) {
+		ret = bio_integrity_map_iter(bio, iocb->private);
+		WRITE_ONCE(iocb->private, NULL);
+		if (unlikely(ret))
+			goto out_bio_put;
+	}
+
 	if (iocb->ki_flags & IOCB_ATOMIC)
 		bio->bi_opf |= REQ_ATOMIC;
 
@@ -360,6 +381,10 @@ static ssize_t __blkdev_direct_IO_async(struct kiocb *iocb,
 		submit_bio(bio);
 	}
 	return -EIOCBQUEUED;
+
+out_bio_put:
+	bio_put(bio);
+	return ret;
 }
 
 static ssize_t blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
diff --git a/include/linux/bio-integrity.h b/include/linux/bio-integrity.h
index 2195bc06dcde..de0a6c9de4d1 100644
--- a/include/linux/bio-integrity.h
+++ b/include/linux/bio-integrity.h
@@ -23,6 +23,7 @@ struct bio_integrity_payload {
 	unsigned short		bip_vcnt;	/* # of integrity bio_vecs */
 	unsigned short		bip_max_vcnt;	/* integrity bio_vec slots */
 	unsigned short		bip_flags;	/* control flags */
+	u16			app_tag;	/* application tag value */
 
 	struct bvec_iter	bio_iter;	/* for rewinding parent bio */
 
@@ -78,6 +79,7 @@ struct bio_integrity_payload *bio_integrity_alloc(struct bio *bio, gfp_t gfp,
 int bio_integrity_add_page(struct bio *bio, struct page *page, unsigned int len,
 		unsigned int offset);
 int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter);
+int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta);
 void bio_integrity_unmap_user(struct bio *bio);
 bool bio_integrity_prep(struct bio *bio);
 void bio_integrity_advance(struct bio *bio, unsigned int bytes_done);
@@ -108,6 +110,11 @@ static int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
 	return -EINVAL;
 }
 
+static inline int bio_integrity_map_iter(struct bio *bio, struct uio_meta *meta)
+{
+	return -EINVAL;
+}
+
 static inline void bio_integrity_unmap_user(struct bio *bio)
 {
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-25  7:06     ` [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support Anuj Gupta
@ 2024-11-25 14:58       ` Pavel Begunkov
  2024-11-26 10:40         ` Anuj Gupta
  2024-11-26 13:01       ` Pavel Begunkov
  1 sibling, 1 reply; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-25 14:58 UTC (permalink / raw)
  To: Anuj Gupta, axboe, hch, kbusch, martin.petersen, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Kanchan Joshi

On 11/25/24 07:06, Anuj Gupta wrote:
> Add the ability to pass additional attributes along with read/write.
> Application can populate attribute type and attribute specific information
> in 'struct io_uring_attr' and pass its address using the SQE field:
> 	__u64	attr_ptr;
> 
> Along with setting a mask indicating attributes being passed:
> 	__u64	attr_type_mask;
> 
> Overall 64 attributes are allowed and currently one attribute
> 'ATTR_TYPE_PI' is supported.
> 
> With PI attribute, userspace can pass following information:
> - flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG}
> - len: length of PI/metadata buffer
> - addr: address of metadata buffer
> - seed: seed value for reftag remapping
> - app_tag: application defined 16b value

The API and io_uring parts look good. I'll ask you to address the
ATTR_TYPE comment below; the rest are nits that can be
ignored and/or delayed.

> Process this information to prepare uio_meta_descriptor and pass it down
> using kiocb->private.

I'm not sure using ->private is a good thing, but I assume it
was discussed, so I'll leave it to Jens and other folks.


> PI attribute is supported only for direct IO.
> 
> Signed-off-by: Anuj Gupta <[email protected]>
> Signed-off-by: Kanchan Joshi <[email protected]>
> ---
>   include/uapi/linux/io_uring.h | 31 +++++++++++++
>   io_uring/io_uring.c           |  2 +
>   io_uring/rw.c                 | 82 ++++++++++++++++++++++++++++++++++-
>   io_uring/rw.h                 | 14 +++++-
>   4 files changed, 126 insertions(+), 3 deletions(-)
> 
> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> index aac9a4f8fa9a..bf28d49583ad 100644
> --- a/include/uapi/linux/io_uring.h
> +++ b/include/uapi/linux/io_uring.h
> @@ -98,6 +98,10 @@ struct io_uring_sqe {
>   			__u64	addr3;
>   			__u64	__pad2[1];
>   		};
> +		struct {
> +			__u64	attr_ptr; /* pointer to attribute information */
> +			__u64	attr_type_mask; /* bit mask of attributes */
> +		};
>   		__u64	optval;
>   		/*
>   		 * If the ring is initialized with IORING_SETUP_SQE128, then
> @@ -107,6 +111,33 @@ struct io_uring_sqe {
>   	};
>   };
>   
> +
> +/* Attributes to be passed with read/write */
> +enum io_uring_attr_type {
> +	ATTR_TYPE_PI,
> +	/* max supported attributes */
> +	ATTR_TYPE_LAST = 64,

ATTR_TYPE sounds too generic, too easy to get a symbol collision
including with user space code.

Some options: IORING_ATTR_TYPE_PI, IORING_RW_ATTR_TYPE_PI.
If it's not supposed to be io_uring specific, it can be
IO_RW_ATTR_TYPE_PI

> +};
> +
> +/* sqe->attr_type_mask flags */
> +#define ATTR_FLAG_PI	(1U << ATTR_TYPE_PI)
> +/* PI attribute information */
> +struct io_uring_attr_pi {
> +		__u16	flags;
> +		__u16	app_tag;
> +		__u32	len;
> +		__u64	addr;
> +		__u64	seed;
> +		__u64	rsvd;
> +};
> +
> +/* attribute information along with type */
> +struct io_uring_attr {
> +	enum io_uring_attr_type	attr_type;

I'm not against it, but adding a type field to each attribute is not
strictly needed, you can already derive where each attr is placed purely
from the mask. Are there some upsides? But again I'm not against it.

> +	/* type specific struct here */
> +	struct io_uring_attr_pi	pi;
> +};
> +
>   /*
>    * If sqe->file_index is set to this for opcodes that instantiate a new
>    * direct descriptor (like openat/openat2/accept), then io_uring will allocate
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index c3a7d0197636..02291ea679fb 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3889,6 +3889,8 @@ static int __init io_uring_init(void)
>   	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
>   	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
>   	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
> +	BUILD_BUG_SQE_ELEM(48, __u64, attr_ptr);
> +	BUILD_BUG_SQE_ELEM(56, __u64, attr_type_mask);
>   	BUILD_BUG_SQE_ELEM(56, __u64,  __pad2);
>   
>   	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
> diff --git a/io_uring/rw.c b/io_uring/rw.c
> index 0bcb83e4ce3c..71bfb74fef96 100644
> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
> @@ -257,11 +257,54 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
>   	return 0;
>   }
...
> @@ -902,6 +976,8 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
>   	 * manually if we need to.
>   	 */
>   	iov_iter_restore(&io->iter, &io->iter_state);
> +	if (kiocb->ki_flags & IOCB_HAS_METADATA)
> +		io_meta_restore(io);

That can be turned into a helper, but that can be done as a follow up.

I'd also add an IOCB_HAS_METADATA check into or around
io_rw_should_retry(). You're relying on O_DIRECT checks, but that
sounds a bit fragile in the long run.
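
One possible shape for such a helper, shown as a user-space sketch rather
than kernel code. All demo_* names are hypothetical stand-ins for the real
iov_iter / io_async_rw state, and DEMO_HAS_METADATA stands in for
IOCB_HAS_METADATA; the point is only that every retry path calls one
function and metadata state is restored conditionally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_HAS_METADATA (1u << 0)	/* stand-in for IOCB_HAS_METADATA */

/* Simplified iterator: just enough state to show save/restore. */
struct demo_iter {
	size_t count;
};

/* Hypothetical request state holding live values and their snapshots. */
struct demo_rw {
	unsigned int flags;
	struct demo_iter iter, iter_saved;		/* data iterator */
	struct demo_iter meta_iter, meta_iter_saved;	/* metadata iterator */
	uint64_t seed, seed_saved;			/* reftag seed */
};

/* Single restore helper shared by all retry paths. */
static void demo_rw_restore(struct demo_rw *rw)
{
	assert(rw != NULL);
	rw->iter = rw->iter_saved;
	if (rw->flags & DEMO_HAS_METADATA) {
		rw->meta_iter = rw->meta_iter_saved;
		rw->seed = rw->seed_saved;
	}
}
```

With a helper like this, the three call sites in the patch would not need
to repeat the flag check themselves.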

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-25 14:58       ` Pavel Begunkov
@ 2024-11-26 10:40         ` Anuj Gupta
  2024-11-26 12:53           ` Pavel Begunkov
  0 siblings, 1 reply; 22+ messages in thread
From: Anuj Gupta @ 2024-11-26 10:40 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, martin.petersen, anuj1072538, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On Mon, Nov 25, 2024 at 02:58:19PM +0000, Pavel Begunkov wrote:
> On 11/25/24 07:06, Anuj Gupta wrote:
> 
> ATTR_TYPE sounds too generic, too easy to get a symbol collision
> including with user space code.
> 
> Some options: IORING_ATTR_TYPE_PI, IORING_RW_ATTR_TYPE_PI.
> If it's not supposed to be io_uring specific, it can be
> IO_RW_ATTR_TYPE_PI
> 

Sure, will change to a different name in the next iteration.

> > +
> > +/* attribute information along with type */
> > +struct io_uring_attr {
> > +	enum io_uring_attr_type	attr_type;
> 
> I'm not against it, but adding a type field to each attribute is not
> strictly needed, you can already derive where each attr is placed purely
> from the mask. Are there some upsides? But again I'm not against it.
> 

The mask indicates which attributes have been passed. But while
processing we would need to know where exactly the attributes have been
placed, as attributes are of different sizes (each attribute is of
fixed size though) and they could be placed in any order. Processing when
multiple attributes are passed would look something like this:

attr_ptr = READ_ONCE(sqe->attr_ptr);
attr_mask = READ_ONCE(sqe->attr_type_mask);
size = total_size_of_attributes_passed_from_attr_mask;

copy_from_user(attr_buf, attr_ptr, size);

while (size > 0) {
	if (sizeof(io_uring_attr_type) > size)
		break;

	attr_type = get_type(attr_buf);
	attr_size = get_size(attr_type);

	process_attr(attr_type, attr_buf);
	attr_buf += attr_size;
	size -= attr_size;
}

We cannot derive where exactly the attribute information is placed
purely from the mask, so we need the type field. Do you see it
differently?
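
The type-tagged walk described above can be sketched in user space as
follows. This is an illustration under assumptions, not the posted
implementation: the tagged_* names, the second attribute type X, the
8-byte record header, and the payload sizes are all hypothetical (only PI
exists in the patch). It shows why a per-record type field lets records
appear in any order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types; the patch defines only PI. */
enum tagged_attr_type { TAGGED_ATTR_PI, TAGGED_ATTR_X, TAGGED_ATTR_NR };

/* Assumed fixed payload size per type, in bytes. */
static const size_t tagged_attr_size[TAGGED_ATTR_NR] = {
	[TAGGED_ATTR_PI] = 32,
	[TAGGED_ATTR_X]  = 8,
};

/*
 * Walk a buffer of [u32 type][u32 pad][payload] records in any order.
 * Stores each payload offset in off[type] and returns the mask of types
 * seen, or 0 if the buffer is empty or malformed.
 */
static uint64_t tagged_walk(const unsigned char *buf, size_t size,
			    size_t off[TAGGED_ATTR_NR])
{
	uint64_t seen = 0;
	size_t pos = 0;

	assert(buf != NULL && off != NULL);
	while (pos < size) {
		uint32_t type;

		if (size - pos < 8)
			return 0;	/* truncated header */
		memcpy(&type, buf + pos, sizeof(type));
		pos += 8;
		if (type >= TAGGED_ATTR_NR || (seen & (1ULL << type)))
			return 0;	/* unknown or duplicate type */
		if (size - pos < tagged_attr_size[type])
			return 0;	/* truncated payload */
		off[type] = pos;
		seen |= 1ULL << type;
		pos += tagged_attr_size[type];
	}
	return seen;
}
```

The cost of this flexibility is that the parser has to validate every
header, including duplicates and truncation, before trusting a record.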


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 10:40         ` Anuj Gupta
@ 2024-11-26 12:53           ` Pavel Begunkov
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-26 12:53 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: axboe, hch, kbusch, martin.petersen, anuj1072538, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On 11/26/24 10:40, Anuj Gupta wrote:
> On Mon, Nov 25, 2024 at 02:58:19PM +0000, Pavel Begunkov wrote:
>> On 11/25/24 07:06, Anuj Gupta wrote:
>>
>> ATTR_TYPE sounds too generic, too easy to get a symbol collision
>> including with user space code.
>>
>> Some options: IORING_ATTR_TYPE_PI, IORING_RW_ATTR_TYPE_PI.
>> If it's not supposed to be io_uring specific, it can be
>> IO_RW_ATTR_TYPE_PI
>>
> 
> Sure, will change to a different name in the next iteration.
> 
>>> +
>>> +/* attribute information along with type */
>>> +struct io_uring_attr {
>>> +	enum io_uring_attr_type	attr_type;
>>
>> I'm not against it, but adding a type field to each attribute is not
>> strictly needed, you can already derive where each attr is placed purely
>> from the mask. Are there some upsides? But again I'm not against it.
>>
> 
> The mask indicates which attributes have been passed. But while
> processing we would need to know where exactly the attributes have been
> placed, as attributes are of different sizes (each attribute is of
> fixed size though) and they could be placed in any order. Processing when
> multiple attributes are passed would look something like this:
> 
> attr_ptr = READ_ONCE(sqe->attr_ptr);
> attr_mask = READ_ONCE(sqe->attr_type_mask);
> size = total_size_of_attributes_passed_from_attr_mask;
> 
> copy_from_user(attr_buf, attr_ptr, size);
> 
> while (size > 0) {
> 	if (sizeof(io_uring_attr_type) > size)
> 		break;
> 
> 	attr_type = get_type(attr_buf);
> 	attr_size = get_size(attr_type);
> 
> 	process_attr(attr_type, attr_buf);
> 	attr_buf += attr_size;
> 	size -= attr_size;
> }
> 
> We cannot derive where exactly the attribute information is placed
> purely from the mask, so we need the type field. Do you see it
> differently?

In the mask version I outlined attributes go in the array in order
of their types, max 1 attribute of each type, in which case the
mask fully describes the placement.

static const size_t attr_param_sizes[] = {[TYPE_PI] = sizeof(pi), ...};
mask = READ_ONCE(sqe->mask);
off = 0;

for (type = 0; type < NR_TYPE; type++) { // or find_next_bit trick
	if (!(mask & BIT(type)))
		continue;
	/* each present attribute starts where the previous one ended */
	process(type, base + off);
	off += attr_param_sizes[type];
}


Maybe it's a good idea to have a type field for double checking
though, and with it we don't have to commit to one version or
another yet.
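
A compilable user-space sketch of the mask-ordered layout, for
illustration only. The masked_* names, the extra attribute types X and Y,
and their sizes are hypothetical; the sketch assumes at most one attribute
per type, stored in ascending type order, so the mask alone determines
every offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types; the patch defines only PI. */
enum masked_attr_type {
	MASKED_ATTR_PI,
	MASKED_ATTR_X,
	MASKED_ATTR_Y,
	MASKED_ATTR_NR,
};

/* Assumed fixed payload size per type, in bytes. */
static const size_t masked_attr_size[MASKED_ATTR_NR] = {
	[MASKED_ATTR_PI] = 32,
	[MASKED_ATTR_X]  = 8,
	[MASKED_ATTR_Y]  = 16,
};

/*
 * Compute the offset of each attribute present in mask. Entries of off[]
 * for absent types are left untouched. Returns the total size of the
 * attribute array, or (size_t)-1 if an unknown bit is set.
 */
static size_t masked_layout(uint64_t mask, size_t off[MASKED_ATTR_NR])
{
	size_t pos = 0;
	unsigned int type;

	assert(off != NULL);
	if (mask >> MASKED_ATTR_NR)
		return (size_t)-1;	/* unknown attribute bit set */
	for (type = 0; type < MASKED_ATTR_NR; type++) {
		if (!(mask & (1ULL << type)))
			continue;
		off[type] = pos;
		pos += masked_attr_size[type];
	}
	return pos;
}
```

Because the total size falls out of the same loop, a single
copy_from_user() of that many bytes would be enough before processing.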

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-25  7:06     ` [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support Anuj Gupta
  2024-11-25 14:58       ` Pavel Begunkov
@ 2024-11-26 13:01       ` Pavel Begunkov
  2024-11-26 13:04         ` Pavel Begunkov
  2024-11-26 13:54         ` Anuj Gupta
  1 sibling, 2 replies; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-26 13:01 UTC (permalink / raw)
  To: Anuj Gupta, axboe, hch, kbusch, martin.petersen, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Kanchan Joshi

On 11/25/24 07:06, Anuj Gupta wrote:
...
> +/* sqe->attr_type_mask flags */
> +#define ATTR_FLAG_PI	(1U << ATTR_TYPE_PI)
> +/* PI attribute information */
> +struct io_uring_attr_pi {
> +		__u16	flags;
> +		__u16	app_tag;
> +		__u32	len;
> +		__u64	addr;
> +		__u64	seed;
> +		__u64	rsvd;
> +};
> +
> +/* attribute information along with type */
> +struct io_uring_attr {
> +	enum io_uring_attr_type	attr_type;

Hmm, I think there will be implicit padding, we need to deal
with it.

> +	/* type specific struct here */
> +	struct io_uring_attr_pi	pi;
> +};

This also looks PI specific but with a generic name. Or are
attribute structures supposed to be unionised?

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 13:01       ` Pavel Begunkov
@ 2024-11-26 13:04         ` Pavel Begunkov
  2024-11-26 13:54         ` Anuj Gupta
  1 sibling, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-26 13:04 UTC (permalink / raw)
  To: Anuj Gupta, axboe, hch, kbusch, martin.petersen, anuj1072538,
	brauner, jack, viro
  Cc: io-uring, linux-nvme, linux-block, gost.dev, linux-scsi, vishak.g,
	linux-fsdevel, Kanchan Joshi

On 11/26/24 13:01, Pavel Begunkov wrote:
> On 11/25/24 07:06, Anuj Gupta wrote:
> ...
>> +/* sqe->attr_type_mask flags */
>> +#define ATTR_FLAG_PI    (1U << ATTR_TYPE_PI)
>> +/* PI attribute information */
>> +struct io_uring_attr_pi {
>> +        __u16    flags;
>> +        __u16    app_tag;
>> +        __u32    len;
>> +        __u64    addr;
>> +        __u64    seed;
>> +        __u64    rsvd;
>> +};
>> +
>> +/* attribute information along with type */
>> +struct io_uring_attr {
>> +    enum io_uring_attr_type    attr_type;
> 
> Hmm, I think there will be implicit padding, we need to deal
> with it.

And it's better to be explicitly sized, e.g.
s/enum io_uring_attr_type/__u16/
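
To make the padding concern concrete, here is a small user-space sketch.
The demo_* names and the rsvd1/rsvd2 fields are hypothetical and are not
the layout the series settled on. It contrasts the posted enum-first layout,
whose padding depends on the ABI, with one where every byte before the
payload is a named, explicitly sized field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of the PI attribute layout from the patch, with stdint types. */
struct demo_attr_pi {
	uint16_t flags;
	uint16_t app_tag;
	uint32_t len;
	uint64_t addr;
	uint64_t seed;
	uint64_t rsvd;
};

enum demo_attr_type { DEMO_ATTR_PI };

/* Layout as posted: enum size and padding before pi are ABI dependent. */
struct demo_attr_implicit {
	enum demo_attr_type attr_type;
	struct demo_attr_pi pi;
};

/* Hypothetical alternative: explicit sizes and named reserved fields. */
struct demo_attr_explicit {
	uint16_t attr_type;
	uint16_t rsvd1;
	uint32_t rsvd2;
	struct demo_attr_pi pi;
};

_Static_assert(sizeof(struct demo_attr_pi) == 32, "PI payload is 32 bytes");
_Static_assert(offsetof(struct demo_attr_explicit, pi) == 8,
	       "payload starts right after the named header fields");

/* Bytes of compiler-inserted padding before pi in the posted layout. */
static size_t demo_hidden_pad_implicit(void)
{
	return offsetof(struct demo_attr_implicit, pi) -
	       sizeof(enum demo_attr_type);
}

/* Same measurement for the explicit layout. */
static size_t demo_hidden_pad_explicit(void)
{
	return offsetof(struct demo_attr_explicit, pi) -
	       (offsetof(struct demo_attr_explicit, rsvd2) + sizeof(uint32_t));
}
```

Named reserved fields can also be checked for zero on the kernel side,
which keeps them available for future use.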

>> +    /* type specific struct here */
>> +    struct io_uring_attr_pi    pi;
>> +};
> 
> This also looks PI specific but with a generic name. Or are
> attribute structures supposed to be unionised?
> 

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 13:01       ` Pavel Begunkov
  2024-11-26 13:04         ` Pavel Begunkov
@ 2024-11-26 13:54         ` Anuj Gupta
  2024-11-26 15:45           ` Pavel Begunkov
  1 sibling, 1 reply; 22+ messages in thread
From: Anuj Gupta @ 2024-11-26 13:54 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, martin.petersen, anuj1072538, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On Tue, Nov 26, 2024 at 01:01:03PM +0000, Pavel Begunkov wrote:
> On 11/25/24 07:06, Anuj Gupta wrote:
> ...
> > +	/* type specific struct here */
> > +	struct io_uring_attr_pi	pi;
> > +};
> 
> This also looks PI specific but with a generic name. Or are
> attribute structures supposed to be unionised?

Yes, attribute structures would be unionised here. This is done so that
"attr_type" always remains at the top. When there are multiple attributes
this structure would look something like this:

/* attribute information along with type */
struct io_uring_attr {
	enum io_uring_attr_type attr_type;
	/* type specific struct here */
	union {
		struct io_uring_attr_pi	pi;
		struct io_uring_attr_x	x;
		struct io_uring_attr_y	y;
	};
};

And then on the application side for sending attribute x, one would do:

struct io_uring_attr attr;
attr.attr_type = TYPE_X;
prepare_attr(&attr.x);


* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 13:54         ` Anuj Gupta
@ 2024-11-26 15:45           ` Pavel Begunkov
  2024-11-26 16:23             ` Anuj gupta
  2024-11-27  9:46             ` Anuj Gupta
  0 siblings, 2 replies; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-26 15:45 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: axboe, hch, kbusch, martin.petersen, anuj1072538, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On 11/26/24 13:54, Anuj Gupta wrote:
> On Tue, Nov 26, 2024 at 01:01:03PM +0000, Pavel Begunkov wrote:
>> On 11/25/24 07:06, Anuj Gupta wrote:
>> ...
>>> +	/* type specific struct here */
>>> +	struct io_uring_attr_pi	pi;
>>> +};
>>
>> This also looks PI specific but with a generic name. Or are
>> attribute structures supposed to be unionised?
> 
> Yes, attribute structures would be unionised here. This is done so that
> "attr_type" always remains at the top. When there are multiple attributes
> this structure would look something like this:
> 
> /* attribute information along with type */
> struct io_uring_attr {
> 	enum io_uring_attr_type attr_type;
> 	/* type specific struct here */
> 	union {
> 		struct io_uring_attr_pi	pi;
> 		struct io_uring_attr_x	x;
> 		struct io_uring_attr_y	y;
> 	};
> };
> 
> And then on the application side for sending attribute x, one would do:
> 
> struct io_uring_attr attr;
> attr.attr_type = TYPE_X;
> prepare_attr(&attr.x);

Hmm, I have doubts it's going to work well because the union
members have different sizes. Adding a new type could grow
struct io_uring_attr, which is already bad for uapi. And it
can't be stacked:

io_uring_attr attrs[2] = {..., ...}
sqe->attr_ptr = &attrs;
...

This example would be incorrect. Even if it's just one attribute,
the user would be wasting space on the stack. The only use for it I
see is having ephemeral pointers during parsing, à la

void parse(void *attributes, size_t offset) {
	struct io_uring_attr *attr = attributes + offset;

	if (attr->attr_type == ATTR_TYPE_PI) {
		process_pi(&attr->pi);
		// or potentially fill_pi() in userspace
	}
}

But I don't think it's worth it. I'd say, if you're leaving
the structure, let's rename it to struct io_uring_attr_type_pi
or something similar. We can always add a new one later, it
doesn't change the ABI.
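
A small sketch of the growth and stacking problem described above. The structures below are userspace stand-ins: attr_v1 models the union with only the PI member, attr_v2 models it after a larger member is added. struct attr_x and its 64-byte size are purely hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirror of struct io_uring_attr_pi from the patch (32 bytes). */
struct attr_pi {
	uint16_t flags;
	uint16_t app_tag;
	uint32_t len;
	uint64_t addr;
	uint64_t seed;
	uint64_t rsvd;
};

/* Hypothetical future attribute that is larger than PI. */
struct attr_x {
	uint64_t data[8];
};

/* Union-based container as it would look with PI only. */
struct attr_v1 {
	uint16_t attr_type;
	uint16_t pad[3];
	union {
		struct attr_pi pi;
	};
};

/* The same container after a larger member joins the union. */
struct attr_v2 {
	uint16_t attr_type;
	uint16_t pad[3];
	union {
		struct attr_pi pi;
		struct attr_x  x;
	};
};

/*
 * Offset of attrs[1] in an array of containers. It equals the
 * container size, so it moves whenever the union grows.
 */
static size_t second_elem_offset_v1(void)
{
	return sizeof(struct attr_v1);
}

static size_t second_elem_offset_v2(void)
{
	return sizeof(struct attr_v2);
}
```

A binary built against the v1 layout would place its second array element at byte 40, while a kernel built against v2 would look for it at byte 72. Every PI-only user also pays for the largest union member.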

-- 
Pavel Begunkov


* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 15:45           ` Pavel Begunkov
@ 2024-11-26 16:23             ` Anuj gupta
  2024-11-27 10:35               ` Pavel Begunkov
  2024-11-27  9:46             ` Anuj Gupta
  1 sibling, 1 reply; 22+ messages in thread
From: Anuj gupta @ 2024-11-26 16:23 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: Anuj Gupta, axboe, hch, kbusch, martin.petersen, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On Tue, Nov 26, 2024 at 9:14 PM Pavel Begunkov <[email protected]> wrote:
>
> On 11/26/24 13:54, Anuj Gupta wrote:
> > On Tue, Nov 26, 2024 at 01:01:03PM +0000, Pavel Begunkov wrote:
> >> On 11/25/24 07:06, Anuj Gupta wrote:
> >> ...
> >>> +   /* type specific struct here */
> >>> +   struct io_uring_attr_pi pi;
> >>> +};
> >>
> >> This also looks PI specific but with a generic name. Or are
> >> attribute structures supposed to be unionised?
> >
> > Yes, attribute structures would be unionised here. This is done so that
> > "attr_type" always remains at the top. When there are multiple attributes
> > this structure would look something like this:
> >
> > /* attribute information along with type */
> > struct io_uring_attr {
> >       enum io_uring_attr_type attr_type;
> >       /* type specific struct here */
> >       union {
> >               struct io_uring_attr_pi pi;
> >               struct io_uring_attr_x  x;
> >               struct io_uring_attr_y  y;
> >       };
> > };
> >
> > And then on the application side for sending attribute x, one would do:
> >
> > struct io_uring_attr attr;
> > attr.attr_type = TYPE_X;
> > prepare_attr(&attr.x);
>
> Hmm, I have doubts it's going to work well because the union
> members have different sizes. Adding a new type could grow
> struct io_uring_attr, which is already bad for uapi. And it
> can't be stacked:
>
> io_uring_attr attrs[2] = {..., ...}
> sqe->attr_ptr = &attrs;
> ...
>
> This example would be incorrect. Even if it's just one attribute,
> the user would be wasting space on the stack. The only use for it I
> see is having ephemeral pointers during parsing, à la
>
> void parse(void *attributes, size_t offset) {
>         struct io_uring_attr *attr = attributes + offset;
>
>         if (attr->attr_type == ATTR_TYPE_PI) {
>                 process_pi(&attr->pi);
>                 // or potentially fill_pi() in userspace
>         }
> }
>
> But I don't think it's worth it. I'd say, if you're leaving
> the structure, let's rename it to struct io_uring_attr_type_pi
> or something similar. We can always add a new one later, it
> doesn't change the ABI.
>

In that case I can just drop the io_uring_attr structure. We can keep
the mask version, where we won't need the type and attributes would go
in the array in order of their types, as you suggested here [1]. Does that
sound fine?

[1] https://lore.kernel.org/io-uring/[email protected]/
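
One way to read "attributes would go in the array in order of their types" is sketched below. It assumes attributes are packed back to back at attr_ptr in ascending bit order of attr_type_mask; the thread does not spell out the exact layout, so treat that as an assumption. Bit 0 and the 32-byte PI size come from the posted definitions, while ATTR_FLAG_X, its 16-byte size and the helper attr_offset() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_FLAG_PI	(1U << 0)	/* bit 0, as in the mask scheme */
#define ATTR_FLAG_X	(1U << 1)	/* hypothetical second attribute */

/* Per-type payload sizes: PI is 32 bytes; X is assumed to be 16. */
static const size_t attr_size[] = { 32, 16 };
#define NR_ATTR	(sizeof(attr_size) / sizeof(attr_size[0]))

/*
 * Byte offset of attribute 'type' inside the buffer at attr_ptr,
 * assuming attributes are packed in ascending type order.
 * Returns (size_t)-1 if the type is unknown or not set in the mask.
 */
static size_t attr_offset(uint64_t mask, unsigned int type)
{
	size_t off = 0;
	unsigned int i;

	if (type >= NR_ATTR || !(mask & (1ULL << type)))
		return (size_t)-1;

	/* Skip over every lower-numbered attribute that is present. */
	for (i = 0; i < type; i++)
		if (mask & (1ULL << i))
			off += attr_size[i];
	return off;
}
```

With only PI set, the PI payload sits at offset 0. With both bits set, X would follow PI at offset 32, and no per-attribute type field is needed because the mask already says what is present.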


* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 15:45           ` Pavel Begunkov
  2024-11-26 16:23             ` Anuj gupta
@ 2024-11-27  9:46             ` Anuj Gupta
  2024-11-27 11:24               ` Pavel Begunkov
  1 sibling, 1 reply; 22+ messages in thread
From: Anuj Gupta @ 2024-11-27  9:46 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: axboe, hch, kbusch, martin.petersen, anuj1072538, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On Tue, Nov 26, 2024 at 03:45:09PM +0000, Pavel Begunkov wrote:
> On 11/26/24 13:54, Anuj Gupta wrote:
> > On Tue, Nov 26, 2024 at 01:01:03PM +0000, Pavel Begunkov wrote:
> > > On 11/25/24 07:06, Anuj Gupta wrote:
> 
> Hmm, I have doubts it's going to work well because the union
> members have different sizes. Adding a new type could grow
> struct io_uring_attr, which is already bad for uapi. And it
> can't be stacked:
> 

How about something like this [1]? I have removed the io_uring_attr
structure, and with the mask scheme the user would pass attributes in
order of their types. Do you still see any cracks?

[1]

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index aac9a4f8fa9a..38f0d6b10eaf 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -98,6 +98,10 @@ struct io_uring_sqe {
 			__u64	addr3;
 			__u64	__pad2[1];
 		};
+		struct {
+			__u64	attr_ptr; /* pointer to attribute information */
+			__u64	attr_type_mask; /* bit mask of attributes */
+		};
 		__u64	optval;
 		/*
 		 * If the ring is initialized with IORING_SETUP_SQE128, then
@@ -107,6 +111,18 @@ struct io_uring_sqe {
 	};
 };
 
+/* sqe->attr_type_mask flags */
+#define IORING_RW_ATTR_FLAG_PI	(1U << 0)
+/* PI attribute information */
+struct io_uring_attr_pi {
+		__u16	flags;
+		__u16	app_tag;
+		__u32	len;
+		__u64	addr;
+		__u64	seed;
+		__u64	rsvd;
+};
+
 /*
  * If sqe->file_index is set to this for opcodes that instantiate a new
  * direct descriptor (like openat/openat2/accept), then io_uring will allocate
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index c3a7d0197636..02291ea679fb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3889,6 +3889,8 @@ static int __init io_uring_init(void)
 	BUILD_BUG_SQE_ELEM(46, __u16,  __pad3[0]);
 	BUILD_BUG_SQE_ELEM(48, __u64,  addr3);
 	BUILD_BUG_SQE_ELEM_SIZE(48, 0, cmd);
+	BUILD_BUG_SQE_ELEM(48, __u64, attr_ptr);
+	BUILD_BUG_SQE_ELEM(56, __u64, attr_type_mask);
 	BUILD_BUG_SQE_ELEM(56, __u64,  __pad2);
 
 	BUILD_BUG_ON(sizeof(struct io_uring_files_update) !=
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 0bcb83e4ce3c..8d2ec89fd76b 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -257,11 +257,53 @@ static int io_prep_rw_setup(struct io_kiocb *req, int ddir, bool do_import)
 	return 0;
 }
 
+static inline void io_meta_save_state(struct io_async_rw *io)
+{
+	io->meta_state.seed = io->meta.seed;
+	iov_iter_save_state(&io->meta.iter, &io->meta_state.iter_meta);
+}
+
+static inline void io_meta_restore(struct io_async_rw *io, struct kiocb *kiocb)
+{
+	if (kiocb->ki_flags & IOCB_HAS_METADATA) {
+		io->meta.seed = io->meta_state.seed;
+		iov_iter_restore(&io->meta.iter, &io->meta_state.iter_meta);
+	}
+}
+
+static int io_prep_rw_pi(struct io_kiocb *req, struct io_rw *rw, int ddir,
+			 u64 attr_ptr, u64 attr_type_mask)
+{
+	struct io_uring_attr_pi pi_attr;
+	struct io_async_rw *io;
+	int ret;
+
+	if (copy_from_user(&pi_attr, u64_to_user_ptr(attr_ptr),
+	    sizeof(pi_attr)))
+		return -EFAULT;
+
+	if (pi_attr.rsvd)
+		return -EINVAL;
+
+	io = req->async_data;
+	io->meta.flags = pi_attr.flags;
+	io->meta.app_tag = pi_attr.app_tag;
+	io->meta.seed = READ_ONCE(pi_attr.seed);
+	ret = import_ubuf(ddir, u64_to_user_ptr(pi_attr.addr),
+			  pi_attr.len, &io->meta.iter);
+	if (unlikely(ret < 0))
+		return ret;
+	rw->kiocb.ki_flags |= IOCB_HAS_METADATA;
+	io_meta_save_state(io);
+	return ret;
+}
+
 static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		      int ddir, bool do_import)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 	unsigned ioprio;
+	u64 attr_type_mask;
 	int ret;
 
 	rw->kiocb.ki_pos = READ_ONCE(sqe->off);
@@ -279,11 +321,28 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 		rw->kiocb.ki_ioprio = get_current_ioprio();
 	}
 	rw->kiocb.dio_complete = NULL;
+	rw->kiocb.ki_flags = 0;
 
 	rw->addr = READ_ONCE(sqe->addr);
 	rw->len = READ_ONCE(sqe->len);
 	rw->flags = READ_ONCE(sqe->rw_flags);
-	return io_prep_rw_setup(req, ddir, do_import);
+	ret = io_prep_rw_setup(req, ddir, do_import);
+
+	if (unlikely(ret))
+		return ret;
+
+	attr_type_mask = READ_ONCE(sqe->attr_type_mask);
+	if (attr_type_mask) {
+		u64 attr_ptr;
+
+		/* only PI attribute is supported currently */
+		if (attr_type_mask != IORING_RW_ATTR_FLAG_PI)
+			return -EINVAL;
+
+		attr_ptr = READ_ONCE(sqe->attr_ptr);
+		ret = io_prep_rw_pi(req, rw, ddir, attr_ptr, attr_type_mask);
+	}
+	return ret;
 }
 
 int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -409,7 +468,9 @@ static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req)
 static void io_resubmit_prep(struct io_kiocb *req)
 {
 	struct io_async_rw *io = req->async_data;
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
 
+	io_meta_restore(io, &rw->kiocb);
 	iov_iter_restore(&io->iter, &io->iter_state);
 }
 
@@ -744,6 +805,10 @@ static bool io_rw_should_retry(struct io_kiocb *req)
 	if (kiocb->ki_flags & (IOCB_DIRECT | IOCB_HIPRI))
 		return false;
 
+	/* never retry for meta io */
+	if (kiocb->ki_flags & IOCB_HAS_METADATA)
+		return false;
+
 	/*
 	 * just use poll if we can, and don't attempt if the fs doesn't
 	 * support callback based unlocks
@@ -794,7 +859,7 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 	if (!(req->flags & REQ_F_FIXED_FILE))
 		req->flags |= io_file_get_flags(file);
 
-	kiocb->ki_flags = file->f_iocb_flags;
+	kiocb->ki_flags |= file->f_iocb_flags;
 	ret = kiocb_set_rw_flags(kiocb, rw->flags, rw_type);
 	if (unlikely(ret))
 		return ret;
@@ -828,6 +893,18 @@ static int io_rw_init_file(struct io_kiocb *req, fmode_t mode, int rw_type)
 		kiocb->ki_complete = io_complete_rw;
 	}
 
+	if (kiocb->ki_flags & IOCB_HAS_METADATA) {
+		struct io_async_rw *io = req->async_data;
+
+		/*
+		 * We have a union of meta fields with wpq used for buffered-io
+		 * in io_async_rw, so fail it here.
+		 */
+		if (!(req->file->f_flags & O_DIRECT))
+			return -EOPNOTSUPP;
+		kiocb->private = &io->meta;
+	}
+
 	return 0;
 }
 
@@ -902,6 +979,7 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	 * manually if we need to.
 	 */
 	iov_iter_restore(&io->iter, &io->iter_state);
+	io_meta_restore(io, kiocb);
 
 	do {
 		/*
@@ -1125,6 +1203,7 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	} else {
 ret_eagain:
 		iov_iter_restore(&io->iter, &io->iter_state);
+		io_meta_restore(io, kiocb);
 		if (kiocb->ki_flags & IOCB_WRITE)
 			io_req_end_write(req);
 		return -EAGAIN;
diff --git a/io_uring/rw.h b/io_uring/rw.h
index 3f432dc75441..2d7656bd268d 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -2,6 +2,11 @@
 
 #include <linux/pagemap.h>
 
+struct io_meta_state {
+	u32			seed;
+	struct iov_iter_state	iter_meta;
+};
+
 struct io_async_rw {
 	size_t				bytes_done;
 	struct iov_iter			iter;
@@ -9,7 +14,14 @@ struct io_async_rw {
 	struct iovec			fast_iov;
 	struct iovec			*free_iovec;
 	int				free_iov_nr;
-	struct wait_page_queue		wpq;
+	/* wpq is for buffered io, while meta fields are used with direct io */
+	union {
+		struct wait_page_queue		wpq;
+		struct {
+			struct uio_meta			meta;
+			struct io_meta_state		meta_state;
+		};
+	};
 };
 
 int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-- 
2.25.1


* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-26 16:23             ` Anuj gupta
@ 2024-11-27 10:35               ` Pavel Begunkov
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-27 10:35 UTC (permalink / raw)
  To: Anuj gupta
  Cc: Anuj Gupta, axboe, hch, kbusch, martin.petersen, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On 11/26/24 16:23, Anuj gupta wrote:
> On Tue, Nov 26, 2024 at 9:14 PM Pavel Begunkov <[email protected]> wrote:
...
>> This example would be incorrect. Even if it's just one attribute,
>> the user would be wasting space on the stack. The only use for it I
>> see is having ephemeral pointers during parsing, à la
>>
>> void parse(void *attributes, size_t offset) {
>>          struct io_uring_attr *attr = attributes + offset;
>>
>>          if (attr->attr_type == ATTR_TYPE_PI) {
>>                  process_pi(&attr->pi);
>>                  // or potentially fill_pi() in userspace
>>          }
>> }
>>
>> But I don't think it's worth it. I'd say, if you're leaving
>> the structure, let's rename it to struct io_uring_attr_type_pi
>> or something similar. We can always add a new one later, it
>> doesn't change the ABI.
>>
> 
> In that case I can just drop the io_uring_attr structure. We can keep
> the mask version, where we won't need the type and attributes would go
> in the array in order of their types, as you suggested here [1]. Does that
> sound fine?

That should work; the approach in this patchset is fine as well.
I'll take a look at the patch a bit later today.

> [1] https://lore.kernel.org/io-uring/[email protected]/

-- 
Pavel Begunkov


* Re: [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support
  2024-11-27  9:46             ` Anuj Gupta
@ 2024-11-27 11:24               ` Pavel Begunkov
  0 siblings, 0 replies; 22+ messages in thread
From: Pavel Begunkov @ 2024-11-27 11:24 UTC (permalink / raw)
  To: Anuj Gupta
  Cc: axboe, hch, kbusch, martin.petersen, anuj1072538, brauner, jack,
	viro, io-uring, linux-nvme, linux-block, gost.dev, linux-scsi,
	vishak.g, linux-fsdevel, Kanchan Joshi

On 11/27/24 09:46, Anuj Gupta wrote:
> On Tue, Nov 26, 2024 at 03:45:09PM +0000, Pavel Begunkov wrote:
>> On 11/26/24 13:54, Anuj Gupta wrote:
>>> On Tue, Nov 26, 2024 at 01:01:03PM +0000, Pavel Begunkov wrote:
>>>> On 11/25/24 07:06, Anuj Gupta wrote:
>>
>> Hmm, I have doubts it's going to work well because the union
>> members have different sizes. Adding a new type could grow
>> struct io_uring_attr, which is already bad for uapi. And it
>> can't be stacked:
>>
> 
> How about something like this [1]? I have removed the io_uring_attr
> structure, and with the mask scheme the user would pass attributes in
> order of their types. Do you still see any cracks?

Looks good to me

> --- a/io_uring/rw.c
> +++ b/io_uring/rw.c
...
> +static int io_prep_rw_pi(struct io_kiocb *req, struct io_rw *rw, int ddir,
> +			 u64 attr_ptr, u64 attr_type_mask)
> +{
> +	struct io_uring_attr_pi pi_attr;
> +	struct io_async_rw *io;
> +	int ret;
> +
> +	if (copy_from_user(&pi_attr, u64_to_user_ptr(attr_ptr),
> +	    sizeof(pi_attr)))
> +		return -EFAULT;
> +
> +	if (pi_attr.rsvd)
> +		return -EINVAL;
> +
> +	io = req->async_data;
> +	io->meta.flags = pi_attr.flags;
> +	io->meta.app_tag = pi_attr.app_tag;
> +	io->meta.seed = READ_ONCE(pi_attr.seed);

Seems an unnecessary READ_ONCE slipped in here

> +	ret = import_ubuf(ddir, u64_to_user_ptr(pi_attr.addr),
> +			  pi_attr.len, &io->meta.iter);
> +	if (unlikely(ret < 0))
> +		return ret;
> +	rw->kiocb.ki_flags |= IOCB_HAS_METADATA;
> +	io_meta_save_state(io);
> +	return ret;
> +}
...
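
For the READ_ONCE remark above, a rough userspace illustration of the distinction. SQE fields live in memory shared with userspace, so the kernel reads them through READ_ONCE to get a single snapshot. pi_attr, by contrast, is a private on-stack copy filled by copy_from_user(), so plain member reads are enough. The macro below is only an approximation of the kernel one and relies on the GCC/Clang __typeof__ extension; struct pi_attr and both helpers are hypothetical names, and memcpy() stands in for copy_from_user().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Approximation of the kernel macro: one volatile load of x. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Trimmed stand-in for the PI attribute. */
struct pi_attr {
	uint64_t seed;
	uint64_t rsvd;
};

/*
 * 'shared' models memory that another context may rewrite at any
 * time (such as an SQE), so the field is read exactly once.
 */
static uint64_t read_shared_seed(struct pi_attr *shared)
{
	return READ_ONCE(shared->seed);
}

/*
 * After copying into a private local nobody else can touch it,
 * so a plain load is sufficient.
 */
static uint64_t read_private_seed(const struct pi_attr *shared)
{
	struct pi_attr local;

	memcpy(&local, shared, sizeof(local));
	return local.seed;
}
```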

-- 
Pavel Begunkov


end of thread, other threads:[~2024-11-27 11:23 UTC | newest]

Thread overview: 22+ messages
     [not found] <CGME20241125071431epcas5p3a3d9633606d2f0b46de2c144bb7f3711@epcas5p3.samsung.com>
2024-11-25  7:06 ` [PATCH v10 00/10] Read/Write with meta/integrity Anuj Gupta
     [not found]   ` <CGME20241125071449epcas5p1f1d44ee61d1af7c847920680767637e7@epcas5p1.samsung.com>
2024-11-25  7:06     ` [PATCH v10 01/10] block: define set of integrity flags to be inherited by cloned bip Anuj Gupta
     [not found]   ` <CGME20241125071451epcas5p2e50329d88842569e5a2a07b918406d28@epcas5p2.samsung.com>
2024-11-25  7:06     ` [PATCH v10 02/10] block: copy back bounce buffer to user-space correctly in case of split Anuj Gupta
     [not found]   ` <CGME20241125071454epcas5p449a4b9a80f6bfe2ffa1181e3af6c2ac6@epcas5p4.samsung.com>
2024-11-25  7:06     ` [PATCH v10 03/10] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
     [not found]   ` <CGME20241125071457epcas5p498c0641542bed9057e23cfff9cfc5ff0@epcas5p4.samsung.com>
2024-11-25  7:06     ` [PATCH v10 04/10] fs, iov_iter: define meta io descriptor Anuj Gupta
     [not found]   ` <CGME20241125071459epcas5p3f603d511a03c790476cce37505e61a0b@epcas5p3.samsung.com>
2024-11-25  7:06     ` [PATCH v10 05/10] fs: introduce IOCB_HAS_METADATA for metadata Anuj Gupta
     [not found]   ` <CGME20241125071502epcas5p46c373574219a958b565f20732797893f@epcas5p4.samsung.com>
2024-11-25  7:06     ` [PATCH v10 06/10] io_uring: introduce attributes for read/write and PI support Anuj Gupta
2024-11-25 14:58       ` Pavel Begunkov
2024-11-26 10:40         ` Anuj Gupta
2024-11-26 12:53           ` Pavel Begunkov
2024-11-26 13:01       ` Pavel Begunkov
2024-11-26 13:04         ` Pavel Begunkov
2024-11-26 13:54         ` Anuj Gupta
2024-11-26 15:45           ` Pavel Begunkov
2024-11-26 16:23             ` Anuj gupta
2024-11-27 10:35               ` Pavel Begunkov
2024-11-27  9:46             ` Anuj Gupta
2024-11-27 11:24               ` Pavel Begunkov
     [not found]   ` <CGME20241125071505epcas5p34469830c74b82603c57cb4122d0850f7@epcas5p3.samsung.com>
2024-11-25  7:06     ` [PATCH v10 07/10] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags Anuj Gupta
     [not found]   ` <CGME20241125071507epcas5p3b898d0960fb411cd176aea29029d820a@epcas5p3.samsung.com>
2024-11-25  7:06     ` [PATCH v10 08/10] nvme: add support for passing on the application tag Anuj Gupta
     [not found]   ` <CGME20241125071510epcas5p47a424c419577f1e5c09375ce39a880c3@epcas5p4.samsung.com>
2024-11-25  7:06     ` [PATCH v10 09/10] scsi: add support for user-meta interface Anuj Gupta
     [not found]   ` <CGME20241125071513epcas5p28b1c27bc43262eb575d576e32f8e3d7b@epcas5p2.samsung.com>
2024-11-25  7:06     ` [PATCH v10 10/10] block: add support to pass user meta buffer Anuj Gupta
