public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Anuj Gupta <[email protected]>,
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected], [email protected]
Cc: [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected],
	Kanchan Joshi <[email protected]>
Subject: Re: [PATCH v9 06/11] io_uring: introduce attributes for read/write and PI support
Date: Sat, 16 Nov 2024 00:00:43 +0000	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 11/14/24 10:45, Anuj Gupta wrote:
> Add the ability to pass additional attributes along with read/write.
> Application can populate an array of 'struct io_uring_attr_vec' and pass
> its address using the SQE field:
> 	__u64	attr_vec_addr;
> 
> Along with number of attributes using:
> 	__u8	nr_attr_indirect;
> 
> Overall 16 attributes are allowed and currently one attribute
> 'ATTR_TYPE_PI' is supported.

Why only 16? It's possible that might need more, 256 would
be a safer choice and fits into u8. I don't think you even
need to commit to a number, it should be ok to add more as
long as it fits into the given types (u8 above). It can also
be u16 as well.

> With PI attribute, userspace can pass following information:
> - flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG}
> - len: length of PI/metadata buffer
> - addr: address of metadata buffer
> - seed: seed value for reftag remapping
> - app_tag: application defined 16b value

In terms of flexibility I like it apart from small nits,
but the double indirection could be a bit inefficient,
this thing:

struct pi_attr pi = {};
attr_array = { &pi, ... };
sqe->attr_addr = attr_array;

So maybe we should just flatten it? An attempt to pseudo
code it to understand what it entails is below. Certainly
buggy and some handling is omitted, but should show the
idea.

// uapi/.../io_uring.h

struct sqe {
	...
	u64 attr_addr;
	/* the total size of the array pointed by attr_addr */
	u16 attr_size; /* max 64KB, more than enough */
}

struct io_attr_header {
	/* bit mask of attributes passed, can be helpful in the future
	 * for optimising processing.
	 */
	u64 attr_type_map;
};

/* each attribute should start with a preamble */
struct io_uring_attr_preamble {
	u16 attr_type;
};

// user space

struct PI_param {
	struct io_attr_header header;
	struct io_uring_attr_preamble preamble;
	struct io_uring_attr_pi pi;
};

struct PI_param p = {};
p.header.map = 1 << ATTR_TYPE_PI;
p.preamble.type = ATTR_TYPE_PI;
p.pi = {...};

sqe->attr_addr = &p;
sqe->attr_size = sizeof(p);


The holes b/w structures should be packed better. For the same
reason I don't like a separate preamble structure much, maybe it
should be embedded into the attribute structures, e.g.

struct io_uring_attr_pi {
	u16 attr_type;
	...
}

The user side looks ok to me, should be pretty straightforward
if the user can define a structure like PI_param, i.e. knows
at compilation time which attributes it wants to use.

// kernel space (current patch set, PI only)

addr = READ_ONCE(sqe->attr_addr);
if (addr) {
	size = READ_ONCE(sqe->attr_size);
	process_pi(addr, size);
}

process_pi(addr, size) {
	struct PI_param p;

	if (size != sizeof(PI_attr + struct attr_preamble + struct attr_header))
		return -EINVAL;
	copy_from_user(p, addr, sizeof(p));
	if (p.preamble != ATTR_TYPE_PI)
		return -EINVAL;
	do_pi_setup(&p->pi);
}

This one is pretty simple as well. A bit more troublesome if
extended with many attributes, but it'd need additional handling
regardless:

process_pi(addr, size) {
	if (size < sizeof(header + preamble))
		return -EINVAL;

	attr_array = malloc(size); // +caching by io_uring
	copy_from_user(attr_array);
	handle_attributes(attr_array, size);
}

handle_attributes(attr_array, size) {
	struct io_attr_header *hdr = attr_array;
	offset = sizeof(*hdr);

	while (1) {
		if (offset + sizeof(struct preamble) > size)
			break;

		struct preamble *pr = attr_array + offset;
		if (pr->type > MAX_TYPES)
			return -EINVAL;
		attr_size = attr_sizes[pr->type];
		if (offset + sizeof(preamble) + attr_size > size)
			return -EINVAL;
		offset += sizeof(preamble) + attr_size;

		process_attr(pr->type, (void *)(pr + 1));
	}
}

Some checks can probably be optimised by playing with the uapi
a bit.

attr_type_map is unused here, but I like the idea. I'd love
to see all actual attribute handling to move deeper into the
stack to those who actually need it, but that's for far
away undecided future. E.g.

io_uring_rw {
	p = malloc();
	copy_from_user(p, sqe->attr_addr);
	kiocb->attributes = p;
}

block_do_read {
	hdr = kiocb->attributes;
	type_mask = /* all types block layer recognises */
	if (hdr->attr_type_map & type_mask)
		use_attributes();
}

copy_from_user can be optimised, I mentioned before, we can
have a pre-mapped area into which the indirection can point.
The infra is already in there and even used for passing
waiting arguments.

process_pi(addr, size) {
	struct PI_param *p, __p;

	if (some_flags & USE_REGISTERED_REGION) {
		// Glorified p = ctx->ptr; with some checks
		p = io_uring_get_mem(addr, size);
	} else {
		copy_from_user(__p, addr, sizeof(__p));
		p = &__p;
	}
	...
}

In this case all reads would need to be READ_ONCE, but that
shouldn't be a problem. It might also optimise out the kmalloc
in the extended version.

-- 
Pavel Begunkov

  parent reply	other threads:[~2024-11-15 23:59 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20241114105326epcas5p103b2c996293fa680092b97c747fdbd59@epcas5p1.samsung.com>
2024-11-14 10:45 ` [PATCH v9 00/11] Read/Write with meta/integrity Anuj Gupta
     [not found]   ` <CGME20241114105352epcas5p109c1742fa8a6552296b9c104f2271308@epcas5p1.samsung.com>
2024-11-14 10:45     ` [PATCH v9 01/11] block: define set of integrity flags to be inherited by cloned bip Anuj Gupta
     [not found]   ` <CGME20241114105354epcas5p49a73947c3d37be4189023f66fb7ba413@epcas5p4.samsung.com>
2024-11-14 10:45     ` [PATCH v9 02/11] block: copy back bounce buffer to user-space correctly in case of split Anuj Gupta
     [not found]   ` <CGME20241114105357epcas5p41fd14282d4abfe564e858b37babe708a@epcas5p4.samsung.com>
2024-11-14 10:45     ` [PATCH v9 03/11] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
     [not found]   ` <CGME20241114105400epcas5p270b8062a0c4f26833a5b497f057d65a7@epcas5p2.samsung.com>
2024-11-14 10:45     ` [PATCH v9 04/11] fs, iov_iter: define meta io descriptor Anuj Gupta
     [not found]   ` <CGME20241114105402epcas5p41b1f6054a557f1bda2cfddfdfb9a9477@epcas5p4.samsung.com>
2024-11-14 10:45     ` [PATCH v9 05/11] fs: introduce IOCB_HAS_METADATA for metadata Anuj Gupta
     [not found]   ` <CGME20241114105405epcas5p24ca2fb9017276ff8a50ef447638fd739@epcas5p2.samsung.com>
2024-11-14 10:45     ` [PATCH v9 06/11] io_uring: introduce attributes for read/write and PI support Anuj Gupta
2024-11-14 12:16       ` Christoph Hellwig
2024-11-14 13:09         ` Pavel Begunkov
2024-11-14 15:19           ` Christoph Hellwig
2024-11-15 16:40             ` Pavel Begunkov
2024-11-15 17:12               ` Christoph Hellwig
2024-11-15 17:44                 ` Jens Axboe
2024-11-15 18:00                   ` Christoph Hellwig
2024-11-15 19:03                 ` Pavel Begunkov
2024-11-18 12:49                   ` Christoph Hellwig
2024-11-15 18:04           ` Matthew Wilcox
2024-11-20 17:35             ` Darrick J. Wong
2024-11-21  6:54               ` Christoph Hellwig
2024-11-21 13:45               ` Pavel Begunkov
2024-11-15 13:29         ` Anuj gupta
2024-11-16  0:00       ` Pavel Begunkov [this message]
2024-11-16  0:32         ` Pavel Begunkov
2024-11-18 12:50           ` Christoph Hellwig
2024-11-18 16:59             ` Pavel Begunkov
2024-11-18 17:03               ` Christoph Hellwig
2024-11-18 17:45                 ` Pavel Begunkov
2024-11-19 12:49                   ` Christoph Hellwig
2024-11-21 13:29                     ` Pavel Begunkov
2024-11-21  8:59               ` Anuj Gupta
2024-11-21 15:45                 ` Pavel Begunkov
2024-11-16 23:09       ` kernel test robot
     [not found]   ` <CGME20241114105408epcas5p3c77cda2faf7ccb37abbfd8e95b4ad1f5@epcas5p3.samsung.com>
2024-11-14 10:45     ` [PATCH v9 07/11] io_uring: inline read/write attributes and PI Anuj Gupta
     [not found]   ` <CGME20241114105410epcas5p1c6a4e6141b073ccfd6277288f7d5e28b@epcas5p1.samsung.com>
2024-11-14 10:45     ` [PATCH v9 08/11] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags Anuj Gupta
     [not found]   ` <CGME20241114105413epcas5p2d7da8675df2de0d1efba3057144e691d@epcas5p2.samsung.com>
2024-11-14 10:45     ` [PATCH v9 09/11] nvme: add support for passing on the application tag Anuj Gupta
     [not found]   ` <CGME20241114105416epcas5p3a7aa552775cfe50f60ef89f7d982ea12@epcas5p3.samsung.com>
2024-11-14 10:45     ` [PATCH v9 10/11] scsi: add support for user-meta interface Anuj Gupta
     [not found]   ` <CGME20241114105418epcas5p1537d72b9016d10670cf97751704e2cc8@epcas5p1.samsung.com>
2024-11-14 10:45     ` [PATCH v9 11/11] block: add support to pass user meta buffer Anuj Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox