From: Kanchan Joshi <[email protected]>
To: Pavel Begunkov <[email protected]>, Keith Busch <[email protected]>
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
Anuj Gupta <[email protected]>
Subject: Re: [PATCH v6 06/10] io_uring/rw: add support to send metadata along with read/write
Date: Mon, 11 Nov 2024 00:06:57 +0530
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 11/7/2024 10:53 PM, Pavel Begunkov wrote:
> Let's say we have 3 different attributes META_TYPE{1,2,3}.
>
> How are they placed in an SQE?
>
> meta1 = (void *)get_big_sqe(sqe);
> meta2 = meta1 + sizeof(?); // sizeof(struct meta1_struct)
> meta3 = meta2 + sizeof(struct meta2_struct);
There is no need to do this kind of addition, or to think in terms of
sequential ordering, for the extra information placed into the
primary/secondary SQE.
Please see v8:
https://lore.kernel.org/io-uring/[email protected]/
It exposes a distinct flag (in sqe->ext_cap) for each attribute/cap, and
userspace should place the corresponding information where the kernel
has mandated.
If a particular attribute (for example, write-hint) requires <20b of
extra information, we should just place that in the first SQE. PI
requires more, so we are placing that into the second SQE.
When both the PI and write-hint flags are specified by the user, they
get processed fine without having to care about the above
additions/ordering.
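To illustrate the point, here is a minimal userspace sketch of
flag-driven parsing with fixed slots. It is not the v8 code: the
struct layouts, the field offsets, and the names fake_big_sqe, fake_pi
and parse_ext_caps are all hypothetical, and only the flag names come
from this thread. What it shows is that each flag implies its own
fixed location, so the flags are checked independently and no offsets
are added up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag names as used in this thread; the bit values are assumed. */
#define EXT_CAP_PI         (1U << 0)
#define EXT_CAP_WRITE_HINT (1U << 1)

/* Hypothetical stand-in for a 128-byte SQE made of two 64-byte halves. */
struct fake_big_sqe {
	uint16_t ext_cap;     /* one bit per attribute */
	uint8_t  write_hint;  /* assumed fixed slot in the first half */
	uint8_t  pad[61];
	uint8_t  second[64];  /* second half; carries PI if EXT_CAP_PI is set */
};
static_assert(sizeof(struct fake_big_sqe) == 128, "two 64-byte halves");

/* Hypothetical PI descriptor; the real layout is defined by the series. */
struct fake_pi {
	uint64_t addr;
	uint32_t len;
	uint16_t app_tag;
	uint16_t flags;
};

struct parsed {
	int has_pi;
	int has_hint;
	struct fake_pi pi;
	uint8_t hint;
};

/* Each flag is handled on its own; there is no ordering or iteration. */
static int parse_ext_caps(const struct fake_big_sqe *sqe, struct parsed *out)
{
	memset(out, 0, sizeof(*out));

	/* Reject flags this sketch does not know about. */
	if (sqe->ext_cap & ~(EXT_CAP_PI | EXT_CAP_WRITE_HINT))
		return -1;

	if (sqe->ext_cap & EXT_CAP_WRITE_HINT) {
		out->has_hint = 1;
		out->hint = sqe->write_hint;  /* fixed slot, first half */
	}
	if (sqe->ext_cap & EXT_CAP_PI) {
		out->has_pi = 1;
		memcpy(&out->pi, sqe->second, sizeof(out->pi)); /* fixed slot, second half */
	}
	return 0;
}
```

Setting only one of the two flags leaves the other slot unread, which
is why the user can pass either or both without thinking about
first/second.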
> Structures are likely not fixed size (?). At least the PI looks large
> enough to force everyone to be just aliased to it.
>
> And can the user pass first meta2 in the sqe and then meta1?
Yes. Just set the ext_cap flags without bothering about first/second.
The user can pass either or both, along with the corresponding info,
and does not have to assume a specific placement within the SQE.
> meta2 = (void *)get_big_sqe(sqe);
> meta1 = meta2 + sizeof(?); // sizeof(struct meta2_struct)
>
> If yes, how parsing should look like? Does the kernel need to read each
> chunk's type and look up its size to iterate to the next one?
We don't need to iterate if we are not assuming any ordering.
> If no, what happens if we want to pass meta2 and meta3, do they start
> from the big_sqe?
Whoever adds support for meta2/meta3 in the kernel decides where to
place them within the first/second SQE, or whether to fetch them via a
pointer from userspace.
> How do we pass how many of such attributes is there for the request?
ext_cap allows passing 16 cap/attribute flags. Whether all of them can
be passed inline in the SQE is unclear, as I have no real visibility
into the space requirements of future users.
> It should support arbitrary number of attributes in the long run, which
> we can't pass in an SQE, bumping the SQE size is not scalable in
> general, so it'd need to support user pointers or sth similar at some
> point. Placing them in an SQE can serve as an optimisation, and a first
> step, though it might be easier to start with user pointer instead.
>
> Also, when we eventually come to user pointers, we want it to be
> performant as well and e.g. get by just one copy_from_user, and the
> api/struct layouts would need to be able to support it. And once it's
> copied we'll want it to be handled uniformly with the SQE variant, that
> requires a common format. For different formats there will be a question
> of performance, maintainability, duplicating kernel and userspace code.
>
> All that doesn't need to be implemented, but we need a clear direction
> for the API. Maybe we can get a simplified user space pseudo code
> showing how the end API is supposed to look like?
Yes. For a large/arbitrary number, we may have to fetch the entire
attribute list using a user pointer/len combo, and then parse it (that's
where all your previous questions fit).
And that can still be added on top of v8.
For example, by adding a flag (in ext_cap) that disables inline-SQE
processing and switches to an external attribute buffer:
/* Second SQE has PI information */
#define EXT_CAP_PI (1U << 0)
/* First SQE has hint information */
#define EXT_CAP_WRITE_HINT (1U << 1)
/* Do not assume CAP presence in SQE, and fetch capability buffer page
instead */
#define EXT_CAP_INDIRECT (1U << 2)
The corresponding pointer (and/or len) can be put into the last 16b of
the SQE.
Use the same flags/structures for the given attributes within this
buffer. That will keep things uniform and will reuse the same handling
that we add for inline attributes.
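A rough userspace sketch of how such an indirect buffer could be
walked, purely as an assumption and not something v8 implements: the
buffer holds the same per-attribute structures, packed in ascending
flag-bit order, and a per-bit size table drives the walk. The packing
order, the table, and the names fake_pi, fake_hint, cap_size and
parse_indirect are all hypothetical. In the kernel, buf would be the
result of a single copy_from_user of the pointer/len pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXT_CAP_PI         (1U << 0)
#define EXT_CAP_WRITE_HINT (1U << 1)
#define EXT_CAP_INDIRECT   (1U << 2)

/* Hypothetical per-attribute payloads, shared with the inline path. */
struct fake_pi {
	uint64_t addr;
	uint32_t len;
	uint16_t app_tag;
	uint16_t flags;
};
struct fake_hint {
	uint64_t hint;
};
static_assert(sizeof(struct fake_pi) == 16, "sketch assumes a 16-byte PI");

/* Payload size per flag bit; 0 means the bit carries no known payload. */
static const size_t cap_size[16] = {
	[0] = sizeof(struct fake_pi),
	[1] = sizeof(struct fake_hint),
};

struct parsed_caps {
	uint16_t seen;  /* flags whose payload was found in the buffer */
	struct fake_pi pi;
	struct fake_hint hint;
};

/* Walk an already-copied buffer; payloads are assumed packed in bit order. */
static int parse_indirect(uint16_t ext_cap, const void *buf, size_t len,
			  struct parsed_caps *out)
{
	const uint8_t *p = buf;
	size_t off = 0;

	memset(out, 0, sizeof(*out));
	if (!(ext_cap & EXT_CAP_INDIRECT))
		return -1;

	for (unsigned int bit = 0; bit < 16; bit++) {
		uint16_t flag = (uint16_t)(1U << bit);

		if (!(ext_cap & flag) || flag == EXT_CAP_INDIRECT)
			continue;
		/* Unknown attribute, or payload would run past the buffer. */
		if (cap_size[bit] == 0 || cap_size[bit] > len - off)
			return -1;

		if (flag == EXT_CAP_PI)
			memcpy(&out->pi, p + off, sizeof(out->pi));
		else if (flag == EXT_CAP_WRITE_HINT)
			memcpy(&out->hint, p + off, sizeof(out->hint));

		off += cap_size[bit];
		out->seen |= flag;
	}
	return 0;
}
```

With this layout the flag mask doubles as the attribute count and the
type list, so the buffer needs no per-chunk headers; other layouts
(e.g. explicit type/len headers) are equally possible.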