From: Pavel Begunkov <[email protected]>
To: Kanchan Joshi <[email protected]>, Keith Busch <[email protected]>
Cc: [email protected], [email protected], [email protected],
[email protected], [email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
[email protected], [email protected],
Anuj Gupta <[email protected]>
Subject: Re: [PATCH v6 06/10] io_uring/rw: add support to send metadata along with read/write
Date: Tue, 12 Nov 2024 01:32:58 +0000 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 11/10/24 18:36, Kanchan Joshi wrote:
> On 11/7/2024 10:53 PM, Pavel Begunkov wrote:
>
>> Let's say we have 3 different attributes META_TYPE{1,2,3}.
>>
>> How are they placed in an SQE?
>>
>> meta1 = (void *)get_big_sqe(sqe);
>> meta2 = meta1 + sizeof(?); // sizeof(struct meta1_struct)
>> meta3 = meta2 + sizeof(struct meta2_struct);
>
> Not necessary to do this kind of additions and think in terms of
> sequential ordering for the extra information placed into
> primary/secondary SQE.
>
> Please see v8:
> https://lore.kernel.org/io-uring/[email protected]/
>
> It exposes a distinct flag (sqe->ext_cap) for each attribute/cap, and
> userspace should place the corresponding information where kernel has
> mandated.
>
> If a particular attribute (example write-hint) requires <20b of extra
> information, we should just place that in first SQE. PI requires more so
> we are placing that into second SQE.
>
> When both PI and write-hint flags are specified by user they can get
> processed fine without actually having to care about above
> additions/ordering.
Ok, this option is to statically define a place in SQE for each
meta type. The problem is that we can't place everything into
an SQE, and the next big meta would need to be a user pointer,
at which point copy_from_user() is expensive again and we need
to invent something new. PI becomes a special case, most likely
handled in a special way, and either becomes one of few "optimised"
or forces for nothing its users into SQE128 (with all additional
costs) when it could've been aligned with other later meta types.
>> Structures are likely not fixed size (?). At least the PI looks large
>> enough to force everyone to be just aliased to it.
>>
>> And can the user pass first meta2 in the sqe and then meta1?
>
> Yes. Just set the ext_cap flags without bothering about first/second.
> User can pass either or both, along with the corresponding info. Just
> don't have to assume specific placement into SQE.
>
>
>> meta2 = (void *)get_big_sqe(sqe);
>> meta1 = meta2 + sizeof(?); // sizeof(struct meta2_struct)
>>
>> If yes, how parsing should look like? Does the kernel need to read each
>> chunk's type and look up its size to iterate to the next one?
>
> We don't need to iterate if we are not assuming any ordering.
>
>> If no, what happens if we want to pass meta2 and meta3, do they start
>> from the big_sqe?
>
> The one who adds the support for meta2/meta3 in kernel decides where to
> place them within first/second SQE or get them fetched via a pointer
> from userspace.
>
>> How do we pass how many of such attributes is there for the request?
>
> ext_cap allows to pass 16 cap/attribute flags. Maybe all can or can not
> be passed inline in SQE, but I have no real visibility about the space
> requirement of future users.
I like ext_cap, if not in the current form / API, then as a user
hint - quick map of what meta types are passed.
--
Pavel Begunkov
next prev parent reply other threads:[~2024-11-12 1:32 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20241030180957epcas5p3312b0a582e8562f8c2169e64d41592b2@epcas5p3.samsung.com>
2024-10-30 18:01 ` [PATCH v6 00/10] Read/Write with metadata/integrity Kanchan Joshi
[not found] ` <CGME20241030181000epcas5p2bfb47a79f1e796116135f646c6f0ccc7@epcas5p2.samsung.com>
2024-10-30 18:01 ` [PATCH v6 01/10] block: define set of integrity flags to be inherited by cloned bip Kanchan Joshi
[not found] ` <CGME20241030181002epcas5p2b44e244bcd0c49d0a379f0f4fe07dc3f@epcas5p2.samsung.com>
2024-10-30 18:01 ` [PATCH v6 02/10] block: copy back bounce buffer to user-space correctly in case of split Kanchan Joshi
[not found] ` <CGME20241030181005epcas5p43b40adb5af1029c9ffaecde317bf1c5d@epcas5p4.samsung.com>
2024-10-30 18:01 ` [PATCH v6 03/10] block: modify bio_integrity_map_user to accept iov_iter as argument Kanchan Joshi
2024-10-31 4:33 ` kernel test robot
[not found] ` <CGME20241030181008epcas5p333603fdbf3afb60947d3fc51138d11bf@epcas5p3.samsung.com>
2024-10-30 18:01 ` [PATCH v6 04/10] fs, iov_iter: define meta io descriptor Kanchan Joshi
2024-10-31 6:55 ` Christoph Hellwig
[not found] ` <CGME20241030181010epcas5p2c399ecea97ed6d0e5fb228b5d15c2089@epcas5p2.samsung.com>
2024-10-30 18:01 ` [PATCH v6 05/10] fs: introduce IOCB_HAS_METADATA for metadata Kanchan Joshi
[not found] ` <CGME20241030181013epcas5p2762403c83e29c81ec34b2a7755154245@epcas5p2.samsung.com>
2024-10-30 18:01 ` [PATCH v6 06/10] io_uring/rw: add support to send metadata along with read/write Kanchan Joshi
2024-10-30 21:09 ` Keith Busch
2024-10-31 14:39 ` Pavel Begunkov
2024-11-01 17:54 ` Kanchan Joshi
2024-11-07 17:23 ` Pavel Begunkov
2024-11-10 17:41 ` Kanchan Joshi
2024-11-12 0:54 ` Pavel Begunkov
2024-11-10 18:36 ` Kanchan Joshi
2024-11-12 1:32 ` Pavel Begunkov [this message]
2024-10-31 6:55 ` Christoph Hellwig
[not found] ` <CGME20241030181016epcas5p3da284aa997e81d9855207584ab4bace3@epcas5p3.samsung.com>
2024-10-30 18:01 ` [PATCH v6 07/10] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags Kanchan Joshi
[not found] ` <CGME20241030181019epcas5p135961d721959d80f1f60bd4790ed52cf@epcas5p1.samsung.com>
2024-10-30 18:01 ` [PATCH v6 08/10] nvme: add support for passing on the application tag Kanchan Joshi
[not found] ` <CGME20241030181021epcas5p1c61b7980358f3120014b4f99390d1595@epcas5p1.samsung.com>
2024-10-30 18:01 ` [PATCH v6 09/10] scsi: add support for user-meta interface Kanchan Joshi
2024-10-31 5:09 ` kernel test robot
2024-10-31 5:10 ` kernel test robot
[not found] ` <CGME20241030181024epcas5p3964697a08159f8593a6f94764f77a7f3@epcas5p3.samsung.com>
2024-10-30 18:01 ` [PATCH v6 10/10] block: add support to pass user meta buffer Kanchan Joshi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox