public inbox for [email protected]
From: Kanchan Joshi <[email protected]>
To: Pavel Begunkov <[email protected]>, Keith Busch <[email protected]>
Cc: [email protected], [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	Anuj Gupta <[email protected]>
Subject: Re: [PATCH v6 06/10] io_uring/rw: add support to send metadata along with read/write
Date: Mon, 11 Nov 2024 00:06:57 +0530
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 11/7/2024 10:53 PM, Pavel Begunkov wrote:

> Let's say we have 3 different attributes META_TYPE{1,2,3}.
> 
> How are they placed in an SQE?
> 
> meta1 = (void *)get_big_sqe(sqe);
> meta2 = meta1 + sizeof(?); // sizeof(struct meta1_struct)
> meta3 = meta2 + sizeof(struct meta2_struct);

There is no need for this kind of offset arithmetic, or to think in
terms of sequential ordering, for the extra information placed into the
primary/secondary SQE.

Please see v8:
https://lore.kernel.org/io-uring/[email protected]/

It exposes a distinct flag (in sqe->ext_cap) for each attribute/cap, and
userspace should place the corresponding information where the kernel
has mandated.

If a particular attribute (for example, write-hint) requires <20b of
extra information, we should just place that in the first SQE. PI
requires more, so we are placing that into the second SQE.

When the user specifies both the PI and write-hint flags, they get
processed fine without anyone having to care about the above
additions/ordering.
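
To illustrate the fixed-placement idea, here is a minimal userspace
sketch. It is not the v8 code: the struct layouts, the demo_* names, the
field sizes and the bit values are all assumptions made for
illustration. Only sqe->ext_cap and the EXT_CAP_* flag names come from
this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Flag names follow this thread; the bit values are illustrative. */
#define EXT_CAP_PI		(1U << 0)	/* PI info sits in the second SQE */
#define EXT_CAP_WRITE_HINT	(1U << 1)	/* hint sits in the first SQE */

/* Hypothetical PI payload; not the layout from the patch series. */
struct demo_pi {
	uint64_t addr;
	uint32_t len;
	uint16_t app_tag;
	uint16_t flags;
};

/* Simplified stand-in for a primary/secondary SQE pair. */
struct demo_sqe_pair {
	struct {
		uint16_t ext_cap;	/* one bit per attribute */
		uint16_t write_hint;	/* fixed slot for the hint */
	} first;
	struct {
		struct demo_pi pi;	/* fixed slot for PI */
	} second;
};

/* What the request ends up with after prep. */
struct demo_req {
	int has_pi;
	int has_hint;
	struct demo_pi pi;
	uint16_t write_hint;
};

/*
 * Each attribute is read from its own mandated slot, so there is no
 * offset arithmetic and no dependence on the order of attributes.
 */
static int demo_prep(const struct demo_sqe_pair *sqe, struct demo_req *req)
{
	uint16_t caps;

	assert(sqe && req);
	caps = sqe->first.ext_cap;
	memset(req, 0, sizeof(*req));

	if (caps & ~(EXT_CAP_PI | EXT_CAP_WRITE_HINT))
		return -EINVAL;		/* unknown attribute flag */
	if (caps & EXT_CAP_WRITE_HINT) {
		req->write_hint = sqe->first.write_hint;
		req->has_hint = 1;
	}
	if (caps & EXT_CAP_PI) {
		req->pi = sqe->second.pi;
		req->has_pi = 1;
	}
	return 0;
}
```

Setting either flag, or both, leads to the same two independent checks;
nothing in demo_prep() depends on which attribute is "first".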

> Structures are likely not fixed size (?). At least the PI looks large
> enough to force everyone to be just aliased to it.
> 
> And can the user pass first meta2 in the sqe and then meta1?

Yes. Just set the ext_cap flags without bothering about first/second.
The user can pass either or both, along with the corresponding info.
There is no need to assume any particular ordering within the SQE.


> meta2 = (void *)get_big_sqe(sqe);
> meta1 = meta2 + sizeof(?); // sizeof(struct meta2_struct)
> 
> If yes, how parsing should look like? Does the kernel need to read each
> chunk's type and look up its size to iterate to the next one?

We don't need to iterate if we are not assuming any ordering.

> If no, what happens if we want to pass meta2 and meta3, do they start
> from the big_sqe?

Whoever adds support for meta2/meta3 in the kernel decides where to
place them within the first/second SQE, or whether to fetch them via a
pointer from userspace.

> How do we pass how many of such attributes is there for the request?

ext_cap allows passing 16 cap/attribute flags. Whether all of them can
be passed inline in the SQE is unclear, as I have no real visibility
into the space requirements of future users.


> It should support arbitrary number of attributes in the long run, which
> we can't pass in an SQE, bumping the SQE size is not scalable in
> general, so it'd need to support user pointers or sth similar at some
> point. Placing them in an SQE can serve as an optimisation, and a first
> step, though it might be easier to start with user pointer instead.
> 
> Also, when we eventually come to user pointers, we want it to be
> performant as well and e.g. get by just one copy_from_user, and the
> api/struct layouts would need to be able to support it. And once it's
> copied we'll want it to be handled uniformly with the SQE variant, that
> requires a common format. For different formats there will be a question
> of performance, maintainability, duplicating kernel and userspace code.
> 
> All that doesn't need to be implemented, but we need a clear direction
> for the API. Maybe we can get a simplified user space pseudo code
> showing how the end API is supposed to look like?

Yes. For a large/arbitrary number, we may have to fetch the entire
attribute list using a user pointer/len combo, and then parse it (that's
where all your previous questions fit).

And that can still be added on top of v8.
For example, by adding a flag (in ext_cap) that disables inline-SQE
processing and switches to an external attribute buffer:

/* Second SQE has PI information */
#define EXT_CAP_PI		(1U << 0)
/* First SQE has hint information */
#define EXT_CAP_WRITE_HINT	(1U << 1)
/*
 * Do not assume CAP presence in SQE, and fetch capability buffer page
 * instead
 */
#define EXT_CAP_INDIRECT	(1U << 2)

The corresponding pointer (and/or len) can be put into the last 16b of
the SQE. Use the same flags/structures for the given attributes within
this buffer. That will keep things uniform and will reuse the same
handling that we add for inline attributes.
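
A rough userspace sketch of how the indirect path could reuse the inline
handling is below. Everything named demo_* is hypothetical, as are the
buffer layout and the attr_buf/attr_len fields (standing in for the
pointer/len mentioned above). memcpy() stands in for what would be a
single copy_from_user() in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define EXT_CAP_PI		(1U << 0)
#define EXT_CAP_WRITE_HINT	(1U << 1)
#define EXT_CAP_INDIRECT	(1U << 2)

/* Hypothetical PI payload; not the layout from the patch series. */
struct demo_pi {
	uint64_t addr;
	uint32_t len;
	uint16_t app_tag;
	uint16_t flags;
};

/* One attribute layout, shared by the inline and indirect paths. */
struct demo_attrs {
	uint16_t write_hint;
	struct demo_pi pi;
};

/* Assumed external buffer: its own flags plus the same structures. */
struct demo_attr_buf {
	uint16_t caps;
	struct demo_attrs attrs;
};

/* Simplified stand-in for the SQE; not the real struct io_uring_sqe. */
struct demo_sqe {
	uint16_t ext_cap;
	struct demo_attrs inline_attrs;	/* stands in for the SQE slots */
	const void *attr_buf;		/* assumed pointer field */
	uint32_t attr_len;		/* assumed length field */
};

struct demo_req {
	int has_pi;
	int has_hint;
	struct demo_pi pi;
	uint16_t write_hint;
};

/* Common handler, used no matter where the attributes came from. */
static int demo_apply(uint16_t caps, const struct demo_attrs *a,
		      struct demo_req *req)
{
	if (caps & ~(EXT_CAP_PI | EXT_CAP_WRITE_HINT))
		return -EINVAL;		/* unknown or nested-indirect flag */
	if (caps & EXT_CAP_WRITE_HINT) {
		req->write_hint = a->write_hint;
		req->has_hint = 1;
	}
	if (caps & EXT_CAP_PI) {
		req->pi = a->pi;
		req->has_pi = 1;
	}
	return 0;
}

static int demo_prep(const struct demo_sqe *sqe, struct demo_req *req)
{
	struct demo_attr_buf buf;

	assert(sqe && req);
	memset(req, 0, sizeof(*req));

	/* Inline path: attributes sit in their fixed SQE slots. */
	if (!(sqe->ext_cap & EXT_CAP_INDIRECT))
		return demo_apply(sqe->ext_cap, &sqe->inline_attrs, req);

	/* Indirect path: ignore the inline slots, fetch the buffer once. */
	if (!sqe->attr_buf || sqe->attr_len != sizeof(buf))
		return -EINVAL;
	memcpy(&buf, sqe->attr_buf, sizeof(buf));
	return demo_apply(buf.caps, &buf.attrs, req);
}
```

In this sketch both paths end in demo_apply(), so the parsing logic is
written once and only the source of the attributes differs.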
