public inbox for [email protected]
 help / color / mirror / Atom feed
From: Keith Busch <[email protected]>
To: Christoph Hellwig <[email protected]>
Cc: Kanchan Joshi <[email protected]>,
	Anuj gupta <[email protected]>,
	Anuj Gupta <[email protected]>,
	[email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected],
	[email protected], [email protected]
Subject: Re: [PATCH v7 06/10] io_uring/rw: add support to send metadata along with read/write
Date: Tue, 5 Nov 2024 09:23:19 -0700	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On Tue, Nov 05, 2024 at 05:00:51PM +0100, Christoph Hellwig wrote:
> On Tue, Nov 05, 2024 at 09:21:27PM +0530, Kanchan Joshi wrote:
> > Can add the documentation (if this version is palatable for Jens/Pavel), 
> > but this was discussed in previous iteration:
> > 
> > 1. Each meta type may have different space requirement in SQE.
> > 
> > Only for PI, we need so much space that we can't fit that in first SQE. 
> > The SQE128 requirement is only for PI type.
> > Another different meta type may just fit into the first SQE. For that we 
> > don't have to mandate SQE128.
> 
> Ok, I'm really confused now.  The way I understood Anuj was that this
> is NOT about block level metadata, but about other uses of the big SQE.
> 
> Which version is right?  Or did I just completely misunderstand Anuj?

Let's not call this "meta_type". Can we use something that has a less
overloaded meaning, like "sqe_extended_capabilities", or "ecap", or
something like that.
 
> > 2. If two meta types are known not to co-exist, they can be kept in the 
> > same place within SQE. Since each meta-type is a flag, we can check what 
> > combinations are valid within io_uring and throw the error in case of 
> > incompatibility.
> 
> And this sounds like what you refer to is not actually block metadata
> as in this patchset or nvme, (or weirdly enough integrity in the block
> layer code).
> 
> > 3. Previous version was relying on SQE128 flag. If user set the ring 
> > that way, it is assumed that PI information was sent.
> > This is more explicitly conveyed now - if user passed META_TYPE_PI flag, 
> > it has sent the PI. This comment in the code:
> > 
> > +       /* if sqe->meta_type is META_TYPE_PI, last 32 bytes are for PI */
> > +       union {
> > 
> > If this flag is not passed, parsing of second SQE is skipped, which is 
> > the current behavior as now also one can send regular (non pi) 
> > read/write on SQE128 ring.
> 
> And while I don't understand how this threads in with the previous
> statements, this makes sense.  If you only want to send a pointer (+len)
> to metadata you can use the normal 64-byte SQE.  If you want to send
> a PI tuple you need SEQ128.  Is that what the various above statements
> try to express?  If so the right API to me would be to have two flags:
> 
>  - a flag that a pointer to metadata is passed.  This can work with
>    a 64-bit SQE.
>  - another flag that a PI tuple is passed.  This requires a 128-byte
>    and also the previous flag.

I don't think anything done so far aligns with what Pavel had in mind.
Let me try to lay out what I think he's going for. Just bare with me,
this is just a hypothetical example.

  This patch adds a PI extension.
  Later, let's say write streams needs another extenion.
  Then key per-IO wants another extention.
  Then someone else adds wizbang-awesome-feature extention.

Let's say you have device that can do all 4, or any combination of them.
Pavel wants a solution that is future proof to such a scenario. So not
just a single new "meta_type" with its structure, but a list of types in
no particular order, and their structures.

That list can exist either in the extended SQE, or in some other user
address that the kernel will need copy.

  reply	other threads:[~2024-11-05 16:23 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20241104141427epcas5p2174ded627e2d785294ac4977b011a75b@epcas5p2.samsung.com>
2024-11-04 14:05 ` [PATCH v7 00/10] Read/Write with meta/integrity Anuj Gupta
     [not found]   ` <CGME20241104141445epcas5p3fa11a5bebe88ac2bb3541850369591f7@epcas5p3.samsung.com>
2024-11-04 14:05     ` [PATCH v7 01/10] block: define set of integrity flags to be inherited by cloned bip Anuj Gupta
     [not found]   ` <CGME20241104141448epcas5p4179505e12f9cf45fd792dc6da6afce8e@epcas5p4.samsung.com>
2024-11-04 14:05     ` [PATCH v7 02/10] block: copy back bounce buffer to user-space correctly in case of split Anuj Gupta
2024-11-05 10:03       ` Christoph Hellwig
2024-11-05 13:15         ` Anuj gupta
     [not found]   ` <CGME20241104141451epcas5p2aef1f93e905c27e34b3e16d89ff39245@epcas5p2.samsung.com>
2024-11-04 14:05     ` [PATCH v7 03/10] block: modify bio_integrity_map_user to accept iov_iter as argument Anuj Gupta
     [not found]   ` <CGME20241104141453epcas5p201e4aabfa7aa1f4af1cdf07228f8d4e7@epcas5p2.samsung.com>
2024-11-04 14:05     ` [PATCH v7 04/10] fs, iov_iter: define meta io descriptor Anuj Gupta
2024-11-05  9:55       ` Christoph Hellwig
     [not found]   ` <CGME20241104141456epcas5p38fef2ccde087de84ffc6f479f50e8071@epcas5p3.samsung.com>
2024-11-04 14:05     ` [PATCH v7 05/10] fs: introduce IOCB_HAS_METADATA for metadata Anuj Gupta
     [not found]   ` <CGME20241104141459epcas5p27991e140158b1e7294b4d6c4e767373c@epcas5p2.samsung.com>
2024-11-04 14:05     ` [PATCH v7 06/10] io_uring/rw: add support to send metadata along with read/write Anuj Gupta
2024-11-05  9:56       ` Christoph Hellwig
2024-11-05 13:04         ` Anuj gupta
2024-11-05 13:56           ` Christoph Hellwig
2024-11-05 15:51             ` Kanchan Joshi
2024-11-05 16:00               ` Christoph Hellwig
2024-11-05 16:23                 ` Keith Busch [this message]
2024-11-05 16:50                   ` Kanchan Joshi
2024-11-06  5:29                   ` Christoph Hellwig
2024-11-06  6:00                     ` Kanchan Joshi
2024-11-06  6:12                       ` Christoph Hellwig
2024-11-05 16:38                 ` Kanchan Joshi
2024-11-06  5:33                   ` Christoph Hellwig
     [not found]   ` <CGME20241104141501epcas5p38203d98ce0b2ac95cc45e02a142e84ef@epcas5p3.samsung.com>
2024-11-04 14:05     ` [PATCH v7 07/10] block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags Anuj Gupta
     [not found]   ` <CGME20241104141504epcas5p47e46a75f9248a37c9a4180de8e72b54c@epcas5p4.samsung.com>
2024-11-04 14:05     ` [PATCH v7 08/10] nvme: add support for passing on the application tag Anuj Gupta
     [not found]   ` <CGME20241104141507epcas5p161e39cef85f8fa5f5ad59e959e070d0b@epcas5p1.samsung.com>
2024-11-04 14:06     ` [PATCH v7 09/10] scsi: add support for user-meta interface Anuj Gupta
     [not found]   ` <CGME20241104141509epcas5p4ed0c68c42ccad27f9a38dc0c0ef7628d@epcas5p4.samsung.com>
2024-11-04 14:06     ` [PATCH v7 10/10] block: add support to pass user meta buffer Anuj Gupta

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox