On Thu, Mar 24, 2022 at 07:32:18AM +0100, Christoph Hellwig wrote:
>On Tue, Mar 22, 2022 at 10:40:27PM +0530, Kanchan Joshi wrote:
>> On Fri, Mar 11, 2022 at 11:57 AM Christoph Hellwig wrote:
>> > > And that's because this ioctl requires additional "__u64 result;" to
>> > > be updated within "struct nvme_passthru_cmd64".
>> > > To update that during completion, we need, at the least, the result
>> > > field to be a pointer "__u64 result_ptr" inside the struct
>> > > nvme_passthru_cmd64.
>> > > Do you see that is possible without adding a new passthru ioctl in
>> > > nvme?
>> >
>> > We don't need a new passthrough ioctl in nvme.
>>
>> Right. Maybe it is easier for applications if they get to use the same
>> ioctl opcode/structure that they know well already.
>
>I disagree. Reusing the same opcode and/or structure for something
>fundamentally different creates major confusion. Don't do it.

Ok. If you are open to taking the new opcode/struct route, that is all we
require to pair with the big SQE and have this sorted. How about this:

+/* same as nvme_passthru_cmd64 but expects the result field to be a pointer */
+struct nvme_passthru_cmd64_ind {
+	__u8	opcode;
+	__u8	flags;
+	__u16	rsvd1;
+	__u32	nsid;
+	__u32	cdw2;
+	__u32	cdw3;
+	__u64	metadata;
+	__u64	addr;
+	__u32	metadata_len;
+	union {
+		__u32	data_len; /* for non-vectored io */
+		__u32	vec_cnt; /* for vectored io */
+	};
+	__u32	cdw10;
+	__u32	cdw11;
+	__u32	cdw12;
+	__u32	cdw13;
+	__u32	cdw14;
+	__u32	cdw15;
+	__u32	timeout_ms;
+	__u32	rsvd2;
+	__u64	presult; /* pointer to result */
+};
+
 #define nvme_admin_cmd nvme_passthru_cmd
+#define NVME_IOCTL_IO64_CMD_IND	_IOWR('N', 0x50, struct nvme_passthru_cmd64_ind)

This is not heavy on code volume either, because only one statement
(updating the result) changes and we reuse everything else. Rough sketches
of both the user-space and kernel side are appended at the end of this mail.

>> From all that we discussed, maybe the path forward could be this:
>> - inline-cmd/big-sqe is useful if paired with big-cqe. Drop big-sqe
>> for now if we cannot go the big-cqe route.
>> - use only indirect-cmd as this requires nothing special, just regular
>> sqe and cqe. We can support all passthru commands with a lot less
>> code. No new ioctl in nvme, so same semantics. For common commands
>> (i.e. read/write) we can still avoid updating the result (the
>> put_user cost will go away).
>>
>> Please suggest if we should approach this any differently in v2.
>
>Personally I think larger SQEs and CQEs are the only sensible interface
>here. Everything else just feels like a horrible hack I would not want
>to support in NVMe.

So far we have gathered three choices:

(a) big-sqe + new opcode/struct in nvme
(b) big-sqe + big-cqe
(c) regular-sqe + regular-cqe

I can post an RFC on the big-cqe work if that is what it takes to evaluate
clearly which path to take. But really, the code is much larger than for
choices (a) and (c). Differentiating one CQE from another does not seem
very maintenance-friendly, particularly in liburing.

For (c), I did not get which part feels like a horrible hack. It is the
same as how we do sync passthru: read the passthru command from user-space
memory, and update the result in it on completion.

But yes, (a) seems like the best option to me.
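
To make the intended usage concrete, here is a rough, untested user-space
sketch. It assumes the struct and ioctl definitions from the diff above
have landed in <linux/nvme_ioctl.h>; the device path, namespace, and read
parameters are just placeholders:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>	/* assumes the proposed additions above */

int main(void)
{
	struct nvme_passthru_cmd64_ind cmd;
	__u64 result = 0;		/* kernel writes the CQE result here */
	char buf[4096];
	int fd, status;

	fd = open("/dev/nvme0n1", O_RDWR);	/* placeholder device */
	if (fd < 0)
		return 1;

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = 0x02;			/* NVMe read */
	cmd.nsid = 1;
	cmd.addr = (__u64)(uintptr_t)buf;
	cmd.data_len = sizeof(buf);
	cmd.cdw10 = 0;				/* starting LBA, low 32 bits */
	cmd.cdw12 = (sizeof(buf) >> 9) - 1;	/* 0-based block count, 512B LBAs */
	cmd.presult = (__u64)(uintptr_t)&result; /* pointer, not an inline field */

	status = ioctl(fd, NVME_IOCTL_IO64_CMD_IND, &cmd);
	printf("status %d result %llu\n", status, (unsigned long long)result);
	close(fd);
	return status < 0;
}

The only user-visible change from NVME_IOCTL_IO64_CMD is that the CQE
result comes back through the presult pointer rather than an inline field,
so the kernel never has to copy the whole struct back.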
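
And to back the "only one statement changes" claim, a minimal sketch of
the kernel side, assuming the shape of today's nvme_user_cmd64() (a cmd
copied in from user space, and nvme_submit_user_cmd() already returning
the CQE result through a local u64); the exact code will differ by tree:

	/* today (nvme_user_cmd64): copy the result back into the struct */
	if (status >= 0 && put_user(cmd.result, &ucmd->result))
		status = -EFAULT;

	/* proposed: write it through the user pointer carried in presult */
	if (status >= 0 && cmd.presult &&
	    put_user(result, (__u64 __user *)(uintptr_t)cmd.presult))
		status = -EFAULT;

For the async path the same statement runs at completion time instead of
in the submitter's context, which is exactly why the field has to be a
pointer rather than an inline result.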