On Thu, Mar 24, 2022 at 07:32:18AM +0100, Christoph Hellwig wrote:
>On Tue, Mar 22, 2022 at 10:40:27PM +0530, Kanchan Joshi wrote:
>> On Fri, Mar 11, 2022 at 11:57 AM Christoph Hellwig wrote:
>> > > And that's because this ioctl requires additional "__u64 result;" to
>> > > be updated within "struct nvme_passthru_cmd64".
>> > > To update that during completion, we need, at the least, the result
>> > > field to be a pointer "__u64 result_ptr" inside the struct
>> > > nvme_passthru_cmd64.
>> > > Do you see that is possible without adding a new passthru ioctl in
>> > > nvme?
>> >
>> > We don't need a new passthrough ioctl in nvme.
>>
>> Right. Maybe it is easier for applications if they get to use the same
>> ioctl opcode/structure that they know well already.
>
>I disagree. Reusing the same opcode and/or structure for something
>fundamentally different creates major confusion. Don't do it.

Ok. If you are open to taking the new opcode/struct route, that is all we
require to pair with the big SQE and have this sorted. How about this:

+/* same as nvme_passthru_cmd64 but expects the result field to be a pointer */
+struct nvme_passthru_cmd64_ind {
+	__u8	opcode;
+	__u8	flags;
+	__u16	rsvd1;
+	__u32	nsid;
+	__u32	cdw2;
+	__u32	cdw3;
+	__u64	metadata;
+	__u64	addr;
+	__u32	metadata_len;
+	union {
+		__u32	data_len; /* for non-vectored io */
+		__u32	vec_cnt; /* for vectored io */
+	};
+	__u32	cdw10;
+	__u32	cdw11;
+	__u32	cdw12;
+	__u32	cdw13;
+	__u32	cdw14;
+	__u32	cdw15;
+	__u32	timeout_ms;
+	__u32	rsvd2;
+	__u64	presult; /* pointer to result */
+};
+
 #define nvme_admin_cmd nvme_passthru_cmd
+#define NVME_IOCTL_IO64_CMD_IND	_IOWR('N', 0x50, struct nvme_passthru_cmd64_ind)

This is not heavy on code volume either, because only one statement
(updating the result) changes and we reuse everything else. Rough sketches
of both the user-space and kernel side are appended at the end of this mail.

>> From all that we discussed, maybe the path forward could be this:
>> - inline-cmd/big-sqe is useful if paired with big-cqe. Drop big-sqe
>> for now if we cannot go the big-cqe route.
>> - use only indirect-cmd as this requires nothing special, just regular
>> sqe and cqe. We can support all passthru commands with a lot less
>> code. No new ioctl in nvme, so same semantics. For common commands
>> (i.e. read/write) we can still avoid updating the result (the
>> put_user cost will go away).
>>
>> Please suggest if we should approach this any differently in v2.
>
>Personally I think larger SQEs and CQEs are the only sensible interface
>here. Everything else just feels like a horrible hack I would not want
>to support in NVMe.

So far we have gathered three choices:

(a) big-sqe + new opcode/struct in nvme
(b) big-sqe + big-cqe
(c) regular-sqe + regular-cqe

I can post an RFC on the big-cqe work if that is what it takes to evaluate
clearly which path to take. But really, the code is much larger than for
choices (a) and (c). Differentiating one CQE from another does not seem
very maintenance-friendly, particularly in liburing.

For (c), I did not get which part feels like a horrible hack. It is the
same as how we do sync passthru: read the passthru command from user-space
memory, and update the result in it on completion.

But yes, (a) seems like the best option to me.
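
To make the intended usage concrete, here is a rough, untested user-space
sketch. It assumes the struct and ioctl definitions from the diff above
have landed in <linux/nvme_ioctl.h>; the device path, namespace, and read
parameters are just placeholders:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/nvme_ioctl.h>	/* assumes the proposed additions above */

int main(void)
{
	struct nvme_passthru_cmd64_ind cmd;
	__u64 result = 0;		/* kernel writes the CQE result here */
	char buf[4096];
	int fd, status;

	fd = open("/dev/nvme0n1", O_RDWR);	/* placeholder device */
	if (fd < 0)
		return 1;

	memset(&cmd, 0, sizeof(cmd));
	cmd.opcode = 0x02;			/* NVMe read */
	cmd.nsid = 1;
	cmd.addr = (__u64)(uintptr_t)buf;
	cmd.data_len = sizeof(buf);
	cmd.cdw10 = 0;				/* starting LBA, low 32 bits */
	cmd.cdw12 = (sizeof(buf) >> 9) - 1;	/* 0-based block count, 512B LBAs */
	cmd.presult = (__u64)(uintptr_t)&result; /* pointer, not an inline field */

	status = ioctl(fd, NVME_IOCTL_IO64_CMD_IND, &cmd);
	printf("status %d result %llu\n", status, (unsigned long long)result);
	close(fd);
	return status < 0;
}

The only user-visible change from NVME_IOCTL_IO64_CMD is that the CQE
result comes back through the presult pointer rather than an inline field,
so the kernel never has to copy the whole struct back.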
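
And to back the "only one statement changes" claim, a minimal sketch of
the kernel side, assuming the shape of today's nvme_user_cmd64() (a cmd
copied in from user space, and nvme_submit_user_cmd() already returning
the CQE result through a local u64); the exact code will differ by tree:

	/* today (nvme_user_cmd64): copy the result back into the struct */
	if (status >= 0 && put_user(cmd.result, &ucmd->result))
		status = -EFAULT;

	/* proposed: write it through the user pointer carried in presult */
	if (status >= 0 && cmd.presult &&
	    put_user(result, (__u64 __user *)(uintptr_t)cmd.presult))
		status = -EFAULT;

For the async path the same statement runs at completion time instead of
in the submitter's context, which is exactly why the field has to be a
pointer rather than an inline result.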