From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5A0A8C433EF for ; Mon, 28 Mar 2022 04:44:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238062AbiC1EqW (ORCPT ); Mon, 28 Mar 2022 00:46:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S238061AbiC1EqW (ORCPT ); Mon, 28 Mar 2022 00:46:22 -0400 Received: from mail-lj1-x22e.google.com (mail-lj1-x22e.google.com [IPv6:2a00:1450:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 36E86DFCE; Sun, 27 Mar 2022 21:44:42 -0700 (PDT) Received: by mail-lj1-x22e.google.com with SMTP id b43so13028720ljr.10; Sun, 27 Mar 2022 21:44:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=21sZB0cQIQyT4qQnFzBbbk2zGnmsUzVxHD9eQSWsXSM=; b=O+KOuXPC6hePjUcR3JaYu2NtHEyI4LardvYhun3ke3JbEboM/6CkPnkJJDAwFmCcv7 XlcLW2To4Am+zfpcyU4zlLlZzHZI46nNlJyy1vUsFRXiPN7dcvH/twJeHnkgJPaoyRUh fUyarJt5921dZk4m8OTY+6K5lNXhwvJqqSwhJvlJh8/9SPGCe9S/DNJeoaXJuSCxBKO0 f2vglgLp/NzEks/xBr5ok9jM+Sg6bLyOl24wwxTIneIEdSQPUxJlCwayzQ38atUrtbUF hjR0CVFY+f0Vd6gWn+zf607BCtz0MUpVkAzRLoQwN8jIv4Y8mltXT8kuSaiey7EO+gmQ AX6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=21sZB0cQIQyT4qQnFzBbbk2zGnmsUzVxHD9eQSWsXSM=; b=p7wgUlKc5WOOdMwqLbGJTt0ogCBi/j4H0kFSvJRfHq5DtYzqDjZonWh07Y4z2QHCbr TMHdmUsmnkfKnaCCqpWh5Z5q77VhNtSqc2M21VtivGNGNtP4nTczxzUL5ft8q560wwUp gxcKdBr3o9ncL5JTB4zIzP6I4JcwktbaZ0WMZ9G81h6XYRDBo1luovAMgcpm0ZSSMngB LwX7h1z9IUrVdc4XuTE8NRlee3t7ryPSz0K9flYi0IwtqsAkNrIC+fcvenTFw49QGWmS 8l0wRn/VtL4a0m/mAxpEx959OXQuRuFeELcKLyeRgomO5kqBA4hsus/PEnqflSs4AdeB nNNg== X-Gm-Message-State: AOAM532hbr3uhcm6okbXEP4lLAEZ5nVB4+Vqsp828vpBzz3Ta56qUWNr w/Qv6dW66qTsf1P2+/qUBw8BJRJSGsoXonKOeic= X-Google-Smtp-Source: ABdhPJxgExojDOUG55D0M4bMrvzsdvV63x26gqkghravOLYEjaWru8kgts2OevmizkhBxwU1zczAKfndxGYVXDqDtk4= X-Received: by 2002:a2e:302:0:b0:24a:c997:d34c with SMTP id 2-20020a2e0302000000b0024ac997d34cmr7433761ljd.445.1648442680381; Sun, 27 Mar 2022 21:44:40 -0700 (PDT) MIME-Version: 1.0 References: <20220308152105.309618-1-joshi.k@samsung.com> <20220308152105.309618-18-joshi.k@samsung.com> <20220310083652.GF26614@lst.de> <20220310141945.GA890@lst.de> <20220311062710.GA17232@lst.de> <20220324063218.GC12660@lst.de> <20220325133921.GA13818@test-zns> In-Reply-To: <20220325133921.GA13818@test-zns> From: Kanchan Joshi Date: Mon, 28 Mar 2022 10:14:13 +0530 Message-ID: Subject: Re: [PATCH 17/17] nvme: enable non-inline passthru commands To: Kanchan Joshi Cc: Christoph Hellwig , Jens Axboe , Keith Busch , Pavel Begunkov , io-uring@vger.kernel.org, linux-nvme@lists.infradead.org, linux-block@vger.kernel.org, sbates@raithlin.com, logang@deltatee.com, Pankaj Raghav , =?UTF-8?Q?Javier_Gonz=C3=A1lez?= , Luis Chamberlain , Adam Manzanares , Anuj Gupta Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: io-uring@vger.kernel.org > >I disagree. Reusing the same opcode and/or structure for something > >fundamentally different creates major confusion. Don't do it. > > Ok. If you are open to take new opcode/struct route, that is all we > require to pair with big-sqe and have this sorted. How about this - > > +/* same as nvme_passthru_cmd64 but expecting result field to be pointer */ > +struct nvme_passthru_cmd64_ind { > + __u8 opcode; > + __u8 flags; > + __u16 rsvd1; > + __u32 nsid; > + __u32 cdw2; > + __u32 cdw3; > + __u64 metadata; > + __u64 addr; > + __u32 metadata_len; > + union { > + __u32 data_len; /* for non-vectored io */ > + __u32 vec_cnt; /* for vectored io */ > + }; > + __u32 cdw10; > + __u32 cdw11; > + __u32 cdw12; > + __u32 cdw13; > + __u32 cdw14; > + __u32 cdw15; > + __u32 timeout_ms; > + __u32 rsvd2; > + __u64 presult; /* pointer to result */ > +}; > + > #define nvme_admin_cmd nvme_passthru_cmd > > +#define NVME_IOCTL_IO64_CMD_IND _IOWR('N', 0x50, struct nvme_passthru_cmd64_ind) > > Not heavy on code-volume too, because only one statement (updating > result) changes and we reuse everything else. > > >> >From all that we discussed, maybe the path forward could be this: > >> - inline-cmd/big-sqe is useful if paired with big-cqe. Drop big-sqe > >> for now if we cannot go the big-cqe route. > >> - use only indirect-cmd as this requires nothing special, just regular > >> sqe and cqe. We can support all passthru commands with a lot less > >> code. No new ioctl in nvme, so same semantics. For common commands > >> (i.e. read/write) we can still avoid updating the result (put_user > >> cost will go). > >> > >> Please suggest if we should approach this any differently in v2. > > > >Personally I think larger SQEs and CQEs are the only sensible interface > >here. Everything else just fails like a horrible hack I would not want > >to support in NVMe. > > So far we have gathered three choices: > > (a) big-sqe + new opcode/struct in nvme > (b) big-sqe + big-cqe > (c) regular-sqe + regular-cqe > > I can post a RFC on big-cqe work if that is what it takes to evaluate > clearly what path to take. But really, the code is much more compared > to choice (a) and (c). Differentiating one CQE with another does not > seem very maintenance-friendly, particularly in liburing. > > For (c), I did not get what part feels like horrible hack. > It is same as how we do sync passthru - read passthru command from > user-space memory, and update result into that on completion. > But yes, (a) seems like the best option to me. Thinking a bit more on "(b) big-sqe + big-cqe". Will that also require a new ioctl (other than NVME_IOCTL_IO64_CMD) in nvme? Because semantics will be slightly different (i.e. not updating the result inside the passthrough command but sending it out-of-band to io_uring). Or am I just overthinking it.