From: Kanchan Joshi
Date: Mon, 4 Apr 2022 20:44:20 +0530
Subject: Re: [RFC 3/5] io_uring: add infra and support for IORING_OP_URING_CMD
To: Christoph Hellwig
Cc: Kanchan Joshi, Jens Axboe, io-uring@vger.kernel.org,
	linux-nvme@lists.infradead.org, Pavel Begunkov, Ming Lei,
	Luis Chamberlain, Pankaj Raghav, Javier González, Anuj Gupta
In-Reply-To: <20220404071656.GC444@lst.de>
References: <20220401110310.611869-1-joshi.k@samsung.com>
	<20220401110310.611869-4-joshi.k@samsung.com>
	<20220404071656.GC444@lst.de>

On Mon, Apr 4, 2022 at 12:46 PM Christoph Hellwig wrote:
>
> Can we please spell out infrastructure here? Or did you mean
> infrared anyway :)

:-) Sure, I see the problem with this shorthand now.

> > -enum io_uring_cmd_flags {
> > -	IO_URING_F_COMPLETE_DEFER	= 1,
> > -	IO_URING_F_UNLOCKED		= 2,
> > -	/* int's last bit, sign checks are usually faster than a bit test */
> > -	IO_URING_F_NONBLOCK		= INT_MIN,
> > -};
>
> This doesn't actually get used anywhere outside of io_uring.c, so why
> move it?

This got missed out. We are going to use it for things like this in
nvme (from the previous series):

+	if (ioucmd && (ioucmd->flags & IO_URING_F_NONBLOCK)) {
+		rq_flags |= REQ_NOWAIT;
+		blk_flags |= BLK_MQ_REQ_NOWAIT;
+	}
+	req = nvme_alloc_request(q, cmd, blk_flags, rq_flags);

Also for polling. Will fix this up in the next version.

> > +static void io_uring_cmd_work(struct io_kiocb *req, bool *locked)
> > +{
> > +	req->uring_cmd.driver_cb(&req->uring_cmd);
> > +}
> > +
> > +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd,
> > +			void (*driver_cb)(struct io_uring_cmd *))
> > +{
> > +	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
> > +
> > +	req->uring_cmd.driver_cb = driver_cb;
> > +	req->io_task_work.func = io_uring_cmd_work;
> > +	io_req_task_work_add(req, !!(req->ctx->flags & IORING_SETUP_SQPOLL));
> > +}
> > +EXPORT_SYMBOL_GPL(io_uring_cmd_complete_in_task);
>
> I'm still not a fan of the double indirect call here. I don't really
> have a good idea yet, but I plan to look into it.
>
> > static void io_req_task_queue_fail(struct io_kiocb *req, int ret)
>
> Also it would be great not to add it between io_req_task_queue_fail and
> the callback set by it.

Right. Will change.

> > +void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret)
> > +{
> > +	struct io_kiocb *req = container_of(ioucmd, struct io_kiocb, uring_cmd);
> > +
> > +	if (ret < 0)
> > +		req_set_fail(req);
> > +	io_req_complete(req, ret);
> > +}
> > +EXPORT_SYMBOL_GPL(io_uring_cmd_done);
>
> It seems like all callers of io_req_complete actually call req_set_fail
> on failure. So maybe it would be a nice pre-cleanup to handle the
> req_set_fail call from io_req_complete?

io_tee and io_splice do slightly different handling. If we take this
route, it seems they can use __io_req_complete() instead.
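For context, that pre-cleanup could look roughly like this (just a
sketch against the current helpers, with io_tee/io_splice switched to
__io_req_complete() as mentioned above):

static inline void io_req_complete(struct io_kiocb *req, s32 res)
{
	/* folded in from the callers: mark failed requests here, so
	 * io_uring_cmd_done() and friends don't have to do it by hand */
	if (res < 0)
		req_set_fail(req);
	__io_req_complete(req, 0, res, 0);
}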
> > +	/* queued async, consumer will call io_uring_cmd_done() when complete */
> > +	if (ret == -EIOCBQUEUED)
> > +		return 0;
> > +	io_uring_cmd_done(ioucmd, ret);
>
> Why not:
>
>	if (ret != -EIOCBQUEUED)
>		io_uring_cmd_done(ioucmd, ret);
>	return 0;
>
> That being said, I wonder why not just remove the return value from
> ->async_cmd entirely and just require the implementation to always call
> io_uring_cmd_done? That removes the confusion on who needs to call it
> entirely, similarly to what we do in the block layer for ->submit_bio.

Right, this seems doable at this point, as we do not do any special
handling (in io_uring) of the return code.
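To make that concrete, the issue side would then reduce to roughly
this (a sketch only; the io_uring_cmd() wrapper shape is illustrative,
and the void ->async_cmd signature is the hypothetical part):

static int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
{
	struct io_uring_cmd *ioucmd = &req->uring_cmd;

	/* the implementation owns completion now: it must call
	 * io_uring_cmd_done() itself, inline or async, the way
	 * ->submit_bio owns bio completion in the block layer */
	req->file->f_op->async_cmd(ioucmd);
	return 0;
}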
> > +struct io_uring_cmd {
> > +	struct file	*file;
> > +	void		*cmd;
> > +	/* for irq-completion - if driver requires doing stuff in task-context */
> > +	void (*driver_cb)(struct io_uring_cmd *cmd);
> > +	u32		flags;
> > +	u32		cmd_op;
> > +	u16		cmd_len;
>
> The cmd_len field does not seem to actually be used anywhere.

Another thing that got left out of the previous series :-(
We are using this field for a bit of sanity checking at the moment.
Like this in nvme:

+	if (ioucmd->cmd_len != sizeof(struct nvme_passthru_cmd64))
+		return -EINVAL;
+	cptr = (struct nvme_passthru_cmd64 *)ioucmd->cmd;

> > +++ b/include/uapi/linux/io_uring.h
> > @@ -22,10 +22,12 @@ struct io_uring_sqe {
> >	union {
> >		__u64	off;	/* offset into file */
> >		__u64	addr2;
> > +		__u32	cmd_op;
> >	};
> >	union {
> >		__u64	addr;	/* pointer to buffer or iovecs */
> >		__u64	splice_off_in;
> > +		__u16	cmd_len;
> >	};
> >	__u32	len;		/* buffer size or number of iovecs */
> >	union {
> > @@ -60,7 +62,10 @@ struct io_uring_sqe {
> >		__s32	splice_fd_in;
> >		__u32	file_index;
> >	};
> > -	__u64	__pad2[2];
> > +	union {
> > +		__u64	__pad2[2];
> > +		__u64	cmd;
> > +	};
>
> Can someone explain these changes to me a little more?

All three (cmd_op, cmd_len and cmd) are operation-specific fields.
User-space supplies them in the SQE, io_uring packages them into
"struct io_uring_cmd", and passes that down to the provider for the
real processing.

1. cmd_op = operation code for the async command (e.g. a passthrough
ioctl opcode, or whatever else we want to turn async from user-space)

2. cmd_len = actual length of the async command (e.g. we have at most
80 bytes with the big SQE; for nvme passthrough we use the max, but
for some other use case one can go with a smaller length)

3. cmd = the starting offset where the async command is placed (by
user-space) inside the big SQE: 16 bytes in the first SQE, and the
next 64 bytes come from the second SQE.

And if we were doing pointer-based command submission, this "cmd"
would instead have pointed to the user-space command (of whatever
length). Just to put the entire thought process across; I understand
that we are not taking that route.
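For illustration, submission from user-space would then look roughly
like this (a hypothetical sketch: it assumes this series' uapi header,
a ring created in the big-SQE mode the series introduces, and liburing
for the surrounding boilerplate):

#include <string.h>
#include <liburing.h>
#include <linux/nvme_ioctl.h>

static int submit_async_passthru(struct io_uring *ring, int fd)
{
	/* 80-byte NVMe passthrough command, the max that fits in-SQE */
	struct nvme_passthru_cmd64 cmd = {
		.opcode = 0x02,		/* NVMe read, as an example */
		/* ... nsid, addr, data_len, cdw10..15 ... */
	};
	struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

	if (!sqe)
		return -EAGAIN;
	io_uring_prep_rw(IORING_OP_URING_CMD, sqe, fd, NULL, 0, 0);
	sqe->cmd_op = NVME_IOCTL_IO64_CMD;	/* which ioctl to run async */
	sqe->cmd_len = sizeof(cmd);		/* sanity-checked by the driver */
	/* 16 bytes land in the first SQE, the next 64 in the second */
	memcpy(&sqe->cmd, &cmd, sizeof(cmd));
	return io_uring_submit(ring);
}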