From: Jens Axboe <[email protected]>
To: Kanchan Joshi <[email protected]>, [email protected]
Cc: [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected], [email protected], [email protected],
	[email protected]
Subject: Re: [PATCH v4 1/5] fs,io_uring: add infrastructure for uring-cmd
Date: Thu, 5 May 2022 11:04:41 -0600
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 5/5/22 10:17 AM, Jens Axboe wrote:
> On 5/5/22 12:06 AM, Kanchan Joshi wrote:
>> +static int io_uring_cmd_prep(struct io_kiocb *req,
>> +			     const struct io_uring_sqe *sqe)
>> +{
>> +	struct io_uring_cmd *ioucmd = &req->uring_cmd;
>> +	struct io_ring_ctx *ctx = req->ctx;
>> +
>> +	if (ctx->flags & IORING_SETUP_IOPOLL)
>> +		return -EOPNOTSUPP;
>> +	/* do not support uring-cmd without big SQE/CQE */
>> +	if (!(ctx->flags & IORING_SETUP_SQE128))
>> +		return -EOPNOTSUPP;
>> +	if (!(ctx->flags & IORING_SETUP_CQE32))
>> +		return -EOPNOTSUPP;
>> +	if (sqe->ioprio || sqe->rw_flags)
>> +		return -EINVAL;
>> +	ioucmd->cmd = sqe->cmd;
>> +	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
>> +	return 0;
>> +}
> 
> While looking at the other suggested changes, I noticed a more
> fundamental issue with the passthrough support. For any other command,
> SQE contents are stable once prep has been done. The above does do that
> for the basic items, but this case is special as the lower-level command
> itself resides in the SQE.
> 
> For cases where the command needs deferral, it's problematic. There are
> two main ways this can happen:
> 
> - The issue attempt yields -EAGAIN (we ran out of requests, etc). If you
>   look at other commands, when they have data that doesn't fit in the
>   io_kiocb itself, they need to allocate room for that data and have
>   it be persistent.
> 
> - Deferral is specified by the application, using e.g. IOSQE_IO_LINK or
>   IOSQE_ASYNC.
> 
> We're totally missing support for both of these cases. Consider the case
> where the ring is set up with an SQ size of 1. You prep a passthrough
> command (request A) and issue it with io_uring_submit(). Due to one of
> the two above mentioned conditions, the internal request is deferred.
> Either it was sent to ->uring_cmd() but we got -EAGAIN, or it was
> deferred even before that happened. The application doesn't know this
> happened, it gets another SQE to submit a new request (request B). Fills
> it in, calls io_uring_submit(). Since we only have one SQE available in
> that ring, when request A gets re-issued, it's now happily reading SQE
> contents from command B. Oops.
> 
> This is why prep handlers are the only ones that get an sqe passed to
> them. They are supposed to ensure that we no longer read from the SQE
> past that. Applications can always rely on the fact that once
> io_uring_submit() has been done, which consumes the SQE in the SQ ring,
> no further reads are done from that SQE.
> 
> IOW, we need io_req_prep_async() handling for URING_CMD, which needs to
> allocate the full 80 bytes and copy them over. Subsequent issue attempts
> will then use that memory rather than the SQE parts. Just need to find a
> sane way to do that so we don't need to make the usual prep + direct
> issue path any slower.
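
To make the reuse hazard above concrete, here's a minimal userspace
sketch (liburing, single-entry SQ, plus the SQE128/CQE32 setup flags
from this series; illustrative only, not part of the patch below):

#include <liburing.h>

static void reuse_hazard(void)
{
	struct io_uring ring;
	struct io_uring_sqe *a, *b;

	/* one SQ entry, so A and B share the same SQE slot */
	io_uring_queue_init(1, &ring, IORING_SETUP_SQE128 |
				      IORING_SETUP_CQE32);

	a = io_uring_get_sqe(&ring);
	/* fill in passthrough request A; the command itself lives in
	 * the SQE */
	io_uring_submit(&ring);		/* A may get deferred internally */

	b = io_uring_get_sqe(&ring);
	/* same slot that held A's SQE -- a deferred re-issue of A that
	 * still reads the SQE would now see B's contents */
	io_uring_submit(&ring);
}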

This one should take care of that. Some parts should be folded into
patch 1, other bits into the nvme adoption.


diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 8c3b15d3e86d..7daa95d50db1 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -411,8 +411,7 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		struct io_uring_cmd *ioucmd, unsigned int issue_flags, bool vec)
 {
-	struct nvme_uring_cmd *cmd =
-		(struct nvme_uring_cmd *)ioucmd->cmd;
+	struct nvme_uring_cmd *cmd = ioucmd->cmd;
 	struct request_queue *q = ns ? ns->queue : ctrl->admin_q;
 	struct nvme_command c;
 	struct request *req;
@@ -548,8 +547,8 @@ long nvme_ns_chr_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	return __nvme_ioctl(ns, cmd, (void __user *)arg);
 }
 
-static void nvme_ns_uring_cmd(struct nvme_ns *ns, struct io_uring_cmd *ioucmd,
-		unsigned int issue_flags)
+static int nvme_ns_uring_cmd(struct nvme_ns *ns, struct io_uring_cmd *ioucmd,
+			     unsigned int issue_flags)
 {
 	int ret;
 
@@ -567,15 +566,15 @@ static void nvme_ns_uring_cmd(struct nvme_ns *ns, struct io_uring_cmd *ioucmd,
 		ret = -ENOTTY;
 	}
 
-	if (ret != -EIOCBQUEUED)
-		io_uring_cmd_done(ioucmd, ret, 0);
+	return ret;
 }
 
-void nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
+int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd, unsigned int issue_flags)
 {
 	struct nvme_ns *ns = container_of(file_inode(ioucmd->file)->i_cdev,
 			struct nvme_ns, cdev);
-	nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
+
+	return nvme_ns_uring_cmd(ns, ioucmd, issue_flags);
 }
 
 #ifdef CONFIG_NVME_MULTIPATH
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 761ad6c629c4..26adc7660d28 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -783,7 +783,7 @@ long nvme_ns_head_chr_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
 long nvme_dev_ioctl(struct file *file, unsigned int cmd,
 		unsigned long arg);
-void nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
+int nvme_ns_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
 void nvme_ns_head_chr_uring_cmd(struct io_uring_cmd *ioucmd,
 		unsigned int issue_flags);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index cf7087dc12b3..cee7fc95f2c2 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1251,6 +1251,8 @@ static const struct io_op_def io_op_defs[] = {
 	[IORING_OP_URING_CMD] = {
 		.needs_file		= 1,
 		.plug			= 1,
+		.async_size		= 2 * sizeof(struct io_uring_sqe) -
+					  offsetof(struct io_uring_sqe, cmd),
 	},
 };
 
@@ -4936,6 +4938,20 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, ssize_t res2)
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_done);
 
+static int io_uring_cmd_prep_async(struct io_kiocb *req)
+{
+	struct io_uring_cmd *ioucmd = &req->uring_cmd;
+	struct io_ring_ctx *ctx = req->ctx;
+	size_t cmd_size = sizeof(struct io_uring_sqe) -
+				offsetof(struct io_uring_sqe, cmd);
+
+	if (ctx->flags & IORING_SETUP_SQE128)
+		cmd_size += sizeof(struct io_uring_sqe);
+
+	memcpy(req->async_data, ioucmd->cmd, cmd_size);
+	return 0;
+}
+
 static int io_uring_cmd_prep(struct io_kiocb *req,
 			     const struct io_uring_sqe *sqe)
 {
@@ -4951,7 +4967,7 @@ static int io_uring_cmd_prep(struct io_kiocb *req,
 		return -EOPNOTSUPP;
 	if (sqe->ioprio || sqe->rw_flags)
 		return -EINVAL;
-	ioucmd->cmd = sqe->cmd;
+	ioucmd->cmd = (void *) sqe->cmd;
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 	return 0;
 }
@@ -4960,10 +4976,18 @@ static int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct file *file = req->file;
 	struct io_uring_cmd *ioucmd = &req->uring_cmd;
+	int ret;
 
 	if (!req->file->f_op->uring_cmd)
 		return -EOPNOTSUPP;
-	file->f_op->uring_cmd(ioucmd, issue_flags);
+
+	if (req_has_async_data(req))
+		ioucmd->cmd = req->async_data;
+
+	ret = file->f_op->uring_cmd(ioucmd, issue_flags);
+	if (ret != -EIOCBQUEUED)
+		io_uring_cmd_done(ioucmd, ret, 0);
+
 	return 0;
 }
 
@@ -7849,6 +7873,8 @@ static int io_req_prep_async(struct io_kiocb *req)
 		return io_recvmsg_prep_async(req);
 	case IORING_OP_CONNECT:
 		return io_connect_prep_async(req);
+	case IORING_OP_URING_CMD:
+		return io_uring_cmd_prep_async(req);
 	}
 	printk_once(KERN_WARNING "io_uring: prep_async() bad opcode %d\n",
 		    req->opcode);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 30c98d9993df..87b5af1d9fbe 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1996,7 +1996,7 @@ struct file_operations {
 				   struct file *file_out, loff_t pos_out,
 				   loff_t len, unsigned int remap_flags);
 	int (*fadvise)(struct file *, loff_t, loff_t, int);
-	void (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
+	int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags);
 } __randomize_layout;
 
 struct inode_operations {
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index e4cbc80949ce..74371503478a 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -14,10 +14,11 @@ enum io_uring_cmd_flags {
 
 struct io_uring_cmd {
 	struct file	*file;
-	const u8	*cmd;
+	void		*cmd;
 	/* callback to defer completions to task context */
 	void (*task_work_cb)(struct io_uring_cmd *cmd);
 	u32		cmd_op;
+	u32		pad;
 	u8		pdu[32]; /* available inline for free use */
 };
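
For reference, filling a passthrough SQE against this interface would
look roughly like the below; struct nvme_uring_cmd, NVME_URING_CMD_IO,
and the fd/buffer variables are from (or assumed from) the nvme side of
this series, so treat the details as illustrative. Note that the
prep_async copy works out to the full 80 bytes on an SQE128 ring:
2 * sizeof(sqe) - offsetof(struct io_uring_sqe, cmd) == 128 - 48.

	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	struct nvme_uring_cmd *cmd;

	memset(sqe, 0, 2 * sizeof(*sqe));	/* big SQE */
	sqe->opcode = IORING_OP_URING_CMD;
	sqe->fd = ns_fd;			/* nvme char device fd */
	sqe->cmd_op = NVME_URING_CMD_IO;

	/* the nvme command sits in the SQE itself, starting at ->cmd;
	 * this is exactly the region io_uring_cmd_prep_async() copies
	 * to async_data on deferral */
	cmd = (struct nvme_uring_cmd *)sqe->cmd;
	cmd->opcode = nvme_cmd_read;
	cmd->nsid = nsid;
	cmd->addr = (__u64)(uintptr_t)buf;
	cmd->data_len = len;

	io_uring_submit(&ring);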
 

-- 
Jens Axboe

