From: Jens Axboe <axboe@kernel.dk>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH] io_uring/uring_cmd: be smarter about SQE copying
Date: Tue, 3 Jun 2025 09:11:14 -0600 [thread overview]
Message-ID: <9dd8a125-5d15-4480-a86e-87ee96e238a4@kernel.dk> (raw)
In-Reply-To: <CADUfDZo=mbiz=0wxKSihhw9cxRdj5Uojh=XO0aPxKOZKtEc22A@mail.gmail.com>
On 6/3/25 9:05 AM, Caleb Sander Mateos wrote:
> On Sat, May 31, 2025 at 1:52?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> uring_cmd currently copies the SQE unconditionally, which was introduced
>> as a work-around in commit:
>>
>> d6211ebbdaa5 ("io_uring/uring_cmd: unconditionally copy SQEs at prep time")
>>
>> because the checking for whether or not this command may have ->issue()
>> called from io-wq wasn't complete. Rectify that, ensuring that if the
>> request is marked explicitly async via REQ_F_FORCE_ASYNC or if it's
>> part of a link chain, then the SQE is copied upfront.
>>
>> Always copying can be costly, particularly when dealing with SQE128
>> rings. But even a normal 64b SQE copy is noticeable at high enough
>> rates.
>>
>> Reported-by: Caleb Sander Mateos <csander@purestorage.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index 929cad6ee326..cb4b867a2656 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -181,29 +181,42 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
>> }
>> EXPORT_SYMBOL_GPL(io_uring_cmd_done);
>>
>> +static void io_uring_sqe_copy(struct io_kiocb *req, struct io_uring_cmd *ioucmd)
>> +{
>> + struct io_async_cmd *ac = req->async_data;
>> +
>> + if (ioucmd->sqe != ac->sqes) {
>> + memcpy(ac->sqes, ioucmd->sqe, uring_sqe_size(req->ctx));
>> + ioucmd->sqe = ac->sqes;
>> + }
>> +}
>> +
>> static int io_uring_cmd_prep_setup(struct io_kiocb *req,
>> const struct io_uring_sqe *sqe)
>> {
>> struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>> + struct io_ring_ctx *ctx = req->ctx;
>> struct io_async_cmd *ac;
>>
>> /* see io_uring_cmd_get_async_data() */
>> BUILD_BUG_ON(offsetof(struct io_async_cmd, data) != 0);
>>
>> - ac = io_uring_alloc_async_data(&req->ctx->cmd_cache, req);
>> + ac = io_uring_alloc_async_data(&ctx->cmd_cache, req);
>> if (!ac)
>> return -ENOMEM;
>> ac->data.op_data = NULL;
>>
>> /*
>> - * Unconditionally cache the SQE for now - this is only needed for
>> - * requests that go async, but prep handlers must ensure that any
>> - * sqe data is stable beyond prep. Since uring_cmd is special in
>> - * that it doesn't read in per-op data, play it safe and ensure that
>> - * any SQE data is stable beyond prep. This can later get relaxed.
>> + * Copy SQE now, if we know we're going async. Drain will set
>> + * FORCE_ASYNC, and assume links may cause it to go async. If not,
>> + * copy is deferred until issue time, if the request doesn't issue
>> + * or queue inline.
>> */
>> - memcpy(ac->sqes, sqe, uring_sqe_size(req->ctx));
>> - ioucmd->sqe = ac->sqes;
>> + ioucmd->sqe = sqe;
>> + if (req->flags & (REQ_F_FORCE_ASYNC| REQ_F_LINK | REQ_F_HARDLINK) ||
>> + ctx->submit_state.link.head)
>
> To check my understanding, io_init_req() will set REQ_F_FORCE_ASYNC on
> any request with IOSQE_IO_DRAIN as well as all subsequent requests
> until the IOSQE_IO_DRAIN request completes? Looks like this condition
Correct
> should work then. I think you can drop REQ_F_LINK | REQ_F_HARDLINK,
> though; the initial request of a linked chain will be issued
> synchronously, and ctx->submit_state.link.head will be set for the
> subsequent requests.
>
> I do share Pavel's concern that whether or not a request will be
> initially issued asynchronously is up to the core io_uring code, so it
> seems a bit fragile to make these assumptions in the uring_cmd layer.
> I think I would prefer either passing a bool issue_async to the
> ->prep() handler, or adding an optional ->prep_async() hook called if
> the initial issue may happen asynchronously.
Yes I do too, which is why I suggested we add a specific cold handler
for this kind of case. That leaves the core of io_uring calling it
appropriately, and eliminates the need for storing an sqe pointer. Which
would've been fine before resizing, but not a great idea to do now. I'll
do a v2 of this with that in mind, just haven't gotten around to it yet.
>> + io_uring_sqe_copy(req, ioucmd);
>> +
>> return 0;
>> }
>>
>> @@ -259,6 +272,12 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>> }
>>
>> ret = file->f_op->uring_cmd(ioucmd, issue_flags);
>> + if (ret == -EAGAIN) {
>> + io_uring_sqe_copy(req, ioucmd);
>> + return ret;
>> + } else if (ret == -EIOCBQUEUED) {
>> + return ret;
>> + }
>> if (ret == -EAGAIN || ret == -EIOCBQUEUED)
>> return ret;
>
> This if condition is always false now, remove it?
Heh yes, just forgot to remove those lines when adding the above change.
--
Jens Axboe
prev parent reply other threads:[~2025-06-03 15:11 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-31 20:52 [PATCH] io_uring/uring_cmd: be smarter about SQE copying Jens Axboe
2025-06-02 10:24 ` Pavel Begunkov
2025-06-02 11:55 ` Jens Axboe
2025-06-02 12:10 ` Pavel Begunkov
2025-06-02 13:04 ` Jens Axboe
2025-06-03 15:05 ` Caleb Sander Mateos
2025-06-03 15:11 ` Jens Axboe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9dd8a125-5d15-4480-a86e-87ee96e238a4@kernel.dk \
--to=axboe@kernel.dk \
--cc=csander@purestorage.com \
--cc=io-uring@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox