public inbox for io-uring@vger.kernel.org
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Caleb Sander Mateos <csander@purestorage.com>
Cc: io-uring <io-uring@vger.kernel.org>
Subject: Re: [PATCH] io_uring/uring_cmd: be smarter about SQE copying
Date: Tue, 3 Jun 2025 09:11:14 -0600	[thread overview]
Message-ID: <9dd8a125-5d15-4480-a86e-87ee96e238a4@kernel.dk> (raw)
In-Reply-To: <CADUfDZo=mbiz=0wxKSihhw9cxRdj5Uojh=XO0aPxKOZKtEc22A@mail.gmail.com>

On 6/3/25 9:05 AM, Caleb Sander Mateos wrote:
> On Sat, May 31, 2025 at 1:52?PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> uring_cmd currently copies the SQE unconditionally, which was introduced
>> as a work-around in commit:
>>
>> d6211ebbdaa5 ("io_uring/uring_cmd: unconditionally copy SQEs at prep time")
>>
>> because the checking for whether or not this command may have ->issue()
>> called from io-wq wasn't complete. Rectify that, ensuring that if the
>> request is marked explicitly async via REQ_F_FORCE_ASYNC or if it's
>> part of a link chain, then the SQE is copied upfront.
>>
>> Always copying can be costly, particularly when dealing with SQE128
>> rings. But even a normal 64b SQE copy is noticeable at high enough
>> rates.
>>
>> Reported-by: Caleb Sander Mateos <csander@purestorage.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> ---
>>
>> diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
>> index 929cad6ee326..cb4b867a2656 100644
>> --- a/io_uring/uring_cmd.c
>> +++ b/io_uring/uring_cmd.c
>> @@ -181,29 +181,42 @@ void io_uring_cmd_done(struct io_uring_cmd *ioucmd, ssize_t ret, u64 res2,
>>  }
>>  EXPORT_SYMBOL_GPL(io_uring_cmd_done);
>>
>> +static void io_uring_sqe_copy(struct io_kiocb *req, struct io_uring_cmd *ioucmd)
>> +{
>> +       struct io_async_cmd *ac = req->async_data;
>> +
>> +       if (ioucmd->sqe != ac->sqes) {
>> +               memcpy(ac->sqes, ioucmd->sqe, uring_sqe_size(req->ctx));
>> +               ioucmd->sqe = ac->sqes;
>> +       }
>> +}
>> +
>>  static int io_uring_cmd_prep_setup(struct io_kiocb *req,
>>                                    const struct io_uring_sqe *sqe)
>>  {
>>         struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);
>> +       struct io_ring_ctx *ctx = req->ctx;
>>         struct io_async_cmd *ac;
>>
>>         /* see io_uring_cmd_get_async_data() */
>>         BUILD_BUG_ON(offsetof(struct io_async_cmd, data) != 0);
>>
>> -       ac = io_uring_alloc_async_data(&req->ctx->cmd_cache, req);
>> +       ac = io_uring_alloc_async_data(&ctx->cmd_cache, req);
>>         if (!ac)
>>                 return -ENOMEM;
>>         ac->data.op_data = NULL;
>>
>>         /*
>> -        * Unconditionally cache the SQE for now - this is only needed for
>> -        * requests that go async, but prep handlers must ensure that any
>> -        * sqe data is stable beyond prep. Since uring_cmd is special in
>> -        * that it doesn't read in per-op data, play it safe and ensure that
>> -        * any SQE data is stable beyond prep. This can later get relaxed.
>> +        * Copy SQE now, if we know we're going async. Drain will set
>> +        * FORCE_ASYNC, and assume links may cause it to go async. If not,
>> +        * copy is deferred until issue time, if the request doesn't issue
>> +        * or queue inline.
>>          */
>> -       memcpy(ac->sqes, sqe, uring_sqe_size(req->ctx));
>> -       ioucmd->sqe = ac->sqes;
>> +       ioucmd->sqe = sqe;
>> +       if (req->flags & (REQ_F_FORCE_ASYNC| REQ_F_LINK | REQ_F_HARDLINK) ||
>> +           ctx->submit_state.link.head)
> 
> To check my understanding, io_init_req() will set REQ_F_FORCE_ASYNC on
> any request with IOSQE_IO_DRAIN as well as all subsequent requests
> until the IOSQE_IO_DRAIN request completes? Looks like this condition

Correct

> should work then. I think you can drop REQ_F_LINK | REQ_F_HARDLINK,
> though; the initial request of a linked chain will be issued
> synchronously, and ctx->submit_state.link.head will be set for the
> subsequent requests.
> 
> I do share Pavel's concern that whether or not a request will be
> initially issued asynchronously is up to the core io_uring code, so it
> seems a bit fragile to make these assumptions in the uring_cmd layer.
> I think I would prefer either passing a bool issue_async to the
> ->prep() handler, or adding an optional ->prep_async() hook called if
> the initial issue may happen asynchronously.

Yes I do too, which is why I suggested we add a specific cold handler
for this kind of case. That leaves the core of io_uring calling it
appropriately, and eliminates the need for storing an sqe pointer. Which
would've been fine before resizing, but not a great idea to do now. I'll
do a v2 of this with that in mind, just haven't gotten around to it yet.

>> +               io_uring_sqe_copy(req, ioucmd);
>> +
>>         return 0;
>>  }
>>
>> @@ -259,6 +272,12 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
>>         }
>>
>>         ret = file->f_op->uring_cmd(ioucmd, issue_flags);
>> +       if (ret == -EAGAIN) {
>> +               io_uring_sqe_copy(req, ioucmd);
>> +               return ret;
>> +       } else if (ret == -EIOCBQUEUED) {
>> +               return ret;
>> +       }
>>         if (ret == -EAGAIN || ret == -EIOCBQUEUED)
>>                 return ret;
> 
> This if condition is always false now, remove it?

Heh yes, just forgot to remove those lines when adding the above change.

-- 
Jens Axboe

      reply	other threads:[~2025-06-03 15:11 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-31 20:52 [PATCH] io_uring/uring_cmd: be smarter about SQE copying Jens Axboe
2025-06-02 10:24 ` Pavel Begunkov
2025-06-02 11:55   ` Jens Axboe
2025-06-02 12:10     ` Pavel Begunkov
2025-06-02 13:04       ` Jens Axboe
2025-06-03 15:05 ` Caleb Sander Mateos
2025-06-03 15:11   ` Jens Axboe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9dd8a125-5d15-4480-a86e-87ee96e238a4@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=csander@purestorage.com \
    --cc=io-uring@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox