From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, Ming Lei <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
Date: Tue, 25 Apr 2023 16:46:03 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 4/25/23 16:25, Jens Axboe wrote:
> On 4/25/23 9:07 AM, Ming Lei wrote:
>> On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
>>> On 4/25/23 8:42 AM, Ming Lei wrote:
>>>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>>>>> On 4/24/23 8:50 PM, Ming Lei wrote:
>>>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>>>>> On 4/24/23 8:13 PM, Ming Lei wrote:
>>>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>>>>> On 4/24/23 6:57 PM, Ming Lei wrote:
>>>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>>>>> On 4/24/23 1:30 AM, Ming Lei wrote:
>>>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>   io_uring/io_uring.c |  2 +-
>>>>>>>>>>>>>   io_uring/opdef.c    | 22 ++++++++++++++++++++--
>>>>>>>>>>>>>   io_uring/opdef.h    |  2 ++
>>>>>>>>>>>>>   3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>>>>   		return -EBADF;
>>>>>>>>>>>>>   
>>>>>>>>>>>>>   	if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>>>>> -	    (!req->file || !file_can_poll(req->file)))
>>>>>>>>>>>>> +	    (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>>>>   		issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>>>>
>>>>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>>>>
>>>>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>>>>> returns if nonblock == true.
>>>>>>>>>>
>>>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for  ->always_iowq, and
>>>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>>>>> directly.
>>>>>>>>>
>>>>>>>>> Right, if IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>>>>> it :-)
>>>>>>>>
>>>>>>>> But ->always_iowq isn't actually _always_, since fallocate/fsync/... are
>>>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD. It looks like the
>>>>>>>> naming of ->always_iowq is a bit confusing?
>>>>>>>
>>>>>>> Yeah, the naming isn't that great, I can see how that's a bit confusing.
>>>>>>> I'll be happy to take suggestions on what would make it clearer.
>>>>>>
>>>>>> Naming aside, I am also wondering why these ->always_iowq OPs aren't
>>>>>> punted to iowq in case of IO_URING_F_NO_OFFLOAD, given that running them
>>>>>> inline shouldn't improve performance: these OPs are supposed to be slow
>>>>>> and always sleep, unlike others (buffered writes, ...). Can you give a
>>>>>> hint about why these OPs are not offloaded? Or is it just that
>>>>>> NO_OFFLOAD must not offload any OP?
>>>>>
>>>>> The whole point of NO_OFFLOAD is that items that would normally be
>>>>> passed to io-wq are just run inline. This provides a way to reap the
>>>>> benefits of batched submissions and syscall reductions. Some opcodes
>>>>> will just never be async, and io-wq offloads are not very fast. Some of
>>>>
>>>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
>>>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
>>>
>>> Indeed, depending on what is being linked, you may see io-wq activity
>>> which is not ideal.
>>
>> That is why I prefer the fused command for ublk zero copy: the
>> registered buffer approach suggested by Pavel and Ziyang has to link
>> the register buffer OP with the actual IO OP, and IOPS is observed to
>> drop to 1/2 in a 4k random io test with the registered buffer approach.
> 
> It'd be worth looking into if we can avoid io-wq for link execution, as
> that'd be a nice win overall too. IIRC, there's no reason why it can't
> be done like initial issue rather than just a lazy punt to io-wq.

I might've missed a part of the discussion, but links are _usually_
executed by task_work, e.g.

io_submit_flush_completions() -> io_queue_next() -> io_req_task_queue()

There is one optimisation where if we're already in io-wq, it'll try
to serve the next linked request by the same io-wq worker with no
overhead on requeueing, but otherwise it'll only get there if
the request can't be executed nowait / async as per usual rules.
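
To make the two paths above concrete, here is a minimal userspace sketch
of the dispatch decision. It is a toy model, not kernel code: the enums
and both helper functions are hypothetical names made up for
illustration, and the model deliberately collapses "completes inline"
and "goes async via poll" into a single "no io-wq" outcome.

```c
#include <stdbool.h>

/* Hypothetical model: how the next request in a link reaches execution. */
enum next_path {
	NEXT_VIA_TASK_WORK,	/* usual case, i.e. io_req_task_queue() */
	NEXT_SAME_IOWQ_WORKER,	/* completion already happened in io-wq */
};

/* Hypothetical model: outcome once task_work issues the linked request. */
enum issue_result {
	ISSUE_NO_IOWQ,		/* runs nowait / async as per usual rules */
	ISSUE_PUNT_IOWQ,	/* cannot run nowait, falls back to io-wq */
};

/* Pick the handoff path for the next linked request. */
static enum next_path pick_next_path(bool completing_in_iowq)
{
	/* The same-worker shortcut only applies when we are in io-wq. */
	return completing_in_iowq ? NEXT_SAME_IOWQ_WORKER : NEXT_VIA_TASK_WORK;
}

/* Decide whether a request issued from task_work ends up in io-wq. */
static enum issue_result issue_from_task_work(bool can_run_nowait)
{
	return can_run_nowait ? ISSUE_NO_IOWQ : ISSUE_PUNT_IOWQ;
}
```

In this model a link completed from the submission path always goes
through task_work first, and io-wq only appears as the fallback of the
second helper.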

The tw execution part could be optimised further: the next request can
be executed almost in place instead of queueing a tw. That saved quite
a lot of CPU when I tried it with BPF requests.

-- 
Pavel Begunkov
