From: Pavel Begunkov <[email protected]>
To: Ming Lei <[email protected]>, Jens Axboe <[email protected]>
Cc: [email protected]
Subject: Re: [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt
Date: Tue, 25 Apr 2023 17:10:41 +0100 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 4/25/23 16:07, Ming Lei wrote:
> On Tue, Apr 25, 2023 at 08:50:33AM -0600, Jens Axboe wrote:
>> On 4/25/23 8:42?AM, Ming Lei wrote:
>>> On Tue, Apr 25, 2023 at 07:31:10AM -0600, Jens Axboe wrote:
>>>> On 4/24/23 8:50?PM, Ming Lei wrote:
>>>>> On Mon, Apr 24, 2023 at 08:18:02PM -0600, Jens Axboe wrote:
>>>>>> On 4/24/23 8:13?PM, Ming Lei wrote:
>>>>>>> On Mon, Apr 24, 2023 at 08:08:09PM -0600, Jens Axboe wrote:
>>>>>>>> On 4/24/23 6:57?PM, Ming Lei wrote:
>>>>>>>>> On Mon, Apr 24, 2023 at 09:24:33AM -0600, Jens Axboe wrote:
>>>>>>>>>> On 4/24/23 1:30?AM, Ming Lei wrote:
>>>>>>>>>>> On Thu, Apr 20, 2023 at 12:31:35PM -0600, Jens Axboe wrote:
>>>>>>>>>>>> Add an opdef bit for them, and set it for the opcodes where we always
>>>>>>>>>>>> need io-wq punt. With that done, exclude them from the file_can_poll()
>>>>>>>>>>>> check in terms of whether or not we need to punt them if any of the
>>>>>>>>>>>> NO_OFFLOAD flags are set.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>>>>>>>>> ---
>>>>>>>>>>>> io_uring/io_uring.c | 2 +-
>>>>>>>>>>>> io_uring/opdef.c | 22 ++++++++++++++++++++--
>>>>>>>>>>>> io_uring/opdef.h | 2 ++
>>>>>>>>>>>> 3 files changed, 23 insertions(+), 3 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>>>>>>>>>> index fee3e461e149..420cfd35ebc6 100644
>>>>>>>>>>>> --- a/io_uring/io_uring.c
>>>>>>>>>>>> +++ b/io_uring/io_uring.c
>>>>>>>>>>>> @@ -1948,7 +1948,7 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>>>>>>>>> return -EBADF;
>>>>>>>>>>>>
>>>>>>>>>>>> if (issue_flags & IO_URING_F_NO_OFFLOAD &&
>>>>>>>>>>>> - (!req->file || !file_can_poll(req->file)))
>>>>>>>>>>>> + (!req->file || !file_can_poll(req->file) || def->always_iowq))
>>>>>>>>>>>> issue_flags &= ~IO_URING_F_NONBLOCK;
>>>>>>>>>>>
>>>>>>>>>>> I guess the check should be !def->always_iowq?
>>>>>>>>>>
>>>>>>>>>> How so? Nobody that takes pollable files should/is setting
>>>>>>>>>> ->always_iowq. If we can poll the file, we should not force inline
>>>>>>>>>> submission. Basically the ones setting ->always_iowq always do -EAGAIN
>>>>>>>>>> returns if nonblock == true.
>>>>>>>>>
>>>>>>>>> I meant IO_URING_F_NONBLOCK is cleared here for ->always_iowq, and
>>>>>>>>> these OPs won't return -EAGAIN, then run in the current task context
>>>>>>>>> directly.
>>>>>>>>
>>>>>>>> Right, of IO_URING_F_NO_OFFLOAD is set, which is entirely the point of
>>>>>>>> it :-)
>>>>>>>
>>>>>>> But ->always_iowq isn't actually _always_ since fallocate/fsync/... are
>>>>>>> not punted to iowq in case of IO_URING_F_NO_OFFLOAD, looks the naming of
>>>>>>> ->always_iowq is a bit confusing?
>>>>>>
>>>>>> Yeah naming isn't that great, I can see how that's bit confusing. I'll
>>>>>> be happy to take suggestions on what would make it clearer.
>>>>>
>>>>> Except for the naming, I am also wondering why these ->always_iowq OPs
>>>>> aren't punted to iowq in case of IO_URING_F_NO_OFFLOAD, given it
>>>>> shouldn't improve performance by doing so because these OPs are supposed
>>>>> to be slow and always slept, not like others(buffered writes, ...),
>>>>> can you provide one hint about not offloading these OPs? Or is it just that
>>>>> NO_OFFLOAD needs to not offload every OPs?
>>>>
>>>> The whole point of NO_OFFLOAD is that items that would normally be
>>>> passed to io-wq are just run inline. This provides a way to reap the
>>>> benefits of batched submissions and syscall reductions. Some opcodes
>>>> will just never be async, and io-wq offloads are not very fast. Some of
>>>
>>> Yeah, seems io-wq is much slower than inline issue, maybe it needs
>>> to be looked into, and it is easy to run into io-wq for IOSQE_IO_LINK.
>>
>> Indeed, depending on what is being linked, you may see io-wq activity
>> which is not ideal.
>
> That is why I prefer to fused command for ublk zero copy, because the
> registering buffer approach suggested by Pavel and Ziyang has to link
> register buffer OP with the actual IO OP, and it is observed that
> IOPS drops to 1/2 in 4k random io test with registered buffer approach.
What's good about it is that you can use linked requests with it
but you _don't have to_.
Curiously, I just recently compared submitting 8 two-request links
(16 reqs in total) vs submit(8)+submit(8), all that in a loop.
The latter was faster. It wasn't a clean experiment, but shows
that links are not super fast and would be nice to get them better.
For the register buf approach, I tried it out, looked good to me.
It outperforms splice requests (with a hack that removes force
iowq execution) by 5-10% with synthetic benchmark. Works better than
splice(2) for QD>=2. Let me send it out, perhaps today, so we can
figure out how it compares against ublk/fused and see the margin is.
--
Pavel Begunkov
next prev parent reply other threads:[~2023-04-25 16:12 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-20 18:31 [PATCHSET v2 0/4] Enable NO_OFFLOAD support Jens Axboe
2023-04-20 18:31 ` [PATCH 1/4] io_uring: add support for NO_OFFLOAD Jens Axboe
2023-04-20 18:31 ` [PATCH 2/4] Revert "io_uring: always go async for unsupported fadvise flags" Jens Axboe
2023-04-20 18:31 ` [PATCH 3/4] Revert "io_uring: for requests that require async, force it" Jens Axboe
2023-04-20 18:31 ` [PATCH 4/4] io_uring: mark opcodes that always need io-wq punt Jens Axboe
2023-04-24 7:30 ` Ming Lei
2023-04-24 15:24 ` Jens Axboe
2023-04-25 0:57 ` Ming Lei
2023-04-25 2:08 ` Jens Axboe
2023-04-25 2:13 ` Ming Lei
2023-04-25 2:18 ` Jens Axboe
2023-04-25 2:50 ` Ming Lei
2023-04-25 13:31 ` Jens Axboe
2023-04-25 14:42 ` Ming Lei
2023-04-25 14:50 ` Jens Axboe
2023-04-25 15:07 ` Ming Lei
2023-04-25 15:25 ` Jens Axboe
2023-04-25 15:46 ` Pavel Begunkov
2023-04-26 3:25 ` Ming Lei
2023-04-26 4:28 ` Ming Lei
2023-04-26 1:43 ` Ming Lei
2023-04-25 16:10 ` Pavel Begunkov [this message]
2023-04-26 3:37 ` Ming Lei
2023-04-25 15:28 ` Pavel Begunkov
2023-04-30 13:34 ` Hao Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox