public inbox for [email protected]
 help / color / mirror / Atom feed
From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>, [email protected]
Cc: [email protected]
Subject: Re: [PATCH 3/6] io_uring: add support for NO_OFFLOAD
Date: Thu, 20 Apr 2023 09:56:40 -0600	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 4/20/23 9:16?AM, Pavel Begunkov wrote:
> On 4/20/23 16:03, Jens Axboe wrote:
>> On 4/19/23 7:01?PM, Pavel Begunkov wrote:
>>> On 4/19/23 17:25, Jens Axboe wrote:
>>>> Some applications don't necessarily care about io_uring not blocking for
>>>> request issue, they simply want to use io_uring for batched submission
>>>> of IO. However, io_uring will always do non-blocking issues, and for
>>>> some request types, there's simply no support for doing non-blocking
>>>> issue and hence they get punted to io-wq unconditionally. If the
>>>> application doesn't care about issue potentially blocking, this causes
>>>> a performance slowdown as thread offload is not nearly as efficient as
>>>> inline issue.
>>>>
>>>> Add support for configuring the ring with IORING_SETUP_NO_OFFLOAD, and
>>>> add an IORING_ENTER_NO_OFFLOAD flag to io_uring_enter(2). If either one
>>>> of these is set, then io_uring will ignore the non-block issue attempt
>>>> for any file which we cannot poll for readiness. The simplified io_uring
>>>> issue model looks as follows:
>>>>
>>>> 1) Non-blocking issue is attempted for IO. If successful, we're done for
>>>>      now.
>>>>
>>>> 2) Case 1 failed. Now we have two options
>>>>         a) We can poll the file. We arm poll, and we're done for now
>>>>         until that triggers.
>>>>          b) File cannot be polled, we punt to io-wq which then does a
>>>>         blocking attempt.
>>>>
>>>> If either of the NO_OFFLOAD flags are set, we should never hit case
>>>> 2b. Instead, case 1 would issue the IO without the non-blocking flag
>>>> being set and perform an inline completion.
>>>>
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>> ---
>>>>    include/linux/io_uring_types.h |  3 +++
>>>>    include/uapi/linux/io_uring.h  |  7 +++++++
>>>>    io_uring/io_uring.c            | 26 ++++++++++++++++++++------
>>>>    io_uring/io_uring.h            |  2 +-
>>>>    io_uring/sqpoll.c              |  3 ++-
>>>>    5 files changed, 33 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>>>> index 4dd54d2173e1..c54f3fb7ab1a 100644
>>>> --- a/include/linux/io_uring_types.h
>>>> +++ b/include/linux/io_uring_types.h
>>>> @@ -403,6 +403,7 @@ enum {
>>>>        REQ_F_APOLL_MULTISHOT_BIT,
>>>>        REQ_F_CLEAR_POLLIN_BIT,
>>>>        REQ_F_HASH_LOCKED_BIT,
>>>> +    REQ_F_NO_OFFLOAD_BIT,
>>>>        /* keep async read/write and isreg together and in order */
>>>>        REQ_F_SUPPORT_NOWAIT_BIT,
>>>>        REQ_F_ISREG_BIT,
>>>> @@ -475,6 +476,8 @@ enum {
>>>>        REQ_F_CLEAR_POLLIN    = BIT_ULL(REQ_F_CLEAR_POLLIN_BIT),
>>>>        /* hashed into ->cancel_hash_locked, protected by ->uring_lock */
>>>>        REQ_F_HASH_LOCKED    = BIT_ULL(REQ_F_HASH_LOCKED_BIT),
>>>> +    /* don't offload to io-wq, issue blocking if needed */
>>>> +    REQ_F_NO_OFFLOAD    = BIT_ULL(REQ_F_NO_OFFLOAD_BIT),
>>>>    };
>>>>      typedef void (*io_req_tw_func_t)(struct io_kiocb *req, struct io_tw_state *ts);
>>>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
>>>> index 0716cb17e436..ea903a677ce9 100644
>>>> --- a/include/uapi/linux/io_uring.h
>>>> +++ b/include/uapi/linux/io_uring.h
>>>> @@ -173,6 +173,12 @@ enum {
>>>>     */
>>>>    #define IORING_SETUP_DEFER_TASKRUN    (1U << 13)
>>>>    +/*
>>>> + * Don't attempt non-blocking issue on file types that would otherwise
>>>> + * punt to io-wq if they cannot be completed non-blocking.
>>>> + */
>>>> +#define IORING_SETUP_NO_OFFLOAD        (1U << 14)
>>>> +
>>>>    enum io_uring_op {
>>>>        IORING_OP_NOP,
>>>>        IORING_OP_READV,
>>>> @@ -443,6 +449,7 @@ struct io_cqring_offsets {
>>>>    #define IORING_ENTER_SQ_WAIT        (1U << 2)
>>>>    #define IORING_ENTER_EXT_ARG        (1U << 3)
>>>>    #define IORING_ENTER_REGISTERED_RING    (1U << 4)
>>>> +#define IORING_ENTER_NO_OFFLOAD        (1U << 5)
>>>>      /*
>>>>     * Passed in for io_uring_setup(2). Copied back with updated info on success
>>>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>>>> index 9568b5e4cf87..04770b06de16 100644
>>>> --- a/io_uring/io_uring.c
>>>> +++ b/io_uring/io_uring.c
>>>> @@ -1947,6 +1947,10 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
>>>>        if (unlikely(!io_assign_file(req, def, issue_flags)))
>>>>            return -EBADF;
>>>>    +    if (req->flags & REQ_F_NO_OFFLOAD &&
>>>> +        (!req->file || !file_can_poll(req->file)))
>>>> +        issue_flags &= ~IO_URING_F_NONBLOCK;
>>>> +
>>>>        if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
>>>>            creds = override_creds(req->creds);
>>>>    @@ -2337,7 +2341,7 @@ static __cold int io_submit_fail_init(const struct io_uring_sqe *sqe,
>>>>    }
>>>>      static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>>> -             const struct io_uring_sqe *sqe)
>>>> +             const struct io_uring_sqe *sqe, bool no_offload)
>>>>        __must_hold(&ctx->uring_lock)
>>>>    {
>>>>        struct io_submit_link *link = &ctx->submit_state.link;
>>>> @@ -2385,6 +2389,9 @@ static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>>>            return 0;
>>>>        }
>>>>    +    if (no_offload)
>>>> +        req->flags |= REQ_F_NO_OFFLOAD;
>>>
>>> Shouldn't it be a part of the initial "in syscall" submission
>>> but not extended to tw? I'd say it should, otherwise it risks
>>> making !DEFER_TASKRUN totally unpredictable. E.g. any syscall
>>> can try to execute tw and get stuck waiting in there.
>>
>> Yeah, it should probably be ignore outside of off io_uring_enter(2)
>> submissions. If we do that, we could drop the flag too (and the flags
>> extension).
> 
> issue_flags instead of req->flags might be a better place for it

For sure, if we're not carrying it in io_kiocb state, then an issue flag
is the way to go. FWIW, I already rebased and did that. And then I ran
the full test suite, with a modification to the queue init helpers that
sets IORING_SETUP_NO_OFFLOAD for all queue creations, except the ones
where IORING_SETUP_IOPOLL is set. Ran into one minor issue, which is
test/fallocate.c which will trigger SIGXFSZ for the process itself
(rather than io-wq, where it gets ignored) when exceeding the file size.
That's to be expected.

Outside of that, everything worked, nothing odd observed. Obviously not
a comprehensive test for potential issues, but it does show that we're
not THAT much in trouble here.

Didn't drop the uring_lock yet, outside of dropping the REQ_F_NO_OFFLOAD
flag and making it an issue flag, it's all pretty much the same as
before.

-- 
Jens Axboe


  reply	other threads:[~2023-04-20 15:56 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-19 16:25 [PATCHSET 0/6] Enable NO_OFFLOAD support Jens Axboe
2023-04-19 16:25 ` [PATCH 1/6] io_uring: grow struct io_kiocb 'flags' to a 64-bit value Jens Axboe
2023-04-19 16:25 ` [PATCH 2/6] io_uring: move poll_refs up a cacheline to fill a hole Jens Axboe
2023-04-20  0:50   ` Pavel Begunkov
2023-04-19 16:25 ` [PATCH 3/6] io_uring: add support for NO_OFFLOAD Jens Axboe
2023-04-20  1:01   ` Pavel Begunkov
2023-04-20 15:03     ` Jens Axboe
2023-04-20 15:16       ` Pavel Begunkov
2023-04-20 15:56         ` Jens Axboe [this message]
2023-04-19 16:25 ` [PATCH 4/6] Revert "io_uring: always go async for unsupported fadvise flags" Jens Axboe
2023-04-19 16:25 ` [PATCH 5/6] Revert "io_uring: for requests that require async, force it" Jens Axboe
2023-04-19 16:25 ` [PATCH 6/6] io_uring: mark opcodes that always need io-wq punt Jens Axboe
2023-04-20  0:43 ` [PATCHSET 0/6] Enable NO_OFFLOAD support Pavel Begunkov
2023-04-20 15:08   ` Jens Axboe
2023-04-20 15:28     ` Pavel Begunkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox