From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, Jann Horn <[email protected]>
Cc: [email protected],
	"[email protected]" <[email protected]>
Subject: Re: [RFC] io_uring CQ ring backpressure
Date: Thu, 7 Nov 2019 00:54:51 +0300
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>



On 07/11/2019 00:31, Jens Axboe wrote:
> On 11/6/19 1:08 PM, Jens Axboe wrote:
>> On 11/6/19 12:51 PM, Jann Horn wrote:
>>> On Wed, Nov 6, 2019 at 5:23 PM Jens Axboe <[email protected]> wrote:
>>>> Currently we drop completion events, if the CQ ring is full. That's fine
>>>> for requests with bounded completion times, but it may make it harder to
>>>> use io_uring with networked IO where request completion times are
>>>> generally unbounded. Or with POLL, for example, which is also unbounded.
>>>>
>>>> This patch adds IORING_SETUP_CQ_NODROP, which changes the behavior a bit
>>>> for CQ ring overflows. First of all, it doesn't overflow the ring, it
>>>> simply stores backlog of completions that we weren't able to put into
>>>> the CQ ring. To prevent the backlog from growing indefinitely, if the
>>>> backlog is non-empty, we apply back pressure on IO submissions. Any
>>>> attempt to submit new IO with a non-empty backlog will get an -EBUSY
>>>> return from the kernel.
>>>>
>>>> I think that makes for a pretty sane API in terms of how the application
>>>> can handle it. With CQ_NODROP enabled, we'll never drop a completion
>>>> event (well unless we're totally out of memory...), but we'll also not
>>>> allow submissions with a completion backlog.
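
(Aside, purely for illustration: with the flag proposed above, an
application's submit loop has to be prepared for -EBUSY. A minimal
userspace sketch, assuming the proposed IORING_SETUP_CQ_NODROP flag and
plain liburing helpers; none of this is from the patch, and the retry
helper is hypothetical:

#include <liburing.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: -EBUSY means the kernel has a completion backlog,
 * so reap a CQE to make room, let the backlog flush, and retry. */
static int submit_with_backpressure(struct io_uring *ring)
{
        struct io_uring_cqe *cqe;
        int ret;

        while ((ret = io_uring_submit(ring)) == -EBUSY) {
                ret = io_uring_wait_cqe(ring, &cqe);
                if (ret < 0)
                        return ret;
                /* consume cqe->user_data / cqe->res here */
                io_uring_cqe_seen(ring, cqe);
        }
        return ret;
}

int main(void)
{
        struct io_uring_params p;
        struct io_uring ring;
        struct io_uring_sqe *sqe;
        int ret;

        memset(&p, 0, sizeof(p));
        p.flags = IORING_SETUP_CQ_NODROP;       /* proposed in this RFC */

        ret = io_uring_queue_init_params(8, &ring, &p);
        if (ret < 0)
                return 1;

        sqe = io_uring_get_sqe(&ring);
        io_uring_prep_nop(sqe);
        ret = submit_with_backpressure(&ring);

        io_uring_queue_exit(&ring);
        return ret < 0;
}

The point is just that the application never silently loses a CQE: it
either finds it in the ring or is told to reap before submitting more.)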
>>> [...]
>>>> +static void io_cqring_overflow(struct io_ring_ctx *ctx, u64 ki_user_data,
>>>> +                              long res)
>>>> +       __must_hold(&ctx->completion_lock)
>>>> +{
>>>> +       struct cqe_drop *drop;
>>>> +
>>>> +       if (!(ctx->flags & IORING_SETUP_CQ_NODROP)) {
>>>> +log_overflow:
>>>> +               WRITE_ONCE(ctx->rings->cq_overflow,
>>>> +                               atomic_inc_return(&ctx->cached_cq_overflow));
>>>> +               return;
>>>> +       }
>>>> +
>>>> +       drop = kmalloc(sizeof(*drop), GFP_ATOMIC);
>>>> +       if (!drop)
>>>> +               goto log_overflow;
>>>> +
>>>> +       drop->user_data = ki_user_data;
>>>> +       drop->res = res;
>>>> +       list_add_tail(&drop->list, &ctx->cq_overflow_list);
>>>> +}
>>>
>>> This could potentially consume moderately large amounts of atomic
>>> memory quickly and without any guarantee that the memory will be freed
>>> anytime soon, right? That seems moderately bad. Is there no way to
>>> e.g. pre-reserve memory for completion events, or something like that?
>>
>> As soon as there's even one entry in that backlog, the ring won't accept
>> any more new IO. So I don't think it's a huge concern. If we pre-reserve,
>> we haven't really made much progress in making sure we don't drop events,
>> and we'll be tying up that memory all the time.
>>
>> The alternative, as Pavel also mentioned, is to re-use the io_kiocb
>> for this. But that'll tie up more memory, and it's a bit tricky with
>> the life times. Just because the request has completed doesn't mean
>> that someone isn't still holding a reference to it, and who knows
>> what they will do.
> 
> OK, I took a stab at it, here's a brain dump of the "complications"
> 
> 1) Some places now use __io_free_req() to drop both references, if we
>    know we haven't issued a request yet. Needs double drop, not a big
>    deal.
> 2) Some ordering changes between io_put_req() and the fill/add event
>    logic. Again not a huge deal, easy to spot.
> 3) We have one failure case that does not have a request, exactly because
>    we failed to allocate one. Don't look at that part in the below patch;
>    I think what we should do here is just reserve a request for that case.
>    It won't help with the submission, but it'll get it logged correctly
>    for the overflow backlog. Any new submission can't proceed with that
>    request in the overflow backlog anyway, so we need just the one.
>    Not super pretty, but at least we can keep this out of the fast path,
>    as the only one that will free this request is the overflow flush
>    path.
> 
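
(The overflow flush path mentioned in 3) isn't in the excerpt above. Just
to sketch the idea, reusing the cqe_drop entries from the RFC patch: drain
the backlog into the CQ ring once there is room again, under
->completion_lock, using the in-tree io_get_cqring() helper that returns
NULL when the ring is full. Names and details here are assumptions, not
posted code:

static void io_cqring_overflow_flush(struct io_ring_ctx *ctx)
        __must_hold(&ctx->completion_lock)
{
        struct cqe_drop *drop, *tmp;
        struct io_uring_cqe *cqe;

        list_for_each_entry_safe(drop, tmp, &ctx->cq_overflow_list, list) {
                cqe = io_get_cqring(ctx);
                if (!cqe)
                        break;          /* CQ ring filled up again */
                WRITE_ONCE(cqe->user_data, drop->user_data);
                WRITE_ONCE(cqe->res, drop->res);
                WRITE_ONCE(cqe->flags, 0);
                list_del(&drop->list);
                kfree(drop);
        }
}

Once the list is empty, submissions stop seeing -EBUSY.)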

Points 2 (maybe partially) and 3 will hopefully be solved by the patchset
that removes passing sqe_submit around. I'll resend it in a minute.

> I'll do a prep patch that makes the fill/add event path deal in requests,
> then we can build the backpressure on top.
> 
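
(Guessing at the shape of that prep patch: with the fill path taking the
request itself, the overflow case can pin the request with an extra
reference and queue it, instead of kmalloc'ing a cqe_drop. A rough sketch
only; the field names are from the io_kiocb of that time and the
CQ_NODROP branch is an assumption, not the posted code:

static void io_cqring_fill_event(struct io_kiocb *req, long res)
{
        struct io_ring_ctx *ctx = req->ctx;
        struct io_uring_cqe *cqe;

        cqe = io_get_cqring(ctx);
        if (cqe) {
                WRITE_ONCE(cqe->user_data, req->user_data);
                WRITE_ONCE(cqe->res, res);
                WRITE_ONCE(cqe->flags, 0);
        } else if (ctx->flags & IORING_SETUP_CQ_NODROP) {
                /* pin the request; the overflow flush path drops this ref */
                refcount_inc(&req->refs);
                req->result = res;
                list_add_tail(&req->list, &ctx->cq_overflow_list);
        } else {
                WRITE_ONCE(ctx->rings->cq_overflow,
                                atomic_inc_return(&ctx->cached_cq_overflow));
        }
}

That keeps the GFP_ATOMIC allocation out of the overflow path entirely, at
the cost of holding on to the io_kiocb until the flush.)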

-- 
Pavel Begunkov



