From: Pavel Begunkov <[email protected]>
To: Jens Axboe <[email protected]>, [email protected]
Cc: [email protected], Kanchan Joshi <[email protected]>,
	Ming Lei <[email protected]>
Subject: Re: [PATCH 11/11] io_uring: get rid of intermediate aux cqe caches
Date: Fri, 15 Mar 2024 18:51:24 +0000
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

On 3/15/24 18:26, Jens Axboe wrote:
> On 3/15/24 11:26 AM, Pavel Begunkov wrote:
>> On 3/15/24 16:49, Jens Axboe wrote:
>>> On 3/15/24 10:44 AM, Pavel Begunkov wrote:
>>>> On 3/15/24 16:27, Jens Axboe wrote:
>>>>> On 3/15/24 10:25 AM, Jens Axboe wrote:
>>>>>> On 3/15/24 10:23 AM, Pavel Begunkov wrote:
>>>>>>> On 3/15/24 16:20, Jens Axboe wrote:
>>>>>>>> On 3/15/24 9:30 AM, Pavel Begunkov wrote:
>>>>>>>>> io_post_aux_cqe(), which is used for multishot requests, delays
>>>>>>>>> completions by putting CQEs into a temporary array for the purpose
>>>>>>>>> of completion lock/flush batching.
>>>>>>>>>
>>>>>>>>> DEFER_TASKRUN doesn't need any locking, so for it we can put completions
>>>>>>>>> directly into the CQ and defer post completion handling with a flag.
>>>>>>>>> That leaves !DEFER_TASKRUN, which is not that interesting / hot for
>>>>>>>>> multishot requests, so have conditional locking with deferred flush
>>>>>>>>> for them.
>>>>>>>>
>>>>>>>> This breaks the read-mshot test case, looking into what is going on
>>>>>>>> there.
>>>>>>>
>>>>>>> I forgot to mention: yes, it does. The test makes odd assumptions about
>>>>>>> overflows; IIRC it expects the kernel to allow one and only one aux
>>>>>>> CQE to overflow. Let me double check.
>>>>>>
>>>>>> Yeah this is very possible, the overflow checking could be broken in
>>>>>> there. I'll poke at it and report back.
>>>>>
>>>>> It does, this should fix it:
>>>>>
>>>>>
>>>>> diff --git a/test/read-mshot.c b/test/read-mshot.c
>>>>> index 8fcb79857bf0..501ca69a98dc 100644
>>>>> --- a/test/read-mshot.c
>>>>> +++ b/test/read-mshot.c
>>>>> @@ -236,7 +236,7 @@ static int test(int first_good, int async, int overflow)
>>>>>             }
>>>>>             if (!(cqe->flags & IORING_CQE_F_MORE)) {
>>>>>                 /* we expect this on overflow */
>>>>> -            if (overflow && (i - 1 == NR_OVERFLOW))
>>>>> +            if (overflow && i >= NR_OVERFLOW)
>>>>
>>>> Which is not ideal either, e.g. I wouldn't mind if the kernel stops
>>>> one entry before the CQ is full, so that the request can complete w/o
>>>> overflowing. Not proposing the change because it's a marginal
>>>> case, but we shouldn't limit ourselves.
>>>
>>> But if the event keeps triggering we have to keep posting CQEs,
>>> otherwise we could get stuck.
>>
>> Or we can complete the request, then the user consumes CQEs
>> and restarts as usual
> 
> So you'd want to track if we'd overflow, wait for overflow to clear, and
> then restart that request?

No, the 2 line change in io_post_cqe() from the last email's
snippet is the only thing you'd need.

I probably don't understand what tracking you mean or why, but
FWIW we do currently track and account for overflows:


/* For deferred completions this is not as strict as it is otherwise,
 * however its main job is to prevent unbounded posted completions,
 * and in that it works just as well.
 */
if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq))
	return false;


which is being killed in the series.
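
As a rough illustration of why that check bounds posted completions, here
is a toy userspace model, not the kernel code. Everything in it is
hypothetical (struct toy_ctx, toy_post_aux(), the 2-entry CQ); the boolean
merely stands in for IO_CHECK_CQ_OVERFLOW_BIT. It assumes the first CQE
that does not fit is parked as an overflow entry and sets the flag, after
which further aux CQEs are refused.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CQ_ENTRIES 2u   /* hypothetical, tiny on purpose */

struct toy_ctx {
	unsigned cq_used;        /* entries currently sitting in the CQ */
	unsigned overflowed;     /* entries parked as overflow */
	bool check_cq_overflow;  /* stands in for IO_CHECK_CQ_OVERFLOW_BIT */
};

/* Post one aux CQE; false tells the multishot request to terminate. */
static bool toy_post_aux(struct toy_ctx *ctx)
{
	/* The gate: once an overflow is pending, refuse further aux CQEs. */
	if (ctx->check_cq_overflow)
		return false;

	if (ctx->cq_used < TOY_CQ_ENTRIES) {
		ctx->cq_used++;
	} else {
		/* CQ is full: park the CQE and remember that we overflowed. */
		ctx->overflowed++;
		ctx->check_cq_overflow = true;
	}
	assert(ctx->cq_used <= TOY_CQ_ENTRIES);
	return true;
}

/* Try to post 'attempts' aux CQEs; return how many were accepted. */
static unsigned toy_post_many(struct toy_ctx *ctx, unsigned attempts)
{
	unsigned posted = 0;

	for (unsigned i = 0; i < attempts; i++)
		if (toy_post_aux(ctx))
			posted++;
	return posted;
}
```

In this model, ten attempts against an empty 2-entry CQ accept three CQEs:
two land in the CQ, exactly one overflows, and the rest are refused. That
is the "one and only one overflown aux CQE" pattern the test relied on.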

> I think that sounds a bit involved, no?
> Particularly for a case like overflow, which generally should not occur.
> If it does, just terminate it, and have the user re-issue it. That seems
> like the simpler and better solution to me.
> 
>>> As far as I'm concerned, the behavior with
>>> the patch looks correct. The last CQE is overflown, and that terminates
>>> it, and it doesn't have MORE set. The one before that has MORE set, but
>>> it has to, unless you aborted it early. But that seems impossible,
>>> because what if that was indeed the last current CQE, and we reap CQEs
>>> before the next one is posted.
>>>
>>> So unless I'm missing something, I don't think we can be doing any
>>> better.
>>
>> You can opportunistically try to avoid overflows, unreliably
>>
>> bool io_post_cqe() {
>>      // Not enough space in the CQ left, so if there is a next
>>      // completion pending we'd have to overflow. Avoid that by
>>      // terminating it now.
>>      //
>>      // If there are no more CQEs after this one, we might
>>      // terminate a bit earlier, but that's better because
>>      // overflows are so expensive and unwieldy.
>>      if (cq_space_left() <= 1)
>>          return false;
>>      fill_cqe();
>>      return true;
>> }
>>
>> some_multishot_function(req) {
>>      if (!io_post_cqe(res))
>>          complete_req(req, res);
>> }
>>
>> Again, not suggesting the change for all the obvious reasons, but
>> I think semantically we should be able to do it.
> 
> Yeah not convinced this is worth looking at. If it was the case that the
> hot path would often see overflows and it'd help to avoid it, then
> probably it'd make sense. But I don't think that's the case.

We're talking about different things. Seems you're discussing a
particular implementation, its constraints and performance. I care
purely about the semantics, the implicit uapi. And I define it as
"multishot requests may decide to terminate at any point, the user
should expect it and reissue when appropriate", not restricting it
to "can only (normally) terminate when CQ is full".
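
To make that contract concrete, here is a minimal, self-contained sketch
that turns the io_post_cqe() pseudocode quoted above into a toy userspace
model. It is not kernel or liburing code: struct toy_cq, the toy_*()
helpers and the 4-entry ring are all hypothetical, and CQE_F_MORE only
stands in for IORING_CQE_F_MORE. The "kernel" side terminates the request
one slot before the CQ fills; the "user" side treats any CQE without MORE
as a reason to re-arm.

```c
#include <assert.h>
#include <stdbool.h>

#define CQ_ENTRIES 4u    /* hypothetical toy CQ size (power of two) */
#define CQE_F_MORE 0x1u  /* stands in for IORING_CQE_F_MORE */

struct toy_cqe { int res; unsigned flags; };

struct toy_cq {
	struct toy_cqe ring[CQ_ENTRIES];
	unsigned head, tail;  /* free-running indices */
};

static unsigned cq_space_left(const struct toy_cq *cq)
{
	return CQ_ENTRIES - (cq->tail - cq->head);
}

static void fill_cqe(struct toy_cq *cq, int res, unsigned flags)
{
	assert(cq_space_left(cq) > 0);  /* this model never overflows */
	cq->ring[cq->tail++ % CQ_ENTRIES] = (struct toy_cqe){ res, flags };
}

/* Post an intermediate CQE unless that would consume the last free slot. */
static bool toy_post_aux_cqe(struct toy_cq *cq, int res)
{
	if (cq_space_left(cq) <= 1)
		return false;
	fill_cqe(cq, res, CQE_F_MORE);
	return true;
}

/* One event of a multishot request; returns true while it stays armed. */
static bool toy_multishot_event(struct toy_cq *cq, int res)
{
	if (toy_post_aux_cqe(cq, res))
		return true;
	fill_cqe(cq, res, 0);  /* final CQE without MORE: request is done */
	return false;
}

/* User side: drain the CQ; returns true if a terminating CQE was seen. */
static bool toy_reap(struct toy_cq *cq, unsigned *reaped)
{
	bool terminated = false;

	while (cq->head != cq->tail) {
		struct toy_cqe *cqe = &cq->ring[cq->head++ % CQ_ENTRIES];

		if (!(cqe->flags & CQE_F_MORE))
			terminated = true;
		(*reaped)++;
	}
	return terminated;
}

/* Deliver nr_events events; reap and re-arm whenever the request ends.
 * Returns the number of re-arms the user had to do. */
static unsigned toy_run(unsigned nr_events, unsigned *reaped)
{
	struct toy_cq cq = { .head = 0, .tail = 0 };
	unsigned rearms = 0;

	*reaped = 0;
	for (unsigned i = 0; i < nr_events; i++) {
		if (!toy_multishot_event(&cq, (int)i) && toy_reap(&cq, reaped))
			rearms++;  /* the user re-issues the request here */
	}
	toy_reap(&cq, reaped);  /* pick up any trailing CQEs */
	return rearms;
}
```

With the 4-entry ring, toy_run(10, &reaped) re-arms twice and reaps all
ten completions without ever overflowing. The user-side loop only looks at
the MORE flag, so it keeps working if the termination policy changes.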

We do change tests from time to time, but there is that
"behaviour defines semantics" rule, especially when the semantics
weren't clear in advance and a change breaks someone's app, and people
might take assumptions in tests as the universal truth.

-- 
Pavel Begunkov

