From: Jens Axboe <[email protected]>
To: Pavel Begunkov <[email protected]>,
David Wei <[email protected]>,
[email protected]
Subject: Re: [PATCH next v1 2/2] io_uring: limit local tw done
Date: Thu, 21 Nov 2024 07:34:57 -0700 [thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
On 11/21/24 7:29 AM, Pavel Begunkov wrote:
> On 11/21/24 00:52, David Wei wrote:
>> On 2024-11-20 15:56, Pavel Begunkov wrote:
>>> On 11/20/24 22:14, David Wei wrote:
> ...
>>> One thing that is not so nice is that now we have this handling and
>>> checks in the hot path, and __io_run_local_work_loop() most likely
>>> gets uninlined.
>>>
>>> I wonder, can we just requeue it via task_work again? We can even
>>> add a variant efficiently adding a list instead of a single entry,
>>> i.e. local_task_work_add(head, tail, ...);
>>
>> That was an early idea, but it means re-reversing the list and then
>> atomically adding each node back to work_llist concurrently with e.g.
>> io_req_local_work_add().
>>
>> Using a separate retry_llist means we don't need to concurrently add to
>> either retry_llist or work_llist.
>>
>>>
>>> I'm also curious what's the use case you've got that is hitting
>>> the problem?
>>>
>>
>> There is a Memcache-like workload that has load shedding based on the
>> time spent doing work. With epoll, the work of reading sockets and
>> processing a request is done by user, which can decide after some amount
>> of time to drop the remaining work if it takes too long. With io_uring,
>> the work of reading sockets is done eagerly inside of task work. If
>> there is a burst of work, then so much time is spent in task work
>> reading from sockets that, by the time control returns to user the
>> timeout has already elapsed.
>
> Interesting, it also sounds like instead of an arbitrary 20 we
> might want the user to feed it to us. Might be easier to do it
> with the bpf toy not to carve another argument.
David and I did discuss that, and I was not in favor of having an extra
argument. We really just need some kind of limit to prevent it
over-running. Arguably that should always be min_events, which we
already have, but that kind of runs afoul of applications just doing
io_uring_wait_cqe() and hence asking for 1. That's why the hand wavy
number exists, which is really no different than other hand wavy numbers
we have to limit running of "something" - eg other kinds of retries.
Adding another argument to this just again doubles wait logic complexity
in terms of the API. If it's needed down the line for whatever reason,
then yeah we can certainly do it, probably via the wait regions. But
adding it to the generic wait path would be a mistake imho.
I also strongly suggest this is the last we'll ever hear of this, and
for that reason alone I don't think it's worth any kind of extra
arguments or added complexity.
--
Jens Axboe
next prev parent reply other threads:[~2024-11-21 14:35 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-20 22:14 [PATCH next v1 0/2] limit local tw done David Wei
2024-11-20 22:14 ` [PATCH next v1 1/2] io_uring: add io_local_work_pending() David Wei
2024-11-20 23:45 ` Pavel Begunkov
2024-11-20 22:14 ` [PATCH next v1 2/2] io_uring: limit local tw done David Wei
2024-11-20 23:56 ` Pavel Begunkov
2024-11-21 0:52 ` David Wei
2024-11-21 14:29 ` Pavel Begunkov
2024-11-21 14:34 ` Jens Axboe [this message]
2024-11-21 14:58 ` Pavel Begunkov
2024-11-21 15:02 ` Jens Axboe
2024-11-21 1:12 ` Jens Axboe
2024-11-21 14:25 ` Pavel Begunkov
2024-11-21 14:31 ` Jens Axboe
2024-11-21 15:07 ` Pavel Begunkov
2024-11-21 15:15 ` Jens Axboe
2024-11-21 15:22 ` Jens Axboe
2024-11-21 16:00 ` Pavel Begunkov
2024-11-21 16:05 ` Jens Axboe
2024-11-21 16:18 ` Pavel Begunkov
2024-11-21 16:20 ` Jens Axboe
2024-11-21 16:43 ` Pavel Begunkov
2024-11-21 16:57 ` Jens Axboe
2024-11-21 17:05 ` Jens Axboe
2024-11-21 17:53 ` David Wei
2024-11-21 1:12 ` [PATCH next v1 0/2] " Jens Axboe
2024-11-21 14:16 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
[email protected] \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox