From: Pavel Begunkov <[email protected]>
To: Dylan Yudaken <[email protected]>,
"[email protected]" <[email protected]>,
"[email protected]" <[email protected]>
Cc: Kernel Team <[email protected]>
Subject: Re: [PATCH for-next v3 4/7] io_uring: add IORING_SETUP_DEFER_TASKRUN
Date: Tue, 30 Aug 2022 11:29:37 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
On 8/30/22 10:54, Dylan Yudaken wrote:
> On Mon, 2022-08-22 at 12:34 +0100, Pavel Begunkov wrote:
[...]
>>> +
>>> +	node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL);
>>> +	if (node != &fake) {
>>> +		current_final = &fake;
>>> +		node = io_llist_xchg(&ctx->work_llist, &fake);
>>> +		goto again;
>>> +	}
>>> +
>>> +	if (locked) {
>>> +		io_submit_flush_completions(ctx);
>>> +		mutex_unlock(&ctx->uring_lock);
>>> +	}
>>> +	return ret;
>>> +}
>>
>> I was thinking about:
>>
>> int io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
>> {
>> 	if (!*locked)
>> 		*locked = try_lock();
>> }
>>
>> bool locked = false;
>> io_run_local_work(ctx, &locked);
>> if (locked)
>> 	unlock();
>>
>> // or just as below when already holding it
>> bool locked = true;
>> io_run_local_work(ctx, &locked);
>>
>> Which would replace
>>
>> if (DEFER) {
>> 	// we're assuming that it'll unlock
>> 	io_run_local_work(true);
>> } else {
>> 	unlock();
>> }
>>
>> with
>>
>> if (DEFER) {
>> 	bool locked = true;
>> 	io_run_local_work(&locked);
>> }
>> unlock();
>>
>> But anyway, it can be mulled over later.
>
> I think there is an easier way to clean it up if we allow an extra
> unlock/lock in io_uring_enter (see below). Will do that in v4.

FWIW, I'm fine with the current code; the rest can
be cleaned up later if you prefer.
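A minimal userspace sketch of the in/out `bool *locked` convention proposed above, assuming a pthread mutex stands in for `ctx->uring_lock`. The struct, its fields and the functions `run_local_work()`, `call_unlocked()` and `call_locked()` are hypothetical illustrations, not io_uring code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct io_ring_ctx: a lock plus work counters. */
struct ctx {
	pthread_mutex_t lock;	/* plays the role of ctx->uring_lock */
	int pending;		/* queued local work; only the owner thread touches it */
	int done;		/* work items run so far */
};

/*
 * *locked is in/out: on entry it says whether the caller holds ctx->lock,
 * on return whether the lock is held now. The callee may take the lock
 * opportunistically but never drops it; unlocking is the caller's job.
 */
static int run_local_work(struct ctx *ctx, bool *locked)
{
	int ran;

	if (!*locked)
		*locked = pthread_mutex_trylock(&ctx->lock) == 0;

	ran = ctx->pending;
	ctx->done += ran;
	ctx->pending = 0;
	return ran;
}

/* Caller without the lock: unlock only if the callee managed to take it. */
static int call_unlocked(struct ctx *ctx)
{
	bool locked = false;
	int ran = run_local_work(ctx, &locked);

	if (locked)
		pthread_mutex_unlock(&ctx->lock);
	return ran;
}

/* Caller already holding the lock: one unconditional unlock afterwards. */
static int call_locked(struct ctx *ctx)
{
	bool locked;
	int ran;

	pthread_mutex_lock(&ctx->lock);
	locked = true;
	ran = run_local_work(ctx, &locked);
	assert(locked);		/* the callee must not drop a lock it was handed */
	pthread_mutex_unlock(&ctx->lock);
	return ran;
}
```

Because the callee never unlocks, both call sites end in the same "unlock if held" shape, which is what would let the DEFER branch and the regular path share a single unlock.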
[...]
>>> @@ -3057,10 +3160,20 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>  		}
>>>  		if ((flags & IORING_ENTER_GETEVENTS) && ctx->syscall_iopoll)
>>>  			goto iopoll_locked;
>>> +		if ((flags & IORING_ENTER_GETEVENTS) &&
>>> +		    (ctx->flags & IORING_SETUP_DEFER_TASKRUN)) {
>>> +			int ret2 = io_run_local_work(ctx, true);
>>> +
>>> +			if (unlikely(ret2 < 0))
>>> +				goto out;
>>
>> It's an optimisation and we don't have to handle errors here,
>> so let's ignore them and make it look a bit better.
>
> I'm not convinced about that - as then there is no way the application
> will know it is trying to complete events on the wrong thread. Work
> will just silently pile up instead.

By "optimisation" I mean exactly this chunk right after submission.
If it's the wrong thread, the error here will be ignored, control flow
will fall into cq_wait, and cq_wait will fail and return an error. So
userspace still gets an error in the end, but the handling is
consolidated in cq_wait.
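A toy model of that argument, assuming a ring whose deferred work may only be run by one owning task. All names (`struct ring`, `run_deferred()`, `cq_wait()`, `enter_getevents()`) are hypothetical, the `-EEXIST` wrong-thread code is an assumption of the sketch, and the wait returns `-EAGAIN` where a real wait would sleep:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical ring model: deferred work may only be run by its owner. */
struct ring {
	int owner;	/* id of the submitter task */
	int pending;	/* deferred task work items */
	int cqes;	/* completions ready for userspace */
};

/* Run deferred work, or fail if called from the wrong task. */
static int run_deferred(struct ring *r, int self)
{
	int ran;

	if (self != r->owner)
		return -EEXIST;	/* error code is an assumption of this sketch */
	ran = r->pending;
	r->cqes += ran;
	r->pending = 0;
	return ran;
}

/* Wait path: the single place where a run_deferred() error is reported. */
static int cq_wait(struct ring *r, int self, int min_events)
{
	for (;;) {
		int ret = run_deferred(r, self);

		if (ret < 0)
			return ret;
		if (r->cqes >= min_events)
			return 0;
		if (ret == 0)
			return -EAGAIN;	/* a real wait would sleep here */
	}
}

/* Enter path: the run right after submission is only an optimisation. */
static int enter_getevents(struct ring *r, int self, int min_events)
{
	(void)run_deferred(r, self);	/* error deliberately ignored */
	return cq_wait(r, self, min_events);
}
```

A caller on the wrong task still gets the error, just from `cq_wait()` rather than from the post-submission run.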
> That being said - with the changes below I can just get rid of this
> code I think.
>
>>
>>> +			goto getevents_ran_local;
>>> +		}
>>>  		mutex_unlock(&ctx->uring_lock);
>>>  	}
>>> +
>>>  	if (flags & IORING_ENTER_GETEVENTS) {
>>>  		int ret2;
>>> +
>>>  		if (ctx->syscall_iopoll) {
>>>  			/*
>>>  			 * We disallow the app entering submit/complete with
>>> @@ -3081,6 +3194,12 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>  			const sigset_t __user *sig;
>>>  			struct __kernel_timespec __user *ts;
>>>
>>> +			if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) {
>>
>> I think it should be in io_cqring_wait(), which calls it anyway
>> in the beginning. Instead of
>>
>> do {
>> 	io_cqring_overflow_flush(ctx);
>> 	if (io_cqring_events(ctx) >= min_events)
>> 		return 0;
>> 	if (!io_run_task_work())
>> 		break;
>> } while (1);
>>
>> Let's have
>>
>> do {
>> 	ret = io_run_task_work_ctx();
>> 	// handle ret
>> 	io_cqring_overflow_flush(ctx);
>> 	if (io_cqring_events(ctx) >= min_events)
>> 		return 0;
>> } while (1);
>
> I think that is ok.
> The downside is that it adds an extra lock/unlock of the ctx in some
> cases. I assume that will be negligible?

Not sure there will be any extra locking. IIRC, it was about replacing

	// io_uring_enter() -> GETEVENTS path
	run_tw();
	// io_cqring_wait()
	while (cqes_ready() < needed)
		run_tw();

with:

	// io_uring_enter()
	do {
		run_tw();
	} while (cqes_ready() < needed);

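The claim that the merge adds no extra locking can be checked with a toy model that counts `run_tw()` calls in both shapes, assuming each call completes at most one queued item. `struct sim`, `split_form()` and `merged_form()` are hypothetical, and both loops stop when nothing is queued where a real wait would sleep:

```c
#include <assert.h>

/* Hypothetical model: each run_tw() call completes at most one queued item. */
struct sim {
	int queued;	/* task work still queued */
	int ready;	/* CQEs posted so far */
	int calls;	/* number of run_tw() invocations */
};

static void run_tw(struct sim *s)
{
	s->calls++;
	if (s->queued > 0) {
		s->queued--;
		s->ready++;
	}
}

/* Current shape: one run in io_uring_enter(), more in io_cqring_wait(). */
static void split_form(struct sim *s, int needed)
{
	run_tw(s);
	/* stop when nothing is queued; a real wait would sleep instead */
	while (s->ready < needed && s->queued > 0)
		run_tw(s);
}

/* Proposed shape: the same calls folded into one do/while loop. */
static void merged_form(struct sim *s, int needed)
{
	assert(needed >= 0);
	do {
		run_tw(s);
	} while (s->ready < needed && s->queued > 0);
}
```

Both shapes run task work once unconditionally and then loop on the same condition, so they make the same number of calls; folding them only changes where the loop lives, not how often work (and any locking inside it) runs.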
>>> +				ret2 = io_run_local_work(ctx, false);
>>> +				if (unlikely(ret2 < 0))
>>> +					goto getevents_out;
>>> +			}
>>> +getevents_ran_local:
>>>  			ret2 = io_get_ext_arg(flags, argp, &argsz, &ts, &sig);
>>>  			if (likely(!ret2)) {
>>>  				min_complete = min(min_complete,
>>> @@ -3090,6 +3209,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>>  			}
>>>  		}
>>>
>>> +getevents_out:
>>>  		if (!ret) {
>>>  			ret = ret2;
>>>
>>> @@ -3289,17 +3409,29 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
>>>  	if (ctx->flags & IORING_SETUP_SQPOLL) {
>>>  		/* IPI related flags don't make sense with SQPOLL */
>>>  		if (ctx->flags & (IORING_SETUP_COOP_TASKRUN |
>>> -				  IORING_SETUP_TASKRUN_FLAG))
>>> +				  IORING_SETUP_TASKRUN_FLAG |
>>> +				  IORING_SETUP_DEFER_TASKRUN))
>>
>> Sounds like we should also fail if SQPOLL is set, especially with
>> the task check on the waiting side.
>>
>
> That is what this code is doing I think? Did I miss something?

OK, great then.
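The quoted hunk's check reduces to a mask test. A standalone sketch: the `SETUP_*` constants are local stand-ins assumed to mirror the `IORING_SETUP_*` bit values in `<linux/io_uring.h>` (verify against the header), `check_setup_flags()` is hypothetical, and `-EINVAL` is assumed because the hunk is cut off before its error path:

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins, assumed to match the uapi IORING_SETUP_* bits. */
#define SETUP_SQPOLL		(1U << 1)
#define SETUP_COOP_TASKRUN	(1U << 8)
#define SETUP_TASKRUN_FLAG	(1U << 9)
#define SETUP_DEFER_TASKRUN	(1U << 13)

/* IPI-related flags (now including DEFER_TASKRUN) are rejected with SQPOLL. */
static int check_setup_flags(unsigned int flags)
{
	if ((flags & SETUP_SQPOLL) &&
	    (flags & (SETUP_COOP_TASKRUN | SETUP_TASKRUN_FLAG |
		      SETUP_DEFER_TASKRUN)))
		return -EINVAL;
	return 0;
}
```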
--
Pavel Begunkov