public inbox for [email protected]
 help / color / mirror / Atom feed
From: Pavel Begunkov <[email protected]>
To: Dylan Yudaken <[email protected]>,
	"[email protected]" <[email protected]>,
	"[email protected]" <[email protected]>
Cc: Kernel Team <[email protected]>
Subject: Re: [PATCH for-next v3 4/7] io_uring: add IORING_SETUP_DEFER_TASKRUN
Date: Tue, 30 Aug 2022 11:29:37 +0100	[thread overview]
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>

On 8/30/22 10:54, Dylan Yudaken wrote:
> On Mon, 2022-08-22 at 12:34 +0100, Pavel Begunkov wrote:
[...]
>>> +
>>> +       node = io_llist_cmpxchg(&ctx->work_llist, &fake, NULL);
>>> +       if (node != &fake) {
>>> +               current_final = &fake;
>>> +               node = io_llist_xchg(&ctx->work_llist, &fake);
>>> +               goto again;
>>> +       }
>>> +
>>> +       if (locked) {
>>> +               io_submit_flush_completions(ctx);
>>> +               mutex_unlock(&ctx->uring_lock);
>>> +       }
>>> +       return ret;
>>> +}
>>
>> I was thinking about:
>>
>> int io_run_local_work(struct io_ring_ctx *ctx, bool *locked)
>> {
>>          locked = try_lock();
>> }
>>
>> bool locked = false;
>> io_run_local_work(ctx, *locked);
>> if (locked)
>>          unlock();
>>
>> // or just as below when already holding it
>> bool locked = true;
>> io_run_local_work(ctx, *locked);
>>
>> Which would replace
>>
>> if (DEFER) {
>>          // we're assuming that it'll unlock
>>          io_run_local_work(true);
>> } else {
>>          unlock();
>> }
>>
>> with
>>
>> if (DEFER) {
>>          bool locked = true;
>>          io_run_local_work(&locked);
>> }
>> unlock();
>>
>> But anyway, it can be mulled later.
> 
> I think there is an easier way to clean it up if we allow an extra
> unlock/lock in io_uring_enter (see below). Will do that in v4

fwiw, I'm fine with the current code, the rest can
be cleaned up later if you'd prefer so.

[...]
>>> @@ -3057,10 +3160,20 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
>>> int, fd, u32, to_submit,
>>>                  }
>>>                  if ((flags & IORING_ENTER_GETEVENTS) && ctx-
>>>> syscall_iopoll)
>>>                          goto iopoll_locked;
>>> +               if ((flags & IORING_ENTER_GETEVENTS) &&
>>> +                       (ctx->flags & IORING_SETUP_DEFER_TASKRUN))
>>> {
>>> +                       int ret2 = io_run_local_work(ctx, true);
>>> +
>>> +                       if (unlikely(ret2 < 0))
>>> +                               goto out;
>>
>> It's an optimisation and we don't have to handle errors here,
>> let's ignore them and make it looking a bit better.
> 
> I'm not convinced about that - as then there is no way the application
> will know it is trying to complete events on the wrong thread. Work
> will just silently pile up instead.

by optimisation I mean exactly this chunk right after submsission.
If it's a wrong thread this will be ignored, then control flow will
fall into cq_wait and then fail there returning an error. So, the
userspace should get an error in the end but the handling would be
consolidated in cq_wait.

> That being said - with the changes below I can just get rid of this
> code I think.
> 
>>
>>> +                       goto getevents_ran_local;
>>> +               }
>>>                  mutex_unlock(&ctx->uring_lock);
>>>          }
>>> +
>>>          if (flags & IORING_ENTER_GETEVENTS) {
>>>                  int ret2;
>>> +
>>>                  if (ctx->syscall_iopoll) {
>>>                          /*
>>>                           * We disallow the app entering
>>> submit/complete with
>>> @@ -3081,6 +3194,12 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned
>>> int, fd, u32, to_submit,
>>>                          const sigset_t __user *sig;
>>>                          struct __kernel_timespec __user *ts;
>>>    
>>> +                       if (ctx->flags &
>>> IORING_SETUP_DEFER_TASKRUN) {
>>
>> I think it should be in io_cqring_wait(), which calls it anyway
>> in the beginning. Instead of
>>
>>          do {
>>                  io_cqring_overflow_flush(ctx);
>>                  if (io_cqring_events(ctx) >= min_events)
>>                          return 0;
>>                  if (!io_run_task_work())
>>                          break;
>>          } while (1);
>>
>> Let's have
>>
>>          do {
>>                  ret = io_run_task_work_ctx();
>>                  // handle ret
>>                  io_cqring_overflow_flush(ctx);
>>                  if (io_cqring_events(ctx) >= min_events)
>>                          return 0;
>>          } while (1);
> 
> I think that is ok.
> The downside is that it adds an extra lock/unlock of the ctx in some
> cases. I assume that will be neglegible?

Not sure there will be any extra locking. IIRC, it was about replacing

// io_uring_enter() -> GETEVENTS path
run_tw();
// io_cqring_wait()
while (cqes_ready() < needed)
	run_tw();

With:

// io_uring_enter()
do {
	run_tw();
} while(cqes_ready() < needed);


>>> +                               ret2 = io_run_local_work(ctx,
>>> false);
>>> +                               if (unlikely(ret2 < 0))
>>> +                                       goto getevents_out;
>>> +                       }
>>> +getevents_ran_local:
>>>                          ret2 = io_get_ext_arg(flags, argp, &argsz,
>>> &ts, &sig);
>>>                          if (likely(!ret2)) {
>>>                                  min_complete = min(min_complete,
>>> @@ -3090,6 +3209,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int,
>>> fd, u32, to_submit,
>>>                          }
>>>                  }
>>>    
>>> +getevents_out:
>>>                  if (!ret) {
>>>                          ret = ret2;
>>>    
>>> @@ -3289,17 +3409,29 @@ static __cold int io_uring_create(unsigned
>>> entries, struct io_uring_params *p,
>>>          if (ctx->flags & IORING_SETUP_SQPOLL) {
>>>                  /* IPI related flags don't make sense with SQPOLL
>>> */
>>>                  if (ctx->flags & (IORING_SETUP_COOP_TASKRUN |
>>> -                                 IORING_SETUP_TASKRUN_FLAG))
>>> +                                 IORING_SETUP_TASKRUN_FLAG |
>>> +                                 IORING_SETUP_DEFER_TASKRUN))
>>
>> Sounds like we should also fail if SQPOLL is set, especially with
>> the task check on the waiting side.
>>
> 
> That is what this code is doing I think? Did I miss something?

Ok, great then

-- 
Pavel Begunkov

  reply	other threads:[~2022-08-30 10:34 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-19 12:19 [PATCH for-next v3 0/7] io_uring: defer task work to when it is needed Dylan Yudaken
2022-08-19 12:19 ` [PATCH for-next v3 1/7] io_uring: remove unnecessary variable Dylan Yudaken
2022-08-19 12:19 ` [PATCH for-next v3 2/7] io_uring: introduce io_has_work Dylan Yudaken
2022-08-19 12:19 ` [PATCH for-next v3 3/7] io_uring: do not run task work at the start of io_uring_enter Dylan Yudaken
2022-08-19 12:19 ` [PATCH for-next v3 4/7] io_uring: add IORING_SETUP_DEFER_TASKRUN Dylan Yudaken
2022-08-22 11:34   ` Pavel Begunkov
2022-08-29  6:32     ` Hao Xu
2022-08-30  7:23       ` Dylan Yudaken
2022-08-30  7:54         ` Hao Xu
2022-08-30  9:54     ` Dylan Yudaken
2022-08-30 10:29       ` Pavel Begunkov [this message]
2022-08-30 13:19   ` Hao Xu
2022-08-30 13:34     ` Dylan Yudaken
2022-08-30 14:04       ` Hao Xu
2022-08-19 12:19 ` [PATCH for-next v3 5/7] io_uring: move io_eventfd_put Dylan Yudaken
2022-08-19 12:19 ` [PATCH for-next v3 6/7] io_uring: signal registered eventfd to process deferred task work Dylan Yudaken
2022-08-19 12:19 ` [PATCH for-next v3 7/7] io_uring: trace local task work run Dylan Yudaken
2022-08-29  7:01 ` [PATCH for-next v3 0/7] io_uring: defer task work to when it is needed Hao Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox