From: Jann Horn <[email protected]>
To: Jens Axboe <[email protected]>
Cc: io-uring <[email protected]>,
	Glauber Costa <[email protected]>,
	Peter Zijlstra <[email protected]>,
	Pavel Begunkov <[email protected]>
Subject: Re: [PATCH 5/7] io_uring: add per-task callback handler
Date: Fri, 21 Feb 2020 23:56:07 +0100
Message-ID: <CAG48ez2ErAJgEiPdkK+PeNBoHchVEkkw5674Wt2eSaNjqyZ98g@mail.gmail.com>
In-Reply-To: <[email protected]>

On Fri, Feb 21, 2020 at 10:46 PM Jens Axboe <[email protected]> wrote:
> For poll requests, it's not uncommon to link a read (or write) after
> the poll to execute immediately after the file is marked as ready.
> Since the poll completion is called inside the waitqueue wake up handler,
> we have to punt that linked request to async context. This slows down
> the processing, and actually means it's faster to not use a link for this
> use case.
>
> We also run into problems if the completion_lock is contended, since
> we're using a different lock ordering than the issue side. Hence we have
> to do trylock for completion, and if that fails, go async. Poll removal
> needs to go async as well, for the same reason.
>
> eventfd notification needs special casing as well, to avoid stack-blowing
> recursion or deadlocks.
>
> These are all deficiencies that were inherited from the aio poll
> implementation, but I think we can do better. When a poll completes,
> simply queue it up in the task's poll list. When the task runs that
> list, we can run dependent links inline as well. This means we never
> have to go async, and we can remove a bunch of code associated with
> that, along with the optimizations that tried to make that path run
> faster. The diffstat speaks for itself.
[...]
> @@ -3637,8 +3587,8 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>  {
>         struct io_kiocb *req = wait->private;
>         struct io_poll_iocb *poll = &req->poll;
> -       struct io_ring_ctx *ctx = req->ctx;
>         __poll_t mask = key_to_poll(key);
> +       struct task_struct *tsk;
>
>         /* for instances that support it check for an event match first: */
>         if (mask && !(mask & poll->events))
> @@ -3646,46 +3596,11 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
>
>         list_del_init(&poll->wait.entry);
>
[...]
> +       tsk = req->task;
> +       req->result = mask;
> +       init_task_work(&req->task_work, io_poll_task_func);
> +       task_work_add(tsk, &req->task_work, true);
> +       wake_up_process(tsk);
>         return 1;
>  }
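
(For readers without the full patch in front of them: io_poll_task_func()'s
body is not part of the quoted context. Based on the commit message, its
shape is presumably something like the sketch below; the helper names
(io_poll_complete(), io_put_req_find_next(), __io_queue_sqe()) are my
guesses from the existing io_uring code, not confirmed from this patch:

static void io_poll_task_func(struct callback_head *cb)
{
	/* task_work is the callback_head embedded in the request */
	struct io_kiocb *req = container_of(cb, struct io_kiocb, task_work);
	struct io_ring_ctx *ctx = req->ctx;
	struct io_kiocb *nxt = NULL;

	/* runs in req->task's context, so a plain lock is fine here;
	 * no trylock-or-punt dance as in the waitqueue callback */
	spin_lock_irq(&ctx->completion_lock);
	io_poll_complete(req, req->result, 0);	/* post CQE for the mask */
	spin_unlock_irq(&ctx->completion_lock);

	io_cqring_ev_posted(ctx);
	io_put_req_find_next(req, &nxt);
	if (nxt)
		__io_queue_sqe(nxt, NULL);	/* issue linked req inline */
}
)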

Let's say userspace has some code like this:

[prepare two uring requests: one POLL and a RECVMSG linked behind it]
// submit requests
io_uring_enter(uring_fd, 2, 0, 0, NULL, 0);
// wait for something to happen, either a completion event from uring
// or input from stdin
struct pollfd fds[] = {
  { .fd = 0, .events = POLLIN },
  { .fd = uring_fd, .events = POLLIN }
};
while (1) {
  poll(fds, 2, -1);
  if (fds[0].revents) {
    [read stuff from stdin]
  }
  if (fds[1].revents) {
    [fetch completions from shared memory]
  }
}

If userspace has reached the poll() by the time the uring POLL op
completes, I think you'll wake up the do_poll() loop while it is in
poll_schedule_timeout(); then it will do another iteration, see that
no signals are pending and none of the polled files have become ready,
and go to sleep again. So things are stuck until the io_uring fd
signals that it is ready.
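
(For reference, the sleep loop in fs/select.c:do_poll() boils down to
roughly the following; this is a heavily simplified sketch, and
pollfds_have_events() is a made-up stand-in for the real per-fd
vfs_poll() scan:

	for (;;) {
		if (pollfds_have_events(fds) || timed_out)
			break;
		if (signal_pending(current))
			break;	/* only a pending signal interrupts the wait */
		/* sleeps in TASK_INTERRUPTIBLE; a bare wake_up_process()
		 * just makes us re-run the checks above and sleep again */
		if (!poll_schedule_timeout(wait, TASK_INTERRUPTIBLE, to, slack))
			timed_out = 1;
	}
)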

The options I see are:

 - Tell the kernel to go through signal delivery code, which I think
will cause the pending syscall to actually abort and return to
userspace (which I think is kinda gross). You could maybe add a
special case where that doesn't happen if the task is already in
io_uring_enter() and waiting for CQ events.
 - Forbid eventfd notifications, ensure that the ring's ->poll handler
reports POLLIN when work items are pending for userspace, and then
rely on the fact that those work items will be picked up when
returning from the poll syscall. Unfortunately, this gets a bit messy
when you're dealing with multiple threads that access the same ring,
since then you'd have to ensure that *any* thread can pick up this
work, and that doing so doesn't conflict with how the uring instance is shared
between threads; but you could probably engineer your way around this.
For userspace, this whole thing just means "POLLIN may be spurious".
 - Like the previous item, except you tell userspace that if it gets
POLLIN (or some special poll status like POLLRDBAND) and sees nothing
in the completion queue, it should call io_uring_enter() to process
the work. This addresses the submitter-is-not-completion-reaper
scenario without having to add some weird version of task_work that
will be processed by the first thread that comes along, but you'd get
some extra syscalls; a rough userspace sketch follows below.
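
For that third option, the uring branch of my example above would become
something like this (hedged sketch; cq_head/cq_tail stand in for the
mmap'd CQ ring pointers):

  if (fds[1].revents & POLLIN) {
    /* POLLIN might only mean "work pending", not "completions visible":
     * if the CQ ring looks empty, enter the kernel once so the pending
     * task_work gets run on syscall return */
    if (*cq_head == *cq_tail)
      io_uring_enter(uring_fd, 0, 0, 0, NULL, 0);
    [fetch completions from shared memory]
  }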


Thread overview: 15+ messages
2020-02-21 21:45 [PATCHSET v2 0/7] io_uring: use polled async retry Jens Axboe
2020-02-21 21:46 ` [PATCH 1/7] io_uring: consider any io_read/write -EAGAIN as final Jens Axboe
2020-02-21 21:46 ` [PATCH 2/7] io_uring: io_accept() should hold on to submit reference on retry Jens Axboe
2020-02-21 21:46 ` [PATCH 3/7] task_work_run: don't take ->pi_lock unconditionally Jens Axboe
2020-02-21 21:46 ` [PATCH 4/7] io_uring: store io_kiocb in wait->private Jens Axboe
2020-02-21 21:46 ` [PATCH 5/7] io_uring: add per-task callback handler Jens Axboe
2020-02-21 22:56   ` Jann Horn [this message]
2020-02-21 23:00     ` Jann Horn
2020-02-22 14:41       ` Jens Axboe
2020-02-22 15:09         ` Jens Axboe
2020-02-22 18:49           ` Jens Axboe
2020-02-22 18:54             ` Jens Axboe
2020-02-23  3:30               ` Jens Axboe
2020-02-21 21:46 ` [PATCH 6/7] io_uring: mark requests that we can do poll async in io_op_defs Jens Axboe
2020-02-21 21:46 ` [PATCH 7/7] io_uring: use poll driven retry for files that support it Jens Axboe
