Re: [PATCH 5/7] io_uring: add per-task callback handler

public inbox for [email protected]
 help / color / mirror / Atom feed

From: Jann Horn <[email protected]>
To: Jens Axboe <[email protected]>
Cc: io-uring <[email protected]>,
	Glauber Costa <[email protected]>,
	Peter Zijlstra <[email protected]>,
	Pavel Begunkov <[email protected]>
Subject: Re: [PATCH 5/7] io_uring: add per-task callback handler
Date: Sat, 22 Feb 2020 00:00:29 +0100	[thread overview]
Message-ID: <CAG48ez3rsk6TzF82Q0PvDDCRp6wfWWUn8bsSZ2+OB9FgSOGgsw@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez2ErAJgEiPdkK+PeNBoHchVEkkw5674Wt2eSaNjqyZ98g@mail.gmail.com>

On Fri, Feb 21, 2020 at 11:56 PM Jann Horn <[email protected]> wrote:
> On Fri, Feb 21, 2020 at 10:46 PM Jens Axboe <[email protected]> wrote:
> > For poll requests, it's not uncommon to link a read (or write) after
> > the poll to execute immediately after the file is marked as ready.
> > Since the poll completion is called inside the waitqueue wake up handler,
> > we have to punt that linked request to async context. This slows down
> > the processing, and actually means it's faster to not use a link for this
> > use case.
> >
> > We also run into problems if the completion_lock is contended, as we're
> > doing a different lock ordering than the issue side is. Hence we have
> > to do trylock for completion, and if that fails, go async. Poll removal
> > needs to go async as well, for the same reason.
> >
> > eventfd notification needs special case as well, to avoid stack blowing
> > recursion or deadlocks.
> >
> > These are all deficiencies that were inherited from the aio poll
> > implementation, but I think we can do better. When a poll completes,
> > simply queue it up in the task poll list. When the task completes the
> > list, we can run dependent links inline as well. This means we never
> > have to go async, and we can remove a bunch of code associated with
> > that, and optimizations to try and make that run faster. The diffstat
> > speaks for itself.
> [...]
> > @@ -3637,8 +3587,8 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
> >  {
> >         struct io_kiocb *req = wait->private;
> >         struct io_poll_iocb *poll = &req->poll;
> > -       struct io_ring_ctx *ctx = req->ctx;
> >         __poll_t mask = key_to_poll(key);
> > +       struct task_struct *tsk;
> >
> >         /* for instances that support it check for an event match first: */
> >         if (mask && !(mask & poll->events))
> > @@ -3646,46 +3596,11 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
> >
> >         list_del_init(&poll->wait.entry);
> >
> [...]
> > +       tsk = req->task;
> > +       req->result = mask;
> > +       init_task_work(&req->task_work, io_poll_task_func);
> > +       task_work_add(tsk, &req->task_work, true);
> > +       wake_up_process(tsk);
> >         return 1;
> >  }
>
> Let's say userspace has some code like this:
>
> [prepare two uring requests: one POLL and a RECVMSG linked behind it]
> // submit requests
> io_uring_enter(uring_fd, 2, 0, 0, NULL, 0);
> // wait for something to happen, either a completion event from uring
> or input from stdin
> struct pollfd fds[] = {
>   { .fd = 0, .events = POLLIN },
>   { .fd = uring_fd, .events = POLLIN }
> };
> while (1) {
>   poll(fds, 2, -1);
>   if (fds[0].revents) {
>     [read stuff from stdin]
>   }
>   if (fds[1].revents) {
>     [fetch completions from shared memory]
>   }
> }
>
> If userspace has reached the poll() by the time the uring POLL op
> completes, I think you'll wake up the do_poll() loop while it is in
> poll_schedule_timeout(); then it will do another iteration, see that
> no signals are pending and none of the polled files have become ready,
> and go to sleep again. So things are stuck until the io_uring fd
> signals that it is ready.
>
> The options I see are:
>
>  - Tell the kernel to go through signal delivery code, which I think
> will cause the pending syscall to actually abort and return to
> userspace (which I think is kinda gross). You could maybe add a
> special case where that doesn't happen if the task is already in
> io_uring_enter() and waiting for CQ events.
>  - Forbid eventfd notifications, ensure that the ring's ->poll handler
> reports POLLIN when work items are pending for userspace, and then
> rely on the fact that those work items will be picked up when
> returning from the poll syscall. Unfortunately, this gets a bit messy
> when you're dealing with multiple threads that access the same ring,
> since then you'd have to ensure that *any* thread can pick up this
> work, and that that doesn't mismatch how the uring instance is shared
> between threads; but you could probably engineer your way around this.
> For userspace, this whole thing just means "POLLIN may be spurious".
>  - Like the previous item, except you tell userspace that if it gets
> POLLIN (or some special poll status like POLLRDBAND) and sees nothing
> in the completion queue, it should call io_uring_enter() to process
> the work. This addresses the submitter-is-not-completion-reaper
> scenario without having to add some weird version of task_work that
> will be processed by the first thread, but you'd get some extra
> syscalls.

... or I suppose you could punt to worker context if anyone uses the
ring's ->poll handler or has an eventfd registered, if you don't
expect high-performance users to do those things.

next prev parent reply	other threads:[~2020-02-21 23:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-21 21:45 [PATCHSET v2 0/7] io_uring: use polled async retry Jens Axboe
2020-02-21 21:46 ` [PATCH 1/7] io_uring: consider any io_read/write -EAGAIN as final Jens Axboe
2020-02-21 21:46 ` [PATCH 2/7] io_uring: io_accept() should hold on to submit reference on retry Jens Axboe
2020-02-21 21:46 ` [PATCH 3/7] task_work_run: don't take ->pi_lock unconditionally Jens Axboe
2020-02-21 21:46 ` [PATCH 4/7] io_uring: store io_kiocb in wait->private Jens Axboe
2020-02-21 21:46 ` [PATCH 5/7] io_uring: add per-task callback handler Jens Axboe
2020-02-21 22:56   ` Jann Horn
2020-02-21 23:00     ` Jann Horn [this message]
2020-02-22 14:41       ` Jens Axboe
2020-02-22 15:09         ` Jens Axboe
2020-02-22 18:49           ` Jens Axboe
2020-02-22 18:54             ` Jens Axboe
2020-02-23  3:30               ` Jens Axboe
2020-02-21 21:46 ` [PATCH 6/7] io_uring: mark requests that we can do poll async in io_op_defs Jens Axboe
2020-02-21 21:46 ` [PATCH 7/7] io_uring: use poll driven retry for files that support it Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAG48ez3rsk6TzF82Q0PvDDCRp6wfWWUn8bsSZ2+OB9FgSOGgsw@mail.gmail.com \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    [email protected] \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox